Solving Sudoku, Revisited

Thanks to Scott Y’s comments in my article regarding a Sudoku Solver I wrote as part of an AI project, I decided to write a real solver that not only can solve any Sudoku puzzle, but can do it in just a few milliseconds.

Long story short: I represented Sudoku as an exact cover problem, then used Donald Knuth’s Algorithm X and Dancing Links implementation to solve that exact cover problem. Still here? Good, it’s time for me to explain a few things.

(Update October 2013: This project is available on GitHub!)

Exact Cover

There exists a type of problem called “exact cover”, and while Wikipedia explains things ever so well, I’ll throw in my own explanation that should serve well at least in the context of Sudoku.

Imagine you are challenged to rearrange where you place your cell phone, keys, and wallet. One item must go in a pants pocket, one item must go in a coat pocket, and one item must go in a backpack—and  no two items can be in the same area. Have a solution yet? I hope you said yes. You could solve this a variety of ways: keys in pants, phone in coat, wallet in backpack, etc.

Now imagine you have a hundred pockets, and three hundred different things that must be placed in them in a certain configuration. Now things get a little more difficult. This is because the number of constraints has grown much larger, as has the number of items that could potentially satisfy those constraints. How would you go about solving this issue?

Well, an exact cover approach would do the following: create a list of constraints that must be satisfied, then create a list of potential ways to satisfy each constraint. The list of constraints become the column headers of a matrix, and the list of ways to satisfy each constraint become rows in that matrix. To simplify things, let’s consider the original pocket rearrangement problem:

Our list of constraints: pants pocket filled, coat pocket filled, backpack filled

Our list of possible placements of items: keys in pants pocket, keys in coat pocket, keys in backpack, wallet in pants pocket, wallet in coat pocket, wallet in backpack, phone in pants pocket, phone in coat pocket, phone in backpack.

Let’s look at the matrix we made:

You’ll notice I labeled the columns (constraints) as questions. The question is whether a row satisfies the constraint. Having Keys in our pants pocket satisfies the constraint that our pants should have an item in them, but not the other two constraints. Let’s look at the matrix with the answers filled in:

In reality, ‘Yes’ and ‘No’ are filled with Boolean True (1) or False (0) values:

Remember, this matrix represents our constraints, and possibilities that satisfy those constraints; we haven’t solved the problem yet. To solve the problem we have to “exactly cover” the constraint columns. That is, choose a set of rows so that a ‘1’ appears exactly once in each column.

Exact cover problems are typically represented like this: given a matrix of 1’s and 0’s, choose a set of rows so that every column has a ‘1’ in it exactly once.

BUT! We forgot a few constraints on this item rearrangement problem; we can’t exactly place our keys in all three places at once and call it a day, and we definitely need to place each of the three items somewhere. Let’s add a few constraints to reflect these requirements:

The initial solution I wrote down (pants: keys, coat: phone, backpack: wallet) does exactly cover our matrix, but we did this without even thinking, so how can we tell a computer to do it?

Algorithm X

There are a few ways to go about solving this exact cover problem; the simplest (naive) way would be to just randomly pick a set of 3 rows over and over again until a solution is found. We can be a little smarter than that though.

The famous Donald Knuth is credited with formulating an algorithm to solve this exact cover problem, an algorithm he dubbed “Algorithm X”. Again, while the Wikipedia page is incredibly helpful, I’ll offer my own interpretation using the above example matrix.

Returning to the items-in-pockets problem, when we selected keys to go in the pants pocket, we also disregarded the wallet or phone being in that pocket because of the rule that no two items could be in the same area. That is, the rows compete against each other to be part of the solution, and when one row is chosen, its competition is removed. We also disregarded the keys being in any of the other pockets, so other rows where the key gets placed get removed too.

Algorithm X functions on this exact premise: select a constraint to fulfill, try to add an entry into the solution, remove competing entries, then repeat on the remaining matrix until a solution is found or a solution is impossible to find.

Let’s run Algorithm X on our simple problem:

Step 0: The matrix is not empty, so continue

Step 1: Choose a constraint to fulfill (arbitrarily choose Pants Pocket)

Step 2: Choose a row that fulfills that constraint (Keys in pants) and add to solution

Step 3: For each constraint that row satisfies, remove any other rows that also satisfy that constraint (Wallet in Pants, Phone in Pants, Keys in Coat, Keys in Backpack), and remove the fulfilled constraints’ columns (Pants Pocket and Keys Placed)

Step 4: The matrix is not empty, so continue

Step 5: Choose a remaining constraint to fulfill (Coat Pocket)

Step 6: Choose a row that fulfills that constraint (Wallet in coat) and add to solution

Step 7: For each constraint that row satisfies, remove any other rows that also satisfy that constraint (Wallet in Backpack, Phone in coat), and remove the fulfilled constraints’ columns (Coat Pocket and Wallet Placed)


Step 8: The matrix is not empty, so continue

Step 9: Choose a remaining constraint to fulfill (Backpack)

Step 10: Choose a row that fulfills that constraint (Phone in backpack) and add to solution

Step 11: For each constraint that row satisfies, remove any other rows that also satisfy that constraint (None), and remove the fulfilled constraints’ columns (Backpack and Phone Placed)


Step 12: The matrix is empty! We’ve found a solution!

Now that you have a basic understanding of Algorithm X, think about the same problem except you are given that your phone is already in your pants pocket. You perform the same competition-removal, and now your matrix is smaller and easier to manage. This is what Sudoku is like.

Making Sudoku an Exact Cover Problem

Sudoku, like organizing your pockets, can be represented as an exact cover problem. Our matrix will just be a lot bigger.

Sudoku has a few explicit rules: Values 1-9 must be in every Row, Column, and Box; and the numbers may not repeat in any Row, Column, or Box. An implicit rule is that every cell must contain a value.

Using the almighty power of algebra, we can discern that there are 324 constraints. Every row must have 9 values in it, and there are 9 rows (9 * 9 = 81), every column must have 9 values in it, and there are 9 columns (9 *9 = 81), every box must have 9 values in it, and there are 9 boxes (9 * 9=81), and every cell must have a value in it, and there are 81 cells.  81 + 81 + 81 + 81 = 324.

Again, we can use algebra to discover that there will be 729 rows in our matrix, because there are 729 row, column, value triplets (boxes are ignored because box value is a function of row and column values). Row, column, and value triplets represent a single entry into the empty Sudoku puzzle: (row 1, column 1, value 1), ( row 1, column 1, value 2), … , (row 9, column 9, value 9). There are 9 rows, 9 columns, and 9 values in a Sudoku puzzle: 9 * 9 *9 = 729 rows in our exact cover matrix.

Each of these rows also satisfies four constraints: a row-value pair, a column-value pair, a row-column pair, and a box-value pair. That is, (row 1, column 1, value 1) satisfies the constraint that row 1 must have the value of 1 in it somewhere, column 1 must have the value of 1 in it somewhere, row 1 column 1 must have a value in it, and box 1 must have the value of 1 in it somewhere.

A sample Sudoku exact cover matrix can be seen here. Although, not an entirely accurate matrix, it should give you an example of the representation. My own complete matrix is included near the end of this article with the source code of my program.

Now that we’ve turned Sudoku into an exact cover problem, we’re ready to run our Algorithm X on it.

Dancing Links (DLX)

The Sudoku’s exact cover matrix is a little larger than our clothes-items one. Also, it has way more 0’s than 1’s in it. This type of matrix is called a sparse matrix, and if we simply stored the whole thing in a 324 x 729 array, we’d spend an inordinate amount of time searching for 1’s.

So our old friend Donald Knuth created Dancing Links, or DLX. The gist of DLX is that we represent our matrix as a quadruply-linked circular list that only stores the 1’s from our matrix.

Each node in our list has links to the nodes to its right, left, top, and bottom areas in the matrix. It also points to a special column header node.

The graphic below (found at here) depicts what our data structure looks like, though it doesn’t show that there is a link between every node and its column header.

This circular linked list not only removes the task of searching through 0s to find 1s, but is also very efficient for performing removal and re-instatement of nodes because of the simple logic that removing a node from a row simply requires:

Node->left->right = Node->right;

Node->right->left = Node->left;

And reinstating the node requires

Node->left->right = Node;

Node->right->left = Node;

Using DLX with Algorithm X is a very fast way to solve any Sudoku puzzle.

More about Cover and Uncover

The above explanation may not be clear enough as to how we remove constraints and solutions in DLX. Let’s begin with a little review.

A Circular Linked List


A circular linked list is a simple way to represent a list of items, and the very end of the list wraps around to the front. Each node in the list is a data structure that knows which node is to its left, and which is to its right:


How do we remove a node from a linked list? Let’s go through an example. Say we want to remove (cover) node ‘B’ from the above list:


  1. B tells its left neighbor (A) to connect to B’s right neighbor (C) instead of connecting to B:CircularNodeRemoval2
  2. B tells its right neighbor (C) to connect to B’s left neighbor (A) instead of connecting to B:CircularNodeRemoval3
  3. Now that B’s left and right neighbors can find each other, we can remove B from the list!CircularNodeRemoval4
  4. But if we want to put B back in later (uncover), then we won’t delete it. We’ll just hold onto it somewhere else:CircularNodeRemoval5
  5. Notice that B still knows its original neighbors. If we wanted to put it back in, then we’d simply reverse steps 1 and 2. This is how a cover and an uncover would work in a circular list.


Let’s turn our circular linked list on its side:


Now it’s a column instead of a row. Why have I colored the top node green? Because in our DLX implementation, we use special column nodes to represent a constraint. Let’s add a little more detail to that figure:


So this is how we represent a constraint in DLX — a boring old linked list BUT with the additional property that every node has a link to the special column header. In the later figures I’ll omit the link to the column header for presentation reasons, but it’s always there.

Now, in our Sudoku puzzle, we have more than one constraint, so let’s add a few other constraint columns to our figure for realism:


Now we have a series of linked lists representing our constraints. Now we need a way for these lists to reference each other. So what do we do? How about another linked list? A linked list of linked lists:


Each column in DLX is a linked list, as is each row.  Here’s what a single node’s data structure looks like:


Nodes in DLX don’t just point left and right, they also point up and down (and to their respective column headers).

When Algorithm X needs to perform covering, it uses the notion of circular linked list covering at its heart. After we choose a row to satisfy a constraint:

  1. For each 1 in the row, 
  2. Cover the column header node containing that 1, AND
  3. Cover any other row with a 1 in the same column

If we consider the fact that each column of the matrix itself is its own list, then covering a row is simply removing a node from each column list:


In the figure above, we’re only removing a row. Algorithm X may remove multiple rows per constraint satisfied (plus the column header of the satisfied constraints!).

The Implementation

My implementation is written in C++ on Visual Studio 2008 running on Windows XP. If you ever use this source code, all I ask is that you credit me for it (you don’t even need a link to this article). I used .doc files because they were the only real file upload option WordPress would give me outside of .docx, and I figure more people will find compatibility with .doc files. This project is now hosted on GitHub ( If you can’t manage to download and view these files then e-mail me at my email address found on my About Me page.

Alternatively, I will keep the old .doc files up for you to download. I really recommend getting the files through Git, as you don’t have to go about opening them in Word or OpenOffice and then copying the code into a .cpp or .h file. Making them super small text, though!

These files are the heart of the program – they implement the dancing links and algorithm x functionality. Puzzles are fed to the program via simple text files at the moment, but feel free to build a GUI around them!
This file controls sizing information about the matrix. My solver works in the general case, not just standard 9×9 Sudoku Puzzles.  Commented out in the file is the ability to expand the solver to 10×10, 12×12, or 16×16, though you may need to allow your program to use more memory!
This file simply instantiates and uses a SudokuMatrix, looping to allow the user to solve multiple puzzles per session.

Exact Cover Matrix


This file is simply the binary output of my 324 x 729 exact cover matrix for a 9×9 Sudoku. There’s no row or column headers, though I very heavily document the assumptions I make about it in SudokuMatrix.h

Closing Thoughts

Algorithm X’s little secret is that it’s actually a brute force solver—it doesn’t always choose the next best candidate to be part of the solution. In that regard, every iteration of the algorithm is a guess! It’s just a very, very fast brute-force solver: I was able to solve expert level 16×16 Sudokus in under a second!

Algorithm X is a great candidate to implement recursively: the same manipulations are performed on an object until one of two base cases are met, and if the failure base case is encountered, the algorithm can easily backtrack by simply returning one level up from the recursive stack.

The hardest thing for me to get my mind around was the cover() and uncover() functions that hide and reinstate values into the matrix. At first this was because I misinterpreted how it should’ve worked and accidentally covered the columns of ALL values that were covered. (For example, when we chose keys in the pants pocket, we removed the phone in pants pocket option. Because the phone in pants pocket also covers Phone Placed? I was initially also covering this constraint, when in fact placing keys in pants pocket does not cover it). Then I didn’t realize that the uncover function must work in the opposite order that the cover function did.

I had a lot of help using this site as a reference aside from the Wikipedia articles.

I’m not too sure how long this program took me to write… maybe 20 hours or so. Most of that was spent debugging the cover() and uncover() functions, though! I’m sure you could do it in half the time or less.

I’m all ears for questions or comments; I’ve done the best I could to thoroughly explain the concepts at work here. The source code is available above this section, in case you skipped here to the end.

22 thoughts on “Solving Sudoku, Revisited

  1. rAz0rX.# says:

    I’ve not read the whole thing yet , but man , THANKS A LOT for explaining it yourself and not pointing to Wikipedia , Gosh i went mad reading stuff at wikipedia . You made my Day man ! I finally get what the rows and columns and matrices are all about. Thanks a ton!/

  2. Tobex says:

    Brilliant. Simply brilliant. But could you please explain the removing and reinstating of nodes? Have a hard time understanding it!

    1. Hi Tobex, I’ve edited my post to speak more about covering and uncovering. I tried not to get too technical as the article is more focused on a general audience, but I hope it was enough! If you download the sudoku matrix cpp file, then hopefully the exact logic makes more sense.

  3. Jan Svejda says:

    Thank you so much for explaining in a human-understandable way!!! We have a hexadoku assignment so your explanation is truly appreciated.
    Jan S.

  4. Hey, thanks for the article! I’m a bit of a Sudoku hobbyist, and I’ve been looking to improve upon by current Sudoku Solver – a recursive backtracker when I came upon the Exact Cover problem and this page.

    From your article and a few others that you’ve linked to, I think I’ve got a pretty good handle on how and why it works, and I’m sure I could implement it in Python quickly enough. However, what I’m really curious about is why the DLX algorithm has superior performance to a recursive backtracker.

    Correct me if I’m mistaken, but the DLX algorithm here is essentially analogous to a recursive backtracker. Each column constraint being fulfilled is equivalent to a standard recursive backtracker filling a cell with a random guess, the recursive logic is identical, the only difference is the cover/uncover step. In my implementation of a recursive backtracker, each empty cell being filled is examined for its ‘possibles’, i.e. values that don’t appear in the cell’s row/column/box. When that cell is filled and the next empty cell checked, this next cell’s set of ‘possibles’ is calculated (taking into account all newly-filled cells.) This strikes me as being similar enough to what the cover function does in eliminating obsolete rows/columns. So why is it that the DLX (presumably, because I haven’t properly timed it myself) has superior performance?

    Looking forward to hearing your thoughts.

    1. Hi calebchiam, DLX is just a convenient data structure for applying Algorithm X to Sudoku. I imagine that the primary advantage to this approach is that we do not spend a lot of time copying the board representation like you would for each recursive call. Consider it an upfront cost to construct the DLX, afterwards all operations are super fast.

      There also isn’t much branching or number crunching involved either (computing feasible entries).

      I don’t doubt a well implemented recursive backtracking algorithm could compete, but it’s easier to be performant with this approach.

      1. Hmm, I don’t think that’s it, since in my implementation at least, only a list (board) reference value is passed in each recursive call, and really all operations are on the same, single board.

        1. I tried 😉 perhaps you can benchmark both implementations (my code is open source… Though it could probably stand to be updated, I like to think I’ve learned something in five years) and then profile for bottlenecks.

          1. So, I finally got around to implementing it haha. Perhaps my implementation was inefficient (I included row header nodes, on top of column header nodes because it was more intuitive to me), but on my machine, Exact Cover takes about 230ms on average, while my recursive backtracker takes 180ms. I don’t suppose you’d be interested in taking a look at it? (:

  5. I wasn’t expecting a python implementation! I’ll take a closer look this weekend (I hope!) For benchmarking, it might be more accurate to use a variety of problems and time them all (additionally running each one multiple times) this is so you can get multiple data points, and higher confidence in whether there’s a statistically significant result. Think of it this way, with many data points you can get a mean and a standard deviation for each method’s runtime. Then you can see how close the resulting means are. The farther away they are, and the more data points you have to show for it, the more significant the result.

    P.S. I’d imagine we could speed things up a little bit by removing some string comparisons 🙂

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s