Real-Time Sudoku Solving with a Webcam
A deep dive into computer vision algorithms for recognizing and solving Sudoku puzzles in real time using a webcam, covering adaptive thresholding, Hough transform, OCR, and multiple solving strategies.
Preamble
This application may not have had practical value, but it provided a tremendous amount of experience. Computer vision is one of the most fascinating areas of modern computing. The complexity is enormous, but the possibilities are endless. The only limitation is your imagination, time, and understanding of the underlying mathematics.
I wrote this program in C++ because I wanted to understand the internal workings. There are many available libraries out there (I recommend OpenCV for learning). Webcam image acquisition is based on Vadim Gorbatenko's AviCap source code from CodeProject.
Here are the timing benchmarks for processing a 640x480 image on a 2.8 GHz PC:
- Webcam image capture: ~100 ms (10 fps)
- Monochrome conversion: ~12 ms
- OCR correction + Sudoku solving: ~8 ms combined
How Monochrome Conversion Works
Threshold Selection
Every computer vision application begins by converting a color (or grayscale) image into a monochrome (black and white) image. To do this, the color image is first converted to grayscale (if it isn't already). Then, a threshold is selected. Each pixel is compared against the threshold: if the pixel value is greater, it becomes white; if less, it becomes black.
For example, pixel (200, 200, 200) compared against a global threshold of 128 (256/2) becomes white.

However, a fixed global threshold fails for images with uneven lighting. The solution is adaptive local thresholding. The algorithm examines an 11x11 pixel square centered on each pixel and calculates the average intensity: threshold = sum / 121 (where 121 = 11 x 11). If the pixel's intensity is greater than this local average, it becomes white; otherwise, it becomes black.
This approach correctly handles shadows and gradients, but comes at a significant computational cost.
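The adaptive scheme above can be sketched as follows. This is a minimal, unoptimized version (the integral-image speedup comes next); the function and variable names are illustrative, not taken from the article's source code.

```cpp
#include <cstdint>
#include <vector>

// Adaptive local thresholding: each pixel is compared against the average
// intensity of the 11x11 window centered on it. Input is a grayscale image
// stored row-major; output is 0 (black) or 255 (white).
std::vector<uint8_t> AdaptiveThreshold(const std::vector<uint8_t>& gray,
                                       int width, int height, int window = 11)
{
    std::vector<uint8_t> mono(gray.size());
    const int half = window / 2;  // 5 for an 11x11 window
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            int sum = 0, count = 0;
            // Average intensity over the window, clipped at the image borders.
            for (int wy = y - half; wy <= y + half; ++wy) {
                for (int wx = x - half; wx <= x + half; ++wx) {
                    if (wx < 0 || wy < 0 || wx >= width || wy >= height)
                        continue;
                    sum += gray[wy * width + wx];
                    ++count;
                }
            }
            const int threshold = sum / count;  // 121 pixels away from borders
            mono[y * width + x] = gray[y * width + x] > threshold ? 255 : 0;
        }
    }
    return mono;
}
```

Note the inner double loop: for every pixel, up to 121 neighbors are read, which is exactly the cost the integral image eliminates.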
The Integral Image
The integral image is an array of integers with the same dimensions as the original image. Each position in the integral image contains the cumulative sum of all pixel intensities from the top-left corner to the current position.

Consider a 12x12 region where each pixel has intensity = 1. The integral image stores running sums from the top-left to the bottom-right.
The brilliant property of the integral image is that you can calculate the sum of any rectangular region using just four values:
sum = D - C - B + A

For example: 110 - 44 - 40 + 16 = 42
Instead of reading all pixels within the rectangle, we only need four memory reads. This dramatically accelerates the adaptive thresholding, even though the monochrome conversion remains the most computationally expensive step.
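A sketch of the integral image, assuming the common trick of padding the table with one extra row and column of zeros so the four-corner formula needs no edge checks (names are illustrative):

```cpp
#include <cstdint>
#include <vector>

// Build a summed-area table of size (width+1) x (height+1). Entry (x, y)
// holds the sum of all pixels above and to the left of it.
std::vector<long long> BuildIntegral(const std::vector<uint8_t>& gray,
                                     int width, int height)
{
    std::vector<long long> integral((width + 1) * (height + 1), 0);
    const int w1 = width + 1;
    for (int y = 1; y <= height; ++y)
        for (int x = 1; x <= width; ++x)
            integral[y * w1 + x] = gray[(y - 1) * width + (x - 1)]
                                 + integral[(y - 1) * w1 + x]           // above
                                 + integral[y * w1 + (x - 1)]           // left
                                 - integral[(y - 1) * w1 + (x - 1)];    // counted twice
    return integral;
}

// Sum over the rectangle [x0, x1) x [y0, y1) with just four reads:
// sum = D - C - B + A.
long long RectSum(const std::vector<long long>& integral, int w1,
                  int x0, int y0, int x1, int y1)
{
    return integral[y1 * w1 + x1]    // D: bottom-right
         - integral[y1 * w1 + x0]    // C: bottom-left
         - integral[y0 * w1 + x1]    // B: top-right
         + integral[y0 * w1 + x0];   // A: top-left
}
```

With this table, the 11x11 local average in the adaptive threshold costs four reads per pixel instead of 121.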
How to Determine the Rotation Angle
A webcam is not a scanner. The image will never be perfectly aligned. Correcting for rotation is essential.
The Hough Transform
The Hough Transform detects straight lines in an image using the formula:
y = (rho - x * cos(theta)) / sin(theta)
Where:
- theta = the angle of the line
- rho = the perpendicular distance from the line to the origin (0, 0)

The algorithm works as follows:
- Iterate through all pixels in the monochrome image, skipping white ones.
- For each black pixel, draw 180 imaginary lines through it (one for each degree from 0 to 179).
- Each line "votes" in a 2D accumulator array (theta x rho).
- The maximum value in the accumulator corresponds to the strongest line in the image.
It's like voting. Real lines receive the most "votes."
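The voting scheme can be sketched like this, using the normal form rho = x*cos(theta) + y*sin(theta). The point type and names are illustrative; maxRho is assumed to bound the image diagonal so every vote fits in the accumulator.

```cpp
#include <cmath>
#include <utility>
#include <vector>

struct Point { int x, y; };

// Returns (theta in degrees, rho) of the accumulator cell with the most votes.
std::pair<int, int> StrongestLine(const std::vector<Point>& blackPixels,
                                  int maxRho)
{
    // Accumulator: 180 angles x (2*maxRho + 1) distances (rho may be negative).
    std::vector<std::vector<int>> acc(180, std::vector<int>(2 * maxRho + 1, 0));
    const double kPi = 3.14159265358979323846;
    for (const Point& p : blackPixels) {
        for (int theta = 0; theta < 180; ++theta) {  // one vote per degree
            double rad = theta * kPi / 180.0;
            int rho = (int)std::lround(p.x * std::cos(rad) + p.y * std::sin(rad));
            ++acc[theta][rho + maxRho];
        }
    }
    // The maximum cell corresponds to the strongest line in the image.
    int bestTheta = 0, bestRho = 0, bestVotes = -1;
    for (int t = 0; t < 180; ++t)
        for (int r = 0; r < 2 * maxRho + 1; ++r)
            if (acc[t][r] > bestVotes) {
                bestVotes = acc[t][r];
                bestTheta = t;
                bestRho = r - maxRho;
            }
    return {bestTheta, bestRho};
}
```

For a Sudoku grid, the dominant theta found this way is the rotation angle to correct.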

Consider a diagonal line of three pixels. Pixel 1 votes for imaginary lines A, B, C, D. Pixel 2 votes for E, B, G, H. Pixel 3 votes for I, B, K, L. Line B wins with 3 votes; all others get just 1.
For a Sudoku grid, all grid lines share approximately the same angle (e.g., ~6 degrees of rotation). By finding the dominant angle in the accumulator, we can correct the image rotation. Note that perspective distortion (lines closer together on one side) can also be detected — lines A-B may be closer together than B-C.

As an extension, the Hough Transform can detect circles too — but this requires a 3D accumulator (x, y, r for center coordinates and radius).
How to Detect the Grid
The computer finds it very difficult to determine which lines belong to the Sudoku grid and which are simply visual noise. Newspapers often print multiple Sudoku grids on a single page, creating many extraneous lines.

The solution: instead of looking for black grid lines, look for the white boundaries surrounding the grid. Analyze horizontal scan lines across the image:
- A line that never crosses a black pixel = grid boundary
- A line that crosses black pixels 10 times = not a boundary
Count the white-to-black pixel transitions beneath each line. For speed optimization, every third line is analyzed rather than every line.
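The transition count for a single scan line can be sketched as follows (assuming a monochrome image where 0 = black and 255 = white; names are illustrative):

```cpp
#include <cstdint>
#include <vector>

// Count white-to-black transitions along one horizontal row of a monochrome
// image. A row with zero transitions is a candidate grid boundary; a row
// crossing many black runs is not.
int CountTransitions(const std::vector<uint8_t>& mono, int width, int row)
{
    int transitions = 0;
    for (int x = 1; x < width; ++x) {
        bool prevWhite = mono[row * width + (x - 1)] == 255;
        bool curBlack  = mono[row * width + x] == 0;
        if (prevWhite && curBlack) ++transitions;
    }
    return transitions;
}
```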
After identifying the boundaries:
- Run the Hough Transform to precisely locate the grid lines.
- Correct skew and other image defects.
- Determine exact grid line positions for digit extraction.
How OCR Works
After determining where the numbers should be located, we need to recognize them.
Theory
The recognition algorithm has three steps:
- Feature definition
- Training
- Runtime classification
Zone-Based Density Method
Digits are reduced to a 5x5 size, normalized, and stored in a static array: OCRDigit::m_zon[10][5][5]

Reducing to 5x5 is called "zoning" or "zone partitioning." The density values are normalized to the range 0-1024.
At runtime:
- Extract a digit from the image.
- Reduce it to 5x5.
- Compare each of the 25 zones against the 9 trained digit patterns.
- Find the minimum difference — that's the match.

Advantages: Size-invariant, since it always works with 5x5 zones.
Limitations: Sensitive to rotation (addressed by the earlier rotation correction), position offset, white-on-black negatives, and inverted digits.
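The minimum-difference classification can be sketched like this. The pattern table here is a stand-in for the real trained `OCRDigit::m_zon` array; densities are assumed normalized to 0-1024 as described above.

```cpp
#include <array>
#include <cstdlib>

// A digit reduced by zoning: 5x5 grid of density values in the range 0-1024.
using Zones = std::array<std::array<int, 5>, 5>;

// Compare the sample against the trained patterns for digits 1-9 and return
// the digit with the minimum total absolute difference.
int ClassifyDigit(const Zones& sample, const std::array<Zones, 10>& patterns)
{
    int bestDigit = -1;
    long bestDiff = -1;
    for (int d = 1; d <= 9; ++d) {  // Sudoku puzzles only use digits 1-9
        long diff = 0;
        for (int y = 0; y < 5; ++y)
            for (int x = 0; x < 5; ++x)
                diff += std::abs(sample[y][x] - patterns[d][y][x]);
        if (bestDiff < 0 || diff < bestDiff) {
            bestDiff = diff;
            bestDigit = d;
        }
    }
    return bestDigit;
}
```

In practice the difference value is worth keeping alongside the digit, since the correction step below compares differences between candidate digits.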
Width-to-Height Ratio
The digit "1" is a special case. Its distinguishing feature is that its width is 40% less than its height. No other digit shares this ratio. This becomes the 26th feature, added to the 25 zone features (5x5 grid).

OCR Result Correction
Sudoku rules are applied after OCR to fix misrecognitions: no duplicate digits are allowed in any row, column, or 3x3 block.
Example: if OCR yields "5" (difference = 5210) but this violates Sudoku rules, it's replaced with "6" (difference = 5508). The difference values are compared; the lower value indicates the more likely correct digit. The correction code is in SudSolver::Fixit().

Future improvement: Add digit profiles as features. Digits 5, 6, and 9 look similar when zoned. Profiles — measuring pixel distances between bounding box edges and digit boundaries — can distinguish them.
How the Sudoku Solver Works
Three complementary methods are used: naked singles, hidden singles, and brute-force backtracking.
Brute-Force Backtracking
This is the method most widely used by programmers, and it guarantees a solution regardless of difficulty level.

The process:
- Create a candidate table (possible values for each empty cell).
- Place values 1-9 sequentially in each cell.
- Continue solving with the placed number.
- If a deadlock occurs (duplicate in row/column/block), backtrack and try the next value.
- Recurse until solved.
This may be slow depending on recursion depth, but it works for any puzzle.
Optimization tip: Start with cells that have fewer candidates — fewer combinations to test.
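The backtracking process above can be sketched as follows. This is a minimal textbook version, not the article's `SudSolver` implementation; it tries values in the first empty cell it finds, without the candidate-count ordering optimization.

```cpp
#include <array>

using Grid = std::array<std::array<int, 9>, 9>;  // 0 = empty cell

// Reject duplicates in the row, column, and 3x3 block.
bool Valid(const Grid& g, int row, int col, int value)
{
    for (int i = 0; i < 9; ++i) {
        if (g[row][i] == value || g[i][col] == value) return false;
        int br = (row / 3) * 3 + i / 3;  // walk the cell's 3x3 block
        int bc = (col / 3) * 3 + i % 3;
        if (g[br][bc] == value) return false;
    }
    return true;
}

bool Solve(Grid& g)
{
    for (int row = 0; row < 9; ++row)
        for (int col = 0; col < 9; ++col)
            if (g[row][col] == 0) {
                for (int v = 1; v <= 9; ++v)
                    if (Valid(g, row, col, v)) {
                        g[row][col] = v;
                        if (Solve(g)) return true;  // solved downstream
                        g[row][col] = 0;            // deadlock: backtrack
                    }
                return false;  // no value fits here: caller must backtrack
            }
    return true;  // no empty cells left: solved
}
```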

Naked Singles
If a cell has only a single candidate, then that digit definitely belongs in that cell. After placing the value, rebuild the candidate list. The process repeats until no single-candidate cells remain.

This is simple and obvious for computers, but less obvious for humans who cannot maintain complete candidate lists in their heads.
Hidden Singles
Consider the digit 7. A human solver would often spot immediately that a 7 must go in a particular cell. Even if that cell holds multiple candidates (4, 7, 8, 9), searching for a digit that is a candidate in only one cell of a 3x3 block, row, or column reveals that the 7 can go nowhere else.

This works well in combination with naked singles. When naked singles are exhausted, hidden singles often emerge.
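Both singles techniques can be sketched together. Candidates are tracked as a 9-bit mask per cell; for brevity, the hidden-single search here covers rows only (columns and 3x3 blocks work the same way). Names are illustrative.

```cpp
#include <array>

using Grid = std::array<std::array<int, 9>, 9>;  // 0 = empty cell

// Bit mask of remaining candidates for a cell: bits 0-8 represent digits 1-9.
int CandidateMask(const Grid& g, int row, int col)
{
    int mask = 0x1FF;
    for (int i = 0; i < 9; ++i) {
        if (g[row][i]) mask &= ~(1 << (g[row][i] - 1));
        if (g[i][col]) mask &= ~(1 << (g[i][col] - 1));
        int br = (row / 3) * 3 + i / 3, bc = (col / 3) * 3 + i % 3;
        if (g[br][bc]) mask &= ~(1 << (g[br][bc] - 1));
    }
    return mask;
}

// Naked singles: a cell with exactly one candidate gets that digit.
int PlaceNakedSingles(Grid& g)
{
    int placed = 0;
    for (int r = 0; r < 9; ++r)
        for (int c = 0; c < 9; ++c)
            if (g[r][c] == 0) {
                int mask = CandidateMask(g, r, c);
                if (mask && (mask & (mask - 1)) == 0) {  // exactly one bit set
                    for (int v = 1; v <= 9; ++v)
                        if (mask == (1 << (v - 1))) g[r][c] = v;
                    ++placed;
                }
            }
    return placed;
}

// Hidden singles (rows only): a digit that is a candidate in exactly one
// empty cell of a row must go there, however many candidates that cell has.
int PlaceHiddenSinglesInRows(Grid& g)
{
    int placed = 0;
    for (int r = 0; r < 9; ++r)
        for (int v = 1; v <= 9; ++v) {
            int spot = -1, count = 0;
            for (int c = 0; c < 9; ++c)
                if (g[r][c] == 0 && (CandidateMask(g, r, c) & (1 << (v - 1)))) {
                    spot = c;
                    ++count;
                }
            if (count == 1) { g[r][spot] = v; ++placed; }
        }
    return placed;
}
```

In practice both passes are repeated until neither places anything, and only then does backtracking take over.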
Combining All Methods Together
Processing and solving speed matters for a real-time application.

The algorithm flow:
- Apply naked singles.
- Apply hidden singles.
- If still unsolved, initiate brute-force backtracking.
Fast methods (naked and hidden singles) solve simple puzzles entirely. Complex puzzles trigger backtracking, already pruned by the fast methods.
Backtracking limits:
- Maximum 600,000 iterations
- Three attempts maximum
- Between attempts, randomly shuffle the recursive sequence (hoping a new order solves faster)
- If still unsolved, hope the next camera frame provides a better result
Implementation: SudSolver::SolveMe()
Solution Buffering
When a solution is found, it is stored in a SudResultVoter array. Buffering is necessary because OCR does not guarantee 100% accuracy, and from time to time we get different solutions for the same puzzle.
Strategy: Display the strongest solution from the last 12 solution attempts. Periodically clear the array, forgetting old solutions, to enable recognition of new puzzles.
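A sketch of solution buffering by voting, in the spirit of the article's SudResultVoter (the class shown here is illustrative, with solutions serialized as 81-character strings):

```cpp
#include <map>
#include <string>

// Tally solutions produced over recent frames and expose the most frequent
// one. OCR noise occasionally yields a wrong solution; a single outlier
// cannot outvote the solution seen on most frames.
class SolutionVoter {
public:
    void Add(const std::string& solution) { ++m_votes[solution]; }

    // The solution with the most votes, or "" if nothing has been stored.
    std::string Strongest() const {
        std::string best;
        int bestVotes = 0;
        for (const auto& entry : m_votes)
            if (entry.second > bestVotes) {
                bestVotes = entry.second;
                best = entry.first;
            }
        return best;
    }

    // Forget old solutions so a newly presented puzzle can be recognized.
    void Clear() { m_votes.clear(); }

private:
    std::map<std::string, int> m_votes;
};
```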
Video Demonstration
Future Improvements
- Rewrite for Android to investigate smartphone compatibility.
- Parallelize functions on multiprocessor systems (most modern PCs have at least 2 cores). While speed matters less than accuracy and quality for a webcam solver, parallel processing of different frame settings could improve results by selecting the best result after completion.
Downloads:
- Demo: 170 KB (CodeProject)
- Source code: 307 KB (CodeProject)
- GitHub mirror: github.com/NeonMercury/Realtime-Webcam-Sudoku-Solver