Kwyjibo is an experiment in real-time OCR and computer vision (2011). Below is a brief overview of how it works:
- Board Detection
Find edges of board using colour filtering
- Board Extraction & Rectification
Extract the board region of the image and apply an inverse perspective transform to rectify it
- Tile Colour Threshold
Find tiles by filtering out non-tile colour pixels
- Tile Extraction
Extract tile pixel regions with blob detection
- Tile Masking
Use the extracted tile regions to mask out unwanted pixels
- Adaptive Thresholding
Use an adaptive threshold to find letter blobs
- Inner Border
Draw a thick inner border around the boundary to connect unwanted edge pixels
- Flood-fill Inner Border
Flood-fill the border to remove it along with the connected unwanted pixels
- Small Blob Removal
Use blob detection to find and remove small blobs below a threshold
- Extract & Rectify Letters
Use blob detection to extract tile letters, then resize and dilate them to a standard resolution
- Classify Letters
Use classification algorithms to determine the letters that have been placed
- Game Logic
Use the detected letters to find words and score the play
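The inner-border flood-fill trick above is a neat way to drop noise that touches the image edge: any stray pixels near the boundary get connected to the border, then one flood fill removes the lot. A minimal pure-Python sketch of the idea (the original presumably used an image-processing library; the function name and `border` parameter here are illustrative):

```python
from collections import deque

def remove_edge_noise(mask, border=1):
    """Remove blobs touching the image edge: draw a filled inner
    border so edge noise connects to it, then flood-fill the border
    away. `mask` is a list of rows of 0/1; returns a cleaned copy."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    # Draw a thick inner border of foreground pixels.
    for y in range(h):
        for x in range(w):
            if y < border or y >= h - border or x < border or x >= w - border:
                out[y][x] = 1
    # Flood-fill from a corner: erases the border and anything attached to it.
    q = deque([(0, 0)])
    while q:
        y, x = q.popleft()
        if 0 <= y < h and 0 <= x < w and out[y][x] == 1:
            out[y][x] = 0
            q.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return out
```

Blobs in the interior are untouched, since they never connect to the border ring.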
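The small-blob-removal step can likewise be sketched as a connected-component pass that erases any component below an area threshold (an illustrative pure-Python version; the function name and `min_area` parameter are my own):

```python
from collections import deque

def remove_small_blobs(mask, min_area=3):
    """Label 4-connected components in a 0/1 grid and erase any
    whose pixel count is below `min_area`."""
    h, w = len(mask), len(mask[0])
    out = [row[:] for row in mask]
    seen = [[False] * w for _ in range(h)]
    for sy in range(h):
        for sx in range(w):
            if out[sy][sx] == 1 and not seen[sy][sx]:
                # Collect one component with breadth-first search.
                comp, q = [], deque([(sy, sx)])
                seen[sy][sx] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for ny, nx in ((y+1, x), (y-1, x), (y, x+1), (y, x-1)):
                        if (0 <= ny < h and 0 <= nx < w
                                and out[ny][nx] == 1 and not seen[ny][nx]):
                            seen[ny][nx] = True
                            q.append((ny, nx))
                # Erase components smaller than the threshold.
                if len(comp) < min_area:
                    for y, x in comp:
                        out[y][x] = 0
    return out
```

The same labelling pass also yields each blob's bounding box, which is what the letter-extraction step needs.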
After extraction, each letter image is reduced to a much smaller feature vector using a custom merging approach before being handed to the classifier.
This grid merge takes a 2D array of features (black and white pixels) and merges them into a much more compact 1D array. The 2D space is divided into an array of n by n buckets, containing only the count of pixels located in each bucket. This 2D array is then flattened to 1D for input into the classifier.
Fortunately this runs in linear time, so it works very nicely for real-time use. It also improves the robustness of classification by tolerating slight rotation, skew and other distortions, since these won't alter the per-bucket pixel counts much. In Kwyjibo the grid merge step reduces 1024 binary features (a 32px × 32px image) down to 16 integer features, a ~98% reduction. This is probably no longer needed on modern hardware, but it's potentially a useful idea for larger-scale problems.
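A grid merge along these lines can be written in a few lines of NumPy. This is a sketch under the assumption that the input is a square binary image whose side is a multiple of the grid size; the function name is mine:

```python
import numpy as np

def grid_merge(img, n=4):
    """Reduce an (s, s) binary image to n*n bucket pixel counts.
    Each bucket covers an (s//n, s//n) block; the result is the
    flattened count of set pixels per block."""
    s = img.shape[0]
    b = s // n  # side length of one bucket
    # Reshape into (n, b, n, b) blocks and sum within each block.
    counts = img.reshape(n, b, n, b).sum(axis=(1, 3))
    return counts.ravel()  # 1D feature vector of length n*n
```

With a 32×32 input and n=4, this yields the 16 integer features described above; each bucket simply counts the set pixels in its 8×8 block.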
Finally, since we want to use as much resolution as possible, the best position for the camera is directly above the board. This isn't easy to achieve in practice, and it often ends up capturing reflections under bright lights. To give the camera much more freedom, the edges of the board are detected, then the image is perspective-transformed back to a square before any OCR is applied, as shown in the steps above.
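The rectification step amounts to estimating the 3×3 projective transform (homography) that maps the four detected board corners onto the corners of a square. A sketch of the estimation using plain NumPy (in practice you'd use a library routine such as OpenCV's `getPerspectiveTransform`/`warpPerspective`; the function names below are mine):

```python
import numpy as np

def solve_homography(src, dst):
    """Solve the 3x3 projective transform H mapping each src corner
    to its dst corner, with h33 fixed to 1. src/dst are four (x, y)
    pairs each, no three of them collinear."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # From u = (h11*x + h12*y + h13) / (h31*x + h32*y + 1), and
        # similarly for v: two linear equations per point pair.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def apply_homography(H, pt):
    """Map one (x, y) point through H in homogeneous coordinates."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w
```

Once H is known, each pixel of the output square is sampled from the corresponding source position, undoing the camera's perspective before tile detection runs.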
Both computer vision and computing power have moved on considerably since this project was built in 2011. If you're working on something similar, it's definitely worth looking into neural network libraries for this kind of task, but the pre-processing techniques shown here might still be worthwhile.