Kwyjibo is an experiment in real-time OCR, computer vision and machine learning using the game of Scrabble.
This was a project built for my university dissertation way back in 2011, so the techniques used here may seem a little dated compared to modern approaches such as neural networks. Hopefully some parts may still be useful for those working on something similar.
- Board Detection
Find edges of board using colour filtering
- Board Extraction & Rectification
Extract the board's pixel region and then rectify using a quadrilateral transformation
- Tile Colour Threshold
Find tiles by filtering out non-tile colour pixels
- Tile Extraction
Extract tile pixel regions with blob detection
- Tile Masking
Apply the Tile Colour Threshold to the Tile Extraction to mask out unwanted pixels
- Adaptive Thresholding
Use an adaptive threshold to find letter blobs
- Inner Border
Draw a thick inner border around the boundary to connect the unwanted surrounding pixels
- Flood-fill Inner Border
Flood fill will remove the unwanted pixels
- Small Blob Removal
Use blob detection to find and remove small blobs below a threshold
- Extract & Rectify Letters
Use blob detection to extract tile letters, then resize and dilate them to a standard resolution
- Classify Letters
Use classification algorithms to determine the letters that have been placed
- Scrabble Game Logic
Use the detected letters to find words and score the player accordingly
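To illustrate the inner-border and flood-fill cleanup steps above, here is a minimal sketch in plain Python (the function name and the toy binary image are my own; a real implementation would operate on the thresholded tile image):

```python
from collections import deque

def remove_border_connected(image, border=1):
    """Remove blobs touching the image edge: paint a thick inner
    border of foreground pixels (1s) so edge-touching noise merges
    with it, then flood-fill from a corner to erase it all."""
    h, w = len(image), len(image[0])
    # Draw the inner border: set the outermost `border` pixels to 1.
    for y in range(h):
        for x in range(w):
            if y < border or y >= h - border or x < border or x >= w - border:
                image[y][x] = 1
    # Flood fill from the top-left corner, erasing connected pixels.
    queue = deque([(0, 0)])
    while queue:
        y, x = queue.popleft()
        if 0 <= y < h and 0 <= x < w and image[y][x] == 1:
            image[y][x] = 0
            queue.extend([(y + 1, x), (y - 1, x), (y, x + 1), (y, x - 1)])
    return image

# The letter blob in the centre survives; edge noise is removed.
img = [
    [0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [0, 0, 1, 1, 0, 0],
    [0, 0, 1, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 0, 0, 0, 0, 1],
]
cleaned = remove_border_connected(img)
```

Because the painted border is itself foreground, anything touching the tile edge becomes part of one connected region and disappears in a single flood fill.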
Reading Scrabble characters
Tile classification was solved with a classic k-nearest-neighbour classifier, though these days it might be better to use a modern neural network instead. The classifier is given the letters extracted by the process above, each reduced to a feature vector. The feature reduction method I found to work best was a sort of grid-based merge.
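A minimal k-nearest-neighbour sketch in plain Python, classifying reduced feature vectors by majority vote (the training vectors and labels here are made up for illustration):

```python
from collections import Counter

def knn_classify(sample, training, k=3):
    """Classify a feature vector by majority vote among the k
    nearest training samples (squared Euclidean distance)."""
    def dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    nearest = sorted(training, key=lambda t: dist(sample, t[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Hypothetical training set of reduced feature vectors per letter.
training = [
    ([9, 0, 0, 8], "A"), ([8, 1, 0, 9], "A"),
    ([0, 9, 8, 0], "E"), ([1, 8, 9, 1], "E"),
]
letter = knn_classify([8, 0, 1, 9], training)  # → "A"
```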
As shown below, grid merge takes a 2D array of features and merges them into a much more compact 1D array. The features are just binary pixels from the threshold step. The 2D feature space is divided into an array of n by n buckets, containing the count of pixels located in each bucket. This 2D array is then flattened to 1D for input into the classifier.
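A minimal sketch of the grid merge reduction in plain Python (function name is my own; no library assumptions):

```python
def grid_merge(pixels, n=4):
    """Reduce a 2D binary image to a flat vector of n*n bucket
    counts, where each bucket counts the foreground pixels that
    fall inside its cell of the grid."""
    h, w = len(pixels), len(pixels[0])
    counts = [0] * (n * n)
    for y in range(h):
        for x in range(w):
            if pixels[y][x]:
                # Map pixel coordinates to bucket coordinates.
                by = y * n // h
                bx = x * n // w
                counts[by * n + bx] += 1
    return counts

# A 32x32 binary image reduces to 16 integer features, as in Kwyjibo.
image = [[1 if x == y else 0 for x in range(32)] for y in range(32)]
features = grid_merge(image, n=4)
```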
This reduction is linear in the number of pixels, so it works in real-time. It even improves the robustness of classification by increasing tolerance for slight rotation, skew and other unwanted effects, since small 2D transforms won't alter the bucket pixel counts much. In Kwyjibo, grid merge reduces 1024 binary features (e.g. 32px * 32px) down to 16 integer features (a ~98% reduction), so the classifier performs far better.
The positioning of the camera is really important. The best position is directly above the board, framing it exactly in shot to gain the maximum resolution available. This is quite impractical without a special camera rig, and can cause issues with bright light reflections, since the board I was using was so shiny. Mounting the camera at any other position reduces reflections but introduces skew. Instead, to handle any camera position, I implemented board detection and rectification.
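A sketch of the quadrilateral rectification, solving directly for the 3x3 homography that maps the board's four detected corners onto a square (the corner coordinates below are made up; in practice a library routine such as OpenCV's `getPerspectiveTransform` does the same thing):

```python
import numpy as np

def homography(src, dst):
    """Solve for the 3x3 projective transform mapping the four
    src points onto the four dst points (h33 fixed at 1)."""
    A, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        # Each point pair contributes two linear equations.
        A.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        A.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b += [u, v]
    h = np.linalg.solve(np.array(A, float), np.array(b, float))
    return np.append(h, 1.0).reshape(3, 3)

def warp_point(H, pt):
    """Apply the homography to a single (x, y) point."""
    x, y, w = H @ np.array([pt[0], pt[1], 1.0])
    return x / w, y / w

# Hypothetical skewed board corners mapped to a 480x480 square.
corners = [(102, 84), (515, 70), (560, 470), (80, 450)]
square = [(0, 0), (480, 0), (480, 480), (0, 480)]
H = homography(corners, square)
```

Once `H` is known, sampling the source image at the warped position of every output pixel produces the rectified, square board image regardless of where the camera sits.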