Kwyjibo is an experiment in real-time OCR and computer vision; below is a little on how it works (2011).
- Board Detection: Find the edges of the board using colour filtering
- Board Extraction & Rectification: Extract the board image along the detected edges, then apply an inverse perspective transform
- Tile Colour Threshold: Find tiles by filtering out non-tile-colour pixels
- Tile Extraction: Extract tile pixel regions with blob detection
- Tile Masking: Use the extracted tile regions to mask out unwanted pixels
- Adaptive Thresholding: Use an adaptive threshold to find letter blobs (this and the next three clean-up steps are sketched in code after this list)
- Inner Border: Draw a thick inner border around the boundary to connect unwanted edge pixels
- Flood-fill Inner Border: Flood fill removes the unwanted pixels
- Small Blob Removal: Use blob detection to find and remove small blobs below a size threshold
- Extract & Rectify Letters: Use blob detection to extract tile letters, then resize and dilate the pixels to a standard resolution
- Classify Letters: Use classification algorithms to determine the letters that have been placed
- Game Logic: Use the detected letters to find words and score the play
- UI / UX / AR: Visually overlay the words placed by each player and how they were scored
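For a rough idea of how the letter clean-up steps fit together (adaptive threshold, inner border, flood fill, small blob removal), here is a minimal sketch assuming OpenCV and NumPy. The original project predates this code, so the function names, parameters and thresholds are illustrative only, not the actual implementation.

```python
import cv2
import numpy as np

def isolate_letter(tile_gray, border=3, min_blob_area=20):
    """Sketch of the tile clean-up steps on a single greyscale tile image."""
    # Adaptive threshold: assume dark letters on light tiles, so invert to
    # make letter pixels white (255) on a black background.
    binary = cv2.adaptiveThreshold(
        tile_gray, 255,
        cv2.ADAPTIVE_THRESH_MEAN_C, cv2.THRESH_BINARY_INV,
        blockSize=15, C=10)

    # Thick inner border, so any noise touching the tile edge becomes one
    # connected region with the border itself.
    cv2.rectangle(binary, (0, 0),
                  (binary.shape[1] - 1, binary.shape[0] - 1),
                  255, thickness=border)

    # Flood fill from a corner removes the border and everything joined to it.
    mask = np.zeros((binary.shape[0] + 2, binary.shape[1] + 2), np.uint8)
    cv2.floodFill(binary, mask, seedPoint=(0, 0), newVal=0)

    # Small blob removal: keep only connected components above a size threshold.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(binary)
    cleaned = np.zeros_like(binary)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] >= min_blob_area:
            cleaned[labels == i] = 255
    return cleaned
```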
As the project was developed around 2010, tile classification was handled with a k-nearest-neighbour classifier; today the task would more likely be handled with a modern neural network.
After the pre-processing shown above, each letter image is reduced to a much more compact feature vector using a custom feature-merging approach before being passed to the classifier.
This grid-based merge takes a 2D array of features (black and white pixels) and merges them into a much more compact 1D array. The 2D space is divided into an n by n grid of buckets, each bucket holding only the count of set pixels that fall within it. This grid of counts is then flattened to 1D for input into the classifier.
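As an illustration of the merge itself, a NumPy sketch might look like the following; the function name, parameter names and the assumption that the image divides evenly into buckets are mine, not the project's.

```python
import numpy as np

def grid_merge(letter, n=4):
    """Merge a 2D binary letter image into a 1D vector of n*n pixel counts."""
    h, w = letter.shape
    bh, bw = h // n, w // n  # bucket size (assumes h and w divide evenly by n)
    counts = np.zeros((n, n), dtype=int)
    for row in range(n):
        for col in range(n):
            bucket = letter[row * bh:(row + 1) * bh, col * bw:(col + 1) * bw]
            counts[row, col] = np.count_nonzero(bucket)
    return counts.flatten()  # e.g. a 32x32 input with n=4 gives 16 features
```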
This approach gave better performance (enough for real-time use in this case), and by acting as a feature hash it also improved accuracy. Classification becomes more robust because sensitivity to slight rotation, skew and other distortions is reduced: the pixel counts in each bucket barely change under such distortions.
In Kwyjibo the grid-merge step reduces 1024 binary features (a 32 px × 32 px image) down to 16 integer features, a ~98% reduction, with better classification accuracy as well.
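For the classifier side, a k-nearest-neighbour setup over these merged vectors could be put together with something like scikit-learn. The original project used its own implementation, so treat this purely as a sketch with stand-in data.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Stand-in data: in the real pipeline each row of X_train would be the
# 16-element grid-merged vector of a labelled letter image.
X_train = rng.integers(0, 64, size=(260, 16))
y_train = np.repeat([chr(c) for c in range(ord("A"), ord("Z") + 1)], 10)

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Classify a newly extracted (and grid-merged) tile vector.
sample = rng.integers(0, 64, size=(1, 16))
print(knn.predict(sample))
```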
Since we want to use as much of the camera's resolution as possible, the best position for the camera is directly above the board. This isn't an easy position to use in practice, and it often picks up reflections under bright lights.
To improve the user experience and allow much more freedom in camera placement, the edges of the board are detected and the image is then perspective-transformed back to a square.
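A minimal version of that rectification step, assuming OpenCV and that the four board corners have already been found (e.g. via the colour filtering above), might look like this; the corner ordering and output size are illustrative.

```python
import cv2
import numpy as np

def rectify_board(frame, corners, size=600):
    """Warp the quadrilateral given by the four board corners to a square.

    `corners` is [top-left, top-right, bottom-right, bottom-left],
    e.g. taken from the board-detection step.
    """
    src = np.array(corners, dtype=np.float32)
    dst = np.array([[0, 0], [size - 1, 0],
                    [size - 1, size - 1], [0, size - 1]], dtype=np.float32)
    matrix = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, matrix, (size, size))
```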
This whole process is applied to each camera frame: the words played are overlaid on the board visually in real time, and the game rules, including scoring, are applied during play.
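Structurally, the per-frame loop is along these lines; the processing body here is only a placeholder for the full pipeline described above, and the capture/display calls assume OpenCV rather than the original toolkit.

```python
import cv2

def process_frame(frame):
    # Placeholder for the full pipeline above: detect and rectify the board,
    # extract and classify the tiles, then draw the words and scores.
    annotated = frame.copy()
    cv2.putText(annotated, "placeholder overlay", (20, 40),
                cv2.FONT_HERSHEY_SIMPLEX, 1.0, (0, 255, 0), 2)
    return annotated

cap = cv2.VideoCapture(0)
while True:
    ok, frame = cap.read()
    if not ok:
        break
    cv2.imshow("Kwyjibo", process_frame(frame))
    if cv2.waitKey(1) & 0xFF == ord("q"):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()
```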
Overall, computer vision, machine learning and computing power have all moved on considerably since this project was built (around 2010). If you're working on something similar, it's worth looking at neural network libraries for this kind of task, but the pre-processing and feature-hashing techniques shown here might still be worthwhile.