Kwyjibo is an experiment in real-time OCR and computer vision.

Here's a short explanation of how it works that might interest anyone working on something similar. It's worth mentioning that since the project was originally built in 2011, neural networks have become much more capable at this kind of problem, but many of the processing steps are still useful.


Real-time OCR
  1. Board Detection
    Find the edges of the board using colour filtering (steps 1 and 2 are sketched after this list)
  2. Board Extraction & Rectification
    Extract the board's pixel region and then rectify using a quadrilateral transformation
  3. Tile Colour Threshold
    Find tiles by filtering out non-tile-coloured pixels (steps 3 to 6 are sketched after this list)
  4. Tile Extraction
    Extract tile pixel regions with blob detection
  5. Tile Masking
    Apply the Tile Colour Threshold to each extracted tile region to mask out unwanted pixels
  6. Adaptive Thresholding
    Use an adaptive threshold to find letter blobs
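
To make the board detection and rectification steps concrete, here is a minimal sketch of steps 1 and 2 using Python with OpenCV and NumPy. The libraries, the HSV colour range for the board edge and the output size are my own illustrative assumptions, not details taken from Kwyjibo itself.

```python
import cv2
import numpy as np


def order_corners(pts):
    """Order 4 corner points as top-left, top-right, bottom-right, bottom-left."""
    s = pts.sum(axis=1)               # x + y: smallest at top-left, largest at bottom-right
    d = np.diff(pts, axis=1).ravel()  # y - x: smallest at top-right, largest at bottom-left
    return np.float32([pts[np.argmin(s)], pts[np.argmin(d)],
                       pts[np.argmax(s)], pts[np.argmax(d)]])


def detect_and_rectify_board(frame, out_size=600):
    """Step 1: find the board by colour filtering; step 2: warp it to a square image."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)

    # Colour filter for the board border (placeholder HSV range).
    mask = cv2.inRange(hsv, (90, 60, 60), (130, 255, 255))

    # Take the largest connected region as the board outline.
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        raise ValueError("no board-coloured region found")
    board = max(contours, key=cv2.contourArea)

    # Approximate the outline down to four corner points.
    quad = cv2.approxPolyDP(board, 0.02 * cv2.arcLength(board, True), True)
    if len(quad) != 4:
        raise ValueError("board outline is not a quadrilateral")

    # Quadrilateral transformation: map the four corners onto a square target.
    src = order_corners(quad.reshape(4, 2).astype(np.float32))
    dst = np.float32([[0, 0], [out_size, 0], [out_size, out_size], [0, out_size]])
    return cv2.warpPerspective(frame, cv2.getPerspectiveTransform(src, dst),
                               (out_size, out_size))
```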
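
Steps 3 to 6 operate on the rectified board image. Again this is only a sketch in the same Python/OpenCV style; the tile colour bounds, minimum blob area and adaptive-threshold parameters are placeholder values chosen for illustration.

```python
import cv2
import numpy as np


def find_letter_blobs(board_img, min_tile_area=400):
    """Steps 3-6: threshold tile colours, extract tile blobs, mask them,
    and adaptively threshold each tile to expose the letter as a blob."""
    hsv = cv2.cvtColor(board_img, cv2.COLOR_BGR2HSV)
    gray = cv2.cvtColor(board_img, cv2.COLOR_BGR2GRAY)

    # Step 3: tile colour threshold keeps only tile-coloured pixels.
    tile_mask = cv2.inRange(hsv, (15, 30, 120), (40, 160, 255))

    # Step 4: blob detection via connected components.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(tile_mask)

    letters = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_tile_area:
            continue
        tile = gray[y:y + h, x:x + w]

        # Step 5: mask out pixels in the tile region that were not tile-coloured.
        blob = (labels[y:y + h, x:x + w] == i).astype(np.uint8)
        tile = cv2.bitwise_and(tile, tile, mask=blob)

        # Step 6: adaptive threshold turns the letter ink into a white blob.
        binary = cv2.adaptiveThreshold(tile, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY_INV, 15, 10)
        letters.append(((x, y, w, h), binary))
    return letters
```
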
Real-time OCR (continued)
  1. Inner Border
    Draw a thick inner border around the tile boundary to connect the unwanted surrounding pixels (steps 1 to 3 are sketched after this list)
  2. Flood-fill Inner Border
    Flood-fill from the border to remove the now-connected unwanted pixels
  3. Small Blob Removal
    Use blob detection to find and remove small blobs below a threshold
  4. Extract & Rectify Letters
    Use blob detection to extract the tile letters, then resize and dilate them to a standard resolution (sketched after this list)
  5. Classify Letters
    Use classification algorithms to determine the letters that have been placed
  6. Scrabble Game Logic
    Use the detected letters to find words and score the player accordingly
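
The inner border, flood fill and small blob removal steps (1 to 3 above) might look something like this; the border thickness and minimum blob area are illustrative values, not Kwyjibo's actual parameters.

```python
import cv2
import numpy as np


def clean_letter_blob(binary_tile, border=3, min_blob_area=30):
    """Steps 1-3: draw an inner border, flood-fill it away, drop tiny blobs.
    Input is a binary (0/255) tile image with the letter and edge noise in white."""
    img = binary_tile.copy()
    h, w = img.shape

    # Step 1: a thick white frame joins every pixel touching the tile edge
    # into one connected region.
    cv2.rectangle(img, (0, 0), (w - 1, h - 1), 255, thickness=border)

    # Step 2: flood-filling from a corner turns that whole connected edge
    # region black, leaving only blobs in the tile interior.
    mask = np.zeros((h + 2, w + 2), np.uint8)
    cv2.floodFill(img, mask, (0, 0), 0)

    # Step 3: remove any remaining blobs below the size threshold.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(img)
    for i in range(1, n):
        if stats[i, cv2.CC_STAT_AREA] < min_blob_area:
            img[labels == i] = 0
    return img
```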
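
Step 4, extracting and rectifying each letter, can then be sketched as below. The 32x32 target resolution matches the 1024-feature figure quoted later; the dilation kernel size is an assumption.

```python
import cv2
import numpy as np


def extract_letter(cleaned_tile, size=32):
    """Step 4: crop the letter blob(s), resize to a standard resolution, dilate."""
    contours, _ = cv2.findContours(cleaned_tile, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None  # blank tile, or nothing recognisable survived cleaning

    # Bounding box around all remaining blobs (letters can have disjoint parts).
    x, y, w, h = cv2.boundingRect(np.vstack(contours))
    glyph = cleaned_tile[y:y + h, x:x + w]

    # Rectify to the classifier's standard input resolution.
    glyph = cv2.resize(glyph, (size, size), interpolation=cv2.INTER_NEAREST)

    # Dilate slightly to thicken thin strokes and smooth resampling artefacts.
    return cv2.dilate(glyph, np.ones((3, 3), np.uint8), iterations=1)
```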

Tile classification was solved with a k-nearest-neighbour classifier, which worked well, though neural networks would be another viable approach. The classifier is given the letters extracted by the process above, each reduced to a feature vector using a grid-based merge approach.

As shown below, grid merge takes a 2D array of features and merges them into a much more compact 1D array. The features are just binary pixels from the threshold step. The 2D feature space is divided into an n-by-n array of buckets, each holding the count of set pixels that fall inside it. This bucket array is then flattened to 1D for input to the classifier.
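
A rough sketch of grid merge in Python/NumPy follows. The bucket count of 4 (giving 4 x 4 = 16 features from a 32 x 32 glyph) is taken from the numbers quoted below; the function name is my own.

```python
import numpy as np


def grid_merge(binary_glyph, buckets=4):
    """Reduce a 2D binary glyph to a flat vector of per-bucket pixel counts."""
    h, w = binary_glyph.shape
    counts = np.zeros((buckets, buckets), dtype=np.int32)
    for by in range(buckets):
        for bx in range(buckets):
            cell = binary_glyph[by * h // buckets:(by + 1) * h // buckets,
                                bx * w // buckets:(bx + 1) * w // buckets]
            counts[by, bx] = np.count_nonzero(cell)  # set pixels in this bucket
    return counts.ravel()  # flatten the 2D bucket grid to a 1D feature vector
```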

This reduction runs in O(n) in the number of pixels, so it works well in real time, and it also improves the robustness of classification by increasing tolerance for slight rotation, skew and other distortions, since small transforms barely change the per-bucket pixel counts. In Kwyjibo, grid merge reduces 1024 binary features (a 32px * 32px glyph) down to 16 integer features, a ~98% reduction, which also makes the classifier much faster.
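
Classification over those 16-feature vectors can then be as simple as a brute-force k-nearest-neighbour vote. This sketch shows the idea rather than Kwyjibo's actual implementation; the training arrays are assumed to hold grid-merged examples labelled with their letters.

```python
import numpy as np


def knn_classify(feature_vec, train_features, train_labels, k=3):
    """Label a grid-merged feature vector by majority vote of its k nearest
    training examples (Euclidean distance in the 16-dimensional feature space)."""
    dists = np.linalg.norm(np.asarray(train_features, dtype=float) - feature_vec, axis=1)
    nearest = np.asarray(train_labels)[np.argsort(dists)[:k]]
    labels, votes = np.unique(nearest, return_counts=True)
    return labels[np.argmax(votes)]
```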

Camera positioning is important: the best position is directly above the board, framing it exactly in shot to get the maximum resolution available. This is tricky without a special camera rig and can cause problems with bright reflections, since the board is laminated. Mounting the camera in any other position reduces reflections but increases skew. To handle any camera position, I instead implemented board detection and rectification.