Kwyjibo is an experiment in real-time OCR and computer vision (CV) from 2011. Below is a little on how it works.


Real time OCR
  1. Board Detection
    Find edges of board using colour filtering
  2. Board Extraction & Rectification
    Extract the board region using the detected edges, then apply an inverse perspective transform to square it up
  3. Tile Colour Threshold
    Find tiles by filtering out non-tile colour pixels
  4. Tile Extraction
    Extract tile pixel regions with blob detection
  5. Tile Masking
    Use the extracted tile regions to mask out unwanted pixels
  6. Adaptive Thresholding
    Use an adaptive threshold to find letter blobs
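
Steps 3 and 4 above (colour threshold, then blob detection) can be sketched roughly as follows. This is a minimal pure-Python illustration under stated assumptions, not Kwyjibo's actual code: the names `is_tile_colour`, `threshold`, `find_blobs` and `bounding_box` are hypothetical, the RGB bounds are made-up placeholder values, and a real-time implementation would more likely lean on a vision library.

```python
from collections import deque

def is_tile_colour(pixel, lo=(170, 140, 90), hi=(255, 230, 190)):
    """Hypothetical colour test: True if an (r, g, b) pixel lies in an assumed tile colour range."""
    return all(l <= c <= h for c, l, h in zip(pixel, lo, hi))

def threshold(image, predicate):
    """Turn a 2D grid of pixels into a binary mask (1 = keep, 0 = discard)."""
    return [[1 if predicate(p) else 0 for p in row] for row in image]

def find_blobs(mask):
    """Group 4-connected 1-pixels into blobs; each blob is a list of (row, col) positions."""
    rows, cols = len(mask), len(mask[0])
    seen = [[False] * cols for _ in range(rows)]
    blobs = []
    for r in range(rows):
        for c in range(cols):
            if not mask[r][c] or seen[r][c]:
                continue
            # Breadth-first search outwards from this unvisited pixel.
            queue, blob = deque([(r, c)]), []
            seen[r][c] = True
            while queue:
                y, x = queue.popleft()
                blob.append((y, x))
                for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                    if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not seen[ny][nx]:
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            blobs.append(blob)
    return blobs

def bounding_box(blob):
    """Return (min_row, min_col, max_row, max_col) for a blob."""
    ys = [y for y, _ in blob]
    xs = [x for _, x in blob]
    return min(ys), min(xs), max(ys), max(xs)
```

Each bounding box gives a candidate tile region, which is the kind of region step 5 would use as a mask.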
Real time OCR
  1. Inner Border
    Draw a thick inner border around the boundary to connect unwanted edge pixels
  2. Flood-fill Inner Border
    Flood-fill from the border to remove it together with the unwanted pixels connected to it
  3. Small Blob Removal
    Use blob detection to find and remove small blobs below a threshold
  4. Extract & Rectify Letters
    Use blob detection to extract tile letters, then resize them to a standard resolution and dilate the pixels
  5. Classify Letters
    Use classification algorithms to determine the letters that have been placed
  6. Game Logic
    Use the detected letters to find words and score the play
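
The first three clean-up steps above (inner border, flood fill, small blob removal) can be sketched as follows. This is a minimal pure-Python illustration rather than the project's own code: the function names, the border thickness and the `min_blob` size are all assumptions, and the image is assumed to be a binary grid where 1 marks a thresholded (ink) pixel.

```python
from collections import deque

def draw_inner_border(img, thickness):
    """Set a frame of `thickness` pixels just inside the image boundary to 1 (in place)."""
    rows, cols = len(img), len(img[0])
    for r in range(rows):
        for c in range(cols):
            if r < thickness or c < thickness or r >= rows - thickness or c >= cols - thickness:
                img[r][c] = 1

def flood_region(img, start):
    """Return the set of pixels 4-connected to `start` that share its value."""
    rows, cols = len(img), len(img[0])
    target = img[start[0]][start[1]]
    seen, queue = {start}, deque([start])
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < rows and 0 <= nx < cols and (ny, nx) not in seen and img[ny][nx] == target:
                seen.add((ny, nx))
                queue.append((ny, nx))
    return seen

def clean_letter_image(img, border=1, min_blob=3):
    """Return a cleaned copy of a binary tile image (assumed parameters, see above)."""
    if border < 1:
        raise ValueError("border must be at least 1 so the flood fill starts on the frame")
    img = [row[:] for row in img]          # work on a copy
    draw_inner_border(img, border)         # step 1: frame joins up edge noise
    for y, x in flood_region(img, (0, 0)): # step 2: erase frame and anything touching it
        img[y][x] = 0
    visited = set()
    for r in range(len(img)):              # step 3: drop blobs smaller than min_blob
        for c in range(len(img[0])):
            if img[r][c] == 1 and (r, c) not in visited:
                blob = flood_region(img, (r, c))
                visited |= blob
                if len(blob) < min_blob:
                    for y, x in blob:
                        img[y][x] = 0
    return img
```

One consequence of this approach is that a letter stroke touching the frame would be erased along with the edge noise, so the border thickness has to stay small relative to the tile margin.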

Since the project was developed around 2010, tile classification was handled by a k-nearest neighbours classifier; these days the task can be handled with modern neural networks.
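
For readers unfamiliar with the technique, a k-nearest neighbours classifier can be sketched in a few lines. The source does not say which distance metric or value of k Kwyjibo used, so the squared Euclidean distance, `k=3` and the majority vote below are assumptions, and `knn_classify` is a hypothetical name.

```python
from collections import Counter

def knn_classify(sample, training, k=3):
    """Label `sample` by majority vote among its k nearest training examples.

    `training` is a list of (feature_vector, label) pairs.
    """
    def sq_dist(a, b):
        # Squared Euclidean distance; the square root is not needed for ranking.
        return sum((x - y) ** 2 for x, y in zip(a, b))

    nearest = sorted(training, key=lambda item: sq_dist(sample, item[0]))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

In use, `training` would hold one or more labelled feature vectors per letter, and each newly extracted tile would be passed in as `sample`.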

Once the letter images have been extracted, each one is reduced to a much smaller feature vector using a custom merging approach, and that vector is what the classifier receives.

This grid merge takes a 2D array of features (black and white pixels) and merges them into a much more compact 1D array. The 2D space is divided into an n-by-n grid of buckets, each holding only the count of pixels that fall inside it. This 2D array of counts is then flattened to 1D for input into the classifier.
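
A minimal sketch of the grid merge described above, under a few assumptions not stated in the source: the image is square with a side length divisible by n, the pixels being counted are the ones with value 1, and the buckets are flattened in row-major order. The name `grid_merge` is hypothetical.

```python
def grid_merge(pixels, n):
    """Merge a square binary image into n * n bucket counts, flattened row by row."""
    size = len(pixels)
    if size % n != 0 or any(len(row) != size for row in pixels):
        raise ValueError("expected a square image whose side is divisible by n")
    cell = size // n                      # side length of one bucket in pixels
    counts = [0] * (n * n)
    for y, row in enumerate(pixels):
        for x, value in enumerate(row):
            if value:
                # Each pixel is visited once, so the cost is linear in the pixel count.
                counts[(y // cell) * n + (x // cell)] += 1
    return counts
```

With a 32 by 32 image and `n = 4`, each bucket covers an 8 by 8 block and the result is a list of 16 integers.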

Fortunately this runs in time linear in the number of pixels, so it works very nicely for real-time use. It also makes classification more robust to slight rotation, skew and other distortions, since these don't change the per-bucket pixel counts much. In Kwyjibo the grid merge step reduces 1024 binary features (e.g. a 32 × 32 px image) down to 16 integer features (a ~98% reduction). This is probably no longer needed on modern hardware, but it is potentially a useful idea for larger-scale problems.

Finally, since we want to use as much resolution as possible, the best position for the camera is directly above the board. This isn't easy in practice and in a lot of cases picks up reflections under bright lights. To give the camera much more freedom, the edges of the board are detected and the image is perspective-transformed back to a square before any OCR is applied, as shown in the steps above.
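
The rectification idea can be sketched as below: given the four detected board corners, solve for the 3x3 perspective (homography) matrix and sample the photo through it. This is an illustrative pure-Python version, not the project's code. The function names, the corner ordering (top-left, top-right, bottom-right, bottom-left) and the nearest-neighbour sampling are assumptions.

```python
def solve_linear(a, b):
    """Solve a . x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate corner configuration")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def homography(src, dst):
    """Return the 9 coefficients (row-major, last fixed to 1) mapping 4 src points to 4 dst points."""
    a, b = [], []
    for (x, y), (u, v) in zip(src, dst):
        a.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        b.append(u)
        a.append([0, 0, 0, x, y, 1, -v * x, -v * y])
        b.append(v)
    return solve_linear(a, b) + [1.0]

def warp_point(h, x, y):
    """Apply the homography to one (x, y) point."""
    w = h[6] * x + h[7] * y + h[8]
    return (h[0] * x + h[1] * y + h[2]) / w, (h[3] * x + h[4] * y + h[5]) / w

def rectify(image, corners, size):
    """Resample the quadrilateral `corners` of `image` into a size x size square."""
    square = [(0, 0), (size - 1, 0), (size - 1, size - 1), (0, size - 1)]
    h = homography(square, corners)       # maps each output pixel back into the photo
    max_y, max_x = len(image) - 1, len(image[0]) - 1
    out = []
    for v in range(size):
        row = []
        for u in range(size):
            x, y = warp_point(h, u, v)
            xi = min(max(int(round(x)), 0), max_x)   # nearest neighbour, clamped
            yi = min(max(int(round(y)), 0), max_y)
            row.append(image[yi][xi])
        out.append(row)
    return out
```

Mapping from the output square back into the photo (rather than forwards) means every output pixel gets exactly one sample, so the rectified image has no holes.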

Computer vision and computing power have both moved on considerably since this project was built in 2011. If you're working on something similar, it's definitely worth checking out neural network libraries for this kind of task, but the pre-processing techniques shown here might still be worthwhile.