Real-time OCR on Scrabble

This page is a follow-up on how Kwyjibo works from a high-level point of view. It's mostly the more interesting parts taken from my dissertation.

You've probably seen the tennis referee system Hawk Eye, right? Or heard about goal-line technology? Well, this system is a bit like those, except it's for refereeing the slightly more stationary game known as Scrabble.

While there's nothing very groundbreaking about this approach, it seemed to work pretty well for this research application. Hopefully it can give some ideas to those working on similar kinds of project.

So here's how it works

As there are a number of steps involved, I'll break it down step by step below.

  1. Board Detection
    Find edges of board using colour filtering
  2. Board Extraction & Rectification
    Extract the board's pixel region and then rectify using a quadrilateral transformation
  3. Tile Colour Threshold
    Find tiles by filtering out non-tile colour pixels
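
Before carrying on with the rest of the pipeline, here's a minimal sketch of how steps 1 and 2 could look, using OpenCV and NumPy purely for illustration (not necessarily what Kwyjibo actually uses). The HSV bounds and the find_and_rectify_board / order_corners names are hypothetical placeholders you'd tune and rename for your own setup.

```python
import cv2
import numpy as np

# Hypothetical HSV bounds for the board colour; tune these for your board and lighting.
BOARD_LO = np.array([35, 40, 40])
BOARD_HI = np.array([85, 255, 255])

def order_corners(pts):
    """Order four corner points as top-left, top-right, bottom-right, bottom-left."""
    s = pts.sum(axis=1)
    d = np.diff(pts, axis=1).ravel()
    return np.array([pts[np.argmin(s)], pts[np.argmin(d)],
                     pts[np.argmax(s)], pts[np.argmax(d)]], dtype=np.float32)

def find_and_rectify_board(frame, out_size=480):
    """Colour-filter the frame, find the board's outline and warp it to a square image."""
    hsv = cv2.cvtColor(frame, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, BOARD_LO, BOARD_HI)

    # Take the largest contour in the colour mask to be the board (OpenCV 4.x API).
    contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    board = max(contours, key=cv2.contourArea)

    # Approximate the outline with four corners for the quadrilateral transformation.
    quad = cv2.approxPolyDP(board, 0.02 * cv2.arcLength(board, True), True)
    if len(quad) != 4:
        return None  # board edges not found cleanly in this frame

    src = order_corners(quad.reshape(4, 2).astype(np.float32))
    dst = np.array([[0, 0], [out_size, 0], [out_size, out_size], [0, out_size]],
                   dtype=np.float32)
    M = cv2.getPerspectiveTransform(src, dst)
    return cv2.warpPerspective(frame, M, (out_size, out_size))
```

The same colour-threshold-then-warp idea is what makes the camera positioning flexible later on: once the four corners are found, the board always ends up as the same square image.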
  4. Tile Extraction
    Extract tile pixel regions with blob detection
  5. Tile Masking
    Apply the Tile Colour Threshold to the Tile Extraction to mask out unwanted pixels
  6. Adaptive Thresholding
    Use an adaptive threshold to find letter blobs
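
Steps 3 to 6 could then be sketched along these lines, again assuming OpenCV: a tile colour threshold, connected components standing in for blob detection, a mask applied to each tile, and an adaptive threshold to expose the letter. The tile colour bounds and minimum area are illustrative values only.

```python
import cv2
import numpy as np

# Hypothetical HSV bounds for the tile colour; tune for your own tiles.
TILE_LO = np.array([15, 40, 120])
TILE_HI = np.array([35, 160, 255])

def extract_tiles(board_img, min_area=200):
    """Return a list of thresholded letter images, one per detected tile."""
    hsv = cv2.cvtColor(board_img, cv2.COLOR_BGR2HSV)
    tile_mask = cv2.inRange(hsv, TILE_LO, TILE_HI)  # Tile Colour Threshold

    # Blob detection via connected components on the tile-colour mask.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(tile_mask)
    letters = []
    for i in range(1, n):  # label 0 is the background
        x, y, w, h, area = stats[i]
        if area < min_area:
            continue
        tile = board_img[y:y + h, x:x + w]
        mask = (labels[y:y + h, x:x + w] == i).astype(np.uint8) * 255

        # Mask out non-tile pixels, then adaptively threshold to expose the letter.
        grey = cv2.cvtColor(tile, cv2.COLOR_BGR2GRAY)
        grey = cv2.bitwise_and(grey, grey, mask=mask)
        letter = cv2.adaptiveThreshold(grey, 255, cv2.ADAPTIVE_THRESH_MEAN_C,
                                       cv2.THRESH_BINARY_INV, 11, 5)
        letters.append(letter)
    return letters
```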
  7. Inner Border
    Draw a thick inner border around the boundary to connect unwanted surrounding pixels
  8. Flood-fill Inner Border
    Flood-filling the border removes the unwanted pixels
  9. Small Blob Removal
    Use blob detection to find and remove small blobs below a size threshold
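
Steps 7 to 9 might look roughly like this, still assuming OpenCV; the border thickness and minimum blob area are illustrative.

```python
import cv2
import numpy as np

def clean_letter(binary, border=3, min_area=30):
    """Remove pixels touching the tile edge and any small leftover blobs."""
    cleaned = binary.copy()

    # Draw a thick inner border so stray pixels around the boundary all join up
    # into one connected region...
    cleaned[:border, :] = 255
    cleaned[-border:, :] = 255
    cleaned[:, :border] = 255
    cleaned[:, -border:] = 255

    # ...then flood-fill that region away, starting from a corner.
    h, w = cleaned.shape
    flood_mask = np.zeros((h + 2, w + 2), dtype=np.uint8)
    cv2.floodFill(cleaned, flood_mask, (0, 0), 0)

    # Finally, drop any remaining blobs below the area threshold.
    n, labels, stats, _ = cv2.connectedComponentsWithStats(cleaned)
    for i in range(1, n):  # label 0 is the background
        if stats[i, cv2.CC_STAT_AREA] < min_area:
            cleaned[labels == i] = 0
    return cleaned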
  10. Extract & Rectify Letters
    Use blob detection to extract the tile letters, then resize and dilate them to a standard resolution
  11. Classify Letters
    Use classification algorithms to determine the letters that have been placed
  12. Scrabble Game Logic
    Use the detected letters to find words and score the player accordingly
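
And a rough sketch of the extraction half of steps 10 and 11, assuming the 32 px × 32 px standard resolution mentioned in the next section (the classifier itself is covered below).

```python
import cv2
import numpy as np

LETTER_SIZE = 32  # standard resolution referred to in the next section

def rectify_letter(cleaned):
    """Crop the letter blob's bounding box, then resize and dilate it to 32x32."""
    pts = cv2.findNonZero(cleaned)
    if pts is None:
        return None  # nothing left on this tile (e.g. a blank)
    x, y, w, h = cv2.boundingRect(pts)
    letter = cleaned[y:y + h, x:x + w]
    letter = cv2.resize(letter, (LETTER_SIZE, LETTER_SIZE),
                        interpolation=cv2.INTER_NEAREST)
    return cv2.dilate(letter, np.ones((3, 3), dtype=np.uint8))
```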

The source code and a video of my implementation are on the Kwyjibo page.

Reading Scrabble characters with classification and feature reduction

Tile classification is handled by a regular k-nearest neighbours classifier. The classifier is given the letters extracted by the process above, reduced to a suitable feature vector. The feature reduction method I eventually found to work best was one I put together that I call grid merge. I found out later it's probably a type of spatial locality-sensitive hashing.

As shown in the diagram below, grid merge takes a 2D array of features and merges them into a much more compact 1D array. The features are just binary pixels from the threshold step. The 2D feature space is divided into an n-by-n array of buckets, each containing the count of set pixels that fall within it. This 2D array is then flattened to 1D for input into the classifier.

[Diagram: grid merge reducing a 2D grid of binary pixels to a 1D array of bucket counts]
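
As a sketch (assuming Python/NumPy, a 32×32 binary letter image and n = 4 buckets per side), grid merge boils down to a reshape and a sum:

```python
import numpy as np

def grid_merge(letter, n=4):
    """Reduce a square binary image to an n*n vector of per-bucket pixel counts."""
    size = letter.shape[0]                    # e.g. 32
    bucket = size // n                        # e.g. 8x8 pixels per bucket
    binary = (letter > 0).astype(np.int32)    # treat any set pixel as 1
    counts = binary.reshape(n, bucket, n, bucket).sum(axis=(1, 3))
    return counts.ravel()                     # flatten the 2D buckets into a 1D feature vector
```

Each pixel is visited exactly once, which is where the linear-time behaviour comes from.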

This reduction approach is pretty cool. It's essentially O(n) in the number of pixels, so it works in real time. It even improves the robustness of classification by increasing tolerance for slight rotation, skew and other unwanted effects, because small 2D transforms won't alter the bucket pixel counts very much.

In Kwyjibo, I used grid merge to successfully reduce 1024 binary features (i.e. a 32 px × 32 px letter image) down to just 16 integer features (a ~98% reduction). The system performed just as well, and in fact better, due to the increased tolerance.
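
To tie the classification step together, here's a sketch with scikit-learn's KNeighborsClassifier standing in for whichever k-nearest neighbours implementation was actually used, using grid_merge from the sketch above; train_letters and train_labels are assumed to be labelled 32×32 letter images.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

def train_classifier(train_letters, train_labels, k=3):
    """Fit a k-NN classifier on grid-merged feature vectors."""
    X = np.array([grid_merge(letter) for letter in train_letters])
    clf = KNeighborsClassifier(n_neighbors=k)
    clf.fit(X, train_labels)
    return clf

def classify_letter(clf, letter):
    """Predict the letter on a freshly extracted 32x32 tile image."""
    return clf.predict([grid_merge(letter)])[0]
```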

Camera positioning

The positioning of the camera is really important. The best position is directly above the board, framing it exactly in shot to gain the maximum resolution available. This is quite impractical without a special camera rig, and it can cause issues with bright light reflections, since the board I was using was ridiculously shiny. Mounting the camera in any other position introduces skew but reduces reflections. By implementing board detection and rectification, the system can handle any camera position.

[Image: best camera position, but not very practical]
[Image: more practical, but skew and lower resolution]