Computer vision applied to the recognition of Magic cards


Mtgscan is a project that aims to recognize Magic cards from an image (photo or screenshot), using OCR (Optical Character Recognition).

Test the application (a URL to an image is prefilled)

Introduction

Magic is a card game (played both online and IRL) in which you have to build a pile of cards (deck) to play against other players. The number of cards in a deck can be large (between 40 and 75, most of the time) and listing them by hand (to attend a tournament or sell them, for example) is tedious.

Card recognition

Most attempts to recognize Magic cards seem to rely on image processing and/or training a neural network (usually a CNN) on the whole card image.
For example, this article preprocesses the image, isolates each card by segmentation and recognizes it using a perceptual hash together with a database of card hashes.
This other project tried to combine a neural network with perceptual hashing, but without much improvement.

However, these attempts do not work if the cards are stacked, as is usually the case when listing a deck or collection.

Example of stacked cards recognized by mtgscan: each card is only partially visible


Instead, I used OCR to recognize the title of each card, rather than the whole image.

Text recognition (OCR)

There are several existing OCR engines. I first tried Tesseract, which is probably the best open-source OCR engine, but the results turned out to be pretty bad on Magic cards.

Therefore, I considered proprietary cloud solutions:

                    Azure                      GCP          AWS
OCR                 Read                       Vision       Textract
Free-tier limits    5000/month and 20/minute   1000/month   1000/month

These OCR services can only be used in the cloud. They are free for limited use, which is more than enough for me. They perform much better than Tesseract and, after comparing them on a few images, I opted for Azure OCR, which performed best.

Azure OCR is called through HTTP requests: the image is submitted first, then the result is retrieved by polling. Each recognized block of text comes with its bounding box.


Aside from the Swamp (hard to see), the results of Azure OCR are very good.
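
The submit-then-poll flow can be sketched as follows. This is a minimal sketch, not the code used by mtgscan: it assumes the v3.2 Read REST endpoint and its JSON response layout, and the helper names (submit_image, fetch_json, poll, extract_blocks) are hypothetical. The endpoint and key arguments stand for your own Azure resource URL and subscription key.

```python
import json
import time
import urllib.request


def submit_image(endpoint, key, image_url):
    """Submit an image URL for OCR and return the URL to poll for the result.

    Assumption: the v3.2 Read endpoint, which answers with an
    Operation-Location header pointing at the pending result.
    """
    request = urllib.request.Request(
        f"{endpoint}/vision/v3.2/read/analyze",
        data=json.dumps({"url": image_url}).encode(),
        headers={
            "Ocp-Apim-Subscription-Key": key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.headers["Operation-Location"]


def fetch_json(url, key):
    """GET the current state of the OCR operation as a dictionary."""
    request = urllib.request.Request(
        url, headers={"Ocp-Apim-Subscription-Key": key}
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def poll(fetch, interval=1.0, max_tries=30):
    """Call fetch() until the operation has succeeded or failed."""
    for _ in range(max_tries):
        result = fetch()
        if result["status"] in ("succeeded", "failed"):
            return result
        time.sleep(interval)
    raise TimeoutError("OCR result not ready")


def extract_blocks(result):
    """Return (text, bounding_box) pairs from a succeeded result.

    Assumption: each line carries a "text" string and a "boundingBox"
    list holding the coordinates of its four corners.
    """
    blocks = []
    for page in result["analyzeResult"]["readResults"]:
        for line in page["lines"]:
            blocks.append((line["text"], line["boundingBox"]))
    return blocks
```

A scan would then chain the helpers: get the operation URL from submit_image, pass a function calling fetch_json on it to poll, and feed the final result to extract_blocks. Respecting the 20 requests/minute limit mostly comes down to choosing a reasonable polling interval.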

Card name recognition

Then, I look up each block of text in a card dictionary to decide whether it is indeed a card name. However, the OCR often makes small errors that we need to correct first.

OCR post-processing and approximate string matching

There are several problems with the OCR result on the above image:

To fix these problems, I used fuzzy search (approximate string matching) with SymSpell, which finds the closest word in a dictionary by edit distance. See this article for more details.
To be efficient, SymSpell expects a bound on the distance. From experience, a maximum edit distance of 6 gave good results. I also reject a text if its distance/length ratio (the proportion of errors per character) is too high.
Alternatively, I could have used Lucene or Elasticsearch.
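
The matching rule can be illustrated with a small sketch. It does not use SymSpell: a brute-force scan with a plain Levenshtein distance stands in for it, which is far slower on a full card dictionary but applies the same two criteria. The function names are hypothetical, and the 0.3 ratio threshold is an assumption (only the maximum distance of 6 comes from my experiments above).

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]


def closest_card(text, card_names, max_distance=6, max_ratio=0.3):
    """Return the closest card name, or None if no name is close enough.

    max_ratio is an assumed threshold on distance / len(text).
    """
    if not text:
        return None
    query = text.lower()
    best = min(
        card_names,
        key=lambda name: edit_distance(query, name.lower()),
        default=None,
    )
    if best is None:
        return None
    distance = edit_distance(query, best.lower())
    if distance > max_distance or distance / len(text) > max_ratio:
        return None
    return best
```

The ratio test matters for short texts: an absolute bound of 6 alone would let almost any 7-letter word match Karakas.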

Other post-processing that I found useful:

  • Reject texts that are too short (< 3 characters) or too long (> 30).
  • Remove special characters (@, !, …).
  • If ‘..’ appears in the text, it might mean that the name is truncated. In that case, I do a prefix search in the SymSpell dictionary.
  • Some frequent keywords on cards are mistakenly detected as card names. Therefore, I reject some names such as Sacrifice.
  • Try to detect card multipliers (e.g. 4x).
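
The rules in this list can be sketched as small helpers. The names, the exact set of characters kept, the keyword list and the multiplier formats (4x before the name or x4 after it) are assumptions made for illustration; the prefix search here scans a plain list instead of the SymSpell dictionary.

```python
import re

# Hypothetical list of rules keywords that are also card names.
REJECTED_KEYWORDS = {"sacrifice"}


def clean_text(text):
    """Return the cleaned text, or None if it cannot be a card name."""
    # Drop special characters such as @ or !, keeping letters, digits,
    # spaces and the punctuation found in card names (assumed set).
    text = re.sub(r"[^\w\s,'.\-]", "", text).strip()
    if not 3 <= len(text) <= 30:
        return None  # too short or too long
    if text.lower() in REJECTED_KEYWORDS:
        return None  # frequent keyword, not a card in the list
    return text


def split_multiplier(text):
    """Split '4x Swamp' or 'Swamp x4' into (4, 'Swamp'); default count is 1."""
    leading = re.match(r"^\s*(\d+)\s*x\s+(.+)$", text, flags=re.IGNORECASE)
    if leading:
        return int(leading.group(1)), leading.group(2).strip()
    trailing = re.match(r"^(.+?)\s+x\s*(\d+)\s*$", text, flags=re.IGNORECASE)
    if trailing:
        return int(trailing.group(2)), trailing.group(1).strip()
    return 1, text.strip()


def expand_truncated(text, card_names):
    """If the text contains '..', return the names starting with what precedes it."""
    if ".." not in text:
        return [text]
    prefix = text.split("..", 1)[0].strip().lower()
    return [name for name in card_names if name.lower().startswith(prefix)]
```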

Results

Here is the result on the above screenshot:

We can spot 3 errors:

  • An extra Squee, Goblin Nabob is detected in the rules text.
  • The Ur-Dragon is not recognized, because its mana cost is appended to its name.
  • The Karakas in the sideboard is not recognized, due to too many OCR errors.

On this test set, I got a 10% error rate, which seems good to me: the decklist can be corrected by hand while still saving a lot of time.

Web application

Web application framework: Flask

I used Flask as a web application framework. By default, Flask uses the web server included in Werkzeug, which is easy to use but not suited for production.

Task queue: Celery

Since scanning an image takes some time (roughly 10 seconds), I used the Celery task queue to scan cards as a background job (in a worker), so that the server is not blocked. Another possibility was to process images asynchronously with asyncio, but that would only use one thread, which does not scale well.
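
The background-job pattern itself can be sketched with the standard library only. Here a ThreadPoolExecutor stands in for the Celery workers, which is a deliberate simplification: Celery runs workers in separate processes and passes jobs through a message broker, while this sketch stays inside one process. The scan and handle_request functions are hypothetical placeholders.

```python
import time
from concurrent.futures import ThreadPoolExecutor

# Stand-in for the pool of Celery workers (assumption: 2 workers).
executor = ThreadPoolExecutor(max_workers=2)


def scan(image_url):
    """Placeholder for the slow OCR and matching pipeline."""
    time.sleep(0.1)  # the real scan takes roughly 10 seconds
    return f"decklist for {image_url}"


def handle_request(image_url, on_done):
    """Queue the scan and return immediately.

    on_done is called with the decklist once the job has finished,
    so the caller is never blocked while the image is processed.
    """
    future = executor.submit(scan, image_url)
    future.add_done_callback(lambda f: on_done(f.result()))
    return future
```

With Celery, scan would become a task and on_done would be the code that sends the decklist back to the client.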

Message broker: Redis

To transfer messages between the web application and the workers, we need a message queue. Popular choices include RabbitMQ, ZeroMQ and Redis. I used the latter, which can also be used as a database.

Communication between client and server: SocketIO

To send an image from the browser (the client) to the server, I used SocketIO. It is based on WebSocket, which is a protocol for bidirectional communication between a client and a server. When the server has finished processing an image, it sends the decklist back to the client.

Web server: Eventlet

Unfortunately, Werkzeug does not support WebSockets. I used Eventlet instead.

Putting it all together

Connect Redis to SocketIO and Celery:

Get an image from the client, send it to the worker and send the decklist back:

Containerization: Docker

Flask and Celery share the same Dockerfile, and poetry.lock contains the requirements of the application.

Finally, docker-compose starts Flask, Celery and Redis. The env files contain the environment variables: the credentials for Azure and the password used for Redis.

Deployment: GCP vs Azure

When it comes to deployment, I tried GCP and Azure:

Service                  RAM       CPU               Price/month   docker-compose support
Azure App Service F1     1 GB      Shared            Free          Partially
Azure App Service B1     1.75 GB   Dedicated         $13.14        Partially
Azure VM B1s             1 GB      1 vCPU            $7.59         Yes
Azure VM B2s             4 GB      2 vCPU            $30.37        Yes
GCP VM e2-micro          1 GB      2 vCPU (shared)   $6.11         Yes
GCP VM e2-medium         4 GB      2 vCPU (shared)   $24.46        Yes
GCP VM e2-standard-2     8 GB      2 vCPU            $48.91        Yes
GCP App Engine           -         -                 -             No

Azure App Service and GCP App Engine are very similar and aim to make simple applications easier to deploy. Since there is no support for docker-compose in GCP App Engine, I did not consider this option. Hosting on Azure App Service was significantly easier thanks to the preconfigured environment (Docker, DNS, SSL…). However, 1 GB of RAM was not enough when more than 2 requests occurred simultaneously, so the B1 plan was the minimum. On the plus side, Azure integrates well with Visual Studio Code.

I also tried to host my app on a virtual machine, which offers more freedom. GCP VMs seemed a bit more attractive to me than Azure VMs. I had no problem connecting to the GCP VM with the Remote SSH extension of Visual Studio Code.

Moreover, Azure and GCP both offer a free trial and a free tier, although GCP is a bit more generous:

         At account creation            Free tier VM
Azure    $200 the first month           B1s (for 1 year)
GCP      $300 the first 3 months        e2-micro (forever)

The free tier can also be used to partially cover the cost of more expensive services.

Perspectives

  • Make a Twitter bot that scans cards when mentioned (by calling http://mtgscan.net through a REST API).
  • Adapt the app for other card games.
