Mtgscan is a project that aims to recognize Magic cards in an image (photo or screenshot) using OCR (Optical Character Recognition).
Magic is a card game (played both online and IRL) in which you have to build a pile of cards (deck) to play against other players. The number of cards in a deck can be large (between 40 and 75, most of the time) and listing them by hand (to attend a tournament or sell them, for example) is tedious.
Most attempts to recognize Magic cards seem to rely on image processing and/or training a neural network (usually a CNN) on the overall image.
For example, this article pre-processes the image, isolates each card by segmentation and recognizes it using a perceptual hash together with a database of card hashes.
This other project tried combining a neural network with perceptual hashing, but without much improvement.
However, these attempts do not work if the cards are stacked, as is usually the case when listing a deck or collection.
Instead, I used OCR to recognize the title of each card rather than the whole image.
Text recognition (OCR)
There are several existing OCR engines. I first tried Tesseract, which is probably the best open-source OCR engine, but the results turned out to be pretty bad on Magic cards.
Therefore, I considered proprietary cloud solutions:
These OCR services can only be used in the cloud. They are free for limited use, which is more than enough for me. They perform much better than Tesseract and, after comparing them on a few images, I opted for Azure OCR, which performed best.
|Limits||5000/month and 20/minute||1000/month||1000/month|
Azure OCR is called through an HTTP request and the results are retrieved by polling. Each recognized block contains a bounding box and its text:
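The polling step can be sketched as follows. This is a minimal sketch, not the project's code: `fetch_status` is a hypothetical callable standing in for the HTTP GET on the operation URL returned by the service, and the JSON layout (`status`, then `analyzeResult.readResults[].lines[]` with `boundingBox` and `text`) is assumed from the Azure Read API v3.x response format.

```python
import time
from typing import Callable, Dict, List, Tuple

# (bounding box as a flat list of corner coordinates, recognized text)
Block = Tuple[List[int], str]


def poll_result(fetch_status: Callable[[], Dict], interval: float = 1.0,
                max_tries: int = 30) -> Dict:
    """Call fetch_status() until the OCR job succeeds, fails or times out."""
    for _ in range(max_tries):
        payload = fetch_status()
        status = payload.get("status")
        if status == "succeeded":
            return payload
        if status == "failed":
            raise RuntimeError("OCR job failed")
        time.sleep(interval)  # job not finished yet: wait and retry
    raise TimeoutError("OCR job did not finish in time")


def extract_blocks(payload: Dict) -> List[Block]:
    """Flatten the response into (bounding box, text) pairs."""
    blocks = []
    for page in payload.get("analyzeResult", {}).get("readResults", []):
        for line in page.get("lines", []):
            blocks.append((line["boundingBox"], line["text"]))
    return blocks
```

Passing the fetch function as a parameter keeps the loop independent of any HTTP library, so it can be exercised with canned responses.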
Aside from the swamp (hard to see), the results of Azure OCR are very good.
Card name recognition
Then, I look up each block of text in a card dictionary to decide whether it is indeed a card name. However, the OCR often makes small errors that need to be corrected first.
OCR post-processing and approximate string matching
There are several problems with the OCR result on the above image:
- Some characters have been mistakenly added to card names ("(", mana value…).
- The Ur-Dragon and Hogaak, Arisen Necropolis have truncated names.
- The Karakas in sideboard is totally wrong.
To fix these problems, I used fuzzy search (approximate string matching) with SymSpell, which finds the closest word in a dictionary by edit distance. See this article for more details.
To be efficient, SymSpell expects a bound on the distance. In my experience, a maximum edit distance of 6 gives good results. I also reject a text if its distance/length ratio (the proportion of errors per character) is too high.
Alternatively, I could have used Lucene or Elasticsearch.
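The bounded lookup and the ratio check can be illustrated with a brute-force stand-in. This is not SymSpell, which precomputes deletions to avoid scanning the whole dictionary; it only shows the acceptance rule. The maximum distance of 6 comes from the text above, while `max_ratio=0.3` and the function names are assumptions made for the example.

```python
from typing import Iterable, Optional


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def closest_card(text: str, card_names: Iterable[str],
                 max_distance: int = 6, max_ratio: float = 0.3) -> Optional[str]:
    """Return the nearest card name, or None if the text is too far off."""
    best_name, best_dist = None, max_distance + 1
    for name in card_names:
        dist = edit_distance(text.lower(), name.lower())
        if dist < best_dist:
            best_name, best_dist = name, dist
    # Reject when no name is within max_distance or the error ratio is too high.
    if best_name is None or best_dist / max(len(text), 1) > max_ratio:
        return None
    return best_name
```

The ratio check matters for short texts: a distance of 3 is a small error on a 20-character name but not on a 4-character one.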
Other post-processing that I found useful:
- Reject texts that are too short (< 3 characters) or too long (> 30).
- Remove special characters (@, !, …).
- If ‘..’ appears in the text, the name may be truncated, so I do a prefix search in the SymSpell dictionary.
- Some frequent keywords on cards are mistakenly detected as card names. Therefore, I reject some cards such as Sacrifice.
- Try to detect card multipliers (e.g. 4x).
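The rules in this list could be sketched as below. This is an illustrative sketch under assumptions, not the project's code: the set of allowed characters, the order of the checks, the multiplier pattern and all function names are hypothetical, and the prefix search is a plain linear scan over the card names.

```python
import re
from typing import List, Optional, Tuple

REJECTED_NAMES = {"sacrifice"}  # the text only names Sacrifice as an example
MULTIPLIER = re.compile(r"^\s*(\d+)\s*x\b\s*", re.IGNORECASE)


def split_multiplier(text: str) -> Tuple[int, str]:
    """Return (count, rest) for inputs like '4x Karakas'; count defaults to 1."""
    match = MULTIPLIER.match(text)
    if not match:
        return 1, text
    return int(match.group(1)), text[match.end():]


def clean_text(text: str) -> Optional[Tuple[str, bool]]:
    """Return (cleaned text, looks_truncated), or None if the text is rejected."""
    truncated = ".." in text
    # Keep letters, digits, spaces, commas, apostrophes and hyphens only.
    cleaned = re.sub(r"[^\w ,'\-]", "", text).strip()
    if not 3 <= len(cleaned) <= 30:
        return None
    if cleaned.lower() in REJECTED_NAMES:
        return None
    return cleaned, truncated


def prefix_matches(prefix: str, card_names: List[str]) -> List[str]:
    """Card names starting with the (possibly truncated) text."""
    lowered = prefix.lower()
    return [name for name in card_names if name.lower().startswith(lowered)]
```

For a truncated block such as "Hogaak, Arisen Necr..", `clean_text` flags the truncation and `prefix_matches` can then be tried before falling back to fuzzy matching.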
Here is the result on the above screenshot:
We can spot 3 errors:
- An extra Squee, Goblin Nabob is detected in the rules text.
- The Ur-Dragon is not recognized, because of its mana cost being added to its name.
- The Karakas in sideboard is not recognized, due to too many errors from the OCR.
On this test set, I got a 10% error rate, which seems good to me: the decklist can be corrected by hand while still saving a lot of time.
Web application framework: Flask
Task queue: Celery
Since scanning an image takes some time (roughly 10 seconds), I used the Celery task queue to scan cards in a background job (a worker), so the server does not freeze. Another possibility was to process images asynchronously with asyncio, but that would use only one thread, which does not scale well.
Message broker: Redis
To transfer messages between the web application and the workers, we need a message queue. Popular choices include RabbitMQ, ZeroMQ and Redis. I used the latter, which can also be used as a database.
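The division of labour between the web application, the broker and the worker can be illustrated with the standard library alone. The sketch below is an analogy, not the Celery/Redis setup itself: `queue.Queue` stands in for the Redis broker, a thread stands in for the Celery worker process, a dictionary stands in for the result store, and `scan_image` is a hypothetical placeholder for the OCR pipeline.

```python
import queue
import threading
from typing import Dict

jobs = queue.Queue()           # stand-in for the message broker
results: Dict[str, str] = {}   # stand-in for the result store


def scan_image(image: bytes) -> str:
    """Hypothetical placeholder for the slow OCR and matching step."""
    return f"decklist for {len(image)} bytes"


def worker() -> None:
    """Consume jobs forever, as a worker process would."""
    while True:
        job_id, image = jobs.get()
        try:
            results[job_id] = scan_image(image)
        finally:
            jobs.task_done()


def submit(job_id: str, image: bytes) -> None:
    """Called by the web handler: enqueue the job and return immediately."""
    jobs.put((job_id, image))


# Daemon thread so the interpreter can exit while the worker is idle.
threading.Thread(target=worker, daemon=True).start()
```

The point of the pattern is that `submit` returns at once, so the request handler stays responsive while the slow work happens elsewhere. A real broker additionally lets the workers run in separate processes or on separate machines.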
Communication between client and server: SocketIO
To send an image from the browser client to the server, I used SocketIO. It is based on WebSocket, a protocol for bidirectional communication between a client and a server. When the server has finished processing an image, it sends the decklist back to the client:
Web server: Eventlet
Unfortunately, Werkzeug, the development server that ships with Flask, does not support WebSockets, so I used Eventlet instead.
Putting it all together
Connect Redis to SocketIO and Celery:
Get an image from the client, send it to the worker and send the decklist back:
I used the following Dockerfile for Flask and Celery:
poetry.lock contains the requirements of the application.
Finally, docker-compose starts Flask, Celery and Redis:
The env files contain the environment variables: the credentials for Azure and the password used for Redis.
Deployment: GCP vs Azure
When it comes to deployment, I tried GCP and Azure:
| | Azure App Service F1 | Azure App Service B1 | Azure VM B1s | Azure VM B2s | GCP VM e2-micro | GCP VM e2-medium | GCP VM e2-standard-2 | GCP App Engine |
|---|---|---|---|---|---|---|---|---|
| RAM | 1 GB | 1.75 GB | 1 GB | 4 GB | 1 GB | 4 GB | 8 GB | |
| CPU | Shared | Dedicated | 1 vCPU | 2 vCPU | 2 vCPU shared | 2 vCPU shared | 2 vCPU | |
Azure App Service and GCP App Engine are very similar and aim to make deploying simple applications easier. Since GCP App Engine does not support docker-compose, I did not consider this option. Hosting on Azure App Service was significantly easier thanks to the preconfigured environment (Docker, DNS, SSL…). However, 1 GB of RAM was not enough when more than 2 requests occurred simultaneously, so the B1 plan was the minimum. Another plus is the integration between Azure and Visual Studio Code.
I also tried hosting my app on a virtual machine, which offers more freedom. GCP VMs seemed a bit more attractive to me than Azure VMs. I had no problem connecting to the GCP VM via Visual Studio Code Remote SSH.
| | At account creation | Free tier VM |
|---|---|---|
| Azure | $200 the first month | B1s (for 1 year) |
| GCP | $300 the first 3 months | e2-micro |
The free tier can also be used to partially cover the cost of more expensive services.
- Make a Twitter bot that scans cards when mentioned (by using http://mtgscan.net through a REST API).
- Adapt the app for other card games.