
OCR made easy using tesserocr


There are numerous OCR libraries for Python. tesserocr is the only one I found with a decent, approachable API.

What is it exactly?

tesserocr is a simple, Pillow-friendly wrapper around the tesseract-ocr API.

Pillow is a friendly PIL fork (PIL is the Python Imaging Library).

Extracting text from a nutrition facts image

We’ll extract text from this image:



First, install all the requirements:

$ sudo apt install tesseract-ocr \
    libtesseract-dev \
    libleptonica-dev
$ pip install Pillow cython tesserocr

Now run the following gist:
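
The gist is not reproduced in this copy. As a stand-in, here is a minimal sketch of what ocr.py could look like, assuming it simply opens the image with Pillow and hands it to tesserocr.image_to_text. The names extract_text and main are hypothetical and not taken from the original gist.

```python
import sys


def extract_text(image_path):
    """Return the text tesseract recognizes in the image at image_path."""
    # Imported here so the usage message below still works
    # when Pillow or tesserocr is not installed.
    from PIL import Image
    import tesserocr

    # tesserocr accepts a Pillow image directly.
    with Image.open(image_path) as image:
        return tesserocr.image_to_text(image)


def main(argv):
    """Run OCR on the image path given as the only command-line argument."""
    if len(argv) != 2:
        print("usage: python ocr.py /path/to/image", file=sys.stderr)
        return 2
    print(extract_text(argv[1]))
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Save it as ocr.py and pass the image path as the only argument. If you do not need the Pillow image for anything else, tesserocr.file_to_text(image_path) should give the same result from the file path alone.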

And voilà!

$ python ocr.py /path/to/chocolate.jpg
Nutrition Facts
Serving Size 1 cup (249g)
Servings Per Container 8
―
Amount Per Sewing
Calories 210 Calories from Fat 80
% Daily Value"
Total Fat 8g 13%
Saturated Fat 5g 26%
Trans Fat 0g
Cholesterol 30mg 10%
Sodium 200mg 9%
Total Carbohydrate 27g 9%
Dietary Fiber 1g 5%
Sugars 25g
Protein 9g
Vitamin A 6% - Vitamin C 0%
Calcium 30% - Iron 6%
Vitamin D 30%
*Percent Daily Values are based on a 2,000 calorie diet.

