Our products use Tesseract, one of the best Optical Character Recognition (OCR) engines. Tesseract development has been sponsored by Google since 2006.
Our products contain a set of procedures that optimize image quality before OCR. These procedures significantly improve the OCR results.
However, the Tesseract engine may not recognize some specific fonts. The best results are obtained with standard Microsoft Office fonts at font sizes from 9 to 13 px.
Please note that we use Tesseract OCR as-is and cannot add support for unrecognized symbols, fonts, or languages.
OCR supports more than 30 languages:
English, Arabic, Bulgarian, Catalan, Czech, Danish, Dutch, Finnish, French, German, Greek, Hebrew, Hungarian, Italian, Japanese, Korean, Latvian, Lithuanian, Norwegian, Polish, Portuguese, Romanian, Russian, Serbian, Slovenian, Slovakian, Spanish, Swedish, Tagalog, Turkish, Chinese Simplified, Chinese Traditional, Ukrainian, Vietnamese
How to install:
Download the appropriate language pack.
eDocStation, SharePoint Scanner Plug-in 2010 Professional, Dynamics CRM Scanner and PDF Plug-in 2011:
Extract the ZIP contents to the "\ocr\tessdata" directory on your workstation (usually "C:\Program Files (x86)\Websio Information Solutions\Websio eDocStation\ocr\tessdata\").
SharePoint PDF & OCR Converter:
Extract the ZIP contents to the directory "C:\Program Files (x86)\Websio Information Solutions\Websio PDF Spooler\ocr\tessdata" on each of your SharePoint front-end servers.
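
If you prefer to script the extraction step (for example, to repeat it on several servers), it could look roughly like the Python sketch below. This is only an illustration and not part of the products: the function name install_language_pack and the ZIP file name in the usage comment are hypothetical, and the script assumes it is run with sufficient rights to write under "Program Files".

```python
import zipfile
from pathlib import Path


def install_language_pack(zip_path, tessdata_dir):
    """Extract a language pack ZIP into a tessdata directory.

    Hypothetical helper for illustration only.
    Returns the sorted names of the files contained in the archive.
    """
    target = Path(tessdata_dir)
    # Create the target directory if it does not exist yet.
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        # Collect file entries (skip directory entries) for reporting.
        names = [info.filename for info in archive.infolist() if not info.is_dir()]
        archive.extractall(target)
    return sorted(names)


# Example call (the ZIP file name is an assumption; use the pack you downloaded):
# install_language_pack(
#     r"C:\Downloads\language-pack.zip",
#     r"C:\Program Files (x86)\Websio Information Solutions\Websio eDocStation\ocr\tessdata",
# )
```

For SharePoint PDF & OCR Converter, the same call would be repeated on each front-end server with the "Websio PDF Spooler" tessdata path given above.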