Suggestion: OCR plugin wrapper for open-source Tesseract-OCR

Description

Greenshot 1.0 comes with a plugin (wrapper) for Microsoft Office Document Imaging (MODI) OCR, which is used if MODI is available on the machine. MODI, however, requires a Microsoft Office license and at least a partial installation of the required modules and languages. Often, not all of the required language packs are available. Not everyone likes MODI, even though its detection quality is high.

Suggestion:
=========

* Implement an alternative plugin as a wrapper for the open-source OCR command-line software "Tesseract".
* It also requires ImageMagick (convert) to preprocess the images (resizing +300%, conversion to TIFF).
* Advantage: almost any language is available.
* Advantage: open-source.
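
To illustrate the two steps such a wrapper would chain together, here is a minimal Python sketch. It is not Greenshot plugin code. The function names, the executable names `convert` and `tesseract` (assumed to be on the PATH), and the defaults of 400% and `deu` are assumptions for illustration, with the defaults borrowed from the command file posted in the comments.

```python
import subprocess
import tempfile
from pathlib import Path

# Assumption: both tools are reachable under these names. On Windows, a bare
# "convert" may resolve to the system's own convert.exe, so a full path to
# the ImageMagick binary is safer there.
CONVERT_EXE = "convert"
TESSERACT_EXE = "tesseract"


def build_convert_cmd(image, tiff, scale_percent=400):
    """ImageMagick call: enlarge, convert to grayscale, write an uncompressed TIFF."""
    return [CONVERT_EXE, str(image), "-resize", f"{scale_percent}%",
            "-type", "Grayscale", "+compress", str(tiff)]


def build_tesseract_cmd(tiff, out_base, lang="deu"):
    """Tesseract call: tesseract itself appends '.txt' to out_base."""
    return [TESSERACT_EXE, str(tiff), str(out_base), "-l", lang]


def ocr_image(image, lang="deu", scale_percent=400):
    """Run both steps in a temporary directory and return the recognised text."""
    with tempfile.TemporaryDirectory() as tmp:
        tiff = Path(tmp) / "ocr.tiff"
        out_base = Path(tmp) / "ocr"
        subprocess.run(build_convert_cmd(image, tiff, scale_percent), check=True)
        subprocess.run(build_tesseract_cmd(tiff, out_base, lang), check=True)
        return out_base.with_suffix(".txt").read_text(encoding="utf-8")
```

With both tools installed, `print(ocr_image("screenshot.png", lang="eng"))` would print the recognised text. Building the command lines in separate functions keeps them easy to inspect without running the external programs.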

References:
* https://code.google.com/p/tesseract-ocr/
* https://de.wikipedia.org/wiki/Tesseract_%28Software%29
* https://en.wikipedia.org/wiki/Tesseract_%28software%29

Environment

None

Activity

endolith May 24, 2014 at 1:18 AM

This is amazing, thanks. I wish it could be built in and easier to set up though.

Could also use this to copy to clipboard: http://superuser.com/a/231032

Robin Krom February 6, 2013 at 6:51 PM

  • *Milestone*: Next_Release --> None

Wikinaut November 28, 2012 at 6:42 AM

The script from my previous post writes its output text file to

C:\WINDOWS\TEMP\ocr.txt

Wikinaut November 28, 2012 at 6:41 AM

I added the code as posting #7 in https://sourceforge.net/projects/greenshot/forums/forum/676082/topic/5574624/index/page/1

Here is an external command file that works:

@ECHO OFF
REM OCR 20121128
REM batch resize images 20121014
IF "%~1"=="" GOTO HELP
REM Keep LANG and TMPFILE local to this script
setlocal
SET LANG=%~2
IF "%~2"=="" SET LANG=deu

SET TMPFILE=%TMP%\ocr.tiff

@ECHO OCRing %~1 (%LANG%) ==^> ocr.txt
REM Enlarge to 400 percent and convert to an uncompressed grayscale TIFF
@C:\Programme\ImageMagick\convert -resize "400%%" -type Grayscale +compress "%~1" "%TMPFILE%"
REM tesseract appends .txt to the output base name
@C:\Programme\tesseract-OCR\tesseract "%TMPFILE%" "%TMP%\ocr" -l %LANG%
REM Print the result from the temp directory, where tesseract wrote it
type "%TMP%\ocr.txt"
GOTO EOF

:HELP
@ECHO:
@ECHO OCR image
@ECHO:
@ECHO Usage^: ocr x.jpg [deu^|eng]
@ECHO: default deu
@ECHO:

:EOF

Wikinaut November 21, 2012 at 8:31 PM

By the way, MODI is really good. But MODI is not free and cannot be installed on all machines. This is why I looked for a free alternative. Greenshot and Tesseract together would perhaps make a good pair...

Details

Created November 19, 2012 at 6:56 AM
Updated March 27, 2016 at 9:39 AM