[sane-devel] scanimage / tesseract interoperability

Jeffrey Ratcliffe jeffrey.ratcliffe at gmail.com
Sat May 10 14:30:48 UTC 2014


On 10 May 2014 05:56, Jeff Breidenbach <jeff at jab.org> wrote:
> Tesseract is an open source OCR program. It can already
> produce searchable PDF and will soon support streaming.
> It would be fun to support something like this:
>
>    scanimage --batch | tesseract - - pdf > searchable.pdf
>
> To make this work nicely, scanimage would need to
> print the name of each file to stdout after it is written.
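The streaming idea in the proposal above depends on one rule: a filename is announced only after that file is complete. Below is a minimal Python sketch of that handshake. It rests on several assumptions. write_batch and read_batch are hypothetical names. The page data is placeholder bytes standing in for real scans. The out%d.pnm pattern is modelled on the printf-style pattern that scanimage --batch accepts. The protocol is one name per line, so filenames are assumed never to contain a newline.

```python
import sys
from typing import Iterable, Iterator, Optional, TextIO


def write_batch(pages: Iterable[bytes], pattern: str = "out%d.pnm",
                out: Optional[TextIO] = None) -> None:
    """Hypothetical producer: write each page, then announce its name.

    The name is printed only after the file has been closed, so a reader
    at the other end of the pipe never opens a half-written image.
    """
    if out is None:
        out = sys.stdout
    for n, data in enumerate(pages, start=1):
        path = pattern % n  # e.g. "out1.pnm", "out2.pnm", ...
        with open(path, "wb") as f:
            f.write(data)  # placeholder bytes, not real scanner output
        # The with-block has closed the file; now it is safe to announce it.
        # flush=True pushes the line through even when stdout is a pipe.
        print(path, file=out, flush=True)


def read_batch(stream: TextIO) -> Iterator[str]:
    """Hypothetical consumer: yield each announced filename, one per line."""
    for line in stream:
        name = line.rstrip("\n")
        if name:  # ignore blank lines
            yield name
```

The flush after each name matters. When stdout is a pipe it is normally block-buffered, so without the flush several names could be held back and the consumer would sit idle until the buffer filled or the batch ended.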

Try gscan2pdf, which combines scanimage (or the SANE API directly,
which is more efficient) with tesseract (or cuneiform), all packed up
in a nice GUI.

Regards

Jeff
