[sane-devel] Need Document Scanning on Linux for Kodak i40, i800

Roger Price rprice at cs.uml.edu
Wed Apr 26 07:25:30 UTC 2006

On Tue, 25 Apr 2006, rcjohnson at openvotingsolutions.com wrote:

> Please advise me as to availability of SANE with capability of scanning 
> documents to produce XML,

Hello Richard,

Producing a text file is a function of the software which comes bundled 
with the scanner rather than of the scanner itself.  SANE does not itself 
provide OCR, but a frontend such as XSane can call gocr to produce a text 
file.  As of version 0.3.5, gocr supported the output "formats" ISO8859_1, 
TeX, HTML and UTF8.  It would probably be better to call these "character 
encodings" rather than formats.  http://jocr.sourceforge.net (Note the j.)

My experience with gocr is that the text file requires human review and 
correction to be usable.  Commercial OCR does better but will never be 
100% accurate.

When you say "produce XML", do you mean "produce a valid marked-up 
document according to a given DTD"?
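
The question matters because OCR output is flat text, and any XML 
structure has to be chosen and added afterwards.  A minimal sketch in 
Python, assuming the OCR text is already available as a string; the 
element names (document, page, line) and the function text_to_xml are 
hypothetical and do not come from any real DTD:

```python
import xml.etree.ElementTree as ET


def text_to_xml(ocr_text, page_number=1):
    """Wrap flat OCR text in a hypothetical document/page/line structure."""
    root = ET.Element("document")
    page = ET.SubElement(root, "page", number=str(page_number))
    for raw in ocr_text.splitlines():
        stripped = raw.strip()
        if not stripped:
            continue  # skip blank lines in the OCR output
        line = ET.SubElement(page, "line")
        line.text = stripped  # &, < and > are escaped on serialization
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(text_to_xml("First line\n\nSecond line & more\n"))
```

The result is well-formed XML only.  Whether it is valid against a given 
DTD is a separate question, and would need a validating parser and a 
structure that matches that DTD.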


