[sane-devel] Need Document Scanning on Linux for Kodak i40, i800
Roger Price
rprice at cs.uml.edu
Wed Apr 26 07:25:30 UTC 2006
On Tue, 25 Apr 2006, rcjohnson at openvotingsolutions.com wrote:
> Please advise me as to availability of SANE with capability of scanning
> documents to produce XML,
Hello Richard,
Producing a text file is a function of the software which comes bundled
with the scanner rather than of the scanner itself. SANE does not itself
provide OCR, but a frontend such as XSane can call gocr to produce a text
file. At level 0.3.5, gocr supported the output "formats" ISO8859_1, TeX,
HTML and UTF8. It would probably be better to call these "character
encodings" rather than formats. http://jocr.sourceforge.net (Note the j.)
My experience with gocr is that the text file requires human review and
correction to be usable. Commercial OCR does better but will never be
100% accurate.
When you say "produce XML", do you mean "produce a valid marked-up
document according to a given DTD"?
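If well-formed XML around the recognized text would be enough, here is a
rough Python sketch of the idea. It assumes gocr is installed and on the
PATH, and that it writes the recognized text to standard output when no
output file is given. The <document> and <line> element names and the
file name page.pnm are made up for illustration; they do not come from
any DTD, so the result is well-formed but not valid against one.

```python
import subprocess
import xml.etree.ElementTree as ET


def gocr_command(image_path, fmt="UTF8"):
    """Build a gocr command line: -f selects the output format, -i the input image."""
    return ["gocr", "-f", fmt, "-i", image_path]


def run_gocr(image_path, fmt="UTF8"):
    """Run gocr (assumed to be installed) and return the text it prints on stdout."""
    result = subprocess.run(
        gocr_command(image_path, fmt),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if gocr exits with an error
    )
    return result.stdout


def text_to_xml(text, source):
    """Wrap OCR text in a hypothetical <document>/<line> structure, one element per line."""
    root = ET.Element("document", {"source": source})
    for number, line in enumerate(text.splitlines(), start=1):
        ET.SubElement(root, "line", {"n": str(number)}).text = line
    # ElementTree escapes characters such as < and & in the text for us.
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # page.pnm is a placeholder for a scanned page saved by the frontend.
    print(text_to_xml(run_gocr("page.pnm"), "page.pnm"))
```

The OCR step and the markup step are kept separate so that the text can
be reviewed and corrected by hand before it is wrapped.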
Roger