[sane-devel] scanning for archival and OCR
Jeremy Johnson
jeremy at acjlaw.net
Tue Jan 22 18:24:51 UTC 2013
The Perl application gscan2pdf will probably do what you need:
http://gscan2pdf.sourceforge.net/
I use a shell script, "bscan", to scan to pnm and then convert to e.g. pdf.
Since my scanner scans better in 8-bit grayscale than in 2-bit B&W,
I scan in 8-bit grayscale at 300 dpi and then convert to bitonal black and white
using djvu wavelet compression (option -BW in my script):
bscan --mode=8-bit --shades=2 --page=Legal --comp=lzw -BW FILE
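For anyone without the script, the same idea (grayscale scan, then reduce to bitonal, then encode as DjVu) can be roughly sketched with stock tools. This is not bscan itself and I am not claiming it is what -BW does internally. It assumes scanimage from sane-utils, ImageMagick's convert, and cjb2 from DjVuLibre are installed. The function name bitonal_plan, the file names, and the fixed 50% threshold are made up for illustration, and the mode name "Gray" depends on your SANE backend.

```shell
#!/bin/sh
# Sketch only: grayscale scan -> bitonal PBM -> DjVu. Not the bscan script.
# Assumed tools: scanimage (sane-utils), convert (ImageMagick), cjb2 (DjVuLibre).

# Print the commands for one page; pipe the output to sh to actually run them.
# BASE (hypothetical name) must be a plain file name without spaces or
# shell metacharacters, because the commands are emitted as text.
bitonal_plan() {
    base=$1
    dpi=${2:-300}
    # "Gray" is a common mode name, but mode names vary by SANE backend.
    echo "scanimage --mode Gray --resolution $dpi --format=pnm > $base.pgm"
    # Fixed 50% threshold (an assumption); adjust for faint or dark originals.
    echo "convert $base.pgm -threshold 50% $base.pbm"
    # cjb2 is the DjVuLibre encoder for bitonal images.
    echo "cjb2 -dpi $dpi $base.pbm $base.djvu"
}

bitonal_plan page1
```

Running it as shown only prints the three commands, so you can check them before touching the scanner. To execute them, use: bitonal_plan page1 | sh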
Sometimes I may need to use a photo scanner with high optical resolution (e.g.
an Epson with 24-bit grayscale). If I need to scan in color, I usually scan to
pnm then convert to djvu using c44, e.g.:
bscan --mode=color --shades=truecolor --page=Letter -c44 --djvutopdf=25 FILE
http://www.acjlaw.net:8080/~jeremy/Ricoh/usage_bscan.html
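The color path (scan to pnm, encode with c44, then turn the djvu into a pdf) can be sketched the same way. Again this is only an approximation with stock tools, not the bscan script: it assumes scanimage, plus c44 and ddjvu from DjVuLibre. The function name color_plan and the file names are hypothetical, "Color" is a backend-dependent mode name, and I have not tried to reproduce whatever the 25 in --djvutopdf=25 controls.

```shell
#!/bin/sh
# Sketch only: color scan -> DjVu (c44) -> PDF. Not the bscan script.
# Assumed tools: scanimage (sane-utils), c44 and ddjvu (DjVuLibre).

# Print the commands for one page; pipe the output to sh to actually run them.
# BASE (hypothetical name) must be a plain file name without spaces or
# shell metacharacters, because the commands are emitted as text.
color_plan() {
    base=$1
    dpi=${2:-300}
    # "Color" is a common mode name, but mode names vary by SANE backend.
    echo "scanimage --mode Color --resolution $dpi --format=pnm > $base.ppm"
    # c44 is the DjVuLibre wavelet encoder for color and grayscale images.
    echo "c44 -dpi $dpi $base.ppm $base.djvu"
    # ddjvu decodes the DjVu file and writes it back out as a PDF.
    echo "ddjvu -format=pdf $base.djvu $base.pdf"
}

color_plan photo1
```

As written this just prints the commands. To execute them, use: color_plan photo1 | sh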
I haven't had much luck with any of the open source OCR programs. At best I
get about 90% accuracy, and only on straight B/W text with no logos, rules, or
underlines, where all text is horizontal and of the same font weight and shape.