[sane-devel] scanning for archival and OCR
David H. Durgee
dhdurgee at comcast.net
Wed Jan 23 21:22:17 UTC 2013
After reading the responses on this thread I decided to try something
out. I picked an old, 3-page telephone bill on 10.3 cm x 17 cm pages.
I scanned this bill with my MG2120 at 1200 dpi in full color and saved
the pnm files from xsane for later processing. Each file is 114,489K in
size, as these are high-res full-color scans. I then opened each of
these in GIMP, increased contrast by 30 points and saved the result as a
jpg file with a quality setting of 0 to maximize compression. This
produced files of 310-340K each. I then used convert to create pdf
versions of them. These pdfs range from 312K to 342K in size. I then
used pdftk to combine them into a single 989K pdf file. Interestingly,
using the gs command as per another post in this thread created a single
2,969K pdf file!
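As a rough sanity check on the 114,489K figure, the raw size of one page
can be estimated from the page dimensions and scan resolution. This is
only a sketch under stated assumptions: 24-bit RGB (3 bytes per pixel),
"K" meaning 1024 bytes, pixel counts rounded to the nearest whole pixel,
and the few bytes of pnm header ignored. The function name is made up
for illustration.

```python
CM_PER_INCH = 2.54

def raw_scan_bytes(width_cm, height_cm, dpi, bytes_per_pixel=3):
    """Estimate the uncompressed size of a scan, ignoring any file header."""
    # Assumption: the scanner frontend rounds to the nearest whole pixel.
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return width_px * height_px * bytes_per_pixel

size = raw_scan_bytes(10.3, 17, 1200)
# Assumption: "K" in the post means 1024 bytes.
print(f"about {size / 1024:,.0f}K per page")
```

Under those assumptions the estimate lands on the reported file size,
which suggests the pnm files are plain uncompressed 24-bit images.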
The page images resulting from this process are very readable, although
bleed-through from printing on the back of each page results in some
strange variations in the background shading. If anyone has suggestions
for alternative processing of the page images that might produce better
results, I will test them on this bill. I might also try a
lower-resolution scan. I imagine that will result in smaller files, but
I suspect their readability may be impaired.
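For a feel of how much a lower resolution might save, note that the
pixel count, and so the uncompressed size, scales with the square of the
resolution. The sketch below only covers the raw pnm size, taking the
114,489K figure at 1200 dpi as the reference; the resulting jpg and pdf
sizes depend on the page content and need not scale the same way. The
function name and the list of resolutions are illustrative assumptions.

```python
REFERENCE_DPI = 1200
REFERENCE_RAW_K = 114_489  # uncompressed pnm size reported for one page

def estimated_raw_k(dpi):
    """Scale the reference raw size by the square of the resolution ratio."""
    return REFERENCE_RAW_K * (dpi / REFERENCE_DPI) ** 2

# Hypothetical candidate resolutions; none of these were tested in the post.
for dpi in (1200, 600, 300):
    print(f"{dpi:>4} dpi: about {estimated_raw_k(dpi):,.0f}K uncompressed")
```

Halving the resolution cuts the raw data to a quarter, so most of the
saving comes from the first reduction or two.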
Dave