[sane-devel] scanning for archival and OCR

Wed Jan 23 21:22:17 UTC 2013

After reading the responses on this thread I decided to try something 
out.  I picked an old 3-page telephone bill with 10.3 cm x 17 cm pages.
I scanned this bill with my MG2120 at 1200 dpi full color and saved the 
pnm files from xsane for later processing.  Each file is 114,489K in 
size as these are high-res full-color scans.  I then opened each of 
these in GIMP, increased the contrast by 30 points, and saved the results in a 
jpg file with quality setting of 0 to maximize compression.  This 
produced files of 310-340K each.  I then used convert to create pdf 
versions of them.  These pdfs range from 312K to 342K in size.  I then 
used pdftk to create a single 989K pdf file.  Interestingly, using the 
gs command as per another post in this thread created a single 2,969K 
pdf file!
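
As a rough sanity check on the sizes above, here is a minimal Python sketch 
that estimates the uncompressed size of one page from the page dimensions 
and scan resolution.  It assumes 8 bits per channel RGB, that "K" means 
KiB, and that the scan area matches the page exactly; the function name 
is made up for illustration and the small PNM header is ignored.

```python
# Estimate the raw (uncompressed) size of one scanned page.
# Assumptions: 3 channels at 8 bits each, scan area equals the page size,
# "K" in the post means KiB, PNM header bytes are ignored.
CM_PER_INCH = 2.54


def raw_scan_bytes(width_cm, height_cm, dpi, channels=3):
    """Return the estimated number of bytes of raw pixel data."""
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return width_px * height_px * channels


raw_kib = raw_scan_bytes(10.3, 17.0, 1200) / 1024
print(f"estimated raw size per page: {raw_kib:,.0f} KiB")

# Compare against the JPEG sizes reported in the post (310-340K).
for jpeg_kib in (310, 340):
    ratio = raw_kib / jpeg_kib
    print(f"{jpeg_kib} KiB JPEG -> roughly {ratio:.0f}:1 compression")
```

Under these assumptions the estimate lands close to the 114,489K reported 
for each pnm file, which suggests the scans are plain 24-bit RGB with no 
compression.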

The page images resulting from this process are very readable, although 
bleed-through from printing on the back of each page results in some 
strange variations in the background shading.  If anyone has suggestions 
for alternative processing of the page images that might produce better 
results, I will test them on this bill.  I might also try a lower 
resolution scan; I imagine that will result in smaller files, but I 
suspect their readability may be impaired.
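
To get a feel for what a lower resolution would save, the sketch below 
estimates the raw pixel data per page at a few common resolutions.  It 
assumes 24-bit RGB and the 10.3 cm x 17 cm page size mentioned above; the 
helper name and the list of resolutions are only illustrative.  Raw size 
grows with the square of the resolution, but the final JPEG size also 
depends on the page content, so this says nothing certain about the 
compressed files or their readability.

```python
# Estimate raw pixel data per page at several scan resolutions.
# Assumptions: 24-bit RGB, scan area equal to the 10.3 cm x 17 cm page.
CM_PER_INCH = 2.54


def raw_kib(width_cm, height_cm, dpi, channels=3):
    """Return the estimated raw image size in KiB."""
    width_px = round(width_cm / CM_PER_INCH * dpi)
    height_px = round(height_cm / CM_PER_INCH * dpi)
    return width_px * height_px * channels / 1024


sizes = {dpi: raw_kib(10.3, 17.0, dpi) for dpi in (300, 600, 1200)}
for dpi, kib in sizes.items():
    # Show each size relative to the 1200 dpi scan.
    share = kib / sizes[1200]
    print(f"{dpi:>5} dpi: {kib:>10,.0f} KiB raw ({share:.1%} of 1200 dpi)")
```

Halving the resolution cuts the raw data to about a quarter, so a 300 dpi 
scan holds roughly one sixteenth of the pixels of the 1200 dpi scan.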

Dave