[sane-devel] scanning for archival and OCR

Jeremy Johnson jeremy at acjlaw.net
Wed Jan 23 22:57:26 UTC 2013

Generally, 1200 dpi resolution for text would be overkill unless you have a 
document with extremely tiny print (1-2 point instead of 10-12 point).

The usual recommendation used to be 150 dpi or even 75 dpi for scanning 
documents containing just plain text. I scan at 300 dpi and ordinarily print 
at 300 dpi as well, which for me is adequate quality for plain text documents. 

My photo scanner can scan at 2400 dpi but my printer can only print at a max 
of 1200 dpi resolution. Scanning at a higher resolution than the one you 
print at can be useful for enlarging a portion of the image. Otherwise it's 
probably just a waste of disk space.
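
To put a number on the enlargement point, here is a rough Python sketch. It 
assumes one scanned pixel maps to one printed dot, which is a simplification 
(printers halftone), and the function names are made up for illustration.

```python
def max_enlargement(scan_dpi, print_dpi):
    """Largest linear magnification that still supplies one scanned
    pixel per printed dot (simplifying assumption: 1 pixel = 1 dot)."""
    return scan_dpi / print_dpi


def printed_size_in(region_in, scan_dpi, print_dpi):
    """Printed length in inches of a scanned region that is region_in
    inches long, when every scanned pixel is printed as one dot."""
    return region_in * scan_dpi / print_dpi


# A 2400 dpi scan printed at 1200 dpi can be enlarged 2x.
print(max_enlargement(2400, 1200))       # 2.0
# A 2-inch detail from that scan fills 4 inches on paper.
print(printed_size_in(2.0, 2400, 1200))  # 4.0
# Scanning and printing at the same resolution leaves no headroom.
print(max_enlargement(300, 300))         # 1.0
```

If you never enlarge, anything above a factor of 1 is extra data the printer 
cannot use.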

Similarly, I ordinarily wouldn't scan bills, invoices, receipts, etc. in 
color, since a black-and-white 1-bit image would suffice for my needs. If someone 
were to ask me for a copy of a receipt or check, even a G3 fax would probably 
be good enough.

If I have a document with pages mixing text and color graphics/photos, I 
ordinarily scan at full color depth and use djvu wavelet compression, which 
generates reasonably small file sizes without sacrificing too much text clarity.

A typical grayscale scan of a black-and-white letter-sized document would 
result, after binarization, in a PDF file size of 30-40K per page (depending 
on line spacing, line length, text size, text weight, etc.).
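
For anyone unsure what binarization does, here is a minimal Python sketch. It 
uses a fixed threshold of 128, which is my assumption for illustration; it is 
not necessarily what any particular scanning tool does, and real tools often 
pick the threshold adaptively. The output is a binary PBM (P4) image, where a 
1 bit means black and each row is packed most significant bit first.

```python
def binarize(gray_rows, threshold=128):
    """Map 8-bit gray values to 1-bit: 1 = black (ink), 0 = white (paper)."""
    return [[1 if px < threshold else 0 for px in row] for row in gray_rows]


def pack_row(bits):
    """Pack a row of 0/1 values into bytes, most significant bit first."""
    out = bytearray()
    for i in range(0, len(bits), 8):
        chunk = bits[i:i + 8]
        chunk = chunk + [0] * (8 - len(chunk))  # pad the last byte with white
        byte = 0
        for b in chunk:
            byte = (byte << 1) | b
        out.append(byte)
    return bytes(out)


def to_pbm(gray_rows, threshold=128):
    """Return the bytes of a binary PBM (P4) file for the binarized image."""
    bits = binarize(gray_rows, threshold)
    height = len(bits)
    width = len(bits[0]) if bits else 0
    header = f"P4\n{width} {height}\n".encode("ascii")
    return header + b"".join(pack_row(row) for row in bits)


# One row of ten pixels alternating black and white.
sample = [[0, 255, 0, 255, 0, 255, 0, 255, 0, 255]]
print(to_pbm(sample))  # b'P4\n10 1\n\xaa\x80'
```

Each pixel drops from 8 bits to 1, so the raw data shrinks by a factor of 8 
before any compression is applied.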

A typical color scan with djvu wavelet compression would be about 10 times as 
large (again depending on the mix of text and graphics).
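
To see what those sizes mean in terms of compression, here is a 
back-of-the-envelope calculation in Python. It assumes a US Letter page 
(8.5 x 11 in) scanned at 300 dpi, takes 35K and 350K as midpoints of the 
figures above with K = 1024 bytes, and ignores row padding and file headers. 
The function names are made up for illustration.

```python
PAGE_W_IN, PAGE_H_IN = 8.5, 11.0  # US Letter, in inches (assumption)


def raw_bytes(dpi, bits_per_pixel):
    """Uncompressed size of a full-page scan, ignoring row padding."""
    width_px = round(PAGE_W_IN * dpi)
    height_px = round(PAGE_H_IN * dpi)
    return width_px * height_px * bits_per_pixel // 8


def compression_ratio(raw, compressed):
    """How many times smaller the stored file is than the raw pixels."""
    return raw / compressed


bilevel_raw = raw_bytes(300, 1)   # 2550 x 3300 pixels, 1 bit each
color_raw = raw_bytes(300, 24)    # same pixels, 24 bits each

print(bilevel_raw)  # 1051875 bytes, about 1 MB
print(color_raw)    # 25245000 bytes, about 24 MB

# Implied ratios for the assumed 35K and 350K midpoints.
print(round(compression_ratio(bilevel_raw, 35 * 1024)))  # roughly 29
print(round(compression_ratio(color_raw, 350 * 1024)))   # roughly 70
```

So under these assumptions the color file is about 10 times larger on disk 
even though the raw color data is 24 times larger.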
