[sane-devel] scanning for archival and OCR
jeremy at acjlaw.net
Wed Jan 23 16:02:24 UTC 2013
Hmmm, I guess I learn something new every day.
I wouldn't have suspected that ghostscript's PDF writer could concatenate PDFs
and save so much space through compression.
So I just did a test: I scanned some tax forms in 8-bit grayscale
to z0001.pdf through z0019.pdf using xsane,
and then combined them using both gs and pdftk.
# Total size of the individual scans
$ du -csh z00??.pdf
# Now concatenate using pdftk
$ pdftk z00??.pdf cat output PDFTK.pdf
$ ls -sh PDFTK.pdf
# Concatenate using ghostscript's re-write
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=GS.pdf z00??.pdf
$ ls -sh GS.pdf
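
To repeat the comparison on other scans, the two commands above can be wrapped in a small script. This is only a minimal sketch: the helper names `size_bytes` and `merge_compare` and the output file naming are made up for illustration, and it assumes `pdftk` and `gs` are on the PATH (it skips whichever one is missing). It prints sizes in bytes and makes no claim about which tool will win on a given set of files.

```shell
#!/bin/sh
# Sketch: merge the same inputs with pdftk and with ghostscript,
# then print the size of each result.
# size_bytes and merge_compare are hypothetical helper names.

# Print a file's size in bytes (tr strips padding some wc versions add).
size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Usage: merge_compare OUT_PREFIX input1.pdf input2.pdf ...
merge_compare() {
    prefix=$1
    shift

    # Plain concatenation with pdftk.
    if command -v pdftk >/dev/null 2>&1; then
        pdftk "$@" cat output "${prefix}-pdftk.pdf" &&
            echo "pdftk: $(size_bytes "${prefix}-pdftk.pdf") bytes"
    else
        echo "pdftk: not installed, skipped"
    fi

    # Re-write through ghostscript's pdfwrite device.
    if command -v gs >/dev/null 2>&1; then
        gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite \
            -sOutputFile="${prefix}-gs.pdf" "$@" &&
            echo "gs: $(size_bytes "${prefix}-gs.pdf") bytes"
    else
        echo "gs: not installed, skipped"
    fi
}
```

After sourcing the script, `merge_compare merged z00??.pdf` would write merged-pdftk.pdf and merged-gs.pdf and report both sizes.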
Of course, pdftk allows mixing paper sizes. Ghostscript's writer will truncate
pages which are larger than the default or specified page size. I'm not sure
whether ghostscript can write PDFs with mixed paper sizes.
For good measure, I also tried pdfjam/pdfjoin/pdflatex, and it too just
concatenates the PDFs into a 16M file.