[sane-devel] scanning for archival and OCR
Jeremy Johnson
jeremy at acjlaw.net
Wed Jan 23 16:02:24 UTC 2013
Hmmm, I guess I learn something new every day.
I wouldn't have suspected that Ghostscript's PDF writer could concatenate PDFs and
save so much space through compression.
So I just did a test, scanning some tax forms
in 8-bit grayscale to z0001.pdf through z0019.pdf using xsane
and then combining them using both gs and pdftk.
The results:
$ du -csh z00??.pdf
388K z0001.pdf
1.2M z0002.pdf
1.3M z0003.pdf
1.1M z0004.pdf
1.6M z0005.pdf
892K z0006.pdf
724K z0007.pdf
908K z0008.pdf
1.3M z0009.pdf
728K z0010.pdf
556K z0011.pdf
196K z0012.pdf
1.4M z0013.pdf
196K z0014.pdf
580K z0015.pdf
472K z0016.pdf
376K z0017.pdf
920K z0018.pdf
1.3M z0019.pdf
16M total
# Now concatenate using pdftk
$ pdftk z00??.pdf cat output PDFTK.pdf
$ ls -sh PDFTK.pdf
16M PDFTK.pdf
# Concatenate using ghostscript's re-write
$ gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=GS.pdf z00??.pdf
$ ls -sh GS.pdf
8.5M GS.pdf
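
To put a number on the difference, here is a minimal sketch that compares the two outputs by byte count rather than by the rounded `ls -sh` figures. The helper names `size_bytes` and `savings_percent` are made up for illustration and are not part of gs or pdftk; the sketch only assumes PDFTK.pdf and GS.pdf exist in the current directory.

```shell
#!/bin/sh
# Hypothetical helper: size of a file in bytes (wc -c is portable).
size_bytes() {
    wc -c < "$1" | tr -d ' '
}

# Hypothetical helper: percentage saved going from $1 bytes down to $2 bytes.
# Integer arithmetic, so the result is truncated to a whole percent.
savings_percent() {
    echo $(( ( $1 - $2 ) * 100 / $1 ))
}

# Only compare when both files are present and the pdftk one is non-empty,
# which also avoids dividing by zero.
if [ -s PDFTK.pdf ] && [ -f GS.pdf ]; then
    before=$(size_bytes PDFTK.pdf)
    after=$(size_bytes GS.pdf)
    echo "GS.pdf is $(savings_percent "$before" "$after")% smaller than PDFTK.pdf"
fi
```

Going by the rounded sizes above (16M versus 8.5M), the saving should come out at roughly 47%.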
Of course, pdftk allows mixing papersizes. Ghostscript's writer will truncate
pages which are larger than the default or specified pagesize. Not sure if
ghostscript can write pdfs with mixed papersizes.
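
One way to check what actually came out is to count the distinct page sizes in the result. This is a rough sketch that assumes poppler's pdfinfo is installed and that it prints one line per page of the form "Page    1 size: 612 x 792 pts (letter)" when given a page range with -f and -l. The function name `count_page_sizes` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read pdfinfo-style output on stdin and print how many
# distinct "width height" pairs appear in the per-page size lines.
# Assumed line format: "Page    N size: W x H pts ..."
count_page_sizes() {
    awk '/^Page +[0-9]+ size:/ { print $4, $6 }' | sort -u | wc -l | tr -d ' '
}

# Run the check only if pdfinfo is available and the file exists.
# Pages 1 to 19 match the nineteen scans in the test above.
if command -v pdfinfo >/dev/null 2>&1 && [ -f GS.pdf ]; then
    n=$(pdfinfo -f 1 -l 19 GS.pdf | count_page_sizes)
    echo "GS.pdf has $n distinct page size(s)"
fi
```

A count above 1 would mean the file does hold mixed page sizes. Running the same check on PDFTK.pdf gives a baseline to compare against.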
For good measure, I also tried pdfjam/pdfjoin/pdflatex and it too just
concatenates the pdfs into a 16M file.