[sane-devel] archiving old documents

Tue Feb 22 15:48:25 UTC 2011

On Monday, February 21, 2011, gobo wrote:
> hello,
> 
> for some time now, i've been storing documents using the following in
> scripts:
> 
> scanimage -x 215.9 -y 297 -pv --resolution=300 --mode=color -d
> hpaio:... > file.png
> convert -density 300 file.png file.ps
> ps2pdf file.ps
> 
> the png and ps files are deleted.  this has resulted in a reasonable
> compromise between storage requirements and the infrequent need to
> print the document.
> 
> now i wish to archive some old family documents that date back beyond
> 1930.  most is text, but there are a few b&w photos in the mix.
> storage space is not a concern.  my plans are to save the scanned
> images, but also build a pdf for distribution by email.
> 
> those of you who have already made such archives, what formats and
> resolutions have you found produce the best results should others
> wish to touch up or make other uses of the images?
> 
> 
> thanks.

If you plan on touching up photos, I think you should always work with the 
best original image in a lossless format. If you have to store your scans in a 
lossy format like jpeg, you definitely should always work with the original 
image because each time you edit/save the image, detail is forever lost.

I had some old family photos (prints, slides, film) that I wanted to preserve, 
so I scanned them at the best resolution my photo scanner would support. I 
then added the images to the genealogy program "Gramps". The program 
automatically made thumbnail images and printed reports with the images scaled 
to appropriate sizes, but the original images at the original resolution could 
always be extracted/saved from the Gramps database. One caveat with Gramps is 
that (the last time I checked) it does not support importing/exporting to the 
German GEDCOM extensions.

For handwritten letters, newspaper articles, etc., I usually scanned in 300 dpi 
grayscale, then binarized to black and white and converted to pdf. Other 
documents were scanned in color and likewise converted to pdf or djvu.
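
A rough sketch of that grayscale/binarize/pdf sequence as a shell function,
in case it helps. The function name, the file naming, and the 50% threshold
are my own placeholders, not settings from the original workflow; the mode
name "Gray" depends on the backend, and --output-file assumes a reasonably
recent scanimage (on older ones, redirect stdout instead). With ImageMagick 7
the command is "magick" rather than "convert". Set DRY_RUN=1 to print the
commands without touching the scanner.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# scan_text_page BASENAME
# Hypothetical helper: scans one page and leaves BASENAME-gray.tif
# (the lossless master), BASENAME-bw.tif and BASENAME.pdf behind.
scan_text_page() {
    base=$1
    # DEVICE must be supplied by the caller; "scanimage -L" lists the names.
    dev=${DEVICE:?set DEVICE to a name listed by scanimage -L}

    # 1. Grayscale scan at 300 dpi, saved as TIFF so nothing is lost.
    run scanimage -d "$dev" --mode=Gray --resolution=300 \
        --format=tiff --output-file="$base-gray.tif"

    # 2. Binarize: pixels above the threshold become white, the rest black.
    #    50% is only a starting point; faded paper usually needs tuning.
    run convert "$base-gray.tif" -threshold 50% "$base-bw.tif"

    # 3. Wrap the bilevel image in a PDF using fax-style Group4 compression.
    run convert -density 300 "$base-bw.tif" -compress Group4 "$base.pdf"
}
```

Usage would be something like "DEVICE=<your device> scan_text_page letter01".
Keeping the grayscale TIFF around means the threshold can be redone later
without rescanning.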

DjVu is highly recommended for colored/mixed text documents and usually 
results in significantly smaller files (roughly 10-100x) compared to pdf.
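
For anyone who wants to try DjVu, here is a minimal sketch using the
DjVuLibre command-line tools: cjb2 for bitonal pages, c44 for color or
grayscale pages, and djvm to bundle the pages into one document. The
function name and the 300 dpi value are assumptions for illustration. Note
that this encodes each page as a whole; it does not attempt the layered
text/background separation that mixed pages benefit from most, so the sizes
it gives may fall short of the figures above.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# make_djvu OUTPUT.djvu IMAGE...
# Hypothetical helper: encodes each image as a one-page DjVu file, then
# bundles the pages in the order given.
make_djvu() {
    out=$1
    shift
    for img in "$@"; do
        page=${img%.*}.djvu
        case $img in
            # Bitonal PBM input: JB2 encoder.
            *.pbm) run cjb2 -dpi 300 "$img" "$page" ;;
            # Color or grayscale PNM/JPEG input: IW44 wavelet encoder.
            *)     run c44 -dpi 300 "$img" "$page" ;;
        esac
        # Swap the image for its page file in the argument list. The loop
        # word list was expanded up front, so this is safe to do here.
        set -- "$@" "$page"
        shift
    done
    # Create a bundled multi-page document from the page files.
    run djvm -c "$out" "$@"
}
```

For example, "make_djvu album.djvu letter.pbm photo.ppm" would produce
letter.djvu and photo.djvu and then bundle them into album.djvu.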

Will you be storing the images/pdfs as individual files in a compressed archive 
(zipfile, tarball), or will you be concatenating the pdf files into a larger 
document? If the latter, then tools such as pdftk or pdfjam might be useful, 
but for anything more complicated I would use latex with the pdfpages and 
attachfile packages.
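
For the simple concatenation case, a minimal sketch with pdftk follows. The
function name and file names are made up for illustration. Set DRY_RUN=1 to
see the command without running it.

```shell
#!/bin/sh
# Print the command instead of running it when DRY_RUN=1.
run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

# join_pdfs OUTPUT.pdf INPUT.pdf...
# Hypothetical helper: concatenates the inputs, in the order given,
# into a single pdf.
join_pdfs() {
    out=$1
    shift
    run pdftk "$@" cat output "$out"
    # Roughly equivalent with pdfjam:
    #   pdfjam "$@" --outfile "$out"
}
```

For example, "join_pdfs family.pdf letter01.pdf letter02.pdf" writes both
letters into family.pdf and leaves the input files untouched.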