[Python-apps-team] Bug#646605: ocrfeeder: too much memory required when opening multipage documents

Tue Oct 25 16:10:43 UTC 2011

Package: ocrfeeder
Version: 0.7.6-1
Severity: normal

Dear Maintainer,
I tried ocrfeeder with a 26-page document and found it hard to
use. When I imported the directory containing my images (300 dpi
greyscale A4 scans, 216 MByte in total as pnm), memory usage kept
rising. I have 2 GByte of RAM; ocrfeeder filled about 80% of it, and
the system spent minutes waiting for the hard disk. My impression is
that ocrfeeder loads all images into memory at the same time and
decompresses them there.
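
The reported file size can be cross-checked with a rough estimate. The
sketch below assumes exact A4 dimensions (8.27 x 11.69 inches) and one
byte per greyscale pixel. The function names are hypothetical, and
nothing here reflects how ocrfeeder actually stores images internally.

```python
# Back-of-the-envelope estimate of raw image size for scanned A4 pages.
# Assumption: exact A4 size in inches, uncompressed pixels.
A4_INCHES = (8.27, 11.69)


def page_bytes(dpi=300, channels=1):
    """Bytes needed for one uncompressed A4 page at the given resolution."""
    width = round(A4_INCHES[0] * dpi)
    height = round(A4_INCHES[1] * dpi)
    return width * height * channels


def total_mib(pages, dpi=300, channels=1):
    """Total size in MiB for a number of uncompressed pages."""
    return pages * page_bytes(dpi, channels) / 2**20


if __name__ == "__main__":
    print("greyscale: %.0f MiB" % total_mib(26))
    print("RGB:       %.0f MiB" % total_mib(26, channels=3))
```

Under these assumptions, 26 pages come to about 216 MiB as 8-bit
greyscale and three times that if every page were expanded to 24-bit
RGB, so holding all pages in memory at once, possibly in more than one
copy, could plausibly account for the usage observed.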

I think an "ocr feeder" should handle multipage documents better
than this. For example, gscan2pdf can import such documents
without bringing my system to a near halt.
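
One way to keep memory bounded is to read pages one at a time instead
of all at once. The following is only a minimal sketch of that idea
using the Python standard library. The names iter_pages and
process_directory are hypothetical, and this is not how ocrfeeder or
gscan2pdf is known to work.

```python
import os


def iter_pages(directory, extensions=(".pnm", ".pgm", ".ppm")):
    """Yield (filename, raw bytes) for one image at a time, in name order."""
    for name in sorted(os.listdir(directory)):
        if name.lower().endswith(extensions):
            path = os.path.join(directory, name)
            with open(path, "rb") as handle:
                yield name, handle.read()


def process_directory(directory, handle_page):
    """Call handle_page for each page and return the number of pages seen.

    Only the page currently being handled is referenced here, so earlier
    pages can be freed unless handle_page itself keeps them.
    """
    count = 0
    for name, data in iter_pages(directory):
        handle_page(name, data)
        count += 1
    return count
```

With this pattern, peak memory depends on the largest single page
rather than on the size of the whole document, provided the callback
does not retain the data it is given.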

Thanks for your work!

Michael Below

-- System Information:
Debian Release: wheezy/sid
  APT prefers testing
  APT policy: (900, 'testing'), (500, 'stable-updates'), (500, 'proposed-updates'), (500, 'stable'), (10, 'unstable')
Architecture: amd64 (x86_64)

Kernel: Linux 3.0.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=de_DE.UTF-8, LC_CTYPE=de_DE.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages ocrfeeder depends on:
ii  cuneiform            1.1.0+dfsg-1
ii  ghostscript          9.04~dfsg-2 
ii  gocr                 0.48-1      
ii  python               2.7.2-9     
ii  python-enchant       1.6.5-2     
ii  python-gnome2        2.28.1-3    
ii  python-gtk2          2.24.0-2    
ii  python-gtkspell      2.25.3-10.1 
ii  python-imaging-sane  1.1.7-4     
ii  python-pygoocanvas   0.14.1-1+b3 
ii  python-reportlab     2.5-1.1     
ii  python2.6            2.6.7-3     
ii  python2.7            2.7.2-5     
ii  tesseract-ocr        2.04-2.1    

Versions of packages ocrfeeder recommends:
ii  unpaper  0.3-1             
ii  yelp     2.30.1+webkit-1+b1

ocrfeeder suggests no packages.

-- no debconf information