[Python-apps-team] Bug#906242: cannot export OCR'ed Russian text

Dmitry Eremin-Solenikov dbaryshkov at gmail.com
Wed Aug 15 23:08:54 BST 2018


Package: ocrfeeder
Version: 0.8.1-4
Severity: important

After ocrfeeder has successfully OCR'ed Russian text, it is unable to
export it to any of the formats and dumps the following errors to the
console:

Export to ODT
=================
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 284, in exportToOdt
    self.exportToFormat('ODT', 'ODT')
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 281, in exportToFormat
    name)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/widgetModeler.py", line 605, in exportPagesWithGenerator
    document_generator.addPage(page)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 293, in addPage
    self.addBoxes(page_data.data_boxes)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 78, in addBoxes
    self.addBox(data_box)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 66, in addBox
    self.addText(data_box)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 251, in addText
    text = data_box.getText().decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
====================

Export to HTML
===================
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 298, in exportDialog
    self.EXPORT_FORMATS[format][1])
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 281, in exportToFormat
    name)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/widgetModeler.py", line 606, in exportPagesWithGenerator
    document_generator.save()
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 207, in save
    ''' % {'title': self.name, 'body': self.bodies[i], 'previous_page': previous_page, 'next_page': next_page}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 137: ordinal not in range(128)
====================

Export to TXT
====================
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 298, in exportDialog
    self.EXPORT_FORMATS[format][1])
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 281, in exportToFormat
    name)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/widgetModeler.py", line 605, in exportPagesWithGenerator
    document_generator.addPage(page)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 364, in addPage
    self.addText(page.getTextFromBoxes())
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 361, in addText
    self.text += unicode(newText, 'utf-8')
TypeError: decoding Unicode is not supported
====================
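
The three tracebacks above look consistent with a mix of unicode objects
and UTF-8 byte strings reaching documentGeneration.py. Under Python 2,
calling .decode('utf-8') on a unicode object first encodes it implicitly
with the ascii codec (the UnicodeEncodeError in the ODT case), and
unicode(obj, 'utf-8') rejects a unicode argument outright (the TypeError
in the TXT case). The HTML error looks like the reverse: a UTF-8 byte
string formatted together with unicode text. This is an assumption drawn
from the tracebacks only, not a confirmed diagnosis. A minimal sketch of
a guard follows; to_text is a hypothetical helper, not part of ocrfeeder:

```python
def to_text(value, encoding='utf-8'):
    """Return value as text, decoding only when it is a byte string.

    Hypothetical helper for illustration; not part of ocrfeeder.
    On Python 2, bytes is an alias of str, so unicode objects pass
    through untouched instead of being implicitly ASCII-encoded.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# "текст" written with escapes so the file needs no coding declaration.
russian = u'\u0442\u0435\u043a\u0441\u0442'

# Already-decoded text is returned unchanged.
assert to_text(russian) == russian

# UTF-8 encoded bytes are decoded exactly once.
assert to_text(russian.encode('utf-8')) == russian
```

If the assumption holds, passing the results of getText() and
getTextFromBoxes() through such a helper, rather than decoding them
unconditionally, might avoid the ODT and TXT errors; the HTML case
would additionally need self.name and self.bodies[i] to be of one
consistent type before formatting.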


-- System Information:
Debian Release: buster/sid
  APT prefers testing
  APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 4.18.0-rc4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8), LANGUAGE=en_GB:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages ocrfeeder depends on:
ii  cuneiform             1.1.0+dfsg-7
ii  ghostscript           9.22~dfsg-2.1
ii  gir1.2-goocanvas-2.0  2.0.4-1
ii  gir1.2-gtk-3.0        3.22.30-2
ii  gir1.2-gtkspell3-3.0  3.0.9-2
ii  iso-codes             3.79-1
ii  python                2.7.15-3
ii  python-enchant        2.0.0-1
ii  python-gi             3.28.2-1+b1
ii  python-lxml           4.2.3-1
ii  python-pil            5.2.0-2
ii  python-reportlab      3.5.2-1
ii  python-sane           2.8.3-1+b2
ii  tesseract-ocr         4.00~git2844-607e8fd8-2

Versions of packages ocrfeeder recommends:
ii  unpaper  6.1-2+b2
pn  yelp     <none>

ocrfeeder suggests no packages.

-- no debconf information


