[Python-apps-team] Bug#906242: cannot export OCR'ed Russian text
Dmitry Eremin-Solenikov
dbaryshkov at gmail.com
Wed Aug 15 23:08:54 BST 2018
Package: ocrfeeder
Version: 0.8.1-4
Severity: important
After ocrfeeder has successfully OCR'ed Russian text, it is unable to
export it to any of the supported formats, dumping the following errors
to the console:
Export to ODT
=================
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 284, in exportToOdt
    self.exportToFormat('ODT', 'ODT')
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 281, in exportToFormat
    name)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/widgetModeler.py", line 605, in exportPagesWithGenerator
    document_generator.addPage(page)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 293, in addPage
    self.addBoxes(page_data.data_boxes)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 78, in addBoxes
    self.addBox(data_box)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 66, in addBox
    self.addText(data_box)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 251, in addText
    text = data_box.getText().decode('utf-8')
  File "/usr/lib/python2.7/encodings/utf_8.py", line 16, in decode
    return codecs.utf_8_decode(input, errors, True)
UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-1: ordinal not in range(128)
====================
Export to HTML
===================
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 298, in exportDialog
    self.EXPORT_FORMATS[format][1])
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 281, in exportToFormat
    name)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/widgetModeler.py", line 606, in exportPagesWithGenerator
    document_generator.save()
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 207, in save
    ''' % {'title': self.name, 'body': self.bodies[i], 'previous_page': previous_page, 'next_page': next_page}
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 137: ordinal not in range(128)
====================
Export to TXT
====================
Traceback (most recent call last):
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 298, in exportDialog
    self.EXPORT_FORMATS[format][1])
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/studioBuilder.py", line 281, in exportToFormat
    name)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/studio/widgetModeler.py", line 605, in exportPagesWithGenerator
    document_generator.addPage(page)
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 364, in addPage
    self.addText(page.getTextFromBoxes())
  File "/usr/lib/python2.7/dist-packages/ocrfeeder/feeder/documentGeneration.py", line 361, in addText
    self.text += unicode(newText, 'utf-8')
TypeError: decoding Unicode is not supported
====================
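
The ODT and TXT tracebacks look consistent with the OCR text already
being a unicode object by the time it reaches the export code: calling
.decode('utf-8') or unicode(..., 'utf-8') on a unicode object fails in
Python 2 exactly as shown. The HTML failure looks like the related case
of UTF-8 byte strings being mixed with unicode objects in one format
operation. A minimal sketch of a guard, assuming that diagnosis is right;
to_text is a hypothetical helper, not something that exists in ocrfeeder:

```python
def to_text(value, encoding='utf-8'):
    """Return value as a text string.

    Byte strings are decoded with the given encoding; values that are
    already text (unicode on Python 2, str on Python 3) pass through
    untouched, so the call is safe for either input type.
    """
    if isinstance(value, bytes):
        return value.decode(encoding)
    return value


# Both spellings of the same Cyrillic letter end up as the same text.
from_bytes = to_text(b'\xd0\x9f')   # UTF-8 encoded bytes
from_text = to_text(u'\u041f')      # already decoded
```

If something like this were applied wherever documentGeneration.py
currently decodes unconditionally, and to every value fed into the HTML
template, the three export paths would see only text strings. I have not
tested this against the ocrfeeder sources.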
-- System Information:
Debian Release: buster/sid
APT prefers testing
APT policy: (500, 'testing')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 4.18.0-rc4-amd64 (SMP w/4 CPU cores)
Locale: LANG=en_GB.utf8, LC_CTYPE=en_GB.utf8 (charmap=UTF-8), LANGUAGE=en_GB:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages ocrfeeder depends on:
ii cuneiform 1.1.0+dfsg-7
ii ghostscript 9.22~dfsg-2.1
ii gir1.2-goocanvas-2.0 2.0.4-1
ii gir1.2-gtk-3.0 3.22.30-2
ii gir1.2-gtkspell3-3.0 3.0.9-2
ii iso-codes 3.79-1
ii python 2.7.15-3
ii python-enchant 2.0.0-1
ii python-gi 3.28.2-1+b1
ii python-lxml 4.2.3-1
ii python-pil 5.2.0-2
ii python-reportlab 3.5.2-1
ii python-sane 2.8.3-1+b2
ii tesseract-ocr 4.00~git2844-607e8fd8-2
Versions of packages ocrfeeder recommends:
ii unpaper 6.1-2+b2
pn yelp <none>
ocrfeeder suggests no packages.
-- no debconf information