[Python-modules-team] Bug#968865: pdf2txt can't read tagged PDF: a bytes-like object is required, not 'str'
John Scott
jscott at posteo.net
Sat Aug 22 18:39:57 BST 2020
Package: python3-pdfminer
Version: 20200726-1
Severity: normal
X-Debbugs-Cc:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256
I've not used this package before, so I can't tell whether this was
introduced in the recent update. To see whether pdf2txt can help check
the accessibility of documents, I ran it in tag mode on one of my own
PDFs. The same failure can be reproduced with the PDF shipped in the
quilt package:
pdf2txt -t tag /usr/share/doc/quilt/quilt.pdf
<page id="0" bbox="0.000,0.000,612.000,792.000" rotate="0">Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 195, in <module>
    sys.exit(main())
  File "/usr/bin/pdf2txt", line 189, in main
    outfp = extract_text(**vars(A))
  File "/usr/bin/pdf2txt", line 57, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/usr/lib/python3/dist-packages/pdfminer/high_level.py", line 85, in extract_text_to_fp
    interpreter.process_page(page)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 895, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 908, in render_contents
    self.execute(list_value(streams))
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 933, in execute
    func(*args)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 802, in do_TJ
    self.device.render_string(self.textstate, seq, self.ncs,
  File "/usr/lib/python3/dist-packages/pdfminer/pdfdevice.py", line 159, in render_string
    self.outfp.write(utils.enc(text))
TypeError: a bytes-like object is required, not 'str'
Installing pdfminer-data doesn't make a difference.
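
The last frame of the traceback suggests that a str is being written to
an output stream opened in binary mode. That reading has not been
checked against the pdfminer source. The sketch below only reproduces
the same class of error with the standard library; write_text is a
hypothetical helper, not pdfminer code:

```python
import io


def write_text(outfp, text):
    """Hypothetical stand-in for the failing write; not pdfminer code."""
    outfp.write(text)


# A binary stream, standing in for an output file opened in "wb" mode.
binary_out = io.BytesIO()

error_message = None
try:
    # Passing a str to a binary stream raises TypeError.
    write_text(binary_out, "hello")
except TypeError as exc:
    error_message = str(exc)

# Encoding the text to bytes first is accepted by the same stream.
write_text(binary_out, "hello".encode("utf-8"))
written = binary_out.getvalue()

print(error_message)
print(written)
```

If that reading is right, a fix would need either to encode the text
before writing or to open the output in text mode; which of the two
fits pdfminer is for the maintainers to judge.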
- -- System Information:
Debian Release: bullseye/sid
APT prefers testing
APT policy: (500, 'testing'), (2, 'unstable'), (1, 'testing-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 5.7.0-2-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_USER, TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages python3-pdfminer depends on:
ii python3 3.8.2-3
ii python3-chardet 3.0.4-7
ii python3-cryptography 3.0-1
ii python3-sortedcontainers 2.1.0-2
Versions of packages python3-pdfminer recommends:
ii python3-crypto 2.6.1-13.1+b1
Versions of packages python3-pdfminer suggests:
pn pdfminer-data <none>
- -- no debconf information
-----BEGIN PGP SIGNATURE-----
iHUEARYIAB0WIQT287WtmxUhmhucNnhyvHFIwKstpwUCX0FYagAKCRByvHFIwKst
p8iPAPsFwOMclo2N31lGTjs9pa/oEFiw5WXCRYAquTVP7/ebQwD/T5rC85Nq+PtV
jGWE4Tou2WpDyRqWpqtsBSmyLlpt5gI=
=su45
-----END PGP SIGNATURE-----