[Python-modules-team] Bug#968865: pdf2txt can't read tagged PDF: a bytes-like object is required, not 'str'

John Scott jscott at posteo.net
Sat Aug 22 18:39:57 BST 2020


Package: python3-pdfminer
Version: 20200726-1
Severity: normal
X-Debbugs-Cc: 

-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

I haven't used this package before, so I can't tell whether this is a
regression from the recent update. To see whether pdf2txt can help
check the accessibility of documents, I ran it in tag mode on one of my
PDFs. The same failure occurs with the PDF shipped in the quilt package:

pdf2txt -t tag /usr/share/doc/quilt/quilt.pdf
<page id="0" bbox="0.000,0.000,612.000,792.000" rotate="0">Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 195, in <module>
    sys.exit(main())
  File "/usr/bin/pdf2txt", line 189, in main
    outfp = extract_text(**vars(A))
  File "/usr/bin/pdf2txt", line 57, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/usr/lib/python3/dist-packages/pdfminer/high_level.py", line 85, in extract_text_to_fp
    interpreter.process_page(page)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 895, in process_page
    self.render_contents(page.resources, page.contents, ctm=ctm)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 908, in render_contents
    self.execute(list_value(streams))
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 933, in execute
    func(*args)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfinterp.py", line 802, in do_TJ
    self.device.render_string(self.textstate, seq, self.ncs,
  File "/usr/lib/python3/dist-packages/pdfminer/pdfdevice.py", line 159, in render_string
    self.outfp.write(utils.enc(text))
TypeError: a bytes-like object is required, not 'str'

Installing pdfminer-data doesn't make a difference.
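
The error message suggests that the output stream is opened in binary
mode while the text passed to write() is still a str. The following is
only a minimal sketch of that failure mode and one possible way around
it, not pdfminer's actual code: io.BytesIO stands in for the output
file, and write_text with its codec parameter is a hypothetical helper
made up for illustration.

```python
import io


def write_text(outfp, text, codec="utf-8"):
    """Hypothetical helper: encode str before writing to a binary stream."""
    outfp.write(text.encode(codec))


# A binary stream stands in for the output file (an assumption).
binary_out = io.BytesIO()

error_message = None
try:
    # Writing a str to a binary stream raises TypeError.
    binary_out.write("tagged text")
except TypeError as exc:
    error_message = str(exc)

# Encoding the text first lets the write succeed.
write_text(binary_out, "tagged text")
result = binary_out.getvalue()
```

Whether the right fix is to encode at this call site or to open the
output in text mode is a question for the maintainers; the sketch only
shows why the write fails.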

- -- System Information:
Debian Release: bullseye/sid
  APT prefers testing
  APT policy: (500, 'testing'), (2, 'unstable'), (1, 'testing-debug'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.7.0-2-amd64 (SMP w/2 CPU threads)
Kernel taint flags: TAINT_USER, TAINT_FIRMWARE_WORKAROUND
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8), LANGUAGE not set
Shell: /bin/sh linked to /usr/bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages python3-pdfminer depends on:
ii  python3                   3.8.2-3
ii  python3-chardet           3.0.4-7
ii  python3-cryptography      3.0-1
ii  python3-sortedcontainers  2.1.0-2

Versions of packages python3-pdfminer recommends:
ii  python3-crypto  2.6.1-13.1+b1

Versions of packages python3-pdfminer suggests:
pn  pdfminer-data  <none>

- -- no debconf information

-----BEGIN PGP SIGNATURE-----

iHUEARYIAB0WIQT287WtmxUhmhucNnhyvHFIwKstpwUCX0FYagAKCRByvHFIwKst
p8iPAPsFwOMclo2N31lGTjs9pa/oEFiw5WXCRYAquTVP7/ebQwD/T5rC85Nq+PtV
jGWE4Tou2WpDyRqWpqtsBSmyLlpt5gI=
=su45
-----END PGP SIGNATURE-----


