[Python-modules-team] Bug#919008: python3-pdfminer: should depend on python3-pycryptodome, not recommend python3-crypto

Daniele Tricoli eriol at mornie.org
Sat Jan 12 00:30:23 GMT 2019


Hello Sean,
many thanks for this report!

On 11/01/2019 17:58, Sean Whitton wrote:
> Package: python3-pdfminer
> Version: 20181108+dfsg-2
> 
> Dear maintainer,
> 
> pdfminer.six's setup.py says that it requires pycryptodome, so I would
> expect to see
> 
>     Depends: python3-pycryptodome
> 
> but instead there is
> 
>     Recommends: python3-crypto
> 
> which seems to be wrong in two ways:
> 
> 1) it is a recommends, not a hard depends, but in setup.py it is listed
>    as 'required'

You are right, it's listed as a hard dependency, but it is used only in the
following module:

>>> pdfminer.pdfdocument
<module 'pdfminer.pdfdocument' from
'/usr/lib/python3/dist-packages/pdfminer/pdfdocument.py'>

If we look at the code we will see:

try:
    from Crypto.Cipher import ARC4
    from Crypto.Cipher import AES
    from Crypto.Hash import SHA256
except ImportError:
    AES = SHA256 = None
    from . import arcfour as ARC4

So if an ImportError is raised, the error is caught and only AES and
SHA256 are disabled. For ARC4, pdfminer itself ships an implementation
that is used as a fallback.
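
To make the effect of that fallback easier to see, here is a minimal sketch
of the same optional-import pattern outside pdfminer. The helper
`available_algorithms` is hypothetical (it is not part of pdfminer), and
ARC4 is reported as always available only because pdfminer bundles its own
arcfour module:

```python
# Sketch of the optional-import pattern shown above.
# `available_algorithms` is a hypothetical helper, not a pdfminer API.
try:
    from Crypto.Cipher import AES
    from Crypto.Hash import SHA256
except ImportError:
    # Same behaviour as pdfdocument.py: mark the optional parts as missing.
    AES = SHA256 = None


def available_algorithms(aes=AES, sha256=SHA256):
    """Report which algorithms can be used in this environment."""
    return {
        "ARC4": True,  # assumed always usable via the bundled fallback
        "AES": aes is not None,
        "SHA256": sha256 is not None,
    }


if __name__ == "__main__":
    for name, usable in available_algorithms().items():
        print(name, "available" if usable else "missing")
```

Without the Crypto package installed, this should report AES and SHA256 as
missing while ARC4 stays available.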

This is why I chose Recommends instead of Depends: I think that not having
pycrypto (or pycryptodome) as a hard dependency is a valid use case,
since it's not strictly required. I usually do this when I discover
that a library is not really required.

Is this causing some problems?

> 2) it is -crypto, rather than the -cryptodome fork.
> 
> I note that python3-pycryptodome is broken (#886291) and is not likely
> to be fixed before the transitions freeze, but I am not really sure
> whether that bug blocks this one or not.

I also stumbled on #886291, and since only AES, ARC4 and SHA256 are used
(and I don't think that the implementation in pycrypto of those
algorithms should worry us) I chose pycrypto (also it was used in the
original pdfminer) instead of pycryptodome.

I tested it using the following steps (with python3-crypto installed):

❯ cat test.tex
\documentclass[a4paper]{article}

\begin{document}
\thispagestyle{empty}


This is a test!

\end{document}

❯ lualatex test.tex
[CUT output]

❯ qpdf --encrypt 1234 1234 40 -- test.pdf test-encrypted_ARC4.pdf
❯ pdf2txt -P 1234 test-encrypted_ARC4.pdf
This is a test!

❯ qpdf --encrypt 1234 1234 128 --use-aes=y -- test.pdf test-encrypted_AES.pdf
❯ pdf2txt -P 1234 test-encrypted_AES.pdf
This is a test!
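
The manual steps above could be scripted along these lines. This is only a
sketch: the function names are hypothetical, the qpdf and pdf2txt arguments
are copied from the commands above, and `run_case` assumes both tools are on
the PATH and that test.pdf already exists:

```python
import shlex
import subprocess


def qpdf_encrypt_command(src, dst, password, key_length, use_aes=False):
    """Build the qpdf command line used above (user and owner password equal)."""
    cmd = ["qpdf", "--encrypt", password, password, str(key_length)]
    if use_aes:
        cmd.append("--use-aes=y")
    cmd += ["--", src, dst]
    return cmd


def pdf2txt_command(path, password):
    """Build the pdf2txt command line used above."""
    return ["pdf2txt", "-P", password, path]


def run_case(src, dst, password, key_length, use_aes=False):
    """Encrypt src into dst, then return what pdf2txt prints for dst."""
    subprocess.run(
        qpdf_encrypt_command(src, dst, password, key_length, use_aes),
        check=True,
    )
    result = subprocess.run(
        pdf2txt_command(dst, password), capture_output=True, text=True
    )
    return result.stdout


if __name__ == "__main__":
    # Only print the commands here; call run_case() to actually execute them.
    print(shlex.join(qpdf_encrypt_command(
        "test.pdf", "test-encrypted_ARC4.pdf", "1234", 40)))
    print(shlex.join(qpdf_encrypt_command(
        "test.pdf", "test-encrypted_AES.pdf", "1234", 128, use_aes=True)))
```

Calling `run_case("test.pdf", "test-encrypted_AES.pdf", "1234", 128, use_aes=True)`
and checking that the result contains "This is a test!" mirrors the AES
check above.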

After uninstalling python{,3}-crypto we are no longer able to decrypt the
file encrypted with AES:

❯ pdf2txt -P 1234 test-encrypted_AES.pdf
Traceback (most recent call last):
  File "/usr/bin/pdf2txt", line 136, in <module>
    if __name__ == '__main__': sys.exit(main())
  File "/usr/bin/pdf2txt", line 131, in main
    outfp = extract_text(**vars(A))
  File "/usr/bin/pdf2txt", line 63, in extract_text
    pdfminer.high_level.extract_text_to_fp(fp, **locals())
  File "/usr/lib/python3/dist-packages/pdfminer/high_level.py", line 80, in extract_text_to_fp
    check_extractable=True):
  File "/usr/lib/python3/dist-packages/pdfminer/pdfpage.py", line 129, in get_pages
    doc = PDFDocument(parser, password=password, caching=caching)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfdocument.py", line 577, in __init__
    self._initialize_password(password)
  File "/usr/lib/python3/dist-packages/pdfminer/pdfdocument.py", line 602, in _initialize_password
    raise PDFEncryptionError('Unknown algorithm: param=%r' % param)
pdfminer.pdfdocument.PDFEncryptionError: Unknown algorithm: param={'CF':
{'StdCF': {'AuthEvent': /'DocOpen', 'CFM': /'AESV2', 'Length': 16}}, 'Filter':
/'Standard', 'Length': 128, 'O':
b'\xc4\x8f\x00\x1f\xdcy\xa00\xd7\x18\xdf]\xbb\xda\xad\x81\xd1\xf6\xfe\xde\xc4\xa7\xb5\xcd\x98\rd\x13\x9e\xdf\xcb~',
'P': -4, 'R': 4, 'StmF': /'StdCF', 'StrF': /'StdCF', 'U':
b'\xfe\xeb\xed\x0e\x0f{2}r\xc5g\xc7\xf2]\xf0\xf4\x01"Ej\x91\xba\xe5\x13Bs\xa6\xdb\x13L\x87\xc4',
'V': 4}

but the ARC4 file is still fine, thanks to the implementation bundled with pdfminer:

❯ pdf2txt -P 1234 test-encrypted_ARC4.pdf
This is a test!
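
The difference between the two files is visible in the encryption
dictionary printed in the traceback ('V': 4 and 'CFM': /'AESV2'). As a
rough sketch, a hypothetical helper (not part of pdfminer) could map those
two fields to the cipher a reader needs. The mapping is my reading of the
PDF specification's standard security handler and covers only the common
cases:

```python
# Hypothetical helper, not a pdfminer API: decide which cipher a PDF
# needs from the V and CFM entries of its encryption dictionary.
def required_cipher(v, cfm=None):
    """Return 'ARC4', 'AES' or 'unknown' for the given V and CFM values."""
    if v in (1, 2):
        # Older files: RC4 (ARC4) with 40-bit or longer keys.
        return "ARC4"
    if v in (4, 5):
        # Crypt filters: the method is named by the CFM entry.
        if cfm in ("AESV2", "AESV3"):
            return "AES"
        if cfm == "V2":
            return "ARC4"
    return "unknown"


if __name__ == "__main__":
    # Values taken from the traceback above.
    print(required_cipher(4, "AESV2"))
```

For the values in the traceback this returns 'AES', which is the case that
needs pycrypto (or pycryptodome).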

Did you have problems with encryption? qpdf is the only tool I know of for
encrypting PDFs, so I used it, but if you can suggest more tools to test this
feature I'll be happy to try them.

Since everything worked, I did not investigate pycryptodome any further, but
please tell me if I missed something.

> Please excuse my limited knowledge of python library packaging.

No need to apologize: your points are perfectly valid even without an in-depth
analysis, and I'm sorry for not putting what I just wrote in a README.Debian
file inside the package to explain why I made these choices. I did not think
about it, I took it for granted... :(

My plan was to use python{,3}-crypto for Buster and then switch to pycryptodome
in Buster+1, but please tell me if something is not working or if there is
something that I did not consider.

Regards,

-- 
  Daniele Tricoli 'eriol'
  https://mornie.org


