Bug#971960: python3-debian: Needlessly recompiles 3 regex with each call to Deb822._internal_parser

Niels Thykier niels at thykier.net
Sat Oct 10 16:53:54 BST 2020


Package: python3-debian
Severity: minor

Hi,

The Deb822 code for _internal_parser starts with:

>     def _internal_parser(self,
>                          sequence,      # type: InputDataType
>                          fields=None,   # type: Optional[List[str]]
>                          strict=None,   # type: Optional[Dict[str, bool]]
>                          ):
>         # type: (...) -> None
>         # The key is non-whitespace, non-colon characters before any colon.
>         key_part = r"^(?P<key>[^: \t\n\r\f\v]+)\s*:\s*"
>         single = re.compile(key_part + r"(?P<data>\S.*?)\s*$")
>         multi = re.compile(key_part + r"$")
>         multidata = re.compile(r"^\s(?P<data>.+?)\s*$")


Note the 3 calls to re.compile for constant regular expressions (i.e.
the text passed to re.compile does not change).

This method is in turn called at least once per paragraph in your input
file when using Deb822.iter_paragraphs, meaning 3 x N re.compile calls
for N paragraphs where 3 would have been sufficient.  Using python3-apt
works around this deficiency, but depending on python3-apt can be
problematic during architecture bootstrapping scenarios.

~Niels


