Bug#971960: python3-debian: Needlessly recompiles 3 regex with each call to Deb822._internal_parser
Niels Thykier
niels at thykier.net
Sat Oct 10 16:53:54 BST 2020
Package: python3-debian
Severity: minor
Hi,
The Deb822 code for _internal_parser starts with:
>     def _internal_parser(self,
>                          sequence,  # type: InputDataType
>                          fields=None,  # type: Optional[List[str]]
>                          strict=None,  # type: Optional[Dict[str, bool]]
>                          ):
>         # type: (...) -> None
>         # The key is non-whitespace, non-colon characters before any colon.
>         key_part = r"^(?P<key>[^: \t\n\r\f\v]+)\s*:\s*"
>         single = re.compile(key_part + r"(?P<data>\S.*?)\s*$")
>         multi = re.compile(key_part + r"$")
>         multidata = re.compile(r"^\s(?P<data>.+?)\s*$")
Note the 3 calls to re.compile for constant regular expressions (i.e.
the pattern text passed to re.compile never changes between calls).
This method is in turn called at least once per paragraph in your input
file when using Deb822.iter_paragraphs, meaning 3 x N re.compile calls
for N paragraphs where 3 would have been sufficient. Using python3-apt
works around this deficiency, but that makes python3-apt a requirement,
which can be problematic in architecture bootstrapping scenarios.
~Niels