Bug#926392: licensecheck chokes on long lines
Niels Thykier
niels at thykier.net
Wed Apr 17 08:08:00 BST 2019
On Thu, 04 Apr 2019 18:13:43 +0200 Jonas Smedegaard <jonas at jones.dk> wrote:
> control: tag -1 confirmed
>
> Quoting Sandro Mani (2019-04-04 13:36:28)
> > $ wget https://files.pythonhosted.org/packages/source/x/xonsh/xonsh-0.8.12.tar.gz
> > $ tar xf xonsh-0.8.12.tar.gz
> > $ licensecheck xonsh-0.8.12/xonsh/parser_table.py
> >
> > => Licensecheck hangs eating cpu cycles (the file has lines with 33k and
> > 71k characters).
>
> Indeed. Thanks for reporting!
>
> - Jonas
>
> --
> * Jonas Smedegaard - idealist & Internet-arkitekt
> * Tlf.: +45 40843136 Website: http://dr.jones.dk/
>
> [x] quote me freely [ ] ask before reusing [ ] keep private
Hi,
I have been digging in the code (admittedly using the master branches
of libregexp-pattern-license-perl and licensecheck rather than the
packages) and basically, it is a DoS caused by a suboptimal regex.
I traced it down to getting stuck on the python_2 "grant_license". This
regex expands to (manually reformatted with /x for readability):
"""
m!
(?^:
(?:
(?: (?:[Ll]icensed|[Rr]eleased) [ ] under|(?:according [ ] to|as
[ ] governed [ ] by|under) [ ] the [ ] (?:conditions|terms)
[ ] of)(?:(?:[Tt]he [ ] )?Python-2.0
| (?:[Tt]he [ ])?Python(?: [ ] [Ll]icense)? [ ] 2.0
| (?:[Tt]he [ ])?Python-2.0
| (?:[Tt]he [ ])?Python [ ] Software [ ]
Foundation(?: [ ] [Ll]icense)? [ ] version [ ] 2
| (?:[Tt]he [ ])?python2
| (?:[Tt]he [ ])?Python-2
| (?:[Tt]he [ ])?PSF-2
| (?:[Tt]he [ ])?Python(?: [ ] [Ll]icense)? [ ] Version [ ] 2
| (?:[Tt]he [ ])?PYTHON [ ] SOFTWARE [ ] FOUNDATION [ ] LICENSE [ ] VERSION [ ] 2
| (?:[Tt]he [ ])?python-license-2.0)
| (?:\W*\S+\W*)PSF [ ] is [ ] making [ ] Python [ ] available [ ]
to [ ] Licensee
)
)
!x
"""
The problem is the *last* alternative, namely:
"""
(?:\W*\S+\W*)PSF [ ] is [ ] making [ ] [...]
"""
That \W*\S+\W* (known as ${BB} in the libregexp-pattern-license-perl
code) is stirring up hell. Basically, perl wants to find the *longest*
match and will spend a stupid amount of time in this "trivial" regex
enumerating exponentially many "non-matches" ([1] strikes again).
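
A minimal sketch of the effect, under stated assumptions: it uses Python's
`re` module (also a backtracking engine) rather than Perl, it only covers the
last alternative, and the names `BB`, `PHRASE`, `long_line` and `time_search`
are made up for illustration. Absolute timings will differ from what
licensecheck sees, but on a long line without whitespace the ${BB} variant
should slow down roughly quadratically or worse, because \S+ runs to the end
of the line from every start position before the literal "PSF" is even tried.

```python
import re
import time

# ${BB} as quoted above; everything else here is a hypothetical stand-in.
BB = r"(?:\W*\S+\W*)"
PHRASE = r"PSF is making Python available to Licensee"

with_bb = re.compile(BB + PHRASE)   # shape of the last alternative
without_bb = re.compile(PHRASE)     # same alternative with ${BB} removed


def long_line(n):
    """Build one line of n characters with no whitespace, mixing word and
    non-word characters (loosely like a generated parser table)."""
    return "a," * (n // 2)


def time_search(pattern, text):
    """Return the seconds spent in a single pattern.search() call."""
    start = time.perf_counter()
    pattern.search(text)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Sizes are kept small on purpose; the real file has 33k and 71k
    # character lines.
    for n in (500, 1000, 2000):
        line = long_line(n)
        print(n,
              f"with BB: {time_search(with_bb, line):.4f}s",
              f"without BB: {time_search(without_bb, line):.6f}s")
```

Doubling `n` and watching the first column of timings is enough to see the
trend; the second column should stay close to flat.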
Simply removing ${BB} will make the code continue past the python_2 test
relatively fast. For the python_2 case, I think that the phrase "PSF
is making Python available to Licensee" would be sufficient to
consider it a match (i.e. ${BB} is redundant) - though it will change
behaviour on an anchored match (I hope this is not a problem).
Though it then gets stuck in the next regex, "cube" (from
@L_type_unversioned), and that is as far down the rabbit hole as I
ventured in terms of regexes getting stuck (note that "cube" indirectly
uses the ${BB} regex too).
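
To illustrate the behaviour change mentioned for the python_2 case, here is
a small sketch, again in Python's `re` rather than Perl and with made-up
names (`BB`, `PHRASE`, `matches`). It assumes one possible reading of
"anchored match": text where the phrase begins at the very start of the
input. With ${BB} the pattern needs at least one non-space character before
"PSF"; without it, the bare phrase is enough.

```python
import re

# ${BB} as quoted in this message; the other names are hypothetical.
BB = r"(?:\W*\S+\W*)"
PHRASE = r"PSF is making Python available to Licensee"

with_bb = re.compile(BB + PHRASE)
without_bb = re.compile(PHRASE)


def matches(pattern, text):
    """True if the pattern is found anywhere in text."""
    return pattern.search(text) is not None


mid_sentence = "1. The PSF is making Python available to Licensee"
at_start = "PSF is making Python available to Licensee"

# Both variants agree when something precedes the phrase.
print(matches(with_bb, mid_sentence), matches(without_bb, mid_sentence))
# They disagree when the phrase opens the text.
print(matches(with_bb, at_start), matches(without_bb, at_start))
```

Whether the second case matters in practice depends on how licensecheck
feeds text to the pattern, which this sketch does not model.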
Thanks,
~Niels
[1] https://swtch.com/~rsc/regexp/regexp1.html