Bug#948891: licensecheck --encoding utf8 exits on error when parsing binary files

Dominique Dumont dod at debian.org
Tue Jan 14 11:06:27 GMT 2020


Package: licensecheck
Version: 3.0.39-1
Severity: normal

Dear Maintainer,

When run with the --encoding utf8 option, licensecheck exits with an error when
parsing binary files such as JPEG images.

This was found with the scikit-learn package:
$ licensecheck --encoding utf8 --copyright --machine --deb-fmt --recursive doc/testimonials 
doc/testimonials/README.txt     UNKNOWN *No copyright*
doc/testimonials/testimonials.rst       UNKNOWN *No copyright*
utf8 "\xFF" does not map to Unicode at /usr/share/licensecheck/App/Licensecheck.pm line 358.
$ echo $?
25

Strace shows that licensecheck trips over the birchbox.jpg file:

$ strace licensecheck --encoding utf8 --copyright --machine --deb-fmt --recursive doc/testimonials
[snip]
stat("doc/testimonials/images/birchbox.jpg", {st_mode=S_IFREG|0644, st_size=14595, ...}) = 0
openat(AT_FDCWD, "doc/testimonials/images/birchbox.jpg", O_RDONLY|O_CLOEXEC) = 3
ioctl(3, TCGETS, 0x7ffe9ca1fc10)        = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR)                   = 0
ioctl(3, TCGETS, 0x7ffe9ca1fc40)        = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFREG|0644, st_size=14595, ...}) = 0
read(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\342\7\270ICC_PROF"..., 8192) = 8192
write(2, "utf8 \"\\xFF\" does not map to Unic"..., 93utf8 "\xFF" does not map to Unicode at /usr/share/licensecheck/App/Licensecheck.pm line 358.
) = 93
lseek(3, 0, SEEK_SET)                   = 0
lseek(3, 0, SEEK_CUR)                   = 0
close(3)                                = 0
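
To illustrate why decoding fails on the very first byte: the read() above
returns the JPEG header, which starts with byte 0xFF, and 0xFF never occurs in
well-formed UTF-8. Below is a minimal Python sketch of that behaviour. It is
not licensecheck's Perl code; the byte string is copied from the strace output
above (first ten bytes only) and the helper name strict_utf8_error is made up
for this example.

```python
# First bytes returned by read() in the strace output above
# (octal \377\330\377\340\0\20JFIF -> hex FF D8 FF E0 00 10 "JFIF").
header = b"\xff\xd8\xff\xe0\x00\x10JFIF"


def strict_utf8_error(data: bytes):
    """Return the UnicodeDecodeError raised by strict UTF-8 decoding, or None."""
    try:
        data.decode("utf-8")  # strict error handling is the default
    except UnicodeDecodeError as err:
        return err
    return None


err = strict_utf8_error(header)
if err is not None:
    # err.start is the offset of the first byte that could not be decoded
    print(f"invalid byte 0x{header[err.start]:02X} at offset {err.start}")
```

Any strict UTF-8 decoder is expected to reject such input, so the real question
is what licensecheck should do with the file once decoding fails.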

I'd suggest one of the following:
- skip binary files
- read binary files without utf8 decoding (even if licensecheck is run with --encoding utf8)
- for image files, use exiftool or Image::Exif to extract license information from copyright tags. See Image::ExifTool::TagNames for the tag list (which unfortunately depends on the file format)
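
The first two options could be combined roughly as follows. This is only a
Python sketch of the idea, not a patch for licensecheck (which is written in
Perl). The helper names looks_binary and read_text, the 8192-byte sample size
(borrowed from the read() size in the strace output) and the NUL-byte heuristic
are all assumptions made for this example.

```python
from pathlib import Path
from typing import Optional

SAMPLE_SIZE = 8192  # assumption: same size as the read() seen in strace


def looks_binary(sample: bytes) -> bool:
    """Heuristic (assumption): a NUL byte in the sample means binary data."""
    return b"\x00" in sample


def read_text(path: Path, encoding: str = "utf-8",
              skip_binary: bool = True) -> Optional[str]:
    """Return the decoded file content, or None if the file is skipped."""
    data = path.read_bytes()
    if skip_binary and looks_binary(data[:SAMPLE_SIZE]):
        return None  # option 1: skip binary files
    try:
        return data.decode(encoding)
    except UnicodeDecodeError:
        # option 2: fall back to a byte-preserving decoding instead of dying;
        # latin-1 maps every byte to a code point, so this cannot fail
        return data.decode("latin-1")
```

With this approach a JPEG file is skipped because its header contains NUL
bytes, while a text file with a stray invalid byte is still scanned instead of
aborting the whole run.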

All the best

Dod



-- System Information:
Debian Release: bullseye/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386

Kernel: Linux 5.4.0-2-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled

Versions of packages licensecheck depends on:
ii  libgetopt-long-descriptive-perl        0.104-1
ii  liblog-any-adapter-screen-perl         0.140-1
ii  liblog-any-perl                        1.707-1
ii  libmoo-perl                            2.003006-1
ii  libnamespace-clean-perl                0.27-1
ii  libpath-iterator-rule-perl             1.014-1
ii  libpath-tiny-perl                      0.108-1
ii  libpod-constants-perl                  0.19-1
ii  libre-engine-re2-perl                  0.13-4+b1
ii  libregexp-pattern-license-perl         3.1.99-1
ii  libregexp-pattern-perl                 0.2.11-1
ii  libscalar-list-utils-perl              1:1.53-1
ii  libsort-key-perl                       1.33-2+b2
ii  libstrictures-perl                     2.000006-1
ii  libstring-copyright-perl               0.003006-1
ii  libstring-escape-perl                  2010.002-2
ii  libtry-tiny-perl                       0.30-1
ii  perl                                   5.30.0-9
ii  perl-base [libscalar-list-utils-perl]  5.30.0-9

licensecheck recommends no packages.

Versions of packages licensecheck suggests:
ii  bash-completion  1:2.9-1

-- no debconf information


