Bug#948891: licensecheck --encoding utf8 exits on error when parsing binary files
Dominique Dumont
dod at debian.org
Tue Jan 14 11:06:27 GMT 2020
Package: licensecheck
Version: 3.0.39-1
Severity: normal
Dear Maintainer,
When run with the --encoding utf8 option, licensecheck exits with an error when
parsing binary image files (a JPEG in the case below).
This was found with the scikit-learn package:
$ licensecheck --encoding utf8 --copyright --machine --deb-fmt --recursive doc/testimonials
doc/testimonials/README.txt UNKNOWN *No copyright*
doc/testimonials/testimonials.rst UNKNOWN *No copyright*
utf8 "\xFF" does not map to Unicode at /usr/share/licensecheck/App/Licensecheck.pm line 358.
$ echo $?
25
strace shows that licensecheck trips on the birchbox.jpg file:
$ strace licensecheck --encoding utf8 --copyright --machine --deb-fmt --recursive doc/testimonials
[snip]
stat("doc/testimonials/images/birchbox.jpg", {st_mode=S_IFREG|0644, st_size=14595, ...}) = 0
openat(AT_FDCWD, "doc/testimonials/images/birchbox.jpg", O_RDONLY|O_CLOEXEC) = 3
ioctl(3, TCGETS, 0x7ffe9ca1fc10) = -1 ENOTTY (Inappropriate ioctl for device)
lseek(3, 0, SEEK_CUR) = 0
ioctl(3, TCGETS, 0x7ffe9ca1fc40) = -1 ENOTTY (Inappropriate ioctl for device)
fstat(3, {st_mode=S_IFREG|0644, st_size=14595, ...}) = 0
read(3, "\377\330\377\340\0\20JFIF\0\1\1\1\0H\0H\0\0\377\342\7\270ICC_PROF"..., 8192) = 8192
write(2, "utf8 \"\\xFF\" does not map to Unic"..., 93utf8 "\xFF" does not map to Unicode at /usr/share/licensecheck/App/Licensecheck.pm line 358.
) = 93
lseek(3, 0, SEEK_SET) = 0
lseek(3, 0, SEEK_CUR) = 0
close(3) = 0
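For what it's worth, the bytes in the read() call above (\377\330\377\340 in octal, i.e. FF D8 FF E0) are the usual start of a JPEG/JFIF file, and the byte 0xFF never occurs in valid UTF-8, so a strict decoder fails on the very first byte. Here is a minimal Python sketch of that failure. It is not licensecheck's Perl code, and the function name is made up for illustration:

```python
# First bytes of birchbox.jpg, copied from the strace read() call
# (octal \377\330\377\340\0\20JFIF\0 written as hex escapes).
JPEG_HEADER = b"\xff\xd8\xff\xe0\x00\x10JFIF\x00"


def decodes_as_utf8(data: bytes) -> bool:
    """Return True if data is valid UTF-8 under strict decoding."""
    try:
        data.decode("utf-8", errors="strict")
        return True
    except UnicodeDecodeError:
        # Strict decoding rejects 0xFF, which is what the
        # 'utf8 "\xFF" does not map to Unicode' message reports.
        return False


if __name__ == "__main__":
    print(decodes_as_utf8(JPEG_HEADER))
    print(decodes_as_utf8("Copyright © 2020".encode("utf-8")))
```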
I'd suggest one of the following:
- skip binary files
- read binary files without utf8 decoding (even if --encoding utf8 is passed to licensecheck)
- for image files, use exiftool or Image::Exif to extract license information from copyright tags. See Image::ExifTool::TagNames for the tag list (which unfortunately depends on the file format)
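To illustrate the first two suggestions, here is a rough Python sketch, again not licensecheck's actual Perl implementation. The NUL-byte heuristic, the 8192-byte sniff size, and all function names are assumptions chosen for the example. Note that the heuristic would also skip UTF-16 text, which contains NUL bytes.

```python
from pathlib import Path
from typing import Optional

# Assumed sniff size; matches the first read() seen in the strace output.
SNIFF_SIZE = 8192


def looks_binary(chunk: bytes) -> bool:
    """Crude heuristic: treat any chunk containing a NUL byte as binary."""
    return b"\x00" in chunk


def decode_or_skip(raw: bytes, encoding: str = "utf-8") -> Optional[str]:
    """Return decoded text, or None when the content looks binary.

    Invalid byte sequences become U+FFFD instead of raising, so a stray
    byte in an otherwise textual file does not abort the whole scan.
    """
    if looks_binary(raw[:SNIFF_SIZE]):
        return None  # suggestion 1: skip binary files
    return raw.decode(encoding, errors="replace")  # suggestion 2: lenient read


def read_for_scan(path: str, encoding: str = "utf-8") -> Optional[str]:
    """Hypothetical helper: read a file from disk and apply decode_or_skip."""
    return decode_or_skip(Path(path).read_bytes(), encoding)
```

With something along these lines, the JPEG would be skipped and the scan would continue with the remaining files instead of exiting with status 25.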
All the best
Dod
-- System Information:
Debian Release: bullseye/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Foreign Architectures: i386
Kernel: Linux 5.4.0-2-amd64 (SMP w/8 CPU cores)
Kernel taint flags: TAINT_PROPRIETARY_MODULE, TAINT_WARN, TAINT_OOT_MODULE, TAINT_UNSIGNED_MODULE
Locale: LANG=en_GB.UTF-8, LC_CTYPE=en_GB.UTF-8 (charmap=UTF-8), LANGUAGE=en_US:en (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Init: systemd (via /run/systemd/system)
LSM: AppArmor: enabled
Versions of packages licensecheck depends on:
ii libgetopt-long-descriptive-perl 0.104-1
ii liblog-any-adapter-screen-perl 0.140-1
ii liblog-any-perl 1.707-1
ii libmoo-perl 2.003006-1
ii libnamespace-clean-perl 0.27-1
ii libpath-iterator-rule-perl 1.014-1
ii libpath-tiny-perl 0.108-1
ii libpod-constants-perl 0.19-1
ii libre-engine-re2-perl 0.13-4+b1
ii libregexp-pattern-license-perl 3.1.99-1
ii libregexp-pattern-perl 0.2.11-1
ii libscalar-list-utils-perl 1:1.53-1
ii libsort-key-perl 1.33-2+b2
ii libstrictures-perl 2.000006-1
ii libstring-copyright-perl 0.003006-1
ii libstring-escape-perl 2010.002-2
ii libtry-tiny-perl 0.30-1
ii perl 5.30.0-9
ii perl-base [libscalar-list-utils-perl] 5.30.0-9
licensecheck recommends no packages.
Versions of packages licensecheck suggests:
ii bash-completion 1:2.9-1
-- no debconf information