Bug#880368: YAML::XS::Load expects utf8 octets, not perl's encoding; use slurp_raw

Andrej Shadura andrew.shadura at collabora.co.uk
Fri Dec 13 13:23:46 GMT 2019


On Sun, 05 Nov 2017 18:32:48 +0100 Dominique Dumont <dod at debian.org> wrote:
> On Monday, 30 October 2017 15:27:32 CET you wrote:
> > YAML::XS::Load (and *hopefully* the other implementations of
> > YAML::Any::Load?) expect utf8 octets on input, not perl's internal
> > encoding.
> 
> Uh? I thought I had gotten rid of YAML::Any... Well, after checking, it turns
> out that I've updated Config::Model::Backend::Yaml, but I forgot to update
> Dpkg::Scanner.
> 
> Anyway, using YAML::Any has several problems:
> - it's deprecated
> - it may load YAML or YAML::XS which have some security issues [1]
> 
> > Thus, slurp_raw should be used instead of slurp_utf8. [Though really,
> > YAML::XS::Load should probably do the right thing if is_utf8 is on,
> > anyway.]
> 
> Unfortunately, the strings returned by YAML::XS are not tagged as utf-8, which
> leads to writing mojibake when cme is used to update debian/copyright.
>
> Given the security issues of YAML and YAML::XS, I'm not going to tweak the
> structure returned by YAML::XS to fix the utf8 flag of each scalar contained
> in the structure (and maybe all hash keys...)
> 
> Instead, I'm going to replace YAML::Any with YAML::Tiny (which is more than 
> enough in this case).

Unfortunately, YAML::Tiny rejects some valid YAML markup, in particular
what pyyaml generates by default, and changing pyyaml's output is very
difficult without in-depth hacking:

".*":
  "license": |-
    GPL-2
"debian/":
  "copyright": "A B <a at a>\n B C <b at b>\n C\
    \ D <c at c>\n D E <d at d>\n E F\
    \ <e at e>\n F G <f at f>\n G H <g at g>"
  "license": |-
    GPL-2+
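
To spell out what the double-quoted "copyright" scalar above means, here
is a rough Python sketch (Python only because pyyaml is the producer).
The helper unfold_double_quoted is made up for illustration, is not part
of any YAML library, and handles just the few escapes used here. As I
read the YAML spec, a backslash at the end of a line swallows the line
break together with the next line's indentation, and "\ " is an escaped
space:

```python
import re

# Hypothetical helper: handles only the escapes that occur in the example,
# not the full YAML escape table and not unescaped line folding.
_SIMPLE_ESCAPES = {"n": "\n", "t": "\t", " ": " ", "\\": "\\", '"': '"'}


def unfold_double_quoted(body):
    """Decode the inside of a YAML double-quoted scalar (small subset)."""

    def replace(match):
        escaped = match.group(1)
        if escaped.startswith("\n"):
            # Escaped line break: drop it and the continuation line's indent.
            return ""
        return _SIMPLE_ESCAPES[escaped]

    # Scan left to right so an escaped backslash is consumed as one unit.
    return re.sub(r"\\(\n[ \t]*|.)", replace, body)


# Raw string, so the backslashes and the line break reach the function as-is.
sample = r"""A B <a at a>\n B C <b at b>\n C\
    \ D <c at c>"""

print(repr(unfold_double_quoted(sample)))
```

So the two physical lines are one logical value, and a parser without
multi-line quoted scalars cannot read it.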

As a temporary workaround, I patched the locally used version to use
YAML::XS, but as far as I can see, you won’t accept this patch upstream.
Is there a solution that would satisfy both conditions: not having
security issues and supporting proper YAML? By the way, what are those
security issues, and how serious and relevant to scan-copyrights are they?

> Thanks for the report. This helps me improve dpkg model for cme (and led to
> the release of Config::Model::Tester 3.003 which did not handle utf-8
> correctly while checking file content).
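
For reference, the mojibake mentioned earlier in the thread is what you
get when UTF-8 octets are treated as one character per byte. A minimal
Python sketch of the effect follows; it is an analogy only, since the
code in question is Perl, and the sample name is made up:

```python
# Made-up sample value; the escape keeps this snippet pure ASCII.
name = "Ren\u00e9"

# What ends up on disk: UTF-8 octets (the e-acute takes two bytes).
octets = name.encode("utf-8")

# Reading each octet as its own character (Latin-1) gives mojibake.
misread = octets.decode("latin-1")

# Decoding the octets as UTF-8 gives back the original characters.
roundtrip = octets.decode("utf-8")

print(len(name), len(octets), len(misread), roundtrip == name)
```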


-- 
Cheers,
  Andrej


