Bug#711448: libhtml-copy-perl: FTBFS with perl 5.18: test failure

Tue Jun 18 20:08:43 UTC 2013

On Mon, Jun 17, 2013 at 08:35:28PM +0200, gregor herrmann wrote:
> Control: tag -1 + patch
> 
> On Thu, 06 Jun 2013 22:45:23 +0100, Dominic Hargreaves wrote:
> 
> > Strings with code points over 0xFF may not be mapped into in-memory file handles
> > readline() on closed filehandle $in at /build/dom-libhtml-copy-perl_1.30-1-i386-fEvCSD/libhtml-copy-perl-1.30/blib/lib/HTML/Copy.pm line 255.
> > Use of uninitialized value in subroutine entry at /build/dom-libhtml-copy-perl_1.30-1-i386-fEvCSD/libhtml-copy-perl-1.30/blib/lib/HTML/Copy.pm line 258.
> > Use of uninitialized value in concatenation (.) or string at /build/dom-libhtml-copy-perl_1.30-1-i386-fEvCSD/libhtml-copy-perl-1.30/blib/lib/HTML/Copy.pm line 276.
> > Can't guess encoding of  at /build/dom-libhtml-copy-perl_1.30-1-i386-fEvCSD/libhtml-copy-perl-1.30/blib/lib/HTML/Copy.pm line 276.
> > # Looks like you planned 16 tests but ran 6.
> > # Looks like your test exited with 255 just after 6.
> > t/parse.t .... 
> > Dubious, test returned 255 (wstat 65280, 0xff00)
> > Failed 1/2 test programs. 0/7 subtests failed.
> > Failed 10/16 subtests 
> 
> "Strings with code points over 0xFF may not be mapped into in-memory file handles"
> happens in t/parse.t, line 181:
>     open my $in, "<", \$src_html_utf8;
> (where $src_html_utf8 contains HTML with some nice characters (ああ) in
> it).
> 
> perldiag says:
> 
> Strings with code points over 0xFF may not be mapped into in-memory file handles
> 
>     (W utf8) You tried to open a reference to a scalar for read or
>     append where the scalar contained code points over 0xFF.
>     In-memory files model on-disk files and can only contain bytes.
> 
> 
> Some searching indicates that strategically adding some
> encode_utf8() calls might help ... Let's try ... Ok, here we are:
> 
> #v+
> diff --git a/t/parse.t b/t/parse.t
> index 1550268..15eb8c6 100644
> --- a/t/parse.t
> +++ b/t/parse.t
> @@ -6,6 +6,7 @@ use HTML::Copy;
>  use utf8;
>  use File::Spec::Functions;
>  #use Data::Dumper;
> +use Encode qw(encode_utf8 decode_utf8);
>  
>  use Test::More tests => 16;
>  
> @@ -109,7 +110,7 @@ $copy_html = do {
>  ok($copy_html eq $result_html_nocharset, "copy_to no charset shift_jis");
>  
>  ##== HTML with charset uft-8
> -my $src_html_utf8 = <<EOT;
> +my $src_html_utf8 = encode_utf8(<<EOT);
>  <!DOCTYPE html>
>  <html>
>  <head>
> @@ -126,7 +127,7 @@ my $src_html_utf8 = <<EOT;
>  </html>
>  EOT
>  
> -my $result_html_utf8 = <<EOT;
> +my $result_html_utf8 = encode_utf8(<<EOT);
>  <!DOCTYPE html>
>  <html>
>  <head>
> @@ -174,7 +175,7 @@ $copy_html = do {
>      read_and_unlink($destination, $p);
>  };
>  
> -ok($copy_html eq $result_html_utf8, "copy_to giviing a file handle");
> +ok($copy_html eq decode_utf8($result_html_utf8), "copy_to giviing a file handle");
>  
>  ##=== copy_to gving file handles for input and output
>  $copy_html = do {
> @@ -187,7 +188,7 @@ $copy_html = do {
>      Encode::decode($p->encoding, $outdata);
>  };
>  
> -ok($copy_html eq $result_html_utf8, "copy_to giviing file handles for input and output");
> +ok($copy_html eq decode_utf8($result_html_utf8), "copy_to giviing file handles for input and output");
>  
>  ##=== parse_to giving a file handle
>  $copy_html = do {
> @@ -196,7 +197,7 @@ $copy_html = do {
>      $p->parse_to($destination);
>  };
>  
> -ok($copy_html eq $result_html_utf8, "copy_to giviing file handles for input and output");
> +ok($copy_html eq decode_utf8($result_html_utf8), "copy_to giviing file handles for input and output");
>  
>  ##=== copy_to with directory destination
>  $copy_html = do {
> #v-
> 
> 
> I'm committing this now but some sanity check would be appreciated.

At a glance, this seems sane, but I guess upstream should be given a
chance to comment too (whether before or after you upload the fix
to Debian).
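
For reference, here is a minimal sketch of what I understand the patch
to be doing. It is my own illustration, not code from HTML::Copy or its
test suite, and the variable names ($chars, $bytes, $roundtrip) are made
up. The idea is to encode the character string to UTF-8 bytes before
opening an in-memory handle on it, then decode what is read back before
comparing it with a character string.

```perl
use strict;
use warnings;
use utf8;    # string literals in this file are character strings
use Encode qw(encode_utf8 decode_utf8);

# A character string containing code points above 0xFF.
my $chars = "<p>ああ</p>\n";

# In-memory files can only contain bytes, so encode first. Opening a
# handle directly on \$chars is what triggers the perl 5.18 diagnostic.
my $bytes = encode_utf8($chars);

open my $in, '<', \$bytes
    or die "cannot open in-memory handle: $!";
my $line = <$in>;
close $in;

# The handle yields bytes; decode them to get characters back.
my $roundtrip = decode_utf8($line);

print $roundtrip eq $chars ? "round trip ok\n" : "round trip mismatch\n";
```

This mirrors the shape of the change in t/parse.t: the fixture is stored
as bytes for the file handle, and decode_utf8() is applied where it is
compared against decoded output.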

Cheers,
Dominic.