Bug#880798: Wide character in print at /usr/bin/json_pp line 82
Dominic Hargreaves
dom at earth.li
Fri Nov 17 15:03:45 UTC 2017
Control: forwarded -1 https://rt.cpan.org/Ticket/Display.html?id=123653
On Sun, Nov 05, 2017 at 04:31:19AM +0800, 積丹尼 Dan Jacobson wrote:
> X-Debbugs-Cc: makamaka at cpan.org
> Package: perl
> Version: 5.26.1-2
> File: /usr/bin/json_pp
>
> This command line utility should have all character set issues already
> solved internally, no?
>
> $ set http://radioscanningtw.jidanni.org/index.php?title=%E9%A6%96%E9%A0%81
> $ GET http://archive.org/wayback/available?url=$@
> {"url": "http://radioscanningtw.jidanni.org/index.php?title=\u9996\u9801", "archived_snapshots": {"closest": {"status": "200", "available": true, "url": "http://web.archive.org/web/20171104183618/http://radioscanningtw.jidanni.org/index.php?title=%E9%A6%96%E9%A0%81", "timestamp": "20171104183618"}}}
>
> $ GET http://archive.org/wayback/available?url=$@ | json_pp
> Wide character in print at /usr/bin/json_pp line 82, <STDIN> chunk 1.
It looks like this is working as advertised. From json_pp(1):
"   -json_opt
        options to JSON::PP

        Acceptable options are:

            ascii latin1 utf8 pretty indent space_before space_after relaxed canonical allow_nonref
            allow_singlequote allow_barekey allow_bignum loose escape_slash
"
From JSON::PP(3perl):
"   utf8
        $json = $json->utf8([$enable])

        $enabled = $json->get_utf8

    If $enable is true (or missing), then the encode method will encode the
    JSON result into UTF-8, as required by many protocols, while the decode
    method expects to be handled an UTF-8-encoded string. Please note that
    UTF-8-encoded strings do not contain any characters outside the range
    0..255, they are thus useful for bytewise/binary I/O.

    (In Perl 5.005, any character outside the range 0..255 does not exist.
    See to "UNICODE HANDLING ON PERLS".)

    In future versions, enabling this option might enable autodetection of
    the UTF-16 and UTF-32 encoding families, as described in RFC4627.

    If $enable is false, then the encode method will return the JSON string
    as a (non-encoded) Unicode string, while decode expects thus a Unicode
    string. Any decoding or encoding (e.g. to UTF-8 or UTF-16) needs to be
    done yourself, e.g. using the Encode module.
"
I do agree that the requirement to supply that flag is not intuitive,
although I'm not sure whether this is easily fixable. For some output
formats I can see that it would not make sense to always pass the utf8
flag (for example the second example in the json_pp manpage), but
perhaps it could be a bit cleverer in situations where it ends up
printing non-ASCII characters to the terminal.
I've forwarded this upstream to see whether it is practical to make
this more user friendly.
Dominic.