Bug#655335: HTML parsing now breaks on entities and mismatched tags
Josh Triplett
josh at joshtriplett.org
Tue Jan 10 13:30:15 UTC 2012
Package: get-flash-videos
Version: 1.25~git2011.09.26-2
Severity: normal
At some point recently, get-flash-videos started breaking whenever it
tries to parse HTML. It complains about improperly paired tags, fails
to parse standard HTML entities like &nbsp; and &uarr;, and it tries to
parse the && in JavaScript as an entity.
If it matters, this occurred when attempting to use get-flash-videos on
CollegeHumor URLs. For example:
$ ./get-flash-videos 'http://www.collegehumor.com/video/3505939/font-conference'
Downloading http://www.collegehumor.com/video/3505939/font-conference
Using method 'collegehumor' for http://www.collegehumor.com/video/3505939/font-conference
Error: :39: parser error : Opening and ending tag mismatch: meta line 4 and head
</head>
^
:207: parser error : Entity 'uarr' not defined
<div id="btn_upload" class="button"><a href="/submit">Submit &uarr;</a><
^
:222: parser error : Entity 'nbsp' not defined
<a href="javascript:void(0);" class="close" id="login_cancel">&nbsp;</a>
^
:684: parser error : Entity 'copy' not defined
<p>&copy; 2012 Connected Ventures, LLC. All rights reserved. | Broug
^
:760: parser error : xmlParseEntityRef: no name
if(e.target && e.target.nodeName == 'IFRAME') {
^
:760: parser error : xmlParseEntityRef: no name
if(e.target && e.target.nodeName == 'IFRAME') {
^
:838: parser error : Opening and ending tag mismatch: head line 3 and html
</html>
^
:839: parser error : Premature end of data in tag html line 2
^
(from FlashVideo::Site::Collegehumor::./get-flash-videos::1512)
I don't know whether get-flash-videos has changed how it invokes libxml,
or whether libxml's HTML parsing has broken.
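The messages above ("Entity 'nbsp' not defined", "xmlParseEntityRef: no name",
tag mismatch on an unclosed meta) are what a strict XML parser reports when it
is handed ordinary HTML, so the page may be going through an XML-mode parse
rather than an HTML-mode one. The sketch below is only an illustration of that
difference using Python's standard library, not get-flash-videos code and not
libxml itself; the sample snippets and helper names are made up for the example.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical snippets modelled on the three kinds of failure in the log.
ENTITIES = '<p>Submit &uarr;&nbsp;&copy; 2012</p>'
BARE_AMP = "<script>if (a && b) { run(); }</script>"
VOID_TAG = '<head><meta charset="utf-8"></head>'


def parse_as_xml(markup):
    """Parse markup as XML; return the error text, or None on success."""
    try:
        ET.fromstring(markup)
    except ET.ParseError as exc:
        return str(exc)
    return None


class TextCollector(HTMLParser):
    """Collect character data, decoding named entities along the way."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def parse_as_html(markup):
    """Parse markup as HTML and return the concatenated text content."""
    collector = TextCollector()
    collector.feed(markup)
    collector.close()
    return "".join(collector.chunks)


if __name__ == "__main__":
    for sample in (ENTITIES, BARE_AMP, VOID_TAG):
        print("XML :", parse_as_xml(sample))
        print("HTML:", repr(parse_as_html(sample)))
```

In XML mode only the five predefined entities (amp, lt, gt, quot, apos) are
known without a DTD, a bare & must start an entity reference, and every
element must be closed, which matches all three classes of error in the log.
An HTML-mode parser accepts the same input.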
- Josh Triplett
-- System Information:
Debian Release: wheezy/sid
APT prefers unstable
APT policy: (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)
Kernel: Linux 3.1.0-1-amd64 (SMP w/4 CPU cores)
Locale: LANG=C.UTF-8, LC_CTYPE=C.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/dash
Versions of packages get-flash-videos depends on:
ii libdata-amf-perl 0.09-3
ii libhtml-parser-perl 3.69-1+b1
ii libtie-ixhash-perl 1.21-2
ii liburi-perl 1.59-1
ii libwww-mechanize-perl 1.71-1
ii libwww-perl 6.03-1
ii perl 5.14.2-6
ii rtmpdump 2.4+20111222.git4e06e21-1
Versions of packages get-flash-videos recommends:
ii get-iplayer <none>
ii libcrypt-rijndael-perl 1.08-1+b2
ii liblwp-protocol-socks-perl <none>
ii libxml-simple-perl 2.18-3
Versions of packages get-flash-videos suggests:
ii mplayer 2:1.0~rc4.dfsg1+svn33713-5
-- no debconf information