Bug#463774: does not implement RSS 2.0 guid isPermaLink properly; hides guids

Joey Hess joeyh at debian.org
Sun Feb 3 07:28:32 UTC 2008


Package: libxml-rss-perl
Version: 1.31-3
Severity: normal
Tags: patch

This bug seems to have first appeared in version 1.30.

Consider a feed such as this one, which lists the music I listen to:
http://ws.audioscrobbler.com/1.0/user/joeyhess/recenttracks.rss

      <item>
         <title>Foo Fighters – Come Alive</title>
         <link>http://www.last.fm/music/Foo+Fighters/_/Come+Alive</link>
         <pubDate>Sun, 27 Jan 2008 05:07:00 +0000</pubDate>
         <guid>http://www.last.fm/user/joeyhess/#1201410420</guid>
         <description>http://www.last.fm/music/Foo+Fighters</description>
      </item>

If I parse this using XML::RSS, this happens:
joey at kodama:~>perl -le 'use XML::RSS; local $/=undef; $feed=<>;
	$r=XML::RSS->new(version => "1.0"); $r->parse($feed);
	print "link: ".$r->{items}->[0]->{link};
	print "guid: ".$r->{items}->[0]->{guid}' < recenttracks.rss
link: http://www.last.fm/music/Foo+Fighters/_/Come+Alive
guid: 

In this feed, the link points to the song, which I might play multiple
times; that is what the guid is for, since it differs for each play.
Because I get back the same link every time and can't see the guid, there
is no way to tell one play of the song from another.
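To make the problem concrete, here is a minimal Perl sketch, assuming two
plays of the same track shaped like the item above (the first guid is the
one from the feed; the second is made up for illustration). It counts
distinct values per key: keying on link collapses the two plays into one,
while keying on guid keeps them apart. The helper distinct_count is
hypothetical, not part of XML::RSS.

```perl
use strict;
use warnings;

# Two plays of the same song: identical <link>, distinct <guid>.
# The first guid comes from the feed above; the second is hypothetical.
my @items = (
    { link => 'http://www.last.fm/music/Foo+Fighters/_/Come+Alive',
      guid => 'http://www.last.fm/user/joeyhess/#1201410420' },
    { link => 'http://www.last.fm/music/Foo+Fighters/_/Come+Alive',
      guid => 'http://www.last.fm/user/joeyhess/#1201410720' },
);

# Count how many distinct values the given key takes across @list.
sub distinct_count {
    my ($key, @list) = @_;
    my %seen;
    $seen{ $_->{$key} } = 1 for @list;
    return scalar keys %seen;
}

print "distinct links: ", distinct_count('link', @items), "\n";    # 1
print "distinct guids: ", distinct_count('guid', @items), "\n";    # 2
```

If the parser hands back an empty guid, only the first count is available
to a consumer, and the two plays look like one.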

Here's the culprit:

        # guid element is a permanent link unless isPermaLink attribute
        # is set to false
    }
    elsif ($el eq 'guid') {
        $self->{'items'}->[$self->{num_items} - 1]->{'isPermaLink'} =
          !(exists($attribs{'isPermaLink'}) && ($attribs{'isPermaLink'} eq 'false'));

        # beginning of taxo li element in item element
        #'http://purl.org/rss/1.0/modules/taxonomy/' => 'taxo'
    }

This is just wrong. The RSS 2.0 spec says:

    If the guid element has an attribute named "isPermaLink" with a value of
    true, the reader may assume that it is a permalink to the item

The above code is exactly backwards from the spec: it assumes that the
guid is a permalink unless isPermaLink=false. According to the spec, the
guid doesn't even have to be a URL, so this is very wrong. It can be fixed
as follows. (I also added an lc, because attributes should probably be
parsed case-insensitively.) Note that I had to patch the test suite, since
this does change behavior: the test suite was testing for the same
incorrect reading of the spec.

Index: t/2.0-permalink.t
===================================================================
--- t/2.0-permalink.t	(revision 13998)
+++ t/2.0-permalink.t	(working copy)
@@ -21,9 +21,8 @@
 );
 
 # TEST
-is ($item_with_guid_missing->{"permaLink"}, 
-    "http://community.livejournal.com/lj_dev/713810.html",
-    "guid's isPermaLink is missing, so the item permalink property should be set to the value of the guid tag"
+ok ((!$item_with_guid_missing->{"permaLink"}),
+    "guid's isPermaLink is missing (implicitly false), so the item permalink property should not be set"
 );
 
 # TEST
Index: lib/XML/RSS.pm
===================================================================
--- lib/XML/RSS.pm	(revision 13998)
+++ lib/XML/RSS.pm	(working copy)
@@ -786,11 +786,12 @@
             }
         }
 
-        # guid element is a permanent link unless isPermaLink attribute is set to false
+        # guid element is a permanent link IFF isPermaLink attribute is set
+        # to true
     }
     elsif ($el eq 'guid') {
         $self->{'items'}->[$self->{num_items} - 1]->{'isPermaLink'} =
-          !(exists($attribs{'isPermaLink'}) && ($attribs{'isPermaLink'} eq 'false'));
+           (exists($attribs{'isPermaLink'}) && (lc($attribs{'isPermaLink'}) eq 'true'));
 
         # beginning of taxo li element in item element
         #'http://purl.org/rss/1.0/modules/taxonomy/' => 'taxo'
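
To see exactly where the old and patched conditions diverge, here is a
standalone Perl sketch. The sub names old_guid_is_permalink and
guid_is_permalink are hypothetical helpers, not part of XML::RSS; each one
simply wraps one side of the lib/XML/RSS.pm hunk above, taking the
attributes as key/value pairs the way %attribs holds them.

```perl
use strict;
use warnings;

# Hypothetical helpers (not XML::RSS API) wrapping the two conditions
# from the lib/XML/RSS.pm hunk, so they can be compared directly.
sub old_guid_is_permalink {
    my (%attribs) = @_;
    # Old rule: permalink unless isPermaLink is exactly "false".
    return !(exists $attribs{'isPermaLink'}
             && $attribs{'isPermaLink'} eq 'false') ? 1 : 0;
}

sub guid_is_permalink {
    my (%attribs) = @_;
    # Patched rule: permalink only if isPermaLink is "true", in any case.
    return (exists $attribs{'isPermaLink'}
            && lc($attribs{'isPermaLink'}) eq 'true') ? 1 : 0;
}

for my $value (undef, 'true', 'TRUE', 'false', 'FALSE') {
    my @attrs = defined $value ? (isPermaLink => $value) : ();
    my $label = defined $value ? "isPermaLink=\"$value\"" : '(no attribute)';
    printf "%-22s old=%d new=%d\n",
        $label, old_guid_is_permalink(@attrs), guid_is_permalink(@attrs);
}
```

With these inputs the two rules disagree only when the attribute is
missing or written as FALSE. The missing-attribute case is the one the
last.fm feed hits, since its guid carries no isPermaLink at all.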

-- System Information:
Debian Release: lenny/sid
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'testing'), (500, 'stable'), (1, 'experimental')
Architecture: i386 (i686)

Kernel: Linux 2.6.24-1-686 (SMP w/1 CPU core)
Locale: LANG=en_US.UTF-8, LC_CTYPE=en_US.UTF-8 (charmap=UTF-8)
Shell: /bin/sh linked to /bin/bash

Versions of packages libxml-feed-perl depends on:
ii  libclass-errorhandler-perl 0.01-2        Base class for error handling
ii  libdatetime-format-mail-pe 0.3001-1      Convert between DateTime and RFC28
ii  libdatetime-format-w3cdtf- 0.04-2        Parse and format W3CDTF datetime s
ii  libdatetime-perl           2:0.41-1      perl DateTime - Reference implemen
ii  libfeed-find-perl          0.06-2        Syndication feed auto-discovery
ii  libhtml-parser-perl        3.56-1        A collection of modules that parse
ii  liburi-fetch-perl          0.08-1        Smart URI fetching/caching
ii  liburi-perl                1.35.dfsg.1-1 Manipulates and accesses URI strin
ii  libwww-perl                5.808-1       WWW client/server library for Perl
ii  libxml-atom-perl           0.25-2        Atom feed and API implementation
ii  libxml-rss-perl            1.31-3        Perl module for managing RSS (RDF 
ii  perl                       5.8.8-12      Larry Wall's Practical Extraction 

libxml-feed-perl recommends no packages.

-- no debconf information

-- 
see shy jo


More information about the pkg-perl-maintainers mailing list