Bug#381359: very hard to handle relative links in feeds
Joey Hess
joeyh at debian.org
Thu Aug 3 21:15:44 UTC 2006
Package: libxml-feed-perl
Version: 0.10-1
Severity: normal
If I'm parsing a feed with XML::Feed and it happens to contain a
relative link, I'm somewhat out of luck if I want to correctly turn that
into an absolute link:
* Maybe it's an Atom feed that uses xml:base attributes to set the base
url used to derelativise links. But if it does, there seems to be no way
to get at that info once XML::Feed has parsed the feed.
* Maybe a Content-Location HTTP header is used. But there's no way to
tell once XML::Feed has downloaded the feed.
* Maybe neither of the above is true, and so I have to fall back to
poorly defined heuristics like using the url of the feed itself as the
base url. And then I have to munge the content HTML myself, as well as
check for relative links in the feed's own link attribute, in the link
attributes of individual entries in the feed, etc.
So 2/3 of the time it's impossible and 1/3 of the time it's enormously
painful and probably not possible to do right anyway. Ugh.
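For reference, here is a minimal sketch of the precedence the three cases above imply, written in Python with only the standard library (it is not XML::Feed's API, which does not expose this information). It starts from the feed url, lets a Content-Location header override it, then lets each enclosing xml:base refine the result. The names `document_base` and `absolutize_links` are hypothetical, and only Atom `<link href>` attributes are rewritten; links inside content HTML are left alone.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urljoin

# ElementTree reports xml:base under the reserved XML namespace.
XML_BASE = "{http://www.w3.org/XML/1998/namespace}base"
ATOM = "{http://www.w3.org/2005/Atom}"


def document_base(feed_url, content_location=None):
    """Hypothetical helper: base url for the whole feed document.

    A Content-Location header (which may itself be relative) wins over the
    url the feed was fetched from; otherwise fall back to the feed url.
    """
    if content_location:
        return urljoin(feed_url, content_location)
    return feed_url


def absolutize_links(elem, base):
    """Hypothetical helper: rewrite Atom <link href> in place, respecting xml:base."""
    # An xml:base on this element (possibly relative) refines the inherited
    # base; urljoin(base, "") returns base unchanged when there is none.
    base = urljoin(base, elem.get(XML_BASE, ""))
    if elem.tag == ATOM + "link" and "href" in elem.attrib:
        elem.set("href", urljoin(base, elem.get("href")))
    for child in elem:
        absolutize_links(child, base)
```

With `xml:base="http://example.org/blog/"` on `<feed>` and `xml:base="2006/"` on an `<entry>`, an entry link of `post.html` comes out as `http://example.org/blog/2006/post.html`, whatever url the feed was fetched from.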
XML::Feed should hide all this insane complexity and ugliness from the
user by fixing up all relative urls in feeds.
Here's how the python feed parser does it:
http://feedparser.org/docs/resolving-relative-links.html
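The feedparser page linked above describes also rewriting URI-valued attributes inside embedded HTML content. Below is a rough Python sketch of that idea, assuming the effective base url for the content is already known. It re-serializes the HTML with the standard `html.parser` module and resolves a small, assumed set of attributes (`URI_ATTRS` is illustrative, not feedparser's actual list). Comments, doctypes and processing instructions are dropped, and `<script>`/`<style>` contents are not special-cased.

```python
from html import escape
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed, illustrative subset of (tag, attribute) pairs that hold urls.
URI_ATTRS = {("a", "href"), ("img", "src"), ("link", "href"),
             ("script", "src"), ("iframe", "src"), ("form", "action")}


class RelativeLinkResolver(HTMLParser):
    """Hypothetical helper: echoes HTML back with url attributes made absolute."""

    def __init__(self, base):
        super().__init__(convert_charrefs=True)
        self.base = base
        self.out = []

    def _attrs(self, tag, attrs):
        parts = []
        for name, value in attrs:
            if value is None:  # bare attribute such as <input disabled>
                parts.append(" " + name)
                continue
            if (tag, name) in URI_ATTRS:
                value = urljoin(self.base, value.strip())
            # The parser has already unescaped the value, so escape it again.
            parts.append(' %s="%s"' % (name, escape(value, quote=True)))
        return "".join(parts)

    def handle_starttag(self, tag, attrs):
        self.out.append("<%s%s>" % (tag, self._attrs(tag, attrs)))

    def handle_startendtag(self, tag, attrs):
        self.out.append("<%s%s />" % (tag, self._attrs(tag, attrs)))

    def handle_endtag(self, tag):
        self.out.append("</%s>" % tag)

    def handle_data(self, data):
        # Character references were decoded (convert_charrefs=True); re-escape.
        self.out.append(escape(data, quote=False))


def resolve_html(html_text, base):
    parser = RelativeLinkResolver(base)
    parser.feed(html_text)
    parser.close()
    return "".join(parser.out)
```

For example, with a base of `http://example.org/blog/2006/`, `<a href="../about.html">` becomes `<a href="http://example.org/blog/about.html">`, while already absolute urls pass through unchanged.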
--
see shy jo
More information about the pkg-perl-maintainers mailing list