[Popcon-developers] stable Packages.gz is UTF-8

Bill Allombert Bill.Allombert at math.u-bordeaux1.fr
Mon May 5 20:03:15 UTC 2008


On Mon, May 05, 2008 at 08:55:32PM +0200, Petter Reinholdtsen wrote:
> [Bill Allombert]
> > So I would like to know the specific issue you met, and to fix it
> > properly.
> 
> The issue I ran into was that popcon.debian.org was no longer being
> updated because popcon.pl crashed.  I tracked it down to a problem
> with reading the Packages.gz file as UTF-8 and finding a non-UTF-8
> character.  I solved it by picking the 8-bit charset that seemed to
> match the file best, ISO-8859.1.  Any idea how to guess charset when
> the content is mixed?

ISO-8859-1 does not match 'best' at all. It just so happens that any byte
sequence is valid ISO-8859-1, so you will not get an error, but a broken
result instead.

The file is (according to Debian policy) in UTF-8, so we should use
:encoding(UTF-8) instead of :encoding(ISO-8859-1), but not :utf8. This
way non-UTF-8 characters get replaced by the � character.

Done in CVS.

Cheers,
-- 
Bill. <ballombe at debian.org>

Imagine a large red swirl here. 


