Changes in 0.1.15 and how to use same code on stable machines

Mon Apr 12 08:19:00 UTC 2010

(Including the list this time.)

On Sun, Apr 11, 2010 at 11:15:55AM +0200, Andreas Tille wrote:
> Hi John,
> 
> On Sat, Apr 10, 2010 at 02:37:01PM -0600, John Wright wrote:
> > How about
> > 
> >     printstring = stanza[field].decode('utf-8')
> > 
> > As far as I have tested, both str and unicode have a decode method, and
> > it looks like unicode doesn't really care about what argument you give
> > it.
> 
> I have tried this previously and it also fails with
> 
>   UnicodeEncodeError: 'ascii' codec can't encode character u'\xf6' in position 42: ordinal not in range(128)

Ah, bummer.  I was trying with a unicode string that contained only
ascii characters...
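
For the record, here is a minimal sketch of why my test passed and yours
did not.  In Python 2, calling .decode() on a unicode object first encodes
it to a byte string with the default 'ascii' codec and only then decodes
it.  The snippet spells that implicit step out by hand so it runs on any
Python version; the sample values are made up for illustration.

```python
# -*- coding: utf-8 -*-
# Spells out what Python 2 does implicitly for some_unicode.decode('utf-8'):
# the unicode object is first encoded with the default 'ascii' codec.
ascii_only = u'Andreas Tille'   # hypothetical field value, ASCII only
non_ascii = u'J\xf6rg'          # hypothetical field value containing u'\xf6'

# ASCII-only text survives the implicit encode step, so the call seems to work.
assert ascii_only.encode('ascii').decode('utf-8') == ascii_only

# Non-ASCII text fails in the encode step, before any decoding happens.
try:
    non_ascii.encode('ascii').decode('utf-8')
    failed = False
except UnicodeEncodeError:
    failed = True
```

That is why the traceback reports a UnicodeEncodeError even though the
code only asked for a decode.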

> > The other option is checking whether the object is unicode or str type,
> > but I think the above works, and is cleaner.
> 
> I would really like to have a clean solution but this does fail as well
> and I admit all these encoding issues are by far the most frustrating
> issues and are consuming about half the debugging time when dealing with 
> non-ASCII content. :-(
> 
> I'd be happy about any other suggestion

Well, it looks like you'll have to change every usage of stanza[field]
anyway, so how about this helper function:

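    def to_unicode(value, encoding='utf-8'):
        # Byte strings (str) must be decoded; anything else, including
        # unicode, goes through unicode(), which leaves unicode unchanged.
        if isinstance(value, str):
            return value.decode(encoding)
        else:
            return unicode(value)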

Then, everywhere you would have previously used something like

    printstring = unicode(stanza[field], 'utf-8')

instead use

    printstring = to_unicode(stanza[field], 'utf-8')

(or you can omit the 'utf-8' argument, since it is the default).  It's
still not perfectly elegant, but it reads and should behave just like your
original code, and it should also work with python-debian >= 0.1.15.
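
A quick way to check the helper against both kinds of input is sketched
below.  It assumes, as this thread implies, that older python-debian hands
back byte strings while >= 0.1.15 hands back unicode.  The text_type and
bytes_type aliases and the sample values are my own additions, there only
so the sketch also runs on a Python 3 interpreter.

```python
import sys

# Assumption: illustrative aliases so the same sketch runs on Python 2 and 3.
if sys.version_info[0] >= 3:
    text_type, bytes_type = str, bytes
else:
    text_type, bytes_type = unicode, str


def to_unicode(value, encoding='utf-8'):
    """Return value as text, decoding it only if it is a byte string."""
    if isinstance(value, bytes_type):
        return value.decode(encoding)
    return text_type(value)


# Hypothetical field values: the same name as bytes and as text.
old_style = u'M\xf6ller'.encode('utf-8')   # byte string (assumed old behaviour)
new_style = u'M\xf6ller'                   # unicode (assumed >= 0.1.15 behaviour)

# Both inputs end up as the same text object.
assert to_unicode(old_style) == to_unicode(new_style) == u'M\xf6ller'
```

Because the type check happens before any decoding, text input never goes
through the implicit ASCII encode that caused the original traceback.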

-- 
John Wright <jsw at debian.org>