[Pkg-samba-maint] Bug#470844: Bug#470844: encoding issue with spaces in nmblookup(1) synopsis
Colin Watson
cjwatson at debian.org
Sun Mar 30 16:29:53 UTC 2008
On Sun, Mar 23, 2008 at 09:07:03PM -0700, Steve Langasek wrote:
> On Fri, Mar 14, 2008 at 07:29:07AM +0100, Christian Perrier wrote:
> > Apparently, these are non-breaking spaces encoded in ISO-8859-1.
> > I'm not sure this can be called incorrect. Colin
>
> > I don't see any deep reasons for these spaces to be unbreakable,
> > though. I'd rather see them as regular spaces, which would help the
> > portability of the original manpages.
>
> > Samba's manpages are generated from XML files in upstream's samba-doc
> > SVN. They are included in upstream's distribution when releases are
> > published. The fix should probably go there rather than into a
> > Debian-cooked patch to the generated manpages.
>
> Interestingly, there are no non-breaking spaces in the original HTML sources
> for these documents; so these are entirely an artifact of the translation
> tools (db2man.xsl?) used by upstream.
>
> I've prepared a patch that replaces all of the \xA0 chars in the manpages
> with the "\ " sequence Colin mentions, and will commit to svn shortly; and
> I'm cloning this bug to man-db in case Colin thinks this smashing of 0xA0 is
> something that should be fixed there.
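For illustration, the byte-level substitution Steve describes can be sketched in Python. This is a minimal sketch, not his actual patch: it assumes the generated manpage is ISO-8859-1, where U+00A0 is the single byte 0xA0, and the function name `replace_nbsp` is hypothetical. In troff, `\ ` is an unpaddable space of fixed width, whereas `\~` (the target troffrc uses) can stretch when a line is adjusted.

```python
# Hypothetical sketch: replace ISO-8859-1 no-break space bytes (0xA0)
# in a generated manpage with the troff "\ " escape (unpaddable space).
# Assumption: the input is ISO-8859-1, so U+00A0 is exactly the byte 0xA0.

NBSP_LATIN1 = b"\xa0"
TROFF_UNPADDABLE_SPACE = b"\\ "  # a backslash followed by a space


def replace_nbsp(manpage: bytes) -> bytes:
    """Return a copy of manpage with every 0xA0 byte replaced by '\\ '."""
    return manpage.replace(NBSP_LATIN1, TROFF_UNPADDABLE_SPACE)
```

Applied to a synopsis fragment such as `b"nmblookup\xa0[-M]"`, it yields `b"nmblookup\\ [-M]"`. On UTF-8 input a byte-level replacement like this would corrupt text, since 0xA0 also occurs as a continuation byte inside multibyte sequences.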
I initially thought that this was a bug in groff's fonts (they don't
define a character at position 160), and indeed it is possible to make
this bug go away by changing the fonts.
However, after plugging my brain in, I realised that this was wrong.
Fonts deal with output glyphs, not input characters, and this is a
matter of the correct handling of the input character at position 160 in
ISO-8859-1 (or U+00A0). In fact, /usr/share/groff/current/tmac/troffrc
already transliterates \[char160] to \~ (stretchable non-breaking
space). So what's going on?
It turns out that this is the fault of the multibyte patch. The
translation was being read by troff and applied to char160. However,
with the multibyte patch input character 160 is handled as a wide
character (because its encoding in the input stream may vary).
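The point that the character's encoding in the input stream may vary can be made concrete with Python's codecs: the same code point, 160 (U+00A0), is one byte in ISO-8859-1 and two bytes in UTF-8, so a multibyte-aware reader has to decode its input before it knows which character it has. The snippet only illustrates the encodings; it does not model troff itself.

```python
# U+00A0 NO-BREAK SPACE is one code point, but its byte encoding
# depends on the input charset.
NBSP = "\u00a0"
assert ord(NBSP) == 160

latin1_bytes = NBSP.encode("iso-8859-1")  # one byte: 0xA0
utf8_bytes = NBSP.encode("utf-8")         # two bytes: 0xC2 0xA0

# Decoding with the matching charset gives back the same character.
assert latin1_bytes.decode("iso-8859-1") == NBSP
assert utf8_bytes.decode("utf-8") == NBSP

# Decoding UTF-8 bytes as ISO-8859-1 instead yields two characters,
# U+00C2 followed by U+00A0: the wrong input for a char160 lookup.
misread = utf8_bytes.decode("iso-8859-1")
```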
Characters in the range 128-255 are strange, because, although they are
wide characters, their properties are already stored by ordinary troff
in its charset_table array. Unfortunately, troff wasn't looking these up
properly when the multibyte patch was in effect, and as a result it
ignored the translation requested by troffrc.
Fixing the lookup code in wcharset_table_entry makes the translation
work again. I've applied this fix for my next upload.
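The shape of the bug can be illustrated with a toy Python model. This is an assumption-laden sketch, not groff's code: `charset_table` and `wcharset_table_entry` are the names mentioned above, but their structure here is invented, and `CharInfo`, `wide_table` and the `_buggy`/`_fixed` variants are hypothetical. The model assumes one plausible failure mode: a wide-character lookup that never consults the narrow table for codes 128-255, and so misses the translation troffrc stored for character 160.

```python
# Toy model, NOT groff's implementation. Only the names charset_table and
# wcharset_table_entry come from the report; everything else is hypothetical.
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class CharInfo:
    code: int
    # Target of a .tr-style translation, e.g. "\\~" (backslash-tilde).
    translation: Optional[str] = None


# Ordinary troff keeps properties for codes 0-255 in a fixed-size table.
charset_table: List[CharInfo] = [CharInfo(code) for code in range(256)]

# Hypothetical separate store for wide characters.
wide_table: Dict[int, CharInfo] = {}


def wcharset_table_entry_buggy(code: int) -> CharInfo:
    # Assumed bug: every wide character, including 128-255, is looked up
    # in wide_table, so properties kept in charset_table are never seen.
    return wide_table.setdefault(code, CharInfo(code))


def wcharset_table_entry_fixed(code: int) -> CharInfo:
    # Fix: codes that fit in charset_table are looked up there first.
    if 0 <= code < len(charset_table):
        return charset_table[code]
    return wide_table.setdefault(code, CharInfo(code))


# Roughly what troffrc's transliteration of \[char160] to \~ records.
charset_table[160].translation = "\\~"
```

In this model the buggy lookup for 160 returns an entry with no translation, while the fixed lookup returns the charset_table entry carrying `\~`; codes above 255 behave the same in both variants.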
Cheers,
--
Colin Watson [cjwatson at debian.org]