[xml/sgml-pkgs] Bug#509967: libxml2 should accept any character in anyURI data (with Relax NG)

Vincent Lefevre vincent at vinc17.org
Sun Dec 28 02:15:27 UTC 2008


Package: libxml2
Version: 2.6.32.dfsg-5
Severity: normal

When validating a file against a Relax NG schema, libxml2 rejects some
characters (e.g. the space and non-ASCII characters) in anyURI data, although
the specification allows them. Indeed, http://www.w3.org/TR/xmlschema-2/#anyURI says:

--------
  3.2.17.1 Lexical representation

  The ·lexical space· of anyURI is finite-length character sequences which,
  when the algorithm defined in Section 5.4 of [XML Linking Language] is
  applied to them, result in strings which are legal URIs according to
  [RFC 2396], as amended by [RFC 2732].

    Note: Spaces are, in principle, allowed in the ·lexical space· of
    anyURI, however, their use is highly discouraged (unless they are
    encoded by %20).
--------

The goal of the algorithm defined in Section 5.4 of [XML Linking Language]
is precisely to encode/escape these disallowed characters in order to form
a valid URI.
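
As a rough illustration of that escaping step, here is a minimal Python
sketch. It assumes my reading of Section 5.4: every non-ASCII character, and
every character excluded by RFC 2396 section 2.4.3 except '#', '%', '[' and
']', is replaced by the %HH escapes of its UTF-8 bytes. The name xlink_escape
is hypothetical; it is not a libxml2 function.

```python
# Assumption: this set is transcribed from RFC 2396 section 2.4.3
# (control characters, space, delims, unwise), leaving out '#', '%',
# '[' and ']', which are not escaped by the XLink algorithm.
_EXCLUDED = set('<>"{}|\\^`') | {chr(c) for c in range(0x00, 0x21)} | {"\x7f"}


def xlink_escape(value: str) -> str:
    """Escape disallowed characters as %HH sequences of their UTF-8 bytes."""
    out = []
    for ch in value:
        if ord(ch) > 0x7F or ch in _EXCLUDED:
            # One %HH triplet per UTF-8 byte of the disallowed character.
            out.extend("%{:02X}".format(b) for b in ch.encode("utf-8"))
        else:
            # Allowed characters, including existing %HH escapes, pass through.
            out.append(ch)
    return "".join(out)
```

With this sketch, "http://example.org/a b" becomes "http://example.org/a%20b",
so a validator would only need to check the escaped string against the URI
grammar.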

There are still some restrictions, since the escaped result must be a legal
URI according to RFC 2396 / RFC 2732. For instance, a space must not appear
in the scheme part.
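
To make the scheme restriction concrete, here is a small Python sketch that
checks only the scheme part of an already escaped string. It assumes the
RFC 2396 rule that a scheme is a letter followed by letters, digits, '+',
'-' or '.', so a "%20" left in that position cannot be legal. The helper
name scheme_is_legal is hypothetical, and this is not a full URI validator.

```python
import re

# RFC 2396: scheme = alpha *( alpha | digit | "+" | "-" | "." )
_SCHEME = re.compile(r"[A-Za-z][A-Za-z0-9+.-]*\Z")


def scheme_is_legal(uri: str) -> bool:
    """Return False only when the text before the first ':' cannot be a scheme."""
    # Only a ':' that comes before any '/', '?' or '#' can end a scheme.
    head = re.split(r"[/?#]", uri, maxsplit=1)[0]
    if ":" not in head:
        return True  # no scheme present: a relative reference, not checked here
    return _SCHEME.match(head.split(":", 1)[0]) is not None
```

For example, scheme_is_legal("http://example.org/a%20b") is true, while
scheme_is_legal("ht%20tp://example.org/") is false: escaping a space in the
path gives a usable URI, but escaping one in the scheme does not.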

Note that nXML mode for Emacs seems to handle this correctly.

-- System Information:
Debian Release: 5.0
  APT prefers unstable
  APT policy: (500, 'unstable'), (500, 'stable'), (1, 'experimental')
Architecture: amd64 (x86_64)

Kernel: Linux 2.6.26.5-20080922 (SMP w/2 CPU cores; PREEMPT)
Locale: LANG=POSIX, LC_CTYPE=en_US.ISO8859-1 (charmap=ISO-8859-1)
Shell: /bin/sh linked to /bin/bash

Versions of packages libxml2 depends on:
ii  libc6                  2.7-16            GNU C Library: Shared libraries
ii  zlib1g                 1:1.2.3.3.dfsg-12 compression library - runtime

Versions of packages libxml2 recommends:
ii  xml-core                      0.12       XML infrastructure and XML catalog

libxml2 suggests no packages.

-- no debconf information




