[xml/sgml-pkgs] Bug#652866: incorrectly formats URI's (URN's) with :-delimited paths

Ivan Shmakov oneingray at gmail.com
Wed Dec 21 08:02:25 UTC 2011


Package: libxml2-dev
Version: 2.7.8.dfsg-2

	[Although I'm filing this bug against an older version of the
	package, I can't find anything in the Debian changelog of
	2.7.8.dfsg-2+squeeze1 that suggests the behavior described
	below has changed.]

	The xmlSaveUri () function appears to format URIs with
	:-delimited paths incorrectly: it adds a superfluous // (which
	is simple to resolve, see below) and %-encodes the :'s
	themselves (much harder, I guess), which effectively prevents
	urn:-scheme URNs from being used (e.g., in catalogs).

	Consider the output of the attached example program:

URI             urn:example:animal:ferret:nose
scheme          urn
opaque          (null)
authority       (null)
server          (null)
user            (null)
port            0
path            example:animal:ferret:nose
query           (null)
fragment        (null)
cleanup         0
query_raw       (null)
xmlSaveUri      urn://example%3Aanimal%3Aferret%3Anose

	Cf. the example in RFC 3986, section 3 [1]:

--cut--
         foo://example.com:8042/over/there?name=ferret#nose
         \_/   \______________/\_________/ \_________/ \__/
          |           |            |            |        |
       scheme     authority       path        query   fragment
          |   _____________________|__
         / \ /                        \
         urn:example:animal:ferret:nose
--cut--

	As the example URN has no authority part, it shouldn't have the
	// separator either.
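
	For illustration only, here is a minimal sketch of the component
	recomposition described in RFC 3986, section 5.3, where // is
	emitted only when an authority is defined.  The recompose_uri ()
	name and its signature are hypothetical and are not part of
	libxml2; a NULL pointer stands for an undefined component.

```c
#include <stdio.h>

/* Recompose URI components along the lines of RFC 3986, section 5.3.
 * Hypothetical helper, not a libxml2 function.
 * A NULL pointer means "component undefined"; path must be non-NULL
 * (it may be the empty string).  Like snprintf (), this returns the
 * length the full result needs, excluding the terminating NUL. */
static int
recompose_uri(char *out, size_t size,
              const char *scheme, const char *authority,
              const char *path, const char *query, const char *fragment)
{
    return snprintf(out, size, "%s%s%s%s%s%s%s%s%s",
                    /* scheme and its ":" delimiter */
                    scheme ? scheme : "", scheme ? ":" : "",
                    /* "//" only when an authority is defined */
                    authority ? "//" : "", authority ? authority : "",
                    path,
                    query ? "?" : "", query ? query : "",
                    fragment ? "#" : "", fragment ? fragment : "");
}
```

	With scheme "urn", no authority, and the path
	example:animal:ferret:nose, this gives
	urn:example:animal:ferret:nose.  An empty (but non-NULL)
	authority still produces the //, as in file:///etc/xml/catalog.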

[1] http://tools.ietf.org/html/rfc3986#section-3

	The relevant parts of the code (as of 2ee91eb6) seem to be:

   999	xmlChar *
  1000	xmlSaveUri(xmlURIPtr uri) {
…
  1019	    if (uri->scheme != NULL) {
… formatting the scheme…
  1047	    }
  1048	    if (uri->opaque != NULL) {
…
  1072	    } else {
  1073		if (uri->server != NULL) {
… adding //[USER@]SERVER[:PORT]
  1161		} else if (uri->authority != NULL) {
… adding //AUTHORITY
  1203		} else if (uri->scheme != NULL) {
…
  1216		    ret[len++] = '/';
  1217		    ret[len++] = '/';

	Here, we've added the superfluous // part.  Arguably, it should
	only be done for the file: scheme, and even then, it may be
	worth using an explicit empty string for uri->server instead.

  1218		}
  1219		if (uri->path != NULL) {
…
  1245		    while (*p != 0) {
…
  1258			if ((IS_UNRESERVED(*(p))) || ((*(p) == '/')) ||
  1259	                    ((*(p) == ';')) || ((*(p) == '@')) || ((*(p) == '&')) ||
  1260		            ((*(p) == '=')) || ((*(p) == '+')) || ((*(p) == '$')) ||
  1261		            ((*(p) == ',')))

	Note that the :'s aren't in the list above.

  1262			    ret[len++] = *p++;
  1263			else {
  1264			    int val = *(unsigned char *)p++;
  1265			    int hi = val / 0x10, lo = val % 0x10;
  1266			    ret[len++] = '%';
  1267			    ret[len++] = hi + (hi > 9? 'A'-10 : '0');
  1268			    ret[len++] = lo + (lo > 9? 'A'-10 : '0');

	And here, we're %-encoding the :'s.

	This issue is harder to overcome, however: unless
	uri->cleanup |= 2 is set before parsing,
	xmlURIUnescapeString () is called on the path part of the
	URI, which makes %-encoded :'s indistinguishable from the :'s
	used as URN path delimiters.

  1269			}
  1270		    }
  1271		}
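
	To make the two observations about the path concrete, here is
	a rough, self-contained sketch.  It is not the libxml2 code: the
	names is_path_char (), escape_path () and unescape_path () are
	hypothetical.  It assumes the RFC 3986 "pchar" set (which includes
	: and @) plus / should be left unescaped, and it shows that once a
	path has been unescaped, %3A and a literal : can no longer be told
	apart.

```c
#include <stddef.h>
#include <string.h>

/* Non-zero if c may appear literally in a path: the RFC 3986 "pchar"
 * set (unreserved / sub-delims / ":" / "@") plus the "/" separator.
 * Hypothetical helper, not part of libxml2. */
static int
is_path_char(unsigned char c)
{
    if ((c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
        (c >= '0' && c <= '9'))
        return 1;
    return c != 0 && strchr("-._~!$&'()*+,;=:@/", c) != NULL;
}

/* %-encode every byte of path that is not a path character.
 * Output is silently truncated if it does not fit into size bytes.
 * Returns the number of bytes written, excluding the NUL. */
static size_t
escape_path(const char *path, char *out, size_t size)
{
    static const char hex[] = "0123456789ABCDEF";
    size_t len = 0;

    if (size == 0)
        return 0;
    for (const unsigned char *p = (const unsigned char *)path; *p != 0; p++) {
        if (is_path_char(*p)) {
            if (len + 1 >= size)
                break;
            out[len++] = (char)*p;
        } else {
            if (len + 3 >= size)
                break;
            out[len++] = '%';
            out[len++] = hex[*p >> 4];
            out[len++] = hex[*p & 0x0F];
        }
    }
    out[len] = '\0';
    return len;
}

/* Value of a hexadecimal digit, or -1 if c is not one. */
static int
hex_value(unsigned char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* Decode %XX sequences; a malformed "%" is copied through as is.
 * After this step, "%3A" and ":" yield the same byte. */
static size_t
unescape_path(const char *in, char *out, size_t size)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t len = 0;

    if (size == 0)
        return 0;
    while (*p != 0 && len + 1 < size) {
        int hi, lo;

        if (p[0] == '%' && (hi = hex_value(p[1])) >= 0 &&
            (lo = hex_value(p[2])) >= 0) {
            out[len++] = (char)(hi * 16 + lo);
            p += 3;
        } else {
            out[len++] = (char)*p++;
        }
    }
    out[len] = '\0';
    return len;
}
```

	Under these assumptions, escape_path () leaves
	example:animal:ferret:nose untouched, while unescape_path ()
	maps both that string and example%3Aanimal%3Aferret%3Anose to the
	same result.  Adding : to the unescaped set therefore fixes the
	URN case, but cannot by itself restore a : that was %-encoded in
	the input.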

-- 
FSF associate member #7257
-------------- next part --------------
A non-text attachment was scrubbed...
Name: not available
Type: text/x-csrc
Size: 1505 bytes
Desc: not available
URL: <http://lists.alioth.debian.org/pipermail/debian-xml-sgml-pkgs/attachments/20111221/f54b9c7c/attachment.c>

