DEP14 policy for two dots
Ian Jackson
ijackson at chiark.greenend.org.uk
Thu Nov 3 19:37:41 UTC 2016
Nish Aravamudan writes ("DEP14 policy for two dots"):
> [ Raphael, apologies for sending twice, had an error in the headers in
> the prior one ]
>
> Not sure exactly where to ask this better than debian-devel, but I am
> working on an importer for the Ubuntu Server team which parses published
> versions of source packages in Debian and Ubuntu. I ran into an issue
> today where there is a published version of src:pcre3 with version
> '8.30..2'. `man git-check-ref-format` says that reference names "cannot
> have two consecutive dots .. anywhere." DEP14 specifies appropriate
> substitutions for : and ~, but it seems like .. should also be accounted
> for so I can correctly tag historic versions?
Urk. How exciting. I think we may need a more general escaping
scheme for these and other weirdnesses.
I have an interest as dgit uses DEP-14 tag escaping. I have CC'd the
vcs-pkg list.
tl;dr: I think we should insert `#' characters as needed.
Looking at git-check-ref-format(1) and
https://wiki.debian.org/Punctuation:
  .      special to git, generally permitted in versions,
         and we want it usually to be literal - this is our problem
  ~      special to git, permitted in versions, handled by DEP-14 as _
  :      special to git, epoch in versions, handled by DEP-14 as %
  @      special to git (although sometimes allowed), forbidden in versions
  % _    not special to git but already used by DEP-14
  # , =
         not mentioned in the git manual as special, forbidden in versions
  ]      not special to git, although [ is so let's not, eh?
  + -    not special to git, permitted in versions
  " ' $ & ( ) * ; < > ? `
         not mentioned in the git manual but troublesome shell
         metacharacters which we would be insane to use here
  [ / { }
         interpreted specially by git some of the time,
         forbidden in versions - not really useful
  ^ ? * \
         all of these are forbidden by git, not permitted in versions
So I think in fact the only thing we have a problem with is multiple
dots. Looking at the summary above, we have the choice of one of
these:
  #  Its use as a shell comment character is fine, because when inside
     a version tag it is always preceded by some string like
     "debian/" or "upstream/".  We would almost never need to put it
     at the start of the encoded version string anyway, and we have
     already tolerated a similar situation with ~.
     There is possible confusion with HTML fragment identifiers, and
     possibly in languages other than shell which use # for
     comments (although hopefully they aren't dealing with our versions
     as literals anyway).
     Proposed rule:
       Insert "#":
         - between each pair of adjacent dots
         - after any trailing dot
         - before any leading dot
     Examples:
         8.30..2   =>  8.30.#.2
         8.30.     =>  8.30.#
         .42       =>  #.42
  ,  I would like to avoid this because lots of people are probably
     using it as a list separator in ways that are difficult for us
     to predict.  If we used it, I would suggest the same as for #.
  =  In principle we could use this.  I don't like it for a similar
     reason to above.  If we did use it it might look a bit like
     Q-P encoding in some contexts.
  @  We could use this although I wouldn't like to rely on the fact
     that git dislikes `@{' and `@' but not @ followed by other
     things.
  %  Reusing this is tempting because an epoch separator can never
     follow `.', so any `%' after any `.' would unambiguously mean
     `escape for dot rather than colon'.  But in principle `.' can
     occur at the start of the version, so `:3' and `.3' both =>
     `%3'.  There would have to be some horror of an exception rule.
     (Although `:3' and `3' compare equal as Debian versions, they
     are different textual strings and the tag needs to convey the
     whole string.)
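To make the `#' rule proposed above concrete, here is a minimal sketch in
Python. It is an illustration only, not dgit's or any other tool's actual
code. The function names (escape_dots, version_to_tag) and the default
"debian/" prefix are made up for the example. The `:' => `%' and `~' => `_'
substitutions are the existing DEP-14 ones listed earlier.

```python
import re


def escape_dots(version: str) -> str:
    """Apply the proposed '#' rule to a version string (hypothetical helper)."""
    # Insert '#' between each pair of adjacent dots. The lookahead leaves the
    # second dot unconsumed, so a run such as '...' is handled pair by pair.
    out = re.sub(r"\.(?=\.)", ".#", version)
    # Insert '#' after any trailing dot.
    if out.endswith("."):
        out += "#"
    # Insert '#' before any leading dot.
    if out.startswith("."):
        out = "#" + out
    return out


def version_to_tag(version: str, prefix: str = "debian/") -> str:
    """Build a tag name: existing DEP-14 substitutions, then the '#' rule.

    The prefix default is an assumption made for this example.
    """
    mangled = version.replace(":", "%").replace("~", "_")
    return prefix + escape_dots(mangled)


if __name__ == "__main__":
    for v in ("8.30..2", "8.30.", ".42", "1:8.30~rc1..2"):
        print(v, "=>", version_to_tag(v))
```

Because `#' is forbidden in versions, any `#' in a tag must have come from
the escaping, so decoding would just be a matter of deleting every `#'
before undoing the `%' and `_' substitutions.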
Ian.
--
Ian Jackson <ijackson at chiark.greenend.org.uk> These opinions are my own.
If I emailed you from an address @fyvzl.net or @evade.org.uk, that is
a private address which bypasses my fierce spamfilter.