[Nut-upsdev] Time for a distributed VCS?

Eric S. Raymond esr at thyrsus.com
Sun Nov 27 23:07:20 UTC 2011


Charles Lepple <clepple at gmail.com>:
> Eric, I think you covered this more in your "DVCS Migration" page,
> but I left that in for the synchronization capabilities of git-svn.

I understood that.  And if the project hasn't yet made a commitment to
move fully to a DVCS, supporting that sort of gatewaying makes every
kind of sense.

Given that there doesn't seem to be any constituency for hg or bzr, my
original question should perhaps be re-posed as: Is the project ready
to move *all the way* to git?  If so, I can do a one-time conversion 
of significantly higher quality than git-svn by itself offers.
 
> I agree it's not an ideal commit identifier, but my knee-jerk reaction is
> that "2011-10-25T15:11:09Z!fred@foonly.com" is only marginally better.

The jerking knee is quite understandable; action stamps are indeed
unpleasantly long and complex-looking if you're used to Subversion
revision numbers. 

To put this in perspective, though, that one (which is a good
representative example) is actually shorter than a 40-character git
hash ID.  And unlike a git hash, it has meaning that a human can
easily extract. So it's actually a net win over a git hash ID, even if
you discount the possibility that the project might someday want to
change VCSes again.
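
A minimal Python sketch of how a stamp of that shape can be put together
and measured against a hash ID. The helper name action_stamp is
hypothetical, chosen for illustration only; it is not a reposurgeon
function.

```python
from datetime import datetime, timezone

def action_stamp(when: datetime, email: str) -> str:
    """Join a UTC timestamp (to the second) and a committer email with '!'."""
    utc = when.astimezone(timezone.utc)
    return utc.strftime("%Y-%m-%dT%H:%M:%SZ") + "!" + email

# The example stamp discussed above, rebuilt from its two components.
stamp = action_stamp(
    datetime(2011, 10, 25, 15, 11, 9, tzinfo=timezone.utc),
    "fred@foonly.com",
)
print(stamp, len(stamp))
```

The point of the exercise is only that both halves are recoverable by
eye, which is not true of a hash.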

One potential area of improvement in my tools is coming up with more 
compact commit specifiers that rely on context.  For example, 

	"10-25T15:11:09Z!fred"

would be unambiguous if we undertake to specify leading components of the 
timestamp from the commit's date *and* we know that in all commits for 2011
there is only one committer with a fred@ address.
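
As a rough sketch of that expansion rule, assuming the year comes from
the referring commit's date and the list of committer addresses is known:
the function name expand_stamp and the second address in the usage line
are invented for illustration, and a real tool would have to decide what
"context" means far more carefully.

```python
def expand_stamp(short: str, year: int, committers: list[str]) -> str:
    """Expand a compact stamp such as '10-25T15:11:09Z!fred' to full form.

    Raises ValueError when the short committer name is not unique.
    """
    timestamp, _, local = short.partition("!")
    # Deduplicate first so a repeated address does not look ambiguous.
    matches = sorted({c for c in committers if c.split("@")[0] == local})
    if len(matches) != 1:
        raise ValueError(f"{local!r} matches {len(matches)} committers")
    return f"{year}-{timestamp}!{matches[0]}"

full = expand_stamp("10-25T15:11:09Z!fred", 2011,
                    ["fred@foonly.com", "jane@example.org"])
print(full)
```

Refusing to guess when two committers share a local part is the
important design choice; a silently wrong expansion would be worse than
a long stamp.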

(This sort of design challenge is one reason I keep looking for
conversions to do.)

> The potentially confusing part is that I often used "[1234]" syntax
> to refer to SVN revisions (especially in merge commits) since we
> have a Trac instance to browse the repository. I think Trac added
> options later to identify the r[0-9]+ syntax.

Right.  My tools would pick up on that pattern as a potential
reference, so there's no intrinsic problem there.
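
For illustration, a pattern match of that kind might look like the
Python sketch below. The regular expression and the name
find_revision_refs are assumptions made for this example, not the
matching logic reposurgeon actually uses.

```python
import re

# "[1234]" in the bracketed style, or "r1234" standing alone as a word.
SVN_REF = re.compile(r"\[(\d+)\]|\br(\d+)\b")

def find_revision_refs(comment: str) -> list[int]:
    """Return the revision numbers that look like references in a comment."""
    # findall yields (bracketed, r_form) pairs; exactly one is non-empty.
    return [int(a or b) for a, b in SVN_REF.findall(comment)]

print(find_revision_refs("Merge [1234] and r567 into trunk"))
```

These are only candidates; each hit still has to be checked against the
revisions that actually exist before it is rewritten.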
 
> I wanted the git-svn conversion to remain machine-convertable while
> we worked out the kinks of getting NUT developers up to speed on
> Git, but so far, nobody else has taken the bait. (We had a few
> people submitting patches from their own git-svn trees several years
> ago, but as far as I know, I am the only one using that converted
> repository.) I think it's time to force the issue.

Having participated in several project conversions, I would say you
are extremely likely to be right about that. The pattern in DVCS
conversions tends to be that they go through the following stages:

1. Some individual (or a few people) sees the potential benefit and
starts pushing the idea, often by setting up a live mirror exactly 
as you have done.

2. There's a relatively long period of other developers going "meh..."
with no active opposition but not a lot of people rushing to join the
DVCS adopters either, often because everybody involved somewhat
overestimates the transition costs.

3. Project leadership decides to, as you put it, "force the issue". 
A week or two of hacking around and confusion ensues.

4. About a month later, after the dust has settled, everyone sort of
blinks and goes "Uh? Why the fsck didn't we do this sooner?"

My point is that by the time a forward-looking senior dev thinks it
might be "time to force the [DVCS] issue"...it *is*. And that's
because your hunch that other devs won't get off their butts to learn
a new tool until "we're switching" is issued as a ukase from the
Tsar(s) is also generally correct. 

I don't mean "get off their butts" as backhand criticism of anyone,
either - this is perfectly rational laziness.  Waiting for clear
consensus doesn't tend to work well in situations where a lot of
people are being asked to trade an obvious but small and short-term
cost for a larger but less-visible long-term gain.  This is one of
those situations, and it's why projects have leadership. If you and
Arnaud are so minded, my advice as a friend of the project is: go
ahead and pull that trigger.
 
(That's easier advice for me to give in this particular case because I
can lower your transition cost a lot...but I'd give the same advice
anyway on general principle.)

> > Another is that .cvsignore files aren't mapped over to .gitignore files.
> 
> Right, I have a .git/info/exclude built from the SVN metadata. Not
> ideal, but I didn't want the .gitignore files to get out of sync.

This would be one of the advantages of a well-planned one-time
conversion.  We can get this right once and not have to screw with
it again.
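
A sketch of what the per-directory mapping could look like in Python,
assuming the source is a whitespace-separated .cvsignore-style pattern
list that applies to one directory only. The leading "/" keeps each
pattern anchored to the directory holding the .gitignore, since bare
git patterns also match in subdirectories. The function name is
hypothetical.

```python
def cvsignore_to_gitignore(text: str) -> str:
    """Translate one directory's ignore patterns into .gitignore lines."""
    lines = []
    for pattern in text.split():
        if pattern == "!":
            # In CVS a lone "!" resets the ignore list; approximated here
            # by discarding the patterns seen earlier in this file.
            lines.clear()
            continue
        lines.append("/" + pattern)
    return "".join(line + "\n" for line in lines)

print(cvsignore_to_gitignore("*.o Makefile\nconfig.h\n"), end="")
```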

> > Yet another is that git-svn does not detect rename and copy operations.
> 
> I'm curious about this - doesn't git only detect copy and rename
> when you browse the repository metadata, not when you create a
> commit? I may be working with old information here, but I thought
> that's why you sometimes need to specify "--find-copies" and
> "--find-copies-harder" to some Git commands.

I'm not actually certain about this myself, and it may vary by git
version.  I added rename and copy detection to my tools because it was
easy and fast-import format has the primitive fileops to support them.

It may be that git now uses this information, or it may be that the
fast-import markup is only there for VCSes that are already known to
have first-class containers designed in (bzr is like this).  I don't
actually know, but fully supporting the format can't hurt.
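
For reference, the two fileops in question are single lines in the
fast-import stream: "R" for rename and "C" for copy, each followed by
the source and destination paths. A hedged Python sketch of emitting
them follows; the helper names are made up for this example, and the
quoting rule is a conservative reading of the format (quote whenever a
path contains a space, quote, backslash, or newline).

```python
def _quote(path: str) -> str:
    """C-style quote a path when it contains characters that need it."""
    if any(ch in path for ch in ' "\n\\'):
        escaped = (path.replace("\\", "\\\\")
                       .replace('"', '\\"')
                       .replace("\n", "\\n"))
        return f'"{escaped}"'
    return path

def rename_op(src: str, dst: str) -> str:
    """Emit a fast-import rename fileop line."""
    return f"R {_quote(src)} {_quote(dst)}\n"

def copy_op(src: str, dst: str) -> str:
    """Emit a fast-import copy fileop line."""
    return f"C {_quote(src)} {_quote(dst)}\n"

print(rename_op("drivers/old.c", "drivers/new.c"), end="")
```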

> Yes, I've been lifting Subversion tags to Git tags manually. I'm a
> strong believer that tags should be outside of the commit namespace
> (like CVS and Git; unlike SVN and Mercurial).

I agree with you.  This is one of the few drawbacks in Mercurial, which
I consider in general a more elegant system than git. (However, I no
longer advise projects to convert to Mercurial; git's advantages in depth
of toolset and spread of available hosting are compelling. Sigh...)

>              I'm not intentionally singling out Fred, but if you
> look at the first few commits of both the Eaton_SDK and windows_port
> branch, there are some not-quite-merge commits of his which turn the
> Git history into a bit of a mess.
>
> I just got back from vacation, and haven't fully identified what is
> going on with the Eaton_SDK branch, but from what I can tell, git-svn
> didn't understand that the windows_port branch was re-created, and so
> it gave the branch several heads. If it's easy to programmatically
> identify such situations in reposurgeon, fine--otherwise I suggest we
> just manually run "git rebase -i" to squash those commits into a
> normal branch creation commit.

Aha.  You may want the tool I've been working on for the past
week...repostreamer.  This is the first I've told anyone about it;
might still take a few days to get it finished.

As you noted, git-svn is not an optimal tool for one-time conversions
(that is, as opposed to live gatewaying). repostreamer is designed
specifically for such conversions; it runs in a checkout directory
of the source VCS and produces a git fast-import stream on standard output.

The reason for repostreamer's existence is that, when properly factored,
converters for different VCSes can share the trickiest part of their code.
That's the part that walks through revisions, checking out contents and
sha1-hashing each file to construct blobs and fileop sequences.  
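
To make the hashing step concrete, here is a minimal Python sketch. It
assumes the hash is computed the way git computes blob IDs (SHA-1 over
a "blob <size>" header, a NUL byte, then the content), which this
message does not actually state; both function names are hypothetical
and this is not repostreamer's code.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Hash file content the way git names a blob object."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

def blob_record(mark: int, content: bytes) -> bytes:
    """Format one blob command for a fast-import stream."""
    return b"blob\nmark :%d\ndata %d\n" % (mark, len(content)) + content + b"\n"

# Identical content hashes identically, so a walker can emit each
# distinct blob once and refer to it by mark afterwards.
seen = {}
for data in (b"hello\n", b"hello\n", b"world\n"):
    seen.setdefault(git_blob_id(data), len(seen) + 1)
print(len(seen))
```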

The VCS-specific extractor classes are quite small, about 100 LLOC in
Python for the two that now exist.  One is a git extractor that is a
workalike for git fast-export --all; I use that to test the rest of the
code, and it produces output byte-for-byte identical to git-fast-export
on real (multibranch) repositories.

The other is a Subversion extractor.  Yesterday it had crude
multibranch detection that would *not* have gotten your case right,
but I'm rewriting it to be more general and flexible.  My goal, which
I may or may not completely achieve, is to have it auto-adapt to
arbitrarily strange branch layouts. Very likely I can teach it to
auto-adapt to one as relatively simple as nut-ups's; I've looked at
your repo.

There is also the near-term prospect that a collaborator will add
a BitKeeper extractor.  And I'm thinking about writing one to handle
a directory's worth of RCS files.  I don't think I need to do CVS, 
as git cvsimport already does a pretty good job of that.

(By the way, it's not just git-svn.  *All* the existing Subversion
converters pretty much suck for cases like yours.  Either they don't
do multibranch handling at all, or they botch it, or they need hints
in the form of handcrafted rulesets. I'm trying to eliminate these. I
think it's just barely possible to get to look-ma-no-hands with
sufficiently clever deductions from Subversion's XML logs, and that's
what I'm working on *right now*.)
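
The raw material for such deductions is the copy information in the
verbose XML log. The Python sketch below only extracts it, under the
assumption that the log was produced with "svn log --xml -v"; it makes
no attempt at the deductions themselves, and the function name and
sample data are invented for illustration.

```python
import xml.etree.ElementTree as ET

def branch_copies(xml_log: str):
    """Yield (revision, new_path, copied_from_path, copied_from_rev)."""
    for entry in ET.fromstring(xml_log).iter("logentry"):
        rev = int(entry.get("revision"))
        for path in entry.iter("path"):
            src = path.get("copyfrom-path")
            # "A" is an add, "R" a replace; either may carry copy info.
            if path.get("action") in ("A", "R") and src is not None:
                yield rev, path.text, src, int(path.get("copyfrom-rev"))

sample = """<log><logentry revision="42"><author>fred</author>
<paths><path action="A" copyfrom-path="/trunk" copyfrom-rev="41"
kind="dir">/branches/windows_port</path></paths>
<msg>create branch</msg></logentry></log>"""

for record in branch_copies(sample):
    print(record)
```

A branch that was deleted and re-created would show up as more than one
such record for the same path, which is the case that needs handling.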

> The authors file for git-svn is currently in the source tree.

Oh, good.  That will be helpful.
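
For anyone who hasn't seen one, a git-svn authors file maps each
Subversion login to a name and address, one "login = Full Name <email>"
entry per line. A small Python sketch of reading it follows; the
function name and the sample entry are hypothetical.

```python
import re

AUTHOR_LINE = re.compile(r"^\s*(\S+)\s*=\s*(.*?)\s*<([^>]*)>\s*$")

def parse_authors(text: str) -> dict[str, tuple[str, str]]:
    """Map each Subversion login to a (full name, email) pair."""
    authors = {}
    for line in text.splitlines():
        m = AUTHOR_LINE.match(line)
        if m:  # lines that do not fit the pattern are skipped
            login, name, email = m.groups()
            authors[login] = (name, email)
    return authors

print(parse_authors("fred = Fred Foonly <fred@foonly.com>\n"))
```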

> In short, I'm all for redoing the SVN-to-Git conversion, as long as
> we fix up a few things in the process. What's next?

Well, that depends.  The first question is: are you guys going to pull 
the trigger and decide to go for a high-quality one-time conversion to git,
as opposed to live gatewaying?

If the answer is "not yet", we should talk further when it changes to "yes".

Assuming the answer is "yes", then the outline of the process is described
in my DVCS Migration Guide.  Here's what I'll need to carry it through:

1. The authors file - happily, you've already put this together.

2. Admin access to your project on Alioth Forge, or github, or
wherever you plan on hosting the public git repo. What I'll need is
the ability to create, rename, and delete git repos.

3. A few hours to do the conversion. (It could be days by hand, but my
tools really are quite good that way.)

4. Your patience in case the first cut has minor issues that require a
do-over (such as unconverted references).  The cost of a do-over is
low; it just means that your devs have to re-clone rather than
pulling.

I would of course give plenty of warning to the mailing list at each stage.

If we end up using git-svn and then git-rebase to fix up the Eaton_SDK
branch issues, that's not a big deal.  It's pretty likely (though not
certain) that I can get repostreamer to a state that will get the
conversion right without hand fixups of the branches.  Unless there's
time pressure on the conversion that I don't know about, I'd prefer
to go that route.

It might be useful for you (singular, Charles Lepple) to take a good look
at the reposurgeon man page: 

   http://www.catb.org/~esr/reposurgeon/reposurgeon.html

I'm guessing you'll be the person mainly responsible for checking my work,
and whatever we have to resolve will go more quickly if you understand
the capabilities and limitations of my main tool.

If you have any desiderata other than the metadata-conversion ones we've
already discussed, I'd like to know about them sooner rather than later.
-- 
		<a href="http://www.catb.org/~esr/">Eric S. Raymond</a>


