[Nut-upsdev] Progress report on git conversion

Mon Dec 5 10:52:01 UTC 2011

(Copied to Dave Hart at the NTP project, who's interested in the
code's progress for non-NUTty reasons.  Dave, the background is that
I'm doing a trial conversion of the Network UPS Tools repo.)

First, repostreamer per se is dead. As it evolved it kept pulling
in more bits of code from reposurgeon until I said "screw it!" and
merged the repostreamer logic in as a reposurgeon input stage.

The NUT repo has some *strange* quirks.  They seem to be consequences of
a cvs2svn lift in the past; this is...normal.  All that stuff will
get cleaned up and untangled by the time I'm done.

I've dealt with the problem of interpreting mixed-copy commits that
I mentioned in my previous list mail. One of Subversion's odder features,
the representation of tags and branching as directory copies, turned out
to be a lifesaver here - because it means that the first commit on a
branch always has content identical to whatever previous revision was
the copy source.

Thus, when I run into a branch/tag copy, I just ignore Subversion's
internal metadata about how the copy was made and walk forward from
the root in the commit sequence *looking for identical content!* I
then parent the commit *following* the copy on that match. This works
very nicely.  I've constructed a bunch of small repos with various 
topologies as test loads to prove the algorithm.
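
As an illustration only (this is not the reposurgeon code), the
content-matching idea can be sketched in a few lines of Python.  The
representation of a revision as a {path: bytes} mapping and the names
fingerprint() and find_copy_source() are assumptions made up for the
example, and so is the choice to stop at the first match.

```python
import hashlib

def fingerprint(fileset):
    """Hash a {path: bytes} mapping so identical content compares equal."""
    h = hashlib.sha1()
    for path in sorted(fileset):
        h.update(path.encode("utf-8") + b"\0")
        h.update(hashlib.sha1(fileset[path]).digest())
    return h.hexdigest()

def find_copy_source(history, copy_content):
    """Walk forward from the root and return the index of the first
    revision whose content is identical to the copy, or None."""
    target = fingerprint(copy_content)
    for index, fileset in enumerate(history):
        if fingerprint(fileset) == target:
            return index
    return None

# Hypothetical three-revision history; the copy matches revision 1.
history = [{"a": b"1"}, {"a": b"2"}, {"a": b"2", "b": b"x"}]
source = find_copy_source(history, {"a": b"2"})
```

The commit following the copy would then be parented on the revision
at the returned index.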

I'm working through the problems I encounter, teaching reposurgeon how
to cope.  The current one is that it crapped out on a nonexistent file
while trying to build a commit sometime after r536.  This may mean
that "svn ls -R" (which I use to get per-revision filesets) is
confused; though I have no idea how that could happen, I'm betting
debris left in your history by cvs2svn is implicated somehow. I've
told reposurgeon how to throw a warning rather than dying when this
happens; we'll see what comes next.

It's not a fast process.  Between poor performance from the Subversion
tools and Python being no speed demon itself, a test cycle (reading
the whole repo as far as it can get before throwing some sort of
error) takes a minimum of two hours.  The good news, though, is that
I'm definitely at the stage of fixing little gotchas - the concept and
the code architecture are looking sound.

If you decide you need a git conversion done *quickly*, the first
stage via git-svn is still an option.  But I think the new svn-enabled
reposurgeon will do a better job once I knock the remaining bugs out
of it.

Actually, the very first thing I plan to do when I get a clean
conversion of the entire repo is lift it again with git-svn and diff
the stream files.  I'm curious to find out what git-svn does with
those empty copy commits.  It probably does the easiest thing and just
discards them, which I think is OK in most practical cases but
philosophically wrong - after all, there could be useful information
for humans in the commit comments!

A tool like this should never, ever throw away information unless it's
told to. What I do with the empty copy commits instead is turn them
into tag objects pointing at the immediately previous commit,
preserving the committer ID and time and comment.  I'll delete the
uninteresting ones (probably all of them) by hand in the hand-cleanup
stage.
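
A minimal sketch of that transformation, assuming a simplified
in-memory model: the Commit and Tag classes, the "mark" field and the
name_for callback are all hypothetical, not reposurgeon's actual data
structures.  "Previous commit" is taken here to mean the most recent
commit that was kept.

```python
from dataclasses import dataclass

@dataclass
class Commit:
    mark: str
    committer: str
    date: str
    comment: str
    fileops: list

@dataclass
class Tag:
    name: str
    target: str   # mark of the commit the tag points at
    tagger: str
    date: str
    comment: str

def tagify_empty_copies(commits, name_for):
    """Turn commits with no file operations into tags on the previous commit."""
    kept, tags = [], []
    for commit in commits:
        if not commit.fileops and kept:
            # Preserve committer ID, time and comment on the tag object.
            tags.append(Tag(name_for(commit), kept[-1].mark,
                            commit.committer, commit.date, commit.comment))
        else:
            # Real commits (and an empty root, which has nothing to point at) stay.
            kept.append(commit)
    return kept, tags

commits = [Commit(":1", "alice", "2011-12-01", "initial import", ["M README"]),
           Commit(":2", "bob", "2011-12-02", "tag release 1.0", [])]
kept, tags = tagify_empty_copies(commits, lambda c: "emptycommit-" + c.mark)
```

Nothing is discarded: every empty copy commit survives as a tag that
a human can inspect and delete later.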

I also now preserve all Subversion properties, by the way.  svn:ignore
gets turned into equivalent .gitignore files.  Others are not
interpreted but left where the human surgeon can see them.  It is
possible I'll be able to translate some of them once I see instances
in the field.  Top of my list is to try to do something useful with
merge properties - I'm pretty sure none of the existing conversion 
tools go there.
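
For svn:ignore, one plausible translation looks like the sketch
below.  The function name is hypothetical, and anchoring each pattern
with a leading "/" is an assumption about what "equivalent" should
mean: svn:ignore applies only to the directory it is set on, while an
unanchored .gitignore pattern also matches in subdirectories.

```python
def svn_ignore_to_gitignore(propval):
    """Convert an svn:ignore property value to .gitignore file content."""
    lines = []
    for pattern in propval.splitlines():
        pattern = pattern.strip()
        if pattern:
            # Leading slash keeps the pattern from matching in subdirectories.
            lines.append("/" + pattern)
    return "\n".join(lines) + "\n" if lines else ""

gitignore_text = svn_ignore_to_gitignore("*.o\n\nMakefile\n")
```

The result would be written as a .gitignore file in the directory
that carried the property.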
-- 
		Eric S. Raymond <http://www.catb.org/~esr/>

Everything that is really great and inspiring is created by the
individual who can labor in freedom.
	-- Albert Einstein, in H. Eves Return to Mathematical Circles, 
		Boston: Prindle, Weber and Schmidt, 1988.