[Soc-coordination] [GSoC] Status Report: Java Project Dependency Builder

Mon Jun 9 09:16:15 UTC 2014

On Sat, Jun 7, 2014 at 1:08 AM, Daniel Pocock <daniel at pocock.com.au> wrote:
>
>
> On 07/06/14 09:33, Andrew Schurman wrote:
>> Hey guys,
>>
>> Last week I was able to fetch the source of artifacts which have an
>> <scm/> tag in their maven metadata. By leveraging the maven-scm-plugin,
>> I got support for a wide range of VCSs for free. It also provides a well
>> documented interface for extension in case other VCSs come out or
>> someone is using one which is currently unsupported. I've tried to
>> make upgrading the maven-scm-plugin possible without cluttering the
>> mojo configuration using plexus configuration elements. This doesn't
>> make it impossible to change, but again, not very easy. I might want
>> to revisit this at a later time.
>>
>> This week I started to work on actually building a checked out
>> project. My focus has been maven projects. I've been able to build
>> simple (single module) projects easily and have started to look at
>> multi-module projects. This has brought up a few issues.
>>
>> I recall reading that packaging a maven project for Debian requires
>> maven to be in offline mode during the build. I can try to emulate
>> that by dumping artifacts into a built artifact repository and using
>> that as the only repository during the build. You are still kind of
>
> This is a perfectly valid strategy.
>
>> cheating because you need an online repository to get the artifact
>> metadata in the first place, but since I've opted to fork off a maven
>> process for building, we can force offline mode there. This will
>
> I really feel this may need a two-phase workflow (e.g. scan deps in one
> phase, build in a second phase) and that it may be asynchronous (the
> build will not always happen immediately after the scan)
>
> A very simplistic strategy would involve creating some output file,
> maybe just a CSV file, listing each dependency that was not found.  This
> data could also go into SQL or MongoDB or something.

That was exactly the plan once I got into more complicated maven
projects. An XML report would detail each dependency, its type and
whether it was satisfied in the repo when the plugin was run. There
would also be a configuration element controlling which artifacts
would be skipped and copied to the output repository.
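
A rough sketch of what writing and reading such a report could look
like, so that the scan and the build can run as separate phases. It is
in Python only for brevity. The element name, the attribute names and
the "kind" labels are all made up for illustration; none of this is the
plugin's actual report format.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class Dependency:
    group_id: str
    artifact_id: str
    version: str
    kind: str        # hypothetical labels, e.g. "dependency", "parent", "plugin"
    satisfied: bool  # True if the artifact was already in the repo at scan time


def build_report(deps):
    """Scan phase: return an XML string with one <artifact> per dependency."""
    root = ET.Element("dependency-report")  # hypothetical root element name
    for dep in deps:
        ET.SubElement(root, "artifact", {
            "groupId": dep.group_id,
            "artifactId": dep.artifact_id,
            "version": dep.version,
            "type": dep.kind,
            "satisfied": "true" if dep.satisfied else "false",
        })
    return ET.tostring(root, encoding="unicode")


def unsatisfied(report_xml):
    """Build phase: list the coordinates that still have to be built."""
    root = ET.fromstring(report_xml)
    return [
        f"{a.get('groupId')}:{a.get('artifactId')}:{a.get('version')}"
        for a in root.iter("artifact")
        if a.get("satisfied") == "false"
    ]
```

Because the report is just a file, the build does not have to happen
right after the scan, and the same data could be loaded into a database
later if that turns out to be useful.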

>
>
>> require all dependencies, which includes parent poms, dependencies,
>> plugins and their dependencies to be built before the project can be
>> built. We can't rely on just dependencies and parent poms because one
>> of the plugins or maybe even one of the plugin dependencies could be
>> in the reactor, or a proper dependency that should be built. This raises
>> the question of where to stop, as plugins will inevitably depend on
>> maven-core.
>>
>> My goal for next week, although it is a little ambitious considering what
>> is left for building maven projects, is to look at ant projects and
>> see if I can come up with a good way to determine which way to build.
>> I think ant should always take precedence if there is a build.xml file
>> regardless of a pom.xml file. I also foresee problems determining the
>> jar file which represents the built artifact. I could simply diff the
>> tree before and after the build to find new jars, but this doesn't
>> help if there are 2 or more jars built for the project. It really
>> doesn't help that ant doesn't have the same standards for building
>> that maven does. I may need project specific commands to locate the
>> artifact in question.
>
> You could fetch the corresponding binary artifact from the Central
> repository and scan the list of classes and compare that to the list of
> classes in each built JAR.  This is just an approximation but it is
> probably good for over 90% of projects.

Good idea.
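
Something along these lines might do as a first pass: list the .class
entries of the reference JAR from Central and of every JAR the build
produced, then pick the built JAR with the most similar list. Using
Jaccard similarity as the score is my own assumption, and the function
names are invented for the sketch. Python is used only to keep it short.

```python
import zipfile


def class_names(jar_path):
    """Return the set of .class entry names in a JAR (a JAR is a ZIP archive)."""
    with zipfile.ZipFile(jar_path) as jar:
        return {name for name in jar.namelist() if name.endswith(".class")}


def similarity(reference, candidate):
    """Jaccard similarity of two sets of class names, from 0.0 to 1.0."""
    union = reference | candidate
    if not union:
        return 0.0  # neither JAR contains any classes
    return len(reference & candidate) / len(union)


def best_match(reference_jar, built_jars):
    """Return (path, score) for the built JAR closest to the reference JAR.

    Raises ValueError if built_jars is empty.
    """
    reference = class_names(reference_jar)
    scored = [(similarity(reference, class_names(p)), p) for p in built_jars]
    score, path = max(scored, key=lambda pair: pair[0])
    return path, score
```

As you say it is only an approximation: a low best score would be a
hint that the project needs a project specific command after all.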

>
> For some projects you will need to patch their pom file or build.xml or
> whatever and it would be ideal if you forked their Git repository and
> created your changes on a branch.
>
> If the project is in an SVN repo, you will probably need my sync2git script:
> https://github.com/dpocock/sync2git
>
> Please have a look at the issues in github too, #2 is quite easy:
> https://github.com/dpocock/sync2git/issues

Instead of cloning the entire history, what if we just take a snapshot
of the files at that particular version? It will save us from
translating between VCSs. We could do something different for git, but
I think doing the same thing for everything would make things easier.
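
To make that concrete, here is a small sketch of the commands I have in
mind, in Python for brevity. The function name and its arguments are
invented for illustration. For git it assumes the version is available
as a tag or branch name, since a shallow clone with --branch does not
accept an arbitrary commit hash.

```python
def snapshot_command(vcs, url, revision, dest):
    """Return the argv list that fetches one version without full history."""
    if vcs == "svn":
        # "svn export" writes a clean tree with no .svn metadata.
        return ["svn", "export", "-r", str(revision), url, dest]
    if vcs == "git":
        # Shallow clone of a single tag or branch; the resulting .git
        # directory only holds that one commit.
        return ["git", "clone", "--depth", "1",
                "--branch", str(revision), url, dest]
    raise ValueError(f"unsupported VCS: {vcs}")
```

The returned list could be handed to subprocess.run() as is. Patching
a project's pom.xml or build.xml would then mean keeping the patches
alongside the snapshot instead of on a branch of a forked repository.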