[Licenses-discuss] Thought on steps

Thu Aug 10 05:05:56 UTC 2017

Hi,

Here are my thoughts.

> to put the tests
> under /usr/share/tests/license/<pkg>
> and   /usr/share/tests/copyright/<pkg>

Considering that the currently installed binary is normally small, while the
tests are bigger and not very useful for the normal user, I think the tests
should be packaged as a separate binary package such as "<pkg>-data", where
<pkg> is debmake, licensecheck, dcopy, ...

For debmake, I will probably have the same data in both locations by
placing symlinks.

Let me add a few more points before I forget them, since I woke up early
(it is too cold on-site; I am used to a real hotel these days).

Correction:
I said debmake matches by paragraph.  That is not exactly correct.

debmake separates
   * (copyright section lines)
   * (license or license reference section lines)
by sectioning lines using regular-expression matches and state-machine
logic.
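A minimal sketch of this kind of line sectioning, for illustration only.
The regular expressions, the state names and `section_lines` are
hypothetical and are not debmake's actual code; the sketch only shows how a
small state machine can sort header lines into copyright lines and license
lines.

```python
import re

# Hypothetical patterns for illustration; debmake's real expressions differ.
COPYRIGHT_RE = re.compile(r"^\s*(copyright|\(c\))", re.IGNORECASE)
LICENSE_RE = re.compile(
    r"(permission is hereby granted|licensed under|"
    r"this program is free software|spdx-license-identifier)",
    re.IGNORECASE,
)


def section_lines(lines):
    """Sort comment lines (comment markers already stripped) into
    copyright lines and license lines with a tiny state machine."""
    state = "OTHER"
    copyright_lines, license_lines = [], []
    for line in lines:
        if not line.strip():
            # A blank line ends a copyright section; license text may
            # span several paragraphs, so the LICENSE state is kept.
            if state == "COPYRIGHT":
                state = "OTHER"
            continue
        if COPYRIGHT_RE.search(line):
            state = "COPYRIGHT"
        elif LICENSE_RE.search(line):
            state = "LICENSE"
        # Lines matching neither pattern inherit the current state.
        if state == "COPYRIGHT":
            copyright_lines.append(line.strip())
        elif state == "LICENSE":
            license_lines.append(line.strip())
    return copyright_lines, license_lines


header = [
    "Copyright (C) 2017 Jane Doe",
    "              2018 John Roe",
    "",
    "This program is free software; you can redistribute it",
    "and/or modify it under the terms of the GNU GPL.",
]
cr, lic = section_lines(header)
print(cr)        # ['Copyright (C) 2017 Jane Doe', '2018 John Roe']
print(len(lic))  # 2
```

The continuation line "2018 John Roe" matches no pattern and is kept only
because the machine is still in the COPYRIGHT state, which is the point of
using states rather than matching each line in isolation.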

The license lines are normalized and reduced to a single line with a
single space placed between words.  The license scan match is done on this
line.  So the license match itself is not paragraph-aware.
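A minimal sketch of matching on such a normalized single line. The two
patterns, the lowercasing and the names `normalize` and `match_license` are
assumptions of this illustration, not debmake's real signature table.

```python
import re

# Hypothetical signatures, written in normalized form (lowercase,
# single spaces).  Real scanners carry far more, and longer, patterns.
LICENSE_PATTERNS = {
    "MIT": re.compile(r"permission is hereby granted, free of charge, to any person"),
    "GPL-2+": re.compile(
        r"either version 2 of the license, or \(at your option\) any later version"
    ),
}


def normalize(license_lines):
    """Reduce license lines to one line with single spaces between words.

    Lowercasing is an extra assumption of this sketch."""
    return " ".join(" ".join(license_lines).split()).lower()


def match_license(license_lines):
    """Return the names of all patterns found in the normalized line."""
    text = normalize(license_lines)
    return [name for name, pattern in LICENSE_PATTERNS.items() if pattern.search(text)]


# The phrase is split across lines and irregularly spaced in the source,
# but it matches once the lines are flattened.
print(match_license([
    "Permission is hereby granted,   free of charge,",
    "to any person obtaining a copy",
]))  # ['MIT']
```

Flattening is what lets a pattern cross the original line breaks; it is also
why the match cannot tell where one paragraph ends and the next begins.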

Thought:

There are a few actions whose results need to be recorded as data in the
source package:

   1) extract and match copyright/license to generate a
      machine-generated copyright summary. (No *: data-1)
   2) record the human (maintainer) judgment. (No *: data-2)
   3) reduce data-2 to DEP-5.  (Allow *: data-3)

Process 1 is automatic.
Process 2 is manual and needs to record who did it.
  (But a machine can provide an initial template using previous history.)
                                                       ^^^^^^^^
Process 3 may be done either by a script or manually.
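Process 3 can be mechanical.  The sketch below assumes data-2 is available
as a mapping from file path to a (copyright, license) pair; that layout and
the names `reduce_to_dep5` and `_stanza` are hypothetical.  It uses only the
coarsest wildcard: the most common pair becomes "Files: *", and every other
pair gets its own stanza listing its files, relying on the DEP-5 rule that
the last matching Files stanza applies to a file.

```python
from collections import Counter, defaultdict


def _stanza(paths, pair):
    """Format one DEP-5 Files stanza (single-line fields only)."""
    copyright_text, license_name = pair
    return (
        f"Files: {' '.join(paths)}\n"
        f"Copyright: {copyright_text}\n"
        f"License: {license_name}"
    )


def reduce_to_dep5(data2):
    """Reduce per-file records (no wildcards) to DEP-5 Files stanzas.

    data2 maps a file path to a (copyright, license) pair.  The most
    common pair becomes 'Files: *'; the other pairs follow with explicit
    file lists, since in DEP-5 the last matching stanza wins.
    The header stanza (Format: ...) is not generated here.
    """
    if not data2:
        return ""
    default_pair = Counter(data2.values()).most_common(1)[0][0]
    exceptions = defaultdict(list)
    for path in sorted(data2):
        if data2[path] != default_pair:
            exceptions[data2[path]].append(path)
    stanzas = [_stanza(["*"], default_pair)]
    stanzas += [_stanza(paths, pair) for pair, paths in exceptions.items()]
    return "\n\n".join(stanzas) + "\n"


data2 = {
    "src/a.c": ("2017 Jane Doe", "MIT"),
    "src/b.c": ("2017 Jane Doe", "MIT"),
    "debian/rules": ("2017 John Roe", "GPL-2+"),
}
print(reduce_to_dep5(data2))
```

A smarter reducer would also collapse directories (e.g. "src/*") whose files
all share one pair; either way no human judgment is added in this step.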

There is another aspect to this data: what was the previous history?
I initially thought it would not be easy without setting up an external
database.  But in Debian it is actually simple, and we already have
it: the previous source package (yes, it is signed!).

Normally a maintainer uses uscan/uupdate to make a new package for a new
upstream release, and I have been making updates to these programs.
So all I need to do is rename the old data-{1,2,3} to something like
data-{1,2,3}x when it runs.  Simple.  It can do more.

 * The new data-1 is generated from the new source.
 * If data-1 == data-1x, then uscan should make data-{2,3} by renaming
   data-{2,3}x to data-{2,3}.
 * Otherwise, uscan should make data-{2,3} by (smartly) copying data-{2,3}x
   to data-{2,3} and annotating them with "FIXME", which will cause a
   lintian error that blocks the release.  With my limited free time, I may
   end up doing it the simple way, i.e. "dumb copying" with XXX_FIXME_XXX
   as the first line.
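The steps above, in the "dumb copying" variant, can be sketched as follows.
This is not uscan code: it assumes the files literally live side by side as
data-1, data-2 and data-3 in one directory, and the names `carry_over`,
`blocks_release` and the callable standing in for process 1 are
hypothetical.

```python
import tempfile
from pathlib import Path

FIXME_MARK = "XXX_FIXME_XXX"


def carry_over(workdir, generate_data1):
    """Carry data-{2,3} over to a new upstream release.

    generate_data1 is a stand-in for process 1: a callable returning the
    new machine-generated summary as text.  Returns True if data-1 is
    unchanged, i.e. the old judgment could be reused as-is.
    """
    d = Path(workdir)
    # Keep the previous results as data-{1,2,3}x (replace() overwrites).
    for n in ("1", "2", "3"):
        if (d / f"data-{n}").exists():
            (d / f"data-{n}").replace(d / f"data-{n}x")
    (d / "data-1").write_text(generate_data1())
    old1 = d / "data-1x"
    unchanged = old1.exists() and old1.read_text() == (d / "data-1").read_text()
    for n in ("2", "3"):
        old = d / f"data-{n}x"
        if not old.exists():
            continue
        if unchanged:
            old.replace(d / f"data-{n}")
        else:
            # "Dumb copy": the maintainer must review and drop the mark.
            (d / f"data-{n}").write_text(FIXME_MARK + "\n" + old.read_text())
    return unchanged


def blocks_release(path):
    """What a lintian-like check could test: a leftover FIXME mark."""
    return Path(path).read_text().splitlines()[:1] == [FIXME_MARK]


with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp)
    (work / "data-1").write_text("a.c: MIT\n")
    (work / "data-2").write_text("a.c: MIT (reviewed)\n")
    print(carry_over(tmp, lambda: "a.c: MIT\n"))     # True
    print(blocks_release(work / "data-2"))           # False
    print(carry_over(tmp, lambda: "a.c: GPL-2+\n"))  # False
    print(blocks_release(work / "data-2"))           # True
```

The old data-{1,2,3}x files are left in place after a mismatch, so a
smarter tool could later diff data-1x against data-1 to point the
maintainer at what changed.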

Of course, it would be nice if data-1 and data-2 shared a common format.
But even if the formats are not uniform across tools, the information is
signed as part of the package and the decisions are traceable.

Of course, we could add more signatures and hashes within these tools, but
the current Debian package generation tools already provide them.  The
subtle issues are:

 * the person who finally packaged it and the person who touched the
   contents of the package may be different.
 * someone may cheat by copying data-1 over data-1x and adding a new hash
   to data-2 to fool the lintian check.

The first can be addressed by requiring the person taking responsibility
for the decision to generate a signature file, data-2.sig, over data-2,
with data-2 also carrying a hash of data-1 as an extra data stanza.  This
hash trick should prevent accidental uploads.
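A minimal sketch of the hash part, assuming data-2 is a plain-text,
deb822-like file and using a hypothetical "Data-1-SHA256" field for the extra
stanza; the function names are also hypothetical.  The OpenPGP signature over
data-2 (for example a detached data-2.sig made with `gpg --detach-sign`) is
outside this sketch.

```python
import hashlib

HASH_FIELD = "Data-1-SHA256"  # hypothetical field name


def digest(text):
    """SHA-256 of a data file's text, as lowercase hex."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


def stamp_data2(data2_text, data1_text):
    """Append an extra stanza recording which data-1 was judged."""
    return data2_text.rstrip("\n") + f"\n\n{HASH_FIELD}: {digest(data1_text)}\n"


def data2_is_current(data2_text, data1_text):
    """True only if data-2 records the hash of the current data-1."""
    for line in data2_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == HASH_FIELD:
            return value.strip() == digest(data1_text)
    return False  # no hash stanza at all: treat as not reviewed


data1 = "src/a.c: MIT\n"
data2 = stamp_data2("Files: src/a.c\nLicense: MIT\n", data1)
print(data2_is_current(data2, data1))                # True
print(data2_is_current(data2, "src/a.c: GPL-2+\n"))  # False
```

This catches a data-2 left over from an older data-1, but anyone can
recompute the hash, which is exactly the second issue and why the signature
and an archive-side comparison are still needed.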

The second can be addressed by an external database or by post-upload
comparison of archive data, since Debian already has a snapshot archive.

Matching SPDX may be non-trivial work.  But the above is really trivial,
since it does not require much new code.

Osamu