<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
<p>Hi Thorsten,<br>
</p>
<div class="moz-cite-prefix">On 06.11.21 13:00, Thorsten Alteholz
wrote:<br>
</div>
<blockquote type="cite"
cite="mid:E1mjKMj-000Bwg-VI@fasolo.debian.org">
<pre class="moz-quote-pre" wrap="">
Hi Steffen,
upstream states in his corresponding paper the source of all datafiles.</pre>
</blockquote>
We agree that most data references in the paper concern the analysis
the authors performed for that paper, but those sequences are
not redistributed in the source tree.<br>
<blockquote type="cite"
cite="mid:E1mjKMj-000Bwg-VI@fasolo.debian.org">
<pre class="moz-quote-pre" wrap="">They are not licensed under GPL, so please add the correct ones to your debian/copyright.</pre>
</blockquote>
<p>You may be referring to this sentence: Tandem repeat (rmsk.txt)
and gene (refFlat.txt) annotations were obtained from the UCSC
genome database (<a href="http://genome.ucsc.edu/"
class="moz-txt-link-freetext">http://genome.ucsc.edu/</a>) [<a
data-track="click" data-track-action="reference anchor"
data-track-label="link" data-test="citation-ref"
aria-label="Reference 31" title="O’Leary NA, Wright MW, Brister
JR, Ciufo S, Haddad D, McVeigh R, Rajput B, Robbertse B,
Smith-White B, Ako-Adjei D, et al. Reference sequence (RefSeq)
database at NCBI: current status, taxonomic expansion, and
functional annotation. Nucleic Acids Res. 2016;44:D733–45."
href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6#ref-CR31"
id="ref-link-section-d21124409e4782">31</a>]. We made the file
hg38-disease-tr.txt, with 31 disease-associated tandem repeats,
based on Tang et al. [<a data-track="click"
data-track-action="reference anchor" data-track-label="link"
data-test="citation-ref" aria-label="Reference 1" title="Tang H,
Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E,
Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C, et al. Profiling
of short-tandem-repeat disease alleles in 12,632 human whole
genomes. Am J Hum Genet. 2017;101:700–15."
href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6#ref-CR1"
id="ref-link-section-d21124409e4785">1</a>].</p>
<p>However, the files of that name redistributed in the tests folder
are not the complete files (those would be &gt;12 MB, see
<a class="moz-txt-link-freetext" href="https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/ucsc_annotations.html">https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/ucsc_annotations.html</a>);
they contain only a few lines in the same format, enough to make good
test cases. I suggest adding the following to d/copyright:</p>
<p>Files: hg*-disease-tr.txt<br>
Copyright: 2018-2021 Martin C. Frith <a class="moz-txt-link-rfc2396E" href="mailto:mcfrith@gmail.com">&lt;mcfrith@gmail.com&gt;</a><br>
License: GPL-3.0+<br>
Comment: As stated in the accompanying paper<br>
(<a class="moz-txt-link-freetext" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6">https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6</a>),<br>
these files are derived from work published in<br>
Tang H, Kirkness EF, Lippert C, Biggs WH, Fabani M, Guzman E,
Ramakrishnan S, Lavrenko V, Kakaradov B, Hou C, et al.<br>
Profiling of short-tandem-repeat disease alleles in 12,632 human
whole genomes. Am J Hum Genet. 2017;101:700–15.</p>
<p>Files: debian-tests-data/*<br>
Copyright: 2018-2021 Martin C. Frith <a class="moz-txt-link-rfc2396E" href="mailto:mcfrith@gmail.com">&lt;mcfrith@gmail.com&gt;</a><br>
License: GPL-3.0+<br>
Comment: These files are deliberately very short so that the tests run
quickly, while resembling in format<br>
and content the multi-megabyte annotation files. The
accompanying paper<br>
(<a class="moz-txt-link-freetext" href="https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6">https://genomebiology.biomedcentral.com/articles/10.1186/s13059-019-1667-6</a>)<br>
explicitly references refFlat.txt<br>
(<a class="moz-txt-link-freetext" href="https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/ucsc_annotations.html">https://googlegenomics.readthedocs.io/en/latest/use_cases/discover_public_data/ucsc_annotations.html</a>)<br>
and <span>rmsk.txt
(<a class="moz-txt-link-freetext" href="http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/">http://hgdownload.cse.ucsc.edu/goldenPath/hg19/database/</a><em>rmsk</em>.<em>txt</em>.gz)<br>
which need to be downloaded separately.<br>
</span></p>
<p>Would that be in line with what you had in mind? <br>
</p>
<p>Many thanks</p>
<p>Steffen<br>
</p>
<p><br>
</p>
<blockquote type="cite"
cite="mid:E1mjKMj-000Bwg-VI@fasolo.debian.org">
<pre class="moz-quote-pre" wrap="">
Thanks!
Thorsten
===
Please feel free to respond to this email if you don't understand why
your files were rejected, or if you upload new files which address our
concerns.
</pre>
</blockquote>
</body>
</html>