[Debian-med-packaging] [devteam-bioc] Please explain binary files without source in Rsamtools

Martin Morgan mtmorgan at fhcrc.org
Wed Oct 30 14:38:09 UTC 2013


On 10/30/2013 03:12 AM, Maintainer wrote:
> Hi,
>
> as previously posted here, I'm working on the Debian packaging of
> prerequisites for the new version of cummeRbund.  The package Rsamtools
> belongs to the tree of dependencies and inside its source I found some
> binary files with unclear origin which will not be accepted.  Since
> Martin Morgan pointed me in previous cases to the documentation inside
> the package I tried to verify this first but failed.  Here are the
> files in question:

You don't mention the version you're trying to port; the following is based on

   https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Rsamtools

>
> Files: inst/extdata/CaffeineTxdb.sqlite
>   I tried `grep -R CaffeineTxdb` with no hit.

Stale; now removed. It contained SQLite tables derived from querying a public 
database ('biomart.org').

>
> Files: inst/extdata/ex1.bam

ex1.bam was created from ex1.sam, its text representation, using 
Rsamtools::asBam at the time of file creation; ex1.bam.bai is an index created 
on ex1.bam using Rsamtools::indexBam. ex1.sam is derived from files originally 
distributed with the 'samtools' software under an MIT license; see 
Rsamtools/LICENSE.

>   I tried
>
>    $ grep -R ex1\.bam | grep -v system\.file
>    inst/unitTests/test_BcfFile.R:        checkEquals("ex1.bam", h[["Sample"]])
>    inst/doc/Rsamtools-Overview.Rnw:list.files(dirname(bamFile), pattern="ex1.bam(.bai)?")
>    inst/doc/Rsamtools-Overview.R:list.files(dirname(bamFile), pattern="ex1.bam(.bai)?")
>    src/samtools/knetfile.c:                fp = knet_open("http://www.sanger.ac.uk/Users/lh3/ex1.bam", "r");
>
>   None of these files documents the origin and, even worse,
>   the URL at www.sanger.ac.uk no longer exists.
>
> Files: inst/extdata/ex1.bcf
>   I tried `grep -R ex1\.bcf | grep -v system\.file` with no hit

ex1.bcf was created by hand long ago and is used in unit tests and man pages.

Rsamtools$ grep -lr ex1.bcf *|grep -v svn
inst/unitTests/test_BcfFile.R
man/BcfFile-class.Rd
man/scanBcf.Rd

>
> Files: inst/extdata/example\.gtf*:
>   I tried `grep -R example\.gtf  | grep -v system\.file` with no hit

example.gtf.gz is a gzip-compressed text file used in unit tests and man pages, 
hand-curated from public data sources; the .tbi variant is its index 
(created with Rsamtools::indexTabix).

Rsamtools$ grep -lr example.gtf.gz *|grep -v svn
inst/unitTests/test_TabixFile.R
man/headerTabix.Rd
man/seqnamesTabix.Rd
man/TabixFile-class.Rd

>
> Files: inst/extdata/example_from_SAM_Spec*:
>   These files are neither documented nor used since not even
>     grep -R example_from_SAM_Spec
>   shows any hit

This is a file useful to the developer, hand-curated from the SAM spec at

   http://samtools.sourceforge.net/SAMv1.pdf

although SAM.pdf did not historically include version numbers, so the precise 
version it was taken from is unknown.

>
> Files: inst/extdata/olaps.Rda
>   This file is mentioned in a load statement in
>    inst/doc/Rsamtools-Overview.R
>   but no hint to its origin.

The script for creating this file is in the vignette Rsamtools-Overview.Rnw:

<<readGAlignmentsFromBam, eval=FALSE>>=
library(parallel)
options(srapply_fapply="parallel", mc.cores=detectCores())
olaps <- readGAlignmentsFromBam(bv)
@

>
> Files: inst/unitTests/cases/ex1.sam.gz
>   Except for the first two lines this is a copy of file
>    inst/extdata/ex1.sam
>
> Files: inst/unitTests/cases/ex1_*.bam*
>   It seems these files are derived from file
>    inst/extdata/ex1.sam
>   and just used for verification of the correctness of
>   Rsamtools.  Please confirm this suspicion.
>
> Files: inst/unitTests/cases/plp_refskip.bam*
>   I tried `grep -R plp_refskip | grep -v system\.file` with no hit

This is a hand-crafted file used in a unit test

Rsamtools$ grep -lr plp_refskip *|grep -v svn
inst/unitTests/test_applyPileups.R
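
The per-file greps above can be folded into a single pass over a data 
directory. What follows is only a minimal sketch and is not part of Rsamtools: 
the function name list_references and both of its arguments are hypothetical, 
and it assumes a POSIX shell, a grep that supports -r, and a directory name 
free of regular-expression metacharacters.

```shell
#!/bin/sh
# Hypothetical helper (not part of Rsamtools): for every regular file in a
# data directory, list the files elsewhere in the package tree that mention
# its name.

list_references() {
    pkg_root=$1    # top of the package checkout, e.g. Rsamtools
    data_dir=$2    # directory to audit, relative to pkg_root, e.g. inst/extdata

    for path in "$pkg_root/$data_dir"/*; do
        [ -f "$path" ] || continue
        name=$(basename "$path")
        echo "== $data_dir/$name"
        # -r: recurse; -l: print only the names of matching files;
        # -F: treat the name as a fixed string, so '.' is not a wildcard.
        # Drop svn metadata and hits inside the audited directory itself.
        refs=$(cd "$pkg_root" && grep -rlF -- "$name" . \
            | grep -v -e '/\.svn/' -e "^\./$data_dir/" | sort || true)
        if [ -n "$refs" ]; then
            printf '%s\n' "$refs"
        else
            echo "   (no references found)"
        fi
    done
}
```

Run from the directory containing the checkout, a call such as 
`list_references Rsamtools inst/extdata` prints one heading per data file 
followed by the files that mention it. The match is on substrings, so a 
reference to ex1.bam.bai is also reported under ex1.bam.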

>
>
> It would be really helpful if you could clarify the origin of these
> files since otherwise Debian ftpmasters will consider the package
> as non-free which will prevent it from inclusion into main Debian
> distribution and in turn we also could not get cummeRbund updated.
>
> Kind regards and thanks for your cooperation
>
>        Andreas.
>


-- 
Computational Biology / Fred Hutchinson Cancer Research Center
1100 Fairview Ave. N.
PO Box 19024 Seattle, WA 98109

Location: Arnold Building M1 B861
Phone: (206) 667-2793


