[Debian-med-packaging] Bug#995406: Bug#995406: bbmap: package does not ship resource files

Robert heinro at umich.edu
Mon Oct 4 21:19:56 BST 2021


Hi,

With respect to the package test: the two fastq input files have to
match, since the "reads" i.e. fastq records typically come in pairs,
so one file has the forward reads and the other the respective reverse
reads in the same order.  I tried sending this reply with a suitable
set of files attached (2-3 MB attachment), but it seems the email
didn't make it through.  However, you can easily get publicly
available data deposited at ncbi.nlm.nih.gov in the Sequence Read
Archive (SRA).  On a Debian system, first install the sra-toolkit
package and then run

    fasterq-dump SRR492190

which gives you two 435 MB decompressed fastq files.  To reduce
resource usage, take just the first 40000 lines of each file (pipe
through "head -n 40000"; the count has to be a multiple of 4 because
each fastq record spans four lines).  That leaves 10000 "paired
reads", which is good enough for a test.  A somewhat typical quality
control run would then be

    bbduk.sh in1=SRR492190_1.10000.fastq in2=SRR492190_2.10000.fastq \
        qtrim=rl trimq=15 minlen=75 out=out.fastq

which should take a second or so.  To save on storage you can keep
the files gzip-compressed and run this as:

    bbduk.sh in1=SRR492190_1.10000.fastq.gz \
        in2=SRR492190_2.10000.fastq.gz qtrim=rl trimq=15 minlen=75 \
        out=out.fastq.gz
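
For reference, the subsetting step described above could be scripted
roughly as follows.  This is only a sketch: subset_fastq is a
hypothetical helper name, the input names SRR492190_1.fastq and
SRR492190_2.fastq are what I'd expect fasterq-dump to write for paired
data, and it assumes plain 4-line fastq records.

```shell
# Hypothetical helper: keep the first N reads of a fastq file.
# Assumes every record is exactly 4 lines (no wrapped sequence lines).
subset_fastq() {
    infile=$1
    nreads=$2
    outfile=$3
    head -n "$((nreads * 4))" "$infile" > "$outfile"
}

# Cut both mates at the same read count so the pairs stay in sync.
# The input file names are an assumption about fasterq-dump's output.
for mate in 1 2; do
    if [ -f "SRR492190_${mate}.fastq" ]; then
        subset_fastq "SRR492190_${mate}.fastq" 10000 \
            "SRR492190_${mate}.10000.fastq"
        # -k keeps the uncompressed copy next to the .gz one
        gzip -kf "SRR492190_${mate}.10000.fastq"
    fi
done
```

Cutting both files at the same line count is what keeps the forward
and reverse reads matched.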

I'd recommend doing that for testing the package.  With the bbduk
command I used in the original bug report, it'll complain about some
missing reference data:

        ******  WARNING! A KMER OPERATION WAS CHOSEN BUT NO KMERS WERE LOADED.  ******
        ******  YOU NEED TO SPECIFY A REFERENCE FILE OR LITERAL SEQUENCE.       ******

That reference data would normally be supplied with the ref= option.
The command above doesn't need it and will run without any such
warnings.
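
Since the two input files have to match, a test script could run a
cheap sanity check before calling bbduk.sh.  A minimal sketch,
assuming 4-line fastq records; check_pairs is a hypothetical name, and
it only compares record counts, not the read names themselves:

```shell
# Hypothetical check: succeed only if both files hold the same number
# of lines and that number is a multiple of 4 (complete fastq records).
check_pairs() {
    n1=$(wc -l < "$1")
    n2=$(wc -l < "$2")
    [ "$n1" -eq "$n2" ] && [ "$((n1 % 4))" -eq 0 ]
}

# Example use on the uncompressed test files, if they are present.
if [ -f SRR492190_1.10000.fastq ] && [ -f SRR492190_2.10000.fastq ]; then
    if check_pairs SRR492190_1.10000.fastq SRR492190_2.10000.fastq; then
        echo "paired files look consistent"
    else
        echo "paired files differ in length or are truncated" >&2
    fi
fi
```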

--Robert


