[med-svn] [Git][med-team/kraken2][master] 2 commits: Propagate hardening and debug options

Andreas Tille gitlab at salsa.debian.org
Thu Mar 14 12:41:57 GMT 2019


Andreas Tille pushed to branch master at Debian Med / kraken2


Commits:
0e9cfaca by Andreas Tille at 2019-03-14T12:10:37Z
Propagate hardening and debug options

- - - - -
31e5d8f6 by Andreas Tille at 2019-03-14T12:41:09Z
Several remaining kraken -> kraken2 renames

- - - - -


12 changed files:

- debian/createmanpages
- debian/doc-base
- debian/install
- debian/links
- debian/manpages
- + debian/mans/kraken2-build.1
- + debian/mans/kraken2-inspect.1
- + debian/mans/kraken2.1
- + debian/patches/hardening+debug_options.patch
- debian/patches/series
- debian/rules
- debian/tests/run-unit-test


Changes:

=====================================
debian/createmanpages
=====================================
@@ -3,34 +3,33 @@ MANDIR=debian/mans
 mkdir -p $MANDIR
 
 VERSION=`dpkg-parsechangelog | awk '/^Version:/ {print $2}' | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'`
+NAME=`grep "^Description:" debian/control | sed 's/^Description: *//' | head -n1`
+PROGNAME=`grep "^Package:" debian/control | sed 's/^Package: *//' | head -n1`
 
-help2man --no-info --no-discard-stderr --help-option="--help" \
-         --name='assigning taxonomic labels to short DNA sequences' \
-            --version-string="$VERSION" kraken > $MANDIR/kraken.1
+AUTHOR=".SH AUTHOR\nThis manpage was written by $DEBFULLNAME for the Debian distribution and
+can be used for any other usage of the program.
+"
 
+progname=${PROGNAME}
 help2man --no-info --no-discard-stderr --help-option="--help" \
          --name='assigning taxonomic labels to short DNA sequences' \
-            --version-string="$VERSION" kraken-build > $MANDIR/kraken-build.1
+            --version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
+echo $AUTHOR >> $MANDIR/${progname}.1
 
 help2man --no-info --no-discard-stderr --help-option="--help" \
          --name='assigning taxonomic labels to short DNA sequences' \
-            --version-string="$VERSION" kraken-filter > $MANDIR/kraken-filter.1
+            --version-string="$VERSION" ${progname}-build > $MANDIR/${progname}-build.1
+echo $AUTHOR >> $MANDIR/${progname}-build.1
 
 help2man --no-info --no-discard-stderr --help-option="--help" \
-         --name='assigning taxonomic labels to short DNA sequences' \
-            --version-string="$VERSION" kraken-mpa-report > $MANDIR/kraken-mpa-report.1
+         --name='allows users to gain information about the content of a Kraken 2 database' \
+            --version-string="$VERSION" ${progname}-inspect > $MANDIR/${progname}-inspect.1
+echo $AUTHOR >> $MANDIR/${progname}-inspect.1
 
-help2man --no-info --no-discard-stderr --help-option="--help" \
-         --name='assigning taxonomic labels to short DNA sequences' \
-            --version-string="$VERSION" kraken-report > $MANDIR/kraken-report.1
-
-help2man --no-info --no-discard-stderr --help-option="--help" \
-         --name='assigning taxonomic labels to short DNA sequences' \
-            --version-string="$VERSION" kraken-translate > $MANDIR/kraken-translate.1
+echo "$MANDIR/*.1" > debian/manpages
 
 cat <<EOT
 Please enhance the help2man output.
 The following web page might be helpful in doing so:
     http://liw.fi/manpages/
 EOT
-
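
Note on the VERSION line in the hunk above: the three sed expressions strip, in order, an epoch prefix, the Debian revision, and a trailing +dfsg/~dfsg marker. A minimal sketch of that behaviour follows. The function name strip_debian_version and the sample version strings are illustrative assumptions, not part of the commit.

```shell
#!/bin/sh
# Hypothetical helper: applies the same three sed expressions that
# debian/createmanpages uses on the dpkg-parsechangelog version.
strip_debian_version() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'
}

strip_debian_version '2.0.7~beta-1'     # prints 2.0.7~beta
strip_debian_version '1:1.2.3+dfsg-2'   # prints 1.2.3
```

Because the second expression cuts at the first hyphen, an upstream version that itself contains a hyphen would be truncated by this approach.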


=====================================
debian/doc-base
=====================================
@@ -1,24 +1,40 @@
-Document: kraken
+Document: kraken2
 Title: Kraken taxonomic sequence classification system - Operating Manual
 Author: Derrick Wood <dwood at cs.jhu.edu>
-Abstract: assigning taxonomic labels to short DNA sequences
- Kraken is a system for assigning taxonomic labels to short DNA
- sequences, usually obtained through metagenomic studies. Previous
- attempts by other bioinformatics software to accomplish this task have
- often used sequence alignment or machine learning techniques that were
- quite slow, leading to the development of less sensitive but much faster
- abundance estimation programs. Kraken aims to achieve high sensitivity
- and high speed by utilizing exact alignments of k-mers and a novel
- classification algorithm.
+Abstract: taxonomic classification system using exact k-mer matches
+ Kraken 2 is the newest version of Kraken, a taxonomic classification
+ system using exact k-mer matches to achieve high accuracy and fast
+ classification speeds. This classifier matches each k-mer within a query
+ sequence to the lowest common ancestor (LCA) of all genomes containing
+ the given k-mer. The k-mer assignments inform the classification
+ algorithm. [see: Kraken 1's Webpage for more details].
  .
- In its fastest mode of operation, for a simulated metagenome of 100 bp
- reads, Kraken processed over 4 million reads per minute on a single
- core, over 900 times faster than Megablast and over 11 times faster than
- the abundance estimation program MetaPhlAn. Kraken's accuracy is
- comparable with Megablast, with slightly lower sensitivity and very high
- precision.
+ Kraken 2 provides significant improvements to Kraken 1, with faster
+ database build times, smaller database sizes, and faster classification
+ speeds. These improvements were achieved by the following updates to the
+ Kraken classification program:
+ .
+  1. Storage of Minimizers: Instead of storing/querying entire k-mers,
+     Kraken 2 stores minimizers (l-mers) of each k-mer. The length of
+     each l-mer must be ≤ the k-mer length. Each k-mer is treated by
+     Kraken 2 as if its LCA is the same as its minimizer's LCA.
+  2. Introduction of Spaced Seeds: Kraken 2 also uses spaced seeds to
+     store and query minimizers to improve classification accuracy.
+  3. Database Structure: While Kraken 1 saved an indexed and sorted list
+     of k-mer/LCA pairs, Kraken 2 uses a compact hash table. This hash
+     table is a probabilistic data structure that allows for faster
+     queries and lower memory requirements. However, this data structure
+     does have a <1% chance of returning the incorrect LCA or returning
+     an LCA for a non-inserted minimizer. Users can compensate for this
+     possibility by using Kraken's confidence scoring thresholds.
+  4. Protein Databases: Kraken 2 allows for databases built from amino
+     acid sequences. When queried, Kraken 2 performs a six-frame
+     translated search of the query sequences against the database.
+  5. 16S Databases: Kraken 2 also provides support for databases not
+     based on NCBI's taxonomy. Currently, these include the 16S
+     databases: Greengenes, SILVA, and RDP.
 Section: Science/Biology
 
 Format: html
-Index: /usr/share/doc/kraken/html/index.html
-Files: /usr/share/doc/kraken/html/MANUAL.html
+Index: /usr/share/doc/kraken2/html/index.html
+Files: /usr/share/doc/kraken2/html/MANUAL.html
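
Note on item 1 of the abstract above (storage of minimizers): the idea can be illustrated with a deliberately simplified sketch that picks the lexicographically smallest l-mer inside a k-mer. This is an assumption made for illustration only. Kraken 2's real minimizer ordering, canonicalization and spaced seeds are not reproduced here, and the function name minimizer is hypothetical.

```shell
#!/bin/sh
# Simplified illustration: return the lexicographically smallest
# substring of length L ($2) found in the k-mer ($1).
minimizer() {
    printf '%s\n' "$1" | awk -v l="$2" '{
        best = substr($0, 1, l)
        for (i = 2; i <= length($0) - l + 1; i++) {
            cand = substr($0, i, l)
            if (cand < best) best = cand   # string comparison
        }
        print best
    }'
}

minimizer GATTACA 3   # prints ACA
minimizer TTACAGG 3   # prints ACA
```

Both 7-mers share the minimizer ACA, so under the scheme described in the abstract they would be treated as having the same LCA.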


=====================================
debian/install
=====================================
@@ -1,3 +1,3 @@
-docs/*.html	usr/share/doc/kraken/html
-docs/*.png	usr/share/doc/kraken/html
-docs/*.css	usr/share/doc/kraken/html
+docs/*.html	usr/share/doc/kraken2/html
+docs/*.png	usr/share/doc/kraken2/html
+docs/*.css	usr/share/doc/kraken2/html


=====================================
debian/links
=====================================
@@ -1 +1 @@
-usr/share/doc/kraken/html/MANUAL.html	usr/share/doc/kraken/html/index.html
+usr/share/doc/kraken2/html/MANUAL.html	usr/share/doc/kraken2/html/index.html


=====================================
debian/manpages
=====================================
@@ -1 +1 @@
-debian/mans/*
+debian/mans/*.1


=====================================
debian/mans/kraken2-build.1
=====================================
@@ -0,0 +1,82 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
+.TH KRAKEN2-BUILD "1" "March 2019" "kraken2-build 2.0.7~beta" "User Commands"
+.SH NAME
+kraken2-build \- assigning taxonomic labels to short DNA sequences
+.SH SYNOPSIS
+.B kraken2-build
+[\fI\,task option\/\fR] [\fI\,options\/\fR]
+.SH DESCRIPTION
+.SS "Task options (exactly one must be selected):"
+.TP
+\fB\-\-download\-taxonomy\fR
+Download NCBI taxonomic information
+.TP
+\fB\-\-download\-library\fR TYPE
+Download partial library
+(TYPE = one of "archaea", "bacteria", "plasmid",
+"viral", "human", "fungi", "plant", "protozoa",
+"nr", "nt", "env_nr", "env_nt", "UniVec",
+"UniVec_Core")
+.TP
+\fB\-\-special\fR TYPE
+Download and build a special database
+(TYPE = one of "greengenes", "silva", "rdp")
+.TP
+\fB\-\-add\-to\-library\fR FILE
+Add FILE to library
+.TP
+\fB\-\-build\fR
+Create DB from library
+(requires taxonomy d/l'ed and at least one file
+in library)
+.TP
+\fB\-\-clean\fR
+Remove unneeded files from a built database
+.TP
+\fB\-\-standard\fR
+Download and build default database
+.TP
+\fB\-\-help\fR
+Print this message
+.TP
+\fB\-\-version\fR
+Print version information
+.SH OPTIONS
+.TP
+\fB\-\-db\fR NAME
+Kraken 2 DB/library name (mandatory except for
+\fB\-\-help\fR/\-\-version)
+.TP
+\fB\-\-threads\fR #
+Number of threads (def: 1)
+.TP
+\fB\-\-kmer\-len\fR NUM
+K\-mer length in bp/aa (build task only;
+def: 35 nt, 15 aa)
+.TP
+\fB\-\-minimizer\-len\fR NUM
+Minimizer length in bp/aa (build task only;
+def: 31 nt, 15 aa)
+.TP
+\fB\-\-minimizer\-spaces\fR NUM
+Number of characters in minimizer that are
+ignored in comparisons (build task only;
+def: 6 nt, 0 aa)
+.TP
+\fB\-\-protein\fR
+Build a protein database for translated search
+.TP
+\fB\-\-no\-masking\fR
+Used with \fB\-\-standard\fR/\-\-download\-library/
+\fB\-\-add\-to\-library\fR to avoid masking low\-complexity
+sequences prior to building; masking requires
+dustmasker or segmasker to be installed in PATH,
+which some users might not have.
+.TP
+\fB\-\-max\-db\-size\fR NUM
+Maximum number of bytes for Kraken 2 hash table;
+if the estimator determines more would normally be
+needed, the reference library will be downsampled
+to fit. (Used with \fB\-\-build\fR/\-\-standard/\-\-special)
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.


=====================================
debian/mans/kraken2-inspect.1
=====================================
@@ -0,0 +1,33 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
+.TH KRAKEN2-INSPECT "1" "March 2019" "kraken2-inspect 2.0.7~beta" "User Commands"
+.SH NAME
+kraken2-inspect \- allows users to gain information about the content of a Kraken 2 database
+.SH SYNOPSIS
+.B kraken2-inspect
+[\fI\,options\/\fR]
+.SH OPTIONS
+.TP
+\fB\-\-db\fR NAME
+Name for Kraken 2 DB
+(default: none)
+.TP
+\fB\-\-threads\fR NUM
+Number of threads to use
+.TP
+\fB\-\-skip\-counts\fR
+Only print database summary statistics
+.TP
+\fB\-\-use\-mpa\-style\fR
+Format output like Kraken 1's kraken\-mpa\-report
+.TP
+\fB\-\-report\-zero\-counts\fR
+Report counts for ALL taxa, even if
+counts are zero
+.TP
+\fB\-\-help\fR
+Print this message
+.TP
+\fB\-\-version\fR
+Print version information
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.


=====================================
debian/mans/kraken2.1
=====================================
@@ -0,0 +1,73 @@
+.\" DO NOT MODIFY THIS FILE!  It was generated by help2man 1.47.8.
+.TH KRAKEN2 "1" "March 2019" "kraken2 2.0.7~beta" "User Commands"
+.SH NAME
+kraken2 \- assigning taxonomic labels to short DNA sequences
+.SH SYNOPSIS
+.B kraken2
+[\fI\,options\/\fR] \fI\,<filename(s)>\/\fR
+.SH OPTIONS
+.TP
+\fB\-\-db\fR NAME
+Name for Kraken 2 DB
+(default: none)
+.TP
+\fB\-\-threads\fR NUM
+Number of threads (default: 1)
+.TP
+\fB\-\-quick\fR
+Quick operation (use first hit or hits)
+.TP
+\fB\-\-unclassified\-out\fR FILENAME
+Print unclassified sequences to filename
+.TP
+\fB\-\-classified\-out\fR FILENAME
+Print classified sequences to filename
+.TP
+\fB\-\-output\fR FILENAME
+Print output to filename (default: stdout); "\-" will
+suppress normal output
+.TP
+\fB\-\-confidence\fR FLOAT
+Confidence score threshold (default: 0.0); must be
+in [0, 1].
+.TP
+\fB\-\-minimum\-base\-quality\fR NUM
+Minimum base quality used in classification (def: 0,
+only effective with FASTQ input).
+.TP
+\fB\-\-report\fR FILENAME
+Print a report with aggregate counts/clade to file
+.TP
+\fB\-\-use\-mpa\-style\fR
+With \fB\-\-report\fR, format report output like Kraken 1's
+kraken\-mpa\-report
+.TP
+\fB\-\-report\-zero\-counts\fR
+With \fB\-\-report\fR, report counts for ALL taxa, even if
+counts are zero
+.TP
+\fB\-\-memory\-mapping\fR
+Avoids loading database into RAM
+.TP
+\fB\-\-paired\fR
+The filenames provided have paired\-end reads
+.TP
+\fB\-\-use\-names\fR
+Print scientific names instead of just taxids
+.TP
+\fB\-\-gzip\-compressed\fR
+Input files are compressed with gzip
+.TP
+\fB\-\-bzip2\-compressed\fR
+Input files are compressed with bzip2
+.TP
+\fB\-\-help\fR
+Print this message
+.TP
+\fB\-\-version\fR
+Print version information
+.PP
+If none of the *\-compressed flags are specified, and the filename provided
+is a regular file, automatic format detection is attempted.
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.


=====================================
debian/patches/hardening+debug_options.patch
=====================================
@@ -0,0 +1,13 @@
+Author: Andreas Tille <tille at debian.org>
+Last-Update: Wed, 13 Mar 2019 15:46:40 +0100
+Description: Propagate hardening and debug options
+
+--- a/src/Makefile
++++ b/src/Makefile
+@@ -1,5 +1,5 @@
+ CXX = g++
+-CXXFLAGS = -fopenmp -Wall -std=c++11 -O3
++CXXFLAGS += -fopenmp -Wall -std=c++11 -O3
+ CXXFLAGS += -DLINEAR_PROBING
+ 
+ .PHONY: all clean install
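
Note on the patch above: with a plain "=" the Makefile discards any CXXFLAGS inherited from the environment, whereas "+=" appends to them, which is presumably how the hardening and debug flags supplied by the Debian build tooling reach the compiler. The sketch below demonstrates the difference with a throwaway Makefile. It assumes GNU make is installed; the function name print_cxxflags and the sample flags "-g -O2" are illustrative, not taken from the package.

```shell
#!/bin/sh
# Illustrative only: uses a temporary Makefile, not the real src/Makefile.
# print_cxxflags OP ENVFLAGS
#   OP        assignment operator to try ("=" or "+=")
#   ENVFLAGS  value exported as CXXFLAGS for the make run
print_cxxflags() {
    demo_dir=$(mktemp -d) || return 1
    # printf turns \n and \t into the newline and tab that make requires.
    printf 'CXXFLAGS %s -fopenmp -Wall -std=c++11 -O3\nshow:\n\t@echo CXXFLAGS=$(CXXFLAGS)\n' \
        "$1" > "$demo_dir/Makefile"
    env CXXFLAGS="$2" make -s -f "$demo_dir/Makefile" show
    rm -rf "$demo_dir"
}

print_cxxflags '='  '-g -O2'   # prints CXXFLAGS=-fopenmp -Wall -std=c++11 -O3
print_cxxflags '+=' '-g -O2'   # prints CXXFLAGS=-g -O2 -fopenmp -Wall -std=c++11 -O3
```

Note that the upstream -O3 still comes last in the appended form, so it takes precedence over an earlier -O2 from the environment.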


=====================================
debian/patches/series
=====================================
@@ -1 +1,2 @@
 fix_install.patch
+hardening+debug_options.patch


=====================================
debian/rules
=====================================
@@ -12,7 +12,7 @@ export KRAKEN2_DIR=$(CURDIR)/debian/$(DEB_SOURCE)/usr/lib/$(DEB_SOURCE)
 
 override_dh_link:
 	dh_link
-	for file in $(KRAKEN2_DIR)/* ; do \
+	for file in $(KRAKEN2_DIR)/kraken* ; do \
 	    if ! echo $${file} | grep -q '\.pm$$' ; then \
 		ln -s ../lib/$(DEB_SOURCE)/`basename $${file}` $(CURDIR)/debian/$(DEB_SOURCE)/usr/bin/`basename $${file}` ; \
 	    fi \


=====================================
debian/tests/run-unit-test
=====================================
@@ -1,20 +1,23 @@
 #!/bin/sh -e
 
-pkg=kraken
+pkg=kraken2
 
-if [ "$ADTTMP" = "" ] ; then
-  ADTTMP=$(mktemp -d /tmp/${pkg}-test.XXXXXX)
-  trap "rm -rf $ADTTMP" 0 INT QUIT ABRT PIPE TERM
+if [ "${AUTOPKGTEST_TMP}" = "" ] ; then
+  AUTOPKGTEST_TMP=$(mktemp -d /tmp/${pkg}-test.XXXXXX)
+  # The double quotes below are intentional: they expand the temporary
+  # directory variable now rather than when the trap fires.
+  # shellcheck disable=SC2064
+  trap "rm -rf ${AUTOPKGTEST_TMP}" 0 INT QUIT ABRT PIPE TERM
 fi
 
-cd $ADTTMP
-
+cd "${AUTOPKGTEST_TMP}"
 cp -a /usr/share/doc/${pkg}/test_data/* .
+
 gunzip -r *
 
-kraken-build --add-to-library Acartia_tonsa.fasta --db test_db
-kraken-build --add-to-library Acinetobacter_phage.fasta --db test_db
-kraken-build --build --minimizer-len 5 --db test_db
-kraken --db test_db test.fa > kraken.output
-kraken-report --db test_db kraken.output
+kraken2-build --add-to-library Acartia_tonsa.fasta --db test_db
+kraken2-build --add-to-library Acinetobacter_phage.fasta --db test_db
+kraken2-build --build --minimizer-len 5 --db test_db
+kraken2 --db test_db test.fa > kraken.output
+## kraken-report --db test_db kraken.output
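
Note on the trap line above: the comment and the SC2064 suppression refer to when the variable inside the trap command is expanded. A minimal sketch of the difference follows. The function name cleanup_demo and the placeholder path are hypothetical and chosen only for this illustration.

```shell
#!/bin/sh
# cleanup_demo MODE
#   MODE "early": double quotes, path expanded when the trap is set
#   MODE "late":  single quotes, path expanded when the trap fires
cleanup_demo() {
    target=$(mktemp -d) || return 1
    (
        dir=$target
        if [ "$1" = early ]; then
            # shellcheck disable=SC2064
            trap "rm -rf $dir" EXIT
        else
            trap 'rm -rf "$dir"' EXIT
        fi
        # Reassign the variable after the trap has been installed.
        dir=/nonexistent/placeholder
    )
    if [ -d "$target" ]; then
        echo "kept"
        rmdir "$target"
    else
        echo "removed"
    fi
}

cleanup_demo early   # prints removed
cleanup_demo late    # prints kept
```

With early expansion the trap removes the directory that existed when the trap was installed, even if the variable changes later, which matches the intent stated in the test script's comment.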
 



View it on GitLab: https://salsa.debian.org/med-team/kraken2/compare/27b63b3565c65ff17fd5d0c257394afd10c51421...31e5d8f6c1968f8a4ed56fa19c979d0206cf12b3
