[med-svn] [Git][med-team/kraken2][master] 2 commits: Propagate hardening and debug options
Andreas Tille
gitlab at salsa.debian.org
Thu Mar 14 12:41:57 GMT 2019
Andreas Tille pushed to branch master at Debian Med / kraken2
Commits:
0e9cfaca by Andreas Tille at 2019-03-14T12:10:37Z
Propagate hardening and debug options
- - - - -
31e5d8f6 by Andreas Tille at 2019-03-14T12:41:09Z
Several remaining kraken -> kraken2 renames
- - - - -
12 changed files:
- debian/createmanpages
- debian/doc-base
- debian/install
- debian/links
- debian/manpages
- + debian/mans/kraken2-build.1
- + debian/mans/kraken2-inspect.1
- + debian/mans/kraken2.1
- + debian/patches/hardening+debug_options.patch
- debian/patches/series
- debian/rules
- debian/tests/run-unit-test
Changes:
=====================================
debian/createmanpages
=====================================
@@ -3,34 +3,33 @@ MANDIR=debian/mans
mkdir -p $MANDIR
VERSION=`dpkg-parsechangelog | awk '/^Version:/ {print $2}' | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'`
+NAME=`grep "^Description:" debian/control | sed 's/^Description: *//' | head -n1`
+PROGNAME=`grep "^Package:" debian/control | sed 's/^Package: *//' | head -n1`
-help2man --no-info --no-discard-stderr --help-option="--help" \
- --name='assigning taxonomic labels to short DNA sequences' \
- --version-string="$VERSION" kraken > $MANDIR/kraken.1
+AUTHOR=".SH AUTHOR\nThis manpage was written by $DEBFULLNAME for the Debian distribution and
+can be used for any other usage of the program.
+"
+progname=${PROGNAME}
help2man --no-info --no-discard-stderr --help-option="--help" \
--name='assigning taxonomic labels to short DNA sequences' \
- --version-string="$VERSION" kraken-build > $MANDIR/kraken-build.1
+ --version-string="$VERSION" ${progname} > $MANDIR/${progname}.1
+echo $AUTHOR >> $MANDIR/${progname}.1
help2man --no-info --no-discard-stderr --help-option="--help" \
--name='assigning taxonomic labels to short DNA sequences' \
- --version-string="$VERSION" kraken-filter > $MANDIR/kraken-filter.1
+ --version-string="$VERSION" ${progname}-build > $MANDIR/${progname}-build.1
+echo $AUTHOR >> $MANDIR/${progname}-build.1
help2man --no-info --no-discard-stderr --help-option="--help" \
- --name='assigning taxonomic labels to short DNA sequences' \
- --version-string="$VERSION" kraken-mpa-report > $MANDIR/kraken-mpa-report.1
+ --name='allows users to gain information about the content of a Kraken 2 database' \
+ --version-string="$VERSION" ${progname}-inspect > $MANDIR/${progname}-inspect.1
+echo $AUTHOR >> $MANDIR/${progname}-inspect.1
-help2man --no-info --no-discard-stderr --help-option="--help" \
- --name='assigning taxonomic labels to short DNA sequences' \
- --version-string="$VERSION" kraken-report > $MANDIR/kraken-report.1
-
-help2man --no-info --no-discard-stderr --help-option="--help" \
- --name='assigning taxonomic labels to short DNA sequences' \
- --version-string="$VERSION" kraken-translate > $MANDIR/kraken-translate.1
+echo "$MANDIR/*.1" > debian/manpages
cat <<EOT
Please enhance the help2man output.
The following web page might be helpful in doing so:
http://liw.fi/manpages/
EOT
-
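Note on the VERSION pipeline in debian/createmanpages: the three sed expressions strip, in order, an epoch prefix, everything from the first hyphen onward (the Debian revision), and a trailing +dfsg or ~dfsg suffix. The sketch below applies the same expressions to sample strings. The helper name upstream_version and the sample version strings are assumptions made for illustration and are not part of the package.

```shell
#!/bin/sh
# Hypothetical helper (not in the package): applies the same three sed
# expressions that debian/createmanpages uses on the Version: field.
upstream_version() {
    printf '%s\n' "$1" | sed -e 's/^[0-9]*://' -e 's/-.*//' -e 's/[+~]dfsg$//'
}

# Sample version strings, made up for illustration.
upstream_version '2.0.7~beta-1'      # prints 2.0.7~beta
upstream_version '1:2.0.7+dfsg-2'    # prints 2.0.7
```

Because the second expression cuts at the first hyphen, an upstream version that itself contains a hyphen would be truncated at that point.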
=====================================
debian/doc-base
=====================================
@@ -1,24 +1,40 @@
-Document: kraken
+Document: kraken2
Title: Kraken taxonomic sequence classification system - Operating Manual
Author: Derrick Wood <dwood at cs.jhu.edu>
-Abstract: assigning taxonomic labels to short DNA sequences
- Kraken is a system for assigning taxonomic labels to short DNA
- sequences, usually obtained through metagenomic studies. Previous
- attempts by other bioinformatics software to accomplish this task have
- often used sequence alignment or machine learning techniques that were
- quite slow, leading to the development of less sensitive but much faster
- abundance estimation programs. Kraken aims to achieve high sensitivity
- and high speed by utilizing exact alignments of k-mers and a novel
- classification algorithm.
+Abstract: taxonomic classification system using exact k-mer matches
+ Kraken 2 is the newest version of Kraken, a taxonomic classification
+ system using exact k-mer matches to achieve high accuracy and fast
+ classification speeds. This classifier matches each k-mer within a query
+ sequence to the lowest common ancestor (LCA) of all genomes containing
+ the given k-mer. The k-mer assignments inform the classification
+ algorithm. [see: Kraken 1's Webpage for more details].
.
- In its fastest mode of operation, for a simulated metagenome of 100 bp
- reads, Kraken processed over 4 million reads per minute on a single
- core, over 900 times faster than Megablast and over 11 times faster than
- the abundance estimation program MetaPhlAn. Kraken's accuracy is
- comparable with Megablast, with slightly lower sensitivity and very high
- precision.
+ Kraken 2 provides significant improvements to Kraken 1, with faster
+ database build times, smaller database sizes, and faster classification
+ speeds. These improvements were achieved by the following updates to the
+ Kraken classification program:
+ .
+ 1. Storage of Minimizers: Instead of storing/querying entire k-mers,
+ Kraken 2 stores minimizers (l-mers) of each k-mer. The length of
+ each l-mer must be ≤ the k-mer length. Each k-mer is treated by
+ Kraken 2 as if its LCA is the same as its minimizer's LCA.
+ 2. Introduction of Spaced Seeds: Kraken 2 also uses spaced seeds to
+ store and query minimizers to improve classification accuracy.
+ 3. Database Structure: While Kraken 1 saved an indexed and sorted list
+ of k-mer/LCA pairs, Kraken 2 uses a compact hash table. This hash
+ table is a probabilistic data structure that allows for faster
+ queries and lower memory requirements. However, this data structure
+ does have a <1% chance of returning the incorrect LCA or returning
+ an LCA for a non-inserted minimizer. Users can compensate for this
+ possibility by using Kraken's confidence scoring thresholds.
+ 4. Protein Databases: Kraken 2 allows for databases built from amino
+ acid sequences. When queried, Kraken 2 performs a six-frame
+ translated search of the query sequences against the database.
+ 5. 16S Databases: Kraken 2 also provides support for databases not
+ based on NCBI's taxonomy. Currently, these include the 16S
+ databases: Greengenes, SILVA, and RDP.
Section: Science/Biology
Format: html
-Index: /usr/share/doc/kraken/html/index.html
-Files: /usr/share/doc/kraken/html/MANUAL.html
+Index: /usr/share/doc/kraken2/html/index.html
+Files: /usr/share/doc/kraken2/html/MANUAL.html
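Note on item 1 of the new abstract (storage of minimizers): the idea can be sketched as picking, for each k-mer, one representative l-mer with l ≤ k. The sketch below uses the lexicographically smallest l-mer as the minimizer, which is a common textbook definition and an assumption here. The ordering Kraken 2 actually uses, and its handling of reverse complements and spaced seeds, are not described in this message and are not modelled. The function name minimizer is hypothetical.

```shell
#!/bin/sh
# Simplified illustration: return the lexicographically smallest l-mer
# of a k-mer. This is NOT Kraken 2's implementation.
# $1 = k-mer (uppercase ACGT), $2 = minimizer length l (l <= length of $1)
minimizer() {
    printf '%s\n' "$1" | awk -v l="$2" '{
        best = substr($0, 1, l)
        # Slide a window of length l across the k-mer.
        for (i = 2; i <= length($0) - l + 1; i++) {
            cand = substr($0, i, l)
            if (cand < best) best = cand   # string comparison
        }
        print best
    }'
}

# The 3-mers of GATTACA are GAT, ATT, TTA, TAC and ACA.
minimizer GATTACA 3    # prints ACA
```

Under this scheme a database only needs one entry per distinct minimizer rather than one per k-mer, which is the source of the smaller database sizes the abstract mentions.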
=====================================
debian/install
=====================================
@@ -1,3 +1,3 @@
-docs/*.html usr/share/doc/kraken/html
-docs/*.png usr/share/doc/kraken/html
-docs/*.css usr/share/doc/kraken/html
+docs/*.html usr/share/doc/kraken2/html
+docs/*.png usr/share/doc/kraken2/html
+docs/*.css usr/share/doc/kraken2/html
=====================================
debian/links
=====================================
@@ -1 +1 @@
-usr/share/doc/kraken/html/MANUAL.html usr/share/doc/kraken/html/index.html
+usr/share/doc/kraken2/html/MANUAL.html usr/share/doc/kraken2/html/index.html
=====================================
debian/manpages
=====================================
@@ -1 +1 @@
-debian/mans/*
+debian/mans/*.1
=====================================
debian/mans/kraken2-build.1
=====================================
@@ -0,0 +1,82 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH KRAKEN2-BUILD "1" "March 2019" "kraken2-build 2.0.7~beta" "User Commands"
+.SH NAME
+kraken2-build \- assigning taxonomic labels to short DNA sequences
+.SH SYNOPSIS
+.B kraken2-build
+[\fI\,task option\/\fR] [\fI\,options\/\fR]
+.SH DESCRIPTION
+.SS "Task options (exactly one must be selected):"
+.TP
+\fB\-\-download\-taxonomy\fR
+Download NCBI taxonomic information
+.TP
+\fB\-\-download\-library\fR TYPE
+Download partial library
+(TYPE = one of "archaea", "bacteria", "plasmid",
+"viral", "human", "fungi", "plant", "protozoa",
+"nr", "nt", "env_nr", "env_nt", "UniVec",
+"UniVec_Core")
+.TP
+\fB\-\-special\fR TYPE
+Download and build a special database
+(TYPE = one of "greengenes", "silva", "rdp")
+.TP
+\fB\-\-add\-to\-library\fR FILE
+Add FILE to library
+.TP
+\fB\-\-build\fR
+Create DB from library
+(requires taxonomy d/l'ed and at least one file
+in library)
+.TP
+\fB\-\-clean\fR
+Remove unneeded files from a built database
+.TP
+\fB\-\-standard\fR
+Download and build default database
+.TP
+\fB\-\-help\fR
+Print this message
+.TP
+\fB\-\-version\fR
+Print version information
+.SH OPTIONS
+.TP
+\fB\-\-db\fR NAME
+Kraken 2 DB/library name (mandatory except for
+\fB\-\-help\fR/\-\-version)
+.TP
+\fB\-\-threads\fR #
+Number of threads (def: 1)
+.TP
+\fB\-\-kmer\-len\fR NUM
+K\-mer length in bp/aa (build task only;
+def: 35 nt, 15 aa)
+.TP
+\fB\-\-minimizer\-len\fR NUM
+Minimizer length in bp/aa (build task only;
+def: 31 nt, 15 aa)
+.TP
+\fB\-\-minimizer\-spaces\fR NUM
+Number of characters in minimizer that are
+ignored in comparisons (build task only;
+def: 6 nt, 0 aa)
+.TP
+\fB\-\-protein\fR
+Build a protein database for translated search
+.TP
+\fB\-\-no\-masking\fR
+Used with \fB\-\-standard\fR/\-\-download\-library/
+\fB\-\-add\-to\-library\fR to avoid masking low\-complexity
+sequences prior to building; masking requires
+dustmasker or segmasker to be installed in PATH,
+which some users might not have.
+.TP
+\fB\-\-max\-db\-size\fR NUM
+Maximum number of bytes for Kraken 2 hash table;
+if the estimator determines more would normally be
+needed, the reference library will be downsampled
+to fit. (Used with \fB\-\-build\fR/\-\-standard/\-\-special)
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
=====================================
debian/mans/kraken2-inspect.1
=====================================
@@ -0,0 +1,33 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH KRAKEN2-INSPECT "1" "March 2019" "kraken2-inspect 2.0.7~beta" "User Commands"
+.SH NAME
+kraken2-inspect \- allows users to gain information about the content of a Kraken 2 database
+.SH SYNOPSIS
+.B kraken2-inspect
+[\fI\,options\/\fR]
+.SH OPTIONS
+.TP
+\fB\-\-db\fR NAME
+Name for Kraken 2 DB
+(default: none)
+.TP
+\fB\-\-threads\fR NUM
+Number of threads to use
+.TP
+\fB\-\-skip\-counts\fR
+Only print database summary statistics
+.TP
+\fB\-\-use\-mpa\-style\fR
+Format output like Kraken 1's kraken\-mpa\-report
+.TP
+\fB\-\-report\-zero\-counts\fR
+Report counts for ALL taxa, even if
+counts are zero
+.TP
+\fB\-\-help\fR
+Print this message
+.TP
+\fB\-\-version\fR
+Print version information
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
=====================================
debian/mans/kraken2.1
=====================================
@@ -0,0 +1,73 @@
+.\" DO NOT MODIFY THIS FILE! It was generated by help2man 1.47.8.
+.TH KRAKEN2 "1" "March 2019" "kraken2 2.0.7~beta" "User Commands"
+.SH NAME
+kraken2 \- assigning taxonomic labels to short DNA sequences
+.SH SYNOPSIS
+.B kraken2
+[\fI\,options\/\fR] \fI\,<filename(s)>\/\fR
+.SH OPTIONS
+.TP
+\fB\-\-db\fR NAME
+Name for Kraken 2 DB
+(default: none)
+.TP
+\fB\-\-threads\fR NUM
+Number of threads (default: 1)
+.TP
+\fB\-\-quick\fR
+Quick operation (use first hit or hits)
+.TP
+\fB\-\-unclassified\-out\fR FILENAME
+Print unclassified sequences to filename
+.TP
+\fB\-\-classified\-out\fR FILENAME
+Print classified sequences to filename
+.TP
+\fB\-\-output\fR FILENAME
+Print output to filename (default: stdout); "\-" will
+suppress normal output
+.TP
+\fB\-\-confidence\fR FLOAT
+Confidence score threshold (default: 0.0); must be
+in [0, 1].
+.TP
+\fB\-\-minimum\-base\-quality\fR NUM
+Minimum base quality used in classification (def: 0,
+only effective with FASTQ input).
+.TP
+\fB\-\-report\fR FILENAME
+Print a report with aggregrate counts/clade to file
+.TP
+\fB\-\-use\-mpa\-style\fR
+With \fB\-\-report\fR, format report output like Kraken 1's
+kraken\-mpa\-report
+.TP
+\fB\-\-report\-zero\-counts\fR
+With \fB\-\-report\fR, report counts for ALL taxa, even if
+counts are zero
+.TP
+\fB\-\-memory\-mapping\fR
+Avoids loading database into RAM
+.TP
+\fB\-\-paired\fR
+The filenames provided have paired\-end reads
+.TP
+\fB\-\-use\-names\fR
+Print scientific names instead of just taxids
+.TP
+\fB\-\-gzip\-compressed\fR
+Input files are compressed with gzip
+.TP
+\fB\-\-bzip2\-compressed\fR
+Input files are compressed with bzip2
+.TP
+\fB\-\-help\fR
+Print this message
+.TP
+\fB\-\-version\fR
+Print version information
+.PP
+If none of the *\-compressed flags are specified, and the filename provided
+is a regular file, automatic format detection is attempted.
+.SH AUTHOR
+This manpage was written by Andreas Tille for the Debian distribution and can be used for any other usage of the program.
=====================================
debian/patches/hardening+debug_options.patch
=====================================
@@ -0,0 +1,13 @@
+Author: Andreas Tille <tille at debian.org>
+Last-Update: Wed, 13 Mar 2019 15:46:40 +0100
+Description: Propagate hardening and debug options
+
+--- a/src/Makefile
++++ b/src/Makefile
+@@ -1,5 +1,5 @@
+ CXX = g++
+-CXXFLAGS = -fopenmp -Wall -std=c++11 -O3
++CXXFLAGS += -fopenmp -Wall -std=c++11 -O3
+ CXXFLAGS += -DLINEAR_PROBING
+
+ .PHONY: all clean install
=====================================
debian/patches/series
=====================================
@@ -1 +1,2 @@
fix_install.patch
+hardening+debug_options.patch
=====================================
debian/rules
=====================================
@@ -12,7 +12,7 @@ export KRAKEN2_DIR=$(CURDIR)/debian/$(DEB_SOURCE)/usr/lib/$(DEB_SOURCE)
override_dh_link:
dh_link
- for file in $(KRAKEN2_DIR)/* ; do \
+ for file in $(KRAKEN2_DIR)/kraken* ; do \
if ! echo $${file} | grep -q '\.pm$$' ; then \
ln -s ../lib/$(DEB_SOURCE)/`basename $${file}` $(CURDIR)/debian/$(DEB_SOURCE)/usr/bin/`basename $${file}` ; \
fi \
=====================================
debian/tests/run-unit-test
=====================================
@@ -1,20 +1,23 @@
#!/bin/sh -e
-pkg=kraken
+pkg=kraken2
-if [ "$ADTTMP" = "" ] ; then
- ADTTMP=$(mktemp -d /tmp/${pkg}-test.XXXXXX)
- trap "rm -rf $ADTTMP" 0 INT QUIT ABRT PIPE TERM
+if [ "${AUTOPKGTEST_TMP}" = "" ] ; then
+ AUTOPKGTEST_TMP=$(mktemp -d /tmp/${pkg}-test.XXXXXX)
+ # Double quote below to expand the temporary directory variable now versus
+ # later is on purpose.
+ # shellcheck disable=SC2064
+ trap "rm -rf ${AUTOPKGTEST_TMP}" 0 INT QUIT ABRT PIPE TERM
fi
-cd $ADTTMP
-
+cd "${AUTOPKGTEST_TMP}"
cp -a /usr/share/doc/${pkg}/test_data/* .
+
gunzip -r *
-kraken-build --add-to-library Acartia_tonsa.fasta --db test_db
-kraken-build --add-to-library Acinetobacter_phage.fasta --db test_db
-kraken-build --build --minimizer-len 5 --db test_db
-kraken --db test_db test.fa > kraken.output
-kraken-report --db test_db kraken.output
+kraken2-build --add-to-library Acartia_tonsa.fasta --db test_db
+kraken2-build --add-to-library Acinetobacter_phage.fasta --db test_db
+kraken2-build --build --minimizer-len 5 --db test_db
+kraken2 --db test_db test.fa > kraken.output
+## kraken-report --db test_db kraken.output
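Note on the SC2064 comment in debian/tests/run-unit-test: double-quoting the trap command expands ${AUTOPKGTEST_TMP} when the trap is set, so the cleanup still targets the original directory even if the variable changes later. The sketch below shows the difference between early and deferred expansion. It uses eval as a stand-in for the shell evaluating a stored trap string, and the directory names are placeholders. Nothing is created or removed.

```shell
#!/bin/sh
# Early versus deferred expansion of a stored command string.
# eval stands in for the shell running a trap command at exit.
tmpdir=/tmp/first

# Double quotes: ${tmpdir} is expanded now; the string holds /tmp/first.
early="echo cleaning ${tmpdir}"
# Single quotes: expansion is deferred until the string is evaluated.
late='echo cleaning ${tmpdir}'

tmpdir=/tmp/second   # simulate the variable changing before exit

early_result=$(eval "$early")
late_result=$(eval "$late")
echo "$early_result"   # cleaning /tmp/first
echo "$late_result"    # cleaning /tmp/second
```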
View it on GitLab: https://salsa.debian.org/med-team/kraken2/compare/27b63b3565c65ff17fd5d0c257394afd10c51421...31e5d8f6c1968f8a4ed56fa19c979d0206cf12b3