[Debian-med-packaging] Bug#933661: metaphlan2: Port to Python3 needed

Steve Langasek steve.langasek at canonical.com
Fri Aug 16 01:11:18 BST 2019


Package: metaphlan2
Followup-For: Bug #933661
User: ubuntu-devel at lists.ubuntu.com
Usertags: origin-ubuntu eoan ubuntu-patch

Hi Andreas,

Prompted by an interest in dropping python-pandas from Ubuntu rather than
fixing its failing tests, I took a look at moving metaphlan2 to python3.
Using the upstream 2.9.19 release
(https://bitbucket.org/biobakery/metaphlan2/get/2.9.19.tar.bz2) this is
fairly straightforward. However, I find that 2.9 wants the databases in a
different format from the one made available in the metaphlan2-data package;
it needs the new
https://bitbucket.org/biobakery/metaphlan2/downloads/mpa_v29_CHOCOPhlAn_201901.tar
database file instead of the current v20 file.

Since this data tarball will need repacking and some changes to the postinst
script (e.g. the new metaphlan2 wants the database under
/usr/share/metaphlan2/metaphlan_databases instead of
/usr/share/metaphlan2/db_v20, and the input is no longer a fasta file but a
.fna.bz2), at least for now I'm not going to upload this change to Ubuntu.
But I'm attaching the debdiff with my work in progress for the metaphlan2
package, for your consideration.
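
As a rough illustration only (this is not part of the attached debdiff), the
interpreter change made by python3.patch amounts to rewriting the first line
of each script from "#!/usr/bin/env python" or "#!/usr/bin/env python2" to
"#!/usr/bin/env python3". A minimal Python sketch of that rewrite follows; the
helper name to_python3_shebang is hypothetical and chosen here for the example:

```python
import re

# Hypothetical helper for illustration; the debdiff itself carries these
# changes as a quilt patch (python3.patch), not as a script.
# Matches only the two shebang forms seen in the patch: "python" and "python2".
_OLD_SHEBANG = re.compile(r"^#!\s*/usr/bin/env python2?\s*$")


def to_python3_shebang(text):
    """Return text with a python/python2 env shebang on line 1 set to python3.

    Any other first line (including an existing python3 shebang) is left
    untouched, and the rest of the file is passed through unchanged.
    """
    first, sep, rest = text.partition("\n")
    if _OLD_SHEBANG.match(first):
        first = "#!/usr/bin/env python3"
    return first + sep + rest
```

Only the shebang is handled here; any source-level Python 3 incompatibilities
would still need to be checked separately.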

Cheers,
-- 
Steve Langasek                   Give me a lever long enough and a Free OS
Debian Developer                   to set it on, and I can move the world.
Ubuntu Developer                                   https://www.debian.org/
slangasek at ubuntu.com                                     vorlon at debian.org
-------------- next part --------------
diff -Nru metaphlan2-2.7.8/debian/control metaphlan2-2.9.19/debian/control
--- metaphlan2-2.7.8/debian/control	2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/control	2019-08-15 13:12:57.000000000 -0700
@@ -4,7 +4,7 @@
 Section: science
 Priority: optional
 Build-Depends: debhelper (>= 11~),
-               python-all,
+               python3-all,
                dh-python,
                pandoc,
                bowtie2
@@ -15,12 +15,12 @@
 
 Package: metaphlan2
 Architecture: all
-Depends: ${python:Depends},
+Depends: ${python3:Depends},
          ${misc:Depends},
          metaphlan2-data,
-         python-biom-format,
-         python-msgpack,
-         python-pandas,
+         python3-biom-format,
+         python3-msgpack,
+         python3-pandas,
          bowtie2
 Description: Metagenomic Phylogenetic Analysis
  MetaPhlAn is a computational tool for profiling the composition of
diff -Nru metaphlan2-2.7.8/debian/patches/_metaphlan2.py.patch metaphlan2-2.9.19/debian/patches/_metaphlan2.py.patch
--- metaphlan2-2.7.8/debian/patches/_metaphlan2.py.patch	2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/_metaphlan2.py.patch	2019-08-15 13:12:57.000000000 -0700
@@ -9,8 +9,10 @@
  support function annotations: These are optional in Python 3, and are
  removed from the function definitions in "_metaphlan2.py" by the patch.
 
---- a/_metaphlan2.py
-+++ b/_metaphlan2.py
+Index: metaphlan2-2.9.19/_metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/_metaphlan2.py
++++ metaphlan2-2.9.19/_metaphlan2.py
 @@ -3,7 +3,7 @@
  # This module defines the functions which run MetaPhlAn2 on
  # single and paired fastq data.
@@ -20,8 +22,8 @@
  import subprocess as sb
  from q2_types.per_sample_sequences import SingleLanePerSampleSingleEndFastqDirFmt
  from q2_types.per_sample_sequences import SingleLanePerSamplePairedEndFastqDirFmt
-@@ -24,8 +24,7 @@ def metaphlan2_helper(raw_data, nproc, i
-     sb.run(cmd, check=True)
+@@ -30,8 +30,7 @@
+           'doi: https://doi.org/10.1038/nmeth.3589', end='\n\n')
  
  
 -def profile_single_fastq(raw_data: SingleLanePerSampleSingleEndFastqDirFmt,
@@ -30,7 +32,7 @@
      output_biom = None
  
      with tempfile.TemporaryDirectory() as tmp_dir:
-@@ -36,8 +35,7 @@ def profile_single_fastq(raw_data: Singl
+@@ -42,8 +41,7 @@
      return output_biom
  
  
diff -Nru metaphlan2-2.7.8/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch metaphlan2-2.9.19/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch
--- metaphlan2-2.7.8/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch	2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/mpa_dir-is-usr_share_metaphlan2.patch	2019-08-15 13:12:57.000000000 -0700
@@ -5,182 +5,222 @@
  .
  The doc is also adapted to this change.
 
---- a/metaphlan2.py
-+++ b/metaphlan2.py
-@@ -417,7 +417,7 @@ def read_params(args):
+Index: metaphlan2-2.9.19/metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/metaphlan2.py
++++ metaphlan2-2.9.19/metaphlan2.py
+@@ -143,7 +143,7 @@
  
              "*  You can also provide an externally BowTie2-mapped SAM if you specify this format with \n"
              "   --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with the obtained sam:\n"
--            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq\n"
-+            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x /usr/share/metaphlan2/db_v20/mpa_v20_m200 -U metagenome.fastq\n"
-             "$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt\n\n"
+-            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/metaphlan_databases/mpa_v25_CHOCOPhlAn_201901 -U metagenome.fastq\n"
++            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x /usr/share/metaphlan2/metaphlan_databases/mpa_v25_CHOCOPhlAn_201901 -U metagenome.fastq\n"
+             "$ metaphlan2.py metagenome.sam --input_type sam -o profiled_metagenome.txt\n\n"
  
-             # "*  Multiple alternative ways to pass the input are also available:\n"
-@@ -1391,7 +1391,7 @@ def metaphlan2():
+             "*  We can also natively handle paired-end metagenomes, and, more generally, metagenomes stored in \n"
+@@ -1154,7 +1154,7 @@
      # check for the mpa_pkl file
      if not os.path.isfile(pars['mpa_pkl']):
          sys.stderr.write("Error: Unable to find the mpa_pkl file at: " + pars['mpa_pkl'] +
 -                         "\nExpecting location ${mpa_dir}/db_v20/map_v20_m200.pkl "
-+                         "\nExpecting location /usr/share/metaphlan2/db_v20/mpa_v20_m200.pkl "
-                          "\nSelect the file location with the option --mpa_pkl.\n"
++                         "\nExpecting location /usr/share/metaphlan2/db_v20/map_v20_m200.pkl "
                           "Exiting...\n\n")
          sys.exit(1)
---- a/README.md
-+++ b/README.md
-@@ -86,33 +86,27 @@ In case you moved the `metaphlan2.py` sc
- 
- This section presents some basic usages of MetaPhlAn2, for more advanced usages, please see at [its wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
- 
--We assume here that ``metaphlan2.py`` is in the system path and that ``mpa_dir`` bash variable contains the main MetaPhlAn folder. You can set this two variables moving to your MetaPhlAn2 local folder and type:
--
--```
--#!bash
--$ export PATH=`pwd`:$PATH
--$ export mpa_dir=`pwd`
--```
-+We assume here that ``metaphlan2`` is in the system path.
- 
- Here is the basic example to profile a metagenome from raw reads (requires BowTie2 in the system path with execution and read permissions, Perl installed). 
- 
- ```
- #!bash
--$ metaphlan2.py metagenome.fastq --input_type fastq > profiled_metagenome.txt
-+$ metaphlan2 metagenome.fastq --input_type fastq > profiled_metagenome.txt
- ```
- 
- It is highly recommended to save the intermediate BowTie2 output for re-running MetaPhlAn extremely quickly (`--bowtie2out`), and use multiple CPUs (`--nproc`) if available:
- 
- ```
- #!bash
--$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
-+$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
- ```
- 
- If you already mapped your metagenome against the marker DB (using a previous  MetaPhlAn run), you can obtain the results in few seconds by using the previously saved `--bowtie2out` file and specifying the input (`--input_type bowtie2out`):
- 
- ```
- #!bash
--$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out > profiled_metagenome.txt
-+$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out > profiled_metagenome.txt
- ```
- 
- You can also provide an externally BowTie2-mapped SAM if you specify this format with `--input_type`. Two steps here: first map your metagenome with BowTie2 and then feed MetaPhlAn2 with the obtained sam:
-@@ -120,14 +114,14 @@ You can also provide an externally BowTi
- ```
- #!bash
- $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x databases/mpa_v20_m200 -U metagenome.fastq
--$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
-+$ metaphlan2 metagenome.sam --input_type sam > profiled_metagenome.txt
- ```
- 
- MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not use the paired-end information), and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
- 
- ```
- #!bash
--$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
-+$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
- ```
- 
- For advanced options and other analysis types (such as strain tracking) please refer to the full command-line options.
-@@ -136,7 +130,7 @@ For advanced options and other analysis
- 
- 
- ```
--usage: metaphlan2.py --input_type
-+usage: metaphlan2 --input_type
-                      {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
-                      [--mpa_pkl MPA_PKL] [--bowtie2db METAPHLAN_BOWTIE2_DB]
-                      [--bt2_ps BowTie2 presets] [--bowtie2_exe BOWTIE2_EXE]
-@@ -161,7 +155,7 @@ AUTHORS: Nicola Segata (nicola.segata at un
- 
- COMMON COMMANDS
- 
-- We assume here that metaphlan2.py is in the system path and that mpa_dir bash variable contains the
-+ We assume here that metaphlan2 is in the system path and that mpa_dir bash variable contains the
-  main MetaPhlAn folder. Also BowTie2 should be in the system path with execution and read
-  permissions, and Perl should be installed.
- 
-@@ -172,25 +166,25 @@ strains in particular cases) present in
- relative abundance. This correspond to the default analysis type (--analysis_type rel_ab).
- 
- *  Profiling a metagenome from raw reads:
--$ metaphlan2.py metagenome.fastq --input_type fastq
-+$ metaphlan2 metagenome.fastq --input_type fastq
- 
- *  You can take advantage of multiple CPUs and save the intermediate BowTie2 output for re-running
-    MetaPhlAn extremely quickly:
--$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
-+$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
- 
- *  If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you
-    can obtain the results in few seconds by using the previously saved --bowtie2out file and 
-    specifying the input (--input_type bowtie2out):
--$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out
-+$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out
- 
- *  You can also provide an externally BowTie2-mapped SAM if you specify this format with 
-    --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with the obtained sam:
- $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x databases/mpa_v20_m200 -U metagenome.fastq
--$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
-+$ metaphlan2 metagenome.sam --input_type sam > profiled_metagenome.txt
- 
- *  We can also natively handle paired-end metagenomes, and, more generally, metagenomes stored in 
-   multiple files (but you need to specify the --bowtie2out parameter):
--$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
-+$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
- 
- ------------------------------------------------------------------- 
-  
-@@ -208,23 +202,23 @@ file saved during the execution of the d
- *  The following command will output the abundance of each marker with a RPK (reads per kil-base) 
-    higher 0.0. (we are assuming that metagenome_outfmt.bz2 has been generated before as 
-    shown above).
--$ metaphlan2.py -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
-    The obtained RPK can be optionally normalized by the total number of reads in the metagenome 
-    to guarantee fair comparisons of abundances across samples. The number of reads in the metagenome
-    needs to be passed with the '--nreads' argument
- 
- *  The list of markers present in the sample can be obtained with '-t marker_pres_table'
--$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
-    The --pres_th argument (default 1.0) set the minimum RPK value to consider a marker present
- 
- *  The list '-t clade_profiles' analysis type reports the same information of '-t marker_ab_table'
-    but the markers are reported on a clade-by-clade basis.
--$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
- 
- *  Finally, to obtain all markers present for a specific clade and all its subclades, the 
-    '-t clade_specific_strain_tracker' should be used. For example, the following command
-    is reporting the presence/absence of the markers for the B. fragulis species and its strains
--$ metaphlan2.py -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 databases/mpa_v20_m200.pkl --input_type bowtie2out > marker_abundance_table.txt
-+$ metaphlan2 -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 databases/mpa_v20_m200.pkl --input_type bowtie2out > marker_abundance_table.txt
-    the optional argument --min_ab specifies the minimum clade abundance for reporting the markers
- 
- ------------------------------------------------------------------- 
-@@ -521,7 +515,7 @@ pickle.dump(db, ofile, pickle.HIGHEST_PR
- ofile.close()
- ```
- 
--* To use the new database, switch to metaphlan2/db_v21 instead of metaphlan2/db\_v20 when running metaphlan2.py with option "--mpa\_pkl".
-+* To use the new database, switch to metaphlan2/db_v21 instead of metaphlan2/db\_v20 when running metaphlan2 with option "--mpa\_pkl".
- 
- 
- ## Metagenomic strain-level population genomics
-@@ -591,7 +585,7 @@ for f in $(ls fastqs/*.bz2)
- do
-     echo "Running metaphlan2 on ${f}"
-     bn=$(basename ${f} | cut -d . -f 1)
--    tar xjfO ${f} | ../metaphlan2.py --bowtie2db ../databases/mpa_v20_m200 --mpa_pkl ../databases/mpa_v20_m200.pkl --input_type multifastq --nproc 10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
-+    tar xjfO ${f} | metaphlan2 --bowtie2db /usr/share/metaphlan2/db_v20/mpa_v20_m200 --mpa_pkl /usr/share/metaphlan2/db_v20/mpa_v20_m200.pkl --input_type multifastq --nproc 10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
- done
- ```
- 
-@@ -731,4 +725,4 @@ In the output folder, you can find the f
- 1. clade_name.fasta: the alignment file of all metagenomic strains.
- 3. *.marker_pos: this file shows the starting position of each marker in the strains.
- 3. *.info: this file shows the general information like the total length of the concatenated markers (full sequence length), number of used markers, etc.
+ 
+Index: metaphlan2-2.9.19/README.md
+===================================================================
+--- metaphlan2-2.9.19.orig/README.md
++++ metaphlan2-2.9.19/README.md
+@@ -107,14 +107,14 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py --install 
++$ metaphlan2 --install 
+ ```
+ 
+ By default, the latest MetaPhlAn2 database is downloaded and built. You can download a specific version with the `--index` parameter
+ 
+ ```
+ #!bash
+-$ metaphlan2.py --install --index v29_CHOCOPhlAn_201901
++$ metaphlan2 --install --index v29_CHOCOPhlAn_201901
+ ```
+ 
+ --------------------------
+@@ -123,19 +123,13 @@
+ 
+ This section presents some basic usages of MetaPhlAn2, for more advanced usages, please see at [its wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
+ 
+-We assume here that ``metaphlan2.py`` is in the system path and that ``mpa_dir`` bash variable contains the main MetaPhlAn folder. You can set this two variables moving to your MetaPhlAn2 local folder and type:
+-
+-```
+-#!bash
+-$ export PATH=`pwd`:$PATH
+-$ export mpa_dir=`pwd`
+-```
++We assume here that ``metaphlan2`` is in the system path.
+ 
+ Here is the basic example to profile a metagenome from raw reads (requires BowTie2 in the system path with execution and read permissions, Perl installed). 
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.fastq --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --input_type fastq -o profiled_metagenome.txt
+ ```
+ 
+ ### Starting from version 2.9, MetaPhlAn2 estimates the fraction of the metagenome composed by microbes that are unknown. The relative abundance profile is scaled according the percentage of reads mapping to a known clade. 
+@@ -146,7 +140,7 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
+ ```
+ 
+ 
+@@ -154,7 +148,7 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o profiled_metagenome.txt
+ ```
+ 
+ `bowtie2out` files generated with MetaPhlAn2 versions below 2.9 are not compatibile. Starting from MetaPhlAn2 2.9, the BowTie2 ouput now includes the size of the profiled metagenome.
+@@ -162,7 +156,7 @@
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out --nreads 520000 -o profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out --nreads 520000 -o profiled_metagenome.txt
+ ```
+ 
+ You can also provide an externally BowTie2-mapped SAM if you specify this format with `--input_type`. Two steps here: first map your metagenome with BowTie2 and then feed MetaPhlAn2 with the obtained sam:
+@@ -170,14 +164,14 @@
+ ```
+ #!bash
+ $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x metaphlan_databases/mpa_v29_CHOCOPhlAn_201901 -U metagenome.fastq
+-$ metaphlan2.py metagenome.sam --input_type sam -o profiled_metagenome.txt
++$ metaphlan2 metagenome.sam --input_type sam -o profiled_metagenome.txt
+ ```
+ 
+ MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not use the paired-end information), and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
+ 
+ ```
+ #!bash
+-$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
+ ```
+ 
+ You can provide the specific database version with `--index`. 
+@@ -189,7 +183,7 @@
+ 
+ ## Full command-line options
+ ```
+-usage: metaphlan2.py --input_type
++usage: metaphlan2 --input_type
+                      {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
+                      [--mpa_pkl MPA_PKL] [--force]
+                      [--bowtie2db METAPHLAN_BOWTIE2_DB] [-x INDEX]
+@@ -219,7 +213,7 @@
+ 
+ COMMON COMMANDS
+ 
+- We assume here that metaphlan2.py is in the system path and that mpa_dir bash variable contains the
++ We assume here that metaphlan2 is in the system path and that mpa_dir bash variable contains the
+  main MetaPhlAn folder. Also BowTie2 should be in the system path with execution and read
+  permissions, and Perl should be installed)
+ 
+@@ -230,30 +224,30 @@
+ relative abundance. This correspond to the default analysis type (-t rel_ab).
+ 
+ *  Profiling a metagenome from raw reads:
+-$ metaphlan2.py metagenome.fastq --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --input_type fastq -o profiled_metagenome.txt
+ 
+ *  You can take advantage of multiple CPUs and save the intermediate BowTie2 output for re-running
+    MetaPhlAn extremely quickly:
+-$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
++$ metaphlan2 metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq -o profiled_metagenome.txt
+ 
+ *  If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you
+    can obtain the results in few seconds by using the previously saved --bowtie2out file and
+    specifying the input (--input_type bowtie2out):
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out -o profiled_metagenome.txt
+ 
+ *  bowtie2out files generated with MetaPhlAn2 versions below 2.9 are not compatibile.
+    Starting from MetaPhlAn2 2.9, the BowTie2 ouput now includes the size of the profiled metagenome.
+    If you want to re-run MetaPhlAn2 using these file you should provide the metagenome size via --nreads:
+-$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out --nreads 520000 -o profiled_metagenome.txt
++$ metaphlan2 metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out --nreads 520000 -o profiled_metagenome.txt
+ 
+ *  You can also provide an externally BowTie2-mapped SAM if you specify this format with
+    --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with the obtained sam:
+ $ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/metaphlan_databases/mpa_v25_CHOCOPhlAn_201901 -U metagenome.fastq
+-$ metaphlan2.py metagenome.sam --input_type sam -o profiled_metagenome.txt
++$ metaphlan2 metagenome.sam --input_type sam -o profiled_metagenome.txt
+ 
+ *  We can also natively handle paired-end metagenomes, and, more generally, metagenomes stored in
+   multiple files (but you need to specify the --bowtie2out parameter):
+-$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
++$ metaphlan2 metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
+ 
+ -------------------------------------------------------------------
+ 
+@@ -271,25 +265,25 @@
+ *  The following command will output the abundance of each marker with a RPK (reads per kilo-base)
+    higher 0.0. (we are assuming that metagenome_outfmt.bz2 has been generated before as
+    shown above).
+-$ metaphlan2.py -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
+    The obtained RPK can be optionally normalized by the total number of reads in the metagenome
+    to guarantee fair comparisons of abundances across samples. The number of reads in the metagenome
+    needs to be passed with the '--nreads' argument
+ 
+ *  The list of markers present in the sample can be obtained with '-t marker_pres_table'
+-$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
+    The --pres_th argument (default 1.0) set the minimum RPK value to consider a marker present
+ 
+ *  The list '-t clade_profiles' analysis type reports the same information of '-t marker_ab_table'
+    but the markers are reported on a clade-by-clade basis.
+-$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
+ 
+ *  Finally, to obtain all markers present for a specific clade and all its subclades, the
+    '-t clade_specific_strain_tracker' should be used. For example, the following command
+    is reporting the presence/absence of the markers for the B. fragulis species and its strains
+    the optional argument --min_ab specifies the minimum clade abundance for reporting the markers
+ 
+-$ metaphlan2.py -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
++$ metaphlan2 -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out -o marker_abundance_table.txt
+ 
+ -------------------------------------------------------------------
+ 
+@@ -539,7 +533,7 @@
+ 
+ ```
+ 
+-* To use the new database, run metaphlan2.py with option "--index v25_CHOCOPhlAn_NEW".
++* To use the new database, run metaphlan2 with option "--index v25_CHOCOPhlAn_NEW".
+ 
+ 
+ ## Metagenomic strain-level population genomics
+@@ -611,7 +605,7 @@
+ do
+     echo "Running metaphlan2 on ${f}"
+     bn=$(basename ${f} | cut -d '.' -f 1)
+-     ../metaphlan2.py --index v29_CHOCOPhlAn_201901 --input_type multifastq --nproc 10s -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o ssams/${bn}.profile ${f}
++     metaphlan2 --index v29_CHOCOPhlAn_201901 --input_type multifastq --nproc 10s -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o ssams/${bn}.profile ${f}
+ done
+ ```
+ 
+@@ -751,4 +745,4 @@
+ 1. clade_name.fasta: the alignment file of all metagenomic strains.
+ 3. *.marker_pos: this file shows the starting position of each marker in the strains.
+ 3. *.info: this file shows the general information like the total length of the concatenated markers (full sequence length), number of used markers, etc.
 -4. *.polymorphic: this file shows the statistics on the polymorphic site, where "sample" is the sample name, "percentage\_of\_polymorphic_sites" is the percentage of sites that are suspected to be polymorphic, "avg\_freq" is the average frequency of the dominant alleles on all polymorphic sites, "avg\_coverage" is the average coverage at all polymorphic sites.
 \ No newline at end of file
-+4. *.polymorphic: this file shows the statistics on the polymorphic site, where "sample" is the sample name, "percentage\_of\_polymorphic_sites" is the percentage of sites that are suspected to be polymorphic, "avg\_freq" is the average frequency of the dominant alleles on all polymorphic sites, "avg\_coverage" is the average coverage at all polymorphic sites.
++4. *.polymorphic: this file shows the statistics on the polymorphic site, where "sample" is the sample name, "percentage\_of\_polymorphic_sites" is the percentage of sites that are suspected to be polymorphic, "avg\_freq" is the average frequency of the dominant alleles on all polymorphic sites, "avg\_coverage" is the average coverage at all polymorphic sites.
diff -Nru metaphlan2-2.7.8/debian/patches/python3.patch metaphlan2-2.9.19/debian/patches/python3.patch
--- metaphlan2-2.7.8/debian/patches/python3.patch	1969-12-31 16:00:00.000000000 -0800
+++ metaphlan2-2.9.19/debian/patches/python3.patch	2019-08-15 13:12:57.000000000 -0700
@@ -0,0 +1,74 @@
+Description: set interpreter to python3
+Author: Steve Langasek <steve.langasek at ubuntu.com>
+Last-Modified: 2019-08-15
+
+Index: metaphlan2-2.9.19/metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/metaphlan2.py
++++ metaphlan2-2.9.19/metaphlan2.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ from __future__ import with_statement
+ __author__ = ('Nicola Segata (nicola.segata at unitn.it), '
+               'Duy Tin Truong, '
+Index: metaphlan2-2.9.19/strainphlan.py
+===================================================================
+--- metaphlan2-2.9.19.orig/strainphlan.py
++++ metaphlan2-2.9.19/strainphlan.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ # Author: Duy Tin Truong (duytin.truong at unitn.it)
+ #		at CIBIO, University of Trento, Italy
+ 
+Index: metaphlan2-2.9.19/utils/extract_markers.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/extract_markers.py
++++ metaphlan2-2.9.19/utils/extract_markers.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ #Author: Duy Tin Truong (duytin.truong at unitn.it)
+ #        at CIBIO, University of Trento, Italy
+ 
+Index: metaphlan2-2.9.19/utils/merge_metaphlan_tables.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/merge_metaphlan_tables.py
++++ metaphlan2-2.9.19/utils/merge_metaphlan_tables.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ 
+ import argparse
+ import os
+Index: metaphlan2-2.9.19/utils/metaphlan2krona.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/metaphlan2krona.py
++++ metaphlan2-2.9.19/utils/metaphlan2krona.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ 
+ # ============================================================================== 
+ # Conversion script: from MetaPhlAn output to Krona text input file
+Index: metaphlan2-2.9.19/utils/plot_bug.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/plot_bug.py
++++ metaphlan2-2.9.19/utils/plot_bug.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python2
++#!/usr/bin/env python3
+ 
+ import sys
+ import numpy as np
+Index: metaphlan2-2.9.19/utils/read_fastx.py
+===================================================================
+--- metaphlan2-2.9.19.orig/utils/read_fastx.py
++++ metaphlan2-2.9.19/utils/read_fastx.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/env python3
+ 
+ 
+ import sys
diff -Nru metaphlan2-2.7.8/debian/patches/series metaphlan2-2.9.19/debian/patches/series
--- metaphlan2-2.7.8/debian/patches/series	2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/series	2019-08-15 13:12:57.000000000 -0700
@@ -1,3 +1,4 @@
 mpa_dir-is-usr_share_metaphlan2.patch
 spelling.patch
 _metaphlan2.py.patch
+python3.patch
diff -Nru metaphlan2-2.7.8/debian/patches/spelling.patch metaphlan2-2.9.19/debian/patches/spelling.patch
--- metaphlan2-2.7.8/debian/patches/spelling.patch	2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/patches/spelling.patch	2019-08-15 13:12:57.000000000 -0700
@@ -2,29 +2,33 @@
 Last-Update: Mon, 23 May 2016 16:09:13 +0200
 Description: Spelling
 
---- a/README.md
-+++ b/README.md
-@@ -307,7 +307,7 @@ Post-mapping arguments:
- Additional analysis types and arguments:
-   -t ANALYSIS TYPE      Type of analysis to perform: 
-                          * rel_ab: profiling a metagenomes in terms of relative abundances
--                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads comming from each clade.
-+                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads coming from each clade.
-                          * reads_map: mapping from reads to clades (only reads hitting a marker)
-                          * clade_profiles: normalized marker counts for clades with at least a non-null marker
-                          * marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
-@@ -713,7 +713,7 @@ python ../strainphlan.py -h
- The default setting can be stringent for some cases where you have very few samples left in the phylogenetic tree. You can relax some parameters to add more samples back:
- 
- 1. *marker\_in\_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
--2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threhold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
-+2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threshold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
- 3. *N\_in\_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
- 4. *gap\_in\_sample*: The samples with full sequences concatenated from all markers and having the percentage of gaps greater than this threshold will be removed. Default 0.2. You can set this parameter to "0.5" to add some more samples.
- 5. *relaxed\_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to set: marker\_in\_clade=0.5, sample\_in\_marker=0.5, N\_in\_marker=0.5, gap\_in\_sample=0.5. Default "False".
---- a/strainphlan.py
-+++ b/strainphlan.py
-@@ -337,7 +337,7 @@ def read_params():
+Index: metaphlan2-2.9.19/README.md
+===================================================================
+--- metaphlan2-2.9.19.orig/README.md
++++ metaphlan2-2.9.19/README.md
+@@ -375,7 +375,7 @@
+ Additional analysis types and arguments:
+   -t ANALYSIS TYPE      Type of analysis to perform:
+                          * rel_ab: profiling a metagenomes in terms of relative abundances
+-                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads comming from each clade.
++                         * rel_ab_w_read_stats: profiling a metagenomes in terms of relative abundances and estimate the number of reads coming from each clade.
+                          * reads_map: mapping from reads to clades (only reads hitting a marker)
+                          * clade_profiles: normalized marker counts for clades with at least a non-null marker
+                          * marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
+@@ -733,7 +733,7 @@
+ The default setting can be stringent for some cases where you have very few samples left in the phylogenetic tree. You can relax some parameters to add more samples back:
+ 
+ 1. *marker\_in\_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+-2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threhold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
++2. *sample\_in\_marker*: If the percentage of samples that a marker present in is less than this threshold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+ 3. *N\_in\_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
+ 4. *gap\_in\_sample*: The samples with full sequences concatenated from all markers and having the percentage of gaps greater than this threshold will be removed. Default 0.2. You can set this parameter to "0.5" to add some more samples.
+ 5. *relaxed\_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to set: marker\_in\_clade=0.5, sample\_in\_marker=0.5, N\_in\_marker=0.5, gap\_in\_sample=0.5. Default "False".
+Index: metaphlan2-2.9.19/strainphlan.py
+===================================================================
+--- metaphlan2-2.9.19.orig/strainphlan.py
++++ metaphlan2-2.9.19/strainphlan.py
+@@ -338,7 +338,7 @@
          required=False,
          default=['all'],
          type=str,
@@ -33,9 +37,11 @@
                  'the marker alignments in fasta format and the phylogenetic '\
                  'trees. If a file name is specified, the clade list in that '\
                  'file where each clade name is on a line will be read.'
---- a/metaphlan2.py
-+++ b/metaphlan2.py
-@@ -596,7 +596,7 @@ def read_params(args):
+Index: metaphlan2-2.9.19/metaphlan2.py
+===================================================================
+--- metaphlan2-2.9.19.orig/metaphlan2.py
++++ metaphlan2-2.9.19/metaphlan2.py
+@@ -314,7 +314,7 @@
           default='rel_ab', help =
           "Type of analysis to perform: \n"
           " * rel_ab: profiling a metagenomes in terms of relative abundances\n"
diff -Nru metaphlan2-2.7.8/debian/rules metaphlan2-2.9.19/debian/rules
--- metaphlan2-2.7.8/debian/rules	2018-09-17 01:17:22.000000000 -0700
+++ metaphlan2-2.9.19/debian/rules	2019-08-15 13:12:57.000000000 -0700
@@ -3,7 +3,7 @@
 # DH_VERBOSE := 1
 
 %:
-	dh $@  --with python2
+	dh $@  --with python3
 
 override_dh_auto_build:
 	dh_auto_build

