[med-svn] [metaphlan2] 01/04: Imported Upstream version 2.6.0+ds

Andreas Tille tille at debian.org
Tue Sep 13 07:30:35 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository metaphlan2.

commit 9c3291e16a90dd1b2fa6048505995fd3ea435da1
Author: Andreas Tille <tille at debian.org>
Date:   Tue Sep 13 08:42:14 2016 +0200

    Imported Upstream version 2.6.0+ds
---
 .hg_archival.txt                                   |    4 +
 .hgsub                                             |    2 +
 .hgsubstate                                        |    2 +
 .hgtags                                            |    9 +
 README.md                                          |  764 ++
 changeset.txt                                      |   13 +
 db_v20/mpa_v20_m200.pkl                            |  Bin 0 -> 39902847 bytes
 license.txt                                        |    7 +
 metaphlan2.py                                      | 1282 ++++
 strainphlan.py                                     | 1538 ++++
 strainphlan_src/add_metadata_tree.py               |  109 +
 strainphlan_src/build_tree_single_strain.py        |  146 +
 strainphlan_src/compute_distance.py                |  195 +
 strainphlan_src/dump_file.py                       |   77 +
 strainphlan_src/extract_markers.py                 |   45 +
 strainphlan_src/fastx_len_filter.py                |   17 +
 strainphlan_src/fix_AF1.py                         |   36 +
 strainphlan_src/logging.ini                        |   22 +
 strainphlan_src/mixed_utils.py                     |   99 +
 strainphlan_src/ooSubprocess.py                    |  300 +
 strainphlan_src/plot_tree_graphlan.py              |  177 +
 strainphlan_src/sam_filter.py                      |   59 +
 strainphlan_src/sample2markers.py                  |  421 ++
 strainphlan_src/which.py                           |   25 +
 strainphlan_tutorial/step1_download.sh             |    3 +
 strainphlan_tutorial/step2_fastq2sam.sh            |    8 +
 strainphlan_tutorial/step3_sam2marker.sh           |    5 +
 strainphlan_tutorial/step4_extract_db_marker.sh    |    4 +
 strainphlan_tutorial/step5_build_tree.sh           |    4 +
 .../step6_build_tree_single_strain.sh              |    3 +
 utils/extract_markers.py                           |   49 +
 utils/markers_info.txt.bz2                         |  Bin 0 -> 33258288 bytes
 utils/merge_metaphlan_tables.py                    |  103 +
 utils/metaphlan2krona.py                           |   49 +
 utils/metaphlan_hclust_heatmap.py                  |  483 ++
 utils/plot_bug.py                                  |  254 +
 utils/species2genomes.txt                          | 7678 ++++++++++++++++++++
 37 files changed, 13992 insertions(+)

diff --git a/.hg_archival.txt b/.hg_archival.txt
new file mode 100644
index 0000000..adfcf40
--- /dev/null
+++ b/.hg_archival.txt
@@ -0,0 +1,4 @@
+repo: b4e7c5505112b08d33dd30f4788429ba023e67f0
+node: c43e40a443edbd3c4cac7349d2679540578096f5
+branch: default
+tag: 2.6.0
diff --git a/.hgsub b/.hgsub
new file mode 100644
index 0000000..c9a57df
--- /dev/null
+++ b/.hgsub
@@ -0,0 +1,2 @@
+utils/export2graphlan = https://bitbucket.org/cibiocm/export2graphlan
+utils/hclust2 = https://bitbucket.org/nsegata/hclust2
diff --git a/.hgsubstate b/.hgsubstate
new file mode 100644
index 0000000..ace1a92
--- /dev/null
+++ b/.hgsubstate
@@ -0,0 +1,2 @@
+f8823b8162ddea6533866afd27d5ed1ce6ff22e0 utils/export2graphlan
+0d8cb18ce9996e7ce4043a00294aeb2ed9bfa5f2 utils/hclust2
diff --git a/.hgtags b/.hgtags
new file mode 100644
index 0000000..8b490a3
--- /dev/null
+++ b/.hgtags
@@ -0,0 +1,9 @@
+b4e7c5505112b08d33dd30f4788429ba023e67f0 2.0_alpha1
+60d254d499e2dd1a8b1cfe344236efa47f823ec6 2.0_beta1
+1b6df65b5a3e9feed0179f855c11fd197fe9a64f 2.0_beta2
+12cceaad3493085c4497898aaeff691913ddb633 2.0_beta3
+616a7debe7937672940130e6c5b26a9ef9e76fcd 2.0.0
+3959b668bbed6150698b594cbbc30a924e5d30e1 2.1.0
+0ef29ae841f52b53176ca264fb9f52f98713eb3c 2.2.0
+5424bb911dfcdb7212ea0949d4faeb6e69cfa61f 2.3.0
+6f2a1673af8565e93fb8e69238141889b7c87361 2.5.0
diff --git a/README.md b/README.md
new file mode 100644
index 0000000..47f9b40
--- /dev/null
+++ b/README.md
@@ -0,0 +1,764 @@
+[TOC]
+
+#**MetaPhlAn 2: Metagenomic Phylogenetic Analysis**#
+
+AUTHORS: Duy Tin Truong (duytin.truong at unitn.it), Nicola Segata (nicola.segata at unitn.it)
+
+##**Description**##
+MetaPhlAn is a computational tool for profiling the composition of microbial communities (Bacteria, Archaea, Eukaryotes and Viruses) from metagenomic shotgun sequencing data with species-level resolution. From version 2.0, MetaPhlAn is also able to identify specific strains (in the not-so-frequent cases in which the sample contains a previously sequenced strain) and to track strains across samples for all species.
+
+MetaPhlAn 2 relies on ~1M unique clade-specific marker genes ([the marker information file can be found at utils/markers_info.txt.bz2 or here](https://bitbucket.org/biobakery/metaphlan2/src/473a41eba501df5f750da032d4f04b38db98dde1/utils/markers_info.txt.bz2?at=default)) identified from ~17,000 reference genomes (~13,500 bacterial and archaeal, ~3,500 viral, and ~110 eukaryotic), allowing:
+
+* unambiguous taxonomic assignments;
+* accurate estimation of organismal relative abundance;
+* species-level resolution for bacteria, archaea, eukaryotes and viruses;
+* strain identification and tracking;
+* orders of magnitude speedups compared to existing methods;
+* metagenomic strain-level population genomics.
+
+If you use this software, please cite:
+
+[**MetaPhlAn2 for enhanced metagenomic taxonomic profiling.**](http://www.nature.com/nmeth/journal/v12/n10/pdf/nmeth.3589.pdf)
+ *Duy Tin Truong, Eric A Franzosa, Timothy L Tickle, Matthias Scholz, George Weingart, Edoardo Pasolli, Adrian Tett, Curtis Huttenhower & Nicola Segata*. 
+Nature Methods 12, 902–903 (2015)
+
+-------------
+
+##**Pre-requisites**##
+
+MetaPhlAn requires *python 2.7* or higher with the argparse, tempfile and [*numpy*](http://www.numpy.org/) libraries installed
+  (apart from numpy, they are usually installed together with the Python distribution).
+  Python 3 is now also supported.
+
+**If you provide the SAM output of [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) as input, there are no additional prerequisites.**
+
+* If you would like to use the BowTie2 integrated in MetaPhlAn, you need to have BowTie2 version 2.0.0 or higher and Perl installed (bowtie2 needs to be in the system path with execute _and_ read permissions).
+
+* If you use the "utils/metaphlan_hclust_heatmap.py" script to plot and hierarchically cluster multiple MetaPhlAn-profiled samples, you will also need the following Python libraries: [matplotlib](http://matplotlib.org/index.html), [scipy](http://www.scipy.org/), [pylab](http://wiki.scipy.org/PyLab) (if not installed together with MatPlotLib).
+
+* If you want to produce the output as a "biom" file, you also need [biom](http://biom-format.org/) installed.
+
+* MetaPhlAn is not tightly integrated with advanced heatmap plotting with [hclust2](https://bitbucket.org/nsegata/hclust2) and cladogram visualization with [GraPhlAn](https://bitbucket.org/nsegata/graphlan/wiki/Home). If you use such visualization tools, please refer to their prerequisites.
+
+----------------------
+
+##**Installation**##
+
+MetaPhlAn 2.0 can be obtained by either
+
+[Downloading MetaPhlAn v2.0](https://bitbucket.org/biobakery/metaphlan2/get/default.zip)  
+
+**OR**
+
+Cloning the repository via the following command:
+``$ hg clone https://bitbucket.org/biobakery/metaphlan2``
+
+--------------------------
+
+
+##**Basic Usage**##
+
+This section presents some basic usages of MetaPhlAn2; for more advanced usage, please see [its wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
+
+We assume here that ``metaphlan2.py`` is in the system path and that the ``mpa_dir`` bash variable contains the main MetaPhlAn folder. You can set these two variables by moving to your local MetaPhlAn2 folder and typing:
+```
+#!cmd
+$ export PATH=`pwd`:$PATH
+$ export mpa_dir=`pwd`
+```
+
+Here is a basic example of profiling a metagenome from raw reads (this requires BowTie2 in the system path with execution and read permissions, and Perl installed):
+
+```
+#!cmd
+$ metaphlan2.py metagenome.fastq --input_type fastq > profiled_metagenome.txt
+```
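+
+MetaPhlAn writes its profile as plain text, so it is easy to post-process. The following is a minimal Python sketch that is not part of MetaPhlAn. It assumes the output has one clade per line: the pipe-delimited clade name (see `--mdelim`), then a tab, then the relative abundance. It also assumes that header lines start with `#`. The sketch loads such a profile into a dictionary and keeps a single taxonomic rank, similar in spirit to `--tax_lev s`. The names `read_profile` and `filter_level` are hypothetical.
+
+```python
+# Hypothetical helpers (not part of MetaPhlAn 2): parse a relative-abundance
+# profile, ASSUMING two tab-separated columns (clade, abundance) and
+# header lines that start with '#'.
+import io
+
+
+def read_profile(handle):
+    """Return a dict mapping each clade name to its relative abundance."""
+    profile = {}
+    for line in handle:
+        line = line.strip()
+        if not line or line.startswith("#"):
+            continue  # skip blank lines and headers such as '#SampleID'
+        fields = line.split("\t")
+        profile[fields[0]] = float(fields[1])
+    return profile
+
+
+def filter_level(profile, level="s", delim="|"):
+    """Keep clades whose deepest rank starts with e.g. 's__' (like --tax_lev s)."""
+    prefix = level + "__"
+    return {clade: ab for clade, ab in profile.items()
+            if clade.split(delim)[-1].startswith(prefix)}
+
+
+# Toy profile (made-up values) in the assumed format.
+toy = io.StringIO(
+    "#SampleID\tMetaphlan2_Analysis\n"
+    "k__Bacteria\t100.0\n"
+    "k__Bacteria|p__Bacteroidetes\t100.0\n"
+    "k__Bacteria|p__Bacteroidetes|c__Bacteroidia|o__Bacteroidales"
+    "|f__Bacteroidaceae|g__Bacteroides|s__Bacteroides_fragilis\t100.0\n"
+)
+species = filter_level(read_profile(toy), "s")
+print(species)
+```
+
+On a real profile, `filter_level(read_profile(open("profiled_metagenome.txt")), "s")` would keep only the species rows, provided the assumed format holds. Strain rows (`t__`) are excluded because their deepest rank is not `s__`.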
+
+It is highly recommended to save the intermediate BowTie2 output (--bowtie2out) so that MetaPhlAn can be re-run extremely quickly, and to use multiple CPUs (--nproc) if available:
+
+```
+#!cmd
+$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
+```
+
+If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you can obtain the results in a few seconds by using the previously saved --bowtie2out file and specifying the input type (--input_type bowtie2out):
+
+```
+#!cmd
+$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out > profiled_metagenome.txt
+```
+
+You can also provide an externally BowTie2-mapped SAM file if you specify this format with --input_type. There are two steps here: first map your metagenome with BowTie2, and then feed MetaPhlAn2 with the obtained SAM file:
+
+```
+#!cmd
+$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq
+$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
+```
+
+In order to make MetaPhlAn 2 easily compatible with complex metagenomic pipelines, there are now multiple alternative ways to pass the input:
+
+```
+#!cmd
+$ cat metagenome.fastq | metaphlan2.py --input_type fastq > profiled_metagenome.txt
+```
+
+```
+#!cmd
+$ tar xjf metagenome.tar.bz2 --to-stdout | metaphlan2.py --input_type fastq --bowtie2db ${mpa_dir}/db_v20/mpa_v20_m200 > profiled_metagenome.txt
+```
+
+```
+#!cmd
+$ metaphlan2.py --input_type fastq < metagenome.fastq > profiled_metagenome.txt
+```
+
+```
+#!cmd
+$ metaphlan2.py --input_type fastq <(bzcat metagenome.fastq.bz2) > profiled_metagenome.txt
+```
+
+```
+#!cmd
+$ metaphlan2.py --input_type fastq <(zcat metagenome_1.fastq.gz metagenome_2.fastq.gz) > profiled_metagenome.txt
+```
+
+MetaPhlAn 2 can also natively **handle paired-end metagenomes** (but does not use the paired-end information), and, more generally, metagenomes stored in multiple files (but you need to specify the --bowtie2out parameter):
+
+```
+#!cmd
+$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq > profiled_metagenome.txt
+```
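+
+When profiling many samples, these commands are easy to script. Below is a hypothetical Python wrapper that is not shipped with MetaPhlAn and uses only the options shown in this README. If the `--bowtie2out` file does not exist yet, it maps the comma-separated input files and saves that file. If the file already exists, it re-profiles from it with `--input_type bowtie2out`. The function name `metaphlan_command` and its parameters are illustrative assumptions.
+
+```python
+# Hypothetical wrapper (not part of MetaPhlAn 2): build the metaphlan2.py
+# command line for one sample, reusing a saved --bowtie2out file if present.
+import os
+
+
+def metaphlan_command(fastq_files, bowtie2out, nproc=1):
+    """Return an argv list for metaphlan2.py using the documented options."""
+    if os.path.exists(bowtie2out):
+        # Mapping already done: profile directly from the intermediate file.
+        return ["metaphlan2.py", bowtie2out,
+                "--input_type", "bowtie2out", "--nproc", str(nproc)]
+    # Multiple files (e.g. paired-end mates) are passed comma-separated,
+    # and --bowtie2out must be given in that case.
+    return ["metaphlan2.py", ",".join(fastq_files),
+            "--bowtie2out", bowtie2out,
+            "--nproc", str(nproc), "--input_type", "fastq"]
+
+
+cmd = metaphlan_command(["metagenome_1.fastq", "metagenome_2.fastq"],
+                        "metagenome.bowtie2.bz2", nproc=5)
+print(" ".join(cmd))
+```
+
+To run it and capture the profile, pass the list to `subprocess.check_call(cmd, stdout=out_handle)`, where `out_handle` is a file opened for writing. This assumes `metaphlan2.py` is in the system path as described above.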
+
+For advanced options and other analysis types (such as strain tracking) please refer to the full command-line options.
+
+##**Full command-line options**##
+
+
+```
+usage: metaphlan2.py --input_type
+                     {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
+                     [--mpa_pkl MPA_PKL] [--bowtie2db METAPHLAN_BOWTIE2_DB]
+                     [--bt2_ps BowTie2 presets] [--bowtie2_exe BOWTIE2_EXE]
+                     [--bowtie2out FILE_NAME] [--no_map] [--tmp_dir]
+                     [--tax_lev TAXONOMIC_LEVEL] [--min_cu_len]
+                     [--min_alignment_len] [--ignore_viruses]
+                     [--ignore_eukaryotes] [--ignore_bacteria]
+                     [--ignore_archaea] [--stat_q]
+                     [--ignore_markers IGNORE_MARKERS] [--avoid_disqm]
+                     [--stat] [-t ANALYSIS TYPE] [--nreads NUMBER_OF_READS]
+                     [--pres_th PRESENCE_THRESHOLD] [--clade] [--min_ab] [-h]
+                     [-o output file] [--sample_id_key name]
+                     [--sample_id value] [-s sam_output_file]
+                     [--biom biom_output] [--mdelim mdelim] [--nproc N] [-v]
+                     [INPUT_FILE] [OUTPUT_FILE]
+
+DESCRIPTION
+ MetaPhlAn version 2.1.0 (28 April 2015): 
+ METAgenomic PHyLogenetic ANalysis for metagenomic taxonomic profiling.
+
+AUTHORS: Nicola Segata (nicola.segata at unitn.it), Duy Tin Truong (duytin.truong at unitn.it)
+
+COMMON COMMANDS
+
+ We assume here that metaphlan2.py is in the system path and that mpa_dir bash variable contains the
+ main MetaPhlAn folder. Also BowTie2 should be in the system path with execution and read
+ permissions, and Perl should be installed.
+
+========== MetaPhlAn 2 clade-abundance estimation ================= 
+
+The basic usage of MetaPhlAn 2 consists of identifying the clades (from phyla to species and
+strains in particular cases) present in the metagenome obtained from a microbiome sample and their
+relative abundance. This corresponds to the default analysis type (--analysis_type rel_ab).
+
+*  Profiling a metagenome from raw reads:
+$ metaphlan2.py metagenome.fastq --input_type fastq
+
+*  You can take advantage of multiple CPUs and save the intermediate BowTie2 output for re-running
+   MetaPhlAn extremely quickly:
+$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
+
+*  If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you
+   can obtain the results in a few seconds by using the previously saved --bowtie2out file and
+   specifying the input (--input_type bowtie2out):
+$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out
+
+*  You can also provide an externally BowTie2-mapped SAM if you specify this format with 
+   --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with the obtained sam:
+$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq
+$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt
+
+*  Multiple alternative ways to pass the input are also available:
+$ cat metagenome.fastq | metaphlan2.py --input_type fastq 
+$ tar xjf metagenome.tar.bz2 --to-stdout | metaphlan2.py --input_type fastq 
+$ metaphlan2.py --input_type fastq < metagenome.fastq
+$ metaphlan2.py --input_type fastq <(bzcat metagenome.fastq.bz2)
+$ metaphlan2.py --input_type fastq <(zcat metagenome_1.fastq.gz metagenome_2.fastq.gz)
+
+*  We can also natively handle paired-end metagenomes, and, more generally, metagenomes stored in 
+  multiple files (but you need to specify the --bowtie2out parameter):
+$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq
+
+------------------------------------------------------------------- 
+ 
+
+========== MetaPhlAn 2 strain tracking ============================ 
+
+MetaPhlAn 2 introduces the capability of characterizing organisms at the strain level using
+non-aggregated marker information. This capability comes in several slightly different flavours and
+provides a way to perform strain tracking and comparison across multiple samples.
+Usually, MetaPhlAn 2 is first run with the default --analysis_type to profile the species present in
+the community, and then a strain-level profiling can be performed to zoom in on specific species
+of interest. This operation can be performed quickly as it exploits the --bowtie2out intermediate
+file saved during the execution of the default analysis type.
+
+*  The following command will output the abundance of each marker with an RPK (reads per kilobase)
+   higher than 0.0 (we are assuming that metagenome_outfmt.bz2 is a --bowtie2out file generated
+   as shown above).
+$ metaphlan2.py -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
+   The obtained RPK can optionally be normalized by the total number of reads in the metagenome
+   to guarantee fair comparisons of abundances across samples. The number of reads in the metagenome
+   needs to be passed with the '--nreads' argument.
+
+*  The list of markers present in the sample can be obtained with '-t marker_pres_table'
+$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
+   The --pres_th argument (default 1.0) sets the minimum RPK value to consider a marker present.
+
+*  The '-t clade_profiles' analysis type reports the same information as '-t marker_ab_table',
+   but the markers are reported on a clade-by-clade basis.
+$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt
+
+*  Finally, to obtain all markers present for a specific clade and all its subclades, the
+   '-t clade_specific_strain_tracker' analysis type should be used. For example, the following command
+   reports the presence/absence of the markers for the B. fragilis species and its strains:
+$ metaphlan2.py -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --mpa_pkl db_v20/mpa_v20_m200.pkl --input_type bowtie2out > marker_abundance_table.txt
+   The optional argument --min_ab specifies the minimum clade abundance for reporting the markers.
+
+------------------------------------------------------------------- 
+
+positional arguments:
+  INPUT_FILE            the input file can be:
+                        * a fastq file containing metagenomic reads
+                        OR
+                        * a BowTie2 produced SAM file. 
+                        OR
+                        * an intermediary mapping file of the metagenome generated by a previous MetaPhlAn run 
+                        If the input file is missing, the script assumes that the input is provided using the standard 
+                        input, or named pipes.
+                        IMPORTANT: the type of input needs to be specified with --input_type
+  OUTPUT_FILE           the tab-separated output file of the predicted taxon relative abundances 
+                        [stdout if not present]
+
+Required arguments:
+  --mpa_pkl MPA_PKL     the metadata pickled MetaPhlAn file
+  --input_type {fastq,fasta,multifasta,multifastq,bowtie2out,sam}
+                        set whether the input is the multifasta file of metagenomic reads or 
+                        the SAM file of the mapping of the reads against the MetaPhlAn db.
+                        [default 'automatic', i.e. the script will try to guess the input format]
+
+Mapping arguments:
+  --bowtie2db METAPHLAN_BOWTIE2_DB
+                        The BowTie2 database file of the MetaPhlAn database. 
+                        Used if --input_type is fastq, fasta, multifasta, or multifastq
+  --bt2_ps BowTie2 presets
+                        presets options for BowTie2 (applied only when a multifasta file is provided)
+                        The choices enabled in MetaPhlAn are:
+                         * sensitive
+                         * very-sensitive
+                         * sensitive-local
+                         * very-sensitive-local
+                        [default very-sensitive]
+  --bowtie2_exe BOWTIE2_EXE
+                        Full path and name of the BowTie2 executable. This option allows 
+                        MetaPhlAn to reach the executable even when it is not in the system 
+                        PATH or the system PATH is unreachable
+  --bowtie2out FILE_NAME
+                        The file for saving the output of BowTie2
+  --no_map              Avoid storing the --bowtie2out map file
+  --tmp_dir             the folder used to store temporary files 
+                        [default is the OS dependent tmp dir]
+
+Post-mapping arguments:
+  --tax_lev TAXONOMIC_LEVEL
+                        The taxonomic level for the relative abundance output:
+                        'a' : all taxonomic levels
+                        'k' : kingdoms (Bacteria and Archaea) only
+                        'p' : phyla only
+                        'c' : classes only
+                        'o' : orders only
+                        'f' : families only
+                        'g' : genera only
+                        's' : species only
+                        [default 'a']
+  --min_cu_len          minimum total nucleotide length for the markers in a clade for
+                        estimating the abundance without considering sub-clade abundances
+                        [default 2000]
+  --min_alignment_len   The sam records for aligned reads with the longest subalignment
+                        length smaller than this threshold will be discarded.
+                        [default None]
+  --ignore_viruses      Do not profile viral organisms
+  --ignore_eukaryotes   Do not profile eukaryotic organisms
+  --ignore_bacteria     Do not profile bacterial organisms
+  --ignore_archaea      Do not profile archaeal organisms
+  --stat_q              Quantile value for the robust average
+                        [default 0.1]
+  --ignore_markers IGNORE_MARKERS
+                        File containing a list of markers to ignore. 
+  --avoid_disqm         Deactivate the procedure of disambiguating the quasi-markers based on the
+                        marker abundance pattern found in the sample. It is generally recommended
+                        to keep the disambiguation procedure in order to minimize false positives
+  --stat                EXPERIMENTAL! Statistical approach for converting marker abundances into clade abundances
+                        'avg_g'  : clade global (i.e. normalizing all markers together) average
+                        'avg_l'  : average of length-normalized marker counts
+                        'tavg_g' : truncated clade global average at --stat_q quantile
+                        'tavg_l' : truncated average of length-normalized marker counts (at --stat_q)
+                        'wavg_g' : winsorized clade global average (at --stat_q)
+                        'wavg_l' : winsorized average of length-normalized marker counts (at --stat_q)
+                        'med'    : median of length-normalized marker counts
+                        [default tavg_g]
+
+Additional analysis types and arguments:
+  -t ANALYSIS TYPE      Type of analysis to perform: 
+                         * rel_ab: profiling a metagenome in terms of relative abundances
+                         * rel_ab_w_read_stats: profiling a metagenome in terms of relative abundances and estimating the number of reads coming from each clade.
+                         * reads_map: mapping from reads to clades (only reads hitting a marker)
+                         * clade_profiles: normalized marker counts for clades with at least a non-null marker
+                         * marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)
+                         * marker_pres_table: list of markers present in the sample (threshold at 1.0 if not differently specified with --pres_th)
+                        [default 'rel_ab']
+  --nreads NUMBER_OF_READS
+                        The total number of reads in the original metagenome. It is used only when 
+                        -t marker_ab_table is specified for normalizing the length-normalized counts
+                        with the metagenome size as well. No normalization applied if --nreads is not 
+                        specified
+  --pres_th PRESENCE_THRESHOLD
+                        Threshold for calling a marker present by the -t marker_pres_table option
+  --clade               The clade for clade_specific_strain_tracker analysis
+  --min_ab              The minimum percentage abundance for the clade in the clade_specific_strain_tracker analysis
+  -h, --help            show this help message and exit
+
+Output arguments:
+  -o output file, --output_file output file
+                        The output file (if not specified as positional argument)
+  --sample_id_key name  Specify the sample ID key for this analysis. Defaults to '#SampleID'.
+  --sample_id value     Specify the sample ID for this analysis. Defaults to 'Metaphlan2_Analysis'.
+  -s sam_output_file, --samout sam_output_file
+                        The sam output file
+  --biom biom_output, --biom_output_file biom_output
+                        If requesting biom file output: The name of the output file in biom format 
+  --mdelim mdelim, --metadata_delimiter_char mdelim
+                        Delimiter for bug metadata: - defaults to pipe. e.g. the pipe in k__Bacteria|p__Proteobacteria 
+
+Other arguments:
+  --nproc N             The number of CPUs to use for parallelizing the mapping
+                        [default 1, i.e. no parallelism]
+  -v, --version         Prints the current MetaPhlAn version and exits
+
+
+```
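+
+The `--stat` estimators turn per-marker abundances into a clade abundance while damping outlier markers. As a rough illustration, the Python sketch below implements the generic truncated and winsorized means. It assumes that `--stat_q` is the fraction of values removed (or clipped) from each tail. MetaPhlAn's own implementation may differ, for example in global vs. length-normalized counts and in how the cut-off is rounded. The marker values are made up.
+
+```python
+# Generic sketch of the robust averages named by --stat (tavg / wavg),
+# ASSUMING q is the fraction trimmed or clipped from EACH tail (cf. --stat_q).
+# This is not MetaPhlAn's own code.
+
+
+def truncated_mean(values, q=0.1):
+    """Drop the lowest and highest q fraction of values, then average."""
+    vals = sorted(values)
+    k = int(len(vals) * q)  # number of values removed from each tail
+    kept = vals[k:len(vals) - k] or vals  # fall back if everything is cut
+    return sum(kept) / float(len(kept))
+
+
+def winsorized_mean(values, q=0.1):
+    """Clip the k lowest/highest values to the nearest kept value, then average."""
+    vals = sorted(values)
+    k = int(len(vals) * q)
+    if k == 0 or 2 * k >= len(vals):
+        return sum(vals) / float(len(vals))
+    low, high = vals[k], vals[len(vals) - k - 1]
+    clipped = [min(max(v, low), high) for v in vals]
+    return sum(clipped) / float(len(clipped))
+
+
+# Toy per-marker abundances (hypothetical) with one outlier marker.
+markers = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.3, 1.0, 0.9, 25.0]
+print(truncated_mean(markers), winsorized_mean(markers))
+```
+
+With these toy values, the outlier marker pulls the plain mean up to 3.42. The truncated and winsorized means come out at about 1.05 and 1.06.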
+
+##**Utility Scripts**##
+
+MetaPhlAn's repository features a few utility scripts to aid in manipulation of sample output and its visualization. These scripts can be found under the ``utils`` folder in the metaphlan2 directory.
+
+###**Merging Tables**###
+
+The script **merge_metaphlan_tables.py** merges the MetaPhlAn output of several samples into a single table of bugs (rows) vs. samples (columns), listing the normalized relative abundance of each bug in each sample.
+
+To merge multiple output files, run the script as below:
+
+```
+#!cmd
+$ python utils/merge_metaphlan_tables.py metaphlan_output1.txt metaphlan_output2.txt metaphlan_output3.txt > output/merged_abundance_table.txt
+```
+
+Wildcards can be used as needed:
+```
+#!cmd
+$ python utils/merge_metaphlan_tables.py metaphlan_output*.txt > output/merged_abundance_table.txt
+```
+
+**There is no limit to how many files you can merge.**
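+
+Conceptually, merging takes the union of all clades and fills in 0.0 wherever a clade is absent from a sample. The Python sketch below illustrates that idea on profiles already loaded as dictionaries. It is not the code of `utils/merge_metaphlan_tables.py`, whose header and column order may differ. The `ID` header, the `merge_profiles` name and the sample values are hypothetical.
+
+```python
+# Illustrative sketch only (not utils/merge_metaphlan_tables.py): merge
+# per-sample profiles into a bugs (rows) x samples (columns) table.
+
+
+def merge_profiles(profiles):
+    """profiles: dict sample_id -> {clade: abundance}. Returns TSV lines."""
+    samples = sorted(profiles)
+    clades = sorted(set().union(*profiles.values()))  # every clade seen anywhere
+    lines = ["ID\t" + "\t".join(samples)]  # hypothetical header row
+    for clade in clades:
+        # A clade missing from a sample is reported with abundance 0.0.
+        row = [str(profiles[s].get(clade, 0.0)) for s in samples]
+        lines.append(clade + "\t" + "\t".join(row))
+    return lines
+
+
+merged = merge_profiles({
+    "sample1": {"k__Bacteria": 100.0, "k__Bacteria|p__Firmicutes": 70.0},
+    "sample2": {"k__Bacteria": 100.0, "k__Bacteria|p__Bacteroidetes": 55.0},
+})
+print("\n".join(merged))
+```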
+
+###**Heatmap Visualization**###
+
+The script **metaphlan_hclust_heatmap.py** can be used to visualize the MetaPhlAn results in the form of a hierarchically-clustered heatmap. To generate the heatmap for a merged MetaPhlAn output table (as described above), run the script as below:
+
+```
+#!cmd
+$ python utils/metaphlan_hclust_heatmap.py -c bbcry --top 25 --minv 0.1 -s log --in output/merged_abundance_table.txt --out output_images/abundance_heatmap.png
+```
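+
+The script's help states that it can subsample the clades to display based on their nth-percentile abundance (options `--perc` and `--top`). One plausible reading of that selection is sketched below in Python, but this is an assumption. The script's exact rule, its defaults and the role of `--minv` may differ. Each clade is scored by a percentile of its abundances across samples, and the `top` highest-scoring clades are kept. `percentile`, `top_clades` and the toy table are hypothetical.
+
+```python
+# Illustrative sketch (an ASSUMPTION, not the script's code): rank clades by a
+# percentile of their abundances across samples and keep the top N.
+
+
+def percentile(values, perc):
+    """Linear-interpolation percentile (perc in 0..100) of a non-empty list."""
+    vals = sorted(values)
+    pos = (len(vals) - 1) * perc / 100.0
+    lo = int(pos)
+    hi = min(lo + 1, len(vals) - 1)
+    return vals[lo] + (vals[hi] - vals[lo]) * (pos - lo)
+
+
+def top_clades(table, top=25, perc=90):
+    """table: dict clade -> list of per-sample abundances."""
+    ranked = sorted(table, key=lambda c: percentile(table[c], perc),
+                    reverse=True)
+    return ranked[:top]
+
+
+# Toy abundances (made up) for three clades across three samples.
+table = {
+    "s__Bacteroides_fragilis": [30.0, 0.0, 12.0],
+    "s__Escherichia_coli": [5.0, 40.0, 2.0],
+    "s__Prevotella_copri": [0.1, 0.0, 0.2],
+}
+print(top_clades(table, top=2, perc=90))
+```
+
+Ranking by a high percentile rather than the mean favours clades that are abundant in at least some samples. This is usually what makes them worth showing in a heatmap.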
+
+For detailed command-line options, please refer to the help message below:
+
+
+```
+#!
+
+$ utils/metaphlan_hclust_heatmap.py -h
+usage: metaphlan_hclust_heatmap.py [-h] --in INPUT_FILE --out OUTPUT_FILE
+                                   [-m {single,complete,average,weighted,centroid,median,ward}]
+                                   [-d {euclidean,minkowski,cityblock,seuclidean,sqeuclidean,cosine,correlation,hamming,jaccard,chebyshev,canberra,braycurtis,mahalanobis,yule,matching,dice,kulsinski,rogerstanimoto,russellrao,sokalmichener,sokalsneath,wminkowski,ward}]
+                                   [-f {euclidean,minkowski,cityblock,seuclidean,sqeuclidean,cosine,correlation,hamming,jaccard,chebyshev,canberra,braycurtis,mahalanobis,yule,matching,dice,kulsinski,rogerstanimoto,russellrao,sokalmichener,sokalsneath,wminkowski,ward}]
+                                   [-s scale norm] [-x X] [-y Y] [--minv MINV]
+                                   [--maxv max value]
+                                   [--tax_lev TAXONOMIC_LEVEL] [--perc PERC]
+                                   [--top TOP] [--sdend_h SDEND_H]
+                                   [--fdend_w FDEND_W] [--cm_h CM_H]
+                                   [--cm_ticks label for ticks of the colormap]
+                                   [--font_size FONT_SIZE]
+                                   [--clust_line_w CLUST_LINE_W]
+                                   [-c {Accent,Blues,BrBG,BuGn,BuPu,Dark2,GnBu,Greens,Greys,OrRd,Oranges,PRGn,Paired,Pastel1,Pastel2,PiYG,PuBu,PuBuGn,PuOr,PuRd,Purples,RdBu,RdGy,RdPu,RdYlBu,RdYlGn,Reds,Set1,Set2,Set3,Spectral,YlGn,YlGnBu,YlOrBr,YlOrRd,afmhot,autumn,binary,bone,brg,bwr,cool,copper,flag,gist_earth,gist_gray,gist_heat,gist_ncar,gist_rainbow,gist_stern,gist_yarg,gnuplot,gnuplot2,gray,hot,hsv,jet,ocean,pink,prism,rainbow,seismic,spectral,spring,summer,terrain,winter,bbcyr,bbcry}]
+
+This script generates heatmaps with hierarchical clustering of both samples
+and microbial clades. The script can also subsample the number of clades to
+display based on their nth percentile abundance value in each sample
+
+optional arguments:
+  -h, --help            show this help message and exit
+  --in INPUT_FILE       The input file of microbial relative abundances. This
+                        file is typically obtained with the
+                        "utils/merge_metaphlan_tables.py"
+  --out OUTPUT_FILE     The output image. The extension of the file determines
+                        the image format. png, pdf, and svg are the preferred
+                        formats
+  -m {single,complete,average,weighted,centroid,median,ward}
+                        The hierarchical clustering method, default is
+                        "average"
+  -d {euclidean,minkowski,cityblock,seuclidean,sqeuclidean,cosine,correlation,hamming,jaccard,chebyshev,canberra,braycurtis,mahalanobis,yule,matching,dice,kulsinski,rogerstanimoto,russellrao,sokalmichener,sokalsneath,wminkowski,ward}
+                        The distance function for samples. Default is
+                        "braycurtis"
+  -f {euclidean,minkowski,cityblock,seuclidean,sqeuclidean,cosine,correlation,hamming,jaccard,chebyshev,canberra,braycurtis,mahalanobis,yule,matching,dice,kulsinski,rogerstanimoto,russellrao,sokalmichener,sokalsneath,wminkowski,ward}
+                        The distance function for microbes. Default is
+                        "correlation"
+  -s scale norm
+  -x X                  Width of heatmap cells. Automatically set, this option
+                        should not be necessary except for very large heatmaps
+  -y Y                  Height of heatmap cells. Automatically set, this                                                                                                                                                                                                       
+                        option should not be necessary unless for very large                                                                                                                                                                                                   
+                        heatmaps                                                                                                                                                                                                                                               
+  --minv MINV           Minimum value to display. Default is 0.0, values                                                                                                                                                                                                       
+                        around 0.001 are also reasonable                                                                                                                                                                                                                       
+  --maxv max value      Maximum value to display. Default is maximum value                                                                                                                                                                                                     
+                        present, can be set e.g. to 100 to display the full
+                        scale
+  --tax_lev TAXONOMIC_LEVEL
+                        The taxonomic level to display: 'a' : all taxonomic
+                        levels 'k' : kingdoms (Bacteria and Archaea) only 'p'
+                        : phyla only 'c' : classes only 'o' : orders only 'f'
+                        : families only 'g' : genera only 's' : species only
+                        [default 's']
+  --perc PERC           Percentile to be used for ordering the microbes in
+                        order to select with --top the most abundant microbes
+                        only. Default is 90
+  --top TOP             Display the --top most abundant microbes only
+                        (ordering based on --perc)
+  --sdend_h SDEND_H     Set the height of the sample dendrogram. Default is
+                        0.1
+  --fdend_w FDEND_W     Set the width of the microbes dendrogram. Default is
+                        0.1
+  --cm_h CM_H           Set the height of the colormap. Default = 0.03
+  --cm_ticks label for ticks of the colormap
+  --font_size FONT_SIZE
+                        Set label font sizes. Default is 7
+  --clust_line_w CLUST_LINE_W
+                        Set the line width for the dendrograms
+  -c {Accent,Blues,BrBG,BuGn,BuPu,Dark2,GnBu,Greens,Greys,OrRd,Oranges,PRGn,Paired,Pastel1,Pastel2,PiYG,PuBu,PuBuGn,PuOr,PuRd,Purples,RdBu,RdGy,RdPu,RdYlBu,RdYlGn,Reds,Set1,Set2,Set3,Spectral,YlGn,YlGnBu,YlOrBr,YlOrRd,afmhot,autumn,binary,bone,brg,bwr,cool,copper,flag,gist_earth,gist_gray,gist_heat,gist_ncar,gist_rainbow,gist_stern,gist_yarg,gnuplot,gnuplot2,gray,hot,hsv,jet,ocean,pink,prism,rainbow,seismic,spectral,spring,summer,terrain,winter,bbcyr,bbcry}
+                        Set the colormap. Default is "jet".
+```
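The `--perc` and `--top` options above rank microbes by a percentile of their abundance across samples and keep only the top ones. The following is a minimal stdlib sketch of that selection, assuming linear-interpolation percentiles (NumPy's default convention) and a hypothetical `{microbe: [abundance per sample]}` input; the heatmap script's own implementation may differ in details.

```python
def percentile(values, perc):
    """Linear-interpolation percentile of `values`, with perc in 0-100."""
    data = sorted(values)
    if not data:
        raise ValueError("empty input")
    rank = (len(data) - 1) * perc / 100.0
    low = int(rank)
    high = min(low + 1, len(data) - 1)
    return data[low] + (data[high] - data[low]) * (rank - low)


def top_microbes(abundances, perc=90, top=None):
    """Order microbes by their perc-th percentile (descending) and keep `top` of them."""
    ranked = sorted(abundances,
                    key=lambda name: percentile(abundances[name], perc),
                    reverse=True)
    return ranked if top is None else ranked[:top]


# Toy abundance table (one value per sample), not real data
table = {
    "s__Bacteroides_caccae": [12.0, 30.5, 0.0, 8.1],
    "s__Escherichia_coli": [0.5, 0.0, 60.2, 1.0],
    "s__Prevotella_copri": [0.0, 0.1, 0.0, 0.2],
}
print(top_microbes(table, perc=90, top=2))  # ['s__Escherichia_coli', 's__Bacteroides_caccae']
```

Ranking on a high percentile rather than the mean favours microbes that are very abundant in a few samples, which is why *E. coli* ranks first in this toy table.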
+
+###**GraPhlAn Visualization**###
+
+A tutorial on using GraPhlAn can be found on [the MetaPhlAn2 wiki](https://bitbucket.org/biobakery/biobakery/wiki/metaphlan2).
+
+
+##**Customizing the database**##
+To add a marker to the database, follow these steps:
+
+* Reconstruct the marker sequences (in FASTA format) from the MetaPhlAn2 bowtie2 database:
+
+```
+#!bash
+
+bowtie2-inspect metaphlan2/db_v20/mpa_v20_m200 > metaphlan2/markers.fasta
+
+```
+
+
+* Add the marker sequence stored in a file new_marker.fasta to the marker set:
+
+```
+#!bash
+
+cat new_marker.fasta >> metaphlan2/markers.fasta
+
+```
+
+* Rebuild the bowtie2 database:
+
+```
+#!bash
+
+mkdir -p metaphlan2/db_v21
+bowtie2-build metaphlan2/markers.fasta metaphlan2/db_v21/mpa_v21_m200
+
+```
+
+* Assuming the new marker was extracted from genome1 and genome2, update the taxonomy file from a Python console as follows:
+
+```
+#!python
+
+import cPickle as pickle
+import bz2
+
+# Load the current database (a bz2-compressed pickle)
+db = pickle.load(bz2.BZ2File('metaphlan2/db_v20/mpa_v20_m200.pkl', 'r'))
+
+# The lower-case names used below (taxonomy_of_genome1, length_of_genome1,
+# new_marker_name, marker_clade, ...) are placeholders for your own values.
+
+# Add the taxonomy of the new genomes (key: taxonomy string, value: genome length)
+db['taxonomy'][taxonomy_of_genome1] = length_of_genome1
+db['taxonomy'][taxonomy_of_genome2] = length_of_genome2
+
+# Add the information of the new marker, with the same fields as the other markers
+db['markers'][new_marker_name] = {
+    'clade': marker_clade,               # the clade that the marker belongs to
+    'ext': {ext_genome1, ext_genome2},   # external genomes where the marker also appears
+    'len': marker_length,                # length of the marker
+    'score': marker_score,               # score of the marker
+    'taxon': marker_taxon}               # the taxon of the marker
+# To see an example, try to print the first marker information:
+# print db['markers'].items()[0]
+
+# Save the new mpa_pkl file
+ofile = bz2.BZ2File('metaphlan2/db_v21/mpa_v21_m200.pkl', 'w')
+pickle.dump(db, ofile, pickle.HIGHEST_PROTOCOL)
+ofile.close()
+```
+
+* To use the new database, run metaphlan2.py with "--bowtie2db metaphlan2/db_v21/mpa_v21_m200" and "--mpa_pkl metaphlan2/db_v21/mpa_v21_m200.pkl" instead of the corresponding db_v20 files.
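Two cheap sanity checks can catch mistakes before the new database is used. The marker name is used as a key in `db['markers']`, so a name that already exists in markers.fasta would silently overwrite an existing marker entry; and a new entry should carry the same fields as the existing ones. The helpers below are a hypothetical sketch (not part of MetaPhlAn2), assuming the marker ID is the first whitespace-separated token of each FASTA header and that `db` is the dictionary loaded from the pickle as in the snippet above; the marker names in the demo are made up.

```python
def fasta_ids(lines):
    """Yield the ID (first whitespace-separated token) of each FASTA header line."""
    for line in lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                yield tokens[0]


def duplicated_ids(existing_lines, new_lines):
    """Return IDs of the new marker FASTA that already occur in the marker FASTA."""
    existing = set(fasta_ids(existing_lines))
    return sorted(i for i in fasta_ids(new_lines) if i in existing)


def check_marker_entry(db, marker_name):
    """Compare db['markers'][marker_name] with another marker; return a list of problems."""
    markers = db["markers"]
    if marker_name not in markers:
        return ["marker %r not found" % marker_name]
    entry = markers[marker_name]
    # Any other marker serves as a template for the expected fields
    reference = next((v for k, v in markers.items() if k != marker_name), None)
    problems = []
    if reference is not None:
        missing = sorted(set(reference) - set(entry))
        if missing:
            problems.append("missing keys: " + ", ".join(missing))
    if not isinstance(entry.get("len"), int):
        problems.append("'len' is not an integer")
    return problems


# In-memory demo; in practice pass open file handles and the real db
existing = [">GeneID:42 some description", "ACGT", ">GeneID:7", "TTGA"]
new = [">GeneID:42", "ACGA", ">my_new_marker", "GGCC"]
print(duplicated_ids(existing, new))  # ['GeneID:42']

toy_db = {"markers": {
    "GeneID:7": {"clade": "s__A", "ext": set(), "len": 4, "score": 1.0, "taxon": "k__B|s__A"},
    "my_new_marker": {"clade": "s__A", "len": "4", "score": 1.0, "taxon": "k__B|s__A"},
}}
print(check_marker_entry(toy_db, "my_new_marker"))
```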
+
+
+##**Metagenomic strain-level population genomics**##
+
+StrainPhlAn is a computational tool for tracking individual strains across a large set of samples. **The input** of StrainPhlAn is a set of metagenomic samples and, for each species, **the output** is a multiple sequence alignment (MSA) file of all strains of that species reconstructed directly from the samples. From this MSA, StrainPhlAn calls RAxML (or another phylogenetic tree builder) to build a phylogenetic tree showing the evolutionary relationships among the sample strains.
+For each sample, StrainPhlAn extracts the strain of a given species by merging and concatenating all reads mapped against that species' markers in the MetaPhlAn2 database.
+
+In detail, let us start from a toy example with 6 HMP gut metagenomic samples (SRS055982-subjectID_638754422, SRS022137-subjectID_638754422, SRS019161-subjectID_763496533, SRS013951-subjectID_763496533, SRS014613-subjectID_763840445, SRS064276-subjectID_763840445) from three subjects (each sampled at two time points) and one *Bacteroides caccae* reference genome, G000273725.
+**We would like to**:
+
+* extract the *Bacteroides caccae* strains from these samples and compare them with the reference genome in a phylogenetic tree.
+* know how many SNPs separate those strains from the reference genome.
+
+Running StrainPhlAn on these samples, we obtain the *Bacteroides caccae* phylogenetic tree and its multiple sequence alignment shown in the following figure (produced with [ete2](http://etetoolkit.org/) and [Jalview](http://www.jalview.org/)):
+
+![tree_alignment.png](https://bitbucket.org/repo/rM969K/images/476974413-tree_alignment.png)
+
+We can see that the strains from the same subject are grouped together. The tree also highlights that the strains from subject "763840445" (red color) do not change between the two sampling time points, whereas the strains from the other subjects have slightly evolved. From the tree, we can further see that the strains of subject "763496533" are closer to the reference genome than those of the other subjects.
+In addition, the table below shows the number of SNPs between the sample strains and the reference genome, based on the strain alignment returned by StrainPhlAn.
+
+![snp_distance.png](https://bitbucket.org/repo/rM969K/images/1771497600-snp_distance.png)
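Counting SNPs from an alignment amounts to comparing two aligned rows column by column. The sketch below is a simplified illustration, assuming that columns where either sequence has a gap (`-`) or an ambiguous base (`N`) are skipped; StrainPhlAn's own `strainphlan_src/compute_distance.py` may treat such columns differently. The sequences are toy values, not the real *B. caccae* alignment.

```python
def snp_distance(seq_a, seq_b, ignore="-N"):
    """Count alignment columns where two sequences differ, skipping gaps and Ns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    skip = set(ignore.upper())
    snps = 0
    for a, b in zip(seq_a.upper(), seq_b.upper()):
        if a in skip or b in skip:
            continue  # uninformative column for this pair
        if a != b:
            snps += 1
    return snps


alignment = {  # toy sequences, not real StrainPhlAn output
    "G000273725": "ACGAACGTTA",
    "SRS055982": "ACGTAC-TNA",
    "SRS022137": "ACTTACGTTA",
}
reference = alignment["G000273725"]
for sample in sorted(alignment):
    if sample != "G000273725":
        print("%s\t%d" % (sample, snp_distance(alignment[sample], reference)))
```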
+
+In the next sections, we illustrate step by step how to run StrainPhlAn on this toy example to reproduce the above figures.
+
+### Pre-requisites ###
+StrainPhlAn requires *python 2.7* and the libraries [pysam](http://pysam.readthedocs.org/en/latest/) (tested on **version 0.8.3**), [biopython](http://biopython.org/wiki/Main_Page), [msgpack](https://pypi.python.org/pypi/msgpack-python), [numpy](http://www.numpy.org/) and [dendropy](https://pythonhosted.org/DendroPy/) (tested on version **3.12.0**). StrainPhlAn also needs the following programs in the executable path:
+
+* [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2/index.shtml) for mapping reads against the marker database.
+
+* [MUSCLE](http://www.drive5.com/muscle/) for the alignment step.
+
+* [samtools, bcftools and vcfutils.pl](http://samtools.sourceforge.net/) for building consensus markers; they can be downloaded from [here](https://github.com/samtools). Note that vcfutils.pl is included in bcftools and **StrainPhlAn only works with samtools version 0.1.19**, as samtools changed its output format after this version.
+
+* [blastn](ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) for adding reference genomes to the phylogenetic tree.
+
+* [raxmlHPC and raxmlHPC-PTHREADS-SSE3](http://sco.h-its.org/exelixis/web/software/raxml/index.html) for building the phylogenetic trees.
+
+All dependency binaries for 64-bit Linux can be downloaded from the folder "bin" at [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
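Before running the pipeline, it can help to confirm that all of these programs are reachable. This is a hypothetical helper, not part of StrainPhlAn: it assumes the default executable names listed above (e.g. `muscle` for MUSCLE) and uses `shutil.which`, which requires Python 3.3 or later, so run it with a Python 3 interpreter even though StrainPhlAn itself targets Python 2.7. It does not check the samtools version.

```python
import shutil
import sys

# Executable names as listed above; adjust if your binaries are named differently.
REQUIRED_PROGRAMS = [
    "bowtie2", "muscle", "samtools", "bcftools", "vcfutils.pl",
    "blastn", "raxmlHPC", "raxmlHPC-PTHREADS-SSE3",
]


def missing_programs(programs=REQUIRED_PROGRAMS):
    """Return the programs that cannot be found on the PATH."""
    return [name for name in programs if shutil.which(name) is None]


if __name__ == "__main__":
    missing = missing_programs()
    if missing:
        sys.stderr.write("Not found on PATH: %s\n" % ", ".join(missing))
    else:
        print("All StrainPhlAn dependencies were found on the PATH.")
```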
+
+The scripts in the folder "strainphlan_src" should be made executable:
+
+```
+#!bash
+
+chmod +x strainphlan_src/*.py
+```
+
+and added to the executable path:
+
+```
+#!bash
+
+export PATH=$PATH:$(pwd -P)/strainphlan_src
+```
+
+### Usage ###
+
+Let's reproduce the toy example result from the introduction section. Note that all the commands for the steps below are in the "strainphlan_tutorial/step?*.sh" files (? corresponding to the step number). All the steps below are executed in the "strainphlan_tutorial" folder.
+The steps are as follows:
+
+Step 1. Download the 6 HMP gut metagenomic samples, the metadata.txt file and one reference genome from the folders "fastqs" and "reference_genomes" at [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0), and put these folders under the "strainphlan_tutorial" folder.
+
+Step 2. Obtain the SAM files from these samples by mapping them against the MetaPhlAn2 database:
+
+This step runs MetaPhlAn2 to map all metagenomic samples against the MetaPhlAn2 marker database and produces the SAM files (\*.sam.bz2).
+Each SAM file corresponds to one sample and contains the reads mapped against the MetaPhlAn2 marker database.
+The commands to run are:
+
+```
+#!bash
+
+mkdir -p sams
+for f in fastqs/*.bz2
+do
+    echo "Running metaphlan2 on ${f}"
+    bn=$(basename ${f} | cut -d . -f 1)
+    tar xjfO ${f} | ../metaphlan2.py --bowtie2db ../db_v20/mpa_v20_m200 --mpa_pkl ../db_v20/mpa_v20_m200.pkl --input_type multifastq --nproc 10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
+done
+```
+
+After this step, you will have a folder "sams" containing the sam files (\*.sam.bz2) and other MetaPhlAn2 output files. 
+This step will take around 270 minutes. If you want to skip this step, you can download the sam files from the folder "sams" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
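To get a quick feel for what the SAM files contain, the reads can be tallied per marker. This is a hypothetical sketch, not a StrainPhlAn script: it skips SAM header lines, unmapped reads (FLAG bit 0x4) and secondary alignments (FLAG bit 0x100), and `count_sam_bz2` uses `bz2.open` in text mode, which requires Python 3. The demo lines and marker names are made up.

```python
import bz2
from collections import Counter

UNMAPPED, SECONDARY = 0x4, 0x100  # SAM FLAG bits


def reads_per_marker(sam_lines):
    """Count primary, mapped alignments per reference (marker) name."""
    counts = Counter()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM alignment line
        flag, marker = int(fields[1]), fields[2]
        if flag & (UNMAPPED | SECONDARY) or marker == "*":
            continue
        counts[marker] += 1
    return counts


def count_sam_bz2(path):
    """Apply reads_per_marker to one of the sams/*.sam.bz2 files."""
    with bz2.open(path, "rt") as handle:
        return reads_per_marker(handle)


toy_sam = [
    "@HD\tVN:1.0\tSO:unsorted",
    "r1\t0\tGeneID:42\t10\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t16\tGeneID:42\t12\t42\t4M\t*\t0\t0\tACGT\tIIII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r4\t256\tGeneID:7\t5\t0\t4M\t*\t0\t0\tACGT\tIIII",
]
print(dict(reads_per_marker(toy_sam)))  # {'GeneID:42': 2}
```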
+
+Step 3. Produce the consensus-marker files which are the input for StrainPhlAn:
+
+For each sample, this step reconstructs the strains of all species found in it and stores them in a marker file (\*.markers). These strains are referred to as *sample-reconstructed strains*. Additional details on generating consensus sequences can be found [here](http://samtools.sourceforge.net/mpileup.shtml).
+The commands to run are:
+
+```
+#!bash
+
+mkdir -p consensus_markers
+cwd=$(pwd -P)
+export PATH=${cwd}/../strainphlan_src:${PATH}
+python ../strainphlan_src/sample2markers.py --ifn_samples sams/*.sam.bz2 --input_type sam --output_dir consensus_markers --nprocs 10 &> consensus_markers/log.txt
+```
+
+The result is the same if you run several sample2markers.py processes in parallel, one per sample (this may be useful on some cluster systems).
+After this step, you will have a folder "consensus_markers" containing all sample-marker files (\*.markers).
+This step will take around 44 minutes. If you want to skip this step, you can download the consensus marker files from the folder "consensus_markers" at [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
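The idea behind a consensus marker can be illustrated with a simple majority vote over the read bases piled up at each marker position. This is a simplified stand-in, not StrainPhlAn's method: the actual consensus is built with samtools mpileup, bcftools and vcfutils.pl, and the thresholds `min_depth` and `min_freq` here are hypothetical. Positions without enough support become `N`, the nucleotide that the *N_in_marker* option later filters on.

```python
from collections import Counter


def consensus(pileup_columns, min_depth=3, min_freq=0.8):
    """Majority-vote consensus over per-position strings of read bases.

    Positions with fewer than min_depth A/C/G/T bases, or whose most common
    base makes up less than min_freq of them, become 'N'.
    """
    bases = []
    for column in pileup_columns:
        counts = Counter(b for b in column.upper() if b in "ACGT")
        depth = sum(counts.values())
        if depth < min_depth:
            bases.append("N")  # too few reads cover this position
            continue
        base, n = counts.most_common(1)[0]
        bases.append(base if n / float(depth) >= min_freq else "N")
    return "".join(bases)


columns = ["AAAA", "CCCT", "GG", "TTTTA"]  # toy pileup, one string per position
print(consensus(columns))  # ANNT
```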
+
+Step 4. Extract the markers of *Bacteroides_caccae* from MetaPhlAn2 database (to add its reference genome later):
+
+This step extracts the markers of *Bacteroides_caccae* from the database; in the next step, StrainPhlAn uses blast to identify the sequences in the reference genomes that are closest to them. These sequences are concatenated and referred to as *reference-genome-reconstructed strains*.
+The commands to run are:
+
+```
+#!bash
+
+mkdir -p db_markers
+bowtie2-inspect ../db_v20/mpa_v20_m200 > db_markers/all_markers.fasta
+python ../strainphlan_src/extract_markers.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_markers db_markers/all_markers.fasta --clade s__Bacteroides_caccae --ofn_markers db_markers/s__Bacteroides_caccae.markers.fasta
+```
+
+Note that the "all_markers.fasta" file can be reused to extract the markers of other species.
+After this step, you should have two files in folder "db_markers": "all_markers.fasta" containing all marker sequences, and "s__Bacteroides_caccae.markers.fasta" containing the markers of *Bacteroides caccae*.
+This step takes around 1 minute and can be skipped if you do not need to add the reference genomes to the phylogenetic tree. These markers can be found in the folder "db_markers" at [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
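Conceptually, extracting a clade's markers means keeping the FASTA records whose marker entry in the database belongs to that clade. The sketch below assumes that selection is done on the `'clade'` field of `db['markers']` (the field layout shown in the database customization section) and that bowtie2-inspect headers start with the marker name; whether extract_markers.py works exactly this way is an assumption. The marker names are toy values.

```python
def read_fasta(lines):
    """Parse FASTA lines into {id: sequence}; the ID is the first header token."""
    records, name, chunks = {}, None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                records[name] = "".join(chunks)
            name, chunks = (line[1:].split() or [""])[0], []
        else:
            chunks.append(line)
    if name is not None:
        records[name] = "".join(chunks)
    return records


def markers_of_clade(markers, records, clade):
    """Keep the FASTA records whose marker entry has the requested 'clade'."""
    return dict((name, seq) for name, seq in records.items()
                if markers.get(name, {}).get("clade") == clade)


toy_markers = {  # stands in for db['markers']
    "GeneID:1": {"clade": "s__Bacteroides_caccae"},
    "GeneID:2": {"clade": "s__Escherichia_coli"},
}
fasta = [">GeneID:1", "ACGT", "ACGT", ">GeneID:2", "TTTT", ">GeneID:3", "GGGG"]
selected = markers_of_clade(toy_markers, read_fasta(fasta), "s__Bacteroides_caccae")
print(selected)  # {'GeneID:1': 'ACGTACGT'}
```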
+
+Before building the trees, we should get the list of all clades detected in the samples and save it in the file "output/clades.txt" with the following commands:
+
+```
+#!bash
+
+mkdir -p output
+python ../strainphlan.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_samples consensus_markers/*.markers --output_dir output --nprocs_main 10 --print_clades_only > output/clades.txt
+```
+
+The clade names in the output file "clades.txt" will be used for the next step.
+
+Step 5. Build the multiple sequence alignment and phylogenetic tree:
+
+This step will align and clean the *sample-reconstructed strains* (stored in the marker files produced in step 3) and *reference-genome-reconstructed strains* (extracted based on the database markers in step 4) to produce a multiple sequence alignment (MSA) and store it in the file "clade_name.fasta". From this MSA file, StrainPhlAn will call RAxML to build the phylogenetic tree.
+Note that all marker files (\*.markers) **must be used together** as the input for the strainphlan.py script, because StrainPhlAn needs to align all of the strains at once.
+
+The commands to run are:
+
+```
+#!bash
+
+mkdir -p output
+python ../strainphlan.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_samples consensus_markers/*.markers --ifn_markers db_markers/s__Bacteroides_caccae.markers.fasta --ifn_ref_genomes reference_genomes/G000273725.fna.bz2 --output_dir output --nprocs_main 10 --clades s__Bacteroides_caccae &> output/log_full.txt
+```
+
+This step will take around 2 minutes. After this step, you will find the tree "output/RAxML_bestTree.s__Bacteroides_caccae.tree". All the output files can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
+You can view it with [Archaeopteryx](https://sites.google.com/site/cmzmasek/home/software/archaeopteryx) or any other tree viewer.
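Each strain in the MSA is a concatenation of its aligned markers, and the start of each marker in that full sequence is what the *.marker_pos output file reports. The sketch below illustrates the bookkeeping under two assumptions: every marker has already been aligned to a common length across samples, and a marker missing from a sample is padded with gaps. It is not StrainPhlAn's code, and the sample and marker names are toy values.

```python
def concatenate_markers(aligned, marker_order):
    """Concatenate aligned markers per sample.

    aligned: {sample: {marker: aligned sequence}}
    Returns ({sample: full sequence}, {marker: start position}).
    """
    lengths = {}
    for sample_markers in aligned.values():
        for marker, seq in sample_markers.items():
            if lengths.setdefault(marker, len(seq)) != len(seq):
                raise ValueError("marker %s is not aligned across samples" % marker)
    order = [m for m in marker_order if m in lengths]
    starts, offset = {}, 0
    for marker in order:
        starts[marker] = offset
        offset += lengths[marker]
    full = {}
    for sample, sample_markers in aligned.items():
        # Assumption: a marker absent from a sample is filled with gaps
        full[sample] = "".join(sample_markers.get(m, "-" * lengths[m]) for m in order)
    return full, starts


aligned = {
    "SRS055982": {"m1": "ACG-", "m2": "TT"},
    "SRS022137": {"m1": "ACGT"},
}
full, starts = concatenate_markers(aligned, ["m1", "m2"])
print(sorted(full.items()))    # [('SRS022137', 'ACGT--'), ('SRS055982', 'ACG-TT')]
print(sorted(starts.items()))  # [('m1', 0), ('m2', 4)]
```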
+
+By default, if you do not specify reference genomes (with --ifn_ref_genomes) or a specific clade (with --clades), strainphlan.py builds the phylogenetic trees for all species that it can detect.
+
+To add metadata to the tree, we also provide a script called "add_metadata_tree.py", which can be used as follows:
+
+```
+#!bash
+
+python ../strainphlan_src/add_metadata_tree.py --ifn_trees output/RAxML_bestTree.s__Bacteroides_caccae.tree --ifn_metadatas fastqs/metadata.txt --metadatas subjectID
+```
+
+The script "add_metadata_tree.py" can accept multiple metadata files (space separated; wildcards can also be used) and multiple trees. A metadata file is a tab-separated file where the first row contains the metadata headers and each following row contains the metadata of one sample. Multiple metadata files are useful when your samples come from more than one dataset and you do not want to merge the metadata files.
+For more details on using "add_metadata_tree.py", please see its help (option "-h").
+An example of a metadata file is "fastqs/metadata.txt", with the following content:
+
+```
+#!text
+
+sampleID        subjectID
+SRS055982       638754422
+SRS022137       638754422
+SRS019161       763496533
+SRS013951       763496533
+SRS014613       763840445
+SRS064276       763840445
+G000273725  ReferenceGenomes
+```
+
+Note that "sampleID" is a compulsory field. 
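The metadata format described above (tab-separated, header row first, compulsory "sampleID" column, possibly split over several files) can be read with a few lines of Python. This is a sketch of the format only, not the parser used by add_metadata_tree.py; in particular, letting later files win when the same sample ID appears twice is an assumption.

```python
def read_metadata(lines):
    """Parse one tab-separated metadata file into {sampleID: {column: value}}."""
    rows = [line.rstrip("\r\n").split("\t") for line in lines if line.strip()]
    if not rows:
        return {}
    header = rows[0]
    if "sampleID" not in header:
        raise ValueError("the 'sampleID' column is compulsory")
    key = header.index("sampleID")
    return dict((row[key], dict(zip(header, row))) for row in rows[1:])


def merge_metadata(*tables):
    """Combine parsed files; later files win on duplicate sample IDs (assumption)."""
    merged = {}
    for table in tables:
        merged.update(table)
    return merged


dataset1 = ["sampleID\tsubjectID", "SRS055982\t638754422", "SRS022137\t638754422"]
dataset2 = ["sampleID\tsubjectID", "G000273725\tReferenceGenomes"]
metadata = merge_metadata(read_metadata(dataset1), read_metadata(dataset2))
print(metadata["G000273725"]["subjectID"])  # ReferenceGenomes
```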
+
+After adding the metadata, you will obtain the tree files "*.tree.metadata", which you can view with [Archaeopteryx](https://sites.google.com/site/cmzmasek/home/software/archaeopteryx) as in the previous step.
+
+If you have installed [graphlan](https://bitbucket.org/nsegata/graphlan/wiki/Home), you can plot the tree with the command:
+
+```
+#!bash
+
+python ../strainphlan_src/plot_tree_graphlan.py --ifn_tree output/RAxML_bestTree.s__Bacteroides_caccae.tree.metadata --colorized_metadata subjectID
+```
+
+and obtain the following figure (output/RAxML_bestTree.s__Bacteroides_caccae.tree.metadata.png):
+
+![RAxML_bestTree.s__Bacteroides_caccae.tree.metadata.png](https://bitbucket.org/repo/rM969K/images/1574126761-RAxML_bestTree.s__Bacteroides_caccae.tree.metadata.png)
+
+Step 6. If you want to remove the samples with a high probability of containing multiple strains, you can rebuild the tree without them:
+
+```
+#!bash
+
+python ../strainphlan_src/build_tree_single_strain.py --ifn_alignments output/s__Bacteroides_caccae.fasta --nprocs 10 --log_ofn output/build_tree_single_strain.log
+python ../strainphlan_src/add_metadata_tree.py --ifn_trees output/RAxML_bestTree.s__Bacteroides_caccae.remove_multiple_strains.tree --ifn_metadatas fastqs/metadata.txt --metadatas subjectID
+```
+
+You will obtain the refined tree "output/RAxML_bestTree.s__Bacteroides_caccae.remove_multiple_strains.tree.metadata". This tree can be found in the folder "output" in [this link](https://www.dropbox.com/sh/m4na8wefp53j8ej/AABA3yVsG26TbB0t1cnBS9-Ra?dl=0).
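The per-sample statistics in the "*.polymorphic" output file (columns "sample", "percentage_of_polymorphic_sites", "avg_freq" and "avg_coverage") give a rough way to see which samples are likely to carry more than one strain. The sketch below assumes that file is tab-separated with a header row; the threshold `max_polymorphic` is hypothetical, and build_tree_single_strain.py may use a different criterion. The rows are toy values.

```python
import csv


def multi_strain_candidates(lines, max_polymorphic):
    """Return samples whose percentage_of_polymorphic_sites exceeds max_polymorphic."""
    flagged = []
    for row in csv.DictReader(lines, delimiter="\t"):
        if float(row["percentage_of_polymorphic_sites"]) > max_polymorphic:
            flagged.append(row["sample"])
    return flagged


toy_polymorphic = [  # toy values, not real StrainPhlAn output
    "sample\tpercentage_of_polymorphic_sites\tavg_freq\tavg_coverage",
    "SRS055982\t0.3\t0.95\t12.1",
    "SRS014613\t4.8\t0.71\t20.4",
]
print(multi_strain_candidates(toy_polymorphic, max_polymorphic=1.0))  # ['SRS014613']
```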
+
+### Some useful options ###
+All the options are described in the strainphlan.py help:
+```
+#!bash
+
+python ../strainphlan.py -h
+```
+
+The default settings can be too stringent in some cases, leaving very few samples in the phylogenetic tree. You can relax some parameters to add more samples back:
+
+1. *marker_in_clade*: In each sample, the clades with the percentage of present markers less than this threshold are removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+2. *sample_in_marker*: If the percentage of samples in which a marker is present is less than this threshold, that marker is removed. Default "0.8". You can set this parameter to "0.5" to add some more samples.
+3. *N_in_marker*: The consensus markers with the percentage of N nucleotides greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
+4. *gap_in_sample*: The samples whose full sequences (concatenated from all markers) have a percentage of gaps greater than this threshold are removed. Default "0.2". You can set this parameter to "0.5" to add some more samples.
+5. *relaxed_parameters*: use this option to automatically set the above parameters to add some more samples by accepting some more gaps, Ns, etc. This option is equivalent to setting marker_in_clade=0.5, sample_in_marker=0.5, N_in_marker=0.5, gap_in_sample=0.5. Default "False".
+6. *relaxed_parameters2*: use this option to add more samples by accepting some noise. This is equivalent to setting marker_in_clade=0.2, sample_in_marker=0.2, N_in_marker=0.8, gap_in_sample=0.8. Default "False".
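To see how these thresholds interact, the rules for *sample_in_marker*, *N_in_marker* and *gap_in_sample* can be restated as small filters, with the default and relaxed presets taken from the list above. This is a re-statement of the documented rules for illustration, not StrainPhlAn's code; treating an empty sequence as fully missing is an assumption, and the sequences are toy values.

```python
DEFAULTS = {"marker_in_clade": 0.8, "sample_in_marker": 0.8, "N_in_marker": 0.2, "gap_in_sample": 0.2}
RELAXED = {"marker_in_clade": 0.5, "sample_in_marker": 0.5, "N_in_marker": 0.5, "gap_in_sample": 0.5}
RELAXED2 = {"marker_in_clade": 0.2, "sample_in_marker": 0.2, "N_in_marker": 0.8, "gap_in_sample": 0.8}


def fraction(seq, char):
    """Fraction of positions in seq equal to char (an empty sequence counts as 1.0)."""
    if not seq:
        return 1.0
    return seq.upper().count(char) / float(len(seq))


def keep_markers(consensus_markers, params):
    """Drop consensus markers whose fraction of Ns exceeds N_in_marker."""
    return dict((m, s) for m, s in consensus_markers.items()
                if fraction(s, "N") <= params["N_in_marker"])


def keep_shared_markers(presence, n_samples, params):
    """Drop markers present in fewer than sample_in_marker of the samples."""
    return [m for m, n in presence.items()
            if n / float(n_samples) >= params["sample_in_marker"]]


def keep_samples(full_sequences, params):
    """Drop samples whose concatenated sequence has more gaps than gap_in_sample."""
    return dict((s, q) for s, q in full_sequences.items()
                if fraction(q, "-") <= params["gap_in_sample"])


markers = {"m1": "ACGTNNACGT", "m2": "NNNNACGT"}
samples = {"S1": "ACGT-ACGT-", "S2": "----ACGT"}
print(sorted(keep_markers(markers, DEFAULTS)))  # ['m1']
print(sorted(keep_markers(markers, RELAXED)))   # ['m1', 'm2']
print(sorted(keep_samples(samples, DEFAULTS)))  # ['S1']
```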
+
+### Some other useful output files ###
+In the output folder, you can find the following files:
+
+1. clade_name.fasta: the alignment file of all metagenomic strains.
+2. *.marker_pos: this file shows the starting position of each marker in the strains.
+3. *.info: this file shows general information such as the total length of the concatenated markers (full sequence length), the number of used markers, etc.
+4. *.polymorphic: this file shows statistics on the polymorphic sites, where "sample" is the sample name, "percentage_of_polymorphic_sites" is the percentage of sites that are suspected to be polymorphic, "avg_freq" is the average frequency of the dominant alleles on all polymorphic sites, and "avg_coverage" is the average coverage at all polymorphic sites.
\ No newline at end of file
diff --git a/changeset.txt b/changeset.txt
new file mode 100644
index 0000000..02d1341
--- /dev/null
+++ b/changeset.txt
@@ -0,0 +1,13 @@
+=== Version 2.2.0
+- added option "marker_counts" (by Nicola)
+
+=== Version 2.1.0
+- added min_alignment_len option to filter out short alignments in local mode. For long reads (>150) it is now recommended to use local mapping together with "--min_alignment_len 100" to filter out very short alignments. (by Tin)
+- added "--samout" option to store the mapping file in SAM format (the SAM will be compressed if the extension of the specified output file ends with ".bz2") (by Tin)
+- fix: MetaPhlAn2 now ignores about 300 markers that were non-specific (thanks to Eric)
+
+=== Version 2.0.0
+- fix: Biom >= 2.0.0 has the clade IDs second and the sample IDs third
+- added extract_markers.py
+- fix: #5; revamp biom generation; set clade IDs as enumeration
+- added utils/metaphlan2krona.py
diff --git a/db_v20/mpa_v20_m200.pkl b/db_v20/mpa_v20_m200.pkl
new file mode 100644
index 0000000..b409019
Binary files /dev/null and b/db_v20/mpa_v20_m200.pkl differ
diff --git a/license.txt b/license.txt
new file mode 100644
index 0000000..1596b63
--- /dev/null
+++ b/license.txt
@@ -0,0 +1,7 @@
+Copyright (c) 2015, Duy Tin Truong, Nicola Segata and Curtis Huttenhower
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
diff --git a/metaphlan2.py b/metaphlan2.py
new file mode 100755
index 0000000..cae0ced
--- /dev/null
+++ b/metaphlan2.py
@@ -0,0 +1,1282 @@
+#!/usr/bin/env python
+
+from __future__ import with_statement 
+
+# ==============================================================================
+# MetaPhlAn v2.x: METAgenomic PHyLogenetic ANalysis for taxonomic classification
+#                 of metagenomic data
+#
+# Authors: Nicola Segata (nicola.segata at unitn.it), 
+#          Duy Tin Truong (duytin.truong at unitn.it)
+#
+# Please type "./metaphlan2.py -h" for usage help
+#
+# ==============================================================================
+
+__author__ = 'Nicola Segata (nicola.segata at unitn.it), Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '2.6.0'
+__date__ = '19 August 2016'
+
+
+import sys
+import os
+import stat
+import re
+from binascii import b2a_uu 
+
+try:
+    import numpy as np 
+except ImportError:
+    sys.stderr.write("Error! numpy python library not detected!!\n")
+    sys.exit(1)
+import tempfile as tf
+import argparse as ap
+import subprocess as subp
+import multiprocessing as mp
+from collections import defaultdict as defdict
+import bz2 
+import itertools
+from distutils.version import LooseVersion
+try:
+    import cPickle as pickle
+except ImportError:
+    import pickle
+
+
+#*************************************************************
+#*  Imports related to biom file generation                  *
+#*************************************************************
+try:
+    import biom
+    import biom.table
+    import numpy as np
+except ImportError:
+    sys.stderr.write("Warning! Biom python library not detected!"
+                     "\n Exporting to biom format will not work!\n")
+try:
+    import json
+except ImportError:
+    sys.stderr.write("Warning! json python library not detected!"
+                     "\n Exporting to biom format will not work!\n")
+
+# This set contains the markers that after careful validation are found to have low precision or recall
+# We exclude the markers here to avoid generating a new marker DB when changing just a few markers
+markers_to_exclude = \
+    set([
+        'NC_001782.1','GeneID:17099689','gi|419819595|ref|NZ_AJRE01000517.1|:1-118',
+        'GeneID:10498696', 'GeneID:10498710', 'GeneID:10498726', 'GeneID:10498735',
+        'GeneID:10498757', 'GeneID:10498760', 'GeneID:10498761', 'GeneID:10498763',
+        'GeneID:11294465', 'GeneID:14181982', 'GeneID:14182132', 'GeneID:14182146',
+        'GeneID:14182148', 'GeneID:14182328', 'GeneID:14182639', 'GeneID:14182647',
+        'GeneID:14182650', 'GeneID:14182663', 'GeneID:14182683', 'GeneID:14182684',
+        'GeneID:14182691', 'GeneID:14182803', 'GeneID:14296322', 'GeneID:1489077',
+        'GeneID:1489080', 'GeneID:1489081', 'GeneID:1489084', 'GeneID:1489085',
+        'GeneID:1489088', 'GeneID:1489089', 'GeneID:1489090', 'GeneID:1489528',
+        'GeneID:1489530', 'GeneID:1489531', 'GeneID:1489735', 'GeneID:1491873',
+        'GeneID:1491889', 'GeneID:1491962', 'GeneID:1491963', 'GeneID:1491964',
+        'GeneID:1491965', 'GeneID:17099689', 'GeneID:1724732', 'GeneID:17494231',
+        'GeneID:2546403', 'GeneID:2703374', 'GeneID:2703375', 'GeneID:2703498',
+        'GeneID:2703531', 'GeneID:2772983', 'GeneID:2772989', 'GeneID:2772991',
+        'GeneID:2772993', 'GeneID:2772995', 'GeneID:2773037', 'GeneID:2777387',
+        'GeneID:2777399', 'GeneID:2777400', 'GeneID:2777439', 'GeneID:2777493',
+        'GeneID:2777494', 'GeneID:3077424', 'GeneID:3160801', 'GeneID:3197323',
+        'GeneID:3197355', 'GeneID:3197400', 'GeneID:3197428', 'GeneID:3783722',
+        'GeneID:3783750', 'GeneID:3953004', 'GeneID:3959334', 'GeneID:3964368',
+        'GeneID:3964370', 'GeneID:4961452', 'GeneID:5075645', 'GeneID:5075646',
+        'GeneID:5075647', 'GeneID:5075648', 'GeneID:5075649', 'GeneID:5075650',
+        'GeneID:5075651', 'GeneID:5075652', 'GeneID:5075653', 'GeneID:5075654',
+        'GeneID:5075655', 'GeneID:5075656', 'GeneID:5075657', 'GeneID:5075658',
+        'GeneID:5075659', 'GeneID:5075660', 'GeneID:5075661', 'GeneID:5075662',
+        'GeneID:5075663', 'GeneID:5075664', 'GeneID:5075665', 'GeneID:5075667',
+        'GeneID:5075668', 'GeneID:5075669', 'GeneID:5075670', 'GeneID:5075671',
+        'GeneID:5075672', 'GeneID:5075673', 'GeneID:5075674', 'GeneID:5075675',
+        'GeneID:5075676', 'GeneID:5075677', 'GeneID:5075678', 'GeneID:5075679',
+        'GeneID:5075680', 'GeneID:5075681', 'GeneID:5075682', 'GeneID:5075683',
+        'GeneID:5075684', 'GeneID:5075685', 'GeneID:5075686', 'GeneID:5075687',
+        'GeneID:5075688', 'GeneID:5075689', 'GeneID:5075690', 'GeneID:5075691',
+        'GeneID:5075692', 'GeneID:5075693', 'GeneID:5075694', 'GeneID:5075695',
+        'GeneID:5075696', 'GeneID:5075697', 'GeneID:5075698', 'GeneID:5075700',
+        'GeneID:5075701', 'GeneID:5075702', 'GeneID:5075703', 'GeneID:5075704',
+        'GeneID:5075705', 'GeneID:5075707', 'GeneID:5075708', 'GeneID:5075709',
+        'GeneID:5075710', 'GeneID:5075711', 'GeneID:5075712', 'GeneID:5075713',
+        'GeneID:5075714', 'GeneID:5075715', 'GeneID:5075716', 'GeneID:5176189',
+        'GeneID:6803896', 'GeneID:6803915', 'GeneID:7944151', 'GeneID:927334',
+        'GeneID:927335', 'GeneID:927337', 'GeneID:940263', 'GeneID:9538324',
+        'NC_003977.1', 'gi|103485498|ref|NC_008048.1|:1941166-1942314',
+        'gi|108802856|ref|NC_008148.1|:1230231-1230875',
+        'gi|124806686|ref|XM_001350760.1|',
+        'gi|126661648|ref|NZ_AAXW01000149.1|:c1513-1341',
+        'gi|149172845|ref|NZ_ABBW01000029.1|:970-1270',
+        'gi|153883242|ref|NZ_ABDQ01000074.1|:79-541',
+        'gi|167031021|ref|NC_010322.1|:1834668-1835168',
+        'gi|171344510|ref|NZ_ABJO01001391.1|:1-116',
+        'gi|171346813|ref|NZ_ABJO01001728.1|:c109-1',
+        'gi|190640924|ref|NZ_ABRC01000948.1|:c226-44',
+        'gi|223045343|ref|NZ_ACEN01000042.1|:1-336',
+        'gi|224580998|ref|NZ_GG657387.1|:c114607-114002',
+        'gi|224993759|ref|NZ_ACFY01000068.1|:c357-1',
+        'gi|237784637|ref|NC_012704.1|:141000-142970',
+        'gi|237784637|ref|NC_012704.1|:c2048315-2047083',
+        'gi|240136783|ref|NC_012808.1|:1928224-1928961',
+        'gi|255319020|ref|NZ_ACVR01000025.1|:28698-29132',
+        'gi|260590341|ref|NZ_ACEO02000062.1|:c387-151',
+        'gi|262368201|ref|NZ_GG704964.1|:733100-733978',
+        'gi|262369811|ref|NZ_GG704966.1|:c264858-264520',
+        'gi|288559258|ref|NC_013790.1|:448046-451354',
+        'gi|288559258|ref|NC_013790.1|:532047-533942',
+        'gi|294794157|ref|NZ_GG770200.1|:245344-245619',
+        'gi|304372805|ref|NC_014448.1|:444677-445120',
+        'gi|304372805|ref|NC_014448.1|:707516-708268',
+        'gi|304372805|ref|NC_014448.1|:790263-792257',
+        'gi|304372805|ref|NC_014448.1|:c367313-364470',
+        'gi|304372805|ref|NC_014448.1|:c659144-658272',
+        'gi|304372805|ref|NC_014448.1|:c772578-770410',
+        'gi|304372805|ref|NC_014448.1|:c777901-777470',
+        'gi|306477407|ref|NZ_GG770409.1|:c1643877-1643338',
+        'gi|317120849|ref|NC_014831.1|:c891121-890144',
+        'gi|323356441|ref|NZ_GL698442.1|:560-682',
+        'gi|324996766|ref|NZ_BABV01000451.1|:10656-11579',
+        'gi|326579405|ref|NZ_AEGQ01000006.1|:2997-3791',
+        'gi|326579407|ref|NZ_AEGQ01000008.1|:c45210-44497',
+        'gi|326579433|ref|NZ_AEGQ01000034.1|:346-3699',
+        'gi|329889017|ref|NZ_GL883086.1|:586124-586804',
+        'gi|330822653|ref|NC_015422.1|:2024431-2025018',
+        'gi|335053104|ref|NZ_AFIL01000010.1|:c33862-32210',
+        'gi|339304121|ref|NZ_AEOR01000258.1|:c294-1',
+        'gi|339304277|ref|NZ_AEOR01000414.1|:1-812',
+        'gi|342211239|ref|NZ_AFUK01000001.1|:790086-790835',
+        'gi|342211239|ref|NZ_AFUK01000001.1|:c1579497-1578787',
+        'gi|342213707|ref|NZ_AFUJ01000005.1|:48315-48908',
+        'gi|355707189|ref|NZ_JH376566.1|:326756-326986',
+        'gi|355707384|ref|NZ_JH376567.1|:90374-91453',
+        'gi|355707384|ref|NZ_JH376567.1|:c388018-387605',
+        'gi|355708440|ref|NZ_JH376569.1|:c80380-79448',
+        'gi|358051729|ref|NZ_AEUN01000100.1|:c120-1',
+        'gi|365983217|ref|XM_003668394.1|',
+        'gi|377571722|ref|NZ_BAFD01000110.1|:c1267-29',
+        'gi|377684864|ref|NZ_CM001194.1|:c1159954-1159619',
+        'gi|377684864|ref|NZ_CM001194.1|:c4966-4196',
+        'gi|378759497|ref|NZ_AFXE01000152.1|:1628-2215',
+        'gi|378835506|ref|NC_016829.1|:112560-113342',
+        'gi|378835506|ref|NC_016829.1|:114945-115193',
+        'gi|378835506|ref|NC_016829.1|:126414-127151',
+        'gi|378835506|ref|NC_016829.1|:272056-272403',
+        'gi|378835506|ref|NC_016829.1|:272493-272786',
+        'gi|378835506|ref|NC_016829.1|:358647-360863',
+        'gi|378835506|ref|NC_016829.1|:37637-38185',
+        'gi|378835506|ref|NC_016829.1|:60012-60497',
+        'gi|378835506|ref|NC_016829.1|:606819-607427',
+        'gi|378835506|ref|NC_016829.1|:607458-607760',
+        'gi|378835506|ref|NC_016829.1|:826192-826821',
+        'gi|378835506|ref|NC_016829.1|:c451932-451336',
+        'gi|378835506|ref|NC_016829.1|:c460520-459951',
+        'gi|378835506|ref|NC_016829.1|:c483843-482842',
+        'gi|378835506|ref|NC_016829.1|:c544660-543638',
+        'gi|378835506|ref|NC_016829.1|:c556383-555496',
+        'gi|378835506|ref|NC_016829.1|:c632166-631228',
+        'gi|378835506|ref|NC_016829.1|:c805066-802691',
+        'gi|384124469|ref|NC_017160.1|:c2157447-2156863',
+        'gi|385263288|ref|NZ_AJST01000001.1|:594143-594940',
+        'gi|385858114|ref|NC_017519.1|:10252-10746',
+        'gi|385858114|ref|NC_017519.1|:104630-104902',
+        'gi|385858114|ref|NC_017519.1|:154292-156016',
+        'gi|385858114|ref|NC_017519.1|:205158-206462',
+        'gi|385858114|ref|NC_017519.1|:507239-507703',
+        'gi|385858114|ref|NC_017519.1|:518924-519772',
+        'gi|385858114|ref|NC_017519.1|:524712-525545',
+        'gi|385858114|ref|NC_017519.1|:528387-528785',
+        'gi|385858114|ref|NC_017519.1|:532275-533429',
+        'gi|385858114|ref|NC_017519.1|:586402-586824',
+        'gi|385858114|ref|NC_017519.1|:621696-622226',
+        'gi|385858114|ref|NC_017519.1|:673673-676105',
+        'gi|385858114|ref|NC_017519.1|:706602-708218',
+        'gi|385858114|ref|NC_017519.1|:710627-711997',
+        'gi|385858114|ref|NC_017519.1|:744974-745456',
+        'gi|385858114|ref|NC_017519.1|:791055-791801',
+        'gi|385858114|ref|NC_017519.1|:805643-807430',
+        'gi|385858114|ref|NC_017519.1|:c172050-170809',
+        'gi|385858114|ref|NC_017519.1|:c334545-333268',
+        'gi|385858114|ref|NC_017519.1|:c383474-383202',
+        'gi|385858114|ref|NC_017519.1|:c450880-450389',
+        'gi|385858114|ref|NC_017519.1|:c451975-451001',
+        'gi|385858114|ref|NC_017519.1|:c470488-470036',
+        'gi|385858114|ref|NC_017519.1|:c485596-484598',
+        'gi|385858114|ref|NC_017519.1|:c58658-58065',
+        'gi|385858114|ref|NC_017519.1|:c592754-591081',
+        'gi|385858114|ref|NC_017519.1|:c59590-58820',
+        'gi|385858114|ref|NC_017519.1|:c601339-600575',
+        'gi|385858114|ref|NC_017519.1|:c76080-75160',
+        'gi|385858114|ref|NC_017519.1|:c97777-96302',
+        'gi|391227518|ref|NZ_CM001514.1|:c1442504-1440237',
+        'gi|391227518|ref|NZ_CM001514.1|:c3053472-3053023',
+        'gi|394749766|ref|NZ_AHHC01000069.1|:3978-6176',
+        'gi|398899615|ref|NZ_AKJK01000021.1|:28532-29209',
+        'gi|406580057|ref|NZ_AJRD01000017.1|:c17130-15766',
+        'gi|406584668|ref|NZ_AJQZ01000017.1|:c1397-771',
+        'gi|408543458|ref|NZ_AJLO01000024.1|:67702-68304',
+        'gi|410936685|ref|NZ_AJRF02000012.1|:21785-22696',
+        'gi|41406098|ref|NC_002944.2|:c4468304-4467864',
+        'gi|416998679|ref|NZ_AEXI01000003.1|:c562937-562176',
+        'gi|417017738|ref|NZ_AEYL01000489.1|:c111-1',
+        'gi|417018375|ref|NZ_AEYL01000508.1|:100-238',
+        'gi|418576506|ref|NZ_AHKB01000025.1|:c7989-7669',
+        'gi|419819595|ref|NZ_AJRE01000517.1|:1-118',
+        'gi|421806549|ref|NZ_AMTB01000006.1|:c181247-180489',
+        'gi|422320815|ref|NZ_GL636045.1|:28704-29048',
+        'gi|422320874|ref|NZ_GL636046.1|:4984-5742',
+        'gi|422323244|ref|NZ_GL636061.1|:479975-480520',
+        'gi|422443048|ref|NZ_GL383112.1|:663738-664823',
+        'gi|422552858|ref|NZ_GL383469.1|:c216727-215501',
+        'gi|422859491|ref|NZ_GL878548.1|:c271832-271695',
+        'gi|423012810|ref|NZ_GL982453.1|:3888672-3888935',
+        'gi|423012810|ref|NZ_GL982453.1|:4541873-4542328',
+        'gi|423012810|ref|NZ_GL982453.1|:c2189976-2188582',
+        'gi|423012810|ref|NZ_GL982453.1|:c5471232-5470300',
+        'gi|423262555|ref|NC_019552.1|:24703-25212',
+        'gi|423262555|ref|NC_019552.1|:28306-30696',
+        'gi|423262555|ref|NC_019552.1|:284252-284581',
+        'gi|423262555|ref|NC_019552.1|:311161-311373',
+        'gi|423262555|ref|NC_019552.1|:32707-34497',
+        'gi|423262555|ref|NC_019552.1|:34497-35237',
+        'gi|423262555|ref|NC_019552.1|:53691-56813',
+        'gi|423262555|ref|NC_019552.1|:c388986-386611',
+        'gi|423262555|ref|NC_019552.1|:c523106-522528',
+        'gi|423689090|ref|NZ_CM001513.1|:c1700632-1699448',
+        'gi|423689090|ref|NZ_CM001513.1|:c1701670-1700651',
+        'gi|423689090|ref|NZ_CM001513.1|:c5739118-5738390',
+        'gi|427395956|ref|NZ_JH992914.1|:c592682-591900',
+        'gi|427407324|ref|NZ_JH992904.1|:c2681223-2679463',
+        'gi|451952303|ref|NZ_AJRB03000021.1|:1041-1574',
+        'gi|452231579|ref|NZ_AEKA01000123.1|:c18076-16676',
+        'gi|459791914|ref|NZ_CM001824.1|:c899379-899239',
+        'gi|471265562|ref|NC_020815.1|:3155799-3156695',
+        'gi|472279780|ref|NZ_ALPV02000001.1|:33911-36751',
+        'gi|482733945|ref|NZ_AHGZ01000071.1|:10408-11154',
+        'gi|483051300|ref|NZ_ALYK02000034.1|:c37582-36650',
+        'gi|483051300|ref|NZ_ALYK02000034.1|:c38037-37582',
+        'gi|483993347|ref|NZ_AMXG01000045.1|:251724-253082',
+        'gi|484100856|ref|NZ_JH670250.1|:600643-602949',
+        'gi|484115941|ref|NZ_AJXG01000093.1|:567-947',
+        'gi|484228609|ref|NZ_JH730929.1|:c103784-99021',
+        'gi|484228797|ref|NZ_JH730960.1|:c16193-12429',
+        'gi|484228814|ref|NZ_JH730962.1|:c29706-29260',
+        'gi|484228929|ref|NZ_JH730981.1|:18645-22060',
+        'gi|484228939|ref|NZ_JH730983.1|:42943-43860',
+        'gi|484266598|ref|NZ_AKGC01000024.1|:118869-119636',
+        'gi|484327375|ref|NZ_AKVP01000093.1|:1-1281',
+        'gi|484328234|ref|NZ_AKVP01000127.1|:c325-110',
+        'gi|487376144|ref|NZ_KB911257.1|:600445-601482',
+        'gi|487376194|ref|NZ_KB911260.1|:146228-146533',
+        'gi|487381776|ref|NZ_KB911485.1|:101242-103083',
+        'gi|487381776|ref|NZ_KB911485.1|:c32472-31627',
+        'gi|487381800|ref|NZ_KB911486.1|:39414-39872',
+        'gi|487381828|ref|NZ_KB911487.1|:15689-17026',
+        'gi|487381846|ref|NZ_KB911488.1|:13678-13821',
+        'gi|487382089|ref|NZ_KB911497.1|:23810-26641',
+        'gi|487382176|ref|NZ_KB911501.1|:c497-381',
+        'gi|487382213|ref|NZ_KB911502.1|:12706-13119',
+        'gi|487382247|ref|NZ_KB911505.1|:c7595-6663',
+        'gi|490551798|ref|NZ_AORG01000011.1|:40110-41390',
+        'gi|491099398|ref|NZ_KB849654.1|:c720460-719912',
+        'gi|491124812|ref|NZ_KB849705.1|:1946500-1946937',
+        'gi|491155563|ref|NZ_KB849732.1|:46469-46843',
+        'gi|491155563|ref|NZ_KB849732.1|:46840-47181',
+        'gi|491155563|ref|NZ_KB849732.1|:47165-48616',
+        'gi|491155563|ref|NZ_KB849732.1|:55055-56662',
+        'gi|491155563|ref|NZ_KB849732.1|:56662-57351',
+        'gi|491155563|ref|NZ_KB849732.1|:6101-7588',
+        'gi|491155563|ref|NZ_KB849732.1|:7657-8073',
+        'gi|491349766|ref|NZ_KB850082.1|:441-941',
+        'gi|491395079|ref|NZ_KB850142.1|:1461751-1462554',
+        'gi|512608407|ref|NZ_KE150401.1|:c156891-156016',
+        'gi|518653462|ref|NZ_ATLM01000004.1|:c89669-89247',
+        'gi|520818261|ref|NZ_ATLQ01000015.1|:480744-481463',
+        'gi|520822538|ref|NZ_ATLQ01000063.1|:103173-103283',
+        'gi|520826510|ref|NZ_ATLQ01000092.1|:c13892-13563',
+        'gi|544644736|ref|NZ_KE747865.1|:68388-69722',
+        'gi|545347918|ref|NZ_KE952096.1|:c83873-81831',
+        'gi|550735774|gb|AXMM01000002.1|:c743886-743575',
+        'gi|552875787|ref|NZ_KI515684.1|:c584270-583890',
+        'gi|552876418|ref|NZ_KI515685.1|:36713-37258',
+        'gi|552876418|ref|NZ_KI515685.1|:432422-433465',
+        'gi|552876418|ref|NZ_KI515685.1|:c1014617-1014117',
+        'gi|552876418|ref|NZ_KI515685.1|:c931935-931327',
+        'gi|552876815|ref|NZ_KI515686.1|:613740-614315',
+        'gi|552879811|ref|NZ_AXME01000001.1|:1146402-1146932',
+        'gi|552879811|ref|NZ_AXME01000001.1|:40840-41742',
+        'gi|552879811|ref|NZ_AXME01000001.1|:49241-49654',
+        'gi|552891898|ref|NZ_AXMG01000001.1|:99114-99290',
+        'gi|552891898|ref|NZ_AXMG01000001.1|:c1460921-1460529',
+        'gi|552895565|ref|NZ_AXMI01000001.1|:619555-620031',
+        'gi|552895565|ref|NZ_AXMI01000001.1|:c14352-13837',
+        'gi|552896371|ref|NZ_AXMI01000002.1|:c148595-146280',
+        'gi|552897201|ref|NZ_AXMI01000004.1|:c231437-230883',
+        'gi|552902020|ref|NZ_AXMK01000001.1|:c1625038-1624022',
+        'gi|556346902|ref|NZ_KI535485.1|:c828278-827901',
+        'gi|556478613|ref|NZ_KI535633.1|:3529392-3530162',
+        'gi|560534311|ref|NZ_AYSF01000111.1|:26758-29049',
+        'gi|564165687|gb|AYLX01000355.1|:10906-11166',
+        'gi|564169776|gb|AYLX01000156.1|:1-185',
+        'gi|564938696|gb|AWYH01000018.1|:c75674-75039', 'gi|67993724|ref|XM_664440.1|',
+        'gi|68059117|ref|XM_666447.1|', 'gi|68062389|ref|XM_668109.1|',
+        'gi|71730848|gb|AAAM03000019.1|:c14289-12877', 'gi|82753723|ref|XM_722699.1|',
+        'gi|82775382|ref|NC_007606.1|:2249487-2250014', 'gi|82793634|ref|XM_723027.1|'
+        ])
+
+tax_units = "kpcofgst"
+
+if float(sys.version_info[0]) < 3.0:
+    def read_and_split( ofn  ):
+        return (l.strip().split('\t') for l in ofn)
+    def read_and_split_line( line ):
+        return line.strip().split('\t')
+else:
+    def read_and_split( ofn ):
+        return (str(l,encoding='utf-8').strip().split('\t') for l in ofn)
+    def read_and_split_line( line ):
+        return str(line,encoding='utf-8').strip().split('\t')
+
+
+def plain_read_and_split( ofn ):
+    return (l.strip().split('\t') for l in ofn)
+
+def plain_read_and_split_line( l ):
+    return l.strip().split('\t')
+
+
+
+if float(sys.version_info[0]) < 3.0:
+    def mybytes( val ):
+        return val
+else:
+    def mybytes( val ):
+        return bytes(val,encoding='utf-8')
+    
+# get the directory that contains this script
+metaphlan2_script_install_folder=os.path.dirname(os.path.abspath(__file__))
+
+def read_params(args):
+    p = ap.ArgumentParser( description= 
+            "DESCRIPTION\n"
+            " MetaPhlAn version "+__version__+" ("+__date__+"): \n"
+            " METAgenomic PHyLogenetic ANalysis for metagenomic taxonomic profiling.\n\n"
+            "AUTHORS: "+__author__+"\n\n"
+            "COMMON COMMANDS\n\n"
+            " We assume here that metaphlan2.py is in the system path and that the mpa_dir bash variable contains the\n"
+            " main MetaPhlAn folder. BowTie2 should also be in the system path with execution and read\n"
+            " permissions, and Perl should be installed.\n\n"
+           
+            "\n========== MetaPhlAn 2 clade-abundance estimation ================= \n\n"
+            "The basic usage of MetaPhlAn 2 consists of identifying the clades (from phyla to species, and \n"
+            "strains in particular cases) present in the metagenome obtained from a microbiome sample, and their \n"
+            "relative abundances. This corresponds to the default analysis type (-t rel_ab).\n\n"
+
+            "*  Profiling a metagenome from raw reads:\n"
+            "$ metaphlan2.py metagenome.fastq --input_type fastq\n\n"
+            
+            "*  You can take advantage of multiple CPUs and save the intermediate BowTie2 output for re-running\n"
+            "   MetaPhlAn extremely quickly:\n"
+            "$ metaphlan2.py metagenome.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq\n\n"
+            
+            "*  If you already mapped your metagenome against the marker DB (using a previous MetaPhlAn run), you\n"
+            "   can obtain the results in a few seconds by using the previously saved --bowtie2out file and \n"
+            "   specifying the input (--input_type bowtie2out):\n"
+            "$ metaphlan2.py metagenome.bowtie2.bz2 --nproc 5 --input_type bowtie2out\n\n"
+            
+            "*  You can also provide an externally BowTie2-mapped SAM if you specify this format with \n"
+            "   --input_type. Two steps: first apply BowTie2 and then feed MetaPhlAn2 with the obtained sam:\n"
+            "$ bowtie2 --sam-no-hd --sam-no-sq --no-unal --very-sensitive -S metagenome.sam -x ${mpa_dir}/db_v20/mpa_v20_m200 -U metagenome.fastq\n"
+            "$ metaphlan2.py metagenome.sam --input_type sam > profiled_metagenome.txt\n\n"
+            
+            "*  Multiple alternative ways to pass the input are also available:\n"
+            "$ cat metagenome.fastq | metaphlan2.py --input_type fastq \n"
+            "$ tar xjf metagenome.tar.bz2 --to-stdout | metaphlan2.py --input_type fastq \n"
+            "$ metaphlan2.py --input_type fastq < metagenome.fastq\n"
+            "$ metaphlan2.py --input_type fastq <(bzcat metagenome.fastq.bz2)\n"
+            "$ metaphlan2.py --input_type fastq <(zcat metagenome_1.fastq.gz metagenome_2.fastq.gz)\n\n"
+
+            "*  We can also natively handle paired-end metagenomes and, more generally, metagenomes stored in \n"
+            "   multiple files (in this case the --bowtie2out parameter must be specified):\n"
+            "$ metaphlan2.py metagenome_1.fastq,metagenome_2.fastq --bowtie2out metagenome.bowtie2.bz2 --nproc 5 --input_type fastq\n\n"
+            "\n------------------------------------------------------------------- \n \n\n"
+        
+            
+            "\n========== MetaPhlAn 2 strain tracking ============================ \n\n"
+            "MetaPhlAn 2 introduces the capability of characterizing organisms at the strain level using\n"
+            "non-aggregated marker information. This capability comes in several slightly different flavours and \n"
+            "provides a way to perform strain tracking and comparison across multiple samples.\n"
+            "Usually, MetaPhlAn 2 is first run with the default analysis type (-t rel_ab) to profile the species\n"
+            "present in the community, and then strain-level profiling can be performed to zoom in on specific\n"
+            "species of interest. This operation is quick because it reuses the --bowtie2out intermediate \n"
+            "file saved during the execution of the default analysis type.\n\n"
+           
+            "*  The following command will output the abundance of each marker with an RPK (reads per kilobase) \n"
+            "   higher than 0.0 (we are assuming that metagenome_outfmt.bz2 is a --bowtie2out file generated\n"
+            "   as shown above).\n"
+            "$ metaphlan2.py -t marker_ab_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n"
+            "   The obtained RPK can be optionally normalized by the total number of reads in the metagenome \n"
+            "   to guarantee fair comparisons of abundances across samples. The number of reads in the metagenome\n"
+            "   needs to be passed with the '--nreads' argument\n\n"
+
+            "*  The list of markers present in the sample can be obtained with '-t marker_pres_table'\n"
+            "$ metaphlan2.py -t marker_pres_table metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n"
+            "   The --pres_th argument (default 1.0) sets the minimum RPK value to consider a marker present\n\n"
+            
+            "*  The '-t clade_profiles' analysis type reports the same information as '-t marker_ab_table'\n"
+            "   but the markers are reported on a clade-by-clade basis.\n"
+            "$ metaphlan2.py -t clade_profiles metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n\n"
+            
+            "*  Finally, to obtain all markers present for a specific clade and all its subclades, the \n"
+            "   '-t clade_specific_strain_tracker' analysis type should be used. For example, the following command\n"
+            "   reports the presence/absence of the markers for the B. fragilis species and its strains;\n"
+            "   the optional argument --min_ab specifies the minimum clade abundance for reporting the markers\n\n"
+            "$ metaphlan2.py -t clade_specific_strain_tracker --clade s__Bacteroides_fragilis metagenome_outfmt.bz2 --input_type bowtie2out > marker_abundance_table.txt\n"
+            
+            "\n------------------------------------------------------------------- \n\n"
+            "",
+            formatter_class=ap.RawTextHelpFormatter,
+            add_help=False )
+    arg = p.add_argument
+
+    arg( 'inp', metavar='INPUT_FILE', type=str, nargs='?', default=None, help= 
+         "the input file can be:\n"
+         "* a fastq file containing metagenomic reads\n"
+         "OR\n"
+         "* a BowTie2 produced SAM file. \n"
+         "OR\n"
+         "* an intermediary mapping file of the metagenome generated by a previous MetaPhlAn run \n"
+         "If the input file is missing, the script assumes that the input is provided using the standard \n"
+         "input, or named pipes.\n"
+         "IMPORTANT: the type of input needs to be specified with --input_type" )   
+    
+    arg( 'output', metavar='OUTPUT_FILE', type=str, nargs='?', default=None,
+         help= "the tab-separated output file of the predicted taxon relative abundances \n"
+               "[stdout if not present]")
+
+
+    g = p.add_argument_group('Required arguments')
+    arg = g.add_argument
+    input_type_choices = ['fastq','fasta','multifasta','multifastq','bowtie2out','sam'] # !!!!
+    arg( '--input_type', choices=input_type_choices, required=True, help =  
+         "set whether the input is a file of metagenomic reads (fastq, fasta, multifasta, multifastq),\n"
+         "the SAM file of the mapping of the reads against the MetaPhlAn db (sam), or the intermediate\n"
+         "mapping file saved by a previous run with --bowtie2out (bowtie2out).\n"
+         "[required: the input format is not guessed automatically]\n" )
+   
+    g = p.add_argument_group('Mapping arguments')
+    arg = g.add_argument
+    arg( '--mpa_pkl', type=str,
+         default=os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200.pkl"), 
+         help = "the metadata pickled MetaPhlAn file")
+    arg( '--bowtie2db', metavar="METAPHLAN_BOWTIE2_DB", type=str,
+         default = os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200"),
+         help = "The BowTie2 database file of the MetaPhlAn database. \n"
+                "Used if --input_type is fastq, fasta, multifasta, or multifastq")
+    bt2ps = ['sensitive','very-sensitive','sensitive-local','very-sensitive-local']
+    arg( '--bt2_ps', metavar="BowTie2 presets", default='very-sensitive', choices=bt2ps,
+         help = "preset options for BowTie2 (applied only when reads are given as input, i.e. not with\n"
+                "--input_type bowtie2out or sam)\n"
+                "The choices enabled in MetaPhlAn are:\n"
+                " * sensitive\n"
+                " * very-sensitive\n"
+                " * sensitive-local\n"
+                " * very-sensitive-local\n"
+                "[default very-sensitive]\n"   )
+    arg( '--bowtie2_exe', type=str, default = None, help =
+         'Full path and name of the BowTie2 executable. This option allows \n'
+         'MetaPhlAn to reach the executable even when it is not in the system \n'
+         'PATH or the system PATH is unreachable\n' )
+    arg( '--bowtie2out', metavar="FILE_NAME", type=str, default = None, help = 
+         "The file for saving the output of BowTie2\n" )
+    arg( '--no_map', action='store_true', help=
+         "Avoid storing the --bowtie2out map file\n" )
+    arg( '--tmp_dir', metavar="", default=None, type=str, help = 
+         "the folder used to store temporary files \n"
+         "[default is the OS dependent tmp dir]\n"   )
+    
+    
+    g = p.add_argument_group('Post-mapping arguments')
+    arg = g.add_argument
+    stat_choices = ['avg_g','avg_l','tavg_g','tavg_l','wavg_g','wavg_l','med']
+    arg( '--tax_lev', metavar='TAXONOMIC_LEVEL', type=str, 
+         choices='a'+tax_units, default='a', help = 
+         "The taxonomic level for the relative abundance output:\n"
+         "'a' : all taxonomic levels\n"
+         "'k' : kingdoms\n"
+         "'p' : phyla only\n"
+         "'c' : classes only\n"
+         "'o' : orders only\n"
+         "'f' : families only\n"
+         "'g' : genera only\n"
+         "'s' : species only\n"
+         "[default 'a']" )
+    arg( '--min_cu_len', metavar="", default=2000, type=int, help =
+         "minimum total nucleotide length for the markers in a clade for\n"
+         "estimating the abundance without considering sub-clade abundances\n"
+         "[default 2000]\n"   )
+    arg( '--min_alignment_len', metavar="", default=None, type=int, help =
+         "The sam records for aligned reads with the longest subalignment\n"
+         "length smaller than this threshold will be discarded.\n"
+         "[default None]\n"   )
+    arg( '--ignore_viruses', action='store_true', help=
+         "Do not profile viral organisms" )
+    arg( '--ignore_eukaryotes', action='store_true', help=
+         "Do not profile eukaryotic organisms" )
+    arg( '--ignore_bacteria', action='store_true', help=
+         "Do not profile bacterial organisms" )
+    arg( '--ignore_archaea', action='store_true', help=
+         "Do not profile archaeal organisms" )
+    arg( '--stat_q', metavar="", type = float, default=0.1, help = 
+         "Quantile value for the robust average\n"
+         "[default 0.1]"   )
+    arg( '--ignore_markers', type=str, default = None, help = 
+         "File containing a list of markers to ignore. \n")
+    arg( '--avoid_disqm', action="store_true", help = 
+         "Deactivate the procedure of disambiguating the quasi-markers based on the \n"
+         "marker abundance pattern found in the sample. It is generally recommended \n"
+         "to keep the disambiguation procedure in order to minimize false positives\n")
+    arg( '--stat', metavar="", choices=stat_choices, default="tavg_g", type=str, help = 
+         "EXPERIMENTAL! Statistical approach for converting marker abundances into clade abundances\n"
+         "'avg_g'  : clade global (i.e. normalizing all markers together) average\n"
+         "'avg_l'  : average of length-normalized marker counts\n"
+         "'tavg_g' : truncated clade global average at --stat_q quantile\n"
+         "'tavg_l' : truncated average of length-normalized marker counts (at --stat_q)\n"
+         "'wavg_g' : winsorized clade global average (at --stat_q)\n"
+         "'wavg_l' : winsorized average of length-normalized marker counts (at --stat_q)\n"
+         "'med'    : median of length-normalized marker counts\n"
+         "[default tavg_g]"   ) 
+    
+    arg = p.add_argument
+
+
+    
+    g = p.add_argument_group('Additional analysis types and arguments')
+    arg = g.add_argument
+    analysis_types = ['rel_ab', 'rel_ab_w_read_stats', 'reads_map', 'clade_profiles', 'marker_ab_table', 'marker_counts', 'marker_pres_table', 'clade_specific_strain_tracker']
+    arg( '-t', metavar='ANALYSIS TYPE', type=str, choices = analysis_types, 
+         default='rel_ab', help = 
+         "Type of analysis to perform: \n"
+         " * rel_ab: profiling a metagenome in terms of relative abundances\n"
+         " * rel_ab_w_read_stats: profiling a metagenome in terms of relative abundances and estimating the number of reads coming from each clade.\n"
+         " * reads_map: mapping from reads to clades (only reads hitting a marker)\n"
+         " * clade_profiles: normalized marker counts for clades with at least a non-null marker\n"
+         " * marker_ab_table: normalized marker counts (only when > 0.0 and normalized by metagenome size if --nreads is specified)\n"
+         " * marker_counts: non-normalized marker counts [use with extreme caution]\n"
+         " * marker_pres_table: list of markers present in the sample (threshold at 1.0 if not differently specified with --pres_th)\n"
+         " * clade_specific_strain_tracker: list of markers present for the clade given with --clade and all its subclades\n"
+         "[default 'rel_ab']" )
+    arg( '--nreads', metavar="NUMBER_OF_READS", type=int, default = None, help =
+         "The total number of reads in the original metagenome. It is used only when \n"
+         "-t marker_ab_table is specified for normalizing the length-normalized counts \n"
+         "with the metagenome size as well. No normalization applied if --nreads is not \n"
+         "specified" )
+    arg( '--pres_th', metavar="PRESENCE_THRESHOLD", type=float, default = 1.0, help =
+         'Threshold for calling a marker present by the -t marker_pres_table option' )
+    arg( '--clade', metavar="", default=None, type=str, help = 
+         "The clade for clade_specific_strain_tracker analysis\n"  )
+    arg( '--min_ab', metavar="", default=0.1, type=float, help = 
+         "The minimum percentage abundance for the clade in the clade_specific_strain_tracker analysis\n"  )
+    arg( "-h", "--help", action="help", help="show this help message and exit")
+
+    g = p.add_argument_group('Output arguments')
+    arg = g.add_argument
+    arg( '-o', '--output_file',  metavar="output file", type=str, default=None, help = 
+         "The output file (if not specified as positional argument)\n")
+    arg('--sample_id_key',  metavar="name", type=str, default="#SampleID", 
+        help =("Specify the sample ID key for this analysis."
+               " Defaults to '#SampleID'."))
+    arg('--sample_id',  metavar="value", type=str, 
+        default="Metaphlan2_Analysis",
+        help =("Specify the sample ID for this analysis."
+               " Defaults to 'Metaphlan2_Analysis'."))
+    arg( '-s', '--samout', metavar="sam_output_file",
+        type=str, default=None, help="The sam output file\n")
+    #*************************************************************
+    #* Parameters related to biom file generation                *
+    #*************************************************************         
+    arg( '--biom', '--biom_output_file',  metavar="biom_output", type=str, default=None, help = 
+         "If requesting biom file output: The name of the output file in biom format \n")
+
+    arg( '--mdelim', '--metadata_delimiter_char',  metavar="mdelim", type=str, default="|", help = 
+         "Delimiter for bug metadata (default: pipe, e.g. the pipe in k__Bacteria|p__Proteobacteria)\n")
+    #*************************************************************
+    #* End parameters related to biom file generation            *
+    #*************************************************************    
+    
+    g = p.add_argument_group('Other arguments')
+    arg = g.add_argument
+    arg( '--nproc', metavar="N", type=int, default=1, help = 
+         "The number of CPUs to use for parallelizing the mapping\n"
+         "[default 1, i.e. no parallelism]\n" ) 
+    arg( '-v','--version', action='version', version="MetaPhlAn version "+__version__+"\t("+__date__+")",
+         help="Print the current MetaPhlAn version and exit\n" )
+    
+
+    return vars(p.parse_args()) 
+
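+# Illustrative sketch (hypothetical helper, not part of upstream MetaPhlAn 2):
+# one reading of the 'tavg_g' statistic from the --stat help above, i.e. a
+# clade-global average (total reads / total marker length) computed after
+# dropping the --stat_q fraction of markers at each tail of the sorted
+# reads-per-length values. The actual computation lives in TaxClade and may
+# differ in details. Assumes every marker length is positive.
+def truncated_global_avg( markers, quantile = 0.1 ):
+    # markers: list of (marker_length, n_reads) pairs for a single clade
+    ranked = sorted(markers, key=lambda m: float(m[1]) / m[0])
+    # number of markers trimmed from EACH tail
+    trim = int(quantile * len(ranked))
+    kept = ranked[trim:len(ranked) - trim] if trim else ranked
+    total_len = sum(length for length, _ in kept)
+    total_reads = sum(nreads for _, nreads in kept)
+    return float(total_reads) / total_len if total_len else 0.0
+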
+def run_bowtie2(  fna_in, outfmt6_out, bowtie2_db, preset, nproc, 
+                  file_format = "multifasta", exe = None, 
+                  samout = None,
+                  min_alignment_len = None,
+                  ):
+    try:
+        if not fna_in: # or stat.S_ISFIFO(os.stat(fna_in).st_mode):
+            fna_in = "-"
+        bowtie2_cmd = [ exe if exe else 'bowtie2', 
+                        "--quiet", "--no-unal", 
+                        "--"+preset,
+                        "-S","-",
+                        "-x", bowtie2_db,
+                         ] + ([] if int(nproc) < 2 else ["-p",str(nproc)])
+        bowtie2_cmd += ["-U", fna_in] # if not stat.S_ISFIFO(os.stat(fna_in).st_mode) else []
+        bowtie2_cmd += (["-f"] if file_format == "multifasta" else []) 
+        p = subp.Popen( bowtie2_cmd, stdout=subp.PIPE ) 
+        lmybytes, outf = (mybytes,bz2.BZ2File(outfmt6_out, "w")) if outfmt6_out.endswith(".bz2") else (str,open( outfmt6_out, "w" ))
+        
+        try:
+            if samout:
+                if samout.endswith('.bz2'):
+                    sam_file = bz2.BZ2File(samout, 'w')
+                else:
+                    # binary mode: under Python 3, p.stdout yields bytes
+                    sam_file = open(samout, 'wb')
+        except IOError:
+            sys.stderr.write( "IOError: Unable to open sam output file.\n" )
+            sys.exit(1)
+
+        for line in p.stdout:
+            if samout:
+                sam_file.write(line)
+            o = read_and_split_line(line)
+            # skip SAM header lines (checked on the decoded text, since under
+            # Python 3 line[0] is an int) and reads with no reference ('*')
+            if not o[0].startswith('@') and o[2][-1] != '*':
+                if min_alignment_len is None\
+                    or max([int(x.strip('M')) for x in\
+                            re.findall(r'(\d+M)', o[5])]) >= min_alignment_len:
+                    outf.write( lmybytes("\t".join([o[0],o[2]]) +"\n") )
+        #if  float(sys.version_info[0]) >= 3: 
+        #    for o in read_and_split(p.stdout):
+        #        if o[2][-1] != '*':
+        #            outf.write( bytes("\t".join([o[0],o[2]]) +"\n",encoding='utf-8') )
+        #else:
+        #    for o in read_and_split(p.stdout):
+        #        if o[2][-1] != '*':
+        #            outf.write( "\t".join([o[0],o[2]]) +"\n" )
+        outf.close()
+        if samout:
+            sam_file.close()
+        p.wait()
+
+
+    except OSError:
+        sys.stderr.write( "OSError: fatal error running BowTie2. Is BowTie2 in the system path?\n" )
+        sys.exit(1)
+    except ValueError:
+        sys.stderr.write( "ValueError: fatal error running BowTie2.\n" )
+        sys.exit(1)
+    except IOError:
+        sys.stderr.write( "IOError: fatal error running BowTie2.\n" )
+        sys.exit(1)
+    if p.returncode == 13:
+        sys.stderr.write( "Permission Denied Error: fatal error running BowTie2. " 
+          "Is the BowTie2 file in the path with execution and read permissions?\n" )
+        sys.exit(1)
+    elif p.returncode != 0:
+        sys.stderr.write("Error while running bowtie2.\n")
+        sys.exit(1)
+
+#def guess_input_format( inp_file ):
+#    if "," in inp_file:
+#        sys.stderr.write( "Sorry, I cannot guess the format of the input, when "
+#                          "more than one file is specified. Please set the --input_type parameter \n" )
+#        sys.exit(1) 
+#
+#    with open( inp_file ) as inpf:
+#        for i,l in enumerate(inpf):
+#            line = l.strip()
+#            if line[0] == '#': continue
+#            if line[0] == '>': return 'multifasta'
+#            if line[0] == '@': return 'multifastq'
+#            if len(l.split('\t')) == 2: return 'bowtie2out'
+#            if i > 20: break
+#    return None
+
+class TaxClade:
+    min_cu_len = -1
+    markers2lens = None
+    stat = None
+    quantile = None
+    avoid_disqm = False
+
+    def __init__( self, name, uncl = False, id_int = 0 ):
+        self.children, self.markers2nreads = {}, {}
+        self.name, self.father = name, None
+        self.uncl, self.subcl_uncl = uncl, False
+        self.abundance, self.uncl_abundance = None, 0 
+        self.id = id_int
+
+    def add_child( self, name, id_int ):
+        new_clade = TaxClade( name, id_int=id_int )
+        self.children[name] = new_clade
+        new_clade.father = self
+        return new_clade
+
+    
+    def get_terminals( self ):
+        terms = []
+        if not self.children:
+            return [self]
+        for c in self.children.values():
+            terms += c.get_terminals()
+        return terms
+
+
+    def get_full_name( self ):
+        fullname = [self.name]
+        cl = self.father
+        while cl:
+            fullname = [cl.name] + fullname
+            cl = cl.father
+        return "|".join(fullname[1:])
+
+    def get_normalized_counts( self ):
+        return [(m,float(n)*1000.0/self.markers2lens[m]) 
+                    for m,n in self.markers2nreads.items()]
+
+    def compute_abundance( self ):
+        if self.abundance is not None: return self.abundance
+        sum_ab = sum([c.compute_abundance() for c in self.children.values()]) 
+        rat_nreads = sorted([(self.markers2lens[m],n) 
+                                    for m,n in self.markers2nreads.items()],
+                                            key = lambda x: x[1])
+
+        rat_nreads, removed = [], []
+        for m,n in self.markers2nreads.items():
+            misidentified = False
+
+            if not self.avoid_disqm:
+                for e in self.markers2exts[m]:
+                    toclade = self.taxa2clades[e]
+                    m2nr = toclade.markers2nreads
+                    tocladetmp = toclade
+                    while len(tocladetmp.children) == 1:
+                        tocladetmp = list(tocladetmp.children.values())[0]
+                        m2nr = tocladetmp.markers2nreads
+    
+                    nonzeros = sum([v>0 for v in m2nr.values()])
+                    if len(m2nr):
+                        if float(nonzeros) / len(m2nr) > 0.33:
+                            misidentified = True
+                            removed.append( (self.markers2lens[m],n) )
+                            break
+            if not misidentified:
+                rat_nreads.append( (self.markers2lens[m],n) ) 
+       
+        if not self.avoid_disqm and len(removed):
+            n_rat_nreads = float(len(rat_nreads))
+            n_removed = float(len(removed))
+            n_tot = n_rat_nreads + n_removed
+            n_ripr = 10
+            
+            if len(self.get_terminals()) < 2:
+                n_ripr = 0
+
+            if "k__Viruses" in self.get_full_name():
+                n_ripr = 0
+
+            if n_rat_nreads < n_ripr and n_tot > n_rat_nreads:
+                rat_nreads += removed[:n_ripr-int(n_rat_nreads)]
+
+        
+        rat_nreads = sorted(rat_nreads, key = lambda x: x[1])
+
+        rat_v,nreads_v = zip(*rat_nreads) if rat_nreads else ([],[])
+        rat, nrawreads, loc_ab = float(sum(rat_v)) or -1.0, sum(nreads_v), 0.0
+        quant = int(self.quantile*len(rat_nreads))
+        ql,qr,qn = (quant,-quant,quant) if quant else (None,None,0)
+     
+        if self.name[0] == 't' and (len(self.father.children) > 1 or "_sp" in self.father.name or "k__Viruses" in self.get_full_name()):
+            non_zeros = float(len([n for r,n in rat_nreads if n > 0])) 
+            nreads = float(len(rat_nreads))
+            if nreads == 0.0 or non_zeros / nreads < 0.7:
+                self.abundance = 0.0
+                return 0.0
+
+        if rat < 0.0:
+            pass
+        elif self.stat == 'avg_g' or (not qn and self.stat in ['wavg_g','tavg_g']):
+            loc_ab = nrawreads / rat if rat >= 0 else 0.0
+        elif self.stat == 'avg_l' or (not qn and self.stat in ['wavg_l','tavg_l']):
+            loc_ab = np.mean([float(n)/r for r,n in rat_nreads]) 
+        elif self.stat == 'tavg_g':
+            wnreads = sorted([(float(n)/r,r,n) for r,n in rat_nreads], key=lambda x:x[0])
+            den,num = zip(*[v[1:] for v in wnreads[ql:qr]])
+            loc_ab = float(sum(num))/float(sum(den)) if any(den) else 0.0
+        elif self.stat == 'tavg_l':
+            loc_ab = np.mean(sorted([float(n)/r for r,n in rat_nreads])[ql:qr])
+        elif self.stat == 'wavg_g':
+            vmin, vmax = nreads_v[ql], nreads_v[qr]
+            wnreads = [vmin]*qn+list(nreads_v[ql:qr])+[vmax]*qn
+            loc_ab = float(sum(wnreads)) / rat  
+        elif self.stat == 'wavg_l':
+            wnreads = sorted([float(n)/r for r,n in rat_nreads])
+            vmin, vmax = wnreads[ql], wnreads[qr]
+            wnreads = [vmin]*qn+list(wnreads[ql:qr])+[vmax]*qn
+            loc_ab = np.mean(wnreads) 
+        elif self.stat == 'med':
+            loc_ab = np.median(sorted([float(n)/r for r,n in rat_nreads])[ql:qr]) 
+        
+        self.abundance = loc_ab
+        if rat < self.min_cu_len and self.children:
+            self.abundance = sum_ab
+        elif loc_ab < sum_ab:
+            self.abundance = sum_ab
+
+        if self.abundance > sum_ab and self.children: # *1.1??
+            self.uncl_abundance = self.abundance - sum_ab
+        self.subcl_uncl = not self.children and self.name[0] not in tax_units[-2:] 
+
+        return self.abundance
+
+    def get_all_abundances( self ):
+        ret = [(self.name,self.abundance)]
+        if self.uncl_abundance > 0.0:
+            lchild = list(self.children.values())[0].name[:3]
+            ret += [(lchild+self.name[3:]+"_unclassified",self.uncl_abundance)]
+        if self.subcl_uncl and self.name[0] != tax_units[-2]:
+            cind = tax_units.index( self.name[0] )
+            ret += [(   tax_units[cind+1]+self.name[1:]+"_unclassified",
+                        self.abundance)]
+        for c in self.children.values():
+            ret += c.get_all_abundances()
+        return ret
+
+
+class TaxTree:
+    def __init__( self, mpa, markers_to_ignore = None ): #, min_cu_len ):
+        self.root = TaxClade( "root" )
+        self.all_clades, self.markers2lens, self.markers2clades, self.taxa2clades, self.markers2exts = {}, {}, {}, {}, {}
+        TaxClade.markers2lens = self.markers2lens
+        TaxClade.markers2exts = self.markers2exts
+        TaxClade.taxa2clades = self.taxa2clades
+        self.id_gen = itertools.count(1)
+
+        clades_txt = ((l.strip().split("|"),n) for l,n in mpa['taxonomy'].items())
+        for clade,lenc in clades_txt:
+            father = self.root
+            for clade_lev in clade: # !!!!! [:-1]:
+                if not clade_lev in father.children:
+                    father.add_child( clade_lev, id_int=next(self.id_gen) )
+                    self.all_clades[clade_lev] = father.children[clade_lev]
+                if clade_lev[0] == "t":
+                    self.taxa2clades[clade_lev[3:]] = father
+
+                father = father.children[clade_lev]
+                if clade_lev[0] == "t":
+                    father.glen = lenc
+
+        def add_lens( node ):
+            if not node.children:
+                return node.glen
+            lens = [] 
+            for c in node.children.values():
+                lens.append( add_lens( c ) )
+            node.glen = sum(lens) / len(lens)
+            return node.glen
+        add_lens( self.root )
+        
+        for k,p in mpa['markers'].items():
+            if k in markers_to_exclude:
+                continue
+            if markers_to_ignore and k in markers_to_ignore:
+                continue
+            self.markers2lens[k] = p['len']
+            self.markers2clades[k] = p['clade']
+            self.add_reads( k, 0  )
+            self.markers2exts[k] = p['ext']
+
+    def set_min_cu_len( self, min_cu_len ):
+        TaxClade.min_cu_len = min_cu_len
+
+    def set_stat( self, stat, quantile, avoid_disqm = False ):
+        TaxClade.stat = stat
+        TaxClade.quantile = quantile
+        TaxClade.avoid_disqm = avoid_disqm
+
+    def add_reads(  self, marker, n, 
+                    ignore_viruses = False, ignore_eukaryotes = False, 
+                    ignore_bacteria = False, ignore_archaea = False  ):
+        clade = self.markers2clades[marker]
+        cl = self.all_clades[clade]
+        if ignore_viruses or ignore_eukaryotes or ignore_bacteria or ignore_archaea:
+            cn = cl.get_full_name()
+            if ignore_viruses and cn.startswith("k__Viruses"):
+                return ""
+            if ignore_eukaryotes and cn.startswith("k__Eukaryota"):
+                return ""
+            if ignore_archaea and cn.startswith("k__Archaea"):
+                return ""
+            if ignore_bacteria and cn.startswith("k__Bacteria"):
+                return ""
+        while len(cl.children) == 1:
+            cl = list(cl.children.values())[0]
+        cl.markers2nreads[marker] = n
+        return cl.get_full_name()
+   
+
+    def markers2counts( self ):
+        m2c = {}
+        for k,v in self.all_clades.items():
+            for m,c in v.markers2nreads.items():
+                m2c[m] = c
+        return m2c
+
+    def clade_profiles( self, tax_lev, get_all = False  ):
+        cl2pr = {}
+        for k,v in self.all_clades.items():
+            if tax_lev and not k.startswith(tax_lev): 
+                continue
+            prof = v.get_normalized_counts()
+            if not get_all and ( len(prof) < 1 or not sum([p[1] for p in prof]) > 0.0 ):
+                continue
+            cl2pr[v.get_full_name()] = prof
+        return cl2pr
+            
+    def relative_abundances( self, tax_lev  ):
+        cl2ab_n = dict([(k,v) for k,v in self.all_clades.items() 
+                    if k.startswith("k__") and not v.uncl])
+     
+        cl2ab, cl2glen, tot_ab = {}, {}, 0.0 
+        for k,v in cl2ab_n.items():
+            tot_ab += v.compute_abundance()
+
+        for k,v in cl2ab_n.items():
+            for cl,ab in v.get_all_abundances():
+                if not tax_lev:
+                    if cl not in self.all_clades:
+                        to = tax_units.index(cl[0])
+                        t = tax_units[to-1]
+                        cl = t + cl.split("_unclassified")[0][1:]
+                        cl = self.all_clades[cl].get_full_name()
+                        spl = cl.split("|")
+                        cl = "|".join(spl+[tax_units[to]+spl[-1][1:]+"_unclassified"])
+                        glen = self.all_clades[spl[-1]].glen
+                    else:
+                        glen = self.all_clades[cl].glen
+                        cl = self.all_clades[cl].get_full_name() 
+                elif not cl.startswith(tax_lev):
+                    continue
+                elif cl in self.all_clades:
+                    glen = self.all_clades[cl].glen
+                else:
+                    glen = 1.0
+                cl2ab[cl] = ab
+                cl2glen[cl] = glen 
+
+        ret_d = dict([( k, float(v) / tot_ab if tot_ab else 0.0) for k,v in cl2ab.items()])
+        ret_r = dict([( k, (v,cl2glen[k],float(v)*cl2glen[k])) for k,v in cl2ab.items()])
+        #ret_r = dict([( k, float(v) / tot_ab if tot_ab else 0.0) for k,v in cl2ab.items()])
+        if tax_lev:
+            ret_d[tax_lev+"unclassified"] = 1.0 - sum(ret_d.values())
+        return ret_d, ret_r
+
+def map2bbh( mapping_f, input_type = 'bowtie2out', min_alignment_len = None):
+    if not mapping_f:
+        ras, ras_line, inpf = plain_read_and_split, plain_read_and_split_line, sys.stdin
+    else:
+        if mapping_f.endswith(".bz2"):
+            ras, ras_line, inpf = read_and_split, read_and_split_line, bz2.BZ2File( mapping_f, "r" )
+        else:
+            ras, ras_line, inpf = plain_read_and_split,\
+                                  plain_read_and_split_line,\
+                                  open( mapping_f )
+
+    reads2markers, reads2maxb = {}, {}
+    if input_type == 'bowtie2out':
+        for r,c in ras(inpf):
+            reads2markers[r] = c
+    elif input_type == 'sam':
+        for line in inpf:
+            o = ras_line(line)
+            if o[0][0] != '@' and o[2][-1] != '*':
+                if min_alignment_len == None\
+                    or max([int(x.strip('M')) for x in\
+                            re.findall(r'(\d*M)', o[5])]) >= min_alignment_len:
+                    reads2markers[o[0]] = o[2]
+    inpf.close()
+
+    markers2reads = defdict( set )
+    for r,m in reads2markers.items():
+        markers2reads[m].add( r )
+
+    return markers2reads
+    
+    
+def maybe_generate_biom_file(pars, abundance_predictions):
+    if not pars['biom']:
+        return None
+    if not abundance_predictions:
+        return open(pars['biom'], 'w').close()
+
+    delimiter = "|" if len(pars['mdelim']) > 1 else pars['mdelim']
+    def istip(clade_name):
+        end_name = clade_name.split(delimiter)[-1]
+        return end_name.startswith("t__") or end_name.endswith("_unclassified")
+
+    def findclade(clade_name):
+        if clade_name.endswith('_unclassified'):
+            name = clade_name.split(delimiter)[-2]
+        else:
+            name = clade_name.split(delimiter)[-1]
+        return tree.all_clades[name]
+
+    def to_biomformat(clade_name):
+        return { 'taxonomy': clade_name.split(delimiter) }
+
+    clades = iter( (abundance, findclade(name)) 
+                   for (name, abundance) in abundance_predictions
+                   if istip(name) )
+    packed = iter( ([abundance], clade.get_full_name(), clade.id)
+                   for (abundance, clade) in clades )
+
+    #unpack that tuple here to stay under 80 chars on a line
+    data, clade_names, clade_ids = zip(*packed)
+    # biom likes column vectors, so we give it an array like this:
+    # np.array([[a],[b],[c]])
+    data = np.array(data)
+    sample_ids = [pars['sample_id']]
+    table_id='MetaPhlAn2_Analysis'
+    json_key = "MetaPhlAn2"
+
+    if LooseVersion(biom.__version__) < LooseVersion("2.0.0"):
+        biom_table = biom.table.table_factory(
+            data, sample_ids, clade_ids,
+            sample_metadata      = None,
+            observation_metadata = map(to_biomformat, clade_names),
+            table_id             = table_id,
+            constructor          = biom.table.DenseOTUTable
+        )
+        with open(pars['biom'], 'w') as outfile:
+            json.dump( biom_table.getBiomFormatObject(json_key),
+                           outfile )
+    else:  # Below is the biom2 compatible code
+        biom_table = biom.table.Table(
+            data, clade_ids, sample_ids,
+            sample_metadata      = None,
+            observation_metadata = map(to_biomformat, clade_names),
+            table_id             = table_id,
+            input_is_dense       = True
+        )
+        
+        with open(pars['biom'], 'w') as outfile:  
+            biom_table.to_json( json_key,
+                                direct_io = outfile )
+
+    return True
+
+
+if __name__ == '__main__':
+    pars = read_params( sys.argv )    
+    #if pars['inp'] is None and ( pars['input_type'] is None or  pars['input_type'] == 'automatic'): 
+    #    sys.stderr.write( "The --input_type parameter needs to be specified when the "
+    #                      "input is provided from the standard input.\n"
+    #                      "Type metaphlan.py -h for more info\n")
+    #    sys.exit(0)
+
+    if pars['bt2_ps'] in [
+                          "sensitive-local",
+                          "very-sensitive-local"
+                          ]\
+        and pars['min_alignment_len'] == None:
+            pars['min_alignment_len'] = 100
+            sys.stderr.write('Warning! bt2_ps is set to a local mode '\
+                             'and min_alignment_len is None, so '
+                             'min_alignment_len is automatically set to 100. '\
+                             'To use a different value, rerun the command and set '\
+                             'min_alignment_len explicitly.\n'
+                            )
+
+    if pars['input_type'] == 'fastq':
+        pars['input_type'] = 'multifastq'
+    if pars['input_type'] == 'fasta':
+        pars['input_type'] = 'multifasta'
+
+    #if pars['input_type'] == 'automatic':
+    #    pars['input_type'] = guess_input_format( pars['inp'] )
+    #    if not pars['input_type']:
+    #        sys.stderr.write( "Sorry, I cannot guess the format of the input file, please "
+    #                          "specify the --input_type parameter \n" )
+    #        sys.exit(1) 
+
+    # check for the mpa_pkl file
+    if not os.path.isfile(pars['mpa_pkl']):
+        sys.stderr.write("Error: Unable to find the mpa_pkl file at: " + pars['mpa_pkl'] +
+                         "\nExpecting location ${mpa_dir}/db_v20/mpa_v20_m200.pkl "
+                         "\nSelect the file location with the option --mpa_pkl.\n"
+                         "Exiting...\n\n")
+        sys.exit(1)           
+
+    if pars['ignore_markers']:
+        with open(pars['ignore_markers']) as ignv:
+            ignore_markers = set([l.strip() for l in ignv])
+    else:
+        ignore_markers = set()
+
+    no_map = False
+    if pars['input_type'] == 'multifasta' or pars['input_type'] == 'multifastq':
+        bow = pars['bowtie2db'] is not None
+        if not bow:
+            sys.stderr.write( "No MetaPhlAn BowTie2 database provided "
+                              "[--bowtie2db option]!\n"
+                              "Exiting...\n\n" )
+            sys.exit(1)
+        if pars['no_map']:
+            pars['bowtie2out'] = tf.NamedTemporaryFile(dir=pars['tmp_dir']).name
+            no_map = True
+        else:
+            if bow and not pars['bowtie2out']:
+                if pars['inp'] and "," in  pars['inp']:
+                    sys.stderr.write( "Error! --bowtie2out needs to be specified when multiple "
+                                      "fastq or fasta files (comma separated) are provided.\n" )
+                    sys.exit(1)
+                fname = pars['inp']
+                if fname is None:
+                    fname = "stdin_map"
+                elif stat.S_ISFIFO(os.stat(fname).st_mode):
+                    fname = "fifo_map"
+                pars['bowtie2out'] = fname + ".bowtie2out.txt"
+
+            if os.path.exists( pars['bowtie2out'] ):
+                sys.stderr.write(   
+                    "BowTie2 output file detected: " + pars['bowtie2out'] + "\n"
+                    "Please use it as input or remove it if you want to "
+                    "re-perform the BowTie2 run.\n"
+                    "Exiting...\n\n" )
+                sys.exit(1)
+
+        if bow and not all([os.path.exists(".".join([str(pars['bowtie2db']),p]))
+                        for p in ["1.bt2", "2.bt2", "3.bt2", "4.bt2", "rev.1.bt2", "rev.2.bt2"]]):
+            sys.stderr.write( "No MetaPhlAn BowTie2 database found "
+                              "[--bowtie2db option]! "
+                              "(or wrong path provided)."
+                              "\nExpecting location ${mpa_dir}/db_v20/mpa_v20_m200 "
+                              "\nExiting...\n\n" )
+            sys.exit(1)
+
+        if bow:
+            run_bowtie2( pars['inp'], pars['bowtie2out'], pars['bowtie2db'], 
+                         pars['bt2_ps'], pars['nproc'], file_format = pars['input_type'],
+                         exe = pars['bowtie2_exe'],
+                         samout = pars['samout'],
+                         min_alignment_len = pars['min_alignment_len'])
+            pars['input_type'] = 'bowtie2out'
+        
+        pars['inp'] = pars['bowtie2out'] # !!!
+
+    with open( pars['mpa_pkl'], 'rb' ) as a:
+        mpa_pkl = pickle.loads( bz2.decompress( a.read() ) )
+
+    tree = TaxTree( mpa_pkl, ignore_markers )
+    tree.set_min_cu_len( pars['min_cu_len'] )
+    tree.set_stat( pars['stat'], pars['stat_q'], pars['avoid_disqm']  )
+
+    markers2reads = map2bbh( 
+                            pars['inp'], 
+                            pars['input_type'],
+                            pars['min_alignment_len']
+                            )
+    if no_map:
+        os.remove( pars['inp'] )         
+
+    map_out = []
+    for marker,reads in markers2reads.items():
+        if marker not in tree.markers2lens:
+            continue
+        tax_seq = tree.add_reads( marker, len(reads), 
+                                  ignore_viruses = pars['ignore_viruses'],
+                                  ignore_eukaryotes = pars['ignore_eukaryotes'],
+                                  ignore_bacteria = pars['ignore_bacteria'],
+                                  ignore_archaea = pars['ignore_archaea'],
+                                  )
+        if tax_seq:
+            map_out +=["\t".join([r,tax_seq]) for r in reads]
+    
+    if pars['output'] is None and pars['output_file'] is not None:
+        pars['output'] = pars['output_file']
+
+    with (open(pars['output'],"w") if pars['output'] else sys.stdout) as outf:
+        outf.write('\t'.join((pars["sample_id_key"], pars["sample_id"])) + '\n')
+        if pars['t'] == 'reads_map':
+            outf.write( "\n".join( map_out ) + "\n" )
+        elif pars['t'] == 'rel_ab':
+            cl2ab, _ = tree.relative_abundances( 
+                        pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None )
+            outpred = [(k,round(v*100.0,5)) for k,v in cl2ab.items() if v > 0.0]
+            if outpred:
+                for k,v in sorted(  outpred, reverse=True,
+                                    key=lambda x:x[1]+(100.0*(8-x[0].count("|")))  ): 
+                    outf.write( "\t".join( [k,str(v)] ) + "\n" )   
+            else:
+                outf.write( "unclassified\t100.0\n" )
+            maybe_generate_biom_file(pars, outpred)
+        elif pars['t'] == 'rel_ab_w_read_stats':
+            cl2ab, rr = tree.relative_abundances( 
+                        pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None )
+            outpred = [(k,round(v*100.0,5)) for k,v in cl2ab.items() if v > 0.0]
+            totl = 0
+            if outpred:
+                outf.write( "\t".join( [    "#clade_name",
+                                            "relative_abundance",
+                                            "coverage",
+                                            "average_genome_length_in_the_clade",
+                                            "estimated_number_of_reads_from_the_clade" ]) +"\n" )
+
+                for k,v in sorted(  outpred, reverse=True,
+                                    key=lambda x:x[1]+(100.0*(8-x[0].count("|")))  ): 
+                    outf.write( "\t".join( [    k,
+                                                str(v),
+                                                str(rr[k][0]) if k in rr else "-",
+                                                str(rr[k][1]) if k in rr else "-",
+                                                str(int(round(rr[k][2],0)) if k in rr else "-")   
+                                                ] ) + "\n" )   
+                    if "|" not in k:
+                        totl += (int(round(rr[k][2],0)) if k in rr else 0)
+
+                outf.write( "#estimated total number of reads from known clades: " + str(totl)+"\n")
+            else:
+                outf.write( "unclassified\t100.0\n" )
+            maybe_generate_biom_file(pars, outpred)
+
+        elif pars['t'] == 'clade_profiles':
+            cl2pr = tree.clade_profiles( pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None  )
+            for c,p in cl2pr.items():
+                mn,n = zip(*p)
+                outf.write( "\t".join( [""]+[str(s) for s in mn] ) + "\n" )
+                outf.write( "\t".join( [c]+[str(s) for s in n] ) + "\n" )
+        elif pars['t'] == 'marker_ab_table':
+            cl2pr = tree.clade_profiles( pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None  )
+            for v in cl2pr.values():
+                outf.write( "\n".join(["\t".join([str(a),str(b/float(pars['nreads'])) if pars['nreads'] else str(b)]) 
+                                for a,b in v if b > 0.0]) + "\n" )
+        elif pars['t'] == 'marker_pres_table':
+            cl2pr = tree.clade_profiles( pars['tax_lev']+"__" if pars['tax_lev'] != 'a' else None  )
+            for v in cl2pr.values():
+                strout = ["\t".join([str(a),"1"]) for a,b in v if b > pars['pres_th']]
+                if strout:
+                    outf.write( "\n".join(strout) + "\n" )
+
+        elif pars['t'] == 'marker_counts':
+            outf.write( "\n".join( ["\t".join([m,str(c)]) for m,c in tree.markers2counts().items() ]) +"\n" )
+
+        elif pars['t'] == 'clade_specific_strain_tracker':
+            cl2pr = tree.clade_profiles( None, get_all = True  )
+            cl2ab, _ = tree.relative_abundances( None )
+            strout = []
+            for cl,v in cl2pr.items():
+                if cl.endswith(pars['clade']) and cl2ab[cl]*100.0 < pars['min_ab']:
+                    strout = []
+                    break
+                if pars['clade'] in cl:
+                    strout += ["\t".join([str(a),str(int(b > pars['pres_th']))]) for a,b in v]
+            if strout:
+                strout = sorted(strout,key=lambda x:x[0])
+                outf.write( "\n".join(strout) + "\n" )
+            else:
+                sys.stderr.write("Clade "+pars['clade']+" not present at an abundance >"+str(round(pars['min_ab'],2))+"%, "
+                                 "so no clade specific markers are reported\n")
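Both `run_bowtie2` and `map2bbh` above keep a SAM hit only when the longest `M` operation in its CIGAR string (SAM column 6, `o[5]`) is at least `min_alignment_len`. The sketch below restates that filter as standalone Python for illustration only; the helper names `longest_match_run` and `passes_min_alignment_len` are hypothetical and not part of MetaPhlAn2. One deliberate difference: the original `max([...])` expression raises `ValueError` on a CIGAR with no `M` operation, whereas this version treats such a CIGAR as a zero-length match.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_OP = re.compile(r'(\d+)([MIDNSHP=X])')


def longest_match_run(cigar):
    """Return the length of the longest 'M' operation in a CIGAR string (0 if none)."""
    runs = [int(length) for length, op in CIGAR_OP.findall(cigar) if op == 'M']
    return max(runs) if runs else 0


def passes_min_alignment_len(cigar, min_alignment_len=None):
    """Mirror of the MetaPhlAn2 test: with no threshold, every hit passes."""
    if min_alignment_len is None:
        return True
    return longest_match_run(cigar) >= min_alignment_len


if __name__ == '__main__':
    print(longest_match_run('10S90M2I30M'))             # 90
    print(passes_min_alignment_len('10S90M2I30M', 100))  # False: longest M run is 90
    print(passes_min_alignment_len('120M', 100))         # True
```

Soft clips (`S`) and insertions (`I`) split or shorten the matched runs, so a read with a long clipped tail can still fail the threshold even if its total length exceeds it.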
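With `--stat tavg_g`, `TaxClade.compute_abundance` above ranks a clade's markers by per-base coverage (reads divided by marker length), drops `int(quantile * n)` markers from each end of that ranking, and divides the remaining reads by the remaining marker length; when the trim count is 0 it falls back to the plain global average (`avg_g`). The function below is a minimal standalone sketch of that arithmetic, assuming markers are supplied as `(length, reads)` pairs like `rat_nreads`; `truncated_global_average` is a hypothetical name. It leaves out the misidentified-marker filtering and the child-clade handling of the original, and it returns 0.0 when trimming would leave no markers.

```python
def truncated_global_average(markers, quantile):
    """Trimmed 'global' average of marker coverage.

    markers: list of (marker_length, read_count) pairs for one clade.
    quantile: fraction of markers to drop from each end of the coverage ranking.
    """
    if not markers:
        return 0.0
    trim = int(quantile * len(markers))
    # Rank markers by per-base coverage, lowest first.
    ranked = sorted(markers, key=lambda m: float(m[1]) / m[0])
    if trim:
        ranked = ranked[trim:-trim]
    total_len = sum(length for length, _ in ranked)
    total_reads = sum(reads for _, reads in ranked)
    return float(total_reads) / total_len if total_len else 0.0


if __name__ == '__main__':
    # Five hypothetical 1000-nt markers: one outlier with 1000 reads, one with none.
    markers = [(1000, 10), (1000, 20), (1000, 30), (1000, 1000), (1000, 0)]
    print(truncated_global_average(markers, 0.2))  # trims 0 and 1000 reads: 60 / 3000
    print(truncated_global_average(markers, 0.1))  # int(0.5) == 0, no trimming: 1060 / 5000
```

Trimming by coverage rather than by raw read count keeps a single over-recruiting marker (for example one shared with a related genome) from inflating the clade's estimate.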
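In the `rel_ab` and `rel_ab_w_read_stats` branches above, output rows are sorted in reverse with the key `x[1] + 100.0*(8 - x[0].count("|"))`. Because each reported percentage lies in (0, 100], every `|` separator lowers the key by 100, which places all rows of a shallower rank ahead of all rows of a deeper rank; within one rank, rows then appear in decreasing abundance. The snippet below reproduces that ordering on made-up clade names and values; `order_profile_rows` is a hypothetical helper, not a MetaPhlAn2 function.

```python
def order_profile_rows(rows):
    """Sort (clade_name, percent) rows by rank depth first, then by abundance."""
    return sorted(rows, reverse=True,
                  key=lambda x: x[1] + (100.0 * (8 - x[0].count("|"))))


if __name__ == '__main__':
    rows = [("k__Bacteria|p__Firmicutes", 40.0),
            ("k__Bacteria", 100.0),
            ("k__Bacteria|p__Bacteroidetes", 60.0)]
    for name, pct in order_profile_rows(rows):
        print(name + "\t" + str(pct))
```

The constant 8 matches the eight taxonomic levels (kingdom through strain); it only shifts every key by the same amount, so the ordering depends on the per-level step of 100 exceeding the largest possible percentage.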
diff --git a/strainphlan.py b/strainphlan.py
new file mode 100755
index 0000000..f0023a0
--- /dev/null
+++ b/strainphlan.py
@@ -0,0 +1,1538 @@
+#!/usr/bin/env python
+# Author: Duy Tin Truong (duytin.truong at unitn.it)
+#		at CIBIO, University of Trento, Italy
+
+__author__ = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '1.0.0'
+__date__ = '2nd August 2016'
+
+import sys
+import os
+import shutil
+ABS_PATH = os.path.abspath(sys.argv[0])
+MAIN_DIR = os.path.dirname(ABS_PATH)
+os.environ['PATH'] += ':' + MAIN_DIR
+os.environ['PATH'] += ':' + os.path.join(MAIN_DIR, 'strainphlan_src')
+sys.path.append(os.path.join(MAIN_DIR, 'strainphlan_src'))
+
+import which
+import argparse as ap
+import cPickle as pickle
+import msgpack
+import glob
+from mixed_utils import statistics
+import ooSubprocess
+from ooSubprocess import trace_unhandled_exceptions
+import bz2
+import gzip
+from collections import defaultdict
+from tempfile import SpooledTemporaryFile, NamedTemporaryFile
+from Bio import SeqIO, Seq, SeqRecord
+from Bio.Alphabet import IUPAC
+import pandas
+import logging
+import logging.config
+import sample2markers
+import copy
+import threading
+import numpy
+import random
+import gc
+#import ipdb
+
+shared_variables = type('shared_variables', (object,), {})
+
+logging.basicConfig(level=logging.DEBUG, stream=sys.stderr,
+                    disable_existing_loggers=False,
+                    format='%(asctime)s | %(levelname)s | %(name)s | %(funcName)s | %(lineno)d | %(message)s')
+logger = logging.getLogger(__name__)
+
+# get the directory that contains this script
+metaphlan2_script_install_folder=os.path.dirname(os.path.abspath(__file__))
+
+# functions
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument(
+        '--ifn_samples',
+        nargs='+',
+        required=False,
+        default=[],
+        type=str,
+        help='The list of sample files (space separated). '\
+             'Wildcards can also be used.')
+
+    p.add_argument(
+        '--ifn_second_samples',
+        nargs='+',
+        required=False,
+        default=[],
+        type=str,
+        help='The list of second sample files (space separated). '\
+             'Wildcards can also be used. '\
+             'Note that only the markers found in the samples or '\
+             'reference genomes '
+             'specified by --ifn_samples or --ifn_representative_sample '\
+             'or --ifn_ref_genomes with '\
+             'add_reference_genomes_as_second_samples=False '\
+             'will be used to build the phylogenetic trees. '
+             )
+
+    p.add_argument(
+        '--ifn_representative_sample',
+        required=False,
+        default=None,
+        type=str,
+        help='The representative sample. The marker list of each species '\
+             'extracted from this sample will be used for all other samples.')
+
+    p.add_argument(
+        '--mpa_pkl', 
+        required=False, 
+        default=os.path.join(metaphlan2_script_install_folder,"db_v20","mpa_v20_m200.pkl"), 
+        type=str, 
+        help='The database of metaphlan2.py.')
+
+    p.add_argument(
+        '--output_dir', 
+        required=True, 
+        default='strainer_output', 
+        type=str,
+        help='The output directory.')
+
+    p.add_argument(
+        '--ifn_markers', 
+        required=False, 
+        default=None, 
+        type=str,
+        help='The marker file in fasta format.')
+
+    p.add_argument(
+        '--nprocs_main', 
+        required=False, 
+        default=1, 
+        type=int,
+        help='The number of processors used for the main threads. '\
+             'Default 1.')
+
+    p.add_argument(
+        '--nprocs_load_samples', 
+        required=False, 
+        default=None, 
+        type=int,
+        help='The number of processors used for loading samples. '\
+             'Default nprocs_main.')
+
+    p.add_argument(
+        '--nprocs_align_clean', 
+        required=False, 
+        default=None, 
+        type=int,
+        help='The number of processors used for aligning and cleaning markers. '\
+             'Default nprocs_main.')
+
+    p.add_argument(
+        '--nprocs_raxml', 
+        required=False, 
+        default=None, 
+        type=int,
+        help='The number of processors used for running raxml. '\
+             'Default nprocs_main.')
+
+    p.add_argument(
+        '--bootstrap_raxml', 
+        required=False, 
+        default=0, 
+        type=int,
+        help='The number of bootstrap runs when building the tree. '\
+             'Default 0.')
+
+    p.add_argument(
+        '--ifn_ref_genomes',
+        nargs='+',
+        required=False,
+        default=None,
+        type=str,
+        help='The reference genome file names. They are separated by spaces.')
+
+    p.add_argument(
+        '--add_reference_genomes_as_second_samples', 
+        required=False, 
+        dest='add_reference_genomes_as_second_samples',
+        action='store_true',
+        help='Add reference genomes as second samples. '\
+             'Default "False". ' \
+             'Note that only the markers found in the samples or '\
+             'reference genomes '
+             'specified by --ifn_samples or --ifn_representative_sample '\
+             'or --ifn_ref_genomes with '\
+             'add_reference_genomes_as_second_samples=False '\
+             'will be used to build the phylogenetic trees. '
+             )
+    p.set_defaults(add_reference_genomes_as_second_samples=False)
+
+    p.add_argument(
+        '--N_in_marker',
+        required=False,
+        default=0.2,
+        type=float,
+        help='Consensus markers whose fraction of N nucleotides is greater than '\
+                'this threshold are removed. Default 0.2.')
+
+    p.add_argument(
+        '--marker_strip_length',
+        required=False,
+        default=50,
+        type=int,
+        help='The number of nucleotides to delete from each of the two ends '\
+                'of a marker. Default 50.')
+
+    p.add_argument(
+        '--marker_in_clade',
+        required=False,
+        default=0.8,
+        type=float,
+        help='In each sample, clades whose fraction of present markers is below '\
+                'this threshold are removed. Default 0.8.')
+
+    p.add_argument(
+        '--second_marker_in_clade',
+        required=False,
+        default=0.8,
+        type=float,
+        help='In each sample/reference genome specified by --ifn_second_samples '\
+             'or --add_reference_genomes_as_second_samples, '\
+             'clades whose fraction of present markers is below '\
+             'this threshold are removed. Default 0.8.')
+
+    p.add_argument(
+        '--sample_in_clade',
+        required=False,
+        default=2,
+        type=int,
+        help='Only clades present in at least sample_in_clade samples '\
+             'are kept. Default 2.')
+
+    p.add_argument(
+        '--sample_in_marker',
+        required=False,
+        default=0.8,
+        type=float,
+        help='Markers present in a fraction of samples lower than this '\
+             'threshold are removed. Default 0.8.')
+
+    p.add_argument(
+        '--gap_in_trailing_col',
+        required=False,
+        default=0.2,
+        type=float,
+        help='If the number of leading/trailing nucleotide columns in aligned '\
+              'markers whose fraction of gaps is greater than '\
+              'gap_in_trailing_col is less than gap_trailing_col_limit, '\
+              'these columns are removed. '\
+              'Default 0.2.')
+
+    p.add_argument(
+        '--gap_trailing_col_limit',
+        required=False,
+        default=101,
+        type=float,
+        help='If the number of leading/trailing nucleotide columns in aligned '\
+              'markers whose fraction of gaps is greater than '\
+              'gap_in_trailing_col is less than gap_trailing_col_limit, '\
+              'these columns are removed. '\
+              'Default 101.')
+
+    p.add_argument(
+        '--gap_in_internal_col',
+        required=False,
+        default=0.3,
+        type=float,
+        help='Internal nucleotide columns in aligned '\
+              'markers whose fraction of gaps is greater than '\
+              'gap_in_internal_col are removed. '\
+              'Default 0.3.')
+
+    p.add_argument(
+        '--gap_in_sample',
+        required=False,
+        default=0.2,
+        type=float,
+        help='Samples whose concatenated sequence over all markers '\
+            'has a fraction of gaps greater than this threshold '\
+            'are removed. Default 0.2.')
+
+    p.add_argument(
+        '--second_gap_in_sample',
+        required=False,
+        default=0.2,
+        type=float,
+        help='Samples specified by --ifn_second_samples whose concatenated sequence over all markers '\
+            'has a fraction of gaps greater than this threshold '\
+            'are removed. Default 0.2.')
+
+    p.add_argument(
+        '--N_col',
+        required=False,
+        default=0.8,
+        type=float,
+        help='In aligned markers, if the fraction of nucleotide columns '\
+              'containing more than N_count Ns '\
+              'is less than this threshold, these columns are removed. '\
+              'Default 0.8.')
+
+    p.add_argument(
+        '--N_count',
+        required=False,
+        default=0,
+        type=int,
+        help='In aligned markers, if the fraction of nucleotide columns '\
+              'containing more than N_count Ns '\
+              'is less than the N_col threshold, these columns are removed. '\
+              'Default 0.')
+
+    p.add_argument(
+        '--long_gap_length',
+        required=False,
+        default=2,
+        type=int,
+        help='In each concatenated sequence of a sample, consecutive '\
+                'gap positions form a gap group. '\
+                'A gap group longer than this '\
+                'threshold is considered '\
+                'a long gap group. If the ratio between the number of unique '\
+                'positions in all long gap groups and the concatenated sequence '\
+                'length is less than long_gap_percentage, these positions '\
+                'are removed from all concatenated sequences. '\
+                'Default 2.')
+
+    p.add_argument(
+        '--long_gap_percentage',
+        required=False,
+        default=0.8,
+        type=float,
+        help='Used together with long_gap_length to remove long '\
+             'gaps. Default 0.8.')
+
+    p.add_argument(
+        '--p_value',
+        required=False,
+        default=0.05,
+        type=float,
+        help='The p_value threshold for rejecting a non-polymorphic site. '\
+             'Default 0.05.')
+
+    p.add_argument(
+        '--clades', 
+        nargs='+',
+        required=False, 
+        default=['all'], 
+        type=str,
+        help='The clades (space-separated) for which the script will compute '\
+                'the marker alignments in fasta format and the phylogenetic '\
+                'trees. If a file name is specified, the clade list in that '\
+                'file, with one clade name per line, will be read. '\
+                'Default "automatically identify all clades".')
+
+    p.add_argument(
+        '--marker_list_fn', 
+        required=False, 
+        default=None, 
+        type=str,
+        help='The file name containing the list of considered markers. '\
+                'The other markers will be discarded. '\
+                'Default "None".')
+    p.add_argument(
+        '--print_clades_only', 
+        required=False, 
+        dest='print_clades_only',
+        action='store_true',
+        help='Only print the potential clades and stop without building any '\
+             'tree. This option is useful when you want to quickly check '\
+             'all possible clades and rerun only for some specific ones. '\
+             'Default "False".')
+    p.set_defaults(print_clades_only=False)
+
+    p.add_argument(
+        '--alignment_program', 
+        required=False, 
+        default='muscle', 
+        choices=['muscle', 'mafft'],
+        type=str,
+        help='The alignment program. Default "muscle".')
+
+    p.add_argument(
+        '--relaxed_parameters', 
+        required=False, 
+        dest='relaxed_parameters',
+        action='store_true',
+        help='Set marker_in_clade=0.5, sample_in_marker=0.5, '\
+             'N_in_marker=0.5, gap_in_sample=0.5. '\
+             'Default "False".')
+    p.set_defaults(relaxed_parameters=False)
+
+    p.add_argument(
+        '--relaxed_parameters2', 
+        required=False, 
+        dest='relaxed_parameters2',
+        action='store_true',
+        help='Set marker_in_clade=0.2, sample_in_marker=0.2, '\
+             'N_in_marker=0.8, gap_in_sample=0.8. '\
+             'Default "False".')
+    p.set_defaults(relaxed_parameters2=False)
+
+    p.add_argument(
+        '--relaxed_parameters3', 
+        required=False, 
+        dest='relaxed_parameters3',
+        action='store_true',
+        help='Set gap_in_trailing_col=0.9, gap_in_internal_col=0.9, '\
+             'gap_in_sample=0.9, second_gap_in_sample=0.5, '\
+             'sample_in_marker=0.1, marker_in_clade=0.1, '\
+             'second_marker_in_clade=0.1. '\
+             'Default "False".')
+    p.set_defaults(relaxed_parameters3=False)
+
+    p.add_argument(
+        '--keep_alignment_files', 
+        required=False, 
+        dest='keep_alignment_files',
+        action='store_true',
+        help='Keep the alignment files of all markers before the cleaning step.')
+    p.set_defaults(keep_alignment_files=False)
+
+    p.add_argument(
+        '--keep_full_alignment_files', 
+        required=False, 
+        dest='keep_full_alignment_files',
+        action='store_true',
+        help='Keep the alignment files of all markers before '\
+             'their starting and ending parts are truncated and before the cleaning step. '\
+             'This is equivalent to '\
+             '--keep_alignment_files --marker_strip_length 0')
+    p.set_defaults(keep_full_alignment_files=False)
+
+    p.add_argument(
+        '--save_sample2fullfreq', 
+        required=False, 
+        dest='save_sample2fullfreq',
+        action='store_true',
+        help='Save sample2fullfreq to a msgpack file sample2fullfreq.msgpack.')
+    p.set_defaults(save_sample2fullfreq=False)
+
+    p.add_argument(
+        '--use_threads', 
+        required=False, 
+        action='store_true',
+        dest='use_threads',
+        help='Use multithreading. Default "Use multiprocessing".')
+    p.set_defaults(use_threads=False)
+
+    return vars(p.parse_args())
+
+
+
+def filter_sequence(sample, marker2seq, marker_strip_length, N_in_marker):
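+    '''
+    Filter the markers of one sample.
+
+    Markers whose fraction of N bases is greater than N_in_marker are
+    removed; then marker_strip_length bases are trimmed from both ends of
+    each remaining sequence (and of its frequency list), and markers that
+    become empty are removed.
+
+    :param sample: the sample name, used for logging only.
+    :param marker2seq: a dictionary containing sequences of a sample.
+    marker2seq[marker]['seq'] should return the sequence of the marker and
+    marker2seq[marker]['freq'] its per-position frequencies.
+    :returns: the filtered marker2seq dictionary (modified in place).
+    '''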
+    remove_markers = [marker for marker in marker2seq if
+                      float(marker2seq[marker]['seq'].count('N')) /
+                      len(marker2seq[marker]['seq']) > N_in_marker]
+    for marker in remove_markers:
+        del marker2seq[marker]
+    log_line = 'sample %s, number of markers after N_in_marker: %d\n'\
+               %(sample, len(marker2seq))
+
+    remove_markers = []
+    for marker in marker2seq:
+        if marker_strip_length > 0:
+            marker2seq[marker]['seq'] = \
+                marker2seq[marker]['seq'][marker_strip_length:-marker_strip_length]
+            marker2seq[marker]['freq'] = \
+                marker2seq[marker]['freq'][marker_strip_length:-marker_strip_length]
+        #marker2seq[marker]['seq'] = marker2seq[marker]['seq'].strip('N')
+        if len(marker2seq[marker]['seq']) == 0:
+            remove_markers.append(marker)
+    for marker in remove_markers:
+        del marker2seq[marker]
+    logger.debug(log_line + \
+                 'sample %s, number of markers after marker_strip_length: %d'\
+                 %(sample, len(marker2seq)))
+
+    return marker2seq
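+
+
+
+
+def _sketch_filter_markers(marker2seq, marker_strip_length, N_in_marker):
+    # Hypothetical, illustrative helper: not part of upstream StrainPhlAn and
+    # not called anywhere in the pipeline. A minimal sketch of the two filters
+    # that filter_sequence applies, assuming marker2seq maps
+    # marker -> {'seq': str} and ignoring the 'freq' lists and logging.
+    # Example: _sketch_filter_markers({'m1': {'seq': 'NNNNAC'},
+    #                                  'm2': {'seq': 'AACCGGTT'}}, 2, 0.2)
+    # drops 'm1' (4/6 of its bases are N) and returns {'m2': 'CCGG'}.
+    kept = {}
+    for marker, rec in marker2seq.items():
+        seq = rec['seq']
+        # 1) skip empty markers and markers whose N fraction exceeds N_in_marker
+        if len(seq) == 0 or float(seq.count('N')) / len(seq) > N_in_marker:
+            continue
+        # 2) trim marker_strip_length bases from both ends
+        if marker_strip_length > 0:
+            seq = seq[marker_strip_length:-marker_strip_length]
+        # 3) drop markers that became empty after trimming
+        if seq:
+            kept[marker] = seq
+    return kept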
+
+
+
+
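+def get_db_clades(db):
+    '''
+    Return (sing_clades, clade2num_markers, clade2subclades) for the
+    MetaPhlAn2 database db: the clades above species level that have a
+    single sub-lineage, the number of database markers per species-level
+    (or singleton) clade, and the set of sub-lineages below each clade.
+    '''
+    # find singleton clades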
+    sing_clades = []
+    clade2subclades = defaultdict(set)
+    for tax in db['taxonomy']:
+        tax_clades = tax.split('|')
+        for i, clade in enumerate(tax_clades):
+            if 't__' not in clade and 's__' not in clade:
+                if i < len(tax_clades)-1:
+                    if 't__' in tax_clades[-1]:
+                        clade2subclades[clade].add('|'.join(tax_clades[i+1:-1]))
+                    else:
+                        clade2subclades[clade].add('|'.join(tax_clades[i+1:]))
+    sing_clades = [clade for clade in clade2subclades if
+                             len(clade2subclades[clade]) == 1]
+
+    # extract species
+    clade2num_markers = defaultdict(int)
+    level = 's__'
+    for marker in db['markers']:
+        clade = db['markers'][marker]['taxon'].split('|')[-1]
+        if level in clade or clade in sing_clades:
+            clade2num_markers[clade] = clade2num_markers[clade] + 1
+    clade2num_markers = dict(clade2num_markers)
+
+    return sing_clades, clade2num_markers, clade2subclades
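+
+
+
+
+def _sketch_singleton_clades(taxonomies):
+    # Hypothetical, illustrative helper (not used by the pipeline): mirrors
+    # the first loop of get_db_clades on plain 'k__...|p__...|...' taxonomy
+    # strings. A clade above species level is a singleton when all taxonomy
+    # strings below it continue with the same sub-lineage.
+    # Example: _sketch_singleton_clades(['k__A|p__B|s__x', 'k__A|p__C|s__y'])
+    # returns ['p__B', 'p__C']; k__A has two sub-lineages, so it is not one.
+    clade2subclades = {}
+    for tax in taxonomies:
+        parts = tax.split('|')
+        # a trailing strain-level (t__) entry is left out of the sub-lineage
+        end = len(parts) - 1 if 't__' in parts[-1] else len(parts)
+        for i, clade in enumerate(parts[:-1]):
+            if 't__' in clade or 's__' in clade:
+                continue
+            clade2subclades.setdefault(clade, set()).add(
+                '|'.join(parts[i + 1:end]))
+    return sorted(clade for clade, subs in clade2subclades.items()
+                  if len(subs) == 1)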
+
+
+
+
+def align(marker_fn, alignment_program):
+    oosp = ooSubprocess.ooSubprocess()
+    if alignment_program == 'muscle':
+        ifile = open(marker_fn, 'r')
+        alignment_file = oosp.ex(
+            'muscle',
+            args=['-quiet', '-in', '-', '-out', '-'],
+            in_pipe=ifile,
+            get_out_pipe=True,
+            verbose=False)
+        ifile.close()
+    elif alignment_program == 'mafft':
+        alignment_file = oosp.ex(
+            'mafft',
+            args=['--auto', marker_fn],
+            get_out_pipe=True,
+            verbose=False)
+    else:
+        raise Exception('Unknown alignment_program %s!'%alignment_program)
+    return alignment_file
+
+
+
+
+def clean_alignment(
+        samples,
+        sample2seq,
+        sample2freq,
+        gap_in_trailing_col,
+        gap_trailing_col_limit,
+        gap_in_internal_col,
+        N_count,
+        N_col):
+
+    length = len(next(iter(sample2seq.values())))
+    logger.debug('marker length: %d', length)
+    aligned_samples = set(sample2seq.keys())
+    for sample in samples:
+        if sample not in aligned_samples:
+            sample2seq[sample] = ['-' for i in range(length)]
+            sample2freq[sample] = [(0.0, 0.0, 0.0) for i in range(length)]
+
+    df_seq = pandas.DataFrame.from_dict(sample2seq, orient='index')
+    df_freq = pandas.DataFrame.from_dict(sample2freq, orient='index')
+
+    # remove trailing gap columns
+    del_cols = []
+    for i in range(len(df_seq.columns)):
+        if float(list(df_seq[df_seq.columns[i]]).count('-')) / len(samples) <= gap_in_trailing_col:
+            break
+        else:
+            del_cols.append(df_seq.columns[i])
+    for i in reversed(range(len(df_seq.columns))):
+        if float(list(df_seq[df_seq.columns[i]]).count('-')) / len(samples) <= gap_in_trailing_col:
+            break
+        else:
+            del_cols.append(df_seq.columns[i])
+    if len(del_cols) < gap_trailing_col_limit:
+        df_seq.drop(del_cols, axis=1, inplace=True)
+        df_freq.drop(del_cols, axis=1, inplace=True)
+        logger.debug('length after gap_in_trailing_col: %d', len(df_seq.columns))
+    else:
+        logger.debug('do not use gap_in_trailing_col as the number of del_cols is %d'%len(del_cols))
+
+    # remove internal gap columns
+    del_cols = []
+    for i in range(len(df_seq.columns)):
+        if float(list(df_seq[df_seq.columns[i]]).count('-')) / len(samples) > gap_in_internal_col:
+            del_cols.append(df_seq.columns[i])
+    df_seq.drop(del_cols, axis=1, inplace=True)
+    df_freq.drop(del_cols, axis=1, inplace=True)
+    logger.debug('length after gap_in_internal_col: %d', len(df_seq.columns))
+
+    # remove N columns
+    if len(df_seq.columns) > 0:
+        del_cols = []
+        remove_N_col = False
+        for i in range(len(df_seq.columns)):
+            if list(df_seq[df_seq.columns[i]]).count('N') > N_count:
+                del_cols.append(df_seq.columns[i])
+        if float(len(del_cols)) / len(df_seq.columns) < N_col:
+            remove_N_col = True
+            df_seq.drop(del_cols, axis=1, inplace=True)   
+            df_freq.drop(del_cols, axis=1, inplace=True)   
+            logger.debug('length after N_col: %d', len(df_seq.columns))
+            
+        if N_count > 0 or not remove_N_col:
+            logger.debug('replace Ns by gaps for all samples')
+            for sample in samples:
+                seq = ''.join(df_seq.loc[sample])
+                logger.debug('sample %s, number of Ns: %d'\
+                                %(sample, seq.count('N')))
+                sample2seq[sample] = list(seq.replace('N', '-'))
+        else:
+            for sample in samples:
+                sample2seq[sample] = df_seq.loc[sample].tolist()
+        for sample in samples:
+            sample2freq[sample] = df_freq.loc[sample].tolist()
+    else:
+        sample2seq = {}
+        sample2freq = {}
+
+    return sample2seq, sample2freq
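+
+
+
+
+def _sketch_trailing_gap_columns(rows, gap_in_trailing_col,
+                                 gap_trailing_col_limit):
+    # Hypothetical, illustrative helper (not used by the pipeline): a sketch of
+    # the first step of clean_alignment, assuming `rows` is a list of
+    # equal-length aligned strings (one per sample, absent samples all '-').
+    # Columns are scanned inward from both ends; each scan stops at the first
+    # column whose gap fraction is <= gap_in_trailing_col. The collected
+    # columns are dropped only if there are fewer than gap_trailing_col_limit.
+    # Unlike clean_alignment, a column hit by both scans is counted once.
+    # Example: _sketch_trailing_gap_columns(['--ACG-', '-TACG-', '--ACGT'],
+    #                                       0.2, 101) returns [0, 1, 5].
+    if not rows:
+        return []
+    ncols = len(rows[0])
+
+    def gap_fraction(j):
+        return float(sum(1 for r in rows if r[j] == '-')) / len(rows)
+
+    del_cols = set()
+    for j in range(ncols):
+        if gap_fraction(j) <= gap_in_trailing_col:
+            break
+        del_cols.add(j)
+    for j in reversed(range(ncols)):
+        if gap_fraction(j) <= gap_in_trailing_col:
+            break
+        del_cols.add(j)
+    if len(del_cols) < gap_trailing_col_limit:
+        return sorted(del_cols)
+    return []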
+
+
+
+
+def add_ref_genomes(genome2marker, marker_records, ifn_ref_genomes, tmp_dir):
+    ifn_ref_genomes = sorted(list(set(ifn_ref_genomes)))
+    logger.debug('add %d reference genomes'%len(ifn_ref_genomes))
+    logger.debug('Number of samples: %d'%len(genome2marker))
+
+    # marker list
+    if len(genome2marker) == 0:
+        unique_markers = set(marker_records.keys())
+    else:
+        unique_markers = set([])
+        for sample in genome2marker:
+            unique_markers.update(genome2marker[sample])
+    logger.debug('Number of unique markers: %d'%len(unique_markers))
+
+    # add ifn_ref_genomes
+    oosp = ooSubprocess.ooSubprocess(tmp_dir=tmp_dir)
+    logger.debug('load genome contigs')
+    p1 = SpooledTemporaryFile(dir=tmp_dir)
+    contigs = defaultdict(dict)
+    for ifn_genome in ifn_ref_genomes:
+        genome = ooSubprocess.splitext(ifn_genome)[0]
+        if ifn_genome[-4:] == '.bz2':
+            ifile_genome = bz2.BZ2File(ifn_genome, 'r')
+        elif ifn_genome[-3:] == '.gz':
+            ifile_genome = gzip.GzipFile(ifn_genome, 'r')
+        elif ifn_genome[-4:] == '.fna':
+            ifile_genome = open(ifn_genome, 'r')
+        else:
+            logger.error('Unknown file type of %s. '%ifn_genome +\
+                            'It should be .fna.bz2, .fna.gz, or .fna!')
+            exit(1)
+
+        # extract genome contigs
+        for rec in SeqIO.parse(ifile_genome, 'fasta'):
+            #rec.name = genome + '___' + rec.name
+            if rec.name in contigs:
+                logger.error(
+                            'Error: Contig %s in genome %s'\
+                            %(rec.name.split('___')[-1], genome)\
+                            + ' is not unique!')
+                exit(1)
+            contigs[rec.name]['seq'] = str(rec.seq)
+            contigs[rec.name]['genome'] = genome
+            SeqIO.write(rec, p1, 'fasta')
+
+        ifile_genome.close()
+    p1.seek(0)
+                        
+    # build blastdb
+    logger.debug('build blastdb')
+    blastdb_prefix = oosp.ftmp('genome_blastn_db_%s'%(random.random()))
+    if len(glob.glob('%s*'%blastdb_prefix)):
+        logger.error('blastdb exists! Please remove it or rerun!')
+        exit(1)
+    oosp.ex('makeblastdb', 
+                args=[
+                        '-dbtype', 'nucl', 
+                        '-title', 'genome_db',
+                        '-out', blastdb_prefix],
+                in_pipe=p1,
+                verbose=True)
+
+    # blast markers against contigs
+    logger.debug('blast markers against contigs')
+    p1 = SpooledTemporaryFile(dir=tmp_dir)
+    for marker in unique_markers:
+        SeqIO.write(marker_records[marker], p1, 'fasta')
+    p1.seek(0)
+    blastn_args = [
+                    '-db', blastdb_prefix,
+                    '-outfmt', '6',
+                    '-evalue', '1e-10',
+                    '-max_target_seqs', '1000000000']
+    if args['nprocs_main'] > 1:
+        blastn_args += ['-num_threads', str(args['nprocs_main'])]
+    output = oosp.ex(
+                        'blastn', 
+                        args=blastn_args,
+                        in_pipe=p1,
+                        get_out_pipe=True,
+                        verbose=True)
+    
+    #output = output.split('\n')
+    for line in output:
+        if line.strip() == '':
+            break
+        line = line.strip().split()
+        query = line[0]
+        target = line[1]
+        pstart = int(line[8])-1
+        pend = int(line[9])-1
+        genome = contigs[target]['genome']
+        if query not in genome2marker[genome]:
+            genome2marker[genome][query] = {}
+            if pstart < pend:
+                genome2marker[genome][query]['seq'] = contigs[target]['seq'][pstart:pend+1]
+            else:
+                genome2marker[genome][query]['seq'] = \
+                     str(Seq.Seq(
+                                contigs[target]['seq'][pend:pstart+1],
+                                IUPAC.unambiguous_dna).reverse_complement())
+            genome2marker[genome][query]['freq'] = [(0.0, 0.0, 0.0) for i in \
+                                                    range(len(genome2marker[genome][query]['seq']))]
+            genome2marker[genome][query]['seq'] = genome2marker[genome][query]['seq'].upper()
+
+    # remove database
+    for fn in glob.glob('%s*'%blastdb_prefix):
+        os.remove(fn)
+
+    logger.debug('Number of samples and genomes: %d'%len(genome2marker))
+    return genome2marker
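+
+
+
+
+def _sketch_extract_hit(contig_seq, sstart, send):
+    # Hypothetical, illustrative helper (not used by the pipeline): how
+    # add_ref_genomes turns the 1-based, inclusive sstart/send columns of a
+    # BLAST tabular (-outfmt 6) hit into a marker sequence. When
+    # sstart > send the hit lies on the minus strand, so the slice is
+    # reverse-complemented. A plain lookup table replaces Bio.Seq here and
+    # only handles A/C/G/T/N (an assumption of this sketch).
+    # Example: _sketch_extract_hit('GATTACA', 2, 4) returns 'ATT';
+    #          _sketch_extract_hit('GATTACA', 4, 2) returns 'AAT'.
+    complement = {'A': 'T', 'C': 'G', 'G': 'C', 'T': 'A', 'N': 'N'}
+    pstart, pend = int(sstart) - 1, int(send) - 1
+    if pstart < pend:
+        return contig_seq[pstart:pend + 1].upper()
+    fragment = contig_seq[pend:pstart + 1].upper()
+    return ''.join(complement[b] for b in reversed(fragment))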
+
+
+
+
+@trace_unhandled_exceptions
+def align_clean(args):
+    marker = args['marker']
+    sample2marker = shared_variables.sample2marker #args['sample2marker']
+    clade = args['clade']
+    gap_in_trailing_col = args['gap_in_trailing_col']
+    gap_trailing_col_limit = args['gap_trailing_col_limit']
+    gap_in_internal_col = args['gap_in_internal_col']
+    N_col = args['N_col']
+    N_count = args['N_count']
+    sample_in_marker = args['sample_in_marker']
+    tmp_dir = args['tmp_dir']
+    alignment_program = args['alignment_program']
+    alignment_fn = args['alignment_fn']
+
+    logger.debug('align and clean for marker: %s'%marker)
+    marker_file = NamedTemporaryFile(dir=tmp_dir, delete=False)
+    marker_fn = marker_file.name
+    sample_count = 0
+    for sample in iter(sample2marker.keys()):
+        if marker in iter(sample2marker[sample].keys()):
+            sample_count += 1
+            SeqIO.write(
+                SeqRecord.SeqRecord(
+                    id=sample,
+                    description='',
+                    seq=Seq.Seq(sample2marker[sample][marker]['seq'])),
+                marker_file,
+                'fasta')
+    marker_file.close()
+    ratio = float(sample_count) / len(sample2marker)
+    if ratio < sample_in_marker:
+        os.remove(marker_fn)
+        logger.debug('skip this marker because the fraction of samples '\
+                     'it is present in is %f < sample_in_marker'%ratio)
+        return {}, {}
+
+    alignment_file = align(marker_fn, alignment_program)
+    os.remove(marker_fn)
+
+    sample2seq = {}
+    sample2freq = {}
+    for rec in SeqIO.parse(alignment_file, 'fasta'):
+        sample = rec.name
+        sample2seq[sample] = list(str(rec.seq))
+        sample2freq[sample] = list(sample2marker[sample][marker]['freq'])
+        for i, c in enumerate(sample2seq[sample]):
+            if c == '-':
+                sample2freq[sample].insert(i, (0.0, 0.0, 0.0))
+        logger.debug('alignment length of sample %s is %d, %d'%(
+                        sample,
+                        len(sample2seq[sample]),
+                        len(sample2freq[sample])))
+    if alignment_fn:
+        shutil.copyfile(alignment_file.name, alignment_fn)
+
+    alignment_file.close()
+    logger.debug('alignment for marker %s is done'%marker)
+
+
+    if len(sample2seq) == 0:
+        logger.error('Fatal error in alignment step!')
+        exit(1)
+
+    sample2seq, sample2freq = clean_alignment(
+                                    sample2marker.keys(),
+                                    sample2seq, 
+                                    sample2freq,
+                                    gap_in_trailing_col,
+                                    gap_trailing_col_limit,
+                                    gap_in_internal_col,
+                                    N_count,
+                                    N_col)
+    logger.debug('cleaning for marker %s is done'%marker)
+
+    return sample2seq, sample2freq
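+
+
+
+
+def _sketch_align_freqs(aligned_seq, freqs, placeholder=(0.0, 0.0, 0.0)):
+    # Hypothetical, illustrative helper (not used by the pipeline): the same
+    # idea as the loop in align_clean. `freqs` has one entry per ungapped
+    # base; walking the aligned sequence left to right and inserting a
+    # placeholder at every '-' keeps freqs index-aligned with the gapped
+    # sequence, so len(result) == len(aligned_seq).
+    # Example: _sketch_align_freqs('A-C', ['fA', 'fC'])
+    # returns ['fA', (0.0, 0.0, 0.0), 'fC'].
+    out = list(freqs)
+    for i, c in enumerate(aligned_seq):
+        if c == '-':
+            out.insert(i, placeholder)
+    return out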
+
+
+
+
+def build_tree(
+        clade,
+        sample2marker, 
+        sample2order,
+        clade2num_markers,
+        sample_in_clade,
+        sample_in_marker,
+        gap_in_trailing_col,
+        gap_trailing_col_limit,
+        gap_in_internal_col,
+        N_count,
+        N_col,
+        gap_in_sample,
+        second_gap_in_sample,
+        long_gap_length,
+        long_gap_percentage,
+        p_value,
+        output_dir,
+        nprocs_align_clean,
+        alignment_program,
+        nprocs_raxml,
+        keep_alignment_files,
+        bootstrap_raxml,
+        save_sample2fullfreq,
+        use_threads):
+
+    # build the tree for each clade
+    if len(sample2marker) < sample_in_clade:
+        logger.debug(
+                        'skip clade %s because number of present samples '
+                        'is %d'%(clade, len(sample2marker)))
+        return
+
+    ofn_cladeinfo = os.path.join(output_dir, '%s.info'%clade)
+    ofile_cladeinfo = open(ofn_cladeinfo, 'w')
+
+    logger.debug('clade: %s', clade)
+    ofile_cladeinfo.write('clade: %s\n'%clade)
+    logger.debug('number of samples: %d', len(sample2marker))
+    ofile_cladeinfo.write('number of samples: %d\n'\
+                          %len(sample2marker))
+    if clade in clade2num_markers:
+        logger.debug('number of markers of the clade in db: %d'\
+                     %clade2num_markers[clade])
+        ofile_cladeinfo.write('number of markers of the clade in db: %d\n'\
+                          %clade2num_markers[clade])
+
+    # align sequences in each marker
+    markers = set([]) 
+    for sample in sample2marker:
+        if sample2order[sample] == 'first':
+            for marker in sample2marker[sample]:
+                if marker not in markers:
+                    markers.add(marker)
+    markers = sorted(list(markers))
+
+    logger.debug('number of used markers: %d'%len(markers))
+    ofile_cladeinfo.write('number of used markers: %d\n'%len(markers))
+    if clade in clade2num_markers:
+        logger.debug('fraction of used markers: %f'\
+                    %(float(len(markers)) / clade2num_markers[clade]))
+        ofile_cladeinfo.write('fraction of used markers: %f\n'\
+                    %(float(len(markers)) / clade2num_markers[clade]))
+
+    logger.debug('align and clean')
+    args_list = []
+
+    # parallelize
+    for i in range(len(markers)):
+        args_list.append({})
+        args_list[i]['marker'] = markers[i]
+        args_list[i]['clade'] = clade
+        args_list[i]['gap_in_trailing_col'] = gap_in_trailing_col
+        args_list[i]['gap_trailing_col_limit'] = gap_trailing_col_limit
+        args_list[i]['gap_in_internal_col'] = gap_in_internal_col
+        args_list[i]['N_count'] = N_count
+        args_list[i]['N_col'] = N_col
+        args_list[i]['sample_in_marker'] = sample_in_marker
+        args_list[i]['tmp_dir'] = output_dir
+        args_list[i]['alignment_program'] = alignment_program
+        if keep_alignment_files:
+            args_list[i]['alignment_fn'] = os.path.join(output_dir, markers[i] + '.marker_aligned')
+        else:
+            args_list[i]['alignment_fn'] = None
+
+    logger.debug('start to align_clean for all markers')
+    results = ooSubprocess.parallelize(
+                                       align_clean, 
+                                       args_list, 
+                                       nprocs_align_clean,
+                                       use_threads=use_threads)
+
+    sample2seqs, sample2freqs = zip(*results)
+    sample2fullseq = defaultdict(list)
+    sample2fullfreq = defaultdict(list)
+    empty_markers = []
+    pos = 0
+    marker_pos = []
+    for i in range(len(sample2seqs)):
+        #logger.debug('marker_name: %s, seq: %s'%(markers[i], sample2seqs[i]))
+        if len(sample2seqs[i]):
+            for sample in sample2seqs[i]:
+                sample2fullseq[sample] += sample2seqs[i][sample]
+                sample2fullfreq[sample] += sample2freqs[i][sample]
+            marker_pos.append([markers[i], pos])
+            pos += len(sample2seqs[i][sample])
+        else:
+            empty_markers.append(markers[i])
+
+    logger.debug(
+            'number of markers after deleting empty markers: %d',
+            len(markers) - len(empty_markers))
+    ofile_cladeinfo.write('number of markers after deleting '\
+                          'empty markers: %d\n'%
+                          (len(markers) - len(empty_markers)))
+
+    if clade in clade2num_markers:
+        logger.debug('fraction of used markers after deleting empty markers: '\
+                '%f'%(float(len(markers) - len(empty_markers)) / clade2num_markers[clade]))
+        ofile_cladeinfo.write('fraction of used markers after deleting empty '\
+                'markers: %f\n'\
+                %(float(len(markers) - len(empty_markers)) / clade2num_markers[clade]))
+
+
+    if len(sample2fullseq) == 0:
+        logger.debug('all markers were removed, skip this clade!')
+        ofile_cladeinfo.write('all markers were removed, skip this clade!\n')
+        return
+    
+    # remove long gaps
+    fullseq_length = len(next(iter(sample2fullseq.values())))
+    logger.debug('full sequence length before long_gap_length: %d'\
+                    %fullseq_length)
+    ofile_cladeinfo.write(
+                    'full sequence length before long_gap_length: %d\n'\
+                    %fullseq_length)
+
+    df_seq = pandas.DataFrame.from_dict(sample2fullseq, orient='index')
+    df_freq = pandas.DataFrame.from_dict(sample2fullfreq, orient='index')
+    del_cols = []
+    del_pos = []
+    for sample in sample2fullseq:
+        row = df_seq.loc[sample]
+        gap_in_cols = []
+        gap_in_pos = []
+        for i in range(len(row)):
+            if row[df_seq.columns[i]] == '-':
+                gap_in_cols.append(df_seq.columns[i])
+                gap_in_pos.append(i)
+            else:
+                if len(gap_in_cols) > long_gap_length:
+                    del_cols += gap_in_cols
+                    del_pos += gap_in_pos
+                gap_in_cols = []
+                gap_in_pos = []
+
+        if len(gap_in_cols) > long_gap_length:
+            del_cols += gap_in_cols
+            del_pos += gap_in_pos
+
+    del_cols = list(set(del_cols))
+    del_pos = sorted(list(set(del_pos)))
+    del_ratio = float(len(del_cols)) / len(df_seq.columns)
+    if del_ratio < long_gap_percentage:
+        df_seq.drop(del_cols, axis=1, inplace=True)
+        df_freq.drop(del_cols, axis=1, inplace=True)
+        for sample in sample2fullseq:
+            sample2fullseq[sample] = df_seq.loc[sample].tolist()
+            sample2fullfreq[sample] = df_freq.loc[sample].tolist()
+        fullseq_length = len(next(iter(sample2fullseq.values())))
+        logger.debug('full sequence length after long_gap_length: %d'\
+                        %fullseq_length)
+        ofile_cladeinfo.write(
+                        'full sequence length after long_gap_length: %d\n'\
+                        %fullseq_length)
+
+        for i in range(len(marker_pos)):
+            num_del = 0 
+            for p in del_pos:
+                if marker_pos[i][1] > p:
+                    num_del += 1
+            marker_pos[i][1] -= num_del
+    else:
+        logger.debug('do not apply long_gap_length because '\
+                        'long_gap_percentage is not satisfied. '\
+                        'del_ratio: %f'%del_ratio)
+        ofile_cladeinfo.write('do not apply long_gap_length because '\
+                                'long_gap_percentage is not satisfied. '\
+                                'del_ratio: %f\n'%del_ratio)
+
+    ofn_clademarker = os.path.join(output_dir, '%s.marker_pos'%clade)
+    with open(ofn_clademarker, 'w') as ofile_clademarker:
+        for m, p in marker_pos:
+            ofile_clademarker.write('%s\t%d\n'%(m, p))
+
+    # remove samples with too many gaps
+    logger.debug(
+            'number of samples before gap_in_sample: %d'\
+            %len(sample2fullseq))
+    ofile_cladeinfo.write(
+            'number of samples before gap_in_sample: %d\n'\
+            %len(sample2fullseq))
+    for sample in sample2marker:
+        ratio = float(sample2fullseq[sample].count('-')) / len(sample2fullseq[sample]) 
+        gap_ratio = gap_in_sample if (sample2order[sample] == 'first') else second_gap_in_sample
+        if ratio > gap_ratio:
+            del sample2fullseq[sample]
+            del sample2fullfreq[sample]
+            if sample2order[sample] == 'first':
+                logger.debug('remove sample %s by gap_in_sample %f'%(sample, ratio))
+            else:
+                logger.debug('remove sample %s by second_gap_in_sample %f'%(sample, ratio))
+    logger.debug(
+            'number of samples after gap_in_sample: %d'\
+            %len(sample2fullseq))
+    ofile_cladeinfo.write(
+            'number of samples after gap_in_sample: %d\n'\
+            %len(sample2fullseq))
+
+    # log gaps
+    sequential_gaps = []
+    all_gaps = []
+    for sample in sample2fullseq:
+        agap = 0
+        sgap = 0
+        row2 = sample2fullseq[sample]
+        for i in range(len(row2)):
+            if row2[i] == '-':
+                sgap += 1
+                agap += 1
+            elif sgap > 0:
+                sequential_gaps.append(sgap)
+                sgap = 0
+        all_gaps.append(agap)
+        
+    ofile_cladeinfo.write('all_gaps:\n' + statistics(all_gaps)[1])
+    if sequential_gaps == []:
+        sequential_gaps = [0]
+    ofile_cladeinfo.write('sequential_gaps:\n' + \
+                          statistics(sequential_gaps)[1])
+    ofile_cladeinfo.close()
+
+    # compute percentage of polymorphic sites
+    if save_sample2fullfreq:
+        with open(os.path.join(output_dir, 'sample2fullfreq.msgpack'), 'wb') as ofile:
+            msgpack.dump(sample2fullfreq, ofile)
+
+    ofn_pol = os.path.join(output_dir, '%s.polymorphic'%clade)
+    logger.debug('polymorphic file: %s'%ofn_pol)
+    with open(ofn_pol, 'w') as ofile:
+        ofile.write('#sample\tpercentage_of_polymorphic_sites\tavg_freq\tmedian_freq\tstd_freq\tmin_freq\tmax_freq\tq90_freq\tq10_freq\tavg_coverage\tmedian_coverage\tstd_coverage\tmin_coverage\tmax_coverage\tq90_coverage\tq10_coverage\n')
+        for sample in sample2fullfreq:
+            freqs = [x[0] for x in sample2fullfreq[sample] if x[0] > 0 and x[0] < 1 and x[2] < p_value]
+            coverages = [x[1] for x in sample2fullfreq[sample] if x[0] > 0 and x[0] < 1 and x[2] < p_value]
+            ofile.write('%s\t%f'%(sample, float(len(freqs)) * 100 / len(sample2fullfreq[sample])))
+            for vals in [freqs, coverages]:
+                if len(vals):
+                    ofile.write('\t%f\t%f\t%f\t%f\t%f\t%f\t%f'%(\
+                                numpy.average(vals),
+                                numpy.percentile(vals,50),
+                                numpy.std(vals),
+                                numpy.min(vals),
+                                numpy.max(vals),
+                                numpy.percentile(vals,90),
+                                numpy.percentile(vals,10),
+                               ))
+                else:
+                    ofile.write('\t0\t0\t0\t0\t0\t0\t0')
+            ofile.write('\n')
+
+
+    # save merged alignment
+    ofn_align = os.path.join(output_dir, '%s.fasta'%clade)
+    logger.debug('alignment file: %s'%ofn_align)
+    with open(ofn_align, 'w') as ofile:
+        for sample in sample2fullseq:
+            SeqIO.write(
+                SeqRecord.SeqRecord(
+                    id=sample,
+                    description='',
+                    seq=Seq.Seq(''.join(sample2fullseq[sample]))),
+                ofile, 
+                'fasta')
+
+    # produce tree
+    oosp = ooSubprocess.ooSubprocess() 
+    #ofn_tree = os.path.join(output_dir, '%s.tree'%clade)
+    #oosp.ex('FastTree', args=['-quiet', '-nt', ofn_align], out_fn=ofn_tree)
+    ofn_tree = clade + '.tree'
+    logger.debug('tree file: %s'%ofn_tree)
+    try:
+        for fn in glob.glob('%s/RAxML_*%s'
+                    %(os.path.abspath(output_dir), ofn_tree)):
+            os.remove(fn)
+        raxml_args = [
+                            '-s', os.path.abspath(ofn_align), 
+                            '-w', os.path.abspath(output_dir),
+                            '-n', ofn_tree,
+                            '-p', '1234'
+                        ]
+        if bootstrap_raxml:
+            raxml_args += ['-f', 'a',
+                           '-m', 'GTRGAMMA',
+                           '-x', '1234',
+                           '-N', str(bootstrap_raxml)]
+        else:
+            raxml_args += ['-m', 'GTRCAT']
+
+        if nprocs_raxml > 1:
+            raxml_args += ['-T', str(nprocs_raxml)]
+            raxml_prog = 'raxmlHPC-PTHREADS-SSE3'
+        else:
+            raxml_prog = 'raxmlHPC'
+        oosp.ex(
+                raxml_prog,
+                args=raxml_args
+                )
+    except:
+        logger.info('Cannot build the tree! Either there are too few samples '\
+                    'or RAxML (raxmlHPC) failed')
+        pass
+
+
+
+
+@trace_unhandled_exceptions
+def load_sample(args):
+    ifn_sample = args['ifn_sample']
+    logger.debug('load %s'%ifn_sample)
+    output_dir = args['output_dir'] 
+    ifn_markers = args['ifn_markers']
+    clades = args['clades']
+    kept_clade = args['kept_clade']
+    db = shared_variables.db
+    sing_clades = shared_variables.sing_clades
+    clade2num_markers = shared_variables.clade2num_markers
+    marker_in_clade = args['marker_in_clade']
+    kept_markers = args['kept_markers']
+    sample = ooSubprocess.splitext(ifn_sample)[0]
+    with open(ifn_sample, 'rb') as ifile:
+        marker2seq = msgpack.load(ifile, use_list=False)
+
+    if kept_clade:
+        if kept_clade == 'singleton':
+            nmarkers = len(marker2seq)
+        else:
+            # remove redundant clades and markers
+            nmarkers = 0
+            for marker in marker2seq.keys():
+                clade = db['markers'][marker]['taxon'].split('|')[-1]
+                if kept_markers:
+                    if marker in kept_markers and clade == kept_clade:
+                        nmarkers += 1
+                    else:
+                        del marker2seq[marker]
+                elif clade == kept_clade:
+                    nmarkers += 1
+                else:
+                    del marker2seq[marker]
+        total_num_markers = clade2num_markers[kept_clade] if not kept_markers else len(kept_markers)
+        if float(nmarkers) / total_num_markers < marker_in_clade:
+            marker2seq = {}
+
+        # reformat 'pileup'
+        for m in marker2seq:
+            freq = marker2seq[m]['freq']
+            marker2seq[m]['freq'] = [(0.0, 0.0, 0.0) for i in \
+                                     range(len(marker2seq[m]['seq']))]
+            for p in freq:
+                marker2seq[m]['freq'][p] = freq[p]
+            marker2seq[m]['seq'] = marker2seq[m]['seq'].replace('-', 'N') # make sure we have no gaps in the sequence
+
+        return marker2seq
+    else:
+        # remove redundant clades and markers
+        clade2n_markers = defaultdict(int)
+        remove_clade = []
+        for marker in marker2seq:
+            clade = db['markers'][marker]['taxon'].split('|')[-1]
+            if 's__' in clade or clade in sing_clades:
+                clade2n_markers[clade] = clade2n_markers[clade] + 1
+            else:
+                remove_clade.append(clade)
+        remove_clade += [clade for clade in clade2n_markers if
+                         float(clade2n_markers[clade]) \
+                         / float(clade2num_markers[clade]) < marker_in_clade]
+        remove_marker = [marker for marker in marker2seq if
+                         db['markers'][marker]['taxon'].split('|')[-1] in
+                         remove_clade]
+        for marker in remove_marker:
+            del marker2seq[marker]
+     
+        sample_clades = set([])
+        for marker in marker2seq:
+            clade = db['markers'][marker]['taxon'].split('|')[-1]
+            sample_clades.add(clade)
+        return sample_clades
+
+
+
+
+def load_all_samples(args, sample2order, kept_clade, kept_markers):
+    ifn_samples = args['ifn_samples'] + args['ifn_second_samples'] 
+    if args['ifn_representative_sample']:
+        ifn_samples.append(args['ifn_representative_sample'])
+    ifn_samples = sorted(list(set(ifn_samples)))
+    if not ifn_samples:
+        return None
+    else:
+        args_list = []
+        for ifn_sample in ifn_samples:
+            func_args = {}
+            func_args['ifn_sample'] = ifn_sample
+            func_args['kept_clade'] = kept_clade
+            func_args['kept_markers'] = kept_markers
+            for k in [ 
+                      'output_dir',
+                      'ifn_markers', 'nprocs_load_samples', 
+                      'clades',
+                      'mpa_pkl',
+                      ]:
+                func_args[k] = args[k]
+            sample = ooSubprocess.splitext(ifn_sample)[0]
+            if sample2order[sample] == 'first':
+                func_args['marker_in_clade'] = args['marker_in_clade']
+            else:
+                func_args['marker_in_clade'] = args['second_marker_in_clade']
+            args_list.append(func_args)
+
+        results = ooSubprocess.parallelize(
+                                           load_sample,
+                                           args_list,
+                                           args['nprocs_load_samples'],
+                                           use_threads=args['use_threads'])
+        if kept_clade:
+            sample2marker = {}
+            for i in range(len(ifn_samples)):
+                sample = ooSubprocess.splitext(ifn_samples[i])[0]
+                if len(results[i]): # skip samples with no markers
+                    sample2marker[sample] = results[i] 
+            return sample2marker
+        else:
+            all_clades = set([])
+            for r in results:
+                for c in r:
+                    all_clades.add(c)
+            all_clades = sorted(list(all_clades))
+            return all_clades
+
+
+
+
+def strainer(args):
+    # auto-set some params
+    if args['relaxed_parameters']:
+        args['sample_in_marker'] = 0.5
+        args['N_in_marker'] = 0.5
+        args['gap_in_sample'] = 0.5
+    elif args['relaxed_parameters2']:
+        args['sample_in_marker'] = 0.2
+        args['N_in_marker'] = 0.8
+        args['gap_in_sample'] = 0.8
+    elif args['relaxed_parameters3']:
+        args['gap_in_trailing_col'] = 0.9
+        args['gap_in_internal_col'] = 0.9
+        args['gap_in_sample'] = 0.9
+        args['second_gap_in_sample'] = 0.5
+        args['sample_in_marker'] = 0.1
+        args['marker_in_clade'] = 0.1
+        args['second_marker_in_clade'] = 0.1
+
+    if args['keep_full_alignment_files']:
+        args['keep_alignment_files'] = True
+        args['marker_strip_length'] = 0
+        
+    if os.path.isfile(args['clades'][0]):
+        with open(args['clades'][0], 'r') as ifile:
+            args['clades'] = [line.strip() for line in ifile]
+
+ 
+    # check conditions
+    ooSubprocess.mkdir(args['output_dir'])
+    with open(os.path.join(args['output_dir'], 'arguments.txt'), 'w') as ofile:
+        #for para in args:
+        #    ofile.write('%s\n'%para)
+        #    ofile.write('%s\n'%args[para])
+        ofile.write('%s\n'%' '.join(sys.argv))
+        ofile.write('%s'%args)
+
+    if args['ifn_markers'] == None and args['ifn_ref_genomes'] != None:
+        logger.error('ifn_ref_genomes is set but ifn_markers is not set!')
+        exit(1)
+
+    if args['ifn_markers'] != None and args['ifn_ref_genomes'] != None:
+        if len(args['clades']) != 1 or args['clades'] == ['all']:
+            logger.error('Only one clade can be specified when adding '\
+                         'reference genomes')
+            exit(1)
+
+    if args['ifn_markers'] == None and args['clades'] == ['singleton']:
+        logger.error('clades is set to singleton but ifn_markers is not set!')
+        exit(1)
+
+    if args['ifn_samples'] == []:
+        args['clades'] = ['singleton']
+
+    if args['nprocs_load_samples'] == None:
+        args['nprocs_load_samples'] = args['nprocs_main']
+
+    if args['nprocs_align_clean'] == None:
+        args['nprocs_align_clean'] = args['nprocs_main']
+
+    if args['nprocs_raxml'] == None:
+        args['nprocs_raxml'] = args['nprocs_main']
+
+    if args['clades'] == ['singleton']:
+        shared_variables.db = None
+        shared_variables.sing_clades = []
+        nmarkers = 0
+        for rec in SeqIO.parse(args['ifn_markers'], 'fasta'):
+            nmarkers += 1
+        clade2num_markers = {'singleton': nmarkers}
+        shared_variables.clade2num_markers = clade2num_markers
+    else:
+        # load mpa_pkl
+        logger.info('Load mpa_pkl')
+        db = pickle.load(bz2.BZ2File(args['mpa_pkl']))
+        shared_variables.db = db
+
+        # reduce and convert to shared memory
+        #logger.debug('converting db')
+        db['taxonomy'] = db['taxonomy'].keys()
+        for m in db['markers']:
+            del db['markers'][m]['clade']
+            del db['markers'][m]['ext']
+            del db['markers'][m]['len']
+            del db['markers'][m]['score']
+        gc.collect()
+        #logger.debug('converted db')
+        
+        # get clades from db
+        logger.info('Get clades from db')
+        sing_clades, clade2num_markers, clade2subclades = get_db_clades(db)
+        shared_variables.sing_clades = sing_clades
+        shared_variables.clade2num_markers = clade2num_markers
+
+    # set order
+    sample2order = {}
+    if args['ifn_representative_sample']:
+        sample = ooSubprocess.splitext(args['ifn_representative_sample'])[0]
+        sample2order[sample] = 'first'
+
+    for ifn in args['ifn_samples']:
+        sample = ooSubprocess.splitext(ifn)[0]
+        sample2order[sample] = 'first'
+
+    for ifn in args['ifn_second_samples']:
+        sample = ooSubprocess.splitext(ifn)[0]
+        if sample not in sample2order:
+            sample2order[sample] = 'second'
+    
+    kept_markers = set([])
+    if args['marker_list_fn']:
+        with open(args['marker_list_fn'], 'r') as ifile:
+            for line in ifile:
+                kept_markers.add(line.strip())
+        if not kept_markers:
+            raise Exception('Number of markers in %s is 0!'%args['marker_list_fn'])
+    elif args['ifn_representative_sample']:
+        with open(args['ifn_representative_sample'], 'rb') as ifile:
+            repr_marker2seq = msgpack.load(ifile, use_list=False)
+        if args['clades'] != ['all'] and args['clades'] != ['singleton']:
+            for marker in repr_marker2seq:
+                clade = db['markers'][marker]['taxon'].split('|')[-1]
+                if clade in args['clades']:
+                    kept_markers.add(marker)
+        else:
+            kept_markers = set(repr_marker2seq.keys())
+        logger.debug('Number of markers in the representative '\
+                     'sample: %d'%len(kept_markers))
+        if not kept_markers:
+            raise Exception('Number of markers in the representative sample is 0!')
+    
+    # get clades from samples
+    if args['clades'] == ['all']:
+        logger.info('Get clades from samples')
+        args['clades'] = load_all_samples(args, 
+                                          sample2order,
+                                          kept_clade=None,
+                                          kept_markers=kept_markers)
+
+    if args['print_clades_only']:
+        for c in args['clades']:
+            if c.startswith('s__'):
+                print c
+            else:
+                print c, '(%s)'%(','.join(list(clade2subclades[c])))
+        return
+
+    # add reference genomes
+    ref2marker = defaultdict(dict)
+    if args['ifn_markers'] != None and args['ifn_ref_genomes'] != None:
+        logger.info('Add reference genomes')
+        marker_records = {}
+        for rec in SeqIO.parse(open(args['ifn_markers'], 'r'), 'fasta'):
+            if rec.id in kept_markers or (not kept_markers):
+                marker_records[rec.id] = rec
+        add_ref_genomes(
+                        ref2marker, 
+                        marker_records, 
+                        args['ifn_ref_genomes'],
+                        args['output_dir'])
+
+        # remove bad reference genomes
+        if not kept_markers:
+            nmarkers = shared_variables.clade2num_markers[args['clades'][0]]
+        else:
+            nmarkers = len(kept_markers)
+        remove_ref = []
+        mic = args['second_marker_in_clade'] if args['add_reference_genomes_as_second_samples'] else args['marker_in_clade']
+        for ref in ref2marker:
+            if float(len(ref2marker[ref])) / nmarkers < mic:
+                remove_ref.append(ref)
+        for ref in remove_ref:
+            del ref2marker[ref]
+    ref2marker = dict(ref2marker)
+    for ref in ref2marker:
+        if args['add_reference_genomes_as_second_samples']:
+            sample2order[ref] = 'second'
+        else:
+            sample2order[ref] = 'first'
+
+    # build tree for each clade
+    for clade in args['clades']:
+        logger.info('Build the tree for %s'%clade)
+
+        # load samples and reference genomes
+        sample2marker = load_all_samples(args, 
+                                         sample2order,
+                                         kept_clade=clade,
+                                         kept_markers=kept_markers)
+
+        for r in ref2marker:
+            sample2marker[r] = ref2marker[r]
+        logger.debug('number of samples and reference genomes: %d'%len(sample2marker))
+
+        for sample in sample2marker:
+            logger.debug('number of markers in sample %s: %d'%(
+                          sample,
+                          len(sample2marker[sample])))
+
+        # Filter sequences
+        logger.debug('Filter consensus marker sequences')
+        for sample in sample2marker:
+            sample2marker[sample] = filter_sequence(
+                                                    sample,
+                                                    sample2marker[sample],
+                                                    args['marker_strip_length'],
+                                                    args['N_in_marker'])
+
+        # remove samples with percentage of markers less than marker_in_clade
+        logger.debug('remove samples with percentage of markers '\
+                     'less than marker_in_clade')
+        for sample in sample2marker.keys():
+            if len(sample2marker[sample]):
+                if clade == 'singleton':
+                    c = 'singleton'
+                else:
+                    marker = sample2marker[sample].keys()[0]
+                    c = db['markers'][marker]['taxon'].split('|')[-1]
+                if len(sample2marker[sample]) / \
+                    float(clade2num_markers[c]) < args['marker_in_clade']:
+                        del sample2marker[sample]
+            else:
+                del sample2marker[sample]
+
+        # build trees
+        shared_variables.sample2marker = sample2marker
+        build_tree(
+            clade=clade,
+            sample2marker=sample2marker, 
+            sample2order=sample2order,
+            clade2num_markers=clade2num_markers,
+            sample_in_clade=args['sample_in_clade'],
+            sample_in_marker=args['sample_in_marker'],
+            gap_in_trailing_col=args['gap_in_trailing_col'],
+            gap_trailing_col_limit=args['gap_trailing_col_limit'],
+            gap_in_internal_col=args['gap_in_internal_col'],
+            N_count=args['N_count'],
+            N_col=args['N_col'],
+            gap_in_sample=args['gap_in_sample'],
+            second_gap_in_sample=args['second_gap_in_sample'],
+            long_gap_length=args['long_gap_length'],
+            long_gap_percentage=args['long_gap_percentage'],
+            p_value=args['p_value'],
+            output_dir=args['output_dir'],
+            nprocs_align_clean=args['nprocs_align_clean'],
+            alignment_program=args['alignment_program'],
+            nprocs_raxml=args['nprocs_raxml'],
+            keep_alignment_files=args['keep_alignment_files'],
+            bootstrap_raxml=args['bootstrap_raxml'],
+            save_sample2fullfreq=args['save_sample2fullfreq'],
+            use_threads=args['use_threads'])
+        del shared_variables.sample2marker
+        del sample2marker
+        #gc.collect()
+
+    logger.info('Finished!')
+
+
+
+
+def check_dependencies(args):
+    programs = ['muscle']
+
+    if args['ifn_markers'] != None or args['ifn_ref_genomes'] != None:
+        programs += ['blastn', 'makeblastdb']
+
+    if args['nprocs_main'] > 1:
+        programs += ['raxmlHPC-PTHREADS-SSE3']
+    else:
+        programs += ['raxmlHPC']
+
+    for prog in programs:
+        if not which.is_exe(prog):
+            logger.error('Cannot find %s in the executable path!'%prog)
+            exit(1)
+
+
+
+
+if __name__ == "__main__":
+    args = read_params()
+    check_dependencies(args)
+    strainer(args)
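The long-gap filter in build_tree above works in two steps. First, it collects every alignment column that falls inside a run of '-' longer than long_gap_length in any sample. Second, it deletes those columns only if they make up less than long_gap_percentage of the alignment. When columns are deleted, each marker position is shifted left by the number of deleted columns that precede it.

The sketch below reproduces that logic without pandas, under a few assumptions. The function names are hypothetical and are not part of StrainPhlAn. Sequences are taken to be equal-length lists of characters. Unlike the script, which edits marker_pos in place, the sketch returns new objects.

```python
# Hypothetical sketch of the long_gap_length / long_gap_percentage filter in
# build_tree; the names are illustrative, not StrainPhlAn API.

def long_gap_columns(seqs, long_gap_length):
    """Sorted indices of columns lying inside a '-' run longer than
    long_gap_length in at least one sequence."""
    cols = set()
    for seq in seqs.values():
        run = []
        for i, ch in enumerate(seq):
            if ch == '-':
                run.append(i)
            else:
                if len(run) > long_gap_length:
                    cols.update(run)
                run = []
        if len(run) > long_gap_length:  # run that reaches the sequence end
            cols.update(run)
    return sorted(cols)


def remove_long_gaps(seqs, marker_pos, long_gap_length, long_gap_percentage):
    """Return (seqs, marker_pos) with long-gap columns removed, or the inputs
    unchanged when too large a fraction of the columns would be deleted."""
    length = len(next(iter(seqs.values())))
    cols = long_gap_columns(seqs, long_gap_length)
    if float(len(cols)) / length >= long_gap_percentage:
        return seqs, marker_pos
    drop = set(cols)
    new_seqs = {s: [c for i, c in enumerate(seq) if i not in drop]
                for s, seq in seqs.items()}
    # each marker position moves left by the number of deleted columns before it
    new_pos = [[m, p - sum(1 for d in cols if p > d)] for m, p in marker_pos]
    return new_seqs, new_pos
```

Example: for `{'a': list('AC---GT'), 'b': list('ACG-TGT')}` with long_gap_length=2, columns 2 to 4 are flagged. They make up 3/7 (about 0.43) of the alignment, so they are removed only when long_gap_percentage is greater than that.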
diff --git a/strainphlan_src/add_metadata_tree.py b/strainphlan_src/add_metadata_tree.py
new file mode 100755
index 0000000..3aec59a
--- /dev/null
+++ b/strainphlan_src/add_metadata_tree.py
@@ -0,0 +1,109 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+
+import sys
+import os
+import argparse as ap
+import pandas
+import copy
+import ConfigParser
+import dendropy
+import numpy
+
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--ifn_trees', nargs='+', required=True, default=None, type=str)
+    p.add_argument('--ifn_metadatas', nargs='+', required=True, default=None, type=str)
+    p.add_argument('--string_to_remove', 
+                   required=False, default='', type=str,
+                   help='string to be removed in the tree node names')
+    p.add_argument(
+                    '--metadatas', 
+                    nargs='+', 
+                    required=False, 
+                    default=['all'],
+                    type=str,
+                    help='The metadata fields that you want to add. '\
+                          'Default: add all metadata from the first line.')
+    return vars(p.parse_args())
+
+
+def get_index_col(ifn):
+    with open(ifn, 'r') as ifile:
+        line = ifile.readline()
+        line = line.strip().split()
+        for i in range(len(line)):
+            if line[i].upper() == 'SAMPLEID':
+                return i
+    return -1
+
+
+def main(args):
+    add_fields = args['metadatas']
+    for ifn_tree in args['ifn_trees']:
+        print 'Input:', ifn_tree
+        df_list = []
+        samples = []
+        for ifn in args['ifn_metadatas']:
+            index_col = get_index_col(ifn)
+            df = pandas.read_csv(
+                ifn,
+                sep='\t', 
+                dtype=unicode,
+                header=0, 
+                index_col=index_col)
+            df = df.transpose()
+            df_list.append(df)
+            samples += df.columns.values.tolist()
+            if add_fields == ['all']:
+                with open(ifn, 'r') as ifile:
+                    add_fields = [f for f in ifile.readline().strip().split('\t') \
+                                  if f.upper() != 'SAMPLEID']
+        print 'number of samples in metadata: %d'%len(samples)
+        count = 0
+        with open(ifn_tree, 'r') as ifile:
+            line = ifile.readline()
+        line = line.replace(args['string_to_remove'], '')
+        tree = dendropy.Tree(stream=open(ifn_tree, 'r'), schema='newick')
+        for node in tree.leaf_nodes():
+            sample = node.get_node_str().strip("'")
+            sample = sample.replace(args['string_to_remove'], '')
+            prefixes = [prefix for prefix in 
+                            ['k__', 'p__', 'c__', 'o__', 
+                             'f__', 'g__', 's__'] \
+                        if prefix in sample]
+
+            metadata = sample
+            if len(prefixes) == 0:
+                count += 1
+                for meta in add_fields:
+                    old_meta = meta
+                    for i in range(len(df_list)):
+                        if sample in df_list[i].columns.values.tolist():
+                            df = df_list[i]
+                            if meta.lower() in df[sample]:
+                                meta = meta.lower()
+                            elif meta.upper() in df[sample]:
+                                meta = meta.upper()
+                            elif meta.title() in df[sample]:
+                                meta = meta.title()
+                            if meta in df[sample]:
+                                metadata += '|%s-%s'%(
+                                                old_meta,
+                                                str(df[sample][meta]).replace(':','_'))
+                                break # take the first metadata
+
+            line = line.replace(sample + ':', metadata + ':')
+
+        ofn_tree = ifn_tree + '.metadata'
+        print 'Number of samples in tree: %d'%count
+        print 'Output:', ofn_tree
+        with open(ofn_tree, 'w') as ofile:
+            ofile.write(line)
+
+if __name__ == "__main__":
+    args = read_params()
+    main(args)
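add_metadata_tree.py rewrites each leaf label "sample:" in the Newick text as "sample|field-value|...:". For each field it tries the name in lower, upper and title case before trying it as given. It replaces ':' in values with '_', because ':' separates a label from its branch length in Newick. It leaves untouched any leaf whose name contains a taxonomic prefix such as s__.

The dependency-free sketch below follows that behaviour under two assumptions. The metadata is already loaded into one dict of dicts, instead of the script's list of transposed pandas tables. The leaf names are passed in directly, instead of being read with dendropy. Both function names are hypothetical.

```python
# Hypothetical sketch of the leaf relabelling in add_metadata_tree.py.
TAXON_PREFIXES = ('k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__')


def annotate_label(sample, record, fields):
    """Append '|field-value' for every requested field found in record."""
    label = sample
    for field in fields:
        # same lookup order as the script: lower, upper, title, then as given
        for key in (field.lower(), field.upper(), field.title(), field):
            if key in record:
                # ':' would be read as a branch-length separator in Newick
                label += '|%s-%s' % (field, str(record[key]).replace(':', '_'))
                break
    return label


def annotate_newick(newick, leaves, metadata, fields):
    """Rewrite 'sample:' to 'annotated_label:' for each metadata-bearing leaf."""
    for sample in leaves:
        if any(p in sample for p in TAXON_PREFIXES) or sample not in metadata:
            continue  # reference genomes and unknown samples keep their names
        newick = newick.replace(
            sample + ':', annotate_label(sample, metadata[sample], fields) + ':')
    return newick
```

Like the script, this uses plain substring replacement. A label that merely ends with another sample's name (for example "xS1:" when "S1" is a sample) would also be rewritten, so sample names should not be suffixes of one another.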
diff --git a/strainphlan_src/build_tree_single_strain.py b/strainphlan_src/build_tree_single_strain.py
new file mode 100755
index 0000000..c1ede91
--- /dev/null
+++ b/strainphlan_src/build_tree_single_strain.py
@@ -0,0 +1,146 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '17 Sep 2015'
+
+import sys
+import os
+import argparse 
+import numpy
+from Bio import SeqIO
+import glob
+
+
+def read_params():
+    p = argparse.ArgumentParser()
+    p.add_argument(
+        '--ifn_alignments', 
+        nargs='+',
+        required=True, 
+        default=None, 
+        type=str,
+        help='One or more alignment files.')
+    p.add_argument(
+        '--log_ofn', 
+        required=True, 
+        default=None, 
+        type=str,
+        help='The log file.')
+    p.add_argument(
+        '--nprocs', 
+        required=True, 
+        default=None, 
+        type=int,
+        help='Number of processors.')
+    p.add_argument(
+        '--bootstrap_raxml', 
+        required=False, 
+        default=0, 
+        type=int,
+        help='The number of runs for bootstrapping when building the tree. '\
+             'Default 0.')
+    p.add_argument(
+        '--verbose', 
+        required=False, 
+        dest='quiet',
+        action='store_false',
+        help='Show all information. Default "not set".')
+    p.set_defaults(quiet=True)
+
+    return p.parse_args()
+
+
+def run(cmd):
+    print cmd
+    os.system(cmd)
+
+
+def main(args):
+    lfile = open(args.log_ofn, 'w')
+    for ifn_alignment in args.ifn_alignments:
+        if 'remove_' in ifn_alignment:
+            continue
+        sample2polrate = {}
+        ifn_polymorphic = ifn_alignment.replace('.fasta', '.polymorphic')
+        singles = set([])
+        with open(ifn_polymorphic, 'r') as ifile:
+            for line in ifile:
+                if line[0] == '#':
+                    continue
+                line = line.strip().split()
+                val = float(line[1])
+                if line[0][:3] in ['k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__', 't__']:
+                    singles.add(line[0])
+                    continue
+                sample2polrate[line[0]] = val
+        median = numpy.median(sample2polrate.values())
+        std = numpy.std(sample2polrate.values())
+        for s in sample2polrate:
+            if sample2polrate[s] <= median + std:
+                singles.add(s)
+
+        if len(sample2polrate):
+            log_line = '%s\t%d\t%d\t%f\n'%\
+                        (os.path.basename(ifn_polymorphic).replace('.polymorphic', ''), 
+                        len(singles), 
+                        len(sample2polrate), 
+                        float(len(singles)) / len(sample2polrate))
+        else:
+            log_line = '%s\t%d\t%d\t%f\n'%\
+                        (os.path.basename(ifn_polymorphic).replace('.polymorphic', ''), 
+                        len(singles), 
+                        len(sample2polrate), 
+                        0)
+        lfile.write(log_line)
+
+        ifn_alignment2 = ifn_alignment.replace('.fasta', '') + '.remove_multiple_strains.fasta'
+        with open(ifn_alignment2, 'w') as ofile:
+            for rec in SeqIO.parse(open(ifn_alignment, 'r'), 'fasta'):
+                if rec.name in singles:
+                    SeqIO.write(rec, ofile, 'fasta')
+
+        with open(ifn_alignment2 + '.log', 'w') as ofile:
+            ofile.write(log_line)
+
+        output_suffix = os.path.basename(ifn_alignment2).replace('.polymorphic', '').replace('.fasta', '')
+        output_suffix += '.tree'
+        if args.bootstrap_raxml:
+            cmd = 'raxmlHPC-PTHREADS-SSE3 '
+            cmd += '-f a '
+            cmd += '-m GTRGAMMA '
+            #cmd += '-b 1234 '
+            cmd += '-x 1234 '
+            cmd += '-N %d '%(args.bootstrap_raxml)
+            cmd += '-s %s '%os.path.abspath(ifn_alignment2)
+            cmd += '-w %s '%os.path.abspath(os.path.dirname(ifn_alignment2))
+            cmd += '-n %s '%output_suffix 
+            cmd += '-p 1234 '
+        else:
+            cmd = 'raxmlHPC-PTHREADS-SSE3 '
+            cmd += '-m GTRCAT '
+            cmd += '-s %s '%os.path.abspath(ifn_alignment2)
+            cmd += '-w %s '%os.path.abspath(os.path.dirname(ifn_alignment2))
+            cmd += '-n %s '%output_suffix
+            cmd += '-p 1234 '
+        if args.nprocs:
+            cmd += '-T %d '%(args.nprocs)
+        raxfns = glob.glob('%s/RAxML_*%s*'%(os.path.abspath(os.path.dirname(ifn_alignment2)), output_suffix))
+        for fn in raxfns:
+            os.remove(fn)
+        '''
+        if len(raxfns) == 0:
+            run(cmd)
+        '''
+        run(cmd)
+    lfile.close()
+     
+
+
+
+
+if __name__ == "__main__":
+    args = read_params()
+    main(args)
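build_tree_single_strain.py selects samples from the .polymorphic file before rebuilding the tree. Entries named with a taxonomic prefix (reference genomes) are always kept. Any other sample is kept only if its percentage of polymorphic sites is at most the median plus one standard deviation across the samples.

The sketch below expresses that rule on a plain dict, under these assumptions:
- The function name is hypothetical.
- The input dict replaces the file parsing done by the script.
- It uses the standard-library statistics module (Python 3.4+) instead of numpy. statistics.pstdev is the population standard deviation, which matches numpy.std with its default ddof=0.

```python
# Hypothetical sketch of the single-strain sample filter in
# build_tree_single_strain.py (requires Python 3.4+ for `statistics`).
import statistics

TAXON_PREFIXES = ('k__', 'p__', 'c__', 'o__', 'f__', 'g__', 's__', 't__')


def select_single_strain(polrates):
    """polrates: name -> percentage of polymorphic sites.
    Return the set of names kept for the single-strain tree."""
    # reference genomes (taxonomic prefix) are always kept
    singles = {n for n in polrates if n[:3] in TAXON_PREFIXES}
    samples = {n: v for n, v in polrates.items() if n not in singles}
    if samples:
        vals = list(samples.values())
        cutoff = statistics.median(vals) + statistics.pstdev(vals)
        # likely single-strain samples have a low polymorphic rate
        singles.update(n for n, v in samples.items() if v <= cutoff)
    return singles
```

For rates of 1, 2, 3 and 30 the cutoff is 2.5 + ~12.1, so only the sample at 30 is dropped as a likely multi-strain sample.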
diff --git a/strainphlan_src/compute_distance.py b/strainphlan_src/compute_distance.py
new file mode 100755
index 0000000..f72b38d
--- /dev/null
+++ b/strainphlan_src/compute_distance.py
@@ -0,0 +1,195 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '1 Sep 2014'
+
+import sys
+import os
+ABS_PATH = os.path.abspath(sys.argv[0])
+MAIN_DIR = os.path.dirname(ABS_PATH)
+os.environ['PATH'] += ':%s'%MAIN_DIR
+sys.path.append(MAIN_DIR)
+
+from mixed_utils import dist2file, statistics
+import argparse as ap
+from Bio import SeqIO, Seq, SeqRecord
+from collections import defaultdict
+import numpy
+from ooSubprocess import ooSubprocess
+
+
+'''
+SUBST = {
+        'A':{'A':0.0, 'C':1.0, 'G':1.0, 'T':1.0, '-':1.0},
+        'C':{'A':1.0, 'C':0.0, 'G':1.0, 'T':1.0, '-':1.0}, 
+        'G':{'A':1.0, 'C':1.0, 'G':0.0, 'T':1.0, '-':1.0}, 
+        'T':{'A':1.0, 'C':1.0, 'G':1.0, 'T':0.0, '-':1.0},
+        '-':{'A':1.0, 'C':1.0, 'G':1.0, 'T':1.0, '-':0.0}}
+'''
+
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--ifn_alignment', required=True, default=None, type=str)
+    p.add_argument('--ofn_prefix', required=True, default=None, type=str)
+    p.add_argument('--count_gaps', 
+                   required=False,
+                   dest='ignore_gaps', 
+                   action='store_false')
+    p.set_defaults(ignore_gaps=True)
+    p.add_argument('--overwrite', 
+                   required=False,
+                   dest='overwrite', 
+                   action='store_true')
+    p.set_defaults(overwrite=True)
+
+    return vars(p.parse_args())
+
+
+def get_dist(seq1, seq2, ignore_gaps):
+    if len(seq1) != len(seq2):
+        print >> sys.stderr, 'Error: Two sequences have different lengths!'
+        print >> sys.stderr, 'Cannot compute the distance!'
+        sys.exit(1)
+
+    abs_dist = 0.0
+    for i in range(len(seq1)):
+        if seq1[i] != seq2[i]:
+            if ignore_gaps:
+                if seq1[i] != '-' and seq2[i] != '-':
+                    abs_dist += 1.0
+            else:
+                abs_dist += 1.0
+
+    abs_sim = len(seq1) - abs_dist
+    rel_dist = abs_dist / float(len(seq1))
+    rel_sim = 1.0 - rel_dist
+    return abs_dist, rel_dist, abs_sim, rel_sim
+    
+
+def compute_dist_matrix(ifn_alignment, ofn_prefix, ignore_gaps):
+    ofn_abs_dist = ofn_prefix + '.abs_dist'
+    if os.path.isfile(ofn_abs_dist):
+        print 'File %s exists, skip!'%ofn_abs_dist
+        return
+    print 'Compute dist_matrix for %s'%ofn_abs_dist
+
+    with open(ifn_alignment, 'r') as ifile:
+        recs = list(SeqIO.parse(ifile, 'fasta'))
+    abs_dist = numpy.zeros((len(recs), len(recs)))
+    abs_dist_flat = []
+    rel_dist = numpy.zeros((len(recs), len(recs)))
+    rel_dist_flat = []
+    abs_sim = numpy.zeros((len(recs), len(recs)))
+    abs_sim_flat = []
+    rel_sim = numpy.zeros((len(recs), len(recs)))
+    rel_sim_flat = []
+
+    for i in range(len(recs)):
+        for j in range(i+1, len(recs)):
+            abs_d, rel_d, abs_s, rel_s = get_dist(recs[i].seq, 
+                                                  recs[j].seq,
+                                                  ignore_gaps)
+
+            abs_dist[i][j] = abs_d
+            abs_dist[j][i] = abs_d
+            abs_dist_flat.append(abs_d)
+            
+            rel_dist[i][j] = rel_d
+            rel_dist[j][i] = rel_d
+            rel_dist_flat.append(rel_d)
+
+            abs_sim[i][j] = abs_s
+            abs_sim[j][i] = abs_s
+            abs_sim_flat.append(abs_s)
+            
+            rel_sim[i][j] = rel_s
+            rel_sim[j][i] = rel_s
+            rel_sim_flat.append(rel_s)
+
+
+    labels = [rec.name for rec in recs]
+    oosp = ooSubprocess()
+
+    ofn_abs_dist = ofn_prefix + '.abs_dist'
+    dist2file(abs_dist, labels, ofn_abs_dist)
+    with open(ofn_abs_dist + '.info', 'w') as ofile:
+        ofile.write(statistics(abs_dist_flat)[1])
+    '''
+    if len(abs_dist_flat) > 0:
+        oosp.ex('hclust2.py',
+        args=['-i', ofn_abs_dist,
+              '-o', ofn_abs_dist + '.png',
+              '--f_dist_f', 'euclidean',
+              '--s_dist_f', 'euclidean',
+              '-l', '--dpi', '300',
+              '--flabel_size', '5',
+              '--slabel_size', '5',
+              '--max_flabel_len', '200'])
+    '''
+
+    ofn_rel_dist = ofn_prefix + '.rel_dist'
+    dist2file(rel_dist, labels, ofn_rel_dist)
+    with open(ofn_rel_dist + '.info', 'w') as ofile:
+        ofile.write(statistics(rel_dist_flat)[1])
+    '''
+    if len(rel_dist_flat) > 0:
+        oosp.ex('hclust2.py',
+        args=['-i', ofn_rel_dist,
+              '-o', ofn_rel_dist + '.png',
+              '--f_dist_f', 'euclidean',
+              '--s_dist_f', 'euclidean',
+              '-l', '--dpi', '300',
+              '--flabel_size', '5',
+              '--slabel_size', '5',
+              '--max_flabel_len', '200'])
+    '''
+
+    ofn_abs_sim = ofn_prefix + '.abs_sim'
+    dist2file(abs_sim, labels, ofn_abs_sim)
+    with open(ofn_abs_sim + '.info', 'w') as ofile:
+        ofile.write(statistics(abs_sim_flat)[1])
+    '''
+    if len(abs_sim_flat) > 0:
+        oosp.ex('hclust2.py',
+        args=['-i', ofn_abs_sim,
+              '-o', ofn_abs_sim + '.png',
+              '--f_dist_f', 'euclidean',
+              '--s_dist_f', 'euclidean',
+              '-l', '--dpi', '300',
+              '--flabel_size', '5',
+              '--slabel_size', '5',
+              '--max_flabel_len', '200'])
+    '''
+
+    ofn_rel_sim = ofn_prefix + '.rel_sim'
+    dist2file(rel_sim, labels, ofn_rel_sim)
+    with open(ofn_rel_sim + '.info', 'w') as ofile:
+        ofile.write(statistics(rel_sim_flat)[1])
+    '''
+    if len(rel_sim_flat) > 0:
+        oosp.ex('hclust2.py',
+        args=['-i', ofn_rel_sim,
+              '-o', ofn_rel_sim + '.png',
+              '--f_dist_f', 'euclidean',
+              '--s_dist_f', 'euclidean',
+              '-l', '--dpi', '300',
+              '--flabel_size', '5',
+              '--slabel_size', '5',
+              '--max_flabel_len', '200'])
+    '''       
+
+
+def main(args):
+    compute_dist_matrix(
+                        args['ifn_alignment'], 
+                        args['ofn_prefix'],
+                        args['ignore_gaps']) 
+    
+if __name__ == "__main__":
+    args = read_params()
+    main(args)
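get_dist compares the two aligned sequences one column at a time. Below is a sketch that computes the same four quantities with NumPy element-wise comparisons. It assumes ASCII sequences of equal length. `vectorized_dist` is hypothetical, and it raises ValueError instead of exiting. As in get_dist, ignoring gaps removes gap columns from the mismatch count but not from the alignment length used for the relative distance.

```python
import numpy as np


def vectorized_dist(seq1, seq2, ignore_gaps=True):
    """Hypothetical NumPy counterpart of get_dist with the same four return values."""
    a = np.frombuffer(str(seq1).encode('ascii'), dtype='S1')
    b = np.frombuffer(str(seq2).encode('ascii'), dtype='S1')
    if a.shape != b.shape:
        raise ValueError('Two sequences have different lengths!')
    mismatch = a != b
    if ignore_gaps:
        # A column with a gap in either sequence is not counted as a difference
        mismatch &= (a != b'-') & (b != b'-')
    abs_dist = float(mismatch.sum())
    abs_sim = len(a) - abs_dist
    rel_dist = abs_dist / float(len(a))
    return abs_dist, rel_dist, abs_sim, 1.0 - rel_dist
```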
diff --git a/strainphlan_src/dump_file.py b/strainphlan_src/dump_file.py
new file mode 100755
index 0000000..9133804
--- /dev/null
+++ b/strainphlan_src/dump_file.py
@@ -0,0 +1,77 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+
+import sys
+import argparse as ap
+import bz2
+import gzip
+import tarfile
+#import logging.config
+#sys.path.append('../pyphlan')
+#sys.path.append('pyphlan')
+import ooSubprocess
+
+#logging.config.fileConfig('logging.ini', disable_existing_loggers=False)
+#logger = logging.getLogger(__name__)
+
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--input_file', required=True, default=None, type=str)
+
+    return vars(p.parse_args())
+
+
+def dump_file(ifn):
+    file_ext = ''
+    if ifn.endswith('.tar.bz2'):
+        ifile = tarfile.open(ifn, 'r:bz2')
+        file_ext = '.tar.bz2'
+    elif ifn.endswith('.tar.gz'):
+        ifile = tarfile.open(ifn, 'r:gz')
+        file_ext = '.tar.gz'
+    elif ifn.endswith('.bz2'):
+        ifile = bz2.BZ2File(ifn, 'r')
+        file_ext = '.bz2'
+    elif ifn.endswith('.gz'):
+        ifile = gzip.GzipFile(ifn, 'r')
+        file_ext = '.gz'
+    elif ifn.endswith('.fastq'):
+        ifile = open(ifn, 'r')
+        file_ext = '.fastq'
+    elif ifn.endswith('.sam'):
+        ifile = open(ifn, 'r')
+        file_ext = '.sam'
+    elif ifn.endswith('.sra'):
+        oosp = ooSubprocess.ooSubprocess()
+        ifile = oosp.ex(
+                        'fastq-dump',
+                        args=[
+                                '-Z', ifn,
+                                '--split-spot'],
+                        get_out_pipe=True)
+        file_ext = '.sra'
+    else:
+        raise Exception('Unrecognized format! The format should be .bz2, .gz, '\
+                '.tar.bz2, .tar.gz, .sra, .sam.bz2, .sam, or .fastq\n')
+
+    try:
+        if file_ext in ['.tar.bz2', '.tar.gz']:
+            for tar_info in ifile:
+                ifile2 = ifile.extractfile(tar_info)
+                if ifile2 != None:
+                    for line in ifile2:
+                        sys.stdout.write(line)
+        else:
+            for line in ifile:
+                sys.stdout.write(line)
+    except:
+        sys.stderr.write('Error while dumping file %s\n'%ifn)
+        raise
+
+
+if __name__ == "__main__":
+    args = read_params()
+    dump_file(args['input_file'])
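dump_file picks a decoder from the file suffix and streams lines to standard output. Below is a Python 3 sketch of that dispatch for single files, using an ordered suffix table. `open_by_suffix` is hypothetical. Tar archives and `.sra` input (which needs `fastq-dump`) are deliberately left out.

```python
import bz2
import gzip

# Checked in order; a name such as reads.sam.bz2 is caught by the '.bz2' entry.
# Every opener is called in binary mode.
OPENERS = [
    ('.bz2', bz2.open),
    ('.gz', gzip.open),
    ('.fastq', open),
    ('.sam', open),
]


def open_by_suffix(path):
    """Hypothetical helper: open a plain or compressed file for binary reading."""
    for suffix, opener in OPENERS:
        if path.endswith(suffix):
            return opener(path, 'rb')
    raise ValueError('Unrecognized format: %s' % path)
```

To stream the contents the way dump_file does, the handle can be copied in binary mode, for example with `shutil.copyfileobj(handle, sys.stdout.buffer)`.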
diff --git a/strainphlan_src/extract_markers.py b/strainphlan_src/extract_markers.py
new file mode 100755
index 0000000..2c8ca9c
--- /dev/null
+++ b/strainphlan_src/extract_markers.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '1 Sep 2014'
+
+import sys
+import os
+import argparse as ap
+import pickle
+import bz2
+from Bio import SeqIO, Seq, SeqRecord
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--mpa_pkl', required=True, default=None, type=str)
+    p.add_argument('--ifn_markers', required=True, default=None, type=str)
+    p.add_argument('--clade', required=True, default=None, type=str)
+    p.add_argument('--ofn_markers', required=True, default=None, type=str)
+    return vars(p.parse_args())
+
+
+def extract_markers(mpa_pkl, ifn_markers, clade, ofn_markers):
+    with open(mpa_pkl, 'rb') as ifile:
+        db = pickle.loads(bz2.decompress(ifile.read()))
+    markers = set([])
+    for marker in db['markers']:
+        if clade == db['markers'][marker]['taxon'].split('|')[-1]:
+            markers.add(marker)
+    print 'number of markers', len(markers)
+    with open(ofn_markers, 'w') as ofile:
+        for rec in SeqIO.parse(open(ifn_markers, 'r'), 'fasta'):
+            if rec.name in markers:
+                SeqIO.write(rec, ofile, 'fasta')
+
+
+if __name__ == "__main__":
+    args = read_params()
+    extract_markers(
+                    mpa_pkl=args['mpa_pkl'],
+                    ifn_markers=args['ifn_markers'],
+                    clade=args['clade'],
+                    ofn_markers=args['ofn_markers'])
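extract_markers keeps a marker when the last `|`-separated level of its taxonomy string equals `--clade`. Below is a sketch of that selection on a toy database shaped like `db['markers']` in the code above. The marker names and taxa are invented, and any structure beyond the `'taxon'` key is an assumption. Only the last level is compared, so in this toy database asking for `g__Escherichia` returns nothing because every taxonomy string ends at the species level.

```python
def markers_of_clade(db, clade):
    """Return the names of markers whose deepest taxonomic level equals clade."""
    return set(name for name, info in db['markers'].items()
               if info['taxon'].split('|')[-1] == clade)


# Toy database, invented for illustration only
toy_db = {'markers': {
    'm1': {'taxon': 'k__Bacteria|g__Escherichia|s__Escherichia_coli'},
    'm2': {'taxon': 'k__Bacteria|g__Escherichia|s__Escherichia_coli'},
    'm3': {'taxon': 'k__Bacteria|g__Bacteroides|s__Bacteroides_fragilis'},
}}
```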
diff --git a/strainphlan_src/fastx_len_filter.py b/strainphlan_src/fastx_len_filter.py
new file mode 100755
index 0000000..84af5ab
--- /dev/null
+++ b/strainphlan_src/fastx_len_filter.py
@@ -0,0 +1,17 @@
+#!/usr/bin/env python
+from Bio import SeqIO
+import argparse as ap
+import sys
+
+def read_params(args):
+	p = ap.ArgumentParser(description = 'fastx_len_filter.py Parameters\n')
+	p.add_argument('--min_len', required = True, default = None, type = int,
+		help = 'Minimum read length; shorter FASTQ records are discarded.')
+	return vars(p.parse_args())
+	
+if __name__ == '__main__':
+	args = read_params(sys.argv)
+	min_len = args['min_len']
+	with sys.stdout as outf:
+		for r in SeqIO.parse(sys.stdin, "fastq"):
+			if len(r) >= min_len:
+				SeqIO.write(r, outf, "fastq")
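fastx_len_filter.py reads FASTQ from standard input and keeps records with at least `--min_len` bases. Below is a dependency-free sketch of the same filter. Unlike Biopython, it assumes unwrapped four-line FASTQ records (header, sequence, `+`, qualities). `filter_fastq` is hypothetical.

```python
import itertools


def filter_fastq(lines, min_len):
    """Yield 4-line FASTQ records whose sequence has at least min_len bases.

    Assumes unwrapped records: header, sequence, '+' line, qualities.
    """
    it = iter(lines)
    while True:
        record = list(itertools.islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError('Truncated FASTQ record: %r' % record)
        if len(record[1].rstrip('\n')) >= min_len:
            yield ''.join(record)
```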
diff --git a/strainphlan_src/fix_AF1.py b/strainphlan_src/fix_AF1.py
new file mode 100755
index 0000000..b24edd5
--- /dev/null
+++ b/strainphlan_src/fix_AF1.py
@@ -0,0 +1,36 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+
+import re
+import sys
+import argparse
+
+# Matches AF1=0 only as a complete value, so AF1=0.5 and similar are left unchanged
+AF1_ZERO = re.compile(r'AF1=0(?![0-9.])')
+
+def read_params():
+    p = argparse.ArgumentParser()
+    p.add_argument('--input_file', required=False, default='-', type=str,
+                   help='Input VCF file; "-" (the default) reads from stdin.')
+
+    return vars(p.parse_args())
+
+
+def fix_AF1(ifn):
+    if ifn == '-':
+        ifile = sys.stdin
+    else:
+        ifile = open(ifn, 'r')
+
+    for line in ifile:
+        if line[0] != '#':
+            if AF1_ZERO.search(line):
+                spline = line.split()
+                if spline[3] != spline[4] and spline[4].upper() in ['A', 'T', 'C', 'G']:
+                    line = AF1_ZERO.sub('AF1=1', line)
+        sys.stdout.write(line)
+
+    if ifn != '-':
+        ifile.close()
+
+
+if __name__ == "__main__":
+    args = read_params()
+    fix_AF1(args['input_file'])
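fix_AF1 turns `AF1=0` into `AF1=1` on data lines whose REF differs from a single-base ALT. Below is a sketch of that rule for one tab-separated VCF line. It edits only the INFO column (8th field) and only an entry exactly equal to `AF1=0`, so values such as `AF1=0.5` are left as they are. `fix_af1_line` is hypothetical.

```python
def fix_af1_line(line):
    """Apply the AF1=0 -> AF1=1 rewrite to one VCF line; header lines pass through."""
    if line.startswith('#'):
        return line
    fields = line.rstrip('\n').split('\t')
    ref, alt = fields[3], fields[4]
    if ref != alt and alt.upper() in ('A', 'C', 'G', 'T'):
        # INFO is a ';'-separated list of KEY=VALUE entries
        info = fields[7].split(';')
        fields[7] = ';'.join('AF1=1' if entry == 'AF1=0' else entry
                             for entry in info)
    return '\t'.join(fields) + ('\n' if line.endswith('\n') else '')
```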
diff --git a/strainphlan_src/logging.ini b/strainphlan_src/logging.ini
new file mode 100755
index 0000000..528b555
--- /dev/null
+++ b/strainphlan_src/logging.ini
@@ -0,0 +1,22 @@
+[loggers]
+keys=root
+
+[handlers]
+keys=consoleHandler
+
+[formatters]
+keys=simpleFormatter
+
+[logger_root]
+level=DEBUG
+handlers=consoleHandler
+
+[handler_consoleHandler]
+class=StreamHandler
+level=DEBUG
+formatter=simpleFormatter
+args=(sys.stdout,)
+
+[formatter_simpleFormatter]
+format=%(asctime)s | %(name)s | %(levelname)s | %(funcName)s | %(lineno)d | %(message)s
+datefmt=
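logging.ini uses the standard-library `logging.config.fileConfig` format: a root logger at DEBUG with one console handler that writes to stdout. dump_file.py keeps its loading code commented out. Below is a minimal sketch of loading such a file. `load_logging_config` is a hypothetical wrapper, and the default path is an assumption.

```python
import logging
import logging.config


def load_logging_config(path='logging.ini'):
    """Configure logging from an INI file and return a logger for this module."""
    # disable_existing_loggers=False keeps loggers created before this call active
    logging.config.fileConfig(path, disable_existing_loggers=False)
    return logging.getLogger(__name__)
```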
diff --git a/strainphlan_src/mixed_utils.py b/strainphlan_src/mixed_utils.py
new file mode 100755
index 0000000..1b46ab8
--- /dev/null
+++ b/strainphlan_src/mixed_utils.py
@@ -0,0 +1,99 @@
+#!/usr/bin/env python
+# Author: Duy Tin Truong (duytin.truong at unitn.it)
+#		at CIBIO, University of Trento, Italy
+
+__author__ = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__ = '1st Sep 2014'
+
+import bz2
+import numpy
+import sys
+
+def dist2file(dist, labels, ofn):
+    with open(ofn, 'w') as ofile:
+        ofile.write('ID')
+        for label in labels:
+            ofile.write('\t%s'%label)
+        ofile.write('\n')
+        for i in range(len(labels)):
+            ofile.write('%s\t'%labels[i])
+            for j in range(len(labels)):
+                if j == len(labels) - 1:
+                    ofile.write('%f\n'%dist[i][j])
+                else:
+                    ofile.write('%f\t'%dist[i][j])
+
+
+
+def statistics(vals):
+    vals = numpy.array(vals)
+    result = {}
+    if len(vals.shape) == 1:
+        num_elems = len(vals)
+        nrows = num_elems
+        ncols = 1
+    else:
+        nrows, ncols = vals.shape
+        num_elems = nrows * ncols
+    if num_elems > 0:
+        result['nrows'] = nrows
+        result['ncols'] = ncols
+        result['size'] = num_elems
+        result['average'] = numpy.average(vals)
+        result['min'] = numpy.min(vals)
+        result['max'] = numpy.max(vals)
+        result['median'] = numpy.percentile(vals, 50)
+        result['percentile_25'] = numpy.percentile(vals, 25)
+        result['percentile_75'] = numpy.percentile(vals, 75)
+    else:
+        result['nrows'] = nrows
+        result['ncols'] = ncols
+        result['size'] = num_elems
+        result['average'] = 0
+        result['min'] = 0
+        result['max'] = 0
+        result['median'] = 0
+        result['percentile_25'] = 0
+        result['percentile_75'] = 0
+
+    str_result = ''
+    for key in ['nrows',
+                'ncols',
+                'size',
+                'average',
+                'min',
+                'max',
+                'median',
+                'percentile_25',
+                'percentile_75']:
+        str_result += '%s: %s\n'%(key, result[key])
+
+    return result, str_result
+
+
+
+def dict2str(dict_var):
+    result = ''
+    for key in dict_var:
+        result += '%s: %s\n'%(key, dict_var[key])
+    return result
+
+
+def openr( fn, mode = "r" ):
+    if fn is None:
+        return sys.stdin
+    return bz2.BZ2File(fn) if fn.endswith(".bz2") else open(fn,mode)
+    
+
+def openw( fn ):
+    if fn is None:
+        return sys.stdout
+    return bz2.BZ2File(fn,"w") if fn.endswith(".bz2") else open(fn,"w")
+            
+
+def is_number(s):
+    try:
+        int(s)
+        return True
+    except ValueError:
+        return False
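dist2file writes a square tab-separated matrix: a header row that starts with `ID` and lists the labels, then one row per label. Below is a sketch of the inverse for reading `.abs_dist` or `.rel_dist` style files back into memory. `file2dist` is hypothetical and is not part of mixed_utils.

```python
import numpy


def file2dist(ifn):
    """Read a matrix written by dist2file; return (labels, 2-D numpy array)."""
    with open(ifn) as ifile:
        header = ifile.readline().rstrip('\n').split('\t')
        labels = header[1:]  # header[0] is the literal 'ID'
        rows = []
        for line in ifile:
            fields = line.rstrip('\n').split('\t')
            # fields[0] repeats the row label; the rest are the matrix values
            rows.append([float(x) for x in fields[1:]])
    return labels, numpy.array(rows)
```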
diff --git a/strainphlan_src/ooSubprocess.py b/strainphlan_src/ooSubprocess.py
new file mode 100755
index 0000000..66d435e
--- /dev/null
+++ b/strainphlan_src/ooSubprocess.py
@@ -0,0 +1,300 @@
+#!/usr/bin/env python
+# Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+
+import subprocess
+import os
+import multiprocessing
+from multiprocessing.pool import ThreadPool
+import sys
+import cStringIO
+from tempfile import NamedTemporaryFile 
+import which
+import functools
+import traceback
+import numpy
+
+
+class ooSubprocessException(Exception):
+    pass
+
+class ooSubprocess:
+
+    def __init__(self, tmp_dir='tmp/'):
+        self.chain_cmds = []
+        self.tmp_dir = tmp_dir
+        mkdir(tmp_dir)
+
+    def ex(
+            self,
+            prog,
+            args=[],
+            get_output=False,
+            get_out_pipe=False,
+            out_fn=None,
+            in_pipe=None,
+            verbose=True,
+            **kwargs):
+
+        if not which.is_exe(prog):
+            raise ooSubprocessException('Error: cannot find the program %s '\
+                        'in the executable path!'%prog)
+
+        if isinstance(args, str):
+            args = args.split()
+
+        if not isinstance(args, list):
+            args = [args]
+
+        cmd = [prog] + args
+        print_cmd = 'ooSubprocess: ' + ' '.join(cmd)
+        if verbose and out_fn and (not get_output):
+            print_stderr(print_cmd + ' > ' + out_fn)
+        elif verbose:
+            print_stderr(print_cmd)
+
+        if get_output:
+            result = subprocess.check_output(
+                                                cmd,
+                                                stdin=in_pipe,
+                                                **kwargs)
+        elif get_out_pipe:
+            tmp_file = NamedTemporaryFile(dir=self.tmp_dir)
+            p = subprocess.Popen(
+                                        cmd,
+                                        stdin=in_pipe,
+                                        stdout=tmp_file,
+                                        **kwargs)
+            return_code = p.wait()
+            if in_pipe != None:
+                in_pipe.close()
+            if return_code != 0:
+                raise ooSubprocessException(
+                                'Failed when executing the command: %s\n'\
+                                'return code: %s' % (' '.join(cmd), return_code))
+            tmp_file.seek(0)
+            result = tmp_file
+        elif out_fn:
+            with open(out_fn, 'w') as ofile:
+                result = subprocess.check_call(
+                                                cmd,
+                                                stdin=in_pipe,
+                                                stdout=ofile,
+                                                **kwargs)
+        else:
+            result = subprocess.check_call(
+                                            cmd, 
+                                            stdin=in_pipe, 
+                                            **kwargs)
+        return result
+
+    def chain(
+            self,
+            prog,
+            args=[],
+            stop=False,
+            in_pipe=None,
+            get_output=False,
+            get_out_pipe=False,
+            out_fn=None,
+            verbose=True,
+            **kwargs):
+
+        if not which.is_exe(prog):
+            raise ooSubprocessException('Error: cannot find the program %s '\
+                        'in the executable path!'%prog)
+
+        if in_pipe is None and self.chain_cmds != []:
+            raise ooSubprocessException(
+                'The pipeline was not stopped before creating a new one!'\
+                'In cache: %s' % (' | '.join(self.chain_cmds)))
+        if out_fn and stop == False:
+            raise ooSubprocessException(
+                'out_fn (output_file_name) is only specified when stop = True!')
+
+        if isinstance(args, str):
+            args = args.split()
+
+        if not isinstance(args, list):
+            args = [args]
+        cmd = [prog] + args
+
+        print_cmd = ' '.join(cmd)
+        if out_fn and (not get_output):
+            print_cmd += ' > ' + out_fn
+        self.chain_cmds.append(print_cmd)
+
+        if stop:
+            if in_pipe is None:
+                raise ooSubprocessException('No input process to create a pipeline!')
+
+            pipeline_str = ' | '.join(self.chain_cmds)
+            if verbose:
+                print_stderr('ooSubprocess: ' + pipeline_str)
+
+            self.chain_cmds = []
+            if get_output:
+                result = subprocess.check_output(
+                                                    cmd,
+                                                    stdin=in_pipe,
+                                                    **kwargs)
+            elif get_out_pipe:
+                tmp_file = NamedTemporaryFile(dir=self.tmp_dir)
+                p = subprocess.Popen(
+                                    cmd,
+                                    stdin=in_pipe,
+                                    stdout=tmp_file,
+                                    **kwargs)
+                return_code = p.wait()
+                if return_code != 0:
+                    # chain_cmds has already been reset, so report the saved pipeline
+                    raise ooSubprocessException(
+                                    'Failed when executing the command: %s\n'\
+                                    'return code: %s'\
+                                    %(pipeline_str, return_code))
+                tmp_file.seek(0)
+                if in_pipe != None:
+                    in_pipe.close()
+
+                result = tmp_file
+            elif out_fn:
+                ofile = open(out_fn, 'w')
+                result = subprocess.check_call(
+                                                cmd,
+                                                stdin=in_pipe,
+                                                stdout=ofile,
+                                                **kwargs)
+                ofile.close()
+            else:
+                result = subprocess.check_call(
+                                                cmd,
+                                                stdin=in_pipe,
+                                                **kwargs)
+        else:
+            tmp_file = NamedTemporaryFile(dir=self.tmp_dir)
+            p = subprocess.Popen(
+                                cmd,
+                                stdin=in_pipe,
+                                stdout=tmp_file,
+                                **kwargs)
+            return_code = p.wait()
+            if return_code != 0:
+                raise ooSubprocessException(
+                                'Failed when executing the command: %s\n'\
+                                'return code: %s'\
+                                %(' | '.join(self.chain_cmds), return_code))
+            tmp_file.seek(0)
+            if in_pipe != None:
+                in_pipe.close()
+
+            result = tmp_file
+        return result
+
+    def ftmp(self, ifn):
+        return os.path.join(self.tmp_dir, os.path.basename(ifn))
+
+
+def fdir(dir, ifn):
+    return os.path.join(dir, os.path.basename(ifn))
+
+
+def mkdir(dir):
+    if not os.path.exists(dir):
+        try:
+            os.makedirs(dir)
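+        except OSError as e:
+            # errno 17 is EEXIST: another process created the directory
+            # between the exists() check and makedirs(), which is harmless
+            if e.errno != 17:
+                raise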
+    elif not os.path.isdir(dir):
+        raise ooSubprocessException('Error: %s is not a directory!' % dir)
+
+
+def replace_ext(ifn, old_ext, new_ext):
+    # if not os.path.isfile(ifn):
+    #    print_stderr('Error: file %s does not exist!'%(ifn))
+    #    exit(1)
+    if ifn[len(ifn) - len(old_ext):] != old_ext:
+        # print_stderr('Error: the old file extension %s does not match!'%old_ext)
+        # exit(1)
+        new_ifn = ifn + new_ext
+    else:
+        new_ifn = ifn[:len(ifn) - len(old_ext)] + new_ext
+    return new_ifn
+
+
+def splitext(ifn):
+    basename = os.path.basename(ifn)
+    if ifn.endswith('.tar.bz2'):
+        ext = '.tar.bz2'
+    elif ifn.endswith('.tar.gz'):
+        ext = '.tar.gz'
+    else:
+        ext = basename.split('.')[-1]
+        # A name without any dot has no extension
+        ext = '.' + ext if ext != basename else ''
+    base = basename[:len(basename) - len(ext)]
+    for t in ['.sam', '.fastq', '.fasta', '.fna']:
+        if base.endswith(t):
+            ext = t + ext
+            base = base[:-len(t)]
+    return base, ext
+
+
+def trace_unhandled_exceptions(f):
+    @functools.wraps(f)
+    def wrapper(*args, **kwargs):
+        try:
+            return f(*args, **kwargs)
+        except:
+            #traceback.print_exc()
+            #raise Exception(''.join(traceback.format_exception(*sys.exc_info())))
+            raise Exception(traceback.format_exc())
+    return wrapper
+
+
+def parallelize(func, args, nprocs=1, use_threads=False):
+    if nprocs > 1:
+        if use_threads:
+            pool = ThreadPool(nprocs)
+        else:
+            pool = multiprocessing.Pool(nprocs)
+        results = pool.map(func, args)
+        pool.close()
+        pool.join()
+    else:
+        results = serialize(func, args)
+    return results
+
+
+def parallelize_async(func, args, nprocs=1, use_threads=False):
+    if nprocs > 1:
+        if use_threads:
+            pool = ThreadPool(nprocs)
+        else:
+            pool = multiprocessing.Pool(nprocs)
+        app_results = []
+        for a in args:
+            app_results.append(pool.apply_async(func, [a]))
+        pool.close()
+        pool.join()
+        results = [r.get() for r in app_results]
+    else:
+        results = serialize(func, args)
+    return results
+
+
+def serialize(func, args):
+    results = []
+    for arg in args:
+        results.append(func(arg))
+    return results
+
+
+def print_stderr(*args):
+    sys.stderr.write(' '.join(map(str, args)) + '\n')
+    sys.stderr.flush()
+
+
+def print_stdout(*args):
+    sys.stdout.write(' '.join(map(str, args)) + '\n')
+    sys.stdout.flush()
+
+
+
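chain() links pipeline stages through NamedTemporaryFile objects: each command runs to completion, and its output file becomes the next command's stdin. For comparison, below is a sketch of a two-stage pipeline connected with an OS pipe (`subprocess.PIPE`). This is a different technique from the one ooSubprocess uses. `run_pipeline` is hypothetical.

```python
import subprocess
import sys


def run_pipeline(cmd1, cmd2):
    """Run `cmd1 | cmd2` and return cmd2's stdout as bytes; raise on failure."""
    p1 = subprocess.Popen(cmd1, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(cmd2, stdin=p1.stdout, stdout=subprocess.PIPE)
    # Close the parent's copy so p1 gets SIGPIPE if p2 exits early
    p1.stdout.close()
    out, _ = p2.communicate()
    rc1 = p1.wait()
    if rc1 != 0 or p2.returncode != 0:
        raise subprocess.CalledProcessError(rc1 or p2.returncode,
                                            cmd1 + ['|'] + cmd2)
    return out
```

A pipe streams data without touching disk and runs both stages concurrently. The temporary-file approach instead lets each stage's return code be checked before the next stage starts.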
diff --git a/strainphlan_src/plot_tree_graphlan.py b/strainphlan_src/plot_tree_graphlan.py
new file mode 100755
index 0000000..f514f24
--- /dev/null
+++ b/strainphlan_src/plot_tree_graphlan.py
@@ -0,0 +1,177 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '4 May 2015'
+
+import sys
+import os
+import argparse as ap
+import dendropy
+from StringIO import StringIO
+import re
+from collections import defaultdict
+import ConfigParser
+import matplotlib.colors as colors
+import subprocess
+
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--ifn_tree', 
+                   required=True, 
+                   default=None, 
+                   type=str,
+                   help='The input tree in newick format.')
+    p.add_argument('--colorized_metadata', 
+                   required=False, 
+                   default='unset', 
+                   type=str,
+                   help='The metadata field to colorize. Default "unset".')
+    p.add_argument('--fig_size', 
+                   required=False, 
+                   default=8, 
+                   type=float,
+                   help='The figure size. Default "8".')
+    p.add_argument('--legend_marker_size', 
+                   required=False, 
+                   default=20, 
+                   type=int,
+                   help='The legend marker size. Default "20".'
+                   )
+    p.add_argument('--legend_font_size', 
+                   required=False, 
+                   default=10, 
+                   type=int,
+                   help='The legend font size. Default "10".'
+                   )
+    p.add_argument('--legend_marker_edge_width', 
+                   required=False, 
+                   default=0.2, 
+                   type=float,
+                   help='The legend marker edge width. Default "0.2".'
+                   )
+    p.add_argument('--leaf_marker_size',
+                   required=False,
+                   default=20,
+                   type=int,
+                   help='The leaf marker size. Default "20".'
+                   )
+    p.add_argument('--leaf_marker_edge_width',
+                   required=False,
+                   default=0.2,
+                   type=float,
+                   help='The leaf marker edge width. Default "0.2".'
+                   )
+    p.add_argument('--dpi',
+                   required=False,
+                   default=300,
+                   type=int,
+                   help='The figure dpi. Default "300".')
+    p.add_argument('--figure_extension',
+                   required=False,
+                   default='.png',
+                   type=str,
+                   help='The figure extension. Default ".png".')
+    p.add_argument('--ofn_prefix',
+                   required=False,
+                   default=None,
+                   type=str,
+                   help='The prefix of output files. Default: the input tree file name.')
+    return p.parse_args()
+
+
+
+
+def run(cmd):
+    print cmd
+    subprocess.call(cmd.split())
+    
+
+
+
+def main(args):
+    tree = dendropy.Tree.get_from_path(args.ifn_tree, schema='newick',
+                                       preserve_underscores=True)
+    tree.reroot_at_midpoint()
+    count = 0
+    metadatas = set([])
+    node2metadata = {}
+    for node in tree.preorder_node_iter():
+        nodestr = node.get_node_str().strip("'")
+        if node.is_leaf():
+            if '.' in nodestr:
+                nodestr = nodestr.replace('.',',')
+                node.taxon = dendropy.Taxon(label=nodestr)
+            substrs = re.findall(
+                         '%s-[a-zA-Z0-9.]*'%args.colorized_metadata,
+                          nodestr)
+            if substrs:
+                md = substrs[0].replace(args.colorized_metadata + '-', '')
+                metadatas.add(md)
+                node2metadata[nodestr] = md
+        else:
+            count += 1
+            node.taxon = dendropy.Taxon(label='node_%d'%count)
+    metadatas = sorted(list(metadatas))
+    color_names = colors.cnames.keys()
+    metadata2color = {}
+    for i, md in enumerate(metadatas):
+        metadata2color[md] = color_names[i % len(color_names)]
+
+    if not args.ofn_prefix:
+        args.ofn_prefix = args.ifn_tree
+    ofn_tree = args.ofn_prefix + '.graphlantree'
+    tree.write_to_path(ofn_tree, 'newick')
+    ofn_annot = args.ofn_prefix + '.annot'
+    with open(ofn_annot, 'w') as ofile:
+        #ofile.write('clade_separation\t0\n')
+        ofile.write('branch_bracket_width\t0\n')
+        #ofile.write('clade_separation\t0.15\n')
+        ofile.write('branch_bracket_depth\t0\n')
+        #ofile.write('branch_thickness\t1.25\n')
+        ofile.write('annotation_background_width\t0\n')
+        
+        # legend
+        ofile.write('#legends\n')
+        ofile.write('class_legend_font_size\t%d\n'%args.legend_font_size)
+
+        for md in metadata2color:
+            ofile.write('%s\tclade_marker_size\t%d\n'%(md, args.legend_marker_size))
+            ofile.write('%s\tclade_marker_color\t%s\n'%(md, metadata2color[md]))
+            ofile.write('%s\tclade_marker_edge_width\t%f\n'%(md, args.legend_marker_edge_width))
+
+        # hide the markers of intermediate (internal) nodes
+        for node in tree.preorder_node_iter():
+            if not node.is_leaf():
+                nodestr = node.get_node_str().strip("'")
+                ofile.write('%s\tclade_marker_size\t0\n'%(nodestr))
+
+        # colorize leaf nodes
+        for node in tree.seed_node.leaf_nodes():
+            nodestr = node.get_node_str().strip("'")
+            if nodestr in node2metadata:
+                leaf_color = metadata2color[node2metadata[nodestr]]
+                ofile.write('%s\tclade_marker_size\t%d\n'%(nodestr, args.leaf_marker_size))
+                ofile.write('%s\tclade_marker_color\t%s\n'%(nodestr, leaf_color))
+                ofile.write('%s\tclade_marker_edge_width\t%f\n'%(nodestr, args.leaf_marker_edge_width))
+
+    ofn_xml = args.ofn_prefix + '.xml'
+    cmd = 'graphlan_annotate.py --annot %s %s %s'%(ofn_annot, ofn_tree, ofn_xml)
+    run(cmd)
+
+    ofn_fig = args.ofn_prefix + args.figure_extension
+    cmd = 'graphlan.py %s %s --dpi %d --size %f'%(ofn_xml, ofn_fig, args.dpi, args.fig_size)
+    run(cmd)
+
+    print 'Output file: %s'%ofn_fig
+
+
+
+
+if __name__ == "__main__":
+    args = read_params()
+    main(args)
+    #test()
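main() colors a leaf when its name contains a `<field>-<value>` token, where `<field>` is `--colorized_metadata`, and it assigns colors to the sorted distinct values in turn. Below is a sketch of that mapping that uses a fixed palette instead of `matplotlib.colors.cnames`. `assign_colors`, the leaf names and the palette are invented. The sketch also adds `re.escape`, so a field name containing regex characters is matched literally.

```python
import re


def assign_colors(leaf_names, field, palette):
    """Map each leaf carrying a '<field>-<value>' token to one color per value."""
    pattern = re.compile(re.escape(field) + r'-([a-zA-Z0-9.]*)')
    leaf2value = {}
    for name in leaf_names:
        match = pattern.search(name)
        if match:
            leaf2value[name] = match.group(1)
    values = sorted(set(leaf2value.values()))
    # Colors are reused cyclically when there are more values than colors
    value2color = dict((v, palette[i % len(palette)]) for i, v in enumerate(values))
    return dict((leaf, value2color[v]) for leaf, v in leaf2value.items())
```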
diff --git a/strainphlan_src/sam_filter.py b/strainphlan_src/sam_filter.py
new file mode 100755
index 0000000..20a90e9
--- /dev/null
+++ b/strainphlan_src/sam_filter.py
@@ -0,0 +1,59 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '18 Jul 2015'
+
+import sys
+import os
+import argparse 
+
+
+def read_params():
+    p = argparse.ArgumentParser()
+    p.add_argument(
+        '--input_file', 
+        required=False, 
+        default=None, 
+        type=str,
+        help='The input sam file.')
+    p.add_argument(
+        '--min_align_score', 
+        required=True, 
+        default=None, 
+        type=int,
+        help='Minimum alignment score (AS:i tag) per 100 bp of read length; '
+             'records scoring below this scaled threshold are discarded.')
+    p.add_argument(
+        '--verbose', 
+        required=False, 
+        dest='quiet',
+        action='store_false',
+        help='Show all information. Default "not set".')
+    p.set_defaults(quiet=True)
+
+    return p.parse_args()
+
+
+def main(args):
+    if args.input_file is None:
+        ifile = sys.stdin
+    else:
+        ifile = open(args.input_file, 'r')
+    for line in ifile:
+        if line[0] == '@':
+            sys.stdout.write(line)
+        else:
+            spline = line.split()
+            read_length = len(spline[9])
+            align_score = float(spline[11].replace('AS:i:', ''))
+            if align_score >= args.min_align_score * read_length / 100.0:
+                sys.stdout.write(line)
+
+
+
+if __name__ == "__main__":
+    args = read_params()
+    main(args)
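sam_filter.py scales its threshold by read length. A record is kept when AS >= min_align_score * read_length / 100, so --min_align_score -30 on a 150-bp read requires an alignment score of at least -45. Bowtie2's default end-to-end mode (used with the --very-sensitive preset in sample2markers.py) reports scores of 0 or below, so useful thresholds there would be negative.

The script reads the score from column 12, which assumes AS:i: is the first optional field. The hypothetical sketch below instead scans all optional fields. Dropping records that have no AS tag is an assumption of this sketch, not behavior of the script.

```python
# Sketch of the length-normalized alignment-score filter in sam_filter.py.
# Function names are hypothetical; records without an AS tag are dropped (assumption).


def alignment_score(fields):
    """Return the integer AS:i: value among the optional SAM fields, or None."""
    for tag in fields[11:]:  # columns 12+ are optional TAG:TYPE:VALUE fields
        if tag.startswith('AS:i:'):
            return int(tag[len('AS:i:'):])
    return None


def keep_record(line, min_align_score):
    """Keep header lines, and alignments whose AS reaches the length-scaled threshold."""
    if line.startswith('@'):
        return True
    fields = line.rstrip('\n').split('\t')
    score = alignment_score(fields)
    if score is None:
        return False  # assumption: no AS tag means the record is discarded
    read_length = len(fields[9])  # column 10 holds the read sequence
    return score >= min_align_score * read_length / 100.0
```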
diff --git a/strainphlan_src/sample2markers.py b/strainphlan_src/sample2markers.py
new file mode 100755
index 0000000..144a0a9
--- /dev/null
+++ b/strainphlan_src/sample2markers.py
@@ -0,0 +1,421 @@
+#!/usr/bin/env python
+# Author: Duy Tin Truong (duytin.truong at unitn.it)
+#		at CIBIO, University of Trento, Italy
+
+
+import sys
+import os
+ABS_PATH = os.path.abspath(sys.argv[0])
+MAIN_DIR = os.path.dirname(ABS_PATH)
+os.environ['PATH'] += ':%s'%MAIN_DIR
+os.environ['PATH'] += ':%s'%os.path.join(MAIN_DIR, 'strainphlan_src')
+sys.path.append(MAIN_DIR)
+sys.path.append(os.path.join(MAIN_DIR, 'strainphlan_src'))
+
+import argparse as ap
+import glob
+import ooSubprocess
+from ooSubprocess import print_stderr, trace_unhandled_exceptions
+import ConfigParser
+from Bio import SeqIO, Seq, SeqRecord
+import cStringIO
+import msgpack
+import random
+import subprocess
+import bz2
+import gzip
+import logging
+import logging.config
+import tarfile
+import threading
+import multiprocessing
+import pysam
+from collections import defaultdict
+from scipy import stats
+import numpy
+
+logging.basicConfig(level=logging.DEBUG, stream=sys.stderr,
+                    format='%(asctime)s | %(levelname)s | %(name)s | '
+                           '%(funcName)s | %(lineno)d | %(message)s')
+logger = logging.getLogger(__name__)
+
+
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--ifn_samples', nargs='+', required=True, default=None, type=str)
+    p.add_argument('--ifn_markers', required=False, default=None, type=str)
+    p.add_argument('--output_dir', required=True, default=None, type=str)
+    p.add_argument('--nprocs', required=False, default=1, type=int)
+    p.add_argument('--min_read_len', required=False, default=90, type=int)
+    p.add_argument('--min_align_score', required=False, default=None, type=int)
+    p.add_argument('--min_base_quality', required=False, default=30, type=float)
+    p.add_argument('--error_rate', required=False, default=0.01, type=float)
+    p.add_argument('--marker2file_ext', required=False, default='.markers', type=str)
+    p.add_argument('--sam2file_ext', required=False, default='.sam.bz2', type=str)
+    p.add_argument(
+        '--verbose', 
+        required=False, 
+        dest='quiet',
+        action='store_false',
+        help='Show all information. Default "not set".')
+    p.set_defaults(quiet=True)
+    '''
+    p.add_argument(
+        '--use_processes', 
+        required=False, 
+        default=False, 
+        action='store_false',
+        dest='use_threads',
+        help='Use multiprocessing. Default "Use multithreading".')
+    p.set_defaults(use_threads=True)
+    '''
+    p.add_argument(
+        '--input_type',
+        required=True,
+        default=None,
+        type=str,
+        choices=['fastq', 'sam'],
+        help='The input type:'\
+                ' fastq or sam. SAM'\
+                ' files can be obtained from a previous run of'\
+                ' this script (or of strainphlan.py).')
+
+    return vars(p.parse_args())
+
+
+def build_bowtie2db(ifn_markers, tmp_dir, error_pipe=None):
+    # build bowtie2-index
+    if not os.path.isfile(ifn_markers):
+        error = 'ifn_markers %s does not exist!'%ifn_markers
+        logger.error(error)
+        raise Exception(error)
+
+    if not os.path.isdir(tmp_dir):
+        ooSubprocess.mkdir(tmp_dir)
+    bt2_base = ooSubprocess.splitext(ifn_markers)[0]
+    index_fns = glob.glob('%s/%s.*'%(tmp_dir, bt2_base))
+    index_path = os.path.join(tmp_dir, bt2_base)
+    oosp = ooSubprocess.ooSubprocess(tmp_dir)
+    if index_fns == []:
+        oosp.ex(
+                'bowtie2-build', 
+                ['--quiet', ifn_markers, index_path],
+                stderr=error_pipe)
+    else:
+        logger.warning('bowtie2 indexes for %s already exist, skipping rebuild.'
+                        %(bt2_base))
+    return index_path
+
+
+
+def sample2markers(
+        ifn_sample, 
+        min_read_len,
+        min_align_score,
+        min_base_quality,
+        error_rate,
+        ifn_markers, 
+        index_path,
+        nprocs=1, 
+        sam2file=None,
+        marker2file=None, 
+        tmp_dir='tmp',
+        quiet=False):
+
+    '''
+    Compute the consensus markers in a sample file ifn_sample.
+
+    :param ifn_sample: the sample file in fastq format.
+    :param marker2file: the file name to store the consensus markers.
+    :param sam2file: the file name to store the sam content.
+    :returns: the dictionary of the consensus markers. If marker2file is set,
+    the dictionary is also serialized with msgpack to the file declared by
+    marker2file.
+    '''
+
+    if quiet:
+        error_pipe = open(os.devnull, 'w')
+    else:
+        error_pipe = None
+    oosp = ooSubprocess.ooSubprocess(tmp_dir)
+
+    # sample to sam
+    sample_pipe = oosp.chain(
+                             'dump_file.py', 
+                             args=['--input_file', ifn_sample],
+                             stderr=error_pipe
+                             )
+    filter_length_pipe = oosp.chain(
+                                    'fastx_len_filter.py',
+                                    args=['--min_len', str(min_read_len)],
+                                    in_pipe=sample_pipe,
+                                    stderr=error_pipe
+                                    )
+    bowtie2_pipe = oosp.chain(
+                                'bowtie2', 
+                                args=[
+                                        '-U', '-',
+                                        '-x', index_path,
+                                        '--very-sensitive',
+                                        '--no-unal',
+                                        '-p', str(nprocs)],
+                                in_pipe=filter_length_pipe,
+                                stderr=error_pipe)
+    if sam2file is None:
+        sam_pipe = bowtie2_pipe
+    else:
+        oosp.chain(
+                    'compress_file.py',
+                    args=['--output_file', sam2file], 
+                    in_pipe=bowtie2_pipe,
+                    stderr=error_pipe,
+                    stop=True)
+
+        sam_pipe = oosp.chain(
+                                'dump_file.py', 
+                                args=['--input_file', sam2file],
+                                stderr=error_pipe)
+    ofn_bam_sorted_prefix = os.path.join(
+                                tmp_dir,
+                                os.path.basename(ifn_sample) + '.bam.sorted')
+
+    return sam2markers(
+                       sam_file=sam_pipe, 
+                       ofn_bam_sorted_prefix=ofn_bam_sorted_prefix,
+                       min_align_score=min_align_score,
+                       min_base_quality=min_base_quality, error_rate=error_rate,
+                       marker2file=marker2file, oosp=oosp, 
+                       tmp_dir=tmp_dir, quiet=quiet)
+
+
+
+def save2file(tmp_file, ofn):
+    logger.debug('save %s'%ofn)
+    with open(ofn, 'w') as ofile:
+        for line in tmp_file:
+            ofile.write(line)
+    tmp_file.seek(0)
+
+
+
+def sam2markers(
+                sam_file, 
+                ofn_bam_sorted_prefix,
+                min_align_score=None,
+                min_base_quality=30,
+                error_rate=0.01,
+                marker2file=None, 
+                oosp=None, 
+                tmp_dir='tmp',
+                quiet=False):
+    '''
+    Compute the consensus markers in a sample from a sam content.
+
+    :param sam_file: a file name, a file object or subprocess.Popen object
+    containing the content of a sam file.
+    :param marker2file: the file name to store the consensus markers.
+    :param ofn_bam_sorted_prefix: the prefix of the sorted BAM file.
+    :param oosp: an instance of ooSubprocess for running a pipe.
+    :returns: the dictionary of the consensus markers. If marker2file is set,
+    the dictionary is also serialized with msgpack to the file declared by
+    marker2file.
+    '''
+
+    if quiet:
+        error_pipe = open(os.devnull, 'w')
+    else:
+        error_pipe = None
+
+    # sam content to file object
+    if oosp is None:
+        oosp = ooSubprocess.ooSubprocess()
+
+    if type(sam_file) == str:
+        p1 = oosp.chain(
+                        'dump_file.py', 
+                        args=['--input_file', sam_file],
+                        stderr=error_pipe)
+    else:
+        p1 = sam_file
+    
+    # filter sam
+    if min_align_score is None:
+        p1_filtered = p1
+    else:
+        p1_filtered = oosp.chain('sam_filter.py',
+                                 args=['--min_align_score',
+                                       str(min_align_score)],
+                                 in_pipe=p1,
+                                 stderr=error_pipe)
+    # sam to bam
+    p2 = oosp.chain(
+                    'samtools', 
+                    args=['view', '-bS', '-'], 
+                    in_pipe=p1_filtered,
+                    stderr=error_pipe)
+
+    # sort bam
+    tmp_fns = glob.glob('%s*'%ofn_bam_sorted_prefix)
+    for tmp_fn in tmp_fns:
+        os.remove(tmp_fn)
+    p3 = oosp.chain(
+                    'samtools', 
+                    args=['sort', '-o', '-', ofn_bam_sorted_prefix], 
+                    in_pipe=p2,
+                    stderr=error_pipe)
+
+    # extract polymorphic information (major-allele frequency per position)
+    marker2seq = defaultdict(dict)
+    pysam.index(p3.name)
+    samfile = pysam.AlignmentFile(p3.name)
+    for pileupcolumn in samfile.pileup():
+        rname = samfile.getrname(pileupcolumn.reference_id)
+        pileup = defaultdict(int)
+        for pileupread in pileupcolumn.pileups:
+            if not pileupread.is_del and not pileupread.is_refskip:  # query position is None if is_del or is_refskip is set.
+                b = pileupread.alignment.query_sequence[pileupread.query_position]
+                q = pileupread.alignment.query_qualities[pileupread.query_position]
+                if q >= min_base_quality:
+                    pileup[b] += 1
+        if len(pileup):
+            f = float(max(pileup.values())) / sum(pileup.values())
+            p = stats.binom.cdf(max(pileup.values()), sum(pileup.values()), 1.0 - error_rate)
+            freq = (f, sum(pileup.values()), p)
+        else:
+            freq = (0.0, 0.0, 0.0)
+        if 'freq' not in marker2seq[rname]:
+            marker2seq[rname]['freq'] = {}
+        marker2seq[rname]['freq'][pileupcolumn.pos] = freq
+    samfile.close()
+    os.remove(p3.name + '.bai')
+
+    # bam to mpileup
+    p3.seek(0)
+    p4 = oosp.chain(
+                    'samtools', 
+                    args=['mpileup', '-u', '-'], 
+                    in_pipe=p3,
+                    stderr=error_pipe)
+
+    # mpileup to vcf
+    p5 = oosp.chain(
+                    'bcftools', 
+                    args=['view', '-c', '-g', '-p', '1.1', '-'], 
+                    in_pipe=p4,
+                    stderr=error_pipe)
+                    #stderr=open(os.devnull, 'w'))
+
+    # fix AF1=0
+    p6 = oosp.chain(
+                    'fix_AF1.py', 
+                    args=['--input_file', '-'], 
+                    in_pipe=p5,
+                    stderr=error_pipe)
+
+    # vcf to fastq
+    p7 = oosp.chain(
+                      'vcfutils.pl', 
+                      args=['vcf2fq'], 
+                      in_pipe=p6,
+                      get_out_pipe=True,
+                      stderr=error_pipe,
+                      stop=True)
+
+    try:
+        for rec in SeqIO.parse(p7, 'fastq'):
+            marker2seq[rec.name]['seq'] = str(rec.seq).upper()
+        marker2seq = dict(marker2seq)
+    except Exception:
+        logger.error('sam2markers failed on file %s', sam_file)
+        raise
+
+    if type(p1) == file:
+        p1.close()
+
+    if marker2file:
+        with open(marker2file, 'wb') as ofile:
+            msgpack.dump(marker2seq, ofile)
+
+    return marker2seq
+
+
+
+
+@trace_unhandled_exceptions
+def run_sample(args_list):
+    ifn_sample = args_list[0]
+    args = args_list[1]
+    base_name = ooSubprocess.splitext(ifn_sample)[0]
+    output_prefix = os.path.join(args['output_dir'], base_name)
+    if args['sam2file_ext'] is not None:
+        sam2file = output_prefix + args['sam2file_ext']
+    else:
+        sam2file = None
+    marker2file = output_prefix + args['marker2file_ext']
+    if args['input_type'] == 'fastq':
+        sample2markers(
+                    ifn_sample=ifn_sample, 
+                    min_read_len=args['min_read_len'],
+                    min_align_score=args['min_align_score'],
+                    min_base_quality=args['min_base_quality'],
+                    error_rate=args['error_rate'],
+                    ifn_markers=args['ifn_markers'], 
+                    index_path=args['index_path'],
+                    nprocs=args['nprocs'],
+                    sam2file=sam2file,
+                    marker2file=marker2file, 
+                    tmp_dir=args['output_dir'],
+                    quiet=args['quiet'])
+    else:
+        ofn_bam_sorted_prefix = os.path.join(
+                            args['output_dir'],
+                            os.path.basename(ifn_sample) + '.bam.sorted')
+        sam2markers(
+                    sam_file=ifn_sample, 
+                    ofn_bam_sorted_prefix=ofn_bam_sorted_prefix,
+                    min_align_score=args['min_align_score'],
+                    min_base_quality=args['min_base_quality'],
+                    error_rate=args['error_rate'],
+                    marker2file=marker2file,
+                    quiet=args['quiet'])
+    return 0
+
+
+
+
+def compute_polymorphic_sites(sample2pileup, ifn_alignment):
+    return
+
+
+
+
+def main(args):
+    ooSubprocess.mkdir(args['output_dir'])
+    manager = multiprocessing.Manager()
+
+    if args['input_type'] == 'fastq':
+        index_path = build_bowtie2db(args['ifn_markers'], args['output_dir'])
+        args['index_path'] = index_path
+
+    args_list = []
+    for ifn_sample in args['ifn_samples']:
+        args_list.append([ifn_sample, args])
+
+    #ooSubprocess.parallelize(run_sample, args_list, args['nprocs'])
+    pool = multiprocessing.Pool(args['nprocs'])
+    results = [pool.apply_async(run_sample, [a]) for a in args_list]
+    pool.close()
+    for a, r in zip(args_list, results):
+        try:
+            r.get()
+        except Exception as e:
+            logger.error('sample %s failed: %s', a[0], e)
+    pool.join()
+
+
+
+        
+if __name__ == "__main__":
+    args = read_params()
+    main(args)
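For each pileup column, sam2markers keeps only bases whose quality is at least min_base_quality. It then records a triple:

- the frequency of the most common base;
- the depth;
- stats.binom.cdf(max_count, depth, 1 - error_rate).

The last value is the probability of seeing at most that many agreeing bases if disagreement came only from sequencing errors at rate error_rate. A small value therefore hints that the column mixes alleles beyond what errors alone would explain. How downstream code thresholds it is not shown in this file.

The Python 3.8+ sketch below reproduces the computation without SciPy. It uses a hand-written binomial CDF based on math.comb in place of scipy.stats.binom.cdf. The names column_stats and binom_cdf are hypothetical.

```python
# Sketch of the per-column statistics computed in sam2markers().
# binom_cdf stands in for scipy.stats.binom.cdf; names are hypothetical.
from collections import Counter
from math import comb


def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i) for i in range(k + 1))


def column_stats(bases, qualities, min_base_quality=30, error_rate=0.01):
    """Return (major-allele frequency, depth, binomial CDF) for one pileup column."""
    # Count only bases that pass the quality cutoff, as the pileup loop does.
    counts = Counter(b for b, q in zip(bases, qualities) if q >= min_base_quality)
    if not counts:
        return (0.0, 0.0, 0.0)
    major = max(counts.values())
    depth = sum(counts.values())
    return (float(major) / depth, depth, binom_cdf(major, depth, 1.0 - error_rate))
```

For five high-quality bases AAAAC, this gives a frequency of 0.8, a depth of 5 and a CDF of 1 - 0.99^5, about 0.049.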
diff --git a/strainphlan_src/which.py b/strainphlan_src/which.py
new file mode 100755
index 0000000..7a7e75e
--- /dev/null
+++ b/strainphlan_src/which.py
@@ -0,0 +1,25 @@
+#!/usr/bin/env python
+# Author: Duy Tin Truong (duytin.truong at unitn.it)
+#		at CIBIO, University of Trento, Italy
+
+
+import os
+def which(program):
+    def is_exe(fpath):
+        return os.path.isfile(fpath) and os.access(fpath, os.X_OK)
+
+    fpath, fname = os.path.split(program)
+    if fpath:
+        if is_exe(program):
+            return program
+    else:
+        for path in os.environ["PATH"].split(os.pathsep):
+            path = path.strip('"')
+            exe_file = os.path.join(path, program)
+            if is_exe(exe_file):
+                return exe_file
+
+    return None
+
+def is_exe(program):
+    return which(program) is not None
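which.py reimplements a PATH lookup for executable files. On Python 3.3 and later the standard library provides the same search as shutil.which. A minimal Python 3 port could therefore delegate to it, as sketched here under that assumption. The function names simply mirror which.py.

```python
# Python 3 sketch of which.py delegating to the standard library.
import shutil


def which(program):
    """Full path of `program` if it resolves to an executable file, else None.

    A name containing a directory is checked as given; a bare name is searched on PATH.
    """
    return shutil.which(program)


def is_exe(program):
    """True if `program` can be resolved to an executable, as in which.py."""
    return which(program) is not None
```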
diff --git a/strainphlan_tutorial/step1_download.sh b/strainphlan_tutorial/step1_download.sh
new file mode 100644
index 0000000..17c40c0
--- /dev/null
+++ b/strainphlan_tutorial/step1_download.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+wget 
+bunzip2 
diff --git a/strainphlan_tutorial/step2_fastq2sam.sh b/strainphlan_tutorial/step2_fastq2sam.sh
new file mode 100755
index 0000000..e98004f
--- /dev/null
+++ b/strainphlan_tutorial/step2_fastq2sam.sh
@@ -0,0 +1,8 @@
+#!/bin/bash
+mkdir -p sams
+for f in fastqs/*.bz2
+do
+    echo "Running metaphlan2 on ${f}"
+    bn=$(basename "${f}" | cut -d . -f 1)
+    tar xjfO "${f}" | ../metaphlan2.py --bowtie2db ../db_v20/mpa_v20_m200 --mpa_pkl ../db_v20/mpa_v20_m200.pkl --input_type multifastq --nproc 10 -s sams/${bn}.sam.bz2 --bowtie2out sams/${bn}.bowtie2_out.bz2 -o sams/${bn}.profile
+done
diff --git a/strainphlan_tutorial/step3_sam2marker.sh b/strainphlan_tutorial/step3_sam2marker.sh
new file mode 100755
index 0000000..6660d13
--- /dev/null
+++ b/strainphlan_tutorial/step3_sam2marker.sh
@@ -0,0 +1,5 @@
+#!/bin/bash
+mkdir -p consensus_markers
+cwd=$(pwd -P)
+export PATH=${cwd}/../strainphlan_src:${PATH}
+python ../strainphlan_src/sample2markers.py --ifn_samples sams/*.sam.bz2 --input_type sam --output_dir consensus_markers --nprocs 10 &> consensus_markers/log.txt
diff --git a/strainphlan_tutorial/step4_extract_db_marker.sh b/strainphlan_tutorial/step4_extract_db_marker.sh
new file mode 100755
index 0000000..99c1c35
--- /dev/null
+++ b/strainphlan_tutorial/step4_extract_db_marker.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+mkdir -p db_markers
+bowtie2-inspect ../db_v20/mpa_v20_m200 > db_markers/all_markers.fasta
+python ../strainphlan_src/extract_markers.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_markers db_markers/all_markers.fasta --clade s__Bacteroides_caccae --ofn_markers db_markers/s__Bacteroides_caccae.markers.fasta
diff --git a/strainphlan_tutorial/step5_build_tree.sh b/strainphlan_tutorial/step5_build_tree.sh
new file mode 100644
index 0000000..9600a2d
--- /dev/null
+++ b/strainphlan_tutorial/step5_build_tree.sh
@@ -0,0 +1,4 @@
+#!/bin/bash
+mkdir -p output
+python ../strainphlan.py --mpa_pkl ../db_v20/mpa_v20_m200.pkl --ifn_samples consensus_markers/*.markers --ifn_markers db_markers/s__Bacteroides_caccae.markers.fasta --ifn_ref_genomes reference_genomes/G000273725.fna.bz2 --output_dir output --nprocs_main 10 --clades s__Bacteroides_caccae &> output/log.txt
+python ../strainphlan_src/add_metadata_tree.py --ifn_trees output/RAxML_bestTree.s__Bacteroides_caccae.tree --ifn_metadatas fastqs/metadata.txt --metadatas subjectID
diff --git a/strainphlan_tutorial/step6_build_tree_single_strain.sh b/strainphlan_tutorial/step6_build_tree_single_strain.sh
new file mode 100644
index 0000000..ad99085
--- /dev/null
+++ b/strainphlan_tutorial/step6_build_tree_single_strain.sh
@@ -0,0 +1,3 @@
+#!/bin/bash
+python ../strainphlan_src/build_tree_single_strain.py --ifn_alignments output/s__Bacteroides_caccae.fasta --nprocs 10 --log_ofn output/build_tree_single_strain.log
+python ../strainphlan_src/add_metadata_tree.py --ifn_trees output/RAxML_bestTree.s__Bacteroides_caccae.remove_multiple_strains.tree --ifn_metadatas fastqs/metadata.txt --metadatas subjectID
diff --git a/utils/extract_markers.py b/utils/extract_markers.py
new file mode 100755
index 0000000..f649438
--- /dev/null
+++ b/utils/extract_markers.py
@@ -0,0 +1,49 @@
+#!/usr/bin/env python
+#Author: Duy Tin Truong (duytin.truong at unitn.it)
+#        at CIBIO, University of Trento, Italy
+
+__author__  = 'Duy Tin Truong (duytin.truong at unitn.it)'
+__version__ = '0.1'
+__date__    = '1 Sep 2014'
+
+import sys
+import os
+import argparse as ap
+import pickle
+import bz2
+from Bio import SeqIO, Seq, SeqRecord
+
+def read_params():
+    p = ap.ArgumentParser()
+    p.add_argument('--mpa_pkl', required=True, default=None, type=str)
+    p.add_argument('--ifn_markers', required=False, default=None, type=str)
+    p.add_argument('--clade', required=True, default=None, type=str)
+    p.add_argument('--ofn_markers', required=True, default=None, type=str)
+    return vars(p.parse_args())
+
+
+def extract_markers(mpa_pkl, ifn_markers, clade, ofn_markers):
+    with open(mpa_pkl, 'rb') as ifile:
+        db = pickle.loads(bz2.decompress(ifile.read()))
+    markers = set([])
+    for marker in db['markers']:
+        if clade == db['markers'][marker]['taxon'].split('|')[-1]:
+            markers.add(marker)
+    print 'number of markers', len(markers)
+    with open(ofn_markers, 'w') as ofile:
+        if ifn_markers:
+            for rec in SeqIO.parse(open(ifn_markers, 'r'), 'fasta'):
+                if rec.name in markers:
+                    SeqIO.write(rec, ofile, 'fasta')
+        else:
+            for m in markers:
+                ofile.write('%s\n'%m)
+
+
+if __name__ == "__main__":
+    args = read_params()
+    extract_markers(
+                    mpa_pkl=args['mpa_pkl'],
+                    ifn_markers=args['ifn_markers'],
+                    clade=args['clade'],
+                    ofn_markers=args['ofn_markers'])
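extract_markers keeps a marker only when the last |-separated level of its taxon string equals --clade exactly. A species name such as s__Bacteroides_caccae therefore selects that species' markers, while a higher-rank name selects only markers annotated at that rank. The Python 3 sketch below applies the same selection to a toy in-memory database. The dictionary layout mirrors the db['markers'][name]['taxon'] lookups in the script, and the example values are invented.

The helper names are hypothetical. A pickle written by Python 2 may need pickle.loads(..., encoding='latin1') under Python 3, which is an assumption noted in the code.

```python
# Python 3 sketch of the clade-based marker selection in extract_markers.py.
import bz2
import pickle


def load_mpa_pkl(path):
    """Read a bz2-compressed pickle, as extract_markers.py does for --mpa_pkl."""
    with open(path, 'rb') as handle:
        # Assumption: a Python 2 pickle may need encoding='latin1' here under Python 3.
        return pickle.loads(bz2.decompress(handle.read()))


def markers_for_clade(db, clade):
    """Marker names whose taxonomy string ends exactly in `clade`."""
    return {name for name, info in db['markers'].items()
            if info['taxon'].split('|')[-1] == clade}
```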
diff --git a/utils/markers_info.txt.bz2 b/utils/markers_info.txt.bz2
new file mode 100644
index 0000000..81b1804
Binary files /dev/null and b/utils/markers_info.txt.bz2 differ
diff --git a/utils/merge_metaphlan_tables.py b/utils/merge_metaphlan_tables.py
new file mode 100755
index 0000000..c65259c
--- /dev/null
+++ b/utils/merge_metaphlan_tables.py
@@ -0,0 +1,103 @@
+#!/usr/bin/env python
+
+# ============================================================================== 
+# Merge script: from MetaPhlAn output on single sample to a joined "clades vs samples" table
+# Authors: Timothy Tickle (ttickle at hsph.harvard.edu) and Curtis Huttenhower (chuttenh at hsph.harvard.edu)
+# ==============================================================================
+
+import argparse
+import csv
+import os
+import sys
+
+
+def merge( aaastrIn, astrLabels, iCol, ostm ):
+	"""
+	Outputs the table join of the given pre-split string collection.
+	
+	:param	aaastrIn:	One or more split lines from which data are read.
+	:type	aaastrIn:	collection of collections of string collections
+	:param	astrLabels:	File names of input data.
+	:type	astrLabels:	collection of strings
+	:param	iCol:		Data column in which IDs are matched (zero-indexed).
+	:type	iCol:		int
+	:param	ostm:		Output stream to which matched rows are written.
+	:type	ostm:		output stream
+
+	"""
+	
+	setstrIDs = set()
+	"""The final set of all IDs in any table."""
+	ahashIDs = [{} for i in range( len( aaastrIn ) )]
+	"""One hash of IDs to row numbers for each input datum."""
+	aaastrData = [[] for i in range( len( aaastrIn ) )]
+	"""One data table for each input datum."""
+	aastrHeaders = [[] for i in range( len( aaastrIn ) )]
+	"""The list of non-ID headers for each input datum."""
+	strHeader = "ID"
+	"""The ID column header."""
+
+	# For each input datum in each input stream...
+	pos = 0
+
+	for f in aaastrIn :
+		with open(f) as csvfile :
+			iIn = csv.reader(csvfile, csv.excel_tab)
+
+			# Lines from the current file, empty list to hold data, empty hash to hold ids
+			aastrData, hashIDs = (a[pos] for a in (aaastrData, ahashIDs))
+
+			iLine = -1
+			# For a line in the file
+			for astrLine in iIn:
+				iLine += 1
+
+				# ID is from column iCol, data are everything else
+				strID, astrData = astrLine[iCol], ( astrLine[:iCol] + astrLine[( iCol + 1 ):] )
+
+				hashIDs[strID] = iLine
+				aastrData.append( astrData )
+
+			# Batch merge every new ID key set
+			setstrIDs.update( hashIDs.keys( ) )
+
+		pos += 1
+
+	# Create writer
+	csvw = csv.writer( ostm, csv.excel_tab, lineterminator='\n' )
+
+	# Make the file names the column names
+	csvw.writerow( [strHeader] + [os.path.splitext(f)[0] for f in astrLabels] )
+
+	# Write out data
+	for strID in sorted( setstrIDs ):
+		astrOut = []
+		for iIn in range( len( aaastrIn ) ):
+			aastrData, hashIDs = (a[iIn] for a in (aaastrData, ahashIDs))
+			# Look up the row number of the current ID in the current dataset, if any
+			iID = hashIDs.get( strID )
+			# If not, start with no data; if yes, pull out stored data row
+			astrData = [0.0] if ( iID is None ) else aastrData[iID]
+			# Pad output data as needed
+			astrData += [None] * ( len( aastrHeaders[iIn] ) - len( astrData ) )
+			astrOut += astrData
+		csvw.writerow( [strID] + astrOut )
+
+
+argp = argparse.ArgumentParser( prog = "merge_metaphlan_tables.py",
+	description = """Performs a table join on one or more metaphlan output files.""")
+argp.add_argument( "aistms",	metavar = "input.txt", nargs = "+",
+	help = "One or more tab-delimited text tables to join" )
+
+__doc__ = "::\n\n\t" + argp.format_help( ).replace( "\n", "\n\t" )
+
+argp.usage = argp.format_usage()[7:]+"\n\n\tPlease make sure to supply file paths to the files to combine. If combining 3 files (Table1.txt, Table2.txt, and Table3.txt) the call should be:\n\n\t\tpython merge_metaphlan_tables.py Table1.txt Table2.txt Table3.txt > output.txt\n\n\tA wildcard to indicate all .txt files that start with Table can be used as follows:\n\n\t\tpython merge_metaphlan_tables.py Table*.txt > output.txt"
+
+
+def _main( ):
+	args = argp.parse_args( )
+	merge(args.aistms, [os.path.split(os.path.basename(f))[1] for f in args.aistms], 0, sys.stdout)
+
+
+if __name__ == "__main__":
+	_main( )
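merge() performs an outer join on the ID column. Every ID seen in any input becomes a row, the IDs are sorted, and a sample lacking an ID gets 0.0. The Python 3 sketch below shows the same join on in-memory dictionaries. The helpers merge_profiles and to_tsv are hypothetical. Sample names here come from dictionary keys, whereas the script uses file names without their extensions.

```python
# Python 3 sketch of the outer join done by merge_metaphlan_tables.py.


def merge_profiles(profiles):
    """Join per-sample {clade: abundance} dicts into sorted (clade, values) rows.

    Clades missing from a sample are filled with 0.0, as merge() does.
    """
    samples = list(profiles)
    clades = sorted(set().union(*profiles.values()))
    rows = [(c, [profiles[s].get(c, 0.0) for s in samples]) for c in clades]
    return samples, rows


def to_tsv(samples, rows):
    """Render the joined table with an 'ID' header column."""
    lines = ['\t'.join(['ID'] + samples)]
    lines += ['\t'.join([c] + [str(v) for v in values]) for c, values in rows]
    return '\n'.join(lines) + '\n'
```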
diff --git a/utils/metaphlan2krona.py b/utils/metaphlan2krona.py
new file mode 100755
index 0000000..fc1fd5a
--- /dev/null
+++ b/utils/metaphlan2krona.py
@@ -0,0 +1,49 @@
+#!/usr/bin/env python
+
+# ============================================================================== 
+# Conversion script: from MetaPhlAn output to Krona text input file
+# Author: Daniel Brami (daniel.brami at gmail.com)
+# ==============================================================================
+
+import sys
+import optparse
+import re
+
+def main():
+    #Parse Command Line
+    parser = optparse.OptionParser()
+    parser.add_option( '-p', '--profile', dest='profile', default='', action='store', help='The input file is the MetaPhlAn standard result file' )
+    parser.add_option( '-k', '--krona', dest='krona', default='krona.out', action='store', help='The Krona output file name' )
+    ( options, spillover ) = parser.parse_args()
+
+    if not options.profile or not options.krona:
+        parser.print_help()
+        sys.exit()
+
+    re_candidates = re.compile(r"s__|unclassified\t")
+    re_replace = re.compile(r"\w__")
+    re_bar = re.compile(r"\|")
+
+    metaPhLan = list()
+    with open(options.profile,'r') as f:
+        metaPhLan = f.readlines()
+    f.close()
+
+    krona_tmp = options.krona 
+    metaPhLan_FH = open(krona_tmp, 'w')
+
+    for aline in (metaPhLan):
+        if(re.search(re_candidates, aline)):
+            x=re.sub(re_replace, '\t', aline)
+            x=re.sub(re_bar, '', x)
+
+            x_cells = x.split('\t')
+            lineage = '\t'.join(x_cells[0:(len(x_cells) -1)])
+            abundance = float(x_cells[-1].rstrip('\n')) 
+
+            metaPhLan_FH.write('%s\n'%(str(abundance) + '\t' + lineage))
+
+    metaPhLan_FH.close()
+
+if __name__ == '__main__':
+    main()
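The converter keeps only lines containing s__ or "unclassified" followed by a tab. It rewrites a line such as k__A|p__B|s__C with abundance x into the abundance followed by one tab-separated column per taxonomic level. This is the layout read by Krona's text import (ktImportText).

The Python 3 sketch below uses a hypothetical krona_line helper. It strips the rank prefixes level by level instead of replacing them with tabs. As written, the regex substitution above appears to leave an empty field right after the abundance, which this variant avoids.

```python
# Python 3 sketch of one MetaPhlAn-to-Krona line conversion (hypothetical helper).
import re

RANK_PREFIX = re.compile(r'^\w__')  # e.g. 'k__', 'p__', 's__', 't__'


def krona_line(profile_line):
    """Turn 'k__A|p__B|s__C<tab>12.5' into '12.5<tab>A<tab>B<tab>C', or None if skipped."""
    if 's__' not in profile_line and 'unclassified\t' not in profile_line:
        return None  # same selection rule as re_candidates in the script
    lineage, abundance = profile_line.rstrip('\n').split('\t')[:2]
    levels = [RANK_PREFIX.sub('', level) for level in lineage.split('|')]
    return '\t'.join([str(float(abundance))] + levels)
```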
diff --git a/utils/metaphlan_hclust_heatmap.py b/utils/metaphlan_hclust_heatmap.py
new file mode 100755
index 0000000..c306509
--- /dev/null
+++ b/utils/metaphlan_hclust_heatmap.py
@@ -0,0 +1,483 @@
+#!/usr/bin/env python
+
+import sys
+import numpy as np 
+import matplotlib
+matplotlib.use('Agg')
+import scipy
+import pylab
+import scipy.cluster.hierarchy as sch
+from scipy import stats
+
+# User defined color maps (in addition to matplotlib ones)
+bbcyr = {'red':  (  (0.0, 0.0, 0.0),
+                    (0.25, 0.0, 0.0),
+                    (0.50, 0.0, 0.0),
+                    (0.75, 1.0, 1.0),
+                    (1.0, 1.0, 1.0)),
+         'green': ( (0.0, 0.0, 0.0),
+                    (0.25, 0.0, 0.0),
+                    (0.50, 1.0, 1.0),
+                    (0.75, 1.0, 1.0),
+                    (1.0, 0.0, 1.0)),
+         'blue': (  (0.0, 0.0, 0.0),
+                    (0.25, 1.0, 1.0),
+                    (0.50, 1.0, 1.0),
+                    (0.75, 0.0, 0.0),
+                    (1.0, 0.0, 1.0))}
+
+bbcry = {'red':  (  (0.0, 0.0, 0.0),
+                    (0.25, 0.0, 0.0),
+                    (0.50, 0.0, 0.0),
+                    (0.75, 1.0, 1.0),
+                    (1.0, 1.0, 1.0)),
+         'green': ( (0.0, 0.0, 0.0),
+                    (0.25, 0.0, 0.0),
+                    (0.50, 1.0, 1.0),
+                    (0.75, 0.0, 0.0),
+                    (1.0, 1.0, 1.0)),
+         'blue': (  (0.0, 0.0, 0.0),
+                    (0.25, 1.0, 1.0),
+                    (0.50, 1.0, 1.0),
+                    (0.75, 0.0, 0.0),
+                    (1.0, 0.0, 1.0))}
+my_colormaps = [    ('bbcyr',bbcyr),
+                    ('bbcry',bbcry)]
+
+tax_units = "kpcofgs"
+
+def read_params(args):
+    import argparse as ap
+    import textwrap
+
+    p = ap.ArgumentParser( description= "This script generates heatmaps with hierarchical clustering \n"
+                                        "of both samples and microbial clades. The script can also subsample \n"
+                                        "the number of clades to display based on their nth percentile \n"
+                                        "abundance value in each sample\n" )
+    
+    p.add_argument( '--in', metavar='INPUT_FILE', type=str,  default=None, required = True,
+                    help= "The input file of microbial relative abundances. \n"
+                          "This file is typically obtained with the \"utils/merge_metaphlan_tables.py\"\n")
+
+    p.add_argument( '--out', metavar='OUTPUT_FILE', type=str,  default=None, required = True,
+                    help= "The output image. \n"
+                          "The extension of the file determines the image format. png, pdf, and svg are the preferred formats" )
+
+    p.add_argument( '-m', type=str,
+                    choices=[   "single","complete","average",
+                                "weighted","centroid","median",
+                                "ward" ],
+                    default="average", 
+                    help = "The hierarchical clustering method, default is \"average\"\n" )
+
+    dist_funcs = [  "euclidean","minkowski","cityblock","seuclidean",
+                    "sqeuclidean","cosine","correlation","hamming",
+                    "jaccard","chebyshev","canberra","braycurtis",
+                    "mahalanobis","yule","matching","dice",
+                    "kulsinski","rogerstanimoto","russellrao","sokalmichener",
+                    "sokalsneath","wminkowski","ward"]
+    p.add_argument( '-d', type=str, choices=dist_funcs, default="braycurtis",
+                    help="The distance function for samples. Default is \"braycurtis\"")
+    p.add_argument( '-f', type=str, choices=dist_funcs, default="correlation", 
+                    help="The distance function for microbes. Default is \"correlation\"")
+
+    p.add_argument( '-s', metavar='scale norm', type=str,
+                    default = 'lin', choices = ['log','lin'], help = "Color scale normalization: 'lin' (linear, default) or 'log'" )
+
+    p.add_argument( '-x', type=float, default = 0.1, 
+                    help="Width of heatmap cells in inches (the figure size is derived from it). Default is 0.1; it should only need changing for very large heatmaps")
+    p.add_argument( '-y', type=float, default = 0.1, 
+                    help="Height of heatmap cells in inches (the figure size is derived from it). Default is 0.1; it should only need changing for very large heatmaps")
+
+    p.add_argument( '--minv', type=float, default = 0.0,
+                    help="Minimum value to display. Default is 0.0, values around 0.001 are also reasonable")
+    p.add_argument( '--maxv', metavar='max value', type=float,
+                    help="Maximum value to display. Default is maximum value present, can be set e.g. to 100 to display the full scale")
+    
+    p.add_argument( '--tax_lev', metavar='TAXONOMIC_LEVEL', type=str, 
+                    choices='a'+tax_units, default='s', help = 
+                   "The taxonomic level to display:\n"
+                   "'a' : all taxonomic levels\n"
+                   "'k' : kingdoms (Bacteria and Archaea) only\n"
+                   "'p' : phyla only\n"
+                   "'c' : classes only\n"
+                   "'o' : orders only\n"
+                   "'f' : families only\n"
+                   "'g' : genera only\n"
+                   "'s' : species only\n"
+                   "[default 's']" )
+
+    p.add_argument( '--perc', type=int, default=None,
+                    help="Percentile used to rank microbes when selecting the most abundant ones with --top. Defaults to 90 when --top is given")
+    p.add_argument( '--top', type=int, default=None,
+                    help="Display the --top most abundant microbes only (ordering based on --perc)")
+    
+    p.add_argument( '--sdend_h', type=float, default = 0.1,
+                    help="Set the height of the sample dendrogram. Default is 0.1")
+    p.add_argument( '--fdend_w', type=float, default = 0.1,
+                    help="Set the width of the microbes dendrogram. Default is 0.1")
+    p.add_argument( '--cm_h', type=float, default = 0.03,
+                    help="Set the height of the colormap. Default is 0.03" )
+    p.add_argument( '--cm_ticks', metavar='label for ticks of the colormap', type=str,
+                    default = None, help="Colon-separated labels for the colormap ticks" )
+    
+    p.add_argument( '--font_size', type=int, default = 7,
+                    help = "Set label font sizes. Default is 7\n" )
+    p.add_argument( '--clust_line_w',  type=float, default = 1.0,
+                    help="Set the line width for the dendrograms" )
+
+    col_maps = ['Accent', 'Blues', 'BrBG', 'BuGn', 'BuPu', 'Dark2', 'GnBu', 
+                'Greens', 'Greys', 'OrRd', 'Oranges', 'PRGn', 'Paired', 
+                'Pastel1', 'Pastel2', 'PiYG', 'PuBu', 'PuBuGn', 'PuOr', 
+                'PuRd', 'Purples', 'RdBu', 'RdGy', 'RdPu', 'RdYlBu', 'RdYlGn', 
+                'Reds', 'Set1', 'Set2', 'Set3', 'Spectral', 'YlGn', 'YlGnBu', 
+                'YlOrBr', 'YlOrRd', 'afmhot', 'autumn', 'binary', 'bone', 
+                'brg', 'bwr', 'cool', 'copper', 'flag', 'gist_earth', 
+                'gist_gray', 'gist_heat', 'gist_ncar', 'gist_rainbow', 
+                'gist_stern', 'gist_yarg', 'gnuplot', 'gnuplot2', 'gray', 
+                'hot', 'hsv', 'jet', 'ocean', 'pink', 'prism', 'rainbow', 
+                'seismic', 'spectral', 'spring', 'summer', 'terrain', 'winter'] + [n for n,c in my_colormaps]
+    p.add_argument( '-c', type=str, choices = col_maps, default = 'jet',
+                    help="Set the colormap. Default is \"jet\"." )
+
+    return vars(p.parse_args()) 
+
+# Predefined colors for dendrogram branches and class labels
+colors = [  "#B22222","#006400","#0000CD","#9400D3","#696969","#8B4513",
+            "#FF1493","#FF8C00","#3CB371","#00Bfff","#CDC9C9","#FFD700",
+            "#2F4F4F","#FF0000","#ADFF2F","#B03060" ]
+
+def samples2classes_panel(fig, samples, s2l, idx1, idx2, height, xsize, cols, legendon, fontsize, label2cols, legend_ncol ):
+    from matplotlib.patches import Rectangle
+    samples2labels = dict([(s,l) 
+                            for s,l in [ll.strip().split('\t') 
+                                for ll in open(s2l)]])
+   
+    if label2cols:
+        labels2colors = dict([(l[0],l[1]) for l in [ll.strip().split('\t') for ll in open(label2cols)]])
+    else:
+        cs = cols if cols else colors
+        labels2colors = dict([(l,cs[i%len(cs)]) for i,l in enumerate(set(samples2labels.values()))])
+    ax1 = fig.add_axes([0.,1.0,1.0,height],frameon=False)
+    ax1.set_xticks([])
+    ax1.set_yticks([])
+    ax1.set_ylim( [0.0, height] )
+    ax1.set_xlim( [0.0, xsize] )
+    step = xsize / float(len(samples))
+    labels = set()
+    added_labels = set()
+    for i,ind in enumerate(idx2):
+        if  not samples[ind] in samples2labels or \
+            not samples2labels[samples[ind]] in labels2colors:
+            fc, ll = None, None
+        else:
+            ll = samples2labels[samples[ind]]
+            ll = None if ll in added_labels else ll
+            added_labels.add( ll )
+            fc = labels2colors[samples2labels[samples[ind]]]
+    
+        rect = Rectangle( [float(i)*step, 0.0], step, height,
+                            facecolor = fc,
+                            label = ll,
+                            edgecolor='b', lw = 0.0)
+        labels.add( ll )
+        ax1.add_patch(rect)
+    ax1.autoscale_view()
+   
+    if legendon:
+        ax1.legend( loc = 2, ncol = legend_ncol, bbox_to_anchor=(1.01, 3.),
+                    borderpad = 0.0, labelspacing = 0.0,
+                    handlelength = 0.5, handletextpad = 0.3,
+                    borderaxespad = 0.0, columnspacing = 0.3,
+                    prop = {'size':fontsize}, frameon = False)
+
+def samples_dend_panel( fig, Z, Z2, ystart, ylen, lw ):
+    ax2 = fig.add_axes([0.0,1.0+ystart,1.0,ylen], frameon=False)
+    Z2['color_list'] = [c.replace('b','k') for c in Z2['color_list']]
+    mh = max(Z[:,2])
+    sch._plot_dendrogram(   Z2['icoord'], Z2['dcoord'], Z2['ivl'], 
+                            Z.shape[0] + 1, Z.shape[0] + 1, 
+                            mh, 'top', no_labels=True, 
+                            color_list=Z2['color_list'] )
+    for coll in ax2.collections:
+        coll._linewidths = (lw,)
+    ax2.set_xticks([])
+    ax2.set_yticks([])
+    ax2.set_xticklabels([])
+ 
+def features_dend_panel( fig, Z, Z2, width, lw ):
+    ax1 = fig.add_axes([-width,0.0,width,1.0], frameon=False)
+    Z2['color_list'] = [c.replace('b','k').replace('x','b') for c in Z2['color_list']]
+    mh = max(Z[:,2])
+    sch._plot_dendrogram(Z2['icoord'], Z2['dcoord'], Z2['ivl'], Z.shape[0] + 1, Z.shape[0] + 1, mh, 'right', no_labels=True, color_list=Z2['color_list'])
+    for coll in ax1.collections:
+        coll._linewidths = (lw,)
+    ax1.set_xticks([])
+    ax1.set_yticks([])
+    ax1.set_xticklabels([])
+ 
+
+def add_cmap( cmapdict, name ):
+    my_cmap = matplotlib.colors.LinearSegmentedColormap(name,cmapdict,256)
+    pylab.register_cmap(name=name,cmap=my_cmap)
+
+def init_fig(xsize,ysize,ncol):
+    fig = pylab.figure(figsize=(xsize,ysize))
+    sch._link_line_colors = colors[:ncol] 
+    return fig
+
+def heatmap_panel( fig, D, minv, maxv, idx1, idx2, cm_name, scale, cols, rows, label_font_size, cb_offset, cb_l, flabelson, slabelson, cm_ticks, gridon, bar_offset ):
+    cm = pylab.get_cmap(cm_name)
+    bottom_col = [    cm._segmentdata['red'][0][1],
+                      cm._segmentdata['green'][0][1],
+                      cm._segmentdata['blue'][0][1]   ]
+    axmatrix = fig.add_axes(    [0.0,0.0,1.0,1.0],
+                                axisbg=bottom_col)
+    if any([c < 0.95 for c in bottom_col]):
+        axmatrix.spines['right'].set_color('none')
+        axmatrix.spines['left'].set_color('none')
+        axmatrix.spines['top'].set_color('none')
+        axmatrix.spines['bottom'].set_color('none')
+    norm_f = matplotlib.colors.LogNorm if scale == 'log' else matplotlib.colors.Normalize
+    im = axmatrix.matshow(  D, norm = norm_f(   vmin=minv if minv > 0.0 else None,
+                                                vmax=maxv), 
+                            aspect='auto', origin='lower', cmap=cm, vmax=maxv)
+    
+    axmatrix2 = axmatrix.twinx()
+    axmatrix3 = axmatrix.twiny()
+   
+    axmatrix.set_xticks([])
+    axmatrix2.set_xticks([])
+    axmatrix3.set_xticks([])
+    axmatrix.set_yticks([])
+    axmatrix2.set_yticks([])
+    axmatrix3.set_yticks([])
+    
+    axmatrix.set_xticklabels([])
+    axmatrix2.set_xticklabels([])
+    axmatrix3.set_xticklabels([])
+    axmatrix.set_yticklabels([])
+    axmatrix2.set_yticklabels([])
+    axmatrix3.set_yticklabels([])
+
+    if any([c < 0.95 for c in bottom_col]):
+        axmatrix2.spines['right'].set_color('none')
+        axmatrix2.spines['left'].set_color('none')
+        axmatrix2.spines['top'].set_color('none')
+        axmatrix2.spines['bottom'].set_color('none')
+    if any([c < 0.95 for c in bottom_col]):
+        axmatrix3.spines['right'].set_color('none')
+        axmatrix3.spines['left'].set_color('none')
+        axmatrix3.spines['top'].set_color('none')
+        axmatrix3.spines['bottom'].set_color('none')
+    if flabelson:
+        axmatrix2.set_yticks(np.arange(len(rows))+0.5)
+        axmatrix2.set_yticklabels([rows[r] for r in idx1],size=label_font_size,va='center')
+    if slabelson:
+        axmatrix.set_xticks(np.arange(len(cols)))
+        axmatrix.set_xticklabels([cols[r] for r in idx2],size=label_font_size,rotation=90,va='top',ha='center')
+    axmatrix.tick_params(length=0)
+    axmatrix2.tick_params(length=0)
+    axmatrix3.tick_params(length=0)
+    axmatrix2.set_ylim(0,len(rows))
+  
+    if gridon:
+        axmatrix.set_yticks(np.arange(len(idx1)-1)+0.5)
+        axmatrix.set_xticks(np.arange(len(idx2))+0.5)
+        axmatrix.grid( True )
+        ticklines = axmatrix.get_xticklines()
+        ticklines.extend( axmatrix.get_yticklines() )
+        #gridlines = axmatrix.get_xgridlines()
+        #gridlines.extend( axmatrix.get_ygridlines() )
+
+        for line in ticklines:
+            line.set_linewidth(3)
+    
+    if cb_l > 0.0:
+        axcolor = fig.add_axes([0.0,1.0+bar_offset*1.25,1.0,cb_l])
+        cbar = fig.colorbar(im, cax=axcolor, orientation='horizontal')
+        cbar.ax.tick_params(labelsize=label_font_size)
+        if cm_ticks:
+            cbar.ax.set_xticklabels( cm_ticks.split(":") )
+
+
+def read_table( fin, xstart,xstop,ystart,ystop, percentile = None, top = None, tax_lev = 's' ):
+    mat = [l.strip().split('\t') for l in open( fin ) if l.strip()]
+    if tax_lev != 'a':
+        lev = tax_units.index(tax_lev) 
+        mat = [m for i,m in enumerate(mat) if i == 0 or m[0].split('|')[-1][0] == tax_lev or ( len(m[0].split('|')) == lev and m[0].split('|')[-1].endswith("unclassified"))]
+    sample_labels = mat[0][xstart:xstop]
+
+    m = [(mm[xstart-1],np.array([float(f) for f in mm[xstart:xstop]])) for mm in mat[ystart:ystop]]
+
+    if top and not percentile:
+        percentile = 90
+   
+    if percentile:
+        m = sorted(m,key=lambda x:-stats.scoreatpercentile(x[1],percentile))
+    if top:
+        feat_labels = [mm[0].split("|")[-1] for mm in m[:top]]
+        m = [mm[1] for mm in m[:top]]
+    else:
+        feat_labels = [mm[0].split("|")[-1] for mm in m]
+        m = [mm[1] for mm in m]
+    
+    D = np.matrix(  np.array( m ) )
+
+    return D, feat_labels, sample_labels
+
+def read_dm( fin, n ):
+    mat = [[float(f) for f in l.strip().split('\t')] for l in open( fin )]
+    nc = sum([len(r) for r in mat]) 
+    
+    if nc == n*n:
+        dm = []
+        for i in range(n):
+            dm += mat[i][i+1:]
+        return np.array(dm)
+    if nc == (n*n-n)//2:
+        dm = []
+        for row in mat:
+            dm += row
+        return np.array(dm)
+    sys.stderr.write( "Error in reading the distance matrix\n" )
+    sys.exit(1)
+
+
+def hclust(  fin, fout,
+             method = "average",
+             dist_func = "euclidean",
+             feat_dist_func = "d",
+             xcw = 0.1,
+             ycw = 0.1,
+             scale = 'lin',
+             minv = 0.0,
+             maxv = None,
+             xstart = 1,
+             ystart = 1,
+             xstop = None,
+             ystop = None,
+             percentile = None,
+             top = None,
+             cm_name = 'jet',
+             s2l = None,
+             label_font_size = 7,
+             feat_dend_col_th = None,
+             sample_dend_col_th = None,
+             clust_ncols = 7,
+             clust_line_w = 1.0,
+             label_cols = None,
+             sdend_h = 0.1,
+             fdend_w = 0.1,
+             cm_h = 0.03,
+             dmf = None,
+             dms = None,
+             legendon = False,
+             label2cols = None,
+             flabelon = True,
+             slabelon = True,
+             cm_ticks = None,
+             legend_ncol = 3,
+             pad_inches = None,
+             legend_font_size = 7,
+             gridon = 0,
+             tax_lev = 's'):
+
+    if label_cols and label_cols.count("-"):
+        label_cols = label_cols.split("-")
+
+    for n,c in my_colormaps:
+        add_cmap( c, n )
+    
+    if feat_dist_func == 'd':
+        feat_dist_func = dist_func
+
+    D, feat_labels, sample_labels = read_table(fin,xstart,xstop,ystart,ystop,percentile,top,tax_lev=tax_lev)
+
+    ylen,xlen = D[:].shape
+    Dt = D.transpose() 
+
+    size_cx, size_cy = xcw, ycw
+ 
+    xsize, ysize = max(xlen*size_cx,2.0), max(ylen*size_cy,2.0)
+    ydend_offset = 0.025*8.0/ysize if s2l else 0.0
+
+    fig = init_fig(xsize,ysize,clust_ncols)
+
+    nfeats, nsamples = len(D), len(Dt) 
+    
+    if dmf:
+        p1 = read_dm( dmf, nfeats )
+        Y1 = sch.linkage(   p1, method=method )
+    else:
+        if len(D) < 2 or len(Dt) < 2: 
+            Y1 = []
+        elif feat_dist_func == 'correlation':
+            Y1 = sch.linkage(   D, method=method, metric=lambda x,y:max(0.0,scipy.spatial.distance.correlation(x,y)) )
+        else:
+            Y1 = sch.linkage(   D, method=method, metric=feat_dist_func )
+    
+    if len(Y1):
+        Z1 = sch.dendrogram(Y1, no_plot=True, color_threshold=feat_dend_col_th) 
+        idx1 = Z1['leaves']
+    else:
+        idx1 = list(range(len(D)))
+
+    if dms:
+        p2 = read_dm( dms, nsamples )
+        Y2 = sch.linkage(   p2, method=method )
+    else:
+        if len(Dt) < 2 or len(D) < 2:
+            Y2 = []
+        elif dist_func == 'correlation':
+            Y2 = sch.linkage(   Dt, method=method, metric=lambda x,y:max(0.0,scipy.spatial.distance.correlation(x,y)) )
+        else:
+            Y2 = sch.linkage(   Dt, method=method, metric=dist_func )
+    
+    if len(Y2):
+        Z2 = sch.dendrogram(Y2, no_plot=True, color_threshold=sample_dend_col_th) 
+        idx2 = Z2['leaves']
+    else:
+        idx2 = list(range(len(Dt))) 
+    D = D[idx1,:][:,idx2]
+
+    if fdend_w > 0.0 and len(Y1):
+        features_dend_panel(fig, Y1, Z1, fdend_w*8.0/xsize, clust_line_w ) 
+    if sdend_h > 0.0 and len(Y2): 
+        samples_dend_panel(fig, Y2, Z2, ydend_offset, sdend_h*8.0/ysize, clust_line_w)
+ 
+   
+    if s2l:
+        samples2classes_panel( fig, sample_labels, s2l, idx1, idx2, 0.025*8.0/ysize, xsize, label_cols, legendon, legend_font_size, label2cols, legend_ncol )
+    heatmap_panel( fig, D, minv, maxv, idx1, idx2, cm_name, scale, sample_labels, feat_labels, label_font_size, -cm_h*8.0/ysize, cm_h*0.8*8.0/ysize, flabelon, slabelon, cm_ticks, gridon, ydend_offset+sdend_h*8.0/ysize )
+  
+    fig.savefig(    fout, bbox_inches='tight',  
+                    pad_inches = pad_inches, 
+                    dpi=300) if fout else pylab.show()
+
+if __name__ == '__main__':
+    pars = read_params( sys.argv )
+  
+    hclust(   fin  = pars['in'],
+              fout = pars['out'],
+              method = pars['m'],
+              dist_func = pars['d'],
+              feat_dist_func = pars['f'],
+              xcw = pars['x'],
+              ycw = pars['y'],
+              scale = pars['s'],
+              minv = pars['minv'],
+              maxv = pars['maxv'],
+              percentile = pars['perc'],
+              top = pars['top'],
+              cm_name = pars['c'],
+              label_font_size = pars['font_size'],
+              clust_line_w = pars['clust_line_w'],
+              sdend_h = pars['sdend_h'],
+              fdend_w = pars['fdend_w'],
+              cm_h = pars['cm_h'],
+              cm_ticks = pars['cm_ticks'],
+              pad_inches = 0.1,
+              tax_lev = pars['tax_lev']
+              )
+
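The heatmap script above does two core things before drawing: it ranks clades by a per-row percentile and keeps only the --top ones (read_table), then it orders rows and columns by the leaves of their dendrograms (hclust), clipping the correlation distance at zero for features. A minimal sketch of those two steps, assuming NumPy and SciPy are available and using plain ndarrays instead of np.matrix; the helper names select_top, biclust_order, and clipped_correlation are hypothetical and not part of the script.

```python
import numpy as np
import scipy.cluster.hierarchy as sch
import scipy.spatial.distance as ssd
from scipy import stats


def clipped_correlation(x, y):
    # Correlation distance can come out slightly negative through
    # floating-point error; clip it at zero like the script's lambda does.
    return max(0.0, ssd.correlation(x, y))


def select_top(D, labels, top, percentile=90):
    # Score each feature (row) by its percentile value across samples and
    # keep the `top` highest-scoring rows, largest score first.
    scores = [stats.scoreatpercentile(row, percentile) for row in D]
    order = sorted(range(len(D)), key=lambda i: -scores[i])[:top]
    return D[order, :], [labels[i] for i in order]


def biclust_order(D, method="average", sample_metric="braycurtis"):
    # Cluster features with the clipped correlation distance and samples
    # (columns) with a named SciPy metric, then read the leaf orders from
    # dendrograms computed without plotting.
    row_link = sch.linkage(D, method=method, metric=clipped_correlation)
    col_link = sch.linkage(D.T, method=method, metric=sample_metric)
    idx1 = sch.dendrogram(row_link, no_plot=True)["leaves"]
    idx2 = sch.dendrogram(col_link, no_plot=True)["leaves"]
    return idx1, idx2
```

Indexing the selected matrix with `D[idx1, :][:, idx2]` then yields the row and column order the heatmap would display, assuming the same linkage method and metrics.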
diff --git a/utils/plot_bug.py b/utils/plot_bug.py
new file mode 100755
index 0000000..f580c2f
--- /dev/null
+++ b/utils/plot_bug.py
@@ -0,0 +1,254 @@
+#!/usr/bin/env python
+
+import sys
+import numpy as np
+import scipy.spatial.distance as spd 
+import scipy.cluster.hierarchy as sph
+from scipy import stats
+import matplotlib
+#matplotlib.use('Agg')
+import pylab
+import pandas as pd
+import matplotlib.pyplot as plt
+
+class ReadCmd:
+
+    def __init__( self ):
+        import argparse as ap
+        import textwrap
+
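+        p = ap.ArgumentParser( description= "Plot the abundance of a single feature across samples as a bar plot, optionally colored, hatched, and grouped by metadata" )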
+        arg = p.add_argument
+        
+        arg( '-i', '--inp', '--in', metavar='INPUT_FILE', type=str, nargs='?', default=sys.stdin,
+             help= "The input matrix" )
+        arg( '-o', '--out', metavar='OUTPUT_FILE', type=str, nargs='?', default=None,
+             help= "The output image file [image on screen if not specified]" )
+
+        arg( '-m', '--metadata_file', type=str, default='None',
+             help= "The input metadata file [default None]" )
+
+        DataMatrix.input_parameters( p )
+        BarPlot.input_parameters( p )
+        self.args  = p.parse_args()
+
+    def check_consistency( self ):
+        pass
+
+    def get_args( self ):
+        return self.args
+
+class DataMatrix:
+    datatype = 'data_matrix'
+    
+    @staticmethod
+    def input_parameters( parser ):
+        dm_param = parser.add_argument_group('Input data matrix parameters')
+        arg = dm_param.add_argument
+
+        arg( '--sep', type=str, default='\t' )
+        arg( '-f', '--feat', type=str, default=None, required = True,
+             help = "Name of the feature to plot "
+                    "[or the ending string if --endswith is specified]")
+        arg( '--endswith', action='store_true',
+             help = "Match the ending part of the feature name" )
+        arg( '--fname_row', type=int, default=0,
+             help = "row number containing the names of the features "
+                    "[default 0, specify -1 if no names are present in the matrix]")
+        arg( '--sname_row', type=int, default=0,
+             help = "column number containing the names of the samples "
+                    "[default 0, specify -1 if no names are present in the matrix]")
+        arg( '--skip_rows', type=str, default=None,
+             help = "Row numbers to skip (0-indexed, comma separated) from the input file "
+                    "[default None, meaning no rows skipped]")
+        arg( '--def_na', type=float, default=None,
+             help = "Set the default value for missing values [default None which means no replacement]")
+
+    def __init__( self, input_file, args ):
+        self.args = args
+        toskip = [int(l) for l in self.args.skip_rows.split(",")]  if self.args.skip_rows else None
+        self.table = pd.read_table( 
+                input_file, sep = self.args.sep, skipinitialspace = True, skiprows = toskip,
+                                  header = self.args.fname_row if self.args.fname_row > -1 else None,
+                                  index_col = self.args.sname_row if self.args.sname_row > -1 else None
+                                    )
+
+        rows = []
+
+        if self.args.endswith:
+            for n in self.table.index:
+                if n.endswith( self.args.feat  ):
+                    rows.append( n )
+        elif self.args.feat in self.table.index:
+            rows.append( self.args.feat )
+        self.table = self.table.reindex( index=rows )
+
+        if not len(rows):
+            sys.stderr.write("Error, feature "+self.args.feat+" not found!\n")
+            sys.exit(1)
+        if len(rows) > 1:
+            sys.stderr.write("Error, multiple features matching "+self.args.feat+"!\n")
+            sys.exit(1)
+
+        if self.args.def_na is not None:
+            self.table = self.table.fillna( self.args.def_na )
+
+    def get_numpy_matrix( self ): 
+        return self.table
+    
+    def get_snames( self ):
+        return list(self.table.index)
+    
+    def get_fnames( self ):
+        return list(self.table.columns)
+   
+    def save_matrix( self, output_file ):
+        self.table.to_csv( output_file, sep = '\t' )
+
+class MetadataMatrix:
+    datatype = 'metadata_matrix'
+    
+    @staticmethod
+    def input_parameters( parser ):
+        dm_param = parser.add_argument_group('Input metadata file')
+        arg = dm_param.add_argument
+
+        arg( '--sep', type=str, default='\t' )
+        arg( '--fname_row', type=int, default=0,
+             help = "row number containing the names of the features "
+                    "[default 0, specify -1 if no names are present in the matrix]")
+        arg( '--def_na', type=float, default=None,
+             help = "Set the default value for missing values [default None which means no replacement]")
+
+    def __init__( self, input_file, args ):
+        self.args = args
+        self.table = pd.read_table( 
+                input_file, sep = self.args.sep, skipinitialspace = True, 
+                        #header = self.args.fname_row if self.args.fname_row > -1 else None,
+                                  index_col = self.args.sname_row if self.args.sname_row > -1 else None
+                                    )
+
+        if self.args.def_na is not None:
+            self.table = self.table.fillna( self.args.def_na )
+    
+    def get_snames( self ):
+        return list(self.table.index)
+    
+    def get_fnames( self ):
+        return list(self.table.columns)
+
+    def get_table( self ):
+        return self.table
+
+class BarPlot:
+    datatype = 'barplot'
+
+    @staticmethod
+    def input_parameters( parser ):
+        hm_param = parser.add_argument_group('Bar plot options')
+        arg = hm_param.add_argument
+
+        arg( '--dpi', type=int, default=72,
+             help = "Image resolution in dpi [default 72]")
+        arg( '-C', '--color_condition', type=str, default=None,
+             help = "The name of the metadata column used for coloring")
+        arg( '-H', '--hatch_condition', type=str, default=None,
+             help = "The name of the metadata column used for hatching")
+        arg( '-G', '--group_condition', type=str, default=None,
+             help = "The name of the metadata column used for grouping")
+        arg( '-t', '--title', type=str, default=None,
+             help = "The title of the plot [default no title]")
+        arg( '-l', '--log_scale', action='store_true',
+             help = "Log scale" )
+
+    
+    def __init__( self, numpy_matrix, metadata_matrix, args = None ):
+        self.numpy_matrix = numpy_matrix
+        self.mmatrix = metadata_matrix
+        self.args = args
+
+    def draw( self ):
+
+        fig = plt.figure( figsize=(20,8)  )
+        ax = fig.add_subplot(111)
+
+        width = 0.65      
+
+        names = list(self.numpy_matrix.index)
+        n0 = names[0]
+
+        tp = self.numpy_matrix.to_dict()
+        
+        keys = sorted(tp)
+        
+        if self.args.color_condition not in self.mmatrix:
+            self.args.color_condition = None
+        cond_values = [None] if self.args.color_condition is None else sorted(set(self.mmatrix[self.args.color_condition]) )
+        if self.args.hatch_condition not in self.mmatrix:
+            self.args.hatch_condition = None
+        hatch_values = [None] if self.args.hatch_condition is None else sorted(set(self.mmatrix[self.args.hatch_condition]) )
+       
+        if self.args.group_condition:
+            group_values = list(sorted(set(self.mmatrix[self.args.group_condition])))
+            keys = sorted( keys, key=lambda x:group_values.index(self.mmatrix[self.args.group_condition][x]) )
+        else:
+            keys, group_values = sorted( keys ), []
+
+        ind = np.arange( len(tp) )
+        pos = ind-width/2
+
+        hatches = ['//','\\\\','++','--','xx']
+        cols = ['r','g','c','b']
+        minv,maxv = 0.0, max([v[n0] for v in tp.values()])
+        
+        bar_sets = []
+        for i,c in enumerate(cond_values): 
+            for j,h in enumerate(hatch_values): 
+                values = [(tp[k][n0] if (c is None or self.mmatrix[self.args.color_condition][k] == c) 
+                                    and (h is None or self.mmatrix[self.args.hatch_condition][k] == h) else 0.0) for k in keys]
+                b = ax.bar(pos, values, width, hatch=hatches[j%len(hatches)] if len(hatch_values) > 1 else "", color=cols[i%len(cols)])
+                cond = self.args.color_condition + " "+str(c).strip()+", " if c is not None else ""
+                hatch = self.args.hatch_condition + " "+str(h).strip()+", " if h is not None else ""
+                bar_sets.append( (b,cond+hatch) )
+
+        v0 = ind[0]-0.5
+        vm1 = v0
+        ax.plot([v0,v0],[minv,maxv],"--",linewidth=2,color='k')
+        for g in group_values:
+            vm1 = v0
+            v0 += list(self.mmatrix[self.args.group_condition]).count(g)
+            ax.plot([v0,v0],[minv,maxv],"--",linewidth=2,color='k')
+            ax.text( (vm1+v0)*0.5, maxv * 0.9, str(g), horizontalalignment='center', verticalalignment='center' )
+            #ax.text( (vm1+v0)*0.5, maxv * 0.9, str(round(g,1)), horizontalalignment='center', verticalalignment='center' )
+
+        if self.args.color_condition or self.args.hatch_condition:
+            leg = ax.legend( [b for b,_ in bar_sets], [l for _,l in bar_sets], bbox_to_anchor=(1.02, 0,0.3,1), loc=1,
+                           ncol=1, mode="expand", borderaxespad=0., frameon = False)
+
+        ax.set_xlim(-width,ind[-1]+width)
+        ax.set_ylim(0,maxv)
+        ax.set_xticks( ind )
+        ax.set_xticklabels( keys, rotation = 90 )
+        ax.set_title( self.args.title or "" )
+
+        if not self.args.out:
+            plt.show()
+        else:
+            fig.savefig( self.args.out, bbox_inches='tight', dpi = self.args.dpi, 
+                         bbox_extra_artists=((fig.get_axes()[0].get_legend(),) if self.args.color_condition or self.args.hatch_condition else None) )
+
+if __name__ == '__main__':
+
+    read = ReadCmd( )
+    read.check_consistency()
+    args = read.get_args()
+
+    dm = DataMatrix( args.inp, args )
+    mdm = MetadataMatrix( args.metadata_file, args ) 
+
+    bp = BarPlot( dm.get_numpy_matrix(), mdm.get_table(),args )      
+    bp.draw()
+
+
+
+
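plot_bug.py draws one bar per sample and colors it by a metadata column by calling ax.bar once per condition value: matching samples get their real abundance and all others get 0.0, so the overlaid series show a single colored bar per sample. A minimal pandas-only sketch of that bookkeeping, assuming a features-by-samples abundance table and a samples-by-metadata table; bars_by_condition is a hypothetical helper, not part of the script, and it sorts samples by name rather than by a grouping column.

```python
import pandas as pd


def bars_by_condition(abund, meta, feature, condition):
    """Return the sample order and {condition value: bar heights}.

    Each sample contributes its abundance to exactly one series (the one
    matching its metadata value) and 0.0 to every other series.
    """
    samples = sorted(abund.columns)
    values = abund.loc[feature, samples]
    series = {}
    for cond in sorted(set(meta.loc[samples, condition])):
        series[cond] = [float(values[s]) if meta.loc[s, condition] == cond else 0.0
                        for s in samples]
    return samples, series
```

Each list can then be passed to matplotlib's ax.bar at the same x positions with a distinct color per condition; the script's nested loop does the same and additionally varies the hatch pattern by a second condition.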
diff --git a/utils/species2genomes.txt b/utils/species2genomes.txt
new file mode 100644
index 0000000..5beec92
--- /dev/null
+++ b/utils/species2genomes.txt
@@ -0,0 +1,7678 @@
+s__Streptomyces_clavuligerus	3	GCF_000154925	GCF_000148465	GCF_000163875
+s__Crinalium_epipsammum	1	GCF_000317495
+s__Cronobacter_phage_CR5	1	PRJNA209076
+s__Schlesneria_paludicola	1	GCF_000255655
+s__Abiotrophia_defectiva	1	GCF_000160075
+s__Indian_peanut_clump_virus	1	PRJNA14882
+s__Pseudomonas_sp_Lz4W	1	GCF_000346225
+s__Acinetobacter_phage_AP205	1	PRJNA14710
+s__Enterobacteria_phage_BP_4795	1	PRJNA14287
+s__Cronobacter_phage_CR3	1	PRJNA167658
+s__Carrot_mottle_mimic_virus	1	PRJNA15085
+s__Hirschia_maritima	1	GCF_000378345
+s__Schizosaccharomyces_pombe	1	GCA_000002945
+s__Pseudomonas_phage_NH_4	1	PRJNA181065
+s__Candidatus_Nitrosopumilus_sp_AR2	1	GCF_000299395
+s__Kadipiro_virus	1	PRJNA14858
+s__Brachyspira_hampsonii	2	GCF_000334935	GCF_000316195
+s__Propionibacterium_phage_PHL067M10	1	PRJNA219115
+s__Staphylococcus_phage_phiETA	1	PRJNA14141
+s__Circovirus_like_genome_BBC_A	1	PRJNA39611
+s__Tomato_leaf_deformation_virus	2	PRJNA178590	PRJNA52633
+s__Pseudomonas_phage_JBD30	1	PRJNA188536
+s__Staphylococcus_phage_Twort	1	PRJNA15246
+s__Acidiphilium_multivorum	1	GCF_000202835
+s__Alistipes_onderdonkii	1	GCF_000374505
+s__Porcine_sapelovirus	1	PRJNA15400
+s__Clostridium_ljungdahlii	1	GCF_000143685
+s__Torque_teno_virus	1	PRJNA70005
+s__Methanococcus_aeolicus	1	GCF_000017185
+s__Clostridium_phage_phiZP2	1	PRJNA169232
+s__Bacillus_fordii	1	GCF_000374565
+s__Agrococcus_pavilionensis	1	GCF_000400485
+s__Pseudomonas_sp_HPB0071	1	GCF_000478505
+s__Facklamia_ignava	1	GCF_000301055
+s__Alistipes_indistinctus	1	GCF_000231275
+s__Staphylococcus_phage_StB20	1	PRJNA184156
+s__Staphylococcus_phage_StB27	1	PRJNA184157
+s__Liao_ning_virus	1	PRJNA16336
+s__Synechococcus_phage_S_PM2	1	PRJNA15223
+s__Bacillus_sonorensis	1	GCF_000342105
+s__Streptomyces_coelicolor	1	GCF_000203835
+s__Candidatus_Aquiluna_sp_IMCC13023	1	GCF_000257665
+s__Gremmeniella_abietina_RNA_virus_MS1	1	PRJNA14836
+s__Streptomyces_davawensis	1	GCF_000349325
+s__Streptococcus_equinus	2	GCF_000146405	GCF_000187265
+s__Exiguobacterium_sp_AT1b	1	GCF_000023045
+s__Leucas_zeylanica_yellow_vein_virus_satellite_DNA_beta	1	PRJNA41305
+s__Hyposoter_fugitivus_ichnovirus	1	PRJNA18779
+s__Hoeflea_sp_108	1	GCF_000372965
+s__Vallota_speciosa_virus	1	PRJNA167578
+s__Human_papillomavirus_126_like_viruses	1	PRJNA76727
+s__Salmonella_phage_RE_2010	1	PRJNA181070
+s__Lactobacillus_sakei	2	GCF_000026065	GCF_000478625
+s__Mycobacterium_phage_Hamulus	1	PRJNA215116
+s__Burkholderia_ambifaria	4	GCF_000181975	GCF_000019925	GCF_000182015	GCF_000203915
+s__Streptomyces_filamentosus	2	GCF_000156455	GCF_000156695
+s__Leptotrichia_wadei	2	GCF_000373345	GCF_000469405
+s__zeta_proteobacterium_SCGC_AB_602_C20	1	GCF_000379345
+s__Rhizobium_phage_RR1_B	1	PRJNA209212
+s__Leptospira_fainei	1	GCF_000306235
+s__Acanthocystis_turfacea_Chlorella_virus_1	1	PRJNA18527
+s__Nora_virus	1	PRJNA16656
+s__Wasabi_mottle_virus	1	PRJNA14733
+s__Papaya_leaf_curl_virus_betasatellite	1	PRJNA14448
+s__Botrytis_cinerea_mitovirus_1	1	PRJNA32247
+s__Razdan_virus	2	PRJNA225931	PRJNA226013
+s__Nectria_haematococca	1	GCA_000151355
+s__Verminephrobacter_eiseniae	1	GCF_000015565
+s__Desulfovibrio_gigas	1	GCF_000468495
+s__Paenibacillus_sp_HGF7	1	GCF_000214295
+s__Streptomyces_rimosus	1	GCF_000331185
+s__Coprothermobacter_platensis	1	GCF_000378005
+s__Sclerotinia_sclerotiorum	1	GCA_000146945
+s__Burkholderia_phage_BcepNazgul	1	PRJNA14305
+s__Candidatus_Nitrososphaera_gargensis	1	GCF_000303155
+s__Fischerella_sp_JSC_11	1	GCF_000231365
+s__Corynebacterium_efficiens	2	GCF_000011305	GCF_000160795
+s__Leptolyngbya_sp_PCC_7375	1	GCF_000316115
+s__Eubacterium_cellulosolvens	1	GCF_000183525
+s__Mycobacterium_phage_SiSi	1	PRJNA206026
+s__Leptolyngbya_sp_PCC_7376	1	GCF_000316605
+s__Oceanithermus_profundus	1	GCF_000183745
+s__Lactococcus_phage_r1t	1	PRJNA14225
+s__Chinese_wheat_mosaic_virus	1	PRJNA14694
+s__Mycobacterium_phage_Lockley	1	PRJNA30519
+s__Pseudoalteromonas_undina	1	GCF_000238275
+s__Persea_americana_endornavirus	1	PRJNA81035
+s__Pyrococcus_horikoshii	1	GCF_000011105
+s__Banana_streak_UI_virus	1	PRJNA66611
+s__Ruania_albidiflava	1	GCF_000421225
+s__Eclipta_yellow_vein_virus	1	PRJNA81215
+s__Blueberry_virus_A	1	PRJNA173920
+s__Eastern_equine_encephalitis_virus	1	PRJNA15429
+s__Nocardioidaceae_bacterium_Broad_1	1	GCF_000192415
+s__Clostridium_novyi	1	GCF_000014125
+s__Veillonella_sp_oral_taxon_780	1	GCF_000221605
+s__Thioalkalivibrio_sp_ARh3	1	GCF_000377265
+s__Grimontia_hollisae	1	GCF_000176515
+s__Thioalkalivibrio_sp_ARh5	1	GCF_000381805
+s__Thioalkalivibrio_sp_ARh4	1	GCF_000378265
+s__Meiothermus_timidus	1	GCF_000373205
+s__Niabella_aurantiaca	1	GCF_000374125
+s__Burkholderia_kururiensis	1	GCF_000341045
+s__Morogoro_virus	1	PRJNA39791
+s__Mycobacterium_phage_WIVsmall	1	PRJNA206482
+s__Montana_myotis_leukoencephalitis_virus	1	PRJNA15402
+s__Mycobacterium_phage_Trouble	1	PRJNA215119
+s__Collinsella_aerofaciens	1	GCF_000169035
+s__Vernonia_yellow_vein_Fujian_virus_betasatellite	1	PRJNA72143
+s__Phipapillomavirus_1	1	PRJNA16815
+s__Gloeobacter_kilaueensis	1	GCF_000484535
+s__Barley_yellow_mosaic_virus	1	PRJNA15362
+s__Corchorus_yellow_vein_mosaic_betasatellite	1	PRJNA192608
+s__Lactobacillus_prophage_Lj928	1	PRJNA14350
+s__Mycoplasma_bovigenitalium	1	GCF_000367805
+s__Streptomyces_sp_KhCrAH_244	1	GCF_000373505
+s__Thiorhodococcus_drewsii	1	GCF_000224065
+s__Streptomyces_ghanaensis	1	GCF_000156435
+s__Beet_black_scorch_virus_satellite_RNA	1	PRJNA14623
+s__Spodoptera_litura_nucleopolyhedrovirus	1	PRJNA14138
+s__Eubacterium_dolichum	1	GCF_000154285
+s__Burkholderia_ubonensis	1	GCF_000170335
+s__Eupatorium_vein_clearing_virus	1	PRJNA29879
+s__Roseobacter_litoralis	1	GCF_000154785
+s__Sphaerochaeta_pleomorpha	1	GCF_000236685
+s__Erwinia_phage_vB_EamP_L1	1	PRJNA181229
+s__Alloiococcus_otitis	1	GCF_000315445
+s__Minute_virus_of_mice	1	PRJNA14019
+s__Gremmeniella_abietina_RNA_virus_L1	1	PRJNA14824
+s__Bacillus_phage_BCP78	1	PRJNA177518
+s__Gremmeniella_abietina_RNA_virus_L2	1	PRJNA15230
+s__Kamiti_River_virus	1	PRJNA14896
+s__Dialister_succinatiphilus	1	GCF_000242435
+s__Hop_latent_virus	1	PRJNA15373
+s__Staphylococcus_pettenkoferi	1	GCF_000260275
+s__Poinsettia_mosaic_virus	1	PRJNA15366
+s__Corynebacterium_maris	1	GCF_000442645
+s__Thermodesulfatator_indicus	1	GCF_000217795
+s__Tomato_leaf_curl_Bangladesh_virus	1	PRJNA14245
+s__Propionibacterium_phage_P100D	1	PRJNA177534
+s__Tomato_leaf_curl_Hajipur_betasatellite	1	PRJNA175587
+s__Impatiens_necrotic_spot_virus	1	PRJNA14767
+s__Salinivibrio_costicola	1	GCF_000390145
+s__Enterobacteria_phage_mEp043_c_1	1	PRJNA183145
+s__Brochothrix_phage_BL3	1	PRJNA64549
+s__Enterococcus_sp_GMD2E	1	GCF_000296895
+s__Propionibacterium_avidum	3	GCF_000463645	GCF_000227295	GCF_000367205
+s__Haloferax_sp_ATCC_BAA_646	1	GCF_000336855
+s__Haloferax_sp_ATCC_BAA_645	1	GCF_000336835
+s__Haloferax_sp_ATCC_BAA_644	1	GCF_000336975
+s__Halosarcina_pallida	1	GCF_000337095
+s__Tobacco_curly_shoot_virus	1	PRJNA15257
+s__Tetrasphaera_elongata	1	GCF_000367525
+s__Banana_streak_UL_virus	1	PRJNA66613
+s__Broad_bean_necrosis_virus	1	PRJNA14870
+s__Capnocytophaga_sp_oral_taxon_380	1	GCF_000318255
+s__Nonlabens_dokdonensis	1	GCF_000332115
+s__Belliella_baltica	1	GCF_000265405
+s__Porcine_stool_associated_circular_virus_2	1	PRJNA202891
+s__Tomato_leaf_curl_Bangalore_virus_Ban5_satellite_DNA_beta	1	PRJNA28067
+s__Caldicellulosiruptor_owensensis	1	GCF_000166335
+s__Methanobrevibacter_smithii	23	GCF_000189975	GCF_000190115	GCF_000190035	GCF_000151245	GCF_000189915	GCF_000190135	GCF_000190095	GCF_000016525	GCF_000190075	GCF_000190015	GCF_000189955	GCF_000190175	GCF_000190055	GCF_000189935	GCF_000189875	GCF_000189795	GCF_000189815	GCF_000151225	GCF_000189855	GCF_000189995	GCF_000189895	GCF_000189835	GCF_000190155
+s__Leifsonia_aquatica	1	GCF_000469485
+s__Sphaeropsis_sapinea_RNA_virus_2	1	PRJNA14687
+s__Ancylobacter_sp_FA202	1	GCF_000380205
+s__Afipia_birgiae	1	GCF_000308295
+s__Brachybacterium_faecium	1	GCF_000023405
+s__Grapevine_leafroll_associated_virus_2	1	PRJNA15884
+s__Brucella_ovis	16	GCF_000366065	GCF_000370905	GCF_000365985	GCF_000365965	GCF_000367085	GCF_000371345	GCF_000366045	GCF_000413515	GCF_000365885	GCF_000365905	GCF_000365945	GCF_000365925	GCF_000366005	GCF_000370885	GCF_000016845	GCF_000367065
+s__Lactococcus_phage_bIL311	1	PRJNA14139
+s__Lactococcus_phage_bIL310	1	PRJNA14112
+s__Grapevine_leafroll_associated_virus_6	1	PRJNA77937
+s__Bacteroides_plebeius	1	GCF_000187895
+s__Grapevine_leafroll_associated_virus_4	1	PRJNA77935
+s__Grapevine_leafroll_associated_virus_5	1	PRJNA74429
+s__Meiothermus_ruber	2	GCF_000376665	GCF_000024425
+s__Thermus_phage_IN93	1	PRJNA14235
+s__Tiger_puffer_nervous_necrosis_virus	1	PRJNA41607
+s__Bradyrhizobium_sp_CCGE_LA001	1	GCF_000296215
+s__Eubacterium_infirmum	1	GCF_000242675
+s__Agrobacterium_sp_H13_3	1	GCF_000192635
+s__Vibrio_splendidus	13	GCF_000272105	GCF_000222625	GCF_000256485	GCF_000152765	GCF_000272345	GCF_000272245	GCF_000272225	GCF_000272125	GCF_000272285	GCF_000272265	GCF_000272305	GCF_000091465	GCF_000272325
+s__Eel_picornavirus_1	1	PRJNA219023
+s__Tomato_leaf_curl_virus_Pune_associated_DNA_beta	1	PRJNA18001
+s__Oryza_sativa_endornavirus	1	PRJNA16239
+s__Tomato_leaf_curl_Oman_virus	1	PRJNA52947
+s__Sphingomonas_sp_LH128	1	GCF_000293195
+s__Gallionella_sp_SCGC_AAA018_N21	1	GCF_000379385
+s__Acinetobacter_phage_Bphi_B1251	1	PRJNA181989
+s__Sclerophthora_macrospora_virus_B	1	PRJNA14912
+s__Sclerophthora_macrospora_virus_A	1	PRJNA14361
+s__Herbaspirillum_frisingense	1	GCF_000300975
+s__Nodularia_spumigena	2	GCF_000169135	GCF_000340565
+s__Cellulophaga_phage_phi12_2	1	PRJNA212943
+s__Fodinicurvata_sediminis	1	GCF_000420625
+s__Dickeya_sp_D_s0432_1	1	GCF_000474655
+s__Sclerotinia_sclerotiorum_hypovirus_1	1	PRJNA72389
+s__Sodalis_phage_phiSG1	1	PRJNA16583
+s__Tomato_chocolate_spot_virus	1	PRJNA39867
+s__Methylobacterium_radiotolerans	1	GCF_000019725
+s__Propionibacterium_sp_KPL1852	1	GCF_000477695
+s__Propionibacterium_sp_KPL1854	1	GCF_000477815
+s__Pelargonium_line_pattern_virus	1	PRJNA15413
+s__Citrobacter_sp_A1	1	GCF_000277565
+s__Melissococcus_plutonius	1	GCF_000270185
+s__Postia_placenta	1	GCA_000006255
+s__Sordaria_macrospora	1	GCA_000182805
+s__Vibrio_sp_624788	1	GCF_000316985
+s__Euphorbia_leaf_curl_virus	1	PRJNA14341
+s__Halomonas_smyrnensis	1	GCF_000265245
+s__actinobacterium_SCGC_AAA028_A23	1	GCF_000378905
+s__Thermus_islandicus	1	GCF_000421625
+s__Pineapple_bacilliform_comosus_virus	1	PRJNA60049
+s__Mycobacterium_phage_LeBron	1	PRJNA51673
+s__Clostridium_nexile	1	GCF_000156035
+s__Pseudomonas_phage_phi297	1	PRJNA82641
+s__Sida_yellow_net_virus	1	PRJNA189215
+s__Methanothermobacter_phage_psiM100	1	PRJNA14289
+s__Bacteroides_helcogenes	1	GCF_000186225
+s__Bacillus_phage_PBC1	1	PRJNA167662
+s__Anaerostipes_caccae	1	GCF_000154305
+s__Thermococcus_barophilus	1	GCF_000151105
+s__Rhizobium_phaseoli	1	GCF_000268285
+s__Cyprinid_herpesvirus_2	1	PRJNA181228
+s__Cyprinid_herpesvirus_3	1	PRJNA19059
+s__Himetobi_P_virus	1	PRJNA14801
+s__Cyprinid_herpesvirus_1	1	PRJNA181227
+s__Granulicella_tundricola	1	GCF_000178975
+s__Bacillus_halodurans	1	GCF_000011145
+s__Pseudomonas_phage_Phi_S1	1	PRJNA197298
+s__Treponema_vincentii	2	GCF_000412995	GCF_000175895
+s__Nitrospina_gracilis	1	GCF_000341545
+s__Ourmia_melon_virus	1	PRJNA30737
+s__Psittacid_herpesvirus_1	1	PRJNA14314
+s__Prevotella_oris	3	GCF_000142965	GCF_000162915	GCF_000377685
+s__Melon_chlorotic_mosaic_virus_associated_alphasatellite	1	PRJNA51413
+s__Ustilaginoidea_virens_RNA_virus_1	1	PRJNA196971
+s__Cyclovirus_PKgoat11_PAK_2009	1	PRJNA61949
+s__Xanthomonas_axonopodis	70	GCF_000266345	GCF_000266045	GCF_000266165	GCF_000266545	GCF_000265905	GCF_000266405	GCF_000265605	GCF_000266425	GCF_000266305	GCF_000265705	GCF_000259445	GCF_000309925	GCF_000266725	GCF_000265665	GCF_000285775	GCF_000266245	GCF_000265865	GCF_000266665	GCF_000265765	GCF_000265985	GCF_000266505	GCF_000265745	GCF_000266145	GCF_000266085	GCF_000266385	GCF_000266025	GCF_000309905	GCF_000266325	GCF_000265845	GCF_000265565	GCF_000265945	GCF_000266445	GCF_000265625	G [...]
+s__Cupriavidus_taiwanensis	2	GCF_000372525	GCF_000069785
+s__Sclerotinia_sclerotiorum_endornavirus_1	1	PRJNA210796
+s__Canine_adenovirus_A	1	PRJNA14516
+s__Rinderpest_virus	1	PRJNA15050
+s__Pseudomonas_fuscovaginae	3	GCF_000251185	GCF_000280575	GCF_000364705
+s__Arabis_mosaic_virus_large_satellite_RNA	1	PRJNA14752
+s__Lymphocystis_disease_virus_isolate_China	1	PRJNA14472
+s__Haemophilus_parasuis	13	GCF_000444625	GCF_000444605	GCF_000444585	GCF_000021885	GCF_000172375	GCF_000478405	GCF_000444685	GCF_000444705	GCF_000444545	GCF_000444645	GCF_000444665	GCF_000439395	GCF_000444565
+s__Vibrio_phage_pYD21_A	1	PRJNA195477
+s__Infectious_spleen_and_kidney_necrosis_virus	1	PRJNA14600
+s__Niastella_koreensis	1	GCF_000246855
+s__Arcobacter_butzleri	4	GCF_000014025	GCF_000215345	GCF_000284355	GCF_000185325
+s__Synechococcus_sp_WH_7805	1	GCF_000153285
+s__Rhodobacter_capsulatus	7	GCF_000506565	GCF_000021865	GCF_000506425	GCF_000505785	GCF_000506545	GCF_000506525	GCF_000506965
+s__Synechococcus_sp_WH_7803	1	GCF_000063505
+s__Borrelia_burgdorferi	16	GCF_000172315	GCF_000166635	GCF_000181575	GCF_000008685	GCF_000181855	GCF_000382565	GCF_000171755	GCF_000021405	GCF_000171735	GCF_000181555	GCF_000181715	GCF_000172255	GCF_000444465	GCF_000172335	GCF_000172295	GCF_000166655
+s__Pseudoalteromonas_tunicata	1	GCF_000153245
+s__Solitalea_canadensis	1	GCF_000242635
+s__Methylobacterium_sp_4_46	1	GCF_000019365
+s__Mouse_parvovirus_3	1	PRJNA17123
+s__Mouse_parvovirus_1	1	PRJNA14325
+s__Mouse_parvovirus_4	1	PRJNA33009
+s__Mycoplasma_wenyonii	1	GCF_000277795
+s__Desulfobacca_acetoxidans	1	GCF_000195295
+s__Agropyron_mosaic_virus	1	PRJNA15063
+s__Spodoptera_frugiperda_ascovirus_1a	1	PRJNA17721
+s__Potato_yellow_dwarf_virus	1	PRJNA74995
+s__Bacillus_phage_Fah	1	PRJNA16382
+s__Mycobacterium_sp_MCS	1	GCF_000014165
+s__Glaciecola_sp_4H_3_7_YE_5	1	GCF_000212335
+s__Canine_papillomavirus_14	1	PRJNA183910
+s__Corynebacterium_sp_KPL1859	1	GCF_000478015
+s__Grapevine_vein_clearing_virus	1	PRJNA70007
+s__Sulfurihydrogenibium_sp_YO3AOP1	1	GCF_000020325
+s__Corynebacterium_sp_KPL1855	1	GCF_000478075
+s__Corynebacterium_sp_KPL1856	1	GCF_000478055
+s__Corynebacterium_sp_KPL1857	1	GCF_000478035
+s__Aggregatibacter_segnis	1	GCF_000185305
+s__Streptomyces_sp_CNT302	1	GCF_000377525
+s__Megamonas_hypermegale	1	GCF_000209975
+s__Brucella_sp_56_94	1	GCF_000370925
+s__Clerodendrum_golden_mosaic_virus	1	PRJNA29849
+s__Candidatus_Chloracidobacterium_thermophilum	1	GCF_000226295
+s__Culex_originated_Tymoviridae_like_virus	1	PRJNA176434
+s__Pseudomonas_phage_phi_2	1	PRJNA42717
+s__Tomato_yellow_leaf_curl_Vietnam_virus	1	PRJNA19785
+s__Cryptosporidium_muris	1	GCA_000006515
+s__Alicyclobacillus_acidocaldarius	3	GCF_000024285	GCF_000173835	GCF_000219875
+s__Paramecium_bursaria_Chlorella_virus_1	1	PRJNA14564
+s__Gemella_sanguinis	1	GCF_000204335
+s__Equine_foamy_virus	1	PRJNA14738
+s__East_African_cassava_mosaic_Cameroon_virus	1	PRJNA15180
+s__Mosquito_flavivirus	1	PRJNA198479
+s__Banana_streak_Mysore_virus	1	PRJNA15234
+s__Pseudomonas_syringae	62	GCF_000416805	GCF_000416865	GCF_000282735	GCF_000344515	GCF_000012245	GCF_000344435	GCF_000245435	GCF_000416665	GCF_000233835	GCF_000416585	GCF_000344375	GCF_000416945	GCF_000344475	GCF_000145925	GCF_000177515	GCF_000416705	GCF_000344555	GCF_000245395	GCF_000416485	GCF_000344455	GCF_000416545	GCF_000412165	GCF_000333995	GCF_000344495	GCF_000416885	GCF_000344335	GCF_000416845	GCF_000416685	GCF_000416905	GCF_000233795	GCF_000331385	GCF_000344395	GCF_000416645	GCF [...]
+s__Cyanophage_P_SSP2	1	PRJNA81179
+s__Bacteroides_nordii	1	GCF_000273175
+s__Bacillus_atrophaeus	3	GCF_000385965	GCF_000264395	GCF_000165925
+s__Enterobacteria_phage_vB_EcoS_ACG_M12	1	PRJNA179414
+s__Enterobacterial_phage_mEp213	1	PRJNA183152
+s__Salinarchaeum_sp_Harcht_Bsk1	1	GCF_000403645
+s__Ponticaulis_koreensis	1	GCF_000420665
+s__Methylobacterium_sp_WSM2598	1	GCF_000379105
+s__Halorubrum_lacusprofundi	1	GCF_000022205
+s__Tobacco_rattle_virus	1	PRJNA14808
+s__Anguillid_rhabdovirus	1	PRJNA224248
+s__Arthrobacter_sp_TB_23	1	GCF_000294595
+s__Flexistipes_sinusarabici	1	GCF_000218625
+s__Bovine_parvovirus	1	PRJNA14020
+s__Frateuria_aurantia	1	GCF_000242255
+s__Rickettsia_africae	1	GCF_000023005
+s__Staphylococcus_sp_EGD_HP3	1	GCF_000463545
+s__Goose_hemorrhagic_polyomavirus	1	PRJNA14286
+s__Hoeflea_phototrophica	1	GCF_000154705
+s__Tibrogargan_virus	1	PRJNA194142
+s__Rubritalea_marina	1	GCF_000378105
+s__Streptococcus_tigurinus	4	GCF_000442155	GCF_000344275	GCF_000442175	GCF_000344255
+s__O_nyong_nyong_virus	1	PRJNA15311
+s__Shewanella_denitrificans	1	GCF_000013765
+s__Tomato_spotted_wilt_virus	1	PRJNA14997
+s__Phthorimaea_operculella_granulovirus	1	PRJNA14202
+s__Ateles_paniscus_polyomavirus_1	1	PRJNA183902
+s__Pseudomonas_phage_73	1	PRJNA16384
+s__Pepper_leaf_curl_Bangladesh_virus	1	PRJNA14218
+s__Rosellinia_necatrix_quadrivirus_1	1	PRJNA82351
+s__Ictalurid_herpesvirus_1	1	PRJNA14018
+s__Mycobacterium_colombiense	1	GCF_000222105
+s__Aeromonas_phage_25	1	PRJNA17105
+s__Pseudomonas_mandelii	2	GCF_000257545	GCF_000381285
+s__Rubella_virus	1	PRJNA15315
+s__Arcobacter_sp_L	1	GCF_000284235
+s__Synechococcus_phage_Syn19	1	PRJNA64709
+s__Oceanicola_granulosus	1	GCF_000153305
+s__Homalodisca_vitripennis_reovirus	1	PRJNA36621
+s__Thermococcus_prieurii_virus_1	1	PRJNA84407
+s__Pepper_huasteco_yellow_vein_virus	1	PRJNA14059
+s__Pseudanabaena_sp_PCC_7367	1	GCF_000317065
+s__Sphingomonas_sp_S17	1	GCF_000211795
+s__Paracoccus_sp_TRP	1	GCF_000185925
+s__Arthrobacter_aurescens	1	GCF_000014925
+s__Tomato_yellow_leaf_curl_virus	1	PRJNA15182
+s__Rickettsia_massiliae	2	GCF_000016625	GCF_000283855
+s__Synechococcus_phage_metaG_MbCM1	1	PRJNA181073
+s__Roseobacter_phage_SIO1	1	PRJNA14308
+s__Thiothrix_nivea	1	GCF_000260135
+s__Mycobacterium_phage_Porky	1	PRJNA30699
+s__Enterobacter_cancerogenus	1	GCF_000155995
+s__Streptomyces_sp_HmicA12	1	GCF_000373565
+s__Papaya_leaf_curl_Guandong_virus	1	PRJNA14537
+s__Trichomonas_vaginalis_virus_2	1	PRJNA14822
+s__Trichomonas_vaginalis_virus_3	1	PRJNA14837
+s__Clostridium_sp_ATCC_29733	1	GCF_000466605
+s__Mycobacterium_rhodesiae	2	GCF_000230895	GCF_000230935
+s__Lactobacillus_phage_phiadh	1	PRJNA14588
+s__Pseudomonas_sp_R62	1	GCF_000257605
+s__Bat_coronavirus_BM48_31_BGR_2008	1	PRJNA51751
+s__Clavibacter_phage_CMP1	1	PRJNA42947
+s__Methanosaeta_harundinacea	1	GCF_000235565
+s__Escherichia_phage_phiV10	1	PRJNA16381
+s__Desulfovibrio_oxyclinae	1	GCF_000375485
+s__Pseudomonas_sp_GM78	1	GCF_000282475
+s__Morganella_morganii	1	GCF_000286435
+s__Grapevine_leafroll_associated_virus_1	1	PRJNA80677
+s__Rhodopirellula_sp_SWK7	1	GCF_000346425
+s__Pectobacterium_phage_PP1	1	PRJNA181988
+s__Psychromonas_ossibalaenae	1	GCF_000381745
+s__Acidianus_two_tailed_virus	1	PRJNA15686
+s__Bandicoot_papillomatosis_carcinomatosis_virus_type_1	1	PRJNA27985
+s__Bandicoot_papillomatosis_carcinomatosis_virus_type_2	1	PRJNA30081
+s__Chloroflexus_aurantiacus	1	GCF_000018865
+s__Agromyces_subbeticus	1	GCF_000421565
+s__Roseomonas_cervicalis	1	GCF_000164635
+s__Acinetobacter_sp_CIP_102529	1	GCF_000368325
+s__Enterobacteria_phage_G4_sensu_lato	1	PRJNA14318
+s__Bacteroidales_bacterium_ph8	1	GCF_000311925
+s__Actinopolymorpha_alba	1	GCF_000373925
+s__Vibrio_ordalii	6	GCF_000287095	GCF_000287075	GCF_000287115	GCF_000287135	GCF_000287155	GCF_000257205
+s__actinobacterium_SCGC_AAA278_O22	1	GCF_000372185
+s__Kingella_oralis	1	GCF_000160435
+s__Acidianus_filamentous_virus_9	1	PRJNA29195
+s__Acidianus_filamentous_virus_8	1	PRJNA28079
+s__Acidianus_filamentous_virus_7	1	PRJNA28077
+s__Acidianus_filamentous_virus_6	1	PRJNA28075
+s__Atkinsonella_hypoxylon_virus	1	PRJNA14164
+s__Acidianus_filamentous_virus_3	1	PRJNA28073
+s__Acidianus_filamentous_virus_2	1	PRJNA20965
+s__Luffa_yellow_mosaic_virus	1	PRJNA14290
+s__Perina_nuda_virus	1	PRJNA14717
+s__Citrus_tristeza_virus	1	PRJNA15334
+s__Fischerella_sp_PCC_9339	1	GCF_000315585
+s__Shuni_virus	1	PRJNA173357
+s__Rickettsia_conorii	4	GCF_000007025	GCF_000263815	GCF_000257435	GCF_000261325
+s__Cotesia_congregata_bracovirus	1	PRJNA14556
+s__Sphaerochaeta_globosa	1	GCF_000190435
+s__Pseudomonas_aeruginosa	137	GCF_000481865	GCF_000481845	GCF_000481825	GCF_000480965	GCF_000467675	GCF_000258285	GCF_000481785	GCF_000482005	GCF_000481205	GCF_000480765	GCF_000480865	GCF_000481425	GCF_000481325	GCF_000480395	GCF_000480685	GCF_000480845	GCF_000481305	GCF_000481385	GCF_000481025	GCF_000359505	GCF_000265035	GCF_000480475	GCF_000290555	GCF_000480515	GCF_000259025	GCF_000481045	GCF_000481885	GCF_000481945	GCF_000481145	GCF_000481905	GCF_000480745	GCF_000481405	GCF_000226155	 [...]
+s__Nitrospina_sp_AB_629_B06	1	GCF_000375745
+s__Atopobium_rimae	1	GCF_000174015
+s__Burkholderia_thailandensis	8	GCF_000170315	GCF_000012365	GCF_000170395	GCF_000179515	GCF_000170495	GCF_000266985	GCF_000385525	GCF_000152285
+s__Delftia_acidovorans	3	GCF_000411215	GCF_000018665	GCF_000411195
+s__Shigella_phage_Shfl1	1	PRJNA66345
+s__Mycobacterium_sp_141	1	GCF_000382405
+s__Xanthomonas_phage_OP2	1	PRJNA16300
+s__Xanthomonas_phage_OP1	1	PRJNA16299
+s__Nudaurelia_capensis_beta_virus	1	PRJNA14982
+s__Enterobacteria_phage_HK106	1	PRJNA183158
+s__Bacteroides_sp_9_1_42FAA	1	GCF_000157075
+s__Giardia_intestinalis	1	GCA_000002435
+s__Natronococcus_occultus	1	GCF_000328685
+s__Azoarcus_sp_KH32C	1	GCF_000349945
+s__Curtobacterium_ginsengisoli	1	GCF_000419445
+s__Tomato_chlorotic_leaf_distortion_virus	1	PRJNA72721
+s__Bdellovibrio_bacteriovorus	2	GCF_000196175	GCF_000317895
+s__Kytococcus_sedentarius	1	GCF_000023925
+s__Okra_enation_leaf_curl_betasatellite	1	PRJNA61781
+s__Thermoanaerobacter_indiensis	1	GCF_000373165
+s__Marichromatium_purpuratum	1	GCF_000224005
+s__Sweet_potato_latent_virus	1	PRJNA196180
+s__Thauera_phenylacetica	1	GCF_000310225
+s__Desulfomonile_tiedjei	1	GCF_000266945
+s__Bacillus_vallismortis	1	GCF_000245315
+s__Lachnospiraceae_bacterium_1_4_56FAA	1	GCF_000218385
+s__Pseudomonas_phage_LMA2	1	PRJNA31055
+s__Elm_mottle_virus	1	PRJNA14760
+s__Cuban_alphasatellite_1	1	PRJNA210798
+s__Rudbeckia_flower_distortion_virus	1	PRJNA33679
+s__Human_polyomavirus_9	1	PRJNA63123
+s__Yersinia_phage_phiR201	1	PRJNA184144
+s__Hollyhock_yellow_vein_mosaic_virus	1	PRJNA81151
+s__Polaromonas_sp_JS666	1	GCF_000013865
+s__Equine_pegivirus_1	1	PRJNA196421
+s__Enterovibrio_calviensis	3	GCF_000286915	GCF_000286875	GCF_000286895
+s__Actinobacillus_minor	2	GCF_000175195	GCF_000174155
+s__Brochothrix_phage_A9	1	PRJNA64547
+s__Bathycoccus_sp_RCC1105_virus_BpV	1	PRJNA61009
+s__Shewanella_putrefaciens	2	GCF_000169215	GCF_000016585
+s__Blackberry_yellow_vein_associated_virus	1	PRJNA15168
+s__Burkholderia_sp_YI23	1	GCF_000236065
+s__Honeysuckle_yellow_vein_mosaic_disease_associated_satellite_DNA_beta	1	PRJNA19863
+s__Gardnerella_vaginalis	36	GCF_000414485	GCF_000414505	GCF_000263475	GCF_000178355	GCF_000263555	GCF_000414425	GCF_000414445	GCF_000414465	GCF_000176495	GCF_000263655	GCF_000414645	GCF_000263535	GCF_000165635	GCF_000414585	GCF_000214315	GCF_000414625	GCF_000414605	GCF_000414565	GCF_000414665	GCF_000263615	GCF_000159155	GCF_000176475	GCF_000263595	GCF_000414685	GCF_000263515	GCF_000263575	GCF_000414705	GCF_000213955	GCF_000165615	GCF_000414545	GCF_000414525	GCF_000263495	GCF_000025205	GC [...]
+s__Thermotoga_naphthophila	1	GCF_000025105
+s__Bhendi_yellow_vein_betasatellite	1	PRJNA14445
+s__Pedobacter_agri	1	GCF_000258495
+s__Hymenobacter_aerophilus	1	GCF_000382225
+s__Mycobacterium_phage_Sarfire	1	PRJNA219123
+s__Mal_de_Rio_Cuarto_virus	1	PRJNA18539
+s__Oat_blue_dwarf_virus	1	PRJNA15341
+s__Roseobacter_sp_GAI101	1	GCF_000156335
+s__Imperata_yellow_mottle_virus	1	PRJNA32677
+s__Lactobacillus_helveticus	6	GCF_000422165	GCF_000160855	GCF_000165775	GCF_000189515	GCF_000015385	GCF_000195355
+s__Ferroglobus_placidus	1	GCF_000025505
+s__Pseudomonas_phage_phiCTX	1	PRJNA14415
+s__Oceanobacillus_iheyensis	1	GCF_000011245
+s__Clostridium_butyricum	4	GCF_000355785	GCF_000182605	GCF_000171115	GCF_000371625
+s__Rickettsia_montanensis	1	GCF_000284175
+s__Idiomarina_sp_A28L	1	GCF_000218785
+s__Adoxophyes_honmai_entomopoxvirus_L	1	PRJNA203665
+s__Dialister_invisus	1	GCF_000160055
+s__Azospirillum_lipoferum	2	GCF_000010725	GCF_000283655
+s__Gillisia_limnaea	1	GCF_000243235
+s__Redspotted_grouper_nervous_necrosis_virus	1	PRJNA16819
+s__Bartonella_henselae	1	GCF_000046705
+s__Desulfovibrio_longus	1	GCF_000420485
+s__Burkholderia_phage_phiE125	1	PRJNA14330
+s__Verrucosispora_maris	1	GCF_000204155
+s__Pseudomonas_coronafaciens	1	GCF_000156995
+s__Cowpea_chlorotic_mottle_virus	1	PRJNA14758
+s__Staphylococcus_phage_P68	1	PRJNA14269
+s__Lactococcus_phage_1706	1	PRJNA29283
+s__Simian_hemorrhagic_fever_virus	1	PRJNA14727
+s__Leptospira_wolffii	1	GCF_000306115
+s__Citromicrobium_sp_JLT1363	1	GCF_000186705
+s__Aquifex_aeolicus	1	GCF_000008625
+s__Bradyrhizobium_elkanii	2	GCF_000379145	GCF_000257685
+s__Enterobacteria_phage_SSL_2009a	1	PRJNA34919
+s__Desulfurococcus_fermentans	1	GCF_000231015
+s__Spirosoma_panaciterrae	1	GCF_000374025
+s__Mycobacterium_phage_Bxz2	1	PRJNA14275
+s__Mycobacterium_phage_Bxz1	1	PRJNA14309
+s__Xanthomonas_vesicatoria	1	GCF_000192025
+s__Bacillus_phage_Curly	1	PRJNA192873
+s__Brachybacterium_muris	1	GCF_000338055
+s__Pseudomonas_protegens	2	GCF_000012265	GCF_000397205
+s__Porcine_epidemic_diarrhea_virus	1	PRJNA14739
+s__Dragonfly_associated_mastrevirus	1	PRJNA181243
+s__Roseovarius_nubinhibens	1	GCF_000152625
+s__Cercopithecine_herpesvirus_9	1	PRJNA14596
+s__Cercopithecine_herpesvirus_5	1	PRJNA38429
+s__Allochromatium_vinosum	1	GCF_000025485
+s__Cercopithecine_herpesvirus_2	1	PRJNA14558
+s__Enterobacteria_phage_285P	1	PRJNA64539
+s__Flavobacterium_limnosediminis	1	GCF_000498535
+s__Torque_teno_sus_virus_1a	1	PRJNA48139
+s__Thermofilum_pendens	1	GCF_000015225
+s__Betacoronavirus_1	1	PRJNA15438
+s__Mycobacterium_phage_Corndog	1	PRJNA14272
+s__Pseudomonas_caeni	1	GCF_000421765
+s__Leptotrichia_shahii	1	GCF_000373045
+s__Marinobacter_algicola	1	GCF_000170835
+s__Mycobacterium_phage_Phelemich	1	PRJNA215112
+s__Methylovorus_glucosotrophus	1	GCF_000023745
+s__Sida_mosaic_Alagoas_virus	1	PRJNA81007
+s__Enterococcus_saccharolyticus	3	GCF_000234175	GCF_000407285	GCF_000407005
+s__Clostridium_clostridioforme	9	GCF_000371545	GCF_000371405	GCF_000371565	GCF_000371605	GCF_000371585	GCF_000371505	GCF_000371485	GCF_000371525	GCF_000234155
+s__Lactobacillus_reuteri	12	GCF_000159455	GCF_000179455	GCF_000159615	GCF_000179435	GCF_000236455	GCF_000168255	GCF_000010005	GCF_000016825	GCF_000160715	GCF_000159475	GCF_000410995	GCF_000439275
+s__Tomato_yellow_leaf_curl_Thailand_virus_associated_DNA_1	1	PRJNA14300
+s__Vibrio_sp_MED222	1	GCF_000153005
+s__Nocardiopsis_dassonvillei	1	GCF_000092985
+s__Spiroplasma_melliferum	2	GCF_000328865	GCF_000236085
+s__Agrotis_ipsilon_multiple_nucleopolyhedrovirus	1	PRJNA32171
+s__Stomatobaculum_longum	1	GCF_000242235
+s__Enterobacteria_phage_N15	1	PRJNA14086
+s__Adeno_associated_virus_7	1	PRJNA14454
+s__Adeno_associated_virus_5	1	PRJNA14426
+s__Adeno_associated_virus_4	1	PRJNA14030
+s__Acinetobacter_haemolyticus	6	GCF_000301715	GCF_000369085	GCF_000369065	GCF_000164055	GCF_000309035	GCF_000302315
+s__Adeno_associated_virus_2	1	PRJNA14060
+s__Adeno_associated_virus_1	1	PRJNA15323
+s__Adeno_associated_virus_8	1	PRJNA14455
+s__Halorubrum_kocurii	1	GCF_000337355
+s__Enterococcus_italicus	1	GCF_000185365
+s__Sorghum_mosaic_virus	1	PRJNA15098
+s__Lactococcus_phage_Tuc2009	1	PRJNA14131
+s__Natrialba_chahannaoensis	1	GCF_000337135
+s__Chelativorans_sp_BNC1	1	GCF_000014245
+s__Varibaculum_cambriense	1	GCF_000420065
+s__Dictyostelium_purpureum	1	GCA_000190715
+s__Enterobacteria_phage_mEp237	1	PRJNA183147
+s__Enterobacteria_phage_mEp235	1	PRJNA183146
+s__Corynebacterium_mastitidis	1	GCF_000375365
+s__Piper_yellow_mottle_virus	1	PRJNA219627
+s__Pantoea_ananatis	7	GCF_000285975	GCF_000285475	GCF_000270125	GCF_000233595	GCF_000025405	GCF_000475035	GCF_000283875
+s__Crenarchaeota_archaeon_SCGC_AAA471_B05	1	GCF_000380705
+s__Cronobacter_phage_ENT39118	1	PRJNA184168
+s__Rhodobacterales_bacterium_HTCC2255	1	GCF_000153745
+s__Lymphocytic_choriomeningitis_virus	1	PRJNA14862
+s__Mucilaginibacter_paludis	1	GCF_000166195
+s__Prochlorococcus_phage_MED4_213	1	PRJNA195505
+s__Hyphantria_cunea_nucleopolyhedrovirus	1	PRJNA16343
+s__Bacillus_anthracis	24	GCF_000167295	GCF_000181915	GCF_000167235	GCF_000295695	GCF_000219895	GCF_000181995	GCF_000167335	GCF_000292565	GCF_000021445	GCF_000258885	GCF_000008445	GCF_000022865	GCF_000181675	GCF_000278385	GCF_000182055	GCF_000181935	GCF_000167255	GCF_000008165	GCF_000167315	GCF_000181835	GCF_000167275	GCF_000319695	GCF_000007845	GCF_000319715
+s__Wheat_eqlid_mosaic_virus	1	PRJNA20763
+s__Burkholderia_mallei	10	GCF_000011705	GCF_000167635	GCF_000015605	GCF_000015625	GCF_000153085	GCF_000152305	GCF_000015465	GCF_000169875	GCF_000152385	GCF_000152405
+s__Diaporthe_ambigua_RNA_virus_1	1	PRJNA14962
+s__Tomato_begomovirus_satellite_DNA_beta	1	PRJNA14449
+s__Antheraea_pernyi_nucleopolyhedrovirus	1	PRJNA16793
+s__Macroptilium_yellow_net_virus	1	PRJNA124063
+s__Verrucomicrobia_bacterium_SCGC_AB_629_E09	1	GCF_000371985
+s__Nakamurella_multipartita	1	GCF_000024365
+s__Cycloclasticus_zancles	1	GCF_000442595
+s__Bacteroides_cellulosilyticus	2	GCF_000158035	GCF_000273015
+s__Equus_ferus_caballus_papillomavirus_type_4	1	PRJNA185426
+s__Equus_ferus_caballus_papillomavirus_type_5	1	PRJNA185427
+s__Pseudomonas_alcaliphila	1	GCF_000319815
+s__Equus_ferus_caballus_papillomavirus_type_7	1	PRJNA193979
+s__Frankia_sp_CN3	1	GCF_000235425
+s__Fritillary_virus_Y	1	PRJNA30175
+s__Duck_astrovirus_GII_A	1	PRJNA36399
+s__Enterobacteria_phage_NJ01	1	PRJNA177541
+s__Thioalkalivibrio_thiocyanodenitrificans	1	GCF_000378965
+s__Marinomonas_sp_MWYL1	1	GCF_000017285
+s__Polymorphum_gilvum	1	GCF_000192745
+s__Nocardioides_sp_Iso805N	1	GCF_000364605
+s__Ligustrum_necrotic_ringspot_virus	1	PRJNA28681
+s__Sphingobium_ummariense	1	GCF_000447205
+s__Brevibacillus_sp_BC25	1	GCF_000282075
+s__Thermus_phage_P74_26	1	PRJNA20767
+s__Steller_sea_lion_vesivirus	1	PRJNA30663
+s__Lolium_latent_virus	1	PRJNA28971
+s__Wheat_dwarf_India_virus	1	PRJNA162491
+s__Epinotia_aporema_granulovirus	1	PRJNA177904
+s__Cyanophage_KBS_P_1A	1	PRJNA195501
+s__Southern_tomato_virus	1	PRJNA32821
+s__Brucella_sp_UK1_97	1	GCF_000371045
+s__Mimosa_yellow_leaf_curl_virus_associated_DNA_1	1	PRJNA19817
+s__Banana_streak_OL_virus	1	PRJNA15239
+s__Bovine_herpesvirus_5	1	PRJNA14313
+s__Bovine_herpesvirus_4	1	PRJNA14110
+s__Wolbachia_endosymbiont_of_Diaphorina_citri	1	GCF_000331595
+s__Enterobacteria_phage_K1E	1	PRJNA16228
+s__Ralstonia_phage_p12J	1	PRJNA14307
+s__Enterococcus_phoeniculicola	2	GCF_000407505	GCF_000394035
+s__Pseudomonas_sp_CF161	1	GCF_000416215
+s__Human_parainfluenza_virus_1	1	PRJNA14743
+s__Human_parainfluenza_virus_2	1	PRJNA15421
+s__Human_parainfluenza_virus_3	1	PRJNA14706
+s__Calothrix_parietina	1	GCF_000317435
+s__Xanthomonas_alfalfae	1	GCF_000225915
+s__Pseudomonas_phage_tf	1	PRJNA167604
+s__Ochrobactrum_sp_CDB2	1	GCF_000344725
+s__Mopeia_Lassa_virus_reassortant_29	1	PRJNA15036
+s__Tomato_yellow_dwarf_disease_associated_satellite_DNA_beta_Kochi	1	PRJNA20983
+s__Melon_aphid_borne_yellows_virus	1	PRJNA30049
+s__Rhodococcus_qingshengii	1	GCF_000341815
+s__Turnip_mosaic_virus	1	PRJNA15408
+s__American_bat_vesiculovirus_TFFN_2013	1	PRJNA226731
+s__Lactobacillus_phage_JCL1032	1	PRJNA181076
+s__Synechococcus_sp_BL107	1	GCF_000153805
+s__Luffa_puckering_and_leaf_distortion_associated_DNA_beta	1	PRJNA15779
+s__Burkholderia_phage_BcepGomr	1	PRJNA19579
+s__Rickettsia_endosymbiont_of_Bemisia_tabaci	1	GCF_000265225
+s__Actinosynnema_mirum	1	GCF_000023245
+s__Halocynthia_phage_JM_2012	1	PRJNA167664
+s__Yellowtail_ascites_virus	1	PRJNA14852
+s__Streptomyces_pristinaespiralis	1	GCF_000154945
+s__Photobacterium_damselae	1	GCF_000176795
+s__Mesotoga_prima	1	GCF_000147715
+s__Vibrio_phage_pYD38_B	1	PRJNA209211
+s__Vibrio_phage_pYD38_A	1	PRJNA209063
+s__Synechococcus_sp_WH_5701	1	GCF_000153045
+s__Dyella_ginsengisoli	1	GCF_000334915
+s__Thalassiobium_sp_R2A62	1	GCF_000161835
+s__Diplodia_scrobiculata_RNA_virus_1	1	PRJNA43007
+s__Erwinia_pyrifoliae	2	GCF_000026985	GCF_000027265
+s__Megasphaera_sp_UPII_199_6	1	GCF_000214495
+s__Moroccan_pepper_virus	1	PRJNA185273
+s__Cucumber_mosaic_virus_satellite_RNA	1	PRJNA14568
+s__Tobacco_vein_clearing_virus	1	PRJNA14150
+s__Janthinobacterium_sp_Marseille	1	GCF_000013625
+s__Methanosphaera_stadtmanae	1	GCF_000012545
+s__Vibrio_phage_N4	1	PRJNA42785
+s__Serratia_sp_M24T3	1	GCF_000257645
+s__Mouse_kobuvirus_M_5_USA_2010	1	PRJNA72383
+s__Escherichia_sp_4_1_40B	1	GCF_000158415
+s__Sugarcane_striate_mosaic_associated_virus	1	PRJNA14819
+s__Colwellia_phage_9A	1	PRJNA169428
+s__Stackebrandtia_nassauensis	1	GCF_000024545
+s__Calothrix_sp_PCC_7507	1	GCF_000316575
+s__Lactobacillus_prophage_Lj965	1	PRJNA14351
+s__Bell_pepper_endornavirus	1	PRJNA70001
+s__Neisseria_lactamica	3	GCF_000193795	GCF_000196295	GCF_000173995
+s__Bibersteinia_trehalosi	1	GCF_000347595
+s__Burkholderia_glumae	4	GCF_000300395	GCF_000365245	GCF_000300755	GCF_000022645
+s__Hardenbergia_virus_A	1	PRJNA65813
+s__Moritella_sp_PE36	1	GCF_000170855
+s__Honeysuckle_yellow_vein_Japan_betasatellite	1	PRJNA19603
+s__Lactobacillus_mali	2	GCF_000260415	GCF_000276905
+s__Peptoniphilus_indolicus	1	GCF_000227315
+s__Propionibacterium_sp_409_HC1	1	GCF_000214515
+s__Prevotella_loescheii	1	GCF_000378085
+s__Thauera_sp_27	1	GCF_000310125
+s__Propionibacterium_freudenreichii	1	GCF_000091725
+s__Drosophila_x_virus	1	PRJNA14853
+s__Encephalomyocarditis_virus	1	PRJNA15307
+s__Mulberry_small_circular_viroid_like_RNA_1	1	PRJNA32241
+s__Corynebacterium_crenatum	1	GCF_000380545
+s__Pseudomonas_chlororaphis	4	GCF_000281915	GCF_000506385	GCF_000264555	GCF_000237045
+s__Murray_Valley_encephalitis_virus	1	PRJNA15430
+s__Verrucomicrobium_spinosum	1	GCF_000172155
+s__Clostridium_hathewayi	3	GCF_000235505	GCF_000160095	GCF_000371445
+s__Streptococcus_pyogenes_phage_315_3	1	PRJNA14529
+s__Streptococcus_pyogenes_phage_315_2	1	PRJNA14528
+s__Streptococcus_pyogenes_phage_315_1	1	PRJNA14533
+s__Pyrobaculum_spherical_virus	1	PRJNA14374
+s__Streptococcus_pyogenes_phage_315_6	1	PRJNA14532
+s__Streptococcus_pyogenes_phage_315_5	1	PRJNA14531
+s__Streptococcus_pyogenes_phage_315_4	1	PRJNA14530
+s__Cardiobacterium_hominis	1	GCF_000160655
+s__Human_papillomavirus_132_like_viruses	1	PRJNA62179
+s__Tomato_leaf_curl_Hsinchu_virus	1	PRJNA18627
+s__Porcine_coronavirus_HKU15	1	PRJNA109271
+s__Streptomyces_vitaminophilus	1	GCF_000380165
+s__Penaeus_monodon_hepatopancreatic_parvovirus	1	PRJNA32695
+s__Corynebacterium_doosanense	1	GCF_000372245
+s__Sporosarcina_newyorkensis	1	GCF_000220335
+s__Burkholderia_dolosa	1	GCF_000152585
+s__Salinicoccus_albus	1	GCF_000385175
+s__Pseudogulbenkiania_sp_NH8B	1	GCF_000283535
+s__Sodalis_phage_SO_1	1	PRJNA42597
+s__Serratia_phage_Eta	1	PRJNA209364
+s__Burkholderia_phage_BcepIL02	1	PRJNA38297
+s__Goose_adenovirus_A	1	PRJNA167579
+s__Vibrio_phage_Vf33	1	PRJNA14384
+s__Mycobacterium_phage_Adawi	1	PRJNA219121
+s__Tobacco_bushy_top_virus_satellite_like_RNA	1	PRJNA14511
+s__Mycobacterium_phage_Che12	1	PRJNA17143
+s__Burkholderia_phage_BcepC6B	1	PRJNA14379
+s__Acidothermus_cellulolyticus	1	GCF_000015025
+s__Veillonella_parvula	3	GCF_000177435	GCF_000024945	GCF_000215025
+s__Nitrosopumilus_sp_SJ	1	GCF_000328945
+s__Salmonella_phage_epsilon34	1	PRJNA33779
+s__Vibrio_shilonii	1	GCF_000181535
+s__Pantoea_dispersa	1	GCF_000465555
+s__Lily_mottle_virus	1	PRJNA15495
+s__Pepper_golden_mosaic_virus	1	PRJNA14210
+s__Invertebrate_iridescent_virus_9	1	PRJNA69999
+s__Coprobacter_fastidiosus	1	GCF_000473955
+s__Invertebrate_iridescent_virus_6	1	PRJNA14124
+s__Tsukamurella_phage_TPA2	1	PRJNA63439
+s__Invertebrate_iridescent_virus_3	1	PRJNA17099
+s__Bacillus_phage_GA_1	1	PRJNA15202
+s__Pseudomonas_umsongensis	1	GCF_000377725
+s__Thioalkalivibrio_sp_ALgr3	1	GCF_000377325
+s__Thioalkalivibrio_sp_ALgr1	1	GCF_000378285
+s__Thioalkalivibrio_sp_ALgr5	1	GCF_000381485
+s__Bacillus_phage_0305phi8_36	1	PRJNA20653
+s__Listeria_grayi	1	GCF_000148995
+s__Desulfovibrio_sp_Dsv1	1	GCF_000403945
+s__Trichodesmium_erythraeum	1	GCF_000014265
+s__Piliocolobus_rufomitratus_polyomavirus_1	1	PRJNA183909
+s__Streptococcus_gordonii	1	GCF_000017005
+s__Botrytis_porri_RNA_virus_1	1	PRJNA167870
+s__Bacillus_sp_916	1	GCF_000275785
+s__Tobacco_leaf_curl_Yunnan_virus_satellite_DNA_beta	1	PRJNA14539
+s__Staphylothermus_hellenicus	1	GCF_000092465
+s__Streptomyces_sp_Amel2xE9	1	GCF_000383935
+s__Rose_yellow_mosaic_virus	1	PRJNA178589
+s__Synechococcus_sp_PCC_6312	1	GCF_000316685
+s__Sulfolobus_islandicus_filamentous_virus	1	PRJNA14132
+s__Vervet_monkey_polyomavirus_1	1	PRJNA183709
+s__Porcine_cytomegalovirus	1	PRJNA217990
+s__Panax_virus_Y	1	PRJNA49715
+s__Pelagibacter_phage_HTVC011P	1	PRJNA192867
+s__Corynebacterium_ulceribovis	1	GCF_000372445
+s__Microbacterium_paraoxydans	1	GCF_000380465
+s__Bifidobacterium_thermophilum	1	GCF_000347695
+s__Parvimonas_micra	1	GCF_000154405
+s__Rift_Valley_fever_virus	1	PRJNA14631
+s__Ochrobactrum_intermedium	3	GCF_000332835	GCF_000182645	GCF_000472165
+s__Passion_fruit_woodiness_virus	1	PRJNA61089
+s__Tobacco_leaf_curl_PUSA_alphasatellite	1	PRJNA56023
+s__Gallibacterium_anatis	3	GCF_000379785	GCF_000464615	GCF_000209675
+s__Anaeromusa_acidaminophila	1	GCF_000374545
+s__Chilli_leaf_curl_betasatellite	1	PRJNA14441
+s__Mycobacterium_phage_KayaCho	1	PRJNA215111
+s__Lactobacillus_oris	2	GCF_000221505	GCF_000180015
+s__Poinsettia_latent_virus	1	PRJNA32691
+s__Haemophilus_parainfluenzae	4	GCF_000191405	GCF_000259485	GCF_000210895	GCF_000261285
+s__Mycobacterium_phage_ScottMcG	1	PRJNA31283
+s__Gracilimonas_tropica	1	GCF_000375425
+s__Selenomonas_sp_FOBRC6	1	GCF_000286455
+s__Junin_virus	1	PRJNA15028
+s__Tepidanaerobacter_acetatoxydans	2	GCF_000213235	GCF_000328765
+s__Mosquito_densovirus_BR_07	1	PRJNA62639
+s__Beet_chlorosis_virus	1	PRJNA14712
+s__Brugmansia_mild_mottle_virus	1	PRJNA30157
+s__Mycoplasma_auris	1	GCF_000367765
+s__Hippeastrum_mosaic_virus	1	PRJNA167580
+s__Chickpea_chlorotic_stunt_virus	1	PRJNA17363
+s__Candidatus_Puniceispirillum_marinum	1	GCF_000024465
+s__Lagos_bat_virus	1	PRJNA194143
+s__Hepatitis_C_virus	1	PRJNA15432
+s__Lactobacillus_salivarius	7	GCF_000215465	GCF_000143435	GCF_000260335	GCF_000008925	GCF_000179475	GCF_000159395	GCF_000217735
+s__Ornithinibacillus_scapharcae	1	GCF_000190475
+s__Puumala_virus	1	PRJNA14930
+s__Tomato_yellow_margin_leaf_curl_virus	1	PRJNA14371
+s__Moorea_producens	1	GCF_000211815
+s__Cronobacter_phage_ENT47670	1	PRJNA184169
+s__Cellulophaga_phage_phi4_1	1	PRJNA212952
+s__Oscillatoria_sp_PCC_10802	1	GCF_000332335
+s__Phormidium_phage_Pf_WMP3	1	PRJNA19801
+s__Corynebacterium_striatum	1	GCF_000159135
+s__Cucumber_mottle_virus	1	PRJNA18331
+s__Robiginitomaculum_antarcticum	1	GCF_000365025
+s__Eubacterium_sp_14_2	1	GCF_000403845
+s__Acidovorax_ebreus	1	GCF_000022305
+s__Thermus_phage_P23_77	1	PRJNA40235
+s__Pelagibacterium_halotolerans	1	GCF_000230555
+s__Oryctes_rhinoceros_virus	1	PRJNA32781
+s__Propionibacterium_phage_PA6	1	PRJNA19767
+s__Sphingomonas_wittichii	2	GCF_000259955	GCF_000016765
+s__Lactobacillus_mucosae	1	GCF_000248095
+s__Streptococcus_sp_C150	1	GCF_000187445
+s__Sida_yellow_vein_Madurai_virus	1	PRJNA19405
+s__Epsilonpapillomavirus_1	1	PRJNA14220
+s__Vibrio_breoganii	3	GCF_000280885	GCF_000286995	GCF_000286975
+s__Streptococcus_phage_ALQ13_2	1	PRJNA42593
+s__Stenotrophomonas_sp_SKA14	1	GCF_000158575
+s__Acinetobacter_sp_NIPH_2100	1	GCF_000369805
+s__Vibrio_phage_nt_1	1	PRJNA209064
+s__Polaribacter_sp_MED152	1	GCF_000152945
+s__Sporolactobacillus_laevolacticus	1	GCF_000497245
+s__Desulfovibrio_salexigens	1	GCF_000023445
+s__Yersinia_bercovieri	1	GCF_000167975
+s__Murine_pneumonia_virus	1	PRJNA15251
+s__Skua_adenovirus_A	1	PRJNA78801
+s__Mycobacterium_tuberculosis_bovis_africanum_canetti	82	GCF_000008585	GCF_000304555	GCF_000195835	GCF_000177835	GCF_000177855	GCF_000177975	GCF_000328825	GCF_000184005	GCF_000454345	GCF_000270365	GCF_000155245	GCF_000155225	GCF_000389945	GCF_000389925	GCF_000177895	GCF_000331445	GCF_000328805	GCF_000190335	GCF_000159755	GCF_000253355	GCF_000328785	GCF_000193185	GCF_000454325	GCF_000234725	GCF_000016145	GCF_000155305	GCF_000488915	GCF_000154605	GCF_000155165	GCF_000184025	GCF_000152105	G [...]
+s__Sandfly_fever_Naples_virus	1	PRJNA15053
+s__Sida_yellow_vein_Vietnam_alphasatellite	1	PRJNA19815
+s__Lettuce_yellow_mottle_virus	1	PRJNA32669
+s__Sulfobacillus_acidophilus	2	GCF_000219855	GCF_000237975
+s__Clostridium_bifermentans	2	GCF_000452225	GCF_000452245
+s__Phaeospirillum_molischianum	1	GCF_000294655
+s__Paenibacillus_sp_JCM_10914	1	GCF_000509425
+s__Chamaesiphon_minutus	1	GCF_000317145
+s__Desulfosporosinus_youngiae	1	GCF_000244895
+s__Haloferax_prahovense	1	GCF_000336815
+s__Kalanchoe_top_spotting_virus	1	PRJNA14236
+s__Staphylococcus_phage_phiETA2	1	PRJNA18669
+s__Staphylococcus_phage_phiETA3	1	PRJNA18671
+s__Rubrivivax_benzoatilyticus	1	GCF_000190375
+s__Acetonema_longum	1	GCF_000219125
+s__Enterobacteria_phage_vB_EcoM_VR7	1	PRJNA61099
+s__Enterococcus_sp_7L76	1	GCF_000210115
+s__Phaeocystis_globosa_virus_virophage	1	PRJNA206475
+s__Enterobacteria_phage_JSE	1	PRJNA38263
+s__KI_polyomavirus	1	PRJNA19155
+s__Halothece_sp_PCC_7418	1	GCF_000317635
+s__Enterobacteria_phage_JK06	1	PRJNA15569
+s__Bacteroides_sp_3_1_40A	1	GCF_000186105
+s__Enterococcus_phage_phiFL2A	1	PRJNA42795
+s__Pea_early_browning_virus	1	PRJNA15067
+s__Thielavia_terrestris	1	GCA_000226115
+s__Legionella_pneumophila	13	GCF_000465915	GCF_000306865	GCF_000306845	GCF_000092545	GCF_000347615	GCF_000465695	GCF_000048665	GCF_000465675	GCF_000404245	GCF_000455845	GCF_000239175	GCF_000048645	GCF_000092625
+s__Mexican_papita_viroid	1	PRJNA14773
+s__Mycobacterium_tusciae	1	GCF_000243415
+s__Planctomyces_brasiliensis	1	GCF_000165715
+s__Bacillus_phage_BCJA1c	1	PRJNA14548
+s__Eremothecium_gossypii	1	GCA_000091025
+s__Bacillus_methanolicus	2	GCF_000262755	GCF_000262735
+s__Squash_leaf_curl_virus	1	PRJNA14038
+s__Desulfotomaculum_hydrothermale	1	GCF_000315365
+s__Bhendi_yellow_vein_Bhubhaneswar_virus	1	PRJNA33885
+s__Budgerigar_fledgling_disease_polyomavirus	1	PRJNA14284
+s__Tobacco_curly_shoot_alphasatellite	1	PRJNA15480
+s__Synechococcus_phage_P60	1	PRJNA14628
+s__Cucumber_Bulgarian_virus	1	PRJNA14881
+s__Psychromonas_hadalis	1	GCF_000420245
+s__Kokobera_virus	1	PRJNA18843
+s__Thermanaerovibrio_velox	1	GCF_000237825
+s__Tobacco_leaf_curl_Yunnan_virus	1	PRJNA15258
+s__Streptomyces_flavogriseus	1	GCF_000176115
+s__Thiobacillus_denitrificans	2	GCF_000376425	GCF_000012745
+s__Francisella_novicida	7	GCF_000156415	GCF_000195555	GCF_000014645	GCF_000154265	GCF_000154185	GCF_000155755	GCF_000195535
+s__Nocardia_cyriacigeorgica	1	GCF_000284035
+s__Haemophilus_ducreyi	1	GCF_000007945
+s__Pseudomonas_sp_CF149	1	GCF_000416155
+s__Lactobacillus_johnsonii	6	GCF_000159355	GCF_000091405	GCF_000008065	GCF_000219475	GCF_000204985	GCF_000498675
+s__Halovirus_HCTV_5	1	PRJNA206497
+s__Halovirus_HCTV_2	1	PRJNA206498
+s__Sida_leaf_curl_virus	1	PRJNA16225
+s__Halovirus_HCTV_1	1	PRJNA206499
+s__Brucella_sp_04_5288	1	GCF_000480195
+s__Gordonia_amarae	1	GCF_000241345
+s__Spissistilus_festinus_reovirus	1	PRJNA83187
+s__Akabane_virus	1	PRJNA20971
+s__Grapevine_geminivirus	1	PRJNA165741
+s__Spirochaeta_alkalica	1	GCF_000373545
+s__Eupatorium_yellow_vein_betasatellite	1	PRJNA14447
+s__Chlamydia_trachomatis	73	GCF_000026905	GCF_000318645	GCF_000304495	GCF_000318785	GCF_000318985	GCF_000318905	GCF_000318885	GCF_000319045	GCF_000226605	GCF_000092725	GCF_000092665	GCF_000173535	GCF_000008725	GCF_000441635	GCF_000174055	GCF_000092805	GCF_000318845	GCF_000441815	GCF_000012125	GCF_000092485	GCF_000319005	GCF_000348845	GCF_000026925	GCF_000441655	GCF_000318545	GCF_000318665	GCF_000441615	GCF_000318605	GCF_000318825	GCF_000319125	GCF_000441755	GCF_000092685	GCF_000318565	GC [...]
+s__Spiroplasma_syrphidicola	1	GCF_000400955
+s__Suid_herpesvirus_1	1	PRJNA14424
+s__Sinorhizobium_meliloti	19	GCF_000287375	GCF_000287475	GCF_000287575	GCF_000304415	GCF_000287415	GCF_000375585	GCF_000287535	GCF_000287435	GCF_000147795	GCF_000287515	GCF_000287555	GCF_000287455	GCF_000346065	GCF_000147775	GCF_000320385	GCF_000218265	GCF_000006965	GCF_000236945	GCF_000287495
+s__Lactobacillus_murinus	1	GCF_000364205
+s__Caulobacter_sp_JGI_0001010_J14	1	GCF_000382625
+s__Enterobacteria_phage_13a	1	PRJNA30603
+s__Acinetobacter_guillouiae	2	GCF_000368145	GCF_000368485
+s__Rhodopirellula_baltica	4	GCF_000195185	GCF_000304635	GCF_000330745	GCF_000196115
+s__Crenarchaeota_archaeon_SCGC_AAA471_L14	1	GCF_000399825
+s__Influenza_B_virus	1	PRJNA14656
+s__Bovine_immunodeficiency_virus	1	PRJNA14634
+s__Fowl_adenovirus_A	2	PRJNA14522	PRJNA40323
+s__Fowl_adenovirus_C	1	PRJNA65223
+s__Fowl_adenovirus_B	1	PRJNA203280
+s__Fowl_adenovirus_E	1	PRJNA62241
+s__Fowl_adenovirus_D	2	PRJNA14523	PRJNA40321
+s__Eremothecium_cymbalariae	1	GCA_000235365
+s__Peptostreptococcus_stomatis	1	GCF_000147675
+s__Tomato_golden_leaf_spot_virus	1	PRJNA209726
+s__Subterranean_clover_mottle_virus_satellite_RNA	1	PRJNA14503
+s__Desulfovibrio_piezophilus	1	GCF_000341895
+s__Propionibacterium_phage_PHL010M04	1	PRJNA219117
+s__Enterobacteria_phage_Ike	1	PRJNA14627
+s__Streptococcus_parauberis	4	GCF_000213825	GCF_000343855	GCF_000187935	GCF_000342505
+s__Sphingomonas_melonis	2	GCF_000371765	GCF_000379045
+s__Thermosphaera_aggregans	1	GCF_000092185
+s__Clostridium_bolteae	6	GCF_000371665	GCF_000371645	GCF_000154365	GCF_000371685	GCF_000371705	GCF_000371725
+s__Watermelon_silver_mottle_virus	1	PRJNA15176
+s__Selenomonas_flueggei	1	GCF_000160695
+s__Shallot_yellow_stripe_virus	1	PRJNA15745
+s__Spilanthes_yellow_vein_virus	1	PRJNA19779
+s__Blautia_sp_KLE_1732	1	GCF_000466565
+s__Bacillus_phage_BMBtp2	1	PRJNA184152
+s__Halobacteroides_halobius	1	GCF_000328625
+s__Wolbachia_endosymbiont_of_Culex_quinquefasciatus	2	GCF_000156735	GCF_000073005
+s__Enterococcus_phage_EF62phi	1	PRJNA159663
+s__Aeromonas_hydrophila	9	GCF_000354635	GCF_000354675	GCF_000350405	GCF_000014805	GCF_000315835	GCF_000354695	GCF_000401555	GCF_000298055	GCF_000354715
+s__Desulfovibrio_desulfuricans	4	GCF_000384815	GCF_000022125	GCF_000420465	GCF_000189295
+s__Citrobacter_freundii	5	GCF_000208765	GCF_000238735	GCF_000388155	GCF_000342325	GCF_000312465
+s__Mesocricetus_auratus_papillomavirus_1	1	PRJNA226107
+s__Cotton_leaf_curl_virus_associated_DNA_1_isolate_Lucknow	1	PRJNA65305
+s__Scheffersomyces_stipitis	1	GCA_000209165
+s__Subdoligranulum_variabile	1	GCF_000157955
+s__Virgibacillus_halodenitrificans	1	GCF_000294755
+s__Streptomyces_phage_phiC31	1	PRJNA14606
+s__Erwinia_phage_phiEa21_4	2	PRJNA33537	PRJNA64759
+s__Desulfobulbus_propionicus	1	GCF_000186885
+s__Riemerella_phage_RAP44	1	PRJNA181081
+s__Ectothiorhodospira_sp_PHS_1	1	GCF_000225005
+s__Rice_dwarf_virus	1	PRJNA14797
+s__Bromus_catharticus_striate_mosaic_virus	1	PRJNA61437
+s__Bagaza_virus	1	PRJNA36619
+s__Paenibacillus_elgii	1	GCF_000213315
+s__Caviid_herpesvirus_2	1	PRJNA188730
+s__Satellites_of_Trichomonas_vaginalis_T1_virus	1	PRJNA14201
+s__Louping_ill_virus	1	PRJNA15343
+s__zeta_proteobacterium_SCGC_AB_602_E04	1	GCF_000379265
+s__Catellicoccus_marimammalium	1	GCF_000313915
+s__Francisella_sp_TX077308	1	GCF_000219045
+s__Hirschia_baltica	1	GCF_000023785
+s__Streptococcus_ferus	1	GCF_000372425
+s__Vibrio_campbellii	5	GCF_000334195	GCF_000259875	GCF_000154025	GCF_000464435	GCF_000017705
+s__Aedes_pseudoscutellaris_reovirus	1	PRJNA16243
+s__Anaplasma_centrale	1	GCF_000024505
+s__Candidatus_Arthromitus_sp_SFB_rat_Yit	1	GCF_000283555
+s__Soybean_chlorotic_blotch_virus	1	PRJNA48595
+s__Nesterenkonia_sp_F	1	GCF_000220985
+s__Citrus_yellow_mosaic_virus	1	PRJNA14153
+s__candidate_division_TM7_genomosp_GTL1	1	GCF_000169295
+s__Dahlia_mosaic_virus	1	PRJNA175589
+s__Anaerococcus_lactolyticus	1	GCF_000156575
+s__Amycolatopsis_methanolica	1	GCF_000371885
+s__Leuconostoc_fallax	1	GCF_000165675
+s__Arcticibacter_svalbardensis	1	GCF_000403135
+s__Tupaia_virus	1	PRJNA15415
+s__Borrelia_garinii	4	GCF_000172275	GCF_000172595	GCF_000300045	GCF_000239475
+s__Idiomarina_loihiensis	2	GCF_000008465	GCF_000401175
+s__Pseudomonas_gingeri	1	GCF_000280765
+s__Pelargonium_flower_break_virus	1	PRJNA14928
+s__Marinilabilia_salmonicolor	1	GCF_000259075
+s__Actinobaculum_sp_oral_taxon_183	1	GCF_000466165
+s__Bartonella_washoensis	2	GCF_000278195	GCF_000278135
+s__Acidaminococcus_sp_HPA0509	1	GCF_000411395
+s__Eikenella_corrodens	2	GCF_000504685	GCF_000158615
+s__Enterobacteria_phage_HK140	1	PRJNA183139
+s__Rhodococcus_wratislaviensis	1	GCF_000325625
+s__Acholeplasma_phage_L2	1	PRJNA14066
+s__Bacillus_phage_TP21_L	1	PRJNA33139
+s__Bacillus_phage_phiAGATE	1	PRJNA185318
+s__Thrush_coronavirus_HKU12	1	PRJNA32701
+s__Nilaparvata_lugens_honeydew_virus_2	1	PRJNA209359
+s__Nilaparvata_lugens_honeydew_virus_3	1	PRJNA209357
+s__Apple_latent_spherical_virus	1	PRJNA15367
+s__Malvastrum_yellow_mosaic_alphasatellite	1	PRJNA18129
+s__Guanarito_virus	1	PRJNA14939
+s__Acinetobacter_sp_CIP_102129	1	GCF_000368305
+s__Methylotenera_mobilis	2	GCF_000023705	GCF_000384255
+s__Nostoc_sp_PCC_7107	1	GCF_000316625
+s__Indibacter_alkaliphilus	1	GCF_000295935
+s__Pleurocapsa_sp_PCC_7319	1	GCF_000332195
+s__Prevotella_intermedia	1	GCF_000261025
+s__Radish_leaf_curl_virus	1	PRJNA28279
+s__Propionibacterium_acidipropionici	1	GCF_000310065
+s__Apple_dimple_fruit_viroid	1	PRJNA14971
+s__Aquamavirus_A	1	PRJNA20985
+s__Gloeocapsa_sp_PCC_7428	1	GCF_000317555
+s__Anatid_herpesvirus_1	1	PRJNA39725
+s__Staphylococcus_simiae	1	GCF_000235645
+s__Salmonella_phage_epsilon15	1	PRJNA14285
+s__Butyrivibrio_sp_AE3009	1	GCF_000420845
+s__Acinetobacter_oleivorans	2	GCF_000488235	GCF_000196795
+s__Sida_yellow_mosaic_China_virus	1	PRJNA167735
+s__Lactobacillus_parabrevis	1	GCF_000383435
+s__Streptomyces_sp_SA3_actG	1	GCF_000179195
+s__Streptomyces_sp_SA3_actF	1	GCF_000179215
+s__Marinococcus_halotolerans	1	GCF_000420725
+s__Torque_teno_mini_virus_5	1	PRJNA48177
+s__Bifidobacterium_dentium	4	GCF_000172135	GCF_000149165	GCF_000024445	GCF_000146775
+s__Sutterella_parvirubra	1	GCF_000250875
+s__Granulicatella_adiacens	1	GCF_000160675
+s__Torque_teno_mini_virus_7	1	PRJNA48163
+s__Vibrio_tasmaniensis	5	GCF_000272445	GCF_000272385	GCF_000272405	GCF_000272425	GCF_000272365
+s__Bartonella_taylorii	1	GCF_000278295
+s__Enterococcus_sp_C1	1	GCF_000277605
+s__Streptomyces_sp_FxanaD5	1	GCF_000373465
+s__Mycobacterium_phage_CASbig	1	PRJNA206483
+s__Chino_del_tomate_virus	1	PRJNA14183
+s__Frankia_sp_EuI1c	1	GCF_000166135
+s__Mint_virus_1	1	PRJNA15210
+s__Clostridium_phage_phi24R	1	PRJNA181218
+s__Sphingomonas_elodea	1	GCF_000226955
+s__Curtobacterium_flaccumfaciens	1	GCF_000349565
+s__Ryegrass_mottle_virus	1	PRJNA15375
+s__Pelargonium_chlorotic_ring_pattern_virus	1	PRJNA14922
+s__Lactobacillus_phage_Lv_1	1	PRJNA33535
+s__Infectious_pancreatic_necrosis_virus	1	PRJNA15024
+s__Bacteroides_salyersiae	2	GCF_000381365	GCF_000273235
+s__Thermotoga_thermarum	1	GCF_000217815
+s__Enterobacteria_phage_JL1	1	PRJNA179426
+s__Torque_teno_mini_virus_9	1	PRJNA14058
+s__Tomato_leaf_curl_Karnataka_alphasatellite	1	PRJNA181995
+s__Thioalkalivibrio_sp_HL_Eb18	1	GCF_000364985
+s__Mint_virus_X	1	PRJNA15160
+s__Streptococcus_uberis	1	GCF_000009545
+s__Cotton_leaf_curl_virus	1	PRJNA162501
+s__Azotobacter_vinelandii	2	GCF_000380365	GCF_000021045
+s__Geobacillus_sp_GHH01	1	GCF_000336445
+s__Dyadobacter_fermentans	1	GCF_000023125
+s__Pseudomonas_nitroreducens	1	GCF_000313755
+s__Pepper_mild_mottle_virus	1	PRJNA15148
+s__Human_papillomavirus_type_131	1	PRJNA62177
+s__Human_papillomavirus_type_137	1	PRJNA167867
+s__Sphaeropsis_sapinea_RNA_virus_1	1	PRJNA14722
+s__Human_papillomavirus_type_135	1	PRJNA167865
+s__Human_papillomavirus_type_134	1	PRJNA62181
+s__Strawberry_necrotic_shock_virus	1	PRJNA18507
+s__Potato_yellow_mosaic_Trinidad_virus	1	PRJNA14256
+s__Salmonella_phage_S16	1	PRJNA191122
+s__Bordetella_bronchiseptica_parapertussis	11	GCF_000318015	GCF_000313065	GCF_000195695	GCF_000312945	GCF_000306945	GCF_000317955	GCF_000195675	GCF_000313085	GCF_000479655	GCF_000479735	GCF_000317935
+s__Pseudomonas_phage_vB_Pae_Kakheti25	1	PRJNA167052
+s__Fructobacillus_fructosus	1	GCF_000185045
+s__Vibrio_phage_VvAW1	1	PRJNA192871
+s__Alcanivorax_dieselolei	1	GCF_000300005
+s__Paramecium_bursaria_Chlorella_virus_A1	1	PRJNA18305
+s__Mycobacterium_phage_SDcharge11	1	PRJNA206028
+s__Rhizosolenia_setigera_RNA_virus_01	1	PRJNA175590
+s__Enterobacteria_phage_WA13_sensu_lato	1	PRJNA16595
+s__Cupriavidus_metallidurans	1	GCF_000196015
+s__Vibrio_phage_VCY_phi	1	PRJNA76737
+s__Octadecabacter_antarcticus	1	GCF_000155675
+s__Amsacta_moorei_entomopoxvirus_L	1	PRJNA14097
+s__Synechococcus_phage_S_SSM7	1	PRJNA64711
+s__Synechococcus_phage_S_SSM5	1	PRJNA64715
+s__Rhizobium_freirei	1	GCF_000359745
+s__Lettuce_infectious_yellows_virus	1	PRJNA14768
+s__Labidocera_aestiva_circovirus	1	PRJNA186433
+s__Meleagrid_herpesvirus_1	1	PRJNA14106
+s__Deinococcus_peraridilitoris	1	GCF_000317835
+s__Pyrococcus_furiosus	2	GCF_000275605	GCF_000007305
+s__Pseudonocardia_sp_P1	1	GCF_000178675
+s__Pseudonocardia_sp_P2	1	GCF_000179835
+s__Prochlorococcus_phage_P_GSP1	1	PRJNA195518
+s__Cecembia_lonarensis	1	GCF_000298295
+s__Quang_Binh_virus	1	PRJNA37969
+s__Grapevine_leafroll_associated_virus_3	1	PRJNA14906
+s__Bacteroides_sp_2_1_33B	1	GCF_000162175
+s__Polyomavirus_HPyV6	1	PRJNA51559
+s__Polyomavirus_HPyV7	1	PRJNA51557
+s__White_bream_virus	1	PRJNA18013
+s__Lactobacillus_amylovorus	2	GCF_000194115	GCF_000182855
+s__Pusillimonas_noertemannii	1	GCF_000308195
+s__Sepik_virus	1	PRJNA18513
+s__Cetacean_morbillivirus	1	PRJNA15215
+s__Grapevine_leafroll_associated_virus_7	1	PRJNA78707
+s__Parabacteroides_distasonis	3	GCF_000012845	GCF_000307435	GCF_000307455
+s__Sphaerochaeta_coccoides	1	GCF_000208385
+s__Porcine_bocavirus_5_JS677	1	PRJNA81033
+s__Acidovorax_citrulli	2	GCF_000316055	GCF_000015325
+s__Peruvian_horse_sickness_virus	1	PRJNA16337
+s__Cellulosilyticum_lentocellum	1	GCF_000178835
+s__Mycoplasma_moatsii	1	GCF_000420225
+s__Xanthomonas_phage_vB_XveM_DIBBI	1	PRJNA167661
+s__Enterobacter_sp_MGH_22	1	GCF_000493055
+s__Ageratum_enation_virus	1	PRJNA15192
+s__Streptococcus_sp_GMD6S	1	GCF_000297015
+s__Staphylococcus_phage_phi13	1	PRJNA14248
+s__Sporolactobacillus_vineae	2	GCF_000377985	GCF_000246965
+s__Streptomyces_sp_LaPpAH_202	1	GCF_000373225
+s__Anaerobaculum_hydrogeniformans	1	GCF_000160455
+s__Acinetobacter_sp_NIPH_2168	1	GCF_000369705
+s__Cellvibrio_japonicus	1	GCF_000019225
+s__Chlorobaculum_tepidum	1	GCF_000006985
+s__Tomato_mottle_Taino_virus	1	PRJNA14082
+s__Cytophaga_aurantiaca	1	GCF_000379725
+s__Magnaporthe_oryzae_chrysovirus_1	1	PRJNA51685
+s__Streptococcus_phage_PH10	1	PRJNA38365
+s__Xanthomonas_citri	3	GCF_000349225	GCF_000007165	GCF_000263335
+s__Streptococcus_phage_PH15	1	PRJNA30161
+s__Burkholderia_phage_BcepMu	1	PRJNA14376
+s__Sweet_potato_leaf_curl_Georgia_virus	1	PRJNA14257
+s__Bovine_rhinitis_B_virus	1	PRJNA28835
+s__Deinococcus_deserti	1	GCF_000020685
+s__Thioalkalivibrio_sp_AL5	1	GCF_000378565
+s__Salsuginibacillus_kocurii	1	GCF_000377705
+s__Exiguobacterium_pavilionensis	1	GCF_000416965
+s__Astrovirus_VA1	1	PRJNA39811
+s__Pea_enation_mosaic_virus_satellite_RNA	1	PRJNA14432
+s__Megamonas_funiformis	1	GCF_000245775
+s__Enterobacteria_phage_HK629	1	PRJNA183144
+s__Sandarakinorhabdus_sp_AAP62	1	GCF_000331225
+s__Clostridium_cellulolyticum	1	GCF_000022065
+s__Mycobacterium_phage_Job42	1	PRJNA209072
+s__Bermanella_marisrubri	1	GCF_000153565
+s__Lactobacillus_kisonensis	1	GCF_000242275
+s__Arthrospira_platensis	3	GCF_000210375	GCF_000175415	GCF_000307915
+s__Acinetobacter_brisouii	2	GCF_000368645	GCF_000488275
+s__Prochlorococcus_phage_P_HM1	1	PRJNA64697
+s__Prochlorococcus_phage_P_HM2	1	PRJNA64705
+s__Tomato_leaf_curl_Sri_Lanka_virus	1	PRJNA14259
+s__Streptomyces_mobaraensis	1	GCF_000342125
+s__STL_polyomavirus	1	PRJNA186434
+s__Pseudacidovorax_intermedius	1	GCF_000333675
+s__Cyanothece_sp_ATCC_51142	1	GCF_000017845
+s__Santeuil_nodavirus	1	PRJNA62547
+s__Desmospora_sp_8437	1	GCF_000213595
+s__Blattabacterium_punctulatus	1	GCF_000236405
+s__Onion_yellows_phytoplasma	1	GCF_000009845
+s__Echinicola_vietnamensis	1	GCF_000325705
+s__Tomato_yellow_leaf_curl_Malaga_virus	1	PRJNA14239
+s__Veillonella_sp_HPA0037	1	GCF_000411535
+s__Fusobacterium_russii	1	GCF_000381725
+s__Porcine_teschovirus	1	PRJNA15092
+s__Bulleidia_extructa	1	GCF_000177375
+s__Leuconostoc_citreum	4	GCF_000026405	GCF_000239935	GCF_000239895	GCF_000239915
+s__Bacteriophage_APSE_2	1	PRJNA32705
+s__Parainfluenza_virus_5	1	PRJNA15014
+s__Clostridium_phage_phiCP7R	1	PRJNA167663
+s__Bordetella_phage_BIP_1	1	PRJNA14359
+s__Bordetella_phage_BPP_1	1	PRJNA14353
+s__Ustilago_maydis_virus_H1	1	PRJNA14812
+s__Artibeus_jamaicensis_parvovirus_1	1	PRJNA81739
+s__Pseudoramibacter_alactolyticus	1	GCF_000185505
+s__halophilic_archaeon_DL31	1	GCF_000224475
+s__Ralstonia_phage_RSM1	1	PRJNA18239
+s__Clostridium_leptum	1	GCF_000154345
+s__Commensalibacter_intestini	1	GCF_000231445
+s__Streptomyces_violaceusniger	2	GCF_000478605	GCF_000147815
+s__Methylomicrobium_buryatense	1	GCF_000341735
+s__Mycobacterium_phage_Boomer	1	PRJNA30693
+s__Trichomonas_vaginalis	1	GCA_000002825
+s__Lactobacillus_suebicus	1	GCF_000260395
+s__Citrus_variegation_virus	1	PRJNA19747
+s__Elizabethkingia_anophelis	2	GCF_000331815	GCF_000240095
+s__Acinetobacter_bacteriophage_AP22	1	PRJNA167576
+s__Kribbella_catacumbae	1	GCF_000372465
+s__Serratia_plymuthica	5	GCF_000438825	GCF_000261045	GCF_000214235	GCF_000300895	GCF_000176835
+s__Circovirus_like_genome_RW_E	1	PRJNA39625
+s__Artichoke_mottled_crinkle_virus	1	PRJNA15517
+s__Circovirus_like_genome_RW_A	1	PRJNA39617
+s__Circovirus_like_genome_RW_C	1	PRJNA39621
+s__Circovirus_like_genome_RW_B	1	PRJNA39619
+s__Cucumber_necrosis_virus	1	PRJNA14638
+s__Velvet_tobacco_mottle_virus	1	PRJNA52631
+s__Sebaldella_termitidis	1	GCF_000024405
+s__Candidatus_Symbiobacter_mobilis	1	GCF_000477435
+s__Methanocaldococcus_fervens	1	GCF_000023985
+s__Mycobacterium_phage_Wanda	1	PRJNA215122
+s__Pseudomonas_stutzeri	15	GCF_000195105	GCF_000263395	GCF_000282955	GCF_000280555	GCF_000279165	GCF_000416345	GCF_000267545	GCF_000235745	GCF_000013785	GCF_000327065	GCF_000237885	GCF_000219605	GCF_000307775	GCF_000341615	GCF_000455665
+s__Ehrlichia_ruminantium	3	GCF_000050405	GCF_000050425	GCF_000026005
+s__Edwardsiella_phage_PEi2	1	PRJNA226729
+s__Mycobacterium_phage_Bobi	1	PRJNA215126
+s__Habenaria_mosaic_virus	1	PRJNA212951
+s__Streptomyces_xinghaiensis	1	GCF_000220705
+s__Bacillus_phage_BtCS33	1	PRJNA169233
+s__Psychrobacter_aquaticus	1	GCF_000471625
+s__Mycobacterium_phage_Butters	1	PRJNA197297
+s__Propionimicrobium_lymphophilum	1	GCF_000411175
+s__Lactococcus_phage_bIL67	1	PRJNA32321
+s__Desulfovibrio_termitidis	1	GCF_000504305
+s__Halomonas_sp_GFAJ_1	1	GCF_000236625
+s__Murine_polyomavirus	1	PRJNA15489
+s__Lautropia_mirabilis	1	GCF_000186425
+s__Chitinophaga_pinensis	1	GCF_000024005
+s__Mungbean_yellow_mosaic_India_virus	1	PRJNA15259
+s__Edwardsiella_ictaluri	1	GCF_000022885
+s__Tomato_leaf_curl_Vietnam_virus	1	PRJNA14214
+s__Cypovirus_14	1	PRJNA15250
+s__Rhodocyclus_sp_UW_659_1_F08	1	GCF_000375925
+s__Thioalkalivibrio_sp_ALE10	1	GCF_000381385
+s__Thioalkalivibrio_sp_ALE11	1	GCF_000381205
+s__Thioalkalivibrio_sp_ALE12	1	GCF_000381105
+s__Thioalkalivibrio_sp_ALE14	1	GCF_000376845
+s__Bradyrhizobium_sp_STM_3843	1	GCF_000239815
+s__Thioalkalivibrio_sp_ALE16	1	GCF_000381305
+s__Thioalkalivibrio_sp_ALE18	1	GCF_000381465
+s__Mycobacterium_phage_Spud	1	PRJNA31285
+s__Cotton_leaf_curl_Gezira_alphasatellite	1	PRJNA42507
+s__Borrelia_afzelii	3	GCF_000304735	GCF_000170935	GCF_000222835
+s__Treponema_saccharophilum	1	GCF_000255555
+s__Lactococcus_phage_ul36	1	PRJNA14331
+s__Aquareovirus_C	1	PRJNA14900
+s__Nocardioides_sp_CF8	1	GCF_000389985
+s__Streptocarpus_flower_break_virus	1	PRJNA17803
+s__Human_respiratory_syncytial_virus	1	PRJNA15003
+s__Finch_circovirus	1	PRJNA18021
+s__Vibrio_sp_N418	1	GCF_000222565
+s__Lactobacillus_florum	1	GCF_000304715
+s__gamma_proteobacterium_BDW918	1	GCF_000259575
+s__Spissistilus_festinus_virus_1	1	PRJNA51181
+s__Mycoplasma_penetrans	1	GCF_000011225
+s__Streptococcus_sp_SK140	1	GCF_000259525
+s__Spinach_curly_top_Arizona_virus	1	PRJNA62497
+s__Thioalkalivibrio_sp_ALE30	1	GCF_000377465
+s__Erysipelothrix_rhusiopathiae	2	GCF_000270085	GCF_000160815
+s__Helicobacter_phage_1961P	1	PRJNA181239
+s__Lactobacillus_ruminis	4	GCF_000224985	GCF_000225845	GCF_000217755	GCF_000159375
+s__Streptomyces_sp_TOR3209	1	GCF_000259895
+s__Candidatus_Poribacteria_sp_WGA_4G	1	GCF_000364585
+s__Bifidobacterium_longum	18	GCF_000261265	GCF_000196575	GCF_000092325	GCF_000166315	GCF_000196555	GCF_000155415	GCF_000269965	GCF_000261205	GCF_000478525	GCF_000261225	GCF_000007525	GCF_000261245	GCF_000497735	GCF_000210755	GCF_000219455	GCF_000020425	GCF_000003135	GCF_000008945
+s__Herbaspirillum_sp_YR522	1	GCF_000282575
+s__Mesorhizobium_sp_WSM4349	1	GCF_000373125
+s__Aeromonas_phage_Aes012	1	PRJNA195532
+s__Clitocybe_odora_virus	1	PRJNA129589
+s__Photorhabdus_asymbiotica	1	GCF_000196475
+s__Ornithinimicrobium_pekingense	1	GCF_000421185
+s__Cellulophaga_phage_phi12_1	1	PRJNA212966
+s__Barmah_Forest_virus	1	PRJNA14679
+s__Clostera_anachoreta_granulovirus	1	PRJNA65819
+s__Lactococcus_raffinolactis	1	GCF_000327305
+s__Pseudomonas_phage_PAK_P1	1	PRJNA64763
+s__Actinomyces_sp_oral_taxon_180	1	GCF_000185285
+s__Porphyromonas_endodontalis	1	GCF_000174815
+s__Okra_mosaic_virus	1	PRJNA19761
+s__Heterocapsa_circularisquama_RNA_virus	1	PRJNA16157
+s__Florida_woods_cockroach_associated_cyclovirus	1	PRJNA188548
+s__Naumovozyma_dairenensis	1	GCA_000227115
+s__Tobacco_etch_virus	1	PRJNA15325
+s__Rubber_viroid_India_2009	1	PRJNA48423
+s__Tomato_leaf_curl_Ghana_virus	1	PRJNA28699
+s__Thiomonas_sp_FB_6	1	GCF_000377645
+s__Pseudoalteromonas_sp_NJ631	1	GCF_000276645
+s__Herbaspirillum_sp_GW103	1	GCF_000261365
+s__Deinococcus_maricopensis	1	GCF_000186385
+s__Actinomyces_phage_Av_1	1	PRJNA20057
+s__Kocuria_rhizophila	2	GCF_000010285	GCF_000214115
+s__Acinetobacter_sp_ANC_3994	1	GCF_000367925
+s__Natronococcus_amylolyticus	1	GCF_000337675
+s__Cotton_leaf_curl_Rajasthan_virus	1	PRJNA14130
+s__Holophaga_foetida	1	GCF_000242615
+s__Feline_foamy_virus	1	PRJNA15219
+s__Sida_yellow_mosaic_Alagoas_virus	1	PRJNA189217
+s__Nitratiruptor_sp_SB155_2	1	GCF_000010325
+s__Xanthomonas_sp_SHU166	1	GCF_000364685
+s__Pseudocowpox_virus	1	PRJNA45973
+s__Candidatus_Liberibacter_asiaticus	2	GCF_000023765	GCF_000346595
+s__Ketogulonicigenium_vulgare	2	GCF_000223375	GCF_000164885
+s__Snakehead_virus	1	PRJNA14689
+s__Anaerobaculum_mobile	1	GCF_000266925
+s__Bacteriovorax_sp_BSW11_IV	1	GCF_000447755
+s__Odoribacter_splanchnicus	1	GCF_000190535
+s__Ageratum_yellow_vein_China_virus	1	PRJNA14490
+s__Marinobacterium_stanieri	1	GCF_000220545
+s__Cellulophaga_phage_phi13_2	1	PRJNA212953
+s__Veillonella_sp_oral_taxon_158	1	GCF_000183505
+s__Tomato_leaf_curl_Java_betasatellite	1	PRJNA14452
+s__Tomato_leaf_curl_Togo_betasatellite_Togo_2006	1	PRJNA60629
+s__Felid_herpesvirus_1	1	PRJNA42429
+s__Cucumber_fruit_mottle_mosaic_virus	1	PRJNA14709
+s__Tomato_necrotic_stunt_virus	1	PRJNA162495
+s__Thermococcus_zilligii	1	GCF_000258515
+s__Shallot_virus_X	1	PRJNA14805
+s__Acinetobacter_sp_CIP_102143	1	GCF_000369865
+s__Leishmania_major	1	GCA_000002725
+s__Alicycliphilus_sp_CRZ1	1	GCF_000282995
+s__Zymophilus_raffinosivorans	1	GCF_000381065
+s__Salmonella_bongori	2	GCF_000439255	GCF_000252995
+s__Nostoc_sp_PCC_7120	1	GCF_000009705
+s__Salmonella_phage_PhiSH19	1	PRJNA181236
+s__Helleborus_net_necrosis_virus	1	PRJNA33877
+s__Melioribacter_roseus	1	GCF_000279145
+s__Pegivirus_A	1	PRJNA14647
+s__Squash_vein_yellowing_virus	1	PRJNA29107
+s__Enterobacter_lignolyticus	1	GCF_000164865
+s__Cryphonectria_hypovirus_4	1	PRJNA15007
+s__Clostridium_sp_HGF2	1	GCF_000183585
+s__Cryphonectria_hypovirus_1	1	PRJNA14664
+s__Cryphonectria_hypovirus_3	1	PRJNA14690
+s__Cryphonectria_hypovirus_2	1	PRJNA14754
+s__Bifidobacterium_pseudolongum	1	GCF_000421365
+s__Sphingobium_quisquiliarum	1	GCF_000445065
+s__Deinococcus_radiodurans	1	GCF_000008565
+s__Klebsiella_phage_KP27	1	PRJNA185314
+s__Hylemonella_gracilis	1	GCF_000211835
+s__Staphylococcus_haemolyticus	2	GCF_000009865	GCF_000261465
+s__Senecio_yellow_mosaic_virus	1	PRJNA15233
+s__Chicory_yellow_mottle_virus_satellite_RNA	1	PRJNA14988
+s__Carnation_ringspot_virus	1	PRJNA14753
+s__Alloprevotella_rava	1	GCF_000234115
+s__Bundibugyo_ebolavirus	1	PRJNA51245
+s__Salmonella_phage_Vi06	1	PRJNA64609
+s__Streptococcus_intermedius	8	GCF_000463355	GCF_000234015	GCF_000306805	GCF_000258445	GCF_000313655	GCF_000234035	GCF_000413475	GCF_000463385
+s__Serratia_sp_S4	1	GCF_000347995
+s__Eubacterium_biforme	1	GCF_000156655
+s__Propionibacterium_phage_PHL037M02	1	PRJNA219116
+s__Tulip_virus_X	1	PRJNA14865
+s__Burkholderia_bryophila	1	GCF_000383275
+s__Synergistes_sp_3_1_syn1	1	GCF_000238615
+s__Frankia_alni	1	GCF_000058485
+s__Malvastrum_yellow_mosaic_virus	1	PRJNA18131
+s__Bartonella_schoenbuchensis	1	GCF_000385435
+s__zeta_proteobacterium_SCGC_AB_133_C04	1	GCF_000379285
+s__Fragaria_chiloensis_latent_virus	1	PRJNA15122
+s__Photobacterium_sp_SKA34	1	GCF_000153325
+s__Bacteroides_faecis	1	GCF_000226135
+s__Rodent_herpesvirus_Peru	1	PRJNA62491
+s__Desulfovibrio_africanus	2	GCF_000344315	GCF_000212675
+s__Polaribacter_franzmannii	1	GCF_000377865
+s__Pseudoalteromonas_ruthenica	1	GCF_000336495
+s__Pseudochrobactrum_sp_AO18b	1	GCF_000409565
+s__Anaerotruncus_sp_G3_2012	1	GCF_000403395
+s__Rhizobium_sp_BR816	1	GCF_000378985
+s__Streptomyces_albus	2	GCF_000156475	GCF_000359525
+s__Lentibacillus_jeotgali	1	GCF_000224785
+s__Natrinema_altunense	1	GCF_000337155
+s__Rickettsia_rickettsii	8	GCF_000283835	GCF_000283775	GCF_000018225	GCF_000283795	GCF_000283935	GCF_000283815	GCF_000017445	GCF_000283955
+s__Brucella_sp_UK38_05	1	GCF_000367125
+s__Halomonas_jeotgali	1	GCF_000334215
+s__Acinetobacter_indicus	2	GCF_000413875	GCF_000488255
+s__Providencia_phage_Redjac	1	PRJNA177540
+s__Adlercreutzia_equolifaciens	1	GCF_000478885
+s__Coprococcus_eutactus	1	GCF_000154425
+s__Pelargonium_zonate_spot_virus	1	PRJNA14774
+s__Apricot_latent_virus	1	PRJNA61427
+s__Streptococcus_phage_2972	1	PRJNA15254
+s__Acinetobacter_beijerinckii	2	GCF_000369005	GCF_000368985
+s__Saccharomonospora_paurometabolica	1	GCF_000231035
+s__Avian_encephalomyelitis_virus	1	PRJNA15360
+s__Roseburia_inulinivorans	1	GCF_000174195
+s__Barnesiella_intestinihominis	1	GCF_000296465
+s__Lactobacillus_fructivorans	1	GCF_000185465
+s__Glaciecola_punicea	1	GCF_000252165
+s__Brazoran_virus	1	PRJNA214783
+s__Peste_des_petits_ruminants_virus	1	PRJNA15499
+s__Corynebacterium_phage_P1201	1	PRJNA20781
+s__Crocuta_crocuta_papillomavirus_1	1	PRJNA174774
+s__Cyclobacterium_marinum	1	GCF_000222485
+s__Natronococcus_jeotgali	1	GCF_000337695
+s__Ruminococcus_sp_5_1_39BFAA	1	GCF_000159975
+s__Streptococcus_sp_GMD4S	1	GCF_000296955
+s__Torque_teno_virus_8	1	PRJNA48167
+s__Torque_teno_virus_7	1	PRJNA48159
+s__Torque_teno_virus_6	1	PRJNA48187
+s__Fenneropenaeus_chinensis_hepatopancreatic_densovirus	1	PRJNA51177
+s__Torque_teno_virus_4	1	PRJNA48137
+s__Torque_teno_virus_3	1	PRJNA48161
+s__Torque_teno_virus_2	1	PRJNA51893
+s__Torque_teno_virus_1	1	PRJNA15247
+s__Myroides_injenensis	1	GCF_000246945
+s__Brucella_neotomae	1	GCF_000158715
+s__Mycobacterium_phage_vB_MapS_FF47	1	PRJNA197296
+s__Acinetobacter_sp_NIPH_236	1	GCF_000367965
+s__Caldanaerobacter_subterraneus	3	GCF_000156275	GCF_000007085	GCF_000473865
+s__Flavobacteria_bacterium_BBFL7	1	GCF_000153385
+s__Thiothrix_disciformis	1	GCF_000371925
+s__Xanthomonas_phage_Cf1c	1	PRJNA14329
+s__Ornithogalum_mosaic_virus	1	PRJNA179428
+s__Candidatus_Protochlamydia_amoebophila	1	GCF_000011565
+s__Methanoculleus_bourgensis	1	GCF_000304355
+s__Xanthomonas_oryzae	6	GCF_000007385	GCF_000212755	GCF_000168315	GCF_000212775	GCF_000010025	GCF_000019585
+s__Lactobacillus_curvatus	1	GCF_000235705
+s__Brucella_phage_Pr	1	PRJNA181064
+s__Campylobacter_phage_CP30A	1	PRJNA177545
+s__Duck_hepatitis_B_virus	1	PRJNA14576
+s__Ralstonia_pickettii	4	GCF_000471925	GCF_000020205	GCF_000372665	GCF_000023425
+s__Chayote_yellow_mosaic_virus	1	PRJNA15193
+s__Callitrichine_herpesvirus_3	1	PRJNA14324
+s__Acinetobacter_tjernbergiae	2	GCF_000488175	GCF_000374425
+s__Rhodopirellula_europaea	2	GCF_000346315	GCF_000338295
+s__Treponema_sp_JC4	1	GCF_000260795
+s__Sorghum_chlorotic_spot_virus	1	PRJNA14835
+s__Pseudoalteromonas_marina	1	GCF_000238335
+s__Peach_latent_mosaic_viroid	1	PRJNA14772
+s__Tomato_infectious_chlorosis_virus	1	PRJNA40419
+s__Vibrio_kanaloae	1	GCF_000272165
+s__Thermoplasma_acidophilum	1	GCF_000195915
+s__Venezuelan_equine_encephalitis_virus	1	PRJNA15302
+s__Bacteriovorax_marinus	1	GCF_000210915
+s__Bacteroides_phage_B40_8	1	PRJNA31249
+s__Acinetobacter_sp_ATCC_27244	1	GCF_000156555
+s__Bacillus_phage_Troll	1	PRJNA215668
+s__Lactobacillus_antri	1	GCF_000160835
+s__Microcystis_aeruginosa_phage_Ma_LMM01	1	PRJNA18127
+s__Thottapalayam_virus	1	PRJNA29841
+s__Citrus_psorosis_virus	1	PRJNA15060
+s__Citreicella_sp_SE45	1	GCF_000161755
+s__Ceratocystis_resinifera_virus_1	1	PRJNA29901
+s__Acetobacteraceae_bacterium_AT_5844	1	GCF_000245075
+s__Thermus_phage_phi_OH2	1	PRJNA212950
+s__Lactobacillus_phage_LP65	1	PRJNA14547
+s__Prochlorococcus_phage_Syn33	1	PRJNA64707
+s__Rickettsia_bellii	2	GCF_000012385	GCF_000018245
+s__Kalanchoe_latent_virus	1	PRJNA39583
+s__Campylobacter_curvus	2	GCF_000017465	GCF_000376325
+s__Cyanophage_Syn30	1	PRJNA198437
+s__Tomato_leaf_curl_Nigeria_virus_Nigeria_2006	1	PRJNA34815
+s__Caprine_arthritis_encephalitis_virus	1	PRJNA15243
+s__Hepatitis_delta_virus	1	PRJNA15032
+s__Photobacterium_angustum	1	GCF_000153265
+s__Helicobacter_bizzozeronii	2	GCF_000263275	GCF_000237285
+s__Nocardiopsis_halophila	1	GCF_000341245
+s__Neisseria_wadsworthii	1	GCF_000227765
+s__Idiomarina_baltica	1	GCF_000152885
+s__Streptomyces_phage_phiHau3	1	PRJNA177522
+s__Aeromonas_phage_phiAS5	1	PRJNA59729
+s__Aeromonas_phage_phiAS4	1	PRJNA59727
+s__Aeromonas_phage_phiAS7	1	PRJNA181221
+s__Hop_latent_viroid	1	PRJNA14970
+s__Shewanella_sp_W3_18_1	1	GCF_000015185
+s__Peptoniphilus_timonensis	1	GCF_000312025
+s__Yersinia_phage_phiA1122	1	PRJNA14332
+s__Bacteroides_finegoldii	3	GCF_000269545	GCF_000156195	GCF_000304195
+s__Ruminococcus_sp_JC304	1	GCF_000285855
+s__Methylobacterium_sp_77	1	GCF_000372825
+s__Treponema_phagedenis	1	GCF_000187105
+s__Pseudomonas_alcaligenes	2	GCF_000455385	GCF_000467105
+s__Leptonema_illini	1	GCF_000243335
+s__Chickpea_chlorosis_virus	1	PRJNA60627
+s__Aminomonas_paucivorans	1	GCF_000165795
+s__Spiroplasma_taiwanense	1	GCF_000439435
+s__Bhendi_yellow_vein_India_betasatellite	1	PRJNA61557
+s__Prochlorococcus_marinus	13	GCF_000015705	GCF_000015965	GCF_000015645	GCF_000011465	GCF_000007925	GCF_000011485	GCF_000015685	GCF_000012465	GCF_000018585	GCF_000158595	GCF_000015665	GCF_000012645	GCF_000018065
+s__Vibrio_sp_712i1	1	GCF_000316925
+s__Corynebacterium_terpenotabidum	1	GCF_000418365
+s__Alicycliphilus_denitrificans	2	GCF_000204645	GCF_000179015
+s__Myroides_odoratus	1	GCF_000243275
+s__Providencia_alcalifaciens	2	GCF_000173415	GCF_000314875
+s__Tomato_yellow_leaf_curl_Guangdong_virus	1	PRJNA17801
+s__Shuttleworthia_satelles	1	GCF_000160115
+s__Haliangium_ochraceum	1	GCF_000024805
+s__Carrot_yellow_leaf_virus	1	PRJNA39585
+s__Garlic_common_latent_virus	1	PRJNA78925
+s__Streptomyces_sp_S4	1	GCF_000297715
+s__Orgyia_pseudotsugata_multiple_nucleopolyhedrovirus	1	PRJNA14084
+s__Helicoverpa_zea_single_nucleopolyhedrovirus	1	PRJNA14148
+s__Tobacco_ringspot_virus	1	PRJNA14933
+s__Corynebacterium_amycolatum	1	GCF_000173655
+s__Mopeia_virus	1	PRJNA15037
+s__Acidimicrobium_ferrooxidans	1	GCF_000023265
+s__Leifsonia_rubra	1	GCF_000477555
+s__Cadicivirus_A	1	PRJNA201444
+s__Streptomyces_sp_SS	1	GCF_000302615
+s__Leptolyngbya_sp_PCC_6406	1	GCF_000332095
+s__Tetrasphaera_phage_TJE1	1	PRJNA184167
+s__Pestivirus_strain_Aydin_04_TR	1	PRJNA176618
+s__Spring_viraemia_of_carp_virus	1	PRJNA14726
+s__Bacillus_sp_WBUNB004	1	GCF_000319755
+s__Halorubrum_sp_T3	1	GCF_000296615
+s__Tomato_leaf_curl_Sinaloa_virus	1	PRJNA19971
+s__Mesorhizobium_australicum	1	GCF_000230995
+s__Porphyrobacter_sp_AAP82	1	GCF_000331285
+s__Haloferax_gibbonsii	1	GCF_000336775
+s__Yellow_fever_virus	1	PRJNA15284
+s__Pteronotus_polyomavirus	1	PRJNA185190
+s__Cucurbit_leaf_crumple_virus	1	PRJNA14121
+s__Leptospira_wolbachii	1	GCF_000332515
+s__Stachytarpheta_leaf_curl_virus	1	PRJNA14412
+s__Desulfarculus_baarsii	1	GCF_000143965
+s__Canine_minute_virus	1	PRJNA15465
+s__endosymbiont_of_Tevnia_jerichonana	1	GCF_000224925
+s__Leptospira_vanthielii	1	GCF_000332455
+s__Desulfotomaculum_nigrificans	1	GCF_000189755
+s__Apple_chlorotic_leaf_spot_virus	1	PRJNA14658
+s__Xylanimonas_cellulosilytica	1	GCF_000024965
+s__Joostella_marina	1	GCF_000260115
+s__Escherichia_phage_Lw1	1	PRJNA206486
+s__Haloarcula_amylolytica	1	GCF_000336615
+s__Agrobacterium_sp_224MFTsu3_1	1	GCF_000384555
+s__Ignisphaera_aggregans	1	GCF_000145985
+s__Acinetobacter_sp_NIPH_758	1	GCF_000368345
+s__Lysinibacillus_fusiformis	2	GCF_000178135	GCF_000313955
+s__Borna_disease_virus	1	PRJNA14675
+s__Thioalkalivibrio_sp_ALE31	1	GCF_000377405
+s__Capnocytophaga_sp_oral_taxon_329	1	GCF_000213295
+s__Papaya_leaf_curl_China_virus	1	PRJNA14536
+s__Actinomyces_sp_oral_taxon_181	1	GCF_000318335
+s__Methanocaldococcus_jannaschii	1	GCF_000091665
+s__Salicola_phage_CGphi29	1	PRJNA195485
+s__Tomato_yellow_leaf_curl_Thailand_virus	1	PRJNA15179
+s__Chrysodeixis_chalcites_nucleopolyhedrovirus	1	PRJNA15469
+s__Human_parvovirus_B19	1	PRJNA14090
+s__Actinoplanes_missouriensis	1	GCF_000284295
+s__Algerian_watermelon_mosaic_virus	1	PRJNA29883
+s__Crocosphaera_watsonii	2	GCF_000235665	GCF_000167195
+s__Escherichia_phage_rv5	1	PRJNA30613
+s__Clostridium_tunisiense	1	GCF_000300195
+s__Thiothrix_flexilis	1	GCF_000380185
+s__Zinnia_leaf_curl_virus_associated_DNA_beta	1	PRJNA14538
+s__Leptospira_santarosai	24	GCF_000217455	GCF_000306615	GCF_000332435	GCF_000244795	GCF_000244655	GCF_000244555	GCF_000348015	GCF_000306575	GCF_000246375	GCF_000244575	GCF_000306475	GCF_000332395	GCF_000346915	GCF_000244475	GCF_000244615	GCF_000243835	GCF_000244735	GCF_000343395	GCF_000246395	GCF_000346995	GCF_000306455	GCF_000216275	GCF_000313175	GCF_000244675
+s__Simkania_negevensis	1	GCF_000237205
+s__Caulobacter_sp_AP07	1	GCF_000281955
+s__Orchid_fleck_virus	1	PRJNA19969
+s__Desulfonatronospira_thiodismutans	1	GCF_000174435
+s__Aconitum_latent_virus	1	PRJNA15382
+s__Brucella_sp_NVSL_07_0026	1	GCF_000163135
+s__Zygosaccharomyces_bailii_virus_Z	1	PRJNA14823
+s__Woodsholea_maritima	1	GCF_000382325
+s__Sinorhizobium_fredii	3	GCF_000283895	GCF_000018545	GCF_000265205
+s__Salmonella_phage_SETP13	1	PRJNA226727
+s__Flavobacteria_bacterium_MS024_2A	1	GCF_000173095
+s__Methylobacterium_mesophilicum	1	GCF_000364445
+s__Porphyromonas_catoniae	1	GCF_000318215
+s__Magnetospirillum_magneticum	1	GCF_000009985
+s__Lachnospiraceae_bacterium_oral_taxon_082	1	GCF_000242315
+s__Flavobacterium_antarcticum	1	GCF_000419685
+s__Pseudomonas_phage_JG024	1	PRJNA181067
+s__Babesia_bovis	1	GCA_000165395
+s__Paenibacillus_curdlanolyticus	1	GCF_000179615
+s__Sclerotinia_sclerotiorum_dsRNA_mycovirus_L	1	PRJNA165743
+s__Listeria_phage_B054	1	PRJNA20797
+s__Collinsella_tanakaei	1	GCF_000225705
+s__Mycobacterium_gilvum	2	GCF_000184435	GCF_000016365
+s__Luffa_begomovirus_associated_DNA_beta	1	PRJNA16795
+s__Sida_mottle_virus	1	PRJNA14255
+s__Capnocytophaga_sp_oral_taxon_324	1	GCF_000318315
+s__Thauera_sp_MZ1T	1	GCF_000021765
+s__Capnocytophaga_sp_oral_taxon_326	1	GCF_000318295
+s__Strawberry_mild_yellow_edge_virus	1	PRJNA14999
+s__Tomato_leaf_curl_Cebu_virus	1	PRJNA28987
+s__Riemerella_anatipestifer	7	GCF_000331695	GCF_000183155	GCF_000252855	GCF_000184135	GCF_000191565	GCF_000321285	GCF_000295655
+s__Croton_yellow_vein_virus	1	PRJNA51789
+s__Pepino_mosaic_virus	1	PRJNA15125
+s__Methanosarcina_acetivorans	1	GCF_000007345
+s__Actinobacillus_succinogenes	1	GCF_000017245
+s__Bradyrhizobium_sp_ORS_278	1	GCF_000026145
+s__Turneriella_parva	1	GCF_000266885
+s__Staphylococcus_vitulinus	1	GCF_000286335
+s__Torque_teno_zalophus_virus_1	1	PRJNA34735
+s__Pseudomonas_phage_F116	1	PRJNA15127
+s__Thermodesulfovibrio_yellowstonii	1	GCF_000020985
+s__Okra_leaf_curl_Mali_virus_satellite_DNA_beta	1	PRJNA20323
+s__Yersinia_phage_Yepe2	1	PRJNA62965
+s__Allpahuayo_virus	1	PRJNA28323
+s__Bacteroides_massiliensis	3	GCF_000403195	GCF_000382445	GCF_000373085
+s__Ribgrass_mosaic_virus	1	PRJNA14980
+s__Microbacterium_sp_11MF	1	GCF_000383475
+s__Plutella_xylostella_multiple_nucleopolyhedrovirus	1	PRJNA17671
+s__Candidatus_Blochmannia_vafer	1	GCF_000185985
+s__Synechococcus_phage_S_RSM4	1	PRJNA39923
+s__Sugarcane_streak_virus	1	PRJNA14177
+s__Salmonella_phage_PVP_SE1	1	PRJNA74359
+s__Vanderwaltozyma_polyspora	1	GCA_000150035
+s__Cotton_leaf_curl_Multan_virus	2	PRJNA14242	PRJNA33487
+s__Streptomyces_tsukubaensis	1	GCF_000297155
+s__Rhodococcus_sp_AW25M09	1	GCF_000333955
+s__Coprococcus_catus	1	GCF_000210555
+s__Riemerella_columbina	1	GCF_000374405
+s__Vibrio_sp_Ex25	2	GCF_000152485	GCF_000024825
+s__Mycobacterium_phage_Troll4	1	PRJNA32011
+s__Measles_virus	1	PRJNA15025
+s__Clostridium_papyrosolvens	2	GCF_000421965	GCF_000175795
+s__Phytophthora_infestans_RNA_virus_1	1	PRJNA40329
+s__Prevotella_histicola	1	GCF_000234055
+s__Tobacco_vein_mottling_virus	1	PRJNA15348
+s__Maize_white_line_mosaic_virus	1	PRJNA19755
+s__Snake_adenovirus_A	1	PRJNA27899
+s__Sodalis_glossinidius	1	GCF_000010085
+s__Streptococcus_oligofermentans	1	GCF_000385925
+s__Geobacillus_sp_WSUCF1	1	GCF_000422025
+s__Pseudomonas_sp_CFT9	1	GCF_000416255
+s__Tomato_leaf_curl_Mayotte_virus	1	PRJNA15212
+s__Cronobacter_phage_vB_CskP_GAP227	1	PRJNA185316
+s__Mycobacterium_phage_CrimD	1	PRJNA51669
+s__Phaeobacter_inhibens	1	GCF_000154765
+s__Listeria_phage_2389	1	PRJNA14142
+s__Chryseobacterium_sp_CF314	1	GCF_000282115
+s__Vibrio_phage_douglas_12A4	1	PRJNA198432
+s__Butyricicoccus_pullicaecorum	1	GCF_000398925
+s__Soybean_chlorotic_mottle_virus	1	PRJNA14594
+s__Thermotoga_sp_EMP	1	GCF_000294555
+s__Bacillus_sp_1NLA3E	1	GCF_000242895
+s__Corchorus_yellow_vein_mosaic_virus	1	PRJNA192607
+s__Avian_endogenous_retrovirus_EAV_HP	1	PRJNA15213
+s__Hemileuca_sp_nucleopolyhedrovirus	1	PRJNA214353
+s__Candidatus_Azobacteroides_pseudotrichonymphae	1	GCF_000010645
+s__Leadbetterella_byssophila	1	GCF_000166395
+s__Paenibacillus_massiliensis	1	GCF_000377505
+s__Holospora_obtusa	1	GCF_000469665
+s__Fusobacterium_nucleatum	22	GCF_000158255	GCF_000273625	GCF_000218645	GCF_000479185	GCF_000178895	GCF_000158275	GCF_000182945	GCF_000220825	GCF_000400875	GCF_000162235	GCF_000279975	GCF_000242975	GCF_000273605	GCF_000153625	GCF_000479225	GCF_000158535	GCF_000218655	GCF_000163915	GCF_000479205	GCF_000234075	GCF_000007325	GCF_000162355
+s__Martelella_mediterranea	1	GCF_000376125
+s__Geobacillus_sp_Y412MC61	1	GCF_000024705
+s__Avian_carcinoma_virus	1	PRJNA14632
+s__Catelliglobosispora_koreensis	1	GCF_000379685
+s__Enhydrobacter_aerosaccus	1	GCF_000175915
+s__Dobrava_Belgrade_virus	1	PRJNA15319
+s__Fischerella_thermalis	1	GCF_000317225
+s__Thermoanaerobacter_ethanolicus	2	GCF_000192295	GCF_000175815
+s__Ilumatobacter_coccineus	1	GCF_000348785
+s__Microbacterium_sp_oral_taxon_186	1	GCF_000411455
+s__Macroptilium_yellow_mosaic_virus	1	PRJNA14598
+s__Equine_polyomavirus	1	PRJNA167666
+s__Parabacteroides_merdae	3	GCF_000154105	GCF_000307495	GCF_000307345
+s__Burkholderia_sp_JPY347	1	GCF_000373005
+s__Prevotella_saccharolytica	1	GCF_000318195
+s__Microbacterium_sp_292MF	1	GCF_000380605
+s__Pseudomonas_sp_2_1_26	1	GCF_000233495
+s__Rotavirus_D	1	PRJNA52635
+s__Citrus_dwarfing_viroid	1	PRJNA14974
+s__Lachnospiraceae_bacterium_7_1_58FAA	1	GCF_000242155
+s__Alphapapillomavirus_3	1	PRJNA15451
+s__Alphapapillomavirus_1	1	PRJNA15508
+s__Alphapapillomavirus_6	1	PRJNA15510
+s__Alphapapillomavirus_7	1	PRJNA15506
+s__Alphapapillomavirus_4	1	PRJNA15512
+s__Alphapapillomavirus_5	1	PRJNA15507
+s__Streptococcus_marimammalium	1	GCF_000380045
+s__Enterobacter_sp_R4_368	1	GCF_000410515
+s__Alphapapillomavirus_8	1	PRJNA15450
+s__Alphapapillomavirus_9	1	PRJNA15505
+s__Asystasia_begomovirus_1	1	PRJNA81011
+s__Mannheimia_haemolytica	16	GCF_000443105	GCF_000153645	GCF_000443205	GCF_000176255	GCF_000443225	GCF_000176275	GCF_000349785	GCF_000427275	GCF_000376645	GCF_000439735	GCF_000422145	GCF_000349765	GCF_000341635	GCF_000443085	GCF_000443185	GCF_000422095
+s__Bacteroides_sp_1_1_6	1	GCF_000159875
+s__Pseudomonas_phage_JBD5	1	PRJNA188543
+s__Vibrio_phage_VP5	1	PRJNA14382
+s__Vibrio_phage_VP2	1	PRJNA14473
+s__Streptococcus_sp_GMD2S	1	GCF_000296995
+s__Acinetobacter_sp_TG2027	1	GCF_000302435
+s__Deinococcus_geothermalis	1	GCF_000196275
+s__Maize_fine_streak_virus	1	PRJNA15216
+s__Mycobacterium_phage_Bethlehem	1	PRJNA20945
+s__Starkeya_novella	1	GCF_000092925
+s__Penaeid_shrimp_infectious_myonecrosis_virus	1	PRJNA16652
+s__Maruca_vitrata_nucleopolyhedrovirus	1	PRJNA18533
+s__Vibrio_sp_16	1	GCF_000158115
+s__Natronobacterium_gregoryi	2	GCF_000230715	GCF_000337655
+s__Narcissus_mosaic_virus	1	PRJNA14660
+s__Deinococcus_proteolyticus	1	GCF_000190555
+s__Listeria_phage_A511	1	PRJNA20793
+s__Veillonella_dispar	1	GCF_000160015
+s__Sclerotinia_sclerotiorum_partitivirus_S	1	PRJNA39595
+s__Sida_yellow_vein_virus_satellite_DNA_beta	1	PRJNA15562
+s__Nocardiopsis_sp_CNS639	1	GCF_000381685
+s__Lentisphaera_araneosa	1	GCF_000170755
+s__Henriciella_marina	1	GCF_000376805
+s__Vibrio_albensis	1	GCF_000174235
+s__Alteromonas_macleodii	12	GCF_000300175	GCF_000439595	GCF_000310085	GCF_000439575	GCF_000299955	GCF_000020585	GCF_000439515	GCF_000299995	GCF_000439475	GCF_000172635	GCF_000439535	GCF_000439555
+s__Porphyromonas_uenonis	1	GCF_000174775
+s__Propionibacterium_sp_HGH0353	1	GCF_000413335
+s__Clostridium_sp_L2_50	1	GCF_000154245
+s__Staphylococcus_phage_GH15	1	PRJNA181069
+s__Thermus_sp_CCB_US3_UF1	1	GCF_000236585
+s__Culex_pipiens_densovirus	1	PRJNA37995
+s__zeta_proteobacterium_SCGC_AB_137_J06	1	GCF_000379245
+s__Haemophilus_haemolyticus	1	GCF_000262285
+s__Gammapapillomavirus_10	1	PRJNA49377
+s__Mamastrovirus_1	1	PRJNA15436
+s__Polynucleobacter_necessarius	2	GCF_000016345	GCF_000019745
+s__Neisseria_cinerea	1	GCF_000173895
+s__Ajellomyces_capsulatus	1	GCA_000149585
+s__Janthinobacterium_sp_CG3	1	GCF_000344615
+s__Xanthomonas_citri_phage_CP2	1	PRJNA188546
+s__Modestobacter_multiseptatus	1	GCF_000306785
+s__Desulfitobacterium_dehalogenans	1	GCF_000243155
+s__Dickeya_zeae	4	GCF_000382585	GCF_000400525	GCF_000264075	GCF_000023565
+s__Jaagsiekte_sheep_retrovirus	1	PRJNA14665
+s__Bradyrhizobium_sp_WSM1253	1	GCF_000244935
+s__Talaromyces_marneffei	1	GCA_000001985
+s__Maize_streak_Reunion_virus	1	PRJNA165745
+s__Nocardia_sp_BMG111209	1	GCF_000381925
+s__Mycobacterium_phage_Kostya	1	PRJNA30695
+s__Miniopterus_bat_coronavirus_HKU8	1	PRJNA29245
+s__Tobacco_yellow_crinkle_virus	1	PRJNA67693
+s__Tomato_leaf_curl_Yemen_betasatellite	1	PRJNA177643
+s__Clostridium_ramosum	1	GCF_000154485
+s__Pea_seed_borne_mosaic_virus	1	PRJNA15295
+s__Candidatus_Amoebophilus_asiaticus	1	GCF_000020565
+s__Candidatus_Tremblaya_princeps	2	GCF_000219195	GCF_000220965
+s__Ralstonia_phage_RSS1	1	PRJNA18291
+s__Beet_western_yellows_ST9_associated_virus	1	PRJNA14910
+s__Maracuja_mosaic_virus	1	PRJNA18531
+s__Streptococcus_sp_oral_taxon_058	1	GCF_000235485
+s__Streptococcus_sp_oral_taxon_056	1	GCF_000220065
+s__Enterobacteria_phage_HK225	1	PRJNA183140
+s__Nocardiopsis_valliformis	1	GCF_000340985
+s__Penicillium_chrysogenum_virus	1	PRJNA16141
+s__Synechococcus_phage_S_CAM1	1	PRJNA195484
+s__Synechococcus_phage_S_CAM8	1	PRJNA209065
+s__Corynebacterium_pilosum	1	GCF_000373805
+s__Mouse_parvovirus_2	1	PRJNA17125
+s__Acinetobacter_rudis	1	GCF_000413895
+s__Sphaerobacter_thermophilus	1	GCF_000024985
+s__Brevibacillus_agri	1	GCF_000328345
+s__Pyrobaculum_islandicum	1	GCF_000015205
+s__Campylobacter_upsaliensis	2	GCF_000185345	GCF_000167395
+s__Tomato_leaf_curl_Philippines_virus	1	PRJNA14297
+s__Potato_yellow_mosaic_Panama_virus	1	PRJNA14013
+s__Tioman_virus	1	PRJNA14846
+s__Mokola_virus	1	PRJNA15013
+s__Rhodanobacter_thiooxydans	1	GCF_000264375
+s__Pseudomonas_plecoglossicida	1	GCF_000412715
+s__Marine_RNA_virus_SOG	1	PRJNA20647
+s__Staphylococcus_phage_ROSA	1	PRJNA15274
+s__Weissella_phage_phiYS61	1	PRJNA171973
+s__Streptomyces_sp_CNB091	1	GCF_000377965
+s__Mannheimia_phage_phiMHaA1	1	PRJNA17103
+s__Strawberry_chlorotic_fleck_associated_virus	1	PRJNA17741
+s__Shamonda_virus	1	PRJNA173358
+s__Mycobacterium_phage_Phaedrus	1	PRJNA30697
+s__Cotton_leaf_curl_virus_betasatellite	1	PRJNA162497
+s__Mycobacterium_phage_Breezona	1	PRJNA206035
+s__Ross_River_virus	1	PRJNA15314
+s__Prochlorococcus_sp_W9	1	GCF_000291925
+s__Streptomyces_sp_SPB78	1	GCF_000158855
+s__Streptomyces_sp_SPB74	1	GCF_000154905
+s__Brassica_yellows_virus	1	PRJNA73689
+s__Streptococcus_phage_7201	1	PRJNA14051
+s__Alcaligenes_faecalis	1	GCF_000275465
+s__Bat_coronavirus_CDPHE15_USA_2006	1	PRJNA215863
+s__Okra_enation_leaf_curl_virus	1	PRJNA61775
+s__Azospira_oryzae	1	GCF_000236665
+s__Holdemania_filiformis	1	GCF_000157995
+s__Legionella_drancourtii	1	GCF_000162755
+s__Flavobacteriaceae_bacterium_HQM9	1	GCF_000218485
+s__Natronomonas_pharaonis	1	GCF_000026045
+s__Sunn_hemp_leaf_distortion_virus	1	PRJNA39609
+s__Canine_papillomavirus_10	1	PRJNA74355
+s__Treponema_azotonutricium	1	GCF_000214355
+s__Bacillus_phage_MG_B1	1	PRJNA206485
+s__Parabacteroides_goldsteinii	2	GCF_000307395	GCF_000403825
+s__Prevotella_copri	1	GCF_000157935
+s__Infectious_hypodermal_and_hematopoietic_necrosis_virus	1	PRJNA14436
+s__Salmonella_phage_vB_SenS_Ent1	1	PRJNA181993
+s__Halorubrum_californiense	1	GCF_000336875
+s__Clostridiaceae_bacterium_JC118	1	GCF_000313565
+s__Sweetpotato_badnavirus_A	1	PRJNA68017
+s__Acinetobacter_tandoii	1	GCF_000400735
+s__Bat_coronavirus_BtCoV_133_2005	1	PRJNA17585
+s__Advenella_kashmirensis	2	GCF_000219915	GCF_000506985
+s__Lachnospiraceae_bacterium_4_1_37FAA	1	GCF_000191805
+s__Pantoea_vagans	1	GCF_000148935
+s__Microlunatus_phosphovorus	1	GCF_000270245
+s__Candidatus_Moranella_endobia	2	GCF_000219175	GCF_000364725
+s__Ureaplasma_parvum	4	GCF_000006625	GCF_000019345	GCF_000171355	GCF_000169895
+s__White_ash_mosaic_virus	1	PRJNA32671
+s__Opitutaceae_bacterium_TAV5	1	GCF_000242935
+s__Bacillus_sp_M_2_6	1	GCF_000264255
+s__Oscillibacter_valericigenes	1	GCF_000283575
+s__Lactobacillus_sp_7_1_47FAA	1	GCF_000227195
+s__Leptospira_alstoni	2	GCF_000347175	GCF_000332555
+s__Clostridium_botulinum	29	GCF_000171055	GCF_000020285	GCF_000204395	GCF_000307635	GCF_000204375	GCF_000063585	GCF_000439615	GCF_000017025	GCF_000171095	GCF_000175335	GCF_000309805	GCF_000092345	GCF_000353835	GCF_000020345	GCF_000171075	GCF_000204565	GCF_000175395	GCF_000439815	GCF_000017045	GCF_000019305	GCF_000219255	GCF_000022765	GCF_000019545	GCF_000307655	GCF_000253195	GCF_000020165	GCF_000439635	GCF_000439655	GCF_000017065
+s__Sporolactobacillus_inulinus	1	GCF_000222445
+s__Streptococcus_thoraltensis	1	GCF_000380145
+s__Turdivirus_1	1	PRJNA51587
+s__Turdivirus_2	1	PRJNA51589
+s__Mycoplasma_capricolum	2	GCF_000012765	GCF_000192395
+s__Fervidicoccus_fontis	1	GCF_000258425
+s__Alistipes_sp_JC136	1	GCF_000285455
+s__Parabacteroides_sp_D13	1	GCF_000162275
+s__Cyclovirus_VN	1	PRJNA210797
+s__Geobacillus_virus_E2	1	PRJNA19797
+s__Enterobacterial_phage_mEp390	1	PRJNA183154
+s__Propionibacterium_phage_P100_A	1	PRJNA177535
+s__Leptospira_sp_serovar_Kenya	1	GCF_000347195
+s__Honeysuckle_yellow_vein_virus	1	PRJNA15224
+s__Sacbrood_virus	1	PRJNA14688
+s__Thermotoga_sp_RQ2	1	GCF_000019625
+s__Turkey_adenovirus_4	1	PRJNA225922
+s__Leptospira_borgpetersenii	21	GCF_000306375	GCF_000346975	GCF_000355135	GCF_000306415	GCF_000342885	GCF_000353225	GCF_000244535	GCF_000246695	GCF_000246555	GCF_000244215	GCF_000244495	GCF_000244075	GCF_000013945	GCF_000306335	GCF_000244255	GCF_000013965	GCF_000306675	GCF_000243795	GCF_000244835	GCF_000306315	GCF_000243775
+s__Pantoea_sp_A4	1	GCF_000295955
+s__Cacao_swollen_shoot_virus	1	PRJNA14534
+s__Propionibacterium_phage_B5	1	PRJNA14163
+s__Clostridium_tyrobutyricum	3	GCF_000359585	GCF_000392375	GCF_000332015
+s__Edwardsiella_phage_MSW_3	1	PRJNA185428
+s__Serratia_phage_phiMAM1	1	PRJNA185777
+s__Burkholderia_phage_BcepNY3	1	PRJNA19963
+s__Humibacter_albus	1	GCF_000421825
+s__Carnobacterium_sp_AT7	1	GCF_000171855
+s__Coprococcus_sp_HPP0048	1	GCF_000411355
+s__Thiomicrospira_crunogena	1	GCF_000012605
+s__SAR324_cluster_bacterium_SCGC_AAA240_J09	1	GCF_000213355
+s__Honeysuckle_yellow_vein_Kagoshima_virus	1	PRJNA18657
+s__Pseudomonas_phage_OBP	1	PRJNA81003
+s__Thermobacillus_composti	1	GCF_000227705
+s__Enterobacteria_phage_P22	1	PRJNA14478
+s__Flavobacterium_sp_WG21	1	GCF_000335775
+s__Tomato_leaf_curl_Ranchi_virus	1	PRJNA89399
+s__Patulibacter_medicamentivorans	1	GCF_000240225
+s__Nitrosospira_multiformis	1	GCF_000196355
+s__Rhodothermus_phage_RM378	1	PRJNA14420
+s__Peru_tomato_mosaic_virus	1	PRJNA15406
+s__Bacillus_phage_Wip1	1	PRJNA215653
+s__Halobacterium_sp_DL1	1	GCF_000230955
+s__Acidithiobacillus_sp_GGI_221	1	GCF_000179815
+s__Pseudomonas_phage_JG004	1	PRJNA181068
+s__Streptomyces_cattleya	2	GCF_000240165	GCF_000237305
+s__Neospora_caninum	1	GCA_000208865
+s__Xanthomonas_sp_NCPPB1131	1	GCF_000226895
+s__Xanthomonas_sp_NCPPB1132	1	GCF_000226915
+s__Lysinibacillus_boronitolerans	1	GCF_000286375
+s__Lambdapapillomavirus_1	1	PRJNA14421
+s__Lambdapapillomavirus_2	1	PRJNA14326
+s__Lambdapapillomavirus_3	1	PRJNA40369
+s__Lambdapapillomavirus_4	1	PRJNA15468
+s__Rickettsia_helvetica	1	GCF_000255355
+s__Geobacillus_sp_Y4_1MC1	1	GCF_000166075
+s__Debaryomyces_hansenii	1	GCA_000006445
+s__Ageratum_yellow_vein_virus	1	PRJNA15203
+s__Koala_retrovirus	1	PRJNA210799
+s__Mycobacterium_phage_Butterscotch	1	PRJNA32007
+s__Botryotinia_fuckeliana_partitivirus_1	1	PRJNA28759
+s__Halobacillus_halophilus	1	GCF_000284515
+s__Alistipes_senegalensis	1	GCF_000312145
+s__Lactobacillus_phage_LL_H	1	PRJNA19803
+s__Arenimonas_oryziterrae	1	GCF_000420545
+s__Enterococcus_sp_GMD5E	1	GCF_000302675
+s__Weissella_confusa	1	GCF_000239955
+s__Grapevine_Pinot_gris_virus	1	PRJNA70003
+s__Scrophularia_mottle_virus	1	PRJNA32679
+s__Clostridium_phage_phiCD38_2	1	PRJNA67249
+s__Staphylococcus_phage_66	1	PRJNA15263
+s__Lactobacillus_johnsonii_prophage_Lj771	1	PRJNA28145
+s__Cellulophaga_phage_phi46_1	1	PRJNA212961
+s__Desulfovibrio_fructosivorans	1	GCF_000179555
+s__Cellulophaga_phage_phi46_3	1	PRJNA212963
+s__Clostridium_sp_SS2_1	1	GCF_000154545
+s__Bacillus_phage_Andromeda	1	PRJNA192872
+s__Paenibacillus_polymyxa	5	GCF_000217775	GCF_000164985	GCF_000265445	GCF_000146875	GCF_000237325
+s__Helicobacter_phage_KHP30	1	PRJNA184163
+s__Abutilon_mosaic_Bolivia_virus	1	PRJNA62479
+s__Bat_adenovirus_B	1	PRJNA72369
+s__Leptotrichia_goodfellowii	1	GCF_000176335
+s__Mamestra_configurata_nucleopolyhedrovirus_A	1	PRJNA14168
+s__Calicivirus_isolate_TCG	1	PRJNA15123
+s__Methanobacterium_formicicum	1	GCF_000302455
+s__Bacillus_sp_95MFCvi2_1	1	GCF_000374965
+s__Acidithiobacillus_caldus	2	GCF_000175575	GCF_000221025
+s__Cotton_leaf_curl_Alabad_virus	1	PRJNA14240
+s__Perch_rhabdovirus	1	PRJNA194138
+s__Thermococcus_sp_4557	1	GCF_000221185
+s__Dysgonomonas_gadei	1	GCF_000213555
+s__Roseovarius_sp_217	1	GCF_000152845
+s__gamma_proteobacterium_HIMB30	1	GCF_000227525
+s__Helicobacter_mustelae	1	GCF_000091985
+s__Tetrahymena_thermophila	1	GCA_000189635
+s__Pelagibacter_phage_HTVC019P	1	PRJNA192868
+s__Dictyoglomus_thermophilum	1	GCF_000020965
+s__Chlorobium_phaeobacteroides	2	GCF_000015125	GCF_000020545
+s__Shigella_phage_EP23	1	PRJNA80919
+s__Beet_soil_borne_mosaic_virus	1	PRJNA14750
+s__Desulfovibrio_hydrothermalis	1	GCF_000331025
+s__Pseudomonas_extremaustralis	1	GCF_000242115
+s__Catenibacterium_mitsuokai	1	GCF_000173795
+s__Eilat_virus	1	PRJNA175588
+s__Spodoptera_litura_granulovirus	1	PRJNA19695
+s__Pseudomonas_sp_CFII64	1	GCF_000416235
+s__Neisseria_gonorrhoeae	18	GCF_000185865	GCF_000156815	GCF_000156935	GCF_000156955	GCF_000156975	GCF_000156775	GCF_000006845	GCF_000156915	GCF_000273665	GCF_000159935	GCF_000156875	GCF_000156795	GCF_000273685	GCF_000156895	GCF_000156755	GCF_000156835	GCF_000020105	GCF_000163535
+s__Enterobacteria_phage_PsP3	1	PRJNA14345
+s__Nostoc_sp_PCC_7524	1	GCF_000316645
+s__Methanolobus_psychrophilus	1	GCF_000306725
+s__Deinococcus_aquatilis	1	GCF_000378445
+s__Sphingomonas_phyllosphaerae	1	GCF_000419605
+s__Sathuperi_virus	1	PRJNA173356
+s__Human_papillomavirus_type_154	1	PRJNA208538
+s__Pyrenophora_tritici_repentis	1	GCA_000149985
+s__Burkholderia_sp_Ch1_1	1	GCF_000178415
+s__Simian_immunodeficiency_virus	2	PRJNA14872	PRJNA15501
+s__Columnea_latent_viroid	1	PRJNA14756
+s__Sindbis_virus	1	PRJNA15316
+s__Eubacterium_saphenum	1	GCF_000161975
+s__Zucchini_yellow_mosaic_virus	1	PRJNA15390
+s__Meno_virus	1	PRJNA196419
+s__Helicoverpa_armigera_multiple_nucleopolyhedrovirus	1	PRJNA33003
+s__Turnip_yellow_mosaic_virus	1	PRJNA15293
+s__Zymoseptoria_tritici	1	GCA_000219625
+s__Fer_de_Lance_paramyxovirus	1	PRJNA14985
+s__Merremia_mosaic_virus	1	PRJNA16699
+s__Vibrio_halioticoli	1	GCF_000496695
+s__Aspergillus_fumigatus	1	GCA_000002655
+s__Burkholderia_phage_ST79	1	PRJNA206488
+s__Nesterenkonia_alba	1	GCF_000421745
+s__H_1_parvovirus	1	PRJNA14578
+s__Gordonia_soli	1	GCF_000334455
+s__Bacteroides_sp_2_2_4	1	GCF_000157055
+s__Metascardovia_criceti	1	GCF_000376885
+s__Gluconacetobacter_sp_SXCC_1	1	GCF_000208635
+s__Scallion_mosaic_virus	1	PRJNA15190
+s__Keunjorong_mosaic_virus	1	PRJNA76731
+s__Desulfurobacterium_thermolithotrophum	1	GCF_000191045
+s__Halorubrum_phage_CGphi46	1	PRJNA209066
+s__Mycobacterium_sp_KMS	1	GCF_000015405
+s__Clanis_bilineata_nucleopolyhedrovirus	1	PRJNA17485
+s__Bear_Canyon_virus	1	PRJNA28325
+s__Dehalococcoides_mccartyi	6	GCF_000025025	GCF_000011905	GCF_000009025	GCF_000499365	GCF_000025585	GCF_000016705
+s__Corynebacterium_pseudodiphtheriticum	1	GCF_000466825
+s__Gull_circovirus	1	PRJNA18019
+s__Radish_leaf_curl_betasatellite	1	PRJNA28281
+s__Streptococcus_phage_Dp_1	1	PRJNA64617
+s__Rhizobium_sp_PDO1_076	1	GCF_000247475
+s__Tobacco_leaf_chlorosis_betasatellite	1	PRJNA178075
+s__Wheat_yellow_mosaic_virus	1	PRJNA15358
+s__Foxtail_mosaic_virus	1	PRJNA14640
+s__Lactobacillus_kefiranofaciens	1	GCF_000214785
+s__Thermotoga_maritima	3	GCF_000230655	GCF_000008545	GCF_000390265
+s__Anoxybacillus_sp_DT3_1	1	GCF_000346275
+s__Asparagus_virus_2	1	PRJNA33493
+s__Streptomyces_sp_KhCrAH_340	1	GCF_000373445
+s__Klebsiella_pneumoniae	150	GCF_000406725	GCF_000406745	GCF_000281535	GCF_000492195	GCF_000409105	GCF_000406425	GCF_000465975	GCF_000281615	GCF_000406385	GCF_000493075	GCF_000313365	GCF_000417205	GCF_000406505	GCF_000309505	GCF_000406865	GCF_000474885	GCF_000492295	GCF_000163455	GCF_000281355	GCF_000474015	GCF_000492315	GCF_000406685	GCF_000493155	GCF_000346145	GCF_000417045	GCF_000492535	GCF_000492415	GCF_000409085	GCF_000417545	GCF_000281495	GCF_000294365	GCF_000283455	GCF_000412575	G [...]
+s__Sesbania_mosaic_virus	1	PRJNA15372
+s__Begomovirus_associated_DNA_II	1	PRJNA15161
+s__Bacteroides_propionicifaciens	1	GCF_000375405
+s__Chlorobium_limicola	1	GCF_000020465
+s__Escherichia_sp_TW09231	1	GCF_000208465
+s__Hibiscus_latent_Singapore_virus	1	PRJNA17573
+s__Vibrio_phage_VHML	1	PRJNA14234
+s__Citreicella_sp_357	1	GCF_000259095
+s__Monkeypox_virus	1	PRJNA15142
+s__Astrovirus_MLB3	1	PRJNA178563
+s__Astrovirus_MLB2	1	PRJNA76723
+s__Synechococcus_phage_S_MbCM6	1	PRJNA181072
+s__Halorubrum_sp_J07HR59	1	GCF_000416045
+s__Arabis_mosaic_virus	1	PRJNA14932
+s__Mycobacterium_phage_Brujita	1	PRJNA32005
+s__Lactococcus_phage_phiLC3	1	PRJNA14362
+s__Mycobacterium_phage_Catera	1	PRJNA17141
+s__Vibrio_phage_VpV262	1	PRJNA14316
+s__Beet_cryptic_virus_1	1	PRJNA32709
+s__Lactococcus_lactis	20	GCF_000006865	GCF_000192705	GCF_000468955	GCF_000447885	GCF_000025045	GCF_000447905	GCF_000143205	GCF_000284735	GCF_000488975	GCF_000312685	GCF_000344575	GCF_000479375	GCF_000447825	GCF_000014545	GCF_000236475	GCF_000348965	GCF_000447925	GCF_000447845	GCF_000447985	GCF_000447965
+s__Lactobacillus_vini	1	GCF_000255495
+s__Kyuri_green_mottle_mosaic_virus	1	PRJNA15140
+s__Candidatus_Nitrosoarchaeum_koreensis	2	GCF_000299365	GCF_000220175
+s__Morganella_phage_MmP1	1	PRJNA30793
+s__Roseiflexus_sp_RS_1	1	GCF_000016665
+s__Dasheen_mosaic_virus	1	PRJNA15388
+s__Ilyobacter_polytropus	1	GCF_000165505
+s__Pseudomonas_phage_phiKZ	1	PRJNA14251
+s__Cardiobacterium_valvarum	1	GCF_000239355
+s__Enterobacteria_phage_T5	1	PRJNA15143
+s__Actinomyces_coleocanis	1	GCF_000159015
+s__Clover_yellow_vein_virus	1	PRJNA15353
+s__Pseudanabaena_sp_PCC_6802	1	GCF_000332175
+s__Enterobacteria_phage_T1	1	PRJNA14496
+s__Ostreid_herpesvirus_1	1	PRJNA14552
+s__Meganema_perideroedes	1	GCF_000374145
+s__Human_adenovirus_B	3	PRJNA14607	PRJNA15150	PRJNA31177
+s__Human_adenovirus_A	2	PRJNA14517	PRJNA40315
+s__Human_adenovirus_G	1	PRJNA14626
+s__Human_adenovirus_F	1	PRJNA14487
+s__Human_adenovirus_E	2	PRJNA15152	PRJNA162489
+s__Human_adenovirus_D	3	PRJNA14535	PRJNA15105	PRJNA39353
+s__Bacteroides_barnesiae	1	GCF_000374585
+s__Streptomyces_sp_ScaeMP_e10	1	GCF_000373405
+s__Wigeon_coronavirus_HKU20	1	PRJNA109279
+s__Amphritea_japonica	1	GCF_000381785
+s__Tomato_yellow_spot_virus	1	PRJNA16327
+s__Rickettsia_philipii	1	GCF_000283995
+s__Macaque_simian_foamy_virus	1	PRJNA30115
+s__Hendra_virus	1	PRJNA14911
+s__Afipia_sp_1NLS2	1	GCF_000178995
+s__Phage_Gifsy_1	1	PRJNA32269
+s__Streptococcus_sp_oral_taxon_071	1	GCF_000146755
+s__Dioscorea_bacilliform_virus	1	PRJNA18829
+s__Phage_Gifsy_2	1	PRJNA32271
+s__Methylacidiphilum_fumariolicum	1	GCF_000297415
+s__Cyanothece_sp_ATCC_51472	1	GCF_000231425
+s__Enterobacter_sp_MR1	1	GCF_000390385
+s__Natranaerobius_thermophilus	1	GCF_000020005
+s__Acinetobacter_phage_Acj9	1	PRJNA60121
+s__Paenibacillus_sp_PAMC_26794	1	GCF_000316035
+s__Planococcus_halocryophilus	1	GCF_000342445
+s__Beet_black_scorch_virus	1	PRJNA14949
+s__Methylobacterium_sp_88A	1	GCF_000376345
+s__Candidatus_Hodgkinia_cicadicola	1	GCF_000021505
+s__Candidatus_Phytoplasma_australiense	2	GCF_000069925	GCF_000397185
+s__Rose_rosette_virus	1	PRJNA64937
+s__Peptoniphilus_sp_oral_taxon_836	1	GCF_000179335
+s__Streptococcus_vestibularis	2	GCF_000180075	GCF_000188295
+s__Lujo_virus	1	PRJNA38405
+s__Paenibacillus_sp_HGH0039	1	GCF_000411255
+s__Circulifer_tenellus_virus_1	1	PRJNA51183
+s__Vibrio_phage_K139	1	PRJNA14144
+s__alpha_proteobacterium_SCGC_AAA027_J10	1	GCF_000371825
+s__Acinetobacter_schindleri	3	GCF_000368465	GCF_000301815	GCF_000368625
+s__Fischerella_muscicola	2	GCF_000317245	GCF_000317205
+s__Pyrobaculum_oguniense	1	GCF_000247545
+s__Donkey_orchid_virus_A	1	PRJNA202316
+s__Microbacterium_sp_B19	1	GCF_000333395
+s__Lactobacillus_casei_paracasei	56	GCF_000309725	GCF_000309785	GCF_000410255	GCF_000410235	GCF_000309685	GCF_000410355	GCF_000410015	GCF_000409875	GCF_000410415	GCF_000410295	GCF_000410375	GCF_000410315	GCF_000309765	GCF_000026485	GCF_000409955	GCF_000410175	GCF_000410475	GCF_000410135	GCF_000194785	GCF_000309665	GCF_000309585	GCF_000409975	GCF_000410435	GCF_000410455	GCF_000418515	GCF_000410155	GCF_000410495	GCF_000309625	GCF_000155515	GCF_000019245	GCF_000409995	GCF_000309565	GCF_0004 [...]
+s__Propionibacterium_phage_PHL060L00	1	PRJNA219122
+s__Escherichia_albertii	3	GCF_000208505	GCF_000155105	GCF_000208425
+s__Bacillus_phage_Eoghan	1	PRJNA192874
+s__Deinococcus_wulumuqiensis	1	GCF_000348665
+s__actinobacterium_SCGC_AAA044_N04	1	GCF_000378885
+s__Selenomonas_sp_F0473	1	GCF_000315545
+s__Solenopsis_invicta_densovirus	1	PRJNA226730
+s__Streptomyces_sp_AA0539	1	GCF_000297635
+s__Goose_paramyxovirus_SF02	1	PRJNA14895
+s__Anaerostipes_hadrus	1	GCF_000332875
+s__Dinoroseobacter_shibae	1	GCF_000018145
+s__Pseudomonas_phage_14_1	1	PRJNA33265
+s__Azospirillum_phage_Cd	1	PRJNA28841
+s__Vibrio_tubiashii	2	GCF_000222665	GCF_000259295
+s__Ahrensia_sp_R2A130	1	GCF_000179775
+s__Chipapillomavirus_2	1	PRJNA28243
+s__Burkholderia_sp_BT03	1	GCF_000281995
+s__Clostridium_phage_phiMMP02	1	PRJNA179416
+s__Rhodospirillum_centenum	1	GCF_000016185
+s__Clostridium_phage_phiMMP04	1	PRJNA179417
+s__Methylovulum_miyakonense	1	GCF_000384075
+s__Lachnospiraceae_bacterium_6_1_63FAA	1	GCF_000209425
+s__Grapevine_berry_inner_necrosis_virus	1	PRJNA63625
+s__Cesiribacter_andamanensis	1	GCF_000348925
+s__Bacillus_marmarensis	1	GCF_000474275
+s__Leuconostoc_kimchii	1	GCF_000092505
+s__Mycobacterium_phage_Wildcat	1	PRJNA17175
+s__Thermoanaerobacterium_thermosaccharolyticum	2	GCF_000145615	GCF_000328545
+s__Erysipelotrichaceae_bacterium_3_1_53	1	GCF_000165065
+s__Tomato_leaf_curl_Philippine_betasatellite	1	PRJNA19865
+s__Tetrapisispora_phaffii	1	GCA_000236905
+s__Pseudomonas_pelagia	1	GCF_000410875
+s__Campylobacter_jejuni	81	GCF_000466065	GCF_000254715	GCF_000254575	GCF_000254855	GCF_000254475	GCF_000254935	GCF_000468915	GCF_000285755	GCF_000017485	GCF_000163995	GCF_000017905	GCF_000255075	GCF_000242395	GCF_000254635	GCF_000254555	GCF_000168195	GCF_000466075	GCF_000254895	GCF_000285695	GCF_000254975	GCF_000285715	GCF_000254315	GCF_000254275	GCF_000184085	GCF_000254415	GCF_000254675	GCF_000254695	GCF_000254755	GCF_000255095	GCF_000168135	GCF_000184825	GCF_000302555	GCF_000493495	GCF [...]
+s__Mycobacterium_phage_Adjutor	1	PRJNA29919
+s__Herbaspirillum_lusitanum	1	GCF_000256565
+s__Xanthobacter_autotrophicus	1	GCF_000017645
+s__Mycobacterium_mageritense	1	GCF_000233935
+s__Chlamydia_pneumoniae_phage_CPAR39	1	PRJNA57809
+s__Richelia_intracellularis	2	GCF_000350105	GCF_000350125
+s__Thiocystis_violascens	1	GCF_000227745
+s__Microbacterium_testaceum	1	GCF_000202635
+s__Pseudoalteromonas_phage_pYD6_A	1	PRJNA195478
+s__Vagococcus_lutrae	1	GCF_000498295
+s__Mycobacterium_phage_Pacc40	1	PRJNA32017
+s__Bartonella_rattimassiliensis	2	GCF_000312605	GCF_000278215
+s__Apple_scar_skin_viroid	1	PRJNA14967
+s__Deerpox_virus_W_1170_84	1	PRJNA32597
+s__Natrinema_pallidum	1	GCF_000337615
+s__Mycobacterium_thermoresistibile	1	GCF_000234585
+s__Clostridium_phage_phiCP39_O	1	PRJNA32103
+s__Streptomyces_sp_CNS335	1	GCF_000377125
+s__Saccharomyces_20S_RNA_narnavirus	1	PRJNA14841
+s__Banana_streak_CA_virus	1	PRJNA66617
+s__alpha_proteobacterium_SCGC_AAA024_N17	1	GCF_000372045
+s__Mycoplasma_phage_MAV1	1	PRJNA14395
+s__Oscillochloris_trichoides	1	GCF_000152145
+s__Alphapapillomavirus_2	1	PRJNA15504
+s__Sida_leaf_curl_virus_associated_DNA_1	1	PRJNA16227
+s__Alphapapillomavirus_10	1	PRJNA15454
+s__Alphapapillomavirus_11	1	PRJNA15509
+s__Alphapapillomavirus_12	1	PRJNA14025
+s__Alphapapillomavirus_13	1	PRJNA15466
+s__Alphapapillomavirus_14	1	PRJNA15424
+s__Saccharopolyspora_spinosa	1	GCF_000194155
+s__Lactobacillus_pobuzihii	1	GCF_000349725
+s__Acinetobacter_sp_NIPH_713	1	GCF_000369445
+s__Enterobacteria_phage_K1_5	1	PRJNA17059
+s__Glaciecola_mesophila	1	GCF_000315015
+s__Lettuce_chlorosis_virus	1	PRJNA38899
+s__Brevibacterium_linens	1	GCF_000167575
+s__zeta_proteobacterium_SCGC_AB_137_C09	1	GCF_000379225
+s__Tomato_leaf_curl_Taiwan_virus	1	PRJNA14193
+s__Nitrosomonas_sp_Is79A3	1	GCF_000219585
+s__Chloroherpeton_thalassium	1	GCF_000020525
+s__Erectites_yellow_mosaic_virus_satellite_DNA_beta	1	PRJNA19827
+s__Wolbachia_pipientis	2	GCF_000242415	GCF_000333775
+s__Sowbane_mosaic_virus	1	PRJNA31125
+s__Prochlorococcus_phage_P_RSM4	1	PRJNA64703
+s__Streptomyces_sp_PAMC26508	1	GCF_000364805
+s__Actinoplanes_globisporus	1	GCF_000379645
+s__Leeia_oryzae	1	GCF_000376945
+s__St_Augustine_decline_satellite_virus	1	PRJNA14898
+s__Halosimplex_carlsbadense	1	GCF_000337455
+s__Trichomonas_vaginalis_virus	1	PRJNA14813
+s__Geobacter_sp_M18	1	GCF_000175115
+s__Kordia_algicida	1	GCF_000154725
+s__Natronorubrum_bangense	1	GCF_000337715
+s__Staphylococcus_phage_44AHJD	1	PRJNA14268
+s__Sphingobacterium_paucimobilis	1	GCF_000416985
+s__Rhizobium_gallicum	1	GCF_000373025
+s__Borrelia_recurrentis	1	GCF_000019705
+s__Planktothrix_phage_PaV_LD	1	PRJNA80915
+s__Tuber_aestivum_mitovirus	1	PRJNA67889
+s__Rhodococcus_sp_114MFTsu3_1	1	GCF_000383555
+s__Mesorhizobium_metallidurans	1	GCF_000350085
+s__Saccharomyces_23S_RNA_narnavirus	1	PRJNA14840
+s__Taro_vein_chlorosis_virus	1	PRJNA15163
+s__Sulfitobacter_phage_pCB2047_B	1	PRJNA195474
+s__Sulfitobacter_phage_pCB2047_C	1	PRJNA195472
+s__Sulfitobacter_phage_pCB2047_A	1	PRJNA195473
+s__Yersinia_ruckeri	1	GCF_000173755
+s__Rhodococcus_sp_R04	1	GCF_000219395
+s__Fibrobacter_succinogenes	2	GCF_000024665	GCF_000146505
+s__Aeromonas_phage_CC2	1	PRJNA181987
+s__Chlorobium_ferrooxidans	1	GCF_000168715
+s__Flock_house_virus	1	PRJNA15075
+s__Prochlorococcus_sp_W11	1	GCF_000291945
+s__Prochlorococcus_sp_W10	1	GCF_000291845
+s__Prochlorococcus_sp_W12	1	GCF_000291965
+s__Bacillus_acidiproducens	1	GCF_000374345
+s__Enterococcus_sulfureus	2	GCF_000407605	GCF_000407025
+s__Aphid_lethal_paralysis_virus	1	PRJNA14867
+s__Bacillus_phage_phIS3501	1	PRJNA181213
+s__Shigella_phage_Ag3	1	PRJNA42937
+s__Calibrachoa_mottle_virus	1	PRJNA214239
+s__Bifidobacterium_gallicum	1	GCF_000173375
+s__Zavarzinella_formosa	1	GCF_000255705
+s__Hemidesmus_yellow_mosaic_virus	1	PRJNA215128
+s__Actinomyces_europaeus	1	GCF_000411155
+s__Clostridiales_bacterium_1_7_47FAA	1	GCF_000155435
+s__Stenotrophomonas_phage_phiSMA7	1	PRJNA209360
+s__Salisaeta_icosahedral_phage_1	1	PRJNA167575
+s__Cellulophaga_phage_phi17_2	1	PRJNA212965
+s__Candidatus_Nitrosopumilus_salaria	1	GCF_000242875
+s__Cupriavidus_sp_HMR_1	1	GCF_000319775
+s__Sulfolobus_spindle_shaped_virus_5	1	PRJNA31219
+s__Francisella_philomiragia	2	GCF_000019285	GCF_000156715
+s__Acinetobacter_pittii_calcoaceticus_nosocomialis	28	GCF_000162375	GCF_000248235	GCF_000368605	GCF_000399705	GCF_000399665	GCF_000309015	GCF_000368085	GCF_000248315	GCF_000302375	GCF_000163635	GCF_000341835	GCF_000191145	GCF_000399685	GCF_000302295	GCF_000162035	GCF_000368965	GCF_000301775	GCF_000301695	GCF_000301675	GCF_000369045	GCF_000472005	GCF_000369025	GCF_000368945	GCF_000300635	GCF_000248335	GCF_000248175	GCF_000367865	GCF_000230465
+s__African_oil_palm_ringspot_virus	1	PRJNA36557
+s__Fervidobacterium_pennivorans	1	GCF_000235405
+s__Okra_leaf_curl_betasatellite	1	PRJNA14209
+s__Simian_virus_41	1	PRJNA15220
+s__Pseudomonas_sp_GM30	1	GCF_000282275
+s__Pseudomonas_sp_GM33	1	GCF_000282295
+s__Tomato_apical_stunt_viroid	1	PRJNA14670
+s__Listeria_innocua	3	GCF_000195795	GCF_000183885	GCF_000241405
+s__Rice_stripe_virus	1	PRJNA14795
+s__Flavobacterium_sp_SCGC_AAA536_P05	1	GCF_000384835
+s__Pelobacter_carbinolicus	1	GCF_000012885
+s__Candidatus_Arthromitus_sp_SFB_mouse	3	GCF_000284435	GCF_000270205	GCF_000225365
+s__Wenxinia_marina	1	GCF_000379485
+s__Staphylococcus_phage_TEM123	1	PRJNA167573
+s__Blautia_producta	1	GCF_000373885
+s__Xanthomonas_albilineans	1	GCF_000087965
+s__Acinetobacter_gerneri	1	GCF_000368565
+s__Macroptilium_yellow_vein_virus	1	PRJNA124061
+s__Pseudoalteromonas_sp_BSi20652	1	GCF_000239855
+s__Corynebacterium_phage_BFK20	1	PRJNA20757
+s__Segetibacter_koreensis	1	GCF_000374045
+s__Rickettsia_typhi	3	GCF_000277305	GCF_000008045	GCF_000277285
+s__Mycoplasma_canis	5	GCF_000258965	GCF_000258985	GCF_000258945	GCF_000259005	GCF_000258925
+s__Serratia_symbiotica	2	GCF_000186485	GCF_000238975
+s__Nautilia_profundicola	1	GCF_000021725
+s__Mycobacterium_phage_Barnyard	1	PRJNA14274
+s__Acinetobacter_sp_ANC_4105	1	GCF_000369485
+s__Tomato_leaf_curl_Guangdong_virus	1	PRJNA17805
+s__Diascia_yellow_mottle_virus	1	PRJNA30795
+s__Grapevine_leafroll_associated_virus_10	1	PRJNA33263
+s__Clostridium_sp_D5	1	GCF_000190355
+s__Fragaria_chiloensis_cryptic_virus	1	PRJNA19741
+s__Thermodesulfobium_narugense	1	GCF_000212395
+s__Slackia_exigua	1	GCF_000162875
+s__Synechococcus_sp_CC9902	1	GCF_000012505
+s__Sinorhizobium_phage_PBC5	1	PRJNA14146
+s__Beijerinckia_indica	1	GCF_000019845
+s__Synechococcus_sp_PCC_7002	1	GCF_000019485
+s__Burkholderia_phage_Bcep781	1	PRJNA14405
+s__Mycobacteriophage_Velveteen	1	PRJNA215123
+s__Rubidibacter_lacunae	1	GCF_000473895
+s__Bacteroides_oleiciplenus	1	GCF_000315485
+s__Enterobacteria_phage_IME08	1	PRJNA50177
+s__Mycobacterium_phage_PhrostyMug	1	PRJNA219114
+s__Oscillatoriales_cyanobacterium_JSC_12	1	GCF_000309945
+s__Lassa_virus	1	PRJNA14864
+s__Desulfovibrio_magneticus	2	GCF_000010665	GCF_000307955
+s__Tamiami_virus	1	PRJNA29831
+s__Oliveros_virus	1	PRJNA28319
+s__Bacteroides_xylanisolvens	4	GCF_000273315	GCF_000210075	GCF_000178295	GCF_000178215
+s__Piscirickettsia_salmonis	2	GCF_000300295	GCF_000401515
+s__Thermus_scotoductus	2	GCF_000187005	GCF_000381045
+s__Rose_yellow_vein_virus	1	PRJNA196972
+s__Clostridium_acidurici	1	GCF_000299355
+s__Barley_yellow_dwarf_virus_PAV	1	PRJNA15196
+s__Aeromonas_aquariorum	1	GCF_000315195
+s__Pseudomonas_synxantha	1	GCF_000263715
+s__Gossypium_punctatum_mild_leaf_curl_virus	1	PRJNA33489
+s__Tomato_leaf_curl_Gujarat_virus	1	PRJNA14238
+s__Lactobacillus_crispatus	11	GCF_000162255	GCF_000177575	GCF_000497065	GCF_000165885	GCF_000162315	GCF_000091765	GCF_000301135	GCF_000160515	GCF_000301115	GCF_000466885	GCF_000161915
+s__Microcystis_sp_T1_4	1	GCF_000297435
+s__Blattabacterium_sp_Periplaneta_americana	1	GCF_000093165
+s__Halanaerobium_saccharolyticum	1	GCF_000350165
+s__Porphyromonas_bennonis	1	GCF_000375645
+s__Avian_metapneumovirus	1	PRJNA16240
+s__Bean_yellow_dwarf_virus	1	PRJNA14605
+s__Coriobacteriaceae_bacterium_phI	1	GCF_000311845
+s__Psychrobacter_sp_G	1	GCF_000418305
+s__Peptoniphilus_harei	1	GCF_000183565
+s__Rabbit_hemorrhagic_disease_virus	1	PRJNA15313
+s__Mesorhizobium_amorphae	1	GCF_000233995
+s__Desulfocapsa_sulfexigens	1	GCF_000341395
+s__Methanothermococcus_thermolithotrophicus	1	GCF_000376965
+s__Reyranella_massiliensis	1	GCF_000312425
+s__Raspberry_ringspot_virus	1	PRJNA14934
+s__Brevibacterium_mcbrellneri	1	GCF_000178455
+s__Phenylobacterium_zucineum	1	GCF_000017265
+s__Sida_mosaic_Sinaloa_virus	1	PRJNA16937
+s__Aspergillus_clavatus	1	GCA_000002715
+s__Brucella_canis	12	GCF_000018525	GCF_000370605	GCF_000370585	GCF_000292185	GCF_000298575	GCF_000367285	GCF_000480295	GCF_000367305	GCF_000480275	GCF_000367265	GCF_000238195	GCF_000366825
+s__Pyrococcus_sp_NA2	1	GCF_000211475
+s__Prevotella_baroniae	1	GCF_000468635
+s__Tomato_blistering_mosaic_virus	1	PRJNA213013
+s__Clostridium_phytofermentans	1	GCF_000018685
+s__Thermoanaerobacterium_xylanolyticum	1	GCF_000189775
+s__Halalkalicoccus_jeotgali	2	GCF_000337255	GCF_000196895
+s__Tomato_leaf_curl_Pune_virus	1	PRJNA18015
+s__Enterobacteria_phage_RTP	1	PRJNA16178
+s__Oat_dwarf_virus	1	PRJNA30037
+s__Thermomonospora_curvata	1	GCF_000024385
+s__Brucella_sp_F23_97	1	GCF_000370965
+s__Acetobacter_pasteurianus	9	GCF_000010845	GCF_000010865	GCF_000010885	GCF_000010945	GCF_000010925	GCF_000010825	GCF_000285315	GCF_000010905	GCF_000010965
+s__Mycoplasma_gallisepticum	11	GCF_000025385	GCF_000286735	GCF_000286815	GCF_000286675	GCF_000092585	GCF_000286695	GCF_000286775	GCF_000286795	GCF_000286715	GCF_000025365	GCF_000286755
+s__Deerpox_virus_W_848_83	1	PRJNA15462
+s__Equine_papillomavirus_type_6	1	PRJNA193978
+s__Banna_virus	1	PRJNA15178
+s__Semliki_forest_virus	1	PRJNA15282
+s__Thermosipho_melanesiensis	1	GCF_000016905
+s__Nitrosopumilus_maritimus	1	GCF_000018465
+s__Atopobium_sp_oral_taxon_810	1	GCF_000466405
+s__Rhizoctonia_cerealis_endornavirus_1	1	PRJNA225929
+s__Frankia_symbiont_of_Datisca_glomerata	1	GCF_000177615
+s__Lewinella_cohaerens	1	GCF_000379805
+s__Cycloclasticus_sp_P1	1	GCF_000299965
+s__Lactococcus_phage_P680	1	PRJNA213080
+s__delta_proteobacterium_NaphS2	1	GCF_000179315
+s__Citrus_leaf_rugose_virus	1	PRJNA14759
+s__Roseobacter_sp_SK209_2_6	1	GCF_000169455
+s__Halovivax_ruber	1	GCF_000328525
+s__Lachnospiraceae_bacterium_3_1_57FAA_CT1	1	GCF_000218405
+s__Red_clover_mottle_virus	1	PRJNA15291
+s__Mycoplasma_ovipneumoniae	1	GCF_000218525
+s__Anabaena_sp_90	1	GCF_000312705
+s__Xanthomonas_phage_CP1	1	PRJNA184158
+s__His2_virus	1	PRJNA16651
+s__Enterobacteria_phage_phiP27	1	PRJNA14599
+s__Rhodococcus_sp_P27	1	GCF_000454285
+s__Cosavirus_A	1	PRJNA38497
+s__Alkalibacillus_haloalkaliphilus	1	GCF_000269905
+s__Pseudomonas_phage_LIT1	1	PRJNA42949
+s__Streptococcus_equi	4	GCF_000026585	GCF_000219765	GCF_000445225	GCF_000020765
+s__Curvularia_thermal_tolerance_virus	1	PRJNA30363
+s__Clostridium_carboxidivorans	2	GCF_000175595	GCF_000163855
+s__Theilovirus	2	PRJNA15292	PRJNA30053
+s__Kappapapillomavirus_1	1	PRJNA14057
+s__Ralstonia_sp_5_7_47FAA	1	GCF_000165085
+s__Prevotella_sp_oral_taxon_306	1	GCF_000257925
+s__Mycoplasma_synoviae	2	GCF_000385095	GCF_000008245
+s__Bean_golden_mosaic_virus	1	PRJNA14199
+s__Rhodococcus_triatomae	1	GCF_000341795
+s__Methylococcus_capsulatus	2	GCF_000297615	GCF_000008325
+s__Shigella_sp_D9	1	GCF_000158395
+s__Bifidobacterium_asteroides	1	GCF_000304215
+s__Segniliparus_rotundus	1	GCF_000092825
+s__Aeromicrobium_marinum	1	GCF_000160775
+s__Pseudomonas_phage_LUZ24	1	PRJNA28739
+s__Ahrensia_kielensis	1	GCF_000374465
+s__Cyanophage_MED4_117	1	PRJNA195503
+s__Red_clover_necrotic_mosaic_virus	1	PRJNA14796
+s__Clostridium_tetani	1	GCF_000007625
+s__Mycobacterium_phage_Pipefish	1	PRJNA17171
+s__Saccharomonospora_glauca	1	GCF_000243395
+s__Phlebiopsis_gigantea_mycovirus_dsRNA_1	1	PRJNA46855
+s__Brucella_sp_83_13	1	GCF_000157875
+s__Natrinema_pellirubrum	2	GCF_000337635	GCF_000230735
+s__Micromonas_pusilla_reovirus	1	PRJNA17091
+s__Cryphonectria_parasitica_mitovirus_1_NB631	1	PRJNA14838
+s__Bacillus_selenitireducens	1	GCF_000093085
+s__Rabbit_fibroma_virus	1	PRJNA14590
+s__Operophtera_brumata_reovirus	1	PRJNA16145
+s__Bacteroides_sp_3_1_33FAA	1	GCF_000162195
+s__Cassava_mosaic_Madagascar_alphasatellite	1	PRJNA175666
+s__African_swine_fever_virus	1	PRJNA15242
+s__Megavirus_lba	1	PRJNA188728
+s__Propionibacterium_phage_ATCC29399B_T	1	PRJNA177538
+s__Haloferax_mucosum	1	GCF_000337815
+s__Acinetobacter_sp_CIP_64_7	1	GCF_000369745
+s__Acinetobacter_sp_CIP_64_2	1	GCF_000369645
+s__Pseudomonas_syringae_group_genomosp_3	6	GCF_000145845	GCF_000177455	GCF_000177475	GCF_000007805	GCF_000172895	GCF_000177495
+s__Halorubrum_lipolyticum	1	GCF_000337375
+s__Candidatus_Nitrosoarchaeum_limnia	2	GCF_000241145	GCF_000204585
+s__Panine_herpesvirus_2	1	PRJNA14404
+s__Wolbachia_endosymbiont_of_Brugia_malayi	1	GCF_000008385
+s__Staphylococcus_equorum	2	GCF_000467635	GCF_000297455
+s__Methanoculleus_marisnigri	1	GCF_000015825
+s__Lactococcus_phage_phi7	1	PRJNA213073
+s__Chiltepin_yellow_mosaic_virus	1	PRJNA48419
+s__Acidianus_bottle_shaped_virus	1	PRJNA19605
+s__Kaistia_granuli	1	GCF_000380505
+s__Erwinia_phage_vB_EamM_Y2	1	PRJNA181231
+s__Enterococcus_columbae	3	GCF_000406925	GCF_000407225	GCF_000373065
+s__Cryptophlebia_leucotreta_granulovirus	1	PRJNA14302
+s__Chthoniobacter_flavus	1	GCF_000173075
+s__Renibacterium_salmoninarum	1	GCF_000018885
+s__Tobacco_necrosis_satellite_virus	1	PRJNA14672
+s__Capnocytophaga_sp_oral_taxon_332	1	GCF_000318275
+s__Marinimicrobia_bacterium_SCGC_AAA160_I06	1	GCF_000402815
+s__Potato_aucuba_mosaic_virus	1	PRJNA14771
+s__Thermodesulfatator_atlanticus	1	GCF_000421585
+s__Pseudomonas_mosselii	1	GCF_000498975
+s__Fusarium_graminearum_dsRNA_mycovirus_3	1	PRJNA41629
+s__Pseudomonas_sp_35MFCvi1_1	1	GCF_000378525
+s__Fusarium_graminearum_dsRNA_mycovirus_4	1	PRJNA41631
+s__Helminthosporium_victoriae_virus_190S	1	PRJNA14763
+s__Methanospirillum_hungatei	1	GCF_000013445
+s__Methanofollis_liminatans	1	GCF_000275865
+s__Synechococcus_phage_KBS_M_1A	1	PRJNA195500
+s__Methylobacterium_nodulans	1	GCF_000022085
+s__Clostridium_citroniae	1	GCF_000233455
+s__Thermovirga_lienii	1	GCF_000233775
+s__Desulfosporosinus_acidiphilus	1	GCF_000255115
+s__Soybean_mild_mottle_virus	1	PRJNA48593
+s__Candidatus_Mycoplasma_haemolamae	1	GCF_000281235
+s__Providencia_burhodogranariea	1	GCF_000314855
+s__Halovirus_HSTV_1	1	PRJNA207837
+s__Aravan_virus	1	PRJNA194139
+s__Enterobacteria_phage_RB51	1	PRJNA37819
+s__Emilia_yellow_vein_virus	1	PRJNA28689
+s__Tomato_leaf_curl_China_virus_OX2	1	PRJNA202888
+s__Raspberry_bushy_dwarf_virus	1	PRJNA14791
+s__Rhizobium_sp_AP16	1	GCF_000281735
+s__Drosophila_obscura_sigmavirus	1	PRJNA224247
+s__Spleen_focus_forming_virus	1	PRJNA14641
+s__Avocado_sunblotch_viroid	1	PRJNA14908
+s__Rice_black_streaked_dwarf_virus	1	PRJNA14790
+s__Tomato_leaf_curl_Joydebpur_virus	1	PRJNA16324
+s__Corynebacterium_casei	1	GCF_000234765
+s__Bovine_leukemia_virus	1	PRJNA14916
+s__Chickpea_redleaf_virus	1	PRJNA60625
+s__Enterobacteria_phage_ime09	1	PRJNA181233
+s__Candidatus_Blochmannia_chromaiodes	1	GCF_000331065
+s__Pig_stool_associated_circular_ssDNA_virus	1	PRJNA165737
+s__Rose_cryptic_virus_1	1	PRJNA28761
+s__Leptotrichia_buccalis	1	GCF_000023905
+s__Eubacterium_yurii	1	GCF_000146855
+s__Pseudomonas_phage_MR299_2	1	PRJNA183543
+s__Vibrio_ezurae	1	GCF_000467185
+s__Spirochaeta_thermophila	2	GCF_000147075	GCF_000184345
+s__Alistipes_sp_AP11	1	GCF_000321205
+s__Haloarcula_hispanica	1	GCF_000223905
+s__Geobacillus_thermoleovorans	1	GCF_000236605
+s__Reinekea_blandensis	1	GCF_000153185
+s__Cronobacter_phage_vB_CsaP_GAP52	1	PRJNA179411
+s__Mannheimia_succiniciproducens	1	GCF_000007745
+s__Pseudomonas_thermotolerans	1	GCF_000364625
+s__Oyster_mushroom_spherical_virus	1	PRJNA14951
+s__Curvibacter_lanceolatus	1	GCF_000381265
+s__Xanthomonas_perforans	1	GCF_000192045
+s__Torulaspora_delbrueckii	1	GCA_000243375
+s__Bergeyella_zoohelcum	2	GCF_000301095	GCF_000301075
+s__Rhodovulum_phage_RS1	1	PRJNA195480
+s__Torque_teno_sus_virus_k2	1	PRJNA48301
+s__Radish_mosaic_virus	1	PRJNA29843
+s__Neodiprion_abietis_NPV	1	PRJNA17361
+s__Clostridium_sp_ASF502	1	GCF_000364245
+s__Streptomyces_sp_Wigar10	1	GCF_000226995
+s__Galleria_mellonella_densovirus	1	PRJNA14221
+s__Escherichia_sp_TW11588	1	GCF_000208585
+s__Mycobacterium_phage_Dumbo	1	PRJNA206034
+s__Salivirus_A	2	PRJNA39349	PRJNA39553
+s__Natrinema_gari	1	GCF_000337175
+s__Yersinia_phage_phiR1_37	1	PRJNA76739
+s__Chapare_virus	1	PRJNA29223
+s__Pseudomonas_phage_vB_PaeM_C2_10_Ab1	1	PRJNA184146
+s__Corynebacterium_sp_KPL1998	1	GCF_000477895
+s__Methylomonas_sp_MK1	1	GCF_000365425
+s__Sugarcane_yellow_leaf_virus	1	PRJNA15363
+s__Tobacco_streak_virus	1	PRJNA15472
+s__Corynebacterium_sp_KPL1995	1	GCF_000477935
+s__Corynebacterium_sp_KPL1996	1	GCF_000477915
+s__Pseudomonas_sp_GM17	1	GCF_000282175
+s__Pseudomonas_sp_GM16	1	GCF_000282155
+s__Acinetobacter_bereziniae	3	GCF_000368505	GCF_000368925	GCF_000248295
+s__Mycobacterium_phage_Catdawg	1	PRJNA215124
+s__Pseudomonas_sp_GM18	1	GCF_000282195
+s__Maize_yellow_dwarf_virus_RMV	1	PRJNA208537
+s__Atopobium_vaginae	3	GCF_000179715	GCF_000159235	GCF_000178335
+s__Aestuariimicrobium_kwangyangense	1	GCF_000421525
+s__Ralstonia_phage_RSS0	1	PRJNA181985
+s__Chlamydia_ibidis	1	GCF_000454725
+s__Kluyveromyces_lactis	1	GCA_000002515
+s__Bacillus_isronensis	1	GCF_000298255
+s__Citrus_viroid_V	1	PRJNA28115
+s__Cherry_mottle_leaf_virus	1	PRJNA14695
+s__Cotton_leaf_curl_Bangalore_betasatellite	1	PRJNA15557
+s__JC_polyomavirus	1	PRJNA15477
+s__Staphylococcus_phage_phiPVL_CN125	1	PRJNA38431
+s__Candidatus_Pelagibacter_sp_IMCC9063	1	GCF_000195085
+s__Squash_mosaic_virus	1	PRJNA15384
+s__Streptococcus_phage_Abc2	1	PRJNA42791
+s__Ralstonia_phage_PE226	1	PRJNA64769
+s__Chlamydia_muridarum	2	GCF_000175535	GCF_000174995
+s__Aeromonas_phage_vB_AsaM_56	1	PRJNA181214
+s__Streptomyces_rapamycinicus	1	GCF_000418455
+s__Vibrio_phage_martha_12B12	1	PRJNA198434
+s__Arthrobacter_sp_135MFCol5_1	1	GCF_000374865
+s__Edwardsiella_phage_KF_1	1	PRJNA179430
+s__Acinetobacter_phage_Ac42	1	PRJNA60115
+s__Ktedonobacter_racemifer	1	GCF_000178855
+s__Streptococcus_phage_Sfi21	1	PRJNA14133
+s__Mycobacterium_phage_PattyP	1	PRJNA206030
+s__Broad_bean_wilt_virus_2	1	PRJNA15380
+s__Broad_bean_wilt_virus_1	1	PRJNA14905
+s__Vibrio_phage_VBP32	1	PRJNA195492
+s__Streptococcus_phage_MM1	1	PRJNA14601
+s__Mythimna_separata_entomopoxvirus_L	1	PRJNA203667
+s__Enterococcus_phage_phiFL4A	1	PRJNA42793
+s__Streptococcus_sp_2_1_36FAA	1	GCF_000161955
+s__Deltapapillomavirus_1	1	PRJNA15453
+s__gamma_proteobacterium_NOR5_3	1	GCF_000158155
+s__Deltapapillomavirus_3	1	PRJNA15460
+s__Deltapapillomavirus_4	1	PRJNA15513
+s__Deltapapillomavirus_5	1	PRJNA30665
+s__Bacteroides_sp_3_1_23	1	GCF_000162555
+s__delta_proteobacterium_MLMS_1	1	GCF_000168275
+s__Loktanella_hongkongensis	1	GCF_000365005
+s__Cotton_leaf_curl_Burewala_virus	1	PRJNA34757
+s__Wolbachia_endosymbiont_of_Onchocerca_ochengi	1	GCF_000306885
+s__Cowpea_aphid_borne_mosaic_virus	1	PRJNA15394
+s__Salmonella_phage_FSL_SP_031	1	PRJNA212717
+s__Salmonella_phage_FSL_SP_030	1	PRJNA212718
+s__Pleurotus_ostreatus_virus_1	1	PRJNA15169
+s__Gordonia_polyisoprenivorans	3	GCF_000385355	GCF_000241325	GCF_000247715
+s__Sida_golden_mottle_virus	1	PRJNA48421
+s__Dyodeltapapillomavirus_1	1	PRJNA32003
+s__Thermococcus_kodakarensis	1	GCF_000009965
+s__Stenotrophomonas_phage_phiSMA9	1	PRJNA15493
+s__Acetivibrio_cellulolyticus	1	GCF_000179595
+s__Hibiscus_chlorotic_ringspot_virus	1	PRJNA15208
+s__Southern_elephant_seal_virus	1	PRJNA88117
+s__Trypanosoma_cruzi	1	GCA_000209065
+s__Frankia_sp_BMG5_12	1	GCF_000374165
+s__Wolbachia_endosymbiont_of_Culex_pipiens_molestus	1	GCF_000208785
+s__Treponema_denticola	17	GCF_000191825	GCF_000338455	GCF_000338615	GCF_000340725	GCF_000340745	GCF_000340605	GCF_000338475	GCF_000340705	GCF_000413095	GCF_000413075	GCF_000340685	GCF_000338595	GCF_000340645	GCF_000008185	GCF_000338635	GCF_000338515	GCF_000413115
+s__Nafulsella_turpanensis	1	GCF_000346615
+s__Epinephelus_tauvina_nervous_necrosis_virus	1	PRJNA14849
+s__Digitaria_didactyla_striate_mosaic_virus	1	PRJNA53503
+s__Xanthomonas_euvesicatoria	1	GCF_000009165
+s__Blueberry_shock_virus	1	PRJNA218015
+s__Newbury_1_virus	2	PRJNA14845	PRJNA16653
+s__Mycobacterium_fortuitum	1	GCF_000295855
+s__Infectious_bursal_disease_virus	1	PRJNA14990
+s__Geobacter_metallireducens	2	GCF_000243475	GCF_000012925
+s__Clostridium_phage_PhiS63	1	PRJNA167577
+s__Sweet_potato_leaf_curl_China_Henan_virus	1	PRJNA210929
+s__Arthrospira_sp_PCC_8005	1	GCF_000176895
+s__Rhodomicrobium_vannielii	1	GCF_000166055
+s__Tomato_leaf_curl_Arusha_virus	1	PRJNA18861
+s__Shewanella_oneidensis	1	GCF_000146165
+s__Rice_yellow_mottle_virus	1	PRJNA15327
+s__Pediococcus_phage_clP1	1	PRJNA76735
+s__Thermosipho_africanus	2	GCF_000021285	GCF_000300715
+s__Mycoplasma_flocculare	1	GCF_000367185
+s__Leptospira_meyeri	2	GCF_000304275	GCF_000347075
+s__Methylobacterium_extorquens	5	GCF_000021845	GCF_000018845	GCF_000243435	GCF_000083545	GCF_000022685
+s__Heliothis_virescens_ascovirus_3a	1	PRJNA19151
+s__Atopobium_sp_ICM58	1	GCF_000283035
+s__Paenibacillus_sp_Aloe_11	1	GCF_000245715
+s__Burkholderia_phage_phi644_2	1	PRJNA62941
+s__Butyrivibrio_sp_XPD2006	1	GCF_000420865
+s__Methylobacterium_sp_MB200	1	GCF_000333655
+s__Asticcacaulis_sp_AC460	1	GCF_000495795
+s__Gallionella_capsiferriformans	1	GCF_000145255
+s__Turicibacter_sp_HGF1	1	GCF_000191865
+s__Pseudaminobacter_salicylatoxidans	1	GCF_000304395
+s__Maize_dwarf_mosaic_virus	1	PRJNA15355
+s__Vibrio_azureus	1	GCF_000467165
+s__Corynebacterium_genitalium	1	GCF_000143825
+s__Staphylococcus_hominis	4	GCF_000183685	GCF_000247085	GCF_000174735	GCF_000269685
+s__Kenaf_leaf_curl_virus	1	PRJNA28991
+s__zeta_proteobacterium_SCGC_AB_604_O16	1	GCF_000372125
+s__Macroptilium_yellow_mosaic_Florida_virus	1	PRJNA14399
+s__Campylobacter_fetus	3	GCF_000015085	GCF_000222425	GCF_000174675
+s__Marburg_marburgvirus	1	PRJNA15199
+s__Escherichia_sp_TW09276	1	GCF_000208445
+s__Enterobacteria_phage_Min27	1	PRJNA29143
+s__Bifidobacterium_minimum	1	GCF_000421685
+s__Toxoplasma_gondii	1	GCA_000006565
+s__Syntrophothermus_lipocalidus	1	GCF_000092405
+s__Cassava_brown_streak_virus	1	PRJNA38085
+s__Lettuce_virus_X	1	PRJNA30177
+s__Ethanoligenens_harbinense	1	GCF_000178115
+s__Rickettsia_heilongjiangensis	1	GCF_000221205
+s__Thermus_sp_RL	1	GCF_000252835
+s__Chocolate_lily_virus_A	1	PRJNA78931
+s__Geitlerinema_sp_PCC_7105	1	GCF_000332355
+s__Rabbit_calicivirus_Australia_1_MIC_07	1	PRJNA33267
+s__Bacillus_sp_WBUNB009	1	GCF_000319735
+s__Acidovorax_sp_NO_1	1	GCF_000238595
+s__Eidolon_helvum_parvovirus_1	1	PRJNA81567
+s__Blattabacterium_sp_Mastotermes_darwiniensis	1	GCF_000233435
+s__Acidianus_filamentous_virus_1	1	PRJNA14363
+s__Apple_green_crinkle_associated_virus	1	PRJNA176615
+s__Dragonfly_associated_alphasatellite	1	PRJNA181244
+s__Streptococcus_macacae	1	GCF_000187995
+s__Agrotis_segetum_nucleopolyhedrovirus	1	PRJNA16661
+s__secondary_endosymbiont_of_Heteropsylla_cubana	1	GCF_000287355
+s__Thermovibrio_ammonificans	1	GCF_000185805
+s__Enterobacteria_phage_SP6	1	PRJNA14291
+s__Fiji_disease_virus	1	PRJNA15473
+s__Lactococcus_phage_asccphi28	1	PRJNA28985
+s__Actinobacillus_suis	1	GCF_000307145
+s__Selenomonas_sputigena	2	GCF_000160495	GCF_000208405
+s__Acholeplasma_phage_MV_L1	1	PRJNA14573
+s__Shewanella_decolorationis	1	GCF_000485795
+s__Propionibacterium_phage_P1_1	1	PRJNA177537
+s__Spiroplasma_phage_1_C74	1	PRJNA14178
+s__Sweet_potato_leaf_curl_Spain_virus	1	PRJNA30673
+s__Enterobacteria_phage_TLS	1	PRJNA19775
+s__Streptomyces_sviceus	1	GCF_000154965
+s__Burkholderia_sp_SJ98	1	GCF_000256585
+s__Sugarcane_bacilliform_MO_virus	1	PRJNA16750
+s__Marinimicrobia_bacterium_JGI_0000039_D08	1	GCF_000405265
+s__Methylobacterium_sp_GXF4	1	GCF_000272495
+s__Microbacterium_phage_Min1	1	PRJNA19961
+s__Tomato_mild_mosaic_virus	1	PRJNA30187
+s__Leptotrichia_sp_oral_taxon_215	1	GCF_000469505
+s__Actinoplanes_sp_SE50_110	1	GCF_000237145
+s__Pseudomonas_phage_PT5	1	PRJNA30847
+s__Pseudomonas_phage_PT2	1	PRJNA30851
+s__Leucobacter_salsicius	1	GCF_000350525
+s__Tobacco_mosaic_virus	1	PRJNA15071
+s__Rio_Bravo_virus	1	PRJNA15368
+s__Prevotella_pleuritidis	1	GCF_000468135
+s__Succinatimonas_hippei	1	GCF_000188195
+s__Puniceispirillum_phage_HMO_2011	1	PRJNA213071
+s__Scytonema_hofmanni	1	GCF_000346485
+s__Lactobacillus_brevis	3	GCF_000014465	GCF_000469365	GCF_000159175
+s__Rhizobium_mongolense	1	GCF_000419765
+s__Mycobacterium_phage_Dylan	1	PRJNA219120
+s__Lachnospiraceae_bacterium_5_1_57FAA	1	GCF_000218425
+s__Melon_yellow_spot_virus	1	PRJNA17545
+s__Citrobacter_koseri	1	GCF_000018045
+s__Bluetongue_virus	1	PRJNA14938
+s__Beet_soil_borne_virus	1	PRJNA14751
+s__Methanococcus_maripaludis	5	GCF_000220645	GCF_000017225	GCF_000011585	GCF_000018485	GCF_000016125
+s__Prevotella_veroralis	2	GCF_000162935	GCF_000377625
+s__Fusobacterium_periodonticum	4	GCF_000297655	GCF_000158215	GCF_000163935	GCF_000160475
+s__Selenomonas_bovis	1	GCF_000381005
+s__Lymantria_xylina_MNPV	1	PRJNA46671
+s__Cellulophaga_phage_phi12a_1	1	PRJNA212956
+s__Thermotoga_neapolitana	1	GCF_000018945
+s__Hantaan_virus	1	PRJNA14929
+s__Mycobacterium_phage_Qyrzula	1	PRJNA17173
+s__Phaius_virus_X	1	PRJNA28617
+s__Bordetella_phage_BMP_1	1	PRJNA14358
+s__Nitrosococcus_halophilus	1	GCF_000024725
+s__Mycoplasma_columbinum	1	GCF_000222995
+s__Paprika_mild_mottle_virus	1	PRJNA14935
+s__Mycobacterium_phage_Myrna	1	PRJNA31279
+s__Euphorbia_mosaic_virus_associated_DNA_1	1	PRJNA59505
+s__Streptococcus_ratti	2	GCF_000286075	GCF_000347915
+s__Thermoanaerobacter_sp_X561	1	GCF_000175775
+s__Burkholderia_phage_phiE202	1	PRJNA19163
+s__Vibrio_anguillarum	4	GCF_000217675	GCF_000462975	GCF_000257165	GCF_000257185
+s__Microchaete_sp_PCC_7126	1	GCF_000332295
+s__Equine_papillomavirus_2	1	PRJNA34709
+s__Equine_papillomavirus_3	1	PRJNA163309
+s__Ageratum_enation_alphasatellite	1	PRJNA181994
+s__Prochlorococcus_phage_P_SSP10	1	PRJNA195499
+s__South_African_cassava_mosaic_virus	1	PRJNA14179
+s__Actinomyces_sp_oral_taxon_849	1	GCF_000239715
+s__Actinomyces_sp_oral_taxon_848	1	GCF_000162895
+s__Oscillibacter_sp_1_3	1	GCF_000403435
+s__Synechococcus_sp_JA_2_3B_a_2_13	1	GCF_000013225
+s__Pedobacter_sp_BAL39	1	GCF_000170795
+s__actinobacterium_SCGC_AAA027_L06	1	GCF_000294575
+s__Haemophilus_parahaemolyticus	1	GCF_000262265
+s__Bordetella_pertussis	35	GCF_000479895	GCF_000479415	GCF_000193515	GCF_000479835	GCF_000212975	GCF_000479395	GCF_000479715	GCF_000479675	GCF_000479455	GCF_000479695	GCF_000193535	GCF_000193595	GCF_000479535	GCF_000479555	GCF_000479495	GCF_000479595	GCF_000479795	GCF_000193575	GCF_000195715	GCF_000479475	GCF_000479855	GCF_000479915	GCF_000479635	GCF_000479815	GCF_000504325	GCF_000479875	GCF_000479575	GCF_000193555	GCF_000479755	GCF_000479515	GCF_000479775	GCF_000479615	GCF_000479435	GCF [...]
+s__Caulobacter_sp_JGI_0001013_O16	1	GCF_000376365
+s__Scardovia_wiggsiae	2	GCF_000275805	GCF_000269605
+s__Brucella_inopinata	1	GCF_000182725
+s__Cellvibrio_sp_BR	1	GCF_000263355
+s__Sulfolobales_Mexican_rudivirus_1	1	PRJNA179431
+s__Rosellinia_necatrix_victorivirus_1	1	PRJNA209362
+s__planctomycete_KSU_1	1	GCF_000296795
+s__Halonotius_sp_J07HN4	1	GCF_000416065
+s__Pseudomonas_phage_MP38	1	PRJNA32995
+s__Halonotius_sp_J07HN6	1	GCF_000416025
+s__Vibrio_phage_VBP47	1	PRJNA195493
+s__Facklamia_languida	1	GCF_000245795
+s__Torque_teno_canis_virus	1	PRJNA48141
+s__Porcine_circovirus_type_1_2a	1	PRJNA45807
+s__Actinomyces_massiliensis	2	GCF_000296275	GCF_000269805
+s__Ideonella_sp_B508_1	1	GCF_000333615
+s__Tobacco_leaf_curl_Yunnan_virus_associated_DNA_1	1	PRJNA15482
+s__Paenibacillus_popilliae	1	GCF_000315235
+s__Thalassolituus_oleivorans	1	GCF_000355675
+s__Porcine_circovirus_2	1	PRJNA15442
+s__Porcine_circovirus_1	1	PRJNA14053
+s__Exiguobacterium_sp_S17	1	GCF_000411915
+s__Trichophyton_verrucosum	1	GCA_000151505
+s__Bacteroides_sp_D2	1	GCF_000159075
+s__Bacteroides_sp_D1	1	GCF_000157095
+s__gamma_proteobacterium_IMCC1989	1	GCF_000209515
+s__Marinithermus_hydrothermalis	1	GCF_000195335
+s__Halomonas_sp_TD01	1	GCF_000219565
+s__Burkholderia_pyrrocinia	1	GCF_000297475
+s__Neisseria_sp_oral_taxon_014	1	GCF_000090875
+s__Megasphaera_sp_BV3C16_1	1	GCF_000478965
+s__European_brown_hare_syndrome_virus	1	PRJNA15087
+s__Pandoravirus_salinus	1	PRJNA215788
+s__Cellulophaga_phage_phi38_1	1	PRJNA212958
+s__Shewanella_sp_MR_7	1	GCF_000014665
+s__Shewanella_sp_MR_4	1	GCF_000014685
+s__Flexithrix_dorotheae	1	GCF_000379765
+s__Mycoplasma_alkalescens	1	GCF_000367445
+s__Shigella_phage_Shfl2	1	PRJNA66347
+s__Halobiforma_lacisalsi	2	GCF_000226975	GCF_000336655
+s__Prevotella_nanceiensis	1	GCF_000379965
+s__Desulfohalobium_retbaense	1	GCF_000024325
+s__Nocardia_phage_NBR1	1	PRJNA80925
+s__Saccharopolyspora_erythraea	3	GCF_000171635	GCF_000448385	GCF_000062885
+s__Acidithiobacillus_ferrivorans	1	GCF_000214095
+s__Bacillus_megaterium	4	GCF_000025805	GCF_000225265	GCF_000025825	GCF_000334875
+s__Spiroplasma_chrysopicola	1	GCF_000400935
+s__Drosophila_melanogaster_sigmavirus	1	PRJNA40127
+s__Alkalilimnicola_ehrlichii	1	GCF_000014785
+s__Mesorhizobium_loti	51	GCF_000504265	GCF_000502715	GCF_000502415	GCF_000502835	GCF_000502355	GCF_000503035	GCF_000502915	GCF_000009625	GCF_000502995	GCF_000502975	GCF_000502955	GCF_000502575	GCF_000502675	GCF_000502935	GCF_000502735	GCF_000502475	GCF_000502375	GCF_000502555	GCF_000502215	GCF_000503135	GCF_000503155	GCF_000502895	GCF_000502795	GCF_000503015	GCF_000503095	GCF_000502335	GCF_000502815	GCF_000502455	GCF_000502615	GCF_000502495	GCF_000502435	GCF_000502515	GCF_000502295	GCF_0 [...]
+s__Coconut_tinangaja_viroid	1	PRJNA14662
+s__Rhodobacterales_bacterium_Y4I	1	GCF_000156135
+s__J_virus	1	PRJNA15892
+s__Mycobacterium_phage_Murphy	1	PRJNA206024
+s__Mycobacterium_phage_HINdeR	1	PRJNA206031
+s__Odontoglossum_ringspot_virus	1	PRJNA15201
+s__Ludwigia_yellow_vein_virus_associated_DNA_beta	1	PRJNA15561
+s__Mycobacterium_vaccae	1	GCF_000295825
+s__Wongabel_virus	1	PRJNA33129
+s__Rickettsia_japonica	1	GCF_000283595
+s__Japanese_holly_fern_mottle_virus	1	PRJNA40117
+s__Eubacterium_cylindroides	1	GCF_000469305
+s__Bacillus_sp_10403023	1	GCF_000285535
+s__Burkholderia_sp_CCGE1001	1	GCF_000176935
+s__Burkholderia_sp_CCGE1002	1	GCF_000092885
+s__Burkholderia_sp_CCGE1003	1	GCF_000148685
+s__Halobacterium_salinarum	2	GCF_000069025	GCF_000006805
+s__Cherry_necrotic_rusty_mottle_virus	1	PRJNA14729
+s__Labrenzia_aggregata	1	GCF_000168975
+s__Pseudomonas_sp_HYS	1	GCF_000259195
+s__Cellulophaga_lytica	1	GCF_000190595
+s__Acinetobacter_phage_Abp1	1	PRJNA206470
+s__Digitaria_ciliaris_striate_mosaic_virus	1	PRJNA174778
+s__Rickettsia_canadensis	2	GCF_000283915	GCF_000014345
+s__Neurospora_crassa	1	GCA_000182925
+s__Acetobacter_aceti	2	GCF_000379545	GCF_000193495
+s__Anaerotruncus_colihominis	1	GCF_000154565
+s__Janthinobacterium_sp_HH01	1	GCF_000335815
+s__Bacillus_phage_Finn	1	PRJNA192875
+s__Geovibrio_sp_L21_Ace_BES	1	GCF_000421105
+s__Felis_catus_papillomavirus_4	1	PRJNA221115
+s__Thermosynechococcus_elongatus	1	GCF_000011345
+s__Felis_catus_papillomavirus_3	1	PRJNA207833
+s__Burkholderia_vietnamiensis	1	GCF_000016205
+s__Prevotella_timonensis	1	GCF_000177055
+s__Cowpea_mild_mottle_virus	1	PRJNA60623
+s__Pseudomonas_sp_GM74	1	GCF_000282455
+s__Pseudomonas_sp_GM79	1	GCF_000282495
+s__Nitrosopumilus_sp_AR	1	GCF_000328925
+s__Leuconostoc_pseudomesenteroides	2	GCF_000185065	GCF_000297375
+s__Commelina_yellow_mottle_virus	1	PRJNA14575
+s__Reston_ebolavirus	1	PRJNA15006
+s__Walleye_dermal_sarcoma_virus	1	PRJNA14718
+s__Acidithiobacillus_thiooxidans	1	GCF_000227215
+s__Fibrella_aestuarina	1	GCF_000331105
+s__Acinetobacter_sp_ANC_3862	1	GCF_000369565
+s__Thermobaculum_terrenum	1	GCF_000025005
+s__Peptoniphilus_sp_oral_taxon_386	1	GCF_000090945
+s__Arthrospira_maxima	1	GCF_000173555
+s__Gordonia_amicalis	1	GCF_000332995
+s__Ruminococcus_flavefaciens	2	GCF_000174895	GCF_000247525
+s__Vibrio_phage_VP93	1	PRJNA37885
+s__Leishmania_donovani	1	GCA_000227135
+s__Enterobacteria_phage_HK544	1	PRJNA183160
+s__Enterobacteria_phage_HK542	1	PRJNA183159
+s__Buchnera_aphidicola	13	GCF_000007725	GCF_000007365	GCF_000217635	GCF_000225465	GCF_000183245	GCF_000183305	GCF_000183285	GCF_000090965	GCF_000174075	GCF_000021065	GCF_000021085	GCF_000225445	GCF_000183225
+s__Dorea_longicatena	1	GCF_000154065
+s__Goose_circovirus	1	PRJNA14125
+s__Enterobacteria_phage_PRD1	1	PRJNA14062
+s__Ageratum_yellow_vein_Hualian_virus	1	PRJNA30057
+s__Sweet_potato_virus_2	1	PRJNA167581
+s__Lactobacillus_phage_Lc_Nu	2	PRJNA14475	PRJNA16114
+s__Pseudoalteromonas_spongiae	1	GCF_000238255
+s__Rhodococcus_phage_REQ1	1	PRJNA81177
+s__Snake_parvovirus_1	1	PRJNA14477
+s__Parabacteroides_sp_ASF519	1	GCF_000364265
+s__Nodosilinea_nodulosa	1	GCF_000309385
+s__Vibrio_coralliilyticus	3	GCF_000461895	GCF_000195475	GCF_000176135
+s__Catenulispora_acidiphila	1	GCF_000024025
+s__Phocoena_phocoena_papillomavirus_1	1	PRJNA168666
+s__Phocoena_phocoena_papillomavirus_2	1	PRJNA168667
+s__Phocoena_phocoena_papillomavirus_4	1	PRJNA168668
+s__Enterococcus_sp_GMD3E	1	GCF_000296915
+s__Sweet_potato_virus_G	1	PRJNA169624
+s__Sweet_potato_virus_C	1	PRJNA60649
+s__Enterobacteria_phage_K1F	1	PRJNA15880
+s__Desulfobulbus_sp_oral_taxon_041	2	GCF_000349365	GCF_000349345
+s__Geminocystis_herdmanii	1	GCF_000332235
+s__Cycad_leaf_necrosis_virus	1	PRJNA30835
+s__Aureimonas_ureilytica	1	GCF_000382705
+s__gamma_proteobacterium_HIMB55	1	GCF_000227505
+s__Anaeromyxobacter_sp_K	1	GCF_000020805
+s__Bovine_parainfluenza_virus_3	1	PRJNA15001
+s__Beggiatoa_sp_SS	1	GCF_000170695
+s__Feline_bocavirus	1	PRJNA162493
+s__Sheeppox_virus	1	PRJNA14196
+s__Gordonibacter_pamelaeae	1	GCF_000210055
+s__Solibacillus_silvestris	1	GCF_000271325
+s__Bovine_respiratory_syncytial_virus	1	PRJNA14697
+s__Streptococcus_sp_AS14	1	GCF_000286495
+s__Listeria_phage_B025	1	PRJNA20795
+s__Selenomonas_noxia	2	GCF_000160555	GCF_000234135
+s__Mycoplasma_bovis	3	GCF_000219375	GCF_000270525	GCF_000183385
+s__Flavobacteriales_bacterium_ALC_1	1	GCF_000171875
+s__Stigmatella_aurantiaca	1	GCF_000165485
+s__Mycoplasma_agalactiae	3	GCF_000063605	GCF_000266865	GCF_000089865
+s__Acinetobacter_junii	5	GCF_000162075	GCF_000368665	GCF_000302355	GCF_000368745	GCF_000368765
+s__Salinibacter_ruber	2	GCF_000090405	GCF_000013045
+s__Helcococcus_kunzii	1	GCF_000245755
+s__Saimiriine_herpesvirus_1	1	PRJNA54017
+s__Streptomyces_sp_HPH0547	1	GCF_000411495
+s__Pseudomonas_phage_M6	1	PRJNA16387
+s__Staphylococcus_phage_vB_SauM_Romulus	1	PRJNA195528
+s__Gemmata_obscuriglobus	1	GCF_000171775
+s__Methanothermus_fervidus	1	GCF_000166095
+s__Campylobacter_coli	50	GCF_000254015	GCF_000253535	GCF_000253595	GCF_000253455	GCF_000470055	GCF_000253515	GCF_000253895	GCF_000254035	GCF_000254195	GCF_000253435	GCF_000254155	GCF_000253415	GCF_000254235	GCF_000254175	GCF_000253635	GCF_000254075	GCF_000253475	GCF_000253655	GCF_000253575	GCF_000464875	GCF_000253695	GCF_000253915	GCF_000253975	GCF_000254095	GCF_000253995	GCF_000253875	GCF_000253815	GCF_000253835	GCF_000254135	GCF_000253555	GCF_000253495	GCF_000505625	GCF_000254215	GCF_0 [...]
+s__Marinomonas_mediterranea	1	GCF_000192865
+s__Lucerne_transient_streak_virus	1	PRJNA15337
+s__Gordonia_rubripertincta	1	GCF_000327325
+s__Spiroplasma_diminutum	1	GCF_000439455
+s__Chaetoceros_socialis_f_radians_RNA_virus_01	1	PRJNA34845
+s__Shigella_flexneri	25	GCF_000213755	GCF_000252895	GCF_000022245	GCF_000007405	GCF_000013585	GCF_000213475	GCF_000268165	GCF_000268085	GCF_000267985	GCF_000183785	GCF_000213715	GCF_000213435	GCF_000213675	GCF_000213495	GCF_000268025	GCF_000193935	GCF_000213695	GCF_000281795	GCF_000296305	GCF_000268065	GCF_000217895	GCF_000268245	GCF_000006925	GCF_000213735	GCF_000213455
+s__Bacillus_sp_B14905	1	GCF_000169315
+s__Thermocrinis_albus	1	GCF_000025605
+s__Methanolobus_tindarius	1	GCF_000504205
+s__Halanaerobium_praevalens	1	GCF_000165465
+s__Microviridae_phi_CA82	1	PRJNA70009
+s__Pantoea_phage_LIMElight	1	PRJNA181079
+s__Enterobacteria_phage_ES18	1	PRJNA15174
+s__Sida_golden_mosaic_Florida_virus	1	PRJNA51627
+s__Lactobacillus_amylolyticus	1	GCF_000178475
+s__Sweetpotato_badnavirus_B	1	PRJNA38241
+s__Corynebacterium_ciconiae	1	GCF_000372385
+s__Streptococcus_pseudopneumoniae	7	GCF_000506745	GCF_000258265	GCF_000506665	GCF_000506705	GCF_000257825	GCF_000221985	GCF_000506685
+s__Rhodanobacter_fulvus	1	GCF_000264315
+s__Anaerococcus_obesiensis	1	GCF_000311745
+s__Desulfovibrio_sp_X2	1	GCF_000422205
+s__Nocardiopsis_alba	2	GCF_000341225	GCF_000294515
+s__Legionella_longbeachae	2	GCF_000091785	GCF_000176095
+s__Fusobacterium_sp_oral_taxon_370	1	GCF_000235465
+s__Mycobacterium_phage_Contagion	1	PRJNA215114
+s__Lactococcus_phage_bIL312	1	PRJNA14113
+s__Arthrobacter_sp_SJCon	1	GCF_000332815
+s__Artemisia_virus_A	1	PRJNA165739
+s__African_green_monkey_polyomavirus	1	PRJNA15320
+s__Soybean_mosaic_virus	1	PRJNA15377
+s__Palyam_virus	1	PRJNA14923
+s__Bean_chlorosis_virus	1	PRJNA182753
+s__Mycobacterium_phage_SargentShorty9	1	PRJNA219108
+s__Deformed_wing_virus	2	PRJNA14891	PRJNA14957
+s__Human_T_lymphotropic_virus_4	1	PRJNA33481
+s__Methyloversatilis_universalis	3	GCF_000385375	GCF_000214035	GCF_000378945
+s__Black_raspberry_necrosis_virus	1	PRJNA17093
+s__Bradyrhizobium_sp_BTAi1	1	GCF_000015165
+s__Torque_teno_mini_virus_4	1	PRJNA48179
+s__Capnocytophaga_ochracea	3	GCF_000023285	GCF_000277585	GCF_000183985
+s__Torque_teno_mini_virus_6	1	PRJNA48189
+s__Sphingopyxis_baekryungensis	1	GCF_000420305
+s__Torque_teno_mini_virus_1	1	PRJNA48193
+s__Torque_teno_mini_virus_2	1	PRJNA48171
+s__Torque_teno_mini_virus_3	1	PRJNA48175
+s__Torque_teno_mini_virus_8	1	PRJNA48135
+s__Dyadobacter_beijingensis	1	GCF_000382205
+s__Bovine_viral_diarrhea_virus_3	1	PRJNA38557
+s__Bovine_viral_diarrhea_virus_2	1	PRJNA15089
+s__Bovine_viral_diarrhea_virus_1	1	PRJNA15305
+s__Herbaspirillum_seropedicae	3	GCF_000300435	GCF_000143225	GCF_000300415
+s__Providence_virus	1	PRJNA48417
+s__Clostridium_phage_phiSM101	1	PRJNA58117
+s__Arthrobacter_sp_161MFSha2_1	1	GCF_000374945
+s__Marinomonas_posidonica	1	GCF_000214215
+s__Pectobacterium_carotovorum	4	GCF_000173135	GCF_000294535	GCF_000023605	GCF_000173155
+s__Gastropod_associated_circular_ssDNA_virus	1	PRJNA192606
+s__Thioalkalivibrio_sp_ALMg13_2	1	GCF_000381185
+s__Western_equine_encephalitis_virus	1	PRJNA14831
+s__Taura_syndrome_virus	1	PRJNA14713
+s__Carnation_Italian_ringspot_virus	1	PRJNA15077
+s__Pseudoalteromonas_flavipulchra	1	GCF_000259115
+s__Staphylococcus_phage_phi5967PVL	1	PRJNA184165
+s__Mycobacterium_phage_DNAIII	1	PRJNA213079
+s__Pepper_yellow_mosaic_virus	1	PRJNA50567
+s__Pepper_severe_mosaic_virus	1	PRJNA17809
+s__Tomato_mosaic_Havana_virus	1	PRJNA14188
+s__East_African_cassava_mosaic_Zanzibar_virus	1	PRJNA14526
+s__Agrobacterium_phage_7_7_1	1	PRJNA181226
+s__Dragonfly_associated_microphage_1	1	PRJNA177547
+s__Vibrio_phage_vB_VchM_138	1	PRJNA181217
+s__Klebsiella_sp_OBRC7	1	GCF_000293135
+s__Bacillus_weihenstephanensis	1	GCF_000018825
+s__Bean_calico_mosaic_virus	1	PRJNA14165
+s__Longispora_albida	1	GCF_000379825
+s__Photorhabdus_luminescens	1	GCF_000196155
+s__Escherichia_phage_Cba120	1	PRJNA81001
+s__Firmicutes_bacterium_M10_2	1	GCF_000403415
+s__Staphylococcus_delphini	1	GCF_000308115
+s__Veillonella_sp_ACP1	1	GCF_000286635
+s__Acinetobacter_lwoffii	9	GCF_000369145	GCF_000369105	GCF_000368165	GCF_000248355	GCF_000301755	GCF_000369125	GCF_000487975	GCF_000219275	GCF_000162095
+s__Cupriavidus_sp_BIS7	1	GCF_000292345
+s__Anabaena_sp_PCC_7108	1	GCF_000332135
+s__Passiflora_latent_carlavirus	1	PRJNA17487
+s__Hop_mosaic_virus	1	PRJNA29191
+s__Burkholderia_phymatum	1	GCF_000020045
+s__Alistipes_shahii	1	GCF_000210575
+s__Nipah_virus	1	PRJNA15443
+s__Tomato_leaf_curl_Java_virus	1	PRJNA14296
+s__Oceanobacillus_sp_Ndiop	1	GCF_000285495
+s__Geopsychrobacter_electrodiphilus	1	GCF_000384395
+s__Psychrobacter_sp_PAMC_21119	1	GCF_000247495
+s__Alicyclobacillus_acidoterrestris	1	GCF_000444055
+s__Pantoea_stewartii	1	GCF_000248395
+s__Paenibacillus_sanguinis	1	GCF_000374825
+s__Acinetobacter_sp_TG19627	1	GCF_000302415
+s__Oscillatoria_acuminata	1	GCF_000317105
+s__Ambystoma_tigrinum_virus	1	PRJNA14364
+s__Tomato_leaf_curl_China_virus	1	PRJNA14342
+s__American_plum_line_pattern_virus	1	PRJNA14742
+s__Streptococcus_urinalis	2	GCF_000188055	GCF_000314815
+s__Metallosphaera_yellowstonensis	1	GCF_000243315
+s__Sideroxydans_lithotrophicus	1	GCF_000025705
+s__Haloarcula_marismortui	1	GCF_000011085
+s__Mobuck_virus	1	PRJNA225930
+s__Banana_bunchy_top_virus	1	PRJNA14621
+s__Helicobacter_felis	1	GCF_000200595
+s__Pipapillomavirus_1	1	PRJNA18011
+s__Pipapillomavirus_2	2	PRJNA18259	PRJNA50561
+s__Vulcanisaeta_distributa	1	GCF_000148385
+s__Salmonella_phage_SFP10	1	PRJNA74351
+s__Passion_fruit_mosaic_virus	1	PRJNA67109
+s__Pepper_leaf_curl_Lahore_virus	1	PRJNA89655
+s__Pseudomonas_brassicacearum	1	GCF_000194805
+s__Candiru_virus	1	PRJNA65423
+s__Neodiprion_sertifer_nucleopolyhedrovirus	1	PRJNA14383
+s__Helicobasidium_mompa_totivirus_1_17	1	PRJNA14918
+s__Porcine_bocavirus_4	1	PRJNA73549
+s__Acidocella_sp_MX_AZ02	1	GCF_000306035
+s__Bacillus_phage_SPbeta	1	PRJNA14034
+s__Paralichthys_olivaceus_birnavirus	1	PRJNA21035
+s__Acinetobacter_sp_NIPH_1867	1	GCF_000369545
+s__Staphylococcus_pasteuri	1	GCF_000494875
+s__Bacillus_phage_PM1	1	PRJNA195536
+s__Blackcurrant_reversion_virus	1	PRJNA14749
+s__Astrovirus_wild_boar_WBAstV_1_2011_HUN	1	PRJNA84401
+s__Leifsonia_xyli	2	GCF_000470775	GCF_000007665
+s__Rhodococcus_imtechensis	1	GCF_000260815
+s__Caulobacter_phage_CcrSwift	1	PRJNA179423
+s__Halobacillus_sp_BAB_2008	1	GCF_000328325
+s__Clerodendron_yellow_mosaic_virus	1	PRJNA19599
+s__Okra_yellow_vein_mosaic_virus	1	PRJNA14266
+s__Acinetobacter_sp_NIPH_809	1	GCF_000367945
+s__Dickeya_phage_Limestone	1	PRJNA185317
+s__Proteus_penneri	1	GCF_000155835
+s__Camelus_dromedarius_papillomavirus_type_2	1	PRJNA64599
+s__Camelus_dromedarius_papillomavirus_type_1	1	PRJNA64597
+s__Asclepias_asymptomatic_virus	1	PRJNA66899
+s__Caulobacter_phage_CcrRogue	1	PRJNA179422
+s__Ferrimonas_balearica	1	GCF_000148645
+s__Bacillus_sp_123MFChir2	1	GCF_000383235
+s__Lactobacillus_ingluviei	1	GCF_000312405
+s__Mycobacterium_phage_LittleCherry	1	PRJNA215674
+s__Peptoniphilus_sp_ph5	1	GCF_000311825
+s__Enterobacteria_phage_RB16	1	PRJNA51699
+s__Ruegeria_lacuscaerulensis	1	GCF_000161775
+s__Enterobacteria_phage_RB14	1	PRJNA37825
+s__Tomato_yellow_leaf_curl_virus_associated_DNA_beta	1	PRJNA28045
+s__Prevotella_disiens	2	GCF_000467875	GCF_000179675
+s__Acinetobacter_parvus	3	GCF_000248155	GCF_000368005	GCF_000368025
+s__Rhodopirellula_maiorica	1	GCF_000346295
+s__Vibrio_sp_RC586	1	GCF_000176715
+s__Acinetobacter_phage_133	1	PRJNA64541
+s__Brevibacterium_casei	1	GCF_000314575
+s__Rhodococcus_rhodochrous	1	GCF_000239135
+s__Mycobacterium_phage_Newman	1	PRJNA206032
+s__Prevotella_sp_BV3P1	1	GCF_000479005
+s__Chimpanzee_polyomavirus	1	PRJNA60731
+s__Veillonella_sp_3_1_44	1	GCF_000163715
+s__Mushroom_bacilliform_virus	1	PRJNA14676
+s__Machupo_virus	1	PRJNA14931
+s__Agrotis_segetum_granulovirus	1	PRJNA14481
+s__Nocardiopsis_chromatogenes	1	GCF_000341185
+s__Clavispora_lusitaniae	1	GCA_000003835
+s__Sphingobium_sp_YL23	1	GCF_000412635
+s__Tomato_leaf_curl_Patna_betasatellite	1	PRJNA36541
+s__Paracoccus_sp_N5	1	GCF_000371965
+s__Pepper_leaf_curl_Yunnan_virus_satellite_DNA_beta	1	PRJNA29415
+s__Grapevine_fanleaf_virus	1	PRJNA15286
+s__Parrot_hepatitis_B_virus	1	PRJNA80909
+s__Cherry_rusty_mottle_associated_virus	1	PRJNA196970
+s__Streptococcus_minor	1	GCF_000377005
+s__Desulfitobacterium_dichloroeliminans	1	GCF_000243135
+s__Sulfolobus_islandicus	20	GCF_000245095	GCF_000245155	GCF_000024305	GCF_000245215	GCF_000364745	GCF_000022385	GCF_000022405	GCF_000245235	GCF_000245275	GCF_000189555	GCF_000022425	GCF_000022445	GCF_000245135	GCF_000022465	GCF_000245195	GCF_000245255	GCF_000245175	GCF_000022485	GCF_000245115	GCF_000189575
+s__Afipia_clevelandensis	1	GCF_000336555
+s__Nitrobacter_hamburgensis	1	GCF_000013885
+s__Deinococcus_sp_2009	1	GCF_000419625
+s__Narcissus_symptomless_virus	1	PRJNA18071
+s__Acinetobacter_sp_GG2	1	GCF_000292385
+s__Peristrophe_mosaic_virus	1	PRJNA178459
+s__Barbel_circovirus	1	PRJNA65821
+s__Pseudoalteromonas_piscicida	2	GCF_000238315	GCF_000382005
+s__Colombian_datura_virus	1	PRJNA185274
+s__Thioalkalivibrio_sp_ALD1	1	GCF_000381245
+s__Nocardiopsis_gilva	1	GCF_000341165
+s__Salmonella_phage_L13	1	PRJNA206468
+s__Cellulophaga_phage_phi48_2	1	PRJNA212947
+s__Lactobacillus_equi	1	GCF_000504525
+s__Physalis_mottle_virus	1	PRJNA15090
+s__Phaeospirillum_fulvum	1	GCF_000442515
+s__Enterobacteria_phage_mEp460	1	PRJNA183148
+s__Feldmannia_species_virus	1	PRJNA31093
+s__Methylobacterium_populi	1	GCF_000019945
+s__Blainvillea_yellow_spot_virus	1	PRJNA30183
+s__Miscanthus_streak_virus	1	PRJNA14151
+s__Cyanothece_sp_PCC_8801	1	GCF_000021805
+s__Cyanothece_sp_PCC_8802	1	GCF_000024045
+s__Sugarcane_bacilliform_virus	1	PRJNA41599
+s__Maize_chlorotic_mottle_virus	1	PRJNA15117
+s__Bacteroides_sp_2_1_56FAA	1	GCF_000218345
+s__Pseudomonas_sp_GM50	1	GCF_000282375
+s__Beet_pseudoyellows_virus	1	PRJNA14901
+s__Methanoregula_boonei	1	GCF_000017625
+s__Pseudomonas_sp_GM55	1	GCF_000282395
+s__Nitrosospira_sp_APG3	1	GCF_000355765
+s__Rhodobacter_sp_SW2	1	GCF_000176015
+s__Clostridium_saccharoperbutylacetonicum	2	GCF_000334435	GCF_000340885
+s__Glaciecola_pallidula	1	GCF_000315035
+s__Potato_spindle_tuber_viroid	1	PRJNA14966
+s__Leishmania_mexicana	1	GCA_000234665
+s__Enterobacteria_phage_Hgal1	1	PRJNA184161
+s__Tetragenococcus_halophilus	1	GCF_000283615
+s__Corynebacterium_massiliense	1	GCF_000420605
+s__Frog_virus_3	1	PRJNA14560
+s__Ectromelia_virus	1	PRJNA14211
+s__Microscilla_marina	1	GCF_000169175
+s__Campylobacter_concisus	1	GCF_000017725
+s__Soil_borne_cereal_mosaic_virus	1	PRJNA14693
+s__Apple_stem_pitting_virus	1	PRJNA14744
+s__Brevibacterium_sp_JC43	1	GCF_000285835
+s__Brugmansia_suaveolens_mottle_virus	1	PRJNA52813
+s__Bombyx_mori_cypovirus_1_satellite_RNA	1	PRJNA14557
+s__Enterobacteria_phage_K30	1	PRJNA68413
+s__Pyrolobus_fumarii	1	GCF_000223395
+s__Pseudomonas_phage_F10	1	PRJNA16383
+s__Listeria_phage_P35	1	PRJNA20799
+s__Enterobacter_sp_MGH_34	1	GCF_000492895
+s__Paracoccus_aminophilus	1	GCF_000444995
+s__Sauropus_leaf_curl_disease_associated_DNA_beta	1	PRJNA176433
+s__Enterobacter_sp_MGH_38	1	GCF_000492855
+s__Gillisia_marina	1	GCF_000258765
+s__Enterobacteria_phage_HX01	1	PRJNA177542
+s__Actinobacillus_pleuropneumoniae	15	GCF_000178575	GCF_000016685	GCF_000178655	GCF_000020405	GCF_000178495	GCF_000178535	GCF_000179295	GCF_000178615	GCF_000295915	GCF_000178635	GCF_000015885	GCF_000178515	GCF_000179275	GCF_000178555	GCF_000178595
+s__Nocardiopsis_potens	1	GCF_000341105
+s__Thermoanaerobacter_mathranii	1	GCF_000092965
+s__Freesia_mosaic_virus	1	PRJNA48387
+s__Yersinia_kristensenii	1	GCF_000173715
+s__Salmonella_phage_FSL_SP_076	1	PRJNA212719
+s__Pelotomaculum_thermopropionicum	1	GCF_000010565
+s__Siniperca_chuatsi_rhabdovirus	1	PRJNA18009
+s__Bifidobacterium_breve	7	GCF_000411435	GCF_000158015	GCF_000213865	GCF_000226175	GCF_000220135	GCF_000247755	GCF_000466545
+s__Porphyromonas_levii	1	GCF_000379925
+s__Spodoptera_litura_nucleopolyhedrovirus_II	1	PRJNA33005
+s__Corynebacterium_diphtheriae	17	GCF_000255155	GCF_000255235	GCF_000455785	GCF_000241915	GCF_000455805	GCF_000257885	GCF_000195815	GCF_000241935	GCF_000255215	GCF_000242775	GCF_000241895	GCF_000255255	GCF_000241875	GCF_000255175	GCF_000255195	GCF_000255275	GCF_000263415
+s__Torque_teno_midi_virus_2	1	PRJNA48185
+s__Torque_teno_midi_virus_1	1	PRJNA19131
+s__Amphibacillus_xylanus	1	GCF_000307165
+s__Black_queen_cell_virus	1	PRJNA14803
+s__Pseudomonas_phage_vB_PaeS_PMG1	1	PRJNA82649
+s__Peptostreptococcus_anaerobius	3	GCF_000381525	GCF_000178095	GCF_000318115
+s__Tsukamurella_paurometabola	1	GCF_000092225
+s__Burkholderia_gladioli	2	GCF_000365265	GCF_000194745
+s__Bean_yellow_disorder_virus	1	PRJNA29237
+s__Megavirus_chiliensis	1	PRJNA74349
+s__Mavirus	1	PRJNA64497
+s__Cryphonectria_parasitica_bipartite_mycovirus_1	1	PRJNA203281
+s__Klebsiella_sp_MS_92_3	1	GCF_000195655
+s__Leishmania_infantum	1	GCA_000002875
+s__Methanocaldococcus_infernus	1	GCF_000092305
+s__Bifidobacterium_pseudocatenulatum	1	GCF_000173435
+s__Lactococcus_phage_P087	1	PRJNA37887
+s__Cydia_pomonella_granulovirus	1	PRJNA14118
+s__Escherichia_phage_PhaxI	1	PRJNA181080
+s__Brevundimonas_sp_BAL3	1	GCF_000155575
+s__Pseudomonas_sp_CBZ_4	1	GCF_000346755
+s__Melandrium_yellow_fleck_virus	1	PRJNA40661
+s__Infectious_flacherie_virus	1	PRJNA14800
+s__Sweet_potato_leaf_curl_Lanzarote_virus	1	PRJNA41625
+s__Pseudoxanthomonas_spadix	1	GCF_000233915
+s__Trichodysplasia_spinulosa_associated_polyomavirus	1	PRJNA51185
+s__Asticcacaulis_sp_AC466	1	GCF_000495815
+s__Iresine_viroid_1	1	PRJNA14765
+s__Bacillus_sp_JS	1	GCF_000259365
+s__Microbacterium_yannicii	1	GCF_000304335
+s__Vibrio_phage_VFJ	1	PRJNA209358
+s__Wolinella_succinogenes	1	GCF_000196135
+s__Aurantiochytrium_single_stranded_RNA_virus_01	1	PRJNA16134
+s__Pseudomonas_phage_LKD16	1	PRJNA21043
+s__Vibrio_phage_ICP3	1	PRJNA63233
+s__Macrococcus_caseolyticus	1	GCF_000010585
+s__Pseudomonas_monteilii	1	GCF_000262005
+s__Halococcus_hamelinensis	2	GCF_000259215	GCF_000336675
+s__Tomato_mottle_virus	1	PRJNA14079
+s__Clostridium_phage_c_st	1	PRJNA16151
+s__Musca_hytrovirus	1	PRJNA29631
+s__Rhodopseudomonas_sp_B29	1	GCF_000333455
+s__Heterosigma_akashiwo_RNA_virus	1	PRJNA15425
+s__Enterobacter_aerogenes	3	GCF_000215745	GCF_000334515	GCF_000383335
+s__Salinimonas_chungwhensis	1	GCF_000378185
+s__Tomato_aspermy_virus	1	PRJNA14815
+s__Nocardia_sp_348MFTsu5_1	1	GCF_000383535
+s__Enterobacter_cloacae	19	GCF_000025565	GCF_000210775	GCF_000422225	GCF_000492455	GCF_000264705	GCF_000492715	GCF_000467655	GCF_000492495	GCF_000496775	GCF_000492435	GCF_000286275	GCF_000492675	GCF_000492575	GCF_000492615	GCF_000315775	GCF_000390425	GCF_000300455	GCF_000235765	GCF_000239975
+s__Spirosoma_linguale	1	GCF_000024525
+s__Amycolatopsis_benzoatilytica	1	GCF_000383915
+s__Acinetobacter_towneri	1	GCF_000368785
+s__Hibiscus_green_spot_virus	1	PRJNA76343
+s__Isfahan_virus	1	PRJNA194141
+s__Donghicola_sp_S598	1	GCF_000308135
+s__Mycobacterium_phage_Redno2	1	PRJNA215125
+s__Rousettus_bat_coronavirus_HKU9	1	PRJNA18867
+s__Rhodococcus_phage_REQ3	1	PRJNA81175
+s__Rhodococcus_phage_REQ2	1	PRJNA81171
+s__Bradyrhizobium_japonicum	3	GCF_000284375	GCF_000374205	GCF_000379585
+s__Pelistega_sp_HM_7	1	GCF_000506865
+s__Vibrio_phage_helene_12B3	1	PRJNA198433
+s__Pseudomonas_sp_PAMC_26793	1	GCF_000313235
+s__Rhizobium_sp_CF080	1	GCF_000282095
+s__Thermoanaerobacter_siderophilus	1	GCF_000262445
+s__Kudzu_mosaic_virus	1	PRJNA20053
+s__Pantoea_sp_At_9b	1	GCF_000175935
+s__Gill_associated_virus	1	PRJNA28679
+s__Clostridium_phage_phiCD6356	1	PRJNA64557
+s__Rhizobium_sp_CCGE_510	1	GCF_000292525
+s__Rickettsia_felis	1	GCF_000012145
+s__Candidatus_Portiera_aleyrodidarum	5	GCF_000349745	GCF_000300075	GCF_000300035	GCF_000298385	GCF_000292685
+s__Watermelon_mosaic_virus	1	PRJNA15046
+s__Halovirus_HGTV_1	1	PRJNA206496
+s__Staphylococcus_arlettae	1	GCF_000295715
+s__Peptoniphilus_lacrimalis	2	GCF_000176955	GCF_000378725
+s__Encephalitozoon_hellem	1	GCA_000277815
+s__Sphingomonas_sp_MM_1	1	GCF_000347675
+s__Aggregatibacter_sp_oral_taxon_458	1	GCF_000466335
+s__Aleutian_mink_disease_virus	1	PRJNA14077
+s__Serratia_proteamaculans	1	GCF_000018085
+s__Mycobacterium_phlei	1	GCF_000257725
+s__Xanthomonas_phage_Xp10	1	PRJNA14292
+s__Mycobacterium_phage_244	1	PRJNA17115
+s__Xanthomonas_phage_Xp15	1	PRJNA15255
+s__Lactobacillus_jensenii	7	GCF_000161895	GCF_000466805	GCF_000175035	GCF_000159335	GCF_000155915	GCF_000162435	GCF_000162335
+s__Ralstonia_solanacearum	13	GCF_000212635	GCF_000215325	GCF_000285815	GCF_000223115	GCF_000331875	GCF_000167955	GCF_000283475	GCF_000348545	GCF_000197855	GCF_000009125	GCF_000331895	GCF_000427195	GCF_000430925
+s__Acinetobacter_sp_NCTC_7422	1	GCF_000248195
+s__Polaromonas_naphthalenivorans	1	GCF_000015505
+s__Kocuria_sp_UCD_OTCP	1	GCF_000349605
+s__Chaetoceros_tenuissimus_DNA_virus	1	PRJNA60753
+s__Halovirus_PH1	1	PRJNA196975
+s__Archaeoglobus_fulgidus	1	GCF_000008665
+s__Borrelia_spielmanii	1	GCF_000181895
+s__Methanomassiliicoccus_luminyensis	1	GCF_000308215
+s__Choristoneura_fumiferana_multiple_nucleopolyhedrovirus	1	PRJNA15133
+s__Euphorbia_mosaic_virus	1	PRJNA17551
+s__Sulfolobus_virus_Kamchatka_1	1	PRJNA14355
+s__White_clover_mosaic_virus	1	PRJNA15069
+s__Pseudomonas_phage_PP7	1	PRJNA15076
+s__Papaya_leaf_crumple_virus	1	PRJNA60361
+s__Paspalum_dilatatum_striate_mosaic_virus	1	PRJNA174777
+s__Honeysuckle_yellow_vein_mosaic_virus	1	PRJNA14172
+s__Roseburia_intestinalis	3	GCF_000209995	GCF_000156535	GCF_000210655
+s__Euprosterna_elaeasa_virus	1	PRJNA14737
+s__Acidithiobacillus_ferrooxidans	2	GCF_000021485	GCF_000020825
+s__Powassan_virus	1	PRJNA15304
+s__Mycobacterium_phage_Llij	1	PRJNA17149
+s__Actinomyces_oris	1	GCF_000180155
+s__Enzootic_nasal_tumour_virus_of_goats	1	PRJNA14893
+s__Human_herpesvirus_8	1	PRJNA14158
+s__Microbacterium_laevaniformans	1	GCF_000255595
+s__Human_herpesvirus_4	2	PRJNA14413	PRJNA20959
+s__Human_herpesvirus_5	1	PRJNA14559
+s__Human_herpesvirus_7	1	PRJNA14625
+s__Pelosinus_fermentans	6	GCF_000271525	GCF_000271505	GCF_000271465	GCF_000271485	GCF_000271545	GCF_000271665
+s__Human_herpesvirus_1	1	PRJNA15217
+s__Human_herpesvirus_2	1	PRJNA15218
+s__Human_herpesvirus_3	1	PRJNA15198
+s__Brenneria_sp_EniD312	1	GCF_000225565
+s__Sphingomonas_sp_PAMC_26617	1	GCF_000242835
+s__Pseudoalteromonas_sp_BSi20429	1	GCF_000238895
+s__Begomovirus_associated_DNA_III	1	PRJNA15162
+s__Sweet_potato_caulimo_like_virus	1	PRJNA65307
+s__Metallosphaera_cuprina	1	GCF_000204925
+s__Avian_sapelovirus	1	PRJNA15039
+s__Desulfotomaculum_acetoxidans	1	GCF_000024205
+s__Rickettsiella_grylli	1	GCF_000168295
+s__Escherichia_phage_wV8	1	PRJNA38281
+s__Cricket_paralysis_virus	1	PRJNA14832
+s__Escherichia_phage_wV7	1	PRJNA181232
+s__Loktanella_cinnabarina	1	GCF_000466965
+s__Xanthomonas_sp_SHU199	1	GCF_000364665
+s__Haloferax_elongans	1	GCF_000336755
+s__Tobacco_vein_banding_mosaic_virus	1	PRJNA27895
+s__Corynebacterium_glutamicum	7	GCF_000404145	GCF_000404185	GCF_000224315	GCF_000417765	GCF_000233355	GCF_000445015	GCF_000010225
+s__Pseudomonas_veronii	1	GCF_000350565
+s__Corynebacterium_accolens	2	GCF_000159115	GCF_000146485
+s__Spiroplasma_phage_1_R8A2B	1	PRJNA14580
+s__Jonesia_denitrificans	1	GCF_000024065
+s__Listeria_fleischmannii	2	GCF_000344175	GCF_000252625
+s__Halorubrum_distributum	2	GCF_000337055	GCF_000337335
+s__Acinetobacter_sp_NIPH_3623	1	GCF_000369785
+s__Eubacteriaceae_bacterium_OBRC8	1	GCF_000293035
+s__Chili_leaf_curl_Bhatinda_betasatellite	1	PRJNA206467
+s__Nocardiopsis_lucentensis	1	GCF_000341125
+s__Konjac_mosaic_virus	1	PRJNA16643
+s__Streptomyces_bottropensis	2	GCF_000383595	GCF_000340335
+s__Bradyrhizobium_oligotrophicum	1	GCF_000344805
+s__Maize_white_line_mosaic_satellite_virus	1	PRJNA14770
+s__Paenibacillus_sp_JC66	1	GCF_000285515
+s__Actinomyces_odontolyticus	2	GCF_000154225	GCF_000163415
+s__Acinetobacter_sp_NIPH_1847	1	GCF_000369605
+s__Peanut_chlorotic_streak_virus	1	PRJNA14388
+s__Plasmodium_knowlesi	1	GCA_000006355
+s__Naegleria_gruberi	1	GCA_000004985
+s__Nitrospina_sp_SCGC_AAA288_L16	1	GCF_000372225
+s__Wheat_yellow_dwarf_virus_GPV	1	PRJNA39307
+s__Ureaplasma_urealyticum	14	GCF_000169935	GCF_000171395	GCF_000255395	GCF_000169595	GCF_000021265	GCF_000169535	GCF_000169575	GCF_000169955	GCF_000255375	GCF_000171555	GCF_000255415	GCF_000169915	GCF_000255435	GCF_000169555
+s__Lausannevirus	1	PRJNA65279
+s__Olive_latent_virus_3	1	PRJNA46223
+s__Olive_latent_virus_2	1	PRJNA14778
+s__Olive_latent_virus_1	1	PRJNA15084
+s__Xenorhabdus_nematophila	1	GCF_000252955
+s__Burkholderia_phage_BcepMigl	1	PRJNA184149
+s__Pseudomonas_sp_EGD_AK9	1	GCF_000465935
+s__Erwinia_phage_phiEaH2	1	PRJNA184155
+s__Melegrivirus_A	1	PRJNA202886
+s__Potato_virus_M	1	PRJNA15324
+s__Potato_virus_H	1	PRJNA171010
+s__Cucurbita_yellow_vein_virus_associated_DNA_beta	1	PRJNA14525
+s__Neisseria_macacae	1	GCF_000220865
+s__Potato_virus_A	1	PRJNA15376
+s__Leptospira_interrogans	189	GCF_000244135	GCF_000343435	GCF_000246595	GCF_000246475	GCF_000246455	GCF_000343085	GCF_000343715	GCF_000343145	GCF_000216815	GCF_000246215	GCF_000306135	GCF_000347115	GCF_000342865	GCF_000216635	GCF_000217495	GCF_000347055	GCF_000217095	GCF_000342925	GCF_000342705	GCF_000244375	GCF_000243955	GCF_000216355	GCF_000246275	GCF_000244635	GCF_000217575	GCF_000342665	GCF_000343795	GCF_000343675	GCF_000216975	GCF_000343775	GCF_000343595	GCF_000343165	GCF_000231175	 [...]
+s__Potato_virus_X	1	PRJNA15503
+s__Potato_virus_Y	1	PRJNA15290
+s__Potato_virus_V	1	PRJNA15379
+s__Potato_virus_T	1	PRJNA30735
+s__Xanthomonas_sacchari	1	GCF_000225975
+s__Potato_virus_S	1	PRJNA15574
+s__Hippea_maritima	1	GCF_000194135
+s__Clostridium_sticklandii	1	GCF_000196455
+s__Brucella_sp_UK5_01	1	GCF_000367105
+s__Oribacterium_sp_ACB1	1	GCF_000238055
+s__Streptomyces_sp_MspMP_M5	1	GCF_000373585
+s__Pepper_leaf_curl_virus	1	PRJNA14046
+s__Rice_gall_dwarf_virus	1	PRJNA19149
+s__Candidatus_Photodesmus_katoptron	1	GCF_000478685
+s__Candidatus_Ruthia_magnifica	1	GCF_000015105
+s__Methylocystis_rosea	1	GCF_000372845
+s__Circovirus_like_genome_SAR_A	1	PRJNA39631
+s__Mycobacterium_phage_Halo	1	PRJNA17147
+s__Circovirus_like_genome_SAR_B	1	PRJNA39607
+s__Synechococcus_phage_S_ShM2	1	PRJNA64699
+s__Black_beetle_virus	1	PRJNA14961
+s__Streptomyces_roseochromogenes	1	GCF_000497445
+s__Usutu_virus	1	PRJNA15047
+s__Atopobium_sp_BV3Ac4	1	GCF_000468815
+s__Orgyia_leucostigma_NPV	1	PRJNA28501
+s__Brachyspira_innocens	1	GCF_000384655
+s__Acinetobacter_johnsonii	5	GCF_000302335	GCF_000162055	GCF_000368045	GCF_000301735	GCF_000368805
+s__Bacillus_phage_CampHawk	1	PRJNA227116
+s__Vibrio_phage_VfO4K68	1	PRJNA14094
+s__Hollyhock_leaf_crumple_virus	1	PRJNA14206
+s__Mycobacterium_sp_JLS	1	GCF_000016005
+s__Porcine_parvovirus	1	PRJNA14055
+s__Sudan_ebolavirus	1	PRJNA15012
+s__Bacillus_sp_ZYK	1	GCF_000331575
+s__Halogeometricum_borinquense	2	GCF_000337855	GCF_000172995
+s__Haemophilus_pittmaniae	1	GCF_000223275
+s__Oribacterium_sp_oral_taxon_108	1	GCF_000214455
+s__Malvastrum_yellow_vein_Changa_Manga_virus	1	PRJNA60047
+s__Beet_curly_top_Iran_virus	1	PRJNA28973
+s__Gordonia_phage_GTE7	1	PRJNA76745
+s__Chicory_yellow_mottle_virus_large_satellite_RNA	1	PRJNA14798
+s__Gordonia_phage_GTE5	1	PRJNA78693
+s__Gordonia_phage_GTE2	1	PRJNA68415
+s__Candidatus_Methylomirabilis_oxyfera	1	GCF_000091165
+s__Taylorella_equigenitalis	2	GCF_000276685	GCF_000185745
+s__Cellulophaga_phage_phi14_2	1	PRJNA212954
+s__Equine_rhinitis_A_virus	1	PRJNA15205
+s__Microvirga_sp_WSM3557	1	GCF_000262405
+s__Dickeya_dadantii	3	GCF_000025065	GCF_000023545	GCF_000147055
+s__Bean_pod_mottle_virus	1	PRJNA15294
+s__Streptomyces_somaliensis	1	GCF_000258595
+s__Casphalia_extranea_densovirus	1	PRJNA14222
+s__Dill_cryptic_virus_1	1	PRJNA225921
+s__Clostridium_sartagoforme	1	GCF_000401215
+s__Dill_cryptic_virus_2	1	PRJNA198774
+s__Thioalkalivibrio_sp_ALJ3	1	GCF_000377205
+s__Thioalkalivibrio_sp_ALJ2	1	GCF_000378325
+s__Lachnospiraceae_bacterium_2_1_46FAA	1	GCF_000209385
+s__Thioalkalivibrio_sp_ALJ7	1	GCF_000376865
+s__Thioalkalivibrio_sp_ALJ6	1	GCF_000377365
+s__Thioalkalivibrio_sp_ALJ5	1	GCF_000377245
+s__Thioalkalivibrio_sp_ALJ4	1	GCF_000377225
+s__Thioalkalivibrio_sp_ALJ9	1	GCF_000380585
+s__Thioalkalivibrio_sp_ALJ8	1	GCF_000377385
+s__Abutilon_mosaic_virus	1	PRJNA14603
+s__Staphylococcus_phage_SAP_26	1	PRJNA51671
+s__Photobacterium_profundum	2	GCF_000196255	GCF_000153425
+s__TYLCCNV_Y322_satellite_DNA_beta	1	PRJNA16338
+s__Bacteroides_fragilis	14	GCF_000009925	GCF_000273095	GCF_000269525	GCF_000273765	GCF_000210835	GCF_000273115	GCF_000025985	GCF_000273155	GCF_000297695	GCF_000263115	GCF_000157015	GCF_000297755	GCF_000297735	GCF_000273135
+s__Streptomyces_viridosporus	1	GCF_000316095
+s__Xanthophyllomyces_dendrorhous_virus_L1A	1	PRJNA196417
+s__Bean_yellow_mosaic_virus	1	PRJNA15339
+s__Bifidobacterium_angulatum	1	GCF_000156635
+s__Desulfospira_joergensenii	1	GCF_000420085
+s__Haemophilus_sputorum	2	GCF_000287615	GCF_000238795
+s__Acinetobacter_sp_CIP_101966	1	GCF_000369725
+s__Grapevine_Syrah_virus_1	1	PRJNA36515
+s__Ageratum_leaf_curl_Cameroon_betasatellite	1	PRJNA36669
+s__Streptomyces_prunicolor	1	GCF_000367365
+s__Natrialba_phage_PhiCh1	1	PRJNA14207
+s__Candidatus_Microgenomatus_auricola	1	GCF_000380825
+s__Euphorbia_yellow_mosaic_virus	1	PRJNA36655
+s__Thioalkalivibrio_sp_ALJT	1	GCF_000381825
+s__Veillonella_atypica	3	GCF_000179735	GCF_000318355	GCF_000179755
+s__Streptomyces_sp_AA1529	1	GCF_000280905
+s__Mycobacterium_phage_Muddy	1	PRJNA215120
+s__Coleus_vein_necrosis_virus	1	PRJNA20665
+s__Tomato_leaf_curl_Cameroon_virus	1	PRJNA42743
+s__Lactobacillus_acidipiscis	1	GCF_000260635
+s__Ruegeria_sp_R11	1	GCF_000156255
+s__Corynebacterium_sp_KPL1824	1	GCF_000478095
+s__Bartonella_tribocorum	1	GCF_000196435
+s__Human_endogenous_retrovirus_K	1	PRJNA222261
+s__Endoriftia_persephone	1	GCF_000168735
+s__Coconut_cadang_cadang_viroid	1	PRJNA14629
+s__Canary_circovirus	1	PRJNA14513
+s__Kingella_kingae	3	GCF_000213535	GCF_000255635	GCF_000283375
+s__Yersinia_mollaretii	1	GCF_000167995
+s__Alcanivorax_hongdengensis	1	GCF_000300995
+s__Janibacter_hoylei	1	GCF_000297495
+s__Streptococcus_cristatus	2	GCF_000187855	GCF_000222765
+s__Panicum_mosaic_satellite_virus	1	PRJNA14816
+s__Mayaro_virus	1	PRJNA15392
+s__Fusobacterium_mortiferum	1	GCF_000158195
+s__Eggplant_latent_viroid	1	PRJNA14977
+s__Ovine_enzootic_nasal_tumor_virus	1	PRJNA15410
+s__Burkholderia_phytofirmans	1	GCF_000020125
+s__Enterobacter_sp_MGH_16	1	GCF_000493175
+s__Enterobacter_sp_MGH_14	1	GCF_000474785
+s__Staphylococcus_phage_phiPVL108	1	PRJNA18463
+s__Otomops_polyomavirus_1	1	PRJNA185192
+s__Otomops_polyomavirus_2	1	PRJNA185193
+s__Tomato_yellow_leaf_curl_Vietnam_virus_satellite_DNA_beta	1	PRJNA19829
+s__Hahella_chejuensis	1	GCF_000012985
+s__Pyrococcus_sp_ST04	1	GCF_000263735
+s__Salmonella_phage_FSL_SP_058	1	PRJNA212712
+s__Ralstonia_phage_RSB1	1	PRJNA31163
+s__Pseudomonas_savastanoi	6	GCF_000225805	GCF_000187065	GCF_000012205	GCF_000187045	GCF_000143005	GCF_000164015
+s__Catonella_morbi	1	GCF_000160035
+s__Caldalkalibacillus_thermarum	1	GCF_000218765
+s__Human_cyclovirus_VS5700009	1	PRJNA209365
+s__Acartia_tonsa_copepod_circovirus	1	PRJNA186432
+s__Great_Island_virus	1	PRJNA52641
+s__Filifactor_alocis	1	GCF_000163895
+s__Aeromonas_molluscorum	1	GCF_000388115
+s__Cupriavidus_pinatubonensis	1	GCF_000203875
+s__Klebsiella_variicola	1	GCF_000025465
+s__Tomato_yellow_leaf_curl_Kanchanaburi_virus	1	PRJNA14360
+s__Okra_yellow_vein_disease_associated_sequence	1	PRJNA14443
+s__Syntrophus_aciditrophicus	1	GCF_000013405
+s__Paenibacillus_daejeonensis	1	GCF_000378385
+s__Armigeres_subalbatus_virus_SaX06_AK20	1	PRJNA56065
+s__Serratia_sp_DD3	1	GCF_000496755
+s__Enterobacteria_phage_Chi	1	PRJNA206471
+s__Paracoccus_denitrificans	2	GCF_000203895	GCF_000219825
+s__Phascolarctobacterium_succinatutens	1	GCF_000188175
+s__Aeromonas_diversa	1	GCF_000367845
+s__Simian_foamy_virus	1	PRJNA14699
+s__Bacteriovorax_sp_BAL6_X	1	GCF_000443995
+s__Asticcacaulis_sp_AC402	1	GCF_000495835
+s__Methylocella_silvestris	1	GCF_000021745
+s__Staphylococcus_phage_SA13	1	PRJNA213078
+s__Staphylococcus_phage_SA12	1	PRJNA212955
+s__Staphylococcus_phage_SA11	1	PRJNA181242
+s__Kitasatospora_setae	1	GCF_000269985
+s__Xanthomonas_hortorum	1	GCF_000505565
+s__Aspergillus_flavus	1	GCA_000006275
+s__Sonchus_yellow_net_virus	1	PRJNA14642
+s__Papio_hamadryas_papillomavirus_type_1	1	PRJNA159111
+s__Paenibacillus_ginsengihumi	1	GCF_000380965
+s__Staphylococcus_sp_AL1	1	GCF_000292305
+s__Banana_streak_virus_strain_Acuminata_Vietnam	1	PRJNA15240
+s__Lactobacillus_sp_66c	1	GCF_000312625
+s__Thioalkalivibrio_sp_ALJ17	1	GCF_000377945
+s__Thioalkalivibrio_sp_ALJ16	1	GCF_000377345
+s__Thioalkalivibrio_sp_ALJ15	1	GCF_000383695
+s__Thioalkalivibrio_sp_ALJ12	1	GCF_000378305
+s__Thioalkalivibrio_sp_ALJ11	1	GCF_000376925
+s__Thioalkalivibrio_sp_ALJ10	1	GCF_000377305
+s__Acaryochloris_marina	1	GCF_000018105
+s__Sulfolobus_spindle_shaped_virus_4	1	PRJNA27893
+s__Enterobacteria_phage_RB32	1	PRJNA17997
+s__Staphylococcus_prophage_phiPV83	1	PRJNA14135
+s__Lactobacillus_saerimneri	1	GCF_000317165
+s__Feline_leukemia_virus	1	PRJNA14686
+s__Caulobacter_sp_JGI_0001013_D04	1	GCF_000376305
+s__Robiginitalea_biformata	1	GCF_000024125
+s__Mycobacterium_phage_DD5	1	PRJNA30513
+s__Corynebacterium_halotolerans	1	GCF_000341345
+s__Merkel_cell_polyomavirus	1	PRJNA28509
+s__Beluga_Whale_coronavirus_SW1	1	PRJNA29509
+s__Pseudomonas_phage_AF	1	PRJNA184151
+s__Desulfurispora_thermophila	1	GCF_000376385
+s__Pantoea_phage_LIMEzero	1	PRJNA67417
+s__Leishmania_RNA_virus_2_1	1	PRJNA14696
+s__Vibrio_rumoiensis	1	GCF_000286955
+s__Streptomyces_griseoaurantiacus	1	GCF_000204605
+s__Actinomyces_timonensis	1	GCF_000295095
+s__Afipia_broomeae	1	GCF_000314675
+s__Campylobacter_sp_FOBRC14	1	GCF_000287855
+s__Methanosaeta_concilii	1	GCF_000204415
+s__Brome_streak_mosaic_virus	1	PRJNA15336
+s__Vibrio_owensii	1	GCF_000400385
+s__Yokose_virus	1	PRJNA15118
+s__Truepera_radiovictrix	1	GCF_000092425
+s__Adeno_associated_virus_3	1	PRJNA14319
+s__Tetraselmis_viridis_virus_S1	1	PRJNA195496
+s__Cotton_leaf_curl_Kokhran_virus	1	PRJNA14241
+s__Neisseria_flavescens	2	GCF_000175275	GCF_000173935
+s__Okra_yellow_mosaic_Mexico_virus	1	PRJNA48103
+s__Bartonella_bovis	2	GCF_000385395	GCF_000384965
+s__Cocksfoot_mottle_virus	1	PRJNA15078
+s__Influenza_C_virus	1	PRJNA15055
+s__Methylohalobius_crimeensis	1	GCF_000421465
+s__Capsicum_chlorosis_virus	1	PRJNA17547
+s__Streptomyces_lividans	2	GCF_000403665	GCF_000158935
+s__Halogeometricum_pleomorphic_virus_1	1	PRJNA157263
+s__Simian_adenovirus_18	1	PRJNA218146
+s__Wisteria_vein_mosaic_virus	1	PRJNA15532
+s__Theileria_annulata	1	GCA_000003225
+s__Sheldgoose_hepatitis_B_virus	1	PRJNA14618
+s__Mycobacterium_phage_Gizmo	1	PRJNA206479
+s__Subterranean_clover_stunt_virus	1	PRJNA14180
+s__Oceanimonas_smirnovii	1	GCF_000381965
+s__Enterococcus_faecium	241	GCF_000394435	GCF_000394695	GCF_000396765	GCF_000250945	GCF_000295015	GCF_000392195	GCF_000396925	GCF_000322045	GCF_000395465	GCF_000396965	GCF_000415285	GCF_000415365	GCF_000394715	GCF_000394555	GCF_000295395	GCF_000295275	GCF_000321805	GCF_000295215	GCF_000394655	GCF_000321765	GCF_000392105	GCF_000392165	GCF_000395885	GCF_000295435	GCF_000321685	GCF_000295055	GCF_000295575	GCF_000397025	GCF_000392065	GCF_000392085	GCF_000295455	GCF_000294815	GCF_000407105	GC [...]
+s__Ectropis_obliqua_virus	1	PRJNA14953
+s__Ralstonia_sp_GA3_3	1	GCF_000389805
+s__Bradyrhizobium_sp_WSM471	1	GCF_000244915
+s__Corynebacterium_capitovis	1	GCF_000372085
+s__Amycolicicoccus_subflavus	1	GCF_000214175
+s__Candidatus_Desulforudis_audaxviator	1	GCF_000018425
+s__Thin_paspalum_asymptomatic_virus	1	PRJNA210800
+s__Horseradish_latent_virus	1	PRJNA177549
+s__Lactobacillus_plantarum	16	GCF_000474695	GCF_000347515	GCF_000410795	GCF_000412205	GCF_000469115	GCF_000507045	GCF_000247735	GCF_000338115	GCF_000203855	GCF_000463075	GCF_000143745	GCF_000466845	GCF_000148815	GCF_000392485	GCF_000023085	GCF_000466905
+s__Halorhodospira_halophila	1	GCF_000015585
+s__Pelodictyon_luteolum	1	GCF_000012485
+s__Idiomarina_xiamenensis	1	GCF_000299895
+s__Banana_streak_virus	1	PRJNA16747
+s__Corchorus_golden_mosaic_virus	1	PRJNA20051
+s__Tomato_golden_mosaic_virus	1	PRJNA14072
+s__Campylobacter_phage_NCTC12673	1	PRJNA66395
+s__Rice_grassy_stunt_virus	1	PRJNA14692
+s__Rhodobacteraceae_bacterium_HTCC2083	1	GCF_000156115
+s__Flavobacteriaceae_bacterium_3519_10	1	GCF_000023725
+s__Thermoanaerobacterium_saccharolyticum	1	GCF_000307585
+s__Methylopila_sp_M107	1	GCF_000384475
+s__Wolbachia_sp_wRi	1	GCF_000022285
+s__Cyanophage_SS120_1	1	PRJNA195516
+s__Nocardia_farcinica	1	GCF_000009805
+s__Campylobacter_lari	1	GCF_000019205
+s__Marinobacter_sp_ELB17	1	GCF_000169375
+s__Adoxophyes_orana_nucleopolyhedrovirus	1	PRJNA32387
+s__Halorubrum_litoreum	1	GCF_000337395
+s__Streptococcus_thermophilus	9	GCF_000284675	GCF_000011825	GCF_000014485	GCF_000253395	GCF_000011845	GCF_000182875	GCF_000262675	GCF_000335515	GCF_000335495
+s__Clover_yellow_mosaic_virus	1	PRJNA14645
+s__Tai_Forest_ebolavirus	1	PRJNA51257
+s__Emilia_yellow_vein_virus_associated_DNA_beta	1	PRJNA37893
+s__Wohlfahrtiimonas_chitiniclastica	2	GCF_000334955	GCF_000375345
+s__Candidatus_Glomeribacter_gigasporarum	1	GCF_000227585
+s__Atlantic_salmon_swim_bladder_sarcoma_virus	1	PRJNA16247
+s__Actinomyces_georgiae	1	GCF_000277685
+s__Azoarcus_sp_BH72	1	GCF_000061505
+s__Oribacterium_sp_ACB7	1	GCF_000238075
+s__Pontibacter_roseus	1	GCF_000373265
+s__Fusobacterium_ulcerans	2	GCF_000158315	GCF_000242995
+s__Apple_mosaic_virus	1	PRJNA14745
+s__Eubacterium_limosum	1	GCF_000152245
+s__Mycoplasma_pneumoniae	7	GCF_000319655	GCF_000143945	GCF_000331085	GCF_000319675	GCF_000387745	GCF_000027345	GCF_000283755
+s__Streptomyces_sulphureus	2	GCF_000262345	GCF_000381025
+s__Scardovia_inopinata	1	GCF_000163755
+s__Desulfovibrio_aespoeensis	1	GCF_000176915
+s__Aggregatibacter_phage_S1249	1	PRJNA41333
+s__Bacillus_siamensis	1	GCF_000262045
+s__Tomato_common_mosaic_virus	1	PRJNA30185
+s__Bartonella_birtlesii	3	GCF_000296235	GCF_000273375	GCF_000278095
+s__Mesorhizobium_ciceri	1	GCF_000185905
+s__Arthrobacter_sp_Rue61a	1	GCF_000294695
+s__Megasphaera_sp_UPII_135_E	1	GCF_000221545
+s__Neosartorya_fischeri	1	GCA_000149645
+s__Heliobacterium_modesticaldum	1	GCF_000019165
+s__Anaerofustis_stercorihominis	1	GCF_000154825
+s__Oribacterium_sp_ACB8	1	GCF_000277505
+s__Citrobacter_rodentium	1	GCF_000027085
+s__Pelobacter_propionicus	1	GCF_000015045
+s__Parasutterella_excrementihominis	1	GCF_000205025
+s__Anaeromyxobacter_sp_Fw109_5	1	GCF_000017505
+s__Photorhabdus_temperata	2	GCF_000478765	GCF_000447415
+s__Streptococcus_gallolyticus	4	GCF_000203195	GCF_000146525	GCF_000027185	GCF_000270145
+s__Mycoplasma_hominis	2	GCF_000085865	GCF_000385075
+s__Atopobium_minutum	1	GCF_000364325
+s__Lactobacillus_coryniformis	3	GCF_000283115	GCF_000166795	GCF_000184285
+s__Anaeromyxobacter_dehalogenans	2	GCF_000013385	GCF_000022145
+s__Flavobacterium_columnare	1	GCF_000240075
+s__Nyamanini_virus	1	PRJNA38109
+s__Arthrobacter_sp_131MFCol6_1	1	GCF_000374925
+s__Staphylococcus_sp_OJ82	1	GCF_000294465
+s__Figwort_mosaic_virus	1	PRJNA14512
+s__Chromohalobacter_salexigens	1	GCF_000055785
+s__Bean_dwarf_mosaic_virus	1	PRJNA14037
+s__Pseudaletia_unipuncta_granulovirus	1	PRJNA43731
+s__Gayadomonas_joobiniege	1	GCF_000300815
+s__Vibrio_phage_CTX	1	PRJNA63437
+s__Tobacco_leaf_curl_betasatellite	1	PRJNA45925
+s__Etapapillomavirus_1	1	PRJNA14205
+s__Tobacco_mild_green_mosaic_virus	1	PRJNA14671
+s__alpha_proteobacterium_BAL199	1	GCF_000171835
+s__Royal_Farm_virus	1	PRJNA15149
+s__Lactobacillus_coleohominis	1	GCF_000161935
+s__Granulibacter_bethesdensis	1	GCF_000014285
+s__Acinetobacter_sp_NBRC_100985	1	GCF_000241225
+s__Parabacteroides_sp_20_3	1	GCF_000162535
+s__Prevotella_oralis	3	GCF_000185145	GCF_000507905	GCF_000413355
+s__Methanosaeta_thermophila	1	GCF_000014945
+s__Pseudomonas_sp_CFII68	1	GCF_000416195
+s__Curtobacterium_sp_B18	1	GCF_000333375
+s__Tobacco_leaf_curl_disease_associated_sequence	1	PRJNA14442
+s__Francisella_tularensis	26	GCF_000016105	GCF_000009245	GCF_000305875	GCF_000380385	GCF_000248415	GCF_000009325	GCF_000023305	GCF_000168775	GCF_000017785	GCF_000380405	GCF_000155535	GCF_000380425	GCF_000380445	GCF_000154165	GCF_000305915	GCF_000154145	GCF_000170295	GCF_000018925	GCF_000305835	GCF_000305855	GCF_000014605	GCF_000313275	GCF_000305895	GCF_000346525	GCF_000153845	GCF_000313385
+s__Frankia_sp_QA3	1	GCF_000262465
+s__Cronobacter_phage_ESP2949_1	1	PRJNA181234
+s__Mycoplasma_phage_phiMFV1	1	PRJNA14387
+s__Triticum_mosaic_virus	1	PRJNA38495
+s__Proteus_mirabilis	7	GCF_000372565	GCF_000313255	GCF_000160755	GCF_000444425	GCF_000297835	GCF_000069965	GCF_000297815
+s__Pseudomonas_phage_Pf1	1	PRJNA14571
+s__Pseudomonas_phage_phiKMV	1	PRJNA15226
+s__Pseudomonas_phage_Pf3	1	PRJNA14061
+s__Ilheus_virus	1	PRJNA18845
+s__Geobacter_daltonii	1	GCF_000022265
+s__Ludwigia_yellow_vein_virus	1	PRJNA15559
+s__Maize_streak_virus	1	PRJNA14577
+s__Mycobacterium_phage_Bane1	1	PRJNA219118
+s__Mycobacterium_phage_Bane2	1	PRJNA219119
+s__Flavobacterium_cauense	1	GCF_000498475
+s__Synechococcus_phage_S_RIP1	1	PRJNA195487
+s__Synechococcus_phage_S_RIP2	1	PRJNA195486
+s__Beet_western_yellows_virus	1	PRJNA14885
+s__Pyrococcus_abyssi	1	GCF_000195935
+s__Olsenella_profusa	1	GCF_000468755
+s__Calothrix_sp_PCC_7103	1	GCF_000331305
+s__Discula_destructiva_virus_2	1	PRJNA14787
+s__Ostreococcus_virus_OsV5	1	PRJNA28159
+s__Discula_destructiva_virus_1	1	PRJNA14117
+s__Haloferax_sulfurifontis	1	GCF_000337835
+s__Taylorella_asinigenitalis	1	GCF_000226625
+s__Plautia_stali_intestine_virus	1	PRJNA14799
+s__Tacaribe_virus	1	PRJNA14863
+s__Lachnospiraceae_bacterium_9_1_43BFAA	1	GCF_000209445
+s__Tomato_chlorotic_mottle_virus	1	PRJNA14175
+s__Pelodictyon_phaeoclathratiforme	1	GCF_000020645
+s__Enterobacter_mori	1	GCF_000211415
+s__Beak_and_feather_disease_virus	1	PRJNA14453
+s__Turicibacter_sanguinis	1	GCF_000178255
+s__Salimicrobium_sp_MJ3	1	GCF_000299295
+s__Bdellovibrio_phage_phi1422	1	PRJNA181215
+s__Apium_virus_Y	1	PRJNA61905
+s__Tomato_mosaic_leaf_curl_virus	1	PRJNA14370
+s__Amasya_cherry_disease_associated_chrysovirus	1	PRJNA21113
+s__Rhizobium_sp_JGI_0001005_H05	1	GCF_000375385
+s__Mycobacterium_phage_Cooper	1	PRJNA17145
+s__Cylindrospermum_stagnale	1	GCF_000317535
+s__Xanthomonas_phage_Xop411	1	PRJNA19771
+s__Irkut_virus	1	PRJNA194140
+s__Acinetobacter_sp_ANC_3880	1	GCF_000369845
+s__Staphylococcus_phage_phiSauS_IPLA88	1	PRJNA33001
+s__Humulus_japonicus_latent_virus	1	PRJNA14958
+s__Staphylococcus_phage_JS01	1	PRJNA212710
+s__Marine_Group_II_euryarchaeote_SCGC_AB_629_J06	1	GCF_000376045
+s__Streptomyces_sp_LaPpAH_108	1	GCF_000373625
+s__Anabaena_variabilis	1	GCF_000204075
+s__Glaciecola_arctica	1	GCF_000314995
+s__Desulfuromonas_acetoxidans	1	GCF_000167355
+s__Banana_bract_mosaic_virus	1	PRJNA20617
+s__Citrobacter_sp_KTE151	1	GCF_000398845
+s__Loktanella_phage_pCB2051_A	1	PRJNA195476
+s__Streptococcus_mitis_oralis_pneumoniae	301	GCF_000251625	GCF_000185265	GCF_000252305	GCF_000251285	GCF_000251565	GCF_000278885	GCF_000251825	GCF_000506605	GCF_000232325	GCF_000019265	GCF_000252265	GCF_000257495	GCF_000171655	GCF_000506765	GCF_000170035	GCF_000232725	GCF_000147095	GCF_000232645	GCF_000334555	GCF_000385715	GCF_000232785	GCF_000232765	GCF_000385835	GCF_000334735	GCF_000210995	GCF_000232945	GCF_000430345	GCF_000180575	GCF_000222785	GCF_000210955	GCF_000334695	GCF_000251085 [...]
+s__Halorubrum_phage_HF2	1	PRJNA14147
+s__Eubacterium_sp_3_1_31	2	GCF_000273585	GCF_000242955
+s__Rothia_aeria	2	GCF_000479025	GCF_000258205
+s__Vibrio_phage_VPUSM_8	1	PRJNA227006
+s__Narcissus_common_latent_virus	1	PRJNA17373
+s__Brucella_sp_BO2	1	GCF_000177135
+s__Listeria_phage_P70	1	PRJNA177526
+s__Penaeus_vannamei_nodavirus	1	PRJNA62263
+s__Celery_mosaic_virus	1	PRJNA65809
+s__Bacteroides_sp_4_1_36	1	GCF_000185585
+s__Mycobacterium_chubuense	1	GCF_000266905
+s__Gossypium_davidsonii_symptomless_alphasatellite	1	PRJNA39589
+s__Salmonella_phage_SPN1S	1	PRJNA82639
+s__Candida_tropicalis	1	GCA_000006335
+s__Erwinia_toletana	1	GCF_000336255
+s__Streptomyces_sp_SirexAA_E	1	GCF_000177195
+s__Cyanophage_KBS_S_2A	1	PRJNA195502
+s__Amycolatopsis_vancoresmycina	1	GCF_000388135
+s__Propionibacterium_phage_P9_1	1	PRJNA177529
+s__Caulobacter_phage_phiCb5	1	PRJNA181078
+s__Nocardiopsis_salina	1	GCF_000341025
+s__Hantavirus_Z10	1	PRJNA15044
+s__Salmonella_phage_SPN19	1	PRJNA179408
+s__Sphingomonas_sp_KC8	1	GCF_000214335
+s__Silicibacter_phage_DSS3phi2	1	PRJNA38081
+s__Alicyclobacillus_hesperidum	1	GCF_000294675
+s__Ralstonia_phage_RSL1	1	PRJNA30059
+s__Myxococcus_xanthus	3	GCF_000278585	GCF_000340515	GCF_000012685
+s__Shigella_dysenteriae	6	GCF_000193895	GCF_000012005	GCF_000164925	GCF_000268105	GCF_000168075	GCF_000211935
+s__Turkey_astrovirus	1	PRJNA15096
+s__Roseobacter_denitrificans	1	GCF_000014045
+s__Mycobacterium_phage_Chy5	1	PRJNA206476
+s__Hepatitis_B_virus	1	PRJNA15428
+s__Chickpea_chlorosis_Australia_virus	1	PRJNA216948
+s__Streptomyces_phage_Zemlya	1	PRJNA206481
+s__Squirrel_monkey_retrovirus	1	PRJNA14914
+s__Andean_potato_latent_virus	1	PRJNA192611
+s__Ferret_papillomavirus	1	PRJNA218024
+s__Oligella_ureolytica	1	GCF_000373745
+s__Caulobacter_phage_CcrMagneto	1	PRJNA179421
+s__Streptococcus_parasanguinis	7	GCF_000260695	GCF_000262145	GCF_000180035	GCF_000187505	GCF_000507765	GCF_000164675	GCF_000222725
+s__Sphingobium_yanoikuyae	2	GCF_000315525	GCF_000224695
+s__Thermoanaerobacter_italicus	1	GCF_000025645
+s__Leptolyngbya_boryana	1	GCF_000353285
+s__Coleofasciculus_chthonoplastes	1	GCF_000155555
+s__Diadromus_pulchellus_ascovirus_4a	1	PRJNA32133
+s__Mycobacterium_phage_Cjw1	1	PRJNA14270
+s__Pseudomonas_tolaasii	2	GCF_000316215	GCF_000276565
+s__Ehrlichia_chaffeensis	2	GCF_000013145	GCF_000167655
+s__Solanum_nodiflorum_mottle_virus_satellite_RNA	1	PRJNA14184
+s__Zymomonas_mobilis	6	GCF_000218875	GCF_000024245	GCF_000303025	GCF_000007105	GCF_000277755	GCF_000175255
+s__Citrobacter_sp_L17	1	GCF_000313895
+s__Flavonifractor_plautii	1	GCF_000239295
+s__Phlox_virus_S	1	PRJNA19427
+s__Mycoplasma_hyorhinis	5	GCF_000383515	GCF_000241125	GCF_000145705	GCF_000211295	GCF_000313635
+s__Papiine_herpesvirus_2	1	PRJNA16246
+s__Bordetella_holmesii	3	GCF_000341485	GCF_000341465	GCF_000317335
+s__Rothia_mucilaginosa	3	GCF_000011025	GCF_000175615	GCF_000231235
+s__Enterococcus_phage_BC_611	1	PRJNA169229
+s__Mycobacterium_phage_ET08	1	PRJNA42783
+s__Ikoma_lyssavirus	1	PRJNA175665
+s__Methyloferula_stellata	1	GCF_000385335
+s__Pseudomonas_sp_45MFCol3_1	1	GCF_000382025
+s__Banana_streak_GF_virus	1	PRJNA15411
+s__Paracoccidioides_sp_lutzii	1	GCA_000150705
+s__Bat_picornavirus_2	1	PRJNA72393
+s__Bat_picornavirus_3	1	PRJNA72379
+s__Bat_picornavirus_1	1	PRJNA72391
+s__Rhodopirellula_sallentina	1	GCF_000346505
+s__Peptoniphilus_sp_oral_taxon_375	1	GCF_000221565
+s__Olsenella_sp_oral_taxon_809	1	GCF_000233535
+s__Salmonella_phage_phiSG_JL2	1	PRJNA30063
+s__Plasmodium_berghei	1	GCA_000005395
+s__Jannaschia_sp_CCS1	1	GCF_000013565
+s__Burkholderia_phage_KS14	1	PRJNA64613
+s__Escherichia_sp_3_2_53FAA	1	GCF_000157115
+s__Microplitis_demolitor_bracovirus	1	PRJNA15245
+s__Burkholderia_phage_KS10	1	PRJNA31221
+s__Verrucomicrobia_bacterium_SCGC_AAA300_K03	1	GCF_000382665
+s__Yersinia_enterocolitica	12	GCF_000285015	GCF_000192105	GCF_000401995	GCF_000253175	GCF_000284995	GCF_000009345	GCF_000401935	GCF_000401955	GCF_000230775	GCF_000330605	GCF_000401975	GCF_000297175
+s__Pseudomonas_sp_S13_1_2	1	GCF_000292285
+s__Caulobacter_vibrioides	2	GCF_000372645	GCF_000022005
+s__Rhynchosia_golden_mosaic_Yucatan_virus	1	PRJNA36505
+s__Staphylococcus_phage_P954	1	PRJNA40231
+s__Lactococcus_phage_P335_sensu_lato	1	PRJNA14281
+s__Wolbachia_endosymbiont_of_Nasonia_vitripennis	1	GCF_000204545
+s__Taro_bacilliform_virus	1	PRJNA14233
+s__Bombyx_mori_Macula_like_virus	1	PRJNA66973
+s__Blueberry_latent_virus	1	PRJNA56015
+s__Actinomyces_sp_ICM47	1	GCF_000278725
+s__Amycolatopsis_halophila	1	GCF_000504245
+s__Mycobacterium_phage_DrDrey	1	PRJNA215108
+s__Thermus_phage_phiYS40	1	PRJNA18277
+s__Frankia_sp_EAN1pec	1	GCF_000018005
+s__Propionibacterium_phage_PAD20	1	PRJNA66341
+s__Pineapple_mealybug_wilt_associated_virus_1	1	PRJNA28147
+s__Bacteroides_salanitronis	1	GCF_000190575
+s__Candidatus_Poribacteria_sp_WGA_A3	1	GCF_000177275
+s__Acidovorax_avenae	2	GCF_000218805	GCF_000176855
+s__Fusobacterium_varium	1	GCF_000159915
+s__Actinomyces_sp_oral_taxon_172	1	GCF_000466265
+s__Actinomyces_sp_oral_taxon_171	1	GCF_000186965
+s__Actinomyces_sp_oral_taxon_170	1	GCF_000195595
+s__Citrobacter_youngae	1	GCF_000155975
+s__Methylobacter_sp_UW_659_2_H10	1	GCF_000375885
+s__Actinomyces_sp_oral_taxon_175	1	GCF_000223355
+s__Actinomyces_sp_oral_taxon_178	1	GCF_000186685
+s__Flavobacterium_phage_6H	1	PRJNA213018
+s__Chloris_striate_mosaic_virus	1	PRJNA14068
+s__Vibrio_mimicus	8	GCF_000176415	GCF_000338875	GCF_000175975	GCF_000473785	GCF_000222145	GCF_000176375	GCF_000473825	GCF_000175995
+s__Sida_mottle_Alagoas_virus	1	PRJNA189218
+s__Natronolimnobius_innermongolicus	1	GCF_000337215
+s__Roseobacter_phage_RDJL_Phi_1	1	PRJNA66399
+s__Spirochaeta_smaragdinae	1	GCF_000143985
+s__Petrotoga_mobilis	1	GCF_000018605
+s__Akkermansia_muciniphila	1	GCF_000020225
+s__Actinomyces_viscosus	1	GCF_000175315
+s__Clostridium_sp_7_2_43FAA	1	GCF_000158375
+s__Enterobacteria_phage_MS2	1	PRJNA14659
+s__Helicobacter_winghamensis	1	GCF_000158455
+s__White_clover_cryptic_virus_1	1	PRJNA15061
+s__Human_papillomavirus_161_like_viruses	1	PRJNA178458
+s__Yoka_poxvirus	1	PRJNA72715
+s__Pseudomonas_phage_KPP10	1	PRJNA64611
+s__Pseudomonas_phage_KPP12	1	PRJNA184164
+s__Anaerostipes_sp_3_2_56FAA	1	GCF_000185825
+s__Methylibium_petroleiphilum	1	GCF_000015725
+s__Bdellovibrio_phage_phi1402	1	PRJNA68417
+s__Youcai_mosaic_virus	1	PRJNA14869
+s__Sagittula_stellata	1	GCF_000169415
+s__Lactobacillus_phage_phig1e	1	PRJNA14315
+s__Ipomoea_begomovirus_satellite_DNA_beta	1	PRJNA80873
+s__Eimeria_tenella	1	GCA_000002835
+s__Cebus_albifrons_polyomavirus_1	1	PRJNA183903
+s__Brucella_ceti	5	GCF_000158775	GCF_000157855	GCF_000157835	GCF_000158755	GCF_000182425
+s__Xipapillomavirus_1	1	PRJNA15452
+s__Tobacco_vein_distorting_virus	1	PRJNA29875
+s__Clostridium_autoethanogenum	1	GCF_000484505
+s__Pseudoalteromonas_arctica	1	GCF_000238395
+s__Nitratifractor_salsuginis	1	GCF_000186245
+s__Kyasanur_forest_disease_virus	1	PRJNA15387
+s__Honeysuckle_yellow_vein_beta	1	PRJNA19601
+s__Ageratum_leaf_curl_betasatellite	1	PRJNA195929
+s__Selenomonas_sp_oral_taxon_149	1	GCF_000146365
+s__Methylomicrobium_album	1	GCF_000214275
+s__Mycoplasma_putrefaciens	2	GCF_000224105	GCF_000376625
+s__Pediococcus_acidilactici	3	GCF_000235805	GCF_000146325	GCF_000163095
+s__Asticcacaulis_benevestitus	2	GCF_000495775	GCF_000376105
+s__Rhizobium_sp_2MFCol3_1	1	GCF_000377565
+s__Streptomyces_sp_R1_NS_10	1	GCF_000376565
+s__Tula_virus	1	PRJNA14936
+s__gamma_proteobacterium_IMCC3088	1	GCF_000204315
+s__Anaerolinea_thermophila	1	GCF_000199675
+s__Helicoverpa_armigera_nucleopolyhedrovirus	3	PRJNA14108	PRJNA14615	PRJNA32205
+s__Ehrlichia_canis	1	GCF_000012565
+s__Mobiluncus_mulieris	4	GCF_000148485	GCF_000160615	GCF_000146895	GCF_000176775
+s__Clostridium_stercorarium	1	GCF_000331995
+s__Torque_teno_virus_26	1	PRJNA48157
+s__Enterobacteria_phage_vB_EcoS_Rogue1	1	PRJNA183151
+s__Woolly_monkey_sarcoma_virus	1	PRJNA19547
+s__Torque_teno_virus_25	1	PRJNA48165
+s__Streptomyces_sp_HCCB10043	1	GCF_000498935
+s__Torque_teno_virus_28	1	PRJNA48145
+s__Staphylococcus_phage_G1	1	PRJNA15261
+s__Erwinia_phage_PEp14	1	PRJNA82653
+s__Chalara_elegans_RNA_Virus_1	1	PRJNA15126
+s__Candidatus_Riesia_pediculicola	1	GCF_000093065
+s__Veillonella_ratti	1	GCF_000315505
+s__Strawberry_pallidosis_associated_virus	1	PRJNA15058
+s__Human_adenovirus_C	3	PRJNA14518	PRJNA15106	PRJNA15107
+s__Enterobacteria_phage_phiEcoM_GJ1	1	PRJNA27979
+s__Canine_circovirus	1	PRJNA196432
+s__Lactococcus_phage_bIL285	1	PRJNA14111
+s__Lactococcus_phage_bIL286	1	PRJNA14397
+s__Pedobacter_arcticus	1	GCF_000302595
+s__Acinetobacter_phage_Acj61	1	PRJNA60117
+s__Blackberry_vein_banding_associated_virus	1	PRJNA215129
+s__Tomato_leaf_curl_virus	1	PRJNA14191
+s__Novosphingobium_sp_PP1Y	1	GCF_000253255
+s__Raphanus_sativus_cryptic_virus_2	1	PRJNA28757
+s__Raphanus_sativus_cryptic_virus_3	1	PRJNA33269
+s__Raphanus_sativus_cryptic_virus_1	1	PRJNA17127
+s__Halopiger_xanaduensis	1	GCF_000217715
+s__Methanocella_conradii	1	GCF_000251105
+s__Candidatus_Accumulibacter_phosphatis	1	GCF_000024165
+s__Cellulophaga_phage_phi39_1	1	PRJNA212957
+s__Pseudomonas_fluorescens	21	GCF_000275925	GCF_000009225	GCF_000292795	GCF_000297195	GCF_000012445	GCF_000285355	GCF_000275905	GCF_000263695	GCF_000285955	GCF_000281895	GCF_000237065	GCF_000263675	GCF_000293885	GCF_000334015	GCF_000465595	GCF_000166515	GCF_000308175	GCF_000262325	GCF_000280805	GCF_000276585	GCF_000217955
+s__Salmonella_phage_SE1	1	PRJNA33483
+s__Haloterrigena_thermotolerans	1	GCF_000337115
+s__Aster_yellows_witches_broom_phytoplasma	1	GCF_000012225
+s__Burkholderia_phage_Bcep22	1	PRJNA14335
+s__Mycobacterium_phage_PG1	1	PRJNA14357
+s__Chinese_yam_necrotic_mosaic_virus	1	PRJNA173355
+s__Natronorubrum_sulfidifaciens	1	GCF_000337735
+s__Zantedeschia_mild_mosaic_virus	1	PRJNA32715
+s__Haloferax_lucentense	1	GCF_000336795
+s__Leptospira_terpstrae	1	GCF_000332495
+s__Listeria_phage_LP_125	1	PRJNA212716
+s__Staphylococcus_phage_SAP_2	1	PRJNA20925
+s__Alishewanella_agri	1	GCF_000272005
+s__Marinobacterium_rhizophilum	1	GCF_000378045
+s__Vibrio_nigripulchritudo	1	GCF_000222685
+s__Bovine_herpesvirus_1	1	PRJNA14585
+s__Grapevine_red_blotch_associated_virus	1	PRJNA214508
+s__Bacillus_sp_7_6_55CFAA_CT2	1	GCF_000238655
+s__Acinetobacter_sp_528	1	GCF_000302395
+s__Cardioderma_polyomavirus	1	PRJNA185188
+s__Gordonia_otitidis	1	GCF_000248075
+s__Clostridium_sp_BNL1100	1	GCF_000244875
+s__Murine_coronavirus	3	PRJNA15138	PRJNA15350	PRJNA39313
+s__Pseudomonas_sp_313	1	GCF_000316965
+s__Acinetobacter_baumannii	191	GCF_000184495	GCF_000314635	GCF_000372585	GCF_000188215	GCF_000341985	GCF_000314655	GCF_000241665	GCF_000332855	GCF_000189695	GCF_000309235	GCF_000413915	GCF_000309135	GCF_000021145	GCF_000297535	GCF_000305255	GCF_000282795	GCF_000309115	GCF_000335615	GCF_000309195	GCF_000163375	GCF_000369225	GCF_000301515	GCF_000369165	GCF_000441955	GCF_000353995	GCF_000369265	GCF_000301995	GCF_000302015	GCF_000302075	GCF_000069245	GCF_000354075	GCF_000241685	GCF_000314615 [...]
+s__Psychromonas_ingrahamii	1	GCF_000015285
+s__Pseudomonas_phage_PB1	1	PRJNA33499
+s__Mobiluncus_curtisii	4	GCF_000146285	GCF_000196535	GCF_000185445	GCF_000185425
+s__Oribacterium_sinus	1	GCF_000160635
+s__SAR116_cluster_alpha_proteobacterium_HIMB100	1	GCF_000238815
+s__Titi_monkey_adenovirus_ECC_2011	1	PRJNA192854
+s__Pigeon_picornavirus_B	1	PRJNA67691
+s__Choristoneura_fumiferana_DEF_multiple_nucleopolyhedrovirus	1	PRJNA15137
+s__Bacteriovorax_sp_DB6_IX	1	GCF_000447775
+s__Flavobacterium_branchiophilum	1	GCF_000253275
+s__Clostridium_asparagiforme	1	GCF_000158075
+s__Ruminococcus_bromii	1	GCF_000209875
+s__Acinetobacter_baylyi	2	GCF_000302115	GCF_000368685
+s__Sulfolobus_virus_STSV1	1	PRJNA14561
+s__Mycobacterium_phage_Ardmore	1	PRJNA46607
+s__Sulfolobus_virus_STSV2	1	PRJNA185313
+s__Leucania_separata_nucleopolyhedrovirus	1	PRJNA17669
+s__Ralstonia_phage_RSS30	1	PRJNA213021
+s__Japanese_yam_mosaic_virus	1	PRJNA15365
+s__Camelpox_virus	1	PRJNA14156
+s__Bacillus_sp_EGD_AK10	1	GCF_000465855
+s__Burkholderia_sp_RPE64	1	GCF_000402035
+s__Myroides_odoratimimus	6	GCF_000242095	GCF_000413415	GCF_000297855	GCF_000242135	GCF_000242075	GCF_000297875
+s__Tuber_melanosporum	1	GCA_000151645
+s__Horseradish_curly_top_virus	1	PRJNA14100
+s__Bean_common_mosaic_virus	1	PRJNA15183
+s__Pear_blister_canker_viroid	1	PRJNA14965
+s__Anagyris_vein_yellowing_virus	1	PRJNA32713
+s__Mimosa_yellow_leaf_curl_virus_satellite_DNA_beta	1	PRJNA19821
+s__Ononis_yellow_mosaic_virus	1	PRJNA14669
+s__Epulopiscium_sp_N_t_morphotype_B	1	GCF_000171335
+s__SAR202_cluster_bacterium_SCGC_AAA240_N13	1	GCF_000372165
+s__Butterbur_mosaic_virus	1	PRJNA42145
+s__Synechococcus_sp_CC9605	1	GCF_000012625
+s__Salmonella_phage_HK620	1	PRJNA14115
+s__Gloeobacter_violaceus	1	GCF_000011385
+s__Halorubrum_coriense	1	GCF_000337035
+s__Streptomyces_avermitilis	1	GCF_000009765
+s__Gluconobacter_morbifer	1	GCF_000234355
+s__Methylotenera_sp_73s	1	GCF_000384435
+s__Tomato_zonate_spot_virus	1	PRJNA29091
+s__Asparagus_virus_3	1	PRJNA28979
+s__Selenomonas_infelix	1	GCF_000234095
+s__Sunflower_leaf_curl_Karnataka_alphasatellite	1	PRJNA181246
+s__Sabia_virus	1	PRJNA15054
+s__Borrelia_valaisiana	1	GCF_000170955
+s__Turnip_curly_top_virus	1	PRJNA50429
+s__Cucumber_green_mottle_mosaic_virus	1	PRJNA14681
+s__Eggerthella_lenta	1	GCF_000024265
+s__Thermoanaerobacter_brockii	1	GCF_000175295
+s__Gemella_morbillorum	1	GCF_000185645
+s__Haloferax_volcanii	2	GCF_000025685	GCF_000337315
+s__Rous_sarcoma_virus	1	PRJNA14978
+s__Mycobacterium_liflandii	1	GCF_000026445
+s__Anoxybacillus_flavithermus	4	GCF_000019045	GCF_000327465	GCF_000353425	GCF_000367505
+s__Tomato_rugose_mosaic_virus	1	PRJNA14101
+s__Tembusu_virus	1	PRJNA70159
+s__Candidatus_Tremblaya_phenacola	1	GCF_000412755
+s__Corynebacterium_sp_KPL1860	1	GCF_000477995
+s__Bean_golden_yellow_mosaic_virus	1	PRJNA14200
+s__Bacteroides_sp_1_1_14	1	GCF_000162515
+s__Pseudomonas_phage_JBD24	1	PRJNA188535
+s__Clostridium_sp_DL_VIII	1	GCF_000230835
+s__alpha_proteobacterium_SCGC_AAA300_J04	1	GCF_000382645
+s__Feline_picornavirus	1	PRJNA76725
+s__Mycobacterium_phage_Quink	1	PRJNA219113
+s__Bacillus_cytotoxicus	1	GCF_000017425
+s__Bartonella_tamiae	2	GCF_000278275	GCF_000279995
+s__Persimmon_viroid_2	1	PRJNA210930
+s__Pseudomonas_amygdali	10	GCF_000163275	GCF_000145945	GCF_000145765	GCF_000145885	GCF_000159835	GCF_000275945	GCF_000146005	GCF_000163255	GCF_000145685	GCF_000145745
+s__Tobacco_curly_shoot_betasatellite	1	PRJNA14446
+s__Rudanella_lutea	1	GCF_000383955
+s__Visna_maedi_virus	1	PRJNA14636
+s__Halorhabdus_utahensis	1	GCF_000023945
+s__Maize_chlorotic_dwarf_virus	1	PRJNA15345
+s__gamma_proteobacterium_HTCC5015	1	GCF_000155715
+s__Turdivirus_3	1	PRJNA51591
+s__Bacillus_sp_2_A_57_CT2	1	GCF_000186145
+s__Mycoplasma_phage_P1	1	PRJNA14136
+s__Patulibacter_americanus	1	GCF_000420025
+s__Treponema_succinifaciens	1	GCF_000195275
+s__Pyrobaculum_arsenaticum	1	GCF_000016385
+s__Gordonia_phage_GRU1	1	PRJNA78691
+s__Tomato_yellow_vein_streak_virus	1	PRJNA30171
+s__Glaciecola_nitratireducens	1	GCF_000226565
+s__Plasmodium_falciparum	1	GCA_000002765
+s__Pseudomonas_phage_phi12	1	PRJNA14855
+s__Pseudomonas_phage_phi13	1	PRJNA14854
+s__Bacteroides_sp_2_1_16	1	GCF_000162135
+s__Lettuce_ring_necrosis_virus	1	PRJNA14959
+s__Tomato_leaf_curl_Malaysia_virus	1	PRJNA14260
+s__Clostridium_phage_phiCTP1	1	PRJNA51665
+s__Bacillus_phage_W_Ph	1	PRJNA80913
+s__Collimonas_fungivorans	1	GCF_000221045
+s__Micromonospora_sp_CNB394	1	GCF_000374985
+s__Saccharomonospora_azurea	2	GCF_000231055	GCF_000236985
+s__Smaragdicoccus_niigatensis	1	GCF_000380645
+s__Asticcacaulis_biprosthecium	1	GCF_000204015
+s__Roseomonas_sp_B5	1	GCF_000292225
+s__Bartonella_clarridgeiae	1	GCF_000253015
+s__Methylomonas_methanica	1	GCF_000214665
+s__Salmonella_phage_SE2	1	PRJNA82643
+s__Propionibacterium_sp_oral_taxon_192	1	GCF_000413315
+s__Cellulophaga_phage_phi17_1	1	PRJNA212962
+s__Pseudomonas_phage_UFV_P2	1	PRJNA177548
+s__Psychrobacter_sp_1501_2011	1	GCF_000213615
+s__Sida_golden_mosaic_Buckup_virus_Jamaica_St_Elizabeth_2004	1	PRJNA61135
+s__Glossina_hytrovirus	1	PRJNA28839
+s__Mycobacterium_phage_Wee	1	PRJNA61859
+s__Carnobacterium_maltaromaticum	2	GCF_000317975	GCF_000238575
+s__Rickettsia_peacockii	1	GCF_000021525
+s__Acanthamoeba_polyphaga_mimivirus	1	PRJNA60053
+s__Astrovirus_MLB1	2	PRJNA32327	PRJNA50359
+s__Mycoplasma_haemocanis	1	GCF_000238995
+s__Prochlorococcus_phage_P_SSP3	1	PRJNA195517
+s__Prochlorococcus_phage_P_SSP7	1	PRJNA15134
+s__Bradyrhizobium_sp_YR681	1	GCF_000282615
+s__Changuinola_virus	1	PRJNA226021
+s__Mycobacterium_phage_Rosebush	1	PRJNA14304
+s__Corynebacterium_pyruviciproducens	1	GCF_000411375
+s__Banana_streak_UA_virus	1	PRJNA66609
+s__Stenotrophomonas_phage_phiSHP2	1	PRJNA67419
+s__Sulfuricurvum_kujiense	1	GCF_000183725
+s__Limnobacter_sp_MED105	1	GCF_000170915
+s__Butyrivibrio_fibrisolvens	3	GCF_000420985	GCF_000420965	GCF_000209815
+s__Stanieria_cyanosphaera	1	GCF_000317575
+s__Erwinia_amylovora_phage_Era103	1	PRJNA18839
+s__Plasmodium_yoelii	1	GCA_000003085
+s__Pseudomonas_phage_EL	1	PRJNA16199
+s__Alcanivorax_borkumensis	1	GCF_000009365
+s__Myxococcus_stipitatus	1	GCF_000331735
+s__Sphingomonas_sp_SKA58	1	GCF_000153545
+s__Barley_mild_mosaic_virus	1	PRJNA15338
+s__Corynebacterium_urealyticum	2	GCF_000069945	GCF_000338095
+s__Cardamine_chlorotic_fleck_virus	1	PRJNA14674
+s__Pseudomonas_sp_GM102	1	GCF_000282555
+s__Nocardiopsis_ganjiahuensis	1	GCF_000341085
+s__Vibrio_phage_PWH3a_P1	1	PRJNA195481
+s__Chlamydophila_felis	1	GCF_000009945
+s__Halobiforma_nitratireducens	1	GCF_000337895
+s__Sulfurovum_sp_NBC37_1	1	GCF_000010345
+s__Prevotella_buccalis	1	GCF_000177075
+s__Escherichia_phage_HK75	1	PRJNA76733
+s__Japanese_iris_necrotic_ring_virus	1	PRJNA15094
+s__Diatraea_saccharalis_densovirus	1	PRJNA14036
+s__Bacillus_sp_FJAT_13831	1	GCF_000299035
+s__Oxalobacter_formigenes	2	GCF_000158475	GCF_000158495
+s__Brevibacillus_sp_CF112	1	GCF_000282015
+s__Human_enteric_coronavirus_strain_4408	1	PRJNA39335
+s__Crimean_Congo_hemorrhagic_fever_virus	1	PRJNA15026
+s__Lactococcus_garvieae	14	GCF_000407645	GCF_000213885	GCF_000236535	GCF_000305995	GCF_000236515	GCF_000212475	GCF_000269705	GCF_000407125	GCF_000305975	GCF_000269925	GCF_000504505	GCF_000205485	GCF_000300795	GCF_000269945
+s__Actinidia_virus_B	1	PRJNA77137
+s__Yersinia_frederiksenii	1	GCF_000168015
+s__Actinomyces_sp_HPA0247	1	GCF_000411415
+s__Tomato_leaf_curl_Mindanao_virus	1	PRJNA29011
+s__Synechococcus_phage_S_SKS1	1	PRJNA195489
+s__Lloviu_virus	1	PRJNA76475
+s__Potato_apical_leaf_curl_disease_associated_satellite_DNA_beta	1	PRJNA18323
+s__gamma_proteobacterium_IMCC2047	1	GCF_000211335
+s__Staphylococcus_phage_S24_1	1	PRJNA80917
+s__Finegoldia_magna	5	GCF_000159695	GCF_000179495	GCF_000010185	GCF_000179695	GCF_000221585
+s__Enterobacteria_phage_EcoDS1	1	PRJNA30601
+s__Turnip_rosette_virus	1	PRJNA14876
+s__Actinomadura_atramentaria	1	GCF_000381885
+s__Thermincola_potens	1	GCF_000092945
+s__Enterobacteria_phage_ID18_sensu_lato	1	PRJNA16628
+s__French_bean_leaf_curl_virus_Kanpur	1	PRJNA169555
+s__Clostridium_glycolicum	1	GCF_000373865
+s__Enterococcus_phage_phiEF24C	1	PRJNA21009
+s__Paenibacillus_fonticola	1	GCF_000381905
+s__Miniopterus_polyomavirus	1	PRJNA185189
+s__Oat_golden_stripe_virus	1	PRJNA15093
+s__Mycobacterium_phage_Jobu08	1	PRJNA209074
+s__Carrot_red_leaf_virus	1	PRJNA15057
+s__Anaerophaga_thermohalophila	2	GCF_000191885	GCF_000250735
+s__Macaca_fascicularis_polyomavirus_1	1	PRJNA183904
+s__Corynebacterium_caspium	1	GCF_000379705
+s__Methylosarcina_fibrata	1	GCF_000372865
+s__Agrobacterium_albertimagni	1	GCF_000300855
+s__Bacillus_virus_1	1	PRJNA20397
+s__Bat_circovirus	1	PRJNA202887
+s__Neodiprion_lecontei_nucleopolyhedrovirus	1	PRJNA14617
+s__Halomonas_zhanjiangensis	1	GCF_000377665
+s__Pectobacterium_phage_ZF40	1	PRJNA181216
+s__Staphylococcus_phage_PVL	1	PRJNA14392
+s__Serratia_sp_AS13	1	GCF_000214805
+s__Serratia_sp_AS12	1	GCF_000214195
+s__MW_polyomavirus	1	XXX
+s__Rickettsia_slovaca	2	GCF_000252365	GCF_000237845
+s__Feline_astrovirus_2	1	PRJNA218014
+s__Anaplasma_marginale	7	GCF_000172515	GCF_000172475	GCF_000495535	GCF_000495495	GCF_000172495	GCF_000011945	GCF_000020305
+s__Methanocella_arvoryzae	1	GCF_000063445
+s__Prune_dwarf_virus	1	PRJNA16818
+s__Lactobacillus_shenzhenensis	1	GCF_000469325
+s__Chlamydia_psittaci	46	GCF_000211155	GCF_000415685	GCF_000298475	GCF_000415525	GCF_000415665	GCF_000298515	GCF_000270445	GCF_000415565	GCF_000270405	GCF_000204175	GCF_000415805	GCF_000417695	GCF_000298535	GCF_000298555	GCF_000417565	GCF_000298455	GCF_000298435	GCF_000417585	GCF_000270385	GCF_000417825	GCF_000417845	GCF_000417715	GCF_000417655	GCF_000298375	GCF_000417735	GCF_000317995	GCF_000415645	GCF_000415625	GCF_000191925	GCF_000338695	GCF_000415845	GCF_000298495	GCF_000415545	GCF_0 [...]
+s__Porphyromonas_macacae	1	GCF_000379945
+s__Aotine_herpesvirus_1	1	PRJNA78945
+s__Pseudoalteromonas_sp_BSi20480	1	GCF_000241365
+s__Aspergillus_foetidus_dsRNA_mycovirus	1	PRJNA186431
+s__Mycobacterium_phage_PBI1	1	PRJNA17165
+s__Bacteroidetes_oral_taxon_274	1	GCF_000163695
+s__Acheta_domesticus_volvovirus	1	PRJNA198480
+s__Clostridium_cellulovorans	2	GCF_000180115	GCF_000145275
+s__Bacillus_phage_B103	1	PRJNA14216
+s__Gordonia_paraffinivorans	1	GCF_000344155
+s__Methanosalsum_zhilinae	1	GCF_000217995
+s__Pyrobaculum_aerophilum	1	GCF_000007225
+s__Burkholderia_cepacia	1	GCF_000292915
+s__Marine_Group_I_thaumarchaeote_SCGC_AB_629_A13	1	GCF_000399745
+s__Ground_squirrel_hepatitis_virus	1	PRJNA14070
+s__Helminthosporium_victoriae_145S_virus	1	PRJNA14945
+s__Ignicoccus_hospitalis	1	GCF_000017945
+s__Bovine_respiratory_coronavirus_AH187	1	PRJNA39331
+s__Caldisericum_exile	1	GCF_000284335
+s__Primula_malacoides_virus_China_Mar2007	1	PRJNA39975
+s__Streptomyces_sp_C	1	GCF_000158895
+s__Malassezia_globosa	1	GCA_000181695
+s__Mesoflavibacter_zeaxanthinifaciens	1	GCF_000220585
+s__Blueberry_red_ringspot_virus	1	PRJNA14129
+s__Caldicellulosiruptor_lactoaceticus	1	GCF_000193435
+s__Erwinia_tracheiphila	1	GCF_000404125
+s__Thermococcus_sibiricus	1	GCF_000022545
+s__Actinomyces_turicensis	1	GCF_000296505
+s__Stenotrophomonas_phage_S1	1	PRJNA32787
+s__Haloarcula_vallismortis	1	GCF_000337775
+s__Shewanella_halifaxensis	1	GCF_000019185
+s__Eragrostis_curvula_streak_virus	1	PRJNA37889
+s__Cotton_leaf_curl_betasatellite	1	PRJNA14438
+s__Tobacco_leaf_curl_Thailand_virus	1	PRJNA19799
+s__Actinopolyspora_halophila	1	GCF_000371785
+s__Halanaerobium_hydrogeniformans	1	GCF_000166415
+s__Acinetobacter_sp_CIP_110321	1	GCF_000400715
+s__Propionibacterium_sp_KPL1849	1	GCF_000477835
+s__Enterobacter_sp_638	1	GCF_000016325
+s__Sorangium_cellulosum	2	GCF_000418325	GCF_000067165
+s__Propionibacterium_sp_KPL1847	1	GCF_000477855
+s__Providencia_stuartii	2	GCF_000259175	GCF_000154865
+s__Propionibacterium_sp_KPL1844	1	GCF_000477715
+s__Frangipani_mosaic_virus	1	PRJNA53499
+s__Fusarium_graminearum	1	GCA_000240135
+s__Brachyspira_hyodysenteriae	2	GCF_000383255	GCF_000022105
+s__Leptotrichia_sp_oral_taxon_879	1	GCF_000469385
+s__Cronobacter_malonaticus	2	GCF_000319555	GCF_000319535
+s__Thiobacillus_thioparus	1	GCF_000373385
+s__Nocardiopsis_baichengensis	1	GCF_000341205
+s__Rosa_rugosa_leaf_distortion_virus	1	PRJNA191123
+s__Corynebacterium_glucuronolyticum	2	GCF_000156595	GCF_000159595
+s__Jonquetella_anthropi	2	GCF_000161995	GCF_000237805
+s__Vibrio_phage_vB_VpaS_MAR10	1	PRJNA183157
+s__Johnsonella_ignava	1	GCF_000235445
+s__Bdellovibrio_phage_phiMH2K	1	PRJNA14107
+s__Enterobacteria_phage_HK022	1	PRJNA14048
+s__Mycobacterium_phage_Adzzy	1	PRJNA215109
+s__Enterococcus_raffinosus	2	GCF_000393895	GCF_000407525
+s__Enterobacteria_phage_vB_KleM_RaK2	1	PRJNA181223
+s__Calyptogena_okutanii_thioautotrophic_gill_symbiont	1	GCF_000010405
+s__Endosymbiont_phage_APSE_1	1	PRJNA14047
+s__Rhizobium_lupini	1	GCF_000304595
+s__Pantoea_sp_YR343	1	GCF_000282695
+s__Zygosaccharomyces_rouxii	1	GCA_000026365
+s__Anaerococcus_prevotii	2	GCF_000191725	GCF_000024105
+s__Butyricimonas_synergistica	1	GCF_000379665
+s__Mycobacterium_phage_D29	1	PRJNA14203
+s__Shigella_phage_SP18	1	PRJNA56019
+s__Borrelia_sp_SV1	1	GCF_000181875
+s__Pusillimonas_sp_T7_7	1	GCF_000209655
+s__Enterobacter_phage_IME11	1	PRJNA179425
+s__Ippy_virus	1	PRJNA16633
+s__Prevotella_buccae	2	GCF_000162455	GCF_000184945
+s__Corynebacterium_jeikeium	2	GCF_000163435	GCF_000006605
+s__European_bat_lyssavirus_2	1	PRJNA19759
+s__European_bat_lyssavirus_1	1	PRJNA19757
+s__Candidatus_Phytoplasma_mali	1	GCF_000026205
+s__Halococcus_thailandensis	1	GCF_000336715
+s__Mycobacterium_phage_TM4	1	PRJNA14154
+s__Pseudomonas_phage_PaBG	1	PRJNA215670
+s__Klebsiella_sp_KTE92	1	GCF_000398905
+s__Streptococcus_phage_TP_778L	1	PRJNA227111
+s__Lutibaculum_baratangense	1	GCF_000496075
+s__Spinach_severe_curly_top_virus	1	PRJNA59507
+s__Pepper_curly_top_virus	1	PRJNA19745
+s__Espirito_Santo_virus	1	PRJNA80737
+s__Methylovorus_sp_MP688	1	GCF_000183115
+s__Selenomonas_sp_FOBRC9	1	GCF_000287655
+s__Streptococcus_australis	2	GCF_000186465	GCF_000222745
+s__Listeria_marthii	1	GCF_000183865
+s__Croton_yellow_vein_mosaic_virus	1	PRJNA15195
+s__Variovorax_paradoxus	5	GCF_000463015	GCF_000023345	GCF_000377585	GCF_000184745	GCF_000382045
+s__Serratia_liquefaciens	1	GCF_000422085
+s__Cowpox_virus	1	PRJNA14174
+s__Streptomyces_sp_ATexAB_D23	1	GCF_000373645
+s__Coprobacillus_sp_29_1	1	GCF_000186525
+s__Selenomonas_artemidis	1	GCF_000187125
+s__Granulicatella_elegans	1	GCF_000162475
+s__Streptococcus_macedonicus	1	GCF_000283635
+s__Pseudanabaena_biceps	1	GCF_000332215
+s__Pseudomonas_phage_LUZ7	1	PRJNA42951
+s__Clostridium_phage_phiCD119	1	PRJNA16662
+s__Capnocytophaga_sp_oral_taxon_863	1	GCF_000466425
+s__Bhendi_yellow_vein_mosaic_virus	1	PRJNA14159
+s__Torque_teno_virus_10	1	PRJNA48151
+s__Helicobacter_canadensis	2	GCF_000162575	GCF_000155455
+s__Beet_virus_Q	1	PRJNA15091
+s__Tomato_black_ring_virus_satellite_RNA	1	PRJNA15016
+s__Helicobacter_cetorum	2	GCF_000259275	GCF_000259255
+s__Gramella_forsetii	1	GCF_000060345
+s__Night_heron_coronavirus_HKU19	1	PRJNA109277
+s__Rubellimicrobium_thermophilum	1	GCF_000442315
+s__Vernonia_yellow_vein_Fujian_virus_alphasatellite	1	PRJNA72145
+s__Pseudomonas_phage_LKA1	1	PRJNA21045
+s__Malvastrum_yellow_vein_betasatellite	1	PRJNA15317
+s__Chilli_leaf_curl_virus	1	PRJNA14250
+s__Paenibacillus_sp_HW567	1	GCF_000374185
+s__Cassava_mosaic_Madagascar_virus	1	PRJNA129597
+s__Pseudovibrio_sp_FO_BEG1	1	GCF_000236645
+s__Fort_Morgan_virus	1	PRJNA42147
+s__Austwickia_chelonae	1	GCF_000298175
+s__Ageratum_yellow_vein_betasatellite	1	PRJNA14444
+s__Desulfatibacillum_alkenivorans	1	GCF_000021905
+s__Yersinia_phage_phiR1_RT	1	PRJNA184143
+s__Leptospira_licerasiae	3	GCF_000244755	GCF_000216455	GCF_000244715
+s__Propionibacterium_phage_P105	1	PRJNA177533
+s__Alteromonas_sp_SN2	1	GCF_000213655
+s__Mannheimia_phage_vB_MhM_1152AP	1	PRJNA212715
+s__Staphylococcus_phage_StB12	1	PRJNA192927
+s__Sida_golden_yellow_vein_virus	1	PRJNA14253
+s__Legionella_tunisiensis	1	GCF_000308315
+s__Streptomyces_phage_R4	1	PRJNA179407
+s__Bacteroides_sp_1_1_30	1	GCF_000218365
+s__Murrumbidgee_virus	1	PRJNA225034
+s__Vicia_cryptic_virus	1	PRJNA15555
+s__Aedes_aegypti_densovirus	1	PRJNA37821
+s__Thysanoplusia_orichalcea_nucleopolyhedrovirus	1	PRJNA184813
+s__Yersinia_phage_PY54	1	PRJNA15227
+s__zeta_proteobacterium_SCGC_AB_137_I08	1	GCF_000379305
+s__Northern_cereal_mosaic_virus	1	PRJNA14984
+s__Pepper_veinal_mottle_virus	1	PRJNA33675
+s__Haloquadratum_walsbyi	4	GCF_000415965	GCF_000009185	GCF_000415985	GCF_000237865
+s__Paenibacillus_terrae	1	GCF_000235585
+s__Winogradskyella_psychrotolerans	1	GCF_000427335
+s__Methanococcoides_burtonii	1	GCF_000013725
+s__Dorea_sp_5_2	1	GCF_000403455
+s__Kluyvera_phage_Kvp1	1	PRJNA32673
+s__Microbacterium_barkeri	1	GCF_000299315
+s__Enterobacteria_phage_BA14	1	PRJNA30599
+s__Micavibrio_aeruginosavorus	1	GCF_000226315
+s__Sphingobium_sp_AP49	1	GCF_000281715
+s__Switchgrass_mosaic_virus	1	PRJNA66897
+s__Cyanophage_S_TIM5	1	PRJNA181237
+s__Erysipelotrichaceae_bacterium_5_2_54FAA	1	GCF_000163515
+s__Propionibacterium_phage_ATCC29399B_C	1	PRJNA177539
+s__Eggerthella_sp_YY7918	1	GCF_000270285
+s__Tobacco_ringspot_virus_satellite_RNA	1	PRJNA14189
+s__Spiroplasma_phage_SVTS2	1	PRJNA14032
+s__Blastomonas_sp_CACIA14H2	1	GCF_000503195
+s__Enterococcus_cecorum	3	GCF_000492155	GCF_000379745	GCF_000407565
+s__Prevotella_stercorea	1	GCF_000235885
+s__Chlorobium_chlorochromatii	1	GCF_000012585
+s__African_elephant_polyomavirus_1	1	PRJNA222309
+s__Lactobacillus_phage_c5	1	PRJNA181077
+s__Bacillus_alcalophilus	1	GCF_000292245
+s__Turnip_crinkle_virus_satellite_RNA	2	PRJNA14433	PRJNA14506
+s__New_World_begomovirus_associated_satellite_DNA	1	PRJNA88123
+s__Streptomyces_sp_303MFCol5_2	1	GCF_000383635
+s__Oceanicaulis_alexandrii	1	GCF_000420265
+s__Bacillus_oceanisediminis	1	GCF_000294775
+s__Laceyella_sacchari	1	GCF_000421885
+s__Scallion_virus_X	1	PRJNA15099
+s__Asticcacaulis_sp_YBE204	1	GCF_000495855
+s__Siegesbeckia_yellow_vein_virus	1	PRJNA17267
+s__Brevibacillus_sp_phR	1	GCF_000311785
+s__Beilong_virus	1	PRJNA16630
+s__Aeromonas_phage_31	1	PRJNA15416
+s__Kotonkan_virus	1	PRJNA159107
+s__Dethiobacter_alkaliphilus	1	GCF_000174415
+s__Andes_virus	1	PRJNA14746
+s__zeta_proteobacterium_SCGC_AB_604_B04	1	GCF_000379205
+s__Tylonycteris_bat_coronavirus_HKU4	1	PRJNA18863
+s__Lettuce_necrotic_yellows_virus	1	PRJNA16236
+s__Streptococcus_constellatus	5	GCF_000223295	GCF_000463395	GCF_000463445	GCF_000463425	GCF_000257785
+s__Kurthia_massiliensis	1	GCF_000285555
+s__Methylophilales_bacterium_HTCC2181	1	GCF_000168995
+s__Pseudomonas_sp_M47T1	1	GCF_000263855
+s__Chlorogloeopsis_sp_PCC_9212	1	GCF_000317265
+s__Burkholderia_phage_phiE12_2	1	PRJNA19161
+s__Nocardiopsis_synnemataformans	1	GCF_000340945
+s__Dokdonia_donghaensis	1	GCF_000152925
+s__Lysinibacillus_sphaericus	2	GCF_000017965	GCF_000392615
+s__Sclerotinia_sclerotiorum_debilitation_associated_RNA_virus	1	PRJNA15717
+s__SAR86_cluster_bacterium_SAR86E	1	GCF_000307935
+s__SAR86_cluster_bacterium_SAR86D	1	GCF_000252585
+s__SAR86_cluster_bacterium_SAR86C	1	GCF_000252565
+s__Tupaiid_herpesvirus_1	1	PRJNA14597
+s__Pelagibacter_phage_HTVC008M	1	PRJNA192865
+s__Staphylococcus_warneri	4	GCF_000332735	GCF_000321085	GCF_000175175	GCF_000211215
+s__Ageratum_yellow_vein_Singapore_alphasatellite	1	PRJNA14232
+s__Cyanothece_sp_PCC_7822	1	GCF_000147335
+s__Seneca_valley_virus	1	PRJNA32193
+s__Rhodococcus_erythropolis	5	GCF_000010105	GCF_000454045	GCF_000174835	GCF_000225665	GCF_000454425
+s__Methylophilus_sp_42	1	GCF_000384155
+s__Flavobacterium_enshiense	1	GCF_000498495
+s__Mycobacterium_phage_Whirlwind	1	PRJNA215117
+s__Vibrio_genomosp_F10	4	GCF_000287015	GCF_000287055	GCF_000287035	GCF_000287195
+s__Streptococcus_sp_BS35b	1	GCF_000286475
+s__Propionibacterium_phage_PHL071N05	1	PRJNA219109
+s__Thielaviopsis_basicola_mitovirus	1	PRJNA37715
+s__Tolumonas_auensis	1	GCF_000023065
+s__Acetohalobium_arabaticum	1	GCF_000144695
+s__Paramecium_bursaria_Chlorella_virus_AR158	1	PRJNA20991
+s__Pyrobaculum_calidifontis	1	GCF_000015805
+s__Wild_tomato_mosaic_virus	1	PRJNA20625
+s__Enterobacteriaceae_bacterium_9_2_54FAA	1	GCF_000185685
+s__Rhizobium_mesoamericanum	1	GCF_000312665
+s__Alloscardovia_omnicolens	2	GCF_000420505	GCF_000466365
+s__Aeromonas_phage_44RR2_8t	1	PRJNA14321
+s__Sulfurimonas_denitrificans	1	GCF_000012965
+s__Paenibacillus_sp_JDR_2	1	GCF_000023585
+s__Halogranum_salarium	1	GCF_000283335
+s__Clavibacter_michiganensis	3	GCF_000069225	GCF_000355695	GCF_000063485
+s__Natronorubrum_tibetense	2	GCF_000383975	GCF_000337235
+s__Xylella_fastidiosa	11	GCF_000019325	GCF_000219235	GCF_000007245	GCF_000006725	GCF_000506905	GCF_000166835	GCF_000148405	GCF_000466025	GCF_000506405	GCF_000166855	GCF_000019765
+s__Synechococcus_sp_CB0101	1	GCF_000179235
+s__Hydrogenophaga_sp_PBC	1	GCF_000263795
+s__Tomato_leaf_curl_Patna_virus	1	PRJNA36527
+s__Limnohabitans_sp_Rim47	1	GCF_000292865
+s__Cellulophaga_algicola	1	GCF_000186265
+s__Turkey_adenovirus_A	2	PRJNA14524	PRJNA15112
+s__Novosphingobium_sp_Rr_2_17	1	GCF_000272475
+s__Acyrthosiphon_pisum_virus	1	PRJNA40357
+s__Zucchini_green_mottle_mosaic_virus	1	PRJNA15189
+s__Methylosinus_trichosporium	1	GCF_000178815
+s__Mycobacterium_sp_155	1	GCF_000373905
+s__Yersinia_pestis	123	GCF_000268485	GCF_000323285	GCF_000268865	GCF_000268825	GCF_000269145	GCF_000475135	GCF_000324885	GCF_000269465	GCF_000323645	GCF_000268445	GCF_000324785	GCF_000007885	GCF_000022805	GCF_000268545	GCF_000268505	GCF_000323365	GCF_000269125	GCF_000269325	GCF_000268665	GCF_000269025	GCF_000169655	GCF_000268685	GCF_000268745	GCF_000169635	GCF_000169615	GCF_000170275	GCF_000323785	GCF_000324465	GCF_000324025	GCF_000018805	GCF_000323505	GCF_000022825	GCF_000268905	GCF_000 [...]
+s__Tomato_yellow_leaf_curl_China_virus	1	PRJNA15318
+s__Enterococcus_villorum	2	GCF_000407205	GCF_000393935
+s__Pantoea_sp_Sc1	1	GCF_000255315
+s__Caldilinea_aerophila	1	GCF_000281175
+s__Halovivax_asiaticus	1	GCF_000337515
+s__Macroptilium_golden_mosaic_virus	1	PRJNA30169
+s__Clostridium_pasteurianum	3	GCF_000389635	GCF_000506785	GCF_000330945
+s__Prochlorothrix_hollandica	2	GCF_000332315	GCF_000341585
+s__Enterovirus_A	1	PRJNA15445
+s__Blattabacterium_sp_Blaberus_giganteus	1	GCF_000262715
+s__Prochlorococcus_sp_W8	1	GCF_000291825
+s__Bacillus_phage_WBeta	1	PRJNA16329
+s__Epirus_cherry_virus	1	PRJNA30739
+s__Prochlorococcus_sp_W3	1	GCF_000291905
+s__Prochlorococcus_sp_W2	1	GCF_000291885
+s__Prochlorococcus_sp_W4	1	GCF_000291785
+s__Prochlorococcus_sp_W7	1	GCF_000291805
+s__Rhodobacter_sphaeroides	7	GCF_000012905	GCF_000273405	GCF_000015985	GCF_000212605	GCF_000021005	GCF_000269625	GCF_000016405
+s__Amycolatopsis_sp_ATCC_39116	1	GCF_000231075
+s__Bat_adeno_associated_virus_YNM	1	PRJNA51735
+s__Paenibacillus_peoriae	1	GCF_000236805
+s__Nitrospina_sp_AB_629_B18	1	GCF_000375765
+s__Turkey_adenovirus_5	1	PRJNA225923
+s__Thermus_thermophilus	4	GCF_000091545	GCF_000008125	GCF_000258245	GCF_000214845
+s__Demetria_terragena	1	GCF_000376825
+s__Streptomyces_sp_GBA_94_10	1	GCF_000495635
+s__Ruminococcus_champanellensis	1	GCF_000210095
+s__Moroccan_watermelon_mosaic_virus	1	PRJNA27897
+s__Myxococcus_fulvus	1	GCF_000219105
+s__Shewanella_loihica	1	GCF_000016065
+s__Terriglobus_roseus	1	GCF_000265425
+s__Cyclovirus_bat_USA_2009	1	PRJNA61951
+s__Bat_adenovirus_A	1	PRJNA84399
+s__Stx2_converting_phage_1717	1	PRJNA32213
+s__Kedougou_virus	1	PRJNA36617
+s__Candidatus_Korarchaeum_cryptofilum	1	GCF_000019605
+s__Paenibacillus_sp_ICGEB2008	1	GCF_000307675
+s__Bell_pepper_mottle_virus	1	PRJNA20059
+s__Actinobacillus_ureae	1	GCF_000188255
+s__Lactococcus_phage_bIL309	1	PRJNA14338
+s__Thermacetogenium_phaeum	1	GCF_000305935
+s__Salinispora_tropica	3	GCF_000016425	GCF_000377085	GCF_000377065
+s__Streptomyces_sp_W007	1	GCF_000239075
+s__Potato_yellow_vein_virus	1	PRJNA14924
+s__Moritella_dasanensis	1	GCF_000276805
+s__Malva_mosaic_virus	1	PRJNA17349
+s__Encephalitozoon_intestinalis	1	GCA_000146465
+s__Sciscionella_marina	1	GCF_000379465
+s__Zea_mosaic_virus	1	PRJNA177544
+s__Burkholderia_phage_phi1026b	1	PRJNA14410
+s__Orientia_tsutsugamushi	2	GCF_000063545	GCF_000010205
+s__Methanocaldococcus_sp_FS406_22	1	GCF_000025525
+s__Rice_tungro_bacilliform_virus	1	PRJNA14579
+s__Corchorus_yellow_spot_virus	1	PRJNA17993
+s__Enterobacteria_phage_C_1_INW_2012	1	PRJNA184162
+s__Cleome_leaf_crumple_virus_associated_DNA_1	1	PRJNA60045
+s__Rattail_cactus_necrosis_associated_virus	1	PRJNA78929
+s__Sphingobium_chlorophenolicum	1	GCF_000147835
+s__Bacillus_phage_BPS13	1	PRJNA177519
+s__Lactococcus_phage_949	1	PRJNA64559
+s__Chicken_astrovirus	1	PRJNA14804
+s__Bamboo_mosaic_virus_satellite_RNA	1	PRJNA14748
+s__Enterovibrio_norvegicus	3	GCF_000286855	GCF_000286835	GCF_000264435
+s__Brachymonas_chironomi	1	GCF_000374625
+s__Mesorhizobium_alhagi	1	GCF_000236565
+s__Vibrio_phage_CP_T1	1	PRJNA181062
+s__Streptomyces_sp_FxanaC1	1	GCF_000375625
+s__Choristoneura_biennis_entomopoxvirus_L	1	PRJNA203666
+s__Prevotella_denticola	2	GCF_000191765	GCF_000193395
+s__Blattabacterium_sp_Blatta_orientalis	1	GCF_000334405
+s__Streptococcus_porcinus	1	GCF_000187955
+s__Ignavibacterium_album	1	GCF_000258405
+s__Mycoplasma_arginini	1	GCF_000367785
+s__Saccharophagus_degradans	1	GCF_000013665
+s__Centipeda_periodontii	1	GCF_000213975
+s__RD114_retrovirus	1	PRJNA20979
+s__Bacillus_phage_IEBH	1	PRJNA31057
+s__African_cassava_mosaic_virus	1	PRJNA15175
+s__Tomato_yellow_leaf_curl_Indonesia_virus_Lembang	1	PRJNA17387
+s__Butyrivibrio_sp_AE2015	1	GCF_000420825
+s__Allamanda_leaf_curl_virus	1	PRJNA30179
+s__Oscillibacter_sp_KLE_1728	1	GCF_000469425
+s__Mycobacteriophage_Daenerys	1	PRJNA215121
+s__Desulfurispirillum_indicum	1	GCF_000177635
+s__Leuconostoc_sp_C2	1	GCF_000219785
+s__Lachnoanaerobaculum_saburreum	2	GCF_000185385	GCF_000257705
+s__Choristoneura_occidentalis_alphabaculovirus	1	PRJNA214177
+s__Syntrophomonas_wolfei	1	GCF_000014725
+s__Parastagonospora_nodorum	1	GCA_000146915
+s__Actinomyces_johnsonii	2	GCF_000466205	GCF_000466245
+s__Pseudomonas_sp_M1	1	GCF_000317185
+s__Rhodococcus_sp_R1101	1	GCF_000278445
+s__Mycobacterium_phage_Gumball	1	PRJNA32009
+s__Galinsoga_mosaic_virus	1	PRJNA15209
+s__Streptomyces_viridochromogenes	1	GCF_000158955
+s__Thermus_phage_P23_45	1	PRJNA20765
+s__Chilli_leaf_curl_Multan_alphasatellite	1	PRJNA39933
+s__Wolbachia_endosymbiont_of_Drosophila_simulans	2	GCF_000376585	GCF_000376605
+s__Desulfococcus_oleovorans	1	GCF_000018405
+s__Brevibacillus_panacihumi	1	GCF_000503775
+s__Zinnia_leaf_curl_disease_associated_sequence	1	PRJNA14440
+s__Galbibacter_marinus	1	GCF_000300875
+s__Bifidobacterium_sp_12_1_47BFAA	1	GCF_000185665
+s__Mycobacterium_phage_Phlyer	1	PRJNA33871
+s__Lyngbya_sp_PCC_8106	1	GCF_000169095
+s__Tomato_leaf_curl_Bangalore_virus	1	PRJNA14190
+s__Krokinobacter_sp_4H_3_7_5	1	GCF_000212355
+s__Lactococcus_phage_BM13	1	PRJNA213076
+s__Dulcamara_mottle_virus	1	PRJNA16188
+s__Malvastrum_yellow_vein_Baoshan_virus	1	PRJNA37891
+s__Shallot_latent_virus	1	PRJNA15426
+s__Pseudomonas_phage_phi2954	1	PRJNA34533
+s__Dyella_japonica	1	GCF_000292265
+s__Aedes_flavivirus	1	PRJNA39601
+s__Actinoalloteichus_spitiensis	1	GCF_000239155
+s__Streptococcus_phage_TP_J34	1	PRJNA188154
+s__Sugarcane_mosaic_virus	1	PRJNA14994
+s__Mycobacterium_phage_Severus	1	PRJNA206027
+s__Pea_enation_mosaic_virus_1	1	PRJNA14769
+s__Methyloglobulus_morosus	1	GCF_000496735
+s__Mapuera_virus	1	PRJNA19651
+s__Pea_enation_mosaic_virus_2	1	PRJNA14818
+s__Bacillus_mojavensis	1	GCF_000245335
+s__Xanthomonas_translucens	3	GCF_000334075	GCF_000331775	GCF_000313775
+s__Enterococcus_moraviensis	2	GCF_000407445	GCF_000394015
+s__Rhodobacteraceae_bacterium_KLH11	1	GCF_000158135
+s__Clostridium_phage_phiCD27	1	PRJNA32323
+s__Bacillus_phage_vB_BceM_Bc431v3	1	PRJNA195534
+s__Helicobacter_hepaticus	1	GCF_000007905
+s__Escherichia_phage_P13374	1	PRJNA177543
+s__Thioalkalivibrio_sp_ALMg2	1	GCF_000381145
+s__Thioalkalivibrio_sp_ALMg3	1	GCF_000381225
+s__Picrophilus_torridus	1	GCF_000008265
+s__Bacteroides_ovatus	5	GCF_000273195	GCF_000273215	GCF_000218325	GCF_000178275	GCF_000154125
+s__Thioalkalivibrio_sp_AL21	1	GCF_000381325
+s__Thioalkalivibrio_sp_ALMg9	1	GCF_000380625
+s__Indian_cassava_mosaic_virus	1	PRJNA14483
+s__Rhodococcus_opacus	3	GCF_000264745	GCF_000234335	GCF_000010805
+s__Campoletis_sonorensis_ichnovirus	1	PRJNA16738
+s__Caulobacter_sp_K31	1	GCF_000019145
+s__Halococcus_salifodinae	1	GCF_000336935
+s__Thauera_terpenica	1	GCF_000443165
+s__Borrelia_bavariensis	1	GCF_000196215
+s__Murine_astrovirus	1	PRJNA176429
+s__Oceanicaulis_sp_HTCC2633	1	GCF_000152745
+s__Neisseria_subflava	1	GCF_000173955
+s__Cassava_common_mosaic_virus	1	PRJNA14705
+s__Streptomyces_sp_LaPpAH_165	1	GCF_000373525
+s__Maize_mosaic_virus	1	PRJNA14920
+s__Southern_bean_mosaic_virus	1	PRJNA15356
+s__Phaeobacter_gallaeciensis	2	GCF_000154745	GCF_000203975
+s__secondary_endosymbiont_of_Ctenarytaina_eucalypti	1	GCF_000287335
+s__Ochrobactrum_sp_EGD_AQ16	1	GCF_000465835
+s__Hafnia_alvei	1	GCF_000239255
+s__Mycoplasma_alligatoris	1	GCF_000178375
+s__Enterobacter_asburiae	1	GCF_000224675
+s__St_Louis_encephalitis_virus	1	PRJNA16150
+s__Human_cosavirus_D	1	PRJNA38501
+s__Streptomyces_zinciresistens	1	GCF_000225525
+s__Weissella_halotolerans	1	GCF_000420365
+s__Vibrio_phage_vB_VpaM_MAR	1	PRJNA183156
+s__Clostridium_colicanis	1	GCF_000371465
+s__Drosophila_melanogaster_totivirus_SW_2009a	1	PRJNA41725
+s__Gammapapillomavirus_9	1	PRJNA39691
+s__Gammapapillomavirus_8	1	PRJNA36517
+s__Gammapapillomavirus_5	1	PRJNA28737
+s__Cellulophaga_phage_phiSM	1	PRJNA195497
+s__Gammapapillomavirus_7	1	PRJNA36519
+s__Gammapapillomavirus_6	3	PRJNA17119	PRJNA17121	PRJNA34847
+s__Gammapapillomavirus_1	1	PRJNA15492
+s__Lachancea_thermotolerans	1	GCA_000142805
+s__Cellulophaga_phage_phiST	1	PRJNA195498
+s__Ovine_herpesvirus_2	1	PRJNA16234
+s__Pseudomonas_fragi	2	GCF_000250615	GCF_000250595
+s__Leptospira_broomii	1	GCF_000243715
+s__Microcoleus_vaginatus	1	GCF_000214075
+s__Listeria_phage_A118	1	PRJNA14589
+s__Desulfurococcus_mucosus	1	GCF_000186365
+s__Pepper_yellow_vein_Mali_virus	1	PRJNA14348
+s__Aeromonas_sp_159	1	GCF_000292325
+s__Shewanella_sp_ANA_3	1	GCF_000203935
+s__Tetraselmis_viridis_virus_SI1	1	PRJNA195491
+s__Vibrio_scophthalmi	1	GCF_000222585
+s__Pyrococcus_abyssi_virus_1	1	PRJNA19929
+s__Enterobacteria_phage_alpha3	1	PRJNA14570
+s__Candidatus_Rickettsia_amblyommii	1	GCF_000284055
+s__Lactobacillus_pasteurii	1	GCF_000297025
+s__Lactobacillus_sanfranciscensis	1	GCF_000225325
+s__Grapevine_fanleaf_virus_satellite_RNA	1	PRJNA14986
+s__Methylophaga_aminisulfidivorans	1	GCF_000214595
+s__Squirrel_monkey_polyomavirus	1	PRJNA27775
+s__Bacillus_phage_phBC6A52	1	PRJNA15022
+s__Mirabilis_jalapa_mottle_virus	1	PRJNA74427
+s__Lachnospiraceae_bacterium_ICM7	1	GCF_000287675
+s__Bacillus_phage_phBC6A51	1	PRJNA15021
+s__Sweet_potato_leaf_curl_Uganda_virus_Uganda_Kampala_2008	1	PRJNA62213
+s__Leuconostoc_gasicomitatum	1	GCF_000196855
+s__Streptococcus_phage_P9	1	PRJNA20785
+s__Pestivirus_Giraffe_1	1	PRJNA14780
+s__Bacteroides_fluxus	1	GCF_000195635
+s__Methanomethylovorans_hollandica	1	GCF_000328665
+s__Potato_Virus_P	1	PRJNA20657
+s__Vibrio_metschnikovii	1	GCF_000176155
+s__Gordonia_neofelifaecis	1	GCF_000192435
+s__Avian_adeno_associated_virus	1	PRJNA14463
+s__Staphylococcus_phage_187	1	PRJNA15264
+s__Turnip_crinkle_virus	1	PRJNA14811
+s__Coprobacillus_sp_3_3_56FAA	1	GCF_000239735
+s__Staphylococcus_caprae_capitis	5	GCF_000174135	GCF_000263775	GCF_000160215	GCF_000183705	GCF_000221525
+s__Pseudoxanthomonas_sp_GW2	1	GCF_000283075
+s__Lindernia_anagallis_yellow_vein_virus	1	PRJNA19777
+s__Halovirus_HVTV_1	1	PRJNA186952
+s__Thermoanaerobacter_wiegelii	1	GCF_000147695
+s__Lachnospiraceae_bacterium_2_1_58FAA	1	GCF_000218465
+s__Marinitoga_piezophila	1	GCF_000255135
+s__Pontibacter_sp_BAB1700	1	GCF_000277005
+s__Aromatoleum_aromaticum	1	GCF_000025965
+s__Bacteroides_coprocola	1	GCF_000154845
+s__Halorubrum_arcis	1	GCF_000337015
+s__Streptomyces_sp_HGB0020	1	GCF_000411315
+s__Caldimonas_manganoxidans	1	GCF_000381125
+s__Middle_East_respiratory_syndrome_coronavirus	1	PRJNA183710
+s__Sida_golden_mosaic_Costa_Rica_virus	1	PRJNA14262
+s__Eggerthia_catenaformis	1	GCF_000340375
+s__Clostridium_innocuum	1	GCF_000371425
+s__Vibrionales_bacterium_SWAT_3	1	GCF_000169995
+s__Capnocytophaga_granulosa	1	GCF_000411115
+s__Lactococcus_phage_KSY1	1	PRJNA20783
+s__Streptomyces_acidiscabies	1	GCF_000242715
+s__Enterobacteria_phage_JS10	1	PRJNA38265
+s__Omegapapillomavirus_1	1	PRJNA29915
+s__Thermoanaerobacter_pseudethanolicus	1	GCF_000019085
+s__Paenisporosarcina_sp_HGH0030	1	GCF_000411295
+s__Actinopolyspora_mortivallis	1	GCF_000384035
+s__Clostridium_clariflavum	1	GCF_000237085
+s__Lisianthus_necrosis_virus	1	PRJNA16737
+s__Pedilanthus_leaf_curl_virus	1	PRJNA34665
+s__Rhizobium_phage_16_3	1	PRJNA30845
+s__Gammapapillomavirus_HPV127	1	PRJNA51741
+s__Pepper_vein_yellows_virus	1	PRJNA62493
+s__Bocavirus_gorilla_GBoV1_2009	1	PRJNA51179
+s__Haloarcula_californiae	1	GCF_000337755
+s__Listeria_phage_LP_030_2	1	PRJNA209078
+s__Hyphomicrobium_sp	1	GCF_000253295
+s__Molluscum_contagiosum_virus	1	PRJNA14328
+s__Xanthomonas_gardneri	1	GCF_000192065
+s__Psychrobacter_cryohalolentis	1	GCF_000013905
+s__Alphamesonivirus_1	2	PRJNA68059	PRJNA71143
+s__Elizabethkingia_meningoseptica	3	GCF_000367325	GCF_000447375	GCF_000401415
+s__Thermococcus_gammatolerans	1	GCF_000022365
+s__Sphingobacterium_spiritivorum	2	GCF_000143765	GCF_000159515
+s__Actinobaculum_schaalii	1	GCF_000411135
+s__Mycobacterium_smegmatis	4	GCF_000331165	GCF_000283295	GCF_000328565	GCF_000015005
+s__Prevotella_marshii	1	GCF_000146675
+s__Verbena_virus_Y	1	PRJNA29881
+s__Limnohabitans_sp_Rim28	1	GCF_000293865
+s__Escherichia_phage_TL_2011b	1	PRJNA181074
+s__Escherichia_phage_TL_2011c	1	PRJNA181075
+s__Asaia_sp_SF2_1	1	GCF_000505765
+s__beta_proteobacterium_KB13	1	GCF_000156155
+s__Gordonia_kroppenstedtii	1	GCF_000380485
+s__Mycoplasma_iowae	1	GCF_000227355
+s__Rhodopseudomonas_palustris	7	GCF_000013745	GCF_000013685	GCF_000195775	GCF_000020445	GCF_000013365	GCF_000014825	GCF_000177255
+s__Paenibacillus_riograndensis	1	GCF_000224945
+s__Caminibacter_mediatlanticus	1	GCF_000170735
+s__Streptomyces_niveus	1	GCF_000497425
+s__Broad_bean_true_mosaic_virus	1	PRJNA214691
+s__Acinetobacter_sp_ANC_3929	1	GCF_000369405
+s__Arthrobacter_sp_AK_YN10	1	GCF_000465895
+s__Salinispora_arenicola	21	GCF_000375165	GCF_000375125	GCF_000380945	GCF_000373845	GCF_000375085	GCF_000259615	GCF_000378645	GCF_000375145	GCF_000375005	GCF_000384275	GCF_000259675	GCF_000375105	GCF_000375185	GCF_000375045	GCF_000018265	GCF_000375205	GCF_000375025	GCF_000378665	GCF_000378705	GCF_000378685	GCF_000377605
+s__Shewanella_frigidimarina	1	GCF_000014705
+s__Shigella_boydii	7	GCF_000020185	GCF_000268185	GCF_000012025	GCF_000268145	GCF_000193915	GCF_000211975	GCF_000211955
+s__Tetrapisispora_blattae	1	GCA_000315915
+s__European_mountain_ash_ringspot_associated_virus	1	PRJNA39973
+s__Lactobacillus_phage_Sha1	1	PRJNA181084
+s__Agrobacterium_sp_10MFCol1_1	1	GCF_000381165
+s__Leptospira_inadai	1	GCF_000243675
+s__Bacillus_smithii	1	GCF_000238675
+s__Salmonella_phage_SSU5	1	PRJNA177521
+s__Holospora_undulata	1	GCF_000388175
+s__Tomato_leaf_curl_China_betasatellite	1	PRJNA15446
+s__Staphylococcus_phage_53_sensu_lato	11	PRJNA15260	PRJNA15265	PRJNA15266	PRJNA15270	PRJNA15273	PRJNA15275	PRJNA15277	PRJNA15278	PRJNA15279	PRJNA15280	PRJNA15281
+s__Enterobacteria_phage_HK446	1	PRJNA183141
+s__Sida_yellow_vein_disease_associated_DNA_1	1	PRJNA48075
+s__Mycoplasma_fermentans	2	GCF_000148625	GCF_000186005
+s__Gloeocapsa_sp_PCC_73106	1	GCF_000332035
+s__Thioflavicoccus_mobilis	1	GCF_000327045
+s__Vibrio_proteolyticus	1	GCF_000467125
+s__Plum_bark_necrosis_stem_pitting_associated_virus	1	PRJNA27909
+s__Nosema_ceranae	1	GCA_000182985
+s__Canine_papillomavirus_9	1	PRJNA74353
+s__Canine_papillomavirus_8	1	PRJNA73441
+s__Gordonia_aichiensis	1	GCF_000332975
+s__Acidaminococcus_sp_BV3L6	1	GCF_000468835
+s__Sida_golden_mosaic_virus	1	PRJNA14083
+s__Bartonella_vinsonii	4	GCF_000341385	GCF_000385415	GCF_000278335	GCF_000278235
+s__Classical_swine_fever_virus	1	PRJNA15457
+s__Simbu_virus	1	PRJNA173359
+s__Barley_stripe_mosaic_virus	1	PRJNA15031
+s__Acinetobacter_venetianus	3	GCF_000271425	GCF_000368585	GCF_000308235
+s__Enterobacteria_phage_JS	1	PRJNA27983
+s__Gordonia_alkanivorans	2	GCF_000503935	GCF_000225505
+s__Tuber_aestivum_endornavirus	1	PRJNA61903
+s__White_clover_cryptic_virus_2	1	PRJNA198685
+s__Burkholderia_phage_BcepB1A	1	PRJNA14476
+s__Bradyrhizobium_sp_ORS_285	1	GCF_000239755
+s__Clostridium_symbiosum	3	GCF_000189595	GCF_000466485	GCF_000189615
+s__Sigmapapillomavirus_1	1	PRJNA15171
+s__Bacillus_phage_Cherry	1	PRJNA15784
+s__Acidianus_spindle_shaped_virus_1	1	PRJNA42351
+s__Rhodococcus_phage_E3	1	PRJNA206474
+s__Aerococcus_urinae	1	GCF_000193205
+s__Enterobacteria_phage_SfV	1	PRJNA14162
+s__Mycobacterium_phage_PMC	1	PRJNA17169
+s__Acidovorax_delafieldii	1	GCF_000175235
+s__Methanotorris_formicicus	1	GCF_000243455
+s__Brachyspira_intermedia	1	GCF_000223215
+s__East_Asian_Passiflora_virus	1	PRJNA16326
+s__Sida_micrantha_mosaic_virus	1	PRJNA14343
+s__Rhizobium_etli	9	GCF_000172775	GCF_000172695	GCF_000442435	GCF_000092045	GCF_000172715	GCF_000172795	GCF_000172755	GCF_000172735	GCF_000020265
+s__Tomato_leaf_curl_betasatellite	1	PRJNA14622
+s__Cycloclasticus_pugetii	1	GCF_000384415
+s__Salmonella_phage_Fels_2	1	PRJNA32273
+s__Actinoplanes_phage_phiAsp2	1	PRJNA14378
+s__Wolbachia_endosymbiont_of_Drosophila_melanogaster	2	GCF_000475015	GCF_000008025
+s__Tomato_leaf_curl_Hanoi_virus	1	PRJNA62755
+s__Rothia_dentocariosa	2	GCF_000143585	GCF_000164695
+s__Mycobacterium_phage_Papyrus	1	PRJNA215107
+s__Pseudomonas_sp_CMAA1215	1	GCF_000474765
+s__Bacillus_toyonensis	1	GCF_000496285
+s__Candidatus_Prevotella_conceptionensis	1	GCF_000312305
+s__Clerodendrum_golden_mosaic_China_virus	1	PRJNA32175
+s__Tomato_leaf_curl_Sulawesi_virus	1	PRJNA41173
+s__Heliothis_zea_virus_1	1	PRJNA14215
+s__Primate_T_lymphotropic_virus_3	1	PRJNA14732
+s__Primate_T_lymphotropic_virus_2	1	PRJNA15221
+s__Acinetobacter_phage_AB3	1	PRJNA206500
+s__Sendai_virus	1	PRJNA15023
+s__Klebsiella_phage_0507_KN2_1	1	PRJNA219106
+s__Pepper_yellow_leaf_curl_Indonesia_virus	1	PRJNA17429
+s__Desulfovibrio_sp_6_1_46AFAA	1	GCF_000224635
+s__Oscillibacter_sp_KLE_1745	1	GCF_000469445
+s__Bacillus_infantis	1	GCF_000473245
+s__Bacillus_sp_HYC_10	1	GCF_000300535
+s__Vibrio_phage_fs2	1	PRJNA14088
+s__Vibrio_phage_fs1	1	PRJNA14227
+s__Halomonas_boliviensis	1	GCF_000236035
+s__Bovine_papular_stomatitis_virus	1	PRJNA14469
+s__Bordetella_avium	1	GCF_000070465
+s__Methylocystis_parvus	1	GCF_000283235
+s__Marinomonas_sp_MED121	1	GCF_000153025
+s__Methylobacter_sp_UW_659_2_G11	1	GCF_000375905
+s__Streptococcus_phage_C1	1	PRJNA14288
+s__Methanobacterium_sp_Maddingley_MBC34	1	GCF_000309865
+s__Helicoverpa_armigera_stunt_virus	1	PRJNA14652
+s__Saccharomyces_cerevisiae	1	GCA_000146045
+s__Carboxydothermus_hydrogenoformans	1	GCF_000012865
+s__Eragrostis_minor_streak_virus	1	PRJNA67111
+s__Rhodococcus_sp_29MFTsu3_1	1	GCF_000382105
+s__Pseudomonas_sp_GM41_2012	1	GCF_000282315
+s__Shewanella_violacea	1	GCF_000091325
+s__Malvastrum_leaf_curl_virus	1	PRJNA16325
+s__Indian_citrus_ringspot_virus	1	PRJNA14716
+s__Narcissus_yellow_stripe_virus	1	PRJNA32687
+s__Kingella_denitrificans	1	GCF_000190695
+s__Tetraselmis_viridis_virus_S20	1	PRJNA195490
+s__Periplaneta_fuliginosa_densovirus	1	PRJNA14091
+s__Sphingobium_sp_HDIP04	1	GCF_000445085
+s__Yarrowia_lipolytica	1	GCA_000002525
+s__Choristoneura_rosaceana_entomopoxvirus_L	1	PRJNA203664
+s__Schizosaccharomyces_japonicus	1	GCA_000149845
+s__Saccharibacter_floricola	1	GCF_000378165
+s__Propionibacterium_phage_PHL114L00	1	PRJNA219112
+s__Acaryochloris_sp_CCMEE_5410	1	GCF_000238775
+s__Micrococcus_luteus	4	GCF_000309825	GCF_000176875	GCF_000180435	GCF_000023205
+s__Hana_virus	1	PRJNA196418
+s__Celeribacter_baekdonensis	1	GCF_000299875
+s__Tomato_yellow_leaf_distortion_virus	1	PRJNA165747
+s__Dahlia_latent_viroid	1	PRJNA186953
+s__Desulfobacula_toluolica	1	GCF_000307105
+s__Staphylococcus_phage_phiMR11	1	PRJNA28065
+s__Aedes_albopictus_densovirus	1	PRJNA14581
+s__Oscillatoria_sp_PCC_6506	1	GCF_000180455
+s__Pseudomonas_sp_P179	1	GCF_000478485
+s__Brucella_sp_NF_2653	1	GCF_000177155
+s__Streptococcus_merionis	1	GCF_000380085
+s__Halorubrum_hochstenium	1	GCF_000337075
+s__alpha_proteobacterium_HIMB59	1	GCF_000299115
+s__Aeromonas_phage_Aeh1	1	PRJNA14312
+s__Bacillus_coagulans	5	GCF_000333935	GCF_000333915	GCF_000169195	GCF_000217835	GCF_000223155
+s__Candidatus_Liberibacter_americanus	2	GCF_000496595	GCF_000350385
+s__Staphylococcus_phage_phiN315	1	PRJNA14527
+s__Psipapillomavirus_1	1	PRJNA17549
+s__Staphylococcus_phage_SMSAP5	1	PRJNA181240
+s__Sphingobium_indicum	1	GCF_000264945
+s__Methylophaga_lonarensis	1	GCF_000349205
+s__Geobacillus_sp_C56_T3	1	GCF_000092445
+s__Nanovirus_like_particle	1	PRJNA14386
+s__Diplorickettsia_massiliensis	1	GCF_000257395
+s__Drosophila_A_virus	1	PRJNA39351
+s__Gilvimarinus_chinensis	1	GCF_000377745
+s__Mesorhizobium_opportunistum	1	GCF_000176035
+s__Pseudomonas_putida	24	GCF_000478865	GCF_000281215	GCF_000183645	GCF_000495455	GCF_000016865	GCF_000007565	GCF_000325725	GCF_000285395	GCF_000390005	GCF_000226035	GCF_000497385	GCF_000219705	GCF_000019445	GCF_000410575	GCF_000319305	GCF_000412675	GCF_000292775	GCF_000226475	GCF_000294445	GCF_000271965	GCF_000287915	GCF_000264665	GCF_000019125	GCF_000367825
+s__Singulisphaera_acidiphila	2	GCF_000242455	GCF_000255675
+s__Mythimna_loreyi_densovirus	1	PRJNA14346
+s__Weissella_cibaria	1	GCF_000193635
+s__Rhizobium_tropici	1	GCF_000330885
+s__Salmonella_phage_ViI	1	PRJNA64767
+s__Methylacidiphilum_infernorum	1	GCF_000019665
+s__Listeria_phage_vB_LmoM_AG20	1	PRJNA195527
+s__Vibrio_phage_VEJphi	1	PRJNA38367
+s__Prevotella_maculosa	2	GCF_000243015	GCF_000382385
+s__Streptomyces_sp_CNT372	1	GCF_000377145
+s__Cyanophage_NATL2A_133	1	PRJNA81185
+s__Anaplasma_phagocytophilum	6	GCF_000478445	GCF_000439755	GCF_000439775	GCF_000013125	GCF_000478425	GCF_000439795
+s__Verticillium_alfalfae	1	GCA_000150825
+s__Enterobacteria_phage_BZ13	1	PRJNA14635
+s__Avastrovirus_3	1	PRJNA14954
+s__Vibrio_alginolyticus	4	GCF_000176055	GCF_000354175	GCF_000153505	GCF_000467145
+s__Marinobacter_manganoxydans	1	GCF_000235625
+s__Erwinia_phage_FE44	1	PRJNA227003
+s__Human_cosavirus_E	1	PRJNA38493
+s__Desulfobacterium_autotrophicum	1	GCF_000020365
+s__Rhynchosai_mild_mosaic_virus	1	PRJNA66547
+s__Human_cosavirus_B	1	PRJNA38499
+s__Bacillus_sp_SG_1	1	GCF_000181495
+s__Pleurocapsa_minor	1	GCF_000317025
+s__Banana_mild_mosaic_virus	1	PRJNA14711
+s__Equid_herpesvirus_2	1	PRJNA14457
+s__Equid_herpesvirus_1	1	PRJNA14465
+s__Chlamydia_pecorum	4	GCF_000470825	GCF_000470765	GCF_000204135	GCF_000470805
+s__Hydrogenivirga_sp_128_5_R1_1	1	GCF_000171895
+s__Equid_herpesvirus_4	1	PRJNA14418
+s__Equid_herpesvirus_9	1	PRJNA33137
+s__Equid_herpesvirus_8	1	PRJNA162499
+s__Maize_rayado_fino_virus	1	PRJNA15381
+s__Aurantimonas_manganoxydans	1	GCF_000153465
+s__Aliivibrio_logei	2	GCF_000390125	GCF_000286935
+s__Pseudoalteromonas_haloplanktis	3	GCF_000238355	GCF_000026085	GCF_000212655
+s__Nitrobacter_sp_Nb_311A	1	GCF_000152905
+s__Cowpea_mosaic_virus	1	PRJNA15283
+s__Pseudomonas_phage_YuA	1	PRJNA28053
+s__Bacteroides_phage_B124_14	1	PRJNA82753
+s__Sanguibacter_sp_JC301	1	GCF_000312125
+s__Cherry_green_ring_mottle_virus	1	PRJNA14650
+s__Vibrio_crassostreae	5	GCF_000272065	GCF_000272205	GCF_000272185	GCF_000272045	GCF_000272085
+s__Bhargavaea_cecembensis	1	GCF_000348905
+s__Shewanella_piezotolerans	1	GCF_000014885
+s__Sida_leaf_curl_virus_satellite_DNA_beta	1	PRJNA19823
+s__Roseobacter_sp_CCS2	1	GCF_000169435
+s__Amorphus_coralli	1	GCF_000374525
+s__Halomonas_sp_KM_1	1	GCF_000246875
+s__Streptomyces_scabrisporus	1	GCF_000372745
+s__Teredinibacter_turnerae	6	GCF_000381665	GCF_000023025	GCF_000379165	GCF_000372325	GCF_000381645	GCF_000372925
+s__Streptococcus_criceti	1	GCF_000187975
+s__Duck_parvovirus	1	PRJNA14425
+s__Mycoplasma_crocodyli	1	GCF_000025845
+s__Enterococcus_hirae	3	GCF_000407425	GCF_000271405	GCF_000393835
+s__Atopobium_sp_oral_taxon_199	1	GCF_000411555
+s__Natrinema_sp_J7_2	1	GCF_000281695
+s__Xanthomonas_sp_SHU308	1	GCF_000364645
+s__Hosta_virus_X	1	PRJNA32693
+s__Tomato_leaf_curl_New_Delhi_betasatellite	1	PRJNA14451
+s__Streptococcus_sp_I_G2	1	GCF_000479335
+s__Gibbon_ape_leukemia_virus	1	PRJNA14657
+s__Mycobacterium_phage_Fishburne	1	PRJNA206033
+s__Alfalfa_mosaic_virus	1	PRJNA14667
+s__Pseudomonas_viridiflava	1	GCF_000307715
+s__Sinorhizobium_medicae	3	GCF_000372345	GCF_000378785	GCF_000017145
+s__Desulfovibrio_sp_U5L	1	GCF_000245055
+s__Halomonas_sp_BJGMM_B45	1	GCF_000470745
+s__Borrelia_miyamotoi	1	GCF_000445425
+s__Conexibacter_woesei	1	GCF_000025265
+s__Bovine_foamy_virus	1	PRJNA14646
+s__Chitiniphilus_shinanonensis	1	GCF_000374805
+s__Grapevine_yellow_speckle_viroid_2	1	PRJNA14764
+s__Brachyspira_pilosicoli	4	GCF_000319185	GCF_000325665	GCF_000143725	GCF_000296575
+s__Sulfobacillus_thermosulfidooxidans	1	GCF_000294425
+s__Desulfobacter_postgatei	1	GCF_000233695
+s__Desulfovibrio_sp_3_1_syn3	1	GCF_000145315
+s__Australian_bat_lyssavirus	1	PRJNA14730
+s__Centrosema_yellow_spot_virus	1	PRJNA124057
+s__Rhodobacter_phage_RcapMu	1	PRJNA76743
+s__Alkaliphilus_metalliredigens	1	GCF_000016985
+s__Grapevine_endophyte_endornavirus	1	PRJNA181245
+s__Gossypium_darwinii_symptomless_alphasatellite	1	PRJNA39593
+s__Staphylococcus_sp_HGB0015	1	GCF_000411275
+s__Grapevine_yellow_speckle_viroid_1	1	PRJNA14963
+s__Bacillus_phage_SPO1	1	PRJNA32379
+s__Abaca_bunchy_top_virus	1	PRJNA28697
+s__Campylobacter_showae	3	GCF_000175655	GCF_000313615	GCF_000344295
+s__Propionibacterium_phage_P104A	1	PRJNA177532
+s__Escherichia_phage_ADB_2	1	PRJNA183155
+s__Pseudomonas_phage_MPK6	1	PRJNA227001
+s__Brucella_phage_Tb	1	PRJNA181063
+s__Nitratireductor_aquibiodomus	1	GCF_000265055
+s__Sweet_potato_vein_clearing_virus	1	PRJNA64493
+s__Ross_s_goose_hepatitis_B_virus	2	PRJNA14380	PRJNA14403
+s__Porphyromonas_gulae	1	GCF_000378065
+s__Maribacter_sp_HTCC2170	1	GCF_000153165
+s__Ralstonia_sp_AU12_08	1	GCF_000442475
+s__Veillonella_sp_6_1_27	1	GCF_000163735
+s__Streptomyces_sp_CcalMP_8W	1	GCF_000373305
+s__Pseudoalteromonas_agarivorans	1	GCF_000363985
+s__Vibrio_parahaemolyticus	21	GCF_000454145	GCF_000500755	GCF_000182465	GCF_000196095	GCF_000454225	GCF_000154045	GCF_000500105	GCF_000454475	GCF_000454455	GCF_000454205	GCF_000182385	GCF_000315135	GCF_000182365	GCF_000454185	GCF_000477475	GCF_000328405	GCF_000454245	GCF_000195415	GCF_000454165	GCF_000182345	GCF_000454265
+s__Cotia_virus	1	PRJNA85563
+s__Anaerococcus_hydrogenalis	2	GCF_000191745	GCF_000173355
+s__Pectobacterium_phage_phiTE	1	PRJNA188533
+s__Verrucomicrobia_bacterium_SCGC_AAA168_E21	1	GCF_000264625
+s__Blueberry_scorch_virus	1	PRJNA15329
+s__Adoxophyes_honmai_nucleopolyhedrovirus	1	PRJNA14408
+s__Geobacillus_kaustophilus	2	GCF_000415905	GCF_000009785
+s__Hydrocarboniphaga_effusa	1	GCF_000271305
+s__Parietaria_mottle_virus	1	PRJNA14940
+s__Mycobacterium_phage_BPs	1	PRJNA29917
+s__Bacillus_coahuilensis	1	GCF_000171615
+s__Pseudomonas_sp_Chol1	1	GCF_000306015
+s__West_Nile_virus	1	PRJNA30293
+s__Grapevine_satellite_virus	1	PRJNA208539
+s__Caldicellulosiruptor_kronotskyensis	1	GCF_000166775
+s__Porcine_associated_stool_circular_virus	1	PRJNA175586
+s__Bordetella_sp_FB_8	1	GCF_000382185
+s__Weissella_ceti	1	GCF_000320345
+s__Erwinia_sp_Ejp617	1	GCF_000165815
+s__Aquaspirillum_serpens	1	GCF_000420525
+s__Flexibacter_litoralis	1	GCF_000265505
+s__Pseudomonas_phage_D3112	1	PRJNA14334
+s__Holdemania_sp_AP2	1	GCF_000327285
+s__Mud_crab_dicistrovirus	1	PRJNA61121
+s__Langat_virus	1	PRJNA15370
+s__Bacteroides_caccae	2	GCF_000169015	GCF_000273725
+s__Bradyrhizobium_sp_S23321	1	GCF_000284275
+s__Xenopus_laevis_endogenous_retrovirus_Xen1	1	PRJNA30173
+s__Comamonas_testosteroni	5	GCF_000168855	GCF_000241525	GCF_000093145	GCF_000178915	GCF_000241245
+s__Eudoraea_adriatica	1	GCF_000382125
+s__Acinetobacter_sp_NIPH_542	1	GCF_000369825
+s__Actinobacillus_capsulatus	1	GCF_000374285
+s__Nodamura_virus	1	PRJNA14724
+s__Janibacter_sp_HTCC2649	1	GCF_000152705
+s__Gordonia_sp_KTR9	1	GCF_000143885
+s__Vibrio_genomosp_F6	1	GCF_000272145
+s__Leucobacter_sp_UCD_THU	1	GCF_000349545
+s__Tomato_mild_yellow_leaf_curl_Aragua_virus	1	PRJNA19653
+s__Haloferax_larsenii	1	GCF_000336955
+s__Latino_virus	1	PRJNA29905
+s__Phycisphaera_mikurensis	1	GCF_000284115
+s__Eubacterium_rectale	3	GCF_000209935	GCF_000020605	GCF_000209955
+s__Escherichia_fergusonii	2	GCF_000026225	GCF_000190495
+s__Brochothrix_phage_NF5	1	PRJNA64545
+s__Gluconacetobacter_hansenii	1	GCF_000164395
+s__Helicobacter_pylori	271	GCF_000275185	GCF_000275365	GCF_000274325	GCF_000345065	GCF_000148855	GCF_000345405	GCF_000392455	GCF_000274405	GCF_000275425	GCF_000345465	GCF_000275085	GCF_000274905	GCF_000444325	GCF_000274965	GCF_000345565	GCF_000192335	GCF_000346875	GCF_000359645	GCF_000274745	GCF_000345885	GCF_000307795	GCF_000345945	GCF_000256035	GCF_000196755	GCF_000023805	GCF_000270045	GCF_000346025	GCF_000349505	GCF_000345145	GCF_000299815	GCF_000275385	GCF_000274025	GCF_000273805	GCF [...]
+s__Munia_coronavirus_HKU13	1	PRJNA32703
+s__Rhodococcus_phage_RRH1	1	PRJNA81169
+s__Pseudomonas_phage_MPK7	1	PRJNA215673
+s__Bovine_adeno_associated_virus	1	PRJNA14381
+s__Gallid_herpesvirus_2	1	PRJNA14402
+s__Silicibacter_sp_TrichCH4B	1	GCF_000161815
+s__Erysipelothrix_tonsillarum	1	GCF_000373785
+s__Bean_yellow_mosaic_Mexico_virus	1	PRJNA66545
+s__Talaromyces_stipitatus	1	GCA_000003125
+s__Arthroderma_gypseum	1	GCA_000150975
+s__Bacillus_sp_m3_13	1	GCF_000175075
+s__Aliivibrio_fischeri	4	GCF_000011805	GCF_000241785	GCF_000287175	GCF_000020845
+s__Rickettsia_honei	1	GCF_000263055
+s__Haloferax_mediterranei	2	GCF_000337295	GCF_000306765
+s__Cereal_yellow_dwarf_virus_RPV_satellite_RNA	1	PRJNA14169
+s__Bordetella_petrii	1	GCF_000067205
+s__Betapapillomavirus_5	1	PRJNA15488
+s__Betapapillomavirus_4	1	PRJNA14406
+s__Duvenhage_virus	1	PRJNA194144
+s__Rhodococcus_rhodnii	1	GCF_000389715
+s__Ageratum_Yellow_vein_China_virus_OX1	1	PRJNA202889
+s__Staphylococcus_phage_EW	1	PRJNA15272
+s__Geobacillus_sp_G11MC16	1	GCF_000173035
+s__Parvularcula_bermudensis	1	GCF_000152825
+s__Streptococcus_phage_Cp_1	1	PRJNA14584
+s__Carnobacterium_sp_17_4	1	GCF_000195575
+s__Junonia_coenia_densovirus	1	PRJNA15423
+s__Acinetobacter_sp_CIP_102136	1	GCF_000369685
+s__Bacteroides_sp_D22	1	GCF_000163675
+s__Bacteroides_sp_D20	1	GCF_000162215
+s__Bunyamwera_virus	1	PRJNA14649
+s__Phytophthora_endornavirus_1	1	PRJNA15418
+s__Mycobacterium_phage_Rizal	1	PRJNA31281
+s__Sin_Nombre_virus	1	PRJNA15005
+s__Mycobacterium_abscessus	61	GCF_000069185	GCF_000332605	GCF_000270565	GCF_000270925	GCF_000271145	GCF_000271225	GCF_000271125	GCF_000271105	GCF_000261105	GCF_000270765	GCF_000333695	GCF_000500165	GCF_000277775	GCF_000270785	GCF_000270645	GCF_000260575	GCF_000270585	GCF_000271025	GCF_000257245	GCF_000271205	GCF_000280595	GCF_000445035	GCF_000271045	GCF_000271265	GCF_000280615	GCF_000270945	GCF_000270845	GCF_000271185	GCF_000280655	GCF_000270825	GCF_000500185	GCF_000270665	GCF_000271165	 [...]
+s__Plesiomonas_shigelloides	1	GCF_000392595
+s__Gremmeniella_abietina_mitochondrial_RNA_virus_S2	1	PRJNA15229
+s__Crow_polyomavirus	1	PRJNA16654
+s__Actinomyces_cardiffensis	1	GCF_000364865
+s__Plutella_xylostella_granulovirus	1	PRJNA14104
+s__Paramecium_tetraurelia	1	GCA_000165425
+s__Nonomuraea_coxensis	1	GCF_000379885
+s__Burkholderia_phage_Bcep43	1	PRJNA14411
+s__Klebsiella_phage_KP15	1	PRJNA47333
+s__Thioalkalivibrio_sp_K90mix	1	GCF_000025545
+s__Streptococcus_caballi	1	GCF_000379985
+s__Burkholderia_oklahomensis	2	GCF_000170375	GCF_000170355
+s__Sugarcane_bacilliform_IM_virus	1	PRJNA14123
+s__Mycobacterium_phage_Nigel	1	PRJNA30609
+s__Oxalobacteraceae_bacterium_IMCC9480	1	GCF_000195205
+s__Sweet_potato_leaf_curl_Japan_virus	1	PRJNA217880
+s__Microbulbifer_variabilis	1	GCF_000380565
+s__Paenibacillus_dendritiformis	1	GCF_000245555
+s__Coccidioides_posadasii	1	GCA_000151335
+s__Nitrolancetus_hollandicus	1	GCF_000297255
+s__Clostridium_methylpentosum	1	GCF_000158655
+s__Parvovirus_NIH_CQV	1	PRJNA215356
+s__alpha_proteobacterium_HIMB5	1	GCF_000299095
+s__Bacteroides_clarus	1	GCF_000195615
+s__Archaeoglobus_veneficus	1	GCF_000194625
+s__Brucella_suis	37	GCF_000371225	GCF_000371125	GCF_000371305	GCF_000365705	GCF_000480315	GCF_000366205	GCF_000007505	GCF_000209635	GCF_000371265	GCF_000480155	GCF_000366245	GCF_000480055	GCF_000480035	GCF_000236255	GCF_000371245	GCF_000365585	GCF_000371325	GCF_000160275	GCF_000157775	GCF_000157755	GCF_000292125	GCF_000292005	GCF_000366085	GCF_000371085	GCF_000371185	GCF_000371205	GCF_000018905	GCF_000371145	GCF_000366265	GCF_000292105	GCF_000480135	GCF_000365565	GCF_000223195	GCF_000371 [...]
+s__Corynebacterium_resistens	1	GCF_000177535
+s__Methanobrevibacter_sp_AbM4	1	GCF_000404165
+s__Treponema_lecithinolyticum	1	GCF_000468055
+s__Tamus_red_mosaic_virus	1	PRJNA73083
+s__Cocksfoot_mild_mosaic_virus	1	PRJNA30849
+s__Rhizobium_phage_RR1_A	1	PRJNA209209
+s__Brucella_sp_F5_99	1	GCF_000158995
+s__Lucky_bamboo_bacilliform_virus	1	PRJNA19855
+s__Clostridium_phage_phiCP34O	1	PRJNA181211
+s__Actinomadura_flavalba	1	GCF_000374305
+s__Human_papillomavirus_type_128	1	PRJNA62171
+s__Salmonella_phage_ST64T	1	PRJNA14230
+s__Dyoetapapillomavirus_1	1	PRJNA33407
+s__Macroptilium_mosaic_Puerto_Rico_virus	1	PRJNA14398
+s__Raphidiopsis_brookii	1	GCF_000175855
+s__Stenotrophomonas_maltophilia	14	GCF_000355725	GCF_000382065	GCF_000223885	GCF_000237025	GCF_000287935	GCF_000295735	GCF_000020665	GCF_000072485	GCF_000455685	GCF_000355745	GCF_000344215	GCF_000346445	GCF_000284595	GCF_000308335
+s__Alcaligenes_sp_EGD_AK7	1	GCF_000465875
+s__Haemophilus_aegyptius	1	GCF_000195005
+s__Phormidium_phage_Pf_WMP4	1	PRJNA17743
+s__Citromicrobium_bathyomarinum	1	GCF_000176355
+s__Halomonas_titanicae	1	GCF_000336575
+s__Cytophaga_hutchinsonii	1	GCF_000014145
+s__Pelagibacter_phage_HTVC010P	1	PRJNA192866
+s__Aspergillus_terreus	1	GCA_000149615
+s__Cupriavidus_sp_WS	1	GCF_000395345
+s__Marine_birnavirus	1	PRJNA16748
+s__Xestia_c_nigrum_granulovirus	1	PRJNA14092
+s__Lactobacillus_phage_J_1	1	PRJNA227005
+s__marine_gamma_proteobacterium_HTCC2080	1	GCF_000169115
+s__Angelonia_flower_break_virus	1	PRJNA16334
+s__Streptococcus_phage_EJ_1	1	PRJNA14604
+s__Pseudomonas_sp_TJI_51	1	GCF_000190455
+s__Modoc_virus	1	PRJNA15393
+s__Clostridium_sp_KLE_1755	1	GCF_000466465
+s__Coprococcus_comes	1	GCF_000155875
+s__Vesicular_stomatitis_Indiana_virus	1	PRJNA14673
+s__Gordonia_namibiensis	1	GCF_000298235
+s__Maize_Iranian_mosaic_virus	1	PRJNA32689
+s__Trichormus_azollae	1	GCF_000196515
+s__Rhizobium_giardinii	1	GCF_000379605
+s__Leptospira_sp_B5_022	1	GCF_000347035
+s__Shimwellia_blattae	2	GCF_000262305	GCF_000327265
+s__Eimeria_brunetti_RNA_virus_1	1	PRJNA14725
+s__Methylobacter_tundripaludum	1	GCF_000190755
+s__Theileria_parva	1	GCA_000165365
+s__Bat_polyomavirus	1	PRJNA32077
+s__Bacteroides_gallinarum	1	GCF_000374365
+s__Acidianus_rod_shaped_virus_1	1	PRJNA27799
+s__Youngiibacter_fragilis	1	GCF_000495435
+s__Liberibacter_phage_SC1	1	PRJNA181990
+s__Liberibacter_phage_SC2	1	PRJNA181991
+s__Acinetobacter_sp_NIPH_2171	1	GCF_000369625
+s__Cactus_virus_X	1	PRJNA14996
+s__Runella_slithyformis	1	GCF_000218895
+s__Streptococcus_phage_M102	1	PRJNA38845
+s__Fowlpox_virus	1	PRJNA14052
+s__Saccharomonospora_marina	1	GCF_000244955
+s__Wigglesworthia_glossinidia	2	GCF_000247565	GCF_000008885
+s__Shewanella_benthica	1	GCF_000172075
+s__Exiguobacterium_antarcticum	1	GCF_000299435
+s__Paenibacillus_barengoltzii	1	GCF_000403375
+s__Cohnella_laeviribosi	1	GCF_000378425
+s__Enterococcus_phage_phiFL3A	1	PRJNA42787
+s__Croceibacter_atlanticus	1	GCF_000196315
+s__Simian_virus_40	1	PRJNA14024
+s__Tomato_golden_mottle_virus	1	PRJNA14182
+s__Sea_turtle_tornovirus_1	1	PRJNA34541
+s__Possum_enterovirus_W6	1	PRJNA18519
+s__Possum_enterovirus_W1	1	PRJNA18517
+s__Thioalkalivibrio_sp_AKL6	1	GCF_000376905
+s__Thioalkalivibrio_sp_AKL7	1	GCF_000381705
+s__Neisseria_mucosa	2	GCF_000173875	GCF_000186165
+s__Enterobacteria_phage_HK633	1	PRJNA183143
+s__Thioalkalivibrio_sp_AKL3	1	GCF_000377805
+s__Tannerella_forsythia	1	GCF_000238215
+s__Enterobacteria_phage_HK630	1	PRJNA183142
+s__Staphylococcus_phage_phiMR25	1	PRJNA30061
+s__Thioalkalivibrio_sp_AKL8	1	GCF_000380525
+s__Thioalkalivibrio_sp_AKL9	1	GCF_000377825
+s__Border_disease_virus	1	PRJNA15463
+s__Cleome_golden_mosaic_virus	1	PRJNA65817
+s__Sulfurovum_sp_AR	1	GCF_000296775
+s__Staphylococcus_phage_PH15	1	PRJNA18525
+s__Leucothrix_mucor	1	GCF_000419525
+s__Tomato_leaf_curl_Pakistan_alphasatellite	1	PRJNA38463
+s__Brucella_sp_F8_99	1	GCF_000371005
+s__Bartonella_elizabethae	2	GCF_000278175	GCF_000278315
+s__Kosakonia_radicincitans	1	GCF_000280495
+s__Deferribacter_desulfuricans	1	GCF_000010985
+s__Faecalibacterium_prausnitzii	5	GCF_000166035	GCF_000209855	GCF_000154385	GCF_000210735	GCF_000162015
+s__SAR324_cluster_bacterium_JCVI_SC_AAA005	1	GCF_000224765
+s__Prevotella_pallens	1	GCF_000220255
+s__Walrus_calicivirus	1	PRJNA14874
+s__Ectocarpus_siliculosus_virus_1	1	PRJNA14114
+s__Rubus_canadensis_virus_1	1	PRJNA178460
+s__California_sea_lion_polyomavirus_1	1	PRJNA45909
+s__Streptococcus_sp_HPH0090	1	GCF_000411475
+s__Beutenbergia_cavernae	1	GCF_000023105
+s__Tomato_yellow_leaf_curl_China_alphasatellite	1	PRJNA15481
+s__Corynebacterium_sp_KPL2004	1	GCF_000477875
+s__Peptoniphilus_rhinitidis	1	GCF_000246925
+s__Parachlamydia_acanthamoebae	2	GCF_000176075	GCF_000253035
+s__Bovine_ephemeral_fever_virus	1	PRJNA14434
+s__Cotton_leaf_curl_Multan_virus_satellite_U36_1	1	PRJNA16312
+s__Desulfobacter_curvatus	1	GCF_000373985
+s__Desulfurobacterium_sp_TC5_1	1	GCF_000421485
+s__Brucella_melitensis	51	GCF_000370725	GCF_000366625	GCF_000370665	GCF_000250835	GCF_000298595	GCF_000366885	GCF_000182235	GCF_000366965	GCF_000366925	GCF_000331655	GCF_000348645	GCF_000365865	GCF_000370825	GCF_000370765	GCF_000158695	GCF_000366865	GCF_000370845	GCF_000022625	GCF_000367045	GCF_000370785	GCF_000367025	GCF_000365845	GCF_000292065	GCF_000370685	GCF_000158735	GCF_000227645	GCF_000479975	GCF_000370745	GCF_000370645	GCF_000298615	GCF_000367005	GCF_000209575	GCF_000192885	GCF_ [...]
+s__Blackcurrant_reversion_virus_satellite_RNA	1	PRJNA14821
+s__Pseudomonas_sp_CF150	1	GCF_000416175
+s__Acinetobacter_sp_TG27347	1	GCF_000301635
+s__Rhodanobacter_sp_115	1	GCF_000264335
+s__Seal_anellovirus_TFFN_USA_2006	1	PRJNA63583
+s__Acinetobacter_sp_NCTC_10304	1	GCF_000248215
+s__Bovine_hungarovirus	1	PRJNA176432
+s__Chthonomonas_calidirosea	1	GCF_000427095
+s__Pepper_chat_fruit_viroid	1	PRJNA32817
+s__Lactococcus_phage_340	1	PRJNA213081
+s__Chayote_mosaic_virus	1	PRJNA15420
+s__Bacillus_sp_AP8	1	GCF_000321185
+s__Lactococcus_phage_936_sensu_lato	7	PRJNA14087	PRJNA14096	PRJNA17737	PRJNA17739	PRJNA17757	PRJNA17759	PRJNA30597
+s__Leek_white_stripe_virus	1	PRJNA15082
+s__Campylobacter_rectus	1	GCF_000174175
+s__Pectobacterium_wasabiae	2	GCF_000024645	GCF_000291725
+s__Ageratum_yellow_vein_Sri_Lanka_virus	1	PRJNA14120
+s__Propionibacterium_humerusii	1	GCF_000204235
+s__Corynebacterium_ulcerans	4	GCF_000215645	GCF_000306825	GCF_000215665	GCF_000498915
+s__Brucella_sp_63_311	1	GCF_000370945
+s__Fujinami_sarcoma_virus	1	PRJNA14708
+s__Oryza_rufipogon_endornavirus	1	PRJNA16238
+s__Novosphingobium_aromaticivorans	1	GCF_000013325
+s__Carrot_mottle_virus	1	PRJNA72859
+s__gamma_proteobacterium_HdN1	1	GCF_000198515
+s__Persimmon_viroid	1	PRJNA28683
+s__Avian_gyrovirus_2	1	PRJNA65815
+s__Vesicular_exanthema_of_swine_virus	1	PRJNA14704
+s__Bilophila_wadsworthia	1	GCF_000185705
+s__Staphylococcus_massiliensis	2	GCF_000314555	GCF_000298075
+s__Lactobacillus_phage_Lrm1	1	PRJNA30879
+s__Tobacco_bushy_top_virus	1	PRJNA14868
+s__Apple_stem_grooving_virus	1	PRJNA15119
+s__Tobacco_leaf_curl_Japan_virus	1	PRJNA14261
+s__GB_virus_C	1	PRJNA15467
+s__Infectious_salmon_anemia_virus	1	PRJNA15020
+s__Planaria_asexual_strain_specific_virus_like_element_type_1	1	PRJNA14140
+s__Deep_sea_thermophilic_phage_D6E	1	PRJNA181996
+s__Ageratum_leaf_curl_virus	1	PRJNA14492
+s__Meiothermus_silvanus	1	GCF_000092125
+s__Klebsiella_phage_phiKO2	1	PRJNA14495
+s__Caulobacter_phage_phiCbK	1	PRJNA179418
+s__Vibrio_phage_11895_B1	1	PRJNA195495
+s__Mink_calicivirus	1	PRJNA183163
+s__Planctomyces_limnophilus	1	GCF_000092105
+s__Tomato_leaf_curl_Pakistan_virus	1	PRJNA17539
+s__St_Valerien_swine_virus	1	PRJNA38093
+s__Enterocytozoon_bieneusi	1	GCA_000209485
+s__Anaerococcus_tetradius	1	GCF_000159095
+s__Mason_Pfizer_monkey_virus	1	PRJNA14683
+s__Bean_chlorotic_mosaic_virus	1	PRJNA214690
+s__Synechococcus_phage_syn9	1	PRJNA17541
+s__Sulfolobus_turreted_icosahedral_virus_2	1	PRJNA48299
+s__Murine_leukemia_virus	2	PRJNA14907	PRJNA15204
+s__Rhodothermus_marinus	2	GCF_000024845	GCF_000224745
+s__Subdoligranulum_sp_4_3_54A2FAA	1	GCF_000238635
+s__Clostridium_sp_MSTE9	1	GCF_000277625
+s__Paraprevotella_clara	1	GCF_000233955
+s__Xylella_phage_Xfas53	1	PRJNA42595
+s__Caldivirga_maquilingensis	1	GCF_000018305
+s__Beet_ringspot_virus	1	PRJNA15287
+s__Achromobacter_piechaudii	2	GCF_000286415	GCF_000164035
+s__Ruminococcus_sp	1	GCF_000209835
+s__Zunongwangia_profunda	1	GCF_000023465
+s__Anaerococcus_vaginalis	1	GCF_000163295
+s__Prevotella_bivia	2	GCF_000177315	GCF_000262545
+s__Helicobacter_cinaedi	3	GCF_000349975	GCF_000155475	GCF_000284635
+s__Simian_retrovirus_4	1	PRJNA51791
+s__Arcanobacterium_haemolyticum	1	GCF_000092365
+s__Duck_circovirus	3	PRJNA14543	PRJNA14619	PRJNA15558
+s__Sida_yellow_vein_Vietnam_virus	1	PRJNA19783
+s__Chlamydia_phage_2	1	PRJNA14593
+s__Chlamydia_phage_3	1	PRJNA14471
+s__Chlamydia_phage_1	1	PRJNA14064
+s__Chlamydia_phage_4	1	PRJNA15781
+s__Soybean_yellow_mottle_mosaic_virus	1	PRJNA33135
+s__Beet_necrotic_yellow_vein_virus	1	PRJNA15033
+s__Paenibacillus_sp_OSY_SE	1	GCF_000283315
+s__Lactobacillus_versmoldensis	1	GCF_000260455
+s__Cucurbit_aphid_borne_yellows_virus	1	PRJNA15074
+s__Porcine_type_C_oncovirus	1	PRJNA14126
+s__Flavobacterium_rivuli	1	GCF_000378485
+s__Helicoverpa_armigera_granulovirus	1	PRJNA28275
+s__Pannonibacter_phragmitetus	1	GCF_000382365
+s__Pokeweed_mosaic_virus	1	PRJNA177901
+s__Mycobacterium_phage_Bxb1	1	PRJNA14109
+s__Komagataella_pastoris	1	GCA_000027005
+s__Apricot_pseudo_chlorotic_leaf_spot_virus	1	PRJNA15172
+s__Saccharomyces_cerevisiae_killer_virus_M1	1	PRJNA14678
+s__Magnetospirillum_sp_SO_1	1	GCF_000342045
+s__Enterobacteria_phage_fiAA91_ss	1	PRJNA226726
+s__Kineosphaera_limosa	1	GCF_000298215
+s__Sida_yellow_mosaic_Yucatan_virus	1	PRJNA18625
+s__Rotavirus_C	1	PRJNA16140
+s__Rotavirus_A	1	PRJNA32521
+s__Rotavirus_G	1	PRJNA209727
+s__Coriobacterium_glomerans	1	GCF_000195315
+s__Lactobacillus_hilgardii	1	GCF_000159315
+s__Novosphingobium_tardaugens	1	GCF_000466945
+s__Candidatus_Cloacimonas_acidaminovorans	1	GCF_000146065
+s__Pyrobaculum_neutrophilum	1	GCF_000019805
+s__Streptococcus_pseudoporcinus	2	GCF_000188035	GCF_000183465
+s__Clostridium_phage_phiCP13O	1	PRJNA181210
+s__Corynebacterium_propinquum	1	GCF_000375525
+s__Rhinovirus_B	1	PRJNA15309
+s__Rhinovirus_C	1	PRJNA27901
+s__Rhinovirus_A	1	PRJNA15330
+s__Vernonia_yellow_vein_virus	1	PRJNA16335
+s__Rehmannia_mosaic_virus	1	PRJNA18885
+s__Enterobacteria_phage_FI_sensu_lato	1	PRJNA15459
+s__Sweet_potato_chlorotic_fleck_virus	1	PRJNA15038
+s__Bacillus_phage_SP10	1	PRJNA181082
+s__Vibrio_sinaloensis	1	GCF_000189275
+s__Bacteroides_intestinalis	1	GCF_000172175
+s__Giardia_lamblia_virus	1	PRJNA15018
+s__Labrenzia_alexandrii	1	GCF_000158095
+s__Thioalkalivibrio_sp_ALMg11	1	GCF_000377905
+s__Ateline_herpesvirus_3	1	PRJNA14040
+s__SAR324_cluster_bacterium_SCGC_AAA001_C10	1	GCF_000213335
+s__Desulfovibrio_alaskensis	1	GCF_000012665
+s__Salmonella_phage_7_11	1	PRJNA72387
+s__Lunk_virus_NKS_1	1	PRJNA176605
+s__Cymbidium_ringspot_virus_satellite_RNA	1	PRJNA14989
+s__Turnip_vein_clearing_virus	1	PRJNA14685
+s__Anoxybacillus_sp_SK3_4	1	GCF_000443775
+s__Butyrivibrio_sp_NC2007	1	GCF_000421405
+s__Shewanella_woodyi	1	GCF_000019525
+s__Anaerococcus_sp_PH9	1	GCF_000307225
+s__Human_metapneumovirus	1	PRJNA15498
+s__Chryseobacterium_gleum	1	GCF_000143785
+s__Staphylococcus_phage_StauST398_1	1	PRJNA206473
+s__Staphylococcus_phage_StauST398_2	1	PRJNA206489
+s__Staphylococcus_phage_StauST398_3	1	PRJNA206472
+s__Lupine_mosaic_virus	1	PRJNA61853
+s__Glaciecola_agarilytica	1	GCF_000314935
+s__Corynebacterium_lipophiloflavum	1	GCF_000159635
+s__Fusobacterium_gonidiaformans	2	GCF_000158835	GCF_000158235
+s__Obodhiang_virus	1	PRJNA159049
+s__Lewinella_persica	1	GCF_000373105
+s__Streptomyces_phage_phiSASD1	1	PRJNA49613
+s__Acinetobacter_sp_CIP_102159	1	GCF_000368285
+s__Yersinia_phage_Berlin	1	PRJNA18481
+s__Geobacillus_sp_WCH70	1	GCF_000023385
+s__Bat_coronavirus_HKU10	1	PRJNA177902
+s__alpha_proteobacterium_SCGC_AAA280_B11	1	GCF_000371745
+s__Nocardiopsis_xinjiangensis	1	GCF_000341145
+s__Sulfolobus_acidocaldarius	3	GCF_000012285	GCF_000338775	GCF_000340315
+s__Streptomyces_sp_351MFTsu5_1	1	GCF_000383655
+s__Broad_bean_mottle_virus	1	PRJNA14833
+s__Butyrivibrio_proteoclasticus	1	GCF_000145035
+s__Acidovorax_radicis	1	GCF_000204195
+s__Thauera_aminoaromatica	1	GCF_000310185
+s__Tomato_chino_La_Paz_virus	1	PRJNA14368
+s__Lactate_dehydrogenase_elevating_virus	1	PRJNA14702
+s__Synechococcus_phage_S_CBS1	1	PRJNA76741
+s__Rhodococcus_pyridinivorans	1	GCF_000236965
+s__Human_herpesvirus_6B	1	PRJNA14422
+s__Candidatus_Pelagibacter_sp_HTCC7211	1	GCF_000155895
+s__Lactobacillus_iners	15	GCF_000160875	GCF_000191685	GCF_000149085	GCF_000179935	GCF_000149145	GCF_000177755	GCF_000204435	GCF_000179955	GCF_000179975	GCF_000149125	GCF_000149065	GCF_000149105	GCF_000185405	GCF_000191705	GCF_000179995
+s__Staphylococcus_phage_11	1	PRJNA14246
+s__Enterobacteria_phage_Phieco32	1	PRJNA28729
+s__Vibrio_phage_KSF_1phi	1	PRJNA14562
+s__Klebsiella_phage_KP36	1	PRJNA183428
+s__Klebsiella_phage_KP34	1	PRJNA42781
+s__Klebsiella_phage_KP32	1	PRJNA42779
+s__Blue_squill_virus_A	1	PRJNA179427
+s__Blattabacterium_sp_Nauphoeta_cinerea	1	GCF_000471965
+s__Klebsiella_sp_4_1_44FAA	1	GCF_000238715
+s__Candida_glabrata	1	GCA_000002545
+s__Desmodium_leaf_distortion_virus	1	PRJNA17991
+s__Leptospirillum_ferrooxidans	1	GCF_000284315
+s__Enterobacteria_phage_Qbeta	1	PRJNA15479
+s__Persephonella_marina	1	GCF_000021565
+s__Imtechella_halotolerans	1	GCF_000260835
+s__Lactobacillus_ultunensis	1	GCF_000159415
+s__Okra_leaf_curl_alphasatellite	1	PRJNA29397
+s__Streptococcus_sp_I_P16	1	GCF_000479315
+s__Bacillus_mycoides	3	GCF_000161435	GCF_000161415	GCF_000003925
+s__Kurthia_sp_JC8E	1	GCF_000285595
+s__Cyanophage_NATL1A_7	1	PRJNA81183
+s__Streptomyces_chartreusis	2	GCF_000226435	GCF_000226455
+s__Thiorhodospira_sibirica	1	GCF_000227725
+s__Erysimum_latent_virus	1	PRJNA14651
+s__Cherry_rasp_leaf_virus	1	PRJNA15131
+s__Hydrogenobaculum_sp_HO	1	GCF_000341855
+s__Malvastrum_leaf_curl_Philippines_virus	1	PRJNA203676
+s__Streptomyces_phage_VWB	1	PRJNA14485
+s__Dyoepsilonpapillomavirus_1	1	PRJNA39987
+s__Human_polyomavirus_12	1	PRJNA195931
+s__Enterococcus_phage_phiEf11	1	PRJNA42943
+s__Ectropis_obliqua_nucleopolyhedrovirus	1	PRJNA18273
+s__Sphingopyxis_alaskensis	1	GCF_000013985
+s__Azoarcus_toluclasticus	1	GCF_000378245
+s__Providencia_rettgeri	2	GCF_000314835	GCF_000158055
+s__Digitaria_streak_virus	1	PRJNA14069
+s__Moraxella_catarrhalis	12	GCF_000193025	GCF_000302495	GCF_000193065	GCF_000192985	GCF_000193085	GCF_000193005	GCF_000193045	GCF_000192965	GCF_000192905	GCF_000192925	GCF_000192945	GCF_000092265
+s__Goose_parvovirus	1	PRJNA14098
+s__Leptospira_kmetyi	1	GCF_000243735
+s__Porcine_stool_associated_circular_virus_3	1	PRJNA202890
+s__Hirame_rhabdovirus	1	PRJNA15132
+s__Kangiella_koreensis	1	GCF_000024085
+s__Diuris_virus_B	1	PRJNA178592
+s__Bacteroidetes_bacterium_oral_taxon_272	1	GCF_000442105
+s__Halorubrum_pleomorphic_virus_3	1	PRJNA157259
+s__Cowpea_severe_mosaic_virus	1	PRJNA15301
+s__Halorubrum_pleomorphic_virus_1	1	PRJNA36677
+s__Mycobacterium_phage_Crossroads	1	PRJNA215113
+s__Streptococcus_massiliensis	2	GCF_000380065	GCF_000341525
+s__Squash_leaf_curl_Philippines_virus	1	PRJNA14369
+s__Treponema_bryantii	1	GCF_000421345
+s__Bacteroides_eggerthii	3	GCF_000155815	GCF_000185605	GCF_000273465
+s__Alcelaphine_herpesvirus_1	1	PRJNA14099
+s__Lactobacillus_animalis	1	GCF_000183825
+s__Vibrio_phage_VSK	1	PRJNA14337
+s__Clostridium_sp_JC122	1	GCF_000285575
+s__Marivirga_tractuosa	1	GCF_000183425
+s__Xanthomonas_campestris	14	GCF_000277975	GCF_000277955	GCF_000277875	GCF_000277915	GCF_000277895	GCF_000070605	GCF_000007145	GCF_000012105	GCF_000159815	GCF_000263835	GCF_000221965	GCF_000233635	GCF_000277935	GCF_000321125
+s__Desulfotalea_psychrophila	1	GCF_000025945
+s__Persimmon_cryptic_virus	1	PRJNA167734
+s__Pseudomonas_luteola	1	GCF_000282775
+s__Nilaparvata_lugens_reovirus	1	PRJNA14775
+s__Mycobacterium_marinum	3	GCF_000419315	GCF_000419335	GCF_000018345
+s__Burkholderia_phage_phi52237	1	PRJNA15422
+s__Hydrogenobaculum_sp_Y04AAS1	1	GCF_000020785
+s__Rhodococcus_phage_RGL3	1	PRJNA81167
+s__Thiorhodovibrio_sp_970	1	GCF_000228725
+s__Caldisphaera_lagunensis	1	GCF_000317795
+s__Streptococcus_sp_C300	1	GCF_000187645
+s__Tall_oatgrass_mosaic_virus	1	PRJNA226728
+s__Yersinia_intermedia	1	GCF_000168035
+s__Bacteroides_dorei	5	GCF_000273035	GCF_000273075	GCF_000158335	GCF_000273055	GCF_000156075
+s__Hahella_ganghwensis	1	GCF_000376785
+s__Capnocytophaga_cynodegmi	1	GCF_000379185
+s__Thioalkalimicrobium_cyclicum	1	GCF_000214825
+s__Synechococcus_elongatus	2	GCF_000012525	GCF_000010065
+s__Owenweeksia_hongkongensis	1	GCF_000236705
+s__Burkholderia_pseudomallei	35	GCF_000439695	GCF_000170595	GCF_000259775	GCF_000294635	GCF_000346205	GCF_000169715	GCF_000193475	GCF_000170555	GCF_000193455	GCF_000170455	GCF_000259735	GCF_000152685	GCF_000494855	GCF_000152325	GCF_000259795	GCF_000152365	GCF_000170535	GCF_000259815	GCF_000347975	GCF_000260515	GCF_000015925	GCF_000170575	GCF_000170515	GCF_000445385	GCF_000170435	GCF_000182445	GCF_000012785	GCF_000170415	GCF_000170475	GCF_000182585	GCF_000182195	GCF_000152345	GCF_00025975 [...]
+s__Mastigocladopsis_repens	1	GCF_000315565
+s__Thalassobacter_arenae	1	GCF_000442275
+s__Mitsuokella_sp_oral_taxon_131	1	GCF_000469545
+s__Cucumber_leaf_spot_virus	1	PRJNA16590
+s__Bacillus_licheniformis	8	GCF_000008425	GCF_000315975	GCF_000260535	GCF_000477395	GCF_000260555	GCF_000011645	GCF_000258125	GCF_000408885
+s__Opuntia_virus_X	1	PRJNA14956
+s__Mycobacterium_phage_ArcherS7	1	PRJNA206478
+s__Perkinsus_marinus	1	GCA_000006405
+s__Erythrobacter_litoralis	1	GCF_000013005
+s__Dietzia_alimentaria	1	GCF_000226215
+s__Enterobacteria_phage_vB_EcoM_FV3	1	PRJNA181219
+s__Enterococcus_phage_phiFL1A	1	PRJNA42789
+s__Tupaia_paramyxovirus	1	PRJNA14723
+s__Rhodococcus_sp_DK17	1	GCF_000263875
+s__Desulfosporosinus_meridiei	1	GCF_000231385
+s__Maricaulis_maris	1	GCF_000014745
+s__Oropouche_virus	1	PRJNA14943
+s__Pediococcus_claussenii	1	GCF_000237995
+s__Streptococcus_sp_GMD5S	1	GCF_000298715
+s__Prevotella_ruminicola	1	GCF_000025925
+s__Sphingomonas_sp_PR090111_T3T_6A	1	GCF_000383095
+s__Nocardioides_sp_JS614	1	GCF_000015265
+s__Erectites_yellow_mosaic_virus	1	PRJNA19787
+s__Yersinia_massiliensis	1	GCF_000312485
+s__Sebokele_virus_1	1	PRJNA208541
+s__Papaya_leaf_curl_China_virus_satellite_DNA_beta	1	PRJNA19819
+s__Streptosporangium_roseum	1	GCF_000024865
+s__Acinetobacter_sp_MDS7A	1	GCF_000386005
+s__Eidolon_polyomavirus_1	1	PRJNA185194
+s__Beet_mosaic_virus	1	PRJNA14942
+s__Mungbean_yellow_mosaic_virus	1	PRJNA14555
+s__Enterobacter_phage_Enc34	1	PRJNA181238
+s__Arracacha_virus_B	1	PRJNA196183
+s__Sweet_potato_leaf_curl_China_virus	1	PRJNA225011
+s__Abelson_murine_leukemia_virus	1	PRJNA14654
+s__Acinetobacter_sp_CIP_A165	1	GCF_000367985
+s__Acinetobacter_sp_CIP_A162	1	GCF_000367905
+s__Shewanella_amazonensis	1	GCF_000015245
+s__Gordonia_sputi	1	GCF_000248055
+s__Enterobacteria_phage_UAB_Phi78	1	PRJNA191121
+s__Striped_Jack_nervous_necrosis_virus	1	PRJNA14741
+s__Sweet_potato_mild_mottle_virus	1	PRJNA15340
+s__Staphylococcus_phage_tp310_1	1	PRJNA20659
+s__Staphylococcus_phage_vB_SauM_Remus	1	PRJNA215669
+s__Torque_teno_tamarin_virus	1	PRJNA48169
+s__Pantoea_sp_aB	1	GCF_000179655
+s__Pepper_yellow_leaf_curl_China_virus	1	PRJNA188776
+s__Fig_cryptic_virus	1	PRJNA66565
+s__Mycoplasma_mycoides	4	GCF_000011445	GCF_000253075	GCF_000339035	GCF_000143865
+s__Thermosinus_carboxydivorans	1	GCF_000169155
+s__Cucurbit_yellow_stunting_disorder_virus	1	PRJNA14890
+s__Enterobacteria_phage_Phi1	1	PRJNA20789
+s__Aura_virus	1	PRJNA14830
+s__Desulfovibrio_alkalitolerans	1	GCF_000422245
+s__Candidatus_Arthromitus_sp_SFB_5	1	GCF_000252745
+s__Candidatus_Arthromitus_sp_SFB_4	1	GCF_000252725
+s__Candidatus_Arthromitus_sp_SFB_3	1	GCF_000252705
+s__Candidatus_Arthromitus_sp_SFB_2	1	GCF_000252685
+s__Candidatus_Arthromitus_sp_SFB_1	1	GCF_000252665
+s__Ralstonia_sp_PBA	1	GCF_000272025
+s__Edwardsiella_tarda	5	GCF_000341505	GCF_000020865	GCF_000163955	GCF_000348565	GCF_000146305
+s__Acinetobacter_sp_HA	1	GCF_000264725
+s__Cyanobium_sp_PCC_7001	1	GCF_000155635
+s__Vulcanisaeta_moutnovskia	1	GCF_000190315
+s__Sclerotinia_sclerotiorum_hypovirulence_associated_DNA_virus_1	1	PRJNA39985
+s__Okra_enation_leaf_curl_alphasatellite	1	PRJNA184814
+s__Clostera_anastomosis_granulovirus	1	PRJNA226250
+s__Tomato_pseudo_curly_top_virus	1	PRJNA14582
+s__Leuconostoc_inhae	1	GCF_000166735
+s__Betapapillomavirus_1	1	PRJNA15511
+s__Cyanophage_P_RSM1	1	PRJNA198436
+s__Cyanophage_P_RSM6	1	PRJNA195506
+s__Betapapillomavirus_3	1	PRJNA15455
+s__Laribacter_hongkongensis	1	GCF_000021025
+s__Betapapillomavirus_2	1	PRJNA15456
+s__Japanese_eel_endothelial_cells_infecting_virus	1	PRJNA62749
+s__Coleus_blumei_viroid_2	1	PRJNA14783
+s__Synechococcus_sp_JA_3_3Ab	1	GCF_000013205
+s__Woodchuck_hepatitis_virus	1	PRJNA14212
+s__Merremia_mosaic_Puerto_Rico_virus	1	PRJNA66549
+s__Synechococcus_sp_RCC307	1	GCF_000063525
+s__Frankia_sp_EUN1f	1	GCF_000177675
+s__Cycloclasticus_sp_PY97M	1	GCF_000444935
+s__Betapapillomavirus_6	1	PRJNA68287
+s__Mycoplasma_mobile	1	GCF_000008365
+s__Murid_herpesvirus_8	1	PRJNA182227
+s__Murid_herpesvirus_4	1	PRJNA14458
+s__Clostridium_phage_phiC2	1	PRJNA19153
+s__Murid_herpesvirus_1	1	PRJNA15181
+s__Murid_herpesvirus_2	1	PRJNA14419
+s__Pseudomonas_phage_SN	1	PRJNA33327
+s__Thalassiosira_pseudonana	1	GCA_000149405
+s__Aeromonas_sp_MDS8	1	GCF_000388005
+s__Bacillus_bataviensis	1	GCF_000307875
+s__Acholeplasma_laidlawii	1	GCF_000018785
+s__Neisseria_bacilliformis	1	GCF_000194925
+s__Choristoneura_rosaceana_alphabaculovirus	1	PRJNA214178
+s__Ipomoea_yellow_vein_virus	1	PRJNA39615
+s__Halovirus_HRTV_4	1	PRJNA206493
+s__Haemophilus_phage_Aaphi23	1	PRJNA15228
+s__Methanothermococcus_okinawensis	1	GCF_000179575
+s__Penaeus_merguiensis_densovirus	1	PRJNA15556
+s__Serratia_odorifera	1	GCF_000163595
+s__Bifidobacterium_magnum	1	GCF_000420565
+s__Oligella_urethralis	1	GCF_000372065
+s__Halomonas_stevensii	1	GCF_000275725
+s__Selenomonas_ruminantium	1	GCF_000284095
+s__Synechococcus_phage_S_IOM18	1	PRJNA209067
+s__Sweet_potato_leaf_curl_Canary_virus	1	PRJNA41623
+s__Achromobacter_arsenitoxydans	1	GCF_000236785
+s__Oat_chlorotic_stunt_virus	1	PRJNA15081
+s__Bovine_parvovirus_2	1	PRJNA14553
+s__alpha_proteobacterium_SCGC_AAA280_P20	1	GCF_000371845
+s__Cupixi_virus	1	PRJNA28327
+s__Ntaya_virus	1	PRJNA176549
+s__Amycolatopsis_balhimycina	1	GCF_000384295
+s__Pseudovibrio_sp_JE062	1	GCF_000156235
+s__Peanut_clump_virus	1	PRJNA14776
+s__Ruminococcaceae_bacterium_D16	1	GCF_000177015
+s__Caldicellulosiruptor_kristjanssonii	1	GCF_000166695
+s__Mycobacterium_avium	40	GCF_000218095	GCF_000504845	GCF_000505015	GCF_000216015	GCF_000504745	GCF_000240525	GCF_000390085	GCF_000240465	GCF_000007865	GCF_000240405	GCF_000240425	GCF_000218055	GCF_000504785	GCF_000504865	GCF_000240345	GCF_000218135	GCF_000218155	GCF_000504765	GCF_000504945	GCF_000504925	GCF_000504885	GCF_000215815	GCF_000504825	GCF_000240445	GCF_000240385	GCF_000218115	GCF_000504975	GCF_000014985	GCF_000504905	GCF_000240485	GCF_000218075	GCF_000330785	GCF_000504725	GCF_ [...]
+s__Papaya_ringspot_virus	1	PRJNA15289
+s__Listeria_ivanovii	2	GCF_000183925	GCF_000252975
+s__Marinobacter_santoriniensis	1	GCF_000347775
+s__Halomonas_lutea	1	GCF_000378505
+s__Flavobacterium_sp_MS220_5C	1	GCF_000341755
+s__Campylobacter_sp_10_1_50	1	GCF_000238755
+s__Okra_yellow_crinkle_Cameroon_alphasatellite	1	PRJNA61907
+s__Aquimarina_agarilytica	1	GCF_000255455
+s__Candidatus_Halobonum_tyrrellensis	1	GCF_000495475
+s__Alcaligenes_sp_HPC1271	1	GCF_000313875
+s__Salmonella_phage_iEPS5	1	PRJNA212949
+s__Candidatus_Haloredivivus_sp_G17	1	GCF_000236195
+s__Leptosphaeria_maculans	1	GCA_000230375
+s__Geobacter_bemidjiensis	1	GCF_000020725
+s__Rhizoctonia_solani_virus_717	1	PRJNA14807
+s__Halomicrobium_mukohataei	1	GCF_000023965
+s__Prevotella_sp_C561	1	GCF_000224595
+s__Papaya_leaf_curl_virus	1	PRJNA14213
+s__Hepatitis_A_virus	1	PRJNA15308
+s__Pseudoclavibacter_faecalis	1	GCF_000381765
+s__Honeysuckle_ringspot_virus	1	PRJNA62211
+s__Actinobaculum_urinale	1	GCF_000420445
+s__Methanothermobacter_thermautotrophicus	1	GCF_000008645
+s__Chrysanthemum_stunt_viroid	1	PRJNA14968
+s__Gluconacetobacter_europaeus	3	GCF_000285295	GCF_000285335	GCF_000227545
+s__Botryotinia_fuckeliana	1	GCA_000143535
+s__Chlorogloeopsis_fritschii	1	GCF_000317285
+s__Papaya_mosaic_virus	1	PRJNA14700
+s__Dalechampia_chlorotic_mosaic_virus	1	PRJNA176616
+s__Leptospirillum_sp_Group_IV	1	GCF_000496115
+s__Streptobacillus_moniliformis	1	GCF_000024565
+s__Porphyromonas_sp_oral_taxon_278	1	GCF_000467855
+s__Porphyromonas_sp_oral_taxon_279	1	GCF_000292995
+s__Thauera_sp_28	1	GCF_000310145
+s__Methanobacterium_phage_psiM2	1	PRJNA14160
+s__Allium_virus_X	1	PRJNA34843
+s__Pseudomonas_sp_PAMC_25886	1	GCF_000242655
+s__Methanobacterium_sp_SWAN_1	1	GCF_000214725
+s__Bacillus_sp_JC63	1	GCF_000311725
+s__Gordonia_effusa	1	GCF_000241305
+s__Tomato_leaf_curl_Joydebpur_betasatellite	1	PRJNA28273
+s__Parsnip_yellow_fleck_virus	1	PRJNA15299
+s__Tomato_chlorotic_dwarf_viroid	1	PRJNA14973
+s__Simian_virus_12	1	PRJNA16189
+s__Tomato_leaf_curl_Laos_virus	1	PRJNA14244
+s__Haemophilus_phage_HP1	1	PRJNA14078
+s__Bacillus_pseudofirmus	1	GCF_000005825
+s__Acidaminococcus_intestini	1	GCF_000230275
+s__Staphylococcus_lentus	1	GCF_000286395
+s__Flavobacteria_bacterium_MS024_3C	1	GCF_000173115
+s__Rhodococcus_sp_JVH1	1	GCF_000280725
+s__Staphylococcus_phage_37	1	PRJNA15271
+s__Terracoccus_sp_273MFTsu3_1	1	GCF_000383675
+s__Thermoplasma_volcanium	1	GCF_000011185
+s__alpha_proteobacterium_SCGC_AAA027_C06	1	GCF_000364545
+s__Murine_leukemia_related_retroviruses	1	PRJNA16631
+s__Brevundimonas_subvibrioides	1	GCF_000144605
+s__Pandoraea_sp_B_6	1	GCF_000282835
+s__Orf_virus	1	PRJNA14464
+s__Enterococcus_malodoratus	2	GCF_000393875	GCF_000407185
+s__Grapevine_chrome_mosaic_virus	1	PRJNA15285
+s__Paenibacillus_lactis	1	GCF_000230915
+s__Adoxophyes_orana_granulovirus	1	PRJNA14298
+s__Bacillus_cellulosilyticus	1	GCF_000177235
+s__Streptococcus_orisratti	1	GCF_000380105
+s__Omsk_hemorrhagic_fever_virus	1	PRJNA14995
+s__Helicobacter_macacae	1	GCF_000507845
+s__Thermomicrobium_roseum	1	GCF_000021685
+s__Prevotella_sp_MSX73	1	GCF_000287635
+s__Ostreococcus_lucimarinus_virus_OlV1	1	PRJNA61011
+s__Ostreococcus_lucimarinus_virus_OlV5	1	PRJNA195483
+s__Clostridium_scindens	1	GCF_000154505
+s__Enterococcus_gallinarum	1	GCF_000157255
+s__Coprinopsis_cinerea	1	GCA_000182895
+s__Guinea_pig_Chlamydia_phage	1	PRJNA14012
+s__Staphylococcus_phage_3A	1	PRJNA15269
+s__Bacillus_pseudomycoides	1	GCF_000161455
+s__Bacteroides_coprophilus	1	GCF_000157915
+s__Bacillus_sp_5B6	1	GCF_000259405
+s__Bacteroides_stercoris	2	GCF_000154525	GCF_000413395
+s__Synechococcus_sp_PCC_7502	1	GCF_000317085
+s__Fulvimarina_pelagi	1	GCF_000153705
+s__Klebsiella_oxytoca	11	GCF_000492815	GCF_000276705	GCF_000247915	GCF_000252915	GCF_000492955	GCF_000240325	GCF_000247835	GCF_000247875	GCF_000247895	GCF_000269585	GCF_000247855
+s__Mouse_astrovirus_M_52_USA_2008	1	PRJNA72381
+s__Heron_hepatitis_B_virus	1	PRJNA15458
+s__Pseudoalteromonas_phage_RIO_1	1	PRJNA206039
+s__Desulfitobacterium_metallireducens	1	GCF_000231405
+s__Maricaulis_sp_JL2009	1	GCF_000412185
+s__Xanthomonas_arboricola	1	GCF_000306055
+s__Leishmania_braziliensis	1	GCA_000002845
+s__Pedobacter_saltans	1	GCF_000190735
+s__Eubacterium_brachy	1	GCF_000488855
+s__Clostridium_acetobutylicum	3	GCF_000008765	GCF_000218855	GCF_000191905
+s__Grapevine_virus_E	1	PRJNA30853
+s__Mycobacterium_phage_Cali	1	PRJNA31291
+s__Grapevine_virus_A	1	PRJNA15086
+s__Grapevine_virus_B	1	PRJNA15083
+s__Bacillus_cereus_thuringiensis	174	GCF_000013065	GCF_000293525	GCF_000293685	GCF_000161235	GCF_000290715	GCF_000387405	GCF_000293745	GCF_000291115	GCF_000161715	GCF_000399605	GCF_000290895	GCF_000291235	GCF_000399405	GCF_000161335	GCF_000292705	GCF_000290975	GCF_000021225	GCF_000399385	GCF_000399005	GCF_000399065	GCF_000181615	GCF_000161055	GCF_000398985	GCF_000398945	GCF_000399165	GCF_000291535	GCF_000300475	GCF_000291075	GCF_000290655	GCF_000338755	GCF_000291415	GCF_000290675	GCF_000 [...]
+s__Sphingobium_xenophagum	2	GCF_000367345	GCF_000277525
+s__freshwater_metagenome	1	GCF_000500915
+s__Blattella_germanica_densovirus	1	PRJNA14320
+s__Porcine_astrovirus_3	1	PRJNA181247
+s__Rabies_virus	1	PRJNA15144
+s__Sandfly_fever_Sicilian_virus	1	PRJNA66185
+s__Methanocaldococcus_villosus	2	GCF_000363885	GCF_000371805
+s__Anticarsia_gemmatalis_nucleopolyhedrovirus	1	PRJNA17995
+s__Mycobacterium_phage_U2	1	PRJNA20943
+s__Varroa_destructor_virus_1	1	PRJNA15121
+s__Enterococcus_faecalis	301	GCF_000390965	GCF_000391525	GCF_000393275	GCF_000394995	GCF_000393315	GCF_000147255	GCF_000391085	GCF_000396905	GCF_000393395	GCF_000391145	GCF_000147555	GCF_000396485	GCF_000159255	GCF_000294045	GCF_000294325	GCF_000294225	GCF_000415185	GCF_000294025	GCF_000294005	GCF_000390645	GCF_000394455	GCF_000415205	GCF_000394395	GCF_000415025	GCF_000396385	GCF_000390805	GCF_000148265	GCF_000394275	GCF_000392975	GCF_000395985	GCF_000394375	GCF_000393075	GCF_000391565	G [...]
+s__Sulfolobus_tokodaii	1	GCF_000011205
+s__Spodoptera_exigua_iflavirus_1	1	PRJNA77135
+s__Mycobacterium_sp_MOTT36Y	1	GCF_000262165
+s__Geobacillus_sp_Y412MC52	1	GCF_000174795
+s__Peanut_stunt_virus	1	PRJNA15471
+s__Haemophilus_phage_SuMu	1	PRJNA181066
+s__Streptomyces_coelicoflavus	1	GCF_000241835
+s__Methylosinus_sp_LW4	1	GCF_000379125
+s__Albidiferax_ferrireducens	1	GCF_000013605
+s__gamma_proteobacterium_SCGC_AAA001_B15	1	GCF_000213375
+s__Thioalkalivibrio_sp_ALSr1	1	GCF_000381945
+s__Leek_yellow_stripe_virus	1	PRJNA15184
+s__Tobacco_leaf_curl_Pusa_virus	1	PRJNA56021
+s__Mycovirus_FusoV	1	PRJNA14829
+s__Corallococcus_coralloides	1	GCF_000255295
+s__Cucumber_vein_yellowing_virus	1	PRJNA15153
+s__Vicia_faba_endornavirus	1	PRJNA16237
+s__Bovine_adenovirus_D	1	PRJNA14486
+s__Bovine_adenovirus_E	1	PRJNA185272
+s__Bovine_adenovirus_B	2	PRJNA14515	PRJNA40311
+s__Orenia_marismortui	1	GCF_000379025
+s__Desulfovibrio_sp_A2	1	GCF_000226255
+s__Culex_flavivirus	1	PRJNA18303
+s__Ranid_herpesvirus_1	1	PRJNA17181
+s__Bacillus_phage_SPP1	1	PRJNA14586
+s__Sida_yellow_vein_Vietnam_virus_satellite_DNA_beta	1	PRJNA19825
+s__Ralstonia_sp_5_2_56FAA	1	GCF_000227255
+s__Bacillus_phage_BCD7	1	PRJNA181220
+s__Bettongia_penicillata_papillomavirus_1	1	PRJNA48601
+s__Spirosoma_luteum	1	GCF_000374065
+s__Denitrovibrio_acetiphilus	1	GCF_000025725
+s__Clostridium_bartlettii	1	GCF_000154445
+s__Synechococcus_sp_WH_8016	1	GCF_000230675
+s__Listeria_monocytogenes	53	GCF_000093125	GCF_000465735	GCF_000209755	GCF_000008285	GCF_000195395	GCF_000382925	GCF_000196035	GCF_000306905	GCF_000465815	GCF_000258905	GCF_000168615	GCF_000318055	GCF_000210795	GCF_000168495	GCF_000168655	GCF_000382945	GCF_000212455	GCF_000307615	GCF_000168475	GCF_000307045	GCF_000021185	GCF_000307005	GCF_000197755	GCF_000168395	GCF_000168635	GCF_000306985	GCF_000022925	GCF_000465755	GCF_000465775	GCF_000465795	GCF_000168555	GCF_000026705	GCF_000168415	G [...]
+s__Meyerozyma_guilliermondii	1	GCA_000149425
+s__Marinimicrobia_bacterium_SCGC_AAA076_M08	1	GCF_000402675
+s__Erwinia_tasmaniensis	1	GCF_000026185
+s__Okra_yellow_crinkle_virus	1	PRJNA17807
+s__Leeuwenhoekiella_blandensis	1	GCF_000152985
+s__Capnocytophaga_sp_oral_taxon_412	1	GCF_000271925
+s__Gluconacetobacter_xylinus	1	GCF_000182745
+s__Arthrobacter_sp_162MFSha1_1	1	GCF_000374905
+s__Broome_virus	1	PRJNA49651
+s__Pepper_leaf_curl_virus_satellite_DNA_beta	1	PRJNA28283
+s__Citrus_leaf_blotch_virus	1	PRJNA14825
+s__Novosphingobium_nitrogenifigens	1	GCF_000375445
+s__Thalassospira_lucentensis	1	GCF_000421265
+s__Erysipelotrichaceae_bacterium_6_1_45	1	GCF_000242175
+s__Clostridium_perfringens	11	GCF_000171135	GCF_000171215	GCF_000171195	GCF_000243175	GCF_000013285	GCF_000171175	GCF_000255475	GCF_000171155	GCF_000009685	GCF_000013845	GCF_000172455
+s__Burkholderia_phage_Bcep176	1	PRJNA16102
+s__Saccharomyces_cerevisiae_virus_L_A	1	PRJNA14792
+s__Cassava_vein_mosaic_virus	1	PRJNA14056
+s__Staphylococcus_phage_PT1028	1	PRJNA15262
+s__Iotapapillomavirus_1	1	PRJNA14022
+s__Bacillus_sp_WBUNB001	1	GCF_000309525
+s__Kordiimonas_gwangyangensis	1	GCF_000375545
+s__actinobacterium_SCGC_AAA024_D14	1	GCF_000372025
+s__Lactobacillus_gasseri	8	GCF_000155935	GCF_000406345	GCF_000177415	GCF_000014425	GCF_000439915	GCF_000175055	GCF_000283135	GCF_000143645
+s__Dehalobacter_sp_FTH1	1	GCF_000372005
+s__Olive_mild_mosaic_virus	1	PRJNA15159
+s__Treponema_pallidum	10	GCF_000246755	GCF_000410555	GCF_000008605	GCF_000246815	GCF_000246775	GCF_000387485	GCF_000304295	GCF_000246795	GCF_000024485	GCF_000410535
+s__Phaeocystis_globosa_virus	1	PRJNA206023
+s__Natrinema_versiforme	1	GCF_000337195
+s__Flavobacterium_psychrophilum	1	GCF_000064305
+s__Carnobacterium_sp_WN1359	1	GCF_000493735
+s__Caldicellulosiruptor_obsidiansis	1	GCF_000145215
+s__Tomato_curly_stunt_virus	1	PRJNA14267
+s__Cotton_leaf_curl_Gezira_beta	1	PRJNA20565
+s__Glaciibacter_superstes	1	GCF_000421145
+s__Pepper_leaf_curl_Yunnan_virus_YN323	1	PRJNA29413
+s__Panicum_streak_virus	1	PRJNA14076
+s__Streptomyces_sp_CNQ766	1	GCF_000377105
+s__Nariva_virus	1	PRJNA167112
+s__Mycobacterium_sp_JDM601	1	GCF_000214155
+s__Quail_picornavirus_QPV1_HUN_2010	1	PRJNA77133
+s__Lactobacillus_otakiensis	1	GCF_000415925
+s__Isoptericola_variabilis	1	GCF_000215105
+s__Podospora_anserina	1	GCA_000226545
+s__Rice_yellow_stunt_virus	1	PRJNA14793
+s__Amycolatopsis_orientalis	1	GCF_000400635
+s__Lily_symptomless_virus	1	PRJNA15015
+s__Oceanibulbus_indolifex	1	GCF_000172095
+s__Synechococcus_sp_PCC_7335	1	GCF_000155595
+s__Synechococcus_sp_PCC_7336	1	GCF_000332275
+s__Nitratireductor_indicus	1	GCF_000300515
+s__Prevotella_amnii	2	GCF_000177355	GCF_000378745
+s__Streptomyces_sp_DvalAA_83	1	GCF_000382745
+s__Cotton_leaf_curl_Allahabad_virus	1	PRJNA61851
+s__Bacteroides_vulgatus	4	GCF_000273295	GCF_000012825	GCF_000178195	GCF_000403235
+s__Staphylococcus_saprophyticus	2	GCF_000251125	GCF_000010125
+s__Turnip_ringspot_virus	1	PRJNA40327
+s__Marine_Group_III_euryarchaeote_SCGC_AAA007_O11	1	GCF_000372505
+s__Staphylococcus_phage_phi_12	1	PRJNA14247
+s__Aggregatibacter_aphrophilus	3	GCF_000231255	GCF_000226495	GCF_000022985
+s__Citricoccus_sp_CH26A	1	GCF_000224415
+s__Subterranean_clover_mottle_virus	1	PRJNA15403
+s__Bacteriovorax_sp_Seq25_V	1	GCF_000447795
+s__Ageratum_yellow_vein_China_betasatellite	1	PRJNA15515
+s__Prevotella_multiformis	1	GCF_000191065
+s__Halothermothrix_orenii	1	GCF_000020485
+s__Okra_leaf_curl_virus	1	PRJNA39605
+s__Salmonella_enterica	522	GCF_000414805	GCF_000272895	GCF_000329305	GCF_000505305	GCF_000329045	GCF_000231625	GCF_000486165	GCF_000487855	GCF_000487735	GCF_000189195	GCF_000231605	GCF_000272975	GCF_000487775	GCF_000189075	GCF_000486265	GCF_000231685	GCF_000272875	GCF_000486995	GCF_000231525	GCF_000484455	GCF_000494385	GCF_000329285	GCF_000483855	GCF_000484195	GCF_000272915	GCF_000231585	GCF_000272935	GCF_000020925	GCF_000335895	GCF_000271885	GCF_000258365	GCF_000484055	GCF_000500025	GCF [...]
+s__Zygocactus_virus_X	1	PRJNA14955
+s__Dracaena_mottle_virus	1	PRJNA16799
+s__Rice_tungro_spherical_virus	1	PRJNA15332
+s__Megamonas_rupellensis	1	GCF_000378365
+s__Streptococcus_ovis	1	GCF_000380125
+s__TGP_Carmovirus_1	1	PRJNA64491
+s__Delftia_sp_Cs1_4	1	GCF_000214395
+s__Nootka_lupine_vein_clearing_virus	1	PRJNA18853
+s__marine_actinobacterium_PHSC20C1	1	GCF_000153145
+s__Red_clover_cryptic_virus_1	1	PRJNA225924
+s__Acinetobacter_sp_P8_3_8	1	GCF_000214135
+s__Bacillus_phage_phi29	1	PRJNA30615
+s__Red_clover_cryptic_virus_2	1	PRJNA198686
+s__Pasteurella_multocida	21	GCF_000412105	GCF_000412075	GCF_000296345	GCF_000478235	GCF_000259545	GCF_000298675	GCF_000412015	GCF_000291645	GCF_000255915	GCF_000412035	GCF_000291625	GCF_000413135	GCF_000219335	GCF_000006825	GCF_000219315	GCF_000234745	GCF_000298655	GCF_000291605	GCF_000412125	GCF_000409915	GCF_000469095
+s__Butyrivibrio_crossotus	1	GCF_000156015
+s__Chilli_ringspot_virus	1	PRJNA73825
+s__Rhizobium_grahamii	1	GCF_000298315
+s__Cryptosporidium_parvum	1	GCA_000165345
+s__Lactococcus_phage_c2	1	PRJNA14029
+s__Achromobacter_xylosoxidans	3	GCF_000165835	GCF_000219745	GCF_000186185
+s__Apoi_virus	1	PRJNA15369
+s__Caulobacter_segnis	1	GCF_000092285
+s__Mycoplasma_arthritidis	1	GCF_000020065
+s__Empedobacter_brevis	1	GCF_000382425
+s__Potato_mop_top_virus	1	PRJNA14789
+s__Cronobacter_universalis	1	GCF_000319325
+s__Enterobacteria_phage_M	1	PRJNA183161
+s__Prevotella_sp_oral_taxon_317	1	GCF_000162415
+s__Candidatus_Carsonella_ruddii	2	GCF_000287275	GCF_000010365
+s__Thermodesulfobacterium_geofontis	1	GCF_000215975
+s__Streptomyces_phage_Lika	1	PRJNA206037
+s__Agromyces_italicus	1	GCF_000421545
+s__Streptococcus_phage_O1205	1	PRJNA14226
+s__Anguillid_herpesvirus_1	1	PRJNA42931
+s__Lachnospiraceae_oral_taxon_107	1	GCF_000209465
+s__Staphylococcus_phage_phiNM	1	PRJNA18293
+s__Acidilobus_saccharovorans	1	GCF_000144915
+s__Bdellovibrio_exovorus	1	GCF_000348725
+s__Pasteurella_dagmatis	1	GCF_000163475
+s__Flavobacterium_sp_SCGC_AAA160_P02	1	GCF_000383355
+s__Staphylococcus_intermedius	1	GCF_000308095
+s__Murine_adenovirus_A	2	PRJNA14519	PRJNA40319
+s__Staphylococcus_carnosus	1	GCF_000009405
+s__Murine_adenovirus_C	1	PRJNA37713
+s__Murine_adenovirus_B	1	PRJNA61855
+s__Geobacillus_caldoxylosilyticus	1	GCF_000313345
+s__Agarivorans_albus	1	GCF_000414175
+s__Lactobacillus_zeae	1	GCF_000260435
+s__Prevotella_micans	2	GCF_000243035	GCF_000373705
+s__Valsa_ceratosperma_hypovirus_1	1	PRJNA157807
+s__Slackia_sp_CM382	1	GCF_000293015
+s__Klebsiella_phage_K11	1	PRJNA62963
+s__Rickettsia_parkeri	1	GCF_000284195
+s__Pseudomonas_phage_201phi2_1	1	PRJNA30097
+s__Geobacillus_phage_GBSV1	1	PRJNA17775
+s__Acinetobacter_nectaris	1	GCF_000488215
+s__Aeromonas_veronii	6	GCF_000204115	GCF_000298035	GCF_000297995	GCF_000464515	GCF_000297975	GCF_000298015
+s__Candidatus_Solibacter_usitatus	1	GCF_000014905
+s__Acinetobacter_sp_WC_743	1	GCF_000335555
+s__Hippea_alviniae	1	GCF_000420385
+s__Rice_yellow_mottle_virus_satellite	1	PRJNA14152
+s__Vibrio_natriegens	1	GCF_000417905
+s__Thioalkalivibrio_sp_ALE28	1	GCF_000377425
+s__Nitrosococcus_oceani	2	GCF_000155655	GCF_000012805
+s__Bartonella_alsatica	1	GCF_000280015
+s__Thioalkalivibrio_sp_ALE20	1	GCF_000381405
+s__Thioalkalivibrio_sp_ALE23	1	GCF_000378545
+s__Thioalkalivibrio_sp_ALE22	1	GCF_000381445
+s__Thioalkalivibrio_sp_ALE25	1	GCF_000377285
+s__Thioalkalivibrio_sp_ALE27	1	GCF_000377485
+s__Streptococcus_canis	1	GCF_000268305
+s__Geitlerinema_sp_PCC_7407	1	GCF_000317045
+s__Enterobacteria_phage_cdtI	1	PRJNA19737
+s__Torque_teno_felis_virus	1	PRJNA48143
+s__Thermosediminibacter_oceani	1	GCF_000144645
+s__Cyclobacterium_qasimii	1	GCF_000427295
+s__Cronobacter_phage_vB_CsaM_GAP32	1	PRJNA179410
+s__Cronobacter_phage_vB_CsaM_GAP31	1	PRJNA179409
+s__Citrus_bark_cracking_viroid	1	PRJNA14757
+s__Geobacillus_sp_MAS1	1	GCF_000498995
+s__Sweet_potato_leaf_curl_Bengal_virus	1	PRJNA42745
+s__Human_papillomavirus_type_129	1	PRJNA62175
+s__Salmonella_phage_Jersey	1	PRJNA212713
+s__Raspberry_leaf_mottle_virus	1	PRJNA18275
+s__Clostridium_sp_ASF356	1	GCF_000364165
+s__Oscillatoria_formosa	1	GCF_000332155
+s__Culex_nigripalpus_nucleopolyhedrovirus	1	PRJNA14128
+s__Lactococcus_phage_TP901_1	1	PRJNA14116
+s__Tobacco_necrosis_virus_D	1	PRJNA14747
+s__Azorhizobium_caulinodans	1	GCF_000010525
+s__Tobacco_necrosis_virus_A	1	PRJNA15146
+s__Sporobolus_striate_mosaic_virus_2	1	PRJNA174776
+s__Sporobolus_striate_mosaic_virus_1	1	PRJNA174775
+s__Porcine_partetravirus	1	PRJNA215864
+s__Leuconostoc_lactis	2	GCF_000185085	GCF_000179875
+s__Bartonella_melophagi	1	GCF_000278255
+s__Acinetobacter_ursingii	4	GCF_000368825	GCF_000369885	GCF_000368845	GCF_000248135
+s__Thermoproteus_tenax_spherical_virus_1	1	PRJNA14540
+s__Mycobacterium_phage_Omega	1	PRJNA14273
+s__Amycolatopsis_nigrescens	1	GCF_000384315
+s__Catenovulum_agarivorans	1	GCF_000281085
+s__Hardenbergia_mosaic_virus	1	PRJNA65811
+s__Clostridium_phage_vB_CpeS_CP51	1	PRJNA206487
+s__Haloferax_denitrificans	1	GCF_000337795
+s__Carnation_mottle_virus	1	PRJNA14993
+s__Rheinheimera_nanhaiensis	1	GCF_000296695
+s__Canine_distemper_virus	1	PRJNA15002
+s__Capnocytophaga_sp_oral_taxon_336	1	GCF_000411575
+s__Pseudomonas_phage_phikF77	1	PRJNA36373
+s__Capnocytophaga_sp_oral_taxon_335	1	GCF_000277665
+s__Capnocytophaga_sp_oral_taxon_338	1	GCF_000192225
+s__Halovirus_HSTV_2	1	PRJNA186951
+s__Strawberry_latent_ringspot_virus	1	PRJNA15167
+s__Columbid_circovirus	1	PRJNA14437
+s__Tick_borne_encephalitis_virus	1	PRJNA15335
+s__Erwinia_phage_phiEt88	1	PRJNA64765
+s__Halorubrum_pleomorphic_virus_6	1	PRJNA157261
+s__Cryocola_sp_340MFSha3_1	1	GCF_000383315
+s__Tobacco_yellow_dwarf_virus	1	PRJNA14181
+s__Porphyromonas_gingivalis	12	GCF_000467995	GCF_000503975	GCF_000271945	GCF_000007585	GCF_000467955	GCF_000380305	GCF_000467815	GCF_000467835	GCF_000270225	GCF_000467795	GCF_000467975	GCF_000010505
+s__Tropheryma_whipplei	2	GCF_000196075	GCF_000007485
+s__Methanopyrus_kandleri	1	GCF_000007185
+s__Neisseria_shayeganii	1	GCF_000226875
+s__Sandarakinorhabdus_limnophila	1	GCF_000420765
+s__Streptococcus_sp_F0442	1	GCF_000314795
+s__Streptococcus_sp_F0441	1	GCF_000314775
+s__Magpie_robin_coronavirus_HKU18	1	PRJNA109275
+s__Pseudoalteromonas_sp_PAMC_22718	1	GCF_000263075
+s__Blattabacterium_sp_Blattella_germanica	1	GCF_000022605
+s__Micromonospora_sp_L5	1	GCF_000177655
+s__Lactobacillus_phage_phiAQ113	1	PRJNA188466
+s__Ryegrass_mosaic_virus	1	PRJNA15344
+s__Thioalkalivibrio_sp_ALE9	1	GCF_000377445
+s__Bovine_papillomavirus_7	1	PRJNA16202
+s__Bacillus_phage_AP50	1	PRJNA32599
+s__Nocardia_asteroides	1	GCF_000308355
+s__Streptococcus_iniae	3	GCF_000403625	GCF_000331915	GCF_000300915
+s__Malachra_yellow_vein_mosaic_virus_associated_satellite_DNA_beta	1	PRJNA28727
+s__Synechocystis_sp_PCC_7509	1	GCF_000332075
+s__Clostridium_sp_ATCC_BAA_442	1	GCF_000466445
+s__Haloarcula_argentinensis	1	GCF_000336895
+s__Staphylococcus_phage_55	1	PRJNA15276
+s__Haloarcula_sinaiiensis	1	GCF_000337275
+s__Streptomyces_scabiei	1	GCF_000091305
+s__Vibrio_phage_JA_1	1	PRJNA209075
+s__Flavobacterium_sp_F52	1	GCF_000278705
+s__Clostridiales_bacterium_BV3C26	1	GCF_000478985
+s__Acinetobacter_sp_NIPH_298	1	GCF_000369505
+s__Listeria_phage_A500	1	PRJNA20791
+s__Dysgonomonas_mossii	2	GCF_000376405	GCF_000213575
+s__Borrelia_turicatae	1	GCF_000012085
+s__Pseudomonas_psychrophila	1	GCF_000282975
+s__Methanobrevibacter_ruminantium	1	GCF_000024185
+s__Bacillus_pumilus	4	GCF_000299555	GCF_000017885	GCF_000225935	GCF_000172815
+s__Alteromonadales_bacterium_TW_7	1	GCF_000169055
+s__Treponema_pedis	1	GCF_000447675
+s__Human_erythrovirus_V9	1	PRJNA14224
+s__Corynebacterium_durum	1	GCF_000318135
+s__Oceanibaculum_indicum	1	GCF_000299935
+s__Microbacterium_maritypicum	1	GCF_000455825
+s__Tomato_ringspot_virus	1	PRJNA15300
+s__Parvibaculum_lavamentivorans	1	GCF_000017565
+s__Moumouvirus	1	PRJNA186430
+s__Burkholderia_phenoliruptrix	1	GCF_000300095
+s__Pariacoto_virus	1	PRJNA14785
+s__Mouse_parvovirus_5a	1	PRJNA33007
+s__Lactobacillus_phage_ATCC_8014_B1	1	PRJNA184150
+s__Moloney_murine_sarcoma_virus	1	PRJNA14721
+s__Naumovozyma_castellii	1	GCA_000237345
+s__Pennisetum_mosaic_virus	1	PRJNA15447
+s__Methylobacter_marinus	1	GCF_000383855
+s__Cupriavidus_sp_HPC_L	1	GCF_000307735
+s__Haloarcula_japonica	1	GCF_000336635
+s__Methanotorris_igneus	1	GCF_000214415
+s__Passionfruit_severe_leaf_distortion_virus	1	PRJNA38459
+s__Micromonas_pusilla_virus_12T	1	PRJNA195482
+s__Sida_mosaic_Bolivia_virus_2	1	PRJNA62475
+s__Sida_mosaic_Bolivia_virus_1	1	PRJNA62477
+s__Pseudomonad_phage_gh_1	1	PRJNA14265
+s__Yaba_monkey_tumor_virus	1	PRJNA14466
+s__Curtobacterium_sp_B8	1	GCF_000333315
+s__Human_papillomavirus_type_140	1	PRJNA167868
+s__Pyrococcus_yayanosii	1	GCF_000215995
+s__Human_papillomavirus_type_144	1	PRJNA167869
+s__Mycobacterium_phage_Chy4	1	PRJNA206477
+s__Prevotella_nigrescens	3	GCF_000220235	GCF_000507825	GCF_000336235
+s__Bougainvillea_spectabilis_chlorotic_vein_banding_virus	1	PRJNA32823
+s__Thermanaerovibrio_acidaminovorans	1	GCF_000024905
+s__Pseudomonas_sp_S9	1	GCF_000222125
+s__Sapporo_virus	3	PRJNA14952	PRJNA15040	PRJNA15048
+s__Agrobacterium_fabrum	1	GCF_000092025
+s__Lactobacillus_gigeriorum	1	GCF_000296855
+s__Neisseria_meningitidis	174	GCF_000392695	GCF_000328145	GCF_000386185	GCF_000387345	GCF_000191285	GCF_000367485	GCF_000191265	GCF_000386885	GCF_000387385	GCF_000293285	GCF_000386845	GCF_000413175	GCF_000448125	GCF_000293465	GCF_000386685	GCF_000386905	GCF_000448185	GCF_000293405	GCF_000328225	GCF_000448245	GCF_000448085	GCF_000327805	GCF_000327865	GCF_000327765	GCF_000327705	GCF_000386105	GCF_000327965	GCF_000392355	GCF_000327645	GCF_000293365	GCF_000387185	GCF_000327545	GCF_000191245	 [...]
+s__Bacillus_clausii	1	GCF_000009825
+s__Serinicoccus_profundi	1	GCF_000224715
+s__Kappapapillomavirus_2	1	PRJNA14075
+s__Salinibacterium_sp_PAMC_21357	1	GCF_000247645
+s__Haloarcula_phage_SH1	1	PRJNA15535
+s__Sputnik_virophage	1	PRJNA30929
+s__Pseudomonas_phage_vB_Pae_TbilisiM32	1	PRJNA167051
+s__Acinetobacter_sp_CIP_70_18	1	GCF_000369525
+s__Thiomicrospira_arctica	1	GCF_000381085
+s__Burkholderia_terrae	1	GCF_000265115
+s__Legionella_shakespearei	1	GCF_000373765
+s__Acidovorax_sp_JS42	1	GCF_000015545
+s__Avian_adeno_associated_virus_ATCC_VR_865	1	PRJNA14456
+s__Dichelobacter_nodosus	1	GCF_000015345
+s__Oat_mosaic_virus	1	PRJNA15391
+s__Capnocytophaga_sputigena	1	GCF_000173675
+s__Bean_common_mosaic_necrosis_virus	1	PRJNA15333
+s__Yersinia_rohdei	1	GCF_000173775
+s__Vibrio_gazogenes	1	GCF_000390165
+s__Proteiniphilum_acetatigenes	1	GCF_000380985
+s__Murine_pneumotropic_virus	1	PRJNA14071
+s__Chlorobaculum_parvum	1	GCF_000020505
+s__Clostridium_hylemonae	1	GCF_000156515
+s__Sulfitobacter_sp_EE_36	1	GCF_000152605
+s__Flavobacteria_bacterium_BAL38	1	GCF_000169355
+s__His_1_virus	1	PRJNA16650
+s__Stenotrophomonas_phage_IME15	1	PRJNA179429
+s__Red_clover_vein_mosaic_virus	1	PRJNA34841
+s__Kurthia_sp_Dielmo	1	GCF_000307285
+s__Halorubrum_saccharovorum	1	GCF_000337915
+s__Beggiatoa_alba	1	GCF_000245015
+s__Pseudorhodobacter_ferrugineus	1	GCF_000420745
+s__Desulfococcus_multivorans	1	GCF_000422185
+s__Bacillus_sp_CPSM8	1	GCF_000409505
+s__Colwellia_psychrerythraea	1	GCF_000012325
+s__Bacillus_sp_BT1B_CT2	1	GCF_000186125
+s__Alternanthera_yellow_vein_virus	1	PRJNA15560
+s__Streptomyces_albulus	1	GCF_000403765
+s__Beet_mild_curly_top_virus	1	PRJNA14282
+s__Vibrio_brasiliensis	1	GCF_000189255
+s__Paracoccus_zeaxanthinifaciens	1	GCF_000420145
+s__Methylotenera_sp_1P_1	1	GCF_000384355
+s__Bartonella_sp_OS02	1	GCF_000312545
+s__Actinomyces_vaccimaxillae	1	GCF_000420425
+s__Acinetobacter_sp_ADP1	1	GCF_000046845
+s__Mogibacterium_sp_CM50	1	GCF_000293155
+s__Barley_yellow_dwarf_virus_MAV	1	PRJNA14781
+s__Streptococcus_salivarius	7	GCF_000225385	GCF_000253335	GCF_000174715	GCF_000305335	GCF_000253315	GCF_000257585	GCF_000286295
+s__Wissadula_golden_mosaic_virus	1	PRJNA30167
+s__Oribacterium_sp_oral_taxon_078	2	GCF_000160135	GCF_000469565
+s__Staphylococcus_phage_phiSauS_IPLA35	1	PRJNA32997
+s__Acinetobacter_sp_CIP_102082	1	GCF_000368365
+s__Vibrio_phage_ICP1	1	PRJNA63229
+s__Tomato_leaf_curl_Iran_virus	1	PRJNA14474
+s__Vibrio_phage_ICP2	1	PRJNA63231
+s__Ustilago_maydis	1	GCA_000328475
+s__Lactobacillus_hominis	1	GCF_000296835
+s__Halyomorpha_halys_virus	1	PRJNA225920
+s__Cylindrospermopsis_raciborskii	1	GCF_000175835
+s__candidate_division_TM7_single_cell_isolate_TM7b	1	GCF_000170655
+s__candidate_division_TM7_single_cell_isolate_TM7c	1	GCF_000170675
+s__Streptococcus_infantis	5	GCF_000215385	GCF_000187465	GCF_000223335	GCF_000223255	GCF_000260755
+s__Cotton_leaf_curl_Multan_alphasatellite	1	PRJNA169228
+s__Ljungan_virus	1	PRJNA15401
+s__Methanoplanus_limicola	1	GCF_000243255
+s__Stx2_converting_phage_86	1	PRJNA17979
+s__Burkholderia_rhizoxinica	1	GCF_000198775
+s__Virgibacillus_sp_CM_4	1	GCF_000445495
+s__Chlamydia_pneumoniae	5	GCF_000007205	GCF_000008745	GCF_000024145	GCF_000091085	GCF_000011165
+s__Kyrpidia_tusciae	1	GCF_000092905
+s__Novosphingobium_sp_AP12	1	GCF_000281975
+s__Mycobacterium_phage_Reprobate	1	PRJNA215118
+s__Mycobacterium_phage_Astraea	1	PRJNA206480
+s__Streptococcus_sp_GMD1S	1	GCF_000296875
+s__Human_parechovirus	1	PRJNA15357
+s__Sulfolobus_turreted_icosahedral_virus	1	PRJNA14401
+s__Mycoplasma_parvum	1	GCF_000477415
+s__Porphyromonas_cansulci	1	GCF_000509265
+s__Gluconobacter_thailandicus	1	GCF_000344115
+s__Torque_teno_douroucouli_virus	1	PRJNA48173
+s__Desulfosporosinus_orientis	1	GCF_000235605
+s__Cereal_yellow_dwarf_virus_RPS	1	PRJNA14691
+s__Soil_borne_wheat_mosaic_virus	1	PRJNA14661
+s__Cereal_yellow_dwarf_virus_RPV	1	PRJNA14883
+s__Caldibacillus_debilis	1	GCF_000383875
+s__Geobacter_sulfurreducens	2	GCF_000210155	GCF_000007985
+s__Alstroemeria_virus_x	1	PRJNA15687
+s__Enterococcus_caccae	2	GCF_000394055	GCF_000407145
+s__Capnocytophaga_gingivalis	1	GCF_000174755
+s__Bean_leafroll_virus	1	PRJNA14734
+s__Sugarcane_streak_Reunion_virus	1	PRJNA14303
+s__Escherichia_phage_vB_EcoP_G7C	1	PRJNA72371
+s__Enterobacter_phage_EcP1	1	PRJNA181071
+s__Mycobacterium_phage_Fredward	1	PRJNA227008
+s__Pseudoalteromonas_citrea	1	GCF_000238375
+s__Halococcus_saccharolyticus	1	GCF_000336915
+s__Pasteurella_bettyae	1	GCF_000262245
+s__Streptococcus_phage_phiNJ2	1	PRJNA179424
+s__Methylobacterium_sp_285MFTsu5_1	1	GCF_000383455
+s__Pseudomonas_phage_LUZ19	1	PRJNA28741
+s__Calditerrivibrio_nitroreducens	1	GCF_000183405
+s__Acidaminococcus_sp_D21	1	GCF_000174215
+s__Mycoplasma_orale	1	GCF_000420105
+s__Streptomyces_sp_CNY243	1	GCF_000377165
+s__Aggregatibacter_actinomycetemcomitans	20	GCF_000241025	GCF_000241005	GCF_000332915	GCF_000226715	GCF_000259915	GCF_000226795	GCF_000318155	GCF_000226835	GCF_000146265	GCF_000226755	GCF_000372365	GCF_000332895	GCF_000226775	GCF_000226855	GCF_000226735	GCF_000226815	GCF_000163615	GCF_000240985	GCF_000332935	GCF_000332955
+s__Macroptilium_yellow_spot_virus	1	PRJNA124059
+s__Streptomyces_sp_PVA_94_07	1	GCF_000495755
+s__Psychrobacter_sp_PRwf_1	1	GCF_000016885
+s__Morelia_spilota_papillomavirus_1	1	PRJNA73439
+s__Cronobacter_dublinensis	2	GCF_000319495	GCF_000319345
+s__Pipistrellus_bat_coronavirus_HKU5	1	PRJNA18865
+s__Vibrio_phage_KVP40	1	PRJNA14416
+s__Soybean_crinkle_leaf_virus	1	PRJNA14149
+s__Lactococcus_phage_BK5_T	1	PRJNA15244
+s__Helicobacter_pullorum	1	GCF_000155495
+s__Cedecea_davisae	1	GCF_000412335
+s__Marine_Group_I_thaumarchaeote_SCGC_AB_629_I23	1	GCF_000399765
+s__Listeria_welshimeri	1	GCF_000060285
+s__Peanut_stunt_virus_satellite_RNA	1	PRJNA14502
+s__Tomato_bushy_stunt_virus	1	PRJNA15147
+s__Acheta_domesticus_mini_ambidensovirus	1	PRJNA223005
+s__Nocardiopsis_alkaliphila	1	GCF_000341005
+s__Cestrum_yellow_leaf_curling_virus	1	PRJNA14470
+s__Thermoanaerobacter_thermohydrosulfuricus	1	GCF_000353265
+s__Pseudoalteromonas_sp_SM9913	1	GCF_000184065
+s__Human_coronavirus_HKU1	1	PRJNA15139
+s__Glaciecola_sp_HTCC2999	1	GCF_000155775
+s__Cowpea_mottle_virus	1	PRJNA14755
+s__Escherichia_phage_N4	1	PRJNA18511
+s__Campylobacter_hominis	1	GCF_000017585
+s__Gentian_Kobu_sho_associated_virus	1	PRJNA189210
+s__Providencia_rustigianii	1	GCF_000156395
+s__Enterobacteria_phage_EPS7	1	PRJNA29287
+s__Lactobacillus_sp_ASF360	1	GCF_000364185
+s__Malvastrum_yellow_mosaic_virus_satellite_DNA_beta	1	PRJNA18133
+s__Francisella_noatunensis	1	GCF_000262205
+s__Helicobacter_phage_phiHP33	1	PRJNA80923
+s__Malvastrum_yellow_vein_virus	1	PRJNA14252
+s__Clostridium_beijerinckii	2	GCF_000016965	GCF_000280535
+s__Amphibacillus_jilinensis	1	GCF_000306965
+s__Gryllus_bimaculatus_nudivirus	1	PRJNA19181
+s__Mycobacterium_phage_BTCU_1	1	PRJNA209077
+s__Neorickettsia_risticii	1	GCF_000022525
+s__Sulfuricella_denitrificans	1	GCF_000297055
+s__Gracilibacillus_halophilus	1	GCF_000359605
+s__Clostridium_phage_phiCP26F	1	PRJNA181251
+s__Pirital_virus	1	PRJNA14919
+s__Treponema_brennaborense	1	GCF_000212415
+s__Fig_badnavirus_1	1	PRJNA162487
+s__Brevicoryne_brassicae_picorna_like_virus	1	PRJNA19753
+s__Myxoma_virus	1	PRJNA14396
+s__Rubrivivax_gelatinosus	1	GCF_000284255
+s__Cotton_leaf_crumple_virus	1	PRJNA14541
+s__Tomato_yellow_leaf_curl_Yunnan_virus	1	PRJNA206466
+s__Canine_picornavirus	1	PRJNA89397
+s__actinobacterium_SCGC_AAA027_J17	1	GCF_000383815
+s__Leptospira_biflexa	2	GCF_000017685	GCF_000017605
+s__Neisseria_sp_GT4A_CT1	1	GCF_000227275
+s__Clostridium_sp_M62_1	1	GCF_000159055
+s__Acinetobacter_sp_NIPH_899	1	GCF_000368385
+s__Actinomyces_naeslundii	1	GCF_000285995
+s__Bacteroides_pyogenes	1	GCF_000466505
+s__Hyphomicrobium_nitrativorans	1	GCF_000503895
+s__Coprococcus_sp_ART55_1	1	GCF_000210595
+s__Propionibacterium_propionicum	1	GCF_000277715
+s__Coprococcus_sp_HPP0074	1	GCF_000411335
+s__Haloarcula_hispanica_pleomorphic_virus_1	1	PRJNA43589
+s__Cynomolgus_macaque_cytomegalovirus_strain_Ottawa	1	PRJNA76697
+s__Flavobacterium_frigoris	1	GCF_000252125
+s__Yaniella_halotolerans	1	GCF_000420805
+s__Spiribacter_sp_UAH_SP71	1	GCF_000485905
+s__Burkholderia_phage_Bcep1	1	PRJNA14409
+s__Tomato_leaf_curl_New_Delhi_virus	1	PRJNA14243
+s__Bacillus_phage_Gamma	1	PRJNA15783
+s__Bacteroides_coprosuis	1	GCF_000212915
+s__Staphylococcus_sp_E463	1	GCF_000316945
+s__Sweet_clover_necrotic_mosaic_virus	1	PRJNA14809
+s__Archaeoglobus_profundus	1	GCF_000025285
+s__Rhodanobacter_spathiphylli	1	GCF_000264295
+s__Sulfolobus_spindle_shaped_virus_6	1	PRJNA42355
+s__Dicliptera_yellow_mottle_virus	1	PRJNA14185
+s__Mupapillomavirus_2	1	PRJNA15486
+s__Mupapillomavirus_1	1	PRJNA15491
+s__Human_coronavirus_NL63	1	PRJNA14960
+s__Vibrio_phage_VPMS1	1	PRJNA212709
+s__Alicyclobacillus_pohliae	1	GCF_000376225
+s__Staphylococcus_simulans	2	GCF_000477455	GCF_000314755
+s__Honeysuckle_yellow_vein_betasatellite	1	PRJNA14620
+s__Blastomonas_sp_AAP53	1	GCF_000331245
+s__Thiomicrospira_halophila	1	GCF_000384235
+s__Streptomyces_turgidiscabies	1	GCF_000331005
+s__Lachnospiraceae_bacterium_8_1_57FAA	1	GCF_000185545
+s__Actinomyces_graevenitzii	2	GCF_000239695	GCF_000466185
+s__TTV_like_mini_virus	1	PRJNA193982
+s__Human_papillomavirus	1	PRJNA215652
+s__Equine_arteritis_virus	1	PRJNA15383
+s__Rhodococcus_jostii	1	GCF_000014565
+s__Synechococcus_phage_S_RIM2	1	PRJNA195488
+s__Verminephrobacter_aporrectodeae	1	GCF_000193225
+s__Streptococcus_dysgalactiae	8	GCF_000493775	GCF_000188715	GCF_000214575	GCF_000188315	GCF_000010705	GCF_000317855	GCF_000221105	GCF_000307185
+s__Enterococcus_casseliflavus	7	GCF_000191365	GCF_000393915	GCF_000407405	GCF_000157215	GCF_000273565	GCF_000157295	GCF_000414945
+s__Synechococcus_phage_S_RIM8	1	PRJNA192853
+s__Thauera_sp_63	1	GCF_000310165
+s__Duck_adenovirus_A	2	PRJNA14520	PRJNA40313
+s__Rhizobium_sp_JGI_0001005_K05	1	GCF_000376185
+s__Psychrobacter_lutiphocae	1	GCF_000382145
+s__Coprothermobacter_proteolyticus	1	GCF_000020945
+s__Pantoea_sp_SL1_M5	1	GCF_000220605
+s__Cyanophage_PSS2	1	PRJNA39613
+s__Southern_rice_black_streaked_dwarf_virus	1	PRJNA60383
+s__Bidens_mottle_virus	1	PRJNA50559
+s__Arhodomonas_aquaeolei	1	GCF_000374645
+s__Lactococcus_phage_jm2	1	PRJNA213074
+s__Lactococcus_phage_jm3	1	PRJNA213075
+s__Pediococcus_pentosaceus	2	GCF_000285875	GCF_000014505
+s__Sida_yellow_blotch_virus	1	PRJNA189216
+s__Pseudomonas_sp_GM24	1	GCF_000282235
+s__Pseudomonas_sp_GM25	1	GCF_000282255
+s__Pseudomonas_sp_GM21	1	GCF_000282215
+s__Human_immunodeficiency_virus_2	1	PRJNA14991
+s__Human_immunodeficiency_virus_1	1	PRJNA15476
+s__Desulfitobacterium_hafniense	5	GCF_000379505	GCF_000378805	GCF_000010045	GCF_000021925	GCF_000238035
+s__Eggplant_mosaic_virus	1	PRJNA14639
+s__Marseillevirus	1	PRJNA43573
+s__Propionibacterium_phage_PHL111M01	1	PRJNA219111
+s__Staphylococcus_phage_77	1	PRJNA14352
+s__Homalodisca_coagulata_virus_1	1	PRJNA16797
+s__Clostridium_celatum	1	GCF_000320405
+s__Cassava_virus_C	1	PRJNA39977
+s__Bebaru_virus	1	PRJNA88121
+s__Encephalitozoon_cuniculi	1	GCA_000091225
+s__Vibrio_cyclitrophicus	21	GCF_000256425	GCF_000256135	GCF_000256165	GCF_000256265	GCF_000256345	GCF_000256285	GCF_000256115	GCF_000256185	GCF_000256405	GCF_000247005	GCF_000256305	GCF_000256095	GCF_000256205	GCF_000256605	GCF_000473545	GCF_000256445	GCF_000256245	GCF_000256325	GCF_000256365	GCF_000256465	GCF_000256385
+s__Streptococcus_phage_Sfi11	1	PRJNA14054
+s__Diuris_virus_A	1	PRJNA178591
+s__Enterobacteria_phage_lambda	1	PRJNA14204
+s__Streptococcus_phage_Sfi19	1	PRJNA14045
+s__Candida_albicans	1	GCA_000182965
+s__Arthrobacter_sp_PAO19	1	GCF_000414345
+s__Blastococcus_saxobsidens	1	GCF_000284015
+s__Staphylococcus_aureus	436	GCF_000361605	GCF_000363125	GCF_000360545	GCF_000330825	GCF_000361665	GCF_000362745	GCF_000260015	GCF_000360525	GCF_000175475	GCF_000361965	GCF_000361905	GCF_000362185	GCF_000262955	GCF_000215405	GCF_000360245	GCF_000359785	GCF_000360485	GCF_000361765	GCF_000239615	GCF_000248935	GCF_000363225	GCF_000360425	GCF_000362205	GCF_000361845	GCF_000360305	GCF_000276625	GCF_000507725	GCF_000239655	GCF_000364085	GCF_000360085	GCF_000248535	GCF_000361365	GCF_000248675	G [...]
+s__Thermogladius_cellulolyticus	1	GCF_000264495
+s__Spirosoma_spitsbergense	1	GCF_000374085
+s__Andean_potato_mild_mosaic_virus	1	PRJNA192605
+s__Aspergillus_niger	1	GCA_000002855
+s__Algoriphagus_machipongonensis	1	GCF_000166275
+s__Haliscomenobacter_hydrossis	1	GCF_000212735
+s__Clostridium_phage_phi8074_B1	1	PRJNA184148
+s__Geodermatophilus_obscurus	1	GCF_000025345
+s__Mycobacterium_hassiacum	2	GCF_000300375	GCF_000379865
+s__Invertebrate_iridovirus_22	1	PRJNA213479
+s__actinobacterium_SCGC_AAA023_D18	1	GCF_000378925
+s__Alistipes_sp_HGB5	1	GCF_000183485
+s__Lymantria_dispar_multiple_nucleopolyhedrovirus	1	PRJNA14390
+s__Enterococcus_sp_GMD4E	1	GCF_000296935
+s__Sweet_potato_golden_vein_associated_virus	1	PRJNA65275
+s__Anaeroglobus_geminatus	1	GCF_000239275
+s__Vibrio_phage_henriette_12B8	1	PRJNA198435
+s__Shewanella_sediminis	1	GCF_000018025
+s__Enterobacteria_phage_St_1	1	PRJNA38669
+s__Vibrio_phage_VP882	1	PRJNA18851
+s__Burkholderia_sp_TJI49	1	GCF_000191945
+s__Haloterrigena_turkmenica	1	GCF_000025325
+s__Enterobacteria_phage_IME10	1	PRJNA181235
+s__Triatoma_virus	1	PRJNA14802
+s__Salmonella_phage_FelixO1	1	PRJNA14323
+s__Corynebacterium_sp_KPL1821	1	GCF_000478115
+s__Abalone_shriveling_syndrome_associated_virus	1	PRJNA33141
+s__Rice_ragged_stunt_virus	1	PRJNA14794
+s__Mimosa_yellow_leaf_curl_virus	1	PRJNA19781
+s__Luminiphilus_syltensis	1	GCF_000158175
+s__Vibrio_phage_kappa	1	PRJNA28503
+s__Pseudomonas_avellanae	1	GCF_000302915
+s__Vibrio_orientalis	2	GCF_000222645	GCF_000176235
+s__Brachybacterium_paraconglomeratum	1	GCF_000233655
+s__Listeria_seeligeri	2	GCF_000183965	GCF_000027145
+s__Eyach_virus	1	PRJNA14786
+s__Geobacillus_thermoglucosidasius	2	GCF_000178395	GCF_000258725
+s__Rickettsia_endosymbiont_of_Ixodes_scapularis	1	GCF_000160735
+s__Salinivibrio_phage_CW02	1	PRJNA181992
+s__Rhizobium_sp_Pop5	1	GCF_000295895
+s__Streptococcus_infantarius	2	GCF_000154985	GCF_000246835
+s__Small_anellovirus	2	PRJNA15252	PRJNA15253
+s__Kushneria_aurantia	1	GCF_000382245
+s__Citrus_leprosis_virus_C	1	PRJNA17095
+s__Mycobacterium_phage_Predator	1	PRJNA30611
+s__Synechococcus_phage_Syn5	1	PRJNA19763
+s__Listonella_phage_phiHSIC	1	PRJNA15173
+s__Escherichia_phage_bV_EcoS_AKFV33	1	PRJNA167572
+s__Novosphingobium_lindaniclasticum	1	GCF_000445125
+s__Lactobacillus_farciminis	1	GCF_000184535
+s__Salmonella_phage_SPN9CC	1	PRJNA167665
+s__Epizootic_hemorrhagic_disease_virus	1	PRJNA41081
+s__Liberibacter_crescens	1	GCF_000325745
+s__Brugmansia_mosaic_virus	1	PRJNA186429
+s__Helicobacter_canis	1	GCF_000507865
+s__Prochlorococcus_phage_P_SSM3	1	PRJNA209210
+s__Prochlorococcus_phage_P_SSM2	1	PRJNA15135
+s__Prochlorococcus_phage_P_SSM4	1	PRJNA15136
+s__Prochlorococcus_phage_P_SSM7	1	PRJNA64717
+s__Bacteroides_thetaiotaomicron	2	GCF_000403155	GCF_000011065
+s__Rickettsia_australis	1	GCF_000284155
+s__Arthrobacter_chlorophenolicus	1	GCF_000022025
+s__Bifidobacterium_adolescentis	2	GCF_000154085	GCF_000010425
+s__Cowpea_severe_leaf_curl_associated_DNA_beta	1	PRJNA15157
+s__Treponema_paraluiscuniculi	1	GCF_000217655
+s__Pseudomonas_phage_Bf7	1	PRJNA82647
+s__Avian_myelocytomatosis_virus	1	PRJNA14909
+s__Enterobacteria_phage_YYZ_2008	1	PRJNA32231
+s__Caldicellulosiruptor_bescii	1	GCF_000022325
+s__Bean_necrotic_mosaic_virus	2	PRJNA168523	PRJNA168596
+s__Lucerne_transient_streak_virus_satellite_RNA	1	PRJNA14501
+s__Treponema_primitia	2	GCF_000214375	GCF_000297095
+s__Myceliophthora_thermophila	1	GCA_000226095
+s__Ureibacillus_thermosphaericus	1	GCF_000284835
+s__Eubacteriaceae_bacterium_ACC19a	1	GCF_000238115
+s__Gluconacetobacter_oboediens	1	GCF_000227565
+s__Burkholderia_phage_KL3	1	PRJNA64565
+s__Pseudomonas_phage_PaP2	1	PRJNA14377
+s__Pseudomonas_phage_PaP3	1	PRJNA14322
+s__Pseudomonas_phage_PaP1	1	PRJNA184153
+s__Hordeum_mosaic_virus	1	PRJNA15064
+s__Sutterella_wadsworthensis	3	GCF_000297775	GCF_000411515	GCF_000186505
+s__Spirochaeta_africana	1	GCF_000242595
+s__Synechococcus_sp_CB0205	1	GCF_000179255
+s__Tomato_dwarf_leaf_virus	1	PRJNA81031
+s__Pasteurella_phage_F108	1	PRJNA17113
+s__Feline_calicivirus	1	PRJNA14877
+s__Brucella_pinnipedialis	4	GCF_000158675	GCF_000157815	GCF_000221005	GCF_000157795
+s__Bradyrhizobium_sp	1	GCF_000239795
+s__Methyloversatilis_sp_NVD	1	GCF_000372885
+s__Erwinia_amylovora	11	GCF_000367665	GCF_000367545	GCF_000027205	GCF_000240705	GCF_000367605	GCF_000367585	GCF_000367645	GCF_000367625	GCF_000091565	GCF_000367685	GCF_000367565
+s__Highlands_J_virus	1	PRJNA37281
+s__Geobacter_uraniireducens	1	GCF_000016745
+s__Leptospira_sp_Fiocruz_LV4135	1	GCF_000346675
+s__Salipiger_mucosus	1	GCF_000442255
+s__Shewanella_sp_HN_41	1	GCF_000217915
+s__Fusobacterium_necrophorum	4	GCF_000292975	GCF_000158295	GCF_000242215	GCF_000262225
+s__Escherichia_Stx1_converting_phage	1	PRJNA14293
+s__Gallid_herpesvirus_3	1	PRJNA14103
+s__Escherichia_sp_TW10509	1	GCF_000208545
+s__Gallid_herpesvirus_1	1	PRJNA14566
+s__Blueberry_necrotic_ring_blotch_virus	1	PRJNA74579
+s__Streptomyces_ipomoeae	1	GCF_000317595
+s__Acinetobacter_sp_NIPH_284	1	GCF_000369425
+s__Mycoplasma_pulmonis	1	GCF_000195875
+s__Halomonas_anticariensis	1	GCF_000409775
+s__Halorhabdus_tiamatea	1	GCF_000215915
+s__Porcine_parvovirus_4	1	PRJNA60137
+s__Desulfurivibrio_alkaliphilus	1	GCF_000092205
+s__Gyrovirus_4	1	PRJNA172459
+s__Rhodococcus_sp_P14	1	GCF_000256505
+s__Pan_troglodytes_schweinfurthii_polyomavirus_2	1	PRJNA183905
+s__Yam_mosaic_virus	1	PRJNA14884
+s__Johnsongrass_chlorotic_stripe_mosaic_virus	1	PRJNA14904
+s__Butyrivibrio_sp_AD3002	1	GCF_000420905
+s__Neisseria_sicca	4	GCF_000193735	GCF_000174655	GCF_000193755	GCF_000260655
+s__Prevotella_sp_F0091	1	GCF_000467895
+s__Salmonella_phage_ST64B	1	PRJNA14228
+s__Clostridium_sp_SY8519	1	GCF_000270305
+s__Lachnobacterium_bovis	1	GCF_000421025
+s__Gordonia_araii	1	GCF_000241265
+s__Turnip_yellows_virus	1	PRJNA15072
+s__Sulfolobus_solfataricus	3	GCF_000024745	GCF_000175555	GCF_000007005
+s__Natrialba_asiatica	1	GCF_000337555
+s__Mycobacterium_phage_Solon	1	PRJNA31287
+s__Ruminococcus_albus	2	GCF_000178155	GCF_000179635
+s__East_African_cassava_mosaic_Kenya_virus	1	PRJNA32329
+s__Perlucidibaca_piscinae	1	GCF_000420045
+s__Enterobacteria_phage_I2_2	1	PRJNA14572
+s__Ovine_lentivirus	1	PRJNA14668
+s__Leptotrichia_sp_oral_taxon_225	1	GCF_000469525
+s__Novosphingobium_sp_B_7	1	GCF_000410615
+s__Bavariicoccus_seileri	1	GCF_000421665
+s__Abalone_herpesvirus_Victoria_AUS_2009	1	PRJNA177933
+s__Bacteroides_sp_4_3_47FAA	1	GCF_000158515
+s__Fervidobacterium_nodosum	1	GCF_000017545
+s__Facklamia_hominis	2	GCF_000301035	GCF_000413455
+s__Aspergillus_oryzae	1	GCA_000184455
+s__Gordonia_rhizosphera	1	GCF_000298195
+s__Pichinde_virus	1	PRJNA15008
+s__Sphingopyxis_sp_MC1	1	GCF_000371385
+s__Asticcacaulis_excentricus	1	GCF_000175215
+s__Golden_Gate_virus	1	PRJNA173354
+s__Rhodobacter_phage_RC1	1	PRJNA195479
+s__Sphingobacterium_sp_IITKGP_BTPF85	1	GCF_000447275
+s__Fusarium_graminearum_dsRNA_mycovirus_1	1	PRJNA15154
+s__Mucispirillum_schaedleri	1	GCF_000487995
+s__Pseudomonas_phage_PAJU2	1	PRJNA32249
+s__Capraria_yellow_spot_Yucatan_virus	1	PRJNA214689
+s__Wheat_dwarf_virus	2	PRJNA15478	PRJNA30035
+s__Clostridium_phage_phi3626	1	PRJNA14166
+s__Human_herpesvirus_6A	1	PRJNA14462
+s__Cellulophaga_phage_phi18_1	1	PRJNA212959
+s__Spiroplasma_phage_4	1	PRJNA14161
+s__Cellulophaga_phage_phi18_3	1	PRJNA212960
+s__Thogoto_virus	1	PRJNA15043
+s__Duganella_zoogloeoides	1	GCF_000383895
+s__Erysipelotrichaceae_bacterium_21_3	1	GCF_000242195
+s__Mycobacterium_phage_AnnaL29	1	PRJNA215671
+s__Lindernia_anagallis_yellow_vein_virus_satellite_DNA_beta	1	PRJNA19831
+s__Bat_coronavirus_1A	1	PRJNA29247
+s__Staphylococcus_phage_phi7401PVL	1	PRJNA188545
+s__Mumps_virus	1	PRJNA15059
+s__Alloprevotella_tannerae	1	GCF_000159995
+s__Cellulomonas_flavigena	1	GCF_000092865
+s__Thermoanaerobacter_sp_X513	1	GCF_000148425
+s__Burkholderia_phage_phiE255	1	PRJNA19165
+s__Cocksfoot_streak_virus	1	PRJNA15399
+s__Thermoanaerobacter_sp_X514	1	GCF_000019065
+s__Geobacter_lovleyi	1	GCF_000020385
+s__Kiloniella_laminariae	1	GCF_000374005
+s__Enterovirus_G	1	PRJNA15396
+s__Arthrobacter_gangotriensis	1	GCF_000348945
+s__Stretch_Lagoon_orbivirus	1	PRJNA39971
+s__Enterobacter_sp_SST3	1	GCF_000286655
+s__Millerozyma_farinosa	1	GCA_000315895
+s__Sulfurimonas_autotrophica	1	GCF_000147355
+s__Coleus_blumei_viroid_3	1	PRJNA14784
+s__Nitrococcus_mobilis	1	GCF_000153205
+s__Coleus_blumei_viroid_1	1	PRJNA14782
+s__Coleus_blumei_viroid_6	1	PRJNA38541
+s__Coleus_blumei_viroid_5	1	PRJNA34737
+s__Streptomyces_phage_SV1	1	PRJNA177523
+s__Leptospira_sp_Fiocruz_LV3954	1	GCF_000306435
+s__Desulfomicrobium_baculatum	1	GCF_000023225
+s__Streptococcus_entericus	1	GCF_000380025
+s__Finch_polyomavirus	1	PRJNA16655
+s__Avian_leukosis_virus	1	PRJNA14633
+s__Pseudomonas_geniculata	1	GCF_000258575
+s__Pirellula_staleyi	1	GCF_000025185
+s__Tomato_leaf_curl_Seychelles_virus	1	PRJNA18869
+s__Bartonella_bacilliformis	2	GCF_000015445	GCF_000311905
+s__Cucumber_mosaic_virus	1	PRJNA15470
+s__Macacine_herpesvirus_3	1	PRJNA14468
+s__Macacine_herpesvirus_1	1	PRJNA14489
+s__Pseudomonas_sp_G5_2012	1	GCF_000408945
+s__Turkey_adenovirus_B	1	PRJNA53557
+s__Macacine_herpesvirus_5	1	PRJNA14423
+s__Macacine_herpesvirus_4	1	PRJNA14467
+s__Peach_mosaic_virus	1	PRJNA32727
+s__Lactobacillus_phage_Lb338_1	1	PRJNA36611
+s__Thermus_phage_TMA	1	PRJNA72385
+s__Enterobacteria_phage_RB43	1	PRJNA15417
+s__Enterobacteria_phage_RB49	1	PRJNA14301
+s__Acinetobacter_sp_CIP_56_2	1	GCF_000368445
+s__Mycobacterium_sp_360MFTsu5_1	1	GCF_000383495
+s__Mycoplasma_suis	2	GCF_000203215	GCF_000179035
+s__Ochrobactrum_anthropi	2	GCF_000251205	GCF_000017405
+s__Parabacteroides_sp_D25	1	GCF_000307475
+s__Aequorivita_sublithincola	1	GCF_000265385
+s__Prosthecochloris_aestuarii	1	GCF_000020625
+s__marine_gamma_proteobacterium_HTCC2148	1	GCF_000156295
+s__Lettuce_big_vein_associated_virus	1	PRJNA32725
+s__marine_gamma_proteobacterium_HTCC2143	1	GCF_000169075
+s__Microbacterium_sp_UCD_TDU	1	GCF_000340625
+s__Sphingobium_chinhatense	1	GCF_000421925
+s__alpha_proteobacterium_HIMB114	1	GCF_000163555
+s__Pseudomonas_phage_JBD88a	1	PRJNA188544
+s__Pseudomonas_entomophila	1	GCF_000026105
+s__Echinicola_pacifica	1	GCF_000373245
+s__Oceanicola_batsensis	1	GCF_000152725
+s__Bitter_gourd_leaf_curl_betasatellite	1	PRJNA16245
+s__Mycoplasma_hyopneumoniae	5	GCF_000427215	GCF_000008225	GCF_000400855	GCF_000008405	GCF_000008205
+s__Ruegeria_sp_TM1040	1	GCF_000014065
+s__Acidovorax_sp_MR_S7	1	GCF_000400995
+s__Burkholderia_sp_WSM4176	1	GCF_000372945
+s__Emiliania_huxleyi_virus_86	1	PRJNA15618
+s__Wheat_streak_mosaic_virus	1	PRJNA15354
+s__Soybean_yellow_common_mosaic_virus	1	PRJNA73551
+s__Pseudomonas_poae	1	GCF_000336465
+s__Marine_Group_III_euryarchaeote_SCGC_AAA288_E19	1	GCF_000382725
+s__Enterobacteria_phage_Bp7	1	PRJNA181212
+s__Coraliomargarita_akajimensis	1	GCF_000025905
+s__Afipia_felis	1	GCF_000314735
+s__Siegesbeckia_yellow_vein_Guangxi_virus	1	PRJNA17595
+s__Thermaerobacter_marianensis	1	GCF_000184705
+s__Midway_virus	1	PRJNA38097
+s__Streptococcus_lutetiensis	1	GCF_000441535
+s__Corynebacterium_lubricantis	1	GCF_000379425
+s__Megasphaera_sp_BL7	1	GCF_000417525
+s__Mycobacterium_phage_Goku	1	PRJNA215672
+s__Olsenella_uli	1	GCF_000143845
+s__Oligotropha_carboxidovorans	3	GCF_000021365	GCF_000218585	GCF_000218565
+s__Sweet_potato_leaf_curl_South_Carolina_virus	1	PRJNA65201
+s__Pseudomonas_fulva	1	GCF_000213805
+s__Ophiostoma_mitovirus_4	1	PRJNA14842
+s__Chlamydophila_sp_08_1274_3	1	GCF_000471025
+s__Corynebacterium_pseudotuberculosis	15	GCF_000241855	GCF_000227175	GCF_000259155	GCF_000143705	GCF_000144935	GCF_000255935	GCF_000152065	GCF_000265545	GCF_000144675	GCF_000233735	GCF_000227605	GCF_000221625	GCF_000263755	GCF_000258385	GCF_000248375
+s__Lactobacillus_pentosus	1	GCF_000271445
+s__Methylobacillus_flagellatus	1	GCF_000013705
+s__Enterobacteria_phage_vB_EcoM_ACG_C40	1	PRJNA179415
+s__Enterococcus_haemoperoxidus	2	GCF_000393995	GCF_000407165
+s__Uukuniemi_virus	1	PRJNA14902
+s__Corynebacterium_matruchotii	2	GCF_000175375	GCF_000158635
+s__Pelosinus_sp_HCF1	1	GCF_000317005
+s__Saccharomonospora_xinjiangensis	1	GCF_000258175
+s__Chroococcidiopsis_thermalis	1	GCF_000317125
+s__Ngaingan_virus	1	PRJNA46715
+s__Streptococcus_sobrinus	46	GCF_000228445	GCF_000228485	GCF_000227825	GCF_000228365	GCF_000228225	GCF_000227965	GCF_000228125	GCF_000228425	GCF_000228165	GCF_000228345	GCF_000228385	GCF_000227885	GCF_000228245	GCF_000227985	GCF_000228305	GCF_000228545	GCF_000228625	GCF_000228025	GCF_000227865	GCF_000227905	GCF_000228505	GCF_000228325	GCF_000228405	GCF_000228645	GCF_000227785	GCF_000228605	GCF_000228205	GCF_000227845	GCF_000228085	GCF_000228465	GCF_000228145	GCF_000467915	GCF_000228665	G [...]
+s__Hop_trefoil_cryptic_virus_2	1	PRJNA198687
+s__Chilli_veinal_mottle_virus	1	PRJNA15225
+s__Tolypocladium_cylindrosporum_virus_1	1	PRJNA61451
+s__Raspberry_latent_virus	1	PRJNA56055
+s__Micromonospora_sp_ATCC_39149	1	GCF_000158815
+s__Desulfovibrio_sp_FW1012B	1	GCF_000177215
+s__Brucella_sp_UK40_99	1	GCF_000371065
+s__Equine_infectious_anemia_virus	1	PRJNA14684
+s__Apple_fruit_crinkle_viroid	1	PRJNA14964
+s__Telosma_mosaic_virus	1	PRJNA20621
+s__Megasphaera_elsdenii	1	GCF_000283495
+s__Mirafiori_lettuce_big_vein_virus	1	PRJNA14886
+s__Leishmania_RNA_virus_1_1	1	PRJNA14666
+s__Leishmania_RNA_virus_1_4	1	PRJNA14761
+s__Circovirus_like_genome_CB_B	1	PRJNA39629
+s__Circovirus_like_genome_CB_A	1	PRJNA39627
+s__Tomato_leaf_curl_Hainan_virus	1	PRJNA39931
+s__Tistrella_mobilis	1	GCF_000264455
+s__Shigella_phage_SfII	1	PRJNA213070
+s__Shigella_phage_SfIV	1	PRJNA227000
+s__Chaetoceros_salsugineum_DNA_virus	1	PRJNA15497
+s__Chelatococcus_sp_GW1	1	GCF_000283095
+s__Brevibacillus_brevis	3	GCF_000010165	GCF_000296715	GCF_000346255
+s__Japanese_encephalitis_virus	1	PRJNA15310
+s__Chlamydophila_caviae	1	GCF_000007605
+s__Halarchaeum_acidiphilum	2	GCF_000474235	GCF_000400975
+s__Mesotoga_sp_PhosAc3	1	GCF_000367705
+s__Rivularia_sp_PCC_7116	1	GCF_000316665
+s__Human_bocavirus	1	PRJNA15895
+s__Tomato_mottle_mosaic_virus	1	PRJNA217881
+s__Mycobacterium_yongonense	1	GCF_000418535
+s__Treponema_medium	1	GCF_000413035
+s__Halovirus_HF1	1	PRJNA14294
+s__Nitrobacter_winogradskyi	1	GCF_000012725
+s__Streptomyces_auratus	1	GCF_000280865
+s__Paenibacillus_phage_phiIBB_Pl23	1	PRJNA213072
+s__Streptomyces_venezuelae	1	GCF_000253235
+s__Aeromonas_salmonicida	3	GCF_000234845	GCF_000447435	GCF_000196395
+s__Pseudoalteromonas_luteoviolacea	2	GCF_000333235	GCF_000495575
+s__alpha_proteobacterium_SCGC_AAA027_L15	1	GCF_000371865
+s__Prunus_necrotic_ringspot_virus	1	PRJNA14866
+s__Mycobacterium_sp_H4Y	1	GCF_000364405
+s__Caldithrix_abyssi	1	GCF_000241815
+s__Erwinia_billingiae	1	GCF_000196615
+s__Amycolatopsis_mediterranei	4	GCF_000196835	GCF_000282715	GCF_000454025	GCF_000220945
+s__Burdock_mottle_virus	1	PRJNA212310
+s__Bhendi_yellow_vein_Delhi_virus	1	PRJNA33677
+s__Tomato_yellow_leaf_curl_Mali_virus_associated_DNA_beta	1	PRJNA15995
+s__Candida_dubliniensis	1	GCA_000026945
+s__Ugandan_cassava_brown_streak_virus	1	PRJNA61097
+s__Intrasporangium_calvum	1	GCF_000184685
+s__endosymbiont_of_Bathymodiolus_sp	1	GCF_000297135
+s__Peach_chlorotic_mottle_virus	1	PRJNA20977
+s__Rhododendron_virus_A	1	PRJNA51905
+s__Enterovirus_J	2	PRJNA29255	PRJNA42941
+s__Enterovirus_H	1	PRJNA15371
+s__Enterovirus_B	1	PRJNA15321
+s__Enterovirus_C	1	PRJNA15288
+s__Haloplasma_contractile	1	GCF_000215935
+s__Enterovirus_F	1	PRJNA203090
+s__Odoribacter_laneus	1	GCF_000243215
+s__Enterovirus_D	1	PRJNA15297
+s__Enterovirus_E	1	PRJNA15351
+s__Tobacco_leaf_curl_Kochi_virus	1	PRJNA14400
+s__Dermacoccus_sp_Ellin185	1	GCF_000152185
+s__Tomato_leaf_curl_Kumasi_virus	1	PRJNA30837
+s__Leucobacter_chromiiresistens	1	GCF_000231305
+s__Streptomyces_sp_e14	1	GCF_000162775
+s__Sulfitobacter_phage_EE36phi1	1	PRJNA38079
+s__Thalassospira_xiamenensis	1	GCF_000300235
+s__Mycoplasma_yeatsii	1	GCF_000380285
+s__Eupatorium_yellow_vein_mosaic_virus	1	PRJNA14171
+s__Tomato_black_ring_virus	1	PRJNA14871
+s__Pieris_rapae_granulovirus	1	PRJNA45911
+s__Ornithobacterium_rhinotracheale	1	GCF_000265465
+s__Bacillus_subtilis	21	GCF_000349795	GCF_000293765	GCF_000245035	GCF_000385985	GCF_000209795	GCF_000186745	GCF_000230755	GCF_000338735	GCF_000227465	GCF_000177595	GCF_000344745	GCF_000340295	GCF_000321395	GCF_000227485	GCF_000183765	GCF_000155375	GCF_000341775	GCF_000332645	GCF_000497485	GCF_000146565	GCF_000245295
+s__Bacillus_endophyticus	1	GCF_000283255
+s__Lactobacillus_delbrueckii	10	GCF_000182835	GCF_000192165	GCF_000056065	GCF_000179375	GCF_000284695	GCF_000284715	GCF_000014405	GCF_000409675	GCF_000191165	GCF_000387565
+s__Kocuria_atrinae	1	GCF_000286355
+s__Neorickettsia_sennetsu	1	GCF_000013165
+s__Actinomyces_neuii	1	GCF_000296485
+s__Mycobacterium_phage_Peaches	1	PRJNA42939
+s__Desulfotignum_phosphitoxidans	1	GCF_000350545
+s__Sphingobium_baderi	1	GCF_000445145
+s__Salinispora_pacifica	20	GCF_000383575	GCF_000378845	GCF_000374705	GCF_000375265	GCF_000374685	GCF_000374745	GCF_000383995	GCF_000374785	GCF_000375285	GCF_000375225	GCF_000373825	GCF_000374665	GCF_000384095	GCF_000374765	GCF_000374725	GCF_000377025	GCF_000375305	GCF_000379065	GCF_000378825	GCF_000375245
+s__Gemella_bergeri	1	GCF_000469465
+s__Dietzia_sp_UCD_THP	1	GCF_000349585
+s__Vernonia_yellow_vein_betasatellite	1	PRJNA41303
+s__Wesselsbron_virus	1	PRJNA38295
+s__Schizophyllum_commune	1	GCA_000143185
+s__French_bean_leaf_curl_betasatellite_Kanpur	1	PRJNA169556
+s__Shigella_sonnei	7	GCF_000092525	GCF_000268045	GCF_000268225	GCF_000268005	GCF_000283715	GCF_000281815	GCF_000188795
+s__Klebsiella_phage_JD001	1	PRJNA188547
+s__Spiribacter_salinus	1	GCF_000319575
+s__Burkholderia_phage_BcepF1	1	PRJNA18857
+s__Tomato_torrado_virus	1	PRJNA18831
+s__Mesta_yellow_vein_mosaic_virus	1	PRJNA18967
+s__Aneurinibacillus_aneurinilyticus	1	GCF_000466385
+s__Alteromonas_phage_vB_AmaP_AD45_P	1	PRJNA209073
+s__Kangiella_aquimarina	1	GCF_000374105
+s__Sweet_potato_leaf_curl_virus	1	PRJNA15461
+s__Enterobacteria_phage_ID2_Moscow_ID_2001	1	PRJNA16591
+s__Rahnella_sp_Y9602	1	GCF_000187705
+s__Methanothermobacter_marburgensis	1	GCF_000145295
+s__Yersinia_phage_L_413C	1	PRJNA14280
+s__Boolarra_virus	1	PRJNA14850
+s__Tomato_leaf_curl_Bangladesh_betasatellite	1	PRJNA56017
+s__Groundnut_rosette_virus	1	PRJNA14762
+s__Stx2_converting_phage_II	1	PRJNA14310
+s__Streptomyces_canus	1	GCF_000383615
+s__White_eye_coronavirus_HKU16	1	PRJNA109273
+s__Dietzia_cinnamea	1	GCF_000186325
+s__Smithella_sp_ME_1	1	GCF_000495415
+s__Little_cherry_virus_1	1	PRJNA15346
+s__Little_cherry_virus_2	1	PRJNA15062
+s__Ovine_adenovirus_D	1	PRJNA14198
+s__Actinoplanes_sp_N902_109	1	GCF_000389965
+s__Ovine_adenovirus_A	2	PRJNA14497	PRJNA40309
+s__Enterobacteria_phage_P2	1	PRJNA14035
+s__Candidatus_Endolissoclinum_faulkneri	1	GCF_000319385
+s__Enterobacteria_phage_P1	1	PRJNA14493
+s__Verrucomicrobia_bacterium_SCGC_AAA164_A21	1	GCF_000264585
+s__Enterobacteria_phage_P4	1	PRJNA14414
+s__Pan_troglodytes_verus_polyomavirus_4	1	PRJNA183907
+s__Pan_troglodytes_verus_polyomavirus_5	1	PRJNA183908
+s__Parvimonas_sp_oral_taxon_110	1	GCF_000214475
+s__Propionibacterium_sp_434_HC2	1	GCF_000214535
+s__Human_bocavirus_2	1	PRJNA33891
+s__Human_bocavirus_3	1	PRJNA37291
+s__Human_bocavirus_4	1	PRJNA38243
+s__Tomato_yellow_mottle_virus	1	PRJNA184815
+s__Geobacillus_thermodenitrificans	1	GCF_000015745
+s__Tomato_leaf_curl_Sudan_virus	1	PRJNA14372
+s__Enterobacteria_phage_M13	1	PRJNA14549
+s__Zika_virus	1	PRJNA36615
+s__Escherichia_phage_KBNP135	1	PRJNA177528
+s__Thermus_oshimai	2	GCF_000373145	GCF_000309885
+s__Mycobacterium_phage_Che8	1	PRJNA14394
+s__Salmonella_phage_Fels_1	1	PRJNA29267
+s__Solenopsis_invicta_virus_3	1	PRJNA36613
+s__Clostridium_kluyveri	2	GCF_000010265	GCF_000016505
+s__Solenopsis_invicta_virus_1	1	PRJNA15042
+s__Mycobacterium_phage_Orion	1	PRJNA17151
+s__Aerococcus_viridans	2	GCF_000262085	GCF_000178435
+s__Cryptosporidium_hominis	1	GCA_000006425
+s__Vibrio_caribbenthicus	1	GCF_000165125
+s__Methanosphaerula_palustris	1	GCF_000021965
+s__Oat_necrotic_mottle_virus	1	PRJNA14899
+s__Geobacillus_sp_JF8	1	GCF_000445995
+s__Muricauda_ruestringensis	1	GCF_000224085
+s__Dugbe_virus	1	PRJNA14851
+s__Synechococcus_phage_S_CRM01	1	PRJNA67251
+s__Aguacate_virus	1	PRJNA66333
+s__Flavobacterium_phage_11b	1	PRJNA14565
+s__Vitreoscilla_stercoraria	1	GCF_000382305
+s__Pseudomonas_phage_Lu11	1	PRJNA167656
+s__Cetobacterium_somerae	1	GCF_000479045
+s__Chaerephon_polyomavirus_1	1	PRJNA185191
+s__Anopheles_gambiae_densonucleosis_virus	1	PRJNA32101
+s__Novosphingobium_pentaromativorans	1	GCF_000235975
+s__Grapevine_deformation_virus	1	PRJNA167162
+s__Thioalkalivibrio_nitratireducens	1	GCF_000321415
+s__Eubacterium_plexicaudatum	1	GCF_000364225
+s__Garlic_virus_X	1	PRJNA14987
+s__Influenza_A_virus	4	PRJNA14892	PRJNA15617	PRJNA15620	PRJNA15622
+s__Mycoplasma_haemofelis	2	GCF_000200735	GCF_000186985
+s__Leptolyngbya_sp_Heron_Island_J	1	GCF_000482245
+s__Pseudoalteromonas_rubra	1	GCF_000238295
+s__Paenibacillus_alvei	3	GCF_000442555	GCF_000442535	GCF_000293805
+s__Paenibacillus_terrigena	1	GCF_000374845
+s__Staphylococcus_lugdunensis	5	GCF_000185485	GCF_000270465	GCF_000316075	GCF_000025085	GCF_000247225
+s__Garlic_virus_E	1	PRJNA14834
+s__Garlic_virus_A	1	PRJNA14735
+s__Burkholderia_graminis	1	GCF_000172415
+s__Brevundimonas_abyssalis	1	GCF_000466985
+s__Propionibacterium_phage_PHL113M01	1	PRJNA219107
+s__Frankia_sp_CcI6	1	GCF_000503735
+s__Sphingobium_sp_SYK_6	1	GCF_000283515
+s__Frankia_sp_CcI3	1	GCF_000013345
+s__Sida_yellow_mottle_virus	1	PRJNA74527
+s__Mobilicoccus_pelagius	1	GCF_000247995
+s__Ophiostoma_mitovirus_3a	1	PRJNA14839
+s__Chrysanthemum_chlorotic_mottle_viroid	1	PRJNA14170
+s__Cyanobacterium_stanieri	1	GCF_000317655
+s__Cardiospermum_yellow_leaf_curl_betasatellite	1	PRJNA28647
+s__Caulobacter_phage_CcrColossus	1	PRJNA179419
+s__Potato_leafroll_virus	1	PRJNA15068
+s__Potato_latent_virus	1	PRJNA32629
+s__Acinetobacter_sp_CIP_51_11	1	GCF_000369665
+s__Canine_calicivirus	1	PRJNA14875
+s__Seoul_virus	1	PRJNA15027
+s__Xanthomonas_phage_phiL7	1	PRJNA38267
+s__Histophilus_somni	2	GCF_000019405	GCF_000011785
+s__Acidiphilium_sp_PM	1	GCF_000219295
+s__zeta_proteobacterium_SCGC_AB_133_G06	1	GCF_000379325
+s__Pseudomonas_sp_Ag1	1	GCF_000278565
+s__Flexal_virus	1	PRJNA29903
+s__Niabella_soli	1	GCF_000243115
+s__Vibrio_phage_pVp_1	1	PRJNA181224
+s__Respiratory_syncytial_virus	1	PRJNA15004
+s__Salmonella_phage_g341c	1	PRJNA39795
+s__Bacillus_sp_L1_2012	1	GCF_000334155
+s__Vibrio_phage_VGJphi	1	PRJNA14279
+s__Leptospira_sp_P2653	1	GCF_000346955
+s__Oceanimonas_sp_GK1	1	GCF_000243075
+s__Mycobacterium_phage_PLot	1	PRJNA17167
+s__Enterobacteria_phage_RB69	1	PRJNA15141
+s__Burkholderia_multivorans	7	GCF_000010545	GCF_000182275	GCF_000018505	GCF_000182295	GCF_000286575	GCF_000182255	GCF_000286555
+s__Lachnospiraceae_bacterium_1_1_57FAA	1	GCF_000218445
+s__Plantago_mottle_virus	1	PRJNA32683
+s__Squash_mild_leaf_curl_virus	1	PRJNA14407
+s__Enterobacteria_phage_SPC35	1	PRJNA64605
+s__Leuconostoc_phage_phiLN04	1	PRJNA195530
+s__endosymbiont_of_Riftia_pachyptila	1	GCF_000224455
+s__Kribbella_flavida	1	GCF_000024345
+s__Flavobacterium_sp_ACAM_123	1	GCF_000264055
+s__Actinomadura_madurae	1	GCF_000468475
+s__Haloquadratum_sp_J07HQX50	1	GCF_000416005
+s__Candidatus_Arthromitus_sp_SFB_mouse_SU	1	GCF_000252785
+s__Sunflower_chlorotic_mottle_virus	1	PRJNA47931
+s__Dickeya_paradisiaca	1	GCF_000400505
+s__Coleus_blumei_viroid	1	PRJNA14826
+s__Candidatus_Nitrospira_defluvii	1	GCF_000196815
+s__Phage_phiJL001	1	PRJNA16076
+s__Bradyrhizobium_diazoefficiens	1	GCF_000011365
+s__Salmon_pancreas_disease_virus	2	PRJNA15187	PRJNA15395
+s__Staphylococcus_sp_JGI_0001002_I23	1	GCF_000376205
+s__Cyanophage_9515_10a	1	PRJNA81181
+s__Mycobacterium_parascrofulaceum	1	GCF_000164135
+s__Iodobacteriophage_phiPLPE	1	PRJNA30965
+s__Thalassospira_profundimaris	1	GCF_000300275
+s__Helicobacter_suis	2	GCF_000187625	GCF_000187605
+s__Campylobacter_phage_CP81	1	PRJNA80911
+s__Endozoicomonas_elysicola	1	GCF_000373945
+s__Geobacter_sp_M21	1	GCF_000023645
+s__Desulfotomaculum_ruminis	1	GCF_000215085
+s__Dermabacter_sp_HFH0086	1	GCF_000413375
+s__Enterobacteria_phage_Mu	1	PRJNA14105
+s__Rhizobium_leguminosarum	17	GCF_000271785	GCF_000373285	GCF_000385155	GCF_000375705	GCF_000023185	GCF_000021345	GCF_000009265	GCF_000271845	GCF_000373425	GCF_000372105	GCF_000372205	GCF_000373325	GCF_000371905	GCF_000379005	GCF_000372305	GCF_000271825	GCF_000271805
+s__Cyclovirus_NGchicken15_NGA_2009	1	PRJNA61953
+s__Jatropha_yellow_mosaic_India_virus	1	PRJNA32075
+s__Thioalkalimicrobium_aerophilum	1	GCF_000227665
+s__Rhynchosia_yellow_mosaic_India_virus	1	PRJNA61861
+s__Salmonella_phage_SETP3	1	PRJNA19157
+s__Thiocapsa_marina	1	GCF_000223985
+s__Fluoribacter_dumoffii	2	GCF_000236145	GCF_000236165
+s__Salmonella_phage_SETP7	1	PRJNA226725
+s__Human_papillomavirus_type_136	1	PRJNA167866
+s__Flavobacterium_saliperosum	1	GCF_000498515
+s__Corynebacterium_ammoniagenes	1	GCF_000164115
+s__Vibrio_phage_VfO3K6	1	PRJNA14093
+s__Ruminococcus_callidus	1	GCF_000468015
+s__Persimmon_virus_A	1	PRJNA172457
+s__Actinokineospora_enzanensis	1	GCF_000374445
+s__Roseobacter_sp_MED193	1	GCF_000152965
+s__Nemesia_ring_necrosis_virus	1	PRJNA32681
+s__Halovirus_HHTV_1	1	PRJNA206495
+s__Halovirus_HHTV_2	1	PRJNA206494
+s__Nerine_virus_X	1	PRJNA16257
+s__Synechococcus_phage_S_CBS2	1	PRJNA66393
+s__Marinobacter_lipolyticus	2	GCF_000397065	GCF_000372805
+s__Choristoneura_occidentalis_granulovirus	1	PRJNA17097
+s__Thermococcus_sp_AM4	1	GCF_000151205
+s__Streptococcus_pasteurianus	1	GCF_000270165
+s__Peptoniphilus_duerdenii	1	GCF_000146345
+s__Sulfolobus_virus_Ragged_Hills	1	PRJNA14354
+s__Mycoplasma_genitalium	6	GCF_000292405	GCF_000292505	GCF_000167595	GCF_000292445	GCF_000292485	GCF_000027325
+s__Abutilon_mosaic_Brazil_virus	1	PRJNA81009
+s__Pandoravirus_dulcis	1	PRJNA213019
+s__Leuconostoc_carnosum	2	GCF_000260375	GCF_000300135
+s__Citrobacter_sp_KTE30	1	GCF_000398825
+s__Citrobacter_sp_KTE32	1	GCF_000398865
+s__Lettuce_mosaic_virus	1	PRJNA15342
+s__Methanosarcina_mazei	2	GCF_000007065	GCF_000341715
+s__Mycobacterium_indicus_pranii	1	GCF_000298095
+s__Corynebacterium_sp_KPL1989	1	GCF_000477955
+s__Dictyoglomus_turgidum	1	GCF_000021645
+s__Acinetobacter_phage_ZZ1	1	PRJNA169230
+s__Corynebacterium_sp_KPL1986	1	GCF_000477975
+s__Selenomonas_sp_oral_taxon_892	1	GCF_000468035
+s__Pseudomonas_sp_GM60	1	GCF_000282415
+s__Sanguibacter_keddieii	1	GCF_000024925
+s__Pseudomonas_sp_GM67	1	GCF_000282435
+s__Parana_virus	1	PRJNA29907
+s__Panicum_mosaic_virus	1	PRJNA14979
+s__Ralstonia_phage_RSM3	1	PRJNA32325
+s__Blackberry_virus_E	1	PRJNA68409
+s__Satsuma_dwarf_virus	1	PRJNA15409
+s__Pedobacter_heparinus	1	GCF_000023825
+s__Torque_teno_virus_27	1	PRJNA48147
+s__Mycobacterium_phage_Konstantine	1	PRJNA32015
+s__Clostridium_sp_BL8	1	GCF_000447315
+s__Collinsella_stercoris	1	GCF_000156215
+s__Dethiosulfovibrio_peptidovorans	1	GCF_000172975
+s__Arthrobacter_sp_FB24	1	GCF_000196235
+s__Corynebacterium_sp_HFH0082	1	GCF_000411235
+s__Blackberry_virus_Y	1	PRJNA18125
+s__Mycobacterium_sp_012931	1	GCF_000419295
+s__Clostridiales_genomosp_BVAB3	1	GCF_000025225
+s__Helicobacter_phage_KHP40	1	PRJNA184159
+s__Bacillus_nealsonii	1	GCF_000401235
+s__Arthrobacter_globiformis	1	GCF_000238915
+s__Aspergillus_nidulans	1	GCA_000149205
+s__Beet_severe_curly_top_virus	1	PRJNA14367
+s__Paraprevotella_xylaniphila	1	GCF_000205165
+s__Mycoplasma_conjunctivae	1	GCF_000026765
+s__Prevotella_bryantii	1	GCF_000179055
+s__Sri_Lankan_cassava_mosaic_virus	1	PRJNA15130
+s__Enterobacterial_phage_mEp234	1	PRJNA183153
+s__Ramie_mosaic_virus	1	PRJNA29985
+s__Horsegram_yellow_mosaic_virus	1	PRJNA14356
+s__Squash_leaf_curl_China_virus	1	PRJNA15591
+s__Actinomycetospora_chiangmaiensis	1	GCF_000379625
+s__Y73_sarcoma_virus	1	PRJNA16745
+s__Bacillus_amyloliquefaciens	19	GCF_000242855	GCF_000195515	GCF_000015785	GCF_000330805	GCF_000469015	GCF_000493375	GCF_000204275	GCF_000341875	GCF_000299615	GCF_000319475	GCF_000221645	GCF_000283695	GCF_000196735	GCF_000284395	GCF_000455565	GCF_000494835	GCF_000455585	GCF_000262385	GCF_000465655
+s__Leptothrix_cholodnii	1	GCF_000019785
+s__Gordonia_hirsuta	1	GCF_000333015
+s__Pseudomonas_phage_119X	1	PRJNA16385
+s__Streptomyce_phage_TG1	1	PRJNA177524
+s__Prochlorococcus_phage_MED4_184	1	PRJNA195504
+s__Waddlia_chondrophila	1	GCF_000092785
+s__Nitrosococcus_watsonii	1	GCF_000143085
+s__Mycobacterium_phage_Giles	1	PRJNA27907
+s__Streptococcus_sanguinis	22	GCF_000212835	GCF_000194945	GCF_000195125	GCF_000212815	GCF_000192275	GCF_000195045	GCF_000220275	GCF_000192245	GCF_000188275	GCF_000191105	GCF_000212855	GCF_000507745	GCF_000192205	GCF_000195025	GCF_000192185	GCF_000204475	GCF_000191085	GCF_000014205	GCF_000212795	GCF_000220315	GCF_000191125	GCF_000194965
+s__Salmonella_phage_FSL_SP_004	1	PRJNA212714
+s__Synechococcus_phage_S_SSM4	1	PRJNA195515
+s__Vibrio_harveyi	4	GCF_000182685	GCF_000347555	GCF_000275705	GCF_000259935
+s__Cronobacter_phage_vB_CsaM_GAP161	1	PRJNA179412
+s__Methanoregula_formicica	1	GCF_000327485
+s__Streptococcus_anginosus	8	GCF_000257765	GCF_000184365	GCF_000373605	GCF_000214555	GCF_000463465	GCF_000186545	GCF_000463505	GCF_000287595
+s__Pseudoalteromonas_sp_BSi20311	1	GCF_000239875
+s__Actinomyces_sp_oral_taxon_448	1	GCF_000220835
+s__Euproctis_pseudoconspersa_nucleopolyhedrovirus	1	PRJNA37827
+s__Tomato_mottle_leaf_curl_Zulia_virus	1	PRJNA62741
+s__Pepper_ringspot_virus	1	PRJNA14777
+s__Bacteroides_sp_3_1_19	1	GCF_000163655
+s__Maize_necrotic_streak_virus	1	PRJNA16323
+s__Moritella_marina	2	GCF_000381865	GCF_000291685
+s__Salmonella_phage_SKML_39	1	PRJNA184160
+s__Elusimicrobium_minutum	1	GCF_000020145
+s__Vibrio_rotiferianus	1	GCF_000195225
+s__Pan_troglodytes_verus_polyomavirus_3	1	PRJNA183906
+s__Vibriophage_VP4	1	PRJNA15449
+s__Vibrio_furnissii	2	GCF_000184325	GCF_000176175
+s__Malvastrum_leaf_curl_Philippines_betasatellite	1	PRJNA214366
+s__Frankia_sp_Iso899	1	GCF_000421445
+s__Bartonella_sp_R4_2010	1	GCF_000312525
+s__Streptomyces_fulvissimus	1	GCF_000385945
+s__Tamana_bat_virus	1	PRJNA15398
+s__Pear_latent_virus	1	PRJNA14879
+s__Faba_bean_necrotic_stunt_virus	1	PRJNA39929
+s__Sweet_potato_feathery_mottle_virus	1	PRJNA15347
+s__Hydrogenobacter_thermophilus	2	GCF_000010785	GCF_000164905
+s__Bartonella_sp_DB5_6	1	GCF_000278115
+s__Enterobacter_sp_MGH_8	1	GCF_000474805
+s__Tomato_leaf_curl_Kerala_virus	1	PRJNA30935
+s__Spiroplasma_kunkelii_virus_SkV1_CR2_3x	1	PRJNA27891
+s__Plantago_asiatica_mosaic_virus	1	PRJNA15073
+s__Gremmeniella_abietina_RNA_virus_MS2	1	PRJNA15232
+s__Pectobacterium_atrosepticum	1	GCF_000011605
+s__Groundnut_rosette_virus_satellite_RNA	1	PRJNA14429
+s__Acidianus_hospitalis	1	GCF_000213215
+s__Bifidobacterium_animalis	14	GCF_000021425	GCF_000277325	GCF_000092765	GCF_000240765	GCF_000025245	GCF_000172535	GCF_000022705	GCF_000471945	GCF_000414215	GCF_000022965	GCF_000220885	GCF_000224965	GCF_000277345	GCF_000260715
+s__Paenibacillus_sp_HGF5	1	GCF_000204455
+s__Glaciecola_lipolytica	1	GCF_000314975
+s__Melon_chlorotic_mosaic_virus	1	PRJNA51415
+s__Kosmotoga_olearia	1	GCF_000023325
+s__Dechloromonas_aromatica	1	GCF_000012425
+s__Listeria_phage_LP_037	1	PRJNA212948
+s__Lactobacillus_acidophilus	4	GCF_000191545	GCF_000389675	GCF_000159715	GCF_000011985
+s__Vibrio_vulnificus	6	GCF_000299635	GCF_000342305	GCF_000186585	GCF_000009745	GCF_000039765	GCF_000303175
+s__Actinomyces_urogenitalis	1	GCF_000159035
+s__Frankia_sp_BCU110501	1	GCF_000373365
+s__Ostreococcus_tauri_virus_2	1	PRJNA61087
+s__Ostreococcus_tauri_virus_1	1	PRJNA40907
+s__Obuda_pepper_virus	1	PRJNA14817
+s__Synechococcus_sp_CC9311	1	GCF_000014585
+s__Barley_yellow_dwarf_virus_GAV	1	PRJNA15035
+s__Tomato_severe_rugose_virus	1	PRJNA19973
+s__Slow_bee_paralysis_virus	1	PRJNA48587
+s__Brachybacterium_squillarum	1	GCF_000225825
+s__Glaciecola_chathamensis	1	GCF_000314955
+s__Arthroderma_otae	1	GCA_000151145
+s__Flavobacterium_johnsoniae	1	GCF_000016645
+s__Clostridium_termitidis	1	GCF_000350485
+s__Clostridium_saccharobutylicum	1	GCF_000473995
+s__Trichoplusia_ni_ascovirus_2c	1	PRJNA18003
+s__Corynebacterium_pseudogenitalium	1	GCF_000156615
+s__Brevundimonas_diminuta	2	GCF_000204035	GCF_000318405
+s__Raoultella_ornithinolytica	1	GCF_000367425
+s__Bartonella_australis	1	GCF_000341355
+s__Botryotinia_fuckeliana_totivirus_1	1	PRJNA19133
+s__Vibrio_ichthyoenteri	1	GCF_000222605
+s__Simian_T_cell_lymphotropic_virus_6	1	PRJNA32697
+s__Thermus_igniterrae	1	GCF_000376265
+s__Mesta_yellow_vein_mosaic_Bahraich_virus	1	PRJNA30083
+s__Mycobacterium_phage_Pukovnik	1	PRJNA30521
+s__Cupriavidus_sp_UYPR2_512	1	GCF_000379565
+s__Blechum_interveinal_chlorosis_virus	1	PRJNA178635
+s__Sulfurimonas_gotlandica	2	GCF_000156095	GCF_000242915
+s__Pseudomonas_pseudoalcaligenes	2	GCF_000297075	GCF_000262065
+s__Peptoniphilus_sp_JC140	1	GCF_000321025
+s__Enterococcus_gilvus	2	GCF_000407545	GCF_000394615
+s__Parascardovia_denticolens	3	GCF_000191785	GCF_000269845	GCF_000163835
+s__Serinicoccus_marinus	1	GCF_000421245
+s__Bacillus_sp_17376	1	GCF_000498695
+s__Staphylococcus_phage_phiNM3	1	PRJNA18329
+s__Pantoea_agglomerans	3	GCF_000241285	GCF_000475055	GCF_000330765
+s__Caloramator_australicus	1	GCF_000297115
+s__Succinivibrionaceae_bacterium_WG_1	1	GCF_000222855
+s__Okra_leaf_curl_Cameroon_virus	1	PRJNA60747
+s__Agrobacterium_sp_ATCC_31749	1	GCF_000214615
+s__Sparrow_coronavirus_HKU17	1	PRJNA17048
+s__Thioalkalivibrio_thiocyanoxidans	2	GCF_000227685	GCF_000385215
+s__Bradyrhizobiaceae_bacterium_SG_6C	1	GCF_000219645
+s__Lactobacillus_phage_LF1	1	PRJNA181083
+s__Mycobacterium_kansasii	1	GCF_000157895
+s__SAR406_cluster_bacterium_SCGC_AB_629_J13	1	GCF_000375825
+s__Bacillus_azotoformans	1	GCF_000307855
+s__Clitoria_yellow_mottle_virus	1	PRJNA80771
+s__Urochloa_streak_virus	1	PRJNA30033
+s__Cyanobium_gracile	1	GCF_000316515
+s__Bartonella_doshiae	1	GCF_000278155
+s__Arabis_mosaic_virus_small_satellite_RNA	1	PRJNA14021
+s__Streptomyces_sp_CNY228	1	GCF_000377545
+s__Orangutan_polyomavirus	1	PRJNA41471
+s__Solenopsis_invicta_virus_2	1	PRJNA19773
+s__Xanthomonas_sp_M97	1	GCF_000401255
+s__Petunia_vein_clearing_virus	1	PRJNA14031
+s__Natrialba_aegyptia	1	GCF_000337535
+s__Enterobacteria_phage_phiX174_sensu_lato	1	PRJNA14015
+s__Bacteroides_sp_3_2_5	1	GCF_000159855
+s__Cyclovirus_PKgoat21_PAK_2009	1	PRJNA61947
+s__Roseibium_sp_TrichSKD4	1	GCF_000148725
+s__Coprobacillus_sp_8_2_54BFAA	1	GCF_000244855
+s__Bhendi_yellow_vein_mosaic_betasatellite	1	PRJNA61777
+s__Erysipelotrichaceae_bacterium_2_2_44A	1	GCF_000225685
+s__Tomato_leaf_curl_Karnataka_virus_associated_DNA_beta	1	PRJNA17999
+s__Parvimonas_sp_oral_taxon_393	1	GCF_000223315
+s__Thermoproteus_uzoniensis	1	GCF_000193375
+s__Narcissus_degeneration_virus	1	PRJNA18729
+s__Arcobacter_nitrofigilis	1	GCF_000092245
+s__Tomato_yellow_leaf_curl_Saudi_virus	1	PRJNA217879
+s__Eubacterium_siraeum	4	GCF_000210635	GCF_000209915	GCF_000382085	GCF_000154325
+s__Leptospirillum_ferriphilum	1	GCF_000299235
+s__Helicobacter_bilis	2	GCF_000158435	GCF_000364285
+s__Oscillatoria_nigro_viridis	1	GCF_000317475
+s__Tomato_leaf_curl_virus_satellite	1	PRJNA14428
+s__Dolosigranulum_pigrum	1	GCF_000245815
+s__Carrot_red_leaf_luteovirus_associated_RNA	1	PRJNA14820
+s__Listeria_phage_A006	1	PRJNA20801
+s__Rheinheimera_sp_A13L	1	GCF_000217935
+s__Propionibacterium_phage_PAS50	1	PRJNA66339
+s__Actinomyces_sp_oral_taxon_877	1	GCF_000466305
+s__Bacillus_phage_phiNIT1	1	PRJNA213017
+s__Tomato_leaf_curl_Cotabato_virus	1	PRJNA28989
+s__Sphingobacterium_sp_21	1	GCF_000192845
+s__Ndumu_virus	1	PRJNA88115
+s__Phyllobacterium_sp_YR531	1	GCF_000282595
+s__Candidatus_Baumannia_cicadellinicola	1	GCF_000013185
+s__Enterobacteria_phage_If1	1	PRJNA14039
+s__Mycoreovirus_1	1	PRJNA29913
+s__Staphylococcus_phage_2638A	1	PRJNA15267
+s__Novispirillum_itersonii	1	GCF_000381985
+s__Sulfurospirillum_deleyianum	1	GCF_000024885
+s__Marinobacter_sp_BSs20148	1	GCF_000283275
+s__Pseudomonas_phage_MP29	1	PRJNA32999
+s__Gordonia_sp_NB4_1Y	1	GCF_000347295
+s__Pseudomonas_phage_MP22	1	PRJNA20961
+s__St_Croix_River_virus	1	PRJNA14941
+s__Arthroderma_benhamiae	1	GCA_000151125
+s__Cellulomonas_fimi	1	GCF_000212695
+s__Roseobacter_sp_AzwK_3b	1	GCF_000170875
+s__Dorea_formicigenerans	2	GCF_000225745	GCF_000169235
+s__Raven_circovirus	1	PRJNA17773
+s__Citrus_exocortis_viroid	1	PRJNA14637
+s__Rose_spring_dwarf_associated_virus	1	PRJNA30051
+s__Clostridium_spiroforme	1	GCF_000154805
+s__Azospirillum_brasilense	1	GCF_000237365
+s__Salinisphaera_shabanensis	1	GCF_000215955
+s__Streptococcus_phage_858	1	PRJNA28829
+s__Streptococcus_sp_SK643	1	GCF_000259505
+s__Pseudomonas_phage_phi8	1	PRJNA14731
+s__Haemophilus_phage_HP2	1	PRJNA14231
+s__Haemophilus_sp_oral_taxon_851	1	GCF_000242295
+s__Clostridium_sporogenes	2	GCF_000240115	GCF_000155085
+s__Pseudomonas_phage_phi6	1	PRJNA14788
+s__Trichoplusia_ni_single_nucleopolyhedrovirus	1	PRJNA15635
+s__TYLCAxV_Sic1_IT_Sic2_2_04	1	PRJNA30523
+s__Jatropha_mosaic_Nigerian_virus	1	PRJNA178634
+s__Acinetobacter_gyllenbergii	2	GCF_000488195	GCF_000413855
+s__Colwellia_piezophila	1	GCF_000378625
+s__Isosphaera_pallida	1	GCF_000186345
+s__Pseudomonas_mendocina	5	GCF_000287395	GCF_000465575	GCF_000016565	GCF_000204295	GCF_000295795
+s__Bifidobacterium_bifidum	8	GCF_000155395	GCF_000265095	GCF_000300215	GCF_000164965	GCF_000299595	GCF_000165905	GCF_000466525	GCF_000273525
+s__Magnaporthe_oryzae	1	GCA_000002495
+s__Nanoarchaeum_equitans	1	GCF_000008085
+s__Cotton_leaf_curl_Bangalore_virus	1	PRJNA15575
+s__Cactus_mild_mottle_virus	1	PRJNA33485
+s__Enterococcus_mundtii	4	GCF_000504125	GCF_000233395	GCF_000393815	GCF_000407465
+s__Escherichia_sp_1_1_43	1	GCF_000159895
+s__Entebbe_bat_virus	1	PRJNA18515
+s__Pseudomonas_chloritidismutans	1	GCF_000495915
+s__Zalophus_californianus_papillomavirus_1	1	PRJNA65277
+s__Psychromonas_sp_CNPT3	1	GCF_000153405
+s__Croton_yellow_vein_mosaic_betasatellite	1	PRJNA18249
+s__Verrucomicrobiae_bacterium_DG1235	1	GCF_000155695
+s__Grapevine_Algerian_latent_virus	1	PRJNA32675
+s__Prevotella_oulorum	1	GCF_000224615
+s__Streptomyces_sp_Mg1	1	GCF_000154885
+s__Moraxella_macacae	1	GCF_000320365
+s__Methylophilus_methylotrophus	1	GCF_000378225
+s__Paenibacillus_sp_Y412MC10	1	GCF_000024685
+s__Prevotella_salivae	1	GCF_000185845
+s__Verbesina_encelioides_leaf_curl_alphasatellite	1	PRJNA67961
+s__Spinach_latent_virus	1	PRJNA14810
+s__Hydrangea_chlorotic_mottle_virus	1	PRJNA38689
+s__Guar_leaf_curl_alphasatellite	1	PRJNA193981
+s__Fusarium_poae_virus_1	1	PRJNA14827
+s__Coriobacteriaceae_bacterium_BV3Ac1	1	GCF_000468855
+s__Candidatus_Uzinura_diaspidicola	1	GCF_000331975
+s__Sphingobium_lactosutens	1	GCF_000445105
+s__Okra_mottle_virus	1	PRJNA31095
+s__Eragrostis_streak_virus	1	PRJNA28825
+s__Rhizobium_sp_CF122	1	GCF_000282035
+s__Banana_streak_UM_virus	1	PRJNA66615
+s__Magnetococcus_marinus	1	GCF_000014865
+s__Propionibacterium_acnes	85	GCF_000145095	GCF_000231215	GCF_000144325	GCF_000496915	GCF_000144305	GCF_000145575	GCF_000147145	GCF_000144105	GCF_000144385	GCF_000144245	GCF_000177395	GCF_000221125	GCF_000144875	GCF_000342585	GCF_000008345	GCF_000144505	GCF_000217615	GCF_000144025	GCF_000145075	GCF_000145155	GCF_000144345	GCF_000144185	GCF_000144045	GCF_000178075	GCF_000240035	GCF_000144565	GCF_000144585	GCF_000252385	GCF_000178055	GCF_000194825	GCF_000194905	GCF_000145495	GCF_000144225	 [...]
+s__Grapevine_rupestris_stem_pitting_associated_virus	1	PRJNA15249
+s__Fretibacterium_fastidiosum	1	GCF_000210715
+s__Methanosarcina_barkeri	1	GCF_000195895
+s__Streptomyces_gancidicus	1	GCF_000342345
+s__Tomato_rugose_yellow_leaf_curl_virus	1	PRJNA189211
+s__Lactobacillus_fermentum	8	GCF_000162395	GCF_000010145	GCF_000159215	GCF_000466785	GCF_000496435	GCF_000210515	GCF_000477515	GCF_000397165
+s__Burkholderia_phage_KS5	1	PRJNA64563
+s__Thermotoga_lettingae	1	GCF_000017865
+s__Acinetobacter_radioresistens	8	GCF_000162115	GCF_000301795	GCF_000248115	GCF_000368885	GCF_000286595	GCF_000368905	GCF_000175675	GCF_000308075
+s__Megasphaera_genomosp_type_1	1	GCF_000177555
+s__Prevotella_dentalis	2	GCF_000242335	GCF_000220215
+s__Rhodococcus_equi	3	GCF_000473915	GCF_000164155	GCF_000196695
+s__Burkholderia_phage_KS9	1	PRJNA39771
+s__Sulfurospirillum_barnesii	1	GCF_000265295
+s__Hippeastrum_latent_virus	1	PRJNA32685
+s__Singularimonas_variicoloris	1	GCF_000382285
+s__Natrialba_magadii	2	GCF_000025625	GCF_000337875
+s__Exiguobacterium_sibiricum	1	GCF_000019905
+s__Pseudomonas_sp_GM48	1	GCF_000282335
+s__Pseudomonas_sp_GM49	1	GCF_000282355
+s__Acinetobacter_sp_ANC_3789	1	GCF_000368265
+s__Corynebacterium_kroppenstedtii	1	GCF_000023145
+s__Goatpox_virus	1	PRJNA14197
+s__Haloterrigena_limicola	1	GCF_000337475
+s__Eubacterium_hallii	1	GCF_000173975
+s__Mycobacterium_phage_Angelica	1	PRJNA51667
+s__Bartonella_grahamii	1	GCF_000022725
+s__Janthinobacterium_lividum	1	GCF_000242815
+s__Wild_potato_mosaic_virus	1	PRJNA15404
+s__Desulfurococcus_kamchatkensis	1	GCF_000020905
+s__Ageratum_latent_virus	1	PRJNA216153
+s__Pedosphaera_parvula	1	GCF_000172555
+s__Acinetobacter_sp_CIP_101934	1	GCF_000369585
+s__Cypovirus_1	1	PRJNA14714
+s__Enterobacteria_phage_933W_sensu_lato	3	PRJNA14043	PRJNA14167	PRJNA14480
+s__Cypovirus_5	1	PRJNA29601
+s__Escherichia_phage_phiKT	1	PRJNA181222
+s__Astrovirus_VA4	1	PRJNA178562
+s__Gemella_haemolysans	2	GCF_000173915	GCF_000204355
+s__Mycobacterium_ulcerans	1	GCF_000013925
+s__Lumpy_skin_disease_virus	1	PRJNA14122
+s__Brachyspira_murdochii	1	GCF_000092845
+s__Enterobacteria_phage_CC31	1	PRJNA60119
+s__Rotavirus_F	1	PRJNA210412
+s__Psychrobacter_arcticus	1	GCF_000012305
+s__Gentian_mosaic_virus	1	PRJNA31113
+s__Astrovirus_VA2	1	PRJNA176435
+s__Murine_osteosarcoma_virus	1	PRJNA14655
+s__Astrovirus_VA3	1	PRJNA178564
+s__Kelp_fly_virus	1	PRJNA16201
+s__Lactobacillus_buchneri	3	GCF_000159195	GCF_000298115	GCF_000211375
+s__Mitsuokella_multacida	1	GCF_000155955
+s__Rotavirus_H	1	PRJNA16144
+s__Arthrobacter_crystallopoietes	1	GCF_000328305
+s__Upsilonpapillomavirus_2	1	PRJNA17117
+s__Enterobacter_hormaechei	3	GCF_000328905	GCF_000213995	GCF_000328885
+s__Gracilibacillus_lacisalsi	1	GCF_000377765
+s__Groundnut_ringspot_and_Tomato_chlorotic_spot_virus_reassortant	1	PRJNA66459
+s__Poplar_mosaic_virus	1	PRJNA15056
+s__Slackia_heliotrinireducens	1	GCF_000023885
+s__Haloarcula_hispanica_icosahedral_virus_2	1	PRJNA109269
+s__Squash_leaf_curl_Yunnan_virus	1	PRJNA15194
+s__Datura_leaf_distortion_virus	1	PRJNA176617
+s__Methylotenera_versatilis	2	GCF_000093025	GCF_000384375
+s__Deltapapillomavirus_2	1	PRJNA14073
+s__Nile_crocodilepox_virus	1	PRJNA16798
+s__Alcanivorax_sp_DG881	1	GCF_000155615
+s__Sida_yellow_vein_virus	1	PRJNA14264
+s__Piscine_myocarditis_virus_AL_V_708	1	PRJNA67963
+s__Planococcus_citri_densovirus	1	PRJNA14223
+s__Azospirillum_amazonense	1	GCF_000225995
+s__Menangle_virus	1	PRJNA16205
+s__Thioalkalivibrio_versutus	1	GCF_000374265
+s__Methanohalophilus_mahii	1	GCF_000025865
+s__Xanthomonas_fuscans	2	GCF_000175135	GCF_000175155
+s__Methylocystis_sp_SC2	1	GCF_000304315
+s__Lactobacillus_malefermentans	1	GCF_000260775
+s__Elephantid_herpesvirus_1	1	PRJNA192609
+s__Mouse_mammary_tumor_virus	1	PRJNA14435
+s__Gluconobacter_oxydans	3	GCF_000011685	GCF_000263255	GCF_000311765
+s__Corynebacterium_callunae	1	GCF_000344785
+s__Enterococcus_sp_GMD1E	1	GCF_000296975
+s__Streptococcus_didelphis	1	GCF_000380005
+s__Pseudomonas_phage_B3	1	PRJNA14542
+s__Leuconostoc_phage_P793	1	PRJNA195531
+s__Saimiriine_herpesvirus_2	1	PRJNA14417
+s__Saimiriine_herpesvirus_3	1	PRJNA78947
+s__Corynebacterium_bovis	1	GCF_000183325
+s__Hyperthermophilic_Archaeal_Virus_1	1	PRJNA50363
+s__Garlic_virus_C	1	PRJNA14736
+s__Hyperthermophilic_Archaeal_Virus_2	1	PRJNA50361
+s__Kineococcus_radiotolerans	1	GCF_000017305
+s__Alishewanella_aestuarii	1	GCF_000280055
+s__Whitewater_Arroyo_virus	1	PRJNA29833
+s__Thermococcus_onnurineus	1	GCF_000018365
+s__Natrialba_taiwanensis	1	GCF_000337595
+s__Norwalk_virus	1	PRJNA15520
+s__Staphylococcus_phage_42E	1	PRJNA15268
+s__Baboon_orthoreovirus	1	PRJNA71165
+s__Grapevine_rootstock_stem_lesion_associated_virus	1	PRJNA14880
+s__Pseudomonas_sp_GM80	1	GCF_000282515
+s__Flavobacterium_sp_CF136	1	GCF_000282055
+s__Streptomyces_sp_SM8	1	GCF_000299175
+s__Clostridium_sp_Maddingley_MBC34_26	1	GCF_000309845
+s__Pseudomonas_sp_GM84	1	GCF_000282535
+s__Corynebacterium_tuberculostearicum	1	GCF_000175635
+s__Paenisporosarcina_sp_TG20	1	GCF_000286315
+s__Weeksella_virosa	1	GCF_000189415
+s__Daphne_mosaic_virus	1	PRJNA16794
+s__Plasmodium_vivax	1	GCA_000002415
+s__Candidatus_Burkholderia_kirkii	1	GCF_000234195
+s__Enterococcus_phage_EFRM31	1	PRJNA64607
+s__Candidatus_Zinderia_insecticola	1	GCF_000147015
+s__Saguaro_cactus_virus	1	PRJNA14981
+s__Peptoniphilus_sp_BV3AC2	1	GCF_000478945
+s__Chloroflexus_aggregans	1	GCF_000021945
+s__Alistipes_finegoldii	1	GCF_000265365
+s__Acidobacterium_capsulatum	1	GCF_000022565
+s__Atopobium_parvulum	1	GCF_000024225
+s__Pothos_latent_virus	1	PRJNA15185
+s__Ceratocystis_polonica_partitivirus	1	PRJNA29847
+s__Vibrio_phage_Vf12	1	PRJNA14385
+s__Haemophilus_paraphrohaemolyticus	1	GCF_000260675
+s__Aeropyrum_pernix	1	GCF_000011125
+s__Hepatitis_E_virus	1	PRJNA15435
+s__Enterobacteria_phage_HK97	1	PRJNA14592
+s__Coniothyrium_minitans_RNA_virus	1	PRJNA16142
+s__Avian_orthoreovirus	1	PRJNA62875
+s__Prevotella_paludivivens	1	GCF_000373185
+s__Hydrangea_ringspot_virus	1	PRJNA15151
+s__Porphyromonas_somerae	1	GCF_000372405
+s__Propionibacterium_phage_P100_1	1	PRJNA177536
+s__Chlorobium_phaeovibrioides	1	GCF_000016085
+s__Erwinia_phage_ENT90	1	PRJNA184166
+s__Enterobacteria_phage_T7	1	PRJNA14460
+s__Sulfurimonas_sp_AST_10	1	GCF_000445475
+s__Enterobacteria_phage_T3	1	PRJNA14336
+s__Desulfitobacterium_sp_PCE1	1	GCF_000384015
+s__Mycobacterium_phage_Chah	1	PRJNA32021
+s__Staphylococcus_epidermidis	82	GCF_000304575	GCF_000276285	GCF_000276325	GCF_000418085	GCF_000417945	GCF_000308395	GCF_000205325	GCF_000418125	GCF_000247065	GCF_000245635	GCF_000276065	GCF_000247165	GCF_000418185	GCF_000276185	GCF_000275985	GCF_000276505	GCF_000390365	GCF_000160235	GCF_000257965	GCF_000247045	GCF_000247025	GCF_000247185	GCF_000011925	GCF_000177115	GCF_000247125	GCF_000276145	GCF_000314715	GCF_000276125	GCF_000276085	GCF_000276025	GCF_000276485	GCF_000257945	GCF_0002471 [...]
+s__Weissella_koreensis	2	GCF_000277645	GCF_000219805
+s__Magnaporthe_oryzae_virus_2	1	PRJNA28297
+s__Magnaporthe_oryzae_virus_1	1	PRJNA15041
+s__Pseudomonas_sp_UW4	1	GCF_000316175
+s__Sulfurihydrogenibium_yellowstonense	1	GCF_000173615
+s__Daphne_virus_S	1	PRJNA16749
+s__Malvastrum_leaf_curl_Guangdong_virus	1	PRJNA17593
+s__Siegesbeckia_yellow_vein_virus_associated_DNA_beta	1	PRJNA17269
+s__Basella_rugose_mosaic_virus	1	PRJNA20619
+s__Nostoc_punctiforme	1	GCF_000020025
+s__Nitrosomonas_europaea	1	GCF_000009145
+s__LuIII_virus	1	PRJNA14278
+s__Symbiobacterium_thermophilum	1	GCF_000009905
+s__Micromonospora_lupini	1	GCF_000297395
+s__actinobacterium_SCGC_AAA023_J06	1	GCF_000372265
+s__Desulfosporosinus_sp_OT	1	GCF_000224515
+s__Malvastrum_yellow_vein_Yunnan_virus_satellite_DNA_beta	1	PRJNA14567
+s__Acidovorax_sp_CF316	1	GCF_000276605
+s__Dictyostelium_discoideum	1	GCA_000004695
+s__Pyrobaculum_sp_1860	1	GCF_000234805
+s__Acinetobacter_sp_CIP_102637	1	GCF_000368425
+s__BK_polyomavirus	1	PRJNA14074
+s__Barley_yellow_dwarf_virus_PAS	1	PRJNA14698
+s__Rickettsia_akari	1	GCF_000018205
+s__Pseudomonas_phage_PRR1	1	PRJNA17481
+s__Ageratum_yellow_leaf_curl_betasatellite	1	PRJNA14439
+s__Corynebacterium_timonense	1	GCF_000312345
+s__Sphingomonas_sp_PAMC_26605	1	GCF_000241485
+s__Mycobacterium_phage_L5	1	PRJNA14459
+s__Blastopirellula_marina	1	GCF_000153105
+s__Pseudoalteromonas_sp_BSi20439	1	GCF_000241165
+s__Collinsella_sp_GD3	1	GCF_000333815
+s__Prochlorococcus_phage_Syn1	1	PRJNA64713
+s__Human_gyrovirus_type_1	1	PRJNA67891
+s__Southern_cowpea_mosaic_virus	1	PRJNA15331
+s__Mycobacterium_phage_Phaux	1	PRJNA206025
+s__Bhendi_yellow_vein_India_virus	1	PRJNA61555
+s__Thiovulum_sp_ES	1	GCF_000276965
+s__Blautia_hansenii	1	GCF_000156675
+s__Variola_virus	1	PRJNA15197
+s__Porphyromonas_asaccharolytica	2	GCF_000212375	GCF_000183605
+s__Sphingomonas_sp_Mn802worker	1	GCF_000382485
+s__Exiguobacterium_sp_MH3	1	GCF_000496635
+s__Bacteroides_sp_2_1_7	1	GCF_000157035
+s__Syntrophobotulus_glycolicus	1	GCF_000190635
+s__Salinicoccus_carnicancri	1	GCF_000330705
+s__Oceanicola_sp_S124	1	GCF_000220565
+s__Psychrobacter_phage_pOW20_A	1	PRJNA195475
+s__Halothiobacillus_neapolitanus	1	GCF_000024765
+s__Allofustis_seminis	1	GCF_000374325
+s__Sugarcane_streak_mosaic_virus	1	PRJNA47861
+s__Acinetobacter_sp_NIPH_973	1	GCF_000368065
+s__Citrobacter_sp_30_2	1	GCF_000158355
+s__Escherichia_coli	1472	GCF_000457635	GCF_000408425	GCF_000459975	GCF_000175755	GCF_000303895	GCF_000458215	GCF_000350945	GCF_000352525	GCF_000264195	GCF_000335215	GCF_000358335	GCF_000352005	GCF_000356105	GCF_000462545	GCF_000461195	GCF_000459135	GCF_000458295	GCF_000461955	GCF_000462645	GCF_000267805	GCF_000407765	GCF_000172055	GCF_000320195	GCF_000007445	GCF_000356505	GCF_000340255	GCF_000264135	GCF_000462465	GCF_000181775	GCF_000320075	GCF_000462665	GCF_000303295	GCF_000357865	GCF_0 [...]
+s__Desulfotomaculum_carboxydivorans	1	GCF_000214435
+s__Coconut_foliar_decay_virus	1	PRJNA14067
+s__Melaka_orthoreovirus	1	PRJNA191884
+s__Beet_yellows_virus	1	PRJNA15328
+s__Thermobifida_fusca	2	GCF_000401915	GCF_000012405
+s__Tomato_bushy_stunt_virus_satellite_RNA	1	PRJNA14430
+s__Acinetobacter_sp_NIPH_1859	1	GCF_000369765
+s__Acinetobacter_soli	2	GCF_000368705	GCF_000368725
+s__Acinetobacter_sp_NIPH_817	1	GCF_000368405
+s__Neisseria_sp_oral_taxon_020	1	GCF_000318235
+s__Methylophaga_thiooxydans	1	GCF_000156355
+s__Mycobacterium_intracellulare	5	GCF_000309055	GCF_000277145	GCF_000277125	GCF_000276825	GCF_000172115
+s__Cyanobacterium_aponinum	1	GCF_000317675
+s__Fibrisoma_limi	1	GCF_000296815
+s__Thunberg_fritillary_virus	1	PRJNA15483
+s__Human_parvovirus_4	1	PRJNA15414
+s__Mungbean_yellow_mosaic_India_virus_associated_betasatellite_India_Faizabad_Cow_Pea_2012	1	PRJNA177773
+s__Enterococcus_pallens	2	GCF_000407485	GCF_000393975
+s__Dehalobacter_sp_DCA	1	GCF_000305775
+s__Haemophilus_influenzae	21	GCF_000169775	GCF_000169835	GCF_000210875	GCF_000012185	GCF_000169815	GCF_000027305	GCF_000173315	GCF_000016485	GCF_000165575	GCF_000169855	GCF_000175455	GCF_000175435	GCF_000197875	GCF_000200475	GCF_000169735	GCF_000465255	GCF_000016465	GCF_000173335	GCF_000169795	GCF_000165525	GCF_000169755
+s__Staphylococcus_phage_phiSLT	1	PRJNA14137
+s__Vibrio_cholerae	183	GCF_000237705	GCF_000168915	GCF_000318075	GCF_000299495	GCF_000221485	GCF_000305115	GCF_000221345	GCF_000305075	GCF_000174295	GCF_000303105	GCF_000303125	GCF_000176435	GCF_000152425	GCF_000237445	GCF_000348225	GCF_000348365	GCF_000176455	GCF_000305625	GCF_000153785	GCF_000348385	GCF_000302775	GCF_000330905	GCF_000154005	GCF_000234435	GCF_000220765	GCF_000221425	GCF_000302875	GCF_000279435	GCF_000221385	GCF_000387725	GCF_000387625	GCF_000234395	GCF_000303085	GCF_000 [...]
+s__Dehalobacter_sp_CF	1	GCF_000305815
+s__Yersinia_phage_phiYeO3_12	1	PRJNA14591
+s__Tomato_marchitez_virus	1	PRJNA30365
+s__Candidatus_Pelagibacter_ubique	7	GCF_000153525	GCF_000012345	GCF_000419545	GCF_000372905	GCF_000504225	GCF_000472605	GCF_000384455
+s__Aeromonas_caviae	1	GCF_000208825
+s__Mycoplasma_anatis	1	GCF_000221305
+s__Trypanosoma_brucei	1	GCA_000002445
+s__Bamboo_mosaic_virus	1	PRJNA14728
+s__Chloroflexus_sp_Y_400_fl	1	GCF_000022185
+s__Lactobacillus_phage_PL_1	1	PRJNA227007
+s__Enterorhabdus_caecimuris	1	GCF_000403355
+s__Prevotella_multisaccharivorax	1	GCF_000218235
+s__Halorubrum_terrestre	1	GCF_000337435
+s__Enterococcus_avium	2	GCF_000407245	GCF_000406965
+s__Herbaspirillum_sp_CF444	1	GCF_000282135
+s__Staphylococcus_phage_YMC_09_04_R1988	1	PRJNA227002
+s__Capnocytophaga_canimorsus	1	GCF_000220625
+s__Halococcus_morrhuae	1	GCF_000336695
+s__Thermotoga_petrophila	1	GCF_000016785
+s__Rodent_hepacivirus	1	PRJNA198869
+s__Gordonia_terrae	2	GCF_000390025	GCF_000248035
+s__Colobus_monkey_papillomavirus	1	PRJNA68289
+s__Gordonia_malaquae	1	GCF_000344135
+s__Yokenella_regensburgei	1	GCF_000239335
+s__European_catfish_virus	1	PRJNA167164
+s__Gayfeather_mild_mottle_virus	1	PRJNA34755
+s__Leptospira_alexanderi	1	GCF_000243815
+s__Abutilon_Brazil_virus	1	PRJNA48591
+s__Rahnella_aquatilis	2	GCF_000255535	GCF_000241955
+s__Propionibacterium_sp_CC003_HC2	1	GCF_000221085
+s__Opitutaceae_bacterium_TAV1	1	GCF_000243495
+s__Velvet_bean_severe_mosaic_virus	1	PRJNA41175
+s__Sulfolobus_islandicus_rod_shaped_virus_1	1	PRJNA14514
+s__Sulfolobus_islandicus_rod_shaped_virus_2	1	PRJNA15191
+s__Methanohalobium_evestigatum	1	GCF_000196655
+s__Milk_vetch_dwarf_virus	1	PRJNA14173
+s__Flavobacteriaceae_bacterium_S85	1	GCF_000220525
+s__Propionibacterium_phage_P101A	1	PRJNA177531
+s__Rhizobium_sp_CF142	1	GCF_000281145
+s__Freesia_sneak_virus	1	PRJNA196748
+s__Microcoleus_sp_PCC_7113	1	GCF_000317515
+s__Cycas_necrotic_stunt_virus	1	PRJNA15397
+s__Psychroflexus_torquis	1	GCF_000153485
+s__Shewanella_baltica	9	GCF_000015845	GCF_000231345	GCF_000018765	GCF_000147735	GCF_000179535	GCF_000178875	GCF_000017325	GCF_000215895	GCF_000021665
+s__Plum_pox_virus	1	PRJNA15298
+s__Glaciecola_psychrophila	2	GCF_000347635	GCF_000315075
+s__Yersinia_pseudotuberculosis	4	GCF_000019465	GCF_000047365	GCF_000016945	GCF_000020085
+s__Nitritalea_halalkaliphila	1	GCF_000265075
+s__Agrobacterium_tumefaciens	6	GCF_000233975	GCF_000219665	GCF_000349865	GCF_000236125	GCF_000421945	GCF_000016265
+s__Thioalkalivibrio_sp_ALE6	1	GCF_000364565
+s__Acaricomes_phytoseiuli	1	GCF_000376245
+s__Acidaminococcus_fermentans	1	GCF_000025305
+s__Mycobacterium_vanbaalenii	1	GCF_000015305
+s__Streptococcus_henryi	1	GCF_000376985
+s__Actinobaculum_massiliense	1	GCF_000315465
+s__Hamster_polyomavirus	1	PRJNA14461
+s__Bat_hepatitis_virus	1	PRJNA195535
+s__Canine_bocavirus	1	PRJNA193977
+s__Soybean_chlorotic_spot_virus	1	PRJNA173351
+s__Alcanivorax_pacificus	1	GCF_000299335
+s__Pseudoflavonifractor_capillosus	1	GCF_000169255
+s__Cafeteria_roenbergensis_virus_BV_PW1	1	PRJNA59783
+s__Moraxella_boevrei	1	GCF_000379845
+s__Streptococcus_suis	21	GCF_000167375	GCF_000018185	GCF_000231325	GCF_000471985	GCF_000231905	GCF_000390245	GCF_000231885	GCF_000294495	GCF_000091905	GCF_000344765	GCF_000026725	GCF_000231925	GCF_000231865	GCF_000233575	GCF_000494895	GCF_000014325	GCF_000168355	GCF_000204625	GCF_000014305	GCF_000186405	GCF_000026745
+s__Ruminococcus_gnavus	2	GCF_000169475	GCF_000507805
+s__Corynebacterium_variabile	1	GCF_000179395
+s__Oenococcus_oeni	3	GCF_000372485	GCF_000168955	GCF_000014385
+s__Pediococcus_lolii	1	GCF_000319265
+s__Propionibacterium_sp_KPL2008	1	GCF_000477755
+s__Propionibacterium_sp_KPL2009	1	GCF_000477655
+s__Propionibacterium_sp_KPL2005	1	GCF_000477675
+s__Propionibacterium_sp_KPL2003	1	GCF_000477775
+s__Propionibacterium_sp_KPL2000	1	GCF_000477795
+s__Bacteroides_sp_HPS0048	1	GCF_000382465
+s__Vibrio_sp_AND4	1	GCF_000171815
+s__Streptococcus_sp_M334	1	GCF_000187745
+s__Pea_stem_necrosis_virus	1	PRJNA14894
+s__Eubacterium_ventriosum	1	GCF_000153885
+s__Trichechus_manatus_latirostris_papillomavirus_2	1	PRJNA84405
+s__Prevotella_melaninogenica	2	GCF_000163035	GCF_000144405
+s__Synechococcus_phage_S_CBS4	1	PRJNA82651
+s__Sida_yellow_mosaic_virus	1	PRJNA15496
+s__Rubrobacter_xylanophilus	1	GCF_000014185
+s__Streptomyces_griseus	2	GCF_000010605	GCF_000177175
+s__Enterobacteria_phage_HK578	1	PRJNA183138
+s__Massilia_niastensis	1	GCF_000382345
+s__Shigella_phage_pSf_1	1	PRJNA206484
+s__Dactylococcopsis_salina	1	GCF_000317615
+s__Saccharibacillus_kuerlensis	1	GCF_000378145
+s__Streptococcus_agalactiae	257	GCF_000289455	GCF_000289275	GCF_000310505	GCF_000311645	GCF_000310585	GCF_000289575	GCF_000288655	GCF_000186445	GCF_000289995	GCF_000288695	GCF_000311005	GCF_000288955	GCF_000289695	GCF_000290115	GCF_000311705	GCF_000290215	GCF_000007265	GCF_000310985	GCF_000311245	GCF_000289955	GCF_000311145	GCF_000310825	GCF_000311365	GCF_000311165	GCF_000322625	GCF_000323105	GCF_000322985	GCF_000322845	GCF_000322525	GCF_000289875	GCF_000289015	GCF_000288335	GCF_00031052 [...]
+s__Synechococcus_phage_S_CBS3	1	PRJNA66397
+s__Lachnospiraceae_bacterium_5_1_63FAA	1	GCF_000185525
+s__Vibrio_sp_HENC_02	1	GCF_000305735
+s__Vibrio_sp_HENC_03	1	GCF_000305755
+s__Vibrio_sp_HENC_01	1	GCF_000305715
+s__Enterobacter_sp_MGH_26	1	GCF_000492975
+s__Enterobacter_sp_MGH_24	1	GCF_000493015
+s__Enterobacter_sp_MGH_25	1	GCF_000492995
+s__Sphingomonas_sp_ATCC_31555	1	GCF_000282895
+s__Enterobacter_sp_MGH_23	1	GCF_000493035
+s__Mamastrovirus_13	1	PRJNA15095
+s__Mamastrovirus_10	1	PRJNA14897
+s__Staphylococcus_phage_CNPH82	1	PRJNA18523
+s__candidate_division_TG3_bacterium_ACht1	1	GCF_000474745
+s__Lamium_leaf_distortion_virus	1	PRJNA29877
+s__Lyngbya_aestuarii	1	GCF_000478195
+s__Haloferax_sp_BAB2207	1	GCF_000328285
+s__Sida_golden_mosaic_Honduras_virus	1	PRJNA14263
+s__Streptomyces_himastatinicus	1	GCF_000158915
+s__Ralstonia_phage_RSA1	1	PRJNA19481
+s__Saccharothrix_espanaensis	1	GCF_000328705
+s__Burkholderia_xenovorans	1	GCF_000013645
+s__Turkey_gallivirus	1	PRJNA172458
+s__Deinococcus_gobiensis	1	GCF_000252445
+s__Ludwigia_leaf_distortion_betasatellite	1	PRJNA29233
+s__Verrucomicrobia_bacterium_SCGC_AAA300_O17	1	GCF_000382685
+s__Jatropha_leaf_curl_virus	1	PRJNA31277
+s__Haladaptatus_paucihalophilus	2	GCF_000187225	GCF_000376445
+s__Enterobacteria_phage_vB_EcoP_ACG_C91	1	PRJNA179413
+s__Schlumbergera_virus_X	1	PRJNA33189
+s__Ustilaginoidea_virens_RNA_virus	1	PRJNA213142
+s__Alkaliphilus_oremlandii	1	GCF_000018325
+s__Clostridium_sordellii	2	GCF_000444095	GCF_000444075
+s__Enterococcus_phage_EFAP_1	1	PRJNA36375
+s__Alistipes_putredinis	1	GCF_000154465
+s__Australian_grapevine_viroid	1	PRJNA14976
+s__Halomicrobium_katesii	1	GCF_000379085
+s__Mirabilis_mosaic_virus	1	PRJNA14393
+s__Macrobrachium_rosenbergii_nodavirus	1	PRJNA15129
+s__Sugarcane_streak_Egypt_virus	1	PRJNA14365
+s__Aroa_virus	1	PRJNA18847
+s__Thermus_aquaticus	1	GCF_000173055
+s__Aeromicrobium_sp_JC14	2	GCF_000285435	GCF_000312105
+s__Sulfolobus_spindle_shaped_virus_7	1	PRJNA42357
+s__Vibrio_sp_RC341	1	GCF_000176215
+s__Congregibacter_litoralis	1	GCF_000153125
+s__Sulfolobus_spindle_shaped_virus_2	1	PRJNA14317
+s__Ruegeria_pomeroyi	1	GCF_000011965
+s__Sulfolobus_spindle_shaped_virus_1	1	PRJNA14014
+s__Pelagibaca_bermudensis	1	GCF_000153725
+s__Metallosphaera_sedula	1	GCF_000016605
+s__Streptomyces_sp_LaPpAH_95	1	GCF_000375725
+s__Paramecium_bursaria_Chlorella_virus_NY2A	1	PRJNA20989
+s__Carnation_etched_ring_virus	1	PRJNA14494
+s__Circovirus_like_genome_RW_D	1	PRJNA39623
+s__Burkholderia_cenocepacia	8	GCF_000236215	GCF_000009485	GCF_000333135	GCF_000019505	GCF_000014085	GCF_000203955	GCF_000333155	GCF_000152565
+s__Methanobacterium_sp_AL_21	1	GCF_000191585
+s__Malvastrum_yellow_vein_Yunnan_virus	1	PRJNA15231
+s__Algicola_sagamiensis	1	GCF_000374485
+s__Tsukamurella_sp_1534	1	GCF_000312385
+s__Frog_adenovirus_A	1	PRJNA14488
+s__Bovine_polyomavirus	1	PRJNA14017
+s__Cellulophaga_phage_phi10_1	1	PRJNA212964
+s__Lactobacillus_phage_phiJL_1	1	PRJNA15156
+s__Mesta_yellow_vein_mosaic_virus_associated_DNA_beta	1	PRJNA21015
+s__Blattabacterium_sp_Panesthia_angustipennis_spadica	1	GCF_000348805
+s__Paenisporosarcina_sp_TG_14	1	GCF_000297555
+s__Groundnut_bud_necrosis_virus	1	PRJNA14766
+s__Penicillium_stoloniferum_virus_S	1	PRJNA14950
+s__Massilia_timonae	1	GCF_000315425
+s__Penicillium_stoloniferum_virus_F	1	PRJNA15533
+s__Saccharomonospora_halophila	1	GCF_000383775
+s__Brucella_abortus	136	GCF_000479955	GCF_000366405	GCF_000478665	GCF_000366545	GCF_000369925	GCF_000245835	GCF_000370245	GCF_000370345	GCF_000245915	GCF_000370505	GCF_000370145	GCF_000245875	GCF_000370085	GCF_000370365	GCF_000413615	GCF_000157675	GCF_000480235	GCF_000366345	GCF_000366445	GCF_000366525	GCF_000472245	GCF_000370325	GCF_000366765	GCF_000366325	GCF_000182625	GCF_000366605	GCF_000366665	GCF_000298635	GCF_000370445	GCF_000413755	GCF_000369965	GCF_000370285	GCF_000480115	GCF_00 [...]
+s__Tomato_yellow_leaf_curl_Thailand_betasatellite	1	PRJNA14450
+s__Aichivirus_B	1	PRJNA14948
+s__Aichivirus_C	1	PRJNA82751
+s__WU_Polyomavirus	1	PRJNA19765
+s__Phaeodactylum_tricornutum	1	GCA_000150955
+s__Lactobacillus_parafarraginis	1	GCF_000238835
+s__Blackberry_chlorotic_ringspot_virus	1	PRJNA32707
+s__Gemmatimonas_aurantiaca	1	GCF_000010305
+s__Staphylococcus_phage_phi2958PVL	1	PRJNA32173
+s__Desulfotomaculum_reducens	1	GCF_000016165
+s__Tomato_severe_leaf_curl_virus	1	PRJNA14482
+s__Mycobacterium_phage_Jabbawokkie	1	PRJNA215115
+s__Cellulophaga_phage_phi19_1	1	PRJNA212942
+s__Cellulophaga_phage_phi19_3	1	PRJNA212945
+s__Escherichia_phage_D108	1	PRJNA42515
+s__Propionibacterium_phage_P14_4	1	PRJNA177530
+s__Methanocella_paludicola	1	GCF_000011005
+s__Pseudoalteromonas_atlantica	1	GCF_000014225
+s__Rosellinia_necatrix_partitivirus_2	1	PRJNA188731
+s__Nse_virus	1	PRJNA196420
+s__Calla_lily_latent_virus	1	PRJNA202315
+s__Verrucomicrobia_bacterium_SCGC_AAA164_L15	1	GCF_000285795
+s__Thioalkalivibrio_sulfidophilus	1	GCF_000021985
+s__Oceanobacillus_kimchii	1	GCF_000340475
+s__Rhodospirillum_rubrum	2	GCF_000225955	GCF_000013085
+s__Pasteurella_pneumotropica	1	GCF_000379905
+s__Spinach_curly_top_virus	1	PRJNA14373
+s__Cyanothece_sp_PCC_7424	1	GCF_000021825
+s__Cyanothece_sp_PCC_7425	1	GCF_000022045
+s__Borrelia_bissettii	1	GCF_000222305
+s__Fluviicola_taffensis	1	GCF_000194605
+s__Propionibacterium_sp_5_U_42AFAA	1	GCF_000233555
+s__Streptomyces_sp_PsTaAH_124	1	GCF_000373685
+s__Candidatus_Liberibacter_solanacearum	1	GCF_000183665
+s__Pseudomonas_phage_LBL3	1	PRJNA31053
+s__Terriglobus_saanensis	1	GCF_000179915
+s__Sulfurihydrogenibium_azorense	1	GCF_000021545
+s__Leptospira_kirschneri	25	GCF_000246335	GCF_000246155	GCF_000243655	GCF_000306395	GCF_000306175	GCF_000347215	GCF_000243695	GCF_000244515	GCF_000346895	GCF_000306595	GCF_000347015	GCF_000246675	GCF_000306555	GCF_000246175	GCF_000342725	GCF_000343555	GCF_000243615	GCF_000243855	GCF_000306355	GCF_000306515	GCF_000243915	GCF_000246355	GCF_000347235	GCF_000246295	GCF_000243875
+s__Pseudonocardia_dioxanivorans	1	GCF_000196675
+s__Alternaria_alternata_virus_1	1	PRJNA30367
+s__Tulare_apple_mosaic_virus	1	PRJNA14814
+s__Sphingomonas_sp_PAMC_26621	1	GCF_000251145
+s__Alternanthera_mosaic_virus	1	PRJNA16333
+s__Rickettsia_sibirica	2	GCF_000246715	GCF_000247625
+s__Infectious_hematopoietic_necrosis_virus	1	PRJNA14677
+s__Megasphaera_micronuciformis	1	GCF_000165735
+s__Corynebacterium_aurimucosum	2	GCF_000022905	GCF_000174695
+s__Streptococcus_phage_DT1	1	PRJNA15124
+s__Marvinbryantia_formatexigens	1	GCF_000173815
+s__Papaya_leaf_distortion_mosaic_virus	1	PRJNA15405
+s__SAR324_cluster_bacterium_SCGC_AB_629_J17	1	GCF_000375785
+s__Cotton_leaf_curl_Gezira_virus	1	PRJNA14095
+s__Leptospira_yanagawae	1	GCF_000332475
+s__Bradyrhizobium_sp_DFCI_1	1	GCF_000465325
+s__Ruminococcus_lactaris	2	GCF_000507785	GCF_000155205
+s__Caulobacter_phage_CcrKarma	1	PRJNA179420
+s__Pseudomonas_agarici	1	GCF_000280785
+s__Corynebacterium_nuruki	1	GCF_000213935
+s__Uliginosibacterium_gangwonense	1	GCF_000373965
+s__Dehalobacter_sp_E1	1	GCF_000309295
+s__Clostridium_saccharolyticum	2	GCF_000210535	GCF_000144625
+s__Rhynchosia_golden_mosaic_virus	1	PRJNA14258
+s__Methanococcus_voltae	1	GCF_000006175
+s__Citrus_vein_enation_virus	1	PRJNA209366
+s__Cymbidium_mosaic_virus	1	PRJNA15490
+s__Grapevine_fleck_virus	1	PRJNA15188
+s__HMO_Astrovirus_A	1	PRJNA41413
+s__Spodoptera_frugiperda_multiple_nucleopolyhedrovirus	1	PRJNA18827
+s__Verrucomicrobia_bacterium_SCGC_AAA164_O14	1	GCF_000264605
+s__Torque_teno_virus_19	1	PRJNA48155
+s__Torque_teno_virus_16	1	PRJNA48181
+s__Torque_teno_virus_15	1	PRJNA48191
+s__Torque_teno_virus_14	1	PRJNA48153
+s__Torque_teno_virus_12	1	PRJNA48149
+s__Colorado_tick_fever_virus	1	PRJNA14857
+s__Ranid_herpesvirus_2	1	PRJNA17183
+s__Collinsella_intestinalis	1	GCF_000156175
+s__Acidiphilium_cryptum	1	GCF_000016725
+s__Maritimibacter_alkaliphilus	1	GCF_000152805
+s__Mycobacterium_phage_Wheeler	1	PRJNA215110
+s__Amapari_virus	1	PRJNA28321
+s__Gluconobacter_frateurii	1	GCF_000284875
+s__Candidatus_Blochmannia_floridanus	1	GCF_000043285
+s__Escherichia_phage_KBNP21	1	PRJNA177527
+s__Streptomyces_globisporus	1	GCF_000261345
+s__Arthrobacter_phenanthrenivorans	1	GCF_000189535
+s__Enterococcus_durans	4	GCF_000406985	GCF_000315405	GCF_000350465	GCF_000407265
+s__Pseudoalteromonas_sp_BSi20495	1	GCF_000241185
+s__Roseovarius_sp_TM1035	1	GCF_000170775
+s__Pyramidobacter_piscolens	1	GCF_000177335
+s__Bat_sapovirus_TLC58_HK	1	PRJNA167111
+s__Avibacterium_paragallinarum	1	GCF_000348525
+s__Clostridium_thermocellum	6	GCF_000175715	GCF_000173015	GCF_000015865	GCF_000184925	GCF_000255575	GCF_000255615
+s__Herpetosiphon_aurantiacus	1	GCF_000018565
+s__Eggerthella_sp_1_3_56FAA	1	GCF_000185625
+s__Gremmeniella_abietina_type_B_RNA_virus_XL	1	PRJNA16657
+s__Lone_Star_virus	1	PRJNA203651
+s__Bacillus_sp_NRRL_B_14911	1	GCF_000153365
+s__Halomonas_sp_HAL1	1	GCF_000235725
+s__Herminiimonas_arsenicoxydans	1	GCF_000026125
+s__Watermelon_chlorotic_stunt_virus	1	PRJNA14176
+s__Hyphomicrobium_zavarzinii	1	GCF_000383415
+s__Sida_yellow_mosaic_virus_China_associated_DNA_beta	1	PRJNA15514
+s__Burkholderia_sp_JPY251	1	GCF_000372985
+s__Mesoplasma_florum	2	GCF_000479355	GCF_000008305
+s__Plautia_stali_symbiont	1	GCF_000180175
+s__Laccaria_bicolor	1	GCA_000143565
+s__Escherichia_phage_2_JES_2013	1	PRJNA219124
+s__Planctomyces_maris	1	GCF_000181475
+s__Burkholderia_lata	1	GCF_000012945
+s__Coxiella_burnetii	8	GCF_000007765	GCF_000019885	GCF_000168875	GCF_000019865	GCF_000169495	GCF_000017105	GCF_000018745	GCF_000300315
+s__Jonquetella_sp_BV3C21	1	GCF_000468895
+s__Pseudoalteromonas_phage_H105_1	1	PRJNA64761
+s__Thiomonas_intermedia	1	GCF_000092605
+s__African_green_monkey_simian_foamy_virus	1	PRJNA30095
+s__Ferroplasma_acidarmanus	1	GCF_000152265
+s__Planococcus_antarcticus	1	GCF_000264415
+s__Streptomyces_sp_BoleA5	1	GCF_000373665
+s__Hollyhock_leaf_crumple_virus_satellite_DNA	1	PRJNA14208
+s__Squash_yellow_mild_mottle_virus	1	PRJNA14186
+s__Wallal_virus	1	PRJNA222995
+s__Marinobacter_adhaerens	1	GCF_000166295
+s__Providencia_sneebia	1	GCF_000314895
+s__Marine_RNA_virus_JP_B	1	PRJNA20651
+s__Marine_RNA_virus_JP_A	1	PRJNA20649
+s__Cellvibrio_gilvus	1	GCF_000218545
+s__Microcystis_aeruginosa	12	GCF_000312245	GCF_000330925	GCF_000010625	GCF_000312285	GCF_000312205	GCF_000312265	GCF_000312725	GCF_000312185	GCF_000312225	GCF_000412595	GCF_000312165	GCF_000307995
+s__Chromobacterium_violaceum	1	GCF_000007705
+s__American_hop_latent_virus	1	PRJNA163147
+s__Variovorax_sp_CF313	1	GCF_000282635
+s__Dendrolimus_punctatus_densovirus	1	PRJNA14546
+s__East_African_cassava_mosaic_virus	1	PRJNA15177
+s__Saccharomonospora_cyanea	1	GCF_000244975
+s__Escherichia_sp_TW09308	1	GCF_000208565
+s__Burkholderiales_bacterium_JOSHI_001	1	GCF_000244995
+s__Enterobacter_sp_Ag1	1	GCF_000277545
+s__Cotton_leaf_curl_Multan_betasatellite	1	PRJNA15780
+s__Streptococcus_peroris	1	GCF_000187585
+s__Rudaea_cellulosilytica	1	GCF_000378125
+s__Chronic_bee_paralysis_virus	1	PRJNA29839
+s__Acheta_domestica_densovirus	1	PRJNA15222
+s__Arthrobacter_sp_M2012083	1	GCF_000281065
+s__Mycobacterium_phage_Angel	1	PRJNA38461
+s__Onion_yellow_dwarf_virus	1	PRJNA15407
+s__Paenibacillus_sp_A9	1	GCF_000346635
+s__Moniliophthora_perniciosa	1	GCA_000183025
+s__Burkholderia_sp_H160	1	GCF_000173575
+s__Cypovirus_15	1	PRJNA14102
+s__Cutthroat_trout_virus	1	PRJNA66895
+s__Lacinutrix_sp_5H_3_7_4	1	GCF_000211855
+s__Donggang_virus	1	PRJNA115527
+s__Candidatus_Blochmannia_pennsylvanicus	1	GCF_000011745
+s__Lactobacillus_rhamnosus	13	GCF_000160175	GCF_000235785	GCF_000026525	GCF_000195375	GCF_000418495	GCF_000418475	GCF_000466865	GCF_000311965	GCF_000226235	GCF_000173255	GCF_000235865	GCF_000233755	GCF_000311945
+s__Campylobacter_ureolyticus	2	GCF_000374605	GCF_000413435
+s__Neptuniibacter_caesariensis	1	GCF_000153345
+s__Succinispira_mobilis	1	GCF_000384135
+s__Acinetobacter_bouvetii	2	GCF_000373725	GCF_000368865
+s__Vibrio_sp_EJY3	1	GCF_000241385
+s__Blotched_snakehead_virus	1	PRJNA14921
+s__Mycobacterium_leprae	2	GCF_000195855	GCF_000026685
+s__Cleome_leaf_crumple_virus	1	PRJNA81005
+s__Helicobasidium_mompa_endornavirus_1	1	PRJNA41437
+s__Bacillus_phage_GIL16c	1	PRJNA15164
+s__Brucella_sp_F96_2	1	GCF_000371025
+s__Rhodobacteraceae_bacterium_HTCC2150	1	GCF_000169395
+s__Kazachstania_africana	1	GCA_000304475
+s__Halomonas_phage_phiHAP_1	1	PRJNA28763
+s__Rhodospirillum_photometricum	1	GCF_000284415
+s__Citrus_bent_leaf_viroid	3	PRJNA14903	PRJNA14969	PRJNA14972
+s__Yersinia_aldovae	1	GCF_000173735
+s__Octadecabacter_arcticus	1	GCF_000155735
+s__Listeria_phage_P40	1	PRJNA32073
+s__Papaya_lethal_yellowing_virus	1	PRJNA173050
+s__Brevibacterium_massiliense	1	GCF_000285915
+s__Mycobacterium_phage_Tweety	1	PRJNA20787
+s__Dendrolimus_punctatus_tetravirus	1	PRJNA15120
+s__Pseudomonas_phage_phi15	1	PRJNA63435
+s__Verrucomicrobium_sp_3C	1	GCF_000379365
+s__Leifsonia_sp_109	1	GCF_000380665
+s__Paenibacillus_sp_oral_taxon_786	1	GCF_000159955
+s__Entamoeba_dispar	1	GCA_000209125
+s__Staphylococcus_phage_80alpha	1	PRJNA19749
+s__Cassia_yellow_blotch_virus	1	PRJNA15419
+s__Aquareovirus_A	1	PRJNA16158
+s__Plesiocystis_pacifica	1	GCF_000170895
+s__Propionibacterium_granulosum	2	GCF_000464495	GCF_000463665
+s__Canarypox_virus	1	PRJNA14340
+s__Psychroflexus_gondwanensis	1	GCF_000355905
+s__Aciduliprofundum_boonei	2	GCF_000151085	GCF_000025665
+s__Thioalkalivibrio_sp_AKL10	1	GCF_000381845
+s__Thioalkalivibrio_sp_AKL11	1	GCF_000377845
+s__Thioalkalivibrio_sp_AKL12	1	GCF_000377925
+s__Thioalkalivibrio_sp_AKL17	1	GCF_000377885
+s__Prevotella_sp_oral_taxon_299	1	GCF_000163055
+s__Ruminococcus_obeum	2	GCF_000210015	GCF_000153905
+s__Mycoplasma_sp_G5847	1	GCF_000327395
+s__Sulfolobales_Mexican_fusellovirus_1	1	PRJNA195533
+s__Drosophila_C_virus	1	PRJNA14682
+s__Streptococcus_downei	1	GCF_000180055
+s__Mycobacterium_phage_Faith1	1	PRJNA67415
+s__Aliivibrio_salmonicida	1	GCF_000196495
+s__Avian_paramyxovirus_6	1	PRJNA14719
+s__Avian_paramyxovirus_4	1	PRJNA181250
+s__Promicromonospora_sukumoe	1	GCF_000385135
+s__Caldicellulosiruptor_saccharolyticus	1	GCF_000016545
+s__Mycobacterium_phage_First	1	PRJNA195529
+s__Sphingomonas_echinoides	1	GCF_000241465
+s__Beet_mild_yellowing_virus	1	PRJNA15079
+s__Prevotella_sp_oral_taxon_472	1	GCF_000163495
+s__Prevotella_sp_oral_taxon_473	1	GCF_000318095
+s__Bacteroides_sp_2_1_22	1	GCF_000162155
+s__Black_raspberry_virus_F	1	PRJNA20975
+s__Grapevine_Bulgarian_latent_virus	1	PRJNA66553
+s__Ageratum_yellow_vein_Taiwan_virus	1	PRJNA14249
+s__Marine_group_II_euryarchaeote_SCGC_AAA288_C18	1	GCF_000382765
+s__Pseudomonas_psychrotolerans	1	GCF_000236825
+s__Microbulbifer_agarilyticus	1	GCF_000220505
+s__Psychroflexus_tropicus	1	GCF_000378765
+s__Vibrio_phage_VBM1	1	PRJNA195494
+s__Rodent_pegivirus	1	PRJNA198868
+s__Yunnan_orbivirus	1	PRJNA16242
+s__Helicobacter_fennelliae	1	GCF_000509365
+s__Propionibacterium_phage_PHL112N00	1	PRJNA219110
+s__Alternanthera_yellow_vein_betasatellite	1	PRJNA19833
+s__Brevibacillus_laterosporus	4	GCF_000237005	GCF_000472325	GCF_000219535	GCF_000374385
+s__Acinetobacter_sp_NIPH_2036	1	GCF_000413935
+s__Nitrosomonas_eutropha	1	GCF_000014765
+s__Lily_virus_X	1	PRJNA15494
+s__Mycobacterium_phage_Leo	1	PRJNA209361
+s__Zaire_ebolavirus	1	PRJNA14703
+s__Halorubrum_pleomorphic_virus_2	1	PRJNA157257
+s__Tomato_leaf_curl_Madagascar_virus	1	PRJNA15211
+s__Glypta_fumiferanae_ichnovirus	1	PRJNA18767
+s__Lactobacillus_vaginalis	1	GCF_000159435
+s__Bacillus_phage_Bastille	1	PRJNA177550
+s__Gluconacetobacter_diazotrophicus	2	GCF_000021325	GCF_000067045
+s__Thioalkalivibrio_sp_ALJ20	1	GCF_000378585
+s__Thioalkalivibrio_sp_ALJ21	1	GCF_000378605
+s__Sedimentibacter_sp_B4	1	GCF_000309315
+s__Thioalkalivibrio_sp_ALJ24	1	GCF_000377785
+s__Pseudomonas_phage_F8	1	PRJNA16388
+s__Shigella_phage_Sf6	1	PRJNA14498
+s__Sulfitobacter_sp_NAS_14_1	1	GCF_000152645
+s__Chlamydophila_abortus	2	GCF_000026025	GCF_000213905
+s__Pseudomonas_resinovorans	1	GCF_000412695
+s__Human_coronavirus_229E	1	PRJNA14913
+s__Slackia_piriformis	1	GCF_000296445
+s__Crassocephalum_yellow_vein_virus	1	PRJNA18659
+s__Strawberry_latent_ringspot_virus_satellite_RNA	1	PRJNA15155
+s__Citrus_viroid_VI	1	PRJNA42701
+s__CAS_virus	1	PRJNA173353
+s__Staphylococcus_xylosus	1	GCF_000338275
+s__Pseudomonas_denitrificans	1	GCF_000349845
+s__Porphyromonas_crevioricanis	1	GCF_000509245
+s__Bacillus_phage_phi105	1	PRJNA14217
+s__Streptomyces_phage_phiBT1	1	PRJNA14276
+s__Yersinia_phage_phi80_18	1	PRJNA184145
+s__Pseudomonas_sp_R81	1	GCF_000257625
+s__Chicken_anemia_virus	1	PRJNA15484
+s__Escherichia_phage_HK639	1	PRJNA76729
+s__Mycobacterium_orygis	1	GCF_000353205
+s__Streptomyces_hygroscopicus	2	GCF_000340845	GCF_000245355
+s__Mycobacterium_phage_PegLeg	1	PRJNA206038
+s__Barfin_flounder_nervous_necrosis_virus	1	PRJNA41605
+s__Rhodococcus_phage_RER2	1	PRJNA81173
+s__Mycobacterium_phage_Ramsey	1	PRJNA32019
+s__Simian_adenovirus_20	1	PRJNA192869
+s__Candidatus_Poribacteria_sp_WGA_4E	1	GCF_000372285
+s__Halorubrum_tebenquichense	1	GCF_000337415
+s__Crenarchaeota_archaeon_SCGC_AAA471_O08	1	GCF_000398765
+s__Cymbidium_ringspot_virus	1	PRJNA15066
+s__Pseudomonas_phage_phiIBB_PF7A	1	PRJNA64561
+s__Chryseobacterium_taeanense	1	GCF_000304615
+s__Cupriavidus_basilensis	2	GCF_000282815	GCF_000243095
+s__Methanolinea_tarda	1	GCF_000235685
+s__Pseudoxanthomonas_suwonensis	1	GCF_000185965
+s__Dialister_micraerophilus	2	GCF_000183445	GCF_000194985
+s__Actinomyces_sp_ICM39	1	GCF_000282935
+s__Mycoplasma_cynos	1	GCF_000328725
+s__Cherry_leaf_roll_virus	1	PRJNA66187
+s__Singapore_grouper_iridovirus	1	PRJNA14544
+s__Halovirus_HRTV_7	1	PRJNA206491
+s__Halovirus_HRTV_5	1	PRJNA206492
+s__Tomato_yellow_leaf_curl_Sardinia_virus	1	PRJNA14484
+s__Bradyrhizobium_sp_ORS_375	1	GCF_000239775
+s__Saprospira_grandis	2	GCF_000275825	GCF_000250635
+s__Halovirus_HRTV_8	1	PRJNA206490
+s__Brucella_microti	1	GCF_000022745
+s__Tomato_chlorosis_virus	1	PRJNA15587
+s__Ramlibacter_tataouinensis	1	GCF_000215705
+s__Tomato_leaf_curl_Togo_virus	1	PRJNA34813
+s__Nevskia_ramosa	1	GCF_000420645
+s__Clostridium_arbusti	1	GCF_000246895
+s__Shinella_zoogloeoides	1	GCF_000496935
+s__Cyanophage_PP	1	PRJNA227004
+s__Human_picobirnavirus	1	PRJNA15248
+s__Desulfovibrio_piger	1	GCF_000156375
+s__Rhinolophus_bat_coronavirus_HKU2	1	PRJNA27911
+s__Pseudomonas_sp_UK4	1	GCF_000174915
+s__Syntrophobacter_fumaroxidans	1	GCF_000014965
+s__Turicella_otitidis	2	GCF_000296405	GCF_000297795
+s__Megasphaera_sp_NM10	1	GCF_000417505
+s__Nocardiopsis_prasina	1	GCF_000341265
+s__Candidatus_Sulcia_muelleri	5	GCF_000017525	GCF_000022945	GCF_000025785	GCF_000168155	GCF_000147035
+s__Leuconostoc_gelidum	2	GCF_000298875	GCF_000166715
+s__Sporomusa_ovata	1	GCF_000445445
+s__Cenarchaeum_symbiosum	1	GCF_000200715
+s__Pseudoalteromonas_phage_PM2	1	PRJNA14237
+s__Rhopalosiphum_padi_virus	1	PRJNA14648
+s__Enterobacteria_phage_ST104	1	PRJNA14499
+s__Mariprofundus_ferrooxydans	2	GCF_000379405	GCF_000153765
+s__Sida_leaf_curl_virus_associated_DNA_beta	1	PRJNA16226
+s__Amasya_cherry_disease_associated_mycovirus	1	PRJNA15010
+s__Polaromonas_sp_CF318	1	GCF_000282655
+s__Ruminococcus_torques	2	GCF_000153925	GCF_000210035
+s__Treponema_socranskii	3	GCF_000413015	GCF_000468115	GCF_000464455
+s__Mycobacterium_phage_Che9c	1	PRJNA14271
+s__Mycobacterium_phage_Che9d	1	PRJNA14339
+s__Coccidioides_immitis	1	GCA_000149335
+s__Staphylothermus_marinus	1	GCF_000015945
+s__Streptomyces_collinus	1	GCF_000444875
+s__Brevibacillus_borstelensis	1	GCF_000353565
+s__Peanut_witches_broom_phytoplasma	1	GCF_000364425
+s__Bacteroides_uniformis	4	GCF_000403175	GCF_000273785	GCF_000154205	GCF_000273275
+s__Cucurbit_chlorotic_yellows_virus	1	PRJNA170929
+s__Streptomyces_sp_AA4	1	GCF_000158875
+s__Cabbage_leaf_curl_virus	1	PRJNA14187
+s__Strawberry_mottle_virus	1	PRJNA14740
+s__Candidatus_Saccharimonas_aalborgensis	1	GCF_000392435
+s__Salmonella_phage_ST160	1	PRJNA61857
+s__Amycolatopsis_decaplanina	1	GCF_000342005
+s__Synechocystis_sp_PCC_6803	3	GCF_000340785	GCF_000270265	GCF_000009725
+s__Fangia_hongkongensis	1	GCF_000379445
+s__Cellulomonas_sp_JC225	1	GCF_000312005
+s__UR2_sarcoma_virus	1	PRJNA15322
+s__Anoxybacillus_kamchatkensis	1	GCF_000283415
+s__Tobacco_leaf_curl_Zimbabwe_virus	1	PRJNA14119
+s__Saccharum_streak_virus	1	PRJNA41611
+s__Cotton_leafroll_dwarf_virus	1	PRJNA53497
+s__Hymenobacter_norwichensis	1	GCF_000420705
+s__Starling_circovirus	1	PRJNA16796
+s__Simian_adenovirus_B	1	PRJNA64487
+s__Simian_adenovirus_C	1	PRJNA200956
+s__Simian_adenovirus_A	1	PRJNA14491
+s__Caladenia_virus_A	1	PRJNA174779
+s__Enterococcus_asini	2	GCF_000393955	GCF_000407365
+s__Lymphocystis_disease_virus_1	1	PRJNA14081
+s__Agrobacterium_vitis	1	GCF_000016285
+s__Yam_mild_mosaic_virus	1	PRJNA179432
+s__Staphylococcus_phage_K	1	PRJNA14479
+s__Paludibacter_propionicigenes	1	GCF_000183135
+s__Micromonas_sp_RCC1109_virus_MpV1	1	PRJNA61013
+s__Rosellinia_necatrix_megabirnavirus_1	1	PRJNA41609
+s__Cotton_leaf_curl_Gezira_betasatellite	1	PRJNA15166
+s__Fig_fleck_associated_virus	1	PRJNA64495
+s__Cauliflower_mosaic_virus	1	PRJNA14574
+s__Microbacterium_sp_TS_1	1	GCF_000509385
+s__Nupapillomavirus_1	1	PRJNA15485
+s__Helicobacter_acinonychis	1	GCF_000009305
+s__Erythrobacter_sp_NAP1	1	GCF_000152865
+s__Vaccinia_virus	1	PRJNA15241
+s__Murine_norovirus	1	PRJNA17577
+s__Kennedya_yellow_mosaic_virus	1	PRJNA14644
+s__Chromobacterium_sp_C_61	1	GCF_000285415
+s__Blautia_hydrogenotrophica	1	GCF_000157975
+s__Methylophilus_sp_1	1	GCF_000374225
+s__Streptomyces_sp_CNS615	1	GCF_000365385
+s__Potato_yellow_mosaic_virus	1	PRJNA14065
+s__Acinetobacter_sp_CIP_53_82	1	GCF_000369465
+s__Pantoea_sp_GM01	1	GCF_000282675
+s__Chaetomium_globosum	1	GCA_000143365
+s__Brucella_sp_F5_06	1	GCF_000370985
+s__Geobacillus_sp_A8	1	GCF_000447395
+s__Synechococcus_phage_S_SM1	1	PRJNA64701
+s__Synechococcus_phage_S_SM2	1	PRJNA64695
+s__Salmonella_phage_Vi_II_E1	1	PRJNA29079
+s__Spirochaeta_bajacaliforniensis	1	GCF_000378205
+s__Sphingobium_japonicum	1	GCF_000091125
+s__Clostridium_hiranonis	1	GCF_000156055
+s__Streptomyces_phage_mu1_6	1	PRJNA16706
+s__East_African_cassava_mosaic_Malawi_virus	1	PRJNA226083
+s__Marinimicrobia_bacterium_JGI_0000059_L03	1	GCF_000365325
+s__Chickpea_chlorotic_dwarf_virus	2	PRJNA28581	PRJNA30715
+s__Methanocaldococcus_vulcanius	1	GCF_000024625
+s__Burkholderiales_bacterium_1_1_47	1	GCF_000144975
+s__Roseburia_hominis	1	GCF_000225345
+s__Rhopapillomavirus_1	1	PRJNA14545
+s__Actinoplanes_friuliensis	1	GCF_000494755
+s__Caldicellulosiruptor_hydrothermalis	1	GCF_000166355
+s__Shewanella_pealeana	1	GCF_000018285
+s__Nocardia_brasiliensis	1	GCF_000250675
+s__Paenibacillus_phage_PG1	1	PRJNA209208
+s__Rabbit_vesivirus	1	PRJNA18289
+s__Pelargonium_necrotic_spot_virus	1	PRJNA15214
+s__Tomato_leaf_curl_Mali_virus	1	PRJNA14349
+s__Coprobacillus_sp_D6	1	GCF_000269565
+s__Coprobacillus_sp_D7	1	GCF_000158555
+s__Tomato_leaf_curl_Palampur_virus	1	PRJNA30181
+s__Barley_yellow_dwarf_virus	1	PRJNA208540
+s__Sphingomonas_phage_PAU	1	PRJNA181225
+s__Feline_immunodeficiency_virus	1	PRJNA15029
+s__Bacteroides_pectinophilus	1	GCF_000155855
+s__Halorubrum_aidingense	1	GCF_000336995
+s__Corchorus_yellow_vein_virus	1	PRJNA14563
+s__Leptotrichia_hofstadii	1	GCF_000162955
+s__Rhodococcus_ruber	2	GCF_000341965	GCF_000347955
+s__Halastavi_arva_RNA_virus	1	PRJNA77939
+s__Escherichia_hermannii	1	GCF_000248015
+s__Barfin_flounder_virus_BF93Hok	1	PRJNA30741
+s__Amycolatopsis_alba	1	GCF_000384215
+s__Candidatus_Midichloria_mitochondrii	1	GCF_000219355
+s__Lawsonia_intracellularis	2	GCF_000331715	GCF_000055945
+s__Mycobacterium_xenopi	1	GCF_000257745
+s__Bartonella_quintana	2	GCF_000294715	GCF_000046685
+s__Pseudomonas_phage_PA11	1	PRJNA16386
+s__Cupriavidus_necator	2	GCF_000009285	GCF_000219215
+s__Desulfovibrio_vulgaris	4	GCF_000166115	GCF_000021385	GCF_000195755	GCF_000015485
+s__actinobacterium_SCGC_AAA278_I18	1	GCF_000378865
+s__Cronobacter_sakazakii	7	GCF_000263215	GCF_000319615	GCF_000017665	GCF_000339015	GCF_000319595	GCF_000214745	GCF_000316155
+s__Bat_coronavirus_1B	1	PRJNA29249
+s__Pseudogulbenkiania_ferrooxidans	2	GCF_000174355	GCF_000462205
+s__Bovine_respiratory_coronavirus_bovine_US_OH_440_TC_1996	1	PRJNA39333
+s__Bifidobacterium_catenulatum	1	GCF_000173455
+s__Melanoplus_sanguinipes_entomopoxvirus	1	PRJNA14042
+s__Borrelia_duttonii	1	GCF_000019685
+s__Bizionia_argentinensis	1	GCF_000224335
+s__Tomato_leaf_curl_Guangxi_virus	1	PRJNA17607
+s__Cronobacter_condimenti	1	GCF_000319285
+s__Paenibacillus_mucilaginosus	3	GCF_000250655	GCF_000258535	GCF_000218915
+s__Alishewanella_jeotgali	1	GCF_000245735
+s__Hyphomicrobium_denitrificans	2	GCF_000230975	GCF_000143145
+s__Ophiostoma_mitovirus_6	1	PRJNA14844
+s__Ophiostoma_mitovirus_5	1	PRJNA14843
+s__Mossman_virus	1	PRJNA14915
+s__Gossypium_mustilinum_symptomless_alphasatellite	1	PRJNA39591
+s__Leptospira_noguchii	9	GCF_000244775	GCF_000216255	GCF_000350585	GCF_000350605	GCF_000346655	GCF_000306195	GCF_000243535	GCF_000243575	GCF_000306255
+s__Pelargonium_vein_banding_virus	1	PRJNA40631
+s__Eubacterium_eligens	1	GCF_000146185
+s__Botrytis_virus_X	1	PRJNA14947
+s__Botrytis_virus_F	1	PRJNA14707
+s__Legionella_anisa	1	GCF_000333755
+s__Clostridium_sp_7_3_54FAA	1	GCF_000233515
+s__Xanthomonas_fragariae	1	GCF_000376745
+s__Hepatitis_GB_virus_B	1	PRJNA15364
+s__Archaeal_BJ1_virus	1	PRJNA18503
+s__Citrus_sudden_death_associated_virus	1	PRJNA15170
+s__Brome_mosaic_virus	1	PRJNA15052
+s__Corynebacterium_sp_KPL1814	1	GCF_000478175
+s__Corynebacterium_sp_KPL1817	1	GCF_000478155
+s__Methanococcus_vannielii	1	GCF_000017165
+s__Clostridium_phage_phiCPV4	1	PRJNA169231
+s__Emticicia_oligotrophica	1	GCF_000263195
+s__Corynebacterium_sp_KPL1818	1	GCF_000478135
+s__Sweet_potato_chlorotic_stunt_virus	1	PRJNA14848
+s__Cryptobacterium_curtum	1	GCF_000023845
+s__Cell_fusing_agent_virus	1	PRJNA15326
+s__Salmonella_phage_SPN3UB	1	PRJNA181984
+s__Dolichos_yellow_mosaic_virus	1	PRJNA14344
+s__Eubacterium_sp_AS15	1	GCF_000287695
+s__Haloferax_alexandrinus	1	GCF_000336735
+s__Pseudomonas_taiwanensis	1	GCF_000500605
+s__Ruegeria_mobilis	1	GCF_000376545
+s__Aeromonas_media	1	GCF_000287215
+s__Saccharomonospora_saliphila	1	GCF_000383795
+s__Ajellomyces_dermatitidis	1	GCA_000003855
+s__Streptomyces_bingchenggensis	1	GCF_000092385
+s__Thalassomonas_phage_BA3	1	PRJNA27903
+s__Swinepox_virus	1	PRJNA14155
+s__Micromonospora_aurantiaca	1	GCF_000145235
+s__Bartonella_rattaustraliani	1	GCF_000312565
+s__Bombyx_mori_nucleopolyhedrovirus	2	PRJNA14089	PRJNA37971
+s__Rosellinia_necatrix_virus_1	1	PRJNA16156
+s__Desulfotomaculum_gibsoniae	1	GCF_000233715
+s__Salmonella_phage_FSL_SP_088	1	PRJNA212711
+s__Uncinocarpus_reesii	1	GCA_000003515
+s__Porcine_adenovirus_C	2	PRJNA14521	PRJNA40317
+s__Streptococcus_pyogenes	45	GCF_000499145	GCF_000483505	GCF_000013545	GCF_000012165	GCF_000013525	GCF_000011665	GCF_000499265	GCF_000013485	GCF_000483605	GCF_000263315	GCF_000454125	GCF_000250925	GCF_000018125	GCF_000483585	GCF_000275625	GCF_000007285	GCF_000011285	GCF_000290575	GCF_000444035	GCF_000483565	GCF_000483525	GCF_000011765	GCF_000006785	GCF_000230295	GCF_000444015	GCF_000290595	GCF_000468795	GCF_000250905	GCF_000483645	GCF_000483625	GCF_000422045	GCF_000499245	GCF_000499165	G [...]
+s__Rhizobium_sp_42MFCr_1	1	GCF_000377185
+s__Serratia_marcescens	5	GCF_000465615	GCF_000330865	GCF_000292365	GCF_000264275	GCF_000342205
+s__Porcine_bocavirus_3	1	PRJNA73547
+s__Okra_leaf_curl_India_virus	1	PRJNA61559
+s__Tomato_mosaic_virus	1	PRJNA14926
+s__Paenibacillus_vortex	1	GCF_000193415
+s__Halomonas_elongata	1	GCF_000196875
+s__Temperate_phage_phiNIH1_1	1	PRJNA14145
+s__Mobala_virus	1	PRJNA16582
+s__Eubacterium_ramulus	1	GCF_000469345
+s__Canna_yellow_streak_virus	1	PRJNA40629
+s__Streptomyces_griseoflavus	1	GCF_000158975
+s__Klebsiella_sp_1_1_55	1	GCF_000163075
+s__Snakehead_retrovirus	1	PRJNA14701
+s__Streptococcus_phage_SM1	1	PRJNA14295
+s__Neisseria_elongata	1	GCF_000176755
+s__Chaetoceros_lorenzianus_DNA_Virus	1	PRJNA63565
+s__Spirulina_subsalsa	1	GCF_000314005
+s__Thermobispora_bispora	1	GCF_000092645
+s__Salisaeta_longa	1	GCF_000419585
+s__Simian_sapelovirus	1	PRJNA14946
+s__Thetapapillomavirus_1	1	PRJNA14195
+s__Marinobacter_nanhaiticus	1	GCF_000364845
+s__Pseudomonas_phage_DMS3	1	PRJNA18521
+s__Aeromonas_phage_65	1	PRJNA64543
+s__Capnocytophaga_sp_CM59	1	GCF_000293175
+s__Marinobacter_hydrocarbonoclasticus	2	GCF_000015365	GCF_000284615
+s__Lactobacillus_rossiae	1	GCF_000277855
+s__Lodderomyces_elongisporus	1	GCA_000149685
+s__Dickeya_solani	1	GCF_000400565
+s__Myxococcus_phage_Mx8	1	PRJNA14391
+s__Soybean_dwarf_virus	1	PRJNA14715
+s__Streptococcus_mutans	145	GCF_000339395	GCF_000091645	GCF_000229045	GCF_000007465	GCF_000229645	GCF_000339835	GCF_000229745	GCF_000229465	GCF_000229025	GCF_000339935	GCF_000228865	GCF_000229325	GCF_000284575	GCF_000339795	GCF_000339195	GCF_000230025	GCF_000229165	GCF_000339355	GCF_000230045	GCF_000339155	GCF_000339695	GCF_000340055	GCF_000229905	GCF_000347795	GCF_000339475	GCF_000229725	GCF_000339575	GCF_000229425	GCF_000230005	GCF_000347835	GCF_000339815	GCF_000496535	GCF_000229065	GC [...]
+s__Streptococcus_phage_SMP	1	PRJNA18529
+s__Pseudomonas_phage_D3	1	PRJNA14500
+s__Actinomyces_sp_ph3	1	GCF_000308055
+s__Gordonia_sihwensis	1	GCF_000333035
+s__Serratia_sp_ATCC_39006	1	GCF_000463345
+s__Bacillus_sp_37MA	1	GCF_000372765
+s__Mycobacterium_phage_Phrux	1	PRJNA206029
+s__Candidatus_Regiella_insecticola	2	GCF_000284655	GCF_000143625
+s__Myzus_persicae_densovirus	1	PRJNA14299
+s__Weissella_paramesenteroides	1	GCF_000160575
+s__Deinococcus_apachensis	1	GCF_000381345
+s__Sporichthya_polymorpha	1	GCF_000384115
+s__Borrelia_hermsii	1	GCF_000012065
+s__Nocardiopsis_halotolerans	1	GCF_000341065
+s__Enterobacteria_phage_mEpX2	1	PRJNA183150
+s__Enterobacteria_phage_mEpX1	1	PRJNA183149
+s__Candidatus_Arthromitus_sp_SFB_co	1	GCF_000252765
+s__Aeromonas_phage_phiO18P	1	PRJNA19769
+s__Selenomonas_sp_CM52	1	GCF_000292955
+s__Streptococcus_phage_phi3396	1	PRJNA18859
+s__Penicillium_chrysogenum	1	GCA_000226395
+s__Rickettsia_prowazekii	12	GCF_000277265	GCF_000367405	GCF_000277225	GCF_000277245	GCF_000277165	GCF_000363905	GCF_000385495	GCF_000277185	GCF_000277205	GCF_000385475	GCF_000195735	GCF_000022785
+s__Loktanella_vestfoldensis	2	GCF_000152785	GCF_000382265
+s__Rhodonellum_psychrophilum	2	GCF_000473765	GCF_000381545
+s__Bacillus_phage_B4	1	PRJNA177520
+s__Lactobacillus_equicursoris	1	GCF_000312645
+s__Candidatus_Odyssella_thessalonicensis	1	GCF_000190415
+s__Streptococcus_ictaluri	1	GCF_000188015
+s__Omikronpapillomavirus_1	1	PRJNA15186
+s__Neisseria_polysaccharea	1	GCF_000176735
+s__Vibrio_phage_SIO_2	1	PRJNA80921
+s__Cherry_virus_A	1	PRJNA15080
+s__Raptor_adenovirus_A	1	PRJNA66343
+s__Chandipura_virus	1	PRJNA194137
+s__Archaeoglobus_sulfaticallidus	1	GCF_000385565
+s__Rhodobacter_phage_RcapNL	1	PRJNA192926
+s__Phlox_Virus_B	1	PRJNA27905
+s__Staphylococcus_phage_JD007	1	PRJNA183162
+s__Segniliparus_rugosus	1	GCF_000185725
+s__Tomato_yellow_leaf_curl_China_betasatellite	1	PRJNA181248
+s__Hop_stunt_viroid	1	PRJNA14720
+s__Malvastrum_yellow_mosaic_Cameroon_alphasatellite	1	PRJNA61909
+s__Cronobacter_turicensis	2	GCF_000319515	GCF_000027065
+s__Rickettsia_rhipicephali	1	GCF_000284075
+s__Leptospira_weilii	9	GCF_000243595	GCF_000217475	GCF_000332415	GCF_000246655	GCF_000243995	GCF_000244355	GCF_000244815	GCF_000246635	GCF_000216315
+s__Acetobacterium_woodii	1	GCF_000247605
+s__Lactobacillus_gastricus	1	GCF_000247775
+s__Saccharomonospora_viridis	1	GCF_000023865
+s__Campylobacter_gracilis	1	GCF_000175875
+s__Common_moorhen_coronavirus_HKU21	1	PRJNA109281
+s__Synechococcus_sp_WH_8102	1	GCF_000195975
+s__Mycoreovirus_3	1	PRJNA16143
+s__Methanoplanus_petrolearius	1	GCF_000147875
+s__Strawberry_vein_banding_virus	1	PRJNA15207
+s__Synechococcus_sp_WH_8109	1	GCF_000161795
+s__Escherichia_phage_phAPEC8	1	PRJNA185315
+s__Desulfotomaculum_kuznetsovii	1	GCF_000214705
+s__Dehalogenimonas_lykanthroporepellens	1	GCF_000143165
+s__Thermococcus_sp_CL1	1	GCF_000265525
+s__Thermaerobacter_subterraneus	1	GCF_000183545
+s__Enterococcus_dispar	2	GCF_000406945	GCF_000407585
+s__Synechococcus_sp_RS9917	1	GCF_000153065
+s__Synechococcus_sp_RS9916	1	GCF_000153825
+s__gamma_proteobacterium_HTCC2207	1	GCF_000153445
+s__Nitrosomonas_sp_AL212	1	GCF_000175095
+s__Sunflower_mild_mosaic_virus	1	PRJNA198478
+s__Lachnospiraceae_bacterium_3_1_46FAA	1	GCF_000209405
+s__Williamsia_sp_D3	1	GCF_000506245
+s__Bacillus_macauensis	1	GCF_000269865
+s__Thioalkalivibrio_sp_ALRh	1	GCF_000381425
+s__Hyperthermus_butylicus	1	GCF_000015145
+s__Thermodesulfobacterium_thermophilum	1	GCF_000421605
+s__Bilophila_sp_4_1_30	1	GCF_000224655
+s__Tannerella_sp_6_1_58FAA_CT1	1	GCF_000238695
+s__African_horse_sickness_virus	1	PRJNA14937
+s__Reticuloendotheliosis_virus	1	PRJNA15145
+s__Glaciecola_polaris	1	GCF_000315055
+s__Porcine_reproductive_and_respiratory_syndrome_virus	1	PRJNA15437
+s__Johnsongrass_mosaic_virus	1	PRJNA15349
+s__Haloterrigena_salina	1	GCF_000337495
+s__Granulicella_mallensis	1	GCF_000178955
+s__Chikungunya_virus	1	PRJNA14998
+s__Porcine_rubulavirus	1	PRJNA20055
+s__Burkholderia_sp_KJ006	1	GCF_000262695
+s__Tanapox_virus	2	PRJNA14595	PRJNA20981
+s__SAR324_cluster_bacterium_SCGC_AB_629_O05	1	GCF_000375805
+s__Leuconostoc_mesenteroides	4	GCF_000160595	GCF_000447945	GCF_000234825	GCF_000014445
+s__Selenomonas_sp_oral_taxon_138	1	GCF_000318175
+s__Balneola_vulgaris	1	GCF_000375465
+s__Selenomonas_sp_oral_taxon_137	1	GCF_000183625
+s__Eremococcus_coleocola	1	GCF_000183205
+s__Thauera_selenatis	1	GCF_000284915
+s__Cryptococcus_neoformans	2	GCA_000091045	GCA_000149385
+s__Neisseria_weaveri	2	GCF_000224255	GCF_000224275
+s__Roseiflexus_castenholzii	1	GCF_000017805
+s__Streptococcus_sp_M143	1	GCF_000162495
+s__Streptomyces_phage_Sujidade	1	PRJNA206036
+s__Erythrobacter_sp_SD_21	1	GCF_000181515
+s__Ammonifex_degensii	1	GCF_000024605
+s__Plasmodium_chabaudi	1	GCA_000003075
+s__Staphylococcus_pseudintermedius	3	GCF_000189495	GCF_000390045	GCF_000185885
+s__Leptothrix_ochracea	1	GCF_000262525
+s__Pepper_mottle_virus	1	PRJNA15312
+s__Gordonia_bronchialis	1	GCF_000024785
+s__Campylobacter_sp_03_427	1	GCF_000495505
+s__Allobaculum_stercoricanis	1	GCF_000384195
+s__Mycoplasma_leachii	2	GCF_000183365	GCF_000253095
+s__Beet_curly_top_virus	1	PRJNA14366
+s__Lamprocystis_purpurea	1	GCF_000379525
+s__Moorella_thermoacetica	1	GCF_000013105
+s__Opitutus_terrae	1	GCF_000019965
+s__Candidatus_Koribacter_versatilis	1	GCF_000014005
+s__Rangifer_tarandus_papillomavirus_2	1	PRJNA214364
+s__Pseudoplusia_includens_densovirus	1	PRJNA181249
+s__Malvastrum_leaf_curl_betasatellite	2	PRJNA16301	PRJNA16320
+s__Staphylococcus_sp_MDS7B	1	GCF_000387985
+s__Clostridium_sporosphaeroides	1	GCF_000383295
+s__Methanocorpusculum_labreanum	1	GCF_000015765
+s__Prevotella_bergensis	1	GCF_000160535
+s__Nocardiopsis_kunsanensis	1	GCF_000340965
+s__Peanut_mottle_virus	1	PRJNA15352
+s__Taterapox_virus	1	PRJNA17483
+s__Scotophilus_bat_coronavirus_512	1	PRJNA20135
+s__Arthrobacter_arilaitensis	1	GCF_000197735
+s__Thioalkalivibrio_sp_ALM2T	1	GCF_000381505
+s__Hyphomonas_neptunium	1	GCF_000013025
+s__Yam_bean_mosaic_virus	1	PRJNA78927
+s__Acidovorax_sp_KKS102	1	GCF_000302535
+s__Melon_necrotic_spot_virus	1	PRJNA15502
+s__Sweet_potato_leaf_curl_Shanghai_virus	1	PRJNA217878
+s__Streptococcus_phage_SP_QS1	1	PRJNA213015
+s__Propionibacterium_sp_KPL1838	1	GCF_000477735
+s__Mahella_australiensis	1	GCF_000213255
+s__Pectobacterium_phage_My1	1	PRJNA177525
+s__Spring_beauty_latent_virus	1	PRJNA15009
+s__Listeria_phage_LP_110	1	PRJNA212944
+s__Betacoronavirus_Erinaceus_VMC_DEU_2012	1	PRJNA226084
+s__Rheinheimera_perlucida	1	GCF_000382165
+s__Faba_bean_necrotic_yellows_virus	1	PRJNA14427
+s__Pumpkin_yellow_mosaic_virus	1	PRJNA30159
+s__Bacillus_sp_105MF	1	GCF_000374885
+s__Nitratireductor_pacificus	1	GCF_000300335
+s__Staphylococcus_phage_tp310_2	1	PRJNA20661
+s__Staphylococcus_phage_tp310_3	1	PRJNA20663
+s__Streptomyces_purpureus	1	GCF_000384175
+s__Comamonas_sp_B_9	1	GCF_000410635
+s__Tomato_planta_macho_viroid	1	PRJNA15000
+s__Cyanothece_sp_CCY0110	1	GCF_000169335
+s__Parabacteroides_johnsonii	2	GCF_000156495	GCF_000307375
+s__Borrelia_crocidurae	1	GCF_000259345
+s__Eggerthella_sp_HGA1	1	GCF_000191845
+s__Xenorhabdus_bovienii	1	GCF_000027225
+s__Simonsiella_muelleri	1	GCF_000163775
+s__Marinimicrobia_bacterium_SCGC_AAA160_C11	1	GCF_000402795
+s__Candidatus_Hamiltonella_defensa	1	GCF_000021705
+s__Luna_virus	1	PRJNA76617
+s__Erwinia_phage_vB_EamP_S6	1	PRJNA181230
+s__Treponema_maltophilum	1	GCF_000413055
+s__Oenococcus_kitaharae	1	GCF_000241055
+s__Velvet_tobacco_mottle_virus_Satellite_RNA	1	PRJNA14194
+s__Ralstonia_phage_RSS20	1	PRJNA213020
+s__Eubacteriaceae_bacterium_CM5	1	GCF_000238135
+s__Erwinia_phage_phiEa100	1	PRJNA184154
+s__Eubacteriaceae_bacterium_CM2	1	GCF_000238095
+s__Planococcus_donghaensis	1	GCF_000189395
+s__Mycobacterium_phage_Fruitloop	1	PRJNA32013
+s__Treponema_caldaria	1	GCF_000219725
+s__Lactobacillus_phage_A2	1	PRJNA14602
+s__Banana_streak_IM_virus	1	PRJNA66619
+s__Epiphyas_postvittana_nucleopolyhedrovirus	1	PRJNA14127
+s__Campylobacter_phage_CP21	1	PRJNA181241
+s__Aeromonas_phage_Aes508	1	PRJNA181986
+s__Solobacterium_moorei	1	GCF_000186945
+s__Clostridium_difficile	209	GCF_000452345	GCF_000449885	GCF_000085225	GCF_000448825	GCF_000448925	GCF_000154685	GCF_000450885	GCF_000451905	GCF_000451965	GCF_000450945	GCF_000449325	GCF_000027105	GCF_000451865	GCF_000451885	GCF_000449025	GCF_000448765	GCF_000450865	GCF_000450645	GCF_000450625	GCF_000450385	GCF_000155065	GCF_000164175	GCF_000451385	GCF_000450145	GCF_000449625	GCF_000450405	GCF_000449565	GCF_000451765	GCF_000009205	GCF_000449465	GCF_000450425	GCF_000451485	GCF_000473585	G [...]
+s__Pepper_yellow_dwarf_virus_New_Mexico	1	PRJNA31127
+s__Tomato_leaf_curl_Karnataka_virus	1	PRJNA14192
+s__Aminobacterium_colombiense	1	GCF_000025885
+s__Herbaspirillum_sp_JC206	1	GCF_000312045
+s__Hydrogenobaculum_sp_SN	1	GCF_000348765
+s__Deftia_phage_phiW_14	1	PRJNA42945
+s__Xenococcus_sp_PCC_7305	1	GCF_000332055
+s__Marinimicrobia_bacterium_SCGC_AAA298_D23	1	GCF_000402655
+s__Xanthomonas_vasicola	6	GCF_000277995	GCF_000278035	GCF_000278075	GCF_000278055	GCF_000278015	GCF_000159795
+s__Polaribacter_irgensii	1	GCF_000153225
+s__Succinimonas_amylolytica	1	GCF_000378405
+s__Spodoptera_exigua_multiple_nucleopolyhedrovirus	1	PRJNA14134
+s__Clostridium_ultunense	1	GCF_000344075
+s__Furcraea_necrotic_streak_virus	1	PRJNA192610
+s__Natrialba_hulunbeirensis	1	GCF_000337575

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/metaphlan2.git

More information about the debian-med-commit mailing list