[med-svn] [Git][med-team/resfinder][upstream] New upstream version 3.1.0
Andreas Tille
gitlab at salsa.debian.org
Thu Oct 18 15:49:12 BST 2018
Andreas Tille pushed to branch upstream at Debian Med / resfinder
Commits:
5079ee1c by Andreas Tille at 2018-10-18T14:36:00Z
New upstream version 3.1.0
- - - - -
8 changed files:
- − INSTALL_DB
- − Makefile
- README.md
- − UPDATE_DB
- − VALIDATE_DB
- − brew.sh
- resfinder.pl
- + resfinder.py
Changes:
=====================================
INSTALL_DB deleted
=====================================
@@ -1,29 +0,0 @@
-#!/bin/bash
-# INPUT: path_to_db
-
-# Check if the user provided the input
-if [ -z "$1" ]; then
- echo "Provide a path to or a name for the database, followed by [ENTER]:"
- read path
-else
- path=$1
-fi
-
-# Check if the path is valid and unexisting
-if [ ! -d $path ]; then
- git clone git at bitbucket.org:genomicepidemiology/resfinder_db.git $path
-else
- echo "Error: path exists! Please provide a valid and unused path..."
-fi
-
-echo "Do you want to fix permissions, yes/y or no/n, followed by [ENTER]:"
-read fix
-if [[ $fix == "yes" ]] || [[ $fix == "y" ]]; then
- echo "enter group name followed by [ENTER]:"
- read gn
- if [ ! -z "$gn" ]; then
- chgrp -R $gn $path
- fi
- find $path \( -name .git \) -prune -printf '' -o -type d -print -exec chmod 775 {} \;
- find $path \( -name .git \) -prune -printf '' -o -type f -print -exec chmod 664 {} \;
-fi
=====================================
Makefile deleted
=====================================
@@ -1,9 +0,0 @@
-install:
- cpanm CJFIELDS/BioPerl-1.6.924.tar.gz --force
- cpanm Data::Dumper
- cpanm Getopt::Long
- cpanm File::Temp
-clean:
- find ./ -name "*.DS_Store" -delete
- find ./ -name "*.log" -delete
- find ./ -name "*.gz" -delete
=====================================
README.md
=====================================
@@ -1,170 +1,117 @@
-===================
-ResFinder
-===================
-
-This project documents ResFinder service
-
-
-Documentation
+ResFinder documentation
=============
-## What is it?
-
The ResFinder service contains one perl script *resfinder.pl* which is the
script of the latest version of the ResFinder service. ResFinder identifies
acquired antimicrobial resistance genes in total or partial sequenced isolates
of bacteria.
+This repository also contains a python script *resfinder.py* which is a new version
+of ResFinder, but not yet running on the CGE server. This program was added because
+it uses a newer version of blastn, which, in contrary from the blastall version
+that the perl script uses, is avail to download.
+
## Content of the repository
1. resfinder.pl - the program
-2. INSTALL_DB - shell script for downloading the ResFinder database
-3. UPDATE_DB - shell script for updating the database to the newest version
-4. VALIDATE_DB - python script for verifying the database contains all
- required files
-5. brew.sh - shell script for installing dependencies
-6. makefile - make script for installing dependencies
-7. test.fsa - test fasta file
+2. resfinder.py - (same program using an available blastn version - blastn-2.2.26+)
+3. test.fsa - test fasta file
## Installation
-Setting up ResFinder
+Setting up ResFinder script and database
```bash
# Go to wanted location for resfinder
cd /path/to/some/dir
+
# Clone and enter the resfinder directory
-git clone https://bitbucket.org/genomicepidemiology/resfinder.git
+git clone https://git@bitbucket.org/genomicepidemiology/resfinder.git
cd resfinder
-```
-Installing up the ResFinder database
-```bash
-cd /path/to/resfinder
-./INSTALL_DB database
-# Check all DB scripts works, and validate the database is correct
-./UPDATE_DB database
-./VALIDATE_DB database
-```
+# Installing up the ResFinder database
+# Go to wanted location for resfinder database
+cd /path/to/some/dir
+
+# Clone and enter the resfinder directory
+git clone https://git@bitbucket.org/genomicepidemiology/resfinder_db.git
+cd resfinder_db
-Installing dependencies:
-Perlbrew is used to manage isolated perl environments. To install it run:
-```bash
-bash brew.sh
```
-This will installed Perl 5.23 in the Home folder, along with CPAN minus as
-package manager.
-Blast will also be installed when running brew.sh if BlastAll and FormatDB are
-not already installed and place in the user's path.
-After running brew.sh and installing Blast add this command to the end of your
-~/bash_profile to add BlastAll and FormatDB to the user's path
+### Installing dependencies (for python script):
-```bash
-export PATH=$PATH:blast-2.2.26/bin
-```
+The BlastAll and FormatDB that the perl script uses are no longer available
+for downloading through ncbi. Therefor we have provided the resfinder.py
+scriot that uses Blastn instead. Note, this is not not script that is running
+on the CGE server. The CGE server is running the perl script using BlastAll
-If you want to download the two external tools from the Blast package, BlastAll
-and FormatDB, yourself go to
-```url
-ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/LATEST
-```
-and download the version for your OS with the format:
+#### Download Blastn and BioPython
```url
-blast-version-architecture-OS.tar.gz
+http://biopython.org/DIST/docs/install/Installation.html
```
-
-after unzipping the file, add this command to the end of your ~/bash_profile.
-```bash
-export PATH=$PATH:/path/to/blast-folder/bin
+```url
+ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/
```
-
-where path/to/blast-folder is the folder you unzipped.
-
-At last ResFinder has several Perl dependencies. To install them (this requires
-CPAN minus as package manager):
+#### Install the cgecore module to python3
```bash
-make install
+pip3 install cgecore
```
-The scripts are self contained. You just have to copy them to where they should
-be used.
-
-Remember to add the program to your system path if you want to be able to
-invoke the program without calling the full path.
-If you don't do that you have to write the full path to the program when using
-it.
+## Usage
-## Test the scripts and database
+You can run resfinder command line using python3
+
```bash
-cd /path/to/test_dir
-cp /path/to/resfinder/test.fsa .
-perl /path/to/resfinder/resfinder.pl -d /path/to/resfinder/database \
--b /path/to/blast/parent/dir -i test.fsa -a aminoglycoside -k 90.00 -l 0.60
-```
-
-## Usage
-The program can be invoked with the -h option to get help and more information
-of the service.
-
-```bash
-Usage: perl resfinder.pl [options]
-
-Options:
-
- -h HELP
- Prints a message with options and information to the screen
- -d DATABASE
- The path to where you have located the database folder
- -b BLAST
- The path to the location of blast-2.2.26 if it is not added
- to the users path (see the install guide in 'README.md')
- -i INFILE
- Your input file which needs to be preassembled partial
- or complete genomes in fasta format
- -o OUTFOLDER
- The folder you want to have your output files places.
- If not specified the program will create a folder named
- 'Output' in which the result files will be stored
- -a ANTIMICROBIAL
- Antimicrobial configuration. The options can be found
- in the file 'ResFinder_Antimicrobial'
- -k THRESHOLD
- The threshold for % identity for example '95.00' for 95 %
- -l MIN_LENGHT
- The minimum length of the overlap ex 0.60 for an overlap
- of minimum 60 %
+# Example of running resfinder
+python3 resfinder.py -i test.fsa -o . -p /path/to/resfinder_db \
+-b /path/to/blastn -d aminoglycoside -k 90.00 -l 0.60
+
+# The program can be invoked with the -h option
+Usage: resfinder.py [-h] [-i INPUTFILE] [-1 FASTQ1] [-2 FASTQ2] [-o OUT_PATH]
+ [-b BLAST_PATH] [-p DB_PATH] [-k KMA_PATH]
+ [-q DB_PATH_KMA] [-d DATABASES] [-l MIN_COV]
+ [-t THRESHOLD]
+
+optional arguments:
+ -h, --help show this help message and exit
+ -i INPUTFILE, --inputfile INPUTFILE
+ Input file
+ -1 FASTQ1, --fastq1 FASTQ1
+ Raw read data file 1.
+ -2 FASTQ2, --fastq2 FASTQ2
+ Raw read data file 2 (only required if data is paired-
+ end).
+ -o OUT_PATH, --outputPath OUT_PATH
+ Path to blast output
+ -b BLAST_PATH, --blastPath BLAST_PATH
+ Path to blast
+ -p DB_PATH, --databasePath DB_PATH
+ Path to the databases
+ -k KMA_PATH, --kmaPath KMA_PATH
+ Path to KMA
+ -q DB_PATH_KMA, --databasePathKMA DB_PATH_KMA
+ Path to the directories containing the KMA indexed
+ databases. Defaults to the directory 'kma_indexing'
+ inside the databasePath directory.
+ -d DATABASES, --databases DATABASES
+ Databases chosen to search in - if none are specified
+ all are used
+ -l MIN_COV, --min_cov MIN_COV
+ Minimum coverage default 0.6
+ -t THRESHOLD, --threshold THRESHOLD
+ Blast threshold for identity
+ default minimum 0.9
```
-#### Example of use with the *database* folder is loacted in the current
-#### directory and Blast added to the user's path
-```perl
- perl resfinder.pl -i test.fsa -o OUTFOLDER -a aminoglycoside -k 90.00 \
- -l 0.60
-```
-#### Example of use with the *database* and *blast-2.2.26* folders loacted in
-#### other directories
-```perl
- perl resfinder.pl -d path/to/database -b path/to/blast-2.2.26 -i \
- test.fsa -o OUTFOLDER -a aminoglycoside -k 90.00 -l 0.60
-```
-
-## Web-server
+### Web-server
A webserver implementing the methods is available at the [CGE
website](http://www.genomicepidemiology.org/) and can be found here:
https://cge.cbs.dtu.dk/services/ResFinder/
-
-## The Latest Version
-
-
-The latest version can be found at
-https://bitbucket.org/genomicepidemiology/resfinder/overview
-
-## Documentation
-
+### Documentation
The documentation available as of the date of this release can be found at
https://bitbucket.org/genomicepidemiology/resfinder/overview.
=====================================
UPDATE_DB deleted
=====================================
@@ -1,33 +0,0 @@
-#!/bin/bash
-# INPUT: path_to_db
-
-# Check if the user provided the input
-if [ -z "$1" ]; then
- cwd=$(pwd)
- echo "Current path: "$cwd
- echo "Type the path to where the database is located, followed by [ENTER]:"
- read path
-else
- path=$1
-fi
-
-# Check if the path is valid and existing
-if [ -d $path ]; then
- cd $path
- #git stash
- git pull
-else
- echo "Error: path does not exist! Please provide a valid path..."
-fi
-
-echo "Do you want to fix permissions, yes/y or no/n, followed by [ENTER]:"
-read fix
-if [[ $fix == "yes" ]] || [[ $fix == "y" ]]; then
- echo "enter group name followed by [ENTER]:"
- read gn
- if [ ! -z "$gn" ]; then
- chgrp -R $gn $path
- fi
- find $path \( -name .git \) -prune -printf '' -o -type d -print -exec chmod 775 {} \;
- find $path \( -name .git \) -prune -printf '' -o -type f -print -exec chmod 664 {} \;
-fi
=====================================
VALIDATE_DB deleted
=====================================
@@ -1,55 +0,0 @@
-#!/usr/bin/env python
-''' Validate Database '''
-import sys, os
-
-if len(sys.argv) > 1:
- db_path = sys.argv[1]
-else:
- print('Current path: %s'%(os.getcwd()))
- db_path = input('Please provide the path to the database:')
-
-# VALIDATE REQUIRED ARGUMENTS
-if not os.path.exists(db_path):
- sys.exit("Error: The specified database directory does not exist!\n")
-else:
- # Check existence of config file
- db_config_file = '%s/config'%(db_path)
- if not os.path.exists(db_config_file):
- sys.exit("Error: The database config file could not be found!")
-
-dbs = []
-extensions = []
-with open(db_config_file) as f:
- for l in f:
- l = l.strip()
- if l == '': continue
- if l[0] == '#':
- if 'important files are:' in l:
- files = ["%s/%s"%(db_path, s.strip())
- for s in l.split('are:')[-1].split(',')]
- # Check all files exist
- for path in files:
- if not os.path.exists(path):
- sys.exit('Error: %s not found!'%(path))
- if 'extensions:' in l:
- extensions = [s.strip() for s in l.split('extensions:')[-1].split(',')]
- continue
- tmp = l.split('\t')
- if len(tmp) != 3:
- sys.exit(("Error: Invalid line in the database config file!\n"
- "A proper entry requires 3 tab separated columns!\n%s")%(l))
- db_prefix = tmp[0].strip()
- name = tmp[1].split('#')[0]
- description = tmp[2]
- # Check if all db files are present
- for ext in extensions:
- path = "%s/%s.%s"%(db_path, db_prefix, ext)
- if not os.path.exists(path):
- sys.exit(("Error: The database file (%s) could not be found!")%(
- path))
- dbs.append((name, db_prefix))
-
-if len(dbs) == 0:
- sys.exit("Error: No databases were found in the database config file!")
-else:
- print("Validation passed. Database is valid.")
\ No newline at end of file
=====================================
brew.sh deleted
=====================================
@@ -1,71 +0,0 @@
-#!/bin/env bash
-#
-
-PERLBREW='http://install.perlbrew.pl'
-BLASTLINUX='ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-x64-linux.tar.gz'
-BLASTMAC='ftp://ftp.ncbi.nlm.nih.gov/blast/executables/legacy.NOTSUPPORTED/2.2.26/blast-2.2.26-universal-macosx.tar.gz'
-
-BLASTFOLDER=blast
-
-# PerlBrew needs to be installed to manage isolated perl environemnts if missing
-command -v perlbrew >/dev/null 2>&1 || {
- echo 'Installing Perl brew...'
- echo "OS type: ${OSTYPE}"
-
- if [[ "$OSTYPE" == 'linux'* ]]; then
- wget -O - http://install.perlbrew.pl | bash
- else
- curl -L ${PERLBREW} | bash
- fi
-
- source ~/perl5/perlbrew/etc/bashrc
- echo 'source ~/perl5/perlbrew/etc/bashrc' >> ~/.bash_profile
-}
-
-perlbrew init
-echo 'Upgrading batchperl'
-perlbrew install-patchperl
-
-echo "Do you want to install a local perl? [Y]/[N]"
-read answer
-if [ $answer == 'Y' ]; then
- echo 'Installing perl-5.22.0 ...';
- #perlbrew install perl-5.22.0
- perlbrew --force install perl-5.22.0
- perlbrew switch perl-5.22.0
- #cd /root/perl5/perlbrew/build/perl-5.22.0/perl-5.22.0
- #make intall
-else
- echo 'Local perl will be used ...';
-fi
-
-echo "Do you want to install a cpanmin locally through perlbrew? [Y]/[N]"
-read answer
-if [ $answer == 'Y' ]; then
- echo 'Installing perlbrew install-cpanm...';
- perlbrew install-cpanm
-else
- echo "Do you want to install a cpanmin as sudo[Y]/[N]"
- read answer
- if [ $answer == 'Y' ]; then
- echo 'Installing cpanmin as sudo...';
- curl -L https://cpanmin.us | perl - --sudo App::cpanminus
- else
- echo 'Assuming cpanm is already installed...'
- fi
-fi
-
-
-# Installing NCBI Blast tools if missing
-command -v blastall >/dev/null 2>&1 || {
- echo 'Installing Blast tools...'
- echo "OS type: ${OSTYPE}"
-
- if [[ "$OSTYPE" == 'linux'* ]]; then
- curl ${BLASTLINUX} -o ${BLASTFOLDER}.tar.gz
- else
- curl ${BLASTMAC} -o ${BLASTFOLDER}.tar.gz
- fi
-
- tar -zxvf ${BLASTFOLDER}.tar.gz
-}
=====================================
resfinder.pl
=====================================
@@ -1,71 +1,95 @@
-#!/usr/bin/env perl
+#!/usr/bin/perl
# --------------------------------------------------------------------
# %% Setting up %%
+#
+
use strict;
use Getopt::Long qw(:config no_ignore_case no_auto_abbrev pass_through);
+use File::Temp qw/ tempfile tempdir /;
+
use Bio::SeqIO;
+use Bio::Seq;
use Bio::SearchIO;
-use Try::Tiny::Retry;
use constant PROGRAM_NAME => 'ResFinder_v2.pl';
use constant PROGRAM_NAME_LONG => 'Findes antimicrobial resitance genes for a sequence or genome';
-use constant VERSION => '2.3';
-
-#Global variables
-my $BLAST;
-my $BLASTALL;
-my $FORMATDB;
-my $ABRES_DB;
-my $Pheno;
-my ($Help, $AB_indput, $threshold, $min_length, $InFile, $dir);
+use constant VERSION => '3.0';
+
+# my $BLASTALL = "/usr/cbs/bio/bin/Linux/x86_64/blastall";
+my $BLASTALL = "/usr/local/bin/blastall";
+# my $FORMATDB = "/usr/cbs/bio/bin/Linux/x86_64/formatdb";
+my $FORMATDB = "/usr/local/bin/formatdb";
+my $ABRES_DB = "/home/data1/services/ResFinder/database_repository"; # This one should point to a directory containing the files with the antibiotic resistance genes.
+my $Pheno = "$ABRES_DB/notes.txt";
+
+
+my ($Help, $AB_indput, $threshold, $min_length, $InFile, $runroot);
my $IFormat = "fasta";
my $OFormat = "ST";
my %ARGV = ('-p' => 'blastn', '-a' => '5' , '-F' => 'F');
+my ($checkbox,$species);
-#Getting global variables from the command line
-&commandline_parsing();
+&GetOptions (
+ "d=s" => \$AB_indput,
+ "k=s" => \$threshold,
+ "l=s" => \$min_length,
+ "c=s" => \$checkbox,
+ "s=s" => \$species,
+ "i=s" => \$InFile,
+ "I=s" => \$IFormat,
+ "O=s" => \$OFormat,
+ "R=s" => \$runroot,
+ "h|help" => \$Help # Prints the help text
+);
-if (defined $Help || not defined $AB_indput || not defined $InFile || not defined $threshold || not defined $min_length) {
- print_help();
- exit;
+if (defined $Help || not defined $AB_indput) {
+ print_help();
+ exit;
}
-#If there are not given a path to the database or BLAST the program assume that the files are located in the curet directury
-if (not defined $BLAST) {
- $BLASTALL = "blastall";
- $FORMATDB = "formatdb";
-}
-if (not defined $ABRES_DB) {
- $ABRES_DB = "database";
- $Pheno = "database/notes.txt";
-}
-if (not defined $dir) {
- $dir = ".";
-}
+my %Temp = @ARGV;
+ at ARGV{keys %Temp} = values %Temp;
my $procent_length = 100*$min_length;
-# Making tmp directory for BLAST output
-my $tmp_dir = "$dir/tmp";
-mkdir("$dir/tmp");
-
+# --------------------------------------------------------------------
+# Printing info
+=pod
+print "<style type='text/css'>
+ .bglightgreen{background-color:#BEDCBE;} /*IMPERFECT HIT COLOR*/
+ .bggrey{background-color:#9D9D9D;} /*WARNING COLOR*/
+ .bggreen{background-color:#6EBE50;} /*PERFECT HIT COLOR*/
+ .bgred{background-color:#BF2026;} /*NO HITS COLOR*/
+ .results th,td{ /*PROPERTIES OF TABLE CELLS*/
+ font-size: 14px;
+ padding: 4px 5px 1px 5px;
+ text-align: center;
+ }
+ .results th{ /*HEADER COLORS*/
+ color:#FFF;
+ background-color:#3956A6;
+ }
+ .results td a{color:#3956A6;} /*LINK COLOR*/
+ </style>";
+print "<h1>ResFinder-3.0 Server - Results: Checkbox - $species </h1>\n<center><br>";
+=cut
#---------------------------------------------------------------------
# %% Main Program %%
#
-##################### Antimicrobial families #################
-# Extract Resistance class names from config file
+## AVAILABLE antimicrobial families ##
+ # hash mapping the ARGF profiles to antibiotic names
my %argfProfiles=();
-open(IN, '<', "$ABRES_DB/config") or die "Error: $!\n";
+##################### Antimicrobial families ################# ="Aminoglycoside";
+open(IN, '<', "$ABRES_DB/config") or die "Could't open config file!\n";
while (defined(my $line = <IN>)) {
next if $line =~ m/^#/; # discard comments
my @tmp = split("\t", $line);
$argfProfiles{$tmp[0]} = $tmp[1];
}
close IN;
-
# -----------------------
#Making phenotype list
my %Pheno_hash = &phenolist($Pheno);
@@ -79,8 +103,8 @@ my %FINAL_RESULTS; # will contain the results for txt printing
### Variables for text-printing ###
my $txtresults= "ResFinder result file\n\n"; #Added to txt print
-my $tableresult = "";
-my $tabr .= "Resistance gene\tIdentity\tQuery/HSP\tContig\tPosition in contig\tPhenotype\tAccession no.\n"; #added to print tab-separated txt file
+my $tabr = ''; # Making empty tab result
+#$txtresults .= "Resistance gene\tIdentity\tAllele Lenght/HSP\tContig\tPosition in contig\tAccession no.\n";
my $contigtable ="\nContig Table\n"; #added to txt print
$contigtable .= "-----------\n";
my $alignment = "\n\nAlignments as text\n_________________________________________________________________________________\n";
@@ -88,29 +112,49 @@ my @headarray = ("RES. GENE ", "IDENTITY ", "QUERY/HSP ", "CONTIG ",
my $resalign = ""; #Added for alignment print
my $hits_in_seq = ""; #Added for alignment print
+#### FINDING THE RUNROOT ## Added to print textresults
+#my @tmpArrRunroot = split(/final_input/, $InFile);
+my @tmpArrRunroot = split(/inputs/, $InFile);
+#my $runroot = $tmpArrRunroot[0];
+my $dir;
+if($runroot){
+ $dir = $runroot."/downloads/";
+}
+else{
+ $runroot = "./";
+ $dir = "./";
+}
+
+# SETTING THE OUTPUT FOLDER (DOWNLOAD)
+#my $dir = $runroot."downloads/";
+#my $dir = "downloads/";
+
+#If all databases is called
+if ($AB_indput eq "ALL") {
+ $AB_indput = "aminoglycoside,beta-lactamase,colistin,quinolone,fosfomycin,fusidicacid,macrolide,nitroimidazole,oxazolidinone,phenicol,rifampicin,sulphonamide,tetracycline,trimethoprim,vancomycin";
+}
+
+
# Finding genes for each chosen antimicrobial family
my $antibiocount = 0;
my @Antimicrobial = split(/,/,$AB_indput);
foreach my $element(@Antimicrobial){
+ #print "$element\n";
$antibiocount ++;
my $CurrentAnti = $element;
# Run BLAST and find best matching Alleles
-
- my ($Seqs_ABres, $Seqs_input, @Blast_lines);
-
- retry{
- $Seqs_ABres = read_seqs(-file => $ABRES_DB.'/'.$element.'.fsa', format => 'fasta');
- $Seqs_input = $InFile ne "" ? read_seqs(-file => $InFile, -format => $IFormat) :
+ my $Seqs_ABres = read_seqs(-file => $ABRES_DB.'/'.$element.'.fsa', -format => 'fasta');
+ my $Seqs_input = $InFile ne "" ? read_seqs(-file => $InFile, -format => $IFormat) :
read_seqs(-fh => \*STDIN, -format => $IFormat);
-
- @Blast_lines = get_blast_run($tmp_dir, $element, -d => $Seqs_input, -i => $Seqs_ABres, %ARGV);
- }
- catch{ die $_ };
-
+
+ my @Blast_lines = get_blast_run(-d => $Seqs_input, -i => $Seqs_ABres, %ARGV);
#Declaring variables - array and hash
my @RESULTS_AND_SETTINGS_ARRAY; #will contain the typing results some setting and the hash with the results for each gene
my @GENE_RESULTS_ARRAY; #will contain the typing results some setting and the hash with the results for each gene
+ #my %GENE_ALIGN_HIT_HASH; #will contain the sequence alignment lines
+ #my %GENE_ALIGN_HOMO_HASH; #will contain the sequence alignment homolog string
+ #my %GENE_ALIGN_QUERY_HASH; #will contain the sequence alignment allele string
my %GENE_RESULTS_HASH;
# Declaring variables for each BLAST
my %ABres;
@@ -160,13 +204,17 @@ foreach my $element(@Antimicrobial){
my $id2 = $qid.":".$hit_start."_".$hit_end."_".$bits;
my $id = $hit_start."_".$hit_end."_".$contig_name;
#print "$id\n";
- #print "query_length: $query_length, hsp_length: $hsp_length, ident: $ident, bits: $bits, score: $calc_score, hit_start: $hit_start, hit_end: $hit_end, gaps: $gaps, contig: $contig_name\n";
-
+ #print "<h1>query_length: $query_length, hsp_length: $hsp_length, ident: $ident, bits: $bits, score: $calc_score, hit_start: $hit_start, hit_end: $hit_end, gaps: $gaps, contig: $contig_name\n</h1>";
+
#making
-
+
# Save BLAST results, sorting on position and contig
if (exists $BITS{$id}){
- #print "if exists, BITS{id}: $BITS{$id}, bits: $bits, qid: $qid\n";
+ #print "if exists, BITS{id}: $BITS{$id}, bits: $bits, $calc_score qid: $qid, id: $ident, lenght (q/h): $query_length/$hsp_length\n";
+ #if ($ident == 100 and $query_length/$hsp_length == 1) {
+ # $calc_score = $calc_score*100;
+ # $bits = $bits*100;
+ #}
if ($calc_score == $ABres{$id} and $bits > $BITS{$id}) {
$ABres{$id} = $calc_score; #%MLST and %PERC_IDENT are later used to find the best query ID at the next level (adk1.., adk2.., adk3.. -> dk2..).
$GENEID{$id} = $qid; #Query ID, genename_connecter_accessionnumber, e.g. tet(C)_2_AB123456
@@ -232,9 +280,9 @@ foreach my $element(@Antimicrobial){
#print "else; ID: $id\tGENE: $GENEID{$id}\n";
}
}
-
-
- # Saving the best hits in each contig for each position
+
+
+ #Saving the best hits in each contig for each porsition
foreach my $contig (keys %HoA_sort){
#print "First level: $contig\n";
my @list;
@@ -250,10 +298,10 @@ foreach my $element(@Antimicrobial){
$k++;
}
- # Sorting AoA aftener start position in contig
+ #Sorting AoA aftener start position in contig
my @sorted = sort { $a->[0] <=> $b->[0] } @list;
- # Sorting, so overlapping genes are removed
+ #Sorting, so overlapping genes are removed
my $start = 0;
my $end = 1;
my $contig = 2;
@@ -265,15 +313,21 @@ foreach my $element(@Antimicrobial){
if ($#sorted == 0) {
push(@best_hit, "$sorted[0][0]_$sorted[0][1]_$sorted[0][2]");
}
- until ($next > $#sorted ) { # as long $first <= 4 and $next <= 5
+ until ($next > $#sorted ) { # s lnge $first <= 4 og $next <= 5
if ($#sorted == 1) {
- if ($sorted[$first][$bits] >= $sorted[$next][$bits]) {
+ if ($sorted[$first][$end] <= $sorted[$next][$start]) { # if the two hits dos not overlap, save both
+ push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
+ push (@best_hit, "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]");
+ #print "2 hits, save both\n";
+ $next = 2;
+ }
+ elsif ($sorted[$first][$bits] >= $sorted[$next][$bits]) {
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
- #print "Sorted == 1, first: $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]";
+ #print "Sorted == 1, first: $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]\n";
$next++;
} else {
push (@best_hit, "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]");
- #print "Sorted == 1, next: $sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]";
+ #print "Sorted == 1, next: $sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]\n";
$next++;
}
}
@@ -281,26 +335,27 @@ foreach my $element(@Antimicrobial){
if ($next == $#sorted) {
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
push (@best_hit, "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]");
- #print "if end<start, first: $first, next: $next\t$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]\t$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]\n";
+ #print "if end<start, first: $first, next: $next\t$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][3]\t$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][3]\n";
$first = $next;
$next++;
} else {
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
- #print "if end<start, first: $first, next: $next\t$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]\n";
+ #print "if end<start, first: $first, next: $next\t$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][3]\n";
$first = $next;
$next++;
}
}
- elsif ( ($sorted[$first][$end] >= $sorted[$next][$start]+1) && ($sorted[$first][$end] < $sorted[$next][$start]+30)) { # overlap 1-30, Vancomycin A-H overlap 3, 7, 10, 17, 22
+ elsif ( ($sorted[$first][$end] >= $sorted[$next][$start]+1) && ($sorted[$first][$end]-$sorted[$next][$start] <= 30)) { # overlap 1-30, Vancomycin A-H overlap 3, 7, 10, 17, 22
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
- #print "if +1-30, first: $first, next: $next\tsave: $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]\n";
+ #print "if +1-30, first: $first, next: $next\tsave: $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][3]\n";
$first = $next;
$next++;
}
elsif (($sorted[$first][$end] > $sorted[$next][$start]) && ($sorted[$first][$calc_score] < $sorted[$next][$calc_score])){ # If first end is after next start + first is better...
if ($next == $#sorted){
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
- #print "1>2: $next\n";
+ #print "1>2: $first, $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][3], bits: $sorted[$first][$bits] cal_score: $sorted[$first][$calc_score]\n";
+ #print "1>2: $next, $sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][3], bits: $sorted[$next][$bits] cal_score: $sorted[$next][$calc_score]\n";
$next++;
} else {
#print "1>2: first: $first, next: $next\n";
@@ -310,7 +365,7 @@ foreach my $element(@Antimicrobial){
elsif (($sorted[$first][$end] > $sorted[$next][$start]) && ($sorted[$first][$calc_score] > $sorted[$next][$calc_score])) { #If first end is after next start + next is better...
if ($next == $#sorted){
push (@best_hit, "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]");
- #print "1<=2: $next\n";
+ #print "1<=2: first: $first, next: $next, $sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][3]\n";
$next++;
} else {
#print "1<=2: first: $first, next: $next\n";
@@ -319,55 +374,69 @@ foreach my $element(@Antimicrobial){
}
}
elsif (($sorted[$first][$end] > $sorted[$next][$start]) && ($sorted[$first][$calc_score] == $sorted[$next][$calc_score])) { # If Calc_score ==
- if ($sorted[$first][$bits] > $sorted[$next][$bits]) { # First bits > next bits => first better
+ #my $f = "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]";
+ #my $n = "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]";
+ #print "ID: $PERC_IDENT{$f}\t$PERC_IDENT{$n}\n";
+ if ($PERC_IDENT{"$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]"} == 100 ) {
+ push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
+ $next++;
+ #print "1>2==: first: $first, next: $next, $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][3], $PERC_IDENT{$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]}\n";
+ }
+ elsif ($PERC_IDENT{"$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]"} == 100 ){
+ push (@best_hit, "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]");
+ $first = $next;
+ $next++;
+ }
+ elsif ($sorted[$first][$bits] > $sorted[$next][$bits]) { # First bits > next bits => first better
if ($next == $#sorted){
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
- #print "1>2: $next\n";
+ #print "1>2==: first: $first, next: $next, $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][3]\n";
$next++;
} else {
- #print "1>2: first: $first, next: $next\n";
+ #print "1>2==: first: $first, $sorted[$first][$bits] $sorted[$first][3] next: $next $sorted[$next][$bits] $sorted[$next][3]\n";
$next++;
}
}
elsif ($sorted[$first][$bits] <= $sorted[$next][$bits]) { # Next bits > first bits => next better...
if ($next == $#sorted){
push (@best_hit, "$sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][$contig]");
- #print "1<=2: $next\n";
+ #print "1<=2bits: first: $first, next: $next, $sorted[$next][$start]_$sorted[$next][$end]_$sorted[$next][3]\n";
$next++;
} else {
- #print "1<=2: first: $first, next: $next\n";
+ #print "1<=2bits: first: $first, next: $next\n";
$first = $next;
$next++;
}
}
}
elsif ($next == $#sorted){
- #print "First = length: first: $first, next: $next\t";
+ #print "First = length: first: $first, next: $next, $sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$3]\t";
push (@best_hit, "$sorted[$first][$start]_$sorted[$first][$end]_$sorted[$first][$contig]");
$next++;
}
}
}
-
+
my @control;
my $major_variants_detector = 0;
# Savning the best hits for output/results, with nice output
foreach my $gene (@best_hit){
#print "Gene: $gene\t$GENEID{$gene}\n"; # $gene = $qid2, "hit_start"_"hit_end"_"contig_name", $GENEID{$gene} = $qid = genename_connecter_accessionNR (tet(A)_1_AB12345)
-
+
my $data = $gene;
-
- # Get accession number
+
+ #Get accession number
my @tmp = split(/_/, $GENEID{$gene},3);
my $genename = $tmp[0];
my $accNR = $tmp[2];
#print "Gene: $genename\tAcc: $accNR\n";
-
+
$GENE_RESULTS_HASH{$gene} = ();
$GENE_ALIGN_HIT_HASH{$gene} = ();
$GENE_ALIGN_HOMO_HASH{$gene} = ();
$GENE_ALIGN_QUERY_HASH{$gene} = ();
-
+
+ #Declaring variables
my $sub_contig;
my $sub_rev_contig;
my $final_contig;
@@ -380,11 +449,11 @@ foreach my $element(@Antimicrobial){
my @gaps_in_hit;
my $no_gaps_gene = 0;
my $no_gaps_hit = 0;
-
+
# Position in contig
my $position = "$HIT_START{$data}..$HIT_END{$data}";
- # Finding genes with a %ID >= threshold, savning results for output-print in html
+ #Finding genes with a %ID >= threshold, savning results for output-print in html
if ($HSP_LENGTH{$data} >= ($min_length)*$QUERY_LENGTH{$data} and $PERC_IDENT{$data} >= $threshold) {
push(@control, $data);
if ($HSP_LENGTH{$data} == $QUERY_LENGTH{$data} and $PERC_IDENT{$data} == 100.00){ #If 100% length and %id = 100
@@ -394,7 +463,7 @@ foreach my $element(@Antimicrobial){
my $phenores;
if (exists $Pheno_hash{$genename}){
$phenores = $Pheno_hash{$genename};
- } # End if (exists $Pheno_hash{$gene})
+ } #End if (exists $Pheno_hash{$gene})
push(@{$GENE_RESULTS_HASH{$gene}},sprintf("%.2f", $PERC_IDENT{$data})); #% ID
push(@{$GENE_RESULTS_HASH{$gene}}, $QUERY_LENGTH{$data}); # Query lenght
push(@{$GENE_RESULTS_HASH{$gene}}, $HSP_LENGTH{$data}); # HSP langht
@@ -443,32 +512,32 @@ foreach my $element(@Antimicrobial){
push(@{$GENE_RESULTS_HASH2{$gene}}, $QUERY_START{$data}); #
push(@{$GENE_RESULTS_HASH2{$gene}}, $genename);
#print "$QUERY_LENGTH{$data}, $HSP_LENGTH{$data}, $CONTIG_NAME{$data}, $position, $data\n";
- # Identifying gaps in the MLST allele string and hit string
+ #Identifying gaps in the MLST allele string and hit string
@gaps_in_gene = Getting_gaps($Q_STRING{$data});
@gaps_in_hit = Getting_gaps($HIT_STRING{$data});
#foreach my $elem (@gaps_in_hit){
#print "####\n$elem\n#####";
#}
-
+
$no_gaps_gene = scalar @gaps_in_gene;
$no_gaps_hit = scalar @gaps_in_hit;
-
- # Getting the complete mlst allele (even thought it may not all be part of the HSP)
+
+ #Getting the complete mlst allele (even thought it may not all be part of the HSP)
my @array_for_getting_mlst_seq = ($GENEID{$data});
my $Seqs_ref = grep_ids(-seqs => $Seqs_ABres, -ids => \@array_for_getting_mlst_seq);
for (@{ $Seqs_ref }) {
$seq_lower_case = lc($_->seq);
}
-
- # Getting the right contig
+
+ #Getting the right contig
my @array_for_getting_genome_seq = ($CONTIG_NAME{$data});
my $Seqs_genome_ref = grep_ids(-seqs => $Seqs_input, -ids => \@array_for_getting_genome_seq);
for (@{ $Seqs_genome_ref }) {
my $contig = lc($_->seq);
- my $length_contig = length($contig); #Redundant, da jeg ogs� v.h.a. bioperl finder hit length
-
- # Getting the right sub_contig depends on which strand the hit is on
- # If the hit is on the +1
+ my $length_contig = length($contig); #Redundant, da jeg ogs v.h.a. bioperl finder hit length
+
+ #Getting the right sub_contig depends on which strand the hit is on
+ #If the hit is on the +1
if ($HIT_STRAND{$data} == 1){
if (($QUERY_START{$data} == 1) && (($QUERY_LENGTH{$data} + $no_gaps_gene) == $HSP_LENGTH{$data})){
$variant = 1;
@@ -490,26 +559,29 @@ foreach my $element(@Antimicrobial){
$variant = 4;
$major_variants_detector = 1;
$sub_contig = substr($contig, 0, ( $HIT_START{$data} + (($QUERY_LENGTH{$data} + $no_gaps_gene) - $QUERY_START{$data})));
- #If, as here, the HSP only starts some nucleotides within the mlst allele, a number of spaces must be written before the matching string from the genome. Likewise, the match-string (the "||�|||�||") should be preceeded by spaces
+ #If, as here, the HSP only starts some nucleotides within the mlst allele, a number of spaces must be written before the matching string from the genome. Likewise, the match-string (the "|||||||") should be preceeded by spaces
$spaces_hit = $QUERY_START{$data} - $HIT_START{$data} + 1;
$spaces_match_string = $QUERY_START{$data};
- } else {
+ }
+ else {
if ((($HIT_START{$data} - $QUERY_START{$data}) + ($QUERY_LENGTH{$data} + $no_gaps_gene)) < ($HIT_LENGTH{$data} + $no_gaps_hit)){
$variant = "5a";
$sub_contig = substr($contig, ($HIT_START{$data} - $QUERY_START{$data}), ($QUERY_LENGTH{$data} + $no_gaps_gene));
$spaces_match_string = $QUERY_START{$data};
- } else {
+ }
+ else {
$variant = "5b";
#$major_variants_detector = 1;
$sub_contig = substr($contig, ($HIT_START{$data} - $QUERY_START{$data}), ((($HIT_LENGTH{$data} + $no_gaps_hit) - $HIT_START{$data}) + $QUERY_START{$data} -1 ));
$spaces_match_string = $QUERY_START{$data};
}
}
- } else {
+ }
+ else {
#print "New option not taken into account!\n";
}
}
-
+
#If the hit is on the -1 strand
elsif ($HIT_STRAND{$data} == -1){
if (($QUERY_START{$data} == 1) && (($QUERY_LENGTH{$data} + $no_gaps_gene) == $HSP_LENGTH{$data})){
@@ -555,7 +627,8 @@ foreach my $element(@Antimicrobial){
}
}
}
- } else {
+ }
+ else {
#print "New option not taken into account!\n";
}
@@ -564,7 +637,7 @@ foreach my $element(@Antimicrobial){
$sub_contig = $sub_rev_contig;
} # End elsif ($HIT_STRAND{$data} == -1){
} # End for (@{ $Seqs_genome_ref }) {
-
+
#Adding gaps to the sub_contig (if there are any) leading to the creation of final_contig
if ($no_gaps_hit > 0){
my $hsp_length1 = (length $sub_contig) + $no_gaps_hit;
@@ -598,13 +671,13 @@ foreach my $element(@Antimicrobial){
}
$final_contig .= $sign1;
++$start1;
- } # END for (my $i = 0 ; $i < $hsp_length1 ; ++$i){
- } # END if ($no_gaps_hit > 0){
+ } #END for (my $i = 0 ; $i < $hsp_length1 ; ++$i){
+ } #END if ($no_gaps_hit > 0){
else {
$final_contig = $sub_contig;
}
#print "####\n final hit contig: $final_contig\n ##################";
- # Adding gaps to the mlst allele sequence, $seq_lower_case (if there are any) leading to the creation of final_seq_lower_case
+ #Adding gaps to the mlst allele sequence, $seq_lower_case (if there are any) leading to the creation of final_seq_lower_case
if ($no_gaps_gene > 0){
my $hsp_length2 = (length $seq_lower_case) + $no_gaps_gene;
my $sign2;
@@ -630,14 +703,14 @@ foreach my $element(@Antimicrobial){
$final_seq_lower_case = $seq_lower_case;
}
}
- else { # If length !=
+ else { #If length !=
push(@{$GENE_RESULTS_HASH{$gene}}, "warning2");
push(@{$GENE_RESULTS_HASH2{$gene}}, "warning2");
#print "warning2 - $gene\n";
my $phenores;
if (exists $Pheno_hash{$genename}){
$phenores = $Pheno_hash{$genename};
- } # End if (exists $Pheno_hash{$gene})
+ } #End if (exists $Pheno_hash{$gene})
push(@{$GENE_RESULTS_HASH{$gene}},sprintf("%.2f", $PERC_IDENT{$data})); #% ID
push(@{$GENE_RESULTS_HASH{$gene}}, $QUERY_LENGTH{$data}); # Query lenght
push(@{$GENE_RESULTS_HASH{$gene}}, $HSP_LENGTH{$data}); # HSP langht
@@ -658,32 +731,32 @@ foreach my $element(@Antimicrobial){
push(@{$GENE_RESULTS_HASH2{$gene}}, $QUERY_START{$data});
push(@{$GENE_RESULTS_HASH2{$gene}}, $genename);
- # Identifying gaps in the MLST allele string and hit string
+ #Identifying gaps in the MLST allele string and hit string
@gaps_in_gene = Getting_gaps($Q_STRING{$data});
@gaps_in_hit = Getting_gaps($HIT_STRING{$data});
#foreach my $elem (@gaps_in_hit){
#print "####\n$elem\n#####";
#}
-
+
$no_gaps_gene = scalar @gaps_in_gene;
$no_gaps_hit = scalar @gaps_in_hit;
-
- # Getting the complete mlst allele (even thought it may not all be part of the HSP)
+
+ #Getting the complete mlst allele (even thought it may not all be part of the HSP)
my @array_for_getting_mlst_seq = ($GENEID{$data});
my $Seqs_ref = grep_ids(-seqs => $Seqs_ABres, -ids => \@array_for_getting_mlst_seq);
for (@{ $Seqs_ref }) {
$seq_lower_case = lc($_->seq);
}
-
- # Getting the right contig
+
+ #Getting the right contig
my @array_for_getting_genome_seq = ($CONTIG_NAME{$data});
my $Seqs_genome_ref = grep_ids(-seqs => $Seqs_input, -ids => \@array_for_getting_genome_seq);
for (@{ $Seqs_genome_ref }) {
my $contig = lc($_->seq);
- my $length_contig = length($contig); # Redundant, since bioperl finds hit length
-
- # Getting the right sub_contig depends on which strand the hit is on
- # If the hit is on the +1
+ my $length_contig = length($contig); #Redundant, da jeg ogs v.h.a. bioperl finder hit length
+
+ #Getting the right sub_contig depends on which strand the hit is on
+ #If the hit is on the +1
if ($HIT_STRAND{$data} == 1){
if (($QUERY_START{$data} == 1) && (($QUERY_LENGTH{$data} + $no_gaps_gene) == $HSP_LENGTH{$data})){
$variant = 1;
@@ -705,7 +778,7 @@ foreach my $element(@Antimicrobial){
$variant = 4;
$major_variants_detector = 1;
$sub_contig = substr($contig, 0, ( $HIT_START{$data} + (($QUERY_LENGTH{$data} + $no_gaps_gene) - $QUERY_START{$data})));
- # If, as here, the HSP only starts some nucleotides within the mlst allele, a number of spaces must be written before the matching string from the genome. Likewise, the match-string (the "||�|||�||") should be preceeded by spaces
+ #If, as here, the HSP only starts some nucleotides within the mlst allele, a number of spaces must be written before the matching string from the genome. Likewise, the match-string (the "|||||||") should be preceeded by spaces
$spaces_hit = $QUERY_START{$data} - $HIT_START{$data} + 1;
$spaces_match_string = $QUERY_START{$data};
}
@@ -722,11 +795,12 @@ foreach my $element(@Antimicrobial){
$spaces_match_string = $QUERY_START{$data};
}
}
- } else {
+ }
+ else {
#print "New option not taken into account!\n";
}
}
-
+
#If the hit is on the -1 strand
elsif ($HIT_STRAND{$data} == -1){
if (($QUERY_START{$data} == 1) && (($QUERY_LENGTH{$data} + $no_gaps_gene) == $HSP_LENGTH{$data})){
@@ -746,7 +820,7 @@ foreach my $element(@Antimicrobial){
}
elsif ($QUERY_START{$data} > 1){
if (($HIT_START{$data} + $QUERY_LENGTH{$data}) > $length_contig){
- # If, as here, the HSP only starts some nucleotides within the mlst allele, a number of spaces must be written before the matching string from the genome
+ #If, as here, the HSP only starts some nucleotides within the mlst allele, a number of spaces must be written before the matching string from the genome
$variant = 10;
$major_variants_detector = 1;
$spaces_hit = $QUERY_START{$data} - ($length_contig - $HIT_END{$data}) ;
@@ -772,7 +846,8 @@ foreach my $element(@Antimicrobial){
}
}
}
- } else {
+ }
+ else {
#print "New option not taken into account!\n";
}
@@ -781,8 +856,8 @@ foreach my $element(@Antimicrobial){
$sub_contig = $sub_rev_contig;
} # End elsif ($HIT_STRAND{$data} == -1){
} # End for (@{ $Seqs_genome_ref }) {
-
- # Adding gaps to the sub_contig (if there are any) leading to the creation of final_contig
+
+ #Adding gaps to the sub_contig (if there are any) leading to the creation of final_contig
if ($no_gaps_hit > 0){
my $hsp_length1 = (length $sub_contig) + $no_gaps_hit;
my $sign1;
@@ -815,13 +890,13 @@ foreach my $element(@Antimicrobial){
}
$final_contig .= $sign1;
++$start1;
- } # END for (my $i = 0 ; $i < $hsp_length1 ; ++$i){
- } # END if ($no_gaps_hit > 0){
+ } #END for (my $i = 0 ; $i < $hsp_length1 ; ++$i){
+ } #END if ($no_gaps_hit > 0){
else {
$final_contig = $sub_contig;
}
#print "####\n final hit contig: $final_contig\n ##################";
- # Adding gaps to the mlst allele sequence, $seq_lower_case (if there are any) leading to the creation of final_seq_lower_case
+ #Adding gaps to the mlst allele sequence, $seq_lower_case (if there are any) leading to the creation of final_seq_lower_case
if ($no_gaps_gene > 0){
my $hsp_length2 = (length $seq_lower_case) + $no_gaps_gene;
my $sign2;
@@ -848,8 +923,8 @@ foreach my $element(@Antimicrobial){
}
}# END else# End else
- # Nicely printing the sequences
- #print contig_name
+ #Nicely printing the sequences
+ # print contig_name
#print "Con: $CONTIG_NAME{$data}, HSP: $HSP_LENGTH{$data}, Acc: $accNR, Pos: $position\t";
#push(@{$GENE_ALIGN_HIT_HASH{$gene}}, $CONTIG_NAME{$data});
#print "Variant: $variant\n";
@@ -860,7 +935,7 @@ foreach my $element(@Antimicrobial){
#print $ABres_substr . "\n";
push(@{$GENE_ALIGN_QUERY_HASH{$gene}}, $ABres_substr);
- #Printing spaces before the match string with the "||�||||�||�||" and the string itself
+ #Printing spaces before the match string with the "||||||||||" and the string itself
#print " ";
my $string_spaces = ""; #For saving the spaces as a string of spaces instead of a number of spaces
for (my $i = 1 ; $i < $spaces_match_string; ++$i){
@@ -871,7 +946,7 @@ foreach my $element(@Antimicrobial){
#print $homo_string_substr . "\n";
push(@{$GENE_ALIGN_HOMO_HASH{$gene}}, $homo_string_substr);
- # Printing the match in the genome
+ #Printing the match in the genome
my $string_spaces_hit = ""; #For saving the spaces as a string of spaces instead of a number of spaces
for (my $i = 1 ; $i < $spaces_hit ; ++$i){
$string_spaces_hit .= " ";
@@ -888,7 +963,8 @@ foreach my $element(@Antimicrobial){
delete $GENE_RESULTS_HASH{$gene};
} # End else
} # End foreach my $gene (keys %Profile){
-
+
+
my $length = scalar at control;
push(@RESULTS_AND_SETTINGS_ARRAY, $length);
push(@RESULTS_AND_SETTINGS_ARRAY, $argfProfiles{$element});
@@ -896,12 +972,16 @@ foreach my $element(@Antimicrobial){
push(@RESULTS_AND_SETTINGS_ARRAY, $threshold);
push(@RESULTS_AND_SETTINGS_ARRAY, $procent_length);
- # Making the header of for the tab seperated result file
- $tableresult .= $argfProfiles{$CurrentAnti};
+ #print_html_results(\@RESULTS_AND_SETTINGS_ARRAY, \%GENE_RESULTS_HASH);
+
+
+ ## Making the header of for the tab seperated result file
+ $tabr .= $argfProfiles{$CurrentAnti};
if (!%GENE_RESULTS_HASH) {
- $tableresult .= "\nNo resistance genes found.\n"
- } else {
- $tableresult .= "\nResistance gene\tIdentity\tQuery/HSP\tContig\tPosition in contig\tPhenotype\tAccession no.\n"; #added to print tab-separated txt file
+ $tabr .= "\nNo resistance genes found.\n"
+ }
+ else {
+ $tabr .= "\nResistance gene\tIdentity\tQuery/HSP\tContig\tPosition in contig\tPhenotype\tAccession no.\n"; #added to print tab-separated txt file
## Making a hash with results for txt printing WORKS
foreach my $key (sort keys %GENE_RESULTS_HASH) {
@@ -911,15 +991,14 @@ foreach my $element(@Antimicrobial){
$FINAL_RESULTS{$key}= [@$array[9], @$array[1], @$array[2], @$array[3], @$array[4], @$array[5], @$array[8], $CurrentAnti, $phenotype];
#$contigtable .= "Contig ".$antibiocount." is ".@$array[4]."\n";
#$contigtable .= "-----------\n";
-
+
$tabr .= @$array[9]."\t".@$array[1]."\t".@$array[2]."/".@$array[3]."\t".@$array[4]."\t".@$array[5]."\t".$phenotype."\t".@$array[8]."\n";
- $tableresult .= @$array[9]."\t".@$array[1]."\t".@$array[2]."/".@$array[3]."\t".@$array[4]."\t".@$array[5]."\t".$phenotype."\t".@$array[8]."\n";
}
}
# Ekstra newline indicating that the result for the current anti has ended and a new can begin in the tab result
- $tableresult .= "\n";
-
+ $tabr .= "\n";
+
} # End foreach my $element(@Antimicrobial){
my $contigcount = 0; ## To give the contig numbers and put the real contig names in a table
@@ -952,13 +1031,16 @@ foreach my $key (sort keys %FINAL_RESULTS) {
}
## Printing the alignments as text, making a txt string with alignment
+
foreach my $key (sort keys %GENE_RESULTS_HASH2) {
- # print header line for each gene
+ # print header line for each gene
+
my $array = $GENE_RESULTS_HASH2{$key};
my $outStr = @$array[7];
my $hspLen = @$array[3];
my $qStart = @$array[6];
my $qEnd = $qStart + $hspLen - 1;
+ #my $matchAll = lc(@$array[5]);
if (@$array[0] eq "perfect" ){
$outStr .= ": PERFECT MATCH, ";
}
@@ -969,120 +1051,59 @@ foreach my $key (sort keys %GENE_RESULTS_HASH2) {
$outStr = $outStr.": WARNING2, ";
}
$alignment .= $outStr."ID: ".@$array[1]."%, HSP/Length: ".@$array[3]."/".@$array[2]. ", Contig name: ".@$array[4].", Position: ".@$array[5]."\n\n";
- $hits_in_seq .= ">".$outStr."ID: ".@$array[1]."%, HSP/Length: ".@$array[3]."/".@$array[2]. ", Positions in reference: ".$qStart."..".$qEnd.", Contig name: ".@$array[4].", Position: ".@$array[5]."\n"; #m�ske forkert text
+ $hits_in_seq .= ">".$outStr."ID: ".@$array[1]."%, HSP/Length: ".@$array[3]."/".@$array[2]. ", Positions in reference: ".$qStart."..".$qEnd.", Contig name: ".@$array[4].", Position: ".@$array[5]."\n"; #mske forkert text
$resalign .= ">".@$array[7]."\n";
-
- # now print the alleles
+
+ #now print the alleles
my $queryArray = $GENE_ALIGN_QUERY_HASH{$key};
my $homoArray = $GENE_ALIGN_HOMO_HASH{$key};
my $hitArray = $GENE_ALIGN_HIT_HASH{$key};
-
+
for (my $i=0; $i < scalar(@$hitArray); $i++){
my $tmpQuerySingleLine = @$queryArray[$i];
my $tmpHomoSingleLine = @$homoArray[$i];
my $tmpHitSingleLine = @$hitArray[$i];
+
$alignment .= "Resistance gene seq: ".$tmpQuerySingleLine."\n";
$alignment .= " ".$tmpHomoSingleLine."\n";
$alignment .= "Hit in genome: ".$tmpHitSingleLine."\n\n";
$resalign .= $tmpQuerySingleLine."\n";
$hits_in_seq .= $tmpHitSingleLine."\n";
- } # end for
+ }#end for
$alignment .= "\n--------------------------------------------------------------------------------\n\n";
-} # end foreach
+}#end foreach
-# WRITING results.txt
-open (TXTRESULTS, '>>',"$dir/results.txt") or die("Error! Could not write to $dir"."results.txt\n");
+ #WRITING standard_output.txt
+open (TXTRESULTS, '>>',$dir.'results.txt') or die("Error! Could not write to $dir"."results.txt\n");
print TXTRESULTS $txtresults;
print TXTRESULTS $contigtable;
print TXTRESULTS $alignment;
close (TXTRESULTS);
+#print $txtresults; #printing to screen
-# Print table output
-open (TABLER, '>',"$dir/results_table.txt") or die("Error! Could not write to results_table.txt");
-print TABLER $tableresult;
-close (TABLER);
-# Print tab output
-open (TABR, '>', "$dir/results_tab.txt") or die("Error! Could not write to results_tab.txt");
+open (TABR, '>',$dir.'results_tab.txt') or die("Error! Could not write to results_tab.txt");
print TABR $tabr;
close (TABR);
-# WRITING Hit_in_genome_seq.fsa
-open (HIT, '>'."$dir/Hit_in_genome_seq.fsa") || die("Error! Could not write to Hit_in_genome_seq.fsa");
+
+ #WRITING Hit_in_genome_seq.fsa
+open (HIT, '>'.$dir.'Hit_in_genome_seq.fsa') || die("Error! Could not write to Hit_in_genome_seq.fsa");
print HIT $hits_in_seq;
close (HIT);
-# WRITING Resistance_gene_seq.fsa
-open (ALLELE, '>'."$dir/Resistance_gene_seq.fsa") || die("Error! Could not write to Resistance_gene_seq.fsa");
+#WRITING Resistance_gene_seq.fsa
+open (ALLELE, '>'.$dir.'Resistance_gene_seq.fsa') || die("Error! Could not write to Resistance_gene_seq.fsa");
print ALLELE $resalign;
close (ALLELE);
-system("rm -r error.log");
-system("rm -r formatdb.log");
-system("rm -r $tmp_dir/");
-
exit;
-# --------------------------------------------------------------------
-# %% Land of the Subroutines %%
-sub commandline_parsing {
- while (scalar @ARGV) {
- if ($ARGV[0] =~ m/^-d$/) {
- $ABRES_DB = $ARGV[1];
- $Pheno = "$ABRES_DB/notes.txt";
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-b$/) {
- $BLAST = $ARGV[1];
- $BLASTALL = "$BLAST/bin/blastall";
- $FORMATDB = "$BLAST/bin/formatdb";
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-l$/) {
- $min_length = $ARGV[1];
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-k$/) {
- $threshold = $ARGV[1];
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-a$/) {
- if ($ARGV[1] eq 'all') {
- $AB_indput = 'aminoglycoside,beta-lactamase,quinolone,fosfomycin,fusidicacid,macrolide,nitroimidazole,oxazolidinone,phenicol,rifampicin,sulphonamide,tetracycline,trimethoprim,glycopeptide'
- }
- else {
- $AB_indput = $ARGV[1];
- }
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-i$/) {
- $InFile = $ARGV[1];
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-o$/) {
- $dir = $ARGV[1];
- mkdir $dir;
- shift @ARGV;
- shift @ARGV;
- }
- elsif ($ARGV[0] =~ m/^-h$/) {
- $Help = 1;
- shift @ARGV;
- }
- else {
- &print_help();
- exit
- }
- }
-}
+# --------------------------------------------------------------------
+# %% Land of the Subroutines %%
+#
############# Phenotype sub ########
# Making hash of phenotypes, genename = phenotype.
@@ -1104,57 +1125,64 @@ sub phenolist {
# Returns text lines of blast output
sub get_blast_run {
- my ($tmp_dir, $org, %args) = @_;
- #my $fh = $tmp_dir;
- my $file = "blast_$org.fsa";
- #my ($fh, $file) = tempfile( DIR => '/tmp', UNLINK => 1);
- output_sequence(-file => ">$tmp_dir/$file", seqs => delete $args{-d}, -format => 'fasta');
- die "Error! Could not build blast database" if (system("$FORMATDB -p F -i $tmp_dir/$file"));
-
- my $query_file = "$file.blastpipe";
-
- #open QUERY, ">> $query_file" || die("Error! Could not perform blast run");
- output_sequence(-file => ">$tmp_dir/$query_file", seqs => $args{-i}, -format => 'fasta');
- #close QUERY;
-
- delete $args{-i};
-
- my $cmd = join(" ", %args);
- my $file2 = "$tmp_dir/$file.blast_output";
- system("$BLASTALL -d $tmp_dir/$file -i $tmp_dir/$query_file -o $file2 $cmd");
-
- my $report = new Bio::SearchIO( -file => $file2,
- -format => "blast"
- );
- # Go through BLAST reports one by one
- my @blast;
- while(my $result = $report->next_result) {
- # Go through each matching sequence
- while(my $hit = $result->next_hit) {
- # Go through each each HSP for this sequence
- while (my$hsp = $hit->next_hsp) {
- push(@blast, $result->query_accession ."\t".
- $result->query_length ."\t".
- $hsp->hsp_length ."\t".
- $hsp->gaps ."\t".
- $hsp->percent_identity ."\t".
- $hsp->evalue ."\t".
- $hsp->bits ."\t".
- $hsp->query_string ."\t".
- $hsp->hit_string ."\t".
- $hsp->homology_string ."\t".
- $hsp->seq_inds ."\t".
- $hsp->strand('hit') ."\t".
- $hsp->start('hit') ."\t".
- $hsp->end('hit') ."\t".
- $hit->name ."\t".
- $hsp->strand('query') ."\t".
- $hsp->start('query') ."\t".
- $hit->length ."\n");
- }
- }
- }
- return @blast;
+ my %args = @_;
+ #my ($fh, $file) = tempfile( DIR => '/scratch', UNLINK => 1);
+ my ($fh, $file) = tempfile( DIR => '/tmp', UNLINK => 1);
+ #print "\ntemp filename: $file\n";
+ output_sequence(-fh => $fh, seqs => delete $args{-d}, -format => 'fasta');
+ #die "Error! Could not build blast database" if (system("/usr/cbs/bio/bin/Linux/x86_64/formatdb -p F -i $file"));
+ die "Error! Could not build blast database" if (system("$FORMATDB -p F -i $file"));
+ my $query_file = $file.".blastpipe";
+ `mknod $query_file p`;
+ if ( !fork() ) {
+ open QUERY, ">> $query_file" || die("Error! Could not perform blast run");
+ output_sequence(-fh => \*QUERY, seqs => $args{-i}, -format => 'fasta');
+ close QUERY;
+ exit(0);
+ }
+ delete $args{-i};
+ my $cmd = join(" ", %args);
+ #my ($fh2, $file2) = tempfile( DIR => '/scratch', UNLINK => 1);
+ my ($fh2, $file2) = tempfile( DIR => '/tmp', UNLINK => 1);
+ #print "\ntemp filename: $file2\n";
+ print $fh2 `$BLASTALL -d $file -i $query_file $cmd`;
+ close $fh2;
+ #system("$BLASTALL -d $file -i $query_file $cmd > $file2");
+ #system ("echo -d $file -i $query_file $cmd ");
+ my $report = new Bio::SearchIO(
+ -file => $file2,
+ -format => "blast");
+# Go through BLAST reports one by one, evt. array in array...
+my @blast;
+while(my $result = $report->next_result) {
+ # Go through each matching sequence
+ while(my $hit = $result->next_hit) {
+ # Go through each HSP for this sequence
+ while (my$hsp = $hit->next_hsp) {
+ push(@blast,
+ $result->query_accession . "\t" .
+ $result->query_length . "\t" .
+ $hsp->hsp_length . "\t" .
+ $hsp->gaps . "\t" .
+ $hsp->percent_identity . "\t" .
+ $hsp->evalue . "\t" .
+ $hsp->bits . "\t" .
+ $hsp->query_string ."\t" .
+ $hsp->hit_string ."\t" .
+ $hsp->homology_string . "\t" .
+ $hsp->seq_inds . "\t" .
+ $hsp->strand('hit') . "\t" .
+ $hsp->start('hit') . "\t" .
+ $hsp->end('hit') . "\t" .
+ $hit->name . "\t" .
+ $hsp->strand('query') . "\t" .
+ $hsp->start('query') . "\n");
+ }
+ }
+}
+unlink 'formatdb.log', 'error.log', "$file.blastpipe" , "$file.nhr" , "$file.nin" , "$file.nsq";
+#system("rm -rf formatdb.log error.log");
+return @blast;
}
###################################
@@ -1232,22 +1260,21 @@ sub Getting_gaps {
# The filehandle and filename
sub output_sequence {
- my %args = @_;
- my $seqs_ref = delete $args{seqs};
- my $i = 1;
- #$args{-fh} = \*STDOUT unless (exists $args{-fh} or exists $args{-file});
- #if (exists $args{tempdir}) {
- # my $tempdir = delete $args{tempdir};
- # ($args{-fh}, $args{-file}) = tempfile(DIR => $tempdir, SUFFIX => ".".$args{-format})
- #}
- my $file = delete $args{-file} if (exists $args{-fh} && exists $args{-file}); # Stupid BioPerl cannot handle that both might be set...
- print %args;
- my $seq_out = Bio::SeqIO->new(%args);
- $args{-file} = $file if (defined $file);
- for my $seq (@{ $seqs_ref }) {
- $seq_out->write_seq($seq);
- }
- return ($args{-fh}, $args{-file});
+ my %args = @_;
+ my $seqs_ref = delete $args{seqs};
+ my $i = 1;
+ $args{-fh} = \*STDOUT unless (exists $args{-fh} or exists $args{-file});
+ if (exists $args{tempdir}) {
+ my $tempdir = delete $args{tempdir};
+ ($args{-fh}, $args{-file}) = tempfile(DIR => $tempdir, SUFFIX => ".".$args{-format})
+ }
+ my $file = delete $args{-file} if (exists $args{-fh} && exists $args{-file}); # Stupid BioPerl cannot handle that both might be set...
+ my $seq_out = Bio::SeqIO->new(%args);
+ $args{-file} = $file if (defined $file);
+ for my $seq (@{ $seqs_ref }) {
+ $seq_out->write_seq($seq);
+ }
+ return ($args{-fh}, $args{-file});
}
# --------------------------------------------------------------------
@@ -1265,8 +1292,10 @@ NAME
$ProgName - $ProgNameLong
SYNOPSIS
- $ProgName [Options]
-
+ $ProgName -d [Antimicrobial] [Options] < [File]
+ or
+ $ProgName -d [Antimicrobial] -i [File] [Options]
+
DESCRIPTION
Calculates antibiotic resistance genes based on a BLAST alignment of the input
sequence file and the specified allele set. If possible the ST will be
@@ -1278,40 +1307,44 @@ DESCRIPTION
alleles from the species specified with '-d'.
Notice also that the default options for BLAST are changed to suit the
- ResFinder alignment.
-
-Options:
-
- -h HELP
- Prints a message with options and information to the screen
- -d DATABASE
- The path to where you have located the database folder
- -b BLAST
- The path to the location of blast-2.2.26 if it is not added
- to the user's path (see the install guide in 'README.md')
- -i INFILE
- Your input file which needs to be preassembled partial
- or complete genomes in fasta format
- -o OUTFOLDER
- The folder you want to have your output files places.
- If not specified the program will create a folder named
- 'Output' in which the result files will be stored
- -a ANTIMICROBIAL
- Antimicrobial configuration. The options can be found
- in the config file
- -k THRESHOLD
- The threshold for % identity for example '95.00' for 95 %
- -l MIN_LENGHT
- The minimum length of the overlap ex 0.60 for an overlap of minimum 60 %
-
-Example of use with the 'database' folder is loacted in the current directory and Blast added to the user's path
-
- perl ResFinder-2.1.pl -i INFILE.fasta -o OUTFOLDER -a aminoglycoside -k 95.00 -l 0.60
+ ResFinder alignment. Although the user can change the arguments just as if he
+ was running blastall directly, this is the command line used as default
+ in this script:
-Example of use with the 'database* and *blast-2.2.26' folders loacted in other directories
+ $BLASTALL $CMD
- perl ResFinder-2.1.pl -d path/to/database -b path/to/blast-2.2.26 -i INFILE.fasta -o OUTFOLDER -a aminoglycoside -k 95.00 -l 0.60
-
+ These defaults can be overridden by giving them as arguments to this
+ script, although it will likely break if anything other than '-a' is
+ changed. Example:
+
+ $ProgName -a 10 -i [File] -d [Antimicrobial Database]
+
+OPTIONS
+ Most options are simply forwarded to the call to BLAST. A few are unique
+ to this script and a few blast arguments deserve special mention.
+
+ -d [Antimicrobial database]
+ The antibiotic for which the resistance gene should be calculated.
+ Should follow a simple format, e.g. tetracycline, macrolide, etc. These are
+ in fact the core names of a .fsa located in the database directory:
+
+ $ABRES_DB
+
+ -I [Format]
+ Specify the sequence format of the input file.
+ If the input file is in another format than fasta, this can be specified
+ here. Most BioPerl sequence formats are supported.
+ Default is 'fasta'
+
+ -O [Format]
+ Specifies the format of the output.
+ If left unspecified (the default), the ST is given along with the
+ corresponding allele numbers in a tabbed format. If set to a sequence
+ format, e.g. 'tab' or 'fasta', the sequence of the alleles will instead
+ be outputted in the requested format.
+
+ -h or --help
+ Prints this text.
VERSION
Current: $Version
@@ -1320,4 +1353,4 @@ AUTHORS
Carsten Friis, carsten\@cbs.dtu.dk, Mette Voldby Larsen, metteb\@cbs.dtu.dk, Ea Zankari
EOH
-}
+}
\ No newline at end of file
=====================================
resfinder.py
=====================================
@@ -0,0 +1,552 @@
+#!/usr/bin/env python3
+from __future__ import division
+import sys
+import os
+import time
+import random
+import re
+import subprocess
+from argparse import ArgumentParser
+from tabulate import tabulate
+import collections
+
+from cgecore.blaster import Blaster
+from cgecore.cgefinder import CGEFinder
+
+
+class ResFinder(CGEFinder):
+
+ def __init__(self, db_conf_file, notes, db_path, db_path_kma=None,
+ databases=None):
+ """
+ """
+ self.db_path = db_path
+
+ if(db_path_kma is None):
+ self.db_path_kma = db_path + "/kma_indexing"
+ else:
+ self.db_path_kma = db_path_kma
+
+ self.configured_dbs = dict()
+ self.kma_db_files = None
+ self.load_db_config(db_conf_file=db_conf_file)
+
+ self.databases = []
+ self.load_databases(databases=databases)
+
+ self.phenos = dict()
+ self.load_notes(notes=notes)
+
+ self.blast_results = None
+ # self.kma_results = None
+ # self.results = None
+
+ def write_results(self, out_path, result, res_type):
+ """
+ """
+ if(res_type == ResFinder.TYPE_BLAST):
+ result_str = self.results_to_str(res_type=res_type,
+ results=result.results,
+ query_align=result.gene_align_query,
+ homo_align=result.gene_align_homo,
+ sbjct_align=result.gene_align_sbjct)
+ elif(res_type == ResFinder.TYPE_KMA):
+ result_str = self.results_to_str(res_type=res_type,
+ results=result)
+
+ with open(out_path + "/results_tab.txt", "w") as fh:
+ fh.write(result_str[0])
+ with open(out_path + "/results_table.txt", "w") as fh:
+ fh.write(result_str[1])
+ with open(out_path + "/results.txt", "w") as fh:
+ fh.write(result_str[2])
+ with open(out_path + "/Resistance_gene_seq.fsa", "w") as fh:
+ fh.write(result_str[3])
+ with open(out_path + "/Hit_in_genome_seq.fsa", "w") as fh:
+ fh.write(result_str[4])
+
+ def blast(self, inputfile, out_path, min_cov=0.9, threshold=0.6,
+ blast="blastn"):
+ """
+ """
+ blast_run = Blaster(inputfile=inputfile, databases=self.databases,
+ db_path=self.db_path, out_path=out_path,
+ min_cov=min_cov, threshold=threshold, blast=blast)
+ self.blast_results = blast_run.results
+ return blast_run
+
+ def results_to_str(self, res_type, results, query_align=None,
+ homo_align=None, sbjct_align=None):
+
+ if(res_type != ResFinder.TYPE_BLAST and res_type != ResFinder.TYPE_KMA):
+ sys.exit("resfinder.py error: result method was not provided or not "
+ "recognized.")
+
+ # Write the header for the tab file
+ tab_str = ("Resistance gene\tIdentity\tAlignment Length/Gene Length\t"
+ "Coverage\tPosition in reference\tContig\t"
+ "Position in contig\tPhenotype\tAccession no.\n")
+
+ table_str = ""
+ txt_str = ""
+ ref_str = ""
+ hit_str = ""
+
+ # Getting and writing out the results
+ titles = dict()
+ rows = dict()
+ headers = dict()
+ txt_file_seq_text = dict()
+ split_print = collections.defaultdict(list)
+
+ for db in results:
+ if(db == "excluded"):
+ continue
+
+ # Clean up dbs with only excluded hits
+ no_hits = True
+ for hit in results[db]:
+ if(hit not in results["excluded"]):
+ print("hit not excluded: " + str(hit))
+ print("\tIn DB: " + str(db))
+ no_hits = False
+ break
+ if(no_hits):
+ print("No hits in: " + str(db))
+ results[db] = "No hit found"
+
+ profile = str(self.configured_dbs[db][0])
+ if results[db] == "No hit found":
+ table_str += ("%s\n%s\n\n" % (profile, results[db]))
+ else:
+ titles[db] = "%s" % (profile)
+ headers[db] = ["Resistance gene", "Identity",
+ "Alignment Length/Gene Length", "Coverage",
+ "Position in reference", "Contig",
+ "Position in contig", "Phenotype",
+ "Accession no."]
+ table_str += ("%s\n" % (profile))
+ table_str += ("Resistance gene\tIdentity\t"
+ "Alignment Length/Gene Length\tCoverage\t"
+ "Position in reference\tContig\tPosition in contig\t"
+ "Phenotype\tAccession no.\n")
+
+ rows[db] = list()
+ txt_file_seq_text[db] = list()
+
+ for hit in results[db]:
+ if(hit in results["excluded"]):
+ continue
+
+ res_header = results[db][hit]["sbjct_header"]
+ tmp = res_header.split("_")
+ gene = tmp[0]
+ acc = "_".join(tmp[2:])
+ ID = results[db][hit]["perc_ident"]
+ coverage = results[db][hit]["perc_coverage"]
+ sbjt_length = results[db][hit]["sbjct_length"]
+ HSP = results[db][hit]["HSP_length"]
+ positions_contig = "%s..%s" % (results[db][hit]["query_start"],
+ results[db][hit]["query_end"])
+ positions_ref = "%s..%s" % (results[db][hit]["sbjct_start"],
+ results[db][hit]["sbjct_end"])
+ contig_name = results[db][hit]["contig_name"]
+ pheno = self.phenos.get(gene, "Warning: gene is missing from "
+ "Notes file. Please inform curator.")
+
+ pheno = pheno.strip()
+
+ if "split_length" in results[db][hit]:
+ total_HSP = results[db][hit]["split_length"]
+ split_print[res_header].append([gene, ID, total_HSP,
+ sbjt_length, coverage,
+ positions_ref, contig_name,
+ positions_contig, pheno,
+ acc])
+ tab_str += ("%s\t%s\t%s/%s\t%s\t%s\t%s\t%s\t%s\t%s\n"
+ % (gene, ID, HSP, sbjt_length, coverage,
+ positions_ref, contig_name, positions_contig,
+ pheno, acc)
+ )
+ else:
+ # Write tabels
+ table_str += ("%s\t%.2f\t%s/%s\t%s\t%s\t%s\t%s\t%s\t%s\n"
+ % (gene, ID, HSP, sbjt_length, coverage,
+ positions_ref, contig_name,
+ positions_contig, pheno, acc)
+ )
+ tab_str += ("%s\t%.2f\t%s/%s\t%s\t%s\t%s\t%s\t%s\t%s\n"
+ % (gene, ID, HSP, sbjt_length, coverage,
+ positions_ref, contig_name, positions_contig,
+ pheno, acc)
+ )
+
+ # Saving the output to write the txt result table
+ hsp_length = "%s/%s" % (HSP, sbjt_length)
+ rows[db].append([gene, ID, hsp_length, coverage,
+ positions_ref, contig_name, positions_contig,
+ pheno, acc])
+
+ # Writing subjet/ref sequence
+ if(res_type == ResFinder.TYPE_BLAST):
+ ref_seq = sbjct_align[db][hit]
+ elif(res_type == ResFinder.TYPE_KMA):
+ ref_seq = results[db][hit]["sbjct_string"]
+
+ ref_str += (">%s_%s\n" % (gene, acc))
+ for i in range(0, len(ref_seq), 60):
+ ref_str += ("%s\n" % (ref_seq[i:i + 60]))
+
+ # Getting the header and text for the txt file output
+ sbjct_start = results[db][hit]["sbjct_start"]
+ sbjct_end = results[db][hit]["sbjct_end"]
+ text = ("%s, ID: %.2f %%, Alignment Length/Gene Length: %s/%s, "
+ "Coverage: %s, "
+ "Positions in reference: %s..%s, Contig name: %s, "
+ "Position: %s" % (gene, ID, HSP, sbjt_length, coverage,
+ sbjct_start, sbjct_end, contig_name,
+ positions_contig))
+ hit_str += (">%s\n" % text)
+
+ # Writing query/hit sequence
+ if(res_type == ResFinder.TYPE_BLAST):
+ hit_seq = query_align[db][hit]
+ elif(res_type == ResFinder.TYPE_KMA):
+ hit_seq = results[db][hit]["query_string"]
+
+ for i in range(0, len(hit_seq), 60):
+ hit_str += ("%s\n" % (hit_seq[i:i + 60]))
+
+ # Saving the output to print the txt result file allignemts
+ if(res_type == ResFinder.TYPE_BLAST):
+ txt_file_seq_text[db].append((text, ref_seq,
+ homo_align[db][hit], hit_seq))
+ elif(res_type == ResFinder.TYPE_KMA):
+ txt_file_seq_text[db].append((text, ref_seq,
+ results[db][hit]["homo_string"],
+ hit_seq))
+
+ for res in split_print:
+ gene = split_print[res][0][0]
+ ID = split_print[res][0][1]
+ HSP = split_print[res][0][2]
+ sbjt_length = split_print[res][0][3]
+ coverage = split_print[res][0][4]
+ positions_ref = split_print[res][0][5]
+ contig_name = split_print[res][0][6]
+ positions_contig = split_print[res][0][7]
+ pheno = split_print[res][0][8]
+ acc = split_print[res][0][9]
+
+ total_coverage = 0
+
+ for i in range(1, len(split_print[res])):
+ ID = "%s, %.2f" % (ID, split_print[res][i][1])
+ total_coverage += split_print[res][i][4]
+ positions_ref = positions_ref + ", " + split_print[res][i][5]
+ contig_name = contig_name + ", " + split_print[res][i][6]
+ positions_contig = (positions_contig + ", "
+ + split_print[res][i][7])
+
+ table_str += ("%s\t%s\t%s/%s\t%s\t%s\t%s\t%s\t%s\t%s\n"
+ % (gene, ID, HSP, sbjt_length, coverage,
+ positions_ref, contig_name, positions_contig,
+ pheno, acc)
+ )
+
+ hsp_length = "%s/%s" % (HSP, sbjt_length)
+
+ rows[db].append([gene, ID, hsp_length, coverage, positions_ref,
+ contig_name, positions_contig, pheno, acc])
+
+ table_str += ("\n")
+
+ # Writing table txt for all hits
+ for db in titles:
+ # Txt file table
+ table = ResFinder.text_table(titles[db], headers[db], rows[db])
+ txt_str += table
+
+ # Writing alignment txt for all hits
+ for db in titles:
+ # Txt file alignments
+ txt_str += ("##################### %s #####################\n"
+ % (db))
+ for text in txt_file_seq_text[db]:
+ txt_str += ("%s\n\n" % (text[0]))
+ for i in range(0, len(text[1]), 60):
+ txt_str += ("%s\n" % (text[1][i:i + 60]))
+ txt_str += ("%s\n" % (text[2][i:i + 60]))
+ txt_str += ("%s\n\n" % (text[3][i:i + 60]))
+ txt_str += ("\n")
+
+ return (tab_str, table_str, txt_str, ref_str, hit_str)
+
+ @staticmethod
+ def text_table(title, headers, rows, table_format='psql'):
+ ''' Create text table
+
+ USAGE:
+ >>> from tabulate import tabulate
+ >>> title = 'My Title'
+ >>> headers = ['A','B']
+ >>> rows = [[1,2],[3,4]]
+ >>> print(text_table(title, headers, rows))
+ +-----------+
+ | My Title |
+ +-----+-----+
+ | A | B |
+ +=====+=====+
+ | 1 | 2 |
+ | 3 | 4 |
+ +-----+-----+
+ '''
+ # Create table
+ table = tabulate(rows, headers, tablefmt=table_format)
+ # Prepare title injection
+ width = len(table.split('\n')[0])
+ tlen = len(title)
+ if tlen + 4 > width:
+ # Truncate oversized titles
+ tlen = width - 4
+ title = title[:tlen]
+ spaces = width - 2 - tlen
+ left_spacer = ' ' * int(spaces / 2)
+ right_spacer = ' ' * (spaces - len(left_spacer))
+ # Update table with title
+ table = '\n'.join(['+%s+' % ('-' * (width - 2)),
+ '|%s%s%s|' % (left_spacer, title, right_spacer),
+ table, '\n'])
+ return table
+
+ def load_notes(self, notes):
+ with open(notes, 'r') as f:
+ for line in f:
+ line = line.strip()
+ if line.startswith("#"):
+ continue
+ else:
+ tmp = line.split(":")
+
+ self.phenos[tmp[0]] = "%s %s" % (tmp[1], tmp[2])
+
+ if(tmp[2].startswith("Alternate name; ")):
+ self.phenos[tmp[2][16:]] = "%s %s" % (tmp[1], tmp[2])
+
+ def load_databases(self, databases):
+ """
+ """
+ # Check if databases and config file are correct/correponds
+ if databases is '':
+ sys.exit("Input Error: No database was specified!\n")
+ elif databases is None:
+ # Choose all available databases from the config file
+ self.databases = self.configured_dbs.keys()
+ else:
+ # Handle multiple databases
+ databases = databases.split(',')
+ # Check that the ResFinder DBs are valid
+ for db_prefix in databases:
+ if db_prefix in self.configured_dbs:
+ self.databases.append(db_prefix)
+ else:
+ sys.exit("Input Error: Provided database was not "
+ "recognised! (%s)\n" % db_prefix)
+
+ def load_db_config(self, db_conf_file):
+ """
+ """
+ extensions = []
+ with open(db_conf_file) as f:
+ for line in f:
+ line = line.strip()
+
+ if not line:
+ continue
+
+ if line[0] == '#':
+ if 'extensions:' in line:
+ extensions = [s.strip() for s
+ in line.split('extensions:')[-1].split(',')]
+ continue
+
+ tmp = line.split('\t')
+ if len(tmp) != 3:
+ sys.exit(("Input Error: Invalid line in the database"
+ " config file!\nA proper entry requires 3 tab "
+ "separated columns!\n%s") % (line))
+
+ db_prefix = tmp[0].strip()
+ name = tmp[1].split('#')[0].strip()
+ description = tmp[2]
+
+ # Check if all db files are present
+ for ext in extensions:
+ db = "%s/%s.%s" % (self.db_path, db_prefix, ext)
+ if not os.path.exists(db):
+ sys.exit(("Input Error: The database file (%s) "
+ "could not be found!") % (db_path))
+
+ if db_prefix not in self.configured_dbs:
+ self.configured_dbs[db_prefix] = []
+ self.configured_dbs[db_prefix].append(name)
+
+ if len(self.configured_dbs) == 0:
+ sys.exit("Input Error: No databases were found in the "
+ "database config file!")
+
+ # Loading paths for KMA databases.
+ for drug in self.configured_dbs:
+ kma_db = self.db_path_kma + drug
+ self.kma_db_files = [kma_db + ".b", kma_db + ".length.b",
+ kma_db + ".name.b", kma_db + ".align.b"]
+
+
+if __name__ == '__main__':
+
+ ##########################################################################
+ # PARSE COMMAND LINE OPTIONS
+ ##########################################################################
+
+ parser = ArgumentParser()
+ parser.add_argument("-i", "--inputfile",
+ dest="inputfile",
+ help="Input file",
+ default=None)
+ parser.add_argument("-1", "--fastq1",
+ help="Raw read data file 1.",
+ default=None)
+ parser.add_argument("-2", "--fastq2",
+ help="Raw read data file 2 (only required if data is \
+ paired-end).",
+ default=None)
+ parser.add_argument("-o", "--outputPath",
+ dest="out_path",
+ help="Path to blast output",
+ default='')
+
+ parser.add_argument("-b", "--blastPath",
+ dest="blast_path",
+ help="Path to blast",
+ default='blastn')
+ parser.add_argument("-p", "--databasePath",
+ dest="db_path",
+ help="Path to the databases",
+ default='')
+
+ parser.add_argument("-k", "--kmaPath",
+ dest="kma_path",
+ help="Path to KMA",
+ default=None)
+ parser.add_argument("-q", "--databasePathKMA",
+ dest="db_path_kma",
+ help="Path to the directories containing the KMA \
+ indexed databases. Defaults to the directory \
+ 'kma_indexing' inside the databasePath \
+ directory.",
+ default=None)
+
+ parser.add_argument("-d", "--databases",
+ dest="databases",
+ help="Databases chosen to search in - if none are \
+ specified all are used",
+ default=None)
+ parser.add_argument("-l", "--min_cov",
+ dest="min_cov",
+ help="Minimum coverage",
+ default=0.60)
+ parser.add_argument("-t", "--threshold",
+ dest="threshold",
+ help="Blast threshold for identity",
+ default=0.90)
+ args = parser.parse_args()
+
+ ##########################################################################
+ # MAIN
+ ##########################################################################
+
+ # Defining varibales
+
+ min_cov = args.min_cov
+ threshold = args.threshold
+
+ # Check if valid database is provided
+ if args.db_path is None:
+ sys.exit("Input Error: No database directory was provided!\n")
+ elif not os.path.exists(args.db_path):
+ sys.exit("Input Error: The specified database directory does not "
+ "exist!\n")
+ else:
+ # Check existence of config file
+ db_config_file = '%s/config' % (args.db_path)
+ if not os.path.exists(db_config_file):
+ sys.exit("Input Error: The database config file could not be found!")
+ # Save path
+ db_path = args.db_path
+
+ # Check existence of notes file
+ notes_path = "%s/notes.txt" % (args.db_path)
+ if not os.path.exists(notes_path):
+ sys.exit('Input Error: notes.txt not found! (%s)' % (notes_path))
+
+ # Check for input
+ if args.inputfile is None and args.fastq1 is None:
+ sys.exit("Input Error: No Input were provided!\n")
+
+ # Check if valid input file for assembly is provided
+ if args.inputfile:
+ if not os.path.exists(args.inputfile):
+ sys.exit("Input Error: Input file does not exist!\n")
+ else:
+ inputfile = args.inputfile
+ else:
+ inputfile = None
+
+ # Check if valid input files for raw data
+ if args.fastq1:
+
+ if not os.path.exists(args.fastq1):
+ sys.exit("Input Error: fastq1 file does not exist!\n")
+ else:
+ input_fastq1 = args.fastq1
+
+ if args.fastq2:
+ if not os.path.exists(args.fastq2):
+ sys.exit("Input Error: fastq2 file does not exist!\n")
+ else:
+ input_fastq2 = args.fastq2
+ else:
+ input_fastq2 = None
+ else:
+ input_fastq1 = None
+ input_fastq2 = None
+
+ # Check if valid output directory is provided
+ if not os.path.exists(args.out_path):
+ sys.exit("Input Error: Output dirctory does not exists!\n")
+ else:
+ out_path = args.out_path
+
+ # If input is an assembly, then use BLAST
+ if(inputfile is not None):
+ finder = ResFinder(db_conf_file=db_config_file, databases=args.databases,
+ db_path=db_path, notes=notes_path)
+
+ blast_run = finder.blast(inputfile=inputfile, out_path=out_path,
+ min_cov=min_cov, threshold=threshold,
+ blast=args.blast_path)
+
+ finder.write_results(out_path=out_path, result=blast_run,
+ res_type=ResFinder.TYPE_BLAST)
+
+ # If the input is raw read data, then use KMA
+ elif(input_fastq1 is not None):
+ finder = ResFinder(db_conf_file=db_config_file, databases=args.databases,
+ db_path=db_path, db_path_kma=args.db_path_kma,
+ notes=notes_path)
+
+ # if input_fastq2 is None, it is ignored by the kma method.
+ kma_run = finder.kma(inputfile_1=input_fastq1, inputfile_2=input_fastq2)
+
+ finder.write_results(out_path=out_path, result=kma_run,
+ res_type=ResFinder.TYPE_KMA)
View it on GitLab: https://salsa.debian.org/med-team/resfinder/commit/5079ee1cc172d249b02026dd2cd85b4ff51a596e
--
View it on GitLab: https://salsa.debian.org/med-team/resfinder/commit/5079ee1cc172d249b02026dd2cd85b4ff51a596e
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20181018/106e312c/attachment-0001.html>
More information about the debian-med-commit
mailing list