[med-svn] [Git][med-team/gubbins][master] 4 commits: routine-update: New upstream version
Andreas Tille (@tille)
gitlab at salsa.debian.org
Wed Jul 12 20:36:47 BST 2023
Andreas Tille pushed to branch master at Debian Med / gubbins
Commits:
a780ff6c by Andreas Tille at 2023-07-12T21:30:41+02:00
routine-update: New upstream version
- - - - -
6930509c by Andreas Tille at 2023-07-12T21:30:42+02:00
New upstream version 3.3.0
- - - - -
f117de3e by Andreas Tille at 2023-07-12T21:30:50+02:00
Update upstream source from tag 'upstream/3.3.0'
Update to upstream version '3.3.0'
with Debian dir b9cc26b0d8c0608a7e76714d879f7b0839326bc2
- - - - -
af53dc3f by Andreas Tille at 2023-07-12T21:34:37+02:00
routine-update: Ready to upload to unstable
- - - - -
18 changed files:
- CHANGELOG.md
- VERSION
- debian/changelog
- debian/control
- docs/gubbins_manual.md
- docs/gubbins_tutorial.md
- environment.yml
- python/gubbins/common.py
- + python/gubbins/tests/data/corrected_alignment.aln
- + python/gubbins/tests/data/corrected_alignment.csv
- + python/gubbins/tests/data/invalid_alignment.aln
- + python/gubbins/tests/data/multiple_recombinations_annotation.gff
- + python/gubbins/tests/data/multiple_recombinations_rec_per_gene.tab
- python/gubbins/tests/test_python_scripts.py
- + python/scripts/count_recombinations_per_gene.py
- python/scripts/generate_ska_alignment.py
- python/scripts/gubbins_alignment_checker.py
- python/setup.py
Changes:
=====================================
CHANGELOG.md
=====================================
@@ -1,6 +1,7 @@
# Change Log
-## [v3.3]
+## [v3.3](https://github.com/nickjcroucher/gubbins/releases/tag/v3.3) (2022-10-6)
+[Full Changelog](https://github.com/sanger-pathogens/gubbins/compare/v3.2.1...v3.3)
- Enable time calibration of final tree
- Use separate models for tree construction and SNP reconstruction
=====================================
VERSION
=====================================
@@ -1 +1 @@
-3.2.1
+3.3.0
=====================================
debian/changelog
=====================================
@@ -1,4 +1,4 @@
-gubbins (3.3-1) UNRELEASED; urgency=medium
+gubbins (3.3.0-1) unstable; urgency=medium
* New upstream version
* Standards-Version: 4.6.2 (routine-update)
@@ -8,7 +8,7 @@ gubbins (3.3-1) UNRELEASED; urgency=medium
* Build-Depends: tox and do not override dh_auto_test
* Test-Depends / Recommends: python3-numba
- -- Andreas Tille <tille at debian.org> Tue, 07 Feb 2023 10:04:38 +0100
+ -- Andreas Tille <tille at debian.org> Wed, 12 Jul 2023 21:30:57 +0200
gubbins (2.4.1-5) unstable; urgency=medium
=====================================
debian/control
=====================================
@@ -10,7 +10,7 @@ Build-Depends: debhelper-compat (= 13),
pkg-config,
fasttree,
iqtree,
- raxml (>= 8.2.12+dfsg-5),
+ raxml,
python3-setuptools,
zlib1g-dev,
check,
=====================================
docs/gubbins_manual.md
=====================================
@@ -52,13 +52,13 @@ These will automatically be installed within the conda environment. Please cite
The required input file for Gubbins is a whole genome FASTA alignment. Each sequence should have a unique identifier, and special characters should be avoided. The sequences should only use the characters `ACGT` (DNA bases), `N` (unknown base) or `-` (alignment gap). If a starting tree is to be included, then this should be a Newick format.
-The alignment is most easily generated through mapping sequences against a reference sequence. This can be achieved with the popular mapping software Snippy, following the instructions on the relevant [Github repository](https://github.com/tseemann/snippy). Alternatively, the alignment can be generated using the Gubbins script `generate_ska_alignment.py`, which creates an alignment using [SKA](https://github.com/simonrharris/SKA), which can be installed through `conda install -c bioconda ska` (SKA is included when installing Gubbins through conda). For instance,
+The alignment is most easily generated through mapping sequences against a reference sequence. This can be achieved with the popular mapping software Snippy, following the instructions on the relevant [Github repository](https://github.com/tseemann/snippy). Alternatively, the alignment can be generated using the Gubbins script `generate_ska_alignment.py`, which creates an alignment using [SKA2](https://github.com/bacpop/ska.rust), which can be installed through `conda install -c bioconda ska2` (SKA2 is included when installing Gubbins through conda). For instance,
```
-generate_ska_alignment.py --reference seq_X.fa --fasta fasta_files.list --fastq fastq_files.list --out out.aln
+generate_ska_alignment.py --reference seq_X.fa --input input.list --out out.aln
```
-Where `fasta_files.list` is a two column tab-delimited file containing sequence names (in the first column) and FASTA sequence assembly file paths (in the second column); `fastq_files.list` contains the same for unassembled FASTQ-format read data.
+Where `input.list` is a tab-delimited file with one row per isolate. The first column should be the isolate name, and the subsequent entries on the same row should contain the corresponding sequence data - this may be a single FASTA assembly, or multiple FASTQ raw read files. The alignment will be reformatted for Gubbins (e.g. modifying isolate names and removing non-N IUPAC ambiguity codes) by this script.
The alignment can then be analysed with Gubbins:
=====================================
docs/gubbins_tutorial.md
=====================================
@@ -10,10 +10,10 @@ tar xfz PMEN3_assemblies.tar.gz
## Generating the alignment
-The draft genomes can be aligned to the reference using [SKA](https://github.com/simonrharris/SKA). This is first installed through conda:
+The draft genomes can be aligned to the reference using [SKA2](https://github.com/bacpop/ska.rust). This is first installed through conda:
```
-conda install -c bioconda ska
+conda install -c bioconda ska2
```
An index file is then generated to name the isolates to be aligned:
@@ -38,7 +38,7 @@ The `PMEN3_isolates.list` should contain this text:
The alignment is then constructed using the Gubbins script `generate_ska_alignment.py`:
```
-generate_ska_alignment.py --reference RMV4.fa --fasta PMEN3_isolates.list --out PMEN3.aln
+generate_ska_alignment.py --reference RMV4.fa --input PMEN3_isolates.list --out PMEN3.aln
```
## Analysis with Gubbins
=====================================
environment.yml
=====================================
@@ -25,7 +25,7 @@ dependencies:
- dendropy
- biopython
- multiprocess
- - numpy
+ - numpy<=1.23.0
- numba
# phylogenetics
- raxml=8.2.12
@@ -34,4 +34,4 @@ dependencies:
- raxml-ng=1.0.1
- fasttree=2.1.10
# Scripts
- - ska
+ - ska2
=====================================
python/gubbins/common.py
=====================================
@@ -557,7 +557,7 @@ def parse_and_run(input_args, program_description=""):
subprocess.check_call(dating_command, shell=True)
except subprocess.SubprocessError:
# If this fails, continue to generate rest of output
- sys.write("Failed running tree time calibration with LSD.")
+ sys.stderr.write("Failed running tree time calibration with LSD.")
input_args.date = None
# Create the final output
=====================================
python/gubbins/tests/data/corrected_alignment.aln
=====================================
@@ -0,0 +1,8 @@
+>sequence1
+AAANNNNAAA
+>sequence2
+CCCCCCCCCC
+>sequence3
+GGGGGGGGGG
+>sequence4
+TTTTTTTTTT
=====================================
python/gubbins/tests/data/corrected_alignment.csv
=====================================
@@ -0,0 +1,5 @@
+isolate,A,C,G,N,T
+sequence1,6,0,0,4,0
+sequence2,0,10,0,0,0
+sequence3,0,0,10,0,0
+sequence4,0,0,0,0,10
=====================================
python/gubbins/tests/data/invalid_alignment.aln
=====================================
@@ -0,0 +1,8 @@
+>sequence1
+AAARRRRAAA
+>sequence2
+CCCCCCCCCC
+>sequence3
+GGGGGGGGGG
+>sequence4
+TTTTTTTTTT
=====================================
python/gubbins/tests/data/multiple_recombinations_annotation.gff
=====================================
@@ -0,0 +1,5 @@
+##gff-version 3
+##sequence-region CONTIG1 1 20
+##sequence-region CONTIG2 1 200
+CONTIG2 EMBL CDS 50 80 0.000 + 0 ID="CDS1";transl_table=11;gene="CDS1";locus_tag="CDS1";product="protein"
+CONTIG2 EMBL CDS 80 120 0.000 + 0 ID="CDS2";transl_table=11;gene="CDS2";locus_tag="CDS2";product="protein"
=====================================
python/gubbins/tests/data/multiple_recombinations_rec_per_gene.tab
=====================================
@@ -0,0 +1,3 @@
+CDS GeneName Start End NumRec NumAffectedTaxa AffectedTaxa
+CDS1 CDS1 70 100 3 9 sequence_1;sequence_2;sequence_3;sequence_4;sequence_5;sequence_6;sequence_7;sequence_8;sequence_9
+CDS2 CDS2 100 140 1 1 sequence_10
=====================================
python/gubbins/tests/test_python_scripts.py
=====================================
@@ -19,17 +19,28 @@ working_dir = os.path.join(modules_dir, 'tests')
class TestPythonScripts(unittest.TestCase):
- ## Test the alignment_checker script
-
+ ## Test the alignment_checker script
def test_alignment_checker(self):
small_aln = os.path.join(data_dir, "valid_alignment.aln")
- output_file = os.path.join(working_dir, "valid_alignment_test")
output_csv = os.path.join(working_dir, "valid_alignment_test.csv")
test_csv = os.path.join(data_dir, "test_valid_output.csv")
- aln_cmd = "gubbins_alignment_checker.py --aln " + small_aln + " --out " + output_file
+ aln_cmd = "gubbins_alignment_checker.py --aln " + small_aln + " --out " + output_csv
+ subprocess.check_call(aln_cmd, shell=True)
+ assert self.md5_check(output_csv, test_csv)
+ os.remove(output_csv)
+
+ def test_alignment_reformatting(self):
+ invalid_aln = os.path.join(data_dir, "invalid_alignment.aln")
+ output_file = os.path.join(working_dir, "invalid_alignment_test.aln")
+ output_csv = os.path.join(working_dir, "valid_alignment_test.csv")
+ test_aln = os.path.join(data_dir, "corrected_alignment.aln")
+ test_csv = os.path.join(data_dir, "corrected_alignment.csv")
+ aln_cmd = "gubbins_alignment_checker.py --aln " + invalid_aln + " --out-aln " + output_file + " --out " + output_csv
subprocess.check_call(aln_cmd, shell=True)
assert self.md5_check(output_csv, test_csv)
+ assert self.md5_check(output_file, test_aln)
os.remove(output_csv)
+ os.remove(output_file)
## Test the clade extraction script
def test_clade_extraction(self):
@@ -82,9 +93,9 @@ class TestPythonScripts(unittest.TestCase):
ref_seq = os.path.join(preprocess_dir, 'sequence_t1.fasta')
aln_out = os.path.join(preprocess_dir, 'ska_test_aln.aln')
# Script name
- ska_cmd = "generate_ska_alignment.py --fasta " + fasta_loc +\
+ ska_cmd = "generate_ska_alignment.py --input " + fasta_loc +\
" --reference " + ref_seq + " --out " + aln_out +\
- " --k 6"
+ " --k 7"
subprocess.check_call(ska_cmd, shell=True)
## Now run gubbins on the aln and check all the output is produced
parser = run_gubbins.parse_input_args()
@@ -122,6 +133,17 @@ class TestPythonScripts(unittest.TestCase):
os.remove(out_gff)
os.remove(out_tree)
+ # Test clade file extraction script
+ def test_recombination_counting_per_gene(self):
+ multiple_gff = os.path.join(data_dir, "multiple_recombinations_gubbins.recombination_predictions.gff")
+ annnotation_gff = os.path.join(data_dir, "multiple_recombinations_annotation.gff")
+ out_tab = os.path.join(data_dir, "test_multiple_recombinations_annotation.gff")
+ check_tab = os.path.join(data_dir, "multiple_recombinations_rec_per_gene.tab")
+ rec_count_cmd = "count_recombinations_per_gene.py --rec-gff " + multiple_gff + " --anno-gff " + annnotation_gff + " --out " + out_tab
+ subprocess.check_call(rec_count_cmd, shell=True)
+ assert self.md5_check(out_tab, check_tab)
+ os.remove(out_tab)
+
@staticmethod
def check_for_output_files(prefix):
assert os.path.exists(prefix + '.summary_of_snp_distribution.vcf')
=====================================
python/scripts/count_recombinations_per_gene.py
=====================================
@@ -0,0 +1,131 @@
+#! python
+
+# encoding: utf-8
+# Wellcome Trust Sanger Institute and Imperial College London
+# Copyright (C) 2023 Wellcome Trust Sanger Institute and Imperial College London
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301, USA.
+#
+
+# Generic imports
+import sys
+import argparse
+import re
+
+# command line parsing
+def get_options():
+
+ parser = argparse.ArgumentParser(description='Mask recombinant regions detected by'
+ ' Gubbins from the input alignment',
+ prog='mask_gubbins_aln')
+
+ # input options
+ parser.add_argument('--rec-gff',
+ help = 'GFF of recombinant regions detected by Gubbins',
+ required = True)
+ parser.add_argument('--anno-gff',
+ help = 'GFF of annotation corresponding to the input alignment',
+ required = True)
+ parser.add_argument('--out',
+ help = 'Output file name',
+ required = True)
+ return parser.parse_args()
+
+# main code
+if __name__ == "__main__":
+
+ # Get command line options
+ args = get_options()
+
+ # Read recombinant regions from GFF
+ rec_start = []
+ rec_end = []
+ rec_affected = []
+ taxon_pattern = re.compile('taxa="([^"]*)"')
+ with open(args.rec_gff,'r') as gff_file:
+ for line in gff_file.readlines():
+ if not line.startswith('##'):
+ # Calculate stats
+ info = line.rstrip().split('\t')
+ start = int(info[3])
+ end = int(info[4])
+ taxon_set = set(taxon_pattern.search(info[8]).group(1).split())
+ # Record stats
+ rec_start.append(start)
+ rec_end.append(end)
+ rec_affected.append(taxon_set)
+
+ # Read annotation from GFF
+ cds_start = []
+ cds_end = []
+ cds_name = []
+ cds_index = []
+ contig_starts = {}
+ cumulative_length = 0
+ with open(args.anno_gff,'r') as gff_file:
+ for line in gff_file.readlines():
+ if line.startswith('##sequence-region'):
+ info = line.rstrip().split(' ')
+ contig_starts[info[1]] = int(cumulative_length)
+ cumulative_length = contig_starts[info[1]] + int(info[3])
+ elif not line.startswith('##'):
+ info = line.rstrip().split('\t')
+ if len(info) >= 7:
+ if info[2] == 'CDS':
+ name = '-'
+ index = None
+ cds_data = info[8].replace('"','').split(';')
+ for datum in cds_data:
+ qualifier, value = datum.split('=')
+ if qualifier == 'locus_tag':
+ index = value
+ elif qualifier == 'gene':
+ name = value
+ if index is not None:
+ contig_start = contig_starts[info[0]]
+ cds_start.append(int(info[3]) + contig_start)
+ cds_end.append(int(info[4]) + contig_start)
+ cds_index.append(index)
+ cds_name.append(name)
+
+ # Run checks on whether genes have been detected
+ if len(cds_index) == 0:
+ sys.stderr.write('No genes detected in annotation\n')
+ sys.exit()
+ elif not (len(cds_start) == len(cds_end) and len(cds_start) == len(cds_index) and len(cds_start) == len(cds_name)):
+ sys.stderr.write('Error with extraction of information on annotation\n')
+ sys.exit()
+ elif not (len(rec_start) == len(rec_end) and len(rec_start) == len(rec_affected)):
+ sys.stderr.write('Error with extraction of information on recombination\n')
+ sys.exit()
+
+ # Write out summary statistics
+ with open(args.out,'w') as out_file:
+ out_file.write('CDS\tGeneName\tStart\tEnd\tNumRec\tNumAffectedTaxa\tAffectedTaxa\n')
+ for cnum,cindex in enumerate(cds_index):
+ num_rec = 0
+ affected_taxa = set()
+ cstart = cds_start[cnum]
+ cend = cds_end[cnum]
+ for rnum,rstart in enumerate(rec_start):
+ rend = rec_end[rnum]
+ if (rstart < cend and rend > cend) or \
+ (rstart < cstart and rend > cstart) or \
+ (rstart > cstart and rend < cend):
+ num_rec = num_rec + 1
+ affected_taxa = affected_taxa.union(rec_affected[rnum])
+ sorted_affected_taxa = list(affected_taxa)
+ sorted_affected_taxa.sort()
+ out_file.write(cds_index[cnum] + '\t' + cds_name[cnum] + '\t' + str(cds_start[cnum]) + '\t' + str(cds_end[cnum]) + '\t' + str(num_rec) + '\t' + str(len(affected_taxa)) + '\t' + ';'.join(sorted_affected_taxa) + '\n')
=====================================
python/scripts/generate_ska_alignment.py
=====================================
@@ -31,7 +31,7 @@ from shutil import which
# command line parsing
def get_options():
- parser = argparse.ArgumentParser(description='Generate a ska alignment from a list '
+ parser = argparse.ArgumentParser(description='Generate a ska2 alignment from a list '
'of assemblies',
prog='generate_ska_alignment')
@@ -39,12 +39,8 @@ def get_options():
parser.add_argument('--reference',
help = 'Name of reference sequence to use for alignment',
required = True)
- parser.add_argument('--fasta',
- help = 'Two column list of names and FASTA files to include in alignment',
- default = None,
- required = False)
- parser.add_argument('--fastq',
- help = 'Two/three column list of names and of FASTQ files to include in alignment',
+ parser.add_argument('--input',
+ help = 'List of sequence data; one row per isolate, with first column being the isolate name',
default = None,
required = False)
parser.add_argument('--out',
@@ -53,7 +49,7 @@ def get_options():
parser.add_argument('--k',
help = 'Split kmer size',
type = int,
- default = 15)
+ default = 17)
parser.add_argument('--threads',
help = 'Number of threads to use',
type = int,
@@ -89,67 +85,37 @@ if __name__ == "__main__":
# Check if ska is installed
if which('ska') is None:
- sys.stderr.write('SKA cannot be found on PATH; install with "conda install ska"')
+ sys.stderr.write('ska2 cannot be found on PATH; install with "conda install ska2"')
sys.exit(1)
- # Dictionary for sequence names
- seq_names = {}
- all_names = []
-
- # Make split kmers from assemblies
- fasta_names = []
- if args.fasta is not None:
- # Read in FASTA assemblies
- with open(args.fasta,'r') as fasta_list:
- for line in fasta_list.readlines():
- info = line.strip().split()
- if os.path.isfile(info[1]):
- fasta_names.append(info[1])
- seq_names[info[1]] = info[0]
- all_names.append(info[0])
- else:
- sys.stderr.write('Unable to find file ' + info[1] + '\n')
- # Sketch into split kmers
- with Pool(processes = args.threads) as pool:
- pool.map(partial(map_fasta_sequence,
- k = args.k,
- names = seq_names),
- fasta_names)
-
- # Make split kmers from FASTQs
- fastq_names = []
- if args.fastq is not None:
- # Read in FASTQ reads
- with open(args.fastq,'r') as fastq_list:
- for line in fastq_list.readlines():
- info = line.strip().split()
- if os.path.isfile(info[1]) and os.path.isfile(info[2]):
- fastq_names.append((info[1],info[2]))
- seq_names[info[1]] = info[0]
- all_names.append(info[0])
- else:
- sys.stderr.write('Unable to find files ' + info[1] + ' and ' + info[2] + '\n')
- # Sketch into split kmers
- with Pool(processes = args.threads) as pool:
- return_codes = pool.map(partial(map_fastq_sequence,
- k = args.k,
- names = seq_names),
- fastq_names)
-
- # Map sequences
- with Pool(processes = args.threads) as pool:
- return_codes = pool.map(partial(ska_map_sequences,
- k = args.k,
- ref = args.reference),
- all_names)
-
- # Generate alignment
- subprocess.check_output('cat ' + ' '.join([seq + '.map.aln' for seq in all_names]) + ' > ' + args.out,
+ # Check if k value is acceptable:
+ if (args.k % 2) == 0 or args.k < 5 or args.k > 63:
+ sys.stderr.write('k must be odd and between 5 and 63\n')
+ sys.exit(1)
+
+ # Build ska sketch
+ subprocess.check_output('ska build -o ' + args.out + ' -k ' + str(args.k) + \
+ ' -f ' + args.input + ' --threads ' + str(args.threads),
shell = True)
+ # Run ska mapping
+ if os.path.exists(args.out + '.skf'):
+ tmp_aln = os.path.join(os.path.dirname(args.out), 'tmp.' + os.path.basename(args.out))
+ subprocess.check_output('ska map -o ' + tmp_aln + ' --threads ' + str(args.threads) + ' ' + \
+ args.reference + ' ' + args.out + '.skf',
+ shell = True)
+ else:
+ sys.stderr.write('ska building failed\n')
+ sys.exit(1)
+
+ # Clean alignment to prep for Gubbins
+ subprocess.check_output('gubbins_alignment_checker.py --aln ' + tmp_aln + ' --out-aln ' + args.out + \
+ ' --out ' + args.out + '.csv',
+ shell = True)
+
+ sys.stderr.write("Completed generating alignment with ska2 (https://github.com/bacpop/ska.rust)\n")
+
# Clean up
if not args.no_cleanup:
- subprocess.check_output('rm ' + ' '.join([seq + '.map.aln' for seq in all_names]),
- shell = True)
- subprocess.check_output('rm ' + ' '.join([seq + '.skf' for seq in all_names]),
+ subprocess.check_output('rm ' + args.out + '.skf ' + tmp_aln,
shell = True)
=====================================
python/scripts/gubbins_alignment_checker.py
=====================================
@@ -20,26 +20,35 @@
#
import argparse
-from asyncore import write
import re
from collections import Counter
def parse_input_args():
parser = argparse.ArgumentParser(
- description='Script to output bases in an alignment by isolate',
+ description='Script to evaluate and reformat an alignment prior to Gubbins analysis',
formatter_class=argparse.ArgumentDefaultsHelpFormatter)
- parser.add_argument('--aln', '-a', dest="aln",
- help='Multifasta alignment file', required=True)
- parser.add_argument('--out', '-o',
+ parser.add_argument('--aln',
+ '-a',
+ dest="aln",
+ help='Multifasta alignment filename',
+ required=True)
+ parser.add_argument('--out-aln',
+ dest="out_aln",
+ help='Reformatted alignment filename',
+ default = None,
+ required=False)
+ parser.add_argument('--out',
+ '-o',
dest="out",
- help="Out csv name for writing results too", required=True)
+ help='Output CSV filename',
+ required=True)
return parser.parse_args()
def main(input_args):
- ## Lets set up the csv
+ # Read the number of rows in the alignment
row_num = 0
tot_lines = 0
with open(input_args.aln, "r") as aln_file:
@@ -48,16 +57,29 @@ def main(input_args):
row_num += 1
tot_lines += 1
- # Going to use counter in a first pass to store sequence counts and the range of sequences
-
+ # Filter alignment if requested
+ aln_filename = input_args.aln
+ if input_args.out_aln:
+ with open(input_args.aln, "r") as aln_file, \
+ open(input_args.out_aln, "w") as out_aln_file:
+ for line in aln_file:
+ if re.search("^>", line):
+ name = line.replace("#","_").replace(":","_").replace(">","").rstrip().split()
+ out_aln_file.write('>' + name[0] + '\n')
+ else:
+ sequence = line.upper().rstrip()
+ sequence = re.sub('[^ACGTN-]','N',sequence)
+ out_aln_file.write(sequence + '\n')
+ aln_filename = input_args.out_aln
+ # Going to use counter in a first pass to store sequence counts and the range of sequences
total_base_counts = []
iso_data = []
total_headers = []
tot_length_str = str(tot_lines)
print("Running through alignment file: %s" % input_args.aln)
print()
- with open(input_args.aln, "r") as aln_file:
+ with open(aln_filename, "r") as aln_file:
for index,line in enumerate(aln_file):
num_zeros = len(tot_length_str) - len(str(index + 1))
fmt_index = (("0" * num_zeros) + str(index + 1))
@@ -78,6 +100,7 @@ def main(input_args):
## Second pass to line up all the counts across the isolates
print("")
print("Assessing counts...")
+ total_headers.sort()
isolate_bases = []
for count in total_base_counts:
iso_row = []
@@ -88,7 +111,7 @@ def main(input_args):
total_headers.insert(0, "isolate")
write_out_headers = [re.sub("-","gap",i) for i in total_headers]
print("Writing out results...")
- with open((input_args.out + ".csv"), "w") as output:
+ with open((input_args.out), "w") as output:
output.write(",".join(write_out_headers) + "\n")
for i, aln_row in enumerate(isolate_bases):
aln_row_str = list(map(str, aln_row))
=====================================
python/setup.py
=====================================
@@ -20,7 +20,8 @@ setuptools.setup(
'scripts/extract_gubbins_clade.py',
'scripts/mask_gubbins_aln.py',
'scripts/gubbins_alignment_checker.py',
- 'scripts/generate_files_for_clade_analysis.py'
+ 'scripts/generate_files_for_clade_analysis.py',
+ 'scripts/count_recombinations_per_gene.py'
],
tests_require=[
"pytest >= 4.6",
View it on GitLab: https://salsa.debian.org/med-team/gubbins/-/compare/c6b8c30ca01f72a149a8f694d4fb6fcca486318a...af53dc3f5396f3e377f9f5e5bdb8450b039c39c2
--
View it on GitLab: https://salsa.debian.org/med-team/gubbins/-/compare/c6b8c30ca01f72a149a8f694d4fb6fcca486318a...af53dc3f5396f3e377f9f5e5bdb8450b039c39c2
You're receiving this email because of your account on salsa.debian.org.
-------------- next part --------------
An HTML attachment was scrubbed...
URL: <http://alioth-lists.debian.net/pipermail/debian-med-commit/attachments/20230712/b531fe40/attachment-0001.htm>
More information about the debian-med-commit
mailing list