[med-svn] [abyss] 20/20: New upstream version 1.3.1
Andreas Tille
tille at debian.org
Thu Sep 21 20:46:15 UTC 2017
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to annotated tag upstream/1.3.1
in repository abyss.
commit 6c47283e98a213bd8647b0a50f83067064329faa
Author: Andreas Tille <tille at debian.org>
Date: Thu Sep 21 22:44:39 2017 +0200
New upstream version 1.3.1
---
Manual.html | 622 +++++++++++++++
abacas | 1876 ++++++++++++++++++++++++++++++++++++++++++++++
debian/abacas.1 | 104 ---
debian/changelog | 29 -
debian/compat | 1 -
debian/control | 30 -
debian/copyright | 31 -
debian/doc-base | 12 -
debian/docs | 2 -
debian/get-orig-source | 48 --
debian/install | 1 -
debian/manpages | 1 -
debian/rules | 13 -
debian/source/format | 1 -
debian/upstream/edam | 13 -
debian/upstream/metadata | 12 -
debian/watch | 3 -
style.css | 102 +++
18 files changed, 2600 insertions(+), 301 deletions(-)
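
Note on the output formats documented in the Manual.html added below: its section 10 describes the column layout of the '.crunch' and '.gaps' files. The following is a minimal Perl sketch of how those columns could be read, assuming whitespace-separated fields exactly as in the manual's sample lines. The helper names parse_crunch_line and parse_gap_line are hypothetical and are not part of abacas.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of abacas): split one .crunch line into the
# seven columns named in the manual. Any further columns are ignored.
sub parse_crunch_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return if @f < 7;    # not enough columns
    my %rec;
    @rec{qw(coverage identity ps_start ps_end contig ref_start ref_end)} = @f[0 .. 6];
    return \%rec;
}

# Hypothetical helper: split one .gaps line of the form
# "Gap <size> <ps_start> <ps_end> <ref_start> <ref_end>".
sub parse_gap_line {
    my ($line) = @_;
    my @f = split ' ', $line;
    return unless @f >= 6 && $f[0] eq 'Gap';
    my %gap;
    @gap{qw(size ps_start ps_end ref_start ref_end)} = @f[1 .. 5];
    return \%gap;
}

# Sample lines copied from the manual's examples.
my $hit = parse_crunch_line('72 100 1 23328 contig00198 1 16690 unknown NONE');
my $gap = parse_gap_line('Gap 6174 23329 29502 16691 22864');

printf "%s covers pseudomolecule %d-%d\n",
    $hit->{contig}, $hit->{ps_start}, $hit->{ps_end};
# In the manual's samples the gap size equals end - start + 1 on the pseudomolecule.
printf "gap of %d bases, span check: %d\n",
    $gap->{size}, $gap->{ps_end} - $gap->{ps_start} + 1;
```

Run on the two sample lines, this prints "contig00198 covers pseudomolecule 1-23328" and "gap of 6174 bases, span check: 6174". Both helpers return nothing for lines that do not match the expected layout, so callers can skip them.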
diff --git a/Manual.html b/Manual.html
new file mode 100644
index 0000000..46dfe4a
--- /dev/null
+++ b/Manual.html
@@ -0,0 +1,622 @@
+<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
+<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en"><head><title>ABACAS</title>
+
+<meta http-equiv="Content-Type" content="text/html;charset=iso-8859-1" />
+<link rel="stylesheet" type="text/css" href="style.css" />
+</head>
+<body>
+<div id="wrapper"><br />
+<div id="header">
+<div style="text-align: center;"><big><span style="font-weight: bold;"><br />
+<br />
+User Manual</span></big></div>
+<br />
+ <br />
+
+
+
+
+
+ <br />
+<br />
+
+
+ <br />
+</div>
+<div id="content">
+<div id="menu">
+<ul>
+<li></li>
+<li></li>
+<li></li>
+<li></li>
+</ul>
+<br />
+<br />
+<br />
+<br />
+<br />
+<br />
+<br />
+<ul>
+</ul>
+<ul>
+<li><a href="http://abacas.sourceforge.net/index.html">Home</a></li>
+<li><a href="http://abacas.sourceforge.net/documentation.html">Documentation</a></li>
+<li><a href="https://sourceforge.net/projects/abacas/files/">Download</a></li>
+<li><a href="http://sourceforge.net/tracker/?group_id=238123&atid=1105374">Bugs</a></li>
+<li><a href="http://mummer.sourceforge.net/">MUMmer</a></li>
+<li><a href="http://www.sanger.ac.uk/Software/ACT/">ACT</a></li>
+<li><a href="http://www.sanger.ac.uk/Projects/Pathogens/">Pathogen
+Genomics</a></li>
+<li><a href="http://www.sanger.ac.uk">WTSI</a></li>
+</ul>
+</div>
+<div id="stuff">
+<ol>
+<li><a href="#Overview"><span style="font-weight: bold;">Overview</span></a></li>
+<li><a href="#2._Instructions_for_use"><span style="font-weight: bold;">Instructions for use</span></a></li>
+<li><a href="#3._Requirement"><span style="font-weight: bold;">Requirement</span></a></li>
+<li><a href="#4._Usage"><span style="font-weight: bold;">Usage</span></a></li>
+<li><a href="#5._Options"><span style="font-weight: bold;">Options</span></a></li>
+<li><a href="#6._Input_files"><span style="font-weight: bold;">Input files</span></a></li>
+<li><a href="#7._Default_output_files"><span style="font-weight: bold;">Default output files</span></a></li>
+<li><a href="#8._Optional_output_files"><span style="font-weight: bold;">Optional output files</span></a></li>
+<li><a href="#9._Colour_code"><span style="font-weight: bold;">Colour code</span></a></li>
+<li><a href="#10._Explanation_of_output_files"><span style="font-weight: bold;">Explanation of output files</span></a></li>
+<li><a href="#11._Contact"><span style="font-weight: bold;">Contact </span></a></li>
+<li><span style="font-weight: bold;"><a href="#12._Test_dataset">Test dataset</a><br />
+</span></li>
+</ol>
+<span style="font-weight: bold;"><br />
+</span>
+<h2><span style="font-weight: bold;">1. Overview </span></h2>
+<style type="text/css"><!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style><br />
+<div style="text-align: justify;"><big style="font-family: Times New Roman,Times,serif;">ABACAS is
+intended to rapidly
+contiguate (align, order, orientate), visualize and design primers
+to close gaps on shotgun assembled contigs based on a reference
+sequence. It uses MUMmer to find alignment positions and identify
+syntenies of assembly contigs against the reference. The output is
+then processed to generate a pseudomolecule taking overlapping contigs
+and gaps into account. MUMmer's alignment generating programs,
+Nucmer and Promer, are used, followed by the 'delta-filter' utility
+function. Users can also run tblastx on contigs that are not used
+to generate the pseudomolecule. </big><br />
+<br />
+<big style="font-family: Times New Roman,Times,serif;">If the
+blast search or
+consecutive MUMmer alignments result in mapping of extra contigs,
+finishers can use the Artemis Comparison Tool (ACT) to easily
+modify ordering of contigs on the pseudomolecule by dragging and
+dropping contigs to the desired location. Overlapping contigs and gaps
+in the pseudomolecule are represented by "N"s. Overlapping
+contigs are often due to low quality contig ends and low complexity
+regions. ABACAS can automatically extract gaps on the
+pseudomolecule and generate primer oligos for gap closure using
+Primer3. Uniqueness of primer sets is checked by running a sensitive
+NUCmer alignment. If a quality file (contig_name.qual) exists in the
+working directory, it will be used during the primer design step</big><span style="font-family: Times New Roman,Times,serif;">.</span><br />
+</div>
+<span style="font-weight: bold;"><a name="2._Instructions_for_use"></a><br />
+</span>
+<h2><span style="font-weight: bold;">2. Instructions
+for use</span></h2><title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+--> </style>
+<ol>
+<li style="text-align: justify;"><big><font face="Times New Roman, serif">Download ABACAS from the <a href="https://sourceforge.net/projects/abacas/files/">download page</a> </font> </big></li>
+<li style="text-align: justify;"><big><font face="Times New Roman, serif">Run: abacas -r
+<reference> -q <contigs> -p
+<nucmer|promer><br />
+</font><font face="Times New Roman, serif"><b>NOTE:</b></font><font face="Times New Roman, serif"> If ABACAS cannot find <a href="http://mummer.sourceforge.net/">MUMmer</a> in
+the default path, it will prompt the user to enter the location of
+MUMmer </font> </big></li>
+<li style="text-align: justify;"><big><font face="Times New Roman, serif">ABACAS may take several
+minutes to run for large genomes/chromosomes and will produce a number
+of different output files in the working directory </font> </big></li>
+<li style="text-align: justify;"><big><font face="Times New Roman, serif">Start <a href="http://www.sanger.ac.uk/Software/ACT/">ACT</a>
+and load the sequence and comparison files as printed out by ABACAS. </font>
+</big></li>
+<li style="text-align: justify;"><big><font face="Times New Roman, serif">In ACT, load the contig names
+by going to 'File', <query>, 'Read an Entry', and select
+the file <query>_<reference>.tab </font>
+</big></li>
+<li style="text-align: justify;"><big><font face="Times New Roman, serif">You can also load the repeat
+plot for the reference which tells whether or not gaps are due to
+repetitive sequence. You can load this by going to 'Graph',
+'<reference>', 'Add User Plot', and select the file
+'<reference>.Repeats.plot' </font> </big></li>
+<li>
+<div style="text-align: justify;"><big><font face="Times New Roman, serif">The file
+'<query>.bin' contains the names of the contigs that were
+not mapped or were mapped multiple times to the reference.</font></big></div>
+</li>
+</ol>
+<h2><a name="3._Requirement"></a><span style="font-weight: bold;">3. Requirement</span></h2><title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p style="text-align: justify;"><big><font face="Times New Roman, serif">ABACAS requires MUMmer to be
+installed in the working path for ordering and orienting of contigs.
+If MUMmer is not found in the working path, users will be asked to
+provide a valid path for MUMmer. The Artemis Comparison Tool (ACT)
+should be downloaded for visualizing scaffolding of contigs. Primer
+design requires Primer3. Optionally, BLASTALL is required in order to
+run tblastx on the contigs that are not mapped using Nucmer or
+Promer.</font></big></p>
+<h2><span style="font-weight: bold;"><a name="4._Usage"></a>4. Usage</span></h2>
+
+<title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p><font face="Times New Roman, serif"><font style="font-size: 10pt;" size="2"><b>abacas
+-r <reference file: single fasta> -q <query
+sequence file:
+fasta> -p <nucmer/promer> [Options]</b></font></font></p>
+<p><big> <font face="Times New Roman, serif"> for
+contig ordering and primer design </font>
+</big></p>
+<p><big><font face="Times New Roman, serif">OR<br />
+<b>abacas -r <reference
+file: single fasta> -q
+<pseudomolecule/ordered file:
+fasta> -e </b></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+to escape contig ordering and
+go directly to primer design </font>
+</big></p>
+<p><big><font face="Times New Roman, serif">OR
+</font>
+</big></p>
+<p><font face="Times New Roman, serif"><big><b>abacas
+-h </b>for help</big></font></p>
+<p><font face="Times New Roman, serif">
+</font></p>
+<h2><span style="font-weight: bold;"><a name="5._Options"></a>5. Options</span></h2>
+<title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-d
+ use
+default NUCmer
+or PROmer parameters</i></font></big></p>
+<p><big><font face="Times New Roman, serif">
+ *ABACAS
+uses <i>--maxmatch</i> to increase mapping sensitivity
+during mapping. The option -d could
+be useful while dealing with larger genomes or when a higher
+sensitivity is not required.</font></big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-m
+ print
+ordered and
+orientated contigs to file</i></font></big></p>
+<p><big><font face="Times New Roman, serif">
+ *This
+option is helpful if
+users want to further investigate the ordering using other alignment
+algorithms such as blast. </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-b
+ print
+contigs in the
+bin file to multi-fasta file</i></font></big></p>
+<p><big><font face="Times New Roman, serif">
+ *contigs
+that are not used in
+generating the pseudomolecule will be placed in a '.bin' file. Since
+this file only contains contig names, the -b option could be used to
+print these contigs to a file for further analysis. Note that
+this
+option is required if users are interested in running a blast search.</font></big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-N
+ generate
+a
+pseudomolecule without 'N's</i></font></big></p>
+<p><big><font face="Times New Roman, serif">
+ *ABACAS
+produces a
+pseudomolecule ('.fasta' file) and fills gaps with 'N's. It also puts
+ 100 'N's between overlapping contigs. This option will
+produce another pseudomolecule without padding (.NoNs.fasta). </font>
+</big></p>
+<p><big><font face="Times New Roman, serif"><i><span style="color: red;">-i
+ default 40</span> </i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ *minimum
+percent identity
+could vary from 0 to 100 depending on the closeness of the two genomes.
+Choosing a smaller value will pull in more contigs and vice
+versa </font>
+</big></p>
+<p><big><font face="Times New Roman, serif"><i><span style="color: red;">-v
+ default 40</span> </i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ *minimum
+contig coverage: set
+a value between 0 and 100 </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-V
+ default 1
+</i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ *minimum contig coverage
+difference. Use -V 0 to place contigs randomly at one of the positions
+(in cases where a contig maps to multiple places) </font>
+</big></p>
+<p><big><font face="Times New Roman, serif"><i><span style="color: red;">-l
+ default 100</span> </i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ * contigs below this cutoff
+will not be used </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-t
+ run
+tblastx on
+contigs that are not used to generate the pseudomolecule</i></font></big></p>
+<p><big><font face="Times New Roman, serif">
+ *-t will
+run blastall on a
+fasta file of contigs in the .bin file. The option -b should be used to
+generate this file</font></big></p>
+<p><big><font face="Times New Roman, serif"><i><span style="color: red;">-g
+ file_name</span> </i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ * will
+print sequences of the
+reference that correspond with gaps on the pseudomolecule in a
+multi-fasta format </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-a
+ append
+contigs in the
+.bin file to the end of the pseudomolecule</i></font></big></p>
+<p><big><font face="Times New Roman, serif">
+ *Contigs
+could then be easily
+manipulated and re-ordered using ACT's graphical interface</font></big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-o
+ prefix
+(string)</i> </font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ *output
+files will have this
+prefix </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-P
+ pick
+primer oligos to
+close gaps </i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif"><i><span style="color: red;">-f
+ default 1000</span> </i></font>
+</big></p>
+<p><big><font face="Times New Roman, serif">
+ *number
+of flanking bases on
+either side of a gap for primer design (default 1000 bp) </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>-R
+ avoid
+running mummer</i></font></big></p>
+<p style="color: red;"><big> <font face="Times New Roman, serif"><i>-e
+ Escape
+contig ordering i.e. go to primer design </i></font>
+</big></p>
+<p style="color: red;"><font face="Times New Roman, serif"><i><big>-c
+ Reference
+sequence is circular</big> </i></font>
+</p>
+<h2><br />
+<span style="font-weight: bold;"><a name="6._Input_files"></a><a name="6._Input_files"></a>6.
+Input files</span></h2>
+<title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p><big><font face="Times New Roman, serif">Two
+fasta files containing the
+reference and query (contigs) sequences are required. The reference
+file should be in a single fasta format for speedy contig ordering
+and orientation.</font></big></p>
+<h2><a name="7._Default_output_files"></a></h2>
+<h2><br />
+</h2>
+<h2><a name="7._Default_output_files"></a>7. <span style="font-weight: bold;">Default output files</span></h2>
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p><big><font face="Times New Roman, serif">Running
+ABACAS with default
+options will generate the following files: </font>
+</big></p>
+<ol>
+<li>
+<p><big><font face="Times New Roman, serif">Ordered
+and orientated sequence file (reference_query.fasta or prefix.fasta) </font>
+</big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">Feature
+file (reference_query.tab or prefix.tab) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">Bin
+file that contains contigs that are not used in ordering
+(reference_query.bin or prefix.bin) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">Comparison
+file (reference_query.crunch or prefix.crunch) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">Gap
+information (reference_query.gaps, prefix.gaps) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">Information
+on contigs that have mapping information but could not be used in the
+ordering (unused_contigs.out) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">Feature
+file to view contigs with ambiguous mapping
+(reference.notMapped.contigs.tab). This file should be uploaded on the
+reference side of ACT view. </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+file that shows how repetitive the reference genome is
+(reference.Repeats.plot). </font> </big></p>
+</li>
+</ol>
+<p><big><font face="Times New Roman, serif">Files
+7 & 8 should be uploaded on the reference side of ACT view. </font>
+</big></p>
+<p><big><font face="Times New Roman, serif">Please
+note that contigs in
+the '.fasta' file will be reverse complemented if they are found to
+map on the reverse strand. However, the ACT view shows the initial
+orientation of these contigs i.e. they will be shown on the reverse
+strand. If you write a fasta file of the pseudomolecule from ACT, the
+resulting sequence will be a set of ordered contigs (the orientation
+will not change). It is therefore recommended to use the '.fasta'
+pseudomolecule file automatically generated for further
+investigation.</font></big></p>
+<h2><br />
+</h2>
+<h2><a name="8._Optional_output_files"></a>8. <span style="font-weight: bold;">Optional output files</span></h2>
+<title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p><big><font face="Times New Roman, serif">It
+is also possible to
+generate additional files including: </font>
+</big></p>
+<ol>
+<li>
+<p><big><font face="Times New Roman, serif">A
+list of ordered and orientated contigs in a multi-fasta format (-m).</font></big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+pseudomolecule with all unmapped contigs appended to the end for
+reordering (-a).</font></big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+pseudomolecule where the gaps are not padded with N (-N )</font></big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+multi-fasta file of all unmapped contigs (-b ) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+multi-fasta file of regions on the reference that correspond to gaps on
+pseudomolecule (-g file_name) </font> </big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+list of sense and antisense primer sets in separate files </font>
+</big></p>
+<big> </big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+list of locations where sense and antisense primers are found in two
+separate files</font></big></p>
+</li>
+<li><big><font face="Times New Roman, serif">Non-unique
+primers and primers near contig ends will be printed to a file (primers
+to exclude)<br />
+</font></big></li>
+<li>
+<p><big><font face="Times New Roman, serif">A
+standard primer3 output summary file with detailed information on
+oligos. </font> </big></p>
+</li>
+</ol>
+<p><br />
+</p>
+<h2><a name="9._Colour_code"></a>9. Colour code</h2>
+<title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p><big><font face="Times New Roman, serif">The
+feature file (file 2 in the
+default output section) uses the following colour codes: </font>
+</big></p>
+<p><big><font face="Times New Roman, serif"> <i>Dark
+blue (4):</i> contigs
+with forward orientation </font>
+</big></p>
+<p><big><font face="Times New Roman, serif"> <i>Dark
+green (3):</i> contigs
+with reverse orientation </font>
+</big></p>
+<p><big><font face="Times New Roman, serif"> <i>Sky
+blue (5):</i> contigs
+that overlap with the next contig </font>
+</big></p>
+<p><font face="Times New Roman, serif"> <big><i>Yellow
+(7):</i> contigs
+that have no hit (only added to the pseudomolecule if '-a ' is used)</big>
+</font>
+</p>
+<h2><br />
+<span style="font-weight: bold;"></span><span style="font-weight: bold;"></span></h2>
+<h2><span style="font-weight: bold;"><a name="10._Explanation_of_output_files"></a>10.
+Explanation of output files</span></h2>
+<title></title>
+
+<meta name="GENERATOR" content="OpenOffice.org 2.4 (Linux)" />
+<style type="text/css">
+<!--
+@page { size: 8.5in 11in; margin: 0.79in }
+P { margin-bottom: 0.08in }
+-->
+</style>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>Comparison file
+(.crunch
+file)</i></font></big></p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">72
+100 1 23328 contig00198 1 16690 unknown NONE </font></font>
+</p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">100
+100 29503 29782 contig00002 22865 23144 unknown NONE </font></font>
+</p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">100
+100 29952 52948 contig00087 23314 46310 unknown NONE </font></font>
+</p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">100
+100 52986 63111 contig00243 46348 56473 unknown NONE </font></font>
+</p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">100
+94 63118 63576 contig00217 56480 56938 unknown NONE </font></font>
+</p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">100
+100 63775 63932 contig00224 57137 57294 unknown NONE </font></font>
+</p>
+<p style="margin-left: 0.49in; font-style: normal;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">100
+100 63933 64216 contig00250 57295 57578 unknown NONE </font></font>
+</p>
+<p><font face="Times New Roman, serif"><big>The
+first seven columns of the
+comparison file represent <i>coverage, percent identity, start on
+pseudomolecule, end on pseudomolecule, contig ID, start on reference
+and end on reference</i>.</big> </font>
+</p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>Gap file (.gaps)</i></font></big></p>
+<p style="margin-left: 0.49in;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">Gap 6174 23329
+29502 16691 22864 </font></font>
+</p>
+<p style="margin-left: 0.49in;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">Gap 6 63112 63117
+56474 56479 </font></font>
+</p>
+<p style="margin-left: 0.49in;"><font face="Courier New, monospace"><font style="font-size: 10pt;" size="2">Gap 198 63577 63774
+56939 57136 </font></font>
+</p>
+<p style=""><big><font face="Times New Roman, serif">Columns
+2-6 represent <i>gap size, start on pseudomolecule, end on
+pseudomolecule, start on reference and end on reference. </i></font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>Bin
+file (.bin)</i></font></big></p>
+<p><big><font face="Times New Roman, serif">A
+list of contig names that
+could not be used in generating a pseudomolecule. </font>
+</big></p>
+<p style="color: red;"><big><font face="Times New Roman, serif"><i>Fasta file (.fasta)</i></font></big></p>
+<p style="font-style: normal;"><big><font face="Times New Roman, serif">This
+is the pseudomolecule generated from ordered and orientated contigs.
+Overlapping contigs are separated by 100 'N's. Gaps are also
+represented by 'N's.</font></big></p>
+<br />
+<h2><span style="font-weight: bold;"><a name="11._Contact"></a>11. Contact</span></h2>
+Please email sa4 {at} sanger.ac.uk if you have any problems or comments.<br />
+<br />
+<h2><a name="12._Test_dataset"></a>12. <span style="font-weight: bold;">Test dataset</span></h2>
+We have provided a test dataset from <i><a href="http://www.sanger.ac.uk/Projects/S_suis/">Streptococcus
+suis</a></i> which consists of a set of 454 contigs and the
+reference genome.<br />
+<table style="width: 519px; height: 43px;" align="center" border="1">
+<tbody>
+<tr>
+<td>454 Contigs</td>
+<td><a href="454AllContigs.fna">download</a></td>
+</tr>
+<tr>
+<td>Reference</td>
+<td><a href="ftp://ftp.sanger.ac.uk/pub/pathogens/ss/S_suis_SC84.dna">download</a></td>
+</tr>
+</tbody>
+</table>
+</div>
+</div>
+<div id="bottom"> <br />
+</div>
+<p style="text-align: center;"><a href="https://sourceforge.net/">SourceForge</a>
+ <a href="http://www.sanger.ac.uk"><img style="border: 0px solid ; width: 100px; height: 29px;" alt="sanger logo" src="sanger_w100.png" /></a>
+ <a href="http://www.biomalpar.org/"><img style="border: 0px solid ; width: 71px; height: 56px;" alt="biomalpar" src="biomalpar.jpg" /></a></p>
+</div>
+</body></html>
+
+<!--
+NO http-equiv
+-->
+ContFiltSE:8020
+
diff --git a/abacas b/abacas
new file mode 100644
index 0000000..ece1707
--- /dev/null
+++ b/abacas
@@ -0,0 +1,1876 @@
+#!/usr/bin/perl
+# Copyright (C) 2008-10 Genome Research Limited. All Rights Reserved.
+#
+# This program is free software; you can redistribute it and/or
+# modify it under the terms of the GNU General Public License
+# as published by the Free Software Foundation; either version 2
+# of the License, or (at your option) any later version.
+#This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
+
+#ABACAS.1.3.1
+#--------------------------------
+#Please report bugs to:
+#sa4 at sanger.ac.uk & tdo at sanger.ac.uk
+
+
+use strict;
+use warnings;
+use POSIX qw(ceil floor);
+use Getopt::Std;
+our $version="1.3.1";
+#-------------------------------------------------------------------------------
+
+if (@ARGV < 1) { usage();}
+
+my ($help, $reference, $query_file, $choice, $sen, $seed, $mlfas, $fasta_bin, $avoid_Ns,
+ $tbx, $min_id, $min_cov, $diff_cov, $min_len,$add_bin_2ps, $pick_primer,
+ $flank, $chk_uniq,$redoMummer, $is_circular,$escapeToPrimers, $debug, $gaps_2file, $prefix,$optionsLog)
+ =checkUserInput( @ARGV );
+
+my $ref_inline;
+if ($escapeToPrimers ==1)
+{
+ pickPrimers ($reference, $query_file, $flank, $chk_uniq);
+ exit;
+}
+#BEGIN
+#-------------------------------------------------------------------------------
+print_header();
+print $optionsLog;
+
+$ref_inline = Ref_Inline($reference);
+#Get length of the reference sequence
+my $ref_len = length($ref_inline);
+
+
+###################
+# Running MUMmer #
+###################
+my ($path_dir, $run_mum, $path_toPass);
+if ($debug)
+{
+ print "the seed is $seed \n";
+ print "RedoMummer= ",$redoMummer."\n";
+}
+my @do_mum_return;
+my $mummer_tiling;
+if ($redoMummer==0)
+{
+ print "PREPARING DATA FOR $choice \n";
+ @do_mum_return = doMummer($reference, $query_file, $choice,$sen,$seed,$min_id, $min_cov, $diff_cov,$min_len, $debug, $is_circular) or die "Couldn't run MUMmer\n";
+ $mummer_tiling = $do_mum_return[0];
+ $path_dir = $do_mum_return[2];
+ $path_toPass = $do_mum_return[1];
+
+}
+elsif ($redoMummer ==1)
+{ print "not doing mummer\n";
+ my @check = checkProg ($choice);
+ $mummer_tiling = "$choice.tiling"; # double quotes so that $choice is interpolated
+ $path_dir = $check[2];
+ $path_toPass = $check[1];
+}
+else
+{
+ print "Unknown option for -R\n";
+ exit;
+}
+
+####################################
+# Processing tiling output #
+####################################
+if ($debug) {
+ print "Do tiling...\n";
+}
+
+#--------------------------------
+
+##################
+#Do Tiling
+#-------------------------------------------
+doTiling ($mummer_tiling, $path_toPass, $path_dir,$reference, $query_file, $choice, $prefix,$mlfas, $fasta_bin, $avoid_Ns, $ref_len, $gaps_2file, $ref_inline, $add_bin_2ps, $pick_primer, $flank, $chk_uniq, $tbx);
+
+
+
+
+
+############################################## SUB ROUTINES for CONTIG ORDERING and PRIMER DESIGN ##########################################################
+# Put in one file for ease of downloading. They could be placed in separate packages.
+#----------------------------------------------------------------------------------------------------------
+#################################Contig ordering ##########################################################
+########
+
+sub help
+{
+
+die <<EOF
+
+***********************************************************************************
+* ABACAS: Algorithm Based Automatic Contiguation of Assembled Sequences *
+* *
+* *
+* Copyright (C) 2008-10 The Wellcome Trust Sanger Institute, Cambridge, UK. *
+* All Rights Reserved. *
+* *
+***********************************************************************************
+
+USAGE
+abacas -r <reference file: single fasta> -q <query sequence file: fasta> -p <nucmer/promer> [OPTIONS]
+
+ -r reference sequence in a single fasta file
+ -q contigs in multi-fasta format
+ -p MUMmer program to use: 'nucmer' or 'promer'
+OR
+abacas -r <reference file: single fasta> -q <pseudomolecule/ordered sequence file: fasta> -e
+OPTIONS
+ -h print usage
+ -d use default nucmer/promer parameters
+ -s int minimum length of exact matching word (nucmer default = 12, promer default = 4)
+ -m print ordered contigs to file in multifasta format
+ -b print contigs in bin to file
+ -N print a pseudomolecule without "N"s
+ -i int minimum percent identity [default 40]
+ -v int minimum contig coverage [default 40]
+ -V int minimum contig coverage difference [default 1]
+ -l int minimum contig length [default 1]
+ -t run tblastx on contigs that are not mapped
+ -g string (file name) print uncovered regions (gaps) on reference to file name
+ -a append contigs in bin to the pseudomolecule
+ -o prefix output files will have this prefix
+ -P pick primer sets to close gaps
+ -f int number of flanking bases on either side of a gap for primer design (default 350)
+ -R int Run mummer [default 1, use -R 0 to avoid running mummer]
+ -e Escape contig ordering i.e. go to primer design
+ -c Reference sequence is circular
+
+EOF
+}
+########
+sub usage
+{
+
+die <<EOF
+---------------------------------------------------------------------------------------------------
+
+ABACAS.$version
+visit www.abacas.sourceforge.net for more information.
+--------------------------------
+Please report bugs to:sa4 (at)sanger.ac.uk and tdo (at) sanger.ac.uk
+----------------------------------------------------------------------------------------------------
+
+USAGE
+abacas -r <reference file: single fasta> -q <query sequence file: fasta> -p <nucmer/promer> [OPTIONS]
+ -r reference sequence in a single fasta file
+ -q contigs in multi-fasta format
+ -p MUMmer program to use: 'nucmer' or 'promer'
+for contig ordering and primer design
+
+OR
+abacas -r <reference file: single fasta> -q <pseudomolecule/ordered sequence file: fasta> -e
+to escape contig ordering and go directly to primer design
+
+OR
+abacas -h for help
+
+EOF
+}
+########
+##################
+sub print_header
+{
+ print "
+***********************************************************************************
+* ABACAS: Algorithm Based Automatic Contiguation of Assembled Sequences *
+* *
+* *
+* Copyright (C) 2008-10 The Wellcome Trust Sanger Institute, Cambridge, UK. *
+* All Rights Reserved. *
+* *
+***********************************************************************************
+\n";
+}
+#########################################
+sub checkUserInput{
+ my %options;
+ getopts('hr:q:p:ds:mbNi:v:V:l:tg:ao:Pf:Ru:ecD', \%options);
+ my $optionsLog="# Checking user options:";
+ my ($help, $reference, $query_file, $choice, $sen, $seed, $mlfas, $fasta_bin, $avoid_Ns,
+ $tbx, $min_id, $min_cov, $diff_cov, $min_len,$add_bin_2ps, $pick_primer,
+ $flank, $chk_uniq,$redoMummer, $is_circular,$escapeToPrimers, $debug, $gaps_2file, $prefix);
+
+ if($options{h}) {
+ $help = $options{h};
+ help();
+ }
+ if ($options{r} && $options{q} ){
+ ($reference, $query_file) = ($options{r},$options{q});
+ $optionsLog.="\n#\t-r Reference=$reference\n#\t-q Query=$query_file\n";
+ }else{
+ usage() unless $options{e};
+ }
+
+ if ($options{p}){
+ $choice = $options{p};
+ $optionsLog.="#\t-p $choice\n";
+ unless ($choice eq "nucmer" || $choice eq "promer"){
+ print "Unknown MUMmer program\n Please use nucmer or promer\n";
+ exit;
+ }
+ }else{
+ usage() unless $options{e};
+ }
+ if ($options{e}){ #$escapeToPrimers)
+ print_header();
+ print "Primer design selected,... escaping contig ordering\n";
+ $escapeToPrimers = 1;
+ $chk_uniq = "nucmer";
+ $choice = "";
+ $optionsLog.="#\t-e Primer design selected,... escaping contig ordering\n";
+ }else{
+ $escapeToPrimers = 0;
+ }
+ if ($options{d}) {
+ $sen =1;
+ $optionsLog.="#\t-d use default setting i.e. --mumreference in $choice\n";
+ } else {
+ $sen =0;
+ $optionsLog.="#\t-d 0 use sensitive mapping in $choice i.e. --maxmatch\n";
+ } #print $sen , " ---sen\n";
+ #print $options{t}, "\n"; exit;
+ if($options{t}) {$tbx = 1;} else {$tbx = 0;} #print $tbx, " ---tbx\n"; #
+ if ($options{s}){
+ $seed = $options{s};
+ $optionsLog.="#\t-s seed=$seed\n";
+ }
+ else{
+ if ($choice eq "nucmer"){
+ $seed = 12;
+ }
+ else{
+ $seed =4;
+ }
+
+ }
+ if ($options{m}){
+ $mlfas =1;
+ $optionsLog.="#\t-m print multifasta file of ordered contigs\n"
+ } else { $mlfas =0; } #print $mlfas , " ---mlfasta\n";
+ if ($options{b}) {
+ $fasta_bin =1;
+ $optionsLog.="#\t-b print multifasta file of contigs in bin to file\n"
+ } else {$fasta_bin =0;}
+ if ($options {N}){
+ $avoid_Ns =1;
+ $optionsLog.="#\t-N don't print Ns in pseudo-molecule\n"
+ } else {$avoid_Ns=0;}
+
+ if($options{i}){
+ $min_id=$options{i};
+ $optionsLog.="#\t-i $min_id is the minimum identity cutoff\n";
+ }
+ else{$min_id =40;
+ # $optionsLog.="#\t-i not defined: $min_id is the default minimum identity cutoff\n";
+ }
+ if ($options{v}){
+ $min_cov = $options{v};
+ $optionsLog.="#\t-v $min_cov is the minimum contig-coverage cutoff\n";
+ }
+ else{
+ $min_cov =40;
+ #$optionsLog.="#\t-v not defined $min_id is the default contig-coverage cutoff\n";
+ }
+ if($options{V}){
+ $diff_cov = $options{V};
+ $optionsLog.="#\t-V $diff_cov\n";
+ }
+ else {$diff_cov =1;
+ # $optionsLog.="#\t-V $diff_cov using the default value\n";
+ }
+ if ($options {l}){
+ $min_len = $options {l};
+ $optionsLog.="#\t-l $min_len is the minimum length of contigs to be ordered\n";
+ }
+ else{$min_len = 1;
+ # $optionsLog.="#\t-l not defined: using 1 as the default minimum length of contigs to be ordered\n";
+ }
+ if ($options{a}) {$add_bin_2ps = 1; }else {$add_bin_2ps =0;}
+ if ($options{P}) {$pick_primer=1;} else {$pick_primer =0;}
+ if ($options{f}) {$flank = $options{f};} else {$flank = 1000;}
+ if ($options{u}) {$chk_uniq = $options{u}; } else {$chk_uniq = "nucmer";}
+
+ #unless ($options{R}) {$re}
+ if($options{R}) {$redoMummer = 1; } else {$redoMummer = 0;}
+
+ if($options{c}) {$is_circular = 1;}else {$is_circular =0;}
+ if ($options{g}) {$gaps_2file = $options{g};} else {$gaps_2file ="";}
+ if($options{o}) {$prefix = $options{o};} else {$prefix = "";}
+ if($options{D}){
+ $debug=1;
+ $optionsLog.="#\t-D debug\n";
+ }
+ else {$debug =0};
+ if ($tbx ==1 && $fasta_bin !=1)
+ {
+ print "ERROR: Please use -t -b if you want to run tblastx on contigs in bin\n";
+ exit;
+ }
+ # print $redoMummer , "\n"; exit;
+
+ $optionsLog.="#\tInput checking done!!\n";
+ #print $optionsLog;
+ return ($help, $reference, $query_file, $choice, $sen, $seed, $mlfas, $fasta_bin, $avoid_Ns,
+ $tbx, $min_id, $min_cov, $diff_cov, $min_len,$add_bin_2ps, $pick_primer,
+ $flank, $chk_uniq,$redoMummer, $is_circular,$escapeToPrimers, $debug, $gaps_2file, $prefix,$optionsLog);
+
+} ## end of checkUserInput
+#############
+## get the reference sequence in one line
+#--------------------------------------------------
+sub Ref_Inline
+{
+ my $ref = shift;
+ open (refFH, $ref) or die "Could not open file $ref\n";
+ my $seq ="";
+ my @r = <refFH>;
+ my $num_chr =0;
+ foreach(@r){
+ if ($_ =~ /\>/){
+ $num_chr +=1;
+ }
+ }
+ if ($num_chr > 1){
+ print "\nERROR: Please use a single fasta reference file. You can simply merge chromosomes into a union fasta file.\n\n";
+ exit;
+ }
+ shift @r;
+ foreach(@r){
+ chomp;
+ $seq = $seq.$_;
+ }
+ return $seq;
+}
+################
+# Run mummer
+#--------------------------------------------
+sub doMummer
+{
+ my ($reference, $query_file, $choice, $sen,$seed,$min_id, $min_cov, $diff_cov,$min_len, $debug, $is_circular ) = @_;
+
+ my $df = 'delta-filter';
+ my $st = 'show-tiling';
+ my $ask = 'which';
+ my ($path_toPass, $run_mum); # params to return...
+ my ($command, $Path, $dir) = checkProg($choice);
+ my ($run_df, $df_path, $df_dir) = checkProg($df);
+ my ($run_st, $st_path, $st_dir) = checkProg($st);
+ my (@running, @deltaRes, @coordsRes);
+ if ($choice eq "nucmer")
+ {
+ if ($sen ==0)
+ {
+ @running = `$command --maxmatch -l $seed -p $choice $reference $query_file &> /dev/null`;
+ @deltaRes = `$run_df -q $choice.delta >$choice.filtered.delta`;
+ if ($is_circular == 1)
+ {
+ @coordsRes = `$run_st -c -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;
+ }
+ else
+ {
+ @coordsRes = `$run_st -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;
+
+ }
+ }
+ else
+ {
+ @running = `$command -p $choice $reference $query_file &> /dev/null`;
+ @deltaRes = `$run_df -q $choice.delta >$choice.filtered.delta`;
+ if ($is_circular ==1) {@coordsRes = `$run_st -c -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;}
+ else { @coordsRes = `$run_st -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;}
+ }
+
+ }
+ else
+ {
+ if ($sen ==0)
+ {
+ @running = `$command --maxmatch -l $seed -x 1 -p $choice $reference $query_file &> /dev/null`;
+ @deltaRes = `$run_df -q $choice.delta >$choice.filtered.delta`;
+ if ($is_circular == 1)
+ {
+ @coordsRes= `$run_st -c -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;
+ }
+ else
+ {
+ @coordsRes= `$run_st -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;
+ }
+ }
+ else
+ {
+ @running = `$command -l $seed -p $choice $reference $query_file &> /dev/null`;
+ @deltaRes = `$run_df -q $choice.delta >$choice.filtered.delta`;
+ if ($is_circular == 1) {
+ @coordsRes= `$run_st -c -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;
+ }
+ else
+ {
+ @coordsRes= `$run_st -i $min_id -v $min_cov -V $diff_cov -l $min_len -R -u unused_contigs.out $choice.filtered.delta > $choice.tiling`;
+ }
+ }
+ }
+ my $Coordsfull= "$choice.tiling";
+ return ($Coordsfull,$Path, $dir);
+}
+
+## #############################################################################
+sub checkProg{ #checks if a given executable is in the path...
+ my $prog = shift;
+ my $ask = 'which';
+ my @check_prog = `$ask $prog`;
+ my $path_toPass;
+ my $path_dir;
+ my $command;
+ if (defined $check_prog[0] && $check_prog[0] =~ /$prog$/)
+ {
+ $path_toPass = $prog;
+ $command = $prog;
+ }
+ else
+ {
+ print "\nENTER the directory for your ", $prog, " executables [default ./]: ";
+ my $path=<STDIN>;
+ chomp $path;
+ $path_dir = ($path eq "") ? "./" : $path; # empty input falls back to the advertised default ./
+ if ($path_dir =~/\/$/)
+ {
+ $path_dir = $path_dir;
+ }
+ else
+ {
+ $path_dir = $path_dir."/";
+ }
+ my @final_check = `$ask $path_dir$prog`; # test the user-supplied location; $command is not set yet
+ if (exists $final_check[0] && $final_check[0] =~ /$prog$/)
+ {
+ $command = $path_dir.$prog;
+ $path_toPass = $command;
+ }
+ else
+ {
+ print "ERROR: Could not run ", $prog, ", please check if it is installed in your path\n or provide the directory \n";
+ exit;
+ }
+ }
+
+ return ($command, $path_toPass, $path_dir);
+}
+
+##############################
+# converts a fasta file to an ordered single line
+#--------------------------------------------------
+sub Fasta2ordered
+{
+ if( @_ != 1 )
+ {
+ print "Usage: Fasta2ordered <fasta_file>\n";
+ exit;
+ }
+ my $fast = shift; #print $fast; exit;
+ my @fasta = split (/\n/, $fast);
+ if ($fasta[0] =~ /\>/ ) #remove chromosome name if it exists in the input sequence.
+ {
+ my $ch_name = $fasta[0];
+ shift @fasta;
+ }
+ #print $fasta[0]; exit;
+ foreach(@fasta){chomp;}
+ my $num_lines = scalar(@fasta);
+ my $dna = '';
+ for(my $i=0; $i< $num_lines; $i+=1)
+ {
+ $dna = $dna.$fasta[$i];
+ }
+ my $ordered_dna = $dna;
+ return $ordered_dna;
+}
+##############################################################
+# Hash input contigs
+#------------------------------------------------
+
+sub hash_contigs {
+ if( @_ != 1 )
+ {
+ print "Usage: hash_contigs contigs_file";
+ exit;
+ }
+ my $contigs_file = shift;
+ if( $contigs_file =~ /\.gz$/ )
+ {
+ open(CONTIGS, "gunzip -c $contigs_file |" ) or die "Can't open contigs file: $!\n";
+ }
+ else
+ {
+ open( CONTIGS, $contigs_file) or die "Can't open contigs file: $!\n";
+ }
+
+ my %contigs_hash; # hash to store contig names and sequences
+ my $contigName;
+
+
+ while (<CONTIGS>) ##add error checking...
+ {
+ if (/^>(\S+)/) {
+ $contigName=$1;
+ }
+ else {
+ chomp;
+ $contigs_hash{$contigName} .= $_;
+ }
+ }
+ close(CONTIGS);
+ #tdo
+ ## check if qual exists
+ my %contigs_qual_hash;
+
+
+ if (-r "$contigs_file.qual" or -r "$contigs_file.qual.gz") {
+ if( -r "$contigs_file.qual.gz" )
+ {
+ open(CONTIGS, "gunzip -c $contigs_file.qual.gz |" ) or die "Can't open contigs quality file: $!\n";
+ }
+ else
+ {
+ open( CONTIGS, "$contigs_file.qual" ) or die "Can't open contigs quality file: $!\n";
+ }
+
+
+ while (<CONTIGS>) {
+ if (/^>(\S+)/) {
+ $contigName=$1;
+ }
+ else {
+ chomp;
+ $contigs_qual_hash{$contigName} .= $_;
+ }
+ }
+
+ } # end tdo # end if exist
+
+
+ return (\%contigs_hash,\%contigs_qual_hash);
+}
+#######################
+##############################
+### it gets a delta name
+sub getMummerComparison{
+ my $deltaName = shift;
+
+
+ ### transform the delta file to coords
+ my $call ="show-coords -H -T -q $deltaName > $deltaName.coords ";
+ !system(" $call") or die "Problems doing the show-coords comparison: $call $!\n";
+
+
+ ### will hold results
+ my %h;
+
+ ### has as index the position with the max hits
+ my %h_max;
+ my $tmp=0;
+ my $tmp_index;
+ my $key='';
+ my $is_promer =0;
+ if ($deltaName =~/^promer/)
+ {
+ $is_promer =1;
+ }
+
+ open (F,"$deltaName.coords") or die "Problem in getComparisonFile to open file $deltaName.coords\n";
+ my @File=<F>;
+ my @a;
+ @a=split(/\s+/,$File[0]);
+ $tmp=$a[5];
+ $tmp_index=$a[0];
+ if ($is_promer ==1)
+ {
+ $key=$a[12]; ## nucmer: $key = $a[8]
+ }
+ else
+ {
+ $key = $a[8];
+ }
+
+ foreach (@File) {
+ @a=split(/\s+/);
+
+ if ($is_promer ==1)
+ {
+ push @{ $h{$a[12]}}, "$a[12]\t$a[11]\t$a[7]\t$a[5]\t0\t0\t$a[2]\t$a[3]\t$a[0]\t$a[1]\t1\t$a[5]\n";
+ if ($key eq $a[12] and $a[5]>$tmp)
+ {
+ $tmp=$a[5]; # length
+ $tmp_index=$a[0]; # position reference
+ }
+ elsif ($key ne $a[12]) {
+ ### possible bug here...
+ $h_max{$tmp_index}=$key;
+ $key=$a[12];
+ $tmp=$a[5]; # length
+ $tmp_index=$a[0]; # position reference
+
+ }
+ } #end if
+ #else i.e. if nucmer
+ else
+ {
+ push @{ $h{$a[8]}}, "$a[8]\t$a[6]\t$a[6]\t$a[5]\t0\t0\t$a[2]\t$a[3]\t$a[0]\t$a[1]\t1\t$a[5]\n";
+ if ($key eq $a[8] and $a[5]>$tmp)
+ {
+ $tmp=$a[5]; # length
+ $tmp_index=$a[0]; # position reference
+ }
+ elsif ($key ne $a[8]) {
+ ### possible bug here...
+ $h_max{$tmp_index}=$key;
+ $key=$a[8];
+ $tmp=$a[5]; # length
+ $tmp_index=$a[0]; # position reference
+
+ }
+ }
+
+ }
+ $h_max{$tmp_index}=$key;
+# print Dumper %h_max;
+
+ return (\%h,\%h_max);
+}
+##################################
+###########################
+sub writeBinContigs2Ref{
+ my $nameBin = shift;
+ my $name = shift;
+
+ open (F, "$nameBin") or die "Couldn't find file $nameBin: $!\n";
+
+ my @ar;
+
+ my $count=0;
+
+ while (<F>) {
+ push @ar, $_;
+ $count++;
+ }
+ #### sa4: added error checking:- if file is empty
+ if (scalar(@ar) < 1)
+ {
+ print "No contigs in unusedcontigs file\n";
+ $count = 0;
+
+ }
+ else
+ {
+ open (F, "> $name.notMapped.contigs.tab") or die "Couldn't write file $name.notMapped.contigs.tab: $!\n";
+ print F doArt(\@ar);
+ close(F);
+ }
+ return $count;
+
+}
+
+
+##############################
+sub doArt{
+ my ($ref) = @_;
+
+
+ ## hash of array with all positions of the contig
+ my %Pos;
+
+ ## Hash with note of result line of nucmer
+ my %lines;
+
+ foreach (@$ref) {
+ chomp;
+ my @ar=split(/\t/);
+ push @{ $Pos{$ar[12]}}, "$ar[0]..$ar[1]";
+ $lines{$ar[12]} .= "FT $_\n";
+ }
+
+ my $res;
+
+ foreach my $contig (keys %lines) {
+
+ if (scalar(@{ $Pos{$contig} }) > 1) {
+ my $tmp;
+
+ foreach (@{ $Pos{$contig} }) {
+ $tmp.="$_,";
+ }
+ $tmp =~ s/,$//g; # strip the trailing comma
+ $res .= "FT contig join($tmp)\n";
+ }
+ else {
+ $res .= "FT contig $Pos{$contig}[0]\n";
+ }
+ $res .= "FT /systematic_id=\"$contig\"\n";
+ $res .= "FT /note=\"Contig $contig couldn't map perfectly.\n";
+ $res .= $lines{$contig};
+ $res .= "FT \"\n";
+
+ }
+
+ return $res;
+}
+
+##########################################################################
+#---------------------------
+sub makeN #creates a string of Ns
+{
+ my $n = shift;
+ my $Ns= "N";
+ for (my $i =1; $i < $n; $i+=1)
+ {
+ $Ns = $Ns."N";
+ }
+ return $Ns;
+}
+
+###########################################################################
+## reverse complement a sequence
+#---------------------------------------------
+sub revComp {
+ my $dna = shift;
+ my $revcomp = reverse($dna);
+
+ $revcomp =~ tr/ACGTacgt/TGCAtgca/;
+
+ return $revcomp;
+}
+
+################################################################################
+### function to visualize rep. regions in Reference genome.
+sub findRepeats
+{
+ my $reference = shift;
+ my $name = shift;
+ my $path_prog = shift;
+
+ # get path
+ my ($path_coords) = $path_prog;
+
+ $path_coords =~ s/nucmer/show-coords/;
+
+ my $call = "$path_prog --maxmatch -c 100 -b 40 -p $name.repeats -l 25 $reference $reference > /dev/null 2>&1 ";
+ !system("$call") or die "Problems doing the nucmer comparison: $call $!\n";
+ $call ="$path_coords -r -c -l $name.repeats.delta > $name.repeats.coords ";
+ !system(" $call") or die "Problems doing the show-coords comparison: $call $!\n";
+
+ my @Res;
+ open (F, "$name.repeats.coords" ) or die "Problem opening file $name.repeats.coords: Is MUMmer installed correctly and included in the PATH environment variable? ($!)\n";
+ $_=<F>; $_=<F>; $_=<F>; $_=<F>;$_=<F>;
+ while (<F>) {
+ my @ar = split(/\s+/);
+ if (!($ar[1] == $ar[4] or $ar[2] == $ar[5] or $ar[7] > 100000)) { # to exclude self alignment
+
+ foreach ($ar[1]..$ar[2]) {
+ $Res[($_-1)]++;
+ }
+ }
+ }
+
+ ### write the result to the plot file
+
+ my $res;
+ foreach (@Res){
+ if (defined($_)) {
+ $res .= "$_\n";
+ }
+ else {
+ $res .= "0\n";
+ }
+ }
+
+ open (F, "> $name.Repeats.plot") or die "Couldn't open file $name.Repeats.plot to write: $! \n";
+ print F $res;
+ close(F);
+
+ ### delete files
+ unlink("$name.repeats.delta");
+ unlink("$name.repeats.coords");
+ unlink("$name.repeats.cluster");
+}
+
+################################################################################
+# reverse a list of qualities
+#------------------------------
+sub reverseQual{
+ my $str = shift;
+
+ $str =~ s/\s+$//g;
+
+ my @ar = split(/\s/,$str);
+ my $res;
+
+ for (my $i=(scalar(@ar)-1);$i>=0;$i--) {
+ $res.="$ar[$i] ";
+ }
+ return $res;
+}
+
+##############################################################################
+# ----------------
+sub getPosCoords{
+ my $ref_ar = shift;
+ my $contig = shift;
+
+ my $offset = shift;
+ my $res;
+ # print Dumper $$ref_ar{$contig};
+ foreach (@{$$ref_ar{$contig}}) {
+ print "in getPos Coords: $_\n";;
+
+ my @ar=split(/\t/);
+ $ar[6]+=$offset;
+ $ar[7]+=$offset;
+ $res .= join("\t",@ar);;
+ }
+
+ return $res;
+
+}
+#############################################################################
+# -----------------------------
+sub getPosCoordsTurn{
+ my $ref_ar = shift;
+ my $contig = shift;
+
+ my $offset = shift;
+
+
+ my $res;
+
+ # print Dumper $$ref_ar{$contig};
+ foreach (@{$$ref_ar{$contig}}) {
+ my @ar=split(/\t/);
+ my $tmp_8=$ar[8];
+ my $tmp_9=$ar[9];
+
+ $ar[8]=$ar[6]+$offset;
+ $ar[9]=$ar[7]+$offset;
+ $ar[6]=$tmp_8;
+ $ar[7]=$tmp_9;
+
+ ## change query subject
+
+ $res .= join("\t",@ar);;
+ }
+
+ return $res;
+
+}
+
+############################################################################
+#------------------------
+sub printStats{
+
+ my ($num_fortillingcontigs,$num_notsetTilling,$num_mapped, $num_contigs, $num_inComparisoncontigs, $ref_len, $total_bases_mpd) = @_;
+ $num_fortillingcontigs=$num_notsetTilling+$num_mapped;
+ my $res;
+ $res.= "Short statistics of run.\n";
+ $res.= "$num_contigs\t\tcontigs entered to be mapped against the reference.\n";
+ $res.= sprintf("$num_inComparisoncontigs\t%0.2f %%\tmade a hit against the reference with the given parameters (-s -d etc)\n",($num_inComparisoncontigs*100/$num_contigs));
+ $res.= sprintf("$num_fortillingcontigs\t%0.2f %%\twere considered for the tiling graph (-s -d etc)\n",($num_fortillingcontigs*100/$num_contigs));
+ $res.= sprintf("$num_mapped\t%0.2f %%\tare mapped in the tiling graph (-s -d etc)\n",($num_mapped*100/$num_contigs));
+ $res.= sprintf("\nCoverage: The reference is $ref_len long. Up to $total_bases_mpd bp (%0.2f %%) are covered by the contigs (This doesn't mean that these regions are really similar...)\n",($total_bases_mpd*100/$ref_len));
+
+ print $res;
+
+}
+##################################################
+#### Do Tiling
+#----------------------------------------------------
+sub doTiling {
+
+ my ($mummer_tiling, $path_toPass, $path_dir,$reference, $query_file, $choice, $prefix,$mlfas, $fasta_bin, $avoid_Ns, $ref_len, $gaps_2file, $ref_inline, $add_bin_2ps, $pick_primer, $flank, $chk_uniq, $run_blast) = @_;
+
+ ### these are also defined in the main script.... to be changed!!
+ my ($num_contigs , $num_inbincontigs, $avg_cont_size,$num_overlaps , $num_gaps,
+ $num_mapped,$total_bases_mpd,$p_ref_covered,$num_ambigus,$num_inComparisoncontigs,
+ $num_fortillingcontigs,$num_notsetTilling)= (0,0,0,0,0,0,0,0,0,0,0,0);
+
+ my ($href, $ref_contigs_qual) = hash_contigs($query_file);
+ my $qualAvailable=0;
+ my %contigs_hash = %{$href};
+ my @c_keys = keys %contigs_hash;
+ $num_contigs = scalar(@c_keys);
+ my @cont_lens;
+ my (@ids ,$id, $id_len);
+ my (@Rs, @Re, @G, @Len, @cov, @pid, @orient, @Cid);
+ my (@Ps, @Pe);
+ my ($total);
+ my $g; #define gap size between contigs
+ my $tiling_gap; #gap size from tiling graph
+ open (TIL, $mummer_tiling) or die "Could not open $mummer_tiling: $!";
+ while (<TIL>)
+ {
+ chomp;
+ if ($_ =~/^>/)
+ {
+ my $line = substr $_, 1;
+ my @splits = split /\s+/, $line;
+ $id = $splits[0];
+ push @ids, $id;
+ $id_len= $splits[1];
+ }
+ else
+ {
+ my @splits = split /\s+/, $_;
+ push @Rs, $splits[0];
+ push @Re, $splits[1];
+ push @G, $splits[2];
+ push @Len, $splits[3];
+ push @cov, $splits[4];
+ push @pid, $splits[5];
+ push @orient, $splits[6];
+ push @Cid, $splits[7];
+ }
+
+ }
+ close (TIL);
+ if (scalar(@Rs) != scalar(@Re))
+ {
+ print "ERROR: unequal array size\n";
+ exit;
+ }
+ else
+ {
+ $total = scalar(@Rs);
+ $num_mapped = scalar(@Rs);
+ }
+ my $ref_loc = $reference; # get locations of reference and query files
+ my $qry_loc = $query_file;
+ my $dif_dir =0; #assume query and reference are in the working directory
+ my @splits_reference = split (/\//, $reference);
+ my $new_reference_file = $splits_reference[(scalar(@splits_reference)-1)];
+ my @splits_query = split (/\//, $query_file);
+ my $new_query_file = $splits_query[(scalar(@splits_query)-1)];
+ if ($prefix eq "")
+ {
+ $prefix = $new_query_file."_".$new_reference_file;
+ }
+ #-------------------------------------------------------------------
+ #define file handles for output files and open files to write output
+ #-------------------------------------------------------------------
+ my ($seqFH,$tabFH,$binFH,$crunchFH, $gapFH, $gapFHT, $mlFH, $dbinFH, $avoidNFH,$ref_gapsFH);
+ open ($seqFH, '>', $prefix . '.fasta') or die "Could not open file $prefix.fasta for write: $!\n";
+ open ($tabFH, '>', $prefix . '.tab') or die "Could not open file $prefix.tab for write: $!\n";
+ open ($binFH, '>', $prefix . '.bin') or die "Could not open file $prefix.bin for write: $!\n";
+ open ($crunchFH, '>', $prefix . '.crunch') or die "Could not open file $prefix.crunch for write: $!\n";
+ open ($gapFH, '>', $prefix . '.gaps') or die "Could not open file $prefix.gaps for write: $!\n";
+ open ($gapFHT, '>', $prefix . '.gaps.tab') or die "Could not open file $prefix.gaps.tab for write: $!\n";
+
+ if ($mlfas ==1)
+ {
+ open ($mlFH, '>', $prefix . '.MULTIFASTA.fa') or die "Could not open file $prefix.MULTIFASTA.fa for write: $!\n";
+ }
+ if ($fasta_bin ==1)
+ {
+ open ($dbinFH, '>', $prefix . '.contigsInbin.fas') or die "Could not open file $prefix.contigsInbin.fas for write: $!\n";
+ }
+ if ($avoid_Ns ==1)
+ {
+ open ($avoidNFH, '>', $prefix .'.NoNs.fasta') or die "Could not open file $prefix.NoNs.fasta for write: $!\n";
+ }
+ if ($gaps_2file ne "")
+ {
+ open ($ref_gapsFH, '>', $gaps_2file.'.Gaps_onRef') or die "Could not open file $gaps_2file.Gaps_onRef for write: $!\n";
+ }
+ #-------------------------------------------------------------------------
+ # Writing tiling graph and generating a pseudomolecule
+ # Note: ps is used for pseudomolecule
+ #@Ps = start of ps, and @Pe = end of ps
+ my $ps_start =1;
+ $Ps[0] = 1;
+ $Pe[0] = $Ps[0] + $Len[0] -1;
+ my $tmp_qual;
+ my $tmp_nqual;
+ my $tmp_seq ="";
+ my $tmp_nseq ="";
+ print "Total contigs = $total \n";
+
+ #------------------------------------------------------------
+ # The 'for loop' loops over each contig in the Tiling output
+ #Writing to file is done for each contig to speed up the process
+ #This part could potentially be a separate subroutine
+
+ print $tabFH "ID ",$id, "\n";
+ print $seqFH ">", "ordered_", $id, "\n";
+ print $gapFHT "ID ",$id, "\n";
+
+ for (my $i=1; $i <= $total; $i+=1)
+ {
+ my $covv =sprintf("%.0f",$cov[$i -1]); #ROUNDING
+ my $pidd = sprintf("%.0f", $pid[$i -1]);
+ my ($contig_coord, $color, $contig_seq);
+ my $contig_qual='';
+ $tiling_gap = $G[$i -1];
+ if ($tiling_gap <= 1){ #insert 99 Ns for overlaps and gaps of size less than or equal to one base
+ $g = 99; # default gap size
+ }
+ else{
+ $g = $tiling_gap;
+ }
+ if (defined($Len[$i]))
+ {
+ $Ps[$i] = $Pe[$i-1] +$g +1;
+ $Pe[$i] = $Ps[$i] + $Len[$i] -1;
+ $total_bases_mpd+=$Len[$i];
+ }
+
+ if ($Rs[$i -1] <0) #check if a reference starting position is less than 0
+ {
+ $Rs[$i -1] =1;
+ }
+
+ if($orient[$i-1] eq "+")
+ {
+ $contig_coord = $Ps[$i -1]."..".$Pe[$i-1];
+ $color = 4;
+ $contig_seq = $contigs_hash{$Cid[$i-1]};
+ }
+ else
+ {
+ $contig_coord = "complement(".$Ps[$i -1]."..".$Pe[$i-1].")";
+ $color =3;
+ $contig_seq = revComp($contigs_hash{$Cid[$i-1]}); #REVERSE COMPLEMENT A SEQUENCE
+
+ }
+ push (@cont_lens, length($contig_seq));
+
+ # tdo
+ if (defined($$ref_contigs_qual{$Cid[$i-1]})) {
+ ## flag to know, that the qual exists
+ $qualAvailable=1;
+ $contig_qual = $$ref_contigs_qual{$Cid[$i-1]};
+ }
+ $tmp_qual .= $contig_qual;
+ $tmp_seq .= $contig_seq;
+ if ($avoid_Ns ==1)
+ {
+ $tmp_nseq.= $contig_seq;
+ #tdo
+ $tmp_nqual .= $contig_qual;
+ }
+ if ($mlfas ==1)
+ {
+ my $multifasta_seq = write_Fasta ($contig_seq);
+ print $mlFH ">", $Cid[$i-1], "\n", $multifasta_seq;
+ }
+ if ($Re[$i -1] > $ref_len)
+ {
+ $Re[$i -1] = $ref_len -1;
+ }
+ if ($Pe[$i -1] > length($tmp_seq))
+ {
+ $Pe[$i -1] = length($tmp_seq);
+ }
+
+ #-----------------------------------------
+ print $crunchFH $covv, " ", $pidd, " ", $Ps[$i -1], " ", $Pe[$i -1], " ", $Cid[$i -1], " ", $Rs[$i -1], " ", $Re[$i-1], " ", "unknown NONE\n";
+
+ #WRITE FEATURE FILE
+ print $tabFH "FT contig ",$contig_coord, "\n";
+ print $tabFH "FT ", "/systematic_id=\"", $Cid[$i-1],"\"","\n";
+ print $tabFH "FT ", "/method=\"", "mummer\"", "\n";
+ print $tabFH "FT ", "/Contig_coverage=\"",$cov[$i -1], "\"", "\n";
+ print $tabFH "FT ", "/Percent_identity=\"",$pid[$i -1], "\"", "\n";
+ my ($gap_coord, $gapCol,$gap_start,$gap_end,$ref_start, $ref_end) ;
+
+
+ $gap_start = $Pe[$i -1] +1;
+ if (defined $Ps[$i])
+ {
+ $gap_end = $Ps[$i] -1;
+ }
+ else
+ {
+ $gap_end = $gap_start + ($g) -1 ; ############ ...... check.....
+ }
+ $ref_start = $Re[$i -1] +1;
+
+ if (defined $Rs[$i])
+ {
+ $ref_end =$Rs[$i]-1;
+ }
+ else
+ {
+ $ref_end = "END";
+ }
+ $gap_coord = $gap_start."..".$gap_end;
+ my $ov ="";
+ if ($tiling_gap > 1) #WRITE GAP LOCATIONS AND SIZE TO FILE
+ {
+ print $gapFH "Gap\t",$tiling_gap, "\t", $gap_start, "\t", $gap_end, "\t", $ref_start, "\t", $ref_end,"\tNON-Overlapping\n";
+ $ov = "NO";
+ $gapCol = 8;
+
+ if ($gaps_2file ne "" && $ref_start < $ref_len)
+ {
+ my $gapOnref = substr ($ref_inline, $ref_start, $g);
+ print $ref_gapsFH ">StartOnRef_",$ref_start, " [Gap=",$g,"]\n";
+ my $file_toPrint = write_Fasta ($gapOnref);
+ print $ref_gapsFH $file_toPrint;
+ }
+ }
+ else
+ {
+ $color = 5;
+ $gapCol =9;
+ $ov ="YES";
+ print $gapFH "Gap\t",$g, "\t", $gap_start, "\t", $gap_end, "\t", $ref_start, "\t", $ref_end,"\tOverlapping\n";
+
+ print $tabFH "FT ", "/Overlapping=\"", "YES\"", "\n";
+
+ }
+
+ print $gapFHT "FT GAP ",$gap_coord, "\n";
+ print $gapFHT "FT ", "/SIZE=\"", $g,"\"","\n";
+ print $gapFHT "FT ", "/Overlapping=\"", $ov, "\"","\n";
+ print $gapFHT "FT ", "/colour=\"",$gapCol, "\"", "\n";
+ print $tabFH "FT ", "/colour=\"",$color, "\"", "\n";
+
+ my $ns = makeN($g);
+ $tmp_seq = $tmp_seq.$ns;
+ #tdo
+ for (1..length($ns))
+ {
+ $tmp_qual .= "0 ";
+ }
+
+ }
+ #------------------------------------------------------------------
+ #tdo
+ my @Quality_Array;
+ if ($qualAvailable) {
+ @Quality_Array = split(/\s/,$tmp_qual);
+ my $res;
+ foreach (@Quality_Array) {
+ $res .= "$_\n";
+ }
+ ## get name
+ my @splits_query = split (/\//, $query_file);
+ $new_query_file = $splits_query[(scalar(@splits_query)-1)];
+ open (F,"> $new_query_file.qual.plot") or die "Could not open file $new_query_file.qual.plot for write: $!\n";
+ print F $res;
+ close(F);
+ }
+ ##WRITE PSEUDOMOLECULE WITHOUT 'N's
+ #--------------------------------------
+ if ($avoid_Ns ==1)
+ {
+ print $avoidNFH ">", "ordered_", $id, " without 'N's","\n";
+ my $toWrite = write_Fasta ($tmp_nseq);
+ print $avoidNFH $toWrite;
+ }
+ ####################################
+ #WRITE CONTIGS WITH NO HIT TO FILE #
+ #################################
+ my %Cids;
+
+ foreach(@Cid)
+ {
+ chomp;
+ $Cids{$_} = 1;
+ }
+ my @contigs_2bin = ();
+ my %h_contigs_2bin;
+
+ foreach (@c_keys)
+ {
+ push(@contigs_2bin, $_) unless exists $Cids{$_};
+
+ }
+ foreach(@contigs_2bin)
+ {
+ $h_contigs_2bin{$_}=1;
+
+ print $binFH "$_ \n";
+
+ }
+ $num_inbincontigs= scalar(@contigs_2bin);
+
+ ########
+ # WRITE PSEUDOMOLECULE TO FILE
+ #----------------------------------
+ my $new_seq = $tmp_seq;
+ my $prev_len = length($tmp_seq);
+ my $total_len = $prev_len;
+ foreach (@contigs_2bin)
+ {
+ chomp;
+ #my $binseq = $contigs_hash{$contigs_2bin[$i]};
+ my $l = length ($contigs_hash{$_});
+ $total_len +=$l;
+ }
+ if ($add_bin_2ps ==1) #appending unmapped contigs to pseudomolecule
+ {
+
+ for (my $i =0; $i < scalar(@contigs_2bin); $i+=1)
+ {
+ my $binseq = $contigs_hash{$contigs_2bin[$i]};
+ $new_seq .=$contigs_hash{$contigs_2bin[$i]};
+ my $len_current_contig = length($binseq);
+ my $start = $prev_len +1;
+ my $end = $start + $len_current_contig -1;
+ my $col = 7;
+ if ($start > $total_len)
+ {
+ $start = $total_len;
+ }
+ if ($end >$total_len)
+ {
+ $end = $total_len;
+ }
+ my $co_cord = $start."..".$end;
+ my $note = "NO_HIT";
+ print $tabFH "FT contig ",$co_cord, "\n";
+ print $tabFH "FT ", "/systematic_id=\"", $contigs_2bin[$i],"\"","\n";
+ print $tabFH "FT ", "/method=\"", "mummer\"", "\n";
+ print $tabFH "FT ", "/colour=\"",$col, "\"", "\n";
+ print $tabFH "FT ", "/", $note, "\n";
+ $prev_len= $end;
+ }
+ }
+ my $to_write = write_Fasta ($new_seq);
+ print $seqFH $to_write;
+ ########
+ #WRITE CONTIGS IN BIN TO FILE #
+ #------------------------------------------------------
+ if ($fasta_bin ==1)
+ {
+ foreach(@contigs_2bin)
+ {
+ print $dbinFH ">", $_, "\n";
+ my $to_write = write_Fasta($contigs_hash{$_});
+ print $dbinFH $to_write;
+ }
+ }
+ #unlink ("$choice.delta");
+ #unlink ("$choice.filtered.delta");
+ #unlink ("$choice.cluster");
+ #unlink ("$choice.tiling");
+ #PRINT FINAL MESSAGE
+ print " FINISHED CONTIG ORDERING\n";
+ print "\nTo view your results in ACT\n\t\t Sequence file 1: $new_reference_file\n\t\t Comparison file 1: $prefix.crunch\n\t\t Sequence file 2: $prefix.fasta\n
+ \t\tACT feature file is: $prefix.tab\n
+ \t\tContigs bin file is: $prefix.bin\n
+ \t\tGaps in pseudomolecule are in: $prefix.gaps\n\n";
+
+ #Run tblastx....
+ if ($run_blast ==1)
+ {
+ print "Running tblastx on contigs in bin...\nThis may take several minutes ...\n";
+ my $formatdb = 'formatdb -p F -i' ;
+# my @formating = `
+ !system("$formatdb $new_reference_file") or die "ERROR: Could not find 'formatdb' for blast\n";
+ my $blast_opt = 'blastall -m 9 -p tblastx -d ';
+ my $contigs_inBin = $prefix.'.contigsInbin.fas';
+# my @bigger_b = `
+ !system("$blast_opt $new_reference_file -i $contigs_inBin -o blast.out") or die "ERROR: Could not find 'blastall' , please install blast in your working path (other dir==0)\n$blast_opt $new_reference_file -i $contigs_inBin -o blast.out\n \n";
+ }
+
+
+ if ($pick_primer == 1)
+ {
+ print " DESIGNING PRIMERS FOR GAP CLOSURE...\n";
+ my $qq = "$prefix.fasta";
+ pickPrimers($qq, $reference, $flank, $path_toPass, $chk_uniq,$qualAvailable,@Quality_Array);
+ }
+}
+
+
+#------------------------------
+sub write_Fasta {
+ my $sequence = shift;
+ my $fasta_seq ="";
+ my $length = length($sequence);
+ if ($length <= 60)
+ {
+ $fasta_seq = $sequence."\n";
+ }
+ elsif ($length> 60 )
+ {
+ for (my $i =0; $i < $length; $i+=60)
+ {
+ my $tmp_s = substr $sequence, $i, 60;
+ $fasta_seq .= $tmp_s."\n";
+ }
+ }
+
+ return $fasta_seq;
+}
+#---------------------------------------------- END OF CONTIG ORDERING SUBROUTINES----------------------------------------------------
+
+#----------------------------------------------- PRIMER DESIGN ---------------------------------------------------
+sub pickPrimers
+{
+ #$ps = pseudo molecule,$rf = reference, $flan = flanking region size
+ my ($ps,$rf, $flan, $passed_path, $chk_uniq,$qualAvailable, @Quality_Array);
+ if (@_==4){
+ ($rf,$ps, $flan, $chk_uniq) = @_;
+ print "Primers without ordering..\n";
+ print $rf;
+ $passed_path = "nucmer";
+ $qualAvailable =0;
+ @Quality_Array = (); # empty list; [] would store a single array reference
+ }
+ else #(@_ == 7)
+ {
+ ($ps,$rf, $flan, $passed_path, $chk_uniq, $qualAvailable, @Quality_Array) = @_;
+ }
+
+ my $dna='';
+ my @gappedSeq;
+ my $records='';
+ my @sequence;
+ my $input='';
+ #tdo
+ my @gappedQual;
+ #my $quality='';
+ my $path_toPass = $passed_path;
+ my @fasta;
+ my $query='';
+ my @exc_regions;
+ my $ch_name;
+ #my $flank = $flan;
+ open (FH, $rf) or die "Could not open reference file\n";
+ open (FH2, $ps) or die "Could not open query/pseudomolecule file\n";
+ my $ref; #print ".... ", $rf; exit;
+ my @r = <FH>;
+ my @qry = <FH2>;
+ my $dn = join ("", @qry);
+ $ref = join ("", @r);
+ $dna = Fasta2ordered ($dn);
+ #check if primer3 is installed
+ my $pr3 = "primer3_core";
+ my ($pr3_path, $pr3_Path, $pr3_path_dir) = checkProg ($pr3);
+ #my @check_prog = `which primer3_core`;
+ open (PRI, '>primer3.summary.out') or die "Could not open file for write\n";
+
+ #
+
+ #print $ref; exit;
+ #PARSING FOR PRIMER3 INPUT
+ my ($opt,$min,$max,$optTemp,$minTemp,$maxTemp,$flank,$lowRange,$maxRange,$gcMin,$gcOpt,$gcMax,$gclamp,$exclude,$quality) = getPoptions($qualAvailable);
+ my ($gap_position, @positions, %seq_hash);
+
+ my $exc1 = $flank -$exclude; #start of left exclude
+ print "Please wait... extracting target regions ...\n";
+ #regular expression extracts dna sequence before and after gaps in sequence (defined by N)
+ while($dna=~ /([atgc]{$flank,$flank}N+[atgc]{$flank,$flank})/gi)
+ {
+ $records= $1;
+ push (@gappedSeq, $records);
+ $gap_position = index($dna, $records);
+ push @positions, $gap_position;
+ $seq_hash{$gap_position}=$records;
+ #dna
+ if ($qualAvailable) {
+ my $res;
+ for (my $nn=($gap_position-1); $nn <= ($gap_position-1+length($records)-1); $nn++) {
+ $res.="$Quality_Array[$nn] ";
+ }
+ push @gappedQual, $res;
+ }
+ }
+ #loop prints out primer targets into a file format accepted by primer3
+ my $count=1;
+ my $identify='';
+ my $seq_num = scalar @gappedSeq;
+ my $name= " ";
+
+ my ($totalp, @left_name, @right_name, @left_seq, @right_seq);
+
+ my ($leftP_names, $rightP_names, $leftP_seqs, $rightP_seqs, $leftP_start, $leftP_lens, $rightP_ends, $rightP_lens,$left_Exclude,$right_Exclude, $primers_toExc, $prod_size)=
+ ("","","","","","","","","","", "", "");
+
+ print $seq_num, " gaps found in target sequence\n";
+ print "Please wait...\nLooking for primers...\n";
+ print "Running Primer3 and checking uniqueness of primers...\nCounting left and right primer hits from a nucmer mapping (-c 15 -l 15)\n";
+
+ for (my $i=0; $i<$seq_num; $i+=1)
+ {
+ $identify = $count++;
+ if (defined $ch_name)
+ {
+ $name = $ch_name;
+ }
+ my $len = length($gappedSeq[$i]);
+ my $exc2 = $len - $flank;
+ open(FILE, '>data') or die "Could not open file\n";
+ #tdo
+ my $qual='';
+ if ($qualAvailable) {
+ $qual="PRIMER_SEQUENCE_QUALITY=$gappedQual[$i]\nPRIMER_MIN_QUALITY=$quality\n";
+ }
+
+#WARNING: indenting the following lines may cause problems in primer3
+print FILE "PRIMER_SEQUENCE_ID=Starting_Pos $positions[$i]
+SEQUENCE=$gappedSeq[$i]
+PRIMER_OPT_SIZE=$opt
+PRIMER_MIN_SIZE=$min
+PRIMER_MAX_SIZE=$max
+PRIMER_OPT_TM=$optTemp
+PRIMER_MIN_TM=$minTemp
+PRIMER_MAX_TM=$maxTemp
+PRIMER_NUM_NS_ACCEPTED=1
+PRIMER_PRODUCT_SIZE_RANGE=$lowRange-$maxRange
+PRIMER_MIN_GC=$gcMin
+PRIMER_GC_CLAMP =$gclamp
+PRIMER_OPT_GC_PERCENT=$gcOpt
+PRIMER_MAX_GC=$gcMax
+PRIMER_INTERNAL_OLIGO_EXCLUDED_REGION=$exc1,$exclude $exc2,$exclude
+".$qual."Number To Return=1
+=\n";
+close FILE;
+
+ #runs primer3 from commandline
+
+ ################# NOTE: PRIMER3 SHOULD BE IN YOUR WORKING PATH #########
+
+ my @Pr3_output = `$pr3_path -format_output <data`;
+ #print $positions[$i], "\t", $i, " ", $path_toPass, " ", $rf, $exc1, " ",$exc2, "\n";
+ my $fil = join (":%:", @Pr3_output);
+ my ($uniq_primer, $string,$left_nm,$right_nm,$left_sq, $right_sq,$left_strt,$left_ln, $right_End,$right_ln,$primers_toExclude, $product_size)
+ = check_Primers ($fil, $positions[$i], $i,$path_toPass, $rf, $exc1, $exc2);
+
+ print PRI $string;
+ if ($uniq_primer ==1)
+ {
+ $leftP_names.=$left_nm."\n";
+ $rightP_names.=$right_nm."\n";
+ $leftP_seqs.=$left_sq."\n";
+ $rightP_seqs.=$right_sq."\n";
+ $leftP_start.=$left_strt."\n";
+ $leftP_lens.=$left_ln."\n";
+ $rightP_ends.=$right_End."\n";
+ $rightP_lens.=$right_ln."\n";
+ $left_Exclude.=$exc1."\n";
+ $right_Exclude.=$exc2."\n";
+ $prod_size.=$product_size."\n";
+ }
+ if ($primers_toExclude ne "")
+ {
+ $primers_toExc.= $primers_toExclude; #."\n";
+ }
+
+ }
+ write_Primers ($leftP_names, $rightP_names, $leftP_seqs, $rightP_seqs, $leftP_start, $leftP_lens, $rightP_ends, $rightP_lens,$primers_toExc,$left_Exclude,$right_Exclude, $prod_size);
+ #write_Primers (@left_name, @right_name, @left_seq, @right_seq, @left_start, @left_len, @right_end, @right_len, @left_exclude, @right_exclude, $primers_toExclude);
+
+}
+
+#checks the uniqueness of primers
+#input: an array with primer3 output for each gap
+sub check_Primers
+{
+
+ my ($fil, $position, $index,$path_toPass, $rf, $exc1, $exc2) = @_;
+ my @Pr3_output = split /:%:/, $fil;
+ my ($left_name, $right_name, $left_seq, $right_seq, $left_start,$left_len,$right_end,$right_len,$left_exclude,$right_exclude) = ("", "", "", "", "", "", "", "", "", "");
+ my $primers_toExclude ="";
+ my $product_size ="";
+ my $string ="";
+ my $uniq_primer = 0;
+ $string.="=========================================================================================\n";
+ $string.="Primer set for region starting at ".$position."\n";
+
+ if (defined $Pr3_output[5] && defined $Pr3_output[6])
+ {
+ if ($Pr3_output[5]=~ /LEFT PRIMER/)
+ {
+ # print $Pr3_output[5];
+ #check uniqueness of primer against the genome
+ my @splits_1 = split (/\s+/, $Pr3_output[5]);
+ my $left_primer = $splits_1[8];
+ my $left_st = $splits_1[2];
+ my $left_length = $splits_1[3];
+
+ my @splits_2 = split (/\s+/, $Pr3_output[6]);
+ my $right_primer = $splits_2[8];
+ my $right_st = $splits_2[2];
+ my $right_length = $splits_2[3];
+
+ open (QRY_1, '>./left_query'); # open a file for left primers
+ print QRY_1 ">left_01\n"; #
+ print QRY_1 $left_primer,"\n";
+ open (QRY_2, '>./right_query');
+ print QRY_2 ">right_01\n";
+ print QRY_2 $right_primer,"\n";
+
+ my ($left_count, $right_count);
+ #if ($chk_uniq eq "nucmer")
+ #{
+ my $options = "-c 15 --coords -l 15 ";
+ my $rq = "right_query";
+ my $lq = "left_query";
+ my (@right_ps, @left_ps);
+ # print $path_toPass, "\t", $options, "\n";
+
+
+ my @Rrun = `$path_toPass $options -p R $rf $rq > /dev/null 2>&1`;
+ print ".";
+ my $f1 = "R.coords";
+ open (RP, $f1) or die "Could not open file $f1 while checking uniqueness of right primer\n";
+ while (<RP>)
+ {
+ chomp;
+ if ($_ =~ /right_01$/)
+ {
+ push @right_ps, $_;
+ }
+ }
+ close (RP);
+ my @Lrun = `$path_toPass $options -p L $rf $lq > /dev/null 2>&1`;
+ print ".";
+ my $f2 = "L.coords";
+ open (LQ, $f2) or die "Could not open file $f2\n";
+ while (<LQ>)
+ {
+ chomp;
+ if ($_ =~ /left_01$/)
+ {
+ push @left_ps, $_;
+ }
+ }
+ close (LQ);
+ $right_count = scalar (@right_ps);
+ $left_count = scalar(@left_ps);
+ #check if a primer is not in the excluded region::
+ my $primer_NearEnd =0;
+ if ($left_st > $exc1 || $right_st < $exc2)
+ {
+ $primer_NearEnd = 1;
+ }
+
+ if ($left_count < 2 && $right_count<2 && $primer_NearEnd ==0)
+ {
+ $string.=$left_count."\t".$Pr3_output[5]."\n";
+ $string.=$right_count."\t".$Pr3_output[6]."\n";
+ $string.="***************************** PRIMER3 OUTPUT **************************\n";
+ foreach (@Pr3_output) {$string.=$_;}
+
+ my @prod_size_split = split /\s+/, $Pr3_output[10];
+
+ $product_size = substr($prod_size_split[2], 0, -1);
+ $left_name = $position;
+ $right_name = $position;
+ my $lp_uc = uc ($left_primer);
+ my $rp_uc = uc($right_primer);
+ #print $left_count, "..", $right_count, "\t";
+ $left_seq = $lp_uc;
+ $right_seq= $rp_uc;
+
+ $left_start= $left_st;
+ $left_len = $left_length;
+
+ $right_end = $right_st;
+ $right_len = $right_length;
+
+ $left_exclude = $exc1;
+ $right_exclude =$exc2;
+ $uniq_primer =1;
+ }
+ else
+ {
+ if ($primer_NearEnd ==1)
+ {
+ $string.="One of the oligos is near the end of a contig\n";
+ }
+ else
+ {
+ $string.="Primer set not unique\n";
+ }
+ $primers_toExclude.=">L.".$position."\n".$left_primer."\n";
+ $primers_toExclude.=">R.".$position."\n".$right_primer."\n";
+ }
+
+ }
+ else
+ {
+ $string.="No Primers found\n";
+ }
+ }
+
+ return ($uniq_primer, $string,$left_name,$right_name,$left_seq, $right_seq,$left_start,$left_len, $right_end,$right_len,$primers_toExclude, $product_size);
+
+
+}
+
+###------------------------------------
+# Writes primers and their regions to file
+sub write_Primers {
+ my ($leftP_names, $rightP_names, $leftP_seqs, $rightP_seqs, $leftP_start, $leftP_lens, $rightP_ends, $rightP_lens,$primers_toExclude,$left_Exclude,$right_Exclude, $product_sizes) = @_;
+ my (@left_name, @right_name, @left_seq, @right_seq, @left_start, @left_len, @right_end, @right_len, @left_exclude, @right_exclude, @product_size);
+
+ #open files to read
+ @left_name = split /\n/, $leftP_names;
+ @right_name= split /\n/, $rightP_names;
+ @left_seq = split /\n/, $leftP_seqs;
+ @right_seq = split /\n/, $rightP_seqs;
+
+ @left_start = split /\n/, $leftP_start;
+ @left_len = split /\n/, $leftP_lens;
+ @right_end = split/\n/, $rightP_ends;
+ @right_len = split /\n/, $rightP_lens;
+ @left_exclude = split /\n/, $left_Exclude;
+ @right_exclude = split /\n/,$right_Exclude;
+ @product_size = split /\n/, $product_sizes;
+
+ my $primers_withSize ="";
+ open (SEN, '>sense_primers.out') or die "Could not open file for write\n";
+ open (ASEN, '>antiSense_primers.out') or die "Could not open file for write\n";
+ open (REG_1, '>sense_regions.out') or die "Could not open file for write\n";
+ open (REG_2, '>antiSense_regions.out') or die "Could not open file for write\n";
+
+ if ($primers_toExclude ne "")
+ {
+ open (PEX, '>primers_toExclude.out') or die "Could not open file for write\n";
+ print PEX $primers_toExclude;
+ }
+
+
+ my $totalp = scalar (@left_name);
+
+ #print $totalp, "\n"; exit;
+
+ my $well_pos;
+ my $max_plates = ceil($totalp/96);
+ #print "MAX Ps ", $max_plates, "\n";
+ my $plate=1;
+ my $sen ="";
+ my $asen ="";
+ my $plate_counter =0;
+ my $wells = 96;
+ for (my $index =0; $index < $totalp; $index += $wells)
+ {
+ my $do = $index;
+ my $upper_bound= $index + $wells;
+ if ($upper_bound > $totalp)
+ {
+ $upper_bound = $totalp;
+ }
+
+ for (my $j=$index; $j <= ($upper_bound-1); $j+=1)
+ {
+ my $i = $j;
+ if ($j < 96)
+ {
+ $well_pos = get_WellPosition ($j);
+ }
+ else
+ {
+ $well_pos = get_WellPosition ($j % $wells);
+ }
+
+ #$primers_withSize.=$product_size[$i]."\t"."Plate_".$plate. "\t\tS.".$i."\tS.".$left_name[$i]."\t".
+ print SEN "Plate_".$plate, "\t\t","S.", $i, "\tS.", $left_name[$i], "\t", $left_seq[$i], "\t\t+", "\t", $well_pos, "\n";
+ print ASEN "Plate_".$plate, "\t\t","AS.", $i, "\tAS.", $right_name[$i], "\t", $right_seq[$i], "\t\t-","\t", $well_pos,"\n";
+ print REG_1 "Plate_".$plate, "\t\t","S.", $i, "\t", $left_start[$i], "\t", $left_len[$i], "\n";
+ print REG_2 "Plate_".$plate, "\t\t","AS.", $i, "\t", $right_end[$i], "\t",$right_len[$i], "\n";
+
+ }
+ $plate +=1;
+ }
+
+ #delete tmp. files
+ #my $rm = "rm -f";
+ system ("rm -f data left_query right_query R.delta R.cluster R.coords L.delta L.cluster L.coords");
+ print "\nPRIMER DESIGN DONE\n\n";
+ # end of primer design program
+}#//
+#####
+# returns a well position for oligos
+sub get_WellPosition{
+
+ my $j = shift;
+ my $well_pos;
+ if ($j < 12)
+ {
+ $well_pos = "a".($j+1);
+ }
+ elsif ($j>11 && $j<24) {
+ $well_pos = "b". (($j+1) -12);
+ }
+ elsif ($j>23 && $j<36) {
+ $well_pos = "c". (($j+1) -24);
+ }
+ elsif ($j>35 && $j<48) {
+ $well_pos = "d". (($j+1) - 36);
+ }
+ elsif($j>47 && $j<60) {
+ $well_pos = "e". (($j+1) -48);
+ }
+ elsif ($j>59 && $j<72)
+ {
+ $well_pos = "f". (($j+1) - 60);
+ }
+ elsif ($j>71 && $j< 84)
+ {
+ $well_pos = "g". (($j+1) - 72);
+ }
+ elsif ($j>83 && $j<96)
+ {
+ $well_pos = "h". (($j+1) - 84);
+ }
+ return $well_pos;
+}
+
+
+####################################################################
+#get options for primer design
+#----------------------
+sub getPoptions{
+
+ my $qualAvailable = shift;
+ #### USER INPUTS ##########
+ #ask for optimum primer size
+ print "\nEnter Optimum Primer size (default 20 bases):";
+ my $opt=<STDIN>;
+ chomp $opt;
+ if($opt eq '')
+ {
+ $opt = 20;
+ }
+ #ask for minimum primer size
+ print "\nEnter Minimum Primer size (default 18 bases):";
+ my $min=<STDIN>;
+ chomp $min;
+ if($min eq '')
+ {
+ $min = 18;
+ }
+ #ask for maximum primer size
+ print "\nEnter Maximum Primer size (default 27 bases):";
+ my $max= <STDIN>;
+ chomp $max;
+ if($max eq '')
+ {
+ $max= 27;
+ }
+ #ask for optimum primer temperature
+ print "\nEnter Optimum melting temperature (Celsius) for a primer oligo (default 60.0C):";
+ my $optTemp=<STDIN>;
+ chomp $optTemp;
+ if($optTemp eq '')
+ {
+ $optTemp = 60.0;
+ }
+ #ask for minimum primer temperature
+ print "\nEnter Minimum melting temperature (Celsius) for a primer oligo (default 57.0C):";
+ my $minTemp=<STDIN>;
+ chomp $minTemp;
+ if($minTemp eq '')
+ {
+ $minTemp = 57.0;
+ }
+ #ask for maximum primer temperature
+ print "\nEnter Maximum melting temperature (Celsius) for a primer oligo (default 63.0C):";
+ my $maxTemp=<STDIN>;
+ chomp $maxTemp;
+ if($maxTemp eq '')
+ {
+ $maxTemp = 63.0;
+ }
+ print "\nEnter flanking region size (default 1000 bases): ";
+ my $flank=<STDIN>;
+ chomp $flank;
+ if ($flank eq '')
+ {
+ $flank = 1000;
+ }
+ #ask for primer product range
+ print "\nEnter minimum product size produced by primers (default =flanking size):";
+ my $lowRange=<STDIN>;
+ chomp $lowRange;
+ if($lowRange eq '')
+ {
+ $lowRange = $flank;
+ }
+ print "\nEnter maximum product size produced by primers (default 7000):";
+ my $maxRange=<STDIN>;
+ chomp $maxRange;
+ if($maxRange eq '')
+ {
+ $maxRange = 7000;
+ }
+ #ask for minimum GC content in primers
+ print "\nEnter minimum GC content in primers (default 20%):";
+ my $gcMin=<STDIN>;
+ chomp $gcMin;
+ if($gcMin eq '')
+ {
+ $gcMin = 20.0;
+ }
+ #ask for optimum GC content in primers
+ print "\nEnter optimum GC content in primers (default 50%):";
+ my $gcOpt=<STDIN>;
+ chomp $gcOpt;
+ if($gcOpt eq '')
+ {
+ $gcOpt = 50.0;
+ }
+ #ask for maximum GC content in primers
+ print "\nEnter maximum GC content in primers (default 80%):";
+ my $gcMax=<STDIN>;
+ chomp $gcMax;
+ if($gcMax eq '')
+ {
+ $gcMax = 80.0;
+ }
+ print "\nEnter GC clamp (default 1):";
+ my $gclamp=<STDIN>;
+ chomp $gclamp;
+ if($gclamp eq '')
+ {
+ $gclamp = 1;
+ }
+ print "\nEnter size of region to exclude at the end of contigs (default 100 bases):";
+ my $exclude=<STDIN>;
+ chomp $exclude;
+ if ($exclude eq '')
+ {
+ $exclude = 100;
+ }
+
+
+ #tdo
+ my $quality='';
+ if ($qualAvailable)
+ {
+
+ print "\nEnter minimum quality for primer pick (default 40):";
+ $quality=<STDIN>;
+ chomp $quality;
+ if($quality eq '')
+ {
+ $quality = 40;
+ }
+ }
+
+
+return ($opt,$min,$max,$optTemp,$minTemp,$maxTemp,$flank,$lowRange,$maxRange,$gcMin,$gcOpt,$gcMax,$gclamp,$exclude, $quality);
+
+}
+###############
+#-----------------------------------------------------END of PRIMER DESIGN ----------------------------------------------------------------
+#-----------------------------------------------------END OF ABACAS -----------------------------------------------------------------------
+
+
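Note on the flank extraction in pickPrimers above: the loop locates each target with index(), which reports the first occurrence of the matched text, and its pattern consumes the right-hand flank, so a second gap closer than one flank length to the previous gap is not matched. A minimal sketch of an alternative follows, assuming the same definition of a gap (a run of N between unambiguous bases). find_gap_targets is a hypothetical name and is not part of abacas; the sketch only illustrates the technique and has not been run against real assemblies.

```perl
use strict;
use warnings;

# Hypothetical helper (not in abacas): returns [start_offset, target_sequence]
# pairs for every run of N that has $flank unambiguous bases on each side.
sub find_gap_targets {
    my ($dna, $flank) = @_;
    my @targets;
    # The right flank sits in a lookahead, so it is not consumed and can
    # also serve as the left flank of the next gap.
    while ($dna =~ /([acgt]{$flank}N+)(?=([acgt]{$flank}))/gi) {
        # $-[0] is the zero-based start offset of the current match,
        # so no second search with index() is needed.
        push @targets, [ $-[0], $1 . $2 ];
    }
    return @targets;
}

# Two gaps that share the middle ACGT as a flank.
my @found = find_gap_targets("ACGTNNACGTNNNACGT", 4);
printf "%d\t%s\n", @$_ for @found;
```

With a flank of 4 the toy sequence yields two targets, starting at offsets 0 and 6. Whether sharing a flank between neighbouring gaps is desirable for primer design is a separate question that the sketch does not settle.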
diff --git a/debian/abacas.1 b/debian/abacas.1
deleted file mode 100644
index a67c317..0000000
--- a/debian/abacas.1
+++ /dev/null
@@ -1,104 +0,0 @@
-.TH ABACAS "1" "2011-02-11" "1.3.1" "User Commands"
-.SH NAME
-abacas \- Algorithm Based Automatic Contiguation of Assembled Sequences
-.SH SYNOPSIS
-.B abacas
-\fB\-r\fR \fIref\fR \fB\-q\fR \fIqs\fR \fB\-p\fR \fIprog\fR [OPTIONS]
-.PP
-OR
-.PP
-.B abacas
-\fB\-r\fR \fIref\fR \fB\-q\fR \fIpsf\fR \fB\-e\fR
-.PP
-.TP
-\fIref\fR
-reference sequence in a single fasta file
-.TP
-\fIqs\fR
-contigs in multi\-fasta format
-.TP
-\fIprog\fR
-MUMmer program to use: 'nucmer' or 'promer'
-.TP
-\fIpsf\fR
-pseudomolecule/ordered sequence file in fasta format
-.PP
-\fBOPTIONS\fR
-.TP
-\fB\-h\fR
-print usage
-.TP
-\fB\-d\fR
-use default nucmer/promer parameters
-.TP
-\fB\-s\fR
-int minimum length of exact matching word (nucmer default = 12, promer default = 4)
-.TP
-\fB\-m\fR
-print ordered contigs to file in multifasta format
-.TP
-\fB\-b\fR
-print contigs in bin to file
-.TP
-\fB\-N\fR
-print a pseudomolecule without "N"s
-.TP
-\fB\-i\fR
-int minimum percent identity [default 40]
-.TP
-\fB\-v\fR
-int minimum contig coverage [default 40]
-.TP
-\fB\-V\fR
-int minimum contig coverage difference [default 1]
-.TP
-\fB\-l\fR
-int minimum contig length [default 1]
-.TP
-\fB\-t\fR
-run tblastx on contigs that are not mapped
-.TP
-\fB\-g\fR
-string (file name) print uncovered regions (gaps) on reference to file name
-.TP
-\fB\-a\fR
-append contigs in bin to the pseudomolecule
-.TP
-\fB\-o\fR
-prefix output files will have this prefix
-.TP
-\fB\-P\fR
-pick primer sets to close gaps
-.TP
-\fB\-f\fR
-int number of flanking bases on either side of a gap for primer design (default 350)
-.TP
-\fB\-R\fR
-int Run mummer [default 1, use \fB\-R\fR 0 to avoid running mummer]
-.TP
-\fB\-e\fR
-Escape contig ordering i.e. go to primer design
-.TP
-\fB\-c\fR
-Reference sequence is circular
-
-.SH DESCRIPTION
-ABACAS is intended to rapidly contiguate (align, order, orientate),
-visualize and design primers to close gaps on shotgun assembled contigs
-based on a reference sequence.
-.PP
-ABACAS uses MUMmer to find alignment positions and identify syntenies
-of assembled contigs against the reference. The output is then processed
-to generate a pseudomolecule taking overlapping contigs and gaps in to
-account. ABACAS generates a comparision file that can be used to
-visualize ordered and oriented contigs in ACT. Synteny is represented by
-red bars where colour intensity decreases with lower values of percent
-identity between comparable blocks. Information on contigs such as the
-orientation, percent identity, coverage and overlap with other contigs
-can also be visualized by loading the outputted feature file on ACT.
-
-.SH AUTHOR
-ABACAS IS Copyright (C) 2008-10 The Wellcome Trust Sanger Institute, Cambridge, UK.
-.PP
-This manual page was written by Andreas Tille <tille at debian.org>,
-for the Debian project (and may be used by others).
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index b806ebc..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,29 +0,0 @@
-abacas (1.3.1-4) UNRELEASED; urgency=medium
-
- * Introduced EDAM annotation
-
- -- Steffen Moeller <moeller at debian.org> Fri, 05 Feb 2016 17:16:23 +0100
-
-abacas (1.3.1-3) unstable; urgency=medium
-
- * Moved debian/upstream to debian/upstream/metadata
- * cme fix dpkg-control
-
- -- Andreas Tille <tille at debian.org> Mon, 25 Jan 2016 09:03:25 +0100
-
-abacas (1.3.1-2) unstable; urgency=low
-
- * debian/upstream: Added citation information
- * debian/control:
- - cme fix dpkg-control
- - debhelper 9
- - canonical Vcs URLs
- * debian/copyright: DEP5
-
- -- Andreas Tille <tille at debian.org> Fri, 25 Oct 2013 15:59:00 +0200
-
-abacas (1.3.1-1) unstable; urgency=low
-
- * Initial release (Closes: #619100)
-
- -- Andreas Tille <tille at debian.org> Mon, 21 Mar 2011 09:48:04 +0100
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index ec63514..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-9
diff --git a/debian/control b/debian/control
deleted file mode 100644
index a5dd73f..0000000
--- a/debian/control
+++ /dev/null
@@ -1,30 +0,0 @@
-Source: abacas
-Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Andreas Tille <tille at debian.org>
-Section: science
-Priority: optional
-Build-Depends: debhelper (>= 9)
-Standards-Version: 3.9.6
-Vcs-Browser: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/abacas/trunk/
-Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/abacas/trunk/
-Homepage: http://abacas.sourceforge.net/
-
-Package: abacas
-Architecture: all
-Depends: ${perl:Depends},
- ${misc:Depends},
- mummer
-Description: Algorithm Based Automatic Contiguation of Assembled Sequences
- ABACAS is intended to rapidly contiguate (align, order, orientate),
- visualize and design primers to close gaps on shotgun assembled contigs
- based on a reference sequence.
- .
- ABACAS uses MUMmer to find alignment positions and identify syntenies
- of assembled contigs against the reference. The output is then processed
- to generate a pseudomolecule taking overlapping contigs and gaps in to
- account. ABACAS generates a comparision file that can be used to
- visualize ordered and oriented contigs in ACT. Synteny is represented by
- red bars where colour intensity decreases with lower values of percent
- identity between comparable blocks. Information on contigs such as the
- orientation, percent identity, coverage and overlap with other contigs
- can also be visualized by loading the outputted feature file on ACT.
diff --git a/debian/copyright b/debian/copyright
deleted file mode 100644
index 2dc731d..0000000
--- a/debian/copyright
+++ /dev/null
@@ -1,31 +0,0 @@
-Format: http://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: Abacas
-Upstream-Contact: Wellcome Trust Sanger Institute <sa4 at sanger.ac.uk>
-Source: http://sourceforge.net/projects/abacas/files/
-
-Files: *
-Copyright: © 2008-10 Genome Research Limited. All Rights Reserved.
-License: GPL-2+
-
-Files: debian/*
-Copyright: © 2011 Andreas Tille <tille at debian.org>
-License: GPL-2+
-
-License: GPL-2+
- Abacas is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
- .
- Velvet is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- .
- You should have received a copy of the GNU General Public License
- along with Velvet; if not, write to the Free Software
- Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- .
- On Debian systems, the complete text of the GNU General Public
- License version 2 can be found in ‘/usr/share/common-licenses/GPL-2’.
-
diff --git a/debian/doc-base b/debian/doc-base
deleted file mode 100644
index cf66533..0000000
--- a/debian/doc-base
+++ /dev/null
@@ -1,12 +0,0 @@
-Document: abacas
-Title: Abacas user manual
-Author: Wellcome Trust Sanger Institute
-Abstract: Algorithm Based Automatic Contiguation of Assembled Sequences
- ABACAS is intended to rapidly contiguate (align, order, orientate),
- visualize and design primers to close gaps on shotgun assembled contigs
- based on a reference sequence.
-Section: Science/Biology
-
-Format: html
-Files: /usr/share/doc/abacas/Manual.html
-Index: /usr/share/doc/abacas/Manual.html
diff --git a/debian/docs b/debian/docs
deleted file mode 100644
index bbc3411..0000000
--- a/debian/docs
+++ /dev/null
@@ -1,2 +0,0 @@
-*.html
-*.css
diff --git a/debian/get-orig-source b/debian/get-orig-source
deleted file mode 100755
index d148835..0000000
--- a/debian/get-orig-source
+++ /dev/null
@@ -1,48 +0,0 @@
-#!/bin/sh -e
-# creating source tarball for abacas which comes as plain Perl file
-# and needs to be put into a tarball
-
-PKG=`dpkg-parsechangelog | awk '/^Source/ { print $2 }'`
-
-if ! echo $@ | grep -q upstream-version ; then
- # if called manually run uscan to obtain file and version number
- # VERSION=`dpkg-parsechangelog | awk '/^Version:/ { print $2 }' | sed 's/\([0-9\.]\+\)-[0-9]\+$/\1/'`
- VERSION=`uscan --verbose --force-download | \
- grep "Newest version on remote site is .* local version is .*" | \
- head -n 1 | \
- sed "s/Newest version on remote site is \([-0-9.]\+\),.*/\1/"`
-else
- # If called by uscan
- VERSION=`echo $@ | sed 's?^.*--upstream-version \([0-9.]\+\) .*abacas.*?\1?'`
- if echo "$VERSION" | grep -q "upstream-version" ; then
- echo "Unable to parse version number"
- exit
- fi
-fi
-
-#if [ "$VERSION" = "" ] ; then
-# VERSION=`ls ../${PKG}.*.pl | sed "s/.*$PKG\.\(.*\)\.pl/\1/"`
-#fi
-
-mkdir -p ../tarballs/${PKG}-${VERSION}
-cd ../tarballs
-
-# Rename perl file (without .pl suffix) and fix perl path
-sed -e '1s?/usr/local/bin/perl?/usr/bin/perl?' \
- -e "s/^${PKG}.pl/${PKG}/" \
- ../${PKG}.${VERSION}.pl > ${PKG}-${VERSION}/${PKG}
-touch -r ../${PKG}.${VERSION}.pl ${PKG}-${VERSION}/${PKG}
-rm -f ../${PKG}.${VERSION}.pl
-
-cd ${PKG}-${VERSION}
-wget -q -N http://abacas.sourceforge.net/Manual.html
-sed -i -e 's?index.html">Home?http://abacas.sourceforge.net/&?' \
- -e 's?documentation.html">Documentation?http://abacas.sourceforge.net/&?' \
- -e 's/perl abacas\.pl/abacas/' \
- -e 's/abacas\.pl/abacas/' \
- Manual.html
-wget -q -N http://abacas.sourceforge.net/style.css
-cd ..
-
-GZIP="--best --no-name" tar -czf "$PKG"_"$VERSION".orig.tar.gz "$PKG"-"$VERSION"
-rm -rf "$PKG"-"$VERSION"
diff --git a/debian/install b/debian/install
deleted file mode 100644
index 77afc00..0000000
--- a/debian/install
+++ /dev/null
@@ -1 +0,0 @@
-abacas usr/bin
diff --git a/debian/manpages b/debian/manpages
deleted file mode 100644
index 0f65186..0000000
--- a/debian/manpages
+++ /dev/null
@@ -1 +0,0 @@
-debian/*.1
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index ef9065a..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,13 +0,0 @@
-#!/usr/bin/make -f
-# debian/rules for abacas
-# Andreas Tille <tille at debian.org>
-# GPL
-
-# Uncomment this to turn on verbose mode.
-#export DH_VERBOSE=1
-
-%:
- dh $@
-
-get-orig-source:
- . debian/get-orig-source
diff --git a/debian/source/format b/debian/source/format
deleted file mode 100644
index 163aaf8..0000000
--- a/debian/source/format
+++ /dev/null
@@ -1 +0,0 @@
-3.0 (quilt)
diff --git a/debian/upstream/edam b/debian/upstream/edam
deleted file mode 100644
index c6c64e1..0000000
--- a/debian/upstream/edam
+++ /dev/null
@@ -1,13 +0,0 @@
-ontology: EDAM (1.12)
-topic:
- - Probes and primers
-scopes:
- - name: summary
- function:
- - PCR primer design
- inputs:
- - data: Sequence
- formats: [FASTA]
- outputs:
- - data: Sequence
- formats: [FASTA]
diff --git a/debian/upstream/metadata b/debian/upstream/metadata
deleted file mode 100644
index 9e661a9..0000000
--- a/debian/upstream/metadata
+++ /dev/null
@@ -1,12 +0,0 @@
-Reference:
- Author: Samuel Assefa and Thomas M. Keane and Thomas D. Otto and Chris Newbold and Matthew Berriman
- Title: "ABACAS: algorithm-based automatic contiguation of assembled sequences"
- Journal: Bioinformatics
- Year: 2009
- Volume: 25
- Number: 15
- Pages: 1968-1969
- DOI: 10.1093/bioinformatics/btp347
- PMID: 19497936
- URL: http://bioinformatics.oxfordjournals.org/content/25/15/1968
- eprint: http://bioinformatics.oxfordjournals.org/content/25/15/1968.full.pdf+html
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index 6a4c879..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,3 +0,0 @@
-version=3
-http://sf.net/abacas/abacas\.([\d\.]+)\.pl \
- debian debian/get-orig-source
diff --git a/style.css b/style.css
new file mode 100644
index 0000000..75aea1d
--- /dev/null
+++ b/style.css
@@ -0,0 +1,102 @@
+/* Generated by KompoZer */
+body {
+ margin: 0;
+ padding: 0;
+ background-repeat: repeat-x;
+ background-attachment: scroll;
+ background-position: center top;
+ font-family: verdana,arial,sans-serif;
+ font-style: normal;
+ font-variant: normal;
+ font-weight: normal;
+ font-size: 8pt;
+ line-height: 13pt;
+ font-size-adjust: none;
+ font-stretch: normal;
+ background-color: #f9f9f9;
+}
+#wrapper {
+ margin: 0 auto;
+ padding: 0;
+ width: 800px;
+ text-align: left;
+}
+#top {
+ background: transparent url(images/bgtop2.png) no-repeat scroll center top;
+ width: 800px;
+ height: 78px;
+}
+#content {
+ padding: 0px 17px;
+ background: transparent url(images/bgmiddle2.png) repeat-y scroll center;
+ width: 766px;
+ height: 100%;
+}
+#header {
+ margin: 0px 0px 4px;
+ padding: 60px 0px 0px 20px;
+ background: transparent url(images/bgtop2.png) no-repeat scroll center top;
+ width: 766px;
+ height: 56px;
+ color: black;
+ font-size: 16px;
+}
+#header h1 {
+ padding: 0px 190px 0px 125px;
+ font-size: 18px;
+ font-weight: normal;
+}
+#menu {
+ width: 200px;
+ height: 100%;
+ margin-left: 10px;
+ float: left;
+ text-align: left;
+}
+#menu li a {
+ voice-family: inherit;
+ height: 29px;
+ text-decoration: none;
+ text-align: left;
+}
+#menu li a:link, #menu li a:visited {
+ padding: 8px 0 0 10px;
+ background: transparent url(images/off2.png) no-repeat scroll center top;
+ color: navy;
+ display: block;
+ height: 29px;
+ text-align: left;
+}
+#menu li a:hover {
+ padding: 8px 0 0 10px;
+ background: transparent url(images/on.png) no-repeat scroll center top;
+ color: blue;
+ height: 29px;
+ text-align: left;
+}
+ul {
+ margin: 0;
+ padding: 0;
+ list-style-type: none;
+ list-style-image: none;
+ list-style-position: outside;
+ text-align: left;
+}
+#stuff {
+ border: none;
+ margin: 0px 0px 0px 220px;
+ background-color: transparent;
+ background-repeat: no-repeat;
+ background-attachment: scroll;
+ background-position: left top;
+ width: 520px;
+ padding-top: 55px;
+}
+img {
+ border: none;
+}
+#bottom {
+ background: transparent url(images/bgbottom3.png) no-repeat scroll center bottom;
+ width: 800px;
+ height: 50px;
+}
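Note on the plate layout used by write_Primers and get_WellPosition in the abacas script of this commit: the well label can also be computed arithmetically instead of through an if/elsif chain. A minimal sketch, assuming 96-well plates with rows a to h and 12 columns per row, as the chain in the script implies. plate_and_well is a hypothetical name and is not part of abacas.

```perl
use strict;
use warnings;

# Hypothetical helper (not in abacas): map a zero-based primer index to a
# (plate number, well label) pair on 96-well plates laid out as 8 rows x 12 columns.
sub plate_and_well {
    my ($index) = @_;
    my $plate  = int($index / 96) + 1;              # plates are numbered from 1
    my $within = $index % 96;                       # position inside the plate
    my $row    = ('a' .. 'h')[ int($within / 12) ]; # 12 wells per row
    my $col    = $within % 12 + 1;                  # columns are numbered from 1
    return ($plate, $row . $col);
}

for my $i (0, 11, 12, 95, 96, 200) {
    my ($plate, $well) = plate_and_well($i);
    print "primer $i -> Plate_$plate $well\n";
}
```

Because both the plate and the well are derived from the same index, the label stays defined for any number of primers, including runs that fill more than two plates.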
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/abyss.git