[med-svn] [velvetoptimiser] 09/11: New upstream version 2.2.5
Andreas Tille
tille at debian.org
Fri Sep 22 07:15:26 UTC 2017
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository velvetoptimiser.
commit d3f2c8616f9e58364bedd8e4f2fd138f1d6ccf70
Author: Andreas Tille <tille at debian.org>
Date: Fri Sep 22 09:09:15 2017 +0200
New upstream version 2.2.5
---
CHANGELOG | 84 +++++
INSTALL | 59 +++
LICENSE | 339 +++++++++++++++++
README | 311 ++++++++++++++++
VelvetOpt/Assembly.pm | 567 ++++++++++++++++++++++++++++
VelvetOpt/Utils.pm | 217 +++++++++++
VelvetOpt/gwrap.pm | 171 +++++++++
VelvetOpt/hwrap.pm | 371 +++++++++++++++++++
VelvetOptimiser.pl | 923 ++++++++++++++++++++++++++++++++++++++++++++++
debian/changelog | 33 --
debian/compat | 1 -
debian/control | 22 --
debian/copyright | 32 --
debian/docs | 1 -
debian/install | 2 -
debian/manpages | 1 -
debian/patches/no_findbin | 13 -
debian/patches/series | 0
debian/rules | 15 -
debian/source/format | 1 -
debian/velvetoptimiser.1 | 77 ----
debian/watch | 3 -
22 files changed, 3042 insertions(+), 201 deletions(-)
diff --git a/CHANGELOG b/CHANGELOG
new file mode 100644
index 0000000..c4a7bcc
--- /dev/null
+++ b/CHANGELOG
@@ -0,0 +1,84 @@
+
+CHANGELOG FOR VELVETOPTIMISER
+=============================
+
+Changes since Version 2.0:
+
+2.0.1:
+
+* Added Mikael Brandstrom Durling's code to get free_mem and num_cpus for the Mac.
+* Fixed a bug where if no assembly score was calculable the program crashed. It now sets the assembly score to 0 instead.
+
+2.1.0:
+
+* Added two stage optimisation functions. First one is used to optimise for hash value and second to optimise for cov_cutoff. Both are user definable and default to "n50" for k-mer size and "Lbp" for cov_cutoff.
+* The above necessitated a change in command line option letters to minimise confusion. The first stage optimisation function is -k for k-mer size and the second is -c for cov_cutoff.
+* Fixed a bug in Utils.pm where the exp_cov was only calculated for the first two categories and left out the rest. Now uses all short read categories.
+* Added a command line option -o to pass through extra commands to velvetg (such as long_mult_cutoff etc.) NB: No sanity checking here!
+
+2.1.1:
+
+* Fixed a bug where prefixes containing '-' or '.' would cause the script to fail.
+
+2.1.2:
+
+* Fixed a bug where estExpCov would try and incorporate columns in the stats.txt file that contained "Inf" or "N/A" into the calculation and thereby crash.
+
+2.1.3:
+
+* Now gives a nice warning when optimisation function returns undef or 0, instead of cryptic perl error message.
+
+2.1.4:
+
+* Fixed another bug in estExpCov in Utils.pm so it now doesn't count stats with coverage < 2 and contigs of less than 3 * kmer size - 1.
+
+2.1.5:
+
+* Added support for velveth's new input file types (bam, sam and raw) and attempted to future proof it.
+
+2.1.6
+
+* Now prints Velvet calculated insert sizes and standard deviations in assembly summaries, both in the logfile and on screen
+
+2.1.7
+
+* Takes new velveth help format into account. Thanks to Alexie Papanicolaou - CSIRO for the patch.
+
+2.1.8
+
+* Fixed a bug in the Assembly.pm file so it displays the paired end statistics for newer versions of velvet.
+
+2.1.9
+
+* Added a hash value search step option.
+* Calculates the maximum number of velvet instances to run allowing for the velvet OMP compile num threads.
+* Warns if velvet is compiled with OMP and optimiser threads is set to more than the number of cpus times OMP threads.
+
+2.2.0
+
+* You can choose an output folder to put final optimal results into with the -d option.
+* Fixed a potential bug when running as root.
+* Set default 'hashe' parameter to MAXKMERLENGTH that velvet was compiled with.
+
+2.2.1
+
+* Added an option to manually set a minimum coverage cutoff.
+
+2.2.2
+
+* Added an option to set the maximum coverage cutoff as a fraction of the expected coverage.
+
+2.2.3
+
+* Added --version option
+* Added LICENSE
+* Added INSTALL
+* Moved the change log from README to its own file CHANGELOG
+
+2.2.4
+
+* Fixed a bug when the starting and ending hash values were the same.
+
+2.2.5
+
+* Re-configured velveth input line checker to handle new velvet options and hopefully future proof it.
diff --git a/INSTALL b/INSTALL
new file mode 100644
index 0000000..8fb3bf4
--- /dev/null
+++ b/INSTALL
@@ -0,0 +1,59 @@
+
+==============================
+HOW TO INSTALL VELVETOPTIMISER
+==============================
+
+
+
+1. DOWNLOAD TARBALL
+===================
+
+Download the latest .tar.gz file from the website:
+
+ http://bioinformatics.net.au/software.velvetoptimiser.shtml
+
+For example:
+
+ VelvetOptimiser-2.2.4.tar.gz
+
+
+
+2. UNTAR
+========
+
+Choose a place to put the software - it comes in its own directory.
+
+Here we install it as a normal user in our $HOME directory (eg. /home/peter)
+
+ % cd $HOME
+ % tar zxvf /path/to/VelvetOptimiser-2.2.4.tar.gz
+
+You now have it in a folder called $HOME/VelvetOptimiser-2.2.4
+
+
+
+3. ADD TO PATH
+==============
+
+Add the following to your $HOME/.bashrc file
+
+ export PATH=$PATH:$HOME/VelvetOptimiser-2.2.4
+
+You will have to log out and log back in for this to be incorporated.
+
+
+4. TRY IT
+=========
+
+Type:
+
+ % VelvetOptimiser.pl
+
+You should see the command line help. For detailed instructions read the
+README file:
+
+ % less $HOME/VelvetOptimiser-2.2.4/README
+
+
+
+
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..d159169
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,339 @@
+ GNU GENERAL PUBLIC LICENSE
+ Version 2, June 1991
+
+ Copyright (C) 1989, 1991 Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+ Preamble
+
+ The licenses for most software are designed to take away your
+freedom to share and change it. By contrast, the GNU General Public
+License is intended to guarantee your freedom to share and change free
+software--to make sure the software is free for all its users. This
+General Public License applies to most of the Free Software
+Foundation's software and to any other program whose authors commit to
+using it. (Some other Free Software Foundation software is covered by
+the GNU Lesser General Public License instead.) You can apply it to
+your programs, too.
+
+ When we speak of free software, we are referring to freedom, not
+price. Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+this service if you wish), that you receive source code or can get it
+if you want it, that you can change the software or use pieces of it
+in new free programs; and that you know you can do these things.
+
+ To protect your rights, we need to make restrictions that forbid
+anyone to deny you these rights or to ask you to surrender the rights.
+These restrictions translate to certain responsibilities for you if you
+distribute copies of the software, or if you modify it.
+
+ For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must give the recipients all the rights that
+you have. You must make sure that they, too, receive or can get the
+source code. And you must show them these terms so they know their
+rights.
+
+ We protect your rights with two steps: (1) copyright the software, and
+(2) offer you this license which gives you legal permission to copy,
+distribute and/or modify the software.
+
+ Also, for each author's protection and ours, we want to make certain
+that everyone understands that there is no warranty for this free
+software. If the software is modified by someone else and passed on, we
+want its recipients to know that what they have is not the original, so
+that any problems introduced by others will not reflect on the original
+authors' reputations.
+
+ Finally, any free program is threatened constantly by software
+patents. We wish to avoid the danger that redistributors of a free
+program will individually obtain patent licenses, in effect making the
+program proprietary. To prevent this, we have made it clear that any
+patent must be licensed for everyone's free use or not licensed at all.
+
+ The precise terms and conditions for copying, distribution and
+modification follow.
+
+ GNU GENERAL PUBLIC LICENSE
+ TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
+
+ 0. This License applies to any program or other work which contains
+a notice placed by the copyright holder saying it may be distributed
+under the terms of this General Public License. The "Program", below,
+refers to any such program or work, and a "work based on the Program"
+means either the Program or any derivative work under copyright law:
+that is to say, a work containing the Program or a portion of it,
+either verbatim or with modifications and/or translated into another
+language. (Hereinafter, translation is included without limitation in
+the term "modification".) Each licensee is addressed as "you".
+
+Activities other than copying, distribution and modification are not
+covered by this License; they are outside its scope. The act of
+running the Program is not restricted, and the output from the Program
+is covered only if its contents constitute a work based on the
+Program (independent of having been made by running the Program).
+Whether that is true depends on what the Program does.
+
+ 1. You may copy and distribute verbatim copies of the Program's
+source code as you receive it, in any medium, provided that you
+conspicuously and appropriately publish on each copy an appropriate
+copyright notice and disclaimer of warranty; keep intact all the
+notices that refer to this License and to the absence of any warranty;
+and give any other recipients of the Program a copy of this License
+along with the Program.
+
+You may charge a fee for the physical act of transferring a copy, and
+you may at your option offer warranty protection in exchange for a fee.
+
+ 2. You may modify your copy or copies of the Program or any portion
+of it, thus forming a work based on the Program, and copy and
+distribute such modifications or work under the terms of Section 1
+above, provided that you also meet all of these conditions:
+
+ a) You must cause the modified files to carry prominent notices
+ stating that you changed the files and the date of any change.
+
+ b) You must cause any work that you distribute or publish, that in
+ whole or in part contains or is derived from the Program or any
+ part thereof, to be licensed as a whole at no charge to all third
+ parties under the terms of this License.
+
+ c) If the modified program normally reads commands interactively
+ when run, you must cause it, when started running for such
+ interactive use in the most ordinary way, to print or display an
+ announcement including an appropriate copyright notice and a
+ notice that there is no warranty (or else, saying that you provide
+ a warranty) and that users may redistribute the program under
+ these conditions, and telling the user how to view a copy of this
+ License. (Exception: if the Program itself is interactive but
+ does not normally print such an announcement, your work based on
+ the Program is not required to print an announcement.)
+
+These requirements apply to the modified work as a whole. If
+identifiable sections of that work are not derived from the Program,
+and can be reasonably considered independent and separate works in
+themselves, then this License, and its terms, do not apply to those
+sections when you distribute them as separate works. But when you
+distribute the same sections as part of a whole which is a work based
+on the Program, the distribution of the whole must be on the terms of
+this License, whose permissions for other licensees extend to the
+entire whole, and thus to each and every part regardless of who wrote it.
+
+Thus, it is not the intent of this section to claim rights or contest
+your rights to work written entirely by you; rather, the intent is to
+exercise the right to control the distribution of derivative or
+collective works based on the Program.
+
+In addition, mere aggregation of another work not based on the Program
+with the Program (or with a work based on the Program) on a volume of
+a storage or distribution medium does not bring the other work under
+the scope of this License.
+
+ 3. You may copy and distribute the Program (or a work based on it,
+under Section 2) in object code or executable form under the terms of
+Sections 1 and 2 above provided that you also do one of the following:
+
+ a) Accompany it with the complete corresponding machine-readable
+ source code, which must be distributed under the terms of Sections
+ 1 and 2 above on a medium customarily used for software interchange; or,
+
+ b) Accompany it with a written offer, valid for at least three
+ years, to give any third party, for a charge no more than your
+ cost of physically performing source distribution, a complete
+ machine-readable copy of the corresponding source code, to be
+ distributed under the terms of Sections 1 and 2 above on a medium
+ customarily used for software interchange; or,
+
+ c) Accompany it with the information you received as to the offer
+ to distribute corresponding source code. (This alternative is
+ allowed only for noncommercial distribution and only if you
+ received the program in object code or executable form with such
+ an offer, in accord with Subsection b above.)
+
+The source code for a work means the preferred form of the work for
+making modifications to it. For an executable work, complete source
+code means all the source code for all modules it contains, plus any
+associated interface definition files, plus the scripts used to
+control compilation and installation of the executable. However, as a
+special exception, the source code distributed need not include
+anything that is normally distributed (in either source or binary
+form) with the major components (compiler, kernel, and so on) of the
+operating system on which the executable runs, unless that component
+itself accompanies the executable.
+
+If distribution of executable or object code is made by offering
+access to copy from a designated place, then offering equivalent
+access to copy the source code from the same place counts as
+distribution of the source code, even though third parties are not
+compelled to copy the source along with the object code.
+
+ 4. You may not copy, modify, sublicense, or distribute the Program
+except as expressly provided under this License. Any attempt
+otherwise to copy, modify, sublicense or distribute the Program is
+void, and will automatically terminate your rights under this License.
+However, parties who have received copies, or rights, from you under
+this License will not have their licenses terminated so long as such
+parties remain in full compliance.
+
+ 5. You are not required to accept this License, since you have not
+signed it. However, nothing else grants you permission to modify or
+distribute the Program or its derivative works. These actions are
+prohibited by law if you do not accept this License. Therefore, by
+modifying or distributing the Program (or any work based on the
+Program), you indicate your acceptance of this License to do so, and
+all its terms and conditions for copying, distributing or modifying
+the Program or works based on it.
+
+ 6. Each time you redistribute the Program (or any work based on the
+Program), the recipient automatically receives a license from the
+original licensor to copy, distribute or modify the Program subject to
+these terms and conditions. You may not impose any further
+restrictions on the recipients' exercise of the rights granted herein.
+You are not responsible for enforcing compliance by third parties to
+this License.
+
+ 7. If, as a consequence of a court judgment or allegation of patent
+infringement or for any other reason (not limited to patent issues),
+conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License. If you cannot
+distribute so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you
+may not distribute the Program at all. For example, if a patent
+license would not permit royalty-free redistribution of the Program by
+all those who receive copies directly or indirectly through you, then
+the only way you could satisfy both it and this License would be to
+refrain entirely from distribution of the Program.
+
+If any portion of this section is held invalid or unenforceable under
+any particular circumstance, the balance of the section is intended to
+apply and the section as a whole is intended to apply in other
+circumstances.
+
+It is not the purpose of this section to induce you to infringe any
+patents or other property right claims or to contest validity of any
+such claims; this section has the sole purpose of protecting the
+integrity of the free software distribution system, which is
+implemented by public license practices. Many people have made
+generous contributions to the wide range of software distributed
+through that system in reliance on consistent application of that
+system; it is up to the author/donor to decide if he or she is willing
+to distribute software through any other system and a licensee cannot
+impose that choice.
+
+This section is intended to make thoroughly clear what is believed to
+be a consequence of the rest of this License.
+
+ 8. If the distribution and/or use of the Program is restricted in
+certain countries either by patents or by copyrighted interfaces, the
+original copyright holder who places the Program under this License
+may add an explicit geographical distribution limitation excluding
+those countries, so that distribution is permitted only in or among
+countries not thus excluded. In such case, this License incorporates
+the limitation as if written in the body of this License.
+
+ 9. The Free Software Foundation may publish revised and/or new versions
+of the General Public License from time to time. Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+Each version is given a distinguishing version number. If the Program
+specifies a version number of this License which applies to it and "any
+later version", you have the option of following the terms and conditions
+either of that version or of any later version published by the Free
+Software Foundation. If the Program does not specify a version number of
+this License, you may choose any version ever published by the Free Software
+Foundation.
+
+ 10. If you wish to incorporate parts of the Program into other free
+programs whose distribution conditions are different, write to the author
+to ask for permission. For software which is copyrighted by the Free
+Software Foundation, write to the Free Software Foundation; we sometimes
+make exceptions for this. Our decision will be guided by the two goals
+of preserving the free status of all derivatives of our free software and
+of promoting the sharing and reuse of software generally.
+
+ NO WARRANTY
+
+ 11. BECAUSE THE PROGRAM IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY
+FOR THE PROGRAM, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN
+OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES
+PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED
+OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
+MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS
+TO THE QUALITY AND PERFORMANCE OF THE PROGRAM IS WITH YOU. SHOULD THE
+PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING,
+REPAIR OR CORRECTION.
+
+ 12. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR
+REDISTRIBUTE THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES,
+INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING
+OUT OF THE USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED
+TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY
+YOU OR THIRD PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER
+PROGRAMS), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE
+POSSIBILITY OF SUCH DAMAGES.
+
+ END OF TERMS AND CONDITIONS
+
+ How to Apply These Terms to Your New Programs
+
+ If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+ To do so, attach the following notices to the program. It is safest
+to attach them to the start of each source file to most effectively
+convey the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+ <one line to give the program's name and a brief idea of what it does.>
+ Copyright (C) <year> <name of author>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License along
+ with this program; if not, write to the Free Software Foundation, Inc.,
+ 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA.
+
+Also add information on how to contact you by electronic and paper mail.
+
+If the program is interactive, make it output a short notice like this
+when it starts in an interactive mode:
+
+ Gnomovision version 69, Copyright (C) year name of author
+ Gnomovision comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+ This is free software, and you are welcome to redistribute it
+ under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License. Of course, the commands you use may
+be called something other than `show w' and `show c'; they could even be
+mouse-clicks or menu items--whatever suits your program.
+
+You should also get your employer (if you work as a programmer) or your
+school, if any, to sign a "copyright disclaimer" for the program, if
+necessary. Here is a sample; alter the names:
+
+ Yoyodyne, Inc., hereby disclaims all copyright interest in the program
+ `Gnomovision' (which makes passes at compilers) written by James Hacker.
+
+ <signature of Ty Coon>, 1 April 1989
+ Ty Coon, President of Vice
+
+This General Public License does not permit incorporating your program into
+proprietary programs. If your program is a subroutine library, you may
+consider it more useful to permit linking proprietary applications with the
+library. If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.
diff --git a/README b/README
new file mode 100644
index 0000000..013a86c
--- /dev/null
+++ b/README
@@ -0,0 +1,311 @@
+NAME
+====
+
+VelvetOptimiser
+
+VERSION
+=======
+
+Version 2.2.5
+
+LICENCE
+=======
+
+Copyright 2009 - Simon Gladman - CSIRO & Monash University.
+
+simon.gladman at monash.edu
+
+This program is free software; you can redistribute it and/or modify
+it under the terms of the GNU General Public License as published by
+the Free Software Foundation; either version 2 of the License, or
+(at your option) any later version.
+
+This program is distributed in the hope that it will be useful,
+but WITHOUT ANY WARRANTY; without even the implied warranty of
+MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+GNU General Public License for more details.
+
+You should have received a copy of the GNU General Public License
+along with this program; if not, write to the Free Software
+Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+MA 02110-1301, USA.
+
+
+INTRODUCTION
+============
+
+The VelvetOptimiser is designed to run as a wrapper script for the Velvet
+assembler (Daniel Zerbino, EBI UK) and to assist with optimising the
+assembly. It searches a supplied hash value range for the optimum,
+estimates the expected coverage and then searches for the optimum coverage
+cutoff. It uses Velvet's internal mechanism for estimating insert lengths
+for paired end libraries. It can optimise the assemblies by either the
+default optimisation condition or by a user supplied one. It outputs the
+results to a subdirectory and records all its operations in a logfile.
+
+Expected coverage is estimated using the length weighted mode of the contig
+coverage in all active short columns of the stats.txt file.
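As a rough illustration of what a length weighted mode means here, the Python sketch below bins contigs by rounded coverage and sums the contig lengths in each bin; the bin holding the most bases is taken as the estimate. The function name, the integer rounding and the sample data are assumptions made for illustration and are not taken from Utils.pm.

```python
from collections import defaultdict


def length_weighted_mode(contigs):
    """Return the coverage value whose contigs span the most bases.

    contigs: iterable of (length, coverage) pairs. Rounding coverage to the
    nearest integer is an assumption made for this sketch.
    """
    totals = defaultdict(int)
    for length, coverage in contigs:
        totals[round(coverage)] += length
    if not totals:
        return 0
    # Pick the bin with the largest summed length; ties go to the lower coverage.
    return min(totals, key=lambda cov: (-totals[cov], cov))


# Hypothetical (length, coverage) pairs, not real stats.txt data.
sample = [(1200, 30.2), (800, 29.8), (300, 12.1), (5000, 31.4), (150, 12.4)]
print(length_weighted_mode(sample))  # prints 31
```

A plain mode would count contigs; weighting by length keeps many short, low-coverage contigs from outvoting a few long ones.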
+
+
+PREREQUISITES
+=============
+
+Velvet >= 0.7.51
+Perl >= 5.8.8
+BioPerl >= 1.4
+GNU utilities: grep sed free cut
+
+
+COMMAND LINE
+============
+
+ VelvetOptimiser.pl [options] -f 'velveth input line'
+
+ Options:
+
+ --help This help.
+ --V|version! Print version to stdout and exit.
+ --v|verbose+ Verbose logging, includes all velvet output in the logfile. (default '0').
+ --s|hashs=i The starting (lower) hash value (default '19').
+ --e|hashe=i The end (higher) hash value (default '31').
+ --x|step=i The step in hash search, minimum 2, no odd numbers (default '2').
+ --f|velvethfiles=s The file section of the velveth command line. (default '0').
+ --a|amosfile! Turn on velvet's read tracking and amos file output. (default '0').
+ --o|velvetgoptions=s Extra velvetg options to pass through. eg. -long_mult_cutoff -max_coverage etc (default '').
+ --t|threads=i The maximum number of simultaneous velvet instances to run. (default '4').
+ --g|genomesize=f The approximate size of the genome to be assembled in megabases.
+ Only used in memory use estimation. If not specified, memory use estimation
+ will not occur. If memory use is estimated, the results are shown and then program exits. (default '0').
+ --k|optFuncKmer=s The optimisation function used for k-mer choice. (default 'n50').
+ --c|optFuncCov=s The optimisation function used for cov_cutoff optimisation. (default 'Lbp').
+ --p|prefix=s The prefix for the output filenames, the default is the date and time in the format DD-MM-YYYY-HH-MM_. (default 'auto').
+ --d|dir_final=s The name of the directory to put the final output into. (default '.')
+ --z|upperCovCutoff=f The maximum coverage cutoff to consider as a multiplier of the expected coverage. (default '0.8').
+
+
+Advanced!: Changing the optimisation function(s)
+
+The VelvetOptimiser assembly optimisation function can be built from the following variables:
+ LNbp = The total number of Ns in large contigs
+ Lbp = The total number of base pairs in large contigs
+ Lcon = The number of large contigs
+ max = The length of the longest contig
+ n50 = The n50
+ ncon = The total number of contigs
+ tbp = The total number of basepairs in contigs
+Examples are:
+ 'Lbp' = Just the total basepairs in contigs longer than 1kb
+ 'n50*Lcon' = The n50 times the number of long contigs.
+ 'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided
+ by the total bases in all contigs plus the log of the number of bases
+ in long contigs.
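To make the expressions above concrete, here is a minimal Python sketch that evaluates such a string against one assembly's statistics. The function name `score_assembly`, the sample numbers and the use of the natural logarithm for `log` are assumptions for illustration; the text does not say how VelvetOptimiser evaluates the expression internally.

```python
import math


def score_assembly(expression, stats):
    """Evaluate an optimisation expression against one assembly's statistics."""
    # Only the documented variables and log() are visible to the expression.
    # eval() is used for brevity; it is not safe for untrusted input.
    names = dict(stats, log=math.log)
    return eval(expression, {"__builtins__": {}}, names)


# Hypothetical statistics for a single assembly.
stats = {"LNbp": 0, "Lbp": 4400000, "Lcon": 120, "max": 310000,
         "n50": 95000, "ncon": 450, "tbp": 4600000}

for expression in ("Lbp", "n50*Lcon", "n50*Lcon/tbp+log(Lbp)"):
    print(expression, "=", score_assembly(expression, stats))
```

Running the same expression over every candidate assembly and keeping the highest score is the selection step the options -k and -c control.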
+
+
+
+EXAMPLES
+========
+
+Find the best assembly for a lane of Illumina single-end reads, trying k-values between 27 and 31:
+
+% VelvetOptimiser.pl -s 27 -e 31 -f '-short -fastq s_1_sequence.txt'
+
+Print an estimate of how much RAM is needed by the above command, if we use eight threads at once,
+and we estimate our assembled genome to be 4.5 megabases long:
+
+% VelvetOptimiser.pl -s 27 -e 31 -f '-short -fastq s_1_sequence.txt' -g 4.5 -t 8
+
+Find the best assembly for Illumina paired end reads just for k=31, using four threads (eg. quad core CPU),
+but optimizing for N50 for k-mer length rather than sum of large contig sizes:
+
+% VelvetOptimiser.pl -s 31 -e 31 -f '-shortPaired -fasta interleaved.fasta' -t 4 --optFuncKmer 'n50'
+
+
+DETAILED OPTIONS
+================
+
+-h or --help
+
+ Prints the commandline help to STDOUT.
+
+-V or --version
+
+ Prints the program name and version to STDOUT. Note that other information is still
+ printed to STDERR. You can ignore this by redirecting the output like this:
+ % VelvetOptimiser.pl --version 2> /dev/null
+
+-v or --verbose
+
+ Adds the full velveth and velvetg output to the logfile. (Handy for
+ looking at the insert lengths and sds that Velvet has chosen for each library.)
+
+-s or --hashs
+
+ Parameter type required: odd integer > 0 & <= the MAXKMERLENGTH velvet was compiled with.
+ Default: 19
+
+ This is the lower end of the hash value range that the optimiser will search for the optimum.
+ If the supplied value is even, it will be lowered by 1.
+ If the supplied value is higher than MAXKMERLENGTH, it will be dropped to MAXKMERLENGTH.
+
+-e or --hashe
+
+ Parameter type required: odd integer >= 'hashs' & <= the MAXKMERLENGTH velvet was compiled with.
+ Default: MAXKMERLENGTH
+
+ This is the upper end of the hash value range that the optimiser will search for the optimum.
+ If the supplied value is even, it will be lowered by 1.
+ If the supplied value is higher than MAXKMERLENGTH, it will be dropped to MAXKMERLENGTH.
+ If the supplied value is lower than 'hashs' then it will be set to equal 'hashs'.
+
+-x or --step
+
+ Parameter type required: even integer < difference between 'hashs' and 'hashe'.
+ Default: 2
+
+ This parameter details the number of hash values to skip each increment from 'hashs' to 'hashe' when searching for the optimum.
+ If the supplied value is odd, it will be lowered by 1.
+ If the supplied value is less than 2, it will be set to 2.
+ If the supplied value is greater than the 'hashs' to 'hashe' range, it will be set to the range.
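The adjustment rules for -s, -e and -x described above can be sketched in Python as follows. The function name and the `max_kmer` placeholder of 31 are assumptions; the real limit is whatever MAXKMERLENGTH velvet was compiled with. The text does not say what happens to the step when 'hashs' equals 'hashe', so the sketch simply leaves it at its adjusted value in that case.

```python
def normalise_hash_range(hashs, hashe, step, max_kmer=31):
    """Apply the documented adjustments to -s, -e and -x.

    max_kmer stands in for the MAXKMERLENGTH velvet was compiled with;
    31 is only a placeholder and is assumed to be odd.
    """
    def lower_if_even(value):
        return value - 1 if value % 2 == 0 else value

    # Even values are lowered by 1, values above the limit drop to the limit.
    hashs = min(lower_if_even(hashs), max_kmer)
    hashe = min(lower_if_even(hashe), max_kmer)
    if hashe < hashs:
        hashe = hashs

    # Odd steps are lowered by 1, and the step is never less than 2.
    if step % 2 == 1:
        step -= 1
    if step < 2:
        step = 2
    span = hashe - hashs
    # Assumption: when hashs == hashe the step is irrelevant, so it is not clamped.
    if span >= 2 and step > span:
        step = span
    return hashs, hashe, step


print(normalise_hash_range(20, 40, 5))  # prints (19, 31, 4)
```

With the result above, the hash values searched would be 19, 23, 27 and 31.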
+
+-f or --velvethfiles
+
+ Parameter type required: string with '' or ""
+ No default.
+
+ This is a required parameter. If this option is not specified, then the optimiser's usage will be displayed.
+
+ You need to supply everything you would normally supply velveth at this point except for the hash size and the
+ directory name in the following format.
+
+ {[-file_format][-read_type] filename} repeated for as many read files as you have.
+
+
+ File format options:
+ -fasta
+ -fastq
+ -fasta.gz
+ -fastq.gz
+ -bam
+ -sam
+ -eland
+ -gerald
+
+ Read type options:
+ -short
+ -shortPaired
+ -short2
+ -shortPaired2
+ -long
+ -longPaired
+
+ Examples:
+
+ -f 'reads.fna'
+ reads.fna is short not paired and fasta. (these are the defaults: -short and -fasta)
+
+ -f '-shortPaired -fastq paired_reads.fastq -long long_reads.fna'
+ Two read files supplied, first one is a paired end fastq file and the second is a long single ended read file.
+
+ -f '-shortPaired paired_reads_1.fna -shortPaired2 paired_reads_2.fna'
+ Two read files supplied, both are short paired fastas but come from two different libraries, therefore needing two different CATEGORIES.
+
+ There is a fairly extensive checker built into the optimiser to check if the format of the string is correct. However, it won't check the read files for their format (fasta, fastq, eland etc.)
+
+-a or --amosfile
+
+ Turns on Velvet's read tracking and amos file output.
+ This option is the same as specifying '-amos_file yes -read_trkg yes' in velvetg. However, it will only be added to the final assembly and not to the intermediate ones.
+
+-o or --velvetgoptions
+
+ Parameter type required: string.
+ No default
+
+ String should contain extra options to be passed to velvetg as required such as "-long_mult_cutoff 1" or "-max_coverage 50" etc. Warning, there is no sanity check, so be careful. The optimiser will crash if you give velvetg something it doesn't handle.
+
+-t or --threads
+
+ Parameter type required: integer
+
+ Specifies the maximum number of threads (simultaneous Velvet instances) to run. It defaults to the number of CPUs in the current computer.
+
+-g or --genomesize
+
+ Parameter type required: float.
+ No default.
+
+ This option will run the Optimiser's memory estimator. It will estimate the memory required to run Velvet with the current -f parameter and number of threads. Once the estimator has finished its calculations, it will display the required memory, make a recommendation and then exit the script. This is useful for determining if you will have sufficient free RAM to run the assembly before you start.
+ You need to supply the approximate size of the genome you are assembling in megabases. For example, for a Salmonella genome I would use: -g 5
+
+-k or --optFuncKmer
+
+ Parameter type required: string.
+ Default: 'n50'
+
+ This option will change the function that the Optimiser uses to find the best hash value from the given range. The default is to use the n50. I have found this function to work for me better than the previous single optimisation function, but you may wish to change it. A list of possible variables to use in your optimisation function and some examples are shown below.
+
+-c or --optFuncCov
+
+ Parameter type required: string.
+ Default: 'Lbp'
+
+ This option will change the function that the Optimiser uses to find the best coverage cutoff (cov_cutoff). The default is to use the number of basepairs in contigs greater than 1 kilobase in length. I have found this function to work for me but you may wish to change it. A list of possible variables to use in your optimisation function and some examples are shown below.
+
+ Velvet optimiser assembly optimisation functions can be built from the following variables:
+
+ LNbp = The total number of Ns in large contigs
+ Lbp = The total number of base pairs in large contigs
+ Lcon = The number of large contigs
+ max = The length of the longest contig
+ n50 = The n50
+ ncon = The total number of contigs
+ tbp = The total number of basepairs in contigs
+
+ Examples are:
+
+ 'Lbp' = Just the total basepairs in contigs longer than 1kb
+ 'n50*Lcon' = The n50 times the number of long contigs.
+ 'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided
+ by the total bases in all contigs plus the log of the number of bases
+ in long contigs.
+
+ Be warned! The optimiser doesn't care what you supply in this string and will attempt to run anyway. If you give it a nonsensical optimisation function be prepared to receive a nonsensical assembly!
+
+-p or --prefix
+
+ Parameter type required: string
+ Default: The current date and time in the format "DD-MM-YYYY-HH-MM-SS_"
+
+ Names the logfile and the output directory with whatever prefix is supplied, followed by "_logfile.txt" for the logfile and "_data_k" (where k is the optimum hash value) for the output data directory.
+
+-d or --dir_final
+
+ Parameter type required: string
+ Default: . (the current directory)
+
+ At the completion of the optimiser, any non-default string will cause the final velvet output and the Velvet Optimiser logfile to be moved to the directory specified. If the directory already exists, an error is generated and the optimiser stops.
+
+-z or --upperCovCutoff
+
+ Parameter type required: float
+ Default: 0.8
+
+ Sets the upper limit of the coverage cutoff search range to this fraction of the expected coverage.
+
+BUGS
+====
+
+* None that I am aware of.
+
+TO DO
+=====
+
+* Make the auto_XXX folders be in --dir_final when set, not in the current directory.
+* Use velvetk.pl script to choose suitable -s and -e parameters.
+
+CONTACT
+=======
+
+Simon Gladman <simon.gladman at csiro.au>
+Torsten Seemann <torsten.seemann at monash.edu>
+
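The three adjustment rules the README gives for the hash step (odd values lowered by 1, values below 2 raised to 2, values above the 'hashs' to 'hashe' range set to the range) can be sketched as follows. This is only an illustration: normalise_hash_step is a hypothetical helper, not a function from VelvetOptimiser.pl, and the order in which the rules are applied is an assumption.

```perl
use strict;
use warnings;

# Hypothetical helper: adjust a requested hash step using the three README rules.
# The order in which the rules are applied here is an assumption.
sub normalise_hash_step {
    my ($step, $hashs, $hashe) = @_;
    my $range = $hashe - $hashs;

    $step-- if $step % 2;                # odd values are lowered by 1
    $step = 2 if $step < 2;              # never smaller than 2
    $step = $range if $step > $range;    # never larger than the hashs..hashe range
    return $step;
}

# Hash values that would be visited for hashs = 19, hashe = 31
# and a requested step of 5 (which is lowered to 4).
my $step = normalise_hash_step(5, 19, 31);
my @hashes;
for (my $k = 19; $k <= 31; $k += $step) {
    push @hashes, $k;
}
print "step $step: @hashes\n";    # prints "step 4: 19 23 27 31"
```

The sketch assumes 'hashe' is at least 2 greater than 'hashs'; with a smaller range the last rule would produce a step below 2.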
diff --git a/VelvetOpt/Assembly.pm b/VelvetOpt/Assembly.pm
new file mode 100644
index 0000000..c2c9cef
--- /dev/null
+++ b/VelvetOpt/Assembly.pm
@@ -0,0 +1,567 @@
+# VelvetOpt::Assembly.pm
+#
+# Copyright 2008,2009 Simon Gladman <simon.gladman at monash.edu>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+# Version 2.1.4
+#
+# Changes for 2.0.1
+# *Bug fix in CalcAssemblyScore. Now returns 0 if there is no calculable score instead of crashing.
+#
+# Changes for 2.1.0
+# *Added 2 stage optimisation functions for optimising kmer size and cov_cutoff differently if required.
+#
+# Changes for 2.1.1
+# *Allowed for non-word characters in prefix names. (. - etc.) Still no spaces allowed in prefix name or any filenames.
+#
+# Changes for 2.1.2
+# *Now warns nicely of optimisation function returning undef or 0. Suggests you choose an alternative.
+#
+# Changes for 2.1.3
+# *Now prints the velvet calculated insert sizes and standard deviations in the Assembly summaries (both log files and screen).
+#
+# Changes for 2.1.4
+# *Fixed a bug where newer versions of velvet would cause the paired end library stats not to be displayed.
+
+package VelvetOpt::Assembly;
+
+=head1 NAME
+
+VelvetOpt::Assembly.pm - Velvet assembly container class.
+
+=head1 AUTHOR
+
+Simon Gladman, CSIRO, 2007, 2008.
+
+=head1 LICENSE
+
+Copyright 2008, 2009 Simon Gladman <simon.gladman at csiro.au>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ MA 02110-1301, USA.
+
+=head1 SYNOPSIS
+
+ use VelvetOpt::Assembly;
+ my $object = VelvetOpt::Assembly->new(
+ timestamph => "23 November 2008 15:00:00",
+ ass_id => "1",
+ versionh => "0.7.04",
+ ass_dir => "/home/gla048/Desktop/newVelvetOptimiser/data_1"
+ );
+ print $object->toString();
+
+=head1 DESCRIPTION
+
+A container class to hold the results of a Velvet assembly. Includes timestamps,
+version information, parameter strings and assembly output metrics.
+
+Version 2.1.4
+
+=head2 Uses
+
+=over 8
+
+=item strict
+
+=item warnings
+
+=item Carp
+
+=back
+
+=head2 Fields
+
+=over 8
+
+=item assmscore
+
+The assembly score metric for this object
+
+=item timestamph
+
+The timestamp of the start of the velveth run for this assembly
+
+=item timestampg
+
+The date and time of the end of the velvetg run.
+
+=item ass_id
+
+The assembly id number. Sequential for all the runs for this optimisation.
+
+=item versionh
+
+The version number of velveth used in this assembly
+
+=item versiong
+
+The version number of velvetg used in this assembly
+
+=item readfilename
+
+The name of the file containing all the reads (or a qw of them if more than one...)
+
+=item pstringh
+
+The velveth parameter string used in this assembly
+
+=item pstringg
+
+The velvetg parameter string used in this assembly
+
+=item ass_dir
+
+The assembly directory path (full)
+
+=item hashval
+
+The hash value used for this assembly
+
+=item rmapfs
+
+The roadmap file size
+
+=item sequences
+
+The total number of sequences in the input files
+
+=item nconts
+
+The number of contigs in the final assembly
+
+=item totalbp
+
+The total number of bases in the contigs
+
+=item n50
+
+The n50 of the assembly
+
+=item maxlength
+
+The length of the longest contig in the assembly
+
+=item maxcont
+
+The size of the largest contig in the assembly
+
+=item nconts1k
+
+The number of contigs greater than 1k in size
+
+=item totalbp1k
+
+the sum of the length of contigs > 1k in size
+
+=item velvethout
+
+The velveth output
+
+=item velvetgout
+
+The velvetg output
+
+=back
+
+=head2 Methods
+
+=over 8
+
+=item new
+
+Returns a new VelvetAssembly object.
+
+=item accessor methods
+
+Accessor methods for all fields.
+
+=item calcAssemblyScore
+
+Calculates the assembly score of the object (after velvetg has been run.) and stores it in self.
+
+=item getHashingDetails
+
+Gets the details of the outputs from the velveth run and stores it in self.
+
+=item getAssemblyDetails
+
+Gets the details of the outputs from the velvetg run and stores it in self.
+
+=item toString
+
+Returns a string representation of the object's contents.
+
+=item toStringNoV
+
+Returns a string representation of the object's contents without the velvet outputs which are large.
+
+=item opt_func_toString
+
+Returns the usage of the optimisation function.
+
+=back
+
+=cut
+
+use strict;
+use Carp;
+use warnings;
+#use base "Storable";
+use Cwd;
+use Bio::SeqIO;
+
+my $interested = 0;
+
+
+
+#constructor
+sub new {
+ my $class = shift;
+ my $self = {@_};
+ bless ($self, $class);
+ return $self;
+}
+
+#optimisation function options...
+my %f_opts;
+ $f_opts{'ncon'}->{'intname'} = 'nconts';
+ $f_opts{'ncon'}->{'desc'} = "The total number of contigs";
+ $f_opts{'n50'}->{'intname'} = 'n50';
+ $f_opts{'n50'}->{'desc'} = "The n50";
+ $f_opts{'max'}->{'intname'} = 'maxlength';
+ $f_opts{'max'}->{'desc'} = "The length of the longest contig";
+ $f_opts{'Lcon'}->{'intname'} = 'nconts1k';
+ $f_opts{'Lcon'}->{'desc'} = "The number of large contigs";
+ $f_opts{'tbp'}->{'intname'} = 'totalbp';
+ $f_opts{'tbp'}->{'desc'} = "The total number of basepairs in contigs";
+ $f_opts{'Lbp'}->{'intname'} = 'totalbp1k';
+ $f_opts{'Lbp'}->{'desc'} = "The total number of base pairs in large contigs";
+ $f_opts{'LNbp'}->{'intname'} = 'numNs1k';
+ $f_opts{'LNbp'}->{'desc'} = "The total number of Ns in large contigs";
+
+#accessor methods
+sub assmscore{ $_[0]->{assmscore}=$_[1] if defined $_[1]; $_[0]->{assmscore}}
+sub timestamph{ $_[0]->{timestamph}=$_[1] if defined $_[1]; $_[0]->{timestamph}}
+sub timestampg{ $_[0]->{timestampg}=$_[1] if defined $_[1]; $_[0]->{timestampg}}
+sub ass_id{ $_[0]->{ass_id}=$_[1] if defined $_[1]; $_[0]->{ass_id}}
+sub versionh{ $_[0]->{versionh}=$_[1] if defined $_[1]; $_[0]->{versionh}}
+sub versiong{ $_[0]->{versiong}=$_[1] if defined $_[1]; $_[0]->{versiong}}
+sub readfilename{ $_[0]->{readfilename}=$_[1] if defined $_[1]; $_[0]->{readfilename}}
+sub pstringh{ $_[0]->{pstringh}=$_[1] if defined $_[1]; $_[0]->{pstringh}}
+sub pstringg{ $_[0]->{pstringg}=$_[1] if defined $_[1]; $_[0]->{pstringg}}
+sub ass_dir{ $_[0]->{ass_dir}=$_[1] if defined $_[1]; $_[0]->{ass_dir}}
+sub hashval{ $_[0]->{hashval}=$_[1] if defined $_[1]; $_[0]->{hashval}}
+sub rmapfs{ $_[0]->{rmapfs}=$_[1] if defined $_[1]; $_[0]->{rmapfs}}
+sub nconts{ $_[0]->{nconts}=$_[1] if defined $_[1]; $_[0]->{nconts}}
+sub n50{ $_[0]->{n50}=$_[1] if defined $_[1]; $_[0]->{n50}}
+sub maxlength{ $_[0]->{maxlength}=$_[1] if defined $_[1]; $_[0]->{maxlength}}
+sub nconts1k{ $_[0]->{nconts1k}=$_[1] if defined $_[1]; $_[0]->{nconts1k}}
+sub totalbp{ $_[0]->{totalbp}=$_[1] if defined $_[1]; $_[0]->{totalbp}}
+sub totalbp1k{ $_[0]->{totalbp1k}=$_[1] if defined $_[1]; $_[0]->{totalbp1k}}
+sub numNs1k{ $_[0]->{numNs1k}=$_[1] if defined $_[1]; $_[0]->{numNs1k}}
+sub velvethout{ $_[0]->{velvethout}=$_[1] if defined $_[1]; $_[0]->{velvethout}}
+sub velvetgout{ $_[0]->{velvetgout}=$_[1] if defined $_[1]; $_[0]->{velvetgout}}
+sub sequences{ $_[0]->{sequences}=$_[1] if defined $_[1]; $_[0]->{sequences}}
+sub assmfunc{ $_[0]->{assmfunc}=$_[1] if defined $_[1]; $_[0]->{assmfunc}}
+sub assmfunc2{ $_[0]->{assmfunc2}=$_[1] if defined $_[1]; $_[0]->{assmfunc2}}
+
+#assemblyScoreCalculator
+sub calcAssemblyScore {
+ use Safe;
+
+ my $self = shift;
+ my $func = shift;
+
+ my $cpt = new Safe;
+
+ #Basic variable IO and traversal
+ $cpt->permit(qw(null scalar const padany lineseq leaveeval rv2sv rv2hv helem hslice each values keys exists delete rv2cv));
+ #Comparators
+ $cpt->permit(qw(lt i_lt gt i_gt le i_le ge i_ge eq i_eq ne i_ne ncmp i_ncmp slt sgt sle sge seq sne scmp));
+ #Base math
+ $cpt->permit(qw(preinc i_preinc predec i_predec postinc i_postinc postdec i_postdec int hex oct abs pow multiply i_multiply divide i_divide modulo i_modulo add i_add subtract i_subtract));
+ #Binary math
+ $cpt->permit(qw(left_shift right_shift bit_and bit_xor bit_or negate i_negate not complement));
+ #Regex
+ $cpt->permit(qw(match split qr));
+ #Conditionals
+ $cpt->permit(qw(cond_expr flip flop andassign orassign and or xor));
+ #Advanced math
+ $cpt->permit(qw(atan2 sin cos exp log sqrt rand srand));
+
+ foreach my $key (keys %f_opts){
+ print "\nkey: $key\tintname: ", $f_opts{$key}->{'intname'}, "\n" if $interested;
+
+ $func =~ s/\b$key\b/$self->{$f_opts{$key}->{'intname'}}/g;
+ }
+
+ my $r = $cpt->reval($func);
+ warn $@ if $@;
+ $self->{assmscore} = $r;
+ unless(defined $r && $r =~ /^\d+/){
+ warn "Optimisation function did not return a single float.\nOptimisation function was not evaluatable.\nOptfunc: $func";
+ warn "Setting assembly score to 0\n";
+ $self->{assmscore} = 0;
+ }
+ if($self->{assmscore} == 0){
+ print STDERR "**********\n";
+ print STDERR "Warning: Assembly score for assembly_id " . $self->{ass_id} . " is 0\n";
+ print STDERR "You may want to consider choosing a different optimisation variable or function.\n";
+ print STDERR "Current optimisation functions are ", $self->{assmfunc}, " for k value and ", $self->{assmfunc2}, " for cov_cutoff\n";
+ print STDERR "**********\n";
+ }
+ return 1;
+}
+
+#getHashingDetails
+sub getHashingDetails {
+ my $self = shift;
+ unless(!$self->timestamph || !$self->pstringh){
+ my $programPath = cwd;
+ $self->pstringh =~ /^(\S+)\s+(\d+)\s+(.*)$/;
+ $self->{ass_dir} = $programPath . "/" . $1;
+ $self->{rmapfs} = -s $self->ass_dir . "/Roadmaps";
+ $self->{hashval} = $2;
+ $self->{readfilename} = $3;
+ my @t = split /\n/, $self->velvethout;
+ foreach(@t){
+ if(/^(\d+).*total\.$/){
+ $self->{sequences} = $1;
+ last;
+ }
+ }
+ return 1;
+ }
+ return 0;
+}
+
+#getAssemblyDetails
+sub getAssemblyDetails {
+ my $self = shift;
+ my $file = $self->ass_dir . "/contigs.fa";
+ unless(!(-e $file)){
+
+ my $all = &contigStats($file,1);
+ my $large = &contigStats($file,1000);
+
+ $self->{nconts} = defined $all->{numSeqs} ? $all->{numSeqs} : 0;
+ $self->{n50} = defined $all->{n50} ? $all->{n50} : 0;
+ $self->{maxlength} = defined $all->{maxLen} ? $all->{maxLen} : 0;
+ $self->{nconts1k} = defined $large->{numSeqs} ? $large->{numSeqs} : 0;
+ $self->{totalbp} = defined $all->{numBases} ? $all->{numBases} : 0;
+ $self->{totalbp1k} = defined $large->{numBases} ? $large->{numBases} : 0;
+ $self->{numNs1k} = defined $large->{numNs} ? $large->{numNs} : 0;
+
+ if($self->pstringg =~ m/cov_cutoff/){
+ $self->calcAssemblyScore($self->{assmfunc2});
+ }
+ else {
+ $self->calcAssemblyScore($self->{assmfunc});
+ }
+
+ return 1;
+ }
+ return 0;
+}
+
+#contigStats
+#Original script fa-show.pl by Torsten Seemann (Monash University, Melbourne, Australia)
+#Modified by Simon Gladman to suit.
+sub contigStats {
+
+ my $file = shift;
+ my $minsize = shift;
+
+ print "In contigStats with $file, $minsize\n" if $interested;
+
+ my $numseq=0;
+ my $avglen=0;
+ my $minlen=1E9;
+ my $maxlen=0;
+ my @len;
+ my $toosmall=0;
+ my $nn=0;
+
+ my $in = Bio::SeqIO->new(-file => $file, -format => 'Fasta');
+ while(my $seq = $in->next_seq()){
+ my $L = $seq->length;
+ #check > minsize
+ if($L < $minsize){
+ $toosmall ++;
+ next;
+ }
+ #count Ns
+ my $s = $seq->seq;
+ my $n = $s =~ s/N/N/gi;
+ $n ||= 0;
+ $nn += $n;
+ #count seqs and other stats
+ $numseq ++;
+ $avglen += $L;
+ $maxlen = $L if $L > $maxlen;
+ $minlen = $L if $L < $minlen;
+ push @len, $L;
+ }
+ @len = sort { $a <=> $b } @len;
+ my $cum = 0;
+ my $n50 = 0;
+ for my $i (0 .. $#len){
+ $cum += $len[$i];
+ if($cum >= $avglen/2) {
+ $n50 = $len[$i];
+ last;
+ }
+ }
+
+ my %out;
+ if($numseq > 0){
+ $out{numSeqs} = $numseq;
+ $out{numBases} = $avglen;
+ $out{numOK} = ($avglen - $nn);
+ $out{numNs} = $nn;
+ $out{minLen} = $minlen;
+ $out{avgLen} = $avglen/$numseq;
+ $out{maxLen} = $maxlen;
+ $out{n50} = $n50;
+ $out{minsize} = $minsize;
+ $out{numTooSmall} = $toosmall;
+ }
+ else {
+ $out{numSeqs} = 0;
+ }
+
+ print "Leaving contigstats!\n" if $interested;
+ return (\%out);
+}
+
+
+#toString method
+sub toString {
+ my $self = shift;
+ my $tmp = $self->toStringNoV();
+ if(defined $self->velvethout){
+ $tmp .= "Velveth Output:\n" . $self->velvethout() . "\n";
+ }
+ if(defined $self->velvetgout){
+ $tmp .= "Velvetg Output:\n" . $self->velvetgout() . "\n";
+ }
+ $tmp .= "**********************************************************\n";
+ return $tmp;
+}
+
+
+#toStringNoV method
+sub toStringNoV {
+ my $self = shift;
+ my $tmp = "********************************************************\n";
+ if($self->ass_id()){
+ $tmp .= "Assembly id: " . $self->ass_id(). "\n";
+ }
+ if($self->assmscore()){
+ $tmp .= "Assembly score: " .$self->assmscore(). "\n";
+ }
+ if($self->timestamph()){
+ $tmp .= "Velveth timestamp: " . $self->timestamph(). "\n";
+ }
+ if($self->timestampg()){
+ $tmp .= "Velvetg timestamp: " . $self->timestampg(). "\n";
+ }
+ if(defined $self->versionh){
+ $tmp .= "Velveth version: " . $self->versionh(). "\n";
+ }
+ if(defined $self->versiong){
+ $tmp .= "Velvetg version: " . $self->versiong(). "\n";
+ }
+ if(defined $self->readfilename){
+ $tmp .= "Readfile(s): " . $self->readfilename(). "\n";
+ }
+ if(defined $self->pstringh){
+ $tmp .= "Velveth parameter string: " . $self->pstringh(). "\n";
+ }
+ if(defined $self->pstringg){
+ $tmp .= "Velvetg parameter string: " . $self->pstringg(). "\n";
+ }
+ if(defined $self->ass_dir){
+ $tmp .= "Assembly directory: " . $self->ass_dir(). "\n";
+ }
+ if(defined $self->hashval){
+ $tmp .= "Velvet hash value: " . $self->hashval(). "\n";
+ }
+ if(defined $self->rmapfs){
+ $tmp .= "Roadmap file size: " . $self->rmapfs(). "\n";
+ }
+ if(defined $self->sequences){
+ $tmp .= "Total number of sequences: " . $self->sequences(). "\n";
+ }
+ if(defined $self->nconts){
+ $tmp .= "Total number of contigs: " . $self->nconts(). "\n";
+ }
+ if(defined $self->n50){
+ $tmp .= "n50: " . $self->n50(). "\n";
+ }
+ if(defined $self->maxlength){
+ $tmp .= "length of longest contig: " . $self->maxlength(). "\n";
+ }
+ if(defined $self->totalbp){
+ $tmp .= "Total bases in contigs: " . $self->totalbp(). "\n";
+ }
+ if(defined $self->nconts1k){
+ $tmp .= "Number of contigs > 1k: " . $self->nconts1k(). "\n";
+ }
+ if(defined $self->totalbp1k){
+ $tmp .= "Total bases in contigs > 1k: " . $self->totalbp1k(). "\n";
+ }
+ if($self->pstringh =~ /Pair/ && defined $self->pstringg && $self->pstringg =~ /-exp_cov/){
+ $tmp .= "Paired Library insert stats:\n";
+ my @x = split /\n/, $self->velvetgout;
+ foreach(@x){
+ chomp;
+ if(/Paired-end library \d+ has/){
+ s/^\[\d+\.\d+\]\s+//;
+ $tmp .= "$_\n";
+ }
+ }
+ }
+ $tmp .= "**********************************************************\n";
+ return $tmp;
+}
+
+sub opt_func_toString {
+ my $out = "\nVelvet optimiser assembly optimisation function can be built from the following variables.\n";
+ foreach my $key (sort keys %f_opts){
+ $out .= "\t$key = " . $f_opts{$key}->{'desc'} . "\n";
+ }
+ $out .= "Examples are:\n\t'Lbp' = Just the total basepairs in contigs longer than 1kb\n";
+ $out .= "\t'n50*Lcon' = The n50 times the number of long contigs.\n";
+ $out .= "\t'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided\n\t\tby the total bases in all contigs plus the log of the number of bases\n\t\tin long contigs.\n";
+ return $out
+}
+
+1;
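As a rough illustration of the first half of calcAssemblyScore in Assembly.pm above, the sketch below expands an optimisation function string by replacing each public variable name with the value of its internal field. expand_opt_func is a hypothetical helper and the metric values are made up; the module itself goes on to evaluate the expanded string inside a Safe compartment, which is not reproduced here.

```perl
use strict;
use warnings;

# Public variable name => internal field name, as listed in %f_opts.
my %intname = (
    ncon => 'nconts',
    n50  => 'n50',
    max  => 'maxlength',
    Lcon => 'nconts1k',
    tbp  => 'totalbp',
    Lbp  => 'totalbp1k',
    LNbp => 'numNs1k',
);

# Hypothetical helper: replace each variable in $func with its value from $metrics.
sub expand_opt_func {
    my ($func, $metrics) = @_;
    for my $key (keys %intname) {
        my $value = $metrics->{ $intname{$key} };
        $value = 0 unless defined $value;      # assumption: a missing metric counts as 0
        $func =~ s/\b\Q$key\E\b/$value/g;      # \b ensures only whole words are replaced
    }
    return $func;
}

# Made-up metrics for illustration only.
my %metrics = (n50 => 1000, nconts1k => 10, totalbp => 50000, totalbp1k => 40000);
print expand_opt_func('n50*Lcon/tbp+log(Lbp)', \%metrics), "\n";
# prints "1000*10/50000+log(40000)"
```

Because the variable names are replaced by plain numbers, the order in which the keys are visited does not affect the result.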
diff --git a/VelvetOpt/Utils.pm b/VelvetOpt/Utils.pm
new file mode 100644
index 0000000..c0edbe2
--- /dev/null
+++ b/VelvetOpt/Utils.pm
@@ -0,0 +1,217 @@
+#
+# VelvetOpt::Utils.pm
+#
+# Copyright 2008,2009,2010 Simon Gladman <simon.gladman at monash.edu>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+# Version 2.1.3
+
+# Changes for Version 2.0.1
+# Added Mikael Brandstrom Durling's numCpus and freeMem for the Mac.
+#
+# Changes for Version 2.1.0
+# Fixed bug in estExpCov so it now correctly uses all short read categories not just the first two.
+#
+# Changes for Version 2.1.2
+# Fixed bug in estExpCov so it now won't take columns with "N/A" or "Inf" into account
+#
+# Changes for Version 2.1.3
+# Changed the minimum contig size to use for estimating expected coverage to 3*kmer size -1 and set the minimum coverage to 2 instead of 0.
+# This should get rid of exp_covs of 1 when it should be very high for assembling reads that weren't mapped to a reference using one of the standard read mapping programs
+
+
+package VelvetOpt::Utils;
+
+use strict;
+use warnings;
+use POSIX qw(ceil floor);
+use Carp;
+use List::Util qw(max);
+use Bio::SeqIO;
+
+# num_cpu
+# It returns the number of cpus present in the system if linux.
+# If it is MAC then it returns the number of cores present.
+# If the OS is not linux or Mac then it returns 1.
+# Written by Torsten Seemann 2009 (linux) and Mikael Brandstrom Durling 2009 (Mac).
+
+sub num_cpu {
+ if ( $^O =~ m/linux/i ) {
+ my ($num) = qx(grep -c ^processor /proc/cpuinfo);
+ chomp $num;
+ return $num if $num =~ m/^\d+/;
+ }
+ elsif( $^O =~ m/darwin/i){
+ my ($num) = qx(system_profiler SPHardwareDataType | grep Cores);
+ $num =~ /.*Cores: (\d+)/;
+ $num =$1;
+ return $num;
+ }
+ return 1;
+}
+
+# free_mem
+# Returns the current amount of free memory
+# Mac Section written by Mikael Brandstrom Durling 2009 (Mac).
+
+sub free_mem {
+ if( $^O =~ m/linux/i ) {
+ my $x = `free | grep '^Mem:' | sed 's/ */~/g' | cut -d '~' -f 4,7`;
+ my @tmp = split "~", $x;
+ my $total = $tmp[0] + $tmp[1];
+ my $totalGB = $total / 1024 / 1024;
+ return $totalGB;
+ }
+ elsif( $^O =~ m/darwin/i){
+ my ($tmp) = qx(vm_stat | grep size);
+ $tmp =~ /.*size of (\d+) bytes.*/;
+ my $page_size = $1;
+ ($tmp) = qx(vm_stat | grep free);
+ $tmp =~ /[^0-9]+(\d+).*/;
+ my $free_pages = $1;
+ my $totalGB = ($free_pages * $page_size) / 1024 / 1024 / 1024;
+ return $totalGB;
+ }
+}
+
+# estExpCov
+# it returns the expected coverage of short reads from an assembly by
+# performing a math mode on the stats.txt file supplied.. It looks at
+# all the short_cov? columns.. Uses minimum contig length and minimum coverage.
+# needs the stats.txt file path and name, and the k-value used in the assembly.
+# Original algorithm by Torsten Seemann 2009 under the GPL.
+# Adapted by Simon Gladman 2009.
+# It does a weighted mode...
+
+sub estExpCov {
+ use List::Util qw(max);
+ my $file = shift;
+ my $kmer = shift;
+ my $minlen = 3 * $kmer - 1;
+ my $mincov = 2;
+ my $fh;
+ unless ( open IN, $file ) {
+ croak "Unable to open $file for exp_cov determination.\n";
+ }
+ my @cov;
+ while (<IN>) {
+ chomp;
+ my @x = split m/\t/;
+ my $len = scalar @x;
+ next unless @x >= 7;
+ next unless $x[1] =~ m/^\d+$/;
+ next unless $x[1] >= $minlen;
+
+ #add all the short_cov columns..
+ my $cov = 0;
+ for(my $i = 5; $i < $len; $i += 2){
+ if($x[$i] =~ /\d/){
+ $cov += $x[$i];
+ }
+ }
+ next unless $cov > $mincov;
+ push @cov, ( ( int($cov) ) x $x[1] );
+ }
+
+ my %freq_of;
+ map { $freq_of{$_}++ } @cov;
+ my $mode = 0;
+ $freq_of{$mode} = 0; # sentinel
+ for my $x ( keys %freq_of ) {
+ $mode = $x if $freq_of{$x} > $freq_of{$mode};
+ }
+ return $mode;
+}
+
+# estVelvetMemUse
+# returns the estimated memory usage for velvet in GB
+
+sub estVelvetMemUse {
+ my ($readsize, $genomesize, $numreads, $k) = @_;
+ my $velvetgmem = -109635 + 18977*$readsize + 86326*$genomesize + 233353*$numreads - 51092*$k;
+ my $out = ($velvetgmem/1024) / 1024;
+ return $out;
+}
+
+# getReadSizeNum
+# returns the number of reads and average size in the short and shortPaired categories...
+
+sub getReadSizeNum {
+ my $f = shift;
+ my %reads;
+ my $num = 0;
+ my $currentfiletype = "fasta";
+ #first pull apart the velveth string and get the short and shortpaired filenames...
+ my @l = split /\s+/, $f;
+ my $i = 0;
+ foreach (@l){
+ if(/^-/){
+ if(/^-(eland|gerald|fasta\.gz|fastq\.gz)/) {
+ croak "Cannot estimate memory usage from file types other than fasta or fastq..\n";
+ }
+ elsif(/^-fasta/){
+ $currentfiletype = "fasta";
+ }
+ elsif(/^-fastq/){
+ $currentfiletype = "fastq";
+ }
+ }
+ elsif(-r $_){
+ my $file = $_;
+ if($currentfiletype eq "fasta"){
+ my $x = `grep -c "^>" $file`;
+ chomp($x);
+ $num += $x;
+ my $l = &getReadLength($file, 'Fasta');
+ $reads{$l} += $x;
+ print STDERR "File: $file has $x reads of length $l\n";
+ }
+ else {
+ my $x = `grep -c "^@" $file`;
+ chomp($x);
+ $num += $x;
+ my $l = &getReadLength($file, 'Fastq');
+ $reads{$l} += $x;
+ print STDERR "File: $file has $x reads of length $l\n";
+ }
+ }
+ $i ++;
+ }
+ my $totlength = 0;
+ foreach my $k (keys %reads){
+ $totlength += ($reads{$k} * $k);
+ }
+
+
+ my @results;
+ push @results, floor($totlength/$num);
+ push @results, ($num/1000000);
+ printf STDERR "Total reads: %.1f million. Avg length: %.1f\n",($num/1000000), ($totlength/$num);
+ return @results;
+}
+
+# getReadLength - returns the length of the first read in a file of type fasta or fastq..
+#
+sub getReadLength {
+ my ($f, $t) = @_;
+ my $sio = Bio::SeqIO->new(-file => $f, -format => $t);
+ my $s = $sio->next_seq() or croak "Something went bad while reading file $f!\n";
+ return $s->length;
+}
+
+return 1;
+
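The "weighted mode" that estExpCov in Utils.pm computes can be sketched without reading a stats.txt file. In the minimal example below, weighted_cov_mode is a hypothetical helper and the contig values are made up. It accumulates weights in a hash rather than expanding an array as the module does, and it breaks ties towards the lower coverage, which is an assumption rather than the module's behaviour.

```perl
use strict;
use warnings;

# Hypothetical helper: length-weighted mode of integer coverage.
# Each element of @$contigs is [ length, summed short-read coverage ].
# The thresholds mirror estExpCov: length >= 3k - 1 and coverage > 2.
sub weighted_cov_mode {
    my ($contigs, $kmer) = @_;
    my $minlen = 3 * $kmer - 1;
    my $mincov = 2;

    my %weight_of;
    for my $c (@$contigs) {
        my ($len, $cov) = @$c;
        next unless $len >= $minlen;
        next unless $cov > $mincov;
        $weight_of{ int($cov) } += $len;    # each contig votes with its length
    }

    my ($mode, $best) = (0, 0);
    # Sorted so that ties resolve to the lower coverage (an assumption).
    for my $cov (sort { $a <=> $b } keys %weight_of) {
        ($mode, $best) = ($cov, $weight_of{$cov}) if $weight_of{$cov} > $best;
    }
    return $mode;
}

# Made-up contigs: the last two are filtered out (too short, coverage too low).
my @contigs = ([100, 10.7], [150, 20.2], [120, 10.1], [40, 35.0], [500, 1.5]);
print weighted_cov_mode(\@contigs, 21), "\n";    # prints 10
```

With k = 21 the minimum length is 62, so coverage 10 collects a weight of 220 against 150 for coverage 20.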
diff --git a/VelvetOpt/gwrap.pm b/VelvetOpt/gwrap.pm
new file mode 100644
index 0000000..1936a15
--- /dev/null
+++ b/VelvetOpt/gwrap.pm
@@ -0,0 +1,171 @@
+# VelvetOpt::gwrap.pm
+#
+# Copyright 2008 Simon Gladman <simon.gladman at monash.edu>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+#
+package VelvetOpt::gwrap;
+
+=head1 NAME
+
+VelvetOpt::gwrap.pm - Velvet graphing and assembly program wrapper module.
+
+=head1 AUTHOR
+
+Simon Gladman, CSIRO, 2007, 2008.
+
+=head1 LICENSE
+
+Copyright 2008 Simon Gladman <simon.gladman at csiro.au>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ MA 02110-1301, USA.
+
+=head1 SYNOPSIS
+
+ use VelvetOpt::gwrap;
+ use VelvetOpt::Assembly;
+ my $object = VelvetOpt::Assembly->new(
+ timestamph => "23 November 2008 15:00:00",
+ ass_id => "1",
+ versiong => "0.7.19",
+ pstringg => "test",
+ ass_dir => "/home/gla048/Desktop/newVelvetOptimiser/test"
+ );
+ my $worked = VelvetOpt::gwrap::objectVelvetg($object);
+ if($worked){
+ print $object->toString();
+ }
+ else {
+ die "Error in velvetg..\n" . $object->toString();
+ }
+
+=head1 DESCRIPTION
+
+A wrapper module to run velvetg on VelvetAssembly objects or on velvetg
+parameter strings. Also contains private methods to check velvetg
+parameter strings, run velvetg and return results.
+
+=head2 Uses
+
+=over 8
+
+=item strict
+
+=item warnings
+
+=item Carp
+
+=item VelvetOpt::Assembly
+
+=item POSIX qw(strftime)
+
+=back
+
+=head2 Private Fields
+
+=over 8
+
+=item interested
+
+STDERR printing debug message toggle. 1 for on, 0 for off.
+
+=back
+
+=head2 Methods
+
+=over 8
+
+=item _runVelvetg
+
+Private method which runs velvetg with the supplied velvetg parameter string and returns velvetg output messages as a string.
+
+=item _checkVGString
+
+Private method which checks for a correctly formatted velvetg parameter string. Returns 1 or 0.
+
+=item objectVelvetg
+
+Accepts a VelvetAssembly object, looks for the velvetg parameter string it contains, checks it, sends it to _runVelvetg, collects the results and stores them in the VelvetAssembly object.
+
+=item stringVelvetg
+
+Accepts a velvetg parameter string, checks it, sends it to _runVelvetg and then collects and returns the velvetg output messages.
+
+=back
+
+=cut
+
+use warnings;
+use strict;
+use Carp;
+use VelvetOpt::Assembly;
+use POSIX qw(strftime);
+
+my $interested = 0;
+
+sub _runVelvetg {
+ my $cmdline = shift;
+ my $output = "";
+ print STDERR "About to run velvetg!\n" if $interested;
+ $output = `velvetg $cmdline`;
+ $output .= "\nTimestamp: " . strftime("%b %e %Y %H:%M:%S", localtime) . "\n";
+ return $output;
+}
+
+sub _checkVGString {
+ return 1;
+}
+
+sub objectVelvetg {
+ my $va = shift;
+ my $cmdline = $va->{pstringg};
+ if(_checkVGString($cmdline)){
+ $va->{velvetgout} = _runVelvetg($cmdline);
+ my @t = split /\n/, $va->{velvetgout};
+ $t[$#t] =~ s/Timestamp:\s+//;
+ $va->{timestampg} = $t[$#t];
+ return 1;
+ }
+ else {
+ $va->{velvetgout} = "Formatting errors in velvetg parameter string.";
+ return 0;
+ }
+}
+
+sub stringVelvetg {
+ my $cmdline = shift;
+ if(_checkVGString($cmdline)){
+ return _runVelvetg($cmdline);
+ }
+ else {
+ return "Formatting errors in velvetg parameter string.";
+ }
+}
+
+1;
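gwrap.pm above passes the velvetg timestamp around as the last line of the captured output: _runVelvetg appends it and objectVelvetg strips the label off again. A minimal round-trip sketch, assuming two hypothetical helpers (stamp_output and extract_timestamp) and placeholder text instead of real velvetg output:

```perl
use strict;
use warnings;
use POSIX qw(strftime);

# Hypothetical helper: append a timestamp line in the format _runVelvetg uses.
sub stamp_output {
    my ($output, $epoch) = @_;
    return $output
        . "\nTimestamp: "
        . strftime("%b %e %Y %H:%M:%S", localtime($epoch))
        . "\n";
}

# Hypothetical helper: recover the timestamp from the last line of the output.
sub extract_timestamp {
    my @lines = split /\n/, shift;    # split drops the trailing empty field
    (my $stamp = $lines[-1]) =~ s/Timestamp:\s+//;
    return $stamp;
}

# Placeholder text standing in for whatever velvetg would print.
my $stamped = stamp_output("velvetg output would go here", time);
print extract_timestamp($stamped), "\n";
```

This only works while the timestamp stays on the final line, so anything appended to the output afterwards would be read as the timestamp instead.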
diff --git a/VelvetOpt/hwrap.pm b/VelvetOpt/hwrap.pm
new file mode 100644
index 0000000..bd6b570
--- /dev/null
+++ b/VelvetOpt/hwrap.pm
@@ -0,0 +1,371 @@
+# VelvetOpt::hwrap.pm
+#
+# Copyright 2008 Simon Gladman <simon.gladman at monash.edu>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+#
+# Version 1.1 - 14/07/2010 - Added support for changing input file types
+# Version 1.2 - 11/08/2010 - Changed velveth help parser for new velvet help format
+# Thanks to Alexie Papanicolaou - CSIRO for the patch.
+# Version 1.3 - 05/10/2012 - Added support for new velveth options
+
+package VelvetOpt::hwrap;
+
+=head1 NAME
+
+VelvetOpt::hwrap.pm - Velvet hashing program wrapper module.
+
+=head1 AUTHOR
+
+Simon Gladman, CSIRO, 2007, 2008.
+
+=head1 LICENSE
+
+Copyright 2008 Simon Gladman <simon.gladman at csiro.au>
+
+ This program is free software; you can redistribute it and/or modify
+ it under the terms of the GNU General Public License as published by
+ the Free Software Foundation; either version 2 of the License, or
+ (at your option) any later version.
+
+ This program is distributed in the hope that it will be useful,
+ but WITHOUT ANY WARRANTY; without even the implied warranty of
+ MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+ GNU General Public License for more details.
+
+ You should have received a copy of the GNU General Public License
+ along with this program; if not, write to the Free Software
+ Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+ MA 02110-1301, USA.
+
+=head1 SYNOPSIS
+
+ use VelvetOpt::hwrap;
+ use VelvetOpt::Assembly;
+ my $object = VelvetOpt::Assembly->new(
+ timestamph => "23 November 2008 15:00:00",
+ ass_id => "1",
+ versionh => "0.7.04",
+ pstringh => "test 21 -fasta test_reads.fna",
+ ass_dir => "/home/gla048/Desktop/newVelvetOptimiser/data_1"
+ );
+ my $worked = VelvetOpt::hwrap::objectVelveth($object);
+ if($worked){
+ print $object->toString();
+ }
+ else {
+ die "Error in velveth..\n" . $object->toString();
+ }
+
+=head1 DESCRIPTION
+
+A wrapper module to run velveth on VelvetAssembly objects or on velveth
+parameter strings. Also contains private methods to check velveth
+parameter strings, run velveth and return results.
+
+=head2 Uses
+
+=over 8
+
+=item strict
+
+=item warnings
+
+=item Carp
+
+=item VelvetOpt::Assembly
+
+=item POSIX qw(strftime)
+
+=back
+
+=head2 Private Fields
+
+=over 8
+
+=item interested
+
+STDERR printing debug message toggle. 1 for on, 0 for off.
+
+=back
+
+=head2 Methods
+
+=over 8
+
+=item _runVelveth
+
+Private method which runs velveth with the supplied velveth parameter string and returns velveth output messages as a string.
+
+=item _checkVHString
+
+Private method which checks for a correctly formatted velveth string. Returns 1 or 0.
+
+=item objectVelveth
+
+Accepts a VelvetAssembly object and the number of categories velvet was compiled with, looks for the velveth parameter string it contains, checks it, sends it to _runVelveth, collects the results and stores them in the VelvetAssembly object.
+
+=item stringVelveth
+
+Accepts a velveth parameter string and the number of categories velvet was compiled with, checks it, sends it to _runVelveth and then collects and returns the velveth output messages.
+
+=back
+
+=cut
+
+use warnings;
+use strict;
+use Carp;
+use VelvetOpt::Assembly;
+use POSIX qw(strftime);
+
+my $interested = 0;
+
+my $cats;
+my $maxkmerlength;
+my %Fileformats;
+my %Readtypes;
+my %Otheroptions;
+my %Filelayouts;
+
+my $usage;
+my $inited = 0;
+
+sub init {
+ #run a velveth to get its help lines..
+ my $response = &_runVelveth(" ");
+
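+ #get the categories; list assignment leaves $cats undefined if the match fails,
+ #rather than picking up a stale $1 from an earlier match
+ ($cats) = $response =~ m/CATEGORIES = (\d+)/;
+ unless($cats){$cats = 2;}
+
+ #get the maxkmerlength
+ #(undefined if velveth does not report it)
+ ($maxkmerlength) = $response =~ m/MAXKMERLENGTH = (\d+)/;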
+
+ #get the file format options
+ $response =~ m/(File format options:(.*)\(Note:)/s;
+ splitVHOptions($1, \%Fileformats);
+
+ #get the file layouts
+ unless($response =~ m/File layout options for paired reads.*:\n(.*)Read type options:/s){ warn "No match for file layout options\n$!";}
+ splitVHOptions($1, \%Filelayouts);
+
+ #get the read type options
+ $response =~ m/(Read type options:(.*)Options:)/s;
+ splitVHOptions($1, \%Readtypes);
+
+ #get the other options
+ $response =~ m/\nOptions:(.*)Synopsis:/s;
+ splitVHOptions($1, \%Otheroptions);
+
+ #make up the standard usage for velveth parameter strings...
+ $usage = "Incorrect velveth parameter string: Needs to be of the form\n{[-file_layout][-file_format][-read_type] filename} or {-other_option}\n";
+ $usage .= "Where:\nFile layout options (for paired end reads):\n";
+ foreach my $key(sort keys %Filelayouts){
+ $usage .= "\t$key\n";
+ }
+ $usage .= "File format options:\n";
+ foreach my $key (sort keys %Fileformats){
+ $usage .= "\t$key\n";
+ }
+ $usage .= "Read type options:\n";
+ foreach my $key (sort keys %Readtypes){
+ $usage .= "\t$key\n";
+ }
+ $usage .= "Other options:\n";
+ foreach my $key (sort keys %Otheroptions){
+ $usage .= "\t$key\n";
+ }
+ $usage .= "\nThere can be more than one filename specified as long as it's a different type.\nStopping run\n";
+
+ #set inited to 1
+ $inited = 1;
+}
+
+sub splitVHOptions {
+ my $section = shift;
+ my $opts = shift;
+ my @t = split /\n/, $section;
+ foreach(@t){
+ #if(/\s+(-\S+)/){
+ while(/\s+(-\S+)/g){
+ $opts->{$1} = 1;
+ }
+ }
+}
+
+sub _runVelveth {
+ #unless($inited){ &init(); }
+ my $cmdline = shift;
+ my $output = "";
+ print STDERR "About to run velveth!\n" if $interested;
+ $output = `velveth $cmdline`;
+ $output .= "\nTimestamp: " . strftime("%b %e %Y %H:%M:%S", localtime) . "\n";
+ return $output;
+}
+
+sub _checkVHString {
+ unless($inited){ &init(); }
+ print STDERR $usage if $interested;
+ my $line = shift;
+ my $useless = shift;
+
+ print STDERR "\tIN checkVHString: About to test $line\n" if $interested;
+
+ my $ok = 1;
+
+ #first remove all "other" options.
+ foreach(keys %Otheroptions){
+ $line =~ s/\Q$_\E//;
+ }
+
+ #get each m/-options+ filename+/ block
+ my @blocks;
+ $line =~ s/^/ /;
+ while ($line =~ m/(\b(-[\w\d]+\s+)+[\w\/\\\. ]+)/g) {
+ my $text = $1;
+ $text =~ s/\s+$//;
+ push @blocks, $text;
+ }
+
+ #look at each block in turn
+ foreach my $block(@blocks) {
+ my $blockgood = 1;
+ my $numfiles = 0;
+ my $formatused = 0;
+ my $layoutused = 0;
+ my $readused = 0;
+ my $separate = 0;
+ my $paired = 0;
+ my @files_to_check;
+
+ print STDERR "\tIN checkVHString: Block being checked: $block\n" if $interested;
+
+ my @t = split /\s+/, $block;
+
+ #look at each part of the block
+ foreach my $x(@t){
+ #check if it's an option, otherwise it's a filename...
+ unless($x =~ m/^-/){
+ push @files_to_check, $x;
+ $numfiles ++;
+ next;
+ }
+ #make sure it's a valid option.
+ #check file formats first
+ if($Fileformats{$x}){
+ $formatused ++;
+ }
+ elsif($Filelayouts{$x}){
+ $layoutused ++;
+ }
+ elsif($Readtypes{$x}){
+ $readused ++;
+ $paired ++ if $x =~ m/Paired/;
+ }
+ else {
+ $blockgood = 0;
+ if($x =~ m/(\d+)$/){
+ carp "*** Category number $1 in $x is higher than the number velvet was compiled with ($cats)\n";
+ }
+ else {
+ carp "*** Unknown option used: $x in file block: $block\n";
+ }
+
+ }
+ if($x eq "-separate"){
+ $separate = 1;
+ }
+ }
+
+ #make sure only 1 filetype, format and readtype is used in each block
+ if($formatused > 1){
+ carp "*** Too many file formats used in block: $block\n";
+ $blockgood = 0;
+ }
+ if($layoutused > 1){
+ carp "*** Too many file layouts used in block: $block\n";
+ $blockgood = 0;
+ }
+ if($readused > 1){
+ carp "*** Too many read type specifications used in block: $block\n";
+ $blockgood = 0;
+ }
+
+ #check appropriate number of files if separate..
+ if($separate && $numfiles != 2){
+ carp "*** $numfiles files specified for -separate option in block: $block. Require exactly 2.\n";
+ $blockgood = 0;
+ }
+
+ #check if paired read type option was chosen...
+ if($separate && !$paired){
+ carp "*** -separate chosen without valid Paired read type specified in block: $block. Need to specify either -shortPaired or -longPaired\n";
+ $blockgood = 0;
+ }
+
+ #make sure files are readable..
+ foreach my $file(@files_to_check){
+ unless(-r $file){
+ $blockgood = 0;
+ carp "*** File $file doesn't exist or is unreadable.\n";
+ }
+ }
+ unless($blockgood){ print STDERR "Block $block FAILED!\n"}
+
+ #if any block is no good then the whole thing is no good...
+ $ok = 0 unless $blockgood;
+ }
+
+ return $ok;
+
+
+
+}
+
+sub objectVelveth {
+ unless($inited){ &init(); }
+ my $va = shift;
+ my $cats = shift;
+ my $cmdline = $va->{pstringh};
+ if(_checkVHString($cmdline, $cats)){
+ $va->{velvethout} = _runVelveth($cmdline);
+ my @t = split /\n/, $va->{velvethout};
+ $t[$#t] =~ s/Timestamp:\s+//;
+ $va->{timestamph} = $t[$#t];
+ return 1;
+ }
+ else {
+ $va->{velvethout} = "Formatting errors in velveth parameter string.$usage";
+ return 0;
+ }
+}
+
+sub stringVelveth {
+ unless($inited){ &init(); }
+ my $cmdline = shift;
+ my $cats = shift;
+ if(_checkVHString($cmdline,$cats)){
+ return _runVelveth($cmdline);
+ }
+ else {
+ return "Formatting errors in velveth parameter string.$usage";
+ }
+}
+
+1;
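Note on init and splitVHOptions in the hwrap.pm diff above: the valid option sets are not hard-coded but scraped from velveth's own help output, one section at a time. The sketch below illustrates that approach on its own. It is not part of this commit: the help text is an invented, abridged stand-in that only imitates the section layout (real velveth output differs), and collect_options is a hypothetical name for a function that does what splitVHOptions does but returns a new hash.

```perl
use strict;
use warnings;

# Invented stand-in for the help text; real velveth output differs.
my $help = <<'END';
File format options:
    -fasta  -fastq
    -sam
Read type options:
    -short  -shortPaired
    -long   -longPaired
END

# Same idea as splitVHOptions: every whitespace-preceded "-word" becomes a key.
sub collect_options {
    my ($section) = @_;
    my %opts;
    foreach my $row (split /\n/, $section) {
        while ($row =~ /\s+(-\S+)/g) {
            $opts{$1} = 1;
        }
    }
    return \%opts;
}

# List assignment yields undef (not a stale $1) when a section is missing.
my ($format_text) = $help =~ /File format options:(.*)Read type options:/s;
my ($read_text)   = $help =~ /Read type options:(.*)/s;

my $formats   = collect_options($format_text);
my $readtypes = collect_options($read_text);

print join(" ", sort keys %$formats), "\n";
print join(" ", sort keys %$readtypes), "\n";
```

Scraping the help text keeps the checker in step with whatever velveth binary is on the PATH, at the cost of depending on the exact section headings that binary prints.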
diff --git a/VelvetOptimiser.pl b/VelvetOptimiser.pl
new file mode 100755
index 0000000..a76c677
--- /dev/null
+++ b/VelvetOptimiser.pl
@@ -0,0 +1,923 @@
+#!/usr/bin/perl
+#
+# VelvetOptimiser.pl
+#
+# Copyright 2008, 2009, 2010 Simon Gladman <simon.gladman at monash.edu>
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program; if not, write to the Free Software
+# Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston,
+# MA 02110-1301, USA.
+
+my $OptVersion = "2.2.5";
+
+#
+# pragmas
+#
+use strict;
+use warnings;
+#
+# includes
+#
+use POSIX qw(strftime);
+use FindBin;
+use lib "$FindBin::Bin";
+use threads;
+use threads::shared;
+use VelvetOpt::Assembly;
+use VelvetOpt::hwrap;
+use VelvetOpt::gwrap;
+use VelvetOpt::Utils;
+use Data::Dumper;
+use Storable qw (freeze thaw);
+use Getopt::Long;
+
+
+#
+# global var decs
+#
+
+#$maxhash is read from the MAXKMERLENGTH that velveth reports at startup
+#(see below); it falls back to 31 if velveth does not report a value.
+my $maxhash;
+my @hashvals;
+my %assemblies : shared;
+my %assembliesObjs;
+my @Options;
+my $readfile;
+my $interested = 0;
+my $verbose : shared;
+my $hashs;
+my $hashe;
+my $hashstep;
+my $amos;
+my $vgoptions;
+my $genomesize;
+my @shortInserts;
+my $logfile = "logfile.txt";
+my $ass_num = 1;
+my $categories;
+my $prefix;
+my $OUT;
+my $logSem : shared;
+our $num_threads;
+my $current_threads : shared = 0;
+my $opt_func;
+my $opt_func2;
+my $minCovCutoff;
+my $upperCovCutoff;
+my $threadfailed : shared = 0;
+my $finaldir;
+my $printVersion = 0;
+
+#
+#
+# main script
+#
+#
+print STDERR "
+****************************************************
+ VelvetOptimiser.pl Version $OptVersion
+****************************************************\n";
+
+my $currfreemem = VelvetOpt::Utils::free_mem;
+
+print STDERR "Number of CPUs available: " . VelvetOpt::Utils::num_cpu . "\n";
+printf STDERR "Current free RAM: %.3fGB\n", $currfreemem;
+
+#get the velveth and velvetg version numbers...
+my $response = VelvetOpt::hwrap::_runVelveth(" ");
+$response =~ /Version\s+(\d+\.\d+\.\d+)/s;
+my $vhversion = $1;
+unless ($vhversion){ die "Unable to find velveth, please ensure that the velvet executables are in your PATH.\n";}
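+#list assignment avoids reusing the version string left in $1 if a match fails
+($categories) = $response =~ /CATEGORIES = (\d+)/;
+unless($categories){ $categories = 2; }
+
+#fall back to a maximum k-mer length of 31 if velveth does not report one
+($maxhash) = $response =~ /MAXKMERLENGTH = (\d+)/;
+unless($maxhash){ $maxhash = 31; }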
+
+#check the number of threads that velvet was compiled with (OMP_NUM_THREADS) if it is the OMP version
+#then warn the user that -t will multiply that value and VO will use more CPUs than they think..
+my $thread_per_job = $ENV{OMP_NUM_THREADS} || 1;
+
+print STDERR "Velvet OMP compiler setting: $thread_per_job\n";
+
+#check Bioperl installed!
+BEGIN {
+ eval{ require Bio::SeqIO; 1;};
+ die "You do not have BioPerl installed. See http://www.bioperl.org/wiki/Installing_BioPerl for help." if $@;
+}
+
+#get the options!
+&setOptions();
+
+if($prefix eq "auto"){
+ $logfile = strftime("%d-%m-%Y-%H-%M-%S", localtime) . "_Logfile.txt";
+} else {
+ $logfile = $prefix . "_logfile.txt";
+}
+
+print "Logfile name: $logfile\n";
+
+#open the logfile
+open $OUT, ">", $logfile or die "Couldn't open $logfile for writing.\n$!\n";
+
+#
+#
+# Perform common tasks - write details to log file and screen, run velveth and vanilla velvetg
+#
+#
+
+print STDERR "\nMemory use estimation only! Script will terminate after showing results.\n\n" if($genomesize);
+
+print STDERR "Velvet details:\n";
+print STDERR "\tVelvet version: $vhversion\n";
+print STDERR "\tCompiled categories: $categories\n" if $categories;
+print STDERR "\tCompiled max kmer length: $maxhash\n" if $maxhash;
+print STDERR "\tMaximum number of velvet instances to run: $num_threads\n";
+
+#let user know about parameters to run with.
+print STDERR "Will run velvet optimiser with the following parameters:\n";
+print STDERR "\tVelveth parameter string:\n\t\t$readfile\n";
+print STDERR "\tVelveth start hash value:\t$hashs\n";
+print STDERR "\tVelveth end hash value:\t\t$hashe\n";
+print STDERR "\tVelveth hash step value:\t$hashstep\n";
+print STDERR "\tVelvetg minimum coverage cutoff to use:\t$minCovCutoff\n\n";
+if($vgoptions){
+ print STDERR "\tUser specified velvetg options: $vgoptions\n";
+}
+if($amos){
+ print STDERR "\tRead tracking for final assembly on.\n";
+} else {
+ print STDERR "\tRead tracking for final assembly off.\n";
+}
+
+#build the hashval array - steps too...
+for(my $i = $hashs; $i <= $hashe; $i += $hashstep){
+ #print STDERR "i is $i\n";
+ push @hashvals, $i;
+}
+
+#check for $hashe in array..
+my $max = $hashvals[$#hashvals];
+if($max < $hashe){
+ push @hashvals, $hashe;
+}
+
+if($genomesize){
+ my $x = &estMemUse();
+ printf STDERR "\nMemory use estimated to be: %.1fGB for $num_threads threads.\n\n", $x;
+ if ($x < $currfreemem){
+ print STDERR "You should have enough memory to complete this job. (Though this estimate is no guarantee..)\n";
+ exit;
+ }
+ else {
+ print STDERR "You probably won't have enough memory to run this job.\nTry decreasing the maximum number of threads used.\n(use the -t option to set max threads.)\n";
+ exit;
+ }
+}
+
+
+print $OUT strftime("%b %e %H:%M:%S", localtime), "\n";
+
+#send run parameters to log file.
+print $OUT "Will run velvet optimiser with the following parameters:\n";
+print $OUT "\tVelveth parameter string:\n\t\t$readfile\n";
+print $OUT "\tVelveth start hash value:\t$hashs\n";
+print $OUT "\tVelveth end hash value:\t\t$hashe\n";
+print $OUT "\tVelveth hash step value:\t$hashstep\n";
+print $OUT "\tVelvetg minimum coverage cutoff to use:\t$minCovCutoff\n\n";
+if($vgoptions){
+ print $OUT "\tUser specified velvetg options: $vgoptions\n";
+}
+if($amos){
+ print $OUT "\tRead tracking for final assembly on.\n";
+} else {
+ print $OUT "\tRead tracking for final assembly off.\n";
+}
+
+print STDERR strftime("%b %e %H:%M:%S", localtime), " Beginning velveth runs.\n";
+print $OUT strftime("%b %e %H:%M:%S", localtime), "\n\n\tBeginning velveth runs.\n";
+
+#now run velveth for all the hashvalues in a certain number of threads..
+my @threads;
+foreach my $hashval (@hashvals){
+ #print STDERR "Hashval = $hashval Numthreads = $num_threads currentThreads = $current_threads\n";
+ while($current_threads >= $num_threads){
+ sleep(2);
+ }
+ if($threadfailed){
+ for my $thr (threads->list) {
+ print STDERR "Waiting for thread ",$thr->tid," to complete.\n";
+ $thr->join;
+ }
+ die "Velveth failed to run! Must be a problem with file types, check by running velveth manually or by using -v option and reading the log file.\n";
+ }
+ $threads[$ass_num] = threads->create(\&runVelveth, $readfile, $hashval, $vhversion, \$logSem, $ass_num);
+ $ass_num ++;
+ sleep(2);
+}
+
+for my $thr (threads->list) {
+ #print STDERR "Waiting for thread ",$thr->tid," to complete.\n";
+ $thr->join;
+}
+
+#now run velvetg for all the hash values in a certain number of threads..
+#first get velvetg's version number.
+
+$response = VelvetOpt::gwrap::_runVelvetg(" ");
+$response =~ /Version\s+(\d+\.\d+\.\d+)/s;
+my $vgversion = $1;
+
+print STDERR strftime("%b %e %H:%M:%S", localtime), " Finished velveth runs.\n";
+
+print STDERR strftime("%b %e %H:%M:%S", localtime), " Beginning vanilla velvetg runs.\n";
+print $OUT strftime("%b %e %H:%M:%S", localtime), "\n\n\tBeginning vanilla velvetg runs.\n";
+
+foreach my $key (sort { $a <=> $b } keys %assemblies){
+ while($current_threads >= $num_threads){
+ sleep(2);
+ }
+ $threads[$ass_num] = threads->create(\&runVelvetg, $vgversion, \$logSem, $key);
+ sleep(2);
+}
+
+for my $thr (threads->list) {
+ #print STDERR "Waiting for thread ",$thr->tid," to complete.\n";
+ $thr->join;
+}
+
+
+#now to thaw it all out..
+
+foreach my $key(sort keys %assemblies){
+ my $obj = bless thaw($assemblies{$key}), "VelvetOpt::Assembly";
+ $assembliesObjs{$key} = $obj;
+}
+
+
+#find the best assembly...
+
+#
+#
+# Now perform a velvetg optimisation based upon the file types sent to velveth
+#
+#
+
+#
+# get the best assembly so far...
+#
+
+my $bestId;
+my $maxScore = -100;
+my $asmscorenotneg = 1;
+
+foreach my $key (keys %assembliesObjs){
+ if(($assembliesObjs{$key}->{assmscore} != -1) && $asmscorenotneg){
+ if($assembliesObjs{$key}->{assmscore} > $maxScore){
+ $bestId = $key;
+ $maxScore = $assembliesObjs{$key}->{assmscore};
+ }
+ }
+ elsif($assembliesObjs{$key}->{n50} && $asmscorenotneg){
+ if($assembliesObjs{$key}->{n50} > $maxScore){
+ $bestId = $key;
+ $maxScore = $assembliesObjs{$key}->{n50};
+ }
+ }
+ else {
+ $asmscorenotneg = 0;
+ if($assembliesObjs{$key}->{totalbp} > $maxScore){
+ $bestId = $key;
+ $maxScore = $assembliesObjs{$key}->{totalbp};
+ }
+ }
+}
+print "\n\nThe best assembly so far is:\n" if $interested;
+print $assembliesObjs{$bestId}->toStringNoV() if $interested;
+
+# determine the optimisation route for the assembly based on the velveth parameter string.
+my $optRoute = &getOptRoutine($readfile);
+
+print STDERR strftime("%b %e %H:%M:%S", localtime), " Hash value of best assembly by assembly score: ". $assembliesObjs{$bestId}->{hashval} . "\n";
+
+print $OUT strftime("%b %e %H:%M:%S", localtime), " Best assembly by assembly score - assembly id: $bestId\n";
+
+print STDERR strftime("%b %e %H:%M:%S", localtime), " Optimisation routine chosen for best assembly: $optRoute\n";
+print $OUT strftime("%b %e %H:%M:%S", localtime), " Optimisation routine chosen for best assembly: $optRoute\n";
+
+#now send the best assembly so far to the appropriate optimisation routine...
+
+if($optRoute eq "shortOpt"){
+
+ &expCov($assembliesObjs{$bestId});
+ &covCutoff($assembliesObjs{$bestId});
+
+}
+elsif($optRoute eq "shortLong"){
+
+ &expCov($assembliesObjs{$bestId});
+ &covCutoff($assembliesObjs{$bestId});
+
+}
+elsif($optRoute eq "longPaired"){
+ &expCov($assembliesObjs{$bestId});
+ &insLengthLong($assembliesObjs{$bestId});
+ &covCutoff($assembliesObjs{$bestId});
+}
+elsif($optRoute eq "shortPaired"){
+ &expCov($assembliesObjs{$bestId});
+ &insLengthShort($assembliesObjs{$bestId});
+ &covCutoff($assembliesObjs{$bestId});
+}
+elsif($optRoute eq "shortLongPaired"){
+ &expCov($assembliesObjs{$bestId});
+ &insLengthShort($assembliesObjs{$bestId});
+ &insLengthLong($assembliesObjs{$bestId});
+ &covCutoff($assembliesObjs{$bestId});
+}
+else{
+ print STDERR "There was an error choosing an optimisation routine for this assembly. Please change the velveth parameter string and try again.\n";
+ print $OUT "There was an error choosing an optimisation routine for this assembly. Please change the velveth parameter string and try again.\n";
+}
+
+# once it comes back from the optimisation routines, we need to turn on read tracking and amos output if it was selected in the options.
+#
+#
+# The final assembly run!
+#
+#
+if($amos){
+ $assembliesObjs{$bestId}->{pstringg} .= " -amos_file yes -read_trkg yes";
+
+ my $final = VelvetOpt::gwrap::objectVelvetg($assembliesObjs{$bestId});
+ $assembliesObjs{$bestId}->getAssemblyDetails();
+}
+
+print STDERR strftime("%b %e %H:%M:%S", localtime), "\n\n\nFinal optimised assembly details:\n";
+print $OUT strftime("%b %e %H:%M:%S", localtime), "\n\n\nFinal optimised assembly details:\n";
+print STDERR $assembliesObjs{$bestId}->toStringNoV() if !$verbose;
+print $OUT $assembliesObjs{$bestId}->toStringNoV() if !$verbose;
+print STDERR $assembliesObjs{$bestId}->toString() if $verbose;
+print $OUT $assembliesObjs{$bestId}->toString() if $verbose;
+if($finaldir eq "."){
+ print STDERR "\n\nAssembly output files are in the following directory:\n" . $assembliesObjs{$bestId}->{ass_dir} . "\n\n";
+ print $OUT "\n\nAssembly output files are in the following directory:\n" . $assembliesObjs{$bestId}->{ass_dir} . "\n";
+}
+else {
+ print STDERR "\n\nAssembly output files are in the following directory:\n" . $finaldir . "\n\n";
+ print $OUT "\n\nAssembly output files are in the following directory:\n" . $finaldir . "\n";
+}
+
+#delete superfluous directories..
+foreach my $key(keys %assemblies){
+ unless($key == $bestId){
+ my $dir = $assembliesObjs{$key}->{ass_dir};
+ system('rm', '-r', '--preserve-root', $dir);
+ }
+}
+unless ($finaldir eq "."){
+ rename $assembliesObjs{$bestId}->{ass_dir}, $finaldir;
+ rename $logfile, "$finaldir/$logfile";
+}
+
+#
+#
+# subroutines...
+#
+#
+#----------------------------------------------------------------------
+
+# Option setting routines
+
+sub setOptions {
+ use Getopt::Long;
+ my $num_cpus = VelvetOpt::Utils::num_cpu;
+ my $thmax = int($num_cpus/$thread_per_job) || 1;
+
+
+ @Options = (
+ {OPT=>"help", VAR=>\&usage, DESC=>"This help"},
+ {OPT=>"version!", VAR=>\$printVersion, DEFAULT=>0, DESC=>"Print version to stdout and exit."},
+ {OPT=>"v|verbose+", VAR=>\$verbose, DEFAULT=>0, DESC=>"Verbose logging, includes all velvet output in the logfile."},
+ {OPT=>"s|hashs=i", VAR=>\$hashs, DEFAULT=>19, DESC=>"The starting (lower) hash value"},
+ {OPT=>"e|hashe=i", VAR=>\$hashe, DEFAULT=>$maxhash, DESC=>"The end (higher) hash value"},
+ {OPT=>"x|step=i", VAR=>\$hashstep, DEFAULT=>2, DESC=>"The step in hash search.. min 2, no odd numbers"},
+ {OPT=>"f|velvethfiles=s", VAR=>\$readfile, DEFAULT=>0, DESC=>"The file section of the velveth command line."},
+ {OPT=>"a|amosfile!", VAR=>\$amos, DEFAULT=>0, DESC=>"Turn on velvet's read tracking and amos file output."},
+ {OPT=>"o|velvetgoptions=s", VAR=>\$vgoptions, DEFAULT=>'', DESC=>"Extra velvetg options to pass through. eg. -long_mult_cutoff -max_coverage etc"},
+ {OPT=>"t|threads=i", VAR=>\$num_threads, DEFAULT=>$thmax, DESC=>"The maximum number of simultaneous velvet instances to run."},
+ {OPT=>"g|genomesize=f", VAR=>\$genomesize, DEFAULT=>0, DESC=>"The approximate size of the genome to be assembled in megabases.\n\t\t\tOnly used in memory use estimation. If not specified, memory use estimation\n\t\t\twill not occur. If memory use is estimated, the results are shown and then program exits."},
+ {OPT=>"k|optFuncKmer=s", VAR=>\$opt_func, DEFAULT=>'n50', DESC=>"The optimisation function used for k-mer choice."},
+ {OPT=>"c|optFuncCov=s", VAR=>\$opt_func2, DEFAULT=>'Lbp', DESC=>"The optimisation function used for cov_cutoff optimisation."},
+ {OPT=>"m|minCovCutoff=f", VAR=>\$minCovCutoff, DEFAULT=>0, DESC=>"The minimum cov_cutoff to be used."},
+ {OPT=>"p|prefix=s", VAR=>\$prefix, DEFAULT=>'auto', DESC=>"The prefix for the output filenames, the default is the date and time in the format DD-MM-YYYY-HH-MM-SS_."},
+ {OPT=>"d|dir_final=s", VAR=>\$finaldir, DEFAULT=>'.', DESC=>"The name of the directory to put the final output into."},
+ {OPT=>"z|upperCovCutoff=f", VAR=>\$upperCovCutoff, DEFAULT=>0.8, DESC=>"The maximum coverage cutoff to consider as a multiplier of the expected coverage."},
+ );
+
+ (@ARGV < 1) && (usage());
+
+ &GetOptions(map {$_->{OPT}, $_->{VAR}} @Options) || usage();
+
+ # Now setup default values.
+ foreach (@Options) {
+ if (defined($_->{DEFAULT}) && !defined(${$_->{VAR}})) {
+ ${$_->{VAR}} = $_->{DEFAULT};
+ }
+ }
+
+ if ($printVersion) {
+ print "VelvetOptimiser $OptVersion\n";
+ exit 0;
+ }
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Starting to check input parameters.\n";
+
+ unless($readfile){
+ print STDERR "\tYou must supply the velveth parameter line in quotes. eg -f '-short .....'\n";
+ &usage();
+ }
+
+ if($hashs > $maxhash){
+ print STDERR "\tStart hash value too high. New start hash value is $maxhash.\n";
+ $hashs = $maxhash;
+ }
+ if(!&isOdd($hashs)){
+ $hashs = $hashs - 1;
+ print STDERR "\tStart hash value not odd. Subtracting one. New start hash value = $hashs\n";
+ }
+
+ if(&isOdd($hashstep)){
+ print STDERR "\tOld hash step value $hashstep\n";
+ $hashstep --;
+ print STDERR "\tHash search step value was odd, subtracting one. New hash step value = $hashstep\n";
+ }
+ if($hashstep > ($hashe - $hashs)){
+ $hashstep = $hashe - $hashs;
+ print STDERR "\tHash search step value was higher than start to end range. Setting hash step to range. New hash step value = $hashstep\n";
+ }
+ if($hashstep < 2){
+ $hashstep = 2;
+ print STDERR "\tHash step set below minimum of 2. New hash step value = 2\n";
+ }
+ if($hashe > $maxhash || $hashe < 1){
+ print STDERR "\tEnd hash value not in workable range. New end hash value is $maxhash.\n";
+ $hashe = $maxhash;
+ }
+ if($hashe < $hashs){
+ print STDERR "\tEnd hash value lower than start hash value. New end hash value = $hashs.\n";
+ $hashe = $hashs;
+ }
+ if(!&isOdd($hashe)){
+ $hashe = $hashe - 1;
+ print STDERR "\tEnd hash value not odd. Subtracting one. New end hash value = $hashe\n";
+ }
+
+ if($num_threads > $thmax){
+ print STDERR "\tWARNING: You have set the number of threads to use to a value greater than the number of available CPUs.\n";
+ print STDERR "\tWARNING: This may be because of the velvet compile option for OMP.\n";
+ }
+
+ #check the velveth parameter string..
+ my $vh_ok = VelvetOpt::hwrap::_checkVHString("check 21 $readfile", $categories);
+
+ unless($vh_ok){ die "Please re-start with a corrected velveth parameter string." }
+
+ print STDERR "\tVelveth parameter string OK.\n";
+
+ #test if outdir exists...
+ if(-d $finaldir && $finaldir ne "."){
+ die "Output directory $finaldir already exists, please choose a different name and restart.\n";
+ }
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Finished checking input parameters.\n";
+
+}
+
+sub usage {
+ print "Usage: $0 [options] -f 'velveth input line'\n";
+ foreach (@Options) {
+ printf " --%-13s %s%s.\n",$_->{OPT},$_->{DESC},
+ defined($_->{DEFAULT}) ? " (default '$_->{DEFAULT}')" : "";
+ }
+ print "\nAdvanced!: Changing the optimisation function(s)\n";
+ print VelvetOpt::Assembly::opt_func_toString;
+ exit(1);
+}
+
+#----------------------------------------------------------------------
+
+
+#
+# runVelveth
+#
+
+sub runVelveth{
+
+ {
+ lock($current_threads);
+ $current_threads ++;
+ }
+
+ my $rf = shift;
+ my $hv = shift;
+ my $vv = shift;
+ my $semRef = shift;
+ my $anum = shift;
+ my $assembly;
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), "\t\tRunning velveth with hash value: $hv.\n";
+
+ #make the velveth command line.
+ my $vhline = $prefix . "_data_$hv $hv $rf";
+
+ #make a new VelvetAssembly and store it in the %assemblies hash...
+ $assembly = VelvetOpt::Assembly->new(ass_id => $anum, pstringh => $vhline, versionh =>$vv, assmfunc => $opt_func, assmfunc2 => $opt_func2);
+
+ #run velveth on this assembly object
+ my $vhresponse = VelvetOpt::hwrap::objectVelveth($assembly, $categories);
+
+ unless($vhresponse){ die "Velveth didn't run on hash value of $hv.\n$!\n";}
+
+ unless(-r ($prefix . "_data_$hv" . "/Roadmaps")){
+ print STDERR "Velveth failed! Response:\n$vhresponse\n";
+ {
+ lock ($threadfailed);
+ $threadfailed = 1;
+ }
+ }
+
+ #run the hashdetail generation routine.
+ $vhresponse = $assembly->getHashingDetails();
+
+ #print the objects to the log file...
+ {
+ lock($$semRef);
+ print $OUT $assembly->toStringNoV() if !$verbose;
+ print $OUT $assembly->toString() if $verbose;
+ }
+
+ {
+ lock(%assemblies);
+ my $ass_str = freeze($assembly);
+ $assemblies{$anum} = $ass_str;
+ }
+
+ {
+ lock($current_threads);
+ $current_threads --;
+ }
+ print STDERR strftime("%b %e %H:%M:%S", localtime), "\t\tVelveth with hash value $hv finished.\n";
+}
+
+#
+# runVelvetg
+#
+sub runVelvetg{
+
+ {
+ lock($current_threads);
+ $current_threads ++;
+ }
+
+ my $vv = shift;
+ my $semRef = shift;
+ my $anum = shift;
+ my $assembly;
+
+ #get back the object!
+ $assembly = bless thaw($assemblies{$anum}), "VelvetOpt::Assembly";
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), "\t\tRunning vanilla velvetg on hash value: " . $assembly->{hashval} . "\n";
+
+ #make the velvetg commandline.
+ my $vgline = $prefix . "_data_" . $assembly->{hashval};
+
+ $vgline .= " $vgoptions";
+ $vgline .= " -clean yes";
+
+ #save the velvetg commandline in the assembly.
+ $assembly->{pstringg} = $vgline;
+
+ #save the velvetg version in the assembly.
+ $assembly->{versiong} = $vv;
+
+ #run velvetg
+ my $vgresponse = VelvetOpt::gwrap::objectVelvetg($assembly);
+
+ unless($vgresponse){ die "Velvetg didn't run on the directory $vgline.\n$!\n";}
+
+ #run the assembly details routine..
+ $assembly->getAssemblyDetails();
+
+ #print the objects to the log file...
+ {
+ lock($$semRef);
+ print $OUT $assembly->toStringNoV() if !$verbose;
+ print $OUT $assembly->toString() if $verbose;
+ }
+
+ {
+ lock(%assemblies);
+ my $ass_str = freeze($assembly);
+ $assemblies{$anum} = $ass_str;
+ }
+
+ {
+ lock($current_threads);
+ $current_threads --;
+ }
+ print STDERR strftime("%b %e %H:%M:%S", localtime), "\t\tVelvetg on hash value: " . $assembly->{hashval} . " finished.\n";
+}
+
+#
+# isOdd
+#
+sub isOdd {
+ my $x = shift;
+ if($x % 2 == 1){
+ return 1;
+ }
+ else {
+ return 0;
+ }
+}
+
+
+#
+# getOptRoutine
+#
+sub getOptRoutine {
+
+ my $readfile = shift;
+
+ # Choose the optimisation path depending on the types of read files sent to velveth
+ # For short only: shortOpt routine
+ # For short and long: shortLong routine
+ # For short paired: shortPaired routine
+ # For short and long paired: longPaired routine
+ # For short paired and long: shortPaired routine
+ # For short paired & long paired: shortLongPaired routine
+
+ #look at velveth string ($readfile) and look for keywords from velvet manual...
+ my $long = 0;
+ my $longPaired = 0;
+ my $shortPaired = 0;
+ my $short = 0;
+
+ #standard cases..
+ if($readfile =~ /-short.? /) { $short = 1; }
+ if($readfile =~ /-long /) { $long = 1; }
+ if($readfile =~ /-shortPaired /) { $shortPaired = 1; }
+ if($readfile =~ /-longPaired /) { $longPaired = 1; }
+
+ #weird cases to cover the non-use of the short keyword (since it's the default.)
+ if(!($readfile =~ /(-short.? )|(-long )|(-shortPaired )|(-longPaired )/)) { $short = 1; } #if nothing is specified, assume short.
+ if(!($readfile =~ /-short.? /) && ($readfile =~ /(-long )|(-longPaired )/)) { $short = 1; } #if long or longPaired is specified, also assume short since it is very unlikely to have only long reads.
+
+ if($short && !($long || $longPaired || $shortPaired)){
+ return "shortOpt";
+ }
+ elsif($short && $long && !($longPaired || $shortPaired)){
+ return "shortLong";
+ }
+ elsif($short && $longPaired && !$shortPaired){
+ return "longPaired";
+ }
+ elsif($short && $shortPaired && !$longPaired){
+ return "shortPaired";
+ }
+ elsif($short && $shortPaired && $longPaired){
+ return "shortLongPaired";
+ }
+ elsif($shortPaired && !$short && !$long && !$longPaired){
+ return "shortPaired";
+ }
+ else {
+ return "Unknown";
+ }
+}
+
+#
+# covCutoff - the coverage cutoff optimisation routine.
+#
+sub covCutoff{
+
+ my $ass = shift;
+ #get the assembly score and set the current cutoff score.
+ my $ass_score = $ass->{assmscore};
+ print "In covCutOff and assembly score is: $ass_score..\n" if $interested;
+
+
+
+ sub func {
+ my $ass = shift;
+ my $cutoff = shift;
+ my $ass_score = $ass->{assmscore};
+ my $ps = $ass->{pstringg};
+ if($ps =~ /cov_cutoff/){
+ $ps =~ s/cov_cutoff\s+\d+(\.\d+)?/cov_cutoff $cutoff/;
+ }
+ else {
+ $ps .= " -cov_cutoff $cutoff";
+ }
+ $ass->{pstringg} = $ps;
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime);
+ printf STDERR "\t\tSetting cov_cutoff to %.3f.\n", $cutoff;
+ print $OUT strftime("%b %e %H:%M:%S", localtime);
+ printf $OUT "\t\tSetting cov_cutoff to %.3f.\n", $cutoff;
+
+ my $worked = VelvetOpt::gwrap::objectVelvetg($ass);
+ if($worked){
+ $ass->getAssemblyDetails();
+ }
+ else {
+ die "Velvet Error in covCutoff!\n";
+ }
+ $ass_score = $ass->{assmscore};
+ print $OUT $ass->toStringNoV();
+
+ return $ass_score;
+
+ }
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Beginning coverage cutoff optimisation\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Beginning coverage cutoff optimisation\n";
+
+ my $dir = $ass->{ass_dir};
+ $dir .= "/stats.txt";
+ #print "\tLooking for exp_cov in $dir\n";
+ my $expCov = VelvetOpt::Utils::estExpCov($dir, $ass->{hashval});
+ if ($minCovCutoff >= $expCov * 0.8) {
+ my $maxc = $expCov * 0.8;
+ print $OUT "Minimum specified coverage cutoff is higher than the expected coverage. Please choose a minimum coverage cutoff smaller than $maxc and re-run.\n";
+ die "Minimum specified coverage cutoff is higher than the expected coverage. Please choose a minimum coverage cutoff smaller than $maxc and re-run.\n";
+ }
+ if ($minCovCutoff >= $expCov * 0.5) {
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Warning: Minimum coverage cutoff is set to be higher than 50% of the expected coverage. Please consider lowering minCovCutoff.\n";
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Warning: Minimum coverage cutoff is set to be higher than 50% of the expected coverage. Please consider lowering minCovCutoff.\n";
+ }
+
+ my $a = $minCovCutoff;
+ my $b = $upperCovCutoff * $expCov;
+ my $t = 0.618;
+ my $c = $a + $t * ($b - $a);
+ my $d = $b + $t * ($a - $b);
+ my $fc = func($ass, $c);
+ my $fd = func($ass, $d);
+
+ my $iters = 1;
+
+ printf STDERR "\t\tLooking for best cutoff score between %.3f and %.3f\n", $a, $b;
+ printf $OUT "\t\tLooking for best cutoff score between %.3f and %.3f\n", $a, $b;
+
+ while(abs($a -$b) > 1){
+ if($fc > $fd){
+ printf STDERR "\t\tMax cutoff lies between %.3f & %.3f\n", $d, $b;
+ my $absdiff = abs($fc - $fd);
+ print STDERR "\t\tfc = $fc\tfd = $fd\tabs diff = $absdiff\n";
+ printf $OUT "\t\tMax cutoff lies between %.3f & %.3f\n", $d, $b;
+ $a = $d;
+ $d = $c;
+ $fd = $fc;
+ $c = $a + $t * ($b - $a);
+ $fc = func($ass, $c);
+ }
+ else {
+ printf STDERR "\t\tMax cutoff lies between %.3f & %.3f\n", $a, $c;
+ my $absdiff = abs($fc - $fd);
+ print STDERR "\t\tfc = $fc\tfd = $fd\tabs diff = $absdiff\n";
+ printf $OUT "\t\tMax cutoff lies between %.3f & %.3f\n", $a, $c;
+ $b = $c;
+ $c = $d;
+ $fc = $fd;
+ $d = $b + $t * ($a - $b);
+ $fd = func($ass, $d);
+ }
+ $iters ++;
+ }
+
+ printf STDERR "\t\tOptimum value of cutoff is %.2f\n", $b;
+ print STDERR "\t\tTook $iters iterations\n";
+ printf $OUT "\t\tOptimum value of cutoff is %.2f\n", $b;
+ print $OUT "\t\tTook $iters iterations\n";
+
+ return 1;
+
+}
+
+#
+# expCov - find the expected coverage for the assembly and run velvetg with that exp_cov.
+#
+sub expCov {
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Looking for the expected coverage\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Looking for the expected coverage\n";
+
+ my $ass = shift;
+
+ #need to get the directory of the assembly and add "stats.txt" to it and then send it to
+ #the histogram methods in VelvetOpt/Utils.pm...
+ my $dir = $ass->{ass_dir};
+ $dir .= "/stats.txt";
+ my $expCov = VelvetOpt::Utils::estExpCov($dir, $ass->{hashval});
+
+ print STDERR strftime("%b %e %H:%M:%S", localtime), "\t\tExpected coverage set to $expCov\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), "\t\tExpected coverage set to $expCov\n";
+
+ #re-write the pstringg with the new velvetg command..
+ my $vg = $ass->{pstringg};
+ if($vg =~ /exp_cov/){
+ $vg =~ s/exp_cov\s+\d+(\.\d+)?/exp_cov $expCov/;
+ }
+ else {
+ $vg .= " -exp_cov $expCov";
+ }
+
+ $ass->{pstringg} = $vg;
+
+ print $OUT $ass->toStringNoV();
+
+}
+
+#
+# insLengthLong - get the Long insert length and use it in the assembly..
+#
+sub insLengthLong {
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Getting the long insert length\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Getting the long insert length\n";
+ my $ass = shift;
+ my $len = "auto";
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Setting assembly long insert length $len\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Setting assembly long insert length $len\n";
+
+ #re-write the pstringg with the new velvetg command..
+ #my $vg = $ass->{pstringg};
+ #if($vg =~ /ins_length_long/){
+ # $vg =~ s/ins_length_long\s+\d+/ins_length_long $len/;
+ #}
+ #else {
+ # $vg .= " -ins_length_long $len";
+ #}
+}
+
+#
+# insLengthShort - get the short insert length and use it in the assembly..
+#
+sub insLengthShort {
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Setting the short insert length\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Setting the short insert length\n";
+ my $ass = shift;
+ my $len = "auto";
+ print STDERR strftime("%b %e %H:%M:%S", localtime), " Setting assembly short insert length(s) to $len\n";
+ print $OUT strftime("%b %e %H:%M:%S", localtime), " Setting assembly short insert length(s) to $len\n";
+
+ #re-write the pstringg with the new velvetg command..
+ #my $vg = $ass->{pstringg};
+ #if($vg =~ /ins_length /){
+ # $vg =~ s/ins_length\s+\d+/ins_length $len/;
+ #}
+ #else {
+ # $vg .= " -ins_length $len";
+ #}
+ #$ass->{pstringg} = $vg;
+}
+
+
+#
+# estMemUse - estimates the total memory usage of the velvet runs that execute concurrently
+#
+sub estMemUse {
+
+ my $max_runs = @hashvals;
+ my $totmem = 0;
+ #get the read lengths and the number of reads...
+ #need the short read filenames...
+ my ($rs, $nr) = VelvetOpt::Utils::getReadSizeNum($readfile);
+ if ($max_runs > $num_threads){
+ for(my $i = 0; $i < $num_threads; $i ++){
+ $totmem += VelvetOpt::Utils::estVelvetMemUse($rs, $genomesize, $nr, $hashvals[$i]);
+ }
+ }
+ else {
+ foreach my $h (@hashvals){
+ $totmem += VelvetOpt::Utils::estVelvetMemUse($rs, $genomesize, $nr, $h);
+ }
+ }
+ return $totmem;
+}
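
The cov_cutoff optimisation in covCutoff above narrows its interval with a golden-section search (note the constant 0.618 and the reuse of one interior point per iteration). A minimal standalone sketch of that technique, assuming the score function has a single peak on the interval; golden_max and the toy quadratic score are hypothetical names for illustration and are not part of VelvetOptimiser:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of VelvetOptimiser): golden-section search
# for the maximum of a single-peaked function $f on the interval [$lo, $hi].
sub golden_max {
    my ($f, $lo, $hi, $tol) = @_;
    my $t = (sqrt(5) - 1) / 2;          # ~0.618, the ratio hard-coded in covCutoff
    my $c = $hi - $t * ($hi - $lo);     # lower interior point
    my $d = $lo + $t * ($hi - $lo);     # upper interior point
    my ($fc, $fd) = ($f->($c), $f->($d));

    while (abs($hi - $lo) > $tol) {
        if ($fc > $fd) {
            # The maximum lies in [$lo, $d]; the old $c becomes the new $d.
            ($hi, $d, $fd) = ($d, $c, $fc);
            $c  = $hi - $t * ($hi - $lo);
            $fc = $f->($c);
        }
        else {
            # The maximum lies in [$c, $hi]; the old $d becomes the new $c.
            ($lo, $c, $fc) = ($c, $d, $fd);
            $d  = $lo + $t * ($hi - $lo);
            $fd = $f->($d);
        }
    }
    return ($lo + $hi) / 2;
}

# Toy score with its peak at x = 3 (an assumption for the demo only).
my $best = golden_max(sub { my $x = shift; return -($x - 3) ** 2 }, 0, 10, 1e-4);
printf "%.3f\n", $best;
```

Each iteration evaluates the score only once, because one interior point is carried over. In covCutoff every evaluation is a full velvetg run, so this saving matters.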
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index 04413fa..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,33 +0,0 @@
-velvetoptimiser (2.2.5-5) unstable; urgency=medium
-
- * Homepage vanished - point to Github instead
-
- -- Andreas Tille <tille at debian.org> Tue, 17 Jan 2017 11:18:28 +0100
-
-velvetoptimiser (2.2.5-3) unstable; urgency=medium
-
- * cme fix dpkg-control
- * debhelper 10
- * d/watch: version=4
-
- -- Andreas Tille <tille at debian.org> Sat, 17 Dec 2016 22:50:19 +0100
-
-velvetoptimiser (2.2.5-2) unstable; urgency=medium
-
- * Initial upload to Debian (Closes: #753301)
-
- -- Andreas Tille <tille at debian.org> Mon, 30 Jun 2014 11:54:13 +0200
-
-velvetoptimiser (2.2.5-1biolinux0.1) trusty; urgency=low
-
- * Auto-Rebuild for 14.04
- * Add watch file
-
- -- Tim Booth <tbooth at ceh.ac.uk> Wed, 07 May 2014 18:06:48 +0100
-
-velvetoptimiser (2.2.5-0ubuntu1) precise; urgency=low
-
- * New package
- * Remove use of FindBin
-
- -- Tim Booth <tbooth at ceh.ac.uk> Fri, 23 Aug 2013 11:28:09 +0100
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index f599e28..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-10
diff --git a/debian/control b/debian/control
deleted file mode 100644
index b08d325..0000000
--- a/debian/control
+++ /dev/null
@@ -1,22 +0,0 @@
-Source: velvetoptimiser
-Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Tim Booth <tbooth at ceh.ac.uk>,
- Andreas Tille <tille at debian.org>
-Section: science
-Priority: optional
-Build-Depends: debhelper (>= 10)
-Standards-Version: 3.9.8
-Vcs-Browser: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/velvetoptimiser/trunk/
-Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/velvetoptimiser/trunk/
-Homepage: https://github.com/tseemann/VelvetOptimiser/
-
-Package: velvetoptimiser
-Architecture: all
-Depends: ${shlibs:Depends},
- ${misc:Depends},
- velvet,
- bioperl
-Description: automatically optimise Velvet de novo assembly parameters
- VelvetOptimiser is a multi-threaded Perl script for automatically optimising
- the three primary parameter options (K, -exp_cov, -cov_cutoff) for the Velvet
- de novo sequence assembler.
diff --git a/debian/copyright b/debian/copyright
deleted file mode 100644
index bd10828..0000000
--- a/debian/copyright
+++ /dev/null
@@ -1,32 +0,0 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: VelvetOptimiser
-Upstream-Contact: Simon Gladman <simon.gladman at monash.edu>
-Source: https://github.com/tseemann/VelvetOptimiser/
-
-Files: *
-Copyright: 2008-2012 Simon Gladman <simon.gladman at monash.edu>
-License: GPL-2+
-
-Files: debian/*
-Copyright: 2014 Tim Booth <tbooth at ceh.ac.uk>,
- Andreas Tille <tille at debian.org>
-License: GPL-2+
-
-License: GPL-2+
- VelvetOptimiser is free software; you can redistribute it and/or modify
- it under the terms of the GNU General Public License as published by
- the Free Software Foundation; either version 2 of the License, or
- (at your option) any later version.
- .
- VelvetOptimiser is distributed in the hope that it will be useful,
- but WITHOUT ANY WARRANTY; without even the implied warranty of
- MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
- GNU General Public License for more details.
- .
- You should have received a copy of the GNU General Public License
- along with Velvet; if not, write to the Free Software
- Foundation, Inc., 51 Franklin St, Fifth Floor, Boston, MA 02110-1301 USA
- .
- On Debian systems, the complete text of the GNU General Public
- License version 2 can be found in ‘/usr/share/common-licenses/GPL-2’.
-
diff --git a/debian/docs b/debian/docs
deleted file mode 100644
index e845566..0000000
--- a/debian/docs
+++ /dev/null
@@ -1 +0,0 @@
-README
diff --git a/debian/install b/debian/install
deleted file mode 100644
index 20dd309..0000000
--- a/debian/install
+++ /dev/null
@@ -1,2 +0,0 @@
-VelvetOpt /usr/share/perl5
-velvetoptimiser /usr/bin
diff --git a/debian/manpages b/debian/manpages
deleted file mode 100644
index 0f65186..0000000
--- a/debian/manpages
+++ /dev/null
@@ -1 +0,0 @@
-debian/*.1
diff --git a/debian/patches/no_findbin b/debian/patches/no_findbin
deleted file mode 100644
index 7e024a0..0000000
--- a/debian/patches/no_findbin
+++ /dev/null
@@ -1,13 +0,0 @@
---- a/VelvetOptimiser.pl
-+++ b/VelvetOptimiser.pl
-@@ -30,8 +30,8 @@
- # includes
- #
- use POSIX qw(strftime);
--use FindBin;
--use lib "$FindBin::Bin";
-+#use FindBin;
-+#use lib "$FindBin::Bin";
- use threads;
- use threads::shared;
- use VelvetOpt::Assembly;
diff --git a/debian/patches/series b/debian/patches/series
deleted file mode 100644
index e69de29..0000000
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index 5380aec..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,15 +0,0 @@
-#!/usr/bin/make -f
-# -*- makefile -*-
-
-# Uncomment this to turn on verbose mode.
-#export DH_VERBOSE=1
-
-
-%:
- dh $@
-
-override_dh_auto_build:
- grep -v FindBin VelvetOptimiser.pl > velvetoptimiser
-
-override_dh_auto_clean:
- rm -f velvetoptimiser
diff --git a/debian/source/format b/debian/source/format
deleted file mode 100644
index 163aaf8..0000000
--- a/debian/source/format
+++ /dev/null
@@ -1 +0,0 @@
-3.0 (quilt)
diff --git a/debian/velvetoptimiser.1 b/debian/velvetoptimiser.1
deleted file mode 100644
index 59f5171..0000000
--- a/debian/velvetoptimiser.1
+++ /dev/null
@@ -1,77 +0,0 @@
-.TH VELVETOPTIMISER "1" "June 2014" "velvetoptimiser 2.2.5" "User Commands"
-.SH NAME
-velvetoptimiser \- Automatically optimise Velvet de novo assembly parameters
-.SH SYNOPSIS
-.B velvetoptimiser
-[\fI\,options\/\fR] \fI\,-f 'velveth input line'\/\fR
-.SH DESCRIPTION
-VelvetOptimiser is a multi-threaded Perl script for automatically optimising
-the three primary parameter options (K, \-exp_cov, \-cov_cutoff) for the Velvet
-de novo sequence assembler.
-.SH OPTIONS
-.TP
-\fB\-\-help\fR
-This help.
-.TP
-\fB\-\-version\fR!
-Print version to stdout and exit. (default '0').
-.TP
-\fB\-\-v\fR|verbose+
-Verbose logging, includes all velvet output in the logfile. (default '0').
-.TP
-\fB\-\-s\fR|hashs=i
-The starting (lower) hash value (default '19').
-.TP
-\fB\-\-e\fR|hashe=i
-The end (higher) hash value (default '31').
-.TP
-\fB\-\-x\fR|step=i
-The step in hash search.. min 2, no odd numbers (default '2').
-.HP
-\fB\-\-f\fR|velvethfiles=s The file section of the velveth command line. (default '0').
-.TP
-\fB\-\-a\fR|amosfile!
-Turn on velvet's read tracking and amos file output. (default '0').
-.TP
-\fB\-\-o\fR|velvetgoptions=s Extra velvetg options to pass through.
-eg. \fB\-long_mult_cutoff\fR \fB\-max_coverage\fR etc (default '').
-.TP
-\fB\-\-t\fR|threads=i
-The maximum number of simultaneous velvet instances to run. (default '4').
-.TP
-\fB\-\-g\fR|genomesize=f The approximate size of the genome to be assembled in megabases.
-Only used in memory use estimation. If not specified, memory use estimation
-will not occur. If memory use is estimated, the results are shown and then program exits. (default '0').
-.HP
-\fB\-\-k\fR|optFuncKmer=s The optimisation function used for k\-mer choice. (default 'n50').
-.HP
-\fB\-\-c\fR|optFuncCov=s The optimisation function used for cov_cutoff optimisation. (default 'Lbp').
-.HP
-\fB\-\-m\fR|minCovCutoff=f The minimum cov_cutoff to be used. (default '0').
-.TP
-\fB\-\-p\fR|prefix=s
-The prefix for the output filenames, the default is the date and time in the format DD\-MM\-YYYY\-HH\-MM_. (default 'auto').
-.HP
-\fB\-\-d\fR|dir_final=s The name of the directory to put the final output into. (default '.').
-.HP
-\fB\-\-z\fR|upperCovCutoff=f The maximum coverage cutoff to consider as a multiplier of the expected coverage. (default '0.8').
-.PP
-Advanced!: Changing the optimisation function(s)
-.PP
-Velvet optimiser assembly optimisation function can be built from the following variables.
-.IP
-LNbp = The total number of Ns in large contigs
-Lbp = The total number of base pairs in large contigs
-Lcon = The number of large contigs
-max = The length of the longest contig
-n50 = The n50
-ncon = The total number of contigs
-tbp = The total number of basepairs in contigs
-.SS "Examples are:"
-.IP
-\&'Lbp' = Just the total basepairs in contigs longer than 1kb
-\&'n50*Lcon' = The n50 times the number of long contigs.
-\&'n50*Lcon/tbp+log(Lbp)' = The n50 times the number of long contigs divided
-.IP
-by the total bases in all contigs plus the log of the number of bases
-in long contigs.
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index c33a9ff..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,3 +0,0 @@
-version=4
-
-https://github.com/tseemann/VelvetOptimiser/releases .*/archive/v*(\d[\d.-]+)\.(?:tar(?:\.gz|\.bz2)?|tgz)
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/velvetoptimiser.git