[med-svn] [sprai] 01/07: Imported Upstream version 0.9.9.16+dfsg
Afif Elghraoui
afif at moszumanska.debian.org
Thu Jun 2 06:34:41 UTC 2016
This is an automated email from the git hooks/post-receive script.
afif pushed a commit to branch master
in repository sprai.
commit d0299168b86587798ba94094005a51fc5cef1c09
Author: Afif Elghraoui <afif at ghraoui.name>
Date: Wed Jun 1 21:28:29 2016 -0700
Imported Upstream version 0.9.9.16+dfsg
---
doc/_build/html/_sources/Download.txt | 13 +-
doc/_build/html/_sources/README.txt | 102 +++-
doc/_build/html/_sources/index.txt | 2 +
ec.spec | 22 +-
ezez4makefile_v4.pl | 864 ++++++++++++++++++++++++++++++++++
ezez4qsub_vx1.pl | 16 +-
ezez_vx1.pl | 14 +-
makefile | 4 +-
nss2v_v3.c | 481 ++++++++++---------
wscript | 6 +-
10 files changed, 1247 insertions(+), 277 deletions(-)
diff --git a/doc/_build/html/_sources/Download.txt b/doc/_build/html/_sources/Download.txt
index 3edb882..b658f02 100644
--- a/doc/_build/html/_sources/Download.txt
+++ b/doc/_build/html/_sources/Download.txt
@@ -1,6 +1,11 @@
-========
-Download
-========
+===========================
+Download The Latest Version
+===========================
+http://zombie.cb.k.u-tokyo.ac.jp/sprai/dist/sprai-0.9.9.15.tar.gz
+
+==============
+Older releases
+==============
http://zombie.cb.k.u-tokyo.ac.jp/sprai/dist/sprai-0.9.9.14.tar.gz
http://zombie.cb.k.u-tokyo.ac.jp/sprai/dist/sprai-0.9.9.13.tar.gz
@@ -40,3 +45,5 @@ http://zombie.cb.k.u-tokyo.ac.jp/sprai/dist/sprai-0.9.5.1.6.tar.gz
.. http://zombie.cb.k.u-tokyo.ac.jp/sprai/dist/sprai-0.9.5.1.tar.gz
.. https://bitbucket.org/yun2/sprai_doc/downloads/sprai-0.9.1.3.tar.gz
+
+http://zombie.cb.k.u-tokyo.ac.jp/sprai/dist/sprai-0.2.2.3.tar.bz2
diff --git a/doc/_build/html/_sources/README.txt b/doc/_build/html/_sources/README.txt
index 05e9e18..65b3500 100644
--- a/doc/_build/html/_sources/README.txt
+++ b/doc/_build/html/_sources/README.txt
@@ -162,8 +162,8 @@ After the first assembly, you can calculate the depth distribution of reads to e
Modify other parameters in ec.spec as well, following instructions in the file.
However, the result is not so sensitive to this parameter in our experience.
-Single Node Mode
------------------
+Single Node Mode (1)
+---------------------
Sprai has several execution modes. The first mode we describe is single node mode, with which we can use only a single machine.
If you have more than one server, please see the next section.
@@ -219,15 +219,52 @@ After error correction, if you want to assemble corrected reads using Celera Ass
-ca_path /path/to/your/wgs/Linux-amd64/bin \
-sprai_path the path to get_top_20x_fa.pl installed
+Single Node Mode (make mode)
+-----------------------------
+
+This mode can detect unfinished jobs and restart at the appropriate stage using GNU make.
+
+::
+
+ ezez4makefile_v4.pl ec.spec pbasm.spec
+
+After this command, 'Makefile' will be created.
+Then,
+
+::
+
+ make -j <number_of_processes> -f Makefile > log 2>&1 &
+
+If you want to run error correction only, do as below
+
+::
+
+ ezez4makefile_v4.pl ec.spec
+ make -j <number_of_processes> -f Makefile ec_only > log 2>&1 &
+
+The file result/c_l${libi}.1.fin.v$valid_voters.idfq.gz (e.g., c_l0.1.fin.v30.idfq.gz) is the corrected read file.
+
+After error correction, if you want to assemble the corrected reads, do as below
+
+::
+
+ ezez4makefile_v4.pl ec.spec pbasm.spec
+ make -j <number_of_processes> -f Makefile > log 2>&1 &
+
+Error correction will be skipped and only assembly will start.
+
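+To preview which stages GNU make would (re)execute, for example after a failed run, you can use a standard make dry run (this is generic GNU make behaviour, not a sprai-specific option)
+
+::
+
+ make -n -f Makefile
+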
Multi-node mode 1 (qsub mode)
------------------------------
There are two types of execution modes with Sprai. The first one is qsub mode; a single master process throws child jobs by qsub.
-This mode runs faster and more reliablely than the second mode. However, there is a drawbacks.
-The biggest problem might be that there is no way of restarting the process once a child process fails.
-Anyway, this mode is the most extensively tested, so you should use this mode if your target genome is small enough to be processed with a small number of nodes and thus with little chance of failure.
+
+.. This mode runs faster and more reliably than the second mode. However, there are drawbacks.
+.. The biggest problem might be that there is no way of restarting the process once a child process fails.
+.. Anyway, this mode is the most extensively tested, so you should use this mode if your target genome is small enough to be processed with a small number of nodes and thus with little chance of failure.
+
Currently, Sprai supports Sun Grid Engine (SGE) or equivalents (e.g., N1GE, UGE).
To correct sequencing errors of PacBio Continuous Long Reads and also assemble them, specify *blast_path* and *sprai_path* in ec.spec, and do as follows
+
::
ezez4qsub_vx1.pl ec.spec pbasm.spec > log 2>&1 &
@@ -260,32 +297,55 @@ or
.. ezez4makefile.pl ec.spec asm.spec > ezez4makefile.log 2>&1 && make ec_only &
-Multi-node mode 2 (TGEW mode)
+The file data_yyyymmdd_hhmmss/c01.fin.idfq.gz is the corrected read file.
+
+
+If some child processes fail, do as follows
+
+::
+
+ ezez4qsub_vx1.pl ec.spec pbasm.spec -now yyyymmdd_hhmmss
+
+The yyyymmdd_hhmmss is the suffix of the data directory that ezez4qsub_vx1.pl created.
+This command will detect unfinished jobs and restart at the appropriate stage.
+
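+For example, you can list the data directories to find the suffix and restart with it (the timestamp below is only illustrative)
+
+::
+
+ ls -d data_*
+ ezez4qsub_vx1.pl ec.spec pbasm.spec -now 20160601_213000
+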
+Multi-node mode 2 (make mode)
------------------------------
-The second mode works with `TGEW <https://github.com/mkasa/TGEW>`_, which is a wrapper script of qsub.
-tge_make in the TGEW package interprets Makefile and submits jobs by qsub.
+This mode can detect unfinished jobs and restart at the appropriate stage using GNU make.
+Currently, Sprai supports Sun Grid Engine (SGE) or equivalents (e.g., N1GE, UGE).
+
::
- ezez4makefile_v3.pl ec.spec pbasm.spec > log 2>&1
- tge_make -bg > tge_make.log 2>&1 &
+ ezez4makefile_v4.pl ec.spec pbasm.spec -submit
+
+The command above will create the Makefile, job files, and log directories, and will submit the job files using qsub.
+
+
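+Each job file is a small shell script that wraps a make invocation for a single target, with SGE options embedded as '#$' lines. A representative job file (the target and resource request below are only illustrative) looks like
+
+::
+
+ #!/bin/sh
+ #$ -S /bin/sh
+ #$ -cwd
+ #$ -l s_vmem=4G -l mem_req=4
+ make -f Makefile result/c_l0.idfq.gz
+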
+.. The second mode works with `TGEW <https://github.com/mkasa/TGEW>`_, which is a wrapper script of qsub.
+.. tge_make in the TGEW package interprets Makefile and submits jobs by qsub.
+
+.. ::
+
+.. ezez4makefile_v3.pl ec.spec pbasm.spec > log 2>&1
+.. tge_make -bg > tge_make.log 2>&1 &
-ezez4makefile_v3.pl creates Makefile, and tge_make processes it.
-In the case of failure, you can just reexecute tge_make to restart. As make utility, tge_make compares the timestamps of files to see if any updates are needed.
-You can type the following command to see what would be reexecuted::
+.. ezez4makefile_v3.pl creates Makefile, and tge_make processes it.
+.. In the case of failure, you can just reexecute tge_make to restart. As make utility, tge_make compares the timestamps of files to see if any updates are needed.
+.. You can type the following command to see what would be reexecuted::
- tge_make -n
+.. tge_make -n
-Since this mode submits a significant number of jobs at once to SGE, you may have to limit the number of partitions for not to exceed the job number limit.
-You might add a make target to limit the number of jobs being submitted simultaneously to SGE.
-For example, if you want only error-correction, you can specify ec_only target::
+.. Since this mode submits a significant number of jobs at once to SGE, you may have to limit the number of partitions for not to exceed the job number limit.
+.. You might add a make target to limit the number of jobs being submitted simultaneously to SGE.
+.. For example, if you want only error-correction, you can specify ec_only target::
- tge_make -bg ec_only
+.. tge_make -bg ec_only
-tge_make actually calls GNU make to analyse dependencies between files, so you can give any valid target for GNU make.
+.. tge_make actually calls GNU make to analyse dependencies between files, so you can give any valid target for GNU make.
-Before Sprai ver 0.9.6.1.4, Multi-node mode 2 considers only 'pre_partition'.
-Since Sprai ver 0.9.6.1.4, the number of jobs submitted to SGE became 'partition' * 'pre_partition' in Multi-node mode 2.
+.. Before Sprai ver 0.9.6.1.4, Multi-node mode 2 considers only 'pre_partition'.
+.. Since Sprai ver 0.9.6.1.4, the number of jobs submitted to SGE became 'partition' * 'pre_partition' in Multi-node mode 2.
Postprocessing
===============
diff --git a/doc/_build/html/_sources/index.txt b/doc/_build/html/_sources/index.txt
index b729ca9..317a77a 100644
--- a/doc/_build/html/_sources/index.txt
+++ b/doc/_build/html/_sources/index.txt
@@ -20,6 +20,8 @@ Contents
Changelogs
=============
+2016.5.31: v0.9.9.15: ezez4makefile_v4.pl was added. You can use a Makefile to execute Sprai. (Thanks to Tomoaki Nishiyama for code modifications)
+
2016.4.15: v0.9.9.14: -ec_only mode can be run without a spec file of Celera Assembler. (Thanks to Afif Elghraoui for a report)
2016.4.13: v0.9.9.13: myrealigner.c: dynamic memory allocation for element & col pools. (Thanks to Tomoaki Nishiyama for code modifications)
diff --git a/ec.spec b/ec.spec
index 104bb7e..df49b5d 100644
--- a/ec.spec
+++ b/ec.spec
@@ -24,18 +24,20 @@ partition 12
# blast_path: where blastn and makeblastdb exist in
blast_path /home/imai/bin/
# sprai_path: where binaries of sprai (bfmt72s, nss2v_v3 and so on) exist in
-sprai_path /home/imai/sprai/bin/
+sprai_path /home/imai/test/sprai/bin/
#### many node mode (advanced) ####
-#sge: options for all the SGE jobs
+#sge: options for all the SGE jobs (used by ezez4qsub_vx1.pl)
#sge -soft -l ljob,lmem,sjob
-#queue_req: additional options for all the SGE jobs
+#queue_req: additional options for all the SGE jobs (used by ezez4qsub_vx1.pl and ezez4makefile_v4.pl)
#queue_req -l s_vmem=4G -l mem_req=4
-#longestXx_queue_req: if valid, displaces queue_req
+#longestXx_queue_req: if valid, displaces queue_req (used by ezez4qsub_vx1.pl)
#longestXx_queue_req -l s_vmem=64G -l mem_req=64
-#BLAST_RREQ: additional options for SGE jobs of all vs. all alignment
+#BLAST_RREQ: additional options for SGE jobs of all vs. all alignment (used by ezez4qsub.pl and ezez4makefile_v4.pl)
#BLAST_RREQ -pe def_slot 4
+#ec_rreq: options for error correction (used by ezez4makefile_v4.pl)
+#ec_rreq -l s_vmem=4G -l mem_req=4
#### common (advanced) ####
@@ -50,3 +52,13 @@ max_target_seqs 100
#trim: both ends of each alignment by blastn will be trimmed 'trim' bases to detect chimeric reads
trim 42
+# direct_vote & copy_blastdb are used by ezez4makefile_v4.pl
+# direct_vote: skip writing the blast results to disk before selecting
+# voters (1, the default), or write them to disk so the blast results can be
+# reused for different numbers of voters (0)
+direct_vote 0
+# copy_blastdb: copy the blastdb to $TMP, presumably set by gridengine to a
+# local directory on the node, and use that copy during execution. This can
+# reduce NFS access per job, but may increase the total transfer if each node
+# runs multiple jobs.
+copy_blastdb 1
+
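+# For illustration only, the opposite trade-off (pipe the blast output straight
+# into voting and read the blastdb from shared storage) would be:
+#direct_vote 1
+#copy_blastdb 0
+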
diff --git a/ezez4makefile_v4.pl b/ezez4makefile_v4.pl
new file mode 100755
index 0000000..bc83b9c
--- /dev/null
+++ b/ezez4makefile_v4.pl
@@ -0,0 +1,864 @@
+#!/usr/bin/perl
+use strict;
+use warnings;
+use Getopt::Long;
+use File::Path qw(make_path);
+
+# Environmental variables (deprecated):
+# MAKEDB_RREQ options for qsub in creating databases (blast)
+# BLAST_RREQ options for qsub in searching databases (blast)
+# BLAST_OPT options for blast in searching databases
+# ASSEMBLY_RREQ options for qsub in assembly (celera)
+
+my $valid_depth=4;
+my $valid_read_length=500;
+
+my $now = `date +%Y%m%d_%H%M%S`;
+chomp $now;
+my $submit=0;
+my $jobs=0;
+my $opt_help;
+
+GetOptions(
+ "now=s"=>\$now, # string
+ "jobs" => \$jobs,
+ "submit" => \$submit,
+ "h" => \$opt_help
+);
+
+my %params;
+
+my @msgs = (
+ "USAGE: <this> <ec.spec> <asm.spec> or <this> <ec.spec>",
+ "[-jobs: create job files for a grid engine]",
+ "[-submit: actually submit the job to a grid engine]",
+ "[-now string: a suffix of some directories]",
+ "[-h: show this message]",
+);
+
+if(@ARGV == 0 || @ARGV >= 3 || $opt_help){
+ my $msg = join "\n\t", @msgs;
+ die "$msg\n";
+ #die "USAGE: <this> <ec.spec> <asm.spec>\n\t[-jobs: create job files for a grid engine]\n\t[-submit: actually submit the job to a grid engine]\n\t[-now string: a suffix of some directories]\n";
+}
+
+my $spec = "/dev/null";
+if(defined($ARGV[1])){
+ $spec = $ARGV[1];
+}
+
+{
+ my $ec_spec = $ARGV[0];
+ open my $fh,"<",$ec_spec or die "cannot open $ec_spec :$!\n";
+ while($_ = <$fh>){
+ next if($_ =~ /^\s+$/);
+ next if($_ =~ /^\s*#/);
+ chomp;
+ my @line = split /\s+/,$_;
+ for(my $i=0; $i<@line; ++$i){
+ if($line[$i] =~ /^\s*#/){
+ @line = @line[0..$i-1];
+ last;
+ }
+ }
+ $params{$line[0]}=join(" ",@line[1..@line-1]);
+ }
+ close $fh;
+}
+
+my @input_libs;
+my $nlibs;
+my $pre_partition=1;
+my $partition=12;
+my $word_size="";
+my $evalue=1e-50;
+my $num_threads=1;
+my $max_target_seqs=100;
+my $valid_voters=11;
+my $max_read_length=128000;
+my @voters=();
+my $direct_vote = 1;
+my $copy_blastdb = 0;
+my $trim=42;
+my $estimated_genome_size=0;
+my $estimated_depth = 0;
+my @gsizes=();
+my $ca_path="/opt/wgs-8.1/Linux-amd64/bin/";
+my $blast_path="/usr/local/bin/";
+my $sprai_path="/usr/local/bin/";
+my $param_makedb_rreq = ""; # not so much resource
+my $param_blast_rreq = ""; # desired to have a high CPU slot for latency optimization
+my $param_ec_rreq = ""; # not so much resource
+my $param_blast_opt = "";
+my $param_gettop_rreq = ""; # requires relatively high memory
+my $param_assembly_rreq = "";
+my $filter_same_lib = "| cat";
+my $min_len_for_query=1;
+my $max_len_for_query=1000000000000000;
+
+my $command="";
+
+if(defined($params{input_for_database})){
+ # new multi-lib mode
+ # use complementary set of data for each library to avoid correction of
+ # chimeric data as authentic data
+ my $libs;
+ $libs = $params{input_for_database};
+ @input_libs = split(/\s+/, $libs);
+ $nlibs = @input_libs;
+ printf STDERR "number of libraries: $nlibs\n";
+ printf STDERR "libs: @input_libs\n";
+}else{
+ die "specify input_fastq in ec.spec\n";
+}
+
+if(defined($params{estimated_genome_size})){
+ $estimated_genome_size = $params{estimated_genome_size};
+ @gsizes = split(/\s+/, $estimated_genome_size);
+}
+else{
+ die "specify estimated_genome_size in ec.spec\n";
+}
+if(defined($params{estimated_depth})){
+ $estimated_depth=$params{estimated_depth};
+}
+
+foreach my $eg (@gsizes){
+ if($eg <= 0){die "estimated_genome_size must be > 0\n";}
+}
+
+if(defined($params{pre_partition})){
+ $pre_partition = $params{pre_partition};
+}
+if(defined($params{partition})){
+ $partition = $params{partition};
+}
+if(defined($params{word_size})){
+ $word_size = $params{word_size};
+}
+if(defined($params{evalue})){
+ $evalue = $params{evalue};
+}
+if(defined($params{num_threads})){
+ $num_threads = $params{num_threads};
+}
+if(defined($params{max_target_seqs})){
+ $max_target_seqs = $params{max_target_seqs};
+}
+if(defined($params{valid_voters})){
+ $valid_voters = $params{valid_voters};
+ @voters = split(/\s+/, $valid_voters);
+ undef($valid_voters);
+}
+else{
+ if($estimated_depth == 0) {
+ my $n_base = 0;
+ foreach my $input_fastq(@input_libs){
+ $n_base += -s $input_fastq;
+ }
+ $n_base /= 2;
+ $estimated_depth=$n_base/$estimated_genome_size;
+ }
+ $valid_voters = int(0.8*$estimated_depth);
+ $valid_voters = ($valid_voters < 11) ? 11 : $valid_voters;
+ $valid_voters = ($valid_voters > 30) ? 30 : $valid_voters;
+ @voters = ($valid_voters);
+ undef($valid_voters);
+}
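+# e.g. 20 GB of input fastq is taken as roughly 10 Gbases (the file size is
+# halved above as a crude sequence-only estimate); for a 100 Mbp
+# estimated_genome_size this gives a depth of ~100, and 0.8*100 = 80 is then
+# clamped to the maximum of 30 voters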
+if(defined($params{direct_vote})){
+ $direct_vote = $params{direct_vote};
+}
+if(defined($params{copy_blastdb})){
+ $copy_blastdb = $params{copy_blastdb};
+}
+if(defined($params{filter_same_lib})){
+ if($params{filter_same_lib} =~ /t/i){
+ $filter_same_lib = "| perl -e 'while(<>){\@a=split;next if(\$\$a[0] ne \$\$a[3] && \$\$a[0]=~s/_.*//&& \$\$a[3]=~s/_.*//&& \$\$a[0] eq \$\$a[3]);print;}'"
+ }
+}
+if(defined($params{max_read_length})){
+ $max_read_length = $params{max_read_length};
+}
+if(defined($params{trim})){
+ $trim = $params{trim};
+}
+if(defined($params{ca_path})){
+ $ca_path = $params{ca_path};
+}
+if(defined($params{blast_path})){
+ $blast_path = $params{blast_path};
+}
+if(defined($params{sprai_path})){
+ $sprai_path = $params{sprai_path};
+}
+if(exists $ENV{'MAKEDB_RREQ'}) {
+ $param_makedb_rreq = $ENV{'MAKEDB_RREQ'};
+ print STDERR "WARNING: environmental variable MAKEDB_RREQ is deprecated. Please set queue_req in ec.spec file.\n";
+}
+if(defined($params{queue_req})) {
+ $param_assembly_rreq = $params{queue_req};
+ $param_blast_rreq = $params{queue_req};
+ $param_makedb_rreq = $params{queue_req};
+ $param_ec_rreq = $params{queue_req};
+}
+if(defined($params{BLAST_RREQ})) {
+ $param_blast_rreq = $params{BLAST_RREQ};
+}
+if(exists $ENV{'BLAST_OPT'}) {
+ $param_blast_opt = $ENV{'BLAST_OPT'};
+ print STDERR "WARNING: environmental variable BLAST_OPT is deprecated. Please set BLAST_OPT in ec.spec file.\n";
+}
+if(defined($params{blast_opt})) {
+ $param_blast_opt = $params{blast_opt};
+}
+if($param_blast_opt =~ /num_threads/) {
+ print STDERR "ERROR: you cannot specify -num_threads in blast_opt. Please use num_threads in ec.spec file.\n";
+ exit 2;
+}
+if($param_blast_opt =~ /word_size/) {
+ print STDERR "ERROR: you cannot specify -word_size in blast_opt. Please use word_size in ec.spec file.\n"+
+ exit 2;
+}
+if(defined($params{makedb_rreq})) {
+ $param_makedb_rreq = $params{makedb_rreq};
+}
+if(defined($params{ec_rreq})) {
+ $param_ec_rreq = $params{ec_rreq};
+}
+if(defined($params{gettop_rreq})) {
+ $param_gettop_rreq = $params{gettop_rreq};
+}else{
+ $param_gettop_rreq = $param_makedb_rreq;
+}
+if(exists $ENV{'ASSEMBLY_RREQ'}) {
+ $param_assembly_rreq = $ENV{'ASSEMBLY_RREQ'};
+ print STDERR "WARNING: environmental variable ASSEMBLY_RREQ is deprecated. Please set assembly_rreq in ec.spec file.\n";
+}
+if(defined($params{assembly_rreq})) {
+ $param_assembly_rreq = $params{assembly_rreq};
+}
+if(defined($params{min_len_for_query})){
+ $min_len_for_query = $params{min_len_for_query};
+}
+if(defined($params{max_len_for_query})){
+ $max_len_for_query = $params{max_len_for_query};
+}
+
+printf STDERR ("-- params --\n");
+printf STDERR ("input_fastq %s\n", join(" ", @input_libs));
+printf STDERR ("estimated_genome_size %s\n",join(" ", @gsizes));
+printf STDERR ("pre_partition %s\n",$pre_partition);
+printf STDERR ("partition %s (total %d)\n",$partition, $pre_partition * $partition);
+$partition *= $pre_partition;
+printf STDERR ("word_size %d\n",$word_size);
+printf STDERR ("evalue %g\n",$evalue);
+printf STDERR ("num_threads %d\n",$num_threads);
+if($min_len_for_query){
+ printf STDERR ("min_len_for_query %d\n",$min_len_for_query);
+}
+if($max_len_for_query){
+ printf STDERR ("max_len_for_query %d\n",$max_len_for_query);
+}
+if($max_target_seqs){
+ printf STDERR ("max_target_seqs %d\n",$max_target_seqs);
+}
+printf STDERR ("valid_voters %s\n",join(" ", @voters));
+printf STDERR ("trim %d\n",$trim);
+printf STDERR ("ca_path %s\n",$ca_path);
+printf STDERR ("blast_path %s\n",$blast_path);
+printf STDERR ("sprai_path %s\n",$sprai_path);
+printf STDERR ("BLAST_RREQ %s\n", $param_blast_rreq);
+printf STDERR ("blast_opt %s\n", $param_blast_opt);
+printf STDERR ("makedb_rreq %s\n", $param_makedb_rreq);
+printf STDERR ("gettop_rreq %s\n", $param_gettop_rreq);
+printf STDERR ("assembly_rreq %s\n", $param_assembly_rreq);
+printf STDERR ("ec_rreq %s\n", $param_ec_rreq);
+printf STDERR ("direct_vote %d\n", $direct_vote);
+printf STDERR ("copy_blastdb %d\n", $copy_blastdb);
+printf STDERR ("-- params --\n");
+
+my $log="./ezez4makefile.log";
+
+if(length($blast_path) > 0 && $blast_path !~ /\/$/){
+ $blast_path .= "/";
+}
+if(length($sprai_path) > 0 && $sprai_path !~ /\/$/){
+ $sprai_path .= "/";
+}
+
+
+my $date="";
+my $message="";
+
+my $bindir=$sprai_path;
+my $path2blast=$blast_path;
+
+if(!-d $bindir){
+ die "$bindir does not exist\n"
+}
+if(!-d $path2blast){
+ die "$path2blast does not exist\n"
+}
+
+my $pwd = `pwd`;
+chomp $pwd;
+
+my $outdir = "result";
+if(!-d $outdir){
+ mkdir "$outdir" or die "cannot mkdir $outdir: $!\n";
+}
+
+my @allscripts=();
+my $script;
+my @alloutputfiles=();
+my @allerrfiles=();
+my $errfile;
+
+my $makefile = "$pwd/Makefile";
+open my $mh, ">", $makefile or die "cannot open $makefile: $!\n";
+my $prefix="c01.fin";
+my $rreq = "";
+if(defined $param_assembly_rreq) {
+ $rreq = "SGE_RREQ=\"$param_assembly_rreq\" ";
+}
+my @fintargets = ();
+foreach my $egsize (@gsizes){
+ foreach my $vv (@voters){
+ my $ps = "$prefix.v$vv.20x$egsize";
+ my $target = "$outdir/$ps/9-terminator/$ps.scf.fasta";
+ push(@fintargets, $target);
+ }
+}
+printf $mh "all: %s\n\n", join(" ", @fintargets);
+foreach my $egsize (@gsizes){
+ foreach my $vv (@voters){
+ my $ps = "$prefix.v$vv.20x$egsize";
+ my $target = "$outdir/$ps/9-terminator/$ps.scf.fasta";
+ printf $mh "$outdir/$ps/9-terminator/$ps.scf.fasta: $outdir/$ps.frg\n";
+ printf $mh "\t${rreq}$ca_path/runCA -dir $outdir/`basename \$< .frg` -p `basename \$< .frg` -s $spec \$< \n";
+ }
+}
+
+
+my $libi;
+my @current_dep;
+my $tmp_dfq;
+my $tmp_blastdb = sprintf("tmp_blastdb");
+if(!-d "$outdir/$tmp_blastdb"){
+ mkdir "$outdir/$tmp_blastdb" or die "cannot mkdir $outdir/$tmp_blastdb: $!\n";
+} # this is not per library directory; thus outside the loop;
+my $aggr_db ="$outdir/$tmp_blastdb/all_libs";
+my $nalf ="$aggr_db.nal";
+for($libi = 0; $libi < $nlibs; $libi++){
+# $libi is a safe string that represents an integer; safe to embed directly in a string
+ my $iprefix = "$outdir/c_l$libi";
+ my $idfqgz = "$iprefix.idfq.gz";
+ my $qidfqgz = "$iprefix.q.idfq.gz";
+ # longer scope required
+ {
+ # construct a idfq file from a fastq file
+
+ my $PG= "$bindir/fq2idfq.pl";
+ my $PG0="$bindir/fa2fq.pl";
+ my $PG2="$bindir/fqfilt.pl";
+ my $PG3="$bindir/dumbbell_filter.pl";
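+ # recipe pattern used throughout this Makefile: build the output in a private
+ # mktemp directory, copy it into place, print digests of both copies with
+ # 'openssl dgst' for verification, then remove the temp dir and exit with the
+ # pipeline status ('set -o pipefail') so make notices failures anywhere in the pipe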
+ my $command = "STMP=`mktemp -d --tmpdir sprai-XXXXXX` ; set -o pipefail; $PG0 \$\< | $PG3 - | $PG - --prefix $iprefix | gzip -c -1 > \$\$STMP/c_l${libi}.idfq.gz && cp -p \$\$STMP/c_l${libi}.idfq.gz \$@ && openssl dgst \$\$STMP/c_l${libi}.idfq.gz \$@; status=\$\$?; rm -fr \$\$STMP;exit \$\$status";
+
+ $rreq = "";
+ if(defined $param_makedb_rreq) {
+ $rreq = "SGE_RREQ=\"$param_makedb_rreq\" ";
+ }
+ printf $mh "$idfqgz: %s\n", $input_libs[$libi];
+ printf $mh "\t${rreq}$command\n\n";
+
+ if($min_len_for_query > 0){
+ $command = "STMP=`mktemp -d --tmpdir sprai-XXXXXX`; set -o pipefail; gzip -d -c \$\< | $PG2 - $min_len_for_query -max_len $max_len_for_query | gzip -c -1 > \$\$STMP/c_l${libi}.q.idfq.gz && cp -p \$\$STMP/c_l${libi}.q.idfq.gz \$@ && openssl dgst \$\$STMP/c_l${libi}.q.idfq.gz \$@; status=\$\$?; rm -fr \$\$STMP; exit \$\$status";
+ printf $mh "$qidfqgz: %s\n", $idfqgz;
+ printf $mh "\t${rreq}$command\n\n";
+ }else{
+ $qidfqgz=$idfqgz;
+ }
+ }
+
+ my $tmp_idfq;
+ my $dfq2fa = "$bindir/dfq2fq_v2.pl -f -";
+ my $addlibn = "sed -e 's/^>/>l${libi}_s/'";
+ #construct blast database from idfq file
+ {
+ $rreq = "";
+ if(defined $param_makedb_rreq) {
+ $rreq = "SGE_RREQ=\"$param_makedb_rreq\" ";
+ }
+ my $PG = "$path2blast/makeblastdb";
+ $command = "gzip -d -c $idfqgz | $dfq2fa | $addlibn | $PG -in - -dbtype nucl -out $outdir/$tmp_blastdb/c_l$libi -title c_l$libi ";
+
+ my $makeblastdone = "$outdir/c${libi}_imakeblastdb.done";
+
+ printf $mh "$makeblastdone: $idfqgz\n";
+ printf $mh "\t${rreq}$command\n";
+ printf $mh "\ttouch \$\@\n\n";
+ }
+
+ if($libi==0){ # %.nal depends on makeblastdb.done of other libs
+# construct aggregate complementary blast database
+# don't need real data. just write to .nal file
+# write a file with the two lines stating TITLE and DBLIST
+#TITLE libs_excludinglib1
+#DBLIST lib2 lib3
+ my @clibs=();
+ my @current_dep=();
+ my $k;
+ for($k=0; $k < $nlibs; $k++){
+ push(@clibs, "c_l$k");
+ push(@current_dep, "$outdir/c${k}_imakeblastdb.done");
+ }
+ printf $mh "$nalf: %s\n", join(" ", @current_dep);
+ printf $mh "\techo TITLE all_libs > \$\@\n";
+ printf $mh "\techo DBLIST %s >> \$\@\n\n", join(" ", @clibs);
+ }
+
+ my $tmp_fa;
+ {
+ $rreq = "";
+ if(defined $param_makedb_rreq) {
+ $rreq = "SGE_RREQ=\"$param_makedb_rreq\" ";
+ }
+ $tmp_fa = "tmp_fa_l$libi";
+ if(!-d "$outdir/$tmp_fa"){
+ mkdir "$outdir/$tmp_fa" or die "cannot mkdir $outdir/$tmp_fa: $!\n";
+ }
+ my $PG2 = "$bindir/partition_fa.pl";
+ $command = "${rreq}gzip -d -c $qidfqgz | $dfq2fa | $addlibn | $PG2 - $partition -p $outdir/$tmp_fa/c_l$libi";
+
+ my $ipartdone = "$outdir/c_l${libi}_partition_fa.done";
+
+ printf $mh "$ipartdone: $qidfqgz\n";
+ printf $mh "\t$command\n";
+ printf $mh "\ttouch \$\@\n\n";
+ }
+
+ { # main error correction job
+ $rreq = "";
+ if(defined $param_blast_rreq && $param_blast_rreq ne '') {
+ $rreq = "SGE_RREQ=\"$param_blast_rreq\" ";
+ }
+ my $tmp_evalue;
+ $tmp_evalue = $evalue;
+ $tmp_dfq = "tmp_dfq_l$libi";
+ if(!-d "$outdir/$tmp_dfq"){
+ mkdir "$outdir/$tmp_dfq" or die "cannot mkdir $outdir/$tmp_dfq: $!\n";
+ }
+ my $BLAST_CMD = "$path2blast/blastn -dbsize 1 -num_threads $num_threads";
+ if($word_size){
+ $BLAST_CMD .= " -word_size $word_size";
+ }
+ if($max_target_seqs){
+ $BLAST_CMD .= " -max_target_seqs $max_target_seqs";
+ }
+ if(defined $param_blast_opt) {
+ $BLAST_CMD .= " $param_blast_opt";
+ }
+ my $outfmt = "7 qseqid sstart send sacc qstart qend bitscore evalue pident qseq sseq";
+ $BLAST_CMD .= " -evalue $tmp_evalue -outfmt '$outfmt'";
+ my $PG2=sprintf("$bindir/bfmt72s -c $trim -u -i",$libi);
+ if($copy_blastdb){ # copy the blastdb per each job to reduce the number of NFS access; this might increase transer if many jobs are run on the same node
+ $BLAST_CMD .= " -db \$\$STMP/$aggr_db";
+ }else{
+ $BLAST_CMD .= " -db $aggr_db";
+ }
+ if($direct_vote){
+ foreach my $valid_voters(@voters){
+ my $PG3="$bindir/nss2v_v3 -r $max_read_length -v $valid_voters";
+ my $PG4="$bindir/myrealigner -f -B $valid_voters -b 3 -d 0.5 -l $max_read_length";
+ if(!-d "$outdir/$tmp_dfq.v$valid_voters"){
+ mkdir "$outdir/$tmp_dfq.v$valid_voters" or die "cannot mkdir $outdir/$tmp_dfq.v$valid_voters: $!\n";
+ }
+ for(my $i=0; $i < $partition; ++$i){
+ my $inputfile = sprintf("$outdir/$tmp_fa/c_l${libi}_%04d.fa", $i);
+ printf $mh "$outdir/$tmp_dfq.v$valid_voters/c_l${libi}_%04d.v$valid_voters.dfq.gz: $outdir/c_l${libi}_partition_fa.done $nalf\n", $i;
+ printf $mh "\t${rreq} STMP=`mktemp -d --tmpdir sprai-XXXXXX`; set -o pipefail; mkdir -p \$\$STMP/$outdir $outdir/$tmp_dfq.v$valid_voters &&";
+ if($copy_blastdb){ printf $mh " cp -rp $outdir/$tmp_blastdb \$\$STMP/$outdir &&";}
+ printf $mh " cp -p $inputfile \$\$STMP/input && openssl dgst $inputfile \$\$STMP/input &&";
+ printf $mh " $BLAST_CMD -query \$\$STMP/input $filter_same_lib | $PG2 - | $PG3 - | $PG4 - | gzip -c -1 > \$\$STMP/output.tmp && cp -p \$\$STMP/output.tmp \$\@ ; status=\$\$?; rm -fr \$\$STMP; exit \$\$status\n\n";
+ }
+ }
+ }else{ # not direct_vote
+ for(my $i=0; $i < $partition; ++$i){
+ my $outputfile = sprintf("$outdir/$tmp_dfq/c_l${libi}_%04d.blast.sam.gz",$i);
+ my $inputfile = sprintf("$outdir/$tmp_fa/c_l${libi}_%04d.fa", $i);
+ printf $mh "$outputfile : $outdir/c_l${libi}_partition_fa.done $nalf\n";
+ printf $mh "\t${rreq}";
+ printf $mh " STMP=`mktemp -d --tmpdir sprai-XXXXXX`; set -o pipefail; mkdir -p \$\$STMP/$outdir &&";
+ if($copy_blastdb){ printf $mh " cp -rp $outdir/$tmp_blastdb \$\$STMP/$outdir &&";}
+ printf $mh " cp -p $inputfile \$\$STMP/input && openssl dgst $inputfile \$\$STMP/input &&";
+ printf $mh " $BLAST_CMD -query \$\$STMP/input $filter_same_lib | $PG2 - | gzip -c -1 > \$\$STMP/output.tmp && cp -p \$\$STMP/output.tmp \$\@ ; status=\$\$?; rm -fr \$\$STMP; exit \$\$status\n\n", $i;
+ }
+
+ $rreq = ""; # smaller resource request
+ if(defined $param_makedb_rreq) {
+ $rreq = "SGE_RREQ=\"$param_makedb_rreq\" ";
+ }
+ foreach my $valid_voters(@voters){
+ my $PG3="$bindir/nss2v_v3 -r $max_read_length -v $valid_voters";
+ my $PG4="$bindir/myrealigner -f -B $valid_voters -b 3 -d 0.5";
+ if(!-d "$outdir/$tmp_dfq.v$valid_voters"){
+ mkdir "$outdir/$tmp_dfq.v$valid_voters" or die "cannot mkdir $outdir/$tmp_dfq.v$valid_voters: $!\n";
+ }
+ # instead of using pattern match write every pattern for tge_make
+ for(my $i=0; $i < $partition; ++$i){
+ printf $mh "$outdir/$tmp_dfq.v$valid_voters/c_l${libi}_%04d.v$valid_voters.dfq.gz: $outdir/$tmp_dfq/c_l${libi}_%04d.blast.sam.gz\n", $i, $i;
+ printf $mh "\t${rreq}gzip -dc \$< | $PG3 - | $PG4 - | gzip -c -1 > \$\@.tmp && mv \$\@.tmp \$\@\n\n";
+ }
+ }
+ }
+ }
+
+ {
+ $rreq = "";
+ if(defined $param_makedb_rreq) {
+ $rreq = "SGE_RREQ=\"$param_makedb_rreq\" ";
+ }
+ foreach my $valid_voters(@voters){
+ $tmp_idfq = "tmp_idfq_l$libi";
+ if(!-d "$outdir/$tmp_idfq.v$valid_voters"){
+ mkdir "$outdir/$tmp_idfq.v$valid_voters" or die "cannot mkdir $outdir/$tmp_idfq: $!\n";
+ }
+ for(my $i=0; $i < $partition; ++$i){
+ printf $mh "$outdir/$tmp_idfq.v$valid_voters/c_l${libi}.refined_%04d.v$valid_voters.idfq.gz: $outdir/$tmp_dfq.v$valid_voters/c_l${libi}_%04d.v$valid_voters.dfq.gz\n", $i, $i;
+ printf $mh "\t${rreq}gzip -d -c \$< | $bindir/dfq2fq_v2.pl - | gzip -c -1 > \$\@.tmp && mv \$\@.tmp \$\@\n\n";
+ }
+ }
+ }
+
+ foreach my $valid_voters(@voters){
+ my $outputfile = "$outdir/c_l${libi}.1.v$valid_voters.idfq.gz";
+ my @part_refined_dfqgz = (); # initialize each time
+ for(my $i=0; $i < $partition; ++$i){
+ my $pidfqfiles = sprintf("$outdir/$tmp_idfq.v$valid_voters/c_l${libi}.refined_%04d.v$valid_voters.idfq.gz", $i);
+ push(@part_refined_dfqgz, $pidfqfiles);
+ }
+ my $files=join(" ", @part_refined_dfqgz);
+
+ printf $mh "$outputfile : $files\n";
+ printf $mh "\tcat \$^ > \$\@.tmp && mv \$\@.tmp \$\@\n\n";
+ }
+
+ {
+ $rreq = "";
+ if(defined $param_makedb_rreq) {
+ $rreq = "SGE_RREQ=\"$param_makedb_rreq\" ";
+ }
+ $tmp_dfq = "tmp_dfq_l$libi";
+ foreach my $valid_voters(@voters){
+ my $tmp_finish = "tmp_finish_l$libi.v$valid_voters";
+ for(my $i=0; $i < $partition; ++$i){
+ if(!-d "$outdir/$tmp_finish"){
+ mkdir "$outdir/$tmp_finish" or die "cannot mkdir $outdir/$tmp_finish: $!\n";
+ }
+ printf $mh "$outdir/$tmp_finish/c_l${libi}.refined_%04d.fin.v$valid_voters.idfq.gz: $outdir/$tmp_dfq.v$valid_voters/c_l${libi}_%04d.v$valid_voters.dfq.gz\n", $i, $i;
+ printf $mh "\t${rreq}gzip -d -c \$< | $bindir/dfq2fq_v2.pl - -finish -valid_depth $valid_depth -valid_read_length $valid_read_length | gzip -c -1 > \$\@.tmp && mv \$\@.tmp \$\@\n\n";
+ }
+ }
+ }
+
+ foreach my $valid_voters(@voters){
+ my $tmp_finish = "tmp_finish_l$libi.v$valid_voters";
+ my @part_finidfqgz = ();
+ for(my $i = 0; $i < $partition; ++$i){
+ my $outputfile = sprintf("$outdir/$tmp_finish/c_l${libi}.refined_%04d.fin.v$valid_voters.idfq.gz",$i);
+ push(@part_finidfqgz,$outputfile);
+ }
+ my $files=join(" ", @part_finidfqgz);
+ my $outputfile = "$outdir/c_l${libi}.1.fin.v$valid_voters.idfq.gz";
+
+ printf $mh "\n";
+ printf $mh "$outputfile: $files\n";
+ printf $mh "\tcat \$^ > \$\@.tmp && mv \$\@.tmp \$\@\n\n";
+ }
+}
+
+my @cdep=();
+foreach my $valid_voters(@voters){
+ for($libi = 0; $libi < $nlibs; $libi ++){
+ push(@cdep, "$outdir/c_l${libi}.1.fin.v$valid_voters.idfq.gz");
+ }
+}
+my $ecdone = "ec.done";
+printf $mh "$ecdone: %s\n", join(" ", @cdep);
+printf $mh "\ttouch \$\@;\n\n";
+
+my $ec_only = "ec_only: $ecdone";
+printf $mh "$ec_only\n\n";
+
+my $bashcommand="";
+my $suffix = "top20x";
+
+
+foreach my $estimated_genome_size (@gsizes){
+ foreach my $valid_voters (@voters){
+ my $combined = "$outdir/$prefix.v$valid_voters.20x$estimated_genome_size";
+ {
+ $rreq = "";
+ if(defined $param_gettop_rreq) {
+ $rreq = "SGE_RREQ=\"$param_gettop_rreq\" ";
+ }
+ my $PG1 = "$bindir/get_top_20x_fa.pl";
+ my @vdep = ();
+ for($libi = 0; $libi < $nlibs; $libi ++){
+ push(@vdep, "$outdir/c_l${libi}.1.fin.v$valid_voters.idfq.gz");
+ }
+ printf $mh "%s.fq: %s\n", $combined, join(" ", @vdep);
+ printf $mh "\t${rreq}STMP=`mktemp -d --tmpdir sprai-XXXXXX`;gzip -d -c \$^ > \$\$STMP/ec.idfq && $PG1 \$\$STMP/ec.idfq -l -g $estimated_genome_size -q > \$\@.tmp && mv \$\@.tmp \$\@; status=\$\$?; rm -fr \$\$STMP; exit \$\$status \n\n";
+
+ printf $mh "%s.frg: %s.fq\n", $combined, $combined;
+ printf $mh "\t$ca_path/fastqToCA -libraryname foo -technology pacbio-corrected -reads \$< > \$\@\n\n";
+ }
+ }
+}
+my @frgs = ();
+foreach my $egsize (@gsizes){
+ foreach my $vv (@voters){
+ my $frg = "$outdir/$prefix.v$vv.20x$egsize.frg";
+ push(@frgs, $frg);
+ }
+}
+printf $mh ".PHONY: prepfrg\n";
+printf $mh "prepfrg: %s\n\n", join(" ", @frgs);
+
+close $mh;
+
+
+if($submit || $jobs){
+ my $headstr ="#!/bin/sh\n#\$ -S /bin/sh\n#\$ -cwd\n";
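+ # lines beginning with '#$' are read by SGE's qsub as embedded options
+ # (-S: shell to run the job with, -cwd: execute in the submission directory)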
+# construct jobs for grid engines
+# each job just calls a make command to construct certain targets
+# $partition should be run as array job and accessed as $SGE_TASK_ID
+# STR0=`expr $SGE_TASK_ID - 1`
+# PART=`printf '%04d' $STR0`
+ my $partgen = q(STR0=`expr $SGE_TASK_ID - 1`
+PART=`printf '%04d' $STR0`
+);
+ my $jobdir = "jobs";
+ if(!-d $jobdir){
+ make_path($jobdir) or die "cannot make_path $jobdir: $!\n";
+ }
+ make_path('log');
+ my $rreq = $param_makedb_rreq;
+ my @idfqjobids = (); # store the jobids for each library
+
+ make_path('log/idfqgz');
+ for($libi = 0; $libi < $nlibs; $libi++){
+ my $iprefix = "$outdir/c_l$libi";
+ my $idfqgz = "$iprefix.idfq.gz";
+ my $jf = "$jobdir/idfq$libi.job";
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $idfqgz\n";
+ close $jh;
+ if($submit){
+ my $retv=`qsub -o log/idfqgz -e log/idfqgz $jf`;
+ if($retv =~ /Your job (\d+)/){
+ push(@idfqjobids, $1);
+ print "$1: qsub -o log/idfqgz -e log/idfqgz $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ }
+
+ my @jobids = (); # store the jobids for each library
+ make_path('log/makeblastdb');
+ for($libi = 0; $libi < $nlibs; $libi++){
+ my $jf = "$jobdir/blastdb_partition$libi.job";
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $outdir/c${libi}_imakeblastdb.done\n";
+ close $jh;
+ if($submit){
+ my $retv=`qsub -o log/makeblastdb -e log/makeblastdb -hold_jid $idfqjobids[$libi] $jf`;
+ if($retv =~ /Your job (\d+)/){
+ push(@jobids, $1);
+ print "$1: qsub -o log/makeblastdb -e log/makeblastdb -hold_jid $idfqjobids[$libi] $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ }
+ # prepare the target database consisting of all the libraries by writing a single file
+ # pseudo dependency to synchronize
+ my $jf = "$jobdir/blastdb_all.job";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $nalf\n";
+ close $jh;
+ }
+ my $nalfjid;
+ if($submit){
+ my $depends = join(",",@jobids);
+ my $retv=`qsub -o log/makeblastdb -e log/makeblastdb -hold_jid $depends $jf`;
+ if($retv =~ /Your job (\d+)/){
+ $nalfjid = $1;
+ print "$1: qsub -o log/makeblastdb -e log/makeblastdb -hold_jid $depends $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+
+ make_path('log/partition');
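+ # '-t 1-N' makes the blast/error-correction jobs SGE array jobs with one task
+ # per partition; each task derives its 4-digit partition id from $SGE_TASK_ID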
+ my $preq = sprintf " -t 1-%d ", $partition;
+ for($libi = 0; $libi < $nlibs; $libi++){
+ my $ipjid;
+ my $jf = "$jobdir/input_partition_l$libi.job";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $outdir/c_l${libi}_partition_fa.done\n";
+ print $jh "sleep 5s\n";
+ close $jh;
+ }
+ if($submit){
+ my $retv=`qsub -o log/partition -e log/partition -hold_jid $idfqjobids[$libi] $jf`;
+ if($retv =~ /Your job (\d+)/){
+ $ipjid = $1;
+ print "$1: qsub -o log/partition -e log/partition -hold_jid $idfqjobids[$libi] $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+
+ my $tmp_dfq = "tmp_dfq_l$libi";
+ make_path('log/blast');
+ #blast individual partitions and convert to sam like format
+ $jf = "$jobdir/blast_exec$libi.job";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $param_blast_rreq;
+ printf $jh "#\$ %s\n", $preq;
+ print $jh $partgen;
+ if($direct_vote){
+ # This code does not separate the jobs for different valid_voters, which is apparently inefficient.
+ # This is justified because in $direct_vote mode only a single valid_voters value is expected.
+ # Not reusing the blast output for multiple valid_voters is itself inefficient, and
+ # parallelization across multiple valid_voters is not implemented.
+ foreach my $valid_voters(@voters){
+ print $jh "make -f $makefile $outdir/$tmp_dfq.v$valid_voters/c_l${libi}_\$PART.v$valid_voters.dfq.gz\n";
+ }
+ }else{
+ print $jh "make -f $makefile $outdir/$tmp_dfq/c_l${libi}_\$PART.blast.sam.gz\n";
+ }
+ close $jh;
+ }
+ if($submit){
+ my $retv=`qsub -o log/blast -e log/blast -hold_jid $nalfjid,$ipjid $jf`;
+ if($retv =~ /Your job-array (\d+)/){
+ print "$1: qsub -o log/blast -e log/blast -hold_jid $nalfjid,$ipjid $jf\n";
+ $jobids[$libi] = $1;
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ }
+ $rreq = $param_ec_rreq;
+ my %jid_vv;
+ make_path('log/part_ec');
+ foreach my $valid_voters(@voters){
+ my @t_jid_lib = ();
+ for($libi = 0; $libi < $nlibs; $libi++){
+ #error correct individual partitions
+ my $jf="$jobdir/part_ec$libi.v$valid_voters.job";
+ my $tmp_dfq = "tmp_dfq_l$libi";
+ my $tmp_finish = "tmp_finish_l$libi.v$valid_voters";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ printf $jh "#\$ %s\n", $preq;
+ print $jh $partgen;
+ print $jh "make -f $makefile $outdir/$tmp_finish/c_l${libi}.refined_\$PART.fin.v$valid_voters.idfq.gz\n";
+ close $jh;
+ }
+ if($submit){
+ my $depend = $jobids[$libi];
+ my $retv=`qsub -o log/part_ec -e log/part_ec -hold_jid_ad $depend $jf`;
+ if($retv =~ /Your job-array (\d+)/){
+ $t_jid_lib[$libi] = $1;
+ print "$1: qsub -o log/part_ec -e log/part_ec -hold_jid_ad $depend $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ #finalize ec proc (merge error corrected partitions)
+ # consecutive job with array dependency
+ $jf="$jobdir/merge_gc$libi.v$valid_voters.job";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $outdir/c_l${libi}.1.fin.v$valid_voters.idfq.gz\n";
+ close $jh;
+ }
+ if($submit){
+ my $depend = $t_jid_lib[$libi];
+ my $retv=`qsub -o log -e log -hold_jid $depend $jf`;
+ if($retv =~ /Your job (\d+)/){
+ $jid_vv{$valid_voters} = $1;
+ print "$1: qsub -o log -e log -hold_jid $depend $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ }
+ }
+ $rreq = $param_gettop_rreq;
+ foreach my $estimated_genome_size (@gsizes){
+ foreach my $valid_voters (@voters){
+ #prepCA (select top 20 x based on estimated genome size)
+ my $jf="$jobdir/prepCA.v$valid_voters.e$estimated_genome_size.job";
+ my $ps = "$prefix.v$valid_voters.20x$estimated_genome_size";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $outdir/$ps.frg\n";
+ close $jh;
+ }
+ my $t_jid;
+ if($submit){
+ my $depend = $jid_vv{$valid_voters};
+ my $retv=`qsub -o log -e log -hold_jid $depend $jf`;
+ if($retv =~ /Your job (\d+)/){
+ $t_jid = $1;
+ print "$1: qsub -o log -e log -hold_jid $depend $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ #execCA
+ $jf="$jobdir/execCA.v$valid_voters.e$estimated_genome_size.job";
+ {
+ open my $jh, ">", $jf or die "cannot open $jf: $!\n";
+ print $jh $headstr;
+ printf $jh "#\$ %s\n", $rreq;
+ print $jh "make -f $makefile $outdir/$ps/9-terminator/$ps.scf.fasta\n";
+ close $jh;
+ }
+ if($submit){
+ my $depend = $t_jid;
+ my $retv=`qsub -o log -e log -hold_jid $depend $jf`;
+ if($retv =~ /Your job (\d+)/){
+ $t_jid = $1;
+ print "$1: qsub -o log -e log -hold_jid $depend $jf\n";
+ }else{
+ die "job submission error: qsub $jf said $retv";
+ }
+ }
+ }
+ }
+}
+
diff --git a/ezez4qsub_vx1.pl b/ezez4qsub_vx1.pl
index 5f5e5c9..6ed1751 100755
--- a/ezez4qsub_vx1.pl
+++ b/ezez4qsub_vx1.pl
@@ -363,6 +363,8 @@ my $orig_idfq = sprintf("$datadir/$preprefix%02d.idfq.gz",0);
my $db_idfq_gz = sprintf("$datadir/$preprefix%02d.db.idfq.gz",0);
my $qu_idfq_gz = sprintf("$datadir/$preprefix%02d.qu.idfq.gz",0);
+my $pipefail = "set -o pipefail;";
+
if($from == 0)
{
# fq -> idfq (& id2n)
@@ -385,7 +387,7 @@ if($from == 0)
$f_do = &do_or_not(\@parents,\$child);
my $PG0 = "$bindir/fa2fq.pl";
- $command = sprintf("cat $parent | $PG0 - | $dumbbell_filter - | $PG - -flag --prefix $datadir/$preprefix%02d | gzip -c -1 > $child.tmp && mv $child.tmp $child",0);
+ $command = sprintf("$pipefail cat $parent | $PG0 - | $dumbbell_filter - | $PG - -flag --prefix $datadir/$preprefix%02d | gzip -c -1 > $child.tmp && mv $child.tmp $child",0);
if($f_do){
$do_qsub=1;
}
@@ -401,7 +403,7 @@ if($from == 0)
my $com2;
if($min_len_for_query > 1){
- $com2 = sprintf("gzip -d -c $parent | $PG2 - $min_len_for_query -max_len $max_len_for_query | gzip -c -1 > $child.tmp && mv $child.tmp $child",0);
+ $com2 = sprintf("$pipefail gzip -d -c $parent | $PG2 - $min_len_for_query -max_len $max_len_for_query | gzip -c -1 > $child.tmp && mv $child.tmp $child",0);
}
else{
$com2 = sprintf("ln -s $parent $child");
@@ -495,7 +497,7 @@ for(my $index=$from; $index<$to; ++$index){
$PG = "$path2blast/makeblastdb";
$PG2= "$bindir/dfq2fq_v2.pl";
- $command = sprintf("gzip -d -c $parent | $PG2 -f - | $PG -in - -dbtype nucl -out $datadir/$preprefix%02d -title $preprefix%02d 1>$child.tmp && mv $child.tmp $child",$index,$index);
+ $command = sprintf("$pipefail gzip -d -c $parent | $PG2 -f - | $PG -in - -dbtype nucl -out $datadir/$preprefix%02d -title $preprefix%02d 1>$child.tmp && mv $child.tmp $child",$index,$index);
if($f_do){
$do_qsub=1;
}
@@ -711,7 +713,7 @@ for(my $index=$from; $index<$to; ++$index){
my $f_do=1;
my $input = sprintf("$maindir/$preprefix%02d_%04d.fq",$index,$list[$i]);
#my $input = sprintf("$maindir/$preprefix%02d_%04d.fa",$index,$list[$i]);
- $command = sprintf("cat $input | $bindir/fq2fa.pl - | $PG -db $datadir/$preprefix%02d -query - -evalue $evalue -outfmt '$outfmt' | $PG2 - | $PG3 - | $PG4 - | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile", $index);
+ $command = sprintf("$pipefail cat $input | $bindir/fq2fa.pl - | $PG -db $datadir/$preprefix%02d -query - -evalue $evalue -outfmt '$outfmt' | $PG2 - | $PG3 - | $PG4 - | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile", $index);
#$command = sprintf("cat $input | $bindir/fq2fa.pl - | $PG -db $datadir/$preprefix%02d -query - -evalue $evalue -outfmt '$outfmt' | $PG2 - -f $input | $PG3 - | $PG4 - | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile", $index);
push @othercom, "gzip -d -t $outputfile";
push @othercom, "while [ \$? -ne 0 ]";
@@ -849,10 +851,10 @@ for(my $index=$from; $index<$to; ++$index){
}
my $tmp = join " ", @ta;
if($index+1 == $to){
- $command = sprintf("gzip -d -c $tmp | $bindir/dfq2fq_v2.pl - --finish --valid_depth $valid_depth -valid_read_length $valid_read_length | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile");
+ $command = sprintf("$pipefail gzip -d -c $tmp | $bindir/dfq2fq_v2.pl - --finish --valid_depth $valid_depth -valid_read_length $valid_read_length | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile");
}
else{
- $command = sprintf("gzip -d -c $tmp | $bindir/dfq2fq_v2.pl - | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile");
+ $command = sprintf("$pipefail gzip -d -c $tmp | $bindir/dfq2fq_v2.pl - | gzip -c -1 > $outputfile.tmp && mv $outputfile.tmp $outputfile");
}
}
@@ -1059,7 +1061,7 @@ for(my $index=$from; $index<$to; ++$index){
my $uuid = $now;
#my $uuid = `uuidgen`;
chomp $uuid;
- $command = sprintf("gzip -d -c $second_catted_file > $datadir/$uuid.tmp && $PG1 $datadir/$uuid.tmp -l -g $estimated_genome_size -q -c 20 > $child.tmp && mv $child.tmp $child && rm $datadir/$uuid.tmp");
+ $command = sprintf("$pipefail gzip -d -c $second_catted_file > $datadir/$uuid.tmp && $PG1 $datadir/$uuid.tmp -l -g $estimated_genome_size -q -c 20 > $child.tmp && mv $child.tmp $child && rm $datadir/$uuid.tmp");
#$command = sprintf("gzip -d -c $second_catted_file | $PG1 - -g $estimated_genome_size -q -c 20 > $child.tmp && mv $child.tmp $child");
if($f_do){
$do_qsub=1;
diff --git a/ezez_vx1.pl b/ezez_vx1.pl
index 35d7c01..f470750 100755
--- a/ezez_vx1.pl
+++ b/ezez_vx1.pl
@@ -250,6 +250,8 @@ my $confident_length_coefficient = 0.75;
my $date="";
my $message="";
+my $pipefail = "set -o pipefail;";
+
# mkdirs
my $PWD = `pwd`;
chomp $PWD;
@@ -302,7 +304,7 @@ if($from == 0)
$PG0 = "$sprai_path/$PG0";
}
if($f_do){
- $com1 = sprintf("time cat $parent | $PG0 - | $dumbbell_filter - | $PG - --prefix $tmp_dir/db%02d | gzip -c -1 > $child.tmp && mv $child.tmp $child",$from);
+ $com1 = sprintf("$pipefail time cat $parent | $PG0 - | $dumbbell_filter - | $PG - --prefix $tmp_dir/db%02d | gzip -c -1 > $child.tmp && mv $child.tmp $child",$from);
`$com1`;
}
else{
@@ -321,7 +323,7 @@ if($from == 0)
my $parent=$db_idfq_gz;
if($f_do){
if($min_len_for_query > 1){
- $com2 = sprintf("time gzip -d -c $parent | $PG2 - $min_len_for_query -max_len $max_len_for_query | gzip -c -1 > $child.tmp && mv $child.tmp $child",$from);
+ $com2 = sprintf("$pipefail time gzip -d -c $parent | $PG2 - $min_len_for_query -max_len $max_len_for_query | gzip -c -1 > $child.tmp && mv $child.tmp $child",$from);
}
else{
$com2 = sprintf("ln -s $parent $child");
@@ -356,7 +358,7 @@ for(my $index=$from; $index<$to; ++$index){
$f_do = &do_or_not(\@parents,\$child);
my $parent = $parents[0];
if($f_do){
- $com1 = sprintf("time gzip -d -c $parent | $PG1 -f - | makeblastdb -in - -dbtype nucl -out $tmp_dir/$preprefix%02d -title $preprefix%02d 1>$child.tmp && mv $child.tmp $child",$index,$index);
+ $com1 = sprintf("$pipefail time gzip -d -c $parent | $PG1 -f - | makeblastdb -in - -dbtype nucl -out $tmp_dir/$preprefix%02d -title $preprefix%02d 1>$child.tmp && mv $child.tmp $child",$index,$index);
}
else{
$com1 = "sleep 0";
@@ -376,7 +378,7 @@ for(my $index=$from; $index<$to; ++$index){
$f_do = &do_or_not(\@parents,\$child);
my $parent = $parents[0];
if($f_do){
- $com2 = sprintf("time gzip -d -c $parent | $PG1 -f - | $PG2 - $partition -p $tmp_dir/$preprefix%02d 1>$child.tmp && mv $child.tmp $child", $index);
+ $com2 = sprintf("$pipefail time gzip -d -c $parent | $PG1 -f - | $PG2 - $partition -p $tmp_dir/$preprefix%02d 1>$child.tmp && mv $child.tmp $child", $index);
}
else{
$com2 = "sleep 0";
@@ -445,7 +447,7 @@ for(my $index=$from; $index<$to; ++$index){
my $f_do=1;
$f_do = &do_or_not(\@parents,\$child);
if($f_do){
- $command .= sprintf("time cat $parent | $BLASTN -db $tmp_dir/$preprefix%02d -query - -evalue $evalue -outfmt '$outfmt' | $PG2 - | $PG3 - | $PG4 - | gzip -c -1 > $child.tmp && mv $child.tmp $child & ", $index);
+ $command .= sprintf("$pipefail time cat $parent | $BLASTN -db $tmp_dir/$preprefix%02d -query - -evalue $evalue -outfmt '$outfmt' | $PG2 - | $PG3 - | $PG4 - | gzip -c -1 > $child.tmp && mv $child.tmp $child & ", $index);
}
else{
$command .= "sleep 0 & ";
@@ -480,7 +482,7 @@ for(my $index=$from; $index<$to; ++$index){
my $f_do=1;
$f_do = &do_or_not(\@parents,\$child);
if($f_do){
- $command .= sprintf("time gzip -d -c $parent | $PG1 - -finish -valid_depth $valid_depth -valid_read_length $valid_read_length | gzip -c -1 > $child.tmp && mv $child.tmp $child & ");
+ $command .= sprintf("$pipefail time gzip -d -c $parent | $PG1 - -finish -valid_depth $valid_depth -valid_read_length $valid_read_length | gzip -c -1 > $child.tmp && mv $child.tmp $child & ");
}
else{
$command .= "sleep 0 & ";
diff --git a/makefile b/makefile
index 84ad6f2..8f78f13 100644
--- a/makefile
+++ b/makefile
@@ -1,5 +1,5 @@
APPNAME = 'sprai'
-VERSION = '0.9.9.14'
+VERSION = '0.9.9.16'
PREFIX=$(PWD)
COMPILED= \
@@ -11,6 +11,7 @@ m52bfmt7 \
SCRIPTS= \
ca_ikki_v5.pl \
+ezez4makefile_v4.pl \
ezez4qsub_vx1.pl \
ezez_vx1.pl \
dumbbell_filter.pl \
@@ -32,7 +33,6 @@ check_circularity.pl \
#re2cons.pl \
#ezez4qsub_v9.pl \
#ezez_v8.pl \
-#ezez4makefile_v4.pl \
all: $(COMPILED)
diff --git a/nss2v_v3.c b/nss2v_v3.c
index 51b5d24..2711dac 100644
--- a/nss2v_v3.c
+++ b/nss2v_v3.c
@@ -4,8 +4,9 @@
#include <unistd.h>
#include <getopt.h>
#include <ctype.h> /* toupper */
+#include <assert.h>
-int LSEQ = 46;
+int LSEQ = 2048;
#define LAUX 32
/* ~hit num */
@@ -547,6 +548,232 @@ calculate_region_length(sam_t* t_sam)
}
return length_of_region;
}
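+/* find the first pileup row whose already-placed reads do not overlap
+ * [stt, end] (allowing `interval` bases of slack), or, with opt_separate,
+ * the first empty row; returns nrows when no row is free, which the caller
+ * treats as "skip this alignment" */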
+int
+find_row_with_space(regions_t**lists, int* i_list, int listsize, int nrows, int opt_separate, int stt, int end, int interval)
+{
+ int row;
+ for(row=0; row < nrows; ++row){
+ int f_free = 0;
+ /* find a free region */
+ if(!opt_separate){
+ int l;
+ int not_collide = 1;
+ for(l=0; l<i_list[row] && l<listsize; ++l){/* check the rgn does not overlap any reads */
+ if(end+interval < lists[row][l].stt || stt > lists[row][l].end+interval){
+ continue;
+ }else{
+ not_collide=0;
+ }
+ }
+ if(not_collide==1 && i_list[row] <= listsize){
+ f_free = 1;
+ }
+ }
+ if(opt_separate){
+ if(i_list[row] == 0){
+ f_free = 1;
+ }
+ }
+ if(f_free){
+ return row;
+ }
+ }
+ return row;
+}
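+/* handle an 'M'/'D' CIGAR run of length num: place the gap-padded read
+ * characters from sbuf into row `row` of the base columns starting at *pos,
+ * recompute each touched column's score, and pad existing insertion columns
+ * with '-' in rows covered on both sides */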
+int
+paste_match(b_col_t * basic_column, el_pool_t * el_pool, int *pos, int *si, const char *sbuf, int num, int row, int stt)
+{
+ /* basic_column is the major operand that will be changed */
+ /* el_pool supplies element cells */
+
+ /* num is referred only once */
+ /* pos is incremented for num times*/
+ /* k is referred once */
+ /* row is not changed */
+ /* si is incremented */
+ /* stt is not changed */
+ /* sbuf is read only */
+
+ int t = *pos+num;
+ int l;
+ int t_si = *si;
+ for(l=*pos; l<t; ++l,++*pos){
+ int score = 0;
+ el_t * tmp;
+ int m;
+ if(basic_column->array[l].to_el == NULL){
+ /* initialize */
+ basic_column->array[l].to_el = get_el(el_pool);
+ basic_column->array[l].next = NULL;
+ basic_column->array[l].score = 0;
+ }
+ tmp = basic_column->array[l].to_el;
+ for(m=0; m<row; ++m){
+ if(tmp->down == NULL){
+ tmp->down = get_el(el_pool);
+ }
+ if(tmp->el.base != ' '){
+ ++score;
+ }
+ tmp = tmp->down;
+ }
+ if(sbuf[t_si] != ' '){
+ assert(tmp->el.base == ' ');
+ tmp->el.base = sbuf[t_si];
+ ++score;
+ }else{
+ if(tmp->el.base != ' '){
+ ++score;
+ }
+ }
+ ++t_si;
+ while(tmp->down){
+ tmp = tmp->down;
+ if(tmp->el.base != ' '){
+ ++score;
+ }
+ }
+ basic_column->array[l].score = score;
+
+ if(*pos != stt && basic_column->array[l].to_el->el.base != ' ' && basic_column->array[l-1].to_el->el.base != ' ')
+ {
+ column_t * cur_col;
+ column_t * l_col;
+ column_t * r_col;
+ assert(l >= 1);
+ cur_col = basic_column->array[l-1].next;/* ins column */
+ l_col = &(basic_column->array[l-1]);
+ r_col = &(basic_column->array[l]);
+ for(; cur_col; cur_col=cur_col->next){
+ {
+ el_t * l_el = l_col->to_el;
+ el_t * r_el = r_col->to_el;
+ el_t * c_el = cur_col->to_el;
+ int depth = 0;
+ int score = 0;
+ while(l_el != NULL && r_el != NULL)
+ {
+ if(c_el->el.base == ' '){
+ if(l_el->el.base != ' ' && r_el->el.base != ' '){
+ c_el->el.base = '-';
+ ++score;
+ }
+ }else{
+ ++score;
+ }
+ ++depth;
+
+ l_el = l_el->down;
+ r_el = r_el->down;
+ if(depth > row && (l_el == NULL || r_el == NULL)){
+ break;
+ }
+ if(c_el->down == NULL){
+ c_el->down = get_el(el_pool);
+ }
+ c_el = c_el->down;
+ }
+ while(depth <= row){
+ if(depth == row){
+ c_el->el.base = '-';
+ ++score;
+ break;
+ }else if(c_el->el.base == ' '){
+ }else{
+ ++score;
+ }
+ if(c_el->down == NULL){
+ c_el->down = get_el(el_pool);
+ }
+ c_el = c_el->down;
+ ++depth;
+ }
+ cur_col->score = score;
+ }
+ }
+ }
+ }
+ *si = t_si;
+ return *pos;
+}
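+/* handle an 'I' CIGAR run of length num: place the inserted read characters
+ * into row `row` of insertion columns (allocated on demand) between base
+ * columns pos-1 and pos, pad other covered rows with '-', and recompute the
+ * column scores */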
+int
+paste_insert(b_col_t * basic_column, el_pool_t * el_pool, col_pool_t *col_pool, int pos, int *si, const char *sbuf, int num, int row, int stt)
+{
+ int p;
+ column_t *l_col, *r_col, *cur_col;
+ assert(pos >=1);
+ l_col = &(basic_column->array[pos-1]);
+ r_col = &(basic_column->array[pos]);
+ cur_col = l_col;
+
+ for(p=0; p< num; ++p){
+ if(cur_col->next == NULL){
+ cur_col->next = get_col(col_pool, el_pool);
+ }
+ cur_col = cur_col->next;
+ {
+ el_t * l_el = l_col->to_el;
+ el_t * r_el = r_col->to_el;
+ el_t * c_el = cur_col->to_el;
+ int depth = 0;
+ int score = 0;
+ while(l_el != NULL && r_el != NULL)
+ {
+ if(depth == row){
+ if(sbuf[*si] != ' '){
+ assert(c_el->el.base == ' ');
+ c_el->el.base = sbuf[*si];
+ ++score;
+ }else{
+ if(c_el->el.base != ' '){
+ ++score;
+ }
+ }
+ ++*si;
+ }else if(c_el->el.base == ' '){
+ if(l_el->el.base != ' ' && r_el->el.base != ' '){
+ c_el->el.base = '-';
+ ++score;
+ }
+ }else{
+ /* keep */
+ ++score;
+ }
+ ++depth;
+
+ l_el = l_el->down;
+ r_el = r_el->down;
+ if(depth > row && (l_el == NULL || r_el == NULL)){
+ break;
+ }
+ if(c_el->down == NULL){
+ c_el->down = get_el(el_pool);
+ }
+ c_el = c_el->down;
+ }
+ while(depth <= row){
+ if(depth == row){
+ if(sbuf[*si] != ' '){
+ c_el->el.base = sbuf[*si];
+ ++score;
+ }
+ ++*si;
+ break;
+ }else if(c_el->el.base == ' '){
+ }else{
+ ++score;
+ }
+ if(c_el->down == NULL){
+ c_el->down = get_el(el_pool);
+ }
+ c_el = c_el->down;
+ ++depth;
+ }
+ cur_col->score = score;
+ }
+ }
+ return *si;
+}
int main(int argc, char ** argv)
{
int hitnum=0;
@@ -612,7 +839,6 @@ int main(int argc, char ** argv)
fprintf(stderr, "USAGE: <this> <in.nss>\n");
fprintf(stderr, "\tinput MUST be sorted by 1st chromosome name (must), 2nd evalue and 3rd bitScore\n");
fprintf(stderr, "\t-v <integer>: valid voter.\n");
- fprintf(stderr, "\t-r <integer>: max read length. default value is 32768\n");
return 1;
}
if(LSEQ <= 65536){
@@ -676,7 +902,6 @@ int main(int argc, char ** argv)
int stt2;
int end2;
int f_discard;
- int not_collide;
int length_of_region;
length_of_region = calculate_region_length(t_sam);
if(!length_of_region){ continue; }
@@ -716,237 +941,33 @@ int main(int argc, char ** argv)
continue;
}
- gen_inserted_seq(sbuf, t_sam);
-
- int row;
- for(row=0; row<NROWS; ++row){
- int f_free = 0;
- not_collide = 1;
- /* find a free region */
- if(!opt_separate){
- int l;
- for(l=0; l<i_list[row] && l<LISTSIZE; ++l){/* check the rgn does not overlap any reads */
- if(end2+INTERVAL < lists[row][l].stt || stt2 > lists[row][l].end+INTERVAL){
- continue;
+ {
+ int row = find_row_with_space(lists, i_list, LISTSIZE, NROWS, opt_separate, stt2, end2, INTERVAL);
+ if(row >= NROWS) continue;
+ /* column[row] is free */
+ /* set */
+ {
+ int k;
+ int pos=stt;
+ int si = 0;
+ gen_inserted_seq(sbuf, t_sam);
+ for(k=0; k<t_sam->cigar_length; ++k){
+ char sym = t_sam->cigar[k].sym;
+ if(sym == 'M' || sym == 'D'){
+ paste_match(&basic_column, &el_pool, &pos, &si, sbuf, t_sam->cigar[k].num, row, stt);
+ }else if(sym == 'I'){
+ paste_insert(&basic_column, &el_pool, &col_pool, pos, &si, sbuf, t_sam->cigar[k].num, row, stt);
}else{
- not_collide=0;
+ fprintf(stderr,"strange sym %c\n",sym);
+ exit(1);
}
}
- if(not_collide==1 && i_list[row] <= LISTSIZE){
- f_free = 1;
- }
}
- if(opt_separate){
- if(i_list[row] == 0){
- f_free = 1;
- }
- }
- if(f_free){
- /* column[row] is free */
- /* set */
- {
- int k,m;
- int pos=stt;
- int si = 0;
- for(k=0; k<t_sam->cigar_length; ++k){
- char sym = t_sam->cigar[k].sym;
- if(sym == 'M' || sym == 'D'){
- int t = pos+t_sam->cigar[k].num;
- int l;
- for(l=pos; l<t; ++l,++pos){
- int score = 0;
- el_t * tmp;
- if(basic_column.array[l].to_el == NULL){
- /* initialize */
- basic_column.array[l].to_el = get_el(&el_pool);
- basic_column.array[l].next = NULL;
- basic_column.array[l].score = 0;
- }
- tmp = basic_column.array[l].to_el;
- for(m=0; m<row; ++m){
- if(tmp->down == NULL){
- tmp->down = get_el(&el_pool);
- }
- if(tmp->el.base != ' '){
- ++score;
- }
- tmp = tmp->down;
- }
- if(sbuf[si] != ' '){
- if(tmp->el.base != ' '){
- fprintf(stderr,"bug %s %c %c\n",t_sam->rname,tmp->el.base,sbuf[si]);
- }
- tmp->el.base = sbuf[si];
- ++score;
- }else{
- if(tmp->el.base != ' '){
- ++score;
- }
- }
- ++si;
- while(tmp->down){
- tmp = tmp->down;
- if(tmp->el.base != ' '){
- ++score;
- }
- }
- basic_column.array[l].score = score;
-
- if(pos != stt && basic_column.array[l].to_el->el.base != ' ' && basic_column.array[l-1].to_el->el.base != ' ')
- {
- column_t * cur_col;
- column_t * l_col;
- column_t * r_col;
- if(l<1){
- fprintf(stderr,"pg bug l\n");
- }
- cur_col = basic_column.array[l-1].next;/* ins column */
- l_col = &(basic_column.array[l-1]);
- r_col = &(basic_column.array[l]);
- for(; cur_col; cur_col=cur_col->next){
- {
- el_t * l_el = l_col->to_el;
- el_t * r_el = r_col->to_el;
- el_t * c_el = cur_col->to_el;
- int depth = 0;
- int score = 0;
- while(l_el != NULL && r_el != NULL)
- {
- if(c_el->el.base == ' '){
- if(l_el->el.base != ' ' && r_el->el.base != ' '){
- c_el->el.base = '-';
- ++score;
- }
- }else{
- ++score;
- }
- ++depth;
-
- l_el = l_el->down;
- r_el = r_el->down;
- if(depth > row && (l_el == NULL || r_el == NULL)){
- break;
- }
- if(c_el->down == NULL){
- c_el->down = get_el(&el_pool);
- }
- c_el = c_el->down;
- }
- while(depth <= row){
- if(depth == row){
- c_el->el.base = '-';
- ++score;
- break;
- }else if(c_el->el.base == ' '){
- }else{
- ++score;
- }
- if(c_el->down == NULL){
- c_el->down = get_el(&el_pool);
- }
- c_el = c_el->down;
- ++depth;
- }
- cur_col->score = score;
- }
- }
- }
- }
- }else if(sym == 'I'){
- int p;
- int n_ins;
- column_t *l_col, *r_col, *cur_col;
- if(pos<1){
- fprintf(stderr,"strange pos %d\n",pos);
- exit(1);
- }
- l_col = &(basic_column.array[pos-1]);
- r_col = &(basic_column.array[pos]);
- cur_col = l_col;
-
- n_ins = t_sam->cigar[k].num;
- for(p=0; p<n_ins; ++p){
- if(cur_col->next == NULL){
- cur_col->next = get_col(&col_pool, &el_pool);
- }
- cur_col = cur_col->next;
-
- {
- el_t * l_el = l_col->to_el;
- el_t * r_el = r_col->to_el;
- el_t * c_el = cur_col->to_el;
- int depth = 0;
- int score = 0;
- while(l_el != NULL && r_el != NULL)
- {
- if(depth == row){
- if(sbuf[si] != ' '){
- if(c_el->el.base != ' '){
- fprintf(stderr,"pg bug 433\n");
- }
- c_el->el.base = sbuf[si];
- ++score;
- }else{
- if(c_el->el.base != ' '){
- ++score;
- }
- }
- ++si;
- }else if(c_el->el.base == ' '){
- if(l_el->el.base != ' ' && r_el->el.base != ' '){
- c_el->el.base = '-';
- ++score;
- }
- }else{
- /* keep */
- ++score;
- }
- ++depth;
-
- l_el = l_el->down;
- r_el = r_el->down;
- if(depth > row && (l_el == NULL || r_el == NULL)){
- break;
- }
- if(c_el->down == NULL){
- c_el->down = get_el(&el_pool);
- }
- c_el = c_el->down;
- }
- while(depth <= row){
- if(depth == row){
- if(sbuf[si] != ' '){
- c_el->el.base = sbuf[si];
- ++score;
- }
- ++si;
- break;
- }else if(c_el->el.base == ' '){
- }else{
- ++score;
- }
- if(c_el->down == NULL){
- c_el->down = get_el(&el_pool);
- }
- c_el = c_el->down;
- ++depth;
- }
- cur_col->score = score;
- }
- }
- }else{
- fprintf(stderr,"strange sym %c\n",sym);
- exit(1);
- }
- }
- }
- lists[row][i_list[row]].stt = stt2;
- lists[row][i_list[row]].end = end2;
- ++i_list[row];
- if(max_end < end2){
- max_end = end2;
- }
- break;
+ lists[row][i_list[row]].stt = stt2;
+ lists[row][i_list[row]].end = end2;
+ ++i_list[row];
+ if(max_end < end2){
+ max_end = end2;
}
}
}
diff --git a/wscript b/wscript
index 69d8d4e..645b700 100644
--- a/wscript
+++ b/wscript
@@ -1,5 +1,5 @@
APPNAME = 'sprai'
-VERSION = '0.9.9.14'
+VERSION = '0.9.9.16'
srcdir = '.'
blddir = 'build'
@@ -42,7 +42,7 @@ def build(bld):
'get_top_20x_fa.pl',
# 'mira_ikki.pl',
'partition_fa.pl',
-# 'ezez4makefile_v4.pl',
+ 'ezez4makefile_v4.pl',
'get_target_fasta_records.pl',
'dfq2fq_v2.pl',
'extract_fq.pl',
@@ -120,7 +120,7 @@ def dist(ctx):
'get_top_20x_fa.pl',
# 'mira_ikki.pl',
'partition_fa.pl',
-# 'ezez4makefile_v4.pl',
+ 'ezez4makefile_v4.pl',
'get_target_fasta_records.pl',
'doc/_build/html/**',
'configure',
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/sprai.git