[med-svn] [Git][med-team/seqprep][master] 15 commits: Replace python-markdown by markdown
Andreas Tille
gitlab at salsa.debian.org
Tue Dec 17 12:09:43 GMT 2019
Andreas Tille pushed to branch master at Debian Med / seqprep
Commits:
0aac3efe by Andreas Tille at 2019-12-17T10:53:50Z
Replace python-markdown by markdown
- - - - -
058dce6a by Andreas Tille at 2019-12-17T11:00:50Z
Use 2to3 to port from Python2 to Python3
- - - - -
80664b0c by Andreas Tille at 2019-12-17T11:07:21Z
Build-Depends: python3
- - - - -
65cb86f9 by Andreas Tille at 2019-12-17T11:07:44Z
routine-update: debhelper-compat 12
- - - - -
cb89a5dd by Andreas Tille at 2019-12-17T11:07:55Z
routine-update: Standards-Version: 4.4.1
- - - - -
e7e5f691 by Andreas Tille at 2019-12-17T11:08:39Z
R-U: DEB_BUILD_OPTIONS allow override_dh_auto_test
- - - - -
d5484a76 by Andreas Tille at 2019-12-17T11:08:39Z
R-U: Trailing whitespace in debian/changelog
- - - - -
6a8a1ad3 by Andreas Tille at 2019-12-17T11:08:39Z
R-U: autopkgtest: s/ADTTMP/AUTOPKGTEST_TMP/g
- - - - -
90e28916 by Andreas Tille at 2019-12-17T11:08:48Z
Set upstream metadata fields: Bug-Database, Repository, Repository-Browse.
- - - - -
3f529b5c by Andreas Tille at 2019-12-17T11:14:50Z
Do not call ronn for every build just to rebuild manpage - rather provide manpage itself
- - - - -
7ef3091a by Andreas Tille at 2019-12-17T11:15:30Z
Add missing endif
- - - - -
facffbd3 by Andreas Tille at 2019-12-17T11:51:59Z
Add missing Build-Depends: markdown
- - - - -
d5c2c5d3 by Andreas Tille at 2019-12-17T11:54:55Z
Do not delete manpage any more
- - - - -
b5f8267e by Andreas Tille at 2019-12-17T11:58:29Z
Upload to unstable
- - - - -
a3d5618e by Andreas Tille at 2019-12-17T12:01:34Z
Remove redundant Priority field
- - - - -
10 changed files:
- debian/changelog
- − debian/compat
- debian/control
- + debian/patches/2to3.patch
- debian/patches/series
- debian/rules
- + debian/seqprep.1
- − debian/seqprep.1.ronn
- debian/tests/run-unit-test
- debian/upstream/metadata
Changes:
=====================================
debian/changelog
=====================================
@@ -1,3 +1,20 @@
+seqprep (1.3.2-4) unstable; urgency=medium
+
+ * Replace python-markdown by markdown
+ * Use 2to3 to port from Python2 to Python3
+ Closes: #938467
+ * debhelper-compat 12
+ * Standards-Version: 4.4.1
+ * Respect DEB_BUILD_OPTIONS in override_dh_auto_test target
+ * Remove trailing whitespace in debian/changelog
+ * autopkgtest: s/ADTTMP/AUTOPKGTEST_TMP/g
+ * Set upstream metadata fields: Bug-Database, Repository, Repository-
+ Browse.
+ * Do not call ronn for every build just to rebuild manpage - rather
+ provide manpage itself
+
+ -- Andreas Tille <tille at debian.org> Tue, 17 Dec 2019 12:52:12 +0100
+
seqprep (1.3.2-3) unstable; urgency=medium
* Build-Depends: ruby-ronn -> ronn
@@ -61,7 +78,7 @@ seqprep (1.1-3) unstable; urgency=medium
[ Andreas Tille ]
* make sure autopkgtest script will not fail when cleaning up
-
+
[ Graham Inggs ]
* enable building with gcc-5
* fix clean target
@@ -81,7 +98,7 @@ seqprep (1.1-1) unstable; urgency=medium
* Initial Upload to Debian
Closes: #778838
-
+
-- Andreas Tille <tille at debian.org> Fri, 20 Feb 2015 13:49:15 +0100
seqprep (1.1-0biolinux1) trusty; urgency=medium
=====================================
debian/compat deleted
=====================================
@@ -1 +0,0 @@
-11
=====================================
debian/control
=====================================
@@ -4,12 +4,11 @@ Uploaders: Tim Booth <tbooth at ceh.ac.uk>,
Andreas Tille <tille at debian.org>
Section: science
Priority: optional
-Build-Depends: debhelper (>= 11~),
- python,
- python-markdown,
- ronn,
+Build-Depends: debhelper-compat (= 12),
+ python3,
+ markdown,
zlib1g-dev
-Standards-Version: 4.1.4
+Standards-Version: 4.4.1
Vcs-Browser: https://salsa.debian.org/med-team/seqprep
Vcs-Git: https://salsa.debian.org/med-team/seqprep.git
Homepage: http://seqanswers.com/wiki/SeqPrep
@@ -33,7 +32,6 @@ Description: stripping adaptors and/or merging paired reads of DNA sequences wit
Package: seqprep-data
Architecture: all
-Priority: optional
Depends: ${misc:Depends},
seqprep
Description: example data set for seqprep - only used for testing
=====================================
debian/patches/2to3.patch
=====================================
@@ -0,0 +1,116 @@
+Description: Use 2to3 to port from Python2 to Python3
+Bug-Debian: https://bugs.debian.org/938467
+Author: Andreas Tille <tille at debian.org>
+Last-Update: Tue, 17 Dec 2019 11:53:25 +0100
+
+--- a/Test/RUNTEST.sh
++++ b/Test/RUNTEST.sh
+@@ -21,8 +21,8 @@
+ -E ./info/alignments_trimmed.txt.gz
+
+ prog=gzcat
+-$prog ./out/pe_bad_contam_trimmed_1.fastq.gz | python seqlens.py > ./info/pe_bad_contam_trimmed_1.lenhist.txt
+-$prog ./out/pe_bad_contam_trimmed_2.fastq.gz | python seqlens.py > ./info/pe_bad_contam_trimmed_2.lenhist.txt
+-$prog ./out/pe_bad_contam_merged_1.fastq.gz | python seqlens.py > ./info/pe_bad_contam_merged_1.lenhist.txt
+-$prog ./out/pe_bad_contam_merged_2.fastq.gz | python seqlens.py > ./info/pe_bad_contam_merged_2.lenhist.txt
+-$prog ./out/pe_bad_contam_merged_s.fastq.gz | python seqlens.py > ./info/pe_bad_contam_merged_s.lenhist.txt
++$prog ./out/pe_bad_contam_trimmed_1.fastq.gz | python3 seqlens.py > ./info/pe_bad_contam_trimmed_1.lenhist.txt
++$prog ./out/pe_bad_contam_trimmed_2.fastq.gz | python3 seqlens.py > ./info/pe_bad_contam_trimmed_2.lenhist.txt
++$prog ./out/pe_bad_contam_merged_1.fastq.gz | python3 seqlens.py > ./info/pe_bad_contam_merged_1.lenhist.txt
++$prog ./out/pe_bad_contam_merged_2.fastq.gz | python3 seqlens.py > ./info/pe_bad_contam_merged_2.lenhist.txt
++$prog ./out/pe_bad_contam_merged_s.fastq.gz | python3 seqlens.py > ./info/pe_bad_contam_merged_s.lenhist.txt
+--- a/Test/SimTest/RUNTEST.sh
++++ b/Test/SimTest/RUNTEST.sh
+@@ -44,8 +44,8 @@ do
+ rm simSeq_trimmed_2.fq
+ gunzip simSeq_trimmed_1.fq.gz
+ gunzip simSeq_trimmed_2.fq.gz
+- python ./simseq_trimmed_error_check.py simSeq10k_1.fq simSeq10k_2.fq simSeq_trimmed_1.fq simSeq_trimmed_2.fq > trimmed_M${m}_N${n}_X${x}_Z${z}_Q${q}_m${m2}_n${n2}.txt
+- python ../seqlens.py < simSeq_trimmed_1.fq > lendist_M${m}_N${n}_X${x}_Z${z}_Q${q}_m${m2}_n${n2}.txt
++ python3 ./simseq_trimmed_error_check.py simSeq10k_1.fq simSeq10k_2.fq simSeq_trimmed_1.fq simSeq_trimmed_2.fq > trimmed_M${m}_N${n}_X${x}_Z${z}_Q${q}_m${m2}_n${n2}.txt
++ python3 ../seqlens.py < simSeq_trimmed_1.fq > lendist_M${m}_N${n}_X${x}_Z${z}_Q${q}_m${m2}_n${n2}.txt
+ done
+ done
+ done
+@@ -60,8 +60,8 @@ done
+ #ea-utils-read-only/clipper/fastq-clipper simSeq10k_mcf_1.fq
+ #ea-utils-read-only/clipper/fastq-clipper -o simSeq10k_mcf_1.fq simSeq10k_1.fq AGATCGGAAGAGCGGTTCAG
+ #ea-utils-read-only/clipper/fastq-clipper -o simSeq10k_mcf_2.fq simSeq10k_2.fq AGATCGGAAGAGCGTCGTGT
+-#python ./simseq_trimmed_error_check.py simSeq10k_1.fq simSeq10k_2.fq simSeq10k_mcf_1.fq simSeq10k_mcf_2.fq >trimmed_M99_N99_X99_Z99_Q99_m99_n99.txt
++#python3 ./simseq_trimmed_error_check.py simSeq10k_1.fq simSeq10k_2.fq simSeq10k_mcf_1.fq simSeq10k_mcf_2.fq >trimmed_M99_N99_X99_Z99_Q99_m99_n99.txt
+
+
+-python sens_vs_spec.py
++python3 sens_vs_spec.py
+ #open result.html
+--- a/Test/SimTest/sens_vs_spec.py
++++ b/Test/SimTest/sens_vs_spec.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/python3
+ from glob import iglob as glob
+ from operator import itemgetter
+ flist = glob("trimmed_*.txt")
+@@ -30,9 +30,9 @@ outf = open('spec_sorted_results.txt','
+ colnames = []
+ rowdata = []
+ outf.write("#Fname\tSpecificity\tSensitivity\n")
+-total = len(specdict.items())
++total = len(list(specdict.items()))
+ num = 1
+-for (fname,spec) in sorted(specdict.items(),key=itemgetter(1),reverse=True):
++for (fname,spec) in sorted(list(specdict.items()),key=itemgetter(1),reverse=True):
+ colnames.append("data.addColumn('number','%s');"%(fname))
+ sens = sensdict[fname]
+ rowarr = [str(sens)]
+--- a/Test/SimTest/simseq_trimmed_error_check.py
++++ b/Test/SimTest/simseq_trimmed_error_check.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/python3
+
+ from collections import defaultdict
+ from operator import itemgetter
+@@ -52,7 +52,7 @@ max_len = 100
+ #sens = tp/(tp+fn)
+ #spec = tn/(tn+fp)
+
+-for (pid,truelen) in sizes.iteritems():
++for (pid,truelen) in sizes.items():
+ outlen = outsizes[pid]
+ ## if truelen >= min_len:
+ if truelen < outlen:
+@@ -84,10 +84,10 @@ for (pid,truelen) in sizes.iteritems():
+
+
+
+-print("Adapter Trimmed:%d" % (TP) )
+-print("Adapter Missed:%d" % (FN) )
+-print("Adapter Total:%d" % (FN+TP) )
+-print("Genomic Bases Trimmed:%d" % (FP) )
+-print("Genomic Bases Total:%d" % (FP + TN) )
+-print("Adapter Trimming Sensitivity: %f" % (float(TP)/(float(TP)+float(FN))) )
+-print("Adapter Trimming Specificity: %f" % (float(TN)/(float(TN)+float(FP))) )
++print(("Adapter Trimmed:%d" % (TP) ))
++print(("Adapter Missed:%d" % (FN) ))
++print(("Adapter Total:%d" % (FN+TP) ))
++print(("Genomic Bases Trimmed:%d" % (FP) ))
++print(("Genomic Bases Total:%d" % (FP + TN) ))
++print(("Adapter Trimming Sensitivity: %f" % (float(TP)/(float(TP)+float(FN))) ))
++print(("Adapter Trimming Specificity: %f" % (float(TN)/(float(TN)+float(FP))) ))
+--- a/Test/seqlens.py
++++ b/Test/seqlens.py
+@@ -1,4 +1,4 @@
+-#!/usr/bin/env python
++#!/usr/bin/python3
+
+ from collections import defaultdict
+ from operator import itemgetter
+@@ -19,5 +19,5 @@ for line in stdin:
+
+
+
+-for (length,count) in sorted(seqlens.iteritems(), key=itemgetter(0),reverse=True):
+- print("%d\t%d"%(length,count))
++for (length,count) in sorted(iter(seqlens.items()), key=itemgetter(0),reverse=True):
++ print(("%d\t%d"%(length,count)))
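A note on the 2to3 output above: the conversion is behaviour-preserving but more verbose than Python 3 needs. The sketch below suggests that `sorted()` accepts a dict view directly, so the `iter(...)` and `list(...)` wrappers change nothing. The helper names `length_histogram` and `histogram_lines` are hypothetical. Only the last lines of seqlens.py are visible in this diff, so the counting step here is an assumption about how `seqlens` is filled.

```python
from collections import defaultdict
from operator import itemgetter


def length_histogram(lengths):
    """Count how often each read length occurs (assumed shape of `seqlens`)."""
    counts = defaultdict(int)
    for n in lengths:
        counts[n] += 1
    return counts


def histogram_lines(counts):
    """Format the histogram longest-first, as the final loop of seqlens.py does."""
    ordered = sorted(counts.items(), key=itemgetter(0), reverse=True)
    return ["%d\t%d" % (length, count) for (length, count) in ordered]


sample = length_histogram([30, 42, 42, 30, 42, 100])

# The form emitted by 2to3 and the plain Python 3 form give the same list.
verbose = sorted(iter(sample.items()), key=itemgetter(0), reverse=True)
plain = sorted(sample.items(), key=itemgetter(0), reverse=True)

for line in histogram_lines(sample):
    print(line)
```

The doubled parentheses in `print(("..."))` are likewise harmless: the inner pair only groups a single expression.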
=====================================
debian/patches/series
=====================================
@@ -1,3 +1,4 @@
fix_unused_variable_errors.patch
hardening.patch
replace-float-with-double.patch
+2to3.patch
=====================================
debian/rules
=====================================
@@ -10,16 +10,15 @@ export DEB_BUILD_MAINT_OPTIONS = hardening=+all
override_dh_auto_build:
dh_auto_build
cp SeqPrep seqprep
- TZ=UTC ronn -r --manual=seqprep --organization='Cancer Therapeutics Innovation Group' debian/seqprep.1.ronn
- markdown_py -f README.html README.md
+ markdown README.md > README.html
override_dh_clean:
dh_clean
rm -f seqprep
- rm -f debian/*.1
rm -f README.html
override_dh_auto_test:
+ifeq (,$(filter nocheck,$(DEB_BUILD_OPTIONS)))
# This checks that the tests run and produce byte-identical results.
cd Test && mkdir -p out info && \
bash -xc 'gzcat(){ zcat "$$@" ; } ; . RUNTEST.sh'
@@ -27,6 +26,7 @@ override_dh_auto_test:
# remove output dirs right after testing to make sure the files
# will not be included in the data package
rm -rf Test/info Test/out
+endif
override_dh_install-indep:
dh_install
=====================================
debian/seqprep.1
=====================================
@@ -0,0 +1,139 @@
+.\" generated with Ronn-NG/v0.8.0
+.\" http://github.com/apjanke/ronn-ng/tree/0.8.0
+.TH "SEQPREP" "1" "November 2017" "Cancer Therapeutics Innovation Group" "seqprep"
+.SH "NAME"
+\fBseqprep\fR \- merge paired end Illumina reads
+.P
+SeqPrep is a program to merge paired end Illumina reads that are overlapping into a single longer read\. It may also just be used for its adapter trimming feature without doing any paired end overlap\.
+.SH "USAGE"
+\fBseqprep\fR \fIrequired args\fR [options]
+.SH "Required Arguments:"
+.nf
+\-f <first read input fastq filename>
+\-r <second read input fastq filename>
+\-1 <first read output fastq filename>
+\-2 <second read output fastq filename>
+.fi
+.SH "General Arguments (Optional):"
+.nf
+\-3 <first read discarded fastq filename>
+\-4 <second read discarded fastq filename>
+\-h Display this help message and exit (also works with no args)
+\-6 Input sequence is in phred+64 rather than phred+33 format, the output will still be phred+33
+\-q <Quality score cutoff for mismatches to be counted in overlap; default = 13>
+\-L <Minimum length of a trimmed or merged read to print it; default = 30>
+.fi
+.SH "Arguments for Adapter/Primer Trimming (Optional):"
+.nf
+\-A <forward read primer/adapter sequence to trim as it would appear at the end of a read (recommend about 20bp of this)
+ (should validate by grepping a file); default (genomic non\-multiplexed adapter1) = AGATCGGAAGAGCGGTTCAG>
+\-B <reverse read primer/adapter sequence to trim as it would appear at the end of a read (recommend about 20bp of this)
+ (should validate by grepping a file); default (genomic non\-multiplexed adapter2) = AGATCGGAAGAGCGTCGTGT>
+\-O <minimum overall base pair overlap with adapter sequence to trim; default = 10>
+\-M <maximum fraction of good quality mismatching bases for primer/adapter overlap; default = 0\.020000>
+\-N <minimum fraction of matching bases for primer/adapter overlap; default = 0\.870000>
+\-b <adapter alignment band\-width; default = 50>
+\-Q <adapter alignment gap\-open; default = 8>
+\-t <adapter alignment gap\-extension; default = 2>
+\-e <adapter alignment gap\-end; default = 2>
+\-Z <adapter alignment minimum local alignment score cutoff [roughly (2*num_hits) \- (num_gaps*gap_open) \- (num_gaps*gap_close) \- (gap_len*gap_extend) \- (2*num_mismatches)]; default = 26>
+\-w <read alignment band\-width; default = 50>
+\-W <read alignment gap\-open; default = 26>
+\-p <read alignment gap\-extension; default = 9>
+\-P <read alignment gap\-end; default = 5>
+\-X <read alignment maximum fraction gap cutoff; default = 0\.125000>
+.fi
+.SH "Optional Arguments for Merging:"
+.nf
+\-y <maximum quality score in output ((phred 33) default = \']\' )>
+\-g <print overhang when adapters are present and stripped (use this if reads are different length)> \- UNIMPLEMENTED
+\-s <perform merging and output the merged reads to this file>
+\-E <write pretty alignments to this file for visual Examination>
+\-x <max number of pretty alignments to write (if \-E provided); default = 10000>
+\-o <minimum overall base pair overlap to merge two reads; default = 15>
+\-m <maximum fraction of good quality mismatching bases to overlap reads; default = 0\.020000>
+\-n <minimum fraction of matching bases to overlap reads; default = 0\.900000>
+.fi
+.P
+\fBNOTE 1\fR: The output is always gziped compressed\.
+.P
+\fBNOTE 2\fR: If the quality strings in the output contain characters less than asciii 33 on an ascii table (they look like lines from a binary file), try running again with or without the \-6 option\.
+.SH "SETUP"
+When an adapter sequence is present, that means that the two reads must overlap (in most cases) so they are forcefully merged\. When reads do not have adapter sequence they must be treated with care when doing the merging, so a much more specific approach is taken\. The default parameters were chosen with specificity in mind, so that they could be ran on libraries where very few reads are expected to overlap\. It is always safest though to save the overlapping procedure for libraries where you have some prior knowledge that a significant portion of the reads will have some overlap\.
+.P
+Before running SeqPrep make sure to check that the program\'s defaults are indeed the adapters you are looking for\. Try copying the default forward adapter from this file and grep it against your reads doing a word count, also try the same with the reverse adapter with grep\. You should see some hits\. You can also try using (and validating with grep) \fB\-A GATCGGAAGAGCACACG \-B AGATCGGAAGAGCGTCGT\fR as parameters\. To find a list of Illumina adapter sequences you should write to Illumina tech support TechSupport at illumina\.com (they do not like people to share the list of sequences outside of their institution)\.
+.P
+Choose about 20bp of an adapter sequence where:
+.IP "1." 4
+You see the most hits with grep\.
+.IP "2." 4
+When you run a command like \fBzcat Lane2_0d_2\.fastq\.gz | head \-n 1000000 |grep "INSERT ADAPTER HERE" | head\fR you see the adapter sequence show up at the beginning of a few reads\. Also the \-A and \-B arguments should be as they show up in your data, SeqPrep searches directly for these sequences without doing reverse complementing
+.IP "3." 4
+Check the forward and reverse and make sure that you have roughly the same number of hits via a command to count hits like: \fBzcat Lane2_0d_2\.fastq\.gz | head \-n 1000000 |grep "INSERT ADAPTER HERE" | wc \-l\fR As an additional precaution, the program checks for good read overlap once the adapters are trimmed\. If the adapter is trimmed and the reads do not have a reasonable adapter overlap (you can modify this setting with \-X) then the reads aren\'t printed or merged\.
+.IP "" 0
+.P
+See Test/README\.md for some information on testing out other parameters\. Test/SimTest has some particularly cool test data which you can use to check out sensitivity and specificity of adapter trimming using different parameters\. The results of the test are displayed in results\.html which uses the google charts API so that the points are interactive and you can easily determine which settings made which points\.
+.P
+LOW COMPLEXITY ALIGNMENTS
+.P
+My current strategy to deal with ambiguous alignments to low complexity regions is as follows:
+.P
+I have some minimum requirements for an overlap to be accepted After the first one is found (ie the one with the maximal overlap between the two sequences), if low complexity filtering is enabled, I keep searching if a second viable hit is found, I give up and say that it is not a good idea to merge the two reads\. I check for ambiguous alignments in read overlapping, but not in adapter trimming where the most conservative thing to do is strip the most aggressively aligned adapter (The closest to the beginning of the read)\.
+.P
+To accept an alignment I allow some fraction of mismatches (currently the floor of 0\.06 of the alignment length for adapter and 0\.02 of the alignment length for two reads)\. That means that in most cases for overlapping two reads I don\'t allow any mismatches between adjacent reads, but if there is a 50bp potential overlap with 1 mismatch over q20 for example, I allow it\. Anything below 50 needs to be perfect other than with low quality bases\.
+.P
+Since we ignore poor quality bases, we could have the case where a single real match followed by a long string of poor quality bases to the end of the read would result in a called overlap\. That seemed like a bad idea\. To get around that I require that at least some fraction of the overlapping length be matches\. Right now I have that parameter set at 0\.7 for adapter trimming and 0\.75 for read merging, so for a case where only the last 10 bases overlap, at least 7 of those must be matches\.
+.P
+Since doing that many floating point multiplications seems like a bad idea, I just have a table that pre\-calculates all of those min matches and max mismatch numbers for every overlap length up to the maximum allowed read length\.
+.P
+Finally I have a parameter you can set which specifies a minimum resulting read length after adapter trimming and/or merging so that ultra short trimmed reads aren\'t output\.
+.P
+Following are results from hand testing the three main merge cases\. Now to generate similar output automatically just supply the \-E readable_alignment\.txt\.gz argument to the program (the output is gzip compressed into the file name specified)\.
+.SH "Sequence Merge No Adapter Present:"
+.nf
+QUER: NCCTGCTACTACCACCCGTTCCGTGCCTGGAGCCTGCATGTTGGGCAGATACGTGCTGCCACAGCCTGTCTCTGCTGGTGCCTGGGCCTC
+ || |||||||||||| || | |||||||||||||||||||||||||||||||||
+SUBJ: TGTGTGTTGGGCAGATGCGGGGGGCCACAGCCTGTCTCTGCTGGTGCCTGGGCCTCTCCTGTTCCTTGCCCACGTCTCCGTCTCCTGTTG
+RESU: NCCTGCTACTACCACCCGTTCCGTGCCTGGAGCCTGCATGTTGGGCAGATACGTGCTGCCACAGCCTGTCTCTGCTGGTGCCTGGGCCTCTCCTGTTCCTTGCCCACGTCTCCGTCTCCTGTTG
+Quality Merge:
+QUER: !223387787@@@CCC22C@@@@@@@@@@@@@@@@@@@@@@@@@@@@?@@89887:::::\.2125@@:@@:::::@@@@@<<::8@@@@@
+SUBJ: !!!!!!!!!!!!!!!!!!!!!!!!!!!@@@8DEGE@EDDBB2<BBE@EHBFE@EE>D8@DBE>BFIDH@IIEEIIBEIEIIGBIIGIFII
+RESU: !223387787@@@CCC22C@@@@@@@@@@@@@@@@@@@@@@@@@@@@?@@89887:::::\.QPQLSSSSSSSSSSQSSSSSSSSSSSSSSD8@DBE>BFIDH@IIEEIIBEIEIIGBIIGIFII
+.fi
+.SH "Sequence Merge Adapter Present, Easy Peezy Mode (same lengths):"
+.nf
+SUBJ: NGATATGATTCCCAATCTAAGCAAACTGTCATGGAAAC
+ |||||||||||||||||||||||||||||||||||||
+QUER: GGATATGATTCCCAATCTAAGCAAACTGTCATGGAAAC
+RESU: GGATATGATTCCCAATCTAAGCAAACTGTCATGGAAAC
+Quality Merge:
+SUBJ: !\.\-/\.53444@@@@@@@@@@@@@@@@@@@@@@@@@@@@
+QUER: IHGIIIDIIHGEHIGHIFHIFIIIIHIIIIIIIIIHII
+RESU: ISSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
+.fi
+.SH "Sequence merge Adapter but lengths differ:"
+.nf
+SUBJ: AATTGATGGGTGCCCACCCACGGGCCAGACAAAATCATCTGGCAAGCTGGATGCAGCCTACAAGCTGTAAGATTGGA
+ |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
+QUER: AATTGATGGGTGCCCACCCACGGGCCAGACAAAATCATCTGGCAAGCTGGATGCAGCCTACAAGCTGTA
+RESU: AATTGATGGGTGCCCACCCACGGGCCAGACAAAATCATCTGGCAAGCTGGATGCAGCCTACAAGCTGTAAGATTGGA
+Quality Merge:
+SUBJ: =DEC??DDBD?4B=BEE@@@GB>GEE:DE8=2::6GDGBGEGDD<=;A?=AGGGG=5\.=<BD?B?DDB>B4725:E>
+QUER: GDDBBFBGGFBHFIEDGGGBDGGG<GGDDG@IIIEIHDIHGIIIDDGDGDFDIFIHGIDEGGGDIIIGI
+RESU: SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSB4725:E>
+.fi
+.P
+If interested there is a website where I post my tests of different parameters for SeqPrep on simulated data\. There are also a few comparison stats of different programs to trim adapters\. The website can be accessed here: \fBhttp://hgwdev\.cse\.ucsc\.edu/~jstjohn/seqprep/\fR where the pages are named result(date)\.html\. The latest ones (as of when I have gotten around to edit this) can be found here:
+.P
+\fBhttp://hgwdev\.cse\.ucsc\.edu/~jstjohn/seqprep/results2011\-09\-15\.html\fR
+.P
+Note that although my program is more sensitive and specific than fastq\-clipper, I optomized my default parameters based on this test\. Results on real data may be different, although I believe my method takes advantage of a more realistic adapter model than other software does\. For example, even though my program requires 10bp of adapter to be present at the end of a read to trim it off (by default) there is a backup adapter trimming function that trimms based on strong and unambiguous read overlap\. Because of this my program can trim the adapter even if it is only present in the last few bases of the read\.
+.P
+Also note that fastq\-mcf appears to do a little better at sensitivity (0\.992 vs 0\.985) at a very large cost to specificity (0\.497 vs 0\.994)\.
+.SH "AUTHOR"
+.IP "\[ci]" 4
+All content by John St\. John
+.IP "\[ci]" 4
+Manpage edited for Debian by Tim Booth
+.IP "" 0
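The LOW COMPLEXITY ALIGNMENTS text in the manpage above describes a lookup table that pre-calculates the minimum match and maximum mismatch counts for every overlap length. A minimal sketch of that idea follows. It is not taken from the SeqPrep source. The function name `build_overlap_table` and the maximum length of 100 are hypothetical. The fractions are the ones quoted in the prose (0.06 and 0.7 for adapters, 0.02 and 0.75 for reads), which differ from the option defaults listed earlier in the same page. Rounding the minimum match count upward is an assumption, since the manpage only states that the mismatch limit uses the floor.

```python
def build_overlap_table(max_len, max_mismatch_pct, min_match_pct):
    """Precompute per-overlap-length limits using integer percent arithmetic.

    Returns two lists indexed by overlap length n:
      max_mismatches[n] = floor(n * max_mismatch_pct / 100)
      min_matches[n]    = ceil(n * min_match_pct / 100)   (assumed rounding)
    """
    max_mismatches = [(n * max_mismatch_pct) // 100 for n in range(max_len + 1)]
    # Ceiling division on integers: negate, floor-divide, negate again.
    min_matches = [-(-(n * min_match_pct) // 100) for n in range(max_len + 1)]
    return max_mismatches, min_matches


# Read merging: 2% mismatches allowed, 75% of the overlap must match.
read_mismatch, read_match = build_overlap_table(100, 2, 75)
# Adapter trimming: 6% mismatches allowed, 70% of the overlap must match.
adapter_mismatch, adapter_match = build_overlap_table(100, 6, 70)

print(read_mismatch[49], read_mismatch[50])  # limits just below and at 50 bp
print(adapter_match[10])                     # matches required in a 10 bp overlap
```

With these inputs a 49 bp read overlap tolerates no mismatch and a 50 bp overlap tolerates one, which agrees with the manpage statement that anything below 50 needs to be perfect. A 10 bp adapter overlap requires 7 matches, as in the manpage example.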
+
=====================================
debian/seqprep.1.ronn deleted
=====================================
@@ -1,174 +0,0 @@
-seqprep(1) -- merge paired end Illumina reads
-===========
-
-SeqPrep is a program to merge paired end Illumina reads that are overlapping into
-a single longer read. It may also just be used for its adapter trimming feature
-without doing any paired end overlap.
-
-## USAGE
-
- `seqprep` <required args> [options]
-
-## Required Arguments:
-
- -f <first read input fastq filename>
- -r <second read input fastq filename>
- -1 <first read output fastq filename>
- -2 <second read output fastq filename>
-
-## General Arguments (Optional):
-
- -3 <first read discarded fastq filename>
- -4 <second read discarded fastq filename>
- -h Display this help message and exit (also works with no args)
- -6 Input sequence is in phred+64 rather than phred+33 format, the output will still be phred+33
- -q <Quality score cutoff for mismatches to be counted in overlap; default = 13>
- -L <Minimum length of a trimmed or merged read to print it; default = 30>
-
-## Arguments for Adapter/Primer Trimming (Optional):
-
- -A <forward read primer/adapter sequence to trim as it would appear at the end of a read (recommend about 20bp of this)
- (should validate by grepping a file); default (genomic non-multiplexed adapter1) = AGATCGGAAGAGCGGTTCAG>
- -B <reverse read primer/adapter sequence to trim as it would appear at the end of a read (recommend about 20bp of this)
- (should validate by grepping a file); default (genomic non-multiplexed adapter2) = AGATCGGAAGAGCGTCGTGT>
- -O <minimum overall base pair overlap with adapter sequence to trim; default = 10>
- -M <maximum fraction of good quality mismatching bases for primer/adapter overlap; default = 0.020000>
- -N <minimum fraction of matching bases for primer/adapter overlap; default = 0.870000>
- -b <adapter alignment band-width; default = 50>
- -Q <adapter alignment gap-open; default = 8>
- -t <adapter alignment gap-extension; default = 2>
- -e <adapter alignment gap-end; default = 2>
- -Z <adapter alignment minimum local alignment score cutoff [roughly (2*num_hits) - (num_gaps*gap_open) - (num_gaps*gap_close) - (gap_len*gap_extend) - (2*num_mismatches)]; default = 26>
- -w <read alignment band-width; default = 50>
- -W <read alignment gap-open; default = 26>
- -p <read alignment gap-extension; default = 9>
- -P <read alignment gap-end; default = 5>
- -X <read alignment maximum fraction gap cutoff; default = 0.125000>
-
-## Optional Arguments for Merging:
-
- -y <maximum quality score in output ((phred 33) default = ']' )>
- -g <print overhang when adapters are present and stripped (use this if reads are different length)> - UNIMPLEMENTED
- -s <perform merging and output the merged reads to this file>
- -E <write pretty alignments to this file for visual Examination>
- -x <max number of pretty alignments to write (if -E provided); default = 10000>
- -o <minimum overall base pair overlap to merge two reads; default = 15>
- -m <maximum fraction of good quality mismatching bases to overlap reads; default = 0.020000>
- -n <minimum fraction of matching bases to overlap reads; default = 0.900000>
-
-`NOTE 1`: The output is always gziped compressed.
-
-`NOTE 2`: If the quality strings in the output contain characters less than asciii
-33 on an ascii table (they look like lines from a binary file), try running again
-with or without the -6 option.
-
-## SETUP
-
-When an adapter sequence is present, that means that the two reads must overlap
-(in most cases) so they are forcefully merged. When reads do not have adapter
-sequence they must be treated with care when doing the merging, so a much more
-specific approach is taken. The default parameters were chosen with specificity
-in mind, so that they could be ran on libraries where very few reads are
-expected to overlap. It is always safest though to save the overlapping
-procedure for libraries where you have some prior knowledge that a significant
-portion of the reads will have some overlap.
-
-Before running SeqPrep make sure to check that the program's defaults are indeed
-the adapters you are looking for. Try copying the default forward adapter from
-this file and grep it against your reads doing a word count, also try the same
-with the reverse adapter with grep. You should see some hits. You can also try
-using (and validating with grep) `-A GATCGGAAGAGCACACG -B AGATCGGAAGAGCGTCGT` as
-parameters. To find a list of Illumina adapter sequences you should write to
-Illumina tech support TechSupport at illumina.com (they do not like people to share
-the list of sequences outside of their institution).
-
-Choose about 20bp of an adapter sequence where:
-
-1. You see the most hits with grep.
-1. When you run a command like
-`zcat Lane2_0d_2.fastq.gz | head -n 1000000 |grep "INSERT ADAPTER HERE" | head`
-you see the adapter sequence show up at the beginning of a few reads. Also the -A and -B
-arguments should be as they show up in your data, SeqPrep searches directly for
-these sequences without doing reverse complementing
-1. Check the forward and
-reverse and make sure that you have roughly the same number of hits via a
-command to count hits like:
- `zcat Lane2_0d_2.fastq.gz | head -n 1000000 |grep "INSERT ADAPTER HERE" | wc -l`
-As an additional precaution, the program checks
-for good read overlap once the adapters are trimmed. If the adapter is trimmed
-and the reads do not have a reasonable adapter overlap (you can modify this
-setting with -X) then the reads aren't printed or merged.
-
-See Test/README.md for some information on testing out other parameters.
-Test/SimTest has some particularly cool test data which you can use to check out
-sensitivity and specificity of adapter trimming using different parameters. The
-results of the test are displayed in results.html which uses the google charts
-API so that the points are interactive and you can easily determine which
-settings made which points.
-
-LOW COMPLEXITY ALIGNMENTS
-
-My current strategy to deal with ambiguous alignments to low complexity regions is as follows:
-
-I have some minimum requirements for an overlap to be accepted
-After the first one is found (ie the one with the maximal overlap between the two sequences), if low complexity filtering is enabled, I keep searching
-if a second viable hit is found, I give up and say that it is not a good idea to merge the two reads.
-I check for ambiguous alignments in read overlapping, but not in adapter trimming where the most conservative thing to do is strip the most aggressively aligned adapter (The closest to the beginning of the read).
-
-To accept an alignment I allow some fraction of mismatches (currently the floor of 0.06 of the alignment length for adapter and 0.02 of the alignment length for two reads). That means that in most cases for overlapping two reads I don't allow any mismatches between adjacent reads, but if there is a 50bp potential overlap with 1 mismatch over q20 for example, I allow it. Anything below 50 needs to be perfect other than with low quality bases.
-
-Since we ignore poor quality bases, we could have the case where a single real match followed by a long string of poor quality bases to the end of the read would result in a called overlap. That seemed like a bad idea. To get around that I require that at least some fraction of the overlapping length be matches. Right now I have that parameter set at 0.7 for adapter trimming and 0.75 for read merging, so for a case where only the last 10 bases overlap, at least 7 of those must be matches.
-
-Since doing that many floating point multiplications seems like a bad idea, I just have a table that pre-calculates all of those min matches and max mismatch numbers for every overlap length up to the maximum allowed read length.
-
-Finally I have a parameter you can set which specifies a minimum resulting read length after adapter trimming and/or merging so that ultra short trimmed reads aren't output.
-
-Following are results from hand testing the three main merge cases. Now to generate similar output automatically just supply the -E readable_alignment.txt.gz argument to the program (the output is gzip compressed into the file name specified).
-
-## Sequence Merge No Adapter Present:
-
- QUER: NCCTGCTACTACCACCCGTTCCGTGCCTGGAGCCTGCATGTTGGGCAGATACGTGCTGCCACAGCCTGTCTCTGCTGGTGCCTGGGCCTC
- || |||||||||||| || | |||||||||||||||||||||||||||||||||
- SUBJ: TGTGTGTTGGGCAGATGCGGGGGGCCACAGCCTGTCTCTGCTGGTGCCTGGGCCTCTCCTGTTCCTTGCCCACGTCTCCGTCTCCTGTTG
- RESU: NCCTGCTACTACCACCCGTTCCGTGCCTGGAGCCTGCATGTTGGGCAGATACGTGCTGCCACAGCCTGTCTCTGCTGGTGCCTGGGCCTCTCCTGTTCCTTGCCCACGTCTCCGTCTCCTGTTG
- Quality Merge:
- QUER: !223387787@@@CCC22C@@@@@@@@@@@@@@@@@@@@@@@@@@@@?@@89887:::::.2125@@:@@:::::@@@@@<<::8@@@@@
- SUBJ: !!!!!!!!!!!!!!!!!!!!!!!!!!!@@@8DEGE@EDDBB2<BBE@EHBFE@EE>D8@DBE>BFIDH@IIEEIIBEIEIIGBIIGIFII
- RESU: !223387787@@@CCC22C@@@@@@@@@@@@@@@@@@@@@@@@@@@@?@@89887:::::.QPQLSSSSSSSSSSQSSSSSSSSSSSSSSD8@DBE>BFIDH@IIEEIIBEIEIIGBIIGIFII
-
-## Sequence Merge Adapter Present, Easy Peezy Mode (same lengths):
-
- SUBJ: NGATATGATTCCCAATCTAAGCAAACTGTCATGGAAAC
- |||||||||||||||||||||||||||||||||||||
- QUER: GGATATGATTCCCAATCTAAGCAAACTGTCATGGAAAC
- RESU: GGATATGATTCCCAATCTAAGCAAACTGTCATGGAAAC
- Quality Merge:
- SUBJ: !.-/.53444@@@@@@@@@@@@@@@@@@@@@@@@@@@@
- QUER: IHGIIIDIIHGEHIGHIFHIFIIIIHIIIIIIIIIHII
- RESU: ISSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSS
-
-
-## Sequence merge Adapter but lengths differ:
-
- SUBJ: AATTGATGGGTGCCCACCCACGGGCCAGACAAAATCATCTGGCAAGCTGGATGCAGCCTACAAGCTGTAAGATTGGA
- |||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
- QUER: AATTGATGGGTGCCCACCCACGGGCCAGACAAAATCATCTGGCAAGCTGGATGCAGCCTACAAGCTGTA
- RESU: AATTGATGGGTGCCCACCCACGGGCCAGACAAAATCATCTGGCAAGCTGGATGCAGCCTACAAGCTGTAAGATTGGA
- Quality Merge:
- SUBJ: =DEC??DDBD?4B=BEE@@@GB>GEE:DE8=2::6GDGBGEGDD<=;A?=AGGGG=5.=<BD?B?DDB>B4725:E>
- QUER: GDDBBFBGGFBHFIEDGGGBDGGG<GGDDG@IIIEIHDIHGIIIDDGDGDFDIFIHGIDEGGGDIIIGI
- RESU: SSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSSB4725:E>
-
-If interested there is a website where I post my tests of different parameters for SeqPrep on simulated data. There are also a few comparison stats of different programs to trim adapters. The website can be accessed here: `http://hgwdev.cse.ucsc.edu/~jstjohn/seqprep/`
-where the pages are named result(date).html. The latest ones (as of when I have gotten around to edit this) can be found here:
-
-`http://hgwdev.cse.ucsc.edu/~jstjohn/seqprep/results2011-09-15.html`
-
-Note that although my program is more sensitive and specific than fastq-clipper, I optomized my default parameters based on this test. Results on real data may be different, although I believe my method takes advantage of a more realistic adapter model than other software does. For example, even though my program requires 10bp of adapter to be present at the end of a read to trim it off (by default) there is a backup adapter trimming function that trimms based on strong and unambiguous read overlap. Because of this my program can trim the adapter even if it is only present in the last few bases of the read.
-
-Also note that fastq-mcf appears to do a little better at sensitivity (0.992 vs 0.985) at a very large cost to specificity (0.497 vs 0.994).
-
-## AUTHOR
-
-* All content by John St. John
-* Manpage edited for Debian by Tim Booth
=====================================
debian/tests/run-unit-test
=====================================
@@ -1,11 +1,11 @@
#!/bin/sh -e
pkg=seqprep
-if [ "$ADTTMP" = "" ] ; then
- ADTTMP=`mktemp -d /tmp/${pkg}-test.XXXXXX`
- trap "rm -rf $ADTTMP" 0 INT QUIT ABRT PIPE TERM
+if [ "$AUTOPKGTEST_TMP" = "" ] ; then
+ AUTOPKGTEST_TMP=`mktemp -d /tmp/${pkg}-test.XXXXXX`
+ trap "rm -rf $AUTOPKGTEST_TMP" 0 INT QUIT ABRT PIPE TERM
fi
-cd $ADTTMP
+cd $AUTOPKGTEST_TMP
cp -a /usr/share/doc/${pkg}/examples/* .
# sed -i 's#../SeqPrep#/usr/bin/seqprep#' RUNTEST.sh
mkdir -p out info && \
=====================================
debian/upstream/metadata
=====================================
@@ -1,9 +1,12 @@
Registry:
- - Name: SciCrunch
- Entry: SCR_013004
- - Name: OMICtools
- Entry: OMICS_01092
- - Name: conda:bioconda
- Entry: seqprep
- - Name: bio.tools
- Entry: seqprep
+- Name: SciCrunch
+ Entry: SCR_013004
+- Name: OMICtools
+ Entry: OMICS_01092
+- Name: conda:bioconda
+ Entry: seqprep
+- Name: bio.tools
+ Entry: seqprep
+Bug-Database: https://github.com/jstjohn/SeqPrep/issues
+Repository: https://github.com/jstjohn/SeqPrep.git
+Repository-Browse: https://github.com/jstjohn/SeqPrep
View it on GitLab: https://salsa.debian.org/med-team/seqprep/compare/ead35b36209890332f8d3e8b9d3dfd1cb132f5f2...a3d5618eb1aefc6146529dd88cba9dae0403f113