[med-svn] [SCM] sga branch, master, created. dfe74633fb98a17bb0540ac23c12a907c0055cc6
jts
jared.simpson at gmail.com
Thu Nov 8 08:10:26 UTC 2012
The branch, master has been created
at dfe74633fb98a17bb0540ac23c12a907c0055cc6 (commit)
- Shortlog ------------------------------------------------------------
commit dfe74633fb98a17bb0540ac23c12a907c0055cc6
Merge: b4efb32 f89c103
Author: jts <jared.simpson at gmail.com>
Date: Wed Oct 17 01:09:40 2012 -0700
Merge pull request #30 from sjackman/patch-1
ld_set is static inline. Closes #29
commit f89c103addcae4b219a3ee3bc7a34b1bb5e9c7de
Author: Shaun Jackman <sjackman at gmail.com>
Date: Tue Oct 16 15:29:49 2012 -0700
ld_set is static inline. Closes #29
Fix a compiler error.
commit b4efb323ede8367e238671f3f15eaa1748b715bb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 26 19:40:44 2012 +0100
Added namespace to fix compile error on OSX
commit b5cac74877f0060c4779d90f98879537793fdfc3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 25 14:02:56 2012 +0100
v0.9.35
commit 8a631d89fe8aa6b5176baae66bbd635ea0cace42
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 25 13:40:27 2012 +0100
Implemented an upper limit on the number of edges we allow a vertex to have before giving up on using it in the assembly graph.
This helps limit the memory usage in pathological cases, like when the genome contains a lot of satellite DNA.
commit 2a103a5608700634587689d295c2624d9b2a997a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 24 16:46:06 2012 +0100
When building the FM-index using ropebwt, the lexicographic index is built using OpenMP if the compiler supports it.
commit 67bf3d44e35c6d8579f60b8dcd2ed6e4efcdc49b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 24 15:33:40 2012 +0100
github issue 14: sga index should exit gracefully when the input file is empty.
commit 768a4ac86b00b626cc8f785fded670c917a54bbf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 24 15:21:37 2012 +0100
github issue 26: Removed references to old --quality-scale parameter
commit 56d5c67810bb8995e9d2043fac1f4a006cd429e1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 24 15:18:51 2012 +0100
github issue 27: added --no-primer-check option to preprocess. Also, cleaned up help message.
commit ae5be395722289527f51a4238aaa298711334079
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 24 15:06:08 2012 +0100
github issue 25: implement writing orphaned pairs to a file during preprocess
commit fc3f17208516719d31d1f1b6c3dfc5ba8bdcfb93
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 7 15:45:47 2012 +0100
Fix divide-by-zero in sga-astat
commit a79558c6c61ddf08ce85d03750f724823c9829d1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 7 13:00:15 2012 +0100
Set the --mina option to abyss
commit 4c185d6adedcfde98a86c8b60d27f4a6a3fa1d74
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 7 10:04:35 2012 +0100
Change default min distance to -99 bases
commit 79c6c969ffbc445fb9e9b2fb89205894c2f03cb0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 7 10:03:52 2012 +0100
Add a dependency check to bam2de.pl and set the --mind parameter
commit f281663f9f0e3b651588b667a2ad8918d32891ad
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 24 09:22:31 2012 +0100
Fixed compilation warnings pointed out by Zhang Feng.
commit b43900af0e7beef354fb1157987ef3b61e69eecf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 24 09:15:20 2012 +0100
Whitespace changes
commit 9d84856d0991d8ff78dce1bf52b1152f1869db26
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 23 16:21:33 2012 +0100
v0.9.34
commit d19f4f62b1ebdbff7af23f626c378cad96eebf6c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 23 16:20:04 2012 +0100
Explicitly construct the .sai file when using ropebwt since you cannot get the lexicographic index from ropebwt when read lengths vary
commit ec2fa2717e7f8da4ef2684289621ed6b47d82c4b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 21 13:57:53 2012 +0100
v0.9.33
commit 9294bc6ff75d615db731d0a077b94977916979da
Merge: 7110f58 732d272
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 21 12:52:22 2012 +0100
Merge branch 'master' of github.com:jts/sga
commit 732d272288ef03a1d3d3877de1ed39b4c60affd6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 21 13:50:47 2012 +0100
Cleanup code and comments
commit 2b4a4cef3b5c887adc5de023c361a04a64ba3f74
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 21 13:47:01 2012 +0100
More aggressive cycle detection/removal in --strict mode of sga scaffolder
commit 7357f0b39af31471138b85bf6ba7a41edca71994
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 21 12:05:46 2012 +0100
When removing transitive edges from the scaffold graph, check the orientation of contigs in the layout
commit 7110f5833f5d314df2e7f4aae4211ae5e57978e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 20 20:11:56 2012 +0100
v0.9.32
commit f11c2191d0c93573a58e62a68a1b59ee0460f18c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 16 10:11:31 2012 +0100
Reverted to global alignment for haplotype-haplotype alignments
commit b0248d450e7cebde6118a3e678b8b76fb017da1e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 15 15:38:31 2012 +0100
Removed print
commit 2c877c5dc02b19c2b037570100b8b08c93146f6e
Merge: d783992 e87fac8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 15 14:23:01 2012 +0100
Merge branch 'master' of /nfs/users/nfs_c/caa/source/sga-basequals
commit d783992e4bb747cecd026cbf216306e32cdfbc2c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 15 14:22:14 2012 +0100
Using quality scores is now a command line option
commit e87fac84310240a509ae53c71a0e35c43d224758
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Aug 15 14:02:44 2012 +0100
Fixed DindelHaplotype::extractVariants() incorrect assertion
commit d11e0c8272aa59c38e38be0adbfc3a1b4cbf51dc
Merge: b84ccd3 7cbcbd0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 15 13:55:25 2012 +0100
Merge branch 'quality-scores'
commit 3879e723666dc9f5202d89942ea1ff4df6772bb1
Merge: b6272c3 b84ccd3
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Aug 15 13:21:36 2012 +0100
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit b6272c38ab6fbfc2d694f634e40577916ff303aa
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Aug 15 13:21:31 2012 +0100
fixing alignment bug
commit b84ccd3890e93da8492c27448cc35ddc4c8569ae
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 15 12:10:08 2012 +0100
Perform semi-global haplotype realignment within dindel
commit 6b4fa6d2027918f9f6d8b176949488c1ddf5c31e
Merge: f1e02e6 5a40ba3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 14:53:49 2012 +0100
Merge branch 'master' of /nfs/users/nfs_c/caa/source/sga-basequals
commit 5a40ba373175e83ca7e4b3c7e7f8417a7cc08221
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Aug 14 14:53:23 2012 +0100
Fixed inf bug in genotyping
commit f1e02e66e2af1685c2451fd965176f795fed6dec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 10:52:00 2012 +0100
Updated examples to use ropebwt
commit 1273ff241e8e7196dfe157e3402087af64559516
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 09:58:10 2012 +0100
Update README to credit Heng's ropebwt implementation
commit bceaceeb518bfb3516cd887832adaf17a4aa41b7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 09:43:41 2012 +0100
Updated sga-index help text
commit 42b5f294b8a4e3a0e2dcdabd3cf567071ddd0f08
Merge: 9316e8b 003059a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 09:39:08 2012 +0100
Merge branch 'master' of github.com:jts/sga
commit 9316e8bf1cb25f74288a6a54ea7a9662b54c6f02
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 09:38:34 2012 +0100
v0.9.31
commit d6e98856a44b4268cb2c515513a9746213256ce9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 09:38:13 2012 +0100
Ropebwt algorithm now uses the command line threading parameter
commit 4e87ce26ff6a1f2eecefb89365fd39fe5b155f49
Merge: ca1b169 2826ea6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 14 09:34:30 2012 +0100
Merge branch 'ropebwt'
commit 2826ea6af93b70deca3da826eb4d3b4b70d0887f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 13 16:23:30 2012 +0100
ropebwt: .sai file is now written, reversed index can be constructed
commit 7cbcbd0379d5b43120a49b0f42e7c22a04a785f2
Merge: 852b3e9 ca1b169
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 13 15:03:54 2012 +0100
Merge branch 'master' into quality-scores
commit 0eddf2abe2649a26ee4490c6095d1a3615275b27
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 13 14:58:54 2012 +0100
BWT writing now functional in ropebwt
commit ca1b169661ff14f2a4f43d4029739879f07800c9
Merge: ff5c2ca d2a0c8a
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Mon Aug 13 14:55:01 2012 +0100
Genotyping in multisample caller seems to work.
commit ff5c2cae90730c7429fdfc2db5c4860e8f563743
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Mon Aug 13 12:03:07 2012 +0100
Added genotyping.
commit d2a0c8ab7fa5b799328e386b58e7c611d17d28ab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 13 12:02:18 2012 +0100
Wrote a new VCFCollections wrapper to pass sample names to Dindel
commit 9acd5580c1073625e5f513a4e75a0e729571e51a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 13 11:41:30 2012 +0100
Integrating Heng Li's ropebwt code
commit 003059a5f788497dd0b8a4d171e6198526b336a6
Merge: 2df18a4 52eaac6
Author: jts <jared.simpson at gmail.com>
Date: Mon Aug 13 01:29:18 2012 -0700
Merge pull request #22 from nathanhaigh/master
Minor STDOUT updates
commit 52eaac601a72c1c7ac2af1919486e7117c5fd38b
Merge: 129aa25 2df18a4
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Mon Aug 13 09:59:40 2012 +0930
Merge remote-tracking branch 'upstream/master'
commit 129aa25d50ff209d68cb2f62543a992da2e9a988
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Mon Aug 13 09:42:06 2012 +0930
Additional info (algorithm used) sent to STDOUT when using SAIS - this is to be consistent with the output when BCR is used.
commit 4c20b91fd0ba9bceacc70f99e9575e6ba4be92ad
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Mon Aug 13 09:40:57 2012 +0930
Send info about which file is being processed to STDOUT.
commit 852b3e9b87a0c9a1f68a17052438ea20b7e740f8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 9 10:59:21 2012 +0100
Set GraphCompare verbosity to the user's requested value
commit c4a64714c9d83bbb410af35d0ef3292d91d3a4b3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 9 08:59:30 2012 +0100
Integrated quality scores into the variant calling pipeline
commit 647244eafc137396f54e225591ad608a7f8f8a77
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 8 13:50:49 2012 +0100
Started implementation of quality scores for variant calling pipeline
commit 2df18a494df8b914c74e93149cc412b05123b1e1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 7 09:52:58 2012 +0100
Fixed GCC 4.6 warnings
commit 8477ae6a6d2e01cee72b2f92feb8b77a3eeb1a93
Merge: e053247 0e3f0f7
Author: jts <jared.simpson at gmail.com>
Date: Tue Aug 7 01:33:58 2012 -0700
Merge pull request #20 from nathanhaigh/master
merge pull request 20
commit e053247d9283aff1f16ff0d24123f7f1ca893fd0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 7 09:24:49 2012 +0100
issue 21: merge fails when filename contains a '.' that is not part of the file extension. Fixed by more careful handling of gzipped suffixes
commit 0e3f0f7cff1cc2eaa7d8c0a6219ec99fe53d9e20
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Fri Aug 3 16:59:31 2012 +0930
Added support for .f and .r read pair suffixes found in reads output by sff_extract version < 0.3.0.
commit 897727191488d0838195b17c62d8a6483fda20a8
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Fri Aug 3 15:52:17 2012 +0930
Consistently display help when no command arguments are given.
commit f4c134beb8efbea587b82b96b56aabba79e5d416
Merge: 7b4397f 29e15ed
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Fri Aug 3 15:29:30 2012 +0930
Merge branch 'master' of git://github.com/jts/sga
commit 29e15ed8da4554fdd8244fa55e449d5dae52e3ac
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 2 19:59:30 2012 +0100
Fixed unused variable warning
commit 7b4397f248b4ff257ab06384e3e5cbf071d372e3
Merge: a4c9cd4 9488ba2
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Mon Jul 30 09:09:24 2012 +0930
Merge branch 'master' of git://github.com/jts/sga
commit a4c9cd47194f6850233d5587e122a4bbd9129ff4
Author: Nathan S. Watson-Haigh <nathan.watson-haigh at awri.com>
Date: Mon Jul 30 09:07:34 2012 +0930
Just need to specify base name of the FASTA files.
commit 9488ba2b92ed993ae23ecd7cf69dde356f16a5a0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 27 11:29:07 2012 +0100
Accept .fq as a fastq file extension when choosing the output name for sga-merge
commit 042a127776544bd2f71a7825124ad15f63a32f68
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 27 11:21:09 2012 +0100
When merging, if input reads are fastq/gzipped, write to the same
commit 47b8ca7091741cd09b83814fd987300ec2cacfa6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 27 11:12:30 2012 +0100
Fixed how the name of the BWT file is computed when the fastq file is gzipped
commit 3e2121bee9d5a9e1b55eb8ebd09520e552348f1e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 20 16:55:09 2012 +0100
v0.9.3
commit 0c3a5c265b7f28c98150fe50690e547608178814
Merge: bb2f0ea d87f4a5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 20 14:36:58 2012 +0100
Merge branch 'graph-diff-refactor'
Conflicts:
src/SGA/merge.cpp
commit d87f4a53dc804c59cdfe4fb1f036829e43d3ec9d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 20 14:21:43 2012 +0100
No longer extracts read mates. Cleaned up prints
commit bb2f0ea00440dbd8724768e81653e0a5a3047572
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 18 21:47:04 2012 +0100
scaffold2fasta: write the orientation of the contigs when --write-names is specified
commit 4eb361007da799fcbe0674d3e5d35c2637072aba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 17 13:58:00 2012 +0100
temporary hack to fix crash when trying to extract pairs of reads from an unpaired index
commit ef79e9e603d08cc78414d3e62cafaf241f624458
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 8 22:34:23 2012 +0100
Require at least two occurrences in the base sequence when making comparative calls
commit 20519aa3e8e149e490c91feee90422df91440e81
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 8 22:05:51 2012 +0100
Apply a minimum coverage of 2 during haplotype QC in non-ref mode
commit 0cb42e7e8d5266abca085b88aeb7ee43df180efd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 4 14:51:15 2012 +0100
Fixed missing ID for singleton scaffolds
commit 9d8469cb99b3d923d250b4f037c26f07a5473195
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 4 14:47:09 2012 +0100
Reversed order of contigs when building scaffolds from right-to-left
commit 2c6575b3da3dfe3599a637337bb3eb1aa4665544
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 4 13:17:36 2012 +0100
added new option to scaffold2fasta --write-names, which outputs the names of the contigs that make up the scaffold
commit 1f1bdc1910e387d4941fe82f338acae157fd0e8e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 2 16:39:03 2012 +0100
Fixed cluster extend --iterations option so it properly extends for N rounds, not to N reads
commit 30b263f45df8fd3615e2d581dba7ec4c795ddc84
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 30 21:56:14 2012 +0100
Reverted to using a 31-mer for mapping. Lowered MAX_READS.
commit 47562c42d8e2dd206eb59135e03d8bec6d296a2b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 30 19:41:18 2012 +0100
Disable homopolymer filter
commit 7b72babd5048138eafff3a2e8a74d2acccafb624
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 29 13:31:34 2012 +0100
Made haplotype QC less strict
commit f109a1a280e3429b6583d2b9f88f699b268a4bd0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 29 13:27:07 2012 +0100
Fixed compiler warning
commit 21daaeb9dafc0cfeeda23e4e081f07bb81657978
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 29 11:09:41 2012 +0100
sga-graphdiff now directly outputs the final set of calls.
commit 3b8915cc627242c2d23d317bf737bd5b255f7d3b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 29 10:36:10 2012 +0100
Added homopolymer filter. Dindel code now outputs VCFRecords
commit 02f718a41f4d9d3e112184d203ab9ca25887d6ed
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 27 10:32:00 2012 +0100
Build parallel haplotypes and reduce the mapping kmer size
commit aed67f618935635e032b171fd34ecdf73243f0a4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 21 10:35:44 2012 +0100
Added subprogram to evaluate how well we can detect mutations with k-mers
commit 528b53ed46a86a5c06727ef099291472f2c9eeee
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 20 15:01:25 2012 +0100
Early exit from the overlap function when the input read is shorter than min_overlap
commit 0a8f87174193fc6e3b02bba6c539216650200b38
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 19 14:22:28 2012 +0100
Move out of bounds check to inside loop
commit f3becf3256ced9eee98e09bf43c3bf8f408edab2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 15 13:47:56 2012 +0100
Lowered default storage level for merging BWTs during sga-index
commit 6280d590a92581b584d7f37fd2ec3be4c6e2048f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 15 13:38:49 2012 +0100
Fix the way that homopolymer runs are counted
commit 7843175d45da4e8b1437a6d66946c509197b8bcc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 15 09:33:45 2012 +0100
Tweaked read extraction parameters, cleaned up some debug output
commit c0b898bfa36ef8c09f56a0441dc988261af20e8e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 14 13:07:38 2012 +0100
Attempt to fix assertion when counting homopolymer lengths
commit 6c02c675e65de30c33070d87f22f86609f59b373
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 14 13:07:13 2012 +0100
Use correct file status for popidx
commit 0aaece8040e5a83b0b29373b877dc4775587ea5b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 14 13:06:39 2012 +0100
Use full path to indices
commit 37f6ed4ce535449c9f38404627b61fffa69632ce
Merge: 16bd2fe 64faf25
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 12 16:43:45 2012 +0100
Merge /nfs/users/nfs_c/caa/source/sga-graph-diff-refactor into graph-diff-refactor
commit 64faf2553c6cdc973619942ff35137bd87396840
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Jun 12 16:41:36 2012 +0100
Fixed homopolymer error when variant is in last column
commit 16bd2fe3e95cc059787ca19661ec9a724125c503
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 11 12:26:52 2012 +0100
sga-merge uses the full path to the input files so they do not need to be in the working directory
commit 562a9eb558c4589b219157daab9ae6e5db61569e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 11 11:42:58 2012 +0100
sga-merge will not merge population indices
commit 274c3aef6996d6aad95e397c2eb2844f2567592f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 11 11:06:10 2012 +0100
sga-beetl-index now has a no-convert option
commit 7fcd28531d363b34ac15e9523373159a35eb5d09
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 11 10:50:44 2012 +0100
sga-beetl-index.pl now converts fastq to fasta
commit 15b77afe3a499afedcac868b57adab1cfb637ba4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 10 22:28:34 2012 +0100
Use the new version of BEETL. Rewrote convert-beetl to use far less memory.
commit ebf3f6b16cb879c25ea6b8e76d141911590dc9d3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 10 21:56:04 2012 +0100
Removed debug prints
commit 1c187f972d545a3ec025b43d014072207b5fd105
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Thu May 31 15:27:46 2012 +0100
Added multisample EM caller.
commit a91282bb4414afe8afaea97110dea3f1e379e4e1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 31 10:07:09 2012 +0100
Fix substring assertion when null strings are passed to calculateDustScore
commit c723a712684ab3b174c34eb524644542e17550fe
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 29 09:09:49 2012 +0100
Version v0.9.20
commit 55afe7a05d3c17258a618829b2a4ffa1f6fc24d1
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Thu May 24 17:37:06 2012 +0100
Added MultiSample EM and caller. Fixed homopolymer, added AmbiMap filter tag
commit 2d3f26db512c6157a6b079639dbfccfee6a7d220
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 24 11:30:25 2012 +0100
When resolving scaffold gaps over ambiguity codes, the flanking sequence of the filled gap may not match that of the scaffold anymore.
commit 9f88a2a0f62a8c9161582a2f4e2317525ebcaa8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 24 10:33:42 2012 +0100
Clean up assertions so empty BWTs can be written after filtering
commit 7bb6f86e35544b79a7a16a8e1c82cb03ec6e5266
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 23 15:02:26 2012 +0100
Minor formatting change
commit b849e6b615fe47cc089ea16bc8e34485faf8bb8c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 23 14:59:27 2012 +0100
Added assertion warning for mate-pair mode
commit 73bfc59a00dafd43d96d5992780bb0f9a23287d9
Merge: cd7bcf0 77a4bc7
Author: jts <jared.simpson at gmail.com>
Date: Wed May 23 06:58:35 2012 -0700
Merge pull request #19 from mh11/77a4bc7aacd7b838fc3097af515e2f27ffecde3e
Enable independent indexing (and merging) of forward and reverse index (and fastq) files
commit 77a4bc7aacd7b838fc3097af515e2f27ffecde3e
Author: mh11 <mhaimel at yahoo.de>
Date: Mon May 21 13:46:14 2012 +0100
Also enable suppression of index creation for memory-only runs + code formatting (spaces)
commit d5d3700c944a39b9dc1972975e4e319f0ec39cf8
Author: mh11 <mhaimel at yahoo.de>
Date: Mon May 21 13:17:40 2012 +0100
Suppress merging of indexes / sequence files
commit f37adb254b0a6f731e797457f175192b5cc275ab
Author: mh11 <mhaimel at yahoo.de>
Date: Mon May 21 10:16:57 2012 +0100
Allow building the FWD and REV indexes separately to improve speed
commit 459cbff497a34043deb7fefdef35371c477503ed
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 22 13:43:16 2012 +0100
Integrated the read groups into DindelUtil code
commit d8b64b77eb7e982bb3b8098a41ed47f8eb38cdab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 22 11:20:38 2012 +0100
Started to implement multi-sample calling
commit fdd0509a0d67bdb3d0448f5960f56a633785986f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 21 10:45:17 2012 +0100
Cleaned up prints in OverlapHaplotypeBuilder
commit beba85822ca71a8b4b6fc88db5f7121582c44631
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 21 10:42:20 2012 +0100
If one haplotype fails QC, do not attempt to assemble a variant
commit 87b186793d141a88174617054412131c33787444
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 16 14:57:04 2012 +0100
--debruijn should not take a parameter
commit cd7bcf05a290f5e78735238ff9b609b15ea9d03e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 11 16:42:15 2012 +0100
Allow sga-deinterleave.pl to read gzipped files.
Thanks to Alvaro Martinez Barrio for the code.
commit faa7d0477b48ab607474bde811494e227cdecb7f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 16:34:18 2012 +0100
More option cleanup
commit 4c3854a8db59933c34297db3fd2e46da251e30bc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 16:18:57 2012 +0100
Cleaned up parameters
commit 97a70790adae61a21d55bad591156c5b456033ef
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 13:38:20 2012 +0100
The algorithm used during haplotype assembly (dbg vs string graph) is a command line option
commit cc1dabda1687ab78afb6db87ca323ed3a2a7d550
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 13:17:15 2012 +0100
Moved HapgenUtil
commit a95ff4e595117eb7aab36f0ce85441074e76082c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 13:14:02 2012 +0100
Refactored haplotype QC into its own function
commit 4e3ed9019806ecf80a85d8e26e1a666384b342e4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 13:04:30 2012 +0100
Removed old debug code
commit da0467a7bc4d9f7800cfba3f9cfc902a108337f9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 13:03:15 2012 +0100
Refactored de Bruijn haplotype builder into a new file
commit 9e9af48f072682c611b0b252b8e744471b4782f6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 11:16:02 2012 +0100
Added BWTIndexSet container. Massive refactoring of code to use it.
commit 8a09cb660b1be2db46246ab1548db5cb5a578201
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 10:03:50 2012 +0100
Refactoring GraphCompare
commit 0f44d53cf3d0fe98bf72409b21305a2fc1d0d195
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 10 09:46:03 2012 +0100
Refactored kmer masking code into a separate function
commit 01432bc47d6d815d5e8e92f5f0a21f9a67bb8a54
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 9 12:44:36 2012 +0100
Skip all IUPAC codes when finding anchors for gap filling
commit 143b83ad02db003e776246e1ce98bb11a5cc2161
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 9 10:46:38 2012 +0100
Added new flag to SeqReader to avoid changing lower case bases to upper
commit 972f4e56e0f5b29295e3308d35df10c7f2d13bc1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 9 10:27:02 2012 +0100
Allow scaffolds to contain full IUPAC ambiguity codes
commit ecaac5527e2cc52299d90fb1f5b7ce6f5ca22928
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 15:42:27 2012 +0100
Refactored BuilderCommon code into VariationBuilderCommon
commit f7b74d5baf9cb399e046af966c0f5f4621473586
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 15:08:34 2012 +0100
Renamed VariationBubbleBuilder to VariationBuilderCommon
commit 8dc04f8432f13955f63defca75c005cde99a06e2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 14:59:37 2012 +0100
Removed dead code
commit 533964d065690005f70d900c50bd80eb3b59a211
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 14:57:53 2012 +0100
More refactoring
commit 850564282747cd3f81a970e1df03f371d7f15bdc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 14:40:09 2012 +0100
Removed abandoned class
commit b8106e880ad2749059bb94b6c4e35ea046995b4b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 14:38:44 2012 +0100
Refactoring the variant calling code into its own directory
commit 3c28ea33061c1780ca31247da9688172a738f1f5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 4 10:22:55 2012 +0100
Fixed error in the warning when a kmer threshold cannot be found for error correction
commit d8bac6d94698c5b3862640f329f4c8685318e463
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 2 10:39:13 2012 +0100
Fixed assertion when a haplotype aligned to the end of a chromosome
commit e1bb0e5c04f87026fa3b92d2bd5a5d9484c3eddd
Merge: 067b7a1 ecefa80
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 30 16:17:54 2012 +0100
Merge branch 'graph-diff-v4' of /nfs/users/nfs_c/caa/source/sga-graph-diff-v4-copy into graph-diff-v4
commit ecefa80363df819c3c2c9a7125c32f8545b1c24c
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Mon Apr 30 13:10:00 2012 +0100
Fixed mapping bug and -MAX_INT QUAL bug. realignMatePairs can be used to choose mate pair alignment, automatically sets FLANKING_SIZE to 1000.
commit 067b7a11965038f56bcbf7607d1ace3c46b64edb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 30 00:11:20 2012 +0100
Fixed infinite loop in haplotype QC for short haplotypes
commit 4c5dbdf6e342cf42c41f2d1e59fc346dce5152df
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Apr 27 18:47:14 2012 +0100
Fixed -MAX_INT varqual bug. Made sure flankingHaplotypes is unique. SET MAPPING QUALITY TO 1000 for all candidate alignments
commit f97cbc4c5e1496eb2293e61723550342ca1784e2
Merge: 21c34ac 759ef0b
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Apr 27 16:46:59 2012 +0100
Merge branch 'graph-diff-v4' of /nfs/users/nfs_j/js18/work/git_repository/sga into graph-diff-v4
commit 21c34ac05e8fe003cd759760bbcde7e9b587b2b8
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Apr 27 16:46:55 2012 +0100
Changed FLANKING_SIZE to zero and DINDEL_DEBUG_3->1
commit 759ef0b37a7ca607426e9371cb83f3aacf618c8a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 27 15:54:49 2012 +0100
New haplotype QC which counts the number of branches off a haplotype. Not used, just for information
commit 8e2100ad8552d7b997c85bdbe86cb2b489bf6812
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 27 09:46:10 2012 +0100
Do not pass -s to DistanceEst twice
commit f6f43ae3b89d2e5aef57e830f72754053d3660ac
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 27 09:42:22 2012 +0100
Set a positive default value for the minimum contig length in sga-bam2de.pl
commit 2eac3c194dde5a8609146279b085ab14d6957ead
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 27 09:35:28 2012 +0100
Re-enabled haplotype QC
commit 658dbbdd11a763ddcc62ac2699c3ba61ac3f7b5b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 27 09:25:15 2012 +0100
Implemented a less restrictive check for when the graphs converge
commit b12eebc712317784c5c5143a224ec40992e4be77
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 26 11:32:03 2012 +0100
Refined the StringThread correction method
commit 1331bb0c2e5bacd00cfe50aa27251ae9dcd9e5cd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 26 09:14:39 2012 +0100
Debug code, synching with Kees
commit 8d48561aefc4b4401fc7dbc893b4f2e90dae60fc
Merge: 0f682c0 2b51d1b
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Apr 20 12:08:32 2012 +0100
Fixed variant frequency.
commit 0f682c0a98dd2e7e81c89ef93f1fc70c7c4ffc85
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Apr 18 15:51:37 2012 +0100
Added SingleRead as replacement for MatePairs
commit 2b51d1b924803fc0db508f0b7733eda091e13af2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 17 10:29:40 2012 +0100
Enabled the de Bruijn graph based QC check of candidate haplotypes
commit 499d27ca9d19483fb4c9139a9e39a9b109bd4a9b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 13 16:23:41 2012 +0100
Allow the user to define a set of sequences that are used to stop extension in sga-cluster
commit 716fd447c1783e5eef52510eb0b4442273900508
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 13 15:09:12 2012 +0100
sga-cluster extend mode can now be limited to a maximum number of iterations
commit 2e6aab1534f5a521c48e805d7c3177323c29e5d5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 13 08:58:51 2012 +0100
Cap the number of differences to the reference genome at 8
commit f273690e79b9256dd9470c04f96a5f885aa979f4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 12 09:33:06 2012 +0100
More restrictive alignment
commit fef6b4fc0ee4e83fbbc713d5babafd5208e133ca
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 12 09:08:46 2012 +0100
Revised trimming logic so it does not iteratively trim the whole branch. Testing lower correction kmer.
commit 7030dc0080d82688e317389528278b230e254fea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Apr 11 14:33:45 2012 +0100
Only extend the graph in one direction - this removes the effect of "back bubbles" making the graph too complex to resolve
commit 0d39e8b6ab58a313182c77222436a7bd8c05ef8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 15:23:43 2012 +0100
Min overlap length is now a command line parameter
commit 0f08a067d14540228df7a170ca2a945559384ad6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 14:49:39 2012 +0100
Recursively trim tips from the graph
commit 7de9f4a918ef3c0c574ce92533790ffbfbe4741a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 12:49:42 2012 +0100
Printing changes only.
commit d4aa808f447d5ce4f2764a245e086a60866cc79b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 10:26:13 2012 +0100
Avoid attempting to build covering paths when there are unambiguous chains of join vertices
commit 6b031e9243bc90bc1c62e62ddf0dca84488331b4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 10:05:34 2012 +0100
Separated walk candidates into left/right join positions
commit ae4e1a0fcf37ead8f5c6af349f41d5ae9083cac0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 09:57:12 2012 +0100
Extension vertices now labelled with the direction of extension
commit 7eed5ac5f0aa348efc1c5be720c2fe64ba8e13be
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 10 09:42:08 2012 +0100
Suppress construction of parallel haplotype
commit dc8c9132e521e640da32bab6c3153d97a939a0b0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 9 11:43:56 2012 +0100
New k-mer based haplotype to reference alignment, more restrictive haplotype assembly
commit c6fd8b1eecef0f6de9cf4c4033be15fca04e41e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 5 10:12:13 2012 +0100
Moved overlap parameter into the parameters object
commit cc0568707a4b0f544366b393069468eb9da8d47f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 3 13:47:15 2012 +0100
Save kmer indices instead of actual kmers sequences
commit 75a10f4c949584d8b64f8a46390ee253d588203c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 3 13:42:10 2012 +0100
Changed std::map/std::set into hashmap/hashset
commit ba76551787e2d4d81e7b4130ead5c6aa97740277
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 3 13:10:58 2012 +0100
various optimizations to improve the running time of the overlap constructor
commit 3f48fc98e4de4ce10fc7a049d096624bb7f46ed8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 2 15:30:35 2012 +0100
Stop the graph extension if there are too many tips in the graph.
commit 98c3b8561ee0113726f1ba41e924f92ac29bdc46
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 2 13:58:15 2012 +0100
Set up a kmer->vertex map to avoid huge computation inserting reads into the graph.
commit d87a248a6bc0ec6f958be5b91678327931c693b1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 2 13:30:44 2012 +0100
Temporarily disabled constraint requiring complete construction of parallel bubble
commit a967450598bd3acc2af9100896cad15b55d00f5c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 2 13:28:37 2012 +0100
Abort graph construction if no corrected read contains the initial kmer.
commit db5a3f9e6f6b3e6bd59616932e24d09b4ccbf453
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 2 12:56:32 2012 +0100
Refactored the big k-mer based overlapper into a new file. Optimized some functions in OverlapHaplotypeBuilder
commit f953a15146807aa8cfac846af3db620f3dd9e5ad
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 2 10:57:07 2012 +0100
First pass of string-graph based haplotype builder. Slow!
commit c89997fec430325fe492c43ac74c1855f785e7c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 1 09:56:47 2012 +0100
Tweaks to RCHB
commit fd9550bb32f29f2df5a91f858a3dc61290cf2961
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 30 16:16:09 2012 +0100
Fixed bug where empty initial haplotypes caused a crash
commit 534a56a78a1a3d02eb6bcf1d62a7b9c99a150f7f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 30 15:14:42 2012 +0100
New version of the variant algorithm that is based on inexact overlaps
commit ef64f8ad66e91de99aaa3c36964164c41f68f26e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 27 16:45:08 2012 +0100
Reverted to old overlap code. Not functional
commit b247a786f1c5eb18f0688dceea8a27197ee43b49
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 27 16:27:42 2012 +0100
Extremely crude version of inexact string graph haplotyping code
commit daa329f9864633e96114cb947f428a26af118265
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 26 15:59:18 2012 +0100
First pass at overlap-based haplotype builder. This version is not functional
commit 965bcb73520acc71f887d866f8ba649b0c7876cc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 23 13:32:17 2012 +0000
Aggressively collapse conflicting bases during haplotype construction. This is only temporary.
commit 339815f3eaada3d7196bad3da7d71d719afa1518
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Mar 22 11:03:18 2012 +0000
Integrated new multiple alignment code
commit 11efcd77b8f246911df23a18a1fab84c0fcacbf2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 21 21:34:07 2012 +0000
Check if quality string exists before adjusting for removed adapters
commit 5d7e125d72cecdee063a381497d94a76202acf91
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 21 21:05:41 2012 +0000
Allow singleton kmers to recruit new reads
commit e077ce798232908d2ecad9b94f77ecd5fc35e20a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 21 15:40:55 2012 +0000
Improved version of the haplotype generator. Lots of testing/debug code still in this version
commit a8e7816b0694c7145db43901940eff2178e06aac
Merge: ef6c803 11b5751
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 20 11:58:52 2012 +0000
Merged Kees' latest code.
commit ef6c803f4b561baf7010685aaeaaa496138b36e1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 20 10:56:09 2012 +0000
Parameter tweaks
commit 11b575158e911c4576cb44dd796f33a2ec030122
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Mar 20 10:52:50 2012 +0000
March 20 debug version
commit d9f5362cf709f5526c009d7178f88545c45c37ec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 19 16:11:35 2012 +0000
Read coherent haplotype builder now extends to new variant kmers
commit 37a432e562a7fa23cad76a06547e473a2136d2d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 19 13:10:21 2012 +0000
Rewrote method of inferring haplotypes from read coherent kmers
commit 4e34ed20374fdb5c9b8b2aa87aaa322d4954a202
Author: David Rio Deiros <driodeiros at gmail.com>
Date: Tue Mar 6 09:21:12 2012 -0600
Adding feature in preprocess step to remove adapter from reads.
commit 08b41df8ed78b18776539d377f97dfe4cb2ec5e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 27 15:46:57 2012 +0000
More conservative generation of haplotypes.
commit 2188d73566f18cf8878d0e4a1cb2a6578d60730f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 27 13:41:02 2012 +0000
New method of deriving haplotypes from all the reads sharing a new kmer.
commit 2e7ad29493fdd565bf369ab67d1a8234a98392c4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 23 16:22:49 2012 +0000
Removed debug output
commit 1781aeedc562a7ba42e80ad4f5793d90585a8a72
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 23 16:13:51 2012 +0000
Removed hacked-in hardcoded path
commit da4d051207236f36709991caf1e21c9914b00837
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 23 15:06:16 2012 +0000
New "kmer witness" algorithm
commit 68bd5514e881da252536397509740c136cae169e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 21 14:37:20 2012 +0000
First implementation of read-coherent haplotype generation. Way too slow so far.
commit 5a881a63dd35e171d22ed92ee2c3096af5781cae
Merge: 9dacd9c 47d807e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 19 17:37:27 2012 +0000
Merge branch 'it-correct' into graph-diff-v3
commit 9dacd9c52d0ea0f64efc38dff44187188b53600f
Merge: 2498066 6ec7f9a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 17 13:33:38 2012 +0000
Merge branch 'master' into graph-diff-v3
commit 2498066e7991c7e7bda283b9a0ad58c4b2d431c8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 17 11:57:08 2012 +0000
Started to implement coherency-based haplotype builder.
commit ceb9fa4e273c56718044ec6cd8268ee1fa26eee3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 17 08:59:22 2012 +0000
Huge debugging hacks to investigate why we are missing some variants.
commit 6ec7f9a29c9fb19c2e78b0823bf1c11b026b8ed5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 10 16:03:46 2012 +0000
v0.9.19
commit 733ba5e42f2386cf24c7f936958b726e2251b16f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 10 15:59:40 2012 +0000
Added option to sga-filter to remove substring sequences only.
commit c8071343eef587c1691acbb3c20692e78667c979
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 10 15:45:10 2012 +0000
Filter the abyss-generated insert size histogram to avoid very long DistanceEst runtime.
commit 2f72eebc65e52f4b33be6553990be5a38b8b739b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 6 20:59:18 2012 +0000
Fixed assertion tripped by short filenames when checking for gzip extension
commit 7e6284d883bee5379efaeca8cab039860fcabe3f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 6 20:53:23 2012 +0000
Added Illumina's notice regarding the rights to the BCR algorithm
commit d503dc902bf2fe85fe2d1ef9fac3571bfe7a7cdf
Merge: 3a98fa6 b7f85ca
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 6 16:13:25 2012 +0000
Merge branch 'new-indexer'
commit 3a98fa6e9d6a2105390cb9076b7ff3b6f260b271
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 3 16:33:38 2012 +0000
Changed warning message when the operating system is OSX
commit 30f81714ead8e5613c932eb87ba672df30ee5050
Merge: 32310ec 395184d
Author: jts <jared.simpson at gmail.com>
Date: Fri Feb 3 08:28:44 2012 -0800
Merge pull request #12 from drio/master
OSX not implementing certain semaphore functions
commit 395184d87f7531c9fb38045aad95ad85f80aa476
Author: David Rio Deiros <driodeiros at gmail.com>
Date: Thu Feb 2 14:39:08 2012 -0600
Warn the user that merging BWTs on disk will run into problems,
since OSX doesn't implement certain semaphore functions.
commit 32310ecde4a640e965c94800b78dad94859fe885
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 25 22:10:51 2012 +0000
Make sure that edge records have the correct number of fields
commit ffe7e370c3151b54ee05495f92f763f5d3b5144e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 25 22:03:46 2012 +0000
Emit an error and exit when a vertex record is truncated.
commit 0c82a7a2a2651449f3e844d6feb064fa249383ea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 23 13:37:49 2012 +0000
Changed the ASQG parser to only warn if the TE tag is not present instead of aborting
commit 8e1ae0ca6db28fcee59192b57e924549d7d6ece5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 23 12:59:44 2012 +0000
Removed unused parameter in sga walk
commit 5f1d014e21420b29d4f62f95542589bc559f4c37
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 20 15:42:31 2012 +0000
sga-rmdup now writes out the number of copies of each sequence in the header line of the fasta file.
commit 343515ca47f5d48f0f808e62e6303eb123d69262
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 20 15:26:40 2012 +0000
Keep leading directories when parsing the reference filename
commit 47d807e736af7075c2b39b5d79b284ec7f78fcf1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 16 14:10:09 2012 +0000
New overlap corrector will use the -r/--rounds parameter to iteratively correct reads. This can lead to better correction accuracy but decreases correction throughput
commit 419227bba2dcca297550fc6dd715de4a9ac138a9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 9 14:44:26 2012 +0000
Updated third party code
commit 0bbe50afc1dbcf70c8af86d8bde612381c236a16
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 9 14:22:18 2012 +0000
Integrated new overlap method which extends an existing alignment. Considerably faster than previous method.
commit 628bd6d92576321aa0568d70985c86868d23c099
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 9 09:12:21 2012 +0000
Updated third party code
commit 3442ce228ff59ed62495967d2aff611cd2fd60c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 6 11:32:01 2012 +0000
Changed parameters in consensus algorithm in new overlapper
commit e4a3cd49d48035891759c18d4212ac3fc7e8d46c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 6 10:10:37 2012 +0000
Updated 3rd party code with improvements to the overlapper and multiple alignment
commit 27bf3d238bb2b370c1b51a2f7d23442245138c21
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 5 10:37:57 2012 +0000
Refined the kmer matching portion of the overlap calculation for the new corrector.
Previously, we called SampledSuffixArray::calcSA to get the read ID for every kmer match.
This was very expensive as usually reads would share multiple kmers, making the calculation
very redundant. Now we cache the indices as we visit them which is two orders of magnitude
faster. The DP overlap calculation is now the bottleneck.
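The caching idea described above can be sketched roughly as follows. This is only an illustration of memoizing an expensive index-to-read-ID lookup, not the actual SGA code: the class name `CachedReadIdLookup`, its members, and the use of a `std::function` as a stand-in for `SampledSuffixArray::calcSA` are all assumptions made for the example (C++11).

```cpp
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_map>
#include <utility>

// Hypothetical wrapper (not part of SGA): remembers the result of an
// expensive "BWT index -> read ID" lookup so that repeated kmer matches
// against the same index do not redo the work.
class CachedReadIdLookup
{
    public:
        // slowLookup stands in for an expensive call such as calcSA.
        explicit CachedReadIdLookup(std::function<uint64_t(uint64_t)> slowLookup)
            : m_slowLookup(std::move(slowLookup)), m_numSlowCalls(0) {}

        uint64_t get(uint64_t bwtIndex)
        {
            // Return the cached answer if this index was visited before.
            auto it = m_cache.find(bwtIndex);
            if(it != m_cache.end())
                return it->second;

            // Otherwise pay for the slow lookup once and remember it.
            ++m_numSlowCalls;
            uint64_t readId = m_slowLookup(bwtIndex);
            m_cache.emplace(bwtIndex, readId);
            return readId;
        }

        // Number of times the expensive lookup was actually invoked.
        size_t numSlowCalls() const { return m_numSlowCalls; }

    private:
        std::function<uint64_t(uint64_t)> m_slowLookup;
        std::unordered_map<uint64_t, uint64_t> m_cache;
        size_t m_numSlowCalls;
};
```

When two reads share many kmers, most calls to `get` hit the cache, which is the kind of redundancy the commit message describes removing.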
commit 52aaa93b1ffdab87be0bd813c435ec3b2ce5c8c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 4 14:12:44 2012 +0000
First implementation of new correction algorithm, which allows arbitrary overlaps between reads. Not for production use
commit e6ea2388dd86889a32a54d66a4d68d74aa25693f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 4 10:26:56 2012 +0000
Removed long-unused Algorithm/ErrorCorrect code
commit a38cf00128cfe52c7947b1deb6d73501026de320
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 4 10:22:24 2012 +0000
Integrate code from github.com/jts/misc
commit 5d37bd1bfa23986e956b64501fde6d5612770d7e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 3 10:48:35 2012 +0000
Implemented --longest-n parameter for sga-walk --component-walk.
This option limits the number of walks that are output for complex components.
commit 067b47b18d1a8cba80136cdd24c5e6811fe98bae
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 3 10:04:30 2012 +0000
graph-diff can now output multiple candidate haplotypes. Added a min-depth parameter to avoid traversing low-coverage k-mers.
commit d6140fc5ef79c7fcc28dc8bad2cbc12e4fde92ff
Merge: 66d7f8f 72117d0
Author: jts <jared.simpson at gmail.com>
Date: Sat Dec 31 02:10:35 2011 -0800
Merge pull request #11 from hyphaltip/patch-1
use -t option in sga-align in the example
commit 72117d007eafcb27979a86b2f2e4f749fbc11086
Author: Jason Stajich <jason.stajich at gmail.com>
Date: Fri Dec 30 19:37:59 2011 -0800
Seems like sga-align could be run with threads so that bwa runs multithreaded and is faster. Is there any reason not to do this?
commit 66d7f8fa7d1fe59bc5e30e7609e3872f2ce36f0a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Dec 15 11:57:27 2011 +0000
Modified sga-cluster-extend to warn instead of exit when some seed that is passed in is a substring of a read.
commit 83e3135fdfe20067666dc180fa9081b0fc4a5fc0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Dec 15 09:49:23 2011 +0000
More debugging code
commit 91b4b550839092cce30bf9a00782d8e2f0a0f604
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 14 10:15:39 2011 +0000
More debugging code
commit 2f25a664328b2ca75d5272cb72d828d790237025
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Dec 9 14:58:58 2011 +0000
Optimized and added INFO tag outputs.
commit 20e681199e0a2bf77ff2fa6f12c1dca0835eb3a4
Merge: d478456 0e8616b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Dec 8 10:38:06 2011 +0000
Merging bug fix from Kees
commit d47845637a1a0c9cc405208260f3ef218179ef5a
Merge: e6f8f48 e3d93ae
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 7 21:31:36 2011 +0000
Merge branch 'graph-diff-v2' of /nfs/users/nfs_c/caa/source/sga_merge into graph-diff-v2
Conflicts:
src/Algorithm/DindelRealignWindow.cpp
src/Algorithm/DindelUtil.h
src/Algorithm/GraphCompare.cpp
commit 0e8616bf13abde3a298c7fa9b4b24a76072846b4
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Dec 7 21:17:19 2011 +0000
fixed silly bug in computation of penalties for haplotype alignments
commit e6f8f48b45cf1a83f25d330ef8b0e93bd066319b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 7 14:01:40 2011 +0000
Fixed bug in extractHaplotypeReads where it would incorrectly flag some haplotypes as being too deep.
commit e3d93ae3bdc297d01ebeb1e3aa0137f23bd283a1
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Dec 7 13:27:11 2011 +0000
Fixed incorrect averaging in outputAsVCF
commit fd512365fa9b1dc3ace038d7f0e3bafea3c9c1a6
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Dec 7 13:14:13 2011 +0000
Fixed getDistance: it now only uses reads that match the haplotype sequence without a mismatch at the position of the variant. Also fixed outputAsVCF: it now combines frequencies from haplotypes mapping to the same position.
commit d415b0dd1ba0e0d6c7b08fd8cb454a8924769ceb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 7 12:41:46 2011 +0000
New heuristics to improve the running time when calling variants versus a reference genome
commit 8771a72a320b639a4e7ef40f1b5d30120d5a6fb8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 7 09:09:43 2011 +0000
Revised profiler
commit 095e8752c30325f5bede7dac2a44dd333903e2f0
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Dec 6 16:45:25 2011 +0000
Fixed quality score issue for haplotypes mapping with low quality scores to places in the reference. Added ID output in VCF
commit b0b282e01316f2148af0e8a01fc795e46396c502
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Dec 6 15:17:08 2011 +0000
Investigating poor variant calling performance on mouse genome data. Added a lightweight profiler.
commit 634c45fe04c7288eb941e891bfcf9e7cc75c6427
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Dec 5 09:36:37 2011 +0000
Implemented a simple counting-based variant caller for debugging
commit 0b7d9585012d196b0510f9ce645d0d48a33f867b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 2 10:28:32 2011 +0000
Added overdepth filter to avoid running dindel on super deep regions
commit 72df8a425701fbc661a04a05e7362d83de4dffea
Merge: 5938d6f 8b1da8a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 2 10:06:49 2011 +0000
Merge Kees' branch, with a fix so that variants are output with respect to the correct reference strand
commit 9a014f4b77515ce2ebf701e084c558226a5a54e2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 2 09:53:00 2011 +0000
Added error message when a sequence with a given ID cannot be found in the input scaffold/contig collection
commit 8b1da8a97389a3688af9b5806eb4e1bee92a880c
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Thu Dec 1 17:51:42 2011 +0000
Fixed strand issue.
commit 5938d6f164d453edc2f3b84f74d08ac974617361
Merge: 20e4b93 212743e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 30 15:45:35 2011 +0000
Merge Kees' branch with bug fixes for variants that are not being called
Conflicts:
src/SGA/graph-diff.cpp
commit 20e4b939e92f537fb249160c2ceb6eb484f3a240
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 30 15:36:03 2011 +0000
Turned off debug mode
commit 212743e19c838b7ade000fb7c367e13b2d0a26c4
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Nov 30 14:32:18 2011 +0000
Fixed position bug. Fixed output of uncalled variants
commit 2bde31a0e5891eb9046fb79e4d13df44d0592d41
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 30 11:05:34 2011 +0000
Added debug code for Kees
commit 14b3eaa18f254d71787dd2b983def01faca3e03c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 30 10:09:14 2011 +0000
Bumped version to v0.9.18
commit 378309fb3b9e4b2dcf30a295e2d00087755d4155
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 30 10:06:37 2011 +0000
Fixed configure script to properly handle bamtools include/lib paths in the case where it was installed with make install
commit f5f75b2e14d466f9503bee4badf18504b0d2d04c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 30 09:55:46 2011 +0000
Temporarily re-enabled some debug prints
commit 50b09c7522d8714f00b79607ab5906186116e3eb
Merge: 7ab7faa 3462451
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 29 15:19:51 2011 +0000
Merging latest changes to the dindel haplotype model by Kees Albers.
Conflicts:
src/Algorithm/DindelRealignWindow.cpp
commit 7ab7faa41634013ee9924173f1da2fc7a05e5d66
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 29 14:04:41 2011 +0000
Fixed memory stomp in DindelHaplotype constructor
commit a99e77d32de0f85c63f5aa5fcb8eae1132bc548d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 29 12:51:59 2011 +0000
Added a new exception to Dindel to handle the case where the variant found lies at the beginning of one of the haplotypes
commit 6addb8a208bb8e4c710c4fd56c4fc732db2218b8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 28 15:33:02 2011 +0000
Removed many debugging print statements
commit 0eb92db9dc3f17035b480569a556b1593b3b3aaf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 28 09:55:36 2011 +0000
Fixed two performance issues in sga assemble.
For very complex graphs with repetitive genomes, Myers' transitive
reduction algorithm can take a very long time to complete.
Since sga overlap does not output transitive edges, in the
general use case Myers' algorithm is not needed. So it is now
turned off by default.
Second, the small repeat resolution algorithm can also take a long time
to run. I'm attempting a fix here where I limit it to run only on
not-too-complex vertices.
commit 83854f4cf09df5ce69910d916bfa95037dca2e03
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 28 09:09:49 2011 +0000
Added new debug mode to graph-diff
commit 3462451c1e0c0da7688b7dc757394056f36a4130
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Nov 25 13:58:38 2011 +0000
Tested on the 167924_A1 exome; already seems to give good results.
commit 1068d533a28b3881079a7809a0d239cd29cec3a6
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Thu Nov 24 17:03:23 2011 +0000
fixed a couple of bugs. Still debug version but seems to do sensible things.
commit a8005446bad7b31ec4c48b0d0e7cabd79fa0325e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 23 12:46:02 2011 +0000
Was using the wrong base BWT in non-reference mode
commit 624b2ae61a57d02241acf93ec3e082f65b93763f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 23 11:04:03 2011 +0000
Enabled reference based calling, fixed memory errors
commit ab1161a98a23557311efb647c7514357be26eb67
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 22 11:39:01 2011 +0000
Implemented a number of debug/development functions for graph-diff
commit da0254b86238e6354eb248a02525c43f4a8588c1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 18 18:28:49 2011 +0000
Fixed bug where the wrong data types were being written/read in the ssa files.
commit b7f85ca0332e1adbc795c66a7b790f4e6f5f659a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 14:39:20 2011 +0000
Removed print from BCR
commit 63fc7c8675eb5703d27b52d9db45b4808a7dd31b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 14:38:48 2011 +0000
Fully integrated BCR with the BWT disk algorithm
commit 691eedfa15b3d8982f0f2fed686e482b5fb9c47a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 13:44:04 2011 +0000
BCR algorithm now writes out the reverse index. Made the algorithm choice a command line argument
commit 4b0ea366cecd6a3bd684f4de080b22ccbb7cfb93
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 13:10:13 2011 +0000
Started to integrate BCR into index. Made it more efficient by using 2-bit encoded strings everywhere
commit 1114fe5a2da85b53f33d2be3a8b6b78edd8b81d8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 10:46:37 2011 +0000
BCR-constructed bwt is now written to disk.
commit 59b265f17783950a711a2559567251e50b3047c9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 10:13:18 2011 +0000
BCR cleanup
commit a31b69554b3c1b88c29a53b960f7d96240b55eea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 15 09:06:22 2011 +0000
First semi-functional implementation of BCR for testing
commit 520c79bedc2f14ea87d3ec15d5f58d542f0cf0b1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 11 18:30:53 2011 +0000
In configure, specify -lbamtools in LIBS instead of LDFLAGS. This corrects the library link ordering and fixes the build in the case --as-needed is used in ld, as in newer versions of gcc.
commit 57b05ac446f2f2e1a4ae7d191992c988e35c1e65
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 11 18:18:38 2011 +0000
Skeleton code for new indexer
commit d591b82270f719ef33d464f66badc3c35194d211
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 10 15:12:51 2011 +0000
Fixed usage message for sga filter
commit aafa0e802ef53181e98957dde8b05aa9dc04984a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 8 13:14:44 2011 +0000
Changed sga walk to remove contained reads and lingering transitive edges when --component-walks is specified.
commit a201b3bdff08fe85c8fae15d4b315cb16f5470e2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 1 09:11:43 2011 +0000
Reverted back to using variation bubble builder in graph-diff
commit d40c6285979766e5f477e6348cc3215e8789c601
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 31 14:31:58 2011 +0000
Trying version of code that relies on haplotype builder - kinda hacky
commit 23a11c47281aa799d1b72bfd16973764fc4bcb40
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 31 11:51:40 2011 +0000
If the best candidate haplotype aligns to the reverse strand of the reference, reverse-complement everything so the variants are on the right strand
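For readers unfamiliar with the operation, reverse-complementing a DNA sequence can be sketched as below. This is a minimal standalone illustration, not the helper SGA itself uses; the function names `complementBase` and `reverseComplementSeq` are hypothetical, and only the upper-case bases A, C, G, T and N are assumed to occur.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>

// Complement of a single upper-case base; anything else is rejected.
inline char complementBase(char b)
{
    switch(b)
    {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        case 'N': return 'N';
        default: throw std::invalid_argument("unexpected base");
    }
}

// Reverse the sequence, then complement every base in place.
inline std::string reverseComplementSeq(const std::string& seq)
{
    std::string out(seq.rbegin(), seq.rend());
    std::transform(out.begin(), out.end(), out.begin(), complementBase);
    return out;
}
```

Applying the function twice returns the original sequence, so a haplotype flipped onto the reference strand can always be flipped back.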
commit 32aaab0dfb6f4b1e506a3c40d4ed1806bf815778
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 30 00:45:14 2011 +0100
Throw an error when the homopolymer length check fails instead of printing a warning
commit 364f5ed5bf6245067b69c2d7a0e3ebd004dfb1d2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 28 14:00:33 2011 +0100
Merged Albert Villela's change which allows adding a suffix to each read ID in sga preprocess
commit 0cde56c2a2214a64c5a02e3902f054eda94af52c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 28 13:48:34 2011 +0100
Changed another assertion to a warning/return code
commit 3b1ca53784e0cf6248ce6e7dc098d30a6ff0fe24
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 27 14:24:08 2011 +0100
Changed dindel assertion to a throw; modified the post-assembly walk finding algorithm to avoid performing enormous walks in the case that the graph has loops
commit c339ae9ab115a4a5cbb64946f8feadafafea5e08
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 26 14:18:07 2011 +0100
Made new dindel integration code thread safe and removed a bunch of prints
commit 15a3790544f2b8107f9cb599988d8949e07a9ede
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 26 09:07:27 2011 +0100
Removed some dindel assertions and changed the branch logic
commit c071fed78dcbdc1de758876d8c7ba47c2f6a4e9c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 25 10:05:03 2011 +0100
More debugging output in VCFTester
commit f6b973daed23f7066655c3dc891812edd65cbcb9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 25 09:44:45 2011 +0100
DindelRealign code now outputs to a stream instead of a file. Also, added more debug output in VCFTester
commit be229f8d532540a735f340bdf77e0985014104b0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 24 15:19:47 2011 +0100
Added extra information in the vcf testing mode of graph-diff to help understand why some variants were not found
commit 2c44bf12fdf168d4c55049d582ddd100eec7af59
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 24 11:33:06 2011 +0100
Added extra stats reporting to vcf tester
commit 102cb4c60796c57d54963e8094a645e3afa9eea5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 24 11:00:31 2011 +0100
Refactored more of the dindel wrapper code
commit 9a2f7bdde87d600962d10d856f700248ab0df006
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 24 10:50:00 2011 +0100
The previous commits that changed the representation of the lexicographic index in the SampledSuffixArray broke binary compatibility with previous files. Updated the magic number to catch these old binaries.
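The general pattern of guarding a binary file format with a magic number might look like the sketch below. The constant `EXAMPLE_MAGIC_V2`, its value, and the function names are invented for illustration and do not reflect the actual values or layout of SGA's suffix array files.

```cpp
#include <cstdint>
#include <istream>
#include <ostream>
#include <sstream>
#include <stdexcept>

// Hypothetical magic value; bumped whenever the on-disk layout changes.
const uint32_t EXAMPLE_MAGIC_V2 = 0x53534132;

// Write the magic number as the first bytes of the file.
inline void writeHeader(std::ostream& out)
{
    out.write(reinterpret_cast<const char*>(&EXAMPLE_MAGIC_V2),
              sizeof(EXAMPLE_MAGIC_V2));
}

// Read the first bytes back and refuse files with a different magic number,
// for example files written before the layout change.
inline void checkHeader(std::istream& in)
{
    uint32_t magic = 0;
    in.read(reinterpret_cast<char*>(&magic), sizeof(magic));
    if(!in || magic != EXAMPLE_MAGIC_V2)
        throw std::runtime_error("file was written by an incompatible version");
}
```

Failing early on a mismatched header is preferable to silently misreading an old file whose fields have a different width.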
commit cf153c7efe53378f20a5b04b1f95618ed0835c52
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 24 10:09:33 2011 +0100
Refactored dindel calling code into DindelUtil
commit 1fac42e959663f26c3cbd3ba6edd2bf253f80491
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 24 09:56:20 2011 +0100
Adding functionality to graph-diff to test existing variants passed in via a VCF file
commit 960132a05e2ad5e6e196d927b52b95a421e19f3c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 21 10:03:55 2011 +0100
Added code to perform a fairly basic selection of the best alignment position from a set of possibilities
commit e6c88f490f3bdb953f9eec70ff14da834c399e1d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 20 15:50:36 2011 +0100
Split tumour/normal calls into separate vcf files.
commit 23a6f0d5df4c032f6a3c5ba7ecfcf244b5867c65
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 20 10:03:01 2011 +0100
Fixed bug in read pair extraction
commit 8012296e968a466dcd2984719017893ef7d1c200
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 20:59:14 2011 +0100
Removed some cruft from sga-asqg2dot
commit b604d4d81081840259e4f06195202e8a8bbedd50
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 20:50:46 2011 +0100
Converted tabs to spaces in sga-asqg2dot
commit 98720c141c77d0c825dc93960dbac37dbfd62930
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 20:45:32 2011 +0100
Added sga-asqg2dot.pl helper script to bin directory
commit 44f850046c92abc99fcb17c9c697b95130c70d24
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 17:55:02 2011 +0100
Reformatted some dindel code to fit the style of the codebase
commit 0f25758774990feabf01c3f34a5e9fd1540f9890
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 13:47:04 2011 +0100
Revised SampledSuffixArray to use a uint32_t to store the ids of the lexicographically sorted reads. This cuts the memory of the data structure in half but limits it to 2**32 strings.
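The trade-off mentioned above (half the memory, at most 2**32 distinct IDs) can be illustrated with a short sketch. The helper names `toCompactReadId` and `idArrayBytes` are hypothetical and are not taken from the SGA sources.

```cpp
#include <cstddef>
#include <cstdint>
#include <limits>
#include <stdexcept>

// A 32-bit ID takes exactly half the space of a 64-bit one.
static_assert(sizeof(uint64_t) == 2 * sizeof(uint32_t),
              "expected 32-bit IDs to be half the size of 64-bit IDs");

// Narrow a 64-bit read ID to 32 bits, refusing IDs that do not fit.
// Valid IDs are 0 .. 2**32 - 1, hence the limit of 2**32 strings.
inline uint32_t toCompactReadId(uint64_t id)
{
    if(id > std::numeric_limits<uint32_t>::max())
        throw std::overflow_error("read ID does not fit in 32 bits");
    return static_cast<uint32_t>(id);
}

// Bytes needed to store one ID per read at the given ID width.
inline size_t idArrayBytes(size_t numReads, size_t bytesPerId)
{
    return numReads * bytesPerId;
}
```

Checking the range at the point of narrowing turns a silent wrap-around into an explicit error when a data set exceeds the limit.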
commit 9bbbc89a951418c4aeb6d93f112f8298f902e4be
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 13:25:46 2011 +0100
Removed the reverse FM-index from graph-diff which is no longer needed.
commit c00030f6880bb00b405f2f8f56272020a37dd20b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 13:12:09 2011 +0100
Implemented function to get the edges of the de Bruijn graph from the FM-index using a single (forward) index. This can be used to cut the memory usage of some subprograms in half.
commit f8ba8ee9216211fe153853ce2c38fdfd1e0611c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 10:49:22 2011 +0100
Implemented testing variants with dindel separately for the normal/tumour
commit 84d27f81a7b2c972d2820f010717316512e06798
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 10:18:00 2011 +0100
Moved dindel code into a new function in GraphCompare
commit 1c228909e0c219f6187831b0bd238eb775fd380a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 19 09:21:50 2011 +0100
Set the bamtools link flag in the case that bamtools is installed in a standard directory and --with-bamtools is not needed.
commit 6f5d90d85ade37445bef4533b6f7623e627a9a6b
Merge: b521fa0 0872075
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 18 18:00:42 2011 +0100
Initial merge and integration of Kees' dindel code
Conflicts:
src/Algorithm/Makefile.am
commit b521fa0b06d5ef83f60a61cea1c63a12401fe093
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 18 12:59:09 2011 +0100
Implemented more helper functions for the hapgen process
commit ef66fee20336ac6642d9d5831db28c4bdd318f54
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 17 15:14:36 2011 +0100
Refactored code into HapgenUtil
commit 08720751f14017baf44f96f03924ab721bf2b472
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Mon Oct 17 12:19:04 2011 +0100
Fixed addSNP to candidate haplotypes.
commit 939080aa38289d16ef28bc2a3e4a3e9f80cfc675
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 13 22:55:55 2011 +0100
Further implemented realignment of haplotypes to reference after discovery.
commit d601af76e7ce0a6d4045ea2fba04a1086f8f47f9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 13 20:32:44 2011 +0100
Stubbed in bwasw alignment of constructed haplotypes to reference
commit 18ae46c918b8477b590d4e4eb23e67d9276f723f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 13 18:38:20 2011 +0100
Implemented loading reference fm-index for graph-diff
commit 9de09b51bd681c1a91900943f612aa7bafa2a3cb
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Thu Oct 13 17:16:21 2011 +0100
Changed calling to ModelSelection. Added EM haplotype frequency estimation. Seems to work.
commit 1906e134c6947d42e2a7904e90fdd36dd76ec775
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 12 18:45:58 2011 +0100
v0.9.17
commit 67acdb10c638c3aa85c37f3a064777346cfff617
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 12 18:44:56 2011 +0100
Added explicit cast to avoid warning on some versions of gcc
commit 14bdbbe8e2dd0df781d5ffd1da0d80682bd18510
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 10 09:37:44 2011 +0100
v0.9.16
commit 3da3bc480578326e2498c05a85d0491263a1388a
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Wed Oct 5 17:24:26 2011 +0100
Makes calls now, did some initial checking and debugging
commit 8b6b59516bb8d5fd6ac88193b4a7288e5283d324
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Oct 4 18:30:18 2011 +0100
Added VCFFile output
commit e9b78c0fdbafa7a1a52e2d35c72cf4354d066935
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Oct 4 18:04:55 2011 +0100
Integrated DindelRealignWindow into HapgenProcess
commit 40a41d5295a3051120b6e376646bdb6d46b3f38d
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Tue Oct 4 17:36:16 2011 +0100
it compiles!
commit be35c78e9faf915d00741d258d2724375e5b331d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 4 11:49:29 2011 +0100
Better implementation of coverage-cutoff based de Bruijn assembler for metagenomics
commit 2fc2aca472954e204bb6d8c5df2e47a31aa73b3b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 3 10:44:40 2011 +0100
Refined version of local coverage based metagenomic assembly
commit 3c9d39937f47e8b5c3b9cc0d8583e2017afeaea4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 3 09:26:53 2011 +0100
v0.9.15
commit 7ed99a28f36534ee7e6217dd81d9d3413bc76272
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 3 09:21:45 2011 +0100
Added strict mode to scaffolder to only keep unambiguous connections in the scaffold graph
commit 810fe13f4f12ffdf7b040abb69c8c8cbf4648286
Author: Cornelis Arnout Albers <caa at sanger.ac.uk>
Date: Fri Sep 30 21:36:12 2011 +0100
Integrated Dindel
commit 140a6322073b600cde7d49d7658494daff53691f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 28 16:04:08 2011 +0100
Added new filter to filterBAM to aggressively get rid of FR contamination in a mate pair library. Added new output to the Scaffold
commit 6ff7e7c6278fc7e9d14b081df220b4949cbca81d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 27 13:26:44 2011 +0100
Removed prints from scaffold sv resolver
commit f84eba059bda5be72b623c9991ec4f1194b28ac0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 27 13:15:19 2011 +0100
Made the gap fill start/end kmer sizes command line arguments
commit f30c232cfb876813e5c48eee603ec8e05376db33
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 27 13:05:09 2011 +0100
Reverted back to the old repeat resolver code
commit 85088f85382d2a56412212298c84597718a4d4d4
Merge: 1873928 a3d8d68
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 27 12:55:33 2011 +0100
Merge branch 'hapgen'
Conflicts:
src/SGA/correct-long.cpp
commit a3d8d68d59c8edd471f7fd25bff40a28baf60619
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 27 12:53:19 2011 +0100
Fixed error from merging gapfill branch
commit f80f404a6400522e981477dbf7335a21241afce9
Merge: 506fa40 a31b6db
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 27 12:47:39 2011 +0100
Merge branch 'gapfill' into hapgen
commit a31b6dbe20623247aebe9cde12aee4e5513c85cc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Sep 25 20:21:28 2011 +0100
Added threading options to scaffold driver scripts
commit 506fa40ebc3c63e57c14e472d635605821f13915
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 23 14:37:59 2011 +0100
hapgen: added function to MultiAlignment to construct an MA from local alignments. Added code to hapgen to pull out read pairs.
commit aa846d7f5f7544aa16dfd2ba9b3689dbb1553c8c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 23 10:10:46 2011 +0100
hapgen now extracts the piece of the reference that is being reassembled.
commit be1cfc7fbbd965cf97cc10e1cfc2113906456215
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 23 09:44:14 2011 +0100
Write out beetl progress to a file.
commit 21a1face227368d9ee67f15437a9dda171c6adc1
Merge: f29b808 d7c4b2b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 22 23:48:44 2011 +0100
Merge branch 'metagenomics' into gapfill
Conflicts:
src/Algorithm/Makefile.am
src/SGA/Makefile.am
src/SGA/sga.cpp
commit f29b80853a1c3e59b2d6981bbe16cdfb3ddada4e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 21 22:43:38 2011 +0100
Added more output statistics to the gapfill module
commit d7c4b2b9d39a8e8333e40eca00349b288e296f30
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 20 18:32:43 2011 +0100
Testing a local coverage based coverage cutoff for the metagenomics prototype
commit 8d548e8db8f5e84ffeb71c77f36d5098d454daac
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 20 18:31:32 2011 +0100
Output start time for main beetl processes
commit c6539d36f34ee167b23d58e1bc21e7fb864e358c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 20 11:59:08 2011 +0100
Now use BWTIntervalCache when calculating de Bruijn extensions
commit de93b5dff823f78f69499b06f40dbc31fb6790cd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 20 11:43:06 2011 +0100
Fixed metagenomics assembly logic when there is an in-branch into a repeat
commit 016d645553ec88165850359db8444d5f6bf414bf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 19 15:44:14 2011 +0100
sga metagenome: implement compare and swap logic to avoid outputting duplicate contigs when two threads assemble the same sequence simultaneously
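The compare-and-swap idea named in this commit might look roughly like the sketch below. It assumes each candidate sequence maps to an integer slot and uses std::atomic from C++11; the ClaimTable class and its method names are hypothetical, and the actual SGA code may use a different primitive.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical table of "already claimed" flags, one per slot.
class ClaimTable
{
public:
    explicit ClaimTable(std::size_t n) : m_flags(n)
    {
        for (auto& f : m_flags)
            f.store(false);
    }

    // Atomically flip the flag from false to true.
    // Returns true only for the one caller that performed the flip,
    // so only that thread should assemble and output the contig.
    bool tryClaim(std::size_t idx)
    {
        assert(idx < m_flags.size());
        bool expected = false;
        return m_flags[idx].compare_exchange_strong(expected, true);
    }

private:
    std::vector<std::atomic<bool>> m_flags;
};
```

A thread that gets false back from tryClaim would simply drop its copy of the contig, since another thread has already taken responsibility for it.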
commit 54e25936cd2eb678d8fb710c293090d98aac304c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 19 11:56:42 2011 +0100
Implemented first pass at metagenomic assembly logic
commit 97f9a96222f7768dd5191a13d798cd4809b47be7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 19 10:43:31 2011 +0100
Changed long read error correction kmer size
commit 7ee342d0b3c2b3d881cf6e36f1a1a0de9ed72be3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Sep 18 19:09:10 2011 +0100
Do not correct a read unless two unique anchors can be found
commit 193008a289dc92a22ff1583c39ebea28042fe107
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Sep 18 16:17:55 2011 +0100
Added descending kmer mode to gap filler and implemented first pass at choosing the assembled sequence which best fits the gap
commit 647f1fbdf1511990a76409fc3d89d65fc21534fe
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Sep 18 15:42:27 2011 +0100
When patching a scaffold gap, remove the overlapping input sequence instead of the gap sequence.
commit 35e130518f5c6e20742a390871147f2ab84cce9c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Sep 18 00:25:29 2011 +0100
Rewrote processGap/processScaffold to be cleaner and more robust
commit eb2f894acb677e7e4d79a61219b2b48b71f615d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Sep 17 14:10:17 2011 +0100
gapfill: the gap sequence is now placed into the scaffold and the new scaffolds written. First functional build
commit 50718876faf0bcd9086e72680c9b2447b6137273
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Sep 17 12:53:58 2011 +0100
Added more informative results stats to gapfill
commit d9242477bfb72f46813b7c94c5194934863bd9e0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 16 16:57:56 2011 +0100
First pass at implementing gap filling logic
commit 65512962aa1c4e1671082dd62017ecceecb9e5a8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 16 16:15:31 2011 +0100
Added skeleton of new sga gapfill program
commit 17598dc3e9162f1b42d0a10d6d9c8238334f69c7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 16 15:44:41 2011 +0100
Brand new long read error correction based on haplotype generation code. The algorithm finds kmer anchors on the long reads, then builds putative haplotypes through a de Bruijn graph between them.
commit 1873928ddc70b95654bfc7f82d991377a99384f1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 16 13:01:51 2011 +0100
Fixed bug in small repeat resolution algorithm
commit b2de45c3e4d17a467894f74b6bd82b49728be97e
Merge: e0b7c6b 5b6cbef
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 16 09:52:08 2011 +0100
Merge branch 'small_repeat_rewrite'
commit e0b7c6b3c555f5a29343fc5c41f88bd1eed31234
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 16 09:20:34 2011 +0100
correct-long: use sai file instead of ssa
commit eeabee7f7c700a7a97d9894d2f64ae25e0e36dda
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 15 15:26:03 2011 +0100
First pass at extracting the reads mapping to haplotypes in hapgen
commit c9664b7882b43ccaaf465f0c27d8e6a1a8340b0c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 16:25:16 2011 +0100
Wrote utility to build a simple multiple alignment from an array of strings
commit b441a843e05fd48155ceebe57521b1cd5dc878e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 14:51:18 2011 +0100
hapgen: properly handle cases where the anchors cannot be found
commit ee158f55e7547d63c98111322f4be31ac4a8b537
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 14:21:35 2011 +0100
Tweaked hapgen debug output
commit 0869d0d187b31c0509950a05d2c673c1eb413004
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 14:13:36 2011 +0100
Added initial haplotype generation functionality
commit 7fab6d27671cc09801cd30b0bbadb8514a6489e0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 11:26:41 2011 +0100
Added skeleton of new hapgen program
commit b47718a0e2d438c23e8ddabc8e0d334fd7807c23
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Sep 11 01:07:14 2011 +0100
added parallel processing framework to the metagenome assembly subprogram
commit 16f6f01e05f2da0cb459cde32ec18771dcac8df0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Sep 10 12:37:10 2011 +0100
Added metagenome assembler program skeleton
commit 1da5d6042091ca67b295f33a9dd31d68aef0b27c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 18 15:41:35 2011 +0100
Changed variation bubble builder to better support uncorrected sequence graphs.
commit 365c36486d435d56cc852f052e5635efdccf2380
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 13 10:04:28 2011 +0100
Removed dust check TODO message, which was implemented in a previous commit
commit 971cab2d96ed4a3172bfade0f15e280add177a2c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 13 10:02:56 2011 +0100
Fixed variant file output name.
commit 992e08e0ab79f4dc70c60343121bb38254c7a083
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 12 11:04:39 2011 +0100
Sort VCF file by the order of the reference sequences in the input BAM file instead of strict lexicographic ordering.
commit 7f67a70fb320cd6cd33616b5c54a8eaa3706d734
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 11 15:17:44 2011 +0100
Fixed -o/--outfile option to graph-diff
commit bc7c976cd74a3c4648dd5dd8ce3acc4d0d199e09
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 11 10:08:09 2011 +0100
Added interval cache to graph-diff to speed up computation.
commit 6ee7c8ef1e524df1120d48a8c5aaa3f781efa504
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 11 09:56:04 2011 +0100
Allowed target portion of the bubble to branch. Controlled with the -y command line parameter.
commit cdc893e11173c04968d6dd530efb68abd2ff4e2a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 8 23:40:33 2011 +0100
Removed print
commit b34bd0b2dcb005525905d73a7e4a5fb9730796dc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 8 23:30:40 2011 +0100
Added additional sanity checks in var2vcf to allow the processing of a real human genome call set to go through.
commit 7cba0733e27c5dc6447e565d77ef8b9fda23c9b7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 8 13:49:49 2011 +0100
Added quality filter and fixed VCF coordinate calculation in case where an insertion occurs along with a second variant.
commit 593f239ccd6939e328b50f60425c59cd7484090a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 8 08:56:54 2011 +0100
Implemented var2vcf to turn variants found by graph-diff into vcf records.
commit ad1b747c111fb2da22926bd1343aae04c5df6d42
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 6 22:32:53 2011 +0100
Added proper substring function to DNAString
commit ebc7e2b4911d8b93fd16f4e9f7739eb28193e79e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 6 21:37:25 2011 +0100
Reenabled sanity check insertion in var2vcf
commit d4c1ce65bc421fa5a3777a241b945e6384141a3e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 6 21:36:18 2011 +0100
Created program and initial parsing code for var2vcf convertor.
commit 088e2956d22869cf37eac14b787389aeb5e4cee1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 6 19:48:08 2011 +0100
Whitespace change only
commit 51f3ad680d7b72c5b5f3c1744adf6a03e583e500
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 5 14:46:53 2011 +0100
Fixed boundary check for ignoring low-coverage edges
commit 39d02ec28bf7f25e4a8a1041871252cc0799b514
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 5 14:30:13 2011 +0100
Added coverage output to variants.fa
commit 83f1ff0a6e13e8519dc384bdb36ebe321ca0776e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 5 13:55:54 2011 +0100
Added extra variant kmer marking to avoid double counting the bubble construction failure reasons
commit def43905585e0643f620d1291db45aeee6834bee
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 4 14:12:41 2011 +0100
Re-enabled writing out variants to a file.
commit f3f2263f517138ce69d17926c3f7b2fd944b8367
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 4 11:40:24 2011 +0100
Added logic to filter out low-frequency kmers when attempting the process on an uncorrected data set.
commit 20a3dcffbc420a483a231cfac5f75441e18a5a8c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 4 10:59:20 2011 +0100
Moved from the substring graph traversal algorithm to a more standard SequenceProcessFramework-based kmer traversal.
The new method is (much) faster for collections that have many low-frequency kmers due to sequencing errors.
commit 1a64ae1609cc033c399d1e2bf2a49ae118fcd384
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 4 09:06:29 2011 +0100
Removed unnecessary assertion when a loop is found when building the target bubble
commit 166fae1e49de9c5bdd29d409dad73368a0e3061d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jul 2 11:40:29 2011 +0100
Implemented threaded mode for GraphCompare.
commit 723366bdcf8b754e177801a90a2721dea89d0e18
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 1 14:41:22 2011 +0100
Changed GraphCompare status print condition
commit fe6322cc1c5552635ec8aa60bd52ef74e652e367
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 1 14:38:58 2011 +0100
Added parameter object to GraphCompare
commit 78f9dd32118e6a0dc41c8ff61e9bf542e8c73215
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 1 14:28:01 2011 +0100
Refactoring
commit dcefc145c5b2461b600951d7964dfc7dc3186c13
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 1 13:37:31 2011 +0100
More refactoring.
commit 25752dc3607a3167d3e0565a2720e26942569e6c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 1 13:26:39 2011 +0100
Refactored BubbleBuilder code into its own file.
commit ab19808abbd0b695f4ff53be106dbe9b697dd2a7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 1 09:55:48 2011 +0100
GraphCompare now writes out the variants that it finds in fasta format.
commit a92af5f4ff9574c2467c68bc7b0cce6fc3fe0fb4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 14:52:49 2011 +0100
Added better reporting metrics for the success rate of the bubble discovery process.
commit 09ca7e520c74a052b69c528c64f9b55c84dae7d1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 29 15:46:38 2011 +0100
Implemented bitvector marking of used k-mers to avoid outputting duplicate variants
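A minimal sketch of bitvector marking, assuming every k-mer can be mapped to an integer index (for example a position in the index; that mapping is an assumption and is not shown). MarkVector is a hypothetical class written for illustration and is not the bit vector used in SGA.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-size bit vector: one bit per k-mer index.
class MarkVector
{
public:
    explicit MarkVector(std::size_t n) : m_words((n + 63) / 64, 0) {}

    // Set bit i. Returns true if the bit was previously unset,
    // meaning this is the first time the k-mer has been used.
    bool mark(std::size_t i)
    {
        assert(i / 64 < m_words.size());
        const std::uint64_t mask = std::uint64_t(1) << (i % 64);
        const bool first = (m_words[i / 64] & mask) == 0;
        m_words[i / 64] |= mask;
        return first;
    }

    // Query bit i without changing it.
    bool test(std::size_t i) const
    {
        assert(i / 64 < m_words.size());
        return (m_words[i / 64] >> (i % 64)) & 1u;
    }

private:
    std::vector<std::uint64_t> m_words;
};
```

A caller would output a variant only when mark returns true for its k-mer, which skips variants that were already reported. This version is not thread-safe.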
commit 47e74f94995a61450aea4f760159c15e99226a22
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 29 14:17:32 2011 +0100
Added code to build the sequence of the bubbles once a differing k-mer has been found.
commit b9745c14058d59997c8eb55cbf9ce652a4edea49
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 27 13:33:34 2011 +0100
Made branch code detection in GraphCompare more efficient
commit b5ad150123cb3a316911620d5075255503e89bb7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 27 12:53:02 2011 +0100
Initial k-mer traversal code implemented
commit 4ed699e75171c2e248b6fcc799b64d1130522861
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 27 11:18:23 2011 +0100
Finished skeleton of graph-diff
commit c6c2e3c6bb02bcda0b4f22a289efa26dd15d41d1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 27 11:02:50 2011 +0100
Enabled graph-diff, cleaned up help message
commit 2b5fd709f0b16a2dab192fdc45c74353edf80de5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 27 11:01:19 2011 +0100
Added skeleton of graph-diff program
commit aa76eab9a3fa48141bbe93623ae9a7586c1b55b8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 15 15:31:42 2011 +0100
modified SampledSuffixArray to optionally work over the lexicographic index only (no samples). gen-ssa now avoids loading the read names to save memory.
Conflicts:
src/SuffixTools/SampledSuffixArray.cpp
src/SuffixTools/SampledSuffixArray.h
commit 34a7b850dd5f3e26fb0b03c7c8034aaa7969ae52
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 10:19:15 2011 +0100
Changed beetl index to use a named version of sga. Not for release yet
commit 9b433295e7e9df9de1b940d2cd7bc23c877e0811
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 14 09:53:59 2011 +0100
Fixed usage message for correct and correct-long
commit cdcf75068a66def7e703b7c9ff516d8c9a9eb59f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 12 19:52:05 2011 +0100
Version 0.9.14
commit cf422d578deaed1ac3f3c5c9aaee6d7f2c03c360
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 12 19:36:20 2011 +0100
Track the number of vertices that have been merged into each vertex to properly decide which walks to retain when removing bubbles.
commit 5b6cbef284458de23b580df9590db2ba5153d6ae
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Sep 10 11:55:47 2011 +0100
Rewrote the small repeat resolution algorithm to be much faster. New algorithm is slightly more aggressive.
commit f282ce8a036d3af3a61ef9f40f453cca4d0c96b6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 8 15:08:19 2011 +0100
Changed the sample rate in bwt2fa to use less memory
commit 8b89f86880138ee48af00e1e84b93ffc51bd45bd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 8 15:04:41 2011 +0100
Added new subprogram to extract the set of sequences from a bwt
commit 1ed9d452cea6f16275e0a4bb78525875d81d1804
Merge: 72248a4 f6045df
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 2 14:28:07 2011 +0100
Merge branch 'beetl'
commit 72248a4b5d7fa9f97811952633798f7cd517890d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 2 11:54:17 2011 +0100
Update human assembly instructions.
commit afc9c30d4d690e36a2ae9c797b0b7ab96fb80d59
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 2 11:50:08 2011 +0100
Added human genome assembly instructions and updated c. elegans script with the parameters used in the sga paper.
commit f6f0996ef2486dfdd3497362451e735384cb8563
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 2 10:35:24 2011 +0100
Fixed bug where the reverse index would not be used when the overlap method of correction is specified.
commit f6045df8d22c50128b05e6e03e123c602ef08214
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 17 11:43:14 2011 +0100
Created script to generate a BWT using beetl and convert it to SGA's format.
commit 87dd99074f288c7be7ffbd4648921cd58aa78d70
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 16 11:19:41 2011 +0100
convert-beetl now writes out an .sai file
commit 741f6a13ccbc3c98cb5caf301379bcd68a9c4058
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 16 10:31:08 2011 +0100
Removed debug code from convert-beetl
commit 4f8232c49759049df16cdcaebb405a5e355b6897
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 15 16:05:11 2011 +0100
Version v0.9.13
commit a58c811f3f0ecd7e81cffb9a468ad83a44aa7dce
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 15 15:45:17 2011 +0100
Fixed rare bug in the scaffold builder where a contig could be added with the wrong orientation if a unique walk is found between the contig pair with orientation opposite to that of the link.
commit 99d0df37b3620b52ee035e901f1682829f8d2c63
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 15 14:39:29 2011 +0100
In-progress checkin of subprogram to convert a BEETL index file into SGA's format
commit b2fe691d0179fcb36e4ac9ff015031bbe00d41b6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Aug 6 16:51:10 2011 +0100
Fixed a bug in the suffix array validation code. The validator assumed that suffixes with the same string were sorted by read name when they are actually sorted by position in the file. Thanks to Tomas Larsson for the bug report and test data.
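The ordering rule described in this commit can be illustrated with a comparator that breaks ties between identical suffixes by position in the input file rather than by read name. SuffixEntry and suffixLess are hypothetical names for this sketch; the real validator works on the suffix array itself.

```cpp
#include <cstddef>
#include <string>

// Hypothetical record: a suffix string plus the index of its read in the input file.
struct SuffixEntry
{
    std::string suffix;
    std::size_t readIndex;
};

// Order by suffix text first; equal suffixes are ordered by file position.
bool suffixLess(const SuffixEntry& a, const SuffixEntry& b)
{
    if (a.suffix != b.suffix)
        return a.suffix < b.suffix;
    return a.readIndex < b.readIndex;
}
```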
commit 7ad38d1adf79abd8a6453414a709cb9248edf359
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 25 10:30:41 2011 +0100
Cleaned up unused files
commit 0e12d879d1f081e0f4c07fba59eeaf8f68d4ee1b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 25 10:19:03 2011 +0100
Updated version to 0.9.12
commit a4acf751ddf580d3546a6f8e3727a770a269b4da
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jul 21 14:32:58 2011 +0100
In sga filter, exit the homopolymer check if the read length is shorter than the k-mer size. The homopolymer and complexity checks are now disabled by default.
commit c8bfbfa93a5a2bb4ed0504c8af9a17adb9173ff1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jul 21 10:34:27 2011 +0100
Now sga walk will write out the reads making up the walk string in SAM format if the --sam option is given. This replaces --description-file.
commit 389851bbeed2a45c29b7c650b85c99d2ad87e635
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 19 13:50:12 2011 +0100
Version 0.9.11
commit 57fc244220ee2d0cfa54705b369e5f1dde940745
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 19 11:46:44 2011 +0100
Cleaned up comments
commit d3129e03f3c76acc172d599f164facb2fdf04921
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 19 11:44:09 2011 +0100
Added filter for homopolymer sequencing errors and very low complexity sequence
commit a8d3bffd13abbcea5a099211bc77f3366e80e2bb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 18 15:28:10 2011 +0100
Removed hard-coded threading option to bwa aln
commit 1c0539057e78e3b32e30ffbd59c543631778c9a6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 18 15:27:39 2011 +0100
Added function to compute an interval pair using cached intervals.
commit c7aa72e5ceda01b81f67d8d6c23bfd9d0366e3e8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jul 18 13:46:45 2011 +0100
Refactored sga filter. Moved arguments to QCProcess into a parameter object.
commit c9e1f344baf0e3b40c0307331a0c2228aa2a964c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 15 22:45:52 2011 +0100
Increased version number to 0.9.10
commit bcaf723dca41a4a292dfae1a9306230dcb3b216e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 10 12:24:41 2011 +0100
Added ability to compute overlaps between two disjoint sets of reads.
commit 3312036ce1cc04b347b70c0abf0779dabd9ea4a2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 10 11:42:51 2011 +0100
Temporarily disabled the threaded version of multi-key quicksort.
commit 5c1b929c5e0e70752ac90981da4d8fde2d355b49
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 10:38:17 2011 +0100
Fixed cluster size computation
commit e9378ca2f89e5b8ffaac6232849c1b010042b878
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 10:35:33 2011 +0100
Added ability to cluster based on seed sequences that are not present in the FM-index.
commit 2b0381e1ac9af8604a5a3e617c7d8388be357c04
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 09:51:41 2011 +0100
Added example script for the Illumina MiSeq example data. Includes the scaffolding component.
commit 67e255629e6e47178ca2862e9cf1e2bae567c363
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 09:50:43 2011 +0100
Deleting deprecated sga-pipeline script
commit 6b559aa3492e410a75786107c1199c4adab2ce93
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 09:49:49 2011 +0100
Cleaned up build. Added a Makefile to install the scripts the sga scaffold pipeline requires (astat, bam2de).
commit 450b1dc31e85e476fb41d881839f642ee20f32f1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 30 09:48:46 2011 +0100
Increased default minimum branch length in sga assemble to better handle long (150+) reads.
commit 9d21e06fea232400ca9aa1c2fa688e84d98e1d7b
Merge: d79f80c 2ba20b4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 27 09:32:15 2011 +0100
Merge branch 'long-align'
Conflicts:
src/Algorithm/Makefile.am
src/Util/Makefile.am
commit d79f80c292e3b959e86c7ad34e526b52b0bb7e51
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 21 15:01:37 2011 +0100
Added missing shortopt for --min-branch-length to sga assemble
commit 6e6517e9758d4bdbc5545cca84d6a4eb882a35ca
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 20 09:18:51 2011 +0100
Changed description in sga-cluster boilerplate
commit 97d9325012fbf2bdbdadbdcfc7b7cbb7cc35bebf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 17 16:09:57 2011 +0100
Added some defines to sga cluster to hide away the hideous template function calls
commit 24976493aaf9cf362de0eb80c5ebc3f5f8acc66e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 17 16:00:02 2011 +0100
Fixed formatting of sga cluster help
commit ac8f337e7d1431b8196a71700662920ef2f52241
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 17 15:53:20 2011 +0100
Enabled control of the max cluster size
commit 363407d47347d26662f17f07f2b3e454b93de659
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 17 15:46:53 2011 +0100
Implemented extension mode for sga cluster. This required refactoring the SequenceProcessFramework to use a generic generator object.
commit 330061b1edcb9d01f1a8ce6a2b982287e3b8a0d4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 17 14:07:57 2011 +0100
Refactored cluster generation code into ReadCluster class. Added stub subprogram for cluster-extend.
commit 6fc2d4486c5b99b47dd23661563880833e833b33
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 15 11:08:06 2011 +0100
Added read length check to k-mer filter.
commit 9047152c73ce4e86559ed98fd8e1d92bab2d22a5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 15 10:59:00 2011 +0100
Added early exit to kmerCorrect when the read sequence is shorter than the kmer length.
Thanks to Dan Hughes for the bug report and test case.
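The kind of guard this commit describes can be sketched as below. kmersOf is a hypothetical helper, not the kmerCorrect function itself; it only shows why the length check has to come before computing the number of k-mers with unsigned arithmetic.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper: list every k-mer of a read.
std::vector<std::string> kmersOf(const std::string& read, std::size_t k)
{
    std::vector<std::string> out;

    // Early exit: without this check, read.size() - k + 1 would wrap
    // around for unsigned sizes when the read is shorter than k.
    if (k == 0 || read.size() < k)
        return out;

    const std::size_t nk = read.size() - k + 1;
    for (std::size_t i = 0; i < nk; ++i)
        out.push_back(read.substr(i, k));
    return out;
}
```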
commit 2ba20b4e5114c8ac4f27ebfda95137a42261a115
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 14 18:23:52 2011 +0100
Made the extension termination condition more robust. StringThreader now returns a trimmed alignment to avoid bad-tails of the long reads.
commit 19dc59db0fbe14eb2539df59c195e551fa798924
Merge: d4334a9 7b2342c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 14 17:35:02 2011 +0100
Merge branch 'master' of git at github.com:jts/sga
commit 8f2eba2f6d327de8dce738c8319eeb3ca7019817
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 13 17:06:34 2011 +0100
tweaks to graph-based correction algorithm
commit 80b061c850a8e8ddb66eb1ad8277cf0bb71a88a5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 13 15:10:48 2011 +0100
Implemented outputting corrected sequences from the StringThreader. Still a bit hacky.
commit c43088a12d713b58f254e6559f74649ba98d96b8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 13 11:58:43 2011 +0100
Wrapped the long read correction process in a class to use in the SequenceProcessFramework.
commit 3330bc0f294f8f453644adf642f0e705b22f77e2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 13 10:51:42 2011 +0100
Implemented writing out the corrected reads to a file.
commit 9de7e86ce88d62bee27b7a99564c82bb58aabd92
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 13 10:26:33 2011 +0100
New culling heuristic for StringThreader
commit 412356d04aed6aa14331047260c21e9a0ed9b3a0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 16:12:16 2011 +0100
Added a few more helper functions to the string threading code. Currently, the search explodes when threading through repeats that are slightly divergent.
commit 9f1e8d6d5fe1be363365deda65105578622fd262
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 15:17:25 2011 +0100
Full extension algorithm complete, including culling leaves once their error rate is too high.
commit 761aa1e077c8d80a2af6b34e5050a895954130c9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 15:04:56 2011 +0100
Added code to calculate the extended alignments for the StringThreaderNodes
commit 6b992a6f92b703f2023a9c8c59eccf96206c9939
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 14:07:04 2011 +0100
Integrating ExtensionDP into StringThreader. Initialization of alignment for root node complete.
commit 883bdc4824d279fcb92dd0da1f7d75a6911c0717
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 13:48:45 2011 +0100
Added function to ExtensionDP to calculate the local error rate of the alignment.
commit 8e5d79b724b783531ceb273a22f4ea8e656b7d57
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 11:57:26 2011 +0100
Implemented function in ExtensionDP to print the full alignment.
commit 9e0c8979dfa43fdc2ff450cc5c01a886421fbf21
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 10:14:52 2011 +0100
Changed ExtensionDP to use an edit distance scoring system. Fixed off-by-one in makePaddedStrings
commit 7b2342cd0f4be2e214ae2c9b24461e0e0e529430
Merge: 9b82849 d7e5739
Author: jts <jared.simpson at gmail.com>
Date: Fri Jun 10 01:45:55 2011 -0700
Merge pull request #7 from avilella/master
a small typo in a cerr
commit d7e573980cbdd6eecda3611c4ec5ec5390aaaa1a
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 10 09:36:32 2011 +0100
fixed typo --component-paths should be --component-walks in cerr
commit eca15cfb0eb9a151839705774f48ec68d380c65a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 10 09:13:37 2011 +0100
First implementation of extension dynamic programming code.
commit db66e36894dbc43cbfeb1a5f55930f7bdac846eb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 9 13:38:59 2011 +0100
Added core extension functionality to StringThreader
commit 1fb52124ad5eac620a1885847c623dfc4015cca0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 9 11:25:24 2011 +0100
Started string-threading extension code.
commit a6773e4f6345df49604d9f17ec45ac972c8f0dec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 8 18:19:52 2011 +0100
Refactored a bunch of code that uses stdaln into StdAlnTools. Started work on the new graph-based correction code.
commit 27164738cc1aefd2113f4ec98fe3832f099bc594
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 8 09:32:33 2011 +0100
Resurrected bwa-like saveHits function. Removed targetString member from LRCell/LRHit.
commit e9faeeef388dbaf5af068a71d227894d94083fdd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 7 15:13:08 2011 +0100
New experimental long read correction algorithm based on threading the read through a de Bruijn graph
commit d4334a9afe95756930e35f29bc52eeac098d2d58
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 7 11:06:12 2011 +0100
Check sem_init return codes to catch errors when this function is not implemented (on OSX). This is a stop-gap measure until I have time to switch to named semaphores.
commit 32825a8394a87366c7900b0679d7579836ac0753
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 16:51:22 2011 +0100
Added simple function to find new LRHits based on overlapping reads.
commit c50b3dc66fda2188e195e2c96fdb361ef371dc3d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 15:51:43 2011 +0100
Turned off hit extension for now
commit 76e9362991714eebf140c44e37576f5283953320
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 15:40:34 2011 +0100
Further refinements of LRHit extension. Not complete
commit 22deb14253402bcbf7d083e7756116e32fbfdd8c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 14:36:55 2011 +0100
First pass at extending LRHits to be full-length alignments
commit ed16c6ae5b8a770e8a866e324709921d152fb7a4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 11:00:21 2011 +0100
Renamed LRHit data members to have more sensible names
commit 29eb79b0418f6570ac153e3b7765c86ed5c6a26a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 10:22:17 2011 +0100
Added new LRCorrection module to implement the long-read correction code
commit 28d3763dbf36ebdf395ed28950bd2cd06fa1dc65
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 6 09:33:32 2011 +0100
Code cleanup in LRAlignment
commit 9c642214a85ea4ff3ee4df0b70f0db9143870cb5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 5 22:11:24 2011 +0100
More refactoring of bwaswAlgorithm. Changed saveHits to keep all good hits, instead of just the top 2 for every position.
commit 2b765186242dc9f1e526ae69c44833a7a0f61432
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 5 21:56:54 2011 +0100
Removed a bunch of debug info and refactored some code from the main bwaswAlignment function
commit 4026439597b3aed35fdd00ba5e1e2188bb88bc16
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 5 20:48:22 2011 +0100
Removed some debug output from the long read aligner
commit 1abb562b9d381d06627eadd9caaace624ea7120c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 5 19:57:10 2011 +0100
Added function to save the terminal hits in long-read alignment mode
commit aabaf2133ccd67e03ab2135d9a8aaa47e21e0682
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 01:19:50 2011 +0100
Added extra debugging info to the long read aligner.
commit da1f81da31d5a50f0437cea7f911131e68f88bf8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 3 16:19:16 2011 +0100
Added extra output to MultiAlignment::print
commit 6f7513255a946b0c30e03fb0233a5e1ee6c1f371
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 3 15:04:37 2011 +0100
Added --cut argument to sga correct-long to choose cell reduction heuristic
commit aef8602f311939876505cdb96ddd8654b2863d10
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 3 14:16:08 2011 +0100
long read aligner: Added two new methods of cutting the tails of a cell array
commit b7c2daa044723cab920de3f03afa9dc420a20ef1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 2 13:42:22 2011 +0100
Improved version of cutTail that does not choose the cells to keep based purely on the highest score, but instead uses the fraction of the maximum possible score for the cell. This will not be the final version of cutTail, however, as it still discards useful hits.
commit 17ce329bf55ca0eff735be4d44d119811a828bc8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 24 16:04:55 2011 +0100
Implemented cutTail function from bwasw to remove possibly erroneous cells from a stack. Currently it discards too much and removes a lot of valid hits.
commit 6f5d3d92a794bba7ad3745eeb1c37e387228f078
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 23 15:31:23 2011 +0100
Implemented function to transform a multiple alignment into a consensus string.
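
One simple way to turn a gapped multiple alignment into a consensus is a per-column majority vote. The sketch below assumes rows of equal length over the symbols A, C, G, T and '-', and drops columns where the gap wins. The commit does not say which rule sga uses, so the voting rule, the tie-breaking and the name `consensusFromRows` are all assumptions.

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical helper, not part of sga: majority-vote consensus of aligned rows.
// Assumes every row has the same length as rows[0].
std::string consensusFromRows(const std::vector<std::string>& rows) {
    std::string out;
    if (rows.empty()) return out;
    const std::string symbols = "ACGT-";
    for (std::size_t col = 0; col < rows[0].size(); ++col) {
        std::array<std::size_t, 5> counts{};  // zero-initialised tallies per symbol
        for (const std::string& row : rows) {
            const std::size_t idx = symbols.find(row[col]);
            if (idx != std::string::npos) ++counts[idx];  // ignore unknown symbols
        }
        // Pick the most frequent symbol; ties go to the earliest symbol in "ACGT-".
        std::size_t best = 0;
        for (std::size_t s = 1; s < counts.size(); ++s) {
            if (counts[s] > counts[best]) best = s;
        }
        if (symbols[best] != '-') out += symbols[best];  // gap-majority columns are skipped
    }
    return out;
}
```

For the rows `AC-GT`, `ACAGT` and `AC-GA`, the third column is dropped because the gap has two of the three votes.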
commit ed2e590f27a0b39cf6dfe09547a9ab7e1b369696
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 20 16:51:07 2011 +0100
Implemented gapped MultiAlignment class
commit 9560efb1535f74559244ae3de5ace4949d95e6d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 17 16:41:23 2011 +0100
Modified to read query sequences from a file.
commit e4d1ef645797397c96083027963c1ac7b1cf4e0c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 17 16:35:45 2011 +0100
Macro'd out the bwa compatibility print statements
commit 92ff2e2166d35f37f06ffc707db06198c936113b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 17 16:17:56 2011 +0100
Created new subprogram correct-long to hold bwa-sw porting work. Added generateCIGAR function to LRAlignment
commit 3b416b8515a1a57c43e563751008dd600f69d8f6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 16 14:12:22 2011 +0100
Implemented O(N) algorithm for constructing the sampled suffix array from a BWT. Implemented reading/writing SSA.
commit 62232bced0f8d0064c1b69ac055ff9209ec07fa6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 16 12:57:58 2011 +0100
First implementation of sampled suffix array data structure and gen-ssa subprogram.
The sampled suffix array is used to calculate suffix array elements for a given suffix array index.
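
The idea behind a sampled suffix array can be sketched on a single '$'-terminated string: store only some suffix array entries, and recover any other entry by walking the BWT with LF-mapping until a stored entry is reached. This toy version builds the samples from a naively sorted suffix array and counts ranks by scanning, so it is neither the O(N) construction mentioned above nor sga's multi-read layout. All names here are hypothetical.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical toy structure, not sga's SSA: BWT plus every rate-th text position.
struct SampledSA {
    std::string bwt;
    std::map<char, std::size_t> lessThan;        // number of symbols smaller than c
    std::map<std::size_t, std::size_t> samples;  // suffix array index -> text position
};

// Full suffix array by direct sorting; only suitable for tiny inputs.
std::vector<std::size_t> naiveSuffixArray(const std::string& text) {
    std::vector<std::size_t> sa(text.size());
    std::iota(sa.begin(), sa.end(), std::size_t{0});
    std::sort(sa.begin(), sa.end(), [&text](std::size_t a, std::size_t b) {
        return text.compare(a, std::string::npos, text, b, std::string::npos) < 0;
    });
    return sa;
}

// text must end with a unique smallest terminator such as '$'; rate must be >= 1.
SampledSA buildSampledSA(const std::string& text, std::size_t rate) {
    SampledSA s;
    const std::vector<std::size_t> sa = naiveSuffixArray(text);
    const std::size_t n = text.size();
    for (std::size_t i = 0; i < n; ++i) {
        s.bwt += text[(sa[i] + n - 1) % n];               // symbol preceding the suffix
        if (sa[i] % rate == 0) s.samples[i] = sa[i];      // position 0 is always kept
    }
    std::map<char, std::size_t> counts;
    for (char c : text) ++counts[c];
    std::size_t total = 0;
    for (const auto& kv : counts) {
        s.lessThan[kv.first] = total;
        total += kv.second;
    }
    return s;
}

// Recover SA[i]: each LF step moves to the suffix starting one position earlier.
std::size_t lookupSA(const SampledSA& s, std::size_t i) {
    std::size_t steps = 0;
    while (s.samples.find(i) == s.samples.end()) {
        const char c = s.bwt[i];
        const std::size_t rank =
            static_cast<std::size_t>(std::count(s.bwt.begin(), s.bwt.begin() + i, c));
        i = s.lessThan.at(c) + rank;
        ++steps;
    }
    return s.samples.at(i) + steps;
}
```

A larger sampling rate stores fewer entries at the cost of more LF steps per lookup.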
commit 6438a7c3823483977c5f3c29b8dbd38e2b39438a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 15 17:31:30 2011 +0100
Implemented saving found hits. Still very much development, not for use.
commit 959bcc9c102c8f6984f980a02e5993489e9e880d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 14 10:40:30 2011 +0100
Implemented most of core bwasw algorithm. Output verified to match that of Heng Li's implementation. Does not restrict the number of nodes to track (z-best heuristic) or output the alignments. Lots of debug information in this version, not usable.
commit 5b92f6be2ff964e1507642642753bae2a6a996bc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 13 10:17:18 2011 +0100
Updates to bwa-sw algorithm
commit 65847f72133747c9465372efa0d58c256ab6ac07
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 11 09:43:01 2011 +0100
Created files and framework for implementing bwa-sw algorithm. Not functional or usable.
commit 22e193583002e70dcfcb0711d955c21bc1c5e2fc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 5 20:16:19 2011 +0100
Fixed gcc 4.6 compile warnings
commit 9b82849e79aebf2fd3e15c10b07347128790a97f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 18:15:03 2011 +0100
Handle Ns in Util::complement
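
As an illustration of what handling Ns means here, a complement function can map the ambiguous base N to itself rather than rejecting it. This is a sketch only: the names `complementBase` and `reverseComplementSeq` are hypothetical, and the real `Util::complement` may treat other characters differently.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not sga's Util::complement: complement one upper-case base.
char complementBase(char base) {
    switch (base) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        case 'N': return 'N';  // an unknown base stays unknown
        default:
            throw std::invalid_argument("unexpected base in sequence");
    }
}

// Reverse-complement a whole sequence using complementBase.
std::string reverseComplementSeq(const std::string& seq) {
    std::string out;
    out.reserve(seq.size());
    for (auto it = seq.rbegin(); it != seq.rend(); ++it) {
        out += complementBase(*it);
    }
    return out;
}
```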
commit 4bd09a64c18cb0fae260765d4a5ace46131e6f9b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 16:27:56 2011 +0100
Added std::map-based ScaffoldSequenceCollection for scaffolding sequences that do not belong to a graph.
commit 0b81eaff8ef8292c05167de42e6a26f29363476f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 15:13:54 2011 +0100
Factored out the scaffold input sequence container into ScaffoldSequenceCollection so that it does not require using a StringGraph
commit a8e01ff8228bbcd126f2c40662f92fd389c248bb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 13:57:49 2011 +0100
Rewrote some ScaffoldRecord functions to take in a parameter object
commit 15c05999592494fbefead689c09e0412877c9696
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 01:34:22 2011 +0100
Changed wording in a-stat warning
commit 914e237a4c490341fd68b536a988dba274dfbb90
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 4 01:26:31 2011 +0100
sga scaffold: allow loading sequences with ambiguity codes, disable the requirement that an a-statistic file is provided.
commit 82ad6c07270af66e80cc559f78af58d6e652646f
Merge: f106482 d190256
Author: jts <jared.simpson at gmail.com>
Date: Fri Jun 3 08:30:57 2011 -0700
Merge pull request #6 from avilella/master
merging change by avilella to allow the sequences output by sga walk to have a user-defined prefix
commit d19025646692d6c74704be31a0fd3706aae01574
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 3 16:24:18 2011 +0100
deleting INSTALL for now
commit 8e97f7ffaaa88164023d852ab581a87612369ef4
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 3 16:13:03 2011 +0100
walk with prefix tweak
commit c587c1d7f54ea92b75c6ec9e18c16a62ac02f9be
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 3 15:51:29 2011 +0100
adding prefix to walk for convenience to the pinball pipeline
commit a9a2a0093d541f84d3912ba9f239f4adab3a43c1
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 3 15:50:49 2011 +0100
reverting to original walk
commit 6b1124c4a87909cb8d69b7510326bd84c62429f1
Merge: 377a67b f106482
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 3 15:09:24 2011 +0100
Merge github.com:jts/sga
commit 377a67b93971e499a2264dde111858fb0520f4d0
Author: Albert Vilella <avilella at gmail.com>
Date: Fri Jun 3 15:09:05 2011 +0100
adding prefix to walk method for convenience
commit f106482fb9ffff8b1b8087885ea98ffe52777633
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 31 23:10:33 2011 +0100
Updated main README file.
commit 329ea20fea225d89e10ba72e07e51d190b071701
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 31 23:03:16 2011 +0100
Tweaked wording in top-level README
commit f9532a1f90786f9365e1547cae983be5d17707c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 31 23:01:54 2011 +0100
Added short README to the top level directory.
commit 3694bb3e022eee5697ab9bd98fa134ffdb0adbed
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 31 16:03:44 2011 +0100
Incremented version to 0.9.9.
This version includes running-time improvements including:
-an interval pre-computation scheme to reduce the time required when performing k-mer interval lookups
-the ability to suppress construction of the reverse FM-index (when performing k-mer correction)
commit 7b8faeea57f2a20305394a4d2d5501e76a1bf509
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 30 14:12:58 2011 +0100
Added a directory with sga examples. Currently holds a script for a C. elegans assembly.
commit c84e4da0c0871ec688b9fc0dd9237e3fa1ec092b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 30 13:48:21 2011 +0100
Added option to sga-index to suppress constructing the reverse BWT. Modified sga-merge to avoid attempting to merge RBWTs if they don't exist.
This new option is useful when performing k-mer correction as the RBWT is no longer used.
commit 177c80b82ef056749f19a23f2575dba76e703f5c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 30 00:09:22 2011 +0100
Code cleanup: created a parameter object for setting the error correction options in place of a constructor with many arguments
commit dc8a39fd1b865f219b4f05353e20776365bc8ed8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 29 02:49:01 2011 +0100
Changed BWTIntervalCache::lookup to take in a c-string to avoid an extra copy
commit d1ea6f567cdd42ce5924d98c48fd9520f32f1d3e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 29 02:45:46 2011 +0100
Implemented caching of BWTIntervals for all strings of a given length. Currently used in the k-mer corrector.
commit 94f734c9e37a2f32c511a26250a2c2cd14d046ec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 28 17:41:55 2011 +0100
Re-enabled -Wall as bamtools now compiles without warnings.
commit c529dfd095dacbfb2e40fb461404f56f2357a50f
Author: Albert Vilella <avilella at gmail.com>
Date: Fri May 27 12:07:59 2011 +0100
a bit rough, but just to have an idea of what is needed
commit 369340dd1cd6b28d3510b6e67133bb3b10ddee3c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 26 16:51:16 2011 +0100
More comment improvement
commit eb0a8ab2ac40c7fb3bec976389334bb6a920cc30
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 26 16:49:52 2011 +0100
Added comments to the SparseGapArray
commit f0a2b438085bbf36fb749ee4735d5ebbc5989e20
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 26 16:40:56 2011 +0100
Rewrote the gap array to use compare and swap instructions when updating the base counts. This allows much better concurrency when merging/removing reads from a bwt.
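
The general pattern of a lock-free counter update with compare-and-swap can be sketched with `std::atomic`. The example assumes 8-bit counters that must not wrap, and reports overflow to the caller. How the real gap array stores counts and handles overflow is not described in this log, so the function `incrementIfBelowMax` and its behaviour are assumptions.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper, not sga's gap array: atomically add one to an 8-bit
// counter unless it is already at its maximum. Returns true if incremented.
bool incrementIfBelowMax(std::atomic<std::uint8_t>& count) {
    std::uint8_t observed = count.load();
    while (observed != UINT8_MAX) {
        // If another thread changed the value first, the exchange fails,
        // observed is refreshed with the current value, and we retry.
        const std::uint8_t desired = static_cast<std::uint8_t>(observed + 1);
        if (count.compare_exchange_weak(observed, desired)) {
            return true;
        }
    }
    return false;  // counter is saturated; the caller must handle the overflow
}
```

Because no thread holds a lock while updating, threads touching different counters never wait on each other.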
commit f61402ca428cd85075e027c2cbc5fdc9e0470e86
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 23 13:27:15 2011 +0100
Increased version to 0.9.8
commit ccdfccac98fb966124ada8fbb78b5112c750c96f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 23 13:06:56 2011 +0100
Merged the duplicate removal and qc checks into a single process: sga-filter.
commit 44578216a6f6b0b742913d8abc4ea014d8663f8e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 23 09:37:41 2011 +0100
Modified k-mer corrector to only use the forward bwt when looking up k-mer counts. This effectively halves the memory usage of the correction step.
commit e6d6a7bb8d9ebba8292f410728e078dd39cb6ba5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 9 11:35:40 2011 +0100
New mode to sga-walk which exhaustively finds all walks through the largest connected component of the input graph. Used in sga-cluster workflow.
commit 8e833bc432e4b2cd82918f5e75745d44c3a450fd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 6 15:18:08 2011 +0100
Disabled RLBWT validation
commit 10c504958fafe0a1d89c7bf868d1bf8c020bec48
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 18 15:28:41 2011 +0100
bumped version number
commit a5efc71c73a77223b543fb8f629c99368879f9b8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 18 09:29:44 2011 +0100
Made the maximum distance estimate error when resolving gaps a command line parameter
commit b586aa660abd789a670f4f96025dc4b891aae605
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 11 14:29:51 2011 +0100
Bumped version to 0.9.6
commit 0b7ed6b91cd6bb1a702aba2ecddb5c935152abe5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 11 09:50:54 2011 +0100
[Github issue 4] Made the error message shown when a substring read is found during string graph construction more informative.
commit 864b46fda7a641793eada75ed55efc24761bf923
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 7 14:03:03 2011 +0000
Changed default gap array parameter back to 8.
commit 51f1f45754fe2b154ffdaacdaff6192ef73e1e87
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 4 17:07:07 2011 +0000
Changed sga-cluster so the temp file uses the same prefix as the -o parameter to prevent name clashes.
commit 1534e827e635a9fc7e1e1aa11ce6034f87a2ffd5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 4 09:28:45 2011 +0000
Changed sga-scaffold arguments. Unified repeat/unique a-stat so any contig that does not meet the unique cutoff is deemed to be a repeat. Added --min-copy-number parameter to discard contigs that have a low (<0.3) estimated copy number.
commit fd4161feb4da0ed7672062b660cb095adbfd681f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Mar 3 15:48:51 2011 +0000
Changed default gap storage parameter in sga/index
commit 27b9e392db5381bb41d4f75e49c39318db8ee35d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 1 13:59:39 2011 +0000
Bug fix in ScaffoldRecord to avoid outputting duplicate records for singleton scaffolds.
commit 10f2054cd9755de6dda2983d3655dac788a53164
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 28 14:07:17 2011 +0000
Made sga-cluster emit an error when a substring read is found.
commit ea62772f09f7ae4fa50ed2ff60321255bb86e527
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 28 14:02:34 2011 +0000
Removed the default seed stride parameter for sga-cluster as it would lead to some overlaps being missed.
commit a4c314f8483be714218d37477e0d42efa181b370
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 27 18:59:07 2011 +0000
Fixed bug in filterBAM where an extra read pair would be erroneously output
commit e21c5d62d09522080a15c7e9708849d9a076aaf3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 27 17:58:02 2011 +0000
Added new filtering modes to sga-filterBAM. Can now filter out pairs based on error rate, mapping quality and kmer depth.
commit 2cffb77f7435eac859f3b6108fbb2aca5aa416d7
Merge: a484b22 2f97650
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 26 12:59:33 2011 +0000
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit 2f976508f626ed7e1ff6aa466ef26e3cd99de431
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 26 12:59:03 2011 +0000
Fixed warning in OverlapBlock
commit a484b22742aa0c511d797a85a9ed48a21f799224
Merge: 2dc0007 e59eafa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 26 12:56:37 2011 +0000
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit e59eafa9f24398f208242664d9baa35426435758
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 26 12:54:10 2011 +0000
Rewrote sga-cluster to use the FM-index instead of an asqg file.
commit 2dc000754669d7b10c3b8e61a3cfe33ac43eec75
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 23 15:43:57 2011 +0000
Changed help text for --max-distance option to filterBAM
commit 56227dc8153d26404516b4631369787967408399
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 23 11:21:54 2011 +0000
Fixed tripped assertion in makeScaffolds for the very rare case that a terminal vertex cannot be found. Fixed stats output to avoid 32-bit integer wraparound.
commit f1a63a6d5dcbafe82efa197d1e588ab46ec349c0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 21 16:37:24 2011 +0000
Added a command line argument to change the minimum walk distance in sga-connect
commit 0ac3ec605c4e299b0861fe04ff345b7e01ffccb7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 21 16:00:56 2011 +0000
Fixed usage message for scaffold.
commit 20aa3be43131e72fe8227cb32069b69f24320b90
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 21 15:33:39 2011 +0000
added optional -k parameter to sga-bam2de.pl
commit 7a672d957a228297968f97d61c667c5d118468e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 21 12:44:15 2011 +0000
Added exit statement to sga-bam2de.pl when the command line arguments are incorrect.
commit 584a186860f95c86e23c3b629ec1ade29d37a556
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 21 11:41:52 2011 +0000
Modified sga-astat.py so that it does not require a bam index file.
commit a816d4e2d930c251c047a140c09127cc3d9720d5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 20 00:54:44 2011 +0000
Made the temp files of the bwa aln step avoid using relative paths.
commit f2676ac8a2ac953eda28b2c4a8ea4e2317619fa3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 19 23:25:29 2011 +0000
Made the kmer corrector the default algorithm to use for sga-correct. Changed the default kmer size to 31.
commit 8695469162c3e35f908f544e977810a3a8f52614
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 19 23:20:28 2011 +0000
Changed the error correction metrics to use wider integers to avoid wraparound for very large data sets.
commit aed15d4dd40af70f1aeb32e11753d0385c5cd16c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 19 22:48:07 2011 +0000
Added -w parameter to sga-walk to allow the exact sequence of the walk to be specified.
commit de4b8c1c40962f82c35c8bae72d07bd12dfe67ac
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 19 22:46:01 2011 +0000
Added the contigs filename to the temporary output of bwa aln so the same reads can be mapped to different contigs at the same time without having a filename clash.
commit 9d33667e4dd6965717c34d03a3dbfbaf882e42f5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 17 17:03:08 2011 +0000
First pass at learning the kmer correction threshold.
commit d1d752a0c5c7fc20f3f2476ff9d1f29ab0233eca
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 17 11:42:03 2011 +0000
Refactored KmerDistribution code into its own class.
commit 607286d72635722d60ad63023a6ba30ccf79d1a4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 17 10:52:23 2011 +0000
Wrote experimental program sga-cluster to write out the connected components of a graph. Requested by Albert Vilella.
commit 1bb6a4259b8413f82f69566b7bf983d425c60c29
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 17 09:53:12 2011 +0000
Added --no-overlap and --branch-cutoff options to sga-stats.
commit 9413894efd1df18617d936edbaa1d98c914140aa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 15 14:32:50 2011 +0000
Made the exact-mode irreducible algorithm the default again for overlap/fm-merge.
commit 7bf4732c3b4c86524f2c49cd8eef2270d01d5978
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 15 14:30:23 2011 +0000
Rewrote the exact-mode irreducible block algorithm to be iterative instead of recursive.
commit 3f5c3675e54907442c3bbde298843afb455c9a3c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 15 13:16:11 2011 +0000
Fixed bug in OverlapAlgorithm::_processIrreducibleBlocksExact where the assertion was checking the wrong condition. Added --exact option to overlap to force the use of the exact irreducible algorithm.
commit c6f08ff3d23683dbec2b17ce0ca950f9fd617dd1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 15 10:07:07 2011 +0000
Rewrote the CorrectionThresholds to be a proper (singleton) class instead of a namespace.
commit a5ef469d85b7bce4fe84f03a27db15a5713a0b7d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 14 21:29:20 2011 +0000
Added --kmer-distribution function to sga-stats. Re-enable the -x option for sga correct to set the min kmer coverage required.
commit de848326b2ed34aebf001fa1abfb3e83d8adce95
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 14 09:57:39 2011 +0000
Cleaned up handling of cycles in fm-merge.
Previously, duplicated intervals would be added to the result structure resulting in a warning when the bitvector would be updated.
Also, the graph would contain two vertices for the root read, leading to some joins being missed.
commit a6ad6aaea7cd18c554b77dd13335ebc4a17704d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 22:22:02 2011 +0000
Wrote help message for sga-mergeDriver.pl
commit fd3fa8dcdfe5e9dfd2a30ad1926944fe60155060
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 22:15:47 2011 +0000
Moved sga-mergeDriver.pl from sgatools into main repository.
commit b059338eeb8019417a75a767e884fa3911004bf9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 20:55:49 2011 +0000
Implemented quality score conversion from phred64 to phred33 for preprocess.
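
The conversion itself is a fixed shift: phred64 encodes quality Q as the character with code Q + 64 and phred33 as Q + 33, so each character moves down by 31. A minimal sketch follows; the name `phred64ToPhred33` is hypothetical and the error handling is an assumption, not what sga-preprocess does.

```cpp
#include <stdexcept>
#include <string>

// Hypothetical helper, not sga's code: re-encode a phred64 quality string as phred33.
std::string phred64ToPhred33(const std::string& quals) {
    std::string out;
    out.reserve(quals.size());
    for (char q : quals) {
        // A character below '@' (code 64) cannot be a non-negative phred64 score.
        if (q < 64) {
            throw std::out_of_range("quality character is not valid phred64");
        }
        out += static_cast<char>(q - 31);  // (Q + 64) - 31 == Q + 33
    }
    return out;
}
```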
commit 14436719403f5cf4c37b116f56030168c95bf608
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 18:50:03 2011 +0000
[Issue GH-3]: sga-preprocess will stop reading the file if there is a fastq record with no sequence or quality values.
This case was explicitly used to signal error in SeqReader. Fixed by changing SeqReader to emit a warning instead of stopping processing.
commit 4e225d32bb3824d349a13a8f74a9fba6efd394c4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 12:45:44 2011 +0000
[Issue GH-1] Changed shebang in python scripts to use /usr/bin/env so user's environment python is used. Reported and fix suggested by John St. John.
commit ab838fd434212663b6ba5dcb619b6f4b9f4b406e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 12:24:50 2011 +0000
Switched the irreducible block algorithm back to inexact mode.
commit 126894f1e5a5851cdf94620586cc3f855f90afaf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 11:37:53 2011 +0000
Added flag to SGSearch::findWalks to specify whether any walks should be returned if the search was aborted. All uses of findWalks require an exhaustive search to be performed except the utility sga-walk.
commit 756832329d651a74c60d1b404e558493700370e6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 11:35:48 2011 +0000
sga-preprocess will now append /1 or /2 to read names in pe-mode if the paired reads have the exact same name.
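
The renaming rule described above can be sketched as follows. The function name `addPairSuffixes` is hypothetical, and leaving differing names untouched is an assumption based on the wording of the commit message.

```cpp
#include <string>

// Hypothetical helper, not sga's code: make the two read names of a pair
// distinguishable when they are identical.
void addPairSuffixes(std::string& firstName, std::string& secondName) {
    if (firstName == secondName) {
        firstName += "/1";
        secondName += "/2";
    }
    // Names that already differ are left as they are.
}
```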
commit 7e1d524c0d2d7cb47678a4df3eaf65b4381eb3c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 11:32:47 2011 +0000
Changed overlap correction QC. Now requires at least one overlap supporting each base in the read after correction.
commit ba28792a737226fefd60d9254383e283d7e192ff
Merge: a4aabaa 538a8f2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 11:27:34 2011 +0000
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit a4aabaa3e653d28209423f597c61de709ab86a9b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 11:24:23 2011 +0000
Added genome size option to sga-astat.py as alternative to performing the bootstrap estimate
commit 6ca5861b44d250e3fc0e87d1584910a17a7c710a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 13 11:22:59 2011 +0000
Added warning to the overlap computation for when a substring read is found.
commit be8768357545297fb2953ab7b028f55e2290dc0d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 10 16:27:59 2011 +0000
Added parameter to specify the maximum number of bases to correct with the kmer corrector.
This allows the progressive correction of very low quality reads.
commit 538a8f25162392f5db22b343a9cc3777ed94ec12
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 10 16:27:59 2011 +0000
Added parameter to specify the maximum number of bases to correct with the kmer corrector.
This allows the progressive correction of very low quality reads.
commit 4a0d794362dde272333c55229da7842ad1de7bd7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 9 13:30:36 2011 +0000
Updated README to reference the python modules that the pipeline scripts require.
commit ac6481cd5d5a808767271d79392ddbee96871c16
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 9 13:22:15 2011 +0000
Implemented dust filter for low complexity sequences in sga-preprocess as suggested by Albert Vilella.
commit c86461c4e6ee440c5aa1f7ec1d349eb5280d24e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 9 11:34:52 2011 +0000
Cleaned up sga-scaffold to only print link validation messages if -v flag is given.
commit dbd255ef558416133192ef9dfd0bf682bd1400d2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 9 11:26:07 2011 +0000
Implemented loading contigs from a fasta file for scaffold2fasta
commit 89b297cab2176ba4ba7bac956e87d1dbb0703d5b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 9 11:07:52 2011 +0000
Moved astat.py from sgatools repository to sga/src/bin/sga-astat.py
commit 02eb53578a7e4a4798ebc588883671d859616a0c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 8 12:43:56 2011 +0000
Updated OverlapAlgorithm::_processIrreducibleBlocksExact to work in the current overlapping framework. It is now used by default.
commit e5f5be2432b18e0e4891318ae02f931039638649
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 7 11:03:58 2011 +0000
Updated version to 0.9.5
commit 9f0c7ee13c58ccda26a47e2f9db059826a7b060e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 7 10:59:41 2011 +0000
Removed dot file output from scaffolder
commit f799f2cb7cb801e570d40e142f9712b4eff02d3b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 4 15:07:26 2011 +0000
Added function to break the scaffold graph at positions that have conflicting distance estimates.
Also changed the link validator to break the graph when invalid links are found instead of deleting one of the vertices.
commit ac6a5499c516b4333c2bd968f52e190ad151c55b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 4 13:21:29 2011 +0000
Changed scaffold2fasta to write out unplaced scaffolds and use a gap with a minimum length
commit c0186dc253e931616ff3d663b6879b792badd9a4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 4 09:30:48 2011 +0000
Implemented SV detection and removal for the scaffolder. Turned off by default.
commit 8dbf97a27ec000dddfa40849cd8d6c6eed3cb85d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 3 17:03:30 2011 +0000
Minor formatting tweaks.
commit ecaf0eb4c2af6c80b8600df9b09ee944ac87921e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 2 16:44:04 2011 +0000
Imposed a max indel size on the variant resolution.
commit 43f6362aded5fe7bf1008ed3f5b4960af7e6f02b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 2 16:22:36 2011 +0000
Added filterBAM subprogram to attempt to get rid of bad MP reads.
commit 77f9e51e241e0b3ed0002b00cb8ec24d515ee7da
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 1 19:25:17 2011 +0000
Added status output message to the link validator
commit 2d8af5440d86be3cb4dacc66821fb42769b614c6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 1 18:07:46 2011 +0000
Wrote function to find and remove cycles in the scaffold graph
commit 2d2c98a1583612367bcdf0d6ae3f49dc9e78de03
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 1 14:11:34 2011 +0000
Fixed scaffold2fasta to search for a path of the correct length (was using end to start distance instead of end to end).
First pass at removing contigs from scaffold graph that do not have consistent distance estimates.
commit 4b61fb3ebc1123c3c32146bd53e4687646cb65ac
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 31 17:04:47 2011 +0000
Rewrote FMMergeProcess to use the compare-and-swap functionality in the bit vector. Removed the locks.
commit 4e95047f20dd379de39b5eab15f667ee75533680
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 31 16:36:04 2011 +0000
Wrote compare-and-swap updates for the BitChar/BitVector data structures.
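
A compare-and-swap bit update lets several threads mark entries in a shared bit vector without a lock, and tells each caller whether it was the one that set the bit. The class below is a sketch using `std::atomic`; it is not the BitChar/BitVector code from this commit, and the name `AtomicBitVector` and its interface are assumptions.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical class, not sga's BitVector: bits packed into atomic bytes.
class AtomicBitVector {
public:
    explicit AtomicBitVector(std::size_t numBits) : m_bytes((numBits + 7) / 8) {
        for (auto& byte : m_bytes) byte.store(0);  // start with every bit clear
    }

    // Set bit i. Returns true only for the caller that changed it from 0 to 1.
    bool set(std::size_t i) {
        std::atomic<std::uint8_t>& byte = m_bytes[i / 8];
        const std::uint8_t mask = static_cast<std::uint8_t>(1u << (i % 8));
        std::uint8_t observed = byte.load();
        while ((observed & mask) == 0) {
            const std::uint8_t desired = static_cast<std::uint8_t>(observed | mask);
            // On failure observed is refreshed, so neighbouring bits set by
            // other threads in the same byte are preserved on retry.
            if (byte.compare_exchange_weak(observed, desired)) return true;
        }
        return false;  // someone else had already set the bit
    }

    bool test(std::size_t i) const {
        return (m_bytes[i / 8].load() & (1u << (i % 8))) != 0;
    }

private:
    std::vector<std::atomic<std::uint8_t>> m_bytes;
};
```

The boolean result is what makes the locks unnecessary in a claim-once workflow: exactly one thread sees `true` for any given bit.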
commit 6fa0dddacf6c419d25b05277394d32f60292d9e9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 31 15:50:38 2011 +0000
Changed name of cigar line in variants file to make it clear it's an internally-used field and not for the fasta sequence that is output.
commit 50e578a74f6babc5ee1a20ea91f7875850764541
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 31 14:40:38 2011 +0000
Integrated Heng Li's stdaln dynamic programming library into ThirdParty. This is used in the variation removal algorithm to set an upper bound on how different two sequences can be and still be removed.
It was found that the previous code would collapse together very divergent pieces of unique sequence by finding a path between two different low-copy repeats. This is clearly undesirable.
commit 769f4ccfd0ff51cdaca937fe2a92460958f9a7ad
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 30 20:45:47 2011 +0000
Removed some prints
commit 402fc4775e6db768af78919c7f55ea679f02e92d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 30 18:18:37 2011 +0000
First implementation of new scaffolding algorithm.
Now generates primary scaffold but does not do much error checking or attempt to place small contigs in the gaps.
commit bb25b000cd01eddde7fed473f004cab5cbbff641
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 30 17:31:44 2011 +0000
Minor update to scaffolding output message formatting.
commit 7172839fea214bd0424a9e18e04fc461d06c548e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 30 17:25:20 2011 +0000
The scaffolder now writes out scaffold statistics at the end of the program.
commit 5e123a7132f44fcd7656ec8be9886e10cd752352
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 30 17:02:50 2011 +0000
Wrote algorithm to compute a layout of the connected component for a scaffold starting from a terminal vertex. This function is the backbone of the scaffolding algorithm.
commit 97e060eaa70da1367e7f05058242f39c3f6fc307
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 29 15:33:45 2011 +0000
Added new ScaffoldAlgorithms files and refactored some code into these.
commit 615e895cf0d1dadeca245da2daef3e3d9d9812d0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 29 15:03:53 2011 +0000
Implemented connected components algorithm.
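As an illustration of the idea only (not the code from this commit), connected components of an undirected scaffold graph can be found with a breadth-first search from every unvisited vertex. The adjacency-dict layout and the contig names below are assumptions made for the sketch.

```python
from collections import deque

def connected_components(adjacency):
    """Return a list of components, each a sorted list of vertex ids.

    adjacency: dict mapping a vertex id to the list of its neighbours
    (hypothetical layout; links are assumed to be stored in both directions).
    """
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        component = []
        while queue:
            v = queue.popleft()
            component.append(v)
            for w in adjacency[v]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append(sorted(component))
    return components

# Hypothetical scaffold graph: contigs linked by read-pair evidence.
links = {
    "c1": ["c2"], "c2": ["c1", "c3"], "c3": ["c2"],
    "c4": ["c5"], "c5": ["c4"],
    "c6": [],
}
components = connected_components(links)
```

Each component is a candidate scaffold that can then be laid out independently.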
commit db9b65fb7138db8aea8d1f159e1203b87c7aa748
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 28 16:55:47 2011 +0000
Added function to infer secondary links between nodes in a putative scaffold.
commit 0e85c8a355c965100970f45c432c88b97df00c42
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 28 10:12:27 2011 +0000
Removed unused object to remove warning.
commit 77742a1a6d3448c1d45c276e43c2857b8bd49307
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 27 20:50:49 2011 +0000
Cleaned up code
commit 40be405a7730b183b1d1e8cbd42cb001fc525db6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 27 20:43:08 2011 +0000
Experimental layout algorithm for scaffolding
commit be58a2295b6eedacb3b1eebf467da998064fac5b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 27 16:56:40 2011 +0000
Added new function to remove transitive edges from the scaffold graph. Needs more work.
commit 205121ecb5586d768e5dd46704bba451c665c99d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 27 14:57:48 2011 +0000
Separated input of paired end and mate pair libraries for scaffolder.
commit 124558f49b5df3cdc5902a8edac0ed179eccc450
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 25 13:00:50 2011 +0000
Removed some dead code.
commit d2ca6ffd8aa2f1557865d9fb55afd3b99f792cbf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 25 12:50:18 2011 +0000
Simplified construction of variation paths in the graph. More work on making the searching code more generic.
commit e0b148af7d5973a9c2f87375b040ebba4fed6d4f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 24 16:59:57 2011 +0000
Started to implement searching for walks on a scaffold graph.
commit c80c8ca5e23b7e232ca74b17567c741427e5f1ba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 24 15:51:09 2011 +0000
Continued to make the graph search functions more generic.
commit c8088139640a0ab6b217a819db6d44dddf16629b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 24 11:41:30 2011 +0000
Made the SGSearchTree a generic templated class so that it can also be used for the scaffolding module.
commit 018fd3ac19a15c1ed86c77e02f57c55e477af782
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 21 17:07:45 2011 +0000
Added functions to compute the probability that two scaffold links are incorrectly ordered.
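The commit does not say how the probability is computed. One common way to model it, shown here purely as an assumption, is to treat each link's distance estimate as an independent normal variable and ask how likely it is that the contig placed first actually lies further away than the contig placed second. The function name and parameters are hypothetical.

```python
import math

def prob_misordered(mean_a, sd_a, mean_b, sd_b):
    """P(distance_a > distance_b) for independent normal distance estimates.

    Contig A is assumed to be placed before contig B; the result is the
    probability that this order is wrong. Standard deviations must be > 0.
    """
    diff_sd = math.sqrt(sd_a ** 2 + sd_b ** 2)
    z = (mean_a - mean_b) / diff_sd
    # Standard normal CDF written with the error function.
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

# Hypothetical estimates, in base pairs.
p_close = prob_misordered(100, 50, 150, 50)   # overlapping estimates
p_far = prob_misordered(100, 10, 500, 10)     # well separated estimates
```

Under this model, links with overlapping distance estimates give a probability close to 0.5, and well separated links give a probability close to 0.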
commit b50b55090a2222516854767072f46cd36beec785
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 20 20:55:55 2011 +0000
Added ScaffoldGroup class to handle ordering a set of contigs.
commit b862fdb5b0b7520d63bc39d650de09f7fa56ba8b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 19 13:56:18 2011 +0000
Added extra output to sga-scaffold
commit fc980aaa87fe1634a20f18cd8f9c7869a1065934
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 18 12:45:23 2011 +0000
Fixed sga-correct to search for a path with the correct length by subtracting the amount of the fragment that is present in the current contig.
commit 145a37d3352ea566f09225e0accab734be954bd0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 11 16:09:02 2011 +0000
Cleaned up the connection code.
commit fc5c87aba3b1eb3bc066fd7966af3250ae947ab7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 11 09:58:04 2011 +0000
Added statistics to sga connect to describe why the connection failed
commit d7d732b2f804964918379e5e40f7b277e6176c63
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 10 16:47:05 2011 +0000
Much faster version of BAM-based connect. Now uses substring operations when extracting the fragment instead of copying potentially very large strings constantly.
commit c0d44cbe8515e5c38791edcd9a21b1b6364a7bcf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 10 15:29:55 2011 +0000
First implementation of sga connect using a BAM file. This version is functional but could be more efficient.
commit fea32d1a6c16ddea9caa9008714c0e8fc67257ea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 7 15:46:17 2011 +0000
Updated configure.ac and the README to support use of the BamTools library.
Modified sga-connect to read in a bam file using above. This work is in progress and not complete.
commit 685326e29eaf2ccc4cf5db60351ff46f333522e4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 18 10:21:45 2011 +0000
Integrated SGSearchTree into SGSearch::findWalks
commit 6061dc33be2cfc1d55cd00fc8383cb9c15390823
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 18 09:54:37 2011 +0000
Modified output files to have a common prefix, which can be specified on the command line with the -o option.
commit 42c58389e88c3092ba6473735aa6e20ed402b8e4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 17 17:46:49 2011 +0000
Integrated new search tree code into the variation smoother. The results are subtly
different than before due to differences in how search termination is handled.
Ensured no memory leaks were introduced with Valgrind.
commit e3d4b4eb4d0ce38e45324b25143ec920c3b04a8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 17 16:59:07 2011 +0000
Implemented new SGSearchTree class for more efficient graph searching. All functionality
has been implemented but it is currently untested and not used in the project.
Refactored SGWalk into its own files
commit b016c9c2f81f37fe29180a6deb2f052e9e3ce511
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 17 10:25:18 2011 +0000
Added exit condition to SGSearch::findVariantWalks to avoid infinite loops in the case that the graph contains a non-branching cycle
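A minimal sketch of this kind of guard, assuming a plain successor map rather than the real SGSearch data structures: the walk follows unique successors and stops as soon as it would revisit a vertex, so a non-branching cycle cannot trap it.

```python
def walk_unbranched(successors, start):
    """Follow single-successor vertices from start; stop at a branch,
    a dead end, or when the walk would re-enter a vertex already visited."""
    path = [start]
    visited = {start}
    current = start
    while True:
        nxt = successors.get(current, [])
        if len(nxt) != 1:
            break  # dead end or branch point
        current = nxt[0]
        if current in visited:
            break  # the walk has closed a cycle; exit instead of looping
        visited.add(current)
        path.append(current)
    return path

# Hypothetical graphs: a three-vertex cycle with no branches, and a simple chain.
cycle = {"a": ["b"], "b": ["c"], "c": ["a"]}
chain = {"a": ["b"], "b": []}
cycle_path = walk_unbranched(cycle, "a")
chain_path = walk_unbranched(chain, "a")
```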
commit a68a8daf6ac712c9e1feaf82326f9892c828cca1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 6 15:50:38 2011 +0000
Fixed bug where variant removal would assert if there was a degenerate bubble
commit b29206bec2c1af254cbd336992e39b6c7e7d7fff
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 6 10:24:21 2011 +0000
Removed unnecessary to-do warning from variant smoother
commit 8756f099853a18070ea94d004cd44562f1000a74
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 5 23:07:42 2011 +0000
sga-qc: Moved the delete call for the BWTs to occur before the indices are rebuilt to avoid having two copies loaded at once needlessly.
commit 8bb84da66ead3ff63151e74ea1e17a77adbe254d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 5 16:34:04 2011 +0000
Implemented writing out variants to a file in fasta format.
commit 26b672e50df06f79f3c124f64f7b2935020174c7
Merge: 68ed773 e517883
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 5 15:49:18 2011 +0000
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit 68ed77312f8f52b8a3fcca3e4c53cd677fff461c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jan 5 15:43:53 2011 +0000
Implemented more complex variation (bubble) removal algorithm.
Now, instead of only removing simple "diamond" shaped bubbles (2-out, 2-in) bubbles can be arbitrarily complex
as long as they hold to the following conditions:
1) they start and end with common vertices
2) for the internal vertices in the bubble, they do not branch to any vertex that is not
part of the bubble.
The second condition ensures that the bubble is not part of any larger structure and can be cleanly removed.
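The second condition can be sketched as a check over the edges of a candidate bubble. This is an illustration under simplifying assumptions, not the SGA implementation: edges are plain directed pairs, the start and end vertices are given (so condition 1 is taken as already satisfied), and an edge entering an internal vertex from outside is treated as a branch as well.

```python
def is_removable_bubble(edges, start, end, members):
    """Return True if no internal vertex of the bubble touches a vertex
    outside the bubble.

    edges:   set of (u, v) directed pairs for the whole graph
    members: every vertex of the candidate bubble, including start and end
    """
    if start not in members or end not in members:
        return False
    internal = members - {start, end}
    for u, v in edges:
        if u in internal and v not in members:
            return False  # an internal vertex branches out of the bubble
        if v in internal and u not in members:
            return False  # an outside vertex branches into the bubble
    return True

# Hypothetical bubble that is more complex than a simple diamond.
edges = {("s", "a"), ("s", "b"), ("a", "c"), ("b", "c"), ("a", "t"), ("c", "t")}
members = {"s", "a", "b", "c", "t"}
clean = is_removable_bubble(edges, "s", "t", members)
leaky = is_removable_bubble(edges | {("b", "x")}, "s", "t", members)
```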
commit e5178832318a59cf340e78e172c75d422b58dfd3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 4 14:17:33 2011 +0000
Improved the speed of sga-qc by an order of magnitude.
Previously, the interval for each k-mer was computed individually, requiring many Occ() calls. Now, if the interval for a
given kmer k_i meets the threshold, the interval is extended by one base (finding the interval for a k+1-mer).
If this k+1 interval meets the threshold, it implies that k_(i+1) also meets the threshold which saves k - 1 calls to Occ().
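The shortcut relies on the fact that every occurrence of a (k+1)-mer contains an occurrence of the k-mer that follows k_i, so a passing (k+1)-mer proves that the next k-mer passes too. The sketch below illustrates only that logic: it replaces the FM-index interval with a naive substring count, and all names are hypothetical.

```python
def count_in_reads(reads, pattern):
    """Naive occurrence count; a stand-in for the size of an FM-index interval."""
    return sum(
        1
        for r in reads
        for j in range(len(r) - len(pattern) + 1)
        if r.startswith(pattern, j)
    )

def passing_kmers(read, k, threshold, count):
    """Return one flag per k-mer of read: True if it occurs >= threshold times."""
    n = len(read) - k + 1
    passed = [False] * n
    i = 0
    while i < n:
        if count(read[i:i + k]) < threshold:
            i += 1
            continue
        passed[i] = True
        j = i
        # Extend by one base: if the (k+1)-mer starting at j passes, the
        # k-mer starting at j + 1 is contained in it and passes as well.
        while j + 1 < n and count(read[j:j + k + 1]) >= threshold:
            j += 1
            passed[j] = True
        i = j + 1  # fall back to a full lookup for the next k-mer
    return passed

reads = ["ACGTACGT", "ACGTACGA", "TTTT"]
read = "ACGTACGT"
passed = passing_kmers(read, 3, 2, lambda p: count_in_reads(reads, p))
```

In an FM-index the extension costs a single backward-search step, which is where the saving in Occ() calls described above comes from; the naive counter here does not reproduce that cost model.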
commit 46848b91df16477b41cc6e3c91435b1e2b02d3a5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 3 12:19:08 2011 +0000
Removed warning
commit 3c4809ab4c3260e19368ec0593f9752c271f72d7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 3 12:16:08 2011 +0000
Implemented mutex lock on BitVector to allow threads to update the bitvector atomically in fm-merge.
This allows us to optimistically merge reads in each thread, then atomically determine whether
a different thread has already merged that subset of reads.
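A rough sketch of the optimistic scheme, assuming a bit vector guarded by a single mutex: each thread does its work speculatively and then uses an atomic test-and-set to find out whether it was the first to claim a read. The class and function names are hypothetical and do not mirror the SGA classes.

```python
import threading

class LockedBitVector:
    """Bit vector whose updates are serialised by one mutex."""

    def __init__(self, n):
        self._bits = bytearray((n + 7) // 8)
        self._lock = threading.Lock()

    def test_and_set(self, i):
        """Set bit i; return True only for the caller that flipped it 0 -> 1."""
        byte, mask = i // 8, 1 << (i % 8)
        with self._lock:
            if self._bits[byte] & mask:
                return False  # another thread already claimed this read
            self._bits[byte] |= mask
            return True

def claim_reads(bits, read_ids):
    """Keep only the reads this thread was first to mark as merged."""
    return [rid for rid in read_ids if bits.test_and_set(rid)]

bits = LockedBitVector(64)
results = [[] for _ in range(4)]

def worker(slot):
    # Every thread tries to claim every read; the bit vector arbitrates.
    results[slot] = claim_reads(bits, range(64))

threads = [threading.Thread(target=worker, args=(s,)) for s in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
claimed = sorted(rid for part in results for rid in part)
```

Every read ends up claimed by exactly one thread, regardless of how the threads interleave.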
commit 6284f1c9532bbf76fdfc1105522daa9a8775d419
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 2 19:08:24 2011 +0000
Modified sga-rmdup to determine which read to keep based on the index of the reads, not their full id/name.
With this change, we no longer need to store the read ids in the ReadInfoTable which lowers the high water mark
of the rmdup computation (and the whole assembly pipeline).
The default sample rate for the fm-index in sga-rmdup was changed to 256 as well.
commit d83eb536c9ec5e3008727e7829763d6de55a2c7e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 1 22:31:16 2011 +0000
Removed errant print from SGA/index
commit a4ff030b9f8c4d7560ae346b93d99a60a85cf750
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 1 22:26:02 2011 +0000
Added some extra comments
commit 5676b6086039681c9a3f208c63ed285b7a7e94c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 1 22:16:49 2011 +0000
Fixed very subtle bug in overlap computation when reads have multiple valid overlaps to each other.
If, for a given read x, more than two reads have multiple valid overlaps to x it was possible that
when post-processing the overlap blocks to remove the redundant overlaps the forward and reverse
intervals would become out of sync (they would represent different reads). This can affect
the downstream calculations (like the irreducible overlap computation) as the wrong
extensions for one of the reads would be found. This case is very rare so the fix is
to calculate the source read for every element in the overlap block, then remove
the redundant blocks. This calculation is slow but gives the correct result. It has a
minor (a few percent) impact on the total running time. The bug had very little impact
on the final quality of the assembly but the structure of the graph in repetitive
regions is likely more robust now.
commit 421951e5d5312ead32fc7370c30232f50e391595
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 31 12:36:04 2010 +0000
Revised dead-end trim function to only remove if the branch is less than a minimum length. This makes the trimmer function properly on a graph constructed by fm-merge.
commit faf1d4beb0f6d6e4c1f6b5c29edcf42ed4e0ced1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Dec 30 14:57:22 2010 +0000
Fixed bug where fm-merge would crash if the graph contains a simple cycle
commit d962a5fc2918cca0996546fe3a6dab1a04cfb9b1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Dec 28 21:04:57 2010 +0000
Changed error corrector to print out a masked multi-overlap
commit 5aee03b84b5688fc017a628a7caf899ed22af9ba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Dec 28 15:36:02 2010 +0000
Implemented quality-aware overlap correction. The quality scores are used to select the cutoffs for the number of times a base needs to be seen to avoid being corrected away.
commit 466f4d0f7e0ee6b0d2c2ea8a7a1e3f9bd5762d8a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Dec 25 13:42:45 2010 +0000
Changed default sample rate and gap array size for sga merge
commit 15238d189e056d1ca34338dbdb36c712b0b1a3dd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Dec 23 16:38:35 2010 +0000
Modified k-mer corrector to take quality values into account when determining thresholds for correction.
Added debugging information to fm-merge
commit 4f171db122dd8a2bc84e40ed88dbf27d90816d21
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Dec 21 09:50:53 2010 +0000
Added a simple kmer caching scheme to the kmer-based corrector to avoid duplicate lookups in the fm-index.
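A caching scheme of this kind can be as small as a dictionary in front of the index lookup. The sketch below is an assumption about the general shape, not the SGA code; the lookup function is a naive stand-in for an FM-index query.

```python
class CachedKmerCounter:
    """Remember k-mer counts so repeated k-mers trigger only one lookup."""

    def __init__(self, lookup):
        self._lookup = lookup  # expensive function: k-mer -> count
        self._cache = {}
        self.misses = 0        # number of real lookups performed

    def count(self, kmer):
        if kmer not in self._cache:
            self.misses += 1
            self._cache[kmer] = self._lookup(kmer)
        return self._cache[kmer]

# Hypothetical text standing in for the indexed read set.
text = "ACACACGT"

def naive_lookup(kmer):
    return sum(1 for j in range(len(text) - len(kmer) + 1) if text.startswith(kmer, j))

counter = CachedKmerCounter(naive_lookup)
read = "ACACAC"
counts = [counter.count(read[i:i + 2]) for i in range(len(read) - 1)]
```

The read contains five 2-mers but only two distinct ones, so only two lookups reach the index.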
commit 1ec3c600bfca474ee187178dd21d027b8387d69b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Dec 16 11:47:31 2010 +0000
Implemented 1-bit sparse gap array for rmdup/qc
commit 2a0ef89e243e8ff809c643af0a25a8c31afff49d
Merge: 9e15ef3 1c284af
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 15 22:22:04 2010 +0000
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit 9e15ef338d553230ffff265f4112db848a8b6abb
Merge: 0642954 50de2c1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 15 22:08:45 2010 +0000
Merge branch 'fm-merge'
commit 1c284afe553d49b46ed80c41281d1524f47c281e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 15 19:11:21 2010 +0000
Implemented interleaved mode for sga preprocess
commit 0642954b4cc437a9c167765a832270542db19460
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 15 11:39:07 2010 +0000
Fixed bug in conflictConsensus algorithm where the error correction would not filter out true conflicts if the root base is the 3rd (or 4th) most frequent base but still above the cutoff.
commit 50de2c1e6bf09cde064f481de7d1a6aa0cdc67db
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Dec 13 16:15:40 2010 +0000
Working version of fm-merge. Currently requires a remove duplicate edges operation which is suboptimal.
commit b914991bc7dbb34b891925d162fb04c5804fd2ba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Dec 13 11:31:09 2010 +0000
Rewrote overlapReadExact to fix the very rare case where a read has a proper prefix/suffix overlap to itself. This case
was handled in sga overlap by filtering out self-edges but sga fm-merge cannot do this. The current version of the code
allows a read to have an overlap to the same read in both the prefix/suffix directions. It is unclear whether
this should be allowed so it is left in for now.
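For illustration, a read has a proper overlap to itself when one of its prefixes equals the suffix of the same length and that length is shorter than the read. The helper below is a hypothetical sketch that considers the forward strand only.

```python
def self_overlap_lengths(read, min_overlap):
    """Lengths m (min_overlap <= m < len(read)) where the first m bases
    of the read equal its last m bases."""
    n = len(read)
    return [m for m in range(min_overlap, n) if read[:m] == read[n - m:]]

# "ACGTACG" starts and ends with "ACG", so it overlaps itself by 3 bases.
overlaps = self_overlap_lengths("ACGTACG", 2)
no_overlaps = self_overlap_lengths("ACGT", 2)
```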
commit 412030bb2e68f8a41acdb49107f043ff0a3ffee4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Dec 11 16:34:28 2010 +0000
Started implementation of FMMergeProcess logic
commit 23d1c75176f9a12b64ae42ba8012db850f98c6eb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 10 15:59:29 2010 +0000
Added BitVector class for storing large arrays of bits. Stubbed in some functionality of FM-merge.
commit c93a6fffc21ffc411c6a653fdb4b8b351875b9af
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 10 10:23:59 2010 +0000
Added skeleton code for the FMMerge processes
commit ac45c0d60005408c1c5d773788bfdb4dc8eaa412
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 10 10:00:39 2010 +0000
Skeleton of fm-merge subprogram
commit 176fab61fecc2f311f905e9a4c1a711ce3a6cad5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Dec 1 09:38:05 2010 +0000
Changed input parameters in sga-pipeline
commit aa64162a13ace4e0f5bb6fffce52a541e99eb354
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 26 14:52:11 2010 +0000
Re-implemented connect pipeline. Cleaned up sga-assemble
commit 94b423fd14debb751ef30705bffdcd7f73afe726
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 21 17:01:17 2010 +0000
Changed name of FULL_COUNT define to RL_FULL_COUNT
commit 88f33295d8d4a201f77c66ad28abcda0859a8e8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 21 16:59:49 2010 +0000
Refactored RLUnit into its own class
commit 72ab561828f5e1292a5dfce4e93a05014fe45110
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 21 16:50:35 2010 +0000
Refactored the BWT Markers into their own file. Also moved accumulation code out of RLBWT into the RLUnit
commit 6a8d335212c9cce66e36e3031091adc8ab8df712
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 21 16:08:52 2010 +0000
Capped max run length in --run-lengths option to make the output more readable
commit f3050dfb686dbeb45721f48598874a40189282f3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 21 15:55:48 2010 +0000
Added extra information to sga stats --run-lengths
commit 235cf4366ff3b3244a1121b6b5910ca004830f46
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 19 15:11:33 2010 +0000
Updated version to 0.9.4. The main difference in this version is an improved strategy for managing the Occurrence array in the BWT, which requires substantially less memory.
commit 74149fe3cd127ce30719d70b0c4016facd8a0258
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 19 14:52:04 2010 +0000
Added a method to read in the non-RLE BWT from a binary bwt file.
commit 737df471dd5d9fa339140266af64623ba58cee7e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 19 13:43:16 2010 +0000
Changed default sample rate for merging bwts
commit 9b04f8a8c9a76fa888f46f6e165248e573509f2d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 18 16:10:30 2010 +0000
Fixed bug in two-tier implementation where the count for the last SmallBlock placed was incorrect.
commit 09aa2f94382605c9b6b14a89bc1fabe1a1aba495
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 18 11:13:41 2010 +0000
Implemented second version of two-tier occurrence array markers.
This version keeps the symbol counts since the last superblock in a
2-byte integer as opposed to keeping the count since the last relative block
in a 1-byte integer. The second approach is faster for most ranges of sample rates
and can allow even lower memory use than the 1-byte approach. This version
will be merged into the master branch.
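A sketch of this second layout, for a single symbol and with assumed sample rates: each small marker stores the count since the most recent large marker, so a lookup adds one large marker, one small marker and a short scan, with no summation over earlier small markers. Names and defaults are hypothetical.

```python
import random

def build_two_tier(bwt, symbol, large_rate=1024, small_rate=128):
    """Build (large, small) marker lists for one symbol.

    large[j]: absolute count of symbol in bwt[:j * large_rate]
    small[j]: count of symbol since the last large marker, up to
              position j * small_rate (must fit in a 2-byte integer)
    """
    assert large_rate % small_rate == 0 and large_rate <= 0xFFFF
    large, small = [], []
    total = since_large = 0
    for start in range(0, len(bwt) + 1, small_rate):
        if start % large_rate == 0:
            large.append(total)
            since_large = 0
        small.append(since_large)
        n = bwt[start:start + small_rate].count(symbol)
        total += n
        since_large += n
    return large, small

def occ_since_superblock(bwt, symbol, i, large, small, large_rate=1024, small_rate=128):
    """Number of occurrences of symbol in bwt[:i]."""
    base = large[i // large_rate] + small[i // small_rate]
    block_start = (i // small_rate) * small_rate
    return base + bwt[block_start:i].count(symbol)

rng = random.Random(2)
bwt = "".join(rng.choice("$ACGT") for _ in range(3000))
large, small = build_two_tier(bwt, "C")
```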
commit 17fc25cc68d231a1e2b0706ae4879bfe5aee3e57
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 17:00:39 2010 +0000
Removed gcc force-inline attributes
commit 8b74e890c706f0cff9d9b3cfd4a7ae1ee336db3d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 16:56:50 2010 +0000
Removed unused print statements in getInterpolatedMarkers
commit 1214a053ce2fa9eb4b0504adc7ff1a75588905bd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 16:56:11 2010 +0000
More clean up of two-tier code.
commit 4b4a749f30739656143867c507131d099be28283
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 16:50:36 2010 +0000
Cleaned up two-tier code.
commit fcb711564ec212a0048e94b938600e9363748d32
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 15:36:43 2010 +0000
Fixed error in SmallMarker - was using size_t to hold the unitCount when it will be at most 128. Changed to uint8_t, which gives a huge memory saving.
Changed default sample rate for LargeMarker to be 1024.
commit e963477c485d695f353281165c8f6399c7898e97
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 14:50:52 2010 +0000
Removed old marker code and cleaned up.
commit aace6e07c473a7c93109c0fa6c619b1550538a8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 17 14:18:10 2010 +0000
Complete re-write of how the BWT occurrence array markers are represented.
We are using a two-tier system where LargeMarkers are placed every 2048 symbols with the absolute count of the number of symbols seen up to that point.
Every 128 symbols a SmallMarker is placed which holds the count of symbols over the last 128 symbols. This allows us to store these relative counts with
an 8 bit integer instead of a 64 bit for the absolute counters. The absolute counts every 128 symbols are interpolated from the relative counts.
This is a much more space efficient representation - thanks to Travis Wheeler for this suggestion.
This version of the code has the old marker system (absolute counts every 128 symbols) left in and a testing function
is placed at the end of initializeFMIndex(). This version should only be used for testing/debugging the
two-tier system. The old system will be removed in the next revision.
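The two-tier layout described above can be sketched for a single symbol as follows. This is a simplified illustration, not the RLBWT code: the BWT is a plain string, only one symbol is counted, and the function names are hypothetical. The absolute count at any position is recovered from the nearest large marker plus the relative counts of the small blocks after it.

```python
import random

LARGE = 2048  # spacing of absolute-count markers
SMALL = 128   # spacing of relative-count markers

def build_markers(bwt, symbol):
    """Return (large, small) marker lists for one symbol.

    large[j]: absolute count of symbol in bwt[:j * LARGE]
    small[j]: count of symbol inside the j-th 128-symbol block; it is
              at most 128, so it fits in an 8-bit integer
    """
    large, small = [], []
    total = 0
    for start in range(0, len(bwt) + 1, SMALL):
        if start % LARGE == 0:
            large.append(total)
        block_count = bwt[start:start + SMALL].count(symbol)
        small.append(block_count)
        total += block_count
    return large, small

def occ_relative(bwt, symbol, i, large, small):
    """Number of occurrences of symbol in bwt[:i]."""
    block = i // SMALL
    first_block = (i // LARGE) * (LARGE // SMALL)
    # Interpolate the absolute count at the block boundary.
    count = large[i // LARGE] + sum(small[first_block:block])
    # Scan the remaining symbols inside the current block.
    return count + bwt[block * SMALL:i].count(symbol)

rng = random.Random(1)
bwt = "".join(rng.choice("$ACGT") for _ in range(5000))
large, small = build_markers(bwt, "A")
```

With these rates there are sixteen 8-bit relative counters for every 64-bit absolute counter, which is the source of the space saving.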
commit b41a2bf65d3745a86ce1102a58397aa2ab2711fc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 16 13:05:54 2010 +0000
Rewrote AlphaCount class to take in a template parameter indicating the storage size. Replaced all existing uses of AlphaCount in the code with AlphaCount64, the 64-bit storage version.
commit 0f882ca77b50c58044ad2d9098911fa094150547
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 15 14:49:20 2010 +0000
Added a numReads field to the header of the sga-connect output
commit d34598c6120fcdcb6a7eb06ab870938805e0dbb5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 15 10:40:49 2010 +0000
Fixed typo in README spotted by Matthias Haimel. Added instruction for running autogen.sh
if the program was downloaded from github.
commit 9370e73f3c4fa8fc46ba0080a13efadc0cee9a0c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 14 17:45:08 2010 +0000
Added --run-lengths parameter to sga-stats to print the run length distribution of the BWT
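As a small illustration of what such a distribution contains (the actual output format of sga-stats is not shown in this log), the run lengths of a BWT string can be tallied like this:

```python
from collections import Counter
from itertools import groupby

def run_length_distribution(bwt):
    """Map run length -> number of maximal runs of identical symbols."""
    return Counter(len(list(group)) for _, group in groupby(bwt))

# Hypothetical BWT string with runs AAA, CC, A, $, GG.
distribution = run_length_distribution("AAACCA$GG")
```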
commit aa5c8b1223d72ee3ce9c2c30a00cdab56514d39f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 10 14:56:46 2010 +0000
Minor formatting change in configure
commit 074aba008bbaa45f98065a853879c8cf27c30173
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 10 14:45:16 2010 +0000
Added --with-hoard=PATH option to configure to allow the use of the Hoard memory allocator.
commit f621a0fda866284d1dd715eee9174a8484ecd0ff
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 9 14:26:04 2010 +0000
Fixed string initialization error spotted by valgrind
commit 3a682309e39a22ebe36627fe27abddb360d1fd2b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 9 09:49:34 2010 +0000
Fixed bug in the bubble popper. The counter would never be incremented so it would always be reported that no bubbles were popped.
commit 639a457db97c0c4970ee3e33f405788606d9b82b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 8 14:26:11 2010 +0000
Added structural variation detection options to sga-connect
commit 11086531217e9dc161b3d01d5e54fa3dad4cffd6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Nov 8 10:37:34 2010 +0000
Rewrote portions of the MultiOverlap correction code for efficiency
commit 7c94c81ed65784ada679f64cb6d3a3e769a3f1ef
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Nov 6 14:43:37 2010 +0000
Added new statistics to sga-stats. Now outputs the estimated error rate in the reads and the mean overlap depth.
commit d50d0deb71de409587f564d86ef3ea785cc1a9d6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 5 13:00:35 2010 +0000
Added sga-align and sga-deinterleave helper scripts
commit 9c06ef380a181eafd57ae32e7fea18c977a00e6f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 5 12:58:52 2010 +0000
Cleaned up output in bigraph and assemble.
commit 4afa0b54b7b9b5fa0072f9846160962a040d374a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 4 15:24:41 2010 +0000
Implemented edge link update function in scaffold module
commit d54535768f8dfd18018686e80a4185af9d4503be
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 4 15:22:13 2010 +0000
Rewrote Util/HashMap.h logic to explicitly define the StringHasher function. This is to fix a problem where tr1::unordered_map was available but the sparsehash was still trying to use __gnu_cxx::hash<std::string> which does not exist.
commit 5b5b96b1d82810dcb0972f976e06cb0695a0ee91
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 27 15:59:19 2010 +0100
Update the new sga-connect program to mark vertices in the graph that are covered by a pe-walk
commit 8fba38d900c414787cdd6e13dfd63043dcd0a825
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 27 14:05:27 2010 +0100
Rewrote sga-connect to work from the graph instead of the FM-index.
commit 217878b8a139aa885b49ea9dba99482ed2e24b1a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 27 10:57:17 2010 +0100
Added flag to gmap output to indicate reverse complement alignments
commit 3efc111285f9b4a1dc9f3fcee6a953c2e67da685
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 27 10:26:33 2010 +0100
Implemented gmap subprogram which is a very basic read-read mapper.
Currently this is used to map reads that were rmdup'd into an unmerged graph
for use in the connect program.
commit e9599970aed63fb8cedff1c1c1363a002c9a2914
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 26 11:32:27 2010 +0100
Added new subprogram sga-stats which prints out a histogram of the kmer counts for a read set.
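A k-mer count histogram answers the question "how many distinct k-mers occur exactly c times?". The sketch below computes it directly from the reads with a dictionary; sga-stats itself works from the index, so this is only a picture of the result, with hypothetical names.

```python
from collections import Counter

def kmer_count_histogram(reads, k):
    """Map occurrence count -> number of distinct k-mers with that count."""
    kmer_counts = Counter(
        read[i:i + k] for read in reads for i in range(len(read) - k + 1)
    )
    return Counter(kmer_counts.values())

# ACG occurs twice; CGT and CGA occur once each.
histogram = kmer_count_histogram(["ACGT", "ACGA"], 3)
```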
commit 47bd5bdda032019e040c900878e65dfa2a3a9ddd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 25 15:52:34 2010 +0100
Added new output file to sga-connect to record the pe reads that could not be connected
Added new parameter to sga-connect to specify the maximum distance to search for
commit d07426edce76acc72d7e114862e069cfd3c97a1d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 21 14:17:14 2010 +0100
Implemented sga qc subprogram. This program looks for, and discards, problematic reads. Right now, the qc check requires each read to have a tiling of high confidence k-mers (with a short kmer length).
commit d3ab7c9b17ecd94a9f75e3586f3f1457f26e80c6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 18 09:34:40 2010 +0100
Added --version option to sga main program
commit cb76decabb82e9343d74678939a39459c543a7d0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 13:30:57 2010 +0100
Added --skip-preprocess option to sga-pipeline
commit bad617b0b0706c2f73181560fe8f095622ded47b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 13:15:03 2010 +0100
Added sga connect workflow to sga-pipeline
commit 41d35285c265f07ec3aac27ffeb0c42b081b80e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 12:01:58 2010 +0100
Updated README with new name of the --trim option
commit 7a4a0ae526b4dfbfa346eac327adc657d909917f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 11:44:18 2010 +0100
Changed version numbering to a conventional x.x.x scheme and bumped version to v0.9.3
commit 566598bff4e6b81df91812b8d1ce1d39b169998e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 11:39:14 2010 +0100
Cleaned up help message for many subprograms, mostly by adding default parameters.
Changed -t,--trim option in sga assemble to -x,--cut-branches to avoid confusion with --threads used elsewhere.
commit 6337c7be973b5a9d2467efa7e70e6810b4005579
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 10:46:58 2010 +0100
Extended --permuteN option in preprocess to handle the full IUPAC ambiguity code set as suggested by Shaun Jackman.
The option has been renamed --permute-ambiguous
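A sketch of what permuting ambiguity codes can look like, assuming each ambiguous symbol is replaced by a randomly chosen base from the set it stands for. The table is the standard IUPAC nucleotide code; the function name and the random choice are assumptions, not taken from the preprocess source.

```python
import random

# Standard IUPAC nucleotide ambiguity codes.
IUPAC = {
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def permute_ambiguous(seq, rng):
    """Replace each ambiguity code with one of the bases it represents."""
    return "".join(rng.choice(IUPAC[b]) if b in IUPAC else b for b in seq)

resolved = permute_ambiguous("ACGTNRYK", random.Random(0))
```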
commit 1b7072809bf4006a45dcd4743b7c31ac24180929
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 09:58:20 2010 +0100
Modified SeqReader to automatically uppercase all input sequences.
Added validation flag to SeqReader to detect when non-ACGT characters are present in the reads. This condition will trigger an error in all programs except for sga preprocess.
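The behaviour described here can be pictured with a small helper: uppercase first, then reject anything outside ACGT unless the caller opts out (as preprocess does). The function name, flag name and error type are hypothetical.

```python
def read_sequence(raw, allow_ambiguous=False):
    """Uppercase a sequence and, unless allowed, reject non-ACGT symbols."""
    seq = raw.upper()
    bad = set(seq) - set("ACGT")
    if bad and not allow_ambiguous:
        raise ValueError("non-ACGT characters in read: " + "".join(sorted(bad)))
    return seq

clean = read_sequence("acgtAC")
kept = read_sequence("acnt", allow_ambiguous=True)
try:
    read_sequence("acnt")
    rejected = False
except ValueError:
    rejected = True
```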
commit f971e6665f31256ba43bfa3d34d47918e3fe918d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 09:29:04 2010 +0100
Modified SeqReader to read compressed fasta/fastq files
commit 170852dd0ce962d31e6b22c57ccfc36a5b87f591
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 15 09:23:40 2010 +0100
sga-pipeline: fixed formatting issue for rmdup and correct wrappers
commit 402152cd0888a87580d3da0b266903ff146c600a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 14 16:24:03 2010 +0100
Added logging to the sga-pipeline script
commit 071d9bcca277644c9f96c97417893abc994b9caf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 13 16:17:15 2010 +0100
add rmdup-pe workflow to sga-pipeline to remove duplicated paired-end reads
added sga-joinedpe helper script to join/split PE read files
commit a594f0e9d699f7ea38af408c4dbf001b5a11a744
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 13 13:52:51 2010 +0100
Rewrote sga-pipeline to be more modular and flexible
commit 792593920b3317c14ac83cc6102a068262e29e77
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 12 16:25:40 2010 +0100
Corrected file extension handling in sga-pipeline
commit 4008a006fc786093dac979968d53c12d15f6b6f8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 12 16:02:10 2010 +0100
Added pipeline script information to the README
Updated comments in bin/sga-pipeline
commit 5237ed412e6f4e7bf02dba750ee29275f0b8af35
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 12 15:53:24 2010 +0100
Added bin directory with first version of sga-pipeline script
Added option to overlap and rmdup subprograms to specify output file
commit 114d4302b12018435f397f94d9b1be1373d21c62
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 15:42:26 2010 +0100
Fixed formatting
commit 15b4435da2bc17f00bc0dcf9a81a15db0184aa20
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 15:31:14 2010 +0100
Obscured email address
commit 02f3df8d8d3ddcaf342b4b1d6449d35ae1318e0e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 15:18:00 2010 +0100
Rewrote sga main webpage.
commit ad0ca12560d18b8fc808919d3eae45d9400bd1cf
Merge: 384f9d5 f4cc5c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 13:22:39 2010 +0100
Merge branch 'gh-pages' of github.com:jts/sga
commit f4cc5c51dde1e638d60e9f9ba0ed9f13e50762be
Author: jts <jared.simpson at gmail.com>
Date: Mon Oct 11 05:14:02 2010 -0700
github generated gh-pages branch
commit 384f9d5b6a7edc93c59438d8ee1bc0a4631b39c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 12:42:04 2010 +0100
Minor changes to the README
commit a9704100e08a304946a282574f01a8487e22b2bf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 11:44:37 2010 +0100
Added assert to ScaffoldRecord::introduceGap to catch the case where the expected overlap between scaffold components is not sane.
commit 140c6dc1827b545277ddd63937931c5f5bb8aadc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 11:37:46 2010 +0100
Bumped version to 0.92
Changed usage message of main program to clearly indicate that the connect/scaffold commands are experimental.
commit f40242ea2804ff543ef272c3787a083ebcd85026
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 11:32:46 2010 +0100
Made the irreducible-edge only algorithm the default for sga overlap. All overlaps can be generated using the -x/--exhaustive option.
commit 57b4e854a47ff0ded7ae7c29db5896e3d3428a15
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 11 11:02:05 2010 +0100
Implemented hybrid mode error correction which first performs a kmer correction pass, then overlap correction.
Added parameters to the sga correct subprogram to control the kmer correction
commit c918bf6ea72a2abc52f294ee7492b74a709270ab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 10 18:59:55 2010 +0100
More README updates
commit f2b6d5bcfa84ea8f26e4651a05c8562fc41c328d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 10 18:51:17 2010 +0100
More dead code removal
commit a426eca1f9809a1341c73ca0c6979751b7d16e0c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 10 18:49:00 2010 +0100
Removed dead code from repository
commit 824d73b5c9a17fe78e22793ebf748f19243e7591
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 10 18:33:55 2010 +0100
Updated the README
commit cf9eca09faf626c8262cf0ddb1ca85545f20cf34
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 10 18:02:12 2010 +0100
Refactored repository to not contain data files and tools/analysis scripts. These are moved to the sgatools repo
commit 14b97619c83e86df050df44b5195df42840272b8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 10 12:07:47 2010 +0100
Fixed bug in error corrector where no sequence would be output for uncorrected reads in the kmer algorithm.
Inlined the getDir function in the Edge class
commit 500095b7fe6c76a3ae33a9cc199715210afc9530
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 7 10:45:27 2010 +0100
Changed the read discarding logic for the kmer error corrector
commit d3e4ae7a4de0a2970635cd89cce8f82ffcc9a558
Merge: 82c2b42 d2f97ba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 7 09:27:04 2010 +0100
Merge branch 'master' of /nfs/users/nfs_j/js18/work/git_repository/sga
commit d2f97baf0051ba01d4c1b58fe0f09623c48fe5d3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 6 20:58:37 2010 +0100
Re-enabled threading in the connect process since the memory pool issues are fixed.
commit 2675d1919e62f4a9fa4091b7966aaf2f05b76c5e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 6 20:42:15 2010 +0100
Rewrote vertex/edge allocation logic so that a global memory pool is not used. The pool now belongs to the graph that creates the vertex/edge. This allows multiple graphs to be created in different threads without stomping over each other's memory. The global new for Vertex/Edges is disabled, the allocations must go through a pool.
This commit was validated with valgrind - no memory leaks were found.
commit 82c2b4295bdb0e2fac15c0a1847d18dc0dd58830
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 6 11:20:54 2010 +0100
Improved kmer-correction. Gives very close results to the overlap correction.
commit 6d31382f9fcdf3ee7585a2d93919319ad0a6361f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 5 14:11:04 2010 +0100
First pass at k-mer error corrector
commit 058405c71852765fe72cd759401b39f432049f9f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 4 13:28:24 2010 +0100
Updates on the sga2afg converter script and the testing graphical fm-index python script
commit aa61b462a41bdc5f2ed232b7d80759cde3a5729d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 30 15:43:43 2010 +0100
Added development script for computing an FM-index from a polymorphic genome
commit 70f6ef9a3f4d6414de0f7c6df0d9adc20d40eeb0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 21 11:53:05 2010 +0100
Fixed bugs in the sga2afg and sga2contig scripts
commit 6a0b92b21bbcb749ee75a07b2505a9d34d3cdfc0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 20 16:17:03 2010 +0100
Initial checkin of sga2afg script
commit 5c87be1bd38ca1eb0fc9e2d1bdac7b6ea229f813
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 16 14:41:20 2010 +0100
Started work on kmer-based error correction
commit 1d549aeb32e1201aac57dbd743cc3364beead883
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 10 13:33:16 2010 +0100
Fixed bad memory leak in the branch cutoff code for the overlap algorithm
commit b027c583ccf0f7fdec3a8399ec99daf74a977450
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Sep 10 10:48:44 2010 +0100
Added parameter to the correct subprogram to limit the amount of branching for complex reads.
Added walk subprogram to extract the paths through the graph between two vertices
Added scripts for coloring a graph with variants
commit 492ed6414cdaff64cfc492ad5de80073b157cd7d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 9 10:17:04 2010 +0100
Added sampleRate parameter to rmdup
commit 0e95f812b3a99ccd660bd0a1641ac12975407363
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 8 16:50:06 2010 +0100
Made command line parameters for the coverage removal algorithm
commit cea82e60b9f43f171eaa438e0dc05d4bd471f62b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 8 16:05:27 2010 +0100
Added function to SGSearch to calculate the coverage spanning a given edge.
commit 422ebb28a0e08f717bd3ff3ee90bbebeb06aeeb2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 8 11:10:49 2010 +0100
Added depth filter to the error correct process to avoid correcting very deep sequences, which takes a lot of time.
commit 0fde038b4303e992fa4206e852c48c3192a69220
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 7 13:04:32 2010 +0100
Fixed seg fault in search where the m_pWalkIndex member of an SGWalk was not initialized in the copy constructor
commit 6c7a395d74dcfd62b446c02b7fea44638da50c6c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 7 09:30:29 2010 +0100
Added some parameters to merge and correct. Moved the smoothing task in assemble to occur before simplification. Smoothing is still experimental.
commit f8a61d96948dc663953a7cd707d3fef3cb2745b6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Sep 2 16:57:57 2010 +0100
Added experimental bubble-popping "smoothing" algorithm. Not in a state that it is usable for production work.
commit 773cf0d377eb3ff3c127e75466811ab307f33857
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 1 16:27:41 2010 +0100
Fixed bug in the connect subprogram where the program would abort if the first and second reads had identical sequences
commit aa2a0168e80c4f2d4013242a78e0235aaa66666a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 1 11:49:35 2010 +0100
Completed connect subprogram.
Threading is currently disabled since the global memory pool for the edges is not threadsafe. The solution will be to have one pool per StringGraph.
commit ed88222bc704e0c2c01cb78c5dc678123151292f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 31 16:51:36 2010 +0100
Changed default sample rate for merging to 1024
commit 92806dd91b615995ccffd753e013de27c6c75e3e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 31 16:35:45 2010 +0100
Removed unnecessary print.
commit 89dedf43b72e0fed7b4b3783a9478361b7f417d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 31 16:34:24 2010 +0100
Threaded the mkqs portion of the indexing step. Not a huge decrease in running time, around 20%.
commit 3d893a57e7f66b25ccc686bb83befa0f48844fe1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 20 12:37:50 2010 +0100
Started implementation of local string graph construction. It currently generates duplicate edges
for containments which should be fixed.
commit 67d9e29e9fcaa55376c9cf7447bec7289ff387a2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 19 12:04:21 2010 +0100
Fixed int to double conversion warnings.
commit a14006684f209a83885724a4c7c9b391a147eee5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 19 11:41:41 2010 +0100
Added --no-discard flag to sga correct to suppress discarding reads.
commit 99650124abce186892edede7171d302b01900ae1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 19 10:37:07 2010 +0100
Made the sequence process framework more generic by using an input generator
which takes records from a SeqReader and outputs a generic work item.
Made framework for the connect subprogram
commit 67ec291013fbe404bf8fa61c7731b80af12c8f57
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 18 13:37:32 2010 +0100
Finished graph resolving work. Added command line parameter to choose the stringency of the resolution step.
commit 9ee391b4d0e3dc6e571ef5f9d0db98b3c9af5b45
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 17 14:17:41 2010 +0100
Implemented scaffold resolution using the string graph.
Added parameter to the small repeat resolution so that it only resolves repeats
if the difference in lengths between the putative repeat edge and the spanning edge is greater
than the threshold. This is to avoid removing the wrong edge in highly repetitive areas.
commit bb7456d610d3e543f9dddd88075efee80844319c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 16 14:27:34 2010 +0100
Fixed major performance bug in the irreducible extension algorithm. Every right extension was performing 4 branches,
which copied the entire search history every time. The issue became very obvious when running the assembly with long strings
where very long extensions are common. In that case the algorithm is quadratic in the length of the right-extension instead
of linear (ugh). Fixed by only creating a copied branch when there is actually a divergence in the extension. This could be
made even better by using the SearchHistory tree implementation so no copying is ever performed, but it would be a fairly large
change.
In the usual case of 100bp reads, this version is 3x faster than before.
commit 78f8b396b4230480b8ebca18b9a401c3f8de85aa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Aug 15 01:47:04 2010 +0100
Refactored vertex to vertex search algorithms
commit d13c49be036c42ec178fc8a3089893abcef4d0c6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 13 23:10:36 2010 +0100
Fixed bug in scaffold evaluation
commit ad348bc2edcf5daacfe47b5fc8ea705c4b290e5f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 13 16:49:42 2010 +0100
Turned off prints in Overlapper
commit 35ec631c65c1bb4bf7c0b8a9bb6e18cdf4be7b85
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 13 16:45:48 2010 +0100
Added a driver script for the scaffold evaluation.
Cleaned up the code for overlapping contigs when constructing the scaffolds.
commit a08d9a364376b7142b03610f96c463ded9cd2e2e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 13 13:56:44 2010 +0100
Added perl script to break up a set of scaffolds into contigs
commit 4b0c3a450c277ecfdbcf88b4b5daae6a6882554f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 13 13:30:27 2010 +0100
Finished code to join contigs that are predicted to overlap
commit 305b6683615f2a59fddad168b623a485be5ecfd5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 12 22:57:27 2010 +0100
Refactored dynamic programming algorithm into its own class.
commit 74d82864b290df38e493b6335808167b94de1c32
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 12 17:05:20 2010 +0100
Implemented edit distance calculation for two strings using dynamic programming in OverlapTools.
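For illustration only, a minimal Python sketch of the standard dynamic programming edit distance (Levenshtein) between two strings. The function name and the unit costs are assumptions; the OverlapTools code itself is not shown in this log and may differ.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance with unit cost for insert, delete and substitute."""
    # prev[j] holds the distance between the first i-1 chars of a and the first j chars of b.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]
```

Only two rows of the table are kept at a time, so memory is linear in the length of the second string.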
commit 8285344bc48cc9bbe75c788d912f2b99504dba84
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 11 17:00:41 2010 +0100
Modified scaffold to perform reductions until no more reductions can be made
commit 0a4ed6836caff2196ce1634ee3c0338e63642302
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 11 16:12:47 2010 +0100
Implemented simple scaffolding output where sequences are truncated if an overlap is predicted.
The results of the scaffolding are correct, however, and a test set of E. coli scaffolds was verified using mummerplot.
Fixed a parsing bug in ScaffoldGraph where EdgeDirs that were read from the distance estimate file were
in reverse order. abyss distanceest writes the SENSE edges first.
commit 2aa4cdafab665b90c603ee8e721554e98405825c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 11 14:23:11 2010 +0100
More refactoring. Created the ScaffoldRecord class to hold the output of the scaffolding process. It can be read from/written to a file.
Fixed a bug in Vertex::makeUnique where deleting a duplicate edge that points to the same vertex would result in memory corruption. Now
it uses a mark/sweep algorithm.
commit b2534bc3d2d90257886282a63ed9870801dac448
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 11 11:48:33 2010 +0100
Factored the link data out of the ScaffoldEdge class
commit 9f573927c1bde503fb28eddd53f7a8ae6ed960fb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 22:27:51 2010 +0100
evalScaffolds: output number of gaps and mean gap size
scaffold2fasta: load string graph
commit f81a2bf5bc6ee94c783dd11595021dfe7ac0f1b0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 17:42:18 2010 +0100
Added -a,--asqg-outfile option to assemble to write out the final graph.
Added stub scaffold2fasta subprogram
commit 5e4eca436670cb10d3eb41d9465e6f1eddf7b6ee
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 16:57:36 2010 +0100
Ported sga-scaffold into the main SGA program as a subcommand
commit 4bbe47bb4788cd9461f1c5d88fc93086d0730ab0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 16:21:00 2010 +0100
Added -o,--outfile option to sga-scaffold to specify output file.
commit cd5a5576ac65f6116e045e75048a3c670b1f95e1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 16:12:21 2010 +0100
Added ability to output singletons from the scaffolder
Added ability to ignore singletons in the scaffolder evaluation
commit 86ac29d1fc261053f257274498fc88c8be227eeb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 15:49:27 2010 +0100
Added script to evaluate the scaffold output
Changed output format of scaffolder for above
commit 449d0ce66f6c4a6cf8c247cbe92d68de99b8038b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 10 12:25:58 2010 +0100
Fixed bug where an istream* reader was not cleaned up.
commit b5a60812fc1728b3b2f66cae3e3181e91710f5a0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 9 21:19:39 2010 +0100
Added ability to write out scaffolds to a file after processing. Still in development.
commit 0b85e499458d96fcf201602ea119c5496caff59b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 9 14:44:59 2010 +0100
Fixed terrible bug in scaffolder
commit d9c555b211579ee2530306fb23b300aba9419f1c
Merge: eee5297 4fb1f4e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 9 09:07:09 2010 +0100
Merge branch 'master' of /nfs/team71/phd/js18/work/git_repository/sga
commit 4fb1f4e5fd5288991d4def244f0f9f7f84daf8bb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Aug 8 22:58:20 2010 +0100
In-progress checkin of scaffolding code. It compiles but should not be used.
commit 375220e3dda58503b10122a8f4354c25d49f0b8b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Aug 7 18:53:32 2010 +0100
Added ability to load a-statistic data from a file to sga-scaffold.
commit eee5297b2a63616fc83edaa00dbe4b62b16f5796
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 6 16:57:07 2010 +0100
Minor update to calculation for expected arrival rate
commit 7a42a785bec3ae72d3e69e6146e79b1a9797975e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 6 16:56:21 2010 +0100
Better estimate of expected number of reads per contig by using the number of positions in the read that
can be sequenced with an average-length read.
commit 1bf6b1122170b343b8686598b1ea49e94023e926
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 6 16:42:32 2010 +0100
Added script to compute a-statistic for contigs from a bam file
commit 75d82456b88ffb20292e754f3669cb81a33179c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 6 10:21:54 2010 +0100
Fixed help message for merge to indicate it can only take 2 files.
commit c0bbdc16016a6309618f0cbfcf62c9c7783251fd
Merge: 23cbd5d f4863d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 22:49:14 2010 +0100
Merge branch 'master' of ssh://127.0.0.1:2222/~/work/git_repository/sga
commit 23cbd5df30502395830fc4c6180fc2c34ef294b2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 22:48:48 2010 +0100
Updated the output after error correction
commit f4863d9ede32e884419d2d069eedbadad0ffbaf0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 22:15:09 2010 +0100
Removing Scaffold/Makefile.in which shouldn't have been added to the tracking
commit 8db7a5cbe571eb151a707041b7b19e232da82ed0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 22:05:06 2010 +0100
Added Metrics classes and ability to track statistics about what positions in
the reads were corrected, what their quality scores were, etc
commit 29d2b471b5cb087f4b23419ab9d809fb35e2e6c2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 16:50:19 2010 +0100
Cleaned up MultiOverlap code and removed dead code from other classes
commit 6e6482f53ea95107b8fbafef8b12604945352856
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 16:00:23 2010 +0100
Fixed order of arguments when creating a ScaffoldEdge.
Changed default storage type for SparseGapArray used by rmdup to 4 bits
commit 8869b97198347932232aedc0e6c6c5f9d46faf55
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 15:43:51 2010 +0100
Added arguments to index and merge to control the size of the gap array.
commit 39d5d7e05eaf4729bdb9a872532aba58802f0577
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 15:09:04 2010 +0100
Implemented 4-bit storage SparseGapArray
commit 41968c52d0c740cd02f506c8088241705e5eeb65
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 13:11:46 2010 +0100
Abstracted out the GapArray functions.
Implemented the SimpleGapArray subclass which uses a normal vector as the underlying storage.
Implemented the SparseGapArray subclass which uses a vector with a small int type as the underlying
storage. When the count of a gap/rank becomes too large to represent in the vector the value automatically
overflows to a HashMap. The overflow case is very rare, so this is a much more memory efficient way of
storing the gap array.
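A rough Python sketch of the small-counter-plus-overflow idea described above, for illustration. The class name, the method names and the use of the largest small value as an "overflowed" sentinel are assumptions, not the SGA implementation. Python has no 4-bit array type, so one byte per slot is used and the counter width is only simulated.

```python
from array import array

class SparseGapArraySketch:
    """Per-position counters in a small fixed-width slot, spilling to a dict on overflow."""

    def __init__(self, n: int, bits: int = 4):
        assert 1 <= bits <= 8
        # The largest representable value is reserved as a marker meaning "see overflow map".
        self._sentinel = (1 << bits) - 1
        self._small = array("B", [0]) * n
        self._overflow = {}

    def increment(self, i: int) -> None:
        if self._small[i] == self._sentinel:
            # Already overflowed: the true count lives in the map.
            self._overflow[i] += 1
        elif self._small[i] + 1 == self._sentinel:
            # The next value would collide with the sentinel, so move the count to the map.
            self._small[i] = self._sentinel
            self._overflow[i] = self._sentinel
        else:
            self._small[i] += 1

    def get(self, i: int) -> int:
        v = self._small[i]
        return self._overflow[i] if v == self._sentinel else v
```

Because most counts stay small, the map holds only a handful of entries and the bulk of the storage is the fixed-width vector.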
commit b582eea669fb88ab179f5bee1bf279ec04ac035f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 5 09:34:38 2010 +0100
Removed unused line of code
commit 15e94d409af2d9de00e7324fbe931e1fbd103c69
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 4 14:55:09 2010 +0100
Improved dot output
commit 286f3114ff0125eb267b34b3797bd250a719fb0a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 4 14:00:49 2010 +0100
Added ability to load distance estimate edges to the ScaffoldGraph.
Added ability to write the scaffold graph as a dot file.
commit d9eb7053f0922709c67686d9e567d9bcadd05fba
Merge: 481e9e9 667a8d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 4 11:44:55 2010 +0100
Merge branch 'master' of /nfs/team71/phd/js18/work/git_repository/sga
commit 481e9e982e271ee35cb4dbcad574d7c4400e276c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 4 11:43:00 2010 +0100
Refactored HashMap includes to check for the presence of tr1, ext/hash_map, etc.
Started implementation of scaffolder which currently does nothing.
commit 667a8d9ceef94db4452ac1e38f9043fa67d9a6f0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jul 31 14:55:56 2010 +0100
Added checks to the StringGraph construction and oview functions to ensure that each
vertex ID is unique.
commit fbc774133a74afabdd7ec6bffd7173ed78f65c32
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 30 16:52:03 2010 +0100
Fixed rmdup index rebuild. Substring reads were not being written to the dup file.
commit d132a2f3f23712af4a2f2a97e833979f8e716b10
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jul 29 22:46:10 2010 +0100
Changed output BWTs back to binary
commit 71ba9d104235e1b5f490a58633d06d2b1587d33c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jul 29 22:31:05 2010 +0100
Fixed parallel mode for rmdup. Now working as designed.
commit 460c0aa596afa6c8a756cfb8b2fe9547da4a21e2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jul 29 21:47:27 2010 +0100
Working version of the removal of reads from the FM-index in the rmdup program.
For now only the serial version works as the original ordering of the reads is not respected by parallel mode.
commit 47e27b71d9e5ab5b1a747de43bea0a4d9efc80d7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jul 29 20:24:11 2010 +0100
Initial implementation of in-place removal of strings from the FM-index. Not working in this version.
commit 09dc2c1bc9403235d28f97bbbcd47893edb2eb9d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 28 16:59:32 2010 +0100
Removed Tests directory from standard build.
Filled in README file
commit 9746ea29e0fe75689a6a7ba9e9bc91caf55cd266
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 28 15:41:06 2010 +0100
Fixed Makefiles/includes so that make dist works
commit 6127d2ab4563ac593bb0b246515400a166b88478
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 28 15:12:36 2010 +0100
Bumped version number, removed define from SGA.cpp
commit aa7d1d1e1543a141bc8bf83d27a41e63c36f93f0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 28 15:09:28 2010 +0100
Added sparse hash check to configure
commit 466d25c9bdd653c7c27ce0114ec69988dad4103e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 28 14:24:36 2010 +0100
Fixed a crash in the contain removal algorithm found in the yeast data. An assertion would blow if multiple valid overlaps
to a read were found. Fixed by removing duplicate overlaps, only keeping the longest.
commit e492be53f8acbec6f0cc5ba389fca307401efe2a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 27 13:46:57 2010 +0100
heavy refactoring of BWT I/O. Now the binary and ascii output files are subclasses of IBWTReader/Writer. The rest of
the code no longer knows if it is reading a binary or ascii file - it all goes through the abstract interface.
commit 3cc29ae858042f40c4f74408906b143995b89d3b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 27 10:39:33 2010 +0100
Forgot to add RLBWT* files
commit b90bd157ef9637974e085e9138fa2bab08f14a83
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 27 10:38:04 2010 +0100
Working implementation of binary .bwt file. Uses run-length encoding.
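As an illustration of run-length encoding a BWT string, here is a small Python sketch that packs each run into one byte. The layout (symbol index in the top 3 bits, run length in the low 5 bits, alphabet "$ACGT") is an assumption made for this example and is not taken from the actual .bwt file format.

```python
ALPHABET = "$ACGT"   # assumed symbol order for this sketch
MAX_RUN = 31         # largest run length that fits in 5 bits

def rle_encode(bwt: str) -> bytes:
    """Pack each run as one byte: 3 bits of symbol index, 5 bits of run length."""
    out = bytearray()
    i = 0
    while i < len(bwt):
        sym = bwt[i]
        run = 1
        # Extend the run while the symbol repeats and the length still fits in 5 bits.
        while i + run < len(bwt) and bwt[i + run] == sym and run < MAX_RUN:
            run += 1
        out.append((ALPHABET.index(sym) << 5) | run)
        i += run
    return bytes(out)

def rle_decode(data: bytes) -> str:
    """Expand each packed byte back into its run of symbols."""
    return "".join(ALPHABET[b >> 5] * (b & 0x1F) for b in data)
```

Runs longer than 31 symbols are simply split across several bytes, so decoding needs no special case.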
commit 284c1ff247638a4dbca8789e6a29e152fbf2db23
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 25 21:03:32 2010 +0100
Refactored BWTAlgorithms::updateBothL/R to take in the AlphaCount for the lower and upper interval.
This allows the Overlap search to calculate the AlphaCounts once when performing a branch instead of 4 times.
This results in a 40% decrease in running time.
commit b3b2cb0fc9c84cb17f518f6aea1efade8fc66cfb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 25 18:33:30 2010 +0100
Changed OverlapAlgorithm to remove submaximal overlap blocks for containments and proper overlaps at the same time
before splitting the block lists. This avoids duplicate edges in the graph.
commit f5f487a8117183bdd4753fcb778cdb944cda2583
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jul 25 00:12:35 2010 +0100
Added function to calculate the amount of a read that is covered
by both prefix and suffix overlaps to MultiOverlap.
commit fae0bbc2d486e641b3a09ba4e14a8e4dc5035727
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jul 24 23:29:22 2010 +0100
Removed hardcoded paths from analyzeCorrect, run_bwa.sh and samQC
Added primitive sequence filtering to error correction step of SGA
commit 2a45314b964b1752429a924e7c5b5a1410e6cff7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jul 24 22:39:30 2010 +0100
Adding additional development/analysis scripts to revision control
commit 1c469d27f914bcc82118e9c37183674d7139284e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jul 23 17:06:37 2010 +0100
Added some metrics to the error correction.
Updated conversion scripts
commit ed4783bde0d55da45b1485926729d18416e015d8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 21 16:10:34 2010 +0100
Added new conversion scripts:
de2evidence.pl (generate a bambus evidence file from an abyss distance estimate file)
sga2contig.py (create a BAMBUS-style contig file from an sga contig file and SAM file)
Wrote a function in Bigraph to rename contigs so they don't have read names
commit 49ae7a3ed19aa96ee5c8d3660620722b19a0e40c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jul 21 09:23:08 2010 +0100
BWA-based distance estimation calculation is complete.
commit f707f72debb9a9ee6a59386bc3e014a90d9a7a14
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jul 20 14:44:10 2010 +0100
First pass at the scaffold driver. This version is based on bwa but this will be replaced by bowtie
commit 91ea027241d3c36b436dcd0b95ae7c31d515cb0f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 21 16:17:29 2010 +0100
Removed print statements from SGRepeatResolveVisitor
Disabled outputting the contig break and postmod asqg
commit c2b316d55d20aec4e3fa599f4be7e0591017bb4f
Merge: 0273116 90709e6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 20 23:47:53 2010 +0100
Merge branch 'master' of ssh://127.0.0.1:2222/~/work/git_repository/sga
commit 02731169d3fe68288a219462b5cb6040b1066546
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 20 23:46:58 2010 +0100
Added command line arguments to correct to take in the algorithm to use and the conflictCutoff
commit 90709e61e557293e8334c5a2464cc5fcaa59bc10
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 20 22:53:07 2010 +0100
Added tool scripts to revision control
commit f537147d1504064ee7d9a487ddf3afbdb4fbfd74
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 20 22:52:02 2010 +0100
Tweaked PrimerScreen settings. Now matches over the first 14 bases of the sequence.
commit ae095f7fe343110f1425676282edc35227c32a79
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 20 22:31:42 2010 +0100
Added quick and dirty PrimerScreen class and enabled the screen in the preprocess. This just checks for
Sanger's pcr-free primers at the moment using exact matches.
commit 6cea50587e15225465dc0def7b43c16e34e45999
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 20 01:16:40 2010 +0100
Re-enabled seqtrie correction.
Added check to make sure the error rate parameter is sane in SGA/correct
commit 208efdd1fad98f3606c95a47dffe77aa37452581
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 19 18:55:26 2010 +0100
Added method to MultiOverlap to generate SeqTries.
Changed the error correction function to the conflict-consensus algorithm.
Removed unused code from MultiOverlap.
commit f500afe2ed5f32a439121561c2ef1b6445355754
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 18 16:09:19 2010 +0100
Added small-repeat resolution code. Remove edges that join together two sequences with a sub-read length repeat unit if there are
other reads that span the repeat. Big improvement in N50 on simulated data.
commit 2e0d8226e41da8ecb0aa6f4d643ea5e52e79f7de
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 18 09:51:31 2010 +0100
Added some extra information to the break writer
commit 0eaafc718d8083ce02bbea583dfbb1a82b82adb4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 17 12:57:29 2010 +0100
Removed print that was checked in by error
commit 4a150147153485ecb208b29c19ece4a2e6b88852
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 17 12:40:24 2010 +0100
Fix: the number of times the trim/bubble popping is performed did not match the command line parameter
commit 2fdc7ee1741212edd925963be0fd5df314ee5111
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 17 09:36:20 2010 +0100
Fixed semantics of quality filter
Made the number of trimming and popping rounds a parameter.
commit 549e9ebd17817cf753f52a93f96cceb025d59e99
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 16 14:55:35 2010 +0100
Added quality filtering option to remove reads with a substantial amount of low-quality bases.
commit 0f320358987b66cb06618106b7add925d1e9ae32
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 16 13:42:00 2010 +0100
-Added option to perform multiple rounds of error correction
-now using google sparse hash for bigraph
-added --exact option to assemble to turn off expensive containment removal
commit 8265baed107e8874dd27905b9cff5e4c5779b2f1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 15 15:19:56 2010 +0100
Implemented -o, --outfile option to SGA/correct
commit 63c59cf02814e9a0e90a7f93bed9e95fd6f89989
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 15 10:41:32 2010 +0100
Changed tabs to spaces in samQC, modified so the summary stats can be printed in every mode.
commit d32ce13af5201aae1ab406c4863b4d391afa76aa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 8 23:36:38 2010 +0100
Cleaned up subgraph, it now removes containments and properly handles the vertex visit logic
commit fc78a7a3750daf020ad4464e158a0d3da6310bf9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 8 22:48:27 2010 +0100
Added subgraph subprogram, to extract a specified portion of the graph
commit 492bec5024ff996a2ce047fda4d8376eaf0f2587
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 8 21:42:48 2010 +0100
Made the number of reads to process in a batch a parameter to the BWT disk construction algorithm
commit 6cda259a62bc47f0ee7dab372836c159ad78ffb0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 8 16:18:21 2010 +0100
Added hacky BubbleEdge removal visitor. Currently not in use.
commit f6d68d195e58c238a4060d67ed4679c8b80436cb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 7 12:51:32 2010 +0100
Force the suffix array to BWT conversion methods to use SBWT for now. This should eventually change to writing the RLBWT
directly to disk without constructing it in memory first.
commit 00f4ba52165810022283f151db6452c5d37b3030
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 7 11:32:11 2010 +0100
Re-enabled rmdup by writing the id and sequence out to the hits file.
commit f9fe32636b591211ffd21cc584f80b29dccf04eb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 7 10:36:36 2010 +0100
Fixed bug in RemovalAlgorithm where cycles in the graph would cause an infinite loop.
commit 995120408a9738df68c6adc232f8c1436e2f0b13
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jun 6 19:13:22 2010 +0100
Added hidden argument to overlap to use exact mode.
Temporarily using the vector swap trick to reduce the amount of memory required. This will eventually be replaced by reading the
number of runs in the RLBWT from disk.
commit 08f8c3aecd67747c55e6fc6ed97eb9b60cf90f57
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 5 18:14:28 2010 +0100
Added ReadInfoTable to load an index of id,length pairs. This is used to construct overlaps from hits in overlap and rmdup. The benefit here
is we don't need to store the sequences which aren't needed. Rmdup needs the sequences so I disabled it for now. This
will probably be fixed by outputting the read sequence along with the hit so it doesn't need to be indexed.
commit 97d594c7de94915b855f9a92525785fed99c8ddd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 5 12:03:20 2010 +0100
Fixed bug in RLBWT::initializeFMIndex where the last marker would not be placed correctly.
Added -d flag to overlap to set the sample rate for the FM-index
commit e026474fe089be0e9c59d9e63a81f9274938eb0a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 4 16:59:46 2010 +0100
Implemented forward search in getFullOcc as well. The code could be cleaned up a bit.
commit fa0d4899d5dc78c7c784d9e06c78e69c1ebc5919
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 4 16:51:16 2010 +0100
Implemented forward-search of the Marker array
commit 9788661e2c18bb686f0eea4a41df629cfb2e8f22
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 4 14:09:43 2010 +0100
Renamed the old BWT class to SBWT ("simple" BWT). The BWT identifier is now a typedef to switch between using the RLE version and the regular version
commit 155b00fb0aee5b9be208c7efd2f2b49a9f16c541
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 4 10:13:36 2010 +0100
Implemented occurrence counting for RLBWT. Some efficiency gains can still be made
commit 5f51df85eba8c2031cf353255cb39e1ab812e73c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 3 21:39:57 2010 +0100
Implemented setting the markers in the RLBWT and random accessing of elements.
commit 125ea10008bba7c94c723b7275d12b9daad7e4c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 3 17:05:47 2010 +0100
Started to implement the marker placement code
commit c65984623743a7b8a4636b4b21dda3febf68bf33
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 3 16:20:40 2010 +0100
Rewrote RLBWT printInfo
commit 3f15ddf965213ae462fe079093d5f84b597a9b2f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 3 16:08:02 2010 +0100
Began implementation of run-length encoded BWT class. It reads from a .bwt file and compresses the string into runs as it is read.
The read and write have been implemented and tested but random access to the string and the occurrence operations have not been.
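As an illustration of the idea in the commit above (compressing the BWT string into runs as it is read), here is a minimal Python sketch. It is not SGA's RLBWT class: the log does not describe the in-memory layout, so the `(symbol, run_length)` pair representation and both function names are assumptions made for this example.

```python
from itertools import groupby


def run_length_encode(bwt):
    """Collapse a BWT string into (symbol, run_length) pairs in a single pass."""
    return [(symbol, sum(1 for _ in group)) for symbol, group in groupby(bwt)]


def run_length_decode(runs):
    """Expand (symbol, run_length) pairs back into the original string."""
    return "".join(symbol * length for symbol, length in runs)
```

Because reads from a repetitive genome produce long runs of identical symbols in the BWT, the pair list is typically much shorter than the string itself. Random access and occurrence counting, which the commit notes were not yet implemented, would need extra bookkeeping on top of this.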
commit 39fd015df26d6976a67beef3e6c33367014a7ec7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 3 11:20:12 2010 +0100
Implemented the rest of the error correction subprogram. It uses the simple correction algorithm at the moment but gives good results on simulated data.
Wrote the tool script evalCorrection to evaluate how successful the correction step was.
commit 84edbc27351b0f01a9eef4a532e1ad8b2f7a6d94
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 2 16:34:49 2010 +0100
Added methods to SearchHistoryVector and OverlapBlock for extracting the string corresponding to a match
commit fcb94e39b3de1f25ae3b65d88832d0f7751e1e45
Merge: 4c2e160 a5021ff
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 2 14:11:00 2010 +0100
Merge branch 'master' of /nfs/team71/phd/js18/work/git_repository/sga
commit 4c2e1606b084f9c4267e67577a9d5c48f41cf8c6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 2 14:09:51 2010 +0100
Added skeleton of ErrorCorrectProcess, implemented control flow for error correct subprogram
commit 8bbb08f5ad6127f9eb1df0bfd5d3ea9bfae820cd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 2 13:26:50 2010 +0100
Added skeleton for error correction subprogram
commit a618563b86bc21eab01cd1d2b0c32ea0bf082710
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 2 11:19:09 2010 +0100
Added some void casts so the program compiles without warnings if -DNDEBUG is specified
commit ba7b5bcdd14fb7e4712b3654b90b9b876c911a34
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 1 16:23:45 2010 +0100
Refactored CompleteOverlapSet to use new partitioning code
commit 66b696e2054c632c93c2cd0119cbe059b223aef5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 1 16:16:58 2010 +0100
Fixed last issue with the new remodel algorithm, it now gives the same result as the old algorithm but is much much faster.
commit d4544acc4bea92562275ed9ab71e4e6560a5032d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 1 15:12:06 2010 +0100
Implemented new, much faster remodel algorithm. The implementation is not perfect yet.
commit a5021ffe626436e3320089a354f5e4ff99343884
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 31 20:45:10 2010 +0100
Fixed a few incorrect forward declares that Clang picked up.
commit 9b0d758a4ca29f9c20ed83c11f1f5de1e5de9a82
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 31 15:35:29 2010 +0100
Removed double-construction of overlap block
commit 1441572624b23f1ee1ce5988a4aa4cfb50a18a8e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 29 15:08:50 2010 +0100
Reverted the number of reads per group back to 2M
commit 143794adcce18927d9d4c1baf6be2b4739b69e08
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 29 14:53:31 2010 +0100
Modified BWTDiskConstruction to use SequenceProcessFramework. This involved refactoring the GapArray into its own file.
Implemented threading for index and merge, using above.
commit ef8c57a061b95c44ba890a207da0154534cb2700
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 28 16:15:18 2010 +0100
Removed some prints
commit 05fde2f69ccad6c7f6ca7dfe4b8f965c383673d1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 28 16:00:47 2010 +0100
Generalized the SequenceProcessFramework to take in a SeqReader and an optional parameter n which limits the number of
sequences to process from the Reader
commit a15c73a637e45180882f6dc9690e1671057faa51
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 28 15:37:05 2010 +0100
Moved some print messages to SequenceProcessFramework
commit 6562b75157876aeff3082a1661108325afa8aad1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 27 16:30:23 2010 +0100
Refactored rmdup to use the new concurrency framework. Removed OverlapThread which is now obsolete
commit bc9c80997e590cb7d4106e70fac1623d6f87d69d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 27 13:19:46 2010 +0100
Major refactoring of how a sequence file is processed in parallel. Wrote the generic SequenceProcessFramework to handle reading the file
and passing the reads to threads which perform some arbitrary operation.
commit 4487b0fa562f4dad777ef661688c8d2861030b8a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 26 11:06:20 2010 +0100
Reduced the memory usage of Vertex by using the SimplePool allocator and removing two data members that are not used currently.
commit 70ead2f4ed1fb994bb6f2a020ce687b0899eff1a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 24 17:00:20 2010 +0100
Added check to SGA/merge and revised mergeDriver tool
commit 0d140169fd534653d35367c23126eaadaafdc10b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 24 12:49:33 2010 +0100
Changed comment
commit b9ca8ccf5344d2b83d69717abe839133f3f73777
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 24 12:48:02 2010 +0100
Changed interface to parseHits so that the reverse read table is not used.
commit 2edebcbe15026a843787935ee879111fe03c2b3d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 24 11:54:46 2010 +0100
Don't load the reverse read table in SGA/overlap, only use the forward read table.
commit cf1635da01e91cba13bbedf541f37e243a48dd6a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 24 11:14:53 2010 +0100
Added option to merge to clean up original files.
Added helper script to make merge commands
commit 82e6ae5d65a5aa3efaef0ca0b8873bacce80ce98
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 24 10:07:15 2010 +0100
Added 4-bit BWT codec. It uses half the memory compared to not encoding the string, for roughly the same speed. It is faster than the 3-bit encoder for unknown reasons.
commit f8eb528f4e72a91cee48c0971a6d1c7b59d920b2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 23 21:59:20 2010 +0100
Changed NoCodec to use a similar get/store function as the real codecs.
NoCodec is about as fast as using a char* as the BWT string
commit 7c146c5c8511e5a28411fffd5bb6b69cb1c7ddce
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 23 21:52:53 2010 +0100
Added NoCodec which can be used by EncodedString to avoid doing any actual encoding. Useful for testing.
It is the default for BWTString at the moment.
commit 252207515fe84283b32ff06764f9132c7ee0a95d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 23 21:40:02 2010 +0100
Added lookup table for shift values and changed value in mask from decimal to hex
commit c30bec27b3f399b742046d754f17eb6002a5c1e4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 23 21:11:26 2010 +0100
Changed the BWT class to use the EncodedString representation of the BWT string.
It drops the memory usage by a third but increases the running time as well. Probably can optimize
to get some time back.
commit e01804dadd5dff4ba29bb9484a5c751e7b5ba8e1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 23 20:25:02 2010 +0100
Added BWTCodec to encode an alphabet of ACGT$
commit bef01e79948a3480b0713a806caca332758eccc8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 23 00:49:29 2010 +0100
Implemented append and swap functions in EncodedString. Ported the Vertex class to use this class to store the sequence.
commit 85355ef0bee47c313e7e1e9f9a22b179f9fc83fd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 22 23:20:37 2010 +0100
Implemented the rest of the EncodedString class
commit 6722a0a09a18222a9410f3b20ed9426413f474bd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 21 17:00:33 2010 +0100
Started work on 2 bit per base encoded string class
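A 2-bit-per-base encoding packs four bases into each byte. The sketch below shows one way this could work in Python. The A/C/G/T to 0..3 mapping, the low-bits-first ordering within a byte, and the function names are assumptions for illustration and are not taken from the EncodedString class.

```python
# Hypothetical mapping; the log does not state which code each base receives.
BASE_TO_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
CODE_TO_BASE = "ACGT"


def pack_2bit(seq):
    """Pack a DNA string into bytes, four bases per byte, lowest bits first."""
    packed = bytearray((len(seq) + 3) // 4)
    for i, base in enumerate(seq):
        packed[i // 4] |= BASE_TO_CODE[base] << (2 * (i % 4))
    return bytes(packed)


def unpack_2bit(packed, length):
    """Recover the first `length` bases from a packed byte string."""
    return "".join(
        CODE_TO_BASE[(packed[i // 4] >> (2 * (i % 4))) & 0x3] for i in range(length)
    )
```

The sequence length has to be stored separately, since the padding bits in the final byte are indistinguishable from a run of `A`. A 2-bit code also has no room for `$` or `N`, which is consistent with the later commits adding wider codecs for the ACGT$ alphabet.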
commit 27c6e1dd9aed09430c592e7997fe3a735249f2a3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 21 13:43:18 2010 +0100
Implemented new overlap detection algorithm in CompleteOverlapSet
commit d7ff7c20c94ca3d4cae8a60b1861eeb531a8b9bc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 20 12:20:18 2010 +0100
Modified read sampler to add a prefix to each readname
commit 6c6a7e3db01376eee9e75a87cbb6347a4d7f6753
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 20 11:21:16 2010 +0100
Cleaned up outfile naming in SGA/merge, it is now complete in the case of merging two indices
commit fc176f70b6c6dd29907b4617ef53f0a96e616b58
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 20 11:01:43 2010 +0100
Wrote function to merge two read files together
commit a3ad2162355c0b7c320764b017a16e5908b3c1d1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 19 15:56:36 2010 +0100
Added flag to merge reverse indices
commit 4867217b72ddcbfcb28db7d5480b265896b023bd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 19 15:50:56 2010 +0100
Implemented merging of indices from two different read files.
commit c865a29e460b611fdf933029333cedcf0c840ad7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 19:58:23 2010 +0100
Removed test code from read sampler that should not have been checked in.
commit c46f95ccb12c2d70aa2c681687accfec3670491a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 16:27:00 2010 +0100
Created files for merge subprogram to merge multiple BWTs.
commit 97bf8ca84798b7ae339575b2d9180abee2b67f61
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 15:55:16 2010 +0100
Fixed bug in Vertex where the containment flag was not being set in the constructor.
commit ced69a3a08175d5f4497f9ecf46413096d43b89d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 15:37:11 2010 +0100
More refactoring and the first working version of rmdup
commit 4ad871ca8ba02dc7a08c5c4e0157fcdb55c005a7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 14:17:51 2010 +0100
Refactored the hit computation code into its own file
commit ddcf7ce474e155e1d85b3c43ae61ee9492ac7fd2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 13:54:43 2010 +0100
More refactoring.
commit 057e32b2261c7707e5b329aa5aaeabe60cd18e76
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 18 13:18:42 2010 +0100
Began the implementation of the rmdup subprogram. Refactored OverlapAlgorithm so minOverlap is not a member variable but passed into the relevant algorithm to run.
Refactored computeHits functions and OverlapThread to support searching for duplicates.
commit 899c0dc26afb9e035f4c11d6cbb3ab000379d27d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 17 23:15:44 2010 +0100
Resurrected recursive overlap map construction for debugging long running time in yeast case
commit 20ee4ec736c962a2330b25eef7bd0d666cf47f5e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 17 20:32:55 2010 +0100
Cleaned up some code, added new visitor to (trivially) remove identical reads.
commit 255ef187ce9928117220959afe2017c73666b7f8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 17 12:12:54 2010 +0100
large refactoring, remodelling the graph properly handles generated containment and substring edges.
In a test case between writing the corrected reads and re-assembling them versus remodelling the graph,
the results differ by a few vertices. This difference should be tracked down.
commit 6bcd7c3365b4368d536bb83d6d57134517271faf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 16 21:54:48 2010 +0100
Tweaked setting for sampled reads
commit a2b9f2cc5169f43869a526a18674854b7e9d02e2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 16 17:04:01 2010 +0100
Added isContainment property to Vertex to signal that it needs to be removed from the graph instead of setting a color.
commit f74ba0f2192bc757243786b65dcf514b267f35a3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 16 16:55:22 2010 +0100
More work on substring containments, closer to giving the same results as exhaustive algorithm but not perfect.
commit 335d0f659b8292ea3f1eb067bb65437c7f87205b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 16 00:18:06 2010 +0100
Progress on handling substring vertices.
commit b70377a0b8cf2396d04027c1e63cfe2afb1855ed
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 15 18:27:45 2010 +0100
Write out containment/transitive tags in Bigraph::writeASQG
commit a135a776843a2869c21032167e203ccb69a66bc5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 15 18:22:27 2010 +0100
Added new graph parameters to specify whether the graph has containments and/or transitive edges
commit 14053eef861de30950e738a3c851daf8e7e1cb35
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 15 17:51:16 2010 +0100
Added function to write result of fragment completion algorithm to file
commit e0dbc64a2d117ba50cb95e48db54bb78c7d5a233
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 15 01:07:04 2010 +0100
Wrote core code for resolving the path between the ends of a PE fragment
commit 96d992453f166d7c808d5c8aca1dbd13961551c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 14 16:18:33 2010 +0100
Added a default value for the minimum read length
commit 6ae5cd1e0f3ac6a8ef15446b159b7ec49d3f1542
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 14 15:53:27 2010 +0100
Started work on handling substrings in irreducible algorithm. Unfortunately it seems that we will have to load substrings into the graph and then remove them - they can't be deterministically removed at
overlap time.
commit 22bada92e285de3c1d8b0c10ac62bf9c7032befc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 14 12:44:16 2010 +0100
Re-implemented a cleaner version of the inexact irreducible algorithm in OverlapAlgorithm
commit 494640f0b667fa013760d305928552d55c26a2a7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 13 16:03:58 2010 +0100
Fixed potential memory leak in irreducible algorithm.
Made the remodel parameter a member of SGRemodelVisitor
commit 67a7c1b125bdb84f399774b529d6bbd315031795
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 13 10:38:16 2010 +0100
Fixed bug where the graph error rate parameter was not being set after remodelling.
Removed some dead code.
commit 2a1dc6be9b126c1462246fd5ceab41382f1701ba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 12 16:51:55 2010 +0100
More refactoring, all the overlap discovery algorithms have been moved to CompleteOverlapSet.
commit 64a29251f6a930f418d9969a81239ff9235ba89d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 12 14:13:34 2010 +0100
Fixed a bug in CompleteOverlapSet, it now behaves exactly as if all overlaps within the parameters were found using the FM-index (as desired). Changed the remodel visitor to use it.
commit 321b21417489842dd37aa2b7eef35e1581a7fad6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 12 12:58:44 2010 +0100
Started to refactor the overlap collection logic out of SGAlgorithms into CompleteOverlapSet
commit 720ed311e0a67ca5b6013e418441eab05ba30aca
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 12 10:08:35 2010 +0100
Fixed bug introduced to OverlapAlgorithm a few checkins ago. The seed_length should be clamped at minOverlap.
commit 21da5cbd8365774559de2a874eae99dd7013ddcb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 11 17:06:00 2010 +0100
Rewrote the remodelAfterExcision function to use the newly developed EdgeDescOverlapMap code. It needs refactoring
commit 81081d30e0773eecc787d96b474c8e8ba093164c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 10 11:53:49 2010 +0100
Refactoring.
commit 038165652fd2b728af65fee50b848b6acee676f2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 10 10:52:28 2010 +0100
Merged SGAlgorithms::_discoverOverlaps and SGAlgorithms::addOverlapsToSet
commit ae1584cfcfe6d21437b421cf998ddab35357e706
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 10 09:44:43 2010 +0100
Factored the visitor algorithms out of SGAlgorithms into their own file.
commit f24f1b6184478605154b4783d7b9fee14888bb22
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 18:49:24 2010 +0100
Re-formatted entire source tree to use spaces instead of tabs.
commit d8e9e77b24e56a751581cd6b586af8ef84897c6f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 18:30:27 2010 +0100
Added flag to disk construction to build the reverse index. This completes the algorithm -
some improvements can still be made but it is now functional and quite efficient.
commit 946e802b277df71915802c3e453538585deb43ab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 18:10:50 2010 +0100
Changed constant.
commit 44aec5b350ad55fde596c7b2d218b57c6fa2f659
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 18:09:42 2010 +0100
Factored the Reader/Writer logic out of the SuffixArray class to use it in the disk construction.
commit d243d40fb8760a3736f7dd2e9bdcbc264f25fb7a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 17:10:20 2010 +0100
Added merging of suffix array index to disk construction. Now fully functional.
commit d0c9ff578c86cbf15a85405976176cb6ba3bd96d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 16:04:16 2010 +0100
More cleanup.
commit f3b606acd6a750e5f03bfa3b3a61ea389fc1ad03
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 15:53:51 2010 +0100
Cleanup of BWTDiskConstruction code
commit 8718a706fcd49cd11a26691be16395cc8a638e4b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 15:33:00 2010 +0100
Changed the ordering of equal strings from ID comparison to index comparison. This makes it far simpler to merge BWTs on disk.
commit 71a3f805d7c27b37b3df14ec31bcbf981d671e2c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 00:16:54 2010 +0100
Minor cleanup.
commit 5dc7ba3e14264b82d38592f4d67a1c901893092d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun May 9 00:06:30 2010 +0100
Changed constant in disk algo
commit 1911c6e5133fb9ef8049256e75f8b1d425eda45b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 8 23:43:49 2010 +0100
BWT merging now working, still need to merge the sai and track the relative ordering of read ids.
commit c10d46c2e1727c893be5c9d4736610418ace42f3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 8 18:17:16 2010 +0100
Implemented merging of a bwt in memory with a bwt on disk.
Not complete yet as the Occurrence array and suffix array index must be
computed as well.
commit ebb3ae4701fc04970504d35f6f4548b0686149c9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 8 17:04:19 2010 +0100
Removed some more dead code from BWT
commit e835b69c8cbf5807d72897b505dea287e89397b7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 8 17:02:34 2010 +0100
Refactored BWT class, moved the reader/writer logic into separate classes to allow them to be used by the BWTDisk construction algorithm.
Removed some dead code.
commit c3e8c415725f407e4a6c3814b62e178b97db9726
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 8 00:56:06 2010 +0100
Implemented the control flow for the bwtdisk algorithm
commit 4c2f41f82bcdea1234136d0a94fbe9f0a860816a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 7 23:52:08 2010 +0100
Added BWTDiskConstruction stub and command line arguments to index
commit d402007b6423f006a9b3759acc1bf824463146a2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 7 16:29:57 2010 +0100
Fixed output in overlap align loop
commit 1426c53d71cf9d04bebf766ab48c9233aa1dc217
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 7 16:24:58 2010 +0100
Tweaked preprocess GC filter.
commit 75c0ee548353620fa5e6cfe62f3be4b6fefe018b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 7 16:22:59 2010 +0100
Removed dead code.
Added ability to filter out reads with low/high gc content in SGA/preprocess
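A GC-content filter of the kind described above can be sketched in a few lines of Python. The function names, the inclusive bounds, and the default thresholds are hypothetical; the log does not give the actual option names or cutoffs used by SGA/preprocess.

```python
def gc_fraction(seq):
    """Return the fraction of bases in `seq` that are G or C (0.0 for an empty read)."""
    if not seq:
        return 0.0
    gc_count = sum(1 for base in seq.upper() if base in "GC")
    return gc_count / len(seq)


def passes_gc_filter(seq, min_gc=0.0, max_gc=1.0):
    """Keep a read only if its GC fraction lies within [min_gc, max_gc]."""
    return min_gc <= gc_fraction(seq) <= max_gc
```

With the defaults every read passes, so the filter only takes effect when a caller narrows the window, for example `passes_gc_filter(read, min_gc=0.2, max_gc=0.8)`.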
commit fd3c72faccc40e407e5df3e4ef64f83b676f65d9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 7 15:58:17 2010 +0100
Re-enabled the list version of the inexact overlap algorithm instead of the queue version.
The queue version was far slower for highly repetitive reads. It is not clear why this is the case.
Started to implement handling of Illumina-scaled quality values in preprocess.
Some inlining in the SearchHistory code.
commit 113d6790f0336953a4ead4b02a9e58dc2c6bc3ba
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 7 10:04:45 2010 +0100
Integrated new SearchHistory tracker into the SearchSeeds.
commit dbd6948e1c1128ab7c2d3d09827745000513602f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 6 17:02:28 2010 +0100
Refactored all the search history classes into one file. Added function to get the history from a SearchHistoryLink
commit 6298bd292b9ab8ed156edb6663c5f91a92e168ff
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 6 16:36:57 2010 +0100
Implemented reference-counted search tree
commit c5cc141a87f07ced04234591a05cff6ee25123cf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 6 13:26:23 2010 +0100
Added ability to randomly change Ns to bases in preprocess so that discarding reads can be avoided. It is turned off by default.
commit 9ecacbd47bc440e65730e682e99766e61599f4f8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 6 11:58:34 2010 +0100
Big improvement to inexact overlap, only branch the search seed after its interval is valid to avoid a big unnecessary copy.
commit d3b4d61291840580c70a34b48479dd4b0442f2d4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu May 6 11:40:31 2010 +0100
Rewrite of the findOverlapBlocksInexact algorithm. This is somewhat cleaner and a bit faster than the previous method. More cleanup/improvement is possible.
commit d3939af4687ec0acc372acf0c44836fee67da007
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 5 16:40:26 2010 +0100
Removed missed print statement
commit fc5198ed44c8e290f07db4044e17fcacdc321124
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 5 16:29:11 2010 +0100
Changed the order in which vertices are remodelled in the ContainRemove visitor to visit the neighbors in order of length. This
alleviates the problem of adding transitive edges to the graph that can result if shorter overlaps are processed first.
commit f7cf954fb3a00cfb3dc5ff331ed498017097fabf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 5 16:10:30 2010 +0100
Added oview2fa.pl tool
commit 709bdfc370bf6fed6c851a5e635582677765ea5b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 5 16:09:14 2010 +0100
Added graph structure validation visitor to find cases where the irreducible edges are missing from a vertex or erroneously found.
Fixed how containments are handled by adding two edges per containment, one in each direction.
commit 7e1e5ca4389d10c16d6dbced1f0b31e920bbe233
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed May 5 09:18:11 2010 +0100
Renamed the SGTransRedVisitor to SGTransitiveReductionVisitor
commit aaa38105ac9a00431fdacb37230a51f77784f4c5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 16:43:13 2010 +0100
Do not allow containment edges in Vertex::getEdges(dir)
commit 12936c5d09d76dcc4572a9459d8085ae0934e2ae
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 16:25:58 2010 +0100
Added function to Util to make the floating point comparison between two error rates while allowing for a small tolerance.
Use it in removeVertexForExcision and OverlapAlgorithm functions.
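Comparing two error rates with a small tolerance avoids spurious mismatches caused by floating-point rounding. A minimal Python sketch of such a helper follows. The function name and the tolerance value are assumptions; the log does not name the Util function or state the tolerance it uses.

```python
def is_error_rate_approx_equal(er1, er2, tolerance=1e-6):
    """Treat two error rates as equal when they differ by at most `tolerance`."""
    return abs(er1 - er2) <= tolerance
```

For instance, `0.1 + 0.2` and `0.3` differ in their last binary digit, so `==` reports them as different, whereas this helper reports them as equal.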
commit 54fe047e6984d1003d73571f32e4e56cb45a217c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 16:02:03 2010 +0100
Cleaned up interface to enqueueEdges
commit 7e7b83d0ff10be999f268c97aeefcf73fc89baa7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 15:44:08 2010 +0100
Returned FUZZ parameter in SGTransRedVisitor to default value of 10.
This fixes the exhaustive transitive reduction algorithm that was previously broken. It is
apparent that the TransRed algorithm of Myers can erroneously remove edges if the vertices
are in a cycle as the direction of the edges is not respected. This only occurs when the
FUZZ parameter is high (>50).
Removed many debug prints.
commit 9bd14a810c30adf7c445d786cf7e04f00548fd4c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 15:06:09 2010 +0100
Rewrote overlap/edge inference algorithms to use EdgeDesc instead of Vertices. It is important to track the directionality of the edges as weird palindromic
sequences can have the same overlap from both ends. Tons of debug statements are still in the code and the exhaustive algorithm is broken.
commit e9bd7998280934f241f05064a1a98367d8ef187b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 13:23:10 2010 +0100
Modified EdgeDesc to use a pointer to a vertex instead of a vertex ID
commit 7cae33c6687249399fed99ce1e35912331a7b991
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 4 13:10:52 2010 +0100
Separated EdgeDesc into its own file.
Small updates to the resolveVertex functions. Large refactoring on the way.
commit 91e7e49b30cda025123806fa024563d2b864a577
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 3 18:14:36 2010 +0100
Working version of transitive-aware contain algorithm. This algorithm is much cleaner than the previous version but the implementation must be cleaned up.
At this point it is unclear how contained edges should be handled when walking the graph.
commit 8e9b34a1b281025843a3e4aecc14be65e7d718a3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon May 3 16:40:18 2010 +0100
Preliminary implementation of contained vertex resolving algorithm. This is a debug version and will be changed in a subsequent commit
commit 84a124387c0c7c351eabd10b0065847694f26d44
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat May 1 00:27:30 2010 +0100
Fixed bug in inexact irreducible object. If multiple overlap blocks are the same length, some transitive blocks may not get marked.
commit 92b255fa76670d1a8e4b815a6042d0d3fa56dc88
Merge: ffe3b2e 910e8d5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 30 23:00:49 2010 +0100
Merge branch 'master' of ssh://127.0.0.1:2222/~/work/git_repository/sga
Conflicts:
src/Algorithm/SearchHistory.cpp
commit 910e8d50fc4241c5db92da54f8e3c56451a17276
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 30 16:26:30 2010 +0100
Fixed careless bug in SearchHistory
Added experimental SGAlgorithms::patchRemove function to transfer irreducible edges between vertices
Added simple asqg to dot conversion perl script
commit 847871359fca8bfd5b38bf4b20602a711e996e44
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 29 10:37:08 2010 +0100
Fixed bug in SearchHistory calculation
Added debug statements to OverlapAlgorithm
commit ffe3b2ee30255fdf4dc1fbd2ee3821ad2c0650ce
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Apr 28 17:25:38 2010 +0100
Fixed fencepost error in SearchHistory compare
commit 50814fc8668a9fe5ad6e289331e23a0791f3eebc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Apr 28 15:46:10 2010 +0100
Added function to write out an ASQG from Bigraph
Fixed bug in inexact irreducible edge calculation where some transitive edges would be missed because the inferred error rate between two reads
would be calculated incorrectly.
commit 2f23faa66dbae94f40cc9e49ff4b9d2b7f398e15
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 27 16:34:11 2010 +0100
First pass at inexact irreducible algorithm. Some transitive edges in the test set I am using remain but the majority are culled.
The handling of containments must be improved and is likely the source of the above bug. The cost of keeping the search history
during the backwards search is significant.
commit 32b16216a80e69736695da9ece9cd10dfba15e30
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 25 22:09:35 2010 +0100
Added SearchHistory classes, transferring code to work
commit f12d00cb1c08debf9bda9dda5ce049236d6ad3fd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Apr 24 21:14:40 2010 +0100
Fixed missing include.
commit 3707ecb4f1ec030891055b10e7aa253535502c6f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 23 15:59:51 2010 +0100
Cleaned up OverlapAlgorithm for irreducible overlaps. Preparing to implement full, inexact irreducible algorithm.
commit 20851749d6cb572833d7a9c984aaeb504ae3ec4b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 23 09:53:01 2010 +0100
Fixed output in transitive reduction to accurately report the number of edges and vertices marked.
commit fe88531579a6e40c66db0978b30c1ea55de60adf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 22 16:16:11 2010 +0100
Fixed incorrect timing of collapsing seeds
commit b408e8f6a89b9f073195657923c22dee277d59b3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 22 15:43:08 2010 +0100
Added options to SGA/overlap to explicitly set the seed length and stride. These allow for more aggressive seeding (and lower computational time) but break the guarantee that all overlaps within epsilon are found. They are not used by default and fairly experimental.
commit c066cbb9fb3f553602f5f078dc9121e72e08949c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 22 14:48:28 2010 +0100
Added --edge-stats command to assemble which outputs the distribution of overlap lengths and number of differences
commit 0c4d1a91f78a34844d6c6bdaf85c3b2e1cbfc9d7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 22 11:41:55 2010 +0100
Replaced OverlapAlgorithm with the new, faster seeded algorithm that was developed in OverlapAlgorithmNew.
Refactored the OverlapAlgorithm code to be cleaner and more concise. Replaced the lists with vectors which gave a nice speed boost.
Moved OverlapSeed to the algorithms directory and renamed it SearchSeed. Moved OverlapBlock as well.
commit 145f4a10677fa0228d70dbf861ebd31ebe6d02be
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Apr 21 17:06:37 2010 +0100
Added new OverlapAlgorithmNew as a re-worked OverlapAlgorithm. This is temporary and will be merged soon
commit ffa792497ae78b172d28ef7be62e3ead92e0e5b5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 20 14:26:16 2010 +0100
Use createWriter/createReader in BWT
commit 81e57964c3080f6eea52152652e34e5782ceeeeb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 20 14:02:44 2010 +0100
Created createWriter wrapper in Util to open a gzip or plaintext file writer. Used it in SGA/overlap
commit 925b1f7eb73f16ae051f3c43856da707317165c9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 20 13:49:44 2010 +0100
Created wrapper for opening a gzip or non-gzip file.
commit 5a60d5cd931fdbdb5b67af1224ea61494d6cdcd0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Apr 20 10:41:01 2010 +0100
Fixed oview to use ASQG input. Removed unused functions.
Added missing delete call to SGUtil::parseASQG
commit a7d4766fda03bdcb0d200033e880006bb05ada65
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 19 14:26:43 2010 +0100
Fixed parsing bugs, string graph with substring verts now loads and builds cleanly.
commit 2b07b58c273406f14f1eeb7ac5aaf3334627ee11
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 19 13:23:44 2010 +0100
Wrote ASQG parser in SGUtil. Now used to read in the graph.
commit 2896beb1b568867aaea1c965ef4000c9ea0933e6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 16 16:53:17 2010 +0100
Removed old unused TagValue code in SQG
commit 1b5a6a7c7a88a4b63fb995029aeda62d801de57d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 16 16:51:25 2010 +0100
Refactored the way ASQG records are output.
commit f4bed35e2619917859b31277f871071d10d850aa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 16 14:10:17 2010 +0100
Integrated gzstream wrapper for zlib. Used it in the overlap step for the final ASQG output and the temporary hits files.
For the index step, suppressed the output of the suffix arrays.
Refactored functions in Util.
commit 4c95cac06527d3b9ecd12a01099f1d3f17491ae8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 16 11:13:05 2010 +0100
Modified the inexact overlap detection algorithm to remove redundant seeds
commit 953316a98a5fb312b42f8f2171c203ec7d6a8e3d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 16 10:18:20 2010 +0100
Fixed bug in OverlapAlgorithm introduced during previous refactoring. Overlaps were being output for non-terminal right overlapblocks.
Implemented writing the vertices out to the ASQG file during the overlap computation step for both the serial and parallel mode.
commit b44cf9ea119d22727660dde9e5d38d08f7221cea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 15 11:27:13 2010 +0100
Added a pe-aware mode to preprocess. PE reads will now be discarded/kept together.
Changed name of checkFileHandle in Util to assertFileOpen to reflect the fact that the program exits on failure.
commit 773fd0cce034b321e7dc488f34fde27a9c02c515
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 15 09:35:42 2010 +0100
Fixed bug in preprocess where sub-sampling was not working properly.
Moved OverlapAlgorithm to Algorithms directory
commit 66bdef0b20ae4c554cedd3823cf81f7347f2499d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 11 01:04:28 2010 +0100
Wrote a file format to hold the assembly graph. It is implemented in the SQG/ subdirectory. Modelled after the SAM format.
Added support for SQG format to SGA/overlap
commit f4b74e45df9e0d2807e2da82040fe07b411524ec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Apr 10 17:20:16 2010 +0100
Added SQG format stub directory/files
commit 03127e06f688f71e7c6d18ab5d6dbc0bdb7c29f7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 8 16:26:13 2010 +0100
Fixed bug in Match::infer when sequences are not the same length. The coordinates must be translated before setting the .seqlen property of the SeqCoord or else the isValid() assert will fail because the start/end may be out of range.
commit b5a415e7e7b40b344ee467715ef595197c149916
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 8 12:51:24 2010 +0100
Now checking error codes from pthreads creation routines
Fixed a bug where an assertion in Match.cpp was asserting on the wrong data
Fixed a bug where SGContainVisitor would keep the wrong vertex if the relationship is a proper containment
commit 63415b54b90b13007373c5c7c6704ca9426475d6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 4 18:55:10 2010 +0100
Tweaked trimming
commit 975670b893de744bc3502f61b65fa320ea47d01d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 4 16:37:00 2010 +0100
Re-wrote Vertex::makeUnique
commit 8e6bd3a59b82a778eebfc7c9c5536cd217dd8ed6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 4 16:16:10 2010 +0100
Added output counter to error correct visitor
commit 40743b0656b893dd0dcb4959b6b3e6d6c454fbaa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 4 00:39:53 2010 +0100
Added error correction mode to assemble.
Added command line options to assemble to output the error corrected reads and to use the experimental remodelling code.
commit 54d4cee718af3a413dee84ed370ea845a2cb7c4f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Apr 1 16:49:44 2010 +0100
Started work on graph remodelling code - added functions to discover the complete set of overlaps for a given vertex.
Expanded error correction so that it works on a transitively reduced string graph.
commit 2e45ae2b997a5a550f5f8e6af0b73db684994d6e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 29 22:28:16 2010 +0100
Cleaned up some includes, made a stub class for the graph remodeling visitor
commit 15701b94d7f85d57a18538009504551cdc1a2032
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 29 13:34:55 2010 +0100
Added missing includes
commit 5d4cd045677339f9839afeef1be7d3e400132d7e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 29 11:47:16 2010 +0100
Added ability to sub-sample reads to preprocess
commit 7bb72465b889cd4e115b82943ad99918a3cf7ba0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 19 17:02:12 2010 +0000
Very good version of error correction
commit 2a2d2cdeafba1c7d63f09fee6450a8952280d8c8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 19 11:25:51 2010 +0000
Refactored error correction code into its own namespace in the new Algorithms directory
commit 9e45b43e4bbec0680920bf8bf61fa84afc256ba0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 19 09:57:54 2010 +0000
More changes to the experimental error correction code. Current method can resolve repeats quite well. Must be refactored
commit 9a05727c222fea66b2fac7c526b51f21f8e9ba8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 16 16:55:48 2010 +0000
Removed print statement from preprocess
commit ab63ad02aa19f389813dacd310e8259fdaadc444
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 16 16:51:04 2010 +0000
Created SGA/preprocess program which processes read files to remove low-quality subsequences and reads with ambiguous bases.
Created Util/Quality to implement quality score conversion logic
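A minimal sketch of the kind of logic this commit describes, assuming standard Phred scoring (Q = -10 log10 p) and an ASCII offset of 33. The function names are hypothetical and are not taken from Util/Quality or SGA/preprocess.

```cpp
#include <cmath>
#include <string>

// Convert an ASCII-encoded quality character to a Phred score.
// The offset of 33 is an assumption (Sanger-style encoding); older
// Illumina files use 64.
inline int charToPhred(char c, int offset = 33)
{
    return static_cast<int>(c) - offset;
}

// Convert a Phred score to the probability that the base call is wrong.
inline double phredToErrorProb(int q)
{
    return std::pow(10.0, -q / 10.0);
}

// True if the read contains any base other than A, C, G or T.
inline bool hasAmbiguousBase(const std::string& seq)
{
    return seq.find_first_not_of("ACGT") != std::string::npos;
}
```

A read such as "ACGNT" would be flagged by hasAmbiguousBase, while the quality helpers give the per-base error probability a trimming step could threshold on.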
commit 3d818c712c2a05b8b6c680665aeb4bd9c100b95c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 15 20:59:27 2010 +0000
SeqTrie: removed inefficient insertAtDepth function
Vertex: instead of building one ambiguous trie for the entire edge-overlap set, build one for each direction.
These modifications remove the explosive memory expansion.
commit 5161fff0d109e41c00fb938fbb6655257e81fceb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Mar 14 21:08:10 2010 +0000
saca.h/saca.cpp: Changed bucket data from int to int64_t to prevent wrap around for very large suffix arrays.
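To illustrate why 64-bit bucket data matters, here is a small sketch (not the saca.cpp code) that computes bucket start offsets as a running sum of symbol counts. The function name and layout are assumptions; the point is only that the running sum can exceed the range of a 32-bit int.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Given the number of suffixes starting with each symbol, return the
// offset at which each symbol's bucket begins in the suffix array.
// int64_t keeps the running sum from wrapping past 2^31 - 1.
std::vector<int64_t> bucketStarts(const std::vector<int64_t>& counts)
{
    std::vector<int64_t> starts(counts.size(), 0);
    int64_t sum = 0;
    for(size_t i = 0; i < counts.size(); ++i)
    {
        starts[i] = sum;
        sum += counts[i];
    }
    return starts;
}
```

With two buckets of 2,000,000,000 suffixes each, the third bucket starts at 4,000,000,000, which does not fit in a 32-bit signed int.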
commit d11b9b0a17fcb2ecee989dc179fdd2740d3fe535
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Mar 12 17:02:55 2010 +0000
More experimental conflict resolution code
commit 888f0c9aa1e9ffed007afa4142810881b36f7a10
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Mar 11 22:56:56 2010 +0000
Checking in exploratory code to work on it from work tomorrow.
commit c5b14fa14ebc003ad4961409d50064e1f5b4149b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 10 17:00:43 2010 +0000
More work on SeqDAVG
commit 1b68641747cbe277a368b5a1a10e0efbfa620f81
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 10 10:32:42 2010 +0000
SeqDAVG insert at depth working.
commit 09cf84b0ca0fed5e39586e5d1fb20c8d8281f8fd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 9 18:01:53 2010 +0000
Implemented basic functionality of SeqDAVG
commit 0e7405e4ad131fda503b02ccf73fce7621da4c13
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 8 19:39:35 2010 +0000
Adding incomplete and non-functional SeqDAVG class to switch to sanger
commit 29f11918bd220fd8c20fe0213ee056a871fa34ff
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 8 16:19:56 2010 +0000
Added samQC.py which parses a SAM/BAM file to output some error rate metrics. Mostly used to learn python
commit 0b7ec714108f9c227b4ba3a6a90dde9e138a86ab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Mar 4 17:08:58 2010 +0000
Transferring code to home, slight modifications to previous
commit 1d2114607cf1af4eea3f5ad9176d4b4dd7b943ec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 3 21:39:52 2010 +0000
Tweaks to previous.
commit 5153467ec851959631cb20e7b17909683644a823
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Mar 3 21:21:57 2010 +0000
Worked on SeqTrie-based error correction. It performs much better than the previous partitioning-based methods, but it is unusable because insertAtDepth() causes a combinatorial increase in memory use.
commit 6e916915f19eaadc7fb7bb3069a0eaf66d47aa5d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 2 23:05:44 2010 +0000
Added functionality to the SeqTrie. Added function to Bigraph::vertex to construct it from overlaps.
commit 5c81fcfa28c2a3ffb3e7a59905c8dff5ba664910
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 2 16:38:03 2010 +0000
Removed debug code
commit a0a5a5b9868d4ab15a6527e6c0094359a142cd80
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Mar 2 16:30:39 2010 +0000
Added SeqTrie class
commit 9d6889efbfb627827e320ebe2c38cdfb26d6d3a2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 1 22:23:51 2010 +0000
Tweaks to previous, wrapping up coding for the night and shifting working copy to sanger
commit a4aafaa931a2c1d6f912984fd70edbb03a779fad
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Mar 1 18:35:20 2010 +0000
Better partitioning function based on splitting the overlap set via discrepant bases
commit 5c19e61baf20b922b31f6c613b22e368bd896d6f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 26 15:59:54 2010 +0000
Tweaks to partitioning code
commit 9306ce7cdda1ec37f7c769822ecc57fd274328e9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 26 15:07:47 2010 +0000
Added more experimental partitioning functions.
commit 00487ff5c6ac27a6bf8da69d1bbef6c1b5844ea7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 24 20:43:02 2010 +0000
Added field to output in debug visitor
commit e8b2e2a82f15abaf140ca553ca30ec3418d99063
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 24 20:35:05 2010 +0000
Fixed bug in partitionLI and tweaked params
commit b2ffd13c3e2114312cbdb5a5337e8725dcdb0c2f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 24 19:56:13 2010 +0000
Added actual read correction and visitor to remove edges that have an error rate above a threshold
commit 114fbe27f7372b4490c9c118d2ef81e2c336d100
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 24 17:58:25 2010 +0000
Implemented a new partitioning method based on improving a global likelihood
commit 386c01761d19c26ba3ac30715b32cc1710c94007
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 24 17:08:46 2010 +0000
Added error correction code that uses the partitions calculated in MultiOverlap
commit 16acd1ef3780e0b3234b058fea6a202f64441e35
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 24 14:58:15 2010 +0000
Added two experimental likelihood maximization functions to MultiOverlap
commit 454d1849ac3594108edb05f46bea192b51b7011e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 23 16:59:35 2010 +0000
Re-enabled the realign visitor as the default operation to perform in debug mode.
commit 4c3aebf09722e4063ec4e52fa4a9c47157f54493
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 23 16:52:21 2010 +0000
Started refactoring the alphabet data structures into their own class
Started to implement wrong-edge removal algorithm by implementing likelihood calculations
commit 47262b790cd5a1c786a69dceb4017b39ea5162bb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 22 15:57:40 2010 +0000
Added function to write out the overlaps present in the graph.
commit 649f9540fbfae6cd3e3e2be8811e2578ed3acedc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 22 12:54:38 2010 +0000
Fixed bug in OverlapAlgorithms where containments would be output many times
Allowed containment edges to be inferred in SGRealignVisitor
commit 049f7bb860e94cd23dd9c546192619b5ccbea193
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 22 11:08:32 2010 +0000
Re-worked seqcoord logic for clarity and to handle containment seqcoords.
commit b7b21a265742dea9ae1990efe36598afcdafe4ea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 22 10:47:11 2010 +0000
In-progress checkin. Added ability to include containment relationships in the graph.
commit 0b42875aabcfc6a72221dac233feac1258360da7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 21 23:04:05 2010 +0000
Fixed bug where vertex colors weren't being reset correctly in SGRealignVisitor::getMissingCandidates
commit bdc95b18d6bd4e8c9faed4c4eafa00798eb3c487
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 21 18:56:00 2010 +0000
Removed some debug prints
commit 9b152351931b73b37ebc27b5d0037fd59390bed8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 21 18:54:13 2010 +0000
Fixed bug in missing edge inference where duplicate edges would be generated
commit 0b5c44f5452cb30230e7345e0db57a6f5c112f94
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 21 17:28:05 2010 +0000
In-progress checkin. More robust algorithm for computing missing edges but not perfect yet.
commit c7ad5cbc89d6d0b0a3290f1b840ab2b84c883898
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 21 13:48:46 2010 +0000
More experimental code for detecting missing edges in the graph. Needs cleaning up.
commit c50d61e5194720e70bd9f20df182b8b7897ff7ec
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 20 23:20:29 2010 +0000
First go at performing transitive closure. Very heuristic and a bit hacky in places. Notably, generated containments aren't handled well.
commit e4fbabcd83f39a7d7822283d99579ed831a8d62f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 20 15:20:25 2010 +0000
Added new quality/probability calculations for overlaps
commit 8514bb6656f261f429bd6a53c3d612e0ff4f7c1e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 19 17:05:14 2010 +0000
First go at base probability calculations.
commit af261192e05cb34856f1c09e4819a8018ecd9f4c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 19 15:43:05 2010 +0000
Refactored some logic from MultiOverlap to Pileup
commit b96dbdb4e570c72becbb6b78beb93db366c7cbad
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 19 13:34:53 2010 +0000
Refactored out the Alphabet stuff from the SuffixTools dir into its own file in Alphabet.
Started to implement inference algorithms in MultiOverlap
commit 1ce43db859c4260bff75609d107973df2726b139
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 18 17:04:27 2010 +0000
Added code to infer matches between different transitive groups
commit 74c37e34faa5a2a59e588bdc920e403d07d580ff
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 18 15:52:49 2010 +0000
Added TransitiveGroup/TransitiveGroupCollection classes and a method to Vertex for constructing these.
Implemented debug logic to use above.
commit fa5e126c08f33668b31a931dbbc9662caf4b8d10
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 17 21:02:44 2010 +0000
Added debug functions for detecting when edges are missed due to base calling errors and inexact overlaps.
commit dd6502318752b3e181f8e3c6255277b35bf25626
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 17 16:52:54 2010 +0000
Added pileup functionality to multi-overlap.
commit da72d9a1b9f9291becccc2eb80fd64914fa79613
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 17 16:08:44 2010 +0000
Refactored the multi-alignment printing code from the oview program into a class (MultiOverlap)
commit 55523bcf1dc2f61f890d8634817fb67ce675730b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 16 16:20:47 2010 +0000
Heavy refactoring. Moved the inexact overlap code to OverlapAlgorithms. Broke the huge, ugly _alignBlock function into more manageable chunks. Still needs some cleanup.
commit e64169b977a893f708beb6e7fdb20aa0b9fb7c3c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 15 17:41:28 2010 +0000
Moved functions from BWTAlgorithms to OverlapAlgorithm
commit a80e90c9953289f4fceafa17d8ac2c2f3704d214
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 15 17:15:34 2010 +0000
Removed some debug code.
commit 1dfcdae7d0e04ee67574496f3d2e633d6480d4e9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 15 17:07:47 2010 +0000
Refactored some visit algorithms into SGDebugAlgorithms
commit e1640eb94d3fafc012e7df67d4718fe7bd88e5c3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 15 16:16:37 2010 +0000
Refactored functions out of SGAlgorithms into SGPairedAlgorithms
commit 4570bae2c6125bdf894a383d4cd4f1aeb7c931a0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 13 23:48:52 2010 +0000
-Removed redundant calls to get the occurrence counts from the FM-index in BWTAlgorithm::updateBothR and updateBothL. Big improvement in speed.
commit ec04fd537a1becc9e9fa4f74df1214499495b84a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 13 23:33:14 2010 +0000
-Manually unrolled a very frequently used loop in AlphaCount
-Added experimental SSE code to STCommon::AlphaCount. Not a huge improvement and is currently not compiled in.
-Fixed naming typo in BWT class
-Removed experimental pairing code from assemble
commit a26409ebb22bc7f0f51e0f374a3050e0497e9d2d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 11 23:24:52 2010 +0000
More experimental PE code, transferring to work, will be reverted later.
commit 8b366e831e722b2f933fda6e20f28a030967c3d4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 11 21:45:20 2010 +0000
Enabled pairedoverlap visit
commit 5e2a21adbfad2b961ddddc0347386c79cfcc8c1d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 11 21:28:08 2010 +0000
Very experimental code for paired end resolution. Not at all a stable version - transferring to sanger to run on the farm to generate numbers.
commit 9ed7aa7ccbe99ed9fc0dd7455a35d27286b98754
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 11 18:59:20 2010 +0000
Experimental paired end resolution code.
commit 1c60d92f0b8658d8a6620b7d944273eb1c51240f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 11 13:24:03 2010 +0000
Refactored the batch model parallelization algorithm. Currently gives better performance
than the (previous) scheduling algorithm. Performance decreases after around 4 threads; I suspect it's bus
contention and I will investigate with the intel vtune analyzer.
commit be2929e282a99302d1283ddac6290af71eed05f9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 11 10:54:09 2010 +0000
Committing experimental batch-scheduling algorithm for overlap threads.
commit 745e44aecf9ef2226b2eefda00739598f2ddd4f9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 22:52:24 2010 +0000
Committing experimental multi-input buffer threading code to transfer to work
commit 65b987631fff540a58b878c81b4b43af75c27ed1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 20:28:44 2010 +0000
Minor cleanup.
commit fabb7411505d750985e449a2f5d8776277d0c6dd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 20:17:07 2010 +0000
Completed threading work. Removed LockedQueue class as it is not used in the threading module.
commit 2bc11dc07f1a161e3d662946dd53ea55128e4a01
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 19:42:43 2010 +0000
Removed one of the semaphores from OverlapThread as it was redundant
commit 128e8cf2672f385f97b823cbbce4477b0542883e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 18:40:27 2010 +0000
Stable version of threading overlap module. Output is not properly processed yet but the thread logic is more or less complete.
commit 23a987753951312b160e49052e792066900e5b3e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 18:29:02 2010 +0000
Better but more complicated semaphore usage. Current version uses multiple buffers but it is probably more complex
than necessary. Checking in to save the code for future reference.
commit f949bcfd2d9fcc286f093368c316af3d9206ab88
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 14:23:10 2010 +0000
Better threaded code, still in progress.
Changed timer so that it reports wall clock time.
commit 0c9109aae0511e8388eab279fcf572630becbcc5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 10 09:55:55 2010 +0000
Implementation of threading that isn't clean but works. Will be cleaned up.
commit d51766ada7343a0d101281b348cd6298a532ee07
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 9 21:14:22 2010 +0000
First pass at threaded overlapper. Lock contention currently kills performance.
commit 821da1a62118ca2136c059afa2346a273b8d4916
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 9 19:50:03 2010 +0000
Modified GPL boilerplate and added COPYING file to source tree.
commit 44a63628b092166ba4aa7690f93b2a48b98db247
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 9 17:04:22 2010 +0000
Moved defaults from parseArgs to declarations.
commit 15010f76693a3669340eaec1fb1f0a8ba18de729
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 9 16:51:55 2010 +0000
Output formatting changes
commit c9ae79d365a794f3d4b290c6934196b82547335d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Feb 9 16:45:03 2010 +0000
Refactored overlap module in preparation for adding threads.
commit 63887a9a22d52c839cc0664c70686ea04838e974
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 8 21:53:30 2010 +0000
Added warning as a note to self
commit 9ab69dc4ea589a805da197d96808b109bc72f0e5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 8 21:31:23 2010 +0000
Implemented some of the threading code, fixed configure/makefiles
commit f55ab88e57374aaf454ea9e98c38a6714c17a5ac
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 7 22:18:15 2010 +0000
Added stub OverlapThread class. Currently not compiled in.
commit ac8744505e4c64c92312d0c3ff096ad882eda2c9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 7 21:20:01 2010 +0000
Added destructor to LockedQueue to destroy the mutex.
commit b718deef8c6121aa6de9338b8427b9d1081900de
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 7 21:18:52 2010 +0000
Created LockedQueue class
commit bb695ec963a8062dd9aaad4e98dcb791d5255370
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 7 20:14:13 2010 +0000
Formatting changes.
commit 790c5079be6bd12028844bacc4af6d37d187bfb8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Feb 7 19:50:54 2010 +0000
Refactored OverlapBlock to remove the need for a separate OverlapBlockRecord class.
commit 64edc8e51a228dd8e507fff3c4e30bc0ba118bc9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Feb 6 22:18:26 2010 +0000
Implemented the removal of sub-maximal overlap blocks (as produced in rare cases by BWTAlgorithms::findOverlapBlocks). The complete list of overlaps is sorted to find overlapping blocks which are split apart by OverlapBlocks::resolveOverlap. This replaces the IntervalSet idea that was never implemented. The algorithm could be slightly improved but the triggering case is so rare it isn't worth the extra complexity.
commit 4f157afe9dde313c6b05742fd034ce0afce8f521
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 5 16:58:49 2010 +0000
Adding interval set stub code.
commit f1ca29a97711f30d22e83072f368d9b4e6d64718
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 5 12:00:54 2010 +0000
Cleanup.
commit 6d7a29fb49c8673e82abdffea086ab5f9c34a8b9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 5 10:39:04 2010 +0000
Cleaned up overlap computation, removed dependency on hits
commit 7d12bdcf17b6b1dc0b4168ee19f477fc5e691ef9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 5 09:55:39 2010 +0000
Added missing files.
commit 86946f451d74b7079220c5fc6a1c1e55edee31c2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 5 09:54:57 2010 +0000
Fixed spelling error in Occurrence class
commit c2f85b0fbe48713830fdf425ce0e8374679187f4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Feb 5 09:38:07 2010 +0000
Changed overlap hit output mode to ascii for consistency with other programs.
commit febb3c2288b17316dab697b3715faa1fc3447ab3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Feb 4 17:04:04 2010 +0000
Large in-progress refactor of overlap stage. Now instead of outputting hits to each element of the suffix array the initial
hits are output as suffix array blocks.
Refactored classes out of BWTAlgorithms into BWTInterval.h and OverlapData.h
commit 35b47ceaca99b284ebd29d3f43333f67577d1fe2
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 3 19:24:52 2010 +0000
Moved EdgeDir/EdgeComp definitions to GraphCommon from Util
commit 457b7246b9159547de370c80dad49187fc1afccb
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 3 19:13:08 2010 +0000
Factored Interval/SeqCoord classes out of the Util files
commit c33a2655b2cb5b5c4323939111d3795f9ffd3b7f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 3 16:05:01 2010 +0000
Edge.h: Removed boost memory pool code, passed delete calls to memory pool (which do nothing by design)
SimpleAllocator/SimplePool: added void* parameter to dealloc
commit d5c75464badd3f1a38b032f1aa47fad84f4abf64
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 3 15:58:52 2010 +0000
Added SimplePool and SimpleAllocator, an implementation of a zero-overhead memory pool for objects that do not need to be freed.
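The idea of a pool for objects that never need to be freed individually can be sketched as a bump allocator. This is an illustrative sketch only: the class name BumpPool, its interface, and the block size are assumptions and do not reproduce SimplePool or SimpleAllocator.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Hands out fixed-size slots from large blocks. There is no per-object
// header and no free list; memory is released only when the pool dies.
// Destructors of pooled objects are never run, so this suits only
// trivially destructible types without extended alignment requirements.
template<typename T>
class BumpPool
{
    public:
        explicit BumpPool(size_t slotsPerBlock = 1024)
            : m_capacity(slotsPerBlock), m_used(slotsPerBlock) {}

        ~BumpPool()
        {
            for(size_t i = 0; i < m_blocks.size(); ++i)
                ::operator delete(m_blocks[i]);
        }

        // Return raw storage for one T; construct with placement new.
        void* alloc()
        {
            if(m_used == m_capacity)
            {
                m_blocks.push_back(::operator new(m_capacity * sizeof(T)));
                m_used = 0;
            }
            char* base = static_cast<char*>(m_blocks.back());
            return base + (m_used++) * sizeof(T);
        }

        // Intentionally a no-op: individual objects are never freed.
        void dealloc(void*) {}

        size_t numBlocks() const { return m_blocks.size(); }

    private:
        BumpPool(const BumpPool&);            // non-copyable
        BumpPool& operator=(const BumpPool&); // non-copyable

        std::vector<void*> m_blocks;
        size_t m_capacity; // slots per block
        size_t m_used;     // slots consumed in the newest block
};
```

An object would be created with placement new, for example `new (pool.alloc()) int(1)`. The trade-off is that memory use only ever grows until the pool itself is destroyed.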
commit 73f4844662551dc3777c76112d84758939701b03
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Feb 3 14:01:45 2010 +0000
Updated Vertex::getMemSize to include the size of the string
commit 111326d78d1909146893ae03aba3cda33bfe04a5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 1 16:49:42 2010 +0000
Added BitChar class containing a simple bitset of 8 bits
Used BitChar to pack the direction and comp properties of edges into a single byte, yielding an additional 8 bytes of memory saving
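A rough sketch of how two two-valued edge properties could share one byte through an 8-bit bitset. This is not the SGA BitChar class: the interface, the PackedEdgeFlags wrapper, the bit assignments, and all enum values other than the ED_SENSE name are assumptions made for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical 8-bit bitset; a sketch, not the actual SGA BitChar class.
class BitChar
{
    public:
        BitChar() : m_data(0) {}

        // Set bit idx (0..7) to the given value
        void set(uint8_t idx, bool v)
        {
            assert(idx < 8);
            const uint8_t mask = static_cast<uint8_t>(1u << idx);
            if(v)
                m_data = static_cast<uint8_t>(m_data | mask);
            else
                m_data = static_cast<uint8_t>(m_data & ~mask);
        }

        // Return the value of bit idx (0..7)
        bool test(uint8_t idx) const
        {
            assert(idx < 8);
            return ((m_data >> idx) & 1u) != 0;
        }

    private:
        uint8_t m_data;
};

// Assumed enum values; only the ED_SENSE name appears in the log.
enum EdgeDir { ED_SENSE = 0, ED_ANTISENSE = 1 };
enum EdgeComp { EC_SAME = 0, EC_REVERSE = 1 };

// Both properties share one byte: bit 0 = direction, bit 1 = comp.
class PackedEdgeFlags
{
    public:
        void setDir(EdgeDir d) { m_bits.set(0, d == ED_ANTISENSE); }
        void setComp(EdgeComp c) { m_bits.set(1, c == EC_REVERSE); }
        EdgeDir getDir() const { return m_bits.test(0) ? ED_ANTISENSE : ED_SENSE; }
        EdgeComp getComp() const { return m_bits.test(1) ? EC_REVERSE : EC_SAME; }

    private:
        BitChar m_bits;
};
```

Storing each property as a separate enum member typically costs several bytes per field plus padding, which is presumably where a saving of this kind comes from.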
commit 9c6befae826c145896b7516255af4f2e0c03e866
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 1 14:03:31 2010 +0000
Removed m_pStart member from Edge as it can be found from the m_pTwin member. This saves 1 pointer.
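The relationship this commit relies on can be sketched as follows: if every edge stores its end vertex and a pointer to its twin (the same edge seen from the other side), the start vertex is simply the twin's end. Only m_pStart and m_pTwin are named in the log; the Vertex stand-in, m_pEnd, and the accessor names are assumptions.

```cpp
#include <cassert>
#include <cstddef>

// Stand-in vertex type for the sketch.
struct Vertex { int id; };

class Edge
{
    public:
        explicit Edge(Vertex* pEnd) : m_pEnd(pEnd), m_pTwin(NULL) {}

        void setTwin(Edge* pTwin) { m_pTwin = pTwin; }
        Vertex* getEnd() const { return m_pEnd; }

        // No stored start pointer: the twin edge ends where this one starts.
        Vertex* getStart() const
        {
            assert(m_pTwin != NULL);
            return m_pTwin->getEnd();
        }

    private:
        Vertex* m_pEnd; // vertex this edge points to (assumed member name)
        Edge* m_pTwin;  // the reverse edge, which ends at our start vertex
};
```

The cost is one extra pointer dereference per getStart() call in exchange for one fewer pointer per edge.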
commit 62c46dac1e2cd1932543b1dbf2879fa89963b7fe
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Feb 1 13:48:10 2010 +0000
Merged StringVertex/StringEdge into Vertex/Edge to simplify code and avoid (unused) inheritance overhead.
commit 8c9a4a720d56ca8e2821109f3bb50fed0f67ae47
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 29 12:58:15 2010 +0000
Refactored StringEdge/Vertex into its own file.
commit d95ede8e388e9a9c54d27246958f05e9f376d278
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 15 13:22:05 2010 +0000
Added functions to ensure that all edges for a vertex in a given direction are unique
commit 2ac0788759d5a2f5088372a9a89537b313eb59ef
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 15 09:48:52 2010 +0000
Refactored Vertex class to keep edges in a vector instead of a list. Many edges must be removed from the vector but the erase() calls are not a bottleneck.
commit bd6030a8b10d2872f95ed80cbc2875f589ee1f64
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jan 15 09:38:46 2010 +0000
Enabled -o and -m flags in sga assemble
Added code to StringEdge to use boost memory pool allocator which is currently disabled.
commit 8a0ecf600a441fd1ecb8fb735a76dbda4f8ff66d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jan 14 11:24:03 2010 +0000
Added -o and -m flags to specify the output file and the minimum overlap size to accept, respectively.
commit 7ad3239c039f760bf3db16841ce1c736db0425c5
Merge: f996766 fe2a4d6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 9 10:50:44 2010 +0000
Merge branch 'master' of /nfs/team71/phd/js18/work/git_repository/sga
commit f996766caa0ae7ce59c5c03617c2ff4a8294e1ea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jan 9 10:50:11 2010 +0000
Added ability to output RC reads to SE sampler
commit fe2a4d6931dbb487a2ecee66b59c64cdcf97d4dc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jan 5 10:06:27 2010 +0000
Removed malloc.h include as OSX does not have this header
commit 2ef1a00d9c61c056531a0af1ddad778bd8187033
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 4 16:44:03 2010 +0000
Changed simplify output frequency
commit 9f3401fd0eefb811b97811f61b3df9a590cb27db
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 4 16:40:20 2010 +0000
Minor formatting change in a print
commit c25159db90ae802d6c1d74c61563232e9b6f6807
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jan 4 16:27:50 2010 +0000
Modified simplify to preferentially merge in the ED_SENSE direction so appending to strings is preferred to prepending
Changed link order to fix dependency problem
Fixed casts in StringGraph for intel compiler
commit 4e94e3e92166145a4108fb429c7181a74d2d5f2b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Jan 3 22:41:33 2010 +0000
Minor change to assert
commit e757284b28ff41b1538cbcdf4bc5167b4118692b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Dec 14 21:20:39 2009 +0000
Generalized the irreducible overlap algorithm to handle reverse complement alignments simultaneously with regular alignments. The code is in need of a cleanup/simplification, particularly with how contained reads are handled.
makeSampledReads.pl: Disabled PE mode so I don't need the Normal distribution CPAN module installed everywhere.
commit 4b1df93db1e7eb832b688a6c7beecca8d686ebbd
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Dec 13 21:18:05 2009 +0000
Reordered members in Edge class and changed GraphColor from an enum to uint8_t. The Edge class (and classes deriving from it) are very heavy and push the memory usage way up for big assemblies. It might be worth removing the start pointer from the Edge class to save 8 bytes.
commit 9ebfa566be40e1f204cfa568c783319ebe4e3f90
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Dec 12 01:11:41 2009 +0000
Removed hash test code as it's not smaller or faster than using std::map. The vertex finds are not a bottleneck.
commit c38f96a090e654a59ff56eeac0fa2122a59b3f32
Merge: 5a7e2dd fe4233a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 11 18:51:30 2009 +0000
Merge branch 'master' of ssh://127.0.0.1:2222/~/work/git_repository/sga
commit 5a7e2dd9eaad0c2588e0eb99dc7401c1c77ecd3a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 11 18:51:14 2009 +0000
Added size tracking.
commit fe4233afad70a449673c1659dfa625ec05ea953a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 11 18:29:49 2009 +0000
Committing test code using hash_map for transfer to home.
commit 6a12c97953316adb25d7c24535e5bf31ff2f1447
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Dec 11 13:45:57 2009 +0000
Changed command line parameter names, rewrote to use strings as buffers instead of arrays. Much less memory required.
commit 58a7f9d5d885e5e967e4da310b9f761b0f178aab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 27 17:03:42 2009 +0000
Committing changes to work from home.
commit 0143a7aa4ba3ff8729ad0f4e89fde4046215e724
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Nov 27 16:41:39 2009 +0000
First pass at bwt to string graph algorithm.
commit d6bf047b5aa6e82820114ebb6414231b17809b2b
Author: Jared Simpson <js18 at guest267.wtgc.org>
Date: Mon Nov 23 11:42:49 2009 +0000
Removed call to basename so the code compiles on OSX, which does not supply the GNU basename function; the POSIX version is unsuitable.
Added checks that output filehandle is open to SuffixArray and BWT.
commit 3f97e0a01e366cab078f91317272bb2f23ee501d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 19:34:24 2009 +0000
Removed comment
commit 608f078e2a25fa62019fd5478d7bb80fce7b1053
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 19:21:41 2009 +0000
Removed "exact" line from sga help text
commit f07afaa4b9014903d08b3631f6c94470208ad381
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 18:41:12 2009 +0000
Added data directory
commit 72a2ffe147d4ab4733a05c1ac18dca2a5ad54819
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 18:38:55 2009 +0000
Added tools directory with useful scripts
commit 18486dc0432d7abae0c0e1c0c52db3858e15f586
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 18:34:34 2009 +0000
Updated .gitignore
Fixed compile issues for gcc 4.4.2
commit 47d5f4fedd8ad0049d8931cf8d2cbf960ee26250
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 17:58:00 2009 +0000
Removed *.in files, updated .gitignore
commit 2858b7fc75e34b84283ba04f0850783194706fe9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 17:55:37 2009 +0000
Added .gitignore
commit 4dea95dfe9046187bed20f037097f6b18a0849b5
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 17:50:12 2009 +0000
Removed autogenerated files
commit 7bf18e5f1d85f8aec5562600b96d1f73c65a5521
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 17:44:30 2009 +0000
Test commit, no change
commit 1651692f3da2c2bb267dc9aa241842e306ef7320
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 13:54:59 2009 +0000
Removed unused filehandle
commit 25a989d2a5cadc4c8c2891774636c04ed6e97630
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 13:42:58 2009 +0000
Removed unused exact executable
commit 7abc72f3741d385ec5817e5b3e3b2236c8e6ff09
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 13:24:28 2009 +0000
Added output code and debug printing. The extraction is (understandably) slow for large l.
commit 9a36796a68eea3964269597074481d8ed45d447f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Nov 22 12:49:57 2009 +0000
Added algorithm to output all sequences of length k from a BWT
commit 07610a74e294956ac0cd4cb62037bd8397ff2cb9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 17 14:25:22 2009 +0000
propedit
commit c0e35391af0a512cfd1ebc89c37e2667921cae59
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Nov 14 22:30:06 2009 +0000
Added code to pair vertices based on read ids
commit cc0d7a3ec89ab041467505431b84bd4753fd9d2f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Nov 14 19:38:12 2009 +0000
Minor formatting changes
commit 28835e0d41cfa7fe652e20a9fc9b984d3d9a7ef3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Nov 14 17:39:42 2009 +0000
Refactored irreducible algorithm into the BWTAlgorithms collection.
commit 9b8b93dac19f0d75c07b9ac26f03994612047a51
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Nov 14 17:30:16 2009 +0000
Full implementation of irreducible overlap extraction algorithm. It now outputs all irreducible overlaps instead of just the unique one. It will skip short substrings that are contained within some other string, but in general substrings in the data set are not handled well. This should be improved.
commit f0d464649ca2aa14f3e69885812c98a25d009e5f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 12 21:59:15 2009 +0000
Added fast method to get the smallest consistent extension for a given sequence
commit 277eda3acc5ecd29393065c5ef60f41e3620a23a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 12 17:06:29 2009 +0000
Working version of exact extension assembly algorithm. Needs cleaning up.
commit ec5d1b080bebd646579695eef9a9c9fcfdb7d569
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Nov 12 15:32:07 2009 +0000
Factored out the extension gathering logic for AssembleExact so the same function can find left and right extensions
Reenabled the suffix tree class
commit 54186224f87ea9b814cbd056c16a7036a81b569b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 11 20:51:54 2009 +0000
Added AssembleExact functions and implemented initial version of exact string graph algorithm
commit 53af9450cef6c6377173be28301af68f0f8b16a1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Nov 11 11:29:54 2009 +0000
Refactored BWTAlgorithms module, made it a proper namespace and changed functions to be more generic
commit b9c960582b3c4de3bd0e844fd5b6bd23d0181561
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 10 21:39:11 2009 +0000
First pass at exact assembly/string graph construction algorithm. slow.
commit 80a405afcc787f3057fc60305374149454f2c159
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Nov 10 14:59:53 2009 +0000
Refactored the interval data out of BWTAlign
Added "exact" subprogram for development of exact-case assembly algorithms
Added Pileup system and started work on statistical inference of proper overlaps
commit 9449fafd3a03f82994ac2d95fd65a7204f9bd80a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 28 17:04:54 2009 +0000
Inlined AlphaCount constructor
Removed unused executables from configure
Fixed filename in SuffixTools/Makefile
commit 29539d4915fd8eed67160d6dcc10518ad09e3400
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 28 12:40:18 2009 +0000
Working implementation of transitive closure algorithm.
commit 9a2462143a2743827974bf28a31de9223cf0f98e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 27 20:31:55 2009 +0000
Progress towards inferring transitive closure edges from consistent overlaps. Edges that reveal containments are causing problems.
commit 69a3106aabae3442f4af9cdea124996c910a35cc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 27 18:14:45 2009 +0000
More changes to coordinate system. Now all the changes of frame happen internally to the Match class greatly simplifying client code.
commit c3c9a04581c05d9f151d8fd662acdeb323e73a2d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 27 15:04:13 2009 +0000
Factored Match into its own file
Unified SeqCoord representation. All seq coordinates are now normal intervals (start <= end); reverse-complement intervals are no longer encoded as start > end. This information is now kept in the Match (which is a more appropriate place for it). All functions/data using SeqCoords have been changed for this new behaviour, which simplifies matters.
commit 03391cd6a75a18aa8826b0b75804f3a69dfd2a5a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 25 21:01:43 2009 +0000
Added vertex removal program which eliminates vertices that have a high error rate.
Revised bubble removal, algorithm is a mess and needs to be fixed
commit c7cc63caa9068057ce0acd1c985cfac9fefacbcf
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 25 19:52:10 2009 +0000
Added much better, block-wise bwt alignment algorithm
Bubbles caused by errors are a big problem in the current build. Better bubble popping is needed. Maybe a pruner that removes sequences with a high error rate.
commit 2162a9f297b0c63d07764165cdcedb72fde3ada1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 23 15:15:06 2009 +0000
Fixed bug in duplicate hit removal, hits to BWT and RevBWT must be considered differently to avoid stomping IDs
commit 33ad176043cccd03bce5d65f6bd3f5e74a37eed0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 23 12:57:05 2009 +0000
Added error correction algorithm to oview, added bubble popping algorithm to StringGraph (in progress)
commit 1257c997e04e9fe5eae6a4c3e6fe7a473716dc33
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 22 16:11:13 2009 +0000
Added trimming algorithm, sweepVertex, remove duplicate hits
commit e1a3e6ef86e2cefca16b112d61fbaa22309b0082
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 22 12:12:38 2009 +0000
Inlining
commit 006d83643d5306a217f36d1e66588900c4a55ec4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 21 19:07:08 2009 +0000
Major refactoring to StringGraph. Overhangs are now stored as intervals instead of actual strings. The bookkeeping is a bit messy and could probably be cleaned up but the checked in version works. It simplified the merging logic somewhat.
commit 70fc60671d1338377ede4d7f4f510e5f4feaf61b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 21 09:34:59 2009 +0000
Refactored the Overlap struct to include a sub-struct which holds the matching coordinates
commit 0ee3b89d3319b2488f2c3d5323b7b4dd04560e63
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 21 08:56:11 2009 +0000
Fixed oview bug
commit 40cf8814540a8fa86587a55aada70e1cae5d3b3f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 20 16:28:24 2009 +0000
Slight reworking of the structure of the alignment algorithm in bwt_algorithms.cpp
commit ae03c4a2e93f17767b6771e5bfbc4200caab1243
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 20 14:02:13 2009 +0000
Portability fixes, added includes and fixed printfs so that it would compile on my home machine (32-bit Ubuntu 9.04)
commit ba404b757e744ceff6971b66babb243a11382463
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 19 13:18:32 2009 +0000
Implemented seeded bwt alignment algorithm for inexact suffix/prefix matching.
commit dcbf0f2541f1ba60c613fbb8af15e991234f1693
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Oct 16 14:22:26 2009 +0000
Implemented inexact matching to BWT, there is currently no limit on the amount of backtracking so it is significantly slower
commit 134b708c32a3a9e22128afc79fb79fc2cf258f2f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Oct 15 12:35:59 2009 +0000
Rewrote oview to draw all the alignments for a particular read at the same time
commit 0c2dfe011ba1bf7c82e4fdef85aa60689ead763b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 14 15:11:45 2009 +0000
Deleted unnecessary HitData.cpp
Fixed bug in BWT align where size_t would wrap around because of negative value and ruin the inner loop
Changed contain file format
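
The size_t wrap-around described above is a common pitfall when a loop counts down with an unsigned index. A minimal sketch of the failure mode and one safe idiom follows; `countMismatchesBackward` is a hypothetical helper for illustration, not code from sga.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helper: count mismatching positions, scanning right to left
// the way a backward search over a BWT walks a query.
std::size_t countMismatchesBackward(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    std::size_t mismatches = 0;

    // Buggy form: for (std::size_t i = a.size() - 1; i >= 0; --i)
    // size_t is unsigned, so "i >= 0" is always true and decrementing 0
    // wraps to a huge value. An empty string wraps before the first test.

    // Safe form: test before decrementing, so i never goes below zero.
    for (std::size_t i = a.size(); i-- > 0; )
    {
        if (a[i] != b[i])
            ++mismatches;
    }
    return mismatches;
}
```

Calling `countMismatchesBackward("ACGT", "ACTT")` visits indices 3, 2, 1, 0 and stops cleanly; two empty strings skip the loop body entirely.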
commit d8c7551066641b1ba3529b77f5bafcb583860039
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 14 14:50:42 2009 +0000
Substantial rewrite of the overlap program
It now computes hits using only the BWT, then it dumps the BWT, loads the read table and suffix array index (not full SA)
and turns the hits into overlaps
commit 9b8e2f7bb7233a363815e224565d971a38704518
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 14 12:04:41 2009 +0000
macroed out calls in BWT for readability reasons
commit 082e83990a2e3a1b91549bfa244441925c96626c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 14 10:58:39 2009 +0000
Implemented sampling of the occurrence array to lower memory
Cleaned up functions to streamline calls when calculating intermediate occurrence values
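
As an illustration of the sampling idea above, the sketch below stores the running base counts only at every `rate`-th BWT position and recovers the values in between by scanning forward from the nearest stored sample. The class name `SampledOcc`, its members, and the four-letter alphabet handling are assumptions made for this example; the actual sga data layout may differ.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical illustration of a sampled occurrence array over a DNA BWT.
class SampledOcc
{
public:
    SampledOcc(const std::string& bwt, std::size_t rate) : m_bwt(bwt), m_rate(rate)
    {
        assert(rate > 0);
        std::array<std::size_t, 4> running = {{0, 0, 0, 0}};
        for (std::size_t i = 0; i < m_bwt.size(); ++i)
        {
            // Keep the counts for bwt[0, i) at every rate-th position only.
            if (i % m_rate == 0)
                m_samples.push_back(running);
            int r = rank(m_bwt[i]);
            if (r >= 0) // skip terminators such as '$'
                ++running[static_cast<std::size_t>(r)];
        }
    }

    // Number of occurrences of base c in bwt[0, i).
    std::size_t count(char c, std::size_t i) const
    {
        assert(i <= m_bwt.size());
        int r = rank(c);
        assert(r >= 0);
        if (m_samples.empty())
            return 0; // empty BWT
        std::size_t block = i / m_rate;
        if (block >= m_samples.size())
            block = m_samples.size() - 1;
        // Start from the stored sample, then scan at most `rate` symbols.
        std::size_t n = m_samples[block][static_cast<std::size_t>(r)];
        for (std::size_t j = block * m_rate; j < i; ++j)
            if (m_bwt[j] == c)
                ++n;
        return n;
    }

private:
    static int rank(char c)
    {
        switch (c)
        {
            case 'A': return 0;
            case 'C': return 1;
            case 'G': return 2;
            case 'T': return 3;
            default:  return -1;
        }
    }

    std::string m_bwt;
    std::size_t m_rate;
    std::vector<std::array<std::size_t, 4> > m_samples;
};
```

The trade-off is the usual one: a larger `rate` stores fewer samples but makes each `count` call scan more symbols.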
commit b9f4e6ed1566782d5f3a14ecd2b8689dd340fa21
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 13 19:27:22 2009 +0000
Fixed macro.
commit f05690d94d9879fafc142aab04a1fb3d26ac2bd7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 13 16:28:40 2009 +0000
Moved read/write functions into SuffixArray/BWT classes
commit 452edd4ea65d95cde0b6a9942fb3585ea425c36a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 13 16:14:16 2009 +0000
Much refactoring
sga index now builds BWT as well as SA and writes out both
Wrote I/O code for BWT to support above
commit f0cf796f443ea59c5a2b6c022bafac83194af8e3
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 13 13:42:10 2009 +0000
Refactored SA construction code out of index and into class
Inlined more functions
commit c86cd7240cb789751cb4e5610c820816740f5d84
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 12 12:02:34 2009 +0000
Added mkqs which was forgotten
commit 8f316ed7c137065f5eefaa704427c16e9e8f1b94
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 12 12:02:01 2009 +0000
Refactor SuffixCompare class to distinguish comparing by sequence (in a radix sort) and ID
Update histogramSort/mkqs to support taking in two comparator objects
Optimize SuffixCompareRadix to cache the value of calcNumSuffixes
inlined functions as needed to remove hotspots revealed by profiling
modified saca_induced_copying to avoid the initial induceSAs/induceSAl which are not needed when using an external sort. Likewise copy initial LMS strings into the first n1 elements of the SA and immediately call mkqs/histogram sort. This removes unnecessary overhead.
commit dc4c58acd0ce2485389d81c03b4d31db7736d5fa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 11 21:28:12 2009 +0000
Inlined some frequently called functions to avoid function call overhead
commit 95ffbe3bf5d82be5beddb7464c5b956d5798a7ce
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 11 20:14:33 2009 +0000
Restored writing the SA out after indexing
commit 7899f6f0bd338c9bb8375f3a698689492962d239
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 11 20:05:28 2009 +0000
Implemented Nong-Zhang-Chan induced copying suffix array construction algorithm. In tests it only has to sort 30% of the suffixes using MKQS/histogram sort which is a large improvement. The algorithm can probably be modified further. The code is in need of a cleanup as well.
commit 4640bf0143a2a6e7d2db7353942b0b3220885067
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 11 14:19:11 2009 +0000
Ported Bentley/Sedgewick's multikey quicksort
Changed histogram sort to delegate to above instead of std::sort when the bucket is non-degenerate
Changed histogram sort interface to take in an array pointer instead of an iterator
commit 0c156611244fd0173a6620f9e225ad49905f18d1
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Oct 10 16:30:11 2009 +0000
Added DNAString class which is a wrapper for a c-string
It is intended to allow fast accesses to suffixes without having to use a std::string::substr
It will form the basis for transitioning to a 2bit representation
commit 4b7fb936f2386dd4a035ed2a3f6d132349998a03
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 7 16:09:36 2009 +0000
Removed print of contigs before transitive removal/compaction
commit 30bdca79aadec3740fe2a532fd143253140cef54
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Oct 7 08:13:38 2009 +0000
Swapped order of conditions for terminating loop in histogram sort so that valgrind doesn't complain about out of bounds access
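
The kind of fix described above usually comes down to putting the bounds test first in a short-circuit condition. A minimal sketch, using a hypothetical `runEnd` helper rather than the actual histogram sort loop:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: return the index one past the run of equal values
// that begins at `start`.
std::size_t runEnd(const std::vector<int>& v, std::size_t start)
{
    assert(start <= v.size());
    std::size_t i = start;
    // Writing "v[i] == v[start] && i < v.size()" would read v[i] before
    // checking the bound, which is an out-of-bounds read once i == v.size().
    // With the bounds test first, && short-circuits and v[i] is only read
    // for valid i.
    while (i < v.size() && v[i] == v[start])
        ++i;
    return i;
}
```

With the conditions in this order, a run that extends to the end of the vector terminates on the bounds test alone.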
commit dda1720cd206268148eab6f25aec8dd1ee2974d4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 6 20:34:37 2009 +0000
Removed some prints
commit 26da9bd7c6b8cc0d87307f4e0ef1266edc996ce7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 6 19:15:47 2009 +0000
Removed prints
commit b9c1710ce1a1fa37fd8ce25068145b9daa1a4e8f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 6 18:36:32 2009 +0000
Tweaked parameters
commit daf0a5233849cb57b10d13da48411a7d8efa4aef
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 6 17:18:18 2009 +0000
Refactored SuffixCompare into its own files
Implemented progressive histogram sort
commit efbb0732020e544acdb84d9c87a1219dad962d55
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Oct 6 13:48:07 2009 +0000
Implemented histogram sort
commit da7465477561919eade092674a4d236a110d188f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Oct 5 16:06:56 2009 +0000
Implemented bucket sort as an improvement over using std::sort across the whole suffix array. Should modify this to use histogram sort or another variant to drop the memory usage
commit 1105cd3edc181d5eba8d95dd720ea4fbbd25dd46
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 4 19:47:50 2009 +0000
Fixed bug in TR algorithm. Edges must be marked for removal and removed in a single pass afterwards or else some reducible edges may be missed.
Refactored VertexColor enum to be GraphColor and added color member to Edge so edges can be marked.
Added sweepEdges function to bigraph and vertex
Removed some print statements
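
The mark-then-sweep pattern described above can be sketched on a plain directed graph. This is a deliberately simplified illustration: it treats an edge u->w as reducible whenever some u->v and v->w exist, and ignores the overlap lengths and orientations that Myers' algorithm takes into account. The `Graph` typedef and the `addEdge` and `markReducible` helpers are hypothetical names; `GraphColor` and `sweepEdges` follow the names in the commit message, but their bodies here are assumptions.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

enum GraphColor { GC_WHITE, GC_BLACK };

struct Edge
{
    std::size_t to;
    GraphColor color;
};

typedef std::vector<std::vector<Edge> > Graph;

void addEdge(Graph& g, std::size_t from, std::size_t to)
{
    assert(from < g.size() && to < g.size());
    Edge e = { to, GC_WHITE };
    g[from].push_back(e);
}

// Pass 1: colour u->w black whenever u->v and v->w both exist.
// Nothing is deleted here, so every edge can still act as a witness.
void markReducible(Graph& g)
{
    for (std::size_t u = 0; u < g.size(); ++u)
        for (std::size_t i = 0; i < g[u].size(); ++i)
        {
            const std::vector<Edge>& next = g[g[u][i].to];
            for (std::size_t j = 0; j < next.size(); ++j)
                for (std::size_t k = 0; k < g[u].size(); ++k)
                    if (g[u][k].to == next[j].to)
                        g[u][k].color = GC_BLACK;
        }
}

static bool isMarked(const Edge& e) { return e.color == GC_BLACK; }

// Pass 2: remove every marked edge in one sweep.
void sweepEdges(Graph& g)
{
    for (std::size_t u = 0; u < g.size(); ++u)
        g[u].erase(std::remove_if(g[u].begin(), g[u].end(), isMarked), g[u].end());
}
```

On the chain 0->1->2->3 with the extra edges 0->2, 0->3 and 1->3, deleting eagerly can miss 0->3 for some visiting orders: if 1->3 and 0->2 are removed first, both paths that witness 0->3 are gone. Marking first and sweeping afterwards avoids that dependence on order.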
commit a4a59adac2e7666ba70f7874a54738f45df8ec2a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 4 12:19:16 2009 +0000
Fixed bug in transred algorithm - twin edges were not being removed
Fixed bug in oview where the calculation of display coordinates did not work for reverse comp sequences.
commit 61f3e3a89f08d3e230b87e3fa271641c4d817e19
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Oct 4 11:43:01 2009 +0000
Implemented Myers' transitive removal algorithm
commit c9fbffa738bbeac3ae8418894b9347697b84f7a6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Oct 3 17:54:31 2009 +0000
Added edge sorting functions to bigraph and vertex classes.
Added function to string graph classes to sort the edges by length
commit e37abf90b6da88ffa70ee98e62bd0ac1db840aa9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Oct 3 17:27:03 2009 +0000
Switched Vertex implementation to use a list instead of an STL map, for space reasons and to allow easier sorting for the transitive removal algorithm
commit 8063f674ff36a7912ed858003ac92cb02adef946
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 30 15:11:06 2009 +0000
Added verbose guard around prints
commit f2acb9a283f2746a34e37cffc5297f189799364b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 30 14:06:09 2009 +0000
Fixed bug in SuffixArray::extractPrefixSuffixOverlaps which could output multiple hits per read instead of only the optimal hit
commit 77588034710c2479992dd90db336f651a1ac31d4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 30 13:38:09 2009 +0000
Renamed BWT::getHits to BWT::getPrefixHits to more accurately reflect its purpose
commit efd4a7e3aac4033e9bd6b9809141433f22a15efa
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 30 13:36:41 2009 +0000
Fixed bug in oview which was not displaying reverse complement alignments correctly
Fixed bug in bwt match which was matching to non-prefix positions which is not desirable for read overlapping
commit 8c49427fdb39aaccd88f4d0b595d75afbc144b4d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Sep 30 09:46:08 2009 +0000
Added matching string to oview output
commit f76c9bdf0201d7a9eac38c57712e5bac2ab66db4
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Sep 29 13:32:19 2009 +0000
Implemented exact prefix/suffix matching
Adding oview program which outputs overlap alignments
Added optional read indexing in read table
commit fd13d02b00a3555ac03595bf6faf6330c3ba6112
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 28 16:07:30 2009 +0000
Added early exit to SuffixArray::removeReads if the id list to remove is empty
commit 75e26241e9d700c8d7ddcd79be6fb17a55a46e28
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 28 16:04:19 2009 +0000
Minor renaming
commit 1512465b4d556b6e2b099e22acea3ed18a3b8818
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 28 16:02:27 2009 +0000
Renamed SAID to SAElem
commit 95d0a55e87ab98d790e03ccd5cfc6eabc514220e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Sep 28 15:57:36 2009 +0000
Implemented redundant read detection and removal algorithms
commit 279081f280da38347faf02930e75453f4ba1444e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 27 16:20:43 2009 +0000
Moved stringgraph construction functions to SGUtil
commit 4f162887539f14c592a97da6571162744fe5b4a8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 27 11:48:40 2009 +0000
Added checks to overlap detection to ensure overlaps are sane
commit 3e2893eeab52ea1b62948c953259ff4a4db93277
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 27 10:27:53 2009 +0000
Rewrote overlap processing logic so all hits for a given read are processed at the same time
commit 1b9d53b3c5524c8caa5730c348b5c519ff084174
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 25 16:13:01 2009 +0000
Fixed bug where the orientation of edges that were being merged was incorrect
commit e78306ddc11325b0629262e51752e8fb05b933ed
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 25 13:27:56 2009 +0000
Updated assemble parameters
commit 78a11dcd732c72fce4e48bac20938549b14140a7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 25 13:17:50 2009 +0000
Added generation of reverse suffix array to index
Re-worked input parameters, now it is based on prefixes where the prefix defaults to the input filename prefix
Added support for aligning reverse-comp sequences by aligning all 4 cases (read, reverse read, rc read, rev rc read)
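
The four cases listed above (read, reverse read, rc read, rev rc read) can be generated from two primitive string operations. The sketch below uses hypothetical helper names and assumes an uppercase A/C/G/T/N alphabet; it is not taken from the sga source.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical helpers for the four read orientations.
char complementBase(char b)
{
    switch (b)
    {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        case 'N': return 'N';
        default:
            assert(false && "unexpected base");
            return 'N';
    }
}

std::string reverseSeq(const std::string& s)
{
    return std::string(s.rbegin(), s.rend());
}

std::string complementSeq(const std::string& s)
{
    std::string out(s);
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = complementBase(s[i]);
    return out;
}

std::string reverseComplementSeq(const std::string& s)
{
    return reverseSeq(complementSeq(s));
}
```

For a read `r`, the four strings are `r`, `reverseSeq(r)`, `reverseComplementSeq(r)`, and `reverseSeq(reverseComplementSeq(r))`; the last is simply the complement of `r`.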
commit bf4a4d9b458dd743558c828f7dea6257da1d4006
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 24 19:05:22 2009 +0000
Simplified overlap representation
Fixed string graph creation
commit a4da81f914cfba40bbd3d099b7f4a6121ca5895b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 24 13:30:34 2009 +0000
Added validation function to stringvertex and vertex
commit da8351515bc3a538b53065dbcfe35821506aa340
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 24 13:08:16 2009 +0000
Refactored the vertex merge logic to be more intuitive
commit 49829f7af059476d4350f59b5be96dd85bae5187
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 24 12:59:37 2009 +0000
Added StringGraph classes which implement Myers' formulation of a string graph
Modified bigraph to support above
commit dab6a744fa649b243e5fb683049cc44e67faba9f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Aug 23 19:32:27 2009 +0000
Added proper destructor to Vertex, fixed memleak in Bigraph::merge
commit 9c1d9984f2167e8d4265abb594724ad57e8c078e
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Aug 23 19:27:41 2009 +0000
Massive refactor
-> Bigraph is now implemented using inheritance instead of templates
-> Edges are now passed around as pointers
commit ce3394edbe1ac38d96e2327e416fce4229d9e67b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 21 13:09:04 2009 +0000
Implemented initial string graph construction algorithm
commit 28f76dc33abf66c8f1ec47c241796b42acd00aa7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 20 20:50:45 2009 +0000
Added assemble program and laid out skeleton
commit ee2dde607f6cb151119f665af1c671f2d570d257
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 19 16:12:07 2009 +0000
Added getopt to index
commit 5f5554b28319bd42991b07fa5fa9e5858d952d1b
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 19 15:41:32 2009 +0000
Added overlap data structure and HitData class
overlap module now computes forward strand, perfect overlaps and writes to file
started to move programs to use getopt
refactoring
commit 0c8a35a7d77ce045c641817ea84dac9c0b795608
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Aug 19 12:40:41 2009 +0000
Added SeqReader
Refactored BWT
commit 583063655bae8e93d66423b791c8ecd0c857f0ab
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 18 22:08:40 2009 +0000
Added overlapper
commit 43716210ba367fdb3cbdd266ec24d5a3cb453a76
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 18 20:54:07 2009 +0000
Added main program, refactored
commit 7eab48733930a505d47aef6ce29428e8b9e1841d
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 18 20:10:51 2009 +0000
Checking prior to refactor
commit b40ba7ecca0e112e65e30487af1f3e6685fbbad9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 18 08:12:18 2009 +0000
Merge function fixed. Too slow. In-place construction is needed
commit f6974e8e5767ef48e7c57317e533ea2dd1052c71
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 17 20:33:25 2009 +0000
Bug-fixes
commit dec788819aed7e681395dbffb92e2a25cb2bddbc
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Aug 17 16:09:49 2009 +0000
Refactored code out of BWT class into SuffixArray class
Added ReadTable class
Added InverseSuffixArray class
Implemented suffixarray merge function
commit 481adb1beb761aeafee1678b8a49b0770a94297a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Aug 14 14:20:57 2009 +0000
Big refactoring of BWT code
commit 637f8cbbda922ac42bb4f4b60f882d59d50c9bb6
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Aug 13 08:15:06 2009 +0000
Added suffix tree
commit 98d3f3b6831d23fe0ffb6b3df36631443dc7f7a7
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Aug 11 19:41:34 2009 +0000
Initial checkin of development suffixtree and bwt code
commit 8de5560feb2f0ff4db0a4e8c24f7dd5d068358ea
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 30 16:00:53 2009 +0000
In-progress checkin of experimental distance estimation code. Lots of testing/debug hooks in BDE.cpp
commit 0bce16c47e8a1e6ac383e5701157284f8870bab8
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 26 15:34:33 2009 +0000
Refactored scaffold code to use new templated Bigraph class.
commit fcca605d73fe88172d0468809acabde560c8f142
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 26 13:10:43 2009 +0000
More refactoring, renamed the SeqGraph module to Bigraph which is more general
commit e91557049ccb4c1fac20a5bef375aa190e050e52
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 26 09:57:47 2009 +0000
More refactoring.
commit 2b0495ccfc388c515f530759a9b13c6e29f1a6ee
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Jun 26 09:42:19 2009 +0000
Refactored the SeqGraph to be a template.
commit 67a4f20ce2dcca4fef2cfbd9be57b5ed200752d0
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 25 10:02:34 2009 +0000
Added automake files
commit 139f53b24dccdd5eafca3bf6ca6b9638bf3f0392
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Thu Jun 25 09:41:00 2009 +0000
Committing test code stub, unit tests will go here
commit 3a4b379d86734109273a0cf565f9e38aa9ae4246
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Jun 22 10:15:52 2009 +0000
Cleaned up resolve
commit 7a5cadf006191ff68d8ba7e6aa316c32a8335924
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 16 13:43:46 2009 +0000
UniEst: Added graph-based uniqueness inference. It does not work particularly well.
commit bcfcf94aba547f2404587ced40407dc35d611e07
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 16 13:05:13 2009 +0000
- UniEst: reworked command line arguments. The align file is now required but inference over depth can be disabled with the --no_depth flag. The --no_pair flag is removed; it is set automatically when a paired/hist file is passed in.
commit 21f0877c0afc950f5ef5e0756280638aba9ff4c9
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue Jun 16 10:29:02 2009 +0000
- Implemented contig uniqueness estimator by overhanging pairs; the performance is similar to depth estimation for long (>= 100bp) contigs but worse for small contigs
- Many interface changes to UniEst for above
- New Stats directory, refactored functions from Util to StatsCommon
- Added FragmentDistribution class which reads a .hist file
- Added IntDist which is a discrete integer-valued probability distribution (typically a FragmentDistribution will be converted to this)
- Require GSL during configure
commit fbfeb02a71b76195ab7644a1887c48ed3808219a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Wed Jun 10 10:29:09 2009 +0000
Fixed horrible bug in UniEst and added some better command line parsing
Semi-stable scaffolding code
commit 0339581f96fe8f593ce70f44010c6d6ab58fff7c
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Jun 6 22:09:04 2009 +0000
Scaffolding in-progress checkin.
commit 4cdb6f04f6eb385081e22fdedc277927601c7d15
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 22 08:21:02 2009 +0000
In progress check-in of scaffolding code
commit 2f19a9b8c47745f742c21e4e3abf15bf91925242
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Tue May 19 13:54:31 2009 +0000
First version of UniEst is complete
First version of standard contig format is complete
commit de91b8670220021afcb6151de9ab4bc547c04863
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri May 15 21:19:27 2009 +0000
Initial import of UniEst, Util directories
commit 3fac03260c0f27700a6e774cb1b2d2437dd87923
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Mon Apr 13 00:29:39 2009 +0000
added function to load edges into the graph
commit 96f6056421e86b7b8c0b006596484356beed60ee
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sun Apr 12 23:15:43 2009 +0000
Added README
Added derived vertex class for sequences
Added contig loader
commit 8f827e2d7ecee02db2139d26a410ced2e6b2f19f
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Apr 11 23:49:24 2009 +0000
added simplify() function which removes transitive edges from the seqgraph
added validate function to seqgraph to make sure everything is sane
added vertex flipping
commit a99cda99295992bf8b17707bf0119342360b5e79
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Apr 11 19:23:54 2009 +0000
Added vertex merging and removal
Fixed Edge::operator<
commit 54e4707bc7ee11c35e7ead3be01b64b46a9ad72a
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Sat Apr 11 15:25:45 2009 +0000
Renamed IVertex to Vertex
commit 54286aebeea6cbac2183c6744d26e5289bf0f131
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 10 22:40:09 2009 +0000
Added edge labels to dotty output
commit 2a707850eb84b84f433a9506329af625fd568709
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 10 22:36:52 2009 +0000
Importing configuration files
Created initial graph building code
Created graphviz output
commit b79652b78374e18d2fecfaf18167ac5c480edc23
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 10 19:53:40 2009 +0000
Importing stub files
commit 2278f049910ba1eaafc446a38eb575ad67041ebe
Author: Jared Simpson <js18 at sanger.ac.uk>
Date: Fri Apr 10 19:45:52 2009 +0000
adding new project
-----------------------------------------------------------------------
--
Debian packaging for sga