[med-svn] [jellyfish] 03/07: Imported Upstream version 2.2.5
Michael Crusoe
misterc-guest at moszumanska.debian.org
Sat Mar 12 22:17:31 UTC 2016
This is an automated email from the git hooks/post-receive script.
misterc-guest pushed a commit to branch master
in repository jellyfish.
commit 87fdddc6618aadfb3970ab71802f3966a0621c75
Author: Michael R. Crusoe <crusoe at ucdavis.edu>
Date: Sat Mar 12 08:07:43 2016 -0800
Imported Upstream version 2.2.5
---
 .travis.yml                                       |   6 +
 Makefile.am                                       |   6 +-
 README.md                                         |  39 ++++--
 configure.ac                                      |   2 +-
 include/jellyfish/generator_manager.hpp           |  45 ++++---
 include/jellyfish/mer_overlap_sequence_parser.hpp |   8 +-
 include/jellyfish/whole_sequence_parser.hpp       |  11 +-
 lib/generator_manager.cc                          |  22 ++--
 swig/Readme.md                                    |  95 ++++++++-------
 swig/jellyfish.i                                  |   1 +
 swig/perl5/t/test_string_mers.t                   |  30 +++++
 swig/python/test_string_mers.py                   |  39 ++++++
 swig/ruby/test_string_mers.rb                     |  36 ++++++
 swig/string_mers.i                                | 137 ++++++++++++++++++++++
 tests/swig_perl.sh                                |   2 +-
 tests/swig_python.sh                              |   2 +-
 tests/swig_ruby.sh                                |   2 +-
 unit_tests/test_mer_dna.cc                        |   3 +-
 unit_tests/test_stdio_filebuf.cc                  |   3 +-
 19 files changed, 388 insertions(+), 101 deletions(-)
diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..d8fd723
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,6 @@
+language: cpp
+before_install:
+ - mkdir -p ~/bin
+ - curl -L -o ~/bin/yaggo https://github.com/gmarcais/yaggo/releases/download/v1.5.9/yaggo
+ - chmod a+rx ~/bin/yaggo
+script: autoreconf -i && ./configure YAGGO=~/bin/yaggo && make && make check
diff --git a/Makefile.am b/Makefile.am
index a28d848..1a6ae27 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -164,9 +164,9 @@ TESTS += tests/swig_python.sh tests/swig_ruby.sh tests/swig_perl.sh
tests/swig_python.log: tests/generate_sequence.log
tests/swig_ruby.log: tests/generate_sequence.log
tests/swig_perl.log: tests/generate_sequence.log
-EXTRA_DIST += swig/python/test_mer_file.py swig/python/test_hash_counter.py
-EXTRA_DIST += swig/ruby/test_mer_file.rb swig/ruby/test_hash_counter.rb
-EXTRA_DIST += swig/perl5/t/test_mer_file.t swig/perl5/t/test_hash_counter.t
+EXTRA_DIST += swig/python/test_mer_file.py swig/python/test_hash_counter.py swig/python/test_string_mers.py
+EXTRA_DIST += swig/ruby/test_mer_file.rb swig/ruby/test_hash_counter.rb swig/ruby/test_string_mers.rb
+EXTRA_DIST += swig/perl5/t/test_mer_file.t swig/perl5/t/test_hash_counter.t swig/perl5/t/test_string_mers.t
##############
diff --git a/README.md b/README.md
index 8c8af70..6c72475 100644
--- a/README.md
+++ b/README.md
@@ -15,14 +15,20 @@ If you use Jellyfish in your research, please cite:
Installation
------------
-To get an easier to compiled packaged tar ball of the source code, download a release from [home page of Jellyfish at the University of Maryland][1] or from the [github release][3].
+To get an easier to compiled packaged tar ball of the source code, download a release from the [github release][3]. You need make and g++ version 4.4 or higher. To install in your home directory, do:
-To compile from the git tree, you will need autoconf/automake, make, g++ 4.4 or newer and [yaggo](https://github.com/gmarcais/yaggo "Yaggo on github"). Then compile with:
+```Shell
+./configure --prefix=$HOME
+make -j 4
+make install
+```
+
+To compile from the git tree, you will also need autoconf/automake, and [yaggo](https://github.com/gmarcais/yaggo/releases "Yaggo release on github"). Then to compile and install (in `/usr/local` in that example) with:
```Shell
autoreconf -i
./configure
-make
+make -j 4
sudo make install
```
@@ -35,24 +41,35 @@ In the examples directory are potentially useful extra programs to query/manipul
Binding to script languages
---------------------------
-Bindings to Ruby, Python and Perl are provided. This binding allows to read the output file of Jellyfish directly in a scripting language. Compilation of the bindings is easier from the [release tarball][3]: [SWIG][2] is not required and in the command lines shown below, remove the `--enable-swig` switch. Only the development files of the scripting languages are required.
+Bindings to Ruby, Python and Perl are provided. This binding allows to read the output file of Jellyfish directly in a scripting language. Compilation of the bindings is easier from the [release tarball][3]. The development files of the target scripting language are required.
+
+Compilation of the bindings from the git tree requires [SWIG][2] version 3 and adding the switch `--enable-swig` to the configure command lines show below.
+
+To compile all three bindings, configure and compile with:
+
+```Shell
+./configure --enable-ruby-binding --enable-python-binding --enable-perl-binding
+make -j 4
+sudo make install
+```
-Compilation of the bindings from the git tree requires [SWIG][2] version 3, and the development files of the scripting languages. To compile all three bindings, configure with:
+By default, Jellyfish is installed in `/usr/local` and the bindings are installed in the proper system location. When the `--prefix` switch is passed, the bindings are installed in the given directory. For example:
```Shell
-./configure --enable-swig --enable-ruby-binding --enable-python-binding --enable-perl-binding
+./configure --prefix=$HOME --enable-python-binding
+make -j 4
+make install
```
-Note that the headers of older version of Perl 5 do not compile with recent compilers (g++ > 4.4, clang++) and C++11 mode enable. One may have to specify the path to version 4.4 of gcc by adding, for example, `CXX=g++4.4` to the configure commande line.
+This will install the python binding in `$HOME/lib/python2.7/site-packages` (adjust based on your Python version).
-The binding can installed in a different location than the default (which may require root privileges for example) by passing a path to the `--enable` switches. Then, for Python, Ruby or Perl to find the binding, an environment variable may need to be adjusted (`PYTHONPATH`, `RUBYLIB` and `PERL5LIB` respectively). For example:
+Then, for Python, Ruby or Perl to find the binding, an environment variable may need to be adjusted (`PYTHONPATH`, `RUBYLIB` and `PERL5LIB` respectively). For example:
```Shell
-./configure --prefix=$HOME --enable-swig --enable-python-binding=$HOME/lib/python
-export PYTHONPATH=$HOME/lib/python
+export PYTHONPATH=$HOME/lib/python2.7/site-packages
```
-See the `swig` directory for examples on how to use the bindings.
+See the [swig directory](../../tree/master/swig) for examples on how to use the bindings.
[1]: http://www.genome.umd.edu/jellyfish.html "Genome group at University of Maryland"
[2]: http://www.swig.org/
diff --git a/configure.ac b/configure.ac
index e067893..7515855 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,4 +1,4 @@
-AC_INIT([jellyfish], [2.2.4], [gmarcais at umd.edu])
+AC_INIT([jellyfish], [2.2.5], [gmarcais at umd.edu])
AC_CANONICAL_HOST
AC_CONFIG_MACRO_DIR([m4])
AM_INIT_AUTOMAKE([subdir-objects foreign parallel-tests color-tests])
diff --git a/include/jellyfish/generator_manager.hpp b/include/jellyfish/generator_manager.hpp
index 64cd0b0..0d9224e 100644
--- a/include/jellyfish/generator_manager.hpp
+++ b/include/jellyfish/generator_manager.hpp
@@ -14,9 +14,8 @@
along with Jellyfish. If not, see <http://www.gnu.org/licenses/>.
*/
-
-#ifndef __JELLYFISH_SPAWN_EXTERNAL_HPP_
-#define __JELLYFISH_SPAWN_EXTERNAL_HPP_
+#ifndef __JELLYFISH_GENERATOR_MANAGER_H__
+#define __JELLYFISH_GENERATOR_MANAGER_H__
#ifdef HAVE_CONFIG_H
#include <config.h>
@@ -104,8 +103,7 @@ public:
// This class creates a new process which manages a bunch of
// "generators", sub-processes that writes into a fifo (named pipe)
// and generate sequence.
-class generator_manager {
- cloexec_istream cmds_;
+class generator_manager_base {
tmp_pipes pipes_;
pid_t manager_pid_;
const char* shell_;
@@ -119,21 +117,18 @@ class generator_manager {
pid2pipe_type pid2pipe_;
public:
- generator_manager(const char* cmds, int nb_pipes, const char* shell = 0) :
- cmds_(cmds),
+ generator_manager_base(int nb_pipes, const char* shell = 0) :
pipes_(nb_pipes),
manager_pid_(-1),
shell_(shell),
kill_signal_(0)
{
- if(!cmds_.good())
- throw std::runtime_error(err::msg() << "Failed to open cmds file '" << cmds << "'");
if(!shell_)
shell_ = getenv("SHELL");
if(!shell_)
shell_ = "/bin/sh";
}
- ~generator_manager() { wait(); }
+ virtual ~generator_manager_base() { wait(); }
const tmp_pipes& pipes() const { return pipes_; }
pid_t pid() const { return manager_pid_; }
@@ -144,12 +139,9 @@ public:
// with no error, false otherwise.
bool wait();
-private:
- /// Read commands from the cmds stream. There is one command per
- /// line. Empty lines or lines whose first non-white space character
- /// is a # are ignored. Return an empty string when no more commands
- /// are available.
- std::string get_cmd();
+protected:
+ virtual std::string get_cmd() = 0;
+ virtual void parent_cleanup() { }
void start_commands();
void start_one_command(const std::string& command, int pipe);
bool display_status(int status, const std::string& command);
@@ -158,6 +150,25 @@ private:
static void signal_handler(int signal);
void cleanup();
};
+
+class generator_manager : public generator_manager_base {
+ cloexec_istream cmds_;
+public:
+ generator_manager(const char* cmds, int nb_pipes, const char* shell = 0)
+ : generator_manager_base(nb_pipes, shell)
+ , cmds_(cmds)
+ {
+ if(!cmds_.good())
+ throw std::runtime_error(err::msg() << "Failed to open cmds file '" << cmds << "'");
+ }
+
+ void parent_cleanup() {
+ cmds_.close();
+ }
+
+ std::string get_cmd();
+};
}
-#endif /* __JELLYFISH_SPAWN_EXTERNAL_HPP_ */
+#endif /* __JELLYFISH_GENERATOR_MANAGER_H__ */
+
diff --git a/include/jellyfish/mer_overlap_sequence_parser.hpp b/include/jellyfish/mer_overlap_sequence_parser.hpp
index f5bd549..488cd83 100644
--- a/include/jellyfish/mer_overlap_sequence_parser.hpp
+++ b/include/jellyfish/mer_overlap_sequence_parser.hpp
@@ -208,9 +208,9 @@ protected:
size_t read_sequence(std::istream& is, const size_t read, char* const start, const char stop) {
size_t nread = read;
+
+ skip_newlines(is); // Skip new lines -> get below doesn't like them
while(is && nread < buf_size_ - 1 && is.peek() != stop) {
- // Skip new lines -> get below does like them
- skip_newlines(is);
is.get(start + nread, buf_size_ - nread);
nread += is.gcount();
skip_newlines(is);
@@ -231,12 +231,14 @@ protected:
void skip_quals(std::istream& is, size_t read_len) {
ignore_line(is);
size_t quals = 0;
+
+ skip_newlines(is);
while(is.good() && quals < read_len) {
- skip_newlines(is);
is.ignore(read_len - quals + 1, '\n');
quals += is.gcount();
if(is)
++read_len;
+ skip_newlines(is);
}
skip_newlines(is);
if(quals == read_len && (is.peek() == '@' || is.peek() == EOF))
diff --git a/include/jellyfish/whole_sequence_parser.hpp b/include/jellyfish/whole_sequence_parser.hpp
index 2474e10..46b0520 100644
--- a/include/jellyfish/whole_sequence_parser.hpp
+++ b/include/jellyfish/whole_sequence_parser.hpp
@@ -136,7 +136,11 @@ protected:
header_sequence_qual& fill_buff = buff.data[nb_filled];
st.stream->get(); // Skip '@'
std::getline(*st.stream, fill_buff.header);
- fill_buff.seq.clear();
+
+ if(st.stream->peek() != '+')
+ std::getline(*st.stream, fill_buff.seq);
+ else
+ fill_buff.seq.clear();
while(st.stream->peek() != '+' && st.stream->peek() != EOF) {
std::getline(*st.stream, st.buffer); // Wish there was an easy way to combine the
fill_buff.seq.append(st.buffer); // two lines avoiding copying
@@ -144,7 +148,10 @@ protected:
if(!st.stream->good())
throw std::runtime_error("Truncated fastq file");
st.stream->ignore(std::numeric_limits<std::streamsize>::max(), '\n');
- fill_buff.qual.clear();
+ if(st.stream->peek() != '+')
+ std::getline(*st.stream, fill_buff.qual);
+ else
+ fill_buff.qual.clear();
while(fill_buff.qual.size() < fill_buff.seq.size() && st.stream->good()) {
std::getline(*st.stream, st.buffer);
fill_buff.qual.append(st.buffer);
diff --git a/lib/generator_manager.cc b/lib/generator_manager.cc
index 1d03541..a9b99ce 100644
--- a/lib/generator_manager.cc
+++ b/lib/generator_manager.cc
@@ -109,7 +109,7 @@ void tmp_pipes::cleanup() {
rmdir(tmpdir_.c_str());
}
-void generator_manager::start() {
+void generator_manager_base::start() {
if(manager_pid_ != -1)
return;
manager_pid_ = fork();
@@ -121,7 +121,7 @@ void generator_manager::start() {
manager_pid_ = -1;
break;
default:
- cmds_.close();
+ parent_cleanup();
return;
}
@@ -145,11 +145,11 @@ void generator_manager::start() {
exit(EXIT_FAILURE); // Should not be reached
}
-static generator_manager* manager = 0;
-void generator_manager::signal_handler(int signal) {
+static generator_manager_base* manager = 0;
+void generator_manager_base::signal_handler(int signal) {
manager->kill_signal_ = signal;
}
-int generator_manager::setup_signal_handlers() {
+int generator_manager_base::setup_signal_handlers() {
struct sigaction act;
memset(&act, '\0', sizeof(act));
act.sa_handler = signal_handler;
@@ -157,14 +157,14 @@ int generator_manager::setup_signal_handlers() {
// Should we redefine other signals as well? Like SIGINT, SIGQUIT?
}
-void generator_manager::unset_signal_handlers() {
+void generator_manager_base::unset_signal_handlers() {
struct sigaction act;
memset(&act, '\0', sizeof(act));
act.sa_handler = SIG_DFL;
sigaction(SIGTERM, &act, 0);
}
-bool generator_manager::wait() {
+bool generator_manager_base::wait() {
if(manager_pid_ == -1) return false;
pid_t pid = manager_pid_;
manager_pid_ = -1;
@@ -174,7 +174,7 @@ bool generator_manager::wait() {
return WIFEXITED(status) && (WEXITSTATUS(status) == 0);
}
-void generator_manager::cleanup() {
+void generator_manager_base::cleanup() {
for(auto it = pid2pipe_.begin(); it != pid2pipe_.end(); ++it) {
kill(it->first, SIGTERM);
pipes_.discard(it->second.pipe);
@@ -182,7 +182,7 @@ void generator_manager::cleanup() {
pipes_.cleanup();
}
-void generator_manager::start_one_command(const std::string& command, int pipe)
+void generator_manager_base::start_one_command(const std::string& command, int pipe)
{
cmd_info_type info = { command, pipe };
pid_t child = fork();
@@ -228,7 +228,7 @@ std::string generator_manager::get_cmd() {
return command;
}
-void generator_manager::start_commands()
+void generator_manager_base::start_commands()
{
std::string command;
size_t i;
@@ -263,7 +263,7 @@ void generator_manager::start_commands()
}
}
-bool generator_manager::display_status(int status, const std::string& command)
+bool generator_manager_base::display_status(int status, const std::string& command)
{
if(WIFEXITED(status) && WEXITSTATUS(status) != 0) {
std::cerr << "Command '" << command
diff --git a/swig/Readme.md b/swig/Readme.md
index ae8868a..37d21ca 100644
--- a/swig/Readme.md
+++ b/swig/Readme.md
@@ -7,51 +7,6 @@ output files and use the Jellyfish hash within these scripting
languages, which is much more convenient than in C++, although
somewhat slower.
-Installation
-============
-
-Requirements
-------------
-
-In the following, it is assumed that Jellyfish has been properly
-installed and is visible to the 'pkg-config' tool. The following
-command:
-
-```Shell
-pkg-config --exists jellyfish-2.0 && echo yes
-```
-
-Must print 'yes'. If not, see the README in the Jellyfish on how to
-install and, if necessary, setup the 'PKG\_CONFIG\_PATH' variable.
-
-The [swig](http://www.swig.org/) software package must be
-installed. All the testing is done with version 3.x. Version 2.x MAY
-work, but is not tested.
-
-Configure
----------
-
-To compile the bindings, use, according to taste, some of the the following switches with configure:
-
-```Shell
-./configure --enable-swig --enable-python-binding --enable-ruby-binding --enable-perl-binding
-```
-
-In addition, each of the `--enable-*-binding` switch can take a path where to install the binding. This allows to install without root privilegies. For example:
-
-```Shell
-./configure --prefix=`pwd`/inst --enable-swig --enable-python-binding=`pwd`/inst/python
-make
-make install
-```
-
-will install `jellyfish` in `./inst/bin` and the python files in `./inst/python`. Then, one needs to add `$(pwd)/inst/python` to `PYTHONPATH` to use the binding. Similarly with ruby and `RUBYLIB`, perl and `PERL5LIB`.
-
-The swig bindings were tested with Python 3.3.3, Ruby 1.9.3 and Perl 5.18.1. The Perl headers may not compile properly with recent version of g++. It compiles properly with g++ version 4.4. Hence, you may need to pass the path to `g++` version 4.4 to the configure command line. For example:
-
-```Shell
-./configure --enable-swig --enable-perl-binding CXX=g++-4.4
-```
Examples
========
@@ -71,7 +26,7 @@ import jellyfish
mf = jellyfish.ReadMerFile(sys.argv[1])
for mer, count in mf:
- print(mer, " ", count)
+ print mer, count
```
----
@@ -113,7 +68,7 @@ qf = jellyfish.QueryMerFile(sys.argv.[1])
for m in sys.argv[2:]:
mer = jellyfish.MerDNA(m)
mer.canonicalize()
- print(mer, " ", qf[mer])
+ print mer, qf[mer])
```
----
@@ -136,9 +91,53 @@ use jellyfish;
my $qf = jellyfish::QueryMerFile->new(shift(@ARGV));
foreach my $m (@ARGV) {
- my $mer = Jellyfish::MerDNA->new($m);
+ my $mer = jellyfish::MerDNA->new($m);
$mer->canonicalize;
print($mer, " ", $qf->get($mer), "\n");
}
```
----
+
+jellyfish count
+===============
+
+The following will parse all the 30-mers in a string (`str` in the examples) and store them in a hash counter. In the following code, by replacing `string_mers` by `string_canonicals`, one gets only the canonical mers.
+
+----
+##### Ruby
+```Ruby
+require 'jellyfish'
+
+Jellyfish::MerDNA::k(30)
+h = Jellyfish::HashCounter.new(1024, 5)
+str.mers { |m|
+ h.add(m, 1)
+}
+```
+
+----
+##### Python
+```Python
+import jellyfish
+
+jellyfish.MerDNA.k(30)
+h = jellyfish.HashCounter(1024, 5)
+mers = jellyfish.string_mers(str)
+for m in mers:
+ h.add(m, 1)
+```
+
+---
+##### Perl
+```Perl
+use jellyfish;
+
+jellyfish::MerDNA::k(30);
+my $h = jellyfish::HashCounter->new(1024, 5);
+my $mers = jellyfish::string_mers($str);
+while($mers->next_mer) {
+ $h->add($mers->mer, 1);
+}
+```
+
+The argument to the `HashCounter` constructor are the initial size of the hash and the number of bits in the value field (call it `b`). Note that the number `b` of bits in the value field does not create an upper limit on the value of the count. For best performance, `b` should be chosen so that most k-mers in the hash have a count less than 2^b (2 to the power of b).
diff --git a/swig/jellyfish.i b/swig/jellyfish.i
index 167a0be..2a916ec 100644
--- a/swig/jellyfish.i
+++ b/swig/jellyfish.i
@@ -31,3 +31,4 @@
%include "mer_file.i"
%include "hash_counter.i"
%include "hash_set.i"
+%include "string_mers.i"
diff --git a/swig/perl5/t/test_string_mers.t b/swig/perl5/t/test_string_mers.t
new file mode 100644
index 0000000..2f85a00
--- /dev/null
+++ b/swig/perl5/t/test_string_mers.t
@@ -0,0 +1,30 @@
+use strict;
+use warnings;
+use Test::More;
+
+require_ok('jellyfish');
+jellyfish::MerDNA::k(int(rand(100)) + 10);
+
+my @bases = ("A", "C", "G", "T", "a", "c", "g", "t");
+my $str;
+for(my $i = 0; $i < 1000; $i++) {
+ $str .= $bases[int(rand(8))];
+}
+
+my $count = 0;
+my $m2 = jellyfish::MerDNA->new;
+my $mers = jellyfish::string_mers($str);
+my $good_mers = 1;
+my $good_strs = 1;
+while($mers->next_mer) {
+ my $mstr = uc(substr($str, $count, jellyfish::MerDNA::k()));
+ $m2->set($mstr);
+ $good_strs &&= $mers->mer == $m2;
+ $good_mers &&= $mers->mer eq $mstr;
+ $count++;
+}
+ok($good_mers, "Mers equal");
+ok($good_strs, "Strs equal");
+ok($count == length($str) - jellyfish::MerDNA::k() + 1, "Number of mers");
+
+done_testing;
diff --git a/swig/python/test_string_mers.py b/swig/python/test_string_mers.py
new file mode 100644
index 0000000..bf7d6b1
--- /dev/null
+++ b/swig/python/test_string_mers.py
@@ -0,0 +1,39 @@
+import unittest
+import sys
+import random
+import jellyfish
+
+class TestStringMers(unittest.TestCase):
+ def setUp(self):
+ bases = "ACGTacgt"
+ self.str = ''.join(random.choice(bases) for _ in range(1000))
+ self.k = random.randint(10, 110)
+ jellyfish.MerDNA.k(self.k)
+
+ def test_all_mers(self):
+ count = 0
+ good = True
+ mers = jellyfish.string_mers(self.str)
+ for m in mers:
+ m2 = jellyfish.MerDNA(self.str[count:count+self.k])
+ good = good and m == m2
+ count += 1
+ self.assertTrue(good)
+ self.assertEqual(len(self.str) - self.k + 1, count)
+
+ def test_canonical_mers(self):
+ good = True
+ mers = jellyfish.string_canonicals(self.str)
+ for count, m in enumerate(mers):
+ m2 = jellyfish.MerDNA(self.str[count:count+self.k])
+ rm2 = m2.get_reverse_complement()
+ good = good and (m == m2 or m == rm2)
+ good = good and (not (m > m2)) and (not (m > rm2))
+ # count += 1
+ self.assertTrue(good)
+ self.assertEqual(len(self.str) - self.k + 0, count)
+
+
+if __name__ == '__main__':
+ data = sys.argv.pop(1)
+ unittest.main()
diff --git a/swig/ruby/test_string_mers.rb b/swig/ruby/test_string_mers.rb
new file mode 100644
index 0000000..05c8190
--- /dev/null
+++ b/swig/ruby/test_string_mers.rb
@@ -0,0 +1,36 @@
+require 'minitest/autorun'
+require 'jellyfish'
+
+class TestStringMers < MiniTest::Unit::TestCase
+ def setup
+ bases = "ACGTacgt"
+ @str = (0..1000).map { bases[rand(bases.size())] }.join("")
+ Jellyfish::MerDNA::k(rand(100) + 10)
+ end
+
+ def test_all_mers
+ count = 0
+ m2 = Jellyfish::MerDNA.new
+
+ @str.mers.each_with_index { |m, i|
+ assert_equal m.to_s, @str[i, Jellyfish::MerDNA::k()].upcase
+ m2.set @str[i, Jellyfish::MerDNA::k()]
+ assert_equal m2, m
+ count += 1
+ }
+ assert_equal @str.size - Jellyfish::MerDNA::k() + 1, count
+ end
+
+ def test_canonical_mers
+ count = 0
+ m2 = Jellyfish::MerDNA.new
+ @str.canonicals { |m|
+ m2.set @str[count, Jellyfish::MerDNA::k()]
+ cm2 = m2.get_reverse_complement
+ assert(m2 == m || m == cm2)
+ assert(!(m > m2) && !(m > cm2));
+ count += 1
+ }
+ assert_equal @str.size - Jellyfish::MerDNA::k() + 1, count
+ end
+end
diff --git a/swig/string_mers.i b/swig/string_mers.i
index 9a1f49e..064ee06 100644
--- a/swig/string_mers.i
+++ b/swig/string_mers.i
@@ -1,6 +1,143 @@
/****************************************/
/* Iterator of all the mers in a string */
/****************************************/
+#ifdef SWIGPYTHON
+%exception __next__ {
+ $action;
+ if(!result) {
+ PyErr_SetString(PyExc_StopIteration, "Done");
+ SWIG_fail;
+ }
+ }
+%exception next {
+ $action;
+ if(!result) {
+ PyErr_SetString(PyExc_StopIteration, "Done");
+ SWIG_fail;
+ }
+ }
+#endif
+
%{
+ class StringMers {
+ const char* m_current;
+ const char* const m_last;
+ const bool m_canonical;
+ MerDNA m_m, m_rcm;
+ unsigned int m_filled;
+
+ public:
+ StringMers(const char* str, int len, bool canonical)
+ : m_current(str)
+ , m_last(str + len)
+ , m_canonical(canonical)
+ , m_filled(0)
+ { }
+
+ bool next_mer() {
+ if(m_current == m_last)
+ return false;
+
+ do {
+ int code = jellyfish::mer_dna::code(*m_current);
+ ++m_current;
+ if(code >= 0) {
+ m_m.shift_left(code);
+ if(m_canonical)
+ m_rcm.shift_right(m_rcm.complement(code));
+ m_filled = std::min(m_filled + 1, m_m.k());
+ } else
+ m_filled = 0;
+ } while(m_filled < m_m.k() && m_current != m_last);
+ return m_filled == m_m.k();
+ }
+
+ const MerDNA* mer() const { return !m_canonical || m_m < m_rcm ? &m_m : &m_rcm; }
+
+ const MerDNA* next_mer__() {
+ return next_mer() ? mer() : nullptr;
+ }
+
+
+#ifdef SWIGRUBY
+ void each() {
+ if(!rb_block_given_p()) return;
+ while(next_mer()) {
+ auto m = SWIG_NewPointerObj(const_cast<MerDNA*>(mer()), SWIGTYPE_p_MerDNA, 0);
+ rb_yield(m);
+ }
+ }
+#endif
+
+#ifdef SWIGPYTHON
+ StringMers* __iter__() { return this; }
+ const MerDNA* __next__() { return next_mer__(); }
+ const MerDNA* next() { return next_mer__(); }
+#endif
+
+#ifdef SWIGPERL
+ const MerDNA* each() { return next_mer__(); }
+#endif
+
+ };
+ StringMers* string_mers(char* str, int length) { return new StringMers(str, length, false); }
+ StringMers* string_canonicals(char* str, int length) { return new StringMers(str, length, true); }
%}
+
+%apply (char *STRING, int LENGTH) { (char* str, int length) };
+%newobject string_mers;
+%newobject string_canonicals;
+%feature("autodoc", "Get an iterator to the mers in the string");
+StringMers* string_mers(char* str, int length);
+%feature("autodoc", "Get an iterator to the canonical mers in the string");
+StringMers* string_canonicals(char* str, int length);
+
+#ifdef SWIGRUBY
+%mixin StringMers "Enumerable";
+%init %{
+ rb_eval_string("class String\n"
+ " def mers(&b); it = Jellyfish::string_mers(self); b ? it.each(&b) : it; end\n"
+ " def canonicals(&b); it = Jellyfish::string_canonicals(self, &b); b ? it.each(&b) : it; end\n"
+ "end");
+%}
+#endif
+
+/* #ifdef SWIGPERL */
+/* // For perl, return an empty array at end of iterator */
+/* %typemap(out) const MerDNA* { */
+/* if($1) { */
+/* SWIG_Object m = SWIG_NewPointerObj(const_cast<MerDNA*>($1), SWIGTYPE_p_MerDNA, 0); */
+/* %append_output(m); */
+/* } */
+/* } */
+/* #endif */
+
+
+%feature("autodoc", "Extract k-mers from a sequence string");
+class StringMers {
+public:
+ %feature("autodoc", "Create a k-mers parser from a string. Pass true as a second argument to get canonical mers");
+ StringMers(const char* str, int len, bool canonical);
+
+ %feature("autodoc", "Get the next mer. Return false if reached the end of the string.");
+ bool next_mer();
+
+ %feature("autodoc", "Return the current mer (or its canonical representation)");
+ const MerDNA* mer() const;
+
+#ifdef SWIGRUBY
+ %feature("autodoc", "Iterate through all the mers in the string");
+ void each();
+#endif
+
+#ifdef SWIGPYTHON
+ StringMers* __iter__();
+ const MerDNA* __next__();
+ const MerDNA* next();
+#endif
+
+#ifdef SWIGPERL
+ MerDNA* each();
+#endif
+};
diff --git a/tests/swig_perl.sh b/tests/swig_perl.sh
index 5aa9252..0223c37 100644
--- a/tests/swig_perl.sh
+++ b/tests/swig_perl.sh
@@ -11,7 +11,7 @@ $JF count -m $K -s 10M -t $nCPUs -C -o ${pref}.jf seq1m_$I.fa
$JF dump -c ${pref}.jf > ${pref}.dump
$JF histo ${pref}.jf > ${pref}.histo
-for i in test_mer_file.t test_hash_counter.t; do
+for i in test_mer_file.t test_hash_counter.t test_string_mers.t; do
echo Test $i
$PERL "-I$LOADPATH/.libs" "-I$LOADPATH" "-I$SRCDIR/swig/perl5" "$SRCDIR/swig/perl5/t/$i" .
done
diff --git a/tests/swig_python.sh b/tests/swig_python.sh
index e70bba5..bf8e6ae 100644
--- a/tests/swig_python.sh
+++ b/tests/swig_python.sh
@@ -11,7 +11,7 @@ $JF count -m $K -s 10M -t $nCPUs -C -o ${pref}.jf seq1m_$I.fa
$JF dump -c ${pref}.jf > ${pref}.dump
$JF histo ${pref}.jf > ${pref}.histo
-for i in test_mer_file.py test_hash_counter.py; do
+for i in test_mer_file.py test_hash_counter.py test_string_mers.py; do
echo Test $i
$PYTHON "$SRCDIR/swig/python/$i" .
done
diff --git a/tests/swig_ruby.sh b/tests/swig_ruby.sh
index 0644fcc..cc4d866 100644
--- a/tests/swig_ruby.sh
+++ b/tests/swig_ruby.sh
@@ -13,7 +13,7 @@ $JF histo ${pref}.jf > ${pref}.histo
-for i in test_mer_file.rb test_hash_counter.rb; do
+for i in test_mer_file.rb test_hash_counter.rb test_string_mers.rb; do
echo Test $i
$RUBY "-I$LOADPATH" "$SRCDIR/swig/ruby/$i" .
done
diff --git a/unit_tests/test_mer_dna.cc b/unit_tests/test_mer_dna.cc
index a6f27b8..9415219 100644
--- a/unit_tests/test_mer_dna.cc
+++ b/unit_tests/test_mer_dna.cc
@@ -461,7 +461,8 @@ TYPED_TEST(MerDNA, GetBits) {
for(unsigned int i = 0; i < 20; ++i) {
long int start = random() % (this->GetParam().size() - 1);
long int max_len =
- std::min(this->GetParam().size() - start, 8 * sizeof(typename TypeParam::Type::base_type));
+ std::min((long int)(this->GetParam().size() - start),
+ (long int)(8 * sizeof(typename TypeParam::Type::base_type)));
long int len = (random() % (max_len - 1)) + 1;
// Get bits by right-shifting
diff --git a/unit_tests/test_stdio_filebuf.cc b/unit_tests/test_stdio_filebuf.cc
index 078003a..66c3b5a 100644
--- a/unit_tests/test_stdio_filebuf.cc
+++ b/unit_tests/test_stdio_filebuf.cc
@@ -50,6 +50,7 @@ TEST(StdioFileBuf, Read) {
have_read += expect_read;
}
- EXPECT_TRUE(fd_stream.eof());
+ EXPECT_TRUE(fd_stream.eof() || fd_stream.peek() == EOF);
+ EXPECT_TRUE(file_stream.eof() || file_stream.peek() == EOF);
}
}
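
Reviewer note (not part of the commit): the `StringMers::next_mer` loop added in swig/string_mers.i can be hard to follow in diff form, so here is a minimal pure-Python sketch of the same sliding-window idea. It does not use the jellyfish bindings. The names `string_mers`, `reverse_complement` and `COMPLEMENT` below are illustrative only, k is passed explicitly instead of being set through `MerDNA.k()`, and the canonical choice assumes that `MerDNA` ordering agrees with lexicographic order on upper-case strings.

```python
# Illustrative sketch only: plain strings stand in for MerDNA objects.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(mer):
    """Return the reverse complement of an upper-case ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(mer))


def string_mers(seq, k, canonical=False):
    """Yield each k-mer of seq, one per valid base once the window is full.

    Any character outside ACGT (case-insensitive) empties the window,
    mirroring the `m_filled = 0` branch in the C++ code.
    """
    window = ""
    for ch in seq.upper():
        if ch in COMPLEMENT:
            window = (window + ch)[-k:]  # shift the new base in, keep last k
        else:
            window = ""                  # invalid base: start over
        if len(window) == k:
            if canonical:
                # Assumed ordering: the smaller of the mer and its reverse complement.
                yield min(window, reverse_complement(window))
            else:
                yield window


if __name__ == "__main__":
    print(list(string_mers("ACGTA", 3)))  # ['ACG', 'CGT', 'GTA']
```

For a string of valid bases only, this yields `len(seq) - k + 1` mers, which is the count the new Perl, Python and Ruby tests check for `string_mers`.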
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/jellyfish.git