[med-svn] [jellyfish] 03/07: Imported Upstream version 2.2.5

Michael Crusoe misterc-guest at moszumanska.debian.org
Sat Mar 12 22:17:31 UTC 2016


This is an automated email from the git hooks/post-receive script.

misterc-guest pushed a commit to branch master
in repository jellyfish.

commit 87fdddc6618aadfb3970ab71802f3966a0621c75
Author: Michael R. Crusoe <crusoe at ucdavis.edu>
Date:   Sat Mar 12 08:07:43 2016 -0800

    Imported Upstream version 2.2.5
---
 .travis.yml                                       |   6 +
 Makefile.am                                       |   6 +-
 README.md                                         |  39 ++++--
 configure.ac                                      |   2 +-
 include/jellyfish/generator_manager.hpp           |  45 ++++---
 include/jellyfish/mer_overlap_sequence_parser.hpp |   8 +-
 include/jellyfish/whole_sequence_parser.hpp       |  11 +-
 lib/generator_manager.cc                          |  22 ++--
 swig/Readme.md                                    |  95 ++++++++-------
 swig/jellyfish.i                                  |   1 +
 swig/perl5/t/test_string_mers.t                   |  30 +++++
 swig/python/test_string_mers.py                   |  39 ++++++
 swig/ruby/test_string_mers.rb                     |  36 ++++++
 swig/string_mers.i                                | 137 ++++++++++++++++++++++
 tests/swig_perl.sh                                |   2 +-
 tests/swig_python.sh                              |   2 +-
 tests/swig_ruby.sh                                |   2 +-
 unit_tests/test_mer_dna.cc                        |   3 +-
 unit_tests/test_stdio_filebuf.cc                  |   3 +-
 19 files changed, 388 insertions(+), 101 deletions(-)

diff --git a/.travis.yml b/.travis.yml
new file mode 100644
index 0000000..d8fd723
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,6 @@
+language: cpp
+before_install:
+  - mkdir -p ~/bin
+  - curl -L -o ~/bin/yaggo https://github.com/gmarcais/yaggo/releases/download/v1.5.9/yaggo
+  - chmod a+rx ~/bin/yaggo
+script: autoreconf -i && ./configure YAGGO=~/bin/yaggo && make && make check
diff --git a/Makefile.am b/Makefile.am
index a28d848..1a6ae27 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -164,9 +164,9 @@ TESTS += tests/swig_python.sh tests/swig_ruby.sh tests/swig_perl.sh
 tests/swig_python.log: tests/generate_sequence.log
 tests/swig_ruby.log: tests/generate_sequence.log
 tests/swig_perl.log: tests/generate_sequence.log
-EXTRA_DIST += swig/python/test_mer_file.py swig/python/test_hash_counter.py
-EXTRA_DIST += swig/ruby/test_mer_file.rb swig/ruby/test_hash_counter.rb
-EXTRA_DIST += swig/perl5/t/test_mer_file.t swig/perl5/t/test_hash_counter.t
+EXTRA_DIST += swig/python/test_mer_file.py swig/python/test_hash_counter.py swig/python/test_string_mers.py
+EXTRA_DIST += swig/ruby/test_mer_file.rb swig/ruby/test_hash_counter.rb swig/ruby/test_string_mers.rb
+EXTRA_DIST += swig/perl5/t/test_mer_file.t swig/perl5/t/test_hash_counter.t swig/perl5/t/test_string_mers.t
 
 
 ##############
diff --git a/README.md b/README.md
index 8c8af70..6c72475 100644
--- a/README.md
+++ b/README.md
@@ -15,14 +15,20 @@ If you use Jellyfish in your research, please cite:
 Installation
 ------------
 
-To get an easier to compiled packaged tar ball of the source code, download a release from [home page of Jellyfish at the University of Maryland][1] or from the [github release][3].
+To get an easier-to-compile packaged tarball of the source code, download a release from the [github release][3]. You need make and g++ version 4.4 or higher. To install in your home directory, do:
 
-To compile from the git tree, you will need autoconf/automake, make, g++ 4.4 or newer and [yaggo](https://github.com/gmarcais/yaggo "Yaggo on github"). Then compile with:
+```Shell
+./configure --prefix=$HOME
+make -j 4
+make install
+```
+
+To compile from the git tree, you will also need autoconf/automake and [yaggo](https://github.com/gmarcais/yaggo/releases "Yaggo release on github"). Then compile and install (in `/usr/local` in this example) with:
 
 ```Shell
 autoreconf -i
 ./configure
-make
+make -j 4
 sudo make install
 ```
 
@@ -35,24 +41,35 @@ In the examples directory are potentially useful extra programs to query/manipul
 Binding to script languages
 ---------------------------
 
-Bindings to Ruby, Python and Perl are provided. This binding allows to read the output file of Jellyfish directly in a scripting language. Compilation of the bindings is easier from the [release tarball][3]: [SWIG][2] is not required and in the command lines shown below, remove the `--enable-swig` switch. Only the development files of the scripting languages are required.
+Bindings to Ruby, Python and Perl are provided. They allow reading the output file of Jellyfish directly from a scripting language. Compilation of the bindings is easier from the [release tarball][3]. The development files of the target scripting language are required.
+
+Compilation of the bindings from the git tree requires [SWIG][2] version 3 and adding the switch `--enable-swig` to the configure command lines shown below.
+
+To compile all three bindings, configure and compile with:
+
+```Shell
+./configure --enable-ruby-binding --enable-python-binding --enable-perl-binding
+make -j 4
+sudo make install
+```
 
-Compilation of the bindings from the git tree requires [SWIG][2] version 3, and the development files of the scripting languages. To compile all three bindings, configure with:
+By default, Jellyfish is installed in `/usr/local` and the bindings are installed in the proper system location. When the `--prefix` switch is passed, the bindings are installed in the given directory. For example:
 
 ```Shell
-./configure --enable-swig --enable-ruby-binding --enable-python-binding --enable-perl-binding
+./configure --prefix=$HOME --enable-python-binding
+make -j 4
+make install
 ```
 
-Note that the headers of older version of Perl 5 do not compile with recent compilers (g++ > 4.4, clang++) and C++11 mode enable. One may have to specify the path to version 4.4 of gcc by adding, for example, `CXX=g++4.4` to the configure commande line.
+This will install the python binding in `$HOME/lib/python2.7/site-packages` (adjust based on your Python version).
 
-The binding can installed in a different location than the default (which may require root privileges for example) by passing a path to the `--enable` switches. Then, for Python, Ruby or Perl to find the binding, an environment variable may need to be adjusted (`PYTHONPATH`, `RUBYLIB` and `PERL5LIB` respectively). For example:
+Then, for Python, Ruby or Perl to find the binding, an environment variable may need to be adjusted (`PYTHONPATH`, `RUBYLIB` and `PERL5LIB` respectively). For example:
 
 ```Shell
-./configure --prefix=$HOME --enable-swig --enable-python-binding=$HOME/lib/python
-export PYTHONPATH=$HOME/lib/python
+export PYTHONPATH=$HOME/lib/python2.7/site-packages
 ```
 
-See the `swig` directory for examples on how to use the bindings.
+See the [swig directory](../../tree/master/swig) for examples on how to use the bindings.
 
 [1]: http://www.genome.umd.edu/jellyfish.html "Genome group at University of Maryland"
 [2]: http://www.swig.org/
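
Note on the README hunk above: it says the Python binding lands in `$HOME/lib/python2.7/site-packages` and that the path should be adjusted for the Python version in use. A minimal sketch of that adjustment follows. The helper name `python_binding_dir` is hypothetical, and the `lib/pythonX.Y/site-packages` layout is taken from the README example only; the directory the build actually picks may differ on some systems.

```python
import posixpath
import sys


def python_binding_dir(prefix, version=None):
    """Guess the site-packages directory below `prefix` (hypothetical helper).

    Assumes the layout shown in the README example:
    <prefix>/lib/python<major>.<minor>/site-packages
    """
    major, minor = (version or sys.version_info)[:2]
    return posixpath.join(prefix, "lib", "python%d.%d" % (major, minor), "site-packages")


# Directory to add to PYTHONPATH for a --prefix=/home/user install under Python 2.7
print(python_binding_dir("/home/user", (2, 7)))
```

Called without a version, the helper uses the running interpreter, which is usually the value wanted when exporting `PYTHONPATH`.
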
diff --git a/configure.ac b/configure.ac
index e067893..7515855 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1,4 +1,4 @@
-AC_INIT([jellyfish], [2.2.4], [gmarcais at umd.edu])
+AC_INIT([jellyfish], [2.2.5], [gmarcais at umd.edu])
 AC_CANONICAL_HOST
 AC_CONFIG_MACRO_DIR([m4])
 AM_INIT_AUTOMAKE([subdir-objects foreign parallel-tests color-tests])
diff --git a/include/jellyfish/generator_manager.hpp b/include/jellyfish/generator_manager.hpp
index 64cd0b0..0d9224e 100644
--- a/include/jellyfish/generator_manager.hpp
+++ b/include/jellyfish/generator_manager.hpp
@@ -14,9 +14,8 @@
     along with Jellyfish.  If not, see <http://www.gnu.org/licenses/>.
 */
 
-
-#ifndef __JELLYFISH_SPAWN_EXTERNAL_HPP_
-#define __JELLYFISH_SPAWN_EXTERNAL_HPP_
+#ifndef __JELLYFISH_GENERATOR_MANAGER_H__
+#define __JELLYFISH_GENERATOR_MANAGER_H__
 
 #ifdef HAVE_CONFIG_H
 #include <config.h>
@@ -104,8 +103,7 @@ public:
 // This class creates a new process which manages a bunch of
 // "generators", sub-processes that writes into a fifo (named pipe)
 // and generate sequence.
-class generator_manager {
-  cloexec_istream cmds_;
+class generator_manager_base {
   tmp_pipes       pipes_;
   pid_t           manager_pid_;
   const char*     shell_;
@@ -119,21 +117,18 @@ class generator_manager {
   pid2pipe_type pid2pipe_;
 
 public:
-  generator_manager(const char* cmds, int nb_pipes, const char* shell = 0) :
-    cmds_(cmds),
+  generator_manager_base(int nb_pipes, const char* shell = 0) :
     pipes_(nb_pipes),
     manager_pid_(-1),
     shell_(shell),
     kill_signal_(0)
   {
-    if(!cmds_.good())
-      throw std::runtime_error(err::msg() << "Failed to open cmds file '" << cmds << "'");
     if(!shell_)
       shell_ = getenv("SHELL");
     if(!shell_)
       shell_ = "/bin/sh";
   }
-  ~generator_manager() { wait(); }
+  virtual ~generator_manager_base() { wait(); }
 
   const tmp_pipes& pipes() const { return pipes_; }
   pid_t pid() const { return manager_pid_; }
@@ -144,12 +139,9 @@ public:
   // with no error, false otherwise.
   bool wait();
 
-private:
-  /// Read commands from the cmds stream. There is one command per
-  /// line. Empty lines or lines whose first non-white space character
-  /// is a # are ignored. Return an empty string when no more commands
-  /// are available.
-  std::string get_cmd();
+protected:
+  virtual std::string get_cmd() = 0;
+  virtual void parent_cleanup() { }
   void start_commands();
   void start_one_command(const std::string& command, int pipe);
   bool display_status(int status, const std::string& command);
@@ -158,6 +150,25 @@ private:
   static void signal_handler(int signal);
   void cleanup();
 };
+
+class generator_manager : public generator_manager_base {
+  cloexec_istream cmds_;
+public:
+  generator_manager(const char* cmds, int nb_pipes, const char* shell = 0)
+    : generator_manager_base(nb_pipes, shell)
+    , cmds_(cmds)
+  {
+    if(!cmds_.good())
+      throw std::runtime_error(err::msg() << "Failed to open cmds file '" << cmds << "'");
+  }
+
+  void parent_cleanup() {
+    cmds_.close();
+  }
+
+  std::string get_cmd();
+};
 }
 
-#endif /* __JELLYFISH_SPAWN_EXTERNAL_HPP_ */
+#endif /* __JELLYFISH_GENERATOR_MANAGER_H__ */
+
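
Note on the header change above: `generator_manager` is split into a base class that owns the process handling and a subclass that supplies the commands through the virtual `get_cmd()`, with `parent_cleanup()` as an overridable hook. The sketch below mirrors only that split, in Python and within a single process. The class names, `collect_commands()`, and the choice to return stripped lines are assumptions made for illustration; no fork or named pipe is involved, so the hook is called after the commands are drained rather than in the parent right after `fork()` as in the C++ code. The comment-skipping rule follows the doc comment removed from the header (one command per line; empty lines and lines whose first non-blank character is `#` are ignored; an empty string means no more commands).

```python
import io
from abc import ABC, abstractmethod


class GeneratorManagerBase(ABC):
    """Owns the control flow; subclasses decide where commands come from."""

    @abstractmethod
    def get_cmd(self):
        """Return the next command, or an empty string when none are left."""

    def parent_cleanup(self):
        """Hook for releasing resources; the default does nothing."""

    def collect_commands(self):
        # Hypothetical stand-in for start_commands(): it only drains
        # get_cmd() to show how the base class drives the subclass.
        commands = []
        while True:
            command = self.get_cmd()
            if not command:
                break
            commands.append(command)
        self.parent_cleanup()
        return commands


class FileGeneratorManager(GeneratorManagerBase):
    """Reads one command per line from an open text stream."""

    def __init__(self, stream):
        self._cmds = stream

    def get_cmd(self):
        # The stream iterator keeps its position between calls.
        for line in self._cmds:
            stripped = line.strip()
            if stripped and not stripped.startswith("#"):
                return stripped
        return ""

    def parent_cleanup(self):
        self._cmds.close()


source = io.StringIO("# comment\n\nzcat a.fa.gz\n  # indented comment\nzcat b.fa.gz\n")
manager = FileGeneratorManager(source)
commands = manager.collect_commands()
print(commands)
```

A second subclass could return commands from an in-memory list without touching the base class, which appears to be the point of making `get_cmd()` virtual.
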
diff --git a/include/jellyfish/mer_overlap_sequence_parser.hpp b/include/jellyfish/mer_overlap_sequence_parser.hpp
index f5bd549..488cd83 100644
--- a/include/jellyfish/mer_overlap_sequence_parser.hpp
+++ b/include/jellyfish/mer_overlap_sequence_parser.hpp
@@ -208,9 +208,9 @@ protected:
 
   size_t read_sequence(std::istream& is, const size_t read, char* const start, const char stop) {
     size_t nread = read;
+
+    skip_newlines(is); // Skip new lines -> get below doesn't like them
     while(is && nread < buf_size_ - 1 && is.peek() != stop) {
-      // Skip new lines -> get below does like them
-      skip_newlines(is);
       is.get(start + nread, buf_size_ - nread);
       nread += is.gcount();
       skip_newlines(is);
@@ -231,12 +231,14 @@ protected:
   void skip_quals(std::istream& is, size_t read_len) {
     ignore_line(is);
     size_t quals = 0;
+
+    skip_newlines(is);
     while(is.good() && quals < read_len) {
-      skip_newlines(is);
       is.ignore(read_len - quals + 1, '\n');
       quals += is.gcount();
       if(is)
         ++read_len;
+      skip_newlines(is);
     }
     skip_newlines(is);
     if(quals == read_len && (is.peek() == '@' || is.peek() == EOF))
diff --git a/include/jellyfish/whole_sequence_parser.hpp b/include/jellyfish/whole_sequence_parser.hpp
index 2474e10..46b0520 100644
--- a/include/jellyfish/whole_sequence_parser.hpp
+++ b/include/jellyfish/whole_sequence_parser.hpp
@@ -136,7 +136,11 @@ protected:
       header_sequence_qual& fill_buff = buff.data[nb_filled];
       st.stream->get(); // Skip '@'
       std::getline(*st.stream, fill_buff.header);
-      fill_buff.seq.clear();
+
+      if(st.stream->peek() != '+')
+        std::getline(*st.stream, fill_buff.seq);
+      else
+        fill_buff.seq.clear();
       while(st.stream->peek() != '+' && st.stream->peek() != EOF) {
         std::getline(*st.stream, st.buffer); // Wish there was an easy way to combine the
         fill_buff.seq.append(st.buffer);             // two lines avoiding copying
@@ -144,7 +148,10 @@ protected:
       if(!st.stream->good())
         throw std::runtime_error("Truncated fastq file");
       st.stream->ignore(std::numeric_limits<std::streamsize>::max(), '\n');
-      fill_buff.qual.clear();
+      if(st.stream->peek() != '+')
+        std::getline(*st.stream, fill_buff.qual);
+      else
+        fill_buff.qual.clear();
       while(fill_buff.qual.size() < fill_buff.seq.size() && st.stream->good()) {
         std::getline(*st.stream, st.buffer);
         fill_buff.qual.append(st.buffer);
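
Note on the parser hunks above: the sequence of a FASTQ record is read up to the line starting with `+`, and quality lines are then appended until the quality string is as long as the sequence. The sketch below shows that length-driven rule in Python. It is an illustration under assumptions, not a port of `whole_sequence_parser`: the function name `read_fastq_records` is hypothetical, blank lines between records are tolerated by choice, and errors are raised as `ValueError`.

```python
import io


def read_fastq_records(stream):
    """Yield (header, seq, qual); seq and qual may each span several lines."""
    lines = (line.rstrip("\n") for line in stream)
    for header in lines:
        if not header:
            continue  # tolerate blank lines between records
        if not header.startswith("@"):
            raise ValueError("Expected '@' at start of record")
        seq_parts = []
        for line in lines:
            if line.startswith("+"):  # separator line ends the sequence
                break
            seq_parts.append(line)
        else:
            raise ValueError("Truncated fastq file")
        seq = "".join(seq_parts)
        qual = ""
        # Length, not the first character, decides where the quality ends:
        # '@' and '+' are both legal quality characters.
        while len(qual) < len(seq):
            try:
                qual += next(lines)
            except StopIteration:
                raise ValueError("Truncated fastq file") from None
        yield header[1:], seq, qual


data = "@r1\nACGT\nACGT\n+\n@III\n+III\n@r2\nGG\n+r2\nII\n"
records = list(read_fastq_records(io.StringIO(data)))
print(records)
```

The first record in `data` has quality lines that begin with `@` and `+`. A parser that looked for the next `@` to find the record boundary would split it in the wrong place.
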
diff --git a/lib/generator_manager.cc b/lib/generator_manager.cc
index 1d03541..a9b99ce 100644
--- a/lib/generator_manager.cc
+++ b/lib/generator_manager.cc
@@ -109,7 +109,7 @@ void tmp_pipes::cleanup() {
   rmdir(tmpdir_.c_str());
 }
 
-void generator_manager::start()  {
+void generator_manager_base::start()  {
   if(manager_pid_ != -1)
     return;
   manager_pid_ = fork();
@@ -121,7 +121,7 @@ void generator_manager::start()  {
     manager_pid_ = -1;
     break;
   default:
-    cmds_.close();
+    parent_cleanup();
     return;
   }
 
@@ -145,11 +145,11 @@ void generator_manager::start()  {
   exit(EXIT_FAILURE); // Should not be reached
 }
 
-static generator_manager* manager = 0;
-void generator_manager::signal_handler(int signal) {
+static generator_manager_base* manager = 0;
+void generator_manager_base::signal_handler(int signal) {
   manager->kill_signal_ = signal;
 }
-int generator_manager::setup_signal_handlers() {
+int generator_manager_base::setup_signal_handlers() {
   struct sigaction act;
   memset(&act, '\0', sizeof(act));
   act.sa_handler = signal_handler;
@@ -157,14 +157,14 @@ int generator_manager::setup_signal_handlers() {
   // Should we redefine other signals as well? Like SIGINT, SIGQUIT?
 }
 
-void generator_manager::unset_signal_handlers() {
+void generator_manager_base::unset_signal_handlers() {
   struct sigaction act;
   memset(&act, '\0', sizeof(act));
   act.sa_handler = SIG_DFL;
   sigaction(SIGTERM, &act, 0);
 }
 
-bool generator_manager::wait() {
+bool generator_manager_base::wait() {
   if(manager_pid_ == -1) return false;
   pid_t pid = manager_pid_;
   manager_pid_ = -1;
@@ -174,7 +174,7 @@ bool generator_manager::wait() {
   return WIFEXITED(status) && (WEXITSTATUS(status) == 0);
 }
 
-void generator_manager::cleanup() {
+void generator_manager_base::cleanup() {
   for(auto it = pid2pipe_.begin(); it != pid2pipe_.end(); ++it) {
     kill(it->first, SIGTERM);
     pipes_.discard(it->second.pipe);
@@ -182,7 +182,7 @@ void generator_manager::cleanup() {
   pipes_.cleanup();
 }
 
-void generator_manager::start_one_command(const std::string& command, int pipe)
+void generator_manager_base::start_one_command(const std::string& command, int pipe)
 {
   cmd_info_type info = { command, pipe };
   pid_t child = fork();
@@ -228,7 +228,7 @@ std::string generator_manager::get_cmd() {
   return command;
 }
 
-void generator_manager::start_commands()
+void generator_manager_base::start_commands()
 {
   std::string command;
   size_t i;
@@ -263,7 +263,7 @@ void generator_manager::start_commands()
   }
 }
 
-bool generator_manager::display_status(int status, const std::string& command)
+bool generator_manager_base::display_status(int status, const std::string& command)
 {
   if(WIFEXITED(status) && WEXITSTATUS(status) != 0) {
     std::cerr << "Command '" << command
diff --git a/swig/Readme.md b/swig/Readme.md
index ae8868a..37d21ca 100644
--- a/swig/Readme.md
+++ b/swig/Readme.md
@@ -7,51 +7,6 @@ output files and use the Jellyfish hash within these scripting
 languages, which is much more convenient than in C++, although
 somewhat slower.
 
-Installation
-============
-
-Requirements
-------------
-
-In the following, it is assumed that Jellyfish has been properly
-installed and is visible to the 'pkg-config' tool. The following
-command:
-
-```Shell
-pkg-config --exists jellyfish-2.0 && echo yes
-```
-
-Must print 'yes'. If not, see the README in the Jellyfish on how to
-install and, if necessary, setup the 'PKG\_CONFIG\_PATH' variable.
-
-The [swig](http://www.swig.org/) software package must be
-installed. All the testing is done with version 3.x. Version 2.x MAY
-work, but is not tested.
-
-Configure
----------
-
-To compile the bindings, use, according to taste, some of the the following switches with configure:
-
-```Shell
-./configure --enable-swig --enable-python-binding --enable-ruby-binding --enable-perl-binding
-```
-
-In addition, each of the `--enable-*-binding` switch can take a path where to install the binding. This allows to install without root privilegies. For example:
-
-```Shell
-./configure --prefix=`pwd`/inst --enable-swig --enable-python-binding=`pwd`/inst/python
-make
-make install
-```
-
-will install `jellyfish` in `./inst/bin` and the python files in `./inst/python`. Then, one needs to add `$(pwd)/inst/python` to `PYTHONPATH` to use the binding. Similarly with ruby and `RUBYLIB`, perl and `PERL5LIB`.
-
-The swig bindings were tested with Python 3.3.3, Ruby 1.9.3 and Perl 5.18.1. The Perl headers may not compile properly with recent version of g++. It compiles properly with g++ version 4.4. Hence, you may need to pass the path to `g++` version 4.4 to the configure command line. For example:
-
-```Shell
-./configure --enable-swig --enable-perl-binding CXX=g++-4.4
-```
 
 Examples
 ========
@@ -71,7 +26,7 @@ import jellyfish
 
 mf = jellyfish.ReadMerFile(sys.argv[1])
 for mer, count in mf:
-    print(mer, " ", count)
+    print mer, count
 ```
 
 ----
@@ -113,7 +68,7 @@ qf = jellyfish.QueryMerFile(sys.argv.[1])
 for m in sys.argv[2:]:
     mer = jellyfish.MerDNA(m)
     mer.canonicalize()
-    print(mer, " ", qf[mer])
+    print mer, qf[mer]
 ```
 
 ----
@@ -136,9 +91,53 @@ use jellyfish;
 
 my $qf = jellyfish::QueryMerFile->new(shift(@ARGV));
 foreach my $m (@ARGV) {
-  my $mer = Jellyfish::MerDNA->new($m);
+  my $mer = jellyfish::MerDNA->new($m);
   $mer->canonicalize;
   print($mer, " ", $qf->get($mer), "\n");
 }
 ```
 ----
+
+jellyfish count
+===============
+
+The following will parse all the 30-mers in a string (`str` in the examples) and store them in a hash counter. In the following code, by replacing `string_mers` by `string_canonicals`, one gets only the canonical mers.
+
+----
+##### Ruby
+```Ruby
+require 'jellyfish'
+
+Jellyfish::MerDNA::k(30)
+h = Jellyfish::HashCounter.new(1024, 5)
+str.mers { |m|
+  h.add(m, 1)
+}
+```
+
+----
+##### Python
+```Python
+import jellyfish
+
+jellyfish.MerDNA.k(30)
+h = jellyfish.HashCounter(1024, 5)
+mers = jellyfish.string_mers(str)
+for m in mers:
+    h.add(m, 1)
+```
+
+---
+##### Perl
+```Perl
+use jellyfish;
+
+jellyfish::MerDNA::k(30);
+my $h = jellyfish::HashCounter->new(1024, 5);
+my $mers = jellyfish::string_mers($str);
+while($mers->next_mer) {
+  $h->add($mers->mer, 1);
+}
+```
+
+The arguments to the `HashCounter` constructor are the initial size of the hash and the number of bits in the value field (call it `b`). Note that the number `b` of bits in the value field does not create an upper limit on the value of the count. For best performance, `b` should be chosen so that most k-mers in the hash have a count less than 2^b (2 to the power of b).
diff --git a/swig/jellyfish.i b/swig/jellyfish.i
index 167a0be..2a916ec 100644
--- a/swig/jellyfish.i
+++ b/swig/jellyfish.i
@@ -31,3 +31,4 @@
 %include "mer_file.i"
 %include "hash_counter.i"
 %include "hash_set.i"
+%include "string_mers.i"
diff --git a/swig/perl5/t/test_string_mers.t b/swig/perl5/t/test_string_mers.t
new file mode 100644
index 0000000..2f85a00
--- /dev/null
+++ b/swig/perl5/t/test_string_mers.t
@@ -0,0 +1,30 @@
+use strict;
+use warnings;
+use Test::More;
+
+require_ok('jellyfish');
+jellyfish::MerDNA::k(int(rand(100)) + 10);
+
+my @bases = ("A", "C", "G", "T", "a", "c", "g", "t");
+my $str;
+for(my $i = 0; $i < 1000; $i++) {
+  $str .= $bases[int(rand(8))];
+}
+
+my $count = 0;
+my $m2 = jellyfish::MerDNA->new;
+my $mers = jellyfish::string_mers($str);
+my $good_mers = 1;
+my $good_strs = 1;
+while($mers->next_mer) {
+  my $mstr = uc(substr($str, $count, jellyfish::MerDNA::k()));
+  $m2->set($mstr);
+  $good_strs &&= $mers->mer == $m2;
+  $good_mers &&= $mers->mer eq $mstr;
+  $count++;
+}
+ok($good_mers, "Mers equal");
+ok($good_strs, "Strs equal");
+ok($count == length($str) - jellyfish::MerDNA::k() + 1, "Number of mers");
+
+done_testing;
diff --git a/swig/python/test_string_mers.py b/swig/python/test_string_mers.py
new file mode 100644
index 0000000..bf7d6b1
--- /dev/null
+++ b/swig/python/test_string_mers.py
@@ -0,0 +1,39 @@
+import unittest
+import sys
+import random
+import jellyfish
+
+class TestStringMers(unittest.TestCase):
+    def setUp(self):
+        bases = "ACGTacgt"
+        self.str = ''.join(random.choice(bases) for _ in range(1000))
+        self.k = random.randint(10, 110)
+        jellyfish.MerDNA.k(self.k)
+
+    def test_all_mers(self):
+        count = 0
+        good = True
+        mers = jellyfish.string_mers(self.str)
+        for m in mers:
+            m2 = jellyfish.MerDNA(self.str[count:count+self.k])
+            good = good and m == m2
+            count += 1
+        self.assertTrue(good)
+        self.assertEqual(len(self.str) - self.k + 1, count)
+
+    def test_canonical_mers(self):
+        good = True
+        mers = jellyfish.string_canonicals(self.str)
+        for count, m in enumerate(mers):
+            m2 = jellyfish.MerDNA(self.str[count:count+self.k])
+            rm2 = m2.get_reverse_complement()
+            good = good and (m == m2 or m == rm2)
+            good = good and (not (m > m2)) and (not (m > rm2))
+            # count += 1
+        self.assertTrue(good)
+        self.assertEqual(len(self.str) - self.k + 0, count)
+        
+
+if __name__ == '__main__':
+    data = sys.argv.pop(1)
+    unittest.main()
diff --git a/swig/ruby/test_string_mers.rb b/swig/ruby/test_string_mers.rb
new file mode 100644
index 0000000..05c8190
--- /dev/null
+++ b/swig/ruby/test_string_mers.rb
@@ -0,0 +1,36 @@
+require 'minitest/autorun'
+require 'jellyfish'
+
+class TestStringMers < MiniTest::Unit::TestCase
+  def setup
+    bases = "ACGTacgt"
+    @str = (0..1000).map { bases[rand(bases.size())] }.join("")
+    Jellyfish::MerDNA::k(rand(100) + 10)
+  end
+
+  def test_all_mers
+    count = 0
+    m2 = Jellyfish::MerDNA.new
+
+    @str.mers.each_with_index { |m, i|
+      assert_equal m.to_s, @str[i, Jellyfish::MerDNA::k()].upcase
+      m2.set @str[i, Jellyfish::MerDNA::k()]
+      assert_equal m2, m
+      count += 1
+    }
+    assert_equal @str.size - Jellyfish::MerDNA::k() + 1, count
+  end
+
+  def test_canonical_mers
+    count = 0
+    m2 = Jellyfish::MerDNA.new
+    @str.canonicals { |m|
+      m2.set @str[count, Jellyfish::MerDNA::k()]
+      cm2 = m2.get_reverse_complement
+      assert(m2 == m || m == cm2)
+      assert(!(m > m2) && !(m > cm2));
+      count += 1
+    }      
+    assert_equal @str.size - Jellyfish::MerDNA::k() + 1, count
+  end
+end
diff --git a/swig/string_mers.i b/swig/string_mers.i
index 9a1f49e..064ee06 100644
--- a/swig/string_mers.i
+++ b/swig/string_mers.i
@@ -1,6 +1,143 @@
 /****************************************/
 /* Iterator of all the mers in a string */
 /****************************************/
+#ifdef SWIGPYTHON
+%exception __next__ {
+      $action;
+      if(!result) {
+        PyErr_SetString(PyExc_StopIteration, "Done");
+        SWIG_fail;
+      }
+    }
+%exception next {
+      $action;
+      if(!result) {
+        PyErr_SetString(PyExc_StopIteration, "Done");
+        SWIG_fail;
+      }
+    }
+#endif
+
 %{
+  class StringMers {
+    const char*       m_current;
+    const char* const m_last;
+    const bool        m_canonical;
+    MerDNA            m_m, m_rcm;
+    unsigned int      m_filled;
+
+  public:
+    StringMers(const char* str, int len, bool canonical)
+      : m_current(str)
+      , m_last(str + len)
+      , m_canonical(canonical)
+      , m_filled(0)
+    { }
+
+    bool next_mer() {
+      if(m_current == m_last)
+        return false;
+
+      do {
+        int code = jellyfish::mer_dna::code(*m_current);
+        ++m_current;
+        if(code >= 0) {
+          m_m.shift_left(code);
+          if(m_canonical)
+            m_rcm.shift_right(m_rcm.complement(code));
+          m_filled = std::min(m_filled + 1, m_m.k());
+        } else
+          m_filled = 0;
+      } while(m_filled < m_m.k() && m_current != m_last);
+      return m_filled == m_m.k();
+    }
+
+    const MerDNA* mer() const { return !m_canonical || m_m < m_rcm ? &m_m : &m_rcm; }
+
+    const MerDNA* next_mer__() {
+      return next_mer() ? mer() : nullptr;
+    }
+
+
+#ifdef SWIGRUBY
+    void each() {
+      if(!rb_block_given_p()) return;
+      while(next_mer()) {
+        auto m = SWIG_NewPointerObj(const_cast<MerDNA*>(mer()), SWIGTYPE_p_MerDNA, 0);
+        rb_yield(m);
+      }
+    }
+#endif
+
+#ifdef SWIGPYTHON
+    StringMers* __iter__() { return this; }
+    const MerDNA* __next__() { return next_mer__(); }
+    const MerDNA* next() { return next_mer__(); }
+#endif
+
+#ifdef SWIGPERL
+    const MerDNA* each() { return next_mer__(); }
+#endif
+
+  };
 
+  StringMers* string_mers(char* str, int length) { return new StringMers(str, length, false); }
+  StringMers* string_canonicals(char* str, int length) { return new StringMers(str, length, true); }
 %}
+
+%apply (char *STRING, int LENGTH) { (char* str, int length) };
+%newobject string_mers;
+%newobject string_canonicals;
+%feature("autodoc", "Get an iterator to the mers in the string");
+StringMers* string_mers(char* str, int length);
+%feature("autodoc", "Get an iterator to the canonical mers in the string");
+StringMers* string_canonicals(char* str, int length);
+
+#ifdef SWIGRUBY
+%mixin StringMers "Enumerable";
+%init %{
+  rb_eval_string("class String\n"
+                 "  def mers(&b); it = Jellyfish::string_mers(self); b ? it.each(&b) : it; end\n"
+                 "  def canonicals(&b); it = Jellyfish::string_canonicals(self, &b); b ? it.each(&b) : it; end\n"
+                 "end");
+%}
+#endif
+
+/* #ifdef SWIGPERL */
+/* // For perl, return an empty array at end of iterator */
+/* %typemap(out) const MerDNA* { */
+/*   if($1) { */
+/*     SWIG_Object m = SWIG_NewPointerObj(const_cast<MerDNA*>($1), SWIGTYPE_p_MerDNA, 0); */
+/*     %append_output(m); */
+/*   } */
+/*  } */
+/* #endif */
+
+
+%feature("autodoc", "Extract k-mers from a sequence string");
+class StringMers {
+public:
+  %feature("autodoc", "Create a k-mers parser from a string. Pass true as a second argument to get canonical mers");
+  StringMers(const char* str, int len, bool canonical);
+
+  %feature("autodoc", "Get the next mer. Return false if reached the end of the string.");
+  bool next_mer();
+
+  %feature("autodoc", "Return the current mer (or its canonical representation)");
+  const MerDNA* mer() const;
+
+#ifdef SWIGRUBY
+  %feature("autodoc", "Iterate through all the mers in the string");
+  void each();
+#endif
+
+#ifdef SWIGPYTHON
+  StringMers* __iter__();
+  const MerDNA* __next__();
+  const MerDNA* next();
+#endif
+
+#ifdef SWIGPERL
+  MerDNA* each();
+#endif
+};
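
Note on `StringMers` above: `next_mer()` shifts one base at a time into the current mer, resets the fill count when it meets a character with no valid code, and reports a mer once `k` valid bases have accumulated. In canonical mode `mer()` returns the smaller of the mer and its reverse complement. The sketch below reproduces that sliding window on plain Python strings. It does not use the jellyfish module. The generator name reuses `string_mers` for readability but takes `k` as an argument, and it assumes that comparing upper-case strings orders mers the same way as `MerDNA` comparison.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def reverse_complement(mer):
    """Reverse complement of an upper-case ACGT string."""
    return "".join(COMPLEMENT[base] for base in reversed(mer))


def string_mers(sequence, k, canonical=False):
    """Yield the k-mers of `sequence` (k >= 1), upper-cased.

    Windows containing a character other than A, C, G or T are skipped.
    """
    window = ""
    for char in sequence:
        base = char.upper()
        if base in COMPLEMENT:
            window = (window + base)[-k:]  # shift the new base in, keep at most k
        else:
            window = ""  # invalid character: start filling again
        if len(window) == k:
            if canonical:
                yield min(window, reverse_complement(window))
            else:
                yield window


all_mers = list(string_mers("ACGTNacgt", 3))
canonical_mers = list(string_mers("ACGTNacgt", 3, canonical=True))
print(all_mers)
print(canonical_mers)
```

For a string of length `n` made only of valid bases, the generator yields `n - k + 1` mers, which is the count the new test scripts assert.
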
diff --git a/tests/swig_perl.sh b/tests/swig_perl.sh
index 5aa9252..0223c37 100644
--- a/tests/swig_perl.sh
+++ b/tests/swig_perl.sh
@@ -11,7 +11,7 @@ $JF count -m $K -s 10M -t $nCPUs -C -o ${pref}.jf seq1m_$I.fa
 $JF dump -c ${pref}.jf > ${pref}.dump
 $JF histo ${pref}.jf > ${pref}.histo
 
-for i in test_mer_file.t test_hash_counter.t; do
+for i in test_mer_file.t test_hash_counter.t test_string_mers.t; do
     echo Test $i
     $PERL "-I$LOADPATH/.libs" "-I$LOADPATH" "-I$SRCDIR/swig/perl5" "$SRCDIR/swig/perl5/t/$i" .
 done
diff --git a/tests/swig_python.sh b/tests/swig_python.sh
index e70bba5..bf8e6ae 100644
--- a/tests/swig_python.sh
+++ b/tests/swig_python.sh
@@ -11,7 +11,7 @@ $JF count -m $K -s 10M -t $nCPUs -C -o ${pref}.jf seq1m_$I.fa
 $JF dump -c ${pref}.jf > ${pref}.dump
 $JF histo ${pref}.jf > ${pref}.histo
 
-for i in test_mer_file.py test_hash_counter.py; do
+for i in test_mer_file.py test_hash_counter.py test_string_mers.py; do
     echo Test $i
     $PYTHON "$SRCDIR/swig/python/$i" .
 done
diff --git a/tests/swig_ruby.sh b/tests/swig_ruby.sh
index 0644fcc..cc4d866 100644
--- a/tests/swig_ruby.sh
+++ b/tests/swig_ruby.sh
@@ -13,7 +13,7 @@ $JF histo ${pref}.jf > ${pref}.histo
 
 
 
-for i in test_mer_file.rb test_hash_counter.rb; do
+for i in test_mer_file.rb test_hash_counter.rb test_string_mers.rb; do
     echo Test $i
     $RUBY "-I$LOADPATH" "$SRCDIR/swig/ruby/$i" .
 done
diff --git a/unit_tests/test_mer_dna.cc b/unit_tests/test_mer_dna.cc
index a6f27b8..9415219 100644
--- a/unit_tests/test_mer_dna.cc
+++ b/unit_tests/test_mer_dna.cc
@@ -461,7 +461,8 @@ TYPED_TEST(MerDNA, GetBits) {
   for(unsigned int i = 0; i < 20; ++i) {
     long int start   = random() % (this->GetParam().size() - 1);
     long int max_len =
-      std::min(this->GetParam().size() - start, 8 * sizeof(typename TypeParam::Type::base_type));
+      std::min((long int)(this->GetParam().size() - start),
+               (long int)(8 * sizeof(typename TypeParam::Type::base_type)));
     long int len     = (random() % (max_len - 1)) + 1;
 
     // Get bits by right-shifting
diff --git a/unit_tests/test_stdio_filebuf.cc b/unit_tests/test_stdio_filebuf.cc
index 078003a..66c3b5a 100644
--- a/unit_tests/test_stdio_filebuf.cc
+++ b/unit_tests/test_stdio_filebuf.cc
@@ -50,6 +50,7 @@ TEST(StdioFileBuf, Read) {
 
     have_read += expect_read;
   }
-  EXPECT_TRUE(fd_stream.eof());
+  EXPECT_TRUE(fd_stream.eof() || fd_stream.peek() == EOF);
+  EXPECT_TRUE(file_stream.eof() || file_stream.peek() == EOF);
 }
 }

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/jellyfish.git


