[med-svn] [Git][med-team/minia][master] 8 commits: debhelper-compat 12

Andreas Tille gitlab at salsa.debian.org
Thu Dec 5 14:26:23 GMT 2019



Andreas Tille pushed to branch master at Debian Med / minia


Commits:
8b9d8b22 by Andreas Tille at 2019-12-05T14:05:13Z
debhelper-compat 12

- - - - -
2a4e1bfa by Andreas Tille at 2019-12-05T14:05:20Z
Standards-Version: 4.4.1

- - - - -
8b44244e by Andreas Tille at 2019-12-05T14:05:25Z
Trim trailing whitespace.

Fixes lintian: file-contains-trailing-whitespace
See https://lintian.debian.org/tags/file-contains-trailing-whitespace.html for more details.

- - - - -
9a9b6c55 by Andreas Tille at 2019-12-05T14:05:28Z
Set upstream metadata fields: Repository, Repository-Browse.
- - - - -
b7630428 by Andreas Tille at 2019-12-05T14:20:54Z
New upstream git commit since released version is not compatible with gatb-core 1.4.1+git20191130.664696c+dfsg

- - - - -
b1b6ea2d by Andreas Tille at 2019-12-05T14:21:22Z
New upstream version 3.2.1+git20191130.5b131b9
- - - - -
475d757c by Andreas Tille at 2019-12-05T14:21:24Z
Update upstream source from tag 'upstream/3.2.1+git20191130.5b131b9'

Update to upstream version '3.2.1+git20191130.5b131b9'
with Debian dir e6d9509a26fe07cf3f036bc87e3455760f03e1f3
- - - - -
00301a5a by Andreas Tille at 2019-12-05T14:26:01Z
Not uploaded!  Just verified that building against gatb-core 1.4.1+git20191130.664696c+dfsg is OK!

- - - - -


11 changed files:

- CMakeLists.txt
- README.md
- debian/changelog
- − debian/compat
- debian/control
- debian/upstream/metadata
- debian/watch
- merci/merci.cpp
- src/Minia.cpp
- test/ERR039477.md5
- + test/bubble_covmult0.5.fa


Changes:

=====================================
CMakeLists.txt
=====================================
@@ -8,7 +8,7 @@ cmake_minimum_required (VERSION 2.6)
 # The default version number is the latest official build
 SET (gatb-tool_VERSION_MAJOR 3)
 SET (gatb-tool_VERSION_MINOR 2)
-SET (gatb-tool_VERSION_PATCH 0)
+SET (gatb-tool_VERSION_PATCH 1)
 
 # But, it is possible to define another release number during a local build
 IF (DEFINED MAJOR)
@@ -84,6 +84,8 @@ link_directories (${gatb-core-extra-libraries-path})
 set (PROGRAM_SOURCE_DIR ${PROJECT_SOURCE_DIR}/src)
 set(CMAKE_RUNTIME_OUTPUT_DIRECTORY ${CMAKE_BINARY_DIR}/bin)
 
+cmake_policy(SET CMP0009 NEW) # fixes cmake complaining about symlinks
+
 include_directories (${PROGRAM_SOURCE_DIR})
 file (GLOB_RECURSE  ProjectFiles  ${PROGRAM_SOURCE_DIR}/*)
 add_executable(${PROJECT_NAME} ${ProjectFiles})


=====================================
README.md
=====================================
@@ -2,24 +2,27 @@
 
 [![License](http://img.shields.io/:license-affero-blue.svg)](http://www.gnu.org/licenses/agpl-3.0.en.html)
 
+<!---
 | **Linux** | **Mac OSX** |
 |-----------|-------------|
 [![Build Status](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-debian7-64bits-gcc-4.7/badge/icon)](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-debian7-64bits-gcc-4.7/) | [![Build Status](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-macos-10.9.5-gcc-4.2.1/badge/icon)](https://ci.inria.fr/gatb-core/view/Minia/job/tool-minia-build-macos-10.9.5-gcc-4.2.1/)
+--->
 
+# Before continuing..
 
-# What is Minia ?
+If you are looking to do high-quality genome or metagenome assemblies, please go to https://github.com/GATB/gatb-minia-pipeline instead. It is a pipeline built on top of Minia that uses a multi-k assembly approach similar to metaSPAdes and MEGAHIT.
 
-Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Minia produces results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet).
+# Introduction
 
-# Getting the latest source code
+Minia is a short-read assembler based on a de Bruijn graph, capable of assembling a human genome on a desktop computer in a day. The output of Minia is a set of contigs. Back when it was released, Minia produced results of similar contiguity and accuracy to other de Bruijn assemblers (e.g. Velvet). Genome assemblers have since evolved (2015 onwards); to obtain high contiguity, see the previous section.
 
-## Requirements
+# Getting the latest source code
 
-CMake 2.6+; see http://www.cmake.org/cmake/resources/software.html
+## Instructions
 
-C++11 compiler; (g++ version>=4.7 (Linux), clang version>=4.3 (Mac OSX))
+It is recommended to download the latest binary release (Linux or OSX) from here: https://github.com/GATB/minia/releases
 
-## Instructions
+Otherwise, Minia may be compiled from sources as follows:
 
     # get a local copy of minia source code
     git clone --recursive https://github.com/GATB/minia.git
@@ -28,6 +31,13 @@ C++11 compiler; (g++ version>=4.7 (Linux), clang version>=4.3 (Mac OSX))
     cd minia
     sh INSTALL
 
+## Requirements
+
+CMake 3.10+; see http://www.cmake.org/cmake/resources/software.html
+
+C++11 compiler; (g++ version>=4.7 (Linux), clang version>=4.3 (Mac OSX))
+
+
 # User manual	 
 
 Type `minia` without any arguments for usage instructions.


=====================================
debian/changelog
=====================================
@@ -1,12 +1,21 @@
-minia (3.2.1-2) UNRELEASED; urgency=medium
+minia (3.2.1+git20191130.5b131b9-1) UNRELEASED; urgency=medium
+
+  Not uploaded!  Just verified that building against
+    gatb-core 1.4.1+git20191130.664696c+dfsg is OK!
 
   [ Andreas Tille ]
+  * New upstream git commit since released version is not compatible
+    with gatb-core 1.4.1+git20191130.664696c+dfsg
   * Test-Depends: bandage
+  * debhelper-compat 12
+  * Standards-Version: 4.4.1
+  * Trim trailing whitespace.
+  * Set upstream metadata fields: Repository, Repository-Browse.
 
   [ Andrius Merkys ]
   * Adding missing copyright details for files under thirdparty/contig2fastg/.
 
- -- Andrius Merkys <merkys at debian.org>  Wed, 13 Nov 2019 07:06:30 -0500
+ -- Andreas Tille <tille at debian.org>  Thu, 05 Dec 2019 15:25:30 +0100
 
 minia (3.2.1-1) unstable; urgency=medium
 
@@ -88,4 +97,3 @@ minia (1.6067+dfsg-1) unstable; urgency=medium
   * First debian package (Closes: #735158).
 
  -- Olivier Sallou <osallou at debian.org>  Sat, 21 Dec 2013 16:55:29 +0100
-


=====================================
debian/compat deleted
=====================================
@@ -1 +0,0 @@
-12


=====================================
debian/control
=====================================
@@ -4,14 +4,14 @@ Uploaders: Olivier Sallou <osallou at debian.org>,
            Andreas Tille <tille at debian.org>
 Section: science
 Priority: optional
-Build-Depends: debhelper (>= 12~),
+Build-Depends: debhelper-compat (= 12),
                cmake,
                bc,
                zlib1g-dev,
                libboost-dev,
-               libgatbcore-dev (>= 1.4.1+git20181225.44d5a44~),
+               libgatbcore-dev,
                libhdf5-dev
-Standards-Version: 4.3.0
+Standards-Version: 4.4.1
 Vcs-Browser: https://salsa.debian.org/med-team/minia
 Vcs-Git: https://salsa.debian.org/med-team/minia.git
 Homepage: http://minia.genouest.org/


=====================================
debian/upstream/metadata
=====================================
@@ -1,21 +1,23 @@
 Reference:
- Author: Rayan Chikhi and Guillaume Rizk
- Title: >
-  Space-Efficient and Exact de Bruijn Graph Representation
-  Based on a Bloom Filter.
- Journal: Algorithms for Molecular Biology
- Year: 2013
- Volume: 8
- Number: 1
- Pages: 22
- DOI: 10.1186/1748-7188-8-22
- PMID: 24040893
- URL: http://www.almob.org/content/8/1/22
- eprint: http://minia.genouest.org/files/minia.pdf
+  Author: Rayan Chikhi and Guillaume Rizk
+  Title: >
+    Space-Efficient and Exact de Bruijn Graph Representation
+    Based on a Bloom Filter.
+  Journal: Algorithms for Molecular Biology
+  Year: 2013
+  Volume: 8
+  Number: 1
+  Pages: 22
+  DOI: 10.1186/1748-7188-8-22
+  PMID: 24040893
+  URL: http://www.almob.org/content/8/1/22
+  eprint: http://minia.genouest.org/files/minia.pdf
 Registry:
- - Name: OMICtools
-   Entry: OMICS_00022
- - Name: bio.tools
-   Entry: minia
- - Name: SciCrunch
-   Entry: SCR_004986
+- Name: OMICtools
+  Entry: OMICS_00022
+- Name: bio.tools
+  Entry: minia
+- Name: SciCrunch
+  Entry: SCR_004986
+Repository: https://github.com/GATB/minia
+Repository-Browse: https://github.com/GATB/minia


=====================================
debian/watch
=====================================
@@ -1,3 +1,8 @@
 version=4
 
-https://github.com/GATB/minia/releases .*/archive/v(\d[\d.-]+)\.(?:tar(?:\.gz|\.bz2)?|tgz)
+opts="mode=git,pretty=3.2.1+git%cd.%h" \
+    https://github.com/GATB/minia.git HEAD
+
+# Released version is not compatible with gatb-core 1.4.1+git20191130.664696c+dfsg
+# So stick to latest Git commit for the moment
+#  https://github.com/GATB/minia/releases .*/archive/v(\d[\d.-]+)\.(?:tar(?:\.gz|\.bz2)?|tgz)
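Note on the new watch rule: the version 3.2.1+git20191130.5b131b9 seen in the commit list matches the pattern pretty=3.2.1+git%cd.%h, where %cd stands for the commit date and %h for the abbreviated commit hash. The sketch below only illustrates that composition. It assumes the date is rendered as YYYYMMDD, which is what the resulting version string suggests; snapshot_version is a hypothetical helper, not something uscan or minia provides.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Hypothetical helper, for illustration only: builds a snapshot version of
// the form <base>+git<YYYYMMDD>.<short hash>, mirroring the watch pattern
// "3.2.1+git%cd.%h" (assumption: %cd is formatted as YYYYMMDD).
std::string snapshot_version(const std::string& base,
                             int year, int month, int day,
                             const std::string& short_hash)
{
    assert(month >= 1 && month <= 12);
    assert(day >= 1 && day <= 31);

    char date[40];
    std::snprintf(date, sizeof date, "%04d%02d%02d", year, month, day);
    return base + "+git" + date + "." + short_hash;
}
```

Calling snapshot_version("3.2.1", 2019, 11, 30, "5b131b9") yields the upstream version imported in this push.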


=====================================
merci/merci.cpp
=====================================
@@ -373,6 +373,65 @@ static bool maybe_merge(uint64_t packed, connections_index_t &connections_index,
     return true; 
 }
 
+        
+static void
+parse_unitig_header(string header, float& mean_abundance)
+{
+    bool debug = false;
+    if (debug) std::cout << "parsing unitig links for " << header << std::endl;
+    std::stringstream stream(header);
+    while(1) {
+        string tok;
+        stream >> tok;
+        if(!stream)
+            break;
+
+        if (tok.size() < 3)
+            // that's the id, skip it
+            continue;
+
+        string field = tok.substr(0,2);
+
+		if (field == "km")
+		{
+			mean_abundance = atof(tok.substr(tok.find_last_of(':')+1).c_str());
+			//std::cout << "unitig " << header << " mean abundance " << mean_abundance << std::endl;
+		}
+	}
+}
+
+
+void renumber_glue_file(string glue_filename, uint64_t nb_out_tigs)
+{
+    {
+        std::ifstream infile(glue_filename);
+        std::ofstream outfile(glue_filename+".tmp");
+        std::string line;
+        uint64_t counter = 1;
+        while (std::getline(infile, line))
+        {
+            if (line[0] == '>')
+            {
+                size_t space_pos = line.find(' ');
+                /* // yolo
+                if (space_pos >= line.size())
+                {
+                    std::cout << "error: no space in this glue file header (" << line << ") contact a developer." << std::endl;
+                    exit(1);
+                }
+                */
+                auto end_header = line.substr(space_pos);
+                string new_header = ">" + std::to_string(nb_out_tigs+counter) + end_header;
+                outfile << new_header << std::endl;
+                counter++;
+            }
+            else
+                outfile << line << std::endl;
+        }
+    } // closes files
+    file_copy(glue_filename+".tmp",glue_filename);
+    System::file().remove (glue_filename+".tmp");
+}
 
 static void 
 extend_assembly_with_connections(const string assembly, int k, int nb_threads, bool verbose, connections_index_t &connections_index, connections_t &connections, BankFasta &out, BankFasta &glue)
@@ -415,6 +474,13 @@ extend_assembly_with_connections(const string assembly, int k, int nb_threads, b
         s.getData().setRef ((char*)seq.c_str(), seq.size());
         s._comment = string(lmark?"1":"0")+string(rmark?"1":"0"); //We set the sequence comment.
         s._comment += " ";
+
+        // add coverage information 
+		float mean_abundance;
+		parse_unitig_header(comment,mean_abundance);
+		uint nb_kmers = seq.size() - k + 1;
+		for (uint i = 0; i < nb_kmers; i++)
+			s._comment += std::to_string((uint)mean_abundance) + " ";
         
         if (lmark || rmark)
             glue.insert(s); 
@@ -438,7 +504,8 @@ void merci(int k, string reads, string assembly, int nb_threads, bool verbose)
     string linked_assembly = assembly + ".linked";
     file_copy(assembly, linked_assembly);
     uint64_t nb_tigs = 0;
-    link_tigs<span>( linked_assembly, k, nb_threads, nb_tigs, verbose);
+    bool renumber_unitigs = true; // let's allow the input to be anything. Here it doesn't matter much. We renumber at the end anyway
+    link_tigs<span>( linked_assembly, k, nb_threads, nb_tigs, verbose, renumber_unitigs);
 
     // real trick here
     // tigs of length exactly k are annoying, they need to be handled carefully with UNITIG_BOTH positions
@@ -463,11 +530,20 @@ void merci(int k, string reads, string assembly, int nb_threads, bool verbose)
     glue.flush();
    
     // glue what needs to be glued. magic, we're re-using bcalm code
-    bglue<span> (nullptr /*no storage*/, assembly+".glue", k, 0, nb_threads, verbose);
+    bglue<span> (nullptr /*no storage*/, assembly+".glue", k, 0, nb_threads, false, verbose);
+  
+    // renumber the .glue file just to avoid ID collision with .merci file
+    renumber_glue_file(assembly+".glue", nb_tigs );
     
     // append glued to merci
     out.flush();
     file_append(assembly+".merci", assembly+".glue");
+
+    // bglue drops links, so let's recreate them
+    k += 1;
+    file_copy(assembly+".merci", assembly+".merci.b4link");
+    renumber_unitigs = true; // here it's absolutely mandatory to renumber if we want the output to be processed by minia
+    link_tigs<span>( assembly+".merci", k, nb_threads, nb_tigs, verbose, false, renumber_unitigs);
 }
 
 class Merci : public gatb::core::tools::misc::impl::Tool
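
Reviewer's note on the two helpers added in merci/merci.cpp: in renumber_glue_file the check for a missing space is commented out, so a header line without any space would make line.substr(space_pos) throw std::out_of_range, because std::string::npos is past the end of the string. A minimal sketch of the same header handling with that case guarded is given below. The function names are hypothetical and not part of minia, and the example header layout ("id LN:i:... KC:i:... km:f:...") is an assumption based on the "km" field parsed in the diff.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <sstream>
#include <string>

// Hypothetical helper: returns the numeric value of the "km" field of a
// unitig header, or `fallback` when the header carries no such field.
float mean_abundance_from_header(const std::string& header, float fallback = 0.0f)
{
    std::istringstream stream(header);
    std::string tok;
    while (stream >> tok) {
        if (tok.size() < 3)
            continue; // short tokens (e.g. the id) are skipped, as in the diff
        if (tok.compare(0, 2, "km") != 0)
            continue;
        std::size_t colon = tok.find_last_of(':');
        if (colon == std::string::npos)
            continue; // malformed field, keep looking
        return std::strtof(tok.c_str() + colon + 1, nullptr);
    }
    return fallback;
}

// Hypothetical helper: rewrites ">old_id rest" as ">new_id rest".
// A header without a space simply has no trailing fields; nothing is thrown.
std::string renumber_header(const std::string& line, std::uint64_t new_id)
{
    assert(!line.empty() && line[0] == '>');
    std::size_t space_pos = line.find(' ');
    std::string rest = (space_pos == std::string::npos) ? std::string()
                                                        : line.substr(space_pos);
    return ">" + std::to_string(new_id) + rest;
}
```

For example, renumber_header(">5 LN:i:31 km:f:2.0", 12) gives ">12 LN:i:31 km:f:2.0", and renumber_header(">5", 12) gives ">12". Returning a fallback from the abundance parser also avoids using an uninitialised value when no "km" field is present.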


=====================================
src/Minia.cpp
=====================================
@@ -154,8 +154,7 @@ struct MiniaFunctor  {  void operator ()  (Parameter parameter)
     // link contigs
     uint nb_threads = 1;  // doesn't matter because for now link_tigs is single-threaded
     bool verbose = true;
-    link_tigs<span>(output, minia.k, nb_threads, minia.nbContigs, verbose);
-
+    link_tigs<span>(output, minia.k, nb_threads, minia.nbContigs, verbose, false);
 
     /** We gather some statistics. */
     minia.getInfo()->add (1, minia.getTimeInfo().getProperties("time"));
@@ -274,8 +273,8 @@ string Minia::assemble (/*const, removed because Simplifications isn't const any
 			graphSimplifications._bulgeLen_kAdd = getInput()->getDouble("-bulge-len-kadd");
 		if (getParser()->saw("-bulge-altpath-kadd"))
 			graphSimplifications._bulgeAltPath_kAdd = getInput()->getDouble("-bulge-altpath-kadd");
-		if (getParser()->saw("-bulge-altpath-covMult"))
-			graphSimplifications._bulgeAltPath_covMult = getInput()->getDouble("-bulge-altpath-covMult");
+		if (getParser()->saw("-bulge-altpath-covmult"))
+			graphSimplifications._bulgeAltPath_covMult = getInput()->getDouble("-bulge-altpath-covmult");
 
 		if (getParser()->saw("-ec-len-kmult"))
 			graphSimplifications._ecLen_kMult = getInput()->getDouble("-ec-len-kmult");


=====================================
test/ERR039477.md5
=====================================
@@ -1,3 +1,3 @@
-3732560f98d63897d2b7a122938d7a42 # osx CI
-037b126f9e37db1db55d23eadc40477d # gcc 7 blok-bok
-c6e5a2cf1b9c6246129ae4263da749cc # debian CI
+e92d66d1e5b7450e6f6d8f6cc1de24bf # osx CI
+3192031d3491f3488a210419c50b9d4d # gcc 7 blok-bok
+dc556ec0e91c9aad6c1a68e48a2d8456 # debian CI


=====================================
test/bubble_covmult0.5.fa
=====================================
@@ -0,0 +1,16 @@
+>works well for k=21; part of genome10K.fasta
+CATCGATGCGAGACGCCTGTCGCGGGGAATTGTGGGGCGGACCACGCTCTGGCTAACGAGCTACCGTTTCCTTTAACCTGCCAGACGGTGACCAGGGCCGTTCGGCGTTGCATCGAGCGGTGTCGCTAGCGCAATGCGCAAGATTTTGACATTTACAAGGCAACATTGCAGCGTCCGATGGTCCGGTGGCCTCCAGATAGTGTCCAGTCGCTCTAACTGTATGGAGACCATAGGCATTTACCTTATTCTCATCGCCACGCCCCAAGATCTTTAGGACCCAGCATTCCTTTAACCACTAACATAACGCGTGTCATCTAGTTCAACAACC
+>that's the bubble  coverage 4
+TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
+>that's the bubble 
+TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
+>that's the bubble 
+TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
+>that's the bubble 
+TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT
+>that's the bubble path 2, coverage 2
+TGTCATCTAGTTCAACAACCAAAAAAACGACTCTTGCGCTCGGATGT
+>that's the bubble  
+TGTCATCTAGTTCAACAACCAAAAAAACGACTCTTGCGCTCGGATGT
+>remaining part
+CGACTCTTGCGCTCGGATGTCCGCAATGGGTTATCCCTATGTTCCGGTAATCTCTCATCTACTAAGCGCCCTAAAGGTCGTATGGTTGGAGGGCGGTTACACACCCTTAAGTACCGAACGATAGAGCACCCGTCTAGGAGGGCGTGCAGGGTCTCCCGCTAGCTAATGGTCACGGCCTCTCTGGGAAAGCTGAACAACGGATGATACCCATACTGCCACTCCAGTACCTGGGCCGCGTGTTGTACGCTGTGTATCTTGAGAGCGTTTCCAGCAGATAGAACAGGATCACATGTACAAA
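
Note on the new test file: the six bubble records consist of four copies of one path and two copies of a second path that differs by a single base, which matches the "coverage 4" and "coverage 2" remarks in the headers. The sketch below counts the copies and computes the ratio of the two. Reading the 0.5 in the file name as this coverage ratio, and relating it to the -bulge-altpath-covmult option, is an assumption; all function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// The two bubble paths, copied from test/bubble_covmult0.5.fa.
const std::string kPath1 = "TGTCATCTAGTTCAACAACCAAAATAACGACTCTTGCGCTCGGATGT";
const std::string kPath2 = "TGTCATCTAGTTCAACAACCAAAAAAACGACTCTTGCGCTCGGATGT";

// The six bubble records of the test file (flanking sequences omitted).
std::string bubble_records()
{
    std::string text;
    text += ">that's the bubble  coverage 4\n" + kPath1 + "\n";
    for (int i = 0; i < 3; ++i)
        text += ">that's the bubble \n" + kPath1 + "\n";
    text += ">that's the bubble path 2, coverage 2\n" + kPath2 + "\n";
    text += ">that's the bubble  \n" + kPath2 + "\n";
    return text;
}

// Counts occurrences of each sequence in FASTA text that has exactly one
// sequence line per record (true for the records above).
std::map<std::string, int> count_sequences(const std::string& fasta_text)
{
    std::map<std::string, int> counts;
    std::istringstream in(fasta_text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line[0] == '>')
            continue; // skip blank lines and headers
        ++counts[line];
    }
    return counts;
}

// Number of positions at which two equally long sequences differ.
std::size_t mismatches(const std::string& a, const std::string& b)
{
    assert(a.size() == b.size());
    std::size_t n = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] != b[i])
            ++n;
    return n;
}

// Coverage of the second path relative to the first.
double coverage_ratio(const std::map<std::string, int>& counts)
{
    return static_cast<double>(counts.at(kPath2)) / counts.at(kPath1);
}
```

With these records, count_sequences(bubble_records()) reports 4 copies of the first path and 2 of the second, the paths differ at one position, and coverage_ratio returns 0.5.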



View it on GitLab: https://salsa.debian.org/med-team/minia/compare/7a9b62aad0fbf0aaa06611704615230828ffe9ee...00301a5a7251f91a8cde9807ee2fd8c568e42258
