[med-svn] [Git][med-team/libmaus2][master] 9 commits: routine-update: New upstream version

Étienne Mollier (@emollier) gitlab at salsa.debian.org
Thu Sep 2 18:33:11 BST 2021



Étienne Mollier pushed to branch master at Debian Med / libmaus2


Commits:
5b9566e3 by Étienne Mollier at 2021-08-29T22:10:52+02:00
routine-update: New upstream version

- - - - -
ca1d2846 by Étienne Mollier at 2021-08-29T22:10:53+02:00
New upstream version 2.0.801+dfsg
- - - - -
055b2a61 by Étienne Mollier at 2021-08-29T22:10:59+02:00
Update upstream source from tag 'upstream/2.0.801+dfsg'

Update to upstream version '2.0.801+dfsg'
with Debian dir 894a8b6a171399fe3c7dc619f14a42ba64a500f4
- - - - -
9f989f99 by Étienne Mollier at 2021-08-30T20:55:38+02:00
routine-update: New upstream version

- - - - -
30ae1f9f by Étienne Mollier at 2021-08-30T20:55:39+02:00
New upstream version 2.0.802+dfsg
- - - - -
3c4b8c9e by Étienne Mollier at 2021-08-30T20:55:45+02:00
Update upstream source from tag 'upstream/2.0.802+dfsg'

Update to upstream version '2.0.802+dfsg'
with Debian dir eda50e9aa7959e88cc4c87da93dd753b4db52e78
- - - - -
877df6b7 by Étienne Mollier at 2021-08-30T20:55:51+02:00
Remove duplicate line from changelog.

Changes-By: lintian-brush

- - - - -
1b99e86b by Étienne Mollier at 2021-08-30T21:16:26+02:00
routine-update: Ready to upload to unstable

- - - - -
0cbf640b by Étienne Mollier at 2021-08-30T21:26:02+02:00
update changelog

- - - - -


20 changed files:

- .gitignore
- ChangeLog
- configure.ac
- debian/changelog
- − debian/patches/fix-build-on-non-x86.patch
- − debian/patches/series
- src/Makefile.am
- src/libmaus2/lcs/AlignmentOneAgainstManyAVX2.cpp
- src/libmaus2/lz/BgzfDeflateHeaderFunctions.cpp
- src/libmaus2/parallel/threadpool/ThreadPool.hpp
- src/libmaus2/parallel/threadpool/bam/BamHeaderSeqData.hpp
- src/libmaus2/parallel/threadpool/bgzf/BgzfBlockInfoEnqueDecompressHandler.hpp
- + src/libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.cpp
- + src/libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.hpp
- + src/libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.cpp
- src/libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.hpp
- + src/libmaus2/sorting/RankSearch.hpp
- src/libmaus2/vcf/VCFParser.hpp
- src/test/testThreadPoolCramEncode.cpp
- src/test/testsort.cpp


Changes:

=====================================
.gitignore
=====================================
@@ -44,10 +44,12 @@ src/test[a-zA-Z]*
 */*/.dirstamp
 */*/*/.dirstamp
 */*/*/*/.dirstamp
+*/*/*/*/*/.dirstamp
 */.deps
 */*/.deps
 */*/*/.deps
 */*/*/*/.deps
+*/*/*/*/*/.deps
 log*
 compile_gcc_*.sh
 compile.sh


=====================================
ChangeLog
=====================================
@@ -1,3 +1,32 @@
+libmaus2 (2.0.802-1) unstable; urgency=medium
+
+  * Fix compilation without libdeflate
+
+ -- German Tischler-Höhle <germant at miltenyibiotec.de>  Mon, 30 Aug 2021 10:29:07 +0200
+
+libmaus2 (2.0.801-1) unstable; urgency=medium
+
+  * Do not try to instantiate non-existent context class in AlignmentOneAgainstManyAVX2 if LIBMAUS2_HAVE_ALIGNMENT_ONE_TO_MANY_AVX2 is not set
+  * Add thread pool based sorting interface in testsort
+  * Add ensureDispatcherRegistered method in ThreadPool
+  * Add delayed search merging in testsort
+  * Add more sorting/testing code in testsort.cpp
+  * Add planning code for parallel swapping of non-adjacent blocks in testsort
+  * Make error message more useful in VCFParser
+  * Add timing test for new sorting code
+  * Add some new sorting code in testsort
+  * Add computation of compressBound in BgzfCompressContext
+  * Add BgzfCompressContext class
+  * Reformat BgzfDecompressContext
+  * Replace numerical value by a symbolic constant in BgzfDeflateHeaderFunctions
+  * Construct missing REF_CACHE elements if required and possible in testThreadPoolCramEncode
+  * Use context interface for crc32 in BgzfBlockInfoEnqueDecompressHandler
+  * Add BgzfDecompressContext.cpp in Makefile.am
+  * Add support for libdeflate in BgzfDecompressContext
+  * Add sequencesMissing method in BamHeaderSeqData
+
+ -- German Tischler-Höhle <germant at miltenyibiotec.de>  Fri, 27 Aug 2021 11:03:24 +0200
+
 libmaus2 (2.0.800-1) unstable; urgency=medium
 
   * Rename GetObject to GetCObject to avoid name clash on w32/w64


=====================================
configure.ac
=====================================
@@ -1,5 +1,5 @@
-AC_INIT(libmaus2,2.0.800,[germant at miltenyibiotec.de],[libmaus2],[https://gitlab.com/german.tischler/libmaus2])
-LIBRARY_VERSION=2:800:0
+AC_INIT(libmaus2,2.0.802,[germant at miltenyibiotec.de],[libmaus2],[https://gitlab.com/german.tischler/libmaus2])
+LIBRARY_VERSION=2:802:0
 AC_MSG_NOTICE([Configuring for source in directory ${srcdir}])
 AC_CANONICAL_SYSTEM
 AC_CANONICAL_HOST


=====================================
debian/changelog
=====================================
@@ -1,3 +1,10 @@
+libmaus2 (2.0.802+dfsg-1) unstable; urgency=medium
+
+  * New upstream version
+  * removed fix-build-on-non-x86.patch; bug fixed upstream.
+
+ -- Étienne Mollier <emollier at debian.org>  Mon, 30 Aug 2021 20:57:35 +0200
+
 libmaus2 (2.0.800+dfsg-3) unstable; urgency=medium
 
   * d/control: Breaks: biobambam2 (<< 2.0.183~); testing migration caught that


=====================================
debian/patches/fix-build-on-non-x86.patch deleted
=====================================
@@ -1,24 +0,0 @@
-Description: fix build failures on non-x86 platforms
-Author: Étienne Mollier <emollier at debian.org>
-Forwarded: https://gitlab.com/german.tischler/libmaus2/-/issues/37
-Last-Update: 2021-08-26
----
-This patch header follows DEP-3: http://dep.debian.net/deps/dep3/
---- libmaus2.orig/src/libmaus2/lcs/AlignmentOneAgainstManyAVX2.cpp
-+++ libmaus2/src/libmaus2/lcs/AlignmentOneAgainstManyAVX2.cpp
-@@ -312,7 +312,6 @@
- 		}
- 	}
- };
--#endif
- 
- libmaus2::lcs::AlignmentOneAgainstManyAVX2::AlignmentOneAgainstManyAVX2()
- : context(new AlignmentOneAgainstManyAVX2Context,[](auto p){delete reinterpret_cast<AlignmentOneAgainstManyAVX2Context *>(p);})
-@@ -324,6 +323,7 @@
- 	throw lme;
- 	#endif
- }
-+#endif
- 
- void libmaus2::lcs::AlignmentOneAgainstManyAVX2::process(
- 	uint8_t const * qa,


=====================================
debian/patches/series deleted
=====================================
@@ -1 +0,0 @@
-fix-build-on-non-x86.patch


=====================================
src/Makefile.am
=====================================
@@ -219,7 +219,9 @@ libmaus2_la_SOURCES = libmaus2/StaticInitialization.cpp \
 	libmaus2/aio/SecrecyInputStreamFactory.cpp \
 	libmaus2/aio/SecrecyOutputStreamFactory.cpp \
 	libmaus2/bambam/Scram.c \
-	libmaus2/bambam/parallel/ScramCramEncoding.cpp
+	libmaus2/bambam/parallel/ScramCramEncoding.cpp \
+	libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.cpp \
+	libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.cpp
 libmaus2_la_CPPFLAGS = ${AM_CPPFLAGS} @LIBMAUSCPPFLAGS@ @SNAPPYCPPFLAGS@ @ZLIBCPPFLAGS@ @NETTLECPPFLAGS@ @IGZIPCPPFLAGS@ @GMPCPPFLAGS@ @LIBMAUS2_LZMA_CPPFLAGS@ @LIBDEFLATECPPFLAGS@ @LIBMAUS2_LIBCURL_CPPFLAGS@ @LIBMAUS2_LIBSECRECY_CPPFLAGS@ @LIBMAUS2_PARASAIL_CPPFLAGS@ @IOLIBCPPFLAGS@
 libmaus2_la_CXXFLAGS = @LIBMAUSCXXFLAGS@ ${AM_CXXFLAGS} @WARNCXXFLAGS@
 libmaus2_la_LDFLAGS = @LIBMAUSLDFLAGS@ @SNAPPYLDFLAGS@ -version-info ${LIBRARY_VERSION} @LIBMAUSSTATIC@ @IGZIPLDFLAGS@ @NETTLELDFLAGS@ @GMPLDFLAGS@ @LIBDEFLATELDFLAGS@ @IOLIBLDFLAGS@
@@ -1762,7 +1764,8 @@ libmaus2sorting_include_HEADERS=\
 	libmaus2/sorting/InterleavedRadixSort.hpp \
 	libmaus2/sorting/ParallelRunLengthRadixUnsort.hpp \
 	libmaus2/sorting/ParallelRunLengthRadixSort.hpp \
-	libmaus2/sorting/ParallelExternalRadixSort.hpp
+	libmaus2/sorting/ParallelExternalRadixSort.hpp \
+	libmaus2/sorting/RankSearch.hpp
 
 libmaus2hashing_includedir=$(includedir)/libmaus2/hashing
 libmaus2hashing_include_HEADERS=\


=====================================
src/libmaus2/lcs/AlignmentOneAgainstManyAVX2.cpp
=====================================
@@ -315,7 +315,11 @@ struct AlignmentOneAgainstManyAVX2Context
 #endif
 
 libmaus2::lcs::AlignmentOneAgainstManyAVX2::AlignmentOneAgainstManyAVX2()
-: context(new AlignmentOneAgainstManyAVX2Context,[](auto p){delete reinterpret_cast<AlignmentOneAgainstManyAVX2Context *>(p);})
+: context(
+	#if defined(LIBMAUS2_HAVE_ALIGNMENT_ONE_TO_MANY_AVX2)
+	new AlignmentOneAgainstManyAVX2Context,[](auto p){delete reinterpret_cast<AlignmentOneAgainstManyAVX2Context *>(p);}
+	#endif
+  )
 {
 	#if! defined(LIBMAUS2_HAVE_ALIGNMENT_ONE_TO_MANY_AVX2)
 	libmaus2::exception::LibMausException lme;
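
This change supersedes the deleted fix-build-on-non-x86.patch: the allocation of AlignmentOneAgainstManyAVX2Context is now itself guarded by LIBMAUS2_HAVE_ALIGNMENT_ONE_TO_MANY_AVX2, so non-x86 builds never name a type that only exists when AVX2 support is compiled in, and the constructor degrades to a run-time exception instead. A minimal standalone sketch of the pattern, with illustrative names (Aligner, Avx2Context, HAVE_AVX2) rather than libmaus2's:

    // conditional pimpl construction behind a feature macro (illustrative sketch)
    #include <memory>
    #include <stdexcept>

    #if defined(HAVE_AVX2)
    struct Avx2Context { /* SIMD state would live here */ };
    #endif

    struct Aligner
    {
        std::shared_ptr<void> context;

        Aligner()
        : context(
              #if defined(HAVE_AVX2)
              // the context type is only named when it actually exists
              new Avx2Context, [](void *p){ delete static_cast<Avx2Context *>(p); }
              #endif
          )
        {
            #if ! defined(HAVE_AVX2)
            // without AVX2 the object is unusable; fail at run time, not compile time
            throw std::runtime_error("AVX2 alignment support not compiled in");
            #endif
        }
    };

    int main()
    {
        try { Aligner a; }
        catch (std::exception const &) { /* expected when HAVE_AVX2 is not defined */ }
        return 0;
    }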


=====================================
src/libmaus2/lz/BgzfDeflateHeaderFunctions.cpp
=====================================
@@ -33,7 +33,7 @@ uint8_t const * libmaus2::lz::BgzfDeflateHeaderFunctions::fillHeaderFooter(
 )
 {
         // set size of compressed block in header
-        unsigned int const blocksize = getBgzfHeaderSize()/*header*/+8/*footer*/+payloadsize-1;
+        unsigned int const blocksize = getBgzfHeaderSize()/*header*/+getBgzfFooterSize()/*footer*/+payloadsize-1;
         assert ( blocksize < getBgzfMaxBlockSize() );
         outbuf[16] = (blocksize >> 0) & 0xFF;
         outbuf[17] = (blocksize >> 8) & 0xFF;
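
For context on the constant being replaced: per the BGZF specification the BSIZE value stored little-endian at offsets 16 and 17 of the block header is the total block length (fixed header, compressed payload and the 8-byte footer holding CRC32 and ISIZE) minus one, which is exactly what getBgzfHeaderSize() + getBgzfFooterSize() + payloadsize - 1 expresses. A minimal standalone sketch of that computation, with the BGZF constants written out as assumed literal values:

    // BGZF BSIZE field computation (standalone sketch; constants assumed from the BGZF spec)
    #include <cassert>
    #include <cstdint>
    #include <cstddef>

    int main()
    {
        std::size_t const headersize = 18;       // BGZF block header incl. the BC extra field
        std::size_t const footersize = 8;        // CRC32 (4 bytes) + ISIZE (4 bytes)
        std::size_t const maxblocksize = 65536;  // maximum BGZF block size

        std::size_t const payloadsize = 4321;    // illustrative compressed payload length

        // BSIZE = total block length minus one
        unsigned int const blocksize = headersize + footersize + payloadsize - 1;
        assert(blocksize < maxblocksize);

        uint8_t outbuf[18] = {0};
        outbuf[16] = (blocksize >> 0) & 0xFF;    // low byte first (little-endian)
        outbuf[17] = (blocksize >> 8) & 0xFF;

        return 0;
    }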


=====================================
src/libmaus2/parallel/threadpool/ThreadPool.hpp
=====================================
@@ -60,6 +60,7 @@ namespace libmaus2
 				std::atomic<int> panicFlag;
 				std::atomic<std::size_t> seq;
 
+				std::mutex dispatcherMapLock;
 				libmaus2::avl::AtomicAVLPtrValueMap<std::type_index,ThreadPoolDispatcher> dispatcherMap;
 				libmaus2::parallel::AtomicPtrStack<libmaus2::exception::LibMausException> panicStack;
 				libmaus2::parallel::AtomicPtrStack< std::thread > Vthread;
@@ -100,9 +101,16 @@ namespace libmaus2
 					terminatedFlag = 1;
 				}
 
+				static void staticDispatch(ThreadPool * ptr)
+				{
+					ptr->dispatch();
+				}
+
 				template<typename data_type>
 				void registerDispatcher(libmaus2::util::shared_ptr<ThreadPoolDispatcher> ptr)
 				{
+					std::lock_guard<decltype(dispatcherMapLock)> slock(dispatcherMapLock);
+
 					std::type_index const type(typeid(data_type));
 
 					auto const it = dispatcherMap.find(type);
@@ -118,9 +126,29 @@ namespace libmaus2
 					dispatcherMap.insert(std::type_index(typeid(data_type)),ptr);
 				}
 
-				static void staticDispatch(ThreadPool * ptr)
+				template<typename data_type>
+				void ensureDispatcherRegistered(libmaus2::util::shared_ptr<ThreadPoolDispatcher> ptr)
 				{
-					ptr->dispatch();
+					std::lock_guard<decltype(dispatcherMapLock)> slock(dispatcherMapLock);
+
+					std::type_index const type(typeid(data_type));
+
+					auto const it = dispatcherMap.find(type);
+
+					if ( it == dispatcherMap.end() )
+						dispatcherMap.insert(std::type_index(typeid(data_type)),ptr);
+				}
+
+				ThreadPoolDispatcher * getDispatcher(std::type_index const & TI)
+				{
+					std::lock_guard<decltype(dispatcherMapLock)> slock(dispatcherMapLock);
+
+					auto it = dispatcherMap.find(TI);
+
+					if ( it == dispatcherMap.end() )
+						return nullptr;
+
+					return it->second.load().get();
 				}
 
 				void dispatch()
@@ -142,9 +170,14 @@ namespace libmaus2
 								assert ( pack );
 								assert ( pack->data.load() );
 
-								auto it = dispatcherMap.find(pack->data.load()->getTypeIndex());
+								auto * dispatcher = getDispatcher(pack->data.load()->getTypeIndex());
 
-								if ( it == dispatcherMap.end() )
+								if ( dispatcher )
+								{
+									dispatcher->dispatch(pack,this);
+									putPackage(pack);
+								}
+								else
 								{
 									std::string const packageTypeName = pack->data.load()->getPackageTypeName();
 									putPackage(pack);
@@ -154,12 +187,6 @@ namespace libmaus2
 									lme.finish();
 									throw lme;
 								}
-								else
-								{
-									it->second.load()->dispatch(pack,this);
-									putPackage(pack);
-								}
-
 							}
 							else if ( terminatedFlag.load() )
 								break;
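
The reorganisation above funnels every dispatcher lookup through the new getDispatcher() and protects the map with dispatcherMapLock, so registerDispatcher(), the added ensureDispatcherRegistered() and the dispatch loop can be used concurrently; ensureDispatcherRegistered() only inserts when no dispatcher is registered for the type yet. A simplified standalone sketch of that pattern, using a plain std::map and std::shared_ptr in place of libmaus2's atomic containers (Registry, Dispatcher and the data types are illustrative names):

    // mutex-guarded, std::type_index keyed dispatcher registry (simplified sketch)
    #include <map>
    #include <memory>
    #include <mutex>
    #include <typeindex>
    #include <iostream>

    struct Dispatcher
    {
        virtual ~Dispatcher() {}
        virtual void dispatch() = 0;
    };

    struct Registry
    {
        std::mutex lock;
        std::map<std::type_index, std::shared_ptr<Dispatcher>> dispatcherMap;

        // insert only if no dispatcher is registered for data_type yet
        template<typename data_type>
        void ensureRegistered(std::shared_ptr<Dispatcher> ptr)
        {
            std::lock_guard<std::mutex> slock(lock);
            std::type_index const type(typeid(data_type));
            if (dispatcherMap.find(type) == dispatcherMap.end())
                dispatcherMap.emplace(type, ptr);
        }

        // lookup under the same lock; nullptr when the type is unknown
        Dispatcher * get(std::type_index const & ti)
        {
            std::lock_guard<std::mutex> slock(lock);
            auto const it = dispatcherMap.find(ti);
            return (it == dispatcherMap.end()) ? nullptr : it->second.get();
        }
    };

    struct MyData {};
    struct MyDispatcher : Dispatcher
    {
        void dispatch() override { std::cout << "handling MyData" << std::endl; }
    };

    int main()
    {
        Registry R;
        R.ensureRegistered<MyData>(std::make_shared<MyDispatcher>());
        if (Dispatcher * d = R.get(std::type_index(typeid(MyData))))
            d->dispatch();
        return 0;
    }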


=====================================
src/libmaus2/parallel/threadpool/bam/BamHeaderSeqData.hpp
=====================================
@@ -20,6 +20,7 @@
 
 #include <cassert>
 #include <regex>
+#include <filesystem>
 #include <libmaus2/exception/LibMausException.hpp>
 #include <libmaus2/aio/InputStreamInstance.hpp>
 
@@ -97,6 +98,38 @@ namespace libmaus2
 							}
 					}
 
+
+					bool sequencesMissing(std::filesystem::path const & ref_cache_path) const
+					{
+						bool needfascan = false;
+
+						for ( auto P : M )
+						{
+							auto const id = P.first;
+							auto const LM = P.second;
+
+							auto const it_M5 = LM.find("M5");
+
+							if ( it_M5 == LM.end() )
+							{
+								libmaus2::exception::LibMausException lme;
+								lme.getStream() << "[E] no M5 attribute found for " << id << std::endl;
+								lme.finish();
+								throw lme;
+							}
+
+							std::string const m5 = it_M5->second;
+
+							std::filesystem::path dictpath = ref_cache_path;
+							dictpath /= m5;
+
+							if ( ! std::filesystem::exists(dictpath) )
+								needfascan = true;
+						}
+
+						return needfascan;
+					}
+
 					BamHeaderSeqData(std::istream & ISI)
 					{
 						init(ISI);
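
The new sequencesMissing() walks the parsed sequence dictionary, requires an M5 (MD5) attribute for every entry (throwing otherwise) and reports whether any MD5-named file is absent from the given REF_CACHE directory; testThreadPoolCramEncode uses this to decide whether missing reference cache entries still have to be constructed. A hedged usage sketch against the interface shown above, assuming the class lives in libmaus2::parallel::threadpool::bam as its path suggests (file names are placeholders):

    // sketch: check whether the reference cache needs to be populated
    #include <libmaus2/parallel/threadpool/bam/BamHeaderSeqData.hpp>
    #include <filesystem>
    #include <fstream>
    #include <iostream>

    int main()
    {
        std::ifstream headerstream("header.txt");   // placeholder input for init()
        libmaus2::parallel::threadpool::bam::BamHeaderSeqData seqdata(headerstream);

        std::filesystem::path const ref_cache("/path/to/ref_cache");  // placeholder cache dir

        if ( seqdata.sequencesMissing(ref_cache) )
            std::cerr << "[V] at least one M5-named reference file is missing" << std::endl;

        return 0;
    }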


=====================================
src/libmaus2/parallel/threadpool/bgzf/BgzfBlockInfoEnqueDecompressHandler.hpp
=====================================
@@ -339,12 +339,7 @@ namespace libmaus2
 							blocklen
 						);
 
-						uint32_t lcrc = crc32(0,0,0);
-						lcrc = crc32(
-							lcrc,
-							reinterpret_cast<Bytef const *>(ddata->B.load().get() + ddata->o.load()),
-							blocklen
-						);
+						uint32_t const lcrc = context->crc32(ddata->B.load().get() + ddata->o.load(), blocklen);
 
 						if ( lcrc != crc )
 						{


=====================================
src/libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.cpp
=====================================
@@ -0,0 +1,165 @@
+/*
+    libmaus2
+    Copyright (C) 2020 German Tischler-Höhle
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>..
+*/
+#include <libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.hpp>
+#include <libmaus2/lz/BgzfConstants.hpp>
+
+#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+#include <libdeflate.h>
+#else
+#include <zlib.h>
+#endif
+
+#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+static struct libdeflate_compressor * allocateCompressor(int level)
+{
+	if ( level < 0 )
+		level = 6;
+
+	auto * lcompressor = libdeflate_alloc_compressor(level);
+
+	if ( ! lcompressor )
+		throw libmaus2::exception::LibMausException("[E] BgzfCompressContext: libdeflate_alloc_compressor failed");
+
+	return lcompressor;
+}
+#else
+std::shared_ptr<zlib_z_stream> allocateStream(int level)
+{
+	if ( level < 0 )
+		level = Z_DEFAULT_COMPRESSION;
+
+	std::shared_ptr<z_stream> stream(new z_stream);
+	std::memset(stream.get(),0,sizeof(z_stream));
+	stream->zalloc = nullptr;
+	stream->zfree = nullptr;
+	stream->opaque = nullptr;
+	stream->avail_in = 0;
+	stream->next_in = nullptr;
+        int const r = deflateInit2(stream.get(), level, Z_DEFLATED, -15 /* window size */, 8 /* mem level, gzip default */, Z_DEFAULT_STRATEGY);
+
+	if ( r != Z_OK )
+		throw libmaus2::exception::LibMausException("[E] BgzfCompressContext: deflateInit2 failed");
+
+	return stream;
+}
+#endif
+
+std::size_t libmaus2::parallel::threadpool::bgzf::BgzfCompressContext::computeBound()
+{
+	std::size_t const maxpayload = libmaus2::lz::BgzfConstants::getBgzfMaxPayLoad();
+
+	std::size_t low = 0;
+	std::size_t high = maxpayload;
+
+	assert ( compressBound(0) < maxpayload );
+
+	while ( high-low > 1 )
+	{
+		std::size_t const m = (low+high)/2;
+
+		assert ( m > low );
+		assert ( m < high );
+
+		std::size_t const b = compressBound(m);
+
+		// bound too large?
+		if ( b > maxpayload )
+			high = m;
+		else
+			low = m;
+	}
+
+	return 0;
+}
+
+libmaus2::parallel::threadpool::bgzf::BgzfCompressContext::BgzfCompressContext(int level)
+:
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	compressor(
+		allocateCompressor(level),
+		[](auto p) {libdeflate_free_compressor(p);}
+
+	)
+	#else
+	stream(allocateStream(level))
+	#endif
+	,
+	bound(computeBound())
+{
+}
+
+libmaus2::parallel::threadpool::bgzf::BgzfCompressContext::~BgzfCompressContext()
+{
+	#if !defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	deflateEnd(stream.get());
+	#endif
+}
+
+std::size_t libmaus2::parallel::threadpool::bgzf::BgzfCompressContext::compressBound(std::size_t const s)
+{
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	return libdeflate_deflate_compress_bound(compressor.get(),s);
+	#else
+	return deflateBound(stream.get(),s);
+	#endif
+}
+
+std::size_t libmaus2::parallel::threadpool::bgzf::BgzfCompressContext::compress(
+	char * in,
+	size_t n_in,
+	char * out,
+	size_t n_out
+)
+{
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	return libdeflate_deflate_compress(compressor.get(),in,n_in,out,n_out);
+	#else
+	assert ( stream );
+	if ( deflateReset(stream.get()) != Z_OK )
+		throw libmaus2::exception::LibMausException("[E] BgzfCompressContext::decompress: deflateReset failed");
+
+	stream->avail_in = n_in;
+	stream->next_in = reinterpret_cast<Bytef*>(in);
+	stream->avail_out = n_out;
+	stream->next_out = reinterpret_cast<Bytef*>(out);
+
+	int const r = deflate(stream.get(),Z_FINISH);
+
+	bool const r_ok = (r == Z_STREAM_END);
+
+	// return 0 on failure
+	if ( !r_ok )
+		return 0;
+
+	return n_out - stream->avail_out;
+	#endif
+}
+
+uint32_t libmaus2::parallel::threadpool::bgzf::BgzfCompressContext::crc32(
+	char const * in,
+	size_t const n_in
+)
+{
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	return libdeflate_crc32(0 /* init value */, in, n_in);
+	#else
+	uint32_t lcrc = ::crc32(0,0,0);
+	lcrc = ::crc32(lcrc,reinterpret_cast<Bytef const *>(in),n_in);
+	return lcrc;
+	#endif
+}


=====================================
src/libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.hpp
=====================================
@@ -0,0 +1,78 @@
+/*
+    libmaus2
+    Copyright (C) 2020 German Tischler-Höhle
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>..
+*/
+#if ! defined(LIBMAUS2_PARALLEL_THREADPOOL_BGZF_BGZFCOMPRESSCONTEXT_HPP)
+#define LIBMAUS2_PARALLEL_THREADPOOL_BGZF_BGZFCOMPRESSCONTEXT_HPP
+
+#include <memory>
+#include <cstdlib>
+#include <cstring>
+#include <cassert>
+#include <atomic>
+#include <libmaus2/exception/LibMausException.hpp>
+
+#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+struct libdeflate_compressor;
+#else
+typedef struct z_stream_s zlib_z_stream;
+#endif
+
+namespace libmaus2
+{
+	namespace parallel
+	{
+		namespace threadpool
+		{
+			namespace bgzf
+			{
+				/* zlib decompression context */
+				struct BgzfCompressContext
+				{
+					typedef BgzfCompressContext this_type;
+					typedef std::shared_ptr<this_type> shared_ptr_type;
+
+					private:
+					BgzfCompressContext & operator=(BgzfCompressContext const & O) = delete;
+					std::size_t compressBound(std::size_t);
+					std::size_t computeBound();
+
+					#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+					std::shared_ptr<struct libdeflate_compressor> compressor;
+					#else
+					std::shared_ptr<zlib_z_stream> stream;
+					#endif
+
+					std::size_t const bound;
+
+					public:
+					BgzfCompressContext(int level = -1);
+					~BgzfCompressContext();
+
+
+					/**
+					 * raw deflate. Returns on number of compressed bytes on success
+					 * and 0 on failure (most likely explanation: compressed data does not
+					 * fit in output buffer).
+					 **/
+					std::size_t compress(char * in, size_t n_in, char * out, size_t n_out);
+					uint32_t crc32(char const * in, std::size_t const n);
+				};
+			}
+		}
+	}
+}
+#endif
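
As the comment above notes, compress() performs a raw deflate and returns 0 when the compressed data does not fit into the output buffer, while crc32() provides the checksum of the uncompressed data that goes into the BGZF footer. A hedged usage sketch against this interface (buffer sizes and contents are illustrative, not taken from a libmaus2 test):

    // sketch: compress one BGZF payload with BgzfCompressContext
    #include <libmaus2/parallel/threadpool/bgzf/BgzfCompressContext.hpp>
    #include <vector>
    #include <cstdint>
    #include <iostream>

    int main()
    {
        libmaus2::parallel::threadpool::bgzf::BgzfCompressContext context;  // default level (-1)

        std::vector<char> in(32 * 1024, 'A');   // illustrative uncompressed block
        std::vector<char> out(64 * 1024);       // output buffer for the raw deflate stream

        std::size_t const n = context.compress(in.data(), in.size(), out.data(), out.size());

        if ( ! n )
        {
            std::cerr << "[E] compressed data does not fit into the output buffer" << std::endl;
            return 1;
        }

        // the BGZF footer stores the CRC32 and length of the uncompressed data
        uint32_t const crc = context.crc32(in.data(), in.size());
        std::cout << in.size() << " -> " << n << " bytes, crc32=" << crc << std::endl;

        return 0;
    }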


=====================================
src/libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.cpp
=====================================
@@ -0,0 +1,117 @@
+/*
+    libmaus2
+    Copyright (C) 2020 German Tischler-Höhle
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>..
+*/
+#include <libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.hpp>
+
+#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+#include <libdeflate.h>
+#else
+#include <zlib.h>
+#endif
+
+libmaus2::parallel::threadpool::bgzf::BgzfDecompressContext::BgzfDecompressContext()
+{
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	auto * ldecompressor = libdeflate_alloc_decompressor();
+
+	if ( ! ldecompressor )
+		throw libmaus2::exception::LibMausException("[E] BgzfDecompressContext: libdeflate_alloc_decompressor failed");
+
+	decompressor = std::shared_ptr<struct libdeflate_decompressor>(
+		ldecompressor,
+		[](auto p) {libdeflate_free_decompressor(p);}
+	);
+	#else
+	stream = std::shared_ptr<z_stream>(new z_stream);
+	std::memset(stream.get(),0,sizeof(z_stream));
+	stream->zalloc = nullptr;
+	stream->zfree = nullptr;
+	stream->opaque = nullptr;
+	stream->avail_in = 0;
+	stream->next_in = nullptr;
+	int const r = inflateInit2(stream.get(),-15);
+	if ( r != Z_OK )
+	{
+		throw libmaus2::exception::LibMausException("[E] BgzfDecompressContext: inflateInit2 failed");
+	}
+	#endif
+}
+
+libmaus2::parallel::threadpool::bgzf::BgzfDecompressContext::~BgzfDecompressContext()
+{
+	#if !defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	inflateEnd(stream.get());
+	#endif
+}
+
+void libmaus2::parallel::threadpool::bgzf::BgzfDecompressContext::decompress(
+	char * in,
+	size_t n_in,
+	char * out,
+	size_t n_out
+)
+{
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	std::size_t u_out = 0;
+	auto const r = libdeflate_deflate_decompress(decompressor.get(),in,n_in,out,n_out,&u_out);
+
+	bool const ok =
+		(r == LIBDEFLATE_SUCCESS)
+		&&
+		(u_out == n_out);
+
+	if ( !ok )
+	{
+		throw ::libmaus2::exception::LibMausException("[E] BgzfDecompressContext::decompress: libdeflate_deflate_decompress");
+	}
+	#else
+	assert ( stream );
+	if ( inflateReset(stream.get()) != Z_OK )
+		throw libmaus2::exception::LibMausException("[E] BgzfDecompressContext::decompress: inflateReset failed");
+
+	stream->avail_in = n_in;
+	stream->next_in = reinterpret_cast<Bytef*>(in);
+	stream->avail_out = n_out;
+	stream->next_out = reinterpret_cast<Bytef*>(out);
+
+	int const r = inflate(stream.get(),Z_FINISH);
+
+	bool const r_ok = (r == Z_STREAM_END);
+	bool const out_ok = (stream->avail_out == 0);
+	bool const in_ok = (stream->avail_in == 0);
+	bool const ok = r_ok && out_ok && in_ok;
+
+	if ( !ok )
+	{
+		throw ::libmaus2::exception::LibMausException("[E] BgzfDecompressContext::decompress: inflate failed");
+	}
+	#endif
+}
+
+uint32_t libmaus2::parallel::threadpool::bgzf::BgzfDecompressContext::crc32(
+	char const * in,
+	size_t const n_in
+)
+{
+	#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+	return libdeflate_crc32(0 /* init value */, in, n_in);
+	#else
+	uint32_t lcrc = ::crc32(0,0,0);
+	lcrc = ::crc32(lcrc,reinterpret_cast<Bytef const *>(in),n_in);
+	return lcrc;
+	#endif
+}
\ No newline at end of file
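
The decompression side mirrors this: decompress() inflates a raw deflate stream into an output buffer whose size must equal the uncompressed length recorded in the block and throws on any mismatch, and crc32() lets callers such as BgzfBlockInfoEnqueDecompressHandler above verify the footer checksum without touching zlib or libdeflate directly. A hedged sketch of that verification pattern (ParsedBlock is a hypothetical helper, not part of libmaus2):

    // sketch: decompress a BGZF payload and verify the footer CRC
    #include <libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.hpp>
    #include <vector>
    #include <cstdint>
    #include <stdexcept>

    // hypothetical container for a parsed BGZF block: compressed payload,
    // uncompressed size (ISIZE) and CRC32 taken from the block's footer
    struct ParsedBlock { std::vector<char> payload; uint32_t isize; uint32_t crc; };

    void handle(ParsedBlock & block)
    {
        libmaus2::parallel::threadpool::bgzf::BgzfDecompressContext context;

        std::vector<char> out(block.isize);

        // throws libmaus2::exception::LibMausException if inflation fails
        // or the uncompressed size does not match
        context.decompress(block.payload.data(), block.payload.size(), out.data(), out.size());

        // same check as in BgzfBlockInfoEnqueDecompressHandler above
        uint32_t const lcrc = context.crc32(out.data(), out.size());
        if ( lcrc != block.crc )
            throw std::runtime_error("[E] CRC mismatch in BGZF block");
    }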


=====================================
src/libmaus2/parallel/threadpool/bgzf/BgzfDecompressContext.hpp
=====================================
@@ -18,7 +18,6 @@
 #if ! defined(LIBMAUS2_PARALLEL_THREADPOOL_BGZF_BGZFDECOMPRESSCONTEXT_HPP)
 #define LIBMAUS2_PARALLEL_THREADPOOL_BGZF_BGZFDECOMPRESSCONTEXT_HPP
 
-#include <zlib.h>
 #include <memory>
 #include <cstdlib>
 #include <cstring>
@@ -26,6 +25,12 @@
 #include <atomic>
 #include <libmaus2/exception/LibMausException.hpp>
 
+#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+struct libdeflate_decompressor;
+#else
+typedef struct z_stream_s zlib_z_stream;
+#endif
+
 namespace libmaus2
 {
 	namespace parallel
@@ -43,58 +48,23 @@ namespace libmaus2
 					private:
 					BgzfDecompressContext & operator=(BgzfDecompressContext const & O) = delete;
 
-					std::shared_ptr<z_stream> stream;
+					#if defined(LIBMAUS2_HAVE_LIBDEFLATE)
+					std::shared_ptr<struct libdeflate_decompressor> decompressor;
+					#else
+					std::shared_ptr<zlib_z_stream> stream;
+					#endif
 
 					public:
-					BgzfDecompressContext()
-					: stream(new z_stream)
-					{
-						std::memset(stream.get(),0,sizeof(z_stream));
-						stream->zalloc = nullptr;
-						stream->zfree = nullptr;
-						stream->opaque = nullptr;
-						stream->avail_in = 0;
-						stream->next_in = nullptr;
-						int const r = inflateInit2(stream.get(),-15);
-						if ( r != Z_OK )
-						{
-							throw libmaus2::exception::LibMausException("[E] BgzfDecompressContext: inflateInit2 failed");
-						}
-					}
-
-					~BgzfDecompressContext()
-					{
-						inflateEnd(stream.get());
-					}
+					BgzfDecompressContext();
+					~BgzfDecompressContext();
 
 					void decompress(
 						char * in,
 						size_t n_in,
 						char * out,
 						size_t n_out
-					)
-					{
-						assert ( stream );
-						if ( inflateReset(stream.get()) != Z_OK )
-							throw libmaus2::exception::LibMausException("[E] BgzfDecompressContext::decompress: inflateReset failed");
-
-						stream->avail_in = n_in;
-						stream->next_in = reinterpret_cast<Bytef*>(in);
-						stream->avail_out = n_out;
-						stream->next_out = reinterpret_cast<Bytef*>(out);
-
-						int const r = inflate(stream.get(),Z_FINISH);
-
-						bool const r_ok = (r == Z_STREAM_END);
-						bool const out_ok = (stream->avail_out == 0);
-						bool const in_ok = (stream->avail_in == 0);
-						bool const ok = r_ok && out_ok && in_ok;
-
-						if ( !ok )
-						{
-							throw ::libmaus2::exception::LibMausException("[E] BgzfDecompressContext::decompress: inflate failed");
-						}
-					}
+					);
+					uint32_t crc32(char const * in, std::size_t const n);
 				};
 			}
 		}


=====================================
src/libmaus2/sorting/RankSearch.hpp
=====================================
@@ -0,0 +1,2077 @@
+/*
+    libmaus2
+    Copyright (C) 2021 German Tischler-Höhle
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>..
+*/
+#if ! defined(LIBMAUS2_SORTING_RANKSEARCH_HPP)
+#define LIBMAUS2_SORTING_RANKSEARCH_HPP
+
+#include <libmaus2/parallel/threadpool/ThreadPool.hpp>
+
+namespace libmaus2
+{
+	namespace sorting
+	{
+		struct RankSearch
+		{
+			struct SortOp
+			{
+				virtual ~SortOp() {}
+				virtual void dispatch() = 0;
+			};
+
+			template<
+				typename iterator,
+				typename order_type
+			>
+			struct SortRequest : public SortOp
+			{
+				iterator a;
+				iterator e;
+				order_type const * order;
+
+				SortRequest() : a(), e(), order(nullptr)
+				{
+				}
+				SortRequest(
+					iterator ra,
+					iterator re,
+					order_type const * const rorder
+				) : a(ra), e(re), order(rorder) {}
+
+				void dispatch()
+				{
+					::std::sort(a,e,*order);
+				}
+			};
+
+			template<
+				typename iterator,
+				typename order_type
+			>
+			struct StableSortRequest : public SortOp
+			{
+				iterator a;
+				iterator e;
+				order_type const * order;
+
+				StableSortRequest() : a(), e(), order(nullptr)
+				{
+				}
+				StableSortRequest(
+					iterator ra,
+					iterator re,
+					order_type const * const rorder
+				) : a(ra), e(re), order(rorder) {}
+
+				void dispatch()
+				{
+					::std::stable_sort(a,e,*order);
+				}
+			};
+
+			template<
+				typename from_iterator,
+				typename to_iterator
+			>
+			struct CopyRequest : public SortOp
+			{
+				from_iterator from_b;
+				from_iterator from_e;
+				to_iterator to_b;
+
+				CopyRequest()
+				{
+
+				}
+				CopyRequest(
+					from_iterator rfrom_b,
+					from_iterator rfrom_e,
+					to_iterator rto_b
+				) : from_b(rfrom_b), from_e(rfrom_e), to_b(rto_b)
+				{
+
+				}
+
+				void dispatch()
+				{
+					std::copy(from_b,from_e,to_b);
+				}
+			};
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type
+			>
+			struct MergeRequest : public SortOp
+			{
+				const_iterator ab;
+				const_iterator ae;
+				const_iterator bb;
+				const_iterator be;
+				temp_iterator itt;
+				order_type const * order;
+
+				MergeRequest()
+				{}
+				MergeRequest(
+					const_iterator rab,
+					const_iterator rae,
+					const_iterator rbb,
+					const_iterator rbe,
+					temp_iterator ritt,
+					order_type const * rorder
+				) : ab(rab), ae(rae), bb(rbb), be(rbe), itt(ritt), order(rorder)
+				{}
+
+				void dispatch()
+				{
+					std::merge(
+						ab,ae,
+						bb,be,
+						itt,
+						*order
+					);
+				}
+			};
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type
+			>
+			struct MergeRequestDelayedSearch : public SortOp
+			{
+				const_iterator ab;
+				const_iterator ae;
+				const_iterator bb;
+				const_iterator be;
+				temp_iterator itt;
+				std::size_t from;
+				std::size_t to;
+				order_type const * order;
+
+				MergeRequestDelayedSearch()
+				{}
+				MergeRequestDelayedSearch(
+					const_iterator rab,
+					const_iterator rae,
+					const_iterator rbb,
+					const_iterator rbe,
+					temp_iterator ritt,
+					std::size_t const rfrom,
+					std::size_t const rto,
+					order_type const * rorder
+				) : ab(rab), ae(rae), bb(rbb), be(rbe), itt(ritt), from(rfrom), to(rto), order(rorder)
+				{}
+
+				void dispatch()
+				{
+					std::vector< std::pair<const_iterator,const_iterator> > const Vsplitin(
+						{
+							std::make_pair(ab,ae),
+							std::make_pair(bb,be)
+						}
+					);
+					auto const Vlow = rankSearchMulti(Vsplitin,from);
+					auto const Vhigh = rankSearchMulti(Vsplitin,to);
+					auto const off = std::distance(ab,Vlow.front()) + std::distance(bb,Vlow.back());
+					assert ( static_cast<std::size_t>(off) == from );
+
+					std::merge(
+						Vlow.front(),Vhigh.front(),
+						Vlow.back(),Vhigh.back(),
+						itt + off,
+						*order
+					);
+				}
+
+				std::string toString(const_iterator ref) const
+				{
+					std::ostringstream ostr;
+
+					ostr << "MergeRequestDelayedSearch("
+						<< ab-ref << ","
+						<< ae-ref << ","
+						<< bb-ref << ","
+						<< be-ref << ","
+						<< from << ","
+						<< to
+						<< ")";
+
+					return ostr.str();
+				}
+			};
+
+			template<
+				typename const_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::pair<const_iterator,const_iterator> rankSearch(
+				const_iterator ab, const_iterator ae,
+				const_iterator bb, const_iterator be,
+				typename std::iterator_traits<const_iterator>::difference_type const p,
+				order_type const & order = order_type()
+			)
+			{
+				const_iterator low_a = ab;
+				const_iterator high_a = ae;
+				const_iterator low_b = bb;
+				const_iterator high_b = be;
+
+				bool diff_a = (high_a != low_a) && order(*low_a,high_a[-1]);
+				bool diff_b = (high_b != low_b) && order(*low_b,high_b[-1]);
+
+				assert ( (low_a-ab) + (low_b-bb) <= p );
+
+				while ( diff_a || diff_b )
+				{
+					if ( diff_a )
+					{
+						auto const ln = std::distance(low_a,high_a);
+						auto P = std::equal_range(low_a,high_a,low_a[ln/2],order);
+
+						const_iterator ac;
+
+						if ( P.first == low_a )
+							ac = P.second;
+						else
+							ac = P.first;
+
+						assert ( ac != ae );
+
+						auto const & e = *ac;
+						auto const it = std::lower_bound(low_b,high_b,e,order);
+						typename std::iterator_traits<const_iterator>::difference_type const z = (it - bb) + (ac-ab);
+
+						if ( p >= z )
+							low_a = ac;
+						else
+							high_a = ac;
+
+						assert (
+							(low_a-ab) + (low_b-bb) <= p
+						);
+
+
+						diff_a = order(*low_a,high_a[-1]);
+					}
+					if ( diff_b )
+					{
+						auto const ln = std::distance(low_b,high_b);
+						auto P = std::equal_range(low_b,high_b,low_b[ln/2],order);
+
+						const_iterator bc;
+
+						if ( P.first == low_b )
+							bc = P.second;
+						else
+							bc = P.first;
+
+						assert ( bc != be );
+
+						auto const & e = *bc;
+						auto const it = std::lower_bound(low_a,high_a,e,order);
+						typename std::iterator_traits<const_iterator>::difference_type const z = (it - ab) + (bc-bb);
+
+						if ( p >= z )
+							low_b = bc;
+						else
+							high_b = bc;
+
+						assert ( (low_a-ab) + (low_b-bb) <= p );
+
+						diff_b = order(*low_b,high_b[-1]);
+					}
+				}
+
+				auto av_a = high_a - low_a;
+				auto av_b = high_b - low_b;
+
+				while (
+					(low_a-ab) + (low_b-bb) < p
+					&&
+					(av_a || av_b)
+				)
+				{
+					auto mis = p - ((low_a-ab) + (low_b-bb));
+
+					if ( ! av_a )
+					{
+						auto const av = av_b;
+						auto const use = std::min(av,mis);
+						low_b += use;
+					}
+					else if ( ! av_b )
+					{
+						auto const av = av_a;
+						auto const use = std::min(av,mis);
+						low_a += use;
+					}
+					else if ( order(*low_a,*low_b) )
+					{
+						auto const av = av_a;
+						auto const use = std::min(av,mis);
+						low_a += use;
+						av_a -= use;
+					}
+					else
+					{
+						auto const av = av_b;
+						auto const use = std::min(av,mis);
+						low_b += use;
+						av_b -= use;
+					}
+
+					assert ( (low_a-ab) + (low_b-bb) <= p );
+				}
+
+				return std::make_pair(low_a,low_b);
+			}
+
+			template<
+				typename const_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::vector<const_iterator> rankSearchMulti(
+				std::vector< std::pair<const_iterator,const_iterator> > const & V,
+				typename std::iterator_traits<const_iterator>::difference_type const p,
+				order_type const & order = order_type()
+			)
+			{
+				std::vector< std::pair<const_iterator,const_iterator> > VLH(V);
+				std::vector< bool > Vdiff(V.size());
+				bool anydiff = false;
+
+				for ( std::size_t i = 0; i < VLH.size(); ++i )
+				{
+					const_iterator const & low  = VLH[i].first;
+					const_iterator const & high = VLH[i].second;
+					Vdiff[i] = (high != low) && order(*low,high[-1]);
+					anydiff = anydiff || Vdiff[i];
+				}
+
+				typename std::iterator_traits<const_iterator>::difference_type s = 0;
+
+				assert ( s <= p );
+
+				while ( anydiff )
+				{
+					anydiff = false;
+
+					for ( std::size_t i = 0; i < Vdiff.size(); ++i )
+						if ( Vdiff[i] )
+						{
+							const_iterator const & low  = VLH[i].first;
+							const_iterator const & high = VLH[i].second;
+
+							auto const ln = std::distance(low,high);
+							auto P = std::equal_range(low,high,low[ln/2],order);
+
+							const_iterator ac;
+
+							if ( P.first == low )
+								ac = P.second;
+							else
+								ac = P.first;
+
+							assert ( ac != V[i].second );
+
+							auto const & e = *ac;
+
+							typename std::iterator_traits<const_iterator>::difference_type z = ac-V[i].first;
+
+							for ( std::size_t j = 0; j < Vdiff.size(); ++j )
+								if ( j != i )
+								{
+									auto const it = std::lower_bound(VLH[j].first,VLH[j].second,e,order);
+									z += it - V[j].first;
+								}
+
+							if ( p >= z )
+								VLH[i].first = ac;
+							else
+								VLH[i].second = ac;
+
+							Vdiff[i] = order(*(VLH[i].first),VLH[i].second[-1]);
+
+							anydiff = anydiff || Vdiff[i];
+						}
+
+					s = 0;
+					for ( std::size_t i = 0; i < VLH.size(); ++i )
+						s += VLH[i].first - V[i].first;
+
+					assert ( s <= p );
+				}
+
+				std::vector < std::size_t> VI(V.size());
+				for ( std::size_t i = 0; i < V.size(); ++i )
+					VI[i] = i;
+
+				// sort remaining intervals (by index) in ascending order
+				std::sort(
+					VI.begin(),
+					VI.end(),
+					[&VLH,&order](auto const i, auto const j)
+					{
+						bool const empty_i = VLH[i].second == VLH[i].first;
+						bool const empty_j = VLH[j].second == VLH[j].first;
+
+						if ( empty_j )
+							return true;
+						else if ( empty_i )
+							return false;
+						else if ( order(*(VLH[i].first),*(VLH[j].first)) )
+							return true;
+						else if ( order(*(VLH[j].first),*(VLH[i].first)) )
+							return false;
+						else
+							return i < j;
+					}
+				);
+
+				for ( std::size_t j = 0; s < p && j < VI.size(); ++j )
+				{
+					std::size_t const i = VI[j];
+					auto const av = VLH[i].second - VLH[i].first;
+					auto const mis = p - s;
+					auto const use = std::min(av,mis);
+
+					VLH[i].first += use;
+					s += use;
+				}
+
+				std::vector<const_iterator> R(VLH.size());
+
+				for ( std::size_t i = 0; i < VLH.size(); ++i )
+					R[i] = VLH[i].first;
+
+				return R;
+			}
+
+			template<typename iterator>
+			struct SwapReverse
+			{
+				iterator ab;
+				iterator ae;
+				iterator be;
+
+				SwapReverse(){}
+				SwapReverse(
+					iterator rab,
+					iterator rae,
+					iterator rbe
+				) : ab(rab), ae(rae), be(rbe)
+				{
+					assert ( ae >= ab );
+					assert ( be >= ae );
+				}
+
+				void dispatch()
+				{
+					while ( ab != ae )
+						std::swap(*ab++,*--be);
+				}
+
+				std::string toString(iterator ref) const
+				{
+					auto const an = std::distance(ab,ae);
+					std::ostringstream ostr;
+					ostr << "SwapReverse([" << ab-ref << "," << ae-ref << "),[" << ((be-an)-ref) << "," << be-ref << "))";
+
+					return ostr.str();
+				}
+
+				void pushIntervals(std::vector< std::pair<iterator,iterator> > & V) const
+				{
+					auto const d = std::distance(ab,ae);
+					V.push_back(std::make_pair(ab,ae));
+					V.push_back(std::make_pair(be-d,be));
+				}
+			};
+
+			template<typename iterator>
+			static void swap(
+				iterator itab,
+				iterator itae,
+				iterator itbb,
+				iterator itbe
+			)
+			{
+				auto const an = std::distance(itab,itae);
+				auto const bn = std::distance(itbb,itbe);
+
+				SwapReverse(itab,itab+an/2,itae).dispatch();
+				SwapReverse(itbb,itbb+bn/2,itbe).dispatch();
+
+				if ( an <= bn )
+				{
+					SwapReverse(itab,itab+an,itbe).dispatch();
+					itab += an;
+					itbe -= an;
+
+					auto const rest = bn - an;
+					auto const rest2 = rest / 2;
+
+					SwapReverse(itbb,itbb+rest2,itbe).dispatch();
+				}
+				else
+				{
+					SwapReverse(itab,itab+bn,itbe).dispatch();
+					itab += bn;
+					itbe -= bn;
+
+					auto const rest = an - bn;
+					auto const rest2 = rest / 2;
+
+					SwapReverse(itab,itab+rest2,itae).dispatch();
+				}
+			}
+
+			template<typename iterator>
+			static void planReverse(
+				iterator itab,
+				iterator itae,
+				std::size_t const threads,
+				std::vector< SwapReverse<iterator> > & V
+			)
+			{
+				auto const an = std::distance(itab,itae);
+
+				if ( an > 1 )
+				{
+					std::size_t const san = an/2;
+					std::size_t const packsize = (san + threads - 1)/threads;
+					std::size_t const numpacks = (san + packsize - 1)/packsize;
+
+					// std::cerr << "an=" << an << " san=" << san << " packsize=" << packsize << std::endl;
+
+					std::size_t rest = san;
+
+					for ( std::size_t i = 0; i < numpacks; ++i )
+					{
+						std::size_t const use = std::min(rest,packsize);
+
+						// std::cerr << "use=" << use << std::endl;
+
+						V.push_back(SwapReverse<iterator>(itab,itab+use,itae));
+
+						itab += use;
+						itae -= use;
+						rest -= use;
+					}
+
+					assert (
+						itab == itae
+						||
+						itab+1 == itae
+					);
+				}
+			}
+
+			template<typename iterator>
+			static void planSwapReverse(
+				iterator itab,
+				iterator itae,
+				iterator itbe,
+				std::size_t const threads,
+				std::vector< SwapReverse<iterator> > & V
+			)
+			{
+				auto const an = std::distance(itab,itae);
+
+				if ( ! an )
+					return;
+
+				assert ( std::distance(itae,itbe) >= an );
+
+				std::size_t const san = an;
+				std::size_t const packsize = (san + threads - 1)/threads;
+				std::size_t const numpacks = (san + packsize - 1)/packsize;
+
+				std::size_t rest = san;
+
+				for ( std::size_t i = 0; i < numpacks; ++i )
+				{
+					std::size_t const use = std::min(rest,packsize);
+					V.push_back(
+						SwapReverse<iterator>(
+							itab,
+							itab+use,
+							itbe
+						)
+					);
+
+					rest -= use;
+					itab += use;
+					itbe -= use;
+				}
+
+				assert ( rest == 0 );
+			}
+
+			template<typename iterator>
+			static std::pair< std::vector < SwapReverse<iterator> >,std::vector < SwapReverse<iterator> > > planSwap(
+				iterator itab,
+				iterator itae,
+				iterator itbb,
+				iterator itbe,
+				std::size_t const threads
+			)
+			{
+				assert ( itae == itbb );
+
+				auto const an = std::distance(itab,itae);
+				auto const bn = std::distance(itbb,itbe);
+				// decltype(an) dthreads = threads;
+
+				// reverse A and B
+				std::vector < SwapReverse<iterator> > Vfirst;
+				// reverse RARB to obtain BA
+				std::vector < SwapReverse<iterator> > Vsecond;
+
+				planReverse(itab,itae,threads,Vfirst);
+				planReverse(itbb,itbe,threads,Vfirst);
+
+				if ( an <= bn )
+				{
+					planSwapReverse(itab,itab+an,itbe,threads,Vsecond);
+
+					itab += an;
+					itbe -= an;
+
+					planReverse(itbb,itbe,threads,Vsecond);
+				}
+				else
+				{
+					planSwapReverse(itab,itab+bn,itbe,threads,Vsecond);
+
+					itab += bn;
+					itbe -= bn;
+
+					planReverse(itab,itae,threads,Vsecond);
+				}
+
+				std::vector < std::pair<iterator,iterator> > VIfirst;
+				std::vector < std::pair<iterator,iterator> > VIsecond;
+
+				for ( auto R : Vfirst )
+					R.pushIntervals(VIfirst);
+				for ( auto R : Vsecond )
+					R.pushIntervals(VIsecond);
+
+				std::sort(VIfirst.begin(),VIfirst.end());
+				std::sort(VIsecond.begin(),VIsecond.end());
+
+				for ( std::size_t i = 1; i < VIfirst.size(); ++i )
+					assert ( VIfirst[i-1].second <= VIfirst[i].first );
+				for ( std::size_t i = 1; i < VIsecond.size(); ++i )
+					assert ( VIsecond[i-1].second <= VIsecond[i].first );
+
+				return std::make_pair(Vfirst,Vsecond);
+			}
+
+			template<typename iterator>
+			static void swapParallel(
+				iterator itab,
+				iterator itae,
+				iterator itbb,
+				iterator itbe,
+				std::size_t const threads
+			)
+			{
+				assert ( itae == itbb );
+
+				// #define SWAP_PARALLEL_DEBUG
+
+				#if defined(SWAP_PARALLEL_DEBUG)
+				typedef typename std::iterator_traits<iterator>::value_type value_type;
+				std::vector<value_type> VA(itab,itae);
+				std::vector<value_type> VB(itbb,itbe);
+				std::vector<value_type> V;
+				for ( auto i : VB )
+					V.push_back(i);
+				for ( auto i : VA )
+					V.push_back(i);
+				#endif
+
+				auto P = planSwap(itab,itae,itbb,itbe,threads);
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t i = 0; i < P.first.size(); ++i )
+					P.first[i].dispatch();
+
+				#if defined(SWAP_PARALLEL_DEBUG)
+				std::vector<value_type> VAR = VA;
+				std::vector<value_type> VBR = VB;
+				std::reverse(VAR.begin(),VAR.end());
+				std::reverse(VBR.begin(),VBR.end());
+				std::vector<value_type> VR;
+				for ( auto i : VAR )
+					VR.push_back(i);
+				for ( auto i : VBR )
+					VR.push_back(i);
+
+				for ( std::size_t i = 0; i < VR.size(); ++i )
+					assert ( VR[i] == itab[i] );
+
+				for ( std::size_t i = 0; i < P.second.size(); ++i )
+					std::cerr << P.second[i].toString(itab) << std::endl;
+				#endif
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t i = 0; i < P.second.size(); ++i )
+					P.second[i].dispatch();
+
+				#if defined(SWAP_PARALLEL_DEBUG)
+				std::reverse(VR.begin(),VR.end());
+				for ( std::size_t i = 0; i < V.size(); ++i )
+					assert ( VR[i] == V[i] );
+
+				bool ok = true;
+				for ( std::size_t i = 0; i < V.size(); ++i )
+					ok = ok && ( itab[i] == V[i] );
+
+				if ( !ok )
+				{
+					for ( std::size_t i = 0; i < V.size(); ++i )
+						std::cerr << "V[" << i << "]=" << V[i] << " itab[" << i << "]=" << itab[i] << std::endl;
+
+					assert ( ok );
+				}
+				#endif
+			}
+
+			template<
+				typename iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void mergeInPlace(iterator ita, iterator itm, iterator ite, order_type const & order = order_type())
+			{
+				typedef std::tuple<iterator,iterator,iterator> element;
+				std::stack<element> S;
+				auto const d = std::distance(ita,ite);
+
+				if ( d > 1 )
+					S.push(element(ita,itm,ite));
+
+				while ( ! S.empty() )
+				{
+					element E = S.top();
+					S.pop();
+
+					std::vector<std::pair<iterator,iterator> > V(
+						{
+							std::make_pair(std::get<0>(E),std::get<1>(E)),
+							std::make_pair(std::get<1>(E),std::get<2>(E))
+						}
+					);
+
+					auto const d = std::distance(std::get<0>(E),std::get<2>(E));
+
+					assert ( d > 1 );
+
+					std::vector<iterator> const R = rankSearchMulti(V,d/2,order);
+
+					auto const I0 = std::make_pair(V.front().first,R.front());
+					auto const I1 = std::make_pair(R.front(),V.front().second);
+					auto const I2 = std::make_pair(V.back().first,R.back());
+					auto const I3 = std::make_pair(R.back(),V.back().second);
+
+					assert ( I0.second == I1.first );
+					assert ( I1.second == I2.first );
+					assert ( I2.second == I3.first );
+
+					swap(I1.first,I1.second,I1.second,I2.second);
+
+					// I0,I2,I1,I3
+					auto const n_0 = std::distance(I0.first,I0.second);
+					auto const n_1 = std::distance(I1.first,I1.second);
+					auto const n_2 = std::distance(I2.first,I2.second);
+					auto const n_3 = std::distance(I3.first,I3.second);
+					auto const n_left  = n_0 + n_2;
+					auto const n_right = n_1 + n_3;
+
+					if ( n_right > 1 )
+					{
+						S.push(element(I0.first + n_0 + n_2,I0.first + n_0 + n_2 + n_1, I0.first + n_0 + n_2 + n_1 + n_3));
+					}
+					if ( n_left > 1 )
+					{
+						S.push(element(I0.first,I0.first + n_0,I0.first + n_0 + n_2));
+					}
+				}
+			}
+
+			template<typename iterator>
+			struct MergeInfo
+			{
+				iterator a;
+				iterator m;
+				iterator e;
+				std::size_t parts;
+
+				MergeInfo()
+				{
+
+				}
+
+				MergeInfo(
+					iterator ra,
+					iterator rm,
+					iterator re,
+					std::size_t const rparts
+				) : a(ra), m(rm), e(re), parts(rparts)
+				{
+
+				}
+
+				template<typename order_type>
+				void dispatch(order_type const & order)
+				{
+					mergeInPlace(a,m,e,order);
+				}
+
+				template<typename temp_iterator, typename order_type>
+				void dispatch(
+					temp_iterator itt,
+					std::size_t const limit,
+					order_type const & order
+				)
+				{
+					mergeLimited(a,m,e,itt,limit,order);
+				}
+			};
+
+
+			template<
+				typename iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void mergeInPlaceParallel(
+				iterator ita,
+				iterator itm,
+				iterator ite,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				std::stack < MergeInfo<iterator> > S;
+				std::vector < MergeInfo<iterator> > V;
+				S.push(MergeInfo<iterator>(ita,itm,ite,threads));
+
+				while ( !S.empty() )
+				{
+					MergeInfo<iterator> M = S.top();
+					S.pop();
+
+					if ( M.parts > 1 )
+					{
+						std::size_t const left = (M.parts + 1) / 2;
+						std::size_t const right = M.parts - left;
+						std::size_t const ln = M.e-M.a;
+						std::size_t const split = (left * ln + M.parts- 1) / M.parts;
+						assert ( split <= ln );
+
+						#if 0
+						std::cerr << "parts=" << M.parts
+							<< " left=" << left
+							<< " right=" << right
+							<< " ln=" << ln
+							<< " split=" << split
+							<< std::endl;
+						#endif
+
+						auto const Vsplit =
+							rankSearchMulti(
+								std::vector< std::pair<iterator,iterator> >(
+									{
+										std::make_pair(M.a,M.m),
+										std::make_pair(M.m,M.e)
+									}
+								),
+								split,
+								order
+							);
+
+						iterator const ab = M.a;
+						iterator const am = Vsplit.front();
+						iterator const ae = M.m;
+						iterator const bb = M.m;
+						iterator const bm = Vsplit.back();
+						iterator const be = M.e;
+
+						std::size_t const r0 = am-ab;
+						std::size_t const r1 = ae-am;
+						std::size_t const r2 = bm-bb;
+						std::size_t const r3 = be-bm;
+
+						#if 0
+						std::cerr << "r0=" << r0 << std::endl;
+						std::cerr << "r1=" << r1 << std::endl;
+						std::cerr << "r2=" << r2 << std::endl;
+						std::cerr << "r3=" << r3 << std::endl;
+						#endif
+
+						swapParallel(am,ae,bb,bm,threads);
+
+						S.push(MergeInfo<iterator>(ab+r0+r2,ab+r0+r2+r1,ab+r0+r2+r1+r3,right));
+						S.push(MergeInfo<iterator>(ab,ab+r0,ab+r0+r2,left));
+					}
+					else
+					{
+						V.push_back(M);
+					}
+				}
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t i = 0; i < V.size(); ++i )
+					V[i].dispatch(order);
+			}
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void mergeLimitedParallel(
+				iterator ita,
+				iterator itm,
+				iterator ite,
+				temp_iterator itt,
+				std::size_t const tmpsize,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				std::stack < MergeInfo<iterator> > S;
+				std::vector < MergeInfo<iterator> > V;
+				S.push(MergeInfo<iterator>(ita,itm,ite,threads));
+
+				while ( !S.empty() )
+				{
+					MergeInfo<iterator> M = S.top();
+					S.pop();
+
+					if ( M.parts > 1 )
+					{
+						std::size_t const left = (M.parts + 1) / 2;
+						std::size_t const right = M.parts - left;
+						std::size_t const ln = M.e-M.a;
+						std::size_t const split = (left * ln + M.parts- 1) / M.parts;
+						assert ( split <= ln );
+
+						#if 0
+						std::cerr << "parts=" << M.parts
+							<< " left=" << left
+							<< " right=" << right
+							<< " ln=" << ln
+							<< " split=" << split
+							<< std::endl;
+						#endif
+
+						auto const Vsplit =
+							rankSearchMulti(
+								std::vector< std::pair<iterator,iterator> >(
+									{
+										std::make_pair(M.a,M.m),
+										std::make_pair(M.m,M.e)
+									}
+								),
+								split,
+								order
+							);
+
+						iterator const ab = M.a;
+						iterator const am = Vsplit.front();
+						iterator const ae = M.m;
+						iterator const bb = M.m;
+						iterator const bm = Vsplit.back();
+						iterator const be = M.e;
+
+						std::size_t const r0 = am-ab;
+						std::size_t const r1 = ae-am;
+						std::size_t const r2 = bm-bb;
+						std::size_t const r3 = be-bm;
+
+						#if 0
+						std::cerr << "r0=" << r0 << std::endl;
+						std::cerr << "r1=" << r1 << std::endl;
+						std::cerr << "r2=" << r2 << std::endl;
+						std::cerr << "r3=" << r3 << std::endl;
+						#endif
+
+						swapParallel(am,ae,bb,bm,threads);
+
+						S.push(MergeInfo<iterator>(ab+r0+r2,ab+r0+r2+r1,ab+r0+r2+r1+r3,right));
+						S.push(MergeInfo<iterator>(ab,ab+r0,ab+r0+r2,left));
+					}
+					else
+					{
+						V.push_back(M);
+					}
+				}
+
+				std::size_t const tmpperthread = (tmpsize + threads - 1)/threads;
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t i = 0; i < V.size(); ++i )
+				{
+					std::size_t const tmplow = i * tmpperthread;
+					std::size_t const tmphigh = std::min(tmplow + tmpperthread, tmpsize);
+
+					V[i].dispatch(itt+tmplow,tmphigh-tmplow,order);
+				}
+			}
+
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void mergeLimited(
+				iterator ita,
+				iterator itm,
+				iterator ite,
+				temp_iterator itt,
+				std::size_t const limit,
+				order_type const & order = order_type()
+			)
+			{
+				typedef std::tuple<iterator,iterator,iterator> element;
+				std::stack<element> S;
+				auto const d = std::distance(ita,ite);
+
+				if ( d > 1 )
+					S.push(element(ita,itm,ite));
+
+				while ( ! S.empty() )
+				{
+					element E = S.top();
+					S.pop();
+
+					auto const d = std::distance(std::get<0>(E),std::get<2>(E));
+
+					if ( d <= static_cast<decltype(d)>(limit) )
+					{
+						std::merge(std::get<0>(E),std::get<1>(E),std::get<1>(E),std::get<2>(E),itt);
+						std::copy(itt,itt+d,std::get<0>(E));
+					}
+					else
+					{
+						std::vector<std::pair<iterator,iterator> > V(
+							{
+								std::make_pair(std::get<0>(E),std::get<1>(E)),
+								std::make_pair(std::get<1>(E),std::get<2>(E))
+							}
+						);
+
+
+						assert ( d > 1 );
+
+						std::vector<iterator> const R = rankSearchMulti(V,d/2,order);
+
+						auto const I0 = std::make_pair(V.front().first,R.front());
+						auto const I1 = std::make_pair(R.front(),V.front().second);
+						auto const I2 = std::make_pair(V.back().first,R.back());
+						auto const I3 = std::make_pair(R.back(),V.back().second);
+
+						assert ( I0.second == I1.first );
+						assert ( I1.second == I2.first );
+						assert ( I2.second == I3.first );
+
+						swap(I1.first,I1.second,I1.second,I2.second);
+
+						// I0,I2,I1,I3
+						auto const n_0 = std::distance(I0.first,I0.second);
+						auto const n_1 = std::distance(I1.first,I1.second);
+						auto const n_2 = std::distance(I2.first,I2.second);
+						auto const n_3 = std::distance(I3.first,I3.second);
+						auto const n_left  = n_0 + n_2;
+						auto const n_right = n_1 + n_3;
+
+						if ( n_right > 1 )
+						{
+							S.push(element(I0.first + n_0 + n_2,I0.first + n_0 + n_2 + n_1, I0.first + n_0 + n_2 + n_1 + n_3));
+						}
+						if ( n_left > 1 )
+						{
+							S.push(element(I0.first,I0.first + n_0,I0.first + n_0 + n_2));
+						}
+					}
+				}
+			}
+
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::vector<
+				MergeRequest<
+					const_iterator,
+					temp_iterator,
+					order_type
+				>
+			> planMerge(
+				const_iterator ab,
+				const_iterator ae,
+				const_iterator bb,
+				const_iterator be,
+				temp_iterator itt,
+				std::size_t const parts,
+				bool const parallelSearch,
+				order_type const & order = order_type()
+			)
+			{
+				auto const n = std::distance(ab,ae);
+				auto const m = std::distance(bb,be);
+				auto const s = n + m;
+
+				if ( ! s )
+					return std::vector< MergeRequest<const_iterator,temp_iterator,order_type> >();
+
+				auto const perthread = (s + parts - 1 ) / parts;
+				auto const numpack = (s + perthread - 1) / perthread;
+
+				std::vector< std::pair<const_iterator,const_iterator> > const Vsplitin(
+					{
+						std::make_pair(ab,ae),
+						std::make_pair(bb,be)
+					}
+				);
+
+				std::vector < std::vector<const_iterator> > VV(numpack+1);
+
+				if ( parallelSearch )
+				{
+					#if defined(_OPENMP)
+					#pragma omp parallel for num_threads(parallelSearch ? parts : 1) schedule(dynamic,1)
+					#endif
+					for ( std::size_t t = 0; t < numpack; ++t )
+						VV[t] = rankSearchMulti(Vsplitin,t * perthread,order);
+				}
+				else
+				{
+					for ( std::size_t t = 0; t < numpack; ++t )
+						VV[t] = rankSearchMulti(Vsplitin,t * perthread,order);
+				}
+				VV[numpack] = rankSearchMulti(Vsplitin,s,order);
+
+				std::vector < MergeRequest<const_iterator,temp_iterator,order_type> > VREQ;
+				for ( std::size_t t = 0; t < numpack; ++t )
+				{
+					auto const lab = VV.at(t+0).front();
+					auto const lae = VV.at(t+1).front();
+					auto const lbb = VV.at(t+0).back();
+					auto const lbe = VV.at(t+1).back();
+					auto const o = t * perthread;
+
+					assert ( (lab-ab) + (lbb-bb) == static_cast<decltype(n)>(o) );
+
+					VREQ.push_back(MergeRequest<const_iterator,temp_iterator,order_type>(lab,lae,lbb,lbe,itt + o,&order));
+				}
+
+				return VREQ;
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::vector<
+				MergeRequestDelayedSearch<
+					const_iterator,
+					temp_iterator,
+					order_type
+				>
+			> planMergeDelayedSearch(
+				const_iterator ab,
+				const_iterator ae,
+				const_iterator bb,
+				const_iterator be,
+				temp_iterator itt,
+				std::size_t const parts,
+				order_type const & order = order_type()
+			)
+			{
+				auto const n = std::distance(ab,ae);
+				auto const m = std::distance(bb,be);
+				auto const s = n + m;
+
+				if ( ! s )
+					return std::vector< MergeRequestDelayedSearch<const_iterator,temp_iterator,order_type> >();
+
+				auto const perthread = (s + parts - 1 ) / parts;
+				auto const numpack = (s + perthread - 1) / perthread;
+
+				std::vector < MergeRequestDelayedSearch<const_iterator,temp_iterator,order_type> > VREQ;
+				for ( std::size_t t = 0; t < numpack; ++t )
+				{
+					auto const from = t * perthread;
+					auto const to = (t+1 == numpack) ? s : (from+perthread);
+					VREQ.push_back(MergeRequestDelayedSearch<const_iterator,temp_iterator,order_type>(ab,ae,bb,be,itt,from,to,&order));
+				}
+
+				return VREQ;
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::size_t planMergeDelayedSearchP(
+				std::vector < std::shared_ptr<SortOp> > & Vop,
+				const_iterator ab,
+				const_iterator ae,
+				const_iterator bb,
+				const_iterator be,
+				temp_iterator itt,
+				std::size_t const parts,
+				order_type const & order = order_type()
+			)
+			{
+				auto const n = std::distance(ab,ae);
+				auto const m = std::distance(bb,be);
+				auto const s = n + m;
+
+				if ( ! s )
+					return 0;
+
+				auto const perthread = (s + parts - 1 ) / parts;
+				auto const numpack = (s + perthread - 1) / perthread;
+
+				for ( std::size_t t = 0; t < numpack; ++t )
+				{
+					auto const from = t * perthread;
+					auto const to = (t+1 == numpack) ? s : (from+perthread);
+					std::shared_ptr< SortOp > ptr(
+						new MergeRequestDelayedSearch<const_iterator,temp_iterator,order_type>(ab,ae,bb,be,itt,from,to,&order)
+					);
+					Vop.emplace_back(ptr);
+				}
+
+				return numpack;
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::vector<
+				MergeRequest<
+					const_iterator,
+					temp_iterator,
+					order_type
+				>
+			> planMerge(
+				const_iterator ab,
+				const_iterator ac,
+				const_iterator ae,
+				temp_iterator itt,
+				std::size_t const parts,
+				bool const parallelSearch,
+				order_type const & order = order_type()
+			)
+			{
+				return planMerge(ab,ac,ac,ae,itt,parts,parallelSearch,order);
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static std::vector<
+				MergeRequestDelayedSearch<
+					const_iterator,
+					temp_iterator,
+					order_type
+				>
+			> planMergeDelayedSearch(
+				const_iterator ab,
+				const_iterator ac,
+				const_iterator ae,
+				temp_iterator itt,
+				std::size_t const parts,
+				order_type const & order = order_type()
+			)
+			{
+				return planMergeDelayedSearch(ab,ac,ac,ae,itt,parts,order);
+			}
+
+			template<
+				typename iterator,
+				typename temp_iterator
+			>
+			static std::size_t
+				planCopyBack(
+				std::vector < std::shared_ptr<SortOp> > & Vop,
+				iterator ab,
+				iterator ae,
+				temp_iterator itt,
+				std::size_t const parts
+			)
+			{
+				auto const n = std::distance(ab,ae);
+
+				if ( ! n )
+					return 0;
+
+				std::size_t const perthread = (n + parts - 1)/parts;
+				std::size_t const numpacks = (n + perthread - 1)/perthread;
+
+				for ( std::size_t i = 0; i < numpacks; ++i )
+				{
+					auto const low = i * perthread;
+					auto const high = (i+1 == numpacks) ? n : low+perthread;
+
+					std::shared_ptr<SortOp> ptr(
+						new CopyRequest<temp_iterator,iterator>(
+							itt + low,
+							itt + high,
+							ab + low
+						)
+					);
+
+					Vop.emplace_back(ptr);
+				}
+
+				return numpacks;
+			}
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static
+				std::pair
+				<
+					std::vector
+					<
+						MergeRequest<
+							iterator,
+							temp_iterator,
+							order_type
+						>
+					>,
+					std::vector
+					<
+						CopyRequest<
+							temp_iterator,
+							iterator
+						>
+					>
+				>
+				planMergeCopyBack(
+				iterator ab,
+				iterator ac,
+				iterator ae,
+				temp_iterator itt,
+				std::size_t const parts,
+				bool const parallelSearch,
+				order_type const & order = order_type()
+			)
+			{
+				auto const Vmerge = planMerge(ab,ac,ae,itt,parts,parallelSearch,order);
+				auto const gn = std::distance(ab,ae);
+				typename std::remove_const<decltype(gn)>::type o = 0;
+				std::vector< CopyRequest<temp_iterator,iterator> > Vcopy(Vmerge.size());
+
+				for ( std::size_t i = 0; i < Vmerge.size(); ++i )
+				{
+					auto const & req = Vmerge[i];
+					auto const n = (req.ae-req.ab) + (req.be-req.bb);
+
+					auto const from = o;
+					auto const to   = from + n;
+
+					Vcopy[i] = CopyRequest<temp_iterator,iterator>(itt+from,itt+to,ab+from);
+
+					o = to;
+				}
+
+				assert ( o == gn );
+
+				return std::make_pair(Vmerge,Vcopy);
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static void mergeParallel(
+				const_iterator ab,
+				const_iterator ae,
+				const_iterator bb,
+				const_iterator be,
+				temp_iterator itt,
+				std::size_t const threads,
+				bool const parallelSearch,
+				order_type const & order = order_type()
+			)
+			{
+				auto VREQ = planMerge(ab,ae,bb,be,itt,threads,parallelSearch,order);
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t t = 0; t < VREQ.size(); ++t )
+					VREQ[t].dispatch();
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static void mergeParallelDelayedSearch(
+				const_iterator ab,
+				const_iterator ae,
+				const_iterator bb,
+				const_iterator be,
+				temp_iterator itt,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				auto VREQ = planMergeDelayedSearch(ab,ae,bb,be,itt,threads,order);
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t t = 0; t < VREQ.size(); ++t )
+					VREQ[t].dispatch();
+			}
+
+			template<
+				typename const_iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							const_iterator
+						>::value_type
+					>
+			>
+			static void mergeParallelCopyBack(
+				const_iterator ab,
+				const_iterator ac,
+				const_iterator ae,
+				temp_iterator itt,
+				std::size_t const threads,
+				bool const parallelSearch,
+				order_type const & order = order_type()
+			)
+			{
+				auto PREQCOPY = planMergeCopyBack(ab,ac,ae,itt,threads,parallelSearch,order);
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t t = 0; t < PREQCOPY.first.size(); ++t )
+					PREQCOPY.first[t].dispatch();
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( std::size_t t = 0; t < PREQCOPY.second.size(); ++t )
+					PREQCOPY.second[t].dispatch();
+			}
+
+			struct SortFinishedInfoInterface
+			{
+				virtual ~SortFinishedInfoInterface() {}
+			};
+
+			struct SortFinishedInterface
+			{
+				virtual ~SortFinishedInterface() {}
+				virtual void sortFinished(std::shared_ptr<SortFinishedInfoInterface>) = 0;
+			};
+
+			struct SortInfo
+			{
+				std::vector < std::vector < std::shared_ptr<SortOp> > > Vop;
+
+				void dispatchOMP(std::size_t const threads)
+				{
+					std::atomic<int> parfailed(0);
+
+					for ( std::size_t i = 0; (! parfailed.load()) && i < Vop.size(); ++i )
+					{
+						std::vector < std::shared_ptr<SortOp> > & Lop = Vop.at(i);
+
+						#if defined(_OPENMP)
+						#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+						#endif
+						for ( std::size_t t = 0; t < Lop.size(); ++t )
+						{
+							try
+							{
+								Lop[t]->dispatch();
+							}
+							catch(...)
+							{
+								parfailed.store(1);
+							}
+						}
+					}
+				}
+			};
+
+			struct SortControl
+			{
+				std::shared_ptr<SortInfo> SI;
+				std::vector < std::shared_ptr < std::atomic < std::size_t > > > VA;
+
+				SortControl(std::shared_ptr<SortInfo> rSI)
+				: SI(rSI), VA(SI->Vop.size())
+				{
+					for ( std::size_t i = 0; i < SI->Vop.size(); ++i )
+					{
+						std::shared_ptr< std::atomic<std::size_t> > ptr(
+							new std::atomic<std::size_t>(SI->Vop[i].size())
+						);
+						VA[i] = ptr;
+					}
+				}
+
+				bool dec(std::size_t const j)
+				{
+					std::atomic<std::size_t> & A = *(VA[j]);
+					std::size_t const r = --A;
+					return r == 0;
+				}
+			};
+
+			struct SortPackageData : public libmaus2::parallel::threadpool::ThreadWorkPackageData
+			{
+				libmaus2::util::atomic_shared_ptr<SortControl> SC;
+				std::pair < std::atomic<std::size_t>, std::atomic<std::size_t> > P;
+				std::shared_ptr<SortFinishedInfoInterface> finishedInfo;
+				std::atomic<SortFinishedInterface *> finishedInterface;
+
+				SortPackageData() : SC(), P(std::make_pair(0,0)), finishedInfo(), finishedInterface(nullptr) {}
+			};
+
+			struct SortPackageDispatcher : public libmaus2::parallel::threadpool::ThreadPoolDispatcher
+			{
+				virtual void dispatch(
+					libmaus2::util::shared_ptr<libmaus2::parallel::threadpool::ThreadWorkPackage> package,
+					libmaus2::parallel::threadpool::ThreadPool * pool
+				)
+				{
+					SortPackageData & SPD = dynamic_cast<SortPackageData &>(*(package->data.load()));
+
+					std::size_t const first = SPD.P.first.load();
+					std::size_t const second = SPD.P.second.load();
+
+					SPD.SC.load()->SI->Vop.at(first).at(second)->dispatch();
+
+					bool const leveldone = SPD.SC.load()->dec(first);
+
+					if ( leveldone )
+					{
+						std::size_t const nextlevel = first+1;
+
+						if ( nextlevel < SPD.SC.load()->SI->Vop.size() )
+						{
+							std::size_t const levelsize = SPD.SC.load()->SI->Vop.at(nextlevel).size();
+
+							for ( size_t i = 0; i < levelsize; ++i )
+							{
+								libmaus2::util::shared_ptr<libmaus2::parallel::threadpool::ThreadWorkPackage> newpack(pool->getPackage<SortPackageData>());
+								SortPackageData & newSPD = dynamic_cast<SortPackageData &>(*(newpack->data.load()));
+								newSPD.SC.store(SPD.SC.load());
+								newSPD.P.first.store(nextlevel);
+								newSPD.P.second.store(i);
+								newSPD.finishedInfo = SPD.finishedInfo;
+								newSPD.finishedInterface.store(SPD.finishedInterface.load());
+								pool->enqueue(newpack);
+							}
+						}
+						else
+						{
+							SPD.finishedInterface.load()->sortFinished(SPD.finishedInfo);
+						}
+					}
+				}
+			};
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void sortWithTempSpace(
+				iterator ab,
+				iterator ae,
+				temp_iterator itt,
+				libmaus2::parallel::threadpool::ThreadPool & TP,
+				std::shared_ptr<SortFinishedInfoInterface> finishedInfo,
+				SortFinishedInterface * finishedInterface,
+				order_type const & order = order_type()
+			)
+			{
+				auto const n = std::distance(ab,ae);
+
+				TP.ensureDispatcherRegistered<SortPackageData>(libmaus2::util::shared_ptr<libmaus2::parallel::threadpool::ThreadPoolDispatcher>(new SortPackageDispatcher));
+
+				if ( ! n )
+				{
+					finishedInterface->sortFinished(finishedInfo);
+					return;
+				}
+
+				std::shared_ptr<SortInfo> SI = planSortWithTempSpace(ab,ae,itt,TP.threads,order);
+				libmaus2::util::shared_ptr<SortControl> SC(new SortControl(SI));
+
+				assert ( SI->Vop.size() );
+
+				std::size_t const nextlevel = 0;
+				std::size_t const levelsize = SI->Vop.at(nextlevel).size();
+
+				for ( size_t i = 0; i < levelsize; ++i )
+				{
+					libmaus2::util::shared_ptr<libmaus2::parallel::threadpool::ThreadWorkPackage> newpack(TP.getPackage<SortPackageData>());
+					SortPackageData & newSPD = dynamic_cast<SortPackageData &>(*(newpack->data.load()));
+					newSPD.SC.store(SC);
+					newSPD.P.first.store(nextlevel);
+					newSPD.P.second.store(i);
+					newSPD.finishedInfo = finishedInfo;
+					newSPD.finishedInterface.store(finishedInterface);
+					TP.enqueue(newpack);
+				}
+			}
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void sortWithTempSpaceViaThreadPool(
+				iterator ab,
+				iterator ae,
+				temp_iterator itt,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				libmaus2::parallel::threadpool::ThreadPool TP(threads);
+				TP.registerDispatcher<SortPackageData>(libmaus2::util::shared_ptr<libmaus2::parallel::threadpool::ThreadPoolDispatcher>(new SortPackageDispatcher));
+
+				struct SortFinishedInfo : public SortFinishedInfoInterface
+				{
+					libmaus2::parallel::threadpool::ThreadPool * TP;
+
+					SortFinishedInfo(libmaus2::parallel::threadpool::ThreadPool * rTP)
+					: TP(rTP)
+					{
+					}
+				};
+
+				std::shared_ptr<SortFinishedInfoInterface> info(new SortFinishedInfo(&TP));
+
+				struct SortFinished : public SortFinishedInterface
+				{
+					virtual void sortFinished(std::shared_ptr<SortFinishedInfoInterface> ptr)
+					{
+						SortFinishedInfo & info = dynamic_cast<SortFinishedInfo &>(*ptr);
+						info.TP->terminate();
+					}
+				};
+
+				SortFinished SF;
+
+				sortWithTempSpace(ab,ae,itt,TP,info,&SF,order);
+
+				TP.join();
+			}
+
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static std::shared_ptr<SortInfo> planSortWithTempSpace(
+				iterator ab,
+				iterator ae,
+				temp_iterator itt,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				std::shared_ptr<SortInfo> SI(new SortInfo);
+
+				auto const n = std::distance(ab,ae);
+
+				if ( ! n )
+					return SI;
+
+				typedef typename std::remove_const<decltype(n)>::type index_type;
+
+				index_type const packsize = (n + threads - 1) / threads;
+				index_type const numpacks = (n + packsize - 1) / packsize;
+
+				std::vector < std::pair<index_type,index_type> > Vblock(numpacks);
+
+				std::vector < std::vector < std::shared_ptr<SortOp> > > & Vop = SI->Vop;
+
+				{
+					std::vector < std::shared_ptr<SortOp> > Lop(numpacks);
+
+					for ( index_type t = 0; t < numpacks; ++t )
+					{
+						index_type const low  = t * packsize;
+						index_type const high = std::min(static_cast<index_type>(low+packsize),n);
+
+						Vblock[t] = std::make_pair(low,high);
+
+						std::shared_ptr<SortOp> ptr(
+							new SortRequest<iterator,order_type>(ab+low,ab+high,&order)
+						);
+
+						Lop[t] = ptr;
+					}
+
+					Vop.emplace_back(Lop);
+				}
+
+
+				while ( Vblock.size() > 1 )
+				{
+					std::vector < std::pair<index_type,index_type> > Nblock;
+
+					std::vector < std::shared_ptr<SortOp> > Lopmerge;
+					std::vector < std::shared_ptr<SortOp> > Lopcopy;
+
+					std::size_t i = 0;
+
+					for ( ; i + 1 < Vblock.size(); i += 2 )
+					{
+						index_type const low = Vblock[i].first;
+						index_type const middle = Vblock[i].second;
+						assert ( middle == Vblock[i+1].first );
+						index_type const high = Vblock[i+1].second;
+
+						planMergeDelayedSearchP(
+							Lopmerge,
+							ab+low,
+							ab+middle,
+							ab+middle,
+							ab+high,
+							itt+low,
+							threads,
+							order
+						);
+
+						planCopyBack(
+							Lopcopy,
+							ab+low,
+							ab+high,
+							itt+low,
+							threads
+						);
+
+						Nblock.push_back(std::make_pair(low,high));
+					}
+
+					Vop.emplace_back(Lopmerge);
+					Vop.emplace_back(Lopcopy);
+
+					if ( i < Vblock.size() )
+						Nblock.push_back(Vblock[i++]);
+
+					assert ( i == Vblock.size() );
+
+					Nblock.swap(Vblock);
+				}
+
+				return SI;
+			}
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void sortWithTempSpace(
+				iterator ab,
+				iterator ae,
+				temp_iterator itt,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				std::shared_ptr<SortInfo> SI = planSortWithTempSpace(ab,ae,itt,threads,order);
+				SI->dispatchOMP(threads);
+			}
+
+			template<
+				typename iterator,
+				typename temp_iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void sortLimited(
+				iterator ab,
+				iterator ae,
+				temp_iterator itt,
+				std::size_t const limit,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				auto const n = std::distance(ab,ae);
+
+				typedef typename std::remove_const<decltype(n)>::type index_type;
+
+				if ( ! n )
+					return;
+
+				index_type const packsize = (n + threads - 1) / threads;
+				index_type const numpacks = (n + packsize - 1) / packsize;
+
+				std::vector < std::pair<index_type,index_type> > Vblock(numpacks);
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( index_type t = 0; t < numpacks; ++t )
+				{
+					index_type const low  = t * packsize;
+					index_type const high = std::min(static_cast<index_type>(low+packsize),n);
+
+					Vblock[t] = std::make_pair(low,high);
+
+					std::sort(ab + low, ab + high, order);
+				}
+
+				while ( Vblock.size() > 1 )
+				{
+					std::vector < std::pair<index_type,index_type> > Nblock;
+
+					std::size_t i = 0;
+
+					for ( ; i + 1 < Vblock.size(); i += 2 )
+					{
+						index_type const low = Vblock[i].first;
+						index_type const middle = Vblock[i].second;
+						assert ( middle == Vblock[i+1].first );
+						index_type const high = Vblock[i+1].second;
+
+						mergeLimitedParallel(
+							ab + low,ab + middle,ab + high,
+							itt,
+							limit,
+							threads,
+							order);
+
+						Nblock.push_back(std::make_pair(low,high));
+					}
+
+					if ( i < Vblock.size() )
+						Nblock.push_back(Vblock[i++]);
+
+					assert ( i == Vblock.size() );
+
+					Nblock.swap(Vblock);
+				}
+			}
+
+			template<
+				typename iterator,
+				typename order_type =
+					std::less<
+						typename std::iterator_traits<
+							iterator
+						>::value_type
+					>
+			>
+			static void sortInPlace(
+				iterator ab,
+				iterator ae,
+				std::size_t const threads,
+				order_type const & order = order_type()
+			)
+			{
+				auto const n = std::distance(ab,ae);
+
+				typedef typename std::remove_const<decltype(n)>::type index_type;
+
+				if ( ! n )
+					return;
+
+				index_type const packsize = (n + threads - 1) / threads;
+				index_type const numpacks = (n + packsize - 1) / packsize;
+
+				std::vector < std::pair<index_type,index_type> > Vblock(numpacks);
+
+				#if defined(_OPENMP)
+				#pragma omp parallel for num_threads(threads) schedule(dynamic,1)
+				#endif
+				for ( index_type t = 0; t < numpacks; ++t )
+				{
+					index_type const low  = t * packsize;
+					index_type const high = std::min(static_cast<index_type>(low+packsize),n);
+
+					Vblock[t] = std::make_pair(low,high);
+
+					std::sort(ab + low, ab + high, order);
+				}
+
+				while ( Vblock.size() > 1 )
+				{
+					std::vector < std::pair<index_type,index_type> > Nblock;
+
+					std::size_t i = 0;
+
+					for ( ; i + 1 < Vblock.size(); i += 2 )
+					{
+						index_type const low = Vblock[i].first;
+						index_type const middle = Vblock[i].second;
+						assert ( middle == Vblock[i+1].first );
+						index_type const high = Vblock[i+1].second;
+
+						mergeInPlaceParallel(
+							ab + low,ab + middle,ab + high,
+							threads,
+							order);
+
+						Nblock.push_back(std::make_pair(low,high));
+					}
+
+					if ( i < Vblock.size() )
+						Nblock.push_back(Vblock[i++]);
+
+					assert ( i == Vblock.size() );
+
+					Nblock.swap(Vblock);
+				}
+			}
+		};
+	}
+}
+#endif
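
For context: the new RankSearch.hpp above builds its parallel merge (planMerge/mergeParallel) on a rank-search ("co-rank") split: the merged output is cut into evenly sized chunks, a binary search finds for each chunk boundary how many elements come from each sorted input, and the chunks are then merged independently. The following standalone sketch only illustrates that idea; it is not the libmaus2 code, and the names coRank and mergeParallelSketch are made up for the illustration (ties are broken towards the first input so the split is unique and monotone in the output position):

	#include <algorithm>
	#include <cstddef>
	#include <iostream>
	#include <vector>

	// coRank(k,A,B): number of elements taken from A when the k smallest elements
	// of A and B are selected; ties go to A, so the split is unique and monotone in k.
	std::size_t coRank(std::size_t const k, std::vector<int> const & A, std::vector<int> const & B)
	{
		std::size_t low  = (k > B.size()) ? (k - B.size()) : 0;
		std::size_t high = std::min(k, A.size());

		while ( low < high )
		{
			std::size_t const i = low + (high - low) / 2; // candidate count from A
			std::size_t const j = k - i;                  // corresponding count from B

			if ( j > 0 && i < A.size() && !(B[j-1] < A[i]) )
				low = i + 1;        // too few elements taken from A
			else if ( i > 0 && j < B.size() && A[i-1] > B[j] )
				high = i;           // too many elements taken from A
			else
				return i;           // valid split found
		}

		return low;
	}

	// merge two sorted vectors into out using independent, equally sized output chunks
	void mergeParallelSketch(std::vector<int> const & A, std::vector<int> const & B, std::vector<int> & out, std::size_t const parts)
	{
		std::size_t const s = A.size() + B.size();
		out.resize(s);

		if ( ! s )
			return;

		std::size_t const perthread = (s + parts - 1) / parts;
		std::size_t const numpack = (s + perthread - 1) / perthread;

		#if defined(_OPENMP)
		#pragma omp parallel for schedule(dynamic,1)
		#endif
		for ( std::size_t t = 0; t < numpack; ++t )
		{
			std::size_t const klow  = t * perthread;
			std::size_t const khigh = std::min(klow + perthread, s);
			std::size_t const alow  = coRank(klow , A, B);
			std::size_t const ahigh = coRank(khigh, A, B);

			// each chunk merges its slice of A and B directly to its output offset
			std::merge(
				A.begin() + alow,          A.begin() + ahigh,
				B.begin() + (klow - alow), B.begin() + (khigh - ahigh),
				out.begin() + klow
			);
		}
	}

	int main()
	{
		std::vector<int> const A({1,3,5,7,9,9});
		std::vector<int> const B({2,4,6,6,8,10});
		std::vector<int> out;

		mergeParallelSketch(A,B,out,3);

		for ( auto const v : out )
			std::cout << v << ' ';
		std::cout << std::endl;

		return 0;
	}

The library goes further than this sketch by packaging each chunk as a request object (MergeRequest, MergeRequestDelayedSearch, CopyRequest) that can be dispatched either via OpenMP or via the thread pool, but the splitting principle is the same.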


=====================================
src/libmaus2/vcf/VCFParser.hpp
=====================================
@@ -1274,7 +1274,9 @@ namespace libmaus2
 					if ( TSC0.size() != TSC1.size() )
 					{
 						libmaus2::exception::LibMausException lme;
-						lme.getStream() << "[E] malformed VCF line " <<
+						lme.getStream() << "[E] malformed VCF line (sample data for sample "
+							<< samplenum << " has " << TSC1.size() << " fields while format field has "
+							<< TSC0.size() << ") " <<
 							std::string(a,T.get(T.size()-1,a).second) << std::endl;
 						lme.finish();
 						throw lme;
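
For illustration (not part of the patch): TSC0 holds the colon-separated keys of the FORMAT column and TSC1 the colon-separated values of one sample column, so the check fires when a sample column does not match its FORMAT declaration. A record like the following, whose FORMAT declares a single field (GT) while the sample column carries two values, would now be reported with both counts rather than the bare "malformed VCF line" text:

    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	sample1
    chr1	100	.	A	G	50	PASS	.	GT	0/1:35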


=====================================
src/test/testThreadPoolCramEncode.cpp
=====================================
@@ -21,6 +21,9 @@
 #include <libmaus2/util/GetFileSize.hpp>
 #include <libmaus2/parallel/NumCpus.hpp>
 #include <regex>
+#include <filesystem>
+#include <libmaus2/fastx/FastAReader.hpp>
+#include <libmaus2/digest/md5.hpp>
 
 int main(int argc, char * argv[])
 {
@@ -48,7 +51,7 @@ int main(int argc, char * argv[])
 			return EXIT_FAILURE;
 		}
 
-		char * ref_cache = getenv("REF_CACHE");
+		char const * ref_cache = getenv("REF_CACHE");
 
 		if ( ! ref_cache )
 		{
@@ -56,8 +59,75 @@ int main(int argc, char * argv[])
 			return EXIT_FAILURE;
 		}
 
+		std::filesystem::path ref_cache_path(ref_cache);
+		std::set<std::filesystem::path> Spathseen;
+
+		while ( std::filesystem::exists(ref_cache_path) && std::filesystem::is_symlink(ref_cache_path) )
+		{
+			if ( Spathseen.find(ref_cache_path) != Spathseen.end() )
+			{
+				std::cerr << "[E] circular loop of symlinks detected for REF_CACHE path" << std::endl;
+				return EXIT_FAILURE;
+			}
+
+			Spathseen.insert(ref_cache_path);
+
+			// resolve relative symlink targets against the symlink's directory
+			auto const n_ref_cache_path = std::filesystem::read_symlink(ref_cache_path);
+
+			ref_cache_path = n_ref_cache_path.is_absolute() ? n_ref_cache_path : (ref_cache_path.parent_path() / n_ref_cache_path);
+		}
+
+		if ( !std::filesystem::exists(ref_cache_path) )
+		{
+			std::cerr << "[E] REF_CACHE directory " << ref_cache_path << " does not exist" << std::endl;
+			return EXIT_FAILURE;
+		}
+
+		// std::cerr << "[V] using REF_CACHE path " << ref_cache_path << std::endl;
+
 		libmaus2::parallel::threadpool::bam::BamHeaderSeqData seqdata(dictfn);
 
+		if ( seqdata.sequencesMissing(ref_cache_path) )
+		{
+			std::cerr << "[V] scanning FastA file " << ref << " as REF_CACHE is incomplete" << std::endl;
+
+			libmaus2::fastx::FastAReader FA(ref);
+			libmaus2::fastx::FastAReader::pattern_type pat;
+
+			while ( FA.getNextPatternUnlocked(pat) )
+			{
+				std::size_t o = 0;
+				std::string & s = pat.spattern;
+				for ( std::size_t i = 0; i < s.size(); ++i )
+					if ( s[i] >= 33 && s[i] <= 126 )
+						s[o++] = ::toupper(s[i]);
+				s.resize(o);
+
+				std::string digest;
+				bool const ok = libmaus2::util::MD5::md5(s,digest);
+
+				if ( ! ok )
+				{
+					std::cerr << "[E] failed to compute md5 checksum for " << pat.sid << " in " << ref << std::endl;
+					return EXIT_FAILURE;
+				}
+
+				std::filesystem::path dictpath = ref_cache_path;
+				dictpath /= digest;
+
+				std::cerr << "[V] writing " << dictpath << " for " << pat.sid << " in " << ref << std::endl;
+				libmaus2::aio::OutputStreamInstance OSI(static_cast<std::string>(dictpath));
+				OSI << s;
+				OSI.flush();
+			}
+		}
+
+		if ( seqdata.sequencesMissing(ref_cache_path) )
+		{
+			std::cerr << "[E] unable to create required REF_CACHE files (mismatch between dictionary and FastA?)" << std::endl;
+			return EXIT_FAILURE;
+		}
+
 		std::size_t const hwthreads = libmaus2::parallel::NumCpus::getNumLogicalProcessors();
 
 		std::size_t argthreads = arg.getParsedArgOrDefault<std::size_t>("t", hwthreads);
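
For context: the cache files written above follow the SAM specification's M5 convention, i.e. each reference sequence is stored under the MD5 of its normalised bases (everything outside the printable range '!'..'~' removed, the rest upper-cased), which is what the filtering loop over pat.spattern implements before calling libmaus2::util::MD5::md5. A minimal standalone sketch of that normalisation step (the helper name is made up for the illustration):

	#include <cctype>
	#include <string>

	// keep only characters in [33,126] ('!'..'~') and upper-case them;
	// the MD5 of the result is the REF_CACHE file name for the sequence
	std::string normaliseForM5(std::string const & in)
	{
		std::string out;
		out.reserve(in.size());

		for ( char const c : in )
			if ( c >= 33 && c <= 126 )
				out.push_back(static_cast<char>(std::toupper(static_cast<unsigned char>(c))));

		return out;
	}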


=====================================
src/test/testsort.cpp
=====================================
@@ -16,7 +16,7 @@
     You should have received a copy of the GNU General Public License
     along with this program.  If not, see <http://www.gnu.org/licenses/>..
 */
-
+#include <libmaus2/sorting/RankSearch.hpp>
 #include <libmaus2/huffman/LFSupportDecoder.hpp>
 
 #include <libmaus2/huffman/LFRankPosEncoder.hpp>
@@ -767,9 +767,673 @@ void testQuickSort()
 	}
 }
 
+
 int main()
 {
+		{
+			std::cerr << "[V] running sort..." << std::endl;
+
+			std::size_t const numtries = 4096;
+			std::size_t const maxlen = 64*1024;
+			int const mod = 8192;
+
+			for ( std::size_t z = 0; z < numtries; ++z )
+			{
+				std::size_t const n = libmaus2::random::Random::rand64() % (maxlen+1);
+
+				// std::size_t const r = n-l;
+
+				std::vector<int> V;
+				for ( std::size_t j = 0; j < n; ++j )
+					V.push_back(libmaus2::random::Random::rand64() % mod);
+				decltype(V) const VR = V;
+
+				decltype(V) VT = VR;
+
+				std::size_t const threads = 10;
+
+				for ( std::size_t z = 0; z < 4; ++z )
+				{
+					V = VR;
+					decltype(V) VT = VR;
+
+					if ( z == 0 )
+					{
+						std::shared_ptr<int[]> T(new int[n]);
+						libmaus2::sorting::RankSearch::sortWithTempSpace(
+							V.begin(),
+							V.end(),
+							T.get(),
+							threads
+						);
+					}
+					else if ( z == 1 )
+					{
+						std::size_t const ln = 1024;
+						std::shared_ptr<int[]> T(new int[ln]);
+						libmaus2::sorting::RankSearch::sortLimited(
+							V.begin(),
+							V.end(),
+							T.get(),
+							ln,
+							threads
+						);
+					}
+					else if ( z == 2 )
+					{
+						libmaus2::sorting::RankSearch::sortInPlace(
+							V.begin(),
+							V.end(),
+							threads
+						);
+					}
+					else if ( z == 3 )
+					{
+						std::shared_ptr<int[]> T(new int[n]);
+						libmaus2::sorting::RankSearch::sortWithTempSpaceViaThreadPool(
+							V.begin(),
+							V.end(),
+							T.get(),
+							threads
+						);
+
+					}
+
+
+					for ( std::size_t j = 1; j < n; ++j )
+						assert ( V[j-1] <= V[j] );
+
+					std::sort(VT.begin(),VT.end());
+					for ( std::size_t j = 0; j < n; ++j )
+						assert ( VT[j] == V[j] );
+				}
+
+				std::size_t const zp = z+1;
+
+				if ( zp % 1024 == 0 )
+					std::cerr << zp << "/" << numtries << std::endl;
+			}
+
+			std::cerr << "[V] running sort done" << std::endl;
+		}
+
+
+		{
+			std::cerr << "[V] running mergeParallelDelayedSearch..." << std::endl;
+
+			std::size_t const numtries = 4096;
+			std::size_t const maxlen = 64*1024;
+			int const mod = 8192;
+
+			for ( std::size_t z = 0; z < numtries; ++z )
+			{
+				std::size_t const n = libmaus2::random::Random::rand64() % (maxlen+1);
+				std::size_t const l = libmaus2::random::Random::rand64() % (n+1);
+
+				// std::size_t const r = n-l;
+
+				std::vector<int> V;
+				for ( std::size_t j = 0; j < n; ++j )
+					V.push_back(libmaus2::random::Random::rand64() % mod);
+
+				std::sort(V.begin(),V.begin()+l);
+				std::sort(V.begin()+l,V.end());
+
+				std::shared_ptr<int[]> T(new int[n]);
+
+				std::size_t const threads = 10;
+				libmaus2::sorting::RankSearch::mergeParallelDelayedSearch(
+					V.begin(),V.begin()+l,
+					V.begin()+l,V.end(),
+					T.get(),
+					threads
+				);
+
+				for ( std::size_t j = 1; j < n; ++j )
+					assert ( T[j-1] <= T[j] );
+
+				std::sort(V.begin(),V.end());
+				for ( std::size_t j = 0; j < n; ++j )
+					assert ( T[j] == V[j] );
+
+				std::size_t const zp = z+1;
+
+				if ( zp % 1024 == 0 )
+					std::cerr << zp << "/" << numtries << std::endl;
+			}
+
+			std::cerr << "[V] running mergeParallelDelayedSearch done" << std::endl;
+		}
+
+
+	{
+		std::cerr << "running short mergeInPlaceParallel test...";
+
+		std::vector<int> VA({1,3,5,7,9,9});
+		std::vector<int> VB({2,4,6,6,8,10});
+		std::vector<int> V;
+		for ( auto i : VA )
+			V.push_back(i);
+		for ( auto i : VB )
+			V.push_back(i);
+
+		libmaus2::sorting::RankSearch::mergeInPlaceParallel(
+			V.begin(),
+			V.begin()+VA.size(),
+			V.end(),
+			2
+		);
+
+		for ( std::size_t i = 1; i < V.size(); ++i )
+			assert ( V[i-1] <= V[i] );
+
+		std::cerr << "done." << std::endl;
+	}
+
+	{
+		std::cerr << "running random mergeInPlaceParallel test...";
+
+		for ( std::size_t parts = 2; parts <= 16; ++parts )
+		{
+			std::size_t const n0 = 128;
+			std::size_t const n1 = 245;
+			std::vector<int> VA;
+			std::vector<int> VB;
+
+			{
+				int s = libmaus2::random::Random::rand64() % 64;
+
+				for ( std::size_t i = 0; i < n0; ++i )
+				{
+					VA.push_back(s);
+					s += libmaus2::random::Random::rand64() % 4;
+				}
+			}
+			{
+				int s = libmaus2::random::Random::rand64() % 64;
+
+				for ( std::size_t i = 0; i < n1; ++i )
+				{
+					VB.push_back(s);
+					s += libmaus2::random::Random::rand64() % 4;
+				}
+			}
+
+			std::vector<int> V;
+			for ( auto i : VA )
+				V.push_back(i);
+			for ( auto i : VB )
+				V.push_back(i);
+			std::vector<int> VT = V;
+
+			libmaus2::sorting::RankSearch::mergeInPlaceParallel(
+				V.begin(),
+				V.begin()+VA.size(),
+				V.end(),
+				parts
+			);
+
+			for ( std::size_t i = 1; i < V.size(); ++i )
+				assert ( V[i-1] <= V[i] );
+
+			std::sort(VT.begin(),VT.end());
+			for ( std::size_t i = 0; i < V.size(); ++i )
+				assert ( V[i] == VT[i] );
+		}
+
+		std::cerr << "done." << std::endl;
+	}
+
+	{
+		std::cerr << "running random mergeLimitedParallel test...";
+
+		std::size_t const tmpn = 1024;
+		std::shared_ptr<int[]> tptr(new int[tmpn]);
+
+		for ( std::size_t parts = 2; parts <= 16; ++parts )
+		{
+			std::size_t const n0 = 128;
+			std::size_t const n1 = 245;
+			std::vector<int> VA;
+			std::vector<int> VB;
+
+			{
+				int s = libmaus2::random::Random::rand64() % 64;
+
+				for ( std::size_t i = 0; i < n0; ++i )
+				{
+					VA.push_back(s);
+					s += libmaus2::random::Random::rand64() % 4;
+				}
+			}
+			{
+				int s = libmaus2::random::Random::rand64() % 64;
+
+				for ( std::size_t i = 0; i < n1; ++i )
+				{
+					VB.push_back(s);
+					s += libmaus2::random::Random::rand64() % 4;
+				}
+			}
+
+			std::vector<int> V;
+			for ( auto i : VA )
+				V.push_back(i);
+			for ( auto i : VB )
+				V.push_back(i);
+			std::vector<int> VT = V;
+
+			libmaus2::sorting::RankSearch::mergeLimitedParallel(
+				V.begin(),
+				V.begin()+VA.size(),
+				V.end(),
+				tptr.get(),
+				tmpn,
+				parts
+			);
+
+			for ( std::size_t i = 1; i < V.size(); ++i )
+				assert ( V[i-1] <= V[i] );
+
+			std::sort(VT.begin(),VT.end());
+			for ( std::size_t i = 0; i < V.size(); ++i )
+				assert ( V[i] == VT[i] );
+		}
+
+		std::cerr << "done." << std::endl;
+	}
+
+	{
+		std::vector<int> VA({1,3,5,7,9,9});
+		std::vector<int> VB({2,4,6,6,8,10});
+
+
+		std::size_t const zz = VA.size() + VB.size();
+
+		for ( std::size_t i = 0; i <= zz; ++i )
+		{
+			auto const P = libmaus2::sorting::RankSearch::rankSearch(VA.begin(),VA.end(),VB.begin(),VB.end(),i);
+
+			std::size_t const i_a = P.first - VA.begin();
+			std::size_t const i_b = P.second - VB.begin();
+
+			std::cerr << "[0] i=" << i << " i_a=" << i_a << " i_b=" << i_b << " i_a+i_b=" << (i_a+i_b) << std::endl;
+
+			assert ( i_a + i_b == i );
+		}
+
+		for ( std::size_t i = 0; i <= VA.size(); ++i )
+		{
+			auto const P = libmaus2::sorting::RankSearch::rankSearch(VA.begin(),VA.end(),VB.begin(),VB.begin(),i);
+
+			std::size_t const i_a = P.first - VA.begin();
+			std::size_t const i_b = P.second - VB.begin();
+
+			std::cerr << "[1] i=" << i << " i_a=" << i_a << " i_b=" << i_b << " i_a+i_b=" << (i_a+i_b) << std::endl;
+
+			assert ( i_a + i_b == i );
+		}
+
+		for ( std::size_t i = 0; i <= VB.size(); ++i )
+		{
+			auto const P = libmaus2::sorting::RankSearch::rankSearch(VA.begin(),VA.begin(),VB.begin(),VB.end(),i);
+
+			std::size_t const i_a = P.first - VA.begin();
+			std::size_t const i_b = P.second - VB.begin();
+
+			std::cerr << "[2] i=" << i << " i_a=" << i_a << " i_b=" << i_b << " i_a+i_b=" << (i_a+i_b) << std::endl;
+
+			assert ( i_a + i_b == i );
+		}
+
+		for ( std::size_t i = 0; i <= zz; ++i )
+		{
+			typedef std::vector<int>::const_iterator const_iterator;
+
+			{
+				std::vector< std::pair<const_iterator,const_iterator> > const V(
+					{
+						std::make_pair(VA.begin(),VA.end()),
+						std::make_pair(VB.begin(),VB.end())
+					}
+				);
+
+				auto const R = libmaus2::sorting::RankSearch::rankSearchMulti(V,i);
+
+				std::cerr << "[31] i=" << i;
+				std::size_t s = 0;
+
+				for ( std::size_t j = 0; j < R.size(); ++j )
+				{
+					std::size_t const c = R[j] - V[j].first ;
+					s += c;
+					std::cerr << " i_" << j << "=" << c;
+				}
+
+				std::cerr << " sum=" << s << std::endl;
+
+				assert ( s == i );
+			}
+
+			{
+				std::vector< std::pair<const_iterator,const_iterator> > const V(
+					{
+						std::make_pair(VB.begin(),VB.end()),
+						std::make_pair(VA.begin(),VA.end())
+					}
+				);
+
+				auto const R = libmaus2::sorting::RankSearch::rankSearchMulti(V,i);
+
+				std::cerr << "[32] i=" << i;
+				std::size_t s = 0;
+
+				for ( std::size_t j = 0; j < R.size(); ++j )
+				{
+					std::size_t const c = R[j] - V[j].first ;
+					s += c;
+					std::cerr << " i_" << j << "=" << c;
+				}
+
+				std::cerr << " sum=" << s << std::endl;
+
+				assert ( s == i );
+			}
+		}
+
+		{
+			std::vector<int> V;
+
+			for ( std::size_t i = 0; i < VA.size(); ++i )
+				V.push_back(VA[i]);
+			for ( std::size_t i = 0; i < VB.size(); ++i )
+				V.push_back(VB[i]);
+
+			libmaus2::sorting::RankSearch::mergeInPlace(
+				V.begin(),
+				V.begin()+VA.size(),
+				V.begin()+VA.size()+VB.size()
+			);
+
+			for ( std::size_t i = 0; i < V.size(); ++i )
+				std::cerr << "V[" << i << "]=" << V[i] << std::endl;
+		}
+
+
+		{
+			std::cerr << "[V] running mergeParallelCopyBack..." << std::endl;
+
+			std::size_t const numtries = 4096;
+			std::size_t const maxlen = 64*1024;
+			int const mod = 8192;
+
+			for ( std::size_t z = 0; z < numtries; ++z )
+			{
+				std::size_t const n = libmaus2::random::Random::rand64() % (maxlen+1);
+				std::size_t const l = libmaus2::random::Random::rand64() % (n+1);
+
+				// std::size_t const r = n-l;
+
+				std::vector<int> V;
+				for ( std::size_t j = 0; j < n; ++j )
+					V.push_back(libmaus2::random::Random::rand64() % mod);
+				decltype(V) VT = V;
+
+				std::sort(V.begin(),V.begin()+l);
+				std::sort(V.begin()+l,V.end());
+
+				std::shared_ptr<int[]> T(new int[n]);
+
+				std::size_t const threads = 10;
+				bool const parallelSearch = true;
+				libmaus2::sorting::RankSearch::mergeParallelCopyBack(
+					V.begin(),
+					V.begin()+l,
+					V.end(),
+					T.get(),
+					threads,
+					parallelSearch
+				);
+
+				for ( std::size_t j = 1; j < n; ++j )
+					assert ( V[j-1] <= V[j] );
+
+				std::sort(VT.begin(),VT.end());
+				for ( std::size_t j = 0; j < n; ++j )
+					assert ( VT[j] == V[j] );
+
+				std::size_t const zp = z+1;
+
+				if ( zp % 1024 == 0 )
+					std::cerr << zp << "/" << numtries << std::endl;
+			}
+
+			std::cerr << "[V] running mergeParallelCopyBack done" << std::endl;
+		}
+
+
+		{
+			std::cerr << "[V] running mergeParallel..." << std::endl;
+
+			std::size_t const numtries = 4096;
+			std::size_t const maxlen = 64*1024;
+			int const mod = 8192;
+
+			for ( std::size_t z = 0; z < numtries; ++z )
+			{
+				std::size_t const n = libmaus2::random::Random::rand64() % (maxlen+1);
+				std::size_t const l = libmaus2::random::Random::rand64() % (n+1);
+
+				// std::size_t const r = n-l;
+
+				std::vector<int> V;
+				for ( std::size_t j = 0; j < n; ++j )
+					V.push_back(libmaus2::random::Random::rand64() % mod);
+
+				std::sort(V.begin(),V.begin()+l);
+				std::sort(V.begin()+l,V.end());
+
+				std::shared_ptr<int[]> T(new int[n]);
+
+				std::size_t const threads = 10;
+				bool const parallelSearch = true;
+				libmaus2::sorting::RankSearch::mergeParallel(
+					V.begin(),V.begin()+l,
+					V.begin()+l,V.end(),
+					T.get(),
+					threads,
+					parallelSearch
+				);
+
+				for ( std::size_t j = 1; j < n; ++j )
+					assert ( T[j-1] <= T[j] );
+
+				std::sort(V.begin(),V.end());
+				for ( std::size_t j = 0; j < n; ++j )
+					assert ( T[j] == V[j] );
+
+				std::size_t const zp = z+1;
+
+				if ( zp % 1024 == 0 )
+					std::cerr << zp << "/" << numtries << std::endl;
+			}
+
+			std::cerr << "[V] running mergeParallel done" << std::endl;
+		}
+
+
+
+		{
+			std::size_t const numtries = 4096;
+			std::size_t const maxlen = 64*1024;
+			int const mod = 8192;
+
+			for ( std::size_t z = 0; z < numtries; ++z )
+			{
+				std::size_t const n = libmaus2::random::Random::rand64() % (maxlen+1);
+				std::size_t const l = libmaus2::random::Random::rand64() % (n+1);
+
+				// std::size_t const r = n-l;
+
+				std::vector<int> V;
+				for ( std::size_t j = 0; j < n; ++j )
+					V.push_back(libmaus2::random::Random::rand64() % mod);
+
+				std::sort(V.begin(),V.begin()+l);
+				std::sort(V.begin()+l,V.end());
+
+				libmaus2::sorting::RankSearch::mergeInPlace(
+					V.begin(),
+					V.begin()+l,
+					V.end()
+				);
+
+				for ( std::size_t j = 1; j < n; ++j )
+					assert ( V[j-1] <= V[j] );
+
+				std::size_t const zp = z+1;
+
+				if ( zp % 1024 == 0 )
+					std::cerr << zp << "/" << numtries << std::endl;
+			}
+		}
+
+		{
+			std::size_t const numtries = 4096;
+			std::size_t const maxlen = 64*1024;
+			int const mod = 8192;
+			std::size_t const limit = 2048;
+			std::shared_ptr<int[]> tptr(new int[limit]);
+
+			for ( std::size_t z = 0; z < numtries; ++z )
+			{
+				std::size_t const n = libmaus2::random::Random::rand64() % (maxlen+1);
+				std::size_t const l = libmaus2::random::Random::rand64() % (n+1);
+
+				// std::size_t const r = n-l;
+
+				std::vector<int> V;
+				for ( std::size_t j = 0; j < n; ++j )
+					V.push_back(libmaus2::random::Random::rand64() % mod);
+
+				std::sort(V.begin(),V.begin()+l);
+				std::sort(V.begin()+l,V.end());
+
+				libmaus2::sorting::RankSearch::mergeLimited(
+					V.begin(),
+					V.begin()+l,
+					V.end(),
+					tptr.get(),
+					limit
+				);
+
+				for ( std::size_t j = 1; j < n; ++j )
+					assert ( V[j-1] <= V[j] );
+
+				std::size_t const zp = z+1;
+
+				if ( zp % 1024 == 0 )
+					std::cerr << zp << "/" << numtries << std::endl;
+			}
+		}
+
 	{
+		std::size_t const n = 1024*1024*1024;
+		std::size_t const threads = 24;
+
+		std::cerr << "filling array...";
+		int const mod = 1024*1024;
+		std::vector<int> V(n);
+		for ( std::size_t j = 0; j < n; ++j )
+			V[j] = libmaus2::random::Random::rand64() % mod;
+		std::vector<int> const VRT = V;
+		std::cerr << "done." << std::endl;
+
+		for ( std::size_t z = 0; z < 4; ++z )
+		{
+			std::vector<int> VT = VRT;
+			V = VRT;
+
+			libmaus2::timing::RealTimeClock rtc;
+			rtc.start();
+
+			if ( z == 0 )
+			{
+				std::shared_ptr<int[]> T(new int[n]);
+
+				std::cerr << "[V] sorting with temp space...";
+
+				libmaus2::sorting::RankSearch::sortWithTempSpace(
+					V.begin(),
+					V.end(),
+					T.get(),
+					threads
+				);
+				std::cerr << "done." << std::endl;
+			}
+			else if ( z == 1 )
+			{
+				std::size_t const ln = 64*1024;
+				std::shared_ptr<int[]> T(new int[ln]);
+
+				std::cerr << "[V] sorting with limited temp space...";
+
+				libmaus2::sorting::RankSearch::sortLimited(
+					V.begin(),
+					V.end(),
+					T.get(),
+					ln,
+					threads
+				);
+
+				std::cerr << "done." << std::endl;
+			}
+			else if ( z == 2 )
+			{
+				std::cerr << "[V] sorting without temp space...";
+
+				libmaus2::sorting::RankSearch::sortInPlace(
+					V.begin(),
+					V.end(),
+					threads
+				);
+				std::cerr << "done." << std::endl;
+			}
+			else if ( z == 3 )
+			{
+				std::shared_ptr<int[]> T(new int[n]);
+
+				std::cerr << "[V] sorting with temp space using thread pool...";
+
+				libmaus2::sorting::RankSearch::sortWithTempSpaceViaThreadPool(
+					V.begin(),
+					V.end(),
+					T.get(),
+					threads
+				);
+				std::cerr << "done." << std::endl;
+			}
+			double const tim = rtc.getElapsedSeconds();
+
+			std::cerr << "time " << tim << std::endl;
+
+			rtc.start();
+			std::sort(VT.begin(),VT.end());
+
+			double const timser = rtc.getElapsedSeconds();
+
+			std::cerr << "serial time " << timser << std::endl;
+
+			for ( std::size_t j = 1; j < n; ++j )
+				assert ( V[j-1] <= V[j] );
+			for ( std::size_t j = 0; j < n; ++j )
+				assert ( VT[j] == V[j] );
+		}
+	}
+
+
+		return 0;
+
 		testQuickSort();
 
 		testpairfilesortingSecond(16*1024*1024);



View it on GitLab: https://salsa.debian.org/med-team/libmaus2/-/compare/5dcc41d4ca9cf475e13ec92ff284c528b4836f4b...0cbf640b4377ea51285d3eef56da0e43ac7a4f7e
