[med-svn] [Git][med-team/pairtools][upstream] New upstream version 1.0.3

Andreas Tille (@tille) gitlab at salsa.debian.org
Mon Dec 11 14:43:04 GMT 2023



Andreas Tille pushed to branch upstream at Debian Med / pairtools


Commits:
2e5a2045 by Andreas Tille at 2023-12-11T15:24:53+01:00
New upstream version 1.0.3
- - - - -


24 changed files:

- .flake8
- .github/workflows/python-publish-test.yml
- .github/workflows/python-publish.yml
- CHANGES.md
- README.md
- doc/examples/benchmark/Snakefile
- doc/examples/benchmark/benchmark.ipynb
- doc/examples/benchmark/benchmarking_1mln.csv
- doc/index.rst
- + doc/stats.rst
- pairtools/__init__.py
- pairtools/cli/dedup.py
- pairtools/cli/flip.py
- pairtools/cli/markasdup.py
- pairtools/cli/phase.py
- pairtools/cli/restrict.py
- pairtools/cli/select.py
- pairtools/cli/split.py
- pairtools/lib/dedup.py
- pairtools/lib/headerops.py
- pairtools/lib/scaling.py
- pairtools/lib/select.py
- pairtools/lib/stats.py
- tests/test_scaling.py


Changes:

=====================================
.flake8
=====================================
@@ -5,14 +5,24 @@ exclude =
 
 max-line-length = 120
 ignore =
-    E203  # whitespace before ':'
-    E266  # too many leading '#' for block comment
-    E501  # line too long
-    W503  # line break before binary operator
+    # whitespace before ':'
+    E203
+    # too many leading '#' for block comment
+    E266
+    # line too long
+    E501
+    # line break before binary operator
+    W503
 select =
-    C  # mccabe complexity
-    E  # pycodestyle
-    F  # pyflakes error
-    W  # pyflakes warning
-    B  # bugbear
-    B950  # line exceeds max-line-length + 10%
+    # mccabe complexity
+    C
+    # pycodestyle
+    E
+    # pyflakes error
+    F
+    # pyflakes warning
+    W
+    # bugbear
+    B
+    # line exceeds max-line-length + 10%
+    B950
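
A minimal sketch of why the comments in `.flake8` may have been moved onto their own lines. It assumes the file is read by a parser that behaves like Python's standard `configparser` with default settings, which keeps inline `#` text as part of the value but skips full-line comments inside a multi-line value. Whether this motivated the upstream change is an assumption, since the commit does not say. The helper name `read_codes` is illustrative only.

```python
import configparser

# Old layout: comment on the same line as the code.
OLD = """\
[flake8]
ignore =
    E203  # whitespace before ':'
    W503  # line break before binary operator
"""

# New layout from this commit: comment on its own line above the code.
NEW = """\
[flake8]
ignore =
    # whitespace before ':'
    E203
    # line break before binary operator
    W503
"""


def read_codes(text):
    """Return the non-empty entries of the multi-line `ignore` option."""
    parser = configparser.RawConfigParser()  # inline comments are NOT stripped by default
    parser.read_string(text)
    raw = parser.get("flake8", "ignore")
    return [item.strip() for item in raw.splitlines() if item.strip()]


old_codes = read_codes(OLD)
new_codes = read_codes(NEW)
print(old_codes)
print(new_codes)
```

With the old layout each entry still carries its trailing comment, so a consumer that expects bare error codes would have to strip it; with the new layout only the codes remain.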


=====================================
.github/workflows/python-publish-test.yml
=====================================
@@ -18,7 +18,7 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v2
       with:
-        python-version: '3.x'
+        python-version: '3.10'
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip


=====================================
.github/workflows/python-publish.yml
=====================================
@@ -17,7 +17,7 @@ jobs:
     - name: Set up Python
       uses: actions/setup-python@v2
       with:
-        python-version: '3.x'
+        python-version: '3.10'
     - name: Install dependencies
       run: |
         python -m pip install --upgrade pip


=====================================
CHANGES.md
=====================================
@@ -1,3 +1,21 @@
+### 1.0.3 (2023-11-20) ###
+- [x] `pairtools dedup`: update default chunksize to 10,000 to prevent memory overflow on datasets with high duplication rate 
+
+### 1.0.2 (2022-11-XX) ###
+
+- [x] `pairtools select` regex update 
+(string substitutions failed when the column name was a substring of another)
+
+- [x] Warnings capture in dedup: pairs lines are always split after rstrip newline
+
+- [x] Important fixes of splitting schema
+
+- [x] Dedup comment removed (failed when the read qualities contained "#")
+
+- [x] Remove dbist build out of wheel
+
+- [x] pairtools scaling: fixed an issue with scaling maximum range value https://github.com/open2c/pairtools/issues/150#issue-1439106031 
+
 ### 1.0.1 (2022-09-XX) ###
 
 - [x] Fixed issue with pysam dependencies on pip and conda


=====================================
README.md
=====================================
@@ -179,6 +179,10 @@ $ cd pairtools
 $ pip install -e .
 ```
 
+## Citing `pairtools`
+
+Open2C*, Nezar Abdennur, Geoffrey Fudenberg, Ilya M. Flyamer, Aleksandra A. Galitsyna*, Anton Goloborodko*, Maxim Imakaev, Sergey V. Venev. "Pairtools: from sequencing data to chromosome contacts" bioRxiv, February 13, 2023. ; doi: https://doi.org/10.1101/2023.02.13.528389
+
 ## License
 
 MIT


=====================================
doc/examples/benchmark/Snakefile
=====================================
@@ -1,4 +1,4 @@
-cores_choices = [1] #, 2, 4]
+cores_choices = [1, 2, 4]
 
 chromap = expand(
     "output/result.chromap.{cores}.pairs",
@@ -24,6 +24,14 @@ hicpro = expand(
     "output/result.hicpro.{cores}.pairs",
     cores=cores_choices,
 )
+tadbit = expand(
+    "output/result.tadbit.{cores}.reads",
+    cores=cores_choices,
+)
+tadbit_bowtie = expand(
+    "output/result.tadbit_bowtie2.{cores}.reads",
+    cores=cores_choices,
+)
 pairtools = expand(
     "output/result.pairtools.{cores}.pairs",
     cores=cores_choices,
@@ -33,13 +41,29 @@ pairtools_bwamem2 = expand(
     cores=cores_choices,
 )
 
+# mapping only:
+bowtie = expand(
+    "output/result.bowtie.{cores}.sam",
+    cores=cores_choices,
+)
+bwamem = expand(
+    "output/result.bwamem.{cores}.sam",
+    cores=cores_choices,
+)
+bwamem2 = expand(
+    "output/result.bwamem2.{cores}.sam",
+    cores=cores_choices,
+)
+
 rule all:
     input:
-        lambda wildcards: juicer #pairtools + pairtools_bwamem2 + chromap + hicpro + fanc_bowtie + fanc_bwa + hicexplorer
+        lambda wildcards: tadbit + tadbit_bowtie + bowtie + bwamem2 + pairtools + pairtools_bwamem2 + chromap + hicpro + fanc_bowtie + fanc_bwa + hicexplorer
+    # + bowtie + bwamem + bwamem2
+    # + juicer
+    # + pairtools + pairtools_bwamem2 + chromap + hicpro + fanc_bowtie + fanc_bwa + hicexplorer
 
-# juicer #
 # hicexplorer # heavy because it creates coolers
-# juicer # run separately with the number of cores equal to tested!
+# juicer # run separately with the number of cores equal to tested, b/c multiplw juicers cannot be run with the same path
 
 rule test:
     input:
@@ -51,6 +75,7 @@ rule test:
         genome_index_chromap="data/hg38/index/chromap/hg38",
         genome_index_bwamem2="data/hg38/index/bwa-mem2/hg38",
         genome_index_bowtie2="data/hg38/index/bowtie2/hg38",
+        genome_index_gem="data/hg38/index/gem/hg38.gem",
         genome_rsites="data/hg38/hg38.DpnII.bed",
     threads: lambda wildcards: int(wildcards.cores),
     output:
@@ -64,17 +89,17 @@ rule test:
         if wildcards.mode == "pairtools_bwamem2":
             shell("""
                 soft/bwa-mem2/bwa-mem2 mem -t {wildcards.cores} -SP {input.genome_index_bwamem2} {input.fastq1} {input.fastq2} | \
-                    soft/pairtools1.0.0/bin/pairtools parse --nproc-in {wildcards.cores} --nproc-out {wildcards.cores} --drop-sam --drop-seq -c {input.chromsizes} | \
-                    soft/pairtools1.0.0/bin/pairtools sort --nproc {wildcards.cores} | \
-                    soft/pairtools1.0.0/bin/pairtools dedup -p {wildcards.cores} --chunksize 1000000 \
+                    soft/pairtools1.0.2/bin/pairtools parse --nproc-in {wildcards.cores} --nproc-out {wildcards.cores} --drop-sam --drop-seq -c {input.chromsizes} | \
+                    soft/pairtools1.0.2/bin/pairtools sort --nproc {wildcards.cores} | \
+                    soft/pairtools1.0.2/bin/pairtools dedup -p {wildcards.cores} --chunksize 1000000 \
                     -o {output.file}
                 """)
         elif wildcards.mode == "pairtools":
             shell("""
-                soft/pairtools1.0.0/bin/bwa mem -t {wildcards.cores} -SP {input.genome_index_bwa} {input.fastq1} {input.fastq2} | \
-                    soft/pairtools1.0.0/bin/pairtools parse --nproc-in {wildcards.cores} --nproc-out {wildcards.cores} --drop-sam --drop-seq -c {input.chromsizes} | \
-                    soft/pairtools1.0.0/bin/pairtools sort --nproc {wildcards.cores} | \
-                    soft/pairtools1.0.0/bin/pairtools dedup -p {wildcards.cores} --chunksize 1000000 \
+                soft/pairtools1.0.2/bin/bwa mem -t {wildcards.cores} -SP {input.genome_index_bwa} {input.fastq1} {input.fastq2} | \
+                    soft/pairtools1.0.2/bin/pairtools parse --nproc-in {wildcards.cores} --nproc-out {wildcards.cores} --drop-sam --drop-seq -c {input.chromsizes} | \
+                    soft/pairtools1.0.2/bin/pairtools sort --nproc {wildcards.cores} | \
+                    soft/pairtools1.0.2/bin/pairtools dedup -p {wildcards.cores} --chunksize 1000000 \
                     -o {output.file}
                 """)
 
@@ -109,8 +134,9 @@ rule test:
         elif wildcards.mode == "hicpro":
             shell("""
                 cd soft/HiC-Pro_env/HiC-Pro/
-                TMP_CONFIG=$(mktemp -u output/tmp.XXXXXXXX)
+		mkdir -p output
                 TMP_DIR=$(mktemp -d -u output/tmp.XXXXXXXX)
+                TMP_CONFIG=$(mktemp -u output/tmp.XXXXXXXX.config)
                 cp config-hicpro.txt $TMP_CONFIG
                 
                 sed -i 's/N_CPU = 4/N_CPU = {wildcards.cores}/' $TMP_CONFIG
@@ -118,7 +144,7 @@ rule test:
                 
                 # Cleanup:
                 cp $TMP_DIR/hic_results/data/sample1/sample1.allValidPairs ../../../{output.file}
-                rm -r $TMP_DIR $TMP_CONFIG
+                rm -r $TMP_DIR; rm $TMP_CONFIG
                 """)
         elif wildcards.mode == "juicer":
             # Note that this process is not guaranteed to work well in parallel mode;
@@ -136,16 +162,54 @@ rule test:
                 TMP_DIR=$(mktemp -d -u output/tmp.XXXXXXXX)
                 
                 soft/hicexplorer/bin/hicBuildMatrix --samFiles \
-                                <(bwa mem -A1 -B4 -E50 -L0 {input.genome_index_bwa} -t {wildcards.cores} data/SRR6107789_1.fastq.gz | samtools view -@ {wildcards.cores} -Shb -) \
-                                <(bwa mem -A1 -B4 -E50 -L0 {input.genome_index_bwa} -t {wildcards.cores} data/SRR6107789_2.fastq.gz | samtools view -@ {wildcards.cores} -Shb -) \
-                                 --restrictionSequence GATC \
-                                 --danglingSequence GATC \
-                                 --restrictionCutFile {input.genome_rsites}  \
-                                 --threads {wildcards.cores} \
-                                 --inputBufferSize 1000000 \
-                                 --QCfolder $TMP_DIR \
-                                 -o {output.file}
+                    <(bwa mem -A1 -B4 -E50 -L0 {input.genome_index_bwa} -t {wildcards.cores} {input.fastq1} | samtools view -@ {wildcards.cores} -Shb -) \
+                    <(bwa mem -A1 -B4 -E50 -L0 {input.genome_index_bwa} -t {wildcards.cores} {input.fastq2} | samtools view -@ {wildcards.cores} -Shb -) \
+                     --restrictionSequence GATC \
+                     --danglingSequence GATC \
+                     --restrictionCutFile {input.genome_rsites}  \
+                     --threads {wildcards.cores} \
+                     --inputBufferSize 1000000 \
+                     --QCfolder $TMP_DIR \
+                     -o {output.file}
 
                 # Cleanup:
                 rm -r $TMP_DIR
                 """)
+        elif wildcards.mode == "tadbit":
+            shell("""
+            
+                TMP_DIR=$(mktemp -d -u tadbit_output/tmp.XXXXXXXX)
+                
+                soft/tadbit/bin/tadbit map $TMP_DIR -C {wildcards.cores} --mapper_binary soft/tadbit/bin/gem-mapper --fastq {input.fastq1} --read 1 --index {input.genome_index_gem} --renz DpnII || true 
+                soft/tadbit/bin/tadbit map $TMP_DIR -C {wildcards.cores} --mapper_binary soft/tadbit/bin/gem-mapper --fastq {input.fastq2} --read 2 --index {input.genome_index_gem} --renz DpnII || true
+                soft/tadbit/bin/tadbit parse $TMP_DIR --genome data/hg38/hg38.fa || true
+                soft/tadbit/bin/tadbit filter $TMP_DIR -C {wildcards.cores} --format mid || true
+                
+                mv $TMP_DIR/03_filtered_reads/valid_r1-r2_intersection_*.tsv {output.file}
+                rm -r $TMP_DIR
+                """)
+        elif wildcards.mode == "tadbit_bowtie2":
+            shell("""
+            
+                TMP_DIR=$(mktemp -d -u tadbit_output/tmp.XXXXXXXX)
+                
+                soft/tadbit/bin/tadbit map $TMP_DIR -C {wildcards.cores} --mapper bowtie2 --mapper_binary soft/tadbit/bin/bowtie2 --fastq {input.fastq1} --read 1 --index {input.genome_index_bowtie2} --renz DpnII || true 
+                soft/tadbit/bin/tadbit map $TMP_DIR -C {wildcards.cores} --mapper bowtie2 --mapper_binary soft/tadbit/bin/bowtie2 --fastq {input.fastq2} --read 2 --index {input.genome_index_bowtie2} --renz DpnII || true
+                soft/tadbit/bin/tadbit parse $TMP_DIR --genome data/hg38/hg38.fa || true
+                soft/tadbit/bin/tadbit filter $TMP_DIR -C {wildcards.cores} --format mid || true
+                
+                mv $TMP_DIR/03_filtered_reads/valid_r1-r2_intersection_*.tsv {output.file}
+                rm -r $TMP_DIR
+                """)
+        elif wildcards.mode == "bowtie":
+            shell("""
+                soft/tadbit/bin/bowtie2 -p 4 -x {input.genome_index_bowtie2} -1 {input.fastq1} -2 {input.fastq2} -S {output.file}
+                """)
+        elif wildcards.mode == "bwamem":
+            shell("""
+                soft/pairtools0.3.0/bin/bwa mem -t 4 -SP {input.genome_index_bwa} {input.fastq1} {input.fastq2} > {output.file}
+                """)
+        elif wildcards.mode == "bwamem2":
+            shell("""
+                soft/bwa-mem2/bwa-mem2 mem -t 4 -SP {input.genome_index_bwamem2} {input.fastq1} {input.fastq2} > {output.file}
+                """)
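
The target lists in the Snakefile hunks above are built with one output path per entry in `cores_choices`. A plain-Python sketch of that expansion follows, without importing Snakemake. The helper `expand_cores` is hypothetical and only mimics the single-wildcard case used here.

```python
def expand_cores(pattern, cores):
    """Fill the {cores} placeholder once per core count (hypothetical helper)."""
    return [pattern.format(cores=c) for c in cores]


cores_choices = [1, 2, 4]

# Two of the target groups added in this commit.
tadbit = expand_cores("output/result.tadbit.{cores}.reads", cores_choices)
bowtie = expand_cores("output/result.bowtie.{cores}.sam", cores_choices)

# `rule all` concatenates such lists, so every mode is requested once per core count.
targets = tadbit + bowtie
for path in targets:
    print(path)
```

Each list contributes `len(cores_choices)` files, so enabling `[1, 2, 4]` instead of `[1]` triples the number of benchmark runs requested by `rule all`.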


=====================================
doc/examples/benchmark/benchmark.ipynb
=====================================
The diff for this file was not included because it is too large.
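
Since the notebook diff is omitted, the companion table `benchmarking_1mln.csv` changed in this commit can be summarized directly. The sketch below averages the wall-clock column `s` per tool and core count, using four rows copied from the new version of the file. Note that in the rows shown, the column labelled `util` holds the tool name and `ncores` the core count; treating them that way is an assumption based on the data, not on upstream documentation. The function name `mean_wall_time` is illustrative.

```python
import csv
import io
from collections import defaultdict
from statistics import mean

# Header and four rows taken from the updated benchmarking_1mln.csv.
SAMPLE = """\
,s,h:m:s,max_rss,max_vms,max_uss,max_pss,io_in,io_out,mean_load,cpu_time,util,ncores
0,155.4555,0:02:35,19062.05,20632.42,19039.34,19042.16,14756.57,0.02,78.1,122.2,chromap,1
1,129.4362,0:02:09,19034.41,20605.15,19011.44,19014.27,14756.57,88.36,86.69,116.07,chromap,1
0,141.9624,0:02:21,17604.44,20094.61,17587.78,17591.4,16300.85,870.68,311.82,444.2,bwamem2,2
1,133.0894,0:02:13,17620.68,20849.7,17603.97,17607.69,26647.44,1554.78,267.76,361.73,bwamem2,2
"""


def mean_wall_time(text):
    """Map (tool, cores) to the mean of the `s` column over repeated runs."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        # Assumption: `util` carries the tool name, `ncores` the core count.
        key = (row["util"], int(row["ncores"]))
        groups[key].append(float(row["s"]))
    return {key: mean(values) for key, values in groups.items()}


summary = mean_wall_time(SAMPLE)
for (tool, cores), seconds in sorted(summary.items()):
    print(f"{tool:10s} cores={cores} mean_s={seconds:.2f}")
```

To run it on the full file, pass the file contents instead of `SAMPLE`, for example `mean_wall_time(open(path).read())` with `path` pointing at a local copy.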

=====================================
doc/examples/benchmark/benchmarking_1mln.csv
=====================================
@@ -1,121 +1,196 @@
 ,s,h:m:s,max_rss,max_vms,max_uss,max_pss,io_in,io_out,mean_load,cpu_time,util,ncores
-0,171.2557,0:02:51,18999.14,20579.61,18990.64,18991.51,14942.02,61.25,81.02,140.18,chromap,1
-1,161.8126,0:02:41,19010.55,20589.94,19002.19,19003.05,29831.66,88.35,72.64,122.87,chromap,1
-2,147.2525,0:02:27,19251.87,20844.66,19244.98,19245.37,41893.51,176.68,89.18,139.78,chromap,1
-3,160.351,0:02:40,19010.16,20589.94,19004.14,19004.56,56673.84,265.02,74.24,132.19,chromap,1
-4,152.5034,0:02:32,19141.59,20730.08,19134.09,19134.47,71454.04,353.35,83.52,144.57,chromap,1
-0,4855.3497,1:20:55,7145.98,8625.58,4280.6,5694.97,84744.18,9677.61,40.17,357.4,fanc_bowtie2,1
-1,4420.6342,1:13:40,8792.0,10553.59,5632.48,7170.15,84826.6,21028.33,40.93,403.24,fanc_bowtie2,1
-2,4413.6342,1:13:33,8986.25,10746.44,5820.09,7349.67,88484.2,30605.01,39.66,404.48,fanc_bowtie2,1
-3,4559.0604,1:15:59,6141.95,7904.18,3527.04,4518.52,88624.8,41149.18,41.73,450.43,fanc_bowtie2,1
-4,4393.466,1:13:13,7152.7,8624.96,4279.3,5706.18,89015.34,51967.05,38.75,461.44,fanc_bowtie2,1
-0,3442.774,0:57:22,8261.52,10037.99,5748.96,6683.19,131784.28,6719.15,36.21,372.2,fanc_bwa,1
-1,2918.0923,0:48:38,8458.16,10221.93,5850.96,6836.51,137226.62,14901.08,36.74,395.53,fanc_bwa,1
-2,2889.2381,0:48:09,9003.84,10766.77,5838.07,7375.76,138088.69,21904.57,34.85,399.67,fanc_bwa,1
-3,2880.8428,0:48:00,6171.75,8915.39,5744.7,5861.54,138088.84,29503.53,36.48,424.87,fanc_bwa,1
-4,2988.8879,0:49:48,7149.86,8919.81,5798.19,5915.55,155317.34,37105.8,38.44,479.9,fanc_bwa,1
-0,1045.5567,0:17:25,22050.28,23335.28,22049.03,22049.68,5638.59,0.02,127.91,540.64,hicexplorer,1
-1,958.0772,0:15:58,22177.91,23335.53,22164.75,22165.79,6268.91,0.52,139.3,550.06,hicexplorer,1
-2,963.7224,0:16:03,22113.03,23335.53,22089.17,22091.46,7024.13,1.54,140.39,592.27,hicexplorer,1
-3,959.7175,0:15:59,21686.32,23335.53,21656.57,21661.51,9478.18,1.54,135.8,583.09,hicexplorer,1
-4,940.8744,0:15:40,22269.25,23335.53,22239.02,22243.54,9608.0,2.05,142.33,606.96,hicexplorer,1
-0,1161.8769,0:19:21,6719.61,7397.61,6669.41,6673.14,1520.62,1090.28,186.36,81.7,hicpro,1
-1,1150.9484,0:19:10,6704.99,7397.85,6669.48,6672.07,2811.04,2215.77,188.62,103.79,hicpro,1
-2,1152.1775,0:19:12,6702.39,7397.86,6669.6,6672.11,4304.55,3341.27,187.76,115.02,hicpro,1
-3,1137.5632,0:18:57,6709.09,7397.86,6669.89,6676.28,4467.08,4466.36,185.66,112.54,hicpro,1
-4,1136.876,0:18:56,6709.15,7397.86,6670.27,6675.79,4467.09,5592.23,187.49,148.26,hicpro,1
+0,444.776,0:07:24,3496.51,4314.64,3468.51,3472.45,0.0,780.64,398.88,1775.19,bowtie,1
+1,446.6064,0:07:26,3495.83,4314.77,3466.35,3470.77,0.0,1568.73,396.66,1776.07,bowtie,1
+2,439.4402,0:07:19,3495.91,4314.39,3468.05,3471.94,0.0,2373.96,403.7,1781.37,bowtie,1
+3,449.5572,0:07:29,3497.03,4314.64,3469.36,3473.22,0.0,3148.84,394.6,1784.75,bowtie,1
+4,449.9673,0:07:29,3490.93,4314.89,3470.16,3471.92,40.96,3942.04,393.64,1785.49,bowtie,1
+0,280.5757,0:04:40,6019.11,6453.95,5994.93,5998.23,7.86,746.23,366.91,1031.28,bwamem,1
+1,286.6525,0:04:46,6004.57,6453.95,5986.34,5988.55,1485.98,1679.14,356.05,1027.24,bwamem,1
+2,300.468,0:05:00,6009.04,6517.95,6000.2,6000.56,6660.16,2736.5,375.23,1138.7,bwamem,1
+3,336.5047,0:05:36,6027.58,6645.95,6018.34,6018.94,8138.28,3669.41,375.65,1280.85,bwamem,1
+4,323.7325,0:05:23,6012.26,6453.95,6002.96,6003.52,8932.21,4602.32,391.87,1291.62,bwamem,1
+0,169.2031,0:02:49,17583.51,21017.69,17574.31,17574.8,13309.02,870.68,260.2,441.72,bwamem2,1
+1,190.2805,0:03:10,17611.99,20913.7,17601.16,17601.72,29713.3,1554.78,185.67,357.92,bwamem2,1
+2,198.262,0:03:18,17553.89,20425.7,17545.7,17546.26,46154.95,2487.69,161.87,328.04,bwamem2,1
+3,117.1978,0:01:57,17591.75,20553.7,17583.7,17584.1,46179.89,3669.41,358.44,430.41,bwamem2,1
+4,142.6662,0:02:22,17615.75,20688.82,17606.59,17607.45,62596.36,4602.32,301.67,447.83,bwamem2,1
+0,155.4555,0:02:35,19062.05,20632.42,19039.34,19042.16,14756.57,0.02,78.1,122.2,chromap,1
+1,129.4362,0:02:09,19034.41,20605.15,19011.44,19014.27,14756.57,88.36,86.69,116.07,chromap,1
+2,130.3877,0:02:10,19034.32,20605.15,19011.52,19014.34,14756.57,176.69,86.2,121.96,chromap,1
+3,133.456,0:02:13,19030.1,20598.45,19007.18,19010.0,14756.57,265.02,84.37,128.14,chromap,1
+4,129.5292,0:02:09,19045.01,20615.43,19022.31,19025.15,14756.57,353.35,87.22,134.56,chromap,1
+0,4261.7448,1:11:01,7186.76,8679.24,4290.61,5702.39,1719.48,10257.31,36.77,348.34,fanc_bowtie2,1
+1,4077.2502,1:07:57,7188.25,8680.68,4290.98,5703.88,1719.5,20183.4,34.11,330.67,fanc_bowtie2,1
+2,4131.9376,1:08:51,7189.56,8681.8,4292.67,5709.75,1719.51,30697.16,35.35,351.1,fanc_bowtie2,1
+3,4050.4727,1:07:30,9027.23,10823.83,5831.42,7385.27,1719.51,42084.66,35.13,382.96,fanc_bowtie2,1
+4,4020.6237,1:07:00,9032.61,10828.28,5837.03,7391.75,2343.53,52412.16,34.58,394.25,fanc_bowtie2,1
+0,2731.6263,0:45:31,7185.57,8945.22,5718.36,5838.26,0.38,7029.16,35.71,346.69,fanc_bwa,1
+1,2715.4035,0:45:15,9044.39,10839.64,5846.46,7385.01,0.42,14315.23,32.27,353.83,fanc_bwa,1
+2,2769.2431,0:46:09,7188.17,9009.55,5735.71,5855.89,0.44,21877.82,37.54,374.56,fanc_bwa,1
+3,2706.3695,0:45:06,9043.16,10838.74,5844.58,7393.22,0.44,29434.04,35.55,376.27,fanc_bwa,1
+4,2682.8176,0:44:42,6172.31,8945.66,5679.21,5812.1,0.44,37010.45,30.23,380.22,fanc_bwa,1
+0,961.0734,0:16:01,21569.48,23347.51,21492.25,21513.06,2618.56,0.02,138.26,532.15,hicexplorer,1
+1,908.2106,0:15:08,22286.37,23347.51,22206.49,22228.31,2618.56,0.54,140.61,522.7,hicexplorer,1
+2,896.8171,0:14:56,21604.46,23347.51,21524.48,21546.3,2618.56,1.05,138.93,499.36,hicexplorer,1
+3,910.574,0:15:10,22440.63,23347.51,22361.79,22382.79,2618.56,1.56,140.63,542.86,hicexplorer,1
+4,895.9424,0:14:55,21516.94,23347.51,21469.93,21475.81,2618.56,2.07,138.9,529.28,hicexplorer,1
+0,1111.7203,0:18:31,6730.91,7406.83,6675.2,6682.8,237.25,1090.37,186.34,87.74,hicpro,1
+1,1130.0595,0:18:50,6730.21,7406.82,6675.14,6682.36,903.14,2215.98,186.08,80.43,hicpro,1
+2,1123.3536,0:18:43,6712.22,7406.82,6674.77,6677.97,5265.45,3341.59,182.57,89.92,hicpro,1
+3,1181.2436,0:19:41,6715.32,7406.82,6675.26,6680.12,6316.99,4467.2,177.76,100.38,hicpro,1
+4,1116.4777,0:18:36,6715.52,7406.82,6674.92,6679.32,6490.31,5592.8,188.64,114.97,hicpro,1
 0,951.7772,0:15:51,5457.95,5857.16,5432.96,5439.95,0.0,2882.08,95.15,7.25,juicer,1
 1,946.1429,0:15:46,5458.0,16410.45,5433.02,5439.99,0.0,4613.73,92.49,14.04,juicer,1
 2,950.1664,0:15:50,5458.07,5857.16,5433.16,5440.17,0.2,7180.02,95.32,26.17,juicer,1
 3,1004.5055,0:16:44,5458.27,16410.45,5433.17,5439.37,0.2,10377.86,93.19,30.43,juicer,1
 4,1088.6224,0:18:08,5458.32,5857.16,5433.14,5439.08,0.2,13611.21,94.37,43.31,juicer,1
-0,1072.4285,0:17:52,5717.21,8780.34,5713.66,5714.95,2308.36,39.24,97.52,14.75,pairtools,1
-1,1030.5073,0:17:10,5740.62,8780.59,5716.69,5717.73,2498.25,39.24,101.82,1059.32,pairtools,1
-2,1036.0581,0:17:16,5775.23,8780.52,5715.32,5716.56,3765.92,78.47,101.11,1065.81,pairtools,1
-3,1024.7524,0:17:04,5844.32,8780.55,5718.34,5751.62,9090.16,117.7,101.78,1069.4,pairtools,1
-4,1016.9986,0:16:56,5853.04,8780.52,5720.52,5756.12,9169.49,163.28,101.0,48.76,pairtools,1
-0,550.4871,0:09:10,16937.49,20627.9,16934.21,16935.8,17020.73,0.02,88.39,487.29,pairtools_bwamem2,1
-1,503.9536,0:08:23,17072.92,20436.13,16938.75,16976.63,33640.28,64.62,93.32,21.95,pairtools_bwamem2,1
-2,501.0463,0:08:21,16962.04,20820.16,16908.41,16909.56,50288.75,78.47,95.58,495.06,pairtools_bwamem2,1
-3,497.2351,0:08:17,16973.35,20820.14,16905.4,16921.17,66884.51,117.7,95.54,500.87,pairtools_bwamem2,1
-4,479.17,0:07:59,16995.29,20756.2,16902.0,16903.0,83578.67,156.94,97.32,49.76,pairtools_bwamem2,1
-0,94.1938,0:01:34,19068.07,20678.57,19047.6,19052.58,14768.99,0.02,119.14,113.62,chromap,2
-1,77.7396,0:01:17,19045.91,20665.29,19025.18,19029.89,14768.99,88.35,112.3,92.39,chromap,2
-2,77.1351,0:01:17,19045.99,20665.54,19025.28,19029.69,14768.99,176.68,113.29,98.83,chromap,2
-3,77.6121,0:01:17,19046.05,20665.54,19025.38,19029.77,14769.0,265.02,113.53,105.77,chromap,2
-4,77.5393,0:01:17,19045.89,20665.54,19025.11,19029.47,14769.0,353.35,112.5,111.44,chromap,2
-0,2606.2593,0:43:26,7169.49,8771.34,4283.19,5697.25,3509.66,9683.32,18.18,362.96,fanc_bowtie2,2
-1,2623.0423,0:43:43,8998.85,10977.97,5820.03,7368.32,3509.87,20209.78,19.39,389.26,fanc_bowtie2,2
-2,2560.5974,0:42:40,7169.39,8770.88,4284.44,5705.4,3509.88,30759.67,19.1,393.08,fanc_bowtie2,2
-3,2490.8813,0:41:30,7165.21,8769.02,4286.61,5705.35,4146.09,41263.6,18.05,379.78,fanc_bowtie2,2
-4,2567.7516,0:42:47,7163.99,8767.81,4331.13,5739.08,7999.79,51832.53,17.37,414.77,fanc_bowtie2,2
-0,1799.2042,0:29:59,9004.45,10989.12,5824.98,7362.88,24.79,6726.93,23.74,343.58,fanc_bwa,2
-1,1792.8877,0:29:52,7170.5,9286.24,5742.68,5861.39,24.87,14301.84,23.53,367.5,fanc_bwa,2
-2,1787.9905,0:29:47,7167.93,9286.07,5746.07,5884.19,5200.77,21894.25,24.17,379.1,fanc_bwa,2
-3,1786.0026,0:29:46,7167.34,9287.91,5746.42,5863.52,5200.79,29486.17,23.78,398.89,fanc_bwa,2
-4,1816.4364,0:30:16,7169.16,9285.9,5740.91,5854.21,5364.13,37082.73,24.02,424.26,fanc_bwa,2
-0,744.8584,0:12:24,22689.3,24488.74,22653.07,22661.19,0.0,0.02,162.18,520.73,hicexplorer,2
-1,739.6677,0:12:19,22848.0,24488.99,22811.35,22819.46,0.0,0.52,165.79,536.39,hicexplorer,2
-2,745.4914,0:12:25,22543.83,24488.99,22506.9,22515.17,0.0,1.03,156.08,550.24,hicexplorer,2
-3,735.0788,0:12:15,22745.13,24488.99,22707.45,22715.89,0.01,1.54,165.72,561.83,hicexplorer,2
-4,745.6433,0:12:25,22646.34,24488.99,22609.12,22617.68,0.07,2.05,156.73,570.12,hicexplorer,2
-0,1160.6621,0:19:20,6724.31,7397.86,6670.22,6681.32,237.58,1090.28,186.2,73.35,hicpro,2
-1,1157.5162,0:19:17,6724.37,7397.86,6670.52,6681.6,237.61,2215.77,188.28,96.48,hicpro,2
-2,1153.545,0:19:13,6724.21,7397.86,6670.41,6680.82,237.61,3341.26,188.33,111.24,hicpro,2
-3,1126.8091,0:18:46,6724.69,7397.86,6670.69,6681.04,237.61,4466.75,186.37,107.95,hicpro,2
-4,1123.4993,0:18:43,6724.44,7397.86,6670.7,6681.05,237.61,5592.23,187.78,127.18,hicpro,2
+0,1031.7979,0:17:11,5868.29,8533.19,5732.12,5752.16,0.57,0.02,101.37,1048.77,pairtools,1
+1,1057.2271,0:17:37,5866.38,8533.06,5731.94,5746.23,5193.45,39.25,101.33,1078.93,pairtools,1
+2,1020.0639,0:17:00,5753.98,8533.06,5731.69,5732.67,10586.02,78.47,102.21,1056.3,pairtools,1
+3,1044.6887,0:17:24,5852.79,8533.19,5734.25,5766.39,15784.64,117.71,99.64,29.63,pairtools,1
+4,1046.266,0:17:26,5824.86,8533.06,5734.24,5757.25,21050.2,156.94,99.64,38.13,pairtools,1
+0,482.1029,0:08:02,17068.97,20572.73,16931.91,16951.71,16275.21,0.02,102.18,493.82,pairtools_bwamem2,1
+1,482.5261,0:08:02,17079.46,20508.73,16941.46,16956.34,32534.11,39.25,101.73,498.88,pairtools_bwamem2,1
+2,488.9997,0:08:08,17055.04,20508.59,16920.96,16939.69,42104.61,78.47,100.17,502.58,pairtools_bwamem2,1
+3,484.346,0:08:04,16981.59,20380.6,16961.3,16962.37,45493.36,117.7,93.06,50.54,pairtools_bwamem2,1
+4,483.3159,0:08:03,16969.02,20595.51,16944.75,16945.79,61922.79,156.93,99.61,507.64,pairtools_bwamem2,1
+0,611.4665,0:10:11,13519.86,15003.46,13507.46,13508.44,22937.93,2801.37,50.95,5.28,tadbit,1
+1,442.9993,0:07:22,13559.35,15003.46,13509.66,13529.09,40809.44,6184.54,55.05,25.04,tadbit,1
+2,385.3129,0:06:25,13561.27,15003.46,13508.5,13522.26,40809.46,9295.42,59.39,22.13,tadbit,1
+3,390.4362,0:06:30,13561.43,15003.46,13509.03,13522.27,40809.5,12333.14,58.93,23.74,tadbit,1
+4,447.8446,0:07:27,13560.76,15003.46,13509.25,13528.03,51014.88,15465.8,51.07,24.5,tadbit,1
+0,862.2741,0:14:22,3639.61,4465.74,3612.88,3616.18,9401.06,3187.84,73.44,23.31,tadbit_bowtie2,1
+1,815.1788,0:13:35,3640.85,4464.65,3582.08,3602.21,9401.11,6292.2,82.09,20.52,tadbit_bowtie2,1
+2,817.2921,0:13:37,3736.97,4560.27,3712.91,3716.56,13567.1,9504.59,81.15,19.79,tadbit_bowtie2,1
+3,819.7936,0:13:39,3640.98,4464.65,3617.2,3619.83,13567.22,12458.8,80.4,19.48,tadbit_bowtie2,1
+4,760.4646,0:12:40,3641.41,4464.98,3621.55,3622.75,13567.24,15676.85,81.76,20.99,tadbit_bowtie2,1
+0,444.1354,0:07:24,3496.54,4314.39,3479.12,3481.87,27.41,783.47,400.27,1779.49,bowtie,2
+1,446.9715,0:07:26,3496.32,4314.39,3478.36,3481.27,27.41,1570.48,397.44,1782.37,bowtie,2
+2,484.7029,0:08:04,3511.4,4338.64,3490.33,3495.52,27.41,2346.41,368.72,1797.04,bowtie,2
+3,450.6565,0:07:30,3488.59,4313.88,3466.99,3471.75,27.41,3148.2,393.42,1786.37,bowtie,2
+4,453.0468,0:07:33,3488.74,4314.02,3466.96,3470.63,27.41,3936.13,392.09,1793.93,bowtie,2
+0,330.2381,0:05:30,6045.63,6645.95,6021.65,6027.11,5173.52,870.68,375.32,1240.4,bwamem,2
+1,329.5853,0:05:29,6028.34,6645.95,6019.0,6019.59,6654.07,1803.59,382.07,1265.1,bwamem,2
+2,297.1758,0:04:57,6003.03,6453.95,5994.1,5994.56,9493.45,2736.5,363.55,1091.44,bwamem,2
+3,263.866,0:04:23,5991.57,6453.95,5982.83,5983.36,10181.14,3669.41,388.27,1039.2,bwamem,2
+4,258.637,0:04:18,5992.16,6453.95,5983.26,5983.78,10181.15,4477.87,351.12,927.49,bwamem,2
+0,141.9624,0:02:21,17604.44,20094.61,17587.78,17591.4,16300.85,870.68,311.82,444.2,bwamem2,2
+1,133.0894,0:02:13,17620.68,20849.7,17603.97,17607.69,26647.44,1554.78,267.76,361.73,bwamem2,2
+2,119.1224,0:01:59,17591.03,21040.61,17573.69,17577.37,26647.45,2736.5,354.15,430.73,bwamem2,2
+3,119.2461,0:01:59,17618.88,20912.61,17601.95,17605.61,26647.45,3669.41,353.61,437.38,bwamem2,2
+4,119.4407,0:01:59,17610.04,20579.05,17592.86,17596.52,26647.46,4602.32,352.7,443.75,bwamem2,2
+0,99.0323,0:01:39,19065.62,20682.54,19043.42,19050.13,14756.72,0.02,106.11,105.91,chromap,2
+1,93.6757,0:01:33,19075.63,20687.33,19053.32,19060.02,25886.48,88.36,121.88,118.06,chromap,2
+2,120.2376,0:02:00,19145.25,20776.39,19130.55,19133.61,37626.33,176.69,104.51,132.92,chromap,2
+3,83.0685,0:01:23,18948.87,20569.96,18934.38,18937.44,41077.34,270.34,159.69,144.2,chromap,2
+4,98.8891,0:01:38,19049.5,20677.47,19034.7,19037.84,41077.34,353.35,126.08,140.51,chromap,2
+0,2463.1096,0:41:03,7184.84,8822.2,4290.22,5704.96,1790.3,9624.3,17.33,337.56,fanc_bowtie2,2
+1,2388.2986,0:39:48,7183.59,8820.36,4839.33,5975.55,1790.35,20084.17,17.7,322.7,fanc_bowtie2,2
+2,2313.5663,0:38:33,6195.27,8222.25,3496.78,4531.77,1790.41,30597.45,18.65,343.23,fanc_bowtie2,2
+3,2351.3693,0:39:11,9008.62,11035.56,5810.7,7349.54,1790.42,41169.94,17.91,356.34,fanc_bowtie2,2
+4,2369.8083,0:39:29,7186.87,8822.57,4288.14,5702.9,1790.45,51675.89,19.01,377.66,fanc_bowtie2,2
+0,1774.8625,0:29:34,9036.59,11062.79,5886.57,7381.14,180.82,6716.99,24.32,331.93,fanc_bwa,2
+1,1740.5138,0:29:00,7185.89,9381.23,5771.89,5896.22,180.85,14316.39,27.45,352.99,fanc_bwa,2
+2,1693.8652,0:28:13,7188.38,9315.89,5720.5,5842.73,180.88,21905.75,22.64,337.38,fanc_bwa,2
+3,1673.5139,0:27:53,7184.62,9376.55,5740.96,5861.3,180.95,29485.2,22.17,333.76,fanc_bwa,2
+4,1671.1137,0:27:51,7186.76,9379.55,5780.07,5902.28,180.97,37075.34,22.58,346.62,fanc_bwa,2
+0,746.2236,0:12:26,22026.01,24493.83,21948.96,21969.79,2997.02,0.53,170.7,512.85,hicexplorer,2
+1,710.6231,0:11:50,23024.07,24502.34,22945.46,22966.38,3015.5,0.54,171.76,495.71,hicexplorer,2
+2,705.1923,0:11:45,22126.58,24493.83,22047.82,22069.79,3015.5,1.05,176.52,507.18,hicexplorer,2
+3,712.7222,0:11:52,23066.12,24493.83,22987.89,23008.64,3015.5,1.56,171.49,518.58,hicexplorer,2
+4,715.7957,0:11:55,22876.23,24493.83,22797.87,22819.07,3015.5,2.58,172.73,557.14,hicexplorer,2
+0,1073.3043,0:17:53,6730.26,7406.83,6678.91,6685.25,9.88,1089.03,184.85,64.55,hicpro,2
+1,1038.1439,0:17:18,6730.3,7406.82,6675.17,6685.6,9.89,2215.98,182.2,73.83,hicpro,2
+2,1033.7443,0:17:13,6730.34,7406.82,6675.99,6686.95,221.47,3341.59,185.87,88.03,hicpro,2
+3,1022.0066,0:17:02,6731.1,7406.83,6675.88,6688.85,222.14,4467.19,188.24,109.28,hicpro,2
+4,1094.391,0:18:14,6730.64,7406.82,6676.12,6689.47,222.14,5592.8,175.63,105.96,hicpro,2
 0,502.114,0:08:22,5634.02,34265.69,5609.24,5616.47,70.82,2785.59,175.44,10.71,juicer,2
 1,502.401,0:08:22,5634.24,6057.17,5609.44,5616.5,70.89,5661.32,175.44,14.17,juicer,2
 2,502.5511,0:08:22,5634.45,6057.17,5609.73,5616.78,70.92,8537.05,175.43,18.41,juicer,2
 3,500.9428,0:08:20,5634.69,6057.17,5609.77,5616.84,70.92,11412.78,175.96,24.15,juicer,2
 4,500.3197,0:08:20,5634.83,6057.17,5610.01,5617.05,70.92,14367.65,176.2,21.5,juicer,2
-0,493.3558,0:08:13,6034.43,9532.37,5886.98,5926.35,119.25,0.02,197.44,976.1,pairtools,2
-1,495.0177,0:08:15,6035.91,9532.36,5888.32,5927.57,119.27,39.25,195.91,979.5,pairtools,2
-2,493.4662,0:08:13,6037.14,9532.62,5890.17,5931.54,119.27,78.48,196.67,986.39,pairtools,2
-3,495.3653,0:08:15,6036.7,9532.62,5888.86,5930.54,119.27,117.71,195.8,992.14,pairtools,2
-4,493.0538,0:08:13,6037.45,9532.61,5889.9,5931.07,119.27,156.94,196.7,998.26,pairtools,2
-0,269.5154,0:04:29,17364.26,22542.92,17216.99,17258.1,16458.32,0.02,184.0,497.96,pairtools_bwamem2,2
-1,264.1916,0:04:24,17370.49,21967.18,17221.91,17263.4,32731.18,39.26,169.33,14.67,pairtools_bwamem2,2
-2,242.8783,0:04:02,17374.63,22479.19,17226.26,17267.71,32731.18,78.48,202.03,504.14,pairtools_bwamem2,2
-3,240.312,0:04:00,17380.28,22584.26,17231.19,17272.84,32914.7,117.72,181.17,36.14,pairtools_bwamem2,2
-4,260.2244,0:04:20,17335.83,22287.18,17184.78,17226.53,43277.2,156.93,174.43,485.07,pairtools_bwamem2,2
-0,66.1876,0:01:06,18991.04,20714.05,18981.3,18983.05,14985.41,0.02,116.82,78.95,chromap,4
-1,66.4819,0:01:06,18990.63,20714.06,18981.04,18982.71,29925.57,88.35,117.9,82.69,chromap,4
-2,66.7036,0:01:06,18970.18,20708.46,18960.49,18962.16,44866.02,176.68,116.46,84.95,chromap,4
-3,64.9087,0:01:04,19042.49,20732.84,19032.81,19034.48,58828.13,265.02,127.11,92.57,chromap,4
-4,62.2284,0:01:02,19042.42,20730.94,19034.25,19035.61,71076.18,353.35,148.52,105.26,chromap,4
-0,1645.5116,0:27:25,8997.41,11419.24,5832.56,7402.09,3775.64,10271.72,28.18,370.43,fanc_bowtie2,4
-1,1619.9334,0:26:59,6157.95,8576.94,3572.04,4560.07,7664.87,20149.13,26.84,369.2,fanc_bowtie2,4
-2,1588.7391,0:26:28,7167.18,9050.39,4342.96,5747.66,7664.99,31081.43,30.88,374.4,fanc_bowtie2,4
-3,1586.9943,0:26:26,7164.36,9051.91,4339.34,5744.81,7699.56,41761.73,29.81,381.74,fanc_bowtie2,4
-4,1588.1941,0:26:28,7165.76,9051.89,4338.56,5745.02,7934.8,52165.17,30.26,393.88,fanc_bowtie2,4
-0,1217.6522,0:20:17,7164.02,9864.16,5755.11,5903.98,5725.82,6698.04,31.02,328.62,fanc_bwa,4
-1,1203.7575,0:20:03,7162.46,9864.08,5694.31,5844.12,5726.06,14297.14,36.35,359.69,fanc_bwa,4
-2,1209.304,0:20:09,9014.25,11434.79,5849.93,7419.8,5726.26,21891.89,36.77,360.09,fanc_bwa,4
-3,1209.9698,0:20:09,9019.61,11440.56,5857.24,7426.2,5726.32,29473.82,36.79,371.59,fanc_bwa,4
-4,1215.2228,0:20:15,9019.98,11439.18,5857.65,7426.89,5726.38,37068.11,36.41,379.67,fanc_bwa,4
-0,659.3963,0:10:59,23338.86,26170.54,23340.29,23340.77,358.09,0.52,163.06,550.72,hicexplorer,4
-1,687.6019,0:11:27,23725.26,26170.79,23724.7,23725.45,4611.54,1.03,171.24,574.94,hicexplorer,4
-2,690.3401,0:11:30,23709.58,26170.79,23689.0,23692.41,7829.07,1.54,176.44,581.15,hicexplorer,4
-3,671.4387,0:11:11,22807.14,26170.79,22787.6,22790.77,11022.66,1.54,184.36,572.08,hicexplorer,4
-4,669.665,0:11:09,22906.86,26170.79,22885.91,22889.43,14973.13,2.05,187.18,586.27,hicexplorer,4
-0,613.8815,0:10:13,6754.09,7622.23,6701.89,6712.25,480.14,1090.27,341.63,69.04,hicpro,4
-1,611.1198,0:10:11,6754.98,7622.49,6702.38,6714.12,480.16,2215.74,343.62,78.57,hicpro,4
-2,609.9385,0:10:09,6754.65,7622.49,6702.03,6712.67,480.17,3341.22,344.39,86.13,hicpro,4
-3,610.7242,0:10:10,6755.03,7622.48,6702.52,6714.04,505.8,4466.71,343.66,91.51,hicpro,4
-4,622.0769,0:10:22,6754.96,7622.48,6710.87,6718.84,506.52,5590.84,332.32,93.15,hicpro,4
+0,519.8764,0:08:39,6037.57,9285.21,5901.66,5924.11,5249.05,0.02,194.04,1009.67,pairtools,2
+1,483.1521,0:08:03,6037.39,9285.21,5901.68,5924.15,5249.05,39.25,200.27,974.78,pairtools,2
+2,480.6768,0:08:00,6037.79,9285.21,5903.83,5941.17,5249.05,78.47,201.33,980.38,pairtools,2
+3,465.3507,0:07:45,6037.73,9285.21,5903.88,5941.19,5249.05,117.7,194.76,924.69,pairtools,2
+4,466.1686,0:07:46,6038.33,9285.21,5903.87,5941.35,5249.05,156.93,194.4,930.11,pairtools,2
+0,228.2886,0:03:48,17383.15,22336.79,17245.43,17268.2,0.0,0.02,186.3,427.67,pairtools_bwamem2,2
+1,221.768,0:03:41,17381.39,22167.71,17243.96,17266.91,0.0,39.25,193.13,437.76,pairtools_bwamem2,2
+2,221.3263,0:03:41,17362.04,22295.71,17224.0,17246.68,0.0,78.47,193.49,445.19,pairtools_bwamem2,2
+3,217.581,0:03:37,17340.12,22400.79,17203.5,17233.95,0.0,117.7,196.51,452.01,pairtools_bwamem2,2
+4,223.3371,0:03:43,17393.16,22144.79,17255.85,17277.7,0.01,156.93,192.59,462.11,pairtools_bwamem2,2
+0,610.9462,0:10:10,13529.42,15232.48,13516.34,13517.79,36766.11,2801.28,49.41,96.95,tadbit,2
+1,376.1804,0:06:16,13569.33,15232.48,13545.14,13547.36,61864.18,5678.4,62.61,85.0,tadbit,2
+2,300.593,0:05:00,13570.71,15232.47,13518.36,13536.02,61954.27,9211.43,31.59,15.96,tadbit,2
+3,297.5085,0:04:57,13570.71,15232.48,13518.59,13531.8,61954.29,12339.2,53.0,24.9,tadbit,2
+4,296.7615,0:04:56,13570.29,15232.48,13518.88,13531.85,61954.32,15451.43,72.1,28.21,tadbit,2
+0,463.5859,0:07:43,3666.75,4713.83,3630.19,3636.92,0.02,2804.57,129.07,84.87,tadbit_bowtie2,2
+1,455.2237,0:07:35,3666.76,4713.65,3630.33,3637.05,0.04,6087.06,128.93,11.28,tadbit_bowtie2,2
+2,455.0799,0:07:35,3666.91,4713.66,3630.34,3637.1,0.07,9386.64,128.87,14.03,tadbit_bowtie2,2
+3,458.9496,0:07:38,3667.79,4714.59,3631.29,3638.12,321.35,12458.88,123.9,12.09,tadbit_bowtie2,2
+4,452.953,0:07:32,3667.88,4714.02,3597.93,3621.82,321.36,15940.75,129.9,20.51,tadbit_bowtie2,2
+0,420.8718,0:07:00,3496.11,4314.01,3471.04,3476.4,15.42,768.89,393.0,1654.84,bowtie,4
+1,422.8224,0:07:02,3496.45,4314.02,3471.73,3477.1,15.42,1557.63,391.27,1658.74,bowtie,4
+2,419.9842,0:06:59,3494.73,4314.02,3470.36,3475.6,15.42,2354.96,393.91,1662.14,bowtie,4
+3,420.7907,0:07:00,3495.93,4314.02,3471.43,3476.67,15.42,3145.49,393.27,1666.11,bowtie,4
+4,412.5256,0:06:52,3497.47,4314.64,3472.82,3478.16,15.42,3952.15,401.55,1671.29,bowtie,4
+0,252.8627,0:04:12,6006.18,6453.96,5982.69,5986.96,0.83,746.23,358.77,908.39,bwamem,4
+1,261.662,0:04:21,6007.87,6453.96,5984.64,5988.93,5174.14,1803.59,380.15,999.85,bwamem,4
+2,274.3422,0:04:34,6024.01,6581.96,5999.88,6003.05,6023.34,2736.5,373.3,1032.61,bwamem,4
+3,298.8224,0:04:58,6014.57,6453.96,5990.63,5993.79,11196.66,3669.41,359.87,1087.63,bwamem,4
+4,285.7411,0:04:45,5997.78,6453.96,5988.62,5989.24,12674.78,4477.87,357.24,1036.2,bwamem,4
+0,111.3764,0:01:51,17591.11,20784.61,17566.66,17572.18,0.02,932.93,377.1,420.58,bwamem2,4
+1,111.2423,0:01:51,17569.3,20537.94,17544.68,17550.22,0.02,1865.84,376.76,425.65,bwamem2,4
+2,112.9415,0:01:52,17579.97,20665.95,17555.33,17560.86,0.02,2798.75,371.87,432.32,bwamem2,4
+3,111.3421,0:01:51,17573.66,20489.7,17548.98,17554.51,0.02,3731.66,376.83,437.97,bwamem2,4
+4,111.3777,0:01:51,17544.64,20889.7,17520.3,17525.84,0.02,4353.51,270.46,325.61,bwamem2,4
+0,63.9876,0:01:03,19049.48,20739.12,19027.24,19032.6,14756.71,0.02,122.87,79.43,chromap,4
+1,46.2109,0:00:46,17981.29,19696.59,17959.17,17964.53,14756.71,88.36,86.01,43.67,chromap,4
+2,46.4987,0:00:46,17964.62,19692.91,17941.91,17947.24,14756.71,176.69,85.48,49.23,chromap,4
+3,46.4768,0:00:46,17972.1,19695.5,17949.51,17954.84,14756.71,265.02,85.79,55.0,chromap,4
+4,46.1236,0:00:46,17970.75,19693.68,17948.35,17953.68,14756.71,353.35,86.4,60.65,chromap,4
+0,1480.6598,0:24:40,6107.95,8576.53,3509.07,4446.13,0.04,9605.84,30.34,315.13,fanc_bowtie2,4
+1,1496.7858,0:24:56,9031.23,11498.52,5833.15,7371.95,0.07,20128.0,29.31,332.72,fanc_bowtie2,4
+2,1463.532,0:24:23,7184.77,9101.79,4217.2,5669.65,0.07,31165.73,31.89,344.45,fanc_bowtie2,4
+3,1450.3658,0:24:10,6120.51,8586.21,3510.02,4468.34,0.07,41105.85,30.36,336.53,fanc_bowtie2,4
+4,1452.4516,0:24:12,7187.97,9104.71,4292.93,5709.08,0.07,51577.66,30.16,338.06,fanc_bowtie2,4
+0,1150.8569,0:19:10,7186.05,9953.48,5723.31,5859.42,5224.05,6719.89,32.08,312.36,fanc_bwa,4
+1,1135.3056,0:18:55,7224.26,9950.27,5660.22,5824.15,5224.08,14390.72,35.73,329.42,fanc_bwa,4
+2,1125.9794,0:18:45,9041.66,11507.65,5849.72,7432.53,5224.11,21886.05,34.66,314.55,fanc_bwa,4
+3,1128.931,0:18:48,9044.25,11511.71,5853.64,7435.73,5224.14,29477.74,34.51,319.67,fanc_bwa,4
+4,1128.7103,0:18:48,9040.23,11509.34,5851.56,7432.95,5224.14,37075.91,34.49,326.24,fanc_bwa,4
+0,596.2727,0:09:56,23080.44,26188.7,23034.84,23042.16,0.0,0.53,199.86,493.7,hicexplorer,4
+1,594.7958,0:09:54,23219.5,26188.7,23173.16,23180.65,0.0,1.04,203.29,503.47,hicexplorer,4
+2,609.9005,0:10:09,23002.46,26188.7,22956.89,22965.46,0.0,1.05,192.2,508.65,hicexplorer,4
+3,597.4554,0:09:57,23259.77,26188.7,23215.77,23224.37,19.97,2.01,204.19,518.44,hicexplorer,4
+4,622.1449,0:10:22,23509.05,26188.7,23466.36,23474.47,5281.18,2.58,171.82,538.96,hicexplorer,4
+0,619.9254,0:10:19,6759.34,7631.43,6704.49,6717.01,3.34,1088.32,295.46,59.78,hicpro,4
+1,558.5864,0:09:18,6759.78,7631.43,6705.0,6715.83,3.34,2216.0,327.68,66.81,hicpro,4
+2,550.9692,0:09:10,6760.03,7631.43,6716.98,6725.62,3.34,3341.63,337.46,79.54,hicpro,4
+3,549.0615,0:09:09,6759.72,7631.44,6717.38,6725.84,3.34,4467.26,338.57,88.86,hicpro,4
+4,548.9341,0:09:08,6760.25,7631.43,6717.1,6725.58,3.34,5592.88,339.01,94.98,hicpro,4
 0,297.0198,0:04:57,5996.96,6457.18,5977.55,5979.61,3412.56,2785.59,297.18,6.35,juicer,4
 1,319.5797,0:05:19,5989.58,6457.18,5979.42,5981.48,9268.84,5739.07,309.19,7.78,juicer,4
 2,327.0335,0:05:27,5988.18,6457.18,5977.84,5978.92,10206.69,8537.05,311.25,12.23,juicer,4
 3,331.7378,0:05:31,5988.45,6457.18,5978.38,5979.71,16123.1,11364.07,297.09,18.28,juicer,4
 4,334.7347,0:05:34,5989.59,6457.18,5979.36,5980.55,19292.39,14066.95,290.47,19.62,juicer,4
-0,251.8315,0:04:11,6414.33,10892.68,6266.7,6308.41,5228.08,0.02,359.77,907.6,pairtools,4
-1,242.2713,0:04:02,6396.0,10828.75,6249.07,6290.71,5237.62,39.24,388.6,949.06,pairtools,4
-2,241.8187,0:04:01,6397.66,10828.74,6248.25,6290.61,5251.05,78.48,390.06,958.42,pairtools,4
-3,246.4631,0:04:06,6396.5,10892.75,6247.74,6289.48,10438.31,117.7,369.0,931.18,pairtools,4
-4,239.7562,0:03:59,6397.03,10828.68,6249.17,6290.98,10438.31,156.93,391.47,967.94,pairtools,4
-0,152.6922,0:02:32,17979.33,25543.29,17829.84,17871.88,16595.04,0.02,301.88,461.6,pairtools_bwamem2,4
-1,142.1458,0:02:22,17976.47,25223.34,17825.45,17867.58,27036.67,39.26,265.28,12.78,pairtools_bwamem2,4
-2,127.0336,0:02:07,17996.38,25159.3,17847.5,17890.06,27036.67,78.47,356.65,464.03,pairtools_bwamem2,4
-3,137.2203,0:02:17,18009.98,24643.06,17861.28,17903.91,32949.75,117.7,297.46,427.65,pairtools_bwamem2,4
-4,150.2599,0:02:30,17988.33,24331.95,17840.61,17882.52,49418.82,156.92,316.0,499.57,pairtools_bwamem2,4
+0,241.5991,0:04:01,6399.55,10581.27,6265.49,6303.07,0.17,0.02,385.61,935.31,pairtools,4
+1,241.5127,0:04:01,6398.66,10581.27,6265.65,6303.01,0.17,39.25,386.39,940.88,pairtools,4
+2,242.2225,0:04:02,6399.6,10581.27,6266.3,6303.79,0.17,78.47,384.74,943.77,pairtools,4
+3,241.9401,0:04:01,6398.33,10581.27,6264.49,6301.69,0.17,117.7,385.22,947.96,pairtools,4
+4,242.4722,0:04:02,6399.78,10581.27,6265.86,6303.43,0.17,156.93,384.24,951.72,pairtools,4
+0,149.1161,0:02:29,18032.98,24632.93,17895.58,17910.56,6662.34,0.02,321.06,482.55,pairtools_bwamem2,4
+1,129.8502,0:02:09,18011.57,24911.85,17874.46,17889.42,7215.82,39.25,336.11,443.2,pairtools_bwamem2,4
+2,149.7954,0:02:29,17943.55,24532.52,17806.29,17821.07,13922.12,78.47,316.37,488.94,pairtools_bwamem2,4
+3,151.1137,0:02:31,17982.62,24665.18,17847.23,17861.77,30195.88,117.7,311.36,487.72,pairtools_bwamem2,4
+4,151.164,0:02:31,18000.05,24454.96,17865.51,17883.09,46478.44,156.93,310.2,489.74,pairtools_bwamem2,4
+0,258.4889,0:04:18,13609.79,15690.48,13542.78,13567.37,15901.3,3103.55,57.27,19.48,tadbit,4
+1,226.7053,0:03:46,10086.89,15275.6,10018.9,10043.75,15902.2,5596.25,40.62,76.32,tadbit,4
+2,228.1718,0:03:48,8939.3,15275.61,8872.51,8896.66,15902.23,9327.17,39.3,23.97,tadbit,4
+3,226.9155,0:03:46,9170.13,15275.61,9102.29,9127.05,15902.26,11800.28,40.77,79.04,tadbit,4
+4,225.4808,0:03:45,9664.37,15275.61,9635.06,9642.85,15902.27,14928.92,41.44,83.32,tadbit,4
+0,320.7662,0:05:20,3689.32,5212.95,3650.62,3659.72,0.02,3171.16,145.76,25.25,tadbit_bowtie2,4
+1,348.5605,0:05:48,3687.51,5211.26,3648.52,3653.68,3011.82,6430.47,170.42,20.75,tadbit_bowtie2,4
+2,338.4871,0:05:38,3688.18,5211.85,3649.18,3654.37,3012.64,9240.8,172.35,8.57,tadbit_bowtie2,4
+3,340.9527,0:05:40,3689.14,5212.57,3649.83,3654.76,3014.55,12098.38,165.3,65.98,tadbit_bowtie2,4
+4,346.3,0:05:46,3688.78,5212.19,3649.61,3654.97,3060.71,15516.58,170.85,94.72,tadbit_bowtie2,4


=====================================
doc/index.rst
=====================================
@@ -70,7 +70,8 @@ Contents:
    installation
    parsing
    sorting
-   formats 
+   formats
+   stats
    technotes
    cli_tools
 


=====================================
doc/stats.rst
=====================================
@@ -0,0 +1,123 @@
+`Pairtools stats` as a source of quality control metrics
+========================================================
+
+Overview
+--------
+
+`pairtools stats` produces a human-readable nested dictionary of statistics stored in
+a YAML file or a tab-separated text table (specified through the parameters).
+
+When calculating statistics, any number of filters can be applied to generate separate
+statistics for different categories of pairs, for example they can be filtered by the
+read mapping quality (mapq values). These are then stored as separate sections of the
+output file.
+
+- **Global statistics** include:
+    - number of pairs (total, unmapped, single-side mapped, etc.),
+    - total number of different pair types (UU, NN, NU, and others, see `Pair types in pairtools docs <https://pairtools.readthedocs.io/en/latest/formats.html#pair-types>`_),
+    - number of contacts between all chromosome pairs
+
+- **Summary statistics** include:
+    - fraction of duplicates
+    - fraction of cis interactions (at different minimal distance cutoffs) out of total
+    - estimation of library complexity
+
+Summary statistics can inform you about the quality of the data.
+For example, more trans interactions can be a sign of problems with the 3C+ procedure and lower signal-to-noise ratio.
+Substantial mapping to mitochondrial chromosome (chrM) might be a sign of random ligation.
+
+- **P(s), or scaling.**  The dependence of contact frequency on genomic distance,
+  referred to as the P(s) curve or scaling, is a rich source of both biologically relevant information and technical quality metrics for 3C+ experiments.
+  The shape of P(s) is often used to characterize mechanisms of genome folding and reveal issues with QC.
+
+Interactive visualization of stats with MultiQC
+-----------------------------------------------
+
+Install `multiqc`:
+
+.. code-block:: bash
+
+    pip install --upgrade --force-reinstall git+https://github.com/open2c/MultiQC.git
+
+Note that (for now) the pairtools module for MultiQC is only available in the open2C fork and not in the main MultiQC repository.
+
+Run MultiQC in a folder with one or multiple .stats files:
+
+.. code-block:: bash
+
+    multiqc .
+
+
+This will produce a nice .html file with interactive graphical summaries of the stats.
+
+
+Estimating library complexity
+-----------------------------
+
+Pairtools assumes that each sequencing read is randomly chosen with
+replacement from a finite pool of fragments in a DNA library [1]_ [2]_.
+With each new sequenced molecule, the expected number of observed unique molecules
+increases according to a simple equation:
+
+$$ U(N+1) = U(N) + (1 - {U(N) \over C}), $$
+
+where $N$ is the number of sequenced molecules, $U(N)$ is the expected number
+of observed unique molecules after sequencing $N$ molecules, and $C$ is the library complexity.
+Treating this recurrence as a differential equation in $N$ yields [1]_ [2]_:
+
+$$ {U(N) \over C} = 1 - \exp( - {N \over C}), $$
+
+which can be solved for $C$ using the Lambert $W$ function and the observed unique fraction $u = U(N)/N$:
+
+$$ {N \over C} = \Re(W( - { \exp( - {1 \over u} ) \over u} ) ) + {1 \over u}. $$
+
+Library complexity can guide the choice of sequencing depth for the library
+and provide an estimate of library quality.
+
+
+Illumina sequencing duplicates
+------------------------------
+
+Importantly, you can estimate the complexity of Hi-C libraries using only small QC
+samples to decide if their quality permits deeper sequencing [3]_.
+These estimates, however, can be significantly biased by the presence of “optical” or
+“clustering” duplicates. Such duplicates occur as artefacts of the sequencing procedure.
+Optical duplicates appear in data generated on sequencers with non-patterned flowcells
+when the instrument erroneously splits the signal from a single sequenced molecule
+into two. Clustering duplicates, on the other hand, appear on patterned flowcells when,
+during cluster generation, a single cluster occupies adjacent nanowells [4]_.
+
+The rate of optical and clustering duplication depends on the technology and the operating
+conditions (e.g. molarity of the library loaded onto the flowcell), but not on the
+library complexity or sequencing depth. Thus, particularly in small sequencing samples,
+clustering duplication on recent Illumina instruments can severely inflate the
+observed levels of duplication [5]_, resulting in underestimation of the library complexity.
+
+While the frequency of PCR duplicates increases with sequencing depth,
+optical or clustering duplication levels may stay constant for a particular sequencer,
+provided the library is loaded at the same molarity. This means that the high frequency of
+clustering duplicates on the NovaSeq leads to severe underestimation of library complexity
+in the pilot runs. In particular, the recent models of Illumina sequencers with patterned
+flowcells (such as NovaSeq) suffer from increased clustering duplication rate, which may
+far exceed the level of PCR duplication.
+
+Luckily, optical and clustering duplicates can be distinguished from the PCR ones,
+as the former are located next to each other on the sequencing flow cell.
+In the case of Illumina sequencers, `pairtools dedup` can infer the positions of sequencing
+reads from their IDs and focus on geometrically distant duplicates to produce unbiased
+estimates of PCR duplication and library complexity.  Although SRA does not store original
+read IDs from the sequencer, this analysis is possible when pairtools is run on a dataset
+with original Illumina-generated read IDs.
+Note that in our experience even when accounting for optical/clustering duplicates, the
+complexity can be greatly underestimated, but is still a useful measurement to choose the
+most complex libraries.
+
+
+.. [1] Picard. http://broadinstitute.github.io/picard/
+
+.. [2] Thread: [Samtools-help] Pickard estimate for the size of a library - wrong or non-transparent? https://sourceforge.net/p/samtools/mailman/samtools-help/thread/DUB405-EAS154589A1ACEF2BE4C573D4592180@phx.gbl/
+
+.. [3] Rao, S. S. P. et al. A 3D map of the human genome at kilobase resolution reveals principles of chromatin looping. Cell 159, 1665–1680 (2014).
+
+.. [4] Duplicates on Illumina. BioStars. https://www.biostars.org/p/229842/
+.. [5] Illumina Patterned Flow Cells Generate Duplicated Sequences. https://sequencing.qcfail.com/articles/illumina-patterned-flow-cells-generate-duplicated-sequences/
\ No newline at end of file
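
Note on the complexity estimate in doc/stats.rst above: the relation U/C = 1 - exp(-N/C) can also be solved numerically for C. The following is a minimal sketch that uses bisection on x = N/C rather than the Lambert W closed form, so that it needs only the standard library. The function name and arguments are hypothetical and are not taken from the pairtools API.

```python
import math


def estimate_library_complexity(n_sequenced, n_unique, n_iter=200):
    """Solve U/C = 1 - exp(-N/C) for C by bisection on x = N/C.

    Hypothetical helper for illustration; not the pairtools implementation.
    """
    if n_sequenced <= 0 or n_unique <= 0:
        return 0.0
    u = n_unique / n_sequenced  # observed fraction of unique molecules
    if u >= 1.0:
        return math.inf  # no duplicates observed: complexity is unbounded

    # f(x) = 1 - exp(-x) - u*x is concave with f(0) = 0. It is positive at
    # x = 1 - u and negative at x = 1/u, so the positive root lies between.
    lo, hi = 1.0 - u, 1.0 / u
    for _ in range(n_iter):
        mid = 0.5 * (lo + hi)
        if -math.expm1(-mid) - u * mid > 0:
            lo = mid
        else:
            hi = mid
    x = 0.5 * (lo + hi)  # x approximates N / C
    return n_sequenced / x


# Synthetic check: a library of complexity 2000 sequenced to depth 1000.
true_c = 2000.0
n = 1000.0
expected_unique = true_c * (1.0 - math.exp(-n / true_c))
estimate = estimate_library_complexity(n, expected_unique)
```

Bisection is slower than a closed-form evaluation but makes the bracketing of the non-trivial root explicit; the trivial root x = 0 is excluded by starting the bracket at 1 - u.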


=====================================
pairtools/__init__.py
=====================================
@@ -4,12 +4,12 @@ pairtools
 
 CLI tools to process mapped Hi-C data
 
-:copyright: (c) 2017-2022 Open2C
+:copyright: (c) 2017-2023 Open2C
 :author: Open2C
 :license: MIT
 
 """
 
-__version__ = "1.0.1"
+__version__ = "1.0.3"
 
 # from . import lib


=====================================
pairtools/cli/dedup.py
=====================================
@@ -113,7 +113,7 @@ UTIL_NAME = "pairtools_dedup"
 @click.option(
     "--chunksize",
     type=int,
-    default=100_000,
+    default=10_000,
     show_default=True,
     help="Number of pairs in each chunk. Reduce for lower memory footprint."
     " Below 10,000 performance starts suffering significantly and the algorithm might"
@@ -171,12 +171,6 @@ UTIL_NAME = "pairtools_dedup"
     help=r"Separator (\t, \v, etc. characters are "
     "supported, pass them in quotes). [input format option]",
 )
-@click.option(
-    "--comment-char",
-    type=str,
-    default="#",
-    help="The first character of comment lines. [input format option]",
-)
 @click.option(
     "--send-header-to",
     type=click.Choice(["dups", "dedup", "both", "none"]),
@@ -304,7 +298,6 @@ def dedup(
     max_mismatch,
     method,
     sep,
-    comment_char,
     send_header_to,
     c1,
     c2,
@@ -342,7 +335,6 @@ def dedup(
         max_mismatch,
         method,
         sep,
-        comment_char,
         send_header_to,
         c1,
         c2,
@@ -376,7 +368,6 @@ def dedup_py(
     max_mismatch,
     method,
     sep,
-    comment_char,
     send_header_to,
     c1,
     c2,
@@ -548,7 +539,7 @@ def dedup_py(
         )
     elif backend in ("scipy", "sklearn"):
         streaming_dedup(
-            in_stream=instream,
+            in_stream=body_stream,
             colnames=column_names,
             chunksize=chunksize,
             carryover=carryover,
@@ -558,7 +549,6 @@ def dedup_py(
             extra_col_pairs=list(extra_col_pair),
             keep_parent_id=keep_parent_id,
             unmapped_chrom=unmapped_chrom,
-            comment_char=comment_char,
             outstream=outstream,
             outstream_dups=outstream_dups,
             outstream_unmapped=outstream_unmapped,


=====================================
pairtools/cli/flip.py
=====================================
@@ -101,7 +101,7 @@ def flip_py(pairs_path, chroms_path, output, **kwargs):
     ]
 
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
 
         is_annotated1 = cols[chrom1_col] in chrom_enum.keys()
         is_annotated2 = cols[chrom2_col] in chrom_enum.keys()
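
Note on the `rstrip()` to `rstrip('\n')` change, which recurs in several files of this commit: the commit message does not state the motivation, but one plausible effect is that trailing empty columns are preserved. The sketch below assumes the separator is a tab (a stand-in for `pairsam_format.PAIRSAM_SEP`) and uses a made-up input line.

```python
SEP = "\t"  # assumed value of pairsam_format.PAIRSAM_SEP


def split_strip_all(line):
    # Old behaviour: strips every trailing whitespace character, tabs included.
    return line.rstrip().split(SEP)


def split_strip_newline(line):
    # New behaviour: strips only the trailing newline.
    return line.rstrip("\n").split(SEP)


# Made-up pair record whose last two columns are empty.
line = "read1\tchr1\t100\tchr2\t200\t+\t-\tUU\t\t\n"

old_cols = split_strip_all(line)
new_cols = split_strip_newline(line)
```

With the old call the two empty trailing fields vanish, so the number of columns no longer matches the header; with the new call they survive as empty strings.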


=====================================
pairtools/cli/markasdup.py
=====================================
@@ -55,7 +55,7 @@ def markasdup_py(pairsam_path, output, **kwargs):
     outstream.writelines((l + "\n" for l in header))
 
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         mark_split_pair_as_dup(cols)
 
         outstream.write(pairsam_format.PAIRSAM_SEP.join(cols))


=====================================
pairtools/cli/phase.py
=====================================
@@ -203,7 +203,7 @@ def phase_py(
     outstream.writelines((l + "\n" for l in header))
 
     for line in body_stream:
-        cols = line.split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         cols.append("!")
         cols.append("!")
         if report_scores:


=====================================
pairtools/cli/restrict.py
=====================================
@@ -96,7 +96,7 @@ def restrict_py(pairs_path, frags, output, **kwargs):
     }
 
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         chrom1, pos1 = cols[pairsam_format.COL_C1], int(cols[pairsam_format.COL_P1])
         rfrag_idx1, rfrag_start1, rfrag_end1 = find_rfrag(rfrags, chrom1, pos1)
         chrom2, pos2 = cols[pairsam_format.COL_C2], int(cols[pairsam_format.COL_P2])


=====================================
pairtools/cli/select.py
=====================================
@@ -229,7 +229,7 @@ def select_py(
     for filter_passed, line in evaluate_stream(
         body_stream, condition, column_names, type_cast, startup_code
     ):
-        COLS = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        COLS = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
 
         if remove_columns:
             COLS = [


=====================================
pairtools/cli/split.py
=====================================
@@ -92,7 +92,7 @@ def split_py(pairsam_path, output_pairs, output_sam, **kwargs):
                 header, "columns", " ".join(columns)
             )
             has_sams = True
-        elif ("sam1" in columns) != ("sam1" in columns):
+        elif ("sam1" in columns) != ("sam2" in columns):
             raise ValueError(
                 "According to the #columns header field only one sam entry is present"
             )
@@ -113,7 +113,7 @@ def split_py(pairsam_path, output_pairs, output_sam, **kwargs):
     sam1 = None
     sam2 = None
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         if has_sams:
             if sam1col < sam2col:
                 sam2 = cols.pop(sam2col)
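
Note on the `sam1`/`sam2` fix above: the old condition compared an expression with itself, so it could never be true and the `ValueError` branch was unreachable. A minimal sketch of the two checks, with hypothetical function names:

```python
def only_one_sam_old(columns):
    # Old check: both sides are identical, so the result is always False.
    return ("sam1" in columns) != ("sam1" in columns)


def only_one_sam_new(columns):
    # Fixed check: True exactly when one of sam1/sam2 is present without the other.
    return ("sam1" in columns) != ("sam2" in columns)


broken_header = ["readID", "chrom1", "pos1", "sam1"]
full_header = ["readID", "chrom1", "pos1", "sam1", "sam2"]
```

The inequality of two booleans acts as an exclusive-or, which is why the corrected form detects a header that lists only one of the two SAM columns.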


=====================================
pairtools/lib/dedup.py
=====================================
@@ -13,6 +13,10 @@ from .._logging import get_logger
 logger = get_logger()
 import time
 
+# Ignore pandas future warnings:
+import warnings
+warnings.simplefilter(action='ignore', category=FutureWarning)
+
 # Setting for cython deduplication:
 # you don't need to load more than 10k lines at a time b/c you get out of the
 # CPU cache, so this parameter is not adjustable
@@ -29,7 +33,6 @@ def streaming_dedup(
     max_mismatch,
     extra_col_pairs,
     unmapped_chrom,
-    comment_char,
     outstream,
     outstream_dups,
     outstream_unmapped,
@@ -49,7 +52,6 @@ def streaming_dedup(
         max_mismatch=max_mismatch,
         extra_col_pairs=extra_col_pairs,
         keep_parent_id=keep_parent_id,
-        comment_char=comment_char,
         backend=backend,
         n_proc=n_proc,
     )
@@ -114,13 +116,12 @@ def _dedup_stream(
     max_mismatch,
     extra_col_pairs,
     keep_parent_id,
-    comment_char,
     backend,
     n_proc,
 ):
     # Stream the input dataframe:
     dfs = pd.read_table(
-        in_stream, comment=comment_char, names=colnames, chunksize=chunksize
+        in_stream, comment=None, names=colnames, chunksize=chunksize
     )
 
     # Set up the carryover dataframe:
@@ -375,7 +376,7 @@ def streaming_dedup_cython(
     read_idx = 0  # read index to mark the parent readID
     while True:
         rawline = next(instream, None)
-        stripline = rawline.strip() if rawline else None
+        stripline = rawline.strip('\n') if rawline else None
 
         # take care of empty lines not at the end of the file separately
         if rawline and (not stripline):


=====================================
pairtools/lib/headerops.py
=====================================
@@ -70,7 +70,7 @@ def get_header(instream, comment_char=COMMENT_CHAR, ignore_warning=False):
         if isinstance(line, bytes):
             line = line.decode()
         # append line to header, since it does start with header
-        header.append(line.strip())
+        header.append(line.rstrip('\n'))
         # peek into the remainder of the instream
         current_peek = peek_f(1)
     # apparently, next line does not start with the comment
@@ -95,7 +95,7 @@ def extract_fields(header, field_name, save_rest=False):
     rest = []
     for l in header:
         if l.lstrip(COMMENT_CHAR).startswith(field_name + ":"):
-            fields.append(l.split(":", 1)[1].strip())
+            fields.append(l.split(":", 1)[1].rstrip('\n').lstrip())
         elif save_rest:
             rest.append(l)
 


=====================================
pairtools/lib/scaling.py
=====================================
@@ -133,9 +133,11 @@ def make_empty_cross_region_table(
 
 
 def bins_pairs_by_distance(
-    pairs_df, dist_bins, regions=None, chromsizes=None, ignore_trans=False
+    pairs_df, dist_bins, regions=None, chromsizes=None, ignore_trans=False,
+    keep_unassigned=False,
 ):
 
+    dist_bins = np.r_[dist_bins, np.iinfo(np.int64).max]
     if regions is None:
         if chromsizes is None:
             chroms = sorted(
@@ -148,9 +150,10 @@ def bins_pairs_by_distance(
             region_ends1, region_ends2 = -1, -1
 
         else:
-            region_starts1, region_starts2 = 0, 0
-            region_ends1 = pairs_df.chrom1.map(chromsizes).fillna(1).astype(np.int64)
-            region_ends2 = pairs_df.chrom2.map(chromsizes).fillna(1).astype(np.int64)
+            region_ends1 = pairs_df.chrom1.map(chromsizes).fillna(-1).astype(np.int64)
+            region_ends2 = pairs_df.chrom2.map(chromsizes).fillna(-1).astype(np.int64)
+            region_starts1 = np.where(region_ends1 > 0, 0, -1)
+            region_starts2 = np.where(region_ends2 > 0, 0, -1)
             regions = pd.DataFrame(
                 [
                     {"chrom": chrom, "start": 0, "end": length}
@@ -183,6 +186,7 @@ def bins_pairs_by_distance(
             pairs_df.chrom2.values, pairs_df.pos2.values, regions
         ).T
 
+
     pairs_reduced_df = pd.DataFrame(
         {
             "chrom1": pairs_df.chrom1.values,
@@ -201,6 +205,11 @@ def bins_pairs_by_distance(
         copy=False,
     )
 
+    if not keep_unassigned:
+        pairs_reduced_df = (pairs_reduced_df
+            .query('(start1 >= 0) and (end1 > 0) and (start2 >= 0) and (end2 > 0)')
+            .reset_index(drop=True))
+
     pairs_reduced_df["min_dist"] = np.where(
         pairs_reduced_df["dist_bin_idx"] > 0,
         dist_bins[pairs_reduced_df["dist_bin_idx"] - 1],
@@ -208,7 +217,7 @@ def bins_pairs_by_distance(
     )
 
     pairs_reduced_df["max_dist"] = np.where(
-        pairs_reduced_df["dist_bin_idx"] < len(dist_bins),
+        pairs_reduced_df["dist_bin_idx"] < len(dist_bins)-1,
         dist_bins[pairs_reduced_df["dist_bin_idx"]],
         np.iinfo(np.int64).max,
     )
@@ -220,6 +229,8 @@ def bins_pairs_by_distance(
         (pairs_reduced_df.chrom1 == pairs_reduced_df.chrom2)
         & (pairs_reduced_df.start1 == pairs_reduced_df.start2)
         & (pairs_reduced_df.end1 == pairs_reduced_df.end2)
+        & (pairs_reduced_df.min_dist > 0)
+        & (pairs_reduced_df.max_dist < np.iinfo(np.int64).max)
     )
 
     pairs_for_scaling_df = pairs_reduced_df.loc[pairs_for_scaling_mask]
@@ -321,6 +332,7 @@ def compute_scaling(
     n_dist_bins=8 * 8,
     chunksize=int(1e7),
     ignore_trans=False,
+    keep_unassigned=False,
     filter_f=None,
     nproc_in=1,
     cmd_in=None,
@@ -337,6 +349,7 @@ def compute_scaling(
     n_dist_bins: number of logarithmic bins
     chunksize: size of chunks for calculations
     ignore_trans: bool, ignore trans or not
+    keep_unassigned: bool, keep pairs that are not assigned to any region
     filter_f: filter function that can be applied to each chunk
     nproc_in
     cmd_in
@@ -368,8 +381,10 @@ def compute_scaling(
         header, pairs_body = headerops.get_header(pairs_stream)
 
         cols = headerops.extract_column_names(header)
+
         if chromsizes is None:
             chromsizes = headerops.extract_chromsizes(header)
+
         pairs_df = pd.read_csv(
             pairs_body,
             header=None,
@@ -393,6 +408,7 @@ def compute_scaling(
             regions=regions,
             chromsizes=chromsizes,
             ignore_trans=ignore_trans,
+            keep_unassigned=keep_unassigned
         )
 
         sc = sc_chunk if sc is None else sc.add(sc_chunk, fill_value=0)
@@ -413,7 +429,7 @@ def compute_scaling(
 
     if not ignore_trans:
         trans_counts.reset_index(inplace=True)
-        trans_counts["np_bp2"] = (trans_counts["end1"] - trans_counts["start1"]) * (
+        trans_counts["n_bp2"] = (trans_counts["end1"] - trans_counts["start1"]) * (
             trans_counts["end2"] - trans_counts["start2"]
         )
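
Note on `keep_unassigned` above: chromosomes missing from `chromsizes` now get the sentinel -1 for both region start and end, and pairs touching such regions are dropped unless `keep_unassigned` is set. The following is a plain-Python analogue of that logic, not the pandas code from the diff; the helper names and the sample pairs are hypothetical.

```python
UNASSIGNED = -1  # sentinel the diff uses for chromosomes absent from chromsizes


def whole_chrom_region(chrom, chromsizes):
    # Mirrors fillna(-1) followed by np.where(region_ends > 0, 0, -1).
    end = chromsizes.get(chrom, UNASSIGNED)
    start = 0 if end > 0 else UNASSIGNED
    return start, end


def drop_unassigned(pairs, chromsizes):
    kept = []
    for chrom1, pos1, chrom2, pos2 in pairs:
        start1, end1 = whole_chrom_region(chrom1, chromsizes)
        start2, end2 = whole_chrom_region(chrom2, chromsizes)
        # Same condition as the .query(...) string in the diff.
        if start1 >= 0 and end1 > 0 and start2 >= 0 and end2 > 0:
            kept.append((chrom1, pos1, chrom2, pos2))
    return kept


chromsizes = {"chr1": 1000, "chr2": 800}
pairs = [
    ("chr1", 10, "chr1", 500),
    ("chr1", 20, "chr2", 30),
    ("!", 0, "chr1", 40),  # one side unmapped
    ("!", 0, "!", 0),      # both sides unmapped
]
kept = drop_unassigned(pairs, chromsizes)
```

This is consistent with the updated test in tests/test_scaling.py, where pairs on the "!" chromosome are no longer counted.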
 


=====================================
pairtools/lib/select.py
=====================================
@@ -66,9 +66,11 @@ def evaluate_stream(
     for i, col in enumerate(column_names):
         if col in TYPES:
             col_type = TYPES[col]
-            condition = condition.replace(col, "{}(COLS[{}])".format(col_type, i))
+            condition = re.sub(r"\b%s\b" % col , "{}(COLS[{}])".format(col_type, i), condition)
+            #condition.replace(col, "{}(COLS[{}])".format(col_type, i))
         else:
-            condition = condition.replace(col, "COLS[{}]".format(i))
+            condition = re.sub(r"\b%s\b" % col, "COLS[{}]".format(i), condition)
+            #condition = condition.replace(col, "COLS[{}]".format(i))
 
     # Compile the filtering expression:
     match_func = compile(condition, "<string>", "eval")
@@ -121,7 +123,8 @@ def evaluate_df(df, condition, type_cast=(), startup_code=None, engine="pandas")
     else:
         # Set up the columns indexing
         for i, col in enumerate(df.columns):
-            condition = condition.replace(col, "COLS[{}]".format(i))
+            condition = re.sub(r"\b%s\b" % col, "COLS[{}]".format(i), condition)
+            #condition = condition.replace(col, "COLS[{}]".format(i))
 
         filter_passed_output = []
         match_func = compile(condition, "<string>", "eval")
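
Note on the switch from `str.replace` to `re.sub` with `\b` above: plain substring replacement rewrites a column name wherever it occurs, including inside a longer column name. The sketch below uses hypothetical column names and helper functions; the `re.escape` call is an addition that is not in the diff.

```python
import re


def substitute_naive(condition, column_names):
    # Old approach: plain substring replacement.
    for i, col in enumerate(column_names):
        condition = condition.replace(col, "COLS[{}]".format(i))
    return condition


def substitute_word_bounded(condition, column_names):
    # New approach: replace only whole-word matches of each column name.
    for i, col in enumerate(column_names):
        pattern = r"\b%s\b" % re.escape(col)
        condition = re.sub(pattern, "COLS[{}]".format(i), condition)
    return condition


columns = ["chrom1", "pos1", "pos1_shifted"]  # hypothetical column set
condition = "pos1_shifted > pos1"

naive = substitute_naive(condition, columns)
bounded = substitute_word_bounded(condition, columns)
```

Because the underscore counts as a word character, `\bpos1\b` does not match inside `pos1_shifted`, so the longer name is left intact until its own turn in the loop.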


=====================================
pairtools/lib/stats.py
=====================================
@@ -58,7 +58,7 @@ class PairCounter(Mapping):
                 ** np.arange(
                     min_log10_dist, max_log10_dist + 0.001, log10_dist_bin_step
                 )
-            ).astype(np.int),
+            ).astype(np.int_),
         ]
 
         # establish structure of an empty _stat:


=====================================
tests/test_scaling.py
=====================================
@@ -22,5 +22,4 @@ def test_scaling():
 
     output = pd.read_csv(io.StringIO(result), sep="\t", header=0)
 
-    # All the pairs, even "!" are counted as present because we don't provide regions
-    assert output["n_pairs"].sum() == 8
+    assert output["n_pairs"].sum() == 5



View it on GitLab: https://salsa.debian.org/med-team/pairtools/-/commit/2e5a204560b2168353de9926b19e852e6208cc5b
