[med-svn] [Git][med-team/pairtools][master] 3 commits: New upstream version 1.0.2

Nilesh Patra (@nilesh) gitlab@salsa.debian.org
Fri Dec 30 19:27:49 GMT 2022



Nilesh Patra pushed to branch master at Debian Med / pairtools


Commits:
2a17f27f by Nilesh Patra at 2022-12-30T22:06:55+05:30
New upstream version 1.0.2
- - - - -
62e00e43 by Nilesh Patra at 2022-12-30T22:06:57+05:30
Update upstream source from tag 'upstream/1.0.2'

Update to upstream version '1.0.2'
with Debian dir 2b379fcda31e4a0360900ebfa03ccea39aa5a66c
- - - - -
960243a9 by Nilesh Patra at 2022-12-31T00:57:03+05:30
Interim release. needs bioframe

- - - - -


20 changed files:

- − .github/workflows/python-package.yml
- − .github/workflows/python-publish-test.yml
- − .github/workflows/python-publish.yml
- CHANGES.md
- debian/changelog
- debian/control
- debian/patches/numpy-1.24.patch
- debian/patches/tests-to-python3
- pairtools/__init__.py
- pairtools/cli/dedup.py
- pairtools/cli/flip.py
- pairtools/cli/markasdup.py
- pairtools/cli/phase.py
- pairtools/cli/restrict.py
- pairtools/cli/select.py
- pairtools/cli/split.py
- pairtools/lib/dedup.py
- pairtools/lib/headerops.py
- pairtools/lib/scaling.py
- pairtools/lib/select.py


Changes:

=====================================
.github/workflows/python-package.yml deleted
=====================================
@@ -1,37 +0,0 @@
-# This workflow will install Python dependencies, run tests and lint with a variety of Python versions
-# For more information see: https://help.github.com/actions/language-and-framework-guides/using-python-with-github-actions
-
-name: Python package
-
-on: push
-
-jobs:
-  build:
-
-    runs-on: ubuntu-latest
-    strategy:
-      matrix:
-        python-version: ["3.7", "3.8", "3.9", "3.10"]
-
-    steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python ${{ matrix.python-version }}
-      uses: actions/setup-python@v2
-      with:
-        python-version: ${{ matrix.python-version }}
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip wheel setuptools
-        pip install numpy cython pysam
-        pip install -r requirements-dev.txt
-        pip install -e .
-    - name: Lint with flake8
-      run: |
-        # stop the build if there are Python syntax errors or undefined names
-        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
-        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
-        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
-    - name: Test with pytest
-      run: |
-        pip install pytest
-        pytest


=====================================
.github/workflows/python-publish-test.yml deleted
=====================================
@@ -1,32 +0,0 @@
-
-# This workflows will upload a Python Package using Twine when a release is created
-# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
-
-name: Publish Python Package to Test PyPI
-
-on:
-  release:
-    types: [prereleased]
-
-jobs:
-  deploy:
-
-    runs-on: ubuntu-latest
-
-    steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python
-      uses: actions/setup-python@v2
-      with:
-        python-version: '3.x'
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install setuptools wheel twine cython numpy pysam
-    - name: Build and publish
-      env:
-        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
-        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
-      run: |
-        python setup.py sdist
-        twine upload --repository-url https://test.pypi.org/legacy/ dist/*


=====================================
.github/workflows/python-publish.yml deleted
=====================================
@@ -1,31 +0,0 @@
-# This workflow will upload a Python Package using Twine when a release is created
-# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
-
-name: Upload Python Package
-
-on:
-  release:
-    types: [created]
-
-jobs:
-  deploy:
-
-    runs-on: ubuntu-latest
-
-    steps:
-    - uses: actions/checkout@v2
-    - name: Set up Python
-      uses: actions/setup-python@v2
-      with:
-        python-version: '3.x'
-    - name: Install dependencies
-      run: |
-        python -m pip install --upgrade pip
-        pip install setuptools wheel twine cython pysam numpy
-    - name: Build and publish
-      env:
-        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
-        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
-      run: |
-        python setup.py sdist
-        twine upload dist/*


=====================================
CHANGES.md
=====================================
@@ -1,3 +1,18 @@
+### 1.0.2 (2022-11-XX) ###
+
+- [x] `pairtools select` regex update 
+(string substitutions failed when the column name was a substring of another)
+
+- [x] Warnings capture in dedup: pairs lines are always split after rstrip newline
+
+- [x] Important fixes of splitting schema
+
+- [x] Dedup comment removed (failed when the read qualities contained "#")
+
+- [x] Remove dbist build out of wheel
+
+- [x] pairtools scaling: fixed an issue with scaling maximum range value https://github.com/open2c/pairtools/issues/150#issue-1439106031 
+
 ### 1.0.1 (2022-09-XX) ###
 
 - [x] Fixed issue with pysam dependencies on pip and conda


=====================================
debian/changelog
=====================================
@@ -1,3 +1,11 @@
+pairtools (1.0.2-1) UNRELEASED; urgency=medium
+
+  * Add patch to fix FTBFS with numpy 1.24 (Closes: #1027232)
+  * Bump Standards-Version to 4.6.2 (no changes needed)
+  * New upstream version 1.0.2
+
+ -- Nilesh Patra <nilesh@debian.org>  Fri, 30 Dec 2022 22:38:50 +0530
+
 pairtools (1.0.1-1) unstable; urgency=medium
 
   * Team upload.


=====================================
debian/control
=====================================
@@ -11,7 +11,7 @@ Build-Depends: debhelper-compat (= 13),
                python3-click,
                python3-numpy,
                python3-pandas,
-               python3-pysam,
+               python3-pysam (>= 0.20.0+ds-3),
                python3-scipy,
                python3-sphinx-click,
                libhts-dev,


=====================================
debian/patches/numpy-1.24.patch
=====================================
@@ -12,3 +12,190 @@ Last-Update: 2022-12-30
          ]
  
          # establish structure of an empty _stat:
+--- /dev/null
++++ b/pairtools/lib/htslib_util.h
+@@ -0,0 +1,145 @@
++#ifndef HTSLIB_UTIL_H
++#define HTSLIB_UTIL_H
++
++#include "htslib/sam.h"
++#include "htslib/vcf.h"
++#include "htslib/khash.h"
++
++/*
++typedef struct {
++        int32_t tid;
++        int32_t pos;
++        uint32_t bin:16, qual:8, l_qname:8;
++        uint32_t flag:16, n_cigar:16;
++        int32_t l_qseq;
++        int32_t mtid;
++        int32_t mpos;
++        int32_t isize;
++} bam1_core_t;
++
++typedef struct {
++        bam1_core_t core;
++        int l_aux, data_len, m_data;
++        uint8_t *data;
++} bam1_t;
++*/
++
++bam1_t *bam_init1()
++{
++    return (bam1_t*)calloc(1, sizeof(bam1_t));
++}
++
++bam1_t *bam_copy1(bam1_t *bdst, const bam1_t *bsrc)
++{
++    uint8_t *data = bdst->data;
++    int m_data = bdst->m_data;   // backup data and m_data
++    if (m_data < bsrc->l_data) { // double the capacity
++        m_data = bsrc->l_data; kroundup32(m_data);
++        data = (uint8_t*)realloc(data, m_data);
++    }
++    memcpy(data, bsrc->data, bsrc->l_data); // copy var-len data
++    *bdst = *bsrc; // copy the rest
++    // restore the backup
++    bdst->m_data = m_data;
++    bdst->data = data;
++    return bdst;
++}
++
++bam1_t *bam_dup1(const bam1_t *bsrc)
++{
++    if (bsrc == NULL) return NULL;
++    bam1_t *bdst = bam_init1();
++    if (bdst == NULL) return NULL;
++    return bam_copy1(bdst, bsrc);
++}
++
++int hts_set_verbosity(int verbosity);
++int hts_get_verbosity(void);
++
++
++KHASH_MAP_INIT_STR(vdict, bcf_idinfo_t)
++typedef khash_t(vdict) vdict_t;
++
++KHASH_DECLARE(s2i, kh_cstr_t, int64_t)
++typedef khash_t(s2i) s2i_t;
++		     
++//////////////////////////////////////////////////////////////////
++//////////////////////////////////////////////////////////////////
++//////////////////////////////////////////////////////////////////
++// various helper functions
++//
++
++/*!
++  @abstract Update the variable length data within a bam1_t entry
++
++  Old data is deleted and the data within b are re-arranged to 
++  make place for new data.
++  
++  @discussion Return NULL on error, otherwise b is returned.
++
++  @param  b           bam1_t data
++  @param  nbytes_old  size of old data
++  @param  nbytes_new  size of new data
++  @param  pos         position of data
++*/
++bam1_t * pysam_bam_update(bam1_t * b,
++			  const size_t nbytes_old,
++			  const size_t nbytes_new,
++			  uint8_t * pos);
++
++// translate a nucleotide character to binary code
++unsigned char pysam_translate_sequence(const unsigned char s);
++
++// return byte size of type
++int aux_type2size(uint8_t type);
++
++
++//-------------------------------------------------------
++// Wrapping accessor macros in sam.h
++static inline int pysam_bam_is_rev(bam1_t * b) {
++  return bam_is_rev(b);};
++
++static inline int pysam_bam_is_mrev(bam1_t * b) {
++  return bam_is_mrev(b);}
++
++static inline char * pysam_bam_get_qname(bam1_t * b) {
++  return bam_get_qname(b);}
++
++static inline uint32_t * pysam_bam_get_cigar(bam1_t * b) {
++  return bam_get_cigar(b);}
++
++static inline uint8_t * pysam_bam_get_seq(bam1_t * b) {
++  return bam_get_seq(b);}
++
++static inline uint8_t * pysam_bam_get_qual(bam1_t * b) {
++  return bam_get_qual(b);}
++
++static inline uint8_t * pysam_bam_get_aux(bam1_t * b) {
++  return bam_get_aux(b);}
++
++static inline int pysam_bam_get_l_aux(bam1_t * b) {
++  return bam_get_l_aux(b); }
++
++static inline char pysam_bam_seqi(uint8_t * s, int i) {
++  return bam_seqi(s,i);}
++
++static inline uint8_t pysam_get_qual(bam1_t * b) {
++  return b->core.qual;}
++
++static inline uint32_t pysam_get_n_cigar(bam1_t * b) {
++  return b->core.n_cigar;}
++
++static inline void pysam_set_qual(bam1_t * b, uint8_t v) {
++  b->core.qual=v;}
++
++static inline void pysam_set_n_cigar(bam1_t * b, uint32_t v) {
++  b->core.n_cigar=v;}
++
++static inline void pysam_update_flag(bam1_t * b, uint16_t v, uint16_t flag) {
++  if (v)
++    b->core.flag |= flag;
++  else
++    b->core.flag &= ~flag;
++}
++
++#endif
+--- /dev/null
++++ b/pairtools/lib/parse_pysam.pxd
+@@ -0,0 +1,27 @@
++from libc.stdint cimport int8_t, int16_t, int32_t, int64_t
++from libc.stdint cimport uint8_t, uint16_t, uint32_t, uint64_t
++
++cdef extern from "htslib/sam.h" nogil:
++    ctypedef struct bam1_core_t:
++        int32_t tid
++        int32_t pos
++        uint16_t bin
++        uint8_t qual
++        uint8_t l_qname
++        uint16_t flag
++        uint8_t unused1
++        uint8_t l_extranul
++        uint32_t n_cigar
++        int32_t l_qseq
++        int32_t mtid
++        int32_t mpos
++        int32_t isize
++
++    ctypedef struct bam1_t:
++        bam1_core_t core
++        int l_data
++        uint32_t m_data
++        uint8_t *data
++        uint64_t id
++
++    bam1_t *bam_dup1(const bam1_t *bsrc)
+--- a/pairtools/lib/parse_pysam.pyx
++++ b/pairtools/lib/parse_pysam.pyx
+@@ -122,4 +122,4 @@
+                  read_pos)
+             )
+ 
+-    return mismatches
+\ No newline at end of file
++    return mismatches


=====================================
debian/patches/tests-to-python3
=====================================
@@ -4,7 +4,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
 
 --- a/tests/test_merge.py
 +++ b/tests/test_merge.py
-@@ -22,7 +22,7 @@ def setup_sort_two():
+@@ -22,7 +22,7 @@
      try:
          subprocess.check_output(
              [
@@ -13,7 +13,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "sort",
-@@ -34,7 +34,7 @@ def setup_sort_two():
+@@ -34,7 +34,7 @@
  
          subprocess.check_output(
              [
@@ -22,7 +22,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "sort",
-@@ -54,7 +54,7 @@ def test_mock_pairsam(setup_sort_two):
+@@ -54,7 +54,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -31,7 +31,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "merge",
-@@ -102,7 +102,7 @@ def test_mock_pairsam(setup_sort_two):
+@@ -102,7 +102,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -42,7 +42,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "merge",
 --- a/tests/test_parse.py
 +++ b/tests/test_parse.py
-@@ -20,7 +20,7 @@ def test_mock_pysam():
+@@ -20,7 +20,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -51,7 +51,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "parse",
-@@ -64,7 +64,7 @@ def test_mock_pysam_parse_all():
+@@ -64,7 +64,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -62,7 +62,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "parse",
 --- a/tests/test_select.py
 +++ b/tests/test_select.py
-@@ -13,7 +13,7 @@ mock_chromsizes_path = os.path.join(test
+@@ -13,7 +13,7 @@
  def test_preserve():
      try:
          result = subprocess.check_output(
@@ -71,7 +71,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
          ).decode("ascii")
      except subprocess.CalledProcessError as e:
          print(e.output)
-@@ -35,7 +35,7 @@ def test_equal():
+@@ -35,7 +35,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -80,7 +80,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "select",
-@@ -68,7 +68,7 @@ def test_csv():
+@@ -68,7 +68,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -89,7 +89,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "select",
-@@ -101,7 +101,7 @@ def test_wildcard():
+@@ -101,7 +101,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -98,7 +98,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "select",
-@@ -136,7 +136,7 @@ def test_regex():
+@@ -136,7 +136,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -107,7 +107,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "select",
-@@ -169,7 +169,7 @@ def test_chrom_subset():
+@@ -169,7 +169,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -116,7 +116,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "select",
-@@ -221,7 +221,7 @@ def test_remove_columns():
+@@ -221,7 +221,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -127,7 +127,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "select",
 --- a/tests/test_sort.py
 +++ b/tests/test_sort.py
-@@ -11,7 +11,7 @@ def test_mock_pairsam():
+@@ -11,7 +11,7 @@
      mock_pairsam_path = os.path.join(testdir, "data", "mock.pairsam")
      try:
          result = subprocess.check_output(
@@ -138,7 +138,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
          print(e.output)
 --- a/tests/test_split.py
 +++ b/tests/test_split.py
-@@ -19,7 +19,7 @@ def setup_split():
+@@ -19,7 +19,7 @@
      try:
          subprocess.check_output(
              [
@@ -149,7 +149,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "split",
 --- a/tests/test_stats.py
 +++ b/tests/test_stats.py
-@@ -12,7 +12,7 @@ def test_mock_pairsam():
+@@ -12,7 +12,7 @@
      mock_pairsam_path = os.path.join(testdir, "data", "mock.4stats.pairs")
      try:
          result = subprocess.check_output(
@@ -158,7 +158,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
          ).decode("ascii")
      except subprocess.CalledProcessError as e:
          print(e.output)
-@@ -65,7 +65,7 @@ def test_merge_stats():
+@@ -65,7 +65,7 @@
      try:
          subprocess.check_output(
              [
@@ -167,7 +167,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "stats",
-@@ -78,7 +78,7 @@ def test_merge_stats():
+@@ -78,7 +78,7 @@
  
          subprocess.check_output(
              [
@@ -176,7 +176,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "stats",
-@@ -90,7 +90,7 @@ def test_merge_stats():
+@@ -90,7 +90,7 @@
          )
          subprocess.check_output(
              [
@@ -185,7 +185,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "stats",
-@@ -103,7 +103,7 @@ def test_merge_stats():
+@@ -103,7 +103,7 @@
          )
          subprocess.check_output(
              [
@@ -194,7 +194,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "stats",
-@@ -116,7 +116,7 @@ def test_merge_stats():
+@@ -116,7 +116,7 @@
          )
          subprocess.check_output(
              [
@@ -205,7 +205,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "stats",
 --- a/tests/test_flip.py
 +++ b/tests/test_flip.py
-@@ -13,7 +13,7 @@ def test_flip():
+@@ -13,7 +13,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -216,7 +216,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "flip",
 --- a/tests/test_markasdup.py
 +++ b/tests/test_markasdup.py
-@@ -11,7 +11,7 @@ def test_mock_pairsam():
+@@ -11,7 +11,7 @@
      mock_pairsam_path = os.path.join(testdir, "data", "mock.pairsam")
      try:
          result = subprocess.check_output(
@@ -227,7 +227,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
          print(e.output)
 --- a/tests/test_dedup.py
 +++ b/tests/test_dedup.py
-@@ -32,7 +32,7 @@ def setup_dedup():
+@@ -32,7 +32,7 @@
      try:
          subprocess.check_output(
              [
@@ -236,7 +236,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "dedup",
-@@ -49,7 +49,7 @@ def setup_dedup():
+@@ -49,7 +49,7 @@
          )
          subprocess.check_output(
              [
@@ -245,7 +245,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "dedup",
-@@ -68,7 +68,7 @@ def setup_dedup():
+@@ -68,7 +68,7 @@
          )
          subprocess.check_output(
              [
@@ -256,7 +256,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "dedup",
 --- a/tests/test_filterbycov.py
 +++ b/tests/test_filterbycov.py
-@@ -37,7 +37,7 @@ def setup_filterbycov():
+@@ -37,7 +37,7 @@
          for p in params:
              subprocess.check_output(
                  [
@@ -267,7 +267,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                      "filterbycov",
 --- a/tests/test_header.py
 +++ b/tests/test_header.py
-@@ -22,7 +22,7 @@ def test_generate():
+@@ -22,7 +22,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -278,7 +278,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "header",
 --- a/tests/test_parse2.py
 +++ b/tests/test_parse2.py
-@@ -15,7 +15,7 @@ def test_mock_pysam_parse2_read():
+@@ -15,7 +15,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -287,7 +287,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "-m",
                  "pairtools",
                  "parse2",
-@@ -75,7 +75,7 @@ def test_mock_pysam_parse2_pair():
+@@ -75,7 +75,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -298,7 +298,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "parse2",
 --- a/tests/test_restrict.py
 +++ b/tests/test_restrict.py
-@@ -16,7 +16,7 @@ def test_restrict():
+@@ -16,7 +16,7 @@
      try:
          result = subprocess.check_output(
              [
@@ -309,7 +309,7 @@ Last-Update: Wed, 18 Jun 2020 22:31:00 +0200
                  "restrict",
 --- a/tests/test_scaling.py
 +++ b/tests/test_scaling.py
-@@ -13,7 +13,7 @@ def test_scaling():
+@@ -13,7 +13,7 @@
      mock_pairsam_path = os.path.join(testdir, "data", "mock.pairsam")
      try:
          result = subprocess.check_output(


=====================================
pairtools/__init__.py
=====================================
@@ -10,6 +10,6 @@ CLI tools to process mapped Hi-C data
 
 """
 
-__version__ = "1.0.1"
+__version__ = "1.0.2"
 
 # from . import lib


=====================================
pairtools/cli/dedup.py
=====================================
@@ -171,12 +171,6 @@ UTIL_NAME = "pairtools_dedup"
     help=r"Separator (\t, \v, etc. characters are "
     "supported, pass them in quotes). [input format option]",
 )
- at click.option(
-    "--comment-char",
-    type=str,
-    default="#",
-    help="The first character of comment lines. [input format option]",
-)
 @click.option(
     "--send-header-to",
     type=click.Choice(["dups", "dedup", "both", "none"]),
@@ -304,7 +298,6 @@ def dedup(
     max_mismatch,
     method,
     sep,
-    comment_char,
     send_header_to,
     c1,
     c2,
@@ -342,7 +335,6 @@ def dedup(
         max_mismatch,
         method,
         sep,
-        comment_char,
         send_header_to,
         c1,
         c2,
@@ -376,7 +368,6 @@ def dedup_py(
     max_mismatch,
     method,
     sep,
-    comment_char,
     send_header_to,
     c1,
     c2,
@@ -548,7 +539,7 @@ def dedup_py(
         )
     elif backend in ("scipy", "sklearn"):
         streaming_dedup(
-            in_stream=instream,
+            in_stream=body_stream,
             colnames=column_names,
             chunksize=chunksize,
             carryover=carryover,
@@ -558,7 +549,6 @@ def dedup_py(
             extra_col_pairs=list(extra_col_pair),
             keep_parent_id=keep_parent_id,
             unmapped_chrom=unmapped_chrom,
-            comment_char=comment_char,
             outstream=outstream,
             outstream_dups=outstream_dups,
             outstream_unmapped=outstream_unmapped,


=====================================
pairtools/cli/flip.py
=====================================
@@ -101,7 +101,7 @@ def flip_py(pairs_path, chroms_path, output, **kwargs):
     ]
 
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
 
         is_annotated1 = cols[chrom1_col] in chrom_enum.keys()
         is_annotated2 = cols[chrom2_col] in chrom_enum.keys()
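[Editorial note: the `rstrip()` → `rstrip('\n')` change above (repeated in markasdup.py, phase.py, restrict.py, select.py, split.py, dedup.py, and headerops.py below) matters because a bare `rstrip()` strips all trailing whitespace, including the tab separators of empty trailing columns. A minimal illustration, using a hypothetical pairs line:]

```python
# Sketch (not part of the patch): why rstrip('\n') is used instead of
# rstrip() before splitting a tab-separated pairs line.
SEP = "\t"  # pairsam_format.PAIRSAM_SEP is a tab

# Hypothetical line with two empty trailing columns:
line = "readID\tchr1\t100\tchr2\t200\t+\t-\t\t\n"

bad = line.rstrip().split(SEP)        # bare rstrip() eats the trailing
                                      # tabs, dropping the empty fields
good = line.rstrip("\n").split(SEP)   # only the newline goes; the column
                                      # count is preserved

print(len(bad), len(good))  # 7 9
```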


=====================================
pairtools/cli/markasdup.py
=====================================
@@ -55,7 +55,7 @@ def markasdup_py(pairsam_path, output, **kwargs):
     outstream.writelines((l + "\n" for l in header))
 
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         mark_split_pair_as_dup(cols)
 
         outstream.write(pairsam_format.PAIRSAM_SEP.join(cols))


=====================================
pairtools/cli/phase.py
=====================================
@@ -203,7 +203,7 @@ def phase_py(
     outstream.writelines((l + "\n" for l in header))
 
     for line in body_stream:
-        cols = line.split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         cols.append("!")
         cols.append("!")
         if report_scores:


=====================================
pairtools/cli/restrict.py
=====================================
@@ -96,7 +96,7 @@ def restrict_py(pairs_path, frags, output, **kwargs):
     }
 
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         chrom1, pos1 = cols[pairsam_format.COL_C1], int(cols[pairsam_format.COL_P1])
         rfrag_idx1, rfrag_start1, rfrag_end1 = find_rfrag(rfrags, chrom1, pos1)
         chrom2, pos2 = cols[pairsam_format.COL_C2], int(cols[pairsam_format.COL_P2])


=====================================
pairtools/cli/select.py
=====================================
@@ -229,7 +229,7 @@ def select_py(
     for filter_passed, line in evaluate_stream(
         body_stream, condition, column_names, type_cast, startup_code
     ):
-        COLS = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        COLS = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
 
         if remove_columns:
             COLS = [


=====================================
pairtools/cli/split.py
=====================================
@@ -92,7 +92,7 @@ def split_py(pairsam_path, output_pairs, output_sam, **kwargs):
                 header, "columns", " ".join(columns)
             )
             has_sams = True
-        elif ("sam1" in columns) != ("sam1" in columns):
+        elif ("sam1" in columns) != ("sam2" in columns):
             raise ValueError(
                 "According to the #columns header field only one sam entry is present"
             )
@@ -113,7 +113,7 @@ def split_py(pairsam_path, output_pairs, output_sam, **kwargs):
     sam1 = None
     sam2 = None
     for line in body_stream:
-        cols = line.rstrip().split(pairsam_format.PAIRSAM_SEP)
+        cols = line.rstrip('\n').split(pairsam_format.PAIRSAM_SEP)
         if has_sams:
             if sam1col < sam2col:
                 sam2 = cols.pop(sam2col)
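[Editorial note: the first hunk above fixes a classic copy-paste bug: the old condition compared `("sam1" in columns)` against itself, so the inequality was always False and a pairsam with only one sam column was never rejected. A minimal illustration with a hypothetical column list:]

```python
# Sketch (not part of the patch): the old split.py check XOR-ed a value
# with itself, so it could never fire.
columns = ["readID", "chrom1", "pos1", "sam1"]  # sam2 is missing

old_check = ("sam1" in columns) != ("sam1" in columns)  # always False
new_check = ("sam1" in columns) != ("sam2" in columns)  # True: exactly
                                                        # one sam column

print(old_check, new_check)  # False True
```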


=====================================
pairtools/lib/dedup.py
=====================================
@@ -13,6 +13,10 @@ from .._logging import get_logger
 logger = get_logger()
 import time
 
+# Ignore pandas future warnings:
+import warnings
+warnings.simplefilter(action='ignore', category=FutureWarning)
+
 # Setting for cython deduplication:
 # you don't need to load more than 10k lines at a time b/c you get out of the
 # CPU cache, so this parameter is not adjustable
@@ -29,7 +33,6 @@ def streaming_dedup(
     max_mismatch,
     extra_col_pairs,
     unmapped_chrom,
-    comment_char,
     outstream,
     outstream_dups,
     outstream_unmapped,
@@ -49,7 +52,6 @@ def streaming_dedup(
         max_mismatch=max_mismatch,
         extra_col_pairs=extra_col_pairs,
         keep_parent_id=keep_parent_id,
-        comment_char=comment_char,
         backend=backend,
         n_proc=n_proc,
     )
@@ -114,13 +116,12 @@ def _dedup_stream(
     max_mismatch,
     extra_col_pairs,
     keep_parent_id,
-    comment_char,
     backend,
     n_proc,
 ):
     # Stream the input dataframe:
     dfs = pd.read_table(
-        in_stream, comment=comment_char, names=colnames, chunksize=chunksize
+        in_stream, comment=None, names=colnames, chunksize=chunksize
     )
 
     # Set up the carryover dataframe:
@@ -375,7 +376,7 @@ def streaming_dedup_cython(
     read_idx = 0  # read index to mark the parent readID
     while True:
         rawline = next(instream, None)
-        stripline = rawline.strip() if rawline else None
+        stripline = rawline.strip('\n') if rawline else None
 
         # take care of empty lines not at the end of the file separately
         if rawline and (not stripline):
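[Editorial note: the `comment=comment_char` → `comment=None` change above corresponds to the "Dedup comment removed (failed when the read qualities contained '#')" entry in CHANGES.md. With `comment="#"`, pandas truncates each line at the first `#`, which corrupts records whose quality strings contain that character. A minimal illustration with hypothetical column names:]

```python
import io
import pandas as pd

# Sketch (not part of the patch): why comment='#' was dropped from
# pd.read_table in the dedup stream. '#' is a legal character in
# FASTQ-derived quality strings.
data = "r1\tchr1\t100\tAAAA\t##AA\n"  # last field: quality with '#'
cols = ["readID", "chrom", "pos", "seq", "qual"]

truncated = pd.read_table(io.StringIO(data), comment="#", names=cols)
intact = pd.read_table(io.StringIO(data), comment=None, names=cols)

print(truncated.iloc[0]["qual"])  # NaN: line cut at the first '#'
print(intact.iloc[0]["qual"])     # ##AA
```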


=====================================
pairtools/lib/headerops.py
=====================================
@@ -70,7 +70,7 @@ def get_header(instream, comment_char=COMMENT_CHAR, ignore_warning=False):
         if isinstance(line, bytes):
             line = line.decode()
         # append line to header, since it does start with header
-        header.append(line.strip())
+        header.append(line.rstrip('\n'))
         # peek into the remainder of the instream
         current_peek = peek_f(1)
     # apparently, next line does not start with the comment
@@ -95,7 +95,7 @@ def extract_fields(header, field_name, save_rest=False):
     rest = []
     for l in header:
         if l.lstrip(COMMENT_CHAR).startswith(field_name + ":"):
-            fields.append(l.split(":", 1)[1].strip())
+            fields.append(l.split(":", 1)[1].rstrip('\n').lstrip())
         elif save_rest:
             rest.append(l)
 


=====================================
pairtools/lib/scaling.py
=====================================
@@ -136,6 +136,7 @@ def bins_pairs_by_distance(
     pairs_df, dist_bins, regions=None, chromsizes=None, ignore_trans=False
 ):
 
+    dist_bins = np.r_[dist_bins, np.iinfo(np.int64).max]
     if regions is None:
         if chromsizes is None:
             chroms = sorted(
@@ -208,7 +209,7 @@ def bins_pairs_by_distance(
     )
 
     pairs_reduced_df["max_dist"] = np.where(
-        pairs_reduced_df["dist_bin_idx"] < len(dist_bins),
+        pairs_reduced_df["dist_bin_idx"] < len(dist_bins)-1,
         dist_bins[pairs_reduced_df["dist_bin_idx"]],
         np.iinfo(np.int64).max,
     )
@@ -220,6 +221,8 @@ def bins_pairs_by_distance(
         (pairs_reduced_df.chrom1 == pairs_reduced_df.chrom2)
         & (pairs_reduced_df.start1 == pairs_reduced_df.start2)
         & (pairs_reduced_df.end1 == pairs_reduced_df.end2)
+        & (pairs_reduced_df.min_dist > 0)
+        & (pairs_reduced_df.max_dist < np.iinfo(np.int64).max)
     )
 
     pairs_for_scaling_df = pairs_reduced_df.loc[pairs_for_scaling_mask]
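[Editorial note: the scaling.py hunk above (upstream issue #150) appends `np.iinfo(np.int64).max` as a final bin edge so that distances beyond the last user-supplied edge land in a well-defined overflow bin instead of indexing past the end. A minimal illustration with hypothetical bin edges:]

```python
import numpy as np

# Sketch (not part of the patch): appending int64 max as a sentinel
# bin edge for the scaling distance bins.
dist_bins = np.array([1000, 10000, 100000])
dist_bins = np.r_[dist_bins, np.iinfo(np.int64).max]

dists = np.array([500, 5000, 250000])
idx = np.searchsorted(dist_bins, dists, side="right")

# 250000 exceeds the last real edge but stays below the sentinel,
# so its bin index (3) is still a valid position in dist_bins.
print(idx)  # [0 1 3]
```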


=====================================
pairtools/lib/select.py
=====================================
@@ -66,9 +66,11 @@ def evaluate_stream(
     for i, col in enumerate(column_names):
         if col in TYPES:
             col_type = TYPES[col]
-            condition = condition.replace(col, "{}(COLS[{}])".format(col_type, i))
+            condition = re.sub(r"\b%s\b" % col , "{}(COLS[{}])".format(col_type, i), condition)
+            #condition.replace(col, "{}(COLS[{}])".format(col_type, i))
         else:
-            condition = condition.replace(col, "COLS[{}]".format(i))
+            condition = re.sub(r"\b%s\b" % col, "COLS[{}]".format(i), condition)
+            #condition = condition.replace(col, "COLS[{}]".format(i))
 
     # Compile the filtering expression:
     match_func = compile(condition, "<string>", "eval")
@@ -121,7 +123,8 @@ def evaluate_df(df, condition, type_cast=(), startup_code=None, engine="pandas")
     else:
         # Set up the columns indexing
         for i, col in enumerate(df.columns):
-            condition = condition.replace(col, "COLS[{}]".format(i))
+            condition = re.sub(r"\b%s\b" % col, "COLS[{}]".format(i), condition)
+            #condition = condition.replace(col, "COLS[{}]".format(i))
 
         filter_passed_output = []
         match_func = compile(condition, "<string>", "eval")
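[Editorial note: the select.py hunks above implement the "`pairtools select` regex update (string substitutions failed when the column name was a substring of another)" entry from CHANGES.md. A minimal illustration with hypothetical column names:]

```python
import re

# Sketch (not part of the patch): why plain str.replace corrupts a
# filter condition when one column name is a substring of another,
# and how the \b word-boundary re.sub used upstream avoids it.
column_names = ["chrom", "chrom1"]  # "chrom" is a substring of "chrom1"
condition = 'chrom1 == "chr1"'

naive = condition
for i, col in enumerate(column_names):
    # replacing "chrom" also rewrites the prefix of "chrom1"
    naive = naive.replace(col, "COLS[{}]".format(i))

fixed = condition
for i, col in enumerate(column_names):
    # \b ensures only the whole column name is substituted
    fixed = re.sub(r"\b%s\b" % col, "COLS[{}]".format(i), fixed)

print(naive)  # COLS[0]1 == "chr1"  (garbled)
print(fixed)  # COLS[1] == "chr1"
```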



View it on GitLab: https://salsa.debian.org/med-team/pairtools/-/compare/13246875a0fc3671b4d04e019c1afc5cde255de0...960243a95c3a86e7e2a8e313db04b1215fe60838
