[med-svn] [Git][med-team/umis][master] 8 commits: New upstream version 1.0.6

Sat Jan 18 22:21:28 GMT 2020


Steffen Möller pushed to branch master at Debian Med / umis


Commits:
1279f98c by Steffen Moeller at 2020-01-18T22:48:39+01:00
New upstream version 1.0.6
- - - - -
7356cf98 by Steffen Moeller at 2020-01-18T22:48:39+01:00
routine-update: New upstream version

- - - - -
d8d338d9 by Steffen Moeller at 2020-01-18T22:48:40+01:00
Update upstream source from tag 'upstream/1.0.6'

Update to upstream version '1.0.6'
with Debian dir 7495e62d53d61f6a6a9cbe64c47599ec42e099a6
- - - - -
b604f553 by Steffen Moeller at 2020-01-18T22:48:41+01:00
routine-update: debhelper-compat 12

- - - - -
04d5fcf6 by Steffen Moeller at 2020-01-18T22:48:43+01:00
routine-update: Standards-Version: 4.4.1

- - - - -
27ad2513 by Steffen Moeller at 2020-01-18T22:48:46+01:00
Set upstream metadata fields: Bug-Database, Bug-Submit, Repository, Repository-Browse.
- - - - -
8852f513 by Steffen Moeller at 2020-01-18T22:49:03+01:00
routine-update: Ready to upload to unstable

- - - - -
9ca3f89a by Steffen Moeller at 2020-01-18T23:21:11+01:00
Cleaner clean

- - - - -


8 changed files:

- HISTORY.md
- debian/changelog
- − debian/compat
- debian/control
- debian/rules
- debian/upstream/metadata
- setup.py
- umis/umis.py


Changes:

=====================================
HISTORY.md
=====================================
@@ -1,4 +1,13 @@
-## 1.0.4 (in progress)
+## 1.0.6
+- Fix for the python3 fix.
+
+## 1.0.5
+- Fix for cb_filter with python3.
+
+## 1.0.4
+- Enable cb_histogram to be used on samples without UMIs.
+- Enable filtering of cells during `demultiplex_cells`.
+- Fix incorrect pandas.read_csv call with header=-1.
 
 ## 1.0.3 
 - Python 3 support


=====================================
debian/changelog
=====================================
@@ -1,3 +1,14 @@
+umis (1.0.6-1) unstable; urgency=medium
+
+  * Team upload.
+  * New upstream version
+  * debhelper-compat 12 (routine-update)
+  * Standards-Version: 4.4.1 (routine-update)
+  * Set upstream metadata fields: Bug-Database, Bug-Submit, Repository,
+    Repository-Browse.
+
+ -- Steffen Moeller <moeller at debian.org>  Sat, 18 Jan 2020 22:48:47 +0100
+
 umis (1.0.3-2) unstable; urgency=medium
 
   * Build-Depends-Arch: python3-pysam


=====================================
debian/compat deleted
=====================================
@@ -1 +0,0 @@
-12


=====================================
debian/control
=====================================
@@ -3,13 +3,13 @@ Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.
 Uploaders: Andreas Tille <tille at debian.org>
 Section: science
 Priority: optional
-Build-Depends: debhelper (>= 12~),
+Build-Depends: debhelper-compat (= 12),
                dh-python,
                cython3,
                python3-dev,
                python3-setuptools
 Build-Depends-Arch: python3-pysam
-Standards-Version: 4.3.0
+Standards-Version: 4.4.1
 Vcs-Browser: https://salsa.debian.org/med-team/umis
 Vcs-Git: https://salsa.debian.org/med-team/umis.git
 Homepage: https://github.com/vals/umis
@@ -41,7 +41,7 @@ Description: tools for processing UMI RNA-tag data
 Package: umis-examples
 Architecture: all
 Depends: ${shlibs:Depends},
-         ${misc:Depends},
+         ${misc:Depends}
 Recommends: umis
 Description: tools for processing UMI RNA-tag data (examples)
  Umis provides tools for estimating expression in RNA-Seq data which


=====================================
debian/rules
=====================================
@@ -18,3 +18,8 @@ override_dh_install:
 	mv debian/python3-$(PYBUILD_NAME)/usr debian/$(PYBUILD_NAME)
 	rmdir debian/python3-$(PYBUILD_NAME)
 	find debian -type d -name __pycache__ | xargs rm -rf
+
+override_dh_auto_clean:
+	dh_auto_clean
+	rm -rf umis.egg-info
+	rm umis/utils.c


=====================================
debian/upstream/metadata
=====================================
@@ -1,20 +1,24 @@
 Reference:
- - Author: >
+- Author: >
     Valentine Svensson and Kedar Nath Natarajan and Lam-Ha Ly and Ricardo
     J Miragaia and Charlotte Labalette and Iain C Macaulay and Ana Cvejic
     and Sarah A Teichmann
-   Title: "Power analysis of single-cell RNA-sequencing experiments"
-   Journal: Nature methods
-   Year: 2017
-   Volume: 14
-   Pages: 381–387
-   DOI: 10.1038/nmeth.4220
-   PMID: 28263961
-   URL: https://www.nature.com/articles/nmeth.4220
+  Title: Power analysis of single-cell RNA-sequencing experiments
+  Journal: Nature methods
+  Year: 2017
+  Volume: 14
+  Pages: 381–387
+  DOI: 10.1038/nmeth.4220
+  PMID: 28263961
+  URL: https://www.nature.com/articles/nmeth.4220
 Registry:
- - Name: conda:bioconda
-   Entry: umis
- - Name: OMICtools
-   Entry: OMICS_12783
- - Name: bio.tools
-   Entry: NA
+- Name: conda:bioconda
+  Entry: umis
+- Name: OMICtools
+  Entry: OMICS_12783
+- Name: bio.tools
+  Entry: NA
+Bug-Database: https://github.com/vals/umis/issues
+Bug-Submit: https://github.com/vals/umis/issues/new
+Repository: https://github.com/vals/umis.git
+Repository-Browse: https://github.com/vals/umis


=====================================
setup.py
=====================================
@@ -8,7 +8,7 @@ def read(fname):
 
 setup(
         name='umis',
-        version='1.0.3',
+        version='1.0.6',
         description='Package for estimating UMI counts in Transcript Tag Counting data.',
         packages=find_packages(),
         install_requires=['click', 'pysam>=0.8.3', 'pandas', 'regex', 'scipy', 'toolz'],


=====================================
umis/umis.py
=====================================
@@ -24,7 +24,7 @@ import numpy as np
 import scipy.io, scipy.sparse
 import click
 
-VERSION = "1.0.3"
+VERSION = "1.0.6"
 
 logging.basicConfig(level=logging.INFO)
 logger = logging.getLogger(__name__)
@@ -35,7 +35,10 @@ BARCODEINFO = {"sample": BarcodeInfo(bamtag="XS", readprefix="SAMPLE"),
                "molecular": BarcodeInfo(bamtag="RX", readprefix="UMI")}
 
 def open_gzipsafe(f):
-    return gzip.open(f) if f.endswith(".gz") else open(f)
+    if is_python3():
+        return gzip.open(f, mode="rt") if f.endswith(".gz") else open(f)
+    else:
+        return gzip.open(f) if f.endswith(".gz") else open(f)
 
 def safe_makedir(dname):
     """Make a directory if it doesn't exist, handling concurrent race conditions.
@@ -75,7 +78,7 @@ def read_fastq(filename):
     if filename == "-":
         filename_fh = sys.stdin
     elif filename.endswith('gz'):
-        if is_python3:
+        if is_python3():
             filename_fh = gzip.open(filename, mode='rt')
         else:
             filename_fh = BufferedReader(gzip.open(filename, mode='rt'))
@@ -485,7 +488,7 @@ def tagcount(sam, out, genemap, output_evidence_table, positional, minevidence,
     cb_hist = None
     filter_cb = False
     if cb_histogram:
-        cb_hist = pd.read_csv(cb_histogram, index_col=0, header=-1, squeeze=True, sep="\t")
+        cb_hist = pd.read_csv(cb_histogram, index_col=0, header=None, squeeze=True, sep="\t")
         total_num_cbs = cb_hist.shape[0]
         cb_hist = cb_hist[cb_hist > cb_cutoff]
         logger.info('Keeping {} out of {} cellular barcodes.'.format(cb_hist.shape[0], total_num_cbs))
@@ -712,9 +715,9 @@ def tagcount(sam, out, genemap, output_evidence_table, positional, minevidence,
                       'read gene mapping information in stead of the mapping '
                       'target nane. Useful if e.g. reads have been mapped to '
                       'genome in stead of transcriptome.'))
-@click.option('--umi_matrix', required=False, 
+@click.option('--umi_matrix', required=False,
               help=('Save a sparse matrix of counts without UMI deduping to this file.'))
-def fasttagcount(sam, out, genemap, positional, minevidence, cb_histogram, 
+def fasttagcount(sam, out, genemap, positional, minevidence, cb_histogram,
                  cb_cutoff, subsample, parse_tags, gene_tags, umi_matrix):
     ''' Count up evidence for tagged molecules, this implementation assumes the
     alignment file is coordinate sorted
@@ -758,7 +761,7 @@ def fasttagcount(sam, out, genemap, positional, minevidence, cb_histogram,
     cb_hist = None
     filter_cb = False
     if cb_histogram:
-        cb_hist = pd.read_csv(cb_histogram, index_col=0, header=-1, squeeze=True, sep="\t")
+        cb_hist = pd.read_csv(cb_histogram, index_col=0, header=None, squeeze=True, sep="\t")
         total_num_cbs = cb_hist.shape[0]
         cb_hist = cb_hist[cb_hist > cb_cutoff]
         logger.info('Keeping {} out of {} cellular barcodes.'.format(cb_hist.shape[0], total_num_cbs))
@@ -971,9 +974,9 @@ def cb_histogram(fastq, umi_histogram):
     for read in read_fastq(fastq):
         match = parser_re.search(read).groupdict()
         cb = match['CB']
-        umi = match['MB']
         cb_counter[cb] += 1
         if umi_histogram:
+            umi = match['MB']
             umi_counter[(cb, umi)] += 1
 
     for bc, count in cb_counter.most_common():
@@ -1054,9 +1057,9 @@ def cb_filter(fastq, bc1, bc2, bc3, cores, nedit):
     ''' Filters reads with non-matching barcodes
     Expects formatted fastq files.
     '''
-
     with open_gzipsafe(bc1) as bc1_fh:
         bc1 = set(cb.strip() for cb in bc1_fh)
+
     if bc2:
         with open_gzipsafe(bc2) as bc2_fh:
             bc2 = set(cb.strip() for cb in bc2_fh)
@@ -1312,7 +1315,10 @@ def is_python3():
 @click.option('--out_dir', default=".")
 @click.option('--readnumber', default="")
 @click.option('--prefix', default="")
-def demultiplex_cells(fastq, out_dir, readnumber, prefix=""):
+@click.option('--cb_histogram', default=None)
+@click.option('--cb_cutoff', default=0)
+def demultiplex_cells(fastq, out_dir, readnumber, prefix, cb_histogram,
+                      cb_cutoff):
     ''' Demultiplex a fastqtransformed FASTQ file into a FASTQ file for
     each cell.
     '''
@@ -1321,7 +1327,9 @@ def demultiplex_cells(fastq, out_dir, readnumber, prefix=""):
     parser_re = re.compile(re_string)
     readstring = "" if not readnumber else "_R{}".format(readnumber)
     filestring = "{prefix}{sample}{readstring}.fq"
-
+    cb_set = set()
+    if cb_histogram:
+        cb_set = get_cb_depth_set(cb_histogram, cb_cutoff)
     sample_set = set()
     batch = collections.defaultdict(list)
     parsed = 0
@@ -1330,6 +1338,8 @@ def demultiplex_cells(fastq, out_dir, readnumber, prefix=""):
         parsed += 1
         match = parser_re.search(read).groupdict()
         sample = match['CB']
+        if cb_set and sample not in cb_set:
+            continue
         sample_set.add(sample)
         batch[sample].append(read)
         # write in batches to avoid opening up file handles repeatedly



View it on GitLab: https://salsa.debian.org/med-team/umis/compare/b2127bad612ac35c0f07fee79c219f10b5f70782...9ca3f89a6cb1e635dc83817d3179d170b2a00a97
