[med-svn] [Git][med-team/python-pyvcf][upstream] New upstream version 0.6.8+git20170215.476169c
Andreas Tille
gitlab at salsa.debian.org
Wed Jul 18 10:47:27 BST 2018
Andreas Tille pushed to branch upstream at Debian Med / python-pyvcf
Commits:
c017d2d8 by Andreas Tille at 2018-07-18T11:05:23+02:00
New upstream version 0.6.8+git20170215.476169c
- - - - -
25 changed files:
- + .gitignore
- + .travis.yml
- + LICENSE
- − PKG-INFO
- + docs/API.rst
- + docs/FILTERS.rst
- + docs/HISTORY.rst
- + docs/INTRO.rst
- + docs/Makefile
- + docs/conf.py
- + docs/index.rst
- + requirements/common-requirements.txt
- + requirements/pypy-requirements.txt
- − setup.cfg
- setup.py
- + tox.ini
- vcf/cparse.pyx
- vcf/model.py
- vcf/parser.py
- + vcf/test/bad-info-character.vcf
- vcf/test/example-4.0.vcf
- + vcf/test/issue-254.vcf
- vcf/test/test_vcf.py
- vcf/test/uncalled_genotypes.vcf
- vcf/test/walk_left.vcf
Changes:
=====================================
.gitignore
=====================================
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,13 @@
+PyVCF.egg-info
+build
+dist
+*.pyc
+docs/_build
+.ropeproject
+1kg.prof
+.noseids
+.tox
+.DS_Store
+vcf/cparse.c
+vcf/cparse.so
+.coverage
=====================================
.travis.yml
=====================================
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,18 @@
+# Validate this file using http://lint.travis-ci.org/
+language: python
+sudo: false
+cache:
+  directories:
+    - $HOME/.cache/pip
+python:
+  - "2.7"
+  - "3.4"
+  - "3.5"
+  - "3.6"
+  - "nightly"
+  - "pypy"
+  - "pypy3"
+install:
+  - if [[ "$TRAVIS_PYTHON_VERSION" =~ ^pypy ]]; then pip install -r requirements/pypy-requirements.txt; else pip install -r requirements/common-requirements.txt; fi
+  - python setup.py install
+script: python setup.py test
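Editor's note on the install step in .travis.yml above: the shell conditional picks a requirements file depending on whether the interpreter name starts with "pypy". A minimal Python sketch of that selection follows. The function name requirements_file is hypothetical and only illustrates the branch; it is not part of PyVCF or Travis.

```python
import re


def requirements_file(travis_python_version):
    """Return the requirements file the install step would select.

    Hypothetical helper mirroring the shell test
    [[ "$TRAVIS_PYTHON_VERSION" =~ ^pypy ]] from .travis.yml.
    """
    # re.match anchors at the start of the string, like ^pypy in the shell regex.
    if re.match(r"pypy", travis_python_version):
        return "requirements/pypy-requirements.txt"
    return "requirements/common-requirements.txt"


# Walk the interpreter list from the build matrix.
for version in ["2.7", "3.4", "3.5", "3.6", "nightly", "pypy", "pypy3"]:
    print(version, "->", requirements_file(version))
```

Under this reading, only the pypy and pypy3 jobs skip cython and pysam and install setuptools alone.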
=====================================
LICENSE
=====================================
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,46 @@
+Copyright (c) 2011-2012, Population Genetics Technologies Ltd, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.
+
+2. Redistributions in binary form must reproduce the above copyright notice, this
+list of conditions and the following disclaimer in the documentation and/or
+other materials provided with the distribution.
+
+3. Neither the name of the Population Genetics Technologies Ltd nor the names of
+its contributors may be used to endorse or promote products derived from this
+software without specific prior written permission.
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Copyright (c) 2011 John Dougherty
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
=====================================
PKG-INFO deleted
=====================================
--- a/PKG-INFO
+++ /dev/null
@@ -1,27 +0,0 @@
-Metadata-Version: 1.1
-Name: PyVCF
-Version: 0.6.8
-Summary: Variant Call Format (VCF) parser for Python
-Home-page: https://github.com/jamescasbon/PyVCF
-Author: James Casbon and @jdoughertyii
-Author-email: casbon at gmail.com
-License: UNKNOWN
-Description: UNKNOWN
-Keywords: bioinformatics
-Platform: UNKNOWN
-Classifier: Development Status :: 4 - Beta
-Classifier: Intended Audience :: Developers
-Classifier: Intended Audience :: Science/Research
-Classifier: License :: OSI Approved :: BSD License
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Cython
-Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 2
-Classifier: Programming Language :: Python :: 2.6
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.2
-Classifier: Programming Language :: Python :: 3.3
-Classifier: Programming Language :: Python :: 3.4
-Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
=====================================
docs/API.rst
=====================================
--- /dev/null
+++ b/docs/API.rst
@@ -0,0 +1,56 @@
+API
+===
+
+vcf.Reader
+----------
+
+.. autoclass:: vcf.Reader
+ :members:
+
+vcf.Writer
+----------
+
+.. autoclass:: vcf.Writer
+ :members:
+
+vcf.model._Record
+-----------------
+
+.. autoclass:: vcf.model._Record
+ :members:
+
+vcf.model._Call
+---------------
+
+.. autoclass:: vcf.model._Call
+ :members:
+
+vcf.model._AltRecord
+--------------------
+
+.. autoclass:: vcf.model._AltRecord
+ :members:
+
+vcf.model._Substitution
+-----------------------
+
+.. autoclass:: vcf.model._Substitution
+ :members:
+
+vcf.model._SV
+-------------
+
+.. autoclass:: vcf.model._SV
+ :members:
+
+vcf.model._SingleBreakend
+-------------------------
+
+.. autoclass:: vcf.model._SingleBreakend
+ :members:
+
+vcf.model._Breakend
+-------------------
+
+.. autoclass:: vcf.model._Breakend
+ :members:
=====================================
docs/FILTERS.rst
=====================================
--- /dev/null
+++ b/docs/FILTERS.rst
@@ -0,0 +1,158 @@
+Filtering VCF files
+===================
+
+The filter script: vcf_filter.py
+--------------------------------
+
+Filtering a VCF file based on some properties of interest is a common enough
+operation that PyVCF offers an extensible script. ``vcf_filter.py`` does
+the work of reading input, updating the metadata and filtering the records.
+
+
+Existing Filters
+----------------
+
+.. autoclass:: vcf.filters.SiteQuality
+
+.. autoclass:: vcf.filters.VariantGenotypeQuality
+
+.. autoclass:: vcf.filters.ErrorBiasFilter
+
+.. autoclass:: vcf.filters.DepthPerSample
+
+.. autoclass:: vcf.filters.AvgDepthPerSample
+
+.. autoclass:: vcf.filters.SnpOnly
+
+
+
+
+Adding a filter
+---------------
+
+You can reuse this work by providing a filter class, rather than writing your own filter script.
+For example, let's say I want to filter each site based on the quality of the site.
+I can create a class like this::
+
+    import vcf.filters
+    class SiteQuality(vcf.filters.Base):
+        'Filter sites by quality'
+
+        name = 'sq'
+
+        @classmethod
+        def customize_parser(self, parser):
+            parser.add_argument('--site-quality', type=int, default=30,
+                    help='Filter sites below this quality')
+
+        def __init__(self, args):
+            self.threshold = args.site_quality
+
+        def __call__(self, record):
+            if record.QUAL < self.threshold:
+                return record.QUAL
+
+
+This class subclasses ``vcf.filters.Base`` which provides the interface for VCF filters.
+The docstring and ``name`` are metadata about the filter. The docstring provides
+the help for the script, and the first line is included in the FILTER metadata when
+applied to a file.
+
+The ``customize_parser`` method allows you to add arguments to the script.
+We use the ``__init__`` method to grab the argument of interest from the parser.
+Finally, the ``__call__`` method processes each record and returns a value if the
+filter failed. The base class uses the ``name`` and ``threshold`` to create
+the filter ID in the VCF file.
+
+To make vcf_filter.py aware of the filter, you can either use the local script option
+or declare an entry point. To use a local script, simply call vcf_filter::
+
+ $ vcf_filter.py --local-script my_filters.py ...
+
+To use an entry point, you need to declare a ``vcf.filters`` entry point in your ``setup``::
+
+    setup(
+        ...
+        entry_points = {
+            'vcf.filters': [
+                'site_quality = module.path:SiteQuality',
+            ]
+        }
+    )
+
+Either way, when you call vcf_filter.py, you should see your filter in the list of available filters::
+
+ usage: vcf_filter.py [-h] [--no-short-circuit] [--no-filtered]
+ [--output OUTPUT] [--local-script LOCAL_SCRIPT]
+ input filter [filter_args] [filter [filter_args]] ...
+
+
+ Filter a VCF file
+
+ positional arguments:
+ input File to process (use - for STDIN) (default: None)
+
+ optional arguments:
+ -h, --help Show this help message and exit. (default: False)
+ --no-short-circuit Do not stop filter processing on a site if any filter
+ is triggered (default: False)
+ --output OUTPUT Filename to output [STDOUT] (default: <open file
+ '<stdout>', mode 'w' at 0x1002841e0>)
+ --no-filtered Output only sites passing the filters (default: False)
+ --local-script LOCAL_SCRIPT
+ Python file in current working directory with the
+ filter classes (default: None)
+
+ sq:
+ Filter sites by quality
+
+ --site-quality SITE_QUALITY
+ Filter sites below this quality (default: 30)
+
+The filter base class: vcf.filters.Base
+---------------------------------------
+
+.. autoclass:: vcf.filters.Base
+ :members:
+
+
+
+Utilities
+=========
+
+.. automodule:: vcf.utils
+
+Simultaneously iterate two or more files
+----------------------------------------
+
+.. autofunction:: vcf.utils.walk_together
+
+Trim common suffix
+--------------------
+.. autofunction:: vcf.utils.trim_common_suffix
+
+
+vcf_melt
+--------
+
+This script converts a VCF file from wide format (many calls per row)
+to a long format (one call per row). This is useful if you want to grep per sample
+or for really quick import into, say, a spreadsheet::
+
+ $ vcf_melt < vcf/test/gatk.vcf
+ SAMPLE AD DP GQ GT PL FILTER CHROM POS REF ALT ID info.AC info.AF info.AN info.BaseQRankSum info.DB info.DP info.DS info.Dels info.FS info.HRun info.HaplotypeScore info.InbreedingCoeff info.MQ info.MQ0 info.MQRankSum info.QD info.ReadPosRankSum
+ BLANK 6,0 6 18.04 0/0 0,18,211 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ NA12878 138,107 250 99.0 0/1 1961,0,3049 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ NA12891 169,77 250 99.0 0/1 1038,0,3533 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ NA12892 249,0 250 99.0 0/0 0,600,5732 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ NA19238 248,1 250 99.0 0/0 0,627,6191 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ NA19239 250,0 250 99.0 0/0 0,615,5899 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ NA19240 250,0 250 99.0 0/0 0,579,5674 . chr22 42522392 G [A] rs28371738 2 0.143 14 0.375 True 1506 True 0.0 0.0 0 123.5516 253.92 0 0.685 5.9 0.59
+ BLANK 13,4 17 62.64 0/1 63,0,296 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+ NA12878 118,127 246 99.0 0/1 2396,0,1719 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+ NA12891 241,0 244 99.0 0/0 0,459,4476 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+ NA12892 161,85 246 99.0 0/1 1489,0,2353 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+ NA19238 110,132 242 99.0 0/1 2561,0,1488 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+ NA19239 106,135 242 99.0 0/1 2613,0,1389 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+ NA19240 116,126 243 99.0 0/1 2489,0,1537 . chr22 42522613 G [C] rs1135840 6 0.429 14 16.289 True 1518 True 0.03 0.0 0 142.5716 242.46 0 2.01 9.16 -1.731
+
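Editor's note on the "Adding a filter" section of docs/FILTERS.rst above: the sketch below shows how the three hooks (customize_parser, __init__, __call__) might fit together. It is a self-contained approximation. The Base class here is a hypothetical stand-in for vcf.filters.Base so that the example runs without PyVCF installed, and the filter_name method and its "name plus threshold" ID scheme are assumptions based on the description in the text, not a confirmed copy of the library code. Record is likewise an illustrative namedtuple, not the PyVCF record class.

```python
import argparse
from collections import namedtuple


class Base(object):
    """Hypothetical stand-in for vcf.filters.Base (assumption, not the real class)."""

    name = 'base'

    @classmethod
    def customize_parser(cls, parser):
        # Subclasses add their own command-line arguments here.
        pass

    def filter_name(self):
        # Assumed ID scheme: the filter name followed by its threshold.
        return '%s%s' % (self.name, self.threshold)


class SiteQuality(Base):
    'Filter sites by quality'

    name = 'sq'

    @classmethod
    def customize_parser(cls, parser):
        parser.add_argument('--site-quality', type=int, default=30,
                            help='Filter sites below this quality')

    def __init__(self, args):
        # Grab the argument of interest from the parsed arguments.
        self.threshold = args.site_quality

    def __call__(self, record):
        # Returning a value marks the record as failing the filter;
        # falling through returns None, which means the record passes.
        if record.QUAL < self.threshold:
            return record.QUAL


# Illustrative record type carrying only the fields this filter needs.
Record = namedtuple('Record', ['CHROM', 'POS', 'QUAL'])

parser = argparse.ArgumentParser()
SiteQuality.customize_parser(parser)
args = parser.parse_args(['--site-quality', '40'])
site_filter = SiteQuality(args)

low = Record('chr22', 42522392, 12.5)
high = Record('chr22', 42522613, 99.0)
print(site_filter.filter_name(), site_filter(low), site_filter(high))
```

In the real script, vcf_filter.py would do the argument parsing and record iteration itself; the filter class only needs to supply the three hooks.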
=====================================
docs/HISTORY.rst
=====================================
--- /dev/null
+++ b/docs/HISTORY.rst
@@ -0,0 +1,199 @@
+Development
+===========
+
+Please use the `PyVCF repository <https://github.com/jamescasbon/PyVCF/>`_.
+Pull requests gladly accepted.
+Issues should be reported at the github issue tracker.
+
+Running tests
+-------------
+
+Please check the tests by running them with::
+
+ python setup.py test
+
+New features should have test code sent with them.
+
+Changes
+=======
+
+0.6.7 Release
+-------------
+
+* Include missing .pyx files
+
+0.6.6 Release
+-------------
+
+* better walk together record ordering (Thanks @datagram, #141)
+
+0.6.5 Release
+-------------
+
+* Better contig handling (#115, #116, #119 thanks Martijn)
+* INFO lines with type character (#120, #121 thanks @AndrewUzilov, Martijn)
+* Single breakends fix (#126 thanks @pkrushe)
+* Speedup by losing ordering of INFO (#128 thanks Martijn)
+* HOMSEQ and other missing fields in INFO (#130 thanks Martijn)
+* Add aaf property, (thanks @mgymrek #131)
+* Custom equality for walk_together, thanks bow #132
+* Change default line encoding to '\n'
+* Improved __eq__ (#134, thanks bow)
+
+
+0.6.4 Release
+-------------
+
+* Handle INFO fields with multiple values, thanks
+* Support writing records without GT data #88, thanks @bow
+* Pickleable call data #112, thanks @superbobry
+* Write files without FORMAT #95 thanks Martijn
+* Strict whitespace mode, thanks Martijn, Lee Lichtenstein and Manawsi Gupta
+* Add support for contigs in header, thanks @gcnh and Martijn
+* Fix GATK header parsing, thanks @alimanfoo
+
+0.6.3 Release
+-------------
+
+* cython port of #79
+* correct writing of meta lines #84
+
+0.6.2 Release
+-------------
+
+* issues #78, #79 (thanks Sean, Brad)
+
+0.6.1 Release
+-------------
+
+* Add strict whitespace mode for well formed VCFs with spaces
+ in sample names (thanks Marco)
+* Ignore blank lines in files (thanks Martijn)
+* Tweaks for handling missing data (thanks Sean)
+* bcftools tests (thanks Martijn)
+* record.FILTER is always a list
+
+0.6.0 Release
+-------------
+
+* Backwards incompatible change: _Call.data is now a
+ namedtuple (previously it was a dict)
+* Optional cython version, much improved performance.
+* Improvements to writer (thanks @cmclean)
+* Improvements to inheritance of classes (thanks @lennax)
+
+
+0.5.0 Release
+-------------
+
+* VCF 4.1 support:
+ - support missing genotype #28 (thanks @martijnvermaat)
+ - parseALT for svs #42, #48 (thanks @dzerbino)
+* `trim_common_suffix` method #22 (thanks @martijnvermaat)
+* Multiple metadata with the same key is stored (#52)
+* Writer improvements:
+ - A/G in Number INFO fields #53 (thanks @lennax)
+ - Better output #55 (thanks @cmclean)
+* Allow malformed INFO fields #49 (thanks @ilyaminkin)
+* Added bayes factor error bias VCF filter
+* Added docs on vcf_melt
+* filters from @libor-m (SNP only, depth per sample, avg depth per sample)
+* change to the filter API, use docstring for filter description
+
+0.4.6 Release
+-------------
+
+* Performance improvements (#47)
+* Preserve order of INFO column (#46)
+
+0.4.5 Release
+-------------
+
+* Support exponent syntax qual values (#43, #44) (thanks @martijnvermaat)
+* Preserve order of header lines (#45)
+
+0.4.4 Release
+-------------
+
+* Support whitespace in sample names
+* SV work (thanks @arq5x)
+* Python 3 support via 2to3 (thanks @marcelm)
+* Improved filtering script, capable of importing local files
+
+0.4.3 Release
+-------------
+
+* Single floats in Reader._sample_parser not being converted to float #35
+* Handle String INFO values when Number=1 in header #34
+
+0.4.2 Release
+-------------
+
+* Installation problems
+
+0.4.1 Release
+-------------
+
+* Installation problems
+
+0.4.0 Release
+-------------
+
+* Package structure
+* add ``vcf.utils`` module with ``walk_together`` method
+* samtools tests
+* support Freebayes' non standard '.' for no call
+* fix vcf_melt
+* support monomorphic sites, add ``is_monomorphic`` method, handle null QUALs
+* filter support for files with monomorphic calls
+* Values declared as single are no-longer returned in lists
+* several performance improvements
+
+
+0.3.0 Release
+-------------
+
+* Fix setup.py for python < 2.7
+* Add ``__eq__`` to ``_Record`` and ``_Call``
+* Add ``is_het`` and ``is_variant`` to ``_Call``
+* Drop aggressive parse mode: we're always aggressive.
+* Add tabix fetch for single calls, fix one->zero based indexing
+* add prepend_chr mode for ``Reader`` to add `chr` to CHROM attributes
+
+0.2.2 Release
+-------------
+
+Documentation release
+
+0.2.1 Release
+-------------
+
+* Add shebang to vcf_filter.py
+
+0.2 Release
+-----------
+
+* Replace genotype dictionary with a ``Call`` object
+* Methods on ``Record`` and ``Call`` (thanks @arq5x)
+* Shortcut parse_sample when genotype is None
+
+0.1 Release
+-----------
+
+* Added test code
+* Added Writer class
+* Allow negative number in ``INFO`` and ``FORMAT`` fields (thanks @martijnvermaat)
+* Prefer ``vcf.Reader`` to ``vcf.VCFReader``
+* Support compressed files with guessing where filename is available on fsock
+* Allow opening by filename as well as filesocket
+* Support fetching rows for tabixed indexed files
+* Performance improvements (see ``test/prof.py``)
+* Added extensible filter script (see FILTERS.md), vcf_filter.py
+
+Contributions
+=============
+
+Project started by @jdoughertyii and taken over by @jamescasbon on 12th January 2011.
+Contributions from @arq5x, @brentp, @martijnvermaat, @ian1roberts, @marcelm.
+
+This project was supported by `Population Genetics <http://www.populationgenetics.com/>`_.
=====================================
docs/INTRO.rst
=====================================
--- /dev/null
+++ b/docs/INTRO.rst
@@ -0,0 +1,4 @@
+Introduction
+============
+
+.. include:: ../README.rst
=====================================
docs/Makefile
=====================================
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,130 @@
+# Makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS =
+SPHINXBUILD = sphinx-build
+PAPER =
+BUILDDIR = _build
+
+# Internal variables.
+PAPEROPT_a4 = -D latex_paper_size=a4
+PAPEROPT_letter = -D latex_paper_size=letter
+ALLSPHINXOPTS = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
+
+.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest
+
+help:
+ @echo "Please use \`make <target>' where <target> is one of"
+ @echo " html to make standalone HTML files"
+ @echo " dirhtml to make HTML files named index.html in directories"
+ @echo " singlehtml to make a single large HTML file"
+ @echo " pickle to make pickle files"
+ @echo " json to make JSON files"
+ @echo " htmlhelp to make HTML files and a HTML help project"
+ @echo " qthelp to make HTML files and a qthelp project"
+ @echo " devhelp to make HTML files and a Devhelp project"
+ @echo " epub to make an epub"
+ @echo " latex to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
+ @echo " latexpdf to make LaTeX files and run them through pdflatex"
+ @echo " text to make text files"
+ @echo " man to make manual pages"
+ @echo " changes to make an overview of all changed/added/deprecated items"
+ @echo " linkcheck to check all external links for integrity"
+ @echo " doctest to run all doctests embedded in the documentation (if enabled)"
+
+clean:
+ -rm -rf $(BUILDDIR)/*
+
+html:
+ $(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+ @echo
+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+dirhtml:
+ $(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
+ @echo
+ @echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
+
+singlehtml:
+ $(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
+ @echo
+ @echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
+
+pickle:
+ $(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
+ @echo
+ @echo "Build finished; now you can process the pickle files."
+
+json:
+ $(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
+ @echo
+ @echo "Build finished; now you can process the JSON files."
+
+htmlhelp:
+ $(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
+ @echo
+ @echo "Build finished; now you can run HTML Help Workshop with the" \
+ ".hhp project file in $(BUILDDIR)/htmlhelp."
+
+qthelp:
+ $(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
+ @echo
+ @echo "Build finished; now you can run "qcollectiongenerator" with the" \
+ ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
+ @echo "# qcollectiongenerator $(BUILDDIR)/qthelp/PyVCF.qhcp"
+ @echo "To view the help file:"
+ @echo "# assistant -collectionFile $(BUILDDIR)/qthelp/PyVCF.qhc"
+
+devhelp:
+ $(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
+ @echo
+ @echo "Build finished."
+ @echo "To view the help file:"
+ @echo "# mkdir -p $$HOME/.local/share/devhelp/PyVCF"
+ @echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/PyVCF"
+ @echo "# devhelp"
+
+epub:
+ $(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
+ @echo
+ @echo "Build finished. The epub file is in $(BUILDDIR)/epub."
+
+latex:
+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+ @echo
+ @echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
+ @echo "Run \`make' in that directory to run these through (pdf)latex" \
+ "(use \`make latexpdf' here to do that automatically)."
+
+latexpdf:
+ $(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+ @echo "Running LaTeX files through pdflatex..."
+ make -C $(BUILDDIR)/latex all-pdf
+ @echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+text:
+ $(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
+ @echo
+ @echo "Build finished. The text files are in $(BUILDDIR)/text."
+
+man:
+ $(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
+ @echo
+ @echo "Build finished. The manual pages are in $(BUILDDIR)/man."
+
+changes:
+ $(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
+ @echo
+ @echo "The overview file is in $(BUILDDIR)/changes."
+
+linkcheck:
+ $(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+ @echo
+ @echo "Link check complete; look for any errors in the above output " \
+ "or in $(BUILDDIR)/linkcheck/output.txt."
+
+doctest:
+ $(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
+ @echo "Testing of doctests in the sources finished, look at the " \
+ "results in $(BUILDDIR)/doctest/output.txt."
=====================================
docs/conf.py
=====================================
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1,217 @@
+# -*- coding: utf-8 -*-
+#
+# PyVCF documentation build configuration file, created by
+# sphinx-quickstart on Wed Jan 25 12:29:23 2012.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import sys, os
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+sys.path.insert(0, os.path.abspath('..'))
+
+# -- General configuration -----------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.viewcode']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['.templates']
+
+# The suffix of source filenames.
+source_suffix = '.rst'
+
+# The encoding of source files.
+#source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'PyVCF'
+copyright = u'2012, James Casbon, @jdoughertyii'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+import vcf
+version = vcf.VERSION
+# The full version, including alpha/beta/rc tags.
+release = vcf.VERSION
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+#today = ''
+# Else, today_fmt is used as the format for a strftime call.
+#today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = ['.build']
+
+# The reST default role (used for this markup: `text`) to use for all documents.
+#default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+#modindex_common_prefix = []
+
+
+# -- Options for HTML output ---------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages. See the documentation for
+# a list of builtin themes.
+html_theme = 'default'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further. For a list of options available for each theme, see the
+# documentation.
+#html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this directory.
+#html_theme_path = []
+
+# The name for this set of Sphinx documents. If None, it defaults to
+# "<project> v<release> documentation".
+#html_title = None
+
+# A shorter title for the navigation bar. Default is the same as html_title.
+#html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+#html_logo = None
+
+# The name of an image file (within the static path) to use as favicon of the
+# docs. This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+#html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['.static']
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+#html_last_updated_fmt = '%b %d, %Y'
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#html_additional_pages = {}
+
+# If false, no module index is generated.
+#html_domain_indices = True
+
+# If false, no index is generated.
+#html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+#html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+#html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it. The value of this option must be the
+# base URL from which the finished HTML is served.
+#html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+#html_file_suffix = None
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'PyVCFdoc'
+
+
+# -- Options for LaTeX output --------------------------------------------------
+
+# The paper size ('letter' or 'a4').
+#latex_paper_size = 'letter'
+
+# The font size ('10pt', '11pt' or '12pt').
+#latex_font_size = '10pt'
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title, author, documentclass [howto/manual]).
+latex_documents = [
+ ('index', 'PyVCF.tex', u'PyVCF Documentation',
+ u'James Casbon, @jdoughertyii', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+#latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+#latex_use_parts = False
+
+# If true, show page references after internal links.
+#latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#latex_show_urls = False
+
+# Additional stuff for the LaTeX preamble.
+#latex_preamble = ''
+
+# Documents to append as an appendix to all manuals.
+#latex_appendices = []
+
+# If false, no module index is generated.
+#latex_domain_indices = True
+
+
+# -- Options for manual page output --------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+ ('index', 'pyvcf', u'PyVCF Documentation',
+ [u'James Casbon, @jdoughertyii'], 1)
+]
=====================================
docs/index.rst
=====================================
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,22 @@
+
+PyVCF - A Variant Call Format Parser for Python
+===============================================
+
+Contents:
+
+.. toctree::
+ :maxdepth: 2
+
+ INTRO
+ API
+ FILTERS
+ HISTORY
+
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+
=====================================
requirements/common-requirements.txt
=====================================
--- /dev/null
+++ b/requirements/common-requirements.txt
@@ -0,0 +1,3 @@
+cython
+pysam!=0.8.0
+setuptools
=====================================
requirements/pypy-requirements.txt
=====================================
--- /dev/null
+++ b/requirements/pypy-requirements.txt
@@ -0,0 +1 @@
+setuptools
=====================================
setup.cfg deleted
=====================================
--- a/setup.cfg
+++ /dev/null
@@ -1,5 +0,0 @@
-[egg_info]
-tag_date = 0
-tag_build =
-tag_svn_revision = 0
-
=====================================
setup.py
=====================================
--- a/setup.py
+++ b/setup.py
@@ -8,15 +8,8 @@ try:
except:
CYTHON = False
-IS_PYTHON26 = sys.version_info[:2] == (2, 6)
-
DEPENDENCIES = ['setuptools']
-if IS_PYTHON26:
- DEPENDENCIES.extend(['argparse', 'counter', 'ordereddict',
- 'unittest2'])
-
-
# get the version without an import
VERSION = "Undefined"
DOC = ""
@@ -68,12 +61,13 @@ setup(
'Programming Language :: Cython',
'Programming Language :: Python',
'Programming Language :: Python :: 2',
- 'Programming Language :: Python :: 2.6',
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
- 'Programming Language :: Python :: 3.2',
- 'Programming Language :: Python :: 3.3',
'Programming Language :: Python :: 3.4',
+ 'Programming Language :: Python :: 3.5',
+ 'Programming Language :: Python :: 3.6',
+ 'Programming Language :: Python :: Implementation :: CPython',
+ 'Programming Language :: Python :: Implementation :: PyPy',
'Topic :: Scientific/Engineering :: Bio-Informatics',
],
keywords='bioinformatics',
=====================================
tox.ini
=====================================
--- /dev/null
+++ b/tox.ini
@@ -0,0 +1,21 @@
+# Tox (http://tox.testrun.org/) is a tool for running tests
+# in multiple virtualenvs. This configuration file will run the
+# test suite on all supported python versions. To use it, "pip install tox"
+# and then run "tox" from this directory.
+
+[tox]
+envlist = py27, py34, py35, py36, pypy, pypy3
+
+[testenv]
+deps =
+ -rrequirements/common-requirements.txt
+commands =
+ python setup.py clean --all test
+
+[testenv:pypy]
+deps =
+ -rrequirements/pypy-requirements.txt
+
+[testenv:pypy3]
+deps =
+ -rrequirements/pypy-requirements.txt
=====================================
vcf/cparse.pyx
=====================================
--- a/vcf/cparse.pyx
+++ b/vcf/cparse.pyx
@@ -1,14 +1,27 @@
from model import _Call
-cdef _map(func, iterable, bad='.'):
+cdef _map(func, iterable, bad=['.', '']):
'''``map``, but make bad values None.'''
- return [func(x) if x != bad else None
+ return [func(x) if x not in bad else None
for x in iterable]
INTEGER = 'Integer'
FLOAT = 'Float'
NUMERIC = 'Numeric'
+def _parse_filter(filt_str):
+ '''Parse the FILTER field of a VCF entry into a Python list
+
+ NOTE: this method has a python equivalent and care must be taken
+ to keep the two methods equivalent
+ '''
+ if filt_str == '.':
+ return None
+ elif filt_str == 'PASS':
+ return []
+ else:
+ return filt_str.split(';')
+
def parse_samples(
list names, list samples, samp_fmt,
list samp_fmt_types, list samp_fmt_nums, site):
@@ -39,6 +52,10 @@ def parse_samples(
if samp_fmt._fields[j] == 'GT':
sampdat[j] = vals
continue
+ # genotype filters are a special case
+ elif samp_fmt._fields[j] == 'FT':
+ sampdat[j] = _parse_filter(vals)
+ continue
elif not vals or vals == '.':
sampdat[j] = None
continue
@@ -48,8 +65,7 @@ def parse_samples(
entry_num = samp_fmt_nums[j]
# we don't need to split single entries
- if entry_num == 1 or ',' not in vals:
-
+ if entry_num == 1:
if entry_type == INTEGER:
try:
sampdat[j] = int(vals)
@@ -59,14 +75,9 @@ def parse_samples(
sampdat[j] = float(vals)
else:
sampdat[j] = vals
-
- if entry_num != 1:
- sampdat[j] = (sampdat[j])
-
continue
vals = vals.split(',')
-
if entry_type == INTEGER:
try:
sampdat[j] = _map(int, vals)
=====================================
vcf/model.py
=====================================
--- a/vcf/model.py
+++ b/vcf/model.py
@@ -23,10 +23,10 @@ class _Call(object):
#: Namedtuple of data from the VCF file
self.data = data
- if hasattr(self.data, 'GT'):
+ if getattr(self.data, 'GT', None) is not None:
self.gt_alleles = [(al if al != '.' else None) for al in allele_delimiter.split(self.data.GT)]
self.ploidity = len(self.gt_alleles)
- self.called = all([al != None for al in self.gt_alleles])
+ self.called = any(al is not None for al in self.gt_alleles)
self.gt_nums = self.data.GT if self.called else None
else:
#62 a call without a genotype is not defined as called or not
@@ -65,7 +65,7 @@ class _Call(object):
if self.called:
# lookup and return the actual DNA alleles
try:
- return self.gt_phase_char().join(str(self.site.alleles[int(X)]) for X in self.gt_alleles)
+ return self.gt_phase_char().join(str(self.site.alleles[int(X)] if X is not None else '.') for X in self.gt_alleles)
except:
sys.stderr.write("Allele number not found in list of alleles\n")
else:
@@ -117,6 +117,18 @@ class _Call(object):
return None
return self.gt_type == 1
+ @property
+ def is_filtered(self):
+ """ Return True for filtered calls """
+ try: # no FT annotation present for this variant
+ filt = self.data.FT
+ except AttributeError:
+ return False
+ if filt is None or len(filt) == 0: # FT is not set or set to PASS
+ return False
+ else:
+ return True
+
class _Record(object):
""" A set of calls at a site. Equivalent to a row in a VCF file.
@@ -279,7 +291,7 @@ class _Record(object):
@property
def num_called(self):
""" The number of called samples"""
- return sum(s.called for s in self.samples)
+ return sum(1 for s in self.samples if s.called)
@property
def call_rate(self):
@@ -389,7 +401,7 @@ class _Record(object):
return True
for alt in self.ALT:
if alt is None:
- return True
+ return False
if alt.type != "SNV" and alt.type != "MNV":
return False
elif len(alt) != len(self.REF):
@@ -440,7 +452,7 @@ class _Record(object):
# just one alt allele
alt_allele = self.ALT[0]
if alt_allele is None:
- return True
+ return False
if len(self.REF) > len(alt_allele):
return True
else:
@@ -536,6 +548,15 @@ class _Record(object):
""" Return True for reference calls """
return len(self.ALT) == 1 and self.ALT[0] is None
+ @property
+ def is_filtered(self):
+ """ Return True if a variant has been filtered """
+ filt = self.FILTER
+ if filt is None or len(filt) == 0: # FILTER is not set or set to PASS
+ return False
+ else:
+ return True
+
class _AltRecord(object):
'''An alternative allele record: either replacement string, SV placeholder, or breakend'''
=====================================
vcf/parser.py
=====================================
--- a/vcf/parser.py
+++ b/vcf/parser.py
@@ -78,12 +78,12 @@ _Contig = collections.namedtuple('Contig', ['id', 'length'])
class _vcf_metadata_parser(object):
- '''Parse the metadat in the header of a VCF file.'''
+ '''Parse the metadata in the header of a VCF file.'''
def __init__(self):
super(_vcf_metadata_parser, self).__init__()
self.info_pattern = re.compile(r'''\#\#INFO=<
ID=(?P<id>[^,]+),\s*
- Number=(?P<number>-?\d+|\.|[AGR]),\s*
+ Number=(?P<number>-?\d+|\.|[AGR])?,\s*
Type=(?P<type>Integer|Float|Flag|Character|String),\s*
Description="(?P<desc>[^"]*)"
(?:,\s*Source="(?P<source>[^"]*)")?
@@ -151,7 +151,7 @@ class _vcf_metadata_parser(object):
match = self.alt_pattern.match(alt_string)
if not match:
raise SyntaxError(
- "One of the FILTER lines is malformed: %s" % alt_string)
+ "One of the ALT lines is malformed: %s" % alt_string)
alt = _Alt(match.group('id'), match.group('desc'))
@@ -354,11 +354,24 @@ class Reader(object):
self.samples = fields[9:]
self._sample_indexes = dict([(x,i) for (i,x) in enumerate(self.samples)])
- def _map(self, func, iterable, bad='.'):
+ def _map(self, func, iterable, bad=['.', '']):
'''``map``, but make bad values None.'''
- return [func(x) if x != bad else None
+ return [func(x) if x not in bad else None
for x in iterable]
+ def _parse_filter(self, filt_str):
+ '''Parse the FILTER field of a VCF entry into a Python list
+
+ NOTE: this method has a cython equivalent and care must be taken
+ to keep the two methods equivalent
+ '''
+ if filt_str == '.':
+ return None
+ elif filt_str == 'PASS':
+ return []
+ else:
+ return filt_str.split(';')
+
def _parse_info(self, info_str):
'''Parse the INFO field of a VCF entry into a dictionary of Python
types.
@@ -466,6 +479,10 @@ class Reader(object):
if samp_fmt._fields[i] == 'GT':
sampdat[i] = vals
continue
+ # genotype filters are a special case
+ elif samp_fmt._fields[i] == 'FT':
+ sampdat[i] = self._parse_filter(vals)
+ continue
elif not vals or vals == ".":
sampdat[i] = None
continue
@@ -474,25 +491,19 @@ class Reader(object):
entry_type = samp_fmt._types[i]
# we don't need to split single entries
- if entry_num == 1 or ',' not in vals:
-
+ if entry_num == 1:
if entry_type == 'Integer':
try:
sampdat[i] = int(vals)
except ValueError:
sampdat[i] = float(vals)
- elif entry_type == 'Float':
+ elif entry_type == 'Float' or entry_type == 'Numeric':
sampdat[i] = float(vals)
else:
sampdat[i] = vals
-
- if entry_num != 1:
- sampdat[i] = (sampdat[i])
-
continue
vals = vals.split(',')
-
if entry_type == 'Integer':
try:
sampdat[i] = _map(int, vals)
@@ -562,13 +573,7 @@ class Reader(object):
except ValueError:
qual = None
- filt = row[6]
- if filt == '.':
- filt = None
- elif filt == 'PASS':
- filt = []
- else:
- filt = filt.split(';')
+ filt = self._parse_filter(row[6])
info = self._parse_info(row[7])
try:
@@ -741,11 +746,17 @@ class Writer(object):
else:
gt = './.' if 'GT' in fmt else ''
- if not gt:
- return ':'.join([self._stringify(x) for x in sample.data])
+ result = [gt] if gt else []
# Following the VCF spec, GT is always the first item whenever it is present.
- else:
- return ':'.join([gt] + [self._stringify(x) for x in sample.data[1:]])
+ for field in sample.data._fields:
+ value = getattr(sample.data,field)
+ if field == 'GT':
+ continue
+ if field == 'FT':
+ result.append(self._format_filter(value))
+ else:
+ result.append(self._stringify(value))
+ return ':'.join(result)
def _stringify(self, x, none='.', delim=','):
if type(x) == type([]):
=====================================
vcf/test/bad-info-character.vcf
=====================================
--- /dev/null
+++ b/vcf/test/bad-info-character.vcf
@@ -0,0 +1,14 @@
+##fileformat=VCFv4.1
+##INFO=<ID=EMPTY_1,Number=1,Type=Float,Description="A floating point value">
+##INFO=<ID=EMPTY_3,Number=3,Type=Float,Description="Floating point values">
+##INFO=<ID=EMPTY_N,Number=.,Type=Float,Description="Floating point values">
+##INFO=<ID=DOT_1,Number=1,Type=Character,Description="A character value">
+##INFO=<ID=DOT_3,Number=3,Type=Character,Description="Character values">
+##INFO=<ID=DOT_N,Number=.,Type=Character,Description="Character values">
+##INFO=<ID=NOTEMPTY_1,Number=1,Type=Float,Description="A floating point value">
+##INFO=<ID=NOTEMPTY_3,Number=3,Type=Float,Description="Floating point values">
+##INFO=<ID=NOTEMPTY_N,Number=.,Type=Float,Description="Floating point values">
+##INFO=<ID=FLAG,Number=0,Type=Flag,Description="HapMap2 membership">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT Sample
+chr1 100 id1 G A . . FLAG;EMPTY_1=;EMPTY_3=;EMPTY_N=;DOT_1=.;DOT_3=.,.,.;DOT_N=.;NOTEMPTY_1=1;NOTEMPTY_3=1,2,3;NOTEMPTY_N=1 GT 0/1
=====================================
vcf/test/example-4.0.vcf
=====================================
--- a/vcf/test/example-4.0.vcf
+++ b/vcf/test/example-4.0.vcf
@@ -20,4 +20,5 @@
20 17330 . T A 3.0 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3
20 1110696 rs6040355 A G,T 1e+03 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2
+20 1231234 . AT A 46 PASS NS=3;DP=15;AA=A GT:GQ:DP:HQ 1|1:23:7:26,30 0|0:27:9:56,60 0|0:31:10:65,71
20 1234567 microsat1 GTCT G,GTACT . PASS NS=3;DP=9;AA=G GT:GQ:DP ./.:35:4 0/2:17:2 1/1:40:3
=====================================
vcf/test/issue-254.vcf
=====================================
--- /dev/null
+++ b/vcf/test/issue-254.vcf
@@ -0,0 +1,9 @@
+##fileformat=VCFv4.1
+##fileDate=20090805
+##source=myImputationProgramV3.1
+##reference=1000GenomesPilot-NCBI36
+##phasing=partial
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
+#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT NA00001 NA00002 NA00003
+21 4242421 . T A 30 . . GT:AO 0|0:0.1 0|1:0.2 0/0:0.3
=====================================
vcf/test/test_vcf.py
=====================================
--- a/vcf/test/test_vcf.py
+++ b/vcf/test/test_vcf.py
@@ -393,6 +393,24 @@ class TestInfoTypeCharacter(unittest.TestCase):
self.assertEquals(l.INFO, r.INFO)
+class TestBadInfoFields(unittest.TestCase):
+ def test_parse(self):
+ reader = vcf.Reader(fh('bad-info-character.vcf'))
+ record = next(reader)
+ self.assertEquals(record.INFO['DOT_1'], None)
+ self.assertEquals(record.INFO['DOT_3'], [None, None, None])
+ self.assertEquals(record.INFO['DOT_N'], [None])
+ self.assertEquals(record.INFO['EMPTY_1'], None)
+ # Perhaps EMPTY_3 should yield [None, None, None] but this is really a
+ # cornercase of unspecified behaviour.
+ self.assertEquals(record.INFO['EMPTY_3'], [None])
+ self.assertEquals(record.INFO['EMPTY_N'], [None])
+ self.assertEquals(record.INFO['NOTEMPTY_1'], 1)
+ self.assertEquals(record.INFO['NOTEMPTY_3'], [1, 2, 3])
+ self.assertEquals(record.INFO['NOTEMPTY_N'], [1])
+ pass
+
+
class TestParseMetaLine(unittest.TestCase):
def test_parse(self):
reader = vcf.Reader(fh('parse-meta-line.vcf'))
@@ -578,6 +596,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(3.0/3.0, call_rate)
if var.POS == 1230237:
self.assertEqual(3.0/3.0, call_rate)
+ if var.POS == 1231234:
+ self.assertEqual(3.0/3.0, call_rate)
elif var.POS == 1234567:
self.assertEqual(2.0/3.0, call_rate)
@@ -593,6 +613,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual([2.0/6.0, 4.0/6.0], aaf)
if var.POS == 1230237:
self.assertEqual([0.0/6.0], aaf)
+ if var.POS == 1231234:
+ self.assertEqual([2.0/6.0], aaf)
elif var.POS == 1234567:
self.assertEqual([2.0/4.0, 1.0/4.0], aaf)
reader = vcf.Reader(fh('example-4.1-ploidy.vcf'))
@@ -615,6 +637,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(None, pi)
if var.POS == 1230237:
self.assertEqual(0.0/6.0, pi)
+ if var.POS == 1231234:
+ self.assertEqual((6.0/(6.0-1))*(2.0*(1.0/3.0)*(2.0/3.0)) , pi)
elif var.POS == 1234567:
self.assertEqual(None, pi)
@@ -630,6 +654,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(4.0/9.0, het)
if var.POS == 1230237:
self.assertEqual(0.0, het)
+ if var.POS == 1231234:
+ self.assertEqual(4.0/9.0, het)
elif var.POS == 1234567:
self.assertEqual(5.0/8.0, het)
@@ -650,6 +676,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(True, is_snp)
if var.POS == 1230237:
self.assertEqual(False, is_snp)
+ if var.POS == 1231234:
+ self.assertEqual(False, is_snp)
elif var.POS == 1234567:
self.assertEqual(False, is_snp)
@@ -682,6 +710,8 @@ class TestRecord(unittest.TestCase):
if var.POS == 1110696:
self.assertEqual(False, is_indel)
if var.POS == 1230237:
+ self.assertEqual(False, is_indel)
+ if var.POS == 1231234:
self.assertEqual(True, is_indel)
elif var.POS == 1234567:
self.assertEqual(True, is_indel)
@@ -698,6 +728,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(False, is_trans)
if var.POS == 1230237:
self.assertEqual(False, is_trans)
+ if var.POS == 1231234:
+ self.assertEqual(False, is_trans)
elif var.POS == 1234567:
self.assertEqual(False, is_trans)
@@ -712,6 +744,8 @@ class TestRecord(unittest.TestCase):
if var.POS == 1110696:
self.assertEqual(False, is_del)
if var.POS == 1230237:
+ self.assertEqual(False, is_del)
+ if var.POS == 1231234:
self.assertEqual(True, is_del)
elif var.POS == 1234567:
self.assertEqual(False, is_del)
@@ -727,6 +761,8 @@ class TestRecord(unittest.TestCase):
if var.POS == 1110696:
self.assertEqual("snp", type)
if var.POS == 1230237:
+ self.assertEqual("unknown", type)
+ if var.POS == 1231234:
self.assertEqual("indel", type)
elif var.POS == 1234567:
self.assertEqual("indel", type)
@@ -759,6 +795,8 @@ class TestRecord(unittest.TestCase):
if var.POS == 1110696:
self.assertEqual("unknown", subtype)
if var.POS == 1230237:
+ self.assertEqual("unknown", subtype)
+ if var.POS == 1231234:
self.assertEqual("del", subtype)
elif var.POS == 1234567:
self.assertEqual("unknown", subtype)
@@ -807,6 +845,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(False, is_sv)
if var.POS == 1230237:
self.assertEqual(False, is_sv)
+ if var.POS == 1231234:
+ self.assertEqual(False, is_sv)
elif var.POS == 1234567:
self.assertEqual(False, is_sv)
@@ -838,6 +878,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(False, is_precise)
if var.POS == 1230237:
self.assertEqual(False, is_precise)
+ if var.POS == 1231234:
+ self.assertEqual(False, is_precise)
elif var.POS == 1234567:
self.assertEqual(False, is_precise)
@@ -869,6 +911,8 @@ class TestRecord(unittest.TestCase):
self.assertEqual(None, sv_end)
if var.POS == 1230237:
self.assertEqual(None, sv_end)
+ if var.POS == 1231234:
+ self.assertEqual(None, sv_end)
elif var.POS == 1234567:
self.assertEqual(None, sv_end)
@@ -885,6 +929,8 @@ class TestRecord(unittest.TestCase):
expected = 1e+03
if var.POS == 1230237:
expected = 47
+ if var.POS == 1231234:
+ expected = 46
elif var.POS == 1234567:
expected = None
self.assertEqual(expected, qual)
@@ -1166,6 +1212,8 @@ class TestCall(unittest.TestCase):
self.assertEqual([True, True, False], phases)
if var.POS == 1230237:
self.assertEqual([True, True, False], phases)
+ if var.POS == 1231234:
+ self.assertEqual([True, True, True], phases)
elif var.POS == 1234567:
self.assertEqual([False, False, False], phases)
@@ -1181,6 +1229,8 @@ class TestCall(unittest.TestCase):
self.assertEqual(['G|T', 'T|G', 'T/T'], gt_bases)
elif var.POS == 1230237:
self.assertEqual(['T|T', 'T|T', 'T/T'], gt_bases)
+ elif var.POS == 1231234:
+ self.assertEqual(['A|A', 'AT|AT', 'AT|AT'], gt_bases)
elif var.POS == 1234567:
self.assertEqual([None, 'GTCT/GTACT', 'G/G'], gt_bases)
@@ -1198,6 +1248,8 @@ class TestCall(unittest.TestCase):
self.assertEqual([1,1,2], gt_types)
elif var.POS == 1230237:
self.assertEqual([0,0,0], gt_types)
+ elif var.POS == 1231234:
+ self.assertEqual([2,0,0], gt_types)
elif var.POS == 1234567:
self.assertEqual([None,1,2], gt_types)
@@ -1271,6 +1323,100 @@ class TestIssue201(unittest.TestCase):
pass
+class TestIssue234(unittest.TestCase):
+ """ See https://github.com/jamescasbon/PyVCF/issues/234 """
+
+ def test_vcf_metadata_parser_doesnt_break_with_empty_number_tags(self):
+ parser = vcf.parser._vcf_metadata_parser()
+ num_str = '##INFO=<ID=CA,Number=,Type=Flag,Description="Position '
+ num_str += 'could not be annotated to a coding region of a transcript '
+ num_str += 'using the supplied bed file">'
+ try:
+ info = parser.read_info(num_str)[1]
+ self.assertIsNone(info.num)
+ except SyntaxError:
+ msg = "vcf.parser._vcf_metadata_parser shouldn't raise SyntaxError"
+ msg += " if Number tag is empty."
+ self.fail(msg)
+
+
+class TestIssue246(unittest.TestCase):
+ """ See https://github.com/jamescasbon/PyVCF/issues/246 """
+
+ def test_FT_pass_two(self):
+ reader=vcf.Reader(fh('FT.vcf'))
+ next(reader)
+ r=next(reader)
+ target=[
+ [],
+ ['DP125','DP130'],
+ ['DP125','DP130'],
+ ['DP125','DP130'],
+ ['DP125','DP130']
+ ]
+ result=[call.data.FT for call in r.samples]
+ self.assertEqual(target,result)
+
+ def test_FT_one_two(self):
+ reader=list(vcf.Reader(fh('FT.vcf')))
+ r=reader[6]
+ target=[
+ ['DP125','DP130'],
+ ['DP125','DP130'],
+ ['DP125','DP130'],
+ ['DP130'],
+ ['DP125','DP130']
+ ]
+ result=[call.data.FT for call in r.samples]
+ self.assertEqual(target,result)
+
+
+class TestIssue254(unittest.TestCase):
+ """ See https://github.com/jamescasbon/PyVCF/issues/254 """
+
+ def test_remains_singleton_list(self):
+ reader = vcf.Reader(fh('issue-254.vcf'))
+ record = next(reader)
+ expected = [[0.1], [0.2], [0.3]]
+ actual = [call.data.AO for call in record.samples]
+ self.assertEqual(expected, actual)
+
+
+class TestIsFiltered(unittest.TestCase):
+ """ Test is_filtered property for _Call and _Record """
+
+ def test_is_filt_record(self):
+ reader = vcf.Reader(fh('FT.vcf'))
+ target = [
+ False, False, True, False, False,
+ False, True, False, False, False
+ ]
+ result = [record.is_filtered for record in reader]
+ self.assertEqual(target,result)
+
+ def test_is_filt_call_unset(self):
+ reader = vcf.Reader(fh('FT.vcf'))
+ record = next(reader)
+ target = [False]*5
+ result = [call.is_filtered for call in record]
+ self.assertEqual(target,result)
+
+ def test_is_filt_call_pass_two(self):
+ reader = vcf.Reader(fh('FT.vcf'))
+ next(reader)
+ record = next(reader)
+ target = [False, True, True, True, True]
+ result = [call.is_filtered for call in record]
+ self.assertEqual(target,result)
+
+ def test_is_filt_call_one(self):
+ reader = list(vcf.Reader(fh('FT.vcf')))
+ record = reader[6]
+ target = [True]*5
+ result = [call.is_filtered for call in record]
+ self.assertEqual(target,result)
+
+
class TestOpenMethods(unittest.TestCase):
samples = 'NA00001 NA00002 NA00003'.split()
@@ -1434,10 +1580,10 @@ class TestUtils(unittest.TestCase):
self.assertEqual(x[0], x[1])
self.assertEqual(x[1], x[2])
n+= 1
- self.assertEqual(n, 5)
+ self.assertEqual(n, 6)
- # artificial case 2 from the left, 2 from the right, 2 together, 1 from the right, 1 from the left
- expected = 'llrrttrl'
+ # artificial case 2 from the left, 2 from the right, 3 together, 1 from the right, 1 from the left
+ expected = 'llrrtttrl'
reader1 = vcf.Reader(fh('walk_left.vcf'))
reader2 = vcf.Reader(fh('example-4.0.vcf'))
@@ -1511,22 +1657,32 @@ class TestUncalledGenotypes(unittest.TestCase):
gt_nums = [s.gt_nums for s in var.samples]
ploidity = [s.ploidity for s in var.samples]
gt_alleles = [s.gt_alleles for s in var.samples]
+ gt_type = [s.gt_type for s in var.samples]
if var.POS == 14370:
self.assertEqual(['0|0', None, '1/1'], gt_nums)
self.assertEqual(['G|G', None, 'A/A'], gt_bases)
self.assertEqual([2,2,2], ploidity)
self.assertEqual([['0','0'], [None,None], ['1','1']], gt_alleles)
+ self.assertEqual([0, None, 2], gt_type)
elif var.POS == 17330:
self.assertEqual([None, '0|1', '0/0'], gt_nums)
self.assertEqual([None, 'T|A', 'T/T'], gt_bases)
self.assertEqual([3,2,2], ploidity)
self.assertEqual([[None,None,None], ['0','1'], ['0','0']], gt_alleles)
+ self.assertEqual([None, 1, 0], gt_type)
elif var.POS == 1234567:
self.assertEqual(['0/1', '0/2', None], gt_nums)
self.assertEqual(['GTC/G', 'GTC/GTCT', None], gt_bases)
self.assertEqual([2,2,1], ploidity)
self.assertEqual([['0','1'], ['0','2'], [None]], gt_alleles)
+ self.assertEqual([1, 1, None], gt_type)
+ elif var.POS == 1234568:
+ self.assertEqual(['./1', '0/.', None], gt_nums)
+ self.assertEqual(['./G', 'GTC/.', None], gt_bases)
+ self.assertEqual([2,2,1], ploidity)
+ self.assertEqual([[None,'1'], ['0',None], [None]], gt_alleles)
+ self.assertEqual([1, 1, None], gt_type)
reader._reader.close()
@@ -1584,6 +1740,10 @@ suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestRecord))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestCall))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestFetch))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue201))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue234))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue246))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue254))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIsFiltered))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestOpenMethods))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestSampleFilter))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestFilter))
@@ -1592,3 +1752,4 @@ suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestUtils))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestGATKMeta))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestUncalledGenotypes))
suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestStrelka))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestBadInfoFields))
=====================================
vcf/test/uncalled_genotypes.vcf
=====================================
--- a/vcf/test/uncalled_genotypes.vcf
+++ b/vcf/test/uncalled_genotypes.vcf
@@ -5,3 +5,4 @@
20 14370 rs6054257 G A 29 PASS NS=3 GT 0|0 ./. 1/1
20 17330 . T A 3 q10 NS=3 GT ././. 0|1 0/0
20 1234567 microsat1 GTC G,GTCT 50 PASS NS=3 GT 0/1 0/2 .
+20 1234568 . GTC G,GTCT 50 PASS NS=3 GT ./1 0/. .
=====================================
vcf/test/walk_left.vcf
=====================================
--- a/vcf/test/walk_left.vcf
+++ b/vcf/test/walk_left.vcf
@@ -21,4 +21,5 @@
19 17330 . T A 3 q10 NS=3;DP=11;AF=0.017 GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3 0/0:41:3:65,3
20 1110696 rs6040355 A G,T 67 PASS NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2 2/2:35:4:65,4
20 1230237 . T . 47 PASS NS=3;DP=13;AA=T GT:GQ:DP:HQ 0|0:54:7:56,60 0|0:48:4:51,51 0/0:61:2:65,3
+20 1231234 . AT A 46 PASS NS=3;DP=15;AA=A GT:GQ:DP:HQ 1|1:23:7:26,30 0|0:27:9:56,60 0|0:31:10:65,71
21 1234567 microsat1 GTCT G,GTACT 50 PASS NS=3;DP=9;AA=G GT:GQ:DP ./.:35:4 0/2:17:2 1/1:40:3
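
Note on the FILTER/FT handling in the vcf/parser.py and vcf/model.py hunks above: the sketch below restates it as two standalone functions. The names parse_filter and is_filtered as free functions are illustrative assumptions. In the commit itself the logic lives in Reader._parse_filter, a module-level _parse_filter in vcf/cparse.pyx, and the is_filtered properties on _Call and _Record.

```python
def parse_filter(filt_str):
    """Standalone restatement of the _parse_filter logic in the diff."""
    if filt_str == '.':
        return None                 # '.' maps to None
    elif filt_str == 'PASS':
        return []                   # PASS maps to an empty list
    else:
        return filt_str.split(';')  # anything else is split on ';'


def is_filtered(filt):
    """Restatement of the is_filtered check: None or [] means not filtered."""
    return not (filt is None or len(filt) == 0)


if __name__ == '__main__':
    # Illustrative inputs; 'q10' and 'DP125;DP130' appear in the test data above.
    for raw in ('.', 'PASS', 'q10', 'DP125;DP130'):
        parsed = parse_filter(raw)
        print(raw, parsed, is_filtered(parsed))
```

Because the same function is applied to both the FILTER column and the per-sample FT field, a call and a record report "filtered" under the same rule.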
View it on GitLab: https://salsa.debian.org/med-team/python-pyvcf/commit/c017d2d8db0cb2bc11c399f99e28d1953dcc7c1a