[med-svn] [Git][med-team/python-pyvcf][upstream] New upstream version 0.6.8+git20170215.476169c

Andreas Tille gitlab at salsa.debian.org
Wed Jul 18 10:47:27 BST 2018


Andreas Tille pushed to branch upstream at Debian Med / python-pyvcf


Commits:
c017d2d8 by Andreas Tille at 2018-07-18T11:05:23+02:00
New upstream version 0.6.8+git20170215.476169c
- - - - -


25 changed files:

- + .gitignore
- + .travis.yml
- + LICENSE
- − PKG-INFO
- + docs/API.rst
- + docs/FILTERS.rst
- + docs/HISTORY.rst
- + docs/INTRO.rst
- + docs/Makefile
- + docs/conf.py
- + docs/index.rst
- + requirements/common-requirements.txt
- + requirements/pypy-requirements.txt
- − setup.cfg
- setup.py
- + tox.ini
- vcf/cparse.pyx
- vcf/model.py
- vcf/parser.py
- + vcf/test/bad-info-character.vcf
- vcf/test/example-4.0.vcf
- + vcf/test/issue-254.vcf
- vcf/test/test_vcf.py
- vcf/test/uncalled_genotypes.vcf
- vcf/test/walk_left.vcf


Changes:

=====================================
.gitignore
=====================================
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,13 @@
+PyVCF.egg-info
+build
+dist
+*.pyc
+docs/_build
+.ropeproject
+1kg.prof
+.noseids
+.tox
+.DS_Store
+vcf/cparse.c
+vcf/cparse.so
+.coverage


=====================================
.travis.yml
=====================================
--- /dev/null
+++ b/.travis.yml
@@ -0,0 +1,18 @@
+# Validate this file using http://lint.travis-ci.org/
+language: python
+sudo: false
+cache:
+  directories:
+    - $HOME/.cache/pip
+python:
+  - "2.7"
+  - "3.4"
+  - "3.5"
+  - "3.6"
+  - "nightly"
+  - "pypy"
+  - "pypy3"
+install:
+  - if [[ "$TRAVIS_PYTHON_VERSION" =~ ^pypy ]]; then pip install -r requirements/pypy-requirements.txt; else pip install -r requirements/common-requirements.txt; fi
+  - python setup.py install
+script: python setup.py test


=====================================
LICENSE
=====================================
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,46 @@
+Copyright (c) 2011-2012, Population Genetics Technologies Ltd, All rights reserved.
+
+Redistribution and use in source and binary forms, with or without
+modification, are permitted provided that the following conditions are met:
+
+1. Redistributions of source code must retain the above copyright notice, this
+list of conditions and the following disclaimer.  
+
+2. Redistributions in binary form must reproduce the above copyright notice, this
+list of conditions and the following disclaimer in the documentation and/or
+other materials provided with the distribution.  
+
+3. Neither the name of the Population Genetics Technologies Ltd nor the names of
+its contributors may be used to endorse or promote products derived from this
+software without specific prior written permission.  
+
+THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
+ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
+WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
+DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
+FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
+DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
+SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
+CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
+OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+
+Copyright (c) 2011 John Dougherty
+
+Permission is hereby granted, free of charge, to any person obtaining a copy of
+this software and associated documentation files (the "Software"), to deal in
+the Software without restriction, including without limitation the rights to
+use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of
+the Software, and to permit persons to whom the Software is furnished to do so,
+subject to the following conditions:
+
+The above copyright notice and this permission notice shall be included in all
+copies or substantial portions of the Software.
+
+THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS
+FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR
+COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER
+IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN
+CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.


=====================================
PKG-INFO deleted
=====================================
--- a/PKG-INFO
+++ /dev/null
@@ -1,27 +0,0 @@
-Metadata-Version: 1.1
-Name: PyVCF
-Version: 0.6.8
-Summary: Variant Call Format (VCF) parser for Python
-Home-page: https://github.com/jamescasbon/PyVCF
-Author: James Casbon and @jdoughertyii
-Author-email: casbon at gmail.com
-License: UNKNOWN
-Description: UNKNOWN
-Keywords: bioinformatics
-Platform: UNKNOWN
-Classifier: Development Status :: 4 - Beta
-Classifier: Intended Audience :: Developers
-Classifier: Intended Audience :: Science/Research
-Classifier: License :: OSI Approved :: BSD License
-Classifier: License :: OSI Approved :: MIT License
-Classifier: Operating System :: OS Independent
-Classifier: Programming Language :: Cython
-Classifier: Programming Language :: Python
-Classifier: Programming Language :: Python :: 2
-Classifier: Programming Language :: Python :: 2.6
-Classifier: Programming Language :: Python :: 2.7
-Classifier: Programming Language :: Python :: 3
-Classifier: Programming Language :: Python :: 3.2
-Classifier: Programming Language :: Python :: 3.3
-Classifier: Programming Language :: Python :: 3.4
-Classifier: Topic :: Scientific/Engineering :: Bio-Informatics


=====================================
docs/API.rst
=====================================
--- /dev/null
+++ b/docs/API.rst
@@ -0,0 +1,56 @@
+API
+===
+
+vcf.Reader
+----------
+
+.. autoclass:: vcf.Reader
+   :members:
+
+vcf.Writer
+----------
+
+.. autoclass:: vcf.Writer
+   :members:
+
+vcf.model._Record
+-----------------
+
+.. autoclass:: vcf.model._Record
+   :members:
+
+vcf.model._Call
+---------------
+
+.. autoclass:: vcf.model._Call
+   :members:
+
+vcf.model._AltRecord
+--------------------
+
+.. autoclass:: vcf.model._AltRecord
+   :members:
+
+vcf.model._Substitution
+-----------------------
+
+.. autoclass:: vcf.model._Substitution
+   :members:
+
+vcf.model._SV
+-------------
+
+.. autoclass:: vcf.model._SV
+   :members:
+
+vcf.model._SingleBreakend
+-------------------------
+
+.. autoclass:: vcf.model._SingleBreakend
+   :members:
+
+vcf.model._Breakend
+-------------------
+
+.. autoclass:: vcf.model._Breakend
+   :members:


=====================================
docs/FILTERS.rst
=====================================
--- /dev/null
+++ b/docs/FILTERS.rst
@@ -0,0 +1,158 @@
+Filtering VCF files
+===================
+
+The filter script: vcf_filter.py
+--------------------------------
+
+Filtering a VCF file based on some properties of interest is a common enough 
+operation that PyVCF offers an extensible script.  ``vcf_filter.py`` does 
+the work of reading input, updating the metadata and filtering the records.
+
+
+Existing Filters
+----------------
+
+.. autoclass:: vcf.filters.SiteQuality
+
+.. autoclass:: vcf.filters.VariantGenotypeQuality
+
+.. autoclass:: vcf.filters.ErrorBiasFilter
+
+.. autoclass:: vcf.filters.DepthPerSample
+
+.. autoclass:: vcf.filters.AvgDepthPerSample
+
+.. autoclass:: vcf.filters.SnpOnly
+
+
+
+
+Adding a filter
+---------------
+
+You can reuse this work by providing a filter class rather than writing your own filter script.
+For example, let's say I want to filter each site based on the quality of the site.
+I can create a class like this::
+   
+    import vcf.filters
+    class SiteQuality(vcf.filters.Base):
+        'Filter sites by quality'
+
+        name = 'sq'
+
+        @classmethod
+        def customize_parser(self, parser):
+            parser.add_argument('--site-quality', type=int, default=30,
+                    help='Filter sites below this quality')
+
+        def __init__(self, args):
+            self.threshold = args.site_quality
+
+        def __call__(self, record):
+            if record.QUAL < self.threshold:
+                return record.QUAL
+
+
+This class subclasses ``vcf.filters.Base``, which provides the interface for VCF filters.
+The docstring and ``name`` are metadata about the filter.  The docstring provides
+the help for the script, and the first line is included in the FILTER metadata when 
+applied to a file.
+
+The ``customize_parser`` method allows you to add arguments to the script.
+We use the ``__init__`` method to grab the argument of interest from the parser.
+Finally, the ``__call__`` method processes each record and returns a value if the 
+filter failed.  The base class uses the ``name`` and ``threshold`` to create
+the filter ID in the VCF file.
+
+To make vcf_filter.py aware of the filter, you can either use the local script option
+or declare an entry point.  To use a local script, pass it to vcf_filter.py with ``--local-script``::
+
+    $ vcf_filter.py --local-script my_filters.py ...
+
+To use an entry point, you need to declare a ``vcf.filters`` entry point in your ``setup``::
+
+    setup(
+        ...
+        entry_points = {
+            'vcf.filters': [
+                'site_quality = module.path:SiteQuality',
+            ]
+        }
+    )
+
+Either way, when you call vcf_filter.py, you should see your filter in the list of available filters::
+
+    usage: vcf_filter.py [-h] [--no-short-circuit] [--no-filtered] 
+                  [--output OUTPUT] [--local-script LOCAL_SCRIPT]
+                  input filter [filter_args] [filter [filter_args]] ...
+                
+
+    Filter a VCF file
+
+    positional arguments:
+      input                 File to process (use - for STDIN) (default: None)
+
+    optional arguments:
+      -h, --help            Show this help message and exit. (default: False)
+      --no-short-circuit    Do not stop filter processing on a site if any filter
+                            is triggered (default: False)
+      --output OUTPUT       Filename to output [STDOUT] (default: <open file
+                            '<stdout>', mode 'w' at 0x1002841e0>)
+      --no-filtered         Output only sites passing the filters (default: False)
+      --local-script LOCAL_SCRIPT
+                            Python file in current working directory with the
+                            filter classes (default: None)
+
+    sq:
+      Filter sites by quality
+
+      --site-quality SITE_QUALITY
+                            Filter sites below this quality (default: 30)
+
+The filter base class: vcf.filters.Base
+---------------------------------------
+
+.. autoclass:: vcf.filters.Base
+   :members:
+
+
+
+Utilities
+=========
+
+.. automodule:: vcf.utils
+
+Simultaneously iterate two or more files
+----------------------------------------
+
+.. autofunction:: vcf.utils.walk_together
+
+Trim common suffix
+--------------------
+.. autofunction:: vcf.utils.trim_common_suffix
+
+
+vcf_melt
+--------
+
+This script converts a VCF file from wide format (many calls per row) 
+to a long format (one call per row).  This is useful if you want to grep per sample
+or for really quick import into, say, a spreadsheet::
+
+    $ vcf_melt < vcf/test/gatk.vcf 
+    SAMPLE	AD	DP	GQ	GT	PL	FILTER	CHROM	POS	REF	ALT	ID	info.AC	info.AF	info.AN	info.BaseQRankSum	info.DB	info.DP	info.DS	info.Dels	info.FS	info.HRun	info.HaplotypeScore	info.InbreedingCoeff	info.MQ	info.MQ0	info.MQRankSum	info.QD	info.ReadPosRankSum
+    BLANK	6,0	6	18.04	0/0	0,18,211	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    NA12878	138,107	250	99.0	0/1	1961,0,3049	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    NA12891	169,77	250	99.0	0/1	1038,0,3533	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    NA12892	249,0	250	99.0	0/0	0,600,5732	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    NA19238	248,1	250	99.0	0/0	0,627,6191	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    NA19239	250,0	250	99.0	0/0	0,615,5899	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    NA19240	250,0	250	99.0	0/0	0,579,5674	.	chr22	42522392	G	[A]	rs28371738	2	0.143	14	0.375	True	1506	True	0.0	0.0	0	123.5516		253.92	0	0.685	5.9	0.59
+    BLANK	13,4	17	62.64	0/1	63,0,296	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+    NA12878	118,127	246	99.0	0/1	2396,0,1719	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+    NA12891	241,0	244	99.0	0/0	0,459,4476	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+    NA12892	161,85	246	99.0	0/1	1489,0,2353	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+    NA19238	110,132	242	99.0	0/1	2561,0,1488	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+    NA19239	106,135	242	99.0	0/1	2613,0,1389	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+    NA19240	116,126	243	99.0	0/1	2489,0,1537	.	chr22	42522613	G	[C]	rs1135840	6	0.429	14	16.289	True	1518	True	0.03	0.0	0	142.5716		242.46	0	2.01	9.16	-1.731
+
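The wide-to-long conversion described in the vcf_melt section of the FILTERS.rst diff above can be sketched in a few lines of plain Python. This is only an illustration of the reshaping idea, not the vcf_melt implementation: the `melt` helper and the dict layout are assumptions made for this sketch, and the values are a small subset of the sample output shown in the diff.

```python
# Hypothetical stand-in for one VCF row: site-level fields plus per-sample calls.
# This is NOT PyVCF's record type; it only illustrates wide -> long reshaping.
def melt(site, calls):
    """Yield one flat dict per sample call, repeating the site-level fields."""
    for sample, call in calls.items():
        row = {"SAMPLE": sample}
        row.update(call)   # per-sample fields (GT, DP, ...)
        row.update(site)   # site-level fields repeated on every output row
        yield row


site = {"CHROM": "chr22", "POS": 42522392, "REF": "G", "ALT": "A"}
calls = {
    "NA12878": {"GT": "0/1", "DP": 250},
    "NA12891": {"GT": "0/1", "DP": 250},
    "NA12892": {"GT": "0/0", "DP": 250},
}

rows = list(melt(site, calls))
columns = ("SAMPLE", "GT", "DP", "CHROM", "POS", "REF", "ALT")
for row in rows:
    # One tab-separated line per call, convenient for grep or a spreadsheet.
    print("\t".join(str(row[name]) for name in columns))
```

Each input row with three calls becomes three output rows, which is why the long format is convenient for per-sample filtering with line-oriented tools.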


=====================================
docs/HISTORY.rst
=====================================
--- /dev/null
+++ b/docs/HISTORY.rst
@@ -0,0 +1,199 @@
+Development
+===========
+
+Please use the `PyVCF repository <https://github.com/jamescasbon/PyVCF/>`_.
+Pull requests gladly accepted.
+Issues should be reported at the GitHub issue tracker.
+
+Running tests
+-------------
+
+Please check the tests by running them with::
+
+    python setup.py test
+
+New features should have test code sent with them.
+
+Changes
+=======
+
+0.6.7 Release
+-------------
+
+* Include missing .pyx files
+
+0.6.6 Release
+-------------
+
+* better walk together record ordering (Thanks @datagram, #141)
+
+0.6.5 Release
+-------------
+
+* Better contig handling (#115, #116, #119 thanks Martijn)
+* INFO lines with type character (#120, #121 thanks @AndrewUzilov, Martijn)
+* Single breakends fix (#126 thanks @pkrushe)
+* Speedup by losing ordering of INFO (#128 thanks Martijn)
+* HOMSEQ and other missing fields in INFO (#130 thanks Martijn)
+* Add aaf property, (thanks @mgymrek #131)
+* Custom equality for walk_together, thanks bow #132
+* Change default line encoding to '\n'
+* Improved __eq__ (#134, thanks bow)
+
+
+0.6.4 Release
+-------------
+
+* Handle INFO fields with multiple values, thanks
+* Support writing records without GT data #88, thanks @bow
+* Pickleable call data #112, thanks @superbobry
+* Write files without FORMAT #95 thanks Martijn
+* Strict whitespace mode, thanks Martijn, Lee Lichtenstein and Manawsi Gupta
+* Add support for contigs in header, thanks @gcnh and Martijn
+* Fix GATK header parsing, thanks @alimanfoo
+
+0.6.3 Release
+-------------
+
+* cython port of #79
+* correct writing of meta lines #84
+
+0.6.2 Release
+-------------
+
+* issues #78, #79 (thanks Sean, Brad)
+
+0.6.1 Release
+-------------
+
+* Add strict whitespace mode for well formed VCFs with spaces
+  in sample names (thanks Marco)
+* Ignore blank lines in files (thanks Martijn)
+* Tweaks for handling missing data (thanks Sean)
+* bcftools tests (thanks Martijn)
+* record.FILTER is always a list
+
+0.6.0 Release
+-------------
+
+* Backwards incompatible change: _Call.data is now a
+  namedtuple (previously it was a dict)
+* Optional cython version, much improved performance.
+* Improvements to writer (thanks @cmclean)
+* Improvements to inheritance of classes (thanks @lennax)
+
+
+0.5.0 Release
+-------------
+
+* VCF 4.1 support:
+  - support missing genotype #28 (thanks @martijnvermaat)
+  - parseALT for svs #42, #48 (thanks @dzerbino)
+* `trim_common_suffix` method #22 (thanks @martijnvermaat)
+* Multiple metadata with the same key is stored (#52)
+* Writer improvements:
+  - A/G in Number INFO fields #53 (thanks @lennax)
+  - Better output #55 (thanks @cmclean)
+* Allow malformed INFO fields #49 (thanks @ilyaminkin)
+* Added bayes factor error bias VCF filter
+* Added docs on vcf_melt
+* filters from @libor-m (SNP only, depth per sample, avg depth per sample)
+* change to the filter API, use docstring for filter description
+
+0.4.6 Release
+-------------
+
+* Performance improvements (#47)
+* Preserve order of INFO column (#46)
+
+0.4.5 Release
+-------------
+
+* Support exponent syntax qual values (#43, #44) (thanks @martijnvermaat)
+* Preserve order of header lines (#45)
+
+0.4.4 Release
+-------------
+
+* Support whitespace in sample names
+* SV work (thanks @arq5x)
+* Python 3 support via 2to3 (thanks @marcelm)
+* Improved filtering script, capable of importing local files
+
+0.4.3 Release
+-------------
+
+* Single floats in Reader._sample_parser not being converted to float #35
+* Handle String INFO values when Number=1 in header #34
+
+0.4.2 Release
+-------------
+
+* Installation problems
+
+0.4.1 Release
+-------------
+
+* Installation problems
+
+0.4.0 Release
+-------------
+
+* Package structure
+* add ``vcf.utils`` module with ``walk_together`` method
+* samtools tests
+* support Freebayes' non standard '.' for no call
+* fix vcf_melt
+* support monomorphic sites, add ``is_monomorphic`` method, handle null QUALs
+* filter support for files with monomorphic calls
+* Values declared as single are no longer returned in lists
+* several performance improvements
+
+
+0.3.0 Release
+-------------
+
+* Fix setup.py for python < 2.7
+* Add ``__eq__`` to ``_Record`` and ``_Call``
+* Add ``is_het`` and ``is_variant`` to ``_Call``
+* Drop aggressive parse mode: we're always aggressive.
+* Add tabix fetch for single calls, fix one->zero based indexing
+* add prepend_chr mode for ``Reader`` to add `chr` to CHROM attributes
+
+0.2.2 Release
+-------------
+
+Documentation release
+
+0.2.1 Release
+-------------
+
+* Add shebang to vcf_filter.py
+
+0.2 Release
+-----------
+
+* Replace genotype dictionary with a ``Call`` object
+* Methods on ``Record`` and ``Call`` (thanks @arq5x)
+* Shortcut parse_sample when genotype is None
+
+0.1 Release
+-----------
+
+* Added test code
+* Added Writer class
+* Allow negative number in ``INFO`` and ``FORMAT`` fields (thanks @martijnvermaat)
+* Prefer ``vcf.Reader`` to ``vcf.VCFReader``
+* Support compressed files with guessing where filename is available on fsock
+* Allow opening by filename as well as filesocket
+* Support fetching rows for tabix-indexed files
+* Performance improvements (see ``test/prof.py``)
+* Added extensible filter script (see FILTERS.md), vcf_filter.py
+
+Contributions
+=============
+
+Project started by @jdoughertyii and taken over by @jamescasbon on 12th January 2011.
+Contributions from @arq5x, @brentp, @martijnvermaat, @ian1roberts, @marcelm.
+
+This project was supported by `Population Genetics <http://www.populationgenetics.com/>`_.
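The 0.6.0 entry in the HISTORY.rst diff above notes a backwards-incompatible change: `_Call.data` became a namedtuple where it had been a dict. A minimal sketch of what that means for calling code follows. The `CallData` type and its field names are illustrative assumptions for this example, not PyVCF's actual definitions.

```python
from collections import namedtuple

# Hypothetical field names chosen for illustration only.
CallData = namedtuple("CallData", ["GT", "DP", "GQ"])

old_style = {"GT": "0/1", "DP": 250, "GQ": 99.0}   # dict, as before 0.6.0
new_style = CallData(GT="0/1", DP=250, GQ=99.0)    # namedtuple, as from 0.6.0

depth_old = old_style["DP"]   # key lookup on the dict
depth_new = new_style.DP      # attribute access on the namedtuple

# Code written against the dict shape can convert explicitly.
as_dict = new_style._asdict()
```

Subscripting a namedtuple with a string key raises `TypeError`, so code written as `call.data["DP"]` against the older releases would need to switch to attribute access or an explicit conversion like the one above.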


=====================================
docs/INTRO.rst
=====================================
--- /dev/null
+++ b/docs/INTRO.rst
@@ -0,0 +1,4 @@
+Introduction
+============
+
+.. include:: ../README.rst


=====================================
docs/Makefile
=====================================
--- /dev/null
+++ b/docs/Makefile
@@ -0,0 +1,130 @@
+# Makefile for Sphinx documentation
+#
+
+# You can set these variables from the command line.
+SPHINXOPTS    =
+SPHINXBUILD   = sphinx-build
+PAPER         =
+BUILDDIR      = _build
+
+# Internal variables.
+PAPEROPT_a4     = -D latex_paper_size=a4
+PAPEROPT_letter = -D latex_paper_size=letter
+ALLSPHINXOPTS   = -d $(BUILDDIR)/doctrees $(PAPEROPT_$(PAPER)) $(SPHINXOPTS) .
+
+.PHONY: help clean html dirhtml singlehtml pickle json htmlhelp qthelp devhelp epub latex latexpdf text man changes linkcheck doctest
+
+help:
+	@echo "Please use \`make <target>' where <target> is one of"
+	@echo "  html       to make standalone HTML files"
+	@echo "  dirhtml    to make HTML files named index.html in directories"
+	@echo "  singlehtml to make a single large HTML file"
+	@echo "  pickle     to make pickle files"
+	@echo "  json       to make JSON files"
+	@echo "  htmlhelp   to make HTML files and a HTML help project"
+	@echo "  qthelp     to make HTML files and a qthelp project"
+	@echo "  devhelp    to make HTML files and a Devhelp project"
+	@echo "  epub       to make an epub"
+	@echo "  latex      to make LaTeX files, you can set PAPER=a4 or PAPER=letter"
+	@echo "  latexpdf   to make LaTeX files and run them through pdflatex"
+	@echo "  text       to make text files"
+	@echo "  man        to make manual pages"
+	@echo "  changes    to make an overview of all changed/added/deprecated items"
+	@echo "  linkcheck  to check all external links for integrity"
+	@echo "  doctest    to run all doctests embedded in the documentation (if enabled)"
+
+clean:
+	-rm -rf $(BUILDDIR)/*
+
+html:
+	$(SPHINXBUILD) -b html $(ALLSPHINXOPTS) $(BUILDDIR)/html
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/html."
+
+dirhtml:
+	$(SPHINXBUILD) -b dirhtml $(ALLSPHINXOPTS) $(BUILDDIR)/dirhtml
+	@echo
+	@echo "Build finished. The HTML pages are in $(BUILDDIR)/dirhtml."
+
+singlehtml:
+	$(SPHINXBUILD) -b singlehtml $(ALLSPHINXOPTS) $(BUILDDIR)/singlehtml
+	@echo
+	@echo "Build finished. The HTML page is in $(BUILDDIR)/singlehtml."
+
+pickle:
+	$(SPHINXBUILD) -b pickle $(ALLSPHINXOPTS) $(BUILDDIR)/pickle
+	@echo
+	@echo "Build finished; now you can process the pickle files."
+
+json:
+	$(SPHINXBUILD) -b json $(ALLSPHINXOPTS) $(BUILDDIR)/json
+	@echo
+	@echo "Build finished; now you can process the JSON files."
+
+htmlhelp:
+	$(SPHINXBUILD) -b htmlhelp $(ALLSPHINXOPTS) $(BUILDDIR)/htmlhelp
+	@echo
+	@echo "Build finished; now you can run HTML Help Workshop with the" \
+	      ".hhp project file in $(BUILDDIR)/htmlhelp."
+
+qthelp:
+	$(SPHINXBUILD) -b qthelp $(ALLSPHINXOPTS) $(BUILDDIR)/qthelp
+	@echo
+	@echo "Build finished; now you can run "qcollectiongenerator" with the" \
+	      ".qhcp project file in $(BUILDDIR)/qthelp, like this:"
+	@echo "# qcollectiongenerator $(BUILDDIR)/qthelp/PyVCF.qhcp"
+	@echo "To view the help file:"
+	@echo "# assistant -collectionFile $(BUILDDIR)/qthelp/PyVCF.qhc"
+
+devhelp:
+	$(SPHINXBUILD) -b devhelp $(ALLSPHINXOPTS) $(BUILDDIR)/devhelp
+	@echo
+	@echo "Build finished."
+	@echo "To view the help file:"
+	@echo "# mkdir -p $$HOME/.local/share/devhelp/PyVCF"
+	@echo "# ln -s $(BUILDDIR)/devhelp $$HOME/.local/share/devhelp/PyVCF"
+	@echo "# devhelp"
+
+epub:
+	$(SPHINXBUILD) -b epub $(ALLSPHINXOPTS) $(BUILDDIR)/epub
+	@echo
+	@echo "Build finished. The epub file is in $(BUILDDIR)/epub."
+
+latex:
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo
+	@echo "Build finished; the LaTeX files are in $(BUILDDIR)/latex."
+	@echo "Run \`make' in that directory to run these through (pdf)latex" \
+	      "(use \`make latexpdf' here to do that automatically)."
+
+latexpdf:
+	$(SPHINXBUILD) -b latex $(ALLSPHINXOPTS) $(BUILDDIR)/latex
+	@echo "Running LaTeX files through pdflatex..."
+	make -C $(BUILDDIR)/latex all-pdf
+	@echo "pdflatex finished; the PDF files are in $(BUILDDIR)/latex."
+
+text:
+	$(SPHINXBUILD) -b text $(ALLSPHINXOPTS) $(BUILDDIR)/text
+	@echo
+	@echo "Build finished. The text files are in $(BUILDDIR)/text."
+
+man:
+	$(SPHINXBUILD) -b man $(ALLSPHINXOPTS) $(BUILDDIR)/man
+	@echo
+	@echo "Build finished. The manual pages are in $(BUILDDIR)/man."
+
+changes:
+	$(SPHINXBUILD) -b changes $(ALLSPHINXOPTS) $(BUILDDIR)/changes
+	@echo
+	@echo "The overview file is in $(BUILDDIR)/changes."
+
+linkcheck:
+	$(SPHINXBUILD) -b linkcheck $(ALLSPHINXOPTS) $(BUILDDIR)/linkcheck
+	@echo
+	@echo "Link check complete; look for any errors in the above output " \
+	      "or in $(BUILDDIR)/linkcheck/output.txt."
+
+doctest:
+	$(SPHINXBUILD) -b doctest $(ALLSPHINXOPTS) $(BUILDDIR)/doctest
+	@echo "Testing of doctests in the sources finished, look at the " \
+	      "results in $(BUILDDIR)/doctest/output.txt."


=====================================
docs/conf.py
=====================================
--- /dev/null
+++ b/docs/conf.py
@@ -0,0 +1,217 @@
+# -*- coding: utf-8 -*-
+#
+# PyVCF documentation build configuration file, created by
+# sphinx-quickstart on Wed Jan 25 12:29:23 2012.
+#
+# This file is execfile()d with the current directory set to its containing dir.
+#
+# Note that not all possible configuration values are present in this
+# autogenerated file.
+#
+# All configuration values have a default; values that are commented out
+# serve to show the default.
+
+import sys, os
+
+# If extensions (or modules to document with autodoc) are in another directory,
+# add these directories to sys.path here. If the directory is relative to the
+# documentation root, use os.path.abspath to make it absolute, like shown here.
+sys.path.insert(0, os.path.abspath('..'))
+
+# -- General configuration -----------------------------------------------------
+
+# If your documentation needs a minimal Sphinx version, state it here.
+#needs_sphinx = '1.0'
+
+# Add any Sphinx extension module names here, as strings. They can be extensions
+# coming with Sphinx (named 'sphinx.ext.*') or your custom ones.
+extensions = ['sphinx.ext.autodoc', 'sphinx.ext.doctest', 'sphinx.ext.viewcode']
+
+# Add any paths that contain templates here, relative to this directory.
+templates_path = ['.templates']
+
+# The suffix of source filenames.
+source_suffix = '.rst'
+
+# The encoding of source files.
+#source_encoding = 'utf-8-sig'
+
+# The master toctree document.
+master_doc = 'index'
+
+# General information about the project.
+project = u'PyVCF'
+copyright = u'2012, James Casbon, @jdoughertyii'
+
+# The version info for the project you're documenting, acts as replacement for
+# |version| and |release|, also used in various other places throughout the
+# built documents.
+#
+# The short X.Y version.
+import vcf
+version = vcf.VERSION
+# The full version, including alpha/beta/rc tags.
+release = vcf.VERSION
+
+# The language for content autogenerated by Sphinx. Refer to documentation
+# for a list of supported languages.
+#language = None
+
+# There are two options for replacing |today|: either, you set today to some
+# non-false value, then it is used:
+#today = ''
+# Else, today_fmt is used as the format for a strftime call.
+#today_fmt = '%B %d, %Y'
+
+# List of patterns, relative to source directory, that match files and
+# directories to ignore when looking for source files.
+exclude_patterns = ['.build']
+
+# The reST default role (used for this markup: `text`) to use for all documents.
+#default_role = None
+
+# If true, '()' will be appended to :func: etc. cross-reference text.
+#add_function_parentheses = True
+
+# If true, the current module name will be prepended to all description
+# unit titles (such as .. function::).
+#add_module_names = True
+
+# If true, sectionauthor and moduleauthor directives will be shown in the
+# output. They are ignored by default.
+#show_authors = False
+
+# The name of the Pygments (syntax highlighting) style to use.
+pygments_style = 'sphinx'
+
+# A list of ignored prefixes for module index sorting.
+#modindex_common_prefix = []
+
+
+# -- Options for HTML output ---------------------------------------------------
+
+# The theme to use for HTML and HTML Help pages.  See the documentation for
+# a list of builtin themes.
+html_theme = 'default'
+
+# Theme options are theme-specific and customize the look and feel of a theme
+# further.  For a list of options available for each theme, see the
+# documentation.
+#html_theme_options = {}
+
+# Add any paths that contain custom themes here, relative to this directory.
+#html_theme_path = []
+
+# The name for this set of Sphinx documents.  If None, it defaults to
+# "<project> v<release> documentation".
+#html_title = None
+
+# A shorter title for the navigation bar.  Default is the same as html_title.
+#html_short_title = None
+
+# The name of an image file (relative to this directory) to place at the top
+# of the sidebar.
+#html_logo = None
+
+# The name of an image file (within the static path) to use as favicon of the
+# docs.  This file should be a Windows icon file (.ico) being 16x16 or 32x32
+# pixels large.
+#html_favicon = None
+
+# Add any paths that contain custom static files (such as style sheets) here,
+# relative to this directory. They are copied after the builtin static files,
+# so a file named "default.css" will overwrite the builtin "default.css".
+html_static_path = ['.static']
+
+# If not '', a 'Last updated on:' timestamp is inserted at every page bottom,
+# using the given strftime format.
+#html_last_updated_fmt = '%b %d, %Y'
+
+# If true, SmartyPants will be used to convert quotes and dashes to
+# typographically correct entities.
+#html_use_smartypants = True
+
+# Custom sidebar templates, maps document names to template names.
+#html_sidebars = {}
+
+# Additional templates that should be rendered to pages, maps page names to
+# template names.
+#html_additional_pages = {}
+
+# If false, no module index is generated.
+#html_domain_indices = True
+
+# If false, no index is generated.
+#html_use_index = True
+
+# If true, the index is split into individual pages for each letter.
+#html_split_index = False
+
+# If true, links to the reST sources are added to the pages.
+#html_show_sourcelink = True
+
+# If true, "Created using Sphinx" is shown in the HTML footer. Default is True.
+#html_show_sphinx = True
+
+# If true, "(C) Copyright ..." is shown in the HTML footer. Default is True.
+#html_show_copyright = True
+
+# If true, an OpenSearch description file will be output, and all pages will
+# contain a <link> tag referring to it.  The value of this option must be the
+# base URL from which the finished HTML is served.
+#html_use_opensearch = ''
+
+# This is the file name suffix for HTML files (e.g. ".xhtml").
+#html_file_suffix = None
+
+# Output file base name for HTML help builder.
+htmlhelp_basename = 'PyVCFdoc'
+
+
+# -- Options for LaTeX output --------------------------------------------------
+
+# The paper size ('letter' or 'a4').
+#latex_paper_size = 'letter'
+
+# The font size ('10pt', '11pt' or '12pt').
+#latex_font_size = '10pt'
+
+# Grouping the document tree into LaTeX files. List of tuples
+# (source start file, target name, title, author, documentclass [howto/manual]).
+latex_documents = [
+  ('index', 'PyVCF.tex', u'PyVCF Documentation',
+   u'James Casbon, @jdoughertyii', 'manual'),
+]
+
+# The name of an image file (relative to this directory) to place at the top of
+# the title page.
+#latex_logo = None
+
+# For "manual" documents, if this is true, then toplevel headings are parts,
+# not chapters.
+#latex_use_parts = False
+
+# If true, show page references after internal links.
+#latex_show_pagerefs = False
+
+# If true, show URL addresses after external links.
+#latex_show_urls = False
+
+# Additional stuff for the LaTeX preamble.
+#latex_preamble = ''
+
+# Documents to append as an appendix to all manuals.
+#latex_appendices = []
+
+# If false, no module index is generated.
+#latex_domain_indices = True
+
+
+# -- Options for manual page output --------------------------------------------
+
+# One entry per manual page. List of tuples
+# (source start file, name, description, authors, manual section).
+man_pages = [
+    ('index', 'pyvcf', u'PyVCF Documentation',
+     [u'James Casbon, @jdoughertyii'], 1)
+]


=====================================
docs/index.rst
=====================================
--- /dev/null
+++ b/docs/index.rst
@@ -0,0 +1,22 @@
+
+PyVCF - A Variant Call Format Parser for Python 
+===============================================
+
+Contents:
+
+.. toctree::
+   :maxdepth: 2
+
+   INTRO
+   API
+   FILTERS
+   HISTORY
+   
+
+Indices and tables
+==================
+
+* :ref:`genindex`
+* :ref:`modindex`
+* :ref:`search`
+


=====================================
requirements/common-requirements.txt
=====================================
--- /dev/null
+++ b/requirements/common-requirements.txt
@@ -0,0 +1,3 @@
+cython
+pysam!=0.8.0
+setuptools


=====================================
requirements/pypy-requirements.txt
=====================================
--- /dev/null
+++ b/requirements/pypy-requirements.txt
@@ -0,0 +1 @@
+setuptools


=====================================
setup.cfg deleted
=====================================
--- a/setup.cfg
+++ /dev/null
@@ -1,5 +0,0 @@
-[egg_info]
-tag_date = 0
-tag_build = 
-tag_svn_revision = 0
-


=====================================
setup.py
=====================================
--- a/setup.py
+++ b/setup.py
@@ -8,15 +8,8 @@ try:
 except:
     CYTHON = False
 
-IS_PYTHON26 = sys.version_info[:2] == (2, 6)
-
 DEPENDENCIES = ['setuptools']
 
-if IS_PYTHON26:
-    DEPENDENCIES.extend(['argparse', 'counter', 'ordereddict',
-                         'unittest2'])
-
-
 # get the version without an import
 VERSION = "Undefined"
 DOC = ""
@@ -68,12 +61,13 @@ setup(
         'Programming Language :: Cython',
         'Programming Language :: Python',
         'Programming Language :: Python :: 2',
-        'Programming Language :: Python :: 2.6',
         'Programming Language :: Python :: 2.7',
         'Programming Language :: Python :: 3',
-        'Programming Language :: Python :: 3.2',
-        'Programming Language :: Python :: 3.3',
         'Programming Language :: Python :: 3.4',
+        'Programming Language :: Python :: 3.5',
+        'Programming Language :: Python :: 3.6',
+        'Programming Language :: Python :: Implementation :: CPython',
+        'Programming Language :: Python :: Implementation :: PyPy',
         'Topic :: Scientific/Engineering :: Bio-Informatics',
       ],
     keywords='bioinformatics',


=====================================
tox.ini
=====================================
--- /dev/null
+++ b/tox.ini
@@ -0,0 +1,21 @@
+# Tox (http://tox.testrun.org/) is a tool for running tests
+# in multiple virtualenvs. This configuration file will run the
+# test suite on all supported python versions. To use it, "pip install tox"
+# and then run "tox" from this directory.
+
+[tox]
+envlist = py27, py34, py35, py36, pypy, pypy3
+
+[testenv]
+deps =
+    -rrequirements/common-requirements.txt
+commands =
+    python setup.py clean --all test
+
+[testenv:pypy]
+deps =
+    -rrequirements/pypy-requirements.txt
+
+[testenv:pypy3]
+deps =
+    -rrequirements/pypy-requirements.txt


=====================================
vcf/cparse.pyx
=====================================
--- a/vcf/cparse.pyx
+++ b/vcf/cparse.pyx
@@ -1,14 +1,27 @@
 from model import _Call
 
-cdef _map(func, iterable, bad='.'):
+cdef _map(func, iterable, bad=['.', '']):
     '''``map``, but make bad values None.'''
-    return [func(x) if x != bad else None
+    return [func(x) if x not in bad else None
             for x in iterable]
 
 INTEGER = 'Integer'
 FLOAT = 'Float'
 NUMERIC = 'Numeric'
 
+def _parse_filter(filt_str):
+    '''Parse the FILTER field of a VCF entry into a Python list
+
+    NOTE: this method has a python equivalent and care must be taken
+    to keep the two methods equivalent
+    '''
+    if filt_str == '.':
+        return None
+    elif filt_str == 'PASS':
+        return []
+    else:
+        return filt_str.split(';')
+
 def parse_samples(
         list names, list samples, samp_fmt,
         list samp_fmt_types, list samp_fmt_nums, site):
@@ -39,6 +52,10 @@ def parse_samples(
             if samp_fmt._fields[j] == 'GT':
                 sampdat[j] = vals
                 continue
+            # genotype filters are a special case
+            elif samp_fmt._fields[j] == 'FT':
+                sampdat[j] = _parse_filter(vals)
+                continue
             elif not vals or vals == '.':
                 sampdat[j] = None
                 continue
@@ -48,8 +65,7 @@ def parse_samples(
             entry_num = samp_fmt_nums[j]
 
             # we don't need to split single entries
-            if entry_num == 1 or ',' not in vals:
-
+            if entry_num == 1:
                 if entry_type == INTEGER:
                     try:
                         sampdat[j] = int(vals)
@@ -59,14 +75,9 @@ def parse_samples(
                     sampdat[j] = float(vals)
                 else:
                     sampdat[j] = vals
-
-                if entry_num != 1:
-                    sampdat[j] = (sampdat[j])
-
                 continue
 
             vals = vals.split(',')
-
             if entry_type == INTEGER:
                 try:
                     sampdat[j] = _map(int, vals)
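
Note on the `_map` change above: both '.' and the empty string are now treated as missing values. The following is a minimal plain-Python sketch of that behaviour, not the Cython code itself; the name `map_missing` is hypothetical and chosen only for illustration.

```python
def map_missing(func, iterable, bad=('.', '')):
    """Apply func to each item, turning VCF missing markers into None.

    Hypothetical stand-in for the patched ``_map`` helper: any item
    found in ``bad`` becomes None instead of being passed to ``func``.
    """
    return [func(x) if x not in bad else None for x in iterable]


# A '.' entry is mapped to None, as before the patch.
print(map_missing(int, ['1', '.', '3']))

# An empty value such as "EMPTY_3=" splits into [''], which the old
# comparison against '.' alone would have passed on to float().
print(map_missing(float, ''.split(',')))
```

With the old `x != bad` comparison the second call would raise ValueError, because `float('')` is not valid.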


=====================================
vcf/model.py
=====================================
--- a/vcf/model.py
+++ b/vcf/model.py
@@ -23,10 +23,10 @@ class _Call(object):
         #: Namedtuple of data from the VCF file
         self.data = data
 
-        if hasattr(self.data, 'GT'):
+        if getattr(self.data, 'GT', None) is not None:
             self.gt_alleles = [(al if al != '.' else None) for al in allele_delimiter.split(self.data.GT)]
             self.ploidity = len(self.gt_alleles)
-            self.called = all([al != None for al in self.gt_alleles])
+            self.called = any(al is not None for al in self.gt_alleles)
             self.gt_nums = self.data.GT if self.called else None
         else:
             #62 a call without a genotype is not defined as called or not
@@ -65,7 +65,7 @@ class _Call(object):
         if self.called:
             # lookup and return the actual DNA alleles
             try:
-                return self.gt_phase_char().join(str(self.site.alleles[int(X)]) for X in self.gt_alleles)
+                return self.gt_phase_char().join(str(self.site.alleles[int(X)] if X is not None else '.') for X in self.gt_alleles)
             except:
                 sys.stderr.write("Allele number not found in list of alleles\n")
         else:
@@ -117,6 +117,18 @@ class _Call(object):
             return None
         return self.gt_type == 1
 
+    @property
+    def is_filtered(self):
+        """ Return True for filtered calls """
+        try: # no FT annotation present for this variant
+            filt = self.data.FT
+        except AttributeError:
+            return False
+        if filt is None or len(filt) == 0: # FT is not set or set to PASS
+            return False
+        else:
+            return True
+
 
 class _Record(object):
     """ A set of calls at a site.  Equivalent to a row in a VCF file.
@@ -279,7 +291,7 @@ class _Record(object):
     @property
     def num_called(self):
         """ The number of called samples"""
-        return sum(s.called for s in self.samples)
+        return sum(1 for s in self.samples if s.called)
 
     @property
     def call_rate(self):
@@ -389,7 +401,7 @@ class _Record(object):
             return True
         for alt in self.ALT:
             if alt is None:
-                return True
+                return False
             if alt.type != "SNV" and alt.type != "MNV":
                 return False
             elif len(alt) != len(self.REF):
@@ -440,7 +452,7 @@ class _Record(object):
             # just one alt allele
             alt_allele = self.ALT[0]
             if alt_allele is None:
-                return True
+                return False
             if len(self.REF) > len(alt_allele):
                 return True
             else:
@@ -536,6 +548,15 @@ class _Record(object):
         """ Return True for reference calls """
         return len(self.ALT) == 1 and self.ALT[0] is None
 
+    @property
+    def is_filtered(self):
+        """ Return True if a variant has been filtered """
+        filt = self.FILTER
+        if filt is None or len(filt) == 0: # FILTER is not set or set to PASS
+            return False
+        else:
+            return True
+
 
 class _AltRecord(object):
     '''An alternative allele record: either replacement string, SV placeholder, or breakend'''
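
Note on the `called` change in `_Call` above: switching from `all(...)` to `any(...)` means a partially missing genotype such as `./1` now counts as called. The sketch below compares the two rules on standalone strings. It assumes alleles are separated by '/' or '|'; the helper names are hypothetical and not part of PyVCF.

```python
import re

# Assumption: GT alleles are delimited by '/' (unphased) or '|' (phased).
ALLELE_DELIMITER = re.compile(r'[|/]')


def gt_alleles(gt):
    """Split a GT string into allele indexes, mapping '.' to None."""
    return [al if al != '.' else None for al in ALLELE_DELIMITER.split(gt)]


def called_before(gt):
    """Old rule: every allele must be present."""
    return all(al is not None for al in gt_alleles(gt))


def called_after(gt):
    """New rule: at least one allele must be present."""
    return any(al is not None for al in gt_alleles(gt))


for gt in ('0|1', './1', './.'):
    print(gt, called_before(gt), called_after(gt))
```

Only the half-called genotype `./1` changes status between the two rules; a fully missing `./.` stays uncalled under both.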


=====================================
vcf/parser.py
=====================================
--- a/vcf/parser.py
+++ b/vcf/parser.py
@@ -78,12 +78,12 @@ _Contig = collections.namedtuple('Contig', ['id', 'length'])
 
 
 class _vcf_metadata_parser(object):
-    '''Parse the metadat in the header of a VCF file.'''
+    '''Parse the metadata in the header of a VCF file.'''
     def __init__(self):
         super(_vcf_metadata_parser, self).__init__()
         self.info_pattern = re.compile(r'''\#\#INFO=<
             ID=(?P<id>[^,]+),\s*
-            Number=(?P<number>-?\d+|\.|[AGR]),\s*
+            Number=(?P<number>-?\d+|\.|[AGR])?,\s*
             Type=(?P<type>Integer|Float|Flag|Character|String),\s*
             Description="(?P<desc>[^"]*)"
             (?:,\s*Source="(?P<source>[^"]*)")?
@@ -151,7 +151,7 @@ class _vcf_metadata_parser(object):
         match = self.alt_pattern.match(alt_string)
         if not match:
             raise SyntaxError(
-                "One of the FILTER lines is malformed: %s" % alt_string)
+                "One of the ALT lines is malformed: %s" % alt_string)
 
         alt = _Alt(match.group('id'), match.group('desc'))
 
@@ -354,11 +354,24 @@ class Reader(object):
         self.samples = fields[9:]
         self._sample_indexes = dict([(x,i) for (i,x) in enumerate(self.samples)])
 
-    def _map(self, func, iterable, bad='.'):
+    def _map(self, func, iterable, bad=['.', '']):
         '''``map``, but make bad values None.'''
-        return [func(x) if x != bad else None
+        return [func(x) if x not in bad else None
                 for x in iterable]
 
+    def _parse_filter(self, filt_str):
+        '''Parse the FILTER field of a VCF entry into a Python list
+
+        NOTE: this method has a cython equivalent and care must be taken
+        to keep the two methods equivalent
+        '''
+        if filt_str == '.':
+            return None
+        elif filt_str == 'PASS':
+            return []
+        else:
+            return filt_str.split(';')
+
     def _parse_info(self, info_str):
         '''Parse the INFO field of a VCF entry into a dictionary of Python
         types.
@@ -466,6 +479,10 @@ class Reader(object):
                 if samp_fmt._fields[i] == 'GT':
                     sampdat[i] = vals
                     continue
+                # genotype filters are a special case
+                elif samp_fmt._fields[i] == 'FT':
+                    sampdat[i] = self._parse_filter(vals)
+                    continue
                 elif not vals or vals == ".":
                     sampdat[i] = None
                     continue
@@ -474,25 +491,19 @@ class Reader(object):
                 entry_type = samp_fmt._types[i]
 
                 # we don't need to split single entries
-                if entry_num == 1 or ',' not in vals:
-
+                if entry_num == 1:
                     if entry_type == 'Integer':
                         try:
                             sampdat[i] = int(vals)
                         except ValueError:
                             sampdat[i] = float(vals)
-                    elif entry_type == 'Float':
+                    elif entry_type == 'Float' or entry_type == 'Numeric':
                         sampdat[i] = float(vals)
                     else:
                         sampdat[i] = vals
-
-                    if entry_num != 1:
-                        sampdat[i] = (sampdat[i])
-
                     continue
 
                 vals = vals.split(',')
-
                 if entry_type == 'Integer':
                     try:
                         sampdat[i] = _map(int, vals)
@@ -562,13 +573,7 @@ class Reader(object):
             except ValueError:
                 qual = None
 
-        filt = row[6]
-        if filt == '.':
-            filt = None
-        elif filt == 'PASS':
-            filt = []
-        else:
-            filt = filt.split(';')
+        filt = self._parse_filter(row[6])
         info = self._parse_info(row[7])
 
         try:
@@ -741,11 +746,17 @@ class Writer(object):
         else:
             gt = './.' if 'GT' in fmt else ''
 
-        if not gt:
-            return ':'.join([self._stringify(x) for x in sample.data])
+        result = [gt] if gt else []
         # Following the VCF spec, GT is always the first item whenever it is present.
-        else:
-            return ':'.join([gt] + [self._stringify(x) for x in sample.data[1:]])
+        for field in sample.data._fields:
+            value = getattr(sample.data,field)
+            if field == 'GT':
+                continue
+            if field == 'FT':
+                result.append(self._format_filter(value))
+            else:
+                result.append(self._stringify(value))
+        return ':'.join(result)
 
     def _stringify(self, x, none='.', delim=','):
         if type(x) == type([]):
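
Note on `_parse_filter` and `is_filtered` above: the three possible FILTER/FT states map to `None`, an empty list, or a list of filter names, and only the last counts as filtered. The sketch below restates that logic as free functions so it can be run without PyVCF; the function names are illustrative, not the library's API.

```python
def parse_filter(filt_str):
    """Parse a FILTER or FT string the way the patched reader does.

    '.'    -> None (no filtering information)
    'PASS' -> []   (filters were applied and all passed)
    other  -> list of filter names split on ';'
    """
    if filt_str == '.':
        return None
    if filt_str == 'PASS':
        return []
    return filt_str.split(';')


def is_filtered(filt):
    """True only when at least one filter name is recorded."""
    return filt is not None and len(filt) > 0


for raw in ('.', 'PASS', 'q10', 'DP125;DP130'):
    parsed = parse_filter(raw)
    print(repr(raw), parsed, is_filtered(parsed))
```

Keeping `None` and `[]` distinct preserves the difference between "not evaluated" and "passed", even though `is_filtered` reports False for both.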


=====================================
vcf/test/bad-info-character.vcf
=====================================
--- /dev/null
+++ b/vcf/test/bad-info-character.vcf
@@ -0,0 +1,14 @@
+##fileformat=VCFv4.1
+##INFO=<ID=EMPTY_1,Number=1,Type=Float,Description="A floating point value">
+##INFO=<ID=EMPTY_3,Number=3,Type=Float,Description="Floating point values">
+##INFO=<ID=EMPTY_N,Number=.,Type=Float,Description="Floating point values">
+##INFO=<ID=DOT_1,Number=1,Type=Character,Description="A character value">
+##INFO=<ID=DOT_3,Number=3,Type=Character,Description="Character values">
+##INFO=<ID=DOT_N,Number=.,Type=Character,Description="Character values">
+##INFO=<ID=NOTEMPTY_1,Number=1,Type=Float,Description="A floating point value">
+##INFO=<ID=NOTEMPTY_3,Number=3,Type=Float,Description="Floating point values">
+##INFO=<ID=NOTEMPTY_N,Number=.,Type=Float,Description="Floating point values">
+##INFO=<ID=FLAG,Number=0,Type=Flag,Description="HapMap2 membership">
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	Sample
+chr1	100	id1	G	A	.	.	FLAG;EMPTY_1=;EMPTY_3=;EMPTY_N=;DOT_1=.;DOT_3=.,.,.;DOT_N=.;NOTEMPTY_1=1;NOTEMPTY_3=1,2,3;NOTEMPTY_N=1	GT	0/1


=====================================
vcf/test/example-4.0.vcf
=====================================
--- a/vcf/test/example-4.0.vcf
+++ b/vcf/test/example-4.0.vcf
@@ -20,4 +20,5 @@
 20	17330	.	T	A	3.0	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3
 20	1110696	rs6040355	A	G,T	1e+03	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4
 20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:7:56,60	0|0:48:4:51,51	0/0:61:2
+20	1231234	.	AT	A	46	PASS	NS=3;DP=15;AA=A	GT:GQ:DP:HQ	1|1:23:7:26,30	0|0:27:9:56,60	0|0:31:10:65,71
 20	1234567	microsat1	GTCT	G,GTACT	.	PASS	NS=3;DP=9;AA=G	GT:GQ:DP	./.:35:4	0/2:17:2	1/1:40:3


=====================================
vcf/test/issue-254.vcf
=====================================
--- /dev/null
+++ b/vcf/test/issue-254.vcf
@@ -0,0 +1,9 @@
+##fileformat=VCFv4.1
+##fileDate=20090805
+##source=myImputationProgramV3.1
+##reference=1000GenomesPilot-NCBI36
+##phasing=partial
+##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
+##FORMAT=<ID=AO,Number=A,Type=Integer,Description="Alternate allele observation count">
+#CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	NA00001	NA00002	NA00003
+21	4242421	.	T	A	30	.	.	GT:AO	0|0:0.1	0|1:0.2	0/0:0.3
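
Note on issue-254.vcf above: the parser hunks drop the `',' not in vals` shortcut, so a FORMAT field whose declared Number is not 1 always yields a list, even when only one value is present. A rough standalone sketch of that conversion follows. The function name and the use of -1 for `Number=A` are assumptions made for illustration; only the branch structure is taken from the diff.

```python
def convert_format_value(vals, entry_num, entry_type):
    """Hypothetical standalone version of the per-field conversion.

    entry_num is the declared Number (1 means scalar); any other
    value is treated as "list-valued" in this sketch.
    """
    def to_number(x, cast):
        # Missing markers become None, mirroring the patched _map.
        return cast(x) if x not in ('.', '') else None

    if entry_num == 1:
        if entry_type == 'Integer':
            try:
                return int(vals)
            except ValueError:
                return float(vals)
        if entry_type in ('Float', 'Numeric'):
            return float(vals)
        return vals

    parts = vals.split(',')
    if entry_type == 'Integer':
        try:
            return [to_number(x, int) for x in parts]
        except ValueError:
            return [to_number(x, float) for x in parts]
    if entry_type in ('Float', 'Numeric'):
        return [to_number(x, float) for x in parts]
    return parts


# AO is declared Number=A, Type=Integer but holds "0.1" in the test file.
print(convert_format_value('0.1', -1, 'Integer'))
print(convert_format_value('58,50', 2, 'Integer'))
print(convert_format_value('7', 1, 'Integer'))
```

The first call returns a one-element list rather than a bare float, which is what `TestIssue254.test_remains_singleton_list` checks.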


=====================================
vcf/test/test_vcf.py
=====================================
--- a/vcf/test/test_vcf.py
+++ b/vcf/test/test_vcf.py
@@ -393,6 +393,24 @@ class TestInfoTypeCharacter(unittest.TestCase):
             self.assertEquals(l.INFO, r.INFO)
 
 
+class TestBadInfoFields(unittest.TestCase):
+    def test_parse(self):
+        reader = vcf.Reader(fh('bad-info-character.vcf'))
+        record = next(reader)
+        self.assertEquals(record.INFO['DOT_1'], None)
+        self.assertEquals(record.INFO['DOT_3'], [None, None, None])
+        self.assertEquals(record.INFO['DOT_N'], [None])
+        self.assertEquals(record.INFO['EMPTY_1'], None)
+        # Perhaps EMPTY_3 should yield [None, None, None] but this is really a
+        # cornercase of unspecified behaviour.
+        self.assertEquals(record.INFO['EMPTY_3'], [None])
+        self.assertEquals(record.INFO['EMPTY_N'], [None])
+        self.assertEquals(record.INFO['NOTEMPTY_1'], 1)
+        self.assertEquals(record.INFO['NOTEMPTY_3'], [1, 2, 3])
+        self.assertEquals(record.INFO['NOTEMPTY_N'], [1])
+        pass
+
+
 class TestParseMetaLine(unittest.TestCase):
     def test_parse(self):
         reader = vcf.Reader(fh('parse-meta-line.vcf'))
@@ -578,6 +596,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(3.0/3.0, call_rate)
             if var.POS == 1230237:
                 self.assertEqual(3.0/3.0, call_rate)
+            if var.POS == 1231234:
+                self.assertEqual(3.0/3.0, call_rate)
             elif var.POS == 1234567:
                 self.assertEqual(2.0/3.0, call_rate)
 
@@ -593,6 +613,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual([2.0/6.0, 4.0/6.0], aaf)
             if var.POS == 1230237:
                 self.assertEqual([0.0/6.0], aaf)
+            if var.POS == 1231234:
+                self.assertEqual([2.0/6.0], aaf)
             elif var.POS == 1234567:
                 self.assertEqual([2.0/4.0, 1.0/4.0], aaf)
         reader = vcf.Reader(fh('example-4.1-ploidy.vcf'))
@@ -615,6 +637,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(None, pi)
             if var.POS == 1230237:
                 self.assertEqual(0.0/6.0, pi)
+            if var.POS == 1231234:
+                self.assertEqual((6.0/(6.0-1))*(2.0*(1.0/3.0)*(2.0/3.0)) , pi)
             elif var.POS == 1234567:
                 self.assertEqual(None, pi)
 
@@ -630,6 +654,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(4.0/9.0, het)
             if var.POS == 1230237:
                 self.assertEqual(0.0, het)
+            if var.POS == 1231234:
+                self.assertEqual(4.0/9.0, het)
             elif var.POS == 1234567:
                 self.assertEqual(5.0/8.0, het)
 
@@ -650,6 +676,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(True, is_snp)
             if var.POS == 1230237:
                 self.assertEqual(False, is_snp)
+            if var.POS == 1231234:
+                self.assertEqual(False, is_snp)
             elif var.POS == 1234567:
                 self.assertEqual(False, is_snp)
 
@@ -682,6 +710,8 @@ class TestRecord(unittest.TestCase):
             if var.POS == 1110696:
                 self.assertEqual(False, is_indel)
             if var.POS == 1230237:
+                self.assertEqual(False, is_indel)
+            if var.POS == 1231234:
                 self.assertEqual(True, is_indel)
             elif var.POS == 1234567:
                 self.assertEqual(True, is_indel)
@@ -698,6 +728,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(False, is_trans)
             if var.POS == 1230237:
                 self.assertEqual(False, is_trans)
+            if var.POS == 1231234:
+                self.assertEqual(False, is_trans)
             elif var.POS == 1234567:
                 self.assertEqual(False, is_trans)
 
@@ -712,6 +744,8 @@ class TestRecord(unittest.TestCase):
             if var.POS == 1110696:
                 self.assertEqual(False, is_del)
             if var.POS == 1230237:
+                self.assertEqual(False, is_del)
+            if var.POS == 1231234:
                 self.assertEqual(True, is_del)
             elif var.POS == 1234567:
                 self.assertEqual(False, is_del)
@@ -727,6 +761,8 @@ class TestRecord(unittest.TestCase):
             if var.POS == 1110696:
                 self.assertEqual("snp", type)
             if var.POS == 1230237:
+                self.assertEqual("unknown", type)
+            if var.POS == 1231234:
                 self.assertEqual("indel", type)
             elif var.POS == 1234567:
                 self.assertEqual("indel", type)
@@ -759,6 +795,8 @@ class TestRecord(unittest.TestCase):
             if var.POS == 1110696:
                 self.assertEqual("unknown", subtype)
             if var.POS == 1230237:
+                self.assertEqual("unknown", subtype)
+            if var.POS == 1231234:
                 self.assertEqual("del", subtype)
             elif var.POS == 1234567:
                 self.assertEqual("unknown", subtype)
@@ -807,6 +845,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(False, is_sv)
             if var.POS == 1230237:
                 self.assertEqual(False, is_sv)
+            if var.POS == 1231234:
+                self.assertEqual(False, is_sv)
             elif var.POS == 1234567:
                 self.assertEqual(False, is_sv)
 
@@ -838,6 +878,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(False, is_precise)
             if var.POS == 1230237:
                 self.assertEqual(False, is_precise)
+            if var.POS == 1231234:
+                self.assertEqual(False, is_precise)
             elif var.POS == 1234567:
                 self.assertEqual(False, is_precise)
 
@@ -869,6 +911,8 @@ class TestRecord(unittest.TestCase):
                 self.assertEqual(None, sv_end)
             if var.POS == 1230237:
                 self.assertEqual(None, sv_end)
+            if var.POS == 1231234:
+                self.assertEqual(None, sv_end)
             elif var.POS == 1234567:
                 self.assertEqual(None, sv_end)
 
@@ -885,6 +929,8 @@ class TestRecord(unittest.TestCase):
                 expected = 1e+03
             if var.POS == 1230237:
                 expected = 47
+            if var.POS == 1231234:
+                expected = 46
             elif var.POS == 1234567:
                 expected = None
             self.assertEqual(expected, qual)
@@ -1166,6 +1212,8 @@ class TestCall(unittest.TestCase):
                 self.assertEqual([True, True, False], phases)
             if var.POS == 1230237:
                 self.assertEqual([True, True, False], phases)
+            if var.POS == 1231234:
+                self.assertEqual([True, True, True], phases)
             elif var.POS == 1234567:
                 self.assertEqual([False, False, False], phases)
 
@@ -1181,6 +1229,8 @@ class TestCall(unittest.TestCase):
                 self.assertEqual(['G|T', 'T|G', 'T/T'], gt_bases)
             elif var.POS == 1230237:
                 self.assertEqual(['T|T', 'T|T', 'T/T'], gt_bases)
+            elif var.POS == 1231234:
+                self.assertEqual(['A|A', 'AT|AT', 'AT|AT'], gt_bases)
             elif var.POS == 1234567:
                 self.assertEqual([None, 'GTCT/GTACT', 'G/G'], gt_bases)
 
@@ -1198,6 +1248,8 @@ class TestCall(unittest.TestCase):
                 self.assertEqual([1,1,2], gt_types)
             elif var.POS == 1230237:
                 self.assertEqual([0,0,0], gt_types)
+            elif var.POS == 1231234:
+                self.assertEqual([2,0,0], gt_types)
             elif var.POS == 1234567:
                 self.assertEqual([None,1,2], gt_types)
 
@@ -1271,6 +1323,100 @@ class TestIssue201(unittest.TestCase):
             pass
 
 
+class TestIssue234(unittest.TestCase):
+    """ See https://github.com/jamescasbon/PyVCF/issues/234 """
+
+    def test_vcf_metadata_parser_doesnt_break_with_empty_number_tags(self):
+        parser = vcf.parser._vcf_metadata_parser()
+        num_str = '##INFO=<ID=CA,Number=,Type=Flag,Description="Position '
+        num_str += 'could not be annotated to a coding region of a transcript '
+        num_str += 'using the supplied bed file">'
+        try:
+            info = parser.read_info(num_str)[1]
+            self.assertIsNone(info.num)
+        except SyntaxError:
+            msg = "vcf.parser._vcf_metadata_parser shouldn't raise SyntaxError"
+            msg += " if Number tag is empty."
+            self.fail(msg)
+
+
+class TestIssue246(unittest.TestCase):
+    """ See https://github.com/jamescasbon/PyVCF/issues/246 """
+
+    def test_FT_pass_two(self):
+        reader=vcf.Reader(fh('FT.vcf'))
+        next(reader)
+        r=next(reader)
+        target=[
+            [],
+            ['DP125','DP130'],
+            ['DP125','DP130'],
+            ['DP125','DP130'],
+            ['DP125','DP130']
+        ]
+        result=[call.data.FT for call in r.samples]
+        self.assertEqual(target,result)
+
+    def test_FT_one_two(self):
+        reader=list(vcf.Reader(fh('FT.vcf')))
+        r=reader[6]
+        target=[
+            ['DP125','DP130'],
+            ['DP125','DP130'],
+            ['DP125','DP130'],
+            ['DP130'],
+            ['DP125','DP130']
+        ]
+        result=[call.data.FT for call in r.samples]
+        self.assertEqual(target,result)
+
+
+class TestIssue254(unittest.TestCase):
+    """ See https://github.com/jamescasbon/PyVCF/issues/254 """
+
+    def test_remains_singleton_list(self):
+        reader = vcf.Reader(fh('issue-254.vcf'))
+        record = next(reader)
+        expected = [[0.1], [0.2], [0.3]]
+        actual = [call.data.AO for call in record.samples]
+        self.assertEqual(expected, actual)
+
+
+class TestIsFiltered(unittest.TestCase):
+    """ Test is_filtered property for _Call and _Record """
+
+    def test_is_filt_record(self):
+        reader = vcf.Reader(fh('FT.vcf'))
+        target = [
+            False, False, True, False, False,
+            False, True, False, False, False
+        ]
+        result = [record.is_filtered for record in reader]
+        self.assertEqual(target,result)
+
+    def test_is_filt_call_unset(self):
+        reader = vcf.Reader(fh('FT.vcf'))
+        record = next(reader)
+        target = [False]*5
+        result = [call.is_filtered for call in record]
+        self.assertEqual(target,result)
+
+    def test_is_filt_call_pass_two(self):
+        reader = vcf.Reader(fh('FT.vcf'))
+        next(reader)
+        record = next(reader)
+        target = [False, True, True, True, True]
+        result = [call.is_filtered for call in record]
+        self.assertEqual(target,result)
+
+    def test_is_filt_call_one(self):
+        reader = list(vcf.Reader(fh('FT.vcf')))
+        record = reader[6]
+        target = [True]*5
+        result = [call.is_filtered for call in record]
+        self.assertEqual(target,result)
+
+
 class TestOpenMethods(unittest.TestCase):
 
     samples = 'NA00001 NA00002 NA00003'.split()
@@ -1434,10 +1580,10 @@ class TestUtils(unittest.TestCase):
             self.assertEqual(x[0], x[1])
             self.assertEqual(x[1], x[2])
             n+= 1
-        self.assertEqual(n, 5)
+        self.assertEqual(n, 6)
 
-        # artificial case 2 from the left, 2 from the right, 2 together, 1 from the right, 1 from the left
-        expected = 'llrrttrl'
+        # artificial case 2 from the left, 2 from the right, 3 together, 1 from the right, 1 from the left
+        expected = 'llrrtttrl'
         reader1 = vcf.Reader(fh('walk_left.vcf'))
         reader2 = vcf.Reader(fh('example-4.0.vcf'))
 
@@ -1511,22 +1657,32 @@ class TestUncalledGenotypes(unittest.TestCase):
             gt_nums = [s.gt_nums for s in var.samples]
             ploidity = [s.ploidity for s in var.samples]
             gt_alleles = [s.gt_alleles for s in var.samples]
+            gt_type = [s.gt_type for s in var.samples]
 
             if var.POS == 14370:
                 self.assertEqual(['0|0', None, '1/1'], gt_nums)
                 self.assertEqual(['G|G', None, 'A/A'], gt_bases)
                 self.assertEqual([2,2,2], ploidity)
                 self.assertEqual([['0','0'], [None,None], ['1','1']], gt_alleles)
+                self.assertEqual([0, None, 2], gt_type)
             elif var.POS == 17330:
                 self.assertEqual([None, '0|1', '0/0'], gt_nums)
                 self.assertEqual([None, 'T|A', 'T/T'], gt_bases)
                 self.assertEqual([3,2,2], ploidity)
                 self.assertEqual([[None,None,None], ['0','1'], ['0','0']], gt_alleles)
+                self.assertEqual([None, 1, 0], gt_type)
             elif var.POS == 1234567:
                 self.assertEqual(['0/1', '0/2', None], gt_nums)
                 self.assertEqual(['GTC/G', 'GTC/GTCT', None], gt_bases)
                 self.assertEqual([2,2,1], ploidity)
                 self.assertEqual([['0','1'], ['0','2'], [None]], gt_alleles)
+                self.assertEqual([1, 1, None], gt_type)
+            elif var.POS == 1234568:
+                self.assertEqual(['./1', '0/.', None], gt_nums)
+                self.assertEqual(['./G', 'GTC/.', None], gt_bases)
+                self.assertEqual([2,2,1], ploidity)
+                self.assertEqual([[None,'1'], ['0',None], [None]], gt_alleles)
+                self.assertEqual([1, 1, None], gt_type)
         reader._reader.close()
 
 
@@ -1584,6 +1740,10 @@ suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestRecord))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestCall))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestFetch))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue201))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue234))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue246))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIssue254))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestIsFiltered))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestOpenMethods))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestSampleFilter))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestFilter))
@@ -1592,3 +1752,4 @@ suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestUtils))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestGATKMeta))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestUncalledGenotypes))
 suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestStrelka))
+suite.addTests(unittest.TestLoader().loadTestsFromTestCase(TestBadInfoFields))
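
Note on the expected values for the new record at POS 1231234 in the tests above: with genotypes `1|1`, `0|0`, `0|0` there are two alternate alleles among six chromosomes. The sketch below reproduces the asserted allele frequency, heterozygosity and nucleotide diversity with exact fractions. It assumes a biallelic site, where heterozygosity reduces to 2pq; it does not call PyVCF.

```python
from fractions import Fraction

# Genotypes of the three samples at 20:1231234 in example-4.0.vcf.
genotypes = ['1|1', '0|0', '0|0']

# Flatten to individual allele indexes, ignoring phase.
alleles = [a for gt in genotypes for a in gt.replace('|', '/').split('/')]

n = len(alleles)                      # number of chromosomes sampled
p = Fraction(alleles.count('1'), n)   # alternate allele frequency
q = 1 - p                             # reference allele frequency

het = 2 * p * q                       # expected heterozygosity (biallelic)
pi = Fraction(n, n - 1) * het         # small-sample corrected diversity

print(p, het, pi)
```

These are the same quantities the tests write as `2.0/6.0`, `4.0/9.0` and `(6.0/(6.0-1))*(2.0*(1.0/3.0)*(2.0/3.0))`.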


=====================================
vcf/test/uncalled_genotypes.vcf
=====================================
--- a/vcf/test/uncalled_genotypes.vcf
+++ b/vcf/test/uncalled_genotypes.vcf
@@ -5,3 +5,4 @@
 20	14370	rs6054257	G	A	29	PASS	NS=3	GT	0|0	./.	1/1
 20	17330	.	T	A	3	q10	NS=3	GT	././.	0|1	0/0
 20	1234567	microsat1	GTC	G,GTCT	50	PASS	NS=3	GT	0/1	0/2	.
+20	1234568	.	GTC	G,GTCT	50	PASS	NS=3	GT	./1	0/.	.


=====================================
vcf/test/walk_left.vcf
=====================================
--- a/vcf/test/walk_left.vcf
+++ b/vcf/test/walk_left.vcf
@@ -21,4 +21,5 @@
 19	17330	.	T	A	3	q10	NS=3;DP=11;AF=0.017	GT:GQ:DP:HQ	0|0:49:3:58,50	0|1:3:5:65,3	0/0:41:3:65,3
 20	1110696	rs6040355	A	G,T	67	PASS	NS=2;DP=10;AF=0.333,0.667;AA=T;DB	GT:GQ:DP:HQ	1|2:21:6:23,27	2|1:2:0:18,2	2/2:35:4:65,4
 20	1230237	.	T	.	47	PASS	NS=3;DP=13;AA=T	GT:GQ:DP:HQ	0|0:54:7:56,60	0|0:48:4:51,51	0/0:61:2:65,3
+20	1231234	.	AT	A	46	PASS	NS=3;DP=15;AA=A	GT:GQ:DP:HQ	1|1:23:7:26,30	0|0:27:9:56,60	0|0:31:10:65,71
 21	1234567	microsat1	GTCT	G,GTACT	50	PASS	NS=3;DP=9;AA=G	GT:GQ:DP	./.:35:4	0/2:17:2	1/1:40:3



View it on GitLab: https://salsa.debian.org/med-team/python-pyvcf/commit/c017d2d8db0cb2bc11c399f99e28d1953dcc7c1a
