[med-svn] [Git][med-team/pyensembl][upstream] New upstream version 1.9.4+ds

Andreas Tille (@tille) gitlab at salsa.debian.org
Mon Oct 11 18:58:45 BST 2021



Andreas Tille pushed to branch upstream at Debian Med / pyensembl


Commits:
5fe01131 by Andreas Tille at 2021-10-11T19:55:55+02:00
New upstream version 1.9.4+ds
- - - - -


8 changed files:

- PKG-INFO
- README.md
- pyensembl.egg-info/PKG-INFO
- pyensembl/__init__.py
- pyensembl/ensembl_release_versions.py
- pyensembl/species.py
- pyensembl/transcript.py
- test/test_transcript_ids.py


Changes:

=====================================
PKG-INFO
=====================================
@@ -1,219 +1,11 @@
 Metadata-Version: 2.1
 Name: pyensembl
-Version: 1.9.1
+Version: 1.9.4
 Summary: Python interface to ensembl reference genome metadata
 Home-page: https://github.com/openvax/pyensembl
 Author: Alex Rubinsteyn
 Author-email: alex.rubinsteyn at unc.edu
 License: http://www.apache.org/licenses/LICENSE-2.0.html
-Description: <a href="https://travis-ci.org/openvax/pyensembl">
-            <img src="https://travis-ci.org/openvax/pyensembl.svg?branch=master" alt="Build Status" />
-        </a>
-        <a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
-            <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
-        </a>
-        <a href="https://pypi.python.org/pypi/pyensembl/">
-            <img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
-        </a>
-        
-        
-        PyEnsembl
-        =======
-        PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files. 
-        
-        # Example Usage
-        
-        ```python
-        from pyensembl import EnsemblRelease
-        
-        # release 77 uses human reference genome GRCh38
-        data = EnsemblRelease(77)
-        
-        # will return ['HLA-A']
-        gene_names = data.gene_names_at_locus(contig=6, position=29945884)
-        
-        # get all exons associated with HLA-A
-        exon_ids  = data.exon_ids_of_gene_name('HLA-A')
-        ```
-        
-        # Installation
-        
-        You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):
-        
-        ```sh
-        pip install pyensembl
-        ```
-        
-        This should also install any required packages, such as [datacache](https://github.com/openvax/datacache) and
-        [BioPython](http://biopython.org/).
-        
-        Before using PyEnsembl, run the following command to download and install
-        Ensembl data:
-        
-        ```
-        pyensembl install --release <list of Ensembl release numbers> --species <species-name>
-        ```
-        
-        For example, `pyensembl install --release 75 76 --species human` will download and install all
-        human reference data from Ensembl releases 75 and 76.
-        
-        Alternatively, you can create the `EnsemblRelease` object from inside a Python
-        process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
-        
-        ## Cache Location
-        By default, PyEnsembl uses the platform-specific `Cache` folder
-        and caches the files into the `pyensembl` sub-directory.
-        You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
-        as your preferred location for caching:
-        
-        ```sh
-        export PYENSEMBL_CACHE_DIR=/custom/cache/dir
-        ```
-        
-        or
-        
-        ```python
-        import os
-        
-        os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
-        # ... PyEnsembl API usage
-        ```
-        
-        # Non-Ensembl Data
-        
-        PyEnsembl also allows arbitrary genomes via the specification
-        of local file paths or remote URLs to both Ensembl and non-Ensembl GTF
-        and FASTA files. (Warning: GTF formats can vary, and handling of
-        non-Ensembl data is still very much in development.)
-        
-        For example:
-        
-        ```python
-        data = Genome(
-            reference_name='GRCh38',
-            annotation_name='my_genome_features',
-            gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
-        # parse GTF and construct database of genomic features
-        data.index()
-        gene_names = data.gene_names_at_locus(contig=6, position=29945884)
-        ```
-        
-        # API
-        
-        The `EnsemblRelease` object has methods to let you access all possible
-        combinations of the annotation features *gene\_name*, *gene\_id*,
-        *transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
-        these genomic elements (contig, start position, end position, strand).
-        
-        ## Genes
-        
-        <dl>
-        <dt>genes(contig=None, strand=None)</dt>
-        <dd>Returns a list of Gene objects, optionally restricted to a particular contig
-        or strand.</dd>
-        
-        <dt>genes_at_locus(contig, position, end=None, strand=None)</dt>
-        <dd>Returns a list of Gene objects overlapping a particular position on a contig,
-        optionally extend into a range with the end parameter and restrict to
-        forward or backward strand by passing strand='+' or strand='-'.</dd>
-        
-        <dt>gene_by_id(gene_id)</dt>
-        <dd>Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").</dd>
-        
-        <dt>gene_names(contig=None, strand=None)</dt>
-        <dd>Returns all gene names in the annotation database, optionally restricted
-        to a particular contig or strand.</dd>
-        
-        <dt>genes_by_name(gene_name)</dt>
-        <dd>Get all the unqiue genes with the given name (there might be multiple
-        due to copies in the genome), return a list containing a Gene object for each
-        distinct ID.</dd>
-        
-        <dt>gene_by_protein_id(protein_id)</dt>
-        <dd>Find Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283")</dd>
-        
-        <dt>gene_names_at_locus(contig, position, end=None, strand=None)
-        </dt>
-        <dd>Names of genes overlapping with the given locus, optionally restricted by strand.
-        (returns a list to account for overlapping genes)</dd>
-        
-        <dt>gene_name_of_gene_id(gene_id)
-        </dt>
-        <dd>Returns name of gene with given genen ID.</dd>
-        
-        <dt>gene_name_of_transcript_id(transcript_id)
-        </dt><dd>Returns name of gene associated with given transcript ID.</dd>
-        
-        <dt>gene_name_of_transcript_name(transcript_name)
-        </dt>
-        <dd>Returns name of gene associated with given transcript name.</dd>
-        
-        <dt>gene_name_of_exon_id(exon_id)
-        </dt><dd>Returns name of gene associated with given exon ID.</dd>
-        
-        <dt>gene_ids(contig=None, strand=None)
-        </dt>
-        <dd>Return all gene IDs in the annotation database, optionally restricted by
-        chromosome name or strand.</dd>
-        
-        <dt>gene_ids_of_gene_name(gene_name)
-        </dt>
-        <dd>Returns all Ensembl gene IDs with the given name.</dd>
-        
-        </dl>
-        
-        ## Transcripts
-        
-        <dl>
-        <dt>transcripts(contig=None, strand=None)</dt>
-        <dd>Returns a list of Transcript objects for all transcript entries in the
-        Ensembl database, optionally restricted to a particular contig or strand.</dd>
-        
-        <dt>transcript_by_id(transcript_id)</dt>
-        <dd>Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985")</dd>
-        
-        <dt>transcripts_by_name(transcript_name)</dt>
-        <dd>Returns a list of Transcript objects for every transcript matching the given name.</dd>
-        
-        <dt>transcript_names(contig=None, strand=None)</dt>
-        <dd>Returns all transcript names in the annotation database.</dd>
-        
-        <dt>transcript_ids(contig=None, strand=None)</dt>
-        <dd>Returns all transcript IDs in the annotation database.</dd>
-        
-        <dt>transcript_ids_of_gene_id(gene_id)</dt>
-        <dd>Return IDs of all transcripts associated with given gene ID.</dd>
-        
-        <dt>transcript_ids_of_gene_name(gene_name)</dt>
-        <dd>Return IDs of all transcripts associated with given gene name.</dd>
-        
-        <dt>transcript_ids_of_transcript_name(transcript_name)</dt>
-        <dd>Find all Ensembl transcript IDs with the given name.</dd>
-        
-        <dt>transcript_ids_of_exon_id(exon_id)</dt>
-        <dd>Return IDs of all transcripts associatd with given exon ID.</dd>
-        </dl>
-        
-        ## Exons
-        
-        <dl>
-        <dt>exon_ids(contig=None, strand=None)</dt>
-        <dd>Returns a list of exons IDs in the annotation database, optionally restricted
-        by the given chromosome and strand.</dd>
-        
-        <dt>exon_ids_of_gene_id(gene_id)</dt>
-        <dd>Returns a list of exon IDs associated with a given gene ID.</dd>
-        
-        <dt>exon_ids_of_gene_name(gene_name)</dt>
-        <dd>Returns a list of exon IDs associated with a given gene name.</dd>
-        
-        <dt>exon_ids_of_transcript_id(transcript_id)</dt>
-        <dd>Returns a list of exon IDs associated with a given transcript ID.</dd>
-        
-        <dt>exon_ids_of_transcript_name(transcript_name)</dt>
-        <dd>Returns a list of exon IDs associated with a given transcript name.</dd>
-        </dl>
-        
 Platform: UNKNOWN
 Classifier: Development Status :: 3 - Alpha
 Classifier: Environment :: Console
@@ -223,3 +15,217 @@ Classifier: License :: OSI Approved :: Apache Software License
 Classifier: Programming Language :: Python
 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
 Description-Content-Type: text/markdown
+License-File: LICENSE
+
+<a href="https://app.travis-ci.com/github/openvax/pyensembl">
+    <img src="https://app.travis-ci.com/openvax/pyensembl.svg?branch=master" alt="Build Status" />
+</a>
+<a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
+    <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
+</a>
+<a href="https://pypi.python.org/pypi/pyensembl/">
+    <img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
+</a>
+
+
+PyEnsembl
+=======
+PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files. 
+
+# Example Usage
+
+```python
+from pyensembl import EnsemblRelease
+
+# release 77 uses human reference genome GRCh38
+data = EnsemblRelease(77)
+
+# will return ['HLA-A']
+gene_names = data.gene_names_at_locus(contig=6, position=29945884)
+
+# get all exons associated with HLA-A
+exon_ids  = data.exon_ids_of_gene_name('HLA-A')
+```
+
+# Installation
+
+You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):
+
+```sh
+pip install pyensembl
+```
+
+This should also install any required packages, such as [datacache](https://github.com/openvax/datacache) and
+[BioPython](http://biopython.org/).
+
+Before using PyEnsembl, run the following command to download and install
+Ensembl data:
+
+```
+pyensembl install --release <list of Ensembl release numbers> --species <species-name>
+```
+
+For example, `pyensembl install --release 75 76 --species human` will download and install all
+human reference data from Ensembl releases 75 and 76.
+
+Alternatively, you can create the `EnsemblRelease` object from inside a Python
+process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
+
+## Cache Location
+By default, PyEnsembl uses the platform-specific `Cache` folder
+and caches the files into the `pyensembl` sub-directory.
+You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
+as your preferred location for caching:
+
+```sh
+export PYENSEMBL_CACHE_DIR=/custom/cache/dir
+```
+
+or
+
+```python
+import os
+
+os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
+# ... PyEnsembl API usage
+```
+
+# Non-Ensembl Data
+
+PyEnsembl also allows arbitrary genomes via the specification
+of local file paths or remote URLs to both Ensembl and non-Ensembl GTF
+and FASTA files. (Warning: GTF formats can vary, and handling of
+non-Ensembl data is still very much in development.)
+
+For example:
+
+```python
+data = Genome(
+    reference_name='GRCh38',
+    annotation_name='my_genome_features',
+    gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
+# parse GTF and construct database of genomic features
+data.index()
+gene_names = data.gene_names_at_locus(contig=6, position=29945884)
+```
+
+# API
+
+The `EnsemblRelease` object has methods to let you access all possible
+combinations of the annotation features *gene\_name*, *gene\_id*,
+*transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
+these genomic elements (contig, start position, end position, strand).
+
+## Genes
+
+<dl>
+<dt>genes(contig=None, strand=None)</dt>
+<dd>Returns a list of Gene objects, optionally restricted to a particular contig
+or strand.</dd>
+
+<dt>genes_at_locus(contig, position, end=None, strand=None)</dt>
+<dd>Returns a list of Gene objects overlapping a particular position on a contig,
+optionally extend into a range with the end parameter and restrict to
+forward or backward strand by passing strand='+' or strand='-'.</dd>
+
+<dt>gene_by_id(gene_id)</dt>
+<dd>Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").</dd>
+
+<dt>gene_names(contig=None, strand=None)</dt>
+<dd>Returns all gene names in the annotation database, optionally restricted
+to a particular contig or strand.</dd>
+
+<dt>genes_by_name(gene_name)</dt>
+<dd>Get all the unqiue genes with the given name (there might be multiple
+due to copies in the genome), return a list containing a Gene object for each
+distinct ID.</dd>
+
+<dt>gene_by_protein_id(protein_id)</dt>
+<dd>Find Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283")</dd>
+
+<dt>gene_names_at_locus(contig, position, end=None, strand=None)
+</dt>
+<dd>Names of genes overlapping with the given locus, optionally restricted by strand.
+(returns a list to account for overlapping genes)</dd>
+
+<dt>gene_name_of_gene_id(gene_id)
+</dt>
+<dd>Returns name of gene with given genen ID.</dd>
+
+<dt>gene_name_of_transcript_id(transcript_id)
+</dt><dd>Returns name of gene associated with given transcript ID.</dd>
+
+<dt>gene_name_of_transcript_name(transcript_name)
+</dt>
+<dd>Returns name of gene associated with given transcript name.</dd>
+
+<dt>gene_name_of_exon_id(exon_id)
+</dt><dd>Returns name of gene associated with given exon ID.</dd>
+
+<dt>gene_ids(contig=None, strand=None)
+</dt>
+<dd>Return all gene IDs in the annotation database, optionally restricted by
+chromosome name or strand.</dd>
+
+<dt>gene_ids_of_gene_name(gene_name)
+</dt>
+<dd>Returns all Ensembl gene IDs with the given name.</dd>
+
+</dl>
+
+## Transcripts
+
+<dl>
+<dt>transcripts(contig=None, strand=None)</dt>
+<dd>Returns a list of Transcript objects for all transcript entries in the
+Ensembl database, optionally restricted to a particular contig or strand.</dd>
+
+<dt>transcript_by_id(transcript_id)</dt>
+<dd>Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985")</dd>
+
+<dt>transcripts_by_name(transcript_name)</dt>
+<dd>Returns a list of Transcript objects for every transcript matching the given name.</dd>
+
+<dt>transcript_names(contig=None, strand=None)</dt>
+<dd>Returns all transcript names in the annotation database.</dd>
+
+<dt>transcript_ids(contig=None, strand=None)</dt>
+<dd>Returns all transcript IDs in the annotation database.</dd>
+
+<dt>transcript_ids_of_gene_id(gene_id)</dt>
+<dd>Return IDs of all transcripts associated with given gene ID.</dd>
+
+<dt>transcript_ids_of_gene_name(gene_name)</dt>
+<dd>Return IDs of all transcripts associated with given gene name.</dd>
+
+<dt>transcript_ids_of_transcript_name(transcript_name)</dt>
+<dd>Find all Ensembl transcript IDs with the given name.</dd>
+
+<dt>transcript_ids_of_exon_id(exon_id)</dt>
+<dd>Return IDs of all transcripts associatd with given exon ID.</dd>
+</dl>
+
+## Exons
+
+<dl>
+<dt>exon_ids(contig=None, strand=None)</dt>
+<dd>Returns a list of exons IDs in the annotation database, optionally restricted
+by the given chromosome and strand.</dd>
+
+<dt>exon_by_id(exon_id)</dt>
+<dd>Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410")</dd>
+
+<dt>exon_ids_of_gene_id(gene_id)</dt>
+<dd>Returns a list of exon IDs associated with a given gene ID.</dd>
+
+<dt>exon_ids_of_gene_name(gene_name)</dt>
+<dd>Returns a list of exon IDs associated with a given gene name.</dd>
+
+<dt>exon_ids_of_transcript_id(transcript_id)</dt>
+<dd>Returns a list of exon IDs associated with a given transcript ID.</dd>
+
+<dt>exon_ids_of_transcript_name(transcript_name)</dt>
+<dd>Returns a list of exon IDs associated with a given transcript name.</dd>
+</dl>
+
+
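
Note on the PKG-INFO hunks above: the long description is no longer stored as an indented `Description:` header field. It now appears as the message body after the blank line that follows `License-File: LICENSE`. A reader that wants to handle both layouts could be sketched as follows, using only the standard library `email` parser. This is a minimal sketch and not how pyensembl or setuptools read the file; the `OLD_STYLE` and `NEW_STYLE` strings are abbreviated, hypothetical stand-ins for the real metadata, and `read_description` is a name made up for illustration.

```python
import textwrap
from email import message_from_string

# Abbreviated, hypothetical stand-ins for the two PKG-INFO layouts in the diff.
OLD_STYLE = (
    "Metadata-Version: 2.1\n"
    "Name: pyensembl\n"
    "Version: 1.9.1\n"
    "Description: PyEnsembl\n"
    "        =======\n"
    "        PyEnsembl is a Python interface to Ensembl.\n"
    "Platform: UNKNOWN\n"
)

NEW_STYLE = (
    "Metadata-Version: 2.1\n"
    "Name: pyensembl\n"
    "Version: 1.9.4\n"
    "Platform: UNKNOWN\n"
    "License-File: LICENSE\n"
    "\n"
    "PyEnsembl\n"
    "=======\n"
    "PyEnsembl is a Python interface to Ensembl.\n"
)


def read_description(pkg_info_text):
    """Return the long description from either PKG-INFO layout."""
    msg = message_from_string(pkg_info_text)
    header_value = msg.get("Description")
    if header_value is not None:
        # Old layout: the first line follows the field name directly,
        # and the continuation lines share a common indentation.
        first, _, rest = header_value.partition("\n")
        return (first + "\n" + textwrap.dedent(rest)).strip()
    # New layout: the description is the message body.
    return msg.get_payload().strip()


if __name__ == "__main__":
    print(message_from_string(NEW_STYLE)["Version"])
    print(read_description(OLD_STYLE) == read_description(NEW_STYLE))
```

Both layouts yield the same description text here, which matches what the diff suggests: apart from the new `exon_by_id` entry and the updated Travis CI links, the README content itself was moved rather than rewritten.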


=====================================
README.md
=====================================
@@ -1,5 +1,5 @@
-<a href="https://travis-ci.org/openvax/pyensembl">
-    <img src="https://travis-ci.org/openvax/pyensembl.svg?branch=master" alt="Build Status" />
+<a href="https://app.travis-ci.com/github/openvax/pyensembl">
+    <img src="https://app.travis-ci.com/openvax/pyensembl.svg?branch=master" alt="Build Status" />
 </a>
 <a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
     <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
@@ -193,6 +193,9 @@ Ensembl database, optionally restricted to a particular contig or strand.</dd>
 <dd>Returns a list of exons IDs in the annotation database, optionally restricted
 by the given chromosome and strand.</dd>
 
+<dt>exon_by_id(exon_id)</dt>
+<dd>Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410")</dd>
+
 <dt>exon_ids_of_gene_id(gene_id)</dt>
 <dd>Returns a list of exon IDs associated with a given gene ID.</dd>
 


=====================================
pyensembl.egg-info/PKG-INFO
=====================================
@@ -1,219 +1,11 @@
 Metadata-Version: 2.1
 Name: pyensembl
-Version: 1.9.1
+Version: 1.9.4
 Summary: Python interface to ensembl reference genome metadata
 Home-page: https://github.com/openvax/pyensembl
 Author: Alex Rubinsteyn
 Author-email: alex.rubinsteyn at unc.edu
 License: http://www.apache.org/licenses/LICENSE-2.0.html
-Description: <a href="https://travis-ci.org/openvax/pyensembl">
-            <img src="https://travis-ci.org/openvax/pyensembl.svg?branch=master" alt="Build Status" />
-        </a>
-        <a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
-            <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
-        </a>
-        <a href="https://pypi.python.org/pypi/pyensembl/">
-            <img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
-        </a>
-        
-        
-        PyEnsembl
-        =======
-        PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files. 
-        
-        # Example Usage
-        
-        ```python
-        from pyensembl import EnsemblRelease
-        
-        # release 77 uses human reference genome GRCh38
-        data = EnsemblRelease(77)
-        
-        # will return ['HLA-A']
-        gene_names = data.gene_names_at_locus(contig=6, position=29945884)
-        
-        # get all exons associated with HLA-A
-        exon_ids  = data.exon_ids_of_gene_name('HLA-A')
-        ```
-        
-        # Installation
-        
-        You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):
-        
-        ```sh
-        pip install pyensembl
-        ```
-        
-        This should also install any required packages, such as [datacache](https://github.com/openvax/datacache) and
-        [BioPython](http://biopython.org/).
-        
-        Before using PyEnsembl, run the following command to download and install
-        Ensembl data:
-        
-        ```
-        pyensembl install --release <list of Ensembl release numbers> --species <species-name>
-        ```
-        
-        For example, `pyensembl install --release 75 76 --species human` will download and install all
-        human reference data from Ensembl releases 75 and 76.
-        
-        Alternatively, you can create the `EnsemblRelease` object from inside a Python
-        process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
-        
-        ## Cache Location
-        By default, PyEnsembl uses the platform-specific `Cache` folder
-        and caches the files into the `pyensembl` sub-directory.
-        You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
-        as your preferred location for caching:
-        
-        ```sh
-        export PYENSEMBL_CACHE_DIR=/custom/cache/dir
-        ```
-        
-        or
-        
-        ```python
-        import os
-        
-        os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
-        # ... PyEnsembl API usage
-        ```
-        
-        # Non-Ensembl Data
-        
-        PyEnsembl also allows arbitrary genomes via the specification
-        of local file paths or remote URLs to both Ensembl and non-Ensembl GTF
-        and FASTA files. (Warning: GTF formats can vary, and handling of
-        non-Ensembl data is still very much in development.)
-        
-        For example:
-        
-        ```python
-        data = Genome(
-            reference_name='GRCh38',
-            annotation_name='my_genome_features',
-            gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
-        # parse GTF and construct database of genomic features
-        data.index()
-        gene_names = data.gene_names_at_locus(contig=6, position=29945884)
-        ```
-        
-        # API
-        
-        The `EnsemblRelease` object has methods to let you access all possible
-        combinations of the annotation features *gene\_name*, *gene\_id*,
-        *transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
-        these genomic elements (contig, start position, end position, strand).
-        
-        ## Genes
-        
-        <dl>
-        <dt>genes(contig=None, strand=None)</dt>
-        <dd>Returns a list of Gene objects, optionally restricted to a particular contig
-        or strand.</dd>
-        
-        <dt>genes_at_locus(contig, position, end=None, strand=None)</dt>
-        <dd>Returns a list of Gene objects overlapping a particular position on a contig,
-        optionally extend into a range with the end parameter and restrict to
-        forward or backward strand by passing strand='+' or strand='-'.</dd>
-        
-        <dt>gene_by_id(gene_id)</dt>
-        <dd>Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").</dd>
-        
-        <dt>gene_names(contig=None, strand=None)</dt>
-        <dd>Returns all gene names in the annotation database, optionally restricted
-        to a particular contig or strand.</dd>
-        
-        <dt>genes_by_name(gene_name)</dt>
-        <dd>Get all the unqiue genes with the given name (there might be multiple
-        due to copies in the genome), return a list containing a Gene object for each
-        distinct ID.</dd>
-        
-        <dt>gene_by_protein_id(protein_id)</dt>
-        <dd>Find Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283")</dd>
-        
-        <dt>gene_names_at_locus(contig, position, end=None, strand=None)
-        </dt>
-        <dd>Names of genes overlapping with the given locus, optionally restricted by strand.
-        (returns a list to account for overlapping genes)</dd>
-        
-        <dt>gene_name_of_gene_id(gene_id)
-        </dt>
-        <dd>Returns name of gene with given genen ID.</dd>
-        
-        <dt>gene_name_of_transcript_id(transcript_id)
-        </dt><dd>Returns name of gene associated with given transcript ID.</dd>
-        
-        <dt>gene_name_of_transcript_name(transcript_name)
-        </dt>
-        <dd>Returns name of gene associated with given transcript name.</dd>
-        
-        <dt>gene_name_of_exon_id(exon_id)
-        </dt><dd>Returns name of gene associated with given exon ID.</dd>
-        
-        <dt>gene_ids(contig=None, strand=None)
-        </dt>
-        <dd>Return all gene IDs in the annotation database, optionally restricted by
-        chromosome name or strand.</dd>
-        
-        <dt>gene_ids_of_gene_name(gene_name)
-        </dt>
-        <dd>Returns all Ensembl gene IDs with the given name.</dd>
-        
-        </dl>
-        
-        ## Transcripts
-        
-        <dl>
-        <dt>transcripts(contig=None, strand=None)</dt>
-        <dd>Returns a list of Transcript objects for all transcript entries in the
-        Ensembl database, optionally restricted to a particular contig or strand.</dd>
-        
-        <dt>transcript_by_id(transcript_id)</dt>
-        <dd>Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985")</dd>
-        
-        <dt>transcripts_by_name(transcript_name)</dt>
-        <dd>Returns a list of Transcript objects for every transcript matching the given name.</dd>
-        
-        <dt>transcript_names(contig=None, strand=None)</dt>
-        <dd>Returns all transcript names in the annotation database.</dd>
-        
-        <dt>transcript_ids(contig=None, strand=None)</dt>
-        <dd>Returns all transcript IDs in the annotation database.</dd>
-        
-        <dt>transcript_ids_of_gene_id(gene_id)</dt>
-        <dd>Return IDs of all transcripts associated with given gene ID.</dd>
-        
-        <dt>transcript_ids_of_gene_name(gene_name)</dt>
-        <dd>Return IDs of all transcripts associated with given gene name.</dd>
-        
-        <dt>transcript_ids_of_transcript_name(transcript_name)</dt>
-        <dd>Find all Ensembl transcript IDs with the given name.</dd>
-        
-        <dt>transcript_ids_of_exon_id(exon_id)</dt>
-        <dd>Return IDs of all transcripts associatd with given exon ID.</dd>
-        </dl>
-        
-        ## Exons
-        
-        <dl>
-        <dt>exon_ids(contig=None, strand=None)</dt>
-        <dd>Returns a list of exons IDs in the annotation database, optionally restricted
-        by the given chromosome and strand.</dd>
-        
-        <dt>exon_ids_of_gene_id(gene_id)</dt>
-        <dd>Returns a list of exon IDs associated with a given gene ID.</dd>
-        
-        <dt>exon_ids_of_gene_name(gene_name)</dt>
-        <dd>Returns a list of exon IDs associated with a given gene name.</dd>
-        
-        <dt>exon_ids_of_transcript_id(transcript_id)</dt>
-        <dd>Returns a list of exon IDs associated with a given transcript ID.</dd>
-        
-        <dt>exon_ids_of_transcript_name(transcript_name)</dt>
-        <dd>Returns a list of exon IDs associated with a given transcript name.</dd>
-        </dl>
-        
 Platform: UNKNOWN
 Classifier: Development Status :: 3 - Alpha
 Classifier: Environment :: Console
@@ -223,3 +15,217 @@ Classifier: License :: OSI Approved :: Apache Software License
 Classifier: Programming Language :: Python
 Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
 Description-Content-Type: text/markdown
+License-File: LICENSE
+
+<a href="https://app.travis-ci.com/github/openvax/pyensembl">
+    <img src="https://app.travis-ci.com/openvax/pyensembl.svg?branch=master" alt="Build Status" />
+</a>
+<a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
+    <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
+</a>
+<a href="https://pypi.python.org/pypi/pyensembl/">
+    <img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
+</a>
+
+
+PyEnsembl
+=======
+PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files. 
+
+# Example Usage
+
+```python
+from pyensembl import EnsemblRelease
+
+# release 77 uses human reference genome GRCh38
+data = EnsemblRelease(77)
+
+# will return ['HLA-A']
+gene_names = data.gene_names_at_locus(contig=6, position=29945884)
+
+# get all exons associated with HLA-A
+exon_ids  = data.exon_ids_of_gene_name('HLA-A')
+```
+
+# Installation
+
+You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):
+
+```sh
+pip install pyensembl
+```
+
+This should also install any required packages, such as [datacache](https://github.com/openvax/datacache) and
+[BioPython](http://biopython.org/).
+
+Before using PyEnsembl, run the following command to download and install
+Ensembl data:
+
+```
+pyensembl install --release <list of Ensembl release numbers> --species <species-name>
+```
+
+For example, `pyensembl install --release 75 76 --species human` will download and install all
+human reference data from Ensembl releases 75 and 76.
+
+Alternatively, you can create the `EnsemblRelease` object from inside a Python
+process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
+
+## Cache Location
+By default, PyEnsembl uses the platform-specific `Cache` folder
+and caches the files into the `pyensembl` sub-directory.
+You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
+as your preferred location for caching:
+
+```sh
+export PYENSEMBL_CACHE_DIR=/custom/cache/dir
+```
+
+or
+
+```python
+import os
+
+os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
+# ... PyEnsembl API usage
+```
+
+# Non-Ensembl Data
+
+PyEnsembl also allows arbitrary genomes via the specification
+of local file paths or remote URLs to both Ensembl and non-Ensembl GTF
+and FASTA files. (Warning: GTF formats can vary, and handling of
+non-Ensembl data is still very much in development.)
+
+For example:
+
+```python
+data = Genome(
+    reference_name='GRCh38',
+    annotation_name='my_genome_features',
+    gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
+# parse GTF and construct database of genomic features
+data.index()
+gene_names = data.gene_names_at_locus(contig=6, position=29945884)
+```
+
+# API
+
+The `EnsemblRelease` object has methods to let you access all possible
+combinations of the annotation features *gene\_name*, *gene\_id*,
+*transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
+these genomic elements (contig, start position, end position, strand).
+
+## Genes
+
+<dl>
+<dt>genes(contig=None, strand=None)</dt>
+<dd>Returns a list of Gene objects, optionally restricted to a particular contig
+or strand.</dd>
+
+<dt>genes_at_locus(contig, position, end=None, strand=None)</dt>
+<dd>Returns a list of Gene objects overlapping a particular position on a contig,
+optionally extended into a range with the end parameter and restricted to the
+forward or backward strand by passing strand='+' or strand='-'.</dd>
+
+<dt>gene_by_id(gene_id)</dt>
+<dd>Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").</dd>
+
+<dt>gene_names(contig=None, strand=None)</dt>
+<dd>Returns all gene names in the annotation database, optionally restricted
+to a particular contig or strand.</dd>
+
+<dt>genes_by_name(gene_name)</dt>
+<dd>Get all the unique genes with the given name (there might be multiple
+due to copies in the genome); returns a list containing a Gene object for each
+distinct ID.</dd>
+
+<dt>gene_by_protein_id(protein_id)</dt>
+<dd>Find Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283")</dd>
+
+<dt>gene_names_at_locus(contig, position, end=None, strand=None)
+</dt>
+<dd>Names of genes overlapping with the given locus, optionally restricted by strand.
+(returns a list to account for overlapping genes)</dd>
+
+<dt>gene_name_of_gene_id(gene_id)
+</dt>
+<dd>Returns name of gene with given gene ID.</dd>
+
+<dt>gene_name_of_transcript_id(transcript_id)
+</dt><dd>Returns name of gene associated with given transcript ID.</dd>
+
+<dt>gene_name_of_transcript_name(transcript_name)
+</dt>
+<dd>Returns name of gene associated with given transcript name.</dd>
+
+<dt>gene_name_of_exon_id(exon_id)
+</dt><dd>Returns name of gene associated with given exon ID.</dd>
+
+<dt>gene_ids(contig=None, strand=None)
+</dt>
+<dd>Return all gene IDs in the annotation database, optionally restricted by
+chromosome name or strand.</dd>
+
+<dt>gene_ids_of_gene_name(gene_name)
+</dt>
+<dd>Returns all Ensembl gene IDs with the given name.</dd>
+
+</dl>
+
+## Transcripts
+
+<dl>
+<dt>transcripts(contig=None, strand=None)</dt>
+<dd>Returns a list of Transcript objects for all transcript entries in the
+Ensembl database, optionally restricted to a particular contig or strand.</dd>
+
+<dt>transcript_by_id(transcript_id)</dt>
+<dd>Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985")</dd>
+
+<dt>transcripts_by_name(transcript_name)</dt>
+<dd>Returns a list of Transcript objects for every transcript matching the given name.</dd>
+
+<dt>transcript_names(contig=None, strand=None)</dt>
+<dd>Returns all transcript names in the annotation database.</dd>
+
+<dt>transcript_ids(contig=None, strand=None)</dt>
+<dd>Returns all transcript IDs in the annotation database.</dd>
+
+<dt>transcript_ids_of_gene_id(gene_id)</dt>
+<dd>Return IDs of all transcripts associated with given gene ID.</dd>
+
+<dt>transcript_ids_of_gene_name(gene_name)</dt>
+<dd>Return IDs of all transcripts associated with given gene name.</dd>
+
+<dt>transcript_ids_of_transcript_name(transcript_name)</dt>
+<dd>Find all Ensembl transcript IDs with the given name.</dd>
+
+<dt>transcript_ids_of_exon_id(exon_id)</dt>
+<dd>Return IDs of all transcripts associated with given exon ID.</dd>
+</dl>
+
+## Exons
+
+<dl>
+<dt>exon_ids(contig=None, strand=None)</dt>
+<dd>Returns a list of exons IDs in the annotation database, optionally restricted
+by the given chromosome and strand.</dd>
+
+<dt>exon_by_id(exon_id)</dt>
+<dd>Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410")</dd>
+
+<dt>exon_ids_of_gene_id(gene_id)</dt>
+<dd>Returns a list of exon IDs associated with a given gene ID.</dd>
+
+<dt>exon_ids_of_gene_name(gene_name)</dt>
+<dd>Returns a list of exon IDs associated with a given gene name.</dd>
+
+<dt>exon_ids_of_transcript_id(transcript_id)</dt>
+<dd>Returns a list of exon IDs associated with a given transcript ID.</dd>
+
+<dt>exon_ids_of_transcript_name(transcript_name)</dt>
+<dd>Returns a list of exon IDs associated with a given transcript name.</dd>
+</dl>
+
+


=====================================
pyensembl/__init__.py
=====================================
@@ -40,7 +40,7 @@ from .species import (
 )
 from .transcript import Transcript
 
-__version__ = '1.9.1'
+__version__ = '1.9.4'
 
 __all__ = [
     "MemoryCache",


=====================================
pyensembl/ensembl_release_versions.py
=====================================
@@ -13,7 +13,7 @@
 from __future__ import print_function, division, absolute_import
 
 MIN_ENSEMBL_RELEASE = 54
-MAX_ENSEMBL_RELEASE = 102
+MAX_ENSEMBL_RELEASE = 104
 
 def check_release_number(release):
     """


=====================================
pyensembl/species.py
=====================================
@@ -177,8 +177,8 @@ mouse = Species.register(
     synonyms=["mouse", "house mouse"],
     reference_assemblies={
         "NCBIM37": (54, 67),
-        "GRCm38": (68, MAX_ENSEMBL_RELEASE),
-
+        "GRCm38": (68, 102),
+        "GRCm39": (103, MAX_ENSEMBL_RELEASE),
     })
 
 dog = Species.register(
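
Note on the hunk above: mouse releases are now split between GRCm38 (68 to 102) and GRCm39 (103 up to `MAX_ENSEMBL_RELEASE`, which this commit raises to 104). The following is a minimal sketch of how such an inclusive release-range table can be resolved to an assembly name. The table values are copied from the diff; `assembly_for_release` and `MOUSE_REFERENCE_ASSEMBLIES` are hypothetical names used only for illustration and are not claimed to be part of pyensembl's API.

```python
# Illustrative only: mirrors the reference_assemblies mapping from the diff.
MAX_ENSEMBL_RELEASE = 104

MOUSE_REFERENCE_ASSEMBLIES = {
    "NCBIM37": (54, 67),
    "GRCm38": (68, 102),
    "GRCm39": (103, MAX_ENSEMBL_RELEASE),
}


def assembly_for_release(release, assemblies=MOUSE_REFERENCE_ASSEMBLIES):
    """Hypothetical helper: return the assembly whose inclusive
    (first, last) release range contains `release`."""
    for name, (first, last) in assemblies.items():
        if first <= release <= last:
            return name
    raise ValueError("No reference assembly covers release %d" % release)


print(assembly_for_release(102))  # GRCm38
print(assembly_for_release(103))  # GRCm39
```

With the old mapping, `"GRCm38": (68, MAX_ENSEMBL_RELEASE)`, releases 103 and 104 would have resolved to GRCm38, which is what the hunk corrects.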


=====================================
pyensembl/transcript.py
=====================================
@@ -418,7 +418,7 @@ class Transcript(LocusWithGenome):
         Spliced cDNA sequence of transcript
         (includes 5" UTR, coding sequence, and 3" UTR)
         """
-        return self.genome.transcript_sequences.get(self.id)
+        return self.genome.transcript_sequences.get(self.transcript_id.rsplit(".", 1)[0])
 
     @memoized_property
     def first_start_codon_spliced_offset(self):
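
Note on the hunk above: the sequence lookup key changes from `self.id` to `self.transcript_id.rsplit(".", 1)[0]`, which drops a trailing version suffix such as `.3` before the lookup. The likely motivation, inferred from the updated test comment that lists versioned IDs, is that the sequence table is keyed by unversioned IDs. The sketch below shows only the string operation; `strip_version` is a hypothetical name, not a pyensembl function.

```python
def strip_version(stable_id):
    """Hypothetical helper: drop a trailing '.<version>' suffix, if present.

    rsplit with maxsplit=1 splits on the last '.' only, and returns the
    whole string unchanged when there is no '.' at all.
    """
    return stable_id.rsplit(".", 1)[0]


print(strip_version("ENST00000445132.3"))  # ENST00000445132
print(strip_version("ENST00000445132"))    # ENST00000445132
```

One caveat of this approach: any ID that contains a dot for reasons other than versioning (possible in non-Ensembl GTF files) would also be truncated at its last dot.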


=====================================
test/test_transcript_ids.py
=====================================
@@ -50,8 +50,9 @@ def test_all_transcript_ids(ensembl):
             "Missing transcript ID %s from %s" % (transcript_id, ensembl)
 
 def test_transcript_id_of_protein_id_CCR2():
-    # looked up CCR2-201 transcript ID ENST00000292301 mapping to protein ID
-    # ENSP00000292301 on Sept. 14th 2016 from GRCh38 Ensembl release 85
+    # Looked up on Oct 9 2021:
+    # CCR2-203 ENST00000445132.3 maps to ENSP00000399285.2
+    # Ensembl release 104, GRCh38.p13
     transcript_id = grch38.transcript_id_of_protein_id(
-        "ENSP00000292301")
-    eq_("ENST00000292301", transcript_id)
+        "ENSP00000399285")
+    eq_("ENST00000445132", transcript_id)



View it on GitLab: https://salsa.debian.org/med-team/pyensembl/-/commit/5fe011317e77c7200bbd0938248482f6da6f523c
