[med-svn] [Git][med-team/pyensembl][upstream] New upstream version 1.9.4+ds
Andreas Tille (@tille)
gitlab at salsa.debian.org
Mon Oct 11 18:58:45 BST 2021
Andreas Tille pushed to branch upstream at Debian Med / pyensembl
5fe01131 by Andreas Tille at 2021-10-11T19:55:55+02:00
New upstream version 1.9.4+ds
- - - - -
8 changed files:
- pyensembl.egg-info/PKG-INFO
- pyensembl/__init__.py
- pyensembl/ensembl_release_versions.py
- pyensembl/species.py
- pyensembl/transcript.py
- test/test_transcript_ids.py
@@ -1,219 +1,11 @@
Metadata-Version: 2.1
Name: pyensembl
-Version: 1.9.1
+Version: 1.9.4
Summary: Python interface to ensembl reference genome metadata
Home-page: https://github.com/openvax/pyensembl
Author: Alex Rubinsteyn
Author-email: alex.rubinsteyn at unc.edu
License: http://www.apache.org/licenses/LICENSE-2.0.html
-Description: <a href="https://travis-ci.org/openvax/pyensembl">
- <img src="https://travis-ci.org/openvax/pyensembl.svg?branch=master" alt="Build Status" />
- </a>
- <a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
- <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
- </a>
- <a href="https://pypi.python.org/pypi/pyensembl/">
- <img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
- </a>
- PyEnsembl
- =======
- PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.
- # Example Usage
- ```python
- from pyensembl import EnsemblRelease
- # release 77 uses human reference genome GRCh38
- data = EnsemblRelease(77)
- # will return ['HLA-A']
- gene_names = data.gene_names_at_locus(contig=6, position=29945884)
- # get all exons associated with HLA-A
- exon_ids = data.exon_ids_of_gene_name('HLA-A')
- ```
- # Installation
- You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):
- ```sh
- pip install pyensembl
- ```
- This should also install any required packages, such as [datacache](https://github.com/openvax/datacache) and
- [BioPython](http://biopython.org/).
- Before using PyEnsembl, run the following command to download and install
- Ensembl data:
- ```
- pyensembl install --release <list of Ensembl release numbers> --species <species-name>
- ```
- For example, `pyensembl install --release 75 76 --species human` will download and install all
- human reference data from Ensembl releases 75 and 76.
- Alternatively, you can create the `EnsemblRelease` object from inside a Python
- process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
- ## Cache Location
- By default, PyEnsembl uses the platform-specific `Cache` folder
- and caches the files into the `pyensembl` sub-directory.
- You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
- as your preferred location for caching:
- ```sh
- export PYENSEMBL_CACHE_DIR=/custom/cache/dir
- ```
- or
- ```python
- import os
- os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
- # ... PyEnsembl API usage
- ```
- # Non-Ensembl Data
- PyEnsembl also allows arbitrary genomes via the specification
- of local file paths or remote URLs to both Ensembl and non-Ensembl GTF
- and FASTA files. (Warning: GTF formats can vary, and handling of
- non-Ensembl data is still very much in development.)
- For example:
- ```python
- data = Genome(
- reference_name='GRCh38',
- annotation_name='my_genome_features',
- gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
- # parse GTF and construct database of genomic features
- data.index()
- gene_names = data.gene_names_at_locus(contig=6, position=29945884)
- ```
- # API
- The `EnsemblRelease` object has methods to let you access all possible
- combinations of the annotation features *gene\_name*, *gene\_id*,
- *transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
- these genomic elements (contig, start position, end position, strand).
- ## Genes
- <dl>
- <dt>genes(contig=None, strand=None)</dt>
- <dd>Returns a list of Gene objects, optionally restricted to a particular contig
- or strand.</dd>
- <dt>genes_at_locus(contig, position, end=None, strand=None)</dt>
- <dd>Returns a list of Gene objects overlapping a particular position on a contig,
- optionally extend into a range with the end parameter and restrict to
- forward or backward strand by passing strand='+' or strand='-'.</dd>
- <dt>gene_by_id(gene_id)</dt>
- <dd>Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").</dd>
- <dt>gene_names(contig=None, strand=None)</dt>
- <dd>Returns all gene names in the annotation database, optionally restricted
- to a particular contig or strand.</dd>
- <dt>genes_by_name(gene_name)</dt>
- <dd>Get all the unqiue genes with the given name (there might be multiple
- due to copies in the genome), return a list containing a Gene object for each
- distinct ID.</dd>
- <dt>gene_by_protein_id(protein_id)</dt>
- <dd>Find Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283")</dd>
- <dt>gene_names_at_locus(contig, position, end=None, strand=None)
- </dt>
- <dd>Names of genes overlapping with the given locus, optionally restricted by strand.
- (returns a list to account for overlapping genes)</dd>
- <dt>gene_name_of_gene_id(gene_id)
- </dt>
- <dd>Returns name of gene with given genen ID.</dd>
- <dt>gene_name_of_transcript_id(transcript_id)
- </dt><dd>Returns name of gene associated with given transcript ID.</dd>
- <dt>gene_name_of_transcript_name(transcript_name)
- </dt>
- <dd>Returns name of gene associated with given transcript name.</dd>
- <dt>gene_name_of_exon_id(exon_id)
- </dt><dd>Returns name of gene associated with given exon ID.</dd>
- <dt>gene_ids(contig=None, strand=None)
- </dt>
- <dd>Return all gene IDs in the annotation database, optionally restricted by
- chromosome name or strand.</dd>
- <dt>gene_ids_of_gene_name(gene_name)
- </dt>
- <dd>Returns all Ensembl gene IDs with the given name.</dd>
- </dl>
- ## Transcripts
- <dl>
- <dt>transcripts(contig=None, strand=None)</dt>
- <dd>Returns a list of Transcript objects for all transcript entries in the
- Ensembl database, optionally restricted to a particular contig or strand.</dd>
- <dt>transcript_by_id(transcript_id)</dt>
- <dd>Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985")</dd>
- <dt>transcripts_by_name(transcript_name)</dt>
- <dd>Returns a list of Transcript objects for every transcript matching the given name.</dd>
- <dt>transcript_names(contig=None, strand=None)</dt>
- <dd>Returns all transcript names in the annotation database.</dd>
- <dt>transcript_ids(contig=None, strand=None)</dt>
- <dd>Returns all transcript IDs in the annotation database.</dd>
- <dt>transcript_ids_of_gene_id(gene_id)</dt>
- <dd>Return IDs of all transcripts associated with given gene ID.</dd>
- <dt>transcript_ids_of_gene_name(gene_name)</dt>
- <dd>Return IDs of all transcripts associated with given gene name.</dd>
- <dt>transcript_ids_of_transcript_name(transcript_name)</dt>
- <dd>Find all Ensembl transcript IDs with the given name.</dd>
- <dt>transcript_ids_of_exon_id(exon_id)</dt>
- <dd>Return IDs of all transcripts associatd with given exon ID.</dd>
- </dl>
- ## Exons
- <dl>
- <dt>exon_ids(contig=None, strand=None)</dt>
- <dd>Returns a list of exons IDs in the annotation database, optionally restricted
- by the given chromosome and strand.</dd>
- <dt>exon_ids_of_gene_id(gene_id)</dt>
- <dd>Returns a list of exon IDs associated with a given gene ID.</dd>
- <dt>exon_ids_of_gene_name(gene_name)</dt>
- <dd>Returns a list of exon IDs associated with a given gene name.</dd>
- <dt>exon_ids_of_transcript_id(transcript_id)</dt>
- <dd>Returns a list of exon IDs associated with a given transcript ID.</dd>
- <dt>exon_ids_of_transcript_name(transcript_name)</dt>
- <dd>Returns a list of exon IDs associated with a given transcript name.</dd>
- </dl>
Platform: UNKNOWN
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
@@ -223,3 +15,217 @@ Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Description-Content-Type: text/markdown
+License-File: LICENSE
+<a href="https://app.travis-ci.com/github/openvax/pyensembl">
+ <img src="https://app.travis-ci.com/openvax/pyensembl.svg?branch=master" alt="Build Status" />
+<a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
+ <img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
+<a href="https://pypi.python.org/pypi/pyensembl/">
+ <img src="https://img.shields.io/pypi/v/pyensembl.svg?maxAge=1000" alt="PyPI" />
+PyEnsembl is a Python interface to [Ensembl](http://www.ensembl.org) reference genome metadata such as exons and transcripts. PyEnsembl downloads [GTF](https://en.wikipedia.org/wiki/Gene_transfer_format) and [FASTA](https://en.wikipedia.org/wiki/FASTA_format) files from the [Ensembl FTP server](ftp://ftp.ensembl.org) and loads them into a local database. PyEnsembl can also work with custom reference data specified using user-supplied GTF and FASTA files.
+# Example Usage
+from pyensembl import EnsemblRelease
+# release 77 uses human reference genome GRCh38
+data = EnsemblRelease(77)
+# will return ['HLA-A']
+gene_names = data.gene_names_at_locus(contig=6, position=29945884)
+# get all exons associated with HLA-A
+exon_ids = data.exon_ids_of_gene_name('HLA-A')
+# Installation
+You can install PyEnsembl using [pip](https://pip.pypa.io/en/latest/quickstart.html):
+pip install pyensembl
+This should also install any required packages, such as [datacache](https://github.com/openvax/datacache) and
+Before using PyEnsembl, run the following command to download and install
+Ensembl data:
+pyensembl install --release <list of Ensembl release numbers> --species <species-name>
+For example, `pyensembl install --release 75 76 --species human` will download and install all
+human reference data from Ensembl releases 75 and 76.
+Alternatively, you can create the `EnsemblRelease` object from inside a Python
+process and call `ensembl_object.download()` followed by `ensembl_object.index()`.
+## Cache Location
+By default, PyEnsembl uses the platform-specific `Cache` folder
+and caches the files into the `pyensembl` sub-directory.
+You can override this default by setting the environment key `PYENSEMBL_CACHE_DIR`
+as your preferred location for caching:
+export PYENSEMBL_CACHE_DIR=/custom/cache/dir
+import os
+os.environ['PYENSEMBL_CACHE_DIR'] = '/custom/cache/dir'
+# ... PyEnsembl API usage
+# Non-Ensembl Data
+PyEnsembl also allows arbitrary genomes via the specification
+of local file paths or remote URLs to both Ensembl and non-Ensembl GTF
+and FASTA files. (Warning: GTF formats can vary, and handling of
+non-Ensembl data is still very much in development.)
+For example:
+data = Genome(
+ reference_name='GRCh38',
+ annotation_name='my_genome_features',
+ gtf_path_or_url='/My/local/gtf/path_to_my_genome_features.gtf')
+# parse GTF and construct database of genomic features
+gene_names = data.gene_names_at_locus(contig=6, position=29945884)
+# API
+The `EnsemblRelease` object has methods to let you access all possible
+combinations of the annotation features *gene\_name*, *gene\_id*,
+*transcript\_name*, *transcript\_id*, *exon\_id* as well as the location of
+these genomic elements (contig, start position, end position, strand).
+## Genes
+<dt>genes(contig=None, strand=None)</dt>
+<dd>Returns a list of Gene objects, optionally restricted to a particular contig
+or strand.</dd>
+<dt>genes_at_locus(contig, position, end=None, strand=None)</dt>
+<dd>Returns a list of Gene objects overlapping a particular position on a contig,
+optionally extend into a range with the end parameter and restrict to
+forward or backward strand by passing strand='+' or strand='-'.</dd>
+<dd>Return a Gene object for given Ensembl gene ID (e.g. "ENSG00000068793").</dd>
+<dt>gene_names(contig=None, strand=None)</dt>
+<dd>Returns all gene names in the annotation database, optionally restricted
+to a particular contig or strand.</dd>
+<dd>Get all the unqiue genes with the given name (there might be multiple
+due to copies in the genome), return a list containing a Gene object for each
+distinct ID.</dd>
+<dd>Find Gene associated with the given Ensembl protein ID (e.g. "ENSP00000350283")</dd>
+<dt>gene_names_at_locus(contig, position, end=None, strand=None)
+<dd>Names of genes overlapping with the given locus, optionally restricted by strand.
+(returns a list to account for overlapping genes)</dd>
+<dd>Returns name of gene with given genen ID.</dd>
+</dt><dd>Returns name of gene associated with given transcript ID.</dd>
+<dd>Returns name of gene associated with given transcript name.</dd>
+</dt><dd>Returns name of gene associated with given exon ID.</dd>
+<dt>gene_ids(contig=None, strand=None)
+<dd>Return all gene IDs in the annotation database, optionally restricted by
+chromosome name or strand.</dd>
+<dd>Returns all Ensembl gene IDs with the given name.</dd>
+## Transcripts
+<dt>transcripts(contig=None, strand=None)</dt>
+<dd>Returns a list of Transcript objects for all transcript entries in the
+Ensembl database, optionally restricted to a particular contig or strand.</dd>
+<dd>Construct a Transcript object for given Ensembl transcript ID (e.g. "ENST00000369985")</dd>
+<dd>Returns a list of Transcript objects for every transcript matching the given name.</dd>
+<dt>transcript_names(contig=None, strand=None)</dt>
+<dd>Returns all transcript names in the annotation database.</dd>
+<dt>transcript_ids(contig=None, strand=None)</dt>
+<dd>Returns all transcript IDs in the annotation database.</dd>
+<dd>Return IDs of all transcripts associated with given gene ID.</dd>
+<dd>Return IDs of all transcripts associated with given gene name.</dd>
+<dd>Find all Ensembl transcript IDs with the given name.</dd>
+<dd>Return IDs of all transcripts associatd with given exon ID.</dd>
+## Exons
+<dt>exon_ids(contig=None, strand=None)</dt>
+<dd>Returns a list of exons IDs in the annotation database, optionally restricted
+by the given chromosome and strand.</dd>
+<dd>Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410")</dd>
+<dd>Returns a list of exon IDs associated with a given gene ID.</dd>
+<dd>Returns a list of exon IDs associated with a given gene name.</dd>
+<dd>Returns a list of exon IDs associated with a given transcript ID.</dd>
+<dd>Returns a list of exon IDs associated with a given transcript name.</dd>
@@ -1,5 +1,5 @@
-<a href="https://travis-ci.org/openvax/pyensembl">
- <img src="https://travis-ci.org/openvax/pyensembl.svg?branch=master" alt="Build Status" />
+<a href="https://app.travis-ci.com/github/openvax/pyensembl">
+ <img src="https://app.travis-ci.com/openvax/pyensembl.svg?branch=master" alt="Build Status" />
<a href="https://coveralls.io/github/openvax/pyensembl?branch=master">
<img src="https://coveralls.io/repos/openvax/pyensembl/badge.svg?branch=master&service=github" alt="Coverage Status" />
@@ -193,6 +193,9 @@ Ensembl database, optionally restricted to a particular contig or strand.</dd>
<dd>Returns a list of exons IDs in the annotation database, optionally restricted
by the given chromosome and strand.</dd>
+<dd>Construct an Exon object for given Ensembl exon ID (e.g. "ENSE00001209410")</dd>
<dd>Returns a list of exon IDs associated with a given gene ID.</dd>
@@ -40,7 +40,7 @@ from .species import (
from .transcript import Transcript
-__version__ = '1.9.1'
+__version__ = '1.9.4'
__all__ = [
@@ -13,7 +13,7 @@
from __future__ import print_function, division, absolute_import
def check_release_number(release):
@@ -177,8 +177,8 @@ mouse = Species.register(
synonyms=["mouse", "house mouse"],
"NCBIM37": (54, 67),
+ "GRCm38": (68, 102),
dog = Species.register(
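
The species.py hunk above closes the GRCm38 range at release 102. As a rough sketch of how such (first, last) release ranges can be read, the snippet below maps a release number to an assembly name. Only the two ranges visible in the hunk are used; the dictionary name `MOUSE_ASSEMBLIES` and the helper `assembly_for_release` are hypothetical and not part of pyensembl.

```python
# Release ranges copied from the hunk above; both bounds are treated as inclusive
# (an assumption). The names below are illustrative, not pyensembl's API.
MOUSE_ASSEMBLIES = {
    "NCBIM37": (54, 67),
    "GRCm38": (68, 102),
}


def assembly_for_release(release, assemblies=MOUSE_ASSEMBLIES):
    """Return the assembly whose release range contains `release`, or None."""
    for name, (first, last) in assemblies.items():
        if first <= release <= last:
            return name
    return None


print(assembly_for_release(67))   # falls in the NCBIM37 range
print(assembly_for_release(102))  # last release listed for GRCm38
print(assembly_for_release(103))  # not covered by the two ranges shown here
```

Releases outside the listed ranges return `None` in this sketch; how pyensembl itself handles them is not shown in this diff.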
@@ -418,7 +418,7 @@ class Transcript(LocusWithGenome):
Spliced cDNA sequence of transcript
(includes 5" UTR, coding sequence, and 3" UTR)
- return self.genome.transcript_sequences.get(self.id)
+ return self.genome.transcript_sequences.get(self.transcript_id.rsplit(".", 1)[0])
def first_start_codon_spliced_offset(self):
@@ -50,8 +50,9 @@ def test_all_transcript_ids(ensembl):
"Missing transcript ID %s from %s" % (transcript_id, ensembl)
def test_transcript_id_of_protein_id_CCR2():
- # looked up CCR2-201 transcript ID ENST00000292301 mapping to protein ID
- # ENSP00000292301 on Sept. 14th 2016 from GRCh38 Ensembl release 85
+ # Looked up on Oct 9 2021:
+ # CCR2-203 ENST00000445132.3 maps to ENSP00000399285.2
+ # Ensembl release 104, GRCh38.p13
transcript_id = grch38.transcript_id_of_protein_id(
- "ENSP00000292301")
- eq_("ENST00000292301", transcript_id)
+ "ENSP00000399285")
+ eq_("ENST00000445132", transcript_id)
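
The transcript.py hunk above looks up the sequence by `self.transcript_id.rsplit(".", 1)[0]`, and the updated test compares against unversioned IDs. A minimal sketch of that string operation on its own, assuming IDs of the form `ENST00000445132.3` as quoted in the test comment; the helper name `strip_version` is hypothetical and does not appear in pyensembl.

```python
def strip_version(versioned_id):
    """Drop a trailing '.N' version suffix from an Ensembl-style ID, if present.

    rsplit(".", 1) splits at most once, from the right, so an ID without
    a dot is returned unchanged.
    """
    return versioned_id.rsplit(".", 1)[0]


print(strip_version("ENST00000445132.3"))  # ENST00000445132
print(strip_version("ENSP00000399285.2"))  # ENSP00000399285
print(strip_version("ENST00000445132"))    # ENST00000445132
```

This is why the test can pass `"ENSP00000399285"` and expect `"ENST00000445132"` even though the comment cites the versioned forms.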
View it on GitLab: https://salsa.debian.org/med-team/pyensembl/-/commit/5fe011317e77c7200bbd0938248482f6da6f523c