[med-svn] [r-bioc-ensembldb] 06/10: New upstream version 1.6.2
Andreas Tille
tille at debian.org
Tue Oct 3 07:24:12 UTC 2017
This is an automated email from the git hooks/post-receive script.
tille pushed a commit to branch master
in repository r-bioc-ensembldb.
commit 0f017c857db44b6dc5a4bb2e66e528a0b07511f7
Author: Andreas Tille <tille at debian.org>
Date: Tue Oct 3 09:14:14 2017 +0200
New upstream version 1.6.2
---
DESCRIPTION | 37 +
NAMESPACE | 63 +
R/Classes.R | 404 +++
R/EnsDbFromGTF.R | 1138 ++++++++
R/Generics.R | 147 +
R/Methods-Filter.R | 933 +++++++
R/Methods.R | 1758 ++++++++++++
R/dbhelpers.R | 589 ++++
R/functions-utils.R | 83 +
R/loadEnsDb.R | 5 +
R/makeEnsemblDbPackage.R | 213 ++
R/runEnsDbApp.R | 10 +
R/select-methods.R | 319 +++
R/seqname-utils.R | 258 ++
R/zzz.R | 15 +
build/vignette.rds | Bin 0 -> 325 bytes
debian/README.test | 13 -
debian/changelog | 22 -
debian/compat | 1 -
debian/control | 31 -
debian/copyright | 115 -
debian/docs | 1 -
debian/rules | 21 -
debian/source/format | 1 -
debian/tests/control | 3 -
debian/tests/run-unit-test | 5 -
debian/watch | 3 -
inst/NEWS | 488 ++++
inst/YGRanges.RData | Bin 0 -> 47220 bytes
inst/chrY/ens_chromosome.txt | 2 +
inst/chrY/ens_exon.txt | 2700 +++++++++++++++++++
inst/chrY/ens_gene.txt | 496 ++++
inst/chrY/ens_metadata.txt | 12 +
inst/chrY/ens_tx.txt | 732 +++++
inst/chrY/ens_tx2exon.txt | 3745 ++++++++++++++++++++++++++
inst/doc/MySQL-backend.R | 30 +
inst/doc/MySQL-backend.Rmd | 74 +
inst/doc/MySQL-backend.html | 209 ++
inst/doc/ensembldb.R | 386 +++
inst/doc/ensembldb.Rmd | 920 +++++++
inst/doc/ensembldb.html | 1300 +++++++++
inst/perl/get_gene_transcript_exon_tables.pl | 278 ++
inst/pkg-template/DESCRIPTION | 15 +
inst/pkg-template/NAMESPACE | 9 +
inst/pkg-template/R/zzz.R | 18 +
inst/pkg-template/man/package.Rd | 40 +
inst/shinyHappyPeople/server.R | 242 ++
inst/shinyHappyPeople/ui.R | 154 ++
inst/test/testFunctionality.R | 293 ++
inst/test/testInternals.R | 146 +
inst/txt/ENST00000200135.fa.gz | Bin 0 -> 13986 bytes
inst/txt/ENST00000335953.fa.gz | Bin 0 -> 60063 bytes
inst/unitTests/test_Filters.R | 241 ++
inst/unitTests/test_Functionality.R | 507 ++++
inst/unitTests/test_GFF.R | 179 ++
inst/unitTests/test_GRangeFilter.R | 102 +
inst/unitTests/test_SymbolFilter.R | 58 +
inst/unitTests/test_buildEdb.R | 45 +
inst/unitTests/test_getGenomeFaFile.R | 49 +
inst/unitTests/test_get_sequence.R | 189 ++
inst/unitTests/test_mysql.R | 24 +
inst/unitTests/test_ordering.R | 280 ++
inst/unitTests/test_performance.R | 62 +
inst/unitTests/test_returnCols.R | 319 +++
inst/unitTests/test_select.R | 229 ++
inst/unitTests/test_transcript_lengths.R | 140 +
inst/unitTests/test_ucscChromosomeNames.R | 508 ++++
inst/unitTests/test_validity.R | 11 +
inst/unitTests/test_xByOverlap.R | 102 +
man/EnsDb-AnnotationDbi.Rd | 223 ++
man/EnsDb-class.Rd | 368 +++
man/EnsDb-exonsBy.Rd | 568 ++++
man/EnsDb-lengths.Rd | 110 +
man/EnsDb-seqlevels.Rd | 149 +
man/EnsDb-sequences.Rd | 118 +
man/EnsDb-utils.Rd | 118 +
man/EnsDb.Rd | 50 +
man/GeneidFilter-class.Rd | 451 ++++
man/SeqendFilter.Rd | 237 ++
man/listEnsDbs.Rd | 53 +
man/makeEnsemblDbPackage.Rd | 311 +++
man/runEnsDbApp.Rd | 41 +
man/useMySQL-EnsDb-method.Rd | 56 +
tests/runTests.R | 1 +
vignettes/MySQL-backend.Rmd | 74 +
vignettes/MySQL-backend.org | 88 +
vignettes/ensembldb.Rmd | 920 +++++++
vignettes/ensembldb.org | 1369 ++++++++++
vignettes/images/dblayout.png | Bin 0 -> 444031 bytes
vignettes/issues.org | 183 ++
90 files changed, 26794 insertions(+), 216 deletions(-)
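
Note on the filter constructors added in R/Classes.R: most of them (GeneidFilter, TxidFilter, SymbolFilter, ...) share one pattern. A missing value is an error, and when more than one value is passed the condition "=" is promoted to "in" and "!=" to "not in". The sketch below reproduces that pattern with base R and the methods package only. DemoFilter and the ids "id1"/"id2" are hypothetical stand-ins for illustration, not part of ensembldb.

```r
library(methods)

## Hypothetical stand-in class; it mirrors the condition/value slots
## of the BasicFilter class in R/Classes.R but is not part of ensembldb.
setClass("DemoFilter",
         representation(condition = "character", value = "character"),
         prototype = list(condition = "=", value = ""))

## Constructor following the pattern used by the filter constructors:
## promote "=" to "in" and "!=" to "not in" for multiple values.
DemoFilter <- function(value, condition = "=") {
    if (missing(value))
        stop("A filter without a value makes no sense!")
    if (length(value) > 1) {
        if (condition == "=")
            condition <- "in"
        if (condition == "!=")
            condition <- "not in"
    }
    new("DemoFilter", condition = condition, value = as.character(value))
}

single  <- DemoFilter("id1")
multi   <- DemoFilter(c("id1", "id2"))
negated <- DemoFilter(c("id1", "id2"), condition = "!=")
```

With the real package the equivalent call would presumably be GeneidFilter(c(...)), passed to methods such as genes() or transcripts(); that usage is not shown in this part of the diff.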
diff --git a/DESCRIPTION b/DESCRIPTION
new file mode 100644
index 0000000..094a900
--- /dev/null
+++ b/DESCRIPTION
@@ -0,0 +1,37 @@
+Package: ensembldb
+Type: Package
+Title: Utilities to create and use an Ensembl based annotation database
+Version: 1.6.2
+Author: Johannes Rainer <johannes.rainer at eurac.edu>,
+ Tim Triche <tim.triche at usc.edu>
+Maintainer: Johannes Rainer <johannes.rainer at eurac.edu>
+URL: https://github.com/jotsetung/ensembldb
+BugReports: https://github.com/jotsetung/ensembldb/issues
+Imports: methods, RSQLite, DBI, Biobase, GenomeInfoDb, AnnotationDbi
+ (>= 1.31.19), rtracklayer, S4Vectors, AnnotationHub, Rsamtools,
+ IRanges
+Depends: BiocGenerics (>= 0.15.10), GenomicRanges (>= 1.23.21),
+ GenomicFeatures (>= 1.23.18)
+Suggests: BiocStyle, knitr, rmarkdown, EnsDb.Hsapiens.v75 (>= 0.99.7),
+ RUnit, shiny, Gviz, BSgenome.Hsapiens.UCSC.hg19
+Enhances: RMySQL
+VignetteBuilder: knitr
+Description: The package provides functions to create and use
+ transcript centric annotation databases/packages. The
+ annotation for the databases are directly fetched from Ensembl
+ using their Perl API. The functionality and data is similar to
+ that of the TxDb packages from the GenomicFeatures package,
+ but, in addition to retrieve all gene/transcript models and
+ annotations from the database, the ensembldb package provides
+ also a filter framework allowing to retrieve annotations for
+ specific entries like genes encoded on a chromosome region or
+ transcript models of lincRNA genes.
+Collate: Classes.R Generics.R functions-utils.R dbhelpers.R Methods.R
+ Methods-Filter.R loadEnsDb.R makeEnsemblDbPackage.R
+ EnsDbFromGTF.R runEnsDbApp.R select-methods.R seqname-utils.R
+ zzz.R
+biocViews: Genetics, AnnotationData, Sequencing, Coverage
+License: LGPL
+RoxygenNote: 5.0.1
+NeedsCompilation: no
+Packaged: 2016-11-17 00:52:31 UTC; biocbuild
diff --git a/NAMESPACE b/NAMESPACE
new file mode 100644
index 0000000..02aa5b2
--- /dev/null
+++ b/NAMESPACE
@@ -0,0 +1,63 @@
+## ensembldb NAMESPACE
+import(methods)
+
+importFrom("utils", "read.table", "str")
+import(BiocGenerics)
+import(S4Vectors)
+importFrom(DBI, dbDriver)
+importFrom(Biobase, createPackage)
+importFrom(GenomeInfoDb, Seqinfo, isCircular, genome, seqlengths, seqnames, seqlevels,
+ keepSeqlevels, seqlevelsStyle, "seqlevelsStyle<-", genomeStyles)
+importMethodsFrom(AnnotationDbi, dbconn, columns, keytypes, keys, select, mapIds)
+importFrom(rtracklayer, import)
+import(RSQLite)
+import(GenomicFeatures)
+##importMethodsFrom(GenomicFeatures, extractTranscriptSeqs)
+import(GenomicRanges)
+importFrom(IRanges, IRanges)
+importMethodsFrom(IRanges,subsetByOverlaps)
+## AnnotationHub
+importFrom(AnnotationHub, AnnotationHub)
+importClassesFrom(AnnotationHub, AnnotationHub)
+importMethodsFrom(AnnotationHub, query, mcols)
+## Rsamtools
+importClassesFrom(Rsamtools, FaFile, RsamtoolsFile)
+importFrom(Rsamtools, FaFile)
+importMethodsFrom(Rsamtools, getSeq, indexFa, path)
+importFrom(Rsamtools, index)
+
+## biovizBase
+##importMethodsFrom(biovizBase, crunch)
+
+#exportPattern("^[[:alpha:]]+")
+export(fetchTablesFromEnsembl, makeEnsemblSQLiteFromTables, makeEnsembldbPackage,
+ ensDbFromGtf, ensDbFromGff, ensDbFromGRanges, ensDbFromAH, runEnsDbApp,
+ listEnsDbs)
+exportClasses(EnsDb, BasicFilter, EntrezidFilter, GeneidFilter, GenebiotypeFilter,
+ GenenameFilter, TxidFilter, TxbiotypeFilter, ExonidFilter,
+ SeqnameFilter, SeqstrandFilter, SeqstartFilter, SeqendFilter,
+ GRangesFilter, ExonrankFilter, SymbolFilter)
+## for EnsFilter
+exportMethods(column, print, show, value, where, "condition<-", "value<-",
+ seqnames, start, end, strand, seqlevels)
+## for class EnsDb:
+exportMethods(dbconn, condition, buildQuery, ensemblVersion, exons, exonsBy, genes,
+ getGenomeFaFile, lengthOf, listColumns, listGenebiotypes, listTxbiotypes,
+ listTables, organism, seqinfo, toSAF, transcripts, transcriptsBy,
+ disjointExons, metadata, promoters, cdsBy, fiveUTRsByTranscript,
+ threeUTRsByTranscript, getGeneRegionTrackForGviz, updateEnsDb,
+ transcriptsByOverlaps, exonsByOverlaps, returnFilterColumns,
+ "returnFilterColumns<-", useMySQL)
+## Methods for AnnotationDbi
+exportMethods(columns, keytypes, keys, select, mapIds)
+## Methods for GenomeInfoDb and related stuff
+exportMethods("seqlevelsStyle", "seqlevelsStyle<-", "supportedSeqlevelsStyles",
+ seqlevels)
+
+## constructors
+export(EntrezidFilter, GeneidFilter, GenenameFilter, GenebiotypeFilter, TxidFilter,
+ TxbiotypeFilter, ExonidFilter, SeqnameFilter, SeqstrandFilter, SeqstartFilter,
+ SeqendFilter, EnsDb, GRangesFilter, ExonrankFilter, SymbolFilter)
+
+
+
diff --git a/R/Classes.R b/R/Classes.R
new file mode 100644
index 0000000..ffa266a
--- /dev/null
+++ b/R/Classes.R
@@ -0,0 +1,404 @@
+##***********************************************************************
+##
+## EnsDb classes
+##
+## Main class providing access and functionality for the database.
+##
+##***********************************************************************
+setClass("EnsDb",
+ representation(ensdb="DBIConnection", tables="list", .properties="list"),
+ prototype=list(ensdb=NULL, tables=list(), .properties=list())
+ )
+
+
+##***********************************************************************
+##
+## BasicFilter classes
+##
+## Allow to filter the results fetched from the database.
+##
+## gene:
+## - GeneidFilter
+## - GenebiotypeFilter
+## - GenenameFilter
+## - EntrezidFilter
+##
+## transcript:
+## - TxidFilter
+## - TxbiotypeFilter
+##
+## exon:
+## - ExonidFilter
+##
+## chrom position (using info from exon):
+## - SeqnameFilter
+## - SeqstartFilter
+## - SeqendFilter
+## - SeqstrandFilter
+## alternative: GRangesFilter. See below.
+##
+##***********************************************************************
+setClass("BasicFilter",
+ representation(
+ "VIRTUAL",
+ condition="character",
+ value="character",
+ .valueIsCharacter="logical"
+ ),
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+
+## Table gene
+## filter for gene_id
+setClass("GeneidFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+GeneidFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("GeneidFilter", condition=condition, value=as.character(value)))
+}
+## filter for gene_biotype
+setClass("GenebiotypeFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+GenebiotypeFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("GenebiotypeFilter", condition=condition, value=as.character(value)))
+}
+## filter for gene_name
+setClass("GenenameFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+GenenameFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("GenenameFilter", condition=condition, value=as.character(value)))
+}
+## filter for entrezid
+setClass("EntrezidFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+EntrezidFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("EntrezidFilter", condition=condition, value=as.character(value)))
+}
+
+
+## Table transcript
+## filter for tx_id
+setClass("TxidFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+TxidFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("TxidFilter", condition=condition, value=as.character(value)))
+}
+## filter for tx_biotype
+setClass("TxbiotypeFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+TxbiotypeFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("TxbiotypeFilter", condition=condition, value=as.character(value)))
+}
+
+## Table exon
+## filter for exon_id
+setClass("ExonidFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+ExonidFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("ExonidFilter", condition=condition, value=as.character(value)))
+}
+
+## Table tx2exon
+## filter for exon_idx
+setClass("ExonrankFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=FALSE
+ )
+ )
+ExonrankFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(any(is.na(as.numeric(value))))
+ stop("Argument 'value' has to be numeric!")
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("ExonrankFilter", condition=condition, value=as.character(value)))
+}
+
+
+## chromosome positions
+## basic chromosome/seqname filter.
+setClass("SeqnameFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=TRUE
+ )
+ )
+## builder...
+SeqnameFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ if(condition=="=")
+ condition="in"
+ if(condition=="!=")
+ condition="not in"
+ }
+ return(new("SeqnameFilter", condition=condition, value=as.character(value)))
+}
+
+## basic chromosome strand filter.
+setClass("SeqstrandFilter", contains="BasicFilter",
+ prototype=list(
+ condition="=",
+ value="",
+ .valueIsCharacter=FALSE
+ )
+ )
+## builder...
+SeqstrandFilter <- function(value, condition="="){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ ## checking value: should be +, -, will however be translated to -1, 1
+ if(class(value)=="character"){
+ value <- match.arg(value, c("1", "-1", "+1", "-", "+"))
+ if(value=="-")
+ value <- "-1"
+ if(value=="+")
+ value <- "+1"
+ ## OK, now transforming to number
+ value <- as.numeric(value)
+ }
+ if(!(value==1 | value==-1))
+ stop("The strand has to be either 1 or -1 (or \"+\" or \"-\")")
+ return(new("SeqstrandFilter", condition=condition, value=as.character(value)))
+}
+
+## chromstart filter
+setClass("SeqstartFilter", contains="BasicFilter",
+ representation(
+ feature="character"
+ ),
+ prototype=list(
+ condition=">",
+ value="",
+ .valueIsCharacter=FALSE,
+ feature="gene"
+ )
+ )
+SeqstartFilter <- function(value, condition="=", feature="gene"){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ value <- value[ 1 ]
+ warning("Multiple values provided, but only the first (", value,") will be considered")
+ }
+ return(new("SeqstartFilter", condition=condition, value=as.character(value),
+ feature=feature))
+}
+
+## chromend filter
+setClass("SeqendFilter", contains="BasicFilter",
+ representation(
+ feature="character"
+ ),
+ prototype=list(
+ condition="<",
+ value="",
+ .valueIsCharacter=FALSE,
+ feature="gene"
+ )
+ )
+SeqendFilter <- function(value, condition="=", feature="gene"){
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1){
+ value <- value[ 1 ]
+ warning("Multiple values provided, but only the first (", value,") will be considered")
+ }
+ return(new("SeqendFilter", condition=condition, value=as.character(value),
+ feature=feature))
+}
+
+
+###============================================================
+## GRangesFilter
+## adding new arguments since we can not overwrite the data type
+## of the BasicFilter class... unfortunately.
+## + grange <- value
+## + location <- condition
+###------------------------------------------------------------
+setClass("GRangesFilter", contains="BasicFilter",
+ representation(grange="GRanges",
+ feature="character",
+ location="character"),
+ prototype=list(
+ grange=GRanges(),
+ .valueIsCharacter=FALSE,
+ condition="=",
+ location="within",
+ feature="gene",
+ value=""
+ ))
+## Constructor
+GRangesFilter <- function(value, condition="within", feature="gene"){
+ if(missing(value))
+ stop("No value provided for the filter!")
+ if(!is(value, "GRanges"))
+ stop("'value' has to be a GRanges object!")
+ if(length(value) == 0)
+ stop("No value provided for the filter!")
+ ## if(length(value) > 1){
+ ## warning(paste0("GRanges in 'value' has length ", length(value),
+ ## "! Using only the first element!"))
+ ## value <- value[1]
+ ## }
+ grf <- new("GRangesFilter", grange=value, location=condition,
+ feature=feature)
+ ##validObject(grf)
+ return(grf)
+}
+###------------------------------------------------------------
+
+
+###============================================================
+## SymbolFilter
+###------------------------------------------------------------
+setClass("SymbolFilter", contains = "BasicFilter",
+ prototype = list(
+ condition = "=",
+ value = "",
+ .valueIsCharacter = TRUE
+ )
+ )
+SymbolFilter <- function(value, condition = "=") {
+ if(missing(value)){
+ stop("A filter without a value makes no sense!")
+ }
+ if(length(value) > 1) {
+ if(condition == "=")
+ condition = "in"
+ if(condition == "!=")
+ condition = "not in"
+ }
+ return(new("SymbolFilter", condition = condition,
+ value = as.character(value)))
+}
+
+############################################################
+## OnlyCodingTx
+##
+## That's a special case filter that just returns transcripts
+## that have tx_cds_seq_start defined (i.e. not NULL).
+setClass("OnlyCodingTx", contains = "BasicFilter",
+ prototype = list(
+ condition = "=",
+ value = "",
+ .valueIsCharacter = TRUE
+ ))
+OnlyCodingTx <- function() {
+ return(new("OnlyCodingTx"))
+}
diff --git a/R/EnsDbFromGTF.R b/R/EnsDbFromGTF.R
new file mode 100644
index 0000000..c4eb887
--- /dev/null
+++ b/R/EnsDbFromGTF.R
@@ -0,0 +1,1138 @@
+####
+## function to create an EnsDb object (or rather the SQLite database) from
+## an Ensembl GTF file.
+## Limitation:
+## + There is no way to get the Entrezgene ID from this file.
+## + Assuming that the element 2 in a row for a transcript represents its biotype, since
+## there is no explicit key transcript_biotype in element 9.
+## + The CDS features in the GTF are somewhat problematic: the Ensembl Perl API provides just the
+## coding start and end for a transcript, whereas here we get the coding
+## start and end for each exon.
+ensDbFromGtf <- function(gtf, outfile, path, organism, genomeVersion, version){
+ options(useFancyQuotes=FALSE)
+ message("Importing GTF file...", appendLF=FALSE)
+ ## wanted.features <- c("gene", "transcript", "exon", "CDS")
+ wanted.features <- c("exon")
+ ## GTF <- import(con=gtf, format="gtf", feature.type=wanted.features)
+ GTF <- import(con=gtf, format="gtf")
+ message("OK")
+ ## check what we've got...
+ ## all wanted features?
+ if(any(!(wanted.features %in% levels(GTF$type)))){
+ stop(paste0("One or more required types are not in the gtf file. Need ",
+ paste(wanted.features, collapse=","), " but got only ",
+ paste(wanted.features[wanted.features %in% levels(GTF$type)], collapse=","),
+ "."))
+ }
+ ## transcript biotype?
+ if(any(colnames(mcols(GTF))=="transcript_biotype")){
+ txBiotypeCol <- "transcript_biotype"
+ }else{
+ ## that's a little weird, but it seems that certain gtf files from Ensembl
+ ## provide the transcript biotype in the element "source"
+ txBiotypeCol <- "source"
+ }
+ ## processing the metadata:
+ ## first read the header...
+ tmp <- readLines(gtf, n=10)
+ tmp <- tmp[grep(tmp, pattern="^#")]
+ haveHeader <- FALSE
+ if(length(tmp) > 0){
+ ##message("GTF file has a header.")
+ tmp <- gsub(tmp, pattern="^#", replacement="")
+ tmp <- gsub(tmp, pattern="^!", replacement="")
+ Header <- do.call(rbind, strsplit(tmp, split=" ", fixed=TRUE))
+ colnames(Header) <- c("name", "value")
+ haveHeader <- TRUE
+ }
+ ## Check parameters
+ Parms <- .checkExtractVersions(gtf, organism, genomeVersion, version)
+ ensemblVersion <- Parms["version"]
+ organism <- Parms["organism"]
+ genomeVersion <- Parms["genomeVersion"]
+
+ if(haveHeader){
+ if(genomeVersion!=Header[Header[, "name"] == "genome-version", "value"]){
+ stop(paste0("The GTF file name is not as expected: <Organism>.",
+ "<genome version>.<Ensembl version>.gtf!",
+ " I've got genome version ", genomeVersion,
+ " but in the header of the GTF file ",
+ Header[Header[, "name"] == "genome-version", "value"],
+ " is specified!"))
+ }
+ }
+
+ GTF <- fixCDStypeInEnsemblGTF(GTF)
+ ## here on -> call ensDbFromGRanges.
+ dbname <- ensDbFromGRanges(GTF, outfile=outfile, path=path, organism=organism,
+ genomeVersion=genomeVersion, version=ensemblVersion)
+
+ gtfFilename <- unlist(strsplit(gtf, split=.Platform$file.sep))
+ gtfFilename <- gtfFilename[length(gtfFilename)]
+ ## updating the Metadata information...
+ lite <- dbDriver("SQLite")
+ con <- dbConnect(lite, dbname = dbname )
+ bla <- dbGetQuery(con, paste0("update metadata set value='",
+ gtfFilename,
+ "' where name='source_file';"))
+ dbDisconnect(con)
+ return(dbname)
+}
+
+####============================================================
+## fixCDStypeInEnsemblGTF
+##
+## Takes a GRanges object as input and returns a GRanges object in
+## which the feature types stop_codon and start_codon are replaced by
+## feature type CDS. This is to fix a potential problem (bug?) in
+## GTF files from Ensembl, in which the stop_codon or start_codon for
+## some transcripts is outside of the CDS.
+####------------------------------------------------------------
+fixCDStypeInEnsemblGTF <- function(x){
+ if(any(unique(x$type) %in% c("start_codon", "stop_codon"))){
+ x$type[x$type %in% c("start_codon", "stop_codon")] <- "CDS"
+ }
+ return(x)
+}
+
+####============================================================
+## ensDbFromAH
+##
+## Retrieve a GTF file from AnnotationHub and build an EnsDb object from that.
+##
+####------------------------------------------------------------
+ensDbFromAH <- function(ah, outfile, path, organism, genomeVersion, version){
+ options(useFancyQuotes=FALSE)
+ ## Input checking...
+ if(!is(ah, "AnnotationHub"))
+ stop("Argument 'ah' has to be a (single) AnnotationHub object.")
+ if(length(ah) != 1)
+ stop("Argument 'ah' has to be a single AnnotationHub resource!")
+ if(tolower(ah$dataprovider) != "ensembl")
+ stop("Can only process GTF files provided by Ensembl!")
+ if(tolower(ah$sourcetype) != "gtf")
+ stop("Resource is not a GTF file!")
+ ## Check parameters
+ Parms <- .checkExtractVersions(ah$title, organism, genomeVersion, version)
+ ensFromAH <- Parms["version"]
+ orgFromAH <- Parms["organism"]
+ genFromAH <- Parms["genomeVersion"]
+ gtfFilename <- ah$title
+ message("Fetching data ...", appendLF=FALSE)
+ suppressMessages(
+ gff <- ah[[1]]
+ )
+ message("OK")
+ message(" -------------")
+ message("Proceeding to create the database.")
+
+ gff <- fixCDStypeInEnsemblGTF(gff)
+ ## Proceed.
+ dbname <- ensDbFromGRanges(gff, outfile=outfile, path=path, organism=orgFromAH,
+ genomeVersion=genFromAH, version=ensFromAH)
+ ## updating the Metadata information...
+ lite <- dbDriver("SQLite")
+ con <- dbConnect(lite, dbname = dbname )
+ bla <- dbGetQuery(con, paste0("update metadata set value='",
+ gtfFilename,
+ "' where name='source_file';"))
+ dbDisconnect(con)
+ return(dbname)
+}
+
+.checkExtractVersions <- function(filename, organism, genomeVersion, version){
+ if(isEnsemblFileName(filename)){
+ ensFromFile <- ensemblVersionFromGtfFileName(filename)
+ orgFromFile <- organismFromGtfFileName(filename)
+ genFromFile <- genomeVersionFromGtfFileName(filename)
+ }else{
+ ensFromFile <- NA
+ orgFromFile <- NA
+ genFromFile <- NA
+ if(missing(organism) | missing(genomeVersion) | missing(version))
+ stop("The file name does not match the expected naming scheme of Ensembl",
+ " files hence I cannot extract any information from it! Parameters",
+ " 'organism', 'genomeVersion' and 'version' are thus required!")
+ }
+ ## Do some more testing with versions provided from the user.
+ if(!missing(organism)){
+ if(!is.na(orgFromFile)){
+ if(organism != orgFromFile){
+ warning("User specified organism (", organism, ") is different to the one extracted",
+ " from the file name (", orgFromFile, ")! Using the one defined by the user.")
+ }
+ }
+ orgFromFile <- organism
+ }
+ if(!missing(genomeVersion)){
+ if(!is.na(genFromFile)){
+ if(genomeVersion != genFromFile){
+ warning("User specified genome version (", genomeVersion, ") is different to the one extracted",
+ " from the file name (", genFromFile, ")! Using the one defined by the user.")
+ }
+ }
+ genFromFile <- genomeVersion
+ }
+ if(!missing(version)){
+ if(!is.na(ensFromFile)){
+ if(version != ensFromFile){
+ warning("User specified Ensembl version (", version, ") is different to the one extracted",
+ " from the file name (", ensFromFile, ")! Using the one defined by the user.")
+ }
+ }
+ ensFromFile <- version
+ }
+ res <- c(orgFromFile, genFromFile, ensFromFile)
+ names(res) <- c("organism", "genomeVersion", "version")
+ return(res)
+}
+
+
+
+####============================================================
+##
+## ensDbFromGff
+##
+####------------------------------------------------------------
+ensDbFromGff <- function(gff, outfile, path, organism, genomeVersion, version){
+ options(useFancyQuotes=FALSE)
+
+ ## Check parameters
+ Parms <- .checkExtractVersions(gff, organism, genomeVersion, version)
+ ensFromFile <- Parms["version"]
+ orgFromFile <- Parms["organism"]
+ genFromFile <- Parms["genomeVersion"]
+ ## Reading some info from the header.
+ tmp <- readLines(gff, n=500)
+ if(length(grep(tmp[1], pattern="##gff-version")) == 0)
+ stop("File ", gff, " does not seem to be a correct GFF file! ",
+ "The ##gff-version line is missing!")
+ gffVersion <- unlist(strsplit(tmp[1], split="[ ]+"))[2]
+ if(gffVersion != "3")
+ stop("This function supports only GFF version 3 files!")
+ tmp <- tmp[grep(tmp, pattern="^#!")]
+ if(length(tmp) > 0){
+ tmp <- gsub(tmp, pattern="^#!", replacement="")
+ Header <- do.call(rbind, strsplit(tmp, split="[ ]+"))
+ colnames(Header) <- c("name", "value")
+ if(any(Header[, "name"] == "genome-version")){
+ genFromHeader <- Header[Header[, "name"] == "genome-version", "value"]
+ if(genFromHeader != genFromFile){
+ warning("Genome version extracted from file name (", genFromFile,
+ ") does not match the genome version specified inside the file (",
+ genFromHeader, "). Will consider the one defined inside the file.")
+ genFromFile <- genFromHeader
+ }
+ }
+ }
+
+ message("Importing GFF...", appendLF=FALSE)
+ suppressWarnings(
+ theGff <- import(gff, format=paste0("gff", gffVersion))
+ )
+ message("OK")
+ ## Works with Ensembl 83; eventually not for updated Ensembl gff files!
+
+ ## what seems a little strange: exons have an ID of NA.
+ ## Ensembl specific fields: gene_id, transcript_id, exon_id, rank, biotype.
+ ## GFF3 fields: type, ID, Name, Parent
+ ## check columns and subset...
+ gffcols <- c("type", "ID", "Name", "Parent")
+ if(!all(gffcols %in% colnames(mcols(theGff))))
+ stop("Required columns/fields ",
+ paste(gffcols[!(gffcols %in% colnames(mcols(theGff)))], collapse=";"),
+ " not present in the GFF file!")
+ enscols <- c("gene_id", "transcript_id", "exon_id", "rank", "biotype")
+ if(!all(enscols %in% colnames(mcols(theGff))))
+ stop("Required columns/fields ",
+ paste(enscols[!(enscols %in% colnames(mcols(theGff)))], collapse=";"),
+ " not present in the GFF file!")
+ ## Subsetting to eventually speed up further processing.
+ theGff <- theGff[, c(gffcols, enscols)]
+ ## Renaming and fixing some columns:
+ CN <- colnames(mcols(theGff))
+ colnames(mcols(theGff))[CN == "Name"] <- "gene_name"
+ colnames(mcols(theGff))[CN == "biotype"] <- "gene_biotype"
+ colnames(mcols(theGff))[CN == "rank"] <- "exon_number"
+ theGff$transcript_biotype <- theGff$gene_biotype
+
+ ## Processing that stuff...
+ ## Replace the ID format type:ID.
+ ids <- strsplit(theGff$ID, split=":")
+ message("Fixing IDs...", appendLF=FALSE)
+ ## For those that have length > 1 use the second element.
+ theGff$ID <- unlist(lapply(ids, function(z){
+ if(length(z) > 1)
+ return(z[2])
+ return(z)
+ }))
+ message("OK")
+ ## Process genes...
+ message("Processing genes...", appendLF=FALSE)
+ ## Bring the GFF into the correct format for EnsDb/ensDbFromGRanges.
+ idx <- which(!is.na(theGff$gene_id))
+ theGff$type[idx] <- "gene"
+ message("OK")
+
+ ## ## Can not use the lengths of chromosomes provided in the chromosome features!!!
+ ## ## For whatever reasons chromosome Y length is incorrect!!!
+ ## message("Processing seqinfo...", appendLF=FALSE)
+ ## SI <- seqinfo(theGff)
+ ## tmp <- theGff[theGff$ID %in% seqlevels(SI)]
+ ## ## Check if we've got length for all.
+ ## message("OK")
+
+ ## Process transcripts...
+ message("Processing transcripts...", appendLF=FALSE)
+ idx <- which(!is.na(theGff$transcript_id))
+ ## Check if I've got multiple parents...
+ parentGenes <- theGff$Parent[idx]
+ if(any(lengths(parentGenes) > 1))
+ stop("Transcripts with multiple parents in GFF element 'Parent' not (yet) supported!")
+ theGff$type[idx] <- "transcript"
+ ## Setting the gene_id for these guys...
+ theGff$gene_id[idx] <- unlist(sub(parentGenes, pattern="gene:", replacement="", fixed=TRUE))
+ ## The CDS:
+ idx <- which(theGff$type == "CDS")
+ parentTx <- theGff$Parent[idx]
+ if(any(lengths(parentTx) > 1))
+ stop("CDS with multiple parent transcripts in GFF element 'Parent' not (yet) supported!")
+ theGff$transcript_id[idx] <- unlist(sub(parentTx, pattern="transcript:", replacement="", fixed=TRUE))
+ message("OK")
+
+ message("Processing exons...", appendLF=FALSE)
+ idx <- which(!is.na(theGff$exon_id))
+ parentTx <- theGff$Parent[idx]
+ if(any(lengths(parentTx) > 1))
+ stop("Exons with multiple parent transcripts in GFF element 'Parent' not (yet) supported!")
+ theGff$transcript_id[idx] <- unlist(sub(parentTx, pattern="transcript:", replacement="", fixed=TRUE))
+ message("OK")
+
+ theGff <- theGff[theGff$type %in% c("gene", "transcript", "exon", "CDS")]
+ theGff <- keepSeqlevels(theGff, as.character(unique(seqnames(theGff))))
+ ## Now we can proceed and pass that to the next function!
+
+ message(" -------------")
+ message("Proceeding to create the database.")
+
+ ## Proceed.
+ dbname <- ensDbFromGRanges(theGff, outfile=outfile, path=path, organism=orgFromFile,
+ genomeVersion=genFromFile, version=ensFromFile)
+
+ gtfFilename <- unlist(strsplit(gff, split=.Platform$file.sep))
+ gtfFilename <- gtfFilename[length(gtfFilename)]
+ ## updating the Metadata information...
+ lite <- dbDriver("SQLite")
+ con <- dbConnect(lite, dbname = dbname )
+ bla <- dbGetQuery(con, paste0("update metadata set value='",
+ gtfFilename,
+ "' where name='source_file';"))
+ dbDisconnect(con)
+ return(dbname)
+}
+
+
+
+#### build an EnsDb SQLite database from the GRanges.
+## we can however not get all of the information from the GRanges (yet), for example,
+## the seqinfo might not be available in all GRanges objects. Also, there is no way
+## we can guess the organism or the Ensembl version from the GRanges, thus, this
+## information has to be provided by the user.
+## x: the GRanges object or file name. If file name, the function tries to guess
+## the organism, genome build and ensembl version from the file name, if not
+## provided.
+##
+ensDbFromGRanges <- function(x, outfile, path, organism, genomeVersion, version){
+ if(!is(x, "GRanges"))
+ stop("This method can only be called on GRanges objects!")
+ ## check for missing parameters
+ if(missing(organism)){
+ stop("The organism has to be specified (e.g. using organism=\"Homo_sapiens\")")
+ }
+ if(missing(version)){
+ stop("The Ensembl version has to be specified!")
+ }
+
+ ## checking the seqinfo in the GRanges object...
+ Seqinfo <- seqinfo(x)
+ fetchSeqinfo <- FALSE
+ ## check if we've got some information...
+ if(any(is.na(seqlengths(Seqinfo)))){
+ fetchSeqinfo <- TRUE ## means we have to fetch the seqinfo ourselves...
+ }
+ if(missing(genomeVersion)){
+ ## is there a seqinfo in x that I could use???
+ if(!fetchSeqinfo){
+ genomeVersion <- unique(genome(Seqinfo))
+ if(length(genomeVersion) != 1 || is.na(genomeVersion)){
+ stop(paste0("The genome version has to be specified as",
+ " it can not be extracted from the seqinfo!"))
+ }
+ }else{
+ stop("The genome version has to be specified!")
+ }
+ }
+ if(missing(outfile)){
+ ## use the organism, genome version and ensembl version as the file name.
+ outfile <- paste0(c(organism, genomeVersion, version, "sqlite"), collapse=".")
+ if(missing(path))
+ path <- "."
+ dbname <- paste0(path, .Platform$file.sep, outfile)
+ }else{
+ if(!missing(path))
+ warning("outfile specified, thus I will discard the path argument.")
+ dbname <- outfile
+ }
+
+ ## that's quite some hack
+ ## transcript biotype?
+ if(any(colnames(mcols(x))=="transcript_biotype")){
+ txBiotypeCol <- "transcript_biotype"
+ }else{
+ ## that's a little weird, but it seems that certain gtf files from Ensembl
+ ## provide the transcript biotype in the element "source"
+ txBiotypeCol <- "source"
+ }
+
+ con <- dbConnect(dbDriver("SQLite"), dbname=dbname)
+ on.exit(dbDisconnect(con))
+ ## ----------------------------
+ ## metadata table:
+ message("Processing metadata...", appendLF=FALSE)
+ Metadata <- buildMetadata(organism, version, host="unknown",
+ sourceFile="GRanges object", genomeVersion=genomeVersion)
+ dbWriteTable(con, name="metadata", Metadata, overwrite=TRUE, row.names=FALSE)
+ message("OK")
+ ## Check if we've got column "type"
+ if(!any(colnames(mcols(x)) == "type"))
+ stop("The GRanges object lacks the required column 'type', sorry.")
+ gotTypes <- as.character(unique(x$type))
+ gotColumns <- colnames(mcols(x))
+ ## ----------------------------
+ ##
+ ## process genes
+ ## we're lacking NCBI Entrezids and also the coord system, but these are not
+ ## required columns anyway...
+ message("Processing genes...")
+ ## want to have: gene_id, gene_name, entrezid, gene_biotype, gene_seq_start,
+ ## gene_seq_end, seq_name, seq_strand, seq_coord_system.
+ wouldBeNice <- c("gene_id", "gene_name", "entrezid", "gene_biotype")
+ dontHave <- wouldBeNice[!(wouldBeNice %in% gotColumns)]
+ haveGot <- wouldBeNice[wouldBeNice %in% gotColumns]
+ ## Just really require the gene_id...
+ reqCols <- c("gene_id")
+ if(length(dontHave) > 0){
+ mess <- paste0(" I'm missing column(s): ", paste0(sQuote(dontHave), collapse=","),
+ ".")
+ warning(mess, " The corresponding database column(s) will be empty!")
+ }
+ message(" Attribute availability:", appendLF=TRUE)
+ for(i in 1:length(wouldBeNice)){
+ message(" o ", wouldBeNice[i], "...",
+ ifelse(any(gotColumns == wouldBeNice[i]), yes=" OK", no=" Nope"))
+ }
+ if(!all(reqCols %in% haveGot))
+ stop(paste0("One or more required fields are not defined in the",
+ " submitted GRanges object! Need ",
+ paste(sQuote(reqCols), collapse=","), " but got only ",
+ paste(reqCols[reqCols %in% gotColumns], collapse=","),
+ "."))
+ ## Now gets tricky; special case Ensembl < 75: we've got NO gene type.
+ if(any(gotTypes == "gene")){
+ ## All is fine.
+ genes <- as.data.frame(x[x$type == "gene", haveGot])
+ }else{
+ ## Well, have to split by gene_id and process...
+ genes <- split(x[ , haveGot], x$gene_id)
+ gnRanges <- unlist(range(genes))
+ gnMcol <- as.data.frame(unique(mcols(unlist(genes))))
+ genes <- as.data.frame(gnRanges)
+ ## Adding mcols again.
+ genes <- cbind(genes, gnMcol[match(rownames(genes), gnMcol$gene_id), ])
+ rm(gnRanges)
+ rm(gnMcol)
+ }
+ colnames(genes) <- c("seq_name", "gene_seq_start", "gene_seq_end", "width",
+ "seq_strand", haveGot)
+ ## Add missing cols...
+ if(length(dontHave) > 0){
+ cn <- colnames(genes)
+ for(i in 1:length(dontHave)){
+ genes <- cbind(genes, rep(NA, nrow(genes)))
+ }
+ colnames(genes) <- c(cn, dontHave)
+ }
+ genes <- cbind(genes, seq_coord_system=rep(NA, nrow(genes)))
+
+ ## transforming seq_strand from +/- to +1, -1.
+ strand <- rep(0L, nrow(genes))
+ strand[as.character(genes$seq_strand) == "+"] <- 1L
+ strand[as.character(genes$seq_strand) == "-"] <- -1L
+ genes[ , "seq_strand"] <- strand
+ ## rearranging data.frame...
+ genes <- genes[ , c("gene_id", "gene_name", "entrezid", "gene_biotype",
+ "gene_seq_start", "gene_seq_end", "seq_name",
+ "seq_strand", "seq_coord_system")]
+ OK <- .checkIntegerCols(genes)
+ dbWriteTable(con, name="gene", genes, overwrite=TRUE, row.names=FALSE)
+ ## Done.
+
+ message("OK")
+ ## ----------------------------
+ ##
+ ## process transcripts
+ message("Processing transcripts...", appendLF=TRUE)
+ ## want to have: tx_id, tx_biotype, tx_seq_start, tx_seq_end, tx_cds_seq_start,
+ ## tx_cds_seq_end, gene_id
+ wouldBeNice <- c("transcript_id", "gene_id", txBiotypeCol)
+ dontHave <- wouldBeNice[!(wouldBeNice %in% gotColumns)]
+ if(length(dontHave) > 0){
+ mess <- paste0("I'm missing column(s): ", paste0(sQuote(dontHave), collapse=","),
+ ".")
+ warning(mess, " The corresponding database columns will be empty!")
+ }
+ haveGot <- wouldBeNice[wouldBeNice %in% gotColumns]
+ message(" Attribute availability:", appendLF=TRUE)
+ for(i in 1:length(wouldBeNice)){
+ message(" o ", wouldBeNice[i], "...",
+ ifelse(any(gotColumns == wouldBeNice[i]), yes=" OK", no=" Nope"))
+ }
+ reqCols <- c("transcript_id", "gene_id")
+ if(!all(reqCols %in% gotColumns))
+ stop(paste0("One or more required fields are not defined in",
+ " the submitted GRanges object! Need ",
+ paste(reqCols, collapse=","), " but got only ",
+ paste(reqCols[reqCols %in% gotColumns], collapse=","),
+ "."))
+ if(any(gotTypes == "transcript")){
+ tx <- as.data.frame(x[x$type == "transcript" , haveGot])
+ }else{
+ tx <- split(x[, haveGot], x$transcript_id)
+ txRanges <- unlist(range(tx))
+ txMcol <- as.data.frame(unique(mcols(unlist(tx))))
+ tx <- as.data.frame(txRanges)
+ tx <- cbind(tx, txMcol[match(rownames(tx), txMcol$transcript_id), ])
+ rm(txRanges)
+ rm(txMcol)
+ }
+ ## Drop columns seqnames, width and strand
+ tx <- tx[, -c(1, 4, 5)]
+ ## Add empty columns, eventually
+ if(length(dontHave) > 0){
+ cn <- colnames(tx)
+ for(i in 1:length(dontHave)){
+ tx <- cbind(tx, rep(NA, nrow(tx)))
+ }
+ colnames(tx) <- c(cn, dontHave)
+ }
+ ## Add columns for the CDS start and end (filled below if CDS features exist)
+ tx <- cbind(tx, tx_cds_seq_start=rep(NA, nrow(tx)), tx_cds_seq_end=rep(NA, nrow(tx)))
+ ## Process CDS...
+ if(any(gotTypes == "CDS")){
+ ## Only do that if we've got type == "CDS"!
+ ## process the CDS features to get the cds start and end of the transcript.
+ CDS <- as.data.frame(x[x$type == "CDS", "transcript_id"])
+ ##
+ startByTx <- split(CDS$start, f=CDS$transcript_id)
+ cdsStarts <- unlist(lapply(startByTx, function(z){return(min(z, na.rm=TRUE))}))
+ endByTx <- split(CDS$end, f=CDS$transcript_id)
+ cdsEnds <- unlist(lapply(endByTx, function(z){return(max(z, na.rm=TRUE))}))
+ idx <- match(names(cdsStarts), tx$transcript_id)
+ areNas <- is.na(idx)
+ idx <- idx[!areNas]
+ cdsStarts <- cdsStarts[!areNas]
+ cdsEnds <- cdsEnds[!areNas]
+ tx[idx, "tx_cds_seq_start"] <- cdsStarts
+ tx[idx, "tx_cds_seq_end"] <- cdsEnds
+ }else{
+ mess <- " I can't find type=='CDS'! The resulting database will lack CDS information!"
+ message(mess, appendLF = TRUE)
+ warning(mess)
+ }
+ colnames(tx) <- c("tx_seq_start", "tx_seq_end", "tx_id", "gene_id", "tx_biotype",
+ "tx_cds_seq_start", "tx_cds_seq_end")
+ ## rearranging data.frame:
+ tx <- tx[ , c("tx_id", "tx_biotype", "tx_seq_start", "tx_seq_end",
+ "tx_cds_seq_start", "tx_cds_seq_end", "gene_id")]
+ ## write the table.
+ OK <- .checkIntegerCols(tx)
+ dbWriteTable(con, name="tx", tx, overwrite=TRUE, row.names=FALSE)
+ rm(tx)
+ ## The CDS helper objects exist only if the input contained CDS features.
+ if(any(gotTypes == "CDS"))
+ rm(CDS, cdsStarts, cdsEnds)
+ message("OK")
+ ## ----------------------------
+ ##
+ ## process exons
+ message("Processing exons...", appendLF=FALSE)
+ reqCols <- c("exon_id", "transcript_id", "exon_number")
+ if(!all(reqCols %in% gotColumns))
+ stop(paste0("One or more required fields are not defined in",
+ " the submitted GRanges object! Need ",
+ paste(reqCols, collapse=","), " but got only ",
+ paste(reqCols[reqCols %in% gotColumns], collapse=","),
+ "."))
+ exons <- as.data.frame(x[x$type == "exon", reqCols])[, -c(1, 4, 5)]
+ ## for table tx2exon we want to have:
+ ## tx_id, exon_id, exon_idx
+ t2e <- unique(exons[ , c("transcript_id", "exon_id", "exon_number")])
+ colnames(t2e) <- c("tx_id", "exon_id", "exon_idx")
+ ## Force exon_idx to be an integer!
+ t2e[, "exon_idx"] <- as.integer(t2e[, "exon_idx"])
+ ## Cross-check that we've got the corresponding tx_ids in the tx table!
+ ## for table exons we want to have:
+ ## exon_id, exon_seq_start, exon_seq_end
+ exons <- unique(exons[ , c("exon_id", "start", "end")])
+ colnames(exons) <- c("exon_id", "exon_seq_start", "exon_seq_end")
+ ## writing the tables.
+ .checkIntegerCols(exons)
+ .checkIntegerCols(t2e)
+ dbWriteTable(con, name="exon", exons, overwrite=TRUE, row.names=FALSE)
+ dbWriteTable(con, name="tx2exon", t2e, overwrite=TRUE, row.names=FALSE)
+ message("OK")
+ ## ----------------------------
+ ##
+ ## process chromosomes
+ message("Processing chromosomes...", appendLF=FALSE)
+ if(fetchSeqinfo){
+ ## problem is I don't have these available...
+ chroms <- data.frame(seq_name=unique(as.character(genes$seq_name)))
+ chroms <- cbind(chroms, seq_length=rep(NA, nrow(chroms)),
+ is_circular=rep(NA, nrow(chroms)))
+ rownames(chroms) <- chroms$seq_name
+ ## now trying to get the sequence lengths directly from Ensembl using internal
+ ## functions from the GenomicFeatures package. I will use "try" to not break
+ ## the call if no seqlengths are available.
+ seqlengths <- tryGetSeqinfoFromEnsembl(organism, version, seqnames=chroms$seq_name)
+ if(nrow(seqlengths)>0){
+ seqlengths <- seqlengths[seqlengths[, "name"] %in% rownames(chroms), ]
+ chroms[seqlengths[, "name"], "seq_length"] <- seqlengths[, "length"]
+ }
+ }else{
+ ## have seqinfo available.
+ chroms <- data.frame(seq_name=seqnames(Seqinfo), seq_length=seqlengths(Seqinfo),
+ is_circular=isCircular(Seqinfo))
+ }
+ ## write the table.
+ dbWriteTable(con, name="chromosome", chroms, overwrite=TRUE, row.names=FALSE)
+ rm(genes)
+ message("OK")
+ message("Generating index...", appendLF=FALSE)
+ ## generating all indices...
+ .createEnsDbIndices(con)
+ message("OK")
+ message(" -------------")
+ message("Verifying validity of the information in the database:")
+ checkValidEnsDb(EnsDb(dbname))
+ return(dbname)
+}
+
+
+## helper function that checks that the gene, transcript and exon data in the
+## EnsDb database is correct (i.e. transcript within gene coordinates, exons within
+## transcript coordinates, cds within transcript)
+checkValidEnsDb <- function(x){
+ message("Checking transcripts...", appendLF=FALSE)
+ tx <- transcripts(x, columns=c("gene_id", "tx_id", "gene_seq_start", "gene_seq_end",
+ "tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
+ "tx_cds_seq_end"), return.type="DataFrame")
+ ## check if the tx are inside the genes...
+ isInside <- tx$tx_seq_start >= tx$gene_seq_start & tx$tx_seq_end <= tx$gene_seq_end
+ if(any(!isInside))
+ stop("Start and end coordinates for ", sum(!isInside),
+ " transcripts are not within the gene coordinates!")
+ ## check cds coordinates
+ notInside <- which(!(tx$tx_cds_seq_start >= tx$tx_seq_start & tx$tx_cds_seq_end <= tx$tx_seq_end))
+ if(length(notInside) > 0){
+ stop("The CDS start and end coordinates for ", length(notInside),
+ " transcripts are not within the transcript coordinates!")
+ }
+ rm(tx)
+ message("OK\nChecking exons...", appendLF=FALSE)
+ ex <- exons(x, columns=c("exon_id", "tx_id", "exon_seq_start", "exon_seq_end",
+ "tx_seq_start", "tx_seq_end", "seq_strand", "exon_idx"),
+ return.type="data.frame")
+ ## check if exons are within tx
+ isInside <- ex$exon_seq_start >= ex$tx_seq_start & ex$exon_seq_end <= ex$tx_seq_end
+ if(any(!isInside))
+ stop("Start and end coordinates for ", sum(!isInside),
+ " exons are not within the transcript coordinates!")
+ ## checking the exon index...
+ extmp <- ex[ex$seq_strand==1, c("exon_idx", "tx_id", "exon_seq_start")]
+ extmp <- extmp[order(extmp$exon_seq_start), ]
+ extmp.split <- split(extmp[ , c("exon_idx")], f=factor(extmp$tx_id))
+ Different <- unlist(lapply(extmp.split, FUN=function(z){
+ return(any(z != seq(1, length(z))))
+ }))
+ if(any(Different)){
+ stop(paste0("Provided exon index in transcript does not match with ordering",
+ " of the exons by chromosomal coordinates for ",
+ sum(Different), " of the ", length(Different),
+ " transcripts encoded on the + strand!"))
+ }
+ extmp <- ex[ex$seq_strand==-1, c("exon_idx", "tx_id", "exon_seq_end")]
+ extmp <- extmp[order(extmp$exon_seq_end, decreasing=TRUE), ]
+ extmp.split <- split(extmp[ , c("exon_idx")], f=factor(extmp$tx_id))
+ Different <- unlist(lapply(extmp.split, FUN=function(z){
+ return(any(z != seq(1, length(z))))
+ }))
+ if(any(Different)){
+ stop(paste0("Provided exon index in transcript does not match with ordering",
+ " of the exons by chromosomal coordinates for ",
+ sum(Different), " of the ", length(Different),
+ " transcripts encoded on the - strand!"))
+ }
+ message("OK")
+}
+
+
+## organism is expected to be e.g. Homo_sapiens, so the full organism name, with
+## _ as a separator
+tryGetSeqinfoFromEnsembl <- function(organism, ensemblVersion, seqnames){
+ ## Quick fix if organism contains whitespace instead of _:
+ organism <- gsub(organism, pattern=" ", replacement="_", fixed=TRUE)
+ Dataset <- paste0(c(tolower(.abbrevOrganismName(organism)), "gene_ensembl"),
+ collapse="_")
+ message("Fetch seqlengths from ensembl, dataset ", Dataset, " version ",
+ ensemblVersion, "...", appendLF=FALSE)
+ ## get it all from the ensemblgenomes.org host???
+ tmp <- try(
+ GenomicFeatures:::fetchChromLengthsFromEnsembl(dataset=Dataset,
+ release=ensemblVersion,
+ extra_seqnames=seqnames),
+ silent=TRUE)
+ if(inherits(tmp, "try-error")){
+ message(paste0("Unable to get sequence lengths from Ensembl for dataset: ",
+ Dataset, ". Error was: ", conditionMessage(attr(tmp, "condition")), "\n"))
+ }else{
+ message("OK")
+ return(tmp)
+ }
+ ## try plant genomes...
+ tmp <- try(
+ GenomicFeatures:::fetchChromLengthsFromEnsemblPlants(dataset=Dataset,
+ extra_seqnames=seqnames),
+ silent=TRUE)
+ if(inherits(tmp, "try-error")){
+ message(paste0("Unable to get sequence lengths from Ensembl plants for dataset: ",
+ Dataset, ". Error was: ", conditionMessage(attr(tmp, "condition")), "\n"))
+ }else{
+ message("OK")
+ return(tmp)
+ }
+ message("FAIL")
+ return(matrix(ncol=2, nrow=0))
+}
+
+buildMetadata <- function(organism="", ensemblVersion="", genomeVersion="",
+ host="", sourceFile=""){
+ MetaData <- data.frame(matrix(ncol=2, nrow=12))
+ colnames(MetaData) <- c("name", "value")
+ MetaData[1, ] <- c("Db type", "EnsDb")
+ MetaData[2, ] <- c("Type of Gene ID", "Ensembl Gene ID")
+ MetaData[3, ] <- c("Supporting package", "ensembldb")
+ MetaData[4, ] <- c("Db created by", "ensembldb package from Bioconductor")
+ MetaData[5, ] <- c("script_version", "0.0.1")
+ MetaData[6, ] <- c("Creation time", date())
+ MetaData[7, ] <- c("ensembl_version", ensemblVersion)
+ MetaData[8, ] <- c("ensembl_host", host)
+ MetaData[9, ] <- c("Organism", organism )
+ MetaData[10, ] <- c("genome_build", genomeVersion)
+ MetaData[11, ] <- c("DBSCHEMAVERSION", "1.0")
+ MetaData[12, ] <- c("source_file", sourceFile)
+ return(MetaData)
+}
+
+## Compare the contents of two EnsDb objects x and y, e.g. one generated from a
+## GTF file and the corresponding one provided by an annotation package.
+compareEnsDbs <- function(x, y){
+ ## compare two EnsDbs...
+ if(organism(x)!=organism(y))
+ stop("Well, at least the organism should be the same for both databases!")
+ Messages <- rep("OK", 5)
+ names(Messages) <- c("metadata", "chromosome", "gene", "transcript", "exon")
+ ## comparing metadata.
+ metadataX <- metadata(x)
+ metadataY <- metadata(y)
+ rownames(metadataX) <- metadataX[, 1]
+ rownames(metadataY) <- metadataY[, 1]
+ metadataY <- metadataY[rownames(metadataX),]
+ cat("\nComparing metadata:\n")
+ idx <- which(metadataX[, "value"]!=metadataY[, "value"])
+ if(length(idx)>0)
+ Messages["metadata"] <- "NOTE"
+ ## check ensembl version
+ if(metadataX["ensembl_version", "value"] == metadataY["ensembl_version", "value"]){
+ cat(" Ensembl versions match.\n")
+ }else{
+ cat(" WARNING: databases are based on different Ensembl versions! Expect considerable differences!\n")
+ Messages["metadata"] <- "WARN"
+ }
+ ## genome build
+ if(metadataX["genome_build", "value"] == metadataY["genome_build", "value"]){
+ cat(" Genome builds match.\n")
+ }else{
+ cat(" WARNING: databases are based on different genome builds! Expect considerable differences!\n")
+ Messages["metadata"] <- "WARN"
+ }
+ if(length(idx)>0){
+ cat(" All differences: <name>: <value x> != <value y>\n")
+ for(i in idx){
+ cat(paste(" - ", metadataX[i, "name"], ":", metadataX[i, "value"], " != ",
+ metadataY[i, "value"], "\n"))
+ }
+ }
+ cat(paste0("Done. Result: ", Messages["metadata"],"\n"))
+ ## now comparing chromosomes
+ Messages["chromosome"] <- compareChromosomes(x, y)
+ ## comparing genes
+ Messages["gene"] <- compareGenes(x, y)
+ ## comparing transcripts
+ Messages["transcript"] <- compareTx(x, y)
+ ## comparing exons
+ Messages["exon"] <- compareExons(x, y)
+ return(Messages)
+}
+
+
+compareChromosomes <- function(x, y){
+ Ret <- "OK"
+ cat("\nComparing chromosome data:\n")
+ chromX <- as.data.frame(seqinfo(x))
+ chromY <- as.data.frame(seqinfo(y))
+ ## compare seqnames
+ inboth <- rownames(chromX)[rownames(chromX) %in% rownames(chromY)]
+ onlyX <- rownames(chromX)[!(rownames(chromX) %in% rownames(chromY))]
+ onlyY <- rownames(chromY)[!(rownames(chromY) %in% rownames(chromX))]
+ if(length(onlyX) > 0 | length(onlyY) > 0)
+ Ret <- "WARN"
+ cat(paste0( " Sequence names: (", length(inboth), ") common, (",
+ length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n" ))
+ same <- length(which(chromX[inboth, "seqlengths"]==chromY[inboth, "seqlengths"]))
+ different <- length(inboth) - same
+ cat(paste0( " Sequence lengths: (",same, ") identical, (", different, ") different.\n" ))
+ if(different > 0)
+ Ret <- "WARN"
+ cat(paste0("Done. Result: ", Ret,"\n"))
+ return(Ret)
+}
+
+compareGenes <- function(x, y){
+ cat("\nComparing gene data:\n")
+ Ret <- "OK"
+ genesX <- genes(x)
+ genesY <- genes(y)
+ inboth <- names(genesX)[names(genesX) %in% names(genesY)]
+ onlyX <- names(genesX)[!(names(genesX) %in% names(genesY))]
+ onlyY <- names(genesY)[!(names(genesY) %in% names(genesX))]
+ if(length(onlyX) > 0 | length(onlyY) > 0)
+ Ret <- "WARN"
+ cat(paste0(" gene IDs: (", length(inboth), ") common, (",
+ length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n"))
+ ## seq names
+ same <- length(
+ which(as.character(seqnames(genesX[inboth]))==as.character(seqnames(genesY[inboth])))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Sequence names: (",same, ") identical, (", different, ") different.\n" ))
+ ## start
+ same <- length(
+ which(start(genesX[inboth]) == start(genesY[inboth]))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Gene start coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## end
+ same <- length(
+ which(end(genesX[inboth]) == end(genesY[inboth]))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Gene end coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## strand
+ same <- length(
+ which(as.character(strand(genesX[inboth]))
+ == as.character(strand(genesY[inboth])))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Gene strand: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## name
+ same <- length(
+ which(genesX[inboth]$gene_name == genesY[inboth]$gene_name)
+ )
+ different <- length(inboth) - same
+ if(different > 0 & Ret!="ERROR")
+ Ret <- "WARN"
+ cat(paste0( " Gene names: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## entrezid
+ same <- length(
+ which(genesX[inboth]$entrezid == genesY[inboth]$entrezid)
+ )
+ different <- length(inboth) - same
+ if(different > 0 & Ret!="ERROR")
+ Ret <- "WARN"
+ cat(paste0( " Entrezgene IDs: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## gene biotype
+ same <- length(
+ which(genesX[inboth]$gene_biotype == genesY[inboth]$gene_biotype)
+ )
+ different <- length(inboth) - same
+ if(different > 0 & Ret!="ERROR")
+ Ret <- "WARN"
+ cat(paste0( " Gene biotypes: (",same,
+ ") identical, (", different, ") different.\n" ))
+ cat(paste0("Done. Result: ", Ret,"\n"))
+ return(Ret)
+}
+
+compareTx <- function(x, y){
+ cat("\nComparing transcript data:\n")
+ Ret <- "OK"
+ txX <- transcripts(x)
+ txY <- transcripts(y)
+ inboth <- names(txX)[names(txX) %in% names(txY)]
+ onlyX <- names(txX)[!(names(txX) %in% names(txY))]
+ onlyY <- names(txY)[!(names(txY) %in% names(txX))]
+ if(length(onlyX) > 0 | length(onlyY) > 0)
+ Ret <- "WARN"
+ cat(paste0(" transcript IDs: (", length(inboth), ") common, (",
+ length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n"))
+ ## start
+ same <- length(
+ which(start(txX[inboth]) == start(txY[inboth]))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Transcript start coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## end
+ same <- length(
+ which(end(txX[inboth]) == end(txY[inboth]))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Transcript end coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## tx biotype
+ same <- length(
+ which(txX[inboth]$tx_biotype == txY[inboth]$tx_biotype)
+ )
+ different <- length(inboth) - same
+ if(different > 0 & Ret!="ERROR")
+ Ret <- "WARN"
+ cat(paste0( " Transcript biotypes: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## cds start
+ ## Makes sense to just compare for those that have the same tx!
+ txXSub <- txX[inboth]
+ txYSub <- txY[inboth]
+ txCdsX <- names(txXSub)[!is.na(txXSub$tx_cds_seq_start)]
+ txCdsY <- names(txYSub)[!is.na(txYSub$tx_cds_seq_start)]
+ cdsInBoth <- txCdsX[txCdsX %in% txCdsY]
+ cdsOnlyX <- txCdsX[!(txCdsX %in% txCdsY)]
+ cdsOnlyY <- txCdsY[!(txCdsY %in% txCdsX)]
+ if((length(cdsOnlyX) > 0 | length(cdsOnlyY) > 0) & Ret!="ERROR")
+ Ret <- "ERROR"
+ cat(paste0(" Common transcripts with defined CDS: (",length(cdsInBoth), ") common, (",
+ length(cdsOnlyX), ") only in x, (", length(cdsOnlyY), ") only in y.\n"))
+ same <- length(
+ which(txX[cdsInBoth]$tx_cds_seq_start == txY[cdsInBoth]$tx_cds_seq_start)
+ )
+ different <- length(cdsInBoth) - same
+ if(different > 0 & Ret!="ERROR")
+ Ret <- "ERROR"
+ cat(paste0( " CDS start coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## cds end
+ same <- length(
+ which(txX[cdsInBoth]$tx_cds_seq_end == txY[cdsInBoth]$tx_cds_seq_end)
+ )
+ different <- length(cdsInBoth) - same
+ if(different > 0 & Ret!="ERROR")
+ Ret <- "ERROR"
+ cat(paste0( " CDS end coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## gene id
+ same <- length(
+ which(txX[inboth]$gene_id == txY[inboth]$gene_id)
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Associated gene IDs: (",same,
+ ") identical, (", different, ") different.\n" ))
+ cat(paste0("Done. Result: ", Ret,"\n"))
+ return(Ret)
+}
+
+compareExons <- function(x, y){
+ cat("\nComparing exon data:\n")
+ Ret <- "OK"
+ exonX <- exons(x)
+ exonY <- exons(y)
+ inboth <- names(exonX)[names(exonX) %in% names(exonY)]
+ onlyX <- names(exonX)[!(names(exonX) %in% names(exonY))]
+ onlyY <- names(exonY)[!(names(exonY) %in% names(exonX))]
+ if(length(onlyX) > 0 | length(onlyY) > 0)
+ Ret <- "WARN"
+ cat(paste0(" exon IDs: (", length(inboth), ") common, (",
+ length(onlyX), ") only in x, (", length(onlyY), ") only in y.\n"))
+ ## start
+ same <- length(
+ which(start(exonX[inboth]) == start(exonY[inboth]))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Exon start coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## end
+ same <- length(
+ which(end(exonX[inboth]) == end(exonY[inboth]))
+ )
+ different <- length(inboth) - same
+ if(different > 0)
+ Ret <- "ERROR"
+ cat(paste0( " Exon end coordinates: (",same,
+ ") identical, (", different, ") different.\n" ))
+ ## now getting also the exon index in tx:
+ exonX <- exons(x, columns=c("exon_id", "tx_id", "exon_idx"),
+ return.type="DataFrame")
+ rownames(exonX) <- paste(exonX$tx_id, exonX$exon_id, sep=":")
+ exonY <- exons(y, columns=c("exon_id", "tx_id", "exon_idx"),
+ return.type="DataFrame")
+ rownames(exonY) <- paste(exonY$tx_id, exonY$exon_id, sep=":")
+ inboth <- rownames(exonX)[rownames(exonX) %in% rownames(exonY)]
+ onlyX <- rownames(exonX)[!(rownames(exonX) %in% rownames(exonY))]
+ onlyY <- rownames(exonY)[!(rownames(exonY) %in% rownames(exonX))]
+
+ ## tx exon idx
+ same <- length(
+ which(exonX[inboth, ]$exon_idx == exonY[inboth, ]$exon_idx)
+ )
+ different <- length(inboth) - same
+ if(different > 0 )
+ Ret <- "ERROR"
+ cat(paste0( " Exon index in transcript models: (",same,
+ ") identical, (", different, ") different.\n" ))
+ cat(paste0("Done. Result: ", Ret,"\n"))
+ return(Ret)
+}
+
+####============================================================
+## isEnsemblFileName
+##
+## evaluate whether the file name is "most likely" corresponding
+## to a file name from Ensembl, i.e. following the convention
+## <organism>.<genome version>.<ensembl version>.[chr].gff/gtf.gz
+## The problem is that the genome version can also be . separated.
+####------------------------------------------------------------
+isEnsemblFileName <- function(x){
+ x <- file.name(x)
+ ## If we split by ., do we get at least 4 elements?
+ els <- unlist(strsplit(x, split=".", fixed=TRUE))
+ if(length(els) < 4)
+ return(FALSE)
+ ## Can we get an Ensembl version?
+ ensVer <- ensemblVersionFromGtfFileName(x)
+ if(is.na(ensVer))
+ return(FALSE)
+ ## If we got one, do we still have enough fields left of the version?
+ idx <- which(els == ensVer)
+ idx <- idx[length(idx)]
+ if(idx < 3){
+ ## No way, we're missing the organism and the genome build field!
+ return(FALSE)
+ }
+ ## Well, can not think of any other torture... let's assume it's OK.
+ return(TRUE)
+}
+organismFromGtfFileName <- function(x){
+ return(elementFromEnsemblFilename(x, 1))
+}
+####============================================================
+## ensemblVersionFromGtfFileName
+##
+## Tries to extract the Ensembl version from the file name. If it
+## finds a numeric value it returns it, otherwise it returns NA.
+####------------------------------------------------------------
+ensemblVersionFromGtfFileName <- function(x){
+ x <- file.name(x)
+ els <- unlist(strsplit(x, split=".", fixed=TRUE))
+ ## Ensembl version is the last numeric value in the file name.
+ for(elm in rev(els)){
+ suppressWarnings(
+ if(!is.na(as.numeric(elm))){
+ return(elm)
+ }
+ )
+ }
+ return(NA)
+}
+####============================================================
+## genomeVersionFromGtfFileName
+##
+## the genome build can also contain .! thus, I return everything which is not
+## the first element (i.e. organism), or the ensembl version, that is one left of
+## the gtf.
+genomeVersionFromGtfFileName <- function(x){
+ x <- file.name(x)
+ els <- unlist(strsplit(x, split=".", fixed=TRUE))
+ ensVer <- ensemblVersionFromGtfFileName(x)
+ if(is.na(ensVer)){
+ stop("Can not extract the genome version from the file name!",
+ " The file name does not follow the expected naming convention from Ensembl!")
+ }
+ idx <- which(els == ensVer)
+ idx <- idx[length(idx)]
+ if(idx < 3)
+ stop("Can not extract the genome version from the file name!",
+ " The file name does not follow the expected naming convention from Ensembl!")
+ return(paste(els[2:(idx-1)], collapse="."))
+}
+old_ensemblVersionFromGtfFileName <- function(x){
+ tmp <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
+ splitty <- unlist(strsplit(tmp[length(tmp)], split=".", fixed=TRUE))
+ return(splitty[(grep(splitty, pattern="gtf")-1)])
+}
+
+## the genome build can also contain .! thus, I return everything which is not
+## the first element (i.e. organism), or the ensembl version, that is one left of
+## the gtf.
+old_genomeVersionFromGtfFileName <- function(x){
+ tmp <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
+ splitty <- unlist(strsplit(tmp[length(tmp)], split=".", fixed=TRUE))
+ gvparts <- splitty[2:(grep(splitty, pattern="gtf")-2)]
+ return(paste(gvparts, collapse="."))
+}
+
+## Returns NULL if there was a problem.
+elementFromEnsemblFilename <- function(x, which=1){
+ tmp <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
+ splitty <- unlist(strsplit(tmp[length(tmp)], split=".", fixed=TRUE))
+ if(length(splitty) < which){
+ warning("File ", x, " does not conform to the Ensembl file naming convention.")
+ return(NULL)
+ }
+ return(splitty[which])
+}
+
+file.name <- function(x){
+ fn <- unlist(strsplit(x, split=.Platform$file.sep, fixed=TRUE))
+ fn <- fn[length(fn)]
+ return(fn)
+}
diff --git a/R/Generics.R b/R/Generics.R
new file mode 100644
index 0000000..d420a43
--- /dev/null
+++ b/R/Generics.R
@@ -0,0 +1,147 @@
+##***********************************************************************
+##
+## Generic methods
+##
+##***********************************************************************
+if(!isGeneric("column"))
+ setGeneric("column", function(object, db, with.tables, ...)
+ standardGeneric("column"))
+if(!isGeneric("buildQuery"))
+ setGeneric("buildQuery", function(x, ...)
+ standardGeneric("buildQuery"))
+if(!isGeneric("cleanColumns"))
+ setGeneric("cleanColumns", function(x, columns, ...)
+ standardGeneric("cleanColumns"))
+if(!isGeneric("condition"))
+ setGeneric("condition", function(x, ...)
+ standardGeneric("condition"))
+setGeneric("condition<-", function(x, value)
+ standardGeneric("condition<-"))
+setGeneric("dbSeqlevelsStyle", function(x, ...)
+ standardGeneric("dbSeqlevelsStyle"))
+
+if(!isGeneric("genes"))
+ setGeneric("genes", function(x, ...)
+ standardGeneric("genes"))
+if(!isGeneric("getWhat"))
+ setGeneric("getWhat", function(x, ...)
+ standardGeneric("getWhat"))
+if(!isGeneric("ensemblVersion"))
+ setGeneric("ensemblVersion", function(x)
+ standardGeneric("ensemblVersion"))
+if(!isGeneric("exons"))
+ setGeneric("exons", function(x, ...)
+ standardGeneric("exons"))
+if(!isGeneric("exonsBy"))
+ setGeneric("exonsBy", function(x, ...)
+ standardGeneric("exonsBy"))
+
+setGeneric("getGeneRegionTrackForGviz", function(x, ...)
+ standardGeneric("getGeneRegionTrackForGviz"))
+
+if(!isGeneric("getGenomeFaFile"))
+ setGeneric("getGenomeFaFile", function(x, ...)
+ standardGeneric("getGenomeFaFile"))
+if(!isGeneric("getGenomeTwoBitFile"))
+ setGeneric("getGenomeTwoBitFile", function(x, ...)
+ standardGeneric("getGenomeTwoBitFile"))
+if(!isGeneric("getMetadataValue"))
+ setGeneric("getMetadataValue", function(x, name)
+ standardGeneric("getMetadataValue"))
+if(!isGeneric("listColumns")){
+ setGeneric("listColumns", function(x, ...)
+ standardGeneric("listColumns"))
+}
+if(!isGeneric("listGenebiotypes")){
+ setGeneric("listGenebiotypes", function(x, ...)
+ standardGeneric("listGenebiotypes"))
+}
+if(!isGeneric("listTxbiotypes")){
+ setGeneric("listTxbiotypes", function(x, ...)
+ standardGeneric("listTxbiotypes"))
+}
+if(!isGeneric("lengthOf"))
+ setGeneric("lengthOf", function(x, ...)
+ standardGeneric("lengthOf"))
+if(!isGeneric("print"))
+ setGeneric("print", function(x, ...)
+ standardGeneric("print"))
+if(!isGeneric("requireTable"))
+ setGeneric("requireTable", function(x, db, ...)
+ standardGeneric("requireTable"))
+
+setGeneric("supportedSeqlevelsStyles", function(x)
+ standardGeneric("supportedSeqlevelsStyles"))
+
+if(!isGeneric("seqinfo"))
+ setGeneric("seqinfo", function(x)
+ standardGeneric("seqinfo"))
+if(!isGeneric("show"))
+ setGeneric("show", function(object, ...)
+ standardGeneric("show"))
+if(!isGeneric("toSAF"))
+ setGeneric("toSAF", function(x, ...)
+ standardGeneric("toSAF"))
+if(!isGeneric("listTables")){
+ setGeneric("listTables", function(x, ...)
+ standardGeneric("listTables"))
+}
+
+setGeneric("returnFilterColumns", function(x)
+ standardGeneric("returnFilterColumns"))
+setGeneric("returnFilterColumns<-", function(x, value)
+ standardGeneric("returnFilterColumns<-"))
+
+if(!isGeneric("tablesByDegree")){
+ setGeneric("tablesByDegree", function(x, ...)
+ standardGeneric("tablesByDegree"))
+}
+if(!isGeneric("tablesForColumns"))
+ setGeneric("tablesForColumns", function(x, attributes, ...)
+ standardGeneric("tablesForColumns"))
+
+if(!isGeneric("transcriptLengths"))
+ setGeneric("transcriptLengths", function(x, with.cds_len=FALSE,
+ with.utr5_len=FALSE,
+ with.utr3_len=FALSE, ...)
+ standardGeneric("transcriptLengths"))
+
+if(!isGeneric("transcripts"))
+ setGeneric("transcripts", function(x, ...)
+ standardGeneric("transcripts"))
+if(!isGeneric("transcriptsBy"))
+ setGeneric("transcriptsBy", function(x, ...)
+ standardGeneric("transcriptsBy"))
+setGeneric("updateEnsDb", function(x, ...)
+ standardGeneric("updateEnsDb"))
+##if(!isGeneric("value"))
+ setGeneric("value", function(x, db, ...)
+ standardGeneric("value"))
+setGeneric("value<-", function(x, value)
+ standardGeneric("value<-"))
+if(!isGeneric("where"))
+ setGeneric("where", function(object, db, with.tables, ...)
+ standardGeneric("where"))
+
+####============================================================
+## Private methods
+##
+####------------------------------------------------------------
+setGeneric("properties", function(x, ...)
+ standardGeneric("properties"))
+## setGeneric("properties<-", function(x, name, value, ...)
+## standardGeneric("properties<-"))
+setGeneric("getProperty", function(x, name=NULL, ...)
+ standardGeneric("getProperty"))
+setGeneric("setProperty", function(x, value=NULL, ...)
+ standardGeneric("setProperty"))
+setGeneric("formatSeqnamesForQuery", function(x, sn, ...)
+ standardGeneric("formatSeqnamesForQuery"))
+setGeneric("formatSeqnamesFromQuery", function(x, sn, ...)
+ standardGeneric("formatSeqnamesFromQuery"))
+setGeneric("orderResultsInR", function(x)
+ standardGeneric("orderResultsInR"))
+setGeneric("orderResultsInR<-", function(x, value)
+ standardGeneric("orderResultsInR<-"))
+setGeneric("useMySQL", function(x, host = "localhost", port = 3306, user, pass)
+ standardGeneric("useMySQL"))
diff --git a/R/Methods-Filter.R b/R/Methods-Filter.R
new file mode 100644
index 0000000..ebae8c0
--- /dev/null
+++ b/R/Methods-Filter.R
@@ -0,0 +1,933 @@
+##***********************************************************************
+##
+## Methods for BasicFilter classes.
+##
+##***********************************************************************
+validateConditionFilter <- function(object){
+ if(object@.valueIsCharacter){
+ ## condition has to be one of =, !=, in, not in or like
+ if(!any(c("=", "in", "not in", "like", "!=")==object@condition)){
+ return(paste("only \"=\", \"!=\", \"in\" , \"not in\" and \"like\"",
+ "allowed for condition",
+ ", I've got", object@condition))
+ }
+ }else{
+ ## condition has to be one of = > < >= <=, in or not in
+ if(!any(c("=", ">", "<", ">=", "<=", "in", "not in")==object@condition)){
+ return(paste("only \"=\", \">\", \"<\", \">=\", \"<=\" , \"in\" and \"not in\"",
+ " are allowed for condition, I've got", object@condition))
+ }
+ }
+ if(length(object@value) > 1){
+ if(any(!object@condition %in% c("in", "not in")))
+ return(paste("only \"in\" and \"not in\" are allowed if value",
+ "is a vector with more than one value!"))
+ }
+ if(!object@.valueIsCharacter){
+ vals <- object@value
+ if(length(vals) == 1){
+ if(vals == ""){
+ vals <- "0"
+ }
+ }
+ ## value has to be numeric!!!
+ suppressWarnings(
+ if(any(is.na(as.numeric(vals))))
+ return(paste("value has to be numeric!!!"))
+ )
+ }
+ return(TRUE)
+}
+setValidity("BasicFilter", validateConditionFilter)
+setMethod("initialize", "BasicFilter", function(.Object, ...){
+ OK <- validateConditionFilter(.Object)
+ if(is.character(OK)){
+ stop(OK)
+ }
+ callNextMethod(.Object, ...)
+})
+
+.where <- function(object, db=NULL){
+ if(is.null(db)){
+ Vals <- value(object)
+ }else{
+ Vals <- value(object, db)
+ }
+ ## if not a number we have to single quote!
+ if(object@.valueIsCharacter){
+ Vals <- sQuote(gsub(unique(Vals),pattern="'",replacement="''"))
+ }else{
+ Vals <- unique(Vals)
+ }
+ ## check if there is more than one value; in that case concatenate them and put () around
+ if(length(Vals) > 1){
+ Vals <- paste0("(", paste(Vals, collapse=",") ,")")
+ }
+ return(paste(condition(object), Vals))
+}
+setMethod("where", signature(object="BasicFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return(.where(object))
+})
+setMethod("where", signature(object="BasicFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return(.where(object, db=db))
+})
+setMethod("where", signature(object="BasicFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(.where(object, db=db))
+})
+setMethod("condition", "BasicFilter", function(x, ...){
+ if(length(unique(value(x))) > 1){
+ if(x@condition=="in" | x@condition=="not in")
+ return(x@condition)
+ if(x@condition=="!="){
+ return("not in")
+ }else if(x@condition=="="){
+ return("in")
+ }else{
+ stop("With more than 1 value only conditions \"=\" and \"!=\" are allowed!")
+ }
+ }else{
+ ## check first if we do have "in" or "not in" and, if so,
+ ## cast it to = and != respectively
+ if(x@condition=="in")
+ return("=")
+ if(x@condition=="not in")
+ return("!=")
+ return(x@condition)
+ }
+})
+setReplaceMethod("condition", "BasicFilter", function(x, value){
+ if(x@.valueIsCharacter){
+ allowed <- c("=", "!=", "in", "not in", "like")
+ if(!any(allowed == value)){
+ stop("Only ", paste(allowed, collapse=", "), " are allowed if the value from",
+ " the filter is of type character!")
+ }
+ if(value == "=" & length(x@value) > 1)
+ value <- "in"
+ if(value == "!=" & length(x@value) > 1)
+ value <- "not in"
+ if(value == "in" & length(x@value) == 1)
+ value <- "="
+ if(value == "not in" & length(x@value) == 1)
+ value <- "!="
+ }else{
+ allowed <- c("=", ">", "<", ">=", "<=")
+ if(!any(allowed == value)){
+ stop("Only ", paste(allowed, collapse=", "), " are allowed if the value from",
+ " the filter is numeric!")
+ }
+ }
+ x@condition <- value
+ validObject(x)
+ return(x)
+})
+setMethod("value", signature(x="BasicFilter", db="missing"),
+ function(x, db, ...){
+ return(x@value)
+ })
+setMethod("value", signature(x="BasicFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(x@value)
+ })
+setReplaceMethod("value", "BasicFilter", function(x, value){
+ if(is.numeric(value)){
+ x@.valueIsCharacter <- FALSE
+ }else{
+ x@.valueIsCharacter <- TRUE
+ }
+ x@value <- as.character(value)
+ ## Checking if condition matches the value.
+ if(length(value) > 1){
+ if(x@condition == "=")
+ x@condition <- "in"
+ if(x@condition == "!=")
+ x@condition <- "not in"
+ }else{
+ if(x@condition == "in")
+ x@condition <- "="
+ if(x@condition == "not in")
+ x@condition <- "!="
+ }
+ ## Test validity
+ validObject(x)
+ return(x)
+})
+## setMethod("requireTable", "EnsFilter", function(object, ...){
+## return(object@required.table)
+## })
+setMethod("print", "BasicFilter", function(x, ...){
+ show(x)
+})
+setMethod("show", "BasicFilter", function(object){
+ cat("| Object of class:", class(object), "\n")
+ cat("| condition:", object@condition, "\n")
+ cat("| value:", value(object), "\n")
+})
+
+##***********************************************************************
+##
+## where for a list.
+##
+##***********************************************************************
+setMethod("where", signature(object="list",db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ wherequery <- paste(" where", paste(unlist(lapply(object, where)),
+ collapse=" and "))
+ return(wherequery)
+ })
+setMethod("where", signature(object="list",db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ wherequery <- paste(" where", paste(unlist(lapply(object, where, db)),
+ collapse=" and "))
+ return(wherequery)
+ })
+setMethod("where", signature(object="list",db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ wherequery <- paste(" where", paste(unlist(lapply(object, where, db,
+ with.tables=with.tables)),
+ collapse=" and "))
+ return(wherequery)
+ })
+
+
+
+##***********************************************************************
+##
+## Methods for GeneidFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="GeneidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="GeneidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("gene_id")
+ })
+setMethod("where", signature(object="GeneidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="GeneidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="GeneidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables), suff))
+ })
+setMethod("column", signature("GeneidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+##***********************************************************************
+##
+## Methods for EntrezidFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="EntrezidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="EntrezidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("entrezid")
+ })
+setMethod("where", signature(object="EntrezidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="EntrezidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="EntrezidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature("EntrezidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+##***********************************************************************
+##
+## Methods for GenebiotypeFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="GenebiotypeFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="GenebiotypeFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("gene_biotype")
+ })
+setMethod("where", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="GenebiotypeFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+##***********************************************************************
+##
+## Methods for GenenameFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="GenenameFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="GenenameFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("gene_name")
+ })
+setMethod("where", signature(object="GenenameFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="GenenameFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="GenenameFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="GenenameFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+
+##***********************************************************************
+##
+## Methods for TxidFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="TxidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="TxidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("tx_id")
+ })
+setMethod("where", signature(object="TxidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="TxidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="TxidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="TxidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+
+##***********************************************************************
+##
+## Methods for TxbiotypeFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="TxbiotypeFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="TxbiotypeFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("tx_biotype")
+ })
+setMethod("where", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="TxbiotypeFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+
+##***********************************************************************
+##
+## Methods for ExonidFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="ExonidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="ExonidFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("exon_id")
+ })
+setMethod("where", signature(object="ExonidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="ExonidFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="ExonidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="ExonidFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+##***********************************************************************
+##
+## Methods for ExonrankFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="ExonrankFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="ExonrankFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("exon_idx")
+ })
+setMethod("where", signature(object="ExonrankFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="ExonrankFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="ExonrankFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="ExonrankFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+setReplaceMethod("value", "ExonrankFilter", function(x, value){
+ if(any(is.na(as.numeric(value))))
+ stop("Argument 'value' has to be numeric!")
+ x@value <- value
+ validObject(x)
+ return(x)
+})
+
+
+##***********************************************************************
+##
+## Methods for SeqnameFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="SeqnameFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="SeqnameFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("seq_name")
+ })
+setMethod("where", signature(object="SeqnameFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="SeqnameFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="SeqnameFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="SeqnameFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+## Overwriting the value method allows us to fix chromosome names (e.g. with prefix chr)
+## to be usable for EnsDb and Ensembl based chromosome names (i.e. without chr).
+setMethod("value", signature(x="SeqnameFilter", db="EnsDb"),
+ function(x, db, ...){
+ val <- formatSeqnamesForQuery(db, value(x))
+ if(any(is.na(val))){
+ stop("A value of <NA> is not allowed for a SeqnameFilter!")
+ }
+ return(val)
+ ##return(ucscToEns(value(x)))
+ })
+
+
+##***********************************************************************
+##
+## Methods for SeqstrandFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="SeqstrandFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="SeqstrandFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ return("seq_strand")
+ })
+setMethod("where", signature(object="SeqstrandFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="SeqstrandFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="SeqstrandFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="SeqstrandFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+##***********************************************************************
+##
+## Methods for SeqstartFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="SeqstartFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="SeqstartFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ ## assuming that we follow the naming convention:
+ ## <feature>_seq_start for the naming of the database columns.
+ feature <- object@feature
+ feature <- match.arg(feature, c("gene", "transcript", "exon", "tx"))
+ if(object@feature=="transcript")
+ feature <- "tx"
+ return(paste0(feature, "_seq_start"))
+ })
+setMethod("where", signature(object="SeqstartFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="SeqstartFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="SeqstartFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="SeqstartFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+
+##***********************************************************************
+##
+## Methods for SeqendFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object="SeqendFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+ })
+setMethod("column", signature(object="SeqendFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ ## assuming that we follow the naming convention:
+ ## <feature>_seq_end for the naming of the database columns.
+ feature <- object@feature
+ feature <- match.arg(feature, c("gene", "transcript", "exon", "tx"))
+ if(object@feature=="transcript")
+ feature <- "tx"
+ return(paste0(feature, "_seq_end"))
+ })
+setMethod("where", signature(object="SeqendFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("column", signature(object="SeqendFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="SeqendFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables=with.tables), suff))
+ })
+setMethod("column", signature(object="SeqendFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ return(unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE))
+ })
+
+
+###============================================================
+## Methods for GRangesFilter
+## + show
+## + condition
+## + value
+## + where
+## + column
+## + start
+## + end
+## + seqnames
+## + strand
+###------------------------------------------------------------
+## Overwrite the validation method.
+setValidity("GRangesFilter", function(object){
+ if(!any(object@location == c("within", "overlapping") )){
+ return(paste0("Argument condition should be either 'within' or 'overlapping'! Got ",
+ object@location, "!"))
+ }
+ ## GRanges has to have valid values for start, end and seqnames!
+ if(length(start(object)) == 0)
+ return("start coordinate of the range is missing!")
+ if(length(end(object)) == 0)
+ return("end coordinate of the range is missing!")
+ if(length(seqnames(object)) == 0)
+ return("A valid seqname is required from the submitted GRanges!")
+ return(TRUE)
+})
+setMethod("show", "GRangesFilter", function(object){
+ cat("| Object of class:" , class(object), "\n")
+ cat("| region:\n")
+ cat("| + start:", paste0(start(object), collapse=", "), "\n")
+ cat("| + end: ", paste0(end(object), collapse=", "), "\n")
+ cat("| + seqname:", paste0(seqnames(object), collapse=", "), "\n")
+ cat("| + strand: ", paste0(strand(object), collapse=", "), "\n")
+ cat("| condition:", condition(object), "\n")
+})
+setMethod("condition", "GRangesFilter", function(x, ...){
+ return(x@location)
+})
+setReplaceMethod("condition", "GRangesFilter", function(x, value){
+ value <- match.arg(value, c("within", "overlapping"))
+ x@location <- value
+ validObject(x)
+ return(x)
+})
+setMethod("value", signature(x="GRangesFilter", db="missing"),
+ function(x, db, ...){
+ return(x@grange)
+ })
+setMethod("value", signature(x="GRangesFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(x@grange)
+ })
+setMethod("start", signature(x="GRangesFilter"),
+ function(x, ...){
+ return(start(value(x)))
+ })
+setMethod("end", signature(x="GRangesFilter"),
+ function(x, ...){
+ return(end(value(x)))
+ })
+setMethod("strand", signature(x="GRangesFilter"),
+ function(x, ...){
+ strnd <- as.character(strand(value(x)))
+ return(strnd)
+ })
+setMethod("seqnames", signature(x="GRangesFilter"),
+ function(x){
+ return(as.character(seqnames(value(x))))
+ })
+setMethod("seqlevels", signature(x="GRangesFilter"),
+ function(x){
+ return(seqlevels(value(x)))
+ })
+## The column method for GRangesFilter returns all columns required for the query, i.e.
+## the _seq_start, _seq_end for the feature, seq_name and seq_strand.
+## Note: this method has to return a named vector!
+setMethod("column", signature(object="GRangesFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ ## assuming that we follow the naming convention:
+ ## <feature>_seq_end for the naming of the database columns.
+ feature <- object@feature
+ feature <- match.arg(feature, c("gene", "transcript", "exon", "tx"))
+ if(object@feature=="transcript")
+ feature <- "tx"
+ cols <- c(start=paste0(feature, "_seq_start"),
+ end=paste0(feature, "_seq_end"),
+ seqname="seq_name",
+ strand="seq_strand")
+ return(cols)
+ })
+setMethod("column", signature(object="GRangesFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables=tn))
+ })
+## Providing also the columns.
+setMethod("column", signature(object="GRangesFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ cols <- unlist(prefixColumns(db, column(object), with.tables=with.tables),
+ use.names=FALSE)
+ ## We have to give the vector the required names!
+ names(cols) <- seq_along(cols)
+ names(cols)[grep(cols, pattern="seq_name")] <- "seqname"
+ names(cols)[grep(cols, pattern="seq_strand")] <- "strand"
+ names(cols)[grep(cols, pattern="seq_start")] <- "start"
+ names(cols)[grep(cols, pattern="seq_end")] <- "end"
+ return(cols[c("start", "end", "seqname", "strand")])
+ })
+## Where for GRangesFilter only.
+setMethod("where", signature(object="GRangesFilter", db="missing", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ ## Get the names of the columns we're going to query.
+ cols <- column(object)
+ query <- buildWhereForGRanges(object, cols)
+ return(query)
+ })
+setMethod("where", signature(object="GRangesFilter", db="EnsDb", with.tables="missing"),
+ function(object, db, with.tables, ...){
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables=tn))
+ })
+setMethod("where", signature(object="GRangesFilter", db="EnsDb", with.tables="character"),
+ function(object, db, with.tables, ...){
+ cols <- column(object, db, with.tables)
+ query <- buildWhereForGRanges(object, cols, db=db)
+ return(query)
+ })
+
+
+## grf: GRangesFilter
+buildWhereForGRanges <- function(grf, columns, db=NULL){
+ condition <- condition(grf)
+ if(!any(condition == c("within", "overlapping")))
+ stop(paste0("'condition' for GRangesFilter should either be ",
+ "'within' or 'overlapping', got ", condition, "."))
+ if(is.null(names(columns))){
+ stop(paste0("The vector with the required column names for the",
+ " GRangesFilter query has to have names!"))
+ }
+ if(!all(c("start", "end", "seqname", "strand") %in% names(columns)))
+ stop(paste0("'columns' has to be a named vector with names being ",
+ "'start', 'end', 'seqname', 'strand'!"))
+ ## Build the query to fetch all features that are located within the range
+ quers <- sapply(value(grf), function(z){
+ if(!is.null(db)){
+ seqn <- formatSeqnamesForQuery(db, as.character(seqnames(z)))
+ }else{
+ seqn <- as.character(seqnames(z))
+ }
+ if(condition == "within"){
+ query <- paste0(columns["start"], " >= ", start(z), " and ",
+ columns["end"], " <= ", end(z), " and ",
+ columns["seqname"], " = '", seqn, "'")
+ }
+ ## Build the query to fetch all features (partially) overlapping the range. This
+ ## includes also all features (genes or transcripts) that have an intron at that
+ ## position.
+ if(condition == "overlapping"){
+ query <- paste0(columns["start"], " <= ", end(z), " and ",
+ columns["end"], " >= ", start(z), " and ",
+ columns["seqname"], " = '", seqn, "'")
+ }
+ ## Include the strand, if it's not "*"
+ if(as.character(strand(z)) != "*"){
+ query <- paste0(query, " and ", columns["strand"], " = ",
+ strand2num(as.character(strand(z))))
+ }
+ return(query)
+ })
+ ## Wrap each per-range query in parentheses and collapse them with "or".
+ if(length(quers) > 1)
+ quers <- paste0("(", quers, ")")
+ query <- paste0(quers, collapse=" or ")
+ return(query)
+}
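The query construction above can be sketched in base R without any Bioconductor classes. This is only an illustration under stated assumptions: `rangeCondition` and its arguments are hypothetical names, the column names follow the `<feature>_seq_start` convention described in the comments above, and the range is passed as plain values instead of a `GRanges` object.

```r
## Hypothetical helper (not part of ensembldb): builds the SQL condition
## for a single genomic range given as plain values.
rangeCondition <- function(start, end, seqname, strand = "*",
                           condition = c("within", "overlapping"),
                           feature = "gene") {
    condition <- match.arg(condition)
    startCol <- paste0(feature, "_seq_start")
    endCol <- paste0(feature, "_seq_end")
    if (condition == "within") {
        ## The feature lies completely inside the range.
        query <- paste0(startCol, " >= ", start, " and ",
                        endCol, " <= ", end)
    } else {
        ## The feature shares at least one position with the range.
        query <- paste0(startCol, " <= ", end, " and ",
                        endCol, " >= ", start)
    }
    query <- paste0(query, " and seq_name = '", seqname, "'")
    ## Restrict by strand only if one was specified.
    if (strand != "*")
        query <- paste0(query, " and seq_strand = ",
                        if (strand == "+") 1 else -1)
    query
}

rangeCondition(100, 200, "Y", condition = "within")
## "gene_seq_start >= 100 and gene_seq_end <= 200 and seq_name = 'Y'"
rangeCondition(100, 200, "Y", strand = "-", condition = "overlapping")
## "gene_seq_start <= 200 and gene_seq_end >= 100 and seq_name = 'Y' and seq_strand = -1"
```

The "overlapping" test only compares the range boundaries, which is why a feature spanning the whole range (for example a gene with an intron at that position) is also returned.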
+
+
+
+## Map the chromosome strand between its character ("+", "-") and numeric (1, -1) form.
+strand2num <- function(x){
+ if(x == "+" || x == "-"){
+ return(as.numeric(paste0(x, 1)))
+ }else{
+ stop("Only '+' and '-' supported!")
+ }
+}
+num2strand <- function(x){
+ if(x < 0){
+ return("-")
+ }else{
+ return("+")
+ }
+}
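The two helpers above handle a single value at a time. A vectorised variant might look like the following sketch; the names `strandToNum` and `numToStrand` are hypothetical and chosen so they do not clash with the functions in the patch.

```r
## Hypothetical vectorised variants of the strand mapping.
strandToNum <- function(x) {
    if (!all(x %in% c("+", "-")))
        stop("Only '+' and '-' supported!")
    ifelse(x == "+", 1, -1)
}
numToStrand <- function(x) {
    ## Negative values map to "-", everything else to "+".
    ifelse(x < 0, "-", "+")
}

strandToNum(c("+", "-", "+"))   # 1 -1 1
numToStrand(c(1, -1))           # "+" "-"
```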
+
+##***********************************************************************
+##
+## Methods for SymbolFilter classes.
+##
+##***********************************************************************
+setMethod("where", signature(object = "SymbolFilter", db = "missing",
+ with.tables = "missing"),
+ function(object, db, with.tables, ...) {
+ suff <- callNextMethod()
+ return(paste(column(object), suff))
+})
+setMethod("column", signature(object = "SymbolFilter", db = "missing",
+ with.tables = "missing"),
+ function(object, db, with.tables, ...) {
+ return("symbol")
+})
+setMethod("where", signature(object = "SymbolFilter", db = "EnsDb",
+ with.tables = "missing"),
+ function(object, db, with.tables, ...) {
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables = tn))
+})
+setMethod("column", signature(object = "SymbolFilter", db = "EnsDb",
+ with.tables = "missing"),
+ function(object, db, with.tables, ...) {
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables = tn))
+})
+setMethod("where", signature(object = "SymbolFilter", db = "EnsDb",
+ with.tables="character"),
+ function(object, db, with.tables, ...) {
+ suff <- callNextMethod()
+ return(paste(column(object, db, with.tables = with.tables), suff))
+})
+setMethod("column", signature(object = "SymbolFilter", db = "EnsDb",
+ with.tables = "character"),
+ function(object, db, with.tables, ...) {
+ return(unlist(prefixColumns(db, "gene_name",
+ with.tables = with.tables),
+ use.names = FALSE))
+})
+
+##***********************************************************************
+##
+## Methods for OnlyCodingTx classes.
+##
+##***********************************************************************
+setMethod("where", signature(object = "OnlyCodingTx", db = "EnsDb",
+ with.tables = "missing"),
+ function(object, db, with.tables, ...) {
+ tn <- names(listTables(db))
+ return(where(object, db, with.tables = tn))
+})
+setMethod("column", signature(object = "OnlyCodingTx", db = "EnsDb",
+ with.tables = "missing"),
+ function(object, db, with.tables, ...) {
+ tn <- names(listTables(db))
+ return(column(object, db, with.tables = tn))
+})
+setMethod("where", signature(object = "OnlyCodingTx", db = "EnsDb",
+ with.tables="character"),
+ function(object, db, with.tables, ...) {
+ ## Hard coded.
+ return("tx.tx_cds_seq_start is not null")
+})
+setMethod("column", signature(object = "OnlyCodingTx", db = "EnsDb",
+ with.tables = "character"),
+ function(object, db, with.tables, ...) {
+ return("tx.tx_cds_seq_start")
+})
diff --git a/R/Methods.R b/R/Methods.R
new file mode 100644
index 0000000..bb7c255
--- /dev/null
+++ b/R/Methods.R
@@ -0,0 +1,1758 @@
+##***********************************************************************
+##
+## Methods for EnsDb classes
+##
+##***********************************************************************
+setMethod("show", "EnsDb", function(object) {
+ if (is.null(object@ensdb)) {
+ cat("Dash it! Got an empty thing!\n")
+ } else {
+ info <- dbGetQuery(object@ensdb, "select * from metadata")
+ cat("EnsDb for Ensembl:\n")
+ if (inherits(object@ensdb, "SQLiteConnection"))
+ cat(paste0("|Backend: SQLite\n"))
+ if (inherits(object@ensdb, "MySQLConnection"))
+ cat(paste0("|Backend: MySQL\n"))
+ for (i in seq_len(nrow(info))) {
+ cat(paste0("|", info[ i, "name" ], ": ",
+ info[ i, "value" ], "\n"))
+ }
+ ## gene and transcript info.
+ cat(paste0("| No. of genes: ",
+ dbGetQuery(object@ensdb,
+ "select count(distinct gene_id) from gene")[1, 1], ".\n"))
+ cat(paste0("| No. of transcripts: ",
+ dbGetQuery(object@ensdb,
+ "select count(distinct tx_id) from tx")[1, 1], ".\n"))
+ }
+})
+
+setMethod("organism", "EnsDb", function(object){
+ Species <- .getMetaDataValue(object@ensdb, "Organism")
+ ## Reformat e.g. the string homo_sapiens into Homo sapiens.
+ Species <- gsub(Species, pattern="_", replacement=" ", fixed=TRUE)
+ Species <- .organismName(Species)
+ return(Species)
+})
+
+setMethod("metadata", "EnsDb", function(x, ...){
+ Res <- dbGetQuery(dbconn(x), "select * from metadata")
+ return(Res)
+})
+#####
+## Validation
+##
+validateEnsDb <- function(object){
+ ## check if the database contains all required tables...
+ if(!is.null(object@ensdb)){
+ OK <- dbHasRequiredTables(object@ensdb)
+ if (is.character(OK))
+ return(OK)
+ OK <- dbHasValidTables(object@ensdb)
+ if (is.character(OK))
+ return(OK)
+ }
+ return(TRUE)
+}
+setValidity("EnsDb", validateEnsDb)
+setMethod("initialize", "EnsDb", function(.Object,...){
+ OK <- validateEnsDb(.Object)
+ if(is.character(OK)){
+ stop(OK)
+ }
+ callNextMethod(.Object, ...)
+})
+
+### connection:
+## returns the connection object to the SQL database
+setMethod("dbconn", "EnsDb", function(x){
+ return(x@ensdb)
+})
+
+### ensemblVersion
+## returns the ensembl version of the package.
+setMethod("ensemblVersion", "EnsDb", function(x){
+ eVersion <- getMetadataValue(x, "ensembl_version")
+ return(eVersion)
+})
+### getMetadataValue
+## returns the metadata value for the specified name/key
+setMethod("getMetadataValue", "EnsDb", function(x, name){
+ if(missing(name))
+ stop("Argument name has to be specified!")
+ return(metadata(x)[metadata(x)$name==name, "value"])
+})
+
+### seqinfo
+## returns the sequence/chromosome information from the database.
+setMethod("seqinfo", "EnsDb", function(x){
+ Chrs <- dbGetQuery(dbconn(x), "select * from chromosome")
+ Chr.build <- .getMetaDataValue(dbconn(x), "genome_build")
+ Chrs$seq_name <- formatSeqnamesFromQuery(x, Chrs$seq_name)
+ SI <- Seqinfo(seqnames=Chrs$seq_name,
+ seqlengths=Chrs$seq_length,
+ isCircular=Chrs$is_circular==1, genome=Chr.build)
+ return(SI)
+})
+
+### seqlevels
+setMethod("seqlevels", "EnsDb", function(x){
+ Chrs <- dbGetQuery(dbconn(x), "select distinct seq_name from chromosome")
+ Chrs <- formatSeqnamesFromQuery(x, Chrs$seq_name)
+ return(Chrs)
+})
+
+### getGenomeFaFile
+## queries the dna.toplevel.fa file from AnnotationHub matching the current
+## Ensembl version
+## Update: if we can't find a FaFile matching the Ensembl version we suggest ones
+## that might match.
+setMethod("getGenomeFaFile", "EnsDb", function(x, pattern="dna.toplevel.fa"){
+ ah <- AnnotationHub()
+ ## Reduce the AnnotationHub to species, provider and genome version.
+ ah <- .reduceAH(ah, organism=organism(x), dataprovider="Ensembl",
+ genome=unique(genome(x)))
+ if(length(ah) == 0)
+ stop("Cannot find any resources in AnnotationHub for organism: ",
+ organism(x), ", data provider: Ensembl and genome version: ",
+ unique(genome(x)), "!")
+ ## Reduce to all FaFile resources.
+ ah <- ah[ah$rdataclass == "FaFile", ]
+ if(length(ah) == 0)
+ stop("No FaFiles available in AnnotationHub for organism: ",
+ organism(x), ", data provider: Ensembl and genome version: ",
+ unique(genome(x)), "! You might also try to use the",
+ " 'getGenomeTwoBitFile' method instead.")
+ ## Reduce to dna.toplevel or dna.primary_assembly.
+ idx <- c(grep(ah$title, pattern="dna.toplevel"),
+ grep(ah$title, pattern="dna.primary_assembly"))
+ if(length(idx) == 0)
+ stop("No genome assembly fasta file available for organism: ",
+ organism(x), ", data provider: Ensembl and genome version: ",
+ unique(genome(x)), "!")
+ ah <- ah[idx, ]
+ ## Get the Ensembl version from the source url.
+ ensVers <- .ensVersionFromSourceUrl(ah$sourceurl)
+ if(any(ensVers == ensemblVersion(x))){
+ ## Got it; use the first match if several files share that version.
+ itIs <- which(ensVers == ensemblVersion(x))[1]
+ }else{
+ ## Get the "closest" one.
+ diffs <- abs(ensVers - as.numeric(ensemblVersion(x)))
+ itIs <- which(diffs == min(diffs))[1]
+ message("Returning the Fasta file for Ensembl version ", ensVers[itIs],
+ " since no file for Ensembl version ", ensemblVersion(x),
+ " is available.")
+ }
+ ## Getting the resource.
+ Dna <- ah[[names(ah)[itIs]]]
+ ## generate an index if none is available
+ if(is.na(index(Dna))){
+ indexFa(Dna)
+ Dna <- FaFile(path(Dna))
+ }
+ return(Dna)
+})
+## Just restricting the Annotation Hub to entries matching the species and the
+## genome; not yet the Ensembl version.
+.reduceAH <- function(ah, organism=NULL, dataprovider="Ensembl",
+ genome=NULL){
+ if(!is.null(dataprovider))
+ ah <- ah[ah$dataprovider == dataprovider, ]
+ if(!is.null(organism))
+ ah <- ah[ah$species == organism, ]
+ if(!is.null(genome))
+ ah <- ah[ah$genome == genome, ]
+ return(ah)
+}
+.ensVersionFromSourceUrl <- function(url){
+ url <- strsplit(url, split="/", fixed=TRUE)
+ ensVers <- unlist(lapply(url, function(z){
+ idx <- grep(z, pattern="^release")
+ if(length(idx) == 0)
+ return(-1)
+ return(as.numeric(unlist(strsplit(z[idx], split="-"))[2]))
+ }))
+ return(ensVers)
+}
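The version lookup used by the two genome file methods (parse the release number from the source URL, then take the exact or the closest version) can be tried out in isolation. The sketch below uses base R only; the URLs are purely illustrative, and `versionFromUrl` is a hypothetical stand-in for the internal helper above.

```r
## Illustrative URLs only; real AnnotationHub source URLs may differ.
urls <- c("ftp://ftp.ensembl.org/pub/release-75/fasta/homo_sapiens/dna/file.fa.gz",
          "ftp://ftp.ensembl.org/pub/release-81/fasta/homo_sapiens/dna/file.fa.gz",
          "ftp://example.org/no/version/here.fa.gz")

## Hypothetical helper: extract the number following "release-" from each
## URL, or -1 if the URL has no such path element.
versionFromUrl <- function(url) {
    parts <- strsplit(url, split = "/", fixed = TRUE)
    vapply(parts, function(z) {
        idx <- grep("^release", z)
        if (length(idx) == 0)
            return(-1)
        as.numeric(strsplit(z[idx[1]], split = "-", fixed = TRUE)[[1]][2])
    }, numeric(1))
}

vers <- versionFromUrl(urls)   # 75 81 -1

## Use the exact version if present, otherwise the closest one.
wanted <- 79
hit <- which(vers == wanted)
if (length(hit) == 0) {
    diffs <- abs(vers - wanted)
    hit <- which(diffs == min(diffs))
}
chosen <- vers[hit[1]]         # 81
```

Taking `hit[1]` keeps the result a single index even if several files share the same version.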
+
+####============================================================
+## getGenomeTwoBitFile
+##
+## Search and retrieve a genomic DNA resource through a TwoBitFile
+## from AnnotationHub.
+####------------------------------------------------------------
+setMethod("getGenomeTwoBitFile", "EnsDb", function(x){
+ ah <- AnnotationHub()
+ ## Reduce the AnnotationHub to species, provider and genome version.
+ ah <- .reduceAH(ah, organism=organism(x), dataprovider="Ensembl",
+ genome=unique(genome(x)))
+ if(length(ah) == 0)
+ stop("Cannot find any resources in AnnotationHub for organism: ",
+ organism(x), ", data provider: Ensembl and genome version: ",
+ unique(genome(x)), "!")
+ ## Reduce to all TwoBitFile resources.
+ ah <- ah[ah$rdataclass == "TwoBitFile", ]
+ if(length(ah) == 0)
+ stop("No TwoBitFile available in AnnotationHub for organism: ",
+ organism(x), ", data provider: Ensembl and genome version: ",
+ unique(genome(x)), "!")
+ ## Reduce to dna.toplevel or dna.primary_assembly.
+ idx <- c(grep(ah$title, pattern="dna.toplevel"),
+ grep(ah$title, pattern="dna.primary_assembly"))
+ if(length(idx) == 0)
+ stop("No genome assembly TwoBit file available for organism: ",
+ organism(x), ", data provider: Ensembl and genome version: ",
+ unique(genome(x)), "!")
+ ah <- ah[idx, ]
+ ## Get the Ensembl version from the source url.
+ ensVers <- .ensVersionFromSourceUrl(ah$sourceurl)
+ if(any(ensVers == ensemblVersion(x))){
+ ## Got it; use the first match if several files share that version.
+ itIs <- which(ensVers == ensemblVersion(x))[1]
+ }else{
+ ## Get the "closest" one.
+ diffs <- abs(ensVers - as.numeric(ensemblVersion(x)))
+ itIs <- which(diffs == min(diffs))[1]
+ message("Returning the TwoBit file for Ensembl version ", ensVers[itIs],
+ " since no file for Ensembl version ", ensemblVersion(x),
+ " is available.")
+ }
+ ## Getting the resource.
+ Dna <- ah[[names(ah)[itIs]]]
+ return(Dna)
+})
+
+
+
+### listTables
+## returns a named list with database table columns
+setMethod("listTables", "EnsDb", function(x, ...){
+ if(length(x@tables)==0){
+ tables <- dbListTables(dbconn(x))
+ ## Quick fix for EnsDbs containing also protein data (issue #30):
+ tables <- tables[!(tables %in% c("protein", "uniprot",
+ "protein_domain"))]
+ ## read the columns for these tables.
+ Tables <- vector(length=length(tables), "list")
+ for(i in seq_along(Tables)){
+ Tables[[ i ]] <- colnames(dbGetQuery(dbconn(x),
+ paste0("select * from ",
+ tables[ i ],
+ " limit 1")))
+ }
+ names(Tables) <- tables
+ x@tables <- Tables
+ }
+ Tab <- x@tables
+ Tab <- Tab[tablesByDegree(x, tab=names(Tab))]
+ ## Manually add tx_name as a "virtual" column; getWhat will insert the tx_id into that.
+ Tab$tx <- unique(c(Tab$tx, "tx_name"))
+ ## Manually add the symbol as a "virtual" column.
+ Tab$gene <- unique(c(Tab$gene, "symbol"))
+ return(Tab)
+})
+
+### listColumns
+## lists all columns.
+setMethod("listColumns", "EnsDb", function(x,
+ table,
+ skip.keys=TRUE, ...){
+ if(length(x@tables)==0){
+ tables <- dbListTables(dbconn(x))
+ ## Quick fix for EnsDbs containing also protein data (issue #30):
+ tables <- tables[!(tables %in% c("protein", "uniprot",
+ "protein_domain"))]
+ ## read the columns for these tables.
+ Tables <- vector(length=length(tables), "list")
+ for(i in seq_along(Tables)){
+ Tables[[ i ]] <- colnames(dbGetQuery(dbconn(x),
+ paste0("select * from ",
+ tables[ i ],
+ " limit 1")))
+ }
+ names(Tables) <- tables
+ x@tables <- Tables
+ }
+ Tab <- x@tables
+ ## Manually add tx_name as a "virtual" column; getWhat will insert the tx_id into that.
+ Tab$tx <- unique(c(Tab$tx, "tx_name"))
+ ## Manually add the symbol as a "virtual" column.
+ Tab$gene <- unique(c(Tab$gene, "symbol"))
+ if(!missing(table)){
+ columns <- Tab[[ table ]]
+ }else{
+ columns <- unlist(Tab, use.names=FALSE)
+ }
+ if(skip.keys){
+ ## remove everything that has a _pk or _fk...
+ idx <- grep(columns, pattern="_fk$")
+ if(length(idx) > 0)
+ columns <- columns[ -idx ]
+ idx <- grep(columns, pattern="_pk$")
+ if(length(idx) > 0)
+ columns <- columns[ -idx ]
+ }
+ return(columns)
+})
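The `skip.keys` step above removes key columns in two passes and guards each with `length(idx) > 0`, because indexing with `-integer(0)` would drop every element. A logical index avoids that guard. The following is a small sketch with made-up column names:

```r
## Made-up column names for illustration.
columns <- c("gene_id", "gene_name", "tx_id", "gene_fk", "exon_pk", "seq_name")

## Keep everything that does not end in _pk or _fk.
keep <- !grepl("_(pk|fk)$", columns)
columns[keep]
## "gene_id" "gene_name" "tx_id" "seq_name"
```

If no column matches, `keep` is all `TRUE` and the vector is returned unchanged.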
+
+setMethod("listGenebiotypes", "EnsDb", function(x, ...){
+ return(dbGetQuery(dbconn(x), "select distinct gene_biotype from gene")[,1])
+})
+setMethod("listTxbiotypes", "EnsDb", function(x, ...){
+ return(dbGetQuery(dbconn(x), "select distinct tx_biotype from tx")[,1])
+})
+
+### cleanColumns
+## checks columns and removes all that are not present in database tables
+## the method checks internally whether the columns are in the full form,
+## i.e. gene.gene_id (<table name>.<column name>)
+setMethod("cleanColumns", "EnsDb", function(x,
+ columns, ...){
+ if(missing(columns))
+ stop("No columns submitted!")
+ ## vote of the majority
+ full.name <- length(grep(columns, pattern=".", fixed=TRUE)) >
+ floor(length(columns) /2)
+ if(full.name){
+ suppressWarnings(
+ full.columns <- unlist(prefixColumns(x,
+ unlist(listTables(x)),
+ clean=FALSE),
+ use.names=TRUE)
+ )
+ bm <- columns %in% full.columns
+ removed <- columns[ !bm ]
+ }else{
+ bm <- columns %in% unlist(listTables(x)[ c("gene", "tx", "exon",
+ "tx2exon", "chromosome") ])
+ removed <- columns[ !bm ]
+ }
+ if(length(removed) > 0){
+ warning("Columns ", paste(sQuote(removed), collapse=", "),
+ " are not valid and have been removed")
+ }
+ return(columns[ bm ])
+})
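The "vote of the majority" used above to decide whether the submitted columns are in the full `<table name>.<column name>` form can be illustrated on its own. `mostlyPrefixed` is a hypothetical name for this sketch:

```r
## Hypothetical helper: TRUE if more than half of the column names
## contain a "." and thus look like <table name>.<column name>.
mostlyPrefixed <- function(columns) {
    nPrefixed <- sum(grepl(".", columns, fixed = TRUE))
    nPrefixed > floor(length(columns) / 2)
}

mostlyPrefixed(c("gene.gene_id", "tx.tx_id", "seq_name"))   # TRUE
mostlyPrefixed(c("gene.gene_id", "seq_name"))               # FALSE (a tie)
```

A tie counts as "not prefixed", so mixed input with equal shares is checked against the plain column names.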
+
+### tablesForColumns
+## returns the tables for the specified columns.
+setMethod("tablesForColumns", "EnsDb", function(x, columns, ...){
+ if(missing(columns))
+ stop("No columns submitted!")
+ bm <- unlist(lapply(listTables(x), function(z){
+ return(any(z %in% columns))
+ }))
+ if(!any(bm))
+ return(NULL)
+ Tables <- names(bm)[ bm ]
+ Tables <- Tables[ !(Tables %in% c("metadata")) ]
+ return(Tables)
+})
+
+## returns the table names ordered by degree, i.e. edges to other tables
+setMethod("tablesByDegree", "EnsDb", function(x,
+ tab=names(listTables(x)),
+ ...){
+ ## ## to do this with a graph:
+ ## DBgraph <- graphNEL(nodes=c("gene", "tx", "tx2exon", "exon", "chromosome", "information"),
+ ## edgeL=list(gene=c("tx", "chromosome"),
+ ## tx=c("gene", "tx2exon"),
+ ## tx2exon=c("tx", "exon"),
+ ## exon="tx2exon",
+ ## chromosome="gene"
+ ## ))
+ ## Tab <- names(sort(degree(DBgraph), decreasing=TRUE))
+ Table.order <- c(gene=1, tx=2, tx2exon=3, exon=4, chromosome=5, metadata=6)
+ ##Table.order <- c(gene=2, tx=1, tx2exon=3, exon=4, chromosome=5, metadata=6)
+ Tab <- tab[ order(Table.order[ tab ]) ]
+ return(Tab)
+})
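The ordering by a fixed rank vector used in `tablesByDegree` can be reproduced in a few lines of base R. This sketch copies the rank vector from the method above; the input vector `tab` is an arbitrary example.

```r
## Rank of each table, as hard coded in the method above.
tableOrder <- c(gene = 1, tx = 2, tx2exon = 3, exon = 4,
                chromosome = 5, metadata = 6)

tab <- c("exon", "chromosome", "gene", "tx")
tab[order(tableOrder[tab])]
## "gene" "tx" "exon" "chromosome"
```

Table names without an entry in the rank vector yield `NA` and are therefore placed last by `order`.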
+
+
+
+
+### genes:
+## get genes from the database.
+setMethod("genes", "EnsDb", function(x,
+ columns=listColumns(x, "gene"),
+ filter, order.by="",
+ order.type="asc",
+ return.type="GRanges"){
+ return.type <- match.arg(return.type, c("data.frame", "GRanges", "DataFrame"))
+ columns <- unique(c(columns, "gene_id"))
+ ## if return.type is GRanges we require columns: seq_name, gene_seq_start
+ ## and gene_seq_end and seq_strand
+ if(return.type=="GRanges"){
+ columns <- unique(c(columns, c("gene_seq_start", "gene_seq_end",
+ "seq_name", "seq_strand")))
+ }
+ if(missing(filter)){
+ filter <- list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ filter <- setFeatureInGRangesFilter(filter, "gene")
+ ## Add the columns required by the filters, if any:
+ columns <- addFilterColumns(columns, filter, x)
+ retColumns <- columns
+ ## If we don't have an order.by define one.
+ if(all(order.by == "")){
+ order.by <- NULL
+ if (any(columns == "seq_name"))
+ order.by <- c(order.by, "seq_name")
+ if( any(columns == "gene_seq_start"))
+ order.by <- c(order.by, "gene_seq_start")
+ if(is.null(order.by))
+ order.by <- ""
+ }
+ Res <- getWhat(x, columns=columns, filter=filter,
+ order.by=order.by, order.type=order.type)
+ if(return.type=="data.frame" | return.type=="DataFrame"){
+ notThere <- !(retColumns %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(retColumns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ retColumns <- retColumns[!notThere]
+ Res <- Res[, retColumns]
+ if(return.type=="DataFrame")
+ Res <- DataFrame(Res)
+ return(Res)
+ }
+ if(return.type=="GRanges"){
+ metacols <- columns[ !(columns %in% c("seq_name",
+ "seq_strand",
+ "gene_seq_start",
+ "gene_seq_end")) ]
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ SI <- SI[as.character(unique(Res$seq_name))]
+ GR <- GRanges(seqnames=Rle(Res$seq_name),
+ ranges=IRanges(start=Res$gene_seq_start, end=Res$gene_seq_end),
+ strand=Rle(Res$seq_strand),
+ seqinfo=SI[as.character(unique(Res$seq_name))],
+ Res[ , metacols, drop=FALSE ]
+ )
+ names(GR) <- Res$gene_id
+ return(GR)
+ }
+})
+
+### transcripts:
+## get transcripts from the database.
+setMethod("transcripts", "EnsDb", function(x, columns=listColumns(x, "tx"),
+ filter, order.by="", order.type="asc",
+ return.type="GRanges"){
+ return.type <- match.arg(return.type, c("data.frame", "GRanges", "DataFrame"))
+ columns <- unique(c(columns, "tx_id"))
+ ## if return.type is GRanges we require columns: seq_name, tx_seq_start
+ ## and tx_seq_end and seq_strand
+ if(return.type=="GRanges"){
+ columns <- unique(c(columns, c("tx_seq_start",
+ "tx_seq_end",
+ "seq_name",
+ "seq_strand")))
+ }
+ if(missing(filter)){
+ filter <- list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ filter <- setFeatureInGRangesFilter(filter, "tx")
+ ## Add the columns required by the filters, if any:
+ columns <- addFilterColumns(columns, filter, x)
+ retColumns <- columns
+ ## If we don't have an order.by define one.
+ if(all(order.by == "")){
+ order.by <- NULL
+ if(any(columns == "seq_name"))
+ order.by <- c(order.by, "seq_name")
+ if(any(columns == "tx_seq_start"))
+ order.by <- c(order.by, "tx_seq_start")
+ if(is.null(order.by))
+ order.by <- ""
+ }
+ Res <- getWhat(x, columns=columns, filter=filter,
+ order.by=order.by, order.type=order.type)
+ if(return.type=="data.frame" | return.type=="DataFrame"){
+ notThere <- !(retColumns %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(retColumns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ retColumns <- retColumns[!notThere]
+ Res <- Res[, retColumns]
+ if(return.type=="DataFrame")
+ Res <- DataFrame(Res)
+ return(Res)
+ }
+ if(return.type=="GRanges"){
+ notThere <- !(columns %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(columns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ columns <- columns[!notThere]
+ metacols <- columns[ !(columns %in% c("seq_name",
+ "seq_strand",
+ "tx_seq_start",
+ "tx_seq_end")) ]
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ SI <- SI[as.character(unique(Res$seq_name))]
+ GR <- GRanges(seqnames=Rle(Res$seq_name),
+ ranges=IRanges(start=Res$tx_seq_start, end=Res$tx_seq_end),
+ strand=Rle(Res$seq_strand),
+ seqinfo=SI[as.character(unique(Res$seq_name))],
+ Res[ , metacols, drop=FALSE ]
+ )
+ names(GR) <- Res$tx_id
+ return(GR)
+ }
+})
+
+### promoters:
+## get promoter regions from the database.
+setMethod("promoters", "EnsDb",
+ function(x, upstream=2000, downstream=200, ...)
+ {
+ gr <- transcripts(x, ...)
+ trim(suppressWarnings(promoters(gr,
+ upstream=upstream,
+ downstream=downstream)))
+ }
+)
+
+### exons:
+## get exons from the database.
+setMethod("exons", "EnsDb", function(x, columns=listColumns(x, "exon"), filter,
+ order.by="", order.type="asc",
+ return.type="GRanges"){
+ return.type <- match.arg(return.type, c("data.frame", "GRanges", "DataFrame"))
+ if(!any(columns %in% c(listColumns(x, "exon"), "exon_idx"))){
+ ## have to have at least one column from the exon table...
+ columns <- c(columns, "exon_id")
+ }
+ columns <- unique(c(columns, "exon_id"))
+ ## if return.type is GRanges we require columns: seq_name, exon_seq_start
+ ## and exon_seq_end and seq_strand
+ if(return.type=="GRanges"){
+ columns <- unique(c(columns, c("exon_seq_start",
+ "exon_seq_end",
+ "seq_name",
+ "seq_strand")))
+ }
+ if(missing(filter)){
+ filter <- list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ filter <- setFeatureInGRangesFilter(filter, "exon")
+ ## Add the columns required by the filters, if any:
+ columns <- addFilterColumns(columns, filter, x)
+ retColumns <- columns
+ ## If we don't have an order.by define one.
+ if (all(order.by == "")) {
+ order.by <- NULL
+ if (any(columns == "seq_name"))
+ order.by <- c(order.by, "seq_name")
+ if (any(columns == "exon_seq_start"))
+ order.by <- c(order.by, "exon_seq_start")
+ if(is.null(order.by))
+ order.by <- ""
+ }
+ Res <- getWhat(x, columns=columns, filter=filter,
+ order.by=order.by, order.type=order.type)
+ if(return.type=="data.frame" | return.type=="DataFrame"){
+ notThere <- !(retColumns %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(retColumns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ retColumns <- retColumns[!notThere]
+ Res <- Res[, retColumns]
+ if(return.type=="DataFrame")
+ Res <- DataFrame(Res)
+ return(Res)
+ }
+ if(return.type=="GRanges"){
+ notThere <- !(columns %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(columns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ columns <- columns[!notThere]
+ metacols <- columns[ !(columns %in% c("seq_name",
+ "seq_strand",
+ "exon_seq_start",
+ "exon_seq_end")) ]
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ SI <- SI[as.character(unique(Res$seq_name))]
+ GR <- GRanges(seqnames=Rle(Res$seq_name),
+ ranges=IRanges(start=Res$exon_seq_start, end=Res$exon_seq_end),
+ strand=Rle(Res$seq_strand),
+ seqinfo=SI[as.character(unique(Res$seq_name))],
+ Res[ , metacols, drop=FALSE ]
+ )
+ names(GR) <- Res$exon_id
+ return(GR)
+ }
+})
+
+
+## should return a GRangesList
+## still considerably slower than the corresponding call in the GenomicFeatures package.
+setMethod("exonsBy", "EnsDb", function(x, by = c("tx", "gene"),
+ columns = listColumns(x, "exon"),
+ filter, use.names = FALSE) {
+ by <- match.arg(by, c("tx", "gene"))
+ bySuff <- "_id"
+ if (use.names) {
+ if (by == "tx") {
+ use.names <- FALSE
+ warning("Argument use.names ignored as no transcript names are available.")
+ } else {
+ columns <- unique(c(columns, "gene_name"))
+ bySuff <- "_name"
+ }
+ }
+ if (missing(filter)) {
+ filter <- list()
+ } else {
+ filter <- checkFilter(filter)
+ }
+ ## We're applying any GRangesFilter to either gene or tx.
+ filter <- setFeatureInGRangesFilter(filter, by)
+ ## Add the columns required by the filters, if any:
+ columns <- unique(c(columns, "exon_id"))
+ columns <- addFilterColumns(columns, filter, x)
+ ## Quick fix; rename any exon_rank to exon_idx.
+ columns[columns == "exon_rank"] <- "exon_idx"
+
+ ## The minimum columns we need, in addition to "columns"
+ min.columns <- c(paste0(by, "_id"), "seq_name","exon_seq_start",
+ "exon_seq_end", "exon_id", "seq_strand")
+ by.id.full <- unlist(prefixColumns(x, columns = paste0(by, "_id"),
+ clean = FALSE),
+ use.names = FALSE)
+ if (by == "gene") {
+ ## tx columns have to be removed, since the same exon can be part of
+ ## more than one tx
+ txcolumns <- c(listColumns(x, "tx"), "exon_idx")
+ txcolumns <- txcolumns[txcolumns != "gene_id"]
+ torem <- columns %in% txcolumns
+ if (any(torem))
+ warning("Columns ",
+ paste(columns[ torem ], collapse = ","),
+ " have been removed as they are not allowed if exons",
+ " are fetched by gene.")
+ columns <- columns[!torem]
+ } else {
+ min.columns <- unique(c(min.columns, "exon_idx"))
+ columns <- c(columns, "exon_idx")
+ }
+ ## define the minimal columns that we need...
+ ret_cols <- unique(columns) ## before adding the "min.columns"
+ columns <- unique(c(columns, min.columns))
+ ## get the seqinfo:
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ ## Resolve ordering problems.
+ orderR <- orderResultsInR(x)
+ if (orderR) {
+ order.by <- ""
+ } else {
+ if (by == "gene") {
+ order.by <- paste0("gene.gene_id, ",
+ "case when seq_strand = 1 then exon_seq_start",
+ " when seq_strand = -1 then (exon_seq_end * -1)",
+ " end")
+ } else {
+ ## Funny thing is the query takes longer if I use tx2exon.tx_id!
+ order.by <- "tx.tx_id, tx2exon.exon_idx"
+ }
+ }
+ Res <- getWhat(x, columns = columns, filter = filter,
+ order.by = order.by, skip.order.check = TRUE)
+ ## Now, order in R, if not already done in SQL.
+ if (orderR) {
+ if (by == "gene") {
+ startend <- (Res$seq_strand == 1) * Res$exon_seq_start +
+ (Res$seq_strand == -1) * (Res$exon_seq_end * -1)
+ Res <- Res[order(Res$gene_id, startend,
+ method = "radix"), ]
+ } else {
+ Res <- Res[order(Res$tx_id, Res$exon_idx,
+ method = "radix"), ]
+ }
+ }
+ SI <- SI[as.character(unique(Res$seq_name))]
+ ## replace exon_idx with exon_rank
+ colnames(Res)[colnames(Res) == "exon_idx"] <- "exon_rank"
+ columns[columns == "exon_idx"] <- "exon_rank"
+ ret_cols[ret_cols == "exon_idx"] <- "exon_rank"
+ notThere <- !(ret_cols %in% colnames(Res))
+ if (any(notThere))
+ warning(paste0("Columns ", paste(ret_cols[notThere], collapse = ", "),
+ " not present in the result data.frame!"))
+ ret_cols <- ret_cols[!notThere]
+ columns.metadata <- ret_cols[!(ret_cols %in% c("seq_name", "seq_strand",
+ "exon_seq_start",
+ "exon_seq_end"))]
+ columns.metadata <- match(columns.metadata, colnames(Res))
+ GR <- GRanges(seqnames = Rle(Res$seq_name),
+ strand = Rle(Res$seq_strand),
+ ranges = IRanges(start = Res$exon_seq_start,
+ end = Res$exon_seq_end),
+ seqinfo = SI,
+ Res[, columns.metadata, drop=FALSE]
+ )
+ return(split(GR, Res[, paste0(by, bySuff)]))
+})
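The strand-aware ordering that `exonsBy` performs in R (when the ordering is not done in SQL) can be tried on a small data frame. The following sketch uses made-up coordinates; the sort key is the same as in the method above: the start position on the forward strand and the negated end position on the reverse strand.

```r
## Made-up exon coordinates for two genes, one per strand.
res <- data.frame(gene_id = c("g1", "g1", "g2", "g2", "g2"),
                  seq_strand = c(1, 1, -1, -1, -1),
                  exon_seq_start = c(500, 100, 100, 700, 400),
                  exon_seq_end = c(600, 200, 200, 800, 500),
                  stringsAsFactors = FALSE)

## Forward strand: sort by start. Reverse strand: sort by decreasing end,
## expressed as an increasing sort on the negated end.
key <- (res$seq_strand == 1) * res$exon_seq_start +
    (res$seq_strand == -1) * (res$exon_seq_end * -1)
res <- res[order(res$gene_id, key, method = "radix"), ]
res$exon_seq_start
## 100 500 700 400 100
```

For the reverse-strand gene `g2` the exons come out in 5' to 3' order of the transcript, i.e. from the highest genomic coordinate to the lowest.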
+
+
+############################################################
+## transcriptsBy
+setMethod("transcriptsBy", "EnsDb", function(x, by = c("gene", "exon"),
+ columns = listColumns(x, "tx"),
+ filter){
+ if (any(by == "cds"))
+ stop("fetching transcripts by cds is not (yet) implemented.")
+ by <- match.arg(by, c("gene", "exon"))
+ byId <- paste0(by, "_id")
+ min.columns <- c(paste0(by, "_id"), "seq_name", "tx_seq_start",
+ "tx_seq_end", "tx_id", "seq_strand")
+ ## can not have exon columns!
+ ex_cols <- c(listColumns(x, "exon"), "exon_idx")
+ ex_cols <- ex_cols[ex_cols != "tx_id"]
+ torem <- columns %in% ex_cols
+ if (any(torem))
+ warning("Columns ",
+ paste(columns[ torem ], collapse=","),
+ " have been removed as they are not allowed if",
+ " transcripts are fetched.")
+ columns <- columns[!torem]
+ ## Process filters
+ if (missing(filter)) {
+ filter <- list()
+ } else {
+ filter <- checkFilter(filter)
+ }
+ ## GRanges filter should be based on either gene or exon coordinates.
+ filter <- setFeatureInGRangesFilter(filter, by)
+ ## Add the columns required by the filters, if any:
+ columns <- addFilterColumns(columns, filter, x)
+ ret_cols <- unique(columns)
+ ## define the minimal columns that we need...
+ columns <- unique(c(columns, min.columns))
+ ## get the seqinfo:
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ byIdFull <- unlist(prefixColumns(x, columns = byId, clean = FALSE),
+ use.names = FALSE)
+ orderR <- orderResultsInR(x)
+ if (orderR) {
+ order.by <- ""
+ } else {
+ order.by <- paste0(byIdFull ,
+ ", case when seq_strand = 1 then tx_seq_start",
+ " when seq_strand = -1 then (tx_seq_end * -1) end")
+ }
+ Res <- getWhat(x, columns=columns, filter=filter,
+ order.by=order.by, skip.order.check=TRUE)
+ if (orderR) {
+ startEnd <- (Res$seq_strand == 1) * Res$tx_seq_start +
+ (Res$seq_strand == -1) * (Res$tx_seq_end * -1)
+ Res <- Res[order(Res[, byId], startEnd, method = "radix"), ]
+ }
+ SI <- SI[as.character(unique(Res$seq_name))]
+ ## Replace exon_idx with exon_rank
+ colnames(Res) <- gsub(colnames(Res), pattern = "exon_idx",
+ replacement = "exon_rank", fixed = TRUE)
+ ret_cols[ret_cols == "exon_idx"] <- "exon_rank"
+ notThere <- !(ret_cols %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(ret_cols[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ ret_cols <- ret_cols[!notThere]
+ columns.metadata <- ret_cols[!(ret_cols %in% c("seq_name", "seq_strand",
+ "tx_seq_start",
+ "tx_seq_end"))]
+ columns.metadata <- match(columns.metadata, colnames(Res)) ## presumably faster...
+ GR <- GRanges(seqnames=Rle(Res$seq_name),
+ strand=Rle(Res$seq_strand),
+ ranges=IRanges(start=Res$tx_seq_start, end=Res$tx_seq_end),
+ seqinfo=SI,
+ Res[ , columns.metadata, drop=FALSE ]
+ )
+ return(split(GR, Res[ , byId]))
+})
+
+
+## for GRangesList...
+setMethod("lengthOf", "GRangesList", function(x, ...){
+ return(sum(width(reduce(x))))
+## return(unlist(lapply(width(reduce(x)), sum)))
+})
+
+## return the length of genes or transcripts
+setMethod("lengthOf", "EnsDb", function(x, of="gene", filter=list()){
+ of <- match.arg(of, c("gene", "tx"))
+ ## get the exons by gene or transcript from the database...
+ suppressWarnings(
+ GRL <- exonsBy(x, by=of, filter=filter)
+ )
+ return(lengthOf(GRL))
+})
+
+####============================================================
+## transcriptLengths
+##
+## For TxDb: calls just the function (not method!) from the GenomicFeatures
+## package.
+## For EnsDb: calls the .transcriptLengths function.
+####------------------------------------------------------------
+## setMethod("transcriptLengths", "TxDb", function(x, with.cds_len=FALSE, with.utr5_len=FALSE,
+## with.utr3_len=FALSE){
+## return(GenomicFeatures::transcriptLengths(x, with.cds_len=with.cds_len,
+## with.utr5_len=with.utr5_len,
+## with.utr3_len=with.utr3_len))
+## })
+## setMethod("transcriptLengths", "EnsDb", function(x, with.cds_len=FALSE, with.utr5_len=FALSE,
+## with.utr3_len=FALSE, filter=list()){
+## return(.transcriptLengths(x, with.cds_len=with.cds_len, with.utr5_len=with.utr5_len,
+## with.utr3_len=with.utr3_len, filter=filter))
+## })
+## implement the method from the GenomicFeatures package
+.transcriptLengths <- function(x, with.cds_len=FALSE, with.utr5_len=FALSE,
+ with.utr3_len=FALSE, filter=list()){
+ ## First we're going to fetch the exonsBy.
+ ## Or use getWhat???
+ ## Dash, have to make two queries!
+ allTxs <- transcripts(x, filter=filter)
+ exns <- exonsBy(x, filter=filter)
+ ## Match ordering
+ exns <- exns[match(allTxs$tx_id, names(exns))]
+ ## Calculate length of transcripts.
+ txLengths <- sum(width(reduce(exns)))
+ ## Calculate no. of exons.
+ ## build result data frame:
+ Res <- data.frame(tx_id=allTxs$tx_id, gene_id=allTxs$gene_id,
+ nexon=lengths(exns), tx_len=txLengths,
+ stringsAsFactors=FALSE)
+ if(!any(c(with.cds_len, with.utr5_len, with.utr3_len))){
+ ## Return what we've got thus far.
+ return(Res)
+ }
+ if(with.cds_len)
+ Res <- cbind(Res, cds_len=rep(NA, nrow(Res)))
+ if(with.utr5_len)
+ Res <- cbind(Res, utr5_len=rep(NA, nrow(Res)))
+ if(with.utr3_len)
+ Res <- cbind(Res, utr3_len=rep(NA, nrow(Res)))
+ ## Otherwise do the remaining stuff...
+ txs <- allTxs[!is.na(allTxs$tx_cds_seq_start)]
+ if(length(txs) > 0){
+ cExns <- exns[txs$tx_id]
+ cReg <- GRanges(seqnames=seqnames(txs),
+ ranges=IRanges(txs$tx_cds_seq_start,
+ txs$tx_cds_seq_end),
+ strand=strand(txs),
+ tx_id=txs$tx_id)
+ cReg <- split(cReg, f=cReg$tx_id)
+ ## Match order.
+ cReg <- cReg[match(txs$tx_id, names(cReg))]
+ cdsExns <- intersect(cReg, cExns)
+ ## cExns: all exons of coding transcripts (includes untranslated
+ ## and translated region)
+ ## cReg: just the start-end position of the coding region of the tx.
+ ## cdsExns: the coding part of all exons of the tx.
+ if(with.cds_len){
+ ## Calculate CDS length
+ cdsLengths <- sum(width(reduce(cdsExns)))
+ Res[names(cdsLengths), "cds_len"] <- cdsLengths
+ }
+ if(with.utr3_len | with.utr5_len){
+ ## ! UTR is the difference between the exons and the cds-exons
+ ## Note: order of parameters is important!
+ utrReg <- setdiff(cExns, cdsExns)
+ leftOfCds <- utrReg[end(utrReg) < start(cReg)]
+ rightOfCds <- utrReg[start(utrReg) > end(cReg)]
+ ## Calculate lengths.
+ leftOfLengths <- sum(width(reduce(leftOfCds)))
+ rightOfLengths <- sum(width(reduce(rightOfCds)))
+ minusTx <- which(as.character(strand(txs)) == "-" )
+ if(with.utr3_len){
+ ## Ordering of txs and all other stuff matches.
+ tmp <- rightOfLengths
+ tmp[minusTx] <- leftOfLengths[minusTx]
+ Res[names(tmp), "utr3_len"] <- tmp
+ }
+ if(with.utr5_len){
+ tmp <- leftOfLengths
+ tmp[minusTx] <- rightOfLengths[minusTx]
+ Res[names(tmp), "utr5_len"] <- tmp
+ }
+ }
+ }
+ return(Res)
+}
+
+## cdsBy... return coding region ranges by tx or by gene.
+setMethod("cdsBy", "EnsDb", function(x, by = c("tx", "gene"),
+ columns = NULL, filter,
+ use.names = FALSE){
+ by <- match.arg(by, c("tx", "gene"))
+ if (missing(filter)) {
+ filter = list()
+ } else {
+ filter <- checkFilter(filter)
+ }
+ filter <- setFeatureInGRangesFilter(filter, by)
+ ## If required, add columns for the filters:
+ columns <- addFilterColumns(columns, filter, x)
+ ## Add a filter ensuring that only coding transcripts are queried.
+ filter <- c(list(OnlyCodingTx()), filter)
+ bySuff <- "_id"
+ if (by == "tx") {
+ ## adding exon_id, exon_idx to the columns.
+ columns <- unique(c(columns, "exon_id", "exon_idx"))
+ if (use.names)
+ warning("Not considering use.names as no transcript names are available.")
+ } else {
+ columns <- unique(c("gene_id", columns))
+ if( use.names) {
+ bySuff <- "_name"
+ columns <- c(columns, "gene_name")
+ }
+ }
+ byId <- paste0(by, bySuff)
+ ## Query the data
+ fetchCols <- unique(c(byId, columns, "tx_cds_seq_start", "tx_cds_seq_end",
+ "seq_name", "seq_strand", "exon_idx", "exon_id",
+ "exon_seq_start", "exon_seq_end"))
+ ## Ordering of the results:
+ ## Force ordering in R by default here to fix issue #11
+ ##orderR <- orderResultsInR(x)
+ orderR <- TRUE
+ if (orderR) {
+ order.by <- ""
+ } else {
+ if (by == "tx") {
+ ## Here we want to sort the exons by exon_idx
+ order.by <- "tx.tx_id, tx2exon.exon_idx"
+ } else {
+ ## Here we want to sort the transcripts by tx start.
+ order.by <- "gene.gene_id, case when seq_strand = 1 then tx_cds_seq_start when seq_strand = -1 then (tx_cds_seq_end * -1) end"
+ }
+ }
+ Res <- getWhat(x, columns = fetchCols,
+ filter = filter,
+ order.by = order.by,
+ skip.order.check = TRUE)
+ ## Remove rows with NA in tx_cds_seq_start; that's the case for "old" databases.
+ nas <- is.na(Res$tx_cds_seq_start)
+ if (any(nas))
+ Res <- Res[!nas, ]
+ ## Remove exons that do not overlap the cds.
+ Res <- Res[Res$exon_seq_end >= Res$tx_cds_seq_start &
+ Res$exon_seq_start <= Res$tx_cds_seq_end,
+ , drop = FALSE]
+ if (orderR) {
+ ## And finally ordering them.
+ if (by == "tx") {
+ Res <- Res[order(Res$tx_id, Res$exon_idx, method = "radix"), ]
+ } else {
+ startend <- (Res$seq_strand == 1) * Res$tx_cds_seq_start +
+ (Res$seq_strand == -1) * (Res$tx_cds_seq_end * -1)
+ Res <- Res[order(Res$gene_id, startend, method = "radix"), ]
+ }
+ }
+ if(nrow(Res)==0){
+ warning("No cds found!")
+ return(NULL)
+ }
+ cdsStarts <- pmax.int(Res$exon_seq_start, Res$tx_cds_seq_start)
+ cdsEnds <- pmin.int(Res$exon_seq_end, Res$tx_cds_seq_end)
+ ## get the seqinfo:
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ SI <- SI[as.character(unique(Res$seq_name))]
+ ## Rename columns exon_idx to exon_rank, if present
+ if(any(colnames(Res) == "exon_idx")){
+ colnames(Res)[colnames(Res) == "exon_idx"] <- "exon_rank"
+ columns[columns == "exon_idx"] <- "exon_rank"
+ }
+ ## Building the result.
+ if(length(columns) > 0){
+ notThere <- !(columns %in% colnames(Res))
+ if(any(notThere))
+ warning(paste0("Columns ", paste(columns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ columns <- columns[!notThere]
+ GR <- GRanges(seqnames=Rle(Res$seq_name),
+ strand=Rle(Res$seq_strand),
+ ranges=IRanges(start=cdsStarts, end=cdsEnds),
+ seqinfo=SI,
+ Res[, columns, drop=FALSE])
+ }else{
+ GR <- GRanges(seqnames=Rle(Res$seq_name),
+ strand=Rle(Res$seq_strand),
+ ranges=IRanges(start=cdsStarts, end=cdsEnds),
+ seqinfo=SI)
+ }
+ GR <- split(GR, Res[, paste0(by, bySuff)])
+ ## For "by gene" we reduce the redundant ranges;
+ ## that way, however, we lose all additional columns!
+ if(by == "gene")
+ GR <- reduce(GR)
+ return(GR)
+})
+
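+## Illustration only (not part of the upstream sources): a minimal base R
+## sketch of how cdsBy clips exons to the coding region. All coordinates
+## are made up. Exons without any overlap with the CDS are dropped first;
+## the remaining ones are trimmed with pmax.int/pmin.int.
+## exon_start <- c(100L, 300L, 500L, 700L)
+## exon_end <- c(200L, 400L, 600L, 800L)
+## cds_start <- 150L
+## cds_end <- 550L
+## keep <- exon_end >= cds_start & exon_start <= cds_end
+## cdsStarts <- pmax.int(exon_start[keep], cds_start)
+## cdsEnds <- pmin.int(exon_end[keep], cds_end)
+## cbind(cdsStarts, cdsEnds)
+## ## rows: 150-200, 300-400, 500-550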
+
+############################################################
+## getUTRsByTranscript
+getUTRsByTranscript <- function(x, what, columns = NULL, filter) {
+ if (missing(filter)) {
+ filter <- list()
+ } else {
+ filter <- checkFilter(filter)
+ }
+ filter <- setFeatureInGRangesFilter(filter, "tx")
+ ## If required, add columns for the filters:
+ columns <- addFilterColumns(columns, filter, x)
+ columns <- unique(c(columns, "exon_id", "exon_idx"))
+ ## Add the filter for coding tx only.
+ filter <- c(list(OnlyCodingTx()), filter)
+ ## what do we need: tx_cds_seq_start, tx_cds_seq_end and exon_idx
+ fetchCols <- unique(c("tx_id", columns, "tx_cds_seq_start",
+ "tx_cds_seq_end", "seq_name", "seq_strand",
+ "exon_seq_start", "exon_seq_end"))
+ order.by <- "tx.tx_id"
+ ## get the seqinfo:
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ ## Note: doing that with a single query and some coordinate juggling
+ ## is faster than calling exonsBy and GRangesList setdiff etc.
+ Res <- getWhat(x, columns=fetchCols,
+ filter=filter,
+ order.by=order.by,
+ skip.order.check=TRUE)
+ nas <- is.na(Res$tx_cds_seq_start)
+ if (any(nas))
+ Res <- Res[!nas, ]
+ ## Remove exons that lie completely within the cds.
+ Res <- Res[Res$exon_seq_start < Res$tx_cds_seq_start |
+ Res$exon_seq_end > Res$tx_cds_seq_end, , drop=FALSE]
+ if (nrow(Res) == 0) {
+ warning(paste0("No ", what, "UTR found!"))
+ return(NULL)
+ }
+ ## Rename columns exon_idx to exon_rank, if present
+ if (any(colnames(Res) == "exon_idx")) {
+ colnames(Res) <- sub(colnames(Res), pattern = "exon_idx",
+ replacement = "exon_rank", fixed = TRUE)
+ columns[columns == "exon_idx"] <- "exon_rank"
+ }
+ if (what == "five") {
+ ## All those on the forward strand for which the exon start is smaller
+ ## than the cds start and those on the reverse strand with an exon end
+ ## larger than the cds end.
+ Res <- Res[(Res$seq_strand > 0 & Res$exon_seq_start < Res$tx_cds_seq_start)
+ | (Res$seq_strand < 0 & Res$exon_seq_end > Res$tx_cds_seq_end),
+ , drop=FALSE]
+ } else {
+ ## Other way round.
+ Res <- Res[(Res$seq_strand > 0 & Res$exon_seq_end > Res$tx_cds_seq_end) |
+ (Res$seq_strand < 0 & Res$exon_seq_start < Res$tx_cds_seq_start),
+ , drop=FALSE]
+ }
+ if (nrow(Res) == 0) {
+ warning(paste0("No ", what, "UTR found!"))
+ return(NULL)
+ }
+ ## Increase the cds end by 1 and decrease the start by 1, thus,
+ ## avoiding that the UTR overlaps the cds
+ Res$tx_cds_seq_end <- Res$tx_cds_seq_end + 1L
+ Res$tx_cds_seq_start <- Res$tx_cds_seq_start - 1L
+ utrStarts <- rep(0, nrow(Res))
+ utrEnds <- utrStarts
+ ## Distinguish between stuff which is left of and right of the CDS:
+ ## Left of the CDS: can be either 5' for + strand or 3' for - strand.
+ bm <- which(Res$exon_seq_start <= Res$tx_cds_seq_start)
+ if (length(bm) > 0) {
+ if (what == "five") {
+ ## 5' and left of the CDS: that's the 5' UTR of + strand transcripts.
+ bm <- bm[Res$seq_strand[bm] > 0]
+ if(length(bm) > 0){
+ utrStarts[bm] <- Res$exon_seq_start[bm]
+ utrEnds[bm] <- pmin.int(Res$exon_seq_end[bm],
+ Res$tx_cds_seq_start[bm])
+ }
+ } else {
+ bm <- bm[Res$seq_strand[bm] < 0]
+ if (length(bm) > 0) {
+ utrStarts[bm] <- Res$exon_seq_start[bm]
+ utrEnds[bm] <- pmin.int(Res$exon_seq_end[bm],
+ Res$tx_cds_seq_start[bm])
+ }
+ }
+ }
+ ## Right of the CDS: can be either 5' for - strand or 3' for + strand.
+ bm <- which(Res$exon_seq_end >= Res$tx_cds_seq_end)
+ if (length(bm) > 0) {
+ if (what == "five") {
+ ## Right of CDS is 5' for - strand.
+ bm <- bm[Res$seq_strand[bm] < 0]
+ if (length(bm) > 0) {
+ utrStarts[bm] <- pmax.int(Res$exon_seq_start[bm],
+ Res$tx_cds_seq_end[bm])
+ utrEnds[bm] <- Res$exon_seq_end[bm]
+ }
+ } else {
+ ## Right of CDS is 3' for + strand
+ bm <- bm[Res$seq_strand[bm] > 0]
+ if (length(bm) > 0) {
+ utrStarts[bm] <- pmax.int(Res$exon_seq_start[bm],
+ Res$tx_cds_seq_end[bm])
+ utrEnds[bm] <- Res$exon_seq_end[bm]
+ }
+ }
+ }
+ notThere <- !(columns %in% colnames(Res))
+ if (any(notThere))
+ warning(paste0("Columns ", paste(columns[notThere], collapse=", "),
+ " not present in the result data.frame!"))
+ columns <- columns[!notThere]
+ SI <- SI[as.character(unique(Res$seq_name))]
+ GR <- GRanges(seqnames = Rle(Res$seq_name),
+ strand = Rle(Res$seq_strand),
+ ranges = IRanges(start=utrStarts, end=utrEnds),
+ seqinfo = SI,
+ Res[, columns, drop = FALSE])
+ GR <- split(GR, Res[, "tx_id"])
+ return(GR)
+}
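+
+## Illustration only (not part of the upstream sources): a minimal base R
+## sketch of the coordinate juggling in getUTRsByTranscript for a single
+## plus-strand transcript with made-up coordinates. The CDS boundaries are
+## shifted by one so that the UTR ranges do not overlap the CDS.
+## exon_start <- c(100L, 300L)
+## exon_end <- c(200L, 400L)
+## cds_start <- 150L - 1L
+## cds_end <- 350L + 1L
+## ## 5' UTR: part of the first exon left of the CDS.
+## five <- c(exon_start[1], pmin.int(exon_end[1], cds_start))
+## ## 3' UTR: part of the last exon right of the CDS.
+## three <- c(pmax.int(exon_start[2], cds_end), exon_end[2])
+## five  ## 100 149
+## three ## 351 400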
+
+## threeUTRsByTranscript
+setMethod("threeUTRsByTranscript", "EnsDb", function(x, columns=NULL, filter){
+ if(missing(filter)){
+ filter=list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ return(getUTRsByTranscript(x=x, what="three", columns=columns, filter=filter))
+})
+
+## fiveUTRsByTranscript
+setMethod("fiveUTRsByTranscript", "EnsDb", function(x, columns=NULL, filter){
+ if(missing(filter)){
+ filter=list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ return(getUTRsByTranscript(x=x, what="five", columns=columns, filter=filter))
+})
+
+
+## toSAF... function to transform a GRangesList into a data.frame
+## corresponding to the SAF format.
+## assuming the names of the GRangesList to be the GeneID and the
+## element (GRanges) the start/end coordinates
+## of an exon, transcript or the gene itself.
+.toSaf <- function(x){
+ DF <- as.data.frame(x)
+ colnames(DF)[ colnames(DF)=="group_name" ] <- "GeneID"
+ colnames(DF)[ colnames(DF)=="seqnames" ] <- "Chr"
+ colnames(DF)[ colnames(DF)=="start" ] <- "Start"
+ colnames(DF)[ colnames(DF)=="end" ] <- "End"
+ colnames(DF)[ colnames(DF)=="strand" ] <- "Strand"
+ return(DF[ , c("GeneID", "Chr", "Start", "End", "Strand")])
+}
+
+## for GRangesList...
+setMethod("toSAF", "GRangesList", function(x, ...){
+ return(.toSaf(x))
+})
+
+.requireTable <- function(db, attr){
+ return(names(prefixColumns(db, columns=attr)))
+}
+## these functions determine which tables we need for the submitted filters.
+setMethod("requireTable", signature(x="GeneidFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="gene_id"))
+ })
+setMethod("requireTable", signature(x="EntrezidFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="entrezid"))
+ })
+setMethod("requireTable", signature(x="GenebiotypeFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="gene_biotype"))
+ })
+setMethod("requireTable", signature(x="GenenameFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="gene_name"))
+ })
+setMethod("requireTable", signature(x="TxidFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="tx_id"))
+ })
+setMethod("requireTable", signature(x="TxbiotypeFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="tx_biotype"))
+ })
+setMethod("requireTable", signature(x="ExonidFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="exon_id"))
+ })
+setMethod("requireTable", signature(x="SeqnameFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="seq_name"))
+ })
+setMethod("requireTable", signature(x="SeqstrandFilter", db="EnsDb"),
+ function(x, db, ...){
+ return(.requireTable(db=db, attr="seq_strand"))
+ })
+setMethod("requireTable", signature(x="SeqstartFilter", db="EnsDb"),
+ function(x, db, ...){
+ if(x@feature=="gene")
+ return(.requireTable(db=db, attr="gene_seq_start"))
+ if(x@feature=="transcript" | x@feature=="tx")
+ return(.requireTable(db=db, attr="tx_seq_start"))
+ if(x at feature=="exon")
+ return(.requireTable(db=db, attr="exon_seq_start"))
+ return(NA)
+ })
+setMethod("requireTable", signature(x="SeqendFilter", db="EnsDb"),
+ function(x, db, ...){
+ if(x@feature=="gene")
+ return(.requireTable(db=db, attr="gene_seq_end"))
+ if(x@feature=="transcript" | x@feature=="tx")
+ return(.requireTable(db=db, attr="tx_seq_end"))
+ if(x at feature=="exon")
+ return(.requireTable(db=db, attr="exon_seq_end"))
+ return(NA)
+ })
+setMethod("requireTable", signature(x = "SymbolFilter", db = "EnsDb"),
+ function(x, db, ...) {
+ return(.requireTable(db = db, attr = "gene_name"))
+})
+setMethod("buildQuery", "EnsDb",
+ function(x, columns=c("gene_id", "gene_biotype", "gene_name"),
+ filter=list(), order.by="",
+ order.type="asc",
+ skip.order.check=FALSE){
+ return(.buildQuery(x=x,
+ columns=columns,
+ filter=filter,
+ order.by=order.by,
+ order.type=order.type,
+ skip.order.check=skip.order.check))
+ })
+####
+## Method that wraps the internal .getWhat function to retrieve data from the
+## database. In addition, if present, we're renaming chromosome names depending
+## on the ucscChromosomeNames option.
+setMethod("getWhat", "EnsDb",
+ function(x, columns = c("gene_id", "gene_biotype", "gene_name"),
+ filter = list(), order.by = "", order.type = "asc",
+ group.by = NULL, skip.order.check = FALSE) {
+ Res <- .getWhat(x = x,
+ columns = columns,
+ filter = filter,
+ order.by = order.by,
+ order.type = order.type,
+ group.by = group.by,
+ skip.order.check = skip.order.check)
+ ## If required, rename seqnames according to the specified style.
+ if(any(colnames(Res) == "seq_name"))
+ Res$seq_name <- formatSeqnamesFromQuery(x, Res$seq_name)
+ return(Res)
+ })
+
+## that's basically a copy of the code from the GenomicFeatures package.
+setMethod("disjointExons", "EnsDb",
+ function(x, aggregateGenes=FALSE, includeTranscripts=TRUE, filter, ...){
+ if(missing(filter)){
+ filter <- list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+
+ exonsByGene <- exonsBy(x, by="gene", filter=filter)
+ exonicParts <- disjoin(unlist(exonsByGene, use.names=FALSE))
+
+ if (aggregateGenes) {
+ foGG <- findOverlaps(exonsByGene, exonsByGene)
+ aggregateNames <- GenomicFeatures:::.listNames(names(exonsByGene),
+ as.list(foGG))
+ foEG <- findOverlaps(exonicParts, exonsByGene, select="first")
+ gene_id <- aggregateNames[foEG]
+ pasteNames <- GenomicFeatures:::.pasteNames(names(exonsByGene),
+ as.list(foGG))[foEG]
+ orderByGeneName <- order(pasteNames)
+ exonic_rle <- runLength(Rle(pasteNames[orderByGeneName]))
+ } else {
+ ## drop exonic parts that overlap > 1 gene
+ foEG <- findOverlaps(exonicParts, exonsByGene)
+ idxList <- as.list(foEG)
+ if (any(keep <- countQueryHits(foEG) == 1)) {
+ idxList <- idxList[keep]
+ exonicParts <- exonicParts[keep]
+ }
+ gene_id <- GenomicFeatures:::.listNames(names(exonsByGene),
+ idxList)
+ orderByGeneName <- order(unlist(gene_id, use.names=FALSE))
+ exonic_rle <- runLength(Rle(unlist(gene_id[orderByGeneName],
+ use.names=FALSE)))
+ }
+ values <- DataFrame(gene_id)
+
+ if (includeTranscripts) {
+ exonsByTx <- exonsBy(x, by="tx", filter=filter)
+ foET <- findOverlaps(exonicParts, exonsByTx)
+ values$tx_name <- GenomicFeatures:::.listNames(names(exonsByTx),
+ as.list(foET))
+ }
+ mcols(exonicParts) <- values
+ exonicParts <- exonicParts[orderByGeneName]
+ exonic_part <- unlist(lapply(exonic_rle, seq_len), use.names=FALSE)
+ exonicParts$exonic_part <- exonic_part
+ return(exonicParts)
+ }
+ )
+
+
+### utility functions
+## checkFilter:
+## checks the filter argument and ensures that a list of Filter objects is returned
+checkFilter <- function(x){
+ if(is(x, "list")){
+ if(length(x)==0)
+ return(x)
+ ## check if all elements are Filter classes.
+ IsAFilter <- unlist(lapply(x, function(z){
+ return(is(z, "BasicFilter"))
+ }))
+ if(any(!IsAFilter))
+ stop("One or more elements in filter are not filter objects!")
+ }else{
+ if(is(x, "BasicFilter")){
+ x <- list(x)
+ }else{
+ stop("filter has to be a filter object or a list of filter objects!")
+ }
+ }
+ return(x)
+}
+
+## Fetch data to add as a GeneTrack.
+## filter ... Used to filter the result.
+## chromosome, start, end ... Either all or none has to be specified. If specified, the function
+## first retrieves all transcripts that have an exon in the specified
+## range and adds them as a TxidFilter to the filters. The
+## query to fetch the "real" data is performed after.
+## featureIs ... Whether gene_biotype or tx_biotype should be mapped to the column
+## feature.
+setMethod("getGeneRegionTrackForGviz", "EnsDb", function(x, filter=list(),
+ chromosome=NULL,
+ start=NULL,
+ end=NULL,
+ featureIs="gene_biotype"){
+ featureIs <- match.arg(featureIs, c("gene_biotype", "tx_biotype"))
+ filter <- checkFilter(filter)
+ if(missing(chromosome))
+ chromosome <- NULL
+ if(missing(start))
+ start <- NULL
+ if(missing(end))
+ end <- NULL
+ ## if only chromosome is specified, create a SeqnameFilter and add it to the filter
+ if(is.null(start) & is.null(end) & !is.null(chromosome)){
+ filter <- c(filter, list(SeqnameFilter(chromosome)))
+ chromosome <- NULL
+ }
+ if(any(c(!is.null(chromosome), !is.null(start), !is.null(end)))){
+ ## Require however that all are defined!!!
+ if(all(c(!is.null(chromosome), !is.null(start), !is.null(end)))){
+ ## Convert UCSC chromosome names, if provided, to Ensembl style:
+ chromosome <- ucscToEns(chromosome)
+ ## Fetch all transcripts in that region:
+ tids <- dbGetQuery(dbconn(x),
+ paste0("select distinct tx.tx_id from tx join gene on",
+ " (tx.gene_id=gene.gene_id)",
+ " where seq_name='", chromosome, "' and (",
+ "(tx_seq_start >=",start," and tx_seq_start <=",end,") or ",
+ "(tx_seq_end >=",start," and tx_seq_end <=",end,") or ",
+ "(tx_seq_start <=",start," and tx_seq_end >=",end,")",
+ ")"))[, "tx_id"]
+ if(length(tids) == 0)
+ stop(paste0("Did not find any transcript on chromosome ", chromosome,
+ " from ", start, " to ", end, "!"))
+ filter <- c(filter, TxidFilter(tids))
+ }else{
+ stop(paste0("Either all or none of arguments 'chromosome', 'start' and 'end' ",
+ "have to be specified!"))
+ }
+ }
+ ## Return a data.frame with columns: chromosome, start, end, width, strand, feature,
+ ## gene, exon, transcript and symbol.
+ ## 1) Query the data as we usually would.
+ ## 2) Perform an additional query to get cds and utr, remove all entries from the
+ ## first result for the same transcripts and rbind the data.frames.
+ needCols <- c("seq_name", "exon_seq_start", "exon_seq_end", "seq_strand",
+ featureIs, "gene_id", "exon_id",
+ "exon_idx", "tx_id", "gene_name")
+ ## That's the names to which we map the original columns from the EnsDb.
+ names(needCols) <- c("chromosome", "start", "end", "strand",
+ "feature", "gene", "exon", "exon_rank", "transcript",
+ "symbol")
+ txs <- transcripts(x, filter=filter,
+ columns=needCols, return.type="data.frame")
+ ## Rename columns
+ idx <- match(needCols, colnames(txs))
+ notThere <- is.na(idx)
+ idx <- idx[!notThere]
+ colnames(txs)[idx] <- names(needCols)[!notThere]
+ ## now processing the 5utr
+ fUtr <- fiveUTRsByTranscript(x, filter=filter, columns=needCols)
+ if(length(fUtr) > 0){
+ fUtr <- as(unlist(fUtr, use.names=FALSE), "data.frame")
+ fUtr <- fUtr[, !(colnames(fUtr) %in% c("width", "seq_name", "exon_seq_start",
+ "exon_seq_end", "strand"))]
+ colnames(fUtr)[1] <- "chromosome"
+ idx <- match(needCols, colnames(fUtr))
+ notThere <- is.na(idx)
+ idx <- idx[!notThere]
+ colnames(fUtr)[idx] <- names(needCols)[!notThere]
+ ## Force being in the correct ordering:
+ fUtr <- fUtr[, names(needCols)]
+ fUtr$feature <- "utr5"
+ ## Remove transcripts from the txs data.frame
+ txs <- txs[!(txs$transcript %in% fUtr$transcript), , drop=FALSE]
+ }
+ tUtr <- threeUTRsByTranscript(x, filter=filter, columns=needCols)
+ if(length(tUtr) > 0){
+ tUtr <- as(unlist(tUtr, use.names=FALSE), "data.frame")
+ tUtr <- tUtr[, !(colnames(tUtr) %in% c("width", "seq_name", "exon_seq_start",
+ "exon_seq_end", "strand"))]
+ colnames(tUtr)[1] <- "chromosome"
+ idx <- match(needCols, colnames(tUtr))
+ notThere <- is.na(idx)
+ idx <- idx[!notThere]
+ colnames(tUtr)[idx] <- names(needCols)[!notThere]
+ ## Force being in the correct ordering:
+ tUtr <- tUtr[, names(needCols)]
+ tUtr$feature <- "utr3"
+ ## Remove transcripts from the txs data.frame
+ if(nrow(txs) > 0){
+ txs <- txs[!(txs$transcript %in% tUtr$transcript), , drop=FALSE]
+ }
+ }
+ cds <- cdsBy(x, filter=filter, columns=needCols)
+ if(length(cds) > 0){
+ cds <- as(unlist(cds, use.names=FALSE), "data.frame")
+ cds <- cds[, !(colnames(cds) %in% c("width", "seq_name", "exon_seq_start",
+ "exon_seq_end", "strand"))]
+ colnames(cds)[1] <- "chromosome"
+ idx <- match(needCols, colnames(cds))
+ notThere <- is.na(idx)
+ idx <- idx[!notThere]
+ colnames(cds)[idx] <- names(needCols)[!notThere]
+ ## Force being in the correct ordering:
+ cds <- cds[, names(needCols)]
+ ## Remove transcripts from the txs data.frame
+ if(nrow(txs) > 0){
+ txs <- txs[!(txs$transcript %in% cds$transcript), , drop=FALSE]
+ }
+ }
+ if(length(fUtr) > 0){
+ txs <- rbind(txs, fUtr)
+ }
+ if(length(tUtr) > 0){
+ txs <- rbind(txs, tUtr)
+ }
+ if(length(cds) > 0){
+ txs <- rbind(txs, cds)
+ }
+ ## Convert into GRanges.
+ suppressWarnings(
+ SI <- seqinfo(x)
+ )
+ SI <- SI[as.character(unique(txs$chromosome))]
+ GR <- GRanges(seqnames=Rle(txs$chromosome),
+ strand=Rle(txs$strand),
+ ranges=IRanges(start=txs$start, end=txs$end),
+ seqinfo=SI,
+ txs[, c("feature", "gene", "exon", "exon_rank",
+ "transcript", "symbol"), drop=FALSE])
+ return(GR)
+})
+
+
+## Simple helper function to set the @feature in GRangesFilter depending on the calling method.
+setFeatureInGRangesFilter <- function(x, feature){
+ for(i in seq(along.with=x)){
+ if(is(x[[i]], "GRangesFilter")){
+ x[[i]]@feature <- feature
+ }
+ }
+ return(x)
+}
+
+####============================================================
+## properties
+##
+## Get access to the "hidden" .properties slot and return it.
+## This ensures that we're not generating an error for objects that
+## do not yet have that slot.
+####------------------------------------------------------------
+setMethod("properties", "EnsDb", function(x, ...){
+ if(.hasSlot(x, ".properties")){
+ return(x@.properties)
+ }else{
+ warning("The present EnsDb instance has no .properties slot! ",
+ "Please use 'updateEnsDb' to update the object!")
+ return(list())
+ }
+})
+
+####============================================================
+## getProperty
+##
+## Return the value for the property with the specified name or
+## NA if not present.
+####------------------------------------------------------------
+setMethod("getProperty", "EnsDb", function(x, name, default = NA){
+ props <- properties(x)
+ if(any(names(props) == name)){
+ return(props[[name]])
+ }else{
+ return(default)
+ }
+})
+
+####============================================================
+## setProperty
+##
+## Sets a property in the object. The property has to be passed as name=value.
+####------------------------------------------------------------
+setMethod("setProperty", "EnsDb", function(x, ...){
+ dotL <- list(...)
+ if(length(dotL) == 0){
+ stop("No property specified! The property has to be submitted ",
+ "in the format name=value!")
+ return(x)
+ }
+ if(length(dotL) > 1){
+ warning("'setProperty' does only support setting of a single property!",
+ " Using the first submitted one.")
+ dotL <- dotL[1]
+ }
+ if(is.null(names(dotL)) || names(dotL) == "")
+ stop("A name is required! Use name=value!")
+ if(.hasSlot(x, ".properties")){
+ x@.properties[names(dotL)] <- dotL[[1]]
+ }else{
+ warning("The present EnsDb instance has no .properties slot! ",
+ "Please use 'updateEnsDb' to update the object!")
+ }
+ return(x)
+})
+
+####============================================================
+## updateEnsDb
+##
+## Update any "old" EnsDb instance to the most recent implementation.
+####------------------------------------------------------------
+setMethod("updateEnsDb", "EnsDb", function(x, ...){
+ newE <- new("EnsDb", ensdb=x@ensdb, tables=x@tables)
+ if(.hasSlot(x, ".properties"))
+ newE@.properties <- x@.properties
+ return(newE)
+})
+
+
+####============================================================
+## transcriptsByOverlaps
+##
+## Just "re-implementing" the transcriptsByOverlaps methods from the
+## GenomicFeatures package, finetuning and adapting it for EnsDbs
+####------------------------------------------------------------
+setMethod("transcriptsByOverlaps", "EnsDb",
+ function(x, ranges, maxgap = 0L, minoverlap = 1L,
+ type = c("any", "start", "end"),
+ columns=listColumns(x, "tx"),
+ filter) {
+ if(missing(ranges))
+ stop("Parameter 'ranges' is missing!")
+ if(missing(filter)){
+ filter <- list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ SLs <- unique(as.character(seqnames(ranges)))
+ filter <- c(filter, SeqnameFilter(SLs))
+ return(subsetByOverlaps(transcripts(x, columns=columns, filter=filter),
+ ranges, maxgap=maxgap, minoverlap=minoverlap, type=match.arg(type)))
+})
+
+####============================================================
+## exonsByOverlaps
+##
+####------------------------------------------------------------
+setMethod("exonsByOverlaps", "EnsDb",
+ function(x, ranges, maxgap=0L, minoverlap=1L,
+ type=c("any", "start", "end"),
+ columns=listColumns(x, "exon"),
+ filter) {
+ if(missing(ranges))
+ stop("Parameter 'ranges' is missing!")
+ if(missing(filter)){
+ filter <- list()
+ }else{
+ filter <- checkFilter(filter)
+ }
+ SLs <- unique(as.character(seqnames(ranges)))
+ filter <- c(filter, SeqnameFilter(SLs))
+ return(subsetByOverlaps(exons(x, columns=columns, filter=filter),
+ ranges, maxgap=maxgap, minoverlap=minoverlap, type=match.arg(type)))
+})
+
+############################################################
+## returnFilterColumns
+##
+## Methods to get and set the option whether or not the filter columns
+## should be returned too.
+setMethod("returnFilterColumns", "EnsDb", function(x) {
+ return(getProperty(x, "returnFilterColumns"))
+})
+setReplaceMethod("returnFilterColumns", "EnsDb", function(x, value) {
+ if(!is.logical(value))
+ stop("'value' has to be a logical!")
+ if(length(value) > 1)
+ stop("'value' has to be a logical of length 1!")
+ x <- setProperty(x, returnFilterColumns=value)
+ return(x)
+})
+
+############################################################
+## orderResultsInR
+##
+## Whether the results should be ordered in R instead of in the
+## SQL call
+setMethod("orderResultsInR", "EnsDb", function(x) {
+ return(getProperty(x, "orderResultsInR", default = FALSE))
+})
+setReplaceMethod("orderResultsInR", "EnsDb", function(x, value) {
+ if(!is.logical(value))
+ stop("'value' has to be a logical!")
+ if(length(value) > 1)
+ stop("'value' has to be a logical of length 1!")
+ x <- setProperty(x, orderResultsInR = value)
+ return(x)
+})
+
+############################################################
+## useMySQL
+##
+## Switch from RSQLite backend to a MySQL backend.
+##' @title Use a MySQL backend
+##' @aliases useMySQL
+##'
+##' @description Change the SQL backend from \emph{SQLite} to \emph{MySQL}.
+##' When first called on an \code{\linkS4class{EnsDb}} object, the function
+##' tries to create and save all of the data into a MySQL database. All
+##' subsequent calls will connect to the already existing MySQL database.
+##'
+##' @details This functionality requires that the \code{RMySQL} package is
+##' installed and that the user has (write) access to a running MySQL server.
+##' If the corresponding database already exists, users without write access
+##' can also use this functionality.
+##'
+##' @note At present the function does not evaluate whether the versions
+##' between the SQLite and MySQL database differ.
+##'
+##' @param x The \code{\linkS4class{EnsDb}} object.
+##' @param host Character vector specifying the host on which the MySQL
+##' server runs.
+##' @param port The port on which the MySQL server can be accessed.
+##' @param user The user name for the MySQL server.
+##' @param pass The password for the MySQL server.
+##' @return A \code{\linkS4class{EnsDb}} object providing access to the
+##' data stored in the MySQL backend.
+##' @author Johannes Rainer
+##' @examples
+##' ## Load the EnsDb database (SQLite backend).
+##' library(EnsDb.Hsapiens.v75)
+##' edb <- EnsDb.Hsapiens.v75
+##' ## Now change the backend to MySQL; my_user and my_pass should
+##' ## be the user name and password to access the MySQL server.
+##' \dontrun{
+##' edb_mysql <- useMySQL(edb, host = "localhost", user = my_user, pass = my_pass)
+##' }
+setMethod("useMySQL", "EnsDb", function(x, host = "localhost",
+ port = 3306, user, pass) {
+ if (missing(user))
+ stop("'user' has to be specified.")
+ if (missing(pass))
+ stop("'pass' has to be specified.")
+ ## Check if RMySQL package is available.
+ if(requireNamespace("RMySQL", quietly = TRUE)) {
+ ## Check if we can connect to MySQL.
+ driva <- dbDriver("MySQL")
+ con <- dbConnect(driva, host = host, user = user, pass = pass,
+ port = port)
+ ## Check if database is available.
+ dbs <- dbGetQuery(con, "show databases;")
+ sqliteName <- sub(basename(dbfile(dbconn(x))),
+ pattern = ".sqlite", replacement = "",
+ fixed = TRUE)
+ mysqlName <- SQLiteName2MySQL(sqliteName)
+ if (nrow(dbs) == 0 | !any(dbs$Database == mysqlName)) {
+ message("Database not available, trying to create it...",
+ appendLF = FALSE)
+ dbGetQuery(con, paste0("create database ", mysqlName))
+ message("OK")
+ }
+ dbDisconnect(con)
+ ## Connect to the database and check if we've got all tables.
+ con <- dbConnect(driva, host = host, user = user, pass = pass,
+ dbname = mysqlName, port = port)
+ ## If we've got no tables we try to feed the SQLite database
+ if (length(dbListTables(con)) == 0)
+ feedEnsDb2MySQL(x, mysql = con)
+ ## Check if we've got all required tables.
+ OK <- dbHasRequiredTables(con)
+ if (is.character(OK))
+ stop(OK)
+ OK <- dbHasValidTables(con)
+ if (is.character(OK))
+ stop(OK)
+ ## Check if the versions/creation date differ.
+ metadata_pkg <- metadata(x)
+ ## Now store the connection into the @ensdb slot
+ ## dbDisconnect(x@ensdb)
+ ## x@ensdb <- NULL
+ x@ensdb <- con
+ metadata_db <- metadata(x)
+ cre_pkg <- metadata_pkg[metadata_pkg$name == "Creation time", "value"]
+ cre_db <- metadata_db[metadata_db$name == "Creation time", "value"]
+ if (cre_pkg != cre_db) {
+ message("The creation dates of the package and of the database",
+ " differ:\n o package: ", cre_pkg,
+ "\n o database: ", cre_db, ".\nConsider deleting",
+ " the database and re-installing it by calling this function.")
+ }
+ return(x)
+ } else {
+ stop("Package 'RMySQL' not available.")
+ }
+})
diff --git a/R/dbhelpers.R b/R/dbhelpers.R
new file mode 100644
index 0000000..0bf090d
--- /dev/null
+++ b/R/dbhelpers.R
@@ -0,0 +1,589 @@
+############################################################
+## EnsDb
+## Constructor function.
+##' @title Connect to an EnsDb object
+##'
+##' @description The \code{EnsDb} constructor function connects to the database
+##' specified with argument \code{x} and returns a corresponding
+##' \code{\linkS4class{EnsDb}} object.
+##'
+##' @details By providing the connection to a MySQL database, it is possible
+##' to use MySQL as the database backend and queries will be performed on that
+##' database. Note however that this requires the package \code{RMySQL} to be
+##' installed. In addition, the user needs to have access to a MySQL server
+##' already providing an EnsDb database, or must have write privileges on a
+##' MySQL server, in which case the \code{\link{useMySQL}} method can be used
+##' to insert the annotations from an EnsDb package into a MySQL database.
+##' @param x Either a character specifying the \emph{SQLite} database file, or
+##' a \code{DBIConnection} to e.g. a MySQL database.
+##' @return A \code{\linkS4class{EnsDb}} object.
+##' @author Johannes Rainer
+##' @examples
+##' ## "Standard" way to create an EnsDb object:
+##' library(EnsDb.Hsapiens.v75)
+##' EnsDb.Hsapiens.v75
+##'
+##' ## Alternatively, provide the full file name of a SQLite database file
+##' dbfile <- system.file("extdata/EnsDb.Hsapiens.v75.sqlite", package = "EnsDb.Hsapiens.v75")
+##' edb <- EnsDb(dbfile)
+##' edb
+##'
+##' ## Third way: connect to a MySQL database
+##' \dontrun{
+##' library(RMySQL)
+##' dbcon <- dbConnect(MySQL(), user = my_user, pass = my_pass, host = my_host, dbname = "ensdb_hsapiens_v75")
+##' edb <- EnsDb(dbcon)
+##' }
+EnsDb <- function(x){
+ options(useFancyQuotes=FALSE)
+ if(missing(x)){
+ stop("No sqlite file provided!")
+ }
+ if (is.character(x)) {
+ lite <- dbDriver("SQLite")
+ con <- dbConnect(lite, dbname = x, flags=SQLITE_RO)
+ }
+ else if (is(x, "DBIConnection")) {
+ con <- x
+ } else {
+ stop("'x' should be either a character specifying the SQLite file to",
+ " be loaded, or a DBIConnection object providing the connection",
+ " to the database.")
+ }
+ ## Check if the database is valid.
+ OK <- dbHasRequiredTables(con)
+ if (is.character(OK))
+ stop(OK)
+ OK <- dbHasValidTables(con)
+ if (is.character(OK))
+ stop(OK)
+ tables <- dbListTables(con)
+ ## Quick fix for EnsDbs containing also protein data (issue #30):
+ tables <- tables[!(tables %in% c("protein", "uniprot", "protein_domain"))]
+ ## read the columns for these tables.
+ Tables <- vector(length=length(tables), "list")
+ for(i in 1:length(Tables)){
+ Tables[[ i ]] <- colnames(dbGetQuery(con, paste0("select * from ",
+ tables[ i ], " limit 1")))
+ }
+ names(Tables) <- tables
+ EDB <- new("EnsDb", ensdb=con, tables=Tables)
+ EDB <- setProperty(EDB, dbSeqlevelsStyle="Ensembl")
+ ## Setting the default for the returnFilterColumns
+ returnFilterColumns(EDB) <- TRUE
+ ## Defining the default for the ordering
+ orderResultsInR(EDB) <- FALSE
+ return(EDB)
+}
+
+## x is the connection to the database, name is the name of the entry to fetch
+.getMetaDataValue <- function(x, name){
+ return(dbGetQuery(x, paste0("select value from metadata where name='", name, "'"))[ 1, 1])
+}
+
+####
+## Note: that's the central function that checks which tables are needed for the
+## least expensive join!!! The names of the tables should then also be submitted
+## to any other method that calls prefixColumns (e.g. where of the Filter classes)
+##
+## this function checks:
+## a) for multi-table columns, selects the table with the highest degree
+## b) pre-pend (inverse of append ;)) the table name to the column name.
+## returns a list, names being the tables and the values being the columns
+## named: <table name>.<column name>
+## clean: whether a cleanColumns should be called on the submitted columns.
+## with.tables: force the prefix to be specifically on the submitted tables.
+prefixColumns <- function(x, columns, clean = TRUE, with.tables){
+ if (missing(columns))
+ stop("columns is empty! No columns provided!")
+ ## first get to the tables that contain these columns
+ Tab <- listTables(x) ## returns the tables by degree!
+ if (!missing(with.tables)) {
+ with.tables <- with.tables[ with.tables %in% names(Tab) ]
+ if (length(with.tables) > 0) {
+ Tab <- Tab[ with.tables ]
+ } else {
+ warning("The submitted table names are not valid in the database and were thus dropped.")
+ }
+ if (length(Tab) == 0)
+ stop("None of the tables submitted with with.tables is present in the database!")
+ }
+ if (clean)
+ columns <- cleanColumns(x, columns)
+ if (length(columns) == 0) {
+ return(NULL)
+ }
+ ## group the columns by table.
+ columns.bytable <- sapply(Tab, function(z){
+ return(z[ z %in% columns ])
+ }, simplify=FALSE, USE.NAMES=TRUE)
+ ## kick out empty tables...
+ columns.bytable <- columns.bytable[ unlist(lapply(columns.bytable, function(z){
+ return(length(z) > 0)
+ })) ]
+ if(length(columns.bytable)==0)
+ stop("No columns available!")
+ have.columns <- NULL
+ ## new approach! order the tables by number of elements, and after that, re-order them.
+ columns.bytable <- columns.bytable[ order(unlist(lapply(columns.bytable, length)),
+ decreasing=TRUE) ]
+ ## has to be a for loop!!!
+ ## loop through the columns by table and sequentially kick out columns for the current table if they were already
+ ## in a previous (more relevant) table
+ ## however, prefer also cases where fewer tables are returned.
+ for(i in 1:length(columns.bytable)){
+ bm <- columns.bytable[[ i ]] %in% have.columns
+ keepvals <- columns.bytable[[ i ]][ !bm ] ## keep those
+ if(length(keepvals) > 0){
+ have.columns <- c(have.columns, keepvals)
+ }
+ if(length(keepvals) > 0){
+ columns.bytable[[ i ]] <- paste(names(columns.bytable)[ i ], keepvals, sep=".")
+ }else{
+ columns.bytable[[ i ]] <- keepvals
+ }
+ }
+ ## kick out those tables with no elements left...
+ columns.bytable <- columns.bytable[ unlist(lapply(columns.bytable, function(z){
+ return(length(z) > 0)
+ })) ]
+ ## re-order by degree.
+ columns.bytable <- columns.bytable[ tablesByDegree(x, names(columns.bytable)) ]
+ return(columns.bytable)
+}
+
+############################################################
+## call the prefixColumns function and return just the column
+## names, but in the same order as the provided columns.
+prefixColumnsKeepOrder <- function(x, columns, clean = TRUE, with.tables) {
+ res <- unlist(prefixColumns(x, columns, clean, with.tables),
+ use.names = FALSE)
+ res_order <- sapply(columns, function(z) {
+ idx <- grep(res, pattern = paste0("\\.", z, "$"))
+ if (length(idx) == 0)
+ return(NULL)
+ return(res[idx[1]])
+ })
+ return(unlist(res_order)) ## unlist() drops the NULL entries of unmatched columns
+}
+
+
+
+## define a function to create a join query based on columns
+## this function has to first get all tables that contain the columns,
+## and then select, for columns present in more than one table, the table with the highest degree.
+## x... EnsDb
+## columns... the columns
+joinQueryOnColumns <- function(x, columns){
+ columns.bytable <- prefixColumns(x, columns)
+ ## based on that we can build the query based on the tables we've got. Note that the
+ ## function internally
+ ## adds tables that might be needed for the join.
+ Query <- joinQueryOnTables(x, names(columns.bytable))
+ return(Query)
+}
+
+
+## only list direct joins!!!
+.JOINS <- rbind(
+ c("gene", "tx", "join tx on (gene.gene_id=tx.gene_id)"),
+ c("gene", "chromosome", "join chromosome on (gene.seq_name=chromosome.seq_name)"),
+ c("tx", "tx2exon", "join tx2exon on (tx.tx_id=tx2exon.tx_id)"),
+ c("tx2exon", "exon", "join exon on (tx2exon.exon_id=exon.exon_id)")
+)
+## tx is now no 1:
+## .JOINS <- rbind(
+## c("tx", "gene", "join gene on (tx.gene_id=gene.gene_id)"),
+## c("gene", "chromosome", "join chromosome on (gene.seq_name=chromosome.seq_name)"),
+## c("tx", "tx2exon", "join tx2exon on (tx.tx_id=tx2exon.tx_id)"),
+## c("tx2exon", "exon", "join exon on (tx2exon.exon_id=exon.exon_id)")
+## )
+
+
+joinQueryOnTables <- function(x, tab){
+ ## just to be on the safe side: evaluate whether we have all required tables to join;
+ ## this will also ensure that the order is by degree.
+ tab <- addRequiredTables(x, tab)
+ Query <- tab[ 1 ]
+ previous.table <- tab[ 1 ]
+ for(i in 1:length(tab)){
+ if(i > 1){
+ Query <- paste(Query, .JOINS[ .JOINS[ , 2 ]==tab[ i ], 3 ])
+ }
+ }
+ return(Query)
+}
+
+
+###
+## Add additional tables in case the submitted tables are not directly connected
+## and can thus not be joined. That's however not so complicated, since the database
+## layout is pretty simple.
+## The tables are:
+##
+## exon -(exon_id=t2e_exon_id)- tx2exon -(t2e_tx_id=tx_id)- tx -(gene_id=gene_id)- gene
+## |
+## chromosome -(seq_name=seq_name)-´
+addRequiredTables <- function(x, tab){
+ ## dash it, as long as I can't find a way to get connected objects in a
+ ## graph I'll do it manually...
+ ## if we have exon and any other table, we need definitely tx2exon
+ if(any(tab=="exon") & length(tab) > 1){
+ tab <- unique(c(tab, "tx2exon"))
+ }
+ ## if we have chromosome and any other table, we'll need gene
+ if(any(tab=="chromosome") & length(tab) > 1){
+ tab <- unique(c(tab, "gene"))
+ }
+ ## if we have exon and we have gene, we'll need also tx
+ if((any(tab=="exon") | (any(tab=="tx2exon"))) & any(tab=="gene")){
+ tab <- unique(c(tab, "tx"))
+ }
+ return(tablesByDegree(x, tab))
+}
+
+
+############################################################
+## .buildQuery
+##
+## The "backbone" function that builds the SQL query based on the specified
+## columns, the provided filters etc.
+## x an EnsDb object
+.buildQuery <- function(x, columns, filter = list(), order.by = "",
+ order.type = "asc", group.by, skip.order.check=FALSE,
+ return.all.columns = TRUE) {
+ resultcolumns <- columns ## just to remember what we really want to give back
+ ## 1) get all column names from the filters also removing the prefix.
+ if (class(filter)!="list")
+ stop("parameter filter has to be a list of BasicFilter classes!")
+ if (length(filter) > 0) {
+ ## check filter!
+ ## add the columns needed for the filter
+ filtercolumns <- unlist(lapply(filter, column, x))
+ ## remove the prefix (column name for these)
+ filtercolumns <- sapply(filtercolumns, removePrefix, USE.NAMES = FALSE)
+ columns <- unique(c(columns, filtercolumns))
+ }
+ ## 2) get all column names for the order.by:
+ if (any(order.by != "")) {
+ ## if we have skip.order.check set we use the order.by as is.
+ if (!skip.order.check) {
+ order.by <- checkOrderBy(orderBy = order.by, supported = columns)
+ }
+ }else{
+ order.by <- ""
+ }
+ ## Note: order by is now a vector!!!
+ ## columns are now all columns that we want to fetch or that we need to
+ ## filter or to sort.
+ ## 3) check which tables we need for all of these columns:
+ need.tables <- names(prefixColumns(x, columns))
+ ##
+ ## Now we can begin to build the query parts!
+ ## a) the query part that joins all required tables.
+ joinquery <- joinQueryOnColumns(x, columns=columns)
+ ## b) the filter part of the query
+ if (length(filter) > 0) {
+ filterquery <- paste(" where",
+ paste(unlist(lapply(filter, where, x,
+ with.tables = need.tables)),
+ collapse=" and "))
+ } else {
+ filterquery <- ""
+ }
+ ## c) the order part of the query
+ if (any(order.by != "")) {
+ if (!skip.order.check) {
+ ## order.by <- paste(unlist(prefixColumns(x=x, columns=order.by,
+ ## with.tables=need.tables),
+ ## use.names=FALSE), collapse=",")
+ order.by <- paste(prefixColumnsKeepOrder(x = x, columns = order.by,
+ with.tables = need.tables),
+ collapse=",")
+ }
+ orderquery <- paste(" order by", order.by, order.type)
+ }else{
+ orderquery <- ""
+ }
+ ## And finally build the final query
+ if(return.all.columns){
+ resultcolumns <- columns
+ }
+ finalquery <- paste0("select distinct ",
+ ## paste(unlist(prefixColumns(x,
+ ## resultcolumns,
+ ## with.tables=need.tables),
+ ## use.names=FALSE), collapse=","),
+ paste(prefixColumnsKeepOrder(x,
+ resultcolumns,
+ with.tables = need.tables),
+ collapse=","),
+ " from ",
+ joinquery,
+ filterquery,
+ orderquery
+ )
+ return(finalquery)
+}
+
+
+## remove the prefix again...
+removePrefix <- function(x, split=".", fixed=TRUE){
+ return(sapply(x, function(z){
+ tmp <- unlist(strsplit(z, split=split, fixed=fixed))
+ return(tmp[ length(tmp) ])
+ }))
+}
+
+
+## just to add another layer; basically just calls buildQuery and executes the query
+.getWhat <- function(x, columns, filter = list(), order.by = "",
+ order.type = "asc", group.by = NULL,
+ skip.order.check = FALSE) {
+ ## That's nasty stuff; for now we support the column tx_name, which we however
+ ## don't have in the database. Thus, we are querying everything except that
+ ## column and filling it later with the values from tx_id.
+ fetchColumns <- columns
+ if(any(columns == "tx_name"))
+ fetchColumns <- unique(c("tx_id", fetchColumns[fetchColumns != "tx_name"]))
+ if (class(filter) != "list")
+ stop("parameter filter has to be a list of BasicFilter classes!")
+ ## If any of the filter is a SymbolFilter, add "symbol" to the return columns.
+ if (length(filter) > 0) {
+ if (any(unlist(lapply(filter, function(z) {
+ return(is(z, "SymbolFilter"))
+ }))))
+ columns <- unique(c(columns, "symbol")) ## append a filter column.
+ }
+ ## Catch also a "symbol" in columns
+ if(any(columns == "symbol"))
+ fetchColumns <- unique(c(fetchColumns[fetchColumns != "symbol"],
+ "gene_name"))
+ ## Shall we do the ordering in R or in SQL?
+ if (orderResultsInR(x) & !skip.order.check) {
+ ## Build the query
+ Q <- .buildQuery(x = x, columns = fetchColumns, filter = filter,
+ order.by = "", order.type = order.type,
+ group.by = group.by,
+ skip.order.check = skip.order.check)
+ ## Get the data
+ Res <- dbGetQuery(dbconn(x), Q)
+ ## Note: we can only order by the columns that we did get back from the
+ ## database; that might be different for the SQL sorting!
+ Res <- orderDataFrameBy(Res, by = checkOrderBy(order.by, fetchColumns),
+ decreasing = order.type != "asc")
+ } else {
+ ## Build the query
+ Q <- .buildQuery(x = x, columns = fetchColumns, filter = filter,
+ order.by = order.by, order.type = order.type,
+ group.by = group.by,
+ skip.order.check = skip.order.check)
+ ## Get the data
+ Res <- dbGetQuery(dbconn(x), Q)
+ }
+ ## cat("Query:\n", Q, "\n")
+ if(any(columns == "tx_cds_seq_start")) {
+ if (!is.integer(Res[, "tx_cds_seq_start"])) {
+ suppressWarnings(
+ ## column contains "NULL" if not defined and coordinates are
+ ## characters; as.integer transforms "NULL" into NA and ensures
+ ## coords are integer.
+ Res[ , "tx_cds_seq_start"] <- as.integer(Res[ , "tx_cds_seq_start"])
+ )
+ }
+ }
+ if(any(columns=="tx_cds_seq_end")){
+ if (!is.integer(Res[, "tx_cds_seq_end"])) {
+ suppressWarnings(
+ ## column contains "NULL" if not defined and coordinates are
+ ## characters; as.integer transforms "NULL" into NA and ensures
+ ## coords are integer.
+ Res[ , "tx_cds_seq_end" ] <- as.integer(Res[ , "tx_cds_seq_end" ])
+ )
+ }
+ }
+ ## Fix for MySQL returning 'numeric' instead of 'integer'.
+ ## THIS SHOULD BE REMOVED ONCE THE PROBLEM IS FIXED IN RMySQL!!!
+ int_cols <- c("exon_seq_start", "exon_seq_end", "exon_idx", "tx_seq_start",
+ "tx_seq_end", "tx_cds_seq_start", "tx_cds_seq_end",
+ "gene_seq_start", "gene_seq_end", "seq_strand")
+ for (the_col in int_cols) {
+ if (any(colnames(Res) == the_col))
+ if (!is.integer(Res[, the_col]))
+ Res[, the_col] <- as.integer(Res[, the_col])
+ }
+ ## Resolving the "symlinks" again.
+ if(any(columns == "tx_name")) {
+ Res <- data.frame(Res, tx_name = Res$tx_id, stringsAsFactors = FALSE)
+ }
+ if(any(columns == "symbol")) {
+ Res <- data.frame(Res, symbol = Res$gene_name, stringsAsFactors = FALSE)
+ }
+ ## Ensure that the ordering is as requested.
+ Res <- Res[, columns, drop=FALSE]
+ return(Res)
+}
+
+############################################################
+## Check database validity.
+.ENSDB_TABLES <- list(gene = c("gene_id", "gene_name", "entrezid",
+ "gene_biotype", "gene_seq_start",
+ "gene_seq_end", "seq_name", "seq_strand",
+ "seq_coord_system"),
+ tx = c("tx_id", "tx_biotype", "tx_seq_start",
+ "tx_seq_end", "tx_cds_seq_start",
+ "tx_cds_seq_end", "gene_id"),
+ tx2exon = c("tx_id", "exon_id", "exon_idx"),
+ exon = c("exon_id", "exon_seq_start", "exon_seq_end"),
+ chromosome = c("seq_name", "seq_length", "is_circular"),
+ metadata = c("name", "value"))
+dbHasRequiredTables <- function(con, returnError = TRUE) {
+ tabs <- dbListTables(con)
+ if (length(tabs) == 0) {
+ if (returnError)
+ return("Database does not have any tables!")
+ return(FALSE)
+ }
+ not_there <- names(.ENSDB_TABLES)[!(names(.ENSDB_TABLES) %in% tabs)]
+ if (length(not_there) > 0) {
+ if (returnError)
+ return(paste0("Required tables ", paste(not_there, collapse = ", "),
+ " are not present in the database!"))
+ return(FALSE)
+ }
+ return(TRUE)
+}
+dbHasValidTables <- function(con, returnError = TRUE) {
+ for (tab in names(.ENSDB_TABLES)) {
+ cols <- .ENSDB_TABLES[[tab]]
+ from_db <- colnames(dbGetQuery(con, paste0("select * from ", tab,
+ " limit 1")))
+ not_there <- cols[!(cols %in% from_db)]
+ if (length(not_there) > 0) {
+ if (returnError)
+ return(paste0("Table ", tab, " is missing required columns ",
+ paste(not_there, collapse = ", "), "!"))
+ return(FALSE)
+ }
+ }
+ return(TRUE)
+}
+
+############################################################
+## feedEnsDb2MySQL
+##
+##
+feedEnsDb2MySQL <- function(x, mysql, verbose = TRUE) {
+ if (!inherits(mysql, "MySQLConnection"))
+ stop("'mysql' is supposed to be a connection to a MySQL database.")
+ ## Fetch the tables and feed them to MySQL.
+ sqlite_con <- dbconn(x)
+ tabs <- dbListTables(sqlite_con)
+ for (the_table in tabs) {
+ if (verbose)
+ message("Fetch table ", the_table, "...", appendLF = FALSE)
+ tmp <- dbGetQuery(sqlite_con, paste0("select * from ", the_table, ";"))
+ if (verbose)
+ message("OK\nStoring the table in MySQL...", appendLF = FALSE)
+ ## Fix tx_cds_seq_start being a character in old databases
+ if (any(colnames(tmp) == "tx_cds_seq_start")) {
+ suppressWarnings(
+ tmp[, "tx_cds_seq_start"] <- as.integer(tmp[, "tx_cds_seq_start"])
+ )
+ suppressWarnings(
+ tmp[, "tx_cds_seq_end"] <- as.integer(tmp[, "tx_cds_seq_end"])
+ )
+ }
+ dbWriteTable(mysql, tmp, name = the_table, row.names = FALSE)
+ if (verbose)
+ message("OK")
+ }
+ ## Create the indices.
+ if (verbose)
+ message("Creating indices...", appendLF = FALSE)
+ .createEnsDbIndices(mysql, indexLength = "(20)")
+ if (verbose)
+ message("OK")
+ return(TRUE)
+}
+## Small helper function to create all the indices.
+.createEnsDbIndices <- function(con, indexLength = "") {
+ dbGetQuery(con, paste0("create index seq_name_idx on chromosome (seq_name",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index gene_gene_id_idx on gene (gene_id",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index gene_gene_name_idx on gene (gene_name",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index gene_seq_name_idx on gene (seq_name",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index tx_tx_id_idx on tx (tx_id",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index tx_gene_id_idx on tx (gene_id",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index exon_exon_id_idx on exon (exon_id",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index t2e_tx_id_idx on tx2exon (tx_id",
+ indexLength, ");"))
+ dbGetQuery(con, paste0("create index t2e_exon_id_idx on tx2exon (exon_id",
+ indexLength, ");"))
+ dbGetQuery(con, "create index t2e_exon_idx_idx on tx2exon (exon_idx);")
+}
+
+############################################################
+## listEnsDbs
+## list databases
+##' @title List EnsDb databases in a MySQL server
+##' @description The \code{listEnsDbs} function lists EnsDb databases in a
+##' MySQL server.
+##'
+##' @details The use of this function requires that the \code{RMySQL} package
+##' is installed and that the user has either access to a MySQL server with
+##' already installed EnsDb databases, or write access to a MySQL server in
+##' which case EnsDb databases could be added with the \code{\link{useMySQL}}
+##' method. EnsDb databases follow the same naming conventions as the EnsDb
+##' packages, with the exception that the name is all lower case and that
+##' \code{"."} is replaced by \code{"_"}.
+##' @param dbcon A \code{DBIConnection} object providing access to a MySQL
+##' database. Either \code{dbcon} or all of the other arguments have to be
+##' specified.
+##' @param host Character specifying the host on which the MySQL server is
+##' running.
+##' @param port The port of the MySQL server (usually \code{3306}).
+##' @param user The username for the MySQL server.
+##' @param pass The password for the MySQL server.
+##' @return A \code{data.frame} listing the database names, organism name
+##' and Ensembl version of the EnsDb databases found on the server.
+##' @author Johannes Rainer
+##' @seealso \code{\link{useMySQL}}
+##' @examples
+##' \dontrun{
+##' library(RMySQL)
+##' dbcon <- dbConnect(MySQL(), host = "localhost", user = my_user, pass = my_pass)
+##' listEnsDbs(dbcon)
+##' }
+listEnsDbs <- function(dbcon, host, port, user, pass) {
+ if(requireNamespace("RMySQL", quietly = TRUE)) {
+ if (missing(dbcon)) {
+ if (missing(host) | missing(user) | missing(port) | missing(pass))
+ stop("Arguments 'host', 'port', 'user' and 'pass' are required",
+ " if 'dbcon' is not specified.")
+ dbcon <- dbConnect(RMySQL::MySQL(), host = host, port = port, user = user,
+ pass = pass)
+ }
+ dbs <- dbGetQuery(dbcon, "show databases;")
+ edbs <- dbs[grep(dbs$Database, pattern = "^ensdb_"), "Database"]
+ edbTable <- data.frame(matrix(ncol = 3, nrow = length(edbs)))
+ colnames (edbTable) <- c("dbname", "organism", "ensembl_version")
+ for (i in seq_along(edbs)) {
+ edbTable[i, "dbname"] <- edbs[i]
+ tmp <- unlist(strsplit(edbs[i], split = "_"))
+ edbTable[i, "organism"] <- tmp[2]
+ edbTable[i, "ensembl_version"] <- as.numeric(gsub(pattern = "v",
+ replacement = "",
+ tmp[3]))
+ }
+ return(edbTable)
+ } else {
+ stop("Required package 'RMySQL' is not installed.")
+ }
+}
diff --git a/R/functions-utils.R b/R/functions-utils.R
new file mode 100644
index 0000000..bf108e7
--- /dev/null
+++ b/R/functions-utils.R
@@ -0,0 +1,83 @@
+############################################################
+## Utility functions
+
+############################################################
+## orderDataFrameBy
+##
+## Simply orders the data.frame x based on the columns specified
+## with by.
+orderDataFrameBy <- function(x, by = "", decreasing = FALSE) {
+ if (all(by == "") | all(is.null(by)))
+ return(x)
+ return(x[do.call(order,
+ args = c(list(method = "radix",
+ decreasing = decreasing),
+ as.list(x[, by, drop = FALSE]))), ])
+}
+
+############################################################
+## checkOrderBy
+##
+## Check the orderBy argument.
+## o orderBy can be a character vector or a , separated list.
+## o Ensure that the columns are valid by comparing with 'supported'.
+## Returns a character vector, each element representing a column
+## on which sorting should be performed.
+checkOrderBy <- function(orderBy, supported = character()) {
+ if (is.null(orderBy) | all(orderBy == "")) {
+ return(orderBy)
+ }
+ if (length(orderBy) == 1 & length(grep(orderBy, pattern = ",")) > 0) {
+ orderBy <- unlist(strsplit(orderBy, split = ","), use.names = FALSE)
+ orderBy <- gsub(orderBy, pattern = " ", replacement = "", fixed = TRUE)
+ }
+ not_supported <- !(orderBy %in% supported)
+ if (any(not_supported)) {
+ warning("Columns in 'order.by' (",
+ paste(orderBy[not_supported], collapse = ", "),
+ ") are not in 'columns' and were thus removed.")
+ orderBy <- orderBy[!not_supported]
+ if (length(orderBy) == 0)
+ orderBy <- ""
+ }
+ return(orderBy)
+}
+
+############################################################
+## addFilterColumns
+##
+## This function checks the filter objects and adds, depending on the
+## returnFilterColumns setting of the EnsDb, also columns for each of the
+## filters, ensuring that:
+## a) "Symlink" filters are added correctly (the column returned by the
+## column call without db are added).
+## b) GRangesFilter: the feature is set based on the specified feature parameter
+## Args:
+addFilterColumns <- function(cols, filter = list(), edb) {
+ gimmeAll <- returnFilterColumns(edb)
+ if (!missing(filter)) {
+ if(!is.list(filter))
+ filter <- list(filter)
+ } else {
+ return(cols)
+ }
+ if (!gimmeAll)
+ return(cols)
+ ## Or alternatively process the filters and add columns.
+ symFilts <- c("SymbolFilter")
+ addC <- unlist(lapply(filter, function(z) {
+ if(class(z) %in% symFilts)
+ return(column(z))
+ return(column(z))
+ }))
+ return(unique(c(cols, addC)))
+}
+
+############################################################
+## SQLiteName2MySQL
+##
+## Convert the SQLite database name (file name) to the corresponding
+## MySQL database name.
+SQLiteName2MySQL <- function(x) {
+ return(tolower(gsub(x, pattern = ".", replacement = "_", fixed = TRUE)))
+}
diff --git a/R/loadEnsDb.R b/R/loadEnsDb.R
new file mode 100644
index 0000000..2a9832c
--- /dev/null
+++ b/R/loadEnsDb.R
@@ -0,0 +1,5 @@
+loadEnsDb <- function( x ){
+ ## con <- ensDb( x )
+ ## EDB <- new( "EnsDb", ensdb=con )
+ return( EnsDb( x ) )
+}
diff --git a/R/makeEnsemblDbPackage.R b/R/makeEnsemblDbPackage.R
new file mode 100644
index 0000000..d40c750
--- /dev/null
+++ b/R/makeEnsemblDbPackage.R
@@ -0,0 +1,213 @@
+## part of this code is from GenomicFeatures makeTxDbPackage.R
+## So to make a package we need a couple things:
+## 1) we need a method called makeTxDbPackage (that will take a txdb object)
+## 2) we will need a package template to use
+
+
+
+## Separate helper function for abbreviating the genus and species name strings
+## this simply makes the first character uppercase
+.organismName <- function(x){
+ substring(x, 1, 1) <- toupper(substring(x, 1, 1))
+ return(x)
+}
+
+.abbrevOrganismName <- function(organism){
+ spc <- unlist(strsplit(organism, "_"))
+ ## this assumes a binomial nomenclature has been maintained.
+ return(paste0(substr(spc[[1]], 1, 1), spc[[2]]))
+}
+
+
+
+## x has to be the connection to the database.
+.makePackageName <- function(x){
+ species <- .getMetaDataValue(x, "Organism")
+ ensembl_version <- .getMetaDataValue(x, "ensembl_version")
+ pkgName <- paste0("EnsDb.",.abbrevOrganismName(.organismName(species)),
+ ".v", ensembl_version)
+ return(pkgName)
+}
+
+.makeObjectName <- function(pkgName){
+ strs <- unlist(strsplit(pkgName, "\\."))
+ paste(c(strs[2:length(strs)],strs[1]), collapse="_")
+}
+
+
+## retrieve Ensembl data
+## save all files to local folder.
+## returns the path where files have been saved to.
+fetchTablesFromEnsembl <- function(version, ensemblapi, user="anonymous",
+ host="ensembldb.ensembl.org", pass="",
+ port=5306, species="human"){
+ if(missing(version))
+ stop("The version of the Ensembl database has to be provided!")
+ ## setting the stage for perl:
+ fn <- system.file("perl", "get_gene_transcript_exon_tables.pl", package="ensembldb")
+ ## parameters: s, U, H, P, e
+ ## replacing white spaces with _
+ species <- gsub(species, pattern=" ", replacement="_")
+
+ cmd <- paste0("perl ", fn, " -s ", species," -e ", version,
+ " -U ", user, " -H ", host, " -p ", port, " -P ", pass)
+ if(!missing(ensemblapi)){
+ Sys.setenv(ENS=ensemblapi)
+ }
+ system(cmd)
+ if(!missing(ensemblapi)){
+ Sys.unsetenv("ENS")
+ }
+
+ ## we should now have the files:
+ in_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
+ "ens_tx2exon.txt", "ens_chromosome.txt", "ens_metadata.txt")
+ ## check if we have all files...
+ all_files <- dir(pattern="txt")
+ if(sum(in_files %in% all_files)!=length(in_files))
+ stop("Something went wrong! I'm missing some of the txt files the perl script should have generated.")
+}
+
+
+####
+##
+## create a SQLite database containing the information defined in the txt files.
+makeEnsemblSQLiteFromTables <- function(path=".", dbname){
+ ## check if we have all files...
+ in_files <- c("ens_gene.txt", "ens_tx.txt", "ens_exon.txt",
+ "ens_tx2exon.txt", "ens_chromosome.txt", "ens_metadata.txt")
+ ## check if we have all files...
+ all_files <- dir(path, pattern="txt")
+ if(sum(in_files %in% all_files)!=length(in_files))
+ stop("Something went wrong! I'm missing some of the txt files the perl script should have generated.")
+
+ ## read information
+ info <- read.table(paste0(path, .Platform$file.sep ,"ens_metadata.txt"), sep="\t",
+ as.is=TRUE, header=TRUE)
+ species <- .organismName(info[ info$name=="Organism", "value" ])
+ ##substring(species, 1, 1) <- toupper(substring(species, 1, 1))
+ if(missing(dbname)){
+ dbname <- paste0("EnsDb.",substring(species, 1, 1),
+ unlist(strsplit(species, split="_"))[ 2 ], ".v",
+ info[ info$name=="ensembl_version", "value" ], ".sqlite")
+ }
+ con <- dbConnect(dbDriver("SQLite"), dbname=dbname)
+
+ ## write information table
+ dbWriteTable(con, name="metadata", info, row.names=FALSE)
+
+ ## process chromosome
+ tmp <- read.table(paste0(path, .Platform$file.sep ,"ens_chromosome.txt"), sep="\t", as.is=TRUE, header=TRUE)
+ tmp[, "seq_name"] <- as.character(tmp[, "seq_name"])
+ dbWriteTable(con, name="chromosome", tmp, row.names=FALSE)
+ rm(tmp)
+
+ ## process genes: some gene names might have fancy names...
+ tmp <- read.table(paste0(path, .Platform$file.sep, "ens_gene.txt"), sep="\t", as.is=TRUE, header=TRUE,
+ quote="", comment.char="" )
+ OK <- .checkIntegerCols(tmp)
+ dbWriteTable(con, name="gene", tmp, row.names=FALSE)
+ rm(tmp)
+
+ ## process transcripts:
+ tmp <- read.table(paste0(path, .Platform$file.sep, "ens_tx.txt"), sep="\t", as.is=TRUE, header=TRUE)
+ ## Fix the tx_cds_seq_start and tx_cds_seq_end columns: these should be integer!
+ suppressWarnings(
+ tmp[, "tx_cds_seq_start"] <- as.integer(tmp[, "tx_cds_seq_start"])
+ )
+ suppressWarnings(
+ tmp[, "tx_cds_seq_end"] <- as.integer(tmp[, "tx_cds_seq_end"])
+ )
+ OK <- .checkIntegerCols(tmp)
+ dbWriteTable(con, name="tx", tmp, row.names=FALSE)
+ rm(tmp)
+
+ ## process exons:
+ tmp <- read.table(paste0(path, .Platform$file.sep, "ens_exon.txt"), sep="\t", as.is=TRUE, header=TRUE)
+ OK <- .checkIntegerCols(tmp)
+ dbWriteTable(con, name="exon", tmp, row.names=FALSE)
+ rm(tmp)
+ tmp <- read.table(paste0(path, .Platform$file.sep, "ens_tx2exon.txt"), sep="\t", as.is=TRUE, header=TRUE)
+ OK <- .checkIntegerCols(tmp)
+ dbWriteTable(con, name="tx2exon", tmp, row.names=FALSE)
+ rm(tmp)
+ ## Create indices
+ .createEnsDbIndices(con)
+ dbDisconnect(con)
+ ## done.
+ return(dbname)
+}
+
+############################################################
+## Simply checking that some columns are integer
+.checkIntegerCols <- function(x, columns = c("gene_seq_start", "gene_seq_end",
+ "tx_seq_start", "tx_seq_end",
+ "exon_seq_start", "exon_seq_end",
+ "exon_idx", "tx_cds_seq_start",
+ "tx_cds_seq_end")) {
+ cols <- columns[columns %in% colnames(x)]
+ if(length(cols) > 0) {
+ sapply(cols, function(z) {
+ if(!is.integer(x[, z]))
+ stop("Column '", z,"' is not of type integer!")
+ })
+ }
+ return(TRUE)
+}
+
+
+####
+## the function that creates the annotation package.
+## ensdb should be a connection to an SQLite database, or a character string...
+makeEnsembldbPackage <- function(ensdb,
+ version,
+ maintainer,
+ author,
+ destDir=".",
+ license="Artistic-2.0"){
+ if(!is.character(ensdb))
+ stop("ensdb has to be the name of the SQLite database!")
+ ensdbfile <- ensdb
+ ensdb <- EnsDb(x=ensdbfile)
+ con <- dbconn(ensdb)
+ pkgName <- .makePackageName(con)
+ ensembl_version <- .getMetaDataValue(con, "ensembl_version")
+ ## there should only be one template
+ template_path <- system.file("pkg-template",package="ensembldb")
+ ## We need to define some symbols in order to have the
+ ## template filled out correctly.
+ symvals <- list(
+ PKGTITLE=paste("Ensembl based annotation package"),
+ PKGDESCRIPTION=paste("Exposes an annotation database generated from Ensembl."),
+ PKGVERSION=version,
+ AUTHOR=author,
+ MAINTAINER=maintainer,
+ LIC=license,
+ ORGANISM=.organismName(.getMetaDataValue(con ,'Organism')),
+ SPECIES=.organismName(.getMetaDataValue(con,'Organism')),
+ PROVIDER="Ensembl",
+ PROVIDERVERSION=as.character(ensembl_version),
+ RELEASEDATE= .getMetaDataValue(con ,'Creation time'),
+ SOURCEURL= .getMetaDataValue(con ,'ensembl_host'),
+ ORGANISMBIOCVIEW=gsub(" ","_",.organismName(.getMetaDataValue(con ,'Organism'))),
+ TXDBOBJNAME=pkgName ## .makeObjectName(pkgName)
+ )
+ ## Should never happen
+ if (any(duplicated(names(symvals)))) {
+ str(symvals)
+ stop("'symvals' contains duplicated symbols")
+ }
+ createPackage(pkgname=pkgName,
+ destinationDir=destDir,
+ originDir=template_path,
+ symbolValues=symvals)
+ ## then copy the contents of the database into the extdata dir
+ sqlfilename <- unlist(strsplit(ensdbfile, split=.Platform$file.sep))
+ sqlfilename <- sqlfilename[ length(sqlfilename) ]
+ dir.create(paste(c(destDir, pkgName, "inst", "extdata"),
+ collapse=.Platform$file.sep), showWarnings=FALSE, recursive=TRUE)
+ db_path <- file.path(destDir, pkgName, "inst", "extdata",
+ paste(pkgName,"sqlite",sep="."))
+ file.copy(ensdbfile, to=db_path)
+}
+
diff --git a/R/runEnsDbApp.R b/R/runEnsDbApp.R
new file mode 100644
index 0000000..7d74396
--- /dev/null
+++ b/R/runEnsDbApp.R
@@ -0,0 +1,10 @@
+## running the shiny web app.
+runEnsDbApp <- function(...){
+ if(requireNamespace("shiny", quietly=TRUE)){
+ message("Starting the EnsDb shiny web app. Use Ctrl-C to stop.")
+ shiny::runApp(appDir=system.file("shinyHappyPeople", package="ensembldb"), ...)
+ }else{
+ stop("Package shiny not installed!")
+ }
+}
+
diff --git a/R/select-methods.R b/R/select-methods.R
new file mode 100644
index 0000000..3f9cc18
--- /dev/null
+++ b/R/select-methods.R
@@ -0,0 +1,319 @@
+## That's to support and interface the AnnotationDbi package.
+
+####============================================================
+## .getColMappings
+##
+## Returns a character vector of abbreviated column names
+## which can be/are used by AnnotationDbi with the names corresponding
+## to the column names from ensembldb.
+## x: is supposed to be an EnsDb object.
+## all: if TRUE we return all of them, otherwise we just return those
+## that should be visible for the user.
+####------------------------------------------------------------
+.getColMappings <- function(x, all=FALSE){
+ cols <- listColumns(x)
+ if(!all){
+ cols <- cols[!(cols %in% c("name", "value"))]
+ }
+ ret <- toupper(gsub("_", replacement="", cols))
+ names(ret) <- cols
+ return(ret)
+}
+
+####============================================================
+## ensDbColumnForColumn
+##
+## Returns the database column names that correspond to the given
+## AnnotationDbi-style column names (e.g. GENEID -> gene_id).
+####------------------------------------------------------------
+ensDbColumnForColumn <- function(x, column){
+ maps <- .getColMappings(x)
+ revmaps <- names(maps)
+ names(revmaps) <- maps
+ cols <- revmaps[column]
+ if(any(is.na(cols))){
+ warning("The following columns can not be mapped to column names in the",
+ " db: ", paste(column[is.na(cols)], collapse=", "))
+ cols <- cols[!is.na(cols)]
+ }
+ ## Fixing tx_name; tx_name should be mapped to tx_id in the database!
+ ##cols[cols == "tx_name"] <- "tx_id"
+ return(cols)
+}
+
+
+####============================================================
+## columns method
+##
+## Just return the attributes, but as expected by the AnnotationDbi
+## interface (i.e. upper case, no _).
+####------------------------------------------------------------
+.getColumns <- function(x){
+ cols <- .getColMappings(x, all=FALSE)
+ names(cols) <- NULL
+ return(unique(cols))
+}
+setMethod("columns", "EnsDb",
+ function(x) .getColumns(x)
+ )
+
+
+####============================================================
+## keytypes method
+##
+## I will essentially use all of the filters here.
+####------------------------------------------------------------
+setMethod("keytypes", "EnsDb",
+ function(x){
+ return(.filterKeytypes())
+ }
+)
+## This just returns some (eventually) useful names for keys
+.simpleKeytypes <- function(x){
+ return(c("GENEID","TXID","TXNAME","EXONID","EXONNAME","CDSID","CDSNAME"))
+}
+.filterKeytypes <- function(x){
+ return(names(.keytype2FilterMapping()))
+}
+## returns a vector mapping keytypes (names of vector) to filter names (elements).
+.keytype2FilterMapping <- function(){
+ filters <- c("EntrezidFilter", "GeneidFilter", "GenebiotypeFilter", "GenenameFilter",
+ "TxidFilter", "TxbiotypeFilter", "ExonidFilter", "SeqnameFilter",
+ "SeqstrandFilter", "TxidFilter", "SymbolFilter")
+ names(filters) <- c("ENTREZID", "GENEID", "GENEBIOTYPE", "GENENAME", "TXID",
+ "TXBIOTYPE", "EXONID", "SEQNAME", "SEQSTRAND", "TXNAME",
+ "SYMBOL")
+ return(filters)
+}
+filterForKeytype <- function(keytype){
+ filters <- .keytype2FilterMapping()
+ if(any(names(filters) == keytype)){
+ filt <- new(filters[keytype])
+ return(filt)
+ }else{
+ stop("No filter for that keytype!")
+ }
+}
+
+####============================================================
+## keys method
+##
+## This keys method returns all of the keys for a specified keytype.
+## There should also be an implementation without keytypes, which
+## returns in our case the gene_ids
+##
+####------------------------------------------------------------
+setMethod("keys", "EnsDb",
+ function(x, keytype, filter,...){
+ if(missing(keytype))
+ keytype <- "GENEID"
+ if(missing(filter))
+ filter <- list()
+ if(is(filter, "BasicFilter"))
+ filter <- list(filter)
+ keyt <- keytypes(x)
+ keytype <- match.arg(keytype, keyt)
+ ## Map the keytype to the appropriate column name.
+ dbColumn <- ensDbColumnForColumn(x, keytype)
+ ## Perform the query.
+ res <- getWhat(x, columns=dbColumn, filter=filter)[, dbColumn]
+ return(res)
+ })
+
+
+####============================================================
+## select method
+##
+##
+####------------------------------------------------------------
+setMethod("select", "EnsDb",
+ function(x, keys, columns, keytype, ...) {
+ if (missing(keys))
+ keys <- NULL
+ if (missing(columns))
+ columns <- NULL
+ if (missing(keytype))
+ keytype <- NULL
+ return(.select(x = x, keys = keys, columns = columns,
+ keytype = keytype, ...))
+ })
+.select <- function(x, keys = NULL, columns = NULL, keytype = NULL, ...) {
+ extraArgs <- list(...)
+ ## Perform argument checking:
+ ## columns:
+ if (missing(columns) | is.null(columns))
+ columns <- columns(x)
+ notAvailable <- !(columns %in% columns(x))
+ if (all(notAvailable))
+ stop("None of the specified columns are available in the database!")
+ if (any(notAvailable)){
+ warning("The following columns are not available in the database and have",
+ " thus been removed: ", paste(columns[notAvailable], collapse = ", "))
+ columns <- columns[!notAvailable]
+ }
+ ## keys:
+ if (is.null(keys) | missing(keys)) {
+ ## Get everything from the database...
+ keys <- list()
+ } else {
+ if (!(is(keys, "character") | is(keys, "list") | is(keys, "BasicFilter")))
+ stop("Argument keys should be a character vector, an object extending BasicFilter ",
+ "or a list of objects extending BasicFilter.")
+ if (is(keys, "list")) {
+ if (!all(vapply(keys, is, logical(1L), "BasicFilter")))
+ stop("If keys is a list it should be a list of objects extending BasicFilter!")
+ }
+ if (is(keys, "BasicFilter")) {
+ keys <- list(keys)
+ }
+ if (is(keys, "character")) {
+ if (is.null(keytype)) {
+ stop("Argument keytype is mandatory if keys is a character vector!")
+ }
+ ## Check also keytype:
+ if (!(keytype %in% keytypes(x)))
+ stop("keytype ", keytype, " not available in the database.",
+ " Use keytypes method to list all available keytypes.")
+ ## Generate a filter object for the filters.
+ keyFilter <- filterForKeytype(keytype)
+ value(keyFilter) <- keys
+ keys <- list(keyFilter)
+ ## Add also the keytype itself to the columns.
+ if (!any(columns == keytype))
+ columns <- c(keytype, columns)
+ }
+ }
+ ## Map the columns to column names we have in the database and add filter columns too.
+ ensCols <- unique(c(ensDbColumnForColumn(x, columns),
+ addFilterColumns(character(), filter = keys, x)))
+ ## OK, now perform the query given the filters we've got.
+ res <- getWhat(x, columns = ensCols, filter = keys)
+ ## Order results if length of filters is 1.
+ if (length(keys) == 1) {
+ ## Define the filters on which we could sort.
+ sortFilts <- c("GenenameFilter", "GeneidFilter", "EntrezidFilter", "GenebiotypeFilter",
+ "SymbolFilter", "TxidFilter", "TxbiotypeFilter", "ExonidFilter",
+ "ExonrankFilter", "SeqnameFilter")
+ if (class(keys[[1]]) %in% sortFilts) {
+ keyvals <- value(keys[[1]])
+ ## Handle SymbolFilter differently:
+ if (is(keys[[1]], "SymbolFilter")) {
+ sortCol <- column(keys[[1]])
+ } else {
+ sortCol <- removePrefix(column(keys[[1]], x))
+ }
+ res <- res[order(match(res[, sortCol], keyvals)), ]
+ }
+ } else {
+ ## Show a mild warning message
+ message("Note: ordering of the results might not match ordering of keys!")
+ }
+ colMap <- .getColMappings(x)
+ colnames(res) <- colMap[colnames(res)]
+ rownames(res) <- NULL
+ if (returnFilterColumns(x))
+ return(res)
+ ## ## Now, if we've got a "TXNAME" in columns, we have to replace at least one of the "TXID"s
+ ## ## in the colnames...
+ ## if(any(columns == "TXNAME"))
+ ## colnames(res)[match("TXID", colnames(res))] <- "TXNAME"
+ return(res[, columns])
+}
+
+
+####============================================================
+## mapIds method
+##
+## maps the submitted keys (names of the returned vector) to values
+## of the column specified by column.
+## x, key, column, keytype, ..., multiVals
+####------------------------------------------------------------
+setMethod("mapIds", "EnsDb", function(x, keys, column, keytype, ..., multiVals){
+ if(missing(keys))
+ keys <- NULL
+ if(missing(column))
+ column <- NULL
+ if(missing(keytype))
+ keytype <- NULL
+ if(missing(multiVals))
+ multiVals <- NULL
+ return(.mapIds(x=x, keys=keys, column=column, keytype=keytype, multiVals=multiVals, ...))
+})
+## Other methods: saveDb, species, dbfile, dbconn, taxonomyId
+.mapIds <- function(x, keys=NULL, column=NULL, keytype=NULL, ..., multiVals=NULL){
+ if(is.null(keys))
+ stop("Argument keys has to be provided!")
+ if(!(is(keys, "character") | is(keys, "list") | is(keys, "BasicFilter")))
+ stop("Argument keys should be a character vector, an object extending BasicFilter ",
+ "or a list of objects extending BasicFilter.")
+ if(is.null(column))
+ column <- "GENEID"
+ ## Have to specify the columns argument. Has to be keytype and column.
+ if(is(keys, "character")){
+ if(is.null(keytype))
+ stop("Argument keytype is mandatory if keys is a character vector!")
+ columns <- c(keytype, column)
+ }
+ if(is(keys, "list") | is(keys, "BasicFilter")){
+ if(is(keys, "list")){
+ if(length(keys) > 1)
+ warning("Got ", length(keys), " filter objects.",
+ " Will use the keys of the first for the mapping!")
+ cn <- class(keys[[1]])[1]
+ }else{
+ cn <- class(keys)[1]
+ }
+ ## Use the first element to determine the keytype...
+ mapping <- .keytype2FilterMapping()
+ columns <- c(names(mapping)[mapping == cn], column)
+ keytype <- NULL
+ }
+ res <- select(x, keys=keys, columns=columns, keytype=keytype)
+ if(nrow(res) == 0)
+ return(character())
+ ## Handling multiVals.
+ if(is.null(multiVals))
+ multiVals <- "first"
+ if(is(multiVals, "function"))
+ stop("Not yet implemented!")
+ ## Eventually re-order the data.frame in the same order as the keys...
+ ## That's amazingly slow!!!
+ ## if(is.character(keys)){
+ ## res <- split(res, f=factor(res[, 1], levels=keys))
+ ## res <- do.call(rbind, res)
+ ## rownames(res) <- NULL
+ ## }
+ if(is.character(keys)){
+ theNames <- keys
+ }else{
+ theNames <- unique(res[, 1])
+ }
+ switch(multiVals,
+ first={
+ vals <- res[match(theNames, res[, 1]), 2]
+ names(vals) <- theNames
+ return(vals)
+ },
+ list={
+ ## vals <- split(res[, 2], f=factor(res[, 1], levels=unique(res[, 1])))
+ vals <- split(res[, 2], f=factor(res[, 1], levels=unique(theNames)))
+ return(vals)
+ },
+ filter={
+ vals <- split(res[, 2], f=factor(res[, 1], levels=unique(theNames)))
+ vals <- vals[unlist(lapply(vals, length)) == 1]
+ return(unlist(vals))
+ },
+ asNA={
+ ## Split the vector, set all those with multi mappings NA.
+ vals <- split(res[, 2], f=factor(res[, 1], levels=unique(theNames)))
+ vals[unlist(lapply(vals, length)) > 1] <- NA
+ return(unlist(vals))
+ },
+ CharacterList={
+ stop("Not yet implemented!")
+ })
+}
+
+
+
diff --git a/R/seqname-utils.R b/R/seqname-utils.R
new file mode 100644
index 0000000..e0a0533
--- /dev/null
+++ b/R/seqname-utils.R
@@ -0,0 +1,258 @@
+####============================================================
+## Methods and functions to allow usage of EnsDb objects also
+## with genomic resources that do not use Ensembl based
+## seqnames
+## We're storing the seqname style into the .properties slot
+## of the EnsDb object.
+####------------------------------------------------------------
+.ENSOPT.SEQNOTFOUND="ensembldb.seqnameNotFound"
+####============================================================
+## formatSeqnamesForQuery
+##
+## Formatting/renaming the seqname(s) according to the specified
+## style.
+## x is an EnsDb,
+## sn the seqnames to convert...
+## If a seqname can not be mapped NA will be returned.
+####------------------------------------------------------------
+setMethod("formatSeqnamesForQuery", "EnsDb", function(x, sn, ifNotFound){
+ return(.formatSeqnameByStyleForQuery(x, sn, ifNotFound))
+})
+## Little helper function that returns the value of the seqnameNotFound option.
+## Returns MISSING if the option was not set.
+.getSeqnameNotFoundOption <- function(){
+ notFound <- "MISSING"
+ if(any(names(options()) == .ENSOPT.SEQNOTFOUND)){
+ notFound <- getOption(.ENSOPT.SEQNOTFOUND)
+ ## Do some sanity checks?
+ }
+ return(notFound)
+}
+.formatSeqnameByStyleForQuery <- function(x, sn, ifNotFound){
+ ## Fixing ifNotFound, allowing that this can be set using options.
+ if(missing(ifNotFound)){
+ ifNotFound <- .getSeqnameNotFoundOption()
+ }
+ ## Map whatever to Ensembl seqnames, such that we can perform queries.
+ ## Use mapSeqlevels, or rather genomeStyles and do it hand-crafted!
+ sst <- seqlevelsStyle(x)
+ dbSst <- dbSeqlevelsStyle(x)
+ if(sst == dbSst)
+ return(sn)
+ ## Don't like that the genomeStyles is reading the stuff from file.
+ map <- getProperty(x, "genomeStyle")
+ if(!is(map, "data.frame"))
+ map <- genomeStyles(organism(x))
+ ## sn are supposed to be in sst style, map them to dbSst
+ idx <- match(sn, map[, sst])
+ mapped <- map[idx, dbSst]
+ if(any(is.na(mapped))){
+ noMap <- which(is.na(mapped))
+ seqNoMap <- unique(sn[noMap])
+ if(length(seqNoMap) > 5){
+ theMess <- paste0("More than 5 seqnames could not be mapped to ",
+ "the seqlevels style of the database (", dbSst, ")!")
+ }else{
+ theMess <- paste0("Seqnames: ", paste(seqNoMap, collapse=", "),
+ " could not be mapped to ",
+ "the seqlevels style of the database (", dbSst, ")!")
+ }
+ if(is.null(ifNotFound) || is.na(ifNotFound)){
+ ## Replacing the missing seqname mappings with ifNotFound.
+ mapped[noMap] <- ifNotFound
+ warnMess <- paste0(" Returning ", ifNotFound, " for these.")
+ }else{
+ ## If MISSING -> STOP
+ if(ifNotFound == "MISSING"){
+ stop(theMess)
+ }else{
+ ## Next special case: use the original names, i.e. don't map at all.
+ if(ifNotFound == "ORIGINAL"){
+ mapped[noMap] <- sn[noMap]
+ warnMess <- " Returning the original seqnames for these."
+ }else{
+ mapped[noMap] <- ifNotFound
+ warnMess <- paste0(" Returning ", ifNotFound, " for these.")
+ }
+ }
+ }
+ warning(theMess, warnMess)
+ }
+ return(mapped)
+}
+setMethod("formatSeqnamesFromQuery", "EnsDb", function(x, sn, ifNotFound){
+ return(.formatSeqnameByStyleFromQuery(x, sn, ifNotFound))
+})
+.formatSeqnameByStyleFromQuery <- function(x, sn, ifNotFound){
+ ## Fixing ifNotFound, allowing that this can be set using options.
+ if(missing(ifNotFound)){
+ ifNotFound <- .getSeqnameNotFoundOption()
+ }
+ ## Map Ensembl seqnames resulting from queries to the seqlevels style set by
+ ## seqlevelsStyle.
+ sst <- seqlevelsStyle(x)
+ dbSst <- dbSeqlevelsStyle(x)
+ if(sst == dbSst)
+ return(sn)
+ ## Otherwise...
+ map <- getProperty(x, "genomeStyle")
+ if(!is(map, "data.frame"))
+ map <- genomeStyles(organism(x))
+ ## sn are supposed to be in dbSst style, map them to sst
+ idx <- match(sn, map[, dbSst])
+ mapped <- map[idx, sst]
+ if(any(is.na(mapped))){
+ noMap <- which(is.na(mapped))
+ seqNoMap <- unique(sn[noMap])
+ if(length(seqNoMap) > 5){
+ theMess <- paste0("More than 5 seqnames with seqlevels style of the database (",
+ dbSst, ") could not be mapped to the seqlevels style: ",
+ sst, "!")
+ }else{
+ theMess <- paste0("Seqnames: ", paste(seqNoMap, collapse=", "),
+ " with seqlevels style of the database (", dbSst,
+ ") could not be mapped to seqlevels style: ", sst,
+ "!")
+ }
+
+ if(is.null(ifNotFound) || is.na(ifNotFound)){
+ ## Replacing the missing seqname mappings with ifNotFound.
+ mapped[noMap] <- ifNotFound
+ warnMess <- paste0(" Returning ", ifNotFound, " for these.")
+ }else{
+ ## If MISSING -> STOP
+ if(ifNotFound == "MISSING"){
+ stop(theMess)
+ }else{
+ ## Next special case: use the original names, i.e. don't map at all.
+ if(ifNotFound == "ORIGINAL"){
+ mapped[noMap] <- sn[noMap]
+ warnMess <- " Returning the original seqnames for these."
+ }else{
+ mapped[noMap] <- ifNotFound
+ warnMess <- paste0(" Returning ", ifNotFound, " for these.")
+ }
+ }
+ }
+ warning(theMess, warnMess)
+ }
+ return(mapped)
+}
+
+
+####============================================================
+## dbSeqlevelsStyle
+##
+## Returns the seqname style used by the database. Defaults to
+## Ensembl and reads the property: dbSeqlevelsStyle.
+####------------------------------------------------------------
+setMethod("dbSeqlevelsStyle", "EnsDb", function(x){
+ stl <- getProperty(x, "dbSeqlevelsStyle")
+ if(is.na(stl))
+ stl <- "Ensembl"
+ return(stl)
+})
+####============================================================
+## seqlevelsStyle
+##
+## Get or set the seqlevels style. If we can't find the style in
+## GenomeInfoDb throw an error.
+####------------------------------------------------------------
+setMethod("seqlevelsStyle", "EnsDb", function(x){
+ st <- getProperty(x, "seqlevelsStyle")
+ if(is.na(st))
+ st <- "Ensembl"
+ return(st)
+})
+setReplaceMethod("seqlevelsStyle", "EnsDb", function(x, value){
+ if(value == dbSeqlevelsStyle(x)){
+ ## Not much to do; that's absolutely fine.
+ x <- setProperty(x, seqlevelsStyle=value)
+ }else{
+ ## Have to check whether I have the mapping available in GenomeInfoDb, if not
+ ## -> throw an error.
+ dbStyle <- dbSeqlevelsStyle(x)
+ ## Note that both, the db seqlevel style and the style have to be available!
+ ## Check if we could use the mapping provided by GenomeInfoDb.
+ genSt <- try(genomeStyles(organism(x)), silent=TRUE)
+ if(is(genSt, "try-error")){
+ stop("No mapping of seqlevel styles available in GenomeInfoDb for",
+ " species ", organism(x), "! Please refer to the Vignette of the",
+ " GenomeInfoDb package if you would like to provide this mapping.")
+ }
+ if(!any(colnames(genSt) == value)){
+ stop("The provided seqlevels style is not known to GenomeInfoDb!")
+ }
+ if(!any(colnames(genSt) == dbStyle)){
+ stop("The seqlevels style of the database (", dbStyle,
+ ") is not known to GenomeInfoDb!")
+ }
+ ## If we got that far it should be OK
+ x <- setProperty(x, seqlevelsStyle=value)
+ x <- setProperty(x, genomeStyle=list(genSt))
+ }
+ return(x)
+})
+
+####============================================================
+## supportedSeqlevelsStyles
+##
+## Get all supported seqlevels styles for the species of the EnsDb
+####------------------------------------------------------------
+setMethod("supportedSeqlevelsStyles", "EnsDb", function(x){
+ map <- genomeStyles(organism(x))
+ cn <- colnames(map)
+ cn <- cn[!(cn %in% c("circular", "auto", "sex"))]
+ return(cn)
+})
+
+
+####==================== OLD STUFF BELOW ====================
+
+###==============================================================
+## Prefix chromosome names with "chr" if ucscChromosomeNames option
+## is set, otherwise, use chromosome names "as is".
+## This function should be used in functions that return results from
+## EnsDbs.
+###--------------------------------------------------------------
+prefixChromName <- function(x, prefix="chr"){
+ ucsc <- getOption("ucscChromosomeNames", default=FALSE)
+ if(ucsc){
+ ## TODO fix also the mitochondrial chromosome name.
+ mapping <- ucscToEnsMapping()
+ for(i in seq_along(mapping)){
+ x <- sub(x, pattern=names(mapping)[i], replacement=mapping[i],
+ fixed=TRUE)
+ }
+ ## Replace chr if it's already there
+ x <- gsub(x, pattern="^chr", replacement="", ignore.case=TRUE)
+ x <- paste0(prefix, x)
+ }
+ return(x)
+}
+
+###==============================================================
+## Remove leading "chr" to fit Ensembl based chromosome names.
+## This function should be called in functions that fetch data from
+## EnsDbs.
+###--------------------------------------------------------------
+ucscToEns <- function(x){
+ ## TODO rename all additional chromosome names.
+ mapping <- ucscToEnsMapping()
+ for(i in seq_along(mapping)){
+ x <- sub(x, pattern=mapping[i], replacement=names(mapping)[i],
+ fixed=TRUE)
+ }
+ x <- gsub(x, pattern="^chr", replacement="", ignore.case=TRUE)
+ return(x)
+}
+###============================================================
+## Returns a character vector, elements representing UCSC chromosome
+## names with their names corresponding to the respective Ensembl
+## chromosome names.
+###------------------------------------------------------------
+ucscToEnsMapping <- function(){
+ theMap <- c(MT="chrM")
+ return(theMap)
+}
+
diff --git a/R/zzz.R b/R/zzz.R
new file mode 100644
index 0000000..02f9d47
--- /dev/null
+++ b/R/zzz.R
@@ -0,0 +1,15 @@
+
+.onLoad <- function(libname, pkgname){
+ op <- options()
+ ## What should be returned by default if the seqnames can not be mapped based
+ ## on the style set by seqlevelsStyle<-
+ ## Options:
+ ## + NA or any other value: return this value for such cases.
+ ## + MISSING: stop and throw an error.
+ ## + ORIGINAL: return the original seqnames.
+ opts.ens <- list(useFancyQuotes=FALSE,
+ ensembldb.seqnameNotFound="ORIGINAL")
+ options(opts.ens)
+ invisible()
+}
+
diff --git a/build/vignette.rds b/build/vignette.rds
new file mode 100644
index 0000000..6e57dc9
Binary files /dev/null and b/build/vignette.rds differ
diff --git a/debian/README.test b/debian/README.test
deleted file mode 100644
index bb496c4..0000000
--- a/debian/README.test
+++ /dev/null
@@ -1,13 +0,0 @@
-Notes on how this package can be tested.
-────────────────────────────────────────
-
-This package can be tested by running the provided test:
-
-LC_ALL=C R --no-save <<EOT
-BiocGenerics:::testPackage("ensembldb")
-EOT
-
-in order to confirm its integrity. However, to successfully run this
-testsuite you need to install the EnsDb.Hsapiens.v75 BioConductor
-databases. It was decided that creating Debian packages of large size
-just to run the test suite is not very sensible.
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index 5137c0a..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,22 +0,0 @@
-r-bioc-ensembldb (1.6.2-1) unstable; urgency=medium
-
- * New upstream version
- * debhelper 10
- * d/watch: version=4
-
- -- Andreas Tille <tille at debian.org> Wed, 30 Nov 2016 10:33:07 +0100
-
-r-bioc-ensembldb (1.6.0-1) unstable; urgency=medium
-
- * New upstream version
- * Convert to dh-r
- * Generic BioConductor homepage
- * Exclude tests requiring EnsDb.Hsapiens.v75 from unit tests
-
- -- Andreas Tille <tille at debian.org> Thu, 27 Oct 2016 14:39:59 +0200
-
-r-bioc-ensembldb (1.4.6-1) unstable; urgency=low
-
- * Initial release (closes: #825906)
-
- -- Andreas Tille <tille at debian.org> Tue, 31 May 2016 11:41:29 +0200
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index f599e28..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-10
diff --git a/debian/control b/debian/control
deleted file mode 100644
index f694613..0000000
--- a/debian/control
+++ /dev/null
@@ -1,31 +0,0 @@
-Source: r-bioc-ensembldb
-Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Andreas Tille <tille at debian.org>
-Section: gnu-r
-Priority: optional
-Build-Depends: debhelper (>= 10),
- dh-r,
- r-base-dev,
- r-bioc-genomicfeatures,
- r-bioc-annotationhub
-Standards-Version: 3.9.8
-Vcs-Browser: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/R/r-bioc-ensembldb/trunk/
-Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/R/r-bioc-ensembldb/trunk/
-Homepage: https://bioconductor.org/packages/ensembldb/
-
-Package: r-bioc-ensembldb
-Architecture: all
-Depends: ${R:Depends},
- ${misc:Depends},
-Recommends: ${R:Recommends}
-Suggests: ${R:Suggests}
-Description: GNU R utilities to create and use an Ensembl based annotation database
- The package provides functions to create and use transcript centric
- annotation databases/packages. The annotation for the databases are
- directly fetched from Ensembl using their Perl API. The functionality
- and data is similar to that of the TxDb packages from the
- GenomicFeatures package, but, in addition to retrieve all
- gene/transcript models and annotations from the database, the ensembldb
- package provides also a filter framework allowing to retrieve
- annotations for specific entries like genes encoded on a chromosome
- region or transcript models of lincRNA genes.
diff --git a/debian/copyright b/debian/copyright
deleted file mode 100644
index 08c8a37..0000000
--- a/debian/copyright
+++ /dev/null
@@ -1,115 +0,0 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: ensembldb
-Upstream-Contact: Johannes Rainer <johannes.rainer at eurac.edu>
-Source: https://bioconductor.org/packages/ensembldb/
-
-Files: *
-Copyright: 2006-2016 Johannes Rainer <johannes.rainer at eurac.edu>,
- Tim Triche <tim.triche at usc.edu>
-License: LGPL-2.1+
-
-Files: R/makeEnsemblDbPackage.R
- man/makeEnsemblDbPackage.Rd
-Copyright: 2006-2014 M. Carlson, H. Pages, P. Aboyoun, S. Falcon, M. Morgan, D. Sarkar, M. Lawrence
-License: Artistic-2.0
- The "Artistic License"
- .
- Preamble
- .
- 1. You may make and give away verbatim copies of the source form of the
- Standard Version of this Package without restriction, provided that
- you duplicate all of the original copyright notices and associated
- disclaimers.
- .
- 2. You may apply bug fixes, portability fixes and other modifications
- derived from the Public Domain or from the Copyright Holder. A
- Package modified in such a way shall still be considered the Standard
- Version.
- .
- 3. You may otherwise modify your copy of this Package in any way,
- provided that you insert a prominent notice in each changed file stating
- how and when you changed that file, and provided that you do at least
- ONE of the following:
- .
- a) place your modifications in the Public Domain or otherwise make them
- Freely Available, such as by posting said modifications to Usenet or
- an equivalent medium, or placing the modifications on a major archive
- site such as uunet.uu.net, or by allowing the Copyright Holder to include
- your modifications in the Standard Version of the Package.
- .
- b) use the modified Package only within your corporation or organization.
- .
- c) rename any non-standard executables so the names do not conflict
- with standard executables, which must also be provided, and provide
- a separate manual page for each non-standard executable that clearly
- documents how it differs from the Standard Version.
- .
- d) make other distribution arrangements with the Copyright Holder.
- .
- 4. You may distribute the programs of this Package in object code or
- executable form, provided that you do at least ONE of the following:
- .
- a) distribute a Standard Version of the executables and library files,
- together with instructions (in the manual page or equivalent) on where
- to get the Standard Version.
- .
- b) accompany the distribution with the machine-readable source of
- the Package with your modifications.
- .
- c) give non-standard executables non-standard names, and clearly
- document the differences in manual pages (or equivalent), together
- with instructions on where to get the Standard Version.
- .
- d) make other distribution arrangements with the Copyright Holder.
- .
- 5. You may charge a reasonable copying fee for any distribution of this
- Package. You may charge any fee you choose for support of this Package.
- You may not charge a fee for this Package itself. However, you may
- distribute this Package in aggregate with other (possibly commercial)
- programs as part of a larger (possibly commercial) software distribution
- provided that you do not advertise this Package as a product of your
- own. You may embed this Package's interpreter within an executable of
- yours (by linking); this shall be construed as a mere form of
- aggregation, provided that the complete Standard Version of the
- interpreter is so embedded.
- .
- 6. The scripts and library files supplied as input to or produced as
- output from the programs of this Package do not automatically fall under
- the copyright of this Package, but belong to whoever generated them, and
- may be sold commercially, and may be aggregated with this Package. If
- such scripts or library files are aggregated with this Package via the
- so-called "undump" or "unexec" methods of producing a binary executable
- image, then distribution of such an image shall neither be construed as
- a distribution of this Package nor shall it fall under the restrictions
- of Paragraphs 3 and 4, provided that you do not represent such an
- executable image as a Standard Version of this Package.
- .
- 7. C subroutines (or comparably compiled subroutines in other
- languages) supplied by you and linked into this Package in order to
- emulate subroutines and variables of the language defined by this
- Package shall not be considered part of this Package, but are the
- equivalent of input as in Paragraph 6, provided these subroutines do
- not change the language in any way that would cause it to fail the
- regression tests for the language.
- .
- 8. Aggregation of this Package with a commercial distribution is always
- permitted provided that the use of this Package is embedded; that is,
- when no overt attempt is made to make this Package's interfaces visible
- to the end user of the commercial distribution. Such use shall not be
- construed as a distribution of this Package.
- .
- 9. The name of the Copyright Holder may not be used to endorse or promote
- products derived from this software without specific prior written permission.
- .
- 10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
- IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
- WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
-Comment: part of this code is from GenomicFeatures makeTxDbPackage.R
-
-Files: debian/*
-Copyright: 2016 Andreas Tille <tille at debian.org>
-License: LGPL-2.1+
-
-License: LGPL-2.1+
- On Debian GNU/Linux system you can find the complete text of the
- LGPL license in '/usr/share/common-licenses/LGPL-2.1'.
diff --git a/debian/docs b/debian/docs
deleted file mode 100644
index 50f6656..0000000
--- a/debian/docs
+++ /dev/null
@@ -1 +0,0 @@
-debian/README.test
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index 1355319..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,21 +0,0 @@
-#!/usr/bin/make -f
-
-debRreposname := $(shell dpkg-parsechangelog | awk '/^Source:/ {print $$2}' | sed 's/r-\([a-z]\+\)-.*/\1/')
-awkString := "'/^(Package|Bundle):/ {print $$2 }'"
-cranNameOrig := $(shell awk "$(awkString)" DESCRIPTION)
-cranName := $(shell echo "$(cranNameOrig)" | tr A-Z a-z)
-package := r-$(debRreposname)-$(cranName)
-debRdir := usr/lib/R/site-library
-debRlib := $(CURDIR)/debian/$(package)/$(debRdir)
-
-%:
- dh $@ --buildsystem R
-
-override_dh_fixperms:
- dh_fixperms
- find debian -name "*.pl" -exec chmod +x \{\} \;
-
-# remove those unit tests that are requiring EnsDb.Hsapiens.v75
-override_dh_install:
- dh_install
- for tst in `grep -H -l 'library(.*EnsDb.Hsapiens.v75.*)' $(debRlib)/$(cranNameOrig)/unitTests/test_*.R` ; do rm -f $${tst} ; done
diff --git a/debian/source/format b/debian/source/format
deleted file mode 100644
index 163aaf8..0000000
--- a/debian/source/format
+++ /dev/null
@@ -1 +0,0 @@
-3.0 (quilt)
diff --git a/debian/tests/control b/debian/tests/control
deleted file mode 100644
index 25377fc..0000000
--- a/debian/tests/control
+++ /dev/null
@@ -1,3 +0,0 @@
-Tests: run-unit-test
-Depends: @, r-cran-runit
-Restrictions: allow-stderr
diff --git a/debian/tests/run-unit-test b/debian/tests/run-unit-test
deleted file mode 100644
index 5a0e2a1..0000000
--- a/debian/tests/run-unit-test
+++ /dev/null
@@ -1,5 +0,0 @@
-#!/bin/sh -e
-
-LC_ALL=C R --no-save <<EOT
-BiocGenerics:::testPackage("ensembldb")
-EOT
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index 9ae4727..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,3 +0,0 @@
-version=4
-opts=downloadurlmangle=s?^(.*)\.\.?http:$1packages/release/bioc? \
- http://www.bioconductor.org/packages/release/bioc/html/ensembldb.html .*/ensembldb_([\d\.]+)\.tar\.gz
diff --git a/inst/NEWS b/inst/NEWS
new file mode 100644
index 0000000..5391de0
--- /dev/null
+++ b/inst/NEWS
@@ -0,0 +1,488 @@
+CHANGES IN VERSION 1.6.2
+------------------------
+
+BUG FIXES:
+ o Avoid errors when using EnsDbs with protein annotations.
+
+
+CHANGES IN VERSION 1.6.1
+------------------------
+
+BUG FIXES:
+ o Fix plain return statements in shiny server.R.
+
+CHANGES IN VERSION 1.5.14
+-------------------------
+
+NEW FEATURES:
+ o listEnsDbs function to list EnsDb databases in a MySQL server.
+ o EnsDb constructor function allows connecting directly to an EnsDb database
+ in a MySQL server.
+ o useMySQL compares the creation date between database and SQLite version and
+ proposes to update database if different.
+
+
+CHANGES IN VERSION 1.5.13
+-------------------------
+
+NEW FEATURES:
+ o useMySQL method to insert the data into a MySQL database and switch backend
+ from SQLite to MySQL.
+
+
+CHANGES IN VERSION 1.5.12
+-------------------------
+
+USER VISIBLE CHANGES:
+ o Add additional indices on newly created database which improves performance
+ considerably.
+
+BUG FIXES
+ o Fix issue #11: performance problems with RSQLite 1.0.9011. Ordering for
+ cdsBy, transcriptsBy, UTRs by is performed in R and not in SQL.
+ o Fix ordering bug: results were sorted by columns in alphabetical order
+ (e.g. if order.by = "seq_name, gene_seq_start" was provided they were sorted
+ by gene_seq_start and then by seq_name).
+
+
+CHANGES IN VERSION 1.5.11
+-------------------------
+
+BUG FIXES
+ o makeEnsemblSQLiteFromTables and ensDbFromGRanges perform sanity checks
+ on the input tables.
+
+
+CHANGES IN VERSION 1.5.10
+-------------------------
+
+USER VISIBLE CHANGES:
+ o Using html_document2 style for the vignette.
+
+CHANGES IN VERSION 1.5.9
+-------------------------
+
+NEW FEATURES:
+ o New SymbolFilter.
+ o returnFilterColumns method to enable/disable that filter columns are also
+ returned by the methods (which is the default).
+ o select method support for SYMBOL keys, columns and filter.
+ o Select method does ensure result ordering matches the input keys if a
+ single filter or only keys are provided.
+
+
+CHANGES IN VERSION 1.5.8
+-------------------------
+
+BUG FIXES
+ o Fix problem with white space separated species name in ensDbFromGRanges.
+
+
+CHANGES IN VERSION 1.5.7
+-------------------------
+
+OTHER CHANGES
+ o Fixed typos in documentation
+
+
+CHANGES IN VERSION 1.5.6
+-------------------------
+
+BUG FIXES
+ o Fix warning for validation of numeric BasicFilter.
+
+
+CHANGES IN VERSION 1.5.5
+-------------------------
+
+BUG FIXES
+ o exonsBy: did always return tx_id, even if not present in columns argument.
+
+
+CHANGES IN VERSION 1.5.4
+-------------------------
+
+Bug fixes
+ o Column tx_id was always removed from exonsBy result even if in the
+ columns argument.
+ o exon_idx was of type character if database generated from a GTF file.
+
+
+CHANGES IN VERSION 1.5.2
+-------------------------
+
+NEW FEATURES:
+ o Added support for column tx_name in all methods and in the keys and select methods.
+ Values in the returned tx_name columns correspond to the tx_id.
+ o Update documentation.
+
+
+CHANGES IN VERSION 1.5.1
+-------------------------
+
+BUG FIXES
+ o tx_id was removed from metadata columns in txBy.
+ o Fixed a bug that caused exon_idx column to be character if database created
+ from a GTF.
+
+
+CHANGES IN VERSION 1.3.20
+-------------------------
+
+BUG FIXES
+ o methods transcripts, genes etc don't result in an error when columns are specified which
+ are not present in the database and the return.type is GRanges.
+ o Removed the transcriptLengths method implemented in ensembldb in favor of using the one
+ from GenomicFeatures.
+
+
+CHANGES IN VERSION 1.3.19
+-------------------------
+
+BUG FIXES
+ o ensDbFromGRanges (and thus ensDbFromGtf, ensDbFromGff and ensDbFromAH) support now
+ Ensembl GTF file formats from version 74 and before.
+
+
+CHANGES IN VERSION 1.3.18
+-------------------------
+
+NEW FEATURES
+ o New ExonrankFilter to filter based on exon index/rank.
+
+
+CHANGES IN VERSION 1.3.17
+-------------------------
+
+BUG FIXES
+ o Use setdiff/intersect instead of psetdiff/pintersect.
+
+
+CHANGES IN VERSION 1.3.16
+-------------------------
+
+BUG FIXES
+ o Fixed failing test.
+
+
+CHANGES IN VERSION 1.3.15
+-------------------------
+
+NEW FEATURES
+ o GRangesFilter now supports GRanges of length > 1.
+ o seqlevels method for GRangesFilter.
+ o New methods exonsByOverlaps and transcriptsByOverlaps.
+
+
+CHANGES IN VERSION 1.3.14
+-------------------------
+
+NEW FEATURES
+ o seqlevelsStyle getter and setter methods to enable easier integration
+ of EnsDb objects with UCSC based packages. supportedSeqlevelsStyle method to list
+ possible values. Global option "ensembldb.seqnameNotFound" allows to adapt the
+ behaviour of the mapping functions when a seqname can not be mapped properly.
+ o Added a seqlevels method for EnsDb objects.
+
+SIGNIFICANT USER-VISIBLE CHANGES
+ o Add an example to extract transcript sequences directly from an EnsDb object to
+ the vignette.
+ o Add examples to use EnsDb objects with UCSC chromosome names to the vignette.
+
+BUG FIXES
+ o Seqinfo for genes, transcripts and exons contain now only the seqnames returned
+ in the GRanges, not all that are in the database.
+
+
+CHANGES IN VERSION 1.3.13
+-------------------------
+
+NEW FEATURES
+ o EnsDb: new "hidden" slot to store additional properties and a method updateEnsDb
+ to update objects to the new implementation.
+ o New method "transcriptLengths" for EnsDb that creates a data.frame similar to
+ that of the same-named function in the GenomicFeatures package.
+
+BUG FIXES
+ o fiveUTRsByTranscript and threeUTRsByTranscript returned wrong UTRs for some special
+ cases in which the CDS start and end were in the same exon. This has been fixed.
+
+
+CHANGES IN VERSION 1.3.12
+-------------------------
+
+NEW FEATURES
+ o ensDbFromGff and ensDbFromAH functions to build EnsDb objects from GFF3 files
+ or directly from AnnotationHub resources.
+ o getGenomeFaFile does now also retrieve Fasta files for the "closest" Ensembl
+ release if none is available for the matching version.
+
+SIGNIFICANT USER-VISIBLE CHANGES
+ o Removed argument 'verbose' in ensDbFromGRanges and ensDbFromGtf.
+ o Updated parts of the vignette.
+ o Removed method extractTranscriptSeqs again due to some compatibility problems
+ with GenomicRanges.
+
+BUG FIXES
+ o Avoid wrong CDS start/end position definition for Ensembl gtf files in which the
+ start or end codon is outside the CDS.
+
+
+CHANGES IN VERSION 1.3.11
+-------------------------
+
+BUG FIXES
+ o "select" method returns now also the keytype as a column from the database.
+
+
+CHANGES IN VERSION 1.3.10
+-------------------------
+
+NEW FEATURES
+ o Implemented methods columns, keys, keytypes, mapIds and select from AnnotationDbi.
+ o Methods condition<- and value<- for BasicFilter.
+
+
+CHANGES IN VERSION 1.3.9
+------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+ o The shiny app now allows to return the search results.
+
+
+
+CHANGES IN VERSION 1.3.7
+------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+ o Some small changes to the vignette.
+
+BUG FIXES
+ o Fixed a problem in a unit test.
+
+
+CHANGES IN VERSION 1.3.6
+------------------------
+
+BUG FIXES
+ o Fixed a bug in ensDbFromGRanges.
+
+
+CHANGES IN VERSION 1.3.5
+------------------------
+
+NEW FEATURES
+ o Added GRangesFilter enabling filtering using a (single!) GRanges object.
+ o Better usability and compatibility with chromosome names: SeqnameFilter and
+ GRangesFilter support both Ensembl and UCSC chromosome names, if option
+ ucscChromosomeNames is set to TRUE returned chromosome/seqnames are in
+ UCSC format.
+
+SIGNIFICANT USER-VISIBLE CHANGES
+ o Added method "value" for BasicFilter objects.
+
+BUG FIXES
+ o transcripts, genes, exons return now results sorted
+ by seq name and start coordinate.
+
+
+CHANGES IN VERSION 1.3.4
+------------------------
+
+NEW FEATURES
+ o Added extractTranscriptSeqs method for EnsDb objects.
+
+
+SIGNIFICANT USER-VISIBLE CHANGES
+ o Added a section to the vignette describing the use of ensembldb in Gviz.
+ o Fixed the vignette to conform to the "Bioconductor style".
+ o Added argument use.names to exonsBy.
+
+BUG FIXES
+ o Fixed bug with getGeneRegionTrackForGviz with only chromosome specified.
+ o Fixed an internal problem subsetting a seqinfo.
+
+
+CHANGES IN VERSION 1.3.3
+------------------------
+
+NEW FEATURES
+ o Add method getGeneRegionTrackForGviz to enable using EnsDb databases for Gviz.
+
+BUG FIXES
+ o cdsBy, fiveUTRsForTranscript and threeUTRsForTranscript do no longer throw
+ an error if nothing was found but return NULL and produce a warning.
+
+
+
+CHANGES IN VERSION 1.3.2
+------------------------
+
+NEW FEATURES
+ o Implemented methods cdsBy, fiveUTRsForTranscript and threeUTRsForTranscript
+ for EnsDb.
+
+
+
+CHANGES IN VERSION 1.3.1
+------------------------
+
+BUG FIXES
+ o Ensuring that methods exons, genes and transcripts return columns in the
+ same order as provided with argument 'columns' for return.type 'data.frame'
+ or 'DataFrame'.
+
+
+
+CHANGES IN VERSION 1.1.9
+------------------------
+
+BUG FIXES
+
+ o Fixed a figure placement problem that can result in an error on certain
+ systems using a recent TexLive distribution.
+
+
+
+CHANGES IN VERSION 1.1.6
+------------------------
+
+BUG FIXES
+
+ o Fix a bug in lengthOf that caused an error if no filter was supplied.
+
+
+CHANGES IN VERSION 1.1.5
+------------------------
+
+NEW FEATURES
+
+ o Implemented a shiny web app to search for genes/transcripts/exons using
+ annotation of an EnsDb annotation package (function runEnsDbApp).
+
+
+
+CHANGES IN VERSION 1.1.4
+------------------------
+
+NEW FEATURES
+
+ o Added promoters method.
+
+
+
+CHANGES IN VERSION 1.1.3
+------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+ o Added method ensemblVersion that returns the Ensembl version the package is based on.
+ o Added method getGenomeFaFile that queries AnnotationHub to retrieve the Genome
+ FaFile matching the Ensembl version of the EnsDb object.
+
+
+CHANGES IN VERSION 1.1.2
+------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+ o Added examples to the vignette for building an EnsDb using AnnotationHub along with
+ the matching genomic sequence.
+ o Added an example for fetching the sequences of genes, transcripts and exons to the vignette.
+
+
+BUG FIXES
+
+ o Fixed a bug in ensDbFromGRanges and ensDbFromGtf in which the genome build version
+ was not set even if provided.
+
+
+
+CHANGES IN VERSION 1.1.1
+------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+ o The filter argument in all functions supports now also submission of a filter
+ object, not only of a list of filter objects.
+
+
+
+CHANGES IN VERSION 0.99.18
+--------------------------
+
+BUG FIXES
+
+ o Fixed a problem in processing GTF files without header information.
+
+ o Fixed a bug failing to throw an error if not all required feature types are
+ available in the GTF.
+
+
+
+CHANGES IN VERSION 0.99.17
+--------------------------
+
+NEW FEATURES
+
+ o Added new function ensDbFromGRanges that builds an EnsDb database from information
+ provided in a GRanges object (e.g. retrieved from the AnnotationHub).
+
+
+
+CHANGES IN VERSION 0.99.16
+--------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+ o Added argument outfile to ensDbFromGtf that allows to manually specify the file
+ name of the database file.
+
+ o ensDbFromGtf tries now to automatically fetch the sequence lengths from Ensembl.
+
+
+BUG FIXES
+
+ o Fixed the function that extracts the genome build version from the gtf file name.
+
+
+
+CHANGES IN VERSION 0.99.15
+--------------------------
+
+NEW FEATURES
+
+ o metadata method to extract the information from the metadata database table.
+
+ o ensDbFromGtf function to generate an EnsDb SQLite file from an (Ensembl)
+ GTF file.
+
+
+
+CHANGES IN VERSION 0.99.14
+--------------------------
+
+BUG FIXES
+
+ o Fixed a problem when reading tables fetched from Ensembl that contained ' or #.
+
+
+
+CHANGES IN VERSION 0.99.13
+--------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+ o Added argument "port" to the fetchTablesFromEnsembl to allow specifying the MySQL port
+ of the database.
+
+
+
+CHANGES IN VERSION 0.99.12
+--------------------------
+
+BUG FIXES
+
+ o argument "x" for method organism changed to "object".
+
+
diff --git a/inst/YGRanges.RData b/inst/YGRanges.RData
new file mode 100644
index 0000000..b34e142
Binary files /dev/null and b/inst/YGRanges.RData differ
diff --git a/inst/chrY/ens_chromosome.txt b/inst/chrY/ens_chromosome.txt
new file mode 100644
index 0000000..28aaf36
--- /dev/null
+++ b/inst/chrY/ens_chromosome.txt
@@ -0,0 +1,2 @@
+seq_name seq_length is_circular
+Y 59373566 0
diff --git a/inst/chrY/ens_exon.txt b/inst/chrY/ens_exon.txt
new file mode 100644
index 0000000..d2cfcaf
--- /dev/null
+++ b/inst/chrY/ens_exon.txt
@@ -0,0 +1,2700 @@
+exon_id exon_seq_start exon_seq_end
+ENSE00001902471 21878482 21878581
+ENSE00003665408 21878165 21878359
+ENSE00003572891 21877709 21877890
+ENSE00003452486 21877470 21877625
+ENSE00003535737 21872262 21872358
+ENSE00003306433 21871337 21872146
+ENSE00003587057 21870761 21870899
+ENSE00003522786 21870130 21870309
+ENSE00003674105 21869820 21869957
+ENSE00003605358 21869033 21869632
+ENSE00001928834 21868327 21868749
+ENSE00001954294 21865751 21868231
+ENSE00001733393 21906557 21906809
+ENSE00000891759 21906271 21906439
+ENSE00000652508 21905048 21905125
+ENSE00000652506 21903621 21903743
+ENSE00001788914 21903204 21903374
+ENSE00001805865 21901414 21901548
+ENSE00000652503 21897507 21897636
+ENSE00000652502 21897238 21897383
+ENSE00000652501 21894470 21894628
+ENSE00001799734 21893932 21894051
+ENSE00000652498 21893658 21893816
+ENSE00001786866 21883016 21883197
+ENSE00003497148 21882758 21882920
+ENSE00003688313 21878482 21878601
+ENSE00003664560 21878165 21878359
+ENSE00003602862 21877709 21877890
+ENSE00003491684 21877501 21877625
+ENSE00003488142 21877238 21877385
+ENSE00003534652 21872262 21872358
+ENSE00003607600 21871337 21871695
+ENSE00003544083 21870761 21870899
+ENSE00003591494 21870130 21870309
+ENSE00003484526 21869820 21869957
+ENSE00003539670 21869033 21869632
+ENSE00001670444 21868680 21868749
+ENSE00001594027 21868327 21868526
+ENSE00001945799 21867301 21868231
+ENSE00001277406 21906557 21906647
+ENSE00001919342 21867306 21868231
+ENSE00001946630 21883016 21884353
+ENSE00003582949 21882758 21882920
+ENSE00003593440 21878482 21878601
+ENSE00001900202 21877709 21877997
+ENSE00003677957 21877501 21877625
+ENSE00003607967 21877238 21877385
+ENSE00003484073 21871337 21871695
+ENSE00001941473 21867311 21868231
+ENSE00001741626 21867949 21868231
+ENSE00001692290 21870761 21870835
+ENSE00001803510 21870208 21870309
+ENSE00001658853 21869068 21869632
+ENSE00001849878 21872262 21872492
+ENSE00001940004 21871586 21872146
+ENSE00001928223 21878165 21878234
+ENSE00001930174 21877022 21877625
+ENSE00001685428 21906557 21906597
+ENSE00001652491 21883158 21883197
+ENSE00002279676 21906557 21906825
+ENSE00002207746 21885227 21885319
+ENSE00002285913 21867303 21868231
+ENSE00001402535 15016019 15016325
+ENSE00001403038 15016846 15016892
+ENSE00003463722 15019448 15019505
+ENSE00001605860 15021271 15021318
+ENSE00000862006 15023751 15023880
+ENSE00000773394 15024639 15024794
+ENSE00003574309 15024875 15024974
+ENSE00003652139 15025630 15025765
+ENSE00003536211 15026476 15026561
+ENSE00001614446 15026796 15026894
+ENSE00001746396 15026979 15027139
+ENSE00000773388 15027542 15027686
+ENSE00003534942 15027795 15027939
+ENSE00000773386 15028173 15028354
+ENSE00001703401 15028429 15028546
+ENSE00001729063 15028819 15028972
+ENSE00001722940 15029315 15029454
+ENSE00001928277 15029955 15030451
+ENSE00001685858 15016029 15016325
+ENSE00001350052 15016769 15016892
+ENSE00003656145 15025630 15025765
+ENSE00001832940 15016742 15016892
+ENSE00001352037 15029955 15032390
+ENSE00001826669 15016760 15016892
+ENSE00003643575 15019448 15019505
+ENSE00001906809 15021271 15021607
+ENSE00001797666 15017649 15017726
+ENSE00003670172 15026476 15026561
+ENSE00001937992 15024552 15024974
+ENSE00001848230 15025630 15025713
+ENSE00003512200 15024875 15024974
+ENSE00003537802 15025630 15025765
+ENSE00001948256 15026476 15026811
+ENSE00001935991 15024920 15024974
+ENSE00003534699 15026476 15026561
+ENSE00001893350 15026796 15027139
+ENSE00001811040 15027408 15027686
+ENSE00003693954 15027795 15027939
+ENSE00001922429 15028173 15028265
+ENSE00001803243 2803322 2803487
+ENSE00002223884 2821950 2822038
+ENSE00003645989 2829115 2829687
+ENSE00003548678 2843136 2843285
+ENSE00003611496 2843552 2843695
+ENSE00001649504 2844711 2844863
+ENSE00001777381 2845981 2846121
+ENSE00001494540 2846851 2850547
+ENSE00001857352 2803541 2803810
+ENSE00003626126 2843136 2843285
+ENSE00003631374 2843552 2843695
+ENSE00001955114 2845981 2846094
+ENSE00001334555 2803546 2803810
+ENSE00001597745 2829115 2829327
+ENSE00001900413 2845860 2846121
+ENSE00001880607 2846851 2847391
+ENSE00001746216 2803112 2803487
+ENSE00001368923 2846851 2850546
+ENSE00001648585 2803518 2803810
+ENSE00003604519 2829115 2829687
+ENSE00001408731 6778727 6779023
+ENSE00001494437 6780129 6780213
+ENSE00001494434 6846254 6846284
+ENSE00001494433 6863845 6863939
+ENSE00001494431 6889490 6889578
+ENSE00001793373 6893076 6893183
+ENSE00003432678 6911021 6911166
+ENSE00003437631 6931938 6932190
+ENSE00003379331 6938237 6938369
+ENSE00001213593 6938761 6938902
+ENSE00001607530 6939601 6939664
+ENSE00001716749 6939774 6939871
+ENSE00001744765 6942601 6942661
+ENSE00001731081 6948773 6948894
+ENSE00001654905 6953939 6954013
+ENSE00001729171 6954331 6954458
+ENSE00001763966 6955308 6955473
+ENSE00001593786 6958130 6958231
+ENSE00001370395 6959513 6959724
+ENSE00001631277 4868267 4868646
+ENSE00001619635 4899947 4900052
+ENSE00001755808 4900711 4900769
+ENSE00000981568 4924930 4925500
+ENSE00001640924 4966256 4968748
+ENSE00001602875 4972385 4973485
+ENSE00001350198 4924131 4925500
+ENSE00001944263 4972385 4972741
+ENSE00001803775 5369098 5369296
+ENSE00001677522 5449816 5449839
+ENSE00001322750 5605313 5610265
+ENSE00001436852 4924865 4925500
+ENSE00001731866 5483308 5483316
+ENSE00001711324 5491131 5491145
+ENSE00001779807 5605313 5605983
+ENSE00001348274 6742013 6742068
+ENSE00001671586 6740596 6740661
+ENSE00001645681 6738047 6738094
+ENSE00000652250 6736773 6736817
+ENSE00001667251 6736078 6736503
+ENSE00001494454 6733959 6734119
+ENSE00001494452 6740596 6740649
+ENSE00001651085 6736909 6736950
+ENSE00001727000 6734114 6734119
+ENSE00002300179 7142013 7142519
+ENSE00003518103 7171978 7172146
+ENSE00003489469 7193947 7194210
+ENSE00003507474 7209156 7209275
+ENSE00003502780 7224176 7224271
+ENSE00002194667 7235397 7235474
+ENSE00001494358 7239784 7239930
+ENSE00001494356 7243753 7249589
+ENSE00002183990 7142354 7142519
+ENSE00002194098 7239784 7239909
+ENSE00001877035 7194108 7194210
+ENSE00001918349 7196068 7196444
+ENSE00001948825 7201071 7201149
+ENSE00001928357 7224176 7224264
+ENSE00002210406 7142336 7142519
+ENSE00003574133 7171978 7172146
+ENSE00003534383 7193947 7194210
+ENSE00003600341 7209156 7209275
+ENSE00003520510 7224176 7224271
+ENSE00001661914 7235397 7235456
+ENSE00001020131 14813160 14813984
+ENSE00001020125 14820567 14820626
+ENSE00003651478 14821321 14821476
+ENSE00003476271 14832522 14832670
+ENSE00003602419 14834041 14834120
+ENSE00003467347 14837046 14837158
+ENSE00003693194 14838508 14838726
+ENSE00003654508 14847546 14847661
+ENSE00003550081 14847932 14848183
+ENSE00003624888 14848345 14848483
+ENSE00003487242 14850091 14850243
+ENSE00003601630 14851459 14851563
+ENSE00003666147 14869122 14869328
+ENSE00003566318 14870436 14870572
+ENSE00003557039 14872414 14872547
+ENSE00003515132 14883002 14883089
+ENSE00003621719 14885517 14885859
+ENSE00003555621 14887405 14887500
+ENSE00003655936 14888583 14888794
+ENSE00003476364 14889953 14890193
+ENSE00003510955 14890540 14890689
+ENSE00003621863 14891460 14891580
+ENSE00003560948 14898137 14898267
+ENSE00003561084 14898455 14898733
+ENSE00003508322 14902340 14902465
+ENSE00003664000 14903432 14903557
+ENSE00003547650 14904965 14905134
+ENSE00003694071 14922114 14922222
+ENSE00003679141 14922607 14922753
+ENSE00003652645 14923570 14923716
+ENSE00003564733 14924765 14924987
+ENSE00003681535 14928059 14928279
+ENSE00003669946 14930355 14930545
+ENSE00003585886 14945614 14945787
+ENSE00003635310 14949837 14949978
+ENSE00003552773 14951790 14952540
+ENSE00003548868 14952936 14953059
+ENSE00001601104 14954166 14954391
+ENSE00003586868 14954989 14955118
+ENSE00003686311 14958258 14958443
+ENSE00003476240 14958858 14959078
+ENSE00003513236 14959164 14959252
+ENSE00003654441 14968265 14968421
+ENSE00003686907 14968558 14968770
+ENSE00003595540 14969491 14969586
+ENSE00003502466 14971204 14972764
+ENSE00001869747 14813969 14813984
+ENSE00003574267 14821321 14821476
+ENSE00001853448 14832522 14833136
+ENSE00001952418 14821369 14821476
+ENSE00003642330 14832522 14832670
+ENSE00003608903 14834041 14834120
+ENSE00003569874 14837046 14837158
+ENSE00003525307 14838508 14838726
+ENSE00003468269 14847546 14847661
+ENSE00003668411 14847932 14848183
+ENSE00003544199 14848345 14848483
+ENSE00003543968 14850091 14850243
+ENSE00003678761 14851459 14851563
+ENSE00003522394 14869122 14869328
+ENSE00003556397 14870436 14870572
+ENSE00003484639 14872414 14872547
+ENSE00003494702 14883002 14883089
+ENSE00003664430 14885517 14885859
+ENSE00003545466 14887405 14887500
+ENSE00003634092 14888583 14888794
+ENSE00003562091 14889953 14890193
+ENSE00003499612 14890540 14890689
+ENSE00003612302 14891460 14891580
+ENSE00003510705 14898137 14898267
+ENSE00003494757 14898455 14898733
+ENSE00003626016 14902340 14902465
+ENSE00003624705 14903432 14903557
+ENSE00003646476 14904965 14905134
+ENSE00003683748 14922114 14922222
+ENSE00003522502 14922607 14922753
+ENSE00003692063 14923570 14923716
+ENSE00003612215 14924765 14924987
+ENSE00003491362 14928059 14928279
+ENSE00003599293 14930355 14930545
+ENSE00003548985 14945614 14945787
+ENSE00003477744 14949837 14949978
+ENSE00003471770 14951790 14952540
+ENSE00003654551 14952936 14953059
+ENSE00001841790 14954151 14954391
+ENSE00003628831 14954989 14955118
+ENSE00003640718 14958258 14958443
+ENSE00003536017 14958858 14959078
+ENSE00003664007 14959164 14959252
+ENSE00003650241 14968265 14968421
+ENSE00003632911 14968558 14968770
+ENSE00003662966 14969491 14969586
+ENSE00003580686 14971204 14972764
+ENSE00001595780 14958970 14959078
+ENSE00001746916 14971204 14971537
+ENSE00001944130 14968738 14969586
+ENSE00001756639 6258472 6258716
+ENSE00001663636 6262141 6262300
+ENSE00001595449 6269164 6269272
+ENSE00001634261 6271629 6271766
+ENSE00001767444 6279348 6279605
+ENSE00002490412 2709527 2709668
+ENSE00001709586 2710206 2710283
+ENSE00001738202 2712118 2712298
+ENSE00001602849 2713687 2713784
+ENSE00001601989 2722641 2722812
+ENSE00003667463 2733129 2733286
+ENSE00003636667 2734834 2735309
+ENSE00001732508 2709961 2710014
+ENSE00001635459 2734834 2734903
+ENSE00001859961 2722137 2722812
+ENSE00003469154 2733129 2733286
+ENSE00003654930 2734834 2734997
+ENSE00002062109 2722771 2722812
+ENSE00002034437 2796905 2797026
+ENSE00002072199 2799752 2800041
+ENSE00001782128 9611654 9611898
+ENSE00001664883 9608070 9608229
+ENSE00001800396 9601098 9601206
+ENSE00001771159 9598604 9598741
+ENSE00001609985 9590765 9591022
+ENSE00001914170 16168097 16168271
+ENSE00001617113 16168464 16168838
+ENSE00001940528 16098219 16098393
+ENSE00001622417 16097652 16098026
+ENSE00002314376 19990147 19992100
+ENSE00002282227 19990178 19992098
+ENSE00002227312 19989715 19989742
+ENSE00002265258 19989290 19989709
+ENSE00002810788 21729199 21729368
+ENSE00002783311 21749096 21749393
+ENSE00002873804 21750256 21750536
+ENSE00002825272 21751413 21751498
+ENSE00001893165 21751969 21752133
+ENSE00001826433 21729236 21729456
+ENSE00001875276 21750256 21751498
+ENSE00002811240 21751969 21752304
+ENSE00002957208 21729268 21729368
+ENSE00002829137 21751407 21751498
+ENSE00002862473 21751969 21752309
+ENSE00002920932 21753666 21753845
+ENSE00001657009 21755285 21755479
+ENSE00001617655 21757901 21758020
+ENSE00001663997 21759477 21759551
+ENSE00003585093 21761625 21761717
+ENSE00003676904 21764102 21764197
+ENSE00001621570 21765683 21766006
+ENSE00002224044 21729673 21729837
+ENSE00002265061 21731271 21731345
+ENSE00002256441 21750256 21751735
+ENSE00001793723 21729715 21729837
+ENSE00001687078 21749096 21751733
+ENSE00002916216 21750429 21750536
+ENSE00002875419 21755285 21755603
+ENSE00002833019 21750456 21750536
+ENSE00002905816 21751969 21752305
+ENSE00002863993 21750495 21750536
+ENSE00002757214 21753806 21753845
+ENSE00002783490 21755285 21755488
+ENSE00002932169 21750497 21750536
+ENSE00001950956 21752639 21752658
+ENSE00001930064 21755285 21755537
+ENSE00001279198 21754336 21754700
+ENSE00001279210 21755285 21759551
+ENSE00001035195 21765683 21768160
+ENSE00002958230 21754974 21756047
+ENSE00002880368 21759243 21759551
+ENSE00002922995 21760438 21760525
+ENSE00002745809 21765683 21765937
+ENSE00001035136 21729235 21729368
+ENSE00000891756 21749096 21749365
+ENSE00001281659 21749368 21749393
+ENSE00000981692 21751407 21751443
+ENSE00001281628 21751446 21751498
+ENSE00002836144 21751969 21752308
+ENSE00001542537 21758442 21759551
+ENSE00003664473 21761625 21761717
+ENSE00003581743 21764102 21764197
+ENSE00001493485 21765683 21767698
+ENSE00001668181 20752266 20752407
+ENSE00003505404 20750492 20750592
+ENSE00001757235 20750017 20750130
+ENSE00003519096 20749298 20749434
+ENSE00003627127 20743092 20744234
+ENSE00002217996 20750492 20750598
+ENSE00003529279 20749298 20749434
+ENSE00003568887 20743092 20744234
+ENSE00001785382 24587492 24587605
+ENSE00003499302 24587136 24587273
+ENSE00001693495 24585740 24585978
+ENSE00002290871 24587492 24587549
+ENSE00003596079 24587136 24587273
+ENSE00002214070 24586927 24587025
+ENSE00002310424 24585887 24585978
+ENSE00001917099 24291113 24291226
+ENSE00003687937 24291445 24291582
+ENSE00001937237 24292740 24292978
+ENSE00002299728 24291169 24291226
+ENSE00003671276 24291445 24291582
+ENSE00002277139 24291693 24291791
+ENSE00002209602 24292740 24292831
+ENSE00001799711 6317509 6317545
+ENSE00001638328 6318487 6318648
+ENSE00001698649 6320956 6321030
+ENSE00001648830 6321719 6322195
+ENSE00001726416 6324113 6324226
+ENSE00001799620 6325702 6325947
+ENSE00001594918 6325200 6325455
+ENSE00001682908 6321946 6322195
+ENSE00001619344 6324014 6324226
+ENSE00001729695 9548185 9548434
+ENSE00001805126 9546154 9546366
+ENSE00001633857 9544433 9544678
+ENSE00001692562 9552835 9552871
+ENSE00001658267 9551732 9551893
+ENSE00001691944 9549350 9549424
+ENSE00001628673 9548185 9548661
+ENSE00001700737 9546154 9546267
+ENSE00001717109 9544925 9545180
+ENSE00001016870 15815447 15816315
+ENSE00001274354 15817105 15817904
+ENSE00001923202 22918050 22918052
+ENSE00001677561 22918664 22918741
+ENSE00001680575 22921754 22921934
+ENSE00001638361 22923170 22923267
+ENSE00001725100 22930691 22930862
+ENSE00001734476 22941395 22941552
+ENSE00001159425 22942817 22942918
+ENSE00001867079 16634518 16634821
+ENSE00001832496 16733889 16734377
+ENSE00001420613 16634632 16634821
+ENSE00003548219 16831339 16831398
+ENSE00003542397 16834997 16835149
+ENSE00003537524 16936068 16936253
+ENSE00003686712 16941610 16942399
+ENSE00001424034 16952293 16957530
+ENSE00001493621 16635626 16635778
+ENSE00003620397 16733901 16734471
+ENSE00003691073 16834997 16835149
+ENSE00003601036 16952293 16955606
+ENSE00003554354 16831339 16831398
+ENSE00003477208 16860499 16860609
+ENSE00003610714 16733901 16734471
+ENSE00003537517 16834997 16835149
+ENSE00003613876 16860499 16860609
+ENSE00001738756 16863561 16863682
+ENSE00003625704 16936068 16936253
+ENSE00003593743 16941610 16942399
+ENSE00003665738 16952293 16955606
+ENSE00001955812 16636409 16636588
+ENSE00001955997 16733889 16734248
+ENSE00001368742 16636454 16636816
+ENSE00001390456 16733889 16734471
+ENSE00001868567 16952293 16955527
+ENSE00001493617 16734061 16734471
+ENSE00001612473 16845332 16845429
+ENSE00001707785 6114264 6114795
+ENSE00003546295 6115397 6115474
+ENSE00001623435 6115603 6115714
+ENSE00001642228 6115816 6115961
+ENSE00003563383 6116068 6116149
+ENSE00001830098 6116844 6117054
+ENSE00003503096 6114310 6114795
+ENSE00001618514 6116057 6116149
+ENSE00001687266 6116844 6117053
+ENSE00003465137 6114310 6114795
+ENSE00003510285 6115397 6115474
+ENSE00001885391 6115603 6116149
+ENSE00001950610 6116277 6117060
+ENSE00001851117 6115664 6115961
+ENSE00003591129 6116068 6116149
+ENSE00001858368 6116844 6117045
+ENSE00001902634 25840606 25840726
+ENSE00003696150 25837904 25838019
+ENSE00001591394 25827587 25828412
+ENSE00003701041 25840606 25840674
+ENSE00003695877 25828154 25828412
+ENSE00001786049 25840606 25840710
+ENSE00003695050 25837904 25838019
+ENSE00001644491 25829559 25829589
+ENSE00001686003 24636544 24636631
+ENSE00001657597 24636733 24636819
+ENSE00003636240 24647660 24647780
+ENSE00003502664 24650356 24650471
+ENSE00003502641 24659959 24660784
+ENSE00001876694 24658785 24658899
+ENSE00003543250 24659959 24660784
+ENSE00001624902 24658785 24658815
+ENSE00001703126 24328982 24329095
+ENSE00003469687 24327008 24327120
+ENSE00003536546 24326438 24326541
+ENSE00001739138 24323241 24323418
+ENSE00003679115 24321173 24321325
+ENSE00003668781 24319048 24319162
+ENSE00003531954 24317982 24318092
+ENSE00003661928 24317431 24317541
+ENSE00003557303 24316886 24316996
+ENSE00003568885 24315852 24315962
+ENSE00003638296 24315677 24315765
+ENSE00003580283 24314689 24315265
+ENSE00001816306 24328982 24329104
+ENSE00003562722 24327008 24327120
+ENSE00003591193 24326438 24326541
+ENSE00003636020 24321173 24321325
+ENSE00003530779 24319048 24319162
+ENSE00003638449 24317982 24318092
+ENSE00003576126 24317431 24317541
+ENSE00003480178 24316886 24316996
+ENSE00003687067 24315852 24315962
+ENSE00003468889 24315677 24315765
+ENSE00003578044 24314689 24315265
+ENSE00001765593 24328982 24329129
+ENSE00001625881 24242067 24242154
+ENSE00001682815 24241879 24241965
+ENSE00003673462 24230918 24231038
+ENSE00003492436 24228227 24228342
+ENSE00003573636 24217903 24218728
+ENSE00001899711 24219791 24219905
+ENSE00003488975 24217903 24218728
+ENSE00001628640 24219875 24219905
+ENSE00003525363 23655405 23655582
+ENSE00003550074 23657496 23657648
+ENSE00003651701 23659659 23659773
+ENSE00003567644 23660729 23660839
+ENSE00003490701 23661281 23661391
+ENSE00003662908 23661826 23661936
+ENSE00001594909 23662370 23662480
+ENSE00003642058 23662858 23662968
+ENSE00003574473 23663056 23663144
+ENSE00003635823 23663556 23663854
+ENSE00003480955 23655405 23655582
+ENSE00003658768 23657496 23657648
+ENSE00003532851 23659659 23659773
+ENSE00003665479 23660729 23660839
+ENSE00003491376 23661281 23661391
+ENSE00003537055 23661826 23661936
+ENSE00001657501 23662370 23662391
+ENSE00001596885 23661276 23661391
+ENSE00003604831 23662858 23662968
+ENSE00003477338 23663056 23663144
+ENSE00003512248 23663556 23663854
+ENSE00001611088 23630044 23630517
+ENSE00001715185 23631127 23631204
+ENSE00001657254 23631333 23631444
+ENSE00001645729 23631544 23631689
+ENSE00001802624 23631798 23631879
+ENSE00001543685 23632547 23632569
+ENSE00003670797 20934992 20935601
+ENSE00003588065 20893326 20893753
+ENSE00003662804 20934992 20935601
+ENSE00003613429 20932069 20932178
+ENSE00001899520 20931419 20931505
+ENSE00001698194 20930217 20930371
+ENSE00001803454 20899531 20899616
+ENSE00003604288 20893326 20893753
+ENSE00001597600 20934992 20935572
+ENSE00003535599 20932069 20932178
+ENSE00001493553 20930807 20931505
+ENSE00001674814 20934992 20935621
+ENSE00001858694 20933700 20934513
+ENSE00002036080 20990471 20990548
+ENSE00002085962 20981241 20981672
+ENSE00002062695 20952456 20952653
+ENSE00002046872 20951697 20951805
+ENSE00002035366 20934594 20935259
+ENSE00001900148 28121696 28121764
+ENSE00003701812 28124351 28124466
+ENSE00001781602 28133958 28134216
+ENSE00001665399 28121660 28121764
+ENSE00003700592 28124351 28124466
+ENSE00001707150 28132781 28132811
+ENSE00001854309 27768309 27770181
+ENSE00001666303 27770602 27771049
+ENSE00001436475 27768264 27770483
+ENSE00001786020 27624416 27624579
+ENSE00001735399 27626218 27629777
+ENSE00003506820 27629055 27629777
+ENSE00003604808 27632781 27632852
+ENSE00001668733 27600708 27600827
+ENSE00001679890 27601441 27601500
+ENSE00001610203 27601687 27601765
+ENSE00003589912 27602885 27603234
+ENSE00001637486 27603533 27603590
+ENSE00001798553 27604621 27604690
+ENSE00001752493 27604710 27604870
+ENSE00003654675 27604947 27605021
+ENSE00003649198 27605442 27605530
+ENSE00003496796 27605786 27605974
+ENSE00003547203 27606064 27606156
+ENSE00001716410 27606239 27606329
+ENSE00001635424 27606412 27606567
+ENSE00001555256 27606658 27606719
+ENSE00002286560 27601458 27601500
+ENSE00003553650 27602885 27603234
+ENSE00002324147 27603532 27603590
+ENSE00003604497 27604947 27605021
+ENSE00003480675 27605442 27605530
+ENSE00003621244 27605786 27605974
+ENSE00003668556 27606064 27606156
+ENSE00001795579 27606239 27606322
+ENSE00001629437 26361609 26361728
+ENSE00001719001 26360936 26360995
+ENSE00001728235 26360671 26360751
+ENSE00001708482 26359940 26360029
+ENSE00001598676 26359202 26359552
+ENSE00003608732 26358846 26358903
+ENSE00003588397 26357415 26357489
+ENSE00003459260 26356906 26356994
+ENSE00003477737 26356462 26356650
+ENSE00003459192 26356280 26356372
+ENSE00001742949 26356102 26356197
+ENSE00001800235 26355872 26356018
+ENSE00001597641 26355714 26355770
+ENSE00003490496 26360936 26360978
+ENSE00003609331 26360671 26360749
+ENSE00003569147 26359202 26359551
+ENSE00002302306 26358846 26358904
+ENSE00003652688 26357415 26357489
+ENSE00003586010 26356906 26356994
+ENSE00003532508 26356462 26356650
+ENSE00003491049 26356280 26356372
+ENSE00003688093 26356114 26356197
+ENSE00001654202 26337854 26338017
+ENSE00001648900 26332656 26336215
+ENSE00001931820 26192244 26194116
+ENSE00001621868 26191376 26191823
+ENSE00001491983 26191940 26194166
+ENSE00001652231 20708557 20709186
+ENSE00001822279 20709665 20710478
+ENSE00001838947 20708577 20709186
+ENSE00003536585 20750422 20750849
+ENSE00001801443 20712000 20712109
+ENSE00001739030 20712669 20712755
+ENSE00001756006 20713803 20713957
+ENSE00001661293 20744559 20744644
+ENSE00003593310 20750422 20750849
+ENSE00001603573 20708606 20709186
+ENSE00001493572 20712669 20713351
+ENSE00003551056 8774265 8774373
+ENSE00001663043 8775135 8775238
+ENSE00003516285 8777924 8778101
+ENSE00003527480 8779987 8780139
+ENSE00001644937 8782134 8782245
+ENSE00001644677 8783125 8783235
+ENSE00001705054 8783317 8783405
+ENSE00001636903 8783812 8784114
+ENSE00002323304 8777989 8778101
+ENSE00002263796 8782134 8782196
+ENSE00003587796 8774265 8774373
+ENSE00002255439 8774933 8774966
+ENSE00002234877 8775160 8775238
+ENSE00003466737 8777924 8778101
+ENSE00003689056 8779987 8780139
+ENSE00002236871 8782134 8782184
+ENSE00001091887 3447082 3447168
+ENSE00001889836 3447265 3448082
+ENSE00002559577 3447156 3448082
+ENSE00001285919 21038954 21040114
+ENSE00001702356 21034387 21034720
+ENSE00001684888 21238931 21239004
+ENSE00001631223 21038954 21039091
+ENSE00001624668 21034389 21034720
+ENSE00001747572 21237827 21237882
+ENSE00001776523 21230610 21230684
+ENSE00001661526 21207128 21207177
+ENSE00001792674 21206428 21206581
+ENSE00001594458 21094203 21094727
+ENSE00001313137 21238931 21239302
+ENSE00001556105 21094585 21094727
+ENSE00001748128 21238931 21239281
+ENSE00001720075 21210659 21210798
+ENSE00001800197 21207128 21207174
+ENSE00001721855 21206548 21206581
+ENSE00001746218 21205049 21205232
+ENSE00001626612 21203384 21203472
+ENSE00001544018 8685387 8685423
+ENSE00001544017 8662902 8662985
+ENSE00000900348 8657041 8657343
+ENSE00001272668 8651929 8652244
+ENSE00001544015 8651351 8651447
+ENSE00001708504 20137669 20139589
+ENSE00001744590 20140025 20140052
+ENSE00001639130 20140058 20140477
+ENSE00002310447 20137667 20139626
+ENSE00001493287 23548118 23548246
+ENSE00001296914 23545799 23545855
+ENSE00001325246 23545318 23545682
+ENSE00001493285 23544840 23545166
+ENSE00001853079 9528709 9528844
+ENSE00001693371 9529322 9529526
+ENSE00001729545 9531119 9531308
+ENSE00001799150 9531390 9531566
+ENSE00002265228 13524543 13524717
+ENSE00001730908 13506553 13506659
+ENSE00001631073 13505561 13505710
+ENSE00001704814 13501668 13501776
+ENSE00001619518 13500644 13500780
+ENSE00002258116 13496241 13496470
+ENSE00003436862 25119966 25120121
+ENSE00003254543 25130295 25130363
+ENSE00001669186 25130716 25130838
+ENSE00001626256 25133904 25134040
+ENSE00003262349 25138450 25138523
+ENSE00001730185 25130410 25130440
+ENSE00003542244 25138450 25138568
+ENSE00001792376 25140628 25140745
+ENSE00003677192 25143599 25143704
+ENSE00003473409 25144397 25144521
+ENSE00001543555 25151072 25151606
+ENSE00003442800 25130434 25130440
+ENSE00003516313 25138450 25138568
+ENSE00003688894 25143599 25143704
+ENSE00003560022 25144397 25144521
+ENSE00003360806 25151072 25151553
+ENSE00001698397 25151072 25151196
+ENSE00001791291 25151291 25151612
+ENSE00003302451 26753707 26753862
+ENSE00003336765 26764036 26764104
+ENSE00001619605 26764457 26764579
+ENSE00001689728 26767645 26767781
+ENSE00003272237 26772191 26772264
+ENSE00001676764 26764151 26764181
+ENSE00003549506 26772191 26772309
+ENSE00001757255 26774369 26774486
+ENSE00003667169 26777341 26777446
+ENSE00003463186 26778139 26778263
+ENSE00001625788 26784814 26784938
+ENSE00001723611 26785033 26785354
+ENSE00003454106 26764175 26764181
+ENSE00003681954 26772191 26772309
+ENSE00003623347 26777341 26777446
+ENSE00003640779 26778139 26778263
+ENSE00003335658 26784814 26784943
+ENSE00003346171 26785033 26785295
+ENSE00001864071 15591394 15592553
+ENSE00003531058 15591134 15591197
+ENSE00003601240 15582001 15582109
+ENSE00003660729 15560897 15560946
+ENSE00003557845 15526615 15526673
+ENSE00003529356 15522873 15522993
+ENSE00003648165 15508798 15508852
+ENSE00001594460 15505739 15505773
+ENSE00003527061 15481136 15481229
+ENSE00003473448 15478147 15478273
+ENSE00001775185 15472310 15472408
+ENSE00001755119 15471647 15471866
+ENSE00001291636 15469757 15469849
+ENSE00001311020 15467803 15467898
+ENSE00001696208 15466883 15467278
+ENSE00001305759 15447443 15448215
+ENSE00001624388 15438101 15438230
+ENSE00001639693 15436481 15436586
+ENSE00001651897 15435435 15435640
+ENSE00001295816 15418066 15418130
+ENSE00001591330 15417918 15417992
+ENSE00001751483 15417279 15417427
+ENSE00001740894 15414757 15414871
+ENSE00001640284 15410837 15411024
+ENSE00001785465 15409587 15409728
+ENSE00001298131 15372158 15372284
+ENSE00001713248 15362897 15363067
+ENSE00001875480 15360259 15361762
+ENSE00001385925 15591394 15592550
+ENSE00001436855 15409321 15409728
+ENSE00001640421 15434914 15435248
+ENSE00001822678 15591394 15591420
+ENSE00003612487 15591134 15591197
+ENSE00003529285 15582001 15582109
+ENSE00003678156 15560897 15560946
+ENSE00003644070 15526615 15526673
+ENSE00003505329 15522873 15522993
+ENSE00003575738 15508798 15508852
+ENSE00003582459 15481136 15481229
+ENSE00003591211 15478147 15478273
+ENSE00001899507 15472368 15472408
+ENSE00001931275 15591134 15591384
+ENSE00001818021 15505032 15505773
+ENSE00001436854 15591394 15591858
+ENSE00001493710 15508182 15508852
+ENSE00001942244 15591394 15591803
+ENSE00001872513 15590922 15591197
+ENSE00002281428 15591394 15591545
+ENSE00002277775 15470968 15471102
+ENSE00001330353 15361736 15361762
+ENSE00002240413 15467255 15467278
+ENSE00002284432 15591394 15591551
+ENSE00002258300 15434927 15435248
+ENSE00002206921 15470344 15470433
+ENSE00002312710 15434948 15435270
+ENSE00001494622 2654896 2655740
+ENSE00002201849 2655075 2655644
+ENSE00002323146 2655049 2655069
+ENSE00002144027 2655171 2655644
+ENSE00002214525 2655145 2655168
+ENSE00001543664 23756472 23756552
+ENSE00001299738 23751852 23752013
+ENSE00001425815 23751158 23751256
+ENSE00001543661 23749387 23749548
+ENSE00001543658 23746669 23746760
+ENSE00001543653 23745486 23745548
+ENSE00001493514 21154353 21154595
+ENSE00001774474 6341536 6341671
+ENSE00001632371 6340854 6341058
+ENSE00001757652 6338814 6338990
+ENSE00001598872 6339072 6339261
+ENSE00001643163 27198221 27198251
+ENSE00001775879 27197823 27197945
+ENSE00001797093 27194621 27194757
+ENSE00003604811 27190093 27190211
+ENSE00001711311 27187916 27188033
+ENSE00003547967 27184956 27185061
+ENSE00003506422 27184139 27184263
+ENSE00001764403 27177464 27177588
+ENSE00001740581 27177048 27177369
+ENSE00003378858 27198221 27198227
+ENSE00003663501 27190093 27190211
+ENSE00003522688 27184956 27185061
+ENSE00003640409 27184139 27184263
+ENSE00003450106 27177459 27177588
+ENSE00003432143 27177107 27177369
+ENSE00003307767 27208540 27208695
+ENSE00003370731 27198298 27198366
+ENSE00003266869 27190138 27190211
+ENSE00001695979 26959330 26959626
+ENSE00001803162 26952582 26952728
+ENSE00001733013 26952216 26952307
+ENSE00001602566 26951604 26951655
+ENSE00001709286 26951104 26951167
+ENSE00001706721 26950875 26951014
+ENSE00001722493 26949403 26949474
+ENSE00001757238 26946960 26947031
+ENSE00001785551 26944570 26944641
+ENSE00001618455 26942174 26942245
+ENSE00001728003 26939797 26939868
+ENSE00001655608 26937418 26937489
+ENSE00001684201 26935041 26935112
+ENSE00001594431 26932661 26932732
+ENSE00001620779 26930284 26930355
+ENSE00001604640 26920497 26920568
+ENSE00001799005 26919717 26919751
+ENSE00001715045 26915046 26915144
+ENSE00001879735 26909216 26911072
+ENSE00001759696 26959330 26959542
+ENSE00001670353 26909220 26911072
+ENSE00001638508 26959330 26959540
+ENSE00001596582 26909222 26911072
+ENSE00001710130 26959330 26959519
+ENSE00001761121 26929528 26929539
+ENSE00002297176 26934285 26934296
+ENSE00001761952 9745272 9745748
+ENSE00001787222 9744588 9744665
+ENSE00001756785 9744349 9744459
+ENSE00001659741 9744107 9744249
+ENSE00001614961 9743913 9743997
+ENSE00001783992 9743204 9743223
+ENSE00001790708 25344945 25345241
+ENSE00001612801 25338199 25338345
+ENSE00001805705 25337833 25337924
+ENSE00001701677 25337221 25337272
+ENSE00001677573 25336721 25336784
+ENSE00001692245 25336492 25336631
+ENSE00001655036 25327351 25327497
+ENSE00001644179 25326985 25327076
+ENSE00001646813 25326373 25326424
+ENSE00001693241 25325873 25325936
+ENSE00001620467 25325644 25325783
+ENSE00001793499 25316511 25316657
+ENSE00001598075 25316145 25316236
+ENSE00001622354 25315533 25315584
+ENSE00001695781 25315033 25315096
+ENSE00001646624 25314804 25314943
+ENSE00001597343 25313332 25313403
+ENSE00001707161 25310887 25310958
+ENSE00001696683 25308496 25308567
+ENSE00001742324 25306100 25306171
+ENSE00001758855 25303717 25303788
+ENSE00001633941 25301321 25301392
+ENSE00001675126 25298944 25299015
+ENSE00001697295 25289158 25289229
+ENSE00001614454 25286773 25286844
+ENSE00003674346 25285994 25286028
+ENSE00003567366 25281322 25281420
+ENSE00001730215 25275502 25277358
+ENSE00001907756 25286773 25286959
+ENSE00003593491 25285994 25286028
+ENSE00001924070 25284357 25284428
+ENSE00003490554 25281322 25281420
+ENSE00001941183 25275510 25277358
+ENSE00002223761 25344945 25345070
+ENSE00002211619 25306111 25306171
+ENSE00002216645 25276010 25276503
+ENSE00002288175 25344945 25345140
+ENSE00002302795 25275506 25277358
+ENSE00002244340 25344945 25345157
+ENSE00002220199 25275509 25277358
+ENSE00001364289 28555962 28556092
+ENSE00001703024 28560195 28560309
+ENSE00001598245 28560476 28560649
+ENSE00001623069 28564991 28565087
+ENSE00001638930 28566541 28566682
+ENSE00001793914 9215731 9216215
+ENSE00003611070 9216823 9216900
+ENSE00003508777 9217029 9217140
+ENSE00003486078 9217242 9217387
+ENSE00003501886 9217494 9217575
+ENSE00002025193 9218270 9218480
+ENSE00001605630 9160370 9160478
+ENSE00002029402 9159794 9159897
+ENSE00002071256 9156601 9156778
+ENSE00001752142 9154548 9154700
+ENSE00001625315 9152385 9152499
+ENSE00001784994 9151319 9151429
+ENSE00001627758 9150782 9150891
+ENSE00001785135 9150234 9150344
+ENSE00001784976 9149625 9149735
+ENSE00001614661 9148739 9149040
+ENSE00002247829 9160370 9160483
+ENSE00002322909 9154670 9154700
+ENSE00001696638 27648022 27648105
+ENSE00001710713 27646259 27646378
+ENSE00001621856 27645577 27645636
+ENSE00001665831 27645311 27645391
+ENSE00001614475 27644598 27644654
+ENSE00001747959 27643623 27644192
+ENSE00001629056 27643285 27643355
+ENSE00001738683 27642139 27642262
+ENSE00001745419 27641798 27641929
+ENSE00001435537 22737611 22737773
+ENSE00003613684 22741494 22741577
+ENSE00003657245 22744476 22744579
+ENSE00003465639 22746360 22746410
+ENSE00001635972 22749910 22749991
+ENSE00003488477 22751370 22751461
+ENSE00001436409 22754227 22755040
+ENSE00001900287 22737664 22737773
+ENSE00003550832 22741494 22741577
+ENSE00003637584 22744476 22744579
+ENSE00003458723 22746360 22746410
+ENSE00001848909 22749910 22750415
+ENSE00001848730 22737680 22737773
+ENSE00003530204 22754227 22754516
+ENSE00001918431 22748604 22751461
+ENSE00003571869 22754227 22754516
+ENSE00001943944 22749733 22749991
+ENSE00003530759 22751370 22751461
+ENSE00001608723 26980064 26980276
+ENSE00001761380 26997726 26997872
+ENSE00001671603 26998147 26998238
+ENSE00001610089 26998799 26998850
+ENSE00001639000 26999287 26999350
+ENSE00001609022 26999440 26999579
+ENSE00001796227 27000980 27001051
+ENSE00001643133 27003423 27003494
+ENSE00001610738 27005813 27005884
+ENSE00001694635 27008209 27008280
+ENSE00001628323 27010593 27010664
+ENSE00001759079 27017746 27017817
+ENSE00003621500 27020123 27020194
+ENSE00003476346 27041835 27041906
+ENSE00003593830 27042652 27042686
+ENSE00003673436 27047259 27047357
+ENSE00002286159 27051331 27053183
+ENSE00003521530 27039450 27039521
+ENSE00001409187 27047259 27047331
+ENSE00001784722 26986878 26987024
+ENSE00001612288 26987299 26987390
+ENSE00001607094 26987951 26988002
+ENSE00001741374 26988439 26988502
+ENSE00001709278 26988592 26988731
+ENSE00001632077 27012989 27013060
+ENSE00003646399 27015366 27015437
+ENSE00002226159 26980066 26980276
+ENSE00003609241 27029911 27029982
+ENSE00003641574 27032296 27032367
+ENSE00003582499 27034681 27034752
+ENSE00002310973 27051331 27053181
+ENSE00001804650 26980081 26980276
+ENSE00002247728 26980008 26980276
+ENSE00003688590 27020123 27020194
+ENSE00003491822 27029911 27029982
+ENSE00003627572 27032296 27032367
+ENSE00003463917 27034681 27034752
+ENSE00002242041 27037065 27037136
+ENSE00003542339 27039450 27039521
+ENSE00003675468 27041835 27041906
+ENSE00003605405 27042652 27042686
+ENSE00003571325 27047259 27047357
+ENSE00002324480 26980087 26980276
+ENSE00003528908 27015366 27015437
+ENSE00003669768 27020123 27020194
+ENSE00001693578 25525631 25525720
+ENSE00001701535 25535138 25535341
+ENSE00001742121 25537312 25537501
+ENSE00001646556 25537590 25537702
+ENSE00001671473 25538803 25538844
+ENSE00001639518 25365594 25365890
+ENSE00001640624 25372491 25372637
+ENSE00001689126 25372912 25373003
+ENSE00001736653 25373564 25373615
+ENSE00001659494 25374052 25374115
+ENSE00001723992 25374205 25374344
+ENSE00001592425 25375745 25375816
+ENSE00001676137 25378188 25378259
+ENSE00001748863 25380579 25380650
+ENSE00001787726 25387753 25387824
+ENSE00001605577 25390137 25390208
+ENSE00001607297 25399688 25399759
+ENSE00001771158 25402065 25402136
+ENSE00001743603 25411854 25411925
+ENSE00001706458 25414238 25414309
+ENSE00001690453 25416623 25416694
+ENSE00001659100 25419007 25419078
+ENSE00001717191 25421391 25421462
+ENSE00001705999 25423775 25423846
+ENSE00001622502 25426160 25426231
+ENSE00001695324 25426976 25427010
+ENSE00001659634 25431585 25431683
+ENSE00001668820 25435647 25437503
+ENSE00001603036 25365622 25365890
+ENSE00001780208 25382975 25383046
+ENSE00001682794 25385357 25385428
+ENSE00001736828 25392533 25392604
+ENSE00001674452 25394910 25394981
+ENSE00001686143 25397292 25397363
+ENSE00001687149 25435647 25437499
+ENSE00001943922 25365695 25365890
+ENSE00001865723 25435647 25436146
+ENSE00001542120 25435647 25437496
+ENSE00001685890 25365678 25365890
+ENSE00001795799 25365680 25365890
+ENSE00001656459 25435647 25437497
+ENSE00001612046 25365701 25365890
+ENSE00002307170 25435647 25435854
+ENSE00002288552 25395727 25395738
+ENSE00001494103 14532115 14532255
+ENSE00003566412 14518595 14518736
+ENSE00001494100 14475147 14475502
+ENSE00001936584 14529689 14530093
+ENSE00003683672 14518992 14519108
+ENSE00001833093 14510475 14510612
+ENSE00001864445 14494992 14495133
+ENSE00001431292 14514046 14514385
+ENSE00001494096 14532115 14532171
+ENSE00001494094 14517922 14518299
+ENSE00001494099 14532115 14532121
+ENSE00001626669 9578193 9580010
+ENSE00001628061 9582487 9582573
+ENSE00001654045 9589958 9590106
+ENSE00001742514 9590799 9590955
+ENSE00001625965 9593786 9593864
+ENSE00001637388 9593985 9594182
+ENSE00001647627 9595406 9596085
+ENSE00001713628 6290369 6292186
+ENSE00001608925 6287806 6287892
+ENSE00001661659 6280264 6280412
+ENSE00001718232 6279415 6279571
+ENSE00001712140 6276506 6276584
+ENSE00001710562 6276188 6276385
+ENSE00001697617 6274285 6274964
+ENSE00001796890 5441186 5442472
+ENSE00001542224 15398518 15399258
+ENSE00002304178 15398518 15398647
+ENSE00002218148 15398663 15399258
+ENSE00002225900 28654360 28654413
+ENSE00002060648 28661203 28661256
+ENSE00002054613 28661802 28661855
+ENSE00002062274 28671998 28672057
+ENSE00001794058 28678931 28678981
+ENSE00002063262 28680470 28680631
+ENSE00002075791 28687117 28687236
+ENSE00002083827 28688890 28688989
+ENSE00001329242 28689136 28689198
+ENSE00001755500 28689931 28689995
+ENSE00001609863 28703135 28703199
+ENSE00002050610 28704021 28704095
+ENSE00001759871 28704194 28704282
+ENSE00001803087 28710465 28710525
+ENSE00001650788 28712325 28712406
+ENSE00001785401 28716007 28716112
+ENSE00001711963 28718030 28718109
+ENSE00001803764 28722071 28722161
+ENSE00001634122 28723571 28723641
+ENSE00002035744 28725736 28725837
+ENSE00001633881 28269821 28269929
+ENSE00003705348 28270383 28270486
+ENSE00003708504 28273195 28273372
+ENSE00001648607 28275308 28275460
+ENSE00001624353 28277468 28277582
+ENSE00001654144 28278452 28278562
+ENSE00001742721 28278645 28278732
+ENSE00001733694 28279157 28279455
+ENSE00002063794 28269867 28269929
+ENSE00002073782 28275308 28275354
+ENSE00001798056 25012691 25013903
+ENSE00001786804 25692981 25693089
+ENSE00003707528 25692424 25692527
+ENSE00003707660 25689538 25689715
+ENSE00001646086 25687450 25687602
+ENSE00001635863 25685328 25685442
+ENSE00001692837 25684348 25684458
+ENSE00001650343 25684178 25684265
+ENSE00001693911 25683455 25683753
+ENSE00001782800 25692981 25693043
+ENSE00001601487 25687556 25687602
+ENSE00001543607 24444453 24445023
+ENSE00001543606 24442945 24443935
+ENSE00001610376 21664918 21665039
+ENSE00002093140 21642055 21642180
+ENSE00003244086 21636526 21636789
+ENSE00002127178 21635791 21636030
+ENSE00002119618 21628342 21629260
+ENSE00002135990 21626992 21627113
+ENSE00002871965 21621175 21621252
+ENSE00002832419 21620810 21620965
+ENSE00001709914 21617317 21618750
+ENSE00002099269 21664918 21665022
+ENSE00002135691 21621845 21621990
+ENSE00002116806 21618943 21619003
+ENSE00002114284 21618198 21618750
+ENSE00002035739 21643553 21646770
+ENSE00002065873 21642577 21642624
+ENSE00002052301 21642051 21642180
+ENSE00002063105 21636536 21636784
+ENSE00002068678 21635791 21636136
+ENSE00002035720 21628950 21629273
+ENSE00002040446 21628330 21628593
+ENSE00002087154 21618471 21618750
+ENSE00001641174 14043242 14044475
+ENSE00001735477 8148239 8148376
+ENSE00001670231 8149742 8149851
+ENSE00001794413 8149938 8150083
+ENSE00001687318 8150202 8150250
+ENSE00001616737 7721727 7722917
+ENSE00001561333 3417693 3417851
+ENSE00001755093 8286832 8287941
+ENSE00001564286 23005142 23005596
+ENSE00001563558 21033988 21034158
+ENSE00001556375 21147061 21147740
+ENSE00001551524 21148104 21148284
+ENSE00001551713 3719265 3720910
+ENSE00001664386 26001669 26003238
+ENSE00001757650 27898535 27899017
+ENSE00001733565 25813749 25813969
+ENSE00001699806 25811830 25811945
+ENSE00001780177 25809855 25810043
+ENSE00001657364 25807367 25807496
+ENSE00001601089 25805613 25805651
+ENSE00001753322 7546940 7547251
+ENSE00001706758 7548308 7548373
+ENSE00001651867 7548569 7548661
+ENSE00001608049 7548761 7548905
+ENSE00001662653 7549024 7549069
+ENSE00001705807 16389230 16389369
+ENSE00001746889 16388092 16388491
+ENSE00001722988 19894090 19894215
+ENSE00001776348 19894790 19894881
+ENSE00001631973 19896507 19896695
+ENSE00001733982 2863108 2863314
+ENSE00001607377 27577303 27577404
+ENSE00001744544 27583358 27583462
+ENSE00001706460 23563363 23563471
+ENSE00003704369 23562809 23562912
+ENSE00001600307 23559224 23559338
+ENSE00001699202 23558232 23558342
+ENSE00003708495 23557757 23557867
+ENSE00003704715 23557586 23557674
+ENSE00001654717 23556877 23557177
+ENSE00001801361 23563363 23563448
+ENSE00001684171 23557034 23557177
+ENSE00001787469 27330861 27330920
+ENSE00001786906 27329790 27329895
+ENSE00001624987 28064470 28065042
+ENSE00001742768 26314334 26314417
+ENSE00001741857 26316061 26316180
+ENSE00001756336 26316803 26316862
+ENSE00001733532 26317048 26317128
+ENSE00001765193 26317785 26317859
+ENSE00001788990 26318248 26318816
+ENSE00003551602 26319084 26319144
+ENSE00003674280 26320185 26320299
+ENSE00002050900 26320509 26320641
+ENSE00001632235 6196093 6196270
+ENSE00001760970 6206786 6206935
+ENSE00001709237 6208936 6209035
+ENSE00001738839 6209843 6209953
+ENSE00001657532 6210351 6210461
+ENSE00001682229 6210572 6210653
+ENSE00001638892 6211063 6211364
+ENSE00001720793 27859471 27859704
+ENSE00001745042 27856055 27856417
+ENSE00001645847 15342765 15343320
+ENSE00001733686 8231577 8232000
+ENSE00001640125 27131348 27132623
+ENSE00001673618 20134179 20134244
+ENSE00001701558 20134576 20135788
+ENSE00001666463 15195704 15196258
+ENSE00001732944 15206489 15207719
+ENSE00001779327 14460540 14460702
+ENSE00001715744 14461903 14462022
+ENSE00001713276 14467867 14468218
+ENSE00001682376 14465804 14465942
+ENSE00001714515 14467867 14468226
+ENSE00001803533 9638762 9638916
+ENSE00001742365 9642383 9642494
+ENSE00001763553 9646920 9646994
+ENSE00001699801 9647680 9647718
+ENSE00001593910 9650809 9650854
+ENSE00001609393 21001068 21001199
+ENSE00001645218 20995937 20996093
+ENSE00001669340 20995776 20995804
+ENSE00001648145 20995360 20995534
+ENSE00001618505 20994888 20994983
+ENSE00001722426 19856420 19856476
+ENSE00001774330 19853796 19853911
+ENSE00001697230 19841408 19841679
+ENSE00001667957 25805122 25805205
+ENSE00001741624 25804630 25804789
+ENSE00001617890 25803986 25804144
+ENSE00001732131 26641455 26641730
+ENSE00001695041 26640856 26641153
+ENSE00001705100 28695572 28695890
+ENSE00001731067 9384401 9384693
+ENSE00001688408 9384257 9384264
+ENSE00001799980 9384120 9384149
+ENSE00001667811 9384061 9384106
+ENSE00001697848 9383769 9383780
+ENSE00001645782 9383598 9383608
+ENSE00001597947 9383272 9383336
+ENSE00001783028 9383040 9383075
+ENSE00001724466 9382966 9383038
+ENSE00001668444 9382734 9382879
+ENSE00001787097 9374241 9374297
+ENSE00001602981 9021303 9021394
+ENSE00001645114 9023211 9023412
+ENSE00001632052 22973205 22973695
+ENSE00001746774 22970662 22971351
+ENSE00001759712 20020635 20020855
+ENSE00001655699 20022478 20022686
+ENSE00001640260 17053626 17053701
+ENSE00003294704 17054541 17054595
+ENSE00001787918 26066052 26066271
+ENSE00001790627 26067398 26067657
+ENSE00001775532 8218972 8220184
+ENSE00003556843 24795392 24795500
+ENSE00003707698 24795954 24796057
+ENSE00003705095 24798766 24798943
+ENSE00001787776 24800879 24801031
+ENSE00001628066 24803039 24803153
+ENSE00001708635 24804023 24804133
+ENSE00001792249 24804216 24804303
+ENSE00001783563 24804728 24805025
+ENSE00002058038 24795438 24795500
+ENSE00003582078 24800879 24800925
+ENSE00001730385 26422392 26423173
+ENSE00001660101 25880340 25880590
+ENSE00001706320 25882205 25882396
+ENSE00001793186 24666828 24667842
+ENSE00001664773 24818605 24819006
+ENSE00001659765 24817518 24817572
+ENSE00001628115 24817194 24817302
+ENSE00001615798 24816958 24817017
+ENSE00001787460 24816763 24816836
+ENSE00001747464 6587003 6587221
+ENSE00001728661 19912484 19912576
+ENSE00001689812 19910390 19910531
+ENSE00001613966 19909863 19909948
+ENSE00001614250 19612838 19612995
+ENSE00001615218 19613591 19613640
+ENSE00001658609 19624978 19625106
+ENSE00001802978 19625354 19625447
+ENSE00001752917 19626041 19626211
+ENSE00001742406 19626321 19626414
+ENSE00001606348 19626689 19626898
+ENSE00001646418 14474827 14474869
+ENSE00001596298 14482956 14483105
+ENSE00001692812 14486671 14486792
+ENSE00001685756 14487928 14488049
+ENSE00001804752 14494139 14494272
+ENSE00001603823 14494944 14495106
+ENSE00001679516 14496112 14496233
+ENSE00001625545 14498765 14499123
+ENSE00001687009 25608612 25608682
+ENSE00001681165 25618301 25618427
+ENSE00001664672 27845598 27845690
+ENSE00001793473 27848054 27848188
+ENSE00001676006 27848630 27848709
+ENSE00001665330 27843396 27843493
+ENSE00001718020 27841897 27842095
+ENSE00001736617 27839620 27839688
+ENSE00001660147 27839476 27839534
+ENSE00001747287 27831866 27831998
+ENSE00001772012 27831391 27831526
+ENSE00001591413 27828240 27828345
+ENSE00001706519 27826017 27826133
+ENSE00001681474 27824167 27824240
+ENSE00001767346 27822970 27823061
+ENSE00001660597 27821207 27821355
+ENSE00001766514 27896136 27896353
+ENSE00001667582 27894743 27895009
+ENSE00001592213 25727742 25727848
+ENSE00001776907 25729370 25729494
+ENSE00001725078 25730031 25730140
+ENSE00001700142 25731858 25732473
+ENSE00001645488 25732847 25732951
+ENSE00001633384 25752297 25752397
+ENSE00001701227 25759032 25759186
+ENSE00001700211 25760375 25760456
+ENSE00001624131 25760756 25760787
+ENSE00001714386 26385030 26385131
+ENSE00001771613 26378973 26379077
+ENSE00001677160 9323603 9323893
+ENSE00001679028 9311656 9311705
+ENSE00001715921 9303716 9303730
+ENSE00001780929 9302377 9302423
+ENSE00001670508 9302159 9302223
+ENSE00001760125 9301927 9301962
+ENSE00001765669 9301853 9301925
+ENSE00001594565 9301621 9301766
+ENSE00001723672 9293012 9293068
+ENSE00001659766 9301461 9301502
+ENSE00002290744 9344027 9344176
+ENSE00002234593 9323618 9323761
+ENSE00001723990 7569228 7569288
+ENSE00001745366 7567398 7567528
+ENSE00001785693 9192996 9193010
+ENSE00001643138 9192690 9192799
+ENSE00001806627 9190799 9190884
+ENSE00001617913 9185120 9187248
+ENSE00001725040 9192690 9192791
+ENSE00001616038 9192458 9192603
+ENSE00001645870 9190799 9190957
+ENSE00001746455 9187188 9187248
+ENSE00001695542 27764791 27764856
+ENSE00001711695 27765188 27766400
+ENSE00001622377 24344576 24344683
+ENSE00001786914 24345087 24345189
+ENSE00001763025 24349747 24349859
+ENSE00001649733 22889140 22889383
+ENSE00001795209 22891287 22891417
+ENSE00001658484 22891809 22891956
+ENSE00001602075 22892655 22892718
+ENSE00001646110 22892828 22892926
+ENSE00001782218 22895969 22896036
+ENSE00001712850 22902786 22902859
+ENSE00001806612 22903197 22903321
+ENSE00001717995 22904714 22904786
+ENSE00001712612 3646038 3647587
+ENSE00001752848 9904910 9904995
+ENSE00001543999 9904163 9904218
+ENSE00001731603 9906266 9906760
+ENSE00001617820 9905581 9905658
+ENSE00001788369 9905341 9905452
+ENSE00001672593 9905098 9905243
+ENSE00001688242 9904910 9904991
+ENSE00001713327 9904196 9904218
+ENSE00001592550 20602683 20603011
+ENSE00001642025 19949081 19949408
+ENSE00001618473 8145122 8145229
+ENSE00001629725 8144595 8144697
+ENSE00001758450 8139321 8139470
+ENSE00001591894 8134356 8134470
+ENSE00001757932 8133375 8133485
+ENSE00001666134 8133216 8133303
+ENSE00001686479 8132490 8132806
+ENSE00001653013 10036113 10036711
+ENSE00001652052 28390498 28390720
+ENSE00001638514 4669726 4670889
+ENSE00001732026 20267200 20267292
+ENSE00001643224 20269242 20269388
+ENSE00001686603 20269833 20269988
+ENSE00001722408 19838632 19838960
+ENSE00001677525 19921687 19921786
+ENSE00001668583 19923007 19923207
+ENSE00001727103 19925402 19925469
+ENSE00001635486 19925555 19925585
+ENSE00001702927 19927108 19927244
+ENSE00001667443 19927569 19927715
+ENSE00001638683 19930805 19930910
+ENSE00001673115 19932214 19932333
+ENSE00001615368 19934171 19934244
+ENSE00001675564 19935349 19935440
+ENSE00001738547 19937061 19937209
+ENSE00001674051 23799361 23800927
+ENSE00001634395 10011462 10011816
+ENSE00001756820 24083189 24083297
+ENSE00001674806 24082632 24082735
+ENSE00001673237 24079748 24079925
+ENSE00001732143 24077660 24077812
+ENSE00001679690 24075538 24075652
+ENSE00001782041 24074558 24074668
+ENSE00001745014 24074388 24074475
+ENSE00001743398 24073668 24073966
+ENSE00001650266 20033289 20033510
+ENSE00001796711 20030904 20031124
+ENSE00001805326 20028903 20029103
+ENSE00001739171 20026834 20026918
+ENSE00001615615 20024579 20024703
+ENSE00001644643 9477858 9478325
+ENSE00001634743 26829804 26831082
+ENSE00001712081 19740393 19740865
+ENSE00001691275 19741241 19741282
+ENSE00001719617 20566775 20566932
+ENSE00001722479 20566130 20566179
+ENSE00001650606 20554672 20554800
+ENSE00001798385 20554331 20554424
+ENSE00001740686 20553567 20553737
+ENSE00001660603 20553364 20553457
+ENSE00001753397 20552880 20553089
+ENSE00001787453 26081100 26081138
+ENSE00001755017 26082813 26083036
+ENSE00001689362 26084160 26084422
+ENSE00001619402 26086058 26086248
+ENSE00001707653 19833478 19835057
+ENSE00001754048 23069230 23069448
+ENSE00001631720 8239704 8240071
+ENSE00001598941 16750976 16752238
+ENSE00001619110 20257991 20258088
+ENSE00001655047 20256569 20256769
+ENSE00001663253 20254307 20254374
+ENSE00001599244 20254191 20254221
+ENSE00001716829 20252531 20252667
+ENSE00001740285 20252060 20252206
+ENSE00001722377 20248865 20248970
+ENSE00001799387 20247442 20247561
+ENSE00001639175 20244336 20244426
+ENSE00001602332 20242567 20242716
+ENSE00001740373 14649708 14649836
+ENSE00001638353 14652995 14655149
+ENSE00001697673 14656719 14656818
+ENSE00001618557 25820526 25821539
+ENSE00001760629 25082602 25082762
+ENSE00001687747 25109775 25109850
+ENSE00001806588 25111655 25111762
+ENSE00001713773 25118431 25119431
+ENSE00001715524 23473154 23473262
+ENSE00001803876 23473708 23473886
+ENSE00001802420 23476064 23476181
+ENSE00001631333 23476493 23476523
+ENSE00001792500 23478520 23478634
+ENSE00001606958 23479502 23479612
+ENSE00001748340 23479694 23479782
+ENSE00001745595 23480206 23480383
+ENSE00001655007 23480660 23480781
+ENSE00001958730 24549608 24549729
+ENSE00003461217 24551591 24551703
+ENSE00003566508 24552172 24552275
+ENSE00003512617 24557392 24557544
+ENSE00003628086 24559555 24559669
+ENSE00003475218 24560625 24560735
+ENSE00003524115 24561176 24561286
+ENSE00003544610 24561721 24561831
+ENSE00003519695 24562755 24562865
+ENSE00003617706 24562952 24563040
+ENSE00003462282 24563452 24564028
+ENSE00001775261 24549617 24549729
+ENSE00003571276 24551591 24551703
+ENSE00003560118 24552172 24552275
+ENSE00001663738 24555299 24555476
+ENSE00003481268 24557392 24557544
+ENSE00003643053 24559555 24559669
+ENSE00003478187 24560625 24560735
+ENSE00003515241 24561176 24561286
+ENSE00003518190 24561721 24561831
+ENSE00003566008 24562755 24562865
+ENSE00003680149 24562952 24563040
+ENSE00003509588 24563452 24564028
+ENSE00001594771 24454970 24455117
+ENSE00001766942 10007397 10007923
+ENSE00001691508 17659096 17659238
+ENSE00001674716 17661491 17661821
+ENSE00001733692 17663477 17663895
+ENSE00001677440 17678652 17678808
+ENSE00001757764 17705111 17705211
+ENSE00001804714 7995171 7995279
+ENSE00001711635 7997274 7997276
+ENSE00001760261 7999689 7999866
+ENSE00001636850 8001726 8001841
+ENSE00001595014 8002167 8002203
+ENSE00001745989 8010267 8010379
+ENSE00001636511 8011249 8011355
+ENSE00001632004 8011435 8011528
+ENSE00001798989 8011943 8012244
+ENSE00001794129 25204177 25204476
+ENSE00001605189 25204818 25205333
+ENSE00001600631 2751182 2751693
+ENSE00001611586 2749724 2750402
+ENSE00001616321 26631479 26631538
+ENSE00001618682 26632505 26632610
+ENSE00001698182 24065082 24065475
+ENSE00001796451 24066464 24066548
+ENSE00001650846 24066793 24066871
+ENSE00001799290 24066945 24067088
+ENSE00001659363 24067206 24067253
+ENSE00001700980 14619111 14619171
+ENSE00001778840 14600234 14600269
+ENSE00001683270 14596531 14596554
+ENSE00001612791 14580243 14580305
+ENSE00001736429 14571502 14571559
+ENSE00001608554 14567945 14567995
+ENSE00001712855 14555064 14555099
+ENSE00001639452 14551845 14551961
+ENSE00001615810 20642973 20643106
+ENSE00001622792 20648080 20648236
+ENSE00001804045 20648639 20648814
+ENSE00001713121 20649191 20649286
+ENSE00001632993 28737695 28737748
+ENSE00001691206 28733545 28733707
+ENSE00001652425 28732789 28732913
+ENSE00001775619 27412731 27412843
+ENSE00001799498 27413254 27413355
+ENSE00001705987 27415744 27415925
+ENSE00001714230 27418670 27418780
+ENSE00001736832 27418862 27418950
+ENSE00001670100 27419364 27419678
+ENSE00001801471 28081780 28082028
+ENSE00001777749 28079980 28080162
+ENSE00001744076 8281841 8282095
+ENSE00001648363 8281078 8281495
+ENSE00001761637 26837938 26838222
+ENSE00001646263 26838564 26839079
+ENSE00001738572 28137388 28137719
+ENSE00001647752 26098309 26098440
+ENSE00001779865 26093094 26093247
+ENSE00001620696 26092551 26092738
+ENSE00001763493 26092074 26092169
+ENSE00001791861 25824651 25824982
+ENSE00001806367 17053427 17053630
+ENSE00001727723 23719571 23719956
+ENSE00001647639 23718482 23718541
+ENSE00001611070 23718159 23718285
+ENSE00001757809 23717738 23717787
+ENSE00001649434 20949636 20949966
+ENSE00001694343 8771032 8771317
+ENSE00001742226 8769642 8769780
+ENSE00001769856 8769411 8769554
+ENSE00001698979 8769244 8769299
+ENSE00001747233 24997731 24997790
+ENSE00001718178 24998757 24998862
+ENSE00001743100 23695501 23695897
+ENSE00001753135 23694449 23694514
+ENSE00001762790 23694122 23694185
+ENSE00001609952 23693890 23694035
+ENSE00001788043 23693725 23693772
+ENSE00001613955 27245879 27246039
+ENSE00001793058 27218793 27218868
+ENSE00001658199 27216881 27216988
+ENSE00001702789 27209230 27210230
+ENSE00001670654 9650924 9651009
+ENSE00001742535 9653511 9653653
+ENSE00001782038 9654904 9655122
+ENSE00001628818 9213276 9213290
+ENSE00001645989 9212970 9213079
+ENSE00001671314 9211075 9211160
+ENSE00001645127 9205412 9207525
+ENSE00001623303 9212970 9213071
+ENSE00003646877 9212738 9212883
+ENSE00001624981 9211075 9211233
+ENSE00001776488 9207466 9207525
+ENSE00002237197 9234756 9235047
+ENSE00002305238 9234328 9234347
+ENSE00002214727 9232876 9232920
+ENSE00002234764 9224460 9224505
+ENSE00001705725 9213276 9213340
+ENSE00001777540 9213044 9213079
+ENSE00001698821 9212970 9213042
+ENSE00003487206 9212738 9212883
+ENSE00001776941 9212578 9212619
+ENSE00001786986 9214410 9214701
+ENSE00001636298 9213969 9214001
+ENSE00001724614 9213752 9213783
+ENSE00001641069 9213494 9213539
+ENSE00002259310 9234315 9234347
+ENSE00002243480 9234098 9234129
+ENSE00001731628 9233840 9233885
+ENSE00001702837 9233622 9233686
+ENSE00001777950 9233390 9233425
+ENSE00001759506 9233316 9233388
+ENSE00003471081 9233084 9233229
+ENSE00001630624 9232924 9232965
+ENSE00001755166 14914995 14915763
+ENSE00001729630 14913906 14914649
+ENSE00001708301 26116730 26116841
+ENSE00001718075 26114234 26114365
+ENSE00001596904 26113711 26113789
+ENSE00001691278 19304706 19305532
+ENSE00001592952 24683194 24683352
+ENSE00001667759 24683838 24683996
+ENSE00001788558 19630862 19630918
+ENSE00001790420 19630448 19630556
+ENSE00001783650 19630277 19630367
+ENSE00001800896 19628716 19629764
+ENSE00001753165 22551937 22551996
+ENSE00001624781 22557394 22557905
+ENSE00001800341 22558157 22558682
+ENSE00001796314 27540773 27540866
+ENSE00001653980 27533064 27533232
+ENSE00001596444 27524447 27525025
+ENSE00001638780 16915885 16915913
+ENSE00001750121 16913156 16913224
+ENSE00001766520 16910502 16910589
+ENSE00001648650 16905522 16905867
+ENSE00001720861 24210845 24211859
+ENSE00001806349 9558704 9558905
+ENSE00001729950 9555262 9555366
+ENSE00001774191 9236030 9236561
+ENSE00003692666 9237169 9237246
+ENSE00001747539 9237375 9237486
+ENSE00001719444 9237588 9237733
+ENSE00001781353 9237840 9237921
+ENSE00001954357 9238616 9238826
+ENSE00003617259 9236076 9236561
+ENSE00003611797 9237829 9237921
+ENSE00001758618 9238616 9238825
+ENSE00003511805 9236076 9236561
+ENSE00003461293 9237169 9237246
+ENSE00003601991 9237375 9237921
+ENSE00001879368 9238049 9238832
+ENSE00001932930 9237436 9237733
+ENSE00003565177 9237829 9237921
+ENSE00001931598 9238616 9238817
+ENSE00003463520 9306122 9306267
+ENSE00003659094 9306374 9306455
+ENSE00003525623 9307148 9307357
+ENSE00001618274 9236076 9236162
+ENSE00001610531 9236424 9236561
+ENSE00003525860 9237375 9237921
+ENSE00001724421 20442064 20442235
+ENSE00001776105 20443132 20443571
+ENSE00001726546 20443649 20443794
+ENSE00001754461 20443924 20444031
+ENSE00001804017 20444676 20444747
+ENSE00001740783 20444998 20445051
+ENSE00001721829 20445778 20445876
+ENSE00001689481 20446357 20446440
+ENSE00001600079 20446520 20446647
+ENSE00001770854 19868881 19870005
+ENSE00001707259 19995523 19995588
+ENSE00001614194 19993979 19995178
+ENSE00001610381 24329997 24330390
+ENSE00001729283 24331382 24331463
+ENSE00001736394 24331702 24331772
+ENSE00001611617 24331860 24332003
+ENSE00001675926 24332121 24332168
+ENSE00001786158 2799112 2799161
+ENSE00001605858 2797411 2798407
+ENSE00001748625 2797042 2797083
+ENSE00001628114 9684128 9684305
+ENSE00001632712 9673456 9673605
+ENSE00001592139 9671356 9671455
+ENSE00001666942 9670438 9670548
+ENSE00001734562 9669930 9670040
+ENSE00001636017 9669744 9669819
+ENSE00001733825 9669027 9669328
+ENSE00001657129 24355478 24355590
+ENSE00001754863 24356104 24356174
+ENSE00001761172 24358572 24358748
+ENSE00001773734 24362613 24362727
+ENSE00001757026 22680157 22681114
+ENSE00001688639 22669238 22669342
+ENSE00001719273 22631758 22631867
+ENSE00001669312 22630075 22630134
+ENSE00001646929 22628314 22628553
+ENSE00001717483 22627554 22627708
+ENSE00001652776 22680157 22680293
+ENSE00001755411 22669140 22669342
+ENSE00001642256 28500435 28500565
+ENSE00001701173 28494607 28494725
+ENSE00001806782 28484982 28485141
+ENSE00001713752 28483524 28483668
+ENSE00001655231 28481793 28481866
+ENSE00001777012 28479872 28479951
+ENSE00001591489 28479076 28479215
+ENSE00001678181 28475093 28475275
+ENSE00002438533 28437195 28437284
+ENSE00001666272 28427573 28427776
+ENSE00001694641 28425413 28425602
+ENSE00001689979 28425212 28425324
+ENSE00001680774 28424070 28424111
+ENSE00001638605 26236272 26236493
+ENSE00001619709 26233907 26234126
+ENSE00001732018 26231919 26232108
+ENSE00001600167 26229975 26230103
+ENSE00001682049 26227851 26227970
+ENSE00001691510 20438909 20439381
+ENSE00001725503 20438493 20438534
+ENSE00001806707 3904538 3904761
+ENSE00001708248 3968027 3968361
+ENSE00001747579 27959160 27960713
+ENSE00001742312 20633950 20634023
+ENSE00001785470 20632556 20632756
+ENSE00001687737 20630270 20630337
+ENSE00001738169 20630152 20630182
+ENSE00001804747 20625798 20625934
+ENSE00001597616 20625334 20625501
+ENSE00001629276 20622118 20622223
+ENSE00001688905 20619831 20619949
+ENSE00001626846 20617975 20618044
+ENSE00001658632 20616779 20616869
+ENSE00001653676 20615036 20615149
+ENSE00001771502 23842919 23843004
+ENSE00001774620 23840886 23841109
+ENSE00001774789 23839132 23839321
+ENSE00001793905 20309770 20310894
+ENSE00001678856 3734347 3734763
+ENSE00002028706 9195406 9195937
+ENSE00003676294 9196545 9196622
+ENSE00002045368 9196751 9196862
+ENSE00001638906 9196964 9197109
+ENSE00003564905 9197216 9197297
+ENSE00001885622 9197992 9198202
+ENSE00003489927 9195452 9195937
+ENSE00001801838 9197205 9197297
+ENSE00001603826 9197992 9198201
+ENSE00003579602 9195452 9195937
+ENSE00003593279 9196545 9196622
+ENSE00001943292 9196751 9197297
+ENSE00001940527 9197425 9198208
+ENSE00001908923 9196812 9197109
+ENSE00003618941 9197216 9197297
+ENSE00001859698 9197992 9198193
+ENSE00003586983 9217242 9217387
+ENSE00003603865 9217494 9217575
+ENSE00001720810 9218270 9218479
+ENSE00002220142 9195452 9195688
+ENSE00002247562 9215950 9216150
+ENSE00001350919 9216153 9216215
+ENSE00003564996 9216823 9216900
+ENSE00003465500 9217029 9217140
+ENSE00001159243 9195501 9195937
+ENSE00002206433 9197216 9197317
+ENSE00001806624 23023223 23024223
+ENSE00001696124 24204040 24204260
+ENSE00001722283 24202119 24202236
+ENSE00001646198 24200144 24200332
+ENSE00001630914 24197657 24197785
+ENSE00001751441 24195902 24195940
+ENSE00001671403 6229369 6229454
+ENSE00001660336 6226725 6226867
+ENSE00001785311 6225260 6225478
+ENSE00001640432 27725937 27726142
+ENSE00001650835 27728306 27728529
+ENSE00001768767 27730334 27730512
+ENSE00001734323 27732325 27732414
+ENSE00001757798 27734462 27734591
+ENSE00001754078 24017478 24017864
+ENSE00001801282 24019109 24019276
+ENSE00001669347 24019648 24019697
+ENSE00001781088 7805612 7805681
+ENSE00001788779 7804676 7804745
+ENSE00001602297 7801735 7801840
+ENSE00001712311 7801038 7801056
+ENSE00001663391 24453707 24454098
+ENSE00001634582 24452302 24452367
+ENSE00001626840 24452072 24452217
+ENSE00001668677 8841768 8842136
+ENSE00001765004 8840028 8840072
+ENSE00001613408 8839792 8839935
+ENSE00001664411 23858456 23860106
+ENSE00001675278 9342754 9342768
+ENSE00001648707 9342449 9342558
+ENSE00001614122 9340560 9340645
+ENSE00001755116 9334896 9337009
+ENSE00001615889 9342449 9342550
+ENSE00001739908 9342217 9342362
+ENSE00001694789 9340560 9340718
+ENSE00001664818 9336950 9337009
+ENSE00001596967 25669468 25669869
+ENSE00001793420 25670900 25670956
+ENSE00001682691 25671157 25671280
+ENSE00001762358 25671373 25671516
+ENSE00001653366 20694212 20694542
+ENSE00001719426 20230369 20230696
+ENSE00001766548 20740303 20740446
+ENSE00001718469 24760122 24760228
+ENSE00001757100 24758482 24758611
+ENSE00001774249 24757838 24757947
+ENSE00001660181 24755505 24756120
+ENSE00001783530 24755021 24755131
+ENSE00001683819 24735579 24735679
+ENSE00001748455 24728791 24728945
+ENSE00001605377 24727528 24727594
+ENSE00001631605 24727136 24727229
+ENSE00001743559 19691196 19691634
+ENSE00001734255 19688583 19688700
+ENSE00001754684 19667656 19667890
+ENSE00001738421 19665922 19666006
+ENSE00001606680 19664680 19665834
+ENSE00001722745 24915822 24915934
+ENSE00001612746 24915310 24915411
+ENSE00001617577 24912745 24912921
+ENSE00001806297 24909883 24909993
+ENSE00001803898 24909713 24909801
+ENSE00001607696 24908998 24909299
+ENSE00001627359 24674428 24674648
+ENSE00001652271 24676451 24676569
+ENSE00001614594 24678356 24678544
+ENSE00001693270 24680903 24681031
+ENSE00001639643 24682748 24682786
+ENSE00001679003 27164646 27165450
+ENSE00001610666 28069535 28069666
+ENSE00001663003 28074809 28074946
+ENSE00001597803 28075090 28075126
+ENSE00001638987 28075341 28075498
+ENSE00001596879 28075881 28075973
+ENSE00001768357 20973224 20973715
+ENSE00001694457 26328624 26329218
+ENSE00001592853 26063388 26063870
+ENSE00001685903 27874637 27874740
+ENSE00001745631 27876939 27877073
+ENSE00001676858 27879431 27879535
+ENSE00001724152 24118460 24118566
+ENSE00001669607 24120079 24120206
+ENSE00001639143 24120743 24120852
+ENSE00001650621 24122570 24123185
+ENSE00001633081 24123559 24123669
+ENSE00001602954 24143010 24143110
+ENSE00001699591 24149743 24149894
+ENSE00001771661 24151091 24151156
+ENSE00001632036 24151455 24151496
+ENSE00001791163 20323252 20323355
+ENSE00001745360 20325864 20325979
+ENSE00001646564 20338099 20338370
+ENSE00001732601 5075256 5076110
+ENSE00001622837 26197569 26197634
+ENSE00001747409 26196025 26197240
+ENSE00001674006 10009199 10010334
+ENSE00001636194 26102716 26102943
+ENSE00001794176 26105997 26106365
+ENSE00001756377 9458777 9458885
+ENSE00001746468 9458236 9458339
+ENSE00001745811 9454762 9454936
+ENSE00001752960 9452699 9452845
+ENSE00001702756 9450030 9450143
+ENSE00001799300 9449062 9449172
+ENSE00001670043 9448893 9448976
+ENSE00001770923 9448180 9448480
+ENSE00001616687 28772667 28773306
+ENSE00001678149 2870953 2871258
+ENSE00001757585 2947325 2947382
+ENSE00001696132 2965036 2965258
+ENSE00001677281 2871039 2871258
+ENSE00001620860 2965036 2965097
+ENSE00001694023 2970126 2970313
+ENSE00001634403 25954968 25955206
+ENSE00001705880 28140831 28141844
+ENSE00001632978 9707273 9707748
+ENSE00001683725 9708358 9708390
+ENSE00001617130 8898635 8898771
+ENSE00001728356 8899096 8899270
+ENSE00001783134 8902714 8902829
+ENSE00001750562 8904498 8904571
+ENSE00001675644 8905717 8905808
+ENSE00001603923 8907841 8908029
+ENSE00001804178 26646410 26647652
+ENSE00001788264 24214967 24215298
+ENSE00001800082 27738263 27738492
+ENSE00001746593 27736470 27736660
+ENSE00001614266 2696023 2696259
+ENSE00001714431 26250254 26250447
+ENSE00001710497 26250751 26252165
+ENSE00001691217 14378586 14378737
+ENSE00001804878 14374685 14374811
+ENSE00001737381 14373025 14373222
+ENSE00001717478 9003694 9005311
+ENSE00001650635 8506335 8506447
+ENSE00001733122 8512173 8512368
+ENSE00001613384 8512720 8512883
+ENSE00001715519 8572513 8572600
+ENSE00001776324 8573144 8573324
+ENSE00001657190 25892713 25892840
+ENSE00001765641 25887431 25887580
+ENSE00001718086 25887252 25887281
+ENSE00001712499 25886883 25887041
+ENSE00001732839 25886405 25886500
+ENSE00001623599 24195336 24195494
+ENSE00001768555 24194701 24194850
+ENSE00001607684 22173972 22174090
+ENSE00001704650 22167626 22167714
+ENSE00001699413 22163071 22163585
+ENSE00001774695 6974434 6974543
+ENSE00001648340 6968774 6968868
+ENSE00001644797 26118927 26119025
+ENSE00001599057 26120341 26120540
+ENSE00001651852 26122748 26122815
+ENSE00001726876 26122901 26122933
+ENSE00001788414 26130439 26130572
+ENSE00001741171 26130911 26131046
+ENSE00001782936 26134091 26134199
+ENSE00001616123 26136306 26136418
+ENSE00001758121 26138195 26138268
+ENSE00001695453 26139373 26139464
+ENSE00001664413 26141080 26141228
+ENSE00001694546 27881271 27881307
+ENSE00001642941 27879374 27879595
+ENSE00001689808 27877987 27878249
+ENSE00001742552 27876158 27876350
+ENSE00001659626 9502838 9503010
+ENSE00001688131 9506532 9506619
+ENSE00001679725 6388504 6388970
+ENSE00001666493 23382992 23383975
+ENSE00001693593 28007090 28007415
+ENSE00001728711 14442350 14443562
+ENSE00001767583 20285560 20285685
+ENSE00001743874 20284895 20284986
+ENSE00001592268 20283081 20283269
+ENSE00001612213 25171741 25171991
+ENSE00001761722 25172300 25172810
+ENSE00001800398 9172360 9172441
+ENSE00001695576 9167489 9167665
+ENSE00001763167 25911619 25911730
+ENSE00001732672 25909520 25909664
+ENSE00001649962 25908985 25909070
+ENSE00001702931 20344721 20346300
+ENSE00001772654 25897335 25897907
+ENSE00001767851 28050665 28050757
+ENSE00001731820 28052712 28052856
+ENSE00001621307 28053305 28053397
+ENSE00001778513 9068524 9068856
+ENSE00001789966 3550846 3551909
+ENSE00001599138 26805484 26805714
+ENSE00001781743 26806034 26806553
+ENSE00001615694 2870354 2870667
+ENSE00001670294 2849879 2849968
+ENSE00001787837 2834885 2834980
+ENSE00001664412 2870579 2870612
+ENSE00001751906 2870354 2870442
+ENSE00001659262 2869669 2869964
+ENSE00001618370 20278413 20278550
+ENSE00001740946 20278706 20278734
+ENSE00001805102 20278947 20279105
+ENSE00001724905 20279486 20279581
+ENSE00001670949 25195394 25195507
+ENSE00001795349 25195827 25195883
+ENSE00001626354 25196049 25197342
+ENSE00001715878 28157582 28157740
+ENSE00001703739 28158226 28158384
+ENSE00001796121 20096229 20096449
+ENSE00001737699 20098617 20098835
+ENSE00001743166 20100636 20100825
+ENSE00001784518 20102823 20102907
+ENSE00001605344 20105048 20105140
+ENSE00001672475 20488139 20488576
+ENSE00001640636 20491077 20491194
+ENSE00001758131 20511886 20512120
+ENSE00001710040 20513770 20513854
+ENSE00001790139 20513942 20515096
+ENSE00001706607 20691244 20691509
+ENSE00001779202 27633431 27633809
+ENSE00001632989 6705748 6706293
+ENSE00001734013 27535139 27537421
+ENSE00001598660 27537832 27537958
+ENSE00001744860 8551762 8551919
+ENSE00001765620 8551411 8551473
+ENSE00001794752 26425509 26427287
+ENSE00001781093 26425021 26425442
+ENSE00001614820 26424484 26424610
+ENSE00001746702 14746216 14746333
+ENSE00001794515 14744661 14744805
+ENSE00001629993 14731962 14732097
+ENSE00001649124 14731558 14731665
+ENSE00001774864 14730915 14731055
+ENSE00001494279 9175073 9175622
+ENSE00003638561 9176230 9176307
+ENSE00001734097 9176436 9176547
+ENSE00001697845 9176649 9176794
+ENSE00001703875 9176901 9176982
+ENSE00001897275 9177677 9177887
+ENSE00003476280 9175119 9175622
+ENSE00003578244 9176890 9176982
+ENSE00001711997 9177677 9177886
+ENSE00003579134 9175119 9175622
+ENSE00003581570 9176230 9176307
+ENSE00001855646 9176436 9176982
+ENSE00001810355 9177110 9177893
+ENSE00001825143 9176497 9176794
+ENSE00003677158 9176890 9176982
+ENSE00001840824 9177677 9177878
+ENSE00001661126 28546758 28546808
+ENSE00001647030 28547106 28547377
+ENSE00001717777 14774265 14774637
+ENSE00001661171 14776571 14776617
+ENSE00001804864 14798442 14798535
+ENSE00001769405 14802255 14802370
+ENSE00001538271 14799393 14804162
+ENSE00001677769 14774468 14774637
+ENSE00001720726 14799614 14800184
+ENSE00002231970 14774284 14774637
+ENSE00002219117 14776571 14776614
+ENSE00001660519 27539301 27540203
+ENSE00001721076 15060008 15060090
+ENSE00001690255 15059583 15059686
+ENSE00001766707 15058196 15058215
+ENSE00001768739 15057933 15057980
+ENSE00001787917 15057797 15057831
+ENSE00001595199 15057528 15057571
+ENSE00001629383 15047112 15047196
+ENSE00001762780 15042075 15042237
+ENSE00001621226 26153058 26153384
+ENSE00001789186 9461792 9462154
+ENSE00001771580 9463179 9463265
+ENSE00001633534 9463471 9463583
+ENSE00001599361 9463670 9463810
+ENSE00001743476 9463921 9463961
+ENSE00001600092 24663389 24663720
+ENSE00001680177 8232073 8233191
+ENSE00001758106 13901758 13903233
+ENSE00001797129 26549565 26549673
+ENSE00001724851 26549053 26549154
+ENSE00001706454 26546483 26546664
+ENSE00001717052 26543636 26543746
+ENSE00001779101 26543466 26543554
+ENSE00001603762 26542751 26543052
+ENSE00001727355 23696765 23696912
+ENSE00001718460 23698774 23698886
+ENSE00001661257 23699361 23699464
+ENSE00003587206 23702484 23702661
+ENSE00003656490 23704577 23704729
+ENSE00003571694 23706740 23706854
+ENSE00003569851 23707810 23707920
+ENSE00003633827 23708362 23708472
+ENSE00003550118 23708907 23709017
+ENSE00003663549 23709939 23710049
+ENSE00003674760 23710136 23710224
+ENSE00003536697 23710636 23711212
+ENSE00001670247 23696790 23696912
+ENSE00003581140 23704577 23704729
+ENSE00003460783 23706740 23706854
+ENSE00003639215 23707810 23707920
+ENSE00003489167 23708362 23708472
+ENSE00003511329 23708907 23709017
+ENSE00003525217 23709939 23710049
+ENSE00003545208 23710136 23710224
+ENSE00003621655 23710636 23711212
+ENSE00001600029 23698778 23698886
+ENSE00001632888 23710636 23711210
+ENSE00001795478 23673258 23673371
+ENSE00003461810 23675233 23675345
+ENSE00003638069 23675820 23675923
+ENSE00003462284 23702484 23702661
+ENSE00003673811 23704577 23704729
+ENSE00003599248 23706740 23706854
+ENSE00001635427 21489455 21490459
+ENSE00001789043 6172640 6173115
+ENSE00001629179 6171998 6172030
+ENSE00001774484 17567915 17567954
+ENSE00001712382 17493473 17493722
+ENSE00001710358 17460542 17460746
+ENSE00001661949 3161849 3162867
+ENSE00001688800 28148401 28148623
+ENSE00001679843 28150422 28150540
+ENSE00001627555 28152327 28152516
+ENSE00001792441 28154875 28155003
+ENSE00003710715 7644562 7644748
+ENSE00001628915 7617404 7617512
+ENSE00001624117 7597498 7597881
+ENSE00001599548 7589897 7590099
+ENSE00001591968 7581026 7581138
+ENSE00001803550 7677195 7677302
+ENSE00001675697 7673536 7673645
+ENSE00001644958 7644875 7645069
+ENSE00001678529 7610636 7611068
+ENSE00001637728 9363082 9363096
+ENSE00001601067 9362776 9362885
+ENSE00001733722 9360885 9360970
+ENSE00001707869 9355208 9357334
+ENSE00001654072 9362776 9362877
+ENSE00003461311 9362544 9362689
+ENSE00001709675 9360885 9361043
+ENSE00001705207 9357275 9357334
+ENSE00003224946 9364215 9364506
+ENSE00003096789 9363787 9363806
+ENSE00003004848 9363280 9363308
+ENSE00003055029 9363163 9363219
+ENSE00003198125 9363082 9363148
+ENSE00003165650 9362850 9362885
+ENSE00003163421 9362776 9362848
+ENSE00003611037 9362544 9362689
+ENSE00003028414 9362384 9362425
+ENSE00001611400 6131875 6131994
+ENSE00001669863 6129985 6130070
+ENSE00001673478 6124308 6126433
+ENSE00001712586 6131875 6131976
+ENSE00003709639 6131643 6131788
+ENSE00001663750 6129985 6130143
+ENSE00001795860 6126374 6126433
+ENSE00003709660 6130047 6130143
+ENSE00001688314 8240282 8240751
+ENSE00001698575 28234536 28234640
+ENSE00001669923 28232883 28233007
+ENSE00001614971 28232234 28232346
+ENSE00001661020 28229897 28230516
+ENSE00001744230 28229431 28229527
+ENSE00001720034 28209980 28210076
+ENSE00001752268 28203178 28203341
+ENSE00001592390 28201926 28201996
+ENSE00001666495 9789162 9789270
+ENSE00001728287 9789732 9789807
+ENSE00001639523 9795049 9795161
+ENSE00001648850 9796028 9796138
+ENSE00001756419 9796219 9796307
+ENSE00001707777 9796721 9797032
+ENSE00001796289 5205786 5207005
+ENSE00002291273 5205788 5206981
+ENSE00001772277 27869163 27869328
+ENSE00001708810 27869664 27869857
+ENSE00001618426 27870239 27870333
+ENSE00001635775 28354223 28354295
+ENSE00001597436 28344478 28344604
+ENSE00001731360 24585087 24585287
+ENSE00001688179 24585622 24585931
+ENSE00001696656 24592479 24592619
+ENSE00001742686 24602582 24602804
+ENSE00001695575 24603113 24603198
+ENSE00001748239 24610377 24610495
+ENSE00001686372 24610926 24611279
+ENSE00001757378 24624912 24625001
+ENSE00001738135 24626468 24626803
+ENSE00001662296 24628604 24628689
+ENSE00001781418 24628810 24629006
+ENSE00001735320 24629551 24629668
+ENSE00001802546 24630184 24630861
+ENSE00001607267 24626684 24626803
+ENSE00001677719 24630184 24630448
+ENSE00001719566 24631498 24631739
+ENSE00001779056 6174083 6174412
+ENSE00001754002 6175464 6175541
+ENSE00001765437 6175737 6175852
+ENSE00001739366 6175939 6175961
+ENSE00001795992 5661341 5661778
+ENSE00001739581 9925635 9925740
+ENSE00001645033 9926775 9927195
+ENSE00001700611 26716349 26716509
+ENSE00001798741 26743516 26743591
+ENSE00001724601 26745396 26745503
+ENSE00001655881 26752172 26753172
+ENSE00001742351 16192955 16193049
+ENSE00001606802 16193186 16193329
+ENSE00001592567 16198415 16198480
+ENSE00001620082 15272747 15274125
+ENSE00001793879 15271867 15272027
+ENSE00001621504 15265842 15265935
+ENSE00001741426 15272805 15273460
+ENSE00001721819 15271184 15271380
+ENSE00001760736 20811557 20812165
+ENSE00001706919 28044702 28044800
+ENSE00001668409 28043278 28043483
+ENSE00001658969 28027522 28027656
+ENSE00001806118 28027030 28027177
+ENSE00001766229 28023878 28023983
+ENSE00001720050 28021112 28021230
+ENSE00001705255 28019252 28019325
+ENSE00001599768 28018078 28018149
+ENSE00001723842 19901223 19901364
+ENSE00001600158 19901042 19901070
+ENSE00001782991 19900671 19900841
+ENSE00001648973 19900195 19900290
+ENSE00001803460 27673796 27673933
+ENSE00001656819 27671351 27671435
+ENSE00001682459 27670168 27670257
+ENSE00001653328 27665826 27665884
+ENSE00001804144 27659272 27659320
+ENSE00001709330 27658448 27658543
+ENSE00001598977 14077914 14078105
+ENSE00001609880 14088788 14093745
+ENSE00001671213 14099618 14100561
+ENSE00001674255 14106187 14108092
+ENSE00001623621 9928411 9928816
+ENSE00001799526 23628730 23629049
+ENSE00001753337 23627615 23627697
+ENSE00001695538 23627270 23627386
+ENSE00001794473 59001391 59001635
+ENSE00001695426 6364626 6364798
+ENSE00001695477 6361017 6361112
+ENSE00001643114 26288505 26288642
+ENSE00001795389 26291003 26291087
+ENSE00001629300 26292177 26292270
+ENSE00001702866 26296554 26296612
+ENSE00001660124 26303118 26303166
+ENSE00001619869 26303895 26303990
+ENSE00001716125 17019778 17019948
+ENSE00001626278 27314746 27315988
+ENSE00001758513 9365489 9366020
+ENSE00003678107 9366628 9366705
+ENSE00001645840 9366834 9366945
+ENSE00003600117 9367047 9367192
+ENSE00003511199 9367299 9367380
+ENSE00001900367 9368075 9368285
+ENSE00003531660 9365535 9366020
+ENSE00001716327 9367288 9367380
+ENSE00001604245 9368075 9368284
+ENSE00003623643 9365535 9366020
+ENSE00003481640 9366628 9366705
+ENSE00001885335 9366834 9367380
+ENSE00001840919 9367508 9368291
+ENSE00001860458 9366895 9367192
+ENSE00003665834 9367299 9367380
+ENSE00001947561 9368075 9368276
+ENSE00001665114 9365535 9365621
+ENSE00001608967 9365883 9366020
+ENSE00003550132 9367047 9367192
+ENSE00001799927 20903729 20903872
+ENSE00001728219 7557118 7557589
+ENSE00001660382 7556424 7556497
+ENSE00001592812 7556182 7556292
+ENSE00001628878 7555938 7556083
+ENSE00001744875 7555780 7555844
+ENSE00001601315 14365457 14365582
+ENSE00001626390 14365918 14366162
+ENSE00001803290 20340818 20341146
+ENSE00001772292 23567656 23567881
+ENSE00001707613 25872620 25872955
+ENSE00001679661 25862069 25862848
+ENSE00001746820 25695956 25696297
+ENSE00001781854 25696762 25696800
+ENSE00001650660 25697030 25697104
+ENSE00001695376 25697200 25697339
+ENSE00001765364 25697457 25697532
+ENSE00001648735 7558552 7558878
+ENSE00001646873 7560201 7560318
+ENSE00001718144 7560405 7560574
+ENSE00001762583 7560673 7560719
+ENSE00001727148 9871697 9871796
+ENSE00001611218 9871164 9871264
+ENSE00001798859 9867998 9868175
+ENSE00001631408 9864119 9864270
+ENSE00001610747 9861966 9862080
+ENSE00001695935 9860979 9861089
+ENSE00001751980 9860804 9860892
+ENSE00001795242 9860098 9860398
+ENSE00002293458 9868005 9868142
+ENSE00001661837 9385717 9386189
+ENSE00001780404 9386797 9386874
+ENSE00001761447 9387003 9387114
+ENSE00001796958 9387216 9387360
+ENSE00001609132 9387456 9387548
+ENSE00001624832 9388269 9388290
+ENSE00001791982 24251897 24252016
+ENSE00001595474 24249032 24249149
+ENSE00001710293 24248252 24248516
+ENSE00001792666 24246961 24247202
+ENSE00001799370 24293431 24293631
+ENSE00001733114 24292787 24293096
+ENSE00001657618 24286097 24286237
+ENSE00001783522 24275906 24276128
+ENSE00001698815 24275512 24275597
+ENSE00001706895 24268215 24268333
+ENSE00001789274 24267431 24267784
+ENSE00001654967 24253699 24253788
+ENSE00001692849 24251897 24252232
+ENSE00001679112 24250011 24250096
+ENSE00001694527 24249694 24249890
+ENSE00001769353 24247839 24248516
+ENSE00001731828 23823812 23824028
+ENSE00001720641 23826649 23826863
+ENSE00001658227 23828985 23829162
+ENSE00001797496 23831970 23832096
+ENSE00001739891 23834456 23835359
+ENSE00001634813 23835770 23835894
+ENSE00001793150 7672965 7673120
+ENSE00001702273 7677902 7678724
+ENSE00001639020 6111595 6111651
+ENSE00003553084 6111336 6111481
+ENSE00001663564 6110487 6110795
+ENSE00001791643 6026843 6027306
+ENSE00001802023 23598954 23599131
+ENSE00001693799 23596577 23596729
+ENSE00001692050 23594470 23594584
+ENSE00001799280 23593473 23593583
+ENSE00001673716 23593292 23593380
+ENSE00001673218 23592583 23592883
+ENSE00001735464 25917379 25917476
+ENSE00001733578 25918895 25919096
+ENSE00001790322 25929787 25929855
+ENSE00001797275 25929940 25929970
+ENSE00001737717 25930761 25930884
+ENSE00001693294 25934719 25934853
+ENSE00001707543 25935198 25935343
+ENSE00001761870 25938390 25938496
+ENSE00001624501 25941145 25941263
+ENSE00001635783 25943050 25943123
+ENSE00001736833 25944226 25944297
+ENSE00001766910 23292756 23293067
+ENSE00001794575 10027986 10029907
+ENSE00001739223 26223946 26224167
+ENSE00001657411 26225770 26225959
+ENSE00001652940 28089415 28089750
+ENSE00001752327 28099520 28100299
+ENSE00001634021 20063469 20063633
+ENSE00001709795 20063942 20065380
+ENSE00001734007 6311475 6311676
+ENSE00001620336 6315014 6315118
+ENSE00001679045 20108912 20109132
+ENSE00001662816 20107081 20107289
+ENSE00001772499 2657868 2658369
+ENSE00001762435 6768794 6768889
+ENSE00001777650 6769188 6769413
+ENSE00001788545 9233622 9233636
+ENSE00001744623 9233316 9233425
+ENSE00001769532 9231415 9231500
+ENSE00001679232 9225731 9227859
+ENSE00001651045 9233316 9233417
+ENSE00003512039 9233084 9233229
+ENSE00001780836 9231415 9231573
+ENSE00001616192 9227800 9227859
+ENSE00001618439 27712010 27712180
+ENSE00001613903 27710269 27711710
+ENSE00001697534 24548322 24548715
+ENSE00001718987 24547249 24547333
+ENSE00001618280 24546939 24547004
+ENSE00001681425 24546709 24546852
+ENSE00001658411 24546550 24546591
+ENSE00001797328 28780670 28780799
+ENSE00001638296 28779492 28779578
+ENSE00001681574 28776794 28776896
+ENSE00001741452 28774418 28774584
+ENSE00001725096 28774090 28774169
+ENSE00001670663 28769733 28769813
+ENSE00001752207 28768659 28768755
+ENSE00001687652 28767920 28768042
+ENSE00001747631 28761441 28761678
+ENSE00001702742 28760301 28761109
+ENSE00001647502 28757860 28757954
+ENSE00001744948 28747038 28747169
+ENSE00001663098 28740998 28741192
+ENSE00001745034 24086159 24086490
+ENSE00001592977 24086970 24087008
+ENSE00001710578 24087207 24087316
+ENSE00001639465 24087404 24087547
+ENSE00001714793 24087665 24087740
+ENSE00001767839 22148951 22149738
+ENSE00001716848 22148461 22148590
+ENSE00001766151 20670463 20670954
+ENSE00001747238 7543952 7544065
+ENSE00001683875 7543407 7543512
+ENSE00001737985 7540669 7540779
+ENSE00001745774 7540491 7540579
+ENSE00001733068 7539784 7540083
+ENSE00001290990 9324922 9325425
+ENSE00001663813 9326033 9326110
+ENSE00003569983 9326239 9326350
+ENSE00003622104 9326452 9326597
+ENSE00003482640 9326704 9326785
+ENSE00003512313 9327480 9327689
+ENSE00001696304 9324922 9325008
+ENSE00001597607 9325288 9325425
+ENSE00003468775 9326239 9326350
+ENSE00003536218 9326452 9326597
+ENSE00003492729 9326704 9326785
+ENSE00003690238 9327480 9327689
+ENSE00001761519 21010150 21010221
+ENSE00001788634 21011415 21011615
+ENSE00001681234 21013835 21013899
+ENSE00001682885 21013987 21014016
+ENSE00001673336 21018235 21018371
+ENSE00001695625 21018720 21018835
+ENSE00001609688 21021946 21022052
+ENSE00001655456 21024221 21024338
+ENSE00001692470 21026120 21026187
+ENSE00001762760 21027300 21027399
+ENSE00001659235 21029026 21029166
+ENSE00001761647 20987993 20988213
+ENSE00001593456 20988983 20989106
+ENSE00001801031 20989185 20989278
+ENSE00001625027 20991185 20991373
+ENSE00001685429 9042006 9042220
+ENSE00001591638 9039003 9039209
+ENSE00001627602 9036701 9036890
+ENSE00001792305 9031201 9031324
+ENSE00001757037 9027797 9028681
+ENSE00001684075 9027238 9027361
+ENSE00001723160 19737527 19737712
+ENSE00001611031 19736501 19736641
+ENSE00001786310 19736203 19736318
+ENSE00001774959 19735972 19736125
+ENSE00001686516 19735744 19735868
+ENSE00001719847 19735027 19735099
+ENSE00001637326 19734685 19734776
+ENSE00001667900 19733897 19733995
+ENSE00001663539 19733340 19733419
+ENSE00001684864 19733139 19733262
+ENSE00001801245 6134634 6135110
+ENSE00001673942 6135717 6135794
+ENSE00001616659 6135923 6136033
+ENSE00001720897 6136133 6136275
+ENSE00001765257 6136383 6136471
+ENSE00001690000 6137170 6137316
+ENSE00001716389 9748407 9748463
+ENSE00001712560 9748577 9748722
+ENSE00001616878 9749263 9749571
+ENSE00001763631 27632787 27633012
+ENSE00001727939 27633257 27633469
+ENSE00003523445 26360936 26360978
+ENSE00003487606 26360671 26360749
+ENSE00003671977 26359202 26359551
+ENSE00003601792 26358846 26358903
+ENSE00003665075 26357415 26357489
+ENSE00003544401 26356906 26356994
+ENSE00003690828 26356462 26356650
+ENSE00003689528 26356280 26356372
+ENSE00003469618 26356114 26356197
+ENSE00002089038 26356197 26356279
+ENSE00001747437 7938643 7938764
+ENSE00001676711 7936937 7938159
+ENSE00001663038 20790859 20790963
+ENSE00001745094 20788906 20789042
+ENSE00001639312 20783952 20784123
+ENSE00001691121 20771224 20771330
+ENSE00001731541 20769344 20769471
+ENSE00001801724 20767634 20767707
+ENSE00001637732 20765910 20766045
+ENSE00001620479 20764913 20765042
+ENSE00001596214 20764267 20764378
+ENSE00001632411 20761593 20762219
+ENSE00001598287 20761106 20761217
+ENSE00003465401 20750492 20750592
+ENSE00003603784 20749298 20749434
+ENSE00001607039 20745796 20745946
+ENSE00001725327 20744540 20744606
+ENSE00001768509 20743949 20744242
+ENSE00003484855 27629055 27629777
+ENSE00003584721 27632781 27632852
+ENSE00001805078 26796955 26797681
+ENSE00001651627 7859028 7859847
+ENSE00001897973 15863536 15863885
+ENSE00003707476 15902799 15902846
+ENSE00001833379 15970382 15970604
+ENSE00001943337 15980947 15981130
+ENSE00001869013 15983562 15983678
+ENSE00001819899 15999138 15999343
+ENSE00001871038 16014754 16014848
+ENSE00001853509 16017561 16017731
+ENSE00001855011 16018706 16018926
+ENSE00001869938 16020052 16020193
+ENSE00001858305 16027646 16027704
+ENSE00001665626 15863673 15863885
+ENSE00001700678 15864974 15865039
+ENSE00003705410 15866520 15866669
+ENSE00001708493 15966324 15966457
+ENSE00001755301 15970397 15970474
+ENSE00001834814 15864978 15865039
+ENSE00001810485 15970397 15970526
+ENSE00001921850 15971548 15971701
+ENSE00001942280 15980947 15981008
+ENSE00001957709 15983513 15983586
+ENSE00001593220 20835919 20835930
+ENSE00001799029 20836103 20836201
+ENSE00001736209 20837256 20837458
+ENSE00001597754 20840328 20840395
+ENSE00001645084 20840484 20840514
+ENSE00001630093 20853209 20853313
+ENSE00001777109 20855130 20855266
+ENSE00001788874 20860049 20860220
+ENSE00001795527 20872844 20872950
+ENSE00001784531 20874704 20874802
+ENSE00001748450 20876467 20876540
+ENSE00001604424 20878129 20878264
+ENSE00001711949 20879132 20879261
+ENSE00001747846 20879796 20879907
+ENSE00001683758 20881955 20882582
+ENSE00001729014 20882957 20883068
+ENSE00003703880 20893583 20893683
+ENSE00003706916 20894741 20894877
+ENSE00001642045 20898229 20898379
+ENSE00001673682 20899569 20899635
+ENSE00001602445 20899933 20899975
+ENSE00001773752 20891768 20891909
+ENSE00003709828 20894045 20894158
+ENSE00003684405 20899941 20901083
+ENSE00002305110 20893577 20893683
+ENSE00003584068 20894741 20894877
+ENSE00003459679 20899941 20901083
+ENSE00001731860 24064067 24064189
+ENSE00001635505 24062093 24062205
+ENSE00001802644 24061513 24061616
+ENSE00003593123 24056248 24056400
+ENSE00003642844 24054123 24054237
+ENSE00003666042 24053057 24053167
+ENSE00003674429 24052505 24052615
+ENSE00003502244 24051960 24052070
+ENSE00003616658 24050928 24051038
+ENSE00003603776 24050753 24050841
+ENSE00003548786 24049765 24050341
+ENSE00001683783 24064067 24064214
+ENSE00001619375 24058316 24058493
+ENSE00003468556 24056248 24056400
+ENSE00003528506 24054123 24054237
+ENSE00003480174 24053057 24053167
+ENSE00003577506 24052505 24052615
+ENSE00003482521 24051960 24052070
+ENSE00003559850 24050928 24051038
+ENSE00003547647 24050753 24050841
+ENSE00003613569 24049765 24050341
+ENSE00001794358 24064067 24064174
+ENSE00003465760 24026223 24026799
+ENSE00002228703 24064067 24064127
+ENSE00002309932 24049997 24050341
+ENSE00001643289 23671964 23672356
+ENSE00001615364 23670909 23670974
+ENSE00001713521 23670582 23670645
+ENSE00001786811 23670350 23670495
+ENSE00001765379 23670185 23670232
+ENSE00001908333 26357107 26357382
+ENSE00001666674 26329421 26329652
+ENSE00001796018 26328965 26329169
+ENSE00001784489 26325463 26325515
+ENSE00001881074 23673224 23673371
+ENSE00003564737 23675233 23675345
+ENSE00003500350 23675820 23675923
+ENSE00001799218 23678943 23679120
+ENSE00003552000 23681036 23681188
+ENSE00003596670 23683199 23683313
+ENSE00003464309 23684269 23684379
+ENSE00003633721 23684821 23684931
+ENSE00003674912 23685366 23685476
+ENSE00003567291 23686398 23686508
+ENSE00003679647 23686596 23686684
+ENSE00003617149 23687096 23687672
+ENSE00001661847 23673249 23673371
+ENSE00003675456 23681036 23681188
+ENSE00003499970 23683199 23683313
+ENSE00003586486 23684269 23684379
+ENSE00003643888 23684821 23684931
+ENSE00003501027 23685366 23685476
+ENSE00003669209 23686398 23686508
+ENSE00003536838 23686596 23686684
+ENSE00003662366 23687096 23687672
+ENSE00001659203 6175751 6175852
+ENSE00001709097 6175939 6176078
+ENSE00001731355 6177541 6177628
+ENSE00001752741 24455006 24455117
+ENSE00001595873 24457007 24457119
+ENSE00002550304 24457542 24457645
+ENSE00001190571 24460640 24462352
+ENSE00001792479 24457011 24457119
+ENSE00001683633 24460640 24460817
+ENSE00001618747 24462742 24462894
+ENSE00001598516 24464930 24465044
+ENSE00001800192 24466510 24466611
+ENSE00001753847 24466977 24467087
+ENSE00001642884 24467168 24467256
+ENSE00001598101 24467671 24467972
+ENSE00001629733 9873804 9874161
+ENSE00001690028 9875508 9875576
+ENSE00001791094 9875672 9875817
+ENSE00001700224 9875928 9875984
+ENSE00001859492 14394177 14394465
+ENSE00001686036 25163210 25163968
+ENSE00002094213 26332656 26333378
+ENSE00002123033 26329581 26329652
+ENSE00001780671 7781463 7782131
+ENSE00001751822 24040526 24040648
+ENSE00001612685 24038552 24038664
+ENSE00001763210 24037972 24038075
+ENSE00003654373 24032707 24032859
+ENSE00003581142 24030582 24030696
+ENSE00003526911 24029516 24029626
+ENSE00003486158 24028964 24029074
+ENSE00003580424 24028419 24028529
+ENSE00003605671 24027387 24027497
+ENSE00003601536 24027211 24027299
+ENSE00003624241 24026223 24026799
+ENSE00001636167 24040526 24040673
+ENSE00001665354 24034775 24034952
+ENSE00003494048 24032707 24032859
+ENSE00003647885 24030582 24030696
+ENSE00003646572 24029516 24029626
+ENSE00003489479 24028964 24029074
+ENSE00003670003 24028419 24028529
+ENSE00003588956 24027387 24027497
+ENSE00003484557 24027211 24027299
+ENSE00003612403 24026223 24026799
+ENSE00001744906 24038552 24038660
+ENSE00002215963 24040526 24040586
+ENSE00002316782 24026455 24026799
+ENSE00001592980 20290496 20290830
+ENSE00001733820 20298015 20298506
+ENSE00002317106 20297335 20298306
+ENSE00002237968 20298308 20298913
+ENSE00002019278 20952595 20952937
+ENSE00002070232 26424828 26425034
+ENSE00002079738 26429209 26429377
+ENSE00002059695 26437416 26437493
+ENSE00002051610 20941185 20941313
+ENSE00002083133 20938702 20939447
+ENSE00002028839 20938165 20938289
+ENSE00002069932 20817932 20818076
+ENSE00002063868 20817406 20817491
+ENSE00002057796 20655961 20656181
+ENSE00002048409 20655067 20655191
+ENSE00002073771 20654896 20654988
+ENSE00002045000 20652801 20652991
+ENSE00002068415 20548860 20548916
+ENSE00002086946 20549222 20549330
+ENSE00002062200 20549411 20549520
+ENSE00002049899 20550014 20551037
+ENSE00002077940 6113013 6113324
+ENSE00001694957 6111594 6111684
+ENSE00003675853 6111336 6111481
+ENSE00002079014 19888946 19889280
+ENSE00002054858 19880996 19881760
+ENSE00001426978 19881469 19882440
+ENSE00001427537 19880862 19881467
+ENSE00002086052 20702866 20702995
+ENSE00002030784 20704732 20705478
+ENSE00002046435 20705890 20706014
+ENSE00002021560 6111568 6111670
+ENSE00003683409 6111336 6111481
+ENSE00002051027 6109809 6109902
+ENSE00002079391 20653626 20653703
+ENSE00002055245 20662506 20662937
+ENSE00002063141 20691526 20691723
+ENSE00002077751 20692374 20692482
+ENSE00002066035 20708919 20709584
+ENSE00002036613 24041541 24041925
+ENSE00002083771 24042923 24043007
+ENSE00002040343 24043252 24043330
+ENSE00002082814 24043416 24043547
+ENSE00002059714 24043665 24043706
+ENSE00002088173 10037764 10037915
+ENSE00002088234 9928019 9928137
+ENSE00002088264 28393531 28393668
+ENSE00002088309 2652790 2652894
+ENSE00002088347 5742287 5742379
+ENSE00002088385 26092765 26092918
+ENSE00002088393 25569246 25569383
+ENSE00002088421 20508009 20508124
+ENSE00002088434 10033981 10034093
+ENSE00002088438 20995615 20995776
+ENSE00002088464 7209572 7209683
+ENSE00002088480 19671654 19671769
+ENSE00002088506 19734941 19735035
+ENSE00002088527 5441969 5442060
+ENSE00002088623 7246713 7246820
+ENSE00002088634 20278734 20278893
+ENSE00002088641 18360815 18360921
+ENSE00002088677 20648397 20648558
+ENSE00002088757 9930484 9930602
+ENSE00002088783 19669744 19669856
+ENSE00002088791 18448163 18448269
+ENSE00002088894 27869489 27869642
+ENSE00002088907 20444739 20444833
+ENSE00002088936 4887117 4887307
+ENSE00002088939 27606157 27606239
+ENSE00002088940 7291095 7291199
+ENSE00002088981 19900883 19901042
+ENSE00002089054 14482123 14482230
+ENSE00002089093 28075126 28075289
+ENSE00002089101 7192338 7192636
+ENSE00002089132 18174646 18174709
+ENSE00002089135 20509921 20510033
+ENSE00002089149 25887089 25887252
+ENSE00002089157 18250128 18250259
+ENSE00002089162 15779840 15779936
+ENSE00002089234 21180869 21180973
+ENSE00002089323 13340551 13340633
+ENSE00002089368 4043026 4043131
+ENSE00002089416 28507136 28507239
+ENSE00002198122 23206485 23206610
+ENSE00002174318 23204155 23204255
+ENSE00002153935 23200175 23200930
+ENSE00002532814 13462594 13463857
+ENSE00002532407 13488005 13489271
+ENSE00001685809 9304564 9305095
+ENSE00001706012 9305703 9305780
+ENSE00001701882 9305909 9306020
+ENSE00003684179 9306122 9306267
+ENSE00003459001 9306374 9306455
+ENSE00003593931 9307148 9307357
+ENSE00002289488 9236076 9236312
+ENSE00002299255 9304829 9305095
+ENSE00002495525 13477233 13478499
+ENSE00002516975 13470597 13471863
+ENSE00002541336 24476599 24477171
+ENSE00002560807 24477716 24478647
+ENSE00002607126 21853827 21856492
+ENSE00002699100 13340359 13340440
+ENSE00002716056 13947443 13947512
+ENSE00002692672 18398127 18398238
+ENSE00002709597 16364065 16364171
+ENSE00002923801 21760074 21760643
+ENSE00002984896 25847479 25847492
+ENSE00003022139 25850384 25850398
+ENSE00003135789 25850559 25850592
+ENSE00003159256 21738045 21738068
+ENSE00003053921 21737895 21737969
+ENSE00003090195 15418467 15418484
+ENSE00003068265 15418626 15418711
+ENSE00003133967 15428905 15428910
+ENSE00003158951 15429021 15429063
+ENSE00003052018 15429068 15429181
+ENSE00003105168 28114876 28114889
+ENSE00002994181 28111970 28111984
+ENSE00003035759 28111776 28111809
+ENSE00003089533 5312577 5312605
+ENSE00003100810 5306691 5306703
+ENSE00001782073 9345205 9345423
+ENSE00001705189 9345442 9345707
+ENSE00001641329 9346315 9346392
+ENSE00001803456 9346521 9346632
+ENSE00001739988 9346734 9346879
+ENSE00001717366 9346986 9347067
+ENSE00001761223 9347762 9347784
+ENSE00003671391 13551375 13551415
+ENSE00003551235 13551547 13552752
+ENSE00003639402 13491303 13491471
+ENSE00003668884 13491863 13493369
+ENSE00003649651 27809047 27809373
+ENSE00003691087 13263395 13263563
+ENSE00003621115 23793569 23793901
+ENSE00003657297 13263065 13263272
+ENSE00003625313 13262741 13262939
+ENSE00003638565 13629856 13629913
+ENSE00003555471 13629403 13629733
+ENSE00003625865 19576759 19577094
+ENSE00003697854 27605054 27605329
diff --git a/inst/chrY/ens_gene.txt b/inst/chrY/ens_gene.txt
new file mode 100644
index 0000000..af0be16
--- /dev/null
+++ b/inst/chrY/ens_gene.txt
@@ -0,0 +1,496 @@
+gene_id gene_name entrezid gene_biotype gene_seq_start gene_seq_end seq_name seq_strand seq_coord_system
+ENSG00000012817 KDM5D 8284 protein_coding 21865751 21906825 Y -1 chromosome
+ENSG00000067048 DDX3Y 8653 protein_coding 15016019 15032390 Y 1 chromosome
+ENSG00000067646 ZFY 7544 protein_coding 2803112 2850547 Y 1 chromosome
+ENSG00000092377 TBL1Y 90665 protein_coding 6778727 6959724 Y 1 chromosome
+ENSG00000099715 PCDH11Y 83259;27328 protein_coding 4868267 5610265 Y 1 chromosome
+ENSG00000099721 AMELY 266 protein_coding 6733959 6742068 Y -1 chromosome
+ENSG00000099725 PRKY 5616 pseudogene 7142013 7249589 Y 1 chromosome
+ENSG00000114374 USP9Y 8287 protein_coding 14813160 14972764 Y 1 chromosome
+ENSG00000129816 TTTY1B 50858;100101116 lincRNA 6258472 6279605 Y 1 chromosome
+ENSG00000129824 RPS4Y1 6192 protein_coding 2709527 2800041 Y 1 chromosome
+ENSG00000129845 TTTY1 50858;100101116 lincRNA 9590765 9611898 Y -1 chromosome
+ENSG00000129862 VCY1B 9084;353513 protein_coding 16168097 16168838 Y 1 chromosome
+ENSG00000129864 VCY 9084;353513 protein_coding 16097652 16098393 Y -1 chromosome
+ENSG00000129873 CDY2B 9426;203611 protein_coding 19989290 19992100 Y -1 chromosome
+ENSG00000131002 TXLNG2P 246126 pseudogene 21729199 21768160 Y 1 chromosome
+ENSG00000131007 TTTY9B 83864;425057 antisense 20743092 20752407 Y -1 chromosome
+ENSG00000131538 TTTY6 84672;441543 lincRNA 24585740 24587605 Y -1 chromosome
+ENSG00000131548 TTTY6B 84672;441543 antisense 24291113 24292978 Y 1 chromosome
+ENSG00000147753 TTTY7 246122 lincRNA 6317509 6325947 Y 1 chromosome
+ENSG00000147761 TTTY7B 100101120 lincRNA 9544433 9552871 Y -1 chromosome
+ENSG00000154620 TMSB4Y 9087 protein_coding 15815447 15817904 Y 1 chromosome
+ENSG00000157828 RPS4Y2 140032 protein_coding 22918050 22942918 Y 1 chromosome
+ENSG00000165246 NLGN4Y 22829 protein_coding 16634518 16957530 Y 1 chromosome
+ENSG00000168757 TSPY2 64591 protein_coding 6114264 6117060 Y 1 chromosome
+ENSG00000169763 PRYP3 pseudogene 25827587 25840726 Y -1 chromosome
+ENSG00000169789 PRY 442862;100509646;9081 protein_coding 24636544 24660784 Y 1 chromosome
+ENSG00000169800 RBMY1F 159163;378951 protein_coding 24314689 24329129 Y -1 chromosome
+ENSG00000169807 PRY2 442862;100509646;9081 protein_coding 24217903 24242154 Y -1 chromosome
+ENSG00000169811 RBMY1HP pseudogene 23655405 23663854 Y 1 chromosome
+ENSG00000169849 TSPY14P pseudogene 23630044 23632569 Y 1 chromosome
+ENSG00000169953 HSFY2 159119;86614 protein_coding 20893326 20990548 Y -1 chromosome
+ENSG00000172283 PRYP4 pseudogene 28121660 28134216 Y 1 chromosome
+ENSG00000172288 CDY1 9085;253175 protein_coding 27768264 27771049 Y 1 chromosome
+ENSG00000172294 CSPG4P4Y 114758 pseudogene 27624416 27632852 Y 1 chromosome
+ENSG00000172297 GOLGA2P3Y 401634;84559 pseudogene 27600708 27606719 Y 1 chromosome
+ENSG00000172332 AC012005.2 401634;84559 pseudogene 26355714 26361728 Y -1 chromosome
+ENSG00000172342 CSPG4P3Y pseudogene 26332656 26338017 Y -1 chromosome
+ENSG00000172352 CDY1B 9085;253175 protein_coding 26191376 26194166 Y -1 chromosome
+ENSG00000172468 HSFY1 159119;86614 protein_coding 20708557 20750849 Y 1 chromosome
+ENSG00000173357 AC007967.3 pseudogene 8774265 8784114 Y 1 chromosome
+ENSG00000176679 TGIF2LY 90655 protein_coding 3447082 3448082 Y 1 chromosome
+ENSG00000176728 TTTY14 55410;83869 lincRNA 21034387 21239302 Y -1 chromosome
+ENSG00000180910 TTTY11 83866 lincRNA 8651351 8685423 Y -1 chromosome
+ENSG00000182415 CDY2A 9085;9426;203611;253175 protein_coding 20137667 20140477 Y 1 chromosome
+ENSG00000183146 CYorf17 100533178 protein_coding 23544840 23548246 Y -1 chromosome
+ENSG00000183385 TTTY8 84673;100101118 lincRNA 9528709 9531566 Y 1 chromosome
+ENSG00000183704 SLC9B1P1 protein_coding 13496241 13524717 Y -1 chromosome
+ENSG00000183753 BPY2 442867;9083;442868 protein_coding 25119966 25151612 Y 1 chromosome
+ENSG00000183795 BPY2B 442867;9083;442868 protein_coding 26753707 26785354 Y 1 chromosome
+ENSG00000183878 UTY 7404 protein_coding 15360259 15592553 Y -1 chromosome
+ENSG00000184895 SRY 6736 protein_coding 2654896 2655740 Y -1 chromosome
+ENSG00000184991 TTTY13 83868 lincRNA 23745486 23756552 Y -1 chromosome
+ENSG00000185275 CD24P4 pseudogene 21154353 21154595 Y -1 chromosome
+ENSG00000185700 TTTY8B 84673;100101118 lincRNA 6338814 6341671 Y -1 chromosome
+ENSG00000185894 BPY2C 442867;9083;442868 protein_coding 27177048 27208695 Y -1 chromosome
+ENSG00000187191 DAZ3 57054;57055 protein_coding 26909216 26959626 Y -1 chromosome
+ENSG00000187657 TSPY13P pseudogene 9743204 9745748 Y -1 chromosome
+ENSG00000188120 DAZ1 57135;1617 protein_coding 25275502 25345241 Y -1 chromosome
+ENSG00000188399 ANKRD36P1 pseudogene 28555962 28566682 Y 1 chromosome
+ENSG00000188656 TSPY7P pseudogene 9215731 9218480 Y 1 chromosome
+ENSG00000197038 RBMY1A3P 286557 pseudogene 9148739 9160483 Y -1 chromosome
+ENSG00000197092 GOLGA6L16P pseudogene 27641798 27648105 Y -1 chromosome
+ENSG00000198692 EIF1AY 9086;101060318 protein_coding 22737611 22755040 Y 1 chromosome
+ENSG00000205916 DAZ4 57055;57135;1617 protein_coding 26980008 27053183 Y 1 chromosome
+ENSG00000205936 PPP1R12BP2 pseudogene 25525631 25538844 Y 1 chromosome
+ENSG00000205944 DAZ2 57055;57135;1617 protein_coding 25365594 25437503 Y 1 chromosome
+ENSG00000206159 GYG2P1 352887 pseudogene 14475147 14532255 Y -1 chromosome
+ENSG00000212855 TTTY2 lincRNA 9578193 9596085 Y 1 chromosome
+ENSG00000212856 TTTY2B lincRNA 6274285 6292186 Y -1 chromosome
+ENSG00000214207 KRT18P10 pseudogene 5441186 5442472 Y 1 chromosome
+ENSG00000215414 PSMA6P1 5687 pseudogene 15398518 15399258 Y 1 chromosome
+ENSG00000215506 TPTE2P4 pseudogene 28654360 28725837 Y 1 chromosome
+ENSG00000215507 RBMY2DP pseudogene 28269821 28279455 Y 1 chromosome
+ENSG00000215537 ZNF736P11Y pseudogene 25012691 25013903 Y -1 chromosome
+ENSG00000215540 AC009947.3 pseudogene 25683455 25693089 Y -1 chromosome
+ENSG00000215560 TTTY5 83863 lincRNA 24442945 24445023 Y -1 chromosome
+ENSG00000215580 BCORP1 286554 pseudogene 21617317 21665039 Y -1 chromosome
+ENSG00000215583 ASS1P6 pseudogene 14043242 14044475 Y -1 chromosome
+ENSG00000215601 TSPY24P pseudogene 8148239 8150250 Y 1 chromosome
+ENSG00000215603 ZNF92P1Y pseudogene 7721727 7722917 Y -1 chromosome
+ENSG00000216777 PRRC2CP1 pseudogene 3417693 3417851 Y -1 chromosome
+ENSG00000216824 ZNF736P10Y pseudogene 8286832 8287941 Y -1 chromosome
+ENSG00000216844 AC009494.3 pseudogene 23005142 23005596 Y -1 chromosome
+ENSG00000217179 MTCYBP2 pseudogene 21033988 21034158 Y 1 chromosome
+ENSG00000217896 ZNF839P1 pseudogene 21147061 21148284 Y 1 chromosome
+ENSG00000218410 AC012078.2 pseudogene 3719265 3720910 Y -1 chromosome
+ENSG00000223362 CDY15P pseudogene 26001669 26003238 Y -1 chromosome
+ENSG00000223406 XKRYP5 pseudogene 27898535 27899017 Y 1 chromosome
+ENSG00000223407 USP9YP18 pseudogene 25805613 25813969 Y -1 chromosome
+ENSG00000223422 AC007274.2 pseudogene 7546940 7549069 Y 1 chromosome
+ENSG00000223517 AC010723.1 lincRNA 16388092 16389369 Y -1 chromosome
+ENSG00000223555 USP9YP23 pseudogene 19894090 19896695 Y 1 chromosome
+ENSG00000223600 EEF1A1P41 pseudogene 2863108 2863314 Y -1 chromosome
+ENSG00000223636 UBE2Q2P5Y pseudogene 27577303 27583462 Y 1 chromosome
+ENSG00000223637 RBMY2EP 159125 pseudogene 23556877 23563471 Y -1 chromosome
+ENSG00000223641 TTTY17C 474152;474151;252949 lincRNA 27329790 27330920 Y -1 chromosome
+ENSG00000223655 RAB9AP3 pseudogene 28064470 28065042 Y -1 chromosome
+ENSG00000223698 GOLGA6L11P pseudogene 26314334 26320641 Y 1 chromosome
+ENSG00000223744 RBMY2GP pseudogene 6196093 6211364 Y 1 chromosome
+ENSG00000223856 RAB9AP2 pseudogene 27856055 27859704 Y -1 chromosome
+ENSG00000223915 DPPA2P1 pseudogene 15342765 15343320 Y 1 chromosome
+ENSG00000223955 MTND6P1 pseudogene 8231577 8232000 Y -1 chromosome
+ENSG00000223978 ZNF736P1Y pseudogene 27131348 27132623 Y -1 chromosome
+ENSG00000224033 CDY8P pseudogene 20134179 20135788 Y 1 chromosome
+ENSG00000224035 SFPQP1 pseudogene 15195704 15207719 Y 1 chromosome
+ENSG00000224060 ARSEP1 pseudogene 14460540 14468226 Y 1 chromosome
+ENSG00000224075 TTTY22 252954 lincRNA 9638762 9650854 Y 1 chromosome
+ENSG00000224151 USP9YP28 pseudogene 20994888 21001199 Y -1 chromosome
+ENSG00000224166 PRYP1 pseudogene 19841408 19856476 Y -1 chromosome
+ENSG00000224169 HSFY6P pseudogene 25803986 25805205 Y -1 chromosome
+ENSG00000224210 TRIM60P5Y pseudogene 26640856 26641730 Y -1 chromosome
+ENSG00000224240 CYCSP49 pseudogene 28695572 28695890 Y 1 chromosome
+ENSG00000224336 FAM197Y1 protein_coding 9374241 9384693 Y -1 chromosome
+ENSG00000224408 USP9YP22 pseudogene 9021303 9023412 Y 1 chromosome
+ENSG00000224482 HSFY4P pseudogene 22970662 22973695 Y -1 chromosome
+ENSG00000224485 USP9YP7 pseudogene 20020635 20022686 Y 1 chromosome
+ENSG00000224518 AC006989.2 pseudogene 17053626 17054595 Y 1 chromosome
+ENSG00000224571 USP9YP13 pseudogene 26066052 26067657 Y 1 chromosome
+ENSG00000224634 ZNF736P6Y pseudogene 8218972 8220184 Y 1 chromosome
+ENSG00000224657 RBMY2BP pseudogene 24795392 24805025 Y 1 chromosome
+ENSG00000224827 LINC00265-2P pseudogene 26422392 26423173 Y 1 chromosome
+ENSG00000224866 USP9YP25 pseudogene 25880340 25882396 Y 1 chromosome
+ENSG00000224873 CDY13P pseudogene 24666828 24667842 Y -1 chromosome
+ENSG00000224917 AC016694.2 pseudogene 24816763 24819006 Y -1 chromosome
+ENSG00000224953 SRIP3 pseudogene 6587003 6587221 Y -1 chromosome
+ENSG00000224964 TRAPPC2P3 pseudogene 19909863 19912576 Y -1 chromosome
+ENSG00000224989 FAM41AY1 340618;100302526 lincRNA 19612838 19626898 Y 1 chromosome
+ENSG00000225117 ARSDP1 pseudogene 14474827 14499123 Y 1 chromosome
+ENSG00000225189 REREP1Y pseudogene 25608612 25618427 Y 1 chromosome
+ENSG00000225256 TRAPPC2P5 pseudogene 27845598 27848709 Y 1 chromosome
+ENSG00000225287 OFD1P13Y pseudogene 27821207 27843493 Y -1 chromosome
+ENSG00000225326 USP9YP19 pseudogene 27894743 27896353 Y -1 chromosome
+ENSG00000225466 OFD1P10Y pseudogene 25727742 25760787 Y 1 chromosome
+ENSG00000225491 UBE2Q2P4Y pseudogene 26378973 26385131 Y -1 chromosome
+ENSG00000225516 AC006156.1 protein_coding 9293012 9344176 Y -1 chromosome
+ENSG00000225520 TTTY16 252948 lincRNA 7567398 7569288 Y -1 chromosome
+ENSG00000225560 FAM197Y8 252946;100287826;100289150 antisense 9185120 9193010 Y -1 chromosome
+ENSG00000225609 CDY20P pseudogene 27764791 27766400 Y 1 chromosome
+ENSG00000225615 RBMY2UP pseudogene 24344576 24349859 Y 1 chromosome
+ENSG00000225624 TBL1YP1 pseudogene 22889140 22904786 Y 1 chromosome
+ENSG00000225653 RNF19BPY pseudogene 3646038 3647587 Y -1 chromosome
+ENSG00000225685 TSPY5P pseudogene 9904163 9906760 Y -1 chromosome
+ENSG00000225716 TCEB1P13 pseudogene 20602683 20603011 Y 1 chromosome
+ENSG00000225740 TCEB1P6 pseudogene 19949081 19949408 Y -1 chromosome
+ENSG00000225809 RBMY2KP pseudogene 8132490 8145229 Y -1 chromosome
+ENSG00000225840 AC010970.2 pseudogene 10036113 10036711 Y -1 chromosome
+ENSG00000225876 AC024067.1 pseudogene 28390498 28390720 Y -1 chromosome
+ENSG00000225878 SERBP1P2 pseudogene 4669726 4670889 Y -1 chromosome
+ENSG00000225895 TRAPPC2P8 pseudogene 20267200 20269988 Y 1 chromosome
+ENSG00000225896 AC007742.3 pseudogene 19838632 19838960 Y 1 chromosome
+ENSG00000226011 OFD1P1Y pseudogene 19921687 19937209 Y 1 chromosome
+ENSG00000226042 CDY10P pseudogene 23799361 23800927 Y -1 chromosome
+ENSG00000226061 PCMTD1P1 pseudogene 10011462 10011816 Y 1 chromosome
+ENSG00000226092 RBMY2AP pseudogene 24073668 24083297 Y -1 chromosome
+ENSG00000226116 USP9YP6 pseudogene 20024579 20033510 Y -1 chromosome
+ENSG00000226223 TSPY16P pseudogene 9477858 9478325 Y -1 chromosome
+ENSG00000226270 ZNF736P2Y pseudogene 26829804 26831082 Y 1 chromosome
+ENSG00000226353 TAF9P1 pseudogene 19740393 19741282 Y 1 chromosome
+ENSG00000226362 FAM41AY2 340618;100302526 lincRNA 20552880 20566932 Y -1 chromosome
+ENSG00000226369 USP9YP11 pseudogene 26081100 26086248 Y 1 chromosome
+ENSG00000226449 CDY5P pseudogene 19833478 19835057 Y 1 chromosome
+ENSG00000226504 TMEM167AP1 pseudogene 23069230 23069448 Y -1 chromosome
+ENSG00000226529 MTND1P1 pseudogene 8239704 8240071 Y 1 chromosome
+ENSG00000226555 AGKP1 pseudogene 16750976 16752238 Y 1 chromosome
+ENSG00000226611 OFD1P2Y pseudogene 20242567 20258088 Y -1 chromosome
+ENSG00000226863 SHROOM2P1 pseudogene 14649708 14656818 Y 1 chromosome
+ENSG00000226873 CDY14P pseudogene 25820526 25821539 Y 1 chromosome
+ENSG00000226906 TTTY4 474149;474150;114761 lincRNA 25082602 25119431 Y 1 chromosome
+ENSG00000226918 AC010086.1 pseudogene 23473154 23480781 Y 1 chromosome
+ENSG00000226941 RBMY1J 378949;5940;378950;159163;378948;378951 protein_coding 24454970 24564028 Y 1 chromosome
+ENSG00000226975 AC006987.6 pseudogene 10007397 10007923 Y -1 chromosome
+ENSG00000227166 STSP1 pseudogene 17659096 17705211 Y 1 chromosome
+ENSG00000227204 RBMY2JP pseudogene 7995171 8012244 Y 1 chromosome
+ENSG00000227251 TRIM60P9Y pseudogene 25204177 25205333 Y 1 chromosome
+ENSG00000227289 HSFY3P pseudogene 2749724 2751693 Y -1 chromosome
+ENSG00000227439 TTTY17B 474152;474151;252949 lincRNA 26631479 26632610 Y 1 chromosome
+ENSG00000227444 AC007322.5 pseudogene 24065082 24067253 Y 1 chromosome
+ENSG00000227447 XGPY pseudogene 14551845 14619171 Y -1 chromosome
+ENSG00000227494 USP9YP14 pseudogene 20642973 20649286 Y 1 chromosome
+ENSG00000227629 SLC25A15P1 pseudogene 28732789 28737748 Y -1 chromosome
+ENSG00000227633 RBMY2YP pseudogene 27412731 27419678 Y 1 chromosome
+ENSG00000227635 USP9YP21 pseudogene 28079980 28082028 Y -1 chromosome
+ENSG00000227830 TRIM60P3Y pseudogene 8281078 8282095 Y -1 chromosome
+ENSG00000227837 TRIM60P11Y pseudogene 26837938 26839079 Y 1 chromosome
+ENSG00000227867 TCEB1P11 pseudogene 28137388 28137719 Y -1 chromosome
+ENSG00000227871 USP9YP12 pseudogene 26092074 26098440 Y -1 chromosome
+ENSG00000227915 TCEB1P16 pseudogene 25824651 25824982 Y 1 chromosome
+ENSG00000227949 CYCSP46 pseudogene 17053427 17053630 Y -1 chromosome
+ENSG00000227989 AC010141.8 pseudogene 23717738 23719956 Y -1 chromosome
+ENSG00000228193 TCEB1P14 pseudogene 20949636 20949966 Y 1 chromosome
+ENSG00000228207 AC007967.2 pseudogene 8769244 8771317 Y -1 chromosome
+ENSG00000228240 TTTY17A 474152;474151;252949 lincRNA 24997731 24998862 Y 1 chromosome
+ENSG00000228257 AC010141.6 pseudogene 23693725 23695897 Y -1 chromosome
+ENSG00000228296 TTTY4C 474149;474150;114761 lincRNA 27209230 27246039 Y -1 chromosome
+ENSG00000228379 AC010891.2 lincRNA 9650924 9655122 Y 1 chromosome
+ENSG00000228383 FAM197Y7 252946;100287826;100289150 antisense 9205412 9235047 Y -1 chromosome
+ENSG00000228411 CDY4P pseudogene 14913906 14915763 Y -1 chromosome
+ENSG00000228465 TRAPPC2P10 pseudogene 26113711 26116841 Y -1 chromosome
+ENSG00000228518 SURF6P1 pseudogene 19304706 19305532 Y 1 chromosome
+ENSG00000228571 HSFY7P pseudogene 24683194 24683996 Y 1 chromosome
+ENSG00000228578 TUBB1P2 pseudogene 19628716 19630918 Y -1 chromosome
+ENSG00000228764 ZNF885P pseudogene 22551937 22558682 Y 1 chromosome
+ENSG00000228786 LINC00266-4P lincRNA 27524447 27540866 Y -1 chromosome
+ENSG00000228787 NLGN4Y-AS1 100874056 antisense 16905522 16915913 Y -1 chromosome
+ENSG00000228850 CDY12P pseudogene 24210845 24211859 Y 1 chromosome
+ENSG00000228890 TTTY21 252953;100101115 lincRNA 9555262 9558905 Y -1 chromosome
+ENSG00000228927 TSPY3 728137;100289087;7258 protein_coding 9236030 9307357 Y 1 chromosome
+ENSG00000228945 CLUHP2 pseudogene 20442064 20446647 Y 1 chromosome
+ENSG00000229129 ACTG1P2 pseudogene 19868881 19870005 Y -1 chromosome
+ENSG00000229138 CDY6P pseudogene 19993979 19995588 Y -1 chromosome
+ENSG00000229159 TSPY23P pseudogene 24329997 24332168 Y 1 chromosome
+ENSG00000229163 NAP1L1P2 pseudogene 2797042 2799161 Y -1 chromosome
+ENSG00000229208 RBMY2NP pseudogene 9669027 9684305 Y -1 chromosome
+ENSG00000229234 RBMY1KP pseudogene 24355478 24362727 Y 1 chromosome
+ENSG00000229236 TTTY10 246119 lincRNA 22627554 22681114 Y -1 chromosome
+ENSG00000229238 PPP1R12BP1 pseudogene 28424070 28500565 Y -1 chromosome
+ENSG00000229250 USP9YP31 pseudogene 26227851 26236493 Y -1 chromosome
+ENSG00000229302 TAF9P2 pseudogene 20438493 20439381 Y -1 chromosome
+ENSG00000229308 AC010084.1 lincRNA 3904538 3968361 Y 1 chromosome
+ENSG00000229343 CDY22P pseudogene 27959160 27960713 Y 1 chromosome
+ENSG00000229406 OFD1P4Y pseudogene 20615036 20634023 Y -1 chromosome
+ENSG00000229416 USP9YP8 pseudogene 23839132 23843004 Y -1 chromosome
+ENSG00000229465 ACTG1P11 pseudogene 20309770 20310894 Y 1 chromosome
+ENSG00000229518 UBE2V1P3 pseudogene 3734347 3734763 Y -1 chromosome
+ENSG00000229549 TSPY8 728403 protein_coding 9195406 9218479 Y 1 chromosome
+ENSG00000229551 GAPDHP17 pseudogene 23023223 23024223 Y 1 chromosome
+ENSG00000229553 USP9YP17 pseudogene 24195902 24204260 Y -1 chromosome
+ENSG00000229643 LINC00280 lincRNA 6225260 6229454 Y -1 chromosome
+ENSG00000229709 USP9YP36 pseudogene 27725937 27734591 Y 1 chromosome
+ENSG00000229725 AC007322.1 pseudogene 24017478 24019697 Y 1 chromosome
+ENSG00000229745 BPY2DP pseudogene 7801038 7805681 Y -1 chromosome
+ENSG00000229940 TSPY22P pseudogene 24452072 24454098 Y -1 chromosome
+ENSG00000230025 AC007967.4 pseudogene 8839792 8842136 Y -1 chromosome
+ENSG00000230029 CDY11P pseudogene 23858456 23860106 Y 1 chromosome
+ENSG00000230066 FAM197Y3 antisense 9334896 9342768 Y -1 chromosome
+ENSG00000230073 AC009947.2 pseudogene 25669468 25671516 Y 1 chromosome
+ENSG00000230377 TCEB1P7 pseudogene 20694212 20694542 Y -1 chromosome
+ENSG00000230412 TCEB1P12 pseudogene 20230369 20230696 Y 1 chromosome
+ENSG00000230458 GPM6BP1 pseudogene 20740303 20740446 Y 1 chromosome
+ENSG00000230476 OFD1P9Y pseudogene 24727136 24760228 Y -1 chromosome
+ENSG00000230663 FAM224B lincRNA 19664680 19691634 Y -1 chromosome
+ENSG00000230727 RBMY2WP pseudogene 24908998 24915934 Y -1 chromosome
+ENSG00000230814 USP9YP24 pseudogene 24674428 24682786 Y 1 chromosome
+ENSG00000230819 ZNF736P5Y pseudogene 27164646 27165450 Y -1 chromosome
+ENSG00000230854 USP9YP20 pseudogene 28069535 28075973 Y 1 chromosome
+ENSG00000230904 XKRYP2 pseudogene 20973224 20973715 Y -1 chromosome
+ENSG00000230977 AC023274.6 pseudogene 26328624 26329218 Y 1 chromosome
+ENSG00000231026 XKRYP4 pseudogene 26063388 26063870 Y -1 chromosome
+ENSG00000231141 TTTY3 114760;474148 lincRNA 27874637 27879535 Y 1 chromosome
+ENSG00000231159 OFD1P8Y pseudogene 24118460 24151496 Y 1 chromosome
+ENSG00000231311 PRYP2 pseudogene 20323252 20338370 Y 1 chromosome
+ENSG00000231341 VDAC1P6 pseudogene 5075256 5076110 Y -1 chromosome
+ENSG00000231375 CDY17P pseudogene 26196025 26197634 Y -1 chromosome
+ENSG00000231411 AC006987.7 pseudogene 10009199 10010334 Y -1 chromosome
+ENSG00000231423 RAB9AP5 pseudogene 26102716 26106365 Y 1 chromosome
+ENSG00000231436 RBMY3AP pseudogene 9448180 9458885 Y -1 chromosome
+ENSG00000231514 FAM58CP pseudogene 28772667 28773306 Y -1 chromosome
+ENSG00000231535 LINC00278 100873962 lincRNA 2870953 2970313 Y 1 chromosome
+ENSG00000231540 TCEB1P9 pseudogene 25954968 25955206 Y -1 chromosome
+ENSG00000231716 CDY23P pseudogene 28140831 28141844 Y -1 chromosome
+ENSG00000231874 TSPY18P pseudogene 9707273 9708390 Y 1 chromosome
+ENSG00000231988 OFD1P3Y pseudogene 8898635 8908029 Y 1 chromosome
+ENSG00000232003 ZNF736P12Y pseudogene 26646410 26647652 Y -1 chromosome
+ENSG00000232029 TCEB1P15 pseudogene 24214967 24215298 Y 1 chromosome
+ENSG00000232064 USP9YP33 pseudogene 27736470 27738492 Y -1 chromosome
+ENSG00000232195 TOMM22P2 pseudogene 2696023 2696259 Y 1 chromosome
+ENSG00000232205 CDY18P pseudogene 26250254 26252165 Y 1 chromosome
+ENSG00000232226 ARSFP1 pseudogene 14373025 14378737 Y -1 chromosome
+ENSG00000232235 CDY3P pseudogene 9003694 9005311 Y -1 chromosome
+ENSG00000232348 LINC00279 lincRNA 8506335 8512883 Y 1 chromosome
+ENSG00000232419 TTTY19 252952 lincRNA 8572513 8573324 Y 1 chromosome
+ENSG00000232424 USP9YP29 pseudogene 25886405 25892840 Y -1 chromosome
+ENSG00000232475 HSFY5P pseudogene 24194701 24195494 Y -1 chromosome
+ENSG00000232522 ZNF886P pseudogene 22163071 22174090 Y -1 chromosome
+ENSG00000232583 GPR143P pseudogene 6968774 6974543 Y -1 chromosome
+ENSG00000232585 OFD1P12Y pseudogene 26118927 26141228 Y 1 chromosome
+ENSG00000232614 USP9YP9 pseudogene 27876158 27881307 Y -1 chromosome
+ENSG00000232617 AC017019.1 pseudogene 9502838 9506619 Y 1 chromosome
+ENSG00000232620 TSPY17P pseudogene 6388504 6388970 Y 1 chromosome
+ENSG00000232634 NEFLP1 pseudogene 23382992 23383975 Y -1 chromosome
+ENSG00000232695 TCEB1P17 pseudogene 28007090 28007415 Y 1 chromosome
+ENSG00000232730 FAM8A4P pseudogene 14442350 14443562 Y 1 chromosome
+ENSG00000232744 USP9YP16 pseudogene 20283081 20285685 Y -1 chromosome
+ENSG00000232764 TRIM60P8Y pseudogene 25171741 25172810 Y 1 chromosome
+ENSG00000232808 TTTY20 252951 antisense 9167489 9172441 Y -1 chromosome
+ENSG00000232845 TRAPPC2P9 pseudogene 25908985 25911730 Y -1 chromosome
+ENSG00000232899 CDY9P pseudogene 20344721 20346300 Y -1 chromosome
+ENSG00000232910 RAB9AP1 pseudogene 25897335 25897907 Y 1 chromosome
+ENSG00000232914 TRAPPC2P4 pseudogene 28050665 28053397 Y 1 chromosome
+ENSG00000232924 TCEB1P4 pseudogene 9068524 9068856 Y 1 chromosome
+ENSG00000232927 USP12PY pseudogene 3550846 3551909 Y 1 chromosome
+ENSG00000232976 TRIM60P10Y pseudogene 26805484 26806553 Y 1 chromosome
+ENSG00000233070 ZFY-AS1 antisense 2834885 2870667 Y -1 chromosome
+ENSG00000233120 USP9YP15 pseudogene 20278413 20279581 Y 1 chromosome
+ENSG00000233126 ZNF736P3Y pseudogene 25195394 25197342 Y 1 chromosome
+ENSG00000233156 HSFY8P pseudogene 28157582 28158384 Y 1 chromosome
+ENSG00000233378 USP9YP34 pseudogene 20096229 20105140 Y 1 chromosome
+ENSG00000233522 FAM224A lincRNA 20488139 20515096 Y 1 chromosome
+ENSG00000233546 PRYP5 pseudogene 20691244 20691509 Y 1 chromosome
+ENSG00000233619 AC006328.9 pseudogene 27633431 27633809 Y -1 chromosome
+ENSG00000233634 GOT2P5 pseudogene 6705748 6706293 Y 1 chromosome
+ENSG00000233652 CICP1 pseudogene 27535139 27537958 Y 1 chromosome
+ENSG00000233699 TTTY18 252950 lincRNA 8551411 8551919 Y -1 chromosome
+ENSG00000233740 CICP2 pseudogene 26424484 26427287 Y -1 chromosome
+ENSG00000233774 MED14P1 pseudogene 14730915 14746333 Y -1 chromosome
+ENSG00000233803 TSPY4 728395 protein_coding 9175073 9177893 Y 1 chromosome
+ENSG00000233843 CYCSP48 pseudogene 28546758 28547377 Y 1 chromosome
+ENSG00000233864 TTTY15 64595 lincRNA 14774265 14804162 Y 1 chromosome
+ENSG00000233944 LINC00265-3P pseudogene 27539301 27540203 Y -1 chromosome
+ENSG00000234059 CASKP1 pseudogene 15042075 15060090 Y -1 chromosome
+ENSG00000234081 TCEB1P10 pseudogene 26153058 26153384 Y -1 chromosome
+ENSG00000234110 TSPY25P pseudogene 9461792 9463961 Y 1 chromosome
+ENSG00000234131 TCEB1P8 pseudogene 24663389 24663720 Y -1 chromosome
+ENSG00000234179 MTCYBP1 pseudogene 8232073 8233191 Y 1 chromosome
+ENSG00000234385 RCC2P1 pseudogene 13901758 13903233 Y -1 chromosome
+ENSG00000234399 RBMY2XP pseudogene 26542751 26549673 Y -1 chromosome
+ENSG00000234414 RBMY1A1 5940 protein_coding 23673258 23711212 Y 1 chromosome
+ENSG00000234529 GAPDHP19 pseudogene 21489455 21490459 Y 1 chromosome
+ENSG00000234583 TSPY19P pseudogene 6171998 6173115 Y -1 chromosome
+ENSG00000234620 HDHD1P1 pseudogene 17460542 17567954 Y -1 chromosome
+ENSG00000234652 AGPAT5P1 pseudogene 3161849 3162867 Y 1 chromosome
+ENSG00000234744 USP9YP26 pseudogene 28148401 28155003 Y 1 chromosome
+ENSG00000234795 RFTN1P1 pseudogene 7581026 7677302 Y -1 chromosome
+ENSG00000234803 FAM197Y2 252946;100287826;100289150 antisense 9355208 9364506 Y -1 chromosome
+ENSG00000234830 FAM197Y9 pseudogene 6124308 6131994 Y -1 chromosome
+ENSG00000234850 MTND2P3 pseudogene 8240282 8240751 Y 1 chromosome
+ENSG00000234888 OFD1P15Y pseudogene 28201926 28234640 Y -1 chromosome
+ENSG00000234950 RBMY2OP pseudogene 9789162 9797032 Y 1 chromosome
+ENSG00000235001 EIF4A1P2 pseudogene 5205786 5207005 Y -1 chromosome
+ENSG00000235004 USP9YP30 pseudogene 27869163 27870333 Y 1 chromosome
+ENSG00000235014 REREP2Y pseudogene 28344478 28354295 Y -1 chromosome
+ENSG00000235059 AC008175.1 101929148 lincRNA 24585087 24631739 Y 1 chromosome
+ENSG00000235094 AC006335.10 pseudogene 6174083 6175961 Y 1 chromosome
+ENSG00000235175 RPL26P37 pseudogene 5661341 5661778 Y -1 chromosome
+ENSG00000235193 AC006987.4 pseudogene 9925635 9927195 Y 1 chromosome
+ENSG00000235412 TTTY4B 474149;474150;114761 lincRNA 26716349 26753172 Y 1 chromosome
+ENSG00000235451 PNPLA4P1 pseudogene 16192955 16198480 Y 1 chromosome
+ENSG00000235462 TAB3P1 pseudogene 15265842 15274125 Y -1 chromosome
+ENSG00000235479 RAB9AP4 pseudogene 20811557 20812165 Y 1 chromosome
+ENSG00000235511 OFD1P18Y pseudogene 28018078 28044800 Y -1 chromosome
+ENSG00000235521 USP9YP27 pseudogene 19900195 19901364 Y -1 chromosome
+ENSG00000235583 AC007562.1 pseudogene 27658448 27673933 Y -1 chromosome
+ENSG00000235649 MXRA5P1 pseudogene 14077914 14108092 Y 1 chromosome
+ENSG00000235691 AC006987.5 pseudogene 9928411 9928816 Y 1 chromosome
+ENSG00000235719 AC010141.1 pseudogene 23627270 23629049 Y -1 chromosome
+ENSG00000235857 CTBP2P1 pseudogene 59001391 59001635 Y 1 chromosome
+ENSG00000235895 AC010154.2 pseudogene 6361017 6364798 Y -1 chromosome
+ENSG00000235981 AC023274.4 pseudogene 26288505 26303990 Y 1 chromosome
+ENSG00000236131 MED13P1 pseudogene 17019778 17019948 Y 1 chromosome
+ENSG00000236379 ZNF736P4Y pseudogene 27314746 27315988 Y 1 chromosome
+ENSG00000236424 TSPY10 728137;100289087;7258 protein_coding 9365489 9368291 Y 1 chromosome
+ENSG00000236429 GPM6BP2 pseudogene 20903729 20903872 Y -1 chromosome
+ENSG00000236435 TSPY12P pseudogene 7555780 7557589 Y -1 chromosome
+ENSG00000236477 RPS24P1 pseudogene 14365457 14366162 Y 1 chromosome
+ENSG00000236599 TCEB1P26 pseudogene 20340818 20341146 Y -1 chromosome
+ENSG00000236615 AC010086.5 pseudogene 23567656 23567881 Y 1 chromosome
+ENSG00000236620 XKRYP3 pseudogene 25862069 25872955 Y -1 chromosome
+ENSG00000236647 AC009947.5 pseudogene 25695956 25697532 Y 1 chromosome
+ENSG00000236690 AC007274.5 pseudogene 7558552 7560719 Y 1 chromosome
+ENSG00000236718 RBMY2QP pseudogene 9860098 9871796 Y -1 chromosome
+ENSG00000236786 TSPY15P pseudogene 9385717 9388290 Y 1 chromosome
+ENSG00000236951 AC007359.6 101929148 lincRNA 24246961 24293631 Y -1 chromosome
+ENSG00000237023 USP9YP3 pseudogene 23823812 23835894 Y 1 chromosome
+ENSG00000237048 TTTY12 83867 lincRNA 7672965 7678724 Y 1 chromosome
+ENSG00000237069 TTTY23B 252955;100101121 lincRNA 6110487 6111651 Y -1 chromosome
+ENSG00000237195 DLGAP5P1 pseudogene 6026843 6027306 Y 1 chromosome
+ENSG00000237269 RBMY2TP pseudogene 23592583 23599131 Y -1 chromosome
+ENSG00000237302 OFD1P11Y pseudogene 25917379 25944297 Y 1 chromosome
+ENSG00000237427 TOMM22P1 pseudogene 23292756 23293067 Y 1 chromosome
+ENSG00000237447 CDC27P2 pseudogene 10027986 10029907 Y -1 chromosome
+ENSG00000237467 USP9YP35 pseudogene 26223946 26225959 Y 1 chromosome
+ENSG00000237546 XKRYP6 pseudogene 28089415 28100299 Y 1 chromosome
+ENSG00000237558 CDY7P pseudogene 20063469 20065380 Y 1 chromosome
+ENSG00000237563 TTTY21B 252953;100101115 lincRNA 6311475 6315118 Y 1 chromosome
+ENSG00000237616 USP9YP32 pseudogene 20107081 20109132 Y -1 chromosome
+ENSG00000237659 RNASEH2CP1 pseudogene 2657868 2658369 Y 1 chromosome
+ENSG00000237701 ATP5JP1 pseudogene 6768794 6769413 Y 1 chromosome
+ENSG00000237802 FAM197Y6 100289188 antisense 9225731 9233636 Y -1 chromosome
+ENSG00000237823 CDY19P pseudogene 27710269 27712180 Y -1 chromosome
+ENSG00000237902 TSPY21P pseudogene 24546550 24548715 Y -1 chromosome
+ENSG00000237917 PARP4P1 pseudogene 28740998 28780799 Y -1 chromosome
+ENSG00000237968 AC007322.7 pseudogene 24086159 24087740 Y 1 chromosome
+ENSG00000237997 RCC2P2 pseudogene 22148461 22149738 Y -1 chromosome
+ENSG00000238067 XKRYP1 pseudogene 20670463 20670954 Y 1 chromosome
+ENSG00000238073 RBMY2HP pseudogene 7539784 7544065 Y -1 chromosome
+ENSG00000238074 TSPY6P 7258 protein_coding 9324922 9327689 Y 1 chromosome
+ENSG00000238088 OFD1P7Y pseudogene 21010150 21029166 Y 1 chromosome
+ENSG00000238135 USP9YP10 pseudogene 20987993 20991373 Y 1 chromosome
+ENSG00000238154 USP9YP4 pseudogene 9027238 9042220 Y -1 chromosome
+ENSG00000238191 CLUHP1 pseudogene 19733139 19737712 Y -1 chromosome
+ENSG00000238235 TSPY11P pseudogene 6134634 6137316 Y 1 chromosome
+ENSG00000239225 TTTY23 252955;100101121 lincRNA 9748407 9749571 Y 1 chromosome
+ENSG00000239304 DNM1P48 pseudogene 27632787 27633469 Y 1 chromosome
+ENSG00000239533 GOLGA2P2Y processed_transcript 26356114 26360978 Y -1 chromosome
+ENSG00000239893 ZNF736P9Y pseudogene 7936937 7938764 Y -1 chromosome
+ENSG00000240438 OFD1P5Y pseudogene 20743949 20790963 Y -1 chromosome
+ENSG00000240450 CSPG4P1Y 114758 lincRNA 27629055 27632852 Y 1 chromosome
+ENSG00000240566 AC010153.3 pseudogene 26796955 26797681 Y 1 chromosome
+ENSG00000241200 ZNF736P7Y pseudogene 7859028 7859847 Y -1 chromosome
+ENSG00000241859 KALP pseudogene 15863536 16027704 Y 1 chromosome
+ENSG00000242153 OFD1P6Y 83864;425057 pseudogene 20835919 20901083 Y 1 chromosome
+ENSG00000242389 RBMY1E 378949;378950 protein_coding 24026223 24064214 Y -1 chromosome
+ENSG00000242393 AC010141.4 pseudogene 23670185 23672356 Y -1 chromosome
+ENSG00000242425 RN7SL818P misc_RNA 26357107 26357382 Y -1 chromosome
+ENSG00000242854 DNM1P24 pseudogene 26325463 26329652 Y -1 chromosome
+ENSG00000242875 RBMY1B 378949;378948 protein_coding 23673224 23687672 Y 1 chromosome
+ENSG00000242879 AC006335.11 pseudogene 6175751 6177628 Y 1 chromosome
+ENSG00000243040 RBMY2FP 159162;100652931 pseudogene 24455006 24467972 Y 1 chromosome
+ENSG00000243643 TSPY20P pseudogene 9873804 9875984 Y 1 chromosome
+ENSG00000243980 RN7SL702P misc_RNA 14394177 14394465 Y 1 chromosome
+ENSG00000244000 AC006366.3 pseudogene 25163210 25163968 Y 1 chromosome
+ENSG00000244231 CSPG4P2Y 114758 processed_transcript 26329581 26333378 Y -1 chromosome
+ENSG00000244246 ZNF736P8Y pseudogene 7781463 7782131 Y -1 chromosome
+ENSG00000244395 RBMY1D 378949;5940;378948 protein_coding 24026223 24040673 Y -1 chromosome
+ENSG00000244646 AC024183.3 pseudogene 20290496 20298913 Y 1 chromosome
+ENSG00000248573 PRYP6 pseudogene 20952595 20952937 Y -1 chromosome
+ENSG00000248792 LINC00266-2P pseudogene 26424828 26437493 Y 1 chromosome
+ENSG00000249501 USP9YP2 pseudogene 20938165 20941313 Y -1 chromosome
+ENSG00000249606 TRAPPC2P7 pseudogene 20817406 20818076 Y -1 chromosome
+ENSG00000249634 USP9YP5 pseudogene 20652801 20656181 Y -1 chromosome
+ENSG00000249726 TUBB1P1 pseudogene 20548860 20551037 Y 1 chromosome
+ENSG00000250204 AC006335.6 pseudogene 6111336 6113324 Y -1 chromosome
+ENSG00000250868 AC007742.7 pseudogene 19880862 19889280 Y -1 chromosome
+ENSG00000250951 USP9YP1 pseudogene 20702866 20706014 Y 1 chromosome
+ENSG00000251275 AC006335.2 pseudogene 6109809 6111670 Y -1 chromosome
+ENSG00000251510 AC022486.1 lincRNA 20653626 20709584 Y 1 chromosome
+ENSG00000251618 AC007322.3 pseudogene 24041541 24043706 Y 1 chromosome
+ENSG00000251705 RNA5-8SP6 rRNA 10037764 10037915 Y 1 chromosome
+ENSG00000251766 RNA5SP518 rRNA 9928019 9928137 Y -1 chromosome
+ENSG00000251796 SNORA70 snoRNA 28393531 28393668 Y 1 chromosome
+ENSG00000251841 RNU6-1334P snRNA 2652790 2652894 Y 1 chromosome
+ENSG00000251879 AC010874.1 miRNA 5742287 5742379 Y 1 chromosome
+ENSG00000251917 RNU1-86P snRNA 26092765 26092918 Y -1 chromosome
+ENSG00000251925 SNORA70 snoRNA 25569246 25569383 Y -1 chromosome
+ENSG00000251953 RNA5SP522 rRNA 20508009 20508124 Y -1 chromosome
+ENSG00000251966 AC010970.1 miRNA 10033981 10034093 Y 1 chromosome
+ENSG00000251970 RNU1-41P snRNA 20995615 20995776 Y -1 chromosome
+ENSG00000251996 Y_RNA misc_RNA 7209572 7209683 Y 1 chromosome
+ENSG00000252012 RNA5SP521 rRNA 19671654 19671769 Y 1 chromosome
+ENSG00000252038 AC068704.1 miRNA 19734941 19735035 Y -1 chromosome
+ENSG00000252059 AC012667.1 miRNA 5441969 5442060 Y 1 chromosome
+ENSG00000252155 RNU6-941P snRNA 7246713 7246820 Y 1 chromosome
+ENSG00000252166 RNU1-95P snRNA 20278734 20278893 Y 1 chromosome
+ENSG00000252173 RNU6-109P snRNA 18360815 18360921 Y 1 chromosome
+ENSG00000252209 RNU1-48P snRNA 20648397 20648558 Y 1 chromosome
+ENSG00000252289 RNA5SP519 rRNA 9930484 9930602 Y -1 chromosome
+ENSG00000252315 RNA5SP520 rRNA 19669744 19669856 Y 1 chromosome
+ENSG00000252323 RNU6-184P snRNA 18448163 18448269 Y -1 chromosome
+ENSG00000252426 RNU1-107P snRNA 27869489 27869642 Y 1 chromosome
+ENSG00000252439 AC007241.1 miRNA 20444739 20444833 Y 1 chromosome
+ENSG00000252468 RNU2-57P snRNA 4887117 4887307 Y 1 chromosome
+ENSG00000252471 AC006328.1 miRNA 27606157 27606239 Y 1 chromosome
+ENSG00000252472 RNU6-521P snRNA 7291095 7291199 Y -1 chromosome
+ENSG00000252513 RNU1-128P snRNA 19900883 19901042 Y -1 chromosome
+ENSG00000252586 AC002992.1 miRNA 14482123 14482230 Y 1 chromosome
+ENSG00000252625 RNU1-40P snRNA 28075126 28075289 Y 1 chromosome
+ENSG00000252633 RN7SKP282 misc_RNA 7192338 7192636 Y -1 chromosome
+ENSG00000252664 AC017020.1 miRNA 18174646 18174709 Y 1 chromosome
+ENSG00000252667 RNA5SP523 rRNA 20509921 20510033 Y -1 chromosome
+ENSG00000252681 RNU1-97P snRNA 25887089 25887252 Y -1 chromosome
+ENSG00000252689 SNORA20 snoRNA 18250128 18250259 Y -1 chromosome
+ENSG00000252694 AC006371.1 miRNA 15779840 15779936 Y -1 chromosome
+ENSG00000252766 RNU6-255P snRNA 21180869 21180973 Y -1 chromosome
+ENSG00000252855 AC134878.1 miRNA 13340551 13340633 Y -1 chromosome
+ENSG00000252900 RNU6-303P snRNA 4043026 4043131 Y -1 chromosome
+ENSG00000252948 RNU6-1314P snRNA 28507136 28507239 Y 1 chromosome
+ENSG00000254488 RP11-65G9.1 lincRNA 23200175 23206610 Y -1 chromosome
+ENSG00000258567 DUX4L16 pseudogene 13462594 13463857 Y 1 chromosome
+ENSG00000258991 DUX4L19 pseudogene 13488005 13489271 Y 1 chromosome
+ENSG00000258992 TSPY1 7258 protein_coding 9236076 9307357 Y 1 chromosome
+ENSG00000259029 DUX4L18 pseudogene 13477233 13478499 Y 1 chromosome
+ENSG00000259154 DUX4L17 pseudogene 13470597 13471863 Y 1 chromosome
+ENSG00000259247 TTTY25P pseudogene 24476599 24478647 Y 1 chromosome
+ENSG00000260197 RP11-424G14.1 lincRNA 21853827 21856492 Y -1 chromosome
+ENSG00000263502 AC134878.2 miRNA 13340359 13340440 Y -1 chromosome
+ENSG00000265161 AC011293.1 miRNA 13947443 13947512 Y -1 chromosome
+ENSG00000265197 AC053516.1 miRNA 18398127 18398238 Y -1 chromosome
+ENSG00000266220 AC010723.2 miRNA 16364065 16364171 Y -1 chromosome
+ENSG00000267793 RP11-576C2.1 pseudogene 21760074 21760643 Y 1 chromosome
+ENSG00000267935 AC016752.1 protein_coding 25847479 25850592 Y 1 chromosome
+ENSG00000269084 AC009977.1 protein_coding 21737895 21738068 Y -1 chromosome
+ENSG00000269291 AC010877.1 protein_coding 15418467 15429181 Y 1 chromosome
+ENSG00000269393 AC007965.1 protein_coding 28111776 28114889 Y -1 chromosome
+ENSG00000269464 AC012067.1 protein_coding 5306691 5312605 Y -1 chromosome
+ENSG00000270073 AC006156.2 pseudogene 9345205 9347784 Y 1 chromosome
+ENSG00000270242 RP11-295P22.1 pseudogene 13551375 13552752 Y 1 chromosome
+ENSG00000270455 PABPC1P5 pseudogene 13491303 13493369 Y 1 chromosome
+ENSG00000270535 TCEB1P34 pseudogene 27809047 27809373 Y 1 chromosome
+ENSG00000270570 RP1-85D24.1 pseudogene 13263395 13263563 Y -1 chromosome
+ENSG00000271123 TCEB1P5 pseudogene 23793569 23793901 Y -1 chromosome
+ENSG00000271309 RP1-85D24.2 pseudogene 13263065 13263272 Y -1 chromosome
+ENSG00000271365 RP1-85D24.3 pseudogene 13262741 13262939 Y -1 chromosome
+ENSG00000271375 RP11-295P22.2 pseudogene 13629403 13629913 Y -1 chromosome
+ENSG00000271595 TCEB1P35 pseudogene 19576759 19577094 Y -1 chromosome
+ENSG00000272042 Metazoa_SRP misc_RNA 27605054 27605329 Y 1 chromosome
diff --git a/inst/chrY/ens_metadata.txt b/inst/chrY/ens_metadata.txt
new file mode 100644
index 0000000..647a335
--- /dev/null
+++ b/inst/chrY/ens_metadata.txt
@@ -0,0 +1,12 @@
+name value
+Db type EnsDb
+Type of Gene ID Ensembl Gene ID
+Supporting package ensembldb
+Db created by ensembldb package from Bioconductor
+script_version 0.1.2
+Creation time Wed Mar 18 09:30:54 2015
+ensembl_version 75
+ensembl_host manny.i-med.ac.at
+Organism homo_sapiens
+genome_build GRCh37
+DBSCHEMAVERSION 1.0
diff --git a/inst/chrY/ens_tx.txt b/inst/chrY/ens_tx.txt
new file mode 100644
index 0000000..c8d9771
--- /dev/null
+++ b/inst/chrY/ens_tx.txt
@@ -0,0 +1,732 @@
+tx_id tx_biotype tx_seq_start tx_seq_end tx_cds_seq_start tx_cds_seq_end gene_id
+ENST00000469599 retained_intron 21865751 21878581 NULL NULL ENSG00000012817
+ENST00000317961 protein_coding 21867301 21906809 21867881 21906420 ENSG00000012817
+ENST00000382806 protein_coding 21867306 21906647 21867881 21906420 ENSG00000012817
+ENST00000492117 retained_intron 21867311 21884353 NULL NULL ENSG00000012817
+ENST00000440077 protein_coding 21867949 21906809 21867949 21906420 ENSG00000012817
+ENST00000415360 protein_coding 21869068 21870835 21869068 21870835 ENSG00000012817
+ENST00000485154 retained_intron 21871586 21872492 NULL NULL ENSG00000012817
+ENST00000478891 retained_intron 21877022 21878234 NULL NULL ENSG00000012817
+ENST00000447300 protein_coding 21883158 21906597 21883158 21906420 ENSG00000012817
+ENST00000541639 protein_coding 21867303 21906825 21867881 21906420 ENSG00000012817
+ENST00000360160 protein_coding 15016019 15030451 15016848 15030034 ENSG00000067048
+ENST00000454054 protein_coding 15016029 15025765 15016848 15025765 ENSG00000067048
+ENST00000336079 protein_coding 15016742 15032390 15016848 15030034 ENSG00000067048
+ENST00000493363 processed_transcript 15016760 15021607 NULL NULL ENSG00000067048
+ENST00000440554 protein_coding 15017649 15026561 15017691 15026561 ENSG00000067048
+ENST00000469101 processed_transcript 15024552 15025713 NULL NULL ENSG00000067048
+ENST00000472510 processed_transcript 15024875 15026811 NULL NULL ENSG00000067048
+ENST00000463199 processed_transcript 15024920 15027139 NULL NULL ENSG00000067048
+ENST00000495478 processed_transcript 15027408 15028265 NULL NULL ENSG00000067048
+ENST00000383052 protein_coding 2803322 2850547 2821978 2848034 ENSG00000067646
+ENST00000469869 processed_transcript 2803541 2846094 NULL NULL ENSG00000067646
+ENST00000443793 protein_coding 2803546 2829327 2821978 2829327 ENSG00000067646
+ENST00000478783 processed_transcript 2845860 2847391 NULL NULL ENSG00000067646
+ENST00000431102 protein_coding 2803112 2850546 2821978 2848034 ENSG00000067646
+ENST00000155093 protein_coding 2803518 2850546 2821978 2848034 ENSG00000067646
+ENST00000449237 protein_coding 2803518 2850546 2829132 2848034 ENSG00000067646
+ENST00000383032 protein_coding 6778727 6959724 6893126 6959533 ENSG00000092377
+ENST00000355162 protein_coding 6778727 6959724 6893126 6959533 ENSG00000092377
+ENST00000346432 protein_coding 6778727 6959724 6893126 6959533 ENSG00000092377
+ENST00000333703 protein_coding 4868267 4973485 4900738 4972402 ENSG00000099715
+ENST00000362095 protein_coding 4924131 4972741 4924865 4972402 ENSG00000099715
+ENST00000400457 protein_coding 4924930 5610265 4924930 5605983 ENSG00000099715
+ENST00000215473 protein_coding 4924865 5605983 4924865 5605983 ENSG00000099715
+ENST00000215479 protein_coding 6733959 6742068 6734114 6740649 ENSG00000099721
+ENST00000383036 protein_coding 6734114 6740649 6734114 6740649 ENSG00000099721
+ENST00000383037 protein_coding 6736078 6740649 6736078 6740649 ENSG00000099721
+ENST00000528056 processed_transcript 7142013 7249589 NULL NULL ENSG00000099725
+ENST00000533551 transcribed_unprocessed_pseudogene 7142354 7239909 NULL NULL ENSG00000099725
+ENST00000495163 processed_transcript 7194108 7196444 NULL NULL ENSG00000099725
+ENST00000472666 processed_transcript 7201071 7224264 NULL NULL ENSG00000099725
+ENST00000362758 transcribed_unprocessed_pseudogene 7142336 7235456 NULL NULL ENSG00000099725
+ENST00000338981 protein_coding 14813160 14972764 14821381 14971341 ENSG00000114374
+ENST00000493168 processed_transcript 14813969 14833136 NULL NULL ENSG00000114374
+ENST00000426564 processed_transcript 14821369 14972764 NULL NULL ENSG00000114374
+ENST00000453031 protein_coding 14958970 14971537 14958970 14971341 ENSG00000114374
+ENST00000471409 processed_transcript 14968738 14972764 NULL NULL ENSG00000114374
+ENST00000250776 lincRNA 6258472 6279605 NULL NULL ENSG00000129816
+ENST00000250784 protein_coding 2709527 2735309 2709666 2734935 ENSG00000129824
+ENST00000430575 protein_coding 2709961 2734903 2709985 2734903 ENSG00000129824
+ENST00000477725 processed_transcript 2722137 2734997 NULL NULL ENSG00000129824
+ENST00000515575 processed_transcript 2722771 2800041 NULL NULL ENSG00000129824
+ENST00000250805 lincRNA 9590765 9611898 NULL NULL ENSG00000129845
+ENST00000250823 protein_coding 16168097 16168838 16168170 16168739 ENSG00000129862
+ENST00000250825 protein_coding 16097652 16098393 16097751 16098320 ENSG00000129864
+ENST00000382867 protein_coding 19990147 19992100 19990147 19991772 ENSG00000129873
+ENST00000544303 protein_coding 19989290 19992098 19989665 19991772 ENSG00000129873
+ENST00000407724 processed_transcript 21729199 21752133 NULL NULL ENSG00000131002
+ENST00000459719 retained_intron 21729236 21752304 NULL NULL ENSG00000131002
+ENST00000447520 processed_transcript 21729268 21752309 NULL NULL ENSG00000131002
+ENST00000445715 transcribed_unprocessed_pseudogene 21729268 21766006 NULL NULL ENSG00000131002
+ENST00000538014 retained_intron 21729673 21751735 NULL NULL ENSG00000131002
+ENST00000447202 retained_intron 21729715 21751733 NULL NULL ENSG00000131002
+ENST00000588613 processed_transcript 21750429 21755603 NULL NULL ENSG00000131002
+ENST00000589075 processed_transcript 21750456 21752305 NULL NULL ENSG00000131002
+ENST00000585549 processed_transcript 21750495 21755488 NULL NULL ENSG00000131002
+ENST00000587095 processed_transcript 21750497 21755488 NULL NULL ENSG00000131002
+ENST00000488280 processed_transcript 21752639 21755537 NULL NULL ENSG00000131002
+ENST00000253320 processed_transcript 21754336 21768160 NULL NULL ENSG00000131002
+ENST00000593000 retained_intron 21754974 21756047 NULL NULL ENSG00000131002
+ENST00000592697 processed_transcript 21759243 21765937 NULL NULL ENSG00000131002
+ENST00000251749 transcribed_unprocessed_pseudogene 21729235 21752308 NULL NULL ENSG00000131002
+ENST00000382832 transcribed_unprocessed_pseudogene 21758442 21767698 NULL NULL ENSG00000131002
+ENST00000433794 antisense 20743092 20752407 NULL NULL ENSG00000131007
+ENST00000545582 antisense 20743092 20750598 NULL NULL ENSG00000131007
+ENST00000253838 lincRNA 24585740 24587605 NULL NULL ENSG00000131538
+ENST00000538537 lincRNA 24585887 24587549 NULL NULL ENSG00000131538
+ENST00000253848 antisense 24291113 24292978 NULL NULL ENSG00000131548
+ENST00000545808 antisense 24291169 24292831 NULL NULL ENSG00000131548
+ENST00000457100 lincRNA 6317509 6325947 NULL NULL ENSG00000147753
+ENST00000276770 lincRNA 6317509 6325947 NULL NULL ENSG00000147753
+ENST00000449828 lincRNA 6321946 6325947 NULL NULL ENSG00000147753
+ENST00000447655 lincRNA 9544433 9548434 NULL NULL ENSG00000147761
+ENST00000276779 lincRNA 9544433 9552871 NULL NULL ENSG00000147761
+ENST00000415405 lincRNA 9544433 9552871 NULL NULL ENSG00000147761
+ENST00000284856 protein_coding 15815447 15817904 15816216 15817139 ENSG00000154620
+ENST00000288666 protein_coding 22918050 22942918 22918050 22942918 ENSG00000157828
+ENST00000471252 processed_transcript 16634518 16734377 NULL NULL ENSG00000165246
+ENST00000382872 protein_coding 16634632 16957530 16835029 16953142 ENSG00000165246
+ENST00000355905 protein_coding 16635626 16955606 16734000 16953142 ENSG00000165246
+ENST00000382868 protein_coding 16635626 16955606 16734000 16953142 ENSG00000165246
+ENST00000476359 processed_transcript 16635626 16955606 NULL NULL ENSG00000165246
+ENST00000481089 processed_transcript 16636409 16734248 NULL NULL ENSG00000165246
+ENST00000339174 protein_coding 16636454 16955527 16734000 16953142 ENSG00000165246
+ENST00000413217 protein_coding 16734061 16845429 16734061 16845417 ENSG00000165246
+ENST00000297967 protein_coding 16733901 16845429 16734000 16845417 ENSG00000165246
+ENST00000320701 protein_coding 6114264 6117054 6114310 6116866 ENSG00000168757
+ENST00000383042 protein_coding 6114310 6117053 6114310 6116119 ENSG00000168757
+ENST00000470569 retained_intron 6114310 6117060 NULL NULL ENSG00000168757
+ENST00000464674 retained_intron 6115664 6117045 NULL NULL ENSG00000168757
+ENST00000343584 processed_transcript 25827587 25840726 NULL NULL ENSG00000169763
+ENST00000607210 transcribed_unprocessed_pseudogene 25828154 25840674 NULL NULL ENSG00000169763
+ENST00000303593 transcribed_unprocessed_pseudogene 25829559 25840710 NULL NULL ENSG00000169763
+ENST00000303728 protein_coding 24636544 24660784 24647712 24660217 ENSG00000169789
+ENST00000477123 nonsense_mediated_decay 24636544 24660784 24647712 24658815 ENSG00000169789
+ENST00000338793 protein_coding 24636544 24658815 24647712 24658815 ENSG00000169789
+ENST00000303766 protein_coding 24314689 24329095 24314967 24327116 ENSG00000169800
+ENST00000481858 processed_transcript 24314689 24329104 NULL NULL ENSG00000169800
+ENST00000454978 protein_coding 24314689 24329129 24314967 24327116 ENSG00000169800
+ENST00000303804 protein_coding 24217903 24242154 24218470 24230986 ENSG00000169807
+ENST00000472391 nonsense_mediated_decay 24217903 24242154 24219875 24230986 ENSG00000169807
+ENST00000341740 protein_coding 24219875 24242154 24219875 24230986 ENSG00000169807
+ENST00000382759 unprocessed_pseudogene 23655405 23663854 NULL NULL ENSG00000169811
+ENST00000326985 unprocessed_pseudogene 23655405 23662391 NULL NULL ENSG00000169811
+ENST00000426043 unprocessed_pseudogene 23661276 23663854 NULL NULL ENSG00000169811
+ENST00000303979 unprocessed_pseudogene 23630044 23632569 NULL NULL ENSG00000169849
+ENST00000344884 protein_coding 20893326 20935601 20893655 20935504 ENSG00000169953
+ENST00000491902 processed_transcript 20893326 20935601 NULL NULL ENSG00000169953
+ENST00000382852 protein_coding 20930807 20935572 20931484 20935504 ENSG00000169953
+ENST00000304790 protein_coding 20933700 20935621 20933821 20935504 ENSG00000169953
+ENST00000505047 processed_transcript 20934594 20990548 NULL NULL ENSG00000169953
+ENST00000306589 unprocessed_pseudogene 28121696 28134216 NULL NULL ENSG00000172283
+ENST00000338673 unprocessed_pseudogene 28121660 28132811 NULL NULL ENSG00000172283
+ENST00000306609 protein_coding 27768309 27771049 27768590 27770674 ENSG00000172288
+ENST00000361963 protein_coding 27768264 27770483 27768590 27770212 ENSG00000172288
+ENST00000333235 unprocessed_pseudogene 27624416 27629777 NULL NULL ENSG00000172294
+ENST00000539489 unprocessed_pseudogene 27629055 27632852 NULL NULL ENSG00000172294
+ENST00000416946 unprocessed_pseudogene 27600708 27606719 NULL NULL ENSG00000172297
+ENST00000306667 unprocessed_pseudogene 27601458 27606322 NULL NULL ENSG00000172297
+ENST00000423852 unprocessed_pseudogene 26355714 26361728 NULL NULL ENSG00000172332
+ENST00000338706 unprocessed_pseudogene 26356114 26360978 NULL NULL ENSG00000172332
+ENST00000418188 unprocessed_pseudogene 26332656 26338017 NULL NULL ENSG00000172342
+ENST00000306882 protein_coding 26191376 26194116 26191751 26193835 ENSG00000172352
+ENST00000382407 protein_coding 26191940 26194166 26192213 26193835 ENSG00000172352
+ENST00000307393 protein_coding 20708557 20710478 20708674 20710357 ENSG00000172468
+ENST00000309834 protein_coding 20708577 20750849 20708674 20750520 ENSG00000172468
+ENST00000338876 nonsense_mediated_decay 20708577 20750849 20708674 20712690 ENSG00000172468
+ENST00000382856 protein_coding 20708606 20713351 20708674 20712690 ENSG00000172468
+ENST00000455422 transcribed_unprocessed_pseudogene 8774265 8784114 NULL NULL ENSG00000173357
+ENST00000311828 processed_transcript 8777989 8782196 NULL NULL ENSG00000173357
+ENST00000416687 transcribed_unprocessed_pseudogene 8774265 8782184 NULL NULL ENSG00000173357
+ENST00000321217 protein_coding 3447082 3448082 3447286 3447843 ENSG00000176679
+ENST00000559055 protein_coding 3447156 3448082 3447286 3447843 ENSG00000176679
+ENST00000324446 lincRNA 21034387 21040114 NULL NULL ENSG00000176728
+ENST00000454875 lincRNA 21034389 21239004 NULL NULL ENSG00000176728
+ENST00000452584 lincRNA 21094203 21237882 NULL NULL ENSG00000176728
+ENST00000331787 lincRNA 21094585 21239302 NULL NULL ENSG00000176728
+ENST00000447937 lincRNA 21203384 21239281 NULL NULL ENSG00000176728
+ENST00000253470 lincRNA 8651351 8685423 NULL NULL ENSG00000180910
+ENST00000426790 protein_coding 20137669 20140477 20137995 20140102 ENSG00000182415
+ENST00000250838 protein_coding 20137667 20139626 20137995 20139620 ENSG00000182415
+ENST00000382764 protein_coding 23544840 23548246 23545072 23548149 ENSG00000183146
+ENST00000329684 lincRNA 9528709 9531308 NULL NULL ENSG00000183385
+ENST00000426035 lincRNA 9528709 9531566 NULL NULL ENSG00000183385
+ENST00000331172 protein_coding 13496241 13524717 13496255 13524717 ENSG00000183704
+ENST00000602732 protein_coding 25119966 25138523 25138491 25138523 ENSG00000183753
+ENST00000331070 protein_coding 25130410 25151606 25138491 25144415 ENSG00000183753
+ENST00000602818 processed_transcript 25130434 25151553 NULL NULL ENSG00000183753
+ENST00000382585 protein_coding 25130410 25151612 25138491 25144415 ENSG00000183753
+ENST00000602770 protein_coding 26753707 26772264 26772232 26772264 ENSG00000183795
+ENST00000382392 protein_coding 26764151 26785354 26772232 26778157 ENSG00000183795
+ENST00000602549 processed_transcript 26764175 26785295 NULL NULL ENSG00000183795
+ENST00000331397 protein_coding 15360259 15592553 15361736 15591545 ENSG00000183878
+ENST00000362096 protein_coding 15409321 15592550 15409583 15591545 ENSG00000183878
+ENST00000329134 protein_coding 15434914 15592550 15434994 15591545 ENSG00000183878
+ENST00000478900 processed_transcript 15472368 15591420 NULL NULL ENSG00000183878
+ENST00000474365 processed_transcript 15505032 15591384 NULL NULL ENSG00000183878
+ENST00000382893 protein_coding 15508182 15591858 15508784 15591545 ENSG00000183878
+ENST00000479713 processed_transcript 15590922 15591803 NULL NULL ENSG00000183878
+ENST00000382896 protein_coding 15361736 15591545 15361736 15591545 ENSG00000183878
+ENST00000537580 protein_coding 15361736 15591545 15361736 15591545 ENSG00000183878
+ENST00000538878 protein_coding 15434927 15591551 15434994 15591545 ENSG00000183878
+ENST00000540140 protein_coding 15434948 15591545 15435229 15591545 ENSG00000183878
+ENST00000545955 protein_coding 15435435 15591545 15435435 15591545 ENSG00000183878
+ENST00000383070 protein_coding 2654896 2655740 2655030 2655644 ENSG00000184895
+ENST00000525526 protein_coding 2655049 2655644 2655049 2655644 ENSG00000184895
+ENST00000534739 protein_coding 2655145 2655644 2655145 2655644 ENSG00000184895
+ENST00000330337 lincRNA 23745486 23756552 NULL NULL ENSG00000184991
+ENST00000382840 processed_pseudogene 21154353 21154595 NULL NULL ENSG00000185275
+ENST00000455570 lincRNA 6338814 6341671 NULL NULL ENSG00000185700
+ENST00000328819 lincRNA 6339072 6341671 NULL NULL ENSG00000185700
+ENST00000382287 protein_coding 27177048 27198251 27184245 27190170 ENSG00000185894
+ENST00000602559 processed_transcript 27177107 27198227 NULL NULL ENSG00000185894
+ENST00000602680 protein_coding 27190138 27208695 27190138 27190170 ENSG00000185894
+ENST00000382365 protein_coding 26909216 26959626 26915081 26959332 ENSG00000187191
+ENST00000446723 protein_coding 26909220 26959542 26915081 26959332 ENSG00000187191
+ENST00000315357 protein_coding 26909222 26959540 26915081 26959332 ENSG00000187191
+ENST00000400212 protein_coding 26929528 26959519 26929528 26959332 ENSG00000187191
+ENST00000306737 protein_coding 26934285 26959519 26934285 26959332 ENSG00000187191
+ENST00000338964 unprocessed_pseudogene 9743204 9745748 NULL NULL ENSG00000187657
+ENST00000405239 protein_coding 25275502 25345241 25281357 25344947 ENSG00000188120
+ENST00000466332 processed_transcript 25275510 25286959 NULL NULL ENSG00000188120
+ENST00000382510 protein_coding 25276010 25345070 25276427 25344947 ENSG00000188120
+ENST00000426000 protein_coding 25275506 25345140 25281357 25344947 ENSG00000188120
+ENST00000540248 protein_coding 25275509 25345157 25281357 25344947 ENSG00000188120
+ENST00000344424 unprocessed_pseudogene 28555962 28566682 NULL NULL ENSG00000188399
+ENST00000431358 unprocessed_pseudogene 9215731 9218480 NULL NULL ENSG00000188656
+ENST00000382670 unprocessed_pseudogene 9148739 9160478 NULL NULL ENSG00000197038
+ENST00000538925 unprocessed_pseudogene 9154670 9160483 NULL NULL ENSG00000197038
+ENST00000441780 unprocessed_pseudogene 27641798 27648105 NULL NULL ENSG00000197092
+ENST00000361365 protein_coding 22737611 22755040 22737758 22754232 ENSG00000198692
+ENST00000465253 processed_transcript 22737664 22750415 NULL NULL ENSG00000198692
+ENST00000382772 protein_coding 22737680 22754516 22737758 22754232 ENSG00000198692
+ENST00000464196 processed_transcript 22748604 22754516 NULL NULL ENSG00000198692
+ENST00000485584 processed_transcript 22749733 22754516 NULL NULL ENSG00000198692
+ENST00000382314 protein_coding 26980064 27053183 26980274 27047322 ENSG00000205916
+ENST00000382296 protein_coding 26997726 27047331 26997726 27047322 ENSG00000205916
+ENST00000449750 protein_coding 26980064 27053183 26980274 27047322 ENSG00000205916
+ENST00000415508 protein_coding 26980066 27053181 26980274 27047322 ENSG00000205916
+ENST00000440066 protein_coding 26980081 27053183 26980274 27047322 ENSG00000205916
+ENST00000382432 protein_coding 26980008 27053183 26980274 27020194 ENSG00000205916
+ENST00000400494 protein_coding 26980087 27053181 26980274 27020194 ENSG00000205916
+ENST00000382290 protein_coding 26980008 27053183 26980274 27015437 ENSG00000205916
+ENST00000416956 unprocessed_pseudogene 25525631 25538844 NULL NULL ENSG00000205936
+ENST00000382449 protein_coding 25365594 25437503 25365888 25431648 ENSG00000205944
+ENST00000382440 protein_coding 25365622 25437499 25365888 25431648 ENSG00000205944
+ENST00000382433 protein_coding 25365695 25436146 25365888 25431648 ENSG00000205944
+ENST00000382306 protein_coding 25365622 25437496 25365888 25431648 ENSG00000205944
+ENST00000449947 protein_coding 25365678 25437499 25365888 25431648 ENSG00000205944
+ENST00000382294 protein_coding 25365680 25437497 25365888 25431648 ENSG00000205944
+ENST00000382424 protein_coding 25365695 25437497 25365888 25431648 ENSG00000205944
+ENST00000400493 protein_coding 25365701 25437497 25365888 25431648 ENSG00000205944
+ENST00000382431 protein_coding 25365701 25435854 25365888 25431648 ENSG00000205944
+ENST00000382434 protein_coding 25365622 25395738 25365888 25395738 ENSG00000205944
+ENST00000382966 processed_transcript 14475147 14532255 NULL NULL ENSG00000206159
+ENST00000493160 processed_transcript 14494992 14530093 NULL NULL ENSG00000206159
+ENST00000357871 processed_transcript 14514046 14532255 NULL NULL ENSG00000206159
+ENST00000382963 processed_transcript 14517922 14532171 NULL NULL ENSG00000206159
+ENST00000382965 transcribed_unprocessed_pseudogene 14518595 14532121 NULL NULL ENSG00000206159
+ENST00000417072 lincRNA 9578193 9596085 NULL NULL ENSG00000212855
+ENST00000450591 lincRNA 6274285 6292186 NULL NULL ENSG00000212856
+ENST00000388836 processed_pseudogene 5441186 5442472 NULL NULL ENSG00000214207
+ENST00000400275 processed_pseudogene 15398518 15399258 NULL NULL ENSG00000215414
+ENST00000536206 processed_pseudogene 15398518 15399258 NULL NULL ENSG00000215414
+ENST00000258589 unprocessed_pseudogene 28654360 28725837 NULL NULL ENSG00000215506
+ENST00000448881 transcribed_processed_pseudogene 28269821 28279455 NULL NULL ENSG00000215507
+ENST00000400476 processed_transcript 28269867 28275354 NULL NULL ENSG00000215507
+ENST00000442535 processed_pseudogene 25012691 25013903 NULL NULL ENSG00000215537
+ENST00000448575 transcribed_processed_pseudogene 25683455 25693089 NULL NULL ENSG00000215540
+ENST00000458444 processed_transcript 25687556 25693043 NULL NULL ENSG00000215540
+ENST00000400581 lincRNA 24442945 24445023 NULL NULL ENSG00000215560
+ENST00000441139 processed_transcript 21617317 21665039 NULL NULL ENSG00000215580
+ENST00000400605 processed_transcript 21618198 21665022 NULL NULL ENSG00000215580
+ENST00000513194 transcribed_unprocessed_pseudogene 21618471 21646770 NULL NULL ENSG00000215580
+ENST00000421118 processed_pseudogene 14043242 14044475 NULL NULL ENSG00000215583
+ENST00000431340 unprocessed_pseudogene 8148239 8150250 NULL NULL ENSG00000215601
+ENST00000415010 processed_pseudogene 7721727 7722917 NULL NULL ENSG00000215603
+ENST00000404428 processed_pseudogene 3417693 3417851 NULL NULL ENSG00000216777
+ENST00000403990 processed_pseudogene 8286832 8287941 NULL NULL ENSG00000216824
+ENST00000406090 processed_pseudogene 23005142 23005596 NULL NULL ENSG00000216844
+ENST00000407745 processed_pseudogene 21033988 21034158 NULL NULL ENSG00000217179
+ENST00000403487 processed_pseudogene 21147061 21148284 NULL NULL ENSG00000217896
+ENST00000405035 processed_pseudogene 3719265 3720910 NULL NULL ENSG00000218410
+ENST00000421995 unprocessed_pseudogene 26001669 26003238 NULL NULL ENSG00000223362
+ENST00000449659 unprocessed_pseudogene 27898535 27899017 NULL NULL ENSG00000223406
+ENST00000425912 unprocessed_pseudogene 25805613 25813969 NULL NULL ENSG00000223407
+ENST00000439805 unprocessed_pseudogene 7546940 7549069 NULL NULL ENSG00000223422
+ENST00000435111 lincRNA 16388092 16389369 NULL NULL ENSG00000223517
+ENST00000455085 unprocessed_pseudogene 19894090 19896695 NULL NULL ENSG00000223555
+ENST00000414667 processed_pseudogene 2863108 2863314 NULL NULL ENSG00000223600
+ENST00000413867 unprocessed_pseudogene 27577303 27583462 NULL NULL ENSG00000223636
+ENST00000456360 transcribed_unprocessed_pseudogene 23556877 23563471 NULL NULL ENSG00000223637
+ENST00000444169 processed_transcript 23557034 23563448 NULL NULL ENSG00000223637
+ENST00000421387 lincRNA 27329790 27330920 NULL NULL ENSG00000223641
+ENST00000412737 processed_pseudogene 28064470 28065042 NULL NULL ENSG00000223655
+ENST00000417252 unprocessed_pseudogene 26314334 26320641 NULL NULL ENSG00000223698
+ENST00000418671 unprocessed_pseudogene 6196093 6211364 NULL NULL ENSG00000223744
+ENST00000443911 unprocessed_pseudogene 27856055 27859704 NULL NULL ENSG00000223856
+ENST00000428060 processed_pseudogene 15342765 15343320 NULL NULL ENSG00000223915
+ENST00000417434 processed_pseudogene 8231577 8232000 NULL NULL ENSG00000223955
+ENST00000440402 processed_pseudogene 27131348 27132623 NULL NULL ENSG00000223978
+ENST00000428380 unprocessed_pseudogene 20134179 20135788 NULL NULL ENSG00000224033
+ENST00000446621 processed_pseudogene 15195704 15207719 NULL NULL ENSG00000224035
+ENST00000420443 transcribed_unprocessed_pseudogene 14460540 14468218 NULL NULL ENSG00000224060
+ENST00000430152 processed_transcript 14465804 14468226 NULL NULL ENSG00000224060
+ENST00000445253 lincRNA 9638762 9650854 NULL NULL ENSG00000224075
+ENST00000435142 unprocessed_pseudogene 20994888 21001199 NULL NULL ENSG00000224151
+ENST00000412474 unprocessed_pseudogene 19841408 19856476 NULL NULL ENSG00000224166
+ENST00000445573 unprocessed_pseudogene 25803986 25805205 NULL NULL ENSG00000224169
+ENST00000413946 processed_pseudogene 26640856 26641730 NULL NULL ENSG00000224210
+ENST00000420810 processed_pseudogene 28695572 28695890 NULL NULL ENSG00000224240
+ENST00000421178 protein_coding 9374241 9384693 9374241 9384693 ENSG00000224336
+ENST00000438971 unprocessed_pseudogene 9021303 9023412 NULL NULL ENSG00000224408
+ENST00000415994 unprocessed_pseudogene 22970662 22973695 NULL NULL ENSG00000224482
+ENST00000436723 unprocessed_pseudogene 20020635 20022686 NULL NULL ENSG00000224485
+ENST00000427212 processed_pseudogene 17053626 17054595 NULL NULL ENSG00000224518
+ENST00000417797 unprocessed_pseudogene 26066052 26067657 NULL NULL ENSG00000224571
+ENST00000447526 processed_pseudogene 8218972 8220184 NULL NULL ENSG00000224634
+ENST00000451071 transcribed_unprocessed_pseudogene 24795392 24805025 NULL NULL ENSG00000224657
+ENST00000400578 processed_transcript 24795438 24800925 NULL NULL ENSG00000224657
+ENST00000413372 processed_pseudogene 26422392 26423173 NULL NULL ENSG00000224827
+ENST00000457708 unprocessed_pseudogene 25880340 25882396 NULL NULL ENSG00000224866
+ENST00000445813 unprocessed_pseudogene 24666828 24667842 NULL NULL ENSG00000224873
+ENST00000434179 unprocessed_pseudogene 24816763 24819006 NULL NULL ENSG00000224917
+ENST00000413200 processed_pseudogene 6587003 6587221 NULL NULL ENSG00000224953
+ENST00000443200 unprocessed_pseudogene 19909863 19912576 NULL NULL ENSG00000224964
+ENST00000421205 lincRNA 19612838 19626898 NULL NULL ENSG00000224989
+ENST00000443820 unprocessed_pseudogene 14474827 14499123 NULL NULL ENSG00000225117
+ENST00000422655 unprocessed_pseudogene 25608612 25618427 NULL NULL ENSG00000225189
+ENST00000426293 unprocessed_pseudogene 27845598 27848709 NULL NULL ENSG00000225256
+ENST00000431179 unprocessed_pseudogene 27821207 27843493 NULL NULL ENSG00000225287
+ENST00000412165 unprocessed_pseudogene 27894743 27896353 NULL NULL ENSG00000225326
+ENST00000434110 unprocessed_pseudogene 25727742 25760787 NULL NULL ENSG00000225466
+ENST00000433767 unprocessed_pseudogene 26378973 26385131 NULL NULL ENSG00000225491
+ENST00000450145 protein_coding 9293012 9323893 9293012 9323893 ENSG00000225516
+ENST00000423213 protein_coding 9301461 9323893 9301461 9323893 ENSG00000225516
+ENST00000430032 protein_coding 9323618 9344176 9323618 9344176 ENSG00000225516
+ENST00000437686 lincRNA 7567398 7569288 NULL NULL ENSG00000225520
+ENST00000426661 antisense 9185120 9193010 NULL NULL ENSG00000225560
+ENST00000432394 antisense 9187188 9192791 NULL NULL ENSG00000225560
+ENST00000447528 unprocessed_pseudogene 27764791 27766400 NULL NULL ENSG00000225609
+ENST00000453474 unprocessed_pseudogene 24344576 24349859 NULL NULL ENSG00000225615
+ENST00000440297 unprocessed_pseudogene 22889140 22904786 NULL NULL ENSG00000225624
+ENST00000458209 processed_pseudogene 3646038 3647587 NULL NULL ENSG00000225653
+ENST00000456541 processed_transcript 9904163 9904995 NULL NULL ENSG00000225685
+ENST00000426936 transcribed_unprocessed_pseudogene 9904196 9906760 NULL NULL ENSG00000225685
+ENST00000443934 processed_pseudogene 20602683 20603011 NULL NULL ENSG00000225716
+ENST00000429066 processed_pseudogene 19949081 19949408 NULL NULL ENSG00000225740
+ENST00000418221 unprocessed_pseudogene 8132490 8145229 NULL NULL ENSG00000225809
+ENST00000445125 processed_pseudogene 10036113 10036711 NULL NULL ENSG00000225840
+ENST00000426199 unprocessed_pseudogene 28390498 28390720 NULL NULL ENSG00000225876
+ENST00000436888 processed_pseudogene 4669726 4670889 NULL NULL ENSG00000225878
+ENST00000440676 unprocessed_pseudogene 20267200 20269988 NULL NULL ENSG00000225895
+ENST00000425789 processed_pseudogene 19838632 19838960 NULL NULL ENSG00000225896
+ENST00000411756 unprocessed_pseudogene 19921687 19937209 NULL NULL ENSG00000226011
+ENST00000452415 unprocessed_pseudogene 23799361 23800927 NULL NULL ENSG00000226042
+ENST00000455560 unprocessed_pseudogene 10011462 10011816 NULL NULL ENSG00000226061
+ENST00000445405 unprocessed_pseudogene 24073668 24083297 NULL NULL ENSG00000226092
+ENST00000413550 unprocessed_pseudogene 20024579 20033510 NULL NULL ENSG00000226116
+ENST00000449843 unprocessed_pseudogene 9477858 9478325 NULL NULL ENSG00000226223
+ENST00000416040 processed_pseudogene 26829804 26831082 NULL NULL ENSG00000226270
+ENST00000445421 processed_pseudogene 19740393 19741282 NULL NULL ENSG00000226353
+ENST00000458627 lincRNA 20552880 20566932 NULL NULL ENSG00000226362
+ENST00000427537 unprocessed_pseudogene 26081100 26086248 NULL NULL ENSG00000226369
+ENST00000411487 unprocessed_pseudogene 19833478 19835057 NULL NULL ENSG00000226449
+ENST00000423409 processed_pseudogene 23069230 23069448 NULL NULL ENSG00000226504
+ENST00000445453 processed_pseudogene 8239704 8240071 NULL NULL ENSG00000226529
+ENST00000455254 processed_pseudogene 16750976 16752238 NULL NULL ENSG00000226555
+ENST00000421895 unprocessed_pseudogene 20242567 20258088 NULL NULL ENSG00000226611
+ENST00000434773 unprocessed_pseudogene 14649708 14656818 NULL NULL ENSG00000226863
+ENST00000431260 unprocessed_pseudogene 25820526 25821539 NULL NULL ENSG00000226873
+ENST00000436568 lincRNA 25082602 25119431 NULL NULL ENSG00000226906
+ENST00000437359 unprocessed_pseudogene 23473154 23480781 NULL NULL ENSG00000226918
+ENST00000470460 processed_transcript 24549608 24564028 NULL NULL ENSG00000226941
+ENST00000250831 protein_coding 24549617 24564028 24551595 24563750 ENSG00000226941
+ENST00000414629 protein_coding 24454970 24564028 24551595 24563750 ENSG00000226941
+ENST00000445779 protein_coding 24454970 24564028 24551595 24563750 ENSG00000226941
+ENST00000413610 unprocessed_pseudogene 10007397 10007923 NULL NULL ENSG00000226975
+ENST00000412493 unprocessed_pseudogene 17659096 17705211 NULL NULL ENSG00000227166
+ENST00000425544 unprocessed_pseudogene 7995171 8012244 NULL NULL ENSG00000227204
+ENST00000438459 processed_pseudogene 25204177 25205333 NULL NULL ENSG00000227251
+ENST00000444242 unprocessed_pseudogene 2749724 2751693 NULL NULL ENSG00000227289
+ENST00000441906 lincRNA 26631479 26632610 NULL NULL ENSG00000227439
+ENST00000426983 unprocessed_pseudogene 24065082 24067253 NULL NULL ENSG00000227444
+ENST00000381172 unprocessed_pseudogene 14551845 14619171 NULL NULL ENSG00000227447
+ENST00000430663 unprocessed_pseudogene 20642973 20649286 NULL NULL ENSG00000227494
+ENST00000456738 unprocessed_pseudogene 28732789 28737748 NULL NULL ENSG00000227629
+ENST00000423333 unprocessed_pseudogene 27412731 27419678 NULL NULL ENSG00000227633
+ENST00000420603 unprocessed_pseudogene 28079980 28082028 NULL NULL ENSG00000227635
+ENST00000432201 processed_pseudogene 8281078 8282095 NULL NULL ENSG00000227830
+ENST00000445313 processed_pseudogene 26837938 26839079 NULL NULL ENSG00000227837
+ENST00000436690 processed_pseudogene 28137388 28137719 NULL NULL ENSG00000227867
+ENST00000450481 unprocessed_pseudogene 26092074 26098440 NULL NULL ENSG00000227871
+ENST00000444494 unprocessed_pseudogene 25824651 25824982 NULL NULL ENSG00000227915
+ENST00000423569 processed_pseudogene 17053427 17053630 NULL NULL ENSG00000227949
+ENST00000411585 unprocessed_pseudogene 23717738 23719956 NULL NULL ENSG00000227989
+ENST00000428070 processed_pseudogene 20949636 20949966 NULL NULL ENSG00000228193
+ENST00000432613 unprocessed_pseudogene 8769244 8771317 NULL NULL ENSG00000228207
+ENST00000416110 lincRNA 24997731 24998862 NULL NULL ENSG00000228240
+ENST00000452426 unprocessed_pseudogene 23693725 23695897 NULL NULL ENSG00000228257
+ENST00000456123 lincRNA 27209230 27246039 NULL NULL ENSG00000228296
+ENST00000449963 lincRNA 9650924 9655122 NULL NULL ENSG00000228379
+ENST00000433321 antisense 9205412 9213290 NULL NULL ENSG00000228383
+ENST00000436801 antisense 9207466 9213071 NULL NULL ENSG00000228383
+ENST00000452256 antisense 9212578 9235047 NULL NULL ENSG00000228383
+ENST00000432892 antisense 9212578 9214701 NULL NULL ENSG00000228383
+ENST00000420524 antisense 9232924 9235047 NULL NULL ENSG00000228383
+ENST00000418290 processed_pseudogene 14913906 14915763 NULL NULL ENSG00000228411
+ENST00000415662 unprocessed_pseudogene 26113711 26116841 NULL NULL ENSG00000228465
+ENST00000451854 processed_pseudogene 19304706 19305532 NULL NULL ENSG00000228518
+ENST00000432672 unprocessed_pseudogene 24683194 24683996 NULL NULL ENSG00000228571
+ENST00000422712 unprocessed_pseudogene 19628716 19630918 NULL NULL ENSG00000228578
+ENST00000420174 processed_pseudogene 22551937 22558682 NULL NULL ENSG00000228764
+ENST00000427373 lincRNA 27524447 27540866 NULL NULL ENSG00000228786
+ENST00000434164 antisense 16905522 16915913 NULL NULL ENSG00000228787
+ENST00000440679 unprocessed_pseudogene 24210845 24211859 NULL NULL ENSG00000228850
+ENST00000430228 lincRNA 9555262 9558905 NULL NULL ENSG00000228890
+ENST00000457222 protein_coding 9236030 9238826 9236076 9238638 ENSG00000228927
+ENST00000424594 protein_coding 9236076 9238825 9236076 9237891 ENSG00000228927
+ENST00000491844 processed_transcript 9236076 9238832 NULL NULL ENSG00000228927
+ENST00000469322 processed_transcript 9237436 9238817 NULL NULL ENSG00000228927
+ENST00000440483 protein_coding 9236030 9307357 9236076 9307170 ENSG00000228927
+ENST00000427871 protein_coding 9236076 9238832 9236076 9237587 ENSG00000228927
+ENST00000426660 unprocessed_pseudogene 20442064 20446647 NULL NULL ENSG00000228945
+ENST00000415776 processed_pseudogene 19868881 19870005 NULL NULL ENSG00000229129
+ENST00000413320 unprocessed_pseudogene 19993979 19995588 NULL NULL ENSG00000229138
+ENST00000440136 unprocessed_pseudogene 24329997 24332168 NULL NULL ENSG00000229159
+ENST00000414182 processed_pseudogene 2797042 2799161 NULL NULL ENSG00000229163
+ENST00000434454 unprocessed_pseudogene 9669027 9684305 NULL NULL ENSG00000229208
+ENST00000423480 unprocessed_pseudogene 24355478 24362727 NULL NULL ENSG00000229234
+ENST00000455084 lincRNA 22627554 22681114 NULL NULL ENSG00000229236
+ENST00000439472 lincRNA 22669140 22680293 NULL NULL ENSG00000229236
+ENST00000367272 unprocessed_pseudogene 28424070 28500565 NULL NULL ENSG00000229238
+ENST00000431405 unprocessed_pseudogene 26227851 26236493 NULL NULL ENSG00000229250
+ENST00000454958 processed_pseudogene 20438493 20439381 NULL NULL ENSG00000229302
+ENST00000426699 lincRNA 3904538 3968361 NULL NULL ENSG00000229308
+ENST00000451423 unprocessed_pseudogene 27959160 27960713 NULL NULL ENSG00000229343
+ENST00000455273 unprocessed_pseudogene 20615036 20634023 NULL NULL ENSG00000229406
+ENST00000441913 unprocessed_pseudogene 23839132 23843004 NULL NULL ENSG00000229416
+ENST00000425411 processed_pseudogene 20309770 20310894 NULL NULL ENSG00000229465
+ENST00000439651 processed_pseudogene 3734347 3734763 NULL NULL ENSG00000229518
+ENST00000287721 protein_coding 9195406 9198202 9195452 9198014 ENSG00000229549
+ENST00000383000 protein_coding 9195452 9198201 9195452 9197267 ENSG00000229549
+ENST00000477879 processed_transcript 9195452 9198208 NULL NULL ENSG00000229549
+ENST00000436159 processed_transcript 9196812 9198193 NULL NULL ENSG00000229549
+ENST00000383005 protein_coding 9195406 9218479 9195452 9218292 ENSG00000229549
+ENST00000330628 protein_coding 9195452 9218479 9195452 9218292 ENSG00000229549
+ENST00000537415 protein_coding 9195501 9197317 9195758 9197317 ENSG00000229549
+ENST00000449858 processed_pseudogene 23023223 23024223 NULL NULL ENSG00000229551
+ENST00000444617 unprocessed_pseudogene 24195902 24204260 NULL NULL ENSG00000229553
+ENST00000439525 lincRNA 6225260 6229454 NULL NULL ENSG00000229643
+ENST00000424581 unprocessed_pseudogene 27725937 27734591 NULL NULL ENSG00000229709
+ENST00000414540 unprocessed_pseudogene 24017478 24019697 NULL NULL ENSG00000229725
+ENST00000422208 unprocessed_pseudogene 7801038 7805681 NULL NULL ENSG00000229745
+ENST00000427496 unprocessed_pseudogene 24452072 24454098 NULL NULL ENSG00000229940
+ENST00000434308 unprocessed_pseudogene 8839792 8842136 NULL NULL ENSG00000230025
+ENST00000447168 unprocessed_pseudogene 23858456 23860106 NULL NULL ENSG00000230029
+ENST00000427622 antisense 9334896 9342768 NULL NULL ENSG00000230066
+ENST00000414596 antisense 9336950 9342550 NULL NULL ENSG00000230066
+ENST00000444350 unprocessed_pseudogene 25669468 25671516 NULL NULL ENSG00000230073
+ENST00000439968 processed_pseudogene 20694212 20694542 NULL NULL ENSG00000230377
+ENST00000458706 processed_pseudogene 20230369 20230696 NULL NULL ENSG00000230412
+ENST00000446387 unprocessed_pseudogene 20740303 20740446 NULL NULL ENSG00000230458
+ENST00000417699 unprocessed_pseudogene 24727136 24760228 NULL NULL ENSG00000230476
+ENST00000458667 lincRNA 19664680 19691634 NULL NULL ENSG00000230663
+ENST00000424230 unprocessed_pseudogene 24908998 24915934 NULL NULL ENSG00000230727
+ENST00000434374 unprocessed_pseudogene 24674428 24682786 NULL NULL ENSG00000230814
+ENST00000451173 processed_pseudogene 27164646 27165450 NULL NULL ENSG00000230819
+ENST00000419224 unprocessed_pseudogene 28069535 28075973 NULL NULL ENSG00000230854
+ENST00000432915 unprocessed_pseudogene 20973224 20973715 NULL NULL ENSG00000230904
+ENST00000418278 unprocessed_pseudogene 26328624 26329218 NULL NULL ENSG00000230977
+ENST00000414395 unprocessed_pseudogene 26063388 26063870 NULL NULL ENSG00000231026
+ENST00000417334 lincRNA 27874637 27879535 NULL NULL ENSG00000231141
+ENST00000449148 unprocessed_pseudogene 24118460 24151496 NULL NULL ENSG00000231159
+ENST00000416652 unprocessed_pseudogene 20323252 20338370 NULL NULL ENSG00000231311
+ENST00000430062 processed_pseudogene 5075256 5076110 NULL NULL ENSG00000231341
+ENST00000414121 unprocessed_pseudogene 26196025 26197634 NULL NULL ENSG00000231375
+ENST00000446466 unprocessed_pseudogene 10009199 10010334 NULL NULL ENSG00000231411
+ENST00000437794 unprocessed_pseudogene 26102716 26106365 NULL NULL ENSG00000231423
+ENST00000449381 unprocessed_pseudogene 9448180 9458885 NULL NULL ENSG00000231436
+ENST00000435741 processed_pseudogene 28772667 28773306 NULL NULL ENSG00000231514
+ENST00000444263 lincRNA 2870953 2965258 NULL NULL ENSG00000231535
+ENST00000425031 lincRNA 2871039 2970313 NULL NULL ENSG00000231535
+ENST00000418624 processed_pseudogene 25954968 25955206 NULL NULL ENSG00000231540
+ENST00000451661 unprocessed_pseudogene 28140831 28141844 NULL NULL ENSG00000231716
+ENST00000441642 unprocessed_pseudogene 9707273 9708390 NULL NULL ENSG00000231874
+ENST00000453312 unprocessed_pseudogene 8898635 8908029 NULL NULL ENSG00000231988
+ENST00000443152 processed_pseudogene 26646410 26647652 NULL NULL ENSG00000232003
+ENST00000442090 processed_pseudogene 24214967 24215298 NULL NULL ENSG00000232029
+ENST00000412220 unprocessed_pseudogene 27736470 27738492 NULL NULL ENSG00000232064
+ENST00000430735 processed_pseudogene 2696023 2696259 NULL NULL ENSG00000232195
+ENST00000438294 unprocessed_pseudogene 26250254 26252165 NULL NULL ENSG00000232205
+ENST00000398758 unprocessed_pseudogene 14373025 14378737 NULL NULL ENSG00000232226
+ENST00000454543 processed_pseudogene 9003694 9005311 NULL NULL ENSG00000232235
+ENST00000413486 lincRNA 8506335 8512883 NULL NULL ENSG00000232348
+ENST00000453955 lincRNA 8572513 8573324 NULL NULL ENSG00000232419
+ENST00000445112 unprocessed_pseudogene 25886405 25892840 NULL NULL ENSG00000232424
+ENST00000453726 unprocessed_pseudogene 24194701 24195494 NULL NULL ENSG00000232475
+ENST00000450781 unprocessed_pseudogene 22163071 22174090 NULL NULL ENSG00000232522
+ENST00000434155 unprocessed_pseudogene 6968774 6974543 NULL NULL ENSG00000232583
+ENST00000427963 unprocessed_pseudogene 26118927 26141228 NULL NULL ENSG00000232585
+ENST00000429406 unprocessed_pseudogene 27876158 27881307 NULL NULL ENSG00000232614
+ENST00000426129 unprocessed_pseudogene 9502838 9506619 NULL NULL ENSG00000232617
+ENST00000450329 unprocessed_pseudogene 6388504 6388970 NULL NULL ENSG00000232620
+ENST00000436364 processed_pseudogene 23382992 23383975 NULL NULL ENSG00000232634
+ENST00000414751 processed_pseudogene 28007090 28007415 NULL NULL ENSG00000232695
+ENST00000435433 processed_pseudogene 14442350 14443562 NULL NULL ENSG00000232730
+ENST00000430287 unprocessed_pseudogene 20283081 20285685 NULL NULL ENSG00000232744
+ENST00000436338 processed_pseudogene 25171741 25172810 NULL NULL ENSG00000232764
+ENST00000434487 antisense 9167489 9172441 NULL NULL ENSG00000232808
+ENST00000434556 unprocessed_pseudogene 25908985 25911730 NULL NULL ENSG00000232845
+ENST00000451397 unprocessed_pseudogene 20344721 20346300 NULL NULL ENSG00000232899
+ENST00000449393 unprocessed_pseudogene 25897335 25897907 NULL NULL ENSG00000232910
+ENST00000421750 unprocessed_pseudogene 28050665 28053397 NULL NULL ENSG00000232914
+ENST00000420090 processed_pseudogene 9068524 9068856 NULL NULL ENSG00000232924
+ENST00000427677 processed_pseudogene 3550846 3551909 NULL NULL ENSG00000232927
+ENST00000425589 processed_pseudogene 26805484 26806553 NULL NULL ENSG00000232976
+ENST00000417305 antisense 2834885 2870667 NULL NULL ENSG00000233070
+ENST00000431145 antisense 2869669 2870612 NULL NULL ENSG00000233070
+ENST00000442391 unprocessed_pseudogene 20278413 20279581 NULL NULL ENSG00000233120
+ENST00000428264 processed_pseudogene 25195394 25197342 NULL NULL ENSG00000233126
+ENST00000437934 unprocessed_pseudogene 28157582 28158384 NULL NULL ENSG00000233156
+ENST00000453983 unprocessed_pseudogene 20096229 20105140 NULL NULL ENSG00000233378
+ENST00000419557 lincRNA 20488139 20515096 NULL NULL ENSG00000233522
+ENST00000451909 unprocessed_pseudogene 20691244 20691509 NULL NULL ENSG00000233546
+ENST00000437571 processed_pseudogene 27633431 27633809 NULL NULL ENSG00000233619
+ENST00000433995 processed_pseudogene 6705748 6706293 NULL NULL ENSG00000233634
+ENST00000416803 processed_pseudogene 27535139 27537958 NULL NULL ENSG00000233652
+ENST00000438677 lincRNA 8551411 8551919 NULL NULL ENSG00000233699
+ENST00000420675 processed_pseudogene 26424484 26427287 NULL NULL ENSG00000233740
+ENST00000430517 unprocessed_pseudogene 14730915 14746333 NULL NULL ENSG00000233774
+ENST00000426950 protein_coding 9175073 9177887 9175119 9177699 ENSG00000233803
+ENST00000383008 protein_coding 9175119 9177886 9175119 9176952 ENSG00000233803
+ENST00000466036 processed_transcript 9175119 9177893 NULL NULL ENSG00000233803
+ENST00000482082 processed_transcript 9176497 9177878 NULL NULL ENSG00000233803
+ENST00000417124 processed_pseudogene 28546758 28547377 NULL NULL ENSG00000233843
+ENST00000457658 lincRNA 14774265 14802370 NULL NULL ENSG00000233864
+ENST00000440408 lincRNA 14774265 14804162 NULL NULL ENSG00000233864
+ENST00000417071 lincRNA 14774468 14800184 NULL NULL ENSG00000233864
+ENST00000543097 lincRNA 14774284 14776614 NULL NULL ENSG00000233864
+ENST00000447588 processed_pseudogene 27539301 27540203 NULL NULL ENSG00000233944
+ENST00000412870 unprocessed_pseudogene 15042075 15060090 NULL NULL ENSG00000234059
+ENST00000440624 processed_pseudogene 26153058 26153384 NULL NULL ENSG00000234081
+ENST00000429463 unprocessed_pseudogene 9461792 9463961 NULL NULL ENSG00000234110
+ENST00000439309 processed_pseudogene 24663389 24663720 NULL NULL ENSG00000234131
+ENST00000454315 processed_pseudogene 8232073 8233191 NULL NULL ENSG00000234179
+ENST00000452257 processed_pseudogene 13901758 13903233 NULL NULL ENSG00000234385
+ENST00000447471 unprocessed_pseudogene 26542751 26549673 NULL NULL ENSG00000234399
+ENST00000382707 protein_coding 23696765 23711212 23698778 23710934 ENSG00000234414
+ENST00000361046 nonsense_mediated_decay 23696790 23711212 23698778 23704609 ENSG00000234414
+ENST00000303902 protein_coding 23698778 23711210 23698778 23710934 ENSG00000234414
+ENST00000439108 protein_coding 23673258 23711210 23706817 23710934 ENSG00000234414
+ENST00000431768 processed_pseudogene 21489455 21490459 NULL NULL ENSG00000234529
+ENST00000418461 unprocessed_pseudogene 6171998 6173115 NULL NULL ENSG00000234583
+ENST00000421058 unprocessed_pseudogene 17460542 17567954 NULL NULL ENSG00000234620
+ENST00000455855 processed_pseudogene 3161849 3162867 NULL NULL ENSG00000234652
+ENST00000421008 unprocessed_pseudogene 28148401 28155003 NULL NULL ENSG00000234744
+ENST00000455527 transcribed_unprocessed_pseudogene 7581026 7644748 NULL NULL ENSG00000234795
+ENST00000442584 processed_transcript 7610636 7677302 NULL NULL ENSG00000234795
+ENST00000448006 antisense 9355208 9363096 NULL NULL ENSG00000234803
+ENST00000418016 antisense 9357275 9362877 NULL NULL ENSG00000234803
+ENST00000598351 antisense 9362384 9364506 NULL NULL ENSG00000234803
+ENST00000451062 processed_transcript 6124308 6131994 NULL NULL ENSG00000234830
+ENST00000452103 processed_transcript 6126374 6131976 NULL NULL ENSG00000234830
+ENST00000505707 transcribed_unprocessed_pseudogene 6130047 6131976 NULL NULL ENSG00000234830
+ENST00000452458 processed_pseudogene 8240282 8240751 NULL NULL ENSG00000234850
+ENST00000411536 unprocessed_pseudogene 28201926 28234640 NULL NULL ENSG00000234888
+ENST00000447105 unprocessed_pseudogene 9789162 9797032 NULL NULL ENSG00000234950
+ENST00000422633 processed_pseudogene 5205786 5207005 NULL NULL ENSG00000235001
+ENST00000545933 processed_pseudogene 5205788 5206981 NULL NULL ENSG00000235001
+ENST00000430729 unprocessed_pseudogene 27869163 27870333 NULL NULL ENSG00000235004
+ENST00000439103 unprocessed_pseudogene 28344478 28354295 NULL NULL ENSG00000235014
+ENST00000446299 lincRNA 24585087 24630861 NULL NULL ENSG00000235059
+ENST00000439653 lincRNA 24626684 24631739 NULL NULL ENSG00000235059
+ENST00000445200 unprocessed_pseudogene 6174083 6175961 NULL NULL ENSG00000235094
+ENST00000411983 processed_pseudogene 5661341 5661778 NULL NULL ENSG00000235175
+ENST00000434709 unprocessed_pseudogene 9925635 9927195 NULL NULL ENSG00000235193
+ENST00000420149 lincRNA 26716349 26753172 NULL NULL ENSG00000235412
+ENST00000458328 unprocessed_pseudogene 16192955 16198480 NULL NULL ENSG00000235451
+ENST00000452645 transcribed_unprocessed_pseudogene 15265842 15274125 NULL NULL ENSG00000235462
+ENST00000439217 processed_transcript 15271184 15273460 NULL NULL ENSG00000235462
+ENST00000415230 unprocessed_pseudogene 20811557 20812165 NULL NULL ENSG00000235479
+ENST00000420889 unprocessed_pseudogene 28018078 28044800 NULL NULL ENSG00000235511
+ENST00000418455 unprocessed_pseudogene 19900195 19901364 NULL NULL ENSG00000235521
+ENST00000421819 unprocessed_pseudogene 27658448 27673933 NULL NULL ENSG00000235583
+ENST00000420610 unprocessed_pseudogene 14077914 14108092 NULL NULL ENSG00000235649
+ENST00000416843 unprocessed_pseudogene 9928411 9928816 NULL NULL ENSG00000235691
+ENST00000425318 unprocessed_pseudogene 23627270 23629049 NULL NULL ENSG00000235719
+ENST00000431853 processed_pseudogene 59001391 59001635 NULL NULL ENSG00000235857
+ENST00000425857 unprocessed_pseudogene 6361017 6364798 NULL NULL ENSG00000235895
+ENST00000445264 unprocessed_pseudogene 26288505 26303990 NULL NULL ENSG00000235981
+ENST00000428342 processed_pseudogene 17019778 17019948 NULL NULL ENSG00000236131
+ENST00000454995 processed_pseudogene 27314746 27315988 NULL NULL ENSG00000236379
+ENST00000428845 protein_coding 9365489 9368285 9365535 9368097 ENSG00000236424
+ENST00000444056 protein_coding 9365535 9368284 9365535 9367350 ENSG00000236424
+ENST00000489397 processed_transcript 9365535 9368291 NULL NULL ENSG00000236424
+ENST00000495839 processed_transcript 9366895 9368276 NULL NULL ENSG00000236424
+ENST00000429039 protein_coding 9365535 9368284 9365535 9367069 ENSG00000236424
+ENST00000432046 unprocessed_pseudogene 20903729 20903872 NULL NULL ENSG00000236429
+ENST00000421279 unprocessed_pseudogene 7555780 7557589 NULL NULL ENSG00000236435
+ENST00000438550 processed_pseudogene 14365457 14366162 NULL NULL ENSG00000236477
+ENST00000436067 processed_pseudogene 20340818 20341146 NULL NULL ENSG00000236599
+ENST00000428616 unprocessed_pseudogene 23567656 23567881 NULL NULL ENSG00000236615
+ENST00000442113 unprocessed_pseudogene 25862069 25872955 NULL NULL ENSG00000236620
+ENST00000444014 unprocessed_pseudogene 25695956 25697532 NULL NULL ENSG00000236647
+ENST00000421675 unprocessed_pseudogene 7558552 7560719 NULL NULL ENSG00000236690
+ENST00000429799 unprocessed_pseudogene 9860098 9871796 NULL NULL ENSG00000236718
+ENST00000420376 unprocessed_pseudogene 9868005 9868142 NULL NULL ENSG00000236718
+ENST00000457163 unprocessed_pseudogene 9385717 9388290 NULL NULL ENSG00000236786
+ENST00000417910 lincRNA 24246961 24252016 NULL NULL ENSG00000236951
+ENST00000419158 lincRNA 24247839 24293631 NULL NULL ENSG00000236951
+ENST00000434481 unprocessed_pseudogene 23823812 23835894 NULL NULL ENSG00000237023
+ENST00000413466 lincRNA 7672965 7678724 NULL NULL ENSG00000237048
+ENST00000451467 lincRNA 6110487 6111651 NULL NULL ENSG00000237069
+ENST00000430307 processed_pseudogene 6026843 6027306 NULL NULL ENSG00000237195
+ENST00000451162 unprocessed_pseudogene 23592583 23599131 NULL NULL ENSG00000237269
+ENST00000418213 unprocessed_pseudogene 25917379 25944297 NULL NULL ENSG00000237302
+ENST00000418578 processed_pseudogene 23292756 23293067 NULL NULL ENSG00000237427
+ENST00000425026 processed_pseudogene 10027986 10029907 NULL NULL ENSG00000237447
+ENST00000435696 unprocessed_pseudogene 26223946 26225959 NULL NULL ENSG00000237467
+ENST00000440468 unprocessed_pseudogene 28089415 28100299 NULL NULL ENSG00000237546
+ENST00000429883 unprocessed_pseudogene 20063469 20065380 NULL NULL ENSG00000237558
+ENST00000421353 lincRNA 6311475 6315118 NULL NULL ENSG00000237563
+ENST00000452432 unprocessed_pseudogene 20107081 20109132 NULL NULL ENSG00000237616
+ENST00000454281 processed_pseudogene 2657868 2658369 NULL NULL ENSG00000237659
+ENST00000431631 processed_pseudogene 6768794 6769413 NULL NULL ENSG00000237701
+ENST00000442145 antisense 9225731 9233636 NULL NULL ENSG00000237802
+ENST00000450658 antisense 9227800 9233417 NULL NULL ENSG00000237802
+ENST00000411668 unprocessed_pseudogene 27710269 27712180 NULL NULL ENSG00000237823
+ENST00000433481 unprocessed_pseudogene 24546550 24548715 NULL NULL ENSG00000237902
+ENST00000435945 unprocessed_pseudogene 28740998 28780799 NULL NULL ENSG00000237917
+ENST00000426526 unprocessed_pseudogene 24086159 24087740 NULL NULL ENSG00000237968
+ENST00000422174 processed_pseudogene 22148461 22149738 NULL NULL ENSG00000237997
+ENST00000423438 unprocessed_pseudogene 20670463 20670954 NULL NULL ENSG00000238067
+ENST00000457961 unprocessed_pseudogene 7539784 7544065 NULL NULL ENSG00000238073
+ENST00000440215 protein_coding 9324922 9327689 9324922 9327502 ENSG00000238074
+ENST00000446779 protein_coding 9324922 9327689 9324922 9326349 ENSG00000238074
+ENST00000454643 unprocessed_pseudogene 21010150 21029166 NULL NULL ENSG00000238088
+ENST00000454868 unprocessed_pseudogene 20987993 20991373 NULL NULL ENSG00000238135
+ENST00000424401 unprocessed_pseudogene 9027238 9042220 NULL NULL ENSG00000238154
+ENST00000435012 unprocessed_pseudogene 19733139 19737712 NULL NULL ENSG00000238191
+ENST00000442607 unprocessed_pseudogene 6134634 6137316 NULL NULL ENSG00000238235
+ENST00000452889 lincRNA 9748407 9749571 NULL NULL ENSG00000239225
+ENST00000419538 processed_pseudogene 27632787 27633469 NULL NULL ENSG00000239304
+ENST00000398377 processed_transcript 26356114 26360978 NULL NULL ENSG00000239533
+ENST00000516761 miRNA 26356197 26356279 NULL NULL ENSG00000239533
+ENST00000439586 processed_pseudogene 7936937 7938764 NULL NULL ENSG00000239893
+ENST00000447585 unprocessed_pseudogene 20743949 20790963 NULL NULL ENSG00000240438
+ENST00000306641 lincRNA 27629055 27632852 NULL NULL ENSG00000240450
+ENST00000432862 processed_pseudogene 26796955 26797681 NULL NULL ENSG00000240566
+ENST00000425158 processed_pseudogene 7859028 7859847 NULL NULL ENSG00000241200
+ENST00000472227 transcribed_unprocessed_pseudogene 15863536 16027704 NULL NULL ENSG00000241859
+ENST00000430079 processed_transcript 15863673 15970474 NULL NULL ENSG00000241859
+ENST00000460561 processed_transcript 15864978 15983586 NULL NULL ENSG00000241859
+ENST00000451061 transcribed_unprocessed_pseudogene 20835919 20899975 NULL NULL ENSG00000242153
+ENST00000432335 processed_transcript 20891768 20901083 NULL NULL ENSG00000242153
+ENST00000538268 transcribed_unprocessed_pseudogene 20893577 20901083 NULL NULL ENSG00000242153
+ENST00000358944 nonsense_mediated_decay 24049765 24064189 24056368 24062201 ENSG00000242389
+ENST00000382659 protein_coding 24049765 24064214 24050043 24062201 ENSG00000242389
+ENST00000382673 protein_coding 24026223 24064174 24026501 24062201 ENSG00000242389
+ENST00000382658 protein_coding 24049997 24064127 24050043 24062201 ENSG00000242389
+ENST00000456659 unprocessed_pseudogene 23670185 23672356 NULL NULL ENSG00000242393
+ENST00000485099 misc_RNA 26357107 26357382 NULL NULL ENSG00000242425
+ENST00000441091 unprocessed_pseudogene 26325463 26329652 NULL NULL ENSG00000242854
+ENST00000383020 protein_coding 23673224 23687672 23675237 23687394 ENSG00000242875
+ENST00000382639 nonsense_mediated_decay 23673249 23687672 23675237 23681068 ENSG00000242875
+ENST00000437993 unprocessed_pseudogene 6175751 6177628 NULL NULL ENSG00000242879
+ENST00000303922 processed_transcript 24455006 24462352 NULL NULL ENSG00000243040
+ENST00000420346 transcribed_unprocessed_pseudogene 24457011 24467972 NULL NULL ENSG00000243040
+ENST00000450910 unprocessed_pseudogene 9873804 9875984 NULL NULL ENSG00000243643
+ENST00000488394 misc_RNA 14394177 14394465 NULL NULL ENSG00000243980
+ENST00000422002 processed_pseudogene 25163210 25163968 NULL NULL ENSG00000244000
+ENST00000306853 processed_transcript 26329581 26333378 NULL NULL ENSG00000244231
+ENST00000451913 processed_pseudogene 7781463 7782131 NULL NULL ENSG00000244246
+ENST00000382653 nonsense_mediated_decay 24026223 24040648 24032827 24038660 ENSG00000244395
+ENST00000382680 protein_coding 24026223 24040673 24026501 24038660 ENSG00000244395
+ENST00000382677 protein_coding 24026223 24038660 24026501 24038660 ENSG00000244395
+ENST00000418956 protein_coding 24026455 24040586 24026501 24038660 ENSG00000244395
+ENST00000442362 unprocessed_pseudogene 20290496 20298506 NULL NULL ENSG00000244646
+ENST00000535771 pseudogene 20297335 20298913 NULL NULL ENSG00000244646
+ENST00000514804 transcribed_unprocessed_pseudogene 20952595 20952937 NULL NULL ENSG00000248573
+ENST00000509776 unprocessed_pseudogene 26424828 26437493 NULL NULL ENSG00000248792
+ENST00000504503 unprocessed_pseudogene 20938165 20941313 NULL NULL ENSG00000249501
+ENST00000509650 unprocessed_pseudogene 20817406 20818076 NULL NULL ENSG00000249606
+ENST00000511770 unprocessed_pseudogene 20652801 20656181 NULL NULL ENSG00000249634
+ENST00000503144 unprocessed_pseudogene 20548860 20551037 NULL NULL ENSG00000249726
+ENST00000442790 unprocessed_pseudogene 6111336 6113324 NULL NULL ENSG00000250204
+ENST00000510392 unprocessed_pseudogene 19880996 19889280 NULL NULL ENSG00000250868
+ENST00000354494 pseudogene 19880862 19882440 NULL NULL ENSG00000250868
+ENST00000513521 unprocessed_pseudogene 20702866 20706014 NULL NULL ENSG00000250951
+ENST00000506069 unprocessed_pseudogene 6109809 6111670 NULL NULL ENSG00000251275
+ENST00000510613 lincRNA 20653626 20709584 NULL NULL ENSG00000251510
+ENST00000509611 unprocessed_pseudogene 24041541 24043706 NULL NULL ENSG00000251618
+ENST00000515896 rRNA 10037764 10037915 NULL NULL ENSG00000251705
+ENST00000515957 rRNA 9928019 9928137 NULL NULL ENSG00000251766
+ENST00000515987 snoRNA 28393531 28393668 NULL NULL ENSG00000251796
+ENST00000516032 snRNA 2652790 2652894 NULL NULL ENSG00000251841
+ENST00000516070 miRNA 5742287 5742379 NULL NULL ENSG00000251879
+ENST00000516108 snRNA 26092765 26092918 NULL NULL ENSG00000251917
+ENST00000516116 snoRNA 25569246 25569383 NULL NULL ENSG00000251925
+ENST00000516144 rRNA 20508009 20508124 NULL NULL ENSG00000251953
+ENST00000516157 miRNA 10033981 10034093 NULL NULL ENSG00000251966
+ENST00000516161 snRNA 20995615 20995776 NULL NULL ENSG00000251970
+ENST00000516187 misc_RNA 7209572 7209683 NULL NULL ENSG00000251996
+ENST00000516203 rRNA 19671654 19671769 NULL NULL ENSG00000252012
+ENST00000516229 miRNA 19734941 19735035 NULL NULL ENSG00000252038
+ENST00000516250 miRNA 5441969 5442060 NULL NULL ENSG00000252059
+ENST00000516346 snRNA 7246713 7246820 NULL NULL ENSG00000252155
+ENST00000516357 snRNA 20278734 20278893 NULL NULL ENSG00000252166
+ENST00000516364 snRNA 18360815 18360921 NULL NULL ENSG00000252173
+ENST00000516400 snRNA 20648397 20648558 NULL NULL ENSG00000252209
+ENST00000516480 rRNA 9930484 9930602 NULL NULL ENSG00000252289
+ENST00000516506 rRNA 19669744 19669856 NULL NULL ENSG00000252315
+ENST00000516514 snRNA 18448163 18448269 NULL NULL ENSG00000252323
+ENST00000516617 snRNA 27869489 27869642 NULL NULL ENSG00000252426
+ENST00000516630 miRNA 20444739 20444833 NULL NULL ENSG00000252439
+ENST00000516659 snRNA 4887117 4887307 NULL NULL ENSG00000252468
+ENST00000516662 miRNA 27606157 27606239 NULL NULL ENSG00000252471
+ENST00000516663 snRNA 7291095 7291199 NULL NULL ENSG00000252472
+ENST00000516704 snRNA 19900883 19901042 NULL NULL ENSG00000252513
+ENST00000516777 miRNA 14482123 14482230 NULL NULL ENSG00000252586
+ENST00000516816 snRNA 28075126 28075289 NULL NULL ENSG00000252625
+ENST00000516824 misc_RNA 7192338 7192636 NULL NULL ENSG00000252633
+ENST00000516855 miRNA 18174646 18174709 NULL NULL ENSG00000252664
+ENST00000516858 rRNA 20509921 20510033 NULL NULL ENSG00000252667
+ENST00000516872 snRNA 25887089 25887252 NULL NULL ENSG00000252681
+ENST00000516880 snoRNA 18250128 18250259 NULL NULL ENSG00000252689
+ENST00000516885 miRNA 15779840 15779936 NULL NULL ENSG00000252694
+ENST00000516957 snRNA 21180869 21180973 NULL NULL ENSG00000252766
+ENST00000517046 miRNA 13340551 13340633 NULL NULL ENSG00000252855
+ENST00000517091 snRNA 4043026 4043131 NULL NULL ENSG00000252900
+ENST00000517139 snRNA 28507136 28507239 NULL NULL ENSG00000252948
+ENST00000527562 lincRNA 23200175 23206610 NULL NULL ENSG00000254488
+ENST00000555130 unprocessed_pseudogene 13462594 13463857 NULL NULL ENSG00000258567
+ENST00000557448 unprocessed_pseudogene 13488005 13489271 NULL NULL ENSG00000258991
+ENST00000451548 protein_coding 9304564 9307357 9304610 9307170 ENSG00000258992
+ENST00000423647 protein_coding 9236076 9307357 9236076 9307170 ENSG00000258992
+ENST00000553347 unprocessed_pseudogene 13477233 13478499 NULL NULL ENSG00000259029
+ENST00000557360 unprocessed_pseudogene 13470597 13471863 NULL NULL ENSG00000259154
+ENST00000558356 unprocessed_pseudogene 24476599 24478647 NULL NULL ENSG00000259247
+ENST00000566193 lincRNA 21853827 21856492 NULL NULL ENSG00000260197
+ENST00000584011 miRNA 13340359 13340440 NULL NULL ENSG00000263502
+ENST00000580394 miRNA 13947443 13947512 NULL NULL ENSG00000265161
+ENST00000584045 miRNA 18398127 18398238 NULL NULL ENSG00000265197
+ENST00000578366 miRNA 16364065 16364171 NULL NULL ENSG00000266220
+ENST00000586015 transcribed_processed_pseudogene 21760074 21760643 NULL NULL ENSG00000267793
+ENST00000601700 protein_coding 25847479 25850592 25847479 25850592 ENSG00000267935
+ENST00000599485 protein_coding 21737895 21738068 21737895 21738068 ENSG00000269084
+ENST00000595988 protein_coding 15418467 15429181 15418467 15429181 ENSG00000269291
+ENST00000598545 protein_coding 28111776 28114889 28111776 28114889 ENSG00000269393
+ENST00000601705 protein_coding 5306691 5312605 5306691 5312605 ENSG00000269464
+ENST00000448518 unprocessed_pseudogene 9345205 9347784 NULL NULL ENSG00000270073
+ENST00000605584 processed_pseudogene 13551375 13552752 NULL NULL ENSG00000270242
+ENST00000603738 processed_pseudogene 13491303 13493369 NULL NULL ENSG00000270455
+ENST00000604436 processed_pseudogene 27809047 27809373 NULL NULL ENSG00000270535
+ENST00000603467 processed_pseudogene 13263395 13263563 NULL NULL ENSG00000270570
+ENST00000604924 processed_pseudogene 23793569 23793901 NULL NULL ENSG00000271123
+ENST00000604370 processed_pseudogene 13263065 13263272 NULL NULL ENSG00000271309
+ENST00000605663 processed_pseudogene 13262741 13262939 NULL NULL ENSG00000271365
+ENST00000604178 processed_pseudogene 13629403 13629913 NULL NULL ENSG00000271375
+ENST00000604289 processed_pseudogene 19576759 19577094 NULL NULL ENSG00000271595
+ENST00000606439 misc_RNA 27605054 27605329 NULL NULL ENSG00000272042
diff --git a/inst/chrY/ens_tx2exon.txt b/inst/chrY/ens_tx2exon.txt
new file mode 100644
index 0000000..8c8c361
--- /dev/null
+++ b/inst/chrY/ens_tx2exon.txt
@@ -0,0 +1,3745 @@
+tx_id exon_id exon_idx
+ENST00000469599 ENSE00001902471 1
+ENST00000469599 ENSE00003665408 2
+ENST00000469599 ENSE00003572891 3
+ENST00000469599 ENSE00003452486 4
+ENST00000469599 ENSE00003535737 5
+ENST00000469599 ENSE00003306433 6
+ENST00000469599 ENSE00003587057 7
+ENST00000469599 ENSE00003522786 8
+ENST00000469599 ENSE00003674105 9
+ENST00000469599 ENSE00003605358 10
+ENST00000469599 ENSE00001928834 11
+ENST00000469599 ENSE00001954294 12
+ENST00000317961 ENSE00001733393 1
+ENST00000317961 ENSE00000891759 2
+ENST00000317961 ENSE00000652508 3
+ENST00000317961 ENSE00000652506 4
+ENST00000317961 ENSE00001788914 5
+ENST00000317961 ENSE00001805865 6
+ENST00000317961 ENSE00000652503 7
+ENST00000317961 ENSE00000652502 8
+ENST00000317961 ENSE00000652501 9
+ENST00000317961 ENSE00001799734 10
+ENST00000317961 ENSE00000652498 11
+ENST00000317961 ENSE00001786866 12
+ENST00000317961 ENSE00003497148 13
+ENST00000317961 ENSE00003688313 14
+ENST00000317961 ENSE00003664560 15
+ENST00000317961 ENSE00003602862 16
+ENST00000317961 ENSE00003491684 17
+ENST00000317961 ENSE00003488142 18
+ENST00000317961 ENSE00003534652 19
+ENST00000317961 ENSE00003607600 20
+ENST00000317961 ENSE00003544083 21
+ENST00000317961 ENSE00003591494 22
+ENST00000317961 ENSE00003484526 23
+ENST00000317961 ENSE00003539670 24
+ENST00000317961 ENSE00001670444 25
+ENST00000317961 ENSE00001594027 26
+ENST00000317961 ENSE00001945799 27
+ENST00000382806 ENSE00001277406 1
+ENST00000382806 ENSE00000891759 2
+ENST00000382806 ENSE00000652508 3
+ENST00000382806 ENSE00000652506 4
+ENST00000382806 ENSE00001805865 5
+ENST00000382806 ENSE00000652503 6
+ENST00000382806 ENSE00000652502 7
+ENST00000382806 ENSE00000652501 8
+ENST00000382806 ENSE00001799734 9
+ENST00000382806 ENSE00000652498 10
+ENST00000382806 ENSE00001786866 11
+ENST00000382806 ENSE00003497148 12
+ENST00000382806 ENSE00003688313 13
+ENST00000382806 ENSE00003664560 14
+ENST00000382806 ENSE00003602862 15
+ENST00000382806 ENSE00003491684 16
+ENST00000382806 ENSE00003488142 17
+ENST00000382806 ENSE00003534652 18
+ENST00000382806 ENSE00003607600 19
+ENST00000382806 ENSE00003544083 20
+ENST00000382806 ENSE00003591494 21
+ENST00000382806 ENSE00003484526 22
+ENST00000382806 ENSE00003539670 23
+ENST00000382806 ENSE00001670444 24
+ENST00000382806 ENSE00001594027 25
+ENST00000382806 ENSE00001919342 26
+ENST00000492117 ENSE00001946630 1
+ENST00000492117 ENSE00003582949 2
+ENST00000492117 ENSE00003593440 3
+ENST00000492117 ENSE00003665408 4
+ENST00000492117 ENSE00001900202 5
+ENST00000492117 ENSE00003677957 6
+ENST00000492117 ENSE00003607967 7
+ENST00000492117 ENSE00003535737 8
+ENST00000492117 ENSE00003484073 9
+ENST00000492117 ENSE00003587057 10
+ENST00000492117 ENSE00003522786 11
+ENST00000492117 ENSE00003674105 12
+ENST00000492117 ENSE00003605358 13
+ENST00000492117 ENSE00001928834 14
+ENST00000492117 ENSE00001941473 15
+ENST00000440077 ENSE00001733393 1
+ENST00000440077 ENSE00000891759 2
+ENST00000440077 ENSE00000652508 3
+ENST00000440077 ENSE00001788914 4
+ENST00000440077 ENSE00001805865 5
+ENST00000440077 ENSE00000652503 6
+ENST00000440077 ENSE00000652502 7
+ENST00000440077 ENSE00000652501 8
+ENST00000440077 ENSE00001799734 9
+ENST00000440077 ENSE00000652498 10
+ENST00000440077 ENSE00001786866 11
+ENST00000440077 ENSE00003497148 12
+ENST00000440077 ENSE00003688313 13
+ENST00000440077 ENSE00003664560 14
+ENST00000440077 ENSE00003602862 15
+ENST00000440077 ENSE00003491684 16
+ENST00000440077 ENSE00003488142 17
+ENST00000440077 ENSE00003534652 18
+ENST00000440077 ENSE00003607600 19
+ENST00000440077 ENSE00003544083 20
+ENST00000440077 ENSE00003591494 21
+ENST00000440077 ENSE00003484526 22
+ENST00000440077 ENSE00003539670 23
+ENST00000440077 ENSE00001670444 24
+ENST00000440077 ENSE00001594027 25
+ENST00000440077 ENSE00001741626 26
+ENST00000415360 ENSE00001692290 1
+ENST00000415360 ENSE00001803510 2
+ENST00000415360 ENSE00003484526 3
+ENST00000415360 ENSE00001658853 4
+ENST00000485154 ENSE00001849878 1
+ENST00000485154 ENSE00001940004 2
+ENST00000478891 ENSE00001928223 1
+ENST00000478891 ENSE00003572891 2
+ENST00000478891 ENSE00001930174 3
+ENST00000447300 ENSE00001685428 1
+ENST00000447300 ENSE00000891759 2
+ENST00000447300 ENSE00000652508 3
+ENST00000447300 ENSE00000652506 4
+ENST00000447300 ENSE00001788914 5
+ENST00000447300 ENSE00000652503 6
+ENST00000447300 ENSE00000652502 7
+ENST00000447300 ENSE00000652501 8
+ENST00000447300 ENSE00001799734 9
+ENST00000447300 ENSE00000652498 10
+ENST00000447300 ENSE00001652491 11
+ENST00000541639 ENSE00002279676 1
+ENST00000541639 ENSE00000891759 2
+ENST00000541639 ENSE00000652508 3
+ENST00000541639 ENSE00000652506 4
+ENST00000541639 ENSE00001788914 5
+ENST00000541639 ENSE00001805865 6
+ENST00000541639 ENSE00000652503 7
+ENST00000541639 ENSE00000652502 8
+ENST00000541639 ENSE00000652501 9
+ENST00000541639 ENSE00001799734 10
+ENST00000541639 ENSE00000652498 11
+ENST00000541639 ENSE00002207746 12
+ENST00000541639 ENSE00001786866 13
+ENST00000541639 ENSE00003497148 14
+ENST00000541639 ENSE00003688313 15
+ENST00000541639 ENSE00003664560 16
+ENST00000541639 ENSE00003602862 17
+ENST00000541639 ENSE00003491684 18
+ENST00000541639 ENSE00003488142 19
+ENST00000541639 ENSE00003534652 20
+ENST00000541639 ENSE00003607600 21
+ENST00000541639 ENSE00003544083 22
+ENST00000541639 ENSE00003591494 23
+ENST00000541639 ENSE00003484526 24
+ENST00000541639 ENSE00003539670 25
+ENST00000541639 ENSE00001670444 26
+ENST00000541639 ENSE00001594027 27
+ENST00000541639 ENSE00002285913 28
+ENST00000360160 ENSE00001402535 1
+ENST00000360160 ENSE00001403038 2
+ENST00000360160 ENSE00003463722 3
+ENST00000360160 ENSE00001605860 4
+ENST00000360160 ENSE00000862006 5
+ENST00000360160 ENSE00000773394 6
+ENST00000360160 ENSE00003574309 7
+ENST00000360160 ENSE00003652139 8
+ENST00000360160 ENSE00003536211 9
+ENST00000360160 ENSE00001614446 10
+ENST00000360160 ENSE00001746396 11
+ENST00000360160 ENSE00000773388 12
+ENST00000360160 ENSE00003534942 13
+ENST00000360160 ENSE00000773386 14
+ENST00000360160 ENSE00001703401 15
+ENST00000360160 ENSE00001729063 16
+ENST00000360160 ENSE00001722940 17
+ENST00000360160 ENSE00001928277 18
+ENST00000454054 ENSE00001685858 1
+ENST00000454054 ENSE00001350052 2
+ENST00000454054 ENSE00003463722 3
+ENST00000454054 ENSE00001605860 4
+ENST00000454054 ENSE00000862006 5
+ENST00000454054 ENSE00000773394 6
+ENST00000454054 ENSE00003574309 7
+ENST00000454054 ENSE00003656145 8
+ENST00000336079 ENSE00001832940 1
+ENST00000336079 ENSE00003463722 2
+ENST00000336079 ENSE00001605860 3
+ENST00000336079 ENSE00000862006 4
+ENST00000336079 ENSE00000773394 5
+ENST00000336079 ENSE00003574309 6
+ENST00000336079 ENSE00003652139 7
+ENST00000336079 ENSE00003536211 8
+ENST00000336079 ENSE00001614446 9
+ENST00000336079 ENSE00001746396 10
+ENST00000336079 ENSE00000773388 11
+ENST00000336079 ENSE00003534942 12
+ENST00000336079 ENSE00000773386 13
+ENST00000336079 ENSE00001703401 14
+ENST00000336079 ENSE00001729063 15
+ENST00000336079 ENSE00001722940 16
+ENST00000336079 ENSE00001352037 17
+ENST00000493363 ENSE00001826669 1
+ENST00000493363 ENSE00003643575 2
+ENST00000493363 ENSE00001906809 3
+ENST00000440554 ENSE00001797666 1
+ENST00000440554 ENSE00003463722 2
+ENST00000440554 ENSE00001605860 3
+ENST00000440554 ENSE00000862006 4
+ENST00000440554 ENSE00000773394 5
+ENST00000440554 ENSE00003574309 6
+ENST00000440554 ENSE00003652139 7
+ENST00000440554 ENSE00003670172 8
+ENST00000469101 ENSE00001937992 1
+ENST00000469101 ENSE00001848230 2
+ENST00000472510 ENSE00003512200 1
+ENST00000472510 ENSE00003537802 2
+ENST00000472510 ENSE00001948256 3
+ENST00000463199 ENSE00001935991 1
+ENST00000463199 ENSE00003537802 2
+ENST00000463199 ENSE00003534699 3
+ENST00000463199 ENSE00001893350 4
+ENST00000495478 ENSE00001811040 1
+ENST00000495478 ENSE00003693954 2
+ENST00000495478 ENSE00001922429 3
+ENST00000383052 ENSE00001803243 1
+ENST00000383052 ENSE00002223884 2
+ENST00000383052 ENSE00003645989 3
+ENST00000383052 ENSE00003548678 4
+ENST00000383052 ENSE00003611496 5
+ENST00000383052 ENSE00001649504 6
+ENST00000383052 ENSE00001777381 7
+ENST00000383052 ENSE00001494540 8
+ENST00000469869 ENSE00001857352 1
+ENST00000469869 ENSE00003626126 2
+ENST00000469869 ENSE00003631374 3
+ENST00000469869 ENSE00001955114 4
+ENST00000443793 ENSE00001334555 1
+ENST00000443793 ENSE00002223884 2
+ENST00000443793 ENSE00001597745 3
+ENST00000478783 ENSE00001900413 1
+ENST00000478783 ENSE00001880607 2
+ENST00000431102 ENSE00001746216 1
+ENST00000431102 ENSE00002223884 2
+ENST00000431102 ENSE00003548678 3
+ENST00000431102 ENSE00003611496 4
+ENST00000431102 ENSE00001649504 5
+ENST00000431102 ENSE00001777381 6
+ENST00000431102 ENSE00001368923 7
+ENST00000155093 ENSE00001648585 1
+ENST00000155093 ENSE00002223884 2
+ENST00000155093 ENSE00003645989 3
+ENST00000155093 ENSE00003548678 4
+ENST00000155093 ENSE00003611496 5
+ENST00000155093 ENSE00001649504 6
+ENST00000155093 ENSE00001777381 7
+ENST00000155093 ENSE00001368923 8
+ENST00000449237 ENSE00001648585 1
+ENST00000449237 ENSE00003604519 2
+ENST00000449237 ENSE00003548678 3
+ENST00000449237 ENSE00003611496 4
+ENST00000449237 ENSE00001777381 5
+ENST00000449237 ENSE00001368923 6
+ENST00000383032 ENSE00001408731 1
+ENST00000383032 ENSE00001494437 2
+ENST00000383032 ENSE00001494434 3
+ENST00000383032 ENSE00001494433 4
+ENST00000383032 ENSE00001494431 5
+ENST00000383032 ENSE00001793373 6
+ENST00000383032 ENSE00003432678 7
+ENST00000383032 ENSE00003437631 8
+ENST00000383032 ENSE00003379331 9
+ENST00000383032 ENSE00001213593 10
+ENST00000383032 ENSE00001607530 11
+ENST00000383032 ENSE00001716749 12
+ENST00000383032 ENSE00001744765 13
+ENST00000383032 ENSE00001731081 14
+ENST00000383032 ENSE00001654905 15
+ENST00000383032 ENSE00001729171 16
+ENST00000383032 ENSE00001763966 17
+ENST00000383032 ENSE00001593786 18
+ENST00000383032 ENSE00001370395 19
+ENST00000355162 ENSE00001408731 1
+ENST00000355162 ENSE00001494437 2
+ENST00000355162 ENSE00001494433 3
+ENST00000355162 ENSE00001494431 4
+ENST00000355162 ENSE00001793373 5
+ENST00000355162 ENSE00003432678 6
+ENST00000355162 ENSE00003437631 7
+ENST00000355162 ENSE00003379331 8
+ENST00000355162 ENSE00001213593 9
+ENST00000355162 ENSE00001607530 10
+ENST00000355162 ENSE00001716749 11
+ENST00000355162 ENSE00001744765 12
+ENST00000355162 ENSE00001731081 13
+ENST00000355162 ENSE00001654905 14
+ENST00000355162 ENSE00001729171 15
+ENST00000355162 ENSE00001763966 16
+ENST00000355162 ENSE00001593786 17
+ENST00000355162 ENSE00001370395 18
+ENST00000346432 ENSE00001408731 1
+ENST00000346432 ENSE00001494437 2
+ENST00000346432 ENSE00001494434 3
+ENST00000346432 ENSE00001494431 4
+ENST00000346432 ENSE00001793373 5
+ENST00000346432 ENSE00003432678 6
+ENST00000346432 ENSE00003437631 7
+ENST00000346432 ENSE00003379331 8
+ENST00000346432 ENSE00001213593 9
+ENST00000346432 ENSE00001607530 10
+ENST00000346432 ENSE00001716749 11
+ENST00000346432 ENSE00001744765 12
+ENST00000346432 ENSE00001731081 13
+ENST00000346432 ENSE00001654905 14
+ENST00000346432 ENSE00001729171 15
+ENST00000346432 ENSE00001763966 16
+ENST00000346432 ENSE00001593786 17
+ENST00000346432 ENSE00001370395 18
+ENST00000333703 ENSE00001631277 1
+ENST00000333703 ENSE00001619635 2
+ENST00000333703 ENSE00001755808 3
+ENST00000333703 ENSE00000981568 4
+ENST00000333703 ENSE00001640924 5
+ENST00000333703 ENSE00001602875 6
+ENST00000362095 ENSE00001350198 1
+ENST00000362095 ENSE00001640924 2
+ENST00000362095 ENSE00001944263 3
+ENST00000400457 ENSE00000981568 1
+ENST00000400457 ENSE00001640924 2
+ENST00000400457 ENSE00001803775 3
+ENST00000400457 ENSE00001677522 4
+ENST00000400457 ENSE00001322750 5
+ENST00000215473 ENSE00001436852 1
+ENST00000215473 ENSE00001640924 2
+ENST00000215473 ENSE00001803775 3
+ENST00000215473 ENSE00001731866 4
+ENST00000215473 ENSE00001711324 5
+ENST00000215473 ENSE00001779807 6
+ENST00000215479 ENSE00001348274 1
+ENST00000215479 ENSE00001671586 2
+ENST00000215479 ENSE00001645681 3
+ENST00000215479 ENSE00000652250 4
+ENST00000215479 ENSE00001667251 5
+ENST00000215479 ENSE00001494454 6
+ENST00000383036 ENSE00001494452 1
+ENST00000383036 ENSE00001645681 2
+ENST00000383036 ENSE00001651085 3
+ENST00000383036 ENSE00000652250 4
+ENST00000383036 ENSE00001667251 5
+ENST00000383036 ENSE00001727000 6
+ENST00000383037 ENSE00001494452 1
+ENST00000383037 ENSE00001645681 2
+ENST00000383037 ENSE00001651085 3
+ENST00000383037 ENSE00000652250 4
+ENST00000383037 ENSE00001667251 5
+ENST00000528056 ENSE00002300179 1
+ENST00000528056 ENSE00003518103 2
+ENST00000528056 ENSE00003489469 3
+ENST00000528056 ENSE00003507474 4
+ENST00000528056 ENSE00003502780 5
+ENST00000528056 ENSE00002194667 6
+ENST00000528056 ENSE00001494358 7
+ENST00000528056 ENSE00001494356 8
+ENST00000533551 ENSE00002183990 1
+ENST00000533551 ENSE00003518103 2
+ENST00000533551 ENSE00003489469 3
+ENST00000533551 ENSE00003507474 4
+ENST00000533551 ENSE00003502780 5
+ENST00000533551 ENSE00002194667 6
+ENST00000533551 ENSE00002194098 7
+ENST00000495163 ENSE00001877035 1
+ENST00000495163 ENSE00001918349 2
+ENST00000472666 ENSE00001948825 1
+ENST00000472666 ENSE00003507474 2
+ENST00000472666 ENSE00001928357 3
+ENST00000362758 ENSE00002210406 1
+ENST00000362758 ENSE00003574133 2
+ENST00000362758 ENSE00003534383 3
+ENST00000362758 ENSE00003600341 4
+ENST00000362758 ENSE00003520510 5
+ENST00000362758 ENSE00001661914 6
+ENST00000338981 ENSE00001020131 1
+ENST00000338981 ENSE00001020125 2
+ENST00000338981 ENSE00003651478 3
+ENST00000338981 ENSE00003476271 4
+ENST00000338981 ENSE00003602419 5
+ENST00000338981 ENSE00003467347 6
+ENST00000338981 ENSE00003693194 7
+ENST00000338981 ENSE00003654508 8
+ENST00000338981 ENSE00003550081 9
+ENST00000338981 ENSE00003624888 10
+ENST00000338981 ENSE00003487242 11
+ENST00000338981 ENSE00003601630 12
+ENST00000338981 ENSE00003666147 13
+ENST00000338981 ENSE00003566318 14
+ENST00000338981 ENSE00003557039 15
+ENST00000338981 ENSE00003515132 16
+ENST00000338981 ENSE00003621719 17
+ENST00000338981 ENSE00003555621 18
+ENST00000338981 ENSE00003655936 19
+ENST00000338981 ENSE00003476364 20
+ENST00000338981 ENSE00003510955 21
+ENST00000338981 ENSE00003621863 22
+ENST00000338981 ENSE00003560948 23
+ENST00000338981 ENSE00003561084 24
+ENST00000338981 ENSE00003508322 25
+ENST00000338981 ENSE00003664000 26
+ENST00000338981 ENSE00003547650 27
+ENST00000338981 ENSE00003694071 28
+ENST00000338981 ENSE00003679141 29
+ENST00000338981 ENSE00003652645 30
+ENST00000338981 ENSE00003564733 31
+ENST00000338981 ENSE00003681535 32
+ENST00000338981 ENSE00003669946 33
+ENST00000338981 ENSE00003585886 34
+ENST00000338981 ENSE00003635310 35
+ENST00000338981 ENSE00003552773 36
+ENST00000338981 ENSE00003548868 37
+ENST00000338981 ENSE00001601104 38
+ENST00000338981 ENSE00003586868 39
+ENST00000338981 ENSE00003686311 40
+ENST00000338981 ENSE00003476240 41
+ENST00000338981 ENSE00003513236 42
+ENST00000338981 ENSE00003654441 43
+ENST00000338981 ENSE00003686907 44
+ENST00000338981 ENSE00003595540 45
+ENST00000338981 ENSE00003502466 46
+ENST00000493168 ENSE00001869747 1
+ENST00000493168 ENSE00003574267 2
+ENST00000493168 ENSE00001853448 3
+ENST00000426564 ENSE00001952418 1
+ENST00000426564 ENSE00003642330 2
+ENST00000426564 ENSE00003608903 3
+ENST00000426564 ENSE00003569874 4
+ENST00000426564 ENSE00003525307 5
+ENST00000426564 ENSE00003468269 6
+ENST00000426564 ENSE00003668411 7
+ENST00000426564 ENSE00003544199 8
+ENST00000426564 ENSE00003543968 9
+ENST00000426564 ENSE00003678761 10
+ENST00000426564 ENSE00003522394 11
+ENST00000426564 ENSE00003556397 12
+ENST00000426564 ENSE00003484639 13
+ENST00000426564 ENSE00003494702 14
+ENST00000426564 ENSE00003664430 15
+ENST00000426564 ENSE00003545466 16
+ENST00000426564 ENSE00003634092 17
+ENST00000426564 ENSE00003562091 18
+ENST00000426564 ENSE00003499612 19
+ENST00000426564 ENSE00003612302 20
+ENST00000426564 ENSE00003510705 21
+ENST00000426564 ENSE00003494757 22
+ENST00000426564 ENSE00003626016 23
+ENST00000426564 ENSE00003624705 24
+ENST00000426564 ENSE00003646476 25
+ENST00000426564 ENSE00003683748 26
+ENST00000426564 ENSE00003522502 27
+ENST00000426564 ENSE00003692063 28
+ENST00000426564 ENSE00003612215 29
+ENST00000426564 ENSE00003491362 30
+ENST00000426564 ENSE00003599293 31
+ENST00000426564 ENSE00003548985 32
+ENST00000426564 ENSE00003477744 33
+ENST00000426564 ENSE00003471770 34
+ENST00000426564 ENSE00003654551 35
+ENST00000426564 ENSE00001841790 36
+ENST00000426564 ENSE00003628831 37
+ENST00000426564 ENSE00003640718 38
+ENST00000426564 ENSE00003536017 39
+ENST00000426564 ENSE00003664007 40
+ENST00000426564 ENSE00003650241 41
+ENST00000426564 ENSE00003632911 42
+ENST00000426564 ENSE00003662966 43
+ENST00000426564 ENSE00003580686 44
+ENST00000453031 ENSE00001595780 1
+ENST00000453031 ENSE00003654441 2
+ENST00000453031 ENSE00003686907 3
+ENST00000453031 ENSE00003595540 4
+ENST00000453031 ENSE00001746916 5
+ENST00000471409 ENSE00001944130 1
+ENST00000471409 ENSE00003580686 2
+ENST00000250776 ENSE00001756639 1
+ENST00000250776 ENSE00001663636 2
+ENST00000250776 ENSE00001595449 3
+ENST00000250776 ENSE00001634261 4
+ENST00000250776 ENSE00001767444 5
+ENST00000250784 ENSE00002490412 1
+ENST00000250784 ENSE00001709586 2
+ENST00000250784 ENSE00001738202 3
+ENST00000250784 ENSE00001602849 4
+ENST00000250784 ENSE00001601989 5
+ENST00000250784 ENSE00003667463 6
+ENST00000250784 ENSE00003636667 7
+ENST00000430575 ENSE00001732508 1
+ENST00000430575 ENSE00001709586 2
+ENST00000430575 ENSE00001738202 3
+ENST00000430575 ENSE00001602849 4
+ENST00000430575 ENSE00001601989 5
+ENST00000430575 ENSE00003667463 6
+ENST00000430575 ENSE00001635459 7
+ENST00000477725 ENSE00001859961 1
+ENST00000477725 ENSE00003469154 2
+ENST00000477725 ENSE00003654930 3
+ENST00000515575 ENSE00002062109 1
+ENST00000515575 ENSE00002034437 2
+ENST00000515575 ENSE00002072199 3
+ENST00000250805 ENSE00001782128 1
+ENST00000250805 ENSE00001664883 2
+ENST00000250805 ENSE00001800396 3
+ENST00000250805 ENSE00001771159 4
+ENST00000250805 ENSE00001609985 5
+ENST00000250823 ENSE00001914170 1
+ENST00000250823 ENSE00001617113 2
+ENST00000250825 ENSE00001940528 1
+ENST00000250825 ENSE00001622417 2
+ENST00000382867 ENSE00002314376 1
+ENST00000544303 ENSE00002282227 1
+ENST00000544303 ENSE00002227312 2
+ENST00000544303 ENSE00002265258 3
+ENST00000407724 ENSE00002810788 1
+ENST00000407724 ENSE00002783311 2
+ENST00000407724 ENSE00002873804 3
+ENST00000407724 ENSE00002825272 4
+ENST00000407724 ENSE00001893165 5
+ENST00000459719 ENSE00001826433 1
+ENST00000459719 ENSE00002783311 2
+ENST00000459719 ENSE00001875276 3
+ENST00000459719 ENSE00002811240 4
+ENST00000447520 ENSE00002957208 1
+ENST00000447520 ENSE00002783311 2
+ENST00000447520 ENSE00002829137 3
+ENST00000447520 ENSE00002862473 4
+ENST00000445715 ENSE00002957208 1
+ENST00000445715 ENSE00002783311 2
+ENST00000445715 ENSE00002829137 3
+ENST00000445715 ENSE00002862473 4
+ENST00000445715 ENSE00002920932 5
+ENST00000445715 ENSE00001657009 6
+ENST00000445715 ENSE00001617655 7
+ENST00000445715 ENSE00001663997 8
+ENST00000445715 ENSE00003585093 9
+ENST00000445715 ENSE00003676904 10
+ENST00000445715 ENSE00001621570 11
+ENST00000538014 ENSE00002224044 1
+ENST00000538014 ENSE00002265061 2
+ENST00000538014 ENSE00002783311 3
+ENST00000538014 ENSE00002256441 4
+ENST00000447202 ENSE00001793723 1
+ENST00000447202 ENSE00001687078 2
+ENST00000588613 ENSE00002916216 1
+ENST00000588613 ENSE00002829137 2
+ENST00000588613 ENSE00002920932 3
+ENST00000588613 ENSE00002875419 4
+ENST00000589075 ENSE00002833019 1
+ENST00000589075 ENSE00002829137 2
+ENST00000589075 ENSE00002905816 3
+ENST00000585549 ENSE00002863993 1
+ENST00000585549 ENSE00002829137 2
+ENST00000585549 ENSE00002757214 3
+ENST00000585549 ENSE00002783490 4
+ENST00000587095 ENSE00002932169 1
+ENST00000587095 ENSE00002829137 2
+ENST00000587095 ENSE00002783490 3
+ENST00000488280 ENSE00001950956 1
+ENST00000488280 ENSE00002920932 2
+ENST00000488280 ENSE00001930064 3
+ENST00000253320 ENSE00001279198 1
+ENST00000253320 ENSE00001279210 2
+ENST00000253320 ENSE00003585093 3
+ENST00000253320 ENSE00003676904 4
+ENST00000253320 ENSE00001035195 5
+ENST00000593000 ENSE00002958230 1
+ENST00000592697 ENSE00002880368 1
+ENST00000592697 ENSE00002922995 2
+ENST00000592697 ENSE00003585093 3
+ENST00000592697 ENSE00003676904 4
+ENST00000592697 ENSE00002745809 5
+ENST00000251749 ENSE00001035136 1
+ENST00000251749 ENSE00000891756 2
+ENST00000251749 ENSE00001281659 3
+ENST00000251749 ENSE00000981692 4
+ENST00000251749 ENSE00001281628 5
+ENST00000251749 ENSE00002836144 6
+ENST00000382832 ENSE00001542537 1
+ENST00000382832 ENSE00003664473 2
+ENST00000382832 ENSE00003581743 3
+ENST00000382832 ENSE00001493485 4
+ENST00000433794 ENSE00001668181 1
+ENST00000433794 ENSE00003505404 2
+ENST00000433794 ENSE00001757235 3
+ENST00000433794 ENSE00003519096 4
+ENST00000433794 ENSE00003627127 5
+ENST00000545582 ENSE00002217996 1
+ENST00000545582 ENSE00001757235 2
+ENST00000545582 ENSE00003529279 3
+ENST00000545582 ENSE00003568887 4
+ENST00000253838 ENSE00001785382 1
+ENST00000253838 ENSE00003499302 2
+ENST00000253838 ENSE00001693495 3
+ENST00000538537 ENSE00002290871 1
+ENST00000538537 ENSE00003596079 2
+ENST00000538537 ENSE00002214070 3
+ENST00000538537 ENSE00002310424 4
+ENST00000253848 ENSE00001917099 1
+ENST00000253848 ENSE00003687937 2
+ENST00000253848 ENSE00001937237 3
+ENST00000545808 ENSE00002299728 1
+ENST00000545808 ENSE00003671276 2
+ENST00000545808 ENSE00002277139 3
+ENST00000545808 ENSE00002209602 4
+ENST00000457100 ENSE00001799711 1
+ENST00000457100 ENSE00001638328 2
+ENST00000457100 ENSE00001698649 3
+ENST00000457100 ENSE00001648830 4
+ENST00000457100 ENSE00001726416 5
+ENST00000457100 ENSE00001799620 6
+ENST00000276770 ENSE00001799711 1
+ENST00000276770 ENSE00001638328 2
+ENST00000276770 ENSE00001698649 3
+ENST00000276770 ENSE00001648830 4
+ENST00000276770 ENSE00001726416 5
+ENST00000276770 ENSE00001594918 6
+ENST00000276770 ENSE00001799620 7
+ENST00000449828 ENSE00001682908 1
+ENST00000449828 ENSE00001619344 2
+ENST00000449828 ENSE00001799620 3
+ENST00000447655 ENSE00001729695 1
+ENST00000447655 ENSE00001805126 2
+ENST00000447655 ENSE00001633857 3
+ENST00000276779 ENSE00001692562 1
+ENST00000276779 ENSE00001658267 2
+ENST00000276779 ENSE00001691944 3
+ENST00000276779 ENSE00001628673 4
+ENST00000276779 ENSE00001700737 5
+ENST00000276779 ENSE00001717109 6
+ENST00000276779 ENSE00001633857 7
+ENST00000415405 ENSE00001692562 1
+ENST00000415405 ENSE00001658267 2
+ENST00000415405 ENSE00001691944 3
+ENST00000415405 ENSE00001628673 4
+ENST00000415405 ENSE00001700737 5
+ENST00000415405 ENSE00001633857 6
+ENST00000284856 ENSE00001016870 1
+ENST00000284856 ENSE00001274354 2
+ENST00000288666 ENSE00001923202 1
+ENST00000288666 ENSE00001677561 2
+ENST00000288666 ENSE00001680575 3
+ENST00000288666 ENSE00001638361 4
+ENST00000288666 ENSE00001725100 5
+ENST00000288666 ENSE00001734476 6
+ENST00000288666 ENSE00001159425 7
+ENST00000471252 ENSE00001867079 1
+ENST00000471252 ENSE00001832496 2
+ENST00000382872 ENSE00001420613 1
+ENST00000382872 ENSE00003548219 2
+ENST00000382872 ENSE00003542397 3
+ENST00000382872 ENSE00003537524 4
+ENST00000382872 ENSE00003686712 5
+ENST00000382872 ENSE00001424034 6
+ENST00000355905 ENSE00001493621 1
+ENST00000355905 ENSE00003620397 2
+ENST00000355905 ENSE00003691073 3
+ENST00000355905 ENSE00003537524 4
+ENST00000355905 ENSE00003686712 5
+ENST00000355905 ENSE00003601036 6
+ENST00000382868 ENSE00001493621 1
+ENST00000382868 ENSE00003620397 2
+ENST00000382868 ENSE00003554354 3
+ENST00000382868 ENSE00003691073 4
+ENST00000382868 ENSE00003477208 5
+ENST00000382868 ENSE00003537524 6
+ENST00000382868 ENSE00003686712 7
+ENST00000382868 ENSE00003601036 8
+ENST00000476359 ENSE00001493621 1
+ENST00000476359 ENSE00003610714 2
+ENST00000476359 ENSE00003548219 3
+ENST00000476359 ENSE00003537517 4
+ENST00000476359 ENSE00003613876 5
+ENST00000476359 ENSE00001738756 6
+ENST00000476359 ENSE00003625704 7
+ENST00000476359 ENSE00003593743 8
+ENST00000476359 ENSE00003665738 9
+ENST00000481089 ENSE00001955812 1
+ENST00000481089 ENSE00001955997 2
+ENST00000339174 ENSE00001368742 1
+ENST00000339174 ENSE00001390456 2
+ENST00000339174 ENSE00003691073 3
+ENST00000339174 ENSE00003537524 4
+ENST00000339174 ENSE00003686712 5
+ENST00000339174 ENSE00001868567 6
+ENST00000413217 ENSE00001493617 1
+ENST00000413217 ENSE00003554354 2
+ENST00000413217 ENSE00003691073 3
+ENST00000413217 ENSE00001612473 4
+ENST00000297967 ENSE00003620397 1
+ENST00000297967 ENSE00003554354 2
+ENST00000297967 ENSE00003691073 3
+ENST00000297967 ENSE00001612473 4
+ENST00000320701 ENSE00001707785 1
+ENST00000320701 ENSE00003546295 2
+ENST00000320701 ENSE00001623435 3
+ENST00000320701 ENSE00001642228 4
+ENST00000320701 ENSE00003563383 5
+ENST00000320701 ENSE00001830098 6
+ENST00000383042 ENSE00003503096 1
+ENST00000383042 ENSE00003546295 2
+ENST00000383042 ENSE00001623435 3
+ENST00000383042 ENSE00001642228 4
+ENST00000383042 ENSE00001618514 5
+ENST00000383042 ENSE00001687266 6
+ENST00000470569 ENSE00003465137 1
+ENST00000470569 ENSE00003510285 2
+ENST00000470569 ENSE00001885391 3
+ENST00000470569 ENSE00001950610 4
+ENST00000464674 ENSE00001851117 1
+ENST00000464674 ENSE00003591129 2
+ENST00000464674 ENSE00001858368 3
+ENST00000343584 ENSE00001902634 1
+ENST00000343584 ENSE00003696150 2
+ENST00000343584 ENSE00001591394 3
+ENST00000607210 ENSE00003701041 1
+ENST00000607210 ENSE00003696150 2
+ENST00000607210 ENSE00003695877 3
+ENST00000303593 ENSE00001786049 1
+ENST00000303593 ENSE00003695050 2
+ENST00000303593 ENSE00001644491 3
+ENST00000303728 ENSE00001686003 1
+ENST00000303728 ENSE00001657597 2
+ENST00000303728 ENSE00003636240 3
+ENST00000303728 ENSE00003502664 4
+ENST00000303728 ENSE00003502641 5
+ENST00000477123 ENSE00001686003 1
+ENST00000477123 ENSE00001657597 2
+ENST00000477123 ENSE00003636240 3
+ENST00000477123 ENSE00003502664 4
+ENST00000477123 ENSE00001876694 5
+ENST00000477123 ENSE00003543250 6
+ENST00000338793 ENSE00001686003 1
+ENST00000338793 ENSE00001657597 2
+ENST00000338793 ENSE00003636240 3
+ENST00000338793 ENSE00003502664 4
+ENST00000338793 ENSE00001624902 5
+ENST00000303766 ENSE00001703126 1
+ENST00000303766 ENSE00003469687 2
+ENST00000303766 ENSE00003536546 3
+ENST00000303766 ENSE00001739138 4
+ENST00000303766 ENSE00003679115 5
+ENST00000303766 ENSE00003668781 6
+ENST00000303766 ENSE00003531954 7
+ENST00000303766 ENSE00003661928 8
+ENST00000303766 ENSE00003557303 9
+ENST00000303766 ENSE00003568885 10
+ENST00000303766 ENSE00003638296 11
+ENST00000303766 ENSE00003580283 12
+ENST00000481858 ENSE00001816306 1
+ENST00000481858 ENSE00003562722 2
+ENST00000481858 ENSE00003591193 3
+ENST00000481858 ENSE00003636020 4
+ENST00000481858 ENSE00003530779 5
+ENST00000481858 ENSE00003638449 6
+ENST00000481858 ENSE00003576126 7
+ENST00000481858 ENSE00003480178 8
+ENST00000481858 ENSE00003687067 9
+ENST00000481858 ENSE00003468889 10
+ENST00000481858 ENSE00003578044 11
+ENST00000454978 ENSE00001765593 1
+ENST00000454978 ENSE00003469687 2
+ENST00000454978 ENSE00003536546 3
+ENST00000454978 ENSE00001739138 4
+ENST00000454978 ENSE00003679115 5
+ENST00000454978 ENSE00003668781 6
+ENST00000454978 ENSE00003531954 7
+ENST00000454978 ENSE00003661928 8
+ENST00000454978 ENSE00003557303 9
+ENST00000454978 ENSE00003638296 10
+ENST00000454978 ENSE00003580283 11
+ENST00000303804 ENSE00001625881 1
+ENST00000303804 ENSE00001682815 2
+ENST00000303804 ENSE00003673462 3
+ENST00000303804 ENSE00003492436 4
+ENST00000303804 ENSE00003573636 5
+ENST00000472391 ENSE00001625881 1
+ENST00000472391 ENSE00001682815 2
+ENST00000472391 ENSE00003673462 3
+ENST00000472391 ENSE00003492436 4
+ENST00000472391 ENSE00001899711 5
+ENST00000472391 ENSE00003488975 6
+ENST00000341740 ENSE00001625881 1
+ENST00000341740 ENSE00001682815 2
+ENST00000341740 ENSE00003673462 3
+ENST00000341740 ENSE00003492436 4
+ENST00000341740 ENSE00001628640 5
+ENST00000382759 ENSE00003525363 1
+ENST00000382759 ENSE00003550074 2
+ENST00000382759 ENSE00003651701 3
+ENST00000382759 ENSE00003567644 4
+ENST00000382759 ENSE00003490701 5
+ENST00000382759 ENSE00003662908 6
+ENST00000382759 ENSE00001594909 7
+ENST00000382759 ENSE00003642058 8
+ENST00000382759 ENSE00003574473 9
+ENST00000382759 ENSE00003635823 10
+ENST00000326985 ENSE00003480955 1
+ENST00000326985 ENSE00003658768 2
+ENST00000326985 ENSE00003532851 3
+ENST00000326985 ENSE00003665479 4
+ENST00000326985 ENSE00003491376 5
+ENST00000326985 ENSE00003537055 6
+ENST00000326985 ENSE00001657501 7
+ENST00000426043 ENSE00001596885 1
+ENST00000426043 ENSE00003537055 2
+ENST00000426043 ENSE00003604831 3
+ENST00000426043 ENSE00003477338 4
+ENST00000426043 ENSE00003512248 5
+ENST00000303979 ENSE00001611088 1
+ENST00000303979 ENSE00001715185 2
+ENST00000303979 ENSE00001657254 3
+ENST00000303979 ENSE00001645729 4
+ENST00000303979 ENSE00001802624 5
+ENST00000303979 ENSE00001543685 6
+ENST00000344884 ENSE00003670797 1
+ENST00000344884 ENSE00003588065 2
+ENST00000491902 ENSE00003662804 1
+ENST00000491902 ENSE00003613429 2
+ENST00000491902 ENSE00001899520 3
+ENST00000491902 ENSE00001698194 4
+ENST00000491902 ENSE00001803454 5
+ENST00000491902 ENSE00003604288 6
+ENST00000382852 ENSE00001597600 1
+ENST00000382852 ENSE00003535599 2
+ENST00000382852 ENSE00001493553 3
+ENST00000304790 ENSE00001674814 1
+ENST00000304790 ENSE00001858694 2
+ENST00000505047 ENSE00002036080 1
+ENST00000505047 ENSE00002085962 2
+ENST00000505047 ENSE00002062695 3
+ENST00000505047 ENSE00002046872 4
+ENST00000505047 ENSE00002035366 5
+ENST00000306589 ENSE00001900148 1
+ENST00000306589 ENSE00003701812 2
+ENST00000306589 ENSE00001781602 3
+ENST00000338673 ENSE00001665399 1
+ENST00000338673 ENSE00003700592 2
+ENST00000338673 ENSE00001707150 3
+ENST00000306609 ENSE00001854309 1
+ENST00000306609 ENSE00001666303 2
+ENST00000361963 ENSE00001436475 1
+ENST00000333235 ENSE00001786020 1
+ENST00000333235 ENSE00001735399 2
+ENST00000539489 ENSE00003506820 1
+ENST00000539489 ENSE00003604808 2
+ENST00000416946 ENSE00001668733 1
+ENST00000416946 ENSE00001679890 2
+ENST00000416946 ENSE00001610203 3
+ENST00000416946 ENSE00003589912 4
+ENST00000416946 ENSE00001637486 5
+ENST00000416946 ENSE00001798553 6
+ENST00000416946 ENSE00001752493 7
+ENST00000416946 ENSE00003654675 8
+ENST00000416946 ENSE00003649198 9
+ENST00000416946 ENSE00003496796 10
+ENST00000416946 ENSE00003547203 11
+ENST00000416946 ENSE00001716410 12
+ENST00000416946 ENSE00001635424 13
+ENST00000416946 ENSE00001555256 14
+ENST00000306667 ENSE00002286560 1
+ENST00000306667 ENSE00001610203 2
+ENST00000306667 ENSE00003553650 3
+ENST00000306667 ENSE00002324147 4
+ENST00000306667 ENSE00003604497 5
+ENST00000306667 ENSE00003480675 6
+ENST00000306667 ENSE00003621244 7
+ENST00000306667 ENSE00003668556 8
+ENST00000306667 ENSE00001795579 9
+ENST00000423852 ENSE00001629437 1
+ENST00000423852 ENSE00001719001 2
+ENST00000423852 ENSE00001728235 3
+ENST00000423852 ENSE00001708482 4
+ENST00000423852 ENSE00001598676 5
+ENST00000423852 ENSE00003608732 6
+ENST00000423852 ENSE00003588397 7
+ENST00000423852 ENSE00003459260 8
+ENST00000423852 ENSE00003477737 9
+ENST00000423852 ENSE00003459192 10
+ENST00000423852 ENSE00001742949 11
+ENST00000423852 ENSE00001800235 12
+ENST00000423852 ENSE00001597641 13
+ENST00000338706 ENSE00003490496 1
+ENST00000338706 ENSE00003609331 2
+ENST00000338706 ENSE00003569147 3
+ENST00000338706 ENSE00002302306 4
+ENST00000338706 ENSE00003652688 5
+ENST00000338706 ENSE00003586010 6
+ENST00000338706 ENSE00003532508 7
+ENST00000338706 ENSE00003491049 8
+ENST00000338706 ENSE00003688093 9
+ENST00000418188 ENSE00001654202 1
+ENST00000418188 ENSE00001648900 2
+ENST00000306882 ENSE00001931820 1
+ENST00000306882 ENSE00001621868 2
+ENST00000382407 ENSE00001491983 1
+ENST00000307393 ENSE00001652231 1
+ENST00000307393 ENSE00001822279 2
+ENST00000309834 ENSE00001838947 1
+ENST00000309834 ENSE00003536585 2
+ENST00000338876 ENSE00001838947 1
+ENST00000338876 ENSE00001801443 2
+ENST00000338876 ENSE00001739030 3
+ENST00000338876 ENSE00001756006 4
+ENST00000338876 ENSE00001661293 5
+ENST00000338876 ENSE00003593310 6
+ENST00000382856 ENSE00001603573 1
+ENST00000382856 ENSE00001801443 2
+ENST00000382856 ENSE00001493572 3
+ENST00000455422 ENSE00003551056 1
+ENST00000455422 ENSE00001663043 2
+ENST00000455422 ENSE00003516285 3
+ENST00000455422 ENSE00003527480 4
+ENST00000455422 ENSE00001644937 5
+ENST00000455422 ENSE00001644677 6
+ENST00000455422 ENSE00001705054 7
+ENST00000455422 ENSE00001636903 8
+ENST00000311828 ENSE00002323304 1
+ENST00000311828 ENSE00003527480 2
+ENST00000311828 ENSE00002263796 3
+ENST00000416687 ENSE00003587796 1
+ENST00000416687 ENSE00002255439 2
+ENST00000416687 ENSE00002234877 3
+ENST00000416687 ENSE00003466737 4
+ENST00000416687 ENSE00003689056 5
+ENST00000416687 ENSE00002236871 6
+ENST00000321217 ENSE00001091887 1
+ENST00000321217 ENSE00001889836 2
+ENST00000559055 ENSE00002559577 1
+ENST00000324446 ENSE00001285919 1
+ENST00000324446 ENSE00001702356 2
+ENST00000454875 ENSE00001684888 1
+ENST00000454875 ENSE00001631223 2
+ENST00000454875 ENSE00001624668 3
+ENST00000452584 ENSE00001747572 1
+ENST00000452584 ENSE00001776523 2
+ENST00000452584 ENSE00001661526 3
+ENST00000452584 ENSE00001792674 4
+ENST00000452584 ENSE00001594458 5
+ENST00000331787 ENSE00001313137 1
+ENST00000331787 ENSE00001556105 2
+ENST00000447937 ENSE00001748128 1
+ENST00000447937 ENSE00001776523 2
+ENST00000447937 ENSE00001720075 3
+ENST00000447937 ENSE00001800197 4
+ENST00000447937 ENSE00001721855 5
+ENST00000447937 ENSE00001746218 6
+ENST00000447937 ENSE00001626612 7
+ENST00000253470 ENSE00001544018 1
+ENST00000253470 ENSE00001544017 2
+ENST00000253470 ENSE00000900348 3
+ENST00000253470 ENSE00001272668 4
+ENST00000253470 ENSE00001544015 5
+ENST00000426790 ENSE00001708504 1
+ENST00000426790 ENSE00001744590 2
+ENST00000426790 ENSE00001639130 3
+ENST00000250838 ENSE00002310447 1
+ENST00000382764 ENSE00001493287 1
+ENST00000382764 ENSE00001296914 2
+ENST00000382764 ENSE00001325246 3
+ENST00000382764 ENSE00001493285 4
+ENST00000329684 ENSE00001853079 1
+ENST00000329684 ENSE00001693371 2
+ENST00000329684 ENSE00001729545 3
+ENST00000426035 ENSE00001853079 1
+ENST00000426035 ENSE00001693371 2
+ENST00000426035 ENSE00001799150 3
+ENST00000331172 ENSE00002265228 1
+ENST00000331172 ENSE00001730908 2
+ENST00000331172 ENSE00001631073 3
+ENST00000331172 ENSE00001704814 4
+ENST00000331172 ENSE00001619518 5
+ENST00000331172 ENSE00002258116 6
+ENST00000602732 ENSE00003436862 1
+ENST00000602732 ENSE00003254543 2
+ENST00000602732 ENSE00001669186 3
+ENST00000602732 ENSE00001626256 4
+ENST00000602732 ENSE00003262349 5
+ENST00000331070 ENSE00001730185 1
+ENST00000331070 ENSE00001669186 2
+ENST00000331070 ENSE00001626256 3
+ENST00000331070 ENSE00003542244 4
+ENST00000331070 ENSE00001792376 5
+ENST00000331070 ENSE00003677192 6
+ENST00000331070 ENSE00003473409 7
+ENST00000331070 ENSE00001543555 8
+ENST00000602818 ENSE00003442800 1
+ENST00000602818 ENSE00001669186 2
+ENST00000602818 ENSE00001626256 3
+ENST00000602818 ENSE00003516313 4
+ENST00000602818 ENSE00003688894 5
+ENST00000602818 ENSE00003560022 6
+ENST00000602818 ENSE00003360806 7
+ENST00000382585 ENSE00001730185 1
+ENST00000382585 ENSE00001669186 2
+ENST00000382585 ENSE00001626256 3
+ENST00000382585 ENSE00003542244 4
+ENST00000382585 ENSE00001792376 5
+ENST00000382585 ENSE00003677192 6
+ENST00000382585 ENSE00003473409 7
+ENST00000382585 ENSE00001698397 8
+ENST00000382585 ENSE00001791291 9
+ENST00000602770 ENSE00003302451 1
+ENST00000602770 ENSE00003336765 2
+ENST00000602770 ENSE00001619605 3
+ENST00000602770 ENSE00001689728 4
+ENST00000602770 ENSE00003272237 5
+ENST00000382392 ENSE00001676764 1
+ENST00000382392 ENSE00001619605 2
+ENST00000382392 ENSE00001689728 3
+ENST00000382392 ENSE00003549506 4
+ENST00000382392 ENSE00001757255 5
+ENST00000382392 ENSE00003667169 6
+ENST00000382392 ENSE00003463186 7
+ENST00000382392 ENSE00001625788 8
+ENST00000382392 ENSE00001723611 9
+ENST00000602549 ENSE00003454106 1
+ENST00000602549 ENSE00001619605 2
+ENST00000602549 ENSE00001689728 3
+ENST00000602549 ENSE00003681954 4
+ENST00000602549 ENSE00003623347 5
+ENST00000602549 ENSE00003640779 6
+ENST00000602549 ENSE00003335658 7
+ENST00000602549 ENSE00003346171 8
+ENST00000331397 ENSE00001864071 1
+ENST00000331397 ENSE00003531058 2
+ENST00000331397 ENSE00003601240 3
+ENST00000331397 ENSE00003660729 4
+ENST00000331397 ENSE00003557845 5
+ENST00000331397 ENSE00003529356 6
+ENST00000331397 ENSE00003648165 7
+ENST00000331397 ENSE00001594460 8
+ENST00000331397 ENSE00003527061 9
+ENST00000331397 ENSE00003473448 10
+ENST00000331397 ENSE00001775185 11
+ENST00000331397 ENSE00001755119 12
+ENST00000331397 ENSE00001291636 13
+ENST00000331397 ENSE00001311020 14
+ENST00000331397 ENSE00001696208 15
+ENST00000331397 ENSE00001305759 16
+ENST00000331397 ENSE00001624388 17
+ENST00000331397 ENSE00001639693 18
+ENST00000331397 ENSE00001651897 19
+ENST00000331397 ENSE00001295816 20
+ENST00000331397 ENSE00001591330 21
+ENST00000331397 ENSE00001751483 22
+ENST00000331397 ENSE00001740894 23
+ENST00000331397 ENSE00001640284 24
+ENST00000331397 ENSE00001785465 25
+ENST00000331397 ENSE00001298131 26
+ENST00000331397 ENSE00001713248 27
+ENST00000331397 ENSE00001875480 28
+ENST00000362096 ENSE00001385925 1
+ENST00000362096 ENSE00003531058 2
+ENST00000362096 ENSE00003601240 3
+ENST00000362096 ENSE00003660729 4
+ENST00000362096 ENSE00003557845 5
+ENST00000362096 ENSE00003529356 6
+ENST00000362096 ENSE00003648165 7
+ENST00000362096 ENSE00001594460 8
+ENST00000362096 ENSE00003527061 9
+ENST00000362096 ENSE00003473448 10
+ENST00000362096 ENSE00001775185 11
+ENST00000362096 ENSE00001755119 12
+ENST00000362096 ENSE00001291636 13
+ENST00000362096 ENSE00001311020 14
+ENST00000362096 ENSE00001696208 15
+ENST00000362096 ENSE00001305759 16
+ENST00000362096 ENSE00001624388 17
+ENST00000362096 ENSE00001639693 18
+ENST00000362096 ENSE00001651897 19
+ENST00000362096 ENSE00001295816 20
+ENST00000362096 ENSE00001591330 21
+ENST00000362096 ENSE00001751483 22
+ENST00000362096 ENSE00001740894 23
+ENST00000362096 ENSE00001640284 24
+ENST00000362096 ENSE00001436855 25
+ENST00000329134 ENSE00001385925 1
+ENST00000329134 ENSE00003531058 2
+ENST00000329134 ENSE00003601240 3
+ENST00000329134 ENSE00003660729 4
+ENST00000329134 ENSE00003557845 5
+ENST00000329134 ENSE00003529356 6
+ENST00000329134 ENSE00003648165 7
+ENST00000329134 ENSE00001594460 8
+ENST00000329134 ENSE00003527061 9
+ENST00000329134 ENSE00003473448 10
+ENST00000329134 ENSE00001775185 11
+ENST00000329134 ENSE00001755119 12
+ENST00000329134 ENSE00001291636 13
+ENST00000329134 ENSE00001311020 14
+ENST00000329134 ENSE00001696208 15
+ENST00000329134 ENSE00001305759 16
+ENST00000329134 ENSE00001624388 17
+ENST00000329134 ENSE00001639693 18
+ENST00000329134 ENSE00001651897 19
+ENST00000329134 ENSE00001640421 20
+ENST00000478900 ENSE00001822678 1
+ENST00000478900 ENSE00003612487 2
+ENST00000478900 ENSE00003529285 3
+ENST00000478900 ENSE00003678156 4
+ENST00000478900 ENSE00003644070 5
+ENST00000478900 ENSE00003505329 6
+ENST00000478900 ENSE00003575738 7
+ENST00000478900 ENSE00003582459 8
+ENST00000478900 ENSE00003591211 9
+ENST00000478900 ENSE00001899507 10
+ENST00000474365 ENSE00001931275 1
+ENST00000474365 ENSE00003529285 2
+ENST00000474365 ENSE00003678156 3
+ENST00000474365 ENSE00003644070 4
+ENST00000474365 ENSE00003505329 5
+ENST00000474365 ENSE00003575738 6
+ENST00000474365 ENSE00001818021 7
+ENST00000382893 ENSE00001436854 1
+ENST00000382893 ENSE00003531058 2
+ENST00000382893 ENSE00003601240 3
+ENST00000382893 ENSE00003660729 4
+ENST00000382893 ENSE00003557845 5
+ENST00000382893 ENSE00003529356 6
+ENST00000382893 ENSE00001493710 7
+ENST00000479713 ENSE00001942244 1
+ENST00000479713 ENSE00001872513 2
+ENST00000382896 ENSE00002281428 1
+ENST00000382896 ENSE00003531058 2
+ENST00000382896 ENSE00003601240 3
+ENST00000382896 ENSE00003660729 4
+ENST00000382896 ENSE00003557845 5
+ENST00000382896 ENSE00003529356 6
+ENST00000382896 ENSE00003648165 7
+ENST00000382896 ENSE00001594460 8
+ENST00000382896 ENSE00003527061 9
+ENST00000382896 ENSE00003473448 10
+ENST00000382896 ENSE00001775185 11
+ENST00000382896 ENSE00001755119 12
+ENST00000382896 ENSE00002277775 13
+ENST00000382896 ENSE00001291636 14
+ENST00000382896 ENSE00001311020 15
+ENST00000382896 ENSE00001696208 16
+ENST00000382896 ENSE00001305759 17
+ENST00000382896 ENSE00001624388 18
+ENST00000382896 ENSE00001639693 19
+ENST00000382896 ENSE00001651897 20
+ENST00000382896 ENSE00001295816 21
+ENST00000382896 ENSE00001591330 22
+ENST00000382896 ENSE00001751483 23
+ENST00000382896 ENSE00001740894 24
+ENST00000382896 ENSE00001640284 25
+ENST00000382896 ENSE00001785465 26
+ENST00000382896 ENSE00001298131 27
+ENST00000382896 ENSE00001713248 28
+ENST00000382896 ENSE00001330353 29
+ENST00000537580 ENSE00002281428 1
+ENST00000537580 ENSE00003531058 2
+ENST00000537580 ENSE00003601240 3
+ENST00000537580 ENSE00003660729 4
+ENST00000537580 ENSE00003557845 5
+ENST00000537580 ENSE00003529356 6
+ENST00000537580 ENSE00003648165 7
+ENST00000537580 ENSE00001594460 8
+ENST00000537580 ENSE00003527061 9
+ENST00000537580 ENSE00003473448 10
+ENST00000537580 ENSE00001775185 11
+ENST00000537580 ENSE00001755119 12
+ENST00000537580 ENSE00002277775 13
+ENST00000537580 ENSE00001291636 14
+ENST00000537580 ENSE00001311020 15
+ENST00000537580 ENSE00002240413 16
+ENST00000537580 ENSE00001305759 17
+ENST00000537580 ENSE00001624388 18
+ENST00000537580 ENSE00001639693 19
+ENST00000537580 ENSE00001651897 20
+ENST00000537580 ENSE00001295816 21
+ENST00000537580 ENSE00001591330 22
+ENST00000537580 ENSE00001751483 23
+ENST00000537580 ENSE00001740894 24
+ENST00000537580 ENSE00001640284 25
+ENST00000537580 ENSE00001785465 26
+ENST00000537580 ENSE00001298131 27
+ENST00000537580 ENSE00001330353 28
+ENST00000538878 ENSE00002284432 1
+ENST00000538878 ENSE00003531058 2
+ENST00000538878 ENSE00003601240 3
+ENST00000538878 ENSE00003660729 4
+ENST00000538878 ENSE00003557845 5
+ENST00000538878 ENSE00003529356 6
+ENST00000538878 ENSE00003648165 7
+ENST00000538878 ENSE00001594460 8
+ENST00000538878 ENSE00003527061 9
+ENST00000538878 ENSE00003473448 10
+ENST00000538878 ENSE00001755119 11
+ENST00000538878 ENSE00001291636 12
+ENST00000538878 ENSE00001311020 13
+ENST00000538878 ENSE00001696208 14
+ENST00000538878 ENSE00001305759 15
+ENST00000538878 ENSE00001624388 16
+ENST00000538878 ENSE00001639693 17
+ENST00000538878 ENSE00001651897 18
+ENST00000538878 ENSE00002258300 19
+ENST00000540140 ENSE00002281428 1
+ENST00000540140 ENSE00003531058 2
+ENST00000540140 ENSE00003601240 3
+ENST00000540140 ENSE00003660729 4
+ENST00000540140 ENSE00003557845 5
+ENST00000540140 ENSE00003529356 6
+ENST00000540140 ENSE00003648165 7
+ENST00000540140 ENSE00001594460 8
+ENST00000540140 ENSE00003527061 9
+ENST00000540140 ENSE00003473448 10
+ENST00000540140 ENSE00001755119 11
+ENST00000540140 ENSE00002206921 12
+ENST00000540140 ENSE00001291636 13
+ENST00000540140 ENSE00001311020 14
+ENST00000540140 ENSE00001696208 15
+ENST00000540140 ENSE00001305759 16
+ENST00000540140 ENSE00001624388 17
+ENST00000540140 ENSE00001639693 18
+ENST00000540140 ENSE00001651897 19
+ENST00000540140 ENSE00002312710 20
+ENST00000545955 ENSE00002281428 1
+ENST00000545955 ENSE00003531058 2
+ENST00000545955 ENSE00003601240 3
+ENST00000545955 ENSE00003660729 4
+ENST00000545955 ENSE00003557845 5
+ENST00000545955 ENSE00003529356 6
+ENST00000545955 ENSE00003648165 7
+ENST00000545955 ENSE00001594460 8
+ENST00000545955 ENSE00003527061 9
+ENST00000545955 ENSE00003473448 10
+ENST00000545955 ENSE00001775185 11
+ENST00000545955 ENSE00001755119 12
+ENST00000545955 ENSE00002277775 13
+ENST00000545955 ENSE00002206921 14
+ENST00000545955 ENSE00001291636 15
+ENST00000545955 ENSE00001311020 16
+ENST00000545955 ENSE00001696208 17
+ENST00000545955 ENSE00001305759 18
+ENST00000545955 ENSE00001624388 19
+ENST00000545955 ENSE00001639693 20
+ENST00000545955 ENSE00001651897 21
+ENST00000383070 ENSE00001494622 1
+ENST00000525526 ENSE00002201849 1
+ENST00000525526 ENSE00002323146 2
+ENST00000534739 ENSE00002144027 1
+ENST00000534739 ENSE00002214525 2
+ENST00000330337 ENSE00001543664 1
+ENST00000330337 ENSE00001299738 2
+ENST00000330337 ENSE00001425815 3
+ENST00000330337 ENSE00001543661 4
+ENST00000330337 ENSE00001543658 5
+ENST00000330337 ENSE00001543653 6
+ENST00000382840 ENSE00001493514 1
+ENST00000455570 ENSE00001774474 1
+ENST00000455570 ENSE00001632371 2
+ENST00000455570 ENSE00001757652 3
+ENST00000328819 ENSE00001774474 1
+ENST00000328819 ENSE00001632371 2
+ENST00000328819 ENSE00001598872 3
+ENST00000382287 ENSE00001643163 1
+ENST00000382287 ENSE00001775879 2
+ENST00000382287 ENSE00001797093 3
+ENST00000382287 ENSE00003604811 4
+ENST00000382287 ENSE00001711311 5
+ENST00000382287 ENSE00003547967 6
+ENST00000382287 ENSE00003506422 7
+ENST00000382287 ENSE00001764403 8
+ENST00000382287 ENSE00001740581 9
+ENST00000602559 ENSE00003378858 1
+ENST00000602559 ENSE00001775879 2
+ENST00000602559 ENSE00001797093 3
+ENST00000602559 ENSE00003663501 4
+ENST00000602559 ENSE00003522688 5
+ENST00000602559 ENSE00003640409 6
+ENST00000602559 ENSE00003450106 7
+ENST00000602559 ENSE00003432143 8
+ENST00000602680 ENSE00003307767 1
+ENST00000602680 ENSE00003370731 2
+ENST00000602680 ENSE00001775879 3
+ENST00000602680 ENSE00001797093 4
+ENST00000602680 ENSE00003266869 5
+ENST00000382365 ENSE00001695979 1
+ENST00000382365 ENSE00001803162 2
+ENST00000382365 ENSE00001733013 3
+ENST00000382365 ENSE00001602566 4
+ENST00000382365 ENSE00001709286 5
+ENST00000382365 ENSE00001706721 6
+ENST00000382365 ENSE00001722493 7
+ENST00000382365 ENSE00001757238 8
+ENST00000382365 ENSE00001785551 9
+ENST00000382365 ENSE00001618455 10
+ENST00000382365 ENSE00001728003 11
+ENST00000382365 ENSE00001655608 12
+ENST00000382365 ENSE00001684201 13
+ENST00000382365 ENSE00001594431 14
+ENST00000382365 ENSE00001620779 15
+ENST00000382365 ENSE00001604640 16
+ENST00000382365 ENSE00001799005 17
+ENST00000382365 ENSE00001715045 18
+ENST00000382365 ENSE00001879735 19
+ENST00000446723 ENSE00001759696 1
+ENST00000446723 ENSE00001803162 2
+ENST00000446723 ENSE00001733013 3
+ENST00000446723 ENSE00001602566 4
+ENST00000446723 ENSE00001709286 5
+ENST00000446723 ENSE00001706721 6
+ENST00000446723 ENSE00001722493 7
+ENST00000446723 ENSE00001757238 8
+ENST00000446723 ENSE00001785551 9
+ENST00000446723 ENSE00001618455 10
+ENST00000446723 ENSE00001728003 11
+ENST00000446723 ENSE00001655608 12
+ENST00000446723 ENSE00001684201 13
+ENST00000446723 ENSE00001604640 14
+ENST00000446723 ENSE00001799005 15
+ENST00000446723 ENSE00001715045 16
+ENST00000446723 ENSE00001670353 17
+ENST00000315357 ENSE00001638508 1
+ENST00000315357 ENSE00001803162 2
+ENST00000315357 ENSE00001733013 3
+ENST00000315357 ENSE00001602566 4
+ENST00000315357 ENSE00001709286 5
+ENST00000315357 ENSE00001706721 6
+ENST00000315357 ENSE00001722493 7
+ENST00000315357 ENSE00001757238 8
+ENST00000315357 ENSE00001785551 9
+ENST00000315357 ENSE00001728003 10
+ENST00000315357 ENSE00001684201 11
+ENST00000315357 ENSE00001620779 12
+ENST00000315357 ENSE00001604640 13
+ENST00000315357 ENSE00001799005 14
+ENST00000315357 ENSE00001715045 15
+ENST00000315357 ENSE00001596582 16
+ENST00000400212 ENSE00001710130 1
+ENST00000400212 ENSE00001803162 2
+ENST00000400212 ENSE00001733013 3
+ENST00000400212 ENSE00001602566 4
+ENST00000400212 ENSE00001709286 5
+ENST00000400212 ENSE00001706721 6
+ENST00000400212 ENSE00001722493 7
+ENST00000400212 ENSE00001757238 8
+ENST00000400212 ENSE00001785551 9
+ENST00000400212 ENSE00001618455 10
+ENST00000400212 ENSE00001728003 11
+ENST00000400212 ENSE00001655608 12
+ENST00000400212 ENSE00001684201 13
+ENST00000400212 ENSE00001620779 14
+ENST00000400212 ENSE00001761121 15
+ENST00000306737 ENSE00001710130 1
+ENST00000306737 ENSE00001803162 2
+ENST00000306737 ENSE00001733013 3
+ENST00000306737 ENSE00001602566 4
+ENST00000306737 ENSE00001709286 5
+ENST00000306737 ENSE00001706721 6
+ENST00000306737 ENSE00001722493 7
+ENST00000306737 ENSE00001757238 8
+ENST00000306737 ENSE00001785551 9
+ENST00000306737 ENSE00001618455 10
+ENST00000306737 ENSE00001728003 11
+ENST00000306737 ENSE00001655608 12
+ENST00000306737 ENSE00001684201 13
+ENST00000306737 ENSE00002297176 14
+ENST00000338964 ENSE00001761952 1
+ENST00000338964 ENSE00001787222 2
+ENST00000338964 ENSE00001756785 3
+ENST00000338964 ENSE00001659741 4
+ENST00000338964 ENSE00001614961 5
+ENST00000338964 ENSE00001783992 6
+ENST00000405239 ENSE00001790708 1
+ENST00000405239 ENSE00001612801 2
+ENST00000405239 ENSE00001805705 3
+ENST00000405239 ENSE00001701677 4
+ENST00000405239 ENSE00001677573 5
+ENST00000405239 ENSE00001692245 6
+ENST00000405239 ENSE00001655036 7
+ENST00000405239 ENSE00001644179 8
+ENST00000405239 ENSE00001646813 9
+ENST00000405239 ENSE00001693241 10
+ENST00000405239 ENSE00001620467 11
+ENST00000405239 ENSE00001793499 12
+ENST00000405239 ENSE00001598075 13
+ENST00000405239 ENSE00001622354 14
+ENST00000405239 ENSE00001695781 15
+ENST00000405239 ENSE00001646624 16
+ENST00000405239 ENSE00001597343 17
+ENST00000405239 ENSE00001707161 18
+ENST00000405239 ENSE00001696683 19
+ENST00000405239 ENSE00001742324 20
+ENST00000405239 ENSE00001758855 21
+ENST00000405239 ENSE00001633941 22
+ENST00000405239 ENSE00001675126 23
+ENST00000405239 ENSE00001697295 24
+ENST00000405239 ENSE00001614454 25
+ENST00000405239 ENSE00003674346 26
+ENST00000405239 ENSE00003567366 27
+ENST00000405239 ENSE00001730215 28
+ENST00000466332 ENSE00001907756 1
+ENST00000466332 ENSE00003593491 2
+ENST00000466332 ENSE00001924070 3
+ENST00000466332 ENSE00003490554 4
+ENST00000466332 ENSE00001941183 5
+ENST00000382510 ENSE00002223761 1
+ENST00000382510 ENSE00001612801 2
+ENST00000382510 ENSE00001805705 3
+ENST00000382510 ENSE00001701677 4
+ENST00000382510 ENSE00001677573 5
+ENST00000382510 ENSE00001692245 6
+ENST00000382510 ENSE00001655036 7
+ENST00000382510 ENSE00001644179 8
+ENST00000382510 ENSE00001646813 9
+ENST00000382510 ENSE00001693241 10
+ENST00000382510 ENSE00001620467 11
+ENST00000382510 ENSE00001793499 12
+ENST00000382510 ENSE00001598075 13
+ENST00000382510 ENSE00001622354 14
+ENST00000382510 ENSE00001695781 15
+ENST00000382510 ENSE00001646624 16
+ENST00000382510 ENSE00001597343 17
+ENST00000382510 ENSE00001707161 18
+ENST00000382510 ENSE00001696683 19
+ENST00000382510 ENSE00002211619 20
+ENST00000382510 ENSE00002216645 21
+ENST00000426000 ENSE00002288175 1
+ENST00000426000 ENSE00001612801 2
+ENST00000426000 ENSE00001805705 3
+ENST00000426000 ENSE00001701677 4
+ENST00000426000 ENSE00001677573 5
+ENST00000426000 ENSE00001692245 6
+ENST00000426000 ENSE00001655036 7
+ENST00000426000 ENSE00001644179 8
+ENST00000426000 ENSE00001646813 9
+ENST00000426000 ENSE00001693241 10
+ENST00000426000 ENSE00001620467 11
+ENST00000426000 ENSE00001597343 12
+ENST00000426000 ENSE00001707161 13
+ENST00000426000 ENSE00001696683 14
+ENST00000426000 ENSE00001742324 15
+ENST00000426000 ENSE00001758855 16
+ENST00000426000 ENSE00001633941 17
+ENST00000426000 ENSE00001675126 18
+ENST00000426000 ENSE00001697295 19
+ENST00000426000 ENSE00001614454 20
+ENST00000426000 ENSE00003674346 21
+ENST00000426000 ENSE00003567366 22
+ENST00000426000 ENSE00002302795 23
+ENST00000540248 ENSE00002244340 1
+ENST00000540248 ENSE00001612801 2
+ENST00000540248 ENSE00001805705 3
+ENST00000540248 ENSE00001701677 4
+ENST00000540248 ENSE00001677573 5
+ENST00000540248 ENSE00001692245 6
+ENST00000540248 ENSE00001597343 7
+ENST00000540248 ENSE00001707161 8
+ENST00000540248 ENSE00001696683 9
+ENST00000540248 ENSE00001742324 10
+ENST00000540248 ENSE00001758855 11
+ENST00000540248 ENSE00001633941 12
+ENST00000540248 ENSE00001675126 13
+ENST00000540248 ENSE00001614454 14
+ENST00000540248 ENSE00003674346 15
+ENST00000540248 ENSE00003567366 16
+ENST00000540248 ENSE00002220199 17
+ENST00000344424 ENSE00001364289 1
+ENST00000344424 ENSE00001703024 2
+ENST00000344424 ENSE00001598245 3
+ENST00000344424 ENSE00001623069 4
+ENST00000344424 ENSE00001638930 5
+ENST00000431358 ENSE00001793914 1
+ENST00000431358 ENSE00003611070 2
+ENST00000431358 ENSE00003508777 3
+ENST00000431358 ENSE00003486078 4
+ENST00000431358 ENSE00003501886 5
+ENST00000431358 ENSE00002025193 6
+ENST00000382670 ENSE00001605630 1
+ENST00000382670 ENSE00002029402 2
+ENST00000382670 ENSE00002071256 3
+ENST00000382670 ENSE00001752142 4
+ENST00000382670 ENSE00001625315 5
+ENST00000382670 ENSE00001784994 6
+ENST00000382670 ENSE00001627758 7
+ENST00000382670 ENSE00001785135 8
+ENST00000382670 ENSE00001784976 9
+ENST00000382670 ENSE00001614661 10
+ENST00000538925 ENSE00002247829 1
+ENST00000538925 ENSE00002029402 2
+ENST00000538925 ENSE00002071256 3
+ENST00000538925 ENSE00002322909 4
+ENST00000441780 ENSE00001696638 1
+ENST00000441780 ENSE00001710713 2
+ENST00000441780 ENSE00001621856 3
+ENST00000441780 ENSE00001665831 4
+ENST00000441780 ENSE00001614475 5
+ENST00000441780 ENSE00001747959 6
+ENST00000441780 ENSE00001629056 7
+ENST00000441780 ENSE00001738683 8
+ENST00000441780 ENSE00001745419 9
+ENST00000361365 ENSE00001435537 1
+ENST00000361365 ENSE00003613684 2
+ENST00000361365 ENSE00003657245 3
+ENST00000361365 ENSE00003465639 4
+ENST00000361365 ENSE00001635972 5
+ENST00000361365 ENSE00003488477 6
+ENST00000361365 ENSE00001436409 7
+ENST00000465253 ENSE00001900287 1
+ENST00000465253 ENSE00003550832 2
+ENST00000465253 ENSE00003637584 3
+ENST00000465253 ENSE00003458723 4
+ENST00000465253 ENSE00001848909 5
+ENST00000382772 ENSE00001848730 1
+ENST00000382772 ENSE00003613684 2
+ENST00000382772 ENSE00003657245 3
+ENST00000382772 ENSE00001635972 4
+ENST00000382772 ENSE00003488477 5
+ENST00000382772 ENSE00003530204 6
+ENST00000464196 ENSE00001918431 1
+ENST00000464196 ENSE00003571869 2
+ENST00000485584 ENSE00001943944 1
+ENST00000485584 ENSE00003530759 2
+ENST00000485584 ENSE00003571869 3
+ENST00000382314 ENSE00001608723 1
+ENST00000382314 ENSE00001761380 2
+ENST00000382314 ENSE00001671603 3
+ENST00000382314 ENSE00001610089 4
+ENST00000382314 ENSE00001639000 5
+ENST00000382314 ENSE00001609022 6
+ENST00000382314 ENSE00001796227 7
+ENST00000382314 ENSE00001643133 8
+ENST00000382314 ENSE00001610738 9
+ENST00000382314 ENSE00001694635 10
+ENST00000382314 ENSE00001628323 11
+ENST00000382314 ENSE00001759079 12
+ENST00000382314 ENSE00003621500 13
+ENST00000382314 ENSE00003476346 14
+ENST00000382314 ENSE00003593830 15
+ENST00000382314 ENSE00003673436 16
+ENST00000382314 ENSE00002286159 17
+ENST00000382296 ENSE00001761380 1
+ENST00000382296 ENSE00001671603 2
+ENST00000382296 ENSE00001610089 3
+ENST00000382296 ENSE00001639000 4
+ENST00000382296 ENSE00001609022 5
+ENST00000382296 ENSE00001796227 6
+ENST00000382296 ENSE00001643133 7
+ENST00000382296 ENSE00001610738 8
+ENST00000382296 ENSE00001694635 9
+ENST00000382296 ENSE00001628323 10
+ENST00000382296 ENSE00001759079 11
+ENST00000382296 ENSE00003621500 12
+ENST00000382296 ENSE00003521530 13
+ENST00000382296 ENSE00003476346 14
+ENST00000382296 ENSE00003593830 15
+ENST00000382296 ENSE00001409187 16
+ENST00000449750 ENSE00001608723 1
+ENST00000449750 ENSE00001784722 2
+ENST00000449750 ENSE00001612288 3
+ENST00000449750 ENSE00001607094 4
+ENST00000449750 ENSE00001741374 5
+ENST00000449750 ENSE00001709278 6
+ENST00000449750 ENSE00001796227 7
+ENST00000449750 ENSE00001643133 8
+ENST00000449750 ENSE00001610738 9
+ENST00000449750 ENSE00001694635 10
+ENST00000449750 ENSE00001628323 11
+ENST00000449750 ENSE00001632077 12
+ENST00000449750 ENSE00003646399 13
+ENST00000449750 ENSE00003476346 14
+ENST00000449750 ENSE00003593830 15
+ENST00000449750 ENSE00003673436 16
+ENST00000449750 ENSE00002286159 17
+ENST00000415508 ENSE00002226159 1
+ENST00000415508 ENSE00001784722 2
+ENST00000415508 ENSE00001612288 3
+ENST00000415508 ENSE00001607094 4
+ENST00000415508 ENSE00001741374 5
+ENST00000415508 ENSE00001709278 6
+ENST00000415508 ENSE00001796227 7
+ENST00000415508 ENSE00001643133 8
+ENST00000415508 ENSE00001628323 9
+ENST00000415508 ENSE00003609241 10
+ENST00000415508 ENSE00003641574 11
+ENST00000415508 ENSE00003582499 12
+ENST00000415508 ENSE00003476346 13
+ENST00000415508 ENSE00003593830 14
+ENST00000415508 ENSE00003673436 15
+ENST00000415508 ENSE00002310973 16
+ENST00000440066 ENSE00001804650 1
+ENST00000440066 ENSE00001784722 2
+ENST00000440066 ENSE00001612288 3
+ENST00000440066 ENSE00001607094 4
+ENST00000440066 ENSE00001741374 5
+ENST00000440066 ENSE00001709278 6
+ENST00000440066 ENSE00001761380 7
+ENST00000440066 ENSE00001671603 8
+ENST00000440066 ENSE00001610089 9
+ENST00000440066 ENSE00001639000 10
+ENST00000440066 ENSE00001609022 11
+ENST00000440066 ENSE00001796227 12
+ENST00000440066 ENSE00001643133 13
+ENST00000440066 ENSE00001610738 14
+ENST00000440066 ENSE00001694635 15
+ENST00000440066 ENSE00001628323 16
+ENST00000440066 ENSE00001632077 17
+ENST00000440066 ENSE00003646399 18
+ENST00000440066 ENSE00003609241 19
+ENST00000440066 ENSE00003476346 20
+ENST00000440066 ENSE00003593830 21
+ENST00000440066 ENSE00003673436 22
+ENST00000440066 ENSE00002286159 23
+ENST00000382432 ENSE00002247728 1
+ENST00000382432 ENSE00001761380 2
+ENST00000382432 ENSE00001671603 3
+ENST00000382432 ENSE00001610089 4
+ENST00000382432 ENSE00001639000 5
+ENST00000382432 ENSE00001609022 6
+ENST00000382432 ENSE00001796227 7
+ENST00000382432 ENSE00001643133 8
+ENST00000382432 ENSE00001610738 9
+ENST00000382432 ENSE00001694635 10
+ENST00000382432 ENSE00001628323 11
+ENST00000382432 ENSE00001632077 12
+ENST00000382432 ENSE00003646399 13
+ENST00000382432 ENSE00001759079 14
+ENST00000382432 ENSE00003688590 15
+ENST00000382432 ENSE00003491822 16
+ENST00000382432 ENSE00003627572 17
+ENST00000382432 ENSE00003463917 18
+ENST00000382432 ENSE00002242041 19
+ENST00000382432 ENSE00003542339 20
+ENST00000382432 ENSE00003675468 21
+ENST00000382432 ENSE00003605405 22
+ENST00000382432 ENSE00003571325 23
+ENST00000382432 ENSE00002286159 24
+ENST00000400494 ENSE00002324480 1
+ENST00000400494 ENSE00001761380 2
+ENST00000400494 ENSE00001671603 3
+ENST00000400494 ENSE00001610089 4
+ENST00000400494 ENSE00001639000 5
+ENST00000400494 ENSE00001609022 6
+ENST00000400494 ENSE00001796227 7
+ENST00000400494 ENSE00001643133 8
+ENST00000400494 ENSE00001610738 9
+ENST00000400494 ENSE00001694635 10
+ENST00000400494 ENSE00001628323 11
+ENST00000400494 ENSE00001632077 12
+ENST00000400494 ENSE00003646399 13
+ENST00000400494 ENSE00001759079 14
+ENST00000400494 ENSE00003688590 15
+ENST00000400494 ENSE00003675468 16
+ENST00000400494 ENSE00003605405 17
+ENST00000400494 ENSE00003571325 18
+ENST00000400494 ENSE00002310973 19
+ENST00000382290 ENSE00002247728 1
+ENST00000382290 ENSE00001761380 2
+ENST00000382290 ENSE00001671603 3
+ENST00000382290 ENSE00001610089 4
+ENST00000382290 ENSE00001639000 5
+ENST00000382290 ENSE00001609022 6
+ENST00000382290 ENSE00001796227 7
+ENST00000382290 ENSE00001643133 8
+ENST00000382290 ENSE00001610738 9
+ENST00000382290 ENSE00001694635 10
+ENST00000382290 ENSE00001628323 11
+ENST00000382290 ENSE00001632077 12
+ENST00000382290 ENSE00003528908 13
+ENST00000382290 ENSE00003669768 14
+ENST00000382290 ENSE00003491822 15
+ENST00000382290 ENSE00003627572 16
+ENST00000382290 ENSE00003463917 17
+ENST00000382290 ENSE00002242041 18
+ENST00000382290 ENSE00003542339 19
+ENST00000382290 ENSE00003675468 20
+ENST00000382290 ENSE00003605405 21
+ENST00000382290 ENSE00003571325 22
+ENST00000382290 ENSE00002286159 23
+ENST00000416956 ENSE00001693578 1
+ENST00000416956 ENSE00001701535 2
+ENST00000416956 ENSE00001742121 3
+ENST00000416956 ENSE00001646556 4
+ENST00000416956 ENSE00001671473 5
+ENST00000382449 ENSE00001639518 1
+ENST00000382449 ENSE00001640624 2
+ENST00000382449 ENSE00001689126 3
+ENST00000382449 ENSE00001736653 4
+ENST00000382449 ENSE00001659494 5
+ENST00000382449 ENSE00001723992 6
+ENST00000382449 ENSE00001592425 7
+ENST00000382449 ENSE00001676137 8
+ENST00000382449 ENSE00001748863 9
+ENST00000382449 ENSE00001787726 10
+ENST00000382449 ENSE00001605577 11
+ENST00000382449 ENSE00001607297 12
+ENST00000382449 ENSE00001771158 13
+ENST00000382449 ENSE00001743603 14
+ENST00000382449 ENSE00001706458 15
+ENST00000382449 ENSE00001690453 16
+ENST00000382449 ENSE00001659100 17
+ENST00000382449 ENSE00001717191 18
+ENST00000382449 ENSE00001705999 19
+ENST00000382449 ENSE00001622502 20
+ENST00000382449 ENSE00001695324 21
+ENST00000382449 ENSE00001659634 22
+ENST00000382449 ENSE00001668820 23
+ENST00000382440 ENSE00001603036 1
+ENST00000382440 ENSE00001640624 2
+ENST00000382440 ENSE00001689126 3
+ENST00000382440 ENSE00001736653 4
+ENST00000382440 ENSE00001659494 5
+ENST00000382440 ENSE00001723992 6
+ENST00000382440 ENSE00001592425 7
+ENST00000382440 ENSE00001676137 8
+ENST00000382440 ENSE00001748863 9
+ENST00000382440 ENSE00001780208 10
+ENST00000382440 ENSE00001682794 11
+ENST00000382440 ENSE00001787726 12
+ENST00000382440 ENSE00001605577 13
+ENST00000382440 ENSE00001736828 14
+ENST00000382440 ENSE00001674452 15
+ENST00000382440 ENSE00001686143 16
+ENST00000382440 ENSE00001607297 17
+ENST00000382440 ENSE00001771158 18
+ENST00000382440 ENSE00001743603 19
+ENST00000382440 ENSE00001706458 20
+ENST00000382440 ENSE00001690453 21
+ENST00000382440 ENSE00001659100 22
+ENST00000382440 ENSE00001717191 23
+ENST00000382440 ENSE00001705999 24
+ENST00000382440 ENSE00001622502 25
+ENST00000382440 ENSE00001695324 26
+ENST00000382440 ENSE00001659634 27
+ENST00000382440 ENSE00001687149 28
+ENST00000382433 ENSE00001943922 1
+ENST00000382433 ENSE00001640624 2
+ENST00000382433 ENSE00001689126 3
+ENST00000382433 ENSE00001736653 4
+ENST00000382433 ENSE00001659494 5
+ENST00000382433 ENSE00001723992 6
+ENST00000382433 ENSE00001592425 7
+ENST00000382433 ENSE00001676137 8
+ENST00000382433 ENSE00001748863 9
+ENST00000382433 ENSE00001787726 10
+ENST00000382433 ENSE00001605577 11
+ENST00000382433 ENSE00001736828 12
+ENST00000382433 ENSE00001674452 13
+ENST00000382433 ENSE00001771158 14
+ENST00000382433 ENSE00001743603 15
+ENST00000382433 ENSE00001706458 16
+ENST00000382433 ENSE00001690453 17
+ENST00000382433 ENSE00001659100 18
+ENST00000382433 ENSE00001717191 19
+ENST00000382433 ENSE00001705999 20
+ENST00000382433 ENSE00001622502 21
+ENST00000382433 ENSE00001695324 22
+ENST00000382433 ENSE00001659634 23
+ENST00000382433 ENSE00001865723 24
+ENST00000382306 ENSE00001603036 1
+ENST00000382306 ENSE00001640624 2
+ENST00000382306 ENSE00001689126 3
+ENST00000382306 ENSE00001736653 4
+ENST00000382306 ENSE00001659494 5
+ENST00000382306 ENSE00001723992 6
+ENST00000382306 ENSE00001592425 7
+ENST00000382306 ENSE00001676137 8
+ENST00000382306 ENSE00001748863 9
+ENST00000382306 ENSE00001780208 10
+ENST00000382306 ENSE00001682794 11
+ENST00000382306 ENSE00001787726 12
+ENST00000382306 ENSE00001674452 13
+ENST00000382306 ENSE00001743603 14
+ENST00000382306 ENSE00001706458 15
+ENST00000382306 ENSE00001690453 16
+ENST00000382306 ENSE00001659100 17
+ENST00000382306 ENSE00001717191 18
+ENST00000382306 ENSE00001705999 19
+ENST00000382306 ENSE00001622502 20
+ENST00000382306 ENSE00001695324 21
+ENST00000382306 ENSE00001659634 22
+ENST00000382306 ENSE00001542120 23
+ENST00000449947 ENSE00001685890 1
+ENST00000449947 ENSE00001640624 2
+ENST00000449947 ENSE00001689126 3
+ENST00000449947 ENSE00001736653 4
+ENST00000449947 ENSE00001659494 5
+ENST00000449947 ENSE00001723992 6
+ENST00000449947 ENSE00001592425 7
+ENST00000449947 ENSE00001676137 8
+ENST00000449947 ENSE00001748863 9
+ENST00000449947 ENSE00001780208 10
+ENST00000449947 ENSE00001682794 11
+ENST00000449947 ENSE00001787726 12
+ENST00000449947 ENSE00001674452 13
+ENST00000449947 ENSE00001622502 14
+ENST00000449947 ENSE00001695324 15
+ENST00000449947 ENSE00001659634 16
+ENST00000449947 ENSE00001687149 17
+ENST00000382294 ENSE00001795799 1
+ENST00000382294 ENSE00001640624 2
+ENST00000382294 ENSE00001689126 3
+ENST00000382294 ENSE00001736653 4
+ENST00000382294 ENSE00001659494 5
+ENST00000382294 ENSE00001723992 6
+ENST00000382294 ENSE00001592425 7
+ENST00000382294 ENSE00001676137 8
+ENST00000382294 ENSE00001682794 9
+ENST00000382294 ENSE00001743603 10
+ENST00000382294 ENSE00001706458 11
+ENST00000382294 ENSE00001690453 12
+ENST00000382294 ENSE00001622502 13
+ENST00000382294 ENSE00001695324 14
+ENST00000382294 ENSE00001659634 15
+ENST00000382294 ENSE00001656459 16
+ENST00000382424 ENSE00001943922 1
+ENST00000382424 ENSE00001640624 2
+ENST00000382424 ENSE00001689126 3
+ENST00000382424 ENSE00001736653 4
+ENST00000382424 ENSE00001659494 5
+ENST00000382424 ENSE00001723992 6
+ENST00000382424 ENSE00001592425 7
+ENST00000382424 ENSE00001676137 8
+ENST00000382424 ENSE00001748863 9
+ENST00000382424 ENSE00001780208 10
+ENST00000382424 ENSE00001682794 11
+ENST00000382424 ENSE00001787726 12
+ENST00000382424 ENSE00001674452 13
+ENST00000382424 ENSE00001771158 14
+ENST00000382424 ENSE00001743603 15
+ENST00000382424 ENSE00001706458 16
+ENST00000382424 ENSE00001690453 17
+ENST00000382424 ENSE00001659100 18
+ENST00000382424 ENSE00001717191 19
+ENST00000382424 ENSE00001705999 20
+ENST00000382424 ENSE00001622502 21
+ENST00000382424 ENSE00001695324 22
+ENST00000382424 ENSE00001659634 23
+ENST00000382424 ENSE00001656459 24
+ENST00000400493 ENSE00001612046 1
+ENST00000400493 ENSE00001640624 2
+ENST00000400493 ENSE00001689126 3
+ENST00000400493 ENSE00001736653 4
+ENST00000400493 ENSE00001659494 5
+ENST00000400493 ENSE00001723992 6
+ENST00000400493 ENSE00001592425 7
+ENST00000400493 ENSE00001676137 8
+ENST00000400493 ENSE00001748863 9
+ENST00000400493 ENSE00001780208 10
+ENST00000400493 ENSE00001787726 11
+ENST00000400493 ENSE00001736828 12
+ENST00000400493 ENSE00001674452 13
+ENST00000400493 ENSE00001607297 14
+ENST00000400493 ENSE00001771158 15
+ENST00000400493 ENSE00001622502 16
+ENST00000400493 ENSE00001695324 17
+ENST00000400493 ENSE00001659634 18
+ENST00000400493 ENSE00001656459 19
+ENST00000382431 ENSE00001612046 1
+ENST00000382431 ENSE00001640624 2
+ENST00000382431 ENSE00001689126 3
+ENST00000382431 ENSE00001736653 4
+ENST00000382431 ENSE00001659494 5
+ENST00000382431 ENSE00001723992 6
+ENST00000382431 ENSE00001592425 7
+ENST00000382431 ENSE00001676137 8
+ENST00000382431 ENSE00001748863 9
+ENST00000382431 ENSE00001780208 10
+ENST00000382431 ENSE00001682794 11
+ENST00000382431 ENSE00001787726 12
+ENST00000382431 ENSE00001605577 13
+ENST00000382431 ENSE00001736828 14
+ENST00000382431 ENSE00001674452 15
+ENST00000382431 ENSE00001607297 16
+ENST00000382431 ENSE00001771158 17
+ENST00000382431 ENSE00001622502 18
+ENST00000382431 ENSE00001695324 19
+ENST00000382431 ENSE00001659634 20
+ENST00000382431 ENSE00002307170 21
+ENST00000382434 ENSE00001603036 1
+ENST00000382434 ENSE00001640624 2
+ENST00000382434 ENSE00001689126 3
+ENST00000382434 ENSE00001736653 4
+ENST00000382434 ENSE00001659494 5
+ENST00000382434 ENSE00001723992 6
+ENST00000382434 ENSE00001592425 7
+ENST00000382434 ENSE00001676137 8
+ENST00000382434 ENSE00001748863 9
+ENST00000382434 ENSE00001780208 10
+ENST00000382434 ENSE00001682794 11
+ENST00000382434 ENSE00001787726 12
+ENST00000382434 ENSE00001736828 13
+ENST00000382434 ENSE00001674452 14
+ENST00000382434 ENSE00002288552 15
+ENST00000382966 ENSE00001494103 1
+ENST00000382966 ENSE00003566412 2
+ENST00000382966 ENSE00001494100 3
+ENST00000493160 ENSE00001936584 1
+ENST00000493160 ENSE00003683672 2
+ENST00000493160 ENSE00003566412 3
+ENST00000493160 ENSE00001833093 4
+ENST00000493160 ENSE00001864445 5
+ENST00000357871 ENSE00001494103 1
+ENST00000357871 ENSE00003566412 2
+ENST00000357871 ENSE00001431292 3
+ENST00000382963 ENSE00001494096 1
+ENST00000382963 ENSE00003683672 2
+ENST00000382963 ENSE00003566412 3
+ENST00000382963 ENSE00001494094 4
+ENST00000382965 ENSE00001494099 1
+ENST00000382965 ENSE00003566412 2
+ENST00000417072 ENSE00001626669 1
+ENST00000417072 ENSE00001628061 2
+ENST00000417072 ENSE00001654045 3
+ENST00000417072 ENSE00001742514 4
+ENST00000417072 ENSE00001625965 5
+ENST00000417072 ENSE00001637388 6
+ENST00000417072 ENSE00001647627 7
+ENST00000450591 ENSE00001713628 1
+ENST00000450591 ENSE00001608925 2
+ENST00000450591 ENSE00001661659 3
+ENST00000450591 ENSE00001718232 4
+ENST00000450591 ENSE00001712140 5
+ENST00000450591 ENSE00001710562 6
+ENST00000450591 ENSE00001697617 7
+ENST00000388836 ENSE00001796890 1
+ENST00000400275 ENSE00001542224 1
+ENST00000536206 ENSE00002304178 1
+ENST00000536206 ENSE00002218148 2
+ENST00000258589 ENSE00002225900 1
+ENST00000258589 ENSE00002060648 2
+ENST00000258589 ENSE00002054613 3
+ENST00000258589 ENSE00002062274 4
+ENST00000258589 ENSE00001794058 5
+ENST00000258589 ENSE00002063262 6
+ENST00000258589 ENSE00002075791 7
+ENST00000258589 ENSE00002083827 8
+ENST00000258589 ENSE00001329242 9
+ENST00000258589 ENSE00001755500 10
+ENST00000258589 ENSE00001609863 11
+ENST00000258589 ENSE00002050610 12
+ENST00000258589 ENSE00001759871 13
+ENST00000258589 ENSE00001803087 14
+ENST00000258589 ENSE00001650788 15
+ENST00000258589 ENSE00001785401 16
+ENST00000258589 ENSE00001711963 17
+ENST00000258589 ENSE00001803764 18
+ENST00000258589 ENSE00001634122 19
+ENST00000258589 ENSE00002035744 20
+ENST00000448881 ENSE00001633881 1
+ENST00000448881 ENSE00003705348 2
+ENST00000448881 ENSE00003708504 3
+ENST00000448881 ENSE00001648607 4
+ENST00000448881 ENSE00001624353 5
+ENST00000448881 ENSE00001654144 6
+ENST00000448881 ENSE00001742721 7
+ENST00000448881 ENSE00001733694 8
+ENST00000400476 ENSE00002063794 1
+ENST00000400476 ENSE00003705348 2
+ENST00000400476 ENSE00003708504 3
+ENST00000400476 ENSE00002073782 4
+ENST00000442535 ENSE00001798056 1
+ENST00000448575 ENSE00001786804 1
+ENST00000448575 ENSE00003707528 2
+ENST00000448575 ENSE00003707660 3
+ENST00000448575 ENSE00001646086 4
+ENST00000448575 ENSE00001635863 5
+ENST00000448575 ENSE00001692837 6
+ENST00000448575 ENSE00001650343 7
+ENST00000448575 ENSE00001693911 8
+ENST00000458444 ENSE00001782800 1
+ENST00000458444 ENSE00003707528 2
+ENST00000458444 ENSE00003707660 3
+ENST00000458444 ENSE00001601487 4
+ENST00000400581 ENSE00001543607 1
+ENST00000400581 ENSE00001543606 2
+ENST00000441139 ENSE00001610376 1
+ENST00000441139 ENSE00002093140 2
+ENST00000441139 ENSE00003244086 3
+ENST00000441139 ENSE00002127178 4
+ENST00000441139 ENSE00002119618 5
+ENST00000441139 ENSE00002135990 6
+ENST00000441139 ENSE00002871965 7
+ENST00000441139 ENSE00002832419 8
+ENST00000441139 ENSE00001709914 9
+ENST00000400605 ENSE00002099269 1
+ENST00000400605 ENSE00002093140 2
+ENST00000400605 ENSE00003244086 3
+ENST00000400605 ENSE00002127178 4
+ENST00000400605 ENSE00002119618 5
+ENST00000400605 ENSE00002135990 6
+ENST00000400605 ENSE00002135691 7
+ENST00000400605 ENSE00002871965 8
+ENST00000400605 ENSE00002832419 9
+ENST00000400605 ENSE00002116806 10
+ENST00000400605 ENSE00002114284 11
+ENST00000513194 ENSE00002035739 1
+ENST00000513194 ENSE00002065873 2
+ENST00000513194 ENSE00002052301 3
+ENST00000513194 ENSE00002063105 4
+ENST00000513194 ENSE00002068678 5
+ENST00000513194 ENSE00002035720 6
+ENST00000513194 ENSE00002040446 7
+ENST00000513194 ENSE00002871965 8
+ENST00000513194 ENSE00002832419 9
+ENST00000513194 ENSE00002087154 10
+ENST00000421118 ENSE00001641174 1
+ENST00000431340 ENSE00001735477 1
+ENST00000431340 ENSE00001670231 2
+ENST00000431340 ENSE00001794413 3
+ENST00000431340 ENSE00001687318 4
+ENST00000415010 ENSE00001616737 1
+ENST00000404428 ENSE00001561333 1
+ENST00000403990 ENSE00001755093 1
+ENST00000406090 ENSE00001564286 1
+ENST00000407745 ENSE00001563558 1
+ENST00000403487 ENSE00001556375 1
+ENST00000403487 ENSE00001551524 2
+ENST00000405035 ENSE00001551713 1
+ENST00000421995 ENSE00001664386 1
+ENST00000449659 ENSE00001757650 1
+ENST00000425912 ENSE00001733565 1
+ENST00000425912 ENSE00001699806 2
+ENST00000425912 ENSE00001780177 3
+ENST00000425912 ENSE00001657364 4
+ENST00000425912 ENSE00001601089 5
+ENST00000439805 ENSE00001753322 1
+ENST00000439805 ENSE00001706758 2
+ENST00000439805 ENSE00001651867 3
+ENST00000439805 ENSE00001608049 4
+ENST00000439805 ENSE00001662653 5
+ENST00000435111 ENSE00001705807 1
+ENST00000435111 ENSE00001746889 2
+ENST00000455085 ENSE00001722988 1
+ENST00000455085 ENSE00001776348 2
+ENST00000455085 ENSE00001631973 3
+ENST00000414667 ENSE00001733982 1
+ENST00000413867 ENSE00001607377 1
+ENST00000413867 ENSE00001744544 2
+ENST00000456360 ENSE00001706460 1
+ENST00000456360 ENSE00003704369 2
+ENST00000456360 ENSE00001600307 3
+ENST00000456360 ENSE00001699202 4
+ENST00000456360 ENSE00003708495 5
+ENST00000456360 ENSE00003704715 6
+ENST00000456360 ENSE00001654717 7
+ENST00000444169 ENSE00001801361 1
+ENST00000444169 ENSE00003704369 2
+ENST00000444169 ENSE00003708495 3
+ENST00000444169 ENSE00003704715 4
+ENST00000444169 ENSE00001684171 5
+ENST00000421387 ENSE00001787469 1
+ENST00000421387 ENSE00001786906 2
+ENST00000412737 ENSE00001624987 1
+ENST00000417252 ENSE00001742768 1
+ENST00000417252 ENSE00001741857 2
+ENST00000417252 ENSE00001756336 3
+ENST00000417252 ENSE00001733532 4
+ENST00000417252 ENSE00001765193 5
+ENST00000417252 ENSE00001788990 6
+ENST00000417252 ENSE00003551602 7
+ENST00000417252 ENSE00003674280 8
+ENST00000417252 ENSE00002050900 9
+ENST00000418671 ENSE00001632235 1
+ENST00000418671 ENSE00001760970 2
+ENST00000418671 ENSE00001709237 3
+ENST00000418671 ENSE00001738839 4
+ENST00000418671 ENSE00001657532 5
+ENST00000418671 ENSE00001682229 6
+ENST00000418671 ENSE00001638892 7
+ENST00000443911 ENSE00001720793 1
+ENST00000443911 ENSE00001745042 2
+ENST00000428060 ENSE00001645847 1
+ENST00000417434 ENSE00001733686 1
+ENST00000440402 ENSE00001640125 1
+ENST00000428380 ENSE00001673618 1
+ENST00000428380 ENSE00001701558 2
+ENST00000446621 ENSE00001666463 1
+ENST00000446621 ENSE00001732944 2
+ENST00000420443 ENSE00001779327 1
+ENST00000420443 ENSE00001715744 2
+ENST00000420443 ENSE00001713276 3
+ENST00000430152 ENSE00001682376 1
+ENST00000430152 ENSE00001714515 2
+ENST00000445253 ENSE00001803533 1
+ENST00000445253 ENSE00001742365 2
+ENST00000445253 ENSE00001763553 3
+ENST00000445253 ENSE00001699801 4
+ENST00000445253 ENSE00001593910 5
+ENST00000435142 ENSE00001609393 1
+ENST00000435142 ENSE00001645218 2
+ENST00000435142 ENSE00001669340 3
+ENST00000435142 ENSE00001648145 4
+ENST00000435142 ENSE00001618505 5
+ENST00000412474 ENSE00001722426 1
+ENST00000412474 ENSE00001774330 2
+ENST00000412474 ENSE00001697230 3
+ENST00000445573 ENSE00001667957 1
+ENST00000445573 ENSE00001741624 2
+ENST00000445573 ENSE00001617890 3
+ENST00000413946 ENSE00001732131 1
+ENST00000413946 ENSE00001695041 2
+ENST00000420810 ENSE00001705100 1
+ENST00000421178 ENSE00001731067 1
+ENST00000421178 ENSE00001688408 2
+ENST00000421178 ENSE00001799980 3
+ENST00000421178 ENSE00001667811 4
+ENST00000421178 ENSE00001697848 5
+ENST00000421178 ENSE00001645782 6
+ENST00000421178 ENSE00001597947 7
+ENST00000421178 ENSE00001783028 8
+ENST00000421178 ENSE00001724466 9
+ENST00000421178 ENSE00001668444 10
+ENST00000421178 ENSE00001787097 11
+ENST00000438971 ENSE00001602981 1
+ENST00000438971 ENSE00001645114 2
+ENST00000415994 ENSE00001632052 1
+ENST00000415994 ENSE00001746774 2
+ENST00000436723 ENSE00001759712 1
+ENST00000436723 ENSE00001655699 2
+ENST00000427212 ENSE00001640260 1
+ENST00000427212 ENSE00003294704 2
+ENST00000417797 ENSE00001787918 1
+ENST00000417797 ENSE00001790627 2
+ENST00000447526 ENSE00001775532 1
+ENST00000451071 ENSE00003556843 1
+ENST00000451071 ENSE00003707698 2
+ENST00000451071 ENSE00003705095 3
+ENST00000451071 ENSE00001787776 4
+ENST00000451071 ENSE00001628066 5
+ENST00000451071 ENSE00001708635 6
+ENST00000451071 ENSE00001792249 7
+ENST00000451071 ENSE00001783563 8
+ENST00000400578 ENSE00002058038 1
+ENST00000400578 ENSE00003707698 2
+ENST00000400578 ENSE00003705095 3
+ENST00000400578 ENSE00003582078 4
+ENST00000413372 ENSE00001730385 1
+ENST00000457708 ENSE00001660101 1
+ENST00000457708 ENSE00001706320 2
+ENST00000445813 ENSE00001793186 1
+ENST00000434179 ENSE00001664773 1
+ENST00000434179 ENSE00001659765 2
+ENST00000434179 ENSE00001628115 3
+ENST00000434179 ENSE00001615798 4
+ENST00000434179 ENSE00001787460 5
+ENST00000413200 ENSE00001747464 1
+ENST00000443200 ENSE00001728661 1
+ENST00000443200 ENSE00001689812 2
+ENST00000443200 ENSE00001613966 3
+ENST00000421205 ENSE00001614250 1
+ENST00000421205 ENSE00001615218 2
+ENST00000421205 ENSE00001658609 3
+ENST00000421205 ENSE00001802978 4
+ENST00000421205 ENSE00001752917 5
+ENST00000421205 ENSE00001742406 6
+ENST00000421205 ENSE00001606348 7
+ENST00000443820 ENSE00001646418 1
+ENST00000443820 ENSE00001596298 2
+ENST00000443820 ENSE00001692812 3
+ENST00000443820 ENSE00001685756 4
+ENST00000443820 ENSE00001804752 5
+ENST00000443820 ENSE00001603823 6
+ENST00000443820 ENSE00001679516 7
+ENST00000443820 ENSE00001625545 8
+ENST00000422655 ENSE00001687009 1
+ENST00000422655 ENSE00001681165 2
+ENST00000426293 ENSE00001664672 1
+ENST00000426293 ENSE00001793473 2
+ENST00000426293 ENSE00001676006 3
+ENST00000431179 ENSE00001665330 1
+ENST00000431179 ENSE00001718020 2
+ENST00000431179 ENSE00001736617 3
+ENST00000431179 ENSE00001660147 4
+ENST00000431179 ENSE00001747287 5
+ENST00000431179 ENSE00001772012 6
+ENST00000431179 ENSE00001591413 7
+ENST00000431179 ENSE00001706519 8
+ENST00000431179 ENSE00001681474 9
+ENST00000431179 ENSE00001767346 10
+ENST00000431179 ENSE00001660597 11
+ENST00000412165 ENSE00001766514 1
+ENST00000412165 ENSE00001667582 2
+ENST00000434110 ENSE00001592213 1
+ENST00000434110 ENSE00001776907 2
+ENST00000434110 ENSE00001725078 3
+ENST00000434110 ENSE00001700142 4
+ENST00000434110 ENSE00001645488 5
+ENST00000434110 ENSE00001633384 6
+ENST00000434110 ENSE00001701227 7
+ENST00000434110 ENSE00001700211 8
+ENST00000434110 ENSE00001624131 9
+ENST00000433767 ENSE00001714386 1
+ENST00000433767 ENSE00001771613 2
+ENST00000450145 ENSE00001677160 1
+ENST00000450145 ENSE00001679028 2
+ENST00000450145 ENSE00001715921 3
+ENST00000450145 ENSE00001780929 4
+ENST00000450145 ENSE00001670508 5
+ENST00000450145 ENSE00001760125 6
+ENST00000450145 ENSE00001765669 7
+ENST00000450145 ENSE00001594565 8
+ENST00000450145 ENSE00001723672 9
+ENST00000423213 ENSE00001677160 1
+ENST00000423213 ENSE00001679028 2
+ENST00000423213 ENSE00001715921 3
+ENST00000423213 ENSE00001780929 4
+ENST00000423213 ENSE00001670508 5
+ENST00000423213 ENSE00001760125 6
+ENST00000423213 ENSE00001765669 7
+ENST00000423213 ENSE00001594565 8
+ENST00000423213 ENSE00001659766 9
+ENST00000430032 ENSE00002290744 1
+ENST00000430032 ENSE00002234593 2
+ENST00000437686 ENSE00001723990 1
+ENST00000437686 ENSE00001745366 2
+ENST00000426661 ENSE00001785693 1
+ENST00000426661 ENSE00001643138 2
+ENST00000426661 ENSE00001806627 3
+ENST00000426661 ENSE00001617913 4
+ENST00000432394 ENSE00001725040 1
+ENST00000432394 ENSE00001616038 2
+ENST00000432394 ENSE00001645870 3
+ENST00000432394 ENSE00001746455 4
+ENST00000447528 ENSE00001695542 1
+ENST00000447528 ENSE00001711695 2
+ENST00000453474 ENSE00001622377 1
+ENST00000453474 ENSE00001786914 2
+ENST00000453474 ENSE00001763025 3
+ENST00000440297 ENSE00001649733 1
+ENST00000440297 ENSE00001795209 2
+ENST00000440297 ENSE00001658484 3
+ENST00000440297 ENSE00001602075 4
+ENST00000440297 ENSE00001646110 5
+ENST00000440297 ENSE00001782218 6
+ENST00000440297 ENSE00001712850 7
+ENST00000440297 ENSE00001806612 8
+ENST00000440297 ENSE00001717995 9
+ENST00000458209 ENSE00001712612 1
+ENST00000456541 ENSE00001752848 1
+ENST00000456541 ENSE00001543999 2
+ENST00000426936 ENSE00001731603 1
+ENST00000426936 ENSE00001617820 2
+ENST00000426936 ENSE00001788369 3
+ENST00000426936 ENSE00001672593 4
+ENST00000426936 ENSE00001688242 5
+ENST00000426936 ENSE00001713327 6
+ENST00000443934 ENSE00001592550 1
+ENST00000429066 ENSE00001642025 1
+ENST00000418221 ENSE00001618473 1
+ENST00000418221 ENSE00001629725 2
+ENST00000418221 ENSE00001758450 3
+ENST00000418221 ENSE00001591894 4
+ENST00000418221 ENSE00001757932 5
+ENST00000418221 ENSE00001666134 6
+ENST00000418221 ENSE00001686479 7
+ENST00000445125 ENSE00001653013 1
+ENST00000426199 ENSE00001652052 1
+ENST00000436888 ENSE00001638514 1
+ENST00000440676 ENSE00001732026 1
+ENST00000440676 ENSE00001643224 2
+ENST00000440676 ENSE00001686603 3
+ENST00000425789 ENSE00001722408 1
+ENST00000411756 ENSE00001677525 1
+ENST00000411756 ENSE00001668583 2
+ENST00000411756 ENSE00001727103 3
+ENST00000411756 ENSE00001635486 4
+ENST00000411756 ENSE00001702927 5
+ENST00000411756 ENSE00001667443 6
+ENST00000411756 ENSE00001638683 7
+ENST00000411756 ENSE00001673115 8
+ENST00000411756 ENSE00001615368 9
+ENST00000411756 ENSE00001675564 10
+ENST00000411756 ENSE00001738547 11
+ENST00000452415 ENSE00001674051 1
+ENST00000455560 ENSE00001634395 1
+ENST00000445405 ENSE00001756820 1
+ENST00000445405 ENSE00001674806 2
+ENST00000445405 ENSE00001673237 3
+ENST00000445405 ENSE00001732143 4
+ENST00000445405 ENSE00001679690 5
+ENST00000445405 ENSE00001782041 6
+ENST00000445405 ENSE00001745014 7
+ENST00000445405 ENSE00001743398 8
+ENST00000413550 ENSE00001650266 1
+ENST00000413550 ENSE00001796711 2
+ENST00000413550 ENSE00001805326 3
+ENST00000413550 ENSE00001739171 4
+ENST00000413550 ENSE00001615615 5
+ENST00000449843 ENSE00001644643 1
+ENST00000416040 ENSE00001634743 1
+ENST00000445421 ENSE00001712081 1
+ENST00000445421 ENSE00001691275 2
+ENST00000458627 ENSE00001719617 1
+ENST00000458627 ENSE00001722479 2
+ENST00000458627 ENSE00001650606 3
+ENST00000458627 ENSE00001798385 4
+ENST00000458627 ENSE00001740686 5
+ENST00000458627 ENSE00001660603 6
+ENST00000458627 ENSE00001753397 7
+ENST00000427537 ENSE00001787453 1
+ENST00000427537 ENSE00001755017 2
+ENST00000427537 ENSE00001689362 3
+ENST00000427537 ENSE00001619402 4
+ENST00000411487 ENSE00001707653 1
+ENST00000423409 ENSE00001754048 1
+ENST00000445453 ENSE00001631720 1
+ENST00000455254 ENSE00001598941 1
+ENST00000421895 ENSE00001619110 1
+ENST00000421895 ENSE00001655047 2
+ENST00000421895 ENSE00001663253 3
+ENST00000421895 ENSE00001599244 4
+ENST00000421895 ENSE00001716829 5
+ENST00000421895 ENSE00001740285 6
+ENST00000421895 ENSE00001722377 7
+ENST00000421895 ENSE00001799387 8
+ENST00000421895 ENSE00001639175 9
+ENST00000421895 ENSE00001602332 10
+ENST00000434773 ENSE00001740373 1
+ENST00000434773 ENSE00001638353 2
+ENST00000434773 ENSE00001697673 3
+ENST00000431260 ENSE00001618557 1
+ENST00000436568 ENSE00001760629 1
+ENST00000436568 ENSE00001687747 2
+ENST00000436568 ENSE00001806588 3
+ENST00000436568 ENSE00001713773 4
+ENST00000437359 ENSE00001715524 1
+ENST00000437359 ENSE00001803876 2
+ENST00000437359 ENSE00001802420 3
+ENST00000437359 ENSE00001631333 4
+ENST00000437359 ENSE00001792500 5
+ENST00000437359 ENSE00001606958 6
+ENST00000437359 ENSE00001748340 7
+ENST00000437359 ENSE00001745595 8
+ENST00000437359 ENSE00001655007 9
+ENST00000470460 ENSE00001958730 1
+ENST00000470460 ENSE00003461217 2
+ENST00000470460 ENSE00003566508 3
+ENST00000470460 ENSE00003512617 4
+ENST00000470460 ENSE00003628086 5
+ENST00000470460 ENSE00003475218 6
+ENST00000470460 ENSE00003524115 7
+ENST00000470460 ENSE00003544610 8
+ENST00000470460 ENSE00003519695 9
+ENST00000470460 ENSE00003617706 10
+ENST00000470460 ENSE00003462282 11
+ENST00000250831 ENSE00001775261 1
+ENST00000250831 ENSE00003571276 2
+ENST00000250831 ENSE00003560118 3
+ENST00000250831 ENSE00001663738 4
+ENST00000250831 ENSE00003481268 5
+ENST00000250831 ENSE00003643053 6
+ENST00000250831 ENSE00003478187 7
+ENST00000250831 ENSE00003515241 8
+ENST00000250831 ENSE00003518190 9
+ENST00000250831 ENSE00003566008 10
+ENST00000250831 ENSE00003680149 11
+ENST00000250831 ENSE00003509588 12
+ENST00000414629 ENSE00001594771 1
+ENST00000414629 ENSE00003571276 2
+ENST00000414629 ENSE00003560118 3
+ENST00000414629 ENSE00001663738 4
+ENST00000414629 ENSE00003481268 5
+ENST00000414629 ENSE00003643053 6
+ENST00000414629 ENSE00003478187 7
+ENST00000414629 ENSE00003515241 8
+ENST00000414629 ENSE00003518190 9
+ENST00000414629 ENSE00003566008 10
+ENST00000414629 ENSE00003680149 11
+ENST00000414629 ENSE00003509588 12
+ENST00000445779 ENSE00001594771 1
+ENST00000445779 ENSE00003571276 2
+ENST00000445779 ENSE00003560118 3
+ENST00000445779 ENSE00001663738 4
+ENST00000445779 ENSE00003481268 5
+ENST00000445779 ENSE00003643053 6
+ENST00000445779 ENSE00003478187 7
+ENST00000445779 ENSE00003515241 8
+ENST00000445779 ENSE00003518190 9
+ENST00000445779 ENSE00003680149 10
+ENST00000445779 ENSE00003509588 11
+ENST00000413610 ENSE00001766942 1
+ENST00000412493 ENSE00001691508 1
+ENST00000412493 ENSE00001674716 2
+ENST00000412493 ENSE00001733692 3
+ENST00000412493 ENSE00001677440 4
+ENST00000412493 ENSE00001757764 5
+ENST00000425544 ENSE00001804714 1
+ENST00000425544 ENSE00001711635 2
+ENST00000425544 ENSE00001760261 3
+ENST00000425544 ENSE00001636850 4
+ENST00000425544 ENSE00001595014 5
+ENST00000425544 ENSE00001745989 6
+ENST00000425544 ENSE00001636511 7
+ENST00000425544 ENSE00001632004 8
+ENST00000425544 ENSE00001798989 9
+ENST00000438459 ENSE00001794129 1
+ENST00000438459 ENSE00001605189 2
+ENST00000444242 ENSE00001600631 1
+ENST00000444242 ENSE00001611586 2
+ENST00000441906 ENSE00001616321 1
+ENST00000441906 ENSE00001618682 2
+ENST00000426983 ENSE00001698182 1
+ENST00000426983 ENSE00001796451 2
+ENST00000426983 ENSE00001650846 3
+ENST00000426983 ENSE00001799290 4
+ENST00000426983 ENSE00001659363 5
+ENST00000381172 ENSE00001700980 1
+ENST00000381172 ENSE00001778840 2
+ENST00000381172 ENSE00001683270 3
+ENST00000381172 ENSE00001612791 4
+ENST00000381172 ENSE00001736429 5
+ENST00000381172 ENSE00001608554 6
+ENST00000381172 ENSE00001712855 7
+ENST00000381172 ENSE00001639452 8
+ENST00000430663 ENSE00001615810 1
+ENST00000430663 ENSE00001622792 2
+ENST00000430663 ENSE00001804045 3
+ENST00000430663 ENSE00001713121 4
+ENST00000456738 ENSE00001632993 1
+ENST00000456738 ENSE00001691206 2
+ENST00000456738 ENSE00001652425 3
+ENST00000423333 ENSE00001775619 1
+ENST00000423333 ENSE00001799498 2
+ENST00000423333 ENSE00001705987 3
+ENST00000423333 ENSE00001714230 4
+ENST00000423333 ENSE00001736832 5
+ENST00000423333 ENSE00001670100 6
+ENST00000420603 ENSE00001801471 1
+ENST00000420603 ENSE00001777749 2
+ENST00000432201 ENSE00001744076 1
+ENST00000432201 ENSE00001648363 2
+ENST00000445313 ENSE00001761637 1
+ENST00000445313 ENSE00001646263 2
+ENST00000436690 ENSE00001738572 1
+ENST00000450481 ENSE00001647752 1
+ENST00000450481 ENSE00001779865 2
+ENST00000450481 ENSE00001620696 3
+ENST00000450481 ENSE00001763493 4
+ENST00000444494 ENSE00001791861 1
+ENST00000423569 ENSE00001806367 1
+ENST00000411585 ENSE00001727723 1
+ENST00000411585 ENSE00001647639 2
+ENST00000411585 ENSE00001611070 3
+ENST00000411585 ENSE00001757809 4
+ENST00000428070 ENSE00001649434 1
+ENST00000432613 ENSE00001694343 1
+ENST00000432613 ENSE00001742226 2
+ENST00000432613 ENSE00001769856 3
+ENST00000432613 ENSE00001698979 4
+ENST00000416110 ENSE00001747233 1
+ENST00000416110 ENSE00001718178 2
+ENST00000452426 ENSE00001743100 1
+ENST00000452426 ENSE00001753135 2
+ENST00000452426 ENSE00001762790 3
+ENST00000452426 ENSE00001609952 4
+ENST00000452426 ENSE00001788043 5
+ENST00000456123 ENSE00001613955 1
+ENST00000456123 ENSE00001793058 2
+ENST00000456123 ENSE00001658199 3
+ENST00000456123 ENSE00001702789 4
+ENST00000449963 ENSE00001670654 1
+ENST00000449963 ENSE00001742535 2
+ENST00000449963 ENSE00001782038 3
+ENST00000433321 ENSE00001628818 1
+ENST00000433321 ENSE00001645989 2
+ENST00000433321 ENSE00001671314 3
+ENST00000433321 ENSE00001645127 4
+ENST00000436801 ENSE00001623303 1
+ENST00000436801 ENSE00003646877 2
+ENST00000436801 ENSE00001624981 3
+ENST00000436801 ENSE00001776488 4
+ENST00000452256 ENSE00002237197 1
+ENST00000452256 ENSE00002305238 2
+ENST00000452256 ENSE00002214727 3
+ENST00000452256 ENSE00002234764 4
+ENST00000452256 ENSE00001705725 5
+ENST00000452256 ENSE00001777540 6
+ENST00000452256 ENSE00001698821 7
+ENST00000452256 ENSE00003487206 8
+ENST00000452256 ENSE00001776941 9
+ENST00000432892 ENSE00001786986 1
+ENST00000432892 ENSE00001636298 2
+ENST00000432892 ENSE00001724614 3
+ENST00000432892 ENSE00001641069 4
+ENST00000432892 ENSE00001705725 5
+ENST00000432892 ENSE00001777540 6
+ENST00000432892 ENSE00001698821 7
+ENST00000432892 ENSE00003487206 8
+ENST00000432892 ENSE00001776941 9
+ENST00000420524 ENSE00002237197 1
+ENST00000420524 ENSE00002259310 2
+ENST00000420524 ENSE00002243480 3
+ENST00000420524 ENSE00001731628 4
+ENST00000420524 ENSE00001702837 5
+ENST00000420524 ENSE00001777950 6
+ENST00000420524 ENSE00001759506 7
+ENST00000420524 ENSE00003471081 8
+ENST00000420524 ENSE00001630624 9
+ENST00000418290 ENSE00001755166 1
+ENST00000418290 ENSE00001729630 2
+ENST00000415662 ENSE00001708301 1
+ENST00000415662 ENSE00001718075 2
+ENST00000415662 ENSE00001596904 3
+ENST00000451854 ENSE00001691278 1
+ENST00000432672 ENSE00001592952 1
+ENST00000432672 ENSE00001667759 2
+ENST00000422712 ENSE00001788558 1
+ENST00000422712 ENSE00001790420 2
+ENST00000422712 ENSE00001783650 3
+ENST00000422712 ENSE00001800896 4
+ENST00000420174 ENSE00001753165 1
+ENST00000420174 ENSE00001624781 2
+ENST00000420174 ENSE00001800341 3
+ENST00000427373 ENSE00001796314 1
+ENST00000427373 ENSE00001653980 2
+ENST00000427373 ENSE00001596444 3
+ENST00000434164 ENSE00001638780 1
+ENST00000434164 ENSE00001750121 2
+ENST00000434164 ENSE00001766520 3
+ENST00000434164 ENSE00001648650 4
+ENST00000440679 ENSE00001720861 1
+ENST00000430228 ENSE00001806349 1
+ENST00000430228 ENSE00001729950 2
+ENST00000457222 ENSE00001774191 1
+ENST00000457222 ENSE00003692666 2
+ENST00000457222 ENSE00001747539 3
+ENST00000457222 ENSE00001719444 4
+ENST00000457222 ENSE00001781353 5
+ENST00000457222 ENSE00001954357 6
+ENST00000424594 ENSE00003617259 1
+ENST00000424594 ENSE00003692666 2
+ENST00000424594 ENSE00001747539 3
+ENST00000424594 ENSE00001719444 4
+ENST00000424594 ENSE00003611797 5
+ENST00000424594 ENSE00001758618 6
+ENST00000491844 ENSE00003511805 1
+ENST00000491844 ENSE00003461293 2
+ENST00000491844 ENSE00003601991 3
+ENST00000491844 ENSE00001879368 4
+ENST00000469322 ENSE00001932930 1
+ENST00000469322 ENSE00003565177 2
+ENST00000469322 ENSE00001931598 3
+ENST00000440483 ENSE00001774191 1
+ENST00000440483 ENSE00003692666 2
+ENST00000440483 ENSE00001747539 3
+ENST00000440483 ENSE00003463520 4
+ENST00000440483 ENSE00003659094 5
+ENST00000440483 ENSE00003525623 6
+ENST00000427871 ENSE00001618274 1
+ENST00000427871 ENSE00001610531 2
+ENST00000427871 ENSE00003692666 3
+ENST00000427871 ENSE00003525860 4
+ENST00000427871 ENSE00001879368 5
+ENST00000426660 ENSE00001724421 1
+ENST00000426660 ENSE00001776105 2
+ENST00000426660 ENSE00001726546 3
+ENST00000426660 ENSE00001754461 4
+ENST00000426660 ENSE00001804017 5
+ENST00000426660 ENSE00001740783 6
+ENST00000426660 ENSE00001721829 7
+ENST00000426660 ENSE00001689481 8
+ENST00000426660 ENSE00001600079 9
+ENST00000415776 ENSE00001770854 1
+ENST00000413320 ENSE00001707259 1
+ENST00000413320 ENSE00001614194 2
+ENST00000440136 ENSE00001610381 1
+ENST00000440136 ENSE00001729283 2
+ENST00000440136 ENSE00001736394 3
+ENST00000440136 ENSE00001611617 4
+ENST00000440136 ENSE00001675926 5
+ENST00000414182 ENSE00001786158 1
+ENST00000414182 ENSE00001605858 2
+ENST00000414182 ENSE00001748625 3
+ENST00000434454 ENSE00001628114 1
+ENST00000434454 ENSE00001632712 2
+ENST00000434454 ENSE00001592139 3
+ENST00000434454 ENSE00001666942 4
+ENST00000434454 ENSE00001734562 5
+ENST00000434454 ENSE00001636017 6
+ENST00000434454 ENSE00001733825 7
+ENST00000423480 ENSE00001657129 1
+ENST00000423480 ENSE00001754863 2
+ENST00000423480 ENSE00001761172 3
+ENST00000423480 ENSE00001773734 4
+ENST00000455084 ENSE00001757026 1
+ENST00000455084 ENSE00001688639 2
+ENST00000455084 ENSE00001719273 3
+ENST00000455084 ENSE00001669312 4
+ENST00000455084 ENSE00001646929 5
+ENST00000455084 ENSE00001717483 6
+ENST00000439472 ENSE00001652776 1
+ENST00000439472 ENSE00001755411 2
+ENST00000367272 ENSE00001642256 1
+ENST00000367272 ENSE00001701173 2
+ENST00000367272 ENSE00001806782 3
+ENST00000367272 ENSE00001713752 4
+ENST00000367272 ENSE00001655231 5
+ENST00000367272 ENSE00001777012 6
+ENST00000367272 ENSE00001591489 7
+ENST00000367272 ENSE00001678181 8
+ENST00000367272 ENSE00002438533 9
+ENST00000367272 ENSE00001666272 10
+ENST00000367272 ENSE00001694641 11
+ENST00000367272 ENSE00001689979 12
+ENST00000367272 ENSE00001680774 13
+ENST00000431405 ENSE00001638605 1
+ENST00000431405 ENSE00001619709 2
+ENST00000431405 ENSE00001732018 3
+ENST00000431405 ENSE00001600167 4
+ENST00000431405 ENSE00001682049 5
+ENST00000454958 ENSE00001691510 1
+ENST00000454958 ENSE00001725503 2
+ENST00000426699 ENSE00001806707 1
+ENST00000426699 ENSE00001708248 2
+ENST00000451423 ENSE00001747579 1
+ENST00000455273 ENSE00001742312 1
+ENST00000455273 ENSE00001785470 2
+ENST00000455273 ENSE00001687737 3
+ENST00000455273 ENSE00001738169 4
+ENST00000455273 ENSE00001804747 5
+ENST00000455273 ENSE00001597616 6
+ENST00000455273 ENSE00001629276 7
+ENST00000455273 ENSE00001688905 8
+ENST00000455273 ENSE00001626846 9
+ENST00000455273 ENSE00001658632 10
+ENST00000455273 ENSE00001653676 11
+ENST00000441913 ENSE00001771502 1
+ENST00000441913 ENSE00001774620 2
+ENST00000441913 ENSE00001774789 3
+ENST00000425411 ENSE00001793905 1
+ENST00000439651 ENSE00001678856 1
+ENST00000287721 ENSE00002028706 1
+ENST00000287721 ENSE00003676294 2
+ENST00000287721 ENSE00002045368 3
+ENST00000287721 ENSE00001638906 4
+ENST00000287721 ENSE00003564905 5
+ENST00000287721 ENSE00001885622 6
+ENST00000383000 ENSE00003489927 1
+ENST00000383000 ENSE00003676294 2
+ENST00000383000 ENSE00002045368 3
+ENST00000383000 ENSE00001638906 4
+ENST00000383000 ENSE00001801838 5
+ENST00000383000 ENSE00001603826 6
+ENST00000477879 ENSE00003579602 1
+ENST00000477879 ENSE00003593279 2
+ENST00000477879 ENSE00001943292 3
+ENST00000477879 ENSE00001940527 4
+ENST00000436159 ENSE00001908923 1
+ENST00000436159 ENSE00003618941 2
+ENST00000436159 ENSE00001859698 3
+ENST00000383005 ENSE00002028706 1
+ENST00000383005 ENSE00003676294 2
+ENST00000383005 ENSE00002045368 3
+ENST00000383005 ENSE00003586983 4
+ENST00000383005 ENSE00003603865 5
+ENST00000383005 ENSE00001720810 6
+ENST00000330628 ENSE00002220142 1
+ENST00000330628 ENSE00002247562 2
+ENST00000330628 ENSE00001350919 3
+ENST00000330628 ENSE00003564996 4
+ENST00000330628 ENSE00003465500 5
+ENST00000330628 ENSE00003586983 6
+ENST00000330628 ENSE00003603865 7
+ENST00000330628 ENSE00001720810 8
+ENST00000537415 ENSE00001159243 1
+ENST00000537415 ENSE00003676294 2
+ENST00000537415 ENSE00002045368 3
+ENST00000537415 ENSE00001638906 4
+ENST00000537415 ENSE00002206433 5
+ENST00000449858 ENSE00001806624 1
+ENST00000444617 ENSE00001696124 1
+ENST00000444617 ENSE00001722283 2
+ENST00000444617 ENSE00001646198 3
+ENST00000444617 ENSE00001630914 4
+ENST00000444617 ENSE00001751441 5
+ENST00000439525 ENSE00001671403 1
+ENST00000439525 ENSE00001660336 2
+ENST00000439525 ENSE00001785311 3
+ENST00000424581 ENSE00001640432 1
+ENST00000424581 ENSE00001650835 2
+ENST00000424581 ENSE00001768767 3
+ENST00000424581 ENSE00001734323 4
+ENST00000424581 ENSE00001757798 5
+ENST00000414540 ENSE00001754078 1
+ENST00000414540 ENSE00001801282 2
+ENST00000414540 ENSE00001669347 3
+ENST00000422208 ENSE00001781088 1
+ENST00000422208 ENSE00001788779 2
+ENST00000422208 ENSE00001602297 3
+ENST00000422208 ENSE00001712311 4
+ENST00000427496 ENSE00001663391 1
+ENST00000427496 ENSE00001634582 2
+ENST00000427496 ENSE00001626840 3
+ENST00000434308 ENSE00001668677 1
+ENST00000434308 ENSE00001765004 2
+ENST00000434308 ENSE00001613408 3
+ENST00000447168 ENSE00001664411 1
+ENST00000427622 ENSE00001675278 1
+ENST00000427622 ENSE00001648707 2
+ENST00000427622 ENSE00001614122 3
+ENST00000427622 ENSE00001755116 4
+ENST00000414596 ENSE00001615889 1
+ENST00000414596 ENSE00001739908 2
+ENST00000414596 ENSE00001694789 3
+ENST00000414596 ENSE00001664818 4
+ENST00000444350 ENSE00001596967 1
+ENST00000444350 ENSE00001793420 2
+ENST00000444350 ENSE00001682691 3
+ENST00000444350 ENSE00001762358 4
+ENST00000439968 ENSE00001653366 1
+ENST00000458706 ENSE00001719426 1
+ENST00000446387 ENSE00001766548 1
+ENST00000417699 ENSE00001718469 1
+ENST00000417699 ENSE00001757100 2
+ENST00000417699 ENSE00001774249 3
+ENST00000417699 ENSE00001660181 4
+ENST00000417699 ENSE00001783530 5
+ENST00000417699 ENSE00001683819 6
+ENST00000417699 ENSE00001748455 7
+ENST00000417699 ENSE00001605377 8
+ENST00000417699 ENSE00001631605 9
+ENST00000458667 ENSE00001743559 1
+ENST00000458667 ENSE00001734255 2
+ENST00000458667 ENSE00001754684 3
+ENST00000458667 ENSE00001738421 4
+ENST00000458667 ENSE00001606680 5
+ENST00000424230 ENSE00001722745 1
+ENST00000424230 ENSE00001612746 2
+ENST00000424230 ENSE00001617577 3
+ENST00000424230 ENSE00001806297 4
+ENST00000424230 ENSE00001803898 5
+ENST00000424230 ENSE00001607696 6
+ENST00000434374 ENSE00001627359 1
+ENST00000434374 ENSE00001652271 2
+ENST00000434374 ENSE00001614594 3
+ENST00000434374 ENSE00001693270 4
+ENST00000434374 ENSE00001639643 5
+ENST00000451173 ENSE00001679003 1
+ENST00000419224 ENSE00001610666 1
+ENST00000419224 ENSE00001663003 2
+ENST00000419224 ENSE00001597803 3
+ENST00000419224 ENSE00001638987 4
+ENST00000419224 ENSE00001596879 5
+ENST00000432915 ENSE00001768357 1
+ENST00000418278 ENSE00001694457 1
+ENST00000414395 ENSE00001592853 1
+ENST00000417334 ENSE00001685903 1
+ENST00000417334 ENSE00001745631 2
+ENST00000417334 ENSE00001676858 3
+ENST00000449148 ENSE00001724152 1
+ENST00000449148 ENSE00001669607 2
+ENST00000449148 ENSE00001639143 3
+ENST00000449148 ENSE00001650621 4
+ENST00000449148 ENSE00001633081 5
+ENST00000449148 ENSE00001602954 6
+ENST00000449148 ENSE00001699591 7
+ENST00000449148 ENSE00001771661 8
+ENST00000449148 ENSE00001632036 9
+ENST00000416652 ENSE00001791163 1
+ENST00000416652 ENSE00001745360 2
+ENST00000416652 ENSE00001646564 3
+ENST00000430062 ENSE00001732601 1
+ENST00000414121 ENSE00001622837 1
+ENST00000414121 ENSE00001747409 2
+ENST00000446466 ENSE00001674006 1
+ENST00000437794 ENSE00001636194 1
+ENST00000437794 ENSE00001794176 2
+ENST00000449381 ENSE00001756377 1
+ENST00000449381 ENSE00001746468 2
+ENST00000449381 ENSE00001745811 3
+ENST00000449381 ENSE00001752960 4
+ENST00000449381 ENSE00001702756 5
+ENST00000449381 ENSE00001799300 6
+ENST00000449381 ENSE00001670043 7
+ENST00000449381 ENSE00001770923 8
+ENST00000435741 ENSE00001616687 1
+ENST00000444263 ENSE00001678149 1
+ENST00000444263 ENSE00001757585 2
+ENST00000444263 ENSE00001696132 3
+ENST00000425031 ENSE00001677281 1
+ENST00000425031 ENSE00001757585 2
+ENST00000425031 ENSE00001620860 3
+ENST00000425031 ENSE00001694023 4
+ENST00000418624 ENSE00001634403 1
+ENST00000451661 ENSE00001705880 1
+ENST00000441642 ENSE00001632978 1
+ENST00000441642 ENSE00001683725 2
+ENST00000453312 ENSE00001617130 1
+ENST00000453312 ENSE00001728356 2
+ENST00000453312 ENSE00001783134 3
+ENST00000453312 ENSE00001750562 4
+ENST00000453312 ENSE00001675644 5
+ENST00000453312 ENSE00001603923 6
+ENST00000443152 ENSE00001804178 1
+ENST00000442090 ENSE00001788264 1
+ENST00000412220 ENSE00001800082 1
+ENST00000412220 ENSE00001746593 2
+ENST00000430735 ENSE00001614266 1
+ENST00000438294 ENSE00001714431 1
+ENST00000438294 ENSE00001710497 2
+ENST00000398758 ENSE00001691217 1
+ENST00000398758 ENSE00001804878 2
+ENST00000398758 ENSE00001737381 3
+ENST00000454543 ENSE00001717478 1
+ENST00000413486 ENSE00001650635 1
+ENST00000413486 ENSE00001733122 2
+ENST00000413486 ENSE00001613384 3
+ENST00000453955 ENSE00001715519 1
+ENST00000453955 ENSE00001776324 2
+ENST00000445112 ENSE00001657190 1
+ENST00000445112 ENSE00001765641 2
+ENST00000445112 ENSE00001718086 3
+ENST00000445112 ENSE00001712499 4
+ENST00000445112 ENSE00001732839 5
+ENST00000453726 ENSE00001623599 1
+ENST00000453726 ENSE00001768555 2
+ENST00000450781 ENSE00001607684 1
+ENST00000450781 ENSE00001704650 2
+ENST00000450781 ENSE00001699413 3
+ENST00000434155 ENSE00001774695 1
+ENST00000434155 ENSE00001648340 2
+ENST00000427963 ENSE00001644797 1
+ENST00000427963 ENSE00001599057 2
+ENST00000427963 ENSE00001651852 3
+ENST00000427963 ENSE00001726876 4
+ENST00000427963 ENSE00001788414 5
+ENST00000427963 ENSE00001741171 6
+ENST00000427963 ENSE00001782936 7
+ENST00000427963 ENSE00001616123 8
+ENST00000427963 ENSE00001758121 9
+ENST00000427963 ENSE00001695453 10
+ENST00000427963 ENSE00001664413 11
+ENST00000429406 ENSE00001694546 1
+ENST00000429406 ENSE00001642941 2
+ENST00000429406 ENSE00001689808 3
+ENST00000429406 ENSE00001742552 4
+ENST00000426129 ENSE00001659626 1
+ENST00000426129 ENSE00001688131 2
+ENST00000450329 ENSE00001679725 1
+ENST00000436364 ENSE00001666493 1
+ENST00000414751 ENSE00001693593 1
+ENST00000435433 ENSE00001728711 1
+ENST00000430287 ENSE00001767583 1
+ENST00000430287 ENSE00001743874 2
+ENST00000430287 ENSE00001592268 3
+ENST00000436338 ENSE00001612213 1
+ENST00000436338 ENSE00001761722 2
+ENST00000434487 ENSE00001800398 1
+ENST00000434487 ENSE00001695576 2
+ENST00000434556 ENSE00001763167 1
+ENST00000434556 ENSE00001732672 2
+ENST00000434556 ENSE00001649962 3
+ENST00000451397 ENSE00001702931 1
+ENST00000449393 ENSE00001772654 1
+ENST00000421750 ENSE00001767851 1
+ENST00000421750 ENSE00001731820 2
+ENST00000421750 ENSE00001621307 3
+ENST00000420090 ENSE00001778513 1
+ENST00000427677 ENSE00001789966 1
+ENST00000425589 ENSE00001599138 1
+ENST00000425589 ENSE00001781743 2
+ENST00000417305 ENSE00001615694 1
+ENST00000417305 ENSE00001670294 2
+ENST00000417305 ENSE00001787837 3
+ENST00000431145 ENSE00001664412 1
+ENST00000431145 ENSE00001751906 2
+ENST00000431145 ENSE00001659262 3
+ENST00000442391 ENSE00001618370 1
+ENST00000442391 ENSE00001740946 2
+ENST00000442391 ENSE00001805102 3
+ENST00000442391 ENSE00001724905 4
+ENST00000428264 ENSE00001670949 1
+ENST00000428264 ENSE00001795349 2
+ENST00000428264 ENSE00001626354 3
+ENST00000437934 ENSE00001715878 1
+ENST00000437934 ENSE00001703739 2
+ENST00000453983 ENSE00001796121 1
+ENST00000453983 ENSE00001737699 2
+ENST00000453983 ENSE00001743166 3
+ENST00000453983 ENSE00001784518 4
+ENST00000453983 ENSE00001605344 5
+ENST00000419557 ENSE00001672475 1
+ENST00000419557 ENSE00001640636 2
+ENST00000419557 ENSE00001758131 3
+ENST00000419557 ENSE00001710040 4
+ENST00000419557 ENSE00001790139 5
+ENST00000451909 ENSE00001706607 1
+ENST00000437571 ENSE00001779202 1
+ENST00000433995 ENSE00001632989 1
+ENST00000416803 ENSE00001734013 1
+ENST00000416803 ENSE00001598660 2
+ENST00000438677 ENSE00001744860 1
+ENST00000438677 ENSE00001765620 2
+ENST00000420675 ENSE00001794752 1
+ENST00000420675 ENSE00001781093 2
+ENST00000420675 ENSE00001614820 3
+ENST00000430517 ENSE00001746702 1
+ENST00000430517 ENSE00001794515 2
+ENST00000430517 ENSE00001629993 3
+ENST00000430517 ENSE00001649124 4
+ENST00000430517 ENSE00001774864 5
+ENST00000426950 ENSE00001494279 1
+ENST00000426950 ENSE00003638561 2
+ENST00000426950 ENSE00001734097 3
+ENST00000426950 ENSE00001697845 4
+ENST00000426950 ENSE00001703875 5
+ENST00000426950 ENSE00001897275 6
+ENST00000383008 ENSE00003476280 1
+ENST00000383008 ENSE00003638561 2
+ENST00000383008 ENSE00001734097 3
+ENST00000383008 ENSE00001697845 4
+ENST00000383008 ENSE00003578244 5
+ENST00000383008 ENSE00001711997 6
+ENST00000466036 ENSE00003579134 1
+ENST00000466036 ENSE00003581570 2
+ENST00000466036 ENSE00001855646 3
+ENST00000466036 ENSE00001810355 4
+ENST00000482082 ENSE00001825143 1
+ENST00000482082 ENSE00003677158 2
+ENST00000482082 ENSE00001840824 3
+ENST00000417124 ENSE00001661126 1
+ENST00000417124 ENSE00001647030 2
+ENST00000457658 ENSE00001717777 1
+ENST00000457658 ENSE00001661171 2
+ENST00000457658 ENSE00001804864 3
+ENST00000457658 ENSE00001769405 4
+ENST00000440408 ENSE00001717777 1
+ENST00000440408 ENSE00001661171 2
+ENST00000440408 ENSE00001804864 3
+ENST00000440408 ENSE00001538271 4
+ENST00000417071 ENSE00001677769 1
+ENST00000417071 ENSE00001661171 2
+ENST00000417071 ENSE00001804864 3
+ENST00000417071 ENSE00001720726 4
+ENST00000543097 ENSE00002231970 1
+ENST00000543097 ENSE00002219117 2
+ENST00000447588 ENSE00001660519 1
+ENST00000412870 ENSE00001721076 1
+ENST00000412870 ENSE00001690255 2
+ENST00000412870 ENSE00001766707 3
+ENST00000412870 ENSE00001768739 4
+ENST00000412870 ENSE00001787917 5
+ENST00000412870 ENSE00001595199 6
+ENST00000412870 ENSE00001629383 7
+ENST00000412870 ENSE00001762780 8
+ENST00000440624 ENSE00001621226 1
+ENST00000429463 ENSE00001789186 1
+ENST00000429463 ENSE00001771580 2
+ENST00000429463 ENSE00001633534 3
+ENST00000429463 ENSE00001599361 4
+ENST00000429463 ENSE00001743476 5
+ENST00000439309 ENSE00001600092 1
+ENST00000454315 ENSE00001680177 1
+ENST00000452257 ENSE00001758106 1
+ENST00000447471 ENSE00001797129 1
+ENST00000447471 ENSE00001724851 2
+ENST00000447471 ENSE00001706454 3
+ENST00000447471 ENSE00001717052 4
+ENST00000447471 ENSE00001779101 5
+ENST00000447471 ENSE00001603762 6
+ENST00000382707 ENSE00001727355 1
+ENST00000382707 ENSE00001718460 2
+ENST00000382707 ENSE00001661257 3
+ENST00000382707 ENSE00003587206 4
+ENST00000382707 ENSE00003656490 5
+ENST00000382707 ENSE00003571694 6
+ENST00000382707 ENSE00003569851 7
+ENST00000382707 ENSE00003633827 8
+ENST00000382707 ENSE00003550118 9
+ENST00000382707 ENSE00003663549 10
+ENST00000382707 ENSE00003674760 11
+ENST00000382707 ENSE00003536697 12
+ENST00000361046 ENSE00001670247 1
+ENST00000361046 ENSE00001718460 2
+ENST00000361046 ENSE00001661257 3
+ENST00000361046 ENSE00003581140 4
+ENST00000361046 ENSE00003460783 5
+ENST00000361046 ENSE00003639215 6
+ENST00000361046 ENSE00003489167 7
+ENST00000361046 ENSE00003511329 8
+ENST00000361046 ENSE00003525217 9
+ENST00000361046 ENSE00003545208 10
+ENST00000361046 ENSE00003621655 11
+ENST00000303902 ENSE00001600029 1
+ENST00000303902 ENSE00001661257 2
+ENST00000303902 ENSE00003587206 3
+ENST00000303902 ENSE00003656490 4
+ENST00000303902 ENSE00003571694 5
+ENST00000303902 ENSE00003569851 6
+ENST00000303902 ENSE00003633827 7
+ENST00000303902 ENSE00003550118 8
+ENST00000303902 ENSE00003663549 9
+ENST00000303902 ENSE00003674760 10
+ENST00000303902 ENSE00001632888 11
+ENST00000439108 ENSE00001795478 1
+ENST00000439108 ENSE00003461810 2
+ENST00000439108 ENSE00003638069 3
+ENST00000439108 ENSE00003462284 4
+ENST00000439108 ENSE00003673811 5
+ENST00000439108 ENSE00003599248 6
+ENST00000439108 ENSE00003569851 7
+ENST00000439108 ENSE00003633827 8
+ENST00000439108 ENSE00003550118 9
+ENST00000439108 ENSE00003663549 10
+ENST00000439108 ENSE00003674760 11
+ENST00000439108 ENSE00001632888 12
+ENST00000431768 ENSE00001635427 1
+ENST00000418461 ENSE00001789043 1
+ENST00000418461 ENSE00001629179 2
+ENST00000421058 ENSE00001774484 1
+ENST00000421058 ENSE00001712382 2
+ENST00000421058 ENSE00001710358 3
+ENST00000455855 ENSE00001661949 1
+ENST00000421008 ENSE00001688800 1
+ENST00000421008 ENSE00001679843 2
+ENST00000421008 ENSE00001627555 3
+ENST00000421008 ENSE00001792441 4
+ENST00000455527 ENSE00003710715 1
+ENST00000455527 ENSE00001628915 2
+ENST00000455527 ENSE00001624117 3
+ENST00000455527 ENSE00001599548 4
+ENST00000455527 ENSE00001591968 5
+ENST00000442584 ENSE00001803550 1
+ENST00000442584 ENSE00001675697 2
+ENST00000442584 ENSE00001644958 3
+ENST00000442584 ENSE00003710715 4
+ENST00000442584 ENSE00001678529 5
+ENST00000448006 ENSE00001637728 1
+ENST00000448006 ENSE00001601067 2
+ENST00000448006 ENSE00001733722 3
+ENST00000448006 ENSE00001707869 4
+ENST00000418016 ENSE00001654072 1
+ENST00000418016 ENSE00003461311 2
+ENST00000418016 ENSE00001709675 3
+ENST00000418016 ENSE00001705207 4
+ENST00000598351 ENSE00003224946 1
+ENST00000598351 ENSE00003096789 2
+ENST00000598351 ENSE00003004848 3
+ENST00000598351 ENSE00003055029 4
+ENST00000598351 ENSE00003198125 5
+ENST00000598351 ENSE00003165650 6
+ENST00000598351 ENSE00003163421 7
+ENST00000598351 ENSE00003611037 8
+ENST00000598351 ENSE00003028414 9
+ENST00000451062 ENSE00001611400 1
+ENST00000451062 ENSE00001669863 2
+ENST00000451062 ENSE00001673478 3
+ENST00000452103 ENSE00001712586 1
+ENST00000452103 ENSE00003709639 2
+ENST00000452103 ENSE00001663750 3
+ENST00000452103 ENSE00001795860 4
+ENST00000505707 ENSE00001712586 1
+ENST00000505707 ENSE00003709639 2
+ENST00000505707 ENSE00003709660 3
+ENST00000452458 ENSE00001688314 1
+ENST00000411536 ENSE00001698575 1
+ENST00000411536 ENSE00001669923 2
+ENST00000411536 ENSE00001614971 3
+ENST00000411536 ENSE00001661020 4
+ENST00000411536 ENSE00001744230 5
+ENST00000411536 ENSE00001720034 6
+ENST00000411536 ENSE00001752268 7
+ENST00000411536 ENSE00001592390 8
+ENST00000447105 ENSE00001666495 1
+ENST00000447105 ENSE00001728287 2
+ENST00000447105 ENSE00001639523 3
+ENST00000447105 ENSE00001648850 4
+ENST00000447105 ENSE00001756419 5
+ENST00000447105 ENSE00001707777 6
+ENST00000422633 ENSE00001796289 1
+ENST00000545933 ENSE00002291273 1
+ENST00000430729 ENSE00001772277 1
+ENST00000430729 ENSE00001708810 2
+ENST00000430729 ENSE00001618426 3
+ENST00000439103 ENSE00001635775 1
+ENST00000439103 ENSE00001597436 2
+ENST00000446299 ENSE00001731360 1
+ENST00000446299 ENSE00001688179 2
+ENST00000446299 ENSE00001696656 3
+ENST00000446299 ENSE00001742686 4
+ENST00000446299 ENSE00001695575 5
+ENST00000446299 ENSE00001748239 6
+ENST00000446299 ENSE00001686372 7
+ENST00000446299 ENSE00001757378 8
+ENST00000446299 ENSE00001738135 9
+ENST00000446299 ENSE00001662296 10
+ENST00000446299 ENSE00001781418 11
+ENST00000446299 ENSE00001735320 12
+ENST00000446299 ENSE00001802546 13
+ENST00000439653 ENSE00001607267 1
+ENST00000439653 ENSE00001735320 2
+ENST00000439653 ENSE00001677719 3
+ENST00000439653 ENSE00001719566 4
+ENST00000445200 ENSE00001779056 1
+ENST00000445200 ENSE00001754002 2
+ENST00000445200 ENSE00001765437 3
+ENST00000445200 ENSE00001739366 4
+ENST00000411983 ENSE00001795992 1
+ENST00000434709 ENSE00001739581 1
+ENST00000434709 ENSE00001645033 2
+ENST00000420149 ENSE00001700611 1
+ENST00000420149 ENSE00001798741 2
+ENST00000420149 ENSE00001724601 3
+ENST00000420149 ENSE00001655881 4
+ENST00000458328 ENSE00001742351 1
+ENST00000458328 ENSE00001606802 2
+ENST00000458328 ENSE00001592567 3
+ENST00000452645 ENSE00001620082 1
+ENST00000452645 ENSE00001793879 2
+ENST00000452645 ENSE00001621504 3
+ENST00000439217 ENSE00001741426 1
+ENST00000439217 ENSE00001721819 2
+ENST00000415230 ENSE00001760736 1
+ENST00000420889 ENSE00001706919 1
+ENST00000420889 ENSE00001668409 2
+ENST00000420889 ENSE00001658969 3
+ENST00000420889 ENSE00001806118 4
+ENST00000420889 ENSE00001766229 5
+ENST00000420889 ENSE00001720050 6
+ENST00000420889 ENSE00001705255 7
+ENST00000420889 ENSE00001599768 8
+ENST00000418455 ENSE00001723842 1
+ENST00000418455 ENSE00001600158 2
+ENST00000418455 ENSE00001782991 3
+ENST00000418455 ENSE00001648973 4
+ENST00000421819 ENSE00001803460 1
+ENST00000421819 ENSE00001656819 2
+ENST00000421819 ENSE00001682459 3
+ENST00000421819 ENSE00001653328 4
+ENST00000421819 ENSE00001804144 5
+ENST00000421819 ENSE00001709330 6
+ENST00000420610 ENSE00001598977 1
+ENST00000420610 ENSE00001609880 2
+ENST00000420610 ENSE00001671213 3
+ENST00000420610 ENSE00001674255 4
+ENST00000416843 ENSE00001623621 1
+ENST00000425318 ENSE00001799526 1
+ENST00000425318 ENSE00001753337 2
+ENST00000425318 ENSE00001695538 3
+ENST00000431853 ENSE00001794473 1
+ENST00000425857 ENSE00001695426 1
+ENST00000425857 ENSE00001695477 2
+ENST00000445264 ENSE00001643114 1
+ENST00000445264 ENSE00001795389 2
+ENST00000445264 ENSE00001629300 3
+ENST00000445264 ENSE00001702866 4
+ENST00000445264 ENSE00001660124 5
+ENST00000445264 ENSE00001619869 6
+ENST00000428342 ENSE00001716125 1
+ENST00000454995 ENSE00001626278 1
+ENST00000428845 ENSE00001758513 1
+ENST00000428845 ENSE00003678107 2
+ENST00000428845 ENSE00001645840 3
+ENST00000428845 ENSE00003600117 4
+ENST00000428845 ENSE00003511199 5
+ENST00000428845 ENSE00001900367 6
+ENST00000444056 ENSE00003531660 1
+ENST00000444056 ENSE00003678107 2
+ENST00000444056 ENSE00001645840 3
+ENST00000444056 ENSE00003600117 4
+ENST00000444056 ENSE00001716327 5
+ENST00000444056 ENSE00001604245 6
+ENST00000489397 ENSE00003623643 1
+ENST00000489397 ENSE00003481640 2
+ENST00000489397 ENSE00001885335 3
+ENST00000489397 ENSE00001840919 4
+ENST00000495839 ENSE00001860458 1
+ENST00000495839 ENSE00003665834 2
+ENST00000495839 ENSE00001947561 3
+ENST00000429039 ENSE00001665114 1
+ENST00000429039 ENSE00001608967 2
+ENST00000429039 ENSE00003678107 3
+ENST00000429039 ENSE00001645840 4
+ENST00000429039 ENSE00003550132 5
+ENST00000429039 ENSE00003665834 6
+ENST00000429039 ENSE00001604245 7
+ENST00000432046 ENSE00001799927 1
+ENST00000421279 ENSE00001728219 1
+ENST00000421279 ENSE00001660382 2
+ENST00000421279 ENSE00001592812 3
+ENST00000421279 ENSE00001628878 4
+ENST00000421279 ENSE00001744875 5
+ENST00000438550 ENSE00001601315 1
+ENST00000438550 ENSE00001626390 2
+ENST00000436067 ENSE00001803290 1
+ENST00000428616 ENSE00001772292 1
+ENST00000442113 ENSE00001707613 1
+ENST00000442113 ENSE00001679661 2
+ENST00000444014 ENSE00001746820 1
+ENST00000444014 ENSE00001781854 2
+ENST00000444014 ENSE00001650660 3
+ENST00000444014 ENSE00001695376 4
+ENST00000444014 ENSE00001765364 5
+ENST00000421675 ENSE00001648735 1
+ENST00000421675 ENSE00001646873 2
+ENST00000421675 ENSE00001718144 3
+ENST00000421675 ENSE00001762583 4
+ENST00000429799 ENSE00001727148 1
+ENST00000429799 ENSE00001611218 2
+ENST00000429799 ENSE00001798859 3
+ENST00000429799 ENSE00001631408 4
+ENST00000429799 ENSE00001610747 5
+ENST00000429799 ENSE00001695935 6
+ENST00000429799 ENSE00001751980 7
+ENST00000429799 ENSE00001795242 8
+ENST00000420376 ENSE00002293458 1
+ENST00000457163 ENSE00001661837 1
+ENST00000457163 ENSE00001780404 2
+ENST00000457163 ENSE00001761447 3
+ENST00000457163 ENSE00001796958 4
+ENST00000457163 ENSE00001609132 5
+ENST00000457163 ENSE00001624832 6
+ENST00000417910 ENSE00001791982 1
+ENST00000417910 ENSE00001595474 2
+ENST00000417910 ENSE00001710293 3
+ENST00000417910 ENSE00001792666 4
+ENST00000419158 ENSE00001799370 1
+ENST00000419158 ENSE00001733114 2
+ENST00000419158 ENSE00001657618 3
+ENST00000419158 ENSE00001783522 4
+ENST00000419158 ENSE00001698815 5
+ENST00000419158 ENSE00001706895 6
+ENST00000419158 ENSE00001789274 7
+ENST00000419158 ENSE00001654967 8
+ENST00000419158 ENSE00001692849 9
+ENST00000419158 ENSE00001679112 10
+ENST00000419158 ENSE00001694527 11
+ENST00000419158 ENSE00001595474 12
+ENST00000419158 ENSE00001769353 13
+ENST00000434481 ENSE00001731828 1
+ENST00000434481 ENSE00001720641 2
+ENST00000434481 ENSE00001658227 3
+ENST00000434481 ENSE00001797496 4
+ENST00000434481 ENSE00001739891 5
+ENST00000434481 ENSE00001634813 6
+ENST00000413466 ENSE00001793150 1
+ENST00000413466 ENSE00001702273 2
+ENST00000451467 ENSE00001639020 1
+ENST00000451467 ENSE00003553084 2
+ENST00000451467 ENSE00001663564 3
+ENST00000430307 ENSE00001791643 1
+ENST00000451162 ENSE00001802023 1
+ENST00000451162 ENSE00001693799 2
+ENST00000451162 ENSE00001692050 3
+ENST00000451162 ENSE00001799280 4
+ENST00000451162 ENSE00001673716 5
+ENST00000451162 ENSE00001673218 6
+ENST00000418213 ENSE00001735464 1
+ENST00000418213 ENSE00001733578 2
+ENST00000418213 ENSE00001790322 3
+ENST00000418213 ENSE00001797275 4
+ENST00000418213 ENSE00001737717 5
+ENST00000418213 ENSE00001693294 6
+ENST00000418213 ENSE00001707543 7
+ENST00000418213 ENSE00001761870 8
+ENST00000418213 ENSE00001624501 9
+ENST00000418213 ENSE00001635783 10
+ENST00000418213 ENSE00001736833 11
+ENST00000418578 ENSE00001766910 1
+ENST00000425026 ENSE00001794575 1
+ENST00000435696 ENSE00001739223 1
+ENST00000435696 ENSE00001657411 2
+ENST00000440468 ENSE00001652940 1
+ENST00000440468 ENSE00001752327 2
+ENST00000429883 ENSE00001634021 1
+ENST00000429883 ENSE00001709795 2
+ENST00000421353 ENSE00001734007 1
+ENST00000421353 ENSE00001620336 2
+ENST00000452432 ENSE00001679045 1
+ENST00000452432 ENSE00001662816 2
+ENST00000454281 ENSE00001772499 1
+ENST00000431631 ENSE00001762435 1
+ENST00000431631 ENSE00001777650 2
+ENST00000442145 ENSE00001788545 1
+ENST00000442145 ENSE00001744623 2
+ENST00000442145 ENSE00001769532 3
+ENST00000442145 ENSE00001679232 4
+ENST00000450658 ENSE00001651045 1
+ENST00000450658 ENSE00003512039 2
+ENST00000450658 ENSE00001780836 3
+ENST00000450658 ENSE00001616192 4
+ENST00000411668 ENSE00001618439 1
+ENST00000411668 ENSE00001613903 2
+ENST00000433481 ENSE00001697534 1
+ENST00000433481 ENSE00001718987 2
+ENST00000433481 ENSE00001618280 3
+ENST00000433481 ENSE00001681425 4
+ENST00000433481 ENSE00001658411 5
+ENST00000435945 ENSE00001797328 1
+ENST00000435945 ENSE00001638296 2
+ENST00000435945 ENSE00001681574 3
+ENST00000435945 ENSE00001741452 4
+ENST00000435945 ENSE00001725096 5
+ENST00000435945 ENSE00001670663 6
+ENST00000435945 ENSE00001752207 7
+ENST00000435945 ENSE00001687652 8
+ENST00000435945 ENSE00001747631 9
+ENST00000435945 ENSE00001702742 10
+ENST00000435945 ENSE00001647502 11
+ENST00000435945 ENSE00001744948 12
+ENST00000435945 ENSE00001663098 13
+ENST00000426526 ENSE00001745034 1
+ENST00000426526 ENSE00001592977 2
+ENST00000426526 ENSE00001710578 3
+ENST00000426526 ENSE00001639465 4
+ENST00000426526 ENSE00001714793 5
+ENST00000422174 ENSE00001767839 1
+ENST00000422174 ENSE00001716848 2
+ENST00000423438 ENSE00001766151 1
+ENST00000457961 ENSE00001747238 1
+ENST00000457961 ENSE00001683875 2
+ENST00000457961 ENSE00001737985 3
+ENST00000457961 ENSE00001745774 4
+ENST00000457961 ENSE00001733068 5
+ENST00000440215 ENSE00001290990 1
+ENST00000440215 ENSE00001663813 2
+ENST00000440215 ENSE00003569983 3
+ENST00000440215 ENSE00003622104 4
+ENST00000440215 ENSE00003482640 5
+ENST00000440215 ENSE00003512313 6
+ENST00000446779 ENSE00001696304 1
+ENST00000446779 ENSE00001597607 2
+ENST00000446779 ENSE00001663813 3
+ENST00000446779 ENSE00003468775 4
+ENST00000446779 ENSE00003536218 5
+ENST00000446779 ENSE00003492729 6
+ENST00000446779 ENSE00003690238 7
+ENST00000454643 ENSE00001761519 1
+ENST00000454643 ENSE00001788634 2
+ENST00000454643 ENSE00001681234 3
+ENST00000454643 ENSE00001682885 4
+ENST00000454643 ENSE00001673336 5
+ENST00000454643 ENSE00001695625 6
+ENST00000454643 ENSE00001609688 7
+ENST00000454643 ENSE00001655456 8
+ENST00000454643 ENSE00001692470 9
+ENST00000454643 ENSE00001762760 10
+ENST00000454643 ENSE00001659235 11
+ENST00000454868 ENSE00001761647 1
+ENST00000454868 ENSE00001593456 2
+ENST00000454868 ENSE00001801031 3
+ENST00000454868 ENSE00001625027 4
+ENST00000424401 ENSE00001685429 1
+ENST00000424401 ENSE00001591638 2
+ENST00000424401 ENSE00001627602 3
+ENST00000424401 ENSE00001792305 4
+ENST00000424401 ENSE00001757037 5
+ENST00000424401 ENSE00001684075 6
+ENST00000435012 ENSE00001723160 1
+ENST00000435012 ENSE00001611031 2
+ENST00000435012 ENSE00001786310 3
+ENST00000435012 ENSE00001774959 4
+ENST00000435012 ENSE00001686516 5
+ENST00000435012 ENSE00001719847 6
+ENST00000435012 ENSE00001637326 7
+ENST00000435012 ENSE00001667900 8
+ENST00000435012 ENSE00001663539 9
+ENST00000435012 ENSE00001684864 10
+ENST00000442607 ENSE00001801245 1
+ENST00000442607 ENSE00001673942 2
+ENST00000442607 ENSE00001616659 3
+ENST00000442607 ENSE00001720897 4
+ENST00000442607 ENSE00001765257 5
+ENST00000442607 ENSE00001690000 6
+ENST00000452889 ENSE00001716389 1
+ENST00000452889 ENSE00001712560 2
+ENST00000452889 ENSE00001616878 3
+ENST00000419538 ENSE00001763631 1
+ENST00000419538 ENSE00001727939 2
+ENST00000398377 ENSE00003523445 1
+ENST00000398377 ENSE00003487606 2
+ENST00000398377 ENSE00003671977 3
+ENST00000398377 ENSE00003601792 4
+ENST00000398377 ENSE00003665075 5
+ENST00000398377 ENSE00003544401 6
+ENST00000398377 ENSE00003690828 7
+ENST00000398377 ENSE00003689528 8
+ENST00000398377 ENSE00003469618 9
+ENST00000516761 ENSE00002089038 1
+ENST00000439586 ENSE00001747437 1
+ENST00000439586 ENSE00001676711 2
+ENST00000447585 ENSE00001663038 1
+ENST00000447585 ENSE00001745094 2
+ENST00000447585 ENSE00001639312 3
+ENST00000447585 ENSE00001691121 4
+ENST00000447585 ENSE00001731541 5
+ENST00000447585 ENSE00001801724 6
+ENST00000447585 ENSE00001637732 7
+ENST00000447585 ENSE00001620479 8
+ENST00000447585 ENSE00001596214 9
+ENST00000447585 ENSE00001632411 10
+ENST00000447585 ENSE00001598287 11
+ENST00000447585 ENSE00003465401 12
+ENST00000447585 ENSE00003603784 13
+ENST00000447585 ENSE00001607039 14
+ENST00000447585 ENSE00001725327 15
+ENST00000447585 ENSE00001768509 16
+ENST00000306641 ENSE00003484855 1
+ENST00000306641 ENSE00003584721 2
+ENST00000432862 ENSE00001805078 1
+ENST00000425158 ENSE00001651627 1
+ENST00000472227 ENSE00001897973 1
+ENST00000472227 ENSE00003707476 2
+ENST00000472227 ENSE00001833379 3
+ENST00000472227 ENSE00001943337 4
+ENST00000472227 ENSE00001869013 5
+ENST00000472227 ENSE00001819899 6
+ENST00000472227 ENSE00001871038 7
+ENST00000472227 ENSE00001853509 8
+ENST00000472227 ENSE00001855011 9
+ENST00000472227 ENSE00001869938 10
+ENST00000472227 ENSE00001858305 11
+ENST00000430079 ENSE00001665626 1
+ENST00000430079 ENSE00001700678 2
+ENST00000430079 ENSE00003705410 3
+ENST00000430079 ENSE00003707476 4
+ENST00000430079 ENSE00001708493 5
+ENST00000430079 ENSE00001755301 6
+ENST00000460561 ENSE00001834814 1
+ENST00000460561 ENSE00003705410 2
+ENST00000460561 ENSE00003707476 3
+ENST00000460561 ENSE00001810485 4
+ENST00000460561 ENSE00001921850 5
+ENST00000460561 ENSE00001942280 6
+ENST00000460561 ENSE00001957709 7
+ENST00000451061 ENSE00001593220 1
+ENST00000451061 ENSE00001799029 2
+ENST00000451061 ENSE00001736209 3
+ENST00000451061 ENSE00001597754 4
+ENST00000451061 ENSE00001645084 5
+ENST00000451061 ENSE00001630093 6
+ENST00000451061 ENSE00001777109 7
+ENST00000451061 ENSE00001788874 8
+ENST00000451061 ENSE00001795527 9
+ENST00000451061 ENSE00001784531 10
+ENST00000451061 ENSE00001748450 11
+ENST00000451061 ENSE00001604424 12
+ENST00000451061 ENSE00001711949 13
+ENST00000451061 ENSE00001747846 14
+ENST00000451061 ENSE00001683758 15
+ENST00000451061 ENSE00001729014 16
+ENST00000451061 ENSE00003703880 17
+ENST00000451061 ENSE00003706916 18
+ENST00000451061 ENSE00001642045 19
+ENST00000451061 ENSE00001673682 20
+ENST00000451061 ENSE00001602445 21
+ENST00000432335 ENSE00001773752 1
+ENST00000432335 ENSE00003703880 2
+ENST00000432335 ENSE00003709828 3
+ENST00000432335 ENSE00003706916 4
+ENST00000432335 ENSE00003684405 5
+ENST00000538268 ENSE00002305110 1
+ENST00000538268 ENSE00003709828 2
+ENST00000538268 ENSE00003584068 3
+ENST00000538268 ENSE00003459679 4
+ENST00000358944 ENSE00001731860 1
+ENST00000358944 ENSE00001635505 2
+ENST00000358944 ENSE00001802644 3
+ENST00000358944 ENSE00003593123 4
+ENST00000358944 ENSE00003642844 5
+ENST00000358944 ENSE00003666042 6
+ENST00000358944 ENSE00003674429 7
+ENST00000358944 ENSE00003502244 8
+ENST00000358944 ENSE00003616658 9
+ENST00000358944 ENSE00003603776 10
+ENST00000358944 ENSE00003548786 11
+ENST00000382659 ENSE00001683783 1
+ENST00000382659 ENSE00001635505 2
+ENST00000382659 ENSE00001802644 3
+ENST00000382659 ENSE00001619375 4
+ENST00000382659 ENSE00003468556 5
+ENST00000382659 ENSE00003528506 6
+ENST00000382659 ENSE00003480174 7
+ENST00000382659 ENSE00003577506 8
+ENST00000382659 ENSE00003482521 9
+ENST00000382659 ENSE00003559850 10
+ENST00000382659 ENSE00003547647 11
+ENST00000382659 ENSE00003613569 12
+ENST00000382673 ENSE00001794358 1
+ENST00000382673 ENSE00001635505 2
+ENST00000382673 ENSE00001802644 3
+ENST00000382673 ENSE00001619375 4
+ENST00000382673 ENSE00003468556 5
+ENST00000382673 ENSE00003528506 6
+ENST00000382673 ENSE00003480174 7
+ENST00000382673 ENSE00003577506 8
+ENST00000382673 ENSE00003482521 9
+ENST00000382673 ENSE00003559850 10
+ENST00000382673 ENSE00003547647 11
+ENST00000382673 ENSE00003465760 12
+ENST00000382658 ENSE00002228703 1
+ENST00000382658 ENSE00001635505 2
+ENST00000382658 ENSE00001802644 3
+ENST00000382658 ENSE00001619375 4
+ENST00000382658 ENSE00003468556 5
+ENST00000382658 ENSE00003528506 6
+ENST00000382658 ENSE00003480174 7
+ENST00000382658 ENSE00003577506 8
+ENST00000382658 ENSE00003482521 9
+ENST00000382658 ENSE00003547647 10
+ENST00000382658 ENSE00002309932 11
+ENST00000456659 ENSE00001643289 1
+ENST00000456659 ENSE00001615364 2
+ENST00000456659 ENSE00001713521 3
+ENST00000456659 ENSE00001786811 4
+ENST00000456659 ENSE00001765379 5
+ENST00000485099 ENSE00001908333 1
+ENST00000441091 ENSE00001666674 1
+ENST00000441091 ENSE00001796018 2
+ENST00000441091 ENSE00001784489 3
+ENST00000383020 ENSE00001881074 1
+ENST00000383020 ENSE00003564737 2
+ENST00000383020 ENSE00003500350 3
+ENST00000383020 ENSE00001799218 4
+ENST00000383020 ENSE00003552000 5
+ENST00000383020 ENSE00003596670 6
+ENST00000383020 ENSE00003464309 7
+ENST00000383020 ENSE00003633721 8
+ENST00000383020 ENSE00003674912 9
+ENST00000383020 ENSE00003567291 10
+ENST00000383020 ENSE00003679647 11
+ENST00000383020 ENSE00003617149 12
+ENST00000382639 ENSE00001661847 1
+ENST00000382639 ENSE00003564737 2
+ENST00000382639 ENSE00003500350 3
+ENST00000382639 ENSE00003675456 4
+ENST00000382639 ENSE00003499970 5
+ENST00000382639 ENSE00003586486 6
+ENST00000382639 ENSE00003643888 7
+ENST00000382639 ENSE00003501027 8
+ENST00000382639 ENSE00003669209 9
+ENST00000382639 ENSE00003536838 10
+ENST00000382639 ENSE00003662366 11
+ENST00000437993 ENSE00001659203 1
+ENST00000437993 ENSE00001709097 2
+ENST00000437993 ENSE00001731355 3
+ENST00000303922 ENSE00001752741 1
+ENST00000303922 ENSE00001595873 2
+ENST00000303922 ENSE00002550304 3
+ENST00000303922 ENSE00001190571 4
+ENST00000420346 ENSE00001792479 1
+ENST00000420346 ENSE00002550304 2
+ENST00000420346 ENSE00001683633 3
+ENST00000420346 ENSE00001618747 4
+ENST00000420346 ENSE00001598516 5
+ENST00000420346 ENSE00001800192 6
+ENST00000420346 ENSE00001753847 7
+ENST00000420346 ENSE00001642884 8
+ENST00000420346 ENSE00001598101 9
+ENST00000450910 ENSE00001629733 1
+ENST00000450910 ENSE00001690028 2
+ENST00000450910 ENSE00001791094 3
+ENST00000450910 ENSE00001700224 4
+ENST00000488394 ENSE00001859492 1
+ENST00000422002 ENSE00001686036 1
+ENST00000306853 ENSE00002094213 1
+ENST00000306853 ENSE00002123033 2
+ENST00000451913 ENSE00001780671 1
+ENST00000382653 ENSE00001751822 1
+ENST00000382653 ENSE00001612685 2
+ENST00000382653 ENSE00001763210 3
+ENST00000382653 ENSE00003654373 4
+ENST00000382653 ENSE00003581142 5
+ENST00000382653 ENSE00003526911 6
+ENST00000382653 ENSE00003486158 7
+ENST00000382653 ENSE00003580424 8
+ENST00000382653 ENSE00003605671 9
+ENST00000382653 ENSE00003601536 10
+ENST00000382653 ENSE00003624241 11
+ENST00000382680 ENSE00001636167 1
+ENST00000382680 ENSE00001612685 2
+ENST00000382680 ENSE00001763210 3
+ENST00000382680 ENSE00001665354 4
+ENST00000382680 ENSE00003494048 5
+ENST00000382680 ENSE00003647885 6
+ENST00000382680 ENSE00003646572 7
+ENST00000382680 ENSE00003489479 8
+ENST00000382680 ENSE00003670003 9
+ENST00000382680 ENSE00003588956 10
+ENST00000382680 ENSE00003484557 11
+ENST00000382680 ENSE00003612403 12
+ENST00000382677 ENSE00001744906 1
+ENST00000382677 ENSE00001763210 2
+ENST00000382677 ENSE00001665354 3
+ENST00000382677 ENSE00003494048 4
+ENST00000382677 ENSE00003647885 5
+ENST00000382677 ENSE00003646572 6
+ENST00000382677 ENSE00003489479 7
+ENST00000382677 ENSE00003670003 8
+ENST00000382677 ENSE00003588956 9
+ENST00000382677 ENSE00003484557 10
+ENST00000382677 ENSE00003612403 11
+ENST00000418956 ENSE00002215963 1
+ENST00000418956 ENSE00001612685 2
+ENST00000418956 ENSE00001763210 3
+ENST00000418956 ENSE00001665354 4
+ENST00000418956 ENSE00003494048 5
+ENST00000418956 ENSE00003647885 6
+ENST00000418956 ENSE00003646572 7
+ENST00000418956 ENSE00003489479 8
+ENST00000418956 ENSE00003670003 9
+ENST00000418956 ENSE00003484557 10
+ENST00000418956 ENSE00002316782 11
+ENST00000442362 ENSE00001592980 1
+ENST00000442362 ENSE00001733820 2
+ENST00000535771 ENSE00002317106 1
+ENST00000535771 ENSE00002237968 2
+ENST00000514804 ENSE00002019278 1
+ENST00000509776 ENSE00002070232 1
+ENST00000509776 ENSE00002079738 2
+ENST00000509776 ENSE00002059695 3
+ENST00000504503 ENSE00002051610 1
+ENST00000504503 ENSE00002083133 2
+ENST00000504503 ENSE00002028839 3
+ENST00000509650 ENSE00002069932 1
+ENST00000509650 ENSE00002063868 2
+ENST00000511770 ENSE00002057796 1
+ENST00000511770 ENSE00002048409 2
+ENST00000511770 ENSE00002073771 3
+ENST00000511770 ENSE00002045000 4
+ENST00000503144 ENSE00002068415 1
+ENST00000503144 ENSE00002086946 2
+ENST00000503144 ENSE00002062200 3
+ENST00000503144 ENSE00002049899 4
+ENST00000442790 ENSE00002077940 1
+ENST00000442790 ENSE00001694957 2
+ENST00000442790 ENSE00003675853 3
+ENST00000510392 ENSE00002079014 1
+ENST00000510392 ENSE00002054858 2
+ENST00000354494 ENSE00001426978 1
+ENST00000354494 ENSE00001427537 2
+ENST00000513521 ENSE00002086052 1
+ENST00000513521 ENSE00002030784 2
+ENST00000513521 ENSE00002046435 3
+ENST00000506069 ENSE00002021560 1
+ENST00000506069 ENSE00003683409 2
+ENST00000506069 ENSE00002051027 3
+ENST00000510613 ENSE00002079391 1
+ENST00000510613 ENSE00002055245 2
+ENST00000510613 ENSE00002063141 3
+ENST00000510613 ENSE00002077751 4
+ENST00000510613 ENSE00002066035 5
+ENST00000509611 ENSE00002036613 1
+ENST00000509611 ENSE00002083771 2
+ENST00000509611 ENSE00002040343 3
+ENST00000509611 ENSE00002082814 4
+ENST00000509611 ENSE00002059714 5
+ENST00000515896 ENSE00002088173 1
+ENST00000515957 ENSE00002088234 1
+ENST00000515987 ENSE00002088264 1
+ENST00000516032 ENSE00002088309 1
+ENST00000516070 ENSE00002088347 1
+ENST00000516108 ENSE00002088385 1
+ENST00000516116 ENSE00002088393 1
+ENST00000516144 ENSE00002088421 1
+ENST00000516157 ENSE00002088434 1
+ENST00000516161 ENSE00002088438 1
+ENST00000516187 ENSE00002088464 1
+ENST00000516203 ENSE00002088480 1
+ENST00000516229 ENSE00002088506 1
+ENST00000516250 ENSE00002088527 1
+ENST00000516346 ENSE00002088623 1
+ENST00000516357 ENSE00002088634 1
+ENST00000516364 ENSE00002088641 1
+ENST00000516400 ENSE00002088677 1
+ENST00000516480 ENSE00002088757 1
+ENST00000516506 ENSE00002088783 1
+ENST00000516514 ENSE00002088791 1
+ENST00000516617 ENSE00002088894 1
+ENST00000516630 ENSE00002088907 1
+ENST00000516659 ENSE00002088936 1
+ENST00000516662 ENSE00002088939 1
+ENST00000516663 ENSE00002088940 1
+ENST00000516704 ENSE00002088981 1
+ENST00000516777 ENSE00002089054 1
+ENST00000516816 ENSE00002089093 1
+ENST00000516824 ENSE00002089101 1
+ENST00000516855 ENSE00002089132 1
+ENST00000516858 ENSE00002089135 1
+ENST00000516872 ENSE00002089149 1
+ENST00000516880 ENSE00002089157 1
+ENST00000516885 ENSE00002089162 1
+ENST00000516957 ENSE00002089234 1
+ENST00000517046 ENSE00002089323 1
+ENST00000517091 ENSE00002089368 1
+ENST00000517139 ENSE00002089416 1
+ENST00000527562 ENSE00002198122 1
+ENST00000527562 ENSE00002174318 2
+ENST00000527562 ENSE00002153935 3
+ENST00000555130 ENSE00002532814 1
+ENST00000557448 ENSE00002532407 1
+ENST00000451548 ENSE00001685809 1
+ENST00000451548 ENSE00001706012 2
+ENST00000451548 ENSE00001701882 3
+ENST00000451548 ENSE00003684179 4
+ENST00000451548 ENSE00003459001 5
+ENST00000451548 ENSE00003593931 6
+ENST00000423647 ENSE00002289488 1
+ENST00000423647 ENSE00002299255 2
+ENST00000423647 ENSE00001706012 3
+ENST00000423647 ENSE00001701882 4
+ENST00000423647 ENSE00003684179 5
+ENST00000423647 ENSE00003459001 6
+ENST00000423647 ENSE00003593931 7
+ENST00000553347 ENSE00002495525 1
+ENST00000557360 ENSE00002516975 1
+ENST00000558356 ENSE00002541336 1
+ENST00000558356 ENSE00002560807 2
+ENST00000566193 ENSE00002607126 1
+ENST00000584011 ENSE00002699100 1
+ENST00000580394 ENSE00002716056 1
+ENST00000584045 ENSE00002692672 1
+ENST00000578366 ENSE00002709597 1
+ENST00000586015 ENSE00002923801 1
+ENST00000601700 ENSE00002984896 1
+ENST00000601700 ENSE00003022139 2
+ENST00000601700 ENSE00003135789 3
+ENST00000599485 ENSE00003159256 1
+ENST00000599485 ENSE00003053921 2
+ENST00000595988 ENSE00003090195 1
+ENST00000595988 ENSE00003068265 2
+ENST00000595988 ENSE00003133967 3
+ENST00000595988 ENSE00003158951 4
+ENST00000595988 ENSE00003052018 5
+ENST00000598545 ENSE00003105168 1
+ENST00000598545 ENSE00002994181 2
+ENST00000598545 ENSE00003035759 3
+ENST00000601705 ENSE00003089533 1
+ENST00000601705 ENSE00003100810 2
+ENST00000448518 ENSE00001782073 1
+ENST00000448518 ENSE00001705189 2
+ENST00000448518 ENSE00001641329 3
+ENST00000448518 ENSE00001803456 4
+ENST00000448518 ENSE00001739988 5
+ENST00000448518 ENSE00001717366 6
+ENST00000448518 ENSE00001761223 7
+ENST00000605584 ENSE00003671391 1
+ENST00000605584 ENSE00003551235 2
+ENST00000603738 ENSE00003639402 1
+ENST00000603738 ENSE00003668884 2
+ENST00000604436 ENSE00003649651 1
+ENST00000603467 ENSE00003691087 1
+ENST00000604924 ENSE00003621115 1
+ENST00000604370 ENSE00003657297 1
+ENST00000605663 ENSE00003625313 1
+ENST00000604178 ENSE00003638565 1
+ENST00000604178 ENSE00003555471 2
+ENST00000604289 ENSE00003625865 1
+ENST00000606439 ENSE00003697854 1
diff --git a/inst/doc/MySQL-backend.R b/inst/doc/MySQL-backend.R
new file mode 100644
index 0000000..df94d72
--- /dev/null
+++ b/inst/doc/MySQL-backend.R
@@ -0,0 +1,30 @@
+## ----eval=FALSE----------------------------------------------------------
+# library(ensembldb)
+# ## Load the EnsDb package that should be installed on the MySQL server
+# library(EnsDb.Hsapiens.v75)
+#
+# ## Call the useMySQL method, providing the credentials required to create
+# ## databases and insert data on the MySQL server
+# edb_mysql <- useMySQL(EnsDb.Hsapiens.v75, host = "localhost", user = "userwrite",
+# pass = "userpass")
+#
+# ## Use this EnsDb object
+# genes(edb_mysql)
+
+## ----eval=FALSE----------------------------------------------------------
+# library(ensembldb)
+# library(RMySQL)
+#
+# ## Connect to the MySQL database to list the databases.
+# dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+# pass = "readonly")
+#
+# ## List the available databases
+# listEnsDbs(dbcon)
+#
+# ## Connect to one of the databases and use that one.
+# dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+# pass = "readonly", dbname = "ensdb_hsapiens_v75")
+# edb <- EnsDb(dbcon)
+# edb
+
diff --git a/inst/doc/MySQL-backend.Rmd b/inst/doc/MySQL-backend.Rmd
new file mode 100644
index 0000000..0acd514
--- /dev/null
+++ b/inst/doc/MySQL-backend.Rmd
@@ -0,0 +1,74 @@
+---
+title: "Using a MySQL server backend"
+graphics: yes
+output:
+ BiocStyle::html_document2
+vignette: >
+ %\VignetteIndexEntry{Using a MySQL server backend}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+ %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
+ %\VignettePackage{ensembldb}
+ %\VignetteKeywords{annotation,database}
+---
+
+**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
+**Authors**: `r packageDescription("ensembldb")$Author`<br />
+**Modified**: 20 September, 2016<br />
+**Compiled**: `r date()`
+
+# Introduction
+
+Like other annotation packages in Bioconductor, `ensembldb` uses a SQLite
+database backend by default, i.e. annotations are retrieved from file-based SQLite
+databases that are provided *via* packages, such as the `EnsDb.Hsapiens.v75`
+package. In addition, `ensembldb` allows switching the backend from SQLite to
+MySQL and thus retrieving annotations from a MySQL server instead. Such a setup
+might be useful for a lab running a well-configured MySQL server, since EnsDb
+databases would then need to be installed only on the database server and not on
+the individual clients.
+
+**Note**: the code in this document is not executed during vignette generation, as
+this would require access to a MySQL server.
+
+# Using `ensembldb` with a MySQL server
+
+Installation of `EnsDb` databases in a MySQL server is straightforward, provided
+that the user has write access to the server:
+
+```{r eval=FALSE}
+library(ensembldb)
+## Load the EnsDb package that should be installed on the MySQL server
+library(EnsDb.Hsapiens.v75)
+
+## Call the useMySQL method, providing the credentials required to create
+## databases and insert data on the MySQL server
+edb_mysql <- useMySQL(EnsDb.Hsapiens.v75, host = "localhost", user = "userwrite",
+ pass = "userpass")
+
+## Use this EnsDb object
+genes(edb_mysql)
+```
+
+To use an `EnsDb` in a MySQL server without installing the corresponding
+R package, pass the connection to the database to the `EnsDb` constructor
+function. With the resulting `EnsDb` object, annotations can be retrieved from the
+MySQL database.
+
+```{r eval=FALSE}
+library(ensembldb)
+library(RMySQL)
+
+## Connect to the MySQL database to list the databases.
+dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly")
+
+## List the available databases
+listEnsDbs(dbcon)
+
+## Connect to one of the databases and use that one.
+dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly", dbname = "ensdb_hsapiens_v75")
+edb <- EnsDb(dbcon)
+edb
+```
diff --git a/inst/doc/MySQL-backend.html b/inst/doc/MySQL-backend.html
new file mode 100644
index 0000000..0d6d27b
--- /dev/null
+++ b/inst/doc/MySQL-backend.html
@@ -0,0 +1,209 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+
+
+<title>Using a MySQL server backend</title>
+
+<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4zIHwgKGMpIDIwMDUsIDIwMTUgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
+<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjUgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE1IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgdGhlIE1JVCBsaWNlbnNlCiAqLwppZigidW5kZWZpbmVkIj09dHlwZW9mIGpRdWVyeSl0aHJvdyBuZXcgRXJyb3IoIkJvb3RzdHJhcCdzIEphdmFTY3JpcHQgcmVxdWlyZXMgalF1ZXJ5Iik7K2Z1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0Ijt2YXIgYj1hLmZuLmpxdWVyeS5zcGxpdCgiICIpWzBdLnNwbGl0KCIuIik7aWYoYlswXTwyJiZiWzFdPDl8fDE9PWJbMF0mJjk9PWJbMV0mJmJbMl08MSl0aHJvdy [...]
+<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
+<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgovLyBPbmx5IHJ1biB0aGlzIGNvZGUgaW4gSUUgOAppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG [...]
+
+<style type="text/css">code{white-space: pre;}</style>
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
+<style type="text/css">
+
+</style>
+<script type="text/javascript">
+if (window.hljs && document.readyState && document.readyState === "complete") {
+ window.setTimeout(function() {
+ hljs.initHighlighting();
+ }, 0);
+}
+</script>
+
+
+
+<style type="text/css">
+h1 {
+ font-size: 34px;
+}
+h1.title {
+ font-size: 38px;
+}
+h2 {
+ font-size: 30px;
+}
+h3 {
+ font-size: 24px;
+}
+h4 {
+ font-size: 18px;
+}
+h5 {
+ font-size: 16px;
+}
+h6 {
+ font-size: 12px;
+}
+.table th:not([align]) {
+ text-align: left;
+}
+</style>
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Amax%2Dwidth%3A%201054px%3B%0Amargin%3A%200px%20auto%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
+
+</head>
+
+<body>
+
+<style type="text/css">
+.main-container {
+ max-width: 768px;
+ margin-left: auto;
+ margin-right: auto;
+}
+
+img {
+ max-width:100%;
+ height: auto;
+}
+.tabbed-pane {
+ padding-top: 12px;
+}
+button.code-folding-btn:focus {
+ outline: none;
+}
+</style>
+
+
+
+<div class="container-fluid main-container">
+
+<!-- tabsets -->
+<script src="data:application/x-javascript;base64,Cgp3aW5kb3cuYnVpbGRUYWJzZXRzID0gZnVuY3Rpb24odG9jSUQpIHsKCiAgLy8gYnVpbGQgYSB0YWJzZXQgZnJvbSBhIHNlY3Rpb24gZGl2IHdpdGggdGhlIC50YWJzZXQgY2xhc3MKICBmdW5jdGlvbiBidWlsZFRhYnNldCh0YWJzZXQpIHsKCiAgICAvLyBjaGVjayBmb3IgZmFkZSBhbmQgcGlsbHMgb3B0aW9ucwogICAgdmFyIGZhZGUgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1mYWRlIik7CiAgICB2YXIgcGlsbHMgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1waWxscyIpOwogICAgdmFyIG5hdkNsYXNzID0gcGlsbHMgPyAibmF2LXBpbGxzIiA6ICJuYXYtdGFicyI7CgogIC [...]
+<script>
+$(document).ready(function () {
+ window.buildTabsets("TOC");
+});
+</script>
+
+<!-- code folding -->
+
+
+
+
+
+
+<div class="fluid-row" id="header">
+
+
+
+<h1 class="title toc-ignore">Using a MySQL server backend</h1>
+
+</div>
+
+<h1>Contents</h1>
+<div id="TOC">
+<ul>
+<li><a href="#introduction"><span class="toc-section-number">1</span> Introduction</a></li>
+<li><a href="#using-ensembldb-with-a-mysql-server"><span class="toc-section-number">2</span> Using <code>ensembldb</code> with a MySQL server</a></li>
+</ul>
+</div>
+
+<p><strong>Package</strong>: <em><a href="http://bioconductor.org/packages/ensembldb">ensembldb</a></em><br /> <strong>Authors</strong>: Johannes Rainer <a href="mailto:johannes.rainer at eurac.edu">johannes.rainer at eurac.edu</a>, Tim Triche <a href="mailto:tim.triche at usc.edu">tim.triche at usc.edu</a><br /> <strong>Modified</strong>: 20 September, 2016<br /> <strong>Compiled</strong>: Wed Nov 16 19:52:05 2016</p>
+<div id="introduction" class="section level1">
+<h1><span class="header-section-number">1</span> Introduction</h1>
+<p><code>ensembldb</code> uses by default, similar to other annotation packages in Bioconductor, a SQLite database backend, i.e. annotations are retrieved from file-based SQLite databases that are provided <em>via</em> packages, such as the <code>EnsDb.Hsapiens.v75</code> package. In addition, <code>ensembldb</code> allows to switch the backend from SQLite to MySQL and thus to retrieve annotations from a MySQL server instead. Such a setup might be useful for a lab running a well-configur [...]
+<p><strong>Note</strong>: the code in this document is not executed during vignette generation, as this would require access to a MySQL server.</p>
+</div>
+<div id="using-ensembldb-with-a-mysql-server" class="section level1">
+<h1><span class="header-section-number">2</span> Using <code>ensembldb</code> with a MySQL server</h1>
+<p>Installation of <code>EnsDb</code> databases in a MySQL server is straightforward, provided that the user has write access to the server:</p>
+<pre class="r"><code>library(ensembldb)
+## Load the EnsDb package that should be installed on the MySQL server
+library(EnsDb.Hsapiens.v75)
+
+## Call the useMySQL method, providing the credentials required to create
+## databases and insert data on the MySQL server
+edb_mysql <- useMySQL(EnsDb.Hsapiens.v75, host = "localhost", user = "userwrite",
+ pass = "userpass")
+
+## Use this EnsDb object
+genes(edb_mysql)</code></pre>
+<p>To use an <code>EnsDb</code> in a MySQL server without installing the corresponding R package, pass the connection to the database to the <code>EnsDb</code> constructor function. With the resulting <code>EnsDb</code> object, annotations can be retrieved from the MySQL database.</p>
+<pre class="r"><code>library(ensembldb)
+library(RMySQL)
+
+## Connect to the MySQL database to list the databases.
+dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly")
+
+## List the available databases
+listEnsDbs(dbcon)
+
+## Connect to one of the databases and use that one.
+dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly", dbname = "ensdb_hsapiens_v75")
+edb <- EnsDb(dbcon)
+edb</code></pre>
+</div>
+
+
+
+
+</div>
+
+<script>
+
+// add bootstrap table styles to pandoc tables
+$(document).ready(function () {
+ $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
+});
+
+
+</script>
+
+<script type="text/x-mathjax-config">
+ MathJax.Hub.Config({
+ TeX: {
+ TagSide: "right",
+ equationNumbers: {
+ autoNumber: "AMS"
+ }
+ },
+ "HTML-CSS": {
+ styles: {
+ ".MathJax_Display": {
+ "text-align": "center",
+ padding: "0px 150px 0px 65px",
+ margin: "0px 0px 0.5em"
+ },
+ }
+ }
+ });
+</script>
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+ (function () {
+ var script = document.createElement("script");
+ script.type = "text/javascript";
+ script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+ document.getElementsByTagName("head")[0].appendChild(script);
+ })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/ensembldb.R b/inst/doc/ensembldb.R
new file mode 100644
index 0000000..2efa4be
--- /dev/null
+++ b/inst/doc/ensembldb.R
@@ -0,0 +1,386 @@
+## ----warning=FALSE, message=FALSE----------------------------------------
+library(EnsDb.Hsapiens.v75)
+
+## Create a shortcut to the database object
+edb <- EnsDb.Hsapiens.v75
+## Print some information about this package
+edb
+
+## for what organism was the database generated?
+organism(edb)
+
+## ------------------------------------------------------------------------
+Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
+
+Tx
+
+## as this is a GRanges object we can access e.g. the start coordinates with
+head(start(Tx))
+
+## or extract the biotype with
+head(Tx$tx_biotype)
+
+## ------------------------------------------------------------------------
+## list all database tables along with their columns
+listTables(edb)
+
+## list columns from a specific table
+listColumns(edb, "tx")
+
+## ------------------------------------------------------------------------
+Tx <- transcripts(edb,
+ columns = c(listColumns(edb, "tx"), "gene_name"),
+ filter = TxbiotypeFilter("nonsense_mediated_decay"),
+ return.type = "DataFrame")
+nrow(Tx)
+Tx
+
+## ------------------------------------------------------------------------
+yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+yCds
+
+## ------------------------------------------------------------------------
+## Define the filter
+grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+ strand = "+"), condition = "overlapping")
+
+## Query genes:
+gn <- genes(edb, filter = grf)
+gn
+
+## Next we retrieve all transcripts for that gene so that we can plot them.
+txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+
+## ----tx-for-zbtb16, message=FALSE, fig.align='center', fig.width=7.5, fig.height=5----
+plot(3, 3, pch = NA, xlim = c(start(gn), end(gn)), ylim = c(0, length(txs)),
+ yaxt = "n", ylab = "")
+## Highlight the GRangesFilter region
+rect(xleft = start(grf), xright = end(grf), ybottom = 0, ytop = length(txs),
+ col = "red", border = "red")
+for(i in 1:length(txs)) {
+ current <- txs[i]
+ rect(xleft = start(current), xright = end(current), ybottom = i-0.975,
+ ytop = i-0.125, border = "grey")
+ text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
+}
+
+## ------------------------------------------------------------------------
+transcripts(edb, filter = grf)
+
+## ------------------------------------------------------------------------
+## Get all gene biotypes from the database. The GenebiotypeFilter
+## allows to filter on these values.
+listGenebiotypes(edb)
+
+## Get all transcript biotypes from the database.
+listTxbiotypes(edb)
+
+## ------------------------------------------------------------------------
+## We're going to fetch all genes whose names start with BCL. To this end
+## we define a GenenameFilter with partial matching, i.e. condition "like"
+## and a % for any character/string.
+BCLs <- genes(edb,
+ columns = c("gene_name", "entrezid", "gene_biotype"),
+ filter = list(GenenameFilter("BCL%", condition = "like")),
+ return.type = "DataFrame")
+nrow(BCLs)
+BCLs
+
+## ------------------------------------------------------------------------
+## determine the average length of snRNA, snoRNA and rRNA transcripts encoded on
+## chromosomes X and Y.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+ SeqnameFilter(c("X", "Y")))))
+
+## determine the average length of protein coding transcripts encoded on the same
+## chromosomes.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter("protein_coding"),
+ SeqnameFilter(c("X", "Y")))))
+
+## ------------------------------------------------------------------------
+## Extract all exons 1 and (if present) 2 for all genes encoded on the
+## Y chromosome
+exons(edb, columns = c("tx_id", "exon_idx"),
+ filter = list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition = "<")))
+
+## ------------------------------------------------------------------------
+TxByGns <- transcriptsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c("X", "Y")))
+ )
+TxByGns
+
+## ----eval=FALSE----------------------------------------------------------
+# ## will just get exons for all genes on chromosomes 1 to 22, X and Y.
+# ## Note: want to get rid of the "LRG" genes!!!
+# EnsGenes <- exonsBy(edb, by = "gene",
+# filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+# GeneidFilter("ENSG%", "like")))
+
+## ----eval=FALSE----------------------------------------------------------
+# ## Transforming the GRangesList into a data.frame in SAF format
+# EnsGenes.SAF <- toSAF(EnsGenes)
+
+## ----eval=FALSE----------------------------------------------------------
+# ## Create a GRanges of non-overlapping exon parts.
+# DJE <- disjointExons(edb,
+# filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+# GeneidFilter("ENSG%", "like")))
+
+## ----eval=FALSE----------------------------------------------------------
+# library(EnsDb.Hsapiens.v75)
+# library(Rsamtools)
+# edb <- EnsDb.Hsapiens.v75
+#
+# ## Get the FaFile with the genomic sequence matching the Ensembl version
+# ## using the AnnotationHub package.
+# Dna <- getGenomeFaFile(edb)
+#
+# ## Get start/end coordinates of all genes.
+# genes <- genes(edb)
+# ## Subset to all genes that are encoded on chromosomes for which
+# ## we do have DNA sequence available.
+# genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
+#
+# ## Get the gene sequences, i.e. the sequence including the sequence of
+# ## all of the gene's exons and introns.
+# geneSeqs <- getSeq(Dna, genes)
+
+## ----eval=FALSE----------------------------------------------------------
+# ## get all exons of all transcripts encoded on chromosome Y
+# yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+#
+# ## Retrieve the sequences for these transcripts from the FaFile.
+# library(GenomicFeatures)
+# yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
+# yTxSeqs
+#
+# ## Extract the sequences of all transcripts encoded on chromosome Y.
+# yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+#
+# ## Along these lines, we could use the method also to retrieve the coding sequence
+# ## of all transcripts on the Y chromosome.
+# cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+# extractTranscriptSeqs(Dna, cdsY)
+
+## ----message=FALSE-------------------------------------------------------
+## Change the seqlevels style from Ensembl (default) to UCSC:
+seqlevelsStyle(edb) <- "UCSC"
+
+## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
+genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## The seqlevels of the returned GRanges are also in UCSC style
+seqlevels(genesY)
+
+## ------------------------------------------------------------------------
+seqlevelsStyle(edb) <- "UCSC"
+
+## Getting the default option:
+getOption("ensembldb.seqnameNotFound")
+
+## Listing all seqlevels in the database.
+seqlevels(edb)[1:30]
+
+## Setting the option to NA, thus, for each seqname for which no mapping is available,
+## NA is returned.
+options(ensembldb.seqnameNotFound=NA)
+seqlevels(edb)[1:30]
+
+## Resetting the option.
+options(ensembldb.seqnameNotFound = "ORIGINAL")
+
+## ----warning=FALSE, message=FALSE----------------------------------------
+library(BSgenome.Hsapiens.UCSC.hg19)
+bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+## Get the genome version
+unique(genome(bsg))
+unique(genome(edb))
+## Although differently named, both represent genome build GRCh37.
+
+## Extract the full transcript sequences.
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+
+yTxSeqs
+
+## Extract just the CDS
+Test <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, cdsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+yTxCds
+
+## ------------------------------------------------------------------------
+seqlevelsStyle(edb) <- "Ensembl"
+
+## ----gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25----
+## Loading the Gviz library
+library(Gviz)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Retrieving a Gviz compatible GRanges object with all genes
+## encoded on chromosome Y.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "Y",
+ start = 20400000, end = 21400000)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+
+## We have to change the ucscChromosomeNames option to FALSE to enable Gviz usage
+## with non-UCSC chromosome names.
+options(ucscChromosomeNames = FALSE)
+
+plotTracks(list(gat, GeneRegionTrack(gr)))
+
+options(ucscChromosomeNames = TRUE)
+
+## ----message=FALSE-------------------------------------------------------
+seqlevelsStyle(edb) <- "UCSC"
+## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000)
+seqnames(gr)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+plotTracks(list(gat, GeneRegionTrack(gr)))
+
+## ----gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25----
+protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("protein_coding"))
+lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("lincRNA"))
+
+plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
+ GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
+
+## Finally, we change the seqlevels style of the EnsDb back to Ensembl
+seqlevelsStyle(edb) <- "Ensembl"
+
+## ------------------------------------------------------------------------
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## List all available columns in the database.
+columns(edb)
+
+## Note that these do *not* correspond to the actual column names
+## of the database that can be passed to methods like exons, genes,
+## transcripts etc. These column names can be listed with the listColumns
+## method.
+listColumns(edb)
+
+## List all of the supported key types.
+keytypes(edb)
+
+## Get all gene ids from the database.
+gids <- keys(edb, keytype = "GENEID")
+length(gids)
+
+## Get all gene names for genes encoded on chromosome Y.
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+head(gnames)
+
+## ----warning=FALSE-------------------------------------------------------
+## Use the /standard/ way to fetch data.
+select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+
+## Use the filtering system of ensembldb
+select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+
+## ------------------------------------------------------------------------
+## Use the default method, which just returns the first value for multi mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
+
+## Alternatively, specify multiVals="list" to return all mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
+ multiVals = "list")
+
+## And, just like before, we can use filters to map only to protein coding transcripts.
+mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column = "TXID",
+ multiVals = "list")
+
+## ----eval=FALSE----------------------------------------------------------
+# library(ensembldb)
+#
+# ## get all human gene/transcript/exon annotations from Ensembl (75)
+# ## the resulting tables will be stored by default to the current working
+# ## directory
+# fetchTablesFromEnsembl(75, species = "human")
+#
+# ## These tables can then be processed to generate a SQLite database
+# ## containing the annotations (again, the function assumes the required
+# ## txt files to be present in the current working directory)
+# DBFile <- makeEnsemblSQLiteFromTables()
+#
+# ## and finally we can generate the package
+# makeEnsembldbPackage(ensdb = DBFile, version = "0.99.12",
+# maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+# author = "J Rainer")
+
+## ----eval=FALSE----------------------------------------------------------
+# ## Load the AnnotationHub data.
+# library(AnnotationHub)
+# ah <- AnnotationHub()
+#
+# ## Query all available files for Ensembl release 77 for
+# ## Mus musculus.
+# query(ah, c("Mus musculus", "release-77"))
+#
+# ## Get the resource for the gtf file with the gene/transcript definitions.
+# Gtf <- ah["AH28822"]
+# ## Create a EnsDb database file from this.
+# DbFile <- ensDbFromAH(Gtf)
+# ## We can either generate a database package, or directly load the data
+# edb <- EnsDb(DbFile)
+#
+#
+# ## Identify and get the FaFile object with the genomic DNA sequence matching
+# ## the EnsDb annotation.
+# Dna <- getGenomeFaFile(edb)
+# library(Rsamtools)
+# ## We next retrieve the sequence of all exons on chromosome Y.
+# exons <- exons(edb, filter = SeqnameFilter("Y"))
+# exonSeq <- getSeq(Dna, exons)
+#
+# ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
+# Dna <- ah[["AH22042"]]
+
+## ----message=FALSE-------------------------------------------------------
+## Generate a sqlite database from a GRanges object specifying
+## genes encoded on chromosome Y
+load(system.file("YGRanges.RData", package = "ensembldb"))
+Y
+
+DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+ organism = "Homo_sapiens")
+
+edb <- EnsDb(DB)
+edb
+
+## As shown in the example below, we could make an EnsDb package on
+## this DB object using the makeEnsembldbPackage function.
+
+## ----eval=FALSE----------------------------------------------------------
+# library(ensembldb)
+#
+# ## the GTF file can be downloaded from
+# ## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
+# gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
+# ## generate the SQLite database file
+# DB <- ensDbFromGtf(gtf = gtffile)
+#
+# ## load the DB file directly
+# EDB <- EnsDb(DB)
+#
+# ## alternatively, build the annotation package
+# ## and finally we can generate the package
+# makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
+# maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+# author = "J Rainer")
+
diff --git a/inst/doc/ensembldb.Rmd b/inst/doc/ensembldb.Rmd
new file mode 100644
index 0000000..44420d6
--- /dev/null
+++ b/inst/doc/ensembldb.Rmd
@@ -0,0 +1,920 @@
+---
+title: "Generating and using Ensembl based annotation packages"
+graphics: yes
+output:
+ BiocStyle::html_document2
+vignette: >
+  %\VignetteIndexEntry{Generating and using Ensembl based annotation packages}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+ %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,Gviz,BiocStyle}
+ %\VignettePackage{ensembldb}
+ %\VignetteKeywords{annotation,database}
+---
+
+**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
+**Authors**: `r packageDescription("ensembldb")$Author`<br />
+**Modified**: 12 September, 2016<br />
+**Compiled**: `r date()`
+
+# Introduction
+
+The `ensembldb` package provides functions to create and use transcript centric
+annotation databases/packages. The annotations for the databases are directly
+fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data are
+similar to those of the `TxDb` packages from the `GenomicFeatures` package, but,
+in addition to retrieving all gene/transcript models and annotations from the
+database, the `ensembldb` package also provides a filter framework that allows
+retrieving annotations for specific entries like genes encoded on a chromosome
+region or transcript models of lincRNA genes. In the databases, along with the
+gene and transcript models and their chromosomal coordinates, additional
+annotations including the gene name (symbol) and NCBI Entrezgene identifiers as
+well as the gene and transcript biotypes are stored (see Section
+[11](#orgtarget1) for the database layout and an overview of available
+attributes/columns).
+
+Another main goal of this package is to generate *versioned* annotation
+packages, i.e. annotation packages that are built for a specific Ensembl
+release, and are also named accordingly (e.g. `EnsDb.Hsapiens.v75` for
+human gene definitions of the Ensembl core database version 75). This ensures
+reproducibility, as annotations from a specific Ensembl release can be loaded
+even if newer versions of annotation packages/releases are available. It
+also allows loading multiple annotation packages at the same time in order to
+e.g. compare gene models between Ensembl releases.
+
+In the example below we load an Ensembl based annotation package for Homo
+sapiens, Ensembl version 75. The connection to the database is bound to the
+variable `EnsDb.Hsapiens.v75`.
+
+```{r warning=FALSE, message=FALSE}
+library(EnsDb.Hsapiens.v75)
+
+## Making a "short cut"
+edb <- EnsDb.Hsapiens.v75
+## print some information for this package
+edb
+
+## for what organism was the database generated?
+organism(edb)
+```
+
+# Using `ensembldb` annotation packages to retrieve specific annotations
+
+The `ensembldb` package provides a set of filter objects allowing to specify
+which entries should be fetched from the database. The complete list of filters,
+which can be used individually or can be combined, is shown below (in
+alphabetical order):
+
+- `ExonidFilter`: allows to filter the result based on the (Ensembl) exon
+ identifiers.
+- `ExonrankFilter`: filter results on the rank (index) of an exon within the
+ transcript model. Exons are always numbered from 5' to 3' end of the
+ transcript, thus, also on the reverse strand, the exon 1 is the most 5' exon
+ of the transcript.
+- `EntrezidFilter`: allows to filter results based on NCBI Entrezgene
+ identifiers of the genes.
+- `GenebiotypeFilter`: allows to filter for the gene biotypes defined in the
+ Ensembl database; use the `listGenebiotypes` method to list all available
+ biotypes.
+- `GeneidFilter`: allows to filter based on the Ensembl gene IDs.
+- `GenenameFilter`: allows to filter based on the names (symbols) of the genes.
+- `SymbolFilter`: allows to filter on gene symbols; note that no database columns
+ *symbol* is available in an `EnsDb` database and hence the gene name is used for
+ filtering.
+- `GRangesFilter`: allows to retrieve all features (genes, transcripts or exons)
+ that are either within (setting `condition` to "within") or partially
+ overlapping (setting `condition` to "overlapping") the defined genomic
+ region/range. Note that, depending on the called method (`genes`, `transcripts`
+ or `exons`) the start and end coordinates of either the genes, transcripts or
+ exons are used for the filter. For methods `exonsBy`, `cdsBy` and `txBy` the
+ coordinates of `by` are used.
+- `SeqendFilter`: filter based on the chromosomal end coordinate of the exons,
+  transcripts or genes (correspondingly set `feature = "exon"`, `feature = "tx"` or
+  `feature = "gene"`).
+- `SeqnameFilter`: filter by the name of the chromosomes the genes are encoded
+ on.
+- `SeqstartFilter`: filter based on the chromosomal start coordinates of the
+  exons, transcripts or genes (correspondingly set `feature = "exon"`,
+  `feature = "tx"` or `feature = "gene"`).
+- `SeqstrandFilter`: filter for the chromosome strand on which the genes are
+ encoded.
+- `TxbiotypeFilter`: filter on the transcript biotype defined in Ensembl; use
+ the `listTxbiotypes` method to list all available biotypes.
+- `TxidFilter`: filter on the Ensembl transcript identifiers.
+
+Each of the filter classes can take a single value or a vector of values (with
+the exception of the `SeqendFilter` and `SeqstartFilter`) for comparison. In
+addition, it is possible to specify the *condition* for the filter,
+e.g. setting `condition = "="` to retrieve all entries matching the filter value,
+`condition = "!="` to negate the filter or `condition = "like"` to allow
+partial matching. The `condition` parameter for `SeqstartFilter` and
+`SeqendFilter` can take the values `=`, `>`, `>=`, `<` and `<=` (since these
+filters are based on numeric values).
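+
+As a minimal sketch of how the `condition` argument might be used (not
+evaluated here; the coordinates are arbitrary example values, and the
+`feature` argument is assumed to accept the values listed for the
+`SeqstartFilter` and `SeqendFilter` above):
+
+```{r eval=FALSE}
+## Negating a filter: all transcripts on chromosome Y that are *not* of
+## biotype protein_coding.
+transcripts(edb,
+            filter = list(SeqnameFilter("Y"),
+                          TxbiotypeFilter("protein_coding", condition = "!=")))
+
+## Numeric comparisons: all genes on chromosome Y that start at or after
+## position 20400000 and end at or before position 21400000.
+genes(edb,
+      filter = list(SeqnameFilter("Y"),
+                    SeqstartFilter(20400000, condition = ">=", feature = "gene"),
+                    SeqendFilter(21400000, condition = "<=", feature = "gene")))
+```
+
+Filters combined in a `list` all have to match for an entry to be returned.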
+
+A simple example would be to get all transcripts for the gene *BCL2L11*. To this
+end we specify a `GenenameFilter` with the value *BCL2L11*. As a result we get
+a `GRanges` object with `start`, `end`, `seqnames` and `strand` of the `GRanges`
+object being the start coordinate, end coordinate, chromosome name and strand
+for the respective transcripts. All additional annotations are available as
+metadata columns. Alternatively, by setting `return.type` to "DataFrame" or
+"data.frame", the method would return a `DataFrame` or `data.frame` object.
+
+```{r }
+Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
+
+Tx
+
+## as this is a GRanges object we can access e.g. the start coordinates with
+head(start(Tx))
+
+## or extract the biotype with
+head(Tx$tx_biotype)
+```
+
+The parameter `columns` of the `exons`, `genes` and `transcripts` method allows
+to specify which database attributes (columns) should be retrieved. The `exons`
+method returns by default all exon-related columns, the `transcripts` all columns
+from the transcript database table and the `genes` all from the gene table. Note
+however that in the example above we got also a column `gene_name` although this
+column is not present in the transcript database table. By default the methods
+return also all columns that are used by any of the filters submitted with the
+`filter` argument (thus, because a `GenenameFilter` was used, the column `gene_name`
+is also returned). Setting `returnFilterColumns(edb) <- FALSE` disables this
+option and only the columns specified by the `columns` parameter are retrieved.
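+
+The effect of this setting could be checked with a sketch like the following
+(not evaluated here; the choice of `tx_id` and `tx_biotype` as requested
+columns is only for illustration):
+
+```{r eval=FALSE}
+## With the default setting the filter column gene_name is returned too.
+transcripts(edb, columns = c("tx_id", "tx_biotype"),
+            filter = GenenameFilter("BCL2L11"),
+            return.type = "data.frame")
+
+## Disable returning of filter columns: only the requested columns remain.
+returnFilterColumns(edb) <- FALSE
+transcripts(edb, columns = c("tx_id", "tx_biotype"),
+            filter = GenenameFilter("BCL2L11"),
+            return.type = "data.frame")
+
+## Restore the default behaviour.
+returnFilterColumns(edb) <- TRUE
+```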
+
+To get an overview of database tables and available columns the function
+`listTables` can be used. The method `listColumns` on the other hand lists columns
+for the specified database table.
+
+```{r }
+## list all database tables along with their columns
+listTables(edb)
+
+## list columns from a specific table
+listColumns(edb, "tx")
+```
+
+Thus, we could retrieve all transcripts of the biotype *nonsense\_mediated\_decay*
+(which, according to the definitions by Ensembl, are transcribed but most likely
+not translated into a protein, being degraded after transcription instead) along with
+the name of the gene for each transcript. Note that we are changing here the
+`return.type` to `DataFrame`, so the method will return a `DataFrame` with the
+results instead of the default `GRanges`.
+
+```{r }
+Tx <- transcripts(edb,
+ columns = c(listColumns(edb , "tx"), "gene_name"),
+ filter = TxbiotypeFilter("nonsense_mediated_decay"),
+ return.type = "DataFrame")
+nrow(Tx)
+Tx
+```
+
+For protein coding transcripts, we can also specifically extract their coding
+region. In the example below we extract the CDS for all transcripts encoded on
+chromosome Y.
+
+```{r }
+yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+yCds
+```
+
+Using a `GRangesFilter` we can retrieve all features from the database that are
+either within or overlapping the specified genomic region. In the example
+below we query all genes that are partially overlapping with a small region on
+chromosome 11. The filter restricts to all genes for which either an exon or an
+intron is partially overlapping with the region.
+
+```{r }
+## Define the filter
+grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+ strand = "+"), condition = "overlapping")
+
+## Query genes:
+gn <- genes(edb, filter = grf)
+gn
+
+## Next we retrieve all transcripts for that gene so that we can plot them.
+txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+```
+
+```{r tx-for-zbtb16, message=FALSE, fig.align='center', fig.width=7.5, fig.height=5}
+plot(3, 3, pch = NA, xlim = c(start(gn), end(gn)), ylim = c(0, length(txs)),
+ yaxt = "n", ylab = "")
+## Highlight the GRangesFilter region
+rect(xleft = start(grf), xright = end(grf), ybottom = 0, ytop = length(txs),
+ col = "red", border = "red")
+for(i in 1:length(txs)) {
+ current <- txs[i]
+ rect(xleft = start(current), xright = end(current), ybottom = i-0.975,
+ ytop = i-0.125, border = "grey")
+ text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
+}
+```
+
+As we can see, 4 transcripts of the gene ZBTB16 are also overlapping the
+region. Below we fetch these 4 transcripts. Note, that a call to `exons` will
+not return any features from the database, as no exon is overlapping with the
+region.
+
+```{r }
+transcripts(edb, filter = grf)
+```
+
+The `GRangesFilter` also supports `GRanges` defining multiple regions, and a
+query will return all features overlapping any of these regions. Besides using
+the `GRangesFilter` it is also possible to search for transcripts or exons
+overlapping genomic regions using the `exonsByOverlaps` or
+`transcriptsByOverlaps` methods known from the `GenomicFeatures` package. Note that the
+implementation of these methods for `EnsDb` objects also supports filters
+to further fine-tune the query.
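+
+A possible sketch of both approaches is shown below (not evaluated here). The
+second region on chromosome Y is an arbitrary choice for illustration, and
+passing the filter via a `filter` argument to `transcriptsByOverlaps` is an
+assumption based on the description above.
+
+```{r eval=FALSE}
+## A GRanges with two regions: the small region on chromosome 11 used
+## above and an (arbitrarily chosen) region on chromosome Y.
+regs <- GRanges(c("11", "Y"),
+                ranges = IRanges(start = c(114000000, 20400000),
+                                 end = c(114000050, 21400000)))
+
+## All transcripts overlapping any of the two regions.
+transcripts(edb, filter = GRangesFilter(regs, condition = "overlapping"))
+
+## The same type of query using transcriptsByOverlaps, restricted to
+## protein coding transcripts with an additional filter.
+transcriptsByOverlaps(edb, ranges = regs,
+                      filter = TxbiotypeFilter("protein_coding"))
+```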
+
+To get an overview of allowed/available gene and transcript biotype the
+functions `listGenebiotypes` and `listTxbiotypes` can be used.
+
+```{r }
+## Get all gene biotypes from the database. The GenebiotypeFilter
+## allows to filter on these values.
+listGenebiotypes(edb)
+
+## Get all transcript biotypes from the database.
+listTxbiotypes(edb)
+```
+
+Data can be fetched in an analogous way using the `exons` and `genes`
+methods. In the example below we retrieve `gene_name`, `entrezid` and the
+`gene_biotype` of all genes in the database whose names start with "BCL".
+
+```{r }
+## We're going to fetch all genes whose names start with BCL. To this end
+## we define a GenenameFilter with partial matching, i.e. condition "like"
+## and a % for any character/string.
+BCLs <- genes(edb,
+ columns = c("gene_name", "entrezid", "gene_biotype"),
+ filter = list(GenenameFilter("BCL%", condition = "like")),
+ return.type = "DataFrame")
+nrow(BCLs)
+BCLs
+```
+
+Sometimes it might be useful to know the length of genes or transcripts
+(i.e. the total sum of nucleotides covered by their exons). Below we calculate
+the mean length of transcripts from protein coding genes on chromosomes X and Y
+as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on
+these chromosomes.
+
+```{r }
+## determine the average length of snRNA, snoRNA and rRNA transcripts encoded on
+## chromosomes X and Y.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+ SeqnameFilter(c("X", "Y")))))
+
+## determine the average length of protein coding transcripts encoded on the same
+## chromosomes.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter("protein_coding"),
+ SeqnameFilter(c("X", "Y")))))
+```
+
+Not unexpectedly, transcripts of protein coding genes are longer than those of
+snRNA, snoRNA or rRNA genes.
+
+Finally, we extract the first two exons of each transcript model encoded on
+chromosome Y from the database.
+
+```{r }
+## Extract all exons 1 and (if present) 2 for all genes encoded on the
+## Y chromosome
+exons(edb, columns = c("tx_id", "exon_idx"),
+ filter = list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition = "<")))
+```
+
+# Extracting gene/transcript/exon models for RNASeq feature counting
+
+For the feature counting step of an RNAseq experiment, the gene or transcript
+models (defined by the chromosomal start and end positions of their exons) have
+to be known. To extract these from an Ensembl based annotation package, the
+`exonsBy`, `genesBy` and `transcriptsBy` methods can be used in an analogous way as in
+`TxDb` packages generated by the `GenomicFeatures` package. However, the
+`transcriptsBy` method does not, in contrast to the method in the `GenomicFeatures`
+package, allow to return transcripts by "cds". While the annotation packages
+built by the `ensembldb` contain the chromosomal start and end coordinates of
+the coding region (for protein coding genes) they do not assign an ID to each
+CDS.
+
+A simple use case is to retrieve all transcripts, grouped by gene, for the genes
+encoded on chromosomes X and Y from the database.
+
+```{r }
+TxByGns <- transcriptsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c("X", "Y")))
+ )
+TxByGns
+```
+
+Since Ensembl contains also definitions of genes that are on chromosome variants
+(supercontigs), it is advisable to specify the chromosome names for which the
+gene models should be returned.
+
+In a real use case, we might thus want to retrieve all genes encoded on the
+*standard* chromosomes. In addition it is advisable to use a `GeneidFilter` to
+restrict to Ensembl genes only, as also *LRG* (Locus Reference Genomic)
+genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup> are defined in the database, which are partially redundant with
+Ensembl genes.
+
+```{r eval=FALSE}
+## will just get exons for all genes on chromosomes 1 to 22, X and Y.
+## Note: want to get rid of the "LRG" genes!!!
+EnsGenes <- exonsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))
+```
+
+The code above returns a `GRangesList` that can be used directly as an input for
+the `summarizeOverlaps` function from the `GenomicAlignments` package <sup><a id="fnr.3" class="footref" href="#fn.3">3</a></sup>.
+
+Alternatively, the above `GRangesList` can be transformed to a `data.frame` in
+*SAF* format that can be used as an input to the `featureCounts` function of the
+`Rsubread` package <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.
+
+```{r eval=FALSE}
+## Transforming the GRangesList into a data.frame in SAF format
+EnsGenes.SAF <- toSAF(EnsGenes)
+```
+
+Note that the ID by which the `GRangesList` is split is used in the SAF
+formatted `data.frame` as the `GeneID`. In the example above this would be the
+Ensembl gene IDs, while the start, end coordinates (along with the strand and
+chromosomes) are those of the exons.
+
+In addition, the `disjointExons` function (similar to the one defined in
+`GenomicFeatures`) can be used to generate a `GRanges` of non-overlapping exon
+parts which can be used in the `DEXSeq` package.
+
+```{r eval=FALSE}
+## Create a GRanges of non-overlapping exon parts.
+DJE <- disjointExons(edb,
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))
+```
+
+# Retrieving sequences for gene/transcript/exon models
+
+The methods to retrieve exons, transcripts and genes (i.e. `exons`, `transcripts`
+and `genes`) return by default `GRanges` objects that can be used to retrieve
+sequences using the `getSeq` method e.g. from BSgenome packages. The basic
+workflow is thus identical to the one for `TxDb` packages; however, it is not
+straightforward to identify the BSgenome package with the matching genomic
+sequence. Most BSgenome packages are named according to the genome build
+identifier used in UCSC which does not (always) match the genome build name used
+by Ensembl. Using the Ensembl version provided by the `EnsDb`, the correct genomic
+sequence can however be retrieved easily from the `AnnotationHub` using the
+`getGenomeFaFile` method. If no Fasta file matching the Ensembl version is available, the
+function tries to identify a Fasta file with the correct genome build from the
+*closest* Ensembl release and returns that instead.
+
+In the code block below we retrieve first the `FaFile` with the genomic DNA
+sequence, extract the genomic start and end coordinates for all genes defined in
+the package, subset to genes encoded on sequences available in the `FaFile` and
+extract all of their sequences. Note: these sequences represent the sequence
+between the chromosomal start and end coordinates of the gene.
+
+```{r eval=FALSE}
+library(EnsDb.Hsapiens.v75)
+library(Rsamtools)
+edb <- EnsDb.Hsapiens.v75
+
+## Get the FaFile with the genomic sequence matching the Ensembl version
+## using the AnnotationHub package.
+Dna <- getGenomeFaFile(edb)
+
+## Get start/end coordinates of all genes.
+genes <- genes(edb)
+## Subset to all genes that are encoded on chromosomes for which
+## we do have DNA sequence available.
+genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
+
+## Get the gene sequences, i.e. the sequence including the sequence of
+## all of the gene's exons and introns.
+geneSeqs <- getSeq(Dna, genes)
+```
+
+To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can
+directly use the `extractTranscriptSeqs` method defined in the `GenomicFeatures`
+package on the `EnsDb` object, optionally using a filter to restrict the query.
+
+```{r eval=FALSE}
+## get all exons of all transcripts encoded on chromosome Y
+yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+
+## Retrieve the sequences for these transcripts from the FaFile.
+library(GenomicFeatures)
+yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
+yTxSeqs
+
+## Extract the sequences of all transcripts encoded on chromosome Y.
+yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+
+## Along these lines, we could use the method also to retrieve the coding sequence
+## of all transcripts on the Y chromosome.
+cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+extractTranscriptSeqs(Dna, cdsY)
+```
+
+Note: in the next section we describe how transcript sequences can be retrieved
+from a `BSgenome` package that is based on UCSC, not Ensembl.
+
+# Integrating annotations from Ensembl based `EnsDb` packages with UCSC based annotations
+
+Sometimes it might be useful to combine (Ensembl based) annotations from `EnsDb`
+packages/objects with annotations from other Bioconductor packages that might
+be based on UCSC annotations. To support such an integration of annotations, the
+`ensembldb` package implements the `seqlevelsStyle` and `seqlevelsStyle<-` methods from the
+`GenomeInfoDb` package that allow changing the style of chromosome naming. Thus,
+sequence/chromosome names other than those used by Ensembl can be used in, and
+are returned by, the queries to `EnsDb` objects as long as a mapping for them is
+provided by the `GenomeInfoDb` package (which provides a mapping mostly between
+UCSC, NCBI and Ensembl chromosome names for the *main* chromosomes).
+
+In the example below we change the seqnames style to UCSC.
+
+```{r message=FALSE}
+## Change the seqlevels style from Ensembl (default) to UCSC:
+seqlevelsStyle(edb) <- "UCSC"
+
+## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
+genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## The seqlevels of the returned GRanges are also in UCSC style
+seqlevels(genesY)
+```
+
+Note that in most instances no mapping is available for sequences not
+corresponding to the main chromosomes (i.e. contigs, patched chromosomes
+etc). What is returned in cases in which no mapping is available can be
+specified with the global `ensembldb.seqnameNotFound` option. By default (with
+`ensembldb.seqnameNotFound` set to "ORIGINAL"), the original seqnames (i.e. the
+ones from Ensembl) are returned. With `ensembldb.seqnameNotFound` set to "MISSING",
+an error is thrown each time a seqname cannot be found. For all other values
+(e.g. `ensembldb.seqnameNotFound = NA`) the value of the option is returned.
+
+```{r }
+seqlevelsStyle(edb) <- "UCSC"
+
+## Getting the default option:
+getOption("ensembldb.seqnameNotFound")
+
+## Listing all seqlevels in the database.
+seqlevels(edb)[1:30]
+
+## Setting the option to NA, thus, for each seqname for which no mapping is available,
+## NA is returned.
+options(ensembldb.seqnameNotFound=NA)
+seqlevels(edb)[1:30]
+
+## Resetting the option.
+options(ensembldb.seqnameNotFound = "ORIGINAL")
+```
+
+Next we retrieve transcript sequences from genes encoded on chromosome Y using
+the `BSgenome` package for the human genome from UCSC. The specified version
+`hg19` matches the genome build of Ensembl version 75, i.e. `GRCh37`. Note that
+while we changed the style of the seqnames to UCSC we did not change the naming
+of the genome release.
+
+```{r warning=FALSE, message=FALSE}
+library(BSgenome.Hsapiens.UCSC.hg19)
+bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+## Get the genome version
+unique(genome(bsg))
+unique(genome(edb))
+## Although differently named, both represent genome build GRCh37.
+
+## Extract the full transcript sequences.
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+
+yTxSeqs
+
+## Extract just the CDS
+yCds <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, yCds)
+yTxCds
+```
+
+Finally, we change the seqname style back to the default value `"Ensembl"`.
+
+```{r }
+seqlevelsStyle(edb) <- "Ensembl"
+```
+
+# Interactive annotation lookup using the `shiny` web app
+
+In addition to the `genes`, `transcripts` and `exons` methods it is possible to
+search interactively for gene/transcript/exon annotations using the internal,
+`shiny` based, web application. The application can be started with the
+`runEnsDbApp()` function. The search results from this app can also be returned
+to the R workspace either as a `data.frame` or `GRanges` object.
+
+# Plotting gene/transcript features using `ensembldb` and `Gviz`
+
+The `Gviz` package provides functions to plot genes and transcripts along with
+other data on a genomic scale. Gene models can be provided as a `data.frame`,
+`GRanges` or `TxDb` object, can be fetched from biomart, or can be retrieved
+from `ensembldb`.
+
+Below we generate a `GeneRegionTrack` fetching all transcripts from a certain
+region on chromosome Y.
+
+Note that if we also want to work with BAM files that were aligned
+against DNA sequences retrieved from Ensembl, or with FASTA files representing genomic
+DNA sequences from Ensembl, we should change the `ucscChromosomeNames` option from
+`Gviz` to `FALSE` (i.e. by calling `options(ucscChromosomeNames = FALSE)`). This is
+not necessary if we just want to retrieve gene models from an `EnsDb` object, as
+the `ensembldb` package internally checks the `ucscChromosomeNames` option and,
+depending on that, maps Ensembl chromosome names to UCSC chromosome names.
+
+```{r gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
+## Loading the Gviz library
+library(Gviz)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Retrieving a Gviz compatible GRanges object with all genes
+## encoded on chromosome Y.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "Y",
+ start = 20400000, end = 21400000)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+
+## We have to change the ucscChromosomeNames option to FALSE to enable Gviz usage
+## with non-UCSC chromosome names.
+options(ucscChromosomeNames = FALSE)
+
+plotTracks(list(gat, GeneRegionTrack(gr)))
+
+options(ucscChromosomeNames = TRUE)
+```
+
+Above we had to change the option `ucscChromosomeNames` to `FALSE` in order to
+use `Gviz` with non-UCSC chromosome names. Alternatively, we could also
+change the `seqlevelsStyle` of the `EnsDb` object to `UCSC`. Note that we now
+also have to use chromosome names in the *UCSC style* in the `SeqnameFilter`
+(i.e. `"chrY"` instead of `"Y"`).
+
+```{r message=FALSE}
+seqlevelsStyle(edb) <- "UCSC"
+## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000)
+seqnames(gr)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+plotTracks(list(gat, GeneRegionTrack(gr)))
+```
+
+We can also use the filters from the `ensembldb` package to further refine what
+transcripts are fetched, like in the example below, in which we create two
+different gene region tracks, one for protein coding genes and one for lincRNAs.
+
+```{r gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
+protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("protein_coding"))
+lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("lincRNA"))
+
+plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
+ GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
+
+## Finally, we change the seqlevels style back to Ensembl
+seqlevelsStyle(edb) <- "Ensembl"
+```
+
+# Using `EnsDb` objects in the `AnnotationDbi` framework
+
+Most of the methods defined for objects extending the basic annotation package
+class `AnnotationDb` are also defined for `EnsDb` objects (i.e. methods
+`columns`, `keytypes`, `keys`, `mapIds` and `select`). While these methods can
+be used analogously to basic annotation packages, the implementation for `EnsDb`
+objects also supports the filtering framework of the `ensembldb` package.
+
+In the example below we first list all the available columns and keytypes in
+the database and then extract the gene names for all genes encoded on chromosome
+Y.
+
+```{r }
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## List all available columns in the database.
+columns(edb)
+
+## Note that these do *not* correspond to the actual column names
+## of the database that can be passed to methods like exons, genes,
+## transcripts etc. These column names can be listed with the listColumns
+## method.
+listColumns(edb)
+
+## List all of the supported key types.
+keytypes(edb)
+
+## Get all gene ids from the database.
+gids <- keys(edb, keytype = "GENEID")
+length(gids)
+
+## Get all gene names for genes encoded on chromosome Y.
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+head(gnames)
+```
+
+In the next example we retrieve specific information from the database using the
+`select` method. First we fetch all transcripts for the genes *BCL2* and
+*BCL2L11*. In the first call we provide the gene names, while in the second call
+we employ the filtering system to perform a more fine-grained query to fetch
+only the protein coding transcripts for these genes.
+
+```{r warning=FALSE}
+## Use the standard way to fetch data.
+select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+
+## Use the filtering system of ensembldb
+select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+```
+
+Finally, we use the `mapIds` method to establish a mapping between ids and
+values. In the example below we fetch transcript ids for the two genes from the
+example above.
+
+```{r }
+## Use the default method, which just returns the first value for multi mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
+
+## Alternatively, specify multiVals="list" to return all mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
+ multiVals = "list")
+
+## And, just like before, we can use filters to map only to protein coding transcripts.
+mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column = "TXID",
+ multiVals = "list")
+```
+
+Note that, if filters are used, the ordering of the result no longer
+matches the ordering of the genes.
+
+# Important notes
+
+These notes might explain possibly unexpected results (and, more importantly,
+help to avoid them):
+
+- The ordering of the results returned by the `genes`, `exons`, `transcripts` methods
+ can be specified with the `order.by` parameter. The ordering of the results does
+ however **not** correspond to the ordering of values in submitted filter
+ objects. The exception is the `select` method. If a character vector of values
+ or a single filter is passed with argument `keys` the ordering of results of
+ this method matches the ordering of the key values or the values of the
+ filter.
+
+- Results of `exonsBy`, `transcriptsBy` are always ordered by the `by` argument.
+
+- The CDS provided by `EnsDb` objects **always** includes both the start and the
+ stop codon.
+
+- Transcripts with multiple CDS are at present not supported by `EnsDb`.
+
+- At present, `EnsDb` supports only genes/transcripts for which all of their
+ exons are encoded on the same chromosome and the same strand.
+
+# Building a transcript-centric database package based on Ensembl annotation
+
+The code in this section is not supposed to be automatically executed when the
+vignette is built, as this would require a working installation of the Ensembl
+Perl API, which is not expected to be available on each system. Also, building
+`EnsDb` from alternative sources, like GFF or GTF files, takes some time and
+thus these examples are also not executed when the vignette is built.
+
+## Requirements
+
+The `fetchTablesFromEnsembl` function of the package uses the Ensembl Perl API
+to retrieve the required annotations from an Ensembl database (e.g. from the
+main site *ensembldb.ensembl.org*). Thus, to use the functionality to build
+databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).
+
+Alternatively, the `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and `ensDbFromGtf`
+functions allow building `EnsDb` SQLite files from a `GRanges` object or GFF/GTF
+files from Ensembl (either provided as files or *via* `AnnotationHub`). These
+functions do not depend on the Ensembl Perl API, but require a working internet
+connection to fetch the chromosome lengths from Ensembl as these are not
+provided within GTF or GFF files.
+
+## Building annotation packages
+
+The functions below use the Ensembl Perl API to fetch the required data directly
+from the Ensembl core databases. Thus, the path to the Perl API specific for the
+desired Ensembl version needs to be added to the `PERL5LIB` environment variable.
+
+An annotation package containing all human genes for Ensembl version 75 can be
+created using the code in the block below.
+
+```{r eval=FALSE}
+library(ensembldb)
+
+## get all human gene/transcript/exon annotations from Ensembl (75)
+## the resulting tables will be stored by default to the current working
+## directory
+fetchTablesFromEnsembl(75, species = "human")
+
+## These tables can then be processed to generate a SQLite database
+## containing the annotations (again, the function assumes the required
+## txt files to be present in the current working directory)
+DBFile <- makeEnsemblSQLiteFromTables()
+
+## and finally we can generate the package
+makeEnsembldbPackage(ensdb = DBFile, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")
+```
+
+The generated package can then be built using `R CMD build EnsDb.Hsapiens.v75`
+and installed with `R CMD INSTALL EnsDb.Hsapiens.v75*`. Note that we could
+directly generate an `EnsDb` instance by loading the database file, i.e. by
+calling `edb <- EnsDb(DBFile)` and work with that annotation object.
+
+To fetch and build annotation packages for plant genomes (e.g. *Arabidopsis
+thaliana*), *Ensembl genomes* should be specified as the host, i.e. setting
+`host` to "mysql-eg-publicsql.ebi.ac.uk", `port` to `4157` and `species` to
+e.g. "arabidopsis thaliana".
+
+In the next example we create an `EnsDb` database using the `AnnotationHub`
+package and load also the corresponding genomic DNA sequence matching the
+Ensembl version. We thus first query the `AnnotationHub` package for all
+resources available for `Mus musculus` and the Ensembl release 77. Next we
+create the `EnsDb` object from the appropriate `AnnotationHub` resource. We
+then use the `getGenomeFaFile` method on the `EnsDb` to directly look up and
+retrieve the correct or best matching `FaFile` with the genomic DNA sequence.
+Finally, we retrieve the sequences of all exons using the `getSeq` method.
+
+```{r eval=FALSE}
+## Load the AnnotationHub data.
+library(AnnotationHub)
+ah <- AnnotationHub()
+
+## Query all available files for Ensembl release 77 for
+## Mus musculus.
+query(ah, c("Mus musculus", "release-77"))
+
+## Get the resource for the gtf file with the gene/transcript definitions.
+Gtf <- ah["AH28822"]
+## Create an EnsDb database file from this.
+DbFile <- ensDbFromAH(Gtf)
+## We can either generate a database package, or directly load the data
+edb <- EnsDb(DbFile)
+
+
+## Identify and get the FaFile object with the genomic DNA sequence matching
+## the EnsDb annotation.
+Dna <- getGenomeFaFile(edb)
+library(Rsamtools)
+## We next retrieve the sequence of all exons on chromosome Y.
+exons <- exons(edb, filter = SeqnameFilter("Y"))
+exonSeq <- getSeq(Dna, exons)
+
+## Alternatively, look up and retrieve the toplevel DNA sequence manually.
+Dna <- ah[["AH22042"]]
+```
+
+In the example below we load a `GRanges` containing gene definitions for genes
+encoded on chromosome Y and generate an EnsDb SQLite database from that
+information.
+
+```{r message=FALSE}
+## Generate a sqlite database from a GRanges object specifying
+## genes encoded on chromosome Y
+load(system.file("YGRanges.RData", package = "ensembldb"))
+Y
+
+DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+ organism = "Homo_sapiens")
+
+edb <- EnsDb(DB)
+edb
+
+## As shown in the example below, we could make an EnsDb package on
+## this DB object using the makeEnsembldbPackage function.
+```
+
+Alternatively we can build the annotation database using the `ensDbFromGtf` or
+`ensDbFromGff` functions, which extract most of the required data from a GTF
+or GFF (version 3) file, respectively, that can be downloaded from Ensembl (e.g. from
+<ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens> for human gene definitions
+from Ensembl version 75; for plant genomes etc., files can be retrieved from
+<ftp://ftp.ensemblgenomes.org>). All information except the chromosome lengths and
+the NCBI Entrezgene IDs can be extracted from these files. The functions also
+try to retrieve chromosome length information automatically from Ensembl.
+
+Below we create the annotation from a GTF file that we fetch directly from Ensembl.
+
+```{r eval=FALSE}
+library(ensembldb)
+
+## the GTF file can be downloaded from
+## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
+gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
+## generate the SQLite database file
+DB <- ensDbFromGtf(gtf = gtffile)
+
+## load the DB file directly
+EDB <- EnsDb(DB)
+
+## alternatively, we can generate the annotation package
+## from the SQLite database file
+makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")
+```
+
+# Database layout<a id="orgtarget1"></a>
+
+The database consists of the following tables and attributes (the layout is also
+shown in Figure [115](#orgparagraph1)):
+
+- **gene**: all gene specific annotations.
+ - `gene_id`: the Ensembl ID of the gene.
+ - `gene_name`: the name (symbol) of the gene.
+ - `entrezid`: the NCBI Entrezgene ID(s) of the gene. Note that this can be a
+ `;` separated list of IDs for genes that are mapped to more than one
+ Entrezgene.
+ - `gene_biotype`: the biotype of the gene.
+ - `gene_seq_start`: the start coordinate of the gene on the sequence (usually
+ a chromosome).
+ - `gene_seq_end`: the end coordinate of the gene on the sequence.
+ - `seq_name`: the name of the sequence (usually the chromosome name).
+ - `seq_strand`: the strand on which the gene is encoded.
+ - `seq_coord_system`: the coordinate system of the sequence.
+
+- **tx**: all transcript related annotations. Note that while no `tx_name` column
+ is available in this database table, all methods to retrieve data from the
+ database also support this column. The returned values are however the IDs of
+ the transcripts.
+ - `tx_id`: the Ensembl transcript ID.
+ - `tx_biotype`: the biotype of the transcript.
+ - `tx_seq_start`: the start coordinate of the transcript.
+ - `tx_seq_end`: the end coordinate of the transcript.
+ - `tx_cds_seq_start`: the start coordinate of the coding region of the
+ transcript (NULL for non-coding transcripts).
+ - `tx_cds_seq_end`: the end coordinate of the coding region of the transcript.
+ - `gene_id`: the gene to which the transcript belongs.
+
+- **exon**: all exon related annotations.
+ - `exon_id`: the Ensembl exon ID.
+ - `exon_seq_start`: the start coordinate of the exon.
+ - `exon_seq_end`: the end coordinate of the exon.
+
+- **tx2exon**: provides the n:m mapping between transcripts and exons.
+ - `tx_id`: the Ensembl transcript ID.
+ - `exon_id`: the Ensembl exon ID.
+ - `exon_idx`: the index of the exon in the corresponding transcript, always
+ from 5' to 3' of the transcript.
+
+- **chromosome**: provides some information about the chromosomes.
+ - `seq_name`: the name of the sequence/chromosome.
+ - `seq_length`: the length of the sequence.
+ - `is_circular`: whether the sequence is circular.
+
+- **information**: some additional, internal, information (Genome build, Ensembl
+ version etc).
+ - `key`
+ - `value`
+
+- *virtual* columns:
+ - `symbol`: the database does not have such a database column, but it is still
+ possible to use it in the `columns` parameter. This column is *symlinked* to the
+ `gene_name` column.
+ - `tx_name`: similar to the `symbol` column, this column is *symlinked* to the `tx_id`
+ column.
+
+![img](images/dblayout.png "Database layout.")
+
+<div id="footnotes">
+<h2 class="footnotes">Footnotes: </h2>
+<div id="text-footnotes">
+
+<div class="footdef"><sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup> <div class="footpara"><http://www.ensembl.org></div></div>
+
+<div class="footdef"><sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup> <div class="footpara"><http://www.lrg-sequence.org></div></div>
+
+<div class="footdef"><sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/23950696></div></div>
+
+<div class="footdef"><sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/24227677></div></div>
+
+<div class="footdef"><sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup> <div class="footpara"><http://www.ensembl.org/info/docs/api/api_installation.html></div></div>
+
+
+</div>
+</div>
diff --git a/inst/doc/ensembldb.html b/inst/doc/ensembldb.html
new file mode 100644
index 0000000..4b49ca3
--- /dev/null
+++ b/inst/doc/ensembldb.html
@@ -0,0 +1,1300 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+
+
+<title>Generating and using Ensembl based annotation packages</title>
+
+<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4zIHwgKGMpIDIwMDUsIDIwMTUgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
+<meta name="viewport" content="width=device-width, initial-scale=1" />
+<link href="data:text/css;charset=utf-8,html%7Bfont%2Dfamily%3Asans%2Dserif%3B%2Dwebkit%2Dtext%2Dsize%2Dadjust%3A100%25%3B%2Dms%2Dtext%2Dsize%2Dadjust%3A100%25%7Dbody%7Bmargin%3A0%7Darticle%2Caside%2Cdetails%2Cfigcaption%2Cfigure%2Cfooter%2Cheader%2Chgroup%2Cmain%2Cmenu%2Cnav%2Csection%2Csummary%7Bdisplay%3Ablock%7Daudio%2Ccanvas%2Cprogress%2Cvideo%7Bdisplay%3Ainline%2Dblock%3Bvertical%2Dalign%3Abaseline%7Daudio%3Anot%28%5Bcontrols%5D%29%7Bdisplay%3Anone%3Bheight%3A0%7D%5Bhidden%5D%2Ctem [...]
+<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjUgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE1IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgdGhlIE1JVCBsaWNlbnNlCiAqLwppZigidW5kZWZpbmVkIj09dHlwZW9mIGpRdWVyeSl0aHJvdyBuZXcgRXJyb3IoIkJvb3RzdHJhcCdzIEphdmFTY3JpcHQgcmVxdWlyZXMgalF1ZXJ5Iik7K2Z1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0Ijt2YXIgYj1hLmZuLmpxdWVyeS5zcGxpdCgiICIpWzBdLnNwbGl0KCIuIik7aWYoYlswXTwyJiZiWzFdPDl8fDE9PWJbMF0mJjk9PWJbMV0mJmJbMl08MSl0aHJvdy [...]
+<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
+<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgovLyBPbmx5IHJ1biB0aGlzIGNvZGUgaW4gSUUgOAppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG [...]
+
+<style type="text/css">code{white-space: pre;}</style>
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
+<style type="text/css">
+
+</style>
+<script type="text/javascript">
+if (window.hljs && document.readyState && document.readyState === "complete") {
+ window.setTimeout(function() {
+ hljs.initHighlighting();
+ }, 0);
+}
+</script>
+
+
+
+<style type="text/css">
+h1 {
+ font-size: 34px;
+}
+h1.title {
+ font-size: 38px;
+}
+h2 {
+ font-size: 30px;
+}
+h3 {
+ font-size: 24px;
+}
+h4 {
+ font-size: 18px;
+}
+h5 {
+ font-size: 16px;
+}
+h6 {
+ font-size: 12px;
+}
+.table th:not([align]) {
+ text-align: left;
+}
+</style>
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Amax%2Dwidth%3A%201054px%3B%0Amargin%3A%200px%20auto%3B%0A%7D%0Abody%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Afont%2Dsize%3A%2010pt%3B%0A%7D%0A%0Adiv%23TOC%20ul%20%7B%0Apadding%3A%200px%200px%200px%2045px%3B%0Alist%2Dstyle%3A%20none%3B%0Abackground%2Dimage%3A%20none%3B%0Abackground%2Drepeat%3A%20none%3B%0Abackground%2Dposition%3A%200%3B%0Afont%2Dsize%3A%2010pt%3B%0Afont%2Dfamily%3A%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B [...]
+
+</head>
+
+<body>
+
+<style type="text/css">
+.main-container {
+ max-width: 768px;
+ margin-left: auto;
+ margin-right: auto;
+}
+
+img {
+ max-width:100%;
+ height: auto;
+}
+.tabbed-pane {
+ padding-top: 12px;
+}
+button.code-folding-btn:focus {
+ outline: none;
+}
+</style>
+
+
+
+<div class="container-fluid main-container">
+
+<!-- tabsets -->
+<script src="data:application/x-javascript;base64,Cgp3aW5kb3cuYnVpbGRUYWJzZXRzID0gZnVuY3Rpb24odG9jSUQpIHsKCiAgLy8gYnVpbGQgYSB0YWJzZXQgZnJvbSBhIHNlY3Rpb24gZGl2IHdpdGggdGhlIC50YWJzZXQgY2xhc3MKICBmdW5jdGlvbiBidWlsZFRhYnNldCh0YWJzZXQpIHsKCiAgICAvLyBjaGVjayBmb3IgZmFkZSBhbmQgcGlsbHMgb3B0aW9ucwogICAgdmFyIGZhZGUgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1mYWRlIik7CiAgICB2YXIgcGlsbHMgPSB0YWJzZXQuaGFzQ2xhc3MoInRhYnNldC1waWxscyIpOwogICAgdmFyIG5hdkNsYXNzID0gcGlsbHMgPyAibmF2LXBpbGxzIiA6ICJuYXYtdGFicyI7CgogIC [...]
+<script>
+$(document).ready(function () {
+ window.buildTabsets("TOC");
+});
+</script>
+
+<!-- code folding -->
+
+
+
+
+
+
+<div class="fluid-row" id="header">
+
+
+
+<h1 class="title toc-ignore">Generating and using Ensembl based annotation packages</h1>
+
+</div>
+
+<h1>Contents</h1>
+<div id="TOC">
+<ul>
+<li><a href="#introduction"><span class="toc-section-number">1</span> Introduction</a></li>
+<li><a href="#using-ensembldb-annotation-packages-to-retrieve-specific-annotations"><span class="toc-section-number">2</span> Using <code>ensembldb</code> annotation packages to retrieve specific annotations</a></li>
+<li><a href="#extracting-genetranscriptexon-models-for-rnaseq-feature-counting"><span class="toc-section-number">3</span> Extracting gene/transcript/exon models for RNASeq feature counting</a></li>
+<li><a href="#retrieving-sequences-for-genetranscriptexon-models"><span class="toc-section-number">4</span> Retrieving sequences for gene/transcript/exon models</a></li>
+<li><a href="#integrating-annotations-from-ensembl-based-ensdb-packages-with-ucsc-based-annotations"><span class="toc-section-number">5</span> Integrating annotations from Ensembl based <code>EnsDb</code> packages with UCSC based annotations</a></li>
+<li><a href="#interactive-annotation-lookup-using-the-shiny-web-app"><span class="toc-section-number">6</span> Interactive annotation lookup using the <code>shiny</code> web app</a></li>
+<li><a href="#plotting-genetranscript-features-using-ensembldb-and-gviz"><span class="toc-section-number">7</span> Plotting gene/transcript features using <code>ensembldb</code> and <code>Gviz</code></a></li>
+<li><a href="#using-ensdb-objects-in-the-annotationdbi-framework"><span class="toc-section-number">8</span> Using <code>EnsDb</code> objects in the <code>AnnotationDbi</code> framework</a></li>
+<li><a href="#important-notes"><span class="toc-section-number">9</span> Important notes</a></li>
+<li><a href="#building-an-transcript-centric-database-package-based-on-ensembl-annotation"><span class="toc-section-number">10</span> Building an transcript-centric database package based on Ensembl annotation</a><ul>
+<li><a href="#requirements"><span class="toc-section-number">10.1</span> Requirements</a></li>
+<li><a href="#building-annotation-packages"><span class="toc-section-number">10.2</span> Building annotation packages</a></li>
+</ul></li>
+<li><a href="#database-layout"><span class="toc-section-number">11</span> Database layout<a id="orgtarget1"></a></a></li>
+</ul>
+</div>
+
+<p><strong>Package</strong>: <em><a href="http://bioconductor.org/packages/ensembldb">ensembldb</a></em><br /> <strong>Authors</strong>: Johannes Rainer <a href="mailto:johannes.rainer at eurac.edu">johannes.rainer at eurac.edu</a>, Tim Triche <a href="mailto:tim.triche at usc.edu">tim.triche at usc.edu</a><br /> <strong>Modified</strong>: 12 September, 2016<br /> <strong>Compiled</strong>: Wed Nov 16 19:52:05 2016</p>
+<div id="introduction" class="section level1">
+<h1><span class="header-section-number">1</span> Introduction</h1>
+<p>The <code>ensembldb</code> package provides functions to create and use transcript centric annotation databases/packages. The annotation for the databases are directly fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data is similar to that of the <code>TxDb</code> packages from the <code>GenomicFeatures</code> package, but, in addition to retrieve all gene/transcript models and annotations from the database, [...]
+<p>Another main goal of this package is to generate <em>versioned</em> annotation packages, i.e. annotation packages that are build for a specific Ensembl release, and are also named according to that (e.g. <code>EnsDb.Hsapiens.v75</code> for human gene definitions of the Ensembl code database version 75). This ensures reproducibility, as it allows to load annotations from a specific Ensembl release also if newer versions of annotation packages/releases are available. It also allows to l [...]
+<p>In the example below we load an Ensembl based annotation package for Homo sapiens, Ensembl version 75. The connection to the database is bound to the variable <code>EnsDb.Hsapiens.v75</code>.</p>
+<pre class="r"><code>library(EnsDb.Hsapiens.v75)
+
+## Making a "short cut"
+edb <- EnsDb.Hsapiens.v75
+## print some information for this package
+edb</code></pre>
+<pre><code>## EnsDb for Ensembl:
+## |Backend: SQLite
+## |Db type: EnsDb
+## |Type of Gene ID: Ensembl Gene ID
+## |Supporting package: ensembldb
+## |Db created by: ensembldb package from Bioconductor
+## |script_version: 0.1.3
+## |Creation time: Thu Sep 15 13:16:58 2016
+## |ensembl_version: 75
+## |ensembl_host: localhost
+## |Organism: homo_sapiens
+## |genome_build: GRCh37
+## |DBSCHEMAVERSION: 1.0
+## | No. of genes: 64102.
+## | No. of transcripts: 215647.</code></pre>
+<pre class="r"><code>## for what organism was the database generated?
+organism(edb)</code></pre>
+<pre><code>## [1] "Homo sapiens"</code></pre>
+</div>
+<div id="using-ensembldb-annotation-packages-to-retrieve-specific-annotations" class="section level1">
+<h1><span class="header-section-number">2</span> Using <code>ensembldb</code> annotation packages to retrieve specific annotations</h1>
+<p>The <code>ensembldb</code> package provides a set of filter objects that specify which entries should be fetched from the database. The complete list of filters, which can be used individually or in combination, is shown below:</p>
+<ul>
+<li><code>ExonidFilter</code>: filter results on the (Ensembl) exon identifiers.</li>
+<li><code>ExonrankFilter</code>: filter results on the rank (index) of an exon within the transcript model. Exons are always numbered from the 5’ to the 3’ end of the transcript; thus, also on the reverse strand, exon 1 is the most 5’ exon of the transcript.</li>
+<li><code>EntrezidFilter</code>: filter results on the NCBI Entrezgene identifiers of the genes.</li>
+<li><code>GenebiotypeFilter</code>: filter on the gene biotypes defined in the Ensembl database; use the <code>listGenebiotypes</code> method to list all available biotypes.</li>
+<li><code>GeneidFilter</code>: filter on the Ensembl gene IDs.</li>
+<li><code>GenenameFilter</code>: filter on the names (symbols) of the genes.</li>
+<li><code>SymbolFilter</code>: filter on gene symbols; note that no database column <em>symbol</em> is available in an <code>EnsDb</code> database, hence the gene name is used for filtering.</li>
+<li><code>GRangesFilter</code>: allows to retrieve all features (genes, transcripts or exons) that are either within (setting <code>condition</code> to “within”) or partially overlapping (setting <code>condition</code> to “overlapping”) the defined genomic region/range. Note that, depending on the called method (<code>genes</code>, <code>transcripts</code> or <code>exons</code>) the start and end coordinates of either the genes, transcripts or exons are used for the filter. For methods < [...]
+<li><code>SeqendFilter</code>: filter based on the chromosomal end coordinate of the exons, transcripts or genes (correspondingly set <code>feature = "exon"</code>, <code>feature = "tx"</code> or <code>feature = "gene"</code>).</li>
+<li><code>SeqnameFilter</code>: filter by the name of the chromosomes the genes are encoded on.</li>
+<li><code>SeqstartFilter</code>: filter based on the chromosomal start coordinates of the exons, transcripts or genes (correspondingly set <code>feature = "exon"</code>, <code>feature = "tx"</code> or <code>feature = "gene"</code>).</li>
+<li><code>SeqstrandFilter</code>: filter for the chromosome strand on which the genes are encoded.</li>
+<li><code>TxbiotypeFilter</code>: filter on the transcript biotype defined in Ensembl; use the <code>listTxbiotypes</code> method to list all available biotypes.</li>
+<li><code>TxidFilter</code>: filter on the Ensembl transcript identifiers.</li>
+</ul>
+<p>Each of the filter classes can take a single value or a vector of values (with the exception of the <code>SeqendFilter</code> and <code>SeqstartFilter</code>) for comparison. In addition, it is possible to specify the <em>condition</em> for the filter, e.g. setting <code>condition</code> to <code>"="</code> to retrieve all entries matching the filter value, to <code>"!="</code> to negate the filter, or setting <code>condition = "like"</code> to allow partial matching. The <code>condition</code> parameter for <code>S [...]
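+<p>As a minimal sketch of combining filters and conditions, the query below would fetch the transcripts of <em>BCL2L11</em> that are not annotated as <em>nonsense_mediated_decay</em>. It assumes that the <code>EnsDb.Hsapiens.v75</code> package used throughout this document is installed and that several filters passed as a <code>list</code> all have to match; the variable names <code>flts</code> and <code>txNoNmd</code> are illustrative only.</p>
+<pre class="r"><code>## Only run the query if the annotation package is available.
+if (requireNamespace("EnsDb.Hsapiens.v75", quietly = TRUE)) {
+    library(EnsDb.Hsapiens.v75)
+    edb <- EnsDb.Hsapiens.v75
+    ## First filter: match on the gene name.
+    ## Second filter: negated match on the transcript biotype.
+    flts <- list(GenenameFilter("BCL2L11"),
+                 TxbiotypeFilter("nonsense_mediated_decay", condition = "!="))
+    txNoNmd <- transcripts(edb, filter = flts)
+    ## Tabulate the biotypes of the returned transcripts.
+    print(table(txNoNmd$tx_biotype))
+}</code></pre>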
+<p>A simple example would be to get all transcripts for the gene <em>BCL2L11</em>. To this end we specify a <code>GenenameFilter</code> with the value <em>BCL2L11</em>. As a result we get a <code>GRanges</code> object with <code>start</code>, <code>end</code>, <code>strand</code> and <code>seqname</code> of the <code>GRanges</code> object being the start coordinate, end coordinate, strand and chromosome name for the respective transcripts. All additional annotations are available as meta [...]
+<pre class="r"><code>Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
+
+Tx</code></pre>
+<pre><code>## GRanges object with 17 ranges and 7 metadata columns:
+## seqnames ranges strand | tx_id
+## <Rle> <IRanges> <Rle> | <character>
+## ENST00000432179 2 [111876955, 111881689] + | ENST00000432179
+## ENST00000308659 2 [111878491, 111922625] + | ENST00000308659
+## ENST00000357757 2 [111878491, 111919016] + | ENST00000357757
+## ENST00000393253 2 [111878491, 111909428] + | ENST00000393253
+## ENST00000337565 2 [111878491, 111886423] + | ENST00000337565
+## ... ... ... ... . ...
+## ENST00000452231 2 [111881323, 111921808] + | ENST00000452231
+## ENST00000361493 2 [111881323, 111921808] + | ENST00000361493
+## ENST00000431217 2 [111881323, 111921929] + | ENST00000431217
+## ENST00000439718 2 [111881323, 111922220] + | ENST00000439718
+## ENST00000438054 2 [111881329, 111903861] + | ENST00000438054
+## tx_biotype tx_cds_seq_start tx_cds_seq_end
+## <character> <integer> <integer>
+## ENST00000432179 protein_coding 111881323 111881689
+## ENST00000308659 protein_coding 111881323 111921808
+## ENST00000357757 protein_coding 111881323 111919016
+## ENST00000393253 protein_coding 111881323 111909428
+## ENST00000337565 protein_coding 111881323 111886328
+## ... ... ... ...
+## ENST00000452231 nonsense_mediated_decay 111881323 111919016
+## ENST00000361493 nonsense_mediated_decay 111881323 111887812
+## ENST00000431217 nonsense_mediated_decay 111881323 111902078
+## ENST00000439718 nonsense_mediated_decay 111881323 111909428
+## ENST00000438054 protein_coding 111881329 111902068
+## gene_id tx_name gene_name
+## <character> <character> <character>
+## ENST00000432179 ENSG00000153094 ENST00000432179 BCL2L11
+## ENST00000308659 ENSG00000153094 ENST00000308659 BCL2L11
+## ENST00000357757 ENSG00000153094 ENST00000357757 BCL2L11
+## ENST00000393253 ENSG00000153094 ENST00000393253 BCL2L11
+## ENST00000337565 ENSG00000153094 ENST00000337565 BCL2L11
+## ... ... ... ...
+## ENST00000452231 ENSG00000153094 ENST00000452231 BCL2L11
+## ENST00000361493 ENSG00000153094 ENST00000361493 BCL2L11
+## ENST00000431217 ENSG00000153094 ENST00000431217 BCL2L11
+## ENST00000439718 ENSG00000153094 ENST00000439718 BCL2L11
+## ENST00000438054 ENSG00000153094 ENST00000438054 BCL2L11
+## -------
+## seqinfo: 1 sequence from GRCh37 genome</code></pre>
+<pre class="r"><code>## as this is a GRanges object we can access e.g. the start coordinates with
+head(start(Tx))</code></pre>
+<pre><code>## [1] 111876955 111878491 111878491 111878491 111878491 111878506</code></pre>
+<pre class="r"><code>## or extract the biotype with
+head(Tx$tx_biotype)</code></pre>
+<pre><code>## [1] "protein_coding" "protein_coding" "protein_coding" "protein_coding"
+## [5] "protein_coding" "protein_coding"</code></pre>
+<p>The parameter <code>columns</code> of the <code>exons</code>, <code>genes</code> and <code>transcripts</code> method allows to specify which database attributes (columns) should be retrieved. The <code>exons</code> method returns by default all exon-related columns, the <code>transcripts</code> all columns from the transcript database table and the <code>genes</code> all from the gene table. Note however that in the example above we got also a column <code>gene_name</code> although th [...]
+<p>To get an overview of database tables and available columns the function <code>listTables</code> can be used. The method <code>listColumns</code> on the other hand lists columns for the specified database table.</p>
+<pre class="r"><code>## list all database tables along with their columns
+listTables(edb)</code></pre>
+<pre><code>## $gene
+## [1] "gene_id" "gene_name" "entrezid"
+## [4] "gene_biotype" "gene_seq_start" "gene_seq_end"
+## [7] "seq_name" "seq_strand" "seq_coord_system"
+## [10] "symbol"
+##
+## $tx
+## [1] "tx_id" "tx_biotype" "tx_seq_start"
+## [4] "tx_seq_end" "tx_cds_seq_start" "tx_cds_seq_end"
+## [7] "gene_id" "tx_name"
+##
+## $tx2exon
+## [1] "tx_id" "exon_id" "exon_idx"
+##
+## $exon
+## [1] "exon_id" "exon_seq_start" "exon_seq_end"
+##
+## $chromosome
+## [1] "seq_name" "seq_length" "is_circular"
+##
+## $metadata
+## [1] "name" "value"</code></pre>
+<pre class="r"><code>## list columns from a specific table
+listColumns(edb, "tx")</code></pre>
+<pre><code>## [1] "tx_id" "tx_biotype" "tx_seq_start"
+## [4] "tx_seq_end" "tx_cds_seq_start" "tx_cds_seq_end"
+## [7] "gene_id" "tx_name"</code></pre>
+<p>Thus, we could retrieve all transcripts of the biotype <em>nonsense_mediated_decay</em> (which, according to the Ensembl definitions, are transcribed but most likely not translated into a protein, and are instead degraded after transcription) along with the name of the gene for each transcript. Note that here we change the <code>return.type</code> to <code>DataFrame</code>, so the method returns a <code>DataFrame</code> with the results instead of the default <code>GRanges</code>.</p>
+<pre class="r"><code>Tx <- transcripts(edb,
+ columns = c(listColumns(edb , "tx"), "gene_name"),
+ filter = TxbiotypeFilter("nonsense_mediated_decay"),
+ return.type = "DataFrame")
+nrow(Tx)</code></pre>
+<pre><code>## [1] 13812</code></pre>
+<pre class="r"><code>Tx</code></pre>
+<pre><code>## DataFrame with 13812 rows and 9 columns
+## tx_id tx_biotype tx_seq_start tx_seq_end
+## <character> <character> <integer> <integer>
+## 1 ENST00000495251 nonsense_mediated_decay 64085 69409
+## 2 ENST00000462860 nonsense_mediated_decay 64085 69452
+## 3 ENST00000483390 nonsense_mediated_decay 65739 68764
+## 4 ENST00000538848 nonsense_mediated_decay 66411 68843
+## 5 ENST00000567466 nonsense_mediated_decay 97578 99521
+## ... ... ... ... ...
+## 13808 ENST00000496411 nonsense_mediated_decay 249149927 249153217
+## 13809 ENST00000483223 nonsense_mediated_decay 249150714 249152728
+## 13810 ENST00000533647 nonsense_mediated_decay 249151472 249152523
+## 13811 ENST00000528141 nonsense_mediated_decay 249151590 249153284
+## 13812 ENST00000530986 nonsense_mediated_decay 249151668 249153284
+## tx_cds_seq_start tx_cds_seq_end gene_id tx_name
+## <integer> <integer> <character> <character>
+## 1 68052 68789 ENSG00000234769 ENST00000495251
+## 2 68052 68789 ENSG00000234769 ENST00000462860
+## 3 66428 68764 ENSG00000234769 ENST00000483390
+## 4 67418 68789 ENSG00000234769 ENST00000538848
+## 5 98546 98893 ENSG00000261456 ENST00000567466
+## ... ... ... ... ...
+## 13808 249152153 249152508 ENSG00000171163 ENST00000496411
+## 13809 249152153 249152508 ENSG00000171163 ENST00000483223
+## 13810 249152153 249152508 ENSG00000171163 ENST00000533647
+## 13811 249152203 249152508 ENSG00000171163 ENST00000528141
+## 13812 249152203 249152508 ENSG00000171163 ENST00000530986
+## gene_name
+## <character>
+## 1 WASH4P
+## 2 WASH4P
+## 3 WASH4P
+## 4 WASH4P
+## 5 TUBB8
+## ... ...
+## 13808 ZNF692
+## 13809 ZNF692
+## 13810 ZNF692
+## 13811 ZNF692
+## 13812 ZNF692</code></pre>
+<p>For protein coding transcripts, we can also specifically extract their coding region. In the example below we extract the CDS for all transcripts encoded on chromosome Y.</p>
+<pre class="r"><code>yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+yCds</code></pre>
+<pre><code>## GRangesList object of length 160:
+## $ENST00000155093
+## GRanges object with 7 ranges and 3 metadata columns:
+## seqnames ranges strand | seq_name exon_id
+## <Rle> <IRanges> <Rle> | <character> <character>
+## [1] Y [2821978, 2822038] + | Y ENSE00002223884
+## [2] Y [2829115, 2829687] + | Y ENSE00003645989
+## [3] Y [2843136, 2843285] + | Y ENSE00003548678
+## [4] Y [2843552, 2843695] + | Y ENSE00003611496
+## [5] Y [2844711, 2844863] + | Y ENSE00001649504
+## [6] Y [2845981, 2846121] + | Y ENSE00001777381
+## [7] Y [2846851, 2848034] + | Y ENSE00001368923
+## exon_rank
+## <integer>
+## [1] 2
+## [2] 3
+## [3] 4
+## [4] 5
+## [5] 6
+## [6] 7
+## [7] 8
+##
+## $ENST00000215473
+## GRanges object with 6 ranges and 3 metadata columns:
+## seqnames ranges strand | seq_name exon_id
+## [1] Y [4924865, 4925500] + | Y ENSE00001436852
+## [2] Y [4966256, 4968748] + | Y ENSE00001640924
+## [3] Y [5369098, 5369296] + | Y ENSE00001803775
+## [4] Y [5483308, 5483316] + | Y ENSE00001731866
+## [5] Y [5491131, 5491145] + | Y ENSE00001711324
+## [6] Y [5605313, 5605983] + | Y ENSE00001779807
+## exon_rank
+## [1] 1
+## [2] 2
+## [3] 3
+## [4] 4
+## [5] 5
+## [6] 6
+##
+## $ENST00000215479
+## GRanges object with 5 ranges and 3 metadata columns:
+## seqnames ranges strand | seq_name exon_id
+## [1] Y [6740596, 6740649] - | Y ENSE00001671586
+## [2] Y [6738047, 6738094] - | Y ENSE00001645681
+## [3] Y [6736773, 6736817] - | Y ENSE00000652250
+## [4] Y [6736078, 6736503] - | Y ENSE00001667251
+## [5] Y [6734114, 6734119] - | Y ENSE00001494454
+## exon_rank
+## [1] 2
+## [2] 3
+## [3] 4
+## [4] 5
+## [5] 6
+##
+## ...
+## <157 more elements>
+## -------
+## seqinfo: 1 sequence from GRCh37 genome</code></pre>
+<p>Using a <code>GRangesFilter</code> we can retrieve all features from the database that are either within or overlapping the specified genomic region. In the example below we query all genes that are partially overlapping with a small region on chromosome 11. The filter restricts to all genes for which either an exon or an intron is partially overlapping with the region.</p>
+<pre class="r"><code>## Define the filter
+grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+ strand = "+"), condition = "overlapping")
+
+## Query genes:
+gn <- genes(edb, filter = grf)
+gn</code></pre>
+<pre><code>## GRanges object with 1 range and 6 metadata columns:
+## seqnames ranges strand | gene_id
+## <Rle> <IRanges> <Rle> | <character>
+## ENSG00000109906 11 [113930315, 114121398] + | ENSG00000109906
+## gene_name entrezid gene_biotype seq_coord_system
+## <character> <character> <character> <character>
+## ENSG00000109906 ZBTB16 7704 protein_coding chromosome
+## symbol
+## <character>
+## ENSG00000109906 ZBTB16
+## -------
+## seqinfo: 1 sequence from GRCh37 genome</code></pre>
+<pre class="r"><code>## Next we retrieve all transcripts for that gene so that we can plot them.
+txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))</code></pre>
+<pre class="r"><code>plot(3, 3, pch = NA, xlim = c(start(gn), end(gn)), ylim = c(0, length(txs)),
+ yaxt = "n", ylab = "")
+## Highlight the GRangesFilter region
+rect(xleft = start(grf), xright = end(grf), ybottom = 0, ytop = length(txs),
+ col = "red", border = "red")
+for(i in seq_along(txs)) {
+ current <- txs[i]
+ rect(xleft = start(current), xright = end(current), ybottom = i-0.975,
+ ytop = i-0.125, border = "grey")
+ text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
+}</code></pre>
+<p><img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAABaAAAAPACAIAAABM5kn/AAAACXBIWXMAAB2HAAAdhwGP5fFlAAAgAElEQVR4nOzde3RV5Z0//ieHhMM1yE0FRJk6CChIB/AyigvEUVGkM1214CCWm4I1dSxUqf0qikKdqiy8FBWkWmahdgBdlSlfAVdJdYSKkAoCXkDUUUSRACGQQIAk5/fH/jXfTEIwUOBkw+v1h+vk2Z/9nM/eSPS8z7P3zkilUgEAAAAgzhLpbgAAAADgbyXgAAAAAGJPwAEAAADEnoADAAAAiD0BBwAAABB7Ag4AAAAg9gQcAAAAQOwJOAAAAIDYE3AAAAAAsSfgAAAAAGJPwAEAAADEnoADAAAAiD0BBwAAABB7Ag4AAAAg9gQcAAAAQOwJOAAAAIDYE3AAAAAAsSfgAAAAAGJPwAEAAADEnoADAAAAiD0BBwAAABB7A [...]
+<p>As we can see, 4 transcripts of the gene ZBTB16 also overlap the region. Below we fetch these 4 transcripts. Note that a call to <code>exons</code> will not return any features from the database, as no exon overlaps the region.</p>
+<pre class="r"><code>transcripts(edb, filter = grf)</code></pre>
+<pre><code>## GRanges object with 4 ranges and 6 metadata columns:
+## seqnames ranges strand | tx_id
+## <Rle> <IRanges> <Rle> | <character>
+## ENST00000335953 11 [113930315, 114121398] + | ENST00000335953
+## ENST00000541602 11 [113930447, 114060486] + | ENST00000541602
+## ENST00000392996 11 [113931229, 114121374] + | ENST00000392996
+## ENST00000539918 11 [113935134, 114118066] + | ENST00000539918
+## tx_biotype tx_cds_seq_start tx_cds_seq_end
+## <character> <integer> <integer>
+## ENST00000335953 protein_coding 113934023 114121277
+## ENST00000541602 retained_intron <NA> <NA>
+## ENST00000392996 protein_coding 113934023 114121277
+## ENST00000539918 nonsense_mediated_decay 113935134 113992549
+## gene_id tx_name
+## <character> <character>
+## ENST00000335953 ENSG00000109906 ENST00000335953
+## ENST00000541602 ENSG00000109906 ENST00000541602
+## ENST00000392996 ENSG00000109906 ENST00000392996
+## ENST00000539918 ENSG00000109906 ENST00000539918
+## -------
+## seqinfo: 1 sequence from GRCh37 genome</code></pre>
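+<p>The difference between the two conditions of the <code>GRangesFilter</code> can be illustrated with plain interval arithmetic. The sketch below uses the coordinates of ZBTB16 and of the query region from above; the helpers <code>overlapsRegion</code> and <code>withinRegion</code> are hypothetical, ignore the strand, and are not part of <code>ensembldb</code>.</p>
+<pre class="r"><code>## Hypothetical helpers, not part of ensembldb; strand is ignored here.
+## A feature overlaps the region if it starts before the region ends
+## and ends after the region starts.
+overlapsRegion <- function(fStart, fEnd, rStart, rEnd)
+    fStart <= rEnd & fEnd >= rStart
+## A feature is within the region if it lies completely inside it.
+withinRegion <- function(fStart, fEnd, rStart, rEnd)
+    fStart >= rStart & fEnd <= rEnd
+
+## Gene ZBTB16 against the 51 nt query region on chromosome 11.
+overlapsRegion(113930315, 114121398, 114000000, 114000050)
+withinRegion(113930315, 114121398, 114000000, 114000050)</code></pre>
+<p>The first call evaluates to <code>TRUE</code> and the second to <code>FALSE</code>: the gene spans the region but is not contained in it. The same test applied to exon coordinates would fail for every exon, which is why <code>exons</code> returns nothing for this filter.</p>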
+<p>The <code>GRangesFilter</code> supports also <code>GRanges</code> defining multiple regions and a query will return all features overlapping any of these regions. Besides using the <code>GRangesFilter</code> it is also possible to search for transcripts or exons overlapping genomic regions using the <code>exonsByOverlaps</code> or <code>transcriptsByOverlaps</code> known from the <code>GenomicFeatures</code> package. Note that the implementation of these methods for <code>EnsDb</code> [...]
+<p>To get an overview of the available gene and transcript biotypes, the functions <code>listGenebiotypes</code> and <code>listTxbiotypes</code> can be used.</p>
+<pre class="r"><code>## Get all gene biotypes from the database. The GenebiotypeFilter
+## allows to filter on these values.
+listGenebiotypes(edb)</code></pre>
+<pre><code>## [1] "protein_coding" "pseudogene"
+## [3] "processed_transcript" "antisense"
+## [5] "lincRNA" "polymorphic_pseudogene"
+## [7] "IG_V_pseudogene" "IG_V_gene"
+## [9] "sense_overlapping" "sense_intronic"
+## [11] "TR_V_gene" "misc_RNA"
+## [13] "snRNA" "miRNA"
+## [15] "snoRNA" "rRNA"
+## [17] "Mt_tRNA" "Mt_rRNA"
+## [19] "IG_C_gene" "IG_J_gene"
+## [21] "TR_J_gene" "TR_C_gene"
+## [23] "TR_V_pseudogene" "TR_J_pseudogene"
+## [25] "IG_D_gene" "IG_C_pseudogene"
+## [27] "TR_D_gene" "IG_J_pseudogene"
+## [29] "3prime_overlapping_ncrna" "processed_pseudogene"
+## [31] "LRG_gene"</code></pre>
+<pre class="r"><code>## Get all transcript biotypes from the database.
+listTxbiotypes(edb)</code></pre>
+<pre><code>## [1] "protein_coding"
+## [2] "processed_transcript"
+## [3] "retained_intron"
+## [4] "nonsense_mediated_decay"
+## [5] "unitary_pseudogene"
+## [6] "non_stop_decay"
+## [7] "unprocessed_pseudogene"
+## [8] "processed_pseudogene"
+## [9] "transcribed_unprocessed_pseudogene"
+## [10] "antisense"
+## [11] "lincRNA"
+## [12] "polymorphic_pseudogene"
+## [13] "transcribed_processed_pseudogene"
+## [14] "miRNA"
+## [15] "pseudogene"
+## [16] "IG_V_pseudogene"
+## [17] "snoRNA"
+## [18] "IG_V_gene"
+## [19] "sense_overlapping"
+## [20] "sense_intronic"
+## [21] "TR_V_gene"
+## [22] "snRNA"
+## [23] "misc_RNA"
+## [24] "rRNA"
+## [25] "Mt_tRNA"
+## [26] "Mt_rRNA"
+## [27] "IG_C_gene"
+## [28] "IG_J_gene"
+## [29] "TR_J_gene"
+## [30] "TR_C_gene"
+## [31] "TR_V_pseudogene"
+## [32] "TR_J_pseudogene"
+## [33] "IG_D_gene"
+## [34] "IG_C_pseudogene"
+## [35] "TR_D_gene"
+## [36] "IG_J_pseudogene"
+## [37] "3prime_overlapping_ncrna"
+## [38] "translated_processed_pseudogene"
+## [39] "LRG_gene"</code></pre>
+<p>Data can be fetched in an analogous way using the <code>exons</code> and <code>genes</code> methods. In the example below we retrieve <code>gene_name</code>, <code>entrezid</code> and the <code>gene_biotype</code> of all genes in the database whose names start with “BCL”.</p>
+<pre class="r"><code>## We're going to fetch all genes whose names start with BCL. To this end
+## we define a GenenameFilter with partial matching, i.e. condition "like"
+## and a % for any character/string.
+BCLs <- genes(edb,
+ columns = c("gene_name", "entrezid", "gene_biotype"),
+ filter = list(GenenameFilter("BCL%", condition = "like")),
+ return.type = "DataFrame")
+nrow(BCLs)</code></pre>
+<pre><code>## [1] 25</code></pre>
+<pre class="r"><code>BCLs</code></pre>
+<pre><code>## DataFrame with 25 rows and 4 columns
+## gene_name entrezid gene_biotype gene_id
+## <character> <character> <character> <character>
+## 1 BCL10 8915 protein_coding ENSG00000142867
+## 2 BCL11A 53335 protein_coding ENSG00000119866
+## 3 BCL11B 64919 protein_coding ENSG00000127152
+## 4 BCL2 596 protein_coding ENSG00000171791
+## 5 BCL2A1 597 protein_coding ENSG00000140379
+## ... ... ... ... ...
+## 21 BCL7C 9274 protein_coding ENSG00000099385
+## 22 BCL9 607 protein_coding ENSG00000116128
+## 23 BCL9 607 protein_coding ENSG00000266095
+## 24 BCL9L 283149 protein_coding ENSG00000186174
+## 25 BCLAF1 9774 protein_coding ENSG00000029363</code></pre>
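+<p>To make the partial matching more concrete, the base R sketch below emulates a pattern such as <code>"BCL%"</code> on a few made-up gene names. The helper <code>likeMatch</code> is hypothetical and deliberately simplified: it only handles the <code>%</code> wildcard and does not reproduce every detail of how the database backend evaluates such patterns.</p>
+<pre class="r"><code>## Hypothetical helper, not part of ensembldb: simplified emulation of a
+## "like" match in which % stands for any (possibly empty) string.
+likeMatch <- function(x, pattern) {
+    ## Translate the % wildcard into a regular expression and anchor it.
+    rx <- paste0("^", gsub("%", ".*", pattern, fixed = TRUE), "$")
+    grepl(rx, x)
+}
+## Only names starting with BCL match the pattern.
+likeMatch(c("BCL2", "BCLAF1", "ABCL1"), "BCL%")</code></pre>
+<p>Under these assumptions the call returns <code>TRUE</code> for the first two names and <code>FALSE</code> for the third.</p>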
+<p>Sometimes it might be useful to know the length of genes or transcripts (i.e. the total sum of nucleotides covered by their exons). Below we calculate the mean length of transcripts from protein coding genes on chromosomes X and Y as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on these chromosomes.</p>
+<pre class="r"><code>## determine the average length of transcripts of snRNA, snoRNA and rRNA genes
+## encoded on chromosomes X and Y.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+ SeqnameFilter(c("X", "Y")))))</code></pre>
+<pre><code>## [1] 116.3046</code></pre>
+<pre class="r"><code>## determine the average length of transcripts of protein coding genes encoded
+## on the same chromosomes.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter("protein_coding"),
+ SeqnameFilter(c("X", "Y")))))</code></pre>
+<pre><code>## [1] 1920</code></pre>
+<p>Not unexpectedly, transcripts of protein coding genes are longer than those of snRNA, snoRNA or rRNA genes.</p>
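+<p>The length used here is the total number of nucleotides covered by the exons of a transcript. As a rough base R illustration of that definition, using made-up exon coordinates (the data and variable names below are hypothetical and not taken from the database):</p>
+<pre class="r"><code>## Hypothetical exon coordinates of two transcripts.
+exns <- data.frame(tx_id = c("txA", "txA", "txA", "txB", "txB"),
+                   exon_seq_start = c(100, 300, 500, 100, 400),
+                   exon_seq_end = c(199, 349, 599, 149, 449))
+## Width of each exon; start and end coordinates are both inclusive.
+exns$width <- exns$exon_seq_end - exns$exon_seq_start + 1
+## Transcript length: sum of the widths of its exons.
+txLength <- tapply(exns$width, exns$tx_id, sum)
+txLength</code></pre>
+<p>For these toy data the lengths are 250 for <code>txA</code> and 100 for <code>txB</code>, regardless of how long the introns between the exons are.</p>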
+<p>Finally, we extract the first two exons of each transcript model encoded on chromosome Y.</p>
+<pre class="r"><code>## Extract all exons 1 and (if present) 2 for all genes encoded on the
+## Y chromosome
+exons(edb, columns = c("tx_id", "exon_idx"),
+ filter = list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition = "<")))</code></pre>
+<pre><code>## GRanges object with 1287 ranges and 3 metadata columns:
+## seqnames ranges strand | tx_id
+## <Rle> <IRanges> <Rle> | <character>
+## ENSE00002088309 Y [2652790, 2652894] + | ENST00000516032
+## ENSE00001494622 Y [2654896, 2655740] - | ENST00000383070
+## ENSE00002323146 Y [2655049, 2655069] - | ENST00000525526
+## ENSE00002201849 Y [2655075, 2655644] - | ENST00000525526
+## ENSE00002214525 Y [2655145, 2655168] - | ENST00000534739
+## ... ... ... ... . ...
+## ENSE00001632993 Y [28737695, 28737748] - | ENST00000456738
+## ENSE00001616687 Y [28772667, 28773306] - | ENST00000435741
+## ENSE00001638296 Y [28779492, 28779578] - | ENST00000435945
+## ENSE00001797328 Y [28780670, 28780799] - | ENST00000435945
+## ENSE00001794473 Y [59001391, 59001635] + | ENST00000431853
+## exon_idx exon_id
+## <integer> <character>
+## ENSE00002088309 1 ENSE00002088309
+## ENSE00001494622 1 ENSE00001494622
+## ENSE00002323146 2 ENSE00002323146
+## ENSE00002201849 1 ENSE00002201849
+## ENSE00002214525 2 ENSE00002214525
+## ... ... ...
+## ENSE00001632993 1 ENSE00001632993
+## ENSE00001616687 1 ENSE00001616687
+## ENSE00001638296 2 ENSE00001638296
+## ENSE00001797328 1 ENSE00001797328
+## ENSE00001794473 1 ENSE00001794473
+## -------
+## seqinfo: 1 sequence from GRCh37 genome</code></pre>
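+<p>The exon rank used by the <code>ExonrankFilter</code> follows the direction of transcription, not the genomic coordinates. The sketch below shows this numbering for made-up exon start coordinates; <code>exonRank</code> is a hypothetical helper for a single transcript and is not part of <code>ensembldb</code>.</p>
+<pre class="r"><code>## Hypothetical helper, not part of ensembldb: number the exons of a single
+## transcript from its 5' to its 3' end.
+exonRank <- function(exonStart, strand) {
+    if (strand == "-") {
+        ## Reverse strand: the exon with the largest coordinate comes first.
+        rank(-exonStart)
+    } else {
+        rank(exonStart)
+    }
+}
+## The same (made-up) exon start coordinates on both strands.
+exonRank(c(1000, 2000, 3000), strand = "+")
+exonRank(c(1000, 2000, 3000), strand = "-")</code></pre>
+<p>On the forward strand the ranks are 1, 2, 3; on the reverse strand they are 3, 2, 1, so the exon with the highest genomic coordinate is exon 1.</p>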
+</div>
+<div id="extracting-genetranscriptexon-models-for-rnaseq-feature-counting" class="section level1">
+<h1><span class="header-section-number">3</span> Extracting gene/transcript/exon models for RNASeq feature counting</h1>
+<p>For the feature counting step of an RNAseq experiment, the gene or transcript models (defined by the chromosomal start and end positions of their exons) have to be known. To extract these from an Ensembl based annotation package, the <code>exonsBy</code>, <code>genesBy</code> and <code>transcriptsBy</code> methods can be used in an analogous way as in <code>TxDb</code> packages generated by the <code>GenomicFeatures</code> package. However, the <code>transcriptsBy</code> method does n [...]
+<p>A simple use case is to retrieve the transcripts of all genes encoded on chromosomes X and Y, grouped by gene.</p>
+<pre class="r"><code>TxByGns <- transcriptsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c("X", "Y")))
+ )
+TxByGns</code></pre>
+<pre><code>## GRangesList object of length 2908:
+## $ENSG00000000003
+## GRanges object with 3 ranges and 6 metadata columns:
+## seqnames ranges strand | tx_id
+## <Rle> <IRanges> <Rle> | <character>
+## [1] X [99888439, 99894988] - | ENST00000494424
+## [2] X [99883667, 99891803] - | ENST00000373020
+## [3] X [99887538, 99891686] - | ENST00000496771
+## tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id
+## <character> <integer> <integer> <character>
+## [1] processed_transcript <NA> <NA> ENSG00000000003
+## [2] protein_coding 99885795 99891691 ENSG00000000003
+## [3] processed_transcript <NA> <NA> ENSG00000000003
+## tx_name
+## <character>
+## [1] ENST00000494424
+## [2] ENST00000373020
+## [3] ENST00000496771
+##
+## $ENSG00000000005
+## GRanges object with 2 ranges and 6 metadata columns:
+## seqnames ranges strand | tx_id
+## [1] X [99839799, 99854882] + | ENST00000373031
+## [2] X [99848621, 99852528] + | ENST00000485971
+## tx_biotype tx_cds_seq_start tx_cds_seq_end gene_id
+## [1] protein_coding 99840016 99854714 ENSG00000000005
+## [2] processed_transcript <NA> <NA> ENSG00000000005
+## tx_name
+## [1] ENST00000373031
+## [2] ENST00000485971
+##
+## $ENSG00000001497
+## GRanges object with 6 ranges and 6 metadata columns:
+## seqnames ranges strand | tx_id
+## [1] X [64732463, 64754655] - | ENST00000484069
+## [2] X [64732462, 64754636] - | ENST00000374811
+## [3] X [64732463, 64754636] - | ENST00000374804
+## [4] X [64732463, 64754636] - | ENST00000312391
+## [5] X [64732462, 64754634] - | ENST00000374807
+## [6] X [64740309, 64743497] - | ENST00000469091
+## tx_biotype tx_cds_seq_start tx_cds_seq_end
+## [1] nonsense_mediated_decay 64744901 64754595
+## [2] protein_coding 64732655 64754595
+## [3] protein_coding 64732655 64754595
+## [4] protein_coding 64744901 64754595
+## [5] protein_coding 64732655 64754595
+## [6] protein_coding 64740535 64743497
+## gene_id tx_name
+## [1] ENSG00000001497 ENST00000484069
+## [2] ENSG00000001497 ENST00000374811
+## [3] ENSG00000001497 ENST00000374804
+## [4] ENSG00000001497 ENST00000312391
+## [5] ENSG00000001497 ENST00000374807
+## [6] ENSG00000001497 ENST00000469091
+##
+## ...
+## <2905 more elements>
+## -------
+## seqinfo: 2 sequences from GRCh37 genome</code></pre>
+<p>Since Ensembl also contains definitions of genes located on chromosome variants (supercontigs), it is advisable to specify the chromosome names for which the gene models should be returned.</p>
+<p>In a real use case, we might thus want to retrieve all genes encoded on the <em>standard</em> chromosomes. In addition it is advisable to use a <code>GeneidFilter</code> to restrict to Ensembl genes only, because <em>LRG</em> (Locus Reference Genomic) genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup>, which are partially redundant with Ensembl genes, are also defined in the database.</p>
+<pre class="r"><code>## Get the exons of all genes on chromosomes 1 to 22, X and Y.
+## Note: the second filter removes the "LRG" genes.
+EnsGenes <- exonsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))</code></pre>
+<p>The code above returns a <code>GRangesList</code> that can be used directly as an input for the <code>summarizeOverlaps</code> function from the <code>GenomicAlignments</code> package <sup><a id="fnr.3" class="footref" href="#fn.3">3</a></sup>.</p>
+<p>Alternatively, the above <code>GRangesList</code> can be transformed to a <code>data.frame</code> in <em>SAF</em> format that can be used as an input to the <code>featureCounts</code> function of the <code>Rsubread</code> package <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.</p>
+<pre class="r"><code>## Transforming the GRangesList into a data.frame in SAF format
+EnsGenes.SAF <- toSAF(EnsGenes)</code></pre>
+<p>Note that the ID by which the <code>GRangesList</code> is split is used in the SAF formatted <code>data.frame</code> as the <code>GeneID</code>. In the example above these are the Ensembl gene IDs, while the start and end coordinates (along with the strand and chromosome) are those of the exons.</p>
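+<p>For orientation, a SAF annotation as read by <code>featureCounts</code> is a table with the columns <code>GeneID</code>, <code>Chr</code>, <code>Start</code>, <code>End</code> and <code>Strand</code>, with one row per exon. The sketch below builds such a table by hand from hypothetical exon coordinates; it only illustrates the layout and makes no claim about how <code>toSAF</code> is implemented.</p>
+<pre class="r"><code>## Hypothetical exons of two genes, grouped by gene ID.
+exonsByGene <- list(
+    geneA = data.frame(Chr = "X", Start = c(100, 300), End = c(199, 349),
+                       Strand = "+"),
+    geneB = data.frame(Chr = "Y", Start = 500, End = 599, Strand = "-"))
+## One row per exon; the name of each list element becomes the GeneID.
+saf <- do.call(rbind, lapply(names(exonsByGene), function(id)
+    data.frame(GeneID = id, exonsByGene[[id]])))
+saf</code></pre>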
+<p>In addition, the <code>disjointExons</code> function (similar to the one defined in <code>GenomicFeatures</code>) can be used to generate a <code>GRanges</code> of non-overlapping exon parts which can be used in the <code>DEXSeq</code> package.</p>
+<pre class="r"><code>## Create a GRanges of non-overlapping exon parts.
+DJE <- disjointExons(edb,
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))</code></pre>
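+<p>What "non-overlapping exon parts" means can be sketched in base R on a handful of made-up exons. The helper <code>disjoinParts</code> below is hypothetical and much simpler than <code>disjointExons</code>: it works on plain start and end coordinates of a single gene and ignores strand and gene membership.</p>
+<pre class="r"><code>## Hypothetical helper, not part of ensembldb or GenomicFeatures: split a set
+## of (possibly overlapping) exons into non-overlapping parts.
+disjoinParts <- function(start, end) {
+    ## Every exon start and every position right after an exon end is a
+    ## potential boundary between two parts.
+    brks <- sort(unique(c(start, end + 1)))
+    parts <- data.frame(start = head(brks, -1), end = tail(brks, -1) - 1)
+    ## Keep only the parts that are covered by at least one exon.
+    covered <- vapply(seq_len(nrow(parts)), function(i)
+        any(start <= parts$start[i] & end >= parts$end[i]), logical(1))
+    parts[covered, ]
+}
+## Two overlapping exons and one separate exon (made-up coordinates).
+disjoinParts(start = c(1, 5, 30), end = c(10, 15, 40))</code></pre>
+<p>For this input the two overlapping exons are split into the parts 1 to 4, 5 to 10 and 11 to 15, while the separate exon 30 to 40 is kept as it is; the uncovered gap between 16 and 29 is dropped.</p>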
+</div>
+<div id="retrieving-sequences-for-genetranscriptexon-models" class="section level1">
+<h1><span class="header-section-number">4</span> Retrieving sequences for gene/transcript/exon models</h1>
+<p>The methods to retrieve exons, transcripts and genes (i.e. <code>exons</code>, <code>transcripts</code> and <code>genes</code>) return by default <code>GRanges</code> objects that can be used to retrieve sequences using the <code>getSeq</code> method e.g. from BSgenome packages. The basic workflow is thus identical to the one for <code>TxDb</code> packages; however, it is not straightforward to identify the BSgenome package with the matching genomic sequence. Most BSgenome packages a [...]
+<p>In the code block below we first retrieve the <code>FaFile</code> with the genomic DNA sequence, extract the genomic start and end coordinates for all genes defined in the package, subset to genes encoded on sequences available in the <code>FaFile</code> and extract all of their sequences. Note: these sequences represent the sequence between the chromosomal start and end coordinates of the gene.</p>
+<pre class="r"><code>library(EnsDb.Hsapiens.v75)
+library(Rsamtools)
+edb <- EnsDb.Hsapiens.v75
+
+## Get the FaFile with the genomic sequence matching the Ensembl version
+## using the AnnotationHub package.
+Dna <- getGenomeFaFile(edb)
+
+## Get start/end coordinates of all genes.
+genes <- genes(edb)
+## Subset to all genes that are encoded on chromosomes for which
+## we do have DNA sequence available.
+genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
+
+## Get the gene sequences, i.e. the sequence including the sequence of
+## all of the gene's exons and introns.
+geneSeqs <- getSeq(Dna, genes)</code></pre>
+<p>To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can directly use the <code>extractTranscriptSeqs</code> method defined in the <code>GenomicFeatures</code> package on the <code>EnsDb</code> object, optionally using a filter to restrict the query.</p>
+<pre class="r"><code>## get all exons of all transcripts encoded on chromosome Y
+yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+
+## Retrieve the sequences for these transcripts from the FaFile.
+library(GenomicFeatures)
+yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
+yTxSeqs
+
+## Extract the sequences of all transcripts encoded on chromosome Y.
+yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+
+## Along these lines, we could use the method also to retrieve the coding sequence
+## of all transcripts on the Y chromosome.
+cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+extractTranscriptSeqs(Dna, cdsY)</code></pre>
+<p>Note: in the next section we describe how transcript sequences can be retrieved from a <code>BSgenome</code> package that is based on UCSC, not Ensembl.</p>
+</div>
+<div id="integrating-annotations-from-ensembl-based-ensdb-packages-with-ucsc-based-annotations" class="section level1">
+<h1><span class="header-section-number">5</span> Integrating annotations from Ensembl based <code>EnsDb</code> packages with UCSC based annotations</h1>
+<p>Sometimes it might be useful to combine (Ensembl based) annotations from <code>EnsDb</code> packages/objects with annotations from other Bioconductor packages that might be based on UCSC annotations. To support such an integration of annotations, the <code>ensembldb</code> package implements the <code>seqlevelsStyle</code> and <code>seqlevelsStyle<-</code> from the <code>GenomeInfoDb</code> package that allow changing the style of chromosome naming. Thus, sequence/chromosome names o [...]
+<p>In the example below we change the seqnames style to UCSC.</p>
+<pre class="r"><code>## Change the seqlevels style from Ensembl (default) to UCSC:
+seqlevelsStyle(edb) <- "UCSC"
+
+## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
+genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## The seqlevels of the returned GRanges are also in UCSC style
+seqlevels(genesY)</code></pre>
+<pre><code>## [1] "chrY"</code></pre>
+<p>Note that in most instances no mapping is available for sequences not corresponding to the main chromosomes (i.e. contigs, patched chromosomes etc). What is returned in cases in which no mapping is available can be specified with the global <code>ensembldb.seqnameNotFound</code> option. By default (with <code>ensembldb.seqnameNotFound</code> set to “ORIGINAL”), the original seqnames (i.e. the ones from Ensembl) are returned. With <code>ensembldb.seqnameNotFound</code> “MISSING” each t [...]
+<pre class="r"><code>seqlevelsStyle(edb) <- "UCSC"
+
+## Getting the default option:
+getOption("ensembldb.seqnameNotFound")</code></pre>
+<pre><code>## [1] "ORIGINAL"</code></pre>
+<pre class="r"><code>## Listing all seqlevels in the database.
+seqlevels(edb)[1:30]</code></pre>
+<pre><code>## Warning in .formatSeqnameByStyleFromQuery(x, sn, ifNotFound): More than 5
+## seqnames with seqlevels style of the database (Ensembl) could not be mapped
+## to the seqlevels style: UCSC!) Returning the orginal seqnames for these.</code></pre>
+<pre><code>## [1] "chr1" "chr10" "chr11" "chr12" "chr13"
+## [6] "chr14" "chr15" "chr16" "chr17" "chr18"
+## [11] "chr19" "chr2" "chr20" "chr21" "chr22"
+## [16] "chr3" "chr4" "chr5" "chr6" "chr7"
+## [21] "chr8" "chr9" "GL000191.1" "GL000192.1" "GL000193.1"
+## [26] "GL000194.1" "GL000195.1" "GL000196.1" "GL000199.1" "GL000201.1"</code></pre>
+<pre class="r"><code>## Setting the option to NA, thus, for each seqname for which no mapping is available,
+## NA is returned.
+options(ensembldb.seqnameNotFound=NA)
+seqlevels(edb)[1:30]</code></pre>
+<pre><code>## Warning in .formatSeqnameByStyleFromQuery(x, sn, ifNotFound): More than 5
+## seqnames with seqlevels style of the database (Ensembl) could not be mapped
+## to the seqlevels style: UCSC!) Returning NA for these.</code></pre>
+<pre><code>## [1] "chr1" "chr10" "chr11" "chr12" "chr13" "chr14" "chr15" "chr16"
+## [9] "chr17" "chr18" "chr19" "chr2" "chr20" "chr21" "chr22" "chr3"
+## [17] "chr4" "chr5" "chr6" "chr7" "chr8" "chr9" NA NA
+## [25] NA NA NA NA NA NA</code></pre>
+<pre class="r"><code>## Resetting the option.
+options(ensembldb.seqnameNotFound = "ORIGINAL")</code></pre>
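+<p>The fallback behaviour for seqnames without a mapping can be pictured with a small base R sketch. The helper <code>mapSeqnames</code> and the two-entry lookup table are hypothetical and serve only as an illustration; they are not part of <code>ensembldb</code>, which performs the mapping internally. Only the two settings demonstrated above (“ORIGINAL” and <code>NA</code>) are mimicked.</p>
+<pre class="r"><code>## Hypothetical helper (not part of ensembldb): map Ensembl style seqnames
+## to UCSC style names; `ifNotFound` controls what is returned for names
+## without a mapping.
+mapSeqnames <- function(x, mapping, ifNotFound = "ORIGINAL") {
+    res <- unname(mapping[x])
+    notFound <- is.na(res)
+    ## "ORIGINAL": keep the input name; otherwise leave the value as NA.
+    if (identical(ifNotFound, "ORIGINAL"))
+        res[notFound] <- x[notFound]
+    res
+}
+
+## Toy lookup table covering two main chromosomes only (an assumption).
+ens2ucsc <- c("1" = "chr1", "Y" = "chrY")
+sn <- c("1", "Y", "GL000191.1")
+
+## The unmapped name is returned unchanged.
+mapSeqnames(sn, ens2ucsc)
+## The unmapped name is returned as NA.
+mapSeqnames(sn, ens2ucsc, ifNotFound = NA)</code></pre>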
+<p>Next we retrieve transcript sequences for genes encoded on chromosome Y using the <code>BSgenome</code> package for the human genome from UCSC. The specified version <code>hg19</code> matches the genome build of Ensembl version 75, i.e. <code>GRCh37</code>. Note that while we changed the style of the seqnames to UCSC we did not change the naming of the genome release.</p>
+<pre class="r"><code>library(BSgenome.Hsapiens.UCSC.hg19)
+bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+## Get the genome version
+unique(genome(bsg))</code></pre>
+<pre><code>## [1] "hg19"</code></pre>
+<pre class="r"><code>unique(genome(edb))</code></pre>
+<pre><code>## [1] "GRCh37"</code></pre>
+<pre class="r"><code>## Although differently named, both represent genome build GRCh37.
+
+## Extract the full transcript sequences.
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+
+yTxSeqs</code></pre>
+<pre><code>## A DNAStringSet instance of length 731
+## width seq names
+## [1] 5239 GCCTAGTGCGCGCGCAGTAA...AAATGTTTACTTGTATATG ENST00000155093
+## [2] 4023 ATGTTTAGGGTTGGCTTCTT...GGAAACACATCCCTTGTAA ENST00000215473
+## [3] 802 AGAGGACCAAGCCTCCCTGT...TAAAATGTTTTAAAAATCA ENST00000215479
+## [4] 910 TGTCTGTCAGAGCTGTCAGC...ACACTGGTATATTTCTGTT ENST00000250776
+## [5] 1305 TTCCAGGATATGAACTCTAC...ATCCTGTGGCTGTAGGAAA ENST00000250784
+## ... ... ...
+## [727] 333 ATGGATGAAGAAGAGAAAAC...TGAACTTTCTAGATTGCAT ENST00000604924
+## [728] 1247 CATGGCGGGGTTCCTGCCTT...TTTGGAGTAATGTCTTAGT ENST00000605584
+## [729] 199 CAGTTCTCGCTCCTGTGCAG...GGTCTGGGTGGCTTCTGGA ENST00000605663
+## [730] 276 GCCCCAGGAGGAAAGGGGGA...AATAAAGAACAGCGCATTC ENST00000606439
+## [731] 444 ATGGGAGCCACTGGGCTTGG...CGTTCATGAAGAAGACTAA ENST00000607210</code></pre>
+<pre class="r"><code>## Extract just the CDS
+yCds <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, yCds)
+yTxCds</code></pre>
+<pre><code>## A DNAStringSet instance of length 160
+## width seq names
+## [1] 2406 ATGGATGAAGATGAATTTGA...AGAAGTTGGTCTGCCCTAA ENST00000155093
+## [2] 4023 ATGTTTAGGGTTGGCTTCTT...GGAAACACATCCCTTGTAA ENST00000215473
+## [3] 579 ATGGGGACCTGGATTTTGTT...GCAGGAGGAAGTGGATTAA ENST00000215479
+## [4] 792 ATGGCCCGGGGCCCCAAGAA...CAAACAGAGCAGTGGCTAA ENST00000250784
+## [5] 378 ATGAGTCCAAAGCCGAGAGC...TACTCCCCTATCTCCCTGA ENST00000250823
+## ... ... ...
+## [156] 63 CGCAAGGATTTAAAAGAGAT...ACCCTGTTGGCCAGGCTAG ENST00000601700
+## [157] 42 CTTGATACAAAGAATCAATTTAATTTTAAGATTGTCTATCTT ENST00000601705
+## [158] 33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT ENST00000602680
+## [159] 33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT ENST00000602732
+## [160] 33 ATGATGACGCTTGTCCCCAGAGCCAGGACACGT ENST00000602770</code></pre>
+<p>Finally, we change the seqname style back to the default value “Ensembl”.</p>
+<pre class="r"><code>seqlevelsStyle(edb) <- "Ensembl"</code></pre>
+</div>
+<div id="interactive-annotation-lookup-using-the-shiny-web-app" class="section level1">
+<h1><span class="header-section-number">6</span> Interactive annotation lookup using the <code>shiny</code> web app</h1>
+<p>In addition to the <code>genes</code>, <code>transcripts</code> and <code>exons</code> methods it is also possible to search interactively for gene/transcript/exon annotations using the internal, <code>shiny</code> based, web application. The application can be started with the <code>runEnsDbApp()</code> function. The search results from this app can also be returned to the R workspace either as a <code>data.frame</code> or <code>GRanges</code> object.</p>
+</div>
+<div id="plotting-genetranscript-features-using-ensembldb-and-gviz" class="section level1">
+<h1><span class="header-section-number">7</span> Plotting gene/transcript features using <code>ensembldb</code> and <code>Gviz</code></h1>
+<p>The <code>Gviz</code> package provides functions to plot genes and transcripts along with other data on a genomic scale. Gene models can be provided as a <code>data.frame</code>, a <code>GRanges</code> or a <code>TxDb</code> database, can be fetched from biomart and can also be retrieved from <code>ensembldb</code>.</p>
+<p>Below we generate a <code>GeneRegionTrack</code> fetching all transcripts from a certain region on chromosome Y.</p>
+<p>Note that if we also want to work with BAM files that were aligned against DNA sequences retrieved from Ensembl, or with FASTA files representing genomic DNA sequences from Ensembl, we should change the <code>ucscChromosomeNames</code> option from <code>Gviz</code> to <code>FALSE</code> (i.e. by calling <code>options(ucscChromosomeNames = FALSE)</code>). This is not necessary if we just want to retrieve gene models from an <code>EnsDb</code> object, as the <code>ensembldb</code> [...]
+<pre class="r"><code>## Loading the Gviz library
+library(Gviz)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Retrieving a Gviz compatible GRanges object with all genes
+## encoded within the specified region on chromosome Y.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "Y",
+ start = 20400000, end = 21400000)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+
+## We have to change the ucscChromosomeNames option to FALSE to enable Gviz usage
+## with non-UCSC chromosome names.
+options(ucscChromosomeNames = FALSE)
+
+plotTracks(list(gat, GeneRegionTrack(gr)))</code></pre>
+<p>[Figure: genome axis track and gene region track for the region 20400000-21400000 on chromosome Y]</p>
+<pre class="r"><code>options(ucscChromosomeNames = TRUE)</code></pre>
+<p>Above we had to set the option <code>ucscChromosomeNames</code> to <code>FALSE</code> in order to use <code>Gviz</code> with non-UCSC chromosome names. Alternatively, we could change the <code>seqlevelsStyle</code> of the <code>EnsDb</code> object to <code>UCSC</code>. Note that we then also have to use chromosome names in the <em>UCSC style</em> in the <code>chromosome</code> argument and in any <code>SeqnameFilter</code> (i.e. “chrY” instead of “Y”).</p>
+<pre class="r"><code>seqlevelsStyle(edb) <- "UCSC"
+## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000)
+seqnames(gr)</code></pre>
+<pre><code>## factor-Rle of length 218 with 1 run
+## Lengths: 218
+## Values : chrY
+## Levels(1): chrY</code></pre>
+<pre class="r"><code>## Define a genome axis track
+gat <- GenomeAxisTrack()
+plotTracks(list(gat, GeneRegionTrack(gr)))</code></pre>
+<p>[Figure: genome axis track and gene region track for the same region, using UCSC style chromosome names (chrY)]</p>
+<p>We can also use the filters from the <code>ensembldb</code> package to further refine what transcripts are fetched, like in the example below, in which we create two different gene region tracks, one for protein coding genes and one for lincRNAs.</p>
+<pre class="r"><code>protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("protein_coding"))
+lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("lincRNA"))
+
+plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
+ GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")</code></pre>
+<p>[Figure: genome axis track with separate gene region tracks for protein coding genes and lincRNAs, annotated with gene symbols]</p>
+<pre class="r"><code>## Finally, we change the seqlevels style back to Ensembl
+seqlevelsStyle(edb) <- "Ensembl"</code></pre>
+</div>
+<div id="using-ensdb-objects-in-the-annotationdbi-framework" class="section level1">
+<h1><span class="header-section-number">8</span> Using <code>EnsDb</code> objects in the <code>AnnotationDbi</code> framework</h1>
+<p>Most of the methods defined for objects extending the basic annotation class <code>AnnotationDb</code> (from the <code>AnnotationDbi</code> package) are also defined for <code>EnsDb</code> objects (i.e. methods <code>columns</code>, <code>keytypes</code>, <code>keys</code>, <code>mapIds</code> and <code>select</code>). While these methods can be used analogously to basic annotation packages, the implementation for <code>EnsDb</code> objects also supports the filtering framework of the <code>ensembldb</code> package.</p>
+<p>In the example below we first list all the available columns and keytypes in the database and then extract the gene names for all genes encoded on chromosome Y.</p>
+<pre class="r"><code>library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## List all available columns in the database.
+columns(edb)</code></pre>
+<pre><code>## [1] "ENTREZID" "EXONID" "EXONIDX" "EXONSEQEND"
+## [5] "EXONSEQSTART" "GENEBIOTYPE" "GENEID" "GENENAME"
+## [9] "GENESEQEND" "GENESEQSTART" "ISCIRCULAR" "SEQCOORDSYSTEM"
+## [13] "SEQLENGTH" "SEQNAME" "SEQSTRAND" "SYMBOL"
+## [17] "TXBIOTYPE" "TXCDSSEQEND" "TXCDSSEQSTART" "TXID"
+## [21] "TXNAME" "TXSEQEND" "TXSEQSTART"</code></pre>
+<pre class="r"><code>## Note that these do *not* correspond to the actual column names
+## of the database that can be passed to methods like exons, genes,
+## transcripts etc. These column names can be listed with the listColumns
+## method.
+listColumns(edb)</code></pre>
+<pre><code>## [1] "seq_name" "seq_length" "is_circular"
+## [4] "exon_id" "exon_seq_start" "exon_seq_end"
+## [7] "gene_id" "gene_name" "entrezid"
+## [10] "gene_biotype" "gene_seq_start" "gene_seq_end"
+## [13] "seq_name" "seq_strand" "seq_coord_system"
+## [16] "symbol" "name" "value"
+## [19] "tx_id" "tx_biotype" "tx_seq_start"
+## [22] "tx_seq_end" "tx_cds_seq_start" "tx_cds_seq_end"
+## [25] "gene_id" "tx_name" "tx_id"
+## [28] "exon_id" "exon_idx"</code></pre>
+<pre class="r"><code>## List all of the supported key types.
+keytypes(edb)</code></pre>
+<pre><code>## [1] "ENTREZID" "EXONID" "GENEBIOTYPE" "GENEID" "GENENAME"
+## [6] "SEQNAME" "SEQSTRAND" "SYMBOL" "TXBIOTYPE" "TXID"
+## [11] "TXNAME"</code></pre>
+<pre class="r"><code>## Get all gene ids from the database.
+gids <- keys(edb, keytype = "GENEID")
+length(gids)</code></pre>
+<pre><code>## [1] 64102</code></pre>
+<pre class="r"><code>## Get all gene names for genes encoded on chromosome Y.
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+head(gnames)</code></pre>
+<pre><code>## [1] "KDM5D" "DDX3Y" "ZFY" "TBL1Y" "PCDH11Y" "AMELY"</code></pre>
+<p>In the next example we retrieve specific information from the database using the <code>select</code> method. First we fetch all transcripts for the genes <em>BCL2</em> and <em>BCL2L11</em>. In the first call we provide the gene names, while in the second call we employ the filtering system to perform a more fine-grained query to fetch only the protein coding transcripts for these genes.</p>
+<pre class="r"><code>## Use the standard way to fetch data.
+select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))</code></pre>
+<pre><code>## GENEID GENENAME TXID TXBIOTYPE
+## 1 ENSG00000171791 BCL2 ENST00000398117 protein_coding
+## 2 ENSG00000171791 BCL2 ENST00000333681 protein_coding
+## 3 ENSG00000171791 BCL2 ENST00000590515 processed_transcript
+## 4 ENSG00000171791 BCL2 ENST00000589955 protein_coding
+## 5 ENSG00000171791 BCL2 ENST00000444484 protein_coding
+## 6 ENSG00000153094 BCL2L11 ENST00000432179 protein_coding
+## 7 ENSG00000153094 BCL2L11 ENST00000308659 protein_coding
+## 8 ENSG00000153094 BCL2L11 ENST00000393256 protein_coding
+## 9 ENSG00000153094 BCL2L11 ENST00000393252 protein_coding
+## 10 ENSG00000153094 BCL2L11 ENST00000433098 nonsense_mediated_decay
+## 11 ENSG00000153094 BCL2L11 ENST00000405953 protein_coding
+## 12 ENSG00000153094 BCL2L11 ENST00000415458 nonsense_mediated_decay
+## 13 ENSG00000153094 BCL2L11 ENST00000436733 nonsense_mediated_decay
+## 14 ENSG00000153094 BCL2L11 ENST00000437029 nonsense_mediated_decay
+## 15 ENSG00000153094 BCL2L11 ENST00000452231 nonsense_mediated_decay
+## 16 ENSG00000153094 BCL2L11 ENST00000361493 nonsense_mediated_decay
+## 17 ENSG00000153094 BCL2L11 ENST00000431217 nonsense_mediated_decay
+## 18 ENSG00000153094 BCL2L11 ENST00000439718 nonsense_mediated_decay
+## 19 ENSG00000153094 BCL2L11 ENST00000438054 protein_coding
+## 20 ENSG00000153094 BCL2L11 ENST00000357757 protein_coding
+## 21 ENSG00000153094 BCL2L11 ENST00000393253 protein_coding
+## 22 ENSG00000153094 BCL2L11 ENST00000337565 protein_coding</code></pre>
+<pre class="r"><code>## Use the filtering system of ensembldb
+select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))</code></pre>
+<pre><code>## Note: ordering of the results might not match ordering of keys!</code></pre>
+<pre><code>## GENEID GENENAME TXID TXBIOTYPE
+## 1 ENSG00000171791 BCL2 ENST00000398117 protein_coding
+## 2 ENSG00000171791 BCL2 ENST00000333681 protein_coding
+## 3 ENSG00000171791 BCL2 ENST00000589955 protein_coding
+## 4 ENSG00000171791 BCL2 ENST00000444484 protein_coding
+## 5 ENSG00000153094 BCL2L11 ENST00000432179 protein_coding
+## 6 ENSG00000153094 BCL2L11 ENST00000308659 protein_coding
+## 7 ENSG00000153094 BCL2L11 ENST00000393256 protein_coding
+## 8 ENSG00000153094 BCL2L11 ENST00000393252 protein_coding
+## 9 ENSG00000153094 BCL2L11 ENST00000405953 protein_coding
+## 10 ENSG00000153094 BCL2L11 ENST00000438054 protein_coding
+## 11 ENSG00000153094 BCL2L11 ENST00000357757 protein_coding
+## 12 ENSG00000153094 BCL2L11 ENST00000393253 protein_coding
+## 13 ENSG00000153094 BCL2L11 ENST00000337565 protein_coding</code></pre>
+<p>Finally, we use the <code>mapIds</code> method to establish a mapping between ids and values. In the example below we fetch transcript ids for the two genes from the example above.</p>
+<pre class="r"><code>## Use the default method, which just returns the first value for multi mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")</code></pre>
+<pre><code>## BCL2 BCL2L11
+## "ENST00000398117" "ENST00000432179"</code></pre>
+<pre class="r"><code>## Alternatively, specify multiVals="list" to return all mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
+ multiVals = "list")</code></pre>
+<pre><code>## $BCL2
+## [1] "ENST00000398117" "ENST00000333681" "ENST00000590515" "ENST00000589955"
+## [5] "ENST00000444484"
+##
+## $BCL2L11
+## [1] "ENST00000432179" "ENST00000308659" "ENST00000393256"
+## [4] "ENST00000393252" "ENST00000433098" "ENST00000405953"
+## [7] "ENST00000415458" "ENST00000436733" "ENST00000437029"
+## [10] "ENST00000452231" "ENST00000361493" "ENST00000431217"
+## [13] "ENST00000439718" "ENST00000438054" "ENST00000357757"
+## [16] "ENST00000393253" "ENST00000337565"</code></pre>
+<pre class="r"><code>## And, just like before, we can use filters to map only to protein coding transcripts.
+mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column = "TXID",
+ multiVals = "list")</code></pre>
+<pre><code>## Warning in .mapIds(x = x, keys = keys, column = column, keytype =
+## keytype, : Got 2 filter objects. Will use the keys of the first for the
+## mapping!</code></pre>
+<pre><code>## Note: ordering of the results might not match ordering of keys!</code></pre>
+<pre><code>## $BCL2
+## [1] "ENST00000398117" "ENST00000333681" "ENST00000589955" "ENST00000444484"
+##
+## $BCL2L11
+## [1] "ENST00000432179" "ENST00000308659" "ENST00000393256" "ENST00000393252"
+## [5] "ENST00000405953" "ENST00000438054" "ENST00000357757" "ENST00000393253"
+## [9] "ENST00000337565"</code></pre>
+<p>Note that, if filters are used, the ordering of the result no longer matches the ordering of the keys.</p>
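+<p>If the order of the keys matters, a result table can be re-ordered afterwards. The following base R sketch uses a small, hand-made <code>data.frame</code> as a stand-in for a <code>select</code> result (its row order is an assumption made for illustration) and sorts its rows by the position of each gene name in the key vector.</p>
+<pre class="r"><code>## Toy stand-in for a select() result whose rows do not follow the keys.
+res <- data.frame(GENENAME = c("BCL2L11", "BCL2"),
+                  TXID = c("ENST00000432179", "ENST00000398117"),
+                  stringsAsFactors = FALSE)
+keys <- c("BCL2", "BCL2L11")
+
+## match() returns the position of each gene name in keys; ordering by
+## these positions arranges the rows in the order of the keys.
+resOrdered <- res[order(match(res$GENENAME, keys)), ]
+resOrdered$GENENAME</code></pre>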
+</div>
+<div id="important-notes" class="section level1">
+<h1><span class="header-section-number">9</span> Important notes</h1>
+<p>These notes might explain potentially unexpected results (and, more importantly, help to avoid them):</p>
+<ul>
+<li><p>The ordering of the results returned by the <code>genes</code>, <code>exons</code>, <code>transcripts</code> methods can be specified with the <code>order.by</code> parameter. The ordering of the results does however <strong>not</strong> correspond to the ordering of values in submitted filter objects. The exception is the <code>select</code> method. If a character vector of values or a single filter is passed with argument <code>keys</code> the ordering of results of this method [...]
+<li><p>Results of <code>exonsBy</code>, <code>transcriptsBy</code> are always ordered by the <code>by</code> argument.</p></li>
+<li><p>The CDS provided by <code>EnsDb</code> objects <strong>always</strong> includes both the start and the stop codon.</p></li>
+<li><p>Transcripts with multiple CDS are at present not supported by <code>EnsDb</code>.</p></li>
+<li><p>At present, <code>EnsDb</code> supports only genes/transcripts for which all exons are encoded on the same chromosome and the same strand.</p></li>
+</ul>
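+<p>Because the CDS includes the stop codon, the number of encoded amino acids of a complete CDS is one less than its number of codons. The base R sketch below illustrates this on a made-up sequence (an assumption for illustration, not taken from the database).</p>
+<pre class="r"><code>## Toy CDS: start codon, two further codons and a stop codon.
+cds <- "ATGGCCAAATAA"
+
+## Split the sequence into consecutive, non-overlapping codons.
+starts <- seq(1, nchar(cds), by = 3)
+codons <- substring(cds, starts, starts + 2)
+
+## First codon is the start codon, last codon is a stop codon.
+hasStart <- codons[1] == "ATG"
+hasStop <- codons[length(codons)] %in% c("TAA", "TAG", "TGA")
+
+## Number of encoded amino acids, excluding the stop codon.
+nAminoAcids <- length(codons) - 1</code></pre>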
+</div>
+<div id="building-an-transcript-centric-database-package-based-on-ensembl-annotation" class="section level1">
+<h1><span class="header-section-number">10</span> Building a transcript-centric database package based on Ensembl annotation</h1>
+<p>The code in this section is not supposed to be automatically executed when the vignette is built, as this would require a working installation of the Ensembl Perl API, which is not expected to be available on each system. Also, building <code>EnsDb</code> from alternative sources, like GFF or GTF files, takes some time and thus these examples are also not directly executed when the vignette is built.</p>
+<div id="requirements" class="section level2">
+<h2><span class="header-section-number">10.1</span> Requirements</h2>
+<p>The <code>fetchTablesFromEnsembl</code> function of the package uses the Ensembl Perl API to retrieve the required annotations from an Ensembl database (e.g. from the main site <em>ensembldb.ensembl.org</em>). Thus, to use the functionality to build databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).</p>
+<p>Alternatively, the <code>ensDbFromAH</code>, <code>ensDbFromGff</code>, <code>ensDbFromGRanges</code> and <code>ensDbFromGtf</code> functions allow building EnsDb SQLite files from a <code>GRanges</code> object or GFF/GTF files from Ensembl (either provided as files or <em>via</em> <code>AnnotationHub</code>). These functions do not depend on the Ensembl Perl API, but require a working internet connection to fetch the chromosome lengths from Ensembl as these are not provided within GT [...]
+</div>
+<div id="building-annotation-packages" class="section level2">
+<h2><span class="header-section-number">10.2</span> Building annotation packages</h2>
+<p>The functions below use the Ensembl Perl API to fetch the required data directly from the Ensembl core databases. Thus, the path to the Perl API specific for the desired Ensembl version needs to be added to the <code>PERL5LIB</code> environment variable.</p>
+<p>An annotation package containing all human genes for Ensembl version 75 can be created using the code in the block below.</p>
+<pre class="r"><code>library(ensembldb)
+
+## get all human gene/transcript/exon annotations from Ensembl (75)
+## the resulting tables will be stored by default to the current working
+## directory
+fetchTablesFromEnsembl(75, species = "human")
+
+## These tables can then be processed to generate a SQLite database
+## containing the annotations (again, the function assumes the required
+## txt files to be present in the current working directory)
+DBFile <- makeEnsemblSQLiteFromTables()
+
+## and finally we can generate the package
+makeEnsembldbPackage(ensdb = DBFile, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")</code></pre>
+<p>The generated package can then be built using <code>R CMD build EnsDb.Hsapiens.v75</code> and installed with <code>R CMD INSTALL EnsDb.Hsapiens.v75*</code>. Note that we could directly generate an <code>EnsDb</code> instance by loading the database file, i.e. by calling <code>edb <- EnsDb(DBFile)</code> and work with that annotation object.</p>
+<p>To fetch and build annotation packages for plant genomes (e.g. <em>Arabidopsis thaliana</em>), the <em>Ensembl Genomes</em> server should be specified as host, i.e. by setting <code>host</code> to “mysql-eg-publicsql.ebi.ac.uk”, <code>port</code> to <code>4157</code> and <code>species</code> to e.g. “arabidopsis thaliana”.</p>
+<p>In the next example we create an <code>EnsDb</code> database using the <code>AnnotationHub</code> package and load also the corresponding genomic DNA sequence matching the Ensembl version. We thus first query the <code>AnnotationHub</code> package for all resources available for <code>Mus musculus</code> and the Ensembl release 77. Next we create the <code>EnsDb</code> object from the appropriate <code>AnnotationHub</code> resource. We then use the <code>getGenomeFaFile</code> method [...]
+<pre class="r"><code>## Load the AnnotationHub data.
+library(AnnotationHub)
+ah <- AnnotationHub()
+
+## Query all available files for Ensembl release 77 for
+## Mus musculus.
+query(ah, c("Mus musculus", "release-77"))
+
+## Get the resource for the gtf file with the gene/transcript definitions.
+Gtf <- ah["AH28822"]
+## Create an EnsDb database file from this.
+DbFile <- ensDbFromAH(Gtf)
+## We can either generate a database package, or directly load the data
+edb <- EnsDb(DbFile)
+
+
+## Identify and get the FaFile object with the genomic DNA sequence matching
+## the EnsDb annotation.
+Dna <- getGenomeFaFile(edb)
+library(Rsamtools)
+## We next retrieve the sequence of all exons on chromosome Y.
+exons <- exons(edb, filter = SeqnameFilter("Y"))
+exonSeq <- getSeq(Dna, exons)
+
+## Alternatively, look up and retrieve the toplevel DNA sequence manually.
+Dna <- ah[["AH22042"]]</code></pre>
+<p>In the example below we load a <code>GRanges</code> containing gene definitions for genes encoded on chromosome Y and generate an EnsDb SQLite database from that information.</p>
+<pre class="r"><code>## Generate a sqlite database from a GRanges object specifying
+## genes encoded on chromosome Y
+load(system.file("YGRanges.RData", package = "ensembldb"))
+Y</code></pre>
+<pre><code>## GRanges object with 7155 ranges and 16 metadata columns:
+## seqnames ranges strand | source
+## <Rle> <IRanges> <Rle> | <factor>
+## [1] Y [2652790, 2652894] + | snRNA
+## [2] Y [2652790, 2652894] + | snRNA
+## [3] Y [2652790, 2652894] + | snRNA
+## [4] Y [2654896, 2655740] - | protein_coding
+## [5] Y [2654896, 2655740] - | protein_coding
+## ... ... ... ... . ...
+## [7151] Y [28772667, 28773306] - | processed_pseudogene
+## [7152] Y [28772667, 28773306] - | processed_pseudogene
+## [7153] Y [59001391, 59001635] + | pseudogene
+## [7154] Y [59001391, 59001635] + | processed_pseudogene
+## [7155] Y [59001391, 59001635] + | processed_pseudogene
+## type score phase gene_id gene_name
+## <factor> <numeric> <integer> <character> <character>
+## [1] gene <NA> <NA> ENSG00000251841 RNU6-1334P
+## [2] transcript <NA> <NA> ENSG00000251841 RNU6-1334P
+## [3] exon <NA> <NA> ENSG00000251841 RNU6-1334P
+## [4] gene <NA> <NA> ENSG00000184895 SRY
+## [5] transcript <NA> <NA> ENSG00000184895 SRY
+## ... ... ... ... ... ...
+## [7151] transcript <NA> <NA> ENSG00000231514 FAM58CP
+## [7152] exon <NA> <NA> ENSG00000231514 FAM58CP
+## [7153] gene <NA> <NA> ENSG00000235857 CTBP2P1
+## [7154] transcript <NA> <NA> ENSG00000235857 CTBP2P1
+## [7155] exon <NA> <NA> ENSG00000235857 CTBP2P1
+## gene_source gene_biotype transcript_id transcript_name
+## <character> <character> <character> <character>
+## [1] ensembl snRNA <NA> <NA>
+## [2] ensembl snRNA ENST00000516032 RNU6-1334P-201
+## [3] ensembl snRNA ENST00000516032 RNU6-1334P-201
+## [4] ensembl_havana protein_coding <NA> <NA>
+## [5] ensembl_havana protein_coding ENST00000383070 SRY-001
+## ... ... ... ... ...
+## [7151] havana pseudogene ENST00000435741 FAM58CP-001
+## [7152] havana pseudogene ENST00000435741 FAM58CP-001
+## [7153] havana pseudogene <NA> <NA>
+## [7154] havana pseudogene ENST00000431853 CTBP2P1-001
+## [7155] havana pseudogene ENST00000431853 CTBP2P1-001
+## transcript_source exon_number exon_id tag
+## <character> <numeric> <character> <character>
+## [1] <NA> <NA> <NA> <NA>
+## [2] ensembl <NA> <NA> <NA>
+## [3] ensembl 1 ENSE00002088309 <NA>
+## [4] <NA> <NA> <NA> <NA>
+## [5] ensembl_havana <NA> <NA> CCDS
+## ... ... ... ... ...
+## [7151] havana <NA> <NA> <NA>
+## [7152] havana 1 ENSE00001616687 <NA>
+## [7153] <NA> <NA> <NA> <NA>
+## [7154] havana <NA> <NA> <NA>
+## [7155] havana 1 ENSE00001794473 <NA>
+## ccds_id protein_id
+## <character> <character>
+## [1] <NA> <NA>
+## [2] <NA> <NA>
+## [3] <NA> <NA>
+## [4] <NA> <NA>
+## [5] CCDS14772 <NA>
+## ... ... ...
+## [7151] <NA> <NA>
+## [7152] <NA> <NA>
+## [7153] <NA> <NA>
+## [7154] <NA> <NA>
+## [7155] <NA> <NA>
+## -------
+## seqinfo: 1 sequence from GRCh37 genome</code></pre>
+<pre class="r"><code>DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+ organism = "Homo_sapiens")</code></pre>
+<pre><code>## Warning in ensDbFromGRanges(Y, path = tempdir(), version = 75, organism
+## = "Homo_sapiens"): I'm missing column(s): 'entrezid'. The corresponding
+## database column(s) will be empty!</code></pre>
+<pre class="r"><code>edb <- EnsDb(DB)
+edb</code></pre>
+<pre><code>## EnsDb for Ensembl:
+## |Backend: SQLite
+## |Db type: EnsDb
+## |Type of Gene ID: Ensembl Gene ID
+## |Supporting package: ensembldb
+## |Db created by: ensembldb package from Bioconductor
+## |script_version: 0.0.1
+## |Creation time: Wed Nov 16 19:52:30 2016
+## |ensembl_version: 75
+## |ensembl_host: unknown
+## |Organism: Homo_sapiens
+## |genome_build: GRCh37
+## |DBSCHEMAVERSION: 1.0
+## |source_file: GRanges object
+## | No. of genes: 495.
+## | No. of transcripts: 731.</code></pre>
+<pre class="r"><code>## As shown in the example below, we could make an EnsDb package on
+## this DB object using the makeEnsembldbPackage function.</code></pre>
+<p>Alternatively, we can build the annotation database using the <code>ensDbFromGtf</code> or <code>ensDbFromGff</code> functions, which extract most of the required data from a GTF or GFF (version 3) file, respectively, which can be downloaded from Ensembl (e.g. from <a href="ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens" class="uri">ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens</a> for human gene definitions from Ensembl version 75; for plant genomes etc files can be retrieved f [...]
+<p>Below we create the annotation from a gtf file that we fetch directly from Ensembl.</p>
+<pre class="r"><code>library(ensembldb)
+
+## the GTF file can be downloaded from
+## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
+gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
+## generate the SQLite database file
+DB <- ensDbFromGtf(gtf = gtffile)
+
+## load the DB file directly
+EDB <- EnsDb(DB)
+
+## alternatively, build the annotation package
+makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")</code></pre>
+</div>
+</div>
+<div id="database-layout" class="section level1">
+<h1><span class="header-section-number">11</span> Database layout<a id="orgtarget1"></a></h1>
+<p>The database consists of the following tables and attributes (the layout is also shown in Figure <a href="#orgparagraph1">115</a>):</p>
+<ul>
+<li><strong>gene</strong>: all gene specific annotations.
+<ul>
+<li><code>gene_id</code>: the Ensembl ID of the gene.</li>
+<li><code>gene_name</code>: the name (symbol) of the gene.</li>
+<li><code>entrezid</code>: the NCBI Entrezgene ID(s) of the gene. Note that this can be a <code>;</code> separated list of IDs for genes that are mapped to more than one Entrezgene.</li>
+<li><code>gene_biotype</code>: the biotype of the gene.</li>
+<li><code>gene_seq_start</code>: the start coordinate of the gene on the sequence (usually a chromosome).</li>
+<li><code>gene_seq_end</code>: the end coordinate of the gene on the sequence.</li>
+<li><code>seq_name</code>: the name of the sequence (usually the chromosome name).</li>
+<li><code>seq_strand</code>: the strand on which the gene is encoded.</li>
+<li><code>seq_coord_system</code>: the coordinate system of the sequence.</li>
+</ul></li>
+<li><strong>tx</strong>: all transcript related annotations. Note that although this table has no <code>tx_name</code> column, all methods that retrieve data from the database also support it; the values returned are, however, the transcript IDs.
+<ul>
+<li><code>tx_id</code>: the Ensembl transcript ID.</li>
+<li><code>tx_biotype</code>: the biotype of the transcript.</li>
+<li><code>tx_seq_start</code>: the start coordinate of the transcript.</li>
+<li><code>tx_seq_end</code>: the end coordinate of the transcript.</li>
+<li><code>tx_cds_seq_start</code>: the start coordinate of the coding region of the transcript (NULL for non-coding transcripts).</li>
+<li><code>tx_cds_seq_end</code>: the end coordinate of the coding region of the transcript.</li>
+<li><code>gene_id</code>: the gene to which the transcript belongs.</li>
+</ul></li>
+<li><strong>exon</strong>: all exon related annotation.
+<ul>
+<li><code>exon_id</code>: the Ensembl exon ID.</li>
+<li><code>exon_seq_start</code>: the start coordinate of the exon.</li>
+<li><code>exon_seq_end</code>: the end coordinate of the exon.</li>
+</ul></li>
+<li><strong>tx2exon</strong>: provides the n:m mapping between transcripts and exons.
+<ul>
+<li><code>tx_id</code>: the Ensembl transcript ID.</li>
+<li><code>exon_id</code>: the Ensembl exon ID.</li>
+<li><code>exon_idx</code>: the index of the exon in the corresponding transcript, always from 5’ to 3’ of the transcript.</li>
+</ul></li>
+<li><strong>chromosome</strong>: provides some information about the chromosomes.
+<ul>
+<li><code>seq_name</code>: the name of the sequence/chromosome.</li>
+<li><code>seq_length</code>: the length of the sequence.</li>
+<li><code>is_circular</code>: whether the sequence is circular.</li>
+</ul></li>
+<li><strong>information</strong>: some additional, internal information (genome build, Ensembl version etc).
+<ul>
+<li><code>key</code></li>
+<li><code>value</code></li>
+</ul></li>
+<li><em>virtual</em> columns:
+<ul>
+<li><code>symbol</code>: the database does not have such a database column, but it is still possible to use it in the <code>columns</code> parameter. This column is <em>symlinked</em> to the <code>gene_name</code> column.</li>
+<li><code>tx_name</code>: similar to the <code>symbol</code> column, this column is <em>symlinked</em> to the <code>tx_id</code> column.</li>
+</ul></li>
+</ul>
+<div class="figure">
+<img src="data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAACe0AAAhRCAYAAAB251FjAAAABmJLR0QA/wD/AP+gvaeTAAAACXBIWXMAABYlAAAWJQFJUiTwAAAAB3RJTUUH3wMSCDkmQJa4YQAAACZpVFh0Q29tbWVudAAAAAAAQ3JlYXRlZCB3aXRoIEdJTVAgb24gYSBNYWOV5F9bAAAgAElEQVR42uzdd5wcdf3H8ffsbL1+SUghJLkkJKQgUgREARFEioIgiiCo/PDnD6UoWADBgoIgiop0FQQBEcSCSFMiKATpCSU9kN7ucrl+23fn98fc7O3s7l2ubvYur+fjMY+Z+c7c7Oxndvd2dj7z+RrLly2z5sydK/TfiuXLJUnEj/gRP+JH/ED8iB/xI34gfsSP+BE/ED/iR/yIH4gf8SN+xA/Ej/gRP+JH/Igf8SN+xI/4gfj1hYdDDwAAAAAAAAAAAAAAAABA [...]
+<p class="caption">img</p>
+</div>
+<div id="footnotes">
+<h2 class="footnotes">
+Footnotes:
+</h2>
+<div id="text-footnotes">
+<div class="footdef">
+<sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup>
+<div class="footpara">
+<a href="http://www.ensembl.org" class="uri">http://www.ensembl.org</a>
+</div>
+</div>
+<div class="footdef">
+<sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup>
+<div class="footpara">
+<a href="http://www.lrg-sequence.org" class="uri">http://www.lrg-sequence.org</a>
+</div>
+</div>
+<div class="footdef">
+<sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup>
+<div class="footpara">
+<a href="http://www.ncbi.nlm.nih.gov/pubmed/23950696" class="uri">http://www.ncbi.nlm.nih.gov/pubmed/23950696</a>
+</div>
+</div>
+<div class="footdef">
+<sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup>
+<div class="footpara">
+<a href="http://www.ncbi.nlm.nih.gov/pubmed/24227677" class="uri">http://www.ncbi.nlm.nih.gov/pubmed/24227677</a>
+</div>
+</div>
+<div class="footdef">
+<sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup>
+<div class="footpara">
+<a href="http://www.ensembl.org/info/docs/api/api_installation.html" class="uri">http://www.ensembl.org/info/docs/api/api_installation.html</a>
+</div>
+</div>
+</div>
+</div>
+</div>
+
+
+
+
+</div>
+
+<script>
+
+// add bootstrap table styles to pandoc tables
+$(document).ready(function () {
+ $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
+});
+
+
+</script>
+
+<script type="text/x-mathjax-config">
+ MathJax.Hub.Config({
+ TeX: {
+ TagSide: "right",
+ equationNumbers: {
+ autoNumber: "AMS"
+ }
+ },
+ "HTML-CSS": {
+ styles: {
+ ".MathJax_Display": {
+ "text-align": "center",
+ padding: "0px 150px 0px 65px",
+ margin: "0px 0px 0.5em"
+ },
+ }
+ }
+ });
+</script>
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+ (function () {
+ var script = document.createElement("script");
+ script.type = "text/javascript";
+ script.src = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+ document.getElementsByTagName("head")[0].appendChild(script);
+ })();
+</script>
+
+</body>
+</html>
diff --git a/inst/perl/get_gene_transcript_exon_tables.pl b/inst/perl/get_gene_transcript_exon_tables.pl
new file mode 100644
index 0000000..e54d108
--- /dev/null
+++ b/inst/perl/get_gene_transcript_exon_tables.pl
@@ -0,0 +1,278 @@
+#!/usr/bin/perl
+#####################################
+## version 0.0.2: * get also gene_seq_start, gene_seq_end, tx_seq_start and tx_seq_end from the database!
+## * did rename chrom_start to seq_start.
+
+## uses environment variable ENS pointing to the
+## ENSEMBL API on the computer
+use lib $ENV{ENS} || $ENV{PERL5LIB};
+use IO::File;
+use Getopt::Std;
+use strict;
+use warnings;
+use Bio::EnsEMBL::ApiVersion;
+use Bio::EnsEMBL::Registry;
+## unification function for arrays
+use List::MoreUtils qw/ uniq /;
+my $script_version = "0.1.3";
+
+## connecting to the ENSEMBL data base
+use Bio::EnsEMBL::Registry;
+use Bio::EnsEMBL::ApiVersion;
+my $user = "anonymous";
+my $host = "ensembldb.ensembl.org";
+my $port = 5306;
+my $pass = "";
+my $registry = 'Bio::EnsEMBL::Registry';
+my $ensembl_version="none";
+my $ensembl_database="core";
+my $species = "human";
+my $slice;
+my $coord_system_version="unknown";
+## get all gene ids defined in the database...
+my @gene_ids = ();
+
+my %option=();
+getopts("e:hH:P:p:U:s:",\%option);
+if($option{ h }){
+ ## print help and exit.
+ print("\nget_gene_transcript_exon_tables version ".$script_version.".\n");
+ print("Retrieves gene/transcript/exon annotations from Ensembl and stores it as tabulator delimited text files.\n\n");
+ print("usage: perl get_gene_transcript_exon_tables -e:hH:P:p:U:s:\n");
+ print("-e (required): the Ensembl version (e.g. -e 75). The function will internally check if the submitted version matches the Ensembl API version and database queried.\n");
+ print("-H (optional): the hostname of the Ensembl database; defaults to ensembldb.ensembl.org.\n");
+ print("-h (optional): print this help message.\n");
+ print("-p (optional): the port to access the Ensembl database.\n");
+ print("-P (optional): the password to access the Ensembl database.\n");
+ print("-U (optional): the username to access the Ensembl database.\n");
+ print("-s (optional): the species; defaults to human.\n");
+ print("\n\nThe script will generate the following tables:\n");
+ print("- ens_gene.txt: contains all genes defined in Ensembl.\n");
+ print("- ens_transcript.txt: contains all transcripts of all genes.\n");
+ print("- ens_exon.txt: contains all (unique) exons, along with their genomic alignment.\n");
+ print("- ens_tx2exon.txt: relates transcript ids to exon ids (m:n), along with the index of the exon in the respective transcript (since the same exon can be part of different transcripts and have a different index in each transcript).\n");
+ print("- ens_chromosome.txt: the information of all chromosomes (chromosome/sequence/contig names). \n");
+ print("- ens_metadata.txt\n");
+ exit 0;
+}
+
+if(defined($option{ s })){
+ $species=$option{ s };
+}
+if(defined($option{ U })){
+ $user=$option{ U };
+}
+if(defined($option{ H })){
+ $host=$option{ H };
+}
+if(defined($option{ P })){
+ $pass=$option{ P };
+}
+if(defined($option{ p })){
+ $port=$option{ p };
+}
+if(defined($option{ e })){
+ $ensembl_version=$option{ e };
+}else{
+ die("The ensembl version has to be specified with the -e parameter (e.g. -e 75)");
+}
+
+my $api_version="".software_version()."";
+if($ensembl_version ne $api_version){
+ die "The submitted Ensembl version (".$ensembl_version.") does not match the version of the Ensembl API (".$api_version."). Please configure the environment variable ENS to point to the correct API.";
+}
+
+print "Connecting to ".$host." at port ".$port."\n";
+
+# $registry->load_registry_from_db(-host => $host, -user => $user,
+# -pass => $pass, -port => $port,
+# -verbose => "1");
+$registry->load_registry_from_db(-host => $host, -user => $user,
+ -pass => $pass, -port => $port);
+my $gene_adaptor = $registry->get_adaptor($species, $ensembl_database, "gene");
+my $slice_adaptor = $registry->get_adaptor($species, $ensembl_database, "slice");
+
+## determine the species:
+my $species_id = $gene_adaptor->db->species_id;
+my $species_ens = $gene_adaptor->db->species;
+
+my $infostring = "# get_gene_transcript_exon_tables.pl version $script_version:\nRetrieve gene models for Ensembl version $ensembl_version, species $species from Ensembl database at host: $host\n";
+
+print $infostring;
+
+## preparing output files:
+open(GENE , ">ens_gene.txt");
+print GENE "gene_id\tgene_name\tentrezid\tgene_biotype\tgene_seq_start\tgene_seq_end\tseq_name\tseq_strand\tseq_coord_system\n";
+
+open(TRANSCRIPT , ">ens_tx.txt");
+print TRANSCRIPT "tx_id\ttx_biotype\ttx_seq_start\ttx_seq_end\ttx_cds_seq_start\ttx_cds_seq_end\tgene_id\n";
+
+open(EXON , ">ens_exon.txt");
+print EXON "exon_id\texon_seq_start\texon_seq_end\n";
+
+# open(G2T , ">ens_gene2transcript.txt");
+# print G2T "g2t_gene_id\tg2t_tx_id\n";
+
+open(T2E , ">ens_tx2exon.txt");
+print T2E "tx_id\texon_id\texon_idx\n";
+
+open(CHR , ">ens_chromosome.txt");
+print CHR "seq_name\tseq_length\tis_circular\n";
+
+##OK now running the stuff:
+print "Start fetching data\n";
+my %done_chromosomes=();
+my %done_exons=(); ## to keep track of which exons have already been saved.
+my $counta = 0;
+@gene_ids = @{$gene_adaptor->list_stable_ids()};
+foreach my $gene_id (@gene_ids){
+ $counta++;
+ if(($counta % 2000) == 0){
+ print "processed $counta genes\n";
+ }
+ my $orig_gene;
+
+ $orig_gene = $gene_adaptor->fetch_by_stable_id($gene_id);
+ if(defined $orig_gene){
+ my $do_transform=1;
+ my $gene = $orig_gene->transform("chromosome");
+ if(!defined $gene){
+ ## gene is not on known defined chromosomes!
+ $gene = $orig_gene;
+ $do_transform=0;
+ }
+ my $coord_system = $gene->coord_system_name;
+ my $chrom = $gene->slice->seq_region_name;
+ my $strand = $gene->strand;
+
+ ## check if we did already fetch some info for that chromosome
+ if(exists($done_chromosomes{ $chrom })){
+ ## don't do anything...
+ }else{
+ $done_chromosomes{ $chrom } = "done";
+ my $chr_slice = $gene->slice->seq_region_Slice();
+ my $name = $chr_slice->seq_region_name;
+ my $length = $chr_slice->length;
+ my $is_circular = $chr_slice->is_circular;
+ print CHR "$name\t$length\t$is_circular\n";
+ my $chr_slice_again = $slice_adaptor->fetch_by_region('chromosome', $chrom);
+ if(defined($chr_slice_again)){
+ $coord_system_version = $chr_slice_again->coord_system()->version();
+ }
+ # if(defined $chr_slice){
+ # my $name = $chr_slice->seq_region_name;
+ # my $length = $chr_slice->length;
+ # my $is_circular = $chr_slice->is_circular;
+ # $coord_system_version = $chr_slice->coord_system()->version();
+ # print CHR "$name\t$length\t$is_circular\n";
+ # }else{
+ # my $length = $gene->slice->seq_region_length();
+ # print CHR "$chrom\t0\t0\n";
+ # }
+ }
+
+ ## get information for the gene.
+ my $gene_external_name= $gene->external_name;
+ if(!defined($gene_external_name)){
+ $gene_external_name="";
+ }
+ my $gene_biotype = $gene->biotype;
+ my $gene_seq_start = $gene->start;
+ my $gene_seq_end = $gene->end;
+ ## get entrezgene ID, if any...
+ my $all_entries = $gene->get_all_DBLinks("EntrezGene");
+ my %entrezgene_hash=();
+ foreach my $dbe (@{$all_entries}){
+ $entrezgene_hash{ $dbe->primary_id } = 1;
+ }
+ my $hash_size = keys %entrezgene_hash;
+ my $entrezid = "";
+ if($hash_size > 0){
+ $entrezid = join(";", keys %entrezgene_hash);
+ }
+ print GENE "$gene_id\t$gene_external_name\t$entrezid\t$gene_biotype\t$gene_seq_start\t$gene_seq_end\t$chrom\t$strand\t$coord_system\n";
+
+ ## process transcript(s)
+ my @transcripts = @{ $gene->get_all_Transcripts };
+ ## ok looping through the transcripts
+ foreach my $transcript (@transcripts){
+ if($do_transform==1){
+ ## just to be sure that we have the transcript in chromosomal coordinates.
+ $transcript = $transcript->transform("chromosome");
+ }
+ ##my $tx_start = $transcript->start;
+ ##my $tx_end = $transcript->end;
+
+ ## caution!!! will get undef if transcript is non-coding!
+ my $tx_cds_start = $transcript->coding_region_start;
+ if(!defined($tx_cds_start)){
+ $tx_cds_start = "NULL";
+ }
+ my $tx_cds_end = $transcript->coding_region_end;
+ if(!defined($tx_cds_end)){
+ $tx_cds_end = "NULL";
+ }
+ my $tx_id = $transcript->stable_id;
+ my $tx_biotype = $transcript->biotype;
+ my $tx_seq_start = $transcript->start;
+ my $tx_seq_end = $transcript->end;
+ ## write info.
+ print TRANSCRIPT "$tx_id\t$tx_biotype\t$tx_seq_start\t$tx_seq_end\t$tx_cds_start\t$tx_cds_end\t$gene_id\n";
+## print G2T "$gene_id\t$tx_id\n";
+
+ ## process exon(s)
+ ##my @exons = @{ $transcript->get_all_Exons(-constitutive => 1) };
+ my @exons = @{ $transcript->get_all_Exons() }; ## exons are always returned 5' to 3' of the transcript!
+ my $current_exon_idx = 1;
+ foreach my $exon (@exons){
+ if($do_transform==1){
+ $exon->transform("chromosome");
+ }
+ my $exon_start = $exon->start;
+ my $exon_end = $exon->end;
+ my $exon_id = $exon->stable_id;
+
+ ## write info, but only if we haven't already saved this exon (an exon can be
+ ## part of more than one transcript).
+ if(exists($done_exons{ $exon_id })){
+ ## don't do anything.
+ }else{
+ $done_exons{ $exon_id } = 1;
+ print EXON "$exon_id\t$exon_start\t$exon_end\n";
+ }
+ ## save the exon id to the file that provides the n:m mapping; also save
+ ## the index of the exon in the present transcript there.
+ print T2E "$tx_id\t$exon_id\t$current_exon_idx\n";
+
+ $current_exon_idx++;
+ }
+ }
+ }
+}
+
+## want to save:
+## data, ensembl host, species, ensembl version, genome build?
+open(INFO , ">ens_metadata.txt");
+print INFO "name\tvalue\n";
+print INFO "Db type\tEnsDb\n";
+print INFO "Type of Gene ID\tEnsembl Gene ID\n";
+print INFO "Supporting package\tensembldb\n";
+print INFO "Db created by\tensembldb package from Bioconductor\n";
+print INFO "script_version\t$script_version\n";
+print INFO "Creation time\t".localtime()."\n";
+print INFO "ensembl_version\t$ensembl_version\n";
+print INFO "ensembl_host\t$host\n";
+print INFO "Organism\t$species_ens\n";
+print INFO "genome_build\t$coord_system_version\n";
+print INFO "DBSCHEMAVERSION\t1.0\n";
+
+close(INFO);
+
+close(GENE);
+close(TRANSCRIPT);
+close(EXON);
+##close(G2T);
+close(T2E);
+close(CHR);
+
+
diff --git a/inst/pkg-template/DESCRIPTION b/inst/pkg-template/DESCRIPTION
new file mode 100644
index 0000000..a91bb87
--- /dev/null
+++ b/inst/pkg-template/DESCRIPTION
@@ -0,0 +1,15 @@
+Package: @PKGNAME@
+Title: @PKGTITLE@
+Description: @PKGDESCRIPTION@
+Version: @PKGVERSION@
+Author: @AUTHOR@
+Maintainer: @MAINTAINER@
+Depends: ensembldb
+License: @LIC@
+organism: @ORGANISM@
+species: @SPECIES@
+provider: @PROVIDER@
+provider_version: @PROVIDERVERSION@
+release_date: @RELEASEDATE@
+resource_url: @SOURCEURL@
+biocViews: AnnotationData, EnsDb, @ORGANISMBIOCVIEW@
diff --git a/inst/pkg-template/NAMESPACE b/inst/pkg-template/NAMESPACE
new file mode 100644
index 0000000..fb37b70
--- /dev/null
+++ b/inst/pkg-template/NAMESPACE
@@ -0,0 +1,9 @@
+##import(AnnotationDbi)
+#import(GenomicFeatures)
+import(ensembldb)
+
+### Don't export @TXDBOBJNAME@ (the object defined in this
+### package): it is created and dynamically exported at load time (refer
+### to R/zzz.R for the details).
+
+
diff --git a/inst/pkg-template/R/zzz.R b/inst/pkg-template/R/zzz.R
new file mode 100644
index 0000000..c3c9b09
--- /dev/null
+++ b/inst/pkg-template/R/zzz.R
@@ -0,0 +1,18 @@
+###
+### Load any db objects whenever the package is loaded.
+###
+
+.onLoad <- function(libname, pkgname)
+{
+ ns <- asNamespace(pkgname)
+ path <- system.file("extdata", package=pkgname, lib.loc=libname)
+ files <- dir(path)
+ for(i in seq_len(length(files))){
+ db <- EnsDb(system.file("extdata", files[[i]], package=pkgname,
+ lib.loc=libname))
+ objname <- sub(".sqlite$","",files[[i]])
+ assign(objname, db, envir=ns)
+ namespaceExport(ns, objname)
+ }
+}
+
diff --git a/inst/pkg-template/man/package.Rd b/inst/pkg-template/man/package.Rd
new file mode 100644
index 0000000..a671567
--- /dev/null
+++ b/inst/pkg-template/man/package.Rd
@@ -0,0 +1,40 @@
+\name{@TXDBOBJNAME@}
+\docType{package}
+
+\alias{@PKGNAME@-package}
+\alias{@PKGNAME@}
+\alias{@TXDBOBJNAME@}
+
+
+\title{@PKGTITLE@}
+
+\description{
+ This package loads an SQL connection to a database containing
+ annotations from Ensembl. For examples and help on functions see the
+ help pages from the \code{ensembldb} package!
+}
+
+\note{
+ This data package was made from resources at @PROVIDER@ on
+ @RELEASEDATE@ and based on the @PROVIDERVERSION@
+}
+
+\author{@AUTHOR@}
+
+
+
+\examples{
+## load the library
+##library(@PKGNAME@)
+## list the contents that are loaded into memory
+ls('package:@PKGNAME@')
+## show the db object that is loaded by calling its name
+@PKGNAME@
+
+## for more examples see the ensembldb package.
+
+
+}
+
+\keyword{package}
+\keyword{data}
diff --git a/inst/shinyHappyPeople/server.R b/inst/shinyHappyPeople/server.R
new file mode 100644
index 0000000..d169d05
--- /dev/null
+++ b/inst/shinyHappyPeople/server.R
@@ -0,0 +1,242 @@
+## list all packages...
+packs <- installed.packages()
+epacks <- packs[grep(packs, pattern="^Ens")]
+
+## library(EnsDb.Hsapiens.v75)
+## edb <- EnsDb.Hsapiens.v75
+
+TheFilter <- function(input){
+ Cond <- input$condition
+ ## check if we've got something to split...
+ Vals <- input$geneName
+ ## check if we've got ,
+ if(length(grep(Vals, pattern=",")) > 0){
+ ## don't want whitespaces here...
+ Vals <- gsub(Vals, pattern=" ", replacement="", fixed=TRUE)
+ Vals <- unlist(strsplit(Vals, split=","))
+ }
+ if(length(grep(Vals, pattern=" ", fixed=TRUE)) > 0){
+ Vals <- unlist(strsplit(Vals, split=" ", fixed=TRUE))
+ }
+ if(input$type=="Gene name"){
+ return(GenenameFilter(Vals, condition=Cond))
+ }
+ if(input$type=="Chrom name"){
+ return(SeqnameFilter(Vals, condition=Cond))
+ }
+ if(input$type=="Gene biotype"){
+ return(GenebiotypeFilter(Vals, condition=Cond))
+ }
+ if(input$type=="Tx biotype"){
+ return(TxbiotypeFilter(Vals, condition=Cond))
+ }
+}
+
+## checkSelectedPackage <- function(input){
+## if(is.null(input$package)){
+## return(FALSE)
+## }else{
+## require(input$package, character.only=TRUE)
+## message("Assigning ", input$package, " to variable edb.")
+## assign("edb", get(input$package), envir=globalenv())
+## return(TRUE)
+## }
+## }
+
+## Based on the given EnsDb package name it loads the library and returns
+## the object.
+getEdb <- function(x){
+ require(x, character.only=TRUE)
+ return(get(x))
+}
+
+## Define the server logic for querying the selected EnsDb package
+shinyServer(function(input, output) {
+
+ ## Generate the select field for the package...
+ output$packages <- renderUI(
+ selectInput("package", "Select installed EnsDb package", as.list(epacks))
+ )
+
+ selectedPackage <- reactive({
+ names(epacks) <- epacks
+ ## epacks <- sapply(epacks, as.symbol)
+ ## load the package.
+ if(length(input$package) > 0){
+ require(input$package, character.only=TRUE)
+ ## Actually, should be enough to just return input$package...
+ ##return(switch(input$package, epacks))
+ return(getEdb(input$package))
+ }else{
+ return(NULL)
+ }
+ })
+
+ ## Metadata infos
+ output$metadata_organism <- renderText({
+ edb <- selectedPackage()
+ if(!is.null(edb)){
+ ## db <- getEdb(edb)
+ paste0("Organism: ", organism(edb))
+ }
+
+ })
+ output$metadata_ensembl <- renderText({
+ edb <- selectedPackage()
+ if(!is.null(edb)){
+ ## db <- getEdb(edb)
+ md <- metadata(edb)
+ rownames(md) <- md$name
+ paste0("Ensembl version: ", md["ensembl_version", "value"])
+ }
+ })
+ output$metadata_genome <- renderText({
+ edb <- selectedPackage()
+ if(!is.null(edb)){
+ ## db <- getEdb(edb)
+ md <- metadata(edb)
+ rownames(md) <- md$name
+ paste0("Genome build: ", md["genome_build", "value"])
+ }
+ })
+
+ output$genename <- renderText({
+ if(length(input$geneName) > 0){
+ input$geneName
+ }else{
+ return()
+ }
+ })
+ ## These are the actual queries for genes, transcripts and exons...
+ output$Genes <- renderDataTable({
+ ## if(!checkSelectedPackage(input))
+ ## return()
+ if(length(input$package) == 0)
+ return(NULL)
+ if(!is.na(input$geneName) & length(input$geneName) > 0 & input$geneName!=""){
+ edb <- selectedPackage()
+ res <- genes(edb, filter=TheFilter(input),
+ return.type="data.frame")
+ assign(".ENS_TMP_RES", res, envir=globalenv())
+ return(res)
+ }
+ })
+ output$Transcripts <- renderDataTable({
+ if(length(input$package) == 0)
+ return(NULL)
+ if(!is.na(input$geneName) & length(input$geneName) > 0 & input$geneName!=""){
+ edb <- selectedPackage()
+ res <- transcripts(edb, filter=TheFilter(input),
+ return.type="data.frame")
+ assign(".ENS_TMP_RES", res, envir=globalenv())
+ return(res)
+ }
+ })
+ output$Exons <- renderDataTable({
+ if(length(input$package) == 0)
+ return(NULL)
+ if(!is.na(input$geneName) & length(input$geneName) > 0 & input$geneName!=""){
+ edb <- selectedPackage()
+ res <- exons(edb, filter=TheFilter(input),
+ return.type="data.frame")
+ assign(".ENS_TMP_RES", res, envir=globalenv())
+ return(res)
+ }
+ })
+ observe({
+ if(input$closeButton > 0){
+ ## OK, now, gather all the data and return it in the selected format.
+ edb <- selectedPackage()
+ resType <- input$returnType
+ resTab <- input$resultTab
+ res <- NULL
+ ## If result type is data.frame we just return what we've got.
+ if(resType == "data.frame"){
+ res <- get(".ENS_TMP_RES")
+ }else{
+ ## Otherwise we have to fetch a little bit more data, thus, we perform the
+ ## query again and return it as GRanges.
+ if(resTab == "Genes")
+ res <- genes(edb, filter=TheFilter(input), return.type="GRanges")
+ if(resTab == "Transcripts")
+ res <- transcripts(edb,filter=TheFilter(input), return.type="GRanges")
+ if(resTab == "Exons")
+ res <- exons(edb,filter=TheFilter(input), return.type="GRanges")
+ }
+ rm(".ENS_TMP_RES", envir=globalenv())
+ stopApp(res)
+ }
+ })
+})
+
+
+
+## ## Define server logic required to draw a histogram
+## shinyServer(function(input, output) {
+
+## ## generate the select field for the package...
+## output$packages <- renderUI(
+## selectInput("package", "Select EnsDb package", as.list(epacks))
+## )
+
+## ## generating metadata info.
+## output$metadata_organism <- renderText({
+## if(!checkSelectedPackage(input))
+## return()
+## paste0("Organism: ", organism(edb))
+## })
+## output$metadata_ensembl <- renderText({
+## if(!checkSelectedPackage(input))
+## return()
+## md <- metadata(edb)
+## rownames(md) <- md$name
+## paste0("Ensembl version: ", md["ensembl_version", "value"])
+## })
+## output$metadata_genome <- renderText({
+## if(!checkSelectedPackage(input))
+## return()
+## md <- metadata(edb)
+## rownames(md) <- md$name
+## paste0("Genome build: ", md["genome_build", "value"])
+## })
+## ## output$genename <- renderText({
+## ## if(!checkSelectedPackage(input))
+## ## return()
+## ## input$geneName
+## ## })
+## ## That's the actual queries for genes, transcripts and exons...
+## output$Genes <- renderDataTable({
+## if(!checkSelectedPackage(input))
+## return()
+## if(!is.na(input$geneName) & length(input$geneName) > 0 & input$geneName!=""){
+## res <- genes(edb, filter=TheFilter(input),
+## return.type="data.frame")
+## return(res)
+## }
+## })
+## output$Transcripts <- renderDataTable({
+## if(!checkSelectedPackage(input))
+## return()
+## if(!is.na(input$geneName) & length(input$geneName) > 0 & input$geneName!=""){
+## res <- transcripts(edb, filter=TheFilter(input),
+## return.type="data.frame")
+## return(res)
+## }
+## })
+## output$Exons <- renderDataTable({
+## if(!checkSelectedPackage(input))
+## return()
+## if(!is.na(input$geneName) & length(input$geneName) > 0 & input$geneName!=""){
+## res <- exons(edb, filter=TheFilter(input),
+## return.type="data.frame")
+## return(res)
+## }
+## })
+## ## observe({
+## ## if(input$Close > 0){
+## ## stopApp("AAARGHHH")
+## ## }
+## ## })
+## })
+
+
diff --git a/inst/shinyHappyPeople/ui.R b/inst/shinyHappyPeople/ui.R
new file mode 100644
index 0000000..7e562d9
--- /dev/null
+++ b/inst/shinyHappyPeople/ui.R
@@ -0,0 +1,154 @@
+library(shiny)
+## start with runApp("jo_test")
+
+shinyUI(fluidPage(
+
+ ## Application title
+ titlePanel("Get gene/transcript/exon annotations"),
+
+ fluidRow(
+ shiny::column(3,
+ uiOutput("packages")
+ ),
+ shiny::column(3,
+ ## div(
+ h4("EnsDb annotation:"),
+ textOutput("metadata_organism"),
+ textOutput("metadata_ensembl"),
+ textOutput("metadata_genome"),
+ " "
+ ## )
+ ),
+ shiny::column(4,
+ h4("Hints:"),
+ tags$li("Enter comma or whitespace separated values to search for multiple e.g. genes."),
+ tags$li("Use % and condition like for partial matching."))
+ ),
+ fluidRow(
+ shiny::column(4,
+ " "
+ )
+ ),
+ fluidRow(
+ shiny::column(2,
+ selectInput("type", NA,
+ choices=c("Gene name", "Chrom name",
+ "Gene biotype", "Tx biotype"),
+ selected="Gene name")
+ ),
+ shiny::column(1,
+ selectInput("condition", NA,
+ choices=c("=", "!=", "like", "in"),
+ selected="=")
+ ),
+ shiny::column(2,
+ textInput("geneName", NA, value="")
+ )
+ ),
+ fluidRow(
+ mainPanel(
+ tabsetPanel(
+ tabPanel('Genes',
+ dataTableOutput("Genes")
+ ),
+ tabPanel('Transcripts',
+ dataTableOutput("Transcripts")
+ ),
+ tabPanel('Exons',
+ dataTableOutput("Exons")
+ )
+ , id="resultTab"
+ )
+ )
+ ),
+ wellPanel(
+ fluidRow(
+ shiny::column(2,
+ "Return results as "
+ ),
+ shiny::column(2,
+ selectInput("returnType", NA,
+ choices=c("data.frame", "GRanges"),
+ selected="data.frame")
+ ),
+ shiny::column(2,
+ actionButton("closeButton", "Return & close")
+ )
+ )
+ )
+))
+
+
+
+## shinyUI(fluidPage(
+
+## ## Application title
+## titlePanel("Get gene/transcript/exon annotations"),
+
+## fluidRow(
+## shiny::column(3,
+## uiOutput("packages")
+## ),
+## shiny::column(3,
+## ## div(
+## h4("EnsDb annotation:"),
+## textOutput("metadata_organism"),
+## textOutput("metadata_ensembl"),
+## textOutput("metadata_genome"),
+## " "
+## ## )
+## ),
+## shiny::column(4,
+## h4("Hints:"),
+## tags$li("Enter comma or whitespace separated values to search for multiple e.g. genes."),
+## tags$li("Use % and condition like for partial matching."),
+## tags$li("The selected database is assigned to the environment variable ENS_DB."))
+## ),
+## fluidRow(
+## shiny::column(4,
+## " "
+## )
+## ## )
+## ),
+## fluidRow(
+## shiny::column(2,
+## selectInput("type", NA,
+## choices=c("Gene name", "Chrom name",
+## "Gene biotype", "Tx biotype"),
+## selected="Gene name")
+## ),
+## shiny::column(1,
+## selectInput("condition", NA,
+## choices=c("=", "!=", "like", "in"),
+## selected="=")
+## ),
+## shiny::column(2,
+## textInput("geneName", NA, value="")
+## ),
+## ## shiny::column(2,
+## ## submitButton("Go!")
+## ## ),
+## shiny::column(2,
+## actionButton("closeButton", "Return result")
+## )
+## ),
+## ## fluidRow(
+## ## mainPanel(
+## ## tabsetPanel(
+## ## tabPanel('Genes',
+## ## dataTableOutput("Genes")
+## ## ),
+## ## tabPanel('Transcripts',
+## ## dataTableOutput("Transcripts")
+## ## ),
+## ## tabPanel('Exons',
+## ## dataTableOutput("Exons")
+## ## )
+## ## )
+## ## ##h4(textOutput("genename"))
+## ## )
+## ## )
+## ))
+
+
+
diff --git a/inst/test/testFunctionality.R b/inst/test/testFunctionality.R
new file mode 100644
index 0000000..3161093
--- /dev/null
+++ b/inst/test/testFunctionality.R
@@ -0,0 +1,293 @@
+## check namespace.
+detachem <- function( x ){
+ NS <- loadedNamespaces()
+ if( any( NS==x ) ){
+ pkgn <- paste0( "package:", x )
+ detach( pkgn, unload=TRUE, character.only=TRUE )
+ }
+}
+Pkgs <- c( "EnsDb.Hsapiens.v75", "ensembldb" )
+tmp <- sapply( Pkgs, detachem )
+tmp <- sapply( Pkgs, library, character.only=TRUE )
+
+###
+
+## just get all genes.
+cat( "getting all genes..." )
+Gns <- genes( EnsDb.Hsapiens.v75 )
+Gns
+cat("done\n")
+
+cat( "getting all transcripts..." )
+Gns <- transcripts( EnsDb.Hsapiens.v75 )
+Gns
+cat("done\n")
+
+cat( "getting all exons..." )
+Gns <- exons( EnsDb.Hsapiens.v75 )
+Gns
+cat("done\n")
+
+## get exons, sort by exon_seq_start
+Gns <- exons( EnsDb.Hsapiens.v75, columns=c( "exon_id", "tx_id" ), filter=list( TxidFilter( "a" ) ) )
+ensembldb:::.buildQuery( EnsDb.Hsapiens.v75, columns=c( "exon_id", "tx_id" ), filter=list( TxidFilter( "a" ) ))
+
+cat( "all transcripts by..." )
+tmp <- transcriptsBy( EnsDb.Hsapiens.v75 )
+tmp
+cat("done\n")
+
+cat( "all exons by..." )
+tmp <- exonsBy( EnsDb.Hsapiens.v75 )
+tmp
+cat("done\n")
+
+
+###########
+## getWhat... generic query interface to the database.
+Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name" ), filter=list( SeqnameFilter( "Y" ) ) )
+head(Test)
+dim(Test)
+
+## now let's join exon...
+Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name", "exon_id", "exon_seq_start", "exon_seq_end" ), order.by="exon_seq_end", order.type="desc", filter=list( SeqnameFilter( "Y" ) ) )
+head(Test)
+dim(Test)
+
+
+## throws a warning since exon_chrom_end is not valid.
+Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name", "exon_id", "exon_seq_start", "exon_seq_end" ), order.by="exon_chrom_end", order.type="desc", filter=list( SeqnameFilter( "Y" ) ) )
+head(Test)
+dim(Test)
+
+
+## add a Txid Filter.
+Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name", "exon_id", "exon_seq_start", "exon_seq_end", "tx_id" ), order.by="exon_seq_end", order.type="desc", filter=list( TxidFilter( "ENST00000028008" ) ) )
+Test
+
+Test <- ensembldb:::getWhat( EnsDb.Hsapiens.v75, columns=c( "gene_id", "gene_biotype", "gene_name", "seq_name" ), filter=list( TxidFilter( "ENST00000028008" ) ) )
+Test
+
+
+
+######
+## exonsBy
+## get all Exons by gene for genes encoded on chromosomes 1, 2, 4
+Test <- exonsBy( EnsDb.Hsapiens.v75, by="gene", columns=c( "gene_id", "gene_name", "gene_biotype" ), filter=list( SeqnameFilter( c( 1, 2,4 ) ), SeqstrandFilter( "-" ) ) )
+Test
+
+## tx_biotype and tx_id have been removed.
+Test <- exonsBy( EnsDb.Hsapiens.v75, by="gene", columns=c( "gene_id", "gene_name", "gene_biotype", "tx_biotype", "tx_id" ), filter=list( SeqnameFilter( c( 1, 2,4 ) ), SeqstrandFilter( "-" ) ) )
+Test
+
+Test <- exonsBy( EnsDb.Hsapiens.v75, by="tx", columns=c( "gene_id", "tx_id", "tx_biotype" ), filter=list( SeqnameFilter( c( 1, 2,4 ) ) ) )
+Test
+
+## exons for a specific transcript
+Test <- exonsBy( EnsDb.Hsapiens.v75, by="tx", columns=c( "gene_id", "tx_id", "tx_biotype" ), filter=list( TxidFilter( "ENST00000028008" ) ) )
+Test
+
+## that also works, albeit throwing a warning.
+Test <- exonsBy( EnsDb.Hsapiens.v75, by="gene", columns=c( "gene_id", "tx_id", "tx_biotype" ), filter=list( TxidFilter( "ENST00000028008" ) ) )
+Test
+
+
+
+########
+## transcriptsBy
+Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="gene", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ) )
+Test
+
+## that should throw a warning
+Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="gene", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ), columns=c( "exon_id", "exon_seq_start" ) )
+Test
+
+
+Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="exon", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ), columns="tx_biotype" )
+Test
+
+## that should throw a warning
+Test <- transcriptsBy( EnsDb.Hsapiens.v75, by="exon", filter=list( SeqstrandFilter( "+" ), SeqnameFilter( "X" ) ), columns=c( "exon_id", "exon_seq_start", "tx_biotype" ) )
+Test
+
+
+######
+## genes
+Test <- genes( EnsDb.Hsapiens.v75, filter=list( GenebiotypeFilter( "lincRNA" ) ) )
+head( Test )
+length( Test )
+
+## adding tx properties along with gene columns; this will return a data.frame with the
+## additional information; gene columns can however no longer be unique in the data.frame
+Test <- genes( EnsDb.Hsapiens.v75, filter=list( GenebiotypeFilter( "lincRNA" ) ), columns=c( listColumns( EnsDb.Hsapiens.v75, "gene"), "tx_id", "tx_biotype" ) )
+head( Test )
+length( Test )
+
+######
+## transcripts
+## get all transcripts that are targeted by nonsense mediated decay
+Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ) )
+head( Test )
+length( Test )
+
+## order the transcripts by seq_name; this does not work since seq_name is not among the requested columns.
+Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), order.by="seq_name" )
+head( Test )
+nrow( Test )
+
+## order the transcripts by seq_name; have to explicitly add seq_name to the columns.
+Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), order.by="seq_name", columns=c( listColumns( EnsDb.Hsapiens.v75, "tx" ), "seq_name" ) )
+head( Test )
+nrow( Test )
+
+## get in addition the gene_name and gene_id
+Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), columns=c( listColumns( EnsDb.Hsapiens.v75, "tx" ), "gene_id", "gene_name" ) )
+head( Test )
+nrow( Test )
+
+## get in addition the gene_name and gene_id and also exon_id and exon_idx
+Test <- transcripts( EnsDb.Hsapiens.v75, filter=list( TxbiotypeFilter( "nonsense_mediated_decay" ) ), columns=c( listColumns( EnsDb.Hsapiens.v75, "tx" ), "gene_id", "gene_name", "exon_id", "exon_idx" ) )
+head( Test )
+nrow( Test )
+
+
+#####
+## exons
+##
+Test <- exons( EnsDb.Hsapiens.v75, filter=list( TxidFilter( "ENST00000028008" ) ), columns=c( "gene_id","gene_name", "gene_biotype" ) )
+Test
+
+
+
+
+##################
+## examples from EnsDb-class:
+
+## display some information:
+EnsDb.Hsapiens.v75
+
+organism( EnsDb.Hsapiens.v75 )
+
+seqinfo( EnsDb.Hsapiens.v75 )
+
+## show the tables
+listTables( EnsDb.Hsapiens.v75 )
+
+
+###### buildQuery
+##
+## join tables gene and transcript and return gene_id and tx_id
+buildQuery( EnsDb.Hsapiens.v75, columns=c( "gene_id", "tx_id" ) )
+
+
+## get all exon_ids and transcript ids of genes encoded on chromosome Y.
+buildQuery( EnsDb.Hsapiens.v75, columns=c( "exon_id", "tx_id" ), filter=list( SeqnameFilter( "Y") ) )
+
+
+###### genes
+##
+## get all genes coded on chromosome Y
+AllY <- genes( EnsDb.Hsapiens.v75, filter=list( SeqnameFilter( "Y" ) ) )
+head( AllY )
+
+## return result as GRanges.
+AllY.granges <- genes( EnsDb.Hsapiens.v75, filter=list( SeqnameFilter(
+ "Y" ) ), return.type="GRanges" )
+AllY.granges
+
+## include all transcripts of the gene and their chromosomal
+## coordinates, sort by chrom start of transcripts and return as
+## GRanges.
+AllY.granges.tx <- genes( EnsDb.Hsapiens.v75, filter=list(
+ SeqnameFilter( "Y" ) ), return.type="GRanges", columns=c(
+ "gene_id", "seq_name", "seq_strand", "tx_id", "tx_biotype",
+ "tx_seq_start", "tx_seq_end" ), order.by="tx_seq_start" )
+AllY.granges.tx
+
+
+
+###### transcripts
+##
+## get all transcripts of a gene
+Tx <- transcripts( EnsDb.Hsapiens.v75, filter=list( GeneidFilter(
+ "ENSG00000184895" ) ), order.by="tx_seq_start" )
+Tx
+
+## get all transcripts of two genes along with some information on the
+## gene and transcript
+Tx.granges <- transcripts( EnsDb.Hsapiens.v75, filter=list(
+ GeneidFilter( c( "ENSG00000184895", "ENSG00000092377" ),
+ condition="in" )), return.type="GRanges", order.by="tx_seq_start",
+ columns=c( "gene_id", "gene_seq_start", "gene_seq_end",
+ "gene_biotype", "tx_biotype" ) )
+Tx.granges
+
+
+
+###### exons
+##
+## get all exons of the provided genes
+Exon.granges <- exons( EnsDb.Hsapiens.v75, filter=list( GeneidFilter( c(
+ "ENSG00000184895", "ENSG00000092377" ) )),
+ return.type="GRanges", order.by="exon_seq_start", columns=c(
+ "gene_id", "gene_seq_start", "gene_seq_end", "gene_biotype" ) )
+Exon.granges
+
+
+
+##### exonsBy
+##
+## get all exons for transcripts encoded on chromosomes 1 to 22, X and Y.
+ETx <- exonsBy( EnsDb.Hsapiens.v75, by="tx", filter=list( SeqnameFilter(
+ c( 1:22, "X", "Y" ) ) ) )
+ETx
+## get all exons for genes encoded on chromosome 1 to 22, X and Y and
+## include additional annotation columns in the result
+EGenes <- exonsBy( EnsDb.Hsapiens.v75, by="gene", filter=list(
+ SeqnameFilter( c( 1:22, "X", "Y" ) ) ), columns=c( "gene_biotype",
+ "gene_name" ) )
+EGenes
+
+## Note that this might also contain "LRG" genes.
+length( grep( names( EGenes ), pattern="LRG" ) )
+## fetch just Ensembl genes:
+EGenes <- exonsBy( EnsDb.Hsapiens.v75, by="gene", filter=list(
+ SeqnameFilter( c( 1:22, "X", "Y" ) ), GeneidFilter( "ENS%", "like" ) ), columns=c( "gene_biotype",
+ "gene_name" ) )
+
+length( grep( names( EGenes ), pattern="LRG" ) )
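+
+## Illustrative sketch only (not part of the upstream script): grep() returns
+## the positions of the matching elements, so counting matches needs length()
+## or grepl(). The vector 'ids' is made up for this example.
+ids <- c( "ENSG00000000003", "LRG_10", "ENSG00000184895", "LRG_7" )
+length( grep( ids, pattern="LRG" ) )   ## number of matching names
+sum( grepl( "LRG", ids ) )             ## same count via a logical vector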
+
+
+
+##### transcriptsBy
+##
+TGenes <- transcriptsBy( EnsDb.Hsapiens.v75, by="gene", filter=list(
+ SeqnameFilter( c( 1:22, "X", "Y" ) ) ) )
+TGenes
+
+
+
+##### lengthOf
+##
+## length of a specific gene.
+lengthOf( EnsDb.Hsapiens.v75, filter=list( GeneidFilter(
+ "ENSG00000000003" ) ) )
+
+## length of a transcript
+lengthOf( EnsDb.Hsapiens.v75, of="tx", filter=list( TxidFilter(
+ "ENST00000494424" ) ) )
+
+## average length of all protein coding genes
+mean( lengthOf( EnsDb.Hsapiens.v75, of="gene", filter=list(
+ GenebiotypeFilter( "protein_coding" ),
+ SeqnameFilter( c( 1:22, "X", "Y" ) ) ) ) )
+
+## average length of all snoRNAs
+mean( lengthOf( EnsDb.Hsapiens.v75, of="gene", filter=list(
+ GenebiotypeFilter( "snoRNA" ),
+ SeqnameFilter( c( 1:22, "X", "Y" ) ) ) ) )
+
+listGenebiotypes(EnsDb.Hsapiens.v75)
+
+listTxbiotypes(EnsDb.Hsapiens.v75)
+
diff --git a/inst/test/testInternals.R b/inst/test/testInternals.R
new file mode 100644
index 0000000..7590d88
--- /dev/null
+++ b/inst/test/testInternals.R
@@ -0,0 +1,146 @@
+detachem <- function(x){
+ NS <- loadedNamespaces()
+ if(any(NS==x)){
+ pkgn <- paste0("package:", x)
+ detach(pkgn, unload=TRUE, character.only=TRUE)
+ }
+}
+Pkgs <- c("EnsDb.Hsapiens.v75", "ensembldb")
+tmp <- sapply(Pkgs, detachem)
+tmp <- sapply(Pkgs, library, character.only=TRUE)
+DB <- EnsDb.Hsapiens.v75
+
+
+#######################################################
+##
+## add required tables if needed.
+##
+## check if we get what we want...
+Expect <- c("exon", "tx2exon", "tx")
+Get <- ensembldb:::addRequiredTables(EnsDb.Hsapiens.v75, c("exon", "tx"))
+Get
+if(sum(Get %in% Expect)!=length(Expect))
+ stop("Didn't get what I expected!")
+
+
+Expect <- c("exon", "tx2exon", "tx", "gene")
+Get <- ensembldb:::addRequiredTables(EnsDb.Hsapiens.v75, c("exon", "gene"))
+Get
+if(sum(Get %in% Expect)!=length(Expect))
+ stop("Didn't get what I expected!")
+
+
+
+Expect <- c("exon", "tx2exon", "tx", "gene")
+Get <- ensembldb:::addRequiredTables(EnsDb.Hsapiens.v75, c("exon", "gene", "tx"))
+Get
+if(sum(Get %in% Expect)!=length(Expect))
+ stop("Didn't get what I expected!")
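+
+## Illustrative sketch only (not part of the upstream tests): the checks above
+## count how many of the returned tables are expected ones, so an unexpected
+## extra table would go unnoticed. A stricter, order-independent comparison
+## could use setequal(); 'sameTables' is a hypothetical helper name.
+sameTables <- function(got, expected){
+    setequal(got, expected)
+}
+sameTables(c("tx", "exon", "tx2exon"), c("exon", "tx2exon", "tx"))
+sameTables(c("tx", "exon", "tx2exon", "gene"), c("exon", "tx2exon", "tx"))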
+
+
+#######################################################
+##
+## join queries
+##
+ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("exon", "t2exon", "tx"))
+
+
+ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("exon"))
+
+
+ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("exon", "t2exon", "tx", "gene"))
+
+
+ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("tx", "gene"))
+
+
+ensembldb:::joinQueryOnTables(EnsDb.Hsapiens.v75, c("chromosome", "gene"))
+
+
+
+
+#######################################################
+##
+## join queries on column names
+##
+## for that query we don't need the exon table
+ensembldb:::cleanColumns(EnsDb.Hsapiens.v75, c("gene_id","tx_id", "bla", "value"))
+
+## don't require the exon table here, exon_id is also in tx2exon.
+ensembldb:::joinQueryOnColumns(EnsDb.Hsapiens.v75, c("gene_id", "tx_id", "gene_name", "exon_id"))
+
+##
+ensembldb:::joinQueryOnColumns(EnsDb.Hsapiens.v75, c("gene_id", "tx_id", "gene_name", "exon_idx"))
+
+
+ensembldb:::joinQueryOnColumns(EnsDb.Hsapiens.v75, c("gene_id", "tx_id", "gene_name", "exon_id", "exon_seq_start"))
+
+
+
+#######################################################
+##
+## clean columns
+##
+ensembldb:::cleanColumns(EnsDb.Hsapiens.v75, c("gene_id" ,"bma", "gene.gene_biotype"))
+
+ensembldb:::cleanColumns(EnsDb.Hsapiens.v75, c("gene_id" ,"gene.gene_name", "gene.gene_biotype"))
+
+
+
+#######################################################
+##
+## check built queries
+##
+ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "gene_name", "tx_id", "exon_id"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")))
+
+
+## throws a warning
+ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "gene_name", "tx_id", "exon_id"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")), order.by="exon_seq_end", order.type="desc")
+
+
+## works
+ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "gene_name", "tx_id", "exon_id", "exon_seq_end"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")), order.by="exon_seq_end", order.type="desc")
+
+
+ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("tx_id", "exon_id", "exon_seq_end"), filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")), order.by="exon_seq_end", order.type="desc")
+
+
+ensembldb:::.buildQuery(EnsDb.Hsapiens.v75, columns=c("tx_id", "gene_id"))
+
+
+## check the new filter thingy.
+GF <- GeneidFilter("a")
+where(GF)
+column(GF)
+
+## with db
+column(GF, DB)
+where(GF, DB)
+
+## with db and with.tables
+column(GF, DB, with.tables="tx")
+where(GF, DB, with.tables="tx")
+
+
+column(GF, DB, with.tables=c("gene", "tx"))
+where(GF, DB, with.tables=c("gene", "tx"))
+
+## does throw an error!
+##column(GF, DB, with.tables="exon")
+
+## silently drops the submitted ones.
+column(GF, DB, with.tables="blu")
+
+##
+ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id"))
+## with filter
+ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id"),
+ filter=list(GeneidFilter("a")))
+ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id"),
+ filter=list(GeneidFilter("a"),
+ SeqnameFilter(1)))
+
+ensembldb:::.buildQuery(DB, columns=c("tx_id", "gene_id", "exon_idx"),
+ filter=list(GeneidFilter("a"),
+ SeqnameFilter(1)))
+
diff --git a/inst/txt/ENST00000200135.fa.gz b/inst/txt/ENST00000200135.fa.gz
new file mode 100644
index 0000000..0ffe5c7
Binary files /dev/null and b/inst/txt/ENST00000200135.fa.gz differ
diff --git a/inst/txt/ENST00000335953.fa.gz b/inst/txt/ENST00000335953.fa.gz
new file mode 100644
index 0000000..b5d51b0
Binary files /dev/null and b/inst/txt/ENST00000335953.fa.gz differ
diff --git a/inst/unitTests/test_Filters.R b/inst/unitTests/test_Filters.R
new file mode 100644
index 0000000..4300f4a
--- /dev/null
+++ b/inst/unitTests/test_Filters.R
@@ -0,0 +1,241 @@
+library("EnsDb.Hsapiens.v75")
+edb <- EnsDb.Hsapiens.v75
+
+## testing GeneidFilter
+test_GeneidFilter <- function(){
+ GF <- GeneidFilter("ENSG0000001")
+ ## check if column matches the present database.
+ checkEquals(column(GF, EnsDb.Hsapiens.v75), "gene.gene_id")
+ ## check error if value is not as expected.
+ checkException(GeneidFilter("ENSG000001", ">"))
+ ## expect the filter to change the condition if length of values
+ ## is > 1
+ checkMultiValsIn(GeneidFilter(c("a", "b"), "="))
+ checkMultiValsNotIn(GeneidFilter(c("a", "b"), "!="))
+}
+
+test_GenebiotypeFilter <- function(){
+ Filt <- GenebiotypeFilter("protein_coding")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_biotype")
+ checkException(GenebiotypeFilter("protein_coding", ">"))
+ ## expect the filter to change the condition if length of values
+ ## is > 1
+ checkMultiValsIn(GenebiotypeFilter(c("a", "b"), "="))
+ checkMultiValsNotIn(GenebiotypeFilter(c("a", "b"), "!="))
+
+}
+
+test_GenenameFilter <- function(){
+ Filt <- GenenameFilter("genename")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_name")
+ checkException(GenenameFilter("genename", ">"))
+ ## expect the filter to change the condition if length of values
+ ## is > 1
+ checkMultiValsIn(GenenameFilter(c("a", "b"), "="))
+ checkMultiValsNotIn(GenenameFilter(c("a", "b"), "!="))
+ ## check if we're escaping correctly!
+ Filt <- GenenameFilter("I'm a gene")
+ checkEquals(where(Filt, EnsDb.Hsapiens.v75), "gene.gene_name = 'I''m a gene'")
+}
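+
+## Illustrative sketch only (not the package's implementation): the string
+## expected in test_GenenameFilter follows the SQL convention of doubling
+## single quotes inside a quoted literal. 'sqlQuote' is a hypothetical helper
+## written in base R.
+sqlQuote <- function(x){
+    paste0("'", gsub("'", "''", x, fixed = TRUE), "'")
+}
+paste("gene.gene_name =", sqlQuote("I'm a gene"))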
+
+test_TxidFilter <- function(){
+ Filt <- TxidFilter("a")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_id")
+ checkException(TxidFilter("a", ">"))
+ ## expect the filter to change the condition if length of values
+ ## is > 1
+ checkMultiValsIn(TxidFilter(c("a", "b"), "="))
+ checkMultiValsNotIn(TxidFilter(c("a", "b"), "!="))
+}
+
+test_TxbiotypeFilter <- function(){
+ Filt <- TxbiotypeFilter("a")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_biotype")
+ checkException(TxbiotypeFilter("a", ">"))
+ ## expect the filter to change the condition if length of values
+ ## is > 1
+ checkMultiValsIn(TxbiotypeFilter(c("a", "b"), "="))
+ checkMultiValsNotIn(TxbiotypeFilter(c("a", "b"), "!="))
+}
+
+test_ExonidFilter <- function(){
+ Filt <- ExonidFilter("a")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx2exon.exon_id")
+ checkException(ExonidFilter("a", ">"))
+ ## expect the filter to change the condition if length of values
+ ## is > 1
+ checkMultiValsIn(ExonidFilter(c("a", "b"), "="))
+ checkMultiValsNotIn(ExonidFilter(c("a", "b"), "!="))
+}
+
+## SeqnameFilter
+test_SeqnameFilter <- function(){
+ Filt <- SeqnameFilter("a")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.seq_name")
+ checkException(SeqnameFilter("a", ">"))
+}
+
+## SeqstrandFilter
+test_SeqstrandFilter <- function(){
+ checkException(SeqstrandFilter("a"))
+ Filt <- SeqstrandFilter("-")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.seq_strand")
+}
+
+## SeqstartFilter, feature
+test_SeqstartFilter <- function(){
+ Filt <- SeqstartFilter(123, feature="gene")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_seq_start")
+ Filt <- SeqstartFilter(123, feature="transcript")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_seq_start")
+}
+
+## SeqendFilter
+test_SeqendFilter <- function(){
+ Filt <- SeqendFilter(123, feature="gene")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "gene.gene_seq_end")
+ Filt <- SeqendFilter(123, feature="transcript")
+ checkEquals(column(Filt, EnsDb.Hsapiens.v75), "tx.tx_seq_end")
+}
+
+
+
+## checks if "condition" of the filter is "in"
+checkMultiValsIn <- function(filt){
+ checkEquals(condition(filt), "in")
+}
+## checks if "condition" of the filter is "not in"
+checkMultiValsNotIn <- function(filt){
+ checkEquals(condition(filt), "not in")
+}
+
+test_ExonrankFilter <- function(){
+ Filt <- ExonrankFilter(123)
+ checkException(ExonrankFilter("a"))
+
+ edb <- EnsDb.Hsapiens.v75
+ checkException(value(Filt) <- "b")
+
+ checkEquals(column(Filt), "exon_idx")
+ checkEquals(column(Filt, edb), "tx2exon.exon_idx")
+ where(Filt, edb)
+}
+
+## SymbolFilter
+test_SymbolFilter <- function() {
+ edb <- EnsDb.Hsapiens.v75
+ sf <- SymbolFilter("SKA2")
+
+ ## Check the column method.
+ checkEquals(column(sf), "symbol")
+ ## For EnsDb we want it to link to gene_name
+ checkEquals(column(sf, edb), "gene.gene_name")
+ checkException(column(sf, edb, with.tables = c("tx", "exon")))
+
+ ## Check the where method.
+ checkEquals(where(sf), "symbol = 'SKA2'")
+ condition(sf) <- "!="
+ checkEquals(where(sf, edb), "gene.gene_name != 'SKA2'")
+
+ ## Test if we can use it:
+ condition(sf) <- "="
+ Res <- genes(edb, filter = sf, return.type = "data.frame")
+ checkEquals(Res$gene_id, "ENSG00000182628")
+ ## We need now also a column "symbol"!
+ checkEquals(Res$symbol, Res$gene_name)
+ ## Asking explicitly for symbol
+ Res <- genes(edb, filter = sf, return.type = "data.frame",
+ columns = c("symbol", "gene_id"))
+ checkEquals(colnames(Res), c("symbol", "gene_id"))
+ ## Some more stuff, also shuffling the order.
+ Res <- genes(edb, filter = sf, return.type = "data.frame",
+ columns = c("gene_name", "symbol", "gene_id"))
+ checkEquals(colnames(Res), c("gene_name", "symbol", "gene_id"))
+ Res <- genes(edb, filter = sf, return.type = "data.frame",
+ columns = c("gene_id", "gene_name", "symbol"))
+ checkEquals(colnames(Res), c("gene_id", "gene_name", "symbol"))
+ ## And with GRanges as return type.
+ Res <- genes(edb, filter = sf, return.type = "GRanges",
+ columns = c("gene_id", "gene_name", "symbol"))
+ checkEquals(colnames(mcols(Res)), c("gene_id", "gene_name", "symbol"))
+
+ ## Combine tx_name and symbol
+ Res <- genes(edb, filter = sf, columns = c("tx_name", "symbol"),
+ return.type = "data.frame")
+ checkEquals(colnames(Res), c("tx_name", "symbol", "gene_id"))
+ checkTrue(all(Res$symbol == "SKA2"))
+
+ ## Test for transcripts
+ Res <- transcripts(edb, filter=sf, return.type="data.frame")
+ checkTrue(all(Res$symbol == "SKA2"))
+ Res <- transcripts(edb, filter = sf, return.type = "data.frame",
+ columns = c("symbol", "tx_id", "gene_name"))
+ checkTrue(all(Res$symbol == "SKA2"))
+ checkEquals(Res$symbol, Res$gene_name)
+ checkEquals(colnames(Res), c("symbol", "tx_id", "gene_name"))
+
+ ## Test for exons
+ Res <- exons(edb, filter=sf, return.type="data.frame")
+ checkTrue(all(Res$symbol == "SKA2"))
+ Res <- exons(edb, filter = c(sf, TxbiotypeFilter("nonsense_mediated_decay")),
+ return.type = "data.frame",
+ columns = c("symbol", "tx_id", "gene_name"))
+ checkTrue(all(Res$symbol == "SKA2"))
+ checkEquals(Res$symbol, Res$gene_name)
+ checkEquals(colnames(Res), c("symbol", "tx_id", "gene_name", "exon_id", "tx_biotype"))
+
+ ## Test for exonsBy
+ Res <- exonsBy(edb, filter=sf)
+ checkTrue(all(unlist(Res)$symbol == "SKA2"))
+ Res <- exonsBy(edb, filter = c(sf, TxbiotypeFilter("nonsense_mediated_decay")),
+ columns = c("symbol", "tx_id", "gene_name"))
+ checkTrue(all(unlist(Res)$symbol == "SKA2"))
+
+ checkEquals(unlist(Res)$symbol, unlist(Res)$gene_name)
+
+ ## Test for transcriptsBy too
+}
+
+
+## Here we want to test if we get always also the filter columns back.
+test_multiFilterReturnCols <- function() {
+ cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+ filter = SymbolFilter("SKA2"))
+ checkEquals(cols, c("exon_id", "symbol"))
+ ## Two filter
+ cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+ filter = list(SymbolFilter("SKA2"),
+ GenenameFilter("SKA2")))
+ checkEquals(cols, c("exon_id", "symbol", "gene_name"))
+ cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+ filter = list(SymbolFilter("SKA2"),
+ GenenameFilter("SKA2"),
+ GRangesFilter(GRanges("3",
+ IRanges(3, 5)
+ ))))
+ checkEquals(cols, c("exon_id", "symbol", "gene_name", "gene_seq_start",
+ "gene_seq_end", "seq_name", "seq_strand"))
+ cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+ filter = list(SymbolFilter("SKA2"),
+ GenenameFilter("SKA2"),
+ GRangesFilter(GRanges("3",
+ IRanges(3, 5)
+ ),
+ feature = "exon")))
+ checkEquals(cols, c("exon_id", "symbol", "gene_name", "exon_seq_start",
+ "exon_seq_end", "seq_name", "seq_strand"))
+ ## SeqstartFilter and GRangesFilter
+ ssf <- SeqstartFilter(123, feature="tx")
+ cols <- ensembldb:::addFilterColumns(edb, cols = c("exon_id"),
+ filter = list(SymbolFilter("SKA2"),
+ GenenameFilter("SKA2"),
+ GRangesFilter(GRanges("3",
+ IRanges(3, 5)
+ ),
+ feature = "exon"),
+ ssf))
+ checkEquals(cols, c("exon_id", "symbol", "gene_name", "exon_seq_start",
+ "exon_seq_end", "seq_name", "seq_strand", "tx_seq_start"))
+
+}
+
diff --git a/inst/unitTests/test_Functionality.R b/inst/unitTests/test_Functionality.R
new file mode 100644
index 0000000..bbb79ce
--- /dev/null
+++ b/inst/unitTests/test_Functionality.R
@@ -0,0 +1,507 @@
+## that's just a plain simple R-script calling the standard methods.
+
+library( "EnsDb.Hsapiens.v75" )
+DB <- EnsDb.Hsapiens.v75
+
+## testing genes method.
+test_genes <- function(){
+ Gns <- genes(DB, filter=SeqnameFilter("Y"))
+ Gns <- genes(DB, filter=SeqnameFilter("Y"), return.type="DataFrame")
+ checkEquals(sort(colnames(Gns)), sort(listColumns(DB, "gene")))
+ Gns <- genes(DB, filter=SeqnameFilter("Y"), return.type="DataFrame",
+ columns=c("gene_id", "tx_name"))
+ checkEquals(colnames(Gns), c("gene_id", "tx_name", "seq_name"))
+
+ Gns <- genes(DB, filter=SeqnameFilter("Y"), columns=c("gene_id", "gene_name"))
+ ## Here we don't need the seqnames in mcols!
+ checkEquals(colnames(mcols(Gns)), c("gene_id", "gene_name"))
+
+
+ ## checkEquals(class(genes(DB, return.type="DataFrame",
+ ## filter=list(SeqnameFilter("Y")))), "DataFrame" )
+}
+
+test_transcripts <- function(){
+ Tns <- transcripts(DB, filter=SeqnameFilter("Y"), return.type="DataFrame")
+ checkEquals(sort(colnames(Tns)), sort(c(listColumns(DB, "tx"), "seq_name")))
+
+ Tns <- transcripts(DB, columns=c("tx_id", "tx_name"), filter=SeqnameFilter("Y"))
+ checkEquals(sort(colnames(mcols(Tns))), sort(c("tx_id", "tx_name")))
+
+ ## Check the default ordering.
+ Tns <- transcripts(DB, filter = TxbiotypeFilter("protein_coding"),
+ return.type = "data.frame",
+ columns = c("seq_name", listColumns(DB, "tx")))
+ checkEquals(order(Tns$seq_name, method = "radix"), 1:nrow(Tns))
+}
+
+test_transcriptsBy <- function(){
+ ## Expect results on the forward strand to be ordered by tx_seq_start
+ res <- transcriptsBy(DB, filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+")),
+ by = "gene")
+ fw <- res[[3]]
+ checkEquals(order(start(fw)), 1:length(fw))
+ ## Expect results on the reverse strand to be ordered by -tx_seq_end
+ res <- transcriptsBy(DB, filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-")),
+ by = "gene")
+ rv <- res[[3]]
+ checkEquals(order(start(rv), decreasing = TRUE), 1:length(rv))
+}
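+
+## Illustrative sketch only (not part of the upstream tests): the ordering
+## checks in these tests rely on order() returning 1..n for a vector that is
+## already sorted. 'isOrdered' is a hypothetical base-R helper showing the
+## same idiom on plain numeric vectors.
+isOrdered <- function(x, decreasing = FALSE){
+    identical(order(x, decreasing = decreasing), seq_along(x))
+}
+isOrdered(c(10, 20, 35))
+isOrdered(c(35, 20, 10), decreasing = TRUE)
+isOrdered(c(20, 10, 35))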
+
+test_exons <- function(){
+ Exns <- exons(DB, filter=SeqnameFilter("Y"), return.type="DataFrame")
+ checkEquals(sort(colnames(Exns)), sort(c(listColumns(DB, "exon"), "seq_name")))
+
+ ## Check correct ordering.
+ Exns <- exons(DB, return.type = "data.frame", filter = SeqnameFilter(20:23))
+ checkEquals(order(Exns$seq_name, method = "radix"), 1:nrow(Exns))
+}
+
+test_exonsBy <- function() {
+ ##ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("X")), by="tx")
+ ExnsBy <- exonsBy(DB, filter = list(SeqnameFilter("Y")), by = "tx",
+ columns = c("tx_name"))
+ checkEquals(sort(colnames(mcols(ExnsBy[[1]]))),
+ sort(c("exon_id", "exon_rank", "tx_name")))
+
+ ## Check what happens if we specify tx_id.
+ ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y")), by="tx",
+ columns=c("tx_id"))
+ checkEquals(sort(colnames(mcols(ExnsBy[[1]]))),
+ sort(c("exon_id", "exon_rank", "tx_id")))
+
+ ## ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y")), by="tx",
+ ## columns=c("exon_rank"))
+ ## checkEquals(sort(colnames(mcols(ExnsBy[[1]]))),
+ ## sort(c("exon_id", "exon_rank")))
+
+ ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")),
+ by="gene")
+ ## Check that ordering is on start on the forward strand.
+ fw <- ExnsBy[[3]]
+ checkEquals(order(start(fw)), 1:length(fw))
+ ##
+ ExnsBy <- exonsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")),
+ by="gene")
+ ## Check that ordering is by decreasing end on the reverse strand.
+ rv <- ExnsBy[[3]]
+ checkEquals(order(end(rv), decreasing = TRUE), 1:length(rv))
+}
+
+test_dbfunctionality <- function(){
+ GBT <- listGenebiotypes(DB)
+ TBT <- listTxbiotypes(DB)
+}
+
+## test if we get the expected exceptions if we're not submitting
+## correct filter objects
+test_filterExceptions <- function(){
+ checkException(genes(DB, filter="d"))
+ checkException(genes(DB, filter=list(SeqnameFilter("X"),
+ "z")))
+ checkException(transcripts(DB, filter="d"))
+ checkException(transcripts(DB, filter=list(SeqnameFilter("X"),
+ "z")))
+ checkException(exons(DB, filter="d"))
+ checkException(exons(DB, filter=list(SeqnameFilter("X"),
+ "z")))
+ checkException(exonsBy(DB, filter="d"))
+ checkException(exonsBy(DB, filter=list(SeqnameFilter("X"),
+ "z")))
+ checkException(transcriptsBy(DB, filter="d"))
+ checkException(transcriptsBy(DB, filter=list(SeqnameFilter("X"),
+ "z")))
+}
+
+test_promoters <- function(){
+ promoters(EnsDb.Hsapiens.v75, filter=GeneidFilter(c("ENSG00000184895",
+ "ENSG00000092377")))
+}
+
+test_return_columns_gene <- function(){
+ cols <- c("gene_name", "tx_id")
+ Resu <- genes(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="data.frame")
+ checkEquals(sort(c(cols, "seq_name", "gene_id")), sort(colnames(Resu)))
+
+ Resu <- genes(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="DataFrame")
+ checkEquals(sort(c(cols, "seq_name", "gene_id")), sort(colnames(Resu)))
+
+ Resu <- genes(DB, filter=SeqnameFilter("Y"), columns=cols)
+ checkEquals(sort(c(cols, "gene_id")), sort(colnames(mcols(Resu))))
+}
+
+test_return_columns_tx <- function(){
+ cols <- c("tx_id", "exon_id", "tx_biotype")
+ Resu <- transcripts(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="data.frame")
+ checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+
+ Resu <- transcripts(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="DataFrame")
+ checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+
+ Resu <- transcripts(DB, filter=SeqnameFilter("Y"), columns=cols)
+ checkEquals(sort(cols), sort(colnames(mcols(Resu))))
+}
+test_return_columns_exon <- function(){
+ cols <- c("tx_id", "exon_id", "tx_biotype")
+ Resu <- exons(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="data.frame")
+ checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+
+ Resu <- exons(DB, filter=SeqnameFilter("Y"), columns=cols, return.type="DataFrame")
+ checkEquals(sort(c(cols, "seq_name")), sort(colnames(Resu)))
+
+ Resu <- exons(DB, filter=SeqnameFilter("Y"), columns=cols)
+ checkEquals(sort(cols), sort(colnames(mcols(Resu))))
+}
+
+test_cdsBy <- function(){
+ ## Just checking if we get also tx_name
+ cs <- cdsBy(DB, filter=SeqnameFilter("Y"), column="tx_name")
+ checkTrue(any(colnames(mcols(cs[[1]])) == "tx_name"))
+
+ do.plot <- FALSE
+ ## By tx
+ cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")))
+ tx <- exonsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")))
+ ## Check for the first if it makes sense:
+ whichTx <- names(cs)[1]
+ whichCs <- cs[[1]]
+ tx <- transcripts(DB, filter=TxidFilter(whichTx),
+ columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
+ "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
+ "exon_idx", "exon_id", "seq_strand"),
+ return.type="data.frame")
+ checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
+ ## Next one:
+ whichTx <- names(cs)[2]
+ tx <- transcripts(DB, filter=TxidFilter(whichTx),
+ columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
+ "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
+ "exon_idx", "exon_id"), return.type="data.frame")
+ checkSingleTx(tx=tx, cds=cs[[2]], do.plot=do.plot)
+
+ ## Now for reverse strand:
+ cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")))
+ whichTx <- names(cs)[1]
+ whichCs <- cs[[1]]
+ tx <- transcripts(DB, filter=TxidFilter(whichTx),
+ columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
+ "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
+ "exon_idx", "exon_id"), return.type="data.frame")
+ ## order the guys by seq_start
+ whichCs <- whichCs[order(start(whichCs))]
+ checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
+ ## Next one:
+ whichTx <- names(cs)[2]
+ whichCs <- cs[[2]]
+ tx <- transcripts(DB, filter=TxidFilter(whichTx),
+ columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start",
+ "tx_cds_seq_end", "exon_seq_start", "exon_seq_end",
+ "exon_idx", "exon_id"), return.type="data.frame")
+ ## order the guys by seq_start
+ whichCs <- whichCs[order(start(whichCs))]
+ checkSingleTx(tx=tx, cds=whichCs, do.plot=do.plot)
+
+ ## Check adding columns
+ Test <- cdsBy(DB, filter=list(SeqnameFilter("Y")),
+ columns=c("gene_biotype", "gene_name"))
+}
+
+test_cdsByGene <- function(){
+ do.plot <- FALSE
+ ## By gene.
+ cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")),
+ by="gene", columns=NULL)
+ checkSingleGene(cs[[1]], gene=names(cs)[[1]], do.plot=do.plot)
+ checkSingleGene(cs[[2]], gene=names(cs)[[2]], do.plot=do.plot)
+ ## - strand
+ cs <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("-")),
+ by="gene", columns=NULL)
+ checkSingleGene(cs[[1]], gene=names(cs)[[1]], do.plot=do.plot)
+ checkSingleGene(cs[[2]], gene=names(cs)[[2]], do.plot=do.plot)
+
+ ## looks good!
+ cs2 <- cdsBy(DB, filter=list(SeqnameFilter("Y"), SeqstrandFilter("+")),
+ by="gene", use.names=TRUE)
+}
+
+test_UTRs <- function() {
+ ## check presence of tx_name
+ fUTRs <- fiveUTRsByTranscript(DB,
+ filter = TxidFilter("ENST00000155093"),
+ column = "tx_name")
+ checkTrue(any(colnames(mcols(fUTRs[[1]])) == "tx_name"))
+
+ do.plot <- FALSE
+ fUTRs <- fiveUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+")))
+ tUTRs <- threeUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+")))
+ cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+")))
+ ## Check a TX:
+ tx <- names(fUTRs)[1]
+ checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+ do.plot = do.plot)
+ tx <- names(fUTRs)[2]
+ checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+ do.plot = do.plot)
+ tx <- names(fUTRs)[3]
+ checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+ do.plot = do.plot)
+
+ ## Reverse strand
+ fUTRs <- fiveUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-")))
+ tUTRs <- threeUTRsByTranscript(DB, filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-")))
+ cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-")))
+ ## Check a TX:
+ tx <- names(fUTRs)[1]
+ checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+ do.plot = do.plot)
+ tx <- names(fUTRs)[2]
+ checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+ do.plot = do.plot)
+ tx <- names(fUTRs)[3]
+ checkGeneUTRs(fUTRs[[tx]], tUTRs[[tx]], cds[[tx]], tx = tx,
+ do.plot = do.plot)
+}
+
+## "test_UTRs" performs very poorly with the RSQLite 1.0.9011
+## release candidate. Here we want to evaluate the performance.
+dontrun_test_UTRs_performance <- function() {
+ system.time(fUTRs <- fiveUTRsByTranscript(DB,
+ filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+")),
+ column = "tx_name")
+ )
+ ## 6.4 secs.
+ system.time(fUTRs <- fiveUTRsByTranscript(DB,
+ filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+"))))
+ ## 6.4 secs.
+ system.time(tUTRs <- threeUTRsByTranscript(DB,
+ filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+"))))
+ ## 6.3 secs.
+ system.time(cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("+"))))
+ ## 6.3 secs.
+ system.time(fUTRs <- fiveUTRsByTranscript(DB,
+ filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-"))))
+ ## 6.4 secs.
+ system.time(tUTRs <- threeUTRsByTranscript(DB,
+ filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-"))))
+ ## 6.6 secs.
+ system.time(cds <- cdsBy(DB, "tx", filter = list(SeqnameFilter("Y"),
+ SeqstrandFilter("-"))))
+ ## 6.3 secs.
+}
+
+checkGeneUTRs <- function(f, t, c, tx, do.plot=FALSE){
+ if(any(strand(c) == "+")){
+ ## End of five UTR has to be smaller than any start of cds
+ checkTrue(max(end(f)) < min(start(c)))
+ ## 3'
+ checkTrue(min(start(t)) > max(end(c)))
+ }else{
+ ## 5'
+ checkTrue(min(start(f)) > max(end(c)))
+ ## 3'
+ checkTrue(max(end(t)) < min(start(c)))
+ }
+ ## just plot...
+ if(do.plot){
+ tx <- transcripts(DB, filter=TxidFilter(tx), columns=c("exon_seq_start", "exon_seq_end"),
+ return.type="data.frame")
+ XL <- range(c(start(f), start(c), start(t), end(f), end(c), end(t)))
+ YL <- c(0, 4)
+ plot(4, 4, pch=NA, xlim=XL, ylim=YL, yaxt="n", ylab="", xlab="")
+ ## five UTR
+ rect(xleft=start(f), xright=end(f), ybottom=0.1, ytop=0.9, col="blue")
+ ## cds
+ rect(xleft=start(c), xright=end(c), ybottom=1.1, ytop=1.9)
+ ## three UTR
+ rect(xleft=start(t), xright=end(t), ybottom=2.1, ytop=2.9, col="red")
+ ## all exons
+ rect(xleft=tx$exon_seq_start, xright=tx$exon_seq_end, ybottom=3.1, ytop=3.9)
+ }
+}
+
+checkSingleGene <- function(whichCs, gene, do.plot=FALSE){
+ tx <- transcripts(DB, filter=GeneidFilter(gene),
+ columns=c("tx_seq_start", "tx_seq_end", "tx_cds_seq_start", "tx_cds_seq_end", "tx_id",
+ "exon_id", "exon_seq_start", "exon_seq_end"), return.type="data.frame")
+ XL <- range(tx[, c("tx_seq_start", "tx_seq_end")])
+ tx <- split(tx, f=tx$tx_id)
+ if(do.plot){
+ ##XL <- range(c(start(whichCs), end(whichCs)))
+ YL <- c(0, length(tx) + 1)
+ plot(4, 4, pch=NA, xlim=XL, ylim=YL, yaxt="n", ylab="", xlab="")
+ ## plot the txses
+ for(i in 1:length(tx)){
+ current <- tx[[i]]
+ rect(xleft=current$exon_seq_start, xright=current$exon_seq_end,
+ ybottom=rep((i-1+0.1), nrow(current)), ytop=rep((i-0.1), nrow(current)))
+ ## coding:
+ rect(xleft=current$tx_cds_seq_start, xright=current$tx_cds_seq_end,
+ ybottom=rep((i-1+0.1), nrow(current)), ytop=rep((i-0.1), nrow(current)),
+ border="blue")
+ }
+ rect(xleft=start(whichCs), xright=end(whichCs), ybottom=rep(length(tx)+0.1, length(whichCs)),
+ ytop=rep(length(tx)+0.9, length(whichCs)), border="red")
+ }
+}
+
+checkSingleTx <- function(tx, cds, do.plot=FALSE){
+ rownames(tx) <- tx$exon_id
+ tx <- tx[cds$exon_id, ]
+ ## cds start and end have to be within the correct range.
+ checkTrue(all(start(cds) >= min(tx$tx_cds_seq_start)))
+ checkTrue(all(end(cds) <= max(tx$tx_cds_seq_end)))
+ ## For all except the first and the last we have to assume that exon_seq_start
+ ## is equal to start of cds.
+ checkTrue(all(start(cds)[-1] == tx$exon_seq_start[-1]))
+ checkTrue(all(end(cds)[-nrow(tx)] == tx$exon_seq_end[-nrow(tx)]))
+ ## just plotting the stuff...
+ if(do.plot){
+ XL <- range(tx[, c("exon_seq_start", "exon_seq_end")])
+ YL <- c(0, 4)
+ plot(3, 3, pch=NA, xlim=XL, ylim=YL, xlab="", yaxt="n", ylab="")
+ ## plotting the "real" exons:
+ rect(xleft=tx$exon_seq_start, xright=tx$exon_seq_end, ybottom=rep(0, nrow(tx)),
+ ytop=rep(1, nrow(tx)))
+ ## plotting the cds:
+ rect(xleft=start(cds), xright=end(cds), ybottom=rep(1.2, nrow(tx)),
+ ytop=rep(2.2, nrow(tx)), col="blue")
+ }
+}
+
+
+##*****************************************************************
+## Gviz stuff
+notrun_test_genetrack_df <- function(){
+ do.plot <- FALSE
+ if(do.plot){
+ library(Gviz)
+ options(ucscChromosomeNames=FALSE)
+ data(geneModels)
+ geneModels$chromosome <- 7
+ chr <- 7
+ start <- min(geneModels$start)
+ end <- max(geneModels$end)
+ myGeneModels <- getGeneRegionTrackForGviz(DB, chromosome=chr, start=start,
+ end=end)
+ ## chromosome has to be the same....
+ gtrack <- GenomeAxisTrack()
+ gvizTrack <- GeneRegionTrack(geneModels, name="Gviz")
+ ensdbTrack <- GeneRegionTrack(myGeneModels, name="ensdb")
+ plotTracks(list(gtrack, gvizTrack, ensdbTrack))
+ plotTracks(list(gtrack, gvizTrack, ensdbTrack), from=26700000, to=26780000)
+ ## Looks very nice...
+ }
+ ## Put the stuff below into the vignette:
+ ## Next we get all lincRNAs on chromosome Y
+ Lncs <- getGeneRegionTrackForGviz(DB,
+ filter=list(SeqnameFilter("Y"),
+ GenebiotypeFilter("lincRNA")))
+ Prots <- getGeneRegionTrackForGviz(DB,
+ filter=list(SeqnameFilter("Y"),
+ GenebiotypeFilter("protein_coding")))
+ if(do.plot){
+ plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
+ GeneRegionTrack(Prots, name="proteins")))
+ plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
+ GeneRegionTrack(Prots, name="proteins")),
+ from=5000000, to=7000000, transcriptAnnotation="symbol")
+ }
+ ## Is that the same as:
+ TestL <- getGeneRegionTrackForGviz(DB,
+ filter=list(GenebiotypeFilter("lincRNA")),
+ chromosome="Y", start=5000000, end=7000000)
+ TestP <- getGeneRegionTrackForGviz(DB,
+ filter=list(GenebiotypeFilter("protein_coding")),
+ chromosome="Y", start=5000000, end=7000000)
+ if(do.plot){
+ plotTracks(list(gtrack, GeneRegionTrack(Lncs, name="lincRNAs"),
+ GeneRegionTrack(Prots, name="proteins"),
+ GeneRegionTrack(TestL, name="compareL"),
+ GeneRegionTrack(TestP, name="compareP")),
+ from=5000000, to=7000000, transcriptAnnotation="symbol")
+ }
+ checkTrue(all(TestL$exon %in% Lncs$exon))
+ checkTrue(all(TestP$exon %in% Prots$exon))
+ ## Crazy amazing stuff
+ ## system.time(
+ ## All <- getGeneRegionTrackForGviz(DB)
+ ## )
+}
+
+####============================================================
+## length stuff
+##
+####------------------------------------------------------------
+test_lengthOf <- function(){
+ system.time(
+ lenY <- lengthOf(DB, "tx", filter=SeqnameFilter("Y"))
+ )
+ ## Check what would happen if we do it ourselves...
+ system.time(
+ lenY2 <- sum(width(reduce(exonsBy(DB, "tx", filter=SeqnameFilter("Y")))))
+ )
+ checkEquals(lenY, lenY2)
+ ## Same for genes.
+ system.time(
+ lenY <- lengthOf(DB, "gene", filter=SeqnameFilter("Y"))
+ )
+ ## Check what would happen if we do it ourselves...
+ system.time(
+ lenY2 <- sum(width(reduce(exonsBy(DB, "gene", filter=SeqnameFilter("Y")))))
+ )
+ checkEquals(lenY, lenY2)
+ ## Just using the transcriptLengths
+
+
+}
+
+####============================================================
+## ExonrankFilter
+##
+####------------------------------------------------------------
+test_ExonrankFilter <- function(){
+ txs <- transcripts(DB, columns=c("exon_id", "exon_idx"),
+ filter=SeqnameFilter(c("Y")))
+ txs <- txs[order(names(txs))]
+
+ txs2 <- transcripts(DB, columns=c("exon_id"),
+ filter=list(SeqnameFilter(c("Y")),
+ ExonrankFilter(3)))
+ txs2 <- txs2[order(names(txs2))]
+ ## hm, that's weird somehow.
+ exns <- exons(DB, columns=c("tx_id", "exon_idx"),
+ filter=list(SeqnameFilter("Y"),
+ ExonrankFilter(3)))
+ checkTrue(all(exns$exon_idx == 3))
+ exns <- exons(DB, columns=c("tx_id", "exon_idx"),
+ filter=list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition="<")))
+ checkTrue(all(exns$exon_idx < 3))
+}
+
+
+notrun_lengthOf <- function(){
+ ## How does TxDb do that?
+ library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+ txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+ Test <- transcriptLengths(txdb)
+ head(Test)
+}
+
+
+
+
diff --git a/inst/unitTests/test_GFF.R b/inst/unitTests/test_GFF.R
new file mode 100644
index 0000000..d9ef256
--- /dev/null
+++ b/inst/unitTests/test_GFF.R
@@ -0,0 +1,179 @@
+notrun_test_builds <- function(){
+ input <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
+ fromGtf <- ensDbFromGtf(input, outfile=tempfile())
+ ## provide wrong ensembl version
+ fromGtf <- ensDbFromGtf(input, outfile=tempfile(), version="75")
+ ## provide wrong genome version
+ fromGtf <- ensDbFromGtf(input, outfile=tempfile(), genomeVersion="75")
+ EnsDb(fromGtf)
+ ## provide wrong organism
+ fromGtf <- ensDbFromGtf(input, outfile=tempfile(), organism="blalba")
+ EnsDb(fromGtf)
+ ## GFF
+ input <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.chr.gff3.gz"
+ fromGff <- ensDbFromGff(input, outfile=tempfile())
+ EnsDb(fromGff)
+ fromGff <- ensDbFromGff(input, outfile=tempfile(), version="75")
+ EnsDb(fromGff)
+ fromGff <- ensDbFromGff(input, outfile=tempfile(), genomeVersion="bla")
+ EnsDb(fromGff)
+ fromGff <- ensDbFromGff(input, outfile=tempfile(), organism="blabla")
+ EnsDb(fromGff)
+
+ ## AH
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+ fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile())
+ EnsDb(fromAh)
+ fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), version="75")
+ EnsDb(fromAh)
+ fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), genomeVersion="bla")
+ EnsDb(fromAh)
+ fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile(), organism="blabla")
+ EnsDb(fromAh)
+}
+
+
+
+notrun_test_ensdbFromGFF <- function(){
+ library(ensembldb)
+ ##library(rtracklayer)
+ ## VERSION 83
+ gtf <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
+ fromGtf <- ensDbFromGtf(gtf, outfile=tempfile())
+ egtf <- EnsDb(fromGtf)
+
+ gff <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gff3.gz"
+ fromGff <- ensDbFromGff(gff, outfile=tempfile())
+ egff <- EnsDb(fromGff)
+
+ ## Compare EnsDbs
+ ensembldb:::compareEnsDbs(egtf, egff)
+ ## OK, only Entrezgene ID "problems"
+
+ ## Compare with the one built with the Perl API
+ library(EnsDb.Hsapiens.v83)
+ edb <- EnsDb.Hsapiens.v83
+
+ ensembldb:::compareEnsDbs(egtf, edb)
+
+ ensembldb:::compareEnsDbs(egff, edb)
+ ## OK, I get different genes...
+ genes1 <- genes(egtf)
+ genes2 <- genes(edb)
+
+ only2 <- genes2[!(genes2$gene_id %in% genes1$gene_id)]
+
+ ## That below was before the fix to include feature type start_codon and stop_codon
+ ## to the CDS type.
+ ## Identify which are the different transcripts:
+ txGtf <- transcripts(egtf)
+ txGff <- transcripts(egff)
+ commonIds <- intersect(names(txGtf), names(txGff))
+ haveCds <- commonIds[!is.na(txGtf[commonIds]$tx_cds_seq_start) & !is.na(txGff[commonIds]$tx_cds_seq_start)]
+ diffs <- haveCds[txGtf[haveCds]$tx_cds_seq_start != txGff[haveCds]$tx_cds_seq_start]
+ head(diffs)
+
+ ## What could be reasons?
+ ## 1) alternative CDS?
+ ## Checking the GTF:
+ ## tx ENST00000623834: start_codon: 195409 195411.
+ ## first CDS: 195259 195411.
+ ## last CDS: 185220 185350.
+ ## stop_codon: 185217 185219.
+ ## So, why the heck is the stop codon OUTSIDE the CDS???
+ ## library(rtracklayer)
+ ## theGtf <- import(gtf, format="gtf")
+ ## ## Apparently, the GTF contains the additional elements start_codon/stop_codon.
+ ## theGff <- import(gff, format="gff3")
+
+
+ ## transcripts(egtf, filter=TxidFilter(diffs[1]))
+ ## transcripts(egff, filter=TxidFilter(diffs[1]))
+
+
+ ## VERSION 81
+ ## Try to get the same via AnnotationHub
+ gff <- "/Users/jo/Projects/EnsDbs/81/homo_sapiens/Homo_sapiens.GRCh38.81.gff3.gz"
+ fromGff <- ensDbFromGff(gff, outfile=tempfile())
+ egff <- EnsDb(fromGff)
+
+ gtf <- "/Users/jo/Projects/EnsDbs/81/homo_sapiens/Homo_sapiens.GRCh38.81.gtf.gz"
+ fromGtf <- ensDbFromGtf(gtf, outfile=tempfile())
+ egtf <- EnsDb(fromGtf)
+
+ ## Compare those two:
+ ensembldb:::compareEnsDbs(egff, egtf)
+ ## Why are there some differences in the transcripts???
+ trans1 <- transcripts(egff)
+ trans2 <- transcripts(egtf)
+ onlyInGtf <- trans2[!(trans2$tx_id %in% trans1$tx_id)]
+
+ ##gtfGRanges <- ah["AH47963"]
+
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+ fromAh <- ensDbFromAH(ah["AH47963"], outfile=tempfile()) ## That's human...
+ eah <- EnsDb(fromAh)
+
+ ## Compare it to gtf:
+ ensembldb:::compareEnsDbs(eah, egtf)
+ ## OK. Same cds starts and cds ends.
+
+ ## Compare it to gff:
+ ensembldb:::compareEnsDbs(eah, egff)
+ ## hm.
+
+ ## Compare to EnsDb
+ library(EnsDb.Hsapiens.v81)
+ edb <- EnsDb.Hsapiens.v81
+ ensembldb:::compareEnsDbs(edb, egtf)
+ ## Problem with CDS
+ ensembldb:::compareEnsDbs(edb, egff)
+ ## That's fine.
+
+ ## Summary:
+ ## GTF and AH are the same.
+ ## GFF and Perl API are the same.
+
+ ## OLD STUFF BELOW.
+
+ ##fromAh <- EnsDbFromAH(ah["AH47963"], outfile=tempfile(), organism="Homo sapiens", version=81)
+
+ ## Try with a fancy species:
+ gff <- "/Users/jo/Projects/EnsDbs/83/gadus_morhua/Gadus_morhua.gadMor1.83.gff3.gz"
+ fromGtf <- ensDbFromGff(gff, outfile=tempfile())
+
+ gff <- "/Users/jo/Projects/EnsDbs/83/rattus_norvegicus/Rattus_norvegicus.Rnor_6.0.83.gff3.gz"
+ fromGff <- ensDbFromGff(gff, outfile=tempfile())
+ ## That works.
+
+ ## Try with a file from AnnotationHub: Gorilla gorilla.
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+ ah <- ah["AH47962"]
+
+ res <- ensDbFromAH(ah, outfile=tempfile())
+ edb <- EnsDb(res)
+ genes(edb)
+
+
+ ## ensRel <- query(ah, c("GTF", "ensembl"))
+
+ ## gtf <- "/Users/jo/Projects/EnsDbs/83/Homo_sapiens.GRCh38.83.gtf.gz"
+ ## ## GTF
+ ## dir.create("/tmp/fromGtf")
+ ## fromGtf <- ensDbFromGtf(gtf, path="/tmp/fromGtf", verbose=TRUE)
+ ## ## GFF
+ ## dir.create("/tmp/fromGff")
+ ## fromGff <- ensembldb:::ensDbFromGff(gff, path="/tmp/fromGff", verbose=TRUE)
+
+ ## ## ZBTB16:
+ ## ## exon: ENSE00003606532 is 3rd exon of tx: ENST00000335953
+ ## ## exon: ENSE00003606532 is 3rd exon of tx: ENST00000392996
+ ## ## the Ensembl GFF has 2 entries for this exon.
+
+}
+
+
+
diff --git a/inst/unitTests/test_GRangeFilter.R b/inst/unitTests/test_GRangeFilter.R
new file mode 100644
index 0000000..684aa91
--- /dev/null
+++ b/inst/unitTests/test_GRangeFilter.R
@@ -0,0 +1,102 @@
+###============================================================
+## Testing the GRangesFilter
+###------------------------------------------------------------
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+test_GRangesFilterValidity <- function(){
+ checkException(GRangesFilter(value="bla"))
+ checkException(GRangesFilter(GRanges(seqnames="X", ranges=IRanges(4, 6)),
+ condition=">"))
+ ## Testing slots
+ gr <- GRanges("X", ranges=IRanges(123, 234), strand="-")
+ grf <- GRangesFilter(gr, condition="within")
+ ## Now check some stuff
+ checkEquals(start(grf), start(gr))
+ checkEquals(end(grf), end(gr))
+ checkEquals(as.character(strand(gr)), strand(grf))
+ checkEquals(as.character(seqnames(gr)), seqnames(grf))
+
+ ## Test column:
+ ## filter alone.
+ tocomp <- c(start="gene_seq_start", end="gene_seq_end", seqname="seq_name",
+ strand="seq_strand")
+ checkEquals(column(grf), tocomp)
+ grf@feature <- "tx"
+ tocomp <- c(start="tx_seq_start", end="tx_seq_end", seqname="seq_name",
+ strand="seq_strand")
+ checkEquals(column(grf), tocomp)
+ grf@feature <- "exon"
+ tocomp <- c(start="exon_seq_start", end="exon_seq_end", seqname="seq_name",
+ strand="seq_strand")
+ checkEquals(column(grf), tocomp)
+ ## filter and ensdb.
+ tocomp <- c(start="exon.exon_seq_start", end="exon.exon_seq_end", seqname="gene.seq_name",
+ strand="gene.seq_strand")
+ checkEquals(column(grf, edb), tocomp)
+ grf@feature <- "tx"
+ tocomp <- c(start="tx.tx_seq_start", end="tx.tx_seq_end", seqname="gene.seq_name",
+ strand="gene.seq_strand")
+ checkEquals(column(grf, edb), tocomp)
+ grf@feature <- "gene"
+ tocomp <- c(start="gene.gene_seq_start", end="gene.gene_seq_end", seqname="gene.seq_name",
+ strand="gene.seq_strand")
+ checkEquals(column(grf, edb), tocomp)
+
+ ## Test where:
+ ## filter alone.
+ tocomp <- "gene_seq_start >= 123 and gene_seq_end <= 234 and seq_name == 'X' and seq_strand = -1"
+ checkEquals(where(grf), tocomp)
+ ## what if we set strand to *
+ grf2 <- GRangesFilter(GRanges("1", IRanges(123, 234)))
+ tocomp <- "gene.gene_seq_start >= 123 and gene.gene_seq_end <= 234 and gene.seq_name == '1'"
+ checkEquals(where(grf2, edb), tocomp)
+
+ ## Now, using overlapping.
+ grf@location <- "overlapping"
+ grf@feature <- "transcript"
+ tocomp <- "tx.tx_seq_start <= 234 and tx.tx_seq_end >= 123 and gene.seq_name = 'X' and gene.seq_strand = -1"
+ checkEquals(where(grf, edb), tocomp)
+}
+
+## Here we check if we fetch what we expect from the database.
+test_GRangesFilterQuery <- function(){
+ do.plot <- FALSE
+ zbtb <- genes(edb, filter=GenenameFilter("ZBTB16"))
+ txs <- transcripts(edb, filter=GenenameFilter("ZBTB16"))
+
+ ## Now use the GRangesFilter to fetch all tx
+ txs2 <- transcripts(edb, filter=GRangesFilter(zbtb))
+ checkEquals(txs$tx_id, txs2$tx_id)
+
+ ## Exons:
+ exs <- exons(edb, filter=GenenameFilter("ZBTB16"))
+ exs2 <- exons(edb, filter=GRangesFilter(zbtb))
+ checkEquals(exs$exon_id, exs2$exon_id)
+
+ ## Now check the filter with "overlapping".
+ intr <- GRanges("11", ranges=IRanges(114000000, 114000050), strand="+")
+ gns <- genes(edb, filter=GRangesFilter(intr, condition="overlapping"))
+ checkEquals(gns$gene_name, "ZBTB16")
+
+ txs <- transcripts(edb, filter=GRangesFilter(intr, condition="overlapping"))
+ if(do.plot){
+ plot(3, 3, pch=NA, xlim=c(start(zbtb), end(zbtb)), ylim=c(0, length(txs2)))
+ rect(xleft=start(intr), xright=end(intr), ybottom=0, ytop=length(txs2), col="red", border="red")
+ for(i in 1:length(txs2)){
+ current <- txs2[i]
+ rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
+ text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
+ }
+ ## OK, that's OK.
+ }
+
+ ## OK, now for a GRangesFilter with more than one GRanges.
+ ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
+ end=c(2654900, 2709550, 28111790))
+ grf2 <- GRangesFilter(GRanges(rep("Y", length(ir2)), ir2), condition="overlapping")
+ Test <- transcripts(edb, filter=grf2)
+ checkEquals(names(Test), c("ENST00000383070", "ENST00000250784", "ENST00000598545"))
+
+}
+
diff --git a/inst/unitTests/test_SymbolFilter.R b/inst/unitTests/test_SymbolFilter.R
new file mode 100644
index 0000000..a29b4ec
--- /dev/null
+++ b/inst/unitTests/test_SymbolFilter.R
@@ -0,0 +1,58 @@
+############################################################
+## Testing the SymbolFilter.
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+test_sf_on_genes <- function(){
+ sf <- SymbolFilter("SKA2")
+ gnf <- GenenameFilter("SKA2")
+
+ returnFilterColumns(edb) <- FALSE
+ gns_sf <- genes(edb, filter=sf)
+ gns_gnf <- genes(edb, filter=gnf)
+ checkEquals(gns_sf, gns_gnf)
+
+ returnFilterColumns(edb) <- TRUE
+ gns_sf <- genes(edb, filter=sf)
+ checkEquals(gns_sf$gene_name, gns_sf$symbol)
+
+ ## Hm, what happens if we use both?
+ gns <- genes(edb, filter=list(sf, gnf))
+ ## All fine.
+}
+
+
+test_sf_on_tx <- function(){
+ sf <- SymbolFilter("SKA2")
+ gnf <- GenenameFilter("SKA2")
+
+ returnFilterColumns(edb) <- FALSE
+ tx_sf <- transcripts(edb, filter=sf)
+ tx_gnf <- transcripts(edb, filter=gnf)
+ checkEquals(tx_sf, tx_gnf)
+
+ returnFilterColumns(edb) <- TRUE
+ tx_sf <- transcripts(edb, filter=sf, columns=c("gene_name"))
+ checkEquals(tx_sf$gene_name, tx_sf$symbol)
+
+}
+
+
+test_sf_on_exons <- function(){
+ sf <- SymbolFilter("SKA2")
+ gnf <- GenenameFilter("SKA2")
+
+ returnFilterColumns(edb) <- FALSE
+ ex_sf <- exons(edb, filter=sf)
+ ex_gnf <- exons(edb, filter=gnf)
+ checkEquals(ex_sf, ex_gnf)
+
+ returnFilterColumns(edb) <- TRUE
+ ex_sf <- exons(edb, filter=sf, columns=c("gene_name"))
+ checkEquals(ex_sf$gene_name, ex_sf$symbol)
+}
+
+
+############################################################
+## select method
+
diff --git a/inst/unitTests/test_buildEdb.R b/inst/unitTests/test_buildEdb.R
new file mode 100644
index 0000000..c45b09f
--- /dev/null
+++ b/inst/unitTests/test_buildEdb.R
@@ -0,0 +1,45 @@
+test_ensDbFromGRanges <- function(){
+ load(system.file("YGRanges.RData", package="ensembldb"))
+ DB <- ensDbFromGRanges(Y, path=tempdir(), version=75,
+ organism="Homo_sapiens")
+ edb <- EnsDb(DB)
+ checkEquals(unname(genome(edb)), "GRCh37")
+}
+
+
+## Test some internal functions...
+test_processEnsemblFileNames <- function(){
+ Test <- "Homo_sapiens.GRCh38.83.gtf.gz"
+ checkTrue(ensembldb:::isEnsemblFileName(Test))
+ checkEquals(ensembldb:::organismFromGtfFileName(Test), "Homo_sapiens")
+ checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "GRCh38")
+ checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
+
+ Test <- "Homo_sapiens.GRCh38.83.chr.gff3.gz"
+ checkTrue(ensembldb:::isEnsemblFileName(Test))
+ checkEquals(ensembldb:::organismFromGtfFileName(Test), "Homo_sapiens")
+ checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "GRCh38")
+ checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
+
+ Test <- "Gadus_morhua.gadMor1.83.gff3.gz"
+ checkTrue(ensembldb:::isEnsemblFileName(Test))
+ checkEquals(ensembldb:::organismFromGtfFileName(Test), "Gadus_morhua")
+ checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "gadMor1")
+ checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "83")
+
+ Test <- "Solanum_lycopersicum.GCA_000188115.2.30.chr.gtf.gz"
+ checkTrue(ensembldb:::isEnsemblFileName(Test))
+ checkEquals(ensembldb:::organismFromGtfFileName(Test), "Solanum_lycopersicum")
+ checkEquals(ensembldb:::genomeVersionFromGtfFileName(Test), "GCA_000188115.2")
+ checkEquals(ensembldb:::ensemblVersionFromGtfFileName(Test), "30")
+
+ Test <- "ref_GRCh38.p2_top_level.gff3.gz"
+ checkEquals(ensembldb:::isEnsemblFileName(Test), FALSE)
+ ensembldb:::organismFromGtfFileName(Test)
+ checkException(ensembldb:::genomeVersionFromGtfFileName(Test))
+ ##checkException(ensembldb:::ensemblVersionFromGtfFileName(Test))
+}
+
+
+
+
diff --git a/inst/unitTests/test_getGenomeFaFile.R b/inst/unitTests/test_getGenomeFaFile.R
new file mode 100644
index 0000000..2dbd0b6
--- /dev/null
+++ b/inst/unitTests/test_getGenomeFaFile.R
@@ -0,0 +1,49 @@
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+notrun_test_getGenomeFaFile <- function(){
+ library(EnsDb.Hsapiens.v82)
+ edb <- EnsDb.Hsapiens.v82
+
+ ## We know that there is no Fasta file for that Ensembl release available.
+ Fa <- getGenomeFaFile(edb)
+ ## Got the one from Ensembl 81.
+ genes <- genes(edb, filter=SeqnameFilter("Y"))
+ geneSeqsFa <- getSeq(Fa, genes)
+ ## Get the transcript sequences...
+ txSeqsFa <- extractTranscriptSeqs(Fa, edb, filter=SeqnameFilter("Y"))
+
+ ## Get the TwoBitFile.
+ twob <- ensembldb:::getGenomeTwoBitFile(edb)
+ ## Get the gene sequences.
+ ## ERROR FIX BELOW WITH UPDATED VERSIONS!!!
+ geneSeqs2b <- getSeq(twob, genes)
+
+ ## Have to fix the seqnames.
+ si <- seqinfo(twob)
+ sn <- unlist(lapply(strsplit(seqnames(si), split=" ", fixed=TRUE), function(z){
+ return(z[1])
+ }))
+ seqnames(si) <- sn
+ seqinfo(twob) <- si
+
+ ## Do the same with the TwoBitFile
+ geneSeqsTB <- getSeq(twob, genes)
+
+ ## Subset to all genes that are encoded on chromosomes for which
+ ## we do have DNA sequence available.
+ genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
+
+ ## Get the gene sequences, i.e. the sequence including the sequence of
+ ## all of the gene's exons and introns.
+ geneSeqs <- getSeq(Dna, genes)
+
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+ quer <- query(ah, c("release-", "Homo sapiens"))
+ ## So, I get 2bit files and toplevel stuff.
+ Test <- ah[["AH50068"]]
+
+}
+
+
diff --git a/inst/unitTests/test_get_sequence.R b/inst/unitTests/test_get_sequence.R
new file mode 100644
index 0000000..801bf65
--- /dev/null
+++ b/inst/unitTests/test_get_sequence.R
@@ -0,0 +1,189 @@
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## That's now using the BSGenome package...
+test_extractTranscriptSeqs_with_BSGenome <- function(){
+ library(BSgenome.Hsapiens.UCSC.hg19)
+ bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+ ## Changing the seqlevels style to UCSC
+ seqlevelsStyle(edb) <- "UCSC"
+ ZBTB <- extractTranscriptSeqs(bsg, edb, filter=GenenameFilter("ZBTB16"))
+ ## Load the sequences for one ZBTB16 transcript from FA.
+ faf <- system.file("txt/ENST00000335953.fa.gz", package="ensembldb")
+ Seqs <- readDNAStringSet(faf)
+ tx <- "ENST00000335953"
+ ## cDNA
+ checkEquals(unname(as.character(ZBTB[tx])),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ ## CDS
+ cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
+ CDS <- extractTranscriptSeqs(bsg, cBy)
+ checkEquals(unname(as.character(CDS)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+ ## 5' UTR
+ fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(bsg, fBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+ ## 3' UTR
+ tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(bsg, tBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+
+
+ ## Another gene on the reverse strand:
+ faf <- system.file("txt/ENST00000200135.fa.gz", package="ensembldb")
+ Seqs <- readDNAStringSet(faf)
+ tx <- "ENST00000200135"
+ ## cDNA
+ cDNA <- extractTranscriptSeqs(bsg, edb, filter=TxidFilter(tx))
+ checkEquals(unname(as.character(cDNA)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ ## do the same, but from other strand
+ exns <- exonsBy(edb, "tx", filter=TxidFilter(tx))
+ cDNA <- extractTranscriptSeqs(bsg, exns)
+ checkEquals(unname(as.character(cDNA)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ strand(exns) <- "+"
+ cDNA <- extractTranscriptSeqs(bsg, exns)
+ checkTrue(unname(as.character(cDNA)) !=
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ ## CDS
+ cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
+ CDS <- extractTranscriptSeqs(bsg, cBy)
+ checkEquals(unname(as.character(CDS)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+ ## 5' UTR
+ fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(bsg, fBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+ ## 3' UTR
+ tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(bsg, tBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+}
+
+
+notrun_test_extractTranscriptSeqs <- function(){
+ ## Note: we can't run that by default as we cannot assume everybody has
+ ## AnnotationHub and the required resource installed.
+ ## That's how we want to test the transcript seqs.
+ genome <- getGenomeFaFile(edb)
+ ZBTB <- extractTranscriptSeqs(genome, edb, filter=GenenameFilter("ZBTB16"))
+ ## Load the sequences for one ZBTB16 transcript from FA.
+ faf <- system.file("txt/ENST00000335953.fa.gz", package="ensembldb")
+ Seqs <- readDNAStringSet(faf)
+ tx <- "ENST00000335953"
+ ## cDNA
+ checkEquals(unname(as.character(ZBTB[tx])),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ ## CDS
+ cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
+ CDS <- extractTranscriptSeqs(genome, cBy)
+ checkEquals(unname(as.character(CDS)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+ ## 5' UTR
+ fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(genome, fBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+ ## 3' UTR
+ tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(genome, tBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+
+
+ ## Another gene on the reverse strand:
+ faf <- system.file("txt/ENST00000200135.fa.gz", package="ensembldb")
+ Seqs <- readDNAStringSet(faf)
+ tx <- "ENST00000200135"
+ ## cDNA
+ cDNA <- extractTranscriptSeqs(genome, edb, filter=TxidFilter(tx))
+ checkEquals(unname(as.character(cDNA)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ ## do the same, but from other strand
+ exns <- exonsBy(edb, "tx", filter=TxidFilter(tx))
+ cDNA <- extractTranscriptSeqs(genome, exns)
+ checkEquals(unname(as.character(cDNA)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ strand(exns) <- "+"
+ cDNA <- extractTranscriptSeqs(genome, exns)
+ checkTrue(unname(as.character(cDNA)) !=
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cdna")])))
+ ## CDS
+ cBy <- cdsBy(edb, "tx", filter=TxidFilter(tx))
+ CDS <- extractTranscriptSeqs(genome, cBy)
+ checkEquals(unname(as.character(CDS)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="cds")])))
+ ## 5' UTR
+ fBy <- fiveUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(genome, fBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr5")])))
+ ## 3' UTR
+ tBy <- threeUTRsByTranscript(edb, filter=TxidFilter(tx))
+ UTR <- extractTranscriptSeqs(genome, tBy)
+ checkEquals(unname(as.character(UTR)),
+ unname(as.character(Seqs[grep(names(Seqs), pattern="utr3")])))
+}
+
+notrun_test_getCdsSequence <- function(){
+ ## That's when we like to get the sequence from the coding region.
+ genome <- getGenomeFaFile(edb)
+ tx <- extractTranscriptSeqs(genome, edb, filter=SeqnameFilter("Y"))
+ cdsSeq <- extractTranscriptSeqs(genome, cdsBy(edb, filter=SeqnameFilter("Y")))
+ ## that's basically to get the CDS sequence.
+ ## UTR sequence:
+ tutr <- extractTranscriptSeqs(genome, threeUTRsByTranscript(edb, filter=SeqnameFilter("Y")))
+ futr <- extractTranscriptSeqs(genome, fiveUTRsByTranscript(edb, filter=SeqnameFilter("Y")))
+ theTx <- "ENST00000602770"
+ fullSeq <- as.character(tx[theTx])
+ ## build the one from 5', cds and 3'
+ compSeq <- ""
+ if(any(names(futr) == theTx))
+ compSeq <- paste0(compSeq, as.character(futr[theTx]))
+ if(any(names(cdsSeq) == theTx))
+ compSeq <- paste0(compSeq, as.character(cdsSeq[theTx]))
+ if(any(names(tutr) == theTx))
+ compSeq <- paste0(compSeq, as.character(tutr[theTx]))
+ checkEquals(unname(fullSeq), compSeq)
+}
+
+notrun_test_cds <- function(){
+ library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+ txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+ cds <- cds(txdb)
+ cby <- cdsBy(txdb, by="tx")
+
+ gr <- cby[[7]][1]
+ seqlevels(gr) <- sub(seqlevels(gr), pattern="chr", replacement="")
+ tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
+ cby[[7]]
+
+ ## Note: so that fits! And we have to include the stop_codon feature for GTF import!
+ ## Make a TxDb from GTF:
+ gtf <- "/Users/jo/Projects/EnsDbs/75/homo_sapiens/Homo_sapiens.GRCh37.75.gtf.gz"
+ library(GenomicFeatures)
+ Test <- makeTxDbFromGFF(gtf, format="gtf", organism="Homo sapiens")
+ scds <- cdsBy(Test, by="tx")
+ gr <- scds[[7]][1]
+ tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
+ scds[[7]]
+ ## Compare:
+ ## TxDb from GTF has: 865692 879533
+ ## EnsDb: 865692 879533
+
+ ## Next test:
+ gr <- scds[[2]][1]
+ tx <- transcripts(edb, filter=GRangesFilter(gr, condition="overlapping"))
+ tx
+ scds[[2]]
+ ## start_codon: 367659 367661, stop_codon: 368595 368597 CDS: 367659 368594.
+ ## TxDb from GTF includes the stop_codon!
+}
+
diff --git a/inst/unitTests/test_mysql.R b/inst/unitTests/test_mysql.R
new file mode 100644
index 0000000..e1ba213
--- /dev/null
+++ b/inst/unitTests/test_mysql.R
@@ -0,0 +1,24 @@
+############################################################
+## Can not perform these tests right away, as they require a
+## working MySQL connection.
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+dontrun_test_useMySQL <- function() {
+ edb_mysql <- useMySQL(edb, user = "anonuser", host = "localhost", pass = "")
+}
+
+dontrun_test_connect_EnsDb <- function() {
+ library(RMySQL)
+ con <- dbConnect(MySQL(), user = "anonuser", host = "localhost", pass = "")
+
+ ensembldb:::listEnsDbs(dbcon = con)
+ ## just with user.
+ ensembldb:::listEnsDbs(user = "anonuser", host = "localhost", pass = "",
+ port = 3306)
+
+ ## Connecting directly to a EnsDb MySQL database.
+ con <- dbConnect(MySQL(), user = "anonuser", host = "localhost", pass = "",
+ dbname = "ensdb_hsapiens_v75")
+ edb_mysql <- EnsDb(con)
+}
diff --git a/inst/unitTests/test_ordering.R b/inst/unitTests/test_ordering.R
new file mode 100644
index 0000000..2e4f0b4
--- /dev/null
+++ b/inst/unitTests/test_ordering.R
@@ -0,0 +1,280 @@
+############################################################
+## Some tests on the ordering/sorting of the results.
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Compare the results for genes call with and without ordering in R
+test_ordering_genes <- function() {
+ orig <- ensembldb:::orderResultsInR(edb)
+ ensembldb:::orderResultsInR(edb) <- FALSE
+ res_sql <- genes(edb, return.type = "data.frame")
+ ensembldb:::orderResultsInR(edb) <- TRUE
+ res_r <- genes(edb, return.type = "data.frame")
+ rownames(res_sql) <- NULL
+ rownames(res_r) <- NULL
+ checkIdentical(res_sql, res_r)
+ ## Join tx table
+ ensembldb:::orderResultsInR(edb) <- FALSE
+ res_sql <- genes(edb, columns = c("gene_id", "tx_id"),
+ return.type = "data.frame")
+ ensembldb:::orderResultsInR(edb) <- TRUE
+ res_r <- genes(edb, columns = c("gene_id", "tx_id"),
+ return.type = "data.frame")
+ rownames(res_sql) <- NULL
+ rownames(res_r) <- NULL
+ checkIdentical(res_sql, res_r)
+ ## Join tx table and use a SeqnameFilter
+ ensembldb:::orderResultsInR(edb) <- FALSE
+ res_sql <- genes(edb, columns = c("gene_id", "tx_id"),
+ filter = SeqnameFilter("Y"))
+ ensembldb:::orderResultsInR(edb) <- TRUE
+ res_r <- genes(edb, columns = c("gene_id", "tx_id"),
+ filter = SeqnameFilter("Y"))
+ checkIdentical(res_sql, res_r)
+
+ ensembldb:::orderResultsInR(edb) <- orig
+}
+
+dontrun_benchmark_ordering_genes <- function() {
+ .withR <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- TRUE
+ genes(x, ...)
+ }
+ .withSQL <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- FALSE
+ genes(x, ...)
+ }
+ library(microbenchmark)
+ microbenchmark(.withR(edb), .withSQL(edb), times = 10) ## same
+ microbenchmark(.withR(edb, columns = c("gene_id", "tx_id")),
+ .withSQL(edb, columns = c("gene_id", "tx_id")),
+ times = 10) ## R slightly faster.
+ microbenchmark(.withR(edb, columns = c("gene_id", "tx_id"),
+ SeqnameFilter("Y")),
+ .withSQL(edb, columns = c("gene_id", "tx_id"),
+ SeqnameFilter("Y")),
+ times = 10) ## same.
+}
+
+## We aim to fix issue #11 by performing the ordering in R instead
+## of SQL. Thus, we don't want to run this as a "regular" test
+## case.
+dontrun_test_ordering_cdsBy <- function() {
+ doBench <- FALSE
+ if (doBench)
+ library(microbenchmark)
+ .withR <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- TRUE
+ cdsBy(x, ...)
+ }
+ .withSQL <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- FALSE
+ cdsBy(x, ...)
+ }
+ res_sql <- .withSQL(edb)
+ res_r <- .withR(edb)
+ checkEquals(res_sql, res_r)
+ if (doBench)
+ microbenchmark(.withSQL(edb), .withR(edb),
+ times = 3) ## R slightly faster.
+ res_sql <- .withSQL(edb, filter = SeqnameFilter("Y"))
+ res_r <- .withR(edb, filter = SeqnameFilter("Y"))
+ checkEquals(res_sql, res_r)
+ if (doBench)
+ microbenchmark(.withSQL(edb, filter = SeqnameFilter("Y")),
+ .withR(edb, filter = SeqnameFilter("Y")),
+ times = 10) ## R 6x faster.
+}
+
+dontrun_test_ordering_exonsBy <- function() {
+ doBench <- FALSE
+ if (doBench)
+ library(microbenchmark)
+ .withR <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- TRUE
+ exonsBy(x, ...)
+ }
+ .withSQL <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- FALSE
+ exonsBy(x, ...)
+ }
+ res_sql <- .withSQL(edb)
+ res_r <- .withR(edb)
+ checkEquals(res_sql, res_r)
+ if (doBench)
+ microbenchmark(.withSQL(edb), .withR(edb),
+ times = 3) ## about the same; R slightly faster.
+ ## with using a SeqnameFilter in addition.
+ res_sql <- .withSQL(edb, filter = SeqnameFilter("Y"))
+ res_r <- .withR(edb, filter = SeqnameFilter("Y")) ## query takes longer.
+ checkEquals(res_sql, res_r)
+ if (doBench)
+ microbenchmark(.withSQL(edb, filter = SeqnameFilter("Y")),
+ .withR(edb, filter = SeqnameFilter("Y")),
+ times = 3) ## SQL twice as fast.
+ ## Now getting stuff by gene
+ res_sql <- .withSQL(edb, by = "gene")
+ res_r <- .withR(edb, by = "gene")
+ ## checkEquals(res_sql, res_r) ## Differences due to ties
+ if (doBench)
+ microbenchmark(.withSQL(edb, by = "gene"),
+ .withR(edb, by = "gene"),
+ times = 3) ## SQL faster; ???
+ ## Along with a SeqnameFilter
+ res_sql <- .withSQL(edb, by = "gene", filter = SeqnameFilter("Y"))
+ res_r <- .withR(edb, by = "gene", filter = SeqnameFilter("Y"))
+ ## Why does the query take longer for R???
+ ## checkEquals(res_sql, res_r) ## Differences due to ties
+ if (doBench)
+ microbenchmark(.withSQL(edb, by = "gene", filter = SeqnameFilter("Y")),
+ .withR(edb, by = "gene", filter = SeqnameFilter("Y")),
+ times = 3) ## SQL faster.
+ ## Along with a GenebiotypeFilter
+ if (doBench)
+ microbenchmark(.withSQL(edb, by = "gene", filter = GenebiotypeFilter("protein_coding"))
+ , .withR(edb, by = "gene", filter = GenebiotypeFilter("protein_coding"))
+ , times = 3)
+}
+
+dontrun_test_ordering_transcriptsBy <- function() {
+ .withR <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- TRUE
+ transcriptsBy(x, ...)
+ }
+ .withSQL <- function(x, ...) {
+ ensembldb:::orderResultsInR(x) <- FALSE
+ transcriptsBy(x, ...)
+ }
+ res_sql <- .withSQL(edb)
+ res_r <- .withR(edb)
+ checkEquals(res_sql, res_r)
+ microbenchmark::microbenchmark(.withSQL(edb), .withR(edb), times = 3) ## same speed
+
+ res_sql <- .withSQL(edb, filter = SeqnameFilter("Y"))
+ res_r <- .withR(edb, filter = SeqnameFilter("Y"))
+ checkEquals(res_sql, res_r)
+ microbenchmark::microbenchmark(.withSQL(edb, filter = SeqnameFilter("Y")),
+ .withR(edb, filter = SeqnameFilter("Y")),
+ times = 3) ## SQL slightly faster.
+}
+
+dontrun_query_tune <- function() {
+ ## Query tuning:
+ library(RSQLite)
+ con <- dbconn(edb)
+
+ Q <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from gene join tx on (gene.gene_id=tx.gene_id) join tx2exon on (tx.tx_id=tx2exon.tx_id) join exon on (tx2exon.exon_id=exon.exon_id) where gene.seq_name = 'Y'"
+ system.time(dbGetQuery(con, Q))
+
+ Q2 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from exon join tx2exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y'"
+ system.time(dbGetQuery(con, Q2))
+
+ Q3 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon join exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y'"
+ system.time(dbGetQuery(con, Q3))
+
+ Q4 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon join exon on (tx2exon.exon_id = exon.exon_id) join tx on (tx2exon.tx_id = tx.tx_id) join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y' order by tx.tx_id"
+ system.time(dbGetQuery(con, Q4))
+
+ Q5 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from tx2exon inner join exon on (tx2exon.exon_id = exon.exon_id) inner join tx on (tx2exon.tx_id = tx.tx_id) inner join gene on (gene.gene_id=tx.gene_id) where gene.seq_name = 'Y' order by tx.tx_id"
+ system.time(dbGetQuery(con, Q5))
+
+ Q6 <- "select distinct tx2exon.exon_id,exon.exon_seq_start,exon.exon_seq_end,gene.seq_name,tx2exon.tx_id,gene.seq_strand,tx2exon.exon_idx from gene inner join tx on (gene.gene_id=tx.gene_id) inner join tx2exon on (tx.tx_id=tx2exon.tx_id) inner join exon on (tx2exon.exon_id=exon.exon_id) where gene.seq_name = 'Y' order by tx.tx_id asc"
+ system.time(dbGetQuery(con, Q6))
+}
+
+
+## Compare the performance of doing the sorting within R or
+## directly in the SQL query.
+dontrun_test_ordering_performance <- function() {
+
+ library(RUnit)
+ library(RSQLite)
+ ## gene table: order by in SQL query vs R:
+ db_con <- dbconn(edb)
+
+ .callWithOrder <- function(con, query, orderBy = "",
+ orderSQL = TRUE) {
+ if (all(orderBy == ""))
+ orderBy <- NULL
+ if (orderSQL && !is.null(orderBy)) {
+ orderBy <- paste(orderBy, collapse = ", ")
+ query <- paste0(query, " order by ", orderBy)
+ }
+ res <- dbGetQuery(con, query)
+ if (!orderSQL && !is.null(orderBy)) {
+ if (!all(orderBy %in% colnames(res)))
+ stop("orderBy not in columns!")
+ ## Do the ordering in R
+ res <- res[do.call(order,
+ c(list(method = "radix"),
+ as.list(res[, orderBy, drop = FALSE]))), ]
+ }
+ rownames(res) <- NULL
+ return(res)
+ }
+
+ #######################
+ ## gene table
+ ## Simple condition
+ the_q <- "select * from gene"
+ system.time(res1 <- .callWithOrder(db_con, query = the_q))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderSQL = FALSE))
+ checkIdentical(res1, res2)
+ ## order by gene_id
+ orderBy <- "gene_id"
+ system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderBy = orderBy, orderSQL = FALSE))
+ ## SQL: 0.16, R: 0.164.
+ checkIdentical(res1, res2)
+ ## order by gene_name
+ orderBy <- "gene_name"
+ system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderBy = orderBy, orderSQL = FALSE))
+ checkIdentical(res1, res2)
+ ## SQL: 0.245, R: 0.185
+ ## sort by gene_name and gene_seq_start
+ orderBy <- c("gene_name", "gene_seq_start")
+ system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderBy = orderBy, orderSQL = FALSE))
+ ## SQL: 0.26, R: 0.188
+ checkEquals(res1, res2)
+ ## with subsetting:
+ the_q <- "select * from gene where seq_name in ('5', 'Y')"
+ orderBy <- c("gene_name", "gene_seq_start")
+ system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderBy = orderBy, orderSQL = FALSE))
+ ## SQL: 0.031, R: 0.024
+ checkEquals(res1, res2)
+
+ ########################
+ ## joining tables.
+ the_q <- paste0("select * from gene join tx on (gene.gene_id = tx.gene_id)",
+ " join tx2exon on (tx.tx_id = tx2exon.tx_id)")
+ orderBy <- c("tx_id", "exon_id")
+ system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderBy = orderBy, orderSQL = FALSE))
+ ## SQL: 9.6, R: 9.032
+ checkEquals(res1, res2)
+ ## subsetting.
+ the_q <- paste0("select * from gene join tx on (gene.gene_id = tx.gene_id)",
+ " join tx2exon on (tx.tx_id = tx2exon.tx_id) where",
+ " seq_name = 'Y'")
+ orderBy <- c("tx_id", "exon_id")
+ system.time(res1 <- .callWithOrder(db_con, query = the_q, orderBy = orderBy))
+ system.time(res2 <- .callWithOrder(db_con, query = the_q,
+ orderBy = orderBy, orderSQL = FALSE))
+ ## SQL: 0.9, R: 1.6
+ checkEquals(res1, res2)
+}
+
+## implement:
+## .checkOrderBy: checks order.by argument removing columns that are
+## not present in the database
+## orderBy columns are added to the columns.
+## .orderDataFrameBy: orders the dataframe by the specified columns.
diff --git a/inst/unitTests/test_performance.R b/inst/unitTests/test_performance.R
new file mode 100644
index 0000000..057c3d0
--- /dev/null
+++ b/inst/unitTests/test_performance.R
@@ -0,0 +1,62 @@
+############################################################
+## These are not test cases to be executed, but performance
+## comparisons.
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+
+############################################################
+## Compare MySQL vs SQLite backends:
+## Amazing how inefficient the MySQL backend seems to be! Most
+## likely it's due to RMySQL, not MySQL.
+dontrun_test_MySQL_vs_SQLite <- function() {
+ ## Compare the performance of the MySQL backend against
+ ## the SQLite backend.
+ edb_mysql <- useMySQL(edb, user = "anonuser", pass = "")
+
+ library(microbenchmark)
+ ## genes
+ microbenchmark(genes(edb), genes(edb_mysql), times = 5)
+ microbenchmark(genes(edb, filter = GenebiotypeFilter("lincRNA")),
+ genes(edb_mysql, filter = GenebiotypeFilter("lincRNA")),
+ times = 5)
+ microbenchmark(genes(edb, filter = SeqnameFilter(20:23)),
+ genes(edb_mysql, filter = SeqnameFilter(20:23)),
+ times = 5)
+ microbenchmark(genes(edb, columns = "tx_id"),
+ genes(edb_mysql, columns = "tx_id"),
+ times = 5)
+ microbenchmark(genes(edb, filter = GenenameFilter("BCL2L11")),
+ genes(edb_mysql, filter = GenenameFilter("BCL2L11")),
+ times = 5)
+ ## transcripts
+ microbenchmark(transcripts(edb),
+ transcripts(edb_mysql),
+ times = 5)
+ microbenchmark(transcripts(edb, filter = GenenameFilter("BCL2L11")),
+ transcripts(edb_mysql, filter = GenenameFilter("BCL2L11")),
+ times = 5)
+ ## exons
+ microbenchmark(exons(edb),
+ exons(edb_mysql),
+ times = 5)
+ microbenchmark(exons(edb, filter = GenenameFilter("BCL2L11")),
+ exons(edb_mysql, filter = GenenameFilter("BCL2L11")),
+ times = 5)
+ ## exonsBy
+ microbenchmark(exonsBy(edb),
+ exonsBy(edb_mysql),
+ times = 5)
+ microbenchmark(exonsBy(edb, filter = SeqnameFilter("Y")),
+ exonsBy(edb_mysql, filter = SeqnameFilter("Y")),
+ times = 5)
+ ## cdsBy
+ microbenchmark(cdsBy(edb), cdsBy(edb_mysql), times = 5)
+ microbenchmark(cdsBy(edb, by = "gene"), cdsBy(edb_mysql, by = "gene"),
+ times = 5)
+ microbenchmark(cdsBy(edb, filter = SeqstrandFilter("-")),
+ cdsBy(edb_mysql, filter = SeqstrandFilter("-")),
+ times = 5)
+
+}
+
diff --git a/inst/unitTests/test_returnCols.R b/inst/unitTests/test_returnCols.R
new file mode 100644
index 0000000..6b829f5
--- /dev/null
+++ b/inst/unitTests/test_returnCols.R
@@ -0,0 +1,319 @@
+############################################################
+## Here we're checking the returnFilterColumns setting, i.e.
+## whether the filter columns should also be returned.
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Testing the internal function.
+test_set_returnFilterColumns <- function(x) {
+ orig <- returnFilterColumns(edb)
+ returnFilterColumns(edb) <- TRUE
+ checkEquals(TRUE, returnFilterColumns(edb))
+ returnFilterColumns(edb) <- FALSE
+ checkEquals(FALSE, returnFilterColumns(edb))
+ checkException(returnFilterColumns(edb) <- "d")
+ checkException(returnFilterColumns(edb) <- c(TRUE, FALSE))
+ ## Restore the "original" setting
+ returnFilterColumns(edb) <- orig
+}
+
+test_with_genes <- function(x) {
+ orig <- returnFilterColumns(edb)
+
+ returnFilterColumns(edb) <- FALSE
+ ## What happens if we use a GRangesFilter with return filter cols FALSE?
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+ res <- genes(edb, filter = grf)
+ checkEquals(res$gene_id, c("ENSG00000224738", "ENSG00000182628", "ENSG00000252212",
+ "ENSG00000211514", "ENSG00000207996"))
+ cols <- c("gene_id", "gene_name")
+ res <- genes(edb, filter = grf, return.type = "data.frame",
+ columns = cols)
+ ## Expect only the columns
+ checkEquals(colnames(res), cols)
+ returnFilterColumns(edb) <- TRUE
+ res <- genes(edb, filter = grf, return.type = "data.frame",
+ columns = cols)
+ ## Now I expect also the gene coords.
+ checkEquals(colnames(res), c(cols, "gene_seq_start", "gene_seq_end", "seq_name",
+ "seq_strand"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- genes(edb, filter = list(gbt, grf), return.type = "data.frame",
+ columns = cols)
+ checkEquals(res$gene_name, "SKA2")
+ checkEquals(colnames(res), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end",
+ "seq_name", "seq_strand"))
+ returnFilterColumns(edb) <- FALSE
+ res <- genes(edb, filter = list(gbt, grf), return.type = "data.frame",
+ columns = cols)
+ checkEquals(colnames(res), cols)
+
+ returnFilterColumns(edb) <- orig
+}
+
+
+test_with_tx <- function(x) {
+ orig <- returnFilterColumns(edb)
+
+ returnFilterColumns(edb) <- FALSE
+ ## What happens if we use a GRangesFilter with return filter cols FALSE?
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+ res <- transcripts(edb, filter = grf)
+ cols <- c("tx_id", "gene_name")
+ res <- transcripts(edb, filter = grf, return.type = "data.frame",
+ columns = cols)
+ ## Expect only the columns
+ checkEquals(colnames(res), cols)
+ returnFilterColumns(edb) <- TRUE
+ res <- transcripts(edb, filter = grf, return.type = "data.frame",
+ columns = cols)
+ ## Now I also expect the tx coords.
+ checkEquals(colnames(res), c(cols, "tx_seq_start", "tx_seq_end", "seq_name",
+ "seq_strand"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- transcripts(edb, filter = list(gbt, grf), return.type = "data.frame",
+ columns = cols)
+ checkEquals(unique(res$gene_name), "SKA2")
+ checkEquals(colnames(res), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
+ "seq_name", "seq_strand"))
+ returnFilterColumns(edb) <- FALSE
+ res <- transcripts(edb, filter = list(gbt, grf), return.type = "data.frame",
+ columns = cols)
+ checkEquals(colnames(res), cols)
+
+ returnFilterColumns(edb) <- orig
+}
+
+
+test_with_exons <- function(x) {
+ orig <- returnFilterColumns(edb)
+
+ returnFilterColumns(edb) <- FALSE
+ ## What happens if we use a GRangesFilter with return filter cols FALSE?
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+ res <- exons(edb, filter = grf)
+ cols <- c("exon_id", "gene_name")
+ res <- exons(edb, filter = grf, return.type = "data.frame",
+ columns = cols)
+ ## Expect only the columns
+ checkEquals(colnames(res), cols)
+ returnFilterColumns(edb) <- TRUE
+ res <- exons(edb, filter = grf, return.type = "data.frame",
+ columns = cols)
+ ## Now I also expect the exon coords.
+ checkEquals(colnames(res), c(cols, "exon_seq_start", "exon_seq_end", "seq_name",
+ "seq_strand"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- exons(edb, filter = list(gbt, grf), return.type = "data.frame",
+ columns = cols)
+ checkEquals(unique(res$gene_name), c("TRIM37", "SKA2"))
+ checkEquals(colnames(res), c(cols, "gene_biotype", "exon_seq_start", "exon_seq_end",
+ "seq_name", "seq_strand"))
+ returnFilterColumns(edb) <- FALSE
+ res <- exons(edb, filter = list(gbt, grf), return.type = "data.frame",
+ columns = cols)
+ checkEquals(colnames(res), cols)
+
+ returnFilterColumns(edb) <- orig
+}
+
+test_with_exonsBy <- function(x) {
+ orig <- returnFilterColumns(edb)
+
+ returnFilterColumns(edb) <- FALSE
+ ## What happens if we use a GRangesFilter with return filter cols FALSE?
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+ ## By genes
+ cols <- c("exon_id", "gene_name")
+ res <- exonsBy(edb, by = "gene", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Expect only the columns
+ checkEquals(colnames(mcols(res)), cols)
+
+ returnFilterColumns(edb) <- TRUE
+ res <- exonsBy(edb, by = "gene", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Now I expect also the gene coords, but not the seq_name and seq_strand, as these
+ ## are redundant with data which is in the GRanges!
+ checkEquals(colnames(mcols(res)), c(cols, "gene_seq_start", "gene_seq_end"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- unlist(exonsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
+ checkEquals(unique(res$gene_name), c("SKA2"))
+ checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end"))
+ returnFilterColumns(edb) <- FALSE
+ res <- unlist(exonsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
+ checkEquals(colnames(mcols(res)), cols)
+
+ ## By tx
+ returnFilterColumns(edb) <- FALSE
+ cols <- c("exon_id", "gene_name")
+ res <- exonsBy(edb, by = "tx", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Expect only the columns
+ checkEquals(colnames(mcols(res)), c(cols, "exon_rank"))
+
+ returnFilterColumns(edb) <- TRUE
+ res <- exonsBy(edb, by = "tx", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Now I also expect the tx coords.
+ checkEquals(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
+ "exon_rank"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- unlist(exonsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
+ checkEquals(unique(res$gene_name), c("SKA2"))
+ checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
+ "exon_rank"))
+ returnFilterColumns(edb) <- FALSE
+ res <- unlist(exonsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
+ checkEquals(colnames(mcols(res)), c(cols, "exon_rank"))
+
+ returnFilterColumns(edb) <- orig
+}
+
+
+test_with_transcriptsBy <- function(x) {
+ orig <- returnFilterColumns(edb)
+
+ returnFilterColumns(edb) <- FALSE
+ ## What happens if we use a GRangesFilter with return filter cols FALSE?
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+ ## By genes
+ cols <- c("tx_id", "gene_name")
+ res <- transcriptsBy(edb, by = "gene", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Expect only the columns
+ checkEquals(colnames(mcols(res)), cols)
+
+ returnFilterColumns(edb) <- TRUE
+ res <- transcriptsBy(edb, by = "gene", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Now I expect also the gene coords.
+ checkEquals(colnames(mcols(res)), c(cols, "gene_seq_start", "gene_seq_end"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- unlist(transcriptsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
+ checkEquals(unique(res$gene_name), c("SKA2"))
+ checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "gene_seq_start", "gene_seq_end"))
+ returnFilterColumns(edb) <- FALSE
+ res <- unlist(transcriptsBy(edb, by = "gene", filter = list(gbt, grf), columns = cols))
+ checkEquals(colnames(mcols(res)), cols)
+
+ ## ## By exon
+ ## returnFilterColumns(edb) <- FALSE
+ ## cols <- c("tx_id", "gene_name")
+ ## res <- transcriptsBy(edb, by = "exon", filter = grf, columns = cols)
+ ## res <- unlist(res)
+ ## ## Expect only the columns
+ ## checkEquals(colnames(mcols(res)), c(cols))
+
+ ## returnFilterColumns(edb) <- TRUE
+ ## res <- transcriptsBy(edb, by = "exon", filter = grf, columns = cols)
+ ## res <- unlist(res)
+ ## ## Now I expect also the gene coords.
+ ## checkEquals(colnames(mcols(res)), c(cols, "exon_seq_start", "exon_seq_end"))
+
+ ## ## Use a gene biotype filter
+ ## gbt <- GenebiotypeFilter("protein_coding")
+
+ ## returnFilterColumns(edb) <- TRUE
+ ## res <- unlist(transcriptsBy(edb, by = "exon", filter = list(gbt, grf), columns = cols))
+ ## checkEquals(unique(res$gene_name), c("SKA2", "TRIM37"))
+ ## checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "exon_seq_start", "exon_seq_end"))
+ ## returnFilterColumns(edb) <- FALSE
+ ## res <- unlist(transcriptsBy(edb, by = "exon", filter = list(gbt, grf), columns = cols))
+ ## checkEquals(colnames(mcols(res)), c(cols))
+
+ returnFilterColumns(edb) <- orig
+}
+
+test_with_cdsBy <- function(x) {
+ orig <- returnFilterColumns(edb)
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+
+ ## By tx
+ returnFilterColumns(edb) <- FALSE
+ cols <- c("gene_id", "gene_name")
+ res <- cdsBy(edb, by = "tx", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Expect only the columns
+ checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
+
+ returnFilterColumns(edb) <- TRUE
+ res <- cdsBy(edb, by = "tx", filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Now I also expect the tx coords, seq_name and seq_strand.
+ checkEquals(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
+ "seq_name", "seq_strand", "exon_id", "exon_rank"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- unlist(cdsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
+ checkEquals(unique(res$gene_name), c("SKA2"))
+ checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
+ "seq_name", "seq_strand", "exon_id", "exon_rank"))
+ returnFilterColumns(edb) <- FALSE
+ res <- unlist(cdsBy(edb, by = "tx", filter = list(gbt, grf), columns = cols))
+ checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
+
+ returnFilterColumns(edb) <- orig
+}
+
+test_with_threeUTRsByTranscript <- function(x) {
+ orig <- returnFilterColumns(edb)
+ grf <- GRangesFilter(GRanges(17, IRanges(57180000, 57233000)))
+
+ ## By tx
+ returnFilterColumns(edb) <- FALSE
+ cols <- c("gene_id", "gene_name")
+ res <- threeUTRsByTranscript(edb, filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Expect only the columns
+ checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
+
+ returnFilterColumns(edb) <- TRUE
+ res <- threeUTRsByTranscript(edb, filter = grf, columns = cols)
+ res <- unlist(res)
+ ## Now I also expect the tx coords, seq_name and seq_strand.
+ checkEquals(colnames(mcols(res)), c(cols, "tx_seq_start", "tx_seq_end",
+ "seq_name", "seq_strand", "exon_id", "exon_rank"))
+
+ ## Use a gene biotype filter
+ gbt <- GenebiotypeFilter("protein_coding")
+
+ returnFilterColumns(edb) <- TRUE
+ res <- unlist(threeUTRsByTranscript(edb, filter = list(gbt, grf), columns = cols))
+ checkEquals(unique(res$gene_name), c("SKA2"))
+ checkEquals(colnames(mcols(res)), c(cols, "gene_biotype", "tx_seq_start", "tx_seq_end",
+ "seq_name", "seq_strand", "exon_id", "exon_rank"))
+ returnFilterColumns(edb) <- FALSE
+ res <- unlist(threeUTRsByTranscript(edb, filter = list(gbt, grf), columns = cols))
+ checkEquals(colnames(mcols(res)), c(cols, "exon_id", "exon_rank"))
+
+ returnFilterColumns(edb) <- orig
+}
+
diff --git a/inst/unitTests/test_select.R b/inst/unitTests/test_select.R
new file mode 100644
index 0000000..17ff834
--- /dev/null
+++ b/inst/unitTests/test_select.R
@@ -0,0 +1,229 @@
+####============================================================
+## test cases for AnnotationDbi methods.
+##
+####------------------------------------------------------------
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+test_columns <- function(){
+ cols <- columns(edb)
+ ## Don't expect to see any _ there...
+ checkEquals(length(grep(cols, pattern="_")), 0)
+}
+
+test_keytypes <- function(){
+ keyt <- keytypes(edb)
+ checkEquals(all(c("GENEID", "EXONID", "TXID") %in% keyt), TRUE)
+}
+
+test_mapper <- function(){
+ Test <- ensembldb:::ensDbColumnForColumn(edb, "GENEID")
+ checkEquals(unname(Test), "gene_id")
+
+ Test <- ensembldb:::ensDbColumnForColumn(edb, c("GENEID", "TXID"))
+ checkEquals(unname(Test), c("gene_id", "tx_id"))
+
+ Test <- ensembldb:::ensDbColumnForColumn(edb, c("GENEID", "TXID", "bla"))
+ checkEquals(unname(Test), c("gene_id", "tx_id"))
+}
+
+test_keys <- function(){
+ ## get all gene ids
+ system.time(
+ ids <- keys(edb, "GENEID")
+ )
+ checkEquals(length(ids), length(unique(ids)))
+ ## get all tx ids
+ system.time(
+ ids <- keys(edb, "TXID")
+ )
+ ## Get the TXNAME...
+ nms <- keys(edb, "TXNAME")
+ checkEquals(nms, ids)
+ checkEquals(length(ids), length(unique(ids)))
+ ## get all gene names
+ system.time(
+ ids <- keys(edb, "GENENAME")
+ )
+ checkEquals(length(ids), length(unique(ids)))
+ ## get all seq names
+ system.time(
+ ids <- keys(edb, "SEQNAME")
+ )
+ checkEquals(length(ids), length(unique(ids)))
+ ## get all seq strands
+ system.time(
+ ids <- keys(edb, "SEQSTRAND")
+ )
+ checkEquals(length(ids), length(unique(ids)))
+ ## get all gene biotypes
+ system.time(
+ ids <- keys(edb, "GENEBIOTYPE")
+ )
+ checkEquals(ids, listGenebiotypes(edb))
+}
+
+test_select <- function(){
+ ## Test:
+ ## Provide GenenameFilter.
+ gf <- GenenameFilter("BCL2")
+ system.time(
+ Test <- select(edb, keys=gf)
+ )
+ ## Provide list of GenenameFilter and TxbiotypeFilter.
+ Test2 <- select(edb, keys=list(gf, TxbiotypeFilter("protein_coding")))
+ checkEquals(Test$EXONID[Test$TXBIOTYPE == "protein_coding"], Test2$EXONID)
+ ## Choose selected columns.
+ Test3 <- select(edb, keys=gf, columns=c("GENEID", "GENENAME", "SEQNAME"))
+ checkEquals(unique(Test[, c("GENEID", "GENENAME", "SEQNAME")]), Test3)
+ ## Provide keys.
+ Test4 <- select(edb, keys="BCL2", keytype="GENENAME")
+ checkEquals(Test[, colnames(Test4)], Test4)
+ txs <- keys(edb, "TXID")
+ ## Just get stuff from the tx table; should be faster.
+ system.time(
+ Test <- select(edb, keys=txs, columns=c("TXID", "TXBIOTYPE", "GENEID"), keytype="TXID")
+ )
+ checkEquals(all(Test$TXID==txs), TRUE)
+ ## Get all lincRNA genes
+ Test <- select(edb, keys="lincRNA", columns=c("GENEID", "GENEBIOTYPE", "GENENAME"),
+ keytype="GENEBIOTYPE")
+ Test2 <- select(edb, keys=GenebiotypeFilter("lincRNA"),
+ columns=c("GENEID", "GENEBIOTYPE", "GENENAME"))
+ checkEquals(Test[, colnames(Test2)], Test2)
+ ## All on chromosome 21
+ Test <- select(edb, keys="21", columns=c("GENEID", "GENEBIOTYPE", "GENENAME"),
+ keytype="SEQNAME")
+ Test2 <- select(edb, keys=SeqnameFilter("21"), columns=c("GENEID", "GENEBIOTYPE", "GENENAME"))
+ checkEquals(Test[, colnames(Test2)], Test2)
+ ## What if we can't find it?
+ Test <- select(edb, keys="bla", columns=c("GENEID", "GENENAME"), keytype="GENENAME")
+ ## Run the full thing.
+ ## system.time(
+ ## All <- select(edb)
+ ## )
+ ## Test <- select(edb, keys=txs, keytype="TXID")
+ ## checkEquals(Test, All)
+ Test <- select(edb, keys="ENST00000000233", columns=c("GENEID", "GENENAME"), keytype="TXNAME")
+ checkEquals(Test$TXNAME, "ENST00000000233")
+ ## Check what happens if we just add TXNAME and also TXID.
+ Test2 <- select(edb, keys=list(gf, TxbiotypeFilter("protein_coding")), columns=c("TXID", "TXNAME",
+ "GENENAME", "GENEID"))
+
+}
+
+test_mapIds <- function(){
+ ## Simple... map gene ids to gene names
+ allgenes <- keys(edb, keytype="GENEID")
+ randordergenes <- allgenes[sample(seq_along(allgenes), 100)]
+ system.time(
+ mi <- mapIds(edb, keys=allgenes, keytype="GENEID", column = "GENENAME")
+ )
+ checkEquals(allgenes, names(mi))
+ ## What happens if the ordering is different:
+ mi <- mapIds(edb, keys=randordergenes, keytype="GENEID", column = "GENENAME")
+ checkEquals(randordergenes, names(mi))
+
+ ## Now check the different options:
+ ## Handle multi mappings.
+ ## first
+ first <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID")
+ checkEquals(names(first), randordergenes)
+ ## list
+ lis <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID", multiVals="list")
+ checkEquals(names(lis), randordergenes)
+ Test <- lapply(lis, function(z){return(z[1])})
+ checkEquals(first, unlist(Test))
+ ## filter
+ filt <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID", multiVals="filter")
+ checkEquals(filt, unlist(lis[unlist(lapply(lis, length)) == 1]))
+ ## asNA
+ asNA <- mapIds(edb, keys=randordergenes, keytype="GENEID", column="TXID", multiVals="asNA")
+
+ ## Check what happens if we provide 2 identical keys.
+ Test <- mapIds(edb, keys=c("BCL2", "BCL2L11", "BCL2"), keytype="GENENAME", column="TXID")
+
+ ## Submit Filter:
+ Test <- mapIds(edb, keys=SeqnameFilter("Y"), column="GENEID", multiVals="list")
+ TestS <- select(edb, keys=Test[[1]], columns="SEQNAME", keytype="GENEID")
+ checkEquals(unique(TestS$SEQNAME), "Y")
+ ## Submit 2 filter.
+ Test <- mapIds(edb, keys=list(SeqnameFilter("Y"), SeqstrandFilter("-")), multiVals="list",
+ column="GENEID")
+ TestS <- select(edb, keys=Test[[1]], keytype="GENEID", columns=c("SEQNAME", "SEQSTRAND"))
+ checkTrue(all(TestS$SEQNAME == "Y"))
+ checkTrue(all(TestS$SEQSTRAND == -1))
+}
+
+## Test if the results are properly sorted if we submit a single filter or just keys.
+test_select_sorted <- function() {
+ ks <- c("ZBTB16", "BCL2", "SKA2", "BCL2L11")
+ ## gene_name
+ res <- select(edb, keys = ks, keytype = "GENENAME")
+ checkEquals(unique(res$GENENAME), ks)
+ res <- select(edb, keys = GenenameFilter(ks))
+ checkEquals(unique(res$GENENAME), ks)
+
+ ## Using two filters;
+ res <- select(edb, keys = list(GenenameFilter(ks),
+ TxbiotypeFilter("nonsense_mediated_decay")))
+ ## We don't expect same sorting here!
+ checkTrue(!all(unique(res$GENENAME) == ks[ks %in% unique(res$GENENAME)]))
+
+ ## symbol
+ res <- select(edb, keys = ks, keytype = "SYMBOL",
+ columns = c("GENENAME", "SYMBOL", "SEQNAME"))
+
+ ## tx_biotype
+ ks <- c("retained_intron", "nonsense_mediated_decay")
+ res <- select(edb, keys = ks, keytype = "TXBIOTYPE",
+ columns = c("GENENAME", "TXBIOTYPE"))
+ checkEquals(unique(res$TXBIOTYPE), ks)
+ res <- select(edb, keys = TxbiotypeFilter(ks),
+ keytype = "TXBIOTYPE", columns = c("GENENAME", "TXBIOTYPE"))
+ checkEquals(unique(res$TXBIOTYPE), ks)
+}
+
+test_select_symbol <- function() {
+ ## Can I use SYMBOL as keytype?
+ ks <- c("ZBTB16", "BCL2", "SKA2", "BCL2L11")
+ res <- select(edb, keys = ks, keytype = "GENENAME")
+ res2 <- select(edb, keys = ks, keytype = "SYMBOL")
+ checkEquals(res, res2)
+
+ ## Can I use the SymbolFilter?
+ res <- select(edb, keys = GenenameFilter(ks),
+ columns = c("TXNAME", "SYMBOL", "GENEID"))
+ checkEquals(colnames(res), c("TXNAME", "SYMBOL", "GENEID", "GENENAME"))
+
+ res <- select(edb, keys = SymbolFilter(ks), columns=c("GENEID"))
+ checkEquals(colnames(res), c("GENEID", "SYMBOL"))
+ checkEquals(res$SYMBOL, ks)
+
+ ## Can I ask for SYMBOL?
+ res <- select(edb, keys = list(SeqnameFilter("Y"),
+ GenebiotypeFilter("lincRNA")),
+ columns = c("GENEID", "SYMBOL"))
+ checkEquals(colnames(res), c("GENEID", "SYMBOL", "SEQNAME", "GENEBIOTYPE"))
+}
+
+test_select_symbol_n_txname <- function() {
+ ks <- c("ZBTB16", "BCL2", "SKA2")
+ ## Symbol allowed in keytype
+ res <- select(edb, keys = ks, keytype = "SYMBOL", columns = "GENENAME")
+ checkEquals(colnames(res), c("SYMBOL", "GENENAME"))
+ checkEquals(res$SYMBOL, ks)
+
+ ## Symbol using SymbolFilter
+ res <- select(edb, keys = SymbolFilter(ks), columns = "GENENAME")
+ checkEquals(colnames(res), c("GENENAME", "SYMBOL"))
+ checkEquals(res$SYMBOL, ks)
+
+ ## Symbol as a column.
+ res <- select(edb, keys = ks, keytype = "GENENAME", columns = "SYMBOL")
+ checkEquals(colnames(res), c("GENENAME", "SYMBOL"))
+
+ ## TXNAME as a column
+ res <- select(edb, keys = ks, keytype = "GENENAME", columns = c("TXNAME"))
+ checkEquals(colnames(res), c("GENENAME", "TXNAME"))
+}
diff --git a/inst/unitTests/test_transcript_lengths.R b/inst/unitTests/test_transcript_lengths.R
new file mode 100644
index 0000000..3240e4b
--- /dev/null
+++ b/inst/unitTests/test_transcript_lengths.R
@@ -0,0 +1,140 @@
+####============================================================
+## Tests related to transcript/feature length calculations.
+##
+##
+####------------------------------------------------------------
+## Loading data and stuff
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Just run that after Herve has added the mods to the transcriptLengths function.
+notyetrun_transcriptLengths <- function(){
+
+ ## With filter.
+ daFilt <- SeqnameFilter("Y")
+ allTxY <- transcripts(edb, filter=daFilt)
+ txLenY <- transcriptLengths(edb, filter=daFilt)
+ checkEquals(names(allTxY), rownames(txLenY))
+
+ ## Check if lengths are OK:
+ txLenY2 <- lengthOf(edb, "tx", filter=daFilt)
+ checkEquals(unname(txLenY2[rownames(txLenY)]), txLenY$tx_len)
+
+ ## Include the cds, 3' and 5' UTR
+ txLenY <- transcriptLengths(edb, with.cds_len = TRUE, with.utr5_len = TRUE,
+ with.utr3_len = TRUE,
+ filter=daFilt)
+ ## The sum of the 5' UTR, CDS and 3' UTR lengths has to match tx_len:
+ txLen <- rowSums(txLenY[, c("cds_len", "utr5_len", "utr3_len")])
+ checkEquals(txLenY[!is.na(txLen), "tx_len"], unname(txLen[!is.na(txLen)]))
+ ## just to be sure...
+ checkEquals(txLenY[!is.na(txLenY$utr3_len), "tx_len"],
+ unname(txLen[!is.na(txLenY$utr3_len)]))
+ ## Seems to be OK.
+
+ ## Next check the 5' UTR lengths: that also verifies the fiveUTR call.
+ futr <- fiveUTRsByTranscript(edb, filter=daFilt)
+ futrLen <- sum(width(futr))
+ checkEquals(unname(futrLen), txLenY[names(futrLen), "utr5_len"])
+ ## 3'
+ tutr <- threeUTRsByTranscript(edb, filter=daFilt)
+ tutrLen <- sum(width(tutr))
+ checkEquals(unname(tutrLen), txLenY[names(tutrLen), "utr3_len"])
+}
+
+notrun_compare_full <- function(){
+ ## That's on the full thing.
+ ## Test if the result has the same ordering as the transcripts call.
+ allTx <- transcripts(edb)
+ txLen <- transcriptLengths(edb, with.cds_len=TRUE, with.utr5_len=TRUE,
+ with.utr3_len=TRUE)
+ checkEquals(names(allTx), rownames(txLen))
+ system.time(
+ futr <- fiveUTRsByTranscript(edb)
+ )
+ ## 23 secs.
+ futrLen <- sum(width(futr)) ## do I need reduce???
+ checkEquals(unname(futrLen), txLen[names(futrLen), "utr5_len"])
+ ## 3'
+ system.time(
+ tutr <- threeUTRsByTranscript(edb)
+ )
+ system.time(
+ tutrLen <- sum(width(tutr))
+ )
+ checkEquals(unname(tutrLen), txLen[names(tutrLen), "utr3_len"])
+}
+
+notrun_compare_to_genfeat <- function(){
+ library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+ txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+
+ system.time(
+ Len <- transcriptLengths(edb)
+ )
+ ## Woa, 52 sec
+ system.time(
+ txLen <- lengthOf(edb, "tx")
+ )
+ ## Faster, 31 sec
+ checkEquals(Len$tx_len, unname(txLen[rownames(Len)]))
+ system.time(
+ Len2 <- transcriptLengths(txdb)
+ )
+ ## :) 2.5 sec.
+ ## Next.
+ system.time(
+ Len <- transcriptLengths(edb, with.cds_len = TRUE)
+ )
+ ## 56 sec
+ system.time(
+ Len2 <- transcriptLengths(txdb, with.cds_len=TRUE)
+ )
+ ## 4 sec.
+
+ ## Calling the transcriptLengths of GenomicFeatures on the EnsDb.
+ system.time(
+ Def <- GenomicFeatures::transcriptLengths(edb)
+ ) ## 26.5 sec
+
+ system.time(
+ WithCds <- GenomicFeatures::transcriptLengths(edb, with.cds_len=TRUE)
+ ) ## 55 sec
+
+ system.time(
+ WithAll <- GenomicFeatures::transcriptLengths(edb, with.cds_len=TRUE,
+ with.utr5_len=TRUE,
+ with.utr3_len=TRUE)
+ ) ## 99 secs
+
+ ## Get my versions...
+ system.time(
+ MyDef <- ensembldb:::.transcriptLengths(edb)
+ ) ## 31 sec
+ system.time(
+ MyWithCds <- ensembldb:::.transcriptLengths(edb, with.cds_len=TRUE)
+ ) ## 44 sec
+ system.time(
+ MyWithAll <- ensembldb:::.transcriptLengths(edb, with.cds_len=TRUE,
+ with.utr5_len=TRUE,
+ with.utr3_len=TRUE)
+ ) ## 63 sec
+
+ ## Should be all the same!!!
+ rownames(MyDef) <- NULL
+ checkEquals(Def, MyDef)
+ ##
+ rownames(MyWithCds) <- NULL
+ MyWithCds[is.na(MyWithCds$cds_len), "cds_len"] <- 0
+ checkEquals(WithCds, MyWithCds)
+ ##
+ rownames(MyWithAll) <- NULL
+ MyWithAll[is.na(MyWithAll$cds_len), "cds_len"] <- 0
+ MyWithAll[is.na(MyWithAll$utr3_len), "utr3_len"] <- 0
+ MyWithAll[is.na(MyWithAll$utr5_len), "utr5_len"] <- 0
+ checkEquals(WithAll, MyWithAll)
+}
+
+
+
+
diff --git a/inst/unitTests/test_ucscChromosomeNames.R b/inst/unitTests/test_ucscChromosomeNames.R
new file mode 100644
index 0000000..096a5d6
--- /dev/null
+++ b/inst/unitTests/test_ucscChromosomeNames.R
@@ -0,0 +1,508 @@
+###================================================
+## Here we check functionality to use EnsDbs with
+## UCSC chromosome names
+###------------------------------------------------
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## library(EnsDb.Hsapiens.v83)
+## edb <- EnsDb.Hsapiens.v83
+## library(EnsDb.Hsapiens.v81)
+
+test_seqlevels <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ options(ensembldb.seqnameNotFound=NA)
+ edb <- EnsDb.Hsapiens.v75
+ SL <- seqlevels(edb)
+ ucscs <- paste0("chr", c(1:22, "X", "Y", "M"))
+ seqlevelsStyle(edb) <- "UCSC"
+ SL2 <- seqlevels(edb)
+ checkEquals(sort(ucscs), sort(SL2[!is.na(SL2)]))
+ ## Check if we throw an error message
+ options(ensembldb.seqnameNotFound="MISSING")
+ checkException(seqlevels(edb))
+ ## Check if returning original names works.
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ SL3 <- seqlevels(edb)
+ idx <- which(SL3 %in% ucscs)
+ checkEquals(sort(SL[-idx]), sort(SL3[-idx]))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_seqinfo <- function(){
+ edb <- EnsDb.Hsapiens.v75
+ orig <- getOption("ensembldb.seqnameNotFound")
+ options(ensembldb.seqnameNotFound="MISSING")
+ seqlevelsStyle(edb) <- "UCSC"
+ checkException(seqinfo(edb))
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ si <- seqinfo(edb)
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+## Testing if getWhat returns what we expect.
+test_getWhat_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ensRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
+ seqlevelsStyle(edb) <- "UCSC"
+ ucscRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
+ seqlevelsStyle(edb) <- "NCBI"
+ ncbiRes <- ensembldb:::getWhat(edb, columns=c("seq_name", "seq_strand"))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_SeqnameFilter_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ options(ensembldb.seqnameNotFound="MISSING")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ snf <- SeqnameFilter("chrX")
+ snfEns <- SeqnameFilter(c("X", "Y"))
+ snfNo <- SeqnameFilter(c("bla", "blu"))
+ snfSomeNo <- SeqnameFilter(c("bla", "X"))
+
+ seqlevelsStyle(edb) <- "Ensembl"
+ checkEquals(value(snf), "chrX")
+ ## That makes no sense for a query though.
+ checkEquals(value(snf, edb), "chrX")
+ checkEquals(value(snfEns, edb), c("X", "Y"))
+ seqlevelsStyle(edb) <- "UCSC"
+ checkEquals(value(snf, edb), "X")
+ checkException(value(snfEns, edb))
+ checkException(value(snfNo, edb))
+ checkException(value(snfSomeNo, edb))
+
+ ## Setting the options to "ORIGINAL"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ checkEquals(value(snf, edb), "X")
+ checkEquals(value(snfEns, edb), c("X", "Y"))
+ checkEquals(value(snfNo, edb), c("bla", "blu"))
+ checkEquals(value(snfSomeNo, edb), c("bla", "X"))
+ ##
+ snf <- SeqnameFilter(c("chrX", "Y"))
+ checkEquals(value(snf, edb), c("X", "Y"))
+
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_genes_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ ## Here we want to test whether the result returned by the function does really
+ ## work when changing the seqnames.
+ seqlevelsStyle(edb) <- "Ensembl"
+ ensAll <- genes(edb)
+ ens21Y <- genes(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(as.character(unique(seqnames(ens21Y)))), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- genes(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(strand(ensY))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ## Just visually inspect the seqinfo and seqnames for the "all" query.
+ ucscAll <- genes(edb)
+ as.character(unique(seqnames(ucscAll)))
+ ucsc21Y <- genes(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- genes(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(strand(ucscY))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_transcripts_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- transcripts(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- transcripts(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(strand(ensY))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- transcripts(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- transcripts(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(strand(ucscY))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_transcriptsBy_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- transcriptsBy(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- transcriptsBy(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- transcriptsBy(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- transcriptsBy(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_exons_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- exons(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- exons(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(strand(ensY))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- exons(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- exons(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(strand(ucscY))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_exonsBy_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- exonsBy(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- exonsBy(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- exonsBy(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- exonsBy(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+
+test_cdsBy_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- cdsBy(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- cdsBy(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- cdsBy(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- cdsBy(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_threeUTRsByTranscript_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- threeUTRsByTranscript(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- threeUTRsByTranscript(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- threeUTRsByTranscript(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- threeUTRsByTranscript(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+test_fiveUTRsByTranscript_seqnames <- function(){
+ orig <- getOption("ensembldb.seqnameNotFound")
+ edb <- EnsDb.Hsapiens.v75
+ seqlevelsStyle(edb) <- "Ensembl"
+ ens21Y <- fiveUTRsByTranscript(edb, filter=SeqnameFilter(c("Y", "21")))
+ checkEquals(sort(seqlevels(ens21Y)), c("21", "Y"))
+ gr <- GRanges(seqnames="Y", ranges=IRanges(start=1, end=59373566), strand="+")
+ ensY <- fiveUTRsByTranscript(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ensY), "Y")
+ checkEquals(unique(as.character(unlist(strand(ensY)))), "+")
+
+ ## Check UCSC stuff
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ ucsc21Y <- fiveUTRsByTranscript(edb, filter=SeqnameFilter(c("chrY", "chr21")))
+ checkEquals(sort(seqlevels(ucsc21Y)), c("chr21", "chrY"))
+ checkEquals(sort(names(ens21Y)), sort(names(ucsc21Y)))
+ ## GRangesFilter.
+ gr <- GRanges(seqnames="chrY", ranges=IRanges(start=1, end=59373566), strand="+")
+ ucscY <- fiveUTRsByTranscript(edb, filter=GRangesFilter(gr))
+ checkEquals(seqlevels(ucscY), "chrY")
+ checkEquals(unique(as.character(unlist(strand(ucscY)))), "+")
+ checkEquals(sort(names(ensY)), sort(names(ucscY)))
+ options(ensembldb.seqnameNotFound=orig)
+}
+
+
+test_updateEnsDb <- function(){
+ edb2 <- updateEnsDb(edb)
+ checkEquals(edb2@tables, edb@tables)
+ checkTrue(.hasSlot(edb2, ".properties"))
+}
+
+test_properties <- function(){
+ checkEquals(ensembldb:::getProperty(edb, "foo"), NA)
+
+ checkException(ensembldb:::setProperty(edb, "foo"))
+
+ edb <- ensembldb:::setProperty(edb, foo="bar")
+ checkEquals(ensembldb:::getProperty(edb, "foo"), "bar")
+ checkEquals(length(ensembldb:::properties(edb)), 4)
+}
+
+test_set_get_seqlevelsStyle <- function(){
+ edb <- EnsDb.Hsapiens.v75
+ ## Testing the getter/setter for the seqlevelsStyle.
+ checkEquals(seqlevelsStyle(edb), "Ensembl")
+ checkEquals(NA, ensembldb:::getProperty(edb, "seqlevelsStyle"))
+
+ seqlevelsStyle(edb) <- "Ensembl"
+ checkEquals(seqlevelsStyle(edb), "Ensembl")
+ checkEquals("Ensembl", ensembldb:::getProperty(edb, "seqlevelsStyle"))
+
+ ## Try NCBI.
+ seqlevelsStyle(edb) <- "NCBI"
+ checkEquals(seqlevelsStyle(edb), "NCBI")
+
+ ## Try UCSC.
+ seqlevelsStyle(edb) <- "UCSC"
+ checkEquals(seqlevelsStyle(edb), "UCSC")
+
+ ## Error checking:
+ checkException(seqlevelsStyle(edb) <- "bla")
+}
+
+## Just dry run this without any actual query.
+test_formatSeqnamesForQuery <- function(){
+ ## Testing if the formatting/mapping between seqnames works as expected
+ ## We want to map anything TO Ensembl.
+ ## Check also the warning messages!
+ ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
+ enses <- c("1", "3", "1", "9", "MT", "1", "X")
+ ## reset
+ edb <- EnsDb.Hsapiens.v75
+ ## Shouldn't do anything here.
+ seqlevelsStyle(edb)
+ ensembldb:::dbSeqlevelsStyle(edb)
+ got <- ensembldb:::formatSeqnamesForQuery(edb, enses)
+ checkEquals(got, enses)
+ ## Change the seqlevels to UCSC
+ seqlevelsStyle(edb) <- "UCSC"
+ ## If ifNotFound is not specified we expect an error.
+ options(ensembldb.seqnameNotFound="MISSING")
+ checkException(ensembldb:::formatSeqnamesForQuery(edb, enses))
+ ## With specifying ifNotFound
+ got <- ensembldb:::formatSeqnamesForQuery(edb, enses, ifNotFound=NA)
+ checkEquals(all(is.na(got)), TRUE)
+ ## Same by setting the option
+ options(ensembldb.seqnameNotFound=NA)
+ got <- ensembldb:::formatSeqnamesForQuery(edb, enses)
+ checkEquals(all(is.na(got)), TRUE)
+
+ ## Now the working example:
+ got <- ensembldb:::formatSeqnamesForQuery(edb, ucscs)
+ checkEquals(got, enses)
+ ## What if one is not mappable:
+ got <- ensembldb:::formatSeqnamesForQuery(edb, c(ucscs, "asdfd"), ifNotFound=NA)
+ checkEquals(got, c(enses, NA))
+}
+
+## Just dry run this without any actual query
+test_formatSeqnamesFromQuery <- function(){
+ ucscs <- c("chr1", "chr3", "chr1", "chr9", "chrM", "chr1", "chrX")
+ enses <- c("1", "3", "1", "9", "MT", "1", "X")
+ edb <- EnsDb.Hsapiens.v75
+ ## Shouldn't do anything here.
+ seqlevelsStyle(edb)
+ ensembldb:::dbSeqlevelsStyle(edb)
+ got <- ensembldb:::formatSeqnamesFromQuery(edb, enses)
+ checkEquals(got, enses)
+ ## Change the seqlevels to UCSC
+ seqlevelsStyle(edb) <- "UCSC"
+ ## If ifNotFound is not specified we expect an error.
+ options(ensembldb.seqnameNotFound="MISSING")
+ checkException(ensembldb:::formatSeqnamesFromQuery(edb, ucscs))
+ ## With specifying ifNotFound
+ got <- ensembldb:::formatSeqnamesFromQuery(edb, ucscs, ifNotFound=NA)
+ checkEquals(all(is.na(got)), TRUE)
+ ## Same using options
+ options(ensembldb.seqnameNotFound=NA)
+ got <- ensembldb:::formatSeqnamesFromQuery(edb, ucscs)
+ checkEquals(all(is.na(got)), TRUE)
+ ## Now the working example:
+ got <- ensembldb:::formatSeqnamesFromQuery(edb, enses)
+ checkEquals(got, ucscs)
+ ## What if one is not mappable:
+ got <- ensembldb:::formatSeqnamesFromQuery(edb, c(enses, "asdfd"), ifNotFound=NA)
+ checkEquals(got, c(ucscs, NA))
+ got <- ensembldb:::formatSeqnamesFromQuery(edb, c(enses, "asdfd"))
+ checkEquals(got, c(ucscs, NA))
+}
+
+notrun_test_set_seqlevels <- function(){
+ ## To test what happens if no mapping is available
+ ##gff <- "/Users/jo/Projects/EnsDbs/83/gadus_morhua/Gadus_morhua.gadMor1.83.gff3.gz"
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+ ah <- ah["AH47962"]
+ fromG <- ensDbFromAH(ah, outfile=tempfile())
+ edb <- EnsDb(fromG)
+ seqlevelsStyle(edb)
+ checkException(seqlevelsStyle(edb) <- "UCSC")
+}
+
+
+
+
+
+deprecated_test_check_SeqnameFilter <- function(){
+ Orig <- getOption("ucscChromosomeNames", FALSE)
+ options(ucscChromosomeNames=TRUE)
+ snf <- SeqnameFilter(c("chrX", "chr3"))
+ checkEquals(value(snf), c("chrX", "chr3"))
+ checkEquals(value(snf, edb), c("X", "3"))
+
+ options(ucscChromosomeNames=FALSE)
+ checkEquals(value(snf, edb), c("X", "3"))
+
+ ## No matter what, where has to return names without chr!
+ checkEquals(where(snf, edb), "gene.seq_name in ('X','3')")
+
+ ## GRangesFilter:
+ grf <- GRangesFilter(GRanges("chrX", IRanges(123, 345)))
+ checkEqualsNumeric(length(grep(where(grf), pattern="seq_name == 'chrX'")), 1)
+ checkEqualsNumeric(length(grep(where(grf, edb), pattern="seq_name == 'X'")), 1)
+
+ ## Check chromosome MT/chrM
+ options(ucscChromosomeNames=FALSE)
+ snf <- SeqnameFilter("MT")
+ checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
+ snf <- SeqnameFilter("chrM")
+ checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
+ options(ucscChromosomeNames=TRUE)
+ snf <- SeqnameFilter("MT")
+ checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
+ snf <- SeqnameFilter("chrM")
+ checkEquals(where(snf, edb), "gene.seq_name = 'MT'")
+
+ options(ucscChromosomeNames=Orig)
+}
+
+deprecated_test_check_retrieve_data <- function(){
+ Orig <- getOption("ucscChromosomeNames", FALSE)
+
+ options(ucscChromosomeNames=FALSE)
+ genes <- genes(edb, filter=SeqnameFilter(c("21", "Y", "X")))
+ checkEquals(all(seqlevels(genes) %in% c("21", "X", "Y")), TRUE)
+ options(ucscChromosomeNames=TRUE)
+ genes <- genes(edb, filter=SeqnameFilter(c("21", "Y", "X")))
+ checkEquals(all(seqlevels(genes) %in% c("chr21", "chrX", "chrY")), TRUE)
+
+ ## Check chromosome MT
+ options(ucscChromosomeNames=FALSE)
+ exons <- exons(edb, filter=SeqnameFilter("MT"))
+ checkEquals(seqlevels(exons), "MT")
+ options(ucscChromosomeNames=TRUE)
+ exons <- exons(edb, filter=SeqnameFilter("MT"))
+ checkEquals(seqlevels(exons), "chrM")
+
+ options(ucscChromosomeNames=Orig)
+}
+
+
+notrun_check_get_sequence_bsgenome <- function(){
+ edb <- EnsDb.Hsapiens.v75
+ ## Using first the Ensembl fasta stuff.
+ ensSeqs <- extractTranscriptSeqs(getGenomeFaFile(edb),
+ exonsBy(edb, "tx", filter=SeqnameFilter("Y")))
+ ## Now the same using the BSgenome stuff.
+ seqlevelsStyle(edb) <- "UCSC"
+ options(ensembldb.seqnameNotFound="ORIGINAL")
+ exs <- exonsBy(edb, "tx", filter=SeqnameFilter("chrY"))
+ library(BSgenome.Hsapiens.UCSC.hg19)
+ bsg <- BSgenome.Hsapiens.UCSC.hg19
+ ucscSeqs <- extractTranscriptSeqs(bsg, exs)
+
+ checkEquals(as.character(ensSeqs), as.character(ucscSeqs))
+}
+
+
+## Use the stuff from GenomeInfoDb!
+notrun_test_newstuff <- function(){
+ library(GenomeInfoDb)
+ Map <- mapSeqlevels(seqlevels(edb), style="Ensembl")
+ Map <- mapSeqlevels(seqlevels(edb), style="UCSC")
+ ## just check what's out there
+ genomeStyles()
+}
+
+
diff --git a/inst/unitTests/test_validity.R b/inst/unitTests/test_validity.R
new file mode 100644
index 0000000..9560f12
--- /dev/null
+++ b/inst/unitTests/test_validity.R
@@ -0,0 +1,11 @@
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+test_validity_functions <- function() {
+ OK <- ensembldb:::dbHasRequiredTables(dbconn(edb))
+ checkTrue(OK)
+ ## Check the tables
+ OK <- ensembldb:::dbHasValidTables(dbconn(edb))
+ checkTrue(OK)
+}
+
diff --git a/inst/unitTests/test_xByOverlap.R b/inst/unitTests/test_xByOverlap.R
new file mode 100644
index 0000000..3da75f6
--- /dev/null
+++ b/inst/unitTests/test_xByOverlap.R
@@ -0,0 +1,102 @@
+####============================================================
+## tests for exonsByOverlaps, transcriptsByOverlaps
+##
+####------------------------------------------------------------
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+test_transcriptsByOverlaps <- function(){
+ ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
+ end=c(2654900, 2709550, 28111790))
+ gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+ grf2 <- GRangesFilter(gr2, condition="overlapping")
+ Test <- transcripts(edb, filter=grf2)
+
+ Test2 <- transcriptsByOverlaps(edb, gr2)
+ checkEquals(names(Test), names(Test2))
+
+ ## on one strand.
+ gr2 <- GRanges(rep("Y", length(ir2)), ir2, strand=rep("-", length(ir2)))
+ grf2 <- GRangesFilter(gr2, condition="overlapping")
+ Test <- transcripts(edb, filter=grf2)
+ Test2 <- transcriptsByOverlaps(edb, gr2)
+ checkEquals(names(Test), names(Test2))
+
+ ## Combine with filter...
+ gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+ Test3 <- transcriptsByOverlaps(edb, gr2, filter=SeqstrandFilter("-"))
+ checkEquals(names(Test), names(Test3))
+}
+
+test_exonsByOverlaps <- function(){
+ ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
+ end=c(2654900, 2709550, 28111790))
+ gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+ grf2 <- GRangesFilter(gr2, condition="overlapping")
+ Test <- exons(edb, filter=grf2)
+
+ Test2 <- exonsByOverlaps(edb, gr2)
+ checkEquals(names(Test), names(Test2))
+
+ ## on one strand.
+ gr2 <- GRanges(rep("Y", length(ir2)), ir2, strand=rep("-", length(ir2)))
+ grf2 <- GRangesFilter(gr2, condition="overlapping")
+ Test <- exons(edb, filter=grf2)
+ Test2 <- exonsByOverlaps(edb, gr2)
+ checkEquals(names(Test), names(Test2))
+
+ ## Combine with filter...
+ gr2 <- GRanges(rep("Y", length(ir2)), ir2)
+ Test3 <- exonsByOverlaps(edb, gr2, filter=SeqstrandFilter("-"))
+ checkEquals(names(Test), names(Test3))
+}
+
+
+testing_txByOverlap <- function(){
+ ## Apparently, a combination between transcripts and findoverlaps.
+ grf <- GRangesFilter(GRanges(seqname="Y", IRanges(start=2655145, end=2655500)),
+ condition="overlapping")
+ grf2 <- GRangesFilter(GRanges(seqname="Y", IRanges(start=28740998, end=28741998)),
+ condition="overlapping")
+ transcripts(edb, filter=list(SeqnameFilter("Y"), GenebiotypeFilter("protein_coding")))
+ where(grf)
+ con <- dbconn(edb)
+ library(RSQLite)
+ q <- paste0("select * from gene where (", where(grf, edb),
+ ") or (", where(grf2), ")")
+ Test <- dbGetQuery(con, q)
+
+ ## Here we go...
+ ir <- IRanges(start=c(142999, 231380, 27635900),
+ end=c(143300, 231800, 27636200))
+ gr <- GRanges(seqname=rep("Y", length(ir)), ir)
+ grf <- GRangesFilter(gr, condition="overlapping")
+ where(grf)
+ where(grf, edb)
+ Test <- transcripts(edb, filter=grf)
+ ## ?? Nothing ??
+ ir2 <- IRanges(start=c(2654890, 2709520, 28111770),
+ end=c(2654900, 2709550, 28111790))
+ grf2 <- GRangesFilter(GRanges(rep("Y", length(ir2)), ir2), condition="overlapping")
+ Test <- transcripts(edb, filter=grf2)
+ checkEquals(names(Test), c("ENST00000383070", "ENST00000250784", "ENST00000598545"))
+ ## ## TxDb...
+ ## library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+ ## txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+ ## gr <- GRanges(seqname=c("chrY", "chrY", "chrY", "chrY"),
+ ## IRanges(start=c(2655145, 28740998, 2709990, 28111770),
+ ## end=c(2655200, 28741998, 2709999, 28112800)))
+ ## transcriptsByOverlaps(txdb, GRanges(seqname=rep("chrY", length(ir)), ir))
+ ## transcriptsByOverlaps(txdb, GRanges(seqname=rep("chrY", length(ir2)), ir2))
+
+}
+
+notrun_txdb <- function(){
+ txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
+ package="GenomicFeatures"))
+ gr <- GRanges(seqnames = rep("chr1",2),
+ ranges = IRanges(start=c(500,10500), end=c(10000,30000)),
+ strand = strand(rep("-",2)))
+ transcriptsByOverlaps(txdb, gr)
+}
+
diff --git a/man/EnsDb-AnnotationDbi.Rd b/man/EnsDb-AnnotationDbi.Rd
new file mode 100644
index 0000000..497f736
--- /dev/null
+++ b/man/EnsDb-AnnotationDbi.Rd
@@ -0,0 +1,223 @@
+\name{select}
+\Rdversion{1.1}
+\alias{select}
+\alias{select,EnsDb-method}
+\alias{columns,EnsDb-method}
+\alias{keys,EnsDb-method}
+\alias{keytypes,EnsDb-method}
+\alias{mapIds,EnsDb-method}
+
+\title{Integration into the AnnotationDbi framework}
+\description{
+ Several of the methods available for \code{AnnotationDbi} objects are
+ also implemented for \code{EnsDb} objects. This makes it possible to
+ extract data from \code{EnsDb} objects in the same way as from objects
+ inheriting from the base annotation package class
+ \code{AnnotationDbi}.
+ In addition to the \emph{standard} usage, the \code{select} and
+ \code{mapIds} methods for \code{EnsDb} objects also support the filter
+ framework of the ensembldb package and thus allow more
+ fine-grained queries to retrieve data.
+}
+\usage{
+
+\S4method{columns}{EnsDb}(x)
+\S4method{keys}{EnsDb}(x, keytype, filter,...)
+\S4method{keytypes}{EnsDb}(x)
+\S4method{mapIds}{EnsDb}(x, keys, column, keytype, ..., multiVals)
+\S4method{select}{EnsDb}(x, keys, columns, keytype, ...)
+
+}
+\arguments{
+
+ (In alphabetic order)
+
+ \item{column}{
+ For \code{mapIds}: the column to search on, i.e. from which values
+ should be retrieved.
+ }
+
+ \item{columns}{
+ For \code{select}: the columns from which values should be
+ retrieved. Use the \code{columns} method to list all possible
+ columns.
+ }
+
+ \item{keys}{
+ The keys/ids for which data should be retrieved from the
+ database. This can be either a character vector of keys/IDs, a
+ single filter object extending \code{\linkS4class{BasicFilter}} or a
+ list of such objects.
+ }
+
+ \item{keytype}{
+ For \code{mapIds} and \code{select}: the type (column) that matches
+ the provided keys. This argument does not have to be specified if
+ argument \code{keys} is a filter object extending
+ \code{\linkS4class{BasicFilter}} or a \code{list} of such objects.
+
+ For \code{keys}: which keys should be returned from the database.
+ }
+
+ \item{filter}{
+ For \code{keys}: either a single object extending
+ \code{\linkS4class{BasicFilter}} or a list of such objects to
+ retrieve only specific keys from the database.
+ }
+
+ \item{multiVals}{
+ What should \code{mapIds} do when there are multiple values that
+ could be returned? Options are: \code{"first"}, \code{"list"},
+ \code{"filter"}, \code{"asNA"}. See
+ \code{\link[AnnotationDbi]{mapIds}} for a detailed description.
+ }
+
+ \item{x}{
+ The \code{EnsDb} object.
+ }
+
+ \item{...}{
+ Not used.
+ }
+
+}
+\section{Methods and Functions}{
+ \describe{
+
+ \item{columns}{
+ List all the columns that can be retrieved by the \code{mapIds}
+ and \code{select} methods. Note that these column names are
+ different from the ones supported by the \code{\link{genes}},
+ \code{\link{transcripts}} etc. methods that can be listed by the
+ \code{\link{listColumns}} method.
+
+ Returns a character vector of supported column names.
+ }
+
+ \item{keys}{
+ Retrieves all keys from the column name specified with
+ \code{keytype}. By default (if \code{keytype} is not provided) it
+ returns all gene IDs. Note that \code{keytype="TXNAME"} will
+ return transcript ids, since no transcript names are available in
+ the database.
+
+ Returns a character vector of IDs.
+ }
+
+ \item{keytypes}{
+ List all supported key types (column names).
+
+ Returns a character vector of key types.
+ }
+
+ \item{mapIds}{
+ Retrieve the mapped ids for a set of keys that are of a particular
+ keytype. Argument \code{keys} can be either a character vector of
+ keys/IDs, a single filter object extending
+ \code{\linkS4class{BasicFilter}} or a list of such objects. For
+ the latter, the argument \code{keytype} does not have to be
+ specified. Importantly however, if the filtering system is used,
+ the ordering of the results might not represent the ordering of
+ the keys.
+
+ The method usually returns a named character vector or, depending
+ on the argument \code{multiVals} a named list, with names
+ corresponding to the keys (same ordering is only guaranteed if
+ \code{keys} is a character vector).
+ }
+
+ \item{select}{
+ Retrieve the data as a \code{data.frame} based on parameters for
+ selected \code{keys}, \code{columns} and \code{keytype}
+ arguments. Multiple matches of the keys are returned in one row
+ for each possible match. Argument \code{keys} can be either a
+ character vector of keys/IDs, a single filter object extending
+ \code{\linkS4class{BasicFilter}} or a list of such objects. For
+ the latter, the argument \code{keytype} does not have to be
+ specified.
+
+ Note that values from a column \code{"TXNAME"} will be the same
+ as those from column \code{"TXID"}, since internally no database
+ column \code{"tx_name"} is present and the column is thus mapped
+ to \code{"tx_id"}.
+
+ Returns a \code{data.frame} with the column names corresponding to
+ the argument \code{columns} and rows with all data matching the
+ criteria specified with \code{keys}.
+ }
+
+ }
+}
+
+\value{
+ See method description above.
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\linkS4class{BasicFilter}}
+ \code{\link{listColumns}}
+ \code{\link{transcripts}}
+}
+\examples{
+
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## List all supported keytypes.
+keytypes(edb)
+
+## List all supported columns for the select and mapIds methods.
+columns(edb)
+
+## List /real/ database column names.
+listColumns(edb)
+
+## Retrieve all keys corresponding to transcript ids.
+txids <- keys(edb, keytype="TXID")
+length(txids)
+head(txids)
+
+## Retrieve all keys corresponding to gene names of genes encoded on chromosome X
+gids <- keys(edb, keytype="GENENAME", filter=SeqnameFilter("X"))
+length(gids)
+head(gids)
+
+## Get a mapping of the genes BCL2 and BCL2L11 to all of their
+## transcript ids and return the result as list
+maps <- mapIds(edb, keys=c("BCL2", "BCL2L11"), column="TXID",
+ keytype="GENENAME", multiVals="list")
+maps
+
+## Perform the same query using a combination of a GenenameFilter and a TxbiotypeFilter
+## to just retrieve protein coding transcripts for these two genes.
+mapIds(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column="TXID",
+ multiVals="list")
+
+## select:
+## Retrieve all transcript and gene related information for the above example.
+select(edb, keys=list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns=c("GENEID", "GENENAME", "TXID", "TXBIOTYPE", "TXSEQSTART", "TXSEQEND",
+ "SEQNAME", "SEQSTRAND"))
+
+## Get all data for genes encoded on chromosome Y
+Y <- select(edb, keys="Y", keytype="SEQNAME")
+head(Y)
+nrow(Y)
+
+## Get selected columns for all lincRNAs encoded on chromosome Y
+Y <- select(edb, keys=list(SeqnameFilter("Y"), GenebiotypeFilter("lincRNA")),
+ columns=c("GENEID", "GENEBIOTYPE", "TXID", "GENENAME"))
+head(Y)
+nrow(Y)
+
+}
+\keyword{classes}
+
+
+
+
+
diff --git a/man/EnsDb-class.Rd b/man/EnsDb-class.Rd
new file mode 100644
index 0000000..2c0c943
--- /dev/null
+++ b/man/EnsDb-class.Rd
@@ -0,0 +1,368 @@
+\name{EnsDb-class}
+\Rdversion{1.1}
+\docType{class}
+\alias{EnsDb-class}
+\alias{buildQuery}
+\alias{buildQuery,EnsDb-method}
+\alias{dbconn}
+\alias{dbconn,EnsDb-method}
+\alias{ensemblVersion}
+\alias{ensemblVersion,EnsDb-method}
+\alias{listColumns}
+\alias{listColumns,EnsDb-method}
+\alias{metadata}
+\alias{metadata,EnsDb-method}
+\alias{seqinfo}
+\alias{seqinfo,EnsDb-method}
+\alias{seqlevels}
+\alias{seqlevels,EnsDb-method}
+\alias{organism}
+\alias{organism,EnsDb-method}
+\alias{show}
+\alias{show,EnsDb-method}
+\alias{listGenebiotypes}
+\alias{listGenebiotypes,EnsDb-method}
+\alias{listTxbiotypes}
+\alias{listTxbiotypes,EnsDb-method}
+\alias{listTables}
+\alias{listTables,EnsDb-method}
+\alias{updateEnsDb}
+\alias{updateEnsDb,EnsDb-method}
+\alias{returnFilterColumns}
+\alias{returnFilterColumns,EnsDb-method}
+\alias{returnFilterColumns<-}
+\alias{returnFilterColumns<-,EnsDb-method}
+
+
+\title{Basic usage of an Ensembl based annotation database}
+\description{
+ Get some basic information from an Ensembl based annotation package
+ generated with \code{\link{makeEnsembldbPackage}}.
+
+}
+\section{Objects from the Class}{
+ A connection to the respective annotation database is created upon
+ loading of an annotation package created with the
+ \code{\link{makeEnsembldbPackage}} function. In addition, the
+ \code{\link{EnsDb}} constructor specifying the SQLite database file can be
+ called to generate an instance of the object (see
+ \code{\link{makeEnsemblSQLiteFromTables}} for an example).
+}
+\usage{
+
+\S4method{buildQuery}{EnsDb}(x, columns=c("gene_id", "gene_biotype",
+ "gene_name"), filter=list(), order.by,
+ order.type="asc", skip.order.check=FALSE)
+
+\S4method{dbconn}{EnsDb}(x)
+
+\S4method{ensemblVersion}{EnsDb}(x)
+
+\S4method{listColumns}{EnsDb}(x, table, skip.keys=TRUE, ...)
+
+\S4method{listGenebiotypes}{EnsDb}(x, ...)
+
+\S4method{listTxbiotypes}{EnsDb}(x, ...)
+
+\S4method{listTables}{EnsDb}(x, ...)
+
+\S4method{metadata}{EnsDb}(x, ...)
+
+\S4method{organism}{EnsDb}(object)
+
+\S4method{returnFilterColumns}{EnsDb}(x)
+
+\S4method{returnFilterColumns}{EnsDb}(x)
+
+\S4method{returnFilterColumns}{EnsDb}(x) <- value
+
+\S4method{seqinfo}{EnsDb}(x)
+
+\S4method{seqlevels}{EnsDb}(x)
+
+\S4method{updateEnsDb}{EnsDb}(x, ...)
+
+}
+\arguments{
+
+ (in alphabetic order)
+
+ \item{...}{Additional arguments.
+ Not used.
+ }
+
+ \item{columns}{
+ Columns (attributes) to be retrieved from the database tables. Use the
+ \code{listColumns} or \code{listTables} method for a list of
+ supported columns.
+ }
+
+ \item{filter}{
+ list of \code{\linkS4class{BasicFilter}} instance(s) to
+ select specific entries from the database (see examples below).
+ }
+
+ \item{object}{
+ For \code{organism}: an \code{EnsDb} instance.
+ }
+
+ \item{order.by}{name of one of the columns above on which the
+ results should be sorted.
+ }
+
+ \item{order.type}{if the results should be ordered ascending
+ (\code{asc}, default) or descending (\code{desc}).
+ }
+
+ \item{skip.keys}{
+ for \code{listColumns}: whether primary and foreign keys (other
+ than e.g. \code{"gene_id"} and the like) should be skipped. By
+ default these will not be returned.
+ }
+
+ \item{skip.order.check}{
+ whether the check of parameter \code{order.by} for allowed column
+ names should be skipped. If \code{FALSE} (default) the function checks
+ that the provided order criteria refer to columns present in the database tables.
+ }
+
+ \item{table}{
+ For \code{listColumns}: optionally specify the table name for
+ which the columns should be returned.
+ }
+
+ \item{value}{
+ For \code{returnFilterColumns}: a logical of length one specifying
+ whether columns that are used by any supplied filters should also be
+ returned.
+ }
+
+ \item{x}{
+ An \code{EnsDb} instance.
+ }
+
+}
+\section{Slots}{
+ \describe{
+ \item{ensdb}{
+ Object of class \code{"DBIConnection"}: the
+ connection to the database.
+ }
+
+ \item{tables}{
+ Named list of database table columns with the names being the
+ database table names. The tables are ordered by their degree,
+ i.e. the number of other tables they can be joined with.
+ }
+
+ \item{.properties}{
+ Internal list storing user-defined properties. Should not be
+ directly accessed.
+ }
+ }
+}
+\section{Methods and Functions}{
+ \describe{
+
+ \item{buildQuery}{
+ Helper function building the SQL query to be used to retrieve the
+ wanted information. Usually there is no need to call this method.
+ }
+
+ \item{dbconn}{
+ Returns the connection to the internal SQL database.
+ }
+
+ \item{ensemblVersion}{
+ Returns the Ensembl version on which the package was built.
+ }
+
+ \item{listColumns}{
+ Lists all columns of all tables in the database, or, if
+ \code{table} is specified, of the respective table.
+ }
+
+ \item{listGenebiotypes}{
+ Lists all gene biotypes defined in the database.
+ }
+
+ \item{listTxbiotypes}{
+ Lists all transcript biotypes defined in the database.
+ }
+
+ \item{listTables}{
+ Returns a named list of database table columns (names of the
+ list being the database table names).
+ }
+
+ \item{metadata}{
+ Returns a \code{data.frame} with the metadata information from the
+ database, i.e. information about the Ensembl version or genome
+ build the database was built upon.
+ }
+
+ \item{organism}{
+ Returns the organism name (e.g. \code{"homo_sapiens"}).
+ }
+
+ \item{returnFilterColumns, returnFilterColumns<-}{
+ Get or set the option that determines whether columns used by
+ any specified filters are added to the result columns. The
+ default value is \code{TRUE} (i.e. filter columns are returned).
+ }
+
+ \item{seqinfo}{
+ Returns the sequence/chromosome information from the database.
+ }
+
+ \item{seqlevels}{
+ Returns the chromosome/sequence names that are available in the
+ database.
+ }
+
+ \item{show}{
+ Displays some information from the database.
+ }
+
+ \item{updateEnsDb}{
+ Updates the \code{EnsDb} object to the most recent implementation.
+ }
+
+ }
+}
+\value{
+ \describe{
+ \item{For \code{buildQuery}}{
+ A character string with the SQL query.
+ }
+
+ \item{For \code{dbconn}}{
+ The SQL connection to the RSQLite database.
+ }
+
+ \item{For \code{EnsDb}}{
+ An \code{EnsDb} instance.
+ }
+
+ \item{For \code{lengthOf}}{
+ A named integer vector with the length of the genes or transcripts.
+ }
+
+ \item{For \code{listColumns}}{
+ A character vector with the column names.
+ }
+
+ \item{For \code{listGenebiotypes}}{
+ A character vector with the biotypes of the genes in the database.
+ }
+
+ \item{For \code{listTxbiotypes}}{
+ A character vector with the biotypes of the transcripts in the database.
+ }
+
+ \item{For \code{listTables}}{
+ A list with the names corresponding to the database table names
+ and the elements being the attribute (column) names of the table.
+ }
+
+ \item{For \code{metadata}}{
+ A \code{data.frame}.
+ }
+
+ \item{For \code{organism}}{
+ A character string.
+ }
+
+ \item{For \code{returnFilterColumns}}{
+ A logical of length 1.
+ }
+
+ \item{For \code{seqinfo}}{
+ A \code{Seqinfo} object.
+ }
+
+ \item{For \code{updateEnsDb}}{
+ An \code{EnsDb} object.
+ }
+ }
+}
+\note{
+ While a column named \code{"tx_name"} is listed by the
+ \code{listTables} and \code{listColumns} method, no such column is
+ present in the database. Transcript names returned by the methods are
+ actually the transcript IDs. This \emph{virtual} column was only
+ introduced to be compliant with \code{TxDb} objects (which provide
+ transcript names).
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\link{EnsDb}},
+ \code{\link{makeEnsembldbPackage}}, \code{\linkS4class{BasicFilter}},
+ \code{\link{exonsBy}}, \code{\link{genes}},
+ \code{\link{transcripts}},
+ \code{\link{makeEnsemblSQLiteFromTables}}
+}
+\examples{
+
+library(EnsDb.Hsapiens.v75)
+
+## Display some information:
+EnsDb.Hsapiens.v75
+
+## Show the tables along with their columns
+listTables(EnsDb.Hsapiens.v75)
+
+## For what species is this database?
+organism(EnsDb.Hsapiens.v75)
+
+## What Ensembl version is the database based on?
+ensemblVersion(EnsDb.Hsapiens.v75)
+
+## Get some more information from the database
+metadata(EnsDb.Hsapiens.v75)
+
+## Get all the sequence names.
+seqlevels(EnsDb.Hsapiens.v75)
+
+###### buildQuery
+##
+## Join tables gene and transcript and return gene_id and tx_id
+buildQuery(EnsDb.Hsapiens.v75, columns=c("gene_id", "tx_id"))
+
+
+## Get all exon_ids and transcript ids of genes encoded on chromosome Y.
+buildQuery(EnsDb.Hsapiens.v75, columns=c("exon_id", "tx_id"),
+ filter=list(SeqnameFilter( "Y")))
+
+## List all available gene biotypes from the database:
+listGenebiotypes(EnsDb.Hsapiens.v75)
+
+## List all available transcript biotypes:
+listTxbiotypes(EnsDb.Hsapiens.v75)
+
+## Update the EnsDb; this is in most instances not necessary at all.
+updateEnsDb(EnsDb.Hsapiens.v75)
+
+###### returnFilterColumns
+returnFilterColumns(EnsDb.Hsapiens.v75)
+
+## Get protein coding genes on chromosome X, requesting only
+## gene_name as an additional column.
+genes(EnsDb.Hsapiens.v75, filter=list(SeqnameFilter("X"),
+ GenebiotypeFilter("protein_coding")),
+ columns=c("gene_name"))
+## By default we get also the gene_biotype column as the data was filtered
+## on this column.
+
+## This can be changed using the returnFilterColumns option
+returnFilterColumns(EnsDb.Hsapiens.v75) <- FALSE
+genes(EnsDb.Hsapiens.v75, filter=list(SeqnameFilter("X"),
+ GenebiotypeFilter("protein_coding")),
+ columns=c("gene_name"))
+
+
+}
+\keyword{classes}
+
diff --git a/man/EnsDb-exonsBy.Rd b/man/EnsDb-exonsBy.Rd
new file mode 100644
index 0000000..b2c5956
--- /dev/null
+++ b/man/EnsDb-exonsBy.Rd
@@ -0,0 +1,568 @@
+\name{exonsBy}
+\Rdversion{1.1}
+\docType{class}
+\alias{disjointExons,EnsDb-method}
+\alias{cdsBy}
+\alias{cdsBy,EnsDb-method}
+\alias{fiveUTRsByTranscript,EnsDb-method}
+\alias{threeUTRsByTranscript,EnsDb-method}
+\alias{exons}
+\alias{exons,EnsDb-method}
+\alias{exonsBy}
+\alias{exonsBy,EnsDb-method}
+\alias{exonsByOverlaps,EnsDb-method}
+\alias{genes}
+\alias{genes,EnsDb-method}
+\alias{toSAF}
+\alias{toSAF,GRangesList-method}
+\alias{transcripts}
+\alias{transcripts,EnsDb-method}
+\alias{transcriptsBy}
+\alias{transcriptsBy,EnsDb-method}
+\alias{transcriptsByOverlaps,EnsDb-method}
+\alias{promoters}
+\alias{promoters,EnsDb-method}
+
+\title{Retrieve annotation data from an Ensembl based package}
+\description{
+ Retrieve gene/transcript/exons annotations stored in an Ensembl based
+ database package generated with the \code{\link{makeEnsembldbPackage}}
+ function.
+}
+\usage{
+
+\S4method{exons}{EnsDb}(x, columns=listColumns(x,"exon"),
+ filter, order.by, order.type="asc",
+ return.type="GRanges")
+
+\S4method{exonsBy}{EnsDb}(x, by=c("tx", "gene"),
+ columns=listColumns(x, "exon"), filter, use.names=FALSE)
+
+\S4method{exonsByOverlaps}{EnsDb}(x, ranges, maxgap=0L, minoverlap=1L,
+ type=c("any", "start", "end"),
+ columns=listColumns(x, "exon"),
+ filter)
+
+\S4method{transcripts}{EnsDb}(x, columns=listColumns(x, "tx"),
+ filter, order.by, order.type="asc",
+ return.type="GRanges")
+
+\S4method{transcriptsBy}{EnsDb}(x, by=c("gene", "exon"),
+ columns=listColumns(x, "tx"), filter)
+
+\S4method{transcriptsByOverlaps}{EnsDb}(x, ranges, maxgap=0L, minoverlap=1L,
+ type=c("any", "start", "end"),
+ columns=listColumns(x, "tx"),
+ filter)
+
+\S4method{promoters}{EnsDb}(x, upstream=2000, downstream=200, ...)
+
+\S4method{genes}{EnsDb}(x, columns=listColumns(x, "gene"), filter,
+ order.by, order.type="asc",
+ return.type="GRanges")
+
+\S4method{disjointExons}{EnsDb}(x, aggregateGenes=FALSE,
+ includeTranscripts=TRUE, filter, ...)
+
+\S4method{cdsBy}{EnsDb}(x, by=c("tx", "gene"), columns=NULL, filter,
+ use.names=FALSE)
+
+\S4method{fiveUTRsByTranscript}{EnsDb}(x, columns=NULL, filter)
+
+\S4method{threeUTRsByTranscript}{EnsDb}(x, columns=NULL, filter)
+
+\S4method{toSAF}{GRangesList}(x, ...)
+
+}
+\arguments{
+
+ (In alphabetic order)
+
+ \item{...}{
+ For \code{promoters}: additional arguments to be passed to the
+ \code{transcripts} method.
+ }
+
+ \item{aggregateGenes}{
+ For \code{disjointExons}: When \code{FALSE} (default) exon fragments
+ that overlap multiple genes are dropped. When \code{TRUE}, all
+ fragments are kept and the \code{gene_id} metadata column includes
+ all gene IDs that overlap the exon fragment.
+ }
+
+ \item{by}{
+ For \code{exonsBy}: whether exons should be fetched by genes
+ or by transcripts; as in the corresponding function of the
+ \code{GenomicFeatures} package.
+ For \code{transcriptsBy}: whether
+ transcripts should be fetched by genes or by exons; fetching
+ transcripts by cds as supported by the
+ \code{\link[GenomicFeatures]{transcriptsBy}} method in the
+ \code{GenomicFeatures} package is currently not implemented.
+ For \code{cdsBy}: whether cds should be fetched by transcript or by
+ gene.
+ }
+
+ \item{columns}{
+ Columns to be retrieved from the database tables.
+
+ Default values for \code{genes} are all columns from the \code{gene}
+ database table, for \code{exons} and \code{exonsBy} the column names of
+ the \code{exon} database table and for \code{transcripts} and
+ \code{transcriptsBy} the columns of the \code{tx} database table
+ (see details below for more information).
+
+ Note that any of the column names of the database tables can be
+ submitted to any of the methods (use \code{\link{listTables}} or
+ \code{\link{listColumns}} methods for a complete list of allowed
+ column names).
+
+ For \code{cdsBy}: this argument is only supported for
+ \code{by="tx"}.
+ }
+
+ \item{downstream}{
+ For method \code{promoters}: the number of nucleotides downstream of
+ the transcription start site that should be included in the promoter region.
+ }
+
+ \item{filter}{
+ A filter object extending \code{\linkS4class{BasicFilter}} or a list
+ of such object(s) to select specific entries from the database (see
+ examples below).
+ }
+
+ \item{includeTranscripts}{
+ For \code{disjointExons}: When \code{TRUE} (default) a
+ \code{tx_name} metadata column is included that lists all transcript
+ IDs that overlap the exon fragment. Note: this is different to the
+ \code{\link[GenomicFeatures]{disjointExons}} function in the
+ \code{GenomicFeatures} package, that lists the transcript names, not
+ IDs.
+ }
+
+ \item{maxgap}{
+ For \code{exonsByOverlaps} and \code{transcriptsByOverlaps}: see
+ \code{\link[GenomicFeatures]{exonsByOverlaps}} help page in the
+ \code{GenomicFeatures} package.
+ }
+
+ \item{minoverlap}{
+ For \code{exonsByOverlaps} and \code{transcriptsByOverlaps}: see
+ \code{\link[GenomicFeatures]{exonsByOverlaps}} help page in the
+ \code{GenomicFeatures} package.
+ }
+
+ \item{order.by}{
+ Name of one of the columns above on which the
+ results should be sorted.
+ }
+
+ \item{order.type}{
+ If the results should be ordered ascending
+ (\code{asc}, default) or descending (\code{desc}).
+ }
+
+ \item{ranges}{
+ For \code{exonsByOverlaps} and \code{transcriptsByOverlaps}: a
+ \code{GRanges} object specifying the genomic regions.
+ }
+
+ \item{return.type}{
+ Type of the returned object. Can be either
+ \code{"data.frame"}, \code{"DataFrame"} or \code{"GRanges"}. In the latter case the return
+ object will be a \code{GRanges} object with the GRanges specifying the
+ chromosomal start and end coordinates of the feature (gene,
+ transcript or exon, depending on whether \code{genes},
+ \code{transcripts} or \code{exons} was called). All additional
+ columns are added as metadata columns to the GRanges object.
+ }
+
+ \item{type}{
+ For \code{exonsByOverlaps} and \code{transcriptsByOverlaps}: see
+ \code{\link[GenomicFeatures]{exonsByOverlaps}} help page in the
+ \code{GenomicFeatures} package.
+ }
+
+ \item{upstream}{
+ For method \code{promoters}: the number of nucleotides upstream of
+ the transcription start site that should be included in the promoter region.
+ }
+
+ \item{use.names}{
+ For \code{cdsBy} and \code{exonsBy}: only for \code{by="gene"}: use the names of the
+ genes instead of their IDs as names of the resulting
+ \code{GRangesList}.
+ }
+
+ \item{x}{
+ For \code{toSAF} a \code{GRangesList} object.
+ For all other methods an \code{EnsDb} instance.
+ }
+
+}
+\section{Methods and Functions}{
+ \describe{
+
+ \item{exons}{
+ Retrieve exon information from the database. Additional
+ columns from transcripts or genes associated with the exons can be specified
+ and are added to the respective exon annotation.
+ }
+
+ \item{exonsBy}{
+ Retrieve exons grouped by transcript or by gene. This
+ function returns a \code{GRangesList} as does the analogous function
+ in the \code{GenomicFeatures} package. Using the \code{columns}
+ parameter it is possible to determine which additional values should
+ be retrieved from the database. These will be included in the
+ \code{GRanges} object for the exons as metadata columns.
+ The exons in the inner \code{GRanges} are ordered by the exon
+ index within the transcript (if \code{by="tx"}), or increasingly by the
+ chromosomal start position of the exon or decreasingly by the chromosomal end
+ position of the exon depending on whether the gene is encoded on the
+ + or - strand (for \code{by="gene"}).
+ The \code{GRanges} in the \code{GRangesList} will be ordered by
+ the name of the gene or transcript.
+ }
+
+ \item{exonsByOverlaps}{
+ Retrieve exons overlapping specified genomic ranges. For
+ more information see
+ \code{\link[GenomicFeatures]{exonsByOverlaps}} method in the
+ \code{GenomicFeatures} package. The functionality is to some
+ extent similar and redundant to the \code{exons} method in
+ combination with \code{\link{GRangesFilter}} filter.
+ }
+
+ \item{transcripts}{
+ Retrieve transcript information from the database. Additional
+ columns from genes or exons associated with the transcripts can be specified
+ and are added to the respective transcript annotation.
+ }
+
+ \item{transcriptsBy}{
+ Retrieve transcripts grouped by gene or exon. This
+ function returns a \code{GRangesList} as does the analogous function
+ in the \code{GenomicFeatures} package. Using the \code{columns}
+ parameter it is possible to determine which additional values should
+ be retrieved from the database. These will be included in the
+ \code{GRanges} object for the transcripts as metadata columns.
+ The transcripts in the inner \code{GRanges} are ordered increasingly by the
+ chromosomal start position of the transcript for genes encoded on
+ the + strand and in a decreasing manner by the chromosomal end
+ position of the transcript for genes encoded on the - strand.
+ The \code{GRanges} in the \code{GRangesList} will be ordered by
+ the name of the gene or exon.
+ }
+
+ \item{transcriptsByOverlaps}{
+ Retrieve transcripts overlapping specified genomic ranges. For
+ more information see
+ \code{\link[GenomicFeatures]{transcriptsByOverlaps}} method in the
+ \code{GenomicFeatures} package. The functionality is to some
+ extent similar and redundant to the \code{transcripts} method in
+ combination with \code{\link{GRangesFilter}} filter.
+ }
+
+ \item{promoters}{
+ Retrieve promoter information from the database. Additional
+ columns from genes or exons associated with the promoters can be specified
+ and are added to the respective promoter annotation.
+ }
+ \item{genes}{
+ Retrieve gene information from the database. Additional
+ columns from transcripts or exons associated with the genes can be specified
+ and are added to the respective gene annotation.
+ }
+
+ \item{disjointExons}{
+ This method is identical to
+ \code{\link[GenomicFeatures]{disjointExons}} defined in the
+ \code{GenomicFeatures} package. It creates a \code{GRanges} of
+ non-overlapping exon parts with metadata columns of \code{gene_id}
+ and \code{exonic_part}. Exon parts that overlap more than one gene
+ can be dropped with \code{aggregateGenes=FALSE}.
+ }
+
+ \item{cdsBy}{
+ Returns the coding region grouped either by transcript or by
+ gene. Each element in the \code{GRangesList} represents the cds
+ for one transcript or gene, with the individual ranges
+ corresponding to the coding part of its exons.
+ For \code{by="tx"} additional annotation columns can be added to
+ the individual \code{GRanges} (in addition to the default columns
+ \code{exon_id} and \code{exon_rank}).
+ Note that the \code{GRangesList} is sorted by its names.
+ }
+
+ \item{fiveUTRsByTranscript}{
+ Returns the 5' untranslated region for protein coding
+ transcripts.
+ }
+
+ \item{threeUTRsByTranscript}{
+ Returns the 3' untranslated region for protein coding
+ transcripts.
+ }
+
+ \item{toSAF}{
+ Reformats a \code{GRangesList} object into a
+ \code{data.frame} corresponding to a standard SAF (Simplified
+ Annotation Format) file (i.e. with column names \code{"GeneID"},
+ \code{"Chr"}, \code{"Start"}, \code{"End"} and
+ \code{"Strand"}). Note: this method only makes sense for a
+ \code{GRangesList} that groups features (exons, transcripts) by gene.
+ }
+
+ }
+}
+\details{
+ A detailed description of all database tables and the associated
+ attributes/column names is also given in the vignette of this package.
+ An overview of the columns is given below:
+ \describe{
+ \item{gene_id}{the Ensembl gene ID of the gene.}
+ \item{gene_name}{the name of the gene (in most cases its official symbol).}
+ \item{entrezid}{the NCBI Entrezgene ID of the gene; note that this
+ can also be a \code{";"} separated list of IDs for Ensembl genes
+ mapped to more than one Entrezgene.}
+ \item{gene_biotype}{the biotype of the gene.}
+ \item{gene_seq_start}{the start coordinate of the gene on the
+ sequence (usually a chromosome).}
+ \item{gene_seq_end}{the end coordinate of the gene.}
+ \item{seq_name}{the name of the sequence on which the gene is encoded
+ (usually a chromosome).}
+ \item{seq_strand}{the strand on which the gene is encoded.}
+ \item{seq_coord_system}{the coordinate system of the sequence.}
+ \item{tx_id}{the Ensembl transcript ID.}
+ \item{tx_biotype}{the biotype of the transcript.}
+ \item{tx_seq_start}{the chromosomal start coordinate of the transcript.}
+ \item{tx_seq_end}{the chromosomal end coordinate of the transcript.}
+ \item{tx_cds_seq_start}{the start coordinate of the coding region of
+ the transcript (NULL for non-coding transcripts).}
+ \item{tx_cds_seq_end}{the end coordinate of the coding region.}
+ \item{exon_id}{the ID of the exon. In Ensembl, each exon specified
+ by a unique chromosomal start and end position has its own
+ ID. Thus, the same exon might be part of several transcripts.}
+ \item{exon_seq_start}{the chromosomal start coordinate of the exon.}
+ \item{exon_seq_end}{the chromosomal end coordinate of the exon.}
+ \item{exon_idx}{the index of the exon in the transcript model. As
+ noted above, an exon can be part of several transcripts and thus
+ its position inside these transcripts might differ.}
+ }
+
+ Also, the vignette provides examples on how to retrieve sequences for
+ genes/transcripts/exons.
+}
+\note{
+ Ensembl defines genes not only on standard chromosomes, but also on
+ patched chromosomes and chromosome variants. Thus it might be
+ advisable to restrict the queries to just those chromosomes of
+ interest (e.g. by specifying a \code{SeqnameFilter(c(1:22, "X", "Y"))}).
+ In addition, so-called LRG genes (Locus Reference Genomic) are also defined in
+ Ensembl. Their gene IDs start with LRG instead of ENS (as for Ensembl
+ genes); thus, a filter can be applied to specifically select or
+ exclude those genes (see examples below).
+
+ Depending on the value of the global option
+ \code{"ucscChromosomeNames"} (use
+ \code{getOption("ucscChromosomeNames", FALSE)} to get its value or
+ \code{options(ucscChromosomeNames=TRUE)} to change its value)
+ the sequence/chromosome names of the returned \code{GRanges} objects
+ or provided in the returned \code{data.frame} or \code{DataFrame}
+ correspond to Ensembl chromosome names (if value is \code{FALSE}) or
+ UCSC chromosome names (if \code{TRUE}). This ensures a better
+ integration with the \code{Gviz} package, in which this option is set
+ by default to \code{TRUE}.
+}
+
+\value{
+ For \code{exons}, \code{transcripts} and \code{genes},
+ a \code{data.frame}, \code{DataFrame}
+ or a \code{GRanges}, depending on the value of the
+ \code{return.type} parameter. The result
+ is ordered as specified by the parameter \code{order.by} or, if not
+ provided, by \code{seq_name} and chromosomal start coordinate, but NOT by any
+ ordering of values in submitted filter objects.
+
+ For \code{exonsBy}, \code{transcriptsBy}:
+ a \code{GRangesList} with one \code{GRanges} per group (gene,
+ transcript or exon). The results are ordered by the value of the
+ \code{by} parameter.
+
+ For \code{exonsByOverlaps} and \code{transcriptsByOverlaps}: a
+ \code{GRanges} with the exons or transcripts overlapping the specified
+ regions.
+
+ For \code{toSAF}: a \code{data.frame} with column names
+ \code{"GeneID"} (the group name from the \code{GRangesList}, i.e. the
+ ID by which the \code{GRanges} are split), \code{"Chr"} (the seqnames
+ from the \code{GRanges}), \code{"Start"} (the start coordinate),
+ \code{"End"} (the end coordinate) and \code{"Strand"} (the strand).
+
+ For \code{disjointExons}: a \code{GRanges} of non-overlapping exon
+ parts.
+
+ For \code{cdsBy}: a \code{GRangesList} with \code{GRanges} per either
+ transcript or gene specifying the start and end coordinates of the
+ coding region of the transcript or gene.
+
+ For \code{fiveUTRsByTranscript}: a \code{GRangesList} with
+ \code{GRanges} for each protein coding transcript representing the
+ start and end coordinates of full or partial exons that constitute the
+ 5' untranslated region of the transcript.
+
+ For \code{threeUTRsByTranscript}: a \code{GRangesList} with
+ \code{GRanges} for each protein coding transcript representing the
+ start and end coordinates of full or partial exons that constitute the
+ 3' untranslated region of the transcript.
+
+}
+\note{
+ While it is possible to request values from a column \code{"tx_name"}
+ (with the \code{columns} argument), no such column is present in the
+ database. The returned values correspond to the ID of the transcripts.
+}
+\author{
+ Johannes Rainer, Tim Triche
+}
+\seealso{
+ \code{\link{makeEnsembldbPackage}}, \code{\linkS4class{BasicFilter}},
+ \code{\link{listColumns}}, \code{\link{lengthOf}}
+}
+\examples{
+
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+###### genes
+##
+## get all genes encoded on chromosome Y
+AllY <- genes(edb, filter=SeqnameFilter("Y"))
+AllY
+
+## return result as DataFrame.
+AllY.granges <- genes(edb,
+ filter=SeqnameFilter("Y"),
+ return.type="DataFrame")
+AllY.granges
+
+## include all transcripts of the gene and their chromosomal
+## coordinates, sort by chrom start of transcripts and return as
+## GRanges.
+AllY.granges.tx <- genes(edb,
+ filter=SeqnameFilter("Y"),
+ columns=c("gene_id", "seq_name",
+ "seq_strand", "tx_id", "tx_biotype",
+ "tx_seq_start", "tx_seq_end"),
+ order.by="tx_seq_start")
+AllY.granges.tx
+
+
+
+###### transcripts
+##
+## get all transcripts of a gene
+Tx <- transcripts(edb,
+ filter=GeneidFilter("ENSG00000184895"),
+ order.by="tx_seq_start")
+Tx
+
+## get all transcripts of two genes along with some information on the
+## gene and transcript
+Tx <- transcripts(edb,
+ filter=GeneidFilter(c("ENSG00000184895",
+ "ENSG00000092377")),
+ columns=c("gene_id", "gene_seq_start",
+ "gene_seq_end", "gene_biotype", "tx_biotype"))
+Tx
+
+###### promoters
+##
+## get the bona-fide promoters (2k up- to 200nt downstream of TSS)
+promoters(edb, filter=GeneidFilter(c("ENSG00000184895",
+ "ENSG00000092377")))
+
+###### exons
+##
+## get all exons of the provided genes
+Exon <- exons(edb,
+ filter=GeneidFilter(c("ENSG00000184895",
+ "ENSG00000092377")),
+ order.by="exon_seq_start",
+ columns=c( "gene_id", "gene_seq_start",
+ "gene_seq_end", "gene_biotype"))
+Exon
+
+
+
+##### exonsBy
+##
+## get all exons for transcripts encoded on chromosomes X and Y.
+ETx <- exonsBy(edb, by="tx",
+ filter=SeqnameFilter(c("X", "Y")))
+ETx
+## get all exons for genes encoded on chromosomes X and Y and
+## include additional annotation columns in the result
+EGenes <- exonsBy(edb, by="gene",
+ filter=SeqnameFilter(c("X", "Y")),
+ columns=c("gene_biotype", "gene_name"))
+EGenes
+
+## Note that this might also contain "LRG" genes.
+length(grep(names(EGenes), pattern="LRG"))
+
+## to fetch just Ensembl genes, use a GeneidFilter with value
+## "ENS%" and condition "like"
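+## A minimal sketch of the approach described above, assuming that
+## GeneidFilter accepts a 'condition' argument as documented for the filter
+## classes; the name 'EnsGenes' is chosen here for illustration only.
+EnsGenes <- exonsBy(edb, by="gene",
+                    filter=list(SeqnameFilter(c("X", "Y")),
+                                GeneidFilter("ENS\%", condition="like")))
+## Count the remaining "LRG" genes in the filtered result.
+length(grep(names(EnsGenes), pattern="LRG"))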
+
+
+##### transcriptsBy
+##
+TGenes <- transcriptsBy(edb, by="gene",
+ filter=SeqnameFilter(c("X", "Y")))
+TGenes
+
+## convert this to a SAF formatted data.frame that can be used by the
+## featureCounts function from the Rsubread package.
+head(toSAF(TGenes))
+
+
+##### transcriptsByOverlaps
+##
+ir <- IRanges(start=c(2654890, 2709520, 28111770),
+ end=c(2654900, 2709550, 28111790))
+gr <- GRanges(rep("Y", length(ir)), ir)
+
+## Retrieve all transcripts overlapping any of the regions.
+txs <- transcriptsByOverlaps(edb, gr)
+txs
+
+## Alternatively, use a GRangesFilter
+grf <- GRangesFilter(gr, condition="overlapping")
+txs <- transcripts(edb, filter=grf)
+txs
+
+
+#### cdsBy
+## Get the coding region for all transcripts on chromosome Y.
+## Specifying also additional annotation columns (in addition to the default
+## exon_id and exon_rank).
+cds <- cdsBy(edb, by="tx", filter=SeqnameFilter("Y"),
+ columns=c("tx_biotype", "gene_name"))
+
+#### the 5' untranslated regions:
+fUTRs <- fiveUTRsByTranscript(edb, filter=SeqnameFilter("Y"))
+
+#### the 3' untranslated regions with additional column gene_name.
+tUTRs <- threeUTRsByTranscript(edb, filter=SeqnameFilter("Y"),
+ columns="gene_name")
+
+
+}
+\keyword{classes}
+
+
+
+
+
diff --git a/man/EnsDb-lengths.Rd b/man/EnsDb-lengths.Rd
new file mode 100644
index 0000000..c6cc796
--- /dev/null
+++ b/man/EnsDb-lengths.Rd
@@ -0,0 +1,110 @@
+\name{lengthOf}
+\Rdversion{1.1}
+\alias{lengthOf}
+\alias{lengthOf,GRangesList-method}
+\alias{lengthOf,EnsDb-method}
+%\alias{transcriptLengths}
+%\alias{transcriptLengths,EnsDb-method}
+%\alias{transcriptLengths,TxDb-method}
+
+\title{Calculating lengths of features}
+\description{
+ These methods calculate the lengths of features (transcripts, genes,
+ CDS, 3' or 5' UTRs) defined in an \code{EnsDb} object or database.
+}
+\usage{
+
+\S4method{lengthOf}{EnsDb}(x, of="gene", filter=list())
+
+}
+\arguments{
+
+ (In alphabetic order)
+
+ \item{filter}{
+ list of \code{\linkS4class{BasicFilter}} instance(s) to
+ select specific entries from the database (see examples below).
+ }
+
+ \item{of}{
+ for \code{lengthOf}: whether the length of genes or
+ transcripts should be retrieved from the database.
+ }
+
+ \item{x}{
+ For \code{lengthOf}: either an \code{EnsDb} or a
+ \code{GRangesList} object. For all other methods an \code{EnsDb}
+ instance.
+ }
+
+}
+\section{Methods and Functions}{
+ \describe{
+
+ \item{lengthOf}{
+ Retrieve the length of genes or transcripts from the
+ database. The length is the sum of the lengths of all exons of a
+ transcript or a gene. In the latter case the exons are first reduced
+ so that the length corresponds to the part of the genomic sequence covered by
+ the exons.
+
+ Note: in addition to this method, the
+ \code{\link[GenomicFeatures]{transcriptLengths}} function in the
+ \code{GenomicFeatures} package can also be used.
+ }
+
+ }
+}
+
+\value{
+ For \code{lengthOf}: see method description above.
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\link{exonsBy}}
+ \code{\link{transcripts}}
+ \code{\link[GenomicFeatures]{transcriptLengths}}
+}
+\examples{
+
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+##### lengthOf
+##
+## length of a specific gene.
+lengthOf(edb,
+ filter=list(GeneidFilter("ENSG00000000003")))
+
+## length of a transcript
+lengthOf(edb, of="tx",
+ filter=list(TxidFilter("ENST00000494424")))
+
+## average length of all protein coding genes encoded on chromosomes X
+## and Y
+mean(lengthOf(edb, of="gene",
+ filter=list(GenebiotypeFilter("protein_coding"),
+ SeqnameFilter(c("X", "Y")))))
+
+## average length of all snoRNAs
+mean(lengthOf(edb, of="gene",
+ filter=list(GenebiotypeFilter("snoRNA"),
+ SeqnameFilter(c("X", "Y")))))
+
+##### transcriptLengths
+##
+## Calculate the length of transcripts encoded on chromosome Y, including
+## length of the CDS, 5' and 3' UTR.
+##len <- transcriptLengths(edb, with.cds_len=TRUE, with.utr5_len=TRUE,
+## with.utr3_len=TRUE, filter=SeqnameFilter("Y"))
+##head(len)
+
+}
+\keyword{classes}
+
+
+
+
+
diff --git a/man/EnsDb-seqlevels.Rd b/man/EnsDb-seqlevels.Rd
new file mode 100644
index 0000000..5648f69
--- /dev/null
+++ b/man/EnsDb-seqlevels.Rd
@@ -0,0 +1,149 @@
+\name{seqlevelsStyle}
+\Rdversion{1.1}
+\alias{seqlevelsStyle}
+\alias{seqlevelsStyle,EnsDb-method}
+\alias{seqlevelsStyle<-}
+\alias{seqlevelsStyle<-,EnsDb-method}
+\alias{supportedSeqlevelsStyles}
+\alias{supportedSeqlevelsStyles,EnsDb-method}
+
+\title{Support for seqlevel styles other than Ensembl}
+\description{
+ The methods and functions on this help page make it possible to integrate
+ \code{EnsDb} objects and the annotations they provide with other
+ Bioconductor annotation packages that are based on chromosome names
+ (seqlevels) that are different from those defined by Ensembl.
+}
+\usage{
+
+\S4method{seqlevelsStyle}{EnsDb}(x)
+
+\S4method{seqlevelsStyle}{EnsDb}(x) <- value
+
+\S4method{supportedSeqlevelsStyles}{EnsDb}(x)
+
+}
+\arguments{
+
+ (In alphabetic order)
+
+ \item{value}{
+ For \code{seqlevelsStyle<-}: a character string specifying the
+ seqlevels style that should be set. Use the
+ \code{supportedSeqlevelsStyles} method to list all available and supported
+ seqlevel styles.
+ }
+
+ \item{x}{
+ An \code{EnsDb} instance.
+ }
+
+}
+\section{Methods and Functions}{
+ \describe{
+
+ \item{seqlevelsStyle}{
+ Get the style of the seqlevels in which results returned from the
+ \code{EnsDb} object are encoded. By default, and internally,
+ seqnames as provided by Ensembl are used.
+
+ The method returns a character string specifying the currently used
+ seqlevel style.
+ }
+
+ \item{seqlevelsStyle<-}{
+ Change the style of the seqlevels in which results returned from
+ the \code{EnsDb} object are encoded. Changing the seqlevels helps to
+ integrate annotations from \code{EnsDb} objects e.g. with
+ annotations from packages that are based on UCSC annotations.
+ }
+
+ \item{supportedSeqlevelsStyles}{
+ Lists all seqlevel styles for which mappings between seqlevel
+ styles are available in the \code{GenomeInfoDb} package.
+
+ The method returns a character vector with supported seqlevel
+ styles for the organism of the \code{EnsDb} object.
+ }
+
+ }
+}
+
+\note{
+ The mapping between different seqname styles is performed based on
+ data provided by the \code{GenomeInfoDb} package. Note that in most
+ instances no mapping is provided for seqnames other than for primary
+ chromosomes. By default functions from the \code{ensembldb} package
+ return the \emph{original} seqname in such cases. This behaviour
+ can be changed with the \code{ensembldb.seqnameNotFound} global
+ option. For the special keyword \code{"ORIGINAL"} (the default), the
+ original seqnames are returned, for \code{"MISSING"} an error is
+ thrown if a seqname can not be mapped. In all other cases, the value
+ of the option is returned as seqname if no mapping is available
+ (e.g. setting \code{options(ensembldb.seqnameNotFound=NA)} returns an
+ \code{NA} if the seqname is not mappable).
+}
+
+\value{
+ For \code{seqlevelsStyle}: see method description above.
+
+ For \code{supportedSeqlevelsStyles}: see method description above.
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\linkS4class{EnsDb}}
+ \code{\link{transcripts}}
+}
+\examples{
+
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Get the internal, default seqlevel style.
+seqlevelsStyle(edb)
+
+## Get the seqlevels from the database.
+seqlevels(edb)
+
+## Get all supported mappings for the organism of the EnsDb.
+supportedSeqlevelsStyles(edb)
+
+## Change the seqlevels to UCSC style.
+seqlevelsStyle(edb) <- "UCSC"
+seqlevels(edb)
+
+## Change the option ensembldb.seqnameNotFound to return NA in case
+## the seqname cannot be mapped from Ensembl to UCSC.
+options(ensembldb.seqnameNotFound=NA)
+
+seqlevels(edb)
+
+## Restoring the original setting.
+options(ensembldb.seqnameNotFound="ORIGINAL")
+
+
+## Integrate Ensembl based annotations with a BSgenome package that is based on
+## UCSC style seqnames.
+library(BSgenome.Hsapiens.UCSC.hg19)
+bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+## Get the genome version
+unique(genome(bsg))
+unique(genome(edb))
+## Although differently named, both represent genome build GRCh37.
+
+## Extract the full transcript sequences of all lincRNAs encoded on chromosome Y.
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx",
+ filter=list(SeqnameFilter("chrY"),
+ GenebiotypeFilter("lincRNA"))))
+yTxSeqs
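+
+## A possible follow-up, not part of the original example: switch back to
+## the default style so that later queries return Ensembl seqnames again.
+## This assumes that "Ensembl" is the name of the default style, i.e. the
+## value returned by the first seqlevelsStyle(edb) call above.
+seqlevelsStyle(edb) <- "Ensembl"
+seqlevelsStyle(edb)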
+
+}
+\keyword{classes}
+
+
+
+
+
diff --git a/man/EnsDb-sequences.Rd b/man/EnsDb-sequences.Rd
new file mode 100644
index 0000000..e2b9ce5
--- /dev/null
+++ b/man/EnsDb-sequences.Rd
@@ -0,0 +1,118 @@
+\name{getGenomeFaFile}
+\Rdversion{1.1}
+%\alias{extractTranscriptSeqs}
+%\alias{extractTranscriptSeqs,ANY-method}
+%\alias{extractTranscriptSeqs,ANY,ANY}
+%\alias{extractTranscriptSeqs,ANY,EnsDb-method}
+\alias{getGenomeFaFile}
+\alias{getGenomeFaFile,EnsDb-method}
+
+\title{Functionality related to DNA/RNA sequences}
+\description{
+ Utility functions related to RNA/DNA sequences, such as extracting
+ RNA/DNA sequences for features defined in \code{EnsDb}.
+}
+\usage{
+
+\S4method{getGenomeFaFile}{EnsDb}(x, pattern="dna.toplevel.fa")
+
+%\S4method{extractTranscriptSeqs}{ANY,EnsDb}(x, transcripts, filter)
+
+}
+\arguments{
+
+ (In alphabetic order)
+
+ %% \item{filter}{
+ %% A filter object extending \code{\linkS4class{BasicFilter}} or a list
+ %% of such object(s) to select specific entries from the database (see
+ %% examples below).
+ %% }
+
+ \item{pattern}{
+ For method \code{getGenomeFaFile}: the pattern to be used to
+ identify the fasta file representing genomic DNA sequence.
+ }
+
+ %% \item{transcripts}{
+ %% For \code{extractTranscriptSeqs}: the \code{EnsDb} object from which
+ %% the transcript definitions should be extracted.
+ %% }
+
+ \item{x}{
+ %% For \code{extractTranscriptSeqs}: An object representing a single
+ %% chromosome or a collection of chromosomes. Refer to the help of the
+ %% \code{\link[GenomicFeatures]{extractTranscriptSeqs}} method in
+ %% \code{GenomicFeatures} package for more information.
+ For all other methods an \code{EnsDb} instance.
+ }
+
+}
+\section{Methods and Functions}{
+ \describe{
+ %% \item{extractTranscriptSeqs}{
+ %% Extract transcript sequences. This method adapts the
+ %% \code{\link[GenomicFeatures]{extractTranscriptSeqs}} from the
+ %% \code{GenomicFeatures} package to allow the usage of filters to
+ %% specify the transcripts from which the sequence should be
+ %% extracted.
+ %% }
+
+ \item{getGenomeFaFile}{
+ Returns a \code{\link[Rsamtools]{FaFile-class}} (defined in
+ \code{Rsamtools}) with the genomic sequence of the genome build
+ matching the Ensembl version of the \code{EnsDb} object.
+ The file is retrieved using the \code{AnnotationHub} package,
+ thus, at least for the first invocation, an internet connection is
+ required to locate and download the file; subsequent calls will
+ load the cached file instead.
+ If no fasta file for the actual Ensembl version is available the
+ function tries to identify a file matching the species and genome
+ build version of the closest Ensembl release and returns that
+ instead.
+ See the vignette for an example to work with such files.
+ }
+
+ }
+}
+
+\value{
+ For \code{getGenomeFaFile}: a \code{\link[Rsamtools]{FaFile-class}}
+ object with the genomic DNA sequence.
+
+ %% For \code{extractTranscriptSeqs}: A \code{DNAStringSet} object
+ %% parallel to \code{transcripts} (i.e. the i-th element in it is the
+ %% sequence of the i-th transcript in \code{transcripts}).
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\linkS4class{BasicFilter}}
+ \code{\link{transcripts}}
+ \code{\link{exonsBy}}
+}
+\examples{
+
+## Loading an EnsDb for Ensembl version 75 (genome GRCh37):
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+\dontrun{
+ ## Retrieve a FaFile with the genomic DNA sequence matching the organism,
+ ## genome release version and, if possible, the Ensembl version of the
+ ## EnsDb object.
+ Dna <- getGenomeFaFile(edb)
+ ## Extract the transcript sequence for all transcripts encoded on chromosome
+ ## Y.
+ ##extractTranscriptSeqs(Dna, edb, filter=SeqnameFilter("Y"))
+
+}
+
+}
+\keyword{classes}
+
+
+
+
+
diff --git a/man/EnsDb-utils.Rd b/man/EnsDb-utils.Rd
new file mode 100644
index 0000000..b57d042
--- /dev/null
+++ b/man/EnsDb-utils.Rd
@@ -0,0 +1,118 @@
+\name{getGeneRegionTrackForGviz}
+\Rdversion{1.1}
+\alias{getGeneRegionTrackForGviz}
+\alias{getGeneRegionTrackForGviz,EnsDb-method}
+
+\title{Utility functions}
+\description{
+ Utility functions integrating \code{EnsDb} objects with other
+ Bioconductor packages.
+}
+\usage{
+
+\S4method{getGeneRegionTrackForGviz}{EnsDb}(x, filter=list(),
+ chromosome=NULL,
+ start=NULL, end=NULL,
+ featureIs="gene_biotype")
+}
+\arguments{
+
+ (In alphabetic order)
+
+ \item{chromosome}{
+ For \code{getGeneRegionTrackForGviz}: optional chromosome name to
+ restrict the returned entry to a specific chromosome.
+ }
+
+ \item{end}{
+ For \code{getGeneRegionTrackForGviz}: optional chromosomal end
+ coordinate specifying, together with \code{start}, the chromosomal
+ region from which features should be retrieved.
+ }
+
+ \item{featureIs}{
+ For \code{getGeneRegionTrackForGviz}: whether the gene
+ (\code{"gene_biotype"}) or the transcript biotype
+ (\code{"tx_biotype"}) should be returned in column \code{"feature"}.
+ }
+
+ \item{filter}{
+ A filter object extending \code{\linkS4class{BasicFilter}} or a list
+ of such object(s) to select specific entries from the database (see
+ examples below).
+ }
+
+ \item{start}{
+ For \code{getGeneRegionTrackForGviz}: optional chromosomal start
+ coordinate specifying, together with \code{end}, the chromosomal
+ region from which features should be retrieved.
+ }
+
+ \item{x}{
+ For \code{toSAF} a \code{GRangesList} object. For all other
+ methods an \code{EnsDb} instance.
+ }
+
+}
+\section{Methods and Functions}{
+ \describe{
+
+ \item{getGeneRegionTrackForGviz}{
+ Retrieve a \code{GRanges} object with transcript features from the
+ \code{EnsDb} that can be used directly in the \code{Gviz} package
+ to create a \code{GeneRegionTrack}. Using the \code{filter},
+ \code{chromosome}, \code{start} and \code{end} arguments it is
+ possible to fetch specific features (e.g. lincRNAs) from the
+ database.
+
+ If \code{chromosome}, \code{start} and \code{end} are provided the
+ function internally first retrieves all transcripts that have an
+ exon or an intron in the specified chromosomal region and
+ subsequently fetches all of these transcripts. This ensures that all
+ transcripts of the region are returned, even those that have
+ \emph{only} an intron in the region.
+
+ The function returns a \code{GRanges} object with additional
+ annotation columns \code{"feature"}, \code{"gene"}, \code{"exon"},
+ \code{"exon_rank"}, \code{"transcript"}, \code{"symbol"} specifying
+ the feature type (either gene or transcript biotype), the
+ (Ensembl) gene ID, the exon ID, the rank/index of the exon in the
+ transcript, the transcript ID and the gene symbol/name.
+ }
+ }
+}
+
+\value{
+ For \code{getGeneRegionTrackForGviz}: see method description above.
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\linkS4class{BasicFilter}}
+ \code{\link{transcripts}}
+}
+\examples{
+
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+###### getGeneRegionTrackForGviz
+##
+## Get all genes encoded on chromosome Y in the specified region.
+AllY <- getGeneRegionTrackForGviz(edb, chromosome="Y", start=5000000,
+ end=7000000)
+## We could plot this now using plotTracks(GeneRegionTrack(AllY))
+
+## We can also use filters to further restrict the query to e.g.
+## all lincRNA genes encoded in that region.
+lincsY <- getGeneRegionTrackForGviz(edb, chromosome="Y", start=5000000,
+ end=7000000,
+ filter=GenebiotypeFilter("lincRNA"))
+
+}
+\keyword{classes}
+
+
+
+
+
diff --git a/man/EnsDb.Rd b/man/EnsDb.Rd
new file mode 100644
index 0000000..6f777ec
--- /dev/null
+++ b/man/EnsDb.Rd
@@ -0,0 +1,50 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/dbhelpers.R
+\name{EnsDb}
+\alias{EnsDb}
+\title{Connect to an EnsDb object}
+\usage{
+EnsDb(x)
+}
+\arguments{
+\item{x}{Either a character specifying the \emph{SQLite} database file, or
+a \code{DBIConnection} to e.g. a MySQL database.}
+}
+\value{
+A \code{\linkS4class{EnsDb}} object.
+}
+\description{
+The \code{EnsDb} constructor function connects to the database
+specified with argument \code{x} and returns a corresponding
+\code{\linkS4class{EnsDb}} object.
+}
+\details{
+By providing the connection to a MySQL database, it is possible
+to use MySQL as the database backend and queries will be performed on that
+database. Note however that this requires the package \code{RMySQL} to be
+installed. In addition, the user needs to have access to a MySQL server
+already providing an EnsDb database, or must have write privileges on a
+MySQL server, in which case the \code{\link{useMySQL}} method can be used
+to insert the annotations from an EnsDb package into a MySQL database.
+}
+\examples{
+## "Standard" way to create an EnsDb object:
+library(EnsDb.Hsapiens.v75)
+EnsDb.Hsapiens.v75
+
+## Alternatively, provide the full file name of a SQLite database file
+dbfile <- system.file("extdata/EnsDb.Hsapiens.v75.sqlite", package = "EnsDb.Hsapiens.v75")
+edb <- EnsDb(dbfile)
+edb
+
+## Third way: connect to a MySQL database
+\dontrun{
+library(RMySQL)
+dbcon <- dbConnect(MySQL(), user = my_user, pass = my_pass, host = my_host, dbname = "ensdb_hsapiens_v75")
+edb <- EnsDb(dbcon)
+}
+}
+\author{
+Johannes Rainer
+}
+
diff --git a/man/GeneidFilter-class.Rd b/man/GeneidFilter-class.Rd
new file mode 100644
index 0000000..22f8892
--- /dev/null
+++ b/man/GeneidFilter-class.Rd
@@ -0,0 +1,451 @@
+\name{GeneidFilter-class}
+\Rdversion{1.1}
+\docType{class}
+\alias{BasicFilter-class}
+\alias{EntrezidFilter-class}
+\alias{GeneidFilter-class}
+\alias{GenebiotypeFilter-class}
+\alias{GenenameFilter-class}
+\alias{TxidFilter-class}
+\alias{TxbiotypeFilter-class}
+\alias{ExonidFilter-class}
+\alias{SeqnameFilter-class}
+\alias{SeqstrandFilter-class}
+\alias{SeqstartFilter-class}
+\alias{SeqendFilter-class}
+\alias{GRangesFilter-class}
+\alias{ExonrankFilter-class}
+\alias{column,EntrezidFilter,missing,missing-method}
+\alias{column,GeneidFilter,missing,missing-method}
+\alias{column,GenenameFilter,missing,missing-method}
+\alias{column,GenebiotypeFilter,missing,missing-method}
+\alias{column,TxidFilter,missing,missing-method}
+\alias{column,TxbiotypeFilter,missing,missing-method}
+\alias{column,ExonidFilter,missing,missing-method}
+\alias{column,ExonrankFilter,missing,missing-method}
+\alias{column,SeqnameFilter,missing,missing-method}
+\alias{column,SeqstrandFilter,missing,missing-method}
+\alias{column,SeqstartFilter,missing,missing-method}
+\alias{column,SeqendFilter,missing,missing-method}
+\alias{column,GRangesFilter,missing,missing-method}
+\alias{where,EntrezidFilter,missing,missing-method}
+\alias{where,GeneidFilter,missing,missing-method}
+\alias{where,GenenameFilter,missing,missing-method}
+\alias{where,GenebiotypeFilter,missing,missing-method}
+\alias{where,TxidFilter,missing,missing-method}
+\alias{where,TxbiotypeFilter,missing,missing-method}
+\alias{where,ExonidFilter,missing,missing-method}
+\alias{where,ExonrankFilter,missing,missing-method}
+\alias{where,SeqnameFilter,missing,missing-method}
+\alias{where,SeqstrandFilter,missing,missing-method}
+\alias{where,SeqstartFilter,missing,missing-method}
+\alias{where,SeqendFilter,missing,missing-method}
+\alias{where,GRangesFilter,missing,missing-method}
+% EnsDb, missing
+\alias{column,EntrezidFilter,EnsDb,missing-method}
+\alias{column,GeneidFilter,EnsDb,missing-method}
+\alias{column,GenenameFilter,EnsDb,missing-method}
+\alias{column,GenebiotypeFilter,EnsDb,missing-method}
+\alias{column,TxidFilter,EnsDb,missing-method}
+\alias{column,TxbiotypeFilter,EnsDb,missing-method}
+\alias{column,ExonidFilter,EnsDb,missing-method}
+\alias{column,ExonrankFilter,EnsDb,missing-method}
+\alias{column,SeqnameFilter,EnsDb,missing-method}
+\alias{column,SeqstrandFilter,EnsDb,missing-method}
+\alias{column,SeqstartFilter,EnsDb,missing-method}
+\alias{column,SeqendFilter,EnsDb,missing-method}
+\alias{column,GRangesFilter,EnsDb,missing-method}
+\alias{column,OnlyCodingTx,EnsDb,missing-method}
+\alias{where,EntrezidFilter,EnsDb,missing-method}
+\alias{where,GeneidFilter,EnsDb,missing-method}
+\alias{where,GenenameFilter,EnsDb,missing-method}
+\alias{where,GenebiotypeFilter,EnsDb,missing-method}
+\alias{where,TxidFilter,EnsDb,missing-method}
+\alias{where,TxbiotypeFilter,EnsDb,missing-method}
+\alias{where,ExonidFilter,EnsDb,missing-method}
+\alias{where,ExonrankFilter,EnsDb,missing-method}
+\alias{where,SeqnameFilter,EnsDb,missing-method}
+\alias{where,SeqstrandFilter,EnsDb,missing-method}
+\alias{where,SeqstartFilter,EnsDb,missing-method}
+\alias{where,SeqendFilter,EnsDb,missing-method}
+\alias{where,GRangesFilter,EnsDb,missing-method}
+\alias{where,OnlyCodingTx,EnsDb,missing-method}
+% EnsDb, character
+\alias{column,EntrezidFilter,EnsDb,character-method}
+\alias{column,GeneidFilter,EnsDb,character-method}
+\alias{column,GenenameFilter,EnsDb,character-method}
+\alias{column,GenebiotypeFilter,EnsDb,character-method}
+\alias{column,TxidFilter,EnsDb,character-method}
+\alias{column,TxbiotypeFilter,EnsDb,character-method}
+\alias{column,ExonidFilter,EnsDb,character-method}
+\alias{column,ExonrankFilter,EnsDb,character-method}
+\alias{column,SeqnameFilter,EnsDb,character-method}
+\alias{column,SeqstrandFilter,EnsDb,character-method}
+\alias{column,SeqstartFilter,EnsDb,character-method}
+\alias{column,SeqendFilter,EnsDb,character-method}
+\alias{column,GRangesFilter,EnsDb,character-method}
+\alias{column,OnlyCodingTx,EnsDb,character-method}
+\alias{where,EntrezidFilter,EnsDb,character-method}
+\alias{where,GeneidFilter,EnsDb,character-method}
+\alias{where,GenenameFilter,EnsDb,character-method}
+\alias{where,GenebiotypeFilter,EnsDb,character-method}
+\alias{where,TxidFilter,EnsDb,character-method}
+\alias{where,TxbiotypeFilter,EnsDb,character-method}
+\alias{where,ExonidFilter,EnsDb,character-method}
+\alias{where,ExonrankFilter,EnsDb,character-method}
+\alias{where,SeqnameFilter,EnsDb,character-method}
+\alias{where,SeqstrandFilter,EnsDb,character-method}
+\alias{where,SeqstartFilter,EnsDb,character-method}
+\alias{where,SeqendFilter,EnsDb,character-method}
+\alias{where,GRangesFilter,EnsDb,character-method}
+\alias{where,OnlyCodingTx,EnsDb,character-method}
+%
+\alias{condition,BasicFilter-method}
+\alias{condition<-,BasicFilter-method}
+\alias{condition<-}
+\alias{condition,GRangesFilter-method}
+\alias{condition<-,GRangesFilter-method}
+\alias{show,BasicFilter-method}
+\alias{show,GRangesFilter-method}
+\alias{print,BasicFilter-method}
+\alias{where,BasicFilter,missing,missing-method}
+\alias{where,BasicFilter,EnsDb,missing-method}
+\alias{where,BasicFilter,EnsDb,character-method}
+\alias{where,list,EnsDb,character-method}
+\alias{where,list,EnsDb,missing-method}
+\alias{where,list,missing,missing-method}
+\alias{value,BasicFilter,missing-method}
+\alias{value<-}
+\alias{value<-,BasicFilter-method}
+\alias{value<-,ExonrankFilter-method}
+\alias{value,BasicFilter,EnsDb-method}
+\alias{value,GRangesFilter,missing-method}
+\alias{value,GRangesFilter,EnsDb-method}
+\alias{value,SeqnameFilter,EnsDb-method}
+\alias{condition}
+\alias{value}
+\alias{column}
+\alias{where}
+% Additional GRangesFilter stuff
+\alias{end,GRangesFilter-method}
+\alias{seqlevels,GRangesFilter-method}
+\alias{seqnames,GRangesFilter-method}
+\alias{start,GRangesFilter-method}
+\alias{strand,GRangesFilter-method}
+% SymbolFilter
+\alias{SymbolFilter-class}
+\alias{column,SymbolFilter,missing,missing-method}
+\alias{column,SymbolFilter,EnsDb,missing-method}
+\alias{column,SymbolFilter,EnsDb,character-method}
+\alias{where,SymbolFilter,missing,missing-method}
+\alias{where,SymbolFilter,EnsDb,missing-method}
+\alias{where,SymbolFilter,EnsDb,character-method}
+
+
+\title{Filter results fetched from the Ensembl database}
+\description{
+ These classes specify which entries (i.e. genes, transcripts
+ or exons) should be retrieved from the database.
+}
+\section{Objects from the Class}{
+ While objects can be created by calls e.g. of the form
+ \code{new("GeneidFilter", ...)} users are strongly encouraged to use the
+ specific functions: \code{\link{GeneidFilter}}, \code{\link{EntrezidFilter}},
+ \code{\link{GenenameFilter}}, \code{\link{GenebiotypeFilter}},
+ \code{\link{GRangesFilter}}, \code{\link{SymbolFilter}},
+ \code{\link{TxidFilter}}, \code{\link{TxbiotypeFilter}},
+ \code{\link{ExonidFilter}}, \code{\link{ExonrankFilter}},
+ \code{\link{SeqnameFilter}}, \code{\link{SeqstrandFilter}},
+ \code{\link{SeqstartFilter}} and \code{\link{SeqendFilter}}.
+
+ See examples below for usage.
+}
+\section{Slots}{
+ \describe{
+ \item{\code{condition}:}{
+ Object of class \code{"character"}: can be
+ either \code{"="}, \code{"in"} or \code{"like"} to filter on character values
+ (e.g. gene id, gene biotype, seqname etc), or \code{"="}, \code{">"}
+ or \code{"<"} for numerical values (chromosome/seq
+ coordinates). Note that for \code{"like"} \code{value} should be a
+ SQL pattern (e.g. \code{"ENS\%"}).
+ }
+
+ \item{\code{value}:}{
+ Object of class \code{"character"}: the value
+ to be used for filtering.
+ }
+
+ }
+}
+\section{Extends}{
+ Class \code{\linkS4class{BasicFilter}}, directly.
+}
+\section{Methods for all \code{BasicFilter} objects}{
+ \describe{
+ Note: these methods are applicable to all classes extending the
+ \code{BasicFilter} class.
+
+ \item{column}{\code{signature(object = "GeneidFilter", db = "EnsDb",
+ with.tables = "character")}:
+ returns the column (attribute name) to be used for the
+ filtering. Submitting the \code{db} parameter ensures that
+ returned column is valid in the corresponding database schema. The
+ optional argument \code{with.tables} allows to specify which in
+ which database table the function should look for the
+ attribute/column name. By default the method will check all
+ database tables.
+ }
+
+ \item{column}{\code{signature(object = "GeneidFilter", db = "EnsDb",
+ with.tables = "missing")}:
+ returns the column (attribute name) to be used for the
+ filtering. Submitting the \code{db} parameter ensures that
+ returned column is valid in the corresponding database schema.
+ }
+
+ \item{column}{\code{signature(object = "GeneidFilter", db = "missing",
+ with.tables = "missing")}:
+ returns the column (table column name) to be used for the
+ filtering.
+ }
+
+ \item{condition}{\code{signature(x = "BasicFilter")}: returns
+ the value for the \code{condition} slot.
+ }
+
+ \item{condition<-}{
+ setter method for condition.
+ }
+
+ \item{value}{\code{signature(x = "BasicFilter", db = "EnsDb")}:
+ returns the value of the \code{value} slot of the filter object.
+ }
+
+ \item{value<-}{
+ setter method for value.
+ }
+
+ \item{where}{\code{signature(object = "GeneidFilter", db = "EnsDb",
+ with.tables = "character")}:
+ returns the where condition for the SQL call. Submitting also the
+ \code{db} parameter ensures that
+ the columns are valid in the corresponding database schema. The
+ optional argument \code{with.tables} specifies in
+ which database table the function should look for the
+ attribute/column name. By default the method will check all
+ database tables.
+ }
+
+ \item{where}{\code{signature(object = "GeneidFilter", db = "EnsDb",
+ with.tables = "missing")}:
+ returns the
+ where condition for the SQL call. Submitting also the \code{db}
+ parameter ensures that
+ the columns are valid in the corresponding database schema.
+ }
+
+ \item{where}{\code{signature(object = "GeneidFilter", db = "missing",
+ with.tables = "missing")}:
+ returns the where condition for the SQL call.
+ }
+ }
+}
+\section{Methods for \code{GRangesFilter} objects}{
+ \describe{
+ \item{start, end, strand}{
+ Get the start and end coordinate and the strand from the
+ \code{GRanges} within the filter.
+ }
+
+ \item{seqlevels, seqnames}{
+ Get the names of the sequences from the \code{GRanges} of the filter.
+ }
+ }
+}
+\details{
+ \describe{
+ \item{\code{ExonidFilter}}{
+ Allows to filter based on the (Ensembl) exon identifier.
+ }
+
+ \item{\code{ExonrankFilter}}{
+ Allows to filter based on the rank (index) of the exon within the
+ transcript model. Exons are always numbered 5' to 3' end of the
+ transcript, thus, also on the reverse strand, the exon 1 is the
+ most 5' exon of the transcript.
+ }
+
+ \item{\code{EntrezidFilter}}{
+ Filter results based on the NCBI Entrezgene identifiers of the
+ genes.
+ }
+
+ \item{\code{GenebiotypeFilter}}{
+ Filter results based on the gene biotype as defined in the Ensembl
+ database. Use the \code{\link{listGenebiotypes}} method to get a
+ complete list of all available gene biotypes.
+ }
+
+ \item{\code{GeneidFilter}}{
+ Filter results based on the Ensembl gene identifiers.
+ }
+
+ \item{\code{GenenameFilter}}{
+ Allows to filter on the gene names (symbols) of the genes.
+ }
+
+ \item{\code{SymbolFilter}}{
+ Filter on gene symbols. Note that since no such database column is
+ available in an \code{EnsDb} database the gene names are used to
+ filter. These do, however, all correspond to the official gene
+ symbols.
+ }
+
+ \item{\code{GRangesFilter}}{
+ Allows to fetch features within or overlapping specified genomic
+ region(s)/range(s). This filter takes a \code{GRanges} object as input
+ and, if \code{condition="within"} (the default) will restrict
+ results to features (genes, transcripts or exons) that are
+ completely within the region. Alternatively, by specifying
+ \code{condition="overlapping"} it will return all features
+ (i.e. genes for a call to \code{\link{genes}}, transcripts for a
+ call to \code{\link{transcripts}} and exons for a call to
+ \code{\link{exons}}) that are partially overlapping with the
+ region, i.e. whose start coordinate is smaller than the end
+ coordinate of the region and whose end coordinate is larger than
+ the start coordinate of the region. Thus, genes and transcripts
+ that have an intron overlapping the region will also be returned.
+
+ Calls to the methods \code{\link{exonsBy}}, \code{\link{cdsBy}}
+ and \code{\link{transcriptsBy}} use the start and end coordinates of the
+ feature type specified with argument \code{by}
+ (i.e. \code{"gene"}, \code{"transcript"} or \code{"exon"}) for the
+ filtering.
+
+ Note: if the specified \code{GRanges} object defines multiple
+ regions, all features within (or overlapping) any of these regions
+ are returned.
+
+ Chromosome names/seqnames can be provided in UCSC format
+ (e.g. \code{"chrX"}) or Ensembl format (e.g. \code{"X"}); see
+ \code{\link{seqlevelsStyle}} for more information.
+ }
+
+ \item{\code{SeqendFilter}}{
+ Filter based on the chromosomal end coordinate of the exons,
+ transcripts or genes.
+ }
+
+ \item{\code{SeqnameFilter}}{
+ Filter on the sequence name on which the features are encoded
+ (mostly the chromosome names). Supports UCSC chromosome names
+ (e.g. \code{"chrX"}) and Ensembl chromosome names
+ (e.g. \code{"X"}).
+ }
+
+ \item{\code{SeqstartFilter}}{
+ Filter based on the chromosomal start coordinates of the exons,
+ transcripts or genes.
+ }
+
+ \item{\code{SeqstrandFilter}}{
+ Filter based on the strand on which the features are encoded.
+ }
+
+ \item{\code{TxbiotypeFilter}}{
+ Filter on the transcript biotype defined in Ensembl. Use the
+ \code{\link{listTxbiotypes}} method to get a complete list of all
+ available transcript biotypes.
+ }
+
+ \item{\code{TxidFilter}}{
+ Filter on the Ensembl transcript identifiers.
+ }
+ }
+}
+\note{
+ The \code{column} and \code{where} methods should always be called
+ along with the \code{EnsDb} object, as this ensures that the
+ returned column names are valid for the database schema. The optional
+ argument \code{with.tables} should, on the other hand, be used only
+ rarely as it is intended mainly for internal use.
+
+ Note that the database column \code{"entrezid"} queried for
+ \code{EntrezidFilter} classes can contain multiple, \code{";"}
+ separated, Entrezgene IDs, thus, using this filter at present might
+ not return all entries from the database. Also, the database does not
+ provide a column with the official gene symbols and a
+ \code{SymbolFilter} queries the gene names instead.
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\link{genes}}, \code{\link{transcripts}}, \code{\link{exons}},
+ \code{\link{listGenebiotypes}}, \code{\link{listTxbiotypes}}
+}
+\examples{
+
+## create a filter that could be used to retrieve all information for
+## the respective gene.
+Gif <- GeneidFilter("ENSG00000012817")
+Gif
+## returns the where condition of the SQL query
+where(Gif)
+
+## create a filter for a chromosomal end position of a gene
+Sef <- SeqendFilter(10000, condition=">", "gene")
+Sef
+
+## for additional examples see the help page of "genes"
+
+
+## Example for GRangesFilter:
+## retrieve all genes overlapping the specified region
+grf <- GRangesFilter(GRanges("11", ranges=IRanges(114000000, 114000050),
+ strand="+"), condition="overlapping")
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+genes(edb, filter=grf)
+
+## Get also all transcripts overlapping that region
+transcripts(edb, filter=grf)
+
+## Retrieve all transcripts for the above gene
+gn <- genes(edb, filter=grf)
+txs <- transcripts(edb, filter=GenenameFilter(gn$gene_name))
+## Next we simply plot their start and end coordinates.
+plot(3, 3, pch=NA, xlim=c(start(gn), end(gn)), ylim=c(0, length(txs)), yaxt="n", ylab="")
+## Highlight the GRangesFilter region
+rect(xleft=start(grf), xright=end(grf), ybottom=0, ytop=length(txs), col="red", border="red")
+for(i in seq_along(txs)){
+ current <- txs[i]
+ rect(xleft=start(current), xright=end(current), ybottom=i-0.975, ytop=i-0.125, border="grey")
+ text(start(current), y=i-0.5,pos=4, cex=0.75, labels=current$tx_id)
+}
+## Thus, we can see that only 4 transcripts of that gene are indeed overlapping the region.
+
+
+## No exon is overlapping that region, thus we're not getting anything
+exons(edb, filter=grf)
+
+
+## Example for ExonrankFilter
+## Extract all exons 1 and (if present) 2 for all genes encoded on the
+## Y chromosome
+exons(edb, columns=c("tx_id", "exon_idx"),
+ filter=list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition="<")))
+
+
+## Get all transcripts for the gene SKA2
+transcripts(edb, filter=GenenameFilter("SKA2"))
+
+## Which is the same as using a SymbolFilter
+transcripts(edb, filter=SymbolFilter("SKA2"))
+
+
+}
+\keyword{classes}
+
diff --git a/man/SeqendFilter.Rd b/man/SeqendFilter.Rd
new file mode 100644
index 0000000..3f602d2
--- /dev/null
+++ b/man/SeqendFilter.Rd
@@ -0,0 +1,237 @@
+\name{SeqendFilter}
+\alias{EntrezidFilter}
+\alias{GeneidFilter}
+\alias{GenenameFilter}
+\alias{GenebiotypeFilter}
+\alias{TxidFilter}
+\alias{TxbiotypeFilter}
+\alias{ExonidFilter}
+\alias{ExonrankFilter}
+\alias{SeqnameFilter}
+\alias{SeqstrandFilter}
+\alias{SeqstartFilter}
+\alias{SeqendFilter}
+\alias{GRangesFilter}
+\alias{SymbolFilter}
+\title{
+ Constructor functions for filter objects
+}
+\description{
+ These functions create filter objects that can be used to
+ retrieve specific elements from the annotation database.
+}
+\usage{
+EntrezidFilter(value, condition = "=")
+
+GeneidFilter(value, condition = "=")
+
+GenenameFilter(value, condition = "=")
+
+GenebiotypeFilter(value, condition = "=")
+
+GRangesFilter(value, condition="within", feature="gene")
+
+TxidFilter(value, condition = "=")
+
+TxbiotypeFilter(value, condition = "=")
+
+ExonidFilter(value, condition = "=")
+
+ExonrankFilter(value, condition = "=")
+
+SeqnameFilter(value, condition = "=")
+
+SeqstrandFilter(value, condition = "=")
+
+SeqstartFilter(value, condition = "=", feature = "gene")
+
+SeqendFilter(value, condition = "=", feature = "gene")
+
+SymbolFilter(value, condition = "=")
+
+}
+%- maybe also 'usage' for other objects documented here.
+\arguments{
+ \item{value}{
+ The filter value, e.g., for \code{GeneidFilter} the id of the gene
+ for which the data should be retrieved. For character values (all
+ filters except \code{SeqstartFilter} and \code{SeqendFilter}) also a
+ character vector of values is allowed. Allowed values for
+ \code{SeqstrandFilter} are: \code{"+"}, \code{"-"}, \code{"1"} or
+ \code{"-1"}.
+
+ For \code{GRangesFilter} this has to be a \code{GRanges} object.
+ }
+ \item{condition}{
+ The condition to be used in the comparison. For character values
+ \code{"="}, \code{"in"} and \code{"like"} are allowed, for numeric values
+ (\code{SeqstartFilter} and \code{SeqendFilter}) \code{"="},
+ \code{">"}, \code{">="}, \code{"<"} and \code{"<="}. Note that for
+ \code{"like"} \code{value} should be a SQL pattern
+ (e.g. \code{"ENS\%"}).
+
+ For \code{GRangesFilter}, \code{"within"} and \code{"overlapping"}
+ are allowed. See below for details.
+ }
+ \item{feature}{
+ For \code{SeqstartFilter} and \code{SeqendFilter}: the chromosomal
+ position of which features should be used in the filter (either
+ \code{"gene"}, \code{"transcript"} or \code{"exon"}).
+
+ For \code{GRangesFilter}: the submitted value is overwritten
+ internally depending on the called method, i.e. calling \code{genes}
+ will set feature to \code{"gene"}, \code{transcripts} to \code{"tx"}
+ and \code{exons} to \code{"exon"}.
+
+ }
+}
+\details{
+ \describe{
+ \item{EntrezidFilter}{
+ Filter results based on the NCBI Entrezgene ID of the genes.
+ }
+ \item{GeneidFilter}{
+ Filter results based on Ensembl gene IDs.
+ }
+ \item{GenenameFilter}{
+ Filter results based on gene names (gene symbols).
+ }
+ \item{GenebiotypeFilter}{
+ Filter results based on the biotype of the genes. For a complete
+ list of available gene biotypes use the
+ \code{\link{listGenebiotypes}} method.
+ }
+ \item{GRangesFilter}{
+ Fetch features within or overlapping the specified genomic
+ region(s)/range(s). This filter takes a \code{GRanges} object as input
+ and, if \code{condition="within"} (the default) will restrict
+ results to features (genes, transcripts or exons) that are
+ completely within the region. Alternatively, by specifying
+ \code{condition="overlapping"} it will return all features that
+ are partially overlapping with the region, i.e. whose start
+ coordinate is smaller than the end coordinate of the region and
+ whose end coordinate is larger than the start coordinate of the
+ region. Thus, genes and transcripts that have an intron
+ overlapping the region will also be returned.
+
+ Note: if the specified \code{GRanges} object defines multiple
+ regions, all features within (or overlapping) any of these regions
+ are returned.
+
+ See \code{\linkS4class{GRangesFilter}} for more details.
+ }
+ \item{TxidFilter}{
+ Filter results based on the Ensembl transcript IDs.
+ }
+ \item{TxbiotypeFilter}{
+ Filter results based on the biotype of the transcripts. For a
+ complete list of available transcript biotypes use the
+ \code{\link{listTxbiotypes}} method.
+ }
+ \item{ExonidFilter}{
+ Filter based on the Ensembl exon ID.
+ }
+ \item{ExonrankFilter}{
+ Filter results based on exon ranks (indices) of exons within
+ transcripts.
+ }
+ \item{SeqnameFilter}{
+ Filter results based on the name of the sequence on which the
+ features are encoded.
+ }
+ \item{SeqstrandFilter}{
+ Filter results based on the strand on which the features are encoded.
+ }
+ \item{SeqstartFilter}{
+ Filter results based on the (chromosomal) start coordinate of the
+ features (exons, genes or transcripts).
+ }
+ \item{SeqendFilter}{
+ Filter results based on the (chromosomal) end coordinates.
+ }
+ \item{SymbolFilter}{
+ Filter results based on the gene names. The database does not
+ provide an explicit \emph{symbol} column, thus this filter uses the
+ gene name instead (which in many cases corresponds to the official
+ gene symbol).
+ }
+ }
+}
+\value{
+ Depending on the function called an instance of:
+ \code{\linkS4class{EntrezidFilter}},
+ \code{\linkS4class{GeneidFilter}},
+ \code{\linkS4class{GenenameFilter}},
+ \code{\linkS4class{GenebiotypeFilter}},
+ \code{\linkS4class{GRangesFilter}},
+ \code{\linkS4class{TxidFilter}},
+ \code{\linkS4class{TxbiotypeFilter}},
+ \code{\linkS4class{ExonidFilter}},
+ \code{\linkS4class{ExonrankFilter}},
+ \code{\linkS4class{SeqnameFilter}},
+ \code{\linkS4class{SeqstrandFilter}},
+ \code{\linkS4class{SeqstartFilter}},
+ \code{\linkS4class{SeqendFilter}},
+ \code{\linkS4class{SymbolFilter}}
+}
+\author{
+ Johannes Rainer
+}
+\seealso{
+ \code{\linkS4class{EntrezidFilter}},
+ \code{\linkS4class{GeneidFilter}},
+ \code{\linkS4class{GenenameFilter}},
+ \code{\linkS4class{GenebiotypeFilter}},
+ \code{\linkS4class{GRangesFilter}},
+ \code{\linkS4class{TxidFilter}},
+ \code{\linkS4class{TxbiotypeFilter}},
+ \code{\linkS4class{ExonidFilter}},
+ \code{\linkS4class{ExonrankFilter}},
+ \code{\linkS4class{SeqnameFilter}},
+ \code{\linkS4class{SeqstrandFilter}},
+ \code{\linkS4class{SeqstartFilter}},
+ \code{\linkS4class{SeqendFilter}},
+ \code{\linkS4class{SymbolFilter}}
+}
+\examples{
+
+## create a filter that could be used to retrieve all information for
+## the respective gene.
+Gif <- GeneidFilter("ENSG00000012817")
+Gif
+## returns the where condition of the SQL query
+where(Gif)
+
+## create a filter for a chromosomal end position of a gene
+Sef <- SeqendFilter(100000, condition="<", "gene")
+Sef
+
+## To find genes within a certain chromosomal position filters should be
+## combined:
+Ssf <- SeqstartFilter(10000, condition=">", "gene")
+Snf <- SeqnameFilter("2")
+## combine the filters
+Filter <- list(Ssf, Sef, Snf)
+
+Filter
+
+## generate the where SQL call for these filters:
+where(Filter)
+
+
+## Create a GRangesFilter
+GRangesFilter(GRanges("X", IRanges(123, 5454)))
+
+## Create a GRangesFilter with multiple ranges
+grf <- GRangesFilter(GRanges(c("X", "Y"),
+ IRanges(start=c(123, 900),
+ end=c(5454, 910))))
+## Evaluate the 'where' SQL condition that would be applied.
+where(grf)
+## Change the "condition" of the filter and evaluate the
+## 'where' condition again.
+condition(grf) <- "overlapping"
+where(grf)
+
+}
+\keyword{data}
diff --git a/man/listEnsDbs.Rd b/man/listEnsDbs.Rd
new file mode 100644
index 0000000..b0258ad
--- /dev/null
+++ b/man/listEnsDbs.Rd
@@ -0,0 +1,53 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/dbhelpers.R
+\name{listEnsDbs}
+\alias{listEnsDbs}
+\title{List EnsDb databases in a MySQL server}
+\usage{
+listEnsDbs(dbcon, host, port, user, pass)
+}
+\arguments{
+\item{dbcon}{A \code{DBIConnection} object providing access to a MySQL
+database. Either \code{dbcon} or all of the other arguments have to be
+specified.}
+
+\item{host}{Character specifying the host on which the MySQL server is
+running.}
+
+\item{port}{The port of the MySQL server (usually \code{3306}).}
+
+\item{user}{The username for the MySQL server.}
+
+\item{pass}{The password for the MySQL server.}
+}
+\value{
+A \code{data.frame} listing the database names, organism name
+and Ensembl version of the EnsDb databases found on the server.
+}
+\description{
+The \code{listEnsDbs} function lists EnsDb databases in a
+MySQL server.
+}
+\details{
+The use of this function requires that the \code{RMySQL} package
+is installed and that the user has either access to a MySQL server with
+already installed EnsDb databases, or write access to a MySQL server in
+which case EnsDb databases could be added with the \code{\link{useMySQL}}
+method. EnsDb databases follow the same naming conventions as the EnsDb
+packages, with the exception that the name is all lower case and that
+\code{"."} is replaced by \code{"_"}.
+}
+\examples{
+\dontrun{
+library(RMySQL)
+dbcon <- dbConnect(MySQL(), host = "localhost", user = my_user, pass = my_pass)
+listEnsDbs(dbcon)
+}
+}
+\author{
+Johannes Rainer
+}
+\seealso{
+\code{\link{useMySQL}}
+}
+
diff --git a/man/makeEnsemblDbPackage.Rd b/man/makeEnsemblDbPackage.Rd
new file mode 100644
index 0000000..523ea58
--- /dev/null
+++ b/man/makeEnsemblDbPackage.Rd
@@ -0,0 +1,311 @@
+\name{makeEnsembldbPackage}
+\alias{ensDbFromAH}
+\alias{ensDbFromGRanges}
+\alias{ensDbFromGtf}
+\alias{ensDbFromGff}
+\alias{makeEnsembldbPackage}
+\alias{fetchTablesFromEnsembl}
+\alias{makeEnsemblSQLiteFromTables}
+\title{
+ Generating an Ensembl annotation package from Ensembl
+}
+\description{
+ The functions described on this page build \code{EnsDb}
+ annotation objects/databases from Ensembl annotations. The most
+ complete set of annotations, which include also the NCBI Entrezgene
+ identifiers for each gene, can be retrieved by the functions using
+ the Ensembl Perl API (i.e. functions \code{fetchTablesFromEnsembl},
+ \code{makeEnsemblSQLiteFromTables}). Alternatively the functions
+ \code{ensDbFromAH}, \code{ensDbFromGRanges}, \code{ensDbFromGff} and
+ \code{ensDbFromGtf} can be used to build \code{EnsDb} objects using
+ GFF or GTF files from Ensembl, which can be either manually downloaded
+ from the Ensembl ftp server, or directly from within R using
+ \code{AnnotationHub}.
+ The generated SQLite database can be packaged into an R package using
+ \code{makeEnsembldbPackage}.
+}
+\usage{
+
+ensDbFromAH(ah, outfile, path, organism, genomeVersion, version)
+
+ensDbFromGRanges(x, outfile, path, organism, genomeVersion,
+ version)
+
+ensDbFromGff(gff, outfile, path, organism, genomeVersion,
+ version)
+
+ensDbFromGtf(gtf, outfile, path, organism, genomeVersion,
+ version)
+
+fetchTablesFromEnsembl(version, ensemblapi, user="anonymous",
+ host="ensembldb.ensembl.org", pass="",
+ port=5306, species="human")
+
+makeEnsemblSQLiteFromTables(path=".", dbname)
+
+makeEnsembldbPackage(ensdb, version, maintainer, author,
+ destDir=".", license="Artistic-2.0")
+
+}
+\arguments{
+ (in alphabetical order)
+
+ \item{ah}{
+ For \code{ensDbFromAH}: an \code{AnnotationHub} object representing
+ a single resource (i.e. GTF file from Ensembl) from
+ \code{AnnotationHub}.
+ }
+
+ \item{author}{
+ The author of the package.
+ }
+
+ \item{dbname}{
+ The name for the database (optional). By default a name based on the
+ species and Ensembl version will be automatically generated (and
+ returned by the function).
+ }
+
+ \item{destDir}{
+ Where the package should be saved to.
+ }
+
+ \item{ensdb}{
+ The file name of the SQLite database generated by \code{makeEnsemblSQLiteFromTables}.
+ }
+
+ \item{ ensemblapi }{
+ The path to the Ensembl Perl API installed locally on the
+ system. The Ensembl Perl API version has to match the Ensembl version.
+ }
+
+ \item{genomeVersion}{
+ For \code{ensDbFromAH}, \code{ensDbFromGtf} and \code{ensDbFromGff}:
+ the version of the genome (e.g. \code{"GRCh37"}). If not provided
+ the function will try to guess it from the file name (assuming file
+ name convention of Ensembl GTF files).
+ }
+
+ \item{gff}{
+ The GFF file to import.
+ }
+
+ \item{gtf}{
+ The GTF file name.
+ }
+
+ \item{host}{
+ The hostname to access the Ensembl database.
+ }
+
+ \item{license}{
+ The license of the package.
+ }
+
+ \item{maintainer}{
+ The maintainer of the package.
+ }
+
+ \item{organism}{
+ For \code{ensDbFromAH}, \code{ensDbFromGff} and \code{ensDbFromGtf}:
+ the organism name (e.g. \code{"Homo_sapiens"}). If not provided the
+ function will try to guess it from the file name (assuming file name
+ convention of Ensembl GTF files).
+ }
+
+ \item{outfile}{
+ The desired file name of the SQLite file. If not provided the name
+ of the GTF file will be used.
+ }
+
+ \item{pass}{
+ The password for the Ensembl database.
+ }
+
+ \item{path}{
+ The directory in which the tables retrieved by
+ \code{fetchTablesFromEnsembl} or the SQLite database file generated
+ by \code{ensDbFromGtf} are stored.
+ }
+
+ \item{port}{
+ The port to be used to connect to the Ensembl database.
+ }
+
+ \item{species}{
+ The species for which the annotations should be retrieved.
+ }
+
+ \item{user}{
+ The username for the Ensembl database.
+ }
+
+ \item{version}{
+ For \code{fetchTablesFromEnsembl}, \code{ensDbFromGRanges} and \code{ensDbFromGtf}: the
+ Ensembl version for which the annotation should be retrieved
+ (e.g. 75). The \code{ensDbFromGtf} function will try to guess the
+ Ensembl version from the GTF file name if not provided.
+
+ For \code{makeEnsembldbPackage}: the version for the package.
+ }
+
+ \item{x}{
+ For \code{ensDbFromGRanges}: the \code{GRanges} object.
+ }
+
+}
+\section{Functions}{
+ \describe{
+ \item{ensDbFromAH}{
+ Create an \code{EnsDb} (SQLite) database from a GTF file provided
+ by \code{AnnotationHub}. The function returns the file name of the
+ generated database file. For usage see the examples below.
+ }
+
+ \item{ensDbFromGff}{
+ Create an \code{EnsDb} (SQLite) database from a GFF file from
+ Ensembl. The function returns the file name of the
+ generated database file. For usage see the examples below.
+ }
+
+ \item{ensDbFromGtf}{
+ Create an \code{EnsDb} (SQLite) database from a GTF file from
+ Ensembl. The function returns the file name of the generated
+ database file. For usage see the examples below.
+ }
+
+ \item{ensDbFromGRanges}{
+ Create an \code{EnsDb} (SQLite) database from a GRanges object
+ (e.g. from \code{AnnotationHub}). The function returns the file
+ name of the generated database file. For usage see the examples
+ below.
+ }
+
+ \item{fetchTablesFromEnsembl}{
+ Uses the Ensembl Perl API to fetch all required data from an
+ Ensembl database server and stores them locally to text files
+ (that can be used as input for the
+ \code{makeEnsemblSQLiteFromTables} function).
+ }
+
+ \item{makeEnsemblSQLiteFromTables}{
+ Creates the SQLite \code{EnsDb} database from the tables generated
+ by \code{fetchTablesFromEnsembl}.
+ }
+
+ \item{makeEnsembldbPackage}{
+ Creates an R package containing the \code{EnsDb} database from an
+ \code{EnsDb} SQLite database created by any of the above
+ functions \code{ensDbFromAH}, \code{ensDbFromGff},
+ \code{ensDbFromGtf} or \code{makeEnsemblSQLiteFromTables}.
+ }
+ }
+}
+
+\details{
+ The \code{fetchTablesFromEnsembl} function internally calls the perl
+ script \code{get_gene_transcript_exon_tables.pl} to retrieve all
+ required information from the Ensembl database using the Ensembl perl
+ API.
+
+ Alternatively, an \code{EnsDb} database file can be generated with
+ \code{ensDbFromGtf} or \code{ensDbFromGff} from a GTF or GFF file
+ downloaded from the Ensembl ftp server, or with \code{ensDbFromAH},
+ which builds a database directly from the corresponding resources in
+ AnnotationHub. The returned database file name can then
+ be used as an input to \code{makeEnsembldbPackage} or it can be
+ loaded directly with the \code{EnsDb} constructor.
+}
+\note{
+ A local installation of the Ensembl Perl API is required for
+ \code{fetchTablesFromEnsembl}. See
+ \url{http://www.ensembl.org/info/docs/api/api_installation.html} for
+ installation instructions.
+
+ A database generated from GTF/GFF files lacks some features, as they
+ are not available in the GTF/GFF files from Ensembl, namely the NCBI
+ Entrezgene IDs.
+}
+\value{
+ \code{makeEnsemblSQLiteFromTables}, \code{ensDbFromAH},
+ \code{ensDbFromGRanges} and \code{ensDbFromGtf}: the name of the
+ SQLite file.
+}
+\seealso{
+ \code{\link{EnsDb}}, \code{\link{genes}}
+}
+\author{
+Johannes Rainer
+}
+\examples{
+
+\dontrun{
+
+ ## get all human gene/transcript/exon annotations from Ensembl (75)
+ ## the resulting tables will be stored by default to the current working
+ ## directory; if the correct Ensembl api (version 75) is defined in the
+ ## PERL5LIB environment variable, the ensemblapi parameter can also be omitted.
+ fetchTablesFromEnsembl(75,
+ ensemblapi="/home/bioinfo/ensembl/75/API/ensembl/modules",
+ species="human")
+
+ ## These tables can then be processed to generate a SQLite database
+ ## containing the annotations
+ DBFile <- makeEnsemblSQLiteFromTables()
+
+ ## and finally we can generate the package
+ makeEnsembldbPackage(ensdb=DBFile, version="0.0.1",
+ maintainer="Johannes Rainer <johannes.rainer at eurac.edu>",
+ author="J Rainer")
+
+ ## Build an annotation database from a GFF file from Ensembl.
+ ## ftp://ftp.ensembl.org/pub/release-83/gff3/rattus_norvegicus
+ gff <- "Rattus_norvegicus.Rnor_6.0.83.gff3.gz"
+ DB <- ensDbFromGff(gff=gff)
+ edb <- EnsDb(DB)
+ edb
+
+ ## Build an annotation file from a GTF file.
+ ## the GTF file can be downloaded from
+ ## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
+ gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
+ ## generate the SQLite database file
+ DB <- ensDbFromGtf(gtf=gtffile)
+
+ ## load the DB file directly
+ EDB <- EnsDb(DB)
+
+ ## Alternatively, we could fetch a GTF file directly from AnnotationHub
+ ## and build the database from that:
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+ ## Query for all GTF files from Ensembl for Ensembl version 81
+ query(ah, c("Ensembl", "release-81", "GTF"))
+ ## We could get the one from e.g. Bos taurus:
+ DB <- ensDbFromAH(ah["AH47941"])
+ edb <- EnsDb(DB)
+ edb
+}
+
+## Generate a sqlite database for genes encoded on chromosome Y
+chrY <- system.file("chrY", package="ensembldb")
+DBFile <- makeEnsemblSQLiteFromTables(path=chrY ,dbname=tempfile())
+## load this database:
+edb <- EnsDb(DBFile)
+
+edb
+
+## Generate a sqlite database from a GRanges object specifying
+## genes encoded on chromosome Y
+load(system.file("YGRanges.RData", package="ensembldb"))
+
+Y
+
+DB <- ensDbFromGRanges(Y, path=tempdir(), version=75,
+ organism="Homo_sapiens")
+edb <- EnsDb(DB)
+
+
+}
+\keyword{ data }
+
diff --git a/man/runEnsDbApp.Rd b/man/runEnsDbApp.Rd
new file mode 100644
index 0000000..a46a290
--- /dev/null
+++ b/man/runEnsDbApp.Rd
@@ -0,0 +1,41 @@
+\name{runEnsDbApp}
+\alias{runEnsDbApp}
+\title{
+ Search annotations interactively
+}
+\description{
+ This function starts the interactive \code{EnsDb} shiny web application
+ for looking up gene/transcript/exon annotations from an \code{EnsDb}
+ annotation package installed locally.
+}
+\usage{
+
+ runEnsDbApp(...)
+
+}
+\arguments{
+
+ \item{...}{
+ Additional arguments passed to the \code{\link[shiny]{runApp}} function
+ from the \code{shiny} package.
+ }
+
+}
+\details{
+ The \code{shiny} based web application can look up any annotation
+ available in any of the locally installed \code{EnsDb} annotation packages.
+}
+\value{
+ If the button \emph{Return & close} is clicked, the function returns
+ the results of the present query either as \code{data.frame} or as
+ \code{GRanges} object.
+}
+\seealso{
+ \code{\link{EnsDb}}, \code{\link{genes}}
+}
+\author{
+Johannes Rainer
+}
+\keyword{data}
+\keyword{shiny}
+
diff --git a/man/useMySQL-EnsDb-method.Rd b/man/useMySQL-EnsDb-method.Rd
new file mode 100644
index 0000000..774bb1f
--- /dev/null
+++ b/man/useMySQL-EnsDb-method.Rd
@@ -0,0 +1,56 @@
+% Generated by roxygen2: do not edit by hand
+% Please edit documentation in R/Methods.R
+\docType{methods}
+\name{useMySQL,EnsDb-method}
+\alias{useMySQL}
+\alias{useMySQL,EnsDb-method}
+\title{Use a MySQL backend}
+\usage{
+\S4method{useMySQL}{EnsDb}(x, host = "localhost", port = 3306, user, pass)
+}
+\arguments{
+\item{x}{The \code{\linkS4class{EnsDb}} object.}
+
+\item{host}{Character vector specifying the host on which the MySQL
+server runs.}
+
+\item{port}{The port on which the MySQL server can be accessed.}
+
+\item{user}{The user name for the MySQL server.}
+
+\item{pass}{The password for the MySQL server.}
+}
+\value{
+A \code{\linkS4class{EnsDb}} object providing access to the
+data stored in the MySQL backend.
+}
+\description{
+Change the SQL backend from \emph{SQLite} to \emph{MySQL}.
+When first called on an \code{\linkS4class{EnsDb}} object, the function
+tries to create and save all of the data into a MySQL database. All
+subsequent calls will connect to the already existing MySQL database.
+}
+\details{
+This functionality requires that the \code{RMySQL} package is
+installed and that the user has (write) access to a running MySQL server.
+If the corresponding database already exists, users without write access
+can also use this functionality.
+}
+\note{
+At present the function does not evaluate whether the versions
+between the SQLite and MySQL database differ.
+}
+\examples{
+## Load the EnsDb database (SQLite backend).
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+## Now change the backend to MySQL; my_user and my_pass should
+## be the user name and password to access the MySQL server.
+\dontrun{
+edb_mysql <- useMySQL(edb, host = "localhost", user = my_user, pass = my_pass)
+}
+}
+\author{
+Johannes Rainer
+}
+
diff --git a/tests/runTests.R b/tests/runTests.R
new file mode 100644
index 0000000..785dbbe
--- /dev/null
+++ b/tests/runTests.R
@@ -0,0 +1 @@
+BiocGenerics:::testPackage("ensembldb")
diff --git a/vignettes/MySQL-backend.Rmd b/vignettes/MySQL-backend.Rmd
new file mode 100644
index 0000000..0acd514
--- /dev/null
+++ b/vignettes/MySQL-backend.Rmd
@@ -0,0 +1,74 @@
+---
+title: "Using a MySQL server backend"
+graphics: yes
+output:
+ BiocStyle::html_document2
+vignette: >
+ %\VignetteIndexEntry{Using a MySQL server backend}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+ %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
+ %\VignettePackage{ensembldb}
+ %\VignetteKeywords{annotation,database}
+---
+
+**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
+**Authors**: `r packageDescription("ensembldb")$Author`<br />
+**Modified**: 20 September, 2016<br />
+**Compiled**: `r date()`
+
+# Introduction
+
+By default, like other annotation packages in Bioconductor, `ensembldb` uses
+a SQLite database backend, i.e. annotations are retrieved from file-based SQLite
+databases that are provided *via* packages, such as the `EnsDb.Hsapiens.v75`
+package. In addition, `ensembldb` allows switching the backend from SQLite to
+MySQL and thus retrieving annotations from a MySQL server instead. Such a setup
+might be useful for a lab running a well-configured MySQL server that would
+require installation of EnsDb databases only on the database server and not on
+the individual clients.
+
+**Note** the code in this document is not executed during vignette generation as
+this would require access to a MySQL server.
+
+# Using `ensembldb` with a MySQL server
+
+Installation of `EnsDb` databases in a MySQL server is straightforward - given
+that the user has write access to the server:
+
+```{r eval=FALSE}
+library(ensembldb)
+## Load the EnsDb package that should be installed on the MySQL server
+library(EnsDb.Hsapiens.v75)
+
+## Call the useMySQL method providing the required credentials to create
+## databases and insert data on the MySQL server
+edb_mysql <- useMySQL(EnsDb.Hsapiens.v75, host = "localhost", user = "userwrite",
+ pass = "userpass")
+
+## Use this EnsDb object
+genes(edb_mysql)
+```
+
+To use an `EnsDb` in a MySQL server without the need to install the corresponding
+R-package, the connection to the database can be passed to the `EnsDb` constructor
+function. With the resulting `EnsDb` object annotations can be retrieved from the
+MySQL database.
+
+```{r eval=FALSE}
+library(ensembldb)
+library(RMySQL)
+
+## Connect to the MySQL database to list the databases.
+dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly")
+
+## List the available databases
+listEnsDbs(dbcon)
+
+## Connect to one of the databases and use that one.
+dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly", dbname = "ensdb_hsapiens_v75")
+edb <- EnsDb(dbcon)
+edb
+```
diff --git a/vignettes/MySQL-backend.org b/vignettes/MySQL-backend.org
new file mode 100644
index 0000000..50dd77d
--- /dev/null
+++ b/vignettes/MySQL-backend.org
@@ -0,0 +1,88 @@
+#+TITLE: Using a MySQL server backend
+#+AUTHOR: Johannes Rainer
+#+EMAIL: johannes.rainer at eurac.edu
+#+OPTIONS: ^:{} toc:nil
+#+PROPERTY: exports code
+#+PROPERTY: session *R*
+
+#+BEGIN_html
+---
+title: "Using a MySQL server backend"
+graphics: yes
+output:
+ BiocStyle::html_document2
+vignette: >
+ %\VignetteIndexEntry{Using a MySQL server backend}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+ %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BiocStyle}
+ %\VignettePackage{ensembldb}
+ %\VignetteKeywords{annotation,database}
+---
+#+END_html
+
+# #+BEGIN_EXPORT html
+
+#+BEGIN_html
+**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
+**Authors**: `r packageDescription("ensembldb")$Author`<br />
+**Modified**: 20 September, 2016<br />
+**Compiled**: `r date()`
+#+END_html
+
+** Introduction
+
+By default, like other annotation packages in Bioconductor, =ensembldb= uses
+a SQLite database backend, i.e. annotations are retrieved from file-based SQLite
+databases that are provided /via/ packages, such as the =EnsDb.Hsapiens.v75=
+package. In addition, =ensembldb= allows switching the backend from SQLite to
+MySQL and thus retrieving annotations from a MySQL server instead. Such a setup
+might be useful for a lab running a well-configured MySQL server that would
+require installation of EnsDb databases only on the database server and not on
+the individual clients.
+
+*Note* the code in this document is not executed during vignette generation as
+this would require access to a MySQL server.
+
+** Using =ensembldb= with a MySQL server
+
+Installation of =EnsDb= databases in a MySQL server is straightforward - given
+that the user has write access to the server:
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(ensembldb)
+ ## Load the EnsDb package that should be installed on the MySQL server
+ library(EnsDb.Hsapiens.v75)
+
+ ## Call the useMySQL method providing the required credentials to create
+ ## databases and insert data on the MySQL server
+ edb_mysql <- useMySQL(EnsDb.Hsapiens.v75, host = "localhost", user = "userwrite",
+ pass = "userpass")
+
+ ## Use this EnsDb object
+ genes(edb_mysql)
+#+END_SRC
+
+To use an =EnsDb= database on a MySQL server without installing the corresponding
+R package, the connection to the database can be passed to the =EnsDb= constructor
+function. Annotations can then be retrieved from the MySQL database through the
+resulting =EnsDb= object.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(ensembldb)
+ library(RMySQL)
+
+ ## Connect to the MySQL database to list the databases.
+ dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly")
+
+ ## List the available databases
+ listEnsDbs(dbcon)
+
+ ## Connect to one of the databases and use that one.
+ dbcon <- dbConnect(MySQL(), host = "localhost", user = "readonly",
+ pass = "readonly", dbname = "ensdb_hsapiens_v75")
+ edb <- EnsDb(dbcon)
+ edb
+#+END_SRC
+
diff --git a/vignettes/ensembldb.Rmd b/vignettes/ensembldb.Rmd
new file mode 100644
index 0000000..44420d6
--- /dev/null
+++ b/vignettes/ensembldb.Rmd
@@ -0,0 +1,920 @@
+---
+title: "Generating and using Ensembl based annotation packages"
+graphics: yes
+output:
+ BiocStyle::html_document2
+vignette: >
+ %\VignetteIndexEntry{Generating and using Ensembl based annotation packages}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+ %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,Gviz,BiocStyle}
+ %\VignettePackage{ensembldb}
+ %\VignetteKeywords{annotation,database}
+---
+
+**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
+**Authors**: `r packageDescription("ensembldb")$Author`<br />
+**Modified**: 12 September, 2016<br />
+**Compiled**: `r date()`
+
+# Introduction
+
+The `ensembldb` package provides functions to create and use transcript centric
+annotation databases/packages. The annotations for the databases are directly
+fetched from Ensembl <sup><a id="fnr.1" class="footref" href="#fn.1">1</a></sup> using their Perl API. The functionality and data are
+similar to those of the `TxDb` packages from the `GenomicFeatures` package, but,
+in addition to retrieving all gene/transcript models and annotations from the
+database, the `ensembldb` package also provides a filter framework that allows
+retrieving annotations for specific entries like genes encoded on a chromosome
+region or transcript models of lincRNA genes. In the databases, along with the
+gene and transcript models and their chromosomal coordinates, additional
+annotations including the gene name (symbol) and NCBI Entrezgene identifiers as
+well as the gene and transcript biotypes are stored too (see Section
+[11](#orgtarget1) for the database layout and an overview of available
+attributes/columns).
+
+Another main goal of this package is to generate *versioned* annotation
+packages, i.e. annotation packages that are built for a specific Ensembl
+release, and are also named according to that (e.g. `EnsDb.Hsapiens.v75` for
+human gene definitions of the Ensembl core database version 75). This ensures
+reproducibility, as annotations from a specific Ensembl release can still be
+loaded when newer versions of annotation packages/releases are available. It
+also allows loading multiple annotation packages at the same time in order to
+e.g. compare gene models between Ensembl releases.
+
+In the example below we load an Ensembl based annotation package for Homo
+sapiens, Ensembl version 75. The connection to the database is bound to the
+variable `EnsDb.Hsapiens.v75`.
+
+```{r warning=FALSE, message=FALSE}
+library(EnsDb.Hsapiens.v75)
+
+## Making a "short cut"
+edb <- EnsDb.Hsapiens.v75
+## print some information for this package
+edb
+
+## for what organism was the database generated?
+organism(edb)
+```
+
+# Using `ensembldb` annotation packages to retrieve specific annotations
+
+The `ensembldb` package provides a set of filter objects to specify
+which entries should be fetched from the database. The complete list of filters,
+which can be used individually or can be combined, is shown below (in
+roughly alphabetical order):
+
+- `ExonidFilter`: filters the result based on the (Ensembl) exon
+ identifiers.
+- `ExonrankFilter`: filters results on the rank (index) of an exon within the
+ transcript model. Exons are always numbered from the 5' to the 3' end of the
+ transcript, thus, also on the reverse strand, exon 1 is the most 5' exon
+ of the transcript.
+- `EntrezidFilter`: filters results based on NCBI Entrezgene
+ identifiers of the genes.
+- `GenebiotypeFilter`: filters for the gene biotypes defined in the
+ Ensembl database; use the `listGenebiotypes` method to list all available
+ biotypes.
+- `GeneidFilter`: filters based on the Ensembl gene IDs.
+- `GenenameFilter`: filters based on the names (symbols) of the genes.
+- `SymbolFilter`: filters on gene symbols; note that no database column
+ *symbol* is available in an `EnsDb` database and hence the gene name is used for
+ filtering.
+- `GRangesFilter`: retrieves all features (genes, transcripts or exons)
+ that are either within (setting `condition` to "within") or partially
+ overlapping (setting `condition` to "overlapping") the defined genomic
+ region/range. Note that, depending on the called method (`genes`, `transcripts`
+ or `exons`) the start and end coordinates of either the genes, transcripts or
+ exons are used for the filter. For methods `exonsBy`, `cdsBy` and `transcriptsBy` the
+ coordinates of `by` are used.
+- `SeqendFilter`: filters based on the chromosomal end coordinate of the exons,
+ transcripts or genes (correspondingly set `feature = "exon"`, `feature = "tx"` or
+ `feature = "gene"`).
+- `SeqnameFilter`: filters by the name of the chromosomes the genes are encoded
+ on.
+- `SeqstartFilter`: filters based on the chromosomal start coordinates of the
+ exons, transcripts or genes (correspondingly set `feature = "exon"`,
+ `feature = "tx"` or `feature = "gene"`).
+- `SeqstrandFilter`: filters for the chromosome strand on which the genes are
+ encoded.
+- `TxbiotypeFilter`: filters on the transcript biotype defined in Ensembl; use
+ the `listTxbiotypes` method to list all available biotypes.
+- `TxidFilter`: filters on the Ensembl transcript identifiers.
+
+Each of the filter classes can take a single value or a vector of values (with
+the exception of the `SeqendFilter` and `SeqstartFilter`) for comparison. In
+addition, it is possible to specify the *condition* for the filter,
+e.g. setting `condition` to `"="` to retrieve all entries matching the filter value,
+to `"!="` to negate the filter or setting `condition = "like"` to allow
+partial matching. The `condition` parameter for `SeqstartFilter` and
+`SeqendFilter` can take the values `"="`, `">"`, `">="`, `"<"` and `"<="` (since these
+filters are based on numeric values).
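+
+The conditions described above can be sketched as follows. This chunk is only an
+illustration and is not evaluated; the gene names and the coordinate are
+arbitrary example values, not taken from the package documentation.
+
+```{r eval=FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Exact match against a vector of values (default condition "=").
+GenenameFilter(c("BCL2", "BCL2L11"))
+
+## Negation: everything that is not encoded on chromosome Y.
+SeqnameFilter("Y", condition = "!=")
+
+## Partial matching, with % as wildcard.
+GenenameFilter("BCL2%", condition = "like")
+
+## Numeric comparison on a single value (arbitrary position 1000000).
+SeqstartFilter(1000000, condition = ">", feature = "gene")
+
+## Several filters are combined by passing them as a list.
+genes(edb, filter = list(SeqnameFilter("Y"),
+                         SeqstartFilter(1000000, condition = ">",
+                                        feature = "gene")))
+```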
+
+A simple example would be to get all transcripts for the gene *BCL2L11*. To this
+end we specify a `GenenameFilter` with the value *BCL2L11*. As a result we get
+a `GRanges` object with `start`, `end`, `strand` and `seqname` of the `GRanges`
+object being the start coordinate, end coordinate, strand and chromosome name
+for the respective transcripts. All additional annotations are available as
+metadata columns. Alternatively, by setting `return.type` to "DataFrame" or
+"data.frame", the method would return a `DataFrame` or `data.frame` object.
+
+```{r }
+Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
+
+Tx
+
+## as this is a GRanges object we can access e.g. the start coordinates with
+head(start(Tx))
+
+## or extract the biotype with
+head(Tx$tx_biotype)
+```
+
+The parameter `columns` of the `exons`, `genes` and `transcripts` methods
+specifies which database attributes (columns) should be retrieved. The `exons`
+method returns by default all exon-related columns, the `transcripts` method all columns
+from the transcript database table and the `genes` method all from the gene table. Note
+however that in the example above we also got a column `gene_name` although this
+column is not present in the transcript database table. By default the methods
+also return all columns that are used by any of the filters submitted with the
+`filter` argument (thus, because a `GenenameFilter` was used, the column `gene_name`
+is also returned). Setting `returnFilterColumns(edb) <- FALSE` disables this
+option and only the columns specified by the `columns` parameter are retrieved.
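+
+A minimal sketch of the effect of `returnFilterColumns` (not evaluated here; the
+selected columns `tx_id` and `tx_biotype` are an arbitrary choice for
+illustration):
+
+```{r eval=FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Default setting: the column used by the filter (gene_name) is returned
+## in addition to the requested columns.
+transcripts(edb, columns = c("tx_id", "tx_biotype"),
+            filter = GenenameFilter("BCL2L11"), return.type = "data.frame")
+
+## Disable the option: only the requested columns should be returned.
+returnFilterColumns(edb) <- FALSE
+transcripts(edb, columns = c("tx_id", "tx_biotype"),
+            filter = GenenameFilter("BCL2L11"), return.type = "data.frame")
+
+## Restore the default behaviour.
+returnFilterColumns(edb) <- TRUE
+```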
+
+To get an overview of database tables and available columns the function
+`listTables` can be used. The method `listColumns` on the other hand lists columns
+for the specified database table.
+
+```{r }
+## list all database tables along with their columns
+listTables(edb)
+
+## list columns from a specific table
+listColumns(edb, "tx")
+```
+
+Thus, we could retrieve all transcripts of the biotype *nonsense\_mediated\_decay*
+(which, according to the definitions by Ensembl, are transcribed but most likely
+not translated into a protein, being degraded after transcription instead) along with
+the name of the gene for each transcript. Note that we are changing here the
+`return.type` to `DataFrame`, so the method will return a `DataFrame` with the
+results instead of the default `GRanges`.
+
+```{r }
+Tx <- transcripts(edb,
+ columns = c(listColumns(edb , "tx"), "gene_name"),
+ filter = TxbiotypeFilter("nonsense_mediated_decay"),
+ return.type = "DataFrame")
+nrow(Tx)
+Tx
+```
+
+For protein coding transcripts, we can also specifically extract their coding
+region. In the example below we extract the CDS for all transcripts encoded on
+chromosome Y.
+
+```{r }
+yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+yCds
+```
+
+Using a `GRangesFilter` we can retrieve all features from the database that are
+either within or overlapping the specified genomic region. In the example
+below we query all genes that are partially overlapping with a small region on
+chromosome 11. The filter restricts to all genes for which either an exon or an
+intron is partially overlapping with the region.
+
+```{r }
+## Define the filter
+grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+ strand = "+"), condition = "overlapping")
+
+## Query genes:
+gn <- genes(edb, filter = grf)
+gn
+
+## Next we retrieve all transcripts for that gene so that we can plot them.
+txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+```
+
+```{r tx-for-zbtb16, message=FALSE, fig.align='center', fig.width=7.5, fig.height=5}
+plot(3, 3, pch = NA, xlim = c(start(gn), end(gn)), ylim = c(0, length(txs)),
+ yaxt = "n", ylab = "")
+## Highlight the GRangesFilter region
+rect(xleft = start(grf), xright = end(grf), ybottom = 0, ytop = length(txs),
+ col = "red", border = "red")
+for (i in seq_along(txs)) {
+ current <- txs[i]
+ rect(xleft = start(current), xright = end(current), ybottom = i-0.975,
+ ytop = i-0.125, border = "grey")
+ text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
+}
+```
+
+As we can see, 4 transcripts of the gene ZBTB16 are also overlapping the
+region. Below we fetch these 4 transcripts. Note, that a call to `exons` will
+not return any features from the database, as no exon is overlapping with the
+region.
+
+```{r }
+transcripts(edb, filter = grf)
+```
+
+The `GRangesFilter` also supports `GRanges` defining multiple regions and a
+query will return all features overlapping any of these regions. Besides using
+the `GRangesFilter` it is also possible to search for transcripts or exons
+overlapping genomic regions using the `exonsByOverlaps` or
+`transcriptsByOverlaps` methods known from the `GenomicFeatures` package. Note that the
+implementation of these methods for `EnsDb` objects also supports filters
+to further fine-tune the query.
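+
+As a sketch of such a combined query (not evaluated here), the chunk below
+searches the small region on chromosome 11 used above and additionally
+restricts the result with a filter. The choice of a `TxbiotypeFilter` for
+protein coding transcripts is only an assumption made for illustration.
+
+```{r eval=FALSE}
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## The region to search; seqnames follow the Ensembl style ("11", not "chr11").
+region <- GRanges("11", ranges = IRanges(114000000, 114000050))
+
+## All transcripts overlapping the region, restricted to protein coding ones.
+transcriptsByOverlaps(edb, region,
+                      filter = TxbiotypeFilter("protein_coding"))
+```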
+
+To get an overview of allowed/available gene and transcript biotypes the
+functions `listGenebiotypes` and `listTxbiotypes` can be used.
+
+```{r }
+## Get all gene biotypes from the database. The GenebiotypeFilter
+## allows to filter on these values.
+listGenebiotypes(edb)
+
+## Get all transcript biotypes from the database.
+listTxbiotypes(edb)
+```
+
+Data can be fetched in an analogous way using the `exons` and `genes`
+methods. In the example below we retrieve `gene_name`, `entrezid` and the
+`gene_biotype` of all genes in the database whose names start with "BCL".
+
+```{r }
+## We're going to fetch all genes whose names start with BCL. To this end
+## we define a GenenameFilter with partial matching, i.e. condition "like"
+## and a % as wildcard for any character/string.
+BCLs <- genes(edb,
+ columns = c("gene_name", "entrezid", "gene_biotype"),
+ filter = list(GenenameFilter("BCL%", condition = "like")),
+ return.type = "DataFrame")
+nrow(BCLs)
+BCLs
+```
+
+Sometimes it might be useful to know the length of genes or transcripts
+(i.e. the total sum of nucleotides covered by their exons). Below we calculate
+the mean length of transcripts from protein coding genes on chromosomes X and Y
+as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on
+these chromosomes.
+
+```{r }
+## determine the average length of snRNA, snoRNA and rRNA transcripts encoded on
+## chromosomes X and Y.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+ SeqnameFilter(c("X", "Y")))))
+
+## determine the average length of transcripts of protein coding genes encoded
+## on the same chromosomes.
+mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter("protein_coding"),
+ SeqnameFilter(c("X", "Y")))))
+```
+
+Not unexpectedly, transcripts of protein coding genes are longer than those of
+snRNA, snoRNA or rRNA genes.
+
+Finally, we extract the first two exons of each transcript model from the
+database.
+
+```{r }
+## Extract all exons 1 and (if present) 2 for all genes encoded on the
+## Y chromosome
+exons(edb, columns = c("tx_id", "exon_idx"),
+ filter = list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition = "<")))
+```
+
+# Extracting gene/transcript/exon models for RNASeq feature counting
+
+For the feature counting step of an RNAseq experiment, the gene or transcript
+models (defined by the chromosomal start and end positions of their exons) have
+to be known. To extract these from an Ensembl based annotation package, the
+`exonsBy`, `genesBy` and `transcriptsBy` methods can be used in an analogous way as in
+`TxDb` packages generated by the `GenomicFeatures` package. However, the
+`transcriptsBy` method does not, in contrast to the method in the `GenomicFeatures`
+package, allow returning transcripts by "cds". While the annotation packages
+built by `ensembldb` contain the chromosomal start and end coordinates of
+the coding region (for protein coding genes) they do not assign an ID to each
+CDS.
+
+A simple use case is to retrieve the transcripts of all genes encoded on
+chromosomes X and Y from the database, grouped by gene.
+
+```{r }
+TxByGns <- transcriptsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c("X", "Y")))
+ )
+TxByGns
+```
+
+Since Ensembl contains also definitions of genes that are on chromosome variants
+(supercontigs), it is advisable to specify the chromosome names for which the
+gene models should be returned.
+
+In a real use case, we might thus want to retrieve all genes encoded on the
+*standard* chromosomes. In addition it is advisable to use a `GeneidFilter` to
+restrict to Ensembl genes only, as also *LRG* (Locus Reference Genomic)
+genes<sup><a id="fnr.2" class="footref" href="#fn.2">2</a></sup> are defined in the database, which are partially redundant with
+Ensembl genes.
+
+```{r eval=FALSE}
+## will just get exons for all genes on chromosomes 1 to 22, X and Y.
+## Note: want to get rid of the "LRG" genes!!!
+EnsGenes <- exonsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))
+```
+
+The code above returns a `GRangesList` that can be used directly as an input for
+the `summarizeOverlaps` function from the `GenomicAlignments` package <sup><a id="fnr.3" class="footref" href="#fn.3">3</a></sup>.
+
+Alternatively, the above `GRangesList` can be transformed to a `data.frame` in
+*SAF* format that can be used as an input to the `featureCounts` function of the
+`Rsubread` package <sup><a id="fnr.4" class="footref" href="#fn.4">4</a></sup>.
+
+```{r eval=FALSE}
+## Transforming the GRangesList into a data.frame in SAF format
+EnsGenes.SAF <- toSAF(EnsGenes)
+```
+
+Note that the ID by which the `GRangesList` is split is used in the SAF
+formatted `data.frame` as the `GeneID`. In the example above this would be the
+Ensembl gene IDs, while the start, end coordinates (along with the strand and
+chromosomes) are those of the exons.
+
+In addition, the `disjointExons` function (similar to the one defined in
+`GenomicFeatures`) can be used to generate a `GRanges` of non-overlapping exon
+parts which can be used in the `DEXSeq` package.
+
+```{r eval=FALSE}
+## Create a GRanges of non-overlapping exon parts.
+DJE <- disjointExons(edb,
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))
+```
+
+# Retrieving sequences for gene/transcript/exon models
+
+The methods to retrieve exons, transcripts and genes (i.e. `exons`, `transcripts`
+and `genes`) return by default `GRanges` objects that can be used to retrieve
+sequences using the `getSeq` method e.g. from BSgenome packages. The basic
+workflow is thus identical to the one for `TxDb` packages, however, it is not
+straightforward to identify the BSgenome package with the matching genomic
+sequence. Most BSgenome packages are named according to the genome build
+identifier used in UCSC which does not (always) match the genome build name used
+by Ensembl. Using the Ensembl version provided by the `EnsDb`, the correct genomic
+sequence can however be retrieved easily from the `AnnotationHub` using the
+`getGenomeFaFile` method. If no Fasta file matching the Ensembl version is available, the
+function tries to identify a Fasta file with the correct genome build from the
+*closest* Ensembl release and returns that instead.
+
+In the code block below we retrieve first the `FaFile` with the genomic DNA
+sequence, extract the genomic start and end coordinates for all genes defined in
+the package, subset to genes encoded on sequences available in the `FaFile` and
+extract all of their sequences. Note: these sequences represent the sequence
+between the chromosomal start and end coordinates of the gene.
+
+```{r eval=FALSE}
+library(EnsDb.Hsapiens.v75)
+library(Rsamtools)
+edb <- EnsDb.Hsapiens.v75
+
+## Get the FaFile with the genomic sequence matching the Ensembl version
+## using the AnnotationHub package.
+Dna <- getGenomeFaFile(edb)
+
+## Get start/end coordinates of all genes.
+genes <- genes(edb)
+## Subset to all genes that are encoded on chromosomes for which
+## we do have DNA sequence available.
+genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
+
+## Get the gene sequences, i.e. the sequence including the sequence of
+## all of the gene's exons and introns.
+geneSeqs <- getSeq(Dna, genes)
+```
+
+To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can
+directly use the `extractTranscriptSeqs` method defined in the `GenomicFeatures` package on
+the `EnsDb` object, optionally using a filter to restrict the query.
+
+```{r eval=FALSE}
+## get all exons of all transcripts encoded on chromosome Y
+yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+
+## Retrieve the sequences for these transcripts from the FaFile.
+library(GenomicFeatures)
+yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
+yTxSeqs
+
+## Extract the sequences of all transcripts encoded on chromosome Y.
+yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+
+## Along these lines, we could use the method also to retrieve the coding sequence
+## of all transcripts on the Y chromosome.
+cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+extractTranscriptSeqs(Dna, cdsY)
+```
+
+Note: in the next section we describe how transcript sequences can be retrieved
+from a `BSgenome` package that is based on UCSC, not Ensembl.
+
+# Integrating annotations from Ensembl based `EnsDb` packages with UCSC based annotations
+
+Sometimes it might be useful to combine (Ensembl based) annotations from `EnsDb`
+packages/objects with annotations from other Bioconductor packages, that might
+be based on UCSC annotations. To support such an integration of annotations, the
+`ensembldb` package implements the `seqlevelsStyle` and `seqlevelsStyle<-` methods from the
+`GenomeInfoDb` package that change the style of chromosome naming. Thus,
+sequence/chromosome names other than those used by Ensembl can be used in, and
+are returned by, the queries to `EnsDb` objects as long as a mapping for them is
+provided by the `GenomeInfoDb` package (which provides a mapping mostly between
+UCSC, NCBI and Ensembl chromosome names for the *main* chromosomes).
+
+In the example below we change the seqnames style to UCSC.
+
+```{r message=FALSE}
+## Change the seqlevels style from Ensembl (default) to UCSC:
+seqlevelsStyle(edb) <- "UCSC"
+
+## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
+genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+## The seqlevels of the returned GRanges are also in UCSC style
+seqlevels(genesY)
+```
+
+Note that in most instances no mapping is available for sequences not
+corresponding to the main chromosomes (i.e. contigs, patched chromosomes
+etc). What is returned in cases in which no mapping is available can be
+specified with the global `ensembldb.seqnameNotFound` option. By default (with
+`ensembldb.seqnameNotFound` set to "ORIGINAL"), the original seqnames (i.e. the
+ones from Ensembl) are returned. With `ensembldb.seqnameNotFound` set to "MISSING" an
+error is thrown each time a seqname cannot be found. For all other cases
+(e.g. `ensembldb.seqnameNotFound = NA`) the value of the option is returned.
+
+```{r }
+seqlevelsStyle(edb) <- "UCSC"
+
+## Getting the default option:
+getOption("ensembldb.seqnameNotFound")
+
+## Listing all seqlevels in the database.
+seqlevels(edb)[1:30]
+
+## Setting the option to NA, thus, for each seqname for which no mapping is available,
+## NA is returned.
+options(ensembldb.seqnameNotFound=NA)
+seqlevels(edb)[1:30]
+
+## Resetting the option.
+options(ensembldb.seqnameNotFound = "ORIGINAL")
+```
+
+Next we retrieve transcript sequences from genes encoded on chromosome Y using
+the `BSgenome` package for the human genome from UCSC. The specified version
+`hg19` matches the genome build of Ensembl version 75, i.e. `GRCh37`. Note that
+while we changed the style of the seqnames to UCSC we did not change the naming
+of the genome release.
+
+```{r warning=FALSE, message=FALSE}
+library(BSgenome.Hsapiens.UCSC.hg19)
+bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+## Get the genome version
+unique(genome(bsg))
+unique(genome(edb))
+## Although differently named, both represent genome build GRCh37.
+
+## Extract the full transcript sequences.
+yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+
+yTxSeqs
+
+## Extract just the CDS
+yCdsByTx <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
+yTxCds <- extractTranscriptSeqs(bsg, yCdsByTx)
+yTxCds
+```
+
+Finally, we change the seqname style back to the default value `"Ensembl"`.
+
+```{r }
+seqlevelsStyle(edb) <- "Ensembl"
+```
+
+# Interactive annotation lookup using the `shiny` web app
+
+In addition to the `genes`, `transcripts` and `exons` methods it is possible to
+search interactively for gene/transcript/exon annotations using the internal,
+`shiny` based, web application. The application can be started with the
+`runEnsDbApp()` function. The search results from this app can also be returned
+to the R workspace either as a `data.frame` or `GRanges` object.
+
+# Plotting gene/transcript features using `ensembldb` and `Gviz`
+
+The `Gviz` package provides functions to plot genes and transcripts along with
+other data on a genomic scale. Gene models can be provided either as a
+`data.frame`, `GRanges` or `TxDb` database, can be fetched from biomart and can
+also be retrieved from `ensembldb`.
+
+Below we generate a `GeneRegionTrack` fetching all transcripts from a certain
+region on chromosome Y.
+
+Note that if we want in addition to work also with BAM files that were aligned
+against DNA sequences retrieved from Ensembl or FASTA files representing genomic
+DNA sequences from Ensembl we should change the `ucscChromosomeNames` option from
+`Gviz` to `FALSE` (i.e. by calling `options(ucscChromosomeNames = FALSE)`). This is
+not necessary if we just want to retrieve gene models from an `EnsDb` object, as
+the `ensembldb` package internally checks the `ucscChromosomeNames` option and,
+depending on that, maps Ensembl chromosome names to UCSC chromosome names.
+
+```{r gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
+## Loading the Gviz library
+library(Gviz)
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## Retrieving a Gviz compatible GRanges object with all genes
+## encoded on chromosome Y.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "Y",
+ start = 20400000, end = 21400000)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+
+## We have to change the ucscChromosomeNames option to FALSE to enable Gviz usage
+## with non-UCSC chromosome names.
+options(ucscChromosomeNames = FALSE)
+
+plotTracks(list(gat, GeneRegionTrack(gr)))
+
+options(ucscChromosomeNames = TRUE)
+```
+
+Above we had to change the option `ucscChromosomeNames` to `FALSE` in order to
+use it with non-UCSC chromosome names. Alternatively, we could however also
+change the `seqlevelsStyle` of the `EnsDb` object to `UCSC`. Note that we now also
+have to use chromosome names in the *UCSC style* in the `chromosome` argument and
+in any `SeqnameFilter` (i.e. "chrY" instead of "Y").
+
+```{r message=FALSE}
+seqlevelsStyle(edb) <- "UCSC"
+## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
+gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000)
+seqnames(gr)
+## Define a genome axis track
+gat <- GenomeAxisTrack()
+plotTracks(list(gat, GeneRegionTrack(gr)))
+```
+
+We can also use the filters from the `ensembldb` package to further refine what
+transcripts are fetched, like in the example below, in which we create two
+different gene region tracks, one for protein coding genes and one for lincRNAs.
+
+```{r gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25}
+protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("protein_coding"))
+lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("lincRNA"))
+
+plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
+ GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
+
+## Finally, we change the seqlevels style back to Ensembl
+seqlevelsStyle(edb) <- "Ensembl"
+```
+
+# Using `EnsDb` objects in the `AnnotationDbi` framework
+
+Most of the methods defined for objects extending the basic annotation package
+class `AnnotationDb` are also defined for `EnsDb` objects (i.e. methods
+`columns`, `keytypes`, `keys`, `mapIds` and `select`). While these methods can
+be used analogously to basic annotation packages, the implementation for `EnsDb`
+objects also supports the filtering framework of the `ensembldb` package.
+
+In the example below we first list all the available columns and keytypes in
+the database and then extract the gene names for all genes encoded on chromosome
+Y.
+
+```{r }
+library(EnsDb.Hsapiens.v75)
+edb <- EnsDb.Hsapiens.v75
+
+## List all available columns in the database.
+columns(edb)
+
+## Note that these do *not* correspond to the actual column names
+## of the database that can be passed to methods like exons, genes,
+## transcripts etc. These column names can be listed with the listColumns
+## method.
+listColumns(edb)
+
+## List all of the supported key types.
+keytypes(edb)
+
+## Get all gene ids from the database.
+gids <- keys(edb, keytype = "GENEID")
+length(gids)
+
+## Get all gene names for genes encoded on chromosome Y.
+gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+head(gnames)
+```
+
+In the next example we retrieve specific information from the database using the
+`select` method. First we fetch all transcripts for the genes *BCL2* and
+*BCL2L11*. In the first call we provide the gene names, while in the second call
+we employ the filtering system to perform a more fine-grained query to fetch
+only the protein coding transcripts for these genes.
+
+```{r warning=FALSE}
+## Use the /standard/ way to fetch data.
+select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+
+## Use the filtering system of ensembldb
+select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+```
+
+Finally, we use the `mapIds` method to establish a mapping between ids and
+values. In the example below we fetch transcript ids for the two genes from the
+example above.
+
+```{r }
+## Use the default method, which just returns the first value for multi mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
+
+## Alternatively, specify multiVals="list" to return all mappings.
+mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
+ multiVals = "list")
+
+## And, just like before, we can use filters to map only to protein coding transcripts.
+mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column = "TXID",
+ multiVals = "list")
+```
+
+Note that, if the filters are used, the ordering of the result no longer
+matches the ordering of the genes.
+
+# Important notes
+
+These notes might explain possibly unexpected results (and, more importantly,
+help to avoid them):
+
+- The ordering of the results returned by the `genes`, `exons`, `transcripts` methods
+ can be specified with the `order.by` parameter. The ordering of the results does
+ however **not** correspond to the ordering of values in submitted filter
+ objects. The exception is the `select` method. If a character vector of values
+ or a single filter is passed with argument `keys` the ordering of results of
+ this method matches the ordering of the key values or the values of the
+ filter.
+
+- Results of `exonsBy`, `transcriptsBy` are always ordered by the `by` argument.
+
+- The CDS provided by `EnsDb` objects **always** includes both the start and the
+ stop codon.
+
+- Transcripts with multiple CDS are at present not supported by `EnsDb`.
+
+- At present, `EnsDb` supports only genes/transcripts for which all of their
+ exons are encoded on the same chromosome and the same strand.
+
+# Building a transcript-centric database package based on Ensembl annotation
+
+The code in this section is not supposed to be automatically executed when the
+vignette is built, as this would require a working installation of the Ensembl
+Perl API, which is not expected to be available on each system. Also, building
+`EnsDb` databases from alternative sources, like GFF or GTF files, takes some time and
+thus these examples are also not directly executed when the vignette is built.
+
+## Requirements
+
+The `fetchTablesFromEnsembl` function of the package uses the Ensembl Perl API
+to retrieve the required annotations from an Ensembl database (e.g. from the
+main site *ensembldb.ensembl.org*). Thus, to use the functionality to build
+databases, the Ensembl Perl API needs to be installed (see <sup><a id="fnr.5" class="footref" href="#fn.5">5</a></sup> for details).
+
+Alternatively, the `ensDbFromAH`, `ensDbFromGff`, `ensDbFromGRanges` and `ensDbFromGtf`
+functions allow building EnsDb SQLite files from a `GRanges` object or GFF/GTF
+files from Ensembl (either provided as files or *via* `AnnotationHub`). These
+functions do not depend on the Ensembl Perl API, but require a working internet
+connection to fetch the chromosome lengths from Ensembl as these are not
+provided within GTF or GFF files.
+
+## Building annotation packages
+
+The functions below use the Ensembl Perl API to fetch the required data directly
+from the Ensembl core databases. Thus, the path to the Perl API specific for the
+desired Ensembl version needs to be added to the `PERL5LIB` environment variable.
+
+An annotation package containing all human genes for Ensembl version 75 can be
+created using the code in the block below.
+
+```{r eval=FALSE}
+library(ensembldb)
+
+## get all human gene/transcript/exon annotations from Ensembl (75)
+## the resulting tables will be stored by default to the current working
+## directory
+fetchTablesFromEnsembl(75, species = "human")
+
+## These tables can then be processed to generate a SQLite database
+## containing the annotations (again, the function assumes the required
+## txt files to be present in the current working directory)
+DBFile <- makeEnsemblSQLiteFromTables()
+
+## and finally we can generate the package
+makeEnsembldbPackage(ensdb = DBFile, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")
+```
+
+The generated package can then be built using `R CMD build EnsDb.Hsapiens.v75`
+and installed with `R CMD INSTALL EnsDb.Hsapiens.v75*`. Note that we could
+also directly generate an `EnsDb` instance by loading the database file, i.e. by
+calling `edb <- EnsDb(DBFile)`, and work with that annotation object.
+
+To fetch and build annotation packages for plant genomes (e.g. *Arabidopsis
+thaliana*), *Ensembl Genomes* should be specified as the host, i.e. by setting
+`host` to "mysql-eg-publicsql.ebi.ac.uk", `port` to `4157` and `species` to
+e.g. "arabidopsis thaliana".
+
+In the next example we create an `EnsDb` database using the `AnnotationHub`
+package and also load the genomic DNA sequence matching the Ensembl version. We
+first query `AnnotationHub` for all resources available for `Mus musculus` and
+Ensembl release 77. Next we create the `EnsDb` object from the appropriate
+`AnnotationHub` resource. We then use the `getGenomeFaFile` method on the `EnsDb`
+to directly look up and retrieve the correct or best matching `FaFile` with the
+genomic DNA sequence. Finally, we retrieve the sequences of all exons on
+chromosome Y using the `getSeq` method.
+
+```{r eval=FALSE}
+## Load the AnnotationHub data.
+library(AnnotationHub)
+ah <- AnnotationHub()
+
+## Query all available files for Ensembl release 77 for
+## Mus musculus.
+query(ah, c("Mus musculus", "release-77"))
+
+## Get the resource for the gtf file with the gene/transcript definitions.
+Gtf <- ah["AH28822"]
+## Create an EnsDb database file from this.
+DbFile <- ensDbFromAH(Gtf)
+## We can either generate a database package, or directly load the data
+edb <- EnsDb(DbFile)
+
+
+## Identify and get the FaFile object with the genomic DNA sequence matching
+## the EnsDb annotation.
+Dna <- getGenomeFaFile(edb)
+library(Rsamtools)
+## We next retrieve the sequence of all exons on chromosome Y.
+exons <- exons(edb, filter = SeqnameFilter("Y"))
+exonSeq <- getSeq(Dna, exons)
+
+## Alternatively, look up and retrieve the toplevel DNA sequence manually.
+Dna <- ah[["AH22042"]]
+```
+
+In the example below we load a `GRanges` containing gene definitions for genes
+encoded on chromosome Y and generate an EnsDb SQLite database from that
+information.
+
+```{r message=FALSE}
+## Generate a sqlite database from a GRanges object specifying
+## genes encoded on chromosome Y
+load(system.file("YGRanges.RData", package = "ensembldb"))
+Y
+
+DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+ organism = "Homo_sapiens")
+
+edb <- EnsDb(DB)
+edb
+
+## As shown in the example below, we could make an EnsDb package on
+## this DB object using the makeEnsembldbPackage function.
+```
+
+Alternatively we can build the annotation database using the `ensDbFromGtf` or
+`ensDbFromGff` functions, which extract most of the required data from a GTF or
+GFF (version 3) file, respectively, that can be downloaded from Ensembl (e.g. from
+<ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens> for human gene definitions
+from Ensembl version 75; for plant genomes etc. files can be retrieved from
+<ftp://ftp.ensemblgenomes.org>). All information except the chromosome lengths and
+the NCBI Entrezgene IDs can be extracted from these files. The functions also
+try to retrieve chromosome length information automatically from Ensembl.
+
+Below we create the annotation from a GTF file that was downloaded from Ensembl.
+
+```{r eval=FALSE}
+library(ensembldb)
+
+## the GTF file can be downloaded from
+## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
+gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
+## generate the SQLite database file
+DB <- ensDbFromGtf(gtf = gtffile)
+
+## load the DB file directly
+EDB <- EnsDb(DB)
+
+## alternatively, build the annotation package from the database file
+makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")
+```
+
+# Database layout<a id="orgtarget1"></a>
+
+The database consists of the following tables and attributes (the layout is also
+shown in Figure [115](#orgparagraph1)):
+
+- **gene**: all gene specific annotations.
+ - `gene_id`: the Ensembl ID of the gene.
+ - `gene_name`: the name (symbol) of the gene.
+ - `entrezid`: the NCBI Entrezgene ID(s) of the gene. Note that this can be a
+ `;` separated list of IDs for genes that are mapped to more than one
+ Entrezgene.
+ - `gene_biotype`: the biotype of the gene.
+ - `gene_seq_start`: the start coordinate of the gene on the sequence (usually
+ a chromosome).
+ - `gene_seq_end`: the end coordinate of the gene on the sequence.
+ - `seq_name`: the name of the sequence (usually the chromosome name).
+ - `seq_strand`: the strand on which the gene is encoded.
+ - `seq_coord_system`: the coordinate system of the sequence.
+
+- **tx**: all transcript related annotations. Note that while no `tx_name` column
+ is available in this database table, all methods to retrieve data from the
+ database also support this column. The returned values are, however, the IDs
+ of the transcripts.
+ - `tx_id`: the Ensembl transcript ID.
+ - `tx_biotype`: the biotype of the transcript.
+ - `tx_seq_start`: the start coordinate of the transcript.
+ - `tx_seq_end`: the end coordinate of the transcript.
+ - `tx_cds_seq_start`: the start coordinate of the coding region of the
+ transcript (NULL for non-coding transcripts).
+ - `tx_cds_seq_end`: the end coordinate of the coding region of the transcript.
+ - `gene_id`: the gene to which the transcript belongs.
+
+- **exon**: all exon related annotation.
+ - `exon_id`: the Ensembl exon ID.
+ - `exon_seq_start`: the start coordinate of the exon.
+ - `exon_seq_end`: the end coordinate of the exon.
+
+- **tx2exon**: provides the n:m mapping between transcripts and exons.
+ - `tx_id`: the Ensembl transcript ID.
+ - `exon_id`: the Ensembl exon ID.
+ - `exon_idx`: the index of the exon in the corresponding transcript, always
+ from 5' to 3' of the transcript.
+
+- **chromosome**: provides some information about the chromosomes.
+ - `seq_name`: the name of the sequence/chromosome.
+ - `seq_length`: the length of the sequence.
+ - `is_circular`: whether the sequence is circular.
+
+- **information**: some additional, internal information (genome build, Ensembl
+ version etc.).
+ - `key`
+ - `value`
+
+- *virtual* columns:
+ - `symbol`: the database does not have such a database column, but it is still
+ possible to use it in the `columns` parameter. This column is *symlinked* to the
+ `gene_name` column.
+ - `tx_name`: similar to the `symbol` column, this column is *symlinked* to the `tx_id`
+ column.
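+
+Because `entrezid` can hold several IDs in a single `;` separated string, it
+may be useful to split such values before matching them against other
+resources. The sketch below uses base R only; the `entrez` vector, including
+its names and IDs, is an invented example and is not taken from an `EnsDb`
+database.
+
+```{r}
+## Made-up entrezid values as they could be stored in the gene table.
+entrez <- c(geneA = "101", geneB = "202;303", geneC = NA)
+## Split each value on ";" to get one character vector of IDs per gene.
+entrez_list <- strsplit(entrez, split = ";", fixed = TRUE)
+## Number of IDs per gene (a missing value stays a single NA).
+lengths(entrez_list)
+```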
+
+![img](images/dblayout.png "Database layout.")
+
+<div id="footnotes">
+<h2 class="footnotes">Footnotes: </h2>
+<div id="text-footnotes">
+
+<div class="footdef"><sup><a id="fn.1" class="footnum" href="#fnr.1">1</a></sup> <div class="footpara"><http://www.ensembl.org></div></div>
+
+<div class="footdef"><sup><a id="fn.2" class="footnum" href="#fnr.2">2</a></sup> <div class="footpara"><http://www.lrg-sequence.org></div></div>
+
+<div class="footdef"><sup><a id="fn.3" class="footnum" href="#fnr.3">3</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/23950696></div></div>
+
+<div class="footdef"><sup><a id="fn.4" class="footnum" href="#fnr.4">4</a></sup> <div class="footpara"><http://www.ncbi.nlm.nih.gov/pubmed/24227677></div></div>
+
+<div class="footdef"><sup><a id="fn.5" class="footnum" href="#fnr.5">5</a></sup> <div class="footpara"><http://www.ensembl.org/info/docs/api/api_installation.html></div></div>
+
+
+</div>
+</div>
diff --git a/vignettes/ensembldb.org b/vignettes/ensembldb.org
new file mode 100644
index 0000000..554983e
--- /dev/null
+++ b/vignettes/ensembldb.org
@@ -0,0 +1,1369 @@
+#+TITLE: Generating and using Ensembl based annotation packages
+#+AUTHOR: Johannes Rainer
+#+EMAIL: johannes.rainer at eurac.edu
+#+DESCRIPTION:
+#+KEYWORDS:
+#+LANGUAGE: en
+#+OPTIONS: ^:{} toc:nil
+#+PROPERTY: exports code
+#+PROPERTY: session *R*
+
+#+EXPORT_SELECT_TAGS: export
+#+EXPORT_EXCLUDE_TAGS: noexport
+
+#+latex: %\VignetteIndexEntry{Generating and using Ensembl based annotation packages}
+#+latex: %\VignetteKeywords{annotation, database}
+#+latex: %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,BSgenome.Hsapiens.UCSC.hg19}
+#+latex: %\VignettePackage{ensembldb}
+#+latex: %\VignetteEngine{knitr::rmarkdown}
+
+
+#+BEGIN_html
+---
+title: "Generating and using Ensembl based annotation packages"
+graphics: yes
+output:
+ BiocStyle::html_document2
+vignette: >
+ %\VignetteIndexEntry{Generating and using Ensembl based annotation packages}
+ %\VignetteEngine{knitr::rmarkdown}
+ %\VignetteEncoding{UTF-8}
+ %\VignetteDepends{ensembldb,EnsDb.Hsapiens.v75,Gviz,BiocStyle}
+ %\VignettePackage{ensembldb}
+ %\VignetteKeywords{annotation,database}
+---
+#+END_html
+
+# #+BEGIN_EXPORT html
+
+#+BEGIN_html
+**Package**: `r BiocStyle::Biocpkg("ensembldb")`<br />
+**Authors**: `r packageDescription("ensembldb")$Author`<br />
+**Modified**: 12 September, 2016<br />
+**Compiled**: `r date()`
+#+END_html
+
+
+
+* How to export this to a =Rnw= vignette :noexport:
+
+Use =ox-ravel= from the =orgmode-accessories= package to export this file to a =Rnw= file. After export edit the generated =Rnw= in the following way:
+
+1) Delete all =\usepackage= commands.
+2) Move the =<<style>>= code chunk before the =\begin{document}= and before =\author=.
+3) Move all =%\Vignette...= lines at the start of the file (even before =\documentclass=).
+4) Replace =\date= with =\date{Modified: 21 October, 2013. Compiled: \today}=
+5) Eventually search for all problems with =texttt=, i.e. search for pattern ="==.
+
+Note: use =:ravel= followed by the properties for the code chunk headers, e.g. =:ravel results='hide'=. Other options for knitr style options are:
++ =results=: ='hide'= (hides all output, not warnings or messages), ='asis'=, ='markup'= (the default).
++ =warning=: =TRUE= or =FALSE= whether warnings should be displayed.
++ =message=: =TRUE= or =FALSE=, same as above.
++ =include=: =TRUE= or =FALSE=, whether the output should be included into the final document (code is still evaluated).
+
+* How to export this to a =Rmd= vignette :noexport:
+
+Use =ox-ravel= to export this file as an R markdown file (=C-c C-e m
+r=). That way we don't need to edit the resulting =Rmd= file.
+
+* Introduction
+
+The =ensembldb= package provides functions to create and use transcript-centric
+annotation databases/packages. The annotations for the databases are fetched
+directly from Ensembl [fn:1] using their Perl API. The functionality and data
+are similar to those of the =TxDb= packages from the =GenomicFeatures= package,
+but, in addition to retrieving all gene/transcript models and annotations from
+the database, the =ensembldb= package also provides a filter framework that
+allows retrieving annotations for specific entries like genes encoded on a
+chromosome region or transcript models of lincRNA genes. Along with the gene and
+transcript models and their chromosomal coordinates, the databases also store
+additional annotations including the gene name (symbol), NCBI Entrezgene
+identifiers and the gene and transcript biotypes (see Section
+[[section.database.layout]] for the database layout and an overview of available
+attributes/columns).
+
+Another main goal of this package is to generate /versioned/ annotation
+packages, i.e. annotation packages that are built for a specific Ensembl
+release and are also named accordingly (e.g. =EnsDb.Hsapiens.v75= for
+human gene definitions of the Ensembl core database version 75). This ensures
+reproducibility, as annotations from a specific Ensembl release can be loaded
+even if newer versions of annotation packages/releases are available. It
+also allows loading multiple annotation packages at the same time, e.g. to
+compare gene models between Ensembl releases.
+
+In the example below we load an Ensembl based annotation package for Homo
+sapiens, Ensembl version 75. The connection to the database is bound to the
+variable =EnsDb.Hsapiens.v75=.
+
+#+BEGIN_SRC R :ravel warning=FALSE, message=FALSE
+ library(EnsDb.Hsapiens.v75)
+
+ ## Making a "short cut"
+ edb <- EnsDb.Hsapiens.v75
+ ## print some information for this package
+ edb
+
+ ## for what organism was the database generated?
+ organism(edb)
+#+END_SRC
+
+
+* Using =ensembldb= annotation packages to retrieve specific annotations
+
+The =ensembldb= package provides a set of filter objects allowing to specify
+which entries should be fetched from the database. The complete list of filters,
+which can be used individually or can be combined, is shown below (in
+alphabetical order):
+
++ =ExonidFilter=: allows to filter the result based on the (Ensembl) exon
+ identifiers.
++ =ExonrankFilter=: filter results on the rank (index) of an exon within the
+ transcript model. Exons are always numbered from 5' to 3' end of the
+ transcript, thus, also on the reverse strand, the exon 1 is the most 5' exon
+ of the transcript.
++ =EntrezidFilter=: allows to filter results based on NCBI Entrezgene
+ identifiers of the genes.
++ =GenebiotypeFilter=: allows to filter for the gene biotypes defined in the
+ Ensembl database; use the =listGenebiotypes= method to list all available
+ biotypes.
++ =GeneidFilter=: allows to filter based on the Ensembl gene IDs.
++ =GenenameFilter=: allows to filter based on the names (symbols) of the genes.
++ =SymbolFilter=: allows to filter on gene symbols; note that no database column
+ /symbol/ is available in an =EnsDb= database and hence the gene name is used for
+ filtering.
++ =GRangesFilter=: allows to retrieve all features (genes, transcripts or exons)
+ that are either within (setting =condition= to "within") or partially
+ overlapping (setting =condition= to "overlapping") the defined genomic
+ region/range. Note that, depending on the called method (=genes=, =transcripts=
+ or =exons=) the start and end coordinates of either the genes, transcripts or
+ exons are used for the filter. For methods =exonsBy=, =cdsBy= and =transcriptsBy= the
+ coordinates of =by= are used.
++ =SeqendFilter=: filter based on the chromosomal end coordinate of the exons,
+ transcripts or genes (correspondingly set =feature = "exon"=, =feature = "tx"= or
+ =feature = "gene"=).
++ =SeqnameFilter=: filter by the name of the chromosomes the genes are encoded
+ on.
++ =SeqstartFilter=: filter based on the chromosomal start coordinates of the
+ exons, transcripts or genes (correspondingly set =feature = "exon"=,
+ =feature = "tx"= or =feature = "gene"=).
++ =SeqstrandFilter=: filter for the chromosome strand on which the genes are
+ encoded.
++ =TxbiotypeFilter=: filter on the transcript biotype defined in Ensembl; use
+ the =listTxbiotypes= method to list all available biotypes.
++ =TxidFilter=: filter on the Ensembl transcript identifiers.
+
+Each of the filter classes can take a single value or a vector of values (with
+the exception of the =SeqendFilter= and =SeqstartFilter=) for comparison. In
+addition, it is possible to specify the /condition/ for the filter,
+e.g. setting =condition= to "=" to retrieve all entries matching the filter
+value, to "!=" to negate the filter, or to "like" to allow partial matching. The
+=condition= parameter for =SeqstartFilter= and =SeqendFilter= can take the
+values "=", ">", ">=", "<" and "<=" (since these filters are based on numeric
+values).
+
+# The =SeqnameFilter= and =GRangesFilter= support both UCSC and Ensembl chromosome
+# names (e.g. ="chrX"= for UCSC and ="X"= for Ensembl), internally, UCSC
+# chromosome names are mapped to Ensembl names. By default, all functions to
+# retrieve data from the database return Ensembl chromosome names, but by setting
+# the global option =ucscChromosomeNames= to =TRUE=
+# (i.e. =options(ucscChromosomeNames = TRUE)=) chromosome/seqnames are returned in
+# UCSC format.
+
+A simple example would be to get all transcripts for the gene /BCL2L11/. To this
+end we specify a =GenenameFilter= with the value /BCL2L11/. As a result we get
+a =GRanges= object with =start=, =end=, =strand= and =seqname= of the =GRanges=
+object being the start coordinate, end coordinate, strand and chromosome name
+for the respective transcripts. All additional annotations are available as
+metadata columns. Alternatively, by setting =return.type= to "DataFrame", or
+"data.frame" the method would return a =DataFrame= or =data.frame= object.
+
+#+BEGIN_SRC R
+ Tx <- transcripts(edb, filter = list(GenenameFilter("BCL2L11")))
+
+ Tx
+
+ ## as this is a GRanges object we can access e.g. the start coordinates with
+ head(start(Tx))
+
+ ## or extract the biotype with
+ head(Tx$tx_biotype)
+#+END_SRC
+
+The parameter =columns= of the =exons=, =genes= and =transcripts= methods
+specifies which database attributes (columns) should be retrieved. The =exons=
+method returns by default all exon-related columns, the =transcripts= all columns
+from the transcript database table and the =genes= all from the gene table. Note
+however that in the example above we got also a column =gene_name= although this
+column is not present in the transcript database table. By default the methods
+return also all columns that are used by any of the filters submitted with the
+=filter= argument (thus, because a =GenenameFilter= was used, the column =gene_name=
+is also returned). Setting =returnFilterColumns(edb) <- FALSE= disables this
+option and only the columns specified by the =columns= parameter are retrieved.
+
+To get an overview of database tables and available columns the function
+=listTables= can be used. The method =listColumns= on the other hand lists columns
+for the specified database table.
+
+#+BEGIN_SRC R
+ ## list all database tables along with their columns
+ listTables(edb)
+
+ ## list columns from a specific table
+ listColumns(edb, "tx")
+#+END_SRC
+
+Thus, we could retrieve all transcripts of the biotype /nonsense_mediated_decay/
+(which, according to the definitions by Ensembl, are transcribed but most likely
+not translated into a protein, being rather degraded after transcription) along with
+the name of the gene for each transcript. Note that we are changing here the
+=return.type= to =DataFrame=, so the method will return a =DataFrame= with the
+results instead of the default =GRanges=.
+
+#+BEGIN_SRC R
+ Tx <- transcripts(edb,
+ columns = c(listColumns(edb , "tx"), "gene_name"),
+ filter = TxbiotypeFilter("nonsense_mediated_decay"),
+ return.type = "DataFrame")
+ nrow(Tx)
+ Tx
+#+END_SRC
+
+For protein coding transcripts, we can also specifically extract their coding
+region. In the example below we extract the CDS for all transcripts encoded on
+chromosome Y.
+
+#+BEGIN_SRC R
+ yCds <- cdsBy(edb, filter = SeqnameFilter("Y"))
+ yCds
+#+END_SRC
+
+Using a =GRangesFilter= we can retrieve all features from the database that are
+either within or overlapping the specified genomic region. In the example
+below we query all genes that are partially overlapping with a small region on
+chromosome 11. The filter restricts to all genes for which either an exon or an
+intron is partially overlapping with the region.
+
+#+BEGIN_SRC R
+ ## Define the filter
+ grf <- GRangesFilter(GRanges("11", ranges = IRanges(114000000, 114000050),
+ strand = "+"), condition = "overlapping")
+
+ ## Query genes:
+ gn <- genes(edb, filter = grf)
+ gn
+
+ ## Next we retrieve all transcripts for that gene so that we can plot them.
+ txs <- transcripts(edb, filter = GenenameFilter(gn$gene_name))
+#+END_SRC
+
+#+BEGIN_SRC R :ravel tx-for-zbtb16, message=FALSE, fig.align='center', fig.width=7.5, fig.height=5
+ plot(3, 3, pch = NA, xlim = c(start(gn), end(gn)), ylim = c(0, length(txs)),
+ yaxt = "n", ylab = "")
+ ## Highlight the GRangesFilter region
+ rect(xleft = start(grf), xright = end(grf), ybottom = 0, ytop = length(txs),
+ col = "red", border = "red")
+ for(i in 1:length(txs)) {
+ current <- txs[i]
+ rect(xleft = start(current), xright = end(current), ybottom = i-0.975,
+ ytop = i-0.125, border = "grey")
+ text(start(current), y = i-0.5, pos = 4, cex = 0.75, labels = current$tx_id)
+ }
+
+#+END_SRC
+
+As we can see, 4 transcripts of the gene ZBTB16 are also overlapping the
+region. Below we fetch these 4 transcripts. Note, that a call to =exons= will
+not return any features from the database, as no exon is overlapping with the
+region.
+
+#+BEGIN_SRC R
+ transcripts(edb, filter = grf)
+#+END_SRC
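+
+Why a region can overlap a transcript without overlapping any of its exons can
+be shown with plain coordinates. The sketch below uses base R and made-up
+coordinates that are not taken from the database; =overlaps= is a hypothetical
+helper defined only for this illustration and is not part of =ensembldb=.
+
+#+BEGIN_SRC R
+ ## Hypothetical helper: TRUE if [start, end] overlaps [qstart, qend].
+ overlaps <- function(start, end, qstart, qend) {
+     start <= qend & end >= qstart
+ }
+ ## Made-up transcript with three exons; the query region lies in an intron.
+ exon_start <- c(100, 400, 900)
+ exon_end <- c(200, 500, 1000)
+ tx_start <- min(exon_start)
+ tx_end <- max(exon_end)
+ ## The transcript range overlaps the region ...
+ tx_hit <- overlaps(tx_start, tx_end, 250, 300)
+ ## ... but none of the exons does.
+ exon_hit <- any(overlaps(exon_start, exon_end, 250, 300))
+ c(transcript = tx_hit, exon = exon_hit)
+#+END_SRC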
+
+The =GRangesFilter= supports also =GRanges= defining multiple regions and a
+query will return all features overlapping any of these regions. Besides using
+the =GRangesFilter= it is also possible to search for transcripts or exons
+overlapping genomic regions using the =exonsByOverlaps= or
+=transcriptsByOverlaps= known from the =GenomicFeatures= package. Note that the
+implementation of these methods for =EnsDb= objects supports also to use filters
+to further fine-tune the query.
+
+To get an overview of allowed/available gene and transcript biotypes the
+functions =listGenebiotypes= and =listTxbiotypes= can be used.
+
+#+BEGIN_SRC R
+ ## Get all gene biotypes from the database. The GenebiotypeFilter
+ ## allows to filter on these values.
+ listGenebiotypes(edb)
+
+ ## Get all transcript biotypes from the database.
+ listTxbiotypes(edb)
+#+END_SRC
+
+Data can be fetched in an analogous way using the =exons= and =genes=
+methods. In the example below we retrieve =gene_name=, =entrezid= and the
+=gene_biotype= of all genes in the database whose names start with "BCL".
+
+#+BEGIN_SRC R
+ ## We're going to fetch all genes whose names start with BCL. To this end
+ ## we define a GenenameFilter with partial matching, i.e. condition "like"
+ ## and a % for any character/string.
+ BCLs <- genes(edb,
+ columns = c("gene_name", "entrezid", "gene_biotype"),
+ filter = list(GenenameFilter("BCL%", condition = "like")),
+ return.type = "DataFrame")
+ nrow(BCLs)
+ BCLs
+#+END_SRC
+
+Sometimes it might be useful to know the length of genes or transcripts
+(i.e. the total sum of nucleotides covered by their exons). Below we calculate
+the mean length of transcripts from protein coding genes on chromosomes X and Y
+as well as the average length of snoRNA, snRNA and rRNA transcripts encoded on
+these chromosomes.
+
+#+BEGIN_SRC R
+ ## determine the average length of transcripts of snRNA, snoRNA and rRNA genes
+ ## encoded on chromosomes X and Y.
+ mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter(c("snRNA", "snoRNA", "rRNA")),
+ SeqnameFilter(c("X", "Y")))))
+
+ ## determine the average length of transcripts of protein coding genes encoded
+ ## on the same chromosomes.
+ mean(lengthOf(edb, of = "tx",
+ filter = list(GenebiotypeFilter("protein_coding"),
+ SeqnameFilter(c("X", "Y")))))
+#+END_SRC
+
+Not unexpectedly, transcripts of protein coding genes are longer than those of
+snRNA, snoRNA or rRNA genes.
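+
+The length reported for a transcript is the number of nucleotides covered by
+its exons, not the distance between its start and end coordinates. The
+difference can be seen in a minimal base R sketch with made-up exon coordinates
+(an illustration only, not the implementation of =lengthOf=):
+
+#+BEGIN_SRC R
+ ## Made-up exons of a single transcript (1-based, inclusive coordinates).
+ exon_start <- c(100, 400, 900)
+ exon_end <- c(200, 500, 1000)
+ ## Exonic length: sum of the widths of all exons.
+ tx_length <- sum(exon_end - exon_start + 1)
+ ## Genomic span, i.e. including the introns.
+ tx_span <- max(exon_end) - min(exon_start) + 1
+ c(length = tx_length, span = tx_span)
+#+END_SRC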
+
+Finally, we extract the first two exons of each transcript model from the
+database.
+
+#+BEGIN_SRC R
+ ## Extract all exons 1 and (if present) 2 for all genes encoded on the
+ ## Y chromosome
+ exons(edb, columns = c("tx_id", "exon_idx"),
+ filter = list(SeqnameFilter("Y"),
+ ExonrankFilter(3, condition = "<")))
+#+END_SRC
+
+* Extracting gene/transcript/exon models for RNASeq feature counting
+
+For the feature counting step of an RNAseq experiment, the gene or transcript
+models (defined by the chromosomal start and end positions of their exons) have
+to be known. To extract these from an Ensembl based annotation package, the
+=exonsBy=, =genesBy= and =transcriptsBy= methods can be used in an analogous way as in
+=TxDb= packages generated by the =GenomicFeatures= package. However, the
+=transcriptsBy= method does not, in contrast to the method in the =GenomicFeatures=
+package, allow to return transcripts by "cds". While the annotation packages
+built by the =ensembldb= contain the chromosomal start and end coordinates of
+the coding region (for protein coding genes) they do not assign an ID to each
+CDS.
+
+A simple use case is to retrieve all transcripts, grouped by gene, for the genes
+encoded on chromosomes X and Y.
+
+#+BEGIN_SRC R
+ TxByGns <- transcriptsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c("X", "Y")))
+ )
+ TxByGns
+#+END_SRC
+
+Since Ensembl contains also definitions of genes that are on chromosome variants
+(supercontigs), it is advisable to specify the chromosome names for which the
+gene models should be returned.
+
+In a real use case, we might thus want to retrieve all genes encoded on the
+/standard/ chromosomes. In addition it is advisable to use a =GeneidFilter= to
+restrict to Ensembl genes only, as also /LRG/ (Locus Reference Genomic)
+genes[fn:3] are defined in the database, which are partially redundant with
+Ensembl genes.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ ## will just get exons for all genes on chromosomes 1 to 22, X and Y.
+ ## Note: want to get rid of the "LRG" genes!!!
+ EnsGenes <- exonsBy(edb, by = "gene",
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))
+#+END_SRC
+
+The code above returns a =GRangesList= that can be used directly as an input for
+the =summarizeOverlaps= function from the =GenomicAlignments= package [fn:4].
+
+Alternatively, the above =GRangesList= can be transformed to a =data.frame= in
+/SAF/ format that can be used as an input to the =featureCounts= function of the
+=Rsubread= package [fn:5].
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ ## Transforming the GRangesList into a data.frame in SAF format
+ EnsGenes.SAF <- toSAF(EnsGenes)
+
+#+END_SRC
+
+Note that the ID by which the =GRangesList= is split is used in the SAF
+formatted =data.frame= as the =GeneID=. In the example above this would be the
+Ensembl gene IDs, while the start and end coordinates (along with the strand and
+chromosomes) are those of the exons.
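+
+To make the layout of such a table concrete, the sketch below builds a SAF
+style =data.frame= by hand in base R. The exon table is made up and the code is
+not the implementation of =toSAF=; it only assumes the five columns that
+=featureCounts= expects for SAF input (=GeneID=, =Chr=, =Start=, =End= and
+=Strand=).
+
+#+BEGIN_SRC R
+ ## Made-up exon table; in practice these values come from the GRangesList.
+ exon_tab <- data.frame(gene_id = c("geneA", "geneA", "geneB"),
+                        seq_name = c("1", "1", "X"),
+                        start = c(100, 400, 50),
+                        end = c(200, 500, 150),
+                        strand = c("+", "+", "-"),
+                        stringsAsFactors = FALSE)
+ ## One row per exon; the grouping ID (here the gene) goes into GeneID.
+ saf <- data.frame(GeneID = exon_tab$gene_id, Chr = exon_tab$seq_name,
+                   Start = exon_tab$start, End = exon_tab$end,
+                   Strand = exon_tab$strand, stringsAsFactors = FALSE)
+ saf
+#+END_SRC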
+
+In addition, the =disjointExons= function (similar to the one defined in
+=GenomicFeatures=) can be used to generate a =GRanges= of non-overlapping exon
+parts which can be used in the =DEXSeq= package.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ ## Create a GRanges of non-overlapping exon parts.
+ DJE <- disjointExons(edb,
+ filter = list(SeqnameFilter(c(1:22, "X", "Y")),
+ GeneidFilter("ENSG%", "like")))
+
+#+END_SRC
+
+
+
+* Retrieving sequences for gene/transcript/exon models
+
+The methods to retrieve exons, transcripts and genes (i.e. =exons=, =transcripts=
+and =genes=) return by default =GRanges= objects that can be used to retrieve
+sequences using the =getSeq= method e.g. from BSgenome packages. The basic
+workflow is thus identical to the one for =TxDb= packages, however, it is not
+straightforward to identify the BSgenome package with the matching genomic
+sequence. Most BSgenome packages are named according to the genome build
+identifier used in UCSC which does not (always) match the genome build name used
+by Ensembl. Using the Ensembl version provided by the =EnsDb=, the correct genomic
+sequence can however be retrieved easily from the =AnnotationHub= using the
+=getGenomeFaFile= method. If no Fasta file matching the Ensembl version is available, the
+function tries to identify a Fasta file with the correct genome build from the
+/closest/ Ensembl release and returns that instead.
+
+In the code block below we retrieve first the =FaFile= with the genomic DNA
+sequence, extract the genomic start and end coordinates for all genes defined in
+the package, subset to genes encoded on sequences available in the =FaFile= and
+extract all of their sequences. Note: these sequences represent the sequence
+between the chromosomal start and end coordinates of the gene.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(EnsDb.Hsapiens.v75)
+ library(Rsamtools)
+ edb <- EnsDb.Hsapiens.v75
+
+ ## Get the FaFile with the genomic sequence matching the Ensembl version
+ ## using the AnnotationHub package.
+ Dna <- getGenomeFaFile(edb)
+
+ ## Get start/end coordinates of all genes.
+ genes <- genes(edb)
+ ## Subset to all genes that are encoded on chromosomes for which
+ ## we do have DNA sequence available.
+ genes <- genes[seqnames(genes) %in% seqnames(seqinfo(Dna))]
+
+ ## Get the gene sequences, i.e. the sequence including the sequence of
+ ## all of the gene's exons and introns.
+ geneSeqs <- getSeq(Dna, genes)
+
+
+#+END_SRC
+
+To retrieve the (exonic) sequence of transcripts (i.e. without introns) we can
+directly use the =extractTranscriptSeqs= method defined in the =GenomicFeatures=
+package on the =EnsDb= object, optionally using a filter to restrict the query.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ ## get all exons of all transcripts encoded on chromosome Y
+ yTx <- exonsBy(edb, filter = SeqnameFilter("Y"))
+
+ ## Retrieve the sequences for these transcripts from the FaFile.
+ library(GenomicFeatures)
+ yTxSeqs <- extractTranscriptSeqs(Dna, yTx)
+ yTxSeqs
+
+ ## Extract the sequences of all transcripts encoded on chromosome Y.
+ yTx <- extractTranscriptSeqs(Dna, edb, filter = SeqnameFilter("Y"))
+
+ ## Along these lines, we could use the method also to retrieve the coding sequence
+ ## of all transcripts on the Y chromosome.
+ cdsY <- cdsBy(edb, filter = SeqnameFilter("Y"))
+ extractTranscriptSeqs(Dna, cdsY)
+
+#+END_SRC
+
+Note: in the next section we describe how transcript sequences can be retrieved
+from a =BSgenome= package that is based on UCSC, not Ensembl.
+
+* Integrating annotations from Ensembl based =EnsDb= packages with UCSC based annotations
+
+Sometimes it might be useful to combine (Ensembl based) annotations from =EnsDb=
+packages/objects with annotations from other Bioconductor packages that might
+be based on UCSC annotations. To support such an integration of annotations, the
+=ensembldb= package implements the =seqlevelsStyle= and =seqlevelsStyle<-= methods from the
+=GenomeInfoDb= package that allow changing the style of chromosome naming. Thus,
+sequence/chromosome names other than those used by Ensembl can be used in, and
+are returned by, the queries to =EnsDb= objects as long as a mapping for them is
+provided by the =GenomeInfoDb= package (which provides a mapping mostly between
+UCSC, NCBI and Ensembl chromosome names for the /main/ chromosomes).
+
+In the example below we change the seqnames style to UCSC.
+
+#+BEGIN_SRC R :ravel message=FALSE
+ ## Change the seqlevels style from Ensembl (default) to UCSC:
+ seqlevelsStyle(edb) <- "UCSC"
+
+ ## Now we can use UCSC style seqnames in SeqnameFilters or GRangesFilter:
+ genesY <- genes(edb, filter = SeqnameFilter("chrY"))
+ ## The seqlevels of the returned GRanges are also in UCSC style
+ seqlevels(genesY)
+#+END_SRC
+
+Note that in most instances no mapping is available for sequences not
+corresponding to the main chromosomes (i.e. contigs, patched chromosomes
+etc). What is returned in cases in which no mapping is available can be
+specified with the global =ensembldb.seqnameNotFound= option. By default (with
+=ensembldb.seqnameNotFound= set to "ORIGINAL"), the original seqnames (i.e. the
+ones from Ensembl) are returned. With =ensembldb.seqnameNotFound= set to "MISSING",
+an error is thrown each time a seqname cannot be found. For all other values
+(e.g. =ensembldb.seqnameNotFound = NA=) the value of the option is returned.
+
+#+BEGIN_SRC R
+ seqlevelsStyle(edb) <- "UCSC"
+
+ ## Getting the default option:
+ getOption("ensembldb.seqnameNotFound")
+
+ ## Listing all seqlevels in the database.
+ seqlevels(edb)[1:30]
+
+ ## Setting the option to NA, thus, for each seqname for which no mapping is available,
+ ## NA is returned.
+ options(ensembldb.seqnameNotFound=NA)
+ seqlevels(edb)[1:30]
+
+ ## Resetting the option.
+ options(ensembldb.seqnameNotFound = "ORIGINAL")
+
+#+END_SRC
+
+Next we retrieve transcript sequences from genes encoded on chromosome Y using
+the =BSgenome= package for the human genome from UCSC. The specified version
+=hg19= matches the genome build of Ensembl version 75, i.e. =GRCh37=. Note that
+while we changed the style of the seqnames to UCSC we did not change the naming
+of the genome release.
+
+#+BEGIN_SRC R :ravel warning=FALSE, message=FALSE
+ library(BSgenome.Hsapiens.UCSC.hg19)
+ bsg <- BSgenome.Hsapiens.UCSC.hg19
+
+ ## Get the genome version
+ unique(genome(bsg))
+ unique(genome(edb))
+ ## Although differently named, both represent genome build GRCh37.
+
+ ## Extract the full transcript sequences.
+ yTxSeqs <- extractTranscriptSeqs(bsg, exonsBy(edb, "tx", filter = SeqnameFilter("chrY")))
+
+ yTxSeqs
+
+ ## Extract just the CDS
+ yCds <- cdsBy(edb, "tx", filter = SeqnameFilter("chrY"))
+ yTxCds <- extractTranscriptSeqs(bsg, yCds)
+ yTxCds
+
+#+END_SRC
+
+Finally, we change the seqname style back to the default value ="Ensembl"=.
+
+#+BEGIN_SRC R
+ seqlevelsStyle(edb) <- "Ensembl"
+#+END_SRC
+
+* Interactive annotation lookup using the =shiny= web app
+
+In addition to the =genes=, =transcripts= and =exons= methods it is possible to
+search interactively for gene/transcript/exon annotations using the internal,
+=shiny= based, web application. The application can be started with the
+=runEnsDbApp()= function. The search results from this app can also be returned
+to the R workspace either as a =data.frame= or =GRanges= object.
+
+
+* Plotting gene/transcript features using =ensembldb= and =Gviz=
+
+The =Gviz= package provides functions to plot genes and transcripts along with
+other data on a genomic scale. Gene models can be provided as a
+=data.frame=, =GRanges= or =TxDb= database, can be fetched from biomart, or can
+be retrieved from =ensembldb=.
+
+Below we generate a =GeneRegionTrack= fetching all transcripts from a certain
+region on chromosome Y.
+
+Note that if we also want to work with BAM files that were aligned
+against DNA sequences retrieved from Ensembl or FASTA files representing genomic
+DNA sequences from Ensembl we should change the =ucscChromosomeNames= option from
+=Gviz= to =FALSE= (i.e. by calling =options(ucscChromosomeNames = FALSE)=). This is
+not necessary if we just want to retrieve gene models from an =EnsDb= object, as
+the =ensembldb= package internally checks the =ucscChromosomeNames= option and,
+depending on that, maps Ensembl chromosome names to UCSC chromosome names.
+
+#+BEGIN_SRC R :ravel gviz-plot, message=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25
+ ## Loading the Gviz library
+ library(Gviz)
+ library(EnsDb.Hsapiens.v75)
+ edb <- EnsDb.Hsapiens.v75
+
+ ## Retrieving a Gviz compatible GRanges object with all genes
+ ## encoded on chromosome Y.
+ gr <- getGeneRegionTrackForGviz(edb, chromosome = "Y",
+ start = 20400000, end = 21400000)
+ ## Define a genome axis track
+ gat <- GenomeAxisTrack()
+
+ ## We have to change the ucscChromosomeNames option to FALSE to enable Gviz usage
+ ## with non-UCSC chromosome names.
+ options(ucscChromosomeNames = FALSE)
+
+ plotTracks(list(gat, GeneRegionTrack(gr)))
+
+ options(ucscChromosomeNames = TRUE)
+
+#+END_SRC
+
+Above we had to change the option =ucscChromosomeNames= to =FALSE= in order to
+use =Gviz= with non-UCSC chromosome names. Alternatively, we could also
+change the =seqlevelsStyle= of the =EnsDb= object to =UCSC=. Note that we then
+also have to use chromosome names in the /UCSC style/ in the =SeqnameFilter=
+and in the =chromosome= argument (i.e. ="chrY"= instead of ="Y"=).
+
+#+BEGIN_SRC R :ravel message=FALSE
+ seqlevelsStyle(edb) <- "UCSC"
+ ## Retrieving the GRanges objects with seqnames corresponding to UCSC chromosome names.
+ gr <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000)
+ seqnames(gr)
+ ## Define a genome axis track
+ gat <- GenomeAxisTrack()
+ plotTracks(list(gat, GeneRegionTrack(gr)))
+
+#+END_SRC
+
+We can also use the filters from the =ensembldb= package to further refine what
+transcripts are fetched, like in the example below, in which we create two
+different gene region tracks, one for protein coding genes and one for lincRNAs.
+
+#+BEGIN_SRC R :ravel gviz-separate-tracks, message=FALSE, warning=FALSE, fig.align='center', fig.width=7.5, fig.height=2.25
+ protCod <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("protein_coding"))
+ lincs <- getGeneRegionTrackForGviz(edb, chromosome = "chrY",
+ start = 20400000, end = 21400000,
+ filter = GenebiotypeFilter("lincRNA"))
+
+ plotTracks(list(gat, GeneRegionTrack(protCod, name = "protein coding"),
+ GeneRegionTrack(lincs, name = "lincRNAs")), transcriptAnnotation = "symbol")
+
+ ## Finally, we change the seqlevels style back to Ensembl
+ seqlevelsStyle(edb) <- "Ensembl"
+
+#+END_SRC
+
+
+* Using =EnsDb= objects in the =AnnotationDbi= framework
+
+Most of the methods defined for objects extending the basic annotation package
+class =AnnotationDbi= are also defined for =EnsDb= objects (i.e. methods
+=columns=, =keytypes=, =keys=, =mapIds= and =select=). While these methods can
+be used analogously to basic annotation packages, the implementation for =EnsDb=
+objects also supports the filtering framework of the =ensembldb= package.
+
+In the example below we first list all the available columns and keytypes in
+the database and then extract the gene names for all genes encoded on chromosome
+Y.
+
+#+BEGIN_SRC R
+ library(EnsDb.Hsapiens.v75)
+ edb <- EnsDb.Hsapiens.v75
+
+ ## List all available columns in the database.
+ columns(edb)
+
+ ## Note that these do *not* correspond to the actual column names
+ ## of the database that can be passed to methods like exons, genes,
+ ## transcripts etc. These column names can be listed with the listColumns
+ ## method.
+ listColumns(edb)
+
+ ## List all of the supported key types.
+ keytypes(edb)
+
+ ## Get all gene ids from the database.
+ gids <- keys(edb, keytype = "GENEID")
+ length(gids)
+
+ ## Get all gene names for genes encoded on chromosome Y.
+ gnames <- keys(edb, keytype = "GENENAME", filter = SeqnameFilter("Y"))
+ head(gnames)
+#+END_SRC
+
+In the next example we retrieve specific information from the database using the
+=select= method. First we fetch all transcripts for the genes /BCL2/ and
+/BCL2L11/. In the first call we provide the gene names, while in the second call
+we employ the filtering system to perform a more fine-grained query to fetch
+only the protein coding transcripts for these genes.
+
+#+BEGIN_SRC R :ravel warning=FALSE
+ ## Use the /standard/ way to fetch data.
+ select(edb, keys = c("BCL2", "BCL2L11"), keytype = "GENENAME",
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+
+ ## Use the filtering system of ensembldb
+ select(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")),
+ columns = c("GENEID", "GENENAME", "TXID", "TXBIOTYPE"))
+#+END_SRC
+
+Finally, we use the =mapIds= method to establish a mapping between ids and
+values. In the example below we fetch transcript ids for the two genes from the
+example above.
+
+#+BEGIN_SRC R
+ ## Use the default method, which just returns the first value for multi mappings.
+ mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME")
+
+ ## Alternatively, specify multiVals="list" to return all mappings.
+ mapIds(edb, keys = c("BCL2", "BCL2L11"), column = "TXID", keytype = "GENENAME",
+ multiVals = "list")
+
+ ## And, just like before, we can use filters to map only to protein coding transcripts.
+ mapIds(edb, keys = list(GenenameFilter(c("BCL2", "BCL2L11")),
+ TxbiotypeFilter("protein_coding")), column = "TXID",
+ multiVals = "list")
+#+END_SRC
+
+Note that, if the filters are used, the ordering of the result no longer
+matches the ordering of the genes.
+
+* Important notes
+
+These notes might explain potentially unexpected results (and, more importantly,
+help to avoid them):
+
++ The ordering of the results returned by the =genes=, =exons=, =transcripts= methods
+ can be specified with the =order.by= parameter. The ordering of the results does
+ however *not* correspond to the ordering of values in submitted filter
+ objects. The exception is the =select= method. If a character vector of values
+ or a single filter is passed with argument =keys= the ordering of results of
+ this method matches the ordering of the key values or the values of the
+ filter.
+
++ Results of =exonsBy=, =transcriptsBy= are always ordered by the =by= argument.
+
++ The CDS provided by =EnsDb= objects *always* includes both the start and the
+ stop codon.
+
++ Transcripts with multiple CDS are at present not supported by =EnsDb=.
+
++ At present, =EnsDb= supports only genes/transcripts for which all of their
+ exons are encoded on the same chromosome and the same strand.
+
+
+
+* Building a transcript-centric database package based on Ensembl annotation
+
+The code in this section is not supposed to be automatically executed when the
+vignette is built, as this would require a working installation of the Ensembl
+Perl API, which is not expected to be available on each system. Also, building
+=EnsDb= from alternative sources, like GFF or GTF files, takes some time, and
+thus these examples are also not directly executed when the vignette is built.
+
+** Requirements
+
+The =fetchTablesFromEnsembl= function of the package uses the Ensembl Perl API
+to retrieve the required annotations from an Ensembl database (e.g. from the
+main site /ensembldb.ensembl.org/). Thus, to use the functionality to build
+databases, the Ensembl Perl API needs to be installed (see [fn:2] for details).
+
+Alternatively, the =ensDbFromAH=, =ensDbFromGff=, =ensDbFromGRanges= and =ensDbFromGtf=
+functions allow building EnsDb SQLite files from a =GRanges= object or GFF/GTF
+files from Ensembl (either provided as files or /via/ =AnnotationHub=). These
+functions do not depend on the Ensembl Perl API, but require a working internet
+connection to fetch the chromosome lengths from Ensembl as these are not
+provided within GTF or GFF files.
+
+
+** Building annotation packages
+
+The functions below use the Ensembl Perl API to fetch the required data directly
+from the Ensembl core databases. Thus, the path to the Perl API specific for the
+desired Ensembl version needs to be added to the =PERL5LIB= environment variable.
+
+An annotation package containing all human genes for Ensembl version 75 can be
+created using the code in the block below.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(ensembldb)
+
+ ## get all human gene/transcript/exon annotations from Ensembl (75)
+ ## the resulting tables will be stored by default to the current working
+ ## directory
+ fetchTablesFromEnsembl(75, species = "human")
+
+ ## These tables can then be processed to generate a SQLite database
+ ## containing the annotations (again, the function assumes the required
+ ## txt files to be present in the current working directory)
+ DBFile <- makeEnsemblSQLiteFromTables()
+
+ ## and finally we can generate the package
+ makeEnsembldbPackage(ensdb = DBFile, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")
+
+#+END_SRC
+
+The generated package can then be built using =R CMD build EnsDb.Hsapiens.v75=
+and installed with =R CMD INSTALL EnsDb.Hsapiens.v75*=. Note that we could
+directly generate an =EnsDb= instance by loading the database file, i.e. by
+calling =edb <- EnsDb(DBFile)= and work with that annotation object.
+
+To fetch and build annotation packages for plant genomes (e.g. /Arabidopsis thaliana/),
+the /Ensembl Genomes/ server should be specified as the host, i.e. by setting
+=host= to "mysql-eg-publicsql.ebi.ac.uk", =port= to =4157= and =species= to
+e.g. "arabidopsis thaliana".
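+
+A hedged sketch of such a call is shown below. The =host=, =port= and =species=
+values are the ones named above; =egVersion= is a hypothetical placeholder for
+the release to fetch and has not been checked against the Ensembl Genomes
+server. As for the human example, a working Ensembl Perl API is assumed.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(ensembldb)
+
+ ## Placeholder (assumption): replace with the release that should be fetched.
+ egVersion <- 75
+
+ ## Fetch the annotation tables from the Ensembl Genomes MySQL server instead
+ ## of the default host.
+ fetchTablesFromEnsembl(egVersion, species = "arabidopsis thaliana",
+                        host = "mysql-eg-publicsql.ebi.ac.uk", port = 4157)
+
+ ## As before, build the SQLite database from the tables stored in the
+ ## current working directory.
+ DBFile <- makeEnsemblSQLiteFromTables()
+#+END_SRC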
+
+In the next example we create an =EnsDb= database using the =AnnotationHub=
+package and load also the corresponding genomic DNA sequence matching the
+Ensembl version. We thus first query the =AnnotationHub= package for all
+resources available for =Mus musculus= and the Ensembl release 77. Next we
+create the =EnsDb= object from the appropriate =AnnotationHub= resource. We
+then use the =getGenomeFaFile= method on the =EnsDb= to directly look up and
+retrieve the correct or best matching =FaFile= with the genomic DNA sequence.
+Finally, we retrieve the sequences of all exons using the =getSeq= method.
+
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ ## Load the AnnotationHub data.
+ library(AnnotationHub)
+ ah <- AnnotationHub()
+
+ ## Query all available files for Ensembl release 77 for
+ ## Mus musculus.
+ query(ah, c("Mus musculus", "release-77"))
+
+ ## Get the resource for the gtf file with the gene/transcript definitions.
+ Gtf <- ah["AH28822"]
+ ## Create an EnsDb database file from this.
+ DbFile <- ensDbFromAH(Gtf)
+ ## We can either generate a database package, or directly load the data
+ edb <- EnsDb(DbFile)
+
+
+ ## Identify and get the FaFile object with the genomic DNA sequence matching
+ ## the EnsDb annotation.
+ Dna <- getGenomeFaFile(edb)
+ library(Rsamtools)
+ ## We next retrieve the sequence of all exons on chromosome Y.
+ exons <- exons(edb, filter = SeqnameFilter("Y"))
+ exonSeq <- getSeq(Dna, exons)
+
+ ## Alternatively, look up and retrieve the toplevel DNA sequence manually.
+ Dna <- ah[["AH22042"]]
+
+#+END_SRC
+
+In the example below we load a =GRanges= containing gene definitions for genes
+encoded on chromosome Y and generate an EnsDb SQLite database from that
+information.
+
+#+BEGIN_SRC R :ravel message=FALSE
+ ## Generate a sqlite database from a GRanges object specifying
+ ## genes encoded on chromosome Y
+ load(system.file("YGRanges.RData", package = "ensembldb"))
+ Y
+
+ DB <- ensDbFromGRanges(Y, path = tempdir(), version = 75,
+ organism = "Homo_sapiens")
+
+ edb <- EnsDb(DB)
+ edb
+
+ ## As shown in the example below, we could make an EnsDb package on
+ ## this DB object using the makeEnsembldbPackage function.
+
+#+END_SRC
+
+
+Alternatively we can build the annotation database using the =ensDbFromGtf= or
+=ensDbFromGff= functions, which extract most of the required data from a GTF
+or GFF (version 3) file, respectively. Such files can be downloaded from Ensembl (e.g. from
+ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens for human gene definitions
+from Ensembl version 75; for plant genomes etc. files can be retrieved from
+ftp://ftp.ensemblgenomes.org). All information except the chromosome lengths and
+the NCBI Entrezgene IDs can be extracted from these files. The functions also
+try to retrieve chromosome length information automatically from Ensembl.
+
+Below we create the annotation from a GTF file that we fetch directly from Ensembl.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(ensembldb)
+
+ ## the GTF file can be downloaded from
+ ## ftp://ftp.ensembl.org/pub/release-75/gtf/homo_sapiens/
+ gtffile <- "Homo_sapiens.GRCh37.75.gtf.gz"
+ ## generate the SQLite database file
+ DB <- ensDbFromGtf(gtf = gtffile)
+
+ ## load the DB file directly
+ EDB <- EnsDb(DB)
+
+ ## alternatively, build the annotation package from the database file
+ makeEnsembldbPackage(ensdb = DB, version = "0.99.12",
+ maintainer = "Johannes Rainer <johannes.rainer at eurac.edu>",
+ author = "J Rainer")
+
+#+END_SRC
+
+
+* Database layout<<section.database.layout>>
+
+The database consists of the following tables and attributes (the layout is also
+shown in Figure [[fig.database.layout]]):
+
++ *gene*: all gene specific annotations.
+ - =gene_id=: the Ensembl ID of the gene.
+ - =gene_name=: the name (symbol) of the gene.
+ - =entrezid=: the NCBI Entrezgene ID(s) of the gene. Note that this can be a
+ =;= separated list of IDs for genes that are mapped to more than one
+ Entrezgene.
+ - =gene_biotype=: the biotype of the gene.
+ - =gene_seq_start=: the start coordinate of the gene on the sequence (usually
+ a chromosome).
+ - =gene_seq_end=: the end coordinate of the gene on the sequence.
+ - =seq_name=: the name of the sequence (usually the chromosome name).
+ - =seq_strand=: the strand on which the gene is encoded.
+ - =seq_coord_system=: the coordinate system of the sequence.
+
++ *tx*: all transcript related annotations. Note that while no =tx_name= column
+ is available in this database table, all methods to retrieve data from the
+ database also support this column. The returned values are, however, the IDs of
+ the transcripts.
+ - =tx_id=: the Ensembl transcript ID.
+ - =tx_biotype=: the biotype of the transcript.
+ - =tx_seq_start=: the start coordinate of the transcript.
+ - =tx_seq_end=: the end coordinate of the transcript.
+ - =tx_cds_seq_start=: the start coordinate of the coding region of the
+ transcript (NULL for non-coding transcripts).
+ - =tx_cds_seq_end=: the end coordinate of the coding region of the transcript.
+ - =gene_id=: the gene to which the transcript belongs.
+
++ *exon*: all exon related annotation.
+ - =exon_id=: the Ensembl exon ID.
+ - =exon_seq_start=: the start coordinate of the exon.
+ - =exon_seq_end=: the end coordinate of the exon.
+
++ *tx2exon*: provides the n:m mapping between transcripts and exons.
+ - =tx_id=: the Ensembl transcript ID.
+ - =exon_id=: the Ensembl exon ID.
+ - =exon_idx=: the index of the exon in the corresponding transcript, always
+ from 5' to 3' of the transcript.
+
++ *chromosome*: provides some information about the chromosomes.
+ - =seq_name=: the name of the sequence/chromosome.
+ - =seq_length=: the length of the sequence.
+ - =is_circular=: whether the sequence is circular.
+
++ *information*: some additional, internal information (genome build, Ensembl
+ version etc).
+ - =key=
+ - =value=
+
++ /virtual/ columns:
+ - =symbol=: the database does not have such a database column, but it is still
+ possible to use it in the =columns= parameter. This column is /symlinked/ to the
+ =gene_name= column.
+ - =tx_name=: similar to the =symbol= column, this column is /symlinked/ to the =tx_id=
+ column.
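+
+As a rough sketch of how these tables relate to each other, the query below
+joins the =gene= and =tx= tables on =gene_id= and counts the transcripts of each
+gene encoded on chromosome Y. It assumes that =edb= is an =EnsDb= object loaded
+as in the previous sections, that its SQLite connection can be obtained with
+=dbconn=, and that the =DBI= package is installed. For regular use the =genes=,
+=transcripts= and =exons= methods remain the intended interface.
+
+#+BEGIN_SRC R :ravel eval=FALSE
+ library(DBI)
+
+ ## Get the connection to the SQLite database underlying the EnsDb object.
+ con <- dbconn(edb)
+
+ ## Join gene and tx on gene_id and count the transcripts per gene. The
+ ## database itself always stores the Ensembl style seqnames (i.e. 'Y').
+ dbGetQuery(con, paste("select gene.gene_id, gene_name, count(tx_id) as tx_count",
+                       "from gene join tx on (gene.gene_id = tx.gene_id)",
+                       "where seq_name = 'Y'",
+                       "group by gene.gene_id, gene_name",
+                       "order by tx_count desc limit 5"))
+#+END_SRC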
+
+#+ATTR_LATEX: :center :placement [h!] :width 14cm
+#+NAME: fig.database.layout
+#+CAPTION: Database layout.
+[[file:images/dblayout.png]]
+
+
+
+* Footnotes
+
+[fn:1] http://www.ensembl.org
+
+[fn:2] http://www.ensembl.org/info/docs/api/api_installation.html
+
+[fn:3] http://www.lrg-sequence.org
+
+[fn:4] http://www.ncbi.nlm.nih.gov/pubmed/23950696
+
+[fn:5] http://www.ncbi.nlm.nih.gov/pubmed/24227677
+
+
+* Installing the Ensembl database locally and building new packages :noexport:
+:PROPERTIES:
+:eval: never
+:END:
+
+This section covers the local installation of a new Ensembl database on my
+system. Some of the perl scripts used here are available at
+https://github.com/jotsetung/Ensembl-Exon-probemapping.
+
+First of all we have to get the MySQL server up on my system. The MySQL server
+was installed using =homebrew= and was configured to keep the databases on an
+external disk.
+
+Start the server using =mysql.server start=.
+
+#+BEGIN_SRC shell
+ ## Change to the directory with the perl script
+ cd ~/Projects/git/Ensembl-Exon-probemapping/bin/
+
+ ## Download and install the Ensembl core database
+ perl installEnsembldb.pl -e 85 -d homo_sapiens_core_85_38
+#+END_SRC
+
+
+
+* TODOs :noexport:
+
+** DONE Fix the =ensembldb:::EnsDb= call in /zzz.R/ of the package template!
+ CLOSED: [2015-04-01 Wed 12:05]
+ - State "DONE" from "TODO" [2015-04-01 Wed 12:05]
+
+The =EnsDb= construction function is exported, thus there is no need for the =:::=.
+
+** DONE Implement the =disjointExons= method.
+ CLOSED: [2015-03-25 Wed 09:43]
+ - State "DONE" from "TODO" [2015-03-25 Wed 09:43]
+** DONE Fix return value for =organism=
+ CLOSED: [2015-03-27 Fri 12:10]
+ - State "DONE" from "TODO" [2015-03-27 Fri 12:10]
+
+The return value should be /Genus species/, i.e. without =_= in between.
+** DONE Check =utils::news=, =?news=
+ CLOSED: [2015-04-02 Thu 08:50]
+ - State "DONE" from "TODO" [2015-04-02 Thu 08:50]
+** DONE build the database based on an Ensembl gtf file
+ CLOSED: [2015-04-10 Fri 07:02]
+ - State "DONE" from "TODO" [2015-04-10 Fri 07:02]
+ - That would be the pre-requisite to write recipes for the =AnnotationHub= package.
+ - The only missing data is the sequence lengths.
+** DONE Use the =GenomicFeatures= =fetchChromLengthsFromEnsembl= to retrieve chromosome lengths for GTF import
+ CLOSED: [2015-04-14 Tue 11:36]
+ - State "DONE" from "TODO" [2015-04-14 Tue 11:36]
+
++ Ideally, automatically run this script, if there is any error just skip, but do not stop. To do that, use the =try= call.
+
+** CANCELED Include recipe to =AnnotationHub=
+ CLOSED: [2015-06-12 Fri 08:55]
+ - State "CANCELED" from "TODO" [2015-06-12 Fri 08:55] \\
+ Don't need that really. We can retrieve the GRanges object and build the EnsDb object or package based on that.
+** CANCELED Implement a function to /guess/ the correct BSgenome package
+ CLOSED: [2015-06-11 Thu 08:45]
+ - State "CANCELED" from "TODO" [2015-06-11 Thu 08:45] \\
+ Drop that; better to fetch the sequence from AnnotationHub!
++ In the end it seems I have to do some hard-coding there...
+
+
+** DONE Implement a function to load the appropriate DNA sequence from AnnotationHub
+ CLOSED: [2015-06-12 Fri 08:55]
+ - State "DONE" from "TODO" [2015-06-12 Fri 08:55]
++ [X] Implement a method to retrieve the Ensembl version.
+Some code snippet:
+=query(ah, c(organism(edb), paste0("release-")))= and use =mcols()= on the result to search for =dna.toplevel.fa=.
+
+** DONE Implement a function to build an EnsDb from a GRanges object.
+ CLOSED: [2015-04-14 Tue 11:35]
+ - State "DONE" from "TODO" [2015-04-14 Tue 11:35]
+** DONE Implement the =cdsBy= method.
+ CLOSED: [2015-10-30 Fri 09:15]
+ - State "DONE" from "TODO" [2015-10-30 Fri 09:15]
+This has to be implemented for =by= being ="tx"= and ="gene"=. Note that we can
+*only* return this stuff for protein coding genes!!!
+For =tx=:
+- returns the exons constituting the cds. Returns a =GRangesList= with =GRanges=
+ and metadata columns: =cds_id=, =cds_name=, =exon_rank=. The latter is clear,
+ the other two are ?
+- option =use.names= will return the TX ID.
+
+For =gene=:
+- Could we get that using =reduce=?
+
+** DONE Implement the =fiveUTRsByTranscript= method.
+ CLOSED: [2015-10-30 Fri 15:05]
+ - State "DONE" from "TODO" [2015-10-30 Fri 15:05]
+
+
+** DONE Implement the =threeUTRsByTranscript= method.
+ CLOSED: [2015-10-30 Fri 15:05]
+ - State "DONE" from "TODO" [2015-10-30 Fri 15:05]
+** DONE Implement a method to use ensembldb for =Gviz=
+ CLOSED: [2015-11-04 Wed 09:15]
+ - State "DONE" from "TODO" [2015-11-04 Wed 09:15]
+Do something similar to the .buildRange method for "TxDb" objects
+(/Gviz-methods.R/). Ideally, the function should return a =GRanges= object (or
+might a =data.frame= do as well?).
+
++ Implement a method that builds a =data.frame= for =Gviz=.
++ Check =.getBiotypeColor= function in /Gviz.R/ line 681.
++ Check =GeneRegionTrack= constructor in /AllClasses.R/, line 897 ->
+ =.buildRanges= ()
++ =getGeneRegionTrackForGviz= should ideally return a =GRanges=, setting also
+ the genome, seqinfo etc.
+** WAIT Add a section in the vignette describing the use of =Gviz= with =ensembldb=
+ - State "WAIT" from "TODO" [2015-11-06 Fri 08:41] \\
+ Wait for Florian Hahne to add the changes to Gviz.
+
+
+** DONE Implement a fix that would allow UCSC chromosome names [4/4]
+ CLOSED: [2015-11-30 Mon 09:24]
+ - State "DONE" from "TODO" [2015-11-30 Mon 09:24]
+The idea is that, reading =options("ucscChromosomeNames")=, a ="chr"= is prepended
+to the chromosome names. That way, =EnsDb= databases could directly work with
+=Gviz= (as that package uses the above option).
+
++ If something is queried from the database, the ="chr"= has to be stripped
+ off. Here we have to deal with the filters:
++ [X] =SeqnameFilter=: this now always returns stripped chr names, if =EnsDb= is
+ also submitted.
++ [X] =GRangesFilter=
+ and eventually using their =value= method:
++ If anything is returned from the database, a ="chr"= has to be prepended, if
+ the options are =TRUE=.
+ - Looks like the major return path is =getWhat=, so, will include the replace
+ stuff there.
++ [X] Adapt =getWhat=.
++ [X] The query to build the Gviz =GenePanel=.
+
+** DONE Implement a fix to rename additional chromosome names, like =Mt= etc.
+ CLOSED: [2015-11-30 Mon 08:59]
+ - State "DONE" from "TODO" [2015-11-30 Mon 08:59]
+** DONE Implement a =GRangesFilter= [2/2]
+ CLOSED: [2015-11-27 Fri 13:59]
+ - State "DONE" from "TODO" [2015-11-27 Fri 13:59]
++ [X] Filter should allow to either get all features =within= the GRanges:
+ complete feature has to be within the range.
++ [X] All features overlapping: =overlappingExon=: part of an exon has to
+ overlap the range. =overlappingAll=: exon or intron has to partially overlap
+ the range.
+
++ Filter should use the coordinates of the things to fetch, i.e. gene,
+ transcript or exon regions.
+
++ =within=: _seq_start >= start & _seq_end <= end.
++ =overlapping=: _seq_start <= end & _seq_end >= start.
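+
+The two conditions above can be sketched in plain R as follows. The helper
+=featureMatches= and the toy coordinates are hypothetical and only illustrate
+the comparison; they are not the SQL that the package generates for a
+=GRangesFilter=.
+
+#+BEGIN_SRC R :eval never
+ ## Hypothetical helper (not part of ensembldb): evaluates the two conditions
+ ## on plain numeric start/end coordinates of features.
+ featureMatches <- function(seqStart, seqEnd, start, end,
+                            condition = c("within", "overlapping")) {
+     condition <- match.arg(condition)
+     if (condition == "within") {
+         ## The complete feature has to lie inside the range.
+         seqStart >= start & seqEnd <= end
+     } else {
+         ## Feature and range share at least one position.
+         seqStart <= end & seqEnd >= start
+     }
+ }
+
+ ## Three toy features tested against the range 100-200: the first lies
+ ## inside the range, the second overlaps its start, the third lies outside.
+ featStart <- c(120, 90, 250)
+ featEnd <- c(180, 150, 300)
+ featureMatches(featStart, featEnd, 100, 200, "within")
+ featureMatches(featStart, featEnd, 100, 200, "overlapping")
+#+END_SRC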
+- State "DONE" from "TODO" [2016-01-18 Mon 08:17]
+** DONE Extend the =getGenomeFaFile= method
+ CLOSED: [2016-01-18 Mon 08:17]
+
+Search for the genome release matching the current Ensembl release, if not
+present, search for a (Ensembl) =FaFile= matching the genome version and, if
+more available, select the one with the closest release date or version.
+
+** TODO Implement a =getGenomeTwoBitFile=.
+
+The advantage over =getGenomeFaFile=? Eventually more =TwoBit= files might
+become available in future.
+Problem now is that the =seqinfo= for these guys seems a little problematic.
+
+** TODO Implement some more =GenomicFeatures= methods [4/6]
+
++ [X] =transcriptLengths=: use the =lengthOf= method.
++ [X] =transcriptsByOverlaps=: use the same code as in =GenomicFeatures=, but
+ allow faster queries by first running the query to fetch only the specified
+ chromosomes.
++ [X] =exonsByOverlaps=.
++ [X] Compare the two above methods with the /standard/ query and multi-region
+ =GRangesFilter=.
+(+ [ ] =cds=.) CANCELED. A cds without a transcript makes no sense...
++ [ ] =distance=, =nearest=.
++ [ ] =intronsByTranscript=.
+
+** TODO Interface to the =OrganismDbi= database [/].
+
+Basically, implementing the =AnnotationDbi= methods =columns=, =select=, =keys=
+and =keytypes= methods should already be enough, but in addition I could
+implement the two additional methods below... eventually.
+
++ [ ] Implement =selectByRanges(x, ranges, columns, overlaps, ignore.strand)=:
+ supports multiple ranges. This returns a =GRanges= with one or more element(s)
+ per input range or nothing, if nothing overlapped that region. =overlaps= can
+ be =gene, tx, exons, cds, 5utr, introns or 3utr=.
+
++ [ ] Implement =selectRangesById=.
+
+** DONE Interface the =AnnotationDbi= database [6/6]
+ CLOSED: [2015-12-23 Wed 22:29]
+ - State "DONE" from "TODO" [2015-12-23 Wed 22:29]
+Implement the following methods:
++ [X] =columns=.
++ [X] =keytypes=.
++ [X] =keys=.
++ [X] =select=: I want to add a little more flexibility here: allow to specify,
+ in addition to the standard usage of keys, keytypes etc, filter object(s) to
+ perform some more fine-grained queries.
++ [X] =mapIds=.
+
++ [X] Add a section to the vignette.
+
+** DONE Enhance the shiny app to return the search result.
+ CLOSED: [2015-12-21 Mon 14:52]
+ - State "DONE" from "TODO" [2015-12-21 Mon 14:52]
+
+ - State "DONE" from "TODO" [2016-01-18 Mon 09:01]
+** DONE Implement the =ensDbFromGff= function
+ CLOSED: [2016-01-18 Mon 09:01]
+
+We could also import stuff from GFF, not only GTF.
+
+
+** DONE Fix a bug resulting in wrong CDS definitions from GTF files.
+ CLOSED: [2016-01-19 Tue 13:41]
+ - State "DONE" from "TODO" [2016-01-19 Tue 13:41]
+I've to evaluate which is the correct way, the GFF info or the GTF, in which
+start or stop codon can be outside of the coding region (which seems odd).
+Check that with the Ensembl web page and eventually contact support!
+** DONE Include functionality from the =GenomeInfoDb= to fix chromosome naming.
+ CLOSED: [2016-02-02 Tue 07:21]
+
+ - State "DONE" from "TODO" [2016-02-02 Tue 07:21]
++ [X] Implement a =seqlevelsStyle<-= method for =EnsDb=. Should do something
+ similar to the stuff for =Gviz=. If =seqlevelsStyle= is /Ensembl/ keep all as
+ it is.
+ Impact of that setter:
+ - Queries support seqnames other than the ones from Ensembl.
+ - Results have seqlevels set accordingly.
+ - Check that the species is supported by =GenomeInfoDb=! Otherwise, return an error!
++ [X] Implement a =seqlevelsStyle= method for =EnsDb=.
++ [X] Implement central =formatSeqnamesForQuery= =formatSeqnamesFromQuery= methods (basically
+ replacement for =ucscToEns= and =prefixChromName=).
++ [X] =EnsDb= needs a new slot to store any data (type list).
+Specifically, use =mapSeqlevels=
+
++ *Note*: the global option =ensembldb.seqnameNotFound= allows to specify how
+ the package handles missing mappings. Allowed are: =NA=, any value and special
+ cases ="MISSING"= (causes an error) and ="ORIGINAL"= (returns the original
+ names).
+
++ Methods/functions that should be affected:
+ - [X] =getWhat=: always calling =formatSeqnamesFromQuery=.
+ - [X] =seqinfo=: always calling =formatSeqnamesFromQuery=.
+ - [X] =seqlevels=: always calling =formatSeqnamesFromQuery=.
+ - [X] =exons=: uses =getWhat= and =seqinfo= (restricting to used seqnames).
+ - [X] =exonsBy= uses =getWhat= and =seqinfo= (restricting to used seqnames).
+ - [X] =genes= uses =getWhat= and =seqinfo= (restricting to used seqnames).
+ - [X] =transcripts= uses =getWhat= and =seqinfo= (restricting to used seqnames).
+ - [X] =transcriptsBy= uses =getWhat= and =seqinfo= (restricting to used seqnames).
+ - [X] =SeqnameFilter=: always calling =formatSeqnamesForQuery=, does *not*
+ allow =NA= values, thus doesn't work if the seqname can not be changed to
+ Ensembl style.
+ - [X] =GRangesFilter=: always calls =formatSeqnamesForQuery=.
+ - [X] =threeUTRsByTranscript=
+ - [X] =fiveUTRsByTranscript=
+ - [X] =cdsBy= uses =getWhat= and =seqinfo= (restricting to used seqnames).
+ - [X] =promoters=: uses =transcripts=.
+
++ [X] Finally, verification: I could use the BSgenome package to retrieve
+ sequence info from UCSC and cross check that sequence info with the two fasta
+ files that are included in ensembldb.
+
++ [X] Add examples to the Vignette.
+
++ [X] Add help.
+
+** DONE Allow more generic GTF file names in =ensDbFromGtf=
+ CLOSED: [2016-01-21 Thu 17:15]
+ - State "DONE" from "TODO" [2016-01-21 Thu 17:15]
+Somehow I have to fix that it does not work with =chr.gtf.gz=.
+
+** DONE For all queries, restrict the seqinfo to the chromosome names in the =GRanges=.
+ CLOSED: [2016-02-01 Mon 08:53]
+ - State "DONE" from "TODO" [2016-02-01 Mon 08:53]
+** DONE =GRangesFilter= for multiple regions in =GRanges=
+ CLOSED: [2016-02-04 Thu 08:02]
+
+ - State "DONE" from "TODO" [2016-02-04 Thu 08:02]
+Support multiple regions for a =GRangesFilter=.
+
+** TODO Implement a method to convert variant information within =tx= to genomic coordinates
+
+#+BEGIN_SRC R :eval never
+ ## Get the genomic sequence
+ fa <- getGenomeFaFile(edb)
+
+ ## Convert variant coordinates to genomic coordinates
+ tx <- "ENST00000070846"
+ ## Get the cds
+ txCds <- cdsBy(edb, by="tx", filter=TxidFilter(tx))
+
+ ## ENST00000070846:c.1643delG
+ varPos <- 1643
+ exWidths <- width(txCds[[tx]])
+ ## Define the exon ends in the tx.
+ exEnds <- cumsum(exWidths)
+ ## Get the first exon whose end is at or after the variant position.
+ exDiffs <- varPos - exEnds
+ exVar <- min(which(exDiffs <= 0))
+ ## Now we would like to know the position within that exon:
+ posInExon <- exWidths[exVar] + exDiffs[exVar]
+ ## Next the genomic coordinate:
+ ## Note: here we have to consider the strand!
+ ## fw: exon_start + (pos in exon -1)
+ ## rv: exon_end - (pos in exon -1)
+ if(as.character(strand(txCds[[tx]][1])) == "-"){
+ chromPos <- end(txCds[[tx]][exVar]) - (posInExon - 1)
+ }else{
+ chromPos <- start(txCds[[tx]][exVar]) + (posInExon -1)
+ }
+
+ ## Validation.
+ ## OK, now we get the sequence for that exon.
+ ## Check if the estimated position is a G.
+ exSeq <- getSeq(fa, txCds[[tx]][exVar])
+ substring(exSeq, first=posInExon-2, last=posInExon+2)
+ ## Hm, hard to tell... it's two Gs there!
+ substring(exSeq, first=posInExon, last=posInExon) == "G"
+ ## Get the full CDS
+ cdsSeq <- unlist(getSeq(fa, txCds[[tx]]))
+ substring(cdsSeq, first=varPos - 2, last=varPos + 2)
+ ## The same.
+ getSeq(fa, GRanges(seqnames=seqlevels(txCds[[tx]]),
+ IRanges(chromPos, chromPos), strand="-")) == "G"
+
+
+ ## Next one is c.1881DelC:
+ varPos <- 1881
+ exDiffs <- varPos - exEnds
+ exVar <- min(which(exDiffs <= 0))
+ posInExon <- exWidths[exVar] + exDiffs[exVar]
+ exSeq <- getSeq(fa, txCds[[tx]][exVar])
+ substring(exSeq, first=posInExon - 2, last=posInExon + 2)
+ ## Hm, again, we're right, but there are 2 other Cs there!
+
+#+END_SRC
+
+** DONE Implement a =SymbolFilter= and support a =symbol= column
+ CLOSED: [2016-09-16 Fri 15:27]
+ - State "DONE" from "TODO" [2016-09-16 Fri 15:27]
+
+Done in issues #4 and #5.
+** TODO What about using pipe and /formula-like/ filters?
+
+** DONE Fix the =select= method so that it always returns the values in the same order as the keys
+ CLOSED: [2016-09-16 Fri 15:26]
+ - State "DONE" from "TODO" [2016-09-16 Fri 15:26]
+This should be done if only a single filter is provided (for multiple filters
+it will not work); a simple =match= could do it.
+
+This has been done in issue #1 on github.
+
+** DONE *Always* return the attribute of the filter!
+ CLOSED: [2016-09-16 Fri 15:26]
+ - State "DONE" from "TODO" [2016-09-16 Fri 15:26]
+I have to check that; possibly do that based on a user option, or even better
+on an internal property that can be set by =returnFilterCols(edb) <- TRUE/FALSE=.
+
+Done in issue #6.
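
The transcript-to-genome conversion from the TODO entry above comes down to cumulative exon widths plus a strand-dependent offset. The following is a minimal base-R sketch of that arithmetic, assuming the CDS exons are supplied in 5' to 3' transcript order. The function name `cdsToGenome` and the toy coordinates are hypothetical and not part of ensembldb.

```r
## Hypothetical helper (not part of ensembldb): map a 1-based position within
## the CDS of a transcript to a genomic coordinate.
## exStart, exEnd: genomic start/end of the CDS exons, ordered 5' to 3' along
##                 the transcript (i.e. decreasing coordinates on the "-" strand).
## strand: "+" or "-".
cdsToGenome <- function(pos, exStart, exEnd, strand = "+") {
    exWidths <- exEnd - exStart + 1
    ## Exon ends expressed in CDS coordinates.
    exEnds <- cumsum(exWidths)
    if (pos < 1 || pos > exEnds[length(exEnds)])
        stop("'pos' is outside of the CDS")
    ## First exon whose cumulative end reaches 'pos'.
    exVar <- which(exEnds >= pos)[1]
    ## 1-based position within that exon.
    posInExon <- pos - (exEnds[exVar] - exWidths[exVar])
    if (strand == "-") {
        exEnd[exVar] - (posInExon - 1)
    } else {
        exStart[exVar] + (posInExon - 1)
    }
}

## Toy example with made-up coordinates: two exons of width 10 each.
fwLast  <- cdsToGenome(10, exStart = c(101, 201), exEnd = c(110, 210))
fwFirst <- cdsToGenome(11, exStart = c(101, 201), exEnd = c(110, 210))
## Minus strand: the first exon in transcript order has the higher coordinates.
rvFirst <- cdsToGenome(11, exStart = c(201, 101), exEnd = c(210, 110),
                       strand = "-")
```

Using `exEnds >= pos` rather than a strict comparison keeps a variant on the last base of an exon inside that exon instead of assigning it position 0 in the next one.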
diff --git a/vignettes/images/dblayout.png b/vignettes/images/dblayout.png
new file mode 100644
index 0000000..a88d1a6
Binary files /dev/null and b/vignettes/images/dblayout.png differ
diff --git a/vignettes/issues.org b/vignettes/issues.org
new file mode 100644
index 0000000..bfa496e
--- /dev/null
+++ b/vignettes/issues.org
@@ -0,0 +1,183 @@
+#+TODO: OPEN | CLOSED
+#+TITLE: ensembldb issues
+#+STARTUP: overview
+
+* Introduction
+
+These issues are synced with the issues in github.
+
+* How to sync them with github :noexport:
+
+Call =M-x org-sync=.
+
+* Issues of ensembldb
+:PROPERTIES:
+:LOGGING: nil
+:since:
+:url: https://api.github.com/repos/jotsetung/ensembldb
+:END:
+** OPEN Long build times of ensembldb with newer RSQLite packages
+:PROPERTIES:
+:id: 11
+:date-modification: 2016-09-12T09:19:20+0200
+:date-creation: 2016-09-12T09:19:20+0200
+:author: "jotsetung"
+:END:
+: Build times differ considerably between `RSQLite` version 1.0.0 and the release candidate.
+: + [ ] Check unit tests.
+: + [ ] Check examples.
+: + [ ] Check vignette.
+** OPEN Implement a `getGenomeTwoBitFile`
+:PROPERTIES:
+:tags: ("enhancement")
+:sync: conflict-local
+:id: 2
+:date-modification: 2016-06-29T11:06:53+0200
+:date-creation: 2016-06-29T10:42:07+0200
+:author: "jotsetung"
+:assignee: "jotsetung"
+:END:
+: Get a `TwoBit` matching the genome release for the `EnsDb` object.
+** OPEN Convert within tx variant information to genomic coordinates
+:PROPERTIES:
+:tags: ("enhancement")
+:sync: conflict-local
+:id: 3
+:date-modification: 2016-06-29T10:50:04+0200
+:date-creation: 2016-06-29T10:50:04+0200
+:author: "jotsetung"
+:assignee: "jotsetung"
+:END:
+: Functionality to map variant information within tx to genomic coordinates and vice versa. Example code below:
+:
+: ```{r}
+: fa <- getGenomeFaFile(edb)
+:
+: ## Convert variant coordinates to genomic coordinates
+: tx <- "ENST00000070846"
+: ## Get the cds
+: txCds <- cdsBy(edb, by="tx", filter=TxidFilter(tx))
+:
+: ## ENST00000070846:c.1643delG
+: varPos <- 1643
+: exWidths <- width(txCds[[tx]])
+: ## Define the exon ends in the tx.
+: exEnds <- cumsum(exWidths)
+: ## Get the first exon whose end is at or after the variant position.
+: exDiffs <- varPos - exEnds
+: exVar <- min(which(exDiffs <= 0))
+: ## Now we would like to know the position within that exon:
+: posInExon <- exWidths[exVar] + exDiffs[exVar]
+: ## Next the genomic coordinate:
+: ## Note: here we have to consider the strand!
+: ## fw: exon_start + (pos in exon -1)
+: ## rv: exon_end - (pos in exon -1)
+: if(as.character(strand(txCds[[tx]][1])) == "-"){
+: chromPos <- end(txCds[[tx]][exVar]) - (posInExon - 1)
+: }else{
+: chromPos <- start(txCds[[tx]][exVar]) + (posInExon -1)
+: }
+:
+: ## Validation.
+: ## OK, now we get the sequence for that exon.
+: ## Check if the estimated position is a G.
+: exSeq <- getSeq(fa, txCds[[tx]][exVar])
+: substring(exSeq, first=posInExon-2, last=posInExon+2)
+: ## Hm, hard to tell... it's two Gs there!
+: substring(exSeq, first=posInExon, last=posInExon) == "G"
+: ## Get the full CDS
+: cdsSeq <- unlist(getSeq(fa, txCds[[tx]]))
+: substring(cdsSeq, first=varPos - 2, last=varPos + 2)
+: ## The same.
+: getSeq(fa, GRanges(seqnames=seqlevels(txCds[[tx]]),
+: IRanges(chromPos, chromPos), strand="-")) == "G"
+:
+:
+: ## Next one is c.1881DelC:
+: varPos <- 1881
+: exDiffs <- varPos - exEnds
+: exVar <- min(which(exDiffs <= 0))
+: posInExon <- exWidths[exVar] + exDiffs[exVar]
+: exSeq <- getSeq(fa, txCds[[tx]][exVar])
+: substring(exSeq, first=posInExon - 2, last=posInExon + 2)
+: ## Hm, again, we're right, but there are 2 other Cs there!
+: ```
+** CLOSED Bug in test
+:PROPERTIES:
+:sync: conflict-local
+:id: 10
+:date-modification: 2016-06-30T16:10:00+0200
+:date-creation: 2016-06-30T16:10:00+0200
+:author: "jotsetung"
+:END:
+: In test_properties function.
+** CLOSED Support for columns TXNAME and SYMBOL in select?
+:PROPERTIES:
+:sync: conflict-local
+:id: 9
+:date-modification: 2016-06-30T10:51:50+0200
+:date-creation: 2016-06-30T10:51:50+0200
+:author: "jotsetung"
+:END:
+: Are TXNAME and SYMBOL supported for select?
+: Are they supported for genes etc?
+** CLOSED Ensure `setFeatureInGRangesFilter` is always called before `addFilterColumns`
+:PROPERTIES:
+:sync: conflict-local
+:id: 7
+:date-modification: 2016-06-29T16:20:34+0200
+:date-creation: 2016-06-29T15:24:02+0200
+:author: "jotsetung"
+:END:
+** CLOSED Ensure `setFeatureInGRangesFilter` is always called before `addFilterColumns`
+:PROPERTIES:
+:sync: conflict-local
+:id: 8
+:date-modification: 2016-06-29T15:59:07+0200
+:date-creation: 2016-06-29T15:59:07+0200
+:author: "jotsetung"
+:END:
+** CLOSED Parameter to specify whether filter columns should be returned
+:PROPERTIES:
+:sync: conflict-local
+:id: 6
+:date-modification: 2016-06-29T10:53:28+0200
+:date-creation: 2016-06-29T10:53:28+0200
+:author: "jotsetung"
+:assignee: "jotsetung"
+:END:
+: As of now only columns specified with the `columns` argument are returned by the methods. It might however be useful to return the columns queried by the provided filters too.
+: Add a `returnFilterColumns` setting to control whether filter columns should be returned too.
+** CLOSED Add support for `SYMBOL`
+:PROPERTIES:
+:sync: conflict-local
+:id: 5
+:date-modification: 2016-06-29T10:51:35+0200
+:date-creation: 2016-06-29T10:51:35+0200
+:author: "jotsetung"
+:assignee: "jotsetung"
+:END:
+: Allow `SYMBOL` to be queried by the `select` method.
+** CLOSED Implement a `SymbolFilter`
+:PROPERTIES:
+:tags: ("enhancement")
+:sync: conflict-local
+:id: 4
+:date-modification: 2016-06-29T10:51:01+0200
+:date-creation: 2016-06-29T10:51:01+0200
+:author: "jotsetung"
+:assignee: "jotsetung"
+:END:
+: Based on Vince's suggestion; this should symlink to `GenenameFilter`.
+** CLOSED Ensure result ordering for `select`
+:PROPERTIES:
+:tags: ("bug")
+:sync: conflict-local
+:id: 1
+:date-modification: 2016-06-29T10:40:06+0200
+:date-creation: 2016-06-29T10:39:37+0200
+:author: "jotsetung"
+:assignee: "jotsetung"
+:END:
+: If a single filter or if `keys` are provided, the ordering of the result has to match the ordering of the input.
+: For multiple filters this would not work.
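
For the single-filter or `keys` case described in the issue above, the reordering could be sketched in base R with `match`. This is only an illustration under stated assumptions: the data frame `res` is a hypothetical stand-in for a query result, and the identifiers are made up rather than taken from ensembldb.

```r
## Hypothetical stand-in for a query result; a database typically returns rows
## in its own order, not in the order of the supplied keys.
res <- data.frame(gene_id = c("ENSG03", "ENSG01", "ENSG02"),
                  symbol = c("C", "A", "B"),
                  stringsAsFactors = FALSE)
keys <- c("ENSG02", "ENSG04", "ENSG01")

## match() gives, for each key, the row of its first hit in 'res'
## (NA for keys without a hit; those are dropped here).
idx <- match(keys, res$gene_id)
ordered <- res[idx[!is.na(idx)], , drop = FALSE]
rownames(ordered) <- NULL
ordered
```

Because `match` reports only the first hit per key, this sketch keeps a single row per key; a result with several rows per key would need a different approach, for example splitting by key before reordering.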
--
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-ensembldb.git