[med-svn] [r-bioc-genomicfeatures] 05/09: New upstream version 1.30.0+dfsg

Andreas Tille tille at debian.org
Thu Nov 9 08:51:21 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-bioc-genomicfeatures.

commit 5f7c2202f3c39eeed8a39f23cf01c744243817a0
Author: Andreas Tille <tille at debian.org>
Date:   Thu Nov 9 09:33:16 2017 +0100

    New upstream version 1.30.0+dfsg
---
 DESCRIPTION                                        |  32 ++-
 NAMESPACE                                          |   9 +-
 NEWS                                               |  44 +++
 R/Ensembl.utils.R                                  |  70 +++--
 R/TxDb-SELECT-helpers.R                            |  15 +-
 R/TxDb-class.R                                     |  58 ++--
 R/TxDb-schema.R                                    |  20 +-
 R/coordinate-mapping-methods.R                     |   6 +-
 R/exonicParts.R                                    |  18 +-
 R/extractTranscriptSeqs.R                          |  62 ++---
 R/makeTxDb.R                                       | 180 ++++++------
 R/makeTxDbFromEnsembl.R                            | 301 +++++++++++++++++++++
 R/makeTxDbFromGRanges.R                            |  70 ++++-
 R/transcriptsByOverlaps.R                          |  12 +-
 build/vignette.rds                                 | Bin 292 -> 293 bytes
 inst/doc/GenomicFeatures.R                         |  67 +++--
 inst/doc/GenomicFeatures.Rnw                       |  57 ++--
 inst/doc/GenomicFeatures.pdf                       | Bin 166472 -> 214029 bytes
 inst/extdata/GFF3_files/a.sqlite                   | Bin 247808 -> 286720 bytes
 .../GFF3_files/dmel-1000-r5.11.filtered.sqlite     | Bin 45056 -> 69632 bytes
 .../extdata/GTF_files/Aedes_aegypti.partial.sqlite | Bin 66560 -> 98304 bytes
 inst/script/makeTxDbs.R                            |  92 ++++++-
 man/DEFAULT_CIRC_SEQS.Rd                           |   3 +-
 man/TxDb-class.Rd                                  |  23 +-
 man/coordinate-mapping-methods.Rd                  |   3 +-
 man/coverageByTranscript.Rd                        |  19 +-
 man/disjointExons.Rd                               |  11 +-
 man/exonicParts.Rd                                 | 163 +++++++++++
 man/extractTranscriptSeqs.Rd                       |   8 +-
 man/extractUpstreamSeqs.Rd                         |  19 +-
 man/makeTxDb.Rd                                    |  40 +--
 man/makeTxDbFromBiomart.Rd                         |  34 +--
 man/makeTxDbFromEnsembl.Rd                         |  79 ++++++
 man/makeTxDbFromGFF.Rd                             |   6 +-
 man/makeTxDbFromGRanges.Rd                         |  14 +-
 man/makeTxDbFromUCSC.Rd                            |  11 +-
 man/transcriptLengths.Rd                           |  21 +-
 man/transcripts.Rd                                 |  33 +--
 man/transcriptsBy.Rd                               |  10 +-
 man/transcriptsByOverlaps.Rd                       |  59 ++--
 vignettes/GenomicFeatures.Rnw                      |  57 ++--
 41 files changed, 1279 insertions(+), 447 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 4c6e8e5..d9ed325 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: GenomicFeatures
 Title: Tools for making and manipulating transcript centric annotations
-Version: 1.28.5
+Version: 1.30.0
 Encoding: UTF-8
 Author: M. Carlson, H. Pagès, P. Aboyoun, S. Falcon, M. Morgan,
 	D. Sarkar, M. Lawrence
@@ -17,11 +17,11 @@ Description: A set of tools and methods for making and manipulating
 	format.
 Maintainer: Bioconductor Package Maintainer <maintainer at bioconductor.org>
 Depends: BiocGenerics (>= 0.1.0), S4Vectors (>= 0.9.47), IRanges (>=
-        2.9.19), GenomeInfoDb (>= 1.11.4), GenomicRanges (>= 1.27.6),
+        2.11.16), GenomeInfoDb (>= 1.13.1), GenomicRanges (>= 1.29.14),
         AnnotationDbi (>= 1.33.15)
-Imports: methods, utils, stats, tools, DBI, RSQLite (>= 2.0), RCurl,
-        XVector, Biostrings (>= 2.23.3), rtracklayer (>= 1.29.24),
-        biomaRt (>= 2.17.1), Biobase (>= 2.15.1)
+Imports: methods, utils, stats, tools, DBI, RSQLite (>= 2.0), RMySQL,
+        RCurl, XVector, Biostrings (>= 2.23.3), rtracklayer (>=
+        1.29.24), biomaRt (>= 2.17.1), Biobase (>= 2.15.1)
 Suggests: org.Mm.eg.db, org.Hs.eg.db, BSgenome,
         BSgenome.Hsapiens.UCSC.hg19 (>= 1.3.17),
         BSgenome.Celegans.UCSC.ce2, BSgenome.Dmelanogaster.UCSC.dm3 (>=
@@ -32,19 +32,21 @@ Suggests: org.Mm.eg.db, org.Hs.eg.db, BSgenome,
         TxDb.Hsapiens.UCSC.hg19.lincRNAsTranscripts,
         TxDb.Hsapiens.UCSC.hg38.knownGene,
         SNPlocs.Hsapiens.dbSNP141.GRCh38, Rsamtools, pasillaBamSubset
-        (>= 0.0.5), GenomicAlignments, RUnit, BiocStyle, knitr
+        (>= 0.0.5), GenomicAlignments, ensembldb, RUnit, BiocStyle,
+        knitr
 Collate: utils.R TxDb-schema.R TxDb-SELECT-helpers.R Ensembl.utils.R
         findCompatibleMarts.R TxDb-class.R FeatureDb-class.R makeTxDb.R
-        makeTxDbFromUCSC.R makeTxDbFromBiomart.R makeTxDbFromGRanges.R
-        makeTxDbFromGFF.R makeFeatureDbFromUCSC.R mapIdsToRanges.R
-        id2name.R transcripts.R transcriptsBy.R transcriptsByOverlaps.R
-        transcriptLengths.R exonicParts.R disjointExons.R features.R
-        microRNAs.R extractTranscriptSeqs.R extractUpstreamSeqs.R
-        getPromoterSeq-methods.R makeTxDbPackage.R select-methods.R
-        nearest-methods.R transcriptLocs2refLocs.R
-        coordinate-mapping-methods.R coverageByTranscript.R zzz.R
+        makeTxDbFromUCSC.R makeTxDbFromBiomart.R makeTxDbFromEnsembl.R
+        makeTxDbFromGRanges.R makeTxDbFromGFF.R makeFeatureDbFromUCSC.R
+        mapIdsToRanges.R id2name.R transcripts.R transcriptsBy.R
+        transcriptsByOverlaps.R transcriptLengths.R exonicParts.R
+        disjointExons.R features.R microRNAs.R extractTranscriptSeqs.R
+        extractUpstreamSeqs.R getPromoterSeq-methods.R
+        makeTxDbPackage.R select-methods.R nearest-methods.R
+        transcriptLocs2refLocs.R coordinate-mapping-methods.R
+        coverageByTranscript.R zzz.R
 VignetteBuilder: knitr
 biocViews: Genetics, Infrastructure, Annotation, Sequencing,
         GenomeAnnotation
 NeedsCompilation: no
-Packaged: 2017-09-19 22:44:01 UTC; biocbuild
+Packaged: 2017-10-30 22:56:58 UTC; biocbuild
diff --git a/NAMESPACE b/NAMESPACE
index de3e20e..6052b36 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -1,15 +1,17 @@
 import(methods)
 importFrom(stats, setNames)
-importFrom(utils, download.file, installed.packages, read.table, browseURL,
+importFrom(utils, download.file, packageDescription, read.table, browseURL,
                   as.person, capture.output, str)
 importFrom(tools, file_ext, file_path_sans_ext)
 
 importFrom(RCurl, getURL)
-importMethodsFrom(DBI, dbConnect, dbExecute, dbGetQuery,
+importMethodsFrom(DBI, dbConnect, dbDisconnect,
+                       dbExecute, dbGetQuery,
                        dbReadTable, dbWriteTable, dbListTables,
                        dbListFields)
 
 importFrom(RSQLite, SQLite, SQLITE_RO)
+importFrom(RMySQL, MySQL)
 
 import(AnnotationDbi)
 import(BiocGenerics)
@@ -51,6 +53,9 @@ export(
   getChromInfoFromBiomart,
   makeTxDbFromBiomart,
 
+  ## makeTxDbFromEnsembl.R:
+  makeTxDbFromEnsembl,
+
   ## makeTxDbFromGRanges.R:
   makeTxDbFromGRanges,
 
diff --git a/NEWS b/NEWS
index 6a133f1..44d1422 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,47 @@
+CHANGES IN VERSION 1.30
+-----------------------
+
+NEW FEATURES
+
+    o Add makeTxDbFromEnsembl() for creating a TxDb object by directly
+      querying an Ensembl MySQL server. This seems to be faster and more
+      reliable than makeTxDbFromBiomart().
+
+    o Improve makeTxDbFromBiomart() support for EnsemblGenomes marts
+      fungal_mart, metazoa_mart, plants_mart, and protist_mart.
+
+    o makeTxDbFromGFF() and makeTxDbFromGRanges() now import the CDS phase.
+      This required a change in the schema of the underlying SQLite db of
+      TxDb objects. This is still a work-in-progress e.g. cdsBy(txdb, by="tx")
+      still needs to be modified to return the phase info.
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+    o The *ByOverlaps() functions now use the same 'maxgap' and 'minoverlap'
+      defaults as subsetByOverlaps().
+
+DEPRECATED AND DEFUNCT
+
+    o Remove 'force' argument from seqinfo() and seqlevels() setters (the
+      argument got deprecated in BioC 3.5 in favor of new and more flexible
+      'pruning.mode' argument).
+
+BUG FIXES
+
+    o exonicParts() and intronicParts() are now documented.
+
+    o Address a couple of issues pointed out by Matt Chambers in
+      internal helpers get_organism_from_Ensembl_Mart_dataset() and
+      .extractEnsemblReleaseFromDbVersion() used by makeTxDbFromBiomart().
+
+    o Fix internal utility .Ensembl_getMySQLCoreDir(). Was failing for some
+      of the 69 datasets from the Ensembl mart, causing makeTxDbFromBiomart()
+      to fail looking up the organism scientific name and the chromosome
+      lengths. Thanks to Matt Chambers for reporting this.
+
+    o Some tweaks and fixes needed to support RSQLite 2.0.
+
+
 CHANGES IN VERSION 1.28
 -----------------------
 
diff --git a/R/Ensembl.utils.R b/R/Ensembl.utils.R
index 943f1c2..959b94e 100644
--- a/R/Ensembl.utils.R
+++ b/R/Ensembl.utils.R
@@ -16,7 +16,7 @@
 ###
 ### Ensembl Core Schema Documentation:
 ###   http://www.ensembl.org/info/docs/api/core/core_schema.html
-### The full schema:
+### Full schema:
 ###   ftp://ftp.ensembl.org/pub/ensembl/sql/table.sql
 ###
 
@@ -89,8 +89,8 @@ ftp_url_to_Ensembl_gtf <- function(release=NA)
 
 ### 'kingdom' must be NA or one of the EnsemblGenomes marts i.e. "bacteria",
 ### "fungi", "metazoa", "plants", or "protists".
-.Ensembl_listMySQLCoreDirs <- function(release=NA,
-                                       use.grch37=FALSE, kingdom=NA, url=NA)
+Ensembl_listMySQLCoreDirs <- function(release=NA,
+                                      use.grch37=FALSE, kingdom=NA, url=NA)
 {
     if (is.na(url))
         url <- ftp_url_to_Ensembl_mysql(release, use.grch37, kingdom)
@@ -106,10 +106,10 @@ ftp_url_to_Ensembl_gtf <- function(release=NA)
 {
     if (is.na(url))
         url <- ftp_url_to_Ensembl_mysql(release, use.grch37, kingdom)
-    core_dirs <- .Ensembl_listMySQLCoreDirs(release=release,
-                                            use.grch37=use.grch37,
-                                            kingdom=kingdom,
-                                            url=url)
+    core_dirs <- Ensembl_listMySQLCoreDirs(release=release,
+                                           use.grch37=use.grch37,
+                                           kingdom=kingdom,
+                                           url=url)
     trimmed_core_dirs <- sub("_core_.*$", "", core_dirs)
     shortnames <- sub("^(.)[^_]*_", "\\1", trimmed_core_dirs)
     if (dataset == "mfuro_gene_ensembl") {
@@ -205,35 +205,37 @@ ftp_url_to_Ensembl_gtf <- function(release=NA)
     seq_region_attrib$seq_region_id[seq_region_attrib$attrib_type_id == id0]
 }
 
-### Fetch sequence names and lengths from the 'seq_region' table.
-### Typical use:
-###   core_url <- .Ensembl_getMySQLCoreUrl("hsapiens_gene_ensembl")
-###   extra_seqnames <- c("GL000217.1", "NC_012920", "HG79_PATCH")
-###   .Ensembl_fetchChromLengthsFromCoreUrl(core_url,
-###                                         extra_seqnames=extra_seqnames)
-.Ensembl_fetchChromLengthsFromCoreUrl <- function(core_url, extra_seqnames=NULL)
+extract_chromlengths_from_seq_region <- function(seq_region,
+                                                 top_level_ids,
+                                                 extra_seqnames=NULL,
+                                                 seq_region_ids=NULL)
 {
-    seq_region <- .Ensembl_getTable_seq_region(core_url, with.coord_system=TRUE)
-
     ## 1st filtering: Keep only "default_version" sequences.
-    i1 <- grep("default_version", seq_region$coord_system_attrib, fixed=TRUE)
+    keep_me <- grepl("default_version", seq_region$coord_system_attrib,
+                    fixed=TRUE)
+    if (!is.null(seq_region_ids))
+        keep_me <- keep_me | (seq_region$seq_region_id %in% seq_region_ids)
+    i1 <- which(keep_me)
     j1 <- c("seq_region_id", "name", "length",
             "coord_system_name", "coord_system_rank")
     ans <- seq_region[i1, j1, drop=FALSE]
 
     ## 2nd filtering: Keep only "toplevel" sequences that are not LRGs +
     ## extra sequences.
-    ids <- .Ensembl_fetchTopLevelSequenceIds(core_url)
-    i2 <- ans$seq_region_id %in% ids & ans$coord_system_name != "lrg"
+    keep_me <- ans$seq_region_id %in% top_level_ids &
+               ans$coord_system_name != "lrg"
     if (!is.null(extra_seqnames)) {
         extra_seqnames <- unique(extra_seqnames)
         if (!all(extra_seqnames %in% ans$name))
             stop("failed to fetch all chromosome lengths")
-        extra_seqnames <- setdiff(extra_seqnames, ans$name[i2])
+        extra_seqnames <- setdiff(extra_seqnames, ans$name[keep_me])
         ## Add extra sequences to the index.
-        i2 <- i2 | (ans$name %in% extra_seqnames)
+        keep_me <- keep_me | (ans$name %in% extra_seqnames)
     }
-    j2 <- c("name", "length", "coord_system_rank")
+    if (!is.null(seq_region_ids))
+        keep_me <- keep_me | (ans$seq_region_id %in% seq_region_ids)
+    i2 <- which(keep_me)
+    j2 <- c("seq_region_id", "name", "length", "coord_system_rank")
     ans <- ans[i2, j2, drop=FALSE]
 
     ## Ordering: First by rank, then by name.
@@ -246,8 +248,11 @@ ftp_url_to_Ensembl_gtf <- function(release=NA)
     ## by keeping rows with the lowest coord_system_rank. This is
     ## straightforward because the rows are already ordered from lowest to
     ## highest ranks.
-    i3 <- !duplicated(ans$name)
-    j3 <- c("name", "length")
+    keep_me <- !duplicated(ans$name)
+    if (!is.null(seq_region_ids))
+        keep_me <- keep_me | (ans$seq_region_id %in% seq_region_ids)
+    i3 <- which(keep_me)
+    j3 <- c("seq_region_id", "name", "length")
     ans <- ans[i3, j3, drop=FALSE]
 
     ## Final tidying.
@@ -255,6 +260,23 @@ ftp_url_to_Ensembl_gtf <- function(release=NA)
     ans
 }
 
+### Fetch sequence names and lengths from the 'seq_region' table.
+### Typical use:
+###   core_url <- .Ensembl_getMySQLCoreUrl("hsapiens_gene_ensembl")
+###   extra_seqnames <- c("GL000217.1", "NC_012920", "HG79_PATCH")
+###   .Ensembl_fetchChromLengthsFromCoreUrl(core_url,
+###                                         extra_seqnames=extra_seqnames)
+.Ensembl_fetchChromLengthsFromCoreUrl <- function(core_url, extra_seqnames=NULL)
+{
+    seq_region <- .Ensembl_getTable_seq_region(core_url, with.coord_system=TRUE)
+
+    top_level_ids <- .Ensembl_fetchTopLevelSequenceIds(core_url)
+    ans <- extract_chromlengths_from_seq_region(seq_region,
+                                                top_level_ids,
+                                                extra_seqnames=extra_seqnames)
+    ans[-1L]  # drop "seq_region_id" col
+}
+
 
 ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 ### get_organism_from_Ensembl_Mart_dataset()
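The refactored extract_chromlengths_from_seq_region() helper filters the seq_region table in three passes. It first keeps "default_version" rows, then keeps non-LRG toplevel rows, then keeps one row per name with the lowest coord_system_rank. The new seq_region_ids argument forces specific rows through every pass. The base-R sketch below mimics that logic on a made-up toy table. The data and the toy_chromlengths() name are hypothetical. The sketch also omits the extra_seqnames handling and the final tidying done by the real helper.

```r
## Hypothetical toy version of Ensembl's 'seq_region' table (made-up data).
seq_region <- data.frame(
    seq_region_id       = c(1L, 2L, 3L, 4L, 5L),
    name                = c("1", "1", "MT", "LRG_1", "scaf_9"),
    length              = c(248956422L, 248956422L, 16569L, 5000L, 1200L),
    coord_system_name   = c("chromosome", "contig", "chromosome", "lrg",
                            "scaffold"),
    coord_system_rank   = c(1L, 2L, 1L, 3L, 4L),
    coord_system_attrib = c("default_version", "default_version",
                            "default_version", "default_version", ""),
    stringsAsFactors = FALSE
)
top_level_ids <- c(1L, 2L, 3L, 4L)

## Hypothetical helper mirroring the 3 filtering passes in the diff.
toy_chromlengths <- function(seq_region, top_level_ids, seq_region_ids=NULL)
{
    ## TRUE for rows whose id must be kept no matter what.
    force_keep <- function(ids)
        if (is.null(seq_region_ids)) FALSE else ids %in% seq_region_ids
    ## Pass 1: keep "default_version" rows (plus forced ids).
    keep_me <- grepl("default_version", seq_region$coord_system_attrib,
                     fixed=TRUE) | force_keep(seq_region$seq_region_id)
    ans <- seq_region[which(keep_me), , drop=FALSE]
    ## Pass 2: keep toplevel rows that are not LRGs (plus forced ids).
    keep_me <- (ans$seq_region_id %in% top_level_ids &
                ans$coord_system_name != "lrg") |
               force_keep(ans$seq_region_id)
    ans <- ans[which(keep_me), , drop=FALSE]
    ## Order first by rank, then by name, so that for duplicated names
    ## the first row seen has the lowest coord_system_rank.
    ans <- ans[order(ans$coord_system_rank, ans$name), , drop=FALSE]
    ## Pass 3: drop duplicated names (forced ids are always kept).
    keep_me <- !duplicated(ans$name) | force_keep(ans$seq_region_id)
    ans <- ans[which(keep_me), c("seq_region_id", "name", "length")]
    rownames(ans) <- NULL
    ans
}

toy_chromlengths(seq_region, top_level_ids)
toy_chromlengths(seq_region, top_level_ids, seq_region_ids=5L)
```

With seq_region_ids=5L, the scaffold row survives even though it is neither "default_version" nor toplevel. This appears to be the purpose of the new argument.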
diff --git a/R/TxDb-SELECT-helpers.R b/R/TxDb-SELECT-helpers.R
index 9eb889e..fd31ae6 100644
--- a/R/TxDb-SELECT-helpers.R
+++ b/R/TxDb-SELECT-helpers.R
@@ -141,17 +141,6 @@
 
 
 ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-### TxDb_schema_version()
-###
-
-TxDb_schema_version <- function(txdb)
-{
-    version <- AnnotationDbi:::.getMetaValue(dbconn(txdb), "DBSCHEMAVERSION")
-    numeric_version(version)
-}
-
-
-### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 ### The 2 flexible helpers for SELECT'ing stuff from a TxDb object:
 ###   - TxDb_SELECT_from_INNER_JOIN()
 ###   - TxDb_SELECT_from_splicing_bundle()
@@ -245,10 +234,12 @@ TxDb_SELECT_from_splicings <- function(txdb, filter=list(),
                                        cds_join_type="LEFT")
 {
     schema_version <- TxDb_schema_version(txdb)
+    splicing_columns <- TXDB_table_columns("splicing",
+                                           schema_version=schema_version)
     exon_columns <- TXDB_table_columns("exon", schema_version=schema_version)
     cds_columns <- TXDB_table_columns("cds", schema_version=schema_version)
     cds_columns <- cds_columns[c("id", "name", "start", "end")]
-    columns <- unique(c("_tx_id", "exon_rank", exon_columns, cds_columns))
+    columns <- unique(c(splicing_columns, exon_columns, cds_columns))
     TxDb_SELECT_from_splicing_bundle(txdb, columns,
                                      filter=filter, orderby=orderby,
                                      cds_join_type=cds_join_type)
diff --git a/R/TxDb-class.R b/R/TxDb-class.R
index 3d0fae0..42ed9c3 100644
--- a/R/TxDb-class.R
+++ b/R/TxDb-class.R
@@ -4,34 +4,6 @@
 
 
 ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-### TxDb schema
-###
-
-### Not exported.
-DB_TYPE_NAME <- "Db type"
-DB_TYPE_VALUE <- "TxDb"  # same as the name of the class below
-DB_SCHEMA_VERSION <- "1.1"  # DON'T FORGET TO BUMP THIS WHEN YOU CHANGE THE
-                            # SCHEMA
-
-.schema_version <- function(conn)
-    numeric_version(AnnotationDbi:::.getMetaValue(conn, "DBSCHEMAVERSION"))
-
-makeFeatureColnames <- function(feature_shortname=c("tx", "exon", "cds"),
-                                no.tx_type=FALSE)
-{
-    feature_shortname <- match.arg(feature_shortname)
-    suffixes <- "name"
-    if (feature_shortname == "tx" && !no.tx_type)
-        suffixes <- c(suffixes, "type")
-    suffixes <- c(suffixes, "chrom", "strand", "start", "end")
-    ans <- c(paste0("_", feature_shortname, "_id"),
-             paste0(feature_shortname, "_", suffixes))
-    names(ans) <- c("id", suffixes)
-    ans
-}
-
-
-### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 ### TxDb class definition
 ###
 
@@ -140,6 +112,7 @@ load_transcripts <- function(txdb, drop.tx_name=FALSE,
 
 .format_splicings <- function(splicings, drop.exon_name=FALSE,
                                          drop.cds_name=FALSE,
+                                         drop.cds_phase=FALSE,
                                          set.col.class=FALSE)
 {
     COL2CLASS <- c(
@@ -154,7 +127,8 @@ load_transcripts <- function(txdb, drop.tx_name=FALSE,
         cds_id="integer",
         cds_name="character",
         cds_start="integer",
-        cds_end="integer"
+        cds_end="integer",
+        cds_phase="integer"
     )
     if (is.null(splicings)) {
         splicings <- makeZeroRowDataFrame(COL2CLASS)
@@ -171,6 +145,8 @@ load_transcripts <- function(txdb, drop.tx_name=FALSE,
             COL2CLASS <- COL2CLASS[names(COL2CLASS) != "exon_name"]
         if (!has_col(splicings, "cds_name"))
             COL2CLASS <- COL2CLASS[names(COL2CLASS) != "cds_name"]
+        if (!has_col(splicings, "cds_phase"))
+            COL2CLASS <- COL2CLASS[names(COL2CLASS) != "cds_phase"]
         if (!identical(names(splicings), names(COL2CLASS)))
             splicings <- splicings[names(COL2CLASS)]
         if (set.col.class)
@@ -182,17 +158,22 @@ load_transcripts <- function(txdb, drop.tx_name=FALSE,
     if (drop.cds_name && has_col(splicings, "cds_name") &&
                          all(is.na(splicings$cds_name)))
         splicings$cds_name <- NULL
+    if (drop.cds_phase && has_col(splicings, "cds_phase") &&
+                         all(is.na(splicings$cds_phase)))
+        splicings$cds_phase <- NULL
     splicings
 }
 
 load_splicings <- function(txdb, drop.exon_name=FALSE,
                                  drop.cds_name=FALSE,
+                                 drop.cds_phase=FALSE,
                                  set.col.class=FALSE)
 {
     splicings <- TxDb_SELECT_from_splicings(txdb)
     colnames(splicings) <- sub("^_", "", colnames(splicings))
     .format_splicings(splicings, drop.exon_name=drop.exon_name,
                                  drop.cds_name=drop.cds_name,
+                                 drop.cds_phase=drop.cds_phase,
                                  set.col.class=set.col.class)
 }
 
@@ -234,8 +215,8 @@ load_genes <- function(txdb, set.col.class=FALSE)
 ### TODO: Add more checks!
 .valid.transcript.table <- function(conn)
 {
-    no_tx_type <- .schema_version(conn) < "1.1"
-    colnames <- makeFeatureColnames("tx", no_tx_type)
+    schema_version <- TxDb_schema_version(conn)
+    colnames <- TXDB_table_columns("transcript", schema_version=schema_version)
     msg <- AnnotationDbi:::.valid.table.colnames(conn, "transcript", colnames)
     if (!is.null(msg))
         return(msg)
@@ -245,7 +226,8 @@ load_genes <- function(txdb, set.col.class=FALSE)
 ### TODO: Add more checks!
 .valid.exon.table <- function(conn)
 {
-    colnames <- makeFeatureColnames("exon")
+    schema_version <- TxDb_schema_version(conn)
+    colnames <- TXDB_table_columns("exon", schema_version=schema_version)
     msg <- AnnotationDbi:::.valid.table.colnames(conn, "exon", colnames)
     if (!is.null(msg))
         return(msg)
@@ -255,7 +237,8 @@ load_genes <- function(txdb, set.col.class=FALSE)
 ### TODO: Add more checks!
 .valid.cds.table <- function(conn)
 {
-    colnames <- makeFeatureColnames("cds")
+    schema_version <- TxDb_schema_version(conn)
+    colnames <- TXDB_table_columns("cds", schema_version=schema_version)
     msg <- AnnotationDbi:::.valid.table.colnames(conn, "cds", colnames)
     if (!is.null(msg))
         return(msg)
@@ -265,7 +248,8 @@ load_genes <- function(txdb, set.col.class=FALSE)
 ### TODO: Add more checks!
 .valid.splicing.table <- function(conn)
 {
-    colnames <- c("_tx_id", "exon_rank", "_exon_id", "_cds_id")
+    schema_version <- TxDb_schema_version(conn)
+    colnames <- TXDB_table_columns("splicing", schema_version=schema_version)
     msg <- AnnotationDbi:::.valid.table.colnames(conn, "splicing", colnames)
     if (!is.null(msg))
         return(msg)
@@ -296,8 +280,7 @@ load_genes <- function(txdb, set.col.class=FALSE)
 {
     conn <- dbconn(x)
 
-    c(AnnotationDbi:::.valid.metadata.table(conn, DB_TYPE_NAME,
-                                            DB_TYPE_VALUE),
+    c(AnnotationDbi:::.valid.metadata.table(conn, DB_TYPE_NAME, DB_TYPE_VALUE),
       .valid.transcript.table(conn),
       .valid.exon.table(conn),
       .valid.cds.table(conn),
@@ -362,7 +345,7 @@ setMethod("seqlevels0", "TxDb",
 ### Adapted from default "seqlevels<-" method defined in GenomeInfoDb.
 ### We only support "renaming" and "strict subsetting" modes.
 .set_TxDb_seqlevels <-
-    function(x, force=FALSE,
+    function(x,
              pruning.mode=c("error", "coarse", "fine", "tidy"),
              value)
 {
@@ -535,6 +518,7 @@ keep_user_seqlevels_from_TxDb <- function(x, txdb)
                                                     set.col.class=TRUE)
     splicings <- .format_splicings(splicings, drop.exon_name=TRUE,
                                               drop.cds_name=TRUE,
+                                              drop.cds_phase=TRUE,
                                               set.col.class=TRUE)
     genes <- .format_genes(genes, set.col.class=TRUE)
     chrominfo <- .format_chrominfo(chrominfo, set.col.class=TRUE)
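The new drop.cds_phase argument of .format_splicings() follows the existing drop.exon_name/drop.cds_name pattern. An optional column is removed only when it is present and contains nothing but NAs. Below is a minimal base-R sketch of that rule. The drop_if_all_na() helper and the toy splicings data frame are hypothetical, and `%in% names()` stands in for the package's internal has_col() helper.

```r
## Hypothetical helper: drop an optional column if it is present and
## contains only NAs (the rule used for cds_name and now cds_phase).
drop_if_all_na <- function(df, colname)
{
    if (colname %in% names(df) && all(is.na(df[[colname]])))
        df[[colname]] <- NULL
    df
}

## Toy splicings table: cds_name is entirely NA, cds_phase is partly set.
splicings <- data.frame(tx_id     = c(1L, 1L, 2L),
                        exon_rank = c(1L, 2L, 1L),
                        cds_name  = NA_character_,
                        cds_phase = c(0L, 2L, NA))

out <- drop_if_all_na(splicings, "cds_name")   # cds_name removed
out <- drop_if_all_na(out, "cds_phase")        # cds_phase kept
names(out)
```

Because all(logical(0)) is TRUE, a zero-row table would lose the column as well under this rule.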
diff --git a/R/TxDb-schema.R b/R/TxDb-schema.R
index 2445c5d..2d1dd9f 100644
--- a/R/TxDb-schema.R
+++ b/R/TxDb-schema.R
@@ -14,6 +14,21 @@
 ###   - metadata (not described here)
 
 
+### Not exported.
+DB_TYPE_NAME <- "Db type"
+DB_TYPE_VALUE <- "TxDb"  # same as the name of the TxDb class (TxDb-class.R)
+DB_SCHEMA_VERSION <- "1.2"  # DON'T FORGET TO BUMP THIS WHEN YOU CHANGE THE
+                            # SCHEMA
+
+### Return the *effective* schema version.
+TxDb_schema_version <- function(txdb)
+{
+    conn <- if (is(txdb, "TxDb")) dbconn(txdb) else txdb
+    version <- AnnotationDbi:::.getMetaValue(conn, "DBSCHEMAVERSION")
+    numeric_version(version)
+}
+
+
 ### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
 ### Table columns
 ###
@@ -66,7 +81,8 @@ TXDB_SPLICING_COLDEFS <- c(
     `_tx_id`="INTEGER NOT NULL",
     exon_rank="INTEGER NOT NULL",
     `_exon_id`="INTEGER NOT NULL",
-    `_cds_id`="INTEGER NULL"
+    `_cds_id`="INTEGER NULL",
+    cds_phase="INTEGER NULL"
 )
 
 TXDB_SPLICING_COLUMNS <- names(TXDB_SPLICING_COLDEFS)
@@ -166,6 +182,8 @@ TXDB_table_columns <- function(table, schema_version=NA)
         return(columns)
     if (table == "transcript" && schema_version < numeric_version("1.1"))
         columns <- columns[columns != "tx_type"]
+    if (table == "splicing" && schema_version < numeric_version("1.2"))
+        columns <- columns[columns != "cds_phase"]
     columns
 }
 
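TxDb_schema_version() now lives in TxDb-schema.R. TXDB_table_columns() uses that version to hide columns that older databases lack: tx_type below 1.1 and, as of this release, cds_phase below 1.2. The sketch below reproduces only the splicing-table check in base R. SPLICING_COLUMNS and splicing_columns_for() are hypothetical names. The comparison goes through numeric_version, which compares version components numerically rather than as strings.

```r
## Hypothetical full column list for the 'splicing' table at schema 1.2,
## copied from the TXDB_SPLICING_COLDEFS names in the diff.
SPLICING_COLUMNS <- c("_tx_id", "exon_rank", "_exon_id", "_cds_id",
                      "cds_phase")

## Hypothetical helper modelled on the version check added to
## TXDB_table_columns(): hide cds_phase for schema versions < 1.2.
splicing_columns_for <- function(schema_version)
{
    schema_version <- as.numeric_version(schema_version)
    columns <- SPLICING_COLUMNS
    if (schema_version < numeric_version("1.2"))
        columns <- columns[columns != "cds_phase"]
    columns
}

splicing_columns_for("1.1")   # no cds_phase
splicing_columns_for("1.2")   # includes cds_phase

## Numeric, component-wise comparison: 1.10 is newer than 1.2.
numeric_version("1.10") > numeric_version("1.2")
```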
diff --git a/R/coordinate-mapping-methods.R b/R/coordinate-mapping-methods.R
index e852f14..340ad28 100644
--- a/R/coordinate-mapping-methods.R
+++ b/R/coordinate-mapping-methods.R
@@ -172,7 +172,8 @@ setMethod("mapToTranscripts", c("GenomicRanges", "GRangesList"),
 
         ## findOverlaps determines pairs
         hits <- findOverlaps(x, unlist(transcripts, use.names=FALSE), 
-                             type="within", ignore.strand=ignore.strand)
+                             minoverlap=1L, type="within",
+                             ignore.strand=ignore.strand)
         .mapToTranscripts(x, transcripts, hits, ignore.strand, intronJunctions)
     }
 )
@@ -249,7 +250,8 @@ setMethod("pmapToTranscripts", c("GenomicRanges", "GRangesList"),
 
         ## map i-th elements
         hits <- findOverlaps(x, unlist(transcripts, use.names=FALSE), 
-                             type="within", ignore.strand=ignore.strand)
+                             minoverlap=1L, type="within",
+                             ignore.strand=ignore.strand)
         ith <- queryHits(hits) ==
                togroup(PartitioningByWidth(transcripts))[subjectHits(hits)]
         map <- .mapToTranscripts(x, transcripts, hits[ith], ignore.strand)
diff --git a/R/exonicParts.R b/R/exonicParts.R
index 137adda..7e49493 100644
--- a/R/exonicParts.R
+++ b/R/exonicParts.R
@@ -9,10 +9,10 @@
 
 ### Return a GRanges object with 1 range per transcript and metadata columns
 ### tx_id, tx_name, and gene_id.
-### If 'drop.geneless' is FALSE (the default) then the transcripts are returned
-### in the same order as with transcripts(), which is expected to be by
-### transcript id (tx_id). Otherwise they are ordered first by gene id
-### (gene_id), then by transcript id.
+### If 'drop.geneless' is FALSE (the default) then the transcripts are
+### returned in the same order as with transcripts(), which is expected
+### to be by transcript id (tx_id). Otherwise they are ordered first by
+### gene id (gene_id), then by transcript id.
 .tidy_transcripts <- function(txdb, drop.geneless=FALSE)
 {
     tx <- transcripts(txdb, columns=c("tx_id", "tx_name", "gene_id"))
@@ -85,7 +85,11 @@
     ans <- disjoin(x, with.revmap=TRUE)
     revmap <- mcols(ans)$revmap
     ans_mcols <- lapply(mcols(x),
-                        function(col) unique(extractList(col, revmap)))
+                        function(col) {
+                            col <- unique(extractList(col, revmap))
+                            col[!is.na(col)]
+                        }
+                 )
     mcols(ans) <- DataFrame(ans_mcols)
     if (linked.to.single.gene.only) {
         keep_idx <- which(elementNROWS(mcols(ans)$gene_id) == 1L)
@@ -96,7 +100,7 @@
 }
 
 ### Return a disjoint and strictly sorted GRanges object with 1 range per
-### exonic part and metadata columns tx_id, tx_name, gene_id, exon_id,
+### exonic part and with metadata columns tx_id, tx_name, gene_id, exon_id,
 ### exon_name, and exon_rank.
 exonicParts <- function(txdb, linked.to.single.gene.only=FALSE)
 {
@@ -107,7 +111,7 @@ exonicParts <- function(txdb, linked.to.single.gene.only=FALSE)
 }
 
 ### Return a disjoint and strictly sorted GRanges object with 1 range per
-### intronic part and metadata columns tx_id, tx_name, and gene_id.
+### intronic part and with metadata columns tx_id, tx_name, and gene_id.
 intronicParts <- function(txdb, linked.to.single.gene.only=FALSE)
 {
     if (!isTRUEorFALSE(linked.to.single.gene.only))
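The exonicParts.R change drops NAs after collapsing each metadata column through the disjoin() revmap. As a result, a part shared by a geneless transcript (gene_id NA) and a gene-linked one counts as linked to a single gene. The base-R sketch below shows the effect, with plain lists and lengths() standing in for extractList() and elementNROWS(). The gene_id values, the revmap and collapse_col() are made up for illustration.

```r
## Hypothetical per-transcript metadata column; transcript 2 is geneless.
gene_id <- c("g1", NA, "g1", "g2")

## Hypothetical revmap: for each disjoint part, the indices of the input
## ranges it came from (what disjoin(x, with.revmap=TRUE) records).
revmap <- list(c(1L, 2L), c(2L, 3L, 4L), 4L, 2L)

## For each part, collect the unique non-NA values of the column.
collapse_col <- function(col, revmap)
    lapply(revmap, function(idx) {
        vals <- unique(col[idx])
        vals[!is.na(vals)]
    })

parts_gene_id <- collapse_col(gene_id, revmap)
## Parts linked to exactly one gene (cf. linked.to.single.gene.only=TRUE).
single_gene <- which(lengths(parts_gene_id) == 1L)
single_gene
```

Without the NA filtering, part 1 would carry two values (g1 and NA) and would be excluded by the single-gene filter. Part 4 would also appear linked to a single "gene" that is just NA.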
diff --git a/R/extractTranscriptSeqs.R b/R/extractTranscriptSeqs.R
index 839be7f..89b394f 100644
--- a/R/extractTranscriptSeqs.R
+++ b/R/extractTranscriptSeqs.R
@@ -169,37 +169,35 @@ if (FALSE) {
     extractTranscriptSeqs(x, transcripts, strand=strand)
 }
 
-setMethod("extractTranscriptSeqs", "ANY",
-    function(x, transcripts, ...)
-    {
-        if (is(transcripts, "GRangesList")) {
-            if (length(list(...)) != 0L)
-                stop(wmsg("additional arguments are allowed only when ",
-                          "'transcripts' is not a GRangesList object"))
-        } else {
-            transcripts <- try(exonsBy(transcripts, by="tx", ...),
-                               silent=TRUE)
-            if (is(transcripts, "try-error"))
-                stop(wmsg("failed to extract the exon ranges ",
-                          "from 'transcripts' ",
-                          "with exonsBy(transcripts, by=\"tx\", ...)"))
-        }
-        idx1 <- which(elementNROWS(transcripts) != 0L)
-        tx1 <- transcripts[idx1]
-        .check_exon_chrom(tx1)
-        .check_exon_rank(tx1)
-
-        seqlevels(tx1) <- seqlevelsInUse(tx1)
-        ## 'seqnames1' is just an ordinary factor (not Rle) parallel to 'tx1'.
-        seqnames1 <- unlist(runValue(seqnames(tx1)), use.names=FALSE)
-        dnaset_list <- lapply(levels(seqnames1),
-                              .extractTranscriptSeqsFromOneSeq, x, tx1)
-        ans <- rep.int(DNAStringSet(""), length(transcripts))
-        names(ans) <- names(transcripts)
-        ans[idx1] <- unsplit_list_of_XVectorList("DNAStringSet",
-                                                 dnaset_list,
-                                                 seqnames1)
-        ans 
+.extractTranscriptSeqs_default <- function(x, transcripts, ...)
+{
+    if (is(transcripts, "GRangesList")) {
+        if (length(list(...)) != 0L)
+            stop(wmsg("additional arguments are allowed only when ",
+                      "'transcripts' is not a GRangesList object"))
+    } else {
+        transcripts <- try(exonsBy(transcripts, by="tx", ...),
+                           silent=TRUE)
+        if (is(transcripts, "try-error"))
+            stop(wmsg("failed to extract the exon ranges from 'transcripts' ",
+                      "with exonsBy(transcripts, by=\"tx\", ...)"))
     }
-)
+    idx1 <- which(elementNROWS(transcripts) != 0L)
+    tx1 <- transcripts[idx1]
+    .check_exon_chrom(tx1)
+    .check_exon_rank(tx1)
+
+    seqlevels(tx1) <- seqlevelsInUse(tx1)
+    ## 'seqnames1' is just an ordinary factor (not Rle) parallel to 'tx1'.
+    seqnames1 <- unlist(runValue(seqnames(tx1)), use.names=FALSE)
+    dnaset_list <- lapply(levels(seqnames1),
+                          .extractTranscriptSeqsFromOneSeq, x, tx1)
+    ans <- rep.int(DNAStringSet(""), length(transcripts))
+    names(ans) <- names(transcripts)
+    ans[idx1] <- unsplit_list_of_XVectorList("DNAStringSet",
+                                             dnaset_list,
+                                             seqnames1)
+    ans 
+}
+setMethod("extractTranscriptSeqs", "ANY", .extractTranscriptSeqs_default)
 
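The extractTranscriptSeqs.R change turns the anonymous "ANY" method into a named function, .extractTranscriptSeqs_default(), which is then passed to setMethod(). Below is a small sketch of the same S4 pattern using only the methods package. describeInput() and .describeInput_default() are hypothetical. One plausible benefit, not stated in the commit, is that the default can be called directly from other methods or tested without dispatch.

```r
library(methods)

## Hypothetical generic, standing in for extractTranscriptSeqs().
setGeneric("describeInput", function(x, ...) standardGeneric("describeInput"))

## The default implementation is an ordinary, named function ...
.describeInput_default <- function(x, ...)
{
    paste0("default method for an object of class ", class(x)[1L])
}

## ... which is then registered as the "ANY" method.
setMethod("describeInput", "ANY", .describeInput_default)

## A more specific method can delegate to the default explicitly.
setMethod("describeInput", "character", function(x, ...)
    paste("character input;", .describeInput_default(x, ...)))

describeInput(1L)
describeInput("a")
```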
diff --git a/R/makeTxDb.R b/R/makeTxDb.R
index bd09262..6f44fb6 100644
--- a/R/makeTxDb.R
+++ b/R/makeTxDb.R
@@ -7,10 +7,10 @@
 ### 1st group of helper functions for makeTxDb()
 ###
 ### 4 functions to check and normalize the input of makeTxDb():
-###   o .normarg_makeTxDb_transcripts()
-###   o .normarg_makeTxDb_splicings()
-###   o .normarg_makeTxDb_genes()
-###   o .normarg_makeTxDb_chrominfo()
+###   o .makeTxDb_normarg_transcripts()
+###   o .makeTxDb_normarg_splicings()
+###   o .makeTxDb_normarg_genes()
+###   o .makeTxDb_normarg_chrominfo()
 
 .all_logical_NAs <- function(x)
 {
@@ -35,8 +35,12 @@
     x
 }
 
-.checkForeignKey <- function(referring_vals, referring_type, referring_colname,
-                             referred_vals, referred_type, referred_colname)
+.check_foreign_key <- function(referring_vals,
+                               referring_type,
+                               referring_colname,
+                               referred_vals,
+                               referred_type,
+                               referred_colname)
 {
     if (!is.na(referring_type) && !is(referring_vals, referring_type))
         stop("'", referring_colname, "' must be of type ", referring_type)
@@ -49,7 +53,7 @@
              "be present in '", referred_colname, "'")
 }
 
-.normarg_makeTxDb_transcripts <- function(transcripts)
+.makeTxDb_normarg_transcripts <- function(transcripts)
 {
     .REQUIRED_COLS <- c("tx_id", "tx_chrom", "tx_strand", "tx_start", "tx_end")
     .OPTIONAL_COLS <- c("tx_name", "tx_type")
@@ -97,15 +101,16 @@
     transcripts
 }
 
-.normarg_makeTxDb_splicings <- function(splicings, transcripts_tx_id)
+.makeTxDb_normarg_splicings <- function(splicings, transcripts_tx_id)
 {
     .REQUIRED_COLS <- c("tx_id", "exon_rank", "exon_start", "exon_end")
     .OPTIONAL_COLS <- c("exon_id", "exon_name", "exon_chrom", "exon_strand",
-                        "cds_id", "cds_name", "cds_start", "cds_end")
+                        "cds_id", "cds_name", "cds_start", "cds_end",
+                        "cds_phase")
     check_colnames(splicings, .REQUIRED_COLS, .OPTIONAL_COLS, "splicings")
     ## Check 'tx_id'.
-    .checkForeignKey(splicings$tx_id, "integer", "splicings$tx_id",
-                     transcripts_tx_id, "integer", "transcripts$tx_id")
+    .check_foreign_key(splicings$tx_id, "integer", "splicings$tx_id",
+                       transcripts_tx_id, "integer", "transcripts$tx_id")
     ## Check 'exon_rank'.
     if (!is.numeric(splicings$exon_rank)
      || any(is.na(splicings$exon_rank)))
@@ -114,6 +119,10 @@
         splicings$exon_rank <- as.integer(splicings$exon_rank)
     if (any(splicings$exon_rank <= 0L))
         stop("'splicings$exon_rank' contains non-positive values")
+    ## Check uniqueness of (tx_id, exon_rank) pairs.
+    if (any(S4Vectors:::duplicatedIntegerPairs(splicings$tx_id,
+                                               splicings$exon_rank)))
+        stop("'splicings' must contain unique (tx_id, exon_rank) pairs")
     ## Check 'exon_id'.
     if (has_col(splicings, "exon_id")
      && (!is.integer(splicings$exon_id) || any(is.na(splicings$exon_id))))
@@ -167,8 +176,8 @@
                                                   "splicings$cds_end")
         ## Check 'cds_start' and 'cds_end' compatibility.
         if (!all(is.na(splicings$cds_start) == is.na(splicings$cds_end)))
-            stop("NAs in 'splicings$cds_start' don't match ",
-                 "NAs in 'splicings$cds_end'")
+            stop("NAs in 'splicings$cds_start' and 'splicings$cds_end' ",
+                 "must occur at the same positions")
         if (any(splicings$cds_start > splicings$cds_end, na.rm=TRUE))
             stop("cds starts must be <= cds ends")
         ## Check CDS and exon compatibility.
@@ -185,7 +194,7 @@
             stop("'splicings$cds_id' must be an integer vector")
         if (!all(is.na(splicings$cds_id) == is.na(splicings$cds_start)))
             stop("NAs in 'splicings$cds_id' don't match ",
-                 "NAs in 'splicings$cds_start'")
+                 "those in 'splicings$cds_start' and 'splicings$cds_end'")
     }
     ## Check 'cds_name'.
     if (has_col(splicings, "cds_name")) {
@@ -195,17 +204,24 @@
         if (!.is_character_or_factor(splicings$cds_name))
             stop("'splicings$cds_name' must be a character vector (or factor)")
         if (any(is.na(splicings$cds_name) < is.na(splicings$cds_start)))
-            stop("'splicings$cds_start' and 'splicings$cds_end' contain NAs ",
-                 "where 'splicings$cds_name' doesn't")
+            stop("'splicings$cds_name' must contain NAs at least where ",
+                 "'splicings$cds_start' and 'splicings$cds_end' contain them")
     }
-    if (!has_col(splicings, "cds_start")) {
-        splicings$cds_start <- rep.int(NA_integer_, nrow(splicings))
-        splicings$cds_end <- splicings$cds_start
+    ## Check 'cds_phase'.
+    if (has_col(splicings, "cds_phase")) {
+        if (!has_col(splicings, "cds_start"))
+            stop("'splicings' has a \"cds_phase\" col ",
+                 "but no \"cds_start\"/\"cds_end\" cols")
+        splicings$cds_phase <- .graceful_as_integer(splicings$cds_phase,
+                                                    "splicings$cds_phase")
+        if (!all(is.na(splicings$cds_phase) == is.na(splicings$cds_start)))
+            stop("NAs in 'splicings$cds_phase' don't match ",
+                 "those in 'splicings$cds_start' and 'splicings$cds_end'")
     }
     splicings
 }
 
-.normarg_makeTxDb_genes <- function(genes, transcripts_tx_id)
+.makeTxDb_normarg_genes <- function(genes, transcripts_tx_id)
 {
     if (is.null(genes)) {
         genes <- data.frame(tx_id=transcripts_tx_id[FALSE],
@@ -233,13 +249,13 @@
                                            transcripts_tx_id, "tx_id")
     } else {
         ## Check 'tx_id'.
-        .checkForeignKey(genes$tx_id, "integer", "genes$tx_id",
-                         transcripts_tx_id, "integer", "transcripts$tx_id")
+        .check_foreign_key(genes$tx_id, "integer", "genes$tx_id",
+                           transcripts_tx_id, "integer", "transcripts$tx_id")
     }
     genes
 }
 
-.normarg_makeTxDb_chrominfo <- function(chrominfo, transcripts_tx_chrom,
+.makeTxDb_normarg_chrominfo <- function(chrominfo, transcripts_tx_chrom,
                                         splicings_exon_chrom)
 {
     if (is.null(chrominfo)) {
@@ -267,11 +283,11 @@
              "with no NAs")
     if (any(duplicated(chrominfo$chrom)))
         stop("'chrominfo$chrom' contains duplicated values")
-    .checkForeignKey(transcripts_tx_chrom, NA, "transcripts$tx_chrom",
-                     chrominfo$chrom, NA, "chrominfo$chrom")
+    .check_foreign_key(transcripts_tx_chrom, NA, "transcripts$tx_chrom",
+                       chrominfo$chrom, NA, "chrominfo$chrom")
     if (!is.null(splicings_exon_chrom))
-        .checkForeignKey(splicings_exon_chrom, NA, "splicings$exon_chrom",
-                         chrominfo$chrom, NA, "chrominfo$chrom")
+        .check_foreign_key(splicings_exon_chrom, NA, "splicings$exon_chrom",
+                           chrominfo$chrom, NA, "chrominfo$chrom")
     ## Check 'length'.
     chrominfo$length <- .graceful_as_integer(chrominfo$length,
                                              "chrominfo$length")
@@ -296,8 +312,8 @@
 ### These functions deal with id assignment and reassignment.
 ###
 
-.makeTranscriptsInternalTxId <- function(transcripts, reassign.ids,
-                                         chrominfo_chrom)
+.make_transcripts_internal_tx_id <- function(transcripts, reassign.ids,
+                                             chrominfo_chrom)
 {
     if (!reassign.ids)
         return(transcripts$tx_id)
@@ -309,8 +325,8 @@
                    type=transcripts$tx_type)
 }
 
-.makeSplicingsInternalExonId <- function(splicings, reassign.ids,
-                                         chrominfo_chrom)
+.make_splicings_internal_exon_id <- function(splicings, reassign.ids,
+                                             chrominfo_chrom)
 {
     if (!reassign.ids && has_col(splicings, "exon_id"))
         return(splicings$exon_id)
@@ -322,21 +338,23 @@
                    same.id.for.dups=TRUE)
 }
 
-.makeSplicingsInternalCDSId <- function(splicings, reassign.ids,
-                                        chrominfo_chrom)
+.make_splicings_internal_cds_id <- function(splicings, reassign.ids,
+                                            chrominfo_chrom)
 {
     if (!reassign.ids && has_col(splicings, "cds_id"))
         return(splicings$cds_id)
-    chrom_ids <- match(splicings$exon_chrom, chrominfo_chrom)
-    not_NA <- !is.na(splicings$cds_start)
-    ids <- makeFeatureIds(chrom_ids[not_NA], splicings$exon_strand[not_NA],
-                          splicings$cds_start[not_NA],
-                          splicings$cds_end[not_NA],
-                          name=splicings$cds_name[not_NA],
-                          same.id.for.dups=TRUE)
-    ans <- integer(nrow(splicings))
-    ans[not_NA] <- ids
-    ans[!not_NA] <- NA_integer_
+    ans <- rep.int(NA_integer_, nrow(splicings))
+    if (has_col(splicings, "cds_start")) {
+        not_NA <- !is.na(splicings$cds_start)
+        chrom_ids <- match(splicings$exon_chrom, chrominfo_chrom)
+        ids <- makeFeatureIds(chrom_ids[not_NA],
+                              splicings$exon_strand[not_NA],
+                              splicings$cds_start[not_NA],
+                              splicings$cds_end[not_NA],
+                              name=splicings$cds_name[not_NA],
+                              same.id.for.dups=TRUE)
+        ans[not_NA] <- ids
+    }
     ans
 }
 
@@ -367,12 +385,9 @@
 .write_feature_table <- function(conn, table,
                                  internal_id, name, type,
                                  chrom, strand,
-                                 start, end,
-                                 feature_shortname=NA)
+                                 start, end)
 {
-    if (is.na(feature_shortname))
-        feature_shortname <- table
-    colnames <- makeFeatureColnames(feature_shortname)
+    colnames <- TXDB_table_columns(table)
     if (is.null(name))
         name <- rep.int(NA_character_, length(internal_id))
     data <- data.frame(
@@ -405,15 +420,18 @@
                                   internal_tx_id,
                                   exon_rank,
                                   internal_exon_id,
-                                  internal_cds_id)
+                                  internal_cds_id,
+                                  cds_phase)
 {
+    if (is.null(cds_phase))
+        cds_phase <- rep.int(NA_integer_, length(internal_tx_id))
     data <- data.frame(
         internal_tx_id=internal_tx_id,
         exon_rank=exon_rank,
         internal_exon_id=internal_exon_id,
         internal_cds_id=internal_cds_id,
+        cds_phase=cds_phase,
         check.names=FALSE, stringsAsFactors=FALSE)
-    data <- unique(data)
 
     ## Create the 'splicing' table and related indices.
     SQL <- build_SQL_CREATE_splicing_table()
@@ -479,16 +497,15 @@
     dbWriteTable(conn, "metadata", metadata, row.names=FALSE)
 }
 
-.importTranscripts <- function(conn, transcripts, internal_tx_id)
+.write_transcripts <- function(conn, transcripts, internal_tx_id)
 {
     .write_feature_table(conn, "transcript",
         internal_tx_id, transcripts$tx_name, transcripts$tx_type,
         transcripts$tx_chrom, transcripts$tx_strand,
-        transcripts$tx_start, transcripts$tx_end,
-        feature_shortname="tx")
+        transcripts$tx_start, transcripts$tx_end)
 }
 
-.importExons <- function(conn, splicings, internal_exon_id)
+.write_exons <- function(conn, splicings, internal_exon_id)
 {
     .write_feature_table(conn, "exon",
         internal_exon_id, splicings$exon_name, NULL,
@@ -496,15 +513,20 @@
         splicings$exon_start, splicings$exon_end)
 }
 
-.importCDS <- function(conn, splicings, internal_cds_id)
+.write_cds <- function(conn, splicings, internal_cds_id)
 {
     not_NA <- !is.na(internal_cds_id)
     internal_cds_id <- internal_cds_id[not_NA]
-    cds_name <- splicings$cds_name[not_NA]
-    cds_chrom <- splicings$exon_chrom[not_NA]
-    cds_strand <- splicings$exon_strand[not_NA]
-    cds_start <- splicings$cds_start[not_NA]
-    cds_end <- splicings$cds_end[not_NA]
+    if (!has_col(splicings, "cds_start")) {
+        cds_name <- cds_chrom <- cds_strand <- character(0)
+        cds_start <- cds_end <- integer(0)
+    } else {
+        cds_name <- splicings$cds_name[not_NA]
+        cds_chrom <- splicings$exon_chrom[not_NA]
+        cds_strand <- splicings$exon_strand[not_NA]
+        cds_start <- splicings$cds_start[not_NA]
+        cds_end <- splicings$cds_end[not_NA]
+    }
     .write_feature_table(conn, "cds",
         internal_cds_id, cds_name, NULL,
         cds_chrom, cds_strand,
@@ -522,16 +544,17 @@ makeTxDb <- function(transcripts, splicings,
 {
     if (!isTRUEorFALSE(reassign.ids))
         stop("'reassign.ids' must be TRUE or FALSE")
-    transcripts <- .normarg_makeTxDb_transcripts(transcripts)
+    transcripts <- .makeTxDb_normarg_transcripts(transcripts)
     transcripts_tx_id <- transcripts$tx_id  # guaranteed to be unique
     names(transcripts_tx_id) <- transcripts$tx_name
-    splicings <- .normarg_makeTxDb_splicings(splicings, transcripts_tx_id)
-    genes <- .normarg_makeTxDb_genes(genes, transcripts_tx_id)
-    chrominfo <- .normarg_makeTxDb_chrominfo(chrominfo, transcripts$tx_chrom,
+    splicings <- .makeTxDb_normarg_splicings(splicings, transcripts_tx_id)
+    genes <- .makeTxDb_normarg_genes(genes, transcripts_tx_id)
+    chrominfo <- .makeTxDb_normarg_chrominfo(chrominfo, transcripts$tx_chrom,
                                              splicings$exon_chrom)
-    transcripts_internal_tx_id <- .makeTranscriptsInternalTxId(transcripts,
-                                                               reassign.ids,
-                                                               chrominfo$chrom)
+    transcripts_internal_tx_id <- .make_transcripts_internal_tx_id(
+                                             transcripts,
+                                             reassign.ids,
+                                             chrominfo$chrom)
     splicings_internal_tx_id <- translateIds(transcripts_tx_id,
                                              transcripts_internal_tx_id,
                                              splicings$tx_id)
@@ -546,23 +569,26 @@ makeTxDb <- function(transcripts, splicings,
         splicings$exon_chrom <- transcripts$tx_chrom[splicings2transcripts]
     if (!has_col(splicings, "exon_strand"))
         splicings$exon_strand <- transcripts$tx_strand[splicings2transcripts]
-    splicings_internal_exon_id <- .makeSplicingsInternalExonId(splicings,
-                                                               reassign.ids,
-                                                               chrominfo$chrom)
-    splicings_internal_cds_id <- .makeSplicingsInternalCDSId(splicings,
-                                                             reassign.ids,
-                                                             chrominfo$chrom)
+    splicings_internal_exon_id <- .make_splicings_internal_exon_id(
+                                             splicings,
+                                             reassign.ids,
+                                             chrominfo$chrom)
+    splicings_internal_cds_id <- .make_splicings_internal_cds_id(
+                                             splicings,
+                                             reassign.ids,
+                                             chrominfo$chrom)
     ## Create the db in a temp file.
     conn <- dbConnect(SQLite(), dbname="")
     .write_chrominfo_table(conn, chrominfo)  # must come first
-    .importTranscripts(conn, transcripts, transcripts_internal_tx_id)
-    .importExons(conn, splicings, splicings_internal_exon_id)
-    .importCDS(conn, splicings, splicings_internal_cds_id)
+    .write_transcripts(conn, transcripts, transcripts_internal_tx_id)
+    .write_exons(conn, splicings, splicings_internal_exon_id)
+    .write_cds(conn, splicings, splicings_internal_cds_id)
     .write_splicing_table(conn,
                           splicings_internal_tx_id,
                           splicings$exon_rank,
                           splicings_internal_exon_id,
-                          splicings_internal_cds_id)
+                          splicings_internal_cds_id,
+                          splicings$cds_phase)
     .write_gene_table(conn, genes$gene_id, genes_internal_tx_id)
     .write_metadata_table(conn, metadata)  # must come last!
     TxDb(conn)
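This version changes how `.makeTxDb_normarg_splicings()` validates `splicings`:

- It now rejects duplicated (tx_id, exon_rank) pairs.
- It accepts an optional `cds_phase` column, whose NAs must line up with those of `cds_start`.

In addition, `.write_splicing_table()` stores the new column and no longer calls `unique()` on its input.

The sketch below reproduces the two new checks in plain R. `check_splicings()` is a hypothetical helper. Base `duplicated()` on a two-column data frame stands in for the internal `S4Vectors:::duplicatedIntegerPairs()`. The sketch omits the other validations the real function performs, such as coercing `cds_phase` with `.graceful_as_integer()`.

```r
## Hypothetical helper, for illustration only (not GenomicFeatures code).
check_splicings <- function(splicings)
{
    has_column <- function(col) col %in% colnames(splicings)
    ## Each (tx_id, exon_rank) pair must occur at most once.
    pairs <- data.frame(tx_id=splicings$tx_id,
                        exon_rank=splicings$exon_rank)
    if (any(duplicated(pairs)))
        stop("'splicings' must contain unique (tx_id, exon_rank) pairs")
    ## An optional "cds_phase" column requires "cds_start"/"cds_end"
    ## and must be NA exactly where "cds_start" is NA.
    if (has_column("cds_phase")) {
        if (!has_column("cds_start"))
            stop("'splicings' has a \"cds_phase\" col ",
                 "but no \"cds_start\"/\"cds_end\" cols")
        if (!all(is.na(splicings$cds_phase) == is.na(splicings$cds_start)))
            stop("NAs in 'splicings$cds_phase' don't match ",
                 "those in 'splicings$cds_start' and 'splicings$cds_end'")
    }
    invisible(TRUE)
}

## Transcript 1 has two coding exons (an 11-nt CDS, then a phase-1 CDS).
## Transcript 2 has one non-coding exon.
ok <- data.frame(tx_id=c(1L, 1L, 2L),
                 exon_rank=c(1L, 2L, 1L),
                 exon_start=c(1L, 50L, 200L),
                 exon_end=c(20L, 80L, 260L),
                 cds_start=c(10L, 50L, NA),
                 cds_end=c(20L, 70L, NA),
                 cds_phase=c(0L, 1L, NA))
check_splicings(ok)           # passes, returns TRUE invisibly

bad_rank <- ok
bad_rank$exon_rank[2] <- 1L   # transcript 1 now has two exons ranked 1
try(check_splicings(bad_rank))

bad_phase <- ok
bad_phase$cds_phase[3] <- 0L  # phase given for an exon without a CDS
try(check_splicings(bad_phase))
```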
diff --git a/R/makeTxDbFromEnsembl.R b/R/makeTxDbFromEnsembl.R
new file mode 100644
index 0000000..146dcb2
--- /dev/null
+++ b/R/makeTxDbFromEnsembl.R
@@ -0,0 +1,301 @@
+### =========================================================================
+### makeTxDbFromEnsembl()
+### -------------------------------------------------------------------------
+###
+### List of Ensembl public MySQL servers / ports
+###   https://www.ensembl.org/info/data/mysql.html
+### Ensembl Core schema
+###   https://www.ensembl.org/info/docs/api/core/core_schema.html
+###
+
+
+.normarg_organism <- function(organism)
+{
+    if (!isSingleString(organism))
+        stop("'organism' must be a single string")
+    ## Remove extra spaces.
+    tmp <- strsplit(organism, " ")[[1L]]
+    paste(tmp[nzchar(tmp)], collapse=" ")
+}
+
+.dbname2release <- function(dbname)
+    as.integer(gsub("^.*_core_([0-9]+)_.*$", "\\1", dbname))
+
+.lookup_dbname <- function(organism, release=NA)
+{
+    organism <- .normarg_organism(organism)
+    if (!isSingleNumberOrNA(release))
+        stop("'release' must be a valid Ensembl release number or NA")
+    available_dbs <- Ensembl_listMySQLCoreDirs(release=release)
+    prefix <- paste0(gsub(" ", "_", tolower(organism), fixed=TRUE), "_core_")
+    i <- match(prefix, substr(available_dbs, 1L, nchar(prefix)))
+    dbname <- available_dbs[[i]]
+    if (!is.na(release))  # sanity check
+        stopifnot(.dbname2release(dbname) == release)
+    dbname
+}
+
+.fix_numeric_cols <- function(df)
+{
+    col_idx <- which(sapply(df, is.numeric))
+    df[col_idx] <- lapply(df[col_idx], as.integer)
+    df
+}
+
+.RMySQL_select <- function(dbconn, columns, from)
+{
+    SQL <- sprintf("SELECT %s FROM %s", paste0(columns, collapse=","), from)
+    ## Not sure systematic conversion of numeric to int is actually a
+    ## good idea (risk of overflow?)
+    .fix_numeric_cols(suppressWarnings(dbGetQuery(dbconn, SQL)))
+}
+
+.seq_region_columns <- c(
+    "seq_region_id",
+    "seq_region_start",
+    "seq_region_end",
+    "seq_region_strand"
+)
+
+.fetch_Ensembl_transcripts <- function(dbconn)
+{
+    message("Fetch transcripts and genes from Ensembl ... ",
+            appendLF=FALSE)
+    transcript_columns <- c(
+        "transcript_id",
+        "stable_id",
+        .seq_region_columns,
+        "biotype"
+    )
+    columns <- c(paste0("transcript.", transcript_columns), "gene.stable_id")
+    from <- "transcript LEFT JOIN gene USING(gene_id)"
+    transcripts <- .RMySQL_select(dbconn, columns, from)
+    colnames(transcripts) <- c("tx_id",
+                               "tx_name",
+                               "seq_region_id",
+                               "tx_start",
+                               "tx_end",
+                               "tx_strand",
+                               "tx_type",
+                               "gene_id")
+    transcripts$tx_strand <- strand(transcripts$tx_strand)
+    message("OK")
+    transcripts
+}
+
+.fetch_Ensembl_translations <- function(dbconn)
+{
+    columns <- c(
+        "stable_id",
+        "start_exon_id",   # ==> exon.exon_id
+        "seq_start",       # relative to first exon
+        "end_exon_id",     # ==> exon.exon_id
+        "seq_end",         # relative to last exon
+        "transcript_id"
+    )
+    .RMySQL_select(dbconn, columns, "translation")
+}
+
+### 'has_cds' must be a logical vector.
+### 'tx_id' must be an atomic vector parallel to 'has_cds'.
+.has_cds <- function(has_cds, tx_id)
+{
+    stopifnot(length(has_cds) == length(tx_id))
+    breakpoints <- cumsum(runLength(Rle(tx_id)))
+    partitioning <- PartitioningByEnd(breakpoints)
+    ## List of relative CDS indices.
+    ridx_list <- which(relist(has_cds, partitioning))
+    idx0 <- which(lengths(ridx_list) == 0L)
+    min_ridx <- replaceROWS(min(ridx_list), idx0, 1L)
+    max_ridx <- replaceROWS(max(ridx_list), idx0, 0L)
+    ## Absolute CDS index.
+    aidx <- shift(IRanges(min_ridx, max_ridx), start(partitioning) - 1L)
+    has_cds <- logical(length(tx_id))
+    replaceROWS(has_cds, aidx, TRUE)
+}
+
+### Add "cds_name", "cds_start", and "cds_end" cols to 'splicings'.
+.add_cds_cols <- function(dbconn, splicings)
+{
+    translations <- .fetch_Ensembl_translations(dbconn)
+
+    m <- match(splicings$tx_id, translations$transcript_id)
+    cds_name <- translations$stable_id[m]
+
+    m1 <- S4Vectors:::matchIntegerPairs(splicings$tx_id,
+                                        splicings$exon_id,
+                                        translations$transcript_id,
+                                        translations$start_exon_id)
+    offset1 <- translations$seq_start[m1] - 1L
+    m2 <- S4Vectors:::matchIntegerPairs(splicings$tx_id,
+                                        splicings$exon_id,
+                                        translations$transcript_id,
+                                        translations$end_exon_id)
+    offset2 <- translations$seq_end[m2] - 1L
+
+    cds_start <- ifelse(splicings$exon_strand == "+",
+                        splicings$exon_start + offset1,
+                        splicings$exon_end - offset2)
+    cds_end <- ifelse(splicings$exon_strand == "+",
+                      splicings$exon_start + offset2,
+                      splicings$exon_end - offset1)
+
+    has_cds <- .has_cds(!(is.na(cds_start) & is.na(cds_end)), splicings$tx_id)
+    cds_name[!has_cds] <- NA
+    idx <- which(has_cds & is.na(cds_start))
+    cds_start[idx] <- splicings$exon_start[idx]
+    idx <- which(has_cds & is.na(cds_end))
+    cds_end[idx] <- splicings$exon_end[idx]
+
+    cbind(splicings, cds_name, cds_start, cds_end, stringsAsFactors=FALSE)
+}
+
+.fetch_Ensembl_splicings <- function(dbconn)
+{
+    message("Fetch exons and CDS from Ensembl ... ",
+            appendLF=FALSE)
+    exon_columns <- c(
+        "exon_id",
+        "stable_id",
+        .seq_region_columns
+    )
+    columns <- c("transcript_id", "rank", exon_columns)
+    from <- "exon_transcript INNER JOIN exon USING(exon_id)"
+    splicings <- .RMySQL_select(dbconn, columns, from)
+    colnames(splicings) <- c("tx_id",
+                             "exon_rank",
+                             "exon_id",
+                             "exon_name",
+                             "seq_region_id",
+                             "exon_start",
+                             "exon_end",
+                             "exon_strand")
+    splicings$exon_strand <- strand(splicings$exon_strand)
+    oo <- S4Vectors:::orderIntegerPairs(splicings$tx_id, splicings$exon_rank)
+    splicings <- S4Vectors:::extract_data_frame_rows(splicings, oo)
+    splicings <- .add_cds_cols(dbconn, splicings)
+    message("OK")
+    splicings
+}
+
+.get_toplevel_seq_region_ids <- function(dbconn)
+{
+    ## FIXME: id0 is hard-coded to 6 for now.
+    ## See .Ensembl_fetchAttribTypeIdForTopLevelSequence() for how to
+    ## extract this value from Ensembl
+    id0 <- 6L
+    columns <- c("seq_region_id", "attrib_type_id", "value")
+    seq_region_attrib <- .RMySQL_select(dbconn, columns, "seq_region_attrib")
+    seq_region_attrib$seq_region_id[seq_region_attrib$attrib_type_id == id0]
+}
+
+.fetch_Ensembl_chrominfo <- function(dbconn, seq_region_ids=NULL,
+                                     circ_seqs=DEFAULT_CIRC_SEQS)
+{
+    message("Fetch chromosome names and lengths from Ensembl ...",
+            appendLF=FALSE)
+    seq_region_columns <- c(
+        "seq_region_id",
+        "name",
+        "coord_system_id",
+        "length"
+    )
+    coord_system_columns <- c(
+        "coord_system_id",
+        "species_id",
+        "name",
+        "version",
+        "rank",
+        "attrib"
+    )
+    using_column <- "coord_system_id"
+    joined_columns <- c(using_column,
+                        setdiff(seq_region_columns, using_column),
+                        setdiff(coord_system_columns, using_column))
+    from <- "seq_region INNER JOIN coord_system USING(coord_system_id)"
+    seq_region <- .RMySQL_select(dbconn, "*", from)
+    stopifnot(identical(colnames(seq_region), joined_columns))
+    colnames(seq_region)[6:9] <- paste0("coord_system_",
+                                        colnames(seq_region)[6:9])
+    top_level_ids <- .get_toplevel_seq_region_ids(dbconn)
+    chromlengths <- extract_chromlengths_from_seq_region(
+                                         seq_region,
+                                         top_level_ids,
+                                         seq_region_ids=seq_region_ids)
+    chrominfo <- data.frame(
+        seq_region_id=chromlengths$seq_region_id,
+        chrom=chromlengths$name,
+        length=chromlengths$length,
+        is_circular=make_circ_flags_from_circ_seqs(chromlengths$name,
+                                                   circ_seqs)
+    )
+    message("OK")
+    chrominfo
+}
+
+.gather_Ensembl_metadata <- function(organism, dbname, server)
+{
+    message("Gather the metadata ... ", appendLF=FALSE)
+    release <- .dbname2release(dbname)
+    metadata <- data.frame(name=c("Data source",
+                                  "Organism",
+                                  "Ensembl release",
+                                  "Ensembl database",
+                                  "MySQL server"),
+                           value=c("Ensembl",
+                                   organism,
+                                   release,
+                                   dbname,
+                                   server),
+                           stringsAsFactors=FALSE)
+    message("OK")
+    metadata
+}
+
+### Always set 'server' to "useastdb.ensembl.org" in the examples so that
+### they run fast on the build machines (which are located on the East Coast).
+makeTxDbFromEnsembl <- function(organism="Homo sapiens",
+                                release=NA,
+                                circ_seqs=DEFAULT_CIRC_SEQS,
+                                server="ensembldb.ensembl.org")
+{
+    dbname <- .lookup_dbname(organism, release=release)
+    dbconn <- dbConnect(MySQL(), dbname=dbname,
+                                 username="anonymous",
+                                 host=server)
+    on.exit(dbDisconnect(dbconn))
+
+    transcripts <- .fetch_Ensembl_transcripts(dbconn)
+
+    genes <- transcripts[ , c("tx_name", "gene_id")]
+    transcripts$gene_id <- NULL
+
+    splicings <- .fetch_Ensembl_splicings(dbconn)
+
+    seq_region_ids <- unique(c(transcripts$seq_region_id,
+                               splicings$seq_region_id))
+
+    chrominfo <- .fetch_Ensembl_chrominfo(dbconn,
+                                          seq_region_ids=seq_region_ids,
+                                          circ_seqs=circ_seqs)
+
+    m <- match(transcripts$seq_region_id, chrominfo$seq_region_id)
+    transcripts$tx_chrom <- chrominfo$chrom[m]
+    transcripts$seq_region_id <- NULL
+
+    m <- match(splicings$seq_region_id, chrominfo$seq_region_id)
+    splicings$exon_chrom <- chrominfo$chrom[m]
+    splicings$seq_region_id <- NULL
+
+    chrominfo$seq_region_id <- NULL
+
+    metadata <- .gather_Ensembl_metadata(organism, dbname, server)
+
+    message("Make the TxDb object ... ", appendLF=FALSE)
+    txdb <- makeTxDb(transcripts, splicings,
+                     genes=genes, chrominfo=chrominfo,
+                     metadata=metadata, reassign.ids=TRUE)
+    message("OK")
+    txdb
+}
+
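In the Ensembl Core schema, a `translation` row locates the CDS with two offsets:

- `seq_start` is an offset into `start_exon_id`.
- `seq_end` is an offset into `end_exon_id`.

Both offsets are 1-based and counted from the exon's 5' end. `.add_cds_cols()` converts them into genomic `cds_start`/`cds_end` values for the two boundary exons. It then uses `.has_cds()` to flag every exon lying between the first and last CDS-bearing exon of a transcript, and assigns those interior exons their full exon range as CDS.

The sketch below shows the same arithmetic for a single transcript. It assumes exons are listed in rank order and locates the boundary exons by rank rather than by matching exon IDs. `cds_ranges_one_tx()` is hypothetical and not part of GenomicFeatures.

```r
## Hypothetical single-transcript sketch (not GenomicFeatures code).
## exon_start/exon_end: genomic, 1-based, inclusive, listed by exon rank.
## start_rank/end_rank: ranks of the translation's start and end exons.
## seq_start/seq_end:   1-based offsets into those exons, from their 5' end.
cds_ranges_one_tx <- function(exon_start, exon_end, strand,
                              start_rank, seq_start, end_rank, seq_end)
{
    n <- length(exon_start)
    stopifnot(length(exon_end) == n, start_rank <= end_rank, end_rank <= n)
    cds_start <- cds_end <- rep(NA_integer_, n)
    ## Exons from the start exon to the end exon are coding: start from
    ## their full range, then trim the two boundary exons.
    in_cds <- seq_len(n) >= start_rank & seq_len(n) <= end_rank
    cds_start[in_cds] <- exon_start[in_cds]
    cds_end[in_cds] <- exon_end[in_cds]
    off1 <- seq_start - 1L
    off2 <- seq_end - 1L
    if (strand == "+") {
        ## 5' end of an exon is its genomic start.
        cds_start[start_rank] <- exon_start[start_rank] + off1
        cds_end[end_rank] <- exon_start[end_rank] + off2
    } else {
        ## 5' end of an exon is its genomic end.
        cds_end[start_rank] <- exon_end[start_rank] - off1
        cds_start[end_rank] <- exon_end[end_rank] - off2
    }
    data.frame(exon_rank=seq_len(n), cds_start=cds_start, cds_end=cds_end)
}

## Plus strand: CDS is 150-199, 300-399, 500-520 (171 nt = 57 codons).
plus <- cds_ranges_one_tx(c(100L, 300L, 500L), c(199L, 399L, 599L), "+",
                          start_rank=1L, seq_start=51L,
                          end_rank=3L, seq_end=21L)
## Minus strand: rank 1 is the rightmost exon.
minus <- cds_ranges_one_tx(c(500L, 300L, 100L), c(599L, 399L, 199L), "-",
                           start_rank=1L, seq_start=51L,
                           end_rank=3L, seq_end=21L)
```

Exons before the start exon or after the end exon (UTR-only exons) keep NA CDS coordinates. This mirrors how `.add_cds_cols()` leaves `cds_name` NA for exons outside the coding span.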
diff --git a/R/makeTxDbFromGRanges.R b/R/makeTxDbFromGRanges.R
index c3c42dc..6c5839a 100644
--- a/R/makeTxDbFromGRanges.R
+++ b/R/makeTxDbFromGRanges.R
@@ -41,13 +41,13 @@
 ###   - required: type, ID
 ###   - optional: Parent, Name, Dbxref, geneID
 ### Used in R/makeTxDbFromGFF.R
-GFF3_COLNAMES <- c("type", "ID", "Parent", "Name", "Dbxref", "geneID")
+GFF3_COLNAMES <- c("type", "phase", "ID", "Parent", "Name", "Dbxref", "geneID")
 
 ### Expected metadata columns for GRanges in GTF format:
 ###   - required: type, gene_id, transcript_id
 ###   - optional: exon_id
 ### Used in R/makeTxDbFromGFF.R
-GTF_COLNAMES <- c("type", "gene_id", "transcript_id", "exon_id")
+GTF_COLNAMES <- c("type", "phase", "gene_id", "transcript_id", "exon_id")
 
 .GENE_TYPES <- c("gene", "pseudogene", "transposable_element_gene")
 .TX_TYPES <- c("transcript", "pseudogenic_transcript", "primary_transcript",
@@ -82,6 +82,14 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
     factor(type, levels=levels_in_use)
 }
 
+.get_phase <- function(gr_mcols)
+{
+    phase <- gr_mcols$phase
+    if (!is.null(phase) && !is.integer(phase))
+        stop(wmsg("the \"phase\" metadata column must be an integer vector"))
+    phase
+}
+
 ### Return a character vector or NULL.
 .get_gene_id <- function(gr_mcols)
 {
@@ -239,9 +247,24 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
     which(type %in% .GENE_TYPES)
 }
 
-.get_cds_IDX <- function(type)
+.get_cds_IDX <- function(type, phase)
 {
-    which(type %in% .CDS_TYPES)
+    is_cds <- type %in% .CDS_TYPES
+    if (!is.null(phase)) {
+        if (S4Vectors:::anyMissingOrOutside(phase[is_cds], 0L, 2L))
+            stop(wmsg("the \"phase\" metadata column must contain 0, 1, or 2, ",
+                      "for all the CDS features"))
+        types_with_phase <- type[!is.na(phase) & type %in% GFF_FEATURE_TYPES]
+        types_with_phase <- setdiff(as.character(unique(types_with_phase)),
+                                    .CDS_TYPES)
+        if (length(types_with_phase) != 0L) {
+            in1string <- paste0(as.character(types_with_phase), collapse=", ")
+            warning(wmsg("The \"phase\" metadata column contains non-NA ",
+                         "values for features of type ", in1string,
+                         ". This information was ignored."))
+        }
+    }
+    which(is_cds)
 }
 
 ### Returns the index of CDS whose Parent is a gene (this happens with some
@@ -411,7 +434,7 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
 ### Can be used to extract exons, cds, or stop codons.
 .extract_exons_from_GRanges <- function(exon_IDX, gr, ID, Name, Parent,
                                         feature=c("exon", "cds", "stop_codon"),
-                                        gtf.format=FALSE)
+                                        gtf.format=FALSE, phase=NULL)
 {
     feature <- match.arg(feature)
     what <- switch(feature, exon="exon", cds="CDS", stop_codon="stop codon")
@@ -427,6 +450,8 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
     exon_strand <- rep.int(strand(gr)[exon_IDX], nparent_per_ex)
     exon_start <- rep.int(start(gr)[exon_IDX], nparent_per_ex)
     exon_end <- rep.int(end(gr)[exon_IDX], nparent_per_ex)
+    if (feature == "cds" && !is.null(phase))
+        cds_phase <- rep.int(phase[exon_IDX], nparent_per_ex)
 
     if (!gtf.format) {
         ## Drop orphan exons (or orphan cds or stop codons).
@@ -450,6 +475,8 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
             exon_strand <- exon_strand[keep_idx]
             exon_start <- exon_start[keep_idx]
             exon_end <- exon_end[keep_idx]
+            if (feature == "cds" && !is.null(phase))
+                cds_phase <- cds_phase[keep_idx]
         }
     }
 
@@ -472,6 +499,8 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
         exons$exon_rank <- exon_rank
     } else {
         colnames(exons) <- sub("^exon_", "", colnames(exons))
+        if (feature == "cds" && !is.null(phase))
+            exons$phase <- cds_phase
     }
     exons
 }
@@ -696,8 +725,11 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
 
     exon2cds <- .find_exon_cds(exons, cds)
     cds_name <- cds$name[exon2cds]
+    cds_strand <- cds$strand[exon2cds]
     cds_start <- cds$start[exon2cds]
     cds_end <- cds$end[exon2cds]
+    if (!is.null(cds$phase))
+        cds_phase <- cds$phase[exon2cds]
 
     if (!is.null(stop_codons)) {
         stop_codons_tx_id <- factor(stop_codons$tx_id,
@@ -709,17 +741,21 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
         exon2stop_codon <- .find_exon_cds(exons, stop_codons,
                                           what="stop codon")
         stop_codon_name <- stop_codons$name[exon2stop_codon]
+        stop_codon_strand <- stop_codons$strand[exon2stop_codon]
         stop_codon_start <- stop_codons$start[exon2stop_codon]
         stop_codon_end <- stop_codons$end[exon2stop_codon]
 
-        ## Exons with no CDS get the stop codon as CDS.
+        ## Exons with no CDS get the stop codon as CDS (with phase set to 0).
         replace_idx <- which(is.na(exon2cds) & !is.na(exon2stop_codon))
         cds_name[replace_idx] <- stop_codon_name[replace_idx]
+        cds_strand[replace_idx] <- stop_codon_strand[replace_idx]
         cds_start[replace_idx] <- stop_codon_start[replace_idx]
         cds_end[replace_idx] <- stop_codon_end[replace_idx]
+        if (!is.null(cds$phase))
+            cds_phase[replace_idx] <- 0L
 
         ## Exons with a CDS and a stop codon have the latter merged into the
-        ## former.
+        ## former (with phase adjusted).
         merge_idx <- which(!is.na(exon2cds) & !is.na(exon2stop_codon))
         start1 <- cds_start[merge_idx]
         end1 <- cds_end[merge_idx]
@@ -733,8 +769,20 @@ GFF_FEATURE_TYPES <- c(.GENE_TYPES, .TX_TYPES, .EXON_TYPES,
         }
         cds_start[merge_idx] <- pmin.int(start1, start2)
         cds_end[merge_idx] <- pmax.int(end1, end2)
+        ## Set phase to 0 if the stop codon is upstream of the CDS (i.e.
+        ## if start2 < start1 on the + strand, or end2 > end1 on the - strand).
+        if (!is.null(cds$phase)) {
+            strand1 <- cds_strand[merge_idx]
+            cds_phase[merge_idx] <- ifelse((strand1 == "+" & start2 < start1) |
+                                           (strand1 == "-" & end2 > end1),
+                                           0L, cds_phase[merge_idx])
+        }
     }
-    cbind(exons, cds_name, cds_start, cds_end, stringsAsFactors=FALSE)
+    splicings <- cbind(exons, cds_name, cds_start, cds_end,
+                       stringsAsFactors=FALSE)
+    if (!is.null(cds$phase))
+        splicings$cds_phase <- cds_phase
+    splicings
 }
 
 
@@ -887,6 +935,7 @@ makeTxDbFromGRanges <- function(gr, drop.stop.codons=FALSE, metadata=NULL,
 
     ## Get the metadata columns of interest.
     type <- .get_type(gr_mcols)
+    phase <- .get_phase(gr_mcols)
     gene_id <- .get_gene_id(gr_mcols)
     transcript_id <- .get_transcript_id(gr_mcols, gene_id, type)
     gtf.format <- .is_gtf_format(gr_mcols$ID, gene_id, transcript_id)
@@ -898,7 +947,7 @@ makeTxDbFromGRanges <- function(gr, drop.stop.codons=FALSE, metadata=NULL,
 
     ## Get the gene, cds, stop_codon, exon, and transcript indices.
     gene_IDX <- .get_gene_IDX(type)
-    cds_IDX <- .get_cds_IDX(type)
+    cds_IDX <- .get_cds_IDX(type, phase)
     cds_with_gene_parent_IDX <- .get_cds_with_gene_parent_IDX(cds_IDX,
                                           Parent, gene_IDX, ID,
                                           gtf.format=gtf.format)
@@ -932,7 +981,8 @@ makeTxDbFromGRanges <- function(gr, drop.stop.codons=FALSE, metadata=NULL,
     exons <- .extract_exons_from_GRanges(exon_IDX, gr, ID, Name, Parent,
                                    feature="exon", gtf.format=gtf.format)
     cds <- .extract_exons_from_GRanges(cds_IDX, gr, ID, Name, Parent,
-                                   feature="cds", gtf.format=gtf.format)
+                                   feature="cds", gtf.format=gtf.format,
+                                   phase=phase)
     if (!drop.stop.codons) {
         stop_codons <- .extract_exons_from_GRanges(stop_codon_IDX,
                                    gr, ID, Name, Parent,
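The phase rule added to the splicing code in the makeTxDbFromGRanges.R hunks above can be tried in isolation. The R sketch below uses a hypothetical stand-alone helper (merge_stop_codon_phase is not a GenomicFeatures function). It reproduces only the merge step. The merged CDS takes the union of the CDS and stop codon coordinates. Its phase is reset to 0 when the stop codon lies upstream of the CDS in transcript orientation, because the merged feature then begins with a complete codon. In GFF3, phase counts the bases to skip from the 5' end of a CDS to reach the first complete codon.

```r
## Hypothetical helper mirroring the merge rule in the hunk above.
merge_stop_codon_phase <- function(strand, cds_start, cds_end,
                                   stop_start, stop_end, cds_phase)
{
    ## Upstream in transcript orientation: to the left on the + strand,
    ## to the right on the - strand.
    upstream <- (strand == "+" & stop_start < cds_start) |
                (strand == "-" & stop_end > cds_end)
    data.frame(start = pmin.int(cds_start, stop_start),
               end   = pmax.int(cds_end, stop_end),
               phase = ifelse(upstream, 0L, cds_phase))
}

merged <- merge_stop_codon_phase(strand     = c("+", "-", "+"),
                                 cds_start  = c(100L, 100L, 100L),
                                 cds_end    = c(200L, 200L, 200L),
                                 stop_start = c(201L, 201L, 97L),
                                 stop_end   = c(203L, 203L, 99L),
                                 cds_phase  = c(2L, 1L, 2L))
merged
```

Only the first row keeps its original phase. In the other two rows the stop codon becomes the 5' end of the merged feature.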
diff --git a/R/transcriptsByOverlaps.R b/R/transcriptsByOverlaps.R
index c09d2c5..fa680f7 100644
--- a/R/transcriptsByOverlaps.R
+++ b/R/transcriptsByOverlaps.R
@@ -1,13 +1,13 @@
 ###
 
 setGeneric("transcriptsByOverlaps", signature="x",
-    function(x, ranges, maxgap = 0L, minoverlap = 1L,
+    function(x, ranges, maxgap = -1L, minoverlap = 0L,
              type = c("any", "start", "end"), ...)
         standardGeneric("transcriptsByOverlaps")
 )
 
 setMethod("transcriptsByOverlaps", "TxDb",
-    function(x, ranges, maxgap = 0L, minoverlap = 1L,
+    function(x, ranges, maxgap = -1L, minoverlap = 0L,
              type = c("any", "start", "end"),
              columns = c("tx_id", "tx_name"))
         subsetByOverlaps(transcripts(x, columns = columns), ranges,
@@ -16,13 +16,13 @@ setMethod("transcriptsByOverlaps", "TxDb",
 )
 
 setGeneric("exonsByOverlaps", signature="x",
-    function(x, ranges, maxgap = 0L, minoverlap = 1L,
+    function(x, ranges, maxgap = -1L, minoverlap = 0L,
              type = c("any", "start", "end"), ...)
         standardGeneric("exonsByOverlaps")
 )
 
 setMethod("exonsByOverlaps", "TxDb",
-    function(x, ranges, maxgap = 0L, minoverlap = 1L,
+    function(x, ranges, maxgap = -1L, minoverlap = 0L,
              type = c("any", "start", "end"),
              columns = "exon_id")
         subsetByOverlaps(exons(x, columns = columns), ranges,
@@ -31,13 +31,13 @@ setMethod("exonsByOverlaps", "TxDb",
 )
 
 setGeneric("cdsByOverlaps", signature="x",
-    function(x, ranges, maxgap = 0L, minoverlap = 1L,
+    function(x, ranges, maxgap = -1L, minoverlap = 0L,
              type = c("any", "start", "end"), ...)
         standardGeneric("cdsByOverlaps")
 )
 
 setMethod("cdsByOverlaps", "TxDb",
-    function(x, ranges, maxgap = 0L, minoverlap = 1L,
+    function(x, ranges, maxgap = -1L, minoverlap = 0L,
              type = c("any", "start", "end"),
              columns = "cds_id")
         subsetByOverlaps(cds(x, columns = columns), ranges,
diff --git a/build/vignette.rds b/build/vignette.rds
index cce68e7..f4a1e88 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/inst/doc/GenomicFeatures.R b/inst/doc/GenomicFeatures.R
index 18232c4..a7a1fae 100644
--- a/inst/doc/GenomicFeatures.R
+++ b/inst/doc/GenomicFeatures.R
@@ -1,111 +1,122 @@
-## ----style, eval=TRUE, echo=FALSE, results='asis'---------------------------------------
+## ----style, eval=TRUE, echo=FALSE, results='asis'--------------------------
 BiocStyle::latex()
 
-## ----loadGenomicFeatures----------------------------------------------------------------
-library("GenomicFeatures")
+## ----loadGenomicFeatures---------------------------------------------------
+suppressPackageStartupMessages(library('GenomicFeatures'))
 
-## ----loadDb-----------------------------------------------------------------------------
+## ----loadDb----------------------------------------------------------------
 samplefile <- system.file("extdata", "hg19_knownGene_sample.sqlite",
                           package="GenomicFeatures")
 txdb <- loadDb(samplefile)
 txdb
 
-## ----loadPackage------------------------------------------------------------------------
+## ----loadPackage-----------------------------------------------------------
 library(TxDb.Hsapiens.UCSC.hg19.knownGene)
 txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene #shorthand (for convenience)
 txdb
 
-## ----seqlevels--------------------------------------------------------------------------
+## ----seqlevels-------------------------------------------------------------
 head(seqlevels(txdb))
 
-## ----seqlevels2-------------------------------------------------------------------------
+## ----seqlevels2------------------------------------------------------------
 seqlevels(txdb) <- "chr1"
 
-## ----seqlevels3-------------------------------------------------------------------------
+## ----seqlevels3------------------------------------------------------------
 seqlevels(txdb) <- seqlevels0(txdb)
 
-## ----seqlevels4-------------------------------------------------------------------------
+## ----seqlevels4------------------------------------------------------------
 seqlevels(txdb) <- "chr15"
+seqlevels(txdb)
 
-## ----selectExample----------------------------------------------------------------------
+## ----selectExample---------------------------------------------------------
 keys <- c("100033416", "100033417", "100033420")
 columns(txdb)
 keytypes(txdb)
 select(txdb, keys = keys, columns="TXNAME", keytype="GENEID")
 
-## ----selectExercise---------------------------------------------------------------------
+## ----selectExercise--------------------------------------------------------
 columns(txdb)
 cols <- c("TXNAME", "TXSTRAND", "TXCHROM")
 select(txdb, keys=keys, columns=cols, keytype="GENEID")
 
-## ----transcripts1-----------------------------------------------------------------------
+## ----transcripts1----------------------------------------------------------
 GR <- transcripts(txdb)
 GR[1:3]
 
-## ----transcripts2-----------------------------------------------------------------------
+## ----transcripts2----------------------------------------------------------
+tx_strand <- strand(GR)
+tx_strand
+sum(runLength(tx_strand))
+length(GR)
+
+## ----transcripts3----------------------------------------------------------
 GR <- transcripts(txdb, filter=list(tx_chrom = "chr15", tx_strand = "+"))
 length(GR)
 unique(strand(GR))
 
-## ----exonsExer1-------------------------------------------------------------------------
+## ----transcripts4----------------------------------------------------------
+PR <- promoters(txdb, upstream=2000, downstream=400)
+PR
+
+## ----exonsExer1------------------------------------------------------------
 EX <- exons(txdb)
 EX[1:4]
 length(EX)
 length(GR)
 
-## ----transcriptsBy----------------------------------------------------------------------
+## ----transcriptsBy---------------------------------------------------------
 GRList <- transcriptsBy(txdb, by = "gene")
 length(GRList)
 names(GRList)[10:13]
 GRList[11:12]
 
-## ----exonsBy----------------------------------------------------------------------------
+## ----exonsBy---------------------------------------------------------------
 GRList <- exonsBy(txdb, by = "tx")
 length(GRList)
 names(GRList)[10:13]
 GRList[[12]]
 
-## ----internalID-------------------------------------------------------------------------
+## ----internalID------------------------------------------------------------
 GRList <- exonsBy(txdb, by = "tx")
 tx_ids <- names(GRList)
 head(select(txdb, keys=tx_ids, columns="TXNAME", keytype="TXID"))
 
-## ----introns-UTRs-----------------------------------------------------------------------
+## ----introns-UTRs----------------------------------------------------------
 length(intronsByTranscript(txdb))
 length(fiveUTRsByTranscript(txdb))
 length(threeUTRsByTranscript(txdb))
 
-## ----extract----------------------------------------------------------------------------
+## ----extract---------------------------------------------------------------
 library(BSgenome.Hsapiens.UCSC.hg19)
 tx_seqs1 <- extractTranscriptSeqs(Hsapiens, TxDb.Hsapiens.UCSC.hg19.knownGene,
                                   use.names=TRUE)
 
-## ----translate1-------------------------------------------------------------------------
+## ----translate1------------------------------------------------------------
 suppressWarnings(translate(tx_seqs1))
 
-## ----betterTranslation------------------------------------------------------------------
+## ----betterTranslation-----------------------------------------------------
 cds_seqs <- extractTranscriptSeqs(Hsapiens,
                                   cdsBy(txdb, by="tx", use.names=TRUE))
 translate(cds_seqs)
 
-## ----supportedUCSCtables----------------------------------------------------------------
+## ----supportedUCSCtables---------------------------------------------------
 supportedUCSCtables(genome="mm9")
 
-## ----makeTxDbFromUCSC, eval=FALSE-------------------------------------------------------
+## ----makeTxDbFromUCSC, eval=FALSE------------------------------------------
 #  mm9KG_txdb <- makeTxDbFromUCSC(genome="mm9", tablename="knownGene")
 
-## ----discoverChromNames-----------------------------------------------------------------
+## ----discoverChromNames----------------------------------------------------
 head(getChromInfoFromUCSC("hg19"))
 
-## ----makeTxDbFromBiomart, eval=FALSE----------------------------------------------------
+## ----makeTxDbFromBiomart, eval=FALSE---------------------------------------
 #  mmusculusEnsembl <- makeTxDbFromBiomart(dataset="mmusculus_gene_ensembl")
 
-## ----saveDb-1, eval=FALSE---------------------------------------------------------------
+## ----saveDb-1, eval=FALSE--------------------------------------------------
 #  saveDb(mm9KG_txdb, file="fileName.sqlite")
 
-## ----loadDb-1, eval=FALSE---------------------------------------------------------------
+## ----loadDb-1, eval=FALSE--------------------------------------------------
 #  mm9KG_txdb <- loadDb("fileName.sqlite")
 
-## ----SessionInfo, echo=FALSE------------------------------------------------------------
+## ----SessionInfo, echo=FALSE-----------------------------------------------
 sessionInfo()
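The new transcripts2 chunk relies on the fact that a run-length encoded vector stores each run once, along with its length. The run lengths therefore always add up to the length of the original vector. The sketch below illustrates this with base R's rle() on a made-up strand vector. rle() is a self-contained stand-in here. The vignette itself uses the Bioconductor Rle class and its runLength() accessor from S4Vectors.

```r
## Strands of six made-up transcripts, in genomic order.
tx_strand <- c("+", "+", "-", "-", "-", "+")

## rle() stores the vector as runs of identical values.
runs <- rle(tx_strand)
runs$values    # "+" "-" "+"
runs$lengths   # 2 3 1

## The run lengths add up to the number of transcripts ...
sum(runs$lengths) == length(tx_strand)
## ... and expanding the runs gives back the original vector.
identical(inverse.rle(runs), tx_strand)
```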
 
diff --git a/inst/doc/GenomicFeatures.Rnw b/inst/doc/GenomicFeatures.Rnw
index 2f2f047..4ee0386 100644
--- a/inst/doc/GenomicFeatures.Rnw
+++ b/inst/doc/GenomicFeatures.Rnw
@@ -43,11 +43,10 @@ BioMart\footnote{\url{http://www.biomart.org/}} data resources. The
 package is useful for ChIP-chip, ChIP-seq, and RNA-seq analyses.
 
 <<loadGenomicFeatures>>=
-library("GenomicFeatures")
+suppressPackageStartupMessages(library('GenomicFeatures'))
 @
 
 
-
 \section{\Rclass{TxDb} Objects}
 
 The \Rpackage{GenomicFeatures} package uses \Rclass{TxDb}
@@ -66,7 +65,6 @@ gene identifiers.
 
 \section{Retrieving Data from \Rclass{TxDb} objects}
 
-
 \subsection{Loading Transcript Data}
 
 There are two ways that users can load pre-existing data to generate a
@@ -90,7 +88,6 @@ txdb
 In this case, the \Rclass{TxDb} object has been returned by
 the \Rfunction{loadDb} method.
 
-
 More commonly, however, we expect that users will just load a
 TxDb annotation package like this:
 
@@ -105,16 +102,12 @@ object, and by default that object will have the same name as the
 package itself.
 
 
-
-
-
 \subsection{Pre-filtering data based on Chromosomes}
 It is possible to filter the data that is returned from a
 \Rclass{TxDb} object based on its chromosome.  This can be a
 useful way to limit the things that are returned if you are only
 interested in studying a handful of chromosomes.
 
-
 To determine which chromosomes are currently active, use the
 \Rfunction{seqlevels} method.  For example:
 
@@ -141,7 +134,6 @@ to the seqlevels stored in the db), then set the seqlevels to
 seqlevels(txdb) <- seqlevels0(txdb)
 @ 
 
-
 \begin{Exercise}
 Use \Rfunction{seqlevels} to set only chromosome 15 to be active.  BTW,
 the rest of this vignette will assume you have succeeded at this.
@@ -149,11 +141,11 @@ the rest of this vignette will assume you have succeeded at this.
 \begin{Solution}
 <<seqlevels4>>=
 seqlevels(txdb) <- "chr15"
+seqlevels(txdb)
 @
 \end{Solution}
 
 
-
 \subsection{Retrieving data using the select method}
 
 The \Rclass{TxDb} objects inherit from \Rclass{AnnotationDb}
@@ -189,9 +181,6 @@ select(txdb, keys=keys, columns=cols, keytype="GENEID")
 \end{Solution}
 
 
-
-%% TODO: add exercises to the sections that follow
-
 \subsection{Methods for returning \Rclass{GRanges} objects}
 
 Retrieving data with select is useful, but sometimes it is more
@@ -223,17 +212,40 @@ strand on the left side and then present related metadata on the right
 side.  At the bottom, the seqlengths display all the possible seqnames
 along with the length of each sequence.
 
+The \Rfunction{strand} function is used to obtain the strand
+information from the transcripts.  The sum of the run lengths of the
+\Rclass{Rle} object that \Rfunction{strand} returns is equal to the
+length of the \Rclass{GRanges} object.
+
+<<transcripts2>>=
+tx_strand <- strand(GR)
+tx_strand
+sum(runLength(tx_strand))
+length(GR)
+@
 
 In addition, the \Rfunction{transcripts} function can be used to
 retrieve a subset of the available transcripts, such as those on the
 $+$-strand of chromosome 15.
 
-<<transcripts2>>=
+<<transcripts3>>=
 GR <- transcripts(txdb, filter=list(tx_chrom = "chr15", tx_strand = "+"))
 length(GR)
 unique(strand(GR))
 @
 
+The \Rfunction{promoters} function computes a \Rclass{GRanges} object
+that spans the promoter region around the transcription start site
+of each transcript in a \Rclass{TxDb} object.  The \Rcode{upstream}
+and \Rcode{downstream} arguments define the number of bases upstream
+and downstream of the transcription start site that make up the
+promoter region.
+
+<<transcripts4>>=
+PR <- promoters(txdb, upstream=2000, downstream=400)
+PR
+@
+
 The \Rfunction{exons} and \Rfunction{cds} functions can also be used
 in a similar fashion to retrieve genomic coordinates for exons and
 coding sequences.
@@ -275,7 +287,7 @@ class object.  As with \Rclass{GRanges} objects, you can learn more
 about these objects by reading the \Rpackage{GenomicRanges}
 introductory vignette.  The \Rfunction{show} method for a
 \Rclass{GRangesList} object will display as a list of \Rclass{GRanges}
-objects.  And, at the bottom the seqlengths will be displayed once for
+objects.  And, at the bottom the seqinfo will be displayed once for
 the entire list.
 
 For each of these three functions, there is a limited set of options
@@ -284,7 +296,6 @@ For the \Rfunction{transcriptsBy} function, you can group by gene,
 exon or cds, whereas the \Rfunction{exonsBy} and \Rfunction{cdsBy}
 functions can only group by transcript (tx) or gene.
 
-
 So as a further example, to extract all the exons for each transcript
 you can call:
 
@@ -313,7 +324,6 @@ was the case in the 2nd example.  Even though the results will
 sometimes have to come back to you as synthetic IDs, you can still
 always retrieve the original IDs.  
 
-
 \begin{Exercise}
 Starting with the tx\_ids that are the names of the GRList object we
 just made, use \Rfunction{select} to retrieve that matching transcript
@@ -328,7 +338,6 @@ head(select(txdb, keys=tx_ids, columns="TXNAME", keytype="TXID"))
 @ 
 \end{Solution}
 
-
 Finally, the order of the results in a \Rclass{GRangesList} object can
 vary with the way in which things were grouped. In most cases the
 grouped elements of the \Rclass{GRangesList} object will be listed in
@@ -355,16 +364,12 @@ length(threeUTRsByTranscript(txdb))
 @
 
 
-
-
-
 \subsection{Getting the actual sequence data}
 
 The \Rpackage{GenomicFeatures} package also provides
 functions for converting from ranges to actual sequence (when paired
 with an appropriate \Rpackage{BSgenome} package).
 
-
 <<extract>>=
 library(BSgenome.Hsapiens.UCSC.hg19)
 tx_seqs1 <- extractTranscriptSeqs(Hsapiens, TxDb.Hsapiens.UCSC.hg19.knownGene,
@@ -395,7 +400,6 @@ translate(cds_seqs)
 \end{Solution}
 
 
-
 \section{Creating New \Rclass{TxDb} Objects or Packages}
 
 The \Rpackage{GenomicFeatures} package provides functions to create
@@ -470,6 +474,13 @@ Instead, we suggest that you save your annotation objects and label
 them with an appropriate time stamp so as to facilitate reproducible
 research.
 
+\subsection{Using \Rfunction{makeTxDbFromEnsembl}}
+
+The \Rfunction{makeTxDbFromEnsembl} function creates a \Rclass{TxDb} object
+for a given organism by importing the genomic locations of its transcripts,
+exons, CDS, and genes from an Ensembl database.
+
+See \Rcode{?makeTxDbFromEnsembl} for more information.
 
 \subsection{Using \Rfunction{makeTxDbFromGFF}}
 
diff --git a/inst/doc/GenomicFeatures.pdf b/inst/doc/GenomicFeatures.pdf
index 31273e2..df1f51b 100644
Binary files a/inst/doc/GenomicFeatures.pdf and b/inst/doc/GenomicFeatures.pdf differ
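The transcripts4 chunk added to the vignette computes promoter windows with promoters(txdb, upstream=2000, downstream=400). The base R sketch below shows the coordinate arithmetic behind such a window. It assumes 1-based inclusive coordinates. promoter_window is a hypothetical helper, not the GenomicRanges implementation. It treats any strand other than "-" like "+", keeps no metadata, and does not trim windows that extend past a chromosome end.

```r
## Hypothetical helper: promoter window around each transcription start site.
promoter_window <- function(start, end, strand,
                            upstream = 2000L, downstream = 400L)
{
    ## Any strand other than "-" is treated like "+" in this sketch.
    plus <- strand != "-"
    ## The TSS is the 5' end: 'start' on the + strand, 'end' on the - strand.
    tss <- ifelse(plus, start, end)
    data.frame(start  = ifelse(plus, tss - upstream, tss - downstream + 1L),
               end    = ifelse(plus, tss + downstream - 1L, tss + upstream),
               strand = strand)
}

pw <- promoter_window(start = c(10001L, 10001L), end = c(15000L, 15000L),
                      strand = c("+", "-"))
pw
```

Each window is upstream + downstream bases wide (2400 here), and the TSS falls inside the downstream part.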
diff --git a/inst/extdata/GFF3_files/a.sqlite b/inst/extdata/GFF3_files/a.sqlite
index 8ef5be8..ebb3f68 100644
Binary files a/inst/extdata/GFF3_files/a.sqlite and b/inst/extdata/GFF3_files/a.sqlite differ
diff --git a/inst/extdata/GFF3_files/dmel-1000-r5.11.filtered.sqlite b/inst/extdata/GFF3_files/dmel-1000-r5.11.filtered.sqlite
index 84e2e98..38b1a52 100644
Binary files a/inst/extdata/GFF3_files/dmel-1000-r5.11.filtered.sqlite and b/inst/extdata/GFF3_files/dmel-1000-r5.11.filtered.sqlite differ
diff --git a/inst/extdata/GTF_files/Aedes_aegypti.partial.sqlite b/inst/extdata/GTF_files/Aedes_aegypti.partial.sqlite
index 92284ae..4ab19a2 100644
Binary files a/inst/extdata/GTF_files/Aedes_aegypti.partial.sqlite and b/inst/extdata/GTF_files/Aedes_aegypti.partial.sqlite differ
diff --git a/inst/script/makeTxDbs.R b/inst/script/makeTxDbs.R
index b5db4f1..3ad0c43 100644
--- a/inst/script/makeTxDbs.R
+++ b/inst/script/makeTxDbs.R
@@ -12,44 +12,108 @@ TxDbPackagesForRelease <-
                          "<maintainer at bioconductor.org>"),
              author="Bioconductor Core Team")
 {
-    ## Build new tracks for Bioconductor 3.5
-    cat("building galGal5 \n")
+
+    ##
+    ## Start building packages for Bioconductor 3.7
+    ##
+
+    cat("building bosTau8\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="bosTau8",
+                            tablename="refGene")    
+
+    cat("building ce11\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="ce11",
+                            tablename="refGene")    
+
+    cat("building canFam3\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="canFam3",
+                            tablename="refGene")    
+
+    cat("building danRer10\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="danRer10",
+                            tablename="refGene")    
+
+    cat("building galGal4\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="galGal4",
+                            tablename="refGene")    
+
+    cat("building galGal5\n")
     makeTxDbPackageFromUCSC(version=version,
                             maintainer=maintainer,
                             author=author,
                             destDir=destDir,
                             genome="galGal5",
-                            tablename="refGene")
+                            tablename="refGene")    
 
+    cat("building rheMac3\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="rheMac3",
+                            tablename="refGene")    
 
-    ## Update live tracks for Bioconductor 3.5
-    cat("building ce11 \n")
+    cat("building rheMac8\n")
     makeTxDbPackageFromUCSC(version=version,
                             maintainer=maintainer,
                             author=author,
                             destDir=destDir,
-                            genome="ce11",
-                            tablename="refGene")
-    cat("building dm6 \n")
+                            genome="rheMac8",
+                            tablename="refGene")    
+
+    cat("building panTro4\n")
     makeTxDbPackageFromUCSC(version=version,
                             maintainer=maintainer,
                             author=author,
                             destDir=destDir,
-                            genome="dm6",
-                            tablename="ensGene")
-    cat("building rn5 \n")
+                            genome="panTro4",
+                            tablename="refGene")    
+
+    cat("building rn5\n")
     makeTxDbPackageFromUCSC(version=version,
                             maintainer=maintainer,
                             author=author,
                             destDir=destDir,
                             genome="rn5",
-                            tablename="refGene")
+                            tablename="refGene")    
 
-    cat("building rn6 \n")
+    cat("building rn6\n")
     makeTxDbPackageFromUCSC(version=version,
                             maintainer=maintainer,
                             author=author,
                             destDir=destDir,
                             genome="rn6",
-                            tablename="refGene")
+                            tablename="refGene")    
+
+    cat("building susScr3\n")
+    makeTxDbPackageFromUCSC(version=version,
+                            maintainer=maintainer,
+                            author=author,
+                            destDir=destDir,
+                            genome="susScr3",
+                            tablename="refGene")    
+
+    ##
+    ## End building packages for Bioconductor 3.7
+    ##
 }
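The twelve makeTxDbPackageFromUCSC() calls above differ only in the genome name, so the same work could be written as a loop. The sketch below is a hypothetical refactoring and is not part of the script. build_refGene_packages and record_build are illustrative names. The builder is passed in as an argument so that the loop can be checked offline with a stub. In the script itself, one would pass makeTxDbPackageFromUCSC together with the version, maintainer, author, and destDir values used above.

```r
## Genomes built from the refGene table in the script above.
refGene_genomes <- c("bosTau8", "ce11", "canFam3", "danRer10", "galGal4",
                     "galGal5", "rheMac3", "rheMac8", "panTro4", "rn5",
                     "rn6", "susScr3")

## Hypothetical helper: call 'builder' once per genome, forwarding the
## shared arguments given in '...'.
build_refGene_packages <- function(genomes, builder, ...)
{
    for (genome in genomes) {
        cat("building ", genome, "\n", sep="")
        builder(..., genome=genome, tablename="refGene")
    }
    invisible(genomes)
}

## In the script this would be called along the lines of:
## build_refGene_packages(refGene_genomes,
##                        GenomicFeatures::makeTxDbPackageFromUCSC,
##                        version=version, maintainer=maintainer,
##                        author=author, destDir=destDir)

## Offline check with a stub builder that only records what it was asked for.
built <- character(0)
record_build <- function(genome, tablename, ...)
    built <<- c(built, paste(genome, tablename))
build_refGene_packages(refGene_genomes[1:2], record_build, version="0.0.1")
built
```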
diff --git a/man/DEFAULT_CIRC_SEQS.Rd b/man/DEFAULT_CIRC_SEQS.Rd
index 78ec76f..77f3116 100644
--- a/man/DEFAULT_CIRC_SEQS.Rd
+++ b/man/DEFAULT_CIRC_SEQS.Rd
@@ -12,7 +12,8 @@ append to it as needed.
 }
 \seealso{
   \code{\link{makeTxDbFromUCSC}},
-  \code{\link{makeTxDbFromBiomart}}
+  \code{\link{makeTxDbFromBiomart}},
+  \code{\link{makeTxDbFromEnsembl}}
 }
 
 \examples{
diff --git a/man/TxDb-class.Rd b/man/TxDb-class.Rd
index 9f36517..bd2af6c 100644
--- a/man/TxDb-class.Rd
+++ b/man/TxDb-class.Rd
@@ -25,13 +25,6 @@
 
   See \code{?\link{FeatureDb}} for a more generic container for storing
   genomic locations of an arbitrary type of genomic features.
-
-  See \code{?\link{makeTxDbFromUCSC}} and
-  \code{?\link{makeTxDbFromBiomart}} for convenient ways to
-  make TxDb objects from UCSC or BioMart online resources.
-
-  See \code{?\link{makeTxDbFromGFF}} for making a TxDb
-  object from annotations available as a GFF3 or GTF file.
 }
 
 \section{Methods}{
@@ -97,21 +90,23 @@
 \seealso{
   \itemize{
     \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromBiomart}},
-          \code{\link{makeTxDbFromGRanges}}, and \code{\link{makeTxDbFromGFF}},
-          for convenient ways to make a \link{TxDb} object from UCSC or BioMart
-          online resources, or from a \link[GenomicRanges]{GRanges} object,
-          or from a GFF or GTF file.
+          and \code{\link{makeTxDbFromEnsembl}}, for making a \link{TxDb}
+          object from online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
 
     \item \code{\link[AnnotationDbi]{saveDb}} and
           \code{\link[AnnotationDbi]{loadDb}} in the \pkg{AnnotationDbi}
           package for saving and loading a TxDb object as an SQLite file.
 
     \item \code{\link{transcripts}}, \code{\link{transcriptsBy}},
-          and \code{\link{transcriptsByOverlaps}},
-          for how to extract genomic features from a TxDb object.
+          and \code{\link{transcriptsByOverlaps}}, for extracting
+          genomic feature locations from a \link{TxDb}-like object.
 
     \item \code{\link{transcriptLengths}} for extracting the transcript
-          lengths from a \link{TxDb} object.
+          lengths (and other metrics) from a \link{TxDb} object.
 
     \item \link[GenomicFeatures]{select-methods} for how to use the
           simple "select" interface to extract information from a
diff --git a/man/coordinate-mapping-methods.Rd b/man/coordinate-mapping-methods.Rd
index 7c97576..c9088c1 100644
--- a/man/coordinate-mapping-methods.Rd
+++ b/man/coordinate-mapping-methods.Rd
@@ -109,7 +109,8 @@
       \item{cds}
       \item{genes}
       \item{promoters}
-      \item{disjointExons}
+      \item{exonicParts}
+      \item{intronicParts}
       \item{transcriptsBy}
       \item{exonsBy}
       \item{cdsBy}
diff --git a/man/coverageByTranscript.Rd b/man/coverageByTranscript.Rd
index 90128aa..53c439a 100644
--- a/man/coverageByTranscript.Rd
+++ b/man/coverageByTranscript.Rd
@@ -49,10 +49,11 @@ pcoverageByTranscript(x, transcripts, ignore.strand=FALSE, ...)
     \code{\link{exonsBy}}, then the exons are guaranteed to be ordered by
     ascending rank. See \code{?\link{exonsBy}} for more information.
 
-    Alternatively \code{transcripts} can be any object for which
-    \code{\link{exonsBy}} is implemented (e.g. a \link{TxDb} object), in
-    which case it is replaced with the \link[GenomicRanges]{GRangesList} object
-    returned by \code{\link{exonsBy}(transcripts, by="tx", use.names=TRUE)}.
+    Alternatively, \code{transcripts} can be a \link{TxDb} object, or any
+    \link{TxDb}-like object that supports the \code{\link{exonsBy}()}
+    extractor (e.g. an \link[ensembldb]{EnsDb} object). In this case it
+    is replaced with the \link[GenomicRanges]{GRangesList} object returned
+    by \code{\link{exonsBy}(transcripts, by="tx", use.names=TRUE)}.
 
     For \code{pcoverageByTranscript}, \code{transcripts} should have the
     length of \code{x} or length 1. If the latter, it is recycled to the
@@ -88,11 +89,15 @@ pcoverageByTranscript(x, transcripts, ignore.strand=FALSE, ...)
 
 \seealso{
   \itemize{
-    \item \code{\link{extractTranscriptSeqs}} for extracting transcript
-          (or CDS) sequences from chromosome sequences.
+    \item \code{\link{transcripts}}, \code{\link{transcriptsBy}},
+          and \code{\link{transcriptsByOverlaps}}, for extracting
+          genomic feature locations from a \link{TxDb}-like object.
 
     \item \code{\link{transcriptLengths}} for extracting the transcript
-          lengths from a \link{TxDb} object.
+          lengths (and other metrics) from a \link{TxDb} object.
+
+    \item \code{\link{extractTranscriptSeqs}} for extracting transcript
+          (or CDS) sequences from chromosome sequences.
 
     \item The \link[IRanges]{RleList} class defined and documented in the
           \pkg{IRanges} package.
diff --git a/man/disjointExons.Rd b/man/disjointExons.Rd
index 9604d6e..7960e86 100644
--- a/man/disjointExons.Rd
+++ b/man/disjointExons.Rd
@@ -10,6 +10,10 @@
 \description{
   \code{disjointExons} extracts the non-overlapping exon parts from a
   \link{TxDb} object or any other supported object.
+
+  WARNING: \code{disjointExons} is superseded by \code{\link{exonicParts}}
+  and will be deprecated soon. Please use the improved \code{\link{exonicParts}}
+  instead.
 }
 
 \usage{
@@ -61,12 +65,7 @@ disjointExons(x, ...)
 }
 
 \seealso{
-  \itemize{
-    \item \code{\link{transcripts}}, \code{\link{transcriptsBy}}, and
-          \code{\link{transcriptsByOverlaps}} for the core genomic features
-          extractors.
-    \item The \link{TxDb} class.
-  }
+  \code{\link{exonicParts}} for an improved version of \code{disjointExons}.
 }
 
 \examples{
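Both disjointExons() and its replacement exonicParts() return non-overlapping exonic parts. The exonicParts documentation describes these as the result of calling disjoin() on the exons. As a rough illustration only, the base R sketch below disjoins integer ranges on a single sequence. disjoin_ranges is a hypothetical helper, not the IRanges implementation, and it ignores strand, gene links, and metadata columns. Every range start and every range end + 1 becomes a cut point. The pieces between consecutive cut points are kept only if some input range covers them.

```r
## Hypothetical helper: disjoint parts of a set of 1-based inclusive ranges.
disjoin_ranges <- function(start, end)
{
    ## Every range boundary is a potential cut point.
    cuts <- sort(unique(c(start, end + 1L)))
    part_start <- head(cuts, -1L)
    part_end <- tail(cuts, -1L) - 1L
    ## Each piece is either fully inside or fully outside each input
    ## range, so keep the pieces covered by at least one range.
    covered <- vapply(seq_along(part_start), function(i)
        any(start <= part_start[i] & end >= part_end[i]), logical(1))
    data.frame(start = part_start[covered], end = part_end[covered])
}

## Two overlapping "exons" and one separate exon.
parts <- disjoin_ranges(start = c(1L, 5L, 20L), end = c(10L, 15L, 25L))
parts
```

The gap between positions 16 and 19 is not covered by any input range, so it does not appear in the result.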
diff --git a/man/exonicParts.Rd b/man/exonicParts.Rd
new file mode 100644
index 0000000..dc6b302
--- /dev/null
+++ b/man/exonicParts.Rd
@@ -0,0 +1,163 @@
+\name{exonicParts}
+
+\alias{exonicParts}
+\alias{intronicParts}
+
+\title{
+  Extract non-overlapping exonic or intronic parts from a TxDb-like object
+}
+
+\description{
+  \code{exonicParts} and \code{intronicParts} extract the non-overlapping
+  (a.k.a. disjoint) exonic or intronic parts from a \link{TxDb}-like object.
+}
+
+\usage{
+exonicParts(txdb, linked.to.single.gene.only=FALSE)
+intronicParts(txdb, linked.to.single.gene.only=FALSE)
+}
+
+\arguments{
+  \item{txdb}{
+    A \link{TxDb} object, or any \link{TxDb}-like object that supports the
+    \code{\link{transcripts}()} and \code{\link{exonsBy}()} extractors
+    (e.g. an \link[ensembldb]{EnsDb} object).
+  }
+  \item{linked.to.single.gene.only}{
+    \code{TRUE} or \code{FALSE}.
+
+    If \code{FALSE} (the default), then the disjoint parts are obtained
+    by calling \code{\link[IRanges]{disjoin}()} on all the exons (or introns)
+    in \code{txdb}, including exons (or introns) not linked to any gene or
+    linked to more than one gene.
+
+    If \code{TRUE}, then the disjoint parts are obtained in 2 steps:
+    \enumerate{
+      \item call \code{\link[IRanges]{disjoin}()} on the exons (or introns)
+            linked to \emph{at least one gene},
+
+      \item then drop the parts linked to more than one gene from
+            the set of exonic (or intronic) parts obtained previously.
+    }
+  }
+}
+
+\value{
+  \code{exonicParts} returns a disjoint and strictly sorted
+  \link[GenomicRanges]{GRanges} object with 1 range per exonic part
+  and with metadata columns \code{tx_id}, \code{tx_name}, \code{gene_id},
+  \code{exon_id}, \code{exon_name}, and \code{exon_rank}.
+
+  \code{intronicParts} returns a disjoint and strictly sorted
+  \link[GenomicRanges]{GRanges} object with 1 range per intronic part
+  and with metadata columns \code{tx_id}, \code{tx_name}, and \code{gene_id}.
+}
+
+\note{
+  \code{exonicParts} is a replacement for \code{\link{disjointExons}} with
+  the following differences/improvements:
+  \itemize{
+    \item Argument \code{linked.to.single.gene.only} in \code{exonicParts}
+          replaces argument \code{aggregateGenes} in \code{disjointExons},
+          but has the opposite meaning, i.e.
+          \code{exonicParts(txdb, linked.to.single.gene.only=TRUE)}
+          returns the same exonic parts as
+          \code{disjointExons(txdb, aggregateGenes=FALSE)}.
+
+    \item Unlike \code{disjointExons(txdb, aggregateGenes=TRUE)},
+          \code{exonicParts(txdb, linked.to.single.gene.only=FALSE)} does
+          NOT discard exon parts that are not linked to a gene.
+
+    \item \code{exonicParts} is almost twice as efficient as
+          \code{disjointExons}.
+
+    \item \code{exonicParts} works out-of-the-box on any \link{TxDb}-like
+          object that supports the \code{\link{transcripts}()} and
+          \code{\link{exonsBy}()} extractors (e.g. on an
+          \link[ensembldb]{EnsDb} object).
+  }
+}
+
+\author{Hervé Pagès}
+
+\seealso{
+  \itemize{
+    \item \code{\link[IRanges]{disjoin}} in the \pkg{IRanges} package.
+
+    \item \code{\link{transcripts}}, \code{\link{transcriptsBy}},
+          and \code{\link{transcriptsByOverlaps}}, for extracting
+          genomic feature locations from a \link{TxDb}-like object.
+
+    \item \code{\link{transcriptLengths}} for extracting the transcript
+          lengths (and other metrics) from a \link{TxDb} object.
+
+    \item \code{\link{extractTranscriptSeqs}} for extracting transcript
+          (or CDS) sequences from chromosome sequences.
+
+    \item \code{\link{coverageByTranscript}} for computing coverage by
+          transcript (or CDS) of a set of ranges.
+
+    \item The \link{TxDb} class.
+  }
+}
+
+\examples{
+library(TxDb.Hsapiens.UCSC.hg19.knownGene)
+txdb <- TxDb.Hsapiens.UCSC.hg19.knownGene
+
+## ---------------------------------------------------------------------
+## exonicParts()
+## ---------------------------------------------------------------------
+
+exonic_parts1 <- exonicParts(txdb)
+exonic_parts1
+
+## Mapping from exonic parts to genes is many-to-many:
+mcols(exonic_parts1)$gene_id
+table(lengths(mcols(exonic_parts1)$gene_id))
+## A human exonic part can be linked to 0 to 22 known genes!
+
+exonic_parts2 <- exonicParts(txdb, linked.to.single.gene.only=TRUE)
+exonic_parts2
+
+## Mapping from exonic parts to genes is now many-to-one:
+class(mcols(exonic_parts2)$gene_id)
+
+## Sanity checks:
+stopifnot(isDisjoint(exonic_parts1), isStrictlySorted(exonic_parts1))
+stopifnot(isDisjoint(exonic_parts2), isStrictlySorted(exonic_parts2))
+stopifnot(all(exonic_parts2 \%within\% reduce(exonic_parts1)))
+stopifnot(identical(
+    lengths(mcols(exonic_parts1)$gene_id) == 1L,
+    exonic_parts1 \%within\% exonic_parts2
+))
+
+## ---------------------------------------------------------------------
+## intronicParts()
+## ---------------------------------------------------------------------
+
+intronic_parts1 <- intronicParts(txdb)
+intronic_parts1
+
+## Mapping from intronic parts to genes is many-to-many:
+mcols(intronic_parts1)$gene_id
+table(lengths(mcols(intronic_parts1)$gene_id))
+## A human intronic part can be linked to 0 to 22 known genes!
+
+intronic_parts2 <- intronicParts(txdb, linked.to.single.gene.only=TRUE)
+intronic_parts2
+
+## Mapping from intronic parts to genes is now many-to-one:
+class(mcols(intronic_parts2)$gene_id)
+
+## Sanity checks:
+stopifnot(isDisjoint(intronic_parts1), isStrictlySorted(intronic_parts1))
+stopifnot(isDisjoint(intronic_parts2), isStrictlySorted(intronic_parts2))
+stopifnot(all(intronic_parts2 \%within\% reduce(intronic_parts1)))
+stopifnot(identical(
+    lengths(mcols(intronic_parts1)$gene_id) == 1L,
+    intronic_parts1 \%within\% intronic_parts2
+))
+}
+
+\keyword{manip}
diff --git a/man/extractTranscriptSeqs.Rd b/man/extractTranscriptSeqs.Rd
index 0b0acc5..4f54cb7 100644
--- a/man/extractTranscriptSeqs.Rd
+++ b/man/extractTranscriptSeqs.Rd
@@ -44,9 +44,9 @@ extractTranscriptSeqs(x, transcripts, ...)
             representing a collection of chromosomes, then \code{transcripts}
             must be a \link[GenomicRanges]{GRangesList} object or any object
             for which \code{\link{exonsBy}} is implemented (e.g. a \link{TxDb}
-            object). If the latter, then it's first turned into a
-            \link[GenomicRanges]{GRangesList} object with
-            \code{\link{exonsBy}(transcripts, by="tx", ...)}.
+            or \link[ensembldb]{EnsDb} object). If the latter, then it's
+            first turned into a \link[GenomicRanges]{GRangesList} object
+            with \code{\link{exonsBy}(transcripts, by="tx", ...)}.
     }
 
     Note that, for each transcript, the exons must be ordered by ascending
@@ -112,7 +112,7 @@ extractTranscriptSeqs(x, transcripts, ...)
           transcript (or CDS) of a set of ranges.
 
     \item \code{\link{transcriptLengths}} for extracting the transcript
-          lengths from a \link{TxDb} object.
+          lengths (and other metrics) from a \link{TxDb} object.
 
     \item The \code{\link{transcriptLocs2refLocs}} function for converting
           transcript-based locations into reference-based locations.
diff --git a/man/extractUpstreamSeqs.Rd b/man/extractUpstreamSeqs.Rd
index 596295e..ba6dabb 100644
--- a/man/extractUpstreamSeqs.Rd
+++ b/man/extractUpstreamSeqs.Rd
@@ -93,24 +93,25 @@ extractUpstreamSeqs(x, genes, width=1000, ...)
   for the list of TxDb packages available in the current release of
   Bioconductor.
   Note that you can make your own custom \link{TxDb} object from
-  various annotation resources. See the \code{\link{makeTxDbFromUCSC}},
-  \code{\link{makeTxDbFromBiomart}}, and
-  \code{\link{makeTxDbFromGFF}} functions for more information about
-  this.
+  various annotation resources by using one of the \code{makeTxDbFrom*()}
+  functions listed in the "See also" section below.
 }
 
 \author{Hervé Pagès}
 
 \seealso{
   \itemize{
+    \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromBiomart}},
+          and \code{\link{makeTxDbFromEnsembl}}, for making a \link{TxDb}
+          object from online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
+
     \item The \code{\link[BSgenome]{available.genomes}} function in the
           \pkg{BSgenome} package for checking avaibility of BSgenome
           data packages (and installing the desired one).
-          
-    \item The \code{\link{makeTxDbFromUCSC}},
-          \code{\link{makeTxDbFromBiomart}}, and
-          \code{\link{makeTxDbFromGFF}} functions for making your own
-          custom \link{TxDb} object from various annotation resources.
 
     \item The \link[BSgenome]{BSgenome}, \link[rtracklayer]{TwoBitFile}, and
           \link[Rsamtools]{FaFile} classes, defined and documented
diff --git a/man/makeTxDb.Rd b/man/makeTxDb.Rd
index eb42c31..f23c7f4 100644
--- a/man/makeTxDb.Rd
+++ b/man/makeTxDb.Rd
@@ -22,7 +22,7 @@ makeTxDb(transcripts, splicings,
 \arguments{
   \item{transcripts}{data frame containing the genomic locations of
     a set of transcripts}
-  \item{splicings}{data frame containing the exon and cds locations
+  \item{splicings}{data frame containing the exon and CDS locations
     of a set of transcripts}
   \item{genes}{data frame containing the genes associated to a set
     of transcripts}
@@ -33,7 +33,7 @@ makeTxDb(transcripts, splicings,
     The names of the columns must be \code{"name"} and \code{"value"}
     and their type must be character.}
   \item{reassign.ids}{controls how internal ids should be assigned for each
-    type of feature i.e. for transcripts, exons, and cds. For each type, if
+    type of feature i.e. for transcripts, exons, and CDS. For each type, if
     \code{reassign.ids} is \code{FALSE} and if the ids are supplied, then
     they are used as the internal ids, otherwise the internal ids are assigned
     in a way that is compatible with the order defined by ordering the
@@ -44,7 +44,7 @@ makeTxDb(transcripts, splicings,
   The \code{transcripts} (required), \code{splicings} (required)
   and \code{genes} (optional) arguments must be data frames that
   describe a set of transcripts and the genomic features related
-  to them (exons, cds and genes at the moment).
+  to them (exons, CDS and genes at the moment).
   The \code{chrominfo} (optional) argument must be a data frame
   containing chromosome information like the length of each chromosome.
 
@@ -67,7 +67,7 @@ makeTxDb(transcripts, splicings,
 
   \code{splicings} must have N rows per transcript, where N is the nb
   of exons in the transcript. Each row describes an exon plus, optionally,
-  the cds contained in this exon. Its columns must be:
+  the CDS contained in this exon. Its columns must be:
   \itemize{
   \item \code{tx_id}: Foreign key that links each row in the \code{splicings}
         data frame to a unique row in the \code{transcripts} data frame.
@@ -93,21 +93,26 @@ makeTxDb(transcripts, splicings,
         and \code{exon_chrom} must also be missing.
   \item \code{exon_start}, \code{exon_end}: Exon start and end.
         Integer vectors with no NAs.
-  \item \code{cds_id}: [optional] cds ID. Integer vector.
+  \item \code{cds_id}: [optional] CDS ID. Integer vector.
         If present then \code{cds_start} and \code{cds_end} must also
         be present.
-        NAs are allowed and must match NAs in \code{cds_start}
-        and \code{cds_end}.
-  \item \code{cds_name}: [optional] cds name. Character vector (or factor).
+        NAs are allowed and must match those in \code{cds_start} and
+        \code{cds_end}.
+  \item \code{cds_name}: [optional] CDS name. Character vector (or factor).
         If present then \code{cds_start} and \code{cds_end} must also be
-        present. NAs and/or duplicates are ok. Must be NA if corresponding
-        \code{cds_start} and \code{cds_end} are NAs.
-  \item \code{cds_start}, \code{cds_end}: [optional] cds start and end.
+        present. NAs and/or duplicates are ok. Must contain NAs at least
+        where \code{cds_start} and \code{cds_end} contain them.
+  \item \code{cds_start}, \code{cds_end}: [optional] CDS start and end.
         Integer vectors.
         If one of the 2 columns is missing then all \code{cds_*} columns
         must be missing.
         NAs are allowed and must occur at the same positions in
         \code{cds_start} and \code{cds_end}.
+  \item \code{cds_phase}: [optional] CDS phase. Integer vector.
+        If present then \code{cds_start} and \code{cds_end} must also
+        be present.
+        NAs are allowed and must match those in \code{cds_start} and
+        \code{cds_end}.
   }
   Other columns, if any, are ignored (with a warning).
 
@@ -147,10 +152,12 @@ makeTxDb(transcripts, splicings,
 \seealso{
   \itemize{
     \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromBiomart}},
-          \code{\link{makeTxDbFromGRanges}}, and \code{\link{makeTxDbFromGFF}},
-          for convenient ways to make a \link{TxDb} object from UCSC or BioMart
-          online resources, or from a \link[GenomicRanges]{GRanges} object,
-          or from a GFF or GTF file.
+          and \code{\link{makeTxDbFromEnsembl}}, for making a \link{TxDb}
+          object from online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
 
     \item The \link{TxDb} class.
 
@@ -174,7 +181,8 @@ splicings <-  data.frame(
                    exon_start=c(1, 2001, 2101, 2131, 2001, 2131),
                    exon_end=c(999, 2085, 2144, 2199, 2085, 2199),
                    cds_start=c(1, 2022, 2101, 2131, NA, NA),
-                   cds_end=c(999, 2085, 2144, 2193, NA, NA))
+                   cds_end=c(999, 2085, 2144, 2193, NA, NA),
+                   cds_phase=c(0, 0, 2, 0, NA, NA))
 
 txdb <- makeTxDb(transcripts, splicings)
 }
diff --git a/man/makeTxDbFromBiomart.Rd b/man/makeTxDbFromBiomart.Rd
index c1a7286..881d8b7 100644
--- a/man/makeTxDbFromBiomart.Rd
+++ b/man/makeTxDbFromBiomart.Rd
@@ -134,10 +134,12 @@ getChromInfoFromBiomart(biomart="ENSEMBL_MART_ENSEMBL",
 
 \seealso{
   \itemize{
-    \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromGRanges}},
-          and \code{\link{makeTxDbFromGFF}}, for convenient ways to make a
-          \link{TxDb} object from UCSC online resources, or from a
-          \link[GenomicRanges]{GRanges} object, or from a GFF or GTF file.
+    \item \code{\link{makeTxDbFromUCSC}} and \code{\link{makeTxDbFromEnsembl}}
+          for making a \link{TxDb} object from other online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
 
     \item The \code{\link[biomaRt]{listMarts}}, \code{\link[biomaRt]{useMart}},
           \code{\link[biomaRt]{listDatasets}}, and
@@ -200,9 +202,9 @@ library(biomaRt)
 
 ## Note that BioMart is not currently available for Ensembl Bacteria.
 
-## -------------
-## Ensembl Fungi
-## -------------
+## ---------------------
+## --- Ensembl Fungi ---
+
 mart <- useMart(biomart="fungal_mart", host="fungi.ensembl.org")
 datasets <- listDatasets(mart)
 datasets$dataset
@@ -217,9 +219,9 @@ yeast_txdb0 <- makeTxDbFromBiomart(dataset="scerevisiae_gene_ensembl")
 all(transcripts(yeast_txdb0) \%in\% transcripts(yeast_txdb))
 all(transcripts(yeast_txdb) \%in\% transcripts(yeast_txdb0))
 
-## ---------------
-## Ensembl Metazoa
-## ---------------
+## -----------------------
+## --- Ensembl Metazoa ---
+
 mart <- useMart(biomart="metazoa_mart", host="metazoa.ensembl.org")
 datasets <- listDatasets(mart)
 datasets$dataset
@@ -236,9 +238,9 @@ filter <- list(tx_name="Y71G12B.44")
 transcripts(worm_txdb, filter=filter, columns=c("tx_name", "tx_type"))
 transcripts(txdb1, filter=filter, columns=c("tx_name", "tx_type"))
 
-## --------------
-## Ensembl Plants
-## --------------
+## ----------------------
+## --- Ensembl Plants ---
+
 mart <- useMart(biomart="plants_mart", host="plants.ensembl.org")
 datasets <- listDatasets(mart)
 datasets[ , 1:2]
@@ -247,9 +249,9 @@ athaliana_txdb <- makeTxDbFromBiomart(biomart="plants_mart",
                                       host="plants.ensembl.org")
 athaliana_txdb
 
-## ----------------
-## Ensembl Protists
-## ----------------
+## ------------------------
+## --- Ensembl Protists ---
+
 mart <- useMart(biomart="protist_mart", host="protists.ensembl.org")
 datasets <- listDatasets(mart)
 datasets$dataset
diff --git a/man/makeTxDbFromEnsembl.Rd b/man/makeTxDbFromEnsembl.Rd
new file mode 100644
index 0000000..0b2c983
--- /dev/null
+++ b/man/makeTxDbFromEnsembl.Rd
@@ -0,0 +1,79 @@
+\name{makeTxDbFromEnsembl}
+
+\alias{makeTxDbFromEnsembl}
+
+\title{
+  Make a TxDb object from an Ensembl database
+}
+
+\description{
+  The \code{makeTxDbFromEnsembl} function creates a \link{TxDb} object for
+  a given organism by importing the genomic locations of its transcripts,
+  exons, CDS, and genes from an Ensembl database.
+}
+
+\usage{
+makeTxDbFromEnsembl(organism="Homo sapiens",
+                    release=NA,
+                    circ_seqs=DEFAULT_CIRC_SEQS,
+                    server="ensembldb.ensembl.org")
+}
+
+\arguments{
+  \item{organism}{
+    The \emph{scientific name} (i.e. genus and species, or genus and species
+    and subspecies) of the organism for which to import the data.
+    The name is case-insensitive, and underscores can be used instead of
+    spaces, e.g. \code{"homo_sapiens"} is accepted.
+  }
+  \item{release}{
+    The Ensembl release to query, e.g. 89. If set to \code{NA} (the default),
+    the current release is used.
+  }
+  \item{circ_seqs}{
+    A character vector listing the chromosomes that should be marked
+    as circular.
+  }
+  \item{server}{
+    The name of the MySQL server to query.
+    See \url{https://www.ensembl.org/info/data/mysql.html} for the list of
+    Ensembl public MySQL servers.
+    Using the server nearest to you can make a big difference.
+  }
+}
+
+\value{
+  A \link{TxDb} object.
+}
+
+\note{
+  \code{makeTxDbFromEnsembl} tends to be faster and more reliable than
+  \code{\link{makeTxDbFromBiomart}}.
+}
+
+\author{H. Pagès}
+
+\seealso{
+  \itemize{
+    \item \code{\link{makeTxDbFromUCSC}} and \code{\link{makeTxDbFromBiomart}}
+          for making a \link{TxDb} object from other online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
+
+    \item \code{\link{DEFAULT_CIRC_SEQS}}.
+
+    \item The \link{TxDb} class.
+
+    \item \code{\link{makeTxDb}} for the low-level function used by the
+          \code{makeTxDbFrom*} functions to make the \link{TxDb} object
+          returned to the user.
+  }
+}
+
+\examples{
+txdb <- makeTxDbFromEnsembl("Saccharomyces cerevisiae",
+                            server="useastdb.ensembl.org")
+txdb
+}
diff --git a/man/makeTxDbFromGFF.Rd b/man/makeTxDbFromGFF.Rd
index 719dec5..558c27c 100644
--- a/man/makeTxDbFromGFF.Rd
+++ b/man/makeTxDbFromGFF.Rd
@@ -102,9 +102,9 @@ makeTxDbFromGFF(file,
           \pkg{rtracklayer} package (also used by \code{makeTxDbFromGFF}
           internally).
 
-    \item \code{\link{makeTxDbFromUCSC}} and \code{\link{makeTxDbFromBiomart}}
-          for convenient ways to make a \link{TxDb} object from UCSC or BioMart
-          online resources.
+    \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromBiomart}},
+          and \code{\link{makeTxDbFromEnsembl}}, for making a \link{TxDb}
+          object from online resources.
 
     \item \code{\link{DEFAULT_CIRC_SEQS}}.
 
diff --git a/man/makeTxDbFromGRanges.Rd b/man/makeTxDbFromGRanges.Rd
index 2b2a369..0db1ec7 100644
--- a/man/makeTxDbFromGRanges.Rd
+++ b/man/makeTxDbFromGRanges.Rd
@@ -48,9 +48,11 @@ makeTxDbFromGRanges(gr, drop.stop.codons=FALSE, metadata=NULL, taxonomyId=NA)
 \seealso{
   \itemize{
     \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromBiomart}},
-          and \code{\link{makeTxDbFromGFF}}, for convenient ways to make a
-          \link{TxDb} object from UCSC or BioMart online resources, or
-          directly from a GFF or GTF file.
+          and \code{\link{makeTxDbFromEnsembl}}, for making a \link{TxDb}
+          object from online resources.
+
+    \item \code{\link{makeTxDbFromGFF}} for making a \link{TxDb} object
+          from a GFF or GTF file.
 
     \item The \code{\link[rtracklayer]{import}} function in the
           \pkg{rtracklayer} package.
@@ -86,8 +88,10 @@ txdb
 ## Reverse operation:
 gr2 <- asGFF(txdb)
 
-## Sanity check:
-stopifnot(identical(as.list(txdb), as.list(makeTxDbFromGRanges(gr2))))
+## Sanity check (asGFF() does not propagate the CDS phase at the moment):
+target <- as.list(txdb)
+target$splicings$cds_phase <- NULL
+stopifnot(identical(target, as.list(makeTxDbFromGRanges(gr2))))
 
 ## ---------------------------------------------------------------------
 ## WITH A GRanges OBJECT STRUCTURED AS GTF
diff --git a/man/makeTxDbFromUCSC.Rd b/man/makeTxDbFromUCSC.Rd
index dc03848..0f42062 100644
--- a/man/makeTxDbFromUCSC.Rd
+++ b/man/makeTxDbFromUCSC.Rd
@@ -88,10 +88,13 @@ getChromInfoFromUCSC(
 
 \seealso{
   \itemize{
-    \item \code{\link{makeTxDbFromBiomart}}, \code{\link{makeTxDbFromGRanges}},
-          and \code{\link{makeTxDbFromGFF}}, for convenient ways to make a
-          \link{TxDb} object from BioMart online resources, or from a
-          \link[GenomicRanges]{GRanges} object, or from a GFF or GTF file.
+    \item \code{\link{makeTxDbFromBiomart}} and
+          \code{\link{makeTxDbFromEnsembl}} for making a \link{TxDb} object
+          from other online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
 
     \item \code{\link[rtracklayer]{ucscGenomes}} in the \pkg{rtracklayer}
           package.
diff --git a/man/transcriptLengths.Rd b/man/transcriptLengths.Rd
index b12a683..50eae60 100644
--- a/man/transcriptLengths.Rd
+++ b/man/transcriptLengths.Rd
@@ -3,7 +3,7 @@
 \alias{transcriptLengths}
 
 
-\title{Extract the transcript lengths from a TxDb object}
+\title{Extract the transcript lengths (and other metrics) from a TxDb object}
 
 \description{
   The \code{transcriptLengths} function extracts the transcript lengths from
@@ -86,9 +86,12 @@ transcriptLengths(txdb, with.cds_len=FALSE,
 \seealso{
   \itemize{
     \item \code{\link{transcripts}}, \code{\link{transcriptsBy}},
-          and \code{\link{transcriptsByOverlaps}},
-          for how to extract the genomic locations of features from a
-          \link{TxDb} object.
+          and \code{\link{transcriptsByOverlaps}}, for extracting
+          genomic feature locations from a \link{TxDb}-like object.
+
+    \item \code{\link{exonicParts}} and \code{\link{intronicParts}} for
+          extracting non-overlapping exonic or intronic parts from a
+          \link{TxDb}-like object.
 
     \item \code{\link{extractTranscriptSeqs}} for extracting transcript
           (or CDS) sequences from chromosome sequences.
@@ -96,9 +99,13 @@ transcriptLengths(txdb, with.cds_len=FALSE,
     \item \code{\link{coverageByTranscript}} for computing coverage by
           transcript (or CDS) of a set of ranges.
 
-    \item \code{\link{makeTxDbFromUCSC}} and
-          \code{\link{makeTxDbFromBiomart}} for convenient ways to
-          make \link{TxDb} objects from UCSC or BioMart online resources.
+    \item \code{\link{makeTxDbFromUCSC}}, \code{\link{makeTxDbFromBiomart}},
+          and \code{\link{makeTxDbFromEnsembl}}, for making a \link{TxDb}
+          object from online resources.
+
+    \item \code{\link{makeTxDbFromGRanges}} and \code{\link{makeTxDbFromGFF}}
+          for making a \link{TxDb} object from a \link[GenomicRanges]{GRanges}
+          object, or from a GFF or GTF file.
 
     \item The \link{TxDb} class.
   }
diff --git a/man/transcripts.Rd b/man/transcripts.Rd
index c859578..c45e718 100644
--- a/man/transcripts.Rd
+++ b/man/transcripts.Rd
@@ -12,11 +12,11 @@
 \alias{promoters,TxDb-method}
 
 \title{
-  Extract genomic features from an object
+  Extract genomic features from a TxDb-like object
 }
 
 \description{
-  Generic functions to extract genomic features from an object.
+  Generic functions to extract genomic features from a TxDb-like object.
   This page documents the methods for \link{TxDb} objects only.
 }
 
@@ -36,7 +36,7 @@ genes(x, ...)
 \S4method{promoters}{TxDb}(x, upstream=2000, downstream=200, ...)
 }
 
-\arguments{ 
+\arguments{
   \item{x}{
     A \link{TxDb} object.
   }
@@ -60,7 +60,7 @@ genes(x, ...)
       for \code{cds}.
     }
     If the vector is named, those names are used for the corresponding
-    column in the element metadata of the returned object. 
+    column in the element metadata of the returned object.
   }
   \item{filter}{
     Either \code{NULL} or a named list of vectors to be used to
@@ -90,26 +90,26 @@ genes(x, ...)
     additional details see \code{?`promoters,GRanges-method`}.
   }
   \item{downstream}{
-    For \code{promoters} : An \code{integer(1)} value indicating the 
-    number of bases downstream from the transcription start site. For 
+    For \code{promoters} : An \code{integer(1)} value indicating the
+    number of bases downstream from the transcription start site. For
     additional details see \code{?`promoters,GRanges-method`}.
   }
 }
 
 \details{
   These are the main functions for extracting transcript information
-  from a \link{TxDb} object. These methods can restrict the output based
-  on categorical information. To restrict the output based on interval
+  from a \link{TxDb}-like object. These methods can restrict the output
+  based on categorical information. To restrict the output based on interval
   information, use the \code{\link{transcriptsByOverlaps}},
   \code{\link{exonsByOverlaps}}, and \code{\link{cdsByOverlaps}}
   functions.
 
   The \code{promoters} function computes user-defined promoter regions
-  for the transcripts in a \link{TxDb} object. The return object is a 
-  \code{GRanges} of promoter regions around the transcription start 
+  for the transcripts in a \link{TxDb}-like object. The return object is
+  a \code{GRanges} of promoter regions around the transcription start
   site the span of which is defined by \code{upstream} and \code{downstream}.
   For additional details on how the promoter range is computed and the
-  handling of \code{+} and \code{-} strands see 
+  handling of \code{+} and \code{-} strands see
   \code{?`promoters,GRanges-method`}.
 }
 
@@ -127,10 +127,14 @@ genes(x, ...)
   \itemize{
     \item \code{\link{transcriptsBy}} and \code{\link{transcriptsByOverlaps}}
           for more ways to extract genomic features
-          from a \link{TxDb} object.
+          from a \link{TxDb}-like object.
 
     \item \code{\link{transcriptLengths}} for extracting the transcript
-          lengths from a \link{TxDb} object.
+          lengths (and other metrics) from a \link{TxDb} object.
+
+    \item \code{\link{exonicParts}} and \code{\link{intronicParts}} for
+          extracting non-overlapping exonic or intronic parts from a
+          \link{TxDb}-like object.
 
     \item \code{\link{extractTranscriptSeqs}} for extracting transcript
           (or CDS) sequences from chromosome sequences.
@@ -138,9 +142,6 @@ genes(x, ...)
     \item \code{\link{coverageByTranscript}} for computing coverage by
           transcript (or CDS) of a set of ranges.
 
-    \item \code{\link{disjointExons}} for extracting the non-overlapping
-          exon parts from a \link{TxDb} object.
-
     \item \link[GenomicFeatures]{select-methods} for how to use the
           simple "select" interface to extract information from a
           \link{TxDb} object.
diff --git a/man/transcriptsBy.Rd b/man/transcriptsBy.Rd
index 318359b..d0af1bf 100644
--- a/man/transcriptsBy.Rd
+++ b/man/transcriptsBy.Rd
@@ -14,7 +14,7 @@
 \alias{threeUTRsByTranscript,TxDb-method}
 
 \title{
-  Extract and group genomic features of a given type
+  Extract and group genomic features of a given type from a TxDb-like object
 }
 \description{
   Generic functions to extract genomic features of a given type
@@ -94,10 +94,14 @@ threeUTRsByTranscript(x, ...)
   \itemize{
     \item \code{\link{transcripts}} and \code{\link{transcriptsByOverlaps}}
           for more ways to extract genomic features
-          from a \link{TxDb} object.
+          from a \link{TxDb}-like object.
 
     \item \code{\link{transcriptLengths}} for extracting the transcript
-          lengths from a \link{TxDb} object.
+          lengths (and other metrics) from a \link{TxDb} object.
+
+    \item \code{\link{exonicParts}} and \code{\link{intronicParts}} for
+          extracting non-overlapping exonic or intronic parts from a
+          \link{TxDb}-like object.
 
     \item \code{\link{extractTranscriptSeqs}} for extracting transcript
           (or CDS) sequences from chromosome sequences.
diff --git a/man/transcriptsByOverlaps.Rd b/man/transcriptsByOverlaps.Rd
index 5e298bc..2af13a9 100644
--- a/man/transcriptsByOverlaps.Rd
+++ b/man/transcriptsByOverlaps.Rd
@@ -8,7 +8,8 @@
 \alias{cdsByOverlaps,TxDb-method}
 
 \title{
-  Extract genomic features from an object based on their by genomic location
+  Extract genomic features from a TxDb-like object based on their
+  genomic location
 }
 \description{
   Generic functions to extract genomic features for specified genomic
@@ -17,40 +18,38 @@
 }
 \usage{
 transcriptsByOverlaps(x, ranges,
-                      maxgap = 0L, minoverlap = 1L,
+                      maxgap = -1L, minoverlap = 0L,
                       type = c("any", "start", "end"), ...)
 \S4method{transcriptsByOverlaps}{TxDb}(x, ranges,
-                      maxgap = 0L, minoverlap = 1L,
+                      maxgap = -1L, minoverlap = 0L,
                       type = c("any", "start", "end"),
                       columns = c("tx_id", "tx_name"))
 
 exonsByOverlaps(x, ranges,
-                maxgap = 0L, minoverlap = 1L,
+                maxgap = -1L, minoverlap = 0L,
                 type = c("any", "start", "end"), ...)
 \S4method{exonsByOverlaps}{TxDb}(x, ranges,
-                maxgap = 0L, minoverlap = 1L,
+                maxgap = -1L, minoverlap = 0L,
                 type = c("any", "start", "end"),
                 columns = "exon_id")
 
 cdsByOverlaps(x, ranges,
-              maxgap = 0L, minoverlap = 1L,
+              maxgap = -1L, minoverlap = 0L,
               type = c("any", "start", "end"), ...)
 \S4method{cdsByOverlaps}{TxDb}(x, ranges,
-              maxgap = 0L, minoverlap = 1L,
+              maxgap = -1L, minoverlap = 0L,
               type = c("any", "start", "end"),
               columns = "cds_id")
 }
 \arguments{  
   \item{x}{A \link{TxDb} object.}
-  \item{...}{Arguments to be passed to or from methods.}
   \item{ranges}{A \link[GenomicRanges]{GRanges} object to restrict the output.}
-  \item{type}{How to perform the interval overlap operations of the
-    \code{ranges}. See the
-    \code{\link[GenomicRanges:findOverlaps-methods]{findOverlaps}} manual page
-    in the GRanges package for more information.}
-  \item{maxgap}{A non-negative integer representing the maximum distance
-    between a query interval and a subject interval.}
-  \item{minoverlap}{Ignored.}
+  \item{maxgap,minoverlap,type}{
+    Used in the internal call to \code{findOverlaps()} to detect overlaps.
+    See \code{?\link[IRanges]{findOverlaps}} in the \pkg{IRanges} package
+    for a description of these arguments.
+  }
+  \item{...}{Arguments to be passed to or from methods.}
   \item{columns}{Columns to include in the output.
     See \code{?\link{transcripts}} for the possible values.}
 }
@@ -68,22 +67,38 @@ cdsByOverlaps(x, ranges,
   \itemize{
     \item \code{\link{transcripts}} and \code{\link{transcriptsBy}}
           for more ways to extract genomic features
-          from a \link{TxDb} object.
+          from a \link{TxDb}-like object.
+
+    \item \code{\link{transcriptLengths}} for extracting the transcript
+          lengths (and other metrics) from a \link{TxDb} object.
+
+    \item \code{\link{exonicParts}} and \code{\link{intronicParts}} for
+          extracting non-overlapping exonic or intronic parts from a
+          TxDb-like object.
+
+    \item \code{\link{extractTranscriptSeqs}} for extracting transcript
+          (or CDS) sequences from chromosome sequences.
+
+    \item \code{\link{coverageByTranscript}} for computing coverage by
+          transcript (or CDS) of a set of ranges.
+
     \item \link[GenomicFeatures]{select-methods} for how to use the
           simple "select" interface to extract information from a
           \link{TxDb} object.
+
     \item \code{\link{id2name}} for mapping \link{TxDb} internal ids
           to external names for a given feature type.
+
     \item The \link{TxDb} class.
   }
 }
 \examples{
-  txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
-                                   package="GenomicFeatures"))
-  gr <- GRanges(seqnames = rep("chr1",2),
-                ranges = IRanges(start=c(500,10500), end=c(10000,30000)),
-                strand = strand(rep("-",2)))
-  transcriptsByOverlaps(txdb, gr)
+txdb <- loadDb(system.file("extdata", "hg19_knownGene_sample.sqlite",
+                           package="GenomicFeatures"))
+gr <- GRanges(Rle("chr1", 2),
+              IRanges(c(500,10500), c(10000,30000)),
+              strand = Rle("-", 2))
+transcriptsByOverlaps(txdb, gr)
 }
 
 \keyword{methods}
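The hunk above changes the defaults of transcriptsByOverlaps(), exonsByOverlaps() and cdsByOverlaps() from maxgap = 0L, minoverlap = 1L to maxgap = -1L, minoverlap = 0L, and points to ?findOverlaps in IRanges for what these arguments mean. The sketch below is one way to see the effect of the new maxgap default. It calls findOverlaps() on plain IRanges objects rather than on a TxDb, and it assumes a Bioconductor installation whose IRanges already uses these defaults. As we read the IRanges documentation, adjacent ranges have a gap of 0 and ranges that share a position have a gap of -1. The variable names are illustrative only.

```r
## Sketch only: assumes the Bioconductor IRanges package is installed and
## uses maxgap = -1L / minoverlap = 0L as the findOverlaps() defaults.
suppressPackageStartupMessages(library(IRanges))

query    <- IRanges(start = 1, end = 5)
touching <- IRanges(start = 5, end = 10)  # shares position 5 with query
adjacent <- IRanges(start = 6, end = 10)  # begins right after query ends

## Default maxgap = -1L: the two ranges must share at least one position.
hits_touching <- findOverlaps(query, touching)
hits_adjacent <- findOverlaps(query, adjacent)

## maxgap = 0L: a gap of zero positions (adjacency) is also accepted.
hits_adjacent_gap0 <- findOverlaps(query, adjacent, maxgap = 0L)

## Number of hits found in each case.
n_hits <- c(touching      = length(hits_touching),
            adjacent      = length(hits_adjacent),
            adjacent_gap0 = length(hits_adjacent_gap0))
print(n_hits)
```

With the new defaults, only ranges that actually share positions are reported. Passing maxgap = 0L to the *ByOverlaps() functions should therefore also pick up features that merely abut the query ranges.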
diff --git a/vignettes/GenomicFeatures.Rnw b/vignettes/GenomicFeatures.Rnw
index 2f2f047..4ee0386 100644
--- a/vignettes/GenomicFeatures.Rnw
+++ b/vignettes/GenomicFeatures.Rnw
@@ -43,11 +43,10 @@ BioMart\footnote{\url{http://www.biomart.org/}} data resources. The
 package is useful for ChIP-chip, ChIP-seq, and RNA-seq analyses.
 
 <<loadGenomicFeatures>>=
-library("GenomicFeatures")
+suppressPackageStartupMessages(library('GenomicFeatures'))
 @
 
 
-
 \section{\Rclass{TxDb} Objects}
 
 The \Rpackage{GenomicFeatures} package uses \Rclass{TxDb}
@@ -66,7 +65,6 @@ gene identifiers.
 
 \section{Retrieving Data from \Rclass{TxDb} objects}
 
-
 \subsection{Loading Transcript Data}
 
 There are two ways that users can load pre-existing data to generate a
@@ -90,7 +88,6 @@ txdb
 In this case, the \Rclass{TxDb} object has been returned by
 the \Rfunction{loadDb} method.
 
-
 More commonly, however, we expect that users will just load a
 TxDb annotation package like this:
 
@@ -105,16 +102,12 @@ object, and by default that object will have the same name as the
 package itself.
 
 
-
-
-
 \subsection{Pre-filtering data based on Chromosomes}
 It is possible to filter the data that is returned from a
 \Rclass{TxDb} object based on its chromosome.  This can be a
 useful way to limit the things that are returned if you are only
 interested in studying a handful of chromosomes.
 
-
 To determine which chromosomes are currently active, use the
 \Rfunction{seqlevels} method.  For example:
 
@@ -141,7 +134,6 @@ to the seqlevels stored in the db), then set the seqlevels to
 seqlevels(txdb) <- seqlevels0(txdb)
 @ 
 
-
 \begin{Exercise}
 Use \Rfunction{seqlevels} to set only chromosome 15 to be active.  BTW,
 the rest of this vignette will assume you have succeeded at this.
@@ -149,11 +141,11 @@ the rest of this vignette will assume you have succeeded at this.
 \begin{Solution}
 <<seqlevels4>>=
 seqlevels(txdb) <- "chr15"
+seqlevels(txdb)
 @
 \end{Solution}
 
 
-
 \subsection{Retrieving data using the select method}
 
 The \Rclass{TxDb} objects inherit from \Rclass{AnnotationDb}
@@ -189,9 +181,6 @@ select(txdb, keys=keys, columns=cols, keytype="GENEID")
 \end{Solution}
 
 
-
-%% TODO: add exercises to the sections that follow
-
 \subsection{Methods for returning \Rclass{GRanges} objects}
 
 Retrieving data with select is useful, but sometimes it is more
@@ -223,17 +212,40 @@ strand on the left side and then present related metadata on the right
 side.  At the bottom, the seqlengths display all the possible seqnames
 along with the length of each sequence.
 
+The \Rfunction{strand} function is used to obtain the strand 
+information from the transcripts.  The sum of the run lengths of the
+\Rclass{Rle} object that \Rfunction{strand} returns is equal to the
+length of the \Rclass{GRanges} object.
+
+<<transcripts2>>=
+tx_strand <- strand(GR)
+tx_strand
+sum(runLength(tx_strand))
+length(GR)
+@
 
 The \Rfunction{transcripts} function can also be used to
 retrieve a subset of the available transcripts, such as those on the
 $+$-strand of chromosome 15.
 
-<<transcripts2>>=
+<<transcripts3>>=
 GR <- transcripts(txdb, filter=list(tx_chrom = "chr15", tx_strand = "+"))
 length(GR)
 unique(strand(GR))
 @
 
+The \Rfunction{promoters} function computes a \Rclass{GRanges} object 
+that spans the promoter region around the transcription start site 
+for the transcripts in a \Rclass{TxDb} object.  The \Rcode{upstream} 
+and \Rcode{downstream} arguments define the number of bases upstream 
+and downstream from the transcription start site that make up the 
+promoter region.
+
+<<transcripts4>>=
+PR <- promoters(txdb, upstream=2000, downstream=400)
+PR
+@
+
 The \Rfunction{exons} and \Rfunction{cds} functions can also be used
 in a similar fashion to retrieve genomic coordinates for exons and
 coding sequences.
@@ -275,7 +287,7 @@ class object.  As with \Rclass{GRanges} objects, you can learn more
 about these objects by reading the \Rpackage{GenomicRanges}
 introductory vignette.  The \Rfunction{show} method for a
 \Rclass{GRangesList} object will display as a list of \Rclass{GRanges}
-objects.  And, at the bottom the seqlengths will be displayed once for
+objects.  At the bottom, the seqinfo is displayed once for
 the entire list.
 
 For each of these three functions, there is a limited set of options
@@ -284,7 +296,6 @@ For the \Rfunction{transcriptsBy} function, you can group by gene,
 exon or cds, whereas the \Rfunction{exonsBy} and \Rfunction{cdsBy}
 functions can only group by transcript (tx) or gene.
 
-
 So as a further example, to extract all the exons for each transcript
 you can call:
 
@@ -313,7 +324,6 @@ was the case in the 2nd example.  Even though the results will
 sometimes have to come back to you as synthetic IDs, you can still
 always retrieve the original IDs.  
 
-
 \begin{Exercise}
 Starting with the tx\_ids that are the names of the GRList object we
 just made, use \Rfunction{select} to retrieve the matching transcript
@@ -328,7 +338,6 @@ head(select(txdb, keys=tx_ids, columns="TXNAME", keytype="TXID"))
 @ 
 \end{Solution}
 
-
 Finally, the order of the results in a \Rclass{GRangesList} object can
 vary with the way in which things were grouped. In most cases the
 grouped elements of the \Rclass{GRangesList} object will be listed in
@@ -355,16 +364,12 @@ length(threeUTRsByTranscript(txdb))
 @
 
 
-
-
-
 \subsection{Getting the actual sequence data}
 
 The \Rpackage{GenomicFeatures} package also provides
 functions for converting from ranges to actual sequence (when paired
 with an appropriate \Rpackage{BSgenome} package).
 
-
 <<extract>>=
 library(BSgenome.Hsapiens.UCSC.hg19)
 tx_seqs1 <- extractTranscriptSeqs(Hsapiens, TxDb.Hsapiens.UCSC.hg19.knownGene,
@@ -395,7 +400,6 @@ translate(cds_seqs)
 \end{Solution}
 
 
-
 \section{Creating New \Rclass{TxDb} Objects or Packages}
 
 The \Rpackage{GenomicFeatures} package provides functions to create
@@ -470,6 +474,13 @@ Instead, we suggest that you save your annotation objects and label
 them with an appropriate time stamp so as to facilitate reproducible
 research.
 
+\subsection{Using \Rfunction{makeTxDbFromEnsembl}}
+
+The \Rfunction{makeTxDbFromEnsembl} function creates a \Rclass{TxDb} object
+for a given organism by importing the genomic locations of its transcripts,
+exons, CDS, and genes from an Ensembl database.
+
+See \Rcode{?makeTxDbFromEnsembl} for more information.
 
 \subsection{Using \Rfunction{makeTxDbFromGFF}}
 

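The vignette hunks above add short passages on strand() and promoters(). The following sketch illustrates both on a small hand-built GRanges instead of transcripts(txdb), so it needs only GenomicRanges and no TxDb or BSgenome package. The chr15 coordinates are made up for illustration. Note that the vignette calls promoters() on the TxDb itself, whereas here it is called on a GRanges.

```r
## Sketch only: assumes the Bioconductor GenomicRanges package is installed.
## A small hand-made GRanges stands in for transcripts(txdb).
suppressPackageStartupMessages(library(GenomicRanges))

GR <- GRanges(seqnames = rep("chr15", 4),
              ranges   = IRanges(start = c(5000, 8000, 12000, 20000),
                                 width = 1000),
              strand   = c("+", "+", "-", "+"))

## strand() returns a run-length encoded factor (an Rle). Here the runs are
## "+" (2), "-" (1), "+" (1), so the run lengths add up to length(GR).
tx_strand <- strand(GR)
print(runLength(tx_strand))
strand_total <- sum(runLength(tx_strand))

## promoters(): 2000 bases upstream plus 400 bases downstream of each TSS.
## On the "+" strand the TSS is start(GR); on the "-" strand it is end(GR).
PR <- promoters(GR, upstream = 2000, downstream = 400)
print(PR)
```

Each promoter range is upstream + downstream = 2400 bases wide. Its position depends on strand: on the "-" strand, the upstream bases lie to the right of end(GR).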
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-genomicfeatures.git