[med-svn] [r-bioc-annotationhub] 09/11: New upstream version 2.8.1

Andreas Tille tille at debian.org
Fri Sep 22 11:40:49 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-bioc-annotationhub.

commit 043bbce7096ae08210841f12f9dd5990bc3ad886
Author: Andreas Tille <tille at debian.org>
Date:   Fri Sep 22 13:33:20 2017 +0200

    New upstream version 2.8.1
---
 DESCRIPTION                               |  43 ++
 NAMESPACE                                 |  87 ++++
 NEWS                                      | 198 ++++++++
 R/AllGenerics.R                           |  56 +++
 R/AnnotationHub-class.R                   | 121 +++++
 R/AnnotationHubOption.R                   |  52 +++
 R/AnnotationHubResource-class.R           | 275 ++++++++++++
 R/BEDResource-class.R                     |  57 +++
 R/EnsDbResource-class.R                   |  13 +
 R/EpigenomeResource-class.R               | 144 ++++++
 R/Hub-class.R                             | 466 +++++++++++++++++++
 R/Hub-utils.R                             | 134 ++++++
 R/ProteomicsResource-class.R              |  34 ++
 R/cache-utils.R                           | 114 +++++
 R/db-utils.R                              | 174 +++++++
 R/sql-utils.R                             | 361 +++++++++++++++
 R/utilities.R                             | 159 +++++++
 R/zzz.R                                   |  31 ++
 build/vignette.rds                        | Bin 0 -> 287 bytes
 debian/README.test                        |  10 -
 debian/changelog                          |  35 --
 debian/compat                             |   1 -
 debian/control                            |  34 --
 debian/copyright                          | 106 -----
 debian/docs                               |   1 -
 debian/rules                              |   4 -
 debian/source/format                      |   1 -
 debian/tests/control                      |   3 -
 debian/tests/run-unit-test                |   7 -
 debian/watch                              |   3 -
 inst/doc/AnnotationHub-HOWTO.R            | 158 +++++++
 inst/doc/AnnotationHub-HOWTO.Rmd          | 367 +++++++++++++++
 inst/doc/AnnotationHub-HOWTO.html         | 724 ++++++++++++++++++++++++++++++
 inst/doc/AnnotationHub.R                  |  71 +++
 inst/doc/AnnotationHub.Rmd                | 235 ++++++++++
 inst/doc/AnnotationHub.html               | 391 ++++++++++++++++
 inst/scripts/shinyTest.R                  | 180 ++++++++
 inst/scripts/test.R                       | 244 ++++++++++
 inst/unitTests/test_AnnotationHub-class.R |  31 ++
 inst/unitTests/test_cache.R               |  17 +
 inst/unitTests/test_tidyGRanges.R         |  32 ++
 inst/unitTests/test_utilities.R           |   3 +
 man/AnnotationHub-class.Rd                | 402 +++++++++++++++++
 man/AnnotationHub-package.Rd              |  14 +
 man/AnnotationHubResource-class.Rd        |  52 +++
 man/getAnnotationHubOption.Rd             |  72 +++
 man/listResources.Rd                      |  71 +++
 tests/runTests.R                          |   1 +
 vignettes/AnnotationHub-HOWTO.Rmd         | 367 +++++++++++++++
 vignettes/AnnotationHub.Rmd               | 235 ++++++++++
 vignettes/display.png                     | Bin 0 -> 170334 bytes
 51 files changed, 6186 insertions(+), 205 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
new file mode 100644
index 0000000..28925ea
--- /dev/null
+++ b/DESCRIPTION
@@ -0,0 +1,43 @@
+Package: AnnotationHub
+Type: Package
+Title: Client to access AnnotationHub resources
+Version: 2.8.1
+Authors@R: 
+    c(person("Martin", "Morgan", email="martin.morgan at roswellpark.org",
+             role="cre"),
+      person("Marc", "Carlson", role="ctb"),
+      person("Dan", "Tenenbaum", role="ctb"),
+      person("Sonali", "Arora", role="ctb"))
+biocViews: Infrastructure, DataImport, GUI, ThirdPartyClient
+Maintainer: Bioconductor Package Maintainer <maintainer at bioconductor.org>
+Description: This package provides a client for the Bioconductor
+    AnnotationHub web resource. The AnnotationHub web resource
+    provides a central location where genomic files (e.g., VCF, bed,
+    wig) and other resources from standard locations (e.g., UCSC,
+    Ensembl) can be discovered. The resource includes metadata about
+    each resource, e.g., a textual description, tags, and date of
+    modification. The client creates and manages a local cache of
+    files retrieved by the user, helping with quick and reproducible
+    access.
+License: Artistic-2.0
+Depends: BiocGenerics (>= 0.15.10)
+Imports: utils, methods, grDevices, RSQLite, BiocInstaller,
+        AnnotationDbi (>= 1.31.19), S4Vectors, interactiveDisplayBase,
+        httr, yaml
+Suggests: IRanges, GenomicRanges, GenomeInfoDb, VariantAnnotation,
+        Rsamtools, rtracklayer, BiocStyle, knitr, AnnotationForge,
+        rBiopaxParser, RUnit, GenomicFeatures, MSnbase, mzR,
+        Biostrings, SummarizedExperiment, ExperimentHub
+Enhances: AnnotationHubData
+Collate: AnnotationHubOption.R AllGenerics.R Hub-class.R db-utils.R
+        AnnotationHub-class.R AnnotationHubResource-class.R
+        BEDResource-class.R ProteomicsResource-class.R
+        EpigenomeResource-class.R EnsDbResource-class.R utilities.R
+        sql-utils.R Hub-utils.R cache-utils.R zzz.R
+VignetteBuilder: knitr
+NeedsCompilation: yes
+Author: Martin Morgan [cre],
+  Marc Carlson [ctb],
+  Dan Tenenbaum [ctb],
+  Sonali Arora [ctb]
+Packaged: 2017-05-03 23:27:00 UTC; biocbuild
diff --git a/NAMESPACE b/NAMESPACE
new file mode 100644
index 0000000..3aa1421
--- /dev/null
+++ b/NAMESPACE
@@ -0,0 +1,87 @@
+import(methods)
+import(BiocGenerics)
+import(RSQLite)
+import(S4Vectors)
+
+importFrom(stats, setNames)
+importFrom(grDevices, rgb)
+importFrom(utils, packageVersion, read.csv, read.delim, unzip, 
+    capture.output, read.table, .DollarNames
+)
+importFrom(httr, 
+    GET, content, parse_url, progress, write_disk,
+    stop_for_status, status_code, use_proxy
+)
+importFrom(yaml, yaml.load)
+importFrom(interactiveDisplayBase, display)
+importFrom(BiocInstaller, biocVersion, isDevel)
+
+importMethodsFrom(AnnotationDbi, dbconn, dbfile)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Export S4 classes
+###
+
+exportClasses(
+    Hub, 
+    AnnotationHub, AnnotationHubResource
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Export S4 methods for generics not defined in AnnotationHub 
+###
+
+exportMethods(
+    names, length, "$", "[", subset, "[[", as.list, c,
+    show, fileName, display, mcols, dbconn, dbfile
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Export S4 generics defined in AnnotationHub + export corresponding
+### methods
+###
+
+export(
+    cache, "cache<-", query,
+    hubUrl, hubCache, hubDate,
+    snapshotDate, "snapshotDate<-",
+    getHub, listResources, loadResources,
+    package
+)
+
+exportMethods(
+    cache, "cache<-", query,
+    hubUrl, hubCache, hubDate, 
+    snapshotDate, "snapshotDate<-",
+    getHub, listResources, loadResources,
+    package
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Export non-generic functions
+###
+
+export(
+    .Hub, AnnotationHub, display, 
+    removeCache,
+    getAnnotationHubOption, setAnnotationHubOption, 
+    possibleDates, mcols,
+    .httr_proxy, .hub_option_key, .db_close,
+    recordStatus
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Export S3 methods
+###
+
+S3method(.DollarNames, Hub)
+
+S3method(as.list, Hub)
+
+### Also export them thru the export() directive so that (a) they can be
+### called directly, (b) tab-completion on the name of the generic shows them,
+### and (c) methods() doesn't asterisk them.
+export(
+    .DollarNames.Hub, 
+    as.list.Hub
+)
diff --git a/NEWS b/NEWS
new file mode 100644
index 0000000..4468271
--- /dev/null
+++ b/NEWS
@@ -0,0 +1,198 @@
+CHANGES IN VERSION 2.8.0
+------------------------
+
+NEW FEATURES
+
+    o add .get1,RDSResource-method
+
+    o add RdsResource class
+
+    o add EnsDb dispatch class
+
+    o expose rdatapath in metadata
+
+MODIFICATIONS
+
+    o modify records exposed as metadata
+      - expose records added <= snapshot date
+      - expose a single OrgDb per organism per BioC version
+
+    o edits to .get1,GenomicScores-method and .get1,GenomicScoresResource-method
+
+    o work on biocVersion and snapshotDate relationship:
+      - snapshotDate() must be <= biocVersion() release date 
+      - possibleDates() are now filtered by snapshotDate()
+
+    o remove GenomicScoresResource; Robert Castelo will handle loading these
+    resources in his GenomicScores software package
+
+    o Changed show method for hub object 
+      - removed sourcelastmodifieddate
+      - added rdatadateadded
+ 
+
+BUG FIXES
+
+    o fix bug in ordering of output from .uid0()
+
+    o fix bugs in 'snapshotDate<-' method
+
+CHANGES IN VERSION 2.6.0
+------------------------
+
+NEW FEATURES
+
+    o add vignette section on sharing resources on clusters
+
+    o add 'preparerclass' to index.rda to allow search by package name for
+      ExperimentHub objects
+
+    o add GenomicScoresResource class for Robert Castelo
+
+MODIFICATIONS
+
+    o return 'tags' metadata as list instead of comma-separated character
+      vector
+
+    o move AnnotationHubRecipes vignette to AnnotationHubData
+
+    o move listResources() and loadResources() from ExperimentHub
+
+    o expose additional fields in .DB_RESOURCE_FIELDS()
+
+    o modify cache path to avoid creating a '~' directory on Mac
+
+    o use https: NCBI URL in documentation
+
+    o modify .get1,EpiExpressionTextResource-method to use 'gene_id'
+      column as row names
+
+
+CHANGES IN VERSION 2.4.0
+------------------------
+
+NEW FEATURES
+
+    o add new status codes '4' and '5' to 'statuses' mysql table;
+      change 'status_id' field to '4' for all removed records to date
+
+    o add getRecordStatus() generic
+
+    o add package() generic
+
+    o create 'Hub' VIRTUAL class
+      - add new .Hub() base constructor for all hubs
+      - add getAnnotationHubOption() and setAnnotationHubOption()
+      - promote cache() to generic 
+      - add getHub() getter for AnnotationHubResource class
+      - add getUrl(), getCache(), getDate() getters
+      - export as few db helpers as possible
+
+    o add 'EpigenomeRoadmapNarrowAllPeaks' and 
+      'EpigenomeRoadmapNarrowFDR' classes
+
+MODIFICATIONS
+
+    o distinguish between broad and narrow peak files in 
+      EpigenomeRoadmapFileResource dispatch class
+
+    o don't use cache for AnnotationHub SQLite connection
+      - originally introduced so could be closed if needed, but 
+        creates complexity
+      - instead, open / close connection around individual queries (not a
+        performance concern)
+      - expose hub, cache, proxy in AnnotationHub constructor
+      - document dbconn,Hub-method, dbfile,Hub-method, .db_close
+
+    o snapshotDate now uses timestamp (last date any row was modified) instead 
+      of rdatadateadded
+
+    o .require fails rather than emits warning
+      - unit test on .require()
+      - also, cache(hub[FALSE]) does not create spurious error
+
+    o work on removed records and biocVersion
+      - .uid0() was reorganized and no longer groups by record_id
+      - metadata is returned for records with 
+        biocversion field <= current biocVersion
+        instead of an exact match with the current version
+      - metadata is not returned for removed records
+
+BUG FIXES
+
+    o Work around httr() progress() bug by disabling progress bar
+
+
+CHANGES IN VERSION 2.2.0
+------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+    o seqinfo(GRanges) for all genomes supported by GenomeInfoDb now 
+      contain seqlengths. 
+
+
+CHANGES IN VERSION 2.1.21
+--------------------------
+
+SIGNIFICANT USER-VISIBLE CHANGES
+
+    o fileName() returns the cache path on the disk for a file that
+      has been cached and NA for files which have not been cached.
+ 
+    o The error message (when file is not downloaded from the hub)
+      displays the AnnotationHub name, title, and reason for failure.
+
+
+CHANGES IN VERSION 2.1
+----------------------
+
+NEW FEATURES
+
+    o as.list() splits AnnotationHub instances into a list of
+    instances, each with a single record. c() concatenates hubs.
+
+BUG FIXES
+
+    o cache<- now behaves as documented, e.g., removing the cached
+    version of the file.
+
+CHANGES IN VERSION 2.0.0
+------------------------
+
+NEW FEATURES
+
+    o AnnotationHub is all new.  We basically rewrote the entire thing.
+
+    o The back end is new (new database, new way of tracking/routing
+    data etc.)
+
+    o The front end is new (new AnnotationHub object, new methods, new
+    behaviors, new ways of finding and downloading data)
+
+    o The metadata has also been cleaned up and made more
+    consistent/searchable 
+
+    o The recipes that are used to populate these data have also been
+    cleaned up.  
+
+    o There is also a new vignette to explain how to use the new
+    AnnotationHub in detail 
+
+IMPROVEMENTS SINCE LAST TIME
+
+    o The old way of finding data (an enormous tree of paths), was not
+    really scalable to the amount of data we have to provide access
+    to.  So we junked it.  Now you have a number of better methods to
+    allow you to search for terms instead.
+
+    o The new hub interface can be searched using a new display
+    method, but it can *also* be searched entirely from the command
+    line.  This allows you to use it in examples and scripts in a way
+    that is friendlier for reproducible research.
+    
+    o For users who want to contribute valuable new annotation
+    resources to the AnnotationHub, it is now possible to write a
+    recipe and test that it works for yourself.  Then once you are
+    happy with it, you can contact us and we can add data to the
+    AnnotationHub.
diff --git a/R/AllGenerics.R b/R/AllGenerics.R
new file mode 100644
index 0000000..808b0c7
--- /dev/null
+++ b/R/AllGenerics.R
@@ -0,0 +1,56 @@
+### =========================================================================
+### All Generics
+### -------------------------------------------------------------------------
+###
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Hub objects 
+###
+
+setGeneric("hubCache", signature="x",
+    function(x) standardGeneric("hubCache")
+)
+setGeneric("hubUrl", signature="x",
+    function(x) standardGeneric("hubUrl")
+)
+setGeneric("hubDate", signature="x",
+    function(x) standardGeneric("hubDate")
+)
+setGeneric("snapshotDate", 
+    function(x, ...) standardGeneric("snapshotDate")
+)
+setGeneric("snapshotDate<-", signature="x",
+    function(x, value) standardGeneric("snapshotDate<-")
+)
+setGeneric("package", signature="x",
+    function(x, value) standardGeneric("package")
+)
+## cache returns either the path to the URL or the local path (in a cache)
+## along the way it downloads the resource that it locates 
+## Already expecting multiple ids from .dataPathIds(), (potentially)
+## This is the path based on the RdataPath? (currently it's based on resource_id)
+setGeneric("cache", signature="x",
+    function(x, ...) standardGeneric("cache")
+)
+setGeneric("cache<-", signature="x",
+    function(x, ..., value) standardGeneric("cache<-")
+)
+setGeneric("recordStatus", signature="hub",
+    function(hub, record) standardGeneric("recordStatus")
+)
+setGeneric("listResources", signature="hub",
+    function(hub, package, filterBy=character()) 
+        standardGeneric("listResources")
+)
+setGeneric("loadResources", signature="hub",
+    function(hub, package, filterBy=character()) 
+        standardGeneric("loadResources")
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### AnnotationHubResource objects
+###
+
+setGeneric("getHub", signature="x",
+    function(x) standardGeneric("getHub")
+)
diff --git a/R/AnnotationHub-class.R b/R/AnnotationHub-class.R
new file mode 100644
index 0000000..a93932e
--- /dev/null
+++ b/R/AnnotationHub-class.R
@@ -0,0 +1,121 @@
+### =========================================================================
+### AnnotationHub objects
+### -------------------------------------------------------------------------
+###
+
+setClass("AnnotationHub", contains="Hub")
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Constructor 
+###
+
+## Add code to check : https://annotationhub.bioconductor.org/metadata/highest_id
+## And if not, delete the DB so it will be re-downloaded...
+AnnotationHub <-
+    function(..., hub=getAnnotationHubOption("URL"),
+             cache=getAnnotationHubOption("CACHE"),
+             proxy=getAnnotationHubOption("PROXY")) 
+{
+    .Hub("AnnotationHub", hub, cache, proxy, ...)
+}
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Subsetting 
+###
+
+.Hub_get1 <-
+    function(x)
+{
+    if (!length(x))
+        stop("no records found for the given index")
+    if (length(x) != 1L)
+        stop("'i' must be length 1")
+
+    ## Add 'Resource' postfix to DispatchClass name
+    className <- sprintf("%sResource", .dataclass(x))
+    if (is.null(getClassDef(className))) {
+        msg <- sprintf("'%s' not available in this version of the
+            package; use biocLite() to update?",
+            names(x))
+        stop(paste(strwrap(msg, exdent=2), collapse="\n"), call.=FALSE)
+    }
+
+    tryCatch({
+        class <- new(className, hub=x)
+    }, error=function(err) {
+        stop("failed to create 'HubResource' instance",
+             "\n  name: ", names(x),
+             "\n  title: ", x$title,
+             "\n  reason: ", conditionMessage(err),
+             call.=FALSE)
+    })
+
+    tryCatch({
+        .get1(class)
+    }, error=function(err) {
+        stop("failed to load resource",
+             "\n  name: ", names(x),
+             "\n  title: ", x$title,
+             "\n  reason: ", conditionMessage(err),
+             call.=FALSE)
+    })
+}
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### cache method
+###
+
+setMethod("cache", "AnnotationHub",
+    function(x, ...) {
+        callNextMethod(x,
+                       cache.root=".AnnotationHub", 
+                       cache.fun=setAnnotationHubOption, 
+                       proxy=getAnnotationHubOption("PROXY"), 
+                       max.downloads=getAnnotationHubOption("MAX_DOWNLOADS"))
+    }
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### show method
+###
+
+setMethod("show", "AnnotationHub", function(object) 
+{
+    len <- length(object)
+    cat(sprintf("%s with %d record%s\n", class(object), len,
+                ifelse(len == 1L, "", "s")))
+    cat("# snapshotDate():", snapshotDate(object), "\n")
+    callNextMethod(object)
+})
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### listResources and loadResources methods
+###
+
+.filterResources_AH <- function(package, filterBy=character()) {
+    if (!is.character(filterBy))
+        stop("'filterBy' must be a character vector")
+    suppressMessages({ah <- AnnotationHub()})
+    if (!package %in% unique(package(ah)))
+        stop(paste0("no resources for package '", package, 
+                    "' were found in AnnotationHub"))
+
+    sub <- query(ah, package)
+    if (length(filterBy))
+        query(sub, filterBy)
+    else
+        sub
+}
+
+setMethod("listResources", "AnnotationHub", 
+    function(hub, package, filterBy=character()) {
+        metadata <- .filterResources_AH(package, filterBy)
+        mcols(metadata)$title
+})
+
+setMethod("loadResources", "AnnotationHub",
+    function(hub, package, filterBy=character()) {
+        metadata <- .filterResources_AH(package, filterBy)
+        ah <- AnnotationHub()
+        lapply(names(metadata), function(i) ah[[i]]) 
+})
diff --git a/R/AnnotationHubOption.R b/R/AnnotationHubOption.R
new file mode 100644
index 0000000..968a8f0
--- /dev/null
+++ b/R/AnnotationHubOption.R
@@ -0,0 +1,52 @@
+### =========================================================================
+### Code for setting options. Most are set in R/zzz.R when the object is 
+### instantiated or used internally for dispatch on different 'Hub's.
+### -------------------------------------------------------------------------
+###
+
+
+.AH_hub_options <- new.env(parent=emptyenv())
+
+.hub_option_key <- function(key0=c("URL", "CACHE", "PROXY", "MAX_DOWNLOADS"))
+    match.arg(key0)
+
+getAnnotationHubOption <- function(arg) {
+    arg <- .hub_option_key(toupper(arg))
+    .AH_hub_options[[arg]]
+}
+
+setAnnotationHubOption <- function(arg, value)
+{
+    key <- .hub_option_key(toupper(trimws(arg)))
+
+    .AH_hub_options[[key]] <- switch(key, URL=, CACHE={
+        value <- as.character(value)
+        stopifnot(isSingleString(value))
+        value
+    }, MAX_DOWNLOADS={
+        value <- as.integer(value)
+        stopifnot(isSingleInteger(value))
+        value
+    }, PROXY={
+        if (is.null(value) || inherits(value, "request"))
+            value
+        else if (isSingleString(value)) {
+            .httr_proxy(value)
+        } else {
+            txt <- "'value' must be an httr proxy request (use_proxy()),
+                    character(1), or NULL"
+            stop(paste(strwrap(txt, exdent=2), collapse="\n"))
+        }
+    })
+}
+
+.httr_proxy <- function(value)
+{
+    rm <- parse_url(value)
+    if (is.null(rm$scheme))
+        stop("PROXY 'value' does not include scheme (e.g., 'http://')")
+    rm$url <- paste0(rm$scheme, "://", rm$hostname)
+    if (!is.null(rm$port))
+        rm$port <- as.integer(rm$port)
+    do.call(use_proxy, rm[c("url", "port", "username", "password")])
+}
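
The option store in R/AnnotationHubOption.R above combines a private environment, match.arg() for key validation, and a switch() whose empty URL branch falls through to CACHE. A minimal base-R sketch of that pattern follows. Every name containing "demo" is hypothetical and not part of AnnotationHub, and plain length/NA checks stand in for the isSingleString()/isSingleInteger() helpers used in the package. The PROXY branch is omitted to avoid depending on httr.

```r
## Hypothetical stand-in for the package's option store; names containing
## "demo" are illustrative only and do not exist in AnnotationHub.
.demo_options <- new.env(parent = emptyenv())

## match.arg() validates the key against the default choices
demoOptionKey <- function(key = c("URL", "CACHE", "MAX_DOWNLOADS"))
    match.arg(key)

setDemoOption <- function(arg, value) {
    key <- demoOptionKey(toupper(trimws(arg)))
    ## The empty 'URL =' branch falls through to the CACHE branch
    .demo_options[[key]] <- switch(key, URL = , CACHE = {
        value <- as.character(value)
        stopifnot(length(value) == 1L, !is.na(value))
        value
    }, MAX_DOWNLOADS = {
        value <- as.integer(value)
        stopifnot(length(value) == 1L, !is.na(value))
        value
    })
    invisible(.demo_options[[key]])
}

getDemoOption <- function(arg)
    .demo_options[[demoOptionKey(toupper(arg))]]

setDemoOption(" cache ", "/tmp/demo-hub")   # key is trimmed and upper-cased
setDemoOption("max_downloads", "10")        # value is coerced to integer
getDemoOption("cache")                      # "/tmp/demo-hub"
```

Keeping the values in an environment rather than in options() means an unknown key fails immediately in match.arg() instead of silently creating a new entry.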
diff --git a/R/AnnotationHubResource-class.R b/R/AnnotationHubResource-class.R
new file mode 100644
index 0000000..2ebf57c
--- /dev/null
+++ b/R/AnnotationHubResource-class.R
@@ -0,0 +1,275 @@
+### =========================================================================
+### AnnotationHubResource objects
+### -------------------------------------------------------------------------
+###
+
+setClass("AnnotationHubResource", representation(hub="Hub"))
+
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Accessors 
+###
+
+setMethod("hubCache", "AnnotationHubResource",
+    function(x) hubCache(x@hub)
+)
+
+setMethod("hubUrl", "AnnotationHubResource",
+    function(x) hubUrl(x@hub) 
+)
+
+setMethod("getHub", "AnnotationHubResource",
+    function(x) x@hub
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+###  Show 
+###
+
+setMethod("show", "AnnotationHubResource",
+    function(object)
+{
+    cat("class:", class(object), "\n")
+})
+
+setGeneric(".get1", function(x, ...) {
+    stopifnot(is(x, "AnnotationHubResource"), length(x) == 1L)
+    standardGeneric(".get1")
+})
+
+setMethod(".get1", "AnnotationHubResource",
+    function(x, ...)
+{
+    msg <- sprintf("no '.get1' method defined for object
+        of class %s, consider defining your own.",
+        sQuote(class(x)))
+    stop(paste(strwrap(msg), collapse="\n"))
+})
+
+##
+## implementations
+##
+
+## FaFile
+
+setClass("FaFileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "FaFileResource",
+    function(x, ...)
+{
+    .require("Rsamtools")
+    fa <- cache(getHub(x))
+    Rsamtools::FaFile(file=fa[1],index=fa[2])
+})
+
+## Rds / RDS
+
+## Michael's AHCytoData is the only package (I think) that uses RDS.
+## Added Rds to be compatible with Rda naming scheme.
+setClass("RdsResource", contains="AnnotationHubResource")
+setMethod(".get1", "RdsResource", function(x, ...) readRDS(cache(getHub(x))))
+
+setClass("RDSResource", contains="RdsResource")
+setMethod(".get1", "RDSResource", function(x, ...) callNextMethod(x, ...))
+
+## Rda
+
+setClass("RdaResource", contains="AnnotationHubResource")
+setMethod(".get1", "RdaResource",
+    function(x, ...)
+{
+    get(load(cache(getHub(x))))
+})
+
+setClass("data.frameResource", contains="RdaResource")
+
+setClass("GRangesResource", contains="RdaResource")
+setMethod(".get1", "GRangesResource",
+    function(x, ...)
+{
+    .require("GenomicRanges")
+    gr <- callNextMethod(x, ...)
+    .tidyGRanges(x, gr)
+})
+
+setClass("VCFResource", contains="RdaResource")
+setMethod(".get1", "VCFResource",
+    function(x, ...)
+{
+    .require("VariantAnnotation")
+    callNextMethod(x, ...)
+})
+
+## UCSC chain file
+setClass("ChainFileResource", contains="AnnotationHubResource")
+
+## trace(AnnotationHub:::.get1, tracer=browser, signature ="ChainFileResource")
+setMethod(".get1", "ChainFileResource",
+    function(x, ...)
+{
+    .require("rtracklayer")
+    .require("GenomeInfoDb")
+    chain <- cache(getHub(x))
+    tf <- .gunzip(chain, tempfile())
+    tf <- rtracklayer::import.chain(tf)
+    tf[GenomeInfoDb::sortSeqlevels(names(tf))]
+})
+
+setClass("TwoBitFileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "TwoBitFileResource",
+    function(x, ...) 
+{
+    .require("rtracklayer")
+    bit <- cache(getHub(x))
+    rtracklayer::TwoBitFile(bit)
+})
+
+setClass("GTFFileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "GTFFileResource",
+    function(x, ...)
+{
+    message("Importing File into R ..")
+    .require("rtracklayer")
+    .require("GenomeInfoDb")
+    yy <- getHub(x)
+    gtf <- rtracklayer::import(cache(yy), format="gtf", genome=yy$genome, ...)
+    .tidyGRanges(x, gtf)
+})
+
+setClass("GFF3FileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "GFF3FileResource",
+    function(x, ...)
+{
+    .require("rtracklayer")
+    yy <- getHub(x)
+    rtracklayer::import(cache(yy), format="GFF", ...)
+})
+
+setClass("BigWigFileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "BigWigFileResource",
+    function(x, ...)
+{
+    .require("rtracklayer")
+    er <- cache(getHub(x))
+    rtracklayer::BigWigFile(er) 
+})
+
+setClass("dbSNPVCFFileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "dbSNPVCFFileResource",
+    function(x, ...) 
+{
+    .require("VariantAnnotation")
+    withCallingHandlers({
+        ## retrieve the resource
+        er <- cache(getHub(x))
+    }, warning=function(w) {
+        if (grepl("^Failed to parse headers:", conditionMessage(w))[1])
+            ## warning() something different, or...
+            invokeRestart("muffleWarning")
+    })
+    VariantAnnotation::VcfFile(file=er[1],index=er[2]) 
+})
+## SQLiteFile
+
+setClass("SQLiteFileResource", contains="AnnotationHubResource") 
+
+setMethod(".get1", "SQLiteFileResource",
+    function(x, ...)
+{
+    AnnotationDbi::loadDb(cache(getHub(x)))
+})
+
+## GRASP2 SQLiteFile
+
+setClass("GRASPResource", contains="SQLiteFileResource")
+
+setMethod(".get1", "GRASPResource",
+    function(x, ...)
+{
+    RSQLite::dbConnect(RSQLite::SQLite(), cache(getHub(x)),
+        flags=RSQLite::SQLITE_RO)
+})
+
+setClass("ZipResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "ZipResource",
+    function(x, filenames, ...)
+{
+    zip <- cache(getHub(x))
+    for (fl in filenames)
+        unzip(zip, fl, exdir=tempdir())
+    file.path(tempdir(), filenames)
+})
+
+setClass("ChEAResource", contains="ZipResource")
+
+setMethod(".get1", "ChEAResource",
+    function(x, ...)
+{
+    fl <- callNextMethod(x, filenames="chea-background.csv")
+    read.csv(fl, header=FALSE, stringsAsFactors=FALSE, 
+        col.names=c("s.no","TranscriptionFactor", "TranscriptionFactor-PubmedID", 
+        "TranscriptionFactorTarget", "PubmedID", "Experiment", "CellType",
+        "Species","DateAdded"))
+}) 
+
+setClass("BioPaxResource", contains="RdaResource")
+
+setMethod(".get1", "BioPaxResource",
+    function(x, ...)
+{
+    .require("rBiopaxParser")
+    callNextMethod(x, ...)
+})
+ 
+setClass("PazarResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "PazarResource",
+    function(x, ...)
+{
+    .require("GenomicRanges")
+    er <- cache(getHub(x))
+    colClasses <-
+        setNames(c(rep("character", 6), rep("integer", 2),
+                   rep("factor", 2), "character", "NULL"),
+                 c("PazarTFID","EnsemblTFAccession", "TFName",
+                   "PazarGeneID", "EnsemblGeneAccession", "Chr", "GeneStart",
+                   "GeneEnd", "Species", "ProjectName","PMID",
+                   "AnalysisMethod"))
+    dat <- read.delim(er, header=FALSE, col.names=names(colClasses),
+                      na.strings="-", colClasses=colClasses)
+    if (!anyNA(dat[["GeneStart"]])) {
+        dat <- GenomicRanges::makeGRangesFromDataFrame(dat,
+                                                       keep.extra.columns=TRUE)
+        dat <- .tidyGRanges(x, dat)
+    }
+    dat
+})
+ 
+
+setClass("CSVtoGrangesResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "CSVtoGrangesResource",
+   function(x, ...)
+{
+    .require("GenomicRanges")
+    yy <- getHub(x)
+    dat <- read.csv(cache(yy), header=TRUE, stringsAsFactors=FALSE)
+    dat <- dat[,!(names(dat) %in% "width")]
+    gr <- GenomicRanges::makeGRangesFromDataFrame(dat, keep.extra.columns=TRUE)
+    .tidyGRanges(x, gr)
+})
+
+setClass("ExpressionSetResource", contains="RdaResource")
+
+setMethod(".get1", "ExpressionSetResource",
+    function(x, ...)
+{
+    .require("Biobase")
+    callNextMethod(x, ...)
+})
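
The resource classes above share one dispatch pattern: .Hub_get1() appends "Resource" to a metadata string to get a class name, instantiates it, and calls .get1(); subclasses then reuse a parent loader through callNextMethod(). The sketch below shows that pattern with the methods package only. All class and function names in it (DemoResource, demoGet1, demoLoad, and so on) are hypothetical and chosen for illustration, and a plain file path replaces the hub and cache machinery.

```r
library(methods)

## Hypothetical classes for illustration only; none of these names exist
## in AnnotationHub.
setClass("DemoResource", representation(path = "character"))
setClass("DemoTextResource", contains = "DemoResource")
setClass("DemoUpperResource", contains = "DemoTextResource")

setGeneric("demoGet1", function(x, ...) standardGeneric("demoGet1"))

## Base method: fail loudly when no loader is defined for the class
setMethod("demoGet1", "DemoResource", function(x, ...)
    stop("no 'demoGet1' method defined for class ", sQuote(class(x))))

## Loader for plain-text resources
setMethod("demoGet1", "DemoTextResource", function(x, ...)
    readLines(x@path))

## Subclass reuses the parent loader, then post-processes the result
setMethod("demoGet1", "DemoUpperResource", function(x, ...)
    toupper(callNextMethod(x, ...)))

## Mimics the class-name lookup in .Hub_get1(): build the class name
## from a metadata string and refuse unknown classes
demoLoad <- function(dataclass, path) {
    className <- sprintf("Demo%sResource", dataclass)
    if (is.null(getClassDef(className)))
        stop("class ", sQuote(className), " is not defined")
    demoGet1(new(className, path = path))
}

tmp <- tempfile()
writeLines(c("chr1", "chr2"), tmp)
demoLoad("Text", tmp)    # "chr1" "chr2"
demoLoad("Upper", tmp)   # "CHR1" "CHR2"
```

Supporting a new file type under this design takes one setClass() call and one method, with no change to the lookup code.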
diff --git a/R/BEDResource-class.R b/R/BEDResource-class.R
new file mode 100644
index 0000000..367c5b0
--- /dev/null
+++ b/R/BEDResource-class.R
@@ -0,0 +1,57 @@
+## This file contains methods for all BED files
+
+setClass("BEDFileResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "BEDFileResource",
+    function(x, ...)
+{
+    .require("rtracklayer")
+    yy <- getHub(x)
+    dat <- rtracklayer::BEDFile(cache(yy))
+    gr <- rtracklayer::import(dat, format="bed", genome=yy$genome, ...)
+    .tidyGRanges(x, gr)
+})
+
+setClass("UCSCBroadPeakResource", contains="BEDFileResource")
+
+setMethod(".get1", "UCSCBroadPeakResource",
+    function(x, ...)
+{
+    broadPeaksmcols <- c(signalValue="numeric",
+        pValue="numeric", qValue="numeric") 
+    callNextMethod(x, extraCols=broadPeaksmcols)
+})
+
+setClass("UCSCNarrowPeakResource", contains="BEDFileResource")
+
+setMethod(".get1", "UCSCNarrowPeakResource",
+    function(x, ...)
+{
+    narrowPeaksmcols <- c(
+        signalValue="numeric", pValue="numeric",
+        qValue="numeric", peak="numeric") 
+    callNextMethod(x, extraCols=narrowPeaksmcols)
+})
+
+setClass("UCSCBEDRnaElementsResource", contains="BEDFileResource")
+
+setMethod(".get1", "UCSCBEDRnaElementsResource",
+    function(x, ...)
+{
+    mcols <- c(name="character", score="integer",
+        strand="character",
+        level="numeric", signif="numeric", score2="numeric") 
+    callNextMethod(x, extraCols=mcols)
+})
+
+setClass("UCSCGappedPeakResource", contains="BEDFileResource")
+
+setMethod(".get1", "UCSCGappedPeakResource",
+    function(x, ...)
+{
+    gappedPeakmcols <- c(signalValue="numeric",
+        pValue="numeric", qValue="numeric")
+    callNextMethod(x, extraCols=gappedPeakmcols)    
+})
+
+
diff --git a/R/EnsDbResource-class.R b/R/EnsDbResource-class.R
new file mode 100644
index 0000000..c5f9b0a
--- /dev/null
+++ b/R/EnsDbResource-class.R
@@ -0,0 +1,13 @@
+### =========================================================================
+### EnsDb objects
+### -------------------------------------------------------------------------
+###
+
+setClass("EnsDbResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "EnsDbResource",
+    function(x, ...)
+{
+    .require("ensembldb")
+    EnsDb(cache(getHub(x)))
+})
diff --git a/R/EpigenomeResource-class.R b/R/EpigenomeResource-class.R
new file mode 100644
index 0000000..0dd81df
--- /dev/null
+++ b/R/EpigenomeResource-class.R
@@ -0,0 +1,144 @@
+### =========================================================================
+### Epigenome Objects 
+### -------------------------------------------------------------------------
+###
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### EpiMetadata, EpiExpression, EpichmmModel Objects
+###
+
+setClass("EpiMetadataResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "EpiMetadataResource",
+   function(x, ...)
+{
+    read.delim(cache(getHub(x)))
+})
+
+setClass("EpiExpressionTextResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "EpiExpressionTextResource",
+    function(x, ...)
+{
+    yy <- cache(getHub(x))
+    data <- read.table(yy, header=TRUE, row.names=1)
+    if(grepl("chr" ,rownames(data)[1])){
+       .require("SummarizedExperiment")
+       data <- SummarizedExperiment::SummarizedExperiment(
+           assays=as.matrix(data[,-c(1:2)]), 
+           rowRanges=.makeGrFromCharacterString(data))
+    }
+    data 
+})
+
+setClass("EpichmmModelsResource", contains="AnnotationHubResource")
+
+setMethod(".get1", "EpichmmModelsResource",
+    function(x, ...)
+{
+    .require("rtracklayer")
+    yy <- getHub(x)
+    gr <- rtracklayer::import(cache(yy), format="bed", genome="hg19")
+    gr <- .mapAbbr2FullName(gr)
+    .tidyGRanges(x, gr)
+ 
+})
+
+## helper function which changes 'chr10:100011323-100011459<-1' to gr!
+.makeGrFromCharacterString <- function(data) {
+    nms = sub("<-*1", "", rownames(data)) 
+    lst = strsplit(nms, "[:-]") 
+    v = function(x, i) vapply(x, "[[", "character", i)
+    gr = GenomicRanges::GRanges(v(lst, 1), 
+       IRanges::IRanges(as.integer(v(lst, 2)), as.integer(v(lst, 3))))
+    mcols(gr) = data[,1:2]
+    gr
+}
+
+
+## this data was obtained from:
+## chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/labelmap_15_coreMarks.tab
+## chromhmmSegmentations/ChmmModels/coreMarks/jointModel/final/colormap_15_coreMarks.tab
+## the bed file has abbr - and these 3 columns are helpful for collaborators.
+.mapAbbr2FullName <- function(gr) {
+    map <- as.data.frame(matrix(c(
+        "1_TssA", "Active TSS", "Red", rgb(255,0,0, maxColorValue=255),
+        "2_TssAFlnk", "Flanking Active TSS", "Orange Red", rgb(255,69,0, 
+             maxColorValue=255),
+        "3_TxFlnk", "Transcr. at gene 5' and 3'", "LimeGreen", rgb(50,205,50, 
+             maxColorValue=255),
+        "4_Tx", "Strong transcription", "Green", rgb(0,128,0, 
+             maxColorValue=255),
+        "5_TxWk", "Weak transcription", "DarkGreen", rgb(0,100,0, 
+             maxColorValue=255),
+        "6_EnhG", "Genic enhancers", "GreenYellow", rgb(194,225,5, 
+             maxColorValue=255),
+        "7_Enh", "Enhancers", "Yellow", rgb(255,255,0, 
+             maxColorValue=255),
+        "8_ZNF/Rpts", "ZNF genes & repeats", "Medium Aquamarine", 
+             rgb(102,205,170, maxColorValue=255),
+        "9_Het", "Heterochromatin", "PaleTurquoise", rgb(138,145,208, 
+             maxColorValue=255),
+        "10_TssBiv", "Bivalent/Poised TSS", "IndianRed", 
+             rgb(205,92,92, maxColorValue=255),
+        "11_BivFlnk", "Flanking Bivalent TSS/Enh", "DarkSalmon", 
+             rgb(233,150,122, maxColorValue=255),
+        "12_EnhBiv", "Bivalent Enhancer", "DarkKhaki", 
+             rgb(189,183,107, maxColorValue=255),
+        "13_ReprPC", "Repressed PolyComb", "Silver", 
+             rgb(128,128,128, maxColorValue=255),
+        "14_ReprPCWk", "Weak Repressed PolyComb", "Gainsboro", 
+             rgb(192,192,192, maxColorValue=255),
+        "15_Quies", "Quiescent/Low", "White", rgb(255,255,255, maxColorValue=255)), 
+             byrow=TRUE, nrow=15), stringsAsFactors=FALSE)
+    colnames(map) <- c("abbr", "name", "color_name", "color_code")
+
+    ##perform the mapping
+    toMatch <- mcols(gr)$name
+    newdf <- map[match(toMatch, map$abbr),]
+    mcols(gr) <- newdf
+    gr
+}
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### EpigenomeRoadmap* Objects
+###
+
+### These classes are similar to the UCSCBroadPeak, UCSCNarrowPeak, family.
+### The individual methods add 'extraCols' then dispatch through 
+### BEDFileResource. 
+
+setClass("EpigenomeRoadmapFileResource", contains="AnnotationHubResource")
+
+## NOTE: This dispatch class encompasses both broad and narrow peak -
+##       (Precursors were UCSCNarrowPeak and UCSCBroadPeak?)
+setMethod(".get1", "EpigenomeRoadmapFileResource",
+    function(x, ...)
+{
+    .require("rtracklayer")
+    yy <- getHub(x)
+    extraCols=c(signalValue="numeric", pValue="numeric", qValue="numeric")
+    if (grepl("narrow", yy$sourceurl))
+        extraCols=c(extraCols, peak="numeric")
+    gr <- rtracklayer::import(cache(yy), format="bed", genome=yy$genome,
+                              extraCols=extraCols)
+    .tidyGRanges(x, gr)
+})
+setClass("EpigenomeRoadmapNarrowAllPeaksResource", 
+         contains="BEDFileResource")
+
+setMethod(".get1", "EpigenomeRoadmapNarrowAllPeaksResource",
+    function(x, ...)
+{
+    narrowAllPeaks <- c(peakTagDensity="numeric")
+    callNextMethod(x, extraCols=narrowAllPeaks)
+})
+
+setClass("EpigenomeRoadmapNarrowFDRResource", 
+         contains="BEDFileResource")
+
+setMethod(".get1", "EpigenomeRoadmapNarrowFDRResource",
+    function(x, ...)
+{
+    narrowFDR <- c(peakTagDensity="numeric", zScore="numeric")
+    callNextMethod(x, extraCols=narrowFDR)
+})
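Note on R/EpigenomeResource-class.R above: .makeGrFromCharacterString() turns row names such as 'chr10:100011323-100011459<-1' into genomic ranges. The sketch below reproduces only the string-parsing step in base R and returns a data.frame instead of a GRanges, so it runs without GenomicRanges. The function name parseRegion is hypothetical.

```r
## Hypothetical helper: split 'chr:start-end<-1' style labels into columns.
parseRegion <- function(x) {
    nms <- sub("<-*1", "", x)            # drop the trailing '<-1' / '<1' marker
    parts <- strsplit(nms, "[:-]")       # split on ':' and '-'
    pick <- function(i) vapply(parts, "[[", character(1), i)
    data.frame(seqnames=pick(1),
               start=as.integer(pick(2)),
               end=as.integer(pick(3)),
               stringsAsFactors=FALSE)
}

regions <- parseRegion(c("chr10:100011323-100011459<-1", "chr1:5-10<1"))
regions
```

The three columns map directly onto the seqnames, start and end arguments that the package passes to GRanges() and IRanges().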
diff --git a/R/Hub-class.R b/R/Hub-class.R
new file mode 100644
index 0000000..6f744d2
--- /dev/null
+++ b/R/Hub-class.R
@@ -0,0 +1,466 @@
+### =========================================================================
+### Hub objects
+### -------------------------------------------------------------------------
+###
+
+setClass("Hub",
+    representation("VIRTUAL",
+        hub="character",       ## equivalent to cache URL
+        cache="character",     ## equivalent to cache CACHE 
+        ## FIXME: why was @date defined as data type 'character'
+        date="character",
+        .db_path="character", 
+        .db_index="character",
+        .db_uid="integer"
+    )
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Constructor for subclasses
+###
+
+.Hub <- function(.class, url, cache, proxy, ...) {
+    db_path <- .create_cache(cache, .class)
+    db_path <- .db_get(db_path, url, proxy)
+    db_date <- .restrictDateByVersion(db_path)
+    db_uid <- .db_uid0(db_path, db_date)
+    hub <- new(.class, cache=cache, hub=url, date=db_date, 
+               .db_path=db_path, .db_uid=db_uid, ...)
+    message("snapshotDate(): ", snapshotDate(hub))
+    index <- .db_create_index(hub)
+    .db_index(hub) <- index 
+    hub
+}
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Validity
+###
+
+.valid.Hub <- function(object)
+{
+    message <- NULL
+    if (!isSingleString(hubUrl(object)))
+        stop("'hubUrl(object)' must be a single string (url)")
+    if (!isSingleString(hubCache(object)))
+        stop("'hubCache(object)' must be a single string (directory path)")
+    message
+}
+
+setValidity("Hub",
+    function(object)
+    {
+        problems <- .valid.Hub(object)
+        if (is.null(problems)) TRUE else problems
+    }
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Accessors
+###
+
+setMethod("hubCache", "Hub",
+    function(x) x@cache
+)
+
+setMethod("hubUrl", "Hub",
+    function(x) x@hub
+)
+
+setMethod("hubDate", "Hub",
+    function(x) x@date
+)
+
+setMethod("package", "Hub",
+    function(x) character() 
+)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### cache methods
+###
+
+## FIXME: rather not pass cache.root and cache.fun
+setMethod("cache", "Hub",
+    function(x, ..., cache.root, cache.fun, proxy, max.downloads)
+        .cache_internal(x, cache.root=cache.root, cache.fun=cache.fun, 
+                        proxy=proxy, max.downloads=max.downloads)
+)
+
+setReplaceMethod("cache", "Hub",
+    function(x, ..., value)
+{
+    stopifnot(identical(value, NULL))
+    cachepath <- .cache_path(x, .datapathIds(x))
+    result <- unlink(cachepath)
+    status <- file.exists(cachepath)
+    if (any(status))
+        warning("failed to unlink cache files:\n  ",
+                paste(sQuote(cachepath[status]), collapse="\n  "))
+    x
+})
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Accessor-like methods 
+###
+
+setMethod("mcols", "Hub", function(x) DataFrame(.resource_table(x)))
+
+setMethod("names", "Hub",
+    function(x)
+{
+    as.character(names(.db_uid(x)))
+})
+
+setMethod("fileName", signature=(object="Hub"),
+    function(object, ...)
+{
+    cachepath <- .named_cache_path(object)
+    cachepath[!file.exists(cachepath)] <- NA_character_
+    cachepath
+})
+
+setMethod("length", "Hub",
+    function(x) length(.db_uid(x))
+)
+
+setMethod("snapshotDate", "Hub", function(x) x@date)
+
+setReplaceMethod("snapshotDate", "Hub", 
+    function(x, value) 
+{
+    if (length(value) != 1L)
+        stop("'value' must be a single date or character string")
+    tryCatch({
+       fmt_value <- as.POSIXlt(value)
+    }, error=function(err) {
+        stop("'value' must be a single date or character string")
+    })
+
+    ## 'value' must be < biocVersion() release date
+    restrict <- .restrictDateByVersion(dbfile(x))
+    dates <- .possibleDates(dbfile(x))
+    valid_range <- range(dates[as.POSIXlt(dates) <= as.POSIXlt(restrict)])
+    if (as.POSIXlt(value) > max(valid_range) ||
+        as.POSIXlt(value) < min(valid_range))
+        stop("'value' must be in the range of possibleDates(x)")
+ 
+    new(class(x), cache=hubCache(x), hub=hubUrl(x), 
+        date=as.character(value), 
+        .db_path=x@.db_path,
+        .db_index=x@.db_index,
+        .db_uid=.db_uid0(x@.db_path, value))
+})
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Subsetting 
+###
+
+setMethod("[", c("Hub", "numeric", "missing"),
+    function(x, i, j, ..., drop=TRUE) 
+{
+    idx <- na.omit(match(i, seq_along(.db_uid(x)))) 
+    ## seq_along creates a special problem for show()...
+    .db_uid(x) <- .db_uid(x)[idx]
+    x
+})
+
+setMethod("[", c("Hub", "logical", "missing"),
+    function(x, i, j, ..., drop=TRUE)
+{
+    i[is.na(i)] <- FALSE
+    .db_uid(x) <- .db_uid(x)[i]
+    x
+})
+
+setMethod("[", c("Hub", "character", "missing"),
+    function(x, i, j, ..., drop=TRUE) 
+{
+    idx <- na.omit(match(i, names(.db_uid(x))))
+    .db_uid(x) <- .db_uid(x)[idx]
+    x
+})
+
+setReplaceMethod("[", c("Hub", "numeric", "missing", "Hub"),
+    function(x, i, j, ..., value)
+{
+    .db_uid(x)[i] <- .db_uid(value)
+    x
+})
+
+setReplaceMethod("[", c("Hub", "logical", "missing", "Hub"),
+    function(x, i, j, ..., value)
+{
+    .db_uid(x)[i] <- .db_uid(value)
+    x
+})
+
+setReplaceMethod("[", 
+    c("Hub", "character", "missing", "Hub"),
+    function(x, i, j, ..., value)
+{
+    idx <- match(i, names(.db_uid(x)))
+    isNA <- is.na(idx)
+    .db_uid(x)[idx[!isNA]] <- .db_uid(value)[!isNA]
+    x
+})
+
+setMethod("[[", c("Hub", "numeric", "missing"),
+    function(x, i, j, ...)
+{
+    .Hub_get1(x[i])
+})
+
+setMethod("[[", c("Hub", "character", "missing"),
+    function(x, i, j, ...)
+{
+    if (length(i) != 1L)
+        stop("'i' must be length 1")
+    idx <- match(i, names(.db_uid(x)))
+    if (is.na(idx))
+        stop(recordStatus(x, i)$status)
+    .Hub_get1(x[idx])
+})
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### $, query() and subset() methods
+###
+
+setMethod("$", "Hub",
+    function(x, name)
+{
+    switch(name,
+     "tags"=unname(.collapse_as_string(x, .tags)),
+     "rdataclass"=unname(.collapse_as_string(x, .rdataclass)),
+     "rdatapath"=unname(.collapse_as_string(x, .rdatapath)),
+     "sourceurl"=unname(.collapse_as_string(x, .sourceurl)),
+     "sourcetype"=unname(.collapse_as_string(x, .sourcetype)), 
+     .resource_column(x, name))    ## try to get it from main resources table
+})
+
+.DollarNames.Hub <-
+    function(x, pattern="")
+{
+    values <- c(.resource_columns(), "tags", "rdataclass",
+                "sourceurl", "sourcetype")
+    grep(pattern, values, value=TRUE)
+}
+
+setGeneric("query", function(x, pattern, ...) standardGeneric("query"),
+    signature="x")
+
+setMethod("query", "Hub",
+    function(x, pattern, ignore.case=TRUE, pattern.op=`&`)
+{
+    tbl <- .db_index_load(x)
+    idx <- logical()
+    for (pat in pattern) {
+        idx0 <- grepl(pat, tbl, ignore.case=ignore.case)
+        if (length(idx))
+            idx <- pattern.op(idx, idx0) # pattern.op for combining patterns
+        else
+            idx <- idx0
+    }
+
+    x[idx]
+})
+
+setMethod("subset", "Hub",
+    function(x, subset)
+{
+    tbl <- mcols(x)
+    idx <- S4Vectors:::evalqForSubset(subset, tbl)
+    x[idx]
+})
+
+as.list.Hub <- function(x, ..., use.names=TRUE) {
+    ans <- lapply(seq_along(x), function(i, x) x[i], x)
+    if (use.names)
+        names(ans) <- names(x)
+    ans
+}
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Coercion
+###
+
+setMethod("as.list", "Hub", as.list.Hub)
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Combining
+###
+
+setMethod("c", "Hub",
+    function(x, ..., recursive=FALSE)
+{
+    if (!identical(recursive, FALSE)) 
+        stop("'recursive' argument not supported")
+    if (missing(x)) 
+        args <- unname(list(...))
+    else args <- unname(list(x, ...))
+
+    .test <- function(args, fun, what) {
+        ans <- unique(vapply(args, fun, character(1)))
+        if (length(ans) != 1L)
+            stop(sprintf("'%s' differs between arguments", what))
+    }
+    .test(args, hubCache, "hubCache")
+    .test(args, hubUrl, "hubUrl")
+    .test(args, snapshotDate, "snapshotDate")
+    .test(args, dbfile, "dbfile")
+
+    db_uid <- unlist(lapply(unname(args), .db_uid))
+    if (length(db_uid) == 0 && is.null(names(db_uid)))
+        names(db_uid) <- character()
+    udb_uid <- unique(db_uid)
+    idx <- match(udb_uid, db_uid)
+    .db_uid <- setNames(udb_uid, names(db_uid)[idx])
+    initialize(args[[1]], .db_uid=.db_uid)
+})
+
+## FIXME:
+## trace (as below) wasn't working (not sure why)
+## trace(subset, browser(), signature='AnnotationHub')
+## debug(AnnotationHub:::.subset)
+## Tests:
+## library(AnnotationHub);debug(AnnotationHub:::.subset);ah = AnnotationHub()
+## ahs <- subset(ah, ah$genome=='ailMel1')
+## ahs <- subset(ah, ah$rdataclass=='VCF')
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### Show 
+###
+
+.pprintf0 <- function(fmt, ...)
+    paste(strwrap(sprintf(fmt, ...), exdent=2), collapse="\n# ")
+
+.pprintf1 <-
+    function(column, values, width=getOption("width") - 1L)
+{
+    answer <- sprintf("# $%s: %s", column, paste(values, collapse=", "))
+    ifelse(nchar(answer) > width,
+           sprintf("%s...\n", substring(answer, 1L, width - 3L)),
+           sprintf("%s\n", answer))
+}
+
+.show <- function(object)
+{
+    if (!length(object))
+        return(NULL)
+
+    .some <-
+        function(elt, nh, nt, fill="...", width=getOption("width") - 13L)
+    {
+        answer <- if (length(elt) < nh + nt + 1L)
+            elt
+        else
+            c(head(elt, nh), fill, tail(elt, nt))
+        ifelse(nchar(answer) > width,
+               sprintf("%s...", substring(answer, 1L, width-3L)),
+               answer)
+    }
+
+    cat(.pprintf1("dataprovider", .count_resources(object, "dataprovider")))
+    cat(.pprintf1("species", .count_resources(object, "species")))
+    cat(.pprintf1("rdataclass",
+                  .count_join_resources(object, "rdatapaths", "rdataclass")))
+    shown <- c("dataprovider", "species", "rdataclass", "title")
+    cols <- paste(setdiff(names(mcols(object[0])), shown), collapse=", ")
+    cat(.pprintf0("# additional mcols(): %s", cols), "\n")
+    cat(.pprintf0("# retrieve records with, e.g., 'object[[\"%s\"]]'",
+                  names(object)[[1]]), "\n")
+
+    if (len <- length(object)) {
+        nhead <- get_showHeadLines()
+        ntail <- get_showTailLines()
+        title <- .title_data.frame(object)
+        rownames <- paste0("  ", .some(rownames(title), nhead, ntail))
+        out <- matrix(c(.some(rep("|", len), nhead, ntail, fill=""),
+                        .some(title[[1]], nhead, ntail)),
+                      ncol=2L,
+                      dimnames=list(rownames, c("", "title")))
+
+        cat("\n")
+        print(out, quote=FALSE, right=FALSE)
+    }
+}
+
+.show1 <- function(object)
+{
+    rsrc <- .resource_table(object)
+    size <- .collapse_as_string(object, .sourcesize)   
+    date <- .collapse_as_string(object, .sourcelastmodifieddate)
+ 
+    cat("# names(): ", names(object)[[1]], "\n", sep="")
+    if (length(package(object)) > 0L)
+        cat("# package(): ", package(object)[[1]], "\n", sep="")
+    cat(.pprintf1("dataprovider", rsrc[["dataprovider"]]))
+    cat(.pprintf1("species", rsrc[["species"]]))
+    cat(.pprintf1("rdataclass", rsrc[["rdataclass"]]))
+    cat(.pprintf1("rdatadateadded", rsrc[["rdatadateadded"]]))
+    cat(.pprintf1("title", rsrc[["title"]]))
+    cat(.pprintf1("description", rsrc[["description"]]))
+    cat(.pprintf1("taxonomyid", rsrc[["taxonomyid"]]))
+    cat(.pprintf1("genome", rsrc[["genome"]]))
+    cat(.pprintf1("sourcetype", rsrc[["sourcetype"]]))
+    cat(.pprintf1("sourceurl", rsrc[["sourceurl"]]))
+    cat(.pprintf1("sourcesize", size))
+    cat(.pprintf0("# $tags: %s", rsrc[["tags"]]), "\n")
+    cat(.pprintf0("# retrieve record with 'object[[\"%s\"]]'",
+                  names(object)[[1]]), "\n")
+}
+
+setMethod("show", "Hub", function(object) 
+{
+    if (length(object) == 1)
+        .show1(object)
+    else
+        .show(object)
+})
+
+### - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
+### display 
+###
+
+.display <- function(object, ...) { 
+    #metadata- do not touch
+    md <- mcols(object)
+ 
+    #create a data frame copy for edits called df
+    df <- as.data.frame(md)
+
+    #dropping column names
+    drops <-c("title", "taxonomyid", "sourceurl")
+    df <- df[,!(names(df) %in% drops), drop=FALSE]
+ 
+    summaryMessage = capture.output(show(object))
+    serverOptions= list(
+        bSortClasses=TRUE,
+        aLengthMenu = c(1000, 5000, "All"),
+        iDisplayLength = 1000,
+        "sDom" = '<"top"i>rt<"top"f>lt<"bottom"p><"clear">')
+
+    d <- display(object =df, summaryMessage = summaryMessage, 
+                 serverOptions = serverOptions)
+    idx <- rownames(d)
+    object[idx]
+}
+
+setMethod("display", signature(object="Hub"),
+          function(object) .display(object)
+)
+
+setMethod("recordStatus", "Hub",
+    function(hub, record) {
+        if (!is.character(record))
+            stop("'record' must be a character")
+        if (length(record) != 1L)
+            stop("'record' must be length 1")
+
+        conn <- dbconn(hub)
+        query <- paste0("SELECT status FROM statuses WHERE id IN ",
+                        "(SELECT status_id FROM resources WHERE ah_id = '",
+                        record, "')")
+        status=.db_query(conn, query)
+        if (nrow(status) == 0L)
+            status <- "record not found in database"
+        data.frame(record=record, status=status)
+})
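Note on R/Hub-class.R above: query() greps every pattern against a per-record index string and folds the logical results together with pattern.op (AND by default). A self-contained sketch of that loop follows. The function queryIndex and the three-record index are hypothetical illustration data, not real hub content.

```r
## Hypothetical stand-in for query(): 'tbl' is a named character vector with
## one collapsed metadata string per record, as built by .db_create_index().
queryIndex <- function(tbl, pattern, ignore.case=TRUE, pattern.op=`&`) {
    idx <- logical()
    for (pat in pattern) {
        idx0 <- grepl(pat, tbl, ignore.case=ignore.case)
        idx <- if (length(idx)) pattern.op(idx, idx0) else idx0
    }
    names(tbl)[idx]
}

index <- c(AH1="Ensembl\rHomo sapiens\rGRanges",
           AH2="UCSC\rHomo sapiens\rGRanges",
           AH3="Ensembl\rMus musculus\rTwoBitFile")

both   <- queryIndex(index, c("ensembl", "homo"))                  # AND
either <- queryIndex(index, c("ensembl", "homo"), pattern.op=`|`)  # OR
```

Passing the operator as a function keeps the loop identical for AND and OR searches.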
diff --git a/R/Hub-utils.R b/R/Hub-utils.R
new file mode 100644
index 0000000..76965db
--- /dev/null
+++ b/R/Hub-utils.R
@@ -0,0 +1,134 @@
+## and hubUrl (https://annotationhub.bioconductor.org/)
+
+.hub_metadata_path <-
+    function(hub)
+{
+    paste(hub, "metadata", sep="/")
+}
+
+## The following helper was previously unused (and now is used). We
+## need a path for the data that is separate from the path for the
+## .sqlite metadata (they need not always be on the same server);
+## IOW these should be decoupled anyhow.
+.hub_data_path <-
+    function(hub)
+{
+##    "http://gamay:9393/fetch"
+##    "https://annotationhub.bioconductor.org/fetch"
+    sprintf("%s/fetch", hub)
+}
+
+.hub_resource_path <-
+    function(hub, resource)
+{
+    sprintf("%s/%s", hub, resource)
+}
+
+## example of hub resource (sometimes convenient)
+## hub = 'https://annotationhub.bioconductor.org/metadata/annotationhub.sqlite3'
+.hub_cache_resource <- function(hubpath, cachepath, proxy) {
+    ## retrieve file from hub to cache
+    tryCatch({
+        tmp <- tempfile()
+        ## Download the resource in a way that supports https
+        if (interactive() && (packageVersion("httr") > "1.0.0")) {
+            response <-
+                GET(hubpath, progress(), write_disk(tmp), proxy)
+            cat("\n") ## line break after progress bar
+        } else {
+            response <- GET(hubpath, write_disk(tmp), proxy)
+        }
+        if (length(status_code(response)))  
+        {
+            # FTP requests return empty status code, see
+            # https://github.com/hadley/httr/issues/190
+            if (status_code(response) != 302L)
+                stop_for_status(response)
+        }
+        if (!all(file.exists(dirname(cachepath))))
+            dir.create(dirname(cachepath), recursive=TRUE)
+        file.copy(from=tmp, to=cachepath)
+        file.remove(tmp)
+        TRUE
+    }, error=function(err) {
+        warning("download failed",
+                "\n  hub path: ", sQuote(hubpath),
+                "\n  cache path: ", sQuote(cachepath),
+                "\n  reason: ", conditionMessage(err),
+                call.=FALSE)
+        FALSE
+    })
+}
+
+## This is the function that gets stuff (metadata AND files) from S3
+.hub_resource <-
+    function(hub, resource, cachepath, proxy, overwrite=FALSE)
+{
+    len <- length(resource)
+    if (len > 0L) {
+        msg <- sprintf("retrieving %d resource%s", len,
+                       if (len > 1L) "s" else "")
+        message(msg)
+    }
+
+    if (length(cachepath)) {
+        test <- file.exists(cachepath)
+        if (!overwrite && any(test)) {
+            bad <- cachepath[test]
+            stop("download destination(s) exists",
+                 "\n  ", paste(sQuote(bad), collapse="\n  "))
+        }
+    }
+
+    hubpath <- .hub_resource_path(hub, resource)
+    mapply(.hub_cache_resource, hubpath, cachepath, MoreArgs=list(proxy))
+}
+
+### --------------------------------------------------------------------------
+### snapshotDate helpers
+
+## returns the release date for biocVersion()
+.biocVersionDate <- function(biocversion) {
+    if (length(biocversion) > 1L)
+        stop("length(biocversion) must == 1")
+
+    yaml <- content(GET("http://bioconductor.org/config.yaml"), 
+                    encoding="UTF8", as="text")
+    obj <- yaml.load(yaml)
+    release_dates <- obj$release_dates
+    version_date <- release_dates[biocversion == names(release_dates)]
+    ## convert to snapshot format 
+    if (length(version_date))
+        as.character(as.POSIXlt(version_date[[1]], format='%m/%d/%Y'))
+    else
+        character()
+}
+
+## single date closest to the release date for biocVersion()
+.restrictDateByVersion <- function(path) {
+    dates <- as.POSIXlt(.possibleDates(path), format='%Y-%m-%d')
+    restrict <- as.POSIXlt(.biocVersionDate(biocVersion()), format='%Y-%m-%d')
+    if (length(restrict))  ## release
+        as.character(max(dates[dates <= restrict]))
+    else                   ## devel 
+        as.character(max(dates))
+}
+
+## all possible dates 
+.possibleDates <- function(path) {
+    conn <- .db_open(path)
+    on.exit(.db_close(conn))
+    query <- 'SELECT DISTINCT rdatadateadded FROM resources'
+    dateAdded <- .db_query(conn, query)[[1]]
+    query <- 'SELECT DATE(timestamp) FROM timestamp'
+    dateModified <- .db_query(conn, query)[[1]]
+    c(dateAdded, dateModified)
+}
+
+## dates restricted by snapshotDate (and hence biocVersion())
+possibleDates <- function(x) {
+    path <- dbfile(x)
+    dates <- .possibleDates(path)
+    restrict <- .restrictDateByVersion(path)
+    dates[as.POSIXlt(dates) <= as.POSIXlt(restrict)]
+}
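Note on R/Hub-utils.R above: .restrictDateByVersion() picks the newest snapshot date that is not later than the Bioconductor release date, and falls back to the newest date overall when no release date is known (devel). The sketch below shows the same selection rule with as.Date() in place of the package's as.POSIXlt() calls. The function name latestAllowedDate and all dates are assumptions chosen for illustration.

```r
## Hypothetical helper mirroring the selection rule; 'release' is empty
## for a devel version that has no release date yet.
latestAllowedDate <- function(dates, release=character()) {
    dates <- as.Date(dates)
    if (length(release))
        dates <- dates[dates <= as.Date(release)]
    if (!length(dates))
        stop("no snapshot date on or before the release date")
    as.character(max(dates))
}

## Release dates arrive as month/day/year and are converted to snapshot format
releaseDate <- as.character(as.Date("04/25/2017", format="%m/%d/%Y"))
snapshots <- c("2017-04-01", "2017-04-24", "2017-10-30")

inRelease <- latestAllowedDate(snapshots, releaseDate)
inDevel   <- latestAllowedDate(snapshots)
```

The explicit stop() for an empty candidate set is an addition of this sketch; the package code calls max() directly.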
diff --git a/R/ProteomicsResource-class.R b/R/ProteomicsResource-class.R
new file mode 100644
index 0000000..20e1bad
--- /dev/null
+++ b/R/ProteomicsResource-class.R
@@ -0,0 +1,34 @@
+setClass("mzRpwizResource", contains="AnnotationHubResource")
+setMethod(".get1", "mzRpwizResource",
+    function(x, ...)
+{
+    .require("mzR")
+    yy <- cache(getHub(x))
+    mzR::openMSfile(yy, backend = "Ramp")
+})
+
+setClass("mzRidentResource", contains="AnnotationHubResource")
+setMethod(".get1", "mzRidentResource",
+    function(x, ...) 
+{
+    .require("mzR")
+    yy <- cache(getHub(x))
+    mzR::openIDfile(yy)
+})
+
+setClass("MSnSetResource", contains="RdaResource")
+setMethod(".get1", "MSnSetResource",
+    function(x, ...) 
+{
+    .require("MSnbase")
+    callNextMethod(x, ...) 
+})
+
+setClass("AAStringSetResource", contains="AnnotationHubResource")
+setMethod(".get1", "AAStringSetResource",
+     function(x, ...) 
+{
+    .require("Biostrings")
+    yy <- cache(getHub(x))
+    Biostrings::readAAStringSet(yy)
+})
diff --git a/R/cache-utils.R b/R/cache-utils.R
new file mode 100644
index 0000000..5668ef9
--- /dev/null
+++ b/R/cache-utils.R
@@ -0,0 +1,114 @@
+
+
+.cache_path <- function(hub, path, cache.root=NULL, cache.fun=NULL)
+{
+    cache <- hubCache(hub)
+    ## FIXME: when would 'cache' be missing?
+    if (missing(cache)) {
+        if (any(is.null(cache.root) | is.null(cache.fun)))
+            stop("'cache.root' and 'cache.fun' must be set if cache is missing")
+
+        ## this section replaces .cache_path_default
+        if (file.exists(cache) && file.info(cache)$isdir)
+            return(cache)
+
+        if (file.exists(cache) && !file.info(cache)$isdir) {
+            txt <- sprintf("default cache exists but is not a directory\n  %s",
+                           sQuote(cache))
+            warning(txt)
+            cache <- NULL
+        }
+
+        if (!is.null(cache) && interactive()) {
+            txt <- sprintf("create cache at %s?", sQuote(cache))
+            ans <- .ask(txt, values=c("y", "n"))
+            if (ans == "n") {
+                cache <- file.path(tempdir(), cache.root)
+                message(sprintf("using temporary cache %s", sQuote(cache)))
+            }
+        } else {
+            cache <- file.path(tempfile())
+        }
+        ## FIXME:
+        cache.fun("CACHE", cache)
+    } else if (missing(path)) {
+        cache
+    } else {
+        file.path(cache, path)
+    }
+}
+
+.create_cache <-
+    function(cache, .class)
+{
+    if (file.exists(cache)) {
+        if (!file.info(cache)$isdir)
+            stop("cache ", sQuote(cache), " exists but is not a directory")
+    } else
+        dir.create(cache, recursive=TRUE)
+
+    sqlitefile <- paste0(tolower(.class), ".sqlite3")
+    file.path(cache, sqlitefile)
+}
+
+.cache_internal <- function(x, cache.root, cache.fun, proxy, max.downloads)
+{
+    cachepath <- .named_cache_path(x, cache.root, cache.fun) 
+    need <- !file.exists(cachepath)
+    if (any(need)) {
+        hubpath <- .hub_resource_path(.hub_data_path(hubUrl(x)),
+            basename(cachepath)[need])
+        message("downloading from ", paste0(sQuote(hubpath), collapse="\n    "))
+    } else {
+        message("loading from cache ", 
+                paste0(sQuote(cachepath), collapse="\n    "))
+    }
+    if (sum(need) > max.downloads) {
+        if (!interactive()) {
+            txt <- sprintf("resources needed (%d) exceeds max.downloads (%d)",
+                           sum(need), max.downloads)
+            stop(txt)
+        }
+        ans <- .ask(sprintf("download %d resources?", sum(need)), c("y", "n"))
+        if (ans == "n")
+            return(cachepath[!need])
+    }
+    ok <- .hub_resource(.hub_data_path(hubUrl(x)), basename(cachepath)[need], 
+                         cachepath[need], proxy=proxy)
+    if (!all(ok))
+        stop(sprintf("%d resources failed to download", sum(!ok)))
+    cachepath
+}
+
+.named_cache_path <- function(x, cache.root, cache.fun) 
+{
+    stopifnot(is(x, "Hub"))
+    path <- .datapathIds(x)
+    setNames(.cache_path(x, path, cache.root, cache.fun), names(path))
+}
+
+removeCache <- function(x)
+{
+    reply <- .ask("Delete cache file", c("y", "n"))
+    cache <- hubCache(x)
+    if (reply == "y") {
+      rv <- tryCatch({
+          if (file.exists(cache)) {
+              status <- unlink(cache, recursive=TRUE, force=TRUE)
+              if (status == 1)
+                  stop("'unlink()' failed to remove directory")
+          }
+
+          TRUE
+      }, error=function(err) {
+          warning("'removeCache()' failed",
+              "\n  database: ", sQuote(cache),
+              "\n  reason: ", conditionMessage(err),
+              call.=FALSE)
+
+          FALSE
+      })
+    } else rv <- FALSE
+
+    rv
+}
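Note on R/cache-utils.R above: .cache_internal() works out which cache files are missing and asks before fetching more than max.downloads of them. The sketch below isolates that check using temporary files. The function name overDownloadLimit and the file names are hypothetical. A logical such as any(need) coerces to 0 or 1 in a comparison, so the sketch counts missing files with sum(need).

```r
## Hypothetical helper: TRUE when more files must be fetched than allowed.
overDownloadLimit <- function(cachepath, max.downloads) {
    need <- !file.exists(cachepath)   # TRUE for files not yet in the cache
    sum(need) > max.downloads
}

## Build a throwaway cache with one file present and two missing
dir <- tempfile("cache-")
dir.create(dir)
present <- file.path(dir, "1001")
invisible(file.create(present))
paths <- c(present, file.path(dir, c("1002", "1003")))

overLimit1 <- overDownloadLimit(paths, max.downloads=1)
overLimit5 <- overDownloadLimit(paths, max.downloads=5)
unlink(dir, recursive=TRUE)
```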
diff --git a/R/db-utils.R b/R/db-utils.R
new file mode 100644
index 0000000..0526da3
--- /dev/null
+++ b/R/db-utils.R
@@ -0,0 +1,174 @@
+### =========================================================================
+### db helpers for Hub objects
+### -------------------------------------------------------------------------
+###
+
+.db_open <- function(path) {
+    tryCatch({
+        conn <- dbConnect(SQLite(), path)
+    }, error=function(err) {
+        stop("failed to connect to local data base",
+             "\n  database: ", sQuote(path),
+             "\n  reason: ", conditionMessage(err),
+             call.=FALSE)
+    })
+    conn
+}
+
+.db_close <- function(conn) {
+    if (!is.null(conn))
+        if (RSQLite::dbIsValid(conn)) 
+            dbDisconnect(conn)
+    invisible(conn)
+}
+
+.db_query <- function(path, query) {
+    if (is.character(path)) {
+        path <- .db_open(path)
+        on.exit(.db_close(path))
+    }
+    dbGetQuery(path, query)
+}
+
+.db_current <- function(path, hub, proxy)
+{
+    tryCatch({
+        url <- paste0(hub, '/metadata/database_timestamp')
+        onlineTime <- as.POSIXct(content(GET(url, proxy)))
+
+        db_path <- .db_get_db(path, hub, proxy)
+        sql <- "SELECT * FROM timestamp"
+        localTime <- as.POSIXct(.db_query(db_path, sql)[[1]])
+        onlineTime == localTime
+    }, error=function(e) {
+        warning("database may not be current",
+                "\n  database: ", sQuote(path),
+                "\n  reason: ", conditionMessage(e),
+                call.=FALSE)
+        ## TRUE even though not current, e.g., no internet connection
+        TRUE
+    })
+}
+
+
+## Helpers to get a fresh metadata DB connection
+.db_get_db <- function(path, hub, proxy) {
+    ## download or cache
+    tryCatch({
+        need <- !file.exists(path)
+        .hub_resource(.hub_metadata_path(hub), basename(path)[need], 
+                      path[need], proxy)
+    }, error=function(err) {
+        stop("failed to create local data base",
+             "\n  database: ", sQuote(path),
+             "\n  reason: ", conditionMessage(err),
+             call.=FALSE)
+    })
+    path
+}
+
+.db_is_valid <- function(path) {
+    conn <- .db_open(path)
+    on.exit(.db_close(conn))
+    ## Some very minor testing to make sure metadata DB is intact.
+    tryCatch({
+        ## required tables present?
+        expected <- c("biocversions", "input_sources", "location_prefixes",
+                      "rdatapaths", "recipes", "resources", "statuses",
+                      "tags", "timestamp")
+        if (!all(expected %in% dbListTables(conn)))
+            stop("missing tables")
+        ## any resources?
+        sql <- "SELECT COUNT(id) FROM resources"
+        if (.db_query(conn, sql)[[1]] == 0L)
+            warning("empty 'resources' table; database may be corrupt")
+    }, error=function(err) {
+        stop("database is corrupt; remove it and try again",
+             "\n  database: ", sQuote(path),
+             "\n  reason: ", conditionMessage(err),
+             call.=FALSE)
+    })
+    TRUE
+}
+
+.db_get <- function(path, hub, proxy) {
+    update <- !file.exists(path)
+    if (!update && !file.size(path)) {
+        file.remove(path)
+        update <- TRUE
+    }
+    if (!update && !.db_current(path, hub, proxy)) {
+        file.remove(path)
+        update <- TRUE
+    }
+    if (update)
+        message("updating metadata: ", appendLF=FALSE)
+    db_path <- .db_get_db(path, hub, proxy)
+    .db_is_valid(db_path)
+    db_path
+}
+
+.db_index_file <- function(x)
+    file.path(hubCache(x), "index.rds")
+
+.db_index_load <- function(x)
+    readRDS(.db_index_file(x))[names(x)]
+
+.db_uid0 <- function(path, .date){
+    tryCatch({
+        uid <- .uid0(path, .date)
+        sort(uid)
+    }, error=function(err) {
+        stop("failed to connect to local database",
+             "\n  database: ", sQuote(path),
+             "\n  reason: ", conditionMessage(err),
+             call.=FALSE)
+    })
+}
+
+.db_create_index <- function(x) {
+    fl <- .db_index_file(x)
+    if (file.exists(fl) && (file.mtime(fl) > file.mtime(dbfile(x))))
+        return(fl)
+ 
+    tryCatch({
+        tbl <- .resource_table(x)
+        tbl <- setNames(do.call("paste", c(tbl, sep="\r")), rownames(tbl))
+        saveRDS(tbl, fl)
+    }, error=function(err) {
+        stop("failed to create index",
+             "\n  hubCache(): ", hubCache(x),
+             "\n  reason: ", conditionMessage(err))
+    })
+
+    fl
+}
+.db_index <- function(x) slot(x, ".db_index")
+`.db_index<-` <- function(x, ..., value) 
+{
+    if (length(value) > 1L)
+        stop("'value' must be length 1")
+    if (!is(value, "character"))
+        stop("'value' must be a character")
+    slot(x, ".db_index") <- value
+    x
+}
+
+.db_uid <- function(x) slot(x, ".db_uid")
+`.db_uid<-` <- function(x, ..., value)
+{
+    bad <- value[!value %in% .db_uid(x)]
+    if (length(bad))
+        stop("invalid subscripts: ",
+             paste(sQuote(S4Vectors:::selectSome(bad)), collapse=", "))
+    slot(x, ".db_uid") <- value
+    x
+}
+
+setMethod("dbconn", "Hub",
+    function(x) .db_open(dbfile(x))
+)
+
+setMethod("dbfile", "Hub", 
+    function(x) x@.db_path
+)
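Note on .db_current() in R/db-utils.R: the function treats the cached database as current whenever the check itself fails (for example, with no internet connection) and only emits a warning. The following is a minimal sketch of that fall-back in base R, not the package's code. The names is_current, fetch_online, fetch_local, stamp and offline are hypothetical stand-ins for the HTTP GET and the SQLite query.

```r
## Hypothetical stand-in for .db_current(); 'fetch_online' and 'fetch_local'
## are assumed zero-argument functions returning a timestamp string.
is_current <- function(fetch_online, fetch_local) {
    tryCatch({
        online <- as.POSIXct(fetch_online(), tz="UTC")
        cached <- as.POSIXct(fetch_local(), tz="UTC")
        online == cached
    }, error=function(e) {
        warning("database may not be current",
                "\n  reason: ", conditionMessage(e), call.=FALSE)
        ## treat as current when the check itself fails, e.g., offline
        TRUE
    })
}

stamp <- function(x) function() x
offline <- function() stop("no internet connection")

same <- is_current(stamp("2017-09-22 11:40:49"), stamp("2017-09-22 11:40:49"))
stale <- is_current(stamp("2017-09-23 00:00:00"), stamp("2017-09-22 11:40:49"))
fallback <- suppressWarnings(is_current(offline, stamp("2017-09-22 11:40:49")))
```

Only a successful comparison that finds differing timestamps returns FALSE, so in .db_get() a failed check never removes the cached file as out of date.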
diff --git a/R/sql-utils.R b/R/sql-utils.R
new file mode 100644
index 0000000..31631ff
--- /dev/null
+++ b/R/sql-utils.R
@@ -0,0 +1,361 @@
+.id_as_single_string <- function(x)
+    paste(sprintf("'%s'", .db_uid(x)), collapse=", ")
+
+.query_as_data.frame <- function(x, query)
+{
+    tbl <- .db_query(dbfile(x), query)
+    ridx <- match(names(x), tbl$ah_id)
+    cidx <- match("ah_id", names(tbl)) 
+    rownames(tbl) <- tbl$ah_id
+    tbl[ridx, -cidx, drop=FALSE]
+}
+
+## FIXME: Should be removed.
+## This is redundant with the filter criteria in .uid0() and
+## does not reuse ids already discovered (i.e., it's an independent
+## query that may not match up and has no checks to confirm).
+.names_for_ids <- function(conn, ids){
+    query <- sprintf(
+        'SELECT resources.id, resources.ah_id
+         FROM resources, biocversions
+         WHERE biocversion == "%s"
+         AND biocversions.resource_id == resources.id',
+        biocVersion())
+    mtchData <- .db_query(conn, query)
+    names(ids) <- mtchData[mtchData[[1]] %in% ids,][[2]]
+    ids
+}
+
+## This function filters the local annotationhub.sqlite metadata db and
+## defines the subset exposed by AnnotationHub().
+.uid0 <- function(path, date)
+{
+    conn <- .db_open(path)
+    on.exit(.db_close(conn))
+
+    ## General filter:
+    ## All AnnotationHub resources (except OrgDbs, see below) are
+    ## available from the time they are added -> infinity unless
+    ## they are removed from the web or by author request. The
+    ## snapshot date can be changed by the user. We want to return records
+    ## with no rdatadateremoved and with rdatadateadded <= snapshot.
+    ## All OrgDbs are omitted in the first filter and selectively 
+    ## exposed in the second filter.
+    ##   NOTE: biocversions filter distinguishes between release and devel;
+    ##   this is not caught by the rdatadateadded filter because the timestamp
+    ##   is updated with each modification and currently someone using
+    ##   an old version of Bioconductor will still get the current db
+    ##   which will have a timestamp > the date when the old version of
+    ##   Bioconductor was valid.
+    ##   NOTE: The 'date' variable is the snapshotDate().
+
+    query1 <- sprintf(
+        'SELECT resources.id
+         FROM resources, rdatapaths, biocversions
+         WHERE resources.rdatadateadded <= "%s"
+         AND biocversions.biocversion <= "%s"
+         AND resources.rdatadateremoved IS NULL
+         AND rdatapaths.rdataclass != "OrgDb"
+         AND biocversions.resource_id == resources.id
+         AND rdatapaths.resource_id == resources.id',
+         date, biocVersion())
+    biocIds1 <- .db_query(conn, query1)[[1]]
+
+    ## OrgDb sqlite files:
+    ##
+    ## OrgDbs are the single resource designed to expire at the end of a
+    ## release cycle. The sqlite files are built before a release, added to the
+    ## devel branch then propagate to the new release branch. For the
+    ## duration of a release cycle both release and devel share the same
+    ## OrgDb packages. Before the next release, new files are built, added
+    ## to devel, propagated to release and so on.
+    ## 
+    ## When new sqlite files are added to the hub they are stamped
+    ## with the devel version which immediately becomes the new release version.
+    ## For this reason, the devel code loads OrgDbs with the release version
+    ## e.g.,
+    ##   ifelse(isDevel(), biocversion - 0.1, biocversion)
+    ##
+    ## NOTE: Because OrgDbs are valid for a full devel cycle they are
+    ##       not filtered by snapshotDate(); the OrgDbs are valid for all
+    ##       snapshotDates for a given biocVersion() 
+ 
+    biocversion <- as.numeric(as.character(biocVersion()))
+    orgdb_release_version <- ifelse(isDevel(), biocversion - 0.1, biocversion)
+    query2 <- sprintf(
+        'SELECT resources.id
+         FROM resources, biocversions, rdatapaths
+         WHERE biocversions.biocversion == "%s"
+         AND rdatapaths.rdataclass == "OrgDb"
+         AND biocversions.resource_id == resources.id
+         AND rdatapaths.resource_id == resources.id',
+         orgdb_release_version)
+    biocIds2 <- .db_query(conn, query2)[[1]]
+
+    ## make unique and sort 
+    allIds <- sort(unique(c(biocIds1, biocIds2)))
+    ## match id to ah_id
+    query <- paste0('SELECT ah_id FROM resources ',
+                    'WHERE id IN (', paste0(allIds, collapse=","), ') ',
+                    'ORDER BY id')
+    names(allIds) <- .db_query(conn, query)[[1]]
+    allIds
+}
+
+## FIXME: On one hand it's convenient to have helpers for each of these fields.
+##        Yet the main (only?) use case is probably mcols() and show() in
+##        which case we want all of these and it's inefficient to search
+##        the same table multiple times for the different fields. 
+
+## helper to retrieve tags
+.tags <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT tag, resource_id AS id FROM tags
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+## helper for extracting rdataclass
+.rdataclass <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT rdataclass, resource_id AS id FROM rdatapaths
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+## helper for extracting rdatapath 
+.rdatapath <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT rdatapath, resource_id AS id FROM rdatapaths
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+## helper for extracting sourceUrls
+.sourceurl <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT sourceurl, resource_id AS id FROM input_sources
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+##  helper for extracting sourcetype
+.sourcetype <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT sourcetype, resource_id AS id FROM input_sources
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+.sourcesize <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT sourcesize, resource_id AS id FROM input_sources
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+.sourcelastmodifieddate <- function(x) {
+    query <- sprintf(
+        'SELECT DISTINCT sourcelastmodifieddate, resource_id AS id 
+         FROM input_sources
+         WHERE resource_id IN (%s)',
+        .id_as_single_string(x))
+    .db_query(dbfile(x), query)
+}
+
+## Helper to collapse many-to-one fields (like those above) into one string
+.collapse_as_string <- function(x, FUN)
+{
+    uid <- .db_uid(x)
+    tbl <- FUN(x)
+    lst <- vapply(split(tbl[[1]], tbl[["id"]]), paste0,
+                  character(1), collapse=", ")
+    lst <- lst[match(uid, names(lst))]
+    setNames(lst, names(uid))           # allows for x with no tags 
+}
+
+.collapse_as_list <- function(x, FUN)
+{
+    uid <- .db_uid(x)
+    tbl <- FUN(x)
+    lst <- split(tbl[[1]], tbl$id)
+    lst <- lst[match(uid, names(lst))]
+    setNames(lst, names(uid))           # allows for x with no tags 
+}
+
+## Used in mcols()
+.DB_RESOURCE_FIELDS <- paste(sep=".", collapse=", ", "resources",
+    c("ah_id", "title", "dataprovider", "species", "taxonomyid", "genome",
+      "description", "coordinate_1_based", "maintainer",
+      "rdatadateadded", "preparerclass"))
+
+.resource_table <- function(x)
+{
+    query <- sprintf(
+        'SELECT %s FROM resources
+         WHERE resources.id IN (%s)',
+        .DB_RESOURCE_FIELDS, .id_as_single_string(x))
+    tbl <- .query_as_data.frame(x, query)
+    tbl[["tags"]] <- I(.collapse_as_list(x, .tags))
+    tbl[["rdataclass"]] <- .collapse_as_string(x, .rdataclass)
+    tbl[["rdatapath"]] <- .collapse_as_string(x, .rdatapath)
+    tbl[["sourceurl"]] <- .collapse_as_string(x, .sourceurl)
+    tbl[["sourcetype"]] <- .collapse_as_string(x, .sourcetype)
+    tbl
+}
+
+.resource_columns <- function()
+    strsplit(gsub("resources.", "", .DB_RESOURCE_FIELDS, fixed=TRUE), ", ")[[1]]
+
+.resource_column <- function(x, name)
+{
+    valid <- .resource_columns()
+    if (!name %in% valid) {
+        msg <- sprintf("%s is not a resource data column", sQuote(name))
+        stop(msg)
+    }
+    query <- sprintf(
+        'SELECT ah_id, %s FROM resources WHERE id IN (%s)',
+        name, .id_as_single_string(x))
+    .query_as_data.frame(x, query)[[1]]
+}
+
+## This is used by cache to get the rDataPath ID for a resource
+## I think this should say to select 'id' as id to extract the rdatapathID 
+## (instead of the resource_id)
+.datapathIds <- function(x)
+{
+    query <- sprintf(
+        'SELECT DISTINCT resources.ah_id, rdatapaths.id
+         FROM resources, rdatapaths
+         WHERE resources.id IN (%s)
+         AND resources.id == rdatapaths.resource_id',
+        .id_as_single_string(x))
+    result <- .db_query(dbfile(x), query)
+    setNames(result[[2]], result[[1]])
+}
+
+## 
+.dataclass <- function(x)
+{
+    query <- sprintf(
+        'SELECT DISTINCT r.ah_id AS ah_id, rdp.dispatchclass
+         FROM rdatapaths AS rdp, resources AS r WHERE
+         r.id = rdp.resource_id
+         AND rdp.resource_id IN (%s)',
+        .id_as_single_string(x))
+    .query_as_data.frame(x, query)[[1]]
+}
+
+## 
+.title_data.frame <-
+    function(x)
+{
+    query <- sprintf(
+        "SELECT ah_id, title FROM resources
+         WHERE resources.id IN (%s)",
+        .id_as_single_string(x))
+    .query_as_data.frame(x, query)
+}
+
+.count_resources <-
+    function(x, column, limit=10)
+{
+    query <- sprintf(
+        "SELECT %s FROM resources
+         WHERE resources.id IN (%s)
+         GROUP BY %s ORDER BY COUNT(%s) DESC LIMIT %d", 
+        column, .id_as_single_string(x), column, column, limit)
+    .db_query(dbfile(x), query)[[column]]
+}
+
+.count_join_resources <-
+    function(x, table, column, limit=10)
+{
+    query <- sprintf(
+        "SELECT %s FROM resources, %s
+         WHERE resources.id IN (%s) AND %s.resource_id == resources.id
+         GROUP BY %s ORDER BY COUNT(%s) DESC LIMIT %d", 
+        column, table,
+        .id_as_single_string(x), table,
+        column, column, limit)
+    .db_query(dbfile(x), query)[[column]]
+}
+
+## make a function to create a view whenever the DB is updated..
+## SQL will look kind of like the one used for go:
+## CREATE VIEW go AS
+## SELECT _id,go_id,evidence, 'BP' AS 'ontology' FROM go_bp
+## UNION
+## SELECT _id,go_id,evidence, 'CC' FROM go_cc
+## UNION
+## SELECT _id,go_id,evidence, 'MF' FROM go_mf;
+
+
+## SO now we just need to decide on which views we want/need.
+## So really we want to 1st refactor the show method (and make hard decisions there)
+## And the view we create here should reflect those ideas.
+
+
+## CREATE VIEW hub AS
+## SELECT * FROM resources AS r, rdatapaths AS rdp, input_sources AS ins  WHERE r.id=rdp.resource_id AND r.id=ins.resource_id LIMIT 2;
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+## Sonali better date query
+## SELECT * FROM resources where rdatadateadded <= "2013-03-19" GROUP BY title ORDER BY rdatadateadded DESC limit 2;
+
+
+## Example:
+## SELECT COUNT(id) AS theCount, `Tag` from `images-tags`
+## GROUP BY `Tag`
+## ORDER BY theCount DESC
+## LIMIT 20
+
+
+## SELECT id FROM 
+## (SELECT * FROM
+## (SELECT * FROM resources where rdatadateadded <= "2013-03-19")
+## AS res GROUP BY title ORDER BY rdatadateadded DESC limit 1);
+
+
+## SELECT MAX(id) FROM (SELECT id FROM (select * from resources where ah_id in ('AH523','AH22249')) AS res GROUP BY title) AS res;
+## same issue
+
+
+## Here is an example that actually does get close:
+## SELECT max(id) as mid, id, ah_id, title FROM (select * from resources where ah_id in ('AH523','AH22249','AH524','AH22250')) AS res GROUP BY maintainer;
+
+## Basically I would just want it like this:
+## SELECT max(id) as mid FROM (select * from resources where ah_id in ('AH523','AH22249','AH524','AH22250')) AS res GROUP BY maintainer;
+
+## And then group by title instead so pretty much like this:
+## SELECT max(id) as id FROM (SELECT * FROM resources where rdatadateadded <= "2013-03-19") AS res GROUP BY title;
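Note on .collapse_as_string() in R/sql-utils.R: the helper turns a many-to-one table (several rows per resource id) into one string per resource, keeping resources that have no rows at all. The sketch below reproduces that step in base R under stated assumptions. The data in uid and tbl are invented, and collapse_as_string is a hypothetical free-standing variant that takes the ids and the table directly instead of a Hub object.

```r
## Hypothetical data: 'uid' mimics .db_uid(x) (ids named by ah_id) and
## 'tbl' mimics the two-column result of a helper such as .tags().
uid <- c(AH1=1L, AH2=2L, AH3=3L)
tbl <- data.frame(tag=c("bed", "hg19", "vcf"), id=c(1L, 1L, 3L),
                  stringsAsFactors=FALSE)

collapse_as_string <- function(uid, tbl) {
    ## one comma-separated string per id present in 'tbl'
    lst <- vapply(split(tbl[[1]], tbl[["id"]]), paste0,
                  character(1), collapse=", ")
    ## re-order to match 'uid'; ids without rows become NA
    lst <- lst[match(uid, names(lst))]
    stats::setNames(lst, names(uid))
}

tags <- collapse_as_string(uid, tbl)
```

The match() step is what keeps the result aligned with the hub: AH2 has no tags in this example and ends up as NA instead of being dropped.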
diff --git a/R/utilities.R b/R/utilities.R
new file mode 100644
index 0000000..b7291d7
--- /dev/null
+++ b/R/utilities.R
@@ -0,0 +1,159 @@
+.require <-
+    function(pkg)
+{
+    if (length(grep(sprintf("package:%s", pkg), search())) != 0L)
+        return()
+    message("require(", dQuote(pkg), ")")
+    handler <- function(err) {
+        msg <- sprintf("require(%s) failed: %s", dQuote(pkg),
+                       conditionMessage(err))
+        stop(msg)
+    }
+    tryCatch({
+        suppressPackageStartupMessages({
+            require(pkg, quietly=TRUE, character.only=TRUE)
+        })
+    }, ## error first, so warnings-converted-to-error are not handled
+       ## again
+       error=handler, warning=handler)
+}
+
+.ask <- function(txt, values) {
+    txt <- sprintf("%s [%s] ", txt, paste(values, collapse="/"))
+    repeat {
+        ans <- tolower(trimws(readline(txt)))
+        if (ans %in% values)
+            break
+    }
+    ans
+}
+
+.gunzip <- function(file, destination)
+{
+    bufferSize <- 1e7
+    fin <- gzfile(file, "rb")
+    fout <- file(destination, "wb")
+    on.exit({
+        close(fin)
+        close(fout)
+    })
+
+    repeat {
+        x <- readBin(fin, raw(0L), bufferSize, 1L)
+        if (length(x) == 0L)
+            break
+        writeBin(x, fout, size=1L)
+    }
+
+    invisible(destination)
+}
+
+## tidyGRanges
+
+.metadataForAH <- 
+    function(x, ...)
+{
+    stopifnot(length(getHub(x)) == 1)
+    meta <- getHub(x)
+    list(AnnotationHubName=names(meta), 
+         `File Name`=basename(meta$sourceurl),
+         `Data Source`=meta$sourceurl,
+         `Provider`=meta$dataprovider,
+         `Organism`=meta$species,
+         `Taxonomy ID`=meta$taxonomyid )
+}
+
+.guessIsCircular <-
+    function(x)
+{
+    ans <- GenomeInfoDb::isCircular(x)
+    idx <- is.na(ans)
+    test <- names(ans) %in% c("MT", "MtDNA", "dmel_mitochondrion_genome",
+                              "Mito", "chrM")
+    ans[idx] <- test[idx]
+    ans
+}
+
+# tasks we want .tidyGRanges() to do:
+# 1. add metadata() to GRanges containing the names() of hub object
+# 2. sortSeqlevels()
+# 3. fill the seqinfo with correct information
+# for step 3 - comparison is done with existingSeqinfo and
+# GenomeInfoDb::Seqinfo() - currently if it's not the same, seqinfo is replaced.
+
+.tidyGRanges <- 
+    function(x, gr, sort=TRUE, guess.circular=TRUE, addGenome=TRUE,
+             metadata=TRUE, genome=getHub(x)$genome)
+{
+    if (metadata)
+        metadata(gr)  <- .metadataForAH(x)
+
+    ## BEWARE: 
+    ## 1) GenomeInfoDb::Seqinfo doesn't sortSeqlevels - so we need to
+    ## sortSeqlevels before comparison else identical won't work.
+    ## 2) case - the input GRanges might have a subset of seqlevels whereas
+    ## the GenomeInfoDb::Seqinfo returns all seqlevels with scaffolds
+    ## from an assembly.
+    ## 3) only 10-15 genomes supported by GenomeInfoDb::Seqinfo
+
+    ## return() inside a tryCatch() handler would only leave the handler,
+    ## so check for the namespace directly
+    if (!requireNamespace("GenomeInfoDb", quietly=TRUE)) {
+        ## quietly return un-tidied GRanges (?)
+        return(gr)
+    }
+    
+    
+    if (sort)
+        gr <- GenomeInfoDb::sortSeqlevels(gr) 
+    existingSeqinfo <- GenomeInfoDb::seqinfo(gr)    
+
+    ## Not all genomes are supported by GenomeInfoDb::Seqinfo
+    newSeqinfo <- tryCatch({
+        GenomeInfoDb::Seqinfo(genome=genome)
+    }, error= function(err) {
+         NULL
+    })
+    
+    if (is.null(newSeqinfo)) {
+        message("using guess work to populate seqinfo")
+        ## use guess work to populate
+        if (guess.circular)
+            GenomeInfoDb::isCircular(existingSeqinfo)  <- 
+                .guessIsCircular(existingSeqinfo)
+        if (addGenome)
+            GenomeInfoDb::genome(existingSeqinfo) <- genome
+        if (sort || guess.circular || addGenome) {
+            new2old <- match(GenomeInfoDb::seqlevels(existingSeqinfo),
+                        GenomeInfoDb::seqlevels(gr))
+            GenomeInfoDb::seqinfo(gr, new2old=new2old) <- existingSeqinfo
+        }
+        return(gr)
+    }
+   
+
+    
+    newSeqinfo <- newSeqinfo[GenomeInfoDb::seqlevels(gr)]
+    # compare the current and new seqinfo
+    diffSeqlengths <- setdiff(GenomeInfoDb::seqlengths(newSeqinfo), 
+                          GenomeInfoDb::seqlengths(existingSeqinfo))  
+    diffSeqnames <- setdiff(GenomeInfoDb::seqnames(newSeqinfo), 
+                        GenomeInfoDb::seqnames(existingSeqinfo)) 
+    diffGenome <- identical(unique(GenomeInfoDb::genome(newSeqinfo)), 
+                      unique(GenomeInfoDb::genome(existingSeqinfo))) 
+    diffIscircular <- identical(table(GenomeInfoDb::isCircular(newSeqinfo)), 
+                          table(GenomeInfoDb::isCircular(existingSeqinfo)))
+    len <- c(length(diffSeqlengths), length(diffSeqnames))
+    
+    # if it's the same, don't replace
+    if(all(unique(len)==0 & diffGenome & diffIscircular))
+        return(gr)   
+
+    ## Replace incorrect seqinfo 
+    if (sort || guess.circular || addGenome) {
+        new2old <- match(GenomeInfoDb::seqlevels(gr), 
+                         GenomeInfoDb::seqlevels(newSeqinfo))
+        GenomeInfoDb::seqinfo(gr, new2old=new2old) <- newSeqinfo
+    }
+    gr
+}
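Note on .guessIsCircular() in R/utilities.R: the function only fills in entries whose circularity is unknown (NA), guessing TRUE for a fixed set of mitochondrial sequence names. A minimal base R sketch follows, assuming a plain named logical vector in place of the result of GenomeInfoDb::isCircular(). The vector known and the function name guess_is_circular are hypothetical.

```r
## 'known' mimics GenomeInfoDb::isCircular(): a named logical vector with NA
## where circularity is unknown (hypothetical values).
known <- c(chr1=FALSE, chrM=NA, MT=NA, scaffold_7=NA, chrX=TRUE)

guess_is_circular <- function(ans) {
    mito <- c("MT", "MtDNA", "dmel_mitochondrion_genome", "Mito", "chrM")
    idx <- is.na(ans)
    ## only fill in unknown entries; known values are left untouched
    ans[idx] <- (names(ans) %in% mito)[idx]
    ans
}

guessed <- guess_is_circular(known)
```

Values that are already known are never overwritten, so an explicit FALSE for a sequence named chrM would survive the guess.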
diff --git a/R/zzz.R b/R/zzz.R
new file mode 100644
index 0000000..ef661ec
--- /dev/null
+++ b/R/zzz.R
@@ -0,0 +1,31 @@
+.CACHE_ROOT <- ".AnnotationHub"
+
+.onLoad <- function(libname, pkgname, ...) {
+    ## options from Sys.getenv, getOption, or default, in that order of precedence
+    if (is.null(getAnnotationHubOption("MAX_DOWNLOADS"))) {
+        opt <- getOption("ANNOTATION_HUB_MAX_DOWNLOADS", 10L)
+        opt <- Sys.getenv("ANNOTATION_HUB_MAX_DOWNLOADS", opt)
+        opt <- as.integer(opt)
+        setAnnotationHubOption("MAX_DOWNLOADS", opt)
+    }
+    if (is.null(getAnnotationHubOption("URL"))) {
+        opt <- getOption("ANNOTATION_HUB_URL",
+                         "https://annotationhub.bioconductor.org")
+        opt <- Sys.getenv("ANNOTATION_HUB_URL", opt)
+        setAnnotationHubOption("URL", opt)
+    }
+    if (is.null(getAnnotationHubOption("CACHE"))) {
+        path <- switch(.Platform$OS.type, unix = path.expand("~/"),
+                       windows= file.path(gsub("\\\\", "/",
+                       Sys.getenv("HOME")), "AppData"))
+        opt <- getOption("ANNOTATION_HUB_CACHE", file.path(path, .CACHE_ROOT))
+        opt <- Sys.getenv("ANNOTATION_HUB_CACHE", opt)
+        setAnnotationHubOption("CACHE", opt)
+    }
+    if (is.null(getAnnotationHubOption("PROXY"))) {
+        opt <- getOption("ANNOTATION_HUB_PROXY", "")
+        opt <- Sys.getenv("ANNOTATION_HUB_PROXY", opt)
+        if (nzchar(opt))
+            setAnnotationHubOption("PROXY", opt)
+    }
+}
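Note on .onLoad() in R/zzz.R: each setting is looked up with getOption() first and the result is then passed as the fall-back to Sys.getenv(), so a set environment variable wins over the R option, which wins over the built-in default. The sketch below shows that lookup in base R. The helper name resolve_option and the DEMO_HUB_MAX_DOWNLOADS name are hypothetical and chosen so as not to touch real AnnotationHub settings.

```r
## Hypothetical helper; 'name' is any option / environment variable name.
resolve_option <- function(name, default) {
    opt <- getOption(name, default)
    ## Sys.getenv() returns 'unset' only when the variable is not set, so a
    ## set environment variable overrides the R option
    Sys.getenv(name, unset=as.character(opt))
}

Sys.unsetenv("DEMO_HUB_MAX_DOWNLOADS")
options(DEMO_HUB_MAX_DOWNLOADS=NULL)
from_default <- resolve_option("DEMO_HUB_MAX_DOWNLOADS", 10L)

options(DEMO_HUB_MAX_DOWNLOADS=5L)
from_option <- resolve_option("DEMO_HUB_MAX_DOWNLOADS", 10L)

Sys.setenv(DEMO_HUB_MAX_DOWNLOADS="20")
from_env <- resolve_option("DEMO_HUB_MAX_DOWNLOADS", 10L)

## tidy up
Sys.unsetenv("DEMO_HUB_MAX_DOWNLOADS")
options(DEMO_HUB_MAX_DOWNLOADS=NULL)
```

Sys.getenv() always returns a character string, which is why .onLoad() converts MAX_DOWNLOADS back with as.integer().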
diff --git a/build/vignette.rds b/build/vignette.rds
new file mode 100644
index 0000000..e48a15b
Binary files /dev/null and b/build/vignette.rds differ
diff --git a/debian/README.test b/debian/README.test
deleted file mode 100644
index 18f7329..0000000
--- a/debian/README.test
+++ /dev/null
@@ -1,10 +0,0 @@
-Notes on how this package can be tested.
-────────────────────────────────────────
-
-This package can be tested by running the provided test:
-
-LC_ALL=C R --no-save <<EOT
-BiocGenerics:::testPackage("AnnotationHub")
-EOT
-
-in order to confirm its integrity.
diff --git a/debian/changelog b/debian/changelog
deleted file mode 100644
index 29b035f..0000000
--- a/debian/changelog
+++ /dev/null
@@ -1,35 +0,0 @@
-r-bioc-annotationhub (2.8.1-1) unstable; urgency=medium
-
-  * Team upload
-  * New upstream version
-
- -- Graham Inggs <ginggs at debian.org>  Fri, 12 May 2017 17:49:31 +0200
-
-r-bioc-annotationhub (2.6.4-1) unstable; urgency=medium
-
-  * New upstream version
-  * Add new Build-Depends: r-cran-yaml
-  * debhelper 10
-  * d/watch: version=4
-
- -- Andreas Tille <tille at debian.org>  Wed, 30 Nov 2016 10:43:21 +0100
-
-r-bioc-annotationhub (2.6.0-1) unstable; urgency=medium
-
-  * New upstream version
-  * Convert to dh-r
-  * Generic BioConductor homepage
-
- -- Andreas Tille <tille at debian.org>  Thu, 27 Oct 2016 14:12:17 +0200
-
-r-bioc-annotationhub (2.4.2-2) unstable; urgency=medium
-
-  * Fix autopkgtest (thanks to Gordon Ball for the patch)
-
- -- Andreas Tille <tille at debian.org>  Wed, 22 Jun 2016 16:26:48 +0200
-
-r-bioc-annotationhub (2.4.2-1) unstable; urgency=low
-
-  * Initial release (closes: #825114)
-
- -- Andreas Tille <tille at debian.org>  Mon, 23 May 2016 21:07:01 +0200
diff --git a/debian/compat b/debian/compat
deleted file mode 100644
index f599e28..0000000
--- a/debian/compat
+++ /dev/null
@@ -1 +0,0 @@
-10
diff --git a/debian/control b/debian/control
deleted file mode 100644
index 01d4f5e..0000000
--- a/debian/control
+++ /dev/null
@@ -1,34 +0,0 @@
-Source: r-bioc-annotationhub
-Maintainer: Debian Med Packaging Team <debian-med-packaging at lists.alioth.debian.org>
-Uploaders: Andreas Tille <tille at debian.org>
-Section: gnu-r
-Testsuite: autopkgtest
-Priority: optional
-Build-Depends: debhelper (>= 10),
-               dh-r,
-               r-base-dev,
-               r-bioc-annotationdbi,
-               r-bioc-biocinstaller,
-               r-bioc-interactivedisplaybase,
-               r-cran-httr,
-               r-cran-yaml
-Standards-Version: 3.9.8
-Vcs-Browser: https://anonscm.debian.org/viewvc/debian-med/trunk/packages/R/r-bioc-annotationhub/trunk/
-Vcs-Svn: svn://anonscm.debian.org/debian-med/trunk/packages/R/r-bioc-annotationhub/trunk/
-Homepage: https://bioconductor.org/packages/AnnotationHub/
-
-Package: r-bioc-annotationhub
-Architecture: all
-Depends: ${R:Depends},
-         ${misc:Depends},
-Recommends: ${R:Recommends}
-Suggests: ${R:Suggests}
-Description: GNU R client to access AnnotationHub resources
- This package provides a client for the Bioconductor AnnotationHub web
- resource. The AnnotationHub web resource provides a central location
- where genomic files (e.g., VCF, bed, wig) and other resources from
- standard locations (e.g., UCSC, Ensembl) can be discovered. The resource
- includes metadata about each resource, e.g., a textual description,
- tags, and date of modification. The client creates and manages a local
- cache of files retrieved by the user, helping with quick and
- reproducible access.
diff --git a/debian/copyright b/debian/copyright
deleted file mode 100644
index 53d45db..0000000
--- a/debian/copyright
+++ /dev/null
@@ -1,106 +0,0 @@
-Format: https://www.debian.org/doc/packaging-manuals/copyright-format/1.0/
-Upstream-Name: AnnotationHub
-Upstream-Contact: Bioconductor Package Maintainer <maintainer at bioconductor.org> 
-Source: https://bioconductor.org/packages/AnnotationHub/
-
-Files: *
-Copyright: 2006-2016 Martin Morgan, Marc Carlson, Dan Tenenbaum, Sonali Arora
-License: Artistic-2.0
-
-Files: debian/*
-Copyright: 2016 Andreas Tille <tille at debian.org>
-License: Artistic-2.0
-
-License: Artistic-2.0
-			 The "Artistic License"
- .
-				Preamble
- .
- 1. You may make and give away verbatim copies of the source form of the
-    Standard Version of this Package without restriction, provided that
-    you duplicate all of the original copyright notices and associated
-    disclaimers.
- .
- 2. You may apply bug fixes, portability fixes and other modifications
-    derived from the Public Domain or from the Copyright Holder.  A
-    Package modified in such a way shall still be considered the Standard
-    Version.
- .
- 3. You may otherwise modify your copy of this Package in any way,
-    provided that you insert a prominent notice in each changed file stating
-    how and when you changed that file, and provided that you do at least
-    ONE of the following:
- .
-    a) place your modifications in the Public Domain or otherwise make them
-    Freely Available, such as by posting said modifications to Usenet or
-    an equivalent medium, or placing the modifications on a major archive
-    site such as uunet.uu.net, or by allowing the Copyright Holder to include
-    your modifications in the Standard Version of the Package.
- .
-    b) use the modified Package only within your corporation or organization.
- .
-    c) rename any non-standard executables so the names do not conflict
-    with standard executables, which must also be provided, and provide
-    a separate manual page for each non-standard executable that clearly
-    documents how it differs from the Standard Version.
- .
-    d) make other distribution arrangements with the Copyright Holder.
- .
- 4. You may distribute the programs of this Package in object code or
-    executable form, provided that you do at least ONE of the following:
- .
-    a) distribute a Standard Version of the executables and library files,
-    together with instructions (in the manual page or equivalent) on where
-    to get the Standard Version.
- .
-    b) accompany the distribution with the machine-readable source of
-    the Package with your modifications.
- .
-    c) give non-standard executables non-standard names, and clearly
-    document the differences in manual pages (or equivalent), together
-    with instructions on where to get the Standard Version.
- .
-    d) make other distribution arrangements with the Copyright Holder.
- .
- 5. You may charge a reasonable copying fee for any distribution of this
-    Package.  You may charge any fee you choose for support of this Package.
-    You may not charge a fee for this Package itself.  However, you may
-    distribute this Package in aggregate with other (possibly commercial)
-    programs as part of a larger (possibly commercial) software distribution
-    provided that you do not advertise this Package as a product of your
-    own.  You may embed this Package's interpreter within an executable of
-    yours (by linking); this shall be construed as a mere form of
-    aggregation, provided that the complete Standard Version of the
-    interpreter is so embedded.
- .
- 6. The scripts and library files supplied as input to or produced as
-    output from the programs of this Package do not automatically fall under
-    the copyright of this Package, but belong to whoever generated them, and
-    may be sold commercially, and may be aggregated with this Package.  If
-    such scripts or library files are aggregated with this Package via the
-    so-called "undump" or "unexec" methods of producing a binary executable
-    image, then distribution of such an image shall neither be construed as
-    a distribution of this Package nor shall it fall under the restrictions
-    of Paragraphs 3 and 4, provided that you do not represent such an
-    executable image as a Standard Version of this Package.
- .
- 7. C subroutines (or comparably compiled subroutines in other
-    languages) supplied by you and linked into this Package in order to
-    emulate subroutines and variables of the language defined by this
-    Package shall not be considered part of this Package, but are the
-    equivalent of input as in Paragraph 6, provided these subroutines do
-    not change the language in any way that would cause it to fail the
-    regression tests for the language.
- .
- 8. Aggregation of this Package with a commercial distribution is always
-    permitted provided that the use of this Package is embedded; that is,
-    when no overt attempt is made to make this Package's interfaces visible
-    to the end user of the commercial distribution.  Such use shall not be
-    construed as a distribution of this Package.
- .
- 9. The name of the Copyright Holder may not be used to endorse or promote
-    products derived from this software without specific prior written permission.
- .
- 10. THIS PACKAGE IS PROVIDED "AS IS" AND WITHOUT ANY EXPRESS OR
-    IMPLIED WARRANTIES, INCLUDING, WITHOUT LIMITATION, THE IMPLIED
-    WARRANTIES OF MERCHANTIBILITY AND FITNESS FOR A PARTICULAR PURPOSE.
diff --git a/debian/docs b/debian/docs
deleted file mode 100644
index 50f6656..0000000
--- a/debian/docs
+++ /dev/null
@@ -1 +0,0 @@
-debian/README.test
diff --git a/debian/rules b/debian/rules
deleted file mode 100755
index 68d9a36..0000000
--- a/debian/rules
+++ /dev/null
@@ -1,4 +0,0 @@
-#!/usr/bin/make -f
-
-%:
-	dh $@ --buildsystem R
diff --git a/debian/source/format b/debian/source/format
deleted file mode 100644
index 163aaf8..0000000
--- a/debian/source/format
+++ /dev/null
@@ -1 +0,0 @@
-3.0 (quilt)
diff --git a/debian/tests/control b/debian/tests/control
deleted file mode 100644
index 39528b8..0000000
--- a/debian/tests/control
+++ /dev/null
@@ -1,3 +0,0 @@
-Tests: run-unit-test
-Depends: @, r-cran-runit, r-bioc-genomicranges
-Restrictions: allow-stderr
diff --git a/debian/tests/run-unit-test b/debian/tests/run-unit-test
deleted file mode 100644
index 9635be7..0000000
--- a/debian/tests/run-unit-test
+++ /dev/null
@@ -1,7 +0,0 @@
-#!/bin/sh -e
-
-export HOME=$ADTTMP
-
-LC_ALL=C R --no-save <<EOT
-BiocGenerics:::testPackage("AnnotationHub")
-EOT
diff --git a/debian/watch b/debian/watch
deleted file mode 100644
index eae5c39..0000000
--- a/debian/watch
+++ /dev/null
@@ -1,3 +0,0 @@
-version=4
-opts=downloadurlmangle=s?^(.*)\.\.?http:$1packages/release/bioc? \
- http://www.bioconductor.org/packages/release/bioc/html/AnnotationHub.html .*/AnnotationHub_([\d\.]+)\.tar\.gz
diff --git a/inst/doc/AnnotationHub-HOWTO.R b/inst/doc/AnnotationHub-HOWTO.R
new file mode 100644
index 0000000..f382f5a
--- /dev/null
+++ b/inst/doc/AnnotationHub-HOWTO.R
@@ -0,0 +1,158 @@
+## ----style, echo = FALSE, results = 'asis', warning=FALSE-----------------------------------------
+options(width=100)
+suppressPackageStartupMessages({
+    ## load here to avoid noise in the body of the vignette
+    library(AnnotationHub)
+    library(GenomicFeatures)
+    library(Rsamtools)
+    library(VariantAnnotation)
+})
+BiocStyle::markdown()
+
+## ----less-model-org-------------------------------------------------------------------------------
+library(AnnotationHub)
+ah <- AnnotationHub()
+query(ah, "OrgDb")
+orgdb <- query(ah, "OrgDb")[[1]] 
+
+## ----less-model-org-select------------------------------------------------------------------------
+keytypes(orgdb)
+columns(orgdb)
+egid <- head(keys(orgdb, "ENTREZID"))
+select(orgdb, egid, c("SYMBOL", "GENENAME"), "ENTREZID")
+
+## ---- eval=FALSE----------------------------------------------------------------------------------
+#  url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E001-H3K4me1.broadPeak.gz"
+#  filename <-  basename(url)
+#  download.file(url, destfile=filename)
+#  if (file.exists(filename))
+#     data <- import(filename, format="bed")
+
+## ----results='hide'-------------------------------------------------------------------------------
+library(AnnotationHub)
+ah = AnnotationHub()
+epiFiles <- query(ah, "EpigenomeRoadMap")
+
+## -------------------------------------------------------------------------------------------------
+epiFiles
+
+## -------------------------------------------------------------------------------------------------
+unique(epiFiles$species)
+unique(epiFiles$genome)
+
+## -------------------------------------------------------------------------------------------------
+table(epiFiles$sourcetype)
+
+## -------------------------------------------------------------------------------------------------
+sort(table(epiFiles$description), decreasing=TRUE)
+
+## -------------------------------------------------------------------------------------------------
+metadata.tab <- query(ah , c("EpigenomeRoadMap", "Metadata"))
+metadata.tab
+
+## ----echo=FALSE, results='hide'-------------------------------------------------------------------
+metadata.tab <- ah[["AH41830"]]
+
+## -------------------------------------------------------------------------------------------------
+metadata.tab <- ah[["AH41830"]]
+
+## -------------------------------------------------------------------------------------------------
+metadata.tab[1:6, 1:5]
+
+## -------------------------------------------------------------------------------------------------
+bpChipEpi <- query(ah , c("EpigenomeRoadMap", "broadPeak", "chip", "consolidated"))
+
+## -------------------------------------------------------------------------------------------------
+allBigWigFiles <- query(ah, c("EpigenomeRoadMap", "BigWig"))
+
+## -------------------------------------------------------------------------------------------------
+seg <- query(ah, c("EpigenomeRoadMap", "segmentations"))
+
+## -------------------------------------------------------------------------------------------------
+E126 <- query(ah , c("EpigenomeRoadMap", "E126", "H3K4ME2"))
+E126
+
+## ----echo=FALSE, results='hide'-------------------------------------------------------------------
+peaks <- E126[['AH29817']]
+
+## -------------------------------------------------------------------------------------------------
+peaks <- E126[['AH29817']]
+seqinfo(peaks)
+
+## -------------------------------------------------------------------------------------------------
+metadata(peaks)
+ah[metadata(peaks)$AnnotationHubName]$sourceurl
+
+## ----takifugu-gene-models-------------------------------------------------------------------------
+query(ah, c("Takifugu", "release-80"))
+
+## ----takifugu-data--------------------------------------------------------------------------------
+gtf <- ah[["AH47101"]]
+dna <- ah[["AH47477"]]
+
+head(gtf, 3)
+dna
+head(seqlevels(dna))
+
+## ----takifugu-seqlengths--------------------------------------------------------------------------
+keep <- names(tail(sort(seqlengths(dna)), 25))
+gtf_subset <- gtf[seqnames(gtf) %in% keep]
+
+## ----takifugu-txdb--------------------------------------------------------------------------------
+library(GenomicFeatures)         # for makeTxDbFromGRanges
+txdb <- makeTxDbFromGRanges(gtf_subset)
+
+## ----takifugu-exons-------------------------------------------------------------------------------
+library(Rsamtools)               # for getSeq,FaFile-method
+exons <- exons(txdb)
+length(exons)
+getSeq(dna, exons)
+
+## -------------------------------------------------------------------------------------------------
+chainfiles <- query(ah , c("hg38", "hg19", "chainfile"))
+chainfiles
+
+## ----echo=FALSE, results='hide'-------------------------------------------------------------------
+chain <- chainfiles[['AH14150']]
+
+## -------------------------------------------------------------------------------------------------
+chain <- chainfiles[['AH14150']]
+chain
+
+## -------------------------------------------------------------------------------------------------
+library(rtracklayer)
+gr38 <- liftOver(peaks, chain)
+
+## -------------------------------------------------------------------------------------------------
+genome(gr38) <- "hg38"
+gr38
+
+## ----echo=FALSE, results='hide', message=FALSE----------------------------------------------------
+query(ah, c("GRCh37", "dbSNP", "VCF" ))
+vcf <- ah[['AH50424']]
+
+## ----message=FALSE--------------------------------------------------------------------------------
+variants <- readVcf(vcf, genome="hg19")
+variants
+
+## -------------------------------------------------------------------------------------------------
+rowRanges(variants)
+
+## -------------------------------------------------------------------------------------------------
+seqlevelsStyle(variants) <-seqlevelsStyle(peaks)
+
+## -------------------------------------------------------------------------------------------------
+overlap <- findOverlaps(variants, peaks)
+overlap
+
+## -------------------------------------------------------------------------------------------------
+idx <- subjectHits(overlap) == 3852
+overlap[idx]
+
+## -------------------------------------------------------------------------------------------------
+peaks[3852]
+rowRanges(variants)[queryHits(overlap[idx])]
+
+## -------------------------------------------------------------------------------------------------
+sessionInfo()
+
diff --git a/inst/doc/AnnotationHub-HOWTO.Rmd b/inst/doc/AnnotationHub-HOWTO.Rmd
new file mode 100644
index 0000000..953199a
--- /dev/null
+++ b/inst/doc/AnnotationHub-HOWTO.Rmd
@@ -0,0 +1,367 @@
+---
+title: "AnnotationHub How-To's"
+output:
+  BiocStyle::html_document:
+    toc: true
+vignette: >
+  % \VignetteIndexEntry{AnnotationHub: AnnotationHub HOW TO's}
+  % \VignetteDepends{AnnotationHub, GenomicFeatures, Rsamtools}
+  % \VignetteEngine{knitr::rmarkdown}
+---
+
+```{r style, echo = FALSE, results = 'asis', warning=FALSE}
+options(width=100)
+suppressPackageStartupMessages({
+    ## load here to avoid noise in the body of the vignette
+    library(AnnotationHub)
+    library(GenomicFeatures)
+    library(Rsamtools)
+    library(VariantAnnotation)
+})
+BiocStyle::markdown()
+```
+
+**Package**: `r Biocpkg("AnnotationHub")`<br />
+**Authors**: `r packageDescription("AnnotationHub")[["Author"]] `<br />
+**Modified**: Sun Jun 28 10:41:23 2015<br />
+**Compiled**: `r date()`
+
+
+# Accessing Genome-Scale Data
+
+## Non-model organism gene annotations
+
+_Bioconductor_ offers pre-built `org.*` annotation packages for model
+organisms, with their use described in the
+[OrgDb](http://bioconductor.org/help/workflows/annotation/Annotation_Resources/#OrgDb)
+section of the Annotation workflow. Here we discover available `OrgDb`
+objects for less-model organisms.
+
+```{r less-model-org}
+library(AnnotationHub)
+ah <- AnnotationHub()
+query(ah, "OrgDb")
+orgdb <- query(ah, "OrgDb")[[1]] 
+```
+
+The object returned by AnnotationHub is directly usable with the
+`select()` interface, e.g., to discover the available keytypes for
+querying the object and the columns that these keytypes can map to,
+and finally to select the SYMBOL and GENENAME corresponding to the
+first 6 ENTREZIDs.
+
+```{r less-model-org-select}
+keytypes(orgdb)
+columns(orgdb)
+egid <- head(keys(orgdb, "ENTREZID"))
+select(orgdb, egid, c("SYMBOL", "GENENAME"), "ENTREZID")
+```
+
+## Roadmap Epigenomics Project 
+
+All Roadmap Epigenomics files are hosted
+[here](http://egg2.wustl.edu/roadmap/data/byFileType/). If one had to
+download these files on their own, one would navigate through the web
+interface to find useful files, then use something like the following
+_R_ code.
+
+```{r, eval=FALSE}
+url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E001-H3K4me1.broadPeak.gz"
+filename <-  basename(url)
+download.file(url, destfile=filename)
+if (file.exists(filename))
+   data <- import(filename, format="bed")
+```
+This would have to be repeated for all files, and the onus would lie
+on the user to identify, download, import, and manage the local disk
+location of these files.
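+
+A minimal sketch of what that repetition might look like is shown
+below. It is an illustration only: the second file name is assumed to
+sit alongside the first on the same server, and `fetch_one()` is a
+hypothetical helper, not part of any package.
+
+```{r, eval=FALSE}
+## Build one URL per histone mark (the H3K4me3 URL is an assumption)
+base_url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak"
+marks <- c("H3K4me1", "H3K4me3")
+urls <- file.path(base_url, paste0("E001-", marks, ".broadPeak.gz"))
+
+## Hypothetical helper: download a file once and return its local path
+fetch_one <- function(url, destdir = tempdir()) {
+    destfile <- file.path(destdir, basename(url))
+    if (!file.exists(destfile))
+        download.file(url, destfile = destfile, mode = "wb")
+    destfile
+}
+
+## Not run here: each call would download a file
+## local_files <- vapply(urls, fetch_one, character(1))
+```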
+
+`r Biocpkg("AnnotationHub")` reduces this task to just a few lines of _R_ code 
+```{r results='hide'}
+library(AnnotationHub)
+ah = AnnotationHub()
+epiFiles <- query(ah, "EpigenomeRoadMap")
+```
+A look at the value returned by `epiFiles` shows us that 
+`r length(epiFiles)` roadmap resources are available via 
+`r Biocpkg("AnnotationHub")`.  Additional information about 
+the files is also available, e.g., where the files came from
+(dataprovider), genome, species, sourceurl, sourcetypes.
+
+```{r}
+epiFiles
+```
+
+A good sanity check to ensure that we have files only from the Roadmap Epigenomics
+project is to check that all the files in the returned smaller hub object
+come from _Homo sapiens_ and the `r unique(epiFiles$genome)` genome 
+```{r}
+unique(epiFiles$species)
+unique(epiFiles$genome)
+```
+Broadly, one can get an idea of the different files from this project
+by looking at the sourcetype:
+```{r}
+table(epiFiles$sourcetype)
+```
+To get a more descriptive idea of these different files one can use:
+```{r}
+sort(table(epiFiles$description), decreasing=TRUE)
+```
+
+The 'metadata' provided by the Roadmap Epigenomics Project is also
+available. Note that the information displayed about a hub with a
+single resource is quite different from the information displayed when
+the hub references more than one resource.
+```{r}
+metadata.tab <- query(ah , c("EpigenomeRoadMap", "Metadata"))
+metadata.tab
+```
+
+So far we have been exploring information about resources, without
+downloading the resource to a local cache and importing it into R.
+One can retrieve the resource using `[[` as indicated at the
+end of the show method
+
+```{r echo=FALSE, results='hide'}
+metadata.tab <- ah[["AH41830"]]
+```
+```{r}
+metadata.tab <- ah[["AH41830"]]
+```
+
+The metadata.tab file is returned as a _data.frame_. The first 6 rows
+of the first 5 columns are shown here:
+
+```{r}
+metadata.tab[1:6, 1:5]
+```
+
+One can keep constructing different queries using multiple arguments to 
+trim down these `r length(epiFiles)` to get the files one wants. 
+For example, to get the ChIP-Seq files for consolidated epigenomes, 
+one could use
+```{r}
+bpChipEpi <- query(ah , c("EpigenomeRoadMap", "broadPeak", "chip", "consolidated"))
+```
+To get all the bigWig signal files, one can query the hub using 
+```{r}
+allBigWigFiles <- query(ah, c("EpigenomeRoadMap", "BigWig"))
+```
+To access the 15 state chromatin segmentations, one can use
+```{r}
+seg <- query(ah, c("EpigenomeRoadMap", "segmentations"))
+```
+To get the files for a single sample and histone mark (here E126 and H3K4me2), one can use
+```{r}
+E126 <- query(ah , c("EpigenomeRoadMap", "E126", "H3K4ME2"))
+E126
+```
+Hub resources can also be selected using `$`, `subset()`, and
+`display()`; see the main
+[_AnnotationHub_ vignette](AnnotationHub.html) for additional detail.
+
+Hub resources are imported as the appropriate _Bioconductor_ object
+for use in further analysis.  For example, peak files are returned as
+_GRanges_ objects.
+
+```{r echo=FALSE, results='hide'}
+peaks <- E126[['AH29817']]
+```
+```{r}
+peaks <- E126[['AH29817']]
+seqinfo(peaks)
+```
+
+BigWig files are returned as _BigWigFile_ objects. A _BigWigFile_ is a
+reference to a file on disk; the data in the file can be read in using
+`rtracklayer::import()`, perhaps querying these large files for
+particular genomic regions of interest as described on the help page
+`?import.bw`.
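+
+As a hedged sketch of such a region-restricted import: the resource
+chosen (the first bigWig hit) and the genomic region are arbitrary
+illustrations, and retrieving the resource downloads the whole file
+to the local cache first.
+
+```{r, eval=FALSE}
+library(rtracklayer)
+bw <- allBigWigFiles[[1]]                    # a BigWigFile reference
+## arbitrary 1 Mb window; hg19 resources use UCSC-style names ("chr1")
+region <- GRanges("chr1", IRanges(start = 1, end = 1e6))
+signal <- import(bw, which = region)         # GRanges with a score column
+```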
+
+Each record inside `r Biocpkg("AnnotationHub")` is associated with a
+unique identifier. Most _GRanges_ objects returned by 
+`r Biocpkg("AnnotationHub")` contain the unique AnnotationHub identifier of
+the resource from which the _GRanges_ is derived.  This can come in
+handy when one has been working with a _GRanges_ object for a while
+and needs additional information about it (e.g., the name of the file
+in the cache, or the original sourceurl for the data underlying the
+resource).
+
+```{r}
+metadata(peaks)
+ah[metadata(peaks)$AnnotationHubName]$sourceurl
+```
+
+## Ensembl GTF and FASTA files for TxDb gene models and sequence queries
+
+_Bioconductor_ represents gene models using 'transcript'
+databases. These are available via packages such as
+`r Biocannopkg("TxDb.Hsapiens.UCSC.hg38.knownGene")`
+or can be constructed using functions such as
+`r Biocpkg("GenomicFeatures")`::`makeTxDbFromBiomart()`.
+
+_AnnotationHub_ provides an easy way to work with gene models
+published by Ensembl. Let's see what Ensembl's Release-80 has in terms
+of data for pufferfish, _Takifugu rubripes_.
+
+```{r takifugu-gene-models}
+query(ah, c("Takifugu", "release-80"))
+```
+
+We see that there is a GTF file describing gene models, as well as
+various DNA sequences. Let's retrieve the GTF and top-level DNA
+sequence files. The GTF file is imported as a _GRanges_ instance, the
+DNA sequence as a compressed, indexed Fasta file.
+
+
+```{r takifugu-data}
+gtf <- ah[["AH47101"]]
+dna <- ah[["AH47477"]]
+
+head(gtf, 3)
+dna
+head(seqlevels(dna))
+```
+
+Let's identify the 25 longest DNA sequences, and keep just the
+annotations on these scaffolds.
+
+```{r takifugu-seqlengths}
+keep <- names(tail(sort(seqlengths(dna)), 25))
+gtf_subset <- gtf[seqnames(gtf) %in% keep]
+```
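+
+To see why this idiom selects the longest sequences, here is a toy
+version on a plain named vector (the scaffold names and lengths are
+made up for illustration): `sort()` orders the lengths increasingly,
+so `tail()` keeps the largest values.
+
+```{r}
+toy_lengths <- c(scaffold_1 = 500L, scaffold_2 = 1200L,
+                 scaffold_3 = 80L, scaffold_4 = 950L)
+toy_keep <- names(tail(sort(toy_lengths), 2))
+toy_keep    # "scaffold_4" "scaffold_2"
+```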
+
+It is trivial to make a TxDb instance of this subset (or of the entire
+gtf)
+
+```{r takifugu-txdb}
+library(GenomicFeatures)         # for makeTxDbFromGRanges
+txdb <- makeTxDbFromGRanges(gtf_subset)
+```
+
+and to use that in conjunction with the DNA sequences, e.g., to find
+exon sequences of all annotated genes.
+
+```{r takifugu-exons}
+library(Rsamtools)               # for getSeq,FaFile-method
+exons <- exons(txdb)
+length(exons)
+getSeq(dna, exons)
+```
+
+There is a one-to-one mapping between the genomic ranges contained in
+`exons` and the DNA sequences returned by `getSeq()`.
+
+Some difficulties arise when working with this partly assembled genome
+that require more advanced GenomicRanges skills, see the
+`r Biocpkg("GenomicRanges")` vignettes, especially "_GenomicRanges_
+HOWTOs" and "An Introduction to _GenomicRanges_".
+
+## liftOver to map between genome builds
+
+Suppose we wanted to lift features from one genome build to another,
+e.g., because annotations were generated for hg19 but our experimental
+analysis used hg18.  We know that UCSC provides 'liftover' files for
+mapping between genome builds.
+
+In this example, we will take our broadPeak _GRanges_ from E126, which
+comes from the 'hg19' genome, and lift over these features to their
+'hg38' coordinates.
+
+```{r}
+chainfiles <- query(ah , c("hg38", "hg19", "chainfile"))
+chainfiles
+```
+
+We are interested in the file that lifts over features from hg19 to
+hg38, so let's download that using
+
+```{r echo=FALSE, results='hide'}
+chain <- chainfiles[['AH14150']]
+```
+```{r}
+chain <- chainfiles[['AH14150']]
+chain
+```
+Perform the liftOver operation using `rtracklayer::liftOver()`:
+
+```{r}
+library(rtracklayer)
+gr38 <- liftOver(peaks, chain)
+```
+This returns a _GRangesList_; update the genome of the result to get
+the final result
+
+```{r}
+genome(gr38) <- "hg38"
+gr38
+``` 
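+
+Because each input range may map to zero, one, or several ranges in
+the new build, a quick check of the lifted result can help. The
+following is a sketch under that assumption, not part of the original
+workflow:
+
+```{r, eval=FALSE}
+## number of hg38 ranges each hg19 peak was mapped to
+table(elementNROWS(gr38))
+## flatten to a single GRanges; peaks that did not map contribute nothing
+gr38_flat <- unlist(gr38)
+```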
+
+## Working with dbSNP Variants
+
+One may also be interested in working with common germline variants with 
+evidence of medical interest. This information is available at 
+[NCBI](https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/).
+
+Query the dbSNP files in the hub:
+
+```{r echo=FALSE, results='hide', message=FALSE}
+query(ah, c("GRCh37", "dbSNP", "VCF" ))
+vcf <- ah[['AH50424']]
+```
+This returns a _VcfFile_ which can be read in using `r
+Biocpkg("VariantAnnotation")`; because VCF files can be large, `readVcf()`
+supports several strategies for importing only relevant parts of the file
+(e.g., particular genomic locations, particular features of the variants), see
+`?readVcf` for additional information.
+
+```{r message=FALSE}
+variants <- readVcf(vcf, genome="hg19")
+variants
+```
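+
+As a sketch of the selective import mentioned above, one might
+restrict the read to a genomic region. The region is an arbitrary
+illustration, and its sequence name is assumed to follow the NCBI
+style ("1" rather than "chr1") used by this VCF file.
+
+```{r, eval=FALSE}
+library(VariantAnnotation)
+region <- GRanges("1", IRanges(start = 1, end = 1e6))
+param <- ScanVcfParam(which = region)   # read only records in `region`
+variants_region <- readVcf(vcf, genome = "hg19", param = param)
+```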
+
+`rowRanges()` returns information from the CHROM, POS and ID fields of the VCF 
+file, represented as a _GRanges_ instance
+
+```{r}
+rowRanges(variants)
+```
+
+Note that the broadPeak files follow the UCSC chromosome naming convention,
+while the VCF data follow the NCBI chromosome naming convention.
+To bring these ranges into the same chromosome
+naming convention (i.e., UCSC), we would use
+
+```{r}
+seqlevelsStyle(variants) <-seqlevelsStyle(peaks)
+```
+
+And then finally to find which variants overlap these broadPeaks we would use:
+
+```{r}
+overlap <- findOverlaps(variants, peaks)
+overlap
+```
+
+Some insight into how these results can be interpreted comes from
+looking at a particular peak, e.g., the 3852nd peak
+
+```{r}
+idx <- subjectHits(overlap) == 3852
+overlap[idx]
+```
+
+There are three variants overlapping this peak; the coordinates of the
+peak and the overlapping variants are
+
+```{r}
+peaks[3852]
+rowRanges(variants)[queryHits(overlap[idx])]
+```
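+
+The same bookkeeping can be summarized per peak with
+`countOverlaps()`. A toy sketch with made-up coordinates (the
+`toy_*` objects are illustrations, not hub resources):
+
+```{r}
+library(GenomicRanges)
+toy_peaks <- GRanges("chr1", IRanges(start = c(100, 500), end = c(200, 600)))
+toy_variants <- GRanges("chr1", IRanges(start = c(150, 180, 550, 900), width = 1))
+## variants per peak: 2 in the first peak, 1 in the second
+toy_counts <- countOverlaps(toy_peaks, toy_variants)
+toy_counts
+```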
+
+# sessionInfo
+
+```{r}
+sessionInfo()
+```
diff --git a/inst/doc/AnnotationHub-HOWTO.html b/inst/doc/AnnotationHub-HOWTO.html
new file mode 100644
index 0000000..8756310
--- /dev/null
+++ b/inst/doc/AnnotationHub-HOWTO.html
@@ -0,0 +1,724 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+
+<title>AnnotationHub How-To’s</title>
+
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+  pre:not([class]) {
+    background-color: white;
+  }
+</style>
+<script type="text/javascript">
+if (window.hljs && document.readyState && document.readyState === "complete") {
+   window.setTimeout(function() {
+      hljs.initHighlighting();
+   }, 0);
+}
+</script>
+
+
+<link href="data:text/css;charset=utf-8,body%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Abackground%2Dcolor%3A%20white%3B%0Afont%2Dsize%3A%2013px%3B%0A%7D%0Abody%20%7B%0Amax%2Dwidth%3A%20800px%3B%0Amargin%3A%200%20auto%3B%0Apadding%3A%201em%201em%202em%3B%0Aline%2Dheight%3A%2020px%3B%0A%7D%0A%0Adiv%23TOC%20li%20%7B%0Alist%2Dstyle%3Anone%3B%0Abackground%2Dimage%3Anone%3B%0Abackground%2Drepeat%3Anone%3B%0Abackground%2Dposition%3A0%3B%0A%7D%0A%0Ap%2C%20pre%20%7B%20margin%3A%200em%2 [...]
+
+<script type="text/javascript">
+document.addEventListener("DOMContentLoaded", function() {
+  var links = document.links;  
+  for (var i = 0, linksLength = links.length; i < linksLength; i++)
+    if(links[i].hostname != window.location.hostname)
+      links[i].target = '_blank';
+});
+</script>
+
+</head>
+
+<body>
+
+
+<div id="header">
+<h1 class="title">AnnotationHub How-To’s</h1>
+</div>
+
+<h1>Contents</h1>
+<div id="TOC">
+<ul>
+<li><a href="#accessing-genome-scale-data"><span class="toc-section-number">1</span> Accessing Genome-Scale Data</a><ul>
+<li><a href="#non-model-organism-gene-annotations"><span class="toc-section-number">1.1</span> Non-model organism gene annotations</a></li>
+<li><a href="#roadmap-epigenomics-project"><span class="toc-section-number">1.2</span> Roadmap Epigenomics Project</a></li>
+<li><a href="#ensembl-gtf-and-fasta-files-for-txdb-gene-models-and-sequence-queries"><span class="toc-section-number">1.3</span> Ensembl GTF and FASTA files for TxDb gene models and sequence queries</a></li>
+<li><a href="#liftover-to-map-between-genome-builds"><span class="toc-section-number">1.4</span> liftOver to map between genome builds</a></li>
+<li><a href="#working-with-dbsnp-variants"><span class="toc-section-number">1.5</span> Working with dbSNP Variants</a></li>
+</ul></li>
+<li><a href="#sessioninfo"><span class="toc-section-number">2</span> sessionInfo</a></li>
+</ul>
+</div>
+
+<script type="text/javascript">
+document.addEventListener("DOMContentLoaded", function() {
+  document.querySelector("h1").className = "title";
+});
+</script>
+<script type="text/javascript">
+document.addEventListener("DOMContentLoaded", function() {
+  var links = document.links;  
+  for (var i = 0, linksLength = links.length; i < linksLength; i++)
+    if (links[i].hostname != window.location.hostname)
+      links[i].target = '_blank';
+});
+</script>
+<p><strong>Package</strong>: <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em><br /> <strong>Authors</strong>: Martin Morgan [cre], Marc Carlson [ctb], Dan Tenenbaum [ctb], Sonali Arora [ctb]<br /> <strong>Modified</strong>: Sun Jun 28 10:41:23 2015<br /> <strong>Compiled</strong>: Wed May 3 19:25:52 2017</p>
+<div id="accessing-genome-scale-data" class="section level1">
+<h1><span class="header-section-number">1</span> Accessing Genome-Scale Data</h1>
+<div id="non-model-organism-gene-annotations" class="section level2">
+<h2><span class="header-section-number">1.1</span> Non-model organism gene annotations</h2>
+<p><em>Bioconductor</em> offers pre-built <code>org.*</code> annotation packages for model organisms, with their use described in the <a href="http://bioconductor.org/help/workflows/annotation/Annotation_Resources/#OrgDb">OrgDb</a> section of the Annotation workflow. Here we discover available <code>OrgDb</code> objects for less-model organisms.</p>
+<pre class="r"><code>library(AnnotationHub)
+ah <- AnnotationHub()</code></pre>
+<pre><code>## snapshotDate(): 2017-04-25</code></pre>
+<pre class="r"><code>query(ah, "OrgDb")</code></pre>
+<pre><code>## AnnotationHub with 19 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/
+## # $species: Escherichia coli, Anopheles gambiae, Arabidopsis thaliana, Bos taurus, Caenorhabditi...
+## # $rdataclass: OrgDb
+## # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
+## #   rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH53757"]]' 
+## 
+##             title                  
+##   AH53757 | org.Ag.eg.db.sqlite    
+##   AH53758 | org.At.tair.db.sqlite  
+##   AH53759 | org.Bt.eg.db.sqlite    
+##   AH53760 | org.Cf.eg.db.sqlite    
+##   AH53761 | org.Gg.eg.db.sqlite    
+##   ...       ...                    
+##   AH53771 | org.Ce.eg.db.sqlite    
+##   AH53772 | org.Xl.eg.db.sqlite    
+##   AH53773 | org.Sc.sgd.db.sqlite   
+##   AH53774 | org.Dr.eg.db.sqlite    
+##   AH53775 | org.Pf.plasmo.db.sqlite</code></pre>
+<pre class="r"><code>orgdb <- query(ah, "OrgDb")[[1]] </code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/60495'</code></pre>
+<p>The object returned by AnnotationHub is directly usable with the <code>select()</code> interface, e.g., to discover the available keytypes for querying the object and the columns that these keytypes can map to, and finally to select the SYMBOL and GENENAME corresponding to the first 6 ENTREZIDs.</p>
+<pre class="r"><code>keytypes(orgdb)</code></pre>
+<pre><code>##  [1] "ACCNUM"       "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"     "ENZYME"      
+##  [7] "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GO"           "GOALL"        "ONTOLOGY"    
+## [13] "ONTOLOGYALL"  "PATH"         "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
+## [19] "UNIPROT"</code></pre>
+<pre class="r"><code>columns(orgdb)</code></pre>
+<pre><code>##  [1] "ACCNUM"       "ENSEMBL"      "ENSEMBLPROT"  "ENSEMBLTRANS" "ENTREZID"     "ENZYME"      
+##  [7] "EVIDENCE"     "EVIDENCEALL"  "GENENAME"     "GO"           "GOALL"        "ONTOLOGY"    
+## [13] "ONTOLOGYALL"  "PATH"         "PMID"         "REFSEQ"       "SYMBOL"       "UNIGENE"     
+## [19] "UNIPROT"</code></pre>
+<pre class="r"><code>egid <- head(keys(orgdb, "ENTREZID"))
+select(orgdb, egid, c("SYMBOL", "GENENAME"), "ENTREZID")</code></pre>
+<pre><code>## 'select()' returned 1:1 mapping between keys and columns</code></pre>
+<pre><code>##   ENTREZID          SYMBOL      GENENAME
+## 1  1267437 AgaP_AGAP012606 AGAP012606-PA
+## 2  1267439 AgaP_AGAP012559 AGAP012559-PA
+## 3  1267440 AgaP_AGAP012558 AGAP012558-PA
+## 4  1267447 AgaP_AGAP012586 AGAP012586-PA
+## 5  1267450 AgaP_AGAP012834 AGAP012834-PA
+## 6  1267459 AgaP_AGAP012589 AGAP012589-PA</code></pre>
+</div>
+<div id="roadmap-epigenomics-project" class="section level2">
+<h2><span class="header-section-number">1.2</span> Roadmap Epigenomics Project</h2>
+<p>All Roadmap Epigenomics files are hosted <a href="http://egg2.wustl.edu/roadmap/data/byFileType/">here</a>. If one had to download these files on their own, one would navigate through the web interface to find useful files, then use something like the following <em>R</em> code.</p>
+<pre class="r"><code>url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E001-H3K4me1.broadPeak.gz"
+filename <-  basename(url)
+download.file(url, destfile=filename)
+if (file.exists(filename))
+   data <- import(filename, format="bed")</code></pre>
+<p>This would have to be repeated for all files, and the onus would lie on the user to identify, download, import, and manage the local disk location of these files.</p>
+<p><em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em> reduces this task to just a few lines of <em>R</em> code</p>
+<pre class="r"><code>library(AnnotationHub)
+ah = AnnotationHub()</code></pre>
+<pre><code>## snapshotDate(): 2017-04-25</code></pre>
+<pre class="r"><code>epiFiles <- query(ah, "EpigenomeRoadMap")</code></pre>
+<p>A look at the value returned by <code>epiFiles</code> shows us that 18248 roadmap resources are available via <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em>. Additional information about the files is also available, e.g., where the files came from (dataprovider), genome, species, sourceurl, sourcetypes.</p>
+<pre class="r"><code>epiFiles</code></pre>
+<pre><code>## AnnotationHub with 18248 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: BroadInstitute
+## # $species: Homo sapiens
+## # $rdataclass: BigWigFile, GRanges, data.frame
+## # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
+## #   rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH28856"]]' 
+## 
+##             title                                 
+##   AH28856 | E001-H3K4me1.broadPeak.gz             
+##   AH28857 | E001-H3K4me3.broadPeak.gz             
+##   AH28858 | E001-H3K9ac.broadPeak.gz              
+##   AH28859 | E001-H3K9me3.broadPeak.gz             
+##   AH28860 | E001-H3K27me3.broadPeak.gz            
+##   ...       ...                                   
+##   AH49540 | E058_mCRF_FractionalMethylation.bigwig
+##   AH49541 | E059_mCRF_FractionalMethylation.bigwig
+##   AH49542 | E061_mCRF_FractionalMethylation.bigwig
+##   AH49543 | E081_mCRF_FractionalMethylation.bigwig
+##   AH49544 | E082_mCRF_FractionalMethylation.bigwig</code></pre>
+<p>A good sanity check to ensure that we have files only from the Roadmap Epigenomics project is to check that all the files in the returned smaller hub object come from <em>Homo sapiens</em> and the hg19 genome</p>
+<pre class="r"><code>unique(epiFiles$species)</code></pre>
+<pre><code>## [1] "Homo sapiens"</code></pre>
+<pre class="r"><code>unique(epiFiles$genome)</code></pre>
+<pre><code>## [1] "hg19"</code></pre>
+<p>Broadly, one can get an idea of the different files from this project by looking at the sourcetype:</p>
+<pre class="r"><code>table(epiFiles$sourcetype)</code></pre>
+<pre><code>## 
+##    BED BigWig    GTF    Zip    tab 
+##   8298   9932      3     14      1</code></pre>
+<p>To get a more descriptive idea of these different files one can use:</p>
+<pre class="r"><code>sort(table(epiFiles$description), decreasing=TRUE)</code></pre>
+<pre><code>## 
+##                       Bigwig File containing -log10(p-value) signal tracks from EpigenomeRoadMap Project 
+##                                                                                                     6881 
+##                       Bigwig File containing fold enrichment signal tracks from EpigenomeRoadMap Project 
+##                                                                                                     2947 
+##                          Narrow ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project 
+##                                                                                                     2894 
+##                           Broad ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project 
+##                                                                                                     2534 
+##                          Gapped ChIP-seq peaks for consolidated epigenomes from EpigenomeRoadMap Project 
+##                                                                                                     2534 
+##                              Narrow DNasePeaks for consolidated epigenomes from EpigenomeRoadMap Project 
+##                                                                                                      131 
+##                                           15 state chromatin segmentations from EpigenomeRoadMap Project 
+##                                                                                                      127 
+##      Broad domains on enrichment for DNase-seq for consolidated epigenomes from EpigenomeRoadMap Project 
+##                                                                                                       78 
+##                                         RRBS fractional methylation calls from EpigenomeRoadMap Project  
+##                                                                                                       51 
+##                       Whole genome bisulphite fractional methylation calls from EpigenomeRoadMap Project 
+##                                                                                                       37 
+##                               MeDIP/MRE(mCRF) fractional methylation calls from EpigenomeRoadMap Project 
+##                                                                                                       16 
+## GencodeV10 gene/transcript coordinates and annotations corresponding to hg19 version of the human genome 
+##                                                                                                        3 
+##                                       RNA-seq read count matrix for intronic protein-coding RNA elements 
+##                                                                                                        2 
+##                                                      RNA-seq read counts matrix for ribosomal gene exons 
+##                                                                                                        2 
+##                                                          RPKM expression matrix for ribosomal gene exons 
+##                                                                                                        2 
+##                                                                    Metadata for EpigenomeRoadMap Project 
+##                                                                                                        1 
+##                                                           RNA-seq read counts matrix for non-coding RNAs 
+##                                                                                                        1 
+##                                                      RNA-seq read counts matrix for protein coding exons 
+##                                                                                                        1 
+##                                                      RNA-seq read counts matrix for protein coding genes 
+##                                                                                                        1 
+##                                                           RNA-seq read counts matrix for ribosomal genes 
+##                                                                                                        1 
+##                                                               RPKM expression matrix for non-coding RNAs 
+##                                                                                                        1 
+##                                                          RPKM expression matrix for protein coding exons 
+##                                                                                                        1 
+##                                                          RPKM expression matrix for protein coding genes 
+##                                                                                                        1 
+##                                                                RPKM expression matrix for ribosomal RNAs 
+##                                                                                                        1</code></pre>
+<p>The ‘metadata’ provided by the Roadmap Epigenomics Project is also available. Note that the information displayed about a hub with a single resource is quite different from the information displayed when the hub references more than one resource.</p>
+<pre class="r"><code>metadata.tab <- query(ah , c("EpigenomeRoadMap", "Metadata"))
+metadata.tab</code></pre>
+<pre><code>## AnnotationHub with 1 record
+## # snapshotDate(): 2017-04-25 
+## # names(): AH41830
+## # $dataprovider: BroadInstitute
+## # $species: Homo sapiens
+## # $rdataclass: data.frame
+## # $rdatadateadded: 2015-05-11
+## # $title: EID_metadata.tab
+## # $description: Metadata for EpigenomeRoadMap Project
+## # $taxonomyid: 9606
+## # $genome: hg19
+## # $sourcetype: tab
+## # $sourceurl: http://egg2.wustl.edu/roadmap/data/byFileType/metadata/EID_metadata.tab
+## # $sourcesize: 18035
+## # $tags: c("EpigenomeRoadMap", "Metadata") 
+## # retrieve record with 'object[["AH41830"]]'</code></pre>
+<p>So far we have been exploring information about resources, without downloading the resource to a local cache and importing it into R. One can retrieve the resource using <code>[[</code>, as indicated on the last line of the output above:</p>
+<pre class="r"><code>metadata.tab <- ah[["AH41830"]]</code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/47270'</code></pre>
+<p>The metadata.tab file is returned as a <em>data.frame</em>. The first 6 rows of the first 5 columns are shown here:</p>
+<pre class="r"><code>metadata.tab[1:6, 1:5]</code></pre>
+<pre><code>##    EID    GROUP   COLOR          MNEMONIC                                   STD_NAME
+## 1 E001      ESC #924965            ESC.I3                                ES-I3 Cells
+## 2 E002      ESC #924965           ESC.WA7                               ES-WA7 Cells
+## 3 E003      ESC #924965            ESC.H1                                   H1 Cells
+## 4 E004 ES-deriv #4178AE ESDR.H1.BMP4.MESO H1 BMP4 Derived Mesendoderm Cultured Cells
+## 5 E005 ES-deriv #4178AE ESDR.H1.BMP4.TROP H1 BMP4 Derived Trophoblast Cultured Cells
+## 6 E006 ES-deriv #4178AE       ESDR.H1.MSC          H1 Derived Mesenchymal Stem Cells</code></pre>
+<p>One can keep constructing different queries using multiple arguments to trim down these 18248 records to the files one wants. For example, to get the ChIP-Seq files for consolidated epigenomes, one could use</p>
+<pre class="r"><code>bpChipEpi <- query(ah , c("EpigenomeRoadMap", "broadPeak", "chip", "consolidated"))</code></pre>
+<p>To get all the bigWig signal files, one can query the hub using</p>
+<pre class="r"><code>allBigWigFiles <- query(ah, c("EpigenomeRoadMap", "BigWig"))</code></pre>
+<p>To access the 15 state chromatin segmentations, one can use</p>
+<pre class="r"><code>seg <- query(ah, c("EpigenomeRoadMap", "segmentations"))</code></pre>
+<p>To get all the files for one histone mark in one sample (here H3K4me2 in sample E126), one can use</p>
+<pre class="r"><code>E126 <- query(ah , c("EpigenomeRoadMap", "E126", "H3K4ME2"))
+E126</code></pre>
+<pre><code>## AnnotationHub with 6 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: BroadInstitute
+## # $species: Homo sapiens
+## # $rdataclass: BigWigFile, GRanges
+## # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
+## #   rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH29817"]]' 
+## 
+##             title                                  
+##   AH29817 | E126-H3K4me2.broadPeak.gz              
+##   AH30868 | E126-H3K4me2.narrowPeak.gz             
+##   AH31801 | E126-H3K4me2.gappedPeak.gz             
+##   AH32990 | E126-H3K4me2.fc.signal.bigwig          
+##   AH34022 | E126-H3K4me2.pval.signal.bigwig        
+##   AH40177 | E126-H3K4me2.imputed.pval.signal.bigwig</code></pre>
+<p>Hub resources can also be selected using <code>$</code>, <code>subset()</code>, and <code>display()</code>; see the main <a href="AnnotationHub.html"><em>AnnotationHub</em> vignette</a> for additional detail.</p>
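+<p>As a brief sketch of <code>subset()</code> (assuming the <code>E126</code> hub object created above; the name <code>peakFiles</code> is only illustrative), one could keep just the records that are imported as <em>GRanges</em>, i.e., the peak files:</p>
+<pre class="r"><code># keep only records whose rdataclass metadata column is "GRanges"
+peakFiles <- subset(E126, rdataclass == "GRanges")
+# titles of the remaining records
+peakFiles$title</code></pre>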
+<p>Hub resources are imported as the appropriate <em>Bioconductor</em> object for use in further analysis. For example, peak files are returned as <em>GRanges</em> objects.</p>
+<pre><code>## require("rtracklayer")</code></pre>
+<pre class="r"><code>peaks <- E126[['AH29817']]</code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/35257'</code></pre>
+<pre class="r"><code>seqinfo(peaks)</code></pre>
+<pre><code>## Seqinfo object with 93 sequences (1 circular) from hg19 genome:
+##   seqnames       seqlengths isCircular genome
+##   chr1            249250621      FALSE   hg19
+##   chr2            243199373      FALSE   hg19
+##   chr3            198022430      FALSE   hg19
+##   chr4            191154276      FALSE   hg19
+##   chr5            180915260      FALSE   hg19
+##   ...                   ...        ...    ...
+##   chrUn_gl000245      36651      FALSE   hg19
+##   chrUn_gl000246      38154      FALSE   hg19
+##   chrUn_gl000247      36422      FALSE   hg19
+##   chrUn_gl000248      39786      FALSE   hg19
+##   chrUn_gl000249      38502      FALSE   hg19</code></pre>
+<p>BigWig files are returned as <em>BigWigFile</em> objects. A <em>BigWigFile</em> is a reference to a file on disk; the data in the file can be read in using <code>rtracklayer::import()</code>, perhaps querying these large files for particular genomic regions of interest as described on the help page <code>?import.bw</code>.</p>
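+<p>A minimal sketch of such a region-restricted import follows. It assumes the fold-enrichment record AH32990 listed above, and the region <code>roi</code> is an arbitrary illustrative choice rather than part of the original analysis. The <code>which</code> argument of <code>import()</code> limits reading to the given ranges:</p>
+<pre class="r"><code>library(GenomicRanges)
+library(rtracklayer)
+bw <- ah[["AH32990"]]                                 # BigWigFile; downloaded to the cache on first use
+roi <- GRanges("chr22", IRanges(50622494, 50626143))  # illustrative region of interest
+signal <- import(bw, which = roi)                     # only data overlapping roi is read
+signal</code></pre>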
+<p>Each record inside <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em> is associated with a unique identifier. Most <em>GRanges</em> objects returned by <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em> contain the unique AnnotationHub identifier of the resource from which the <em>GRanges</em> is derived. This can come in handy when working with the <em>GRanges</em> object for a while, and additional information about the  [...]
+<pre class="r"><code>metadata(peaks)</code></pre>
+<pre><code>## $AnnotationHubName
+## [1] "AH29817"
+## 
+## $`File Name`
+## [1] "E126-H3K4me2.broadPeak.gz"
+## 
+## $`Data Source`
+## [1] "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E126-H3K4me2.broadPeak.gz"
+## 
+## $Provider
+## [1] "BroadInstitute"
+## 
+## $Organism
+## [1] "Homo sapiens"
+## 
+## $`Taxonomy ID`
+## [1] 9606</code></pre>
+<pre class="r"><code>ah[metadata(peaks)$AnnotationHubName]$sourceurl</code></pre>
+<pre><code>## [1] "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E126-H3K4me2.broadPeak.gz"</code></pre>
+</div>
+<div id="ensembl-gtf-and-fasta-files-for-txdb-gene-models-and-sequence-queries" class="section level2">
+<h2><span class="header-section-number">1.3</span> Ensembl GTF and FASTA files for TxDb gene models and sequence queries</h2>
+<p><em>Bioconductor</em> represents gene models using ‘transcript’ databases. These are available via packages such as <em><a href="http://bioconductor.org/packages/TxDb.Hsapiens.UCSC.hg38.knownGene">TxDb.Hsapiens.UCSC.hg38.knownGene</a></em> or can be constructed using functions such as <em><a href="http://bioconductor.org/packages/GenomicFeatures">GenomicFeatures</a></em>::<code>makeTxDbFromBiomart()</code>.</p>
+<p><em>AnnotationHub</em> provides an easy way to work with gene models published by Ensembl. Let’s see what Ensembl’s Release-80 has in terms of data for pufferfish, <em>Takifugu rubripes</em>.</p>
+<pre class="r"><code>query(ah, c("Takifugu", "release-80"))</code></pre>
+<pre><code>## AnnotationHub with 7 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: Ensembl
+## # $species: Takifugu rubripes
+## # $rdataclass: FaFile, GRanges
+## # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
+## #   rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH47101"]]' 
+## 
+##             title                                     
+##   AH47101 | Takifugu_rubripes.FUGU4.80.gtf            
+##   AH47475 | Takifugu_rubripes.FUGU4.cdna.all.fa       
+##   AH47476 | Takifugu_rubripes.FUGU4.dna_rm.toplevel.fa
+##   AH47477 | Takifugu_rubripes.FUGU4.dna_sm.toplevel.fa
+##   AH47478 | Takifugu_rubripes.FUGU4.dna.toplevel.fa   
+##   AH47479 | Takifugu_rubripes.FUGU4.ncrna.fa          
+##   AH47480 | Takifugu_rubripes.FUGU4.pep.all.fa</code></pre>
+<p>We see that there is a GTF file describing gene models, as well as various DNA sequences. Let’s retrieve the GTF and top-level DNA sequence files. The GTF file is imported as a <em>GRanges</em> instance, the DNA sequence as a compressed, indexed Fasta file.</p>
+<pre class="r"><code>gtf <- ah[["AH47101"]]</code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/52579'</code></pre>
+<pre><code>## using guess work to populate seqinfo</code></pre>
+<pre class="r"><code>dna <- ah[["AH47477"]]</code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/53323'
+##     '/home/biocbuild//.AnnotationHub/53324'</code></pre>
+<pre class="r"><code>head(gtf, 3)</code></pre>
+<pre><code>## GRanges object with 3 ranges and 19 metadata columns:
+##         seqnames         ranges strand |   source       type     score     phase            gene_id
+##            <Rle>      <IRanges>  <Rle> | <factor>   <factor> <numeric> <integer>        <character>
+##   [1] scaffold_1 [10422, 11354]      - |  ensembl       gene      <NA>      <NA> ENSTRUG00000003702
+##   [2] scaffold_1 [10422, 11354]      - |  ensembl transcript      <NA>      <NA> ENSTRUG00000003702
+##   [3] scaffold_1 [10422, 11354]      - |  ensembl       exon      <NA>      <NA> ENSTRUG00000003702
+##       gene_version gene_source   gene_biotype      transcript_id transcript_version
+##          <numeric> <character>    <character>        <character>          <numeric>
+##   [1]            1     ensembl protein_coding               <NA>               <NA>
+##   [2]            1     ensembl protein_coding ENSTRUT00000008740                  1
+##   [3]            1     ensembl protein_coding ENSTRUT00000008740                  1
+##       transcript_source transcript_biotype exon_number            exon_id exon_version  protein_id
+##             <character>        <character>   <numeric>        <character>    <numeric> <character>
+##   [1]              <NA>               <NA>        <NA>               <NA>         <NA>        <NA>
+##   [2]           ensembl     protein_coding        <NA>               <NA>         <NA>        <NA>
+##   [3]           ensembl     protein_coding           1 ENSTRUE00000055472            1        <NA>
+##       protein_version   gene_name transcript_name
+##             <numeric> <character>     <character>
+##   [1]            <NA>        <NA>            <NA>
+##   [2]            <NA>        <NA>            <NA>
+##   [3]            <NA>        <NA>            <NA>
+##   -------
+##   seqinfo: 2056 sequences (1 circular) from FUGU4 genome; no seqlengths</code></pre>
+<pre class="r"><code>dna</code></pre>
+<pre><code>## class: FaFile 
+## path: /home/biocbuild//.AnnotationHub/53323
+## index: /home/biocbuild//.AnnotationHub/53324
+## isOpen: FALSE 
+## yieldSize: NA</code></pre>
+<pre class="r"><code>head(seqlevels(dna))</code></pre>
+<pre><code>## [1] "scaffold_1" "scaffold_2" "scaffold_3" "scaffold_4" "scaffold_5" "scaffold_6"</code></pre>
+<p>Let’s identify the 25 longest DNA sequences, and keep just the annotations on these scaffolds.</p>
+<pre class="r"><code>keep <- names(tail(sort(seqlengths(dna)), 25))
+gtf_subset <- gtf[seqnames(gtf) %in% keep]</code></pre>
+<p>It is trivial to make a TxDb instance of this subset (or of the entire gtf)</p>
+<pre class="r"><code>library(GenomicFeatures)         # for makeTxDbFromGRanges
+txdb <- makeTxDbFromGRanges(gtf_subset)</code></pre>
+<p>and to use that in conjunction with the DNA sequences, e.g., to find exon sequences of all annotated genes.</p>
+<pre class="r"><code>library(Rsamtools)               # for getSeq,FaFile-method
+exons <- exons(txdb)
+length(exons)</code></pre>
+<pre><code>## [1] 66219</code></pre>
+<pre class="r"><code>getSeq(dna, exons)</code></pre>
+<pre><code>##   A DNAStringSet instance of length 66219
+##         width seq                                                               names               
+##     [1]    72 ATGGCCTATCAGTTGTACAGGAATACCACTC...CCTGCAGGAGAGTCTGGACGAGCTTATCCAG scaffold_1
+##     [2]   105 ACTCAGCAGATCACCCCTCAGCTGGCTCTCC...TAATCGTGTCCGCAACCGTGTGAACTTCAGG scaffold_1
+##     [3]   156 GGTTCTCTCAACACCTACCGGTTCTGTGACA...GAGCGAATCCATGCAAAACAAACTGGATAAA scaffold_1
+##     [4]    88 CAAACCAATCTCCTCGCTGTCTCTTCTCGTT...CATCAGCCAGAGGGACGGATCATCTCAGGTT scaffold_1
+##     [5]   271 AGACGAGATGAGTGAGGACGCATTCAACGCC...AACACAGTGTGGAGACTTCAGAGGACGCCAC scaffold_1
+##     ...   ... ...
+## [66215]    67 ACGACTGGATGACAACATCAGGACCGTGGTA...TCAGACCAATGTGGGTCAGGATGGCAGACAG scaffold_9
+## [66216]    50 TCTTTGGCTAATATTGACGATGTGGTAAACAAGATTCGTCTGAAGATTCG                scaffold_9
+## [66217]    81 GTATTTCCCAGCCAAGACCCGCTGGACAGGG...ATACATCAACACACTGTTTCCCACCGAGCAG scaffold_9
+## [66218]    87 ATGATGGAGGATGAAGAATTTGAATTTGCGG...GACCCCAGAGGTGCAGCTAGCAATTGAACAG scaffold_9
+## [66219]   213 GACGACATCCTCGTGTGGGGCCGCTCTAGGG...GACAGCTGCTGTTCGCCTGTTTCCCCCCCCC scaffold_9</code></pre>
+<p>There is a one-to-one mapping between the genomic ranges contained in <code>exons</code> and the DNA sequences returned by <code>getSeq()</code>.</p>
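+<p>This correspondence can be checked directly. The following is a small sketch, assuming the <code>dna</code> and <code>exons</code> objects created above (the name <code>exon_seqs</code> is illustrative): the result has one sequence per range, and each sequence is as wide as its range.</p>
+<pre class="r"><code>exon_seqs <- getSeq(dna, exons)
+# one sequence per exon, and matching widths
+stopifnot(
+    length(exon_seqs) == length(exons),
+    all(width(exon_seqs) == width(exons))
+)</code></pre>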
+<p>Some difficulties arise when working with this partly assembled genome that require more advanced GenomicRanges skills; see the <em><a href="http://bioconductor.org/packages/GenomicRanges">GenomicRanges</a></em> vignettes, especially “<em>GenomicRanges</em> HOWTOs” and “An Introduction to <em>GenomicRanges</em>”.</p>
+</div>
+<div id="liftover-to-map-between-genome-builds" class="section level2">
+<h2><span class="header-section-number">1.4</span> liftOver to map between genome builds</h2>
+<p>Suppose we wanted to lift features from one genome build to another, e.g., because annotations were generated for hg19 but our experimental analysis used hg18. We know that UCSC provides ‘liftover’ files for mapping between genome builds.</p>
+<p>In this example, we will take our broadPeak <em>GRanges</em> from E126, which comes from the ‘hg19’ genome, and lift over these features to their ‘hg38’ coordinates.</p>
+<pre class="r"><code>chainfiles <- query(ah , c("hg38", "hg19", "chainfile"))
+chainfiles</code></pre>
+<pre><code>## AnnotationHub with 2 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: UCSC
+## # $species: Homo sapiens
+## # $rdataclass: ChainFile
+## # additional mcols(): taxonomyid, genome, description, coordinate_1_based, maintainer,
+## #   rdatadateadded, preparerclass, tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH14108"]]' 
+## 
+##             title                   
+##   AH14108 | hg38ToHg19.over.chain.gz
+##   AH14150 | hg19ToHg38.over.chain.gz</code></pre>
+<p>We are interested in the file that lifts over features from hg19 to hg38, so let’s download that using</p>
+<pre class="r"><code>chain <- chainfiles[['AH14150']]</code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/18245'</code></pre>
+<pre class="r"><code>chain</code></pre>
+<pre><code>## Chain of length 25
+## names(25): chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 ... chr18 chr19 chr20 chr21 chr22 chrX chrY chrM</code></pre>
+<p>Perform the liftOver operation using <code>rtracklayer::liftOver()</code>:</p>
+<pre class="r"><code>library(rtracklayer)
+gr38 <- liftOver(peaks, chain)</code></pre>
+<pre><code>## Warning: closing unused connection 6 (/tmp/Rtmp4CdWfE/file42086a9c284e)</code></pre>
+<pre><code>## Warning: closing unused connection 5 (/tmp/Rtmp4CdWfE/file42081f40ea18)</code></pre>
+<p>This returns a <em>GRangesList</em>; update the genome of the result to get the final result</p>
+<pre class="r"><code>genome(gr38) <- "hg38"
+gr38</code></pre>
+<pre><code>## GRangesList object of length 153266:
+## [[1]] 
+## GRanges object with 1 range and 5 metadata columns:
+##       seqnames               ranges strand |        name     score signalValue    pValue    qValue
+##          <Rle>            <IRanges>  <Rle> | <character> <numeric>   <numeric> <numeric> <numeric>
+##   [1]     chr1 [28667912, 28670147]      * |      Rank_1       189    10.55845  22.01316  18.99911
+## 
+## [[2]] 
+## GRanges object with 1 range and 5 metadata columns:
+##       seqnames               ranges strand |   name score signalValue   pValue   qValue
+##   [1]     chr4 [54090990, 54092984]      * | Rank_2   188     8.11483 21.80441 18.80662
+## 
+## [[3]] 
+## GRanges object with 1 range and 5 metadata columns:
+##       seqnames               ranges strand |   name score signalValue   pValue   qValue
+##   [1]    chr14 [75293392, 75296621]      * | Rank_3   180     8.89834 20.97714 18.02816
+## 
+## ...
+## <153263 more elements>
+## -------
+## seqinfo: 23 sequences from hg38 genome; no seqlengths</code></pre>
+</div>
+<div id="working-with-dbsnp-variants" class="section level2">
+<h2><span class="header-section-number">1.5</span> Working with dbSNP Variants</h2>
+<p>One may also be interested in working with common germline variants with evidence of medical interest. This information is available at <a href="https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/">NCBI</a>.</p>
+<p>Query the dbSNP files in the hub:</p>
+<p>This returns a <em>VcfFile</em> which can be read in using <em><a href="http://bioconductor.org/packages/VariantAnnotation">VariantAnnotation</a></em>; because VCF files can be large, <code>readVcf()</code> supports several strategies for importing only relevant parts of the file (e.g., particular genomic locations, particular features of the variants); see <code>?readVcf</code> for additional information.</p>
+<pre class="r"><code>variants <- readVcf(vcf, genome="hg19")
+variants</code></pre>
+<pre><code>## class: CollapsedVCF 
+## dim: 111138 0 
+## rowRanges(vcf):
+##   GRanges with 5 metadata columns: paramRangeID, REF, ALT, QUAL, FILTER
+## info(vcf):
+##   DataFrame with 58 columns: RS, RSPOS, RV, VP, GENEINFO, dbSNPBuildID, SAO, SSR, WGT, VC, PM, T...
+## info(header(vcf)):
+##                 Number Type    Description                                                         
+##    RS           1      Integer dbSNP ID (i.e. rs number)                                           
+##    RSPOS        1      Integer Chr position reported in dbSNP                                      
+##    RV           0      Flag    RS orientation is reversed                                          
+##    VP           1      String  Variation Property.  Documentation is at ftp://ftp.ncbi.nlm.nih.g...
+##    GENEINFO     1      String  Pairs each of gene symbol:gene id.  The gene symbol and id are de...
+##    dbSNPBuildID 1      Integer First dbSNP Build for RS                                            
+##    SAO          1      Integer Variant Allele Origin: 0 - unspecified, 1 - Germline, 2 - Somatic...
+##    SSR          1      Integer Variant Suspect Reason Codes (may be more than one value added to...
+##    WGT          1      Integer Weight, 00 - unmapped, 1 - weight 1, 2 - weight 2, 3 - weight 3 o...
+##    VC           1      String  Variation Class                                                     
+##    PM           0      Flag    Variant is Precious(Clinical,Pubmed Cited)                          
+##    TPA          0      Flag    Provisional Third Party Annotation(TPA) (currently rs from PHARMG...
+##    PMC          0      Flag    Links exist to PubMed Central article                               
+##    S3D          0      Flag    Has 3D structure - SNP3D table                                      
+##    SLO          0      Flag    Has SubmitterLinkOut - From SNP->SubSNP->Batch.link_out             
+##    NSF          0      Flag    Has non-synonymous frameshift A coding region variation where one...
+##    NSM          0      Flag    Has non-synonymous missense A coding region variation where one a...
+##    NSN          0      Flag    Has non-synonymous nonsense A coding region variation where one a...
+##    REF          0      Flag    Has reference A coding region variation where one allele in the s...
+##    SYN          0      Flag    Has synonymous A coding region variation where one allele in the ...
+##    U3           0      Flag    In 3' UTR Location is in an untranslated region (UTR). FxnCode = 53 
+##    U5           0      Flag    In 5' UTR Location is in an untranslated region (UTR). FxnCode = 55 
+##    ASS          0      Flag    In acceptor splice site FxnCode = 73                                
+##    DSS          0      Flag    In donor splice-site FxnCode = 75                                   
+##    INT          0      Flag    In Intron FxnCode = 6                                               
+##    R3           0      Flag    In 3' gene region FxnCode = 13                                      
+##    R5           0      Flag    In 5' gene region FxnCode = 15                                      
+##    OTH          0      Flag    Has other variant with exactly the same set of mapped positions o...
+##    CFL          0      Flag    Has Assembly conflict. This is for weight 1 and 2 variant that ma...
+##    ASP          0      Flag    Is Assembly specific. This is set if the variant only maps to one...
+##    MUT          0      Flag    Is mutation (journal citation, explicit fact): a low frequency va...
+##    VLD          0      Flag    Is Validated.  This bit is set if the variant has 2+ minor allele...
+##    G5A          0      Flag    >5% minor allele frequency in each and all populations              
+##    G5           0      Flag    >5% minor allele frequency in 1+ populations                        
+##    HD           0      Flag    Marker is on high density genotyping kit (50K density or greater)...
+##    GNO          0      Flag    Genotypes available. The variant has individual genotype (in SubI...
+##    KGPhase1     0      Flag    1000 Genome phase 1 (incl. June Interim phase 1)                    
+##    KGPhase3     0      Flag    1000 Genome phase 3                                                 
+##    CDA          0      Flag    Variation is interrogated in a clinical diagnostic assay            
+##    LSD          0      Flag    Submitted from a locus-specific database                            
+##    MTP          0      Flag    Microattribution/third-party annotation(TPA:GWAS,PAGE)              
+##    OM           0      Flag    Has OMIM/OMIA                                                       
+##    NOC          0      Flag    Contig allele not present in variant allele list. The reference s...
+##    WTD          0      Flag    Is Withdrawn by submitter If one member ss is withdrawn by submit...
+##    NOV          0      Flag    Rs cluster has non-overlapping allele sets. True when rs set has ...
+##    CAF          .      String  An ordered, comma delimited list of allele frequencies based on 1...
+##    COMMON       1      Integer RS is a common SNP.  A common SNP is one that has at least one 10...
+##    CLNHGVS      .      String  Variant names from HGVS.    The order of these variants correspon...
+##    CLNALLE      .      Integer Variant alleles from REF or ALT columns.  0 is REF, 1 is the firs...
+##    CLNSRC       .      String  Variant Clinical Chanels                                            
+##    CLNORIGIN    .      String  Allele Origin. One or more of the following values may be added: ...
+##    CLNSRCID     .      String  Variant Clinical Channel IDs                                        
+##    CLNSIG       .      String  Variant Clinical Significance, 0 - Uncertain significance, 1 - no...
+##    CLNDSDB      .      String  Variant disease database name                                       
+##    CLNDSDBID    .      String  Variant disease database ID                                         
+##    CLNDBN       .      String  Variant disease name                                                
+##    CLNREVSTAT   .      String  no_assertion - No assertion provided, no_criteria - No assertion ...
+##    CLNACC       .      String  Variant Accession and Versions                                      
+## geno(vcf):
+##   SimpleList of length 0:</code></pre>
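+<p>To illustrate importing only part of the file, the following sketch restricts <code>readVcf()</code> to one region and two INFO fields via <code>ScanVcfParam()</code>. It assumes that <code>vcf</code> is the <em>VcfFile</em> retrieved from the hub and that it is bgzip-compressed and tabix-indexed; the region <code>roi</code> and the name <code>variants_roi</code> are illustrative choices.</p>
+<pre class="r"><code>library(VariantAnnotation)
+# NCBI-style sequence name ("22"), matching the naming used in this VCF file
+roi <- GRanges("22", IRanges(50622494, 50626143))
+# import only the RS and CLNSIG INFO fields for variants inside roi
+param <- ScanVcfParam(info = c("RS", "CLNSIG"), which = roi)
+variants_roi <- readVcf(vcf, genome = "hg19", param = param)
+variants_roi</code></pre>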
+<p><code>rowRanges()</code> returns information from the CHROM, POS and ID fields of the VCF file, represented as a <em>GRanges</em> instance</p>
+<pre class="r"><code>rowRanges(variants)</code></pre>
+<pre><code>## GRanges object with 111138 ranges and 5 metadata columns:
+##               seqnames             ranges strand | paramRangeID            REF                ALT
+##                  <Rle>          <IRanges>  <Rle> |     <factor> <DNAStringSet> <DNAStringSetList>
+##   rs786201005        1 [1014143, 1014143]      * |         <NA>              C                  T
+##   rs672601345        1 [1014316, 1014316]      * |         <NA>              C                 CG
+##   rs672601312        1 [1014359, 1014359]      * |         <NA>              G                  T
+##   rs115173026        1 [1020217, 1020217]      * |         <NA>              G                  T
+##   rs201073369        1 [1020239, 1020239]      * |         <NA>              G                  C
+##           ...      ...                ...    ... .          ...            ...                ...
+##   rs527236200       MT     [15943, 15943]      * |         <NA>              T                  C
+##   rs118203890       MT     [15950, 15950]      * |         <NA>              G                  A
+##   rs199474700       MT     [15965, 15965]      * |         <NA>              A                  G
+##   rs199474701       MT     [15967, 15967]      * |         <NA>              G                  A
+##   rs199474699       MT     [15990, 15990]      * |         <NA>              C                  T
+##                    QUAL      FILTER
+##               <numeric> <character>
+##   rs786201005      <NA>           .
+##   rs672601345      <NA>           .
+##   rs672601312      <NA>           .
+##   rs115173026      <NA>           .
+##   rs201073369      <NA>           .
+##           ...       ...         ...
+##   rs527236200      <NA>           .
+##   rs118203890      <NA>           .
+##   rs199474700      <NA>           .
+##   rs199474701      <NA>           .
+##   rs199474699      <NA>           .
+##   -------
+##   seqinfo: 25 sequences from hg19 genome; no seqlengths</code></pre>
+<p>Note that the broadPeak files follow the UCSC chromosome naming convention, whereas the VCF data follow the NCBI convention. To bring these ranges into the same chromosome naming convention (i.e., UCSC), we would use</p>
+<pre class="r"><code>seqlevelsStyle(variants) <- seqlevelsStyle(peaks)</code></pre>
+<p>And then finally to find which variants overlap these broadPeaks we would use:</p>
+<pre class="r"><code>overlap <- findOverlaps(variants, peaks)
+overlap</code></pre>
+<pre><code>## Hits object with 10904 hits and 0 metadata columns:
+##           queryHits subjectHits
+##           <integer>   <integer>
+##       [1]        35       20333
+##       [2]        36       20333
+##       [3]        37       20333
+##       [4]        38       20333
+##       [5]        41        7733
+##       ...       ...         ...
+##   [10900]    110761       21565
+##   [10901]    110762       21565
+##   [10902]    110763       21565
+##   [10903]    110764       21565
+##   [10904]    110765       21565
+##   -------
+##   queryLength: 111138 / subjectLength: 153266</code></pre>
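+<p>The same information can be summarized per peak. As a small sketch (assuming the <code>peaks</code> and <code>variants</code> objects above, after harmonizing the chromosome naming style; the name <code>nVariants</code> is illustrative), <code>countOverlaps()</code> gives the number of variants falling in each peak:</p>
+<pre class="r"><code>nVariants <- countOverlaps(peaks, rowRanges(variants))
+sum(nVariants > 0)                        # number of peaks containing at least one variant
+head(sort(nVariants, decreasing = TRUE))  # largest per-peak variant counts</code></pre>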
+<p>Some insight into how these results can be interpreted comes from looking at a particular peak, e.g., the 3852nd peak</p>
+<pre class="r"><code>idx <- subjectHits(overlap) == 3852
+overlap[idx]</code></pre>
+<pre><code>## Hits object with 39 hits and 0 metadata columns:
+##        queryHits subjectHits
+##        <integer>   <integer>
+##    [1]    102896        3852
+##    [2]    102897        3852
+##    [3]    102898        3852
+##    [4]    102899        3852
+##    [5]    102900        3852
+##    ...       ...         ...
+##   [35]    102930        3852
+##   [36]    102931        3852
+##   [37]    102932        3852
+##   [38]    102933        3852
+##   [39]    102934        3852
+##   -------
+##   queryLength: 111138 / subjectLength: 153266</code></pre>
+<p>There are 39 variants overlapping this peak; the coordinates of the peak and the overlapping variants are</p>
+<pre class="r"><code>peaks[3852]</code></pre>
+<pre><code>## GRanges object with 1 range and 5 metadata columns:
+##       seqnames               ranges strand |        name     score signalValue    pValue    qValue
+##          <Rle>            <IRanges>  <Rle> | <character> <numeric>   <numeric> <numeric> <numeric>
+##   [1]    chr22 [50622494, 50626143]      * |   Rank_3852        79     6.06768  10.18943   7.99818
+##   -------
+##   seqinfo: 93 sequences (1 circular) from hg19 genome</code></pre>
+<pre class="r"><code>rowRanges(variants)[queryHits(overlap[idx])]</code></pre>
+<pre><code>## GRanges object with 39 ranges and 5 metadata columns:
+##               seqnames               ranges strand | paramRangeID            REF                ALT
+##                  <Rle>            <IRanges>  <Rle> |     <factor> <DNAStringSet> <DNAStringSetList>
+##     rs6151429    chr22 [50625049, 50625049]      * |         <NA>              T                  C
+##     rs6151428    chr22 [50625182, 50625182]      * |         <NA>              C                A,T
+##   rs774153480    chr22 [50625182, 50625182]      * |         <NA>              C           CG,CGGGG
+##   rs199476388    chr22 [50625204, 50625204]      * |         <NA>              A                C,G
+##    rs74315482    chr22 [50625213, 50625213]      * |         <NA>              G                  A
+##           ...      ...                  ...    ... .          ...            ...                ...
+##   rs199476369    chr22 [50625936, 50625936]      * |         <NA>              C                  G
+##     rs2071421    chr22 [50625988, 50625988]      * |         <NA>              T                  C
+##    rs74315475    chr22 [50626033, 50626033]      * |         <NA>              T                  A
+##   rs398123419    chr22 [50626052, 50626052]      * |         <NA>              C                  A
+##   rs398123418    chr22 [50626057, 50626057]      * |         <NA>              G                  A
+##                    QUAL      FILTER
+##               <numeric> <character>
+##     rs6151429      <NA>           .
+##     rs6151428      <NA>           .
+##   rs774153480      <NA>           .
+##   rs199476388      <NA>           .
+##    rs74315482      <NA>           .
+##           ...       ...         ...
+##   rs199476369      <NA>           .
+##     rs2071421      <NA>           .
+##    rs74315475      <NA>           .
+##   rs398123419      <NA>           .
+##   rs398123418      <NA>           .
+##   -------
+##   seqinfo: 25 sequences from hg19 genome; no seqlengths</code></pre>
+</div>
+</div>
+<div id="sessioninfo" class="section level1">
+<h1><span class="header-section-number">2</span> sessionInfo</h1>
+<pre class="r"><code>sessionInfo()</code></pre>
+<pre><code>## R version 3.4.0 (2017-04-21)
+## Platform: x86_64-pc-linux-gnu (64-bit)
+## Running under: Ubuntu 16.04.2 LTS
+## 
+## Matrix products: default
+## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
+## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
+## 
+## locale:
+##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C               LC_TIME=en_US.UTF-8       
+##  [4] LC_COLLATE=C               LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
+##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                  LC_ADDRESS=C              
+## [10] LC_TELEPHONE=C             LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
+## 
+## attached base packages:
+## [1] stats4    parallel  stats     graphics  grDevices utils     datasets  methods   base     
+## 
+## other attached packages:
+##  [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.44.0                  
+##  [3] rtracklayer_1.36.1                VariantAnnotation_1.22.0         
+##  [5] SummarizedExperiment_1.6.1        DelayedArray_0.2.0               
+##  [7] matrixStats_0.52.2                Rsamtools_1.28.0                 
+##  [9] Biostrings_2.44.0                 XVector_0.16.0                   
+## [11] GenomicFeatures_1.28.0            AnnotationDbi_1.38.0             
+## [13] Biobase_2.36.2                    GenomicRanges_1.28.1             
+## [15] GenomeInfoDb_1.12.0               IRanges_2.10.0                   
+## [17] S4Vectors_0.14.0                  AnnotationHub_2.8.1              
+## [19] BiocGenerics_0.22.0               BiocStyle_2.4.0                  
+## 
+## loaded via a namespace (and not attached):
+##  [1] Rcpp_0.12.10                  compiler_3.4.0                BiocInstaller_1.26.0         
+##  [4] bitops_1.0-6                  tools_3.4.0                   zlibbioc_1.22.0              
+##  [7] biomaRt_2.32.0                digest_0.6.12                 lattice_0.20-35              
+## [10] RSQLite_1.1-2                 evaluate_0.10                 memoise_1.1.0                
+## [13] Matrix_1.2-10                 shiny_1.0.3                   DBI_0.6-1                    
+## [16] curl_2.6                      yaml_2.1.14                   GenomeInfoDbData_0.99.0      
+## [19] httr_1.2.1                    stringr_1.2.0                 knitr_1.15.1                 
+## [22] grid_3.4.0                    rprojroot_1.2                 R6_2.2.0                     
+## [25] BiocParallel_1.10.1           XML_3.98-1.6                  rmarkdown_1.5                
+## [28] magrittr_1.5                  GenomicAlignments_1.12.0      backports_1.0.5              
+## [31] htmltools_0.3.6               mime_0.5                      interactiveDisplayBase_1.14.0
+## [34] xtable_1.8-2                  httpuv_1.3.3                  stringi_1.1.5                
+## [37] RCurl_1.95-4.8</code></pre>
+</div>
+
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/AnnotationHub.R b/inst/doc/AnnotationHub.R
new file mode 100644
index 0000000..537392f
--- /dev/null
+++ b/inst/doc/AnnotationHub.R
@@ -0,0 +1,71 @@
+## ----style, echo = FALSE, results = 'asis'-------------------------------
+BiocStyle::markdown()
+
+## ----library, message=FALSE----------------------------------------------
+library(AnnotationHub)
+
+## ----AnnotationHub-------------------------------------------------------
+ah = AnnotationHub()
+
+## ----show----------------------------------------------------------------
+ah
+
+## ----dataprovider--------------------------------------------------------
+unique(ah$dataprovider)
+
+## ----species-------------------------------------------------------------
+head(unique(ah$species))
+
+## ----rdataclass----------------------------------------------------------
+head(unique(ah$rdataclass))
+
+## ----dm1-----------------------------------------------------------------
+dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
+dm
+
+## ----show2---------------------------------------------------------------
+df <- mcols(dm)
+
+## ----length--------------------------------------------------------------
+length(ah)
+
+## ----subset--------------------------------------------------------------
+ahs <- query(ah, c('inparanoid8', 'ailuropoda'))
+ahs
+
+## ----display, eval=FALSE-------------------------------------------------
+#  d <- display(ah)
+
+## ----dm2-----------------------------------------------------------------
+dm
+dm["AH15146"]
+
+## ----dm3-----------------------------------------------------------------
+dm[["AH15146"]]
+
+## ----show-2--------------------------------------------------------------
+ah
+
+## ----snapshot------------------------------------------------------------
+snapshotDate(ah)
+
+## ----possibleDates-------------------------------------------------------
+pd <- possibleDates(ah)
+pd
+
+## ----setdate, eval=FALSE-------------------------------------------------
+#  snapshotDate(ah) <- pd[1]
+
+## ----clusterOptions1, eval=FALSE-----------------------------------------
+#  library(AnnotationHub)
+#  hub <- AnnotationHub()
+#  gr <- hub[["AH50773"]]  ## downloaded once
+#  txdb <- makeTxDbFromGRanges(gr)  ## build on the fly
+
+## ----clusterOptions2, eval=FALSE-----------------------------------------
+#  library(AnnotationDbi)  ## if not already loaded
+#  txdb <- loadDb("/locationToFile/mytxdb.sqlite")
+
+## ----sessionInfo---------------------------------------------------------
+sessionInfo()
+
diff --git a/inst/doc/AnnotationHub.Rmd b/inst/doc/AnnotationHub.Rmd
new file mode 100644
index 0000000..8092c7d
--- /dev/null
+++ b/inst/doc/AnnotationHub.Rmd
@@ -0,0 +1,235 @@
+---
+title: "AnnotationHub: Access the AnnotationHub Web Service"
+output:
+  BiocStyle::html_document:
+    toc: true
+vignette: >
+  % \VignetteIndexEntry{AnnotationHub: Access the AnnotationHub Web Service}
+  % \VignetteDepends{AnnotationHub}
+  % \VignetteEngine{knitr::rmarkdown}
+  % \VignetteEncoding{UTF-8}
+---
+
+```{r style, echo = FALSE, results = 'asis'}
+BiocStyle::markdown()
+```
+**Package**: `r Biocpkg("AnnotationHub")`<br />
+**Authors**: `r packageDescription("AnnotationHub")[["Author"]] `<br />
+**Modified**: 27 May, 2016<br />
+**Compiled**: `r date()`
+
+The `AnnotationHub` server provides easy _R / Bioconductor_ access to
+large collections of publicly available whole genome resources,
+e.g., ENSEMBL genome fasta or gtf files, UCSC chain resources, ENCODE
+data tracks at UCSC, etc.
+
+# AnnotationHub objects
+
+The `r Biocpkg("AnnotationHub")` package provides a client interface
+to resources stored at the AnnotationHub web service.
+
+```{r library, message=FALSE}
+library(AnnotationHub)
+```
+
+The `r Biocpkg("AnnotationHub")` package is straightforward to use.
+Create an `AnnotationHub` object
+
+```{r AnnotationHub}
+ah = AnnotationHub()
+```
+
+Now at this point you have already done everything you need in order
+to start retrieving annotations.  For most operations, using the
+`AnnotationHub` object should feel a lot like working with a familiar
+`list` or `data.frame`. 
+
+Let's take a minute to look at the show method for the hub object `ah`:
+
+```{r show}
+ah
+```
+
+You can see that it gives you an idea about the different types of data that are present inside the hub. You can see where the data come from (dataprovider), which species are represented (species), and what kinds of R data objects can be returned (rdataclass).  We can take a closer look at all the data providers that are available by simply looking at the contents of dataprovider as if it were the column of a data.frame object like this:
+
+```{r dataprovider}
+unique(ah$dataprovider)
+```
+
+In the same way, you can also see data from different species inside the hub by looking at the contents of species like this: 
+
+```{r species}
+head(unique(ah$species))
+```
+
+And this will also work for any of the other types of metadata present.  You can learn which kinds of metadata are available by simply hitting the tab key after you type 'ah$'.  In this way you can explore for yourself what kinds of data are present in the hub right from the command line. This interface also allows you to access the hub programmatically to extract data that matches a particular set of criteria.
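+
+As a hedged sketch of such programmatic access (the species and class
+values below are illustrative choices, not taken from this vignette),
+metadata columns can be used to filter the hub, either with `subset()`
+or by indexing with the positions of the matching records:
+
+```{r filterSketch, eval=FALSE}
+## keep only the records that are returned as ChainFile objects
+chains <- subset(ah, rdataclass == "ChainFile")
+## combine two conditions; which() skips records whose metadata is NA
+keep <- which(ah$species == "Homo sapiens" & ah$rdataclass == "GRanges")
+human_gr <- ah[keep]
+length(human_gr)
+```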
+
+Another valuable type of metadata to pay attention to is the rdataclass.
+
+```{r rdataclass}
+head(unique(ah$rdataclass))
+```
+
+The rdataclass allows you to see which kinds of R objects the hub will return to you.  This kind of information is valuable both as a means to filter results and also as a means to explore and learn about some of the kinds of annotation objects that are widely available for the project.  Right now this is a pretty short list, but over time it should grow as we support more of the different kinds of annotation objects via the hub.
+
+
+Now let's try getting the Chain Files from UCSC using the query and subset methods to selectively pare down the hub based on specific criteria.
+The query method lets you search rows for
+specific strings, returning an `AnnotationHub` instance with just the
+rows matching the query.
+
+From the show method, one can easily see that one of the `dataprovider` values
+is UCSC and that there is an `rdataclass` value of ChainFile.
+
+One can get chain files for Drosophila melanogaster from UCSC with:
+
+```{r dm1}
+dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
+dm
+```
+Query has worked and you can now see that the only species present is 
+Drosophila melanogaster. 
+ 
+You can retrieve the metadata underlying this hub object with `mcols()`:
+
+```{r show2}
+df <- mcols(dm)
+```
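+
+The result has one row per record, so the usual column operations
+apply. A minimal sketch, using column names taken from the show output
+above (`title`, `genome`):
+
+```{r mcolsSketch, eval=FALSE}
+## titles and source genome builds of the first few chain files
+head(df[, c("title", "genome")])
+## how many chain files start from each genome build
+table(df$genome)
+```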
+
+By default the show method will only display the first 5 and last 5 rows.
+There are already thousands of records present in the hub.
+
+```{r length}
+length(ah)
+```
+Let's look at another example, where we pull down only Inparanoid8 data
+from the hub, using `query()` to return a smaller hub object (here we
+find the records that mention the giant panda, genus Ailuropoda).
+
+```{r subset}
+ahs <- query(ah, c('inparanoid8', 'ailuropoda'))
+ahs
+```
+
+We can also look at the `AnnotationHub` object in a browser using the
+`display()` function. We can then filter the `AnnotationHub` object
+for _ChainFile_ by either using the Global search field on the top
+right corner of the page or the in-column search field for `rdataclass`.
+
+```{r display, eval=FALSE}
+d <- display(ah)
+```
+
+![](display.png)
+Displaying and filtering the Annotation Hub object in a browser
+
+By default 1000 entries are displayed per page; we can change this using
+the filter on the top of the page or navigate through different pages
+using the page scrolling feature at the bottom of the page. 
+
+We can also select the rows of interest to us and send them back to
+the R session using the 'Return rows to R session' button; this sets a
+filter internally which filters the `AnnotationHub` object. The names
+of the selected AnnotationHub elements are displayed at the top of the
+page.
+
+# Using `AnnotationHub` to retrieve data
+
+Looking back at our chain file example, if we are interested in the file 
+dm1ToDm2.over.chain.gz, we can get its metadata using
+
+```{r dm2}
+dm
+dm["AH15146"]
+```
+We can download the file using
+
+```{r dm3}
+dm[["AH15146"]]
+```
+Each file is retrieved from the AnnotationHub server and is
+also cached locally, so that the next time you need to retrieve it,
+it loads much more quickly.
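+
+To see the cache at work, the sketch below prints the cache directory
+and retrieves the same record twice; it assumes the default cache
+location and a working network connection for the first call.
+
+```{r cacheSketch, eval=FALSE}
+hubCache(ah)                 ## directory holding downloaded files
+chain <- dm[["AH15146"]]     ## first call downloads and caches the file
+chain <- dm[["AH15146"]]     ## later calls report 'loading from cache'
+```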
+
+# Configuring `AnnotationHub` objects
+
+When you create the `AnnotationHub` object, it will set up the object
+for you with some default settings.  See `?AnnotationHub` for ways to
+customize the hub source, the local cache, and other instance-specific
+options, and `?getAnnotationHubOption` to get or set package-global 
+options for use across sessions. 
+
+If you look at the object you will see some helpful information about
+it such as where the data is cached and where online the hub server is
+set to.
+
+```{r show-2}
+ah
+```
+
+By default the `AnnotationHub` object is set to the latest
+`snapshotDate` and a snapshot version that matches the version of
+_Bioconductor_ that you are using. You can also learn about these data
+with the appropriate methods.
+
+```{r snapshot}
+snapshotDate(ah)
+```
+
+If you are interested in using an older version of a snapshot, you can
+list previous versions with `possibleDates()` like this:
+
+```{r possibleDates}
+pd <- possibleDates(ah)
+pd
+```
+
+Set the date like this:
+
+```{r setdate, eval=FALSE}
+snapshotDate(ah) <- pd[1]
+```
+# AnnotationHub objects in a cluster environment
+
+Resources in AnnotationHub aren't loaded with the standard `R` package approach
+and therefore can't be loaded on cluster nodes with library(). There are a
+couple of options for sharing AnnotationHub objects across a cluster when
+researchers are using the same R install and want access to the same
+annotations.
+
+As an example, we create a TxDb object from a GRanges stored in AnnotationHub
+contributed by Timothée Flutre.  The GRanges was created from a
+GFF file and contains gene information for Vitis vinifera.
+
+* Download once and build on the fly
+
+One option is that each user downloads the resource with hub[["AH50773"]] and
+the GRanges is saved in the cache. Each subsequent call to 
+hub[["AH50773"]] retrieves the resource from the cache which is very fast.
+
+The necessary code extracts the resource then calls makeTxDbFromGRanges().
+```{r clusterOptions1, eval=FALSE}
+library(AnnotationHub)
+hub <- AnnotationHub()
+gr <- hub[["AH50773"]]  ## downloaded once
+txdb <- makeTxDbFromGRanges(gr)  ## build on the fly
+```
+
+* Build once and share
+
+Another approach is that one user builds the TxDb and saves it as a .sqlite
+file. The cluster admin installs this in a common place on all cluster nodes
+and each user can load it with loadDb(). Loading the file is as quick and
+easy as calling library() on a TxDb package.
+
+Once the .sqlite file is installed, each user's code would include:
+```{r clusterOptions2, eval=FALSE}
+library(AnnotationDbi)  ## if not already loaded
+txdb <- loadDb("/locationToFile/mytxdb.sqlite")
+```
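+
+The build step itself is not shown above. A minimal sketch, assuming
+the same placeholder path `/locationToFile/mytxdb.sqlite` and that
+`GenomicFeatures` is installed, could be:
+
+```{r saveDbSketch, eval=FALSE}
+library(AnnotationHub)
+library(GenomicFeatures)     ## provides makeTxDbFromGRanges()
+library(AnnotationDbi)       ## provides saveDb() and loadDb()
+hub <- AnnotationHub()
+txdb <- makeTxDbFromGRanges(hub[["AH50773"]])
+saveDb(txdb, file = "/locationToFile/mytxdb.sqlite")  ## done once
+```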
+
+# Session info
+
+```{r sessionInfo}
+sessionInfo()
+```
diff --git a/inst/doc/AnnotationHub.html b/inst/doc/AnnotationHub.html
new file mode 100644
index 0000000..dc94e47
--- /dev/null
+++ b/inst/doc/AnnotationHub.html
@@ -0,0 +1,391 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+
+<title>AnnotationHub: Access the AnnotationHub Web Service</title>
+
+<link href="data:text/css;charset=utf-8,pre%20%2Eoperator%2C%0Apre%20%2Eparen%20%7B%0Acolor%3A%20rgb%28104%2C%20118%2C%20135%29%0A%7D%0Apre%20%2Eliteral%20%7B%0Acolor%3A%20%23990073%0A%7D%0Apre%20%2Enumber%20%7B%0Acolor%3A%20%23099%3B%0A%7D%0Apre%20%2Ecomment%20%7B%0Acolor%3A%20%23998%3B%0Afont%2Dstyle%3A%20italic%0A%7D%0Apre%20%2Ekeyword%20%7B%0Acolor%3A%20%23900%3B%0Afont%2Dweight%3A%20bold%0A%7D%0Apre%20%2Eidentifier%20%7B%0Acolor%3A%20rgb%280%2C%200%2C%200%29%3B%0A%7D%0Apre%20%2Estri [...]
+<script src="data:application/x-javascript;base64,dmFyIGhsanM9bmV3IGZ1bmN0aW9uKCl7ZnVuY3Rpb24gbShwKXtyZXR1cm4gcC5yZXBsYWNlKC8mL2dtLCImYW1wOyIpLnJlcGxhY2UoLzwvZ20sIiZsdDsiKX1mdW5jdGlvbiBmKHIscSxwKXtyZXR1cm4gUmVnRXhwKHEsIm0iKyhyLmNJPyJpIjoiIikrKHA/ImciOiIiKSl9ZnVuY3Rpb24gYihyKXtmb3IodmFyIHA9MDtwPHIuY2hpbGROb2Rlcy5sZW5ndGg7cCsrKXt2YXIgcT1yLmNoaWxkTm9kZXNbcF07aWYocS5ub2RlTmFtZT09IkNPREUiKXtyZXR1cm4gcX1pZighKHEubm9kZVR5cGU9PTMmJnEubm9kZVZhbHVlLm1hdGNoKC9ccysvKSkpe2JyZWFrfX19ZnVuY3Rpb24gaCh0LH [...]
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+  pre:not([class]) {
+    background-color: white;
+  }
+</style>
+<script type="text/javascript">
+if (window.hljs && document.readyState && document.readyState === "complete") {
+   window.setTimeout(function() {
+      hljs.initHighlighting();
+   }, 0);
+}
+</script>
+
+
+<link href="data:text/css;charset=utf-8,body%2C%20td%20%7B%0Afont%2Dfamily%3A%20sans%2Dserif%3B%0Abackground%2Dcolor%3A%20white%3B%0Afont%2Dsize%3A%2013px%3B%0A%7D%0Abody%20%7B%0Amax%2Dwidth%3A%20800px%3B%0Amargin%3A%200%20auto%3B%0Apadding%3A%201em%201em%202em%3B%0Aline%2Dheight%3A%2020px%3B%0A%7D%0A%0Adiv%23TOC%20li%20%7B%0Alist%2Dstyle%3Anone%3B%0Abackground%2Dimage%3Anone%3B%0Abackground%2Drepeat%3Anone%3B%0Abackground%2Dposition%3A0%3B%0A%7D%0A%0Ap%2C%20pre%20%7B%20margin%3A%200em%2 [...]
+
+<script type="text/javascript">
+document.addEventListener("DOMContentLoaded", function() {
+  var links = document.links;  
+  for (var i = 0, linksLength = links.length; i < linksLength; i++)
+    if(links[i].hostname != window.location.hostname)
+      links[i].target = '_blank';
+});
+</script>
+
+</head>
+
+<body>
+
+
+<div id="header">
+<h1 class="title">AnnotationHub: Access the AnnotationHub Web Service</h1>
+</div>
+
+<h1>Contents</h1>
+<div id="TOC">
+<ul>
+<li><a href="#annotationhub-objects"><span class="toc-section-number">1</span> AnnotationHub objects</a></li>
+<li><a href="#using-annotationhub-to-retrieve-data"><span class="toc-section-number">2</span> Using <code>AnnotationHub</code> to retrieve data</a></li>
+<li><a href="#configuring-annotationhub-objects"><span class="toc-section-number">3</span> Configuring <code>AnnotationHub</code> objects</a></li>
+<li><a href="#annotationhub-objects-in-a-cluster-environment"><span class="toc-section-number">4</span> AnnotationHub objects in a cluster environment</a></li>
+<li><a href="#session-info"><span class="toc-section-number">5</span> Session info</a></li>
+</ul>
+</div>
+
+<script type="text/javascript">
+document.addEventListener("DOMContentLoaded", function() {
+  document.querySelector("h1").className = "title";
+});
+</script>
+<script type="text/javascript">
+document.addEventListener("DOMContentLoaded", function() {
+  var links = document.links;  
+  for (var i = 0, linksLength = links.length; i < linksLength; i++)
+    if (links[i].hostname != window.location.hostname)
+      links[i].target = '_blank';
+});
+</script>
+<p><strong>Package</strong>: <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em><br /> <strong>Authors</strong>: Martin Morgan [cre], Marc Carlson [ctb], Dan Tenenbaum [ctb], Sonali Arora [ctb]<br /> <strong>Modified</strong>: 27 May, 2016<br /> <strong>Compiled</strong>: Wed May 3 19:26:51 2017</p>
+<p>The <code>AnnotationHub</code> server provides easy <em>R / Bioconductor</em> access to large collections of publicly available whole genome resources, e.g., ENSEMBL genome fasta or gtf files, UCSC chain resources, ENCODE data tracks at UCSC, etc.</p>
+<div id="annotationhub-objects" class="section level1">
+<h1><span class="header-section-number">1</span> AnnotationHub objects</h1>
+<p>The <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em> package provides a client interface to resources stored at the AnnotationHub web service.</p>
+<pre class="r"><code>library(AnnotationHub)</code></pre>
+<p>The <em><a href="http://bioconductor.org/packages/AnnotationHub">AnnotationHub</a></em> package is straightforward to use. Create an <code>AnnotationHub</code> object</p>
+<pre class="r"><code>ah = AnnotationHub()</code></pre>
+<pre><code>## snapshotDate(): 2017-04-25</code></pre>
+<p>Now at this point you have already done everything you need in order to start retrieving annotations. For most operations, using the <code>AnnotationHub</code> object should feel a lot like working with a familiar <code>list</code> or <code>data.frame</code>.</p>
+<p>Let's take a minute to look at the show method for the hub object <code>ah</code>:</p>
+<pre class="r"><code>ah</code></pre>
+<pre><code>## AnnotationHub with 39213 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: BroadInstitute, Ensembl, UCSC, Haemcode, Inparanoid8, ...
+## # $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Da...
+## # $rdataclass: GRanges, BigWigFile, FaFile, TwoBitFile, ChainFile, Rle,...
+## # additional mcols(): taxonomyid, genome, description,
+## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
+## #   tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH2"]]' 
+## 
+##             title                                                 
+##   AH2     | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa     
+##   AH3     | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa  
+##   AH4     | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa  
+##   AH5     | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa            
+##   AH6     | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa          
+##   ...       ...                                                   
+##   AH54627 | Xiphophorus_maculatus.Xipmac4.4.2.cdna.all.2bit       
+##   AH54628 | Xiphophorus_maculatus.Xipmac4.4.2.dna.toplevel.2bit   
+##   AH54629 | Xiphophorus_maculatus.Xipmac4.4.2.dna_rm.toplevel.2bit
+##   AH54630 | Xiphophorus_maculatus.Xipmac4.4.2.dna_sm.toplevel.2bit
+##   AH54631 | Xiphophorus_maculatus.Xipmac4.4.2.ncrna.2bit</code></pre>
+<p>You can see that it gives you an idea about the different types of data that are present inside the hub. You can see where the data come from (dataprovider), which species are represented (species), and what kinds of R data objects can be returned (rdataclass). We can take a closer look at all the data providers that are available by simply looking at the contents of dataprovider as if it were the column of a data.frame object like this:</p>
+<pre class="r"><code>unique(ah$dataprovider)</code></pre>
+<pre><code>##  [1] "Ensembl"                              
+##  [2] "UCSC"                                 
+##  [3] "RefNet"                               
+##  [4] "Inparanoid8"                          
+##  [5] "NHLBI"                                
+##  [6] "ChEA"                                 
+##  [7] "Pazar"                                
+##  [8] "NIH Pathway Interaction Database"     
+##  [9] "Haemcode"                             
+## [10] "BroadInstitute"                       
+## [11] "PRIDE"                                
+## [12] "Gencode"                              
+## [13] "dbSNP"                                
+## [14] "CRIBI"                                
+## [15] "Genoscope"                            
+## [16] "MISO, VAST-TOOLS, UCSC"               
+## [17] "ftp://ftp.ncbi.nlm.nih.gov/gene/DATA/"</code></pre>
+<p>In the same way, you can also see data from different species inside the hub by looking at the contents of species like this:</p>
+<pre class="r"><code>head(unique(ah$species))</code></pre>
+<pre><code>## [1] "Ailuropoda melanoleuca" "Anolis carolinensis"   
+## [3] "Bos taurus"             "Caenorhabditis elegans"
+## [5] "Callithrix jacchus"     "Canis familiaris"</code></pre>
+<p>And this will also work for any of the other types of metadata present. You can learn which kinds of metadata are available by simply hitting the tab key after you type ‘ah$’. In this way you can explore for yourself what kinds of data are present in the hub right from the command line. This interface also allows you to access the hub programmatically to extract data that matches a particular set of criteria.</p>
+<p>Another valuable type of metadata to pay attention to is the rdataclass.</p>
+<pre class="r"><code>head(unique(ah$rdataclass))</code></pre>
+<pre><code>## [1] "FaFile"        "GRanges"       "data.frame"    "Inparanoid8Db"
+## [5] "TwoBitFile"    "ChainFile"</code></pre>
+<p>The rdataclass allows you to see which kinds of R objects the hub will return to you. This kind of information is valuable both as a means to filter results and also as a means to explore and learn about some of the kinds of annotation objects that are widely available for the project. Right now this is a pretty short list, but over time it should grow as we support more of the different kinds of annotation objects via the hub.</p>
+<p>Now let's try getting the Chain Files from UCSC using the query and subset methods to selectively pare down the hub based on specific criteria. The query method lets you search rows for specific strings, returning an <code>AnnotationHub</code> instance with just the rows matching the query.</p>
+<p>From the show method, one can easily see that one of the <code>dataprovider</code> values is UCSC and that there is an <code>rdataclass</code> value of ChainFile.</p>
+<p>One can get chain files for Drosophila melanogaster from UCSC with:</p>
+<pre class="r"><code>dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
+dm</code></pre>
+<pre><code>## AnnotationHub with 45 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: UCSC
+## # $species: Drosophila melanogaster
+## # $rdataclass: ChainFile
+## # additional mcols(): taxonomyid, genome, description,
+## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
+## #   tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH15102"]]' 
+## 
+##             title                     
+##   AH15102 | dm3ToAnoGam1.over.chain.gz
+##   AH15103 | dm3ToApiMel3.over.chain.gz
+##   AH15104 | dm3ToDm2.over.chain.gz    
+##   AH15105 | dm3ToDm6.over.chain.gz    
+##   AH15106 | dm3ToDp3.over.chain.gz    
+##   ...       ...                       
+##   AH15142 | dm2ToDroVir3.over.chain.gz
+##   AH15143 | dm2ToDroWil1.over.chain.gz
+##   AH15144 | dm2ToDroYak1.over.chain.gz
+##   AH15145 | dm2ToDroYak2.over.chain.gz
+##   AH15146 | dm1ToDm2.over.chain.gz</code></pre>
+<p>Query has worked and you can now see that the only species present is Drosophila melanogaster.</p>
+<p>You can retrieve the metadata underlying this hub object with <code>mcols()</code>:</p>
+<pre class="r"><code>df <- mcols(dm)</code></pre>
+<p>By default the show method will only display the first 5 and last 5 rows. There are already thousands of records present in the hub.</p>
+<pre class="r"><code>length(ah)</code></pre>
+<pre><code>## [1] 39213</code></pre>
+<p>Let's look at another example, where we pull down only Inparanoid8 data from the hub, using <code>query()</code> to return a smaller hub object (here we find the records that mention the giant panda, genus Ailuropoda).</p>
+<pre class="r"><code>ahs <- query(ah, c('inparanoid8', 'ailuropoda'))
+ahs</code></pre>
+<pre><code>## AnnotationHub with 1 record
+## # snapshotDate(): 2017-04-25 
+## # names(): AH10451
+## # $dataprovider: Inparanoid8
+## # $species: Ailuropoda melanoleuca
+## # $rdataclass: Inparanoid8Db
+## # $rdatadateadded: 2014-03-31
+## # $title: hom.Ailuropoda_melanoleuca.inp8.sqlite
+## # $description: Inparanoid 8 annotations about Ailuropoda melanoleuca
+## # $taxonomyid: 9646
+## # $genome: inparanoid8 genomes
+## # $sourcetype: Inparanoid
+## # $sourceurl: http://inparanoid.sbc.su.se/download/current/Orthologs/A....
+## # $sourcesize: NA
+## # $tags: c("Inparanoid", "Gene", "Homology", "Annotation") 
+## # retrieve record with 'object[["AH10451"]]'</code></pre>
+<p>We can also look at the <code>AnnotationHub</code> object in a browser using the <code>display()</code> function. We can then filter the <code>AnnotationHub</code> object for <em>ChainFile</em> by either using the Global search field on the top right corner of the page or the in-column search field for <code>rdataclass</code>.</p>
+<pre class="r"><code>d <- display(ah)</code></pre>
+<p><img src=" [...]
+<p>By default 1000 entries are displayed per page; we can change this using the filter on the top of the page or navigate through different pages using the page scrolling feature at the bottom of the page.</p>
+<p>We can also select the rows of interest to us and send them back to the R session using the ‘Return rows to R session’ button; this sets a filter internally which filters the <code>AnnotationHub</code> object. The names of the selected AnnotationHub elements are displayed at the top of the page.</p>
+</div>
+<div id="using-annotationhub-to-retrieve-data" class="section level1">
+<h1><span class="header-section-number">2</span> Using <code>AnnotationHub</code> to retrieve data</h1>
+<p>Looking back at our chain file example, if we are interested in the file dm1ToDm2.over.chain.gz, we can get its metadata using</p>
+<pre class="r"><code>dm</code></pre>
+<pre><code>## AnnotationHub with 45 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: UCSC
+## # $species: Drosophila melanogaster
+## # $rdataclass: ChainFile
+## # additional mcols(): taxonomyid, genome, description,
+## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
+## #   tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH15102"]]' 
+## 
+##             title                     
+##   AH15102 | dm3ToAnoGam1.over.chain.gz
+##   AH15103 | dm3ToApiMel3.over.chain.gz
+##   AH15104 | dm3ToDm2.over.chain.gz    
+##   AH15105 | dm3ToDm6.over.chain.gz    
+##   AH15106 | dm3ToDp3.over.chain.gz    
+##   ...       ...                       
+##   AH15142 | dm2ToDroVir3.over.chain.gz
+##   AH15143 | dm2ToDroWil1.over.chain.gz
+##   AH15144 | dm2ToDroYak1.over.chain.gz
+##   AH15145 | dm2ToDroYak2.over.chain.gz
+##   AH15146 | dm1ToDm2.over.chain.gz</code></pre>
+<pre class="r"><code>dm["AH15146"]</code></pre>
+<pre><code>## AnnotationHub with 1 record
+## # snapshotDate(): 2017-04-25 
+## # names(): AH15146
+## # $dataprovider: UCSC
+## # $species: Drosophila melanogaster
+## # $rdataclass: ChainFile
+## # $rdatadateadded: 2014-12-15
+## # $title: dm1ToDm2.over.chain.gz
+## # $description: UCSC liftOver chain file from dm1 to dm2
+## # $taxonomyid: 7227
+## # $genome: dm1
+## # $sourcetype: Chain
+## # $sourceurl: http://hgdownload.cse.ucsc.edu/goldenpath/dm1/liftOver/dm...
+## # $sourcesize: NA
+## # $tags: c("liftOver", "chain", "UCSC", "genome", "homology") 
+## # retrieve record with 'object[["AH15146"]]'</code></pre>
+<p>We can download the file using</p>
+<pre class="r"><code>dm[["AH15146"]]</code></pre>
+<pre><code>## loading from cache '/home/biocbuild//.AnnotationHub/19241'</code></pre>
+<pre><code>## Chain of length 11
+## names(11): chr2L chr2R chr3L chr3R chr4 chrX chrU chr2h chr3h chrXh chrYh</code></pre>
+<p>Each file is retrieved from the AnnotationHub server and is also cached locally, so that the next time you need it, it is loaded from the cache and should be available much more quickly.</p>
+</div>
+<div id="configuring-annotationhub-objects" class="section level1">
+<h1><span class="header-section-number">3</span> Configuring <code>AnnotationHub</code> objects</h1>
+<p>When you create the <code>AnnotationHub</code> object, it will set up the object for you with some default settings. See <code>?AnnotationHub</code> for ways to customize the hub source, the local cache, and other instance-specific options, and <code>?getAnnotationHubOption</code> to get or set package-global options for use across sessions.</p>
+<p>If you look at the object you will see some helpful information about it, such as where the data is cached and which online hub server it is set to use.</p>
+<pre class="r"><code>ah</code></pre>
+<pre><code>## AnnotationHub with 39213 records
+## # snapshotDate(): 2017-04-25 
+## # $dataprovider: BroadInstitute, Ensembl, UCSC, Haemcode, Inparanoid8, ...
+## # $species: Homo sapiens, Mus musculus, Bos taurus, Pan troglodytes, Da...
+## # $rdataclass: GRanges, BigWigFile, FaFile, TwoBitFile, ChainFile, Rle,...
+## # additional mcols(): taxonomyid, genome, description,
+## #   coordinate_1_based, maintainer, rdatadateadded, preparerclass,
+## #   tags, rdatapath, sourceurl, sourcetype 
+## # retrieve records with, e.g., 'object[["AH2"]]' 
+## 
+##             title                                                 
+##   AH2     | Ailuropoda_melanoleuca.ailMel1.69.dna.toplevel.fa     
+##   AH3     | Ailuropoda_melanoleuca.ailMel1.69.dna_rm.toplevel.fa  
+##   AH4     | Ailuropoda_melanoleuca.ailMel1.69.dna_sm.toplevel.fa  
+##   AH5     | Ailuropoda_melanoleuca.ailMel1.69.ncrna.fa            
+##   AH6     | Ailuropoda_melanoleuca.ailMel1.69.pep.all.fa          
+##   ...       ...                                                   
+##   AH54627 | Xiphophorus_maculatus.Xipmac4.4.2.cdna.all.2bit       
+##   AH54628 | Xiphophorus_maculatus.Xipmac4.4.2.dna.toplevel.2bit   
+##   AH54629 | Xiphophorus_maculatus.Xipmac4.4.2.dna_rm.toplevel.2bit
+##   AH54630 | Xiphophorus_maculatus.Xipmac4.4.2.dna_sm.toplevel.2bit
+##   AH54631 | Xiphophorus_maculatus.Xipmac4.4.2.ncrna.2bit</code></pre>
+<p>By default the <code>AnnotationHub</code> object is set to the latest <code>snapshotDate</code> and a snapshot version that matches the version of <em>Bioconductor</em> that you are using. You can also learn about these data with the appropriate methods.</p>
+<pre class="r"><code>snapshotDate(ah)</code></pre>
+<pre><code>## [1] "2017-04-25"</code></pre>
+<p>If you are interested in using an older version of a snapshot, you can list previous versions with <code>possibleDates()</code> like this:</p>
+<pre class="r"><code>pd <- possibleDates(ah)
+pd</code></pre>
+<pre><code>##  [1] "2013-03-19" "2013-03-21" "2013-03-26" "2013-04-04" "2013-04-29"
+##  [6] "2013-06-24" "2013-06-25" "2013-06-26" "2013-06-27" "2013-10-29"
+## [11] "2013-11-20" "2013-12-19" "2014-02-12" "2014-02-13" "2014-03-31"
+## [16] "2014-04-27" "2014-05-11" "2014-05-13" "2014-05-14" "2014-05-22"
+## [21] "2014-07-02" "2014-07-09" "2014-12-15" "2014-12-24" "2015-01-08"
+## [26] "2015-01-14" "2015-03-09" "2015-03-11" "2015-03-12" "2015-03-25"
+## [31] "2015-03-26" "2015-05-06" "2015-05-07" "2015-05-08" "2015-05-11"
+## [36] "2015-05-14" "2015-05-21" "2015-05-22" "2015-05-26" "2015-07-17"
+## [41] "2015-07-27" "2015-07-31" "2015-08-10" "2015-08-13" "2015-08-14"
+## [46] "2015-08-17" "2015-08-26" "2015-12-28" "2015-12-29" "2016-01-25"
+## [51] "2016-03-07" "2016-05-03" "2016-05-25" "2016-06-06" "2016-07-20"
+## [56] "2016-08-15" "2016-10-11" "2016-11-03" "2016-11-08" "2016-11-09"
+## [61] "2016-11-13" "2016-11-14" "2016-12-22" "2016-12-28" "2017-01-05"
+## [66] "2017-02-07" "2017-04-03" "2017-04-04" "2017-04-05" "2017-04-10"
+## [71] "2017-04-11" "2017-04-13" "2017-04-24" "2017-04-25"</code></pre>
+<p>Set the dates like this:</p>
+<pre class="r"><code>snapshotDate(ah) <- pd[1]</code></pre>
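+<p>To pick a snapshot by calendar date rather than by position in the vector, one option is to compare the strings as dates. This is only a sketch in base R: the vector below is a hand-copied subset of the <code>possibleDates(ah)</code> output above, and <code>target</code>, <code>candidates</code> and <code>chosen</code> are hypothetical names used for illustration.</p>
+<pre class="r"><code>## a few of the dates printed above; with a live hub use possibleDates(ah)
+pd <- c("2016-10-11", "2016-11-03", "2017-04-24", "2017-04-25")
+target <- as.Date("2017-01-01")       ## hypothetical cut-off date
+candidates <- as.Date(pd)
+## latest snapshot on or before the cut-off, converted back to a string
+chosen <- format(max(candidates[candidates <= target]))
+chosen</code></pre>
+<pre><code>## [1] "2016-11-03"</code></pre>
+<p>With a hub object in hand, the result could then be assigned with <code>snapshotDate(ah) <- chosen</code>.</p>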
+</div>
+<div id="annotationhub-objects-in-a-cluster-environment" class="section level1">
+<h1><span class="header-section-number">4</span> AnnotationHub objects in a cluster environment</h1>
+<p>Resources in AnnotationHub aren’t loaded with the standard <code>R</code> package approach and therefore can’t be loaded on cluster nodes with library(). There are a couple of options for sharing AnnotationHub objects across a cluster when researchers are using the same R install and want access to the same annotations.</p>
+<p>As an example, we create a TxDb object from a GRanges stored in AnnotationHub contributed by Timothée Flutre. The GRanges was created from a GFF file and contains gene information for Vitis vinifera.</p>
+<ul>
+<li>Download once and build on the fly</li>
+</ul>
+<p>One option is that each user downloads the resource with <code>hub[["AH50773"]]</code> and the GRanges is saved in the cache. Each subsequent call to <code>hub[["AH50773"]]</code> retrieves the resource from the cache, which is very fast.</p>
+<p>The necessary code extracts the resource then calls makeTxDbFromGRanges().</p>
+<pre class="r"><code>library(AnnotationHub)
+library(GenomicFeatures)  ## provides makeTxDbFromGRanges()
+hub <- AnnotationHub()
+gr <- hub[["AH50773"]]  ## downloaded once
+txdb <- makeTxDbFromGRanges(gr)  ## build on the fly</code></pre>
+<ul>
+<li>Build once and share</li>
+</ul>
+<p>Another approach is that one user builds the TxDb and saves it as a .sqlite file. The cluster admin installs this in a common place on all cluster nodes and each user can load it with loadDb(). Loading the file is as quick and easy as calling library() on a TxDb package.</p>
+<p>Once the .sqlite file is installed, each user’s code would include:</p>
+<pre class="r"><code>library(AnnotationDbi)  ## if not already loaded
+txdb <- loadDb("/locationToFile/mytxdb.sqlite")</code></pre>
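+<p>The save step itself is not shown above. A minimal sketch, assuming <code>makeTxDbFromGRanges()</code> from <code>GenomicFeatures</code> and <code>saveDb()</code> from <code>AnnotationDbi</code> are installed; the helper name <code>buildAndSaveTxDb</code> is hypothetical, and the path simply mirrors the placeholder used with <code>loadDb()</code>:</p>
+<pre class="r"><code>## hypothetical helper: build a TxDb from a GRanges and write it to 'path'
+buildAndSaveTxDb <- function(gr, path) {
+    txdb <- GenomicFeatures::makeTxDbFromGRanges(gr)
+    AnnotationDbi::saveDb(txdb, file = path)  ## writes the .sqlite file
+    invisible(path)
+}
+## run once by whoever prepares the shared copy, e.g.
+## buildAndSaveTxDb(hub[["AH50773"]], "/locationToFile/mytxdb.sqlite")</code></pre>
+<p>Wrapping the two calls in one function keeps the build step in a single place, so the shared file can be regenerated the same way when the underlying resource changes.</p>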
+</div>
+<div id="session-info" class="section level1">
+<h1><span class="header-section-number">5</span> Session info</h1>
+<pre class="r"><code>sessionInfo()</code></pre>
+<pre><code>## R version 3.4.0 (2017-04-21)
+## Platform: x86_64-pc-linux-gnu (64-bit)
+## Running under: Ubuntu 16.04.2 LTS
+## 
+## Matrix products: default
+## BLAS: /home/biocbuild/bbs-3.5-bioc/R/lib/libRblas.so
+## LAPACK: /home/biocbuild/bbs-3.5-bioc/R/lib/libRlapack.so
+## 
+## locale:
+##  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
+##  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=C              
+##  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
+##  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
+##  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
+## [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
+## 
+## attached base packages:
+## [1] stats4    parallel  stats     graphics  grDevices utils     datasets 
+## [8] methods   base     
+## 
+## other attached packages:
+##  [1] BSgenome.Hsapiens.UCSC.hg19_1.4.0 BSgenome_1.44.0                  
+##  [3] rtracklayer_1.36.1                VariantAnnotation_1.22.0         
+##  [5] SummarizedExperiment_1.6.1        DelayedArray_0.2.0               
+##  [7] matrixStats_0.52.2                Rsamtools_1.28.0                 
+##  [9] Biostrings_2.44.0                 XVector_0.16.0                   
+## [11] GenomicFeatures_1.28.0            AnnotationDbi_1.38.0             
+## [13] Biobase_2.36.2                    GenomicRanges_1.28.1             
+## [15] GenomeInfoDb_1.12.0               IRanges_2.10.0                   
+## [17] S4Vectors_0.14.0                  AnnotationHub_2.8.1              
+## [19] BiocGenerics_0.22.0               BiocStyle_2.4.0                  
+## 
+## loaded via a namespace (and not attached):
+##  [1] Rcpp_0.12.10                  compiler_3.4.0               
+##  [3] BiocInstaller_1.26.0          bitops_1.0-6                 
+##  [5] tools_3.4.0                   zlibbioc_1.22.0              
+##  [7] biomaRt_2.32.0                digest_0.6.12                
+##  [9] lattice_0.20-35               RSQLite_1.1-2                
+## [11] evaluate_0.10                 memoise_1.1.0                
+## [13] Matrix_1.2-10                 shiny_1.0.3                  
+## [15] DBI_0.6-1                     curl_2.6                     
+## [17] yaml_2.1.14                   GenomeInfoDbData_0.99.0      
+## [19] httr_1.2.1                    stringr_1.2.0                
+## [21] knitr_1.15.1                  grid_3.4.0                   
+## [23] rprojroot_1.2                 R6_2.2.0                     
+## [25] BiocParallel_1.10.1           XML_3.98-1.6                 
+## [27] rmarkdown_1.5                 magrittr_1.5                 
+## [29] GenomicAlignments_1.12.0      backports_1.0.5              
+## [31] htmltools_0.3.6               mime_0.5                     
+## [33] interactiveDisplayBase_1.14.0 xtable_1.8-2                 
+## [35] httpuv_1.3.3                  stringi_1.1.5                
+## [37] RCurl_1.95-4.8</code></pre>
+</div>
+
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/scripts/shinyTest.R b/inst/scripts/shinyTest.R
new file mode 100644
index 0000000..ded4cb5
--- /dev/null
+++ b/inst/scripts/shinyTest.R
@@ -0,0 +1,180 @@
+library(shiny)
+
+
+#####################################################
+## the original demo as a function:
+
+# display2 <- 
+# function(df, ...)
+# {    
+#     app <- list(
+#                 ui = fluidPage(
+#                     title = 'Row selection in DataTables',
+#                     sidebarLayout(
+#                         sidebarPanel(textOutput('rows_out')),
+#                         mainPanel(dataTableOutput('tbl')),
+#                         position = 'right'
+#                     )
+#                 )                
+#                 ,
+#                 server = function(input, output) {
+#                     output$rows_out <- renderText({
+#                         paste(c('You selected these rows on the page:', 
+#                                 input$rows),
+#                                     collapse = ' ')
+#                     })                    
+#                     output$tbl <- renderDataTable(
+#                         df,
+#                         options = list(pageLength = 10),
+#                         callback = "function(table) {
+#                         table.on('click.dt', 'tr', function() {
+#                         $(this).toggleClass('selected');
+#                         Shiny.onInputChange('rows',
+#                         table.rows('.selected').indexes().toArray());
+#                         }); }")
+#                 }
+#     )
+# 
+#     ## selectively use the RStudio viewer pane (if available)
+#     viewer <- getOption("viewer")
+#     if (!is.null(viewer)){
+#         runApp(app, launch.browser = rstudio::viewer, ...)
+#     }else{
+#         runApp(app, ...)
+#     }
+# }
+
+
+#####################################################
+## my function that tries to just use indexes:
+display2 <- 
+    function(df, ...)
+    {
+        rowNames <- rownames(df)
+        dt <- data.frame(rownames=rowNames,df)
+        ## define the app
+        app <- list(
+            ui = fluidPage(
+                title = 'The data from your data.frame',
+                sidebarLayout(
+                    sidebarPanel(textOutput('rows_out'),
+                                 br(),
+                                 actionButton("btnSend", "Send Rows")),
+                    mainPanel(dataTableOutput('tbl')),
+                    position = 'left'
+                )
+            )                
+            ,
+            server = function(input, output) {
+                output$rows_out <- renderText({
+                    paste(c('You selected these rows on the page:', 
+                            input$rows),
+                          collapse = ' ')
+                })                    
+                output$tbl <- renderDataTable(
+                    dt,
+                    options = list(pageLength = 20),
+                    callback = "function(table) {
+                    table.on('click.dt', 'tr', function() {
+                    $(this).toggleClass('selected');
+                    Shiny.onInputChange('rows',
+                    table.rows('.selected').indexes().toArray());
+                    }); }")
+                observe({
+                    if(input$btnSend > 0)
+                        isolate({
+                            #print(input$rows)
+                            idx <- as.integer(input$rows) + 1
+                            stopApp(returnValue = df[idx,])
+                        })
+                })                            
+        })
+        ## selectively use the RStudio viewer pane (if available)
+        viewer <- getOption("viewer")
+        if (!is.null(viewer)){
+            runApp(app, launch.browser = viewer, ...)
+        }else{
+            runApp(app, ...)
+        }
+}
+## usage: 
+## display2(mtcars)
+
+## original:
+## table.rows('.selected').indexes().toArray());
+
+## This is kind of the same thing
+##     table.rows( $('.selected').closest('tr') ).indexes().toArray());
+## And this gets just the 1st one
+##     table.rows( $('.selected').closest('tr')[0] ).indexes().toArray());
+
+
+
+
+
+
+#####################################################
+## my function that just gets the row data (like before)
+display2 <- 
+    function(df, ...)
+    {
+        rowNames <- rownames(df)
+        dt <- data.frame(rownames=rowNames,df)
+        ## define the app
+        app <- list(
+            ui = fluidPage(
+                title = 'The data from your data.frame',
+                sidebarLayout(
+                    sidebarPanel(textOutput('rows_out'),
+                                 br(),
+                                 actionButton("btnSend", "Send Rows")),
+                    mainPanel(dataTableOutput('tbl')),
+                    position = 'left'
+                )
+            )                
+            ,
+            server = function(input, output) {
+                output$rows_out <- renderText({
+                    paste(c('You selected these rows on the page:', 
+                            input$rows),
+                          collapse = ' ')
+                })                    
+                output$tbl <- renderDataTable(
+                    dt,
+                    options = list(pageLength = 50),
+                    callback = "function(table) {
+                    table.on('click.dt', 'tr', function() {
+                    $(this).toggleClass('selected');
+                    Shiny.onInputChange('rows',                    
+            table.rows('.selected').indexes().toArray());
+                    Shiny.onInputChange('tbl',                    
+            table.rows('.selected').indexes().toArray());
+                    }); }")
+## TODO: change the above callback so that it returns rowRanges (not just indexes)
+                observe({
+                    if(input$btnSend > 0)
+                        isolate({
+                            #print(input$rows)
+#                            idx <- as.integer(input$rows) + 1
+#                           stopApp(returnValue = df[idx,])
+                            
+#                             dfVec <- input$myTable
+#                             df <- as.data.frame(matrix(data=dfVec, ncol=dim(df)[2],
+#                                                        byrow=TRUE))
+#                             names(df) <- colNames
+                             stopApp(returnValue = input$tbl)                            
+                        })
+    })                            
+    })
+    ## selectively use the RStudio viewer pane (if available)
+    viewer <- getOption("viewer")
+    if (!is.null(viewer)){
+        runApp(app, launch.browser = viewer, ...)
+    }else{
+        runApp(app, ...)
+    }
+    }
+## usage: 
+## display2(mtcars)
+
+
diff --git a/inst/scripts/test.R b/inst/scripts/test.R
new file mode 100644
index 0000000..6552d48
--- /dev/null
+++ b/inst/scripts/test.R
@@ -0,0 +1,244 @@
+## Things to do with the new hub
+
+## 1) fix the downloads (currently broken)
+## 2) get working set of examples scripted.
+## 3) make working functions for missing stuff.
+
+## load
+library(AnnotationHub)
+ah <- AnnotationHub()
+
+## show
+ah
+
+## get metadata for first value
+ah[1]
+
+## extract 1st value
+foo = ah[[1]]
+
+## extract 2nd value (problem with back end?)
+bar = ah[[2]]
+
+
+## subset ah by genome
+ahs <- ah[grep('ailMel1', ah$genome)]
+
+
+## subset it by tags
+ahs <- ah[grep('broadPeak', ah$tags)]                
+
+## or subset by names (if known)
+ah['AH3']
+ah[c('AH3','AH4')]
+
+
+## cache up values from the web. 
+cache(ah[1:3])  
+
+cache(ah[[3]]) ## TROUBLE???: but this doesn't really make sense does it?
+## Shouldn't the thing passed to cache always be a hub object???
+
+## So shouldn't it really be more like this?
+cache(ah[3])
+
+
+## vs this which seems to work without a hitch:
+baz <- ah[[5]]
+
+## use query to subset 
+ahs <- query(ah, 'inparanoid8')     ## This is fine! (arguments were flipped)
+# trace("query", browser, signature ="AnnotationHub")
+foo <- ahs[[1]] 
+
+## use subset in the traditional way
+ahsub <- subset(ah, ah$genome=='ailMel1')
+
+
+## ADD:
+## display
+## possibleDates -- compute on client SELECT DISTINCT rdatadateadded FROM...
+
+## snapshotDate (setter and getter)
+
+## snapshotVersion -- i.e. BiocInstaller::biocVersion()
+
+## and hubUrl (https://annotationhub.bioconductor.org/)
+
+
+
+########################
+## MORE things to test:
+## 
+## Tests for different file types to each come down and be A-OK
+## resource ID for:
+## VCF: 7220                
+foo = ah[['7220']]
+
+## GRanges: 522             
+foo = ah[['522']]
+
+## data.frame: 7831         
+foo = ah[['7831']]
+
+## SQLite: 10366            
+foo = ah[['10366']]
+
+## fasta file: 3            ## This example is HUGE (need smaller one)
+foo = ah[['3']]
+
+
+#################################################
+## Test for verifying that my change works.
+## 
+## Testing 1st requires that I can do this (because the recipe is old)
+## select id,rdatapath,resource_id from rdatapaths where resource_id ='3';
+## (and get two records)
+##
+## To use gamay do this:
+## options(ANNOTATION_HUB_URL="http://gamay:9393")
+library(AnnotationHub); 
+ah = AnnotationHub(); 
+#trace(".get1", browser, signature = "FaFileResource")
+#debug(AnnotationHub:::.AnnotationHub_get1)
+##trace("show", browser, signature = "AnnotationHub")
+fa = ah[['3']]
+
+## test the 'repaired' ensembl 75 fasta files.
+fa = ahs[['10725']]
+
+## Now I just need to change it so that it uses the ah_id from the resources 
+## table (instead of the actual 'id')
+
+##################################################
+## code to hack in a value for the fasta file for ah[['3']]
+## BASICALLY you have to add a row to rdatapaths such that 
+## something like this:
+## SELECT * FROM rdatapaths WHERE resource_id  = '3' limit 4;
+## will return more than one row
+## This appears to be on default prod now for at least one resource. 
+## BUT: what is the file that is called 15107 in my cache?  How are these named?
+## Is it the id from rdatapaths?  Or the resource_id?  
+## And shouldn't it be the AHID from resources?  
+## OR perhaps should the AHID be associated with the value from rdatapaths?
+##################################################
+## The answer is that it's the 'id' from rdatapaths that you want to use for 
+## cached files (and that the methods should access).  And for the UI,
+## the user should type in the 'AHID' names (and names should be modified to 
+## use this)
+
+##################################################
+## The SQL to fix existing records needs to just add one row to rdatapaths for 
+## each .fa file in the resources table
+## Subquery:
+## SELECT rdatapath || '.fai', rdataclass, resource_id FROM rdatapaths 
+## WHERE rdataclass='FaFile';
+## 
+## 1st drop the 'extra' one
+## DELETE FROM rdatapaths where id = 15107;
+##
+## Full query:
+## INSERT INTO rdatapaths (rdatapath, rdataclass, resource_id)  
+## SELECT rdatapath || '.fai', rdataclass, resource_id FROM rdatapaths 
+## WHERE rdataclass='FaFile';
+
+## But for MySQL the subquery will probably look more like this:
+## SELECT concat(rdatapath, '.fai'), rdataclass, resource_id FROM rdatapaths 
+## WHERE rdataclass='FaFile';
+
+## Full MySQL query:
+## INSERT INTO rdatapaths (rdatapath, rdataclass, resource_id)  
+## SELECT concat(rdatapath, '.fai'), rdataclass, resource_id FROM rdatapaths 
+## WHERE rdataclass='FaFile';
+
+
+
+
+##################################################
+## SQL Code to correct the missing version bump
+## SELECT DISTINCT '3.1', resource_id FROM biocversions limit 4;
+#########
+## INSERT INTO biocversions (biocversion, resource_id) SELECT 
+## DISTINCT '3.1', resource_id FROM biocversions;
+
+
+
+#############################################################
+## Instructions for connecting to the back end DB directly:
+## on gamay just connect to the DB like this:
+## mysql -p -u ahuser annotationhub
+## 
+##
+## And for production just log in FIRST
+## ssh ubuntu@annotationhub.bioconductor.org # it has your key
+## 
+## Then back up the DB like so:
+## mysqldump -p -u ahuser annotationhub | gzip > dbdump_201410211449.sql.gz
+## 
+## And proceed as above.
+## 
+
+
+
+
+
+#############################################################
+## For browsing around the data resources:
+
+
+
+
+
+#################################################################
+## Switching from numbers to AH IDs.
+## show method fails because subsetting is now wonky
+## debug(AnnotationHub:::.show)
+## subsetting probably expects that the numbers in names() can be 
+## used as indices somehow...
+
+## trace("[", browser, signature = c("AnnotationHub", "numeric", "missing"))
+## ah[2]
+
+library(AnnotationHub)
+ah <- AnnotationHub()
+#trace("[", browser, signature = c("AnnotationHub", "character", "missing"))
+ah['AH3']
+#trace("[", browser, signature = c("AnnotationHub", "numeric", "missing"))
+trace("[<-", browser, signature = c("AnnotationHub", "numeric", "missing", "AnnotationHub"))
+ah[2]
+
+## Problem with show method for single bracket subsets...
+## Happens with single items and in the case below where (one item is dropped)
+## debug(AnnotationHub:::.show)
+ash = ah[1:3]
+ash
+
+
+## This works now
+trace("[[", browser, signature = c("AnnotationHub", "character", "missing"))
+debug(AnnotationHub:::.AnnotationHub_get1)
+ah[['AH3']]
+
+## TODO: this doesn't work (yet) - low priority though
+ah[[c('AH3','AH4')]]
+
+## is the same as:
+ah[[2]]
+
+
+##########################################################
+## Issue with chain files.
+## trace(AnnotationHub:::.get1, tracer=browser, signature ="ChainFileResource")
+library(AnnotationHub)
+ah <- AnnotationHub()
+ahs <- subset(ah, ah$genome=='hg38')
+ahs <- ahs[1]
+ahs[[1]]
+
+## URL generated: 
+chain <- "/home/mcarlson/.AnnotationHub/18058"
+## trace(AnnotationHub:::.get1, tracer=browser, signature ="ChainFileResource")
+## Things I tried:
+## rtracklayer::import.chain(chain)
+## rtracklayer::import.chain(path=chain, format='chain')
+
diff --git a/inst/unitTests/test_AnnotationHub-class.R b/inst/unitTests/test_AnnotationHub-class.R
new file mode 100644
index 0000000..7f0df54
--- /dev/null
+++ b/inst/unitTests/test_AnnotationHub-class.R
@@ -0,0 +1,31 @@
+test_open <- function() {
+    ## valid connection, including on second try
+    checkTrue(RSQLite::dbIsValid(dbconn(AnnotationHub())))
+    checkTrue(RSQLite::dbIsValid(dbconn(AnnotationHub())))
+}
+
+test_query <- function() {
+    ah = AnnotationHub()
+    q1 <- query(ah, c("GTF", "Ensembl", "Homo sapiens"))
+    checkTrue("AH7558" %in% names(q1))
+    nm <- c("title", "dataprovider", "species", "taxonomyid", "genome", 
+        "description", "tags", "rdataclass", "sourceurl", "sourcetype")
+    checkTrue(all(nm %in% names(mcols(q1))))
+}
+
+test_NA_subscript <- function() {
+    ah <- AnnotationHub()
+    checkIdentical(0L, length(ah[NA]))
+    checkIdentical(0L, length(ah[NA_character_]))
+    checkIdentical(0L, length(ah[NA_integer_]))
+}
+
+test_as.list_and_c <- function() {
+    ah <- AnnotationHub()
+    cc <- selectMethod("c", "AnnotationHub")
+    checkIdentical(ah[1:5], do.call("cc", as.list(ah[1:5])))
+    checkIdentical(ah[1:5], c(ah[1:2], ah[3:5]))
+    checkIdentical(ah[FALSE], c(ah[FALSE], ah[FALSE]))
+    checkIdentical(ah[1:5], c(ah[FALSE], ah[1:5]))
+    checkIdentical(ah[1:5], c(ah[1:4], ah[3:5])) # unique() ids
+}
diff --git a/inst/unitTests/test_cache.R b/inst/unitTests/test_cache.R
new file mode 100644
index 0000000..9faa99c
--- /dev/null
+++ b/inst/unitTests/test_cache.R
@@ -0,0 +1,17 @@
+test_cache_datapathIds <- function() {
+    ## map hub identifiers AH123 to cached identifier(s)
+    hub <- AnnotationHub()
+
+    ## 1:1 mapping
+    result <- AnnotationHub:::.datapathIds(hub["AH28854"])
+    checkIdentical(result, structure(34294L, .Names = "AH28854"))
+
+    ## 1:several mapping
+    result <- AnnotationHub:::.datapathIds(hub["AH169"])
+    checkIdentical(result,
+                   structure(c(169L, 14130L), .Names = c("AH169", "AH169")))
+
+    ## unknown identifier
+    result <- AnnotationHub:::.datapathIds(hub["AH0"])
+    checkIdentical(result, setNames(integer(), character()))
+}
diff --git a/inst/unitTests/test_tidyGRanges.R b/inst/unitTests/test_tidyGRanges.R
new file mode 100644
index 0000000..e372f08
--- /dev/null
+++ b/inst/unitTests/test_tidyGRanges.R
@@ -0,0 +1,32 @@
+test_tidyGRanges <- function() {
+    # Case 1 - GRanges does not have any seqinfo
+    gr <- GenomicRanges::GRanges(paste0("chr", c(1, 10, "M", 2, 3)),
+                                 IRanges::IRanges(1, 1))
+    gr1 <- AnnotationHub:::.tidyGRanges(gr=gr, metadata=FALSE, genome="hg19")
+
+    chr <- paste0("chr", c(1, 2, 3, 10, "M"))
+    checkIdentical(chr, GenomeInfoDb::seqlevels(gr1))
+    checkIdentical(setNames(rep("hg19", 5), chr), GenomeInfoDb::genome(gr1))
+    checkIdentical(setNames(rep(c(FALSE, TRUE), c(4, 1)), chr),
+                   GenomeInfoDb::isCircular(gr1))
+
+    # Case 2 - genome not supported by GenomeInfoDb::Seqinfo
+    gr2 <- AnnotationHub:::.tidyGRanges(gr=gr, metadata=FALSE, genome="calJac3")
+    checkIdentical(setNames(rep("calJac3", 5), chr), GenomeInfoDb::genome(gr2))
+    checkIdentical(setNames(rep(c(FALSE, TRUE), c(4, 1)), chr),
+               GenomeInfoDb::isCircular(gr2))
+
+
+    # Case 3 - GRanges has incorrect/missing seqinfo
+    GenomeInfoDb::seqlengths(gr) <- c(1,2,3,4,5)
+    GenomeInfoDb::isCircular(gr) <- rep(FALSE,5)
+    GenomeInfoDb::genome(gr) <- "hg19"
+    gr1 <- AnnotationHub:::.tidyGRanges(gr=gr, metadata=FALSE, genome="hg19")
+
+    chr <- paste0("chr", c(1, 2, 3, 10, "M"))
+    checkIdentical(chr, GenomeInfoDb::seqlevels(gr1))
+    checkIdentical(setNames(rep("hg19", 5), chr), GenomeInfoDb::genome(gr1))
+    checkIdentical(setNames(rep(c(FALSE, TRUE), c(4, 1)), chr),
+               GenomeInfoDb::isCircular(gr1))
+ 
+} 
diff --git a/inst/unitTests/test_utilities.R b/inst/unitTests/test_utilities.R
new file mode 100644
index 0000000..e225713
--- /dev/null
+++ b/inst/unitTests/test_utilities.R
@@ -0,0 +1,3 @@
+test_require <- function() {
+    checkException(AnnotationHub:::.require("xxx_foo"))
+}
diff --git a/man/AnnotationHub-class.Rd b/man/AnnotationHub-class.Rd
new file mode 100644
index 0000000..0b63141
--- /dev/null
+++ b/man/AnnotationHub-class.Rd
@@ -0,0 +1,402 @@
+\name{AnnotationHub-objects}
+\docType{class}
+
+% Classes
+\alias{class:AnnotationHub}
+\alias{AnnotationHub-class}
+\alias{class:Hub}
+\alias{Hub-class}
+
+% Constructor
+\alias{.Hub}
+\alias{AnnotationHub}
+
+% Accessor-like methods
+\alias{mcols,Hub-method} 
+
+\alias{cache}
+\alias{cache,Hub-method}
+\alias{cache,AnnotationHub-method}
+\alias{cache<-}
+\alias{cache<-,Hub-method}
+
+\alias{hubUrl} 
+\alias{hubUrl,Hub-method} 
+\alias{hubCache}
+\alias{hubCache,Hub-method}
+\alias{hubDate}
+\alias{hubDate,Hub-method}
+\alias{package}
+\alias{package,Hub-method}
+\alias{removeCache}
+
+\alias{possibleDates}
+\alias{snapshotDate}
+\alias{snapshotDate,Hub-method}
+\alias{snapshotDate<-}
+\alias{snapshotDate<-,Hub-method}
+
+\alias{dbconn,Hub-method}
+\alias{dbfile,Hub-method}
+\alias{.db_close}
+\alias{recordStatus}
+\alias{recordStatus,Hub-method}
+
+% List-like
+\alias{length,Hub-method}
+\alias{names,Hub-method}
+\alias{fileName,Hub-method}
+
+% Subsetting:
+\alias{$,Hub-method}
+
+\alias{[[,Hub,character,missing-method}
+\alias{[[,Hub,numeric,missing-method}
+
+\alias{[,Hub,character,missing-method} 
+\alias{[,Hub,logical,missing-method} 
+\alias{[,Hub,numeric,missing-method} 
+
+\alias{[<-,Hub,character,missing,Hub-method} 
+\alias{[<-,Hub,logical,missing,Hub-method} 
+\alias{[<-,Hub,numeric,missing,Hub-method} 
+
+\alias{subset,Hub-method}
+
+\alias{query}
+\alias{query,Hub-method}
+
+\alias{display}
+\alias{display,Hub-method}
+
+% as.list / c
+\alias{as.list.Hub}
+\alias{as.list,Hub-method}
+\alias{c,Hub-method}
+
+% show method:
+\alias{show,Hub-method}
+\alias{show,AnnotationHub-method}
+\alias{show,AnnotationHubResource-method}
+
+
+\title{AnnotationHub objects and their related methods and functions}
+
+\description{
+  Use \code{AnnotationHub} to interact with Bioconductor's AnnotationHub
+  service.  Query the instance to discover and use resources that are of
+  interest, and then easily download and import the resource into R for
+  immediate use.
+
+  Use \code{AnnotationHub()} to retrieve information about all records
+  in the hub.
+
+  Discover records in a hub using \code{mcols()}, \code{query()},
+  \code{subset()}, \code{[}, and \code{display()}.
+
+  Retrieve individual records using \code{[[}. On first use of a
+  resource, the corresponding files or other hub resources are
+  downloaded from the internet to a local cache. On this and all
+  subsequent uses the files are quickly input from the cache into the R
+  session.
+
+  \code{AnnotationHub} records can be added (and sometimes removed) at
+  any time. \code{snapshotDate()} restricts hub records to those
+  available at the time of the snapshot. \code{possibleDates()} lists
+  snapshot dates valid for the current version of Bioconductor.
+
+  The location of the local cache can be found (and updated) with
+  \code{getAnnotationHubCache} and \code{setAnnotationHubCache}; 
+  \code{removeCache} removes all cache resources.
+}
+
+\section{Constructors}{
+  \describe{
+    \item{}{
+      \code{AnnotationHub(..., hub=getAnnotationHubOption("URL"), 
+        cache=getAnnotationHubOption("CACHE"),
+        proxy=getAnnotationHubOption("PROXY"))}:
+
+      Create an \code{AnnotationHub} instance, possibly updating the
+      current database of records.
+    }
+  }
+}
+
+\section{Accessors}{
+  In the code snippets below, \code{x} and \code{object} are
+  AnnotationHub objects.
+
+  \describe{
+    \item{}{
+      \code{hubCache(x)}:
+      Gets the file system location of the local AnnotationHub cache.
+    }
+    \item{}{
+      \code{hubUrl(x)}:
+      Gets the URL for the online hub.
+    }
+    \item{}{
+      \code{length(x)}:
+      Get the number of hub records.
+    }
+    \item{}{
+      \code{names(x)}:
+      Get the names (AnnotationHub unique identifiers, of the form
+      AH12345) of the hub records.
+    }
+    \item{}{
+      \code{fileName(x)}:
+      Get the file path of the hub records as stored in the local cache
+      (AnnotationHub files are stored as unique numbers, of the form
+      12345).  NA is returned for those records which have not been
+      cached.
+    }
+    \item{}{
+      \code{mcols(x)}:
+      Get the metadata columns describing each record. Columns include:
+      \describe{
+
+        \item{title}{Record title, frequently the file name of the
+          object.}
+
+        \item{dataprovider}{Original provider of the resource, e.g.,
+          Ensembl, UCSC.}
+
+        \item{species}{The species for which the record is most
+          relevant, e.g., \sQuote{Homo sapiens}.}
+
+        \item{taxonomyid}{NCBI taxonomy identifier of the species.}
+
+        \item{genome}{Genome build relevant to the record, e.g., hg19.}
+
+        \item{description}{Textual description of the resource,
+          frequently automatically generated from file path and other
+          information available when the record was created.}
+
+        \item{tags}{Single words added to the record to facilitate
+          identification, e.g., TCGA, Roadmap.}
+
+        \item{rdataclass}{The class of the R object used to represent
+          the object when imported into R, e.g., \code{GRanges},
+          \code{VCFFile}.}
+
+        \item{sourceurl}{Original URL of the resource.}
+
+        \item{sourcetype}{Format of the original resource, e.g., BED
+          file.}
+      }
+    }
+
+    \item{}{
+      \code{dbconn(x)}:
+      Return an open connection to the underlying SQLite database.}
+
+    \item{}{
+      \code{dbfile(x)}:
+      Return the full path to the underlying SQLite database.}
+
+    \item{}{
+      \code{.db_close(conn)}:
+      Close the SQLite connection \code{conn} returned by \code{dbconn(x)}.}
+
+  }
+}
+
+\section{Subsetting and related operations}{
+  In the code snippets below, \code{x} is an AnnotationHub object.
+
+  \describe{
+    \item{}{
+      \code{x$name}:
+      Convenient reference to individual metadata columns, e.g.,
+      \code{x$species}.
+    }
+    \item{}{
+      \code{x[i]}:
+      Numerical, logical, or character vector (of AnnotationHub names)
+      to subset the hub, e.g., \code{x[x$species == "Homo sapiens"]}.
+    }
+    \item{}{
+      \code{x[[i]]}:
+      Numerical or character scalar to retrieve (if necessary) and
+      import the resource into R.
+    }
+    \item{}{
+      \code{query(x, pattern, ignore.case=TRUE, pattern.op= `&`)}:
+      Return an AnnotationHub subset containing only those elements
+      whose metadata matches \code{pattern}. Matching uses
+      \code{pattern} as in \code{\link{grepl}} to search the
+      \code{as.character} representation of each column; matches for
+      each element of \code{pattern} are combined with \code{pattern.op}.
+      e.g., \code{query(x, c("Homo sapiens", "hg19", "GTF"))}.
+      \describe{
+        \item{\code{pattern}}{A character vector of patterns to search
+          (via \code{grepl}) for in any of the \code{mcols()} columns.}
+        \item{\code{ignore.case}}{A logical(1) vector indicating whether
+          the search should ignore case (TRUE) or not (FALSE).}
+        \item{\code{pattern.op}}{Any function of two arguments,
+          describing how matches across pattern elements are to be
+          combined. The default \code{`&`} requires that only records
+          with \emph{all} elements of \code{pattern} in their metadata
+          columns are returned.}
+      }
+    }
+    \item{}{
+      \code{subset(x, subset)}:
+      Return the subset of records containing only those elements whose
+      metadata satisfies the \emph{expression} in \code{subset}. The
+      expression can reference columns of \code{mcols(x)}, and should
+      return a logical vector of length \code{length(x)}.
+      e.g., \code{subset(x, species == "Homo sapiens" &
+        genome == "GRCh38")}.
+    }
+    \item{}{
+      \code{display(object)}:
+      Open a web browser allowing for easy selection of hub records via
+      interactive tabular display. Return value is the subset of hub
+      records identified while navigating the display.
+    }
+    \item{}{
+      \code{recordStatus(hub, record)}:
+      Returns a \code{data.frame} of the record id and status. \code{hub} must 
+      be a \code{Hub} object and \code{record} must be a \code{character(1)}.
+      Can be used to discover why a resource was removed from the hub.
+    }
+  }
+}
+
+\section{Cache and hub management}{
+  In the code snippets below, \code{x} is an AnnotationHub object.
+  \describe{
+    \item{}{
+      \code{snapshotDate(x)} and \code{snapshotDate(x) <- value}:
+      Gets or sets the date for the snapshot in use. \code{value} should
+      be one of \code{possibleDates()}.
+    }
+    \item{}{
+      \code{possibleDates(x)}:
+      Lists the valid snapshot dates for the version of Bioconductor that
+      is being run (e.g., BiocInstaller::biocVersion()).
+    }
+    \item{}{
+      \code{cache(x)} and \code{cache(x) <- NULL}: Adds (downloads) all
+      resources in \code{x}, or removes all local resources
+      corresponding to the records in \code{x} from the cache. In this case,
+      \code{x} would typically be a small subset of AnnotationHub resources.
+    } 
+    \item{}{
+      \code{hubUrl(x)}:
+      Gets the URL for the online AnnotationHub.
+    }
+    \item{}{
+      \code{hubCache(x)}:
+      Gets the file system location of the local AnnotationHub cache.
+    }
+    \item{}{
+      \code{removeCache(x)}:
+      Removes local AnnotationHub database and all related resources. After
+      calling this function, the user will have to download any AnnotationHub
+      resources again.
+    }
+    \item{}{
+      \code{getAnnotationHubOption()}:
+      TODO: Get cache options "CACHE", "URL", "MAX_DOWNLOADS" ...
+    }
+    \item{}{
+      \code{setAnnotationHubOption()}:
+      TODO: Set cache options "CACHE", "URL", "MAX_DOWNLOADS" ...
+    }
+  }
+}
+
+\section{Coercion}{
+  In the code snippets below, \code{x} is an AnnotationHub object.
+  \describe{
+    \item{}{
+      \code{as.list(x)}:
+      Coerce x to a list of hub instances, one entry per
+      element. Primarily for internal use.
+    }
+    \item{}{
+      \code{c(x, ...)}:
+      Concatenate one or more sub-hubs. Sub-hubs must reference the same
+      AnnotationHub instance. Duplicate entries are removed.
+    }
+  }
+}
+
+\author{Martin Morgan, Marc Carlson, Sonali Arora, and Dan Tenenbaum}
+
+\examples{
+  ## create an AnnotationHub object
+  library(AnnotationHub)
+  ah = AnnotationHub()
+
+  ## Summary of available records
+  ah
+
+  ## Detail for a single record
+  ah[1]
+
+  ## and what is the date we are using?
+  snapshotDate(ah)
+
+  ## how many resources?
+  length(ah)
+
+  ## from which data providers is data available?
+  head(sort(table(ah$dataprovider), decreasing=TRUE))
+
+  ## from which species is data available?
+  head(sort(table(ah$species), decreasing=TRUE))
+
+  ## what web service and local cache does this AnnotationHub point to?
+  hubUrl(ah)
+  hubCache(ah)
+
+  ### Examples ###
+
+  ## One can search the hub for multiple strings
+  ahs2 <- query(ah, c("GTF", "77","Ensembl", "Homo sapiens"))
+  
+  ## information about the file can be retrieved using 
+  ahs2[1]
+
+  ## one can further extract information from this show method
+  ## like the sourceurl using:
+  ahs2$sourceurl 
+  ahs2$description
+  ahs2$title
+
+  ## We can download a file by name like this (using a list semantic):
+  gr <- ahs2[[1]]
+  ## And we can also extract it by the names like this:
+  res <- ah[["AH28812"]]
+
+  ## the gtf file is returned as a GenomicRanges object and contains
+  ## data about which organism it belongs to, its seqlevels and seqlengths
+  seqinfo(gr) 
+
+  ## each GenomicRanges contains a metadata slot which can be used to get 
+  ## the name of the hub object and other associated metadata. 
+  metadata(gr) 
+  ah[metadata(gr)$AnnotationHubName]
+   
+  ## And we can also use "[" to restrict the things that are in the
+  ## AnnotationHub object (by position, character, or logical vector).
+  ## Here is a demo of position:
+  subHub <- ah[1:3]
+
+  if(interactive()) {
+    ## Display method involves user interaction through web interface
+    ah2 <- display(ah)
+  }
+
+  ## recordStatus
+  recordStatus(ah, "TEST")
+  recordStatus(ah, "AH7220")
+}
+
+\keyword{classes}
+\keyword{methods}
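
The `query()` semantics documented in man/AnnotationHub-class.Rd above (each pattern is searched with `grepl` across the metadata columns, and the per-pattern results are combined with `pattern.op`) can be illustrated with a minimal base-R sketch. This is not the package's implementation: `meta` is a made-up stand-in for `mcols(x)`, and `sketch_query()` is a hypothetical helper written only for this illustration.

```r
## Made-up metadata table standing in for mcols(x).
meta <- data.frame(
    title   = c("Homo_sapiens.GRCh37.75.gtf", "Mus_musculus.GRCm38.77.gtf",
                "hg19ToHg38.over.chain.gz"),
    species = c("Homo sapiens", "Mus musculus", "Homo sapiens"),
    genome  = c("GRCh37", "GRCm38", "hg19"),
    stringsAsFactors = FALSE
)

## sketch_query() is an illustrative helper, not part of AnnotationHub.
sketch_query <- function(tbl, pattern, ignore.case = TRUE, pattern.op = `&`) {
    per_pattern <- lapply(pattern, function(pat) {
        ## TRUE for a row when any column matches this pattern
        hits <- lapply(tbl, function(col)
            grepl(pat, as.character(col), ignore.case = ignore.case))
        Reduce(`|`, hits)
    })
    ## combine the per-pattern logical vectors (default: all must match)
    Reduce(pattern.op, per_pattern)
}

both   <- sketch_query(meta, c("homo sapiens", "gtf"))
either <- sketch_query(meta, c("homo sapiens", "gtf"), pattern.op = `|`)
meta[both, "title"]
```

With the default `&`, only rows matching every pattern are kept; passing `|` as `pattern.op` keeps rows matching at least one pattern.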
diff --git a/man/AnnotationHub-package.Rd b/man/AnnotationHub-package.Rd
new file mode 100644
index 0000000..cd725ce
--- /dev/null
+++ b/man/AnnotationHub-package.Rd
@@ -0,0 +1,14 @@
+\name{AnnotationHub-package}
+\alias{AnnotationHub-package}
+\docType{package}
+\title{Light-weight AnnotationHub 3.0 Client}
+\description{
+  Client to Bioconductor AnnotationHub 3.0 for discovery and retrieval
+  of annotation resources.
+}
+
+\author{Martin Morgan \url{mtmorgan at fhcrc.org}}
+\keyword{ package }
+\examples{
+packageDescription("AnnotationHub")
+}
diff --git a/man/AnnotationHubResource-class.Rd b/man/AnnotationHubResource-class.Rd
new file mode 100644
index 0000000..a2ca0c1
--- /dev/null
+++ b/man/AnnotationHubResource-class.Rd
@@ -0,0 +1,52 @@
+\name{AnnotationHubResource-objects}
+\docType{class}
+
+% Classes
+\alias{class:AnnotationHubResource}
+\alias{AnnotationHubResource-class}
+
+% Constructor
+
+% Accessor-like methods
+\alias{getHub} 
+\alias{getHub,AnnotationHubResource-method} 
+\alias{hubCache,AnnotationHubResource-method} 
+\alias{hubUrl,AnnotationHubResource-method} 
+
+
+\title{AnnotationHubResource objects and their related methods and functions}
+
+\description{
+    TODO
+}
+
+\section{Accessors}{
+  In the code snippets below, \code{x} and \code{object} are
+  AnnotationHubResource objects.
+
+  \describe{
+    \item{}{
+      \code{getHub(x)}:
+      Gets the AnnotationHub from the hub slot.
+    }
+    \item{}{
+      \code{hubCache(x)}:
+      Gets the location of the AnnotationHub cache from the AnnotationHub
+      object in the hub slot.
+    }
+    \item{}{
+      \code{hubUrl(x)}:
+      Gets the location of the AnnotationHub production server from the 
+      AnnotationHub object in the hub slot.
+    }
+  }
+}
+
+\author{Bioconductor Core Team}
+
+\examples{
+    ## TODO
+}
+
+\keyword{classes}
+\keyword{methods}
diff --git a/man/getAnnotationHubOption.Rd b/man/getAnnotationHubOption.Rd
new file mode 100644
index 0000000..2ed0611
--- /dev/null
+++ b/man/getAnnotationHubOption.Rd
@@ -0,0 +1,72 @@
+\name{getAnnotationHubOption}
+
+\alias{getAnnotationHubOption}
+\alias{setAnnotationHubOption}
+
+\title{Get and set options for default AnnotationHub behavior.}
+
+\description{
+  These functions get or set options for creation of new
+  \sQuote{AnnotationHub} instances.
+}
+
+\usage{
+getAnnotationHubOption(arg)
+setAnnotationHubOption(arg, value)
+}
+
+\arguments{
+  \item{arg}{The character(1) hub option to get or set. See \sQuote{Details}
+    for current options.}
+  \item{value}{The value to be assigned to the hub option.}
+}
+
+\details{
+  Supported options include:
+
+  \describe{
+
+    \item{\dQuote{URL}:}{character(1). The base URL of the annotation
+      hub. Default: \url{https://annotationhub.bioconductor.org}}
+
+    \item{\dQuote{CACHE}:}{character(1). The location of the hub
+      cache. Default: \dQuote{.AnnotationHub} in the user home
+      directory.}
+
+    \item{\dQuote{MAX_DOWNLOADS}:}{numeric(1). The integer number of
+      downloads allowed before triggering an error. This is to help
+      avoid accidental download of a large number of AnnotationHub
+      members.}
+
+    \item{\dQuote{PROXY}:}{\code{request} object returned by
+      \code{httr::use_proxy()}. The \code{request} object describes a proxy
+      connection allowing Internet access, usually through a restrictive
+      firewall. Setting this option sends all AnnotationHub requests through
+      the proxy. Default: NULL.
+
+      In \code{setAnnotationHubOption("PROXY", value)}, \code{value} can be one of NULL,
+      a \code{request} object returned by \code{httr::use_proxy()}, or a
+      well-formed URL as character(1). The URL can be completely
+      specified by \code{http://username:password@proxy.dom.com:8080};
+      \code{username:password} and port (e.g. \code{:8080}) are optional.}
+  }
+
+  Default values may also be determined by system and global R
+  environment variables visible \emph{before} the package is loaded. Use
+  options or variables preceded by \dQuote{ANNOTATION_HUB_}, e.g.,
+  \code{options(ANNOTATION_HUB_MAX_DOWNLOADS=10)} prior to package load
+  sets the default number of downloads to 10. 
+
+}
+
+\value{The requested or successfully set option.}
+
+\author{Martin Morgan \url{mtmorgan at fhcrc.org}}
+
+\examples{
+getAnnotationHubOption("URL")
+\dontrun{
+setAnnotationHubOption("CACHE", "~/.myHub")
+}
+}
+\keyword{ manip }
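
The paragraph in man/getAnnotationHubOption.Rd above about defaults taken from `ANNOTATION_HUB_`-prefixed options or environment variables can be sketched in base R as follows. This is a minimal illustration under stated assumptions, not the package's code: `resolve_default()` is a hypothetical helper, the precedence shown (R options first, then environment variables) is assumed, and the fallback values are placeholders.

```r
## resolve_default() is a hypothetical helper, not an AnnotationHub function.
## Assumed precedence: options() first, then the environment, then the
## supplied fallback.
resolve_default <- function(name, default) {
    key <- paste0("ANNOTATION_HUB_", name)
    opt <- getOption(key)
    if (!is.null(opt))
        return(opt)
    env <- Sys.getenv(key, unset = NA_character_)
    if (!is.na(env))
        return(env)
    default
}

## Set before the package would be loaded, as the help page describes.
options(ANNOTATION_HUB_MAX_DOWNLOADS = 10)
max_downloads <- resolve_default("MAX_DOWNLOADS", 5)

## A name that is (presumably) set nowhere falls back to the default.
cache_dir <- resolve_default("SKETCH_ONLY_UNSET", "~/.AnnotationHub")
```

Note that a value read from the environment arrives as a character string, so a real implementation would need to coerce it to the type each option expects.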
diff --git a/man/listResources.Rd b/man/listResources.Rd
new file mode 100644
index 0000000..20a40ed
--- /dev/null
+++ b/man/listResources.Rd
@@ -0,0 +1,71 @@
+\name{utilities}
+\alias{utilities}
+
+\alias{listResources}
+\alias{listResources,AnnotationHub-method}
+\alias{loadResources}
+\alias{loadResources,AnnotationHub-method}
+
+
+\title{
+  Utility functions for discovering package-specific Hub resources.
+}
+
+\description{
+  List and load resources from ExperimentHub filtered by package
+  name and optional search terms.
+}
+
+\details{
+  Currently \code{listResources} and \code{loadResources} are only meaningful
+  for \code{ExperimentHub} objects. Methods for \code{AnnotationHub} 
+  objects may be added in the future.
+}
+
+\usage{
+listResources(hub, package, filterBy = character())
+loadResources(hub, package, filterBy = character())
+}
+
+\arguments{
+  \item{hub}{
+    A \code{Hub} object; currently only meaningful for \code{ExperimentHub}.
+  }
+  \item{package}{
+    A \code{character(1)} name of a package with resources hosted in the Hub.
+  }
+  \item{filterBy}{
+    A \code{character()} vector of search terms for additional filtering. 
+    Can be any terms found in the metadata (mcols()) of the resources. 
+    When not provided, there is no additional filtering and all resources 
+    associated with the given package are returned.
+  }
+}
+
+\value{
+  \code{listResources} returns a character vector; 
+  \code{loadResources} returns a list of data objects.
+}
+
+\seealso{
+}
+
+\examples{
+\dontrun{
+## Packages with resources hosted in ExperimentHub:
+require(ExperimentHub)
+eh <- ExperimentHub()
+unique(package(eh))
+
+## All resources associated with the 'GSE62944' package:
+listResources(eh, "GSE62944")
+
+## Resources associated with the 'curatedMetagenomicData' package
+## filtered by 'plaque.abundance':
+listResources(eh, "curatedMetagenomicData", "plaque.abundance")
+
+## 'loadResources()' returns a list of the data objects:
+loadResources(eh, "curatedMetagenomicData", "plaque.abundance")
+}
+}
+\keyword{utilities}
diff --git a/tests/runTests.R b/tests/runTests.R
new file mode 100644
index 0000000..fc4ca6b
--- /dev/null
+++ b/tests/runTests.R
@@ -0,0 +1 @@
+BiocGenerics:::testPackage("AnnotationHub")
diff --git a/vignettes/AnnotationHub-HOWTO.Rmd b/vignettes/AnnotationHub-HOWTO.Rmd
new file mode 100644
index 0000000..953199a
--- /dev/null
+++ b/vignettes/AnnotationHub-HOWTO.Rmd
@@ -0,0 +1,367 @@
+---
+title: "AnnotationHub How-To's"
+output:
+  BiocStyle::html_document:
+    toc: true
+vignette: >
+  % \VignetteIndexEntry{AnnotationHub: AnnotationHub HOW TO's}
+  % \VignetteDepends{AnnotationHub, GenomicFeatures, Rsamtools}
+  % \VignetteEngine{knitr::rmarkdown}
+---
+
+```{r style, echo = FALSE, results = 'asis', warning=FALSE}
+options(width=100)
+suppressPackageStartupMessages({
+    ## load here to avoid noise in the body of the vignette
+    library(AnnotationHub)
+    library(GenomicFeatures)
+    library(Rsamtools)
+    library(VariantAnnotation)
+})
+BiocStyle::markdown()
+```
+
+**Package**: `r Biocpkg("AnnotationHub")`<br />
+**Authors**: `r packageDescription("AnnotationHub")[["Author"]] `<br />
+**Modified**: Sun Jun 28 10:41:23 2015<br />
+**Compiled**: `r date()`
+
+
+# Accessing Genome-Scale Data
+
+## Non-model organism gene annotations
+
+_Bioconductor_ offers pre-built `org.*` annotation packages for model
+organisms, with their use described in the
+[OrgDb](http://bioconductor.org/help/workflows/annotation/Annotation_Resources/#OrgDb)
+section of the Annotation work flow. Here we discover available `OrgDb`
+objects for less-model organisms
+
+```{r less-model-org}
+library(AnnotationHub)
+ah <- AnnotationHub()
+query(ah, "OrgDb")
+orgdb <- query(ah, "OrgDb")[[1]] 
+```
+
+The object returned by AnnotationHub is directly usable with the
+`select()` interface, e.g., to discover the available keytypes for
+querying the object, the columns that these keytypes can map to, and
+finally selecting the SYMBOL and GENENAME corresponding to the first 6
+ENTREZIDs
+
+```{r less-model-org-select}
+keytypes(orgdb)
+columns(orgdb)
+egid <- head(keys(orgdb, "ENTREZID"))
+select(orgdb, egid, c("SYMBOL", "GENENAME"), "ENTREZID")
+```
+
+## Roadmap Epigenomics Project 
+
+All Roadmap Epigenomics files are hosted
+[here](http://egg2.wustl.edu/roadmap/data/byFileType/). If one had to
+download these files on their own, one would navigate through the web
+interface to find useful files, then use something like the following
+_R_ code.
+
+```{r, eval=FALSE}
+url <- "http://egg2.wustl.edu/roadmap/data/byFileType/peaks/consolidated/broadPeak/E001-H3K4me1.broadPeak.gz"
+filename <-  basename(url)
+download.file(url, destfile=filename)
+if (file.exists(filename))
+   data <- import(filename, format="bed")
+```
+This would have to be repeated for all files, and the onus would lie
+on the user to identify, download, import, and manage the local disk
+location of these files.
+
+`r Biocpkg("AnnotationHub")` reduces this task to just a few lines of _R_ code 
+```{r results='hide'}
+library(AnnotationHub)
+ah = AnnotationHub()
+epiFiles <- query(ah, "EpigenomeRoadMap")
+```
+A look at the value returned by `epiFiles` shows us that 
+`r length(epiFiles)` roadmap resources are available via 
+`r Biocpkg("AnnotationHub")`.  Additional information about 
+the files is also available, e.g., where the files came from
+(dataprovider), genome, species, sourceurl, sourcetypes.
+
+```{r}
+epiFiles
+```
+
+A good sanity check to ensure that we have files only from the Roadmap Epigenomics
+project is to check that all the files in the returned smaller hub object
+come from _Homo sapiens_ and the `r unique(epiFiles$genome)` genome 
+```{r}
+unique(epiFiles$species)
+unique(epiFiles$genome)
+```
+Broadly, one can get an idea of the different files from this project 
+looking at the sourcetype
+```{r}
+table(epiFiles$sourcetype)
+```
+To get a more descriptive idea of these different files one can use:
+```{r}
+sort(table(epiFiles$description), decreasing=TRUE)
+```
+
+The 'metadata' provided by the Roadmap Epigenomics Project is also
+available. Note that the information displayed about a hub with a
+single resource is quite different from the information displayed when
+the hub references more than one resource.
+```{r}
+metadata.tab <- query(ah , c("EpigenomeRoadMap", "Metadata"))
+metadata.tab
+```
+
+So far we have been exploring information about resources, without
+downloading the resource to a local cache and importing it into R.
+One can retrieve the resource using `[[` as indicated at the
+end of the show method
+
+```{r echo=FALSE, results='hide'}
+metadata.tab <- ah[["AH41830"]]
+```
+```{r}
+metadata.tab <- ah[["AH41830"]]
+```
+
+The metadata.tab file is returned as a _data.frame_. The first 6 rows
+of the first 5 columns are shown here:
+
+```{r}
+metadata.tab[1:6, 1:5]
+```
+
+One can keep constructing different queries using multiple arguments to 
+trim down these `r length(epiFiles)` to get the files one wants. 
+For example, to get the ChIP-Seq files for consolidated epigenomes, 
+one could use
+```{r}
+bpChipEpi <- query(ah , c("EpigenomeRoadMap", "broadPeak", "chip", "consolidated"))
+```
+To get all the bigWig signal files, one can query the hub using 
+```{r}
+allBigWigFiles <- query(ah, c("EpigenomeRoadMap", "BigWig"))
+```
+To access the 15 state chromatin segmentations, one can use
+```{r}
+seg <- query(ah, c("EpigenomeRoadMap", "segmentations"))
+```
+If one is interested in getting all the files related to one sample
+```{r}
+E126 <- query(ah , c("EpigenomeRoadMap", "E126", "H3K4ME2"))
+E126
+```
+Hub resources can also be selected using `$`, `subset()`, and
+`display()`; see the main
+[_AnnotationHub_ vignette](AnnotationHub.html) for additional detail.
+
+Hub resources are imported as the appropriate _Bioconductor_ object
+for use in further analysis.  For example, peak files are returned as
+_GRanges_ objects.
+
+```{r echo=FALSE, results='hide'}
+peaks <- E126[['AH29817']]
+```
+```{r}
+peaks <- E126[['AH29817']]
+seqinfo(peaks)
+```
+
+BigWig files are returned as _BigWigFile_ objects. A _BigWigFile_ is a
+reference to a file on disk; the data in the file can be read in using
+`rtracklayer::import()`, perhaps querying these large files for
+particular genomic regions of interest as described on the help page
+`?import.bw`.
+
+Each record inside `r Biocpkg("AnnotationHub")` is associated with a
+unique identifier. Most _GRanges_ objects returned by 
+`r Biocpkg("AnnotationHub")` contain the unique AnnotationHub identifier of
+the resource from which the _GRanges_ is derived.  This can come in
+handy when one has been working with a _GRanges_ object for a while
+and needs additional information about it (e.g., the name of the file
+in the cache, or the original sourceurl for the data underlying the
+resource).
+
+```{r}
+metadata(peaks)
+ah[metadata(peaks)$AnnotationHubName]$sourceurl
+```
+
+## Ensembl GTF and FASTA files for TxDb gene models and sequence queries
+
+_Bioconductor_ represents gene models using 'transcript'
+databases. These are available via packages such as
+`r Biocannopkg("TxDb.Hsapiens.UCSC.hg38.knownGene")`
+or can be constructed using functions such as
+`r Biocpkg("GenomicFeatures")`::`makeTxDbFromBiomart()`.
+
+_AnnotationHub_ provides an easy way to work with gene models
+published by Ensembl. Let's see what Ensembl's Release-80 has in terms
+of data for pufferfish, _Takifugu rubripes_.
+
+```{r takifugu-gene-models}
+query(ah, c("Takifugu", "release-80"))
+```
+
+We see that there is a GTF file describing gene models, as well as
+various DNA sequences. Let's retrieve the GTF and top-level DNA
+sequence files. The GTF file is imported as a _GRanges_ instance, the
+DNA sequence as a compressed, indexed Fasta file
+
+
+```{r takifugu-data}
+gtf <- ah[["AH47101"]]
+dna <- ah[["AH47477"]]
+
+head(gtf, 3)
+dna
+head(seqlevels(dna))
+```
+
+Let's identify the 25 longest DNA sequences, and keep just the
+annotations on these scaffolds.
+
+```{r takifugu-seqlengths}
+keep <- names(tail(sort(seqlengths(dna)), 25))
+gtf_subset <- gtf[seqnames(gtf) %in% keep]
+```
+
+It is trivial to make a TxDb instance of this subset (or of the entire
+gtf)
+
+```{r takifugu-txdb}
+library(GenomicFeatures)         # for makeTxDbFromGRanges
+txdb <- makeTxDbFromGRanges(gtf_subset)
+```
+
+and to use that in conjunction with the DNA sequences, e.g., to find
+exon sequences of all annotated genes.
+
+```{r takifugu-exons}
+library(Rsamtools)               # for getSeq,FaFile-method
+exons <- exons(txdb)
+length(exons)
+getSeq(dna, exons)
+```
+
+There is a one-to-one mapping between the genomic ranges contained in
+`exons` and the DNA sequences returned by `getSeq()`.
+
+Some difficulties arise when working with this partly assembled genome
+that require more advanced GenomicRanges skills, see the
+`r Biocpkg("GenomicRanges")` vignettes, especially "_GenomicRanges_
+HOWTOs" and "An Introduction to _GenomicRanges_".
+
+## liftOver to map between genome builds
+
+Suppose we wanted to lift features from one genome build to another,
+e.g., because annotations were generated for hg19 but our experimental
+analysis used hg18.  We know that UCSC provides 'liftover' files for
+mapping between genome builds.
+
+In this example, we will take our broad Peak _GRanges_ from E126 which
+comes from the 'hg19' genome, and lift over these features to their
+'hg38' coordinates.
+
+```{r}
+chainfiles <- query(ah , c("hg38", "hg19", "chainfile"))
+chainfiles
+```
+
+We are interested in the file that lifts over features from hg19 to
+hg38, so let's download that using
+
+```{r echo=FALSE, results='hide'}
+chain <- chainfiles[['AH14150']]
+```
+```{r}
+chain <- chainfiles[['AH14150']]
+chain
+```
+Perform the liftOver operation using `rtracklayer::liftOver()`:
+
+```{r}
+library(rtracklayer)
+gr38 <- liftOver(peaks, chain)
+```
+This returns a _GRangesList_; update the genome of the result to get
+the final result
+
+```{r}
+genome(gr38) <- "hg38"
+gr38
+``` 
+
+## Working with dbSNP Variants
+
+One may also be interested in working with common germline variants with 
+evidence of medical interest. This information is available at 
+[NCBI](https://www.ncbi.nlm.nih.gov/variation/docs/human_variation_vcf/).
+
+Query the dbSNP files in the hub:
+
+```{r echo=FALSE, results='hide', message=FALSE}
+query(ah, c("GRCh37", "dbSNP", "VCF" ))
+vcf <- ah[['AH50424']]
+```
+This returns a _VcfFile_ which can be read in using `r
+Biocpkg("VariantAnnotation")`; because VCF files can be large, `readVcf()`
+supports several strategies for importing only relevant parts of the file
+(e.g., particular genomic locations, particular features of the variants), see
+`?readVcf` for additional information.
+
+```{r message=FALSE}
+variants <- readVcf(vcf, genome="hg19")
+variants
+```
+
+`rowRanges()` returns information from the CHROM, POS and ID fields of the VCF 
+file, represented as a _GRanges_ instance
+
+```{r}
+rowRanges(variants)
+```
+
+Note that the broadPeaks files follow the UCSC chromosome naming convention,
+and the vcf data follows the NCBI style of chromosome naming convention. 
+To bring these ranges in the same chromosome
+naming convention (i.e., UCSC), we would use
+
+```{r}
+seqlevelsStyle(variants) <- seqlevelsStyle(peaks)
+```
+
+And then finally to find which variants overlap these broadPeaks we would use:
+
+```{r}
+overlap <- findOverlaps(variants, peaks)
+overlap
+```
+
+Some insight into how these results can be interpreted comes from
+looking at a particular peak, e.g., the 3852nd peak
+
+```{r}
+idx <- subjectHits(overlap) == 3852
+overlap[idx]
+```
+
+There are three variants overlapping this peak; the coordinates of the
+peak and the overlapping variants are
+
+```{r}
+peaks[3852]
+rowRanges(variants)[queryHits(overlap[idx])]
+```
+
+# sessionInfo
+
+```{r}
+sessionInfo()
+```
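
The "keep the 25 longest sequences" step in vignettes/AnnotationHub-HOWTO.Rmd above relies on the idiom `names(tail(sort(seqlengths(dna)), n))` followed by `%in%` subsetting. A minimal base-R sketch of that idiom, using made-up scaffold names and lengths in place of the hub resources (so it runs without downloading anything), might look like this:

```r
## Made-up scaffold lengths standing in for seqlengths(dna).
lens <- c(scaffold_1 = 7803671, scaffold_2 = 4895227, scaffold_3 = 3641573,
          scaffold_4 = 2561890, scaffold_5 = 1834)

## sort() is ascending, so tail() picks the longest; the vignette keeps 25,
## this sketch keeps 3.
keep <- names(tail(sort(lens), 3))

## Made-up annotation table standing in for the GTF GRanges.
ann <- data.frame(
    seqname = c("scaffold_5", "scaffold_1", "scaffold_3", "scaffold_4"),
    start   = c(10, 200, 3000, 40)
)

## Keep only annotations located on the retained scaffolds.
ann_subset <- ann[ann$seqname %in% keep, ]
```

The same two steps apply to the real objects, with `seqnames(gtf)` playing the role of `ann$seqname`.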
diff --git a/vignettes/AnnotationHub.Rmd b/vignettes/AnnotationHub.Rmd
new file mode 100644
index 0000000..8092c7d
--- /dev/null
+++ b/vignettes/AnnotationHub.Rmd
@@ -0,0 +1,235 @@
+---
+title: "AnnotationHub: Access the AnnotationHub Web Service"
+output:
+  BiocStyle::html_document:
+    toc: true
+vignette: >
+  % \VignetteIndexEntry{AnnotationHub: Access the AnnotationHub Web Service}
+  % \VignetteDepends{AnnotationHub}
+  % \VignetteEngine{knitr::rmarkdown}
+  % \VignetteEncoding{UTF-8}
+---
+
+```{r style, echo = FALSE, results = 'asis'}
+BiocStyle::markdown()
+```
+**Package**: `r Biocpkg("AnnotationHub")`<br />
+**Authors**: `r packageDescription("AnnotationHub")[["Author"]] `<br />
+**Modified**: 27 May, 2016<br />
+**Compiled**: `r date()`
+
+The `AnnotationHub` server provides easy _R / Bioconductor_ access to
+large collections of publicly available whole genome resources,
+e.g., ENSEMBL genome fasta or gtf files, UCSC chain resources, ENCODE
+data tracks at UCSC, etc.
+
+# AnnotationHub objects
+
+The `r Biocpkg("AnnotationHub")` package provides a client interface
+to resources stored at the AnnotationHub web service.
+
+```{r library, message=FALSE}
+library(AnnotationHub)
+```
+
+The `r Biocpkg("AnnotationHub")` package is straightforward to use.
+Create an `AnnotationHub` object
+
+```{r AnnotationHub}
+ah = AnnotationHub()
+```
+
+Now at this point you have already done everything you need in order
+to start retrieving annotations.  For most operations, using the
+`AnnotationHub` object should feel a lot like working with a familiar
+`list` or `data.frame`. 
+
+Let's take a minute to look at the show method for the hub object `ah`
+
+```{r show}
+ah
+```
+
+You can see that it gives you an idea about the different types of data that are present inside the hub. You can see where the data is coming from (dataprovider), as well as what species have samples present (species), what kinds of R data objects could be returned (rdataclass).  We can take a closer look at all the kinds of data providers that are available by simply looking at the contents of dataprovider as if it were the column of a data.frame object like this:
+
+```{r dataprovider}
+unique(ah$dataprovider)
+```
+
+In the same way, you can also see data from different species inside the hub by looking at the contents of species like this: 
+
+```{r species}
+head(unique(ah$species))
+```
+
+And this will also work for any of the other types of metadata present.  You can learn which kinds of metadata are available by simply hitting the tab key after you type 'ah$'.  In this way you can explore for yourself what kinds of data are present in the hub right from the command line. This interface also allows you to access the hub programmatically to extract data that matches a particular set of criteria.
+
+Another valuable type of metadata to pay attention to is the rdataclass.
+
+```{r rdataclass}
+head(unique(ah$rdataclass))
+```
+
+The rdataclass allows you to see which kinds of R objects the hub will return to you.  This kind of information is valuable both as a means to filter results and also as a means to explore and learn about some of the kinds of annotation objects that are widely available for the project.  Right now this is a pretty short list, but over time it should grow as we support more of the different kinds of annotation objects via the hub.
+
+
+Now let's try getting the chain files from UCSC using the query and subset methods to selectively pare down the hub based on specific criteria. 
+The query method lets you search rows for
+specific strings, returning an `AnnotationHub` instance with just the
+rows matching the query.
+
+From the show method, one can easily see that one of the data providers is
+UCSC and that there is an rdataclass for ChainFile.
+
+One can get chain files for Drosophila melanogaster from UCSC with:
+
+```{r dm1}
+dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
+dm
+```
+The query has worked, and you can now see that the only species present is
+Drosophila melanogaster.
+
+You can retrieve the metadata underlying this hub object with `mcols()`:
+
+```{r show2}
+df <- mcols(dm)
+```
+
+By default the show method will only display the first 5 and last 5 rows.
+There are already thousands of records present in the hub.
+
+```{r length}
+length(ah)
+```
+Let's look at another example, where we pull down only Inparanoid8 data
+from the hub and narrow it to a smaller object (here the query term
+'ailuropoda' selects the records for the panda).
+
+```{r subset}
+ahs <- query(ah, c('inparanoid8', 'ailuropoda'))
+ahs
+```
+
+We can also look at the `AnnotationHub` object in a browser using the
+`display()` function. We can then filter the `AnnotationHub` object
+for _ChainFile_ by either using the Global search field on the top
+right corner of the page or the in-column search field for `rdataclass`.
+
+```{r display, eval=FALSE}
+d <- display(ah)
+```
+
+![](display.png)
+Displaying and filtering the Annotation Hub object in a browser
+
+By default 1000 entries are displayed per page; we can change this using
+the filter at the top of the page, or navigate through different pages
+using the page scrolling feature at the bottom of the page.
+
+We can also select the rows of interest to us and send them back to
+the R session using the 'Return rows to R session' button; this sets a
+filter internally which filters the `AnnotationHub` object. The names
+of the selected AnnotationHub elements are displayed at the top of the
+page.
+
+# Using `AnnotationHub` to retrieve data
+
+Looking back at our chain file example, if we are interested in the file
+dm1ToDm2.over.chain.gz, we can get its metadata using
+
+```{r dm2}
+dm
+dm["AH15146"]
+```
+We can download the file using
+
+```{r dm3}
+dm[["AH15146"]]
+```
+Each file is retrieved from the AnnotationHub server and is also
+cached locally, so that the next time you need to retrieve it,
+it should load much more quickly.
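+
+The lookup and retrieval steps above can be sketched together as follows. This is an illustrative chunk that is not evaluated here; it assumes a network connection, that the file name appears in the `title` column as it does in the show output, and that the package needed to load chain files (rtracklayer) is installed. The names `id` and `chain` are made up for the example.
+
+```{r dm-lookup, eval=FALSE}
+library(AnnotationHub)
+ah <- AnnotationHub()
+dm <- query(ah, c("ChainFile", "UCSC", "Drosophila melanogaster"))
+
+## find the record ID from the file name instead of typing it by hand
+id <- names(dm)[which(dm$title == "dm1ToDm2.over.chain.gz")]
+id
+
+dm[id]              ## single bracket: metadata only, nothing is downloaded
+chain <- dm[[id]]   ## double bracket: download (or read from cache) and load
+```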
+
+# Configuring `AnnotationHub` objects
+
+When you create the `AnnotationHub` object, it will set up the object
+for you with some default settings.  See `?AnnotationHub` for ways to
+customize the hub source, the local cache, and other instance-specific
+options, and `?getAnnotationHubOption` to get or set package-global 
+options for use across sessions. 
+
+If you look at the object you will see some helpful information about
+it such as where the data is cached and where online the hub server is
+set to.
+
+```{r show-2}
+ah
+```
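+
+The same two pieces of information can also be read directly from the object rather than from the show output. A minimal sketch, not evaluated here and assuming a network connection to create the hub:
+
+```{r hub-accessors, eval=FALSE}
+library(AnnotationHub)
+ah <- AnnotationHub()
+hubCache(ah)   ## local directory where downloaded resources are stored
+hubUrl(ah)     ## address of the hub server this object talks to
+```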
+
+By default the `AnnotationHub` object is set to the latest
+`snapshotDate` and a snapshot version that matches the version of
+_Bioconductor_ that you are using. You can also learn about these data
+with the appropriate methods.
+
+```{r snapshot}
+snapshotDate(ah)
+```
+
+If you are interested in using an older version of a snapshot, you can
+list previous versions with `possibleDates()` like this:
+
+```{r possibleDates}
+pd <- possibleDates(ah)
+pd
+```
+
+Set the date like this:
+
+```{r setdate, eval=FALSE}
+snapshotDate(ah) <- pd[1]
+```
+# AnnotationHub objects in a cluster environment
+
+Resources in AnnotationHub aren't loaded with the standard `R` package approach
+and therefore can't be loaded on cluster nodes with library(). There are a
+couple of options for sharing AnnotationHub objects across a cluster when
+researchers are using the same R install and want access to the same
+annotations.
+
+As an example, we create a TxDb object from a GRanges stored in AnnotationHub
+contributed by Timothée Flutre.  The GRanges was created from a
+GFF file and contains gene information for Vitis vinifera.
+
+* Download once and build on the fly
+
+One option is that each user downloads the resource with hub[["AH50773"]] and
+the GRanges is saved in the cache. Each subsequent call to 
+hub[["AH50773"]] retrieves the resource from the cache which is very fast.
+
+The necessary code extracts the resource then calls makeTxDbFromGRanges().
+```{r clusterOptions1, eval=FALSE}
+library(AnnotationHub)
+library(GenomicFeatures)  ## provides makeTxDbFromGRanges()
+hub <- AnnotationHub()
+gr <- hub[["AH50773"]]  ## downloaded once
+txdb <- makeTxDbFromGRanges(gr)  ## build on the fly
+```
+
+* Build once and share
+
+Another approach is that one user builds the TxDb and saves it as a .sqlite
+file. The cluster admin installs this in a common place on all cluster nodes
+and each user can load it with loadDb(). Loading the file is as quick and
+easy as calling library() on a TxDb package.
+
+Once the .sqlite file is installed, each user's code would include:
+```{r clusterOptions2, eval=FALSE}
+library(AnnotationDbi)  ## if not already loaded
+txdb <- loadDb("/locationToFile/mytxdb.sqlite")
+```
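+
+The one-time build step that produces the .sqlite file is not shown above. A minimal sketch of it, not evaluated here, might look like the following; the output file name `mytxdb.sqlite` is a placeholder, and the code assumes network access plus the GenomicFeatures and AnnotationDbi packages.
+
+```{r clusterOptions-build, eval=FALSE}
+library(AnnotationHub)
+library(GenomicFeatures)   ## makeTxDbFromGRanges()
+library(AnnotationDbi)     ## saveDb()
+hub <- AnnotationHub()
+txdb <- makeTxDbFromGRanges(hub[["AH50773"]])
+saveDb(txdb, file="mytxdb.sqlite")  ## placeholder file name
+```
+
+The saved file can then be copied to the shared location and opened with loadDb() as shown above.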
+
+# Session info
+
+```{r sessionInfo}
+sessionInfo()
+```
diff --git a/vignettes/display.png b/vignettes/display.png
new file mode 100644
index 0000000..d3f3676
Binary files /dev/null and b/vignettes/display.png differ

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-bioc-annotationhub.git


