[med-svn] [r-cran-rentrez] 01/06: New upstream version 1.1.0

Andreas Tille tille at debian.org
Sat Sep 30 08:46:17 UTC 2017


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-cran-rentrez.

commit abe6ad1b15807f452f37bc6257bd3b265e686cb1
Author: Andreas Tille <tille at debian.org>
Date:   Sat Sep 30 10:41:23 2017 +0200

    New upstream version 1.1.0
---
 DESCRIPTION                     |  10 +-
 MD5                             |  61 +++----
 NEWS                            |  18 ++
 R/base.r                        |  40 +++--
 R/entrez_fetch.r                |  10 +-
 R/entrez_search.r               |   4 +
 R/entrez_summary.r              |  79 ++++++---
 build/vignette.rds              | Bin 210 -> 210 bytes
 inst/doc/rentrez_tutorial.Rmd   |  27 ++-
 inst/doc/rentrez_tutorial.html  | 376 ++++++++++++++++++++++------------------
 man/entrez_citmatch.Rd          |   1 -
 man/entrez_db_links.Rd          |   1 -
 man/entrez_db_searchable.Rd     |   1 -
 man/entrez_db_summary.Rd        |   1 -
 man/entrez_dbs.Rd               |   1 -
 man/entrez_fetch.Rd             |  11 +-
 man/entrez_global_query.Rd      |   1 -
 man/entrez_info.Rd              |   1 -
 man/entrez_link.Rd              |   1 -
 man/entrez_post.Rd              |   1 -
 man/entrez_search.Rd            |   5 +-
 man/entrez_summary.Rd           |  10 +-
 man/extract_from_esummary.Rd    |   1 -
 man/linkout_urls.Rd             |   1 -
 man/parse_pubmed_xml.Rd         |   1 -
 man/rentrez.Rd                  |   2 +-
 tests/testthat/test_fetch.r     |  13 ++
 tests/testthat/test_httr_post.r |  19 ++
 tests/testthat/test_link.r      |   2 +-
 tests/testthat/test_query.r     |   5 +-
 tests/testthat/test_summary.r   |  33 +++-
 vignettes/rentrez_tutorial.Rmd  |  27 ++-
 32 files changed, 488 insertions(+), 276 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index b95d8ba..b73ce89 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,6 +1,6 @@
 Package: rentrez
-Version: 1.0.4
-Date: 2016-10-26
+Version: 1.1.0
+Date: 2017-05-22
 Title: Entrez in R
 Authors@R: c(
  person("David", "Winter", role=c("aut", "cre"), 
@@ -18,12 +18,12 @@ Description: Provides an R interface to the NCBI's EUtils API
     results of those searches and pull data into their R sessions.
 VignetteBuilder: knitr
 License: MIT + file LICENSE
-RoxygenNote: 5.0.1
+RoxygenNote: 6.0.1
 NeedsCompilation: no
-Packaged: 2016-10-25 21:45:43 UTC; dwinter
+Packaged: 2017-06-01 02:02:33 UTC; david
 Author: David Winter [aut, cre],
   Scott Chamberlain [ctb],
   Han Guangchun [ctb]
 Maintainer: David Winter <david.winter at gmail.com>
 Repository: CRAN
-Date/Publication: 2016-10-26 10:37:53
+Date/Publication: 2017-06-01 03:54:10 UTC
diff --git a/MD5 b/MD5
index 7ccd2ed..f9c487a 100644
--- a/MD5
+++ b/MD5
@@ -1,50 +1,51 @@
-c46cd1b0d179ed94f36607f4d794fcba *DESCRIPTION
+eb28a1020481d42fc99645fa9f55db84 *DESCRIPTION
 9cc081ea2d963c6df84446d83052ad2e *LICENSE
 734927c7997434b0a0b33184d323cbc1 *NAMESPACE
-65508a8c8d00af2fe9a9d7c878998f45 *NEWS
-bd61a06a1bf850c5e8382b0b709d2e74 *R/base.r
+a29f4541eeaa8f406e4329fca266fe25 *NEWS
+1226fb74a9bcf2119ffa2a9be56ae151 *R/base.r
 111faade12c3113b9e790591a06e4d6e *R/entrez_citmatch.r
-eaba24477be03709dc9165be72ea0af3 *R/entrez_fetch.r
+e5931141fbee993d7cf71c74ae0daf6c *R/entrez_fetch.r
 ecdd5a491cbac7d4724dc8f9fc9ed9c9 *R/entrez_global_query.r
 033301b7c9d56c3a42748f7831967acd *R/entrez_info.r
 bee110dd037576d16a0dab0b86fcd72a *R/entrez_link.r
 6df3c3820aee8f91f0da62f7dd6af96a *R/entrez_post.r
-ac8f2afdf128547c59b48c895e22e73e *R/entrez_search.r
-fa7a54a7b0bc597ca1a0dc36296256db *R/entrez_summary.r
+be2fa498a78b6c981c639b022d1d46c7 *R/entrez_search.r
+3a42c38db7146642a4ceddbbc794c9f4 *R/entrez_summary.r
 8bc803d43b3e90e932d6c82894f59650 *R/help.r
 646f614d14b267f07f6289e0cc54f357 *R/parse_pubmed_xml.r
-b619f815cdf95e7e3fd71991e1a4c3fb *build/vignette.rds
+1eea29e0728d29fc4319d61ad0b9d96d *build/vignette.rds
 c4e8efa0982cdfe1a54a41d79f3c45cb *inst/doc/rentrez_tutorial.R
-d6b6f2d94601a8060eccf9138e1e13d4 *inst/doc/rentrez_tutorial.Rmd
-ca1f984d47ba7da9814366c7b7932a6c *inst/doc/rentrez_tutorial.html
-6a9d7d59c190e52d444f99df98d7250b *man/entrez_citmatch.Rd
-1004875de1892a54ebdd08af57b41888 *man/entrez_db_links.Rd
-7b9430ebdf08921fef74c053dc79f338 *man/entrez_db_searchable.Rd
-1c76c77d9a4ecc1b4ec24b01f5a2b478 *man/entrez_db_summary.Rd
-f014c478bccc89fa7ee923bb1ca257fe *man/entrez_dbs.Rd
-9b5edd0123abaf9f940ac4633f35b153 *man/entrez_fetch.Rd
-1c247dda2af926fc6c1e60cfa2ca837e *man/entrez_global_query.Rd
-c76debb88267f7eededddc9e5dd1ec9e *man/entrez_info.Rd
-2039f9f031b0939a53dc1b7ef5780c65 *man/entrez_link.Rd
-ff390bf26de7ede2ef8830df1a71a900 *man/entrez_post.Rd
-bda0d4b88d02a9c9dd5dcea23c73170c *man/entrez_search.Rd
-3b98e2c7e97907c4379f44fc8eda8fff *man/entrez_summary.Rd
-ed7f3e55c89c45441b594e068f429148 *man/extract_from_esummary.Rd
-631c78e4d1fa5abd0f9c7000ff0196b7 *man/linkout_urls.Rd
-7f25c20a98fd461a8d73b5d00643a6a8 *man/parse_pubmed_xml.Rd
-617722e7b51351ef3bb9153ef67f0d46 *man/rentrez.Rd
+564a9cb29d2a80929d57d6234a2711a0 *inst/doc/rentrez_tutorial.Rmd
+3cbd718174b841232b0e1fcd585ddc8f *inst/doc/rentrez_tutorial.html
+4c2c03b2da5998fc705ba3fecb9b11ec *man/entrez_citmatch.Rd
+c5d2dfb20a19258305f66992b9a6eb93 *man/entrez_db_links.Rd
+1bbec18591b69b1f24e21009536f50ba *man/entrez_db_searchable.Rd
+b76c339678acd47b16fc4941fcb48998 *man/entrez_db_summary.Rd
+58c939254eded7cc617963dac52d78cc *man/entrez_dbs.Rd
+753796a12579f37f03a5fdd689282a98 *man/entrez_fetch.Rd
+42e2cf420693e3e3970bc952c49e0ee1 *man/entrez_global_query.Rd
+241c6bacdf3fad63d468e6ffc683c7b2 *man/entrez_info.Rd
+60e3d8b94936a6b60f8450ac599e2ace *man/entrez_link.Rd
+c3bb44bdde21b463be5f153e7c862609 *man/entrez_post.Rd
+305970cfa96cf36414110b966bbf1733 *man/entrez_search.Rd
+f9aeb339517b0e8932ec0447962f2605 *man/entrez_summary.Rd
+12aac656a27d8539dbb75734891339f9 *man/extract_from_esummary.Rd
+d563be6b5b0c28e82f1f9837b7f33d2c *man/linkout_urls.Rd
+f63b2726b9639edd7faa24e57fc828a3 *man/parse_pubmed_xml.Rd
+b4a0e94018cb01c5c00da6bb5ed1a835 *man/rentrez.Rd
 db04e7147a14d952e0ae8c93d1390087 *tests/test-all.R
 0c4b51d40ae63cbfdcfac31cd67edb96 *tests/testthat/test_citmatch.r
 4edd85844f931fee501b87861537459c *tests/testthat/test_docs.r
-eb3281531e131d2c8fb84bcac1bb1cc8 *tests/testthat/test_fetch.r
+9391f49d755372af5e9faef10553169c *tests/testthat/test_fetch.r
 a4c45c8f355eafbc660aede214e6f526 *tests/testthat/test_httr.r
+90a64274ba59f7232cf0adc3cbe8e86e *tests/testthat/test_httr_post.r
 6f1a4c681ca3b43318b45fdf4a87221f *tests/testthat/test_info.r
-c6822fe9c387ac4be79cd407ea7cd9b4 *tests/testthat/test_link.r
+f7f1fe31b6a902289daae7fe6e1b3554 *tests/testthat/test_link.r
 1ac649cfb5ba8744d2d62ef182c6bad9 *tests/testthat/test_net.r
 d9c769bcd94e0464e232d79ca6a063db *tests/testthat/test_parse.r
 a2bfe354a3cecc892df44f53fd6c9f58 *tests/testthat/test_post.r
-5a51a6ccff29b6a57e393edbef9331e2 *tests/testthat/test_query.r
+ae5359bee5086c5748a01c6a317a8b0a *tests/testthat/test_query.r
 d8b42d6257b50ed6dba47f0d00c28875 *tests/testthat/test_search.r
-9c6bee496f6534a5d8518582277d8c33 *tests/testthat/test_summary.r
+8fdddfa6eb899bfaf3a6c95414284c89 *tests/testthat/test_summary.r
 c435f7927e6e2f015bf43780f4957e1e *tests/testthat/test_webenv.r
-d6b6f2d94601a8060eccf9138e1e13d4 *vignettes/rentrez_tutorial.Rmd
+564a9cb29d2a80929d57d6234a2711a0 *vignettes/rentrez_tutorial.Rmd
diff --git a/NEWS b/NEWS
index cdc067c..7dac5e0 100644
--- a/NEWS
+++ b/NEWS
@@ -1,3 +1,21 @@
+Version 1.1.0
+------------------
+
+As of this release, rentrez will use httr::POST when sending > 200 ids to the
+NCBI. This should make working with large ID sets easier (thanks to the NCBI for
+supporting the POST method, and Reed Cartwright and Chris Stubben for pushing me on
+issue #89). 
+
+Other minor changes:
+    * Pass on error messages from NCBI when too many records are requested from 
+      `entrez_summary` (Issue #106)
+    * Useful error message when trying to send an empty ID set to NCBI (Issue #107)
+
+Version 1.0.4
+------------------
+Update to documentation and tests to accommodate versioned accessions now
+available from NCBI (see ?entrez_fetch and the vignette)
+
 Version 1.0.3
 ------------------  
 Update to only use https
diff --git a/R/base.r b/R/base.r
index 46deeee..2889457 100755
--- a/R/base.r
+++ b/R/base.r
@@ -19,30 +19,35 @@ entrez_tool <- function() 'rentrez'
 #Create a URL for the EUtils API. 
 #
 # This function is used by all the API-querying functions in rentrez to build
-# the appropriate url. Required arguments for each rentrez are handled in each
-# function. Those arguments that either ID(s) or are WebEnv cookie can be set
-# by passing a string or two argument names to `make_entrez_query`
-#
-#
-# efetch_url <- make_entrez_query("efetch", require_one_of=c("id", "WebEnv"), 
-#                                 id=c(23310964,23310965), db="pubmed",
-#                                 rettype="xml")
+# the appropriate URL. Required arguments for each endpoint are handled by
+# specific functions. All of these functions can use the id_or_webenv() function
+# (below) to ensure that at least one of `id` or `web_history` is provided.
 #
 
 
+
+
+
 make_entrez_query <- function(util, config, interface=".fcgi?", by_id=FALSE, ...){
     uri <- paste0("https://eutils.ncbi.nlm.nih.gov/entrez/eutils/", util, interface)
     args <- list(..., email=entrez_email(), tool=entrez_tool())
-    if(by_id){
-        ids_string <- paste0("id=", args$id, collapse="&")
-        args$id <- NULL
-        uri <- paste0(uri, ids_string)
-    }else{
-        if("id" %in% names(args)){
-            args$id <- paste(args$id, collapse=",")      
+    unsent_flag <- TRUE
+    if("id" %in% names(args)){
+        if(by_id){
+            ids_string <- paste0("id=", args$id, collapse="&")
+            args$id <- NULL
+            uri <- paste0(uri, ids_string)
+        } else {
+            args$id <- paste(args$id, collapse=",")
+        }
+        if(length(args$id) > 200){
+            response <- httr::POST(uri, body=args, config= config)
+            unsent_flag <- FALSE
         }
+    }    
+    if (unsent_flag) {
+        response <- httr::GET(uri, query=args, config= config) 
     }
-    response <- httr::GET(uri, query=args, config= config)
     entrez_check(response)
     httr::content(response, as="text", encoding="UTF-8")
 }
@@ -59,6 +64,9 @@ id_or_webenv <- function(){
         if(!is.null(args$web_history)){
             stop(msg, call.=FALSE)
         }
+        if (length(args$id) == 0){
+            stop("Vector of IDs to send to NCBI is empty, perhaps entrez_search or entrez_link found no hits?", call.=FALSE)        
+        }
         return(list(id=args$id))
     }
     if(is.null(args$web_history)){
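In the `make_entrez_query()` hunk above, `args$id` has already been set to `NULL` (when `by_id=TRUE`) or collapsed into one comma-separated string before the `length(args$id) > 200` test runs, so as written that test sees a length of 0 or 1. A minimal sketch of the GET/POST decision described in NEWS, assuming its 200-ID threshold, counts the IDs before collapsing them and also mirrors the new empty-ID guard in `id_or_webenv()`. `choose_http_method()` and `collapse_ids()` are hypothetical helpers, not rentrez functions, and no request is sent.

```r
# Hypothetical helpers (not part of rentrez): pick GET or POST for a batch
# of IDs without sending any request.

# Count the IDs *before* collapsing them: after paste(collapse = ","),
# length() is always 1, so a "> 200" test on the collapsed value never fires.
choose_http_method <- function(ids, max_get_ids = 200) {
  if (length(ids) == 0) {
    # Mirrors the new guard in id_or_webenv(): refuse to send an empty ID set
    stop("Vector of IDs to send to NCBI is empty", call. = FALSE)
  }
  if (length(ids) > max_get_ids) "POST" else "GET"
}

# Build the comma-separated `id` value, as make_entrez_query() does when
# by_id = FALSE
collapse_ids <- function(ids) paste(ids, collapse = ",")

ids <- 1:250
choose_http_method(ids)        # "POST"
choose_http_method(ids[1:10])  # "GET"
length(collapse_ids(ids))      # 1
```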
diff --git a/R/entrez_fetch.r b/R/entrez_fetch.r
index e3aa445..055c7ec 100755
--- a/R/entrez_fetch.r
+++ b/R/entrez_fetch.r
@@ -4,7 +4,9 @@
 #' argument (which directly specifies the IDs as a numeric or character vector)
 #' or a \code{web_history} object as returned by 
 #' \code{\link{entrez_link}}, \code{\link{entrez_search}} or 
-#' \code{\link{entrez_post}}. See Table 1 in the linked reference for the set of 
+#' \code{\link{entrez_post}}. See
+#' \href{https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/}{Table 1} 
+#' in the linked reference for the set of 
 #' formats available for each database. In particular, note that sequence
 #' databases (nuccore, protein and their relatives) use specific format names
 #' (eg "native", "ipg") for different flavours of xml.
@@ -16,7 +18,9 @@
 #'
 #'@export
 #'@param db character, name of the database to use
-#'@param id vector (numeric or character), unique ID(s) for records in database \code{db} 
+#'@param id vector (numeric or character), unique ID(s) for records in database
+#'\code{db}. In the case of sequence databases these IDs can take form of an
+#' NCBI accession followed by a version number (eg AF123456.1 or AF123456.2).
 #'@param web_history, a web_history object 
 #'@param rettype character, format in which to get data (eg, fasta, xml...)
 #'@param retmode character, mode in which to receive data, defaults to 'text'
@@ -27,7 +31,7 @@
 #'@param parsed boolean should entrez_fetch attempt to parse the resulting 
 #' file. Only works with xml records (including those with rettypes other than
 #' "xml") at present
-#'@seealso \code{\link[httr]{config}} for available configs
+#'@seealso \code{\link[httr]{config}} for available \code{httr} configs
 #'@return character string containing the file created
 #'@return XMLInternalDocument a parsed XML document if parsed=TRUE and
 #'rettype is a flavour of XML.
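The new `id` documentation notes that sequence-database IDs may be versioned accessions such as AF123456.1. If you need to separate an accession from its version locally, a small sketch follows; `split_accession()` is a hypothetical helper, not part of rentrez, and it assumes the version is the trailing `.<digits>` suffix.

```r
# Hypothetical helper (not part of rentrez): split versioned accessions such
# as "AF123456.1" into accession and version; unversioned IDs get NA.
split_accession <- function(x) {
  has_version <- grepl("\\.[0-9]+$", x)
  version <- rep(NA_integer_, length(x))
  # Keep only the digits after the last "." for versioned IDs
  version[has_version] <- as.integer(sub("^.*\\.", "", x[has_version]))
  data.frame(
    accession = sub("\\.[0-9]+$", "", x),
    version = version,
    stringsAsFactors = FALSE
  )
}

split_accession(c("AF123456.1", "AF123456.2", "AF123456"))
# Either form can be passed as `id` (requires network access), e.g.
# entrez_fetch(db = "nuccore", id = "AF123456.1", rettype = "fasta")
```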
diff --git a/R/entrez_search.r b/R/entrez_search.r
index 09a620b..3659fe8 100755
--- a/R/entrez_search.r
+++ b/R/entrez_search.r
@@ -12,6 +12,10 @@
 #' humans, and exclude review articles. More examples of the use of these search
 #' terms, and the more specific MeSH terms for precise searching, 
 #' is given in the package vignette.
+#'
+#' The \code{rentrez} tutorial provides some tips on how to make the most of
+#' searches to the NCBI. In particular, the sections on uses of the "Filter"
+#' field and MeSH terms may help in formulating precise searches.
 #' 
 #'@export
 #'@param db character, name of the database to search for.
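The new text points readers to the tutorial's sections on the "Filter" field and MeSH terms. As a sketch of how Filter terms combine with an ordinary query, the hypothetical `add_filters()` helper (not part of rentrez) appends `[Filter]`-tagged terms with `AND`. The filter value `refseq` is an assumption for illustration; the tutorial recommends checking the advanced search index list for the terms valid in each database.

```r
# Hypothetical helper (not part of rentrez): join a query with one or more
# Filter-field terms using AND.
add_filters <- function(term, filters = character(0)) {
  if (length(filters) == 0) return(term)
  paste(c(term, paste0(filters, "[Filter]")), collapse = " AND ")
}

q <- add_filters("Tetrahymena thermophila[Organism]", "refseq")
q  # "Tetrahymena thermophila[Organism] AND refseq[Filter]"
# Pass the term to entrez_search() (requires network access):
# entrez_search(db = "nuccore", term = q)
```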
diff --git a/R/entrez_summary.r b/R/entrez_summary.r
index 12697f1..d23a36e 100755
--- a/R/entrez_summary.r
+++ b/R/entrez_summary.r
@@ -27,7 +27,9 @@
 #'
 #'@export
 #'@param db character Name of the database to search for
-#'@param id vector with unique ID(s) for records in database \code{db}. 
+#'@param id vector with unique ID(s) for records in database \code{db}.
+#' In the case of sequence databases these IDs can take the form of an
+#' NCBI accession followed by a version number (eg AF123456.1 or AF123456.2).
 #'@param web_history A web_history object 
 #'@param always_return_list logical, return a list  of esummary objects even
 #'when only one ID is provided (see description for a note about this option)
@@ -35,6 +37,8 @@
 #'documentation linked to in references for a complete list
 #'@param config vector configuration options passed to \code{httr::GET}
 #'@param version either 1.0 or 2.0 see above for description
+#'@param retmode either "xml" or "json". By default, xml will be used for
+#'version 1.0 records, json for version 2.0.
 #'@references \url{http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_ESummary_} 
 #'@seealso \code{\link[httr]{config}} for available configs 
 #'@seealso \code{\link{extract_from_esummary}} which can be used to extract
@@ -60,14 +64,19 @@
 #'  extract_from_esummary(cv, "gene_sort") 
 #' }
 entrez_summary <- function(db, id=NULL, web_history=NULL, 
-                           version=c("2.0", "1.0"), always_return_list = FALSE, config=NULL, ...){
+                           version=c("2.0", "1.0"), always_return_list = FALSE, retmode=NULL, config=NULL, ...){
     identifiers <- id_or_webenv()
-    v <-match.arg(version)
-    retmode <- if(v == "2.0") "json" else "xml"
+    v <-match.arg(version) 
+    if( is.null(retmode)) {
+        retmode <- if( v == "1.0" ) "xml" else "json"
+    }
+    if (retmode == "json" & v == "1.0"){
+        stop("Version 1.0 records are only available as xml, not json")
+    }
     args <- c(list("esummary", db=db, config=config, retmode=retmode, version=v, ...), identifiers)
     response  <- do.call(make_entrez_query, args)
     whole_record <- parse_response(response, retmode)
-    parse_esummary(whole_record, always_return_list)
+    parse_esummary(whole_record, v, always_return_list)
 }
 
 #' Extract elements from a list of esummary records
@@ -95,7 +104,7 @@ extract_from_esummary.esummary_list <- function(esummaries, elements, simplify=T
 
 
 
-parse_esummary <- function(x, always_return_list) UseMethod("parse_esummary")
+parse_esummary <- function(x, version, always_return_list) UseMethod("parse_esummary")
 
 
 check_json_errs <- function(rec){
@@ -107,10 +116,20 @@ check_json_errs <- function(rec){
 }
 
 
-parse_esummary.list <- function(x, always_return_list){
+parse_esummary.list <- function(x, version, always_return_list){
     #already parsed by jsonlite, just add check for errors, then re-class
-    res <- x$result[2: length(x$result)]
-    sapply(res, check_json_errs)
+    #First make sure the file doesn't have an error at the root
+    if(!is.null(x[["error"]])){
+        warning("Esummary includes error message: ", x[["error"]], call.=FALSE)
+    }
+    res <- x$result[-1] #remove UIDs from result (they are already names of sub-elements)    
+    # Make sure there are some records in this file
+    if(length(res) == 0){
+        stop("No esummary records found in file", call.=FALSE)
+    }
+    #Finally check for errors _within_ each record
+    sapply(res, check_json_errs)    
+    #OK: all clear, return the records
     res <- lapply(res, add_class, new_class="esummary")
     if(length(res)==1 & !always_return_list){
         return(res[[1]])
@@ -129,24 +148,34 @@ parse_esummary.list <- function(x, always_return_list){
 
 #
 #@export
-parse_esummary.XMLInternalDocument  <- function(x, always_return_list){
+parse_esummary.XMLInternalDocument  <- function(x, version, always_return_list){
     check_xml_errors(x)
-    recs <- x["//DocSum"]
-    if(length(recs)==0){
-       stop("Esummary document contains no DocSums, try 'version=2.0'?)")
+    #Version 2.0 records have no type information (int, list etc) so we 
+    # can only return them as characters
+    if(version == "2.0"){
+        res <- lapply(x["//DocumentSummary"], xmlToList)
+        res <- lapply(res, add_class, "esummary")
+        names(res) <- sapply(res, function(x) x[[".attrs"]]["uid"])
+    }
+    else{
+        recs <- x["//DocSum"] 
+
+        if(length(recs)==0){
+           stop("Esummary document contains no DocSums, try 'version=2.0'?)")
+        }
+        per_rec <- function(r){
+            res <- xpathApply(r, "Item", parse_node)
+            names(res) <- xpathApply(r, "Item", xmlGetAttr, "Name")
+            res <- c(res, file=x)
+            class(res) <- c("esummary", class(res))
+            return(res)
+        } 
+        if(length(recs)==1 & !always_return_list){
+            return(per_rec(recs[[1]]))
+        } 
+        res <- lapply(recs, per_rec)
+        names(res) <-  xpathSApply(x, "//DocSum/Id", xmlValue)
     }
-    per_rec <- function(r){
-        res <- xpathApply(r, "Item", parse_node)
-        names(res) <- xpathApply(r, "Item", xmlGetAttr, "Name")
-        res <- c(res, file=x)
-        class(res) <- c("esummary", class(res))
-        return(res)
-    } 
-    if(length(recs)==1 & !always_return_list){
-        return(per_rec(recs[[1]]))
-    } 
-    res <- lapply(recs, per_rec)
-    names(res) <-  xpathSApply(x, "//DocSum/Id", xmlValue)
     class(res) <- c("esummary_list", "list")
     res
 }
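The new `retmode` argument defaults to xml for version 1.0 and json for version 2.0, and rejects json for version 1.0; requesting xml for version 2.0 routes the response through the new `version == "2.0"` branch of `parse_esummary.XMLInternalDocument()`, which returns fields as character strings. A stand-alone sketch of just the defaulting rule follows; `resolve_retmode()` is hypothetical and uses `&&` where the diff uses the vectorised `&`, which gives the same result here because both operands are single values.

```r
# Hypothetical stand-alone version (not part of rentrez) of the default
# retmode logic added to entrez_summary(): json for version 2.0, xml for 1.0,
# and an error for the unsupported json + 1.0 combination.
resolve_retmode <- function(version = c("2.0", "1.0"), retmode = NULL) {
  v <- match.arg(version)
  if (is.null(retmode)) {
    retmode <- if (v == "1.0") "xml" else "json"
  }
  # && (scalar, short-circuiting) rather than the vectorised &
  if (retmode == "json" && v == "1.0") {
    stop("Version 1.0 records are only available as xml, not json", call. = FALSE)
  }
  retmode
}

resolve_retmode()              # "json"
resolve_retmode("1.0")         # "xml"
resolve_retmode("2.0", "xml")  # "xml"
```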
diff --git a/build/vignette.rds b/build/vignette.rds
index 7982763..079d81f 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/inst/doc/rentrez_tutorial.Rmd b/inst/doc/rentrez_tutorial.Rmd
index cbaf1f6..2b39914 100644
--- a/inst/doc/rentrez_tutorial.Rmd
+++ b/inst/doc/rentrez_tutorial.Rmd
@@ -174,6 +174,22 @@ of available terms or any given data base with `entrez_db_searchable()`
 entrez_db_searchable("sra")
 ```
 
+### Using the Filter field
+
+"Filter" is a special field that, as the name suggests, allows you to limit 
+records returned by a search to a set of filtering criteria. There is no programmatic 
+way to find the particular terms that can be used with the Filter field. 
+However, the NCBI's website provides an "advanced search" tool for some 
+databases that can be used to discover these terms. 
+
+
+For example, to find all of the terms that can be used to filter searches of
+the nucleotide database, use the
+[advanced search for that database](https://www.ncbi.nlm.nih.gov/nuccore/advanced).
+On that page, selecting "Filter" from the first drop-down box and then clicking 
+"Show index list" will allow the user to scroll through possible filtering
+terms.
+
 ###Precise queries using MeSH terms
 
 In addition to the search terms described above, the NCBI allows searches using
@@ -494,9 +510,9 @@ tax_list$Taxon$GeneticCode
 
 For more complex records, which generate deeply-nested lists, you can use
 [XPath expressions](https://en.wikipedia.org/wiki/XPath) along with the function 
-`XML::xpathSApply` or the extraction operatord `[` and `[[` to extract specific parts of the
-file. For instance, we can get the scientific name of each taxon in _T.
-thermophila_'s lineage by specifying a path through the XML
+`XML::xpathSApply` or the extraction operators `[` and `[[` to extract specific 
+parts of the file. For instance, we can get the scientific name of each taxon 
+in _T. thermophila_'s lineage by specifying a path through the XML
 
 ```{r, Tt_path}
 tt_lineage <- tax_rec["//LineageEx/Taxon/ScientificName"]
@@ -535,6 +551,11 @@ upload
 The NCBI sends you back some information you can use to refer to the posted IDs. 
 In `rentrez`, that information is represented as a `web_history` object. 
 
+Note that if you have a very long list of IDs you may receive a 414 (URI Too Long)
+error when you try to upload them. If you have such a list (and the IDs come from
+an external source rather than a search that can be saved to a `web_history` object),
+you may have to 'chunk' the IDs into smaller sets that can be processed.
+
 ###Get a `web_history` object from `entrez_search` or `entrez_link()`
 
 In addition to directly uploading IDs to the NCBI, you can use the web history
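The new note suggests splitting a long, externally sourced ID list into smaller sets. One way to do that in base R is sketched here, assuming a chunk size of 200 (the threshold rentrez 1.1.0 uses for switching to POST; the tutorial does not state which size avoids a 414 error). `chunk_ids()` is a hypothetical helper, and the `entrez_post()` call is left commented out because it needs network access.

```r
# Hypothetical helper (not part of rentrez): split a long ID vector into
# consecutive chunks of at most `chunk_size` IDs.
chunk_ids <- function(ids, chunk_size = 200) {
  split(ids, ceiling(seq_along(ids) / chunk_size))
}

chunks <- chunk_ids(1:450, chunk_size = 200)
lengths(chunks)  # chunk sizes 200, 200 and 50
# Each chunk can then be uploaded on its own (requires network access):
# posts <- lapply(chunks, function(ch) entrez_post(db = "pubmed", id = ch))
```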
diff --git a/inst/doc/rentrez_tutorial.html b/inst/doc/rentrez_tutorial.html
index 7849d85..39f57fd 100644
--- a/inst/doc/rentrez_tutorial.html
+++ b/inst/doc/rentrez_tutorial.html
@@ -4,7 +4,7 @@
 
 <head>
 
-<meta charset="utf-8">
+<meta charset="utf-8" />
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="generator" content="pandoc" />
 
@@ -12,7 +12,7 @@
 
 <meta name="author" content="David winter" />
 
-<meta name="date" content="2016-10-26" />
+<meta name="date" content="2017-06-01" />
 
 <title>Rentrez Tutorial</title>
 
@@ -20,28 +20,46 @@
 
 <style type="text/css">code{white-space: pre;}</style>
 <style type="text/css">
+div.sourceCode { overflow-x: auto; }
 table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
   margin: 0; padding: 0; vertical-align: baseline; border: none; }
 table.sourceCode { width: 100%; line-height: 100%; }
 td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
 td.sourceCode { padding-left: 5px; }
-code > span.kw { color: #007020; font-weight: bold; }
-code > span.dt { color: #902000; }
-code > span.dv { color: #40a070; }
-code > span.bn { color: #40a070; }
-code > span.fl { color: #40a070; }
-code > span.ch { color: #4070a0; }
-code > span.st { color: #4070a0; }
-code > span.co { color: #60a0b0; font-style: italic; }
-code > span.ot { color: #007020; }
-code > span.al { color: #ff0000; font-weight: bold; }
-code > span.fu { color: #06287e; }
-code > span.er { color: #ff0000; font-weight: bold; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
 </style>
 
 
 
-<link href="data:text/css,body%20%7B%0A%20%20background%2Dcolor%3A%20%23fff%3B%0A%20%20margin%3A%201em%20auto%3B%0A%20%20max%2Dwidth%3A%20700px%3B%0A%20%20overflow%3A%20visible%3B%0A%20%20padding%2Dleft%3A%202em%3B%0A%20%20padding%2Dright%3A%202em%3B%0A%20%20font%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0A%20%20font%2Dsize%3A%2014px%3B%0A%20%20line%2Dheight%3A%201%2E35%3B%0A%7D%0A%0A%23header%20%7B%0A%20%20text%2Dalign%3A% [...]
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
 
 </head>
 
@@ -52,7 +70,7 @@ code > span.er { color: #ff0000; font-weight: bold; }
 
 <h1 class="title toc-ignore">Rentrez Tutorial</h1>
 <h4 class="author"><em>David winter</em></h4>
-<h4 class="date"><em>2016-10-26</em></h4>
+<h4 class="date"><em>2017-06-01</em></h4>
 
 
 <div id="TOC">
@@ -61,6 +79,7 @@ code > span.er { color: #ff0000; font-weight: bold; }
 <li><a href="#getting-started-with-the-rentrez">Getting started with the rentrez</a></li>
 <li><a href="#searching-databases-entrez_search">Searching databases: <code>entrez_search()</code></a><ul>
 <li><a href="#building-search-terms">Building search terms</a></li>
+<li><a href="#using-the-filter-field">Using the Filter field</a></li>
 <li><a href="#precise-queries-using-mesh-terms">Precise queries using MeSH terms</a></li>
 <li><a href="#advanced-counting">Advanced counting</a></li>
 </ul></li>
@@ -89,14 +108,14 @@ code > span.er { color: #ff0000; font-weight: bold; }
 
 <div id="introduction-the-ncbi-entrez-and-rentrez." class="section level2">
 <h2>Introduction: The NCBI, entrez and <code>rentrez</code>.</h2>
-<p>The NCBI shares a <em>lot</em> of data. At the time this document was compiled, there were 26.6 million papers in <a href="http://www.ncbi.nlm.nih.gov/pubmed/">PubMed</a>, including 4.2 million full-text records available in <a href="http://www.ncbi.nlm.nih.gov/pubmed/">PubMed Central</a>. <a href="http://www.ncbi.nlm.nih.gov/nuccore">The NCBI Nucleotide Database</a> (which includes GenBank) has data for 219.2 million different sequences, and <a href="http://www.ncbi.nlm.nih.gov/snp/" [...]
+<p>The NCBI shares a <em>lot</em> of data. At the time this document was compiled, there were 27.3 million papers in <a href="http://www.ncbi.nlm.nih.gov/pubmed/">PubMed</a>, including 4.5 million full-text records available in <a href="http://www.ncbi.nlm.nih.gov/pubmed/">PubMed Central</a>. <a href="http://www.ncbi.nlm.nih.gov/nuccore">The NCBI Nucleotide Database</a> (which includes GenBank) has data for 237.5 million different sequences, and <a href="http://www.ncbi.nlm.nih.gov/snp/" [...]
 <p>The NCBI makes this data available through a <a href="http://www.ncbi.nlm.nih.gov/">web interface</a>, an <a href="ftp://ftp.ncbi.nlm.nih.gov/">FTP server</a> and through a REST API called the <a href="http://www.ncbi.nlm.nih.gov/books/NBK25500/">Entrez Utilities</a> (<code>Eutils</code> for short). This package provides functions to use that API, allowing users to gather and combine data from multiple NCBI databases in the comfort of an R session or script.</p>
 </div>
 <div id="getting-started-with-the-rentrez" class="section level2">
 <h2>Getting started with the rentrez</h2>
 <p>To make the most of all the data the NCBI shares you need to know a little about their databases, the records they contain and the ways you can find those records. The <a href="http://www.ncbi.nlm.nih.gov/home/documentation.shtml">NCBI provides extensive documentation for each of their databases</a> and for the <a href="http://www.ncbi.nlm.nih.gov/books/NBK25501/">EUtils API that <code>rentrez</code> takes advantage of</a>. There are also some helper functions in <code>rentrez</code>  [...]
 <p>First, you can use <code>entrez_dbs()</code> to find the list of available databases:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_dbs</span>()</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_dbs</span>()</code></pre></div>
 <pre><code>##  [1] "pubmed"          "protein"         "nuccore"        
 ##  [4] "nucleotide"      "nucgss"          "nucest"         
 ##  [7] "structure"       "sparcle"         "genome"         
@@ -112,41 +131,46 @@ code > span.er { color: #ff0000; font-weight: bold; }
 ## [37] "pcassay"         "biosystems"      "pccompound"     
 ## [40] "pcsubstance"     "pubmedhealth"    "seqannot"       
 ## [43] "snp"             "sra"             "taxonomy"       
-## [46] "unigene"         "gencoll"         "gtr"</code></pre>
+## [46] "biocollections"  "unigene"         "gencoll"        
+## [49] "gtr"</code></pre>
 <p>There is a set of functions with names starting <code>entrez_db_</code> that can be used to gather more information about each of these databases:</p>
 <p><strong>Functions that help you learn about NCBI databases</strong></p>
 <table>
+<colgroup>
+<col width="32%"></col>
+<col width="67%"></col>
+</colgroup>
 <thead>
 <tr class="header">
-<th align="left">Function name</th>
-<th align="left">Return</th>
+<th>Function name</th>
+<th>Return</th>
 </tr>
 </thead>
 <tbody>
 <tr class="odd">
-<td align="left"><code>entrez_db_summary()</code></td>
-<td align="left">Brief description of what the database is</td>
+<td><code>entrez_db_summary()</code></td>
+<td>Brief description of what the database is</td>
 </tr>
 <tr class="even">
-<td align="left"><code>entrez_db_searchable()</code></td>
-<td align="left">Set of search terms that can used with this database</td>
+<td><code>entrez_db_searchable()</code></td>
+<td>Set of search terms that can be used with this database</td>
 </tr>
 <tr class="odd">
-<td align="left"><code>entrez_db_links()</code></td>
-<td align="left">Set of databases that might contain linked records</td>
+<td><code>entrez_db_links()</code></td>
+<td>Set of databases that might contain linked records</td>
 </tr>
 </tbody>
 </table>
 <p>For instance, we can get a description of the somewhat cryptically named database ‘cdd’…</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_db_summary</span>(<span class="st">"cdd"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_db_summary</span>(<span class="st">"cdd"</span>)</code></pre></div>
 <pre><code>##  DbName: cdd
 ##  MenuName: Conserved Domains
 ##  Description: Conserved Domain Database
-##  DbBuild: Build160706-0602.1
-##  Count: 52411
-##  LastUpdate: 2016/07/06 12:08</code></pre>
+##  DbBuild: Build170330-1240.1
+##  Count: 56066
+##  LastUpdate: 2017/03/31 16:02</code></pre>
 <p>… or find out which search terms can be used with the Sequence Read Archive (SRA) database (which contains raw data from sequencing projects):</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_db_searchable</span>(<span class="st">"sra"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_db_searchable</span>(<span class="st">"sra"</span>)</code></pre></div>
 <pre><code>## Searchable fields for database 'sra'
 ##   ALL     All terms from all searchable fields 
 ##   UID     Unique number assigned to publication 
@@ -175,47 +199,47 @@ code > span.er { color: #ff0000; font-weight: bold; }
 <div id="searching-databases-entrez_search" class="section level2">
 <h2>Searching databases: <code>entrez_search()</code></h2>
 <p>Very often, the first thing you’ll want to do with <code>rentrez</code> is search a given NCBI database to find records that match some keywords. You can do this using the function <code>entrez_search()</code>. In the simplest case you just need to provide a database name (<code>db</code>) and a search term (<code>term</code>) so let’s search PubMed for articles about the <code>R language</code>:</p>
-<pre class="sourceCode r"><code class="sourceCode r">r_search <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">term=</span><span class="st">"R Language"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r_search <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">term=</span><span class="st">"R Language"</span>)</code></pre></div>
 <p>The object returned by a search acts like a list, and you can get a summary of its contents by printing it.</p>
-<pre class="sourceCode r"><code class="sourceCode r">r_search</code></pre>
-<pre><code>## Entrez search result with 9461 hits (object contains 20 IDs and no web_history object)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r_search</code></pre></div>
+<pre><code>## Entrez search result with 10034 hits (object contains 20 IDs and no web_history object)
 ##  Search term (as translated):  R[All Fields] AND ("programming languages"[MeSH Te ...</code></pre>
 <p>There are a few things to note here. First, the NCBI’s server has worked out that we meant R as a programming language, and so included the <a href="http://www.ncbi.nlm.nih.gov/mesh">‘MeSH’ term</a> associated with programming languages. We’ll worry about MeSH terms and other special queries later; for now, just note that you can use this feature to check that your search term was interpreted in the way you intended. Second, there are many more ‘hits’ for this search than there ar [...]
 <p>The IDs are the most important thing returned here. They allow us to fetch records matching those IDs, gather summary data about them or find cross-referenced records in other databases. We access the IDs as a vector using the <code>$</code> operator:</p>
-<pre class="sourceCode r"><code class="sourceCode r">r_search$ids</code></pre>
-<pre><code>##  [1] "27774058" "27771785" "27771004" "27694991" "27770071" "27491423"
-##  [7] "27762050" "27760879" "27509845" "27312095" "27755987" "27755648"
-## [13] "27751661" "27346093" "27059941" "26452616" "27747969" "27746120"
-## [19] "25544604" "27744111"</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">r_search$ids</code></pre></div>
+<pre><code>##  [1] "28557953" "28557948" "28557853" "28557747" "28555558" "28553573"
+##  [7] "28552027" "28549446" "28546307" "28545980" "28541650" "28542682"
+## [13] "28542514" "28541543" "28538824" "28538379" "28535257" "28534345"
+## [19] "28529495" "28528230"</code></pre>
 <p>If we want to get more than 20 IDs we can do so by increasing the <code>retmax</code> argument.</p>
-<pre class="sourceCode r"><code class="sourceCode r">another_r_search <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">term=</span><span class="st">"R Language"</span>, <span class="dt">retmax=</span><span class="dv">40</span>)
-another_r_search</code></pre>
-<pre><code>## Entrez search result with 9461 hits (object contains 40 IDs and no web_history object)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">another_r_search <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">term=</span><span class="st">"R Language"</span>, <span class="dt">retmax=</span><span class="dv">40</span>)
+another_r_search</code></pre></div>
+<pre><code>## Entrez search result with 10034 hits (object contains 40 IDs and no web_history object)
 ##  Search term (as translated):  R[All Fields] AND ("programming languages"[MeSH Te ...</code></pre>
 <p>If we want to get IDs for all of the thousands of records that match this search, we can use the NCBI’s web history feature <a href="#web_history">described below</a>.</p>
 <div id="building-search-terms" class="section level3">
 <h3>Building search terms</h3>
 <p>The EUtils API uses a special syntax to build search terms. You can search a database against a specific term using the format <code>query[SEARCH FIELD]</code>, and combine multiple such searches using the boolean operators <code>AND</code>, <code>OR</code> and <code>NOT</code>.</p>
 <p>For instance, we can find next generation sequence datasets for the (amazing…) ciliate <em>Tetrahymena thermophila</em> by using the organism (‘ORGN’) search field:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"sra"</span>,
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"sra"</span>,
               <span class="dt">term=</span><span class="st">"Tetrahymena thermophila[ORGN]"</span>,
-              <span class="dt">retmax=</span><span class="dv">0</span>)</code></pre>
-<pre><code>## Entrez search result with 165 hits (object contains 0 IDs and no web_history object)
+              <span class="dt">retmax=</span><span class="dv">0</span>)</code></pre></div>
+<pre><code>## Entrez search result with 220 hits (object contains 0 IDs and no web_history object)
 ##  Search term (as translated):  "Tetrahymena thermophila"[Organism]</code></pre>
 <p>We can narrow our focus to only those records that have been added recently (using the colon to specify a range of values):</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"sra"</span>,
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"sra"</span>,
               <span class="dt">term=</span><span class="st">"Tetrahymena thermophila[ORGN] AND 2013:2015[PDAT]"</span>,
-              <span class="dt">retmax=</span><span class="dv">0</span>)</code></pre>
+              <span class="dt">retmax=</span><span class="dv">0</span>)</code></pre></div>
 <pre><code>## Entrez search result with 75 hits (object contains 0 IDs and no web_history object)
 ##  Search term (as translated):  "Tetrahymena thermophila"[Organism] AND 2013[PDAT] ...</code></pre>
 <p>Or include recent records for either <em>T. thermophila</em> or its close relative <em>T. borealis</em> (using parentheses to make ANDs and ORs explicit).</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"sra"</span>,
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"sra"</span>,
               <span class="dt">term=</span><span class="st">"(Tetrahymena thermophila[ORGN] OR Tetrahymena borealis[ORGN]) AND 2013:2015[PDAT]"</span>,
-              <span class="dt">retmax=</span><span class="dv">0</span>)</code></pre>
+              <span class="dt">retmax=</span><span class="dv">0</span>)</code></pre></div>
 <pre><code>## Entrez search result with 75 hits (object contains 0 IDs and no web_history object)
 ##  Search term (as translated):  ("Tetrahymena thermophila"[Organism] OR "Tetrahyme ...</code></pre>
 <p>The set of search terms available varies between databases. You can get a list of available terms for any given database with <code>entrez_db_searchable()</code>:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_db_searchable</span>(<span class="st">"sra"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_db_searchable</span>(<span class="st">"sra"</span>)</code></pre></div>
 <pre><code>## Searchable fields for database 'sra'
 ##   ALL     All terms from all searchable fields 
 ##   UID     Unique number assigned to publication 
@@ -240,12 +264,17 @@ another_r_search</code></pre>
 ##   ALN     Percent of aligned reads 
 ##   MBS     Size in megabases</code></pre>
 </div>
+<div id="using-the-filter-field" class="section level3">
+<h3>Using the Filter field</h3>
+<p>“Filter” is a special field that, as the name suggests, allows you to limit the records returned by a search to a set of filtering criteria. There is no programmatic way to find the particular terms that can be used with the Filter field. However, the NCBI’s website provides an “advanced search” tool for some databases that can be used to discover these terms.</p>
+<p>For example, you can find all of the terms that can be used to filter searches of the nucleotide database using the <a href="https://www.ncbi.nlm.nih.gov/nuccore/advanced">advanced search for that database</a>. On that page, selecting “Filter” from the first drop-down box and then clicking “Show index list” lets you scroll through the possible filtering terms.</p>
+</div>
 <div id="precise-queries-using-mesh-terms" class="section level3">
 <h3>Precise queries using MeSH terms</h3>
 <p>In addition to the search terms described above, the NCBI allows searches using <a href="http://www.ncbi.nlm.nih.gov/mesh">Medical Subject Heading (MeSH)</a> terms. These terms create a ‘controlled vocabulary’, and allow users to make very finely controlled queries of databases.</p>
 <p>For instance, if you were interested in reviewing studies on how a class of anti-malarial drugs called Folic Acid Antagonists work against <em>Plasmodium vivax</em> (a particular species of malarial parasite), you could use this search:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db   =</span> <span class="st">"pubmed"</span>,
-              <span class="dt">term =</span> <span class="st">"(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db   =</span> <span class="st">"pubmed"</span>,
+              <span class="dt">term =</span> <span class="st">"(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])"</span>)</code></pre></div>
 <pre><code>## Entrez search result with 12 hits (object contains 12 IDs and no web_history object)
 ##  Search term (as translated):  "malaria, vivax"[MeSH Terms] AND "folic acid antag ...</code></pre>
 <p>The complete set of MeSH terms is available as a database from the NCBI. That means it is possible to download detailed information about each term and find the ways in which terms relate to each other using <code>rentrez</code>. You can search for specific terms with <code>entrez_search(db="mesh", term =...)</code> and learn about the results of your search using the tools described below.</p>
@@ -253,7 +282,7 @@ another_r_search</code></pre>
 <div id="advanced-counting" class="section level3">
 <h3>Advanced counting</h3>
 <p>As you can see above, the object returned by <code>entrez_search()</code> includes the number of records matching a given search. This means you can learn a little about the composition of, or trends in, the records stored in the NCBI’s databases using only the search utility. For instance, let’s track the rise of the scientific buzzword “connectome” in PubMed, programmatically creating search terms for the <code>PDAT</code> field:</p>
-<pre class="sourceCode r"><code class="sourceCode r">search_year <-<span class="st"> </span>function(year, term){
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">search_year <-<span class="st"> </span>function(year, term){
     query <-<span class="st"> </span><span class="kw">paste</span>(term, <span class="st">"AND ("</span>, year, <span class="st">"[PDAT])"</span>)
     <span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">term=</span>query, <span class="dt">retmax=</span><span class="dv">0</span>)$count
 }
@@ -261,8 +290,8 @@ another_r_search</code></pre>
 year <-<span class="st"> </span><span class="dv">2008</span>:<span class="dv">2014</span>
 papers <-<span class="st"> </span><span class="kw">sapply</span>(year, search_year, <span class="dt">term=</span><span class="st">"Connectome"</span>, <span class="dt">USE.NAMES=</span><span class="ot">FALSE</span>)
 
-<span class="kw">plot</span>(year, papers, <span class="dt">type=</span><span class="st">'b'</span>, <span class="dt">main=</span><span class="st">"The Rise of the Connectome"</span>)</code></pre>
-<p><img src=" [...]
+<span class="kw">plot</span>(year, papers, <span class="dt">type=</span><span class="st">'b'</span>, <span class="dt">main=</span><span class="st">"The Rise of the Connectome"</span>)</code></pre></div>
+<p><img src=" [...]
 </div>
 </div>
 <div id="finding-cross-references-entrez_link" class="section level2">
@@ -272,73 +301,81 @@ papers <-<span class="st"> </span><span class="kw">sapply</span>(year, search
 <h3>My god, it’s full of links</h3>
 <p>To get an idea of the degree to which records in the NCBI are cross-linked we can find all NCBI data associated with a single gene (in this case the Amyloid Beta Precursor gene, the product of which is associated with the plaques that form in the brains of Alzheimer’s Disease patients).</p>
 <p>The function <code>entrez_link()</code> can be used to find cross-referenced records. In the most basic case we need to provide an ID (<code>id</code>), the database from which this ID comes (<code>dbfrom</code>) and the name of a database in which to find linked records (<code>db</code>). If we set this last argument to ‘all’ we can find links in multiple databases:</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_the_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">'gene'</span>, <span class="dt">id=</span><span class="dv">351</span>, <span class="dt">db=</span><span class="st">'all'</span>)
-all_the_links</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_the_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">'gene'</span>, <span class="dt">id=</span><span class="dv">351</span>, <span class="dt">db=</span><span class="st">'all'</span>)
+all_the_links</code></pre></div>
 <pre><code>## elink object with contents:
 ##  $links: IDs for linked records from NCBI
 ## </code></pre>
 <p>Just as with <code>entrez_search</code>, the returned object behaves like a list, and we can learn a little about its contents by printing it. In this case, all of the information is in <code>links</code> (and there are a lot of them!):</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_the_links$links</code></pre>
-<pre><code>## elink result with information from 52 databases:
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_the_links$links</code></pre></div>
+<pre><code>## elink result with information from 54 databases:
 ##  [1] gene_bioconcepts               gene_biosystems               
 ##  [3] gene_biosystems_all            gene_clinvar                  
-##  [5] gene_dbvar                     gene_gene_h3k4me3             
+##  [5] gene_clinvar_specific          gene_dbvar                    
 ##  [7] gene_genome                    gene_gtr                      
 ##  [9] gene_homologene                gene_medgen_diseases          
 ## [11] gene_pcassay_alltarget_list    gene_pcassay_alltarget_summary
 ## [13] gene_pcassay_rnai              gene_pcassay_target           
 ## [15] gene_probe                     gene_structure                
 ## [17] gene_bioproject                gene_books                    
-## [19] gene_cdd                       gene_gene_neighbors           
-## [21] gene_genereviews               gene_genome2                  
-## [23] gene_geoprofiles               gene_nuccore                  
-## [25] gene_nuccore_mgc               gene_nuccore_pos              
-## [27] gene_nuccore_refseqgene        gene_nuccore_refseqrna        
-## [29] gene_nucest                    gene_nucest_clust             
-## [31] gene_nucleotide                gene_nucleotide_clust         
-## [33] gene_nucleotide_mgc            gene_nucleotide_mgc_url       
-## [35] gene_nucleotide_pos            gene_omim                     
-## [37] gene_pcassay_proteintarget     gene_pccompound               
-## [39] gene_pcsubstance               gene_pmc                      
-## [41] gene_pmc_nucleotide            gene_protein                  
-## [43] gene_protein_refseq            gene_pubmed                   
-## [45] gene_pubmed_citedinomim        gene_pubmed_pmc_nucleotide    
-## [47] gene_pubmed_rif                gene_snp                      
-## [49] gene_snp_geneview              gene_taxonomy                 
-## [51] gene_unigene                   gene_varview</code></pre>
+## [19] gene_cdd                       gene_gene_h3k4me3             
+## [21] gene_gene_neighbors            gene_genereviews              
+## [23] gene_genome2                   gene_geoprofiles              
+## [25] gene_nuccore                   gene_nuccore_mgc              
+## [27] gene_nuccore_pos               gene_nuccore_refseqgene       
+## [29] gene_nuccore_refseqrna         gene_nucest                   
+## [31] gene_nucest_clust              gene_nucleotide               
+## [33] gene_nucleotide_clust          gene_nucleotide_mgc           
+## [35] gene_nucleotide_mgc_url        gene_nucleotide_pos           
+## [37] gene_omim                      gene_pcassay_proteintarget    
+## [39] gene_pccompound                gene_pcsubstance              
+## [41] gene_pmc                       gene_pmc_nucleotide           
+## [43] gene_protein                   gene_protein_refseq           
+## [45] gene_pubmed                    gene_pubmed_citedinomim       
+## [47] gene_pubmed_pmc_nucleotide     gene_pubmed_rif               
+## [49] gene_snp                       gene_snp_geneview             
+## [51] gene_sparcle                   gene_taxonomy                 
+## [53] gene_unigene                   gene_varview</code></pre>
 <p>The names of the list elements are in the format <code>[source_database]_[linked_database]</code> and the elements themselves contain a vector of linked-IDs. So, if we want to find open access publications associated with this gene we could get linked records in PubMed Central:</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_the_links$links$gene_pmc[<span class="dv">1</span>:<span class="dv">10</span>]</code></pre>
-<pre><code>##  [1] "5054717" "4944527" "4837062" "4769228" "4760399" "4751366" "4748078"
-##  [8] "4667289" "4648968" "4600482"</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_the_links$links$gene_pmc[<span class="dv">1</span>:<span class="dv">10</span>]</code></pre></div>
+<pre><code>##  [1] "5395029" "5104494" "5070722" "5054717" "4990654" "4944527" "4861488"
+##  [8] "4841699" "4841593" "4837062"</code></pre>
 <p>Or, if we were interested in this gene’s role in diseases, we could find links to ClinVar:</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_the_links$links$gene_clinvar</code></pre>
-<pre><code>##  [1] "253512" "253403" "236549" "236548" "236547" "221889" "160886"
-##  [8] "155682" "155309" "155093" "155053" "154360" "154063" "153438"
-## [15] "152839" "151388" "150018" "149551" "149418" "149160" "149035"
-## [22] "148411" "148262" "148180" "146125" "145984" "145474" "145468"
-## [29] "145332" "145107" "144677" "144194" "127268" "98242"  "98241" 
-## [36] "98240"  "98239"  "98238"  "98237"  "98236"  "98235"  "59247" 
-## [43] "59246"  "59245"  "59243"  "59226"  "59224"  "59223"  "59222" 
-## [50] "59221"  "59010"  "59005"  "59004"  "37145"  "32099"  "18106" 
-## [57] "18105"  "18104"  "18103"  "18102"  "18101"  "18100"  "18099" 
-## [64] "18098"  "18097"  "18096"  "18095"  "18094"  "18093"  "18092" 
-## [71] "18091"  "18090"  "18089"  "18088"  "18087"</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_the_links$links$gene_clinvar</code></pre></div>
+<pre><code>##   [1] "397432" "396808" "396332" "396150" "394309" "391496" "369359"
+##   [8] "339659" "339658" "339657" "339656" "339655" "339654" "339653"
+##  [15] "339652" "339651" "339650" "339649" "339648" "339647" "339646"
+##  [22] "339645" "339644" "339643" "339642" "339641" "339640" "339639"
+##  [29] "339638" "339637" "339636" "339635" "339634" "339633" "339632"
+##  [36] "339631" "339630" "339629" "339628" "339627" "339626" "339625"
+##  [43] "339624" "339623" "339622" "339621" "339620" "339619" "253512"
+##  [50] "253403" "236549" "236548" "236547" "221889" "160886" "155682"
+##  [57] "155309" "155093" "155053" "154360" "154063" "153438" "152839"
+##  [64] "151388" "150018" "149551" "149418" "149160" "149035" "148411"
+##  [71] "148262" "148180" "146125" "145984" "145474" "145468" "145332"
+##  [78] "145107" "144677" "144194" "127268" "98242"  "98241"  "98240" 
+##  [85] "98239"  "98238"  "98237"  "98236"  "98235"  "59247"  "59246" 
+##  [92] "59245"  "59243"  "59226"  "59224"  "59223"  "59222"  "59221" 
+##  [99] "59010"  "59005"  "59004"  "37145"  "32099"  "18106"  "18105" 
+## [106] "18104"  "18103"  "18102"  "18101"  "18100"  "18099"  "18098" 
+## [113] "18097"  "18096"  "18095"  "18094"  "18093"  "18092"  "18091" 
+## [120] "18090"  "18089"  "18088"  "18087"</code></pre>
 </div>
 <div id="narrowing-our-focus" class="section level3">
 <h3>Narrowing our focus</h3>
 <p>If we know beforehand what sort of links we’d like to find, we can use the <code>db</code> argument to narrow the focus of a call to <code>entrez_link</code>.</p>
 <p>For instance, say we are interested in knowing about all of the RNA transcripts associated with the Amyloid Beta Precursor gene in humans. Transcript sequences are stored in the nucleotide database (referred to as <code>nuccore</code> in EUtils), so to find transcripts associated with a given gene we need to set <code>dbfrom=gene</code> and <code>db=nuccore</code>.</p>
-<pre class="sourceCode r"><code class="sourceCode r">nuc_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">'gene'</span>, <span class="dt">id=</span><span class="dv">351</span>, <span class="dt">db=</span><span class="st">'nuccore'</span>)
-nuc_links</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">nuc_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">'gene'</span>, <span class="dt">id=</span><span class="dv">351</span>, <span class="dt">db=</span><span class="st">'nuccore'</span>)
+nuc_links</code></pre></div>
 <pre><code>## elink object with contents:
 ##  $links: IDs for linked records from NCBI
 ## </code></pre>
-<pre class="sourceCode r"><code class="sourceCode r">nuc_links$links</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">nuc_links$links</code></pre></div>
 <pre><code>## elink result with information from 5 databases:
 ## [1] gene_nuccore            gene_nuccore_mgc        gene_nuccore_pos       
 ## [4] gene_nuccore_refseqgene gene_nuccore_refseqrna</code></pre>
 <p>The object we get back contains links to the nucleotide database generally, but also to special subsets of that database like <a href="http://www.ncbi.nlm.nih.gov/refseq/">refseq</a>. We can take advantage of this narrower set of links to find IDs that match unique transcripts from our gene of interest.</p>
-<pre class="sourceCode r"><code class="sourceCode r">nuc_links$links$gene_nuccore_refseqrna</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">nuc_links$links$gene_nuccore_refseqrna</code></pre></div>
 <pre><code>##  [1] "324021747" "324021746" "324021739" "324021737" "324021735"
 ##  [6] "228008405" "228008404" "228008403" "228008402" "228008401"</code></pre>
 <p>We can use these ids in calls to <code>entrez_fetch()</code> or <code>entrez_summary()</code> to learn more about the transcripts they represent.</p>
@@ -346,16 +383,16 @@ nuc_links</code></pre>
 <div id="external-links" class="section level3">
 <h3>External links</h3>
 <p>In addition to finding data within the NCBI, <code>entrez_link</code> can turn up connections to external databases. Perhaps the most interesting example is finding links to the full text of papers in PubMed. For example, when I wrote this document the first paper linked to Amyloid Beta Precursor had a unique ID of <code>25500142</code>. We can find links to the full text of that paper with <code>entrez_link</code> by setting the <code>cmd</code> argument to ‘llinks’:</p>
-<pre class="sourceCode r"><code class="sourceCode r">paper_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"pubmed"</span>, <span class="dt">id=</span><span class="dv">25500142</span>, <span class="dt">cmd=</span><span class="st">"llinks"</span>)
-paper_links</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">paper_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"pubmed"</span>, <span class="dt">id=</span><span class="dv">25500142</span>, <span class="dt">cmd=</span><span class="st">"llinks"</span>)
+paper_links</code></pre></div>
 <pre><code>## elink object with contents:
 ##  $linkouts: links to external websites</code></pre>
 <p>Each element of the <code>linkouts</code> object contains information about an external source of data on this paper:</p>
-<pre class="sourceCode r"><code class="sourceCode r">paper_links$linkouts</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">paper_links$linkouts</code></pre></div>
 <pre><code>## $ID_25500142
 ## $ID_25500142[[1]]
 ## Linkout from Elsevier Science 
-##  $Url: http://linkinghub.elsevier ...
+##  $Url: https://linkinghub.elsevie ...
 ## 
 ## $ID_25500142[[2]]
 ## Linkout from Europe PubMed Central 
@@ -370,32 +407,37 @@ paper_links</code></pre>
 ##  $Url: https://www.ncbi.nlm.nih.g ...
 ## 
 ## $ID_25500142[[5]]
+## Linkout from PubMed Central Canada 
+##  $Url: http://pubmedcentralcanada ...
+## 
+## $ID_25500142[[6]]
 ## Linkout from MedlinePlus Health Information 
 ##  $Url: https://medlineplus.gov/al ...
 ## 
-## $ID_25500142[[6]]
+## $ID_25500142[[7]]
 ## Linkout from Mouse Genome Informatics (MGI) 
 ##  $Url: http://www.informatics.jax ...</code></pre>
 <p>Each of those linkout objects contains quite a lot of information, but the URL is probably the most useful. For that reason, <code>rentrez</code> provides the function <code>linkout_urls</code> to make extracting just the URL simple:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">linkout_urls</span>(paper_links)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">linkout_urls</span>(paper_links)</code></pre></div>
 <pre><code>## $ID_25500142
-## [1] "http://linkinghub.elsevier.com/retrieve/pii/S0014-4886(14)00393-8"      
+## [1] "https://linkinghub.elsevier.com/retrieve/pii/S0014-4886(14)00393-8"     
 ## [2] "http://europepmc.org/abstract/MED/25500142"                             
 ## [3] "http://ovidsp.ovid.com/ovidweb.cgi?T=JS&PAGE=linkout&SEARCH=25500142.ui"
 ## [4] "https://www.ncbi.nlm.nih.gov/pmc/articles/pmid/25500142/"               
-## [5] "https://medlineplus.gov/alzheimersdisease.html"                         
-## [6] "http://www.informatics.jax.org/marker/reference/25500142"</code></pre>
+## [5] "http://pubmedcentralcanada.ca/pmcc/articles/pmid/25500142"              
+## [6] "https://medlineplus.gov/alzheimersdisease.html"                         
+## [7] "http://www.informatics.jax.org/reference/25500142"</code></pre>
 <p>The full list of options for the <code>cmd</code> argument is given in the in-line documentation (<code>?entrez_link</code>). If you are interested in finding full-text records for a large number of articles, check out the package <a href="https://github.com/ropensci/fulltext">fulltext</a>, which makes use of multiple sources (including the NCBI) to discover full-text articles.</p>
 </div>
 <div id="using-more-than-one-id" class="section level3">
 <h3>Using more than one ID</h3>
 <p>It is possible to pass more than one ID to <code>entrez_link()</code>. By default, doing so will give you a single elink object containing the complete set of links for <em>all</em> of the IDs that you specified. So, if you were looking for protein IDs related to specific genes you could do:</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_links_together  <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">db=</span><span class="st">"protein"</span>, <span class="dt">dbfrom=</span><span class="st">"gene"</span>, <span class="dt">id=</span><span class="kw">c</span>(<span class="st">"93100"</span>, <span class="st">"223646"</span>))
-all_links_together</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_links_together  <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">db=</span><span class="st">"protein"</span>, <span class="dt">dbfrom=</span><span class="st">"gene"</span>, <span class="dt">id=</span><span class="kw">c</span>(<span class="st">"93100"</span>, <span class="st">"223646"</span>))
+all_links_together</code></pre></div>
 <pre><code>## elink object with contents:
 ##  $links: IDs for linked records from NCBI
 ## </code></pre>
-<pre class="sourceCode r"><code class="sourceCode r">all_links_together$links$gene_protein</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_links_together$links$gene_protein</code></pre></div>
 <pre><code>##  [1] "1034662002" "1034662000" "1034661998" "1034661996" "1034661994"
 ##  [6] "1034661992" "558472750"  "545685826"  "194394158"  "166221824" 
 ## [11] "154936864"  "148697547"  "148697546"  "122346659"  "119602646" 
@@ -404,12 +446,12 @@ all_links_together</code></pre>
 ## [26] "37787305"   "37589273"   "33991172"   "31982089"   "26339824"  
 ## [31] "26329351"   "21619615"   "10834676"</code></pre>
 <p>Although this behaviour might sometimes be useful, it means we’ve lost track of which <code>protein</code> ID is linked to which <code>gene</code> ID. To retain that information we can set <code>by_id</code> to <code>TRUE</code>. This gives us a list of elink objects, each one containing links from a single <code>gene</code> ID:</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_links_sep  <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">db=</span><span class="st">"protein"</span>, <span class="dt">dbfrom=</span><span class="st">"gene"</span>, <span class="dt">id=</span><span class="kw">c</span>(<span class="st">"93100"</span>, <span class="st">"223646"</span>), <span class="dt">by_id=</span><span class="ot">TRUE</span>)
-all_links_sep</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_links_sep  <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">db=</span><span class="st">"protein"</span>, <span class="dt">dbfrom=</span><span class="st">"gene"</span>, <span class="dt">id=</span><span class="kw">c</span>(<span class="st">"93100"</span>, <span class="st">"223646"</span>), <span class="dt">by_id=</span><span class="ot">T [...]
+all_links_sep</code></pre></div>
 <pre><code>## List of 2 elink objects,each containing
 ##   $links: IDs for linked records from NCBI
 ## </code></pre>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">lapply</span>(all_links_sep, function(x) x$links$gene_protein)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">lapply</span>(all_links_sep, function(x) x$links$gene_protein)</code></pre></div>
 <pre><code>## [[1]]
 ##  [1] "1034662002" "1034662000" "1034661998" "1034661996" "1034661994"
 ##  [6] "1034661992" "558472750"  "545685826"  "194394158"  "166221824" 
@@ -428,8 +470,8 @@ all_links_sep</code></pre>
 <div id="the-summary-record" class="section level3">
 <h3>The summary record</h3>
 <p><code>entrez_summary()</code> takes a vector of unique IDs for the records you want to get summary information from. Let’s start by finding out something about the paper describing <a href="https://github.com/ropensci/taxize">Taxize</a>, using its PubMed ID:</p>
-<pre class="sourceCode r"><code class="sourceCode r">taxize_summ <-<span class="st"> </span><span class="kw">entrez_summary</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">id=</span><span class="dv">24555091</span>)
-taxize_summ</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">taxize_summ <-<span class="st"> </span><span class="kw">entrez_summary</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">id=</span><span class="dv">24555091</span>)
+taxize_summ</code></pre></div>
 <pre><code>## esummary result with 42 items:
 ##  [1] uid               pubdate           epubdate         
 ##  [4] source            authors           lastauthor       
@@ -446,7 +488,7 @@ taxize_summ</code></pre>
 ## [37] doccontriblist    docdate           bookname         
 ## [40] chapter           sortpubdate       sortfirstauthor</code></pre>
 <p>Once again, the object returned by <code>entrez_summary</code> behaves like a list, so you can extract elements using <code>$</code>. For instance, we could convert our PubMed ID to another article identifier…</p>
-<pre class="sourceCode r"><code class="sourceCode r">taxize_summ$articleids</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">taxize_summ$articleids</code></pre></div>
 <pre><code>##       idtype idtypen                           value
 ## 1     pubmed       1                        24555091
 ## 2        doi       3 10.12688/f1000research.2-191.v2
@@ -457,17 +499,17 @@ taxize_summ</code></pre>
 ## 7 version-id       8                               2
 ## 8      pmcid       5             pmc-id: PMC3901538;</code></pre>
 <p>…or see how many times the article has been cited in PubMed Central papers:</p>
-<pre class="sourceCode r"><code class="sourceCode r">taxize_summ$pmcrefcount</code></pre>
-<pre><code>## [1] 7</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">taxize_summ$pmcrefcount</code></pre></div>
+<pre><code>## [1] 10</code></pre>
 </div>
 <div id="dealing-with-many-records" class="section level3">
 <h3>Dealing with many records</h3>
 <p>If you give <code>entrez_summary()</code> a vector with more than one ID you’ll get a list of summary records back. Let’s get those <em>Plasmodium vivax</em> papers we found in the <code>entrez_search()</code> section back, and fetch some summary data on each paper:</p>
-<pre class="sourceCode r"><code class="sourceCode r">vivax_search <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db =</span> <span class="st">"pubmed"</span>,
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">vivax_search <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db =</span> <span class="st">"pubmed"</span>,
                               <span class="dt">term =</span> <span class="st">"(vivax malaria[MeSH]) AND (folic acid antagonists[MeSH])"</span>)
-multi_summs <-<span class="st"> </span><span class="kw">entrez_summary</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">id=</span>vivax_search$ids)</code></pre>
+multi_summs <-<span class="st"> </span><span class="kw">entrez_summary</span>(<span class="dt">db=</span><span class="st">"pubmed"</span>, <span class="dt">id=</span>vivax_search$ids)</code></pre></div>
 <p><code>rentrez</code> provides a helper function, <code>extract_from_esummary()</code> that takes one or more elements from every summary record in one of these lists. Here it is working with one…</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">extract_from_esummary</span>(multi_summs, <span class="st">"fulljournalname"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">extract_from_esummary</span>(multi_summs, <span class="st">"fulljournalname"</span>)</code></pre></div>
 <pre><code>##                                                                                                                 24861816 
 ## "Infection, genetics and evolution : journal of molecular epidemiology and evolutionary genetics in infectious diseases" 
 ##                                                                                                                 24145518 
@@ -493,8 +535,8 @@ multi_summs <-<span class="st"> </span><span class="kw">entrez_summary</span>
 ##                                                                                                                 12374849 
 ##                                        "Proceedings of the National Academy of Sciences of the United States of America"</code></pre>
 <p>… and several elements:</p>
-<pre class="sourceCode r"><code class="sourceCode r">date_and_cite <-<span class="st"> </span><span class="kw">extract_from_esummary</span>(multi_summs, <span class="kw">c</span>(<span class="st">"pubdate"</span>, <span class="st">"pmcrefcount"</span>,  <span class="st">"title"</span>))
-knitr::<span class="kw">kable</span>(<span class="kw">head</span>(<span class="kw">t</span>(date_and_cite)), <span class="dt">row.names=</span><span class="ot">FALSE</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">date_and_cite <-<span class="st"> </span><span class="kw">extract_from_esummary</span>(multi_summs, <span class="kw">c</span>(<span class="st">"pubdate"</span>, <span class="st">"pmcrefcount"</span>,  <span class="st">"title"</span>))
+knitr::<span class="kw">kable</span>(<span class="kw">head</span>(<span class="kw">t</span>(date_and_cite)), <span class="dt">row.names=</span><span class="ot">FALSE</span>)</code></pre></div>
 <table>
 <thead>
 <tr class="header">
@@ -521,7 +563,7 @@ knitr::<span class="kw">kable</span>(<span class="kw">head</span>(<span class="k
 </tr>
 <tr class="even">
 <td align="left">2012 Dec</td>
-<td align="left">11</td>
+<td align="left">13</td>
 <td align="left">Prevalence of drug resistance-associated gene mutations in Plasmodium vivax in Central China.</td>
 </tr>
 <tr class="odd">
@@ -531,7 +573,7 @@ knitr::<span class="kw">kable</span>(<span class="kw">head</span>(<span class="k
 </tr>
 <tr class="even">
 <td align="left">2010 Sep</td>
-<td align="left">15</td>
+<td align="left">17</td>
 <td align="left">Mutations in the antifolate-resistance-associated genes dihydrofolate reductase and dihydropteroate synthase in Plasmodium vivax isolates from malaria-endemic countries.</td>
 </tr>
 </tbody>
@@ -544,20 +586,20 @@ knitr::<span class="kw">kable</span>(<span class="kw">head</span>(<span class="k
 <div id="fetch-dna-sequences-in-fasta-format" class="section level3">
 <h3>Fetch DNA sequences in fasta format</h3>
 <p>Let’s extend the example given in the <code>entrez_link()</code> section about finding transcripts for a given gene. This time we will fetch cDNA sequences of those transcripts. We can start by repeating the steps in the earlier example to get nucleotide IDs for RefSeq transcripts of two genes:</p>
-<pre class="sourceCode r"><code class="sourceCode r">gene_ids <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">351</span>, <span class="dv">11647</span>)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">gene_ids <-<span class="st"> </span><span class="kw">c</span>(<span class="dv">351</span>, <span class="dv">11647</span>)
 linked_seq_ids <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"gene"</span>, <span class="dt">id=</span>gene_ids, <span class="dt">db=</span><span class="st">"nuccore"</span>)
 linked_transripts <-<span class="st"> </span>linked_seq_ids$links$gene_nuccore_refseqrna
-<span class="kw">head</span>(linked_transripts)</code></pre>
+<span class="kw">head</span>(linked_transripts)</code></pre></div>
 <pre><code>## [1] "1039766414" "1039766413" "1039766411" "1039766410" "1039766409"
 ## [6] "563317856"</code></pre>
 <p>Now we can get our sequences with <code>entrez_fetch</code>, setting <code>rettype</code> to “fasta” (the list of formats available for <a href="http://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/">each database is given in this table</a>):</p>
-<pre class="sourceCode r"><code class="sourceCode r">all_recs <-<span class="st"> </span><span class="kw">entrez_fetch</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">id=</span>linked_transripts, <span class="dt">rettype=</span><span class="st">"fasta"</span>)
-<span class="kw">class</span>(all_recs)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">all_recs <-<span class="st"> </span><span class="kw">entrez_fetch</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">id=</span>linked_transripts, <span class="dt">rettype=</span><span class="st">"fasta"</span>)
+<span class="kw">class</span>(all_recs)</code></pre></div>
 <pre><code>## [1] "character"</code></pre>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">nchar</span>(all_recs)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">nchar</span>(all_recs)</code></pre></div>
 <pre><code>## [1] 55183</code></pre>
 <p>Congratulations, now you have a really huge character vector! Rather than printing all those thousands of bases we can take a peek at the top of the file:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">cat</span>(<span class="kw">strwrap</span>(<span class="kw">substr</span>(all_recs, <span class="dv">1</span>, <span class="dv">500</span>)), <span class="dt">sep=</span><span class="st">"</span><span class="ch">\n</span><span class="st">"</span>)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">cat</span>(<span class="kw">strwrap</span>(<span class="kw">substr</span>(all_recs, <span class="dv">1</span>, <span class="dv">500</span>)), <span class="dt">sep=</span><span class="st">"</span><span class="ch">\n</span><span class="st">"</span>)</code></pre></div>
 <pre><code>## >XM_006538500.2 PREDICTED: Mus musculus alkaline phosphatase,
 ## liver/bone/kidney (Alpl), transcript variant X5, mRNA
 ## GCGCCCGTGGCTTGCGCGACTCCCACGCGCGCGCTCCGCCGGTCCCGCAGTGACTGTCCCAGCCACGGTG
@@ -567,30 +609,31 @@ linked_transripts <-<span class="st"> </span>linked_seq_ids$links$gene_nuccor
 ## GTTGGTGTCTAAAGTAGTTGGGGAGCAGCAGGAAGAAGGCACGTGCTGCGATCTTTGGCGGGAGAGATCG
 ## GAGACCGCGTGCTAGTGTCTGTCTGAGAG</code></pre>
 <p>If we wanted to use these sequences in some other application we could write them to file:</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">write</span>(all_recs, <span class="dt">file=</span><span class="st">"my_transcripts.fasta"</span>)</code></pre>
-<p>Alternatively, if you want to use them within an R session<br />we could write them to a temporary file then read that. In this case I’m using <code>read.dna()</code> from the pylogenetics package ape (but not executing the code block in this vignette, so you don’t have to install that package):</p>
-<pre class="sourceCode r"><code class="sourceCode r">temp <-<span class="st"> </span><span class="kw">tempfile</span>()
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">write</span>(all_recs, <span class="dt">file=</span><span class="st">"my_transcripts.fasta"</span>)</code></pre></div>
+<p>Alternatively, if you want to use them within an R session
+you could write them to a temporary file and then read that file back in. In this case I’m using <code>read.dna()</code> from the phylogenetics package ape (but not executing the code block in this vignette, so you don’t have to install that package):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">temp <-<span class="st"> </span><span class="kw">tempfile</span>()
 <span class="kw">write</span>(all_recs, temp)
-parsed_recs <-<span class="st"> </span>ape::<span class="kw">read.dna</span>(all_recs, temp)</code></pre>
+parsed_recs <-<span class="st"> </span>ape::<span class="kw">read.dna</span>(temp, <span class="dt">format=</span><span class="st">"fasta"</span>)</code></pre></div>
 </div>
 <div id="fetch-a-parsed-xml-document" class="section level3">
 <h3>Fetch a parsed XML document</h3>
 <p>Most of the NCBI’s databases can return records in XML format. In addition to downloading the text representation of these files, <code>entrez_fetch()</code> can return objects parsed by the <code>XML</code> package. As an example, we can check out the Taxonomy database’s record for (did I mention they are amazing….) <em>Tetrahymena thermophila</em>, specifying we want the result to be parsed by setting <code>parsed=TRUE</code>:</p>
-<pre class="sourceCode r"><code class="sourceCode r">Tt <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"taxonomy"</span>, <span class="dt">term=</span><span class="st">"(Tetrahymena thermophila[ORGN]) AND Species[RANK]"</span>)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">Tt <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"taxonomy"</span>, <span class="dt">term=</span><span class="st">"(Tetrahymena thermophila[ORGN]) AND Species[RANK]"</span>)
 tax_rec <-<span class="st"> </span><span class="kw">entrez_fetch</span>(<span class="dt">db=</span><span class="st">"taxonomy"</span>, <span class="dt">id=</span>Tt$ids, <span class="dt">rettype=</span><span class="st">"xml"</span>, <span class="dt">parsed=</span><span class="ot">TRUE</span>)
-<span class="kw">class</span>(tax_rec)</code></pre>
+<span class="kw">class</span>(tax_rec)</code></pre></div>
 <pre><code>## [1] "XMLInternalDocument" "XMLAbstractDocument"</code></pre>
 <p>The package XML (which you have if you have installed <code>rentrez</code>) provides functions to get information from these files. For relatively simple records like this one you can use <code>XML::xmlToList</code>:</p>
-<pre class="sourceCode r"><code class="sourceCode r">tax_list <-<span class="st"> </span>XML::<span class="kw">xmlToList</span>(tax_rec)
-tax_list$Taxon$GeneticCode</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">tax_list <-<span class="st"> </span>XML::<span class="kw">xmlToList</span>(tax_rec)
+tax_list$Taxon$GeneticCode</code></pre></div>
 <pre><code>## $GCId
 ## [1] "6"
 ## 
 ## $GCName
 ## [1] "Ciliate Nuclear; Dasycladacean Nuclear; Hexamita Nuclear"</code></pre>
 <p>For more complex records, which generate deeply-nested lists, you can use <a href="https://en.wikipedia.org/wiki/XPath">XPath expressions</a> along with the function <code>XML::xpathSApply</code> or the extraction operators <code>[</code> and <code>[[</code> to extract specific parts of the file. For instance, we can get the scientific name of each taxon in <em>T. thermophila</em>’s lineage by specifying a path through the XML:</p>
-<pre class="sourceCode r"><code class="sourceCode r">tt_lineage <-<span class="st"> </span>tax_rec[<span class="st">"//LineageEx/Taxon/ScientificName"</span>]
-tt_lineage[<span class="dv">1</span>:<span class="dv">4</span>]</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">tt_lineage <-<span class="st"> </span>tax_rec[<span class="st">"//LineageEx/Taxon/ScientificName"</span>]
+tt_lineage[<span class="dv">1</span>:<span class="dv">4</span>]</code></pre></div>
 <pre><code>## [[1]]
 ## <ScientificName>cellular organisms</ScientificName> 
 ## 
@@ -603,7 +646,7 @@ tt_lineage[<span class="dv">1</span>:<span class="dv">4</span>]</code></pre>
 ## [[4]]
 ## <ScientificName>Ciliophora</ScientificName></code></pre>
 <p>As the name suggests, <code>XML::xpathSApply()</code> is a counterpart of base R’s <code>sapply</code>, and can be used to apply a function to nodes in an XML object. A particularly useful function to apply is <code>XML::xmlValue</code>, which returns the content of the node:</p>
-<pre class="sourceCode r"><code class="sourceCode r">XML::<span class="kw">xpathSApply</span>(tax_rec, <span class="st">"//LineageEx/Taxon/ScientificName"</span>, XML::xmlValue)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">XML::<span class="kw">xpathSApply</span>(tax_rec, <span class="st">"//LineageEx/Taxon/ScientificName"</span>, XML::xmlValue)</code></pre></div>
 <pre><code>##  [1] "cellular organisms" "Eukaryota"          "Alveolata"         
 ##  [4] "Ciliophora"         "Intramacronucleata" "Oligohymenophorea" 
 ##  [7] "Hymenostomatida"    "Tetrahymenina"      "Tetrahymenidae"    
@@ -618,49 +661,50 @@ tt_lineage[<span class="dv">1</span>:<span class="dv">4</span>]</code></pre>
 <div id="post-a-set-of-ids-to-the-ncbi-for-later-use-entrez_post" class="section level3">
 <h3>Post a set of IDs to the NCBI for later use: <code>entrez_post()</code></h3>
 <p>If you have a list of many NCBI IDs that you want to use later on, you can post them to the NCBI’s servers. In order to provide a brief example, I’m going to post just one ID, the <code>omim</code> identifier for asthma:</p>
-<pre class="sourceCode r"><code class="sourceCode r">upload <-<span class="st"> </span><span class="kw">entrez_post</span>(<span class="dt">db=</span><span class="st">"omim"</span>, <span class="dt">id=</span><span class="dv">600807</span>)
-upload</code></pre>
-<pre><code>## Web history object (QueryKey = 1, WebEnv = NCID_1_54492...)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">upload <-<span class="st"> </span><span class="kw">entrez_post</span>(<span class="dt">db=</span><span class="st">"omim"</span>, <span class="dt">id=</span><span class="dv">600807</span>)
+upload</code></pre></div>
+<pre><code>## Web history object (QueryKey = 1, WebEnv = NCID_1_20644...)</code></pre>
 <p>The NCBI sends you back some information you can use to refer to the posted IDs. In <code>rentrez</code>, that information is represented as a <code>web_history</code> object.</p>
+<p>Note that if you have a very long list of IDs you may receive an HTTP 414 (URI Too Long) error when you try to upload them. If you have such a list (and the IDs come from an external source rather than from a search that can be saved to a <code>web_history</code> object), you may have to ‘chunk’ the IDs into smaller sets that can be processed separately.</p>
 </div>
 <div id="get-a-web_history-object-from-entrez_search-or-entrez_link" class="section level3">
 <h3>Get a <code>web_history</code> object from <code>entrez_search</code> or <code>entrez_link()</code></h3>
 <p>In addition to directly uploading IDs to the NCBI, you can use the web history features with <code>entrez_search</code> and <code>entrez_link</code>. For instance, imagine you wanted to find all of the sequences of the widely-studied gene COI from all snails (which are members of the taxonomic group Gastropoda):</p>
-<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">term=</span><span class="st">"COI[Gene] AND Gastropoda[ORGN]"</span>)</code></pre>
-<pre><code>## Entrez search result with 61927 hits (object contains 20 IDs and no web_history object)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">term=</span><span class="st">"COI[Gene] AND Gastropoda[ORGN]"</span>)</code></pre></div>
+<pre><code>## Entrez search result with 67898 hits (object contains 20 IDs and no web_history object)
 ##  Search term (as translated):  COI[Gene] AND "Gastropoda"[Organism]</code></pre>
 <p>That’s a lot of sequences! If you really wanted to download all of these it would be a good idea to save all those IDs to the server by setting <code>use_history</code> to <code>TRUE</code> (note you now get a <code>web_history</code> object along with your normal search result):</p>
-<pre class="sourceCode r"><code class="sourceCode r">snail_coi <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">term=</span><span class="st">"COI[Gene] AND Gastropoda[ORGN]"</span>, <span class="dt">use_history=</span><span class="ot">TRUE</span>)
-snail_coi</code></pre>
-<pre><code>## Entrez search result with 61927 hits (object contains 20 IDs and a web_history object)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">snail_coi <-<span class="st"> </span><span class="kw">entrez_search</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">term=</span><span class="st">"COI[Gene] AND Gastropoda[ORGN]"</span>, <span class="dt">use_history=</span><span class="ot">TRUE</span>)
+snail_coi</code></pre></div>
+<pre><code>## Entrez search result with 67898 hits (object contains 20 IDs and a web_history object)
 ##  Search term (as translated):  COI[Gene] AND "Gastropoda"[Organism]</code></pre>
-<pre class="sourceCode r"><code class="sourceCode r">snail_coi$web_history</code></pre>
-<pre><code>## Web history object (QueryKey = 1, WebEnv = NCID_1_54493...)</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">snail_coi$web_history</code></pre></div>
+<pre><code>## Web history object (QueryKey = 1, WebEnv = NCID_1_20752...)</code></pre>
 <p>Similarly, <code>entrez_link()</code> can return <code>web_history</code> objects when you set <code>cmd</code> to <code>neighbor_history</code>. Let’s find genetic variants (from the clinvar database) associated with asthma (using the same OMIM ID we identified earlier):</p>
-<pre class="sourceCode r"><code class="sourceCode r">asthma_clinvar <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"omim"</span>, <span class="dt">db=</span><span class="st">"clinvar"</span>, <span class="dt">cmd=</span><span class="st">"neighbor_history"</span>, <span class="dt">id=</span><span class="dv">600807</span>)
-asthma_clinvar$web_histories</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">asthma_clinvar <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"omim"</span>, <span class="dt">db=</span><span class="st">"clinvar"</span>, <span class="dt">cmd=</span><span class="st">"neighbor_history"</span>, <span class="dt">id=</span><span class="dv">600807</span>)
+asthma_clinvar$web_histories</code></pre></div>
 <pre><code>## $omim_clinvar
-## Web history object (QueryKey = 1, WebEnv = NCID_1_51728...)</code></pre>
+## Web history object (QueryKey = 1, WebEnv = NCID_1_20752...)</code></pre>
 <p>As you can see, instead of returning lists of IDs for each linked database (as it would by default), <code>entrez_link()</code> now returns a list of <code>web_history</code> objects.</p>
 </div>
 <div id="use-a-web_history-object" class="section level3">
 <h3>Use a <code>web_history</code> object</h3>
 <p>Once you have those IDs stored on the NCBI’s servers, you are going to want to do something with them. The functions <code>entrez_fetch()</code>, <code>entrez_summary()</code> and <code>entrez_link()</code> can all use <code>web_history</code> objects in exactly the same way they use IDs.</p>
 <p>So, we could repeat the last example (finding variants linked to asthma), but this time using the ID we uploaded earlier:</p>
-<pre class="sourceCode r"><code class="sourceCode r">asthma_variants <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"omim"</span>, <span class="dt">db=</span><span class="st">"clinvar"</span>, <span class="dt">cmd=</span><span class="st">"neighbor_history"</span>, <span class="dt">web_history=</span>upload)
-asthma_variants</code></pre>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">asthma_variants <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"omim"</span>, <span class="dt">db=</span><span class="st">"clinvar"</span>, <span class="dt">cmd=</span><span class="st">"neighbor_history"</span>, <span class="dt">web_history=</span>upload)
+asthma_variants</code></pre></div>
 <pre><code>## elink object with contents:
 ##  $web_histories: Objects containing web history information</code></pre>
 <p>… if we want to get some genetic information about these variants we need to map our clinvar IDs to SNP IDs:</p>
-<pre class="sourceCode r"><code class="sourceCode r">snp_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"clinvar"</span>, <span class="dt">db=</span><span class="st">"snp"</span>, 
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">snp_links <-<span class="st"> </span><span class="kw">entrez_link</span>(<span class="dt">dbfrom=</span><span class="st">"clinvar"</span>, <span class="dt">db=</span><span class="st">"snp"</span>, 
                          <span class="dt">web_history=</span>asthma_variants$web_histories$omim_clinvar,
                          <span class="dt">cmd=</span><span class="st">"neighbor_history"</span>)
 snp_summ <-<span class="st"> </span><span class="kw">entrez_summary</span>(<span class="dt">db=</span><span class="st">"snp"</span>, <span class="dt">web_history=</span>snp_links$web_histories$clinvar_snp)
-knitr::<span class="kw">kable</span>(<span class="kw">extract_from_esummary</span>(snp_summ, <span class="kw">c</span>(<span class="st">"chr"</span>, <span class="st">"fxn_class"</span>, <span class="st">"global_maf"</span>)))</code></pre>
+knitr::<span class="kw">kable</span>(<span class="kw">extract_from_esummary</span>(snp_summ, <span class="kw">c</span>(<span class="st">"chr"</span>, <span class="st">"fxn_class"</span>, <span class="st">"global_maf"</span>)))</code></pre></div>
 <table>
 <thead>
 <tr class="header">
-<th align="left"></th>
+<th></th>
 <th align="left">41364547</th>
 <th align="left">11558538</th>
 <th align="left">2303067</th>
@@ -669,21 +713,21 @@ knitr::<span class="kw">kable</span>(<span class="kw">extract_from_esummary</spa
 </thead>
 <tbody>
 <tr class="odd">
-<td align="left">chr</td>
+<td>chr</td>
 <td align="left">11</td>
 <td align="left">2</td>
 <td align="left">5</td>
 <td align="left">5</td>
 </tr>
 <tr class="even">
-<td align="left">fxn_class</td>
+<td>fxn_class</td>
 <td align="left">intron-variant,utr-variant-5-prime</td>
-<td align="left">missense,reference,utr-variant-5-prime</td>
+<td align="left">intron-variant,missense,nc-transcript-variant,reference,utr-variant-5-prime</td>
 <td align="left">missense,reference</td>
 <td align="left">missense,reference</td>
 </tr>
 <tr class="odd">
-<td align="left">global_maf</td>
+<td>global_maf</td>
 <td align="left">A=0.0036/18</td>
 <td align="left">T=0.0595/298</td>
 <td align="left">G=0.4331/2169</td>
@@ -693,12 +737,12 @@ knitr::<span class="kw">kable</span>(<span class="kw">extract_from_esummary</spa
 </table>
 <p>If you really wanted to, you could also use <code>web_history</code> objects to download all those thousands of COI sequences. When downloading large sets of data, it is a good idea to take advantage of the arguments <code>retmax</code> and <code>retstart</code> to split the request up into smaller chunks. For instance, we could get the first 200 sequences in 50-sequence chunks:</p>
 <p>(note: this code block is not executed as part of the vignette to save time and bandwidth):</p>
-<pre class="sourceCode r"><code class="sourceCode r">for( seq_start in <span class="kw">seq</span>(<span class="dv">1</span>,<span class="dv">200</span>,<span class="dv">50</span>)){
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">for( seq_start in <span class="kw">seq</span>(<span class="dv">1</span>,<span class="dv">200</span>,<span class="dv">50</span>)){
     recs <-<span class="st"> </span><span class="kw">entrez_fetch</span>(<span class="dt">db=</span><span class="st">"nuccore"</span>, <span class="dt">web_history=</span>snail_coi$web_history,
                          <span class="dt">rettype=</span><span class="st">"fasta"</span>, <span class="dt">retmax=</span><span class="dv">50</span>, <span class="dt">retstart=</span>seq_start)
     <span class="kw">cat</span>(recs, <span class="dt">file=</span><span class="st">"snail_coi.fasta"</span>, <span class="dt">append=</span><span class="ot">TRUE</span>)
     <span class="kw">cat</span>(seq_start<span class="dv">+49</span>, <span class="st">"sequences downloaded</span><span class="ch">\r</span><span class="st">"</span>)
-}</code></pre>
+}</code></pre></div>
 </div>
 </div>
 <div id="what-next" class="section level2">
@@ -713,7 +757,7 @@ knitr::<span class="kw">kable</span>(<span class="kw">extract_from_esummary</spa
   (function () {
     var script = document.createElement("script");
     script.type = "text/javascript";
-    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    script.src  = "https://mathjax.rstudio.com/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
     document.getElementsByTagName("head")[0].appendChild(script);
   })();
 </script>
diff --git a/man/entrez_citmatch.Rd b/man/entrez_citmatch.Rd
index d27fcc5..ea989e4 100644
--- a/man/entrez_citmatch.Rd
+++ b/man/entrez_citmatch.Rd
@@ -37,4 +37,3 @@ entrez_citmatch(ex_cites)
 \seealso{
 \code{\link[httr]{config}} for available configs
 }
-
diff --git a/man/entrez_db_links.Rd b/man/entrez_db_links.Rd
index b0eb60e..cf7e6f1 100644
--- a/man/entrez_db_links.Rd
+++ b/man/entrez_db_links.Rd
@@ -41,4 +41,3 @@ Other einfo: \code{\link{entrez_db_searchable}},
   \code{\link{entrez_db_summary}},
   \code{\link{entrez_dbs}}, \code{\link{entrez_info}}
 }
-
diff --git a/man/entrez_db_searchable.Rd b/man/entrez_db_searchable.Rd
index d96d018..846e59c 100644
--- a/man/entrez_db_searchable.Rd
+++ b/man/entrez_db_searchable.Rd
@@ -38,4 +38,3 @@ Other einfo: \code{\link{entrez_db_links}},
   \code{\link{entrez_db_summary}},
   \code{\link{entrez_dbs}}, \code{\link{entrez_info}}
 }
-
diff --git a/man/entrez_db_summary.Rd b/man/entrez_db_summary.Rd
index 242f3c5..da17723 100644
--- a/man/entrez_db_summary.Rd
+++ b/man/entrez_db_summary.Rd
@@ -37,4 +37,3 @@ Other einfo: \code{\link{entrez_db_links}},
   \code{\link{entrez_db_searchable}},
   \code{\link{entrez_dbs}}, \code{\link{entrez_info}}
 }
-
diff --git a/man/entrez_dbs.Rd b/man/entrez_dbs.Rd
index 490563c..ae5aaf1 100644
--- a/man/entrez_dbs.Rd
+++ b/man/entrez_dbs.Rd
@@ -26,4 +26,3 @@ Other einfo: \code{\link{entrez_db_links}},
   \code{\link{entrez_db_summary}},
   \code{\link{entrez_info}}
 }
-
diff --git a/man/entrez_fetch.Rd b/man/entrez_fetch.Rd
index 87fa96c..0704b99 100644
--- a/man/entrez_fetch.Rd
+++ b/man/entrez_fetch.Rd
@@ -10,7 +10,9 @@ entrez_fetch(db, id = NULL, web_history = NULL, rettype, retmode = "",
 \arguments{
 \item{db}{character, name of the database to use}
 
-\item{id}{vector (numeric or character), unique ID(s) for records in database \code{db}}
+\item{id}{vector (numeric or character), unique ID(s) for records in database
+\code{db}. In the case of sequence databases these IDs can take the form of an
+NCBI accession followed by a version number (eg AF123456.1 or AF123456.2).}
 
 \item{web_history, }{a web_history object}
 
@@ -38,7 +40,9 @@ A set of unique identifiers mush be specified with either the \code{db}
 argument (which directly specifies the IDs as a numeric or character vector)
 or a \code{web_history} object as returned by 
 \code{\link{entrez_link}}, \code{\link{entrez_search}} or 
-\code{\link{entrez_post}}. See Table 1 in the linked reference for the set of 
+\code{\link{entrez_post}}. See
+\href{https://www.ncbi.nlm.nih.gov/books/NBK25499/table/chapter4.T._valid_values_of__retmode_and/}{Table 1} 
+in the linked reference for the set of 
 formats available for each database. In particular, note that sequence
 databases (nuccore, protein and their relatives) use specific format names
 (eg "native", "ipg") for different flavours of xml.
@@ -62,6 +66,5 @@ kaitpo_seqs <- entrez_fetch(db="nuccore", id=katipo_search$ids, rettype="native"
 \url{http://www.ncbi.nlm.nih.gov/books/NBK25499/#_chapter4_EFetch_}
 }
 \seealso{
-\code{\link[httr]{config}} for available configs
+\code{\link[httr]{config}} for available \code{httr} configs
 }
-
diff --git a/man/entrez_global_query.Rd b/man/entrez_global_query.Rd
index 1e6ac60..bc42018 100644
--- a/man/entrez_global_query.Rd
+++ b/man/entrez_global_query.Rd
@@ -26,4 +26,3 @@ NCBI_data_on_best_butterflies_ever <- entrez_global_query(term="Heliconius")
 \seealso{
 \code{\link[httr]{config}} for available configs
 }
-
diff --git a/man/entrez_info.Rd b/man/entrez_info.Rd
index 1e24943..bad08bd 100644
--- a/man/entrez_info.Rd
+++ b/man/entrez_info.Rd
@@ -40,4 +40,3 @@ Other einfo: \code{\link{entrez_db_links}},
   \code{\link{entrez_db_searchable}},
   \code{\link{entrez_db_summary}}, \code{\link{entrez_dbs}}
 }
-
diff --git a/man/entrez_link.Rd b/man/entrez_link.Rd
index adbaa3b..a14e930 100644
--- a/man/entrez_link.Rd
+++ b/man/entrez_link.Rd
@@ -78,4 +78,3 @@ set for the \code{cmd} argument. Printing the returned object lists the names
 
 \code{entrez_db_links}
 }
-
diff --git a/man/entrez_post.Rd b/man/entrez_post.Rd
index ed6c741..b44e485 100644
--- a/man/entrez_post.Rd
+++ b/man/entrez_post.Rd
@@ -39,4 +39,3 @@ second <- entrez_fetch(db="nuccore", file_format="fasta", web_history=upload,
 \seealso{
 \code{\link[httr]{config}} for available httr configurations
 }
-
diff --git a/man/entrez_search.Rd b/man/entrez_search.Rd
index e01cf6c..9943bd1 100644
--- a/man/entrez_search.Rd
+++ b/man/entrez_search.Rd
@@ -53,6 +53,10 @@ Review[PTYP])'' in PubMed would identify articles matching the gene APP in
 humans, and exclude review articles. More examples of the use of these search
 terms, and the more specific MeSH terms for precise searching, 
 is given in the package vignette.
+
+The \code{rentrez} tutorial provides some tips on how to make the most of
+searches to the NCBI. In particular, the sections on the "Filter" field and
+on MeSH terms may help in formulating precise searches.
 }
 \examples{
 \dontrun{
@@ -83,4 +87,3 @@ entrez_search(db="taxonomy", term="Drosophila & Genus[RANK]")
 \code{\link{entrez_db_searchable}} to get a set of search fields that
 can be used in \code{term} for any database
 }
-
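The new text above points to the "Filter" field for narrowing searches. As a rough illustration, a Filter restriction is just another bracketed field tag in the `term` string. `add_filters()` below is a hypothetical helper, not part of rentrez, and `refseq` is assumed (not confirmed here) to be a valid nuccore filter; check the advanced search index list before relying on it.

```r
# Hypothetical helper (not part of rentrez): restrict an Entrez query with
# one or more terms from the Filter field.
add_filters <- function(term, filters) {
  if (length(filters) == 0) return(term)
  # Parenthesise the original term so any OR inside it is grouped first
  paste(c(paste0("(", term, ")"), paste0(filters, "[Filter]")),
        collapse = " AND ")
}

q <- add_filters("COI[Gene] AND Gastropoda[Organism]", "refseq")
print(q)

# Assumed usage (network call, not run here):
# res <- rentrez::entrez_search(db = "nuccore", term = q)
```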
diff --git a/man/entrez_summary.Rd b/man/entrez_summary.Rd
index 2fa9d19..35d4844 100644
--- a/man/entrez_summary.Rd
+++ b/man/entrez_summary.Rd
@@ -5,12 +5,14 @@
 \title{Get summaries of objects in NCBI datasets from a unique ID}
 \usage{
 entrez_summary(db, id = NULL, web_history = NULL, version = c("2.0",
-  "1.0"), always_return_list = FALSE, config = NULL, ...)
+  "1.0"), always_return_list = FALSE, retmode = NULL, config = NULL, ...)
 }
 \arguments{
 \item{db}{character Name of the database to search for}
 
-\item{id}{vector with unique ID(s) for records in database \code{db}.}
+\item{id}{vector with unique ID(s) for records in database \code{db}.
+In the case of sequence databases these IDs can take the form of an
+NCBI accession followed by a version number (eg AF123456.1 or AF123456.2)}
 
 \item{web_history}{A web_history object}
 
@@ -19,6 +21,9 @@ entrez_summary(db, id = NULL, web_history = NULL, version = c("2.0",
 \item{always_return_list}{logical, return a list  of esummary objects even
 when only one ID is provided (see description for a note about this option)}
 
+\item{retmode}{either "xml" or "json". By default, xml will be used for
+version 1.0 records, json for version 2.0.}
+
 \item{config}{vector configuration options passed to \code{httr::GET}}
 
 \item{\dots}{character Additional terms to add to the request, see NCBI
@@ -81,4 +86,3 @@ single objects as well as lists.
 \code{\link{extract_from_esummary}} which can be used to extract
 elements from a list of esummary records
 }
-
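The new `retmode` argument documented above defaults to xml for version 1.0 records and json for version 2.0, and the new test in `test_summary.r` expects an error when version 1.0 summaries are requested as json. The sketch below restates those rules as a small function. `resolve_retmode()` is hypothetical and is not rentrez's internal implementation.

```r
# Hypothetical helper (not rentrez's internal code): choose a retmode for an
# esummary request following the defaults documented in entrez_summary.Rd.
resolve_retmode <- function(version = c("2.0", "1.0"), retmode = NULL) {
  version <- match.arg(version)
  if (is.null(retmode)) {
    # Documented default: xml for version 1.0, json for version 2.0
    return(if (version == "1.0") "xml" else "json")
  }
  retmode <- match.arg(retmode, c("xml", "json"))
  if (version == "1.0" && retmode == "json") {
    # Mirrors the test that expects an error for 1.0 summaries as json
    stop("version 1.0 esummary records are only available as xml")
  }
  retmode
}

resolve_retmode("1.0")
resolve_retmode("2.0")
resolve_retmode("2.0", "xml")
```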
diff --git a/man/extract_from_esummary.Rd b/man/extract_from_esummary.Rd
index 9494960..89476b3 100644
--- a/man/extract_from_esummary.Rd
+++ b/man/extract_from_esummary.Rd
@@ -19,4 +19,3 @@ List or vector containing requested elements
 \description{
 Extract elements from a list of esummary records
 }
-
diff --git a/man/linkout_urls.Rd b/man/linkout_urls.Rd
index b94955a..160c013 100644
--- a/man/linkout_urls.Rd
+++ b/man/linkout_urls.Rd
@@ -19,4 +19,3 @@ Extract URLs from an elink object
 \seealso{
 entrez_link
 }
-
diff --git a/man/parse_pubmed_xml.Rd b/man/parse_pubmed_xml.Rd
index 24e3ecc..6ba88da 100644
--- a/man/parse_pubmed_xml.Rd
+++ b/man/parse_pubmed_xml.Rd
@@ -27,4 +27,3 @@ recs <- entrez_fetch(db="pubmed",
 parse_pubmed_xml(recs)
 
 }
-
diff --git a/man/rentrez.Rd b/man/rentrez.Rd
index 90466e7..e7ed961 100644
--- a/man/rentrez.Rd
+++ b/man/rentrez.Rd
@@ -4,6 +4,7 @@
 \name{rentrez}
 \alias{rentrez}
 \alias{rentrez-package}
+\alias{rentrez-package}
 \title{rentrez}
 \description{
 rentrez provides functions to search for, discover and download data from
@@ -20,4 +21,3 @@ The NCBI will ban IPs that don't use EUtils within their \href{http://www.ncbi.n
  /item  For large requests use the web history method (see examples for \code{\link{entrez_search}} or use \code{\link{entrez_post}} to upload IDs)
 }
 }
-
diff --git a/tests/testthat/test_fetch.r b/tests/testthat/test_fetch.r
index 30e95c1..7a0d43d 100644
--- a/tests/testthat/test_fetch.r
+++ b/tests/testthat/test_fetch.r
@@ -7,6 +7,9 @@ coi <- entrez_fetch(db = "popset", id = pop_ids[1],
 xml_rec <- entrez_fetch(db = "popset", id=pop_ids[1], rettype="native", parsed=TRUE)
 raw_rec <- entrez_fetch(db = "popset", id=pop_ids[1], rettype="native")
 
+acc_old = "AF123456.1"
+acc_new = "AF123456.2"
+
 test_that("httr does not warn about inferred encoding", {
     expect_message( entrez_fetch(db = "popset", id=pop_ids[1], rettype="uilist"), NA)
 })
@@ -28,4 +31,14 @@ test_that("Entrez_fetch record parsing works", {
 })
 
 
+test_that("Entrez fetch can download versioned sequences", {
+    #The two versions of this sequence have different annotations. We can check
+    #that we are getting the correct version of the record by checking the name
+    #of each sequence reflects the change in annotation.
+    old_rec = entrez_fetch(db="nuccore", id="AF123456.1", rettype="fasta")
+    new_rec = entrez_fetch(db="nuccore", id="AF123456.2", rettype="fasta")
+    expect_match(old_rec, "testis-specific mRNA")
+    expect_match(new_rec, "doublesex and mab-3 related transcription factor")
+})
 
+                           
diff --git a/tests/testthat/test_httr_post.r b/tests/testthat/test_httr_post.r
new file mode 100644
index 0000000..2ba78e7
--- /dev/null
+++ b/tests/testthat/test_httr_post.r
@@ -0,0 +1,19 @@
+context("POST (the HTTP verb)")
+
+are_there_any_cancer_papers <- entrez_search(db="pubmed", term="Cancer", retmax=201)
+search_ids <- are_there_any_cancer_papers$ids
+
+test_that("We can POST to NCBI epost", {
+    wh <- entrez_post(db="pubmed", id=search_ids)
+    expect_that(wh, is_a("web_history"))
+    expect_that(as.integer(wh$QueryKey), is_a("integer"))
+    expect_false(is.na(as.integer(wh$QueryKey)))    
+})
+
+test_that("We can fetch using POST", {
+    fetched_ids <- entrez_fetch(db="pubmed", id=search_ids, rettype="uilist")
+    expect( 
+        all( strsplit(fetched_ids, "\n")[[1]] %in% search_ids), 
+        "fetched IDs do not match sent IDs when using httr::POST"
+    )
+})
diff --git a/tests/testthat/test_link.r b/tests/testthat/test_link.r
index aece800..feb42b6 100644
--- a/tests/testthat/test_link.r
+++ b/tests/testthat/test_link.r
@@ -48,7 +48,7 @@ test_that("Elink sub-elements can be acessed and printed", {
     expect_output(print(all_the_commands[[3]][[1]]), 
                   "elink result with information from \\d+ databases")
     expect_output(print(all_the_commands[[8]]$linkouts[[1]]),
-                  "Linkout from [A-Za-z]+\\s+\\$Url")
+                  "Linkout from [ A-Za-z]+\\s+\\$Url")
 })
 
 
diff --git a/tests/testthat/test_query.r b/tests/testthat/test_query.r
index db2c248..81ee2a2 100644
--- a/tests/testthat/test_query.r
+++ b/tests/testthat/test_query.r
@@ -43,4 +43,7 @@ test_that("Query building functions work", {
 })
 
 
-
+test_that("We give a useful error when an empty ID vector is passed", {
+    ET <- entrez_search(db="taxonomy", term="Extraterrestrial[Organism]")
+    expect_error(entrez_fetch(db="taxonomy", id= ET$ids, rettype="uilist"))              
+})
diff --git a/tests/testthat/test_summary.r b/tests/testthat/test_summary.r
index e14527c..ce2aa47 100644
--- a/tests/testthat/test_summary.r
+++ b/tests/testthat/test_summary.r
@@ -1,12 +1,13 @@
 context("fetching and parsing summary recs")
 
-
+fake_ids <- sample(1e5, 501)
 pop_ids = c("307082412", "307075396", "307075338", "307075274")
 pop_summ_xml <- entrez_summary(db="popset", 
                                id=pop_ids, version="1.0")
 pop_summ_json <- entrez_summary(db="popset", 
                                 id=pop_ids, version="2.0")
-
+pop_summ_xml2 <- entrez_summary(db="popset", 
+                               id=pop_ids, version="1.0", retmode="xml")
 
 test_that("Functions to fetch summaries work", {
           #tests
@@ -29,6 +30,12 @@ test_that("List elements in XML are parsable", {
 })
          
 
+test_that("Version 2 xml records can be fetched and parsed", {
+    sapply(pop_summ_xml2, function(x)
+                 expect_that(x[["Title"]], matches("Muraenidae")))
+    expect_that(length(pop_summ_xml2[[1]]), is_more_than(12))
+})
+
 test_that("JSON and XML objects are similar", {
           #It would be nice to test whether the xml and json records
           # have the same data in them, but it turns out they don't
@@ -44,6 +51,13 @@ test_that("JSON and XML objects are similar", {
           
 })
 
+
+test_that("Error when trying to fetch 1.0 summaries as json", {
+      expect_error(
+        entrez_summary("pubmed", id = fake_ids[1:10], version="1.0", retmode="json")
+      )
+})
+
 test_that("We can print summary records", {
       expect_output(print(pop_summ_json), "List of  4 esummary records")        
       expect_output(print(pop_summ_json[[1]]), "esummary result with \\d+ items")        
@@ -60,9 +74,16 @@ test_that("We can detect errors in esummary records", {
     )
 })
                          
+test_that("We can detect errors in esummary returns", {
+    expect_error(
+       entrez_summary(db="pmc", id=fake_ids, version="2.0")       
+    )
+})
+
 test_that("We can extract elements from esummary object", {
     expect_that(extract_from_esummary(pop_summ_xml, c("Title", "TaxId")), is_a("matrix"))
     expect_that(extract_from_esummary(pop_summ_xml, c("Title", "TaxId"), simplify=FALSE), is_a("list"))
+    expect_that(extract_from_esummary(pop_summ_xml2, c("Title", "TaxId"), simplify=FALSE), is_a("list"))
     expect_that(extract_from_esummary(pop_summ_json, "title"), is_a("character"))
    
 })
@@ -77,3 +98,11 @@ test_that("We can get a list of one element if we ask for it", {
     expect_that(entrez_summary(db="popset", id=307075396, always_return_list=TRUE), is_a("list"))
     expect_that(entrez_summary(db="popset", id=307075396), is_a("esummary"))
 })
+
+
+test_that("We can fetch summaries on versioned sequences", {
+    old_rec = entrez_summary(db="nuccore", id="AF123456.1")
+    new_rec = entrez_summary(db="nuccore", id="AF123456.2")
+    expect_match(old_rec$title, "testis-specific mRNA")
+    expect_match(new_rec$title, "doublesex and mab-3 related transcription factor")    
+})
diff --git a/vignettes/rentrez_tutorial.Rmd b/vignettes/rentrez_tutorial.Rmd
index cbaf1f6..2b39914 100644
--- a/vignettes/rentrez_tutorial.Rmd
+++ b/vignettes/rentrez_tutorial.Rmd
@@ -174,6 +174,22 @@ of available terms or any given data base with `entrez_db_searchable()`
 entrez_db_searchable("sra")
 ```
 
+### Using the Filter field
+
+"Filter" is a special field that, as the name suggests, allows you to limit
+the records returned by a search to a set of filtering criteria. There is no programmatic
+way to find the particular terms that can be used with the Filter field. 
+However, the NCBI's website provides an "advanced search" tool for some 
+databases that can be used to discover these terms. 
+
+
+For example, you can find all of the terms that can be
+used to filter searches of the nucleotide database using the
+[advanced search for that database](https://www.ncbi.nlm.nih.gov/nuccore/advanced).
+On that page, selecting "Filter" from the first drop-down box and then clicking
+"Show index list" will allow the user to scroll through possible filtering
+terms.
+
 ###Precise queries using MeSH terms
 
 In addition to the search terms described above, the NCBI allows searches using
@@ -494,9 +510,9 @@ tax_list$Taxon$GeneticCode
 
 For more complex records, which generate deeply-nested lists, you can use
 [XPath expressions](https://en.wikipedia.org/wiki/XPath) along with the function 
-`XML::xpathSApply` or the extraction operatord `[` and `[[` to extract specific parts of the
-file. For instance, we can get the scientific name of each taxon in _T.
-thermophila_'s lineage by specifying a path through the XML
+`XML::xpathSApply` or the extraction operators `[` and `[[` to extract specific
+parts of the file. For instance, we can get the scientific name of each taxon 
+in _T. thermophila_'s lineage by specifying a path through the XML
 
 ```{r, Tt_path}
 tt_lineage <- tax_rec["//LineageEx/Taxon/ScientificName"]
@@ -535,6 +551,11 @@ upload
 The NCBI sends you back some information you can use to refer to the posted IDs. 
 In `rentrez`, that information is represented as a `web_history` object. 
 
+Note that if you have a very long list of IDs you may receive a 414 (URI Too Long) error when
+you try to upload them. If you have such a list (and they come from an external
+source rather than a search that can be saved to a `web_history` object), you
+may have to 'chunk' the IDs into smaller sets that can be processed.
+
 ###Get a `web_history` object from `entrez_search` or `entrez_link()`
 
 In addition to directly uploading IDs to the NCBI, you can use the web history

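The new vignette note suggests splitting a very long, externally sourced ID list into smaller sets. A minimal sketch of that chunking in base R follows. `chunk_ids()` is a hypothetical helper, the chunk size of 200 is an assumption rather than an NCBI limit, and the `entrez_post()` call is left commented out because it needs network access.

```r
# Hypothetical helper (not part of rentrez): split a long vector of IDs into
# chunks of at most `size` IDs each, preserving their original order.
chunk_ids <- function(ids, size = 200) {
  if (length(ids) == 0) return(list())
  split(ids, ceiling(seq_along(ids) / size))
}

chunks <- chunk_ids(as.character(1:450), size = 200)
lengths(chunks)  # chunk sizes: 200, 200 and 50

# Assumed usage (network calls, not run here): post each chunk separately,
# keeping one web_history object per chunk.
# histories <- lapply(chunks, function(x) rentrez::entrez_post(db = "pubmed", id = x))
```

Each resulting `web_history` object can then be passed to `entrez_fetch()` or `entrez_summary()` in turn.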
-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-cran-rentrez.git