[med-svn] [r-cran-stringr] 01/04: Imported Upstream version 1.0.0

Andreas Tille tille at debian.org
Sun Jun 28 06:07:53 UTC 2015


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-cran-stringr.

commit 2742b4ef5a655df119b6973b499644d84e4c20f9
Author: Andreas Tille <tille at debian.org>
Date:   Sun Jun 28 08:02:38 2015 +0200

    Imported Upstream version 1.0.0
---
 DESCRIPTION                                   |  39 ++--
 MD5                                           | 128 ++++++------
 NAMESPACE                                     |  16 ++
 NEWS                                          |  81 --------
 R/c.r                                         |  45 +++--
 R/case.R                                      |  31 +++
 R/checks.r                                    |  22 ---
 R/conv.R                                      |  17 ++
 R/count.r                                     |  42 ++--
 R/detect.r                                    |  50 +++--
 R/dup.r                                       |  22 +--
 R/extract.r                                   |  73 +++----
 R/length.r                                    |  35 +++-
 R/locate.r                                    | 116 ++++-------
 R/match.r                                     |  88 ++++-----
 R/modifiers.r                                 | 178 ++++++++++++-----
 R/pad-trim.r                                  |  69 +++----
 R/replace.r                                   | 103 ++++++----
 R/sort.R                                      |  30 +++
 R/split.r                                     | 128 +++++-------
 R/stringr.R                                   |   5 +
 R/sub.r                                       |  95 ++++-----
 R/subset.R                                    |  31 +++
 R/utils.R                                     |   9 +
 R/utils.r                                     |   1 -
 R/vectorise.r                                 |  38 ----
 R/word.r                                      |   6 +-
 R/wrap.r                                      |  18 +-
 README.md                                     |  49 +++--
 build/vignette.rds                            | Bin 0 -> 211 bytes
 inst/doc/stringr.R                            |  65 +++++++
 inst/doc/stringr.Rmd                          | 201 +++++++++++++++++++
 inst/doc/stringr.html                         | 267 ++++++++++++++++++++++++++
 inst/tests/test-check.r                       |  22 ---
 man/case.Rd                                   |  34 ++++
 man/fixed.Rd                                  |  27 ---
 man/ignore.case.Rd                            |  24 ---
 man/invert_match.Rd                           |  13 +-
 man/modifier-deprecated.Rd                    |  17 ++
 man/modifiers.Rd                              |  87 +++++++++
 man/perl.Rd                                   |  26 ---
 man/pipe.Rd                                   |  13 ++
 man/str_c.Rd                                  |  48 +++--
 man/str_conv.Rd                               |  25 +++
 man/str_count.Rd                              |  43 +++--
 man/str_detect.Rd                             |  35 ++--
 man/str_dup.Rd                                |  13 +-
 man/str_extract.Rd                            |  59 ++++--
 man/str_extract_all.Rd                        |  35 ----
 man/str_length.Rd                             |  38 +++-
 man/str_locate.Rd                             |  63 +++---
 man/str_locate_all.Rd                         |  46 -----
 man/str_match.Rd                              |  39 ++--
 man/str_match_all.Rd                          |  33 ----
 man/str_order.Rd                              |  40 ++++
 man/str_pad.Rd                                |  27 +--
 man/str_replace.Rd                            |  59 +++---
 man/str_replace_all.Rd                        |  46 -----
 man/str_replace_na.Rd                         |  28 +++
 man/str_split.Rd                              |  52 +++--
 man/str_split_fixed.Rd                        |  40 ----
 man/str_sub.Rd                                |  62 +++---
 man/str_sub_replace.Rd                        |  40 ----
 man/str_subset.Rd                             |  50 +++++
 man/str_trim.Rd                               |  17 +-
 man/str_wrap.Rd                               |  27 +--
 man/stringr.Rd                                |   9 +
 man/word.Rd                                   |  25 +--
 tests/{test-all.R => testthat.R}              |   2 +-
 {inst/tests => tests/testthat}/test-count.r   |   0
 {inst/tests => tests/testthat}/test-detect.r  |   9 +-
 {inst/tests => tests/testthat}/test-dup.r     |   0
 {inst/tests => tests/testthat}/test-extract.r |   4 +
 {inst/tests => tests/testthat}/test-join.r    |  10 +-
 {inst/tests => tests/testthat}/test-length.r  |   0
 {inst/tests => tests/testthat}/test-locate.r  |   0
 {inst/tests => tests/testthat}/test-match.r   |  14 +-
 {inst/tests => tests/testthat}/test-pad.r     |   0
 {inst/tests => tests/testthat}/test-split.r   |   0
 {inst/tests => tests/testthat}/test-sub.r     |   0
 {inst/tests => tests/testthat}/test-trim.r    |   0
 vignettes/stringr.Rmd                         | 201 +++++++++++++++++++
 82 files changed, 2244 insertions(+), 1356 deletions(-)

diff --git a/DESCRIPTION b/DESCRIPTION
index 65535bd..4466789 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,23 +1,24 @@
 Package: stringr
-Maintainer: Hadley Wickham <h.wickham at gmail.com>
+Version: 1.0.0
+Title: Simple, Consistent Wrappers for Common String Operations
+Description: A consistent, simple and easy to use set of wrappers around the
+    fantastic 'stringi' package. All function and argument names (and positions)
+    are consistent, all functions deal with "NA"'s and zero length vectors
+    in the same way, and the output from one function is easy to feed into
+    the input of another.
+Authors@R: c(
+    person("Hadley", "Wickham", , "hadley at rstudio.com", c("aut", "cre", "cph")),
+    person("RStudio", role = "cph")
+    )
 License: GPL-2
-Title: Make it easier to work with strings.
-Type: Package
-Author: Hadley Wickham <h.wickham at gmail.com>
-Description: stringr is a set of simple wrappers that make R's string
-        functions more consistent, simpler and easier to use.  It does
-        this by ensuring that: function and argument names (and
-        positions) are consistent, all functions deal with NA's and
-        zero length character appropriately, and the output data
-        structures from each function matches the input data structures
-        of other functions.
-Version: 0.6.2
 Depends: R (>= 2.14)
-Suggests: testthat (>= 0.3)
-Collate: 'c.r' 'checks.r' 'count.r' 'detect.r' 'dup.r' 'extract.r'
-        'length.r' 'locate.r' 'match.r' 'modifiers.r' 'pad-trim.r'
-        'replace.r' 'split.r' 'sub.r' 'vectorise.r' 'word.r' 'wrap.r'
-        'utils.r'
-Packaged: 2012-12-05 21:47:03 UTC; hadley
+Imports: stringi (>= 0.4.1), magrittr
+Suggests: testthat, knitr
+VignetteBuilder: knitr
+NeedsCompilation: no
+Packaged: 2015-04-29 12:46:34 UTC; hadley
+Author: Hadley Wickham [aut, cre, cph],
+  RStudio [cph]
+Maintainer: Hadley Wickham <hadley at rstudio.com>
 Repository: CRAN
-Date/Publication: 2012-12-06 08:39:59
+Date/Publication: 2015-04-30 11:48:24
diff --git a/MD5 b/MD5
index 13cb908..aba939d 100644
--- a/MD5
+++ b/MD5
@@ -1,61 +1,67 @@
-c14761e3000b5eaae6ff0aa5262a6630 *DESCRIPTION
-0cca89f86586b39903872782ad0c2d92 *NAMESPACE
-84346581b10f2d04598438e428488268 *NEWS
-3d81c2717c2e42df6b55d2b264502c0f *R/c.r
-1711405698cf3015be163e208f1eff40 *R/checks.r
-e55ebb86acca2cb4ee3026c7f2694457 *R/count.r
-729c91c257c5b4642c03a4db03f5f96a *R/detect.r
-93397237016f3ceaedbe6d2a4b1548b4 *R/dup.r
-2293db9f73f80f83f15286b5531c9766 *R/extract.r
-2b8caff655dbdccedc752c496c1f1041 *R/length.r
-95d57be6ed4e6e8786d39a0797098d72 *R/locate.r
-37072fa39b5a3a41f0cf0b54b6341a2e *R/match.r
-b650885277dba55b82664f2edf42aeec *R/modifiers.r
-27735fdcdf9c7bb201bf148cef58e7a5 *R/pad-trim.r
-44165abbb4926ee6bee46f3b94f425ef *R/replace.r
-f84ed242cc4fbc0b817e6fb9178388dd *R/split.r
-6f82f94cbc0d13367b53828923db57a2 *R/sub.r
-524e1f157dac5ce09334df6c6ae0c774 *R/utils.r
-b5557d805bac52a3e4c8ce27e5a02c35 *R/vectorise.r
-59b72fb1d808b48dc72ee1d22bb07f03 *R/word.r
-83d187f96099d2fb3e0e74fde6c5f180 *R/wrap.r
-1234794765eabc33a0e77211b8681688 *README.md
-f24da3c0d81e9d8fb6dc9eab10f303dd *inst/tests/test-check.r
-61f9d77768cf9ff813d382f9337178fb *inst/tests/test-count.r
-d2a6a58e44de1968cf46bd3e8c2d0e26 *inst/tests/test-detect.r
-065f752787f210c753d5bb5feea7f7a5 *inst/tests/test-dup.r
-bcdf3dd9ddd2d00d189d43eac13347a8 *inst/tests/test-extract.r
-8f03149944d3937c9b5a30d686c6e492 *inst/tests/test-join.r
-922366c3451f88871b9ce063529edb7a *inst/tests/test-length.r
-76249df3c11c62fb11aef63899029790 *inst/tests/test-locate.r
-d723b2fc4e6682042b9ac6339f2b4bdf *inst/tests/test-match.r
-3cfc28d6785f4a8c0796a7980c9aac90 *inst/tests/test-pad.r
-f339473f66b14267ec4b86db14b97820 *inst/tests/test-split.r
-c95563eafc4fad4c60504ae59225b9d0 *inst/tests/test-sub.r
-7dc6b256c7c2d3af1483b84698494819 *inst/tests/test-trim.r
-fad8767b5232cda34cea611d7e461796 *man/fixed.Rd
-828127f0f7f43842bd27ebe9e366a737 *man/ignore.case.Rd
-8aca10a194e602482768ad9540784f18 *man/invert_match.Rd
-9c4e9545741520f8728598ddd46acc1d *man/perl.Rd
-e7299ef80cd457c767f4c4701ac7ff1c *man/str_c.Rd
-acb3d1faa6c4d880146abd057dda2a13 *man/str_count.Rd
-3d113fd04cb4e133aa847b9919b26e94 *man/str_detect.Rd
-3da6dedbe73f2cd8ab0b220fddb05265 *man/str_dup.Rd
-4b7252acee53920f9489554eba32ff47 *man/str_extract.Rd
-421188a3864ff18626b8cb6b602ed257 *man/str_extract_all.Rd
-80947dce676306790aec00a00bc71de5 *man/str_length.Rd
-a5719d08876a471dbc136e9c69a80b17 *man/str_locate.Rd
-4fb3fa0632efe8eb22aa9aec33d27c41 *man/str_locate_all.Rd
-044068e21a9d3883c9568dbe81c2aefb *man/str_match.Rd
-28e18ed2325d9fbf4bf1cb9a343c269f *man/str_match_all.Rd
-499efab9a76d60d78c58ba641c9dd761 *man/str_pad.Rd
-ea6ff066d63136cacb172a2b0c4cf5c2 *man/str_replace.Rd
-7fac10186ba22f54a80021913ffd878a *man/str_replace_all.Rd
-93e6eea98bd572829b8e164384673a2a *man/str_split.Rd
-5031e079d68b1e5ff4dd034e0dfe4cac *man/str_split_fixed.Rd
-c397afe69fc8dc833a0ed0990a9abce5 *man/str_sub.Rd
-bf353bfff3f33db800910f7cf498c6c4 *man/str_sub_replace.Rd
-2ac1d755e7a56c11ab8d08fb8b692e33 *man/str_trim.Rd
-ae0c6fcf7ea0086ab5e87b6d25d23c8b *man/str_wrap.Rd
-62d4953c6ee32543df92481fe26e9377 *man/word.Rd
-37129f1e586caa1da9010b485199f41b *tests/test-all.R
+f6c0f7228263e8d0c42f7b8581ee9da7 *DESCRIPTION
+54c1589e84ba5df2778a4e893ef72457 *NAMESPACE
+87ddd6605b202ee5d3ee22926e447100 *R/c.r
+9c5e91f93a404215e8c4946c5a3ac2b7 *R/case.R
+bc5a2a73f2842baf45454f53f324f274 *R/conv.R
+3ebde5c8b233eb438daa6eb6ef20b580 *R/count.r
+5a02f00b78b0090830f05b1747033e83 *R/detect.r
+017248629447b588fd6e607c6a5b420e *R/dup.r
+033276694abb26130c10b31e087f71b9 *R/extract.r
+8ef6b7317657989078722686317f810b *R/length.r
+673284eb210305e4983f2618d55c2d7a *R/locate.r
+77cb46494e7208f2deb942ef897fdd52 *R/match.r
+f5d93cebd0efc3ad2f1a479c86f4c041 *R/modifiers.r
+7c467f8789c92d5e437f9b39cbc419f9 *R/pad-trim.r
+373404758535e696141d86d2cd4a1f9b *R/replace.r
+932f28001598d05ca5f977721e9bb131 *R/sort.R
+0b6092c4c346f20b2181c505e7d86912 *R/split.r
+994867fa3894453fc4d4832171eafee3 *R/stringr.R
+657a92dfd6074f24e81c49279bd54297 *R/sub.r
+b925d9c285b6b377f1cb26205521ed29 *R/subset.R
+f583f5b5856f7cb5f2c5fbb04f39f8a8 *R/utils.R
+e289fd194fc7928e7159cf8afe1d0677 *R/word.r
+200bb24c414024721759d59d2907eadd *R/wrap.r
+b838ad43bd80f67c1348cdfd1db69208 *README.md
+b536c8a62aa1b18eea448bf780b6f225 *build/vignette.rds
+8ffc45088b1068264eba4514b264d53a *inst/doc/stringr.R
+2d2afda9742a6d5ef5fa1c51a781bda9 *inst/doc/stringr.Rmd
+293072752c163cf1501f8948dd27cb2e *inst/doc/stringr.html
+7f5ecca60bf966d675cad61a9f96d5cd *man/case.Rd
+f2bce59645a8e34f5ce03d3c96b00481 *man/invert_match.Rd
+895425e477b9733374018900728a2900 *man/modifier-deprecated.Rd
+11c58e978d8a8b356967e4b2c0e74a1e *man/modifiers.Rd
+46b011f56b10d41a81084a43102f615b *man/pipe.Rd
+870530c70a8db2f1cd85638257365eae *man/str_c.Rd
+ead10b491a30fc630e7b12ea45a3bd13 *man/str_conv.Rd
+d99b1cc60142d76eefdc5b301e746de8 *man/str_count.Rd
+8277de388e437d825f74f5e14f114bb1 *man/str_detect.Rd
+20dccc633025393e296a7f8e0973bfb9 *man/str_dup.Rd
+7cd0c123e5673f99e4162b93a80f1674 *man/str_extract.Rd
+5cd37c9c64d70595c1ed8ce551f82789 *man/str_length.Rd
+81fac342bc1e6fe503d02c9673dce784 *man/str_locate.Rd
+3fd2199b7ff06db88c9ba6cd4722d6cd *man/str_match.Rd
+92b4ac302da8c85b58e31652655df6ab *man/str_order.Rd
+176c05214e4313f394e7714e1d202376 *man/str_pad.Rd
+211b5f4ab56a0771b3239d32ab2cd613 *man/str_replace.Rd
+ccd6299c89a844ec3573455cb4b86825 *man/str_replace_na.Rd
+ef5fcb6a4e71c6316da2157046a92138 *man/str_split.Rd
+f81e06be94e067d0ee7e972d2744ea01 *man/str_sub.Rd
+353ba68fb55d9274cc38d498d80a947e *man/str_subset.Rd
+b040d44ded59aaf147050d56ada196da *man/str_trim.Rd
+9b2a1aea53b13a1fd0edc924f55c370a *man/str_wrap.Rd
+93af066d98be9af372aa91351be491c1 *man/stringr.Rd
+9f42a2d37fae6d2bc33ac8e1255c9d42 *man/word.Rd
+4ee9d05bd4688270eca8d85299cedcd1 *tests/testthat.R
+61f9d77768cf9ff813d382f9337178fb *tests/testthat/test-count.r
+c69fd2a84d10850f39b85819d60a32fb *tests/testthat/test-detect.r
+065f752787f210c753d5bb5feea7f7a5 *tests/testthat/test-dup.r
+a7623052beaad8b11fcdc1fbe1504599 *tests/testthat/test-extract.r
+0e49f9a28c45a65d7c6893f03a8c9ca0 *tests/testthat/test-join.r
+922366c3451f88871b9ce063529edb7a *tests/testthat/test-length.r
+76249df3c11c62fb11aef63899029790 *tests/testthat/test-locate.r
+3fbd8882c34a3923b7b12707af0a987d *tests/testthat/test-match.r
+3cfc28d6785f4a8c0796a7980c9aac90 *tests/testthat/test-pad.r
+f339473f66b14267ec4b86db14b97820 *tests/testthat/test-split.r
+c95563eafc4fad4c60504ae59225b9d0 *tests/testthat/test-sub.r
+7dc6b256c7c2d3af1483b84698494819 *tests/testthat/test-trim.r
+2d2afda9742a6d5ef5fa1c51a781bda9 *vignettes/stringr.Rmd
diff --git a/NAMESPACE b/NAMESPACE
index 149ba82..c70f329 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -1,9 +1,16 @@
+# Generated by roxygen2 (4.1.0): do not edit by hand
+
+export("%>%")
 export("str_sub<-")
+export(boundary)
+export(coll)
 export(fixed)
 export(ignore.case)
 export(invert_match)
 export(perl)
+export(regex)
 export(str_c)
+export(str_conv)
 export(str_count)
 export(str_detect)
 export(str_dup)
@@ -15,12 +22,21 @@ export(str_locate)
 export(str_locate_all)
 export(str_match)
 export(str_match_all)
+export(str_order)
 export(str_pad)
 export(str_replace)
 export(str_replace_all)
+export(str_replace_na)
+export(str_sort)
 export(str_split)
 export(str_split_fixed)
 export(str_sub)
+export(str_subset)
+export(str_to_lower)
+export(str_to_title)
+export(str_to_upper)
 export(str_trim)
 export(str_wrap)
 export(word)
+import(stringi)
+importFrom(magrittr,"%>%")
diff --git a/NEWS b/NEWS
deleted file mode 100644
index 1994db8..0000000
--- a/NEWS
+++ /dev/null
@@ -1,81 +0,0 @@
-stringr 0.6.2
-================
-
-* fixed path in `str_wrap` example so works for more R installations.
-
-* remove dependency on plyr
-
-stringr 0.6.1
-=============
-
-* Zero input to `str_split_fixed` returns 0 row matrix with `n` columns
-
-* Export `str_join`
-
-stringr 0.6
-===========
-
-* new modifier `perl` that switches to Perl regular expressions
-
-* `str_match` now uses new base function `regmatches` to extract matches -
-  this should hopefully be faster than my previous pure R algorithm
-
-stringr 0.5
-===========
-
-* new `str_wrap` function which gives `strwrap` output in a more convenient
-  format
-
-* new `word` function extract words from a string given user defined
-  separator (thanks to suggestion by David Cooper)
-
-* `str_locate` now returns consistent type when matching empty string (thanks
-  to Stavros Macrakis)
-
-* new `str_count` counts number of matches in a string.
-
-* `str_pad` and `str_trim` receive performance tweaks - for large vectors this
-  should give at least a two order of magnitude speed up
-
-* str_length returns NA for invalid multibyte strings
-
-* fix small bug in internal `recyclable` function
-
-stringr 0.4
-===========
-
- * all functions now vectorised with respect to string, pattern (and
-   where appropriate) replacement parameters
- * fixed() function now tells stringr functions to use fixed matching, rather
-   than escaping the regular expression.  Should improve performance for 
-   large vectors.
- * new ignore.case() modifier tells stringr functions to ignore case of
-   pattern.
- * str_replace renamed to str_replace_all and new str_replace function added.
-   This makes str_replace consistent with all functions.
- * new str_sub<- function (analogous to substring<-) for substring replacement
- * str_sub now understands negative positions as a position from the end of
-   the string. -1 replaces Inf as indicator for string end.
- * str_pad side argument can be left, right, or both (instead of center)
- * str_trim gains side argument to better match str_pad
- * stringr now has a namespace and imports plyr (rather than requiring it)
-
-stringr 0.3
-===========
-
- * fixed() now also escapes |
- * str_join() renamed to str_c()
- * all functions more carefully check input and return informative error
-   messages if not as expected.
- * add invert_match() function to convert a matrix of location of matches to
-   locations of non-matches
- * add fixed() function to allow matching of fixed strings.
-
-stringr 0.2
-===========
-
- * str_length now returns correct results when used with factors
- * str_sub now correctly replaces Inf in end argument with length of string
- * new function str_split_fixed returns fixed number of splits in a character
-   matrix
- * str_split no longer uses strsplit to preserve trailing breaks
diff --git a/R/c.r b/R/c.r
index 44e7eac..7df5993 100644
--- a/R/c.r
+++ b/R/c.r
@@ -1,25 +1,24 @@
 #' Join multiple strings into a single string.
 #'
 #' To understand how \code{str_c} works, you need to imagine that you are
-#' building up a matrix of strings.  Each input argument forms a column, and
+#' building up a matrix of strings. Each input argument forms a column, and
 #' is expanded to the length of the longest argument, using the usual
 #' recyling rules.  The \code{sep} string is inserted between each column. If
-#' collapse is \code{NULL} each row is collapsed into a single string.   If
+#' collapse is \code{NULL} each row is collapsed into a single string. If
 #' non-\code{NULL} that string is inserted at the end of each row, and
 #' the entire matrix collapsed to a single string.
 #'
-#' @param ... one or more character vectors.  Zero length arguments
-#'   are removed
-#' @param sep string to insert between input vectors
-#' @param collapse optional string used to combine input vectors into single
-#'   string
+#' @param ... One or more character vectors. Zero length arguments
+#'   are removed.
+#' @param sep String to insert between input vectors.
+#' @param collapse Optional string used to combine input vectors into single
+#'   string.
 #' @return If \code{collapse = NULL} (the default) a character vector with
-#'   length equal to the longest input string.  If \code{collapse} is non-
-#'   NULL, a character vector of length 1.
-#' @keywords character
-#' @seealso \code{\link{paste}} which this function wraps
-#' @aliases str_c str_join
-#' @export str_c str_join
+#'   length equal to the longest input string. If \code{collapse} is
+#'   non-NULL, a character vector of length 1.
+#' @seealso \code{\link{paste}} for equivalent base R functionality, and
+#'    \code{\link[stringi]{stri_c}} which this function wraps
+#' @export str_c
 #' @examples
 #' str_c("Letter: ", letters)
 #' str_c("Letter", letters, sep = ": ")
@@ -28,12 +27,18 @@
 #'
 #' str_c(letters, collapse = "")
 #' str_c(letters, collapse = ", ")
-str_c <- str_join <- function(..., sep = "", collapse = NULL) {
-  strings <- Filter(function(x) length(x) > 0, list(...))
-  atomic <- vapply(strings, is.atomic, logical(1))
-  if (!all(atomic)) {
-    stop("Input to str_c should be atomic vectors", call. = FALSE)
-  }
+#'
+#' # Missing inputs give missing outputs
+#' str_c(c("a", NA, "b"), "-d")
+#' # Use str_replace_na to display literal NAs:
+#' str_c(str_replace_na(c("a", NA, "b")), "-d")
+str_c <- function(..., sep = "", collapse = NULL) {
+  stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
+}
 
-  do.call("paste", c(strings, list(sep = sep, collapse = collapse)))
+#' @export
+#' @rdname str_c
+str_join <- function(..., sep = "", collapse = NULL) {
+  .Deprecated("str_c")
+  stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
 }
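The R/c.r change above replaces the paste()-based body with a call to stringi's stri_c(), which changes how missing values are joined. A minimal sketch of the visible difference, assuming stringr 1.0.0 or later is installed; the variable name `x` is only for illustration:

```r
library(stringr)

x <- c("a", NA, "b")

# Base paste() turns NA into the literal string "NA"
paste(x, "-d", sep = "")
# → "a-d" "NA-d" "b-d"

# str_c() propagates missing values instead
str_c(x, "-d")
# → "a-d" NA "b-d"

# str_replace_na() opts back in to literal "NA" strings
str_c(str_replace_na(x), "-d")
# → "a-d" "NA-d" "b-d"
```

Code that relied on the old paste() behaviour for NA inputs would need the explicit str_replace_na() step.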
diff --git a/R/case.R b/R/case.R
new file mode 100644
index 0000000..33b785b
--- /dev/null
+++ b/R/case.R
@@ -0,0 +1,31 @@
+#' Convert case of a string.
+#'
+#' @param string String to modify
+#' @param locale Locale to use for translations.
+#' @examples
+#' dog <- "The quick brown dog"
+#' str_to_upper(dog)
+#' str_to_lower(dog)
+#' str_to_title(dog)
+#'
+#' # Locale matters!
+#' str_to_upper("i", "en") # English
+#' str_to_upper("i", "tr") # Turkish
+#' @name case
+NULL
+
+#' @export
+#' @rdname case
+str_to_upper <- function(string, locale = "") {
+  stri_trans_toupper(string, locale = locale)
+}
+#' @export
+#' @rdname case
+str_to_lower <- function(string, locale = "") {
+  stri_trans_tolower(string, locale = locale)
+}
+#' @export
+#' @rdname case
+str_to_title <- function(string, locale = "") {
+  stri_trans_totitle(string, opts_brkiter = stri_opts_brkiter(locale = locale))
+}
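The new case helpers in R/case.R take a locale because case mapping is language dependent. A short sketch of the Turkish example from the roxygen block, assuming stringr 1.0.0 or later; the comparisons use Unicode escapes so the snippet does not depend on the file's encoding:

```r
library(stringr)

# English rules map "i" to the plain capital I
str_to_upper("i", locale = "en") == "I"
# → TRUE

# Turkish rules map "i" to capital I with a dot above (U+0130)
str_to_upper("i", locale = "tr") == "\u0130"
# → TRUE

# Title case capitalises the first letter of each word
str_to_title("The quick brown dog")
# → "The Quick Brown Dog"
```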
diff --git a/R/checks.r b/R/checks.r
deleted file mode 100644
index 613d610..0000000
--- a/R/checks.r
+++ /dev/null
@@ -1,22 +0,0 @@
-# Check that string is of the correct type for stringr functions
-check_string <- function(string) {
-  if (!is.atomic(string))
-    stop("String must be an atomic vector", call. = FALSE)
-
-  if (!is.character(string))
-    string <- as.character(string)
-
-  string
-}
-
-# Check that pattern is of the correct type for stringr functions
-check_pattern <- function(pattern, string, replacement = NULL) {
-  if (!is.character(pattern))
-    stop("Pattern must be a character vector", call. = FALSE)
-
-  if (!recyclable(string, pattern, replacement)) {
-    stop("Lengths of string and pattern not compatible")
-  }
-
-  pattern
-}
diff --git a/R/conv.R b/R/conv.R
new file mode 100644
index 0000000..c1a4c7e
--- /dev/null
+++ b/R/conv.R
@@ -0,0 +1,17 @@
+#' Specify the encoding of a string.
+#'
+#' This is a convenient way to override the current encoding of a string.
+#'
+#' @param string String to re-encode.
+#' @param encoding Name of encoding. See \code{\link[stringi]{stri_enc_list}}
+#'   for a complete list.
+#' @export
+#' @examples
+#' # Example from encoding?stringi::stringi
+#' x <- rawToChar(as.raw(177))
+#' x
+#' str_conv(x, "ISO-8859-2") # Polish "a with ogonek"
+#' str_conv(x, "ISO-8859-1") # Plus-minus
+str_conv <- function(string, encoding) {
+  stri_conv(string, encoding, "UTF-8")
+}
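The str_conv() wrapper added in R/conv.R reinterprets the bytes of a string under a named encoding and returns UTF-8. A minimal sketch of the documented example with the expected code points spelled out, assuming stringr 1.0.0 or later:

```r
library(stringr)

# A single byte, 0xB1, with no declared encoding
x <- rawToChar(as.raw(177))

# Read as ISO-8859-2 the byte is "a with ogonek" (U+0105)
str_conv(x, "ISO-8859-2") == "\u0105"
# → TRUE

# Read as ISO-8859-1 the same byte is the plus-minus sign (U+00B1)
str_conv(x, "ISO-8859-1") == "\u00b1"
# → TRUE
```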
diff --git a/R/count.r b/R/count.r
index c737a29..c43c752 100644
--- a/R/count.r
+++ b/R/count.r
@@ -1,13 +1,11 @@
 #' Count the number of matches in a string.
 #'
-#' Vectorised over \code{string} and \code{pattern}, shorter is recycled to
-#' same length as longest.
+#' Vectorised over \code{string} and \code{pattern}.
 #'
 #' @inheritParams str_detect
-#' @keywords character
-#' @return integer vector
+#' @return An integer vector.
 #' @seealso
-#'  \code{\link{regexpr}} which this function wraps
+#'  \code{\link[stringi]{stri_count}} which this function wraps.
 #'
 #'  \code{\link{str_locate}}/\code{\link{str_locate_all}} to locate position
 #'  of matches
@@ -19,22 +17,20 @@
 #' str_count(fruit, "p")
 #' str_count(fruit, "e")
 #' str_count(fruit, c("a", "b", "p", "p"))
-str_count <- function(string, pattern) {
-  if (length(string) == 0) return(character())
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (length(pattern) == 1) {
-    matches <- re_call("gregexpr", string, pattern)
-  } else {
-    matches <- unlist(re_mapply("gregexpr", string, pattern),
-      recursive = FALSE)
-  }
-
-  match_length <- function(x) {
-    len <- length(x)
-    if (len > 1) return(len)
-    if (identical(c(x), -1L)) 0L else 1L
-  }
-  vapply(matches, match_length, integer(1))
+#'
+#' str_count(c("a.", "...", ".a.a"), ".")
+#' str_count(c("a.", "...", ".a.a"), fixed("."))
+str_count <- function(string, pattern = "") {
+  switch(type(pattern),
+    empty = stri_count_boundaries(string,
+      opts_brkiter = stri_opts_brkiter(type = "character")),
+    bound = stri_count_boundaries(string,
+      opts_brkiter = attr(pattern, "options")),
+    fixed = stri_count_fixed(string, pattern,
+      opts_fixed = attr(pattern, "options")),
+    coll  = stri_count_coll(string, pattern,
+      opts_collator = attr(pattern, "options")),
+    regex = stri_count_regex(string, pattern,
+      opts_regex = attr(pattern, "options"))
+  )
 }
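The rewritten str_count() above dispatches on the kind of pattern it is given. A small sketch of how the same input counts differently under the regex, fixed and default (empty) patterns, assuming stringr 1.0.0 or later; `x` is an illustrative name:

```r
library(stringr)

x <- c("a.", "...", ".a.a")

# A bare string is treated as a regular expression, so "." matches any character
str_count(x, ".")
# → 2 3 4

# fixed() matches the literal dot only
str_count(x, fixed("."))
# → 1 3 2

# The default pattern "" counts character boundaries, i.e. characters
str_count(x)
# → 2 3 4
```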
diff --git a/R/detect.r b/R/detect.r
index 2a8de1c..0d07fa4 100644
--- a/R/detect.r
+++ b/R/detect.r
@@ -2,16 +2,24 @@
 #'
 #' Vectorised over \code{string} and \code{pattern}.
 #'
-#' @param string input vector. This must be an atomic vector, and will be
-#'   coerced to a character vector
-#' @param pattern pattern to look for, as defined by a POSIX regular
-#'   expression.  See the ``Extended Regular Expressions'' section of
-#'   \code{\link{regex}} for details.  See \code{\link{fixed}},
-#'   \code{\link{ignore.case}} and \code{\link{perl}} for how to use other
-#'   types of matching: fixed, case insensitive and perl-compatible.
-#' @return boolean vector
-#' @seealso \code{\link{grepl}} which this function wraps
-#' @keywords character
+#' @param string Input vector. Either a character vector, or something
+#'  coercible to one.
+#' @param pattern Pattern to look for.
+#'
+#'   The default interpretation is a regular expression, as described
+#'   in \link[stringi]{stringi-search-regex}. Control options with
+#'   \code{\link{regex}()}.
+#'
+#'   Match a fixed string (i.e. by comparing only bytes), using
+#'   \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+#'   for matching human text, you'll want \code{\link{coll}(x)} which
+#'   respects character matching rules for the specified locale.
+#'
+#'   Match character, word, line and sentence boundaries with
+#'   \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+#'   \code{boundary("character")}.
+#' @return A logical vector.
+#' @seealso \code{\link[stringi]{stri_detect}} which this function wraps
 #' @export
 #' @examples
 #' fruit <- c("apple", "banana", "pear", "pinapple")
@@ -24,15 +32,15 @@
 #' # Also vectorised over pattern
 #' str_detect("aecfg", letters)
 str_detect <- function(string, pattern) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (length(pattern) == 1) {
-    results <- re_call("grepl", string, pattern)
-  } else {
-    results <- unlist(re_mapply("grepl", string, pattern))
-  }
-  is.na(results) <- is.na(string)
-
-  results
+  switch(type(pattern),
+    empty = ,
+    bound = stop("Not implemented", call. = FALSE),
+    fixed = stri_detect_fixed(string, pattern,
+      opts_fixed = attr(pattern, "options")),
+    coll  = stri_detect_coll(string, pattern,
+      opts_collator = attr(pattern, "options")),
+    regex = stri_detect_regex(string, pattern,
+      opts_regex = attr(pattern, "options"))
+  )
 }
+
diff --git a/R/dup.r b/R/dup.r
index caf6a33..6954aba 100644
--- a/R/dup.r
+++ b/R/dup.r
@@ -2,10 +2,9 @@
 #'
 #' Vectorised over \code{string} and \code{times}.
 #'
-#' @param string input character vector
-#' @param times number of times to duplicate each string
-#' @return character vector
-#' @keywords character
+#' @param string Input character vector.
+#' @param times Number of times to duplicate each string.
+#' @return A character vector.
 #' @export
 #' @examples
 #' fruit <- c("apple", "pear", "banana")
@@ -13,18 +12,5 @@
 #' str_dup(fruit, 1:3)
 #' str_c("ba", str_dup("na", 0:5))
 str_dup <- function(string, times) {
-  string <- check_string(string)
-
-  # Use data frame to do recycling
-  data <- data.frame(string, times)
-  n <- nrow(data)
-  string <- data$string
-  times <- data$times
-
-  output <- vapply(seq_len(n), function(i) {
-    paste(rep.int(string[i], times[i]), collapse = "")
-  }, character(1))
-
-  names(output) <- names(string)
-  output
+  stri_dup(string, times)
 }
diff --git a/R/extract.r b/R/extract.r
index 7ccae44..254a642 100644
--- a/R/extract.r
+++ b/R/extract.r
@@ -1,49 +1,54 @@
-#' Extract first piece of a string that matches a pattern.
+#' Extract matching patterns from a string.
 #'
-#' Vectorised over \code{string}.  \code{pattern} should be a single pattern,
-#' i.e. a character vector of length one.
+#' Vectorised over \code{string} and \code{pattern}.
 #'
 #' @inheritParams str_detect
-#' @return character vector.
-#' @keywords character
-#' @seealso \code{\link{str_extract_all}} to extract all matches
+#' @return A character vector.
+#' @seealso \code{\link[stringi]{stri_extract_first}} and
+#'   \code{\link[stringi]{stri_extract_all}} for the underlying
+#'   implementation.
+#' @param simplify If \code{FALSE}, the default, returns a list of character
+#'   vectors. If \code{TRUE} returns a character matrix.
 #' @export
 #' @examples
-#' shopping_list <- c("apples x4", "flour", "sugar", "milk x2")
+#' shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
 #' str_extract(shopping_list, "\\d")
 #' str_extract(shopping_list, "[a-z]+")
 #' str_extract(shopping_list, "[a-z]{1,4}")
 #' str_extract(shopping_list, "\\b[a-z]{1,4}\\b")
-str_extract <- function(string, pattern) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  positions <- str_locate(string, pattern)
-  str_sub(string, positions[, "start"], positions[, "end"])
-}
-
-#' Extract all pieces of a string that match a pattern.
-#'
-#' Vectorised over \code{string}.  \code{pattern} should be a single pattern,
-#' i.e. a character vector of length one.
 #'
-#' @inheritParams str_detect
-#' @return list of character vectors.
-#' @keywords character
-#' @seealso \code{\link{str_extract}} to extract the first match
-#' @export
-#' @examples
-#' shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
+#' # Extract all matches
 #' str_extract_all(shopping_list, "[a-z]+")
 #' str_extract_all(shopping_list, "\\b[a-z]+\\b")
 #' str_extract_all(shopping_list, "\\d")
-str_extract_all <- function(string, pattern) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
+#'
+#' # Simplify results into character matrix
+#' str_extract_all(shopping_list, "\\b[a-z]+\\b", simplify = TRUE)
+#' str_extract_all(shopping_list, "\\d", simplify = TRUE)
+str_extract <- function(string, pattern) {
+  switch(type(pattern),
+    empty = ,
+    bound = stop("Not implemented", call. = FALSE),
+    fixed = stri_extract_first_fixed(string, pattern,
+      opts_fixed = attr(pattern, "options")),
+    coll  = stri_extract_first_coll(string, pattern,
+      opts_collator = attr(pattern, "options")),
+    regex = stri_extract_first_regex(string, pattern,
+      opts_regex = attr(pattern, "options"))
+  )
+}
 
-  positions <- str_locate_all(string, pattern)
-  lapply(seq_along(string), function(i) {
-    position <- positions[[i]]
-    str_sub(string[i], position[, "start"], position[, "end"])
-  })
+#' @rdname str_extract
+#' @export
+str_extract_all <- function(string, pattern, simplify = FALSE) {
+  switch(type(pattern),
+    empty = ,
+    bound = stop("Not implemented", call. = FALSE),
+    fixed = stri_extract_all_fixed(string, pattern, simplify = simplify,
+      omit_no_match = TRUE, opts_fixed = attr(pattern, "options")),
+    coll  = stri_extract_all_coll(string, pattern, simplify = simplify,
+      omit_no_match = TRUE, opts_collator = attr(pattern, "options")),
+    regex = stri_extract_all_regex(string, pattern, simplify = simplify,
+      omit_no_match = TRUE, opts_regex = attr(pattern, "options"))
+  )
 }
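
As a rough illustration of what simplify = TRUE does in the str_extract_all()
change above, the following base-R sketch collects all matches per string and
pads the shorter results with empty strings to form a character matrix. It
uses regmatches()/gregexpr() instead of stringi, and the helper name
extract_all_toy is hypothetical, so treat it as an approximation of the
behaviour rather than the package's implementation.

```r
# Hypothetical helper: base-R approximation of str_extract_all(..., simplify = TRUE).
extract_all_toy <- function(string, pattern, simplify = FALSE) {
  # One character vector of matches per input string (character(0) if none).
  matches <- regmatches(string, gregexpr(pattern, string))
  if (!simplify) return(matches)
  # Pad every result to the widest one with "" and stack the rows.
  width <- max(lengths(matches))
  padded <- lapply(matches, function(m) c(m, rep("", width - length(m))))
  do.call(rbind, padded)
}

shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
digits_list <- extract_all_toy(shopping_list, "[0-9]")
digits_matrix <- extract_all_toy(shopping_list, "[0-9]", simplify = TRUE)
print(digits_matrix)
```

The list form keeps "no match" as a zero-length vector, while the matrix form
needs a filler value so that every row has the same number of columns.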
diff --git a/R/length.r b/R/length.r
index 7e198d6..68ed3d4 100644
--- a/R/length.r
+++ b/R/length.r
@@ -1,18 +1,33 @@
-#' The length of a string (in characters).
+#' The length of a string.
+#'
+#' Technically this returns the number of "code points" in a string. One
+#' code point usually corresponds to one character, but not always. For example,
+#' a u with an umlaut might be represented as a single character or as the
+#' combination of a u and an umlaut.
 #'
 #' @inheritParams str_detect
-#' @return numeric vector giving number of characters in each element of the
-#'   character vector.  Missing string have missing length.
-#' @keywords character
-#' @seealso \code{\link{nchar}} which this function wraps
+#' @return A numeric vector giving the number of characters (code points) in
+#'    each element of the character vector. Missing strings have missing length.
+#' @seealso \code{\link[stringi]{stri_length}} which this function wraps.
 #' @export
 #' @examples
 #' str_length(letters)
+#' str_length(NA)
+#' str_length(factor("abc"))
 #' str_length(c("i", "like", "programming", NA))
+#'
+#' # Two ways of representing a u with an umlaut
+#' u1 <- "\u00fc"
+#' u2 <- stringi::stri_trans_nfd(u1)
+#' # They print the same:
+#' u1
+#' u2
+#' # But have a different length
+#' str_length(u1)
+#' str_length(u2)
+#' # Even though they have the same number of characters
+#' str_count(u1)
+#' str_count(u2)
 str_length <- function(string) {
-  string <- check_string(string)
-
-  nc <- nchar(string, allowNA = TRUE)
-  is.na(nc) <- is.na(string)
-  nc
+  stri_length(string)
 }
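
The code-point distinction described in the new str_length() documentation can
be checked without stringi. This is a minimal base-R sketch that writes the
decomposed form by hand (u followed by U+0308, the combining diaeresis) and
counts code points with utf8ToInt(); it is an illustration of the idea, not of
how stri_length() is implemented.

```r
u1 <- "\u00fc"    # precomposed: LATIN SMALL LETTER U WITH DIAERESIS
u2 <- "u\u0308"   # decomposed: "u" followed by COMBINING DIAERESIS

# utf8ToInt() returns one integer per Unicode code point.
code_points <- c(length(utf8ToInt(u1)), length(utf8ToInt(u2)))
print(code_points)
```

Both strings render as the same single character, yet the second is made of
two code points.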
diff --git a/R/locate.r b/R/locate.r
index ed74bd7..b2cf54b 100644
--- a/R/locate.r
+++ b/R/locate.r
@@ -1,92 +1,60 @@
-#' Locate the position of the first occurence of a pattern in a string.
+#' Locate the position of patterns in a string.
 #'
-#' Vectorised over \code{string} and \code{pattern}, shorter is recycled to
-#' same length as longest.
+#' Vectorised over \code{string} and \code{pattern}. If the match is of length
+#' 0 (e.g. from a special match like \code{$}), end will be one character less
+#' than start.
 #'
 #' @inheritParams str_detect
-#' @return integer matrix.  First column gives start postion of match, and
-#'   second column gives end position.
-#' @keywords character
+#' @return For \code{str_locate}, an integer matrix. First column gives start
+#'   position of match, and second column gives end position. For
+#'   \code{str_locate_all} a list of integer matrices.
 #' @seealso
-#'   \code{\link{regexpr}} which this function wraps
-#'
-#'   \code{\link{str_extract}} for a convenient way of extracting matches
-#
-#'   \code{\link{str_locate_all}} to locate position of all matches
-#'
+#'   \code{\link{str_extract}} for a convenient way of extracting matches,
+#'   \code{\link[stringi]{stri_locate}} for the underlying implementation.
 #' @export
 #' @examples
-#' fruit <- c("apple", "banana", "pear", "pinapple")
+#' fruit <- c("apple", "banana", "pear", "pineapple")
+#' str_locate(fruit, "$")
 #' str_locate(fruit, "a")
 #' str_locate(fruit, "e")
 #' str_locate(fruit, c("a", "b", "p", "p"))
-str_locate <- function(string, pattern) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (length(pattern) == 1) {
-    results <- re_call("regexpr", string, pattern)
-    match_to_matrix(results)
-  } else {
-    results <- re_mapply("regexpr", string, pattern)
-    out <- t(vapply(results, match_to_matrix, integer(2)))
-    colnames(out) <- c("start", "end")
-    out
-  }
-}
-
-#' Locate the position of all occurences of a pattern in a string.
-#'
-#' Vectorised over \code{string} and \code{pattern}, shorter is recycled to
-#' same length as longest.
-#'
-#' If the match is of length 0, (e.g. from a special match like \code{$})
-#' end will be one character less than start.
-#'
-#' @inheritParams str_detect
-#' @keywords character
-#' @return list of integer matrices.  First column gives start postion of
-#'   match, and second column gives end position.
-#' @seealso
-#'  \code{\link{regexpr}} which this function wraps
-#'
-#'  \code{\link{str_extract}} for a convenient way of extracting matches
-#'
-#'  \code{\link{str_locate}} to locate position of first match
 #'
-#' @export
-#' @examples
-#' fruit <- c("apple", "banana", "pear", "pineapple")
 #' str_locate_all(fruit, "a")
 #' str_locate_all(fruit, "e")
 #' str_locate_all(fruit, c("a", "b", "p", "p"))
-str_locate_all <- function(string, pattern) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (length(pattern) == 1) {
-    matches <- re_call("gregexpr", string, pattern)
-  } else {
-    matches <- unlist(re_mapply("gregexpr", string, pattern),
-      recursive = FALSE)
-  }
-  lapply(matches, match_to_matrix, global = TRUE)
+#'
+#' # Find location of every character
+#' str_locate_all(fruit, "")
+str_locate <- function(string, pattern) {
+  switch(type(pattern),
+    empty = stri_locate_first_boundaries(string,
+      opts_brkiter = stri_opts_brkiter("character")),
+    bound = stri_locate_first_boundaries(string,
+      opts_brkiter = attr(pattern, "options")),
+    fixed = stri_locate_first_fixed(string, pattern,
+      opts_fixed = attr(pattern, "options")),
+    coll  = stri_locate_first_coll(string, pattern,
+      opts_collator = attr(pattern, "options")),
+    regex = stri_locate_first_regex(string, pattern,
+      opts_regex = attr(pattern, "options"))
+  )
 }
 
-# Convert annoying regexpr format to something more useful
-match_to_matrix <- function(match, global = FALSE) {
-  if (global && length(match) == 1 && (is.na(match) || match == -1)) {
-    null <- matrix(0, nrow = 0, ncol = 2)
-    colnames(null) <- c("start", "end")
-
-    return(null)
-  }
-
-  start <- as.vector(match)
-  start[start == -1] <- NA
-  end <- start + attr(match, "match.length") - 1L
-
-  cbind(start = start, end = end)
+#' @rdname str_locate
+#' @export
+str_locate_all <- function(string, pattern) {
+  switch(type(pattern),
+    empty = stri_locate_all_boundaries(string, omit_no_match = TRUE,
+      opts_brkiter = stri_opts_brkiter("character")),
+    bound = stri_locate_all_boundaries(string, omit_no_match = TRUE,
+      opts_brkiter = attr(pattern, "options")),
+    fixed = stri_locate_all_fixed(string, pattern, omit_no_match = TRUE,
+      opts_fixed = attr(pattern, "options")),
+    regex = stri_locate_all_regex(string, pattern,
+      omit_no_match = TRUE, opts_regex = attr(pattern, "options")),
+    coll  = stri_locate_all_coll(string, pattern,
+      omit_no_match = TRUE, opts_collator = attr(pattern, "options"))
+  )
 }
 
 
diff --git a/R/match.r b/R/match.r
index f4d4ae2..03846c7 100644
--- a/R/match.r
+++ b/R/match.r
@@ -1,70 +1,54 @@
-#' Extract first matched group from a string.
+#' Extract matched groups from a string.
 #'
-#' Vectorised over \code{string}.  \code{pattern} should be a single pattern,
-#' i.e. a character vector of length one.
+#' Vectorised over \code{string} and \code{pattern}.
 #'
 #' @inheritParams str_detect
-#' @param pattern pattern to look for, as defined by a POSIX regular
-#'   expression.  Pattern should contain groups, defined by ().  See the
-#'  ``Extended Regular Expressions'' section of \code{\link{regex}} for
-#'   details.
-#' @return character matrix. First column is the complete match, followed by
-#'   one for each capture group
-#' @keywords character
+#' @param pattern Pattern to look for, as defined by an ICU regular
+#'   expression. See \link[stringi]{stringi-search-regex} for more details.
+#' @return For \code{str_match}, a character matrix. First column is the
+#'   complete match, followed by one column for each capture group.
+#'   For \code{str_match_all}, a list of character matrices.
+#'
+#' @seealso \code{\link{str_extract}} to extract the complete match,
+#'   \code{\link[stringi]{stri_match}} for the underlying
+#'   implementation.
 #' @export
 #' @examples
 #' strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
 #'   "387 287 6718", "apple", "233.398.9187  ", "482 952 3315",
-#'   "239 923 8115", "842 566 4692", "Work: 579-499-7527", "$1000",
+#'   "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000",
 #'   "Home: 543.355.3679")
 #' phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
 #'
 #' str_extract(strings, phone)
 #' str_match(strings, phone)
+#'
+#' # Extract/match all
+#' str_extract_all(strings, phone)
+#' str_match_all(strings, phone)
+#'
+#' x <- c("<a> <b>", "<a> <>", "<a>", "", NA)
+#' str_match(x, "<(.*?)> <(.*?)>")
+#' str_match_all(x, "<(.*?)>")
+#'
+#' str_extract(x, "<.*?>")
+#' str_extract_all(x, "<.*?>")
 str_match <- function(string, pattern) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (length(string) == 0) return(character())
-
-  matcher <- re_call("regexec", string, pattern)
-  matches <- regmatches(string, matcher)
-
-  # Figure out how many groups there are and coerce into a matrix with
-  # nmatches + 1 columns
-  tmp <- str_replace_all(pattern, "\\\\\\(", "")
-  n <- str_length(str_replace_all(tmp, "[^(]", "")) + 1
-
-  len <- vapply(matches, length, integer(1))
-  matches[len == 0] <- rep(list(rep(NA_character_, n)), sum(len == 0))
-
-  do.call("rbind", matches)
+  switch(type(pattern),
+    regex = stri_match_first_regex(string, pattern,
+      opts_regex = attr(pattern, "options")),
+    stop("Can only match regular expressions", call. = FALSE)
+  )
 }
 
-#' Extract all matched groups from a string.
-#'
-#' Vectorised over \code{string}.  \code{pattern} should be a single pattern,
-#' i.e. a character vector of length one.
-#'
-#' @inheritParams str_detect
-#' @param pattern pattern to look for, as defined by a POSIX regular
-#'   expression.  Pattern should contain groups, defined by ().  See the
-#'  ``Extended Regular Expressions'' section of \code{\link{regex}} for
-#'   details.
-#' @return list of character matrices, as given by \code{\link{str_match}}
-#' @keywords character
+#' @rdname str_match
 #' @export
-#' @examples
-#' strings <- c("Home: 219 733 8965.  Work: 229-293-8753 ",
-#'   "banana pear apple", "595 794 7569 / 387 287 6718")
-#' phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
-#'
-#' str_extract_all(strings, phone)
-#' str_match_all(strings, phone)
 str_match_all <- function(string, pattern) {
-  matches <- str_extract_all(string, pattern)
-
-  lapply(matches, function(match) {
-    str_match(match, pattern)
-  })
+  switch(type(pattern),
+    regex = stri_match_all_regex(string, pattern,
+      cg_missing = "",
+      omit_no_match = TRUE,
+      opts_regex = attr(pattern, "options")),
+    stop("Can only match regular expressions", call. = FALSE)
+  )
 }
diff --git a/R/modifiers.r b/R/modifiers.r
index 5f1947d..3fffe9d 100644
--- a/R/modifiers.r
+++ b/R/modifiers.r
@@ -1,72 +1,152 @@
-#' Match fixed characters, not regular expression.
+#' Control matching behaviour with modifier functions.
 #'
-#' This function specifies that a pattern is a fixed string, rather
-#' than a regular expression.  This can yield substantial speed ups, if
-#' regular expression matching is not needed.
+#' \describe{
+#'  \item{fixed}{Compare literal bytes in the string. This is very fast, but
+#'    not usually what you want for non-ASCII character sets.}
+#'  \item{coll}{Compare strings respecting standard collation rules.}
+#'  \item{regex}{The default. Uses ICU regular expressions.}
+#'  \item{boundary}{Match boundaries between characters, lines, sentences or words.}
+#' }
 #'
-#' @param string string to match exactly as is
-#' @family modifiers
-#' @keywords character
-#' @export
+#' @param pattern Pattern whose matching behaviour is to be modified.
+#' @param ignore_case Should case differences be ignored in the match?
+#' @name modifiers
 #' @examples
 #' pattern <- "a.b"
 #' strings <- c("abb", "a.b")
 #' str_detect(strings, pattern)
 #' str_detect(strings, fixed(pattern))
-fixed <- function(string) {
-  if (is.perl(string)) message("Overriding Perl regexp matching")
-  structure(string, fixed = TRUE)
+#' str_detect(strings, coll(pattern))
+#'
+#' # coll() is useful for locale-aware case-insensitive matching
+#' i <- c("I", "\u0130", "i")
+#' i
+#' str_detect(i, fixed("i", TRUE))
+#' str_detect(i, coll("i", TRUE))
+#' str_detect(i, coll("i", TRUE, locale = "tr"))
+#'
+#' # Word boundaries
+#' words <- c("These are   some words.")
+#' str_count(words, boundary("word"))
+#' str_split(words, " ")[[1]]
+#' str_split(words, boundary("word"))[[1]]
+#'
+#' # Regular expression variations
+#' str_extract_all("The Cat in the Hat", "[a-z]+")
+#' str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
+#'
+#' str_extract_all("a\nb\nc", "^.")
+#' str_extract_all("a\nb\nc", regex("^.", multiline = TRUE))
+#'
+#' str_extract_all("a\nb\nc", "a.")
+#' str_extract_all("a\nb\nc", regex("a.", dotall = TRUE))
+NULL
+
+#' @export
+#' @rdname modifiers
+fixed <- function(pattern, ignore_case = FALSE) {
+  options <- stri_opts_fixed(case_insensitive = ignore_case)
+
+  structure(
+    pattern,
+    options = options,
+    class = c("fixed", "pattern", "character")
+  )
 }
 
-is.fixed <- function(string) {
-  fixed <- attr(string, "fixed")
-  if (is.null(fixed)) FALSE else fixed
+#' @export
+#' @rdname modifiers
+#' @param locale Locale to use for comparisons. See
+#'   \code{\link[stringi]{stri_locale_list}()} for all possible options.
+#' @param ... Other less frequently used arguments passed on to
+#'   \code{\link[stringi]{stri_opts_collator}},
+#'   \code{\link[stringi]{stri_opts_regex}}, or
+#'   \code{\link[stringi]{stri_opts_brkiter}}
+coll <- function(pattern, ignore_case = FALSE, locale = NULL, ...) {
+  options <- stri_opts_collator(
+    strength = if (ignore_case) 2L else 3L,
+    locale = locale,
+    ...
+  )
+
+  structure(
+    pattern,
+    options = options,
+    class = c("coll", "pattern", "character")
+  )
 }
 
-#' Ignore case of match.
-#'
-#' This function specifies that a pattern should ignore the case of matches.
-#'
-#' @param string pattern for which to ignore case
-#' @keywords character
-#' @family modifiers
 #' @export
-#' @examples
-#' pattern <- "a.b"
-#' strings <- c("ABB", "aaB", "aab")
-#' str_detect(strings, pattern)
-#' str_detect(strings, ignore.case(pattern))
-ignore.case <- function(string) {
-  structure(string, ignore.case = TRUE)
+#' @rdname modifiers
+#' @param multiline If \code{TRUE}, \code{^} and \code{$} match
+#'   the beginning and end of each line. If \code{FALSE}, the
+#'   default, only match the start and end of the input.
+#' @param comments If \code{TRUE}, white space and comments beginning with
+#'   \code{#} are ignored. Escape literal spaces with \code{\\ }.
+#' @param dotall If \code{TRUE}, \code{.} will also match line terminators.
+regex <- function(pattern, ignore_case = FALSE, multiline = FALSE,
+                   comments = FALSE, dotall = FALSE, ...) {
+  options <- stri_opts_regex(
+    case_insensitive = ignore_case,
+    multiline = multiline,
+    comments = comments,
+    dotall = dotall,
+    ...
+  )
+
+  structure(
+    pattern,
+    options = options,
+    class = c("regex", "pattern", "character")
+  )
 }
 
-case.ignored <- function(string) {
-  ignore.case <- attr(string, "ignore.case")
-  if (is.null(ignore.case)) FALSE else ignore.case
+#' @param type Boundary type to detect.
+#' @param skip_word_none Ignore "words" that don't contain any letters
+#'   or numbers - i.e. punctuation.
+#' @export
+#' @rdname modifiers
+boundary <- function(type = c("character", "line_break", "sentence", "word"),
+                    skip_word_none = TRUE, ...) {
+  type <- match.arg(type)
+  options <- stri_opts_brkiter(
+    type = type,
+    skip_word_none = skip_word_none,
+    ...
+  )
+
+  structure(
+    character(),
+    options = options,
+    class = c("boundary", "pattern", "character")
+  )
 }
 
+type <- function(x) UseMethod("type")
+type.boundary <- function(x) "bound"
+type.regex <- function(x) "regex"
+type.coll <- function(x) "coll"
+type.fixed <- function(x) "fixed"
+type.character <- function(x) if (identical(x, "")) "empty" else "regex"
 
-#' Use perl regular expressions.
+#' Deprecated modifier functions.
 #'
-#' This function specifies that a pattern should use the Perl regular
-#' expression egine, rather than the default POSIX 1003.2 extended
-#' regular expressions
+#' Please use \code{\link{regex}} and \code{\link{coll}} instead.
 #'
-#' @param string pattern to match with Perl regexps
-#' @family modifiers
-#' @keywords character
+#' @name modifier-deprecated
+#' @keywords internal
+NULL
+
 #' @export
-#' @examples
-#' pattern <- "(?x)a.b"
-#' strings <- c("abb", "a.b")
-#' \dontrun{str_detect(strings, pattern)}
-#' str_detect(strings, perl(pattern))
-perl <- function(string) {
-  if (is.fixed(string)) message("Overriding fixed matching")
-  structure(string, perl = TRUE)
+#' @rdname modifier-deprecated
+ignore.case <- function(string) {
+  message("Please use (fixed|coll|regex)(x, ignore_case = TRUE) instead of ignore.case(x)")
+  fixed(string, ignore_case = TRUE)
 }
 
-is.perl <- function(string) {
-  perl <- attr(string, "perl")
-  if (is.null(perl)) FALSE else perl
+#' @export
+#' @rdname modifier-deprecated
+perl <- function(pattern) {
+  message("perl is deprecated. Please use regex instead")
+  regex(pattern)
 }
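
The modifier functions above tag a pattern with a class vector, and type()
picks the matching engine by S3 dispatch on that vector. The sketch below
re-creates the mechanism in base R with hypothetical *_toy names; it is meant
to show how dispatch walks the class vector and falls back to the character
method for plain strings, not to reproduce the package code.

```r
# Toy constructors (hypothetical names) that mirror the class layout used above.
regex_toy <- function(pattern) {
  structure(pattern, class = c("regex", "pattern", "character"))
}
fixed_toy <- function(pattern) {
  structure(pattern, class = c("fixed", "pattern", "character"))
}

# Generic plus one method per pattern kind. A method is only reached when its
# suffix equals an entry of the object's class vector.
type_toy <- function(x) UseMethod("type_toy")
type_toy.fixed <- function(x) "fixed"
type_toy.regex <- function(x) "regex"
# Plain strings: "" selects the "every character" case, anything else is a regex.
type_toy.character <- function(x) if (identical(x, "")) "empty" else "regex"

kinds <- c(
  plain = type_toy("a.b"),
  empty = type_toy(""),
  fixed = type_toy(fixed_toy("a.b")),
  regex = type_toy(regex_toy("a.b"))
)
print(kinds)
```

Because "character" is the last entry of every class vector, an object whose
first class has no method of its own still resolves through the character
method.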
diff --git a/R/pad-trim.r b/R/pad-trim.r
index f4c98f1..466e21f 100644
--- a/R/pad-trim.r
+++ b/R/pad-trim.r
@@ -1,14 +1,13 @@
 #' Pad a string.
 #'
-#' Vectorised over \code{string}.  All other inputs should be of length 1.
+#' Vectorised over \code{string}, \code{width} and \code{pad}.
 #'
-#' @param string input character vector
-#' @param width pad strings to this minimum width
-#' @param side side on which padding character is added (left, right or both)
-#' @param pad single padding character (default is a space)
-#' @return character vector
+#' @param string A character vector.
+#' @param width Minimum width of padded strings.
+#' @param side Side on which padding character is added (left, right or both).
+#' @param pad Single padding character (default is a space).
+#' @return A character vector.
 #' @seealso \code{\link{str_trim}} to remove whitespace
-#' @keywords character
 #' @export
 #' @examples
 #' rbind(
@@ -16,50 +15,40 @@
 #'   str_pad("hadley", 30, "right"),
 #'   str_pad("hadley", 30, "both")
 #' )
+#'
+#' # All arguments are vectorised except side
+#' str_pad(c("a", "abc", "abcdef"), 10)
+#' str_pad("a", c(5, 10, 20))
+#' str_pad("a", 10, pad = c("-", "_", " "))
+#'
 #' # Longer strings are returned unchanged
 #' str_pad("hadley", 3)
-str_pad <- function(string, width, side = "left", pad = " ") {
-  string <- check_string(string)
-  stopifnot(length(width) == 1)
-  stopifnot(length(side) == 1)
-  stopifnot(length(pad) == 1)
-  if (str_length(pad) != 1) {
-    stop("pad must be single character single")
-  }
-
-  side <- match.arg(side, c("left", "right", "both"))
-  needed <- pmax(0, width - str_length(string))
+str_pad <- function(string, width, side = c("left", "right", "both"), pad = " ") {
+  side <- match.arg(side)
 
-  left <- switch(side,
-    left = needed, right = 0, both = floor(needed / 2))
-  right <- switch(side,
-    left = 0, right = needed, both = ceiling(needed / 2))
-
-  # String duplication is slow, so only do the absolute necessary
-  lengths <- unique(c(left, right))
-  padding <- str_dup(pad, lengths)
-
-  str_c(padding[match(left, lengths)], string, padding[match(right, lengths)])
+  switch(side,
+    left = stri_pad_left(string, width, pad = pad),
+    right = stri_pad_right(string, width, pad = pad),
+    both = stri_pad_both(string, width, pad = pad)
+  )
 }
 
 #' Trim whitespace from start and end of string.
 #'
-#' @param string input character vector
-#' @param side side on which whitespace is removed (left, right or both)
-#' @return character vector with leading and trailing whitespace removed
-#' @keywords character
+#' @param string A character vector.
+#' @param side Side on which to remove whitespace (left, right or both).
+#' @return A character vector.
 #' @export
 #' @seealso \code{\link{str_pad}} to add whitespace
 #' @examples
 #' str_trim("  String with trailing and leading white space\t")
 #' str_trim("\n\nString with trailing and leading white space\n\n")
-str_trim <- function(string, side = "both") {
-  string <- check_string(string)
-  stopifnot(length(side) == 1)
-
-  side <- match.arg(side, c("left", "right", "both"))
-  pattern <- switch(side, left = "^\\s+", right = "\\s+$",
-    both = "^\\s+|\\s+$")
+str_trim <- function(string, side = c("both", "left", "right")) {
+  side <- match.arg(side)
 
-  str_replace_all(string, pattern, "")
+  switch(side,
+    left =  stri_trim_left(string),
+    right = stri_trim_right(string),
+    both =  stri_trim_both(string)
+  )
 }
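
The new str_pad() is documented as vectorised over string, width and pad. A
minimal base-R sketch of left padding with the same recycling behaviour is
shown below; pad_left_toy is a hypothetical name, and the sketch counts width
with nchar() rather than delegating to stri_pad_left().

```r
# Hypothetical helper: left-pad each string to at least `width` characters.
pad_left_toy <- function(string, width, pad = " ") {
  # Number of padding characters needed; never negative, so longer
  # strings are returned unchanged.
  needed <- pmax(0, width - nchar(string))
  # strrep() and paste0() both recycle their arguments element-wise.
  paste0(strrep(pad, needed), string)
}

by_width <- pad_left_toy("a", c(2, 4))
by_pad <- pad_left_toy("a", 3, pad = c("-", "_"))
unchanged <- pad_left_toy("hadley", 3)
print(by_width)
print(by_pad)
```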
diff --git a/R/replace.r b/R/replace.r
index f5c574a..fd3047c 100644
--- a/R/replace.r
+++ b/R/replace.r
@@ -1,16 +1,19 @@
-#' Replace first occurrence of a matched pattern in a string.
+#' Replace matched patterns in a string.
 #'
 #' Vectorised over \code{string}, \code{pattern} and \code{replacement}.
-#' Shorter arguments will be expanded to length of longest.
 #'
 #' @inheritParams str_detect
-#' @param replacement replacement string.  References of the form \code{\1},
+#' @param pattern,replacement Supply separate pattern and replacement strings
+#'   to vectorise over the patterns. References of the form \code{\1},
 #'   \code{\2} will be replaced with the contents of the respective matched
 #'   group (created by \code{()}) within the pattern.
-#' @return character vector.
-#' @keywords character
-#' @seealso \code{\link{sub}} which this function wraps,
-#'   \code{\link{str_replace_all}} to replace all matches
+#'
+#'   For \code{str_replace_all} only, you can apply multiple patterns and
+#'   replacements to each string by passing a named character vector to
+#'   \code{pattern}.
+#' @return A character vector.
+#' @seealso \code{str_replace_na} to turn missing values into "NA";
+#'   \code{\link[stringi]{stri_replace}} for the underlying implementation.
 #' @export
 #' @examples
 #' fruits <- c("one apple", "two pears", "three bananas")
@@ -21,32 +24,7 @@
 #' str_replace(fruits, "([aeiou])", "\\1\\1")
 #' str_replace(fruits, "[aeiou]", c("1", "2", "3"))
 #' str_replace(fruits, c("a", "e", "i"), "-")
-str_replace <- function(string, pattern, replacement) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string, replacement)
-
-  if (length(pattern) == 1 && length(replacement) == 1) {
-    re_call("sub", string, pattern, replacement)
-  } else {
-    unlist(re_mapply("sub", string, pattern, replacement))
-  }
-}
-
-#' Replace all occurrences of a matched pattern in a string.
 #'
-#' Vectorised over \code{string}, \code{pattern} and \code{replacement}.
-#' Shorter arguments will be expanded to length of longest.
-#'
-#' @inheritParams str_detect
-#' @param replacement replacement string.  References of the form \code{\1},
-#'   \code{\2} will be replaced with the contents of the respective matched
-#'   group (created by \code{()}) within the pattern.
-#' @return character vector.
-#' @keywords character
-#' @seealso \code{\link{gsub}} which this function wraps,
-#'   \code{\link{str_replace}} to replace a single match
-#' @export
-#' @examples
 #' fruits <- c("one apple", "two pears", "three bananas")
 #' str_replace(fruits, "[aeiou]", "-")
 #' str_replace_all(fruits, "[aeiou]", "-")
@@ -55,13 +33,62 @@ str_replace <- function(string, pattern, replacement) {
 #' str_replace_all(fruits, "([aeiou])", "\\1\\1")
 #' str_replace_all(fruits, "[aeiou]", c("1", "2", "3"))
 #' str_replace_all(fruits, c("a", "e", "i"), "-")
-str_replace_all <- function(string, pattern, replacement) {
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string, replacement)
+#'
+#' # If you want to apply multiple patterns and replacements to the same
+#' # string, pass a named vector to pattern.
+#' str_replace_all(str_c(fruits, collapse = "---"),
+#'  c("one" = 1, "two" = 2, "three" = 3))
+str_replace <- function(string, pattern, replacement) {
+  replacement <- fix_replacement(replacement)
+
+  switch(type(pattern),
+    empty = ,
+    bound = stop("Not implemented", call. = FALSE),
+    fixed = stri_replace_first_fixed(string, pattern, replacement,
+      opts_fixed = attr(pattern, "options")),
+    coll  = stri_replace_first_coll(string, pattern, replacement,
+      opts_collator = attr(pattern, "options")),
+    regex = stri_replace_first_regex(string, pattern, replacement,
+      opts_regex = attr(pattern, "options"))
+  )
+}
 
-  if (length(pattern) == 1 && length(replacement) == 1) {
-    re_call("gsub", string, pattern, replacement)
+#' @export
+#' @rdname str_replace
+str_replace_all <- function(string, pattern, replacement) {
+  if (!is.null(names(pattern))) {
+    replacement <- unname(pattern)
+    pattern <- names(pattern)
+    vec <- FALSE
   } else {
-    unlist(re_mapply("gsub", string, pattern, replacement))
+    vec <- TRUE
   }
+  replacement <- fix_replacement(replacement)
+
+  switch(type(pattern),
+    empty = ,
+    bound = stop("Not implemented", call. = FALSE),
+    fixed = stri_replace_all_fixed(string, pattern, replacement,
+      vectorize_all = vec, opts_fixed = attr(pattern, "options")),
+    coll  = stri_replace_all_coll(string, pattern, replacement,
+      vectorize_all = vec, opts_collator = attr(pattern, "options")),
+    regex = stri_replace_all_regex(string, pattern, replacement,
+      vectorize_all = vec, opts_regex = attr(pattern, "options"))
+  )
+}
+
+fix_replacement <- function(x) {
+  stri_replace_all_regex(x, c("\\$", "\\\\(\\d)"), c("\\\\$", "\\$$1"),
+    vectorize_all = FALSE)
+}
+
+
+#' Turn NA into "NA"
+#'
+#' @inheritParams str_replace
+#' @export
+#' @examples
+#' str_replace_na(c(NA, "abc", "def"))
+str_replace_na <- function(string, replacement = "NA") {
+  stri_replace_na(string, replacement)
 }
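
The named-vector form of str_replace_all() splits the vector into patterns
(the names) and replacements (the values), then applies every pair to every
string. The base-R sketch below shows that idea with a plain loop and literal
matching via gsub(fixed = TRUE); replace_all_named is a hypothetical helper,
and the use of literal rather than regex matching is an assumption made to
keep the example small.

```r
# Hypothetical helper: apply each name -> value pair in turn to every string.
replace_all_named <- function(string, pairs) {
  patterns <- names(pairs)
  replacements <- unname(pairs)
  for (i in seq_along(patterns)) {
    # fixed = TRUE treats the pattern as a literal string, not a regex.
    string <- gsub(patterns[i], replacements[i], string, fixed = TRUE)
  }
  string
}

fruits <- c("one apple", "two pears", "three bananas")
joined <- paste(fruits, collapse = "---")
result <- replace_all_named(joined, c("one" = "1", "two" = "2", "three" = "3"))
print(result)
```

Each pair sees the output of the previous one, so the order of the names can
matter when one replacement produces text that a later pattern matches.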
diff --git a/R/sort.R b/R/sort.R
new file mode 100644
index 0000000..5958464
--- /dev/null
+++ b/R/sort.R
@@ -0,0 +1,30 @@
+#' Order or sort a character vector.
+#'
+#' @param x A character vector to sort.
+#' @param decreasing A boolean. If \code{FALSE}, the default, sorts from
+#'   lowest to highest; if \code{TRUE} sorts from highest to lowest.
+#' @param na_last Where should \code{NA} go? \code{TRUE} at the end,
+#'   \code{FALSE} at the beginning, \code{NA} dropped.
+#' @param locale In which locale should the sorting occur? Defaults to
+#'   the current locale.
+#' @param ... Other options used to control sorting order. Passed on to
+#'   \code{\link[stringi]{stri_opts_collator}}.
+#' @seealso \code{\link[stringi]{stri_order}} for the underlying implementation.
+#' @export
+#' @examples
+#' str_order(letters, locale = "en")
+#' str_sort(letters, locale = "en")
+#'
+#' str_order(letters, locale = "haw")
+#' str_sort(letters, locale = "haw")
+str_order <- function(x, decreasing = FALSE, na_last = TRUE, locale = "", ...) {
+  stri_order(x, decreasing = decreasing, na_last = na_last,
+    opts_collator = stri_opts_collator(locale, ...))
+}
+
+#' @export
+#' @rdname str_order
+str_sort <- function(x, decreasing = FALSE, na_last = TRUE, locale = "", ...) {
+  stri_sort(x, decreasing = decreasing, na_last = na_last,
+    opts_collator = stri_opts_collator(locale, ...))
+}
diff --git a/R/split.r b/R/split.r
index 5f6e52e..900190b 100644
--- a/R/split.r
+++ b/R/split.r
@@ -1,79 +1,23 @@
-#' Split up a string into a fixed number of pieces.
+#' Split up a string into pieces.
 #'
-#' Vectorised over \code{string}.  \code{pattern} should be a single pattern,
-#' i.e. a character vector of length one.
+#' Vectorised over \code{string} and \code{pattern}.
 #'
-#' @param string input character vector
-#' @param pattern pattern to split up by, as defined by a POSIX regular
-#'   expression.  See the ``Extended Regular Expressions'' section of
-#'   \code{\link{regex}} for details. If \code{NA}, returns original string.
-#'   If \code{""} splits into individual characters.
+#' @inheritParams str_detect
 #' @param n number of pieces to return.  Default (Inf) uses all
-#'   possible split positions.  If n is greater than the number of pieces,
+#'   possible split positions.
+#'
+#'   For \code{str_split_fixed}, if n is greater than the number of pieces,
 #'   the result will be padded with empty strings.
-#' @return character matrix with \code{n} columns.
-#' @keywords character
-#' @seealso \code{\link{str_split}} for variable number of splits
+#' @return For \code{str_split_fixed}, a character matrix with \code{n} columns.
+#'   For \code{str_split}, a list of character vectors.
+#' @seealso \code{\link[stringi]{stri_split}} for the underlying implementation.
 #' @export
 #' @examples
 #' fruits <- c(
 #'   "apples and oranges and pears and bananas",
 #'   "pineapples and mangos and guavas"
 #' )
-#' str_split_fixed(fruits, " and ", 3)
-#' str_split_fixed(fruits, " and ", 4)
-str_split_fixed <- function(string, pattern, n) {
-  if (length(string) == 0) {
-    return(matrix(character(), nrow = 0, ncol = n))
-  }
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (!is.numeric(n) || length(n) != 1) {
-    stop("n should be a numeric vector of length 1")
-  }
-
-  if (n == Inf) {
-    stop("n must be finite", call. = FALSE)
-  } else if (n == 1) {
-    matrix(string, ncol = 1)
-  } else {
-    locations <- str_locate_all(string, pattern)
-    do.call("rbind", lapply(seq_along(locations), function(i) {
-      location <- locations[[i]]
-      string <- string[i]
-
-      pieces <- min(n - 1, nrow(location))
-      cut <- location[seq_len(pieces), , drop = FALSE]
-      keep <- invert_match(cut)
-
-      padding <- rep("", n - pieces - 1)
-      c(str_sub(string, keep[, 1], keep[, 2]), padding)
-    }))
-  }
-}
-
-#' Split up a string into a variable number of pieces.
-#'
-#' Vectorised over \code{string}.  \code{pattern} should be a single pattern,
-#' i.e. a character vector of length one.
 #'
-#' @param string input character vector
-#' @param pattern pattern to split up by, as defined by a POSIX regular
-#'   expression.  See the ``Extended Regular Expressions'' section of
-#'   \code{\link{regex}} for details. If \code{NA}, returns original string.
-#'   If \code{""} splits into individual characters.
-#' @param n maximum number of pieces to return.  Default (Inf) uses all
-#'   possible split positions.
-#' @return a list of character vectors.
-#' @keywords character
-#' @export
-#' @seealso \code{\link{str_split_fixed}} for fixed number of splits
-#' @examples
-#' fruits <- c(
-#'   "apples and oranges and pears and bananas",
-#'   "pineapples and mangos and guavas"
-#' )
 #' str_split(fruits, " and ")
 #'
 #' # Specify n to restrict the number of possible matches
@@ -81,26 +25,42 @@ str_split_fixed <- function(string, pattern, n) {
 #' str_split(fruits, " and ", n = 2)
 #' # If n greater than number of pieces, no padding occurs
 #' str_split(fruits, " and ", n = 5)
+#'
+#' # Use str_split_fixed to return a character matrix
+#' str_split_fixed(fruits, " and ", 3)
+#' str_split_fixed(fruits, " and ", 4)
 str_split <- function(string, pattern, n = Inf) {
-  if (length(string) == 0) return(list())
-  string <- check_string(string)
-  pattern <- check_pattern(pattern, string)
-
-  if (!is.numeric(n) || length(n) != 1) {
-    stop("n should be a numeric vector of length 1")
-  }
+  if (identical(n, Inf)) n <- -1L
 
-  if (n == 1) {
-    as.list(string)
-  } else {
-    locations <- str_locate_all(string, pattern)
-    pieces <- function(mat, string) {
-      cut <- mat[seq_len(min(n - 1, nrow(mat))), , drop = FALSE]
-      keep <- invert_match(cut)
+  switch(type(pattern),
+    empty = stri_split_boundaries(string, n = n, simplify = FALSE,
+      opts_brkiter = stri_opts_brkiter(type = "character")),
+    bound = stri_split_boundaries(string, n = n, simplify = FALSE,
+      opts_brkiter = attr(pattern, "options")),
+    fixed = stri_split_fixed(string, pattern, n = n, simplify = FALSE,
+      opts_fixed = attr(pattern, "options")),
+    regex = stri_split_regex(string, pattern, n = n, simplify = FALSE,
+      opts_regex = attr(pattern, "options")),
+    coll  = stri_split_coll(string, pattern, n = n, simplify = FALSE,
+      opts_collator = attr(pattern, "options"))
+  )
+}
 
-      str_sub(string, keep[, 1], keep[, 2])
-    }
-    mapply(pieces, locations, string,
-      SIMPLIFY = FALSE, USE.NAMES = FALSE)
-  }
+#' @export
+#' @rdname str_split
+str_split_fixed <- function(string, pattern, n) {
+  out <- switch(type(pattern),
+    empty = stri_split_boundaries(string, n = n, simplify = TRUE,
+      opts_brkiter = stri_opts_brkiter(type = "character")),
+    bound = stri_split_boundaries(string, n = n, simplify = TRUE,
+      opts_brkiter = attr(pattern, "options")),
+    fixed = stri_split_fixed(string, pattern, n = n, simplify = TRUE,
+      opts_fixed = attr(pattern, "options")),
+    regex = stri_split_regex(string, pattern, n = n, simplify = TRUE,
+      opts_regex = attr(pattern, "options")),
+    coll  = stri_split_coll(string, pattern, n = n, simplify = TRUE,
+      opts_collator = attr(pattern, "options"))
+  )
+  out[is.na(out)] <- ""
+  out
 }
diff --git a/R/stringr.R b/R/stringr.R
new file mode 100644
index 0000000..f0bef11
--- /dev/null
+++ b/R/stringr.R
@@ -0,0 +1,5 @@
+#' Fast and friendly string manipulation.
+#'
+#' @name stringr
+#' @import stringi
+NULL
diff --git a/R/sub.r b/R/sub.r
index 6bf73c6..598d6cf 100644
--- a/R/sub.r
+++ b/R/sub.r
@@ -1,4 +1,4 @@
-#' Extract substrings from a character vector.
+#' Extract and replace substrings from a character vector.
 #'
 #' \code{str_sub} will recycle all arguments to be the same length as the
 #' longest argument. If any arguments are of length 0, the output will be
@@ -9,17 +9,16 @@
 #' substring, from the first character to the last.
 #'
 #' @param string input character vector.
-#' @param start integer vector giving position of first charater in substring,
-#'   defaults to first character. If negative, counts backwards from last
-#'   character.
-#' @param end integer vector giving position of last character in substring,
-#'   defaults to last character. If negative, counts backwards from last
-#'   character.
-#' @return character vector of substring from \code{start} to \code{end}
+#' @param start,end Two integer vectors. \code{start} gives the position
+#'   of the first character (defaults to first), \code{end} gives the position
+#'   of the last (defaults to last character). Alternatively, pass a two-column
+#'   matrix to \code{start}.
+#'
+#'   Negative values count backwards from the last character.
+#' @param value replacement string
+#' @return A character vector of substring from \code{start} to \code{end}
 #'   (inclusive). Will be length of longest input argument.
-#' @keywords character
-#' @seealso \code{\link{substring}} which this function wraps, and
-#'   \code{link{str_sub_replace}} for the replacement version
+#' @seealso The underlying implementation in \code{\link[stringi]{stri_sub}}
 #' @export
 #' @examples
 #' hw <- "Hadley Wickham"
@@ -30,65 +29,43 @@
 #' str_sub(hw, 8)
 #' str_sub(hw, c(1, 8), c(6, 14))
 #'
+#' # Negative indices
 #' str_sub(hw, -1)
 #' str_sub(hw, -7)
 #' str_sub(hw, end = -7)
 #'
+#' # Alternatively, you can pass in a two-column matrix, as in the
+#' # output from str_locate_all
+#' pos <- str_locate_all(hw, "[aeio]")[[1]]
+#' str_sub(hw, pos)
+#' str_sub(hw, pos[, 1], pos[, 2])
+#'
+#' # Vectorisation
 #' str_sub(hw, seq_len(str_length(hw)))
 #' str_sub(hw, end = seq_len(str_length(hw)))
-str_sub <- function(string, start = 1L, end = -1L) {
-  if (length(string) == 0L || length(start) == 0L || length(end) == 0L) {
-    return(vector("character", 0L))
-  }
-
-  string <- check_string(string)
-
-  n <- max(length(string), length(start), length(end))
-  string <- rep(string, length = n)
-  start <- rep(start, length = n)
-  end <- rep(end, length = n)
-
-  # Convert negative values into actual positions
-  len <- str_length(string)
-
-  neg_start <- !is.na(start) & start < 0L
-  start[neg_start] <- start[neg_start] + len[neg_start] + 1L
-
-  neg_end <- !is.na(end) & end < 0L
-  end[neg_end] <- end[neg_end] + len[neg_end] + 1L
-
-  substring(string, start, end)
-}
-
-#' Replace substrings in a character vector.
-#
-#' \code{str_sub<-} will recycle all arguments to be the same length as the
-#' longest argument.
 #'
-#' @param string input character vector.
-#' @param start integer vector giving position of first charater in substring,
-#'   defaults to first character. If negative, counts backwards from last
-#'   character.
-#' @param end integer vector giving position of last character in substring,
-#'   defaults to last character. If negative, counts backwards from last
-#'   character.
-#' @param value replacement string
-#' @return character vector of substring from \code{start} to \code{end}
-#'   (inclusive). Will be length of longest input argument.
-#' @name str_sub_replace
-#' @aliases str_sub<- str_sub_replace
-#' @usage str_sub(string, start = 1L, end = -1L) <- value
-#' @export "str_sub<-"
-#' @examples
+#' # Replacement form
 #' x <- "BBCDEF"
 #' str_sub(x, 1, 1) <- "A"; x
 #' str_sub(x, -1, -1) <- "K"; x
 #' str_sub(x, -2, -2) <- "GHIJ"; x
 #' str_sub(x, 2, -2) <- ""; x
-"str_sub<-" <- function(string, start = 1L, end = -1L, value) {
+str_sub <- function(string, start = 1L, end = -1L) {
+  if (is.matrix(start)) {
+    stri_sub(string, from = start)
+  } else {
+    stri_sub(string, from = start, to = end)
+  }
+}
+
 
-  str_c(
-    str_sub(string, end = start - 1L),
-    value,
-    ifelse(end == -1L, "", str_sub(string, start = end + 1L)))
+#' @export
+#' @rdname str_sub
+"str_sub<-" <- function(string, start = 1L, end = -1L, value) {
+  if (is.matrix(start)) {
+    stri_sub(string, from = start) <- value
+  } else {
+    stri_sub(string, from = start, to = end) <- value
+  }
+  string
 }
diff --git a/R/subset.R b/R/subset.R
new file mode 100644
index 0000000..6b3f5cf
--- /dev/null
+++ b/R/subset.R
@@ -0,0 +1,31 @@
+#' Keep strings matching a pattern.
+#'
+#' This is a convenient wrapper around \code{x[str_detect(x, pattern)]}.
+#' Vectorised over \code{string} and \code{pattern}.
+#'
+#' @inheritParams str_detect
+#' @return A character vector.
+#' @seealso \code{\link{grep}} with argument \code{value = TRUE},
+#'    \code{\link[stringi]{stri_subset}} for the underlying implementation.
+#' @export
+#' @examples
+#' fruit <- c("apple", "banana", "pear", "pineapple")
+#' str_subset(fruit, "a")
+#' str_subset(fruit, "^a")
+#' str_subset(fruit, "a$")
+#' str_subset(fruit, "b")
+#' str_subset(fruit, "[aeiou]")
+#'
+#' # Missings are silently dropped
+#' str_subset(c("a", NA, "b"), ".")
+str_subset <- function(string, pattern) {
+  switch(type(pattern),
+    empty = ,
+    bound = stop("Not implemented", call. = FALSE),
+    fixed = stri_subset_fixed(string, pattern, omit_na = TRUE),
+    coll  = stri_subset_coll(string, pattern, omit_na = TRUE,
+      opts_collator = attr(pattern, "options")),
+    regex = stri_subset_regex(string, pattern, omit_na = TRUE,
+      opts_regex = attr(pattern, "options"))
+  )
+}
diff --git a/R/utils.R b/R/utils.R
new file mode 100644
index 0000000..036dd30
--- /dev/null
+++ b/R/utils.R
@@ -0,0 +1,9 @@
+#' Pipe operator
+#'
+#' @name %>%
+#' @rdname pipe
+#' @keywords internal
+#' @export
+#' @importFrom magrittr %>%
+#' @usage lhs \%>\% rhs
+NULL
diff --git a/R/utils.r b/R/utils.r
deleted file mode 100644
index 2c830e0..0000000
--- a/R/utils.r
+++ /dev/null
@@ -1 +0,0 @@
-compact <- function(l) Filter(Negate(is.null), l)
diff --git a/R/vectorise.r b/R/vectorise.r
deleted file mode 100644
index 450b9da..0000000
--- a/R/vectorise.r
+++ /dev/null
@@ -1,38 +0,0 @@
-# General wrapper around sub, gsub, regexpr, gregexpr, grepl.
-# Vectorises with pattern and replacement, and uses fixed and ignored.case
-# attributes.
-
-re_call <- function(f, string, pattern, replacement = NULL) {
-  args <- list(pattern, replacement, string,
-    fixed = is.fixed(pattern), ignore.case = case.ignored(pattern),
-    perl = is.perl(pattern))
-
-  if (!("perl" %in% names(formals(f)))) {
-    if (args$perl) message("Perl regexps not supported by ", f)
-    args$perl <- NULL
-  }
-
-  do.call(f, compact(args))
-}
-
-re_mapply <- function(f, string, pattern, replacement = NULL) {
-  args <- list(
-    FUN = f, SIMPLIFY = FALSE, USE.NAMES = FALSE,
-    pattern, replacement, string,
-    MoreArgs = list(
-      fixed = is.fixed(pattern),
-      ignore.case = case.ignored(pattern))
-    )
-  do.call("mapply", compact(args))
-}
-
-# Check if a set of vectors is recyclable.
-# Ignores zero length vectors.  Trivially TRUE if all inputs are zero length.
-recyclable <- function(...) {
-  lengths <- vapply(list(...), length, integer(1))
-
-  lengths <- lengths[lengths != 0]
-  if (length(lengths) == 0) return(TRUE)
-
-  all(max(lengths) %% lengths == 0)
-}
diff --git a/R/word.r b/R/word.r
index e883ef1..3ffcbfa 100644
--- a/R/word.r
+++ b/R/word.r
@@ -28,9 +28,9 @@
 #' word(str, 2, sep = fixed('..'))
 word <- function(string, start = 1L, end = start, sep = fixed(" ")) {
   n <- max(length(string), length(start), length(end))
-  string <- rep(string, length = n)
-  start <- rep(start, length = n)
-  end <- rep(end, length = n)
+  string <- rep(string, length.out = n)
+  start <- rep(start, length.out = n)
+  end <- rep(end, length.out = n)
 
   breaks <- str_locate_all(string, sep)
   words <- lapply(breaks, invert_match)
diff --git a/R/wrap.r b/R/wrap.r
index 2a2d98a..0f878c8 100644
--- a/R/wrap.r
+++ b/R/wrap.r
@@ -1,16 +1,16 @@
 #' Wrap strings into nicely formatted paragraphs.
 #'
-#' This is currently implemented as thin wrapper over \code{\link{strwrap}},
-#' but is vectorised over \code{stringr}, and collapses output into single
-#' strings.  See \code{\link{strwrap}} for more details.
+#' This is a wrapper around \code{\link[stringi]{stri_wrap}} which implements
+#' the Knuth-Plass paragraph wrapping algorithm.
 #'
 #' @param string character vector of strings to reformat.
-#' @param width positive integer giving target line width in characters.
+#' @param width positive integer giving target line width in characters. A
+#'   width less than or equal to 1 will put each word on its own line.
 #' @param indent non-negative integer giving indentation of first line in
 #'  each paragraph
 #' @param exdent non-negative integer giving indentation of following lines in
 #'  each paragraph
-#' @return a character vector of reformatted strings.
+#' @return A character vector of re-wrapped strings.
 #' @export
 #' @examples
 #' thanks_path <- file.path(R.home("doc"), "THANKS")
@@ -20,9 +20,11 @@
 #' cat(str_wrap(thanks, width = 40), "\n")
 #' cat(str_wrap(thanks, width = 60, indent = 2), "\n")
 #' cat(str_wrap(thanks, width = 60, exdent = 2), "\n")
+#' cat(str_wrap(thanks, width = 0, exdent = 2), "\n")
 str_wrap <- function(string, width = 80, indent = 0, exdent = 0) {
-  string <- check_string(string)
+  if (width <= 0) width <- 1
 
-  pieces <- strwrap(string, width, indent, exdent, simplify = FALSE)
-  unlist(lapply(pieces, str_c, collapse = "\n"))
+  out <- stri_wrap(string, width = width, indent = indent, exdent = exdent,
+    simplify = FALSE)
+  vapply(out, str_c, collapse = "\n", character(1))
 }
diff --git a/README.md b/README.md
index f0adf24..818d8a5 100644
--- a/README.md
+++ b/README.md
@@ -1,19 +1,46 @@
 # stringr
 
-Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparations tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The `stringr` package aims  [...]
+[![Build Status](https://travis-ci.org/hadley/stringr.png?branch=master)](https://travis-ci.org/hadley/stringr)
 
-More concretely, `stringr`:
+Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparations tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. 
 
- * Processes factors and characters in the same way.
+The __stringr__ package aims to remedy these problems by providing a clean, modern interface to common string operations. More concretely, stringr:
 
- * Gives functions consistent names and arguments.
+* Uses consistent function and argument names.
 
- * Simplifies string operations by eliminating options that you don't need
-   95% of the time.
+* Simplifies string operations by eliminating options that you don't need
+  95% of the time.
 
- * Produces outputs than can easily be used as inputs. This includes ensuring
-   that missing inputs result in missing outputs, and zero length inputs
-   result in zero length outputs.
+* Produces outputs that can easily be used as inputs. This includes ensuring
+  that missing inputs result in missing outputs, and zero length inputs
+  result in zero length outputs.
 
- * Completes R's string handling functions with useful functions from other
-   programming languages.
+* Is built on top of [stringi](https://github.com/Rexamine/stringi/) which
+  uses the [ICU](http://site.icu-project.org) library to provide fast, correct
+  implementations of common string manipulations.
+
+## Installation
+
+To get the current released version from CRAN:
+
+```R
+install.packages("stringr")
+```
+
+To get the current development version from GitHub:
+
+```R
+# install.packages("devtools")
+devtools::install_github("Rexamine/stringi")
+devtools::install_github("hadley/stringr")
+```
+
+## Piping
+
+stringr provides the pipe, `%>%`, from magrittr to make it easy to string together sequences of string operations:
+
+```R
+letters %>%
+  str_pad(5, "right") %>%
+  str_c(letters)
+```
diff --git a/build/vignette.rds b/build/vignette.rds
new file mode 100644
index 0000000..24bc595
Binary files /dev/null and b/build/vignette.rds differ
diff --git a/inst/doc/stringr.R b/inst/doc/stringr.R
new file mode 100644
index 0000000..d738444
--- /dev/null
+++ b/inst/doc/stringr.R
@@ -0,0 +1,65 @@
+## ---- echo=FALSE---------------------------------------------------------
+library("stringr")
+knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
+
+## ------------------------------------------------------------------------
+strings <- c(
+  "apple", 
+  "219 733 8965", 
+  "329-293-8753", 
+  "Work: 579-499-7527; Home: 543.355.3679"
+)
+phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
+
+## ------------------------------------------------------------------------
+# Which strings contain phone numbers?
+str_detect(strings, phone)
+str_subset(strings, phone)
+
+## ------------------------------------------------------------------------
+# Where in the string is the phone number located?
+(loc <- str_locate(strings, phone))
+str_locate_all(strings, phone)
+
+## ------------------------------------------------------------------------
+# What are the phone numbers?
+str_extract(strings, phone)
+str_extract_all(strings, phone)
+str_extract_all(strings, phone, simplify = TRUE)
+
+## ------------------------------------------------------------------------
+# Pull out the three components of the match
+str_match(strings, phone)
+str_match_all(strings, phone)
+
+## ------------------------------------------------------------------------
+str_replace(strings, phone, "XXX-XXX-XXXX")
+str_replace_all(strings, phone, "XXX-XXX-XXXX")
+
+## ------------------------------------------------------------------------
+col2hex <- function(col) {
+  rgb <- col2rgb(col)
+  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
+}
+
+# Goal: replace colour names in a string with their hex equivalent
+strings <- c("Roses are red, violets are blue", "My favourite colour is green")
+
+colours <- str_c("\\b", colors(), "\\b", collapse="|")
+# This gets us the colours, but we have no way of replacing them
+str_extract_all(strings, colours)
+
+# Instead, let's work with locations
+locs <- str_locate_all(strings, colours)
+Map(function(string, loc) {
+  hex <- col2hex(str_sub(string, loc))
+  str_sub(string, loc) <- hex
+  string
+}, strings, locs)
+
+## ------------------------------------------------------------------------
+matches <- col2hex(colors())
+names(matches) <- str_c("\\b", colors(), "\\b")
+
+str_replace_all(strings, matches)
+
diff --git a/inst/doc/stringr.Rmd b/inst/doc/stringr.Rmd
new file mode 100644
index 0000000..3e79e57
--- /dev/null
+++ b/inst/doc/stringr.Rmd
@@ -0,0 +1,201 @@
+---
+title: "Introduction to stringr"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Introduction to stringr}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+```{r, echo=FALSE}
+library("stringr")
+knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
+```
+
+Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparations tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The __stringr__ package aim [...]
+
+More concretely, stringr:
+
+-   Simplifies string operations by eliminating options that you don't need 
+    95% of the time (the other 5% of the time you can use functions from base R or
+    [stringi](https://github.com/Rexamine/stringi/)).
+
+-   Uses consistent function names and arguments.
+
+-   Produces outputs that can easily be used as inputs. This includes ensuring 
+    that missing inputs result in missing outputs, and zero length inputs result 
+    in zero length outputs. It also processes factors and character vectors in 
+    the same way.
+
+-   Completes R's string handling functions with useful functions from other 
+    programming languages.
+
+To meet these goals, stringr provides two basic families of functions:
+
+-   basic string operations, and
+
+-   pattern matching functions which use regular expressions to detect, locate, 
+    match, replace, extract, and split strings.
+
+As of version 1.0, stringr is a thin wrapper around [stringi](https://github.com/Rexamine/stringi/), which implements all the functions in stringr with efficient C code based on the [ICU library](http://site.icu-project.org).  Compared to stringi, stringr is considerably simpler: it provides fewer options and fewer functions. This is great when you're getting started learning string functions, and if you do need more of stringi's power, you should find the interface similar.
+
+These are described in more detail in the following sections.
+
+## Basic string operations
+
+There are four string functions that are closely related to their base R equivalents, but with a few enhancements:
+
+-   `str_c()` is equivalent to `paste()`, but it uses the empty string ("") as 
+    the default separator and silently removes `NULL` inputs.
+
+-   `str_length()` is equivalent to `nchar()`, but it preserves NA's (rather than 
+     giving them length 2) and converts factors to characters (not integers).
+
+-   `str_sub()` is equivalent to `substr()` but it returns a zero length vector 
+    if any of its inputs are zero length, and otherwise expands each argument to
+    match the longest. It also accepts negative positions, which count
+    backwards from the last character. The end position defaults to `-1`,
+    which corresponds to the last character.
+
+-   `str_sub<-` is equivalent to `substr<-`, but like `str_sub()` it understands
+    negative indices, and replacement strings do not need to be the same length
+    as the string they are replacing.
+
+Three functions add new functionality:
+
+-   `str_dup()` to duplicate the characters within a string.
+
+-   `str_trim()` to remove leading and trailing whitespace.
+
+-   `str_pad()` to pad a string with extra whitespace on the left, right, or both sides.
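+
+As a minimal sketch of the basic operations listed above (the inputs are made up for illustration, and the results in the comments assume stringr 1.0 or later is attached):
+
+```{r}
+library("stringr")
+
+str_c("a", NULL, "b")          # "ab": the NULL input is dropped
+str_length(c("apple", NA))     # 5 NA: the missing value stays missing
+
+hw <- "Hadley Wickham"
+str_sub(hw, -7)                # "Wickham": negative positions count from the end
+
+x <- "Hadley Wickham"
+str_sub(x, 1, 6) <- "H."       # the replacement may differ in length
+x                              # "H. Wickham"
+
+str_dup("ab", 3)               # "ababab"
+str_trim("  padded  ")         # "padded"
+str_pad("a", 5, "right")       # "a" followed by four spaces
+```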
+
+## Pattern matching
+
+stringr provides pattern matching functions to **detect**, **locate**, **extract**, **match**, **replace**, and **split** strings. I'll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:
+
+```{r}
+strings <- c(
+  "apple", 
+  "219 733 8965", 
+  "329-293-8753", 
+  "Work: 579-499-7527; Home: 543.355.3679"
+)
+phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
+```
+
+-   `str_detect()` detects the presence or absence of a pattern and returns a 
+    logical vector (similar to `grepl()`). `str_subset()` returns the elements
+    of a character vector that match a regular expression (similar to `grep()` 
+    with `value = TRUE`).
+    
+    ```{r}
+    # Which strings contain phone numbers?
+    str_detect(strings, phone)
+    str_subset(strings, phone)
+    ```
+
+-   `str_locate()` locates the first position of a pattern and returns a numeric 
+    matrix with columns start and end. `str_locate_all()` locates all matches, 
+    returning a list of numeric matrices. Similar to `regexpr()` and `gregexpr()`.
+
+    ```{r}
+    # Where in the string is the phone number located?
+    (loc <- str_locate(strings, phone))
+    str_locate_all(strings, phone)
+    ```
+
+-   `str_extract()` extracts text corresponding to the first match, returning a 
+    character vector. `str_extract_all()` extracts all matches and returns a 
+    list of character vectors.
+
+    ```{r}
+    # What are the phone numbers?
+    str_extract(strings, phone)
+    str_extract_all(strings, phone)
+    str_extract_all(strings, phone, simplify = TRUE)
+    ```
+
+-   `str_match()` extracts capture groups formed by `()` from the first match. 
+    It returns a character matrix with one column for the complete match and 
+    one column for each group. `str_match_all()` extracts capture groups from 
+    all matches and returns a list of character matrices. Similar to 
+    `regmatches()`.
+
+    ```{r}
+    # Pull out the three components of the match
+    str_match(strings, phone)
+    str_match_all(strings, phone)
+    ```
+
+-   `str_replace()` replaces the first matched pattern and returns a character
+    vector. `str_replace_all()` replaces all matches. Similar to `sub()` and 
+    `gsub()`.
+
+    ```{r}
+    str_replace(strings, phone, "XXX-XXX-XXXX")
+    str_replace_all(strings, phone, "XXX-XXX-XXXX")
+    ```
+
+-   `str_split_fixed()` splits the string into a fixed number of pieces based 
+    on a pattern and returns a character matrix. `str_split()` splits a string 
+    into a variable number of pieces and returns a list of character vectors.
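+
+The split functions are the only ones in the list above without a chunk of their own. A short sketch, reusing the `fruits` data from the `str_split()` help page rather than the phone numbers, might look like this:
+
+```{r}
+library("stringr")
+
+fruits <- c(
+  "apples and oranges and pears and bananas",
+  "pineapples and mangos and guavas"
+)
+
+# A list of character vectors; the number of pieces can vary per string
+pieces <- str_split(fruits, " and ")
+pieces
+
+# Limit the number of pieces with n
+str_split(fruits, " and ", n = 2)
+
+# A character matrix with exactly n columns; short rows are padded with ""
+mat <- str_split_fixed(fruits, " and ", 4)
+mat
+```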
+
+### Arguments
+
+Each pattern matching function has the same first two arguments, a character vector of `string`s to process and a single `pattern` (regular expression) to match. The replace functions have an additional argument specifying the replacement string, and the split functions have an argument to specify the number of pieces.
+
+Unlike base string functions, stringr offers control over matching not through arguments, but through modifier functions, `regex()`, `coll()` and `fixed()`.  This is a deliberate choice made to simplify these functions. For example, while `grepl()` has six arguments, `str_detect()` only has two.
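+
+To make the modifier idea concrete, here is a small sketch. It assumes the modifiers are spelled `regex()` and `fixed()` as exported by stringr 1.0, and that `regex()` accepts an `ignore_case` argument. The strings are invented for illustration:
+
+```{r}
+library("stringr")
+
+# A bare pattern is treated as a regular expression, so "." matches any character
+str_detect("ab", ".")                                # TRUE
+
+# fixed() compares literally, so "." only matches an actual full stop
+str_detect("ab", fixed("."))                         # FALSE
+str_detect("a.b", fixed("."))                        # TRUE
+
+# Options travel with the pattern instead of being extra function arguments
+str_detect("ABC", regex("abc", ignore_case = TRUE))  # TRUE
+```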
+
+### Regular expressions
+
+To be able to use these functions effectively, you'll need a good knowledge of regular expressions, which this vignette is not going to teach you. Some useful tools to get you started:
+
+-   A good [reference sheet](http://www.regular-expressions.info/reference.html).
+
+-   A tool that allows you to [interactively test](http://gskinner.com/RegExr/)
+    what a regular expression will match.
+
+-   A tool to [build a regular expression](http://www.txt2re.com) from an 
+    input string.
+
+When writing regular expressions, I strongly recommend generating a list of positive (pattern should match) and negative (pattern shouldn't match) test cases to ensure that you are matching the correct components.
+
+### Functions that return lists
+
+Many of the functions return a list of vectors or matrices. To work with each element of the list there are two strategies: iterate through a common set of indices, or use `Map()` to iterate through the vectors simultaneously. The second strategy is illustrated below:
+
+```{r}
+col2hex <- function(col) {
+  rgb <- col2rgb(col)
+  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
+}
+
+# Goal: replace colour names in a string with their hex equivalent
+strings <- c("Roses are red, violets are blue", "My favourite colour is green")
+
+colours <- str_c("\\b", colors(), "\\b", collapse="|")
+# This gets us the colours, but we have no way of replacing them
+str_extract_all(strings, colours)
+
+# Instead, let's work with locations
+locs <- str_locate_all(strings, colours)
+Map(function(string, loc) {
+  hex <- col2hex(str_sub(string, loc))
+  str_sub(string, loc) <- hex
+  string
+}, strings, locs)
+```
+
+Another approach is to use the second form of `str_replace_all()`: if you give it a named vector, it applies each `pattern = replacement` in turn:
+
+```{r}
+matches <- col2hex(colors())
+names(matches) <- str_c("\\b", colors(), "\\b")
+
+str_replace_all(strings, matches)
+```
+
+## Conclusion
+
+stringr provides an opinionated interface to strings in R. It makes string processing simpler by removing uncommon options, and by vigorously enforcing consistency across functions. I have also added new functions that I have found useful from Ruby, and over time, I hope users will suggest useful functions from other programming languages. I will continue to build on the included test suite to ensure that the package behaves as expected and remains bug free.
diff --git a/inst/doc/stringr.html b/inst/doc/stringr.html
new file mode 100644
index 0000000..c133022
--- /dev/null
+++ b/inst/doc/stringr.html
@@ -0,0 +1,267 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+
+<meta name="date" content="2015-04-29" />
+
+<title>Introduction to stringr</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; }
+code > span.dt { color: #902000; }
+code > span.dv { color: #40a070; }
+code > span.bn { color: #40a070; }
+code > span.fl { color: #40a070; }
+code > span.ch { color: #4070a0; }
+code > span.st { color: #4070a0; }
+code > span.co { color: #60a0b0; font-style: italic; }
+code > span.ot { color: #007020; }
+code > span.al { color: #ff0000; font-weight: bold; }
+code > span.fu { color: #06287e; }
+code > span.er { color: #ff0000; font-weight: bold; }
+</style>
+<style type="text/css">
+  pre:not([class]) {
+    background-color: white;
+  }
+</style>
+
+
+<link href="data:text/css,body%20%7B%0A%20%20background%2Dcolor%3A%20%23fff%3B%0A%20%20margin%3A%201em%20auto%3B%0A%20%20max%2Dwidth%3A%20700px%3B%0A%20%20overflow%3A%20visible%3B%0A%20%20padding%2Dleft%3A%202em%3B%0A%20%20padding%2Dright%3A%202em%3B%0A%20%20font%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0A%20%20font%2Dsize%3A%2014px%3B%0A%20%20line%2Dheight%3A%201%2E35%3B%0A%7D%0A%0A%23header%20%7B%0A%20%20text%2Dalign%3A% [...]
+
+</head>
+
+<body>
+
+
+
+<div id="header">
+<h1 class="title">Introduction to stringr</h1>
+<h4 class="date"><em>2015-04-29</em></h4>
+</div>
+
+
+<p>Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparations tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The <strong>stringr</str [...]
+<p>More concretely, stringr:</p>
+<ul>
+<li><p>Simplifies string operations by eliminating options that you don’t need 95% of the time (the other 5% of the time you can use functions from base R or <a href="https://github.com/Rexamine/stringi/">stringi</a>).</p></li>
+<li><p>Uses consistent function names and arguments.</p></li>
+<li><p>Produces outputs that can easily be used as inputs. This includes ensuring that missing inputs result in missing outputs, and zero length inputs result in zero length outputs. It also processes factors and character vectors in the same way.</p></li>
+<li><p>Completes R’s string handling functions with useful functions from other programming languages.</p></li>
+</ul>
+<p>To meet these goals, stringr provides two basic families of functions:</p>
+<ul>
+<li><p>basic string operations, and</p></li>
+<li><p>pattern matching functions which use regular expressions to detect, locate, match, replace, extract, and split strings.</p></li>
+</ul>
+<p>As of version 1.0, stringr is a thin wrapper around <a href="https://github.com/Rexamine/stringi/">stringi</a>, which implements all the functions in stringr with efficient C code based on the <a href="http://site.icu-project.org">ICU library</a>. Compared to stringi, stringr is considerably simpler: it provides fewer options and fewer functions. This is great when you’re getting started learning string functions, and if you do need more of stringi’s power, you should find the interfa [...]
+<p>These are described in more detail in the following sections.</p>
+<div id="basic-string-operations" class="section level2">
+<h2>Basic string operations</h2>
+<p>There are four string functions that are closely related to their base R equivalents, but with a few enhancements:</p>
+<ul>
+<li><p><code>str_c()</code> is equivalent to <code>paste()</code>, but it uses the empty string (“”) as the default separator and silently removes <code>NULL</code> inputs.</p></li>
+<li><p><code>str_length()</code> is equivalent to <code>nchar()</code>, but it preserves NA’s (rather than giving them length 2) and converts factors to characters (not integers).</p></li>
+<li><p><code>str_sub()</code> is equivalent to <code>substr()</code> but it returns a zero length vector if any of its inputs are zero length, and otherwise expands each argument to match the longest. It also accepts negative positions, which count backwards from the last character. The end position defaults to <code>-1</code>, which corresponds to the last character.</p></li>
+<li><p><code>str_sub<-</code> is equivalent to <code>substr<-</code>, but like <code>str_sub()</code> it understands negative indices, and replacement strings do not need to be the same length as the string they are replacing.</p></li>
+</ul>
+<p>Three functions add new functionality:</p>
+<ul>
+<li><p><code>str_dup()</code> to duplicate the characters within a string.</p></li>
+<li><p><code>str_trim()</code> to remove leading and trailing whitespace.</p></li>
+<li><p><code>str_pad()</code> to pad a string with extra whitespace on the left, right, or both sides.</p></li>
+</ul>
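A minimal sketch of the basic operations listed above, assuming the stringr package is installed. The variable names are illustrative only and do not come from the vignette.

```r
library(stringr)

# str_c(): NULL inputs are dropped and the default separator is ""
joined <- str_c("a", NULL, "b")      # "ab"

# str_length(): missing values stay missing
lens <- str_length(c("abc", NA))     # 3 NA

# str_sub(): negative positions count back from the last character
tail3 <- str_sub("stringr", -3)      # "ngr"

# str_sub<-: the replacement may differ in length from the text it replaces
x <- "stringr"
str_sub(x, 1, 6) <- "base"           # x is now "baser"

# Helpers with no direct base R equivalent
dup     <- str_dup("na", 3)          # "nanana"
trimmed <- str_trim("  hi  ")        # "hi"
padded  <- str_pad("hi", 5)          # pads on the left by default: "   hi"
```

The `str_sub<-` line replaces six characters with four, which is the case the base `substr<-` does not handle.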
+</div>
+<div id="pattern-matching" class="section level2">
+<h2>Pattern matching</h2>
+<p>stringr provides pattern matching functions to <strong>detect</strong>, <strong>locate</strong>, <strong>extract</strong>, <strong>match</strong>, <strong>replace</strong>, and <strong>split</strong> strings. I’ll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:</p>
+<pre class="sourceCode r"><code class="sourceCode r">strings <-<span class="st"> </span><span class="kw">c</span>(
+  <span class="st">"apple"</span>, 
+  <span class="st">"219 733 8965"</span>, 
+  <span class="st">"329-293-8753"</span>, 
+  <span class="st">"Work: 579-499-7527; Home: 543.355.3679"</span>
+)
+phone <-<span class="st"> "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"</span></code></pre>
+<ul>
+<li><p><code>str_detect()</code> detects the presence or absence of a pattern and returns a logical vector (similar to <code>grepl()</code>). <code>str_subset()</code> returns the elements of a character vector that match a regular expression (similar to <code>grep()</code> with <code>value = TRUE</code>).</p>
+<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Which strings contain phone numbers?</span>
+<span class="kw">str_detect</span>(strings, phone)
+<span class="co">#> [1] FALSE  TRUE  TRUE  TRUE</span>
+<span class="kw">str_subset</span>(strings, phone)
+<span class="co">#> [1] "219 733 8965"                          </span>
+<span class="co">#> [2] "329-293-8753"                          </span>
+<span class="co">#> [3] "Work: 579-499-7527; Home: 543.355.3679"</span></code></pre></li>
+<li><p><code>str_locate()</code> locates the first position of a pattern and returns a numeric matrix with columns start and end. <code>str_locate_all()</code> locates all matches, returning a list of numeric matrices. Similar to <code>regexpr()</code> and <code>gregexpr()</code>.</p>
+<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Where in the string is the phone number located?</span>
+(loc <-<span class="st"> </span><span class="kw">str_locate</span>(strings, phone))
+<span class="co">#>      start end</span>
+<span class="co">#> [1,]    NA  NA</span>
+<span class="co">#> [2,]     1  12</span>
+<span class="co">#> [3,]     1  12</span>
+<span class="co">#> [4,]     7  18</span>
+<span class="kw">str_locate_all</span>(strings, phone)
+<span class="co">#> [[1]]</span>
+<span class="co">#>      start end</span>
+<span class="co">#> </span>
+<span class="co">#> [[2]]</span>
+<span class="co">#>      start end</span>
+<span class="co">#> [1,]     1  12</span>
+<span class="co">#> </span>
+<span class="co">#> [[3]]</span>
+<span class="co">#>      start end</span>
+<span class="co">#> [1,]     1  12</span>
+<span class="co">#> </span>
+<span class="co">#> [[4]]</span>
+<span class="co">#>      start end</span>
+<span class="co">#> [1,]     7  18</span>
+<span class="co">#> [2,]    27  38</span></code></pre></li>
+<li><p><code>str_extract()</code> extracts text corresponding to the first match, returning a character vector. <code>str_extract_all()</code> extracts all matches and returns a list of character vectors.</p>
+<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># What are the phone numbers?</span>
+<span class="kw">str_extract</span>(strings, phone)
+<span class="co">#> [1] NA             "219 733 8965" "329-293-8753" "579-499-7527"</span>
+<span class="kw">str_extract_all</span>(strings, phone)
+<span class="co">#> [[1]]</span>
+<span class="co">#> character(0)</span>
+<span class="co">#> </span>
+<span class="co">#> [[2]]</span>
+<span class="co">#> [1] "219 733 8965"</span>
+<span class="co">#> </span>
+<span class="co">#> [[3]]</span>
+<span class="co">#> [1] "329-293-8753"</span>
+<span class="co">#> </span>
+<span class="co">#> [[4]]</span>
+<span class="co">#> [1] "579-499-7527" "543.355.3679"</span>
+<span class="kw">str_extract_all</span>(strings, phone, <span class="dt">simplify =</span> <span class="ot">TRUE</span>)
+<span class="co">#>      [,1]           [,2]          </span>
+<span class="co">#> [1,] ""             ""            </span>
+<span class="co">#> [2,] "219 733 8965" ""            </span>
+<span class="co">#> [3,] "329-293-8753" ""            </span>
+<span class="co">#> [4,] "579-499-7527" "543.355.3679"</span></code></pre></li>
+<li><p><code>str_match()</code> extracts capture groups formed by <code>()</code> from the first match. It returns a character matrix with one column for the complete match and one column for each group. <code>str_match_all()</code> extracts capture groups from all matches and returns a list of character matrices. Similar to <code>regmatches()</code>.</p>
+<pre class="sourceCode r"><code class="sourceCode r"><span class="co"># Pull out the three components of the match</span>
+<span class="kw">str_match</span>(strings, phone)
+<span class="co">#>      [,1]           [,2]  [,3]  [,4]  </span>
+<span class="co">#> [1,] NA             NA    NA    NA    </span>
+<span class="co">#> [2,] "219 733 8965" "219" "733" "8965"</span>
+<span class="co">#> [3,] "329-293-8753" "329" "293" "8753"</span>
+<span class="co">#> [4,] "579-499-7527" "579" "499" "7527"</span>
+<span class="kw">str_match_all</span>(strings, phone)
+<span class="co">#> [[1]]</span>
+<span class="co">#>      [,1] [,2] [,3] [,4]</span>
+<span class="co">#> </span>
+<span class="co">#> [[2]]</span>
+<span class="co">#>      [,1]           [,2]  [,3]  [,4]  </span>
+<span class="co">#> [1,] "219 733 8965" "219" "733" "8965"</span>
+<span class="co">#> </span>
+<span class="co">#> [[3]]</span>
+<span class="co">#>      [,1]           [,2]  [,3]  [,4]  </span>
+<span class="co">#> [1,] "329-293-8753" "329" "293" "8753"</span>
+<span class="co">#> </span>
+<span class="co">#> [[4]]</span>
+<span class="co">#>      [,1]           [,2]  [,3]  [,4]  </span>
+<span class="co">#> [1,] "579-499-7527" "579" "499" "7527"</span>
+<span class="co">#> [2,] "543.355.3679" "543" "355" "3679"</span></code></pre></li>
+<li><p><code>str_replace()</code> replaces the first matched pattern and returns a character vector. <code>str_replace_all()</code> replaces all matches. Similar to <code>sub()</code> and <code>gsub()</code>.</p>
+<pre class="sourceCode r"><code class="sourceCode r"><span class="kw">str_replace</span>(strings, phone, <span class="st">"XXX-XXX-XXXX"</span>)
+<span class="co">#> [1] "apple"                                 </span>
+<span class="co">#> [2] "XXX-XXX-XXXX"                          </span>
+<span class="co">#> [3] "XXX-XXX-XXXX"                          </span>
+<span class="co">#> [4] "Work: XXX-XXX-XXXX; Home: 543.355.3679"</span>
+<span class="kw">str_replace_all</span>(strings, phone, <span class="st">"XXX-XXX-XXXX"</span>)
+<span class="co">#> [1] "apple"                                 </span>
+<span class="co">#> [2] "XXX-XXX-XXXX"                          </span>
+<span class="co">#> [3] "XXX-XXX-XXXX"                          </span>
+<span class="co">#> [4] "Work: XXX-XXX-XXXX; Home: XXX-XXX-XXXX"</span></code></pre></li>
+<li><p><code>str_split_fixed()</code> splits the string into a fixed number of pieces based on a pattern and returns a character matrix. <code>str_split()</code> splits a string into a variable number of pieces and returns a list of character vectors.</p></li>
+</ul>
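The split functions are the only pair in the list above without an example. A short sketch, assuming stringr is installed; the input strings are made up for illustration:

```r
library(stringr)

# str_split(): variable number of pieces, returned as a list of character vectors
pieces <- str_split("329-293-8753", "[- .]")
pieces[[1]]                 # "329" "293" "8753"

# str_split_fixed(): fixed number of pieces, returned as a character matrix.
# With n = 2 the second piece keeps the remainder of the string.
two <- str_split_fixed("a-b-c", "-", 2)

# Strings with fewer pieces than requested are padded with ""
three <- str_split_fixed(c("a-b-c", "d"), "-", 3)
```

The matrix form is convenient when every input is expected to have the same number of fields; the list form is safer when that is not known in advance.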
+<div id="arguments" class="section level3">
+<h3>Arguments</h3>
+<p>Each pattern matching function has the same first two arguments, a character vector of <code>string</code>s to process and a single <code>pattern</code> (regular expression) to match. The replace functions have an additional argument specifying the replacement string, and the split functions have an argument to specify the number of pieces.</p>
+<p>Unlike base string functions, stringr offers control over matching not through arguments, but through modifier functions, <code>regex()</code>, <code>coll()</code> and <code>fixed()</code>. This is a deliberate choice made to simplify these functions. For example, while <code>grepl</code> has six arguments, <code>str_detect()</code> only has two.</p>
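As a rough illustration of the modifier approach, the sketch below wraps the pattern rather than passing extra arguments. It assumes stringr is installed and uses the `regex()` and `fixed()` signatures documented in man/modifiers.Rd in this commit; the sample strings are illustrative.

```r
library(stringr)

strings <- c("abb", "a.b")

# Default: the pattern is a regular expression, so "." matches any character
as_regex <- str_detect(strings, "a.b")          # TRUE TRUE

# fixed(): the pattern is compared literally, so "." only matches "."
as_fixed <- str_detect(strings, fixed("a.b"))   # FALSE TRUE

# regex(): same engine as the default, with options such as ignore_case
strict  <- str_detect("APPLE", "apple")                             # FALSE
relaxed <- str_detect("APPLE", regex("apple", ignore_case = TRUE))  # TRUE
```

The function call stays the same in every case; only the pattern argument changes.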
+</div>
+<div id="regular-expressions" class="section level3">
+<h3>Regular expressions</h3>
+<p>To be able to use these functions effectively, you’ll need a good knowledge of regular expressions, which this vignette is not going to teach you. Some useful tools to get you started:</p>
+<ul>
+<li><p>A good <a href="http://www.regular-expressions.info/reference.html">reference sheet</a>.</p></li>
+<li><p>A tool that allows you to <a href="http://gskinner.com/RegExr/">interactively test</a> what a regular expression will match.</p></li>
+<li><p>A tool to <a href="http://www.txt2re.com">build a regular expression</a> from an input string.</p></li>
+</ul>
+<p>When writing regular expressions, I strongly recommend generating a list of positive (pattern should match) and negative (pattern shouldn’t match) test cases to ensure that you are matching the correct components.</p>
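One possible way to keep such test cases next to the pattern is sketched below, reusing the phone regular expression from earlier in this vignette. The `positive` and `negative` vectors are hypothetical examples, not part of the original text, and stringr is assumed to be installed.

```r
library(stringr)

phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

# Strings the pattern should match
positive <- c("219 733 8965", "329-293-8753", "543.355.3679")

# Strings the pattern should not match: no digits, an area code
# starting with 1, and a middle group that is one digit short
negative <- c("apple", "119-293-8753", "329-29-8753")

pos_ok <- all(str_detect(positive, phone))
neg_ok <- !any(str_detect(negative, phone))

# Fail loudly if the pattern drifts away from the intended behaviour
stopifnot(pos_ok, neg_ok)
```

Rerunning this check after each change to the pattern gives quick feedback on whether a tweak has broken an existing case.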
+</div>
+<div id="functions-that-return-lists" class="section level3">
+<h3>Functions that return lists</h3>
+<p>Many of the functions return a list of vectors or matrices. To work with each element of the list there are two strategies: iterate through a common set of indices, or use <code>Map()</code> to iterate through the vectors simultaneously. The second strategy is illustrated below:</p>
+<pre class="sourceCode r"><code class="sourceCode r">col2hex <-<span class="st"> </span>function(col) {
+  rgb <-<span class="st"> </span><span class="kw">col2rgb</span>(col)
+  <span class="kw">rgb</span>(rgb[<span class="st">"red"</span>, ], rgb[<span class="st">"green"</span>, ], rgb[<span class="st">"blue"</span>, ], <span class="dt">max =</span> <span class="dv">255</span>)
+}
+
+<span class="co"># Goal: replace colour names in a string with their hex equivalent</span>
+strings <-<span class="st"> </span><span class="kw">c</span>(<span class="st">"Roses are red, violets are blue"</span>, <span class="st">"My favourite colour is green"</span>)
+
+colours <-<span class="st"> </span><span class="kw">str_c</span>(<span class="st">"</span><span class="ch">\\</span><span class="st">b"</span>, <span class="kw">colors</span>(), <span class="st">"</span><span class="ch">\\</span><span class="st">b"</span>, <span class="dt">collapse=</span><span class="st">"|"</span>)
+<span class="co"># This gets us the colours, but we have no way of replacing them</span>
+<span class="kw">str_extract_all</span>(strings, colours)
+<span class="co">#> [[1]]</span>
+<span class="co">#> [1] "red"  "blue"</span>
+<span class="co">#> </span>
+<span class="co">#> [[2]]</span>
+<span class="co">#> [1] "green"</span>
+
+<span class="co"># Instead, let's work with locations</span>
+locs <-<span class="st"> </span><span class="kw">str_locate_all</span>(strings, colours)
+<span class="kw">Map</span>(function(string, loc) {
+  hex <-<span class="st"> </span><span class="kw">col2hex</span>(<span class="kw">str_sub</span>(string, loc))
+  <span class="kw">str_sub</span>(string, loc) <-<span class="st"> </span>hex
+  string
+}, strings, locs)
+<span class="co">#> $`Roses are red, violets are blue`</span>
+<span class="co">#> [1] "Roses are #FF0000, violets are blue"</span>
+<span class="co">#> [2] "Roses are red, violets are #0000FF" </span>
+<span class="co">#> </span>
+<span class="co">#> $`My favourite colour is green`</span>
+<span class="co">#> [1] "My favourite colour is #00FF00"</span></code></pre>
+<p>Another approach is to use the second form of <code>str_replace_all()</code>: if you give it a named vector, it applies each <code>pattern = replacement</code> in turn:</p>
+<pre class="sourceCode r"><code class="sourceCode r">matches <-<span class="st"> </span><span class="kw">col2hex</span>(<span class="kw">colors</span>())
+<span class="kw">names</span>(matches) <-<span class="st"> </span><span class="kw">str_c</span>(<span class="st">"</span><span class="ch">\\</span><span class="st">b"</span>, <span class="kw">colors</span>(), <span class="st">"</span><span class="ch">\\</span><span class="st">b"</span>)
+
+<span class="kw">str_replace_all</span>(strings, matches)
+<span class="co">#> [1] "Roses are #FF0000, violets are #0000FF"</span>
+<span class="co">#> [2] "My favourite colour is #00FF00"</span></code></pre>
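The first strategy mentioned at the start of this section, iterating through a common set of indices, is not shown above. A minimal sketch, assuming stringr is installed and reusing the phone pattern from earlier; the `summary_lines` name is illustrative:

```r
library(stringr)

strings <- c(
  "apple",
  "329-293-8753",
  "Work: 579-499-7527; Home: 543.355.3679"
)
phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"

# Two parallel lists, one element per input string
numbers <- str_extract_all(strings, phone)
locs    <- str_locate_all(strings, phone)

# Walk both lists with a shared index
summary_lines <- vapply(seq_along(strings), function(i) {
  str_c(
    length(numbers[[i]]), " match(es) starting at: ",
    str_c(locs[[i]][, "start"], collapse = ", ")
  )
}, character(1))

n_matches <- vapply(numbers, length, integer(1))
```

Indexing keeps the lists aligned explicitly, which can be easier to follow than `Map()` when more than two lists are involved.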
+</div>
+</div>
+<div id="conclusion" class="section level2">
+<h2>Conclusion</h2>
+<p>stringr provides an opinionated interface to strings in R. It makes string processing simpler by removing uncommon options, and by rigorously enforcing consistency across functions. I have also added new functions that I have found useful from Ruby, and over time, I hope users will suggest useful functions from other programming languages. I will continue to build on the included test suite to ensure that the package behaves as expected and remains bug free.</p>
+</div>
+
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/tests/test-check.r b/inst/tests/test-check.r
deleted file mode 100644
index da71498..0000000
--- a/inst/tests/test-check.r
+++ /dev/null
@@ -1,22 +0,0 @@
-context("String and pattern checks")
-
-test_that("string is atomic", {
-  expect_that(check_string(list()),
-    throws_error("must be an atomic"))
-})
-
-test_that("pattern is a string", {
-  expect_that(check_pattern(1),
-    throws_error("must be a character vector"))
-})
-
-test_that("error when string and pattern lengths incompatible", {
-  expect_that(check_pattern(letters, "a"), equals(letters))
-  expect_that(check_pattern("a", letters), equals("a"))
-
-  expect_that(check_pattern(c("a", "b", "c"), c("a", "b")),
-    throws_error("not compatible"))
-  expect_that(check_pattern(c("a", "b"), c("a", "b", "c")),
-    throws_error("not compatible"))
-})
-
diff --git a/man/case.Rd b/man/case.Rd
new file mode 100644
index 0000000..92d56be
--- /dev/null
+++ b/man/case.Rd
@@ -0,0 +1,34 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/case.R
+\name{case}
+\alias{case}
+\alias{str_to_lower}
+\alias{str_to_title}
+\alias{str_to_upper}
+\title{Convert case of a string.}
+\usage{
+str_to_upper(string, locale = "")
+
+str_to_lower(string, locale = "")
+
+str_to_title(string, locale = "")
+}
+\arguments{
+\item{string}{String to modify}
+
+\item{locale}{Locale to use for translations.}
+}
+\description{
+Convert case of a string.
+}
+\examples{
+dog <- "The quick brown dog"
+str_to_upper(dog)
+str_to_lower(dog)
+str_to_title(dog)
+
+# Locale matters!
+str_to_upper("i", "en") # English
+str_to_upper("i", "tr") # Turkish
+}
+
diff --git a/man/fixed.Rd b/man/fixed.Rd
deleted file mode 100644
index 8df94e6..0000000
--- a/man/fixed.Rd
+++ /dev/null
@@ -1,27 +0,0 @@
-\name{fixed}
-\alias{fixed}
-\title{Match fixed characters, not regular expression.}
-\usage{
-  fixed(string)
-}
-\arguments{
-  \item{string}{string to match exactly as is}
-}
-\description{
-  This function specifies that a pattern is a fixed string,
-  rather than a regular expression.  This can yield
-  substantial speed ups, if regular expression matching is
-  not needed.
-}
-\examples{
-pattern <- "a.b"
-strings <- c("abb", "a.b")
-str_detect(strings, pattern)
-str_detect(strings, fixed(pattern))
-}
-\seealso{
-  Other modifiers: \code{\link{ignore.case}},
-  \code{\link{perl}}
-}
-\keyword{character}
-
diff --git a/man/ignore.case.Rd b/man/ignore.case.Rd
deleted file mode 100644
index 40f8178..0000000
--- a/man/ignore.case.Rd
+++ /dev/null
@@ -1,24 +0,0 @@
-\name{ignore.case}
-\alias{ignore.case}
-\title{Ignore case of match.}
-\usage{
-  ignore.case(string)
-}
-\arguments{
-  \item{string}{pattern for which to ignore case}
-}
-\description{
-  This function specifies that a pattern should ignore the
-  case of matches.
-}
-\examples{
-pattern <- "a.b"
-strings <- c("ABB", "aaB", "aab")
-str_detect(strings, pattern)
-str_detect(strings, ignore.case(pattern))
-}
-\seealso{
-  Other modifiers: \code{\link{fixed}}, \code{\link{perl}}
-}
-\keyword{character}
-
diff --git a/man/invert_match.Rd b/man/invert_match.Rd
index 4ed7768..92619f8 100644
--- a/man/invert_match.Rd
+++ b/man/invert_match.Rd
@@ -1,19 +1,20 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/locate.r
 \name{invert_match}
 \alias{invert_match}
 \title{Switch location of matches to location of non-matches.}
 \usage{
-  invert_match(loc)
+invert_match(loc)
 }
 \arguments{
-  \item{loc}{matrix of match locations, as from
-  \code{\link{str_locate_all}}}
+\item{loc}{matrix of match locations, as from \code{\link{str_locate_all}}}
 }
 \value{
-  numeric match giving locations of non-matches
+numeric match giving locations of non-matches
 }
 \description{
-  Invert a matrix of match locations to match the opposite
-  of what was previously matched.
+Invert a matrix of match locations to match the opposite of what was
+previously matched.
 }
 \examples{
 numbers <- "1 and 2 and 4 and 456"
diff --git a/man/modifier-deprecated.Rd b/man/modifier-deprecated.Rd
new file mode 100644
index 0000000..490d18e
--- /dev/null
+++ b/man/modifier-deprecated.Rd
@@ -0,0 +1,17 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/modifiers.r
+\name{modifier-deprecated}
+\alias{ignore.case}
+\alias{modifier-deprecated}
+\alias{perl}
+\title{Deprecated modifier functions.}
+\usage{
+ignore.case(string)
+
+perl(pattern)
+}
+\description{
+Please use \code{\link{regex}} and \code{\link{coll}} instead.
+}
+\keyword{internal}
+
diff --git a/man/modifiers.Rd b/man/modifiers.Rd
new file mode 100644
index 0000000..caeb126
--- /dev/null
+++ b/man/modifiers.Rd
@@ -0,0 +1,87 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/modifiers.r
+\name{modifiers}
+\alias{boundary}
+\alias{coll}
+\alias{fixed}
+\alias{modifiers}
+\alias{regex}
+\title{Control matching behaviour with modifier functions.}
+\usage{
+fixed(pattern, ignore_case = FALSE)
+
+coll(pattern, ignore_case = FALSE, locale = NULL, ...)
+
+regex(pattern, ignore_case = FALSE, multiline = FALSE, comments = FALSE,
+  dotall = FALSE, ...)
+
+boundary(type = c("character", "line_break", "sentence", "word"),
+  skip_word_none = TRUE, ...)
+}
+\arguments{
+\item{pattern}{Pattern to modify behaviour.}
+
+\item{ignore_case}{Should case differences be ignored in the match?}
+
+\item{locale}{Locale to use for comparisons. See
+\code{\link[stringi]{stri_locale_list}()} for all possible options.}
+
+\item{...}{Other less frequently used arguments passed on to
+\code{\link[stringi]{stri_opts_collator}},
+\code{\link[stringi]{stri_opts_regex}}, or
+\code{\link[stringi]{stri_opts_brkiter}}}
+
+\item{multiline}{If \code{TRUE}, \code{^} and \code{$} match
+the beginning and end of each line. If \code{FALSE}, the
+default, only match the start and end of the input.}
+
+\item{comments}{If \code{TRUE}, white space and comments beginning with
+\code{#} are ignored. Escape literal spaces with \code{\\ }.}
+
+\item{dotall}{If \code{TRUE}, \code{.} will also match line terminators.}
+
+\item{type}{Boundary type to detect.}
+
+\item{skip_word_none}{Ignore "words" that don't contain any characters
+or numbers - i.e. punctuation.}
+}
+\description{
+\describe{
+ \item{fixed}{Compare literal bytes in the string. This is very fast, but
+   not usually what you want for non-ASCII character sets.}
+ \item{coll}{Compare strings respecting standard collation rules.}
+ \item{regex}{The default. Uses ICU regular expressions.}
+ \item{boundary}{Match boundaries between things.}
+}
+}
+\examples{
+pattern <- "a.b"
+strings <- c("abb", "a.b")
+str_detect(strings, pattern)
+str_detect(strings, fixed(pattern))
+str_detect(strings, coll(pattern))
+
+# coll() is useful for locale-aware case-insensitive matching
+i <- c("I", "\\u0130", "i")
+i
+str_detect(i, fixed("i", TRUE))
+str_detect(i, coll("i", TRUE))
+str_detect(i, coll("i", TRUE, locale = "tr"))
+
+# Word boundaries
+words <- c("These are   some words.")
+str_count(words, boundary("word"))
+str_split(words, " ")[[1]]
+str_split(words, boundary("word"))[[1]]
+
+# Regular expression variations
+str_extract_all("The Cat in the Hat", "[a-z]+")
+str_extract_all("The Cat in the Hat", regex("[a-z]+", TRUE))
+
+str_extract_all("a\\nb\\nc", "^.")
+str_extract_all("a\\nb\\nc", regex("^.", multiline = TRUE))
+
+str_extract_all("a\\nb\\nc", "a.")
+str_extract_all("a\\nb\\nc", regex("a.", dotall = TRUE))
+}
+
diff --git a/man/perl.Rd b/man/perl.Rd
deleted file mode 100644
index 189c0b6..0000000
--- a/man/perl.Rd
+++ /dev/null
@@ -1,26 +0,0 @@
-\name{perl}
-\alias{perl}
-\title{Use perl regular expressions.}
-\usage{
-  perl(string)
-}
-\arguments{
-  \item{string}{pattern to match with Perl regexps}
-}
-\description{
-  This function specifies that a pattern should use the
-  Perl regular expression egine, rather than the default
-  POSIX 1003.2 extended regular expressions
-}
-\examples{
-pattern <- "(?x)a.b"
-strings <- c("abb", "a.b")
-\dontrun{str_detect(strings, pattern)}
-str_detect(strings, perl(pattern))
-}
-\seealso{
-  Other modifiers: \code{\link{fixed}},
-  \code{\link{ignore.case}}
-}
-\keyword{character}
-
diff --git a/man/pipe.Rd b/man/pipe.Rd
new file mode 100644
index 0000000..08a8b06
--- /dev/null
+++ b/man/pipe.Rd
@@ -0,0 +1,13 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/utils.R
+\name{\%>\%}
+\alias{\%>\%}
+\title{Pipe operator}
+\usage{
+lhs \%>\% rhs
+}
+\description{
+Pipe operator
+}
+\keyword{internal}
+
diff --git a/man/str_c.Rd b/man/str_c.Rd
index 1540ead..bf60e13 100644
--- a/man/str_c.Rd
+++ b/man/str_c.Rd
@@ -1,35 +1,36 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/c.r
 \name{str_c}
 \alias{str_c}
 \alias{str_join}
 \title{Join multiple strings into a single string.}
 \usage{
-  str_c(..., sep = "", collapse = NULL)
+str_c(..., sep = "", collapse = NULL)
+
+str_join(..., sep = "", collapse = NULL)
 }
 \arguments{
-  \item{...}{one or more character vectors.  Zero length
-  arguments are removed}
+\item{...}{One or more character vectors. Zero length arguments
+are removed.}
 
-  \item{sep}{string to insert between input vectors}
+\item{sep}{String to insert between input vectors.}
 
-  \item{collapse}{optional string used to combine input
-  vectors into single string}
+\item{collapse}{Optional string used to combine input vectors into single
+string.}
 }
 \value{
-  If \code{collapse = NULL} (the default) a character
-  vector with length equal to the longest input string.  If
-  \code{collapse} is non- NULL, a character vector of
-  length 1.
+If \code{collapse = NULL} (the default) a character vector with
+  length equal to the longest input string. If \code{collapse} is
+  non-NULL, a character vector of length 1.
 }
 \description{
-  To understand how \code{str_c} works, you need to imagine
-  that you are building up a matrix of strings.  Each input
-  argument forms a column, and is expanded to the length of
-  the longest argument, using the usual recyling rules.
-  The \code{sep} string is inserted between each column. If
-  collapse is \code{NULL} each row is collapsed into a
-  single string.  If non-\code{NULL} that string is
-  inserted at the end of each row, and the entire matrix
-  collapsed to a single string.
+To understand how \code{str_c} works, you need to imagine that you are
+building up a matrix of strings. Each input argument forms a column, and
+is expanded to the length of the longest argument, using the usual
+recycling rules.  The \code{sep} string is inserted between each column. If
+collapse is \code{NULL} each row is collapsed into a single string. If
+non-\code{NULL} that string is inserted at the end of each row, and
+the entire matrix collapsed to a single string.
 }
 \examples{
 str_c("Letter: ", letters)
@@ -39,9 +40,14 @@ str_c(letters[-26], " comes before ", letters[-1])
 
 str_c(letters, collapse = "")
 str_c(letters, collapse = ", ")
+
+# Missing inputs give missing outputs
+str_c(c("a", NA, "b"), "-d")
+# Use str_replace_na to display literal NAs:
+str_c(str_replace_na(c("a", NA, "b")), "-d")
 }
 \seealso{
-  \code{\link{paste}} which this function wraps
+\code{\link{paste}} for equivalent base R functionality, and
+   \code{\link[stringi]{stri_c}} which this function wraps
 }
-\keyword{character}
 
diff --git a/man/str_conv.Rd b/man/str_conv.Rd
new file mode 100644
index 0000000..252d452
--- /dev/null
+++ b/man/str_conv.Rd
@@ -0,0 +1,25 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/conv.R
+\name{str_conv}
+\alias{str_conv}
+\title{Specify the encoding of a string.}
+\usage{
+str_conv(string, encoding)
+}
+\arguments{
+\item{string}{String to re-encode.}
+
+\item{encoding}{Name of encoding. See \code{\link[stringi]{stri_enc_list}}
+for a complete list.}
+}
+\description{
+This is a convenient way to override the current encoding of a string.
+}
+\examples{
+# Example from encoding?stringi::stringi
+x <- rawToChar(as.raw(177))
+x
+str_conv(x, "ISO-8859-2") # Polish "a with ogonek"
+str_conv(x, "ISO-8859-1") # Plus-minus
+}
+
diff --git a/man/str_count.Rd b/man/str_count.Rd
index a856cf6..38cb61b 100644
--- a/man/str_count.Rd
+++ b/man/str_count.Rd
@@ -1,26 +1,35 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/count.r
 \name{str_count}
 \alias{str_count}
 \title{Count the number of matches in a string.}
 \usage{
-  str_count(string, pattern)
+str_count(string, pattern = "")
 }
 \arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
 
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
+\item{pattern}{Pattern to look for.
+
+  The default interpretation is a regular expression, as described
+  in \link[stringi]{stringi-search-regex}. Control options with
+  \code{\link{regex}()}.
+
+  Match a fixed string (i.e. by comparing only bytes), using
+  \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+  for matching human text, you'll want \code{\link{coll}(x)} which
+  respects character matching rules for the specified locale.
+
+  Match character, word, line and sentence boundaries with
+  \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+  \code{boundary("character")}.}
 }
 \value{
-  integer vector
+An integer vector.
 }
 \description{
-  Vectorised over \code{string} and \code{pattern}, shorter
-  is recycled to same length as longest.
+Vectorised over \code{string} and \code{pattern}.
 }
 \examples{
 fruit <- c("apple", "banana", "pear", "pineapple")
@@ -28,12 +37,14 @@ str_count(fruit, "a")
 str_count(fruit, "p")
 str_count(fruit, "e")
 str_count(fruit, c("a", "b", "p", "p"))
+
+str_count(c("a.", "...", ".a.a"), ".")
+str_count(c("a.", "...", ".a.a"), fixed("."))
 }
 \seealso{
-  \code{\link{regexpr}} which this function wraps
+\code{\link[stringi]{stri_count}} which this function wraps.
 
-  \code{\link{str_locate}}/\code{\link{str_locate_all}} to
-  locate position of matches
+ \code{\link{str_locate}}/\code{\link{str_locate_all}} to locate position
+ of matches
 }
-\keyword{character}
 
diff --git a/man/str_detect.Rd b/man/str_detect.Rd
index b673650..afdad9b 100644
--- a/man/str_detect.Rd
+++ b/man/str_detect.Rd
@@ -1,25 +1,35 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/detect.r
 \name{str_detect}
 \alias{str_detect}
 \title{Detect the presence or absence of a pattern in a string.}
 \usage{
-  str_detect(string, pattern)
+str_detect(string, pattern)
 }
 \arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
 
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
+\item{pattern}{Pattern to look for.
+
+  The default interpretation is a regular expression, as described
+  in \link[stringi]{stringi-search-regex}. Control options with
+  \code{\link{regex}()}.
+
+  Match a fixed string (i.e. by comparing only bytes), using
+  \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+  for matching human text, you'll want \code{\link{coll}(x)} which
+  respects character matching rules for the specified locale.
+
+  Match character, word, line and sentence boundaries with
+  \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+  \code{boundary("character")}.}
 }
 \value{
-  boolean vector
+A logical vector.
 }
 \description{
-  Vectorised over \code{string} and \code{pattern}.
+Vectorised over \code{string} and \code{pattern}.
 }
 \examples{
 fruit <- c("apple", "banana", "pear", "pinapple")
@@ -33,7 +43,6 @@ str_detect(fruit, "[aeiou]")
 str_detect("aecfg", letters)
 }
 \seealso{
-  \code{\link{grepl}} which this function wraps
+\code{\link[stringi]{stri_detect}} which this function wraps
 }
-\keyword{character}
 
diff --git a/man/str_dup.Rd b/man/str_dup.Rd
index 5c1a44d..1d6b1c9 100644
--- a/man/str_dup.Rd
+++ b/man/str_dup.Rd
@@ -1,19 +1,21 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/dup.r
 \name{str_dup}
 \alias{str_dup}
 \title{Duplicate and concatenate strings within a character vector.}
 \usage{
-  str_dup(string, times)
+str_dup(string, times)
 }
 \arguments{
-  \item{string}{input character vector}
+\item{string}{Input character vector.}
 
-  \item{times}{number of times to duplicate each string}
+\item{times}{Number of times to duplicate each string.}
 }
 \value{
-  character vector
+A character vector.
 }
 \description{
-  Vectorised over \code{string} and \code{times}.
+Vectorised over \code{string} and \code{times}.
 }
 \examples{
 fruit <- c("apple", "pear", "banana")
@@ -21,5 +23,4 @@ str_dup(fruit, 2)
 str_dup(fruit, 1:3)
 str_c("ba", str_dup("na", 0:5))
 }
-\keyword{character}
 
diff --git a/man/str_extract.Rd b/man/str_extract.Rd
index 0664dbb..6b35865 100644
--- a/man/str_extract.Rd
+++ b/man/str_extract.Rd
@@ -1,36 +1,61 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/extract.r
 \name{str_extract}
 \alias{str_extract}
-\title{Extract first piece of a string that matches a pattern.}
+\alias{str_extract_all}
+\title{Extract matching patterns from a string.}
 \usage{
-  str_extract(string, pattern)
+str_extract(string, pattern)
+
+str_extract_all(string, pattern, simplify = FALSE)
 }
 \arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
+
+\item{pattern}{Pattern to look for.
+
+  The default interpretation is a regular expression, as described
+  in \link[stringi]{stringi-search-regex}. Control options with
+  \code{\link{regex}()}.
+
+  Match a fixed string (i.e. by comparing only bytes), using
+  \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+  for matching human text, you'll want \code{\link{coll}(x)} which
+  respects character matching rules for the specified locale.
+
+  Match character, word, line and sentence boundaries with
+  \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+  \code{boundary("character")}.}
+
+\item{simplify}{If \code{FALSE}, the default, returns a list of character
+vectors. If \code{TRUE} returns a character matrix.}
 }
 \value{
-  character vector.
+A character vector.
 }
 \description{
-  Vectorised over \code{string}.  \code{pattern} should be
-  a single pattern, i.e. a character vector of length one.
+Vectorised over \code{string} and \code{pattern}.
 }
 \examples{
-shopping_list <- c("apples x4", "flour", "sugar", "milk x2")
+shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
 str_extract(shopping_list, "\\\\d")
 str_extract(shopping_list, "[a-z]+")
 str_extract(shopping_list, "[a-z]{1,4}")
 str_extract(shopping_list, "\\\\b[a-z]{1,4}\\\\b")
+
+# Extract all matches
+str_extract_all(shopping_list, "[a-z]+")
+str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b")
+str_extract_all(shopping_list, "\\\\d")
+
+# Simplify results into character matrix
+str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b", simplify = TRUE)
+str_extract_all(shopping_list, "\\\\d", simplify = TRUE)
 }
 \seealso{
-  \code{\link{str_extract_all}} to extract all matches
+\code{\link[stringi]{stri_extract_first}} and
+  \code{\link[stringi]{stri_extract_all}} for the underlying
+  implementation.
 }
-\keyword{character}
 
diff --git a/man/str_extract_all.Rd b/man/str_extract_all.Rd
deleted file mode 100644
index e36a0d6..0000000
--- a/man/str_extract_all.Rd
+++ /dev/null
@@ -1,35 +0,0 @@
-\name{str_extract_all}
-\alias{str_extract_all}
-\title{Extract all pieces of a string that match a pattern.}
-\usage{
-  str_extract_all(string, pattern)
-}
-\arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
-}
-\value{
-  list of character vectors.
-}
-\description{
-  Vectorised over \code{string}.  \code{pattern} should be
-  a single pattern, i.e. a character vector of length one.
-}
-\examples{
-shopping_list <- c("apples x4", "bag of flour", "bag of sugar", "milk x2")
-str_extract_all(shopping_list, "[a-z]+")
-str_extract_all(shopping_list, "\\\\b[a-z]+\\\\b")
-str_extract_all(shopping_list, "\\\\d")
-}
-\seealso{
-  \code{\link{str_extract}} to extract the first match
-}
-\keyword{character}
-
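Note on the merged str_extract page: the new `simplify` argument changes the
return shape of str_extract_all. A minimal sketch, assuming stringr >= 1.0.0 is
installed; the variable names are illustrative only and not part of the commit.

```r
library(stringr)

shopping_list <- c("apples x4", "bag of flour", "milk x2")

# Default (simplify = FALSE): a list with one character vector per input.
words_list <- str_extract_all(shopping_list, "[a-z]+")

# simplify = TRUE: a character matrix with one row per input string;
# rows with fewer matches are filled out with empty strings.
words_matrix <- str_extract_all(shopping_list, "[a-z]+", simplify = TRUE)

print(words_list)
print(words_matrix)
```

The list form keeps ragged results as they are; the matrix form is convenient
when every input is expected to yield roughly the same number of matches.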
diff --git a/man/str_length.Rd b/man/str_length.Rd
index 6b284c8..920c593 100644
--- a/man/str_length.Rd
+++ b/man/str_length.Rd
@@ -1,27 +1,45 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/length.r
 \name{str_length}
 \alias{str_length}
-\title{The length of a string (in characters).}
+\title{The length of a string.}
 \usage{
-  str_length(string)
+str_length(string)
 }
 \arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
 }
 \value{
-  numeric vector giving number of characters in each
-  element of the character vector.  Missing string have
-  missing length.
+A numeric vector giving the number of characters (code points) in each
+   element of the character vector. Missing strings have missing length.
 }
 \description{
-  The length of a string (in characters).
+Technically this returns the number of "code points" in a string. One
+code point usually corresponds to one character, but not always. For example,
+a u with an umlaut might be represented as a single character or as the
+combination of a u and an umlaut.
 }
 \examples{
 str_length(letters)
+str_length(NA)
+str_length(factor("abc"))
 str_length(c("i", "like", "programming", NA))
+
+# Two ways of representing a u with an umlaut
+u1 <- "\\u00fc"
+u2 <- stringi::stri_trans_nfd(u1)
+# They print the same:
+u1
+u2
+# But have a different length
+str_length(u1)
+str_length(u2)
+# Even though they have the same number of characters
+str_count(u1)
+str_count(u2)
 }
 \seealso{
-  \code{\link{nchar}} which this function wraps
+\code{\link[stringi]{stri_length}} which this function wraps.
 }
-\keyword{character}
 
diff --git a/man/str_locate.Rd b/man/str_locate.Rd
index 20f36e9..040809f 100644
--- a/man/str_locate.Rd
+++ b/man/str_locate.Rd
@@ -1,40 +1,59 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/locate.r
 \name{str_locate}
 \alias{str_locate}
-\title{Locate the position of the first occurence of a pattern in a string.}
+\alias{str_locate_all}
+\title{Locate the position of patterns in a string.}
 \usage{
-  str_locate(string, pattern)
+str_locate(string, pattern)
+
+str_locate_all(string, pattern)
 }
 \arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
+
+\item{pattern}{Pattern to look for.
+
+  The default interpretation is a regular expression, as described
+  in \link[stringi]{stringi-search-regex}. Control options with
+  \code{\link{regex}()}.
+
+  Match a fixed string (i.e. by comparing only bytes), using
+  \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+  for matching human text, you'll want \code{\link{coll}(x)} which
+  respects character matching rules for the specified locale.
+
+  Match character, word, line and sentence boundaries with
+  \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+  \code{boundary("character")}.}
 }
 \value{
-  integer matrix.  First column gives start postion of
-  match, and second column gives end position.
+For \code{str_locate}, an integer matrix. First column gives start
+  position of match, and second column gives end position. For
+  \code{str_locate_all}, a list of integer matrices.
 }
 \description{
-  Vectorised over \code{string} and \code{pattern}, shorter
-  is recycled to same length as longest.
+Vectorised over \code{string} and \code{pattern}. If the match is of length
+0 (e.g. from a special match like \code{$}), end will be one character less
+than start.
 }
 \examples{
-fruit <- c("apple", "banana", "pear", "pinapple")
+fruit <- c("apple", "banana", "pear", "pineapple")
+str_locate(fruit, "$")
 str_locate(fruit, "a")
 str_locate(fruit, "e")
 str_locate(fruit, c("a", "b", "p", "p"))
+
+str_locate_all(fruit, "a")
+str_locate_all(fruit, "e")
+str_locate_all(fruit, c("a", "b", "p", "p"))
+
+# Find location of every character
+str_locate_all(fruit, "")
 }
 \seealso{
-  \code{\link{regexpr}} which this function wraps
-
-  \code{\link{str_extract}} for a convenient way of
-  extracting matches \code{\link{str_locate_all}} to locate
-  position of all matches
+\code{\link{str_extract}} for a convenient way of extracting matches,
+  \code{\link[stringi]{stri_locate}} for the underlying implementation.
 }
-\keyword{character}
 
diff --git a/man/str_locate_all.Rd b/man/str_locate_all.Rd
deleted file mode 100644
index e967900..0000000
--- a/man/str_locate_all.Rd
+++ /dev/null
@@ -1,46 +0,0 @@
-\name{str_locate_all}
-\alias{str_locate_all}
-\title{Locate the position of all occurences of a pattern in a string.}
-\usage{
-  str_locate_all(string, pattern)
-}
-\arguments{
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
-}
-\value{
-  list of integer matrices.  First column gives start
-  postion of match, and second column gives end position.
-}
-\description{
-  Vectorised over \code{string} and \code{pattern}, shorter
-  is recycled to same length as longest.
-}
-\details{
-  If the match is of length 0, (e.g. from a special match
-  like \code{$}) end will be one character less than start.
-}
-\examples{
-fruit <- c("apple", "banana", "pear", "pineapple")
-str_locate_all(fruit, "a")
-str_locate_all(fruit, "e")
-str_locate_all(fruit, c("a", "b", "p", "p"))
-}
-\seealso{
-  \code{\link{regexpr}} which this function wraps
-
-  \code{\link{str_extract}} for a convenient way of
-  extracting matches
-
-  \code{\link{str_locate}} to locate position of first
-  match
-}
-\keyword{character}
-
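Note on the zero-length match rule described in the merged str_locate page: a
short sketch of what "end will be one character less than start" looks like in
practice. It assumes stringr >= 1.0.0; the variable names are illustrative.

```r
library(stringr)

# "$" matches the empty position after the last character of "apple",
# so the match starts at position 6 and "ends" at position 5.
loc_end <- str_locate("apple", "$")

# An ordinary one-character match has start == end.
loc_a <- str_locate("apple", "a")

print(loc_end)
print(loc_a)
```

Callers that compute a match width as `end - start + 1` therefore get 0 for
such matches without special-casing them.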
diff --git a/man/str_match.Rd b/man/str_match.Rd
index b497d99..d332d51 100644
--- a/man/str_match.Rd
+++ b/man/str_match.Rd
@@ -1,35 +1,46 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/match.r
 \name{str_match}
 \alias{str_match}
-\title{Extract first matched group from a string.}
+\alias{str_match_all}
+\title{Extract matched groups from a string.}
 \usage{
-  str_match(string, pattern)
+str_match(string, pattern)
+
+str_match_all(string, pattern)
 }
 \arguments{
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  Pattern should contain groups,
-  defined by ().  See the ``Extended Regular Expressions''
-  section of \code{\link{regex}} for details.}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
 
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
+\item{pattern}{Pattern to look for, as defined by an ICU regular
+expression. See \link[stringi]{stringi-search-regex} for more details.}
 }
 \value{
-  character matrix. First column is the complete match,
-  followed by one for each capture group
+For \code{str_match}, a character matrix. First column is the
+  complete match, followed by one column for each capture group.
+  For \code{str_match_all}, a list of character matrices.
 }
 \description{
-  Vectorised over \code{string}.  \code{pattern} should be
-  a single pattern, i.e. a character vector of length one.
+Vectorised over \code{string} and \code{pattern}.
 }
 \examples{
 strings <- c(" 219 733 8965", "329-293-8753 ", "banana", "595 794 7569",
   "387 287 6718", "apple", "233.398.9187  ", "482 952 3315",
-  "239 923 8115", "842 566 4692", "Work: 579-499-7527", "$1000",
+  "239 923 8115 and 842 566 4692", "Work: 579-499-7527", "$1000",
   "Home: 543.355.3679")
 phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
 
 str_extract(strings, phone)
 str_match(strings, phone)
+
+# Extract/match all
+str_extract_all(strings, phone)
+str_match_all(strings, phone)
+}
+\seealso{
+\code{\link{str_extract}} to extract the complete match,
+  \code{\link[stringi]{stri_match}} for the underlying
+  implementation.
 }
-\keyword{character}
 
diff --git a/man/str_match_all.Rd b/man/str_match_all.Rd
deleted file mode 100644
index e804729..0000000
--- a/man/str_match_all.Rd
+++ /dev/null
@@ -1,33 +0,0 @@
-\name{str_match_all}
-\alias{str_match_all}
-\title{Extract all matched groups from a string.}
-\usage{
-  str_match_all(string, pattern)
-}
-\arguments{
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  Pattern should contain groups,
-  defined by ().  See the ``Extended Regular Expressions''
-  section of \code{\link{regex}} for details.}
-
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-}
-\value{
-  list of character matrices, as given by
-  \code{\link{str_match}}
-}
-\description{
-  Vectorised over \code{string}.  \code{pattern} should be
-  a single pattern, i.e. a character vector of length one.
-}
-\examples{
-strings <- c("Home: 219 733 8965.  Work: 229-293-8753 ",
-  "banana pear apple", "595 794 7569 / 387 287 6718")
-phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
-
-str_extract_all(strings, phone)
-str_match_all(strings, phone)
-}
-\keyword{character}
-
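Note on the merged str_match page: the two functions differ in return shape
when a string contains several matches or none. A sketch under the assumption
that stringr >= 1.0.0 is installed; the input vector is made up for
illustration.

```r
library(stringr)

phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
x <- c("239 923 8115 and 842 566 4692", "banana")

# str_match: one row per input string, first match only.
# Column 1 is the complete match, columns 2-4 are the capture groups;
# a string without a match yields a row of NA.
first_match <- str_match(x, phone)

# str_match_all: one matrix per input string, one row per match.
all_matches <- str_match_all(x, phone)

print(first_match)
print(all_matches)
```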
diff --git a/man/str_order.Rd b/man/str_order.Rd
new file mode 100644
index 0000000..90f3772
--- /dev/null
+++ b/man/str_order.Rd
@@ -0,0 +1,40 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/sort.R
+\name{str_order}
+\alias{str_order}
+\alias{str_sort}
+\title{Order or sort a character vector.}
+\usage{
+str_order(x, decreasing = FALSE, na_last = TRUE, locale = "", ...)
+
+str_sort(x, decreasing = FALSE, na_last = TRUE, locale = "", ...)
+}
+\arguments{
+\item{x}{A character vector to sort.}
+
+\item{decreasing}{A boolean. If \code{FALSE}, the default, sorts from
+lowest to highest; if \code{TRUE} sorts from highest to lowest.}
+
+\item{na_last}{Where should \code{NA} go? \code{TRUE} at the end,
+\code{FALSE} at the beginning, \code{NA} dropped.}
+
+\item{locale}{In which locale should the sorting occur? Defaults to
+the current locale.}
+
+\item{...}{Other options used to control sorting order. Passed on to
+\code{\link[stringi]{stri_opts_collator}}.}
+}
+\description{
+Order or sort a character vector.
+}
+\examples{
+str_order(letters, locale = "en")
+str_sort(letters, locale = "en")
+
+str_order(letters, locale = "haw")
+str_sort(letters, locale = "haw")
+}
+\seealso{
+\code{\link[stringi]{stri_order}} for the underlying implementation.
+}
+
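Note on the new str_order page: the two functions are related in the same way
as base R's order() and sort(). A minimal sketch, assuming stringr >= 1.0.0;
the input vector is illustrative.

```r
library(stringr)

x <- c("banana", "apple", "cherry")

# str_order returns the permutation that sorts x ...
idx <- str_order(x, locale = "en")

# ... and str_sort returns the sorted values directly.
sorted <- str_sort(x, locale = "en")

# Indexing by the permutation reproduces the sorted vector.
print(identical(x[idx], sorted))
```

The index form is useful for reordering a second vector or data frame by the
same locale-aware ordering.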
diff --git a/man/str_pad.Rd b/man/str_pad.Rd
index ed8636f..5ef8028 100644
--- a/man/str_pad.Rd
+++ b/man/str_pad.Rd
@@ -1,25 +1,25 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/pad-trim.r
 \name{str_pad}
 \alias{str_pad}
 \title{Pad a string.}
 \usage{
-  str_pad(string, width, side = "left", pad = " ")
+str_pad(string, width, side = c("left", "right", "both"), pad = " ")
 }
 \arguments{
-  \item{string}{input character vector}
+\item{string}{A character vector.}
 
-  \item{width}{pad strings to this minimum width}
+\item{width}{Minimum width of padded strings.}
 
-  \item{side}{side on which padding character is added
-  (left, right or both)}
+\item{side}{Side on which padding character is added (left, right or both).}
 
-  \item{pad}{single padding character (default is a space)}
+\item{pad}{Single padding character (default is a space).}
 }
 \value{
-  character vector
+A character vector.
 }
 \description{
-  Vectorised over \code{string}.  All other inputs should
-  be of length 1.
+Vectorised over \code{string}, \code{width} and \code{pad}.
 }
 \examples{
 rbind(
@@ -27,11 +27,16 @@ rbind(
   str_pad("hadley", 30, "right"),
   str_pad("hadley", 30, "both")
 )
+
+# All arguments are vectorised except side
+str_pad(c("a", "abc", "abcdef"), 10)
+str_pad("a", c(5, 10, 20))
+str_pad("a", 10, pad = c("-", "_", " "))
+
 # Longer strings are returned unchanged
 str_pad("hadley", 3)
 }
 \seealso{
-  \code{\link{str_trim}} to remove whitespace
+\code{\link{str_trim}} to remove whitespace
 }
-\keyword{character}
 
diff --git a/man/str_replace.Rd b/man/str_replace.Rd
index 17af4c3..6359c89 100644
--- a/man/str_replace.Rd
+++ b/man/str_replace.Rd
@@ -1,32 +1,32 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/replace.r
 \name{str_replace}
 \alias{str_replace}
-\title{Replace first occurrence of a matched pattern in a string.}
+\alias{str_replace_all}
+\title{Replace matched patterns in a string.}
 \usage{
-  str_replace(string, pattern, replacement)
+str_replace(string, pattern, replacement)
+
+str_replace_all(string, pattern, replacement)
 }
 \arguments{
-  \item{replacement}{replacement string.  References of the
-  form \code{\1}, \code{\2} will be replaced with the
-  contents of the respective matched group (created by
-  \code{()}) within the pattern.}
-
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
+
+\item{pattern,replacement}{Supply separate pattern and replacement strings
+  to vectorise over the patterns. References of the form \code{\1},
+  \code{\2} will be replaced with the contents of the respective matched
+  group (created by \code{()}) within the pattern.
+
+  For \code{str_replace_all} only, you can apply multiple patterns and
+  replacements to each string by passing a named character vector to
+  \code{pattern}.}
 }
 \value{
-  character vector.
+A character vector.
 }
 \description{
-  Vectorised over \code{string}, \code{pattern} and
-  \code{replacement}. Shorter arguments will be expanded to
-  length of longest.
+Vectorised over \code{string}, \code{pattern} and \code{replacement}.
 }
 \examples{
 fruits <- c("one apple", "two pears", "three bananas")
@@ -37,10 +37,23 @@ str_replace(fruits, "([aeiou])", "")
 str_replace(fruits, "([aeiou])", "\\\\1\\\\1")
 str_replace(fruits, "[aeiou]", c("1", "2", "3"))
 str_replace(fruits, c("a", "e", "i"), "-")
+
+fruits <- c("one apple", "two pears", "three bananas")
+str_replace(fruits, "[aeiou]", "-")
+str_replace_all(fruits, "[aeiou]", "-")
+
+str_replace_all(fruits, "([aeiou])", "")
+str_replace_all(fruits, "([aeiou])", "\\\\1\\\\1")
+str_replace_all(fruits, "[aeiou]", c("1", "2", "3"))
+str_replace_all(fruits, c("a", "e", "i"), "-")
+
+# If you want to apply multiple patterns and replacements to the same
+# string, pass a named version to pattern.
+str_replace_all(str_c(fruits, collapse = "---"),
+ c("one" = 1, "two" = 2, "three" = 3))
 }
 \seealso{
-  \code{\link{sub}} which this function wraps,
-  \code{\link{str_replace_all}} to replace all matches
+\code{str_replace_na} to turn missing values into "NA";
+  \code{\link{stri_replace}} for the underlying implementation.
 }
-\keyword{character}
 
diff --git a/man/str_replace_all.Rd b/man/str_replace_all.Rd
deleted file mode 100644
index d5363f5..0000000
--- a/man/str_replace_all.Rd
+++ /dev/null
@@ -1,46 +0,0 @@
-\name{str_replace_all}
-\alias{str_replace_all}
-\title{Replace all occurrences of a matched pattern in a string.}
-\usage{
-  str_replace_all(string, pattern, replacement)
-}
-\arguments{
-  \item{replacement}{replacement string.  References of the
-  form \code{\1}, \code{\2} will be replaced with the
-  contents of the respective matched group (created by
-  \code{()}) within the pattern.}
-
-  \item{string}{input vector. This must be an atomic
-  vector, and will be coerced to a character vector}
-
-  \item{pattern}{pattern to look for, as defined by a POSIX
-  regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  See \code{\link{fixed}}, \code{\link{ignore.case}} and
-  \code{\link{perl}} for how to use other types of
-  matching: fixed, case insensitive and perl-compatible.}
-}
-\value{
-  character vector.
-}
-\description{
-  Vectorised over \code{string}, \code{pattern} and
-  \code{replacement}. Shorter arguments will be expanded to
-  length of longest.
-}
-\examples{
-fruits <- c("one apple", "two pears", "three bananas")
-str_replace(fruits, "[aeiou]", "-")
-str_replace_all(fruits, "[aeiou]", "-")
-
-str_replace_all(fruits, "([aeiou])", "")
-str_replace_all(fruits, "([aeiou])", "\\\\1\\\\1")
-str_replace_all(fruits, "[aeiou]", c("1", "2", "3"))
-str_replace_all(fruits, c("a", "e", "i"), "-")
-}
-\seealso{
-  \code{\link{gsub}} which this function wraps,
-  \code{\link{str_replace}} to replace a single match
-}
-\keyword{character}
-
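Note on the merged str_replace page: the named-vector form of str_replace_all
applies every pattern/replacement pair to the same string, rather than pairing
patterns with strings element by element. A sketch assuming stringr >= 1.0.0;
the replacement values are written as character strings here, which is an
illustrative choice.

```r
library(stringr)

fruits <- c("one apple", "two pears", "three bananas")
joined <- str_c(fruits, collapse = "---")

# Names are the patterns, values are the replacements.
# All three pairs are applied to the single input string.
relabelled <- str_replace_all(
  joined,
  c("one" = "1", "two" = "2", "three" = "3")
)

print(relabelled)
```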
diff --git a/man/str_replace_na.Rd b/man/str_replace_na.Rd
new file mode 100644
index 0000000..37e58cd
--- /dev/null
+++ b/man/str_replace_na.Rd
@@ -0,0 +1,28 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/replace.r
+\name{str_replace_na}
+\alias{str_replace_na}
+\title{Turn NA into "NA"}
+\usage{
+str_replace_na(string, replacement = "NA")
+}
+\arguments{
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
+
+\item{replacement}{Supply separate pattern and replacement strings
+  to vectorise over the patterns. References of the form \code{\1},
+  \code{\2} will be replaced with the contents of the respective matched
+  group (created by \code{()}) within the pattern.
+
+  For \code{str_replace_all} only, you can apply multiple patterns and
+  replacements to each string by passing a named character vector to
+  \code{pattern}.}
+}
+\description{
+Turn NA into "NA"
+}
+\examples{
+str_replace_na(c("NA", "abc", "def"))
+}
+
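Note on the new str_replace_na page: the example in the page above contains
only the string "NA", not a missing value. A short sketch with an actual NA,
assuming stringr >= 1.0.0; the input vector is illustrative.

```r
library(stringr)

x <- c(NA, "abc", "def")

# The missing value becomes the literal string "NA" ...
as_text <- str_replace_na(x)

# ... or any other replacement string supplied by the caller.
as_marker <- str_replace_na(x, replacement = "<missing>")

print(as_text)
print(as_marker)
```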
diff --git a/man/str_split.Rd b/man/str_split.Rd
index 1ee0293..84cc707 100644
--- a/man/str_split.Rd
+++ b/man/str_split.Rd
@@ -1,33 +1,52 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/split.r
 \name{str_split}
 \alias{str_split}
-\title{Split up a string into a variable number of pieces.}
+\alias{str_split_fixed}
+\title{Split up a string into pieces.}
 \usage{
-  str_split(string, pattern, n = Inf)
+str_split(string, pattern, n = Inf)
+
+str_split_fixed(string, pattern, n)
 }
 \arguments{
-  \item{string}{input character vector}
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
+
+\item{pattern}{Pattern to look for.
+
+  The default interpretation is a regular expression, as described
+  in \link[stringi]{stringi-search-regex}. Control options with
+  \code{\link{regex}()}.
+
+  Match a fixed string (i.e. by comparing only bytes), using
+  \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+  for matching human text, you'll want \code{\link{coll}(x)} which
+  respects character matching rules for the specified locale.
 
-  \item{pattern}{pattern to split up by, as defined by a
-  POSIX regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  If \code{NA}, returns original string.  If \code{""}
-  splits into individual characters.}
+  Match character, word, line and sentence boundaries with
+  \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+  \code{boundary("character")}.}
 
-  \item{n}{maximum number of pieces to return.  Default
-  (Inf) uses all possible split positions.}
+\item{n}{number of pieces to return.  Default (Inf) uses all
+  possible split positions.
+
+  For \code{str_split_fixed}, if n is greater than the number of pieces,
+  the result will be padded with empty strings.}
 }
 \value{
-  a list of character vectors.
+For \code{str_split_fixed}, a character matrix with \code{n} columns.
+  For \code{str_split}, a list of character vectors.
 }
 \description{
-  Vectorised over \code{string}.  \code{pattern} should be
-  a single pattern, i.e. a character vector of length one.
+Vectorised over \code{string} and \code{pattern}.
 }
 \examples{
 fruits <- c(
   "apples and oranges and pears and bananas",
   "pineapples and mangos and guavas"
 )
+
 str_split(fruits, " and ")
 
 # Specify n to restrict the number of possible matches
@@ -35,9 +54,12 @@ str_split(fruits, " and ", n = 3)
 str_split(fruits, " and ", n = 2)
 # If n greater than number of pieces, no padding occurs
 str_split(fruits, " and ", n = 5)
+
+# Use fixed to return a character matrix
+str_split_fixed(fruits, " and ", 3)
+str_split_fixed(fruits, " and ", 4)
 }
 \seealso{
-  \code{\link{str_split_fixed}} for fixed number of splits
+\code{\link{stri_split}} for the underlying implementation.
 }
-\keyword{character}
 
diff --git a/man/str_split_fixed.Rd b/man/str_split_fixed.Rd
deleted file mode 100644
index d933d3e..0000000
--- a/man/str_split_fixed.Rd
+++ /dev/null
@@ -1,40 +0,0 @@
-\name{str_split_fixed}
-\alias{str_split_fixed}
-\title{Split up a string into a fixed number of pieces.}
-\usage{
-  str_split_fixed(string, pattern, n)
-}
-\arguments{
-  \item{string}{input character vector}
-
-  \item{pattern}{pattern to split up by, as defined by a
-  POSIX regular expression.  See the ``Extended Regular
-  Expressions'' section of \code{\link{regex}} for details.
-  If \code{NA}, returns original string.  If \code{""}
-  splits into individual characters.}
-
-  \item{n}{number of pieces to return.  Default (Inf) uses
-  all possible split positions.  If n is greater than the
-  number of pieces, the result will be padded with empty
-  strings.}
-}
-\value{
-  character matrix with \code{n} columns.
-}
-\description{
-  Vectorised over \code{string}.  \code{pattern} should be
-  a single pattern, i.e. a character vector of length one.
-}
-\examples{
-fruits <- c(
-  "apples and oranges and pears and bananas",
-  "pineapples and mangos and guavas"
-)
-str_split_fixed(fruits, " and ", 3)
-str_split_fixed(fruits, " and ", 4)
-}
-\seealso{
-  \code{\link{str_split}} for variable number of splits
-}
-\keyword{character}
-
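Note on the merged str_split page: the practical difference between the two
functions is the return shape and the padding behaviour. A sketch assuming
stringr >= 1.0.0; the input strings are shortened from the page's own example.

```r
library(stringr)

fruits <- c(
  "apples and oranges and pears",
  "pineapples and mangos"
)

# str_split: a list, one character vector per input, no padding.
pieces <- str_split(fruits, " and ")

# str_split_fixed: a character matrix with exactly n columns;
# rows with fewer pieces are padded with empty strings.
grid <- str_split_fixed(fruits, " and ", 3)

print(pieces)
print(grid)
```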
diff --git a/man/str_sub.Rd b/man/str_sub.Rd
index 6898530..597e238 100644
--- a/man/str_sub.Rd
+++ b/man/str_sub.Rd
@@ -1,36 +1,39 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/sub.r
 \name{str_sub}
 \alias{str_sub}
-\title{Extract substrings from a character vector.}
+\alias{str_sub<-}
+\title{Extract and replace substrings from a character vector.}
 \usage{
-  str_sub(string, start = 1L, end = -1L)
+str_sub(string, start = 1L, end = -1L)
+
+str_sub(string, start = 1L, end = -1L) <- value
 }
 \arguments{
-  \item{string}{input character vector.}
+\item{string}{input character vector.}
+
+\item{start,end}{Two integer vectors. \code{start} gives the position
+  of the first character (defaults to first), \code{end} gives the position
+  of the last (defaults to last character). Alternatively, pass a two-column
+  matrix to \code{start}.
 
-  \item{start}{integer vector giving position of first
-  charater in substring, defaults to first character. If
-  negative, counts backwards from last character.}
+  Negative values count backwards from the last character.}
 
-  \item{end}{integer vector giving position of last
-  character in substring, defaults to last character. If
-  negative, counts backwards from last character.}
+\item{value}{replacement string}
 }
 \value{
-  character vector of substring from \code{start} to
-  \code{end} (inclusive). Will be length of longest input
-  argument.
+A character vector of substring from \code{start} to \code{end}
+  (inclusive). Will be length of longest input argument.
 }
 \description{
-  \code{str_sub} will recycle all arguments to be the same
-  length as the longest argument. If any arguments are of
-  length 0, the output will be a zero length character
-  vector.
+\code{str_sub} will recycle all arguments to be the same length as the
+longest argument. If any arguments are of length 0, the output will be
+a zero length character vector.
 }
 \details{
-  Substrings are inclusive - they include the characters at
-  both start and end positions. \code{str_sub(string, 1,
-  -1)} will return the complete substring, from the first
-  character to the last.
+Substrings are inclusive - they include the characters at both start and
+end positions. \code{str_sub(string, 1, -1)} will return the complete
+substring, from the first character to the last.
 }
 \examples{
 hw <- "Hadley Wickham"
@@ -41,16 +44,29 @@ str_sub(hw, 8, 14)
 str_sub(hw, 8)
 str_sub(hw, c(1, 8), c(6, 14))
 
+# Negative indices
 str_sub(hw, -1)
 str_sub(hw, -7)
 str_sub(hw, end = -7)
 
+# Alternatively, you can pass in a two-column matrix, as in the
+# output from str_locate_all
+pos <- str_locate_all(hw, "[aeio]")[[1]]
+str_sub(hw, pos)
+str_sub(hw, pos[, 1], pos[, 2])
+
+# Vectorisation
 str_sub(hw, seq_len(str_length(hw)))
 str_sub(hw, end = seq_len(str_length(hw)))
+
+# Replacement form
+x <- "BBCDEF"
+str_sub(x, 1, 1) <- "A"; x
+str_sub(x, -1, -1) <- "K"; x
+str_sub(x, -2, -2) <- "GHIJ"; x
+str_sub(x, 2, -2) <- ""; x
 }
 \seealso{
-  \code{\link{substring}} which this function wraps, and
-  \code{link{str_sub_replace}} for the replacement version
+The underlying implementation in \code{\link[stringi]{stri_sub}}
 }
-\keyword{character}
 
diff --git a/man/str_sub_replace.Rd b/man/str_sub_replace.Rd
deleted file mode 100644
index 43002b2..0000000
--- a/man/str_sub_replace.Rd
+++ /dev/null
@@ -1,40 +0,0 @@
-\name{str_sub_replace}
-\alias{str_sub<-}
-\alias{str_sub_replace}
-\title{Replace substrings in a character vector.
-\code{str_sub<-} will recycle all arguments to be the same length as the
-longest argument.}
-\usage{
-  str_sub(string, start = 1L, end = -1L) <- value
-}
-\arguments{
-  \item{string}{input character vector.}
-
-  \item{start}{integer vector giving position of first
-  charater in substring, defaults to first character. If
-  negative, counts backwards from last character.}
-
-  \item{end}{integer vector giving position of last
-  character in substring, defaults to last character. If
-  negative, counts backwards from last character.}
-
-  \item{value}{replacement string}
-}
-\value{
-  character vector of substring from \code{start} to
-  \code{end} (inclusive). Will be length of longest input
-  argument.
-}
-\description{
-  Replace substrings in a character vector.
-  \code{str_sub<-} will recycle all arguments to be the
-  same length as the longest argument.
-}
-\examples{
-x <- "BBCDEF"
-str_sub(x, 1, 1) <- "A"; x
-str_sub(x, -1, -1) <- "K"; x
-str_sub(x, -2, -2) <- "GHIJ"; x
-str_sub(x, 2, -2) <- ""; x
-}
-
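Note on the merged str_sub page: the matrix form of `start` and the replacement
form can be sketched together as follows. This assumes stringr >= 1.0.0; the
variable names are illustrative.

```r
library(stringr)

hw <- "Hadley Wickham"

# str_locate returns a one-row matrix with "start" and "end" columns,
# which can be passed to str_sub as-is ...
pos <- str_locate(hw, "Wickham")
by_matrix <- str_sub(hw, pos)

# ... or unpacked into separate start and end arguments.
by_columns <- str_sub(hw, pos[, "start"], pos[, "end"])

# Replacement form: assign into the selected substring.
x <- "BBCDEF"
str_sub(x, 1, 1) <- "A"

print(by_matrix)
print(x)
```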
diff --git a/man/str_subset.Rd b/man/str_subset.Rd
new file mode 100644
index 0000000..91778ac
--- /dev/null
+++ b/man/str_subset.Rd
@@ -0,0 +1,50 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/subset.R
+\name{str_subset}
+\alias{str_subset}
+\title{Keep strings matching a pattern.}
+\usage{
+str_subset(string, pattern)
+}
+\arguments{
+\item{string}{Input vector. Either a character vector, or something
+coercible to one.}
+
+\item{pattern}{Pattern to look for.
+
+  The default interpretation is a regular expression, as described
+  in \link[stringi]{stringi-search-regex}. Control options with
+  \code{\link{regex}()}.
+
+  Match a fixed string (i.e. by comparing only bytes), using
+  \code{\link{fixed}(x)}. This is fast, but approximate. Generally,
+  for matching human text, you'll want \code{\link{coll}(x)} which
+  respects character matching rules for the specified locale.
+
+  Match character, word, line and sentence boundaries with
+  \code{\link{boundary}()}. An empty pattern, "", is equivalent to
+  \code{boundary("character")}.}
+}
+\value{
+A character vector.
+}
+\description{
+This is a convenient wrapper around \code{x[str_detect(x, pattern)]}.
+Vectorised over \code{string} and \code{pattern}.
+}
+\examples{
+fruit <- c("apple", "banana", "pear", "pineapple")
+str_subset(fruit, "a")
+str_subset(fruit, "^a")
+str_subset(fruit, "a$")
+str_subset(fruit, "b")
+str_subset(fruit, "[aeiou]")
+
+# Missings are silently dropped
+str_subset(c("a", NA, "b"), ".")
+}
+\seealso{
+\code{\link{grep}} with argument \code{value = TRUE},
+   \code{\link[stringi]{stri_subset}} for the underlying implementation.
+}
+
diff --git a/man/str_trim.Rd b/man/str_trim.Rd
index a5dfd0f..aa29289 100644
--- a/man/str_trim.Rd
+++ b/man/str_trim.Rd
@@ -1,28 +1,27 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/pad-trim.r
 \name{str_trim}
 \alias{str_trim}
 \title{Trim whitespace from start and end of string.}
 \usage{
-  str_trim(string, side = "both")
+str_trim(string, side = c("both", "left", "right"))
 }
 \arguments{
-  \item{string}{input character vector}
+\item{string}{A character vector.}
 
-  \item{side}{side on which whitespace is removed (left,
-  right or both)}
+\item{side}{Side on which to remove whitespace (left, right or both).}
 }
 \value{
-  character vector with leading and trailing whitespace
-  removed
+A character vector.
 }
 \description{
-  Trim whitespace from start and end of string.
+Trim whitespace from start and end of string.
 }
 \examples{
 str_trim("  String with trailing and leading white space\\t")
 str_trim("\\n\\nString with trailing and leading white space\\n\\n")
 }
 \seealso{
-  \code{\link{str_pad}} to add whitespace
+\code{\link{str_pad}} to add whitespace
 }
-\keyword{character}
 
diff --git a/man/str_wrap.Rd b/man/str_wrap.Rd
index 6869676..c2a68b3 100644
--- a/man/str_wrap.Rd
+++ b/man/str_wrap.Rd
@@ -1,29 +1,29 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/wrap.r
 \name{str_wrap}
 \alias{str_wrap}
 \title{Wrap strings into nicely formatted paragraphs.}
 \usage{
-  str_wrap(string, width = 80, indent = 0, exdent = 0)
+str_wrap(string, width = 80, indent = 0, exdent = 0)
 }
 \arguments{
-  \item{string}{character vector of strings to reformat.}
+\item{string}{character vector of strings to reformat.}
 
-  \item{width}{positive integer giving target line width in
-  characters.}
+\item{width}{positive integer giving target line width in characters. A
+width less than or equal to 1 will put each word on its own line.}
 
-  \item{indent}{non-negative integer giving indentation of
-  first line in each paragraph}
+\item{indent}{non-negative integer giving indentation of first line in
+each paragraph}
 
-  \item{exdent}{non-negative integer giving indentation of
-  following lines in each paragraph}
+\item{exdent}{non-negative integer giving indentation of following lines in
+each paragraph}
 }
 \value{
-  a character vector of reformatted strings.
+A character vector of re-wrapped strings.
 }
 \description{
-  This is currently implemented as thin wrapper over
-  \code{\link{strwrap}}, but is vectorised over
-  \code{stringr}, and collapses output into single strings.
-  See \code{\link{strwrap}} for more details.
+This is a wrapper around \code{\link[stringi]{stri_wrap}} which implements
+the Knuth-Plass paragraph wrapping algorithm.
 }
 \examples{
 thanks_path <- file.path(R.home("doc"), "THANKS")
@@ -33,5 +33,6 @@ cat(str_wrap(thanks), "\\n")
 cat(str_wrap(thanks, width = 40), "\\n")
 cat(str_wrap(thanks, width = 60, indent = 2), "\\n")
 cat(str_wrap(thanks, width = 60, exdent = 2), "\\n")
+cat(str_wrap(thanks, width = 0, exdent = 2), "\\n")
 }
 
diff --git a/man/stringr.Rd b/man/stringr.Rd
new file mode 100644
index 0000000..a49955d
--- /dev/null
+++ b/man/stringr.Rd
@@ -0,0 +1,9 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/stringr.R
+\name{stringr}
+\alias{stringr}
+\title{Fast and friendly string manipulation.}
+\description{
+Fast and friendly string manipulation.
+}
+
diff --git a/man/word.Rd b/man/word.Rd
index 5bc4503..02642e8 100644
--- a/man/word.Rd
+++ b/man/word.Rd
@@ -1,29 +1,30 @@
+% Generated by roxygen2 (4.1.0): do not edit by hand
+% Please edit documentation in R/word.r
 \name{word}
 \alias{word}
 \title{Extract words from a sentence.}
 \usage{
-  word(string, start = 1L, end = start, sep = fixed(" "))
+word(string, start = 1L, end = start, sep = fixed(" "))
 }
 \arguments{
-  \item{string}{input character vector.}
+\item{string}{input character vector.}
 
-  \item{start}{integer vector giving position of first word
-  to extract.  Defaults to first word. If negative, counts
-  backwards from last character.}
+\item{start}{integer vector giving position of first word to extract.
+Defaults to first word. If negative, counts backwards from last
+word.}
 
-  \item{end}{integer vector giving position of last word to
-  extract.  Defaults to first word. If negative, counts
-  backwards from last character.}
+\item{end}{integer vector giving position of last word to extract.
+Defaults to first word. If negative, counts backwards from last
+word.}
 
-  \item{sep}{separator between words.  Defaults to single
-  space.}
+\item{sep}{separator between words.  Defaults to single space.}
 }
 \value{
-  character vector of words from \code{start} to \code{end}
+character vector of words from \code{start} to \code{end}
   (inclusive). Will be length of longest input argument.
 }
 \description{
-  Extract words from a sentence.
+Extract words from a sentence.
 }
 \examples{
 sentences <- c("Jane saw a cat", "Jane sat down")
diff --git a/tests/test-all.R b/tests/testthat.R
similarity index 60%
rename from tests/test-all.R
rename to tests/testthat.R
index 18f94b6..3b03326 100644
--- a/tests/test-all.R
+++ b/tests/testthat.R
@@ -1,4 +1,4 @@
 library(testthat)
 library(stringr)
 
-test_package("stringr")
+test_check("stringr")
diff --git a/inst/tests/test-count.r b/tests/testthat/test-count.r
similarity index 100%
rename from inst/tests/test-count.r
rename to tests/testthat/test-count.r
diff --git a/inst/tests/test-detect.r b/tests/testthat/test-detect.r
similarity index 69%
rename from inst/tests/test-detect.r
rename to tests/testthat/test-detect.r
index d8edafb..99fc429 100644
--- a/inst/tests/test-detect.r
+++ b/tests/testthat/test-detect.r
@@ -1,8 +1,8 @@
 context("Detecting patterns")
 
 test_that("special cases are correct", {
-  expect_that(str_detect(NA, ""), equals(NA))
-  expect_that(str_detect(character(), ""), equals(logical()))
+  expect_that(str_detect(NA, "x"), equals(NA))
+  expect_that(str_detect(character(), "x"), equals(logical()))
 })
 
 test_that("vectorised patterns work", {
@@ -12,12 +12,11 @@ test_that("vectorised patterns work", {
 
 test_that("modifiers work", {
   expect_that(str_detect("ab", "AB"), equals(FALSE))
-  expect_that(str_detect("ab", ignore.case("AB")), equals(TRUE))
+  expect_that(str_detect("ab", regex("AB", TRUE)), equals(TRUE))
 
   expect_that(str_detect("abc", "ab[c]"), equals(TRUE))
   expect_that(str_detect("abc", fixed("ab[c]")), equals(FALSE))
   expect_that(str_detect("ab[c]", fixed("ab[c]")), equals(TRUE))
 
-  expect_that(str_detect("abc", perl("(?x)a b c")), equals(TRUE))
-
+  expect_that(str_detect("abc", "(?x)a b c"), equals(TRUE))
 })
diff --git a/inst/tests/test-dup.r b/tests/testthat/test-dup.r
similarity index 100%
rename from inst/tests/test-dup.r
rename to tests/testthat/test-dup.r
diff --git a/inst/tests/test-extract.r b/tests/testthat/test-extract.r
similarity index 76%
rename from inst/tests/test-extract.r
rename to tests/testthat/test-extract.r
index 06fd89b..ebc0c1e 100644
--- a/inst/tests/test-extract.r
+++ b/tests/testthat/test-extract.r
@@ -12,3 +12,7 @@ test_that("single pattern extracted correctly", {
     equals(list(c("one", "two", "three"), character())))
 
 })
+
+test_that("no match yields empty vector", {
+  expect_equal(str_extract_all("a", "b")[[1]], character())
+})
diff --git a/inst/tests/test-join.r b/tests/testthat/test-join.r
similarity index 54%
rename from inst/tests/test-join.r
rename to tests/testthat/test-join.r
index bd71139..c7bf803 100644
--- a/inst/tests/test-join.r
+++ b/tests/testthat/test-join.r
@@ -8,13 +8,9 @@ test_that("basic case works", {
   expect_that(str_c(test, collapse = ""), equals("abc"))
 })
 
-test_that("zero length vectors dropped", {
+test_that("NULLs are dropped", {
   test <- letters[1:3]
 
-  expect_that(str_c(test, c()), equals(test))
-  expect_that(str_c(test, NULL), equals(test))
-
-  expect_that(
-    str_c(test, NULL, "a", sep = " "),
-    equals(c("a a", "b a", "c a")))
+  expect_equal(str_c(test, NULL), test)
+  expect_equal(str_c(test, NULL, "a", sep = " "), c("a a", "b a", "c a"))
 })
diff --git a/inst/tests/test-length.r b/tests/testthat/test-length.r
similarity index 100%
rename from inst/tests/test-length.r
rename to tests/testthat/test-length.r
diff --git a/inst/tests/test-locate.r b/tests/testthat/test-locate.r
similarity index 100%
rename from inst/tests/test-locate.r
rename to tests/testthat/test-locate.r
diff --git a/inst/tests/test-match.r b/tests/testthat/test-match.r
similarity index 81%
rename from inst/tests/test-match.r
rename to tests/testthat/test-match.r
index 92130bb..33c31e2 100644
--- a/inst/tests/test-match.r
+++ b/tests/testthat/test-match.r
@@ -10,12 +10,8 @@ phones <- str_c(
   num[, 7], num[, 8], num[, 9], num[, 10])
 
 test_that("special case are correct", {
-  # These tests really should compare to character matrices, but str_match
-  # returns matrices with dimnames set it's real pain
-  expect_that(c(str_match(NA, "(a)")),
-    equals(c(NA_character_, NA_character_)))
-  expect_that(c(str_match(character(), "(a)")),
-    equals(character()))
+  expect_equal(str_match(NA, "(a)"), matrix(NA_character_))
+  expect_equal(str_match(character(), "(a)"), matrix(character(), 0, 1))
 })
 
 test_that("no matching cases returns 1 column matrix", {
@@ -39,7 +35,7 @@ test_that("single match works when all match", {
   expect_that(matches_flat, equals(num_flat))
 })
 
-test_that("single match works when some don't match", {
+test_that("match returns NA when some inputs don't match", {
   matches <- str_match(c(phones, "blah", NA),
     "\\(([0-9]{3})\\) ([0-9]{3}) ([0-9]{4})")
 
@@ -50,6 +46,10 @@ test_that("single match works when some don't match", {
   expect_that(matches[12, ], equals(rep(NA_character_, 4)))
 })
 
+test_that("match returns NA when optional group doesn't match", {
+  expect_equal(str_match(c("ab", "a"), "(a)(b)?")[,3], c("b", NA))
+})
+
 test_that("multiple match works", {
   phones_one <- str_c(phones, collapse = " ")
   multi_match <- str_match_all(phones_one,
diff --git a/inst/tests/test-pad.r b/tests/testthat/test-pad.r
similarity index 100%
rename from inst/tests/test-pad.r
rename to tests/testthat/test-pad.r
diff --git a/inst/tests/test-split.r b/tests/testthat/test-split.r
similarity index 100%
rename from inst/tests/test-split.r
rename to tests/testthat/test-split.r
diff --git a/inst/tests/test-sub.r b/tests/testthat/test-sub.r
similarity index 100%
rename from inst/tests/test-sub.r
rename to tests/testthat/test-sub.r
diff --git a/inst/tests/test-trim.r b/tests/testthat/test-trim.r
similarity index 100%
rename from inst/tests/test-trim.r
rename to tests/testthat/test-trim.r
diff --git a/vignettes/stringr.Rmd b/vignettes/stringr.Rmd
new file mode 100644
index 0000000..3e79e57
--- /dev/null
+++ b/vignettes/stringr.Rmd
@@ -0,0 +1,201 @@
+---
+title: "Introduction to stringr"
+date: "`r Sys.Date()`"
+output: rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Introduction to stringr}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+```{r, echo=FALSE}
+library("stringr")
+knitr::opts_chunk$set(comment = "#>", collapse = TRUE)
+```
+
+Strings are not glamorous, high-profile components of R, but they do play a big role in many data cleaning and preparation tasks. R provides a solid set of string operations, but because they have grown organically over time, they can be inconsistent and a little hard to learn. Additionally, they lag behind the string operations in other programming languages, so that some things that are easy to do in languages like Ruby or Python are rather hard to do in R. The __stringr__ package aim [...]
+
+More concretely, stringr:
+
+-   Simplifies string operations by eliminating options that you don't need 
+    95% of the time (the other 5% of the time you can use functions from base R or
+    [stringi](https://github.com/Rexamine/stringi/)).
+
+-   Uses consistent function names and arguments.
+
+-   Produces outputs that can easily be used as inputs. This includes ensuring 
+    that missing inputs result in missing outputs, and zero length inputs result 
+    in zero length outputs. It also processes factors and character vectors in 
+    the same way.
+
+-   Completes R's string handling functions with useful functions from other 
+    programming languages.
+
+To meet these goals, stringr provides two basic families of functions:
+
+-   basic string operations, and
+
+-   pattern matching functions which use regular expressions to detect, locate, 
+    match, replace, extract, and split strings.
+
+As of version 1.0, stringr is a thin wrapper around [stringi](https://github.com/Rexamine/stringi/), which implements all the functions in stringr with efficient C code based on the [ICU library](http://site.icu-project.org).  Compared to stringi, stringr is considerably simpler: it provides fewer options and fewer functions. This is great when you're getting started learning string functions, and if you do need more of stringi's power, you should find the interface similar.
+
+Both families of functions are described in more detail in the following sections.
+
+## Basic string operations
+
+There are four string functions that are closely related to their base R equivalents, but with a few enhancements:
+
+-   `str_c()` is equivalent to `paste()`, but it uses the empty string ("") as 
+    the default separator and silently removes `NULL` inputs.
+
+-   `str_length()` is equivalent to `nchar()`, but it preserves NA's (rather than 
+     giving them length 2) and converts factors to characters (not integers).
+
+-   `str_sub()` is equivalent to `substr()` but it returns a zero length vector 
+    if any of its inputs are zero length, and otherwise expands each argument to
+    match the longest. It also accepts negative positions, which are counted 
+    backwards from the last character. The end position defaults to `-1`, 
+    which corresponds to the last character.
+
+-   `str_sub<-` is equivalent to `substr<-`, but like `str_sub()` it understands 
+    negative indices, and replacement strings do not need to be the same length 
+    as the string they are replacing.
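+
+The differences from base R listed above are easiest to see on small inputs. The chunk below is a minimal sketch with made-up values, not output taken from the package documentation:
+
+```{r}
+library(stringr)
+
+# NULL inputs are dropped and the default separator is the empty string
+str_c("a", NULL, "b")        # "ab"
+
+# NA stays NA instead of being counted as a two-character string
+str_length(c("abc", NA))     # 3 NA
+
+# Negative positions count backwards from the end of the string
+x <- "ABCDEF"
+str_sub(x, -3, -1)           # "DEF"
+
+# The replacement may differ in length from the text it replaces
+str_sub(x, 1, 1) <- "xyz"
+x                            # "xyzBCDEF"
+```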
+
+Three functions add new functionality:
+
+-   `str_dup()` to duplicate the characters within a string.
+
+-   `str_trim()` to remove leading and trailing whitespace.
+
+-   `str_pad()` to pad a string with extra whitespace on the left, right, or both sides.
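+
+As a rough illustration of these three helpers (the input values are invented for the example):
+
+```{r}
+library(stringr)
+
+# Repeat a string a given number of times
+str_dup("ab", 3)                         # "ababab"
+
+# Remove whitespace from both sides, or from one side only
+str_trim("  padded  ")                   # "padded"
+str_trim("  padded  ", side = "left")    # "padded  "
+
+# Pad with spaces up to a minimum width
+str_pad("7", width = 3, side = "left")   # "  7"
+str_pad("7", width = 3, side = "both")   # " 7 "
+```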
+
+## Pattern matching
+
+stringr provides pattern matching functions to **detect**, **locate**, **extract**, **match**, **replace**, and **split** strings. I'll illustrate how they work with some strings and a regular expression designed to match (US) phone numbers:
+
+```{r}
+strings <- c(
+  "apple", 
+  "219 733 8965", 
+  "329-293-8753", 
+  "Work: 579-499-7527; Home: 543.355.3679"
+)
+phone <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
+```
+
+-   `str_detect()` detects the presence or absence of a pattern and returns a 
+    logical vector (similar to `grepl()`). `str_subset()` returns the elements
+    of a character vector that match a regular expression (similar to `grep()` 
+    with `value = TRUE`).
+    
+    ```{r}
+    # Which strings contain phone numbers?
+    str_detect(strings, phone)
+    str_subset(strings, phone)
+    ```
+
+-   `str_locate()` locates the first position of a pattern and returns a numeric 
+    matrix with columns start and end. `str_locate_all()` locates all matches, 
+    returning a list of numeric matrices. Similar to `regexpr()` and `gregexpr()`.
+
+    ```{r}
+    # Where in the string is the phone number located?
+    (loc <- str_locate(strings, phone))
+    str_locate_all(strings, phone)
+    ```
+
+-   `str_extract()` extracts text corresponding to the first match, returning a 
+    character vector. `str_extract_all()` extracts all matches and returns a 
+    list of character vectors.
+
+    ```{r}
+    # What are the phone numbers?
+    str_extract(strings, phone)
+    str_extract_all(strings, phone)
+    str_extract_all(strings, phone, simplify = TRUE)
+    ```
+
+-   `str_match()` extracts capture groups formed by `()` from the first match. 
+    It returns a character matrix with one column for the complete match and 
+    one column for each group. `str_match_all()` extracts capture groups from 
+    all matches and returns a list of character matrices. Similar to 
+    `regmatches()`.
+
+    ```{r}
+    # Pull out the three components of the match
+    str_match(strings, phone)
+    str_match_all(strings, phone)
+    ```
+
+-   `str_replace()` replaces the first matched pattern and returns a character
+    vector. `str_replace_all()` replaces all matches. Similar to `sub()` and 
+    `gsub()`.
+
+    ```{r}
+    str_replace(strings, phone, "XXX-XXX-XXXX")
+    str_replace_all(strings, phone, "XXX-XXX-XXXX")
+    ```
+
+-   `str_split_fixed()` splits the string into a fixed number of pieces based 
+    on a pattern and returns a character matrix. `str_split()` splits a string 
+    into a variable number of pieces and returns a list of character vectors.
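+
+    A small sketch of the difference, using a hypothetical `fields` vector 
+    rather than the phone number data above:
+
+    ```{r}
+    library(stringr)
+
+    fields <- c("a-b-c", "d-e")
+
+    # Variable number of pieces: a list with one character vector per input
+    str_split(fields, "-")
+
+    # Fixed number of pieces: a character matrix with one row per input.
+    # With n = 2 the first row is "a" and "b-c".
+    str_split_fixed(fields, "-", 2)
+
+    # Rows with fewer pieces than requested are filled with ""
+    str_split_fixed(fields, "-", 3)
+    ```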
+
+### Arguments
+
+Each pattern matching function has the same first two arguments, a character vector of `string`s to process and a single `pattern` (regular expression) to match. The replace functions have an additional argument specifying the replacement string, and the split functions have an argument to specify the number of pieces.
+
+Unlike base string functions, stringr offers control over matching not through arguments, but through modifier functions, `regex()`, `coll()` and `fixed()`.  This is a deliberate choice made to simplify these functions. For example, while `grepl()` has six arguments, `str_detect()` only has two.
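+
+To make this concrete, here is a short sketch in which the modifier wraps the pattern, so `str_detect()` itself needs no extra arguments. The cases mirror the checks in the package's own detect tests:
+
+```{r}
+library(stringr)
+
+# Patterns are case-sensitive regular expressions by default
+str_detect("ab", "AB")                              # FALSE
+
+# regex() adjusts how the regular expression is interpreted
+str_detect("ab", regex("AB", ignore_case = TRUE))   # TRUE
+
+# fixed() compares literally, so "[c]" is no longer a character class
+str_detect("abc", fixed("ab[c]"))                   # FALSE
+str_detect("ab[c]", fixed("ab[c]"))                 # TRUE
+```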
+
+### Regular expressions
+
+To be able to use these functions effectively, you'll need a good knowledge of regular expressions, which this vignette is not going to teach you. Some useful tools to get you started:
+
+-   A good [reference sheet](http://www.regular-expressions.info/reference.html).
+
+-   A tool that allows you to [interactively test](http://gskinner.com/RegExr/)
+    what a regular expression will match.
+
+-   A tool to [build a regular expression](http://www.txt2re.com) from an 
+    input string.
+
+When writing regular expressions, I strongly recommend generating a list of positive (pattern should match) and negative (pattern shouldn't match) test cases to ensure that you are matching the correct components.
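+
+One possible way to set this up for the phone number pattern used earlier is sketched below. The names `phone_pattern`, `should_match` and `should_not_match` are hypothetical, and the test cases are invented for illustration:
+
+```{r}
+library(stringr)
+
+# Same regular expression as the `phone` pattern defined earlier
+phone_pattern <- "([2-9][0-9]{2})[- .]([0-9]{3})[- .]([0-9]{4})"
+
+# Positive cases: the pattern should match each of these
+should_match <- c("219 733 8965", "329-293-8753", "543.355.3679")
+
+# Negative cases: no digits, an area code starting with 1, a short last group
+should_not_match <- c("apple", "119-293-8753", "329-293-875")
+
+# stopifnot() raises an error as soon as one expectation fails
+stopifnot(
+  all(str_detect(should_match, phone_pattern)),
+  !any(str_detect(should_not_match, phone_pattern))
+)
+```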
+
+### Functions that return lists
+
+Many of the functions return a list of vectors or matrices. To work with each element of the list there are two strategies: iterate through a common set of indices, or use `Map()` to iterate through the vectors simultaneously. The second strategy is illustrated below:
+
+```{r}
+col2hex <- function(col) {
+  rgb <- col2rgb(col)
+  rgb(rgb["red", ], rgb["green", ], rgb["blue", ], max = 255)
+}
+
+# Goal: replace colour names in a string with their hex equivalent
+strings <- c("Roses are red, violets are blue", "My favourite colour is green")
+
+colours <- str_c("\\b", colors(), "\\b", collapse="|")
+# This gets us the colours, but we have no way of replacing them
+str_extract_all(strings, colours)
+
+# Instead, let's work with locations
+locs <- str_locate_all(strings, colours)
+Map(function(string, loc) {
+  hex <- col2hex(str_sub(string, loc))
+  str_sub(string, loc) <- hex
+  string
+}, strings, locs)
+```
+
+Another approach is to use the second form of `str_replace_all()`: if you give it a named vector, it applies each `pattern = replacement` in turn:
+
+```{r}
+matches <- col2hex(colors())
+names(matches) <- str_c("\\b", colors(), "\\b")
+
+str_replace_all(strings, matches)
+```
+
+## Conclusion
+
+stringr provides an opinionated interface to strings in R. It makes string processing simpler by removing uncommon options, and by vigorously enforcing consistency across functions. I have also added new functions that I have found useful from Ruby, and over time, I hope users will suggest useful functions from other programming languages. I will continue to build on the included test suite to ensure that the package behaves as expected and remains bug free.

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-cran-stringr.git


