[med-svn] [r-cran-data.table] 01/07: New upstream version 1.10.0

Andreas Tille <tille@debian.org>
Sun Dec 4 16:23:39 UTC 2016


This is an automated email from the git hooks/post-receive script.

tille pushed a commit to branch master
in repository r-cran-data.table.

commit d5e331cf74603467b18a3d42b374a01a5f0eeb81
Author: Andreas Tille <tille@debian.org>
Date:   Sun Dec 4 16:42:24 2016 +0100

    New upstream version 1.10.0
---
 DESCRIPTION                                        |   41 +-
 LICENSE                                            |  674 ++++
 MD5                                                |  251 +-
 NAMESPACE                                          |   29 +-
 README.md => NEWS.md                               |  392 ++-
 R/IDateTime.R                                      |   90 +-
 R/as.data.table.R                                  |  179 +
 R/between.R                                        |   43 +-
 R/bmerge.R                                         |   13 +-
 R/data.table.R                                     | 1419 ++++----
 R/duplicated.R                                     |   37 +-
 R/fcast.R                                          |   23 +-
 R/fmelt.R                                          |   21 +-
 R/foverlaps.R                                      |    6 +-
 R/frank.R                                          |   93 +
 R/fread.R                                          |   95 +-
 R/fwrite.R                                         |   56 +
 R/last.R                                           |   32 +-
 R/merge.R                                          |   10 +-
 R/onAttach.R                                       |   26 +-
 R/onLoad.R                                         |   16 +-
 R/openmp-utils.R                                   |    8 +
 R/setkey.R                                         |  368 +-
 R/setops.R                                         |  247 +-
 R/test.data.table.R                                |  100 +-
 R/timetaken.R                                      |    2 +-
 R/transpose.R                                      |   44 +-
 R/xts.R                                            |   29 +-
 README.md                                          | 3058 +----------------
 build/vignette.rds                                 |  Bin 360 -> 442 bytes
 inst/doc/datatable-faq.R                           |  248 +-
 inst/doc/datatable-faq.Rmd                         |  616 ++++
 inst/doc/datatable-faq.Rnw                         |  585 ----
 inst/doc/datatable-faq.html                        |  808 +++++
 inst/doc/datatable-faq.pdf                         |  Bin 322575 -> 0 bytes
 inst/doc/datatable-intro-vignette.R                |  210 --
 inst/doc/datatable-intro-vignette.html             |  974 ------
 inst/doc/datatable-intro.R                         |  391 +--
 .../doc/datatable-intro.Rmd                        |  190 +-
 inst/doc/datatable-intro.Rnw                       |  293 --
 inst/doc/datatable-intro.html                      |  907 +++++
 inst/doc/datatable-intro.pdf                       |  Bin 191827 -> 0 bytes
 inst/doc/datatable-keys-fast-subset.R              |   48 +-
 inst/doc/datatable-keys-fast-subset.Rmd            |  123 +-
 inst/doc/datatable-keys-fast-subset.html           |  156 +-
 inst/doc/datatable-reference-semantics.R           |   41 +-
 inst/doc/datatable-reference-semantics.Rmd         |   80 +-
 inst/doc/datatable-reference-semantics.html        |  262 +-
 inst/doc/datatable-reshape.R                       |   26 +-
 inst/doc/datatable-reshape.Rmd                     |   75 +-
 inst/doc/datatable-reshape.html                    |  144 +-
 ...datatable-secondary-indices-and-auto-indexing.R |  112 +
 ...tatable-secondary-indices-and-auto-indexing.Rmd |  327 ++
 ...atable-secondary-indices-and-auto-indexing.html |  453 +++
 inst/tests/1680-fread-header-encoding.csv          |    5 +
 inst/tests/530_fread.txt                           |   51 +
 inst/tests/536_fread_fill_1.txt                    |   29 +
 inst/tests/536_fread_fill_2.txt                    |   28 +
 inst/tests/536_fread_fill_3_extreme.txt            |   22 +
 inst/tests/536_fread_fill_4.txt                    |   30 +
 inst/tests/fread_blank.txt                         |   48 +
 inst/tests/fread_blank2.txt                        |   32 +
 inst/tests/fread_blank3.txt                        |   12 +
 inst/tests/issue_1087_utf8_bom.csv                 |    2 +
 inst/tests/issue_1116_fread_few_lines.txt          |  133 +
 inst/tests/issue_1116_fread_few_lines_2.txt        |  177 +
 inst/tests/issue_1164_json.txt                     |    2 +
 inst/tests/issue_1462_fread_quotes.txt             |    4 +
 inst/tests/issue_1573_fill.txt                     |    8 +
 inst/tests/melt-warning-1752.tsv                   |    2 +
 inst/tests/tests.Rraw                              | 3594 +++++++++++++++++---
 man/IDateTime.Rd                                   |   72 +-
 man/J.Rd                                           |    5 +-
 man/all.equal.data.table.Rd                        |   60 +-
 man/as.data.table.Rd                               |   79 +
 man/as.xts.data.table.Rd                           |    3 +-
 man/assign.Rd                                      |  129 +-
 man/between.Rd                                     |   73 +-
 man/data.table.Rd                                  |  524 +--
 man/datatable-optimize.Rd                          |  148 +
 man/dcast.data.table.Rd                            |   53 +-
 man/duplicated.Rd                                  |  103 +-
 man/first.Rd                                       |   34 +
 man/foverlaps.Rd                                   |  154 +-
 man/fread.Rd                                       |   45 +-
 man/fsort.Rd                                       |   32 +
 man/fwrite.Rd                                      |  118 +
 man/last.Rd                                        |   32 +-
 man/like.Rd                                        |    7 +-
 man/melt.data.table.Rd                             |   91 +-
 man/merge.Rd                                       |  133 +-
 man/openmp-utils.Rd                                |   19 +
 man/patterns.Rd                                    |   25 +-
 man/print.data.table.Rd                            |   49 +
 man/rbindlist.Rd                                   |    2 +-
 man/rleid.Rd                                       |   17 +-
 man/rowid.Rd                                       |   49 +
 man/setDF.Rd                                       |    6 +-
 man/setDT.Rd                                       |    4 +-
 man/setNumericRounding.Rd                          |   54 +-
 man/setattr.Rd                                     |    5 +-
 man/setkey.Rd                                      |  126 +-
 man/setops.Rd                                      |   59 +
 man/setorder.Rd                                    |   87 +-
 man/shift.Rd                                       |   10 +-
 man/shouldPrint.Rd                                 |   25 +
 man/special-symbols.Rd                             |   50 +
 man/split.Rd                                       |   76 +
 man/test.data.table.Rd                             |    5 +-
 man/truelength.Rd                                  |   16 +-
 man/tstrsplit.Rd                                   |   17 +-
 src/Makevars                                       |    4 +
 src/assign.c                                       |  120 +-
 src/between.c                                      |  104 +
 src/bmerge.c                                       |  459 ++-
 src/data.table.h                                   |   51 +-
 src/dogroups.c                                     |  148 +-
 src/fastradixdouble.c                              |  253 --
 src/fastradixint.c                                 |  100 -
 src/fcast.c                                        |  389 +--
 src/fmelt.c                                        |   72 +-
 src/forder.c                                       |   69 +-
 src/frank.c                                        |   10 +-
 src/fread.c                                        |  308 +-
 src/fsort.c                                        |  302 ++
 src/fwrite.c                                       | 1110 ++++++
 src/fwriteLookups.h                                | 2142 ++++++++++++
 src/gsumm.c                                        |  805 ++++-
 src/init.c                                         |   95 +-
 src/inrange.c                                      |   35 +
 src/openmp-utils.c                                 |   64 +
 src/quickselect.c                                  |  102 +
 src/rbindlist.c                                    |  113 +-
 src/reorder.c                                      |  209 +-
 src/shift.c                                        |   24 +
 src/subset.c                                       |  313 ++
 src/transpose.c                                    |   11 +-
 src/uniqlist.c                                     |  199 +-
 src/wrappers.c                                     |   29 +
 tests/autoprint.R                                  |    2 +-
 tests/autoprint.Rout.save                          |    4 +-
 tests/knitr.R                                      |   10 +-
 tests/knitr.Rout.mock                              |   41 +
 tests/knitr.Rout.save                              |   11 +-
 tests/{tests.R => main.R}                          |    4 +-
 tests/test-all.R                                   |    4 -
 tests/testthat.R                                   |    6 +
 {inst/tests => tests/testthat}/test-S4.R           |    0
 .../testthat}/test-data.frame-like.R               |    4 +-
 vignettes/Makefile                                 |    8 -
 vignettes/datatable-faq.Rmd                        |  616 ++++
 vignettes/datatable-faq.Rnw                        |  585 ----
 .../datatable-intro.Rmd                            |  190 +-
 vignettes/datatable-intro.Rnw                      |  293 --
 vignettes/datatable-keys-fast-subset.Rmd           |  123 +-
 vignettes/datatable-reference-semantics.Rmd        |   80 +-
 vignettes/datatable-reshape.Rmd                    |   75 +-
 ...tatable-secondary-indices-and-auto-indexing.Rmd |  327 ++
 158 files changed, 20402 insertions(+), 10867 deletions(-)
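
The diffstat above shows the headline additions in 1.10.0: new sources `src/fwrite.c`, `src/fsort.c`, and `R/fwrite.R`, matching the updated Description's "fast friendly file reader and parallel file writer". As an illustrative sketch only (not part of the patch), a round-trip through the new writer and the existing reader looks like this, assuming the exported `fwrite()`/`fread()` API as documented in `man/fwrite.Rd` and `man/fread.Rd`:

```r
# Sketch: exercise fwrite() (new in 1.10.0, src/fwrite.c above) and fread().
library(data.table)

dt <- data.table(id = 1:5, val = letters[1:5])
f <- tempfile(fileext = ".csv")

fwrite(dt, f)        # parallel CSV writer added in this release
back <- fread(f)     # fast file reader, present since earlier releases

stopifnot(identical(dt$id, back$id), identical(dt$val, back$val))
```

The round-trip assertion is only a smoke test; column classes that CSV cannot represent exactly (e.g. `Date`, `integer64`) may need explicit `colClasses` on re-read.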

diff --git a/DESCRIPTION b/DESCRIPTION
index 292915f..3d0eae6 100644
--- a/DESCRIPTION
+++ b/DESCRIPTION
@@ -1,21 +1,34 @@
 Package: data.table
-Version: 1.9.6
-Title: Extension of Data.frame
-Author: M Dowle, A Srinivasan, T Short, S Lianoglou with contributions from R Saporta, E Antonyan
-Maintainer: Matt Dowle <mattjdowle@gmail.com>
-Depends: R (>= 2.14.1)
-Imports: methods, chron
-Suggests: ggplot2 (>= 0.9.0), plyr, reshape, reshape2, testthat (>=
-        0.4), hexbin, fastmatch, nlme, xts, bit64, gdata,
-        GenomicRanges, caret, knitr, curl, zoo, plm
-Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns and a fast file reader (fread). Offers a natural and flexible syntax, for faster development.
-License: GPL (>= 2)
-URL: https://github.com/Rdatatable/data.table/wiki
+Version: 1.10.0
+Title: Extension of `data.frame`
+Authors@R: c(
+  person("Matt","Dowle",      role=c("aut","cre"), email="mattjdowle@gmail.com"),
+  person("Arun","Srinivasan", role="aut", email="arunkumar.sriniv@gmail.com"),
+  person("Jan","Gorecki",     role="ctb"),
+  person("Tom","Short",       role="ctb"),
+  person("Steve","Lianoglou", role="ctb"),
+  person("Eduard","Antonyan", role="ctb") )
+Depends: R (>= 3.0.0)
+Imports: methods
+Suggests: bit64, knitr, chron, ggplot2 (>= 0.9.0), plyr, reshape,
+        reshape2, testthat (>= 0.4), hexbin, fastmatch, nlme, xts,
+        gdata, GenomicRanges, caret, curl, zoo, plm, rmarkdown,
+        parallel
+Description: Fast aggregation of large data (e.g. 100GB in RAM), fast ordered joins, fast add/modify/delete of columns by group using no copies at all, list columns, a fast friendly file reader and parallel file writer. Offers a natural and flexible syntax, for faster development.
+License: GPL-3 | file LICENSE
+URL: http://r-datatable.com
 BugReports: https://github.com/Rdatatable/data.table/issues
 MailingList: datatable-help@lists.r-forge.r-project.org
 VignetteBuilder: knitr
 ByteCompile: TRUE
 NeedsCompilation: yes
-Packaged: 2015-09-19 04:47:43.628 UTC; mdowle
+Packaged: 2016-12-02 19:25:45.346 UTC; mdowle
+Author: Matt Dowle [aut, cre],
+  Arun Srinivasan [aut],
+  Jan Gorecki [ctb],
+  Tom Short [ctb],
+  Steve Lianoglou [ctb],
+  Eduard Antonyan [ctb]
+Maintainer: Matt Dowle <mattjdowle@gmail.com>
 Repository: CRAN
-Date/Publication: 2015-09-19 22:13:43
+Date/Publication: 2016-12-03 11:05:23
diff --git a/LICENSE b/LICENSE
new file mode 100644
index 0000000..94a9ed0
--- /dev/null
+++ b/LICENSE
@@ -0,0 +1,674 @@
+                    GNU GENERAL PUBLIC LICENSE
+                       Version 3, 29 June 2007
+
+ Copyright (C) 2007 Free Software Foundation, Inc. <http://fsf.org/>
+ Everyone is permitted to copy and distribute verbatim copies
+ of this license document, but changing it is not allowed.
+
+                            Preamble
+
+  The GNU General Public License is a free, copyleft license for
+software and other kinds of works.
+
+  The licenses for most software and other practical works are designed
+to take away your freedom to share and change the works.  By contrast,
+the GNU General Public License is intended to guarantee your freedom to
+share and change all versions of a program--to make sure it remains free
+software for all its users.  We, the Free Software Foundation, use the
+GNU General Public License for most of our software; it applies also to
+any other work released this way by its authors.  You can apply it to
+your programs, too.
+
+  When we speak of free software, we are referring to freedom, not
+price.  Our General Public Licenses are designed to make sure that you
+have the freedom to distribute copies of free software (and charge for
+them if you wish), that you receive source code or can get it if you
+want it, that you can change the software or use pieces of it in new
+free programs, and that you know you can do these things.
+
+  To protect your rights, we need to prevent others from denying you
+these rights or asking you to surrender the rights.  Therefore, you have
+certain responsibilities if you distribute copies of the software, or if
+you modify it: responsibilities to respect the freedom of others.
+
+  For example, if you distribute copies of such a program, whether
+gratis or for a fee, you must pass on to the recipients the same
+freedoms that you received.  You must make sure that they, too, receive
+or can get the source code.  And you must show them these terms so they
+know their rights.
+
+  Developers that use the GNU GPL protect your rights with two steps:
+(1) assert copyright on the software, and (2) offer you this License
+giving you legal permission to copy, distribute and/or modify it.
+
+  For the developers' and authors' protection, the GPL clearly explains
+that there is no warranty for this free software.  For both users' and
+authors' sake, the GPL requires that modified versions be marked as
+changed, so that their problems will not be attributed erroneously to
+authors of previous versions.
+
+  Some devices are designed to deny users access to install or run
+modified versions of the software inside them, although the manufacturer
+can do so.  This is fundamentally incompatible with the aim of
+protecting users' freedom to change the software.  The systematic
+pattern of such abuse occurs in the area of products for individuals to
+use, which is precisely where it is most unacceptable.  Therefore, we
+have designed this version of the GPL to prohibit the practice for those
+products.  If such problems arise substantially in other domains, we
+stand ready to extend this provision to those domains in future versions
+of the GPL, as needed to protect the freedom of users.
+
+  Finally, every program is threatened constantly by software patents.
+States should not allow patents to restrict development and use of
+software on general-purpose computers, but in those that do, we wish to
+avoid the special danger that patents applied to a free program could
+make it effectively proprietary.  To prevent this, the GPL assures that
+patents cannot be used to render the program non-free.
+
+  The precise terms and conditions for copying, distribution and
+modification follow.
+
+                       TERMS AND CONDITIONS
+
+  0. Definitions.
+
+  "This License" refers to version 3 of the GNU General Public License.
+
+  "Copyright" also means copyright-like laws that apply to other kinds of
+works, such as semiconductor masks.
+
+  "The Program" refers to any copyrightable work licensed under this
+License.  Each licensee is addressed as "you".  "Licensees" and
+"recipients" may be individuals or organizations.
+
+  To "modify" a work means to copy from or adapt all or part of the work
+in a fashion requiring copyright permission, other than the making of an
+exact copy.  The resulting work is called a "modified version" of the
+earlier work or a work "based on" the earlier work.
+
+  A "covered work" means either the unmodified Program or a work based
+on the Program.
+
+  To "propagate" a work means to do anything with it that, without
+permission, would make you directly or secondarily liable for
+infringement under applicable copyright law, except executing it on a
+computer or modifying a private copy.  Propagation includes copying,
+distribution (with or without modification), making available to the
+public, and in some countries other activities as well.
+
+  To "convey" a work means any kind of propagation that enables other
+parties to make or receive copies.  Mere interaction with a user through
+a computer network, with no transfer of a copy, is not conveying.
+
+  An interactive user interface displays "Appropriate Legal Notices"
+to the extent that it includes a convenient and prominently visible
+feature that (1) displays an appropriate copyright notice, and (2)
+tells the user that there is no warranty for the work (except to the
+extent that warranties are provided), that licensees may convey the
+work under this License, and how to view a copy of this License.  If
+the interface presents a list of user commands or options, such as a
+menu, a prominent item in the list meets this criterion.
+
+  1. Source Code.
+
+  The "source code" for a work means the preferred form of the work
+for making modifications to it.  "Object code" means any non-source
+form of a work.
+
+  A "Standard Interface" means an interface that either is an official
+standard defined by a recognized standards body, or, in the case of
+interfaces specified for a particular programming language, one that
+is widely used among developers working in that language.
+
+  The "System Libraries" of an executable work include anything, other
+than the work as a whole, that (a) is included in the normal form of
+packaging a Major Component, but which is not part of that Major
+Component, and (b) serves only to enable use of the work with that
+Major Component, or to implement a Standard Interface for which an
+implementation is available to the public in source code form.  A
+"Major Component", in this context, means a major essential component
+(kernel, window system, and so on) of the specific operating system
+(if any) on which the executable work runs, or a compiler used to
+produce the work, or an object code interpreter used to run it.
+
+  The "Corresponding Source" for a work in object code form means all
+the source code needed to generate, install, and (for an executable
+work) run the object code and to modify the work, including scripts to
+control those activities.  However, it does not include the work's
+System Libraries, or general-purpose tools or generally available free
+programs which are used unmodified in performing those activities but
+which are not part of the work.  For example, Corresponding Source
+includes interface definition files associated with source files for
+the work, and the source code for shared libraries and dynamically
+linked subprograms that the work is specifically designed to require,
+such as by intimate data communication or control flow between those
+subprograms and other parts of the work.
+
+  The Corresponding Source need not include anything that users
+can regenerate automatically from other parts of the Corresponding
+Source.
+
+  The Corresponding Source for a work in source code form is that
+same work.
+
+  2. Basic Permissions.
+
+  All rights granted under this License are granted for the term of
+copyright on the Program, and are irrevocable provided the stated
+conditions are met.  This License explicitly affirms your unlimited
+permission to run the unmodified Program.  The output from running a
+covered work is covered by this License only if the output, given its
+content, constitutes a covered work.  This License acknowledges your
+rights of fair use or other equivalent, as provided by copyright law.
+
+  You may make, run and propagate covered works that you do not
+convey, without conditions so long as your license otherwise remains
+in force.  You may convey covered works to others for the sole purpose
+of having them make modifications exclusively for you, or provide you
+with facilities for running those works, provided that you comply with
+the terms of this License in conveying all material for which you do
+not control copyright.  Those thus making or running the covered works
+for you must do so exclusively on your behalf, under your direction
+and control, on terms that prohibit them from making any copies of
+your copyrighted material outside their relationship with you.
+
+  Conveying under any other circumstances is permitted solely under
+the conditions stated below.  Sublicensing is not allowed; section 10
+makes it unnecessary.
+
+  3. Protecting Users' Legal Rights From Anti-Circumvention Law.
+
+  No covered work shall be deemed part of an effective technological
+measure under any applicable law fulfilling obligations under article
+11 of the WIPO copyright treaty adopted on 20 December 1996, or
+similar laws prohibiting or restricting circumvention of such
+measures.
+
+  When you convey a covered work, you waive any legal power to forbid
+circumvention of technological measures to the extent such circumvention
+is effected by exercising rights under this License with respect to
+the covered work, and you disclaim any intention to limit operation or
+modification of the work as a means of enforcing, against the work's
+users, your or third parties' legal rights to forbid circumvention of
+technological measures.
+
+  4. Conveying Verbatim Copies.
+
+  You may convey verbatim copies of the Program's source code as you
+receive it, in any medium, provided that you conspicuously and
+appropriately publish on each copy an appropriate copyright notice;
+keep intact all notices stating that this License and any
+non-permissive terms added in accord with section 7 apply to the code;
+keep intact all notices of the absence of any warranty; and give all
+recipients a copy of this License along with the Program.
+
+  You may charge any price or no price for each copy that you convey,
+and you may offer support or warranty protection for a fee.
+
+  5. Conveying Modified Source Versions.
+
+  You may convey a work based on the Program, or the modifications to
+produce it from the Program, in the form of source code under the
+terms of section 4, provided that you also meet all of these conditions:
+
+    a) The work must carry prominent notices stating that you modified
+    it, and giving a relevant date.
+
+    b) The work must carry prominent notices stating that it is
+    released under this License and any conditions added under section
+    7.  This requirement modifies the requirement in section 4 to
+    "keep intact all notices".
+
+    c) You must license the entire work, as a whole, under this
+    License to anyone who comes into possession of a copy.  This
+    License will therefore apply, along with any applicable section 7
+    additional terms, to the whole of the work, and all its parts,
+    regardless of how they are packaged.  This License gives no
+    permission to license the work in any other way, but it does not
+    invalidate such permission if you have separately received it.
+
+    d) If the work has interactive user interfaces, each must display
+    Appropriate Legal Notices; however, if the Program has interactive
+    interfaces that do not display Appropriate Legal Notices, your
+    work need not make them do so.
+
+  A compilation of a covered work with other separate and independent
+works, which are not by their nature extensions of the covered work,
+and which are not combined with it such as to form a larger program,
+in or on a volume of a storage or distribution medium, is called an
+"aggregate" if the compilation and its resulting copyright are not
+used to limit the access or legal rights of the compilation's users
+beyond what the individual works permit.  Inclusion of a covered work
+in an aggregate does not cause this License to apply to the other
+parts of the aggregate.
+
+  6. Conveying Non-Source Forms.
+
+  You may convey a covered work in object code form under the terms
+of sections 4 and 5, provided that you also convey the
+machine-readable Corresponding Source under the terms of this License,
+in one of these ways:
+
+    a) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by the
+    Corresponding Source fixed on a durable physical medium
+    customarily used for software interchange.
+
+    b) Convey the object code in, or embodied in, a physical product
+    (including a physical distribution medium), accompanied by a
+    written offer, valid for at least three years and valid for as
+    long as you offer spare parts or customer support for that product
+    model, to give anyone who possesses the object code either (1) a
+    copy of the Corresponding Source for all the software in the
+    product that is covered by this License, on a durable physical
+    medium customarily used for software interchange, for a price no
+    more than your reasonable cost of physically performing this
+    conveying of source, or (2) access to copy the
+    Corresponding Source from a network server at no charge.
+
+    c) Convey individual copies of the object code with a copy of the
+    written offer to provide the Corresponding Source.  This
+    alternative is allowed only occasionally and noncommercially, and
+    only if you received the object code with such an offer, in accord
+    with subsection 6b.
+
+    d) Convey the object code by offering access from a designated
+    place (gratis or for a charge), and offer equivalent access to the
+    Corresponding Source in the same way through the same place at no
+    further charge.  You need not require recipients to copy the
+    Corresponding Source along with the object code.  If the place to
+    copy the object code is a network server, the Corresponding Source
+    may be on a different server (operated by you or a third party)
+    that supports equivalent copying facilities, provided you maintain
+    clear directions next to the object code saying where to find the
+    Corresponding Source.  Regardless of what server hosts the
+    Corresponding Source, you remain obligated to ensure that it is
+    available for as long as needed to satisfy these requirements.
+
+    e) Convey the object code using peer-to-peer transmission, provided
+    you inform other peers where the object code and Corresponding
+    Source of the work are being offered to the general public at no
+    charge under subsection 6d.
+
+  A separable portion of the object code, whose source code is excluded
+from the Corresponding Source as a System Library, need not be
+included in conveying the object code work.
+
+  A "User Product" is either (1) a "consumer product", which means any
+tangible personal property which is normally used for personal, family,
+or household purposes, or (2) anything designed or sold for incorporation
+into a dwelling.  In determining whether a product is a consumer product,
+doubtful cases shall be resolved in favor of coverage.  For a particular
+product received by a particular user, "normally used" refers to a
+typical or common use of that class of product, regardless of the status
+of the particular user or of the way in which the particular user
+actually uses, or expects or is expected to use, the product.  A product
+is a consumer product regardless of whether the product has substantial
+commercial, industrial or non-consumer uses, unless such uses represent
+the only significant mode of use of the product.
+
+  "Installation Information" for a User Product means any methods,
+procedures, authorization keys, or other information required to install
+and execute modified versions of a covered work in that User Product from
+a modified version of its Corresponding Source.  The information must
+suffice to ensure that the continued functioning of the modified object
+code is in no case prevented or interfered with solely because
+modification has been made.
+
+  If you convey an object code work under this section in, or with, or
+specifically for use in, a User Product, and the conveying occurs as
+part of a transaction in which the right of possession and use of the
+User Product is transferred to the recipient in perpetuity or for a
+fixed term (regardless of how the transaction is characterized), the
+Corresponding Source conveyed under this section must be accompanied
+by the Installation Information.  But this requirement does not apply
+if neither you nor any third party retains the ability to install
+modified object code on the User Product (for example, the work has
+been installed in ROM).
+
+  The requirement to provide Installation Information does not include a
+requirement to continue to provide support service, warranty, or updates
+for a work that has been modified or installed by the recipient, or for
+the User Product in which it has been modified or installed.  Access to a
+network may be denied when the modification itself materially and
+adversely affects the operation of the network or violates the rules and
+protocols for communication across the network.
+
+  Corresponding Source conveyed, and Installation Information provided,
+in accord with this section must be in a format that is publicly
+documented (and with an implementation available to the public in
+source code form), and must require no special password or key for
+unpacking, reading or copying.
+
+  7. Additional Terms.
+
+  "Additional permissions" are terms that supplement the terms of this
+License by making exceptions from one or more of its conditions.
+Additional permissions that are applicable to the entire Program shall
+be treated as though they were included in this License, to the extent
+that they are valid under applicable law.  If additional permissions
+apply only to part of the Program, that part may be used separately
+under those permissions, but the entire Program remains governed by
+this License without regard to the additional permissions.
+
+  When you convey a copy of a covered work, you may at your option
+remove any additional permissions from that copy, or from any part of
+it.  (Additional permissions may be written to require their own
+removal in certain cases when you modify the work.)  You may place
+additional permissions on material, added by you to a covered work,
+for which you have or can give appropriate copyright permission.
+
+  Notwithstanding any other provision of this License, for material you
+add to a covered work, you may (if authorized by the copyright holders of
+that material) supplement the terms of this License with terms:
+
+    a) Disclaiming warranty or limiting liability differently from the
+    terms of sections 15 and 16 of this License; or
+
+    b) Requiring preservation of specified reasonable legal notices or
+    author attributions in that material or in the Appropriate Legal
+    Notices displayed by works containing it; or
+
+    c) Prohibiting misrepresentation of the origin of that material, or
+    requiring that modified versions of such material be marked in
+    reasonable ways as different from the original version; or
+
+    d) Limiting the use for publicity purposes of names of licensors or
+    authors of the material; or
+
+    e) Declining to grant rights under trademark law for use of some
+    trade names, trademarks, or service marks; or
+
+    f) Requiring indemnification of licensors and authors of that
+    material by anyone who conveys the material (or modified versions of
+    it) with contractual assumptions of liability to the recipient, for
+    any liability that these contractual assumptions directly impose on
+    those licensors and authors.
+
+  All other non-permissive additional terms are considered "further
+restrictions" within the meaning of section 10.  If the Program as you
+received it, or any part of it, contains a notice stating that it is
+governed by this License along with a term that is a further
+restriction, you may remove that term.  If a license document contains
+a further restriction but permits relicensing or conveying under this
+License, you may add to a covered work material governed by the terms
+of that license document, provided that the further restriction does
+not survive such relicensing or conveying.
+
+  If you add terms to a covered work in accord with this section, you
+must place, in the relevant source files, a statement of the
+additional terms that apply to those files, or a notice indicating
+where to find the applicable terms.
+
+  Additional terms, permissive or non-permissive, may be stated in the
+form of a separately written license, or stated as exceptions;
+the above requirements apply either way.
+
+  8. Termination.
+
+  You may not propagate or modify a covered work except as expressly
+provided under this License.  Any attempt otherwise to propagate or
+modify it is void, and will automatically terminate your rights under
+this License (including any patent licenses granted under the third
+paragraph of section 11).
+
+  However, if you cease all violation of this License, then your
+license from a particular copyright holder is reinstated (a)
+provisionally, unless and until the copyright holder explicitly and
+finally terminates your license, and (b) permanently, if the copyright
+holder fails to notify you of the violation by some reasonable means
+prior to 60 days after the cessation.
+
+  Moreover, your license from a particular copyright holder is
+reinstated permanently if the copyright holder notifies you of the
+violation by some reasonable means, this is the first time you have
+received notice of violation of this License (for any work) from that
+copyright holder, and you cure the violation prior to 30 days after
+your receipt of the notice.
+
+  Termination of your rights under this section does not terminate the
+licenses of parties who have received copies or rights from you under
+this License.  If your rights have been terminated and not permanently
+reinstated, you do not qualify to receive new licenses for the same
+material under section 10.
+
+  9. Acceptance Not Required for Having Copies.
+
+  You are not required to accept this License in order to receive or
+run a copy of the Program.  Ancillary propagation of a covered work
+occurring solely as a consequence of using peer-to-peer transmission
+to receive a copy likewise does not require acceptance.  However,
+nothing other than this License grants you permission to propagate or
+modify any covered work.  These actions infringe copyright if you do
+not accept this License.  Therefore, by modifying or propagating a
+covered work, you indicate your acceptance of this License to do so.
+
+  10. Automatic Licensing of Downstream Recipients.
+
+  Each time you convey a covered work, the recipient automatically
+receives a license from the original licensors, to run, modify and
+propagate that work, subject to this License.  You are not responsible
+for enforcing compliance by third parties with this License.
+
+  An "entity transaction" is a transaction transferring control of an
+organization, or substantially all assets of one, or subdividing an
+organization, or merging organizations.  If propagation of a covered
+work results from an entity transaction, each party to that
+transaction who receives a copy of the work also receives whatever
+licenses to the work the party's predecessor in interest had or could
+give under the previous paragraph, plus a right to possession of the
+Corresponding Source of the work from the predecessor in interest, if
+the predecessor has it or can get it with reasonable efforts.
+
+  You may not impose any further restrictions on the exercise of the
+rights granted or affirmed under this License.  For example, you may
+not impose a license fee, royalty, or other charge for exercise of
+rights granted under this License, and you may not initiate litigation
+(including a cross-claim or counterclaim in a lawsuit) alleging that
+any patent claim is infringed by making, using, selling, offering for
+sale, or importing the Program or any portion of it.
+
+  11. Patents.
+
+  A "contributor" is a copyright holder who authorizes use under this
+License of the Program or a work on which the Program is based.  The
+work thus licensed is called the contributor's "contributor version".
+
+  A contributor's "essential patent claims" are all patent claims
+owned or controlled by the contributor, whether already acquired or
+hereafter acquired, that would be infringed by some manner, permitted
+by this License, of making, using, or selling its contributor version,
+but do not include claims that would be infringed only as a
+consequence of further modification of the contributor version.  For
+purposes of this definition, "control" includes the right to grant
+patent sublicenses in a manner consistent with the requirements of
+this License.
+
+  Each contributor grants you a non-exclusive, worldwide, royalty-free
+patent license under the contributor's essential patent claims, to
+make, use, sell, offer for sale, import and otherwise run, modify and
+propagate the contents of its contributor version.
+
+  In the following three paragraphs, a "patent license" is any express
+agreement or commitment, however denominated, not to enforce a patent
+(such as an express permission to practice a patent or covenant not to
+sue for patent infringement).  To "grant" such a patent license to a
+party means to make such an agreement or commitment not to enforce a
+patent against the party.
+
+  If you convey a covered work, knowingly relying on a patent license,
+and the Corresponding Source of the work is not available for anyone
+to copy, free of charge and under the terms of this License, through a
+publicly available network server or other readily accessible means,
+then you must either (1) cause the Corresponding Source to be so
+available, or (2) arrange to deprive yourself of the benefit of the
+patent license for this particular work, or (3) arrange, in a manner
+consistent with the requirements of this License, to extend the patent
+license to downstream recipients.  "Knowingly relying" means you have
+actual knowledge that, but for the patent license, your conveying the
+covered work in a country, or your recipient's use of the covered work
+in a country, would infringe one or more identifiable patents in that
+country that you have reason to believe are valid.
+
+  If, pursuant to or in connection with a single transaction or
+arrangement, you convey, or propagate by procuring conveyance of, a
+covered work, and grant a patent license to some of the parties
+receiving the covered work authorizing them to use, propagate, modify
+or convey a specific copy of the covered work, then the patent license
+you grant is automatically extended to all recipients of the covered
+work and works based on it.
+
+  A patent license is "discriminatory" if it does not include within
+the scope of its coverage, prohibits the exercise of, or is
+conditioned on the non-exercise of one or more of the rights that are
+specifically granted under this License.  You may not convey a covered
+work if you are a party to an arrangement with a third party that is
+in the business of distributing software, under which you make payment
+to the third party based on the extent of your activity of conveying
+the work, and under which the third party grants, to any of the
+parties who would receive the covered work from you, a discriminatory
+patent license (a) in connection with copies of the covered work
+conveyed by you (or copies made from those copies), or (b) primarily
+for and in connection with specific products or compilations that
+contain the covered work, unless you entered into that arrangement,
+or that patent license was granted, prior to 28 March 2007.
+
+  Nothing in this License shall be construed as excluding or limiting
+any implied license or other defenses to infringement that may
+otherwise be available to you under applicable patent law.
+
+  12. No Surrender of Others' Freedom.
+
+  If conditions are imposed on you (whether by court order, agreement or
+otherwise) that contradict the conditions of this License, they do not
+excuse you from the conditions of this License.  If you cannot convey a
+covered work so as to satisfy simultaneously your obligations under this
+License and any other pertinent obligations, then as a consequence you may
+not convey it at all.  For example, if you agree to terms that obligate you
+to collect a royalty for further conveying from those to whom you convey
+the Program, the only way you could satisfy both those terms and this
+License would be to refrain entirely from conveying the Program.
+
+  13. Use with the GNU Affero General Public License.
+
+  Notwithstanding any other provision of this License, you have
+permission to link or combine any covered work with a work licensed
+under version 3 of the GNU Affero General Public License into a single
+combined work, and to convey the resulting work.  The terms of this
+License will continue to apply to the part which is the covered work,
+but the special requirements of the GNU Affero General Public License,
+section 13, concerning interaction through a network will apply to the
+combination as such.
+
+  14. Revised Versions of this License.
+
+  The Free Software Foundation may publish revised and/or new versions of
+the GNU General Public License from time to time.  Such new versions will
+be similar in spirit to the present version, but may differ in detail to
+address new problems or concerns.
+
+  Each version is given a distinguishing version number.  If the
+Program specifies that a certain numbered version of the GNU General
+Public License "or any later version" applies to it, you have the
+option of following the terms and conditions either of that numbered
+version or of any later version published by the Free Software
+Foundation.  If the Program does not specify a version number of the
+GNU General Public License, you may choose any version ever published
+by the Free Software Foundation.
+
+  If the Program specifies that a proxy can decide which future
+versions of the GNU General Public License can be used, that proxy's
+public statement of acceptance of a version permanently authorizes you
+to choose that version for the Program.
+
+  Later license versions may give you additional or different
+permissions.  However, no additional obligations are imposed on any
+author or copyright holder as a result of your choosing to follow a
+later version.
+
+  15. Disclaimer of Warranty.
+
+  THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
+APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
+HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
+OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
+THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
+PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
+IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
+ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
+
+  16. Limitation of Liability.
+
+  IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
+WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
+THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
+GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
+USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
+DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
+PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
+EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
+SUCH DAMAGES.
+
+  17. Interpretation of Sections 15 and 16.
+
+  If the disclaimer of warranty and limitation of liability provided
+above cannot be given local legal effect according to their terms,
+reviewing courts shall apply local law that most closely approximates
+an absolute waiver of all civil liability in connection with the
+Program, unless a warranty or assumption of liability accompanies a
+copy of the Program in return for a fee.
+
+                     END OF TERMS AND CONDITIONS
+
+            How to Apply These Terms to Your New Programs
+
+  If you develop a new program, and you want it to be of the greatest
+possible use to the public, the best way to achieve this is to make it
+free software which everyone can redistribute and change under these terms.
+
+  To do so, attach the following notices to the program.  It is safest
+to attach them to the start of each source file to most effectively
+state the exclusion of warranty; and each file should have at least
+the "copyright" line and a pointer to where the full notice is found.
+
+    <one line to give the program's name and a brief idea of what it does.>
+    Copyright (C) <year>  <name of author>
+
+    This program is free software: you can redistribute it and/or modify
+    it under the terms of the GNU General Public License as published by
+    the Free Software Foundation, either version 3 of the License, or
+    (at your option) any later version.
+
+    This program is distributed in the hope that it will be useful,
+    but WITHOUT ANY WARRANTY; without even the implied warranty of
+    MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+    GNU General Public License for more details.
+
+    You should have received a copy of the GNU General Public License
+    along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+Also add information on how to contact you by electronic and paper mail.
+
+  If the program does terminal interaction, make it output a short
+notice like this when it starts in an interactive mode:
+
+    <program>  Copyright (C) <year>  <name of author>
+    This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
+    This is free software, and you are welcome to redistribute it
+    under certain conditions; type `show c' for details.
+
+The hypothetical commands `show w' and `show c' should show the appropriate
+parts of the General Public License.  Of course, your program's commands
+might be different; for a GUI interface, you would use an "about box".
+
+  You should also get your employer (if you work as a programmer) or school,
+if any, to sign a "copyright disclaimer" for the program, if necessary.
+For more information on this, and how to apply and follow the GNU GPL, see
+<http://www.gnu.org/licenses/>.
+
+  The GNU General Public License does not permit incorporating your program
+into proprietary programs.  If your program is a subroutine library, you
+may consider it more useful to permit linking proprietary applications with
+the library.  If this is what you want to do, use the GNU Lesser General
+Public License instead of this License.  But first, please read
+<http://www.gnu.org/philosophy/why-not-lgpl.html>.
diff --git a/MD5 b/MD5
index 9105fff..e9c3114 100644
--- a/MD5
+++ b/MD5
@@ -1,150 +1,191 @@
-8e843c4734d52481552fc5112dcdec70 *DESCRIPTION
-2a95b4283f27fb6f510d766d001e151a *NAMESPACE
+8c3e3e59b79bc6ff0826e6cb7fd46526 *DESCRIPTION
+d32239bcb673463ab874e80d47fae504 *LICENSE
+5feca43f299bbbd001563b1033a59705 *NAMESPACE
+33a8be67e6ca2155dbb2b451af19d7db *NEWS.md
 4377e4b917c44366f66862324c8fd32a *R/AllS4.R
-5ac1a83b6c6ac8bff444dfa32d44fb27 *R/IDateTime.R
-958328fd685ccbbf30845c21dfab48c9 *R/between.R
-fd94efc958aa281dbf916ec647d50241 *R/bmerge.R
+3d35eb16da4271f72a9118200db76d65 *R/IDateTime.R
+55b63a5d831071ec745ce7bd4ab4a336 *R/as.data.table.R
+ad10acb3d99aab6ccf9dc37bb836d817 *R/between.R
+9052f694cdd5de844433349a91f3bc7a *R/bmerge.R
 0041b7a70824ed486983cf52ff902aaa *R/c.factor.R
 b28ccef4d6291066b742fd3a3a5f5d39 *R/cedta.R
-b8be0229f2bc451f00a99973c2333466 *R/data.table.R
-081129adbf5a24e7ec8c4466017c90c0 *R/duplicated.R
-15fc0c4d82824f3e4f58222973c87ac8 *R/fcast.R
-e646383ee56798334d95549f835b5ad4 *R/fmelt.R
-5d423874e740e04bf204d90386413e2b *R/foverlaps.R
-ec044b964fdeaf668e3bcf288715b097 *R/fread.R
+3c6276faf6ce4d69984c537b4ec21f63 *R/data.table.R
+4e672da0bdc96542f0713ba1da6bde69 *R/duplicated.R
+8fdd31e066db641fe2da803bd2e20268 *R/fcast.R
+f758580ab081803256534bd9ee22e250 *R/fmelt.R
+8b4c18a8d933cc3a92dab206aec20e60 *R/foverlaps.R
+ff31822b594f68c126528ff50f21897a *R/frank.R
+9c97a879ed5969be61b1d505abc40b96 *R/fread.R
+6d9b80bfa81f1cf0a7e204db48f0f5b4 *R/fwrite.R
 50f35848d63ae310c3b4a448bb170097 *R/getdots.R
-05efe8da384ddfcaf20be877f38f3cc4 *R/last.R
+9e49976bdb1d7b41a755a588fd9eb05b *R/last.R
 a9a8ec624e5afd3067d8ea9cc33482f6 *R/like.R
-dc83686aa80cca46637eb41e3fa883a6 *R/merge.R
-38831dc98211508cceafd5c17e83c82d *R/onAttach.R
-797a7cbea6b37489bb15fd46bd64b907 *R/onLoad.R
-db58675c3a18162334749ed51cddbba0 *R/setkey.R
-b270fb622c3463b4c425aaf7886e1d57 *R/setops.R
+8cace748741791eac2d81501c02a1488 *R/merge.R
+5a8e73bd84c6460446b9e43c2966e5f7 *R/onAttach.R
+e1b5de66ef3e976962065e029655f709 *R/onLoad.R
+a00ff6edb3599fcf2801acfc12c4a7e4 *R/openmp-utils.R
+697ce1fba7cf18681bb3f58076bd32cc *R/setkey.R
+b8d3020884ea456d3eeeecf5b06323ef *R/setops.R
 b9a144d533820eb7529e86a9b794e661 *R/shift.R
 1432e4ed7a2a3d6be052527b9005cd8d *R/tables.R
-a67909fc208084de93587c7f522c4170 *R/test.data.table.R
-f84b342bd4a2df21900a7a596cc122d3 *R/timetaken.R
-9a22293f18bc61f6b9700af628f52805 *R/transpose.R
+f8b4fc4d8738808c6a8540e290324a83 *R/test.data.table.R
+20aa38be566f1dc0f21a8937944481c5 *R/timetaken.R
+5ced66446db69ebb73eba28399336922 *R/transpose.R
 085cbece46eb9e4689189ac6cda84457 *R/uniqlist.R
 fda476f057d8bc9cdd88f97265d4d162 *R/utils.R
-8442ff0ef5733397a269e55a8cc112b6 *R/xts.R
-bc4abbeed288ce6d57a44d6210423e37 *README.md
-e294747369bd8c73b82e00d1a5815338 *build/vignette.rds
-27fe915ab9416bed025b0c6ee6e08fd1 *inst/doc/datatable-faq.R
-3d1eba5e5fe861c334d3565e9f9434b7 *inst/doc/datatable-faq.Rnw
-5b025594325ca4e158439bc59be1d266 *inst/doc/datatable-faq.pdf
-8912048bada02f351568c2c73e649ea3 *inst/doc/datatable-intro-vignette.R
-37698b8d9052ceb106e193811c939ba4 *inst/doc/datatable-intro-vignette.Rmd
-eac3f7aeb92c41ca790e2f9b6eb679d5 *inst/doc/datatable-intro-vignette.html
-b068b6c608e89a5333d9e703cc9858c6 *inst/doc/datatable-intro.R
-6b144692290e61a73178e08a764624e6 *inst/doc/datatable-intro.Rnw
-328bb6ef5785404937c58f39fcfa8038 *inst/doc/datatable-intro.pdf
-a2a39ab786d0bffe7e906c058b75339a *inst/doc/datatable-keys-fast-subset.R
-b9c16a06bba12bf90f688afffd8810d2 *inst/doc/datatable-keys-fast-subset.Rmd
-bc0ece209d71cc6eb564d128bb4a7b29 *inst/doc/datatable-keys-fast-subset.html
-3db3d108a8e2edc96abc57c9f849106a *inst/doc/datatable-reference-semantics.R
-6082a466c5f34d47b545112a6e321c76 *inst/doc/datatable-reference-semantics.Rmd
-51440adef139a188edc1591b259cff78 *inst/doc/datatable-reference-semantics.html
-ec78d67af348595bfa6c89a69f7623cd *inst/doc/datatable-reshape.R
-056b4a2e8142bea0e9fddb0a602fabf1 *inst/doc/datatable-reshape.Rmd
-0a93b9cc121a8819a5c42f68305568dc *inst/doc/datatable-reshape.html
+8ff82535188f42a368314fddf610483a *R/xts.R
+dc6551c1e43c07a784519699e51586ae *README.md
+52d0eaab5376b90d6f2e7e20e90e0e2a *build/vignette.rds
+d20a5d50c2a2fae35659da9de05acea3 *inst/doc/datatable-faq.R
+0b13b6d61aa41e7908a871653f63c755 *inst/doc/datatable-faq.Rmd
+929d8d437e0e91f721a958de1f7f4e17 *inst/doc/datatable-faq.html
+ea6aff9f9dc4f5f26a8cee86ba0e8393 *inst/doc/datatable-intro.R
+e9fff1a46fdf96e3572b583bc89e8f86 *inst/doc/datatable-intro.Rmd
+ff2c63b3c8aa6b7b0f8a9069ce937e94 *inst/doc/datatable-intro.html
+c2f4d1dc6234576bf0c1f071325d5b1d *inst/doc/datatable-keys-fast-subset.R
+3f2980389baaff06c2d6b401b26d71bf *inst/doc/datatable-keys-fast-subset.Rmd
+937a425071ae4728cc6fada95ef75ecf *inst/doc/datatable-keys-fast-subset.html
+723df81331669d44c4cab1f541a3d956 *inst/doc/datatable-reference-semantics.R
+531acab6260b82f65ab9048aee6fb331 *inst/doc/datatable-reference-semantics.Rmd
+80616e790d763e60b5592054f6976207 *inst/doc/datatable-reference-semantics.html
+7149288c1c45ff4e6dc0c89b71f81f72 *inst/doc/datatable-reshape.R
+e8ef65c1d8424e390059b854cb18740e *inst/doc/datatable-reshape.Rmd
+eabf46e6fa6f72a90fb29ba97d7af33a *inst/doc/datatable-reshape.html
+22265ade65535db347b44213d4354772 *inst/doc/datatable-secondary-indices-and-auto-indexing.R
+bcdc8c1716a1e3aa1ef831bad0d67715 *inst/doc/datatable-secondary-indices-and-auto-indexing.Rmd
+4ddddfdef09aeec8d1be3a2b043357a8 *inst/doc/datatable-secondary-indices-and-auto-indexing.html
 e48efd4babf364e97ff98e56b1980c8b *inst/tests/1206FUT.txt
+28b57d31f67353c1192c6f65d69a12b1 *inst/tests/1680-fread-header-encoding.csv
 fe198c1178f7db508ee0b10a94272e7e *inst/tests/2008head.csv
+188e619e8c7907a3c8c813fe3a9804a4 *inst/tests/530_fread.txt
+333dc623ece824311884d26fc4b6bddc *inst/tests/536_fread_fill_1.txt
+fa6e982f8396ad10c39664cc8aa899d3 *inst/tests/536_fread_fill_2.txt
+e1303b79f18f51731c441ae2429d30d3 *inst/tests/536_fread_fill_3_extreme.txt
+b3f7faec468f4c541b64a3a624a4ca83 *inst/tests/536_fread_fill_4.txt
 5a4de682f7af427399d4696bc4c417ea *inst/tests/ch11b.dat
 a40ed028d505ec258bec85d3abf92233 *inst/tests/doublequote_newline.csv
+85e0bec6247563bc77fb1afd7ef17196 *inst/tests/fread_blank.txt
+21fcda1bb80e43725988e8bb5b604304 *inst/tests/fread_blank2.txt
+4b96566f22c2b10f4ad22c4b2eb33b04 *inst/tests/fread_blank3.txt
 5cceac3d95d91f6a61b3bd02d76eafd6 *inst/tests/fread_line_error.csv
+5b4a9a12e3db1ba05141ee64f7d5569a *inst/tests/issue_1087_utf8_bom.csv
 1d82109108f1f6a50fe3a8aeb2b9181e *inst/tests/issue_1095_fread.txt
 9c9c5128264c22af43e40cc6892f6d7f *inst/tests/issue_1113_fread.txt
+5050323a6dfb67b7e13d6fe348eeef61 *inst/tests/issue_1116_fread_few_lines.txt
+3f060b3db47db887f5c01ee76617d12d *inst/tests/issue_1116_fread_few_lines_2.txt
+cb50f9289b794276d7d9840c0663fd16 *inst/tests/issue_1164_json.txt
 cf3f75b23f23f13c5187435614bffe8d *inst/tests/issue_1330_fread.txt
+b47c3cb41cbb4c3efdcfb60491b9cb3e *inst/tests/issue_1462_fread_quotes.txt
+c0569cfe2b4ab73e9e812456f2cdcbd5 *inst/tests/issue_1573_fill.txt
 ffbbad4e579a76851c1f3c727fd9e7df *inst/tests/issue_563_fread.txt
 867ec4fca4333c6210bafdd76ac9a33d *inst/tests/issue_773_fread.txt
 d5f1a4914ee94df080c787a3dcc530e3 *inst/tests/issue_785_fread.txt
+2d7daf2b94bd0501a3246755e29a15a8 *inst/tests/melt-warning-1752.tsv
 0738c8cabf507aecd5b004fbbc45c2b4 *inst/tests/quoted_multiline.csv
 278198d16e23ea87b8be7822d0af26e3 *inst/tests/russellCRCRLF.csv
 2d8e8b64b59f727ae5bf5c527948be6a *inst/tests/russellCRLF.csv
-6e8ff5d0e3ffe0bff606a006dcbca24e *inst/tests/test-S4.R
-e1e2c0d73b02da060b444a29898dd2fe *inst/tests/test-data.frame-like.R
-281554bc5fefd8b2db324bef55aa89e4 *inst/tests/tests.Rraw
-6847dd1eefaf386476a48cd155758095 *man/IDateTime.Rd
-885a381b23c36597a6242fbe0830f120 *man/J.Rd
+8befb61685660b818dfe830dba387af5 *inst/tests/tests.Rraw
+9f0dd826cb0518ead716be2e932d4def *man/IDateTime.Rd
+9cbcd078347c5a08402df1ba73aa44b5 *man/J.Rd
 835019f8d7596f0e2f6482b725b10cbf *man/address.Rd
-d4fcf1b33dca213ea50077c7dbffb710 *man/all.equal.data.table.Rd
+cba1f36488291b0fbfa133edf0b061c7 *man/all.equal.data.table.Rd
+fcee579c860188bed47a04336af9f9c0 *man/as.data.table.Rd
 ea442e905301b4305b5c0cdc5a7aa117 *man/as.data.table.xts.Rd
-a05cd8548f8737e8a23070e8440fbc65 *man/as.xts.data.table.Rd
-c8c579c8220ed0bba6b04d1a2b553be6 *man/assign.Rd
-98296b151934142b1031fc44a4f2a066 *man/between.Rd
+313e6cb9f227a39ff91a73b2a73f1674 *man/as.xts.data.table.Rd
+ba628c186b54d45d15bc4d6276edf731 *man/assign.Rd
+262b9b6611f9347776ec21b60cc654ad *man/between.Rd
 e22276a8d899dc4ffb5f24cdc7d5a6ac *man/chmatch.Rd
 3beb57267468fd528548ce3210506bf2 *man/copy.Rd
 7f8f525be80f158964c83fe5b6b43d78 *man/data.table-class.Rd
-ed5e1db0637770d356ec4d7080f80a63 *man/data.table.Rd
-5815c515c3456cfa70e7b0b8a3963a95 *man/dcast.data.table.Rd
-505ac8a7bc10c1a415c4cfd556ead0de *man/duplicated.Rd
-3f6abe22d901c839f663788eee7702b3 *man/foverlaps.Rd
+814bfbf9ec9e3f7d9fc214b3d337a87b *man/data.table.Rd
+6a5dcb5da93a2cd9be8edb77b0aab182 *man/datatable-optimize.Rd
+0f95b7876a3e66740fce419eba57ba46 *man/dcast.data.table.Rd
+75ef60fdf0a53a3f70eacfa67a094933 *man/duplicated.Rd
+c313e9f95feafc04c75e7526105c9d8c *man/first.Rd
+d073c3f067c5113a93cc91487b8b8d13 *man/foverlaps.Rd
 14b89aaa05e22182e1eb28a82c125088 *man/frank.Rd
-b1c0ea006b21afcee590e492e4e5b680 *man/fread.Rd
-ea3e08dc755c70b665278c1f31171183 *man/last.Rd
-ab3e201824e29c2ab52794799229010b *man/like.Rd
-38fc2b2ca30da968c0e3b7a2b032320d *man/melt.data.table.Rd
-50b633ef40c94e04abbdb3263429e5d8 *man/merge.Rd
+f0581f51c048bb08d7ea87a4b19cf62f *man/fread.Rd
+f3ad1b3fee28abbb0010cf8e7ca526d2 *man/fsort.Rd
+d151acecbb8e4045eb7b5e47910797fd *man/fwrite.Rd
+da2b630f6e90002b4e3c820185874d9a *man/last.Rd
+eef491c35003ceefb9c522d1c53a3386 *man/like.Rd
+330e05945e94fedd232564509fda1d3a *man/melt.data.table.Rd
+d5adc5244fd194110052370718a73be0 *man/merge.Rd
 5e9030a66033b8b48dde7b7c03ae0ccc *man/na.omit.data.table.Rd
-6161a4eb18f3f06c5a97567f413e9fb9 *man/patterns.Rd
-6589d06cbe6e7ad66ed4e4092904305c *man/rbindlist.Rd
-31eed9903c9d172dab03fcbfcf6ea6be *man/rleid.Rd
-c31e001ba49e213bdfee2fddc5ba77d2 *man/setDF.Rd
-cce1869968f44fab0ae16df0e90f75e8 *man/setDT.Rd
-85538c401f78bdf46130489535de043f *man/setNumericRounding.Rd
-ecf21c61d503815304eb7fe456289f1e *man/setattr.Rd
+83946073081506038b2752b900091a69 *man/openmp-utils.Rd
+e9a59ea5fd128298fdeaf90d1cbe4beb *man/patterns.Rd
+f84fac7af46eb7cb1639261c2091baf3 *man/print.data.table.Rd
+74e6ffcb1c8327e25add14753118c135 *man/rbindlist.Rd
+71879f9deee6095c8571d39e57c6d97a *man/rleid.Rd
+cfbc90d2e7881f129fcd37bcf4e1c535 *man/rowid.Rd
+6dbff1fdadffedcf32bba700585c58a5 *man/setDF.Rd
+cfd8e9966df5c3f856c80d2eac2ce225 *man/setDT.Rd
+5dd46328a97acd449b00b72ade0f7e08 *man/setNumericRounding.Rd
+e1f96224db1469216c98abe8ac01ceaa *man/setattr.Rd
 5b9d6ef8696a28a89e21fb1db01fa0c3 *man/setcolorder.Rd
-a5c8a59aea1511d39052343b5edb664d *man/setkey.Rd
-4bb592a3d24b42a0496357fc4636bcb4 *man/setorder.Rd
-2ea95eb4205b02d4915336ad77b89fb1 *man/shift.Rd
+856e0aaec6aa12d657211ad2cf8cec7c *man/setkey.Rd
+2b077a22ecf6b9ff2aa289f571531f3b *man/setops.Rd
+f55745cbb4d5d4e72f89c7dbf4813d27 *man/setorder.Rd
+3643b5e199ac96ca4223f53d7fbf1a8f *man/shift.Rd
+189c9f58e2b01fc06d692c6a4f9e3eb4 *man/shouldPrint.Rd
+5b1ffc2688c7cdca88a81d2896ba57f1 *man/special-symbols.Rd
+3517543bfedccafc877eda9492cb1b6d *man/split.Rd
 af0454fe22cc4149ef29e126cff33135 *man/subset.data.table.Rd
 35a25fe3239f5105199d82f1caf731fb *man/tables.Rd
-64170aad1226f92a8be817cfb9bdca4e *man/test.data.table.Rd
+4de072f81600dc10269ade854bfdaa54 *man/test.data.table.Rd
 1bdb8df53d3e596955b6ed4c050aee7a *man/timetaken.Rd
 92f617e2f245dd1af07215a59d98be10 *man/transform.data.table.Rd
 a2a4eb530691106fb68f4851182b66fb *man/transpose.Rd
-e09361ebcd90900aa2b1371e602ccb0d *man/truelength.Rd
-d8a91751bc39321203576a2b5ca5baf7 *man/tstrsplit.Rd
-15a12eef17e9186ef30d7acc88dac93b *src/Makevars
-21fc0a7f04b4af696a152d335227f668 *src/assign.c
-1688aac66ee8cc427b29dbb81bfd4130 *src/bmerge.c
+4d71404f8c7d5b3a9aa93195c22e8f97 *man/truelength.Rd
+142dc7a616c9c546ffb9f0bea81cc2d7 *man/tstrsplit.Rd
+7140c94fe69f182d6c111a8d40795e4f *src/Makevars
+ddf75bd43a58be15256ee2ea99e8c2e1 *src/assign.c
+059a1ee027698e97d5757cdc6493ba2d *src/between.c
+5668430c2b9d513b544cfa6c9d93b664 *src/bmerge.c
 771f833144f4bcd530d36944c9ece29f *src/chmatch.c
-6adf9a9731b360a25362df22d69e0bdb *src/data.table.h
-412d4d4088cb8c91e35828c2d225dd52 *src/dogroups.c
+8c6af4083df6e5193d6c18e2f2c679b4 *src/data.table.h
+659f0e6641f909726724915818a6a2f2 *src/dogroups.c
 e3d5b289ef4b511e4f89c9838b86cbd0 *src/fastmean.c
-6c365b327989acaea94103075ec7bd2f *src/fastradixdouble.c
-5d0213fd2b7c2f916b558f631a19de71 *src/fastradixint.c
-5efb4470c5f95c51132964f6678da9bf *src/fcast.c
-c0f7d576b3914b54797878ec8f421522 *src/fmelt.c
-e7d0296adc515500c439f111d581150c *src/forder.c
-fcd7100afa206eea6fc64af21f7f47f1 *src/frank.c
-4aa4ba82f5207091091c6cb940a29c99 *src/fread.c
-8038e32e74b05c45a0f68ede4ef86e9e *src/gsumm.c
+b14b7f112e32f64e2389b4f6176783c6 *src/fcast.c
+15f33d9fd7a31c7f288019c0d667f003 *src/fmelt.c
+9d4344ed7c33cc5e79a91c262f5af38e *src/forder.c
+4cace1380fb006a5c786e9a7f4d12937 *src/frank.c
+f249407a5e5b24e1d1e9e5e29d41895c *src/fread.c
+8cc10403f23c6e018f26b0220b509a86 *src/fsort.c
+4eb633083629e9f2df7431000100418a *src/fwrite.c
+237a455e5212cf3358ab5aaca12fbd9c *src/fwriteLookups.h
+4ddc11c18a0ef9ca3153c64ba9ef9f68 *src/gsumm.c
 47792eafb3cee1c03bbcb972d00c4aad *src/ijoin.c
-17a5ea8d81686098067123620f12aa67 *src/init.c
-a9f5e09254925500d8ee16bb56a6351d *src/rbindlist.c
-091c9c0e14a7d55debb0b15fac8b82e4 *src/reorder.c
-4eba835decab079e08080a910bd3ab09 *src/shift.c
-8e923a23bf90010e0daddd91b715e313 *src/transpose.c
-fc58eacd0513bb60e72dc601d46157f7 *src/uniqlist.c
+c286736030104e7222bf17b7b1d91a9f *src/init.c
+520938944d8dbd58460bcf4ca44e9479 *src/inrange.c
+a1237adc2f1ced2e4e9e3724a9434211 *src/openmp-utils.c
+ab561ed83137b5b2d78d5d06030f7446 *src/quickselect.c
+9075a14cce153c75374d798fb1ea4bde *src/rbindlist.c
+416562e57a9368398d026ec1edc96313 *src/reorder.c
+43566a73264aab49b4f4fb9ffcf77c0b *src/shift.c
+422ef9d84f83359beb206ffe087ef89c *src/subset.c
+53304fe0233e11f15786cdbddf6c89f8 *src/transpose.c
+cfe85d1626ba795d8edc0f098c1b7f12 *src/uniqlist.c
 75a359ce5e8c6927d46dd9b2aa169da1 *src/vecseq.c
-566dad18bf6dd9e6ffe38bc54d1ac867 *src/wrappers.c
-39d85ffda439359be47c1d38ba0a4eae *tests/autoprint.R
-f93d8466f52d75c4c1bcf8beff0346a4 *tests/autoprint.Rout.save
-49eb934f742dc72ad5dfa2ce55948e17 *tests/knitr.R
+edb7d033ad8f381276ef9600ec9ff8da *src/wrappers.c
+441a393fe285e88a86e1126af8d6d7d8 *tests/autoprint.R
+1ad409241d679d234e1a56ef06507e64 *tests/autoprint.Rout.save
+5b9fc0d7c7ea64a9b1f60f9eba66327e *tests/knitr.R
 a131697aa09fdb9c30febf299ff78afa *tests/knitr.Rmd
-e0fb6f61a9efe8a41f89ee9ad6d7189b *tests/knitr.Rout.save
-105cff3529e59e908e3e655e6ea1f6b3 *tests/test-all.R
-cf45f2117922e04a24da6add785480db *tests/tests.R
-ea0d0c33a6b3ea87ecd51b89b437653a *vignettes/Makefile
+c516b2732db5e72e708571b41bdb2bf6 *tests/knitr.Rout.mock
+ab5ea160db13987244b76beeff3be74f *tests/knitr.Rout.save
+d4e7268543efee032a82d9bde312b34c *tests/main.R
+531d607a1e10b7eb8a48b96d576ee22f *tests/testthat.R
+6e8ff5d0e3ffe0bff606a006dcbca24e *tests/testthat/test-S4.R
+19d2c1a56e20560c13418539c56fbe9a *tests/testthat/test-data.frame-like.R
+4304c919f6e28ea84d1ca05708ccaae8 *vignettes/Makefile
 5dcf8be4b810d38fc5d4d0817167b079 *vignettes/css/bootstrap.css
-3d1eba5e5fe861c334d3565e9f9434b7 *vignettes/datatable-faq.Rnw
-37698b8d9052ceb106e193811c939ba4 *vignettes/datatable-intro-vignette.Rmd
-6b144692290e61a73178e08a764624e6 *vignettes/datatable-intro.Rnw
-b9c16a06bba12bf90f688afffd8810d2 *vignettes/datatable-keys-fast-subset.Rmd
-6082a466c5f34d47b545112a6e321c76 *vignettes/datatable-reference-semantics.Rmd
-056b4a2e8142bea0e9fddb0a602fabf1 *vignettes/datatable-reshape.Rmd
+0b13b6d61aa41e7908a871653f63c755 *vignettes/datatable-faq.Rmd
+e9fff1a46fdf96e3572b583bc89e8f86 *vignettes/datatable-intro.Rmd
+3f2980389baaff06c2d6b401b26d71bf *vignettes/datatable-keys-fast-subset.Rmd
+531acab6260b82f65ab9048aee6fb331 *vignettes/datatable-reference-semantics.Rmd
+e8ef65c1d8424e390059b854cb18740e *vignettes/datatable-reshape.Rmd
+bcdc8c1716a1e3aa1ef831bad0d67715 *vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
 47c3d35a8fa9fed8ed988bff69f70d3b *vignettes/flights14.csv
 f84166315a27e89bf88bb54f3b7e454b *vignettes/melt_default.csv
 30d44359d84026ba2b7140b0ff33abaa *vignettes/melt_enhanced.csv
diff --git a/NAMESPACE b/NAMESPACE
index f8fd373..00d56f6 100644
--- a/NAMESPACE
+++ b/NAMESPACE
@@ -6,8 +6,8 @@ exportClasses(data.table, IDate, ITime)
 ##
 
 export(data.table, tables, setkey, setkeyv, key, "key<-", haskey, CJ, SJ, copy)
-export(set2key, set2keyv, key2)
-export(as.data.table,is.data.table,test.data.table,last,like,"%like%",between,"%between%")
+export(set2key, set2keyv, key2, setindex, setindexv, indices)
+export(as.data.table,is.data.table,test.data.table,last,first,like,"%like%",between,"%between%",inrange,"%inrange%")
 export(timetaken)
 export(truelength, alloc.col, ":=")
 export(setattr, setnames, setcolorder, set, setDT, setDF)
@@ -16,6 +16,7 @@ export(setNumericRounding, getNumericRounding)
 export(chmatch, "%chin%", chorder, chgroup)
 export(rbindlist)
 export(fread)
+export(fwrite)
 export(foverlaps)
 export(shift)
 export(transpose)
@@ -26,8 +27,19 @@ export(address)
 export(.SD,.N,.I,.GRP,.BY,.EACHI)
 export(rleid)
 export(rleidv)
+export(rowid)
+export(rowidv)
 export(as.xts.data.table)
 export(uniqueN)
+export(setDTthreads, getDTthreads)
+# set operators
+export(fintersect)
+export(fsetdiff)
+export(funion)
+export(fsetequal)
+S3method(all.equal, data.table)
+export(shouldPrint)
+export(fsort)  # experimental parallel sort for vector type double only, currently
 
 S3method("[", data.table)
 S3method("[<-", data.table)
@@ -46,6 +58,7 @@ S3method(as.data.table, logical)
 S3method(as.data.table, factor)
 S3method(as.data.table, ordered)
 S3method(as.data.table, Date)
+S3method(as.data.table, ITime)
 S3method(as.data.table, table)
 S3method(as.data.table, xts)
 S3method(as.data.table, default)
@@ -55,6 +68,7 @@ S3method(as.matrix, data.table)
 #S3method(cbind, data.table)
 #S3method(rbind, data.table)
 export(.rbind.data.table)
+S3method(split, data.table)
 S3method(dim, data.table)
 S3method(dimnames, data.table)
 S3method("dimnames<-", data.table)
@@ -68,7 +82,6 @@ S3method(within, data.table)
 S3method(is.na, data.table)
 S3method(format, data.table)
 S3method(Ops, data.table)
-S3method(all.equal, data.table)
 
 S3method(anyDuplicated, data.table)
 
@@ -87,16 +100,16 @@ S3method(na.omit, data.table)
 
 # IDateTime support:
 export(as.IDate,as.ITime,IDateTime)
-export(hour,yday,wday,mday,week,month,quarter,year)
+export(second,minute,hour,yday,wday,mday,week,isoweek,month,quarter,year)
 
-importFrom(chron, chron, as.chron)
-export(as.chron.IDate,as.chron.ITime)
+export(as.chron.IDate, as.chron.ITime)
+export(as.Date.IDate) # workaround for zoo bug, see #1500
 
 S3method("[", ITime)
+S3method("+", IDate)
+S3method("-", IDate)
 S3method(as.character, ITime)
 S3method(as.data.frame, ITime)
-## S3method(as.chron, IDate)
-## S3method(as.chron, ITime)
 S3method(as.Date, IDate)
 S3method(as.IDate, Date)
 S3method(as.IDate, default)
diff --git a/README.md b/NEWS.md
similarity index 79%
copy from README.md
copy to NEWS.md
index 72a00bb..5e8cf64 100644
--- a/README.md
+++ b/NEWS.md
@@ -1,15 +1,382 @@
 
-**Current stable release** (always even) : [v1.9.6 on CRAN](http://cran.r-project.org/package=data.table), released 19<sup>th</sup> Sep 2015.  
-**Development version** (always odd): [v1.9.7 on GitHub](https://github.com/Rdatatable/data.table/) [![Build Status](https://travis-ci.org/Rdatatable/data.table.svg?branch=master)](https://travis-ci.org/Rdatatable/data.table) [![codecov.io](http://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](http://codecov.io/github/Rdatatable/data.table?branch=master)
- [How to install?](https://github.com/Rdatatable/data.table/wiki/Installation)
+### Changes in v1.10.0  (on CRAN 3 Dec 2016)
 
-<!-- Note this file is displayed on the CRAN page, as well as on GitHub. So the the link to GitHub is not to itself when viewed on the CRAN page. -->
+#### BUG FIXES
 
-**Introduction, installation, documentation, benchmarks etc:** [HOMEPAGE](https://github.com/Rdatatable/data.table/wiki)
+1. `fwrite(..., quote='auto')` already quoted a field if it contained a `sep` or `\n`, or `sep2[2]` when `list` columns are present. Now it also quotes a field if it contains a double quote (`"`) as documented, [#1925](https://github.com/Rdatatable/data.table/issues/1925). Thanks to Aki Matsuo for reporting. Tests added. The `qmethod` tests did test escaping embedded double quotes, but only when `sep` or `\n` was present in the field as well to trigger the quoting of the field.
 
-**Guidelines for filing issues / pull requests:** Please see the project's [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/Contributing.md).
+2. Fixed 3 test failures on Solaris only, [#1934](https://github.com/Rdatatable/data.table/issues/1934). Two were on both sparc and x86 and related to a `tzone` attribute difference between `as.POSIXct` and `as.POSIXlt` even when passed the default `tz=""`. The third was on sparc only: a minor rounding issue in `fwrite()` of 1e-305.
+
+3. Regression crash fixed when 0's occur at the end of a non-empty subset of an empty table, [#1937](https://github.com/Rdatatable/data.table/issues/1937). Thanks Arun for tracking down. Tests added. For example, subsetting the empty `DT=data.table(a=character())` with `DT[c(1,0)]` should return a 1 row result with one `NA` since 1 is past the end of `nrow(DT)==0`, the same result as `DT[1]`.
+
+4. Fixed newly reported crash that also occurred in old v1.9.6 when `by=.EACHI`, `nomatch=0`, the first item in `i` has no match AND `j` has a function call that is passed a key column, [#1933](https://github.com/Rdatatable/data.table/issues/1933). Many thanks to Reino Bruner for finding and reporting with a reproducible example. Tests added.
+
+5. Fixed `fread()` error occurring for a subset of Windows users: `showProgress is not type integer but type 'logical'.`, [#1944](https://github.com/Rdatatable/data.table/issues/1944) and [#1111](https://github.com/Rdatatable/data.table/issues/1111). Our tests cover this usage (it is just default usage), pass on AppVeyor (Windows), win-builder (Windows) and CRAN's Windows so perhaps it only occurs on a specific and different version of Windows to all those. Thanks to @demydd for reportin [...]
+
+6. Combining `on=` (new in v1.9.6) with `by=` or `keyby=` gave incorrect results, [#1943](https://github.com/Rdatatable/data.table/issues/1943). Many thanks to Henrik-P for the detailed and reproducible report. Tests added.
+
+7. New function `rleidv` was ignoring its `cols` argument, [#1942](https://github.com/Rdatatable/data.table/issues/1942). Thanks Josh O'Brien for reporting. Tests added.
+
+#### NOTES
+
+1. It seems OpenMP is not available on CRAN's Mac platform; NOTEs appeared in [CRAN checks](https://cran.r-project.org/web/checks/check_results_data.table.html) for v1.9.8. Moved `Rprintf` from `init.c` to `packageStartupMessage` to avoid the NOTE as requested urgently by Professor Ripley. Also fixed the bad grammar of the message: 'single threaded' now 'single-threaded'. If you have a Mac and run macOS or OS X on it (I run Ubuntu on mine) please contact CRAN maintainers and/or Apple if  [...]
+
+2. Just to state explicitly: data.table does not now depend on or require OpenMP. If you don't have it (as on CRAN's Mac it appears but not in general on Mac) then data.table should build, run and pass all tests just fine.
+
+3. There are now 5,910 raw tests as reported by `test.data.table()`. Tests cover 91% of the 4k lines of R and 89% of the 7k lines of C. These stats are now known thanks to Jim Hester's [Covr](https://CRAN.R-project.org/package=covr) package and [Codecov.io](https://codecov.io/). If anyone is looking for something to help with, creating tests to hit the missed lines shown by clicking the `R` and `src` folders at the bottom [here](https://codecov.io/github/Rdatatable/data.table?branch=mast [...]
+
+4. The FAQ vignette has been revised given the changes in v1.9.8. In particular, the very first FAQ.
+
+5. With hindsight, the last release v1.9.8 should have been named v1.10.0 to convey it wasn't just a patch release from .6 to .8, owing to the 'potentially breaking changes' items. Thanks to @neomantic for correctly pointing this out. The best we can do now is to bump to v1.10.0.
+
+
+### Changes in v1.9.8  (on CRAN 25 Nov 2016)
+
+#### POTENTIALLY BREAKING CHANGES
+
+  1. By default all columns are now used by `unique()`, `duplicated()` and `uniqueN()` data.table methods, [#1284](https://github.com/Rdatatable/data.table/issues/1284) and [#1841](https://github.com/Rdatatable/data.table/issues/1841). To restore old behaviour: `options(datatable.old.unique.by.key=TRUE)`. In 1 year this option to restore the old default will be deprecated with warning. In 2 years the option will be removed. Please explicitly pass `by=key(DT)` for clarity. Only code that  [...]
+
+  2. A new column is guaranteed with `:=` even when there are no matches or when its RHS is length 0 (e.g. `integer()`, `numeric()`) but not `NULL`. The NA column is created with the same type as the empty RHS. This is for consistency so that whether a new column is added or not does not depend on whether `i` matched to 1 or more rows or not. See [#759](https://github.com/Rdatatable/data.table/issues/759) for further details and examples.
+
+  3. When `j` contains no unquoted variable names (whether column names or not), `with=` is now automatically set to `FALSE`. Thus, `DT[,1]`, `DT[,"someCol"]`, `DT[,c("colA","colB")]` and `DT[,100:109]` now work as we all expect them to; i.e., returning columns, [#1188](https://github.com/Rdatatable/data.table/issues/1188), [#1149](https://github.com/Rdatatable/data.table/issues/1149). Since there are no variable names there is no ambiguity as to what was intended. `DT[,colName1:colName2 [...]
+
+#### NEW FEATURES
+
+  1. `fwrite()` - parallel .csv writer:
+    * Thanks to Otto Seiskari for the initial pull request [#580](https://github.com/Rdatatable/data.table/issues/580) that provided C code, R wrapper, manual page and extensive tests.
+    * From there Matt parallelized and specialized C functions for writing integer/numeric exactly matching `write.csv` between 2.225074e-308 and 1.797693e+308 to 15 significant figures, dates (between 0000-03-01 and 9999-12-31), times down to microseconds in POSIXct, automatic quoting, `bit64::integer64`, `row.names` and `sep2` for `list` columns where each cell can itself be a vector. See [this blog post](http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/) for implementation details an [...]
+    * Accepts any `list` of same length vectors; e.g. `data.frame` and `data.table`.
+    * Caught in development before release to CRAN: thanks to Francesco Grossetti for [#1725](https://github.com/Rdatatable/data.table/issues/1725) (NA handling), Torsten Betz for [#1847](https://github.com/Rdatatable/data.table/issues/1847) (rounding of 9.999999999999998) and @ambils for [#1903](https://github.com/Rdatatable/data.table/issues/1903) (> 1 million columns).
+    * `fwrite` status was tracked here: [#1664](https://github.com/Rdatatable/data.table/issues/1664)
+
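
    A minimal usage sketch (the table, file name and values here are invented for illustration; assumes data.table >= 1.9.8):

    ```r
    library(data.table)

    DT = data.table(id   = 1:3,
                    when = as.Date("2016-11-25") + 0:2,
                    vals = list(1:2, 3L, 4:6))   # a list column

    tmp = tempfile(fileext = ".csv")
    fwrite(DT, tmp)   # parallel write; list-column cells are collapsed using sep2
    readLines(tmp)    # header plus 3 data rows
    ```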
+  2. `fread()`:
+    * gains `quote` argument. `quote = ""` disables quoting altogether which reads each field *as is*, [#1367](https://github.com/Rdatatable/data.table/issues/1367). Thanks @manimal.
+    * With [#1462](https://github.com/Rdatatable/data.table/issues/1462) fix, quotes are handled slightly better. Thanks  @Pascal for [posting on SO](http://stackoverflow.com/q/34144314/559784).
+    * gains `blank.lines.skip` argument that continues reading by skipping empty lines. Default is `FALSE` for backwards compatibility, [#530](https://github.com/Rdatatable/data.table/issues/530). Thanks @DirkJonker. Also closes [#1575](https://github.com/Rdatatable/data.table/issues/1575).
+    * gains `fill` argument with default `FALSE` for backwards compatibility. Closes [#536](https://github.com/Rdatatable/data.table/issues/536). Also, `fill=TRUE` now prioritises the maximum number of columns instead of the longest run of lines with an identical number of columns, which handles missing columns slightly more robustly, [#1573](https://github.com/Rdatatable/data.table/issues/1573).
+    * gains `key` argument, [#590](https://github.com/Rdatatable/data.table/issues/590).
+    * gains `file` argument which expects existing file on input, to ensure no shell commands will be executed when reading file. Closes [#1702](https://github.com/Rdatatable/data.table/issues/1702).
+    * Column type guessing is improved by testing 100 rows at 10 points rather than 5 rows at 3 points. See point 3 of [convenience features of fread for small data](https://github.com/Rdatatable/data.table/wiki/Convenience-features-of-fread).
+
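
    A sketch of the new arguments (the literal input strings are made up for illustration):

    ```r
    library(data.table)

    # fill=TRUE pads the short last row with NA;
    # blank.lines.skip=TRUE continues reading past the empty line
    fread("a,b\n1,x\n\n2,y\n3", fill = TRUE, blank.lines.skip = TRUE)

    # quote="" disables quote handling and reads each field as is
    fread('a,b\n1,"x"', quote = "")
    ```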
+  3. Joins:
+    * Non-equi (or conditional) joins are now possible using the familiar `on=` syntax. Possible binary operators include `>=`, `>`, `<=`, `<` and `==`. For example, `X[Y, on=.(a, b>b)]` looks for `X.a == Y.a` first and, within those matching rows, for rows where `X.b > Y.b`, [#1452](https://github.com/Rdatatable/data.table/issues/1452).
+    * x's columns can be referred to in `j` using the prefix `x.` at all times. This is particularly useful when it is necessary to refer to one of x's columns that is *also a join column*, [#1615](https://github.com/Rdatatable/data.table/issues/1615). Also closes [#1705](https://github.com/Rdatatable/data.table/issues/1705) (thanks @dbetebenner) and [#1761](https://github.com/Rdatatable/data.table/issues/1761).
+    * `on=.()` syntax is now possible, e.g., `X[Y, on=.(x==a, y==b)]`, [#1257](https://github.com/Rdatatable/data.table/issues/1257). Thanks @dselivanov.
+    * Joins using `on=` accept unnamed columns in ad hoc joins, e.g., `X[.(5), on="b"]` joins "b" from `X` to "V1" from `i`; partly closes [#1375](https://github.com/Rdatatable/data.table/issues/1375).
+    * When joining with `on=`, `X[Y, on=c(A="A", b="c")]` can now be specified as `X[Y, on=c("A", b="c")]`, fully closes [#1375](https://github.com/Rdatatable/data.table/issues/1375).
+    * `on=` joins now provide friendlier error messages when columns aren't found, [#1376](https://github.com/Rdatatable/data.table/issues/1376).
+    * Joins (and binary-search-based subsets) using the `on=` argument now reuse existing (secondary) indices, [#1439](https://github.com/Rdatatable/data.table/issues/1439). Thanks @jangorecki.
+
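
    A small sketch of a non-equi join (table and column names invented for illustration):

    ```r
    library(data.table)

    sales = data.table(item = c("p", "q", "r"), price = c(5, 12, 31))
    bands = data.table(lo = c(0, 10, 30), hi = c(10, 30, 50),
                       band = c("low", "mid", "high"))

    # match each price to the band whose [lo, hi) interval contains it,
    # pulling in the band label from i with the i. prefix
    sales[bands, on = .(price >= lo, price < hi), band := i.band]
    sales   # p -> low, q -> mid, r -> high
    ```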
+  4. `merge.data.table` by default also checks for common key columns between the two `data.table`s before resulting in error when `by` or `by.x, by.y` arguments are not provided, [#1517](https://github.com/Rdatatable/data.table/issues/1517). Thanks @DavidArenburg.
+
+  5. Fast set operations `fsetdiff`, `fintersect`, `funion` and `fsetequal` for data.tables are now implemented, [#547](https://github.com/Rdatatable/data.table/issues/547).
+
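
    For example (values illustrative; these behave like their SQL counterparts on rows):

    ```r
    library(data.table)

    x = data.table(a = c(1L, 2L, 2L, 3L))
    y = data.table(a = c(2L, 3L, 4L))

    funion(x, y)       # unique rows from both, like SQL UNION
    fintersect(x, y)   # rows present in both
    fsetdiff(x, y)     # rows in x that are not in y
    fsetequal(x, x)    # TRUE: same rows regardless of order
    ```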
+  6. Added `setDTthreads()` and `getDTthreads()` to control the threads used in data.table functions that are now parallelized with OpenMP on all architectures including Windows (`fwrite()`, `fsort()` and subsetting). Extra code was required internally to ensure these control data.table only and not other packages using OpenMP. When data.table is used from the parallel package (e.g. `mclapply` as done by 3 CRAN and Bioconductor packages) data.table automatically switches down to one thre [...]
+
+  7. `GForce` (See ?\`datatable-optimize\` for more):
+    * `dt[, .N, by=cols]` is optimised internally as well, [#1251](https://github.com/Rdatatable/data.table/issues/1251).
+    * is now also optimised for `median`. Partly addresses [#523](https://github.com/Rdatatable/data.table/issues/523). Check that issue for benchmarks.
+    * GForce kicks in along with subsets in `i` as well, e.g., `DT[x > 2, mean(y), by=z]`. Partly addresses [#971](https://github.com/Rdatatable/data.table/issues/971).
+    * GForce is optimised for `head(., 1)` and `tail(., 1)`, where `.` is a column name or `.SD`. Partly addresses [#523](https://github.com/Rdatatable/data.table/issues/523). Check the link for benchmarks.
+    * GForce is optimised for length-1 subsets, e.g., `.SD[2]`, `col[2]`. Partly addresses [#523](https://github.com/Rdatatable/data.table/issues/523).
+    * `var`, `sd` and `prod` are all GForce optimised for speed and memory. Partly addresses [#523](https://github.com/Rdatatable/data.table/issues/523). See that post for benchmarks.
+
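
    A sketch of grouped calls that can now run through GForce-optimised C code (data invented; add `verbose=TRUE` to see whether GForce kicked in for a given query):

    ```r
    library(data.table)

    DT = data.table(g = rep(1:2, each = 3), y = c(1, 5, 3, 2, 8, 4))

    DT[, median(y), by = g]       # grouped median, now GForce-optimised
    DT[y > 1, mean(y), by = g]    # GForce also applies with a subset in i
    DT[, head(y, 1), by = g]      # and for head/tail of length 1
    ```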
+  8. Reshaping:
+    * `dcast.data.table` now allows `drop = c(FALSE, TRUE)` and `drop = c(TRUE, FALSE)`. The former fills only all missing combinations of the formula LHS, whereas the latter fills only all missing combinations of the formula RHS. Thanks to Ananda Mahto for [this SO post](http://stackoverflow.com/q/34830908/559784) and to Jaap for filing [#1512](https://github.com/Rdatatable/data.table/issues/1512).
+    * `melt.data.table` finds variables provided to `patterns()` when called from within user defined functions, [#1749](https://github.com/Rdatatable/data.table/issues/1749). Thanks to @kendonB for the report.
+
+  9. We can now refer to the columns that are not mentioned in `.SD` / `.SDcols` in `j` as well. For example, `DT[, .(sum(v1), lapply(.SD, mean)), by=grp, .SDcols=v2:v3]` works as expected, [#495](https://github.com/Rdatatable/data.table/issues/495). Thanks to @MattWeller for report and to others for linking various SO posts to be updated. Also closes [#484](https://github.com/Rdatatable/data.table/issues/484).
+
+  10. New functions `inrange()` and `%inrange%` are exported. They perform a range join making use of the recently implemented *non-equi* joins ([#1452](https://github.com/Rdatatable/data.table/issues/1452)) [#679](https://github.com/Rdatatable/data.table/issues/679). Also thanks to @DavidArenburg for [#1819](https://github.com/Rdatatable/data.table/issues/1819).
+
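
    A quick sketch (values invented; bounds are included by default):

    ```r
    library(data.table)

    x     = c(1, 5, 10)
    lower = c(0, 8)
    upper = c(2, 12)

    # TRUE where x falls inside any of the [lower, upper] ranges
    x %inrange% list(lower, upper)   # c(TRUE, FALSE, TRUE)
    ```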
+  11. `%between%` is vectorised which means we can now do: `DT[x %between% list(y,z)]` where `y` and `z` are vectors, [#534](https://github.com/Rdatatable/data.table/issues/534). Thanks @MicheleCarriero for filing the issue and the idea.
+
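
    For example (columns invented for illustration):

    ```r
    library(data.table)

    DT = data.table(x   = 1:5,
                    low = c(0, 2, 4, 5, 6),
                    up  = c(2, 3, 4, 5, 7))

    DT[x %between% list(low, up)]   # rows where low <= x <= up, elementwise
    ```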
+  12. The most common use case for `between()`, i.e., length-1 `lower` and `upper`, is now implemented in C and parallelised. This results in a ~7-10x speed improvement on vectors of length >= 1e6.
+
+  13. Row subset operations of data.table are now parallelised with OpenMP, [#1660](https://github.com/Rdatatable/data.table/issues/1660). See the linked issue page for a rough benchmark of the speedup.
+
+  14. `tstrsplit` gains argument `names`, [#1379](https://github.com/Rdatatable/data.table/issues/1379). A character vector of column names can be provided as well. Thanks @franknarf1.
+
+  15. `tstrsplit` gains argument `keep` which corresponds to the indices of list elements to return from the transposed list.
+
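
    A sketch of both new arguments together (input strings invented):

    ```r
    library(data.table)

    v = c("id1_2016_x", "id2_2016_y")

    tstrsplit(v, "_", names = c("id", "year", "grp"))            # named list
    tstrsplit(v, "_", keep = c(1L, 3L), names = c("id", "grp"))  # keep 1st and 3rd parts
    ```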
+  16. `rowid()` and `rowidv()` - convenience functions for generating unique row ids within each group - are now implemented. `rowid()` is particularly useful along with `dcast()`. See `?rowid` for more, [#1353](https://github.com/Rdatatable/data.table/issues/1353).
+
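
    For example (data invented; `rid` is a running count within each group):

    ```r
    library(data.table)

    DT = data.table(g = c("a", "a", "b", "b", "b"), v = 1:5)
    DT[, rid := rowid(g)]                  # 1,2,1,2,3
    dcast(DT, g ~ rid, value.var = "v")    # handy for reshaping long to wide
    ```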
+  17. `rleid()` gains `prefix` argument, similar to `rowid()`.
+
+  18. `shift()` understands and operates on list-of-list inputs as well, [#1595](https://github.com/Rdatatable/data.table/issues/1595). Thanks to @enfascination and to @chris for [asking on SO](http://stackoverflow.com/q/38900293/559784).
+
+  19. `uniqueN` gains `na.rm` argument, [#1455](https://github.com/Rdatatable/data.table/issues/1455).
+
+  20. `first()` is now exported to return the first element of vectors, data.frames and data.tables.
+
+  21. New `split.data.table` method. Faster, more flexible and consistent with the data.frame method. Closes [#1389](https://github.com/Rdatatable/data.table/issues/1389). It now also properly preallocates columns, thanks @maverickg for reporting, closes [#1908](https://github.com/Rdatatable/data.table/issues/1908).
+
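
    A sketch of the new `by` interface to `split` (data invented):

    ```r
    library(data.table)

    DT = data.table(g = c("a", "a", "b"), v = 1:3)

    split(DT, by = "g")                   # named list of data.tables, one per group
    split(DT, by = "g", keep.by = FALSE)  # same, dropping the grouping column
    ```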
+  22. `rbindlist` supports columns of type `complex`, [#1659](https://github.com/Rdatatable/data.table/issues/1659).
+
+  23. Added `second` and `minute` extraction functions which, like extant `hour`/`yday`/`week`/etc, always return an integer, [#874](https://github.com/Rdatatable/data.table/issues/874). Also added ISO 8601-consistent weeks in `isoweek`, [#1765](https://github.com/Rdatatable/data.table/issues/1765). Thanks to @bthieurmel and @STATWORX for the FRs and @MichaelChirico for the PRs. 
+
+  24. `setnames` accepts negative indices in `old` argument, [#1443](https://github.com/Rdatatable/data.table/issues/1443). Thanks @richierocks.
+
+  25. `by` understands `colA:colB` syntax now, like `.SDcols` does, [#1395](https://github.com/Rdatatable/data.table/issues/1395). Thanks @franknarf1.
+
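
    For example (columns invented; `b:c` expands to the columns between `b` and `c`):

    ```r
    library(data.table)

    DT = data.table(a = 1:4, b = rep(1:2, 2), c = rep(1:2, each = 2), d = 4:1)

    DT[, sum(d), by = b:c]   # same as by = c("b", "c")
    ```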
+  26. `data.table()` function gains `stringsAsFactors` argument with default `FALSE`, [#643](https://github.com/Rdatatable/data.table/issues/643). Thanks to @jangorecki for reviving this issue.
+  
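
    For example:

    ```r
    library(data.table)

    DT = data.table(x = c("a", "b"), y = 1:2, stringsAsFactors = TRUE)
    class(DT$x)   # "factor"; the default remains FALSE
    ```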
+  27. `print.data.table` now warns when `bit64` package isn't loaded but the `data.table` contains `integer64` columns, [#975](https://github.com/Rdatatable/data.table/issues/975). Thanks to @StephenMcInerney.
+
+  28. New argument `print.class` for `print.data.table` allows for including column class under column names (as inspired by `tbl_df` in `dplyr`); default (adjustable via `"datatable.print.class"` option) is `FALSE`, the inherited behavior. Part of [#1523](https://github.com/Rdatatable/data.table/issues/1523); thanks to @MichaelChirico for the FR & PR.
+  
+  29. `all.equal.data.table` gains `check.attributes`, `ignore.col.order`, `ignore.row.order` and `tolerance` arguments.
+  
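
    A sketch of the new arguments (data invented):

    ```r
    library(data.table)

    x = data.table(a = 1:3, b = c("p", "q", "r"))
    y = x[3:1]   # same rows, different order

    isTRUE(all.equal(x, y))                           # FALSE: row order differs
    isTRUE(all.equal(x, y, ignore.row.order = TRUE))  # TRUE
    ```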
+  30. `keyby=` is now much faster by skipping unneeded work; e.g. 25s down to 13s for a 1.5GB DT with 200m rows and 86m groups. With more groups or bigger data, larger speedup factors are possible. Please always use `keyby=` unless you really need `by=`. `by=` returns the groups in first-appearance order and takes longer to do that. See [#1880](https://github.com/Rdatatable/data.table/issues/1880) for more info and please register your views there on changing the default.
+
+#### BUG FIXES
+
+  1. Now compiles and runs on IBM AIX gcc. Thanks to Vinh Nguyen for investigation and testing, [#1351](https://github.com/Rdatatable/data.table/issues/1351).
+
+  2. `as.ITime(NA)` works as intended, [#1354](https://github.com/Rdatatable/data.table/issues/1354). Thanks @geneorama.
+
+  3. `last()` dispatches `xts::last()` properly again, [#1347](https://github.com/Rdatatable/data.table/issues/1347). Thanks to @JoshuaUlrich for spotting and suggesting the fix.
+
+  4. `merge.data.table` ignores names when `by` argument is a named vector, [#1352](https://github.com/Rdatatable/data.table/issues/1352). Thanks @sebastian-c.
+
+  5. `melt.data.table` names the `value` column correctly when a `patterns()` of length 1 is provided to `measure.vars`, [#1346](https://github.com/Rdatatable/data.table/issues/1346). Thanks @jaapwalhout.
+
+  6. Fixed a rare case in `melt.data.table` not setting `variable` factor column properly when `na.rm=TRUE`, [#1359](https://github.com/Rdatatable/data.table/issues/1359). Thanks @mplatzer.
+
+  7. `dt[i, .SD]` unlocks `.SD` and overallocates correctly now, [#1341](https://github.com/Rdatatable/data.table/issues/1341). Thanks @marc-outins.
+
+  8. Querying a list column with `get()`, e.g., `dt[, get("c")]` is handled properly, [#1212](https://github.com/Rdatatable/data.table/issues/1212). Thanks @DavidArenburg.
+
+  9. Grouping on empty data.table with list col in `j` works as expected, [#1207](https://github.com/Rdatatable/data.table/issues/1207). Thanks @jangorecki.
+
+  10. Unnamed `by/keyby` expressions ensure now that the auto generated names are unique, [#1334](https://github.com/Rdatatable/data.table/issues/1334). Thanks @caneff.
+
+  11. `melt` errors correctly when `id.vars` or `measure.vars` are negative values, [#1372](https://github.com/Rdatatable/data.table/issues/1372).
+
+  12. `merge.data.table` always resets class to `c("data.table", "data.frame")` in result to be consistent with `merge.data.frame`, [#1378](https://github.com/Rdatatable/data.table/issues/1378). Thanks @ladida771.
+
+  13. `fread` properly reads text input whose last line contains only spaces, e.g. `fread('a,b\n1,2\n   ')`, [#1384](https://github.com/Rdatatable/data.table/issues/1384). Thanks to @ladida771.
+
+  14. `fread` with `stringsAsFactors = TRUE` no longer produces factors with NA as a factor level, [#1408](https://github.com/Rdatatable/data.table/pull/1408). Thanks to @DexGroves.
+
+  15. `test.data.table` no longer raises warning if suggested packages are not available. Thanks to @jangorecki for PR [#1403](https://github.com/Rdatatable/data.table/pull/1403). Closes [#1193](https://github.com/Rdatatable/data.table/issues/1193).
+
+  16. `rleid()` does not affect attributes of input vector, [#1419](https://github.com/Rdatatable/data.table/issues/1419). Thanks @JanGorecki.
+
+  17. `uniqueN()` now handles NULL properly, [#1429](https://github.com/Rdatatable/data.table/issues/1429). Thanks @JanGorecki.
+
+  18. GForce `min` and `max` functions handle `NaN` correctly, [#1461](https://github.com/Rdatatable/data.table/issues/1461). Thanks to @LyssBucks for [asking on SO](http://stackoverflow.com/q/34081848/559784).
+
+  19. Warnings about being unable to detect column types from the middle/last 5 lines are now shown as messages only when `verbose=TRUE`. Closes [#1124](https://github.com/Rdatatable/data.table/issues/1124).
+
+  20. `fread` converts columns to `factor` type when used along with `colClasses` argument, [#721](https://github.com/Rdatatable/data.table/issues/721). Thanks @AmyMikhail.
+
+  21. Auto indexing handles logical subset of factor column using numeric value properly, [#1361](https://github.com/Rdatatable/data.table/issues/1361). Thanks @mplatzer.
+  
+  22. `as.data.table.xts` handles single row `xts` object properly, [#1484](https://github.com/Rdatatable/data.table/issues/1484). Thanks Michael Smith and @jangorecki.
+
+  23. data.table now solves the issue of mixed encodings by comparing character columns with marked encodings under `UTF8` locale. This resolves issues [#66](https://github.com/Rdatatable/data.table/issues/66), [#69](https://github.com/Rdatatable/data.table/issues/69), [#469](https://github.com/Rdatatable/data.table/issues/469) and [#1293](https://github.com/Rdatatable/data.table/issues/1293). Thanks to @StefanFritsch and @Arthur.
+
+  24. `rbindlist` handles `idcol` construction correctly and more efficiently now (logic moved to C), [#1432](https://github.com/Rdatatable/data.table/issues/1432). Thanks to @franknarf1 and @Chris.
+
+  25. `CJ` sorts correctly when duplicates are found in input values and `sorted=TRUE`, [#1513](https://github.com/Rdatatable/data.table/issues/1513). Thanks @alexdeng.
+
+  26. Auto indexing returns order of subset properly when input `data.table` is already sorted, [#1495](https://github.com/Rdatatable/data.table/issues/1495). Thanks @huashan for the nice reproducible example.
+
+  27. `[.data.table` handles column subsets based on conditions that result in `NULL` as list elements correctly, [#1477](https://github.com/Rdatatable/data.table/issues/1477). Thanks @MichaelChirico. Also thanks to @Max from DSR for spotting a bug as a result of this fix. Now fixed.
+
+  28. Providing the first argument to `.Call` as a quoted string, e.g. `.Call("Crbindlist", ...)`, seems to result in a *"not resolved in current namespace"* error for some users. A potential fix is to simply remove the quotes, as in so many other calls in data.table. Potentially fixes [#1467](https://github.com/Rdatatable/data.table/issues/1467). Thanks to @rBatt, @rsaporta and @damienchallet.
+
+  29. `last` function will now properly redirect method if `xts` is not installed or not attached on search path. Closes [#1560](https://github.com/Rdatatable/data.table/issues/1560).
+
+  30. `rbindlist` (and `rbind`) works as expected when `fill = TRUE` and the first element of input list doesn't have columns present in other elements of the list, [#1549](https://github.com/Rdatatable/data.table/issues/1549). Thanks to @alexkowa.
+
+  31. `DT[, .(col), with=FALSE]` now returns a meaningful error message, [#1440](https://github.com/Rdatatable/data.table/issues/1440). Thanks to @VasilyA for [posting on SO](http://stackoverflow.com/q/33851742/559784).
+
+  32. Fixed a segfault in `forder` when elements of the input list are not of equal length, [#1531](https://github.com/Rdatatable/data.table/issues/1531). Thanks to @MichaelChirico.
+  
+  33. Reverted support of *list-of-lists* made in [#1224](https://github.com/Rdatatable/data.table/issues/1224) for consistency.
+
+  34. Fixed an edge case in fread's `fill` argument, [#1503](https://github.com/Rdatatable/data.table/issues/1503). Thanks to @AnandaMahto.
+
+  35. `copy()` overallocates properly when input is a *list-of-data.tables*, [#1476](https://github.com/Rdatatable/data.table/issues/1476). Thanks to @kimiylilammi and @AmitaiPerlstein for the report.
+
+  36. `fread()` handles embedded double quotes in json fields as expected, [#1164](https://github.com/Rdatatable/data.table/issues/1164). Thanks @richardtessier.
+
+  37. `as.data.table.list` handles list elements that are matrices/data.frames/data.tables properly, [#833](https://github.com/Rdatatable/data.table/issues/833). Thanks to @talexand.
+
+  38. `data.table()`, `as.data.table()` and `[.data.table` warn on `POSIXlt` type columns and convert them to `POSIXct` type. `setDT()` errors when input is a list and any column is of type `POSIXlt`, [#646](https://github.com/Rdatatable/data.table/issues/646). Thanks to @tdhock.
+
+  39. `roll` argument handles -ve integer64 values correctly, [#1405](https://github.com/Rdatatable/data.table/issues/1405). Thanks @bryan4887. Also closes [#1650](https://github.com/Rdatatable/data.table/issues/1650), a segfault due to this fix. Thanks @Franknarf1 for filing the issue.
+
+  40. Not join along with `mult="first"` and `mult="last"` is handled correctly, [#1571](https://github.com/Rdatatable/data.table/issues/1571).
+
+  41. `by=.EACHI` works as expected along with `mult="first"` and `mult="last"`, [#1287](https://github.com/Rdatatable/data.table/issues/1287) and [#1271](https://github.com/Rdatatable/data.table/issues/1271).
+
+  42. Subsets using logical expressions in `i` (e.g. `DT[someCol==3]`) no longer return an unintended all-NA row when `DT` consists of a single row and `someCol` contains `NA`, fixing [#1252](https://github.com/Rdatatable/data.table/issues/1252). Thanks to @sergiizaskaleta for reporting. If `i` is the reserved symbol `NA` though (i.e. `DT[NA]`) it is still auto converted to `DT[NA_integer_]` so that a single `NA` row is returned as almost surely expected. For consistency with past behavi [...]
+
+  43. `setattr()` catches logical input that points to R's global TRUE value and sets attributes on a copy instead, along with a warning, [#1281](https://github.com/Rdatatable/data.table/issues/1281). Thanks to @tdeenes.
+
+  44. `fread` respects the order of columns provided to the `select` argument in the result, and also warns if any provided column is not present, [#1445](https://github.com/Rdatatable/data.table/issues/1445).
+
+  45. `DT[, .BY, by=x]` and other variants of adding a column using `.BY` are now handled correctly, [#1270](https://github.com/Rdatatable/data.table/issues/1270).
+
+  46. `as.data.table.data.table()` method checks and restores over-allocation, [#473](https://github.com/Rdatatable/data.table/issues/473). 
+
+  47. When the number of rows read is less than the number of rows guessed (or allocated), `fread()` no longer warns; it reports this only as a verbose message, [#1116](https://github.com/Rdatatable/data.table/issues/1116) and [#1239](https://github.com/Rdatatable/data.table/issues/1239). Thanks to @slowteetoe and @hshipper.
+
+  48. `fread()` throws an error if input is a *directory*, [#989](https://github.com/Rdatatable/data.table/issues/989). Thanks @vlsi.
+
+  49. The UTF-8 BOM header is now excluded properly by `fread()`, [#1087](https://github.com/Rdatatable/data.table/issues/1087) and [#1465](https://github.com/Rdatatable/data.table/issues/1465). Thanks to @nigmastar and @MichaelChirico.
+
+  50. Joins using `on=` retain (and discard) keys properly, [#1268](https://github.com/Rdatatable/data.table/issues/1268). Thanks @DouglasClark for [this SO post](http://stackoverflow.com/q/29918595/559784) that helped discover the issue.
+
+  51. Secondary keys are properly removed when those columns get updated, [#1479](https://github.com/Rdatatable/data.table/issues/1479). Thanks @fabiangehring for the report, and also @ChristK for the MRE.
+  
+  52. `dcast` no longer errors on tables with duplicate columns that are unused in the call, [#1654](https://github.com/Rdatatable/data.table/issues/1654). Thanks @MichaelChirico for FR&PR.
+  
+  53. `fread` won't use `wget` for file:// input, [#1668](https://github.com/Rdatatable/data.table/issues/1668); thanks @MichaelChirico for FR&PR.
+
+  54. `chmatch()` handles `nomatch = integer(0)` properly, [#1672](https://github.com/Rdatatable/data.table/issues/1672).
+  
+  55. `dimnames.data.table` no longer errors in `data.table`-unaware environments when a `data.table` has, e.g., been churned through some `dplyr` functions and acquired extra classes, [#1678](https://github.com/Rdatatable/data.table/issues/1678). Thanks Daisy Lee on SO for pointing this out and @MichaelChirico for the fix.
+
+  55. `fread()` did not respect encoding on header column. Now fixed, [#1680](https://github.com/Rdatatable/data.table/issues/1680). Thanks @nachti.
+
+  56. as.data.table's `data.table` method returns a copy as it should, [#1681](https://github.com/Rdatatable/data.table/issues/1681).
+
+  57. Grouped update operations, e.g., `DT[, y := val, by=x]` where `val` is of an unsupported type, now error *without adding an unnamed column*, [#1676](https://github.com/Rdatatable/data.table/issues/1676). Thanks @wligtenberg.
+  
+  58. Handled use of `.I` in some `GForce` operations, [#1683](https://github.com/Rdatatable/data.table/issues/1683). Thanks gibbz00 from SO and @franknarf1 for reporting and @MichaelChirico for the PR.
+  
+  59. Added `+.IDate` method so that IDate + integer retains the `IDate` class, [#1528](https://github.com/Rdatatable/data.table/issues/1528); thanks @MichaelChirico for FR&PR. Similarly, added `-.IDate` so that IDate - IDate returns a plain integer rather than difftime.
+  
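A short sketch of the new arithmetic behaviour (assuming data.table 1.10.0):

```r
library(data.table)
d1 <- as.IDate("2016-01-01")
d2 <- as.IDate("2016-01-10")
class(d1 + 1L)  # "IDate" "Date" -- class retained under integer addition
d2 - d1         # 9, a plain integer rather than a difftime
class(d2 - d1)  # "integer"
```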
+  60. Radix ordering an integer vector containing `INT_MAX` (2147483647) with `decreasing=TRUE` and `na.last=FALSE` failed an ASAN check and segfaulted on some systems. As reported for base R [#16925](https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16925), whose new code comes from data.table. Simplified code, added test and proposed change to base R.
+
+  60. Fixed test in `onAttach()` for when `Packaged` field is missing from `DESCRIPTION`, [#1706](https://github.com/Rdatatable/data.table/issues/1706); thanks @restonslacker for BR&PR.
+
+  61. Addition of missing factor levels is handled correctly in the presence of `NA`s. This affected a join+update operation as shown in [#1718](https://github.com/Rdatatable/data.table/issues/1718). Thanks to @daniellemccool.
+
+  63. `foverlaps` now raises a meaningful error for duplicate column names, closes [#1730](https://github.com/Rdatatable/data.table/issues/1730). Thanks @rodonn.
+
+  64. The `na.omit` and `unique` methods now remove indices, closes [#1734](https://github.com/Rdatatable/data.table/issues/1734) and [#1760](https://github.com/Rdatatable/data.table/issues/1760). Thanks @m-dz and @fc9.30.
+
+  65. List of data.tables with custom class is printed properly, [#1758](https://github.com/Rdatatable/data.table/issues/1758). Thanks @fruce-ki.
+
+  66. `uniqueN` handles `na.rm=TRUE` argument on sorted inputs correctly, [#1771](https://github.com/Rdatatable/data.table/issues/1771). Thanks @ywhuofu.
+
+  67. `get()` / `mget()` play nicely with `.SD` / `.SDcols`, [#1744](https://github.com/Rdatatable/data.table/issues/1753). Thanks @franknarf1.
+
+  68. Joins on `integer64` columns assign `NA` correctly for non-matching rows, [#1385](https://github.com/Rdatatable/data.table/issues/1385) and partly [#1459](https://github.com/Rdatatable/data.table/issues/1459). Thanks @dlithio and @abielr.
+  
+  69. Added `as.IDate.POSIXct` to prevent loss of timezone information, [#1498](https://github.com/Rdatatable/data.table/issues/1498). Thanks @dougedmunds for reporting and @MichaelChirico for investigating & fixing.
+
+  70. Retaining / removing keys is handled better when a join is performed on non-key columns using the `on` argument, [#1766](https://github.com/Rdatatable/data.table/issues/1766), [#1704](https://github.com/Rdatatable/data.table/issues/1704) and [#1823](https://github.com/Rdatatable/data.table/issues/1823). Thanks @mllg and @DavidArenburg.
+
+  71. `rbind` for data.tables now coerces non-list inputs to data.tables before calling `rbindlist`, so that binding a list of data.tables and matrices works as expected, consistent with base's `rbind`, [#1626](https://github.com/Rdatatable/data.table/issues/1626). Thanks @ems for reporting [here](http://stackoverflow.com/q/34426957/559784) on SO.
+
+  72. Sub-assigning a factor column with `NA` works as expected. Also, the warning message on coercion is suppressed when the RHS is a singleton `NA`, [#1740](https://github.com/Rdatatable/data.table/issues/1740). Thanks @Zus.
+
+  73. Joins on key columns in the presence of the `on=` argument were slightly slower, as they unnecessarily ran a check to ensure orderedness. This is now fixed, [#1825](https://github.com/Rdatatable/data.table/issues/1825). Thanks @sz-cgt. See that post for an updated benchmark.
+  
+  74. `keyby=` now runs j in the order that the groups appear in the sorted result rather than first appearance order, [#606](https://github.com/Rdatatable/data.table/issues/606). This only makes a difference in very rare usage where j does something depending on an earlier group's result, perhaps by using `<<-`. If j is required to be run in first appearance order, then use `by=` whose behaviour is unchanged. Now we have this option. No existing tests affected. New tests added.
+  
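A sketch of the difference, made visible via a `<<-` side effect (a contrived one-off example; requires data.table 1.10.0):

```r
library(data.table)
DT <- data.table(x = c("b", "a", "b"), y = 1:3)
seen <- character(0)
res <- DT[, { seen <<- c(seen, .BY$x); .(s = sum(y)) }, keyby = x]
seen  # "a" "b": j now runs in sorted group order under keyby=
res   # keyed result: x == "a" gives s = 2, x == "b" gives s = 4
```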
+  75. `:=` verbose messages have been corrected and improved, [#1808](https://github.com/Rdatatable/data.table/issues/1808). Thanks to @franknarf1 for reproducible examples. Tests added.
+  
+  76. `DT[order(colA, na.last=NA)]` (where `na.last=NA` means remove `NA`s) on a 2-row `DT` with one `NA` in `colA` could return a randomly wrong result due to use of uninitialized memory. Tests added.
+
+  77. `fread` is now consistent with `read.table` when the `colClasses` vector contains `NA`, and also fixes mixed `character` and `factor` entries in the `colClasses` vector. Closes [#1910](https://github.com/Rdatatable/data.table/issues/1910).
+
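A hedged sketch of that `colClasses` behaviour (file contents are made up for illustration):

```r
library(data.table)
tmp <- tempfile(fileext = ".csv")
writeLines(c("a,b,c", "1,x,2", "3,y,4"), tmp)
# NA entries in colClasses mean "let fread guess", as with read.table;
# other entries force the type
DT <- fread(tmp, colClasses = c(NA, "factor", "character"))
sapply(DT, class)  # a guessed as integer; b forced to factor; c to character
unlink(tmp)
```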
+#### NOTES
+
+  1. Updated error message on invalid joins to reflect the new `on=` syntax, [#1368](https://github.com/Rdatatable/data.table/issues/1368). Thanks @MichaelChirico.
+
+  2. Fixed test 842 to account for `gdata::last` as well, [#1402](https://github.com/Rdatatable/data.table/issues/1402). Thanks @JanGorecki. 
+
+  3. Fixed tests for `fread` 1378.2 and 1378.3 with `showProgress = FALSE`, closes [#1397](https://github.com/Rdatatable/data.table/issues/1397). Thanks to @JanGorecki for the PR.
+
+  4. Worked around auto index error in `v1.9.6` to account for indices created with `v1.9.4`, [#1396](https://github.com/Rdatatable/data.table/issues/1396). Thanks @GRandom.
+
+  5. `test.data.table` gains a new argument `silent`; if set to TRUE it does not raise an exception but instead returns TRUE/FALSE based on the test results.
+
+  6. `dim.data.table` is now implemented in C. Thanks to Andrey Riabushenko.
+
+  7. Better fix to `fread`'s `check.names` argument using `make.names()`, [#1027](https://github.com/Rdatatable/data.table/issues/1027). Thanks to @DavidArenberg for spotting the issue with the previous fix using `make.unique()`.
+
+  8. Fixed explanation of `skip` argument in `?fread` as spotted by @aushev, [#1425](https://github.com/Rdatatable/data.table/issues/1425).
+
+  9. Run `install_name_tool` when building on OS X to ensure that the install name for datatable.so matches its filename. Fixes [#1144](https://github.com/Rdatatable/data.table/issues/1144). Thanks to @chenghlee for the PR.
+  
+  10. Updated documentation of `i` in `[.data.table` to emphasize the emergence of the new `on` option as an alternative to keyed joins, [#1488](https://github.com/Rdatatable/data.table/issues/1488). Thanks @MichaelChirico.
+  
+  11. Improvements and fixes to `?like` [#1515](https://github.com/Rdatatable/data.table/issues/1515). Thanks to @MichaelChirico for the PR.
+
+  12. Several improvements and fixes to `?between` [#1521](https://github.com/Rdatatable/data.table/issues/1521). Thanks @MichaelChirico for the PR.
+
+  13. `?shift.Rd` is fixed so that it is not misconstrued as operating in a time-series sense. Closes [#1530](https://github.com/Rdatatable/data.table/issues/1530). Thanks to @pstoyanov.
+
+  14. `?truelength.Rd` is fixed to reflect that over-allocation happens on data.tables loaded from disk only during column additions and not deletions, [#1536](https://github.com/Rdatatable/data.table/issues/1536). Thanks to @Roland and @rajkrpan.
+
+  15. Added `\n` to message displayed in `melt.data.table` when duplicate names are found, [#1538](https://github.com/Rdatatable/data.table/issues/1538). Thanks @Franknarf1.
+
+  16. `merge.data.table` now raises a warning if any of the data.tables to join has 0 columns. Closes [#597](https://github.com/Rdatatable/data.table/issues/597).
+
+  17. Travis-CI will now automatically deploy the package to the drat repository hosted on the [data.table@gh-pages](https://github.com/Rdatatable/data.table/tree/gh-pages) branch, allowing installation of the latest devel version from **source** via `install.packages("data.table", repos = "https://Rdatatable.github.io/data.table", type = "source")`. Closes [#1505](https://github.com/Rdatatable/data.table/issues/1505).
+
+  18. Dependency on `chron` package has been changed to *suggested*. Closes [#1558](https://github.com/Rdatatable/data.table/issues/1558).
+
+  19. Rnw vignettes are converted to Rmd. The *10 minute quick introduction* Rnw vignette has been removed, since almost all of its contents are consolidated into the new intro Rmd vignette. Thanks to @MichaelChirico and @jangorecki. 
+
+  A *quick tour of data.table* HTML vignette is in the works in the spirit of the previous *10 minute quick intro* PDF guide.
+  
+  20. `row.names` argument to `print.data.table` can now be changed by default via `options("datatable.print.rownames")` (`TRUE` by default, the inherited standard), [#1097](https://github.com/Rdatatable/data.table/issues/1097). Thanks to @smcinerney for the suggestion and @MichaelChirico for the PR.
+  
+  21. `data.table`s with `NULL` or blank column names now print with blank column names, [#545](https://github.com/Rdatatable/data.table/issues/545), with minor revision to [#97](https://github.com/Rdatatable/data.table/issues/97). Thanks to @arunsrinivasan for reporting and @MichaelChirico for the PR.
+
+  21. Added a FAQ entry for the new update to `:=` which sometimes doesn't print the result on the first time, [#939](https://github.com/Rdatatable/data.table/issues/939).
+
+  22. Added `Note` section and examples to `?":="` for [#905](https://github.com/Rdatatable/data.table/issues/905).
+
+  23. Fixed example in `?as.data.table.Rd`, [#1576](https://github.com/Rdatatable/data.table/issues/1576). Thanks @MichaelChirico.
+
+  24. Fixed an edge case and added tests for columns of type `function`, [#518](https://github.com/Rdatatable/data.table/issues/518).
+  
+  25. `data.table`'s dependency has been moved forward from R 2.14.1 to R 3.0.0 (Apr 2013; i.e. 3 years old). We keep this dependency as old as possible for as long as possible as requested by users in managed environments. This bump allows `data.table` internals to use `paste0()` for the first time and also allows `fsort()` to accept vectors of length over 2 billion items. Before release to CRAN [our procedures](https://github.com/Rdatatable/data.table/blob/master/CRAN_Release.cmd) incl [...]
+
+  26. The new option `options(datatable.use.index = TRUE)` (the default) gives better control over usage of indices; when combined with `options(datatable.auto.index = FALSE)` it restricts data.table to using only indices created manually with `setindex` or `setindexv`. Closes [#1422](https://github.com/Rdatatable/data.table/issues/1422).
+  
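A sketch of combining the two options (assuming data.table 1.10.0):

```r
library(data.table)
options(datatable.auto.index = FALSE)  # stop indices being created automatically
DT <- data.table(x = c("a", "b", "c"), y = 1:3)
setindex(DT, x)          # create an index manually instead
DT[x == "b"]             # free to use the manual index (add verbose=TRUE to see)
options(datatable.use.index = FALSE)
DT[x == "b"]             # same result, but all indices are now ignored
options(datatable.use.index = TRUE, datatable.auto.index = TRUE)  # restore defaults
```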
+  27. The default number of over-allocated spare column pointer slots has been increased from 64 to 1024. The wasted memory overhead (if never used) is insignificant (0.008 MB). The advantage is that adding a large number of columns by reference using := or set() inside a loop will not now saturate as quickly and need reallocating. An alleviation to issue [#1633](https://github.com/Rdatatable/data.table/issues/1633). See `?alloc.col` for how to change this default yourself. Accordingly,  [...]
+
+  28. `?IDateTime` now makes clear that `wday`, `yday` and `month` are all 1- (not 0- as in `POSIXlt`) based, [#1658](https://github.com/Rdatatable/data.table/issues/1658); thanks @MichaelChirico.
+
+  29. Fixed misleading documentation of `?uniqueN`, [#1746](https://github.com/Rdatatable/data.table/issues/1746). Thanks @SymbolixAU.
+
+  30. `melt.data.table` restricts column names printed during warning messages to a maximum of five, [#1752](https://github.com/Rdatatable/data.table/issues/1752). Thanks @franknarf1.
+
+  31. data.table's `setNumericRounding` has a default value of 0, which means ordering, joining and grouping of numeric values will be done at *full precision* by default. Handles [#1642](https://github.com/Rdatatable/data.table/issues/1642), [#1728](https://github.com/Rdatatable/data.table/issues/1728), [#1463](https://github.com/Rdatatable/data.table/issues/1463), [#485](https://github.com/Rdatatable/data.table/issues/485).
+
+  32. Subsets with S4 objects in `i` are now faster, [#1438](https://github.com/Rdatatable/data.table/issues/1438). Thanks @DCEmilberg.
+
+  33. When the formula RHS is `.` and multiple functions are provided to `fun.aggregate`, column names of the cast data.table no longer contain the `.`, since it adds no useful information, [#1821](https://github.com/Rdatatable/data.table/issues/1821). Thanks @franknarf1.
+
+  34. Function names are added to column names on cast data.tables only when more than one function is provided, [#1810](https://github.com/Rdatatable/data.table/issues/1810). Thanks @franknarf1.
+  
+  35. The option `datatable.old.bywithoutby` to restore the old default has been removed, as warned in release notes 2 years ago and explicitly warned about on use for the last year. Search this file for the text 'bywithoutby' to see previous notes on this topic.
+  
+  36. Using `with=FALSE` together with `:=` was deprecated in v1.9.4 released 2 years ago (Oct 2014). As warned then in release notes (see below) this is now a warning with advice to wrap the LHS of `:=` with parenthesis; e.g. `myCols=c("colA","colB"); DT[,(myCols):=1]`. In the next release, this warning message will be an error message.
+
+  37. Using `nomatch` together with `:=` now warns that it is ignored.
+  
+  38. Logical `i` is no longer recycled. Instead, an error is raised if it isn't either length 1 or `nrow(DT)`. Recycling was hiding more bugs than the rare convenience was worth. The error message suggests recycling explicitly; i.e. `DT[rep(<logical>,length=.N),...]`.
+  
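A sketch of the explicit recycling now required (assuming data.table 1.10.0):

```r
library(data.table)
DT <- data.table(a = 1:4)
# DT[c(TRUE, FALSE)] now errors: logical i must be length 1 or nrow(DT)
DT[rep(c(TRUE, FALSE), length.out = .N)]  # explicit recycling keeps rows 1 and 3
```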
+  39. Thanks to Mark Landry and Michael Chirico for finding and reporting a problem in dev before release with auto `with=FALSE` (item 3 above) when `j` starts with `!` or `-`, [#1864](https://github.com/Rdatatable/data.table/issues/1864). Fixed and tests added.
+  
+  40. Following latest recommended testthat practices and to avoid a warning that it now issues, `inst/tests/testthat` has been moved to `/tests/testthat`. This means that testthat tests won't be installed for use by users by default and that `test_package("data.table")` will now fail with error `No matching test file in dir` and also a warning `Placing tests in inst/tests/ is deprecated. Please use tests/testthat/ instead`. (That warning seems to be misleading since we already have made [...]
+  
+  41. The license field is changed from "GPL (>= 2)" to "GPL-3 | file LICENSE" due to independent communication from two users of data.table at Google. The lack of an explicit license file was preventing them from contributing patches to data.table. Further, Google lawyers require the full text of the license and not a URL to the license. Since this requirement appears to require the choice of one license, we opted for GPL-3 and we checked the GPL-3 is fine by Google for them to use and  [...]
+  
+  42. Thanks to @rrichmond for finding and reporting a regression in dev before release with `roll` not respecting fractions in type double, [#1904](https://github.com/Rdatatable/data.table/issues/1904). For example dates like `zoo::as.yearmon("2016-11")` which is stored as `double` value 2016.833. Fixed and test added.
 
----
 
 ### Changes in v1.9.6  (on CRAN 19 Sep 2015)
 
@@ -223,7 +590,7 @@
 
   51. `x[J(vals), .N, nomatch=0L]` also included no matches in result, [#1074](https://github.com/Rdatatable/data.table/issues/1074). And `x[J(...), col := val, nomatch=0L]` returned a warning with incorrect results when join resulted in no matches as well, even though `nomatch=0L` should have no effect in `:=`, [#1092](https://github.com/Rdatatable/data.table/issues/1092). Both issues are fixed now. Thanks to @riabusan and @cguill95 for #1092.
 
-  52. `.data.table.locked` attributes set to NULL in internal function `subsetDT`. Closes [#1154](https://github.com/Rdatatable/data.table/issues/1154). Thanks to @Jan.
+  52. `.data.table.locked` attributes set to NULL in internal function `subsetDT`. Closes [#1154](https://github.com/Rdatatable/data.table/issues/1154). Thanks to @jangorecki.
 
   53. Internal function `fastmean()` retains column attributes. Closes [#1160](https://github.com/Rdatatable/data.table/issues/1160). Thanks to @renkun-ken.
 
@@ -233,7 +600,7 @@
 
   56. Key is retained properly when joining on factor type columns. Closes [#477](https://github.com/Rdatatable/data.table/issues/477). Thanks to @nachti for the report.
   
-  57. Over-allocated memory is released more robustly thanks to Karl Miller's investigation and suggested fix.
+  57. Over-allocated memory is released more robustly thanks to Karl Millar's investigation and suggested fix.
   
   58. `DT[TRUE, colA:=colA*2]` no longer churns through 4 unnecessary allocations as large as one column. This was caused by `i=TRUE` being recycled. Thanks to Nathan Kurz for reporting and investigating. Added provided test to test suite. Only a single vector is allocated now for the RHS (`colA*2`). Closes [#1249](https://github.com/Rdatatable/data.table/issues/1249).
   
@@ -513,7 +880,7 @@
 
   31.  Fixed an edge case in `DT[order(.)]` internal optimisation to be consistent with base. Closes [#696](https://github.com/Rdatatable/data.table/issues/696). Thanks to Michael Smith and Garrett See for reporting.
 
-  32.  `DT[, list(list(.)), by=.]` and `DT[, col := list(list(.)), by=.]` returns correct results in R >=3.1.0 as well. The bug was due to recent (welcoming) changes in R v3.1.0 where `list(.)` does not result in a *copy*. Closes [#481](https://github.com/Rdatatable/data.table/issues/481). Also thanks to KrishnaPG for filing [#728](https://github.com/Rdatatable/data.table/issues/728).
+  32.  `DT[, list(list(.)), by=.]` and `DT[, col := list(list(.)), by=.]` now return correct results in R >= 3.1.0. The bug was due to a welcome change in R 3.1.0 where `list(.)` no longer copies. Closes [#481](https://github.com/Rdatatable/data.table/issues/481). Also thanks to KrishnaPG for filing [#728](https://github.com/Rdatatable/data.table/issues/728).
 
   33.  `dcast.data.table` handles `fun.aggregate` argument properly when called from within a function that accepts `fun.aggregate` argument and passes to `dcast.data.table()`. Closes [#713](https://github.com/Rdatatable/data.table/issues/713). Thanks to mathematicalcoffee for reporting [here](http://stackoverflow.com/q/24542976/559784) on SO.
 
@@ -601,7 +968,7 @@
       * `melt.data.table`'s `na.rm = TRUE` parameter is optimised to remove NAs directly during melt and therefore avoids the overhead of subsetting using `!is.na` afterwards on the molten data. 
       * except for `margins` argument from `reshape2:::dcast`, all features of dcast are intact. `dcast.data.table` can also accept `value.var` columns of type list.
     
-    > Reminder of Cologne (Dec 2013) presentation **slide 32** : ["Why not submit a dcast pull request to reshape2?"](http://datatable.r-forge.r-project.org/CologneR_2013.pdf).
+    > Reminder of Cologne (Dec 2013) presentation **slide 32** : ["Why not submit a dcast pull request to reshape2?"](https://github.com/Rdatatable/data.table/wiki/talks/CologneR_2013.pdf).
   
   2.  Joins scale better as the number of rows increases. The binary merge used to start on row 1 of i; it now starts on the middle row of i. Many thanks to Mike Crowe for the suggestion. This has been done within column so scales much better as the number of join columns increase, too. 
 
@@ -1084,8 +1451,7 @@ USER VISIBLE CHANGES
         the value falls in a gap, and to the end value according to 'rollends'.
         'rolltolast' has been deprecated. For backwards compatibility it is converted to
         {roll=TRUE;rollends=c(FALSE,FALSE)}.
-        This implements FR#2300 & FR#206 and helps several recent S.O. questions :
-            https://r-forge.r-project.org/tracker/?group_id=240&atid=978&func=detail&aid=2300
+        This implements [r-forge FR#2300](https://github.com/Rdatatable/data.table/issues/615) & [r-forge FR#206](https://github.com/Rdatatable/data.table/issues/459) and helps several recent S.O. questions.
 
 #### BUG FIXES
 
diff --git a/R/IDateTime.R b/R/IDateTime.R
index 6adba1d..07d52fa 100644
--- a/R/IDateTime.R
+++ b/R/IDateTime.R
@@ -12,6 +12,10 @@ as.IDate.Date <- function(x, ...) {
     structure(as.integer(x), class=c("IDate","Date"))
 }    
 
+as.IDate.POSIXct <- function(x, ...) {
+  as.IDate(as.Date(x, tz = attr(x, "tzone"), ...))
+}
+
 as.IDate.IDate <- function(x, ...) x
 
 as.Date.IDate <- function(x, ...) { 
@@ -45,6 +49,42 @@ round.IDate <- function (x, digits=c("weeks", "months", "quarters", "years"), ..
                     years = ISOdate(year(x), 1, 1)))
 }
 
+#Adapted from `+.Date`
+`+.IDate` <- function (e1, e2) {
+    if (nargs() == 1L) 
+        return(e1)
+    if (inherits(e1, "difftime") || inherits(e2, "difftime"))
+        stop("difftime objects may not be added to IDate. Use plain integer instead of difftime.")
+    if ( (storage.mode(e1)=="double" && isReallyReal(e1)) ||
+         (storage.mode(e2)=="double" && isReallyReal(e2)) ) {
+        return(`+.Date`(e1,e2))
+        # IDate doesn't support fractional days; revert to base Date
+    }
+    if (inherits(e1, "Date") && inherits(e2, "Date")) 
+        stop("binary + is not defined for \"IDate\" objects")
+    structure(as.integer(unclass(e1) + unclass(e2)), class = c("IDate", "Date"))
+}
+
+`-.IDate` <- function (e1, e2) {
+    if (!inherits(e1, "IDate")) 
+        stop("can only subtract from \"IDate\" objects")
+    if (storage.mode(e1) != "integer")
+        stop("Internal error: storage mode of IDate is somehow no longer integer")
+    if (nargs() == 1) 
+        stop("unary - is not defined for \"IDate\" objects")
+    if (inherits(e2, "difftime"))
+        stop("difftime objects may not be subtracted from IDate. Use plain integer instead of difftime.")
+    if ( storage.mode(e2)=="double" && isReallyReal(e2) ) {
+        return(`-.Date`(as.Date(e1),as.Date(e2)))
+        # IDate doesn't support fractional days so revert to base Date
+    }
+    ans = as.integer(unclass(e1) - unclass(e2))
+    if (!inherits(e2, "Date")) class(ans) = c("IDate","Date")
+    return(ans)
+}
+
+
+
 ###################################################################
 # ITime -- Integer time-of-day class
 #          Stored as seconds in the day
@@ -83,7 +123,10 @@ as.character.ITime <- format.ITime <- function(x, ...) {
     res = paste(substring(paste("0", hh, sep = ""), nchar(paste(hh))), 
               substring(paste("0", mm, sep = ""), nchar(paste(mm))), 
               substring(paste("0", ss, sep = ""), nchar(paste(ss))), sep = ":")
-    if (any(neg)) res[neg] = paste("-", res[neg], sep="")
+    # Fix for #1354, so that "NA" input is handled correctly.
+    if (is.na(any(neg))) res[is.na(x)] = NA
+    neg = which(neg)
+    if (length(neg)) res[neg] = paste("-", res[neg], sep="")
     res
 }
 
@@ -160,19 +203,23 @@ as.POSIXlt.ITime <- function(x, ...) {
 # chron support
 
 as.chron.IDate <- function(x, time = NULL, ...) {
-    if (!is.null(time)) {
-        chron(dates. = as.chron(as.Date(x)), times. = as.chron(time))
-    } else {
-        chron(dates. = as.chron(as.Date(x)))
-    }    
+    if(!requireNamespace("chron", quietly = TRUE)) stop("Install suggested `chron` package to use `as.chron.IDate` function.") else {
+        if (!is.null(time)) {
+            chron::chron(dates. = chron::as.chron(as.Date(x)), times. = chron::as.chron(time))
+        } else {
+            chron::chron(dates. = chron::as.chron(as.Date(x)))
+        }    
+    }
 }
 
 as.chron.ITime <- function(x, date = NULL, ...) {
-    if (!is.null(date)) {
-        chron(dates. = as.chron(as.Date(date)), times. = as.chron(x))
-    } else {
-        chron(times. = as.character(x))
-    }    
+    if(!requireNamespace("chron", quietly = TRUE)) stop("Install suggested `chron` package to use `as.chron.ITime` function.") else {
+        if (!is.null(date)) {
+            chron::chron(dates. = chron::as.chron(as.Date(date)), times. = chron::as.chron(x))
+        } else {
+            chron::chron(times. = as.character(x))
+        }  
+    }
 }
 
 as.ITime.times <- function(x, ...) {
@@ -193,11 +240,32 @@ as.ITime.times <- function(x, ...) {
 #   lubridate routines do not return integer values.
 ###################################################################
 
+second  <- function(x) as.integer(as.POSIXlt(x)$sec)
+minute  <- function(x) as.POSIXlt(x)$min
 hour    <- function(x) as.POSIXlt(x)$hour
 yday    <- function(x) as.POSIXlt(x)$yday + 1L
 wday    <- function(x) as.POSIXlt(x)$wday + 1L
 mday    <- function(x) as.POSIXlt(x)$mday
 week    <- function(x) yday(x) %/% 7L + 1L
+isoweek <- function(x) {
+  #ISO 8601-conformant week, as described at
+  #  https://en.wikipedia.org/wiki/ISO_week_date
+  
+  #Approach:
+  # * Find nearest Thursday to each element of x
+  # * Find the number of weeks having passed between
+  #   January 1st of the year of the nearest Thursdays and x
+  
+  xlt <- as.POSIXlt(x)
+  
+  #We want Thursday to be 3 (4 by default in POSIXlt), so
+  #  subtract 1 and re-divide; also, POSIXlt increment by seconds
+  nearest_thurs <- xlt + (3 - ((xlt$wday - 1) %% 7)) * 86400
+  
+  year_start <- as.POSIXct(paste0(as.POSIXlt(nearest_thurs)$year + 1900L, "-01-01"))
+  
+  as.integer(1 + unclass(difftime(nearest_thurs, year_start, units = "days")) %/% 7)
+}
 month   <- function(x) as.POSIXlt(x)$mon + 1L
 quarter <- function(x) as.POSIXlt(x)$mon %/% 3L + 1L
 year    <- function(x) as.POSIXlt(x)$year + 1900L
diff --git a/R/as.data.table.R b/R/as.data.table.R
new file mode 100644
index 0000000..324a478
--- /dev/null
+++ b/R/as.data.table.R
@@ -0,0 +1,179 @@
+as.data.table <-function(x, keep.rownames=FALSE, ...)
+{
+    if (is.null(x))
+        return(null.data.table())
+    UseMethod("as.data.table")
+}
+
+as.data.table.default <- function(x, ...){
+  setDT(as.data.frame(x, ...))[]
+}
+
+as.data.table.factor <- as.data.table.ordered <- 
+as.data.table.integer <- as.data.table.numeric <- 
+as.data.table.logical <- as.data.table.character <- 
+as.data.table.Date <- as.data.table.ITime <- function(x, keep.rownames=FALSE, ...) {
+    if (is.matrix(x)) {
+        return(as.data.table.matrix(x, ...))
+    }
+    tt = deparse(substitute(x))[1]
+    nm = names(x)
+    # FR #2356 - transfer names of named vector as "rn" column if required
+    if (!identical(keep.rownames, FALSE) & !is.null(nm)) 
+        x <- list(nm, unname(x))
+    else x <- list(x)
+    if (tt == make.names(tt)) {
+        # can specify col name to keep.rownames, #575
+        nm = if (length(x) == 2L) if (is.character(keep.rownames)) keep.rownames[1L] else "rn"
+        setattr(x, 'names', c(nm, tt))
+    }
+    as.data.table.list(x, FALSE)
+}
+
+R300_provideDimnames <- function (x, sep = "", base = list(LETTERS)) {
+    # backported from R3.0.0 so data.table can depend on R 2.14.0 
+    dx <- dim(x)
+    dnx <- dimnames(x)
+    if (new <- is.null(dnx)) 
+        dnx <- vector("list", length(dx))
+    k <- length(M <- vapply(base, length, 1L))
+    for (i in which(vapply(dnx, is.null, NA))) {
+        ii <- 1L + (i - 1L)%%k
+        dnx[[i]] <- make.unique(base[[ii]][1L + 0:(dx[i] - 1L)%%M[ii]], 
+            sep = sep)
+        new <- TRUE
+    }
+    if (new) 
+        dimnames(x) <- dnx
+    x
+}
+
+# as.data.table.table - FR #4848
+as.data.table.table <- function(x, keep.rownames=FALSE, ...) {
+    # Fix for bug #5408 - order of columns are different when doing as.data.table(with(DT, table(x, y)))
+    val = rev(dimnames(R300_provideDimnames(x)))
+    if (is.null(names(val)) || all(nchar(names(val)) == 0L)) 
+        setattr(val, 'names', paste("V", rev(seq_along(val)), sep=""))
+    ans <- data.table(do.call(CJ, c(val, sorted=FALSE)), N = as.vector(x))
+    setcolorder(ans, c(rev(head(names(ans), -1)), "N"))
+    ans
+}
+
+as.data.table.matrix <- function(x, keep.rownames=FALSE, ...) {
+    if (!identical(keep.rownames, FALSE)) {
+        # can specify col name to keep.rownames, #575
+        ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
+        if (is.character(keep.rownames))
+            setnames(ans, 'rn', keep.rownames[1L])
+        return(ans)
+    }
+    d <- dim(x)
+    nrows <- d[1L]
+    ir <- seq_len(nrows)
+    ncols <- d[2L]
+    ic <- seq_len(ncols)
+    dn <- dimnames(x)
+    collabs <- dn[[2L]]
+    if (any(empty <- nchar(collabs) == 0L))
+        collabs[empty] <- paste("V", ic, sep = "")[empty]
+    value <- vector("list", ncols)
+    if (mode(x) == "character") {
+        # fix for #745 - A long overdue SO post: http://stackoverflow.com/questions/17691050/data-table-still-converts-strings-to-factors
+        for (i in ic) value[[i]] <- x[, i]                  # <strike>for efficiency.</strike> For consistency - data.table likes and prefers "character"
+    }
+    else {
+        for (i in ic) value[[i]] <- as.vector(x[, i])       # to drop any row.names that would otherwise be retained inside every column of the data.table
+    }
+    if (length(collabs) == ncols)
+        setattr(value, "names", collabs)
+    else
+        setattr(value, "names", paste("V", ic, sep = ""))
+    setattr(value,"row.names",.set_row_names(nrows))
+    setattr(value,"class",c("data.table","data.frame"))
+    alloc.col(value)
+}
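
Editorial note (not part of the patch): a hedged usage sketch of the `keep.rownames` handling in `as.data.table.matrix` above:

```r
library(data.table)
m = matrix(1:4, nrow = 2, dimnames = list(c("r1", "r2"), c("a", "b")))
as.data.table(m)                        # row names dropped by default
as.data.table(m, keep.rownames = TRUE)  # row names kept in a column "rn"
as.data.table(m, keep.rownames = "id")  # #575: choose the column name
```
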
+
+as.data.table.list <- function(x, keep.rownames=FALSE, ...) {
+    if (!length(x)) return( null.data.table() )
+    # fix for #833, as.data.table.list with matrix/data.frame/data.table as a list element..
+    # TODO: move this entire logic (along with data.table() to C
+    for (i in seq_along(x)) {
+        dims = dim(x[[i]])
+        if (!is.null(dims)) {
+            ans = do.call("data.table", x)
+            setnames(ans, make.unique(names(ans)))
+            return(ans)
+        }
+    }
+    n = vapply(x, length, 0L)
+    mn = max(n)
+    x = copy(x)
+    idx = which(n < mn)
+    if (length(idx)) {
+        for (i in idx) {
+            if (!is.null(x[[i]])) {# avoids warning when a list element is NULL
+                if (inherits(x[[i]], "POSIXlt")) {
+                    warning("POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.")
+                    x[[i]] = as.POSIXct(x[[i]])
+                }
+                # Implementing FR #4813 - recycle with warning when nr %% nrows[i] != 0L
+                if (!n[i] && mn)
+                    warning("Item ", i, " is of size 0 but maximum size is ", mn, ", therefore recycled with 'NA'")
+                else if (n[i] && mn %% n[i] != 0)
+                    warning("Item ", i, " is of size ", n[i], " but maximum size is ", mn, " (recycled leaving a remainder of ", mn%%n[i], " items)")
+                x[[i]] = rep(x[[i]], length.out=mn)
+            }
+        }
+    }
+    # fix for #842
+    if (mn > 0L) {
+        nz = which(n > 0L)
+        xx = point(vector("list", length(nz)), seq_along(nz), x, nz)
+        if (!is.null(names(x)))
+            setattr(xx, 'names', names(x)[nz])
+        x = xx
+    }
+    if (is.null(names(x))) setattr(x,"names",paste("V",seq_len(length(x)),sep=""))
+    setattr(x,"row.names",.set_row_names(max(n)))
+    setattr(x,"class",c("data.table","data.frame"))
+    alloc.col(x)
+}
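
Editorial note (not part of the patch): the FR #4813 recycling rule in `as.data.table.list` above, sketched:

```r
library(data.table)
# b (length 3) divides the maximum length 6, so it recycles silently
as.data.table(list(a = 1:6, b = 1:3))
# length 4 leaves a remainder of 2, so recycling emits a warning
as.data.table(list(a = 1:6, b = 1:4))
```
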
+
+# Unlike base R, don't retain classes that precede "data.frame" when
+# converting from it. This breaks test #527 (see tests and
+# as.data.table.data.frame); #527 has been commented out for now.
+# This addresses #1078 and #1128.
+.resetclass <- function(x, class) {
+    cx = class(x)
+    n  = chmatch(class, cx)
+    unique( c("data.table", "data.frame", tail(cx, length(cx)-n)) )
+}
+
+as.data.table.data.frame <- function(x, keep.rownames=FALSE, ...) {
+    if (!identical(keep.rownames, FALSE)) {
+        # can specify col name to keep.rownames, #575
+        ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
+        if (is.character(keep.rownames))
+            setnames(ans, 'rn', keep.rownames[1L])
+        return(ans)
+    }
+    ans = copy(x)  # TO DO: change this deep copy to be shallow.
+    setattr(ans,"row.names",.set_row_names(nrow(x)))
+
+    ## NOTE: This test (#527) is no longer in effect ##
+    # for nlme::groupedData which has class c("nfnGroupedData","nfGroupedData","groupedData","data.frame")
+    # See test 527.
+    ## 
+
+    # fix for #1078 and #1128, see .resetclass() for explanation.
+    setattr(ans, "class", .resetclass(x, "data.frame"))
+    alloc.col(ans)
+}
+
+as.data.table.data.table <- function(x, ...) {
+    # as.data.table always returns a copy, automatically takes care of #473
+    x = copy(x) # #1681
+    # fix for #1078 and #1128, see .resetclass() for explanation.
+    setattr(x, 'class', .resetclass(x, "data.table"))
+    return(x)
+}
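
Editorial note (not part of the patch): the copy semantics guaranteed by `as.data.table.data.table` above (#473, #1681), illustrated:

```r
library(data.table)
DT1 = data.table(a = 1:3)
DT2 = as.data.table(DT1)  # always returns a copy
DT2[, b := 4:6]           # modifying the copy by reference ...
"b" %in% names(DT1)       # ... leaves the original untouched: FALSE
```
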
diff --git a/R/between.R b/R/between.R
index a56b25d..ced6ca4 100644
--- a/R/between.R
+++ b/R/between.R
@@ -1,9 +1,40 @@
-
-between <- function(x,lower,upper,incbounds=TRUE)
-{
-  if(incbounds) x>=lower & x<=upper
-  else x>lower & x<upper
+# is x[i] in between lower[i] and upper[i] ? 
+between <- function(x,lower,upper,incbounds=TRUE) {
+    is_strictly_numeric <- function(x) is.numeric(x) && !"integer64" %in% class(x)
+    if (is_strictly_numeric(x) && is_strictly_numeric(lower) &&
+        is_strictly_numeric(upper) && length(lower) == 1L && length(upper) == 1L) {
+        # faster parallelised version for int/double for most common scenario
+        .Call(Cbetween, x, lower, upper, incbounds)
+    } else {
+        if(incbounds) x>=lower & x<=upper
+        else x>lower & x<upper
+    }
 }
 
-"%between%" <- function(x,y) between(x,y[1],y[2],incbounds=TRUE)
+# %between% is vectorised, #534.
+"%between%" <- function(x,y) between(x,y[[1]],y[[2]],incbounds=TRUE)
 # If we want non inclusive bounds with %between%, just +1 to the left, and -1 to the right (assuming integers)
+
+# issue FR #707
+# is x[i] found anywhere within [lower, upper] range?
+inrange <- function(x,lower,upper,incbounds=TRUE) {
+    query = setDT(list(x=x))
+    subject = setDT(list(l=lower, u=upper))
+    ops = if (incbounds) c(4L, 2L) else c(5L, 3L) # >=,<= and >,<
+    verbose = getOption("datatable.verbose")
+    if (verbose) {last.started.at=proc.time()[3];cat("forderv(query) took ... ");flush.console()}
+    xo = forderv(query)
+    if (verbose) {cat(round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+    ans = bmerge(shallow(subject), query, 1:2, c(1L,1L), FALSE, xo, 
+            0, c(FALSE, TRUE), 0L, "all", ops, integer(0), 
+            1L, verbose) # fix for #1819, turn on verbose messages
+    options(datatable.verbose=FALSE)
+    setDT(ans[c("starts", "lens")], key=c("starts", "lens"))
+    options(datatable.verbose=verbose)
+    if (verbose) {last.started.at=proc.time()[3];cat("Generating final logical vector ... ");flush.console()}
+    .Call(Cinrange, idx <- vector("logical", length(x)), xo, ans[["starts"]], ans[["lens"]])
+    if (verbose) {cat("done in",round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+    idx
+}
+
+"%inrange%" <- function(x,y) inrange(x,y[[1L]],y[[2L]],incbounds=TRUE)
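
Editorial note (not part of the patch): a sketch of the `between`/`inrange` distinction introduced above, assuming data.table is attached:

```r
library(data.table)
between(1:10, 3, 7)   # element-wise; bounds inclusive by default
# %between% accepts vectorised lower/upper bounds (#534):
c(1, 5, 10) %between% list(c(0, 4, 9), c(2, 6, 11))
# inrange(): is each x found in ANY of the [lower, upper] ranges? (FR #707)
inrange(c(2, 8, 20), lower = c(1, 6), upper = c(5, 10))  # TRUE TRUE FALSE
```
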
diff --git a/R/bmerge.R b/R/bmerge.R
index 85f420f..2090c46 100644
--- a/R/bmerge.R
+++ b/R/bmerge.R
@@ -1,5 +1,5 @@
 
-bmerge <- function(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, verbose)
+bmerge <- function(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, mult, ops, nqgrp, nqmaxgrp, verbose)
 {
     # TO DO: rename leftcols to icols, rightcols to xcols
     # NB: io is currently just TRUE or FALSE for whether i is keyed
@@ -89,15 +89,8 @@ bmerge <- function(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, v
             set(i, j=lc, value=newval)
         }
     }
-        
-    # Now that R doesn't copy named inputs to list(), we can return these as a list()
-    # TO DO: could be allocated inside Cbmerge and returned as list from that
-    f__ = integer(nrow(i))
-    len__ = integer(nrow(i))
-    allLen1 = logical(1)
-    
     if (verbose) {last.started.at=proc.time()[3];cat("Starting bmerge ...");flush.console()}
-    .Call(Cbmerge, i, x, as.integer(leftcols), as.integer(rightcols), io<-haskey(i), xo, roll, rollends, nomatch, f__, len__, allLen1)
+    ans = .Call(Cbmerge, i, x, as.integer(leftcols), as.integer(rightcols), io<-haskey(i), xo, roll, rollends, nomatch, mult, ops, nqgrp, nqmaxgrp)
     # NB: io<-haskey(i) necessary for test 579 where the := above change the factor to character and remove i's key
     if (verbose) {cat("done in",round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
 
@@ -109,7 +102,7 @@ bmerge <- function(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, v
         if (haskey(origi))
             setattr(i, 'sorted', key(origi))
     }    
-    return(list(starts=f__, lens=len__, allLen1=allLen1))
+    return(ans)
 }
 
 
diff --git a/R/data.table.R b/R/data.table.R
index 2363882..97da35e 100644
--- a/R/data.table.R
+++ b/R/data.table.R
@@ -1,81 +1,7 @@
-deconstruct_and_eval <- function(expr, envir = parent.frame(), enclos = parent.frame()) {
-    if (!mode(expr) %in% c("call", "expression", "(")) return(expr)
-    # Fix for #774.
-    # the only place where a call is of length 1, that I can think of is 
-    # a function call with empty arguments: e.g., a[[1L]]() or a$b() or b()
-    # and in all these cases, we *don't* want to deconstruct and eval
-    # hence the "else if" statement is commented below. No existing tests 
-    # are broken in doing so. If there are other reports, will revisit.
-    if (length(expr) == 1L) {
-        if (is.expression(expr)) return (deconstruct_and_eval(expr[[1L]]))
-        # else if (is.call(expr[[1L]])) return (list(deconstruct_and_eval(expr[[1L]])))
-        else return(expr)
-    }
-    # Fix for #2496. The `{` in `DT[, {var := bla}, by=x]` is caught and removed from `j`.
-    if (expr[[1L]] == "{" & is.call(expr[[2L]])) {
-        if (identical(expr[[2L]][[1L]], quote(`:=`))) {
-            warning('Caught and removed `{` wrapped around := in j. := and `:=`(...) are 
-                defined for use in j, once only and in particular ways. See help(":=").')
-            return(deconstruct_and_eval(expr[[2L]], envir, enclos))
-        }
-    }
-    # don't evaluate eval's if the environment is specified
-    if (expr[[1L]] == quote(eval) && length(expr) < 3L) {
-        return(deconstruct_and_eval(eval(expr[[2L]], envir, enclos), envir, enclos))
-    }
-    ff <- function(m) {
-        if (is.call(m)) {
-            if (m[[1L]] == quote(eval))
-                # fix for #880. Hopefully this resolve the eval(parse(.)) issue for good.
-                if (is.call(m[[2L]]) && m[[2L]][[1L]] == quote(parse)) 
-                    deconstruct_and_eval(m, envir, enclos)
-                else deconstruct_and_eval(eval(m[[2L]], envir, enclos), envir, enclos)
-            else deconstruct_and_eval(m, envir, enclos)
-        } else m
-    }
-    lapply(expr, ff)
-}
-
-construct <- function(l) {
-    if (length(l) == 0L) return(NULL)
-    if (is.name(l)) return(l) # fix for error in cases as reported in Bug #5007: DT[, (cols) := lapply(.SD, function(x) MyValueIsTen), by=ID]
-                              # construct(l[[3L]] would give an error when l[[3L]] is MyValueIsTen if not for this line)
-    if (length(l) == 1L) {
-        if (length(l[[1L]]) == 1L & !is.call(l)) return(l[[1L]]) # so that DT[, test()] does not return just the function definition
-        else return(as.call(list(construct(l[[1L]]))))
-    }
-
-    if (identical(l[[1L]], quote(`function`))) return(as.call(list(l[[1L]], l[[2L]], construct(l[[3L]]))))
-
-    if (!is.list(l)) return(l)
 
-    as.call(setNames(lapply(l, function(m) {
-        if (length(m) == 1L) m
-        else construct(m)
-    }), names(l)))
-}
-
-replace_dot <- function(e) {
-    if (is.call(e)) {
-        if (e[[1L]] == ".")
-            e[[1L]] = quote(list)
-        le = as.list(e)
-        for (i in seq_along(le)) {
-            if (is.call(le[[i]])) {
-                if (le[[i]][[1L]] == ".") 
-                    le[[i]][[1L]] = quote(list)
-                le[[i]] = as.call(replace_dot(le[[i]]))
-            }
-        }
-        e = as.call(le)
-    }
-    e
-}
-
-dim.data.table <- function(x) {
-    if (length(x)) c(length(x[[1L]]), length(x))
-    else c(0L,0L)
-    # TO DO: consider placing "dim" as an attibute updated on inserts. Saves this 'if'.
+dim.data.table <- function(x) 
+{
+    .Call(Cdim, x)
 }
 
 .global <- new.env()  # thanks to: http://stackoverflow.com/a/12605694/403310
@@ -90,12 +16,24 @@ setPackageName("data.table",.global)
 # So even though .BY doesn't appear in this file, it should still be NULL here and exported because it's
 # defined in SDenv and can be used by users.
 
-print.data.table <- function(x,
-    topn=getOption("datatable.print.topn"),   # (5) print the top topn and bottom topn rows with '---' inbetween
-    nrows=getOption("datatable.print.nrows"), # (100) under this the whole (small) table is printed, unless topn is provided
-    row.names = TRUE, quote = FALSE, ...)
-{
-    if (.global$print!="" && address(x)==.global$print) {   # The !="" is to save address() calls and R's global cache of address strings
+mimicsAutoPrint = c("knit_print.default")
+# add maybe repr_text.default.  See https://github.com/Rdatatable/data.table/issues/933#issuecomment-220237965
+
+shouldPrint = function(x) {
+  ret = (.global$print=="" ||   # to save address() calls and adding lots of address strings to R's global cache
+         address(x)!=.global$print)
+  .global$print = ""
+  ret
+}
+
+print.data.table <- function(x, topn=getOption("datatable.print.topn"), 
+                             nrows=getOption("datatable.print.nrows"), 
+                             class=getOption("datatable.print.class"), 
+                             row.names=getOption("datatable.print.rownames"), 
+                             quote=FALSE, ...) {    # topn  - print the top topn and bottom topn rows with '---' inbetween (5)
+    # nrows - under this the whole (small) table is printed, unless topn is provided (100)
+    # class - should column class be printed underneath column name? (FALSE)
+    if (!shouldPrint(x)) {   
         #  := in [.data.table sets .global$print=address(x) to suppress the next print i.e., like <- does. See FAQ 2.22 and README item in v1.9.5
         # The issue is distinguishing "> DT" (after a previous := in a function) from "> DT[,foo:=1]". To print.data.table(), there
         # is no difference. Now from R 3.2.0 a side effect of the very welcome and requested change to avoid silent deep copy is that
@@ -105,13 +43,12 @@ print.data.table <- function(x,
         #   topenv(), inspecting next statement in caller, using clock() at C level to timeout suppression after some number of cycles
         SYS <- sys.calls()
         if (length(SYS) <= 2 ||  # "> DT" auto-print or "> print(DT)" explicit print (cannot distinguish from R 3.2.0 but that's ok)
-            ( length(SYS) > 3L &&
-              SYS[[length(SYS)-3L]][[1L]] == "knit_print.default") ) {   # knitr auto print to mimic the prompt
-            .global$print = ""
+            ( length(SYS) > 3L && is.symbol(thisSYS <- SYS[[length(SYS)-3L]][[1L]]) && 
+              as.character(thisSYS) %chin% mimicsAutoPrint ) )  {
             return(invisible())
+            # is.symbol() temp fix for #1758.
         }
     }
-    .global$print = ""
     if (!is.numeric(nrows)) nrows = 100L
     if (!is.infinite(nrows)) nrows = as.integer(nrows)
     if (nrows <= 0L) return(invisible())   # ability to turn off printing
@@ -135,9 +72,27 @@ print.data.table <- function(x,
         printdots = FALSE
     }
     toprint=format.data.table(toprint, ...)
+    # fix for #975.
+    if (any(sapply(x, function(col) "integer64" %in% class(col))) && !"package:bit64" %in% search()) {
+        warning("Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.")
+    }
     # FR #5020 - add row.names = logical argument to print.data.table
     if (isTRUE(row.names)) rownames(toprint)=paste(format(rn,right=TRUE,scientific=FALSE),":",sep="") else rownames(toprint)=rep.int("", nrow(toprint))
-    if (is.null(names(x))) colnames(toprint)=rep("NA", ncol(toprint)) # fixes bug #4934
+    if (is.null(names(x)) || all(names(x) == "")) colnames(toprint)=rep("", ncol(toprint)) # fixes bug #97 (RF#4934) and #545 (RF#5253)
+    if (isTRUE(class)) {
+      #Matching table for most common types & their abbreviations
+      class_abb = c(list = "<list>", integer = "<int>", numeric = "<num>",
+            character = "<char>", Date = "<Date>", complex = "<cplx>",
+            factor = "<fctr>", POSIXct = "<POSc>", logical = "<lgcl>",
+            IDate = "<IDat>", integer64 = "<i64>", raw = "<raw>",
+            expression = "<expr>", ordered = "<ord>")
+      classes = vapply(x, function(col) class(col)[1L], "", USE.NAMES=FALSE)
+      abbs = unname(class_abb[classes])
+      if ( length(idx <- which(is.na(abbs))) )
+        abbs[idx] = paste("<", classes[idx], ">", sep="")
+      toprint = rbind(abbs, toprint)
+      rownames(toprint)[1L] = ""
+    }
     if (printdots) {
         toprint = rbind(head(toprint,topn),"---"="",tail(toprint,topn))
         rownames(toprint) = format(rownames(toprint),justify="right")
@@ -203,7 +158,7 @@ null.data.table <-function() {
     alloc.col(ans)
 }
 
-data.table <-function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL)
+data.table <-function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFactors=FALSE)
 {
     # NOTE: It may be faster in some circumstances to create a data.table by creating a list l first, and then setattr(l,"class",c("data.table","data.frame")) at the expense of checking.
     # TO DO: rewrite data.table(), one of the oldest functions here. Many people use data.table() to convert data.frame rather than
@@ -256,6 +211,8 @@ data.table <-function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL)
         } else if (is.table(xi)) {
             x[[i]] = xi = as.data.table.table(xi, keep.rownames=keep.rownames)
             numcols[i] = length(xi)
+        } else if (is.function(xi)) {
+            x[[i]] = xi = list(xi)
         }
         nrows[i] <- NROW(xi)    # for a vector (including list() columns) returns the length
         if (numcols[i]>0L) {
@@ -343,10 +300,23 @@ data.table <-function(..., keep.rownames=FALSE, check.names=FALSE, key=NULL)
            && !any(duplicated(names(value)[names(value) %in% ckey])))
            setattr(value, "sorted", ckey)
     }
+    # FR #643, setfactor is an internal function in fread.R
+    if (isTRUE(stringsAsFactors)) setfactor(value, which(vapply(value, is.character, TRUE)), FALSE)
     alloc.col(value)  # returns a NAMED==0 object, unlike data.frame()
 }
 
+replace_dot_alias <- function(e) {
+    # we don't just simply alias .=list because i) list is a primitive (faster to iterate) and ii) we test for use
+    # of "list" in several places so it saves having to remember to write "." || "list" in those places 
+    if (is.call(e)) {
+        if (e[[1L]] == ".") e[[1L]] = quote(list)
+        for (i in seq_along(e)[-1]) if (!is.null(e[[i]])) e[[i]] = replace_dot_alias(e[[i]])
+    }
+    e
+}
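
Editorial note (not part of the patch): the effect of `replace_dot_alias` above at the user level, sketched:

```r
library(data.table)
DT = data.table(g = c(1, 1, 2), v = 1:3)
# "." in j is rewritten to list() before evaluation, so these are identical:
identical(DT[, .(s = sum(v)), by = g],
          DT[, list(s = sum(v)), by = g])   # TRUE
```
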
+
 .massagei <- function(x) {
+    # J is an alias for list in i too, but only when it is the first symbol of the call
     if (is.call(x) && as.character(x[[1L]]) %chin% c("J","."))
         x[[1L]] = quote(list)
     x
@@ -379,6 +349,8 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
         return(ans)
     }
     if (!mult %chin% c("first","last","all")) stop("mult argument can only be 'first','last' or 'all'")
+    missingroll = missing(roll)
+    missingwith = missing(with)
     if (length(roll)!=1L || is.na(roll)) stop("roll must be a single TRUE, FALSE, positive/negative integer/double including +Inf and -Inf or 'nearest'")
     if (is.character(roll)) {
         if (roll!="nearest") stop("roll is '",roll,"' (type character). Only valid character value is 'nearest'.")
@@ -402,18 +374,13 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
         # or some kind of dynamic construction that has edge case of no contents inside [...]
         return(x)
     }
-    if (!with && missing(j)) stop("j must be provided when with=FALSE")
-    if (!missing(j)) {
-        jsub = substitute(j) # Ignore allow.cartesian when `jsub` has `:=`. Search for #800 to see where we need this.
-        # deconstruct and eval everything with just one argument, then reconstruct back to a call
-        if (is.call(jsub)) jsub = construct(deconstruct_and_eval(replace_dot(jsub), parent.frame(), parent.frame()))
-    }
-    bysub=NULL
-    if (!missing(by)) bysub=substitute(by)
     if (!missing(keyby)) {
         if (!missing(by)) stop("Provide either 'by' or 'keyby' but not both")
         by=bysub=substitute(keyby)
         # Assign to 'by' so that by is no longer missing and we can proceed as if there were one by
+    } else {
+        bysub = if (missing(by)) NULL # and leave missing(by)==TRUE
+                else substitute(by)
     }
     byjoin = FALSE
     if (!missing(by)) {
@@ -423,6 +390,72 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     irows = NULL  # Meaning all rows. We avoid creating 1:nrow(x) for efficiency.
     notjoin = FALSE
     rightcols = leftcols = integer()
+    if (!with && missing(j)) stop("j must be provided when with=FALSE")
+    if (!missing(j)) {
+        jsub = replace_dot_alias(substitute(j))
+        root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
+        if (root == ":" ||
+            (root %chin% c("-","!") && is.call(jsub[[2L]]) && jsub[[2L]][[1L]]=="(" && is.call(jsub[[2L]][[2L]]) && jsub[[2L]][[2L]][[1L]]==":") ||
+            ( !length(all.vars(jsub)) &&
+              root %chin% c("","c","paste","paste0","-","!") &&
+              missing(by) )) {   # test 763. TODO: likely that !missing(by) iff with==TRUE (so, with can be removed)
+            # When no variable names occur in j, scope doesn't matter because there are no symbols to find.
+            # Auto set with=FALSE in this case so that DT[,1], DT[,2:3], DT[,"someCol"] and DT[,c("colB","colD")]
+            # work as expected.  As before, a vector will never be returned, but a single column data.table
+            # for type consistency with >1 cases. To return a single vector use DT[["someCol"]] or DT[[3]].
+            # The root==":" is to allow DT[,colC:colH] even though that contains two variable names
+            # root either "-" or "!" is for tests 1504.11 and 1504.13 (a : with a ! or - modifier root)
+            # This change won't break anything because it didn't do anything anyway; i.e. used to return the
+            # j value straight back: DT[,1] == 1, which isn't useful.  That was for consistency
+            # of learning, since it was simpler to state that j always gets eval'd within the scope of DT.
+            # We don't want to evaluate j at all in making this decision because i) evaluating itself could
+            # increment some variable and not intended to be evaluated a 2nd time later on and ii) we don't
+            # want decisions like this to depend on the data or vector lengths since that can introduce
+            # inconsistency reminiscent of drop in [.data.table that we seek to avoid.
+            #if (verbose) cat("Auto with=FALSE because j  
+            with=FALSE
+        } else if (is.name(jsub) && isTRUE(getOption("datatable.WhenJisSymbolThenCallingScope"))) {
+            # Allow future behaviour to be turned on. Current default is FALSE.
+            # Use DT[["someCol"]] or DT$someCol to fetch that column as vector, regardless of this option.
+            if (!missingwith && isTRUE(with)) {
+                # unusual edge case only when future option has been turned on
+                stop('j is a single symbol, WhenJisSymbol is turned on but with=TRUE has been passed explicitly. Please instead use DT[,"someVar"], DT[,.(someVar)] or DT[["someVar"]]')
+            } else {
+                with=FALSE
+            }
+            jsubChar = as.character(jsub)
+            if (!exists(jsubChar, where=parent.frame()) && jsubChar %chin% names(x)) {
+                # Would fail anyway with 'object 'a' not found' but give a more helpful error. Thanks to Jan's suggestion.
+                stop("The option 'datatable.WhenJisSymbolThenCallingScope' is TRUE so looking for the variable '", jsubChar, "' in calling scope but it is not found there. It is a column name though. So, most likely, please wrap with quotes (i.e. DT[,'", jsubChar, "']) to return a 1-column data.table or if you need the column as a plain vector then DT[['",jsubChar,"']] or DT$",jsubChar)
+            }
+        }
+        if (root=="{") { 
+            if (length(jsub)==2) {
+                jsub = jsub[[2L]]  # to allow {} wrapping of := e.g. [,{`:=`(...)},] [#376]
+                root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""
+            } else if (is.call(jsub[[2L]]) && jsub[[2L]][[1L]] == ":=") {
+                stop("You have wrapped := with {} which is ok but then := must be the only thing inside {}. You have something else inside {} as well. Consider placing the {} on the RHS of := instead; e.g. DT[,someCol:={tmpVar1<-...;tmpVar2<-...;tmpVar1*tmpVar2}]")
+            }
+        }
+        if (root=="eval" && !any(all.vars(jsub[[2]]) %chin% names(x))) {
+            # TODO: this && !any depends on data. Can we remove it?
+            # Grab the dynamic expression from calling scope now to give the optimizer a chance to optimize it
+            # Only when top level is eval call.  Not nested like x:=eval(...) or `:=`(x=eval(...), y=eval(...))
+            jsub = eval(jsub[[2L]], parent.frame(), parent.frame())  # this evals the symbol to return the dynamic expression
+            if (is.expression(jsub)) jsub = jsub[[1L]]    # if expression, convert it to call
+            # Note that the dynamic expression could now be := (new in v1.9.7)
+            root = if (is.call(jsub)) as.character(jsub[[1L]])[1L] else ""        
+        }
+        if (root == ":=") {
+            allow.cartesian=TRUE   # (see #800)
+            if (!missing(i) && !missing(keyby))
+                stop(":= with keyby is only possible when i is not supplied since you can't setkey on a subset of rows. Either change keyby to by or remove i")
+            if (!missingnomatch) {
+                warning("nomatch isn't relevant together with :=, ignoring nomatch")
+                nomatch=0L
+            }
+        }
+    }
     
     # To take care of duplicate column names properly (see chmatch2 function above `[data.table`) for description
     dupmatch <- function(x, y, ...) {
@@ -437,6 +470,21 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     if (!missing(i)) {
         xo = NULL
         isub = substitute(i)
+        if (identical(isub, NA)) {
+            # only possibility *isub* can be NA (logical) is the symbol NA itself; i.e. DT[NA]
+            # replace NA in this case with NA_integer_ as that's almost surely what user intended to
+            # return a single row with NA in all columns. (DT[0] returns an empty table, with correct types.)
+            # Any expression (including length 1 vectors) that evaluates to a single NA logical will
+            # however be left as NA logical since that's important for consistency to return empty in that
+            # case; e.g. DT[Col==3] where DT is 1 row and Col contains NA.
+            # Replacing the NA symbol makes DT[NA] and DT[c(1,NA)] consistent and provides
+            # an easy way to achieve a single row of NA as users expect rather than requiring them
+            # to know and change to DT[NA_integer_].
+            isub=NA_integer_
+        }
+        isnull_inames = FALSE
+        nqgrp = integer(0)  # for non-equi join
+        nqmaxgrp = 1L       # for non-equi join
         # Fixes 4994: a case where quoted expression with a "!", ex: expr = quote(!dt1); dt[eval(expr)] requires 
         # the "eval" to be checked before `as.name("!")`. Therefore interchanged.
         restore.N = remove.N = FALSE
@@ -479,11 +527,13 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             assign("x", x, order_env)
             i = eval(isub, order_env, parent.frame())             # for optimisation of 'order' to 'forder'
             # that forder returns integer(0) is taken care of internally within forder
-          } else if (is.call(isub) && getOption("datatable.auto.index") &&
-                   as.character(isub[[1L]]) %chin% c("==","%in%") &&
-                   is.name(isub[[2L]]) &&
-                   (isub2<-as.character(isub[[2L]])) %chin% names(x) && 
-                   is.null(attr(x, '.data.table.locked'))) {  # fix for #958, don't create auto index on '.SD'.
+        } else if (is.call(isub) &&
+                     getOption("datatable.use.index") && # #1422
+                     as.character(isub[[1L]]) %chin% c("==","%in%") &&
+                     is.name(isub[[2L]]) &&
+                     (isub2<-as.character(isub[[2L]])) %chin% names(x) &&
+                     (getOption("datatable.auto.index") || (isub2 %chin% indices(x))) && # `||` used to either auto.index or already have index #1422
+                     is.null(attr(x, '.data.table.locked'))) {  # fix for #958, don't create auto index on '.SD'.
             # LHS is a column name symbol
             # simplest case for now (single ==).  Later, top level may be &,|,< or >
             # TO DO: print method could print physical and secondary keys at end.
@@ -495,7 +545,8 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 if (length(RHS)!=nrow(x)) stop("RHS of == is length ",length(RHS)," which is not 1 or nrow (",nrow(x),"). For robustness, no recycling is allowed (other than of length 1 RHS). Consider %in% instead.")
                 i = x[[isub2]] == RHS    # DT[colA == colB] regular element-wise vector scan
             } else if ( (is.integer(x[[isub2]]) && is.double(RHS) && isReallyReal(RHS)) || (mode(x[[isub2]]) != mode(RHS) && !(class(x[[isub2]]) %in% c("character", "factor") && 
-                         class(RHS) %in% c("character", "factor"))) ) {
+                         class(RHS) %in% c("character", "factor"))) || 
+                         (is.factor(x[[isub2]]) && !is.factor(RHS) && mode(RHS)=="numeric") ) { # fringe case, #1361. TODO: cleaner way of doing these checks.
                     # re-direct all non-matching mode cases to base R, as data.table's binary 
                     # search based join is strict in types. #957 and #961.
                     i = if (isub[[1L]] == "==") x[[isub2]] == RHS else x[[isub2]] %in% RHS
@@ -518,7 +569,8 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     xo = get2key(x,isub2)  # Can't be any index with that col as the first one because those indexes will reorder within each group
                     if (is.null(xo)) {   # integer() would be valid and signifies o=1:.N
                         if (verbose) {cat("Creating new index '",isub2,"'\n",sep="");flush.console()}
-                        set2keyv(x,isub2)
+                        if (identical(getOption("datatable.auto.index"), FALSE)) warning("Index is being created on '",isub2,"' even though option 'datatable.auto.index' is FALSE. Please report to data.table#1422.") # shouldn't happen now, but a good guard against future changes
+                        setindexv(x,isub2)
                         xo = get2key(x,isub2)
                     } else {
                         if (verbose) {cat("Using existing index '",isub2,"'\n",sep="");flush.console()}
@@ -529,10 +581,10 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 i = as.data.table( unique(RHS) )
                 # To do: wrap isub[[3L]] with as.data.table() first before eval to save copy
                 leftcols = 1L
-                ans = bmerge(i, x, leftcols, rightcols, io<-FALSE, xo, roll=0.0, rollends=c(FALSE,FALSE), nomatch=0L, verbose=verbose)
+                ans = bmerge(i, x, leftcols, rightcols, io<-FALSE, xo, roll=0.0, rollends=c(FALSE,FALSE), nomatch=0L, mult="all", 1L, nqgrp, nqmaxgrp, verbose=verbose)
                 # No need to shallow copy i before passing to bmerge; we just created i above ourselves
                 i = if (ans$allLen1 && !identical(suppressWarnings(min(ans$starts)), 0L)) ans$starts else vecseq(ans$starts, ans$lens, NULL)
-                if (length(xo)) i = fsort(xo[i])
+                if (length(xo)) i = fsort(xo[i], internal=TRUE) else i = fsort(i, internal=TRUE) # fix for #1495
                 leftcols = rightcols = NULL  # these are used later to know whether a join was done, affects column order of result. So reset.
             }
         } else if (!is.name(isub)) i = eval(.massagei(isub), x, parent.frame())
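The branch above builds (or reuses) a secondary index so that `DT[col == val]` can binary-search a cached sort order instead of scanning the column, then maps the matched positions back to row order via `fsort(xo[i])`. A minimal Python sketch of that caching idea follows; the class and method names are hypothetical illustrations, not data.table's API:

```python
import bisect

class IndexedColumn:
    """Cache a sort order (an 'index') for a column and reuse it for == subsets."""
    def __init__(self, values):
        self.values = values
        self.order = None  # built lazily on first use, like data.table's auto index

    def which_equal(self, target):
        if self.order is None:
            # One-off O(n log n) cost; every later == subset is O(log n).
            self.order = sorted(range(len(self.values)), key=lambda i: self.values[i])
        keys = [self.values[i] for i in self.order]
        lo = bisect.bisect_left(keys, target)
        hi = bisect.bisect_right(keys, target)
        # Return original row positions in ascending order, mirroring fsort(xo[i]).
        return sorted(self.order[lo:hi])

col = IndexedColumn([5, 3, 5, 1, 5])
print(col.which_equal(5))  # -> [0, 2, 4]
```

The second call on the same column skips the sort entirely, which is the whole point of keeping the index attached to the table.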
@@ -554,28 +606,117 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 notjoin = FALSE
                 i = !i
             }
-            if (identical(i,NA)) i = NA_integer_  # see DT[NA] thread re recycling of NA logical
         }
         if (is.null(i)) return( null.data.table() )
-        if (is.character(i)) i = data.table(V1=i)   # for user convenience; e.g. DT["foo"] without needing DT[.("foo")]
-        else if (identical(class(i),"list") && length(i)==1L && is.data.frame(i[[1L]])) i = as.data.table(i[[1L]])
+        if (is.character(i)) {
+            isnull_inames = TRUE
+            i = data.table(V1=i)   # for user convenience; e.g. DT["foo"] without needing DT[.("foo")]
+        } else if (identical(class(i),"list") && length(i)==1L && is.data.frame(i[[1L]])) i = as.data.table(i[[1L]])
         else if (identical(class(i),"data.frame")) i = as.data.table(i)   # TO DO: avoid these as.data.table() and use a flag instead
-        else if (identical(class(i),"list")) i = as.data.table(i)
+        else if (identical(class(i),"list")) {
+            isnull_inames = is.null(names(i))
+            i = as.data.table(i)
+        }
         if (is.data.table(i)) {
             if (!haskey(x) && missing(on) && is.null(xo)) {
-                stop("When i is a data.table (or character vector), x must be keyed (i.e. sorted, and, marked as sorted) so data.table knows which columns to join to and take advantage of x being sorted. Call setkey(x,...) first, see ?setkey.")
+                stop("When i is a data.table (or character vector), the columns to join by must be specified either using 'on=' argument (see ?data.table) or by keying x (i.e. sorted, and, marked as sorted, see ?setkey). Keyed joins might have further speed benefits on very large data due to x being sorted in RAM.")
             }
             if (!missing(on)) {
-                if (!is.character(on))
-                    stop("'on' argument should be a named atomic vector oc column names indicating which columns in 'i' should be joined with which columns in 'x'.")
-                if (is.null(names(on))) {
-                    if (verbose)
-                        cat("names(on) = NULL. Assigning 'on' to names(on)' as well.\n")
-                    names(on) = on
+                # on = .() is now possible, #1257
+                parse_on <- function(onsub) {
+                    ops = c("==", "<=", "<", ">=", ">", "!=")
+                    pat = paste("(", ops, ")", sep = "", collapse = "|")
+                    if (is.call(onsub) && onsub[[1L]] == "eval") {
+                        onsub = eval(onsub[[2L]], parent.frame(2L), parent.frame(2L))
+                        if (is.call(onsub) && onsub[[1L]] == "eval") onsub = onsub[[2L]]
+                    }
+                    if (is.call(onsub) && as.character(onsub[[1L]]) %in% c("list", ".")) {
+                        spat = paste("[ ]+(", pat, ")[ ]+", sep="")
+                        onsub = lapply(as.list(onsub)[-1L], function(x) gsub(spat, "\\1", deparse(x, width.cutoff=500L)))
+                        onsub = as.call(c(quote(c), onsub))
+                    }
+                    on = eval(onsub, parent.frame(2L), parent.frame(2L))
+                    if (!is.character(on))
+                        stop("'on' argument should be a named atomic vector of column names indicating which columns in 'i' should be joined with which columns in 'x'.")
+                    this_op = regmatches(on, gregexpr(pat, on))
+                    idx = (vapply(this_op, length, 0L) == 0L)
+                    this_op[idx] = "=="
+                    this_op = unlist(this_op, use.names=FALSE)
+                    idx_op = match(this_op, ops, nomatch=0L)
+                    if (any(idx_op %in% c(0L, 6L)))
+                        stop("Invalid operators ", paste(this_op[idx_op %in% c(0L, 6L)], collapse=","), ". Only allowed operators are ", paste(ops[1:5], collapse=", "), ".")
+                    if (is.null(names(on))) {
+                        on[idx] = if (isnull_inames) paste(on[idx], paste("V", seq_len(sum(idx)), sep=""), sep="==") else paste(on[idx], on[idx], sep="==")
+                    } else {
+                        on[idx] = paste(names(on)[idx], on[idx], sep="==")
+                    }
+                    split = tstrsplit(on, paste("[ ]*", pat, "[ ]*", sep=""))
+                    on = setattr(split[[2L]], 'names', split[[1L]])
+                    if (length(empty_idx <- which(names(on) == "")))
+                        names(on)[empty_idx] = on[empty_idx]
+                    list(on = on, ops = idx_op)
                 }
+                on_ops = parse_on(substitute(on))
+                on = on_ops[[1L]]
+                ops = on_ops[[2L]]
+                # TODO: collect all '==' ops first to speed up Cnestedid
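`parse_on` above splits join specs such as `"x<=y"` into an x-column, an operator code, and an i-column, with bare names defaulting to equality. A rough Python sketch of that splitting step (simplified from the R regex logic; the function name and 1-based operator codes mirror `idx_op` but are otherwise assumptions):

```python
import re

OPS = ["==", "<=", "<", ">=", ">"]           # "!=" is rejected by the R code
PAT = re.compile(r"(==|<=|<|>=|>)")          # "<=" before "<", ">=" before ">"

def parse_on(specs):
    """Split each 'xcol<op>icol' spec; a bare name means equality with itself."""
    on, ops = {}, []
    for spec in specs:
        m = PAT.search(spec)
        if m is None:
            xcol = icol = spec.strip()       # "a" is treated as a == a
            op = "=="
        else:
            xcol = spec[:m.start()].strip()
            icol = spec[m.end():].strip()
            op = m.group(1)
        on[xcol] = icol
        ops.append(OPS.index(op) + 1)        # 1-based codes, like idx_op in R
    return on, ops

print(parse_on(["a", "x<=y", "p>q"]))  # -> ({'a': 'a', 'x': 'y', 'p': 'q'}, [1, 2, 5])
```

Note the alternation order in the pattern: trying `<=` before `<` is what keeps `"x<=y"` from being mis-split at the `<`.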
                 rightcols = chmatch(names(on), names(x))
+                if (length(nacols <- which(is.na(rightcols))))
+                    stop("Column(s) [", paste(names(on)[nacols], collapse=","), "] not found in x")
                 leftcols  = chmatch(unname(on), names(i))
-                xo = forderv(x, by = rightcols)
+                if (length(nacols <- which(is.na(leftcols))))
+                    stop("Column(s) [", paste(unname(on)[nacols], collapse=","), "] not found in i")
+                # figure out the columns on which to compute groups on
+                non_equi = which.first(ops != 1L) # 1 is "==" operator
+                if (!is.na(non_equi)) {
+                    # non-equi operators present.. investigate groups..
+                    if (verbose) cat("Non-equi join operators detected ... \n")
+                    if (!missingroll) stop("roll is not implemented for non-equi joins yet.")
+                    if (verbose) {last.started.at=proc.time()[3];cat("  forder took ... ");flush.console()}
+                    # TODO: could check/reuse secondary indices, but we need 'starts' attribute as well!
+                    xo = forderv(x, rightcols, retGrp=TRUE)
+                    if (verbose) {cat(round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+                    xg = attr(xo, 'starts')
+                    resetcols = head(rightcols, non_equi-1L)
+                    if (length(resetcols)) {
+                        # TODO: can we get around having to reorder twice here?
+                        # or at least reuse previous order?
+                        if (verbose) {last.started.at=proc.time()[3];cat("  Generating group lengths ... ");flush.console()}
+                        resetlen = attr(forderv(x, resetcols, retGrp=TRUE), 'starts')
+                        resetlen = .Call(Cuniqlengths, resetlen, nrow(x))
+                        if (verbose) {cat("done in",round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+                    } else resetlen = integer(0)
+                    if (verbose) {last.started.at=proc.time()[3];cat("  Generating non-equi group ids ... ");flush.console()}
+                    nqgrp = .Call(Cnestedid, x, rightcols[non_equi:length(rightcols)], xo, xg, resetlen, mult)
+                    if (verbose) {cat("done in", round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+                    if ( (nqmaxgrp <- max(nqgrp)) > 1L) { # got some non-equi join work to do
+                        if ("_nqgrp_" %in% names(x)) stop("Column name '_nqgrp_' is reserved for non-equi joins.")
+                        if (verbose) {last.started.at=proc.time()[3];cat("  Recomputing forder with non-equi ids ... ");flush.console()}
+                        set(nqx<-shallow(x), j="_nqgrp_", value=nqgrp)
+                        xo = forderv(nqx, c(ncol(nqx), rightcols))
+                        if (verbose) {cat("done in",round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+                    } else nqgrp = integer(0)
+                    if (verbose) cat("  Found", nqmaxgrp, "non-equi group(s) ...\n")
+                }
+                if (is.na(non_equi)) {
+                    # equi join. use existing key (#1825) or existing secondary index (#1439)
+                    if ( identical(head(key(x), length(on)), names(on)) ) {
+                        xo = integer(0)
+                        if (verbose) cat("on= matches existing key, using key\n")
+                    } else {
+                        if (isTRUE(getOption("datatable.use.index"))) {
+                            idxName = paste0("__", names(on), collapse="")  # e.g. "__a__b" for on=c("a","b")
+                            xo = attr(attr(x, 'index'), idxName)
+                            if (verbose && !is.null(xo)) cat("on= matches existing index, using index\n")
+                        }
+                        if (is.null(xo)) {
+                            last.started.at=proc.time()[3]
+                            xo = forderv(x, by = rightcols)
+                            if (verbose) cat("Calculated ad hoc index in", round(proc.time()[3]-last.started.at,3), "secs\n")
+                            # TODO: use setindex() instead, so it's cached for future reuse
+                        }
+                    }
+                }
             } else if (is.null(xo)) {
                 rightcols = chmatch(key(x),names(x))   # NAs here (i.e. invalid data.table) checked in bmerge()
                 leftcols = if (haskey(i))
@@ -584,17 +725,10 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     seq_len(min(length(i),length(rightcols)))
                 rightcols = head(rightcols,length(leftcols))
                 xo = integer()  ## signifies 1:.N
-                if (missing(by) && with && isTRUE(getOption("datatable.old.bywithoutby"))) {
-                    # To revert to <=v1.9.2 behaviour.  TO DO: remove option after Sep 2015
-                    warning("The data.table option 'datatable.old.bywithoutby' for grouping on join without providing `by` will be deprecated in the next release, use `by=.EACHI`.", call. = FALSE)
-                    by=bysub=as.symbol(".EACHI")
-                    byjoin=TRUE
-                    txtav = c(names(x)[-rightcols], names(i)[-leftcols])
-                    if (missing(j)) j = jsub = as.call(parse(text=paste(".(",paste(txtav, collapse=","),")",sep="")))[[1]]
-                }
+                ops = rep(1L, length(leftcols))
             }
             # Implementation for not-join along with by=.EACHI, #604
-            if (notjoin && byjoin) {
+            if (notjoin && (byjoin || mult != "all")) { # mult != "all" needed for #1571 fix
                 notjoin = FALSE
                 if (verbose) {last.started.at=proc.time()[3];cat("not-join called with 'by=.EACHI'; Replacing !i with i=setdiff(x,i) ...");flush.console()}
                 orignames = copy(names(i))
@@ -605,45 +739,71 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             }
             io = if (missing(on)) haskey(i) else identical(unname(on), head(key(i), length(on)))
             i = .shallow(i, retain.key = io)
-            ans = bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, verbose=verbose)
+            ans = bmerge(i, x, leftcols, rightcols, io, xo, roll, rollends, nomatch, mult, ops, nqgrp, nqmaxgrp, verbose=verbose)
+            # temp fix for issue spotted by Jan, test #1653.1. TODO: avoid this 
+            # 'setorder', as there's another 'setorder' in generating 'irows' below...
+            if (length(ans$indices)) setorder(setDT(ans[1:3]), indices)
+            allLen1 = ans$allLen1
+            allGrp1 = ans$allGrp1
             f__ = ans$starts
             len__ = ans$lens
-            allLen1 = ans$allLen1
+            indices__ = ans$indices
             # length of input nomatch (single 0 or NA) is 1 in both cases.
             # When no match, len__ is 0 for nomatch=0 and 1 for nomatch=NA, so len__ isn't .N
             # If using secondary key of x, f__ will refer to xo
             if (is.na(which)) {
                 w = if (notjoin) f__!=0L else is.na(f__)
-                return( if (length(xo)) fsort(xo[w]) else which(w) )
+                return( if (length(xo)) fsort(xo[w], internal=TRUE) else which(w) )
             }
             if (mult=="all") {
-                if (!byjoin) {
+                # is by=.EACHI along with non-equi join?
+                nqbyjoin = byjoin && length(ans$indices) && !allGrp1
+                if (!byjoin || nqbyjoin) {
                     # Really, `anyDuplicated` in base is AWESOME!
-                    # allow.cartesian shouldn't error if a) not-join, b) 'i' has no duplicates or c) jsub has `:=`.
+                    # allow.cartesian shouldn't error if a) not-join, b) 'i' has no duplicates
                     irows = if (allLen1) f__ else vecseq(f__,len__,
                         if( allow.cartesian || 
-                            notjoin || # #698 fix. When notjoin=TRUE, ignore allow.cartesian. Rows in answer will never be > nrow(x).
-                            !anyDuplicated(f__, incomparables = c(0L, NA_integer_)) || # #742 fix. If 'i' has no duplicates, ignore as well.
-                            (!missing(j) && all.vars(jsub, TRUE)[1L] == ":=")) # #800 fix. if jsub[1L] == ":=" ignore allow.cartesian.
-                                                                            # TODO: warn on `:=` when `i` has duplicates? 
-                           NULL 
+                            notjoin || # #698. When notjoin=TRUE, ignore allow.cartesian. Rows in answer will never be > nrow(x).
+                            !anyDuplicated(f__, incomparables = c(0L, NA_integer_)))  # #742. If 'i' has no duplicates, ignore 
+                            NULL 
                         else as.double(nrow(x)+nrow(i))) # rows in i might not match to x so old max(nrow(x),nrow(i)) wasn't enough. But this limit now only applies when there are duplicates present so the reason now for nrow(x)+nrow(i) is just to nail it down and be bigger than max(nrow(x),nrow(i)).
                     # Fix for #1092 and #1074
                     # TODO: implement better version of "any"/"all"/"which" to avoid 
                     # unnecessary construction of logical vectors
                     if (identical(nomatch, 0L) && allLen1) irows = irows[irows != 0L]
                 } else {
-                    if (length(xo) && missing(on)) stop("Cannot by=.EACHI when joining to a secondary key, yet")
+                    if (length(xo) && missing(on))
+                        stop("Cannot by=.EACHI when joining to a secondary key, yet")
                     # since f__ refers to xo later in grouping, so xo needs to be passed through to dogroups too.
-                    if (length(irows)) stop("Internal error. irows has length in by=.EACHI")
+                    if (length(irows)) 
+                        stop("Internal error. irows has length in by=.EACHI")
+                }
+                if (nqbyjoin) {
+                    irows = if (length(xo)) xo[irows] else irows
+                    xo = setorder(setDT(list(indices=rep.int(indices__, len__), irows=irows)))[["irows"]]
+                    ans = .Call(Cnqnewindices, xo, len__, indices__, max(indices__))
+                    f__ = ans[[1L]]; len__ = ans[[2L]]
+                    allLen1 = FALSE # TODO: should this always be FALSE?
+                    irows = NULL # important to reset
+                    if (anyNA(xo)) xo = xo[!is.na(xo)]
                 }
             } else {
-                irows = if (mult=="first") f__ else f__+len__-1L
-                if (identical(nomatch,0L)) irows = irows[len__>0L]  # 0s are len 0, so this removes -1 irows
-                if (length(len__)) len__ = pmin(len__,1L)  # for test 456, and consistency generally
-                                                           # the if() is for R < 2.15.1 when pmin was enhanced, see v1.8.6.
+                # turning on mult = "first"/"last" for non-equi joins again to test..
+                # if (nqmaxgrp>1L) stop("Non-equi joins aren't yet functional with mult='first' and mult='last'.")
+                # mult="first"/"last" logic moved to bmerge.c, also handles non-equi cases, #1452
+                if (!byjoin) { #1287 and #1271
+                    irows = f__ # len__ is set to 1 as well, no need for 'pmin' logic
+                    if (identical(nomatch,0L)) irows = irows[len__>0L]  # 0s are len 0, so this removes -1 irows
+                }
+                # TODO: when nomatch=NA, len__ need not be allocated / set at all for mult="first"/"last"?
+                # TODO: how about when nomatch=0L, can we avoid allocating then as well?
+            }
+            if (length(xo) && length(irows)) {
+                irows = xo[irows]   # TO DO: fsort here?
+                if (mult=="all" && !allGrp1) {
+                    irows = setorder(setDT(list(indices=rep.int(indices__, len__), irows=irows)))[["irows"]]
+                }
             }
-            if (length(xo) && length(irows)) irows = xo[irows]   # TO DO: fsort here?
         } else {
             if (!missing(on)) {
                 stop("logical error. i is not a data.table, but 'on' argument is provided.")
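The `irows` expansion above relies on `vecseq(f__, len__, ...)` to turn bmerge's per-row match starts and lengths into the full vector of matched row numbers, with `allow.cartesian` capping the blow-up. A small Python sketch of that expansion; the function name matches the R helper but the cap argument and message wording are assumptions:

```python
def vecseq(starts, lens, limit=None):
    """Expand each (start, length) pair into start, start+1, ..., start+length-1."""
    out = []
    for s, n in zip(starts, lens):
        out.extend(range(s, s + n))
    # Mirrors the allow.cartesian guard: refuse joins that explode in size.
    if limit is not None and len(out) > limit:
        raise ValueError(
            "Join results in %d rows, more than %d. Check for duplicate key "
            "values, or set allow.cartesian=TRUE." % (len(out), limit))
    return out

# starts/lengths as bmerge would return them (1-based in R; 0-based here)
print(vecseq([0, 3, 3], [2, 1, 2]))  # -> [0, 1, 3, 3, 4]
```

When every length is 1 (`allLen1`), the R code skips this expansion and uses `f__` directly.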
@@ -652,12 +812,21 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             # i is not a data.table
             if (!is.logical(i) && !is.numeric(i)) stop("i has not evaluated to logical, integer or double")
             if (is.logical(i)) {
-                if (isTRUE(i)) irows = i = NULL  # fixes #1249
-                else if (length(i)==nrow(x)) irows = i = which(i)   # e.g. DT[colA>3,which=TRUE]
-                                                               # also replacing 'i' here - to save memory, #926.
-                else irows=seq_len(nrow(x))[i]  # e.g. recycling DT[c(TRUE,FALSE),which=TRUE], for completeness 
-                # it could also be DT[!TRUE, which=TRUE] (silly cases, yes). 
-                # replaced the "else if (!isTRUE(i))" to just "else". Fixes bug report #4930 
+                if (isTRUE(i)) irows=i=NULL
+                # NULL is efficient signal to avoid creating 1:nrow(x) but still return all rows, fixes #1249
+                
+                else if (length(i)<=1L) irows=i=integer(0)
+                # FALSE, NA and empty. All should return empty data.table. The NA here will be result of expression,
+                # where for consistency of edge case #1252 all NA to be removed. If NA is a single NA symbol, it
+                # was auto converted to NA_integer_ higher up for ease of use and convenience. We definitely
+                # don't want base R behaviour where DF[NA,] returns an entire copy filled with NA everywhere.
+                
+                else if (length(i)==nrow(x)) irows=i=which(i)
+                # The which() here auto removes NA for convenience so user doesn't need to remember "!is.na() & ..."
+                # Also this which() is for consistency of DT[colA>3,which=TRUE] and which(DT[,colA>3])
+                # Assigning to 'i' here as well to save memory, #926.
+                
+                else stop("i evaluates to a logical vector length ", length(i), " but there are ", nrow(x), " rows. Recycling of logical i is no longer allowed as it hides more bugs than is worth the rare convenience. Explicitly use rep(...,length=.N) if you really need to recycle.")
             } else {
                 irows = as.integer(i)  # e.g. DT[c(1,3)] and DT[c(-1,-3)] ok but not DT[c(1,-3)] (caught as error)
                 irows = .Call(CconvertNegativeIdx, irows, nrow(x)) # simplifies logic from here on (can assume positive subscripts)
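`CconvertNegativeIdx` above normalizes subscripts so the rest of the code can assume positive row numbers: all-negative 1-based indices become the complementary positive set, and mixing signs is an error (as in `DT[c(1,-3)]`). A Python sketch of that rule; the function name is a hypothetical stand-in for the C routine:

```python
def convert_negative_idx(idx, n):
    """Turn all-negative 1-based subscripts into the complementary positive set."""
    if any(i < 0 for i in idx) and any(i > 0 for i in idx):
        raise ValueError("Can't mix positive and negative subscripts")
    if all(i >= 0 for i in idx):
        return [i for i in idx if i != 0]     # zeros select nothing
    drop = {-i for i in idx}
    return [i for i in range(1, n + 1) if i not in drop]

print(convert_negative_idx([-1, -3], 5))  # -> [2, 4, 5]
```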
@@ -684,50 +853,54 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     byval = NULL
     xnrow = nrow(x)
     xcols = xcolsAns = icols = icolsAns = integer()
+    xdotcols = FALSE
+    othervars = character(0)
     if (missing(j)) {
         # missing(by)==TRUE was already checked above before dealing with i
         if (!length(x)) return(null.data.table())
-        if (!length(leftcols)) {               
-            ansvars = names(x)
+        if (!length(leftcols)) {
+            ansvars = nx = names(x)
             jisvars = character()
             xcols = xcolsAns = seq_along(x)
         } else {
             jisvars = names(i)[-leftcols]
             tt = jisvars %chin% names(x)
             if (length(tt)) jisvars[tt] = paste("i.",jisvars[tt],sep="")
-            ansvars = make.unique(c(names(x), jisvars))
+            if (length(duprightcols <- rightcols[duplicated(rightcols)])) {
+                nx = c(names(x), names(x)[duprightcols])
+                rightcols = chmatch2(names(x)[rightcols], nx)
+                nx = make.unique(nx)
+            } else nx = names(x)
+            ansvars = make.unique(c(nx, jisvars))
             icols = c(leftcols, seq_along(i)[-leftcols])
-            icolsAns = c(rightcols, seq.int(ncol(x)+1L, length.out=ncol(i)-length(leftcols)))
+            icolsAns = c(rightcols, seq.int(length(nx)+1L, length.out=ncol(i)-length(unique(leftcols))))
             xcols = xcolsAns = seq_along(x)[-rightcols]
         }
-        ansvals = chmatch(ansvars, names(x))
+        ansvals = chmatch(ansvars, nx)
     } else {
-        # These commented lines are moved to the top for #800.
-        # jsub = substitute(j)
-        # # deconstruct and eval everything with just one argument, then reconstruct back to a call
-        # if (is.call(jsub))
-        #     jsub = construct(deconstruct_and_eval(jsub, parent.frame(), parent.frame()))
-        if (is.null(jsub)) return(NULL)
+        if (is.data.table(i)) {
+            idotprefix = paste0("i.", names(i))
+            xdotprefix = paste0("x.", names(x))
+        } else idotprefix = xdotprefix = character(0)
 
-        if (is.call(jsub) && jsub[[1L]]==as.name(":=")) {
-            if (identical(irows, integer())) {  # short circuit do-nothing, don't do further checks on .SDcols for example
-                .global$print = address(x)
-                return(invisible(x))          # irows=NULL means all rows at this stage
-            }
-            if (!with) {
-                if (is.null(names(jsub)) && is.name(jsub[[2L]])) {
-                    # TO DO: uncomment these warnings in next release. Later, make both errors.
-                    ## warning("with=FALSE is deprecated when using :=. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples.")
-                    jsub[[2L]] = eval(jsub[[2L]], parent.frame(), parent.frame()) 
-                } else {
-                    ## warning("with=FALSE ignored, it isn't needed when using :=. See ?':=' for examples.")
-                }
-                with = TRUE
+        # j was substituted before dealing with i so that := can set allow.cartesian=FALSE (#800) (used above in i logic)
+        if (is.null(jsub)) return(NULL)
+        
+        if (!with && is.call(jsub) && jsub[[1L]]==":=") {
+            # TODO: make these both errors (or single long error in both cases) in next release.
+            # i.e. using with=FALSE together with := at all will become an error. Eventually with will be removed.
+            if (is.null(names(jsub)) && is.name(jsub[[2L]])) {
+                warning("with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please wrap the LHS of := with parentheses; e.g., DT[,(myVar):=sum(b),by=a] to assign to column name(s) held in variable myVar. See ?':=' for other examples. As warned in 2014, this is now a warning.")
+                jsub[[2L]] = eval(jsub[[2L]], parent.frame(), parent.frame()) 
+            } else {
+                warning("with=FALSE ignored, it isn't needed when using :=. See ?':=' for examples.")
             }
+            with = TRUE
         }
+        
         if (!with) {
             # missing(by)==TRUE was already checked above before dealing with i
-            if (is.call(jsub) && deparse(jsub[[1]], 500L) %in% c("!", "-")) {
+            if (is.call(jsub) && deparse(jsub[[1]], 500L) %in% c("!", "-")) {  # TODO is deparse avoidable here?
                 notj = TRUE
                 jsub = jsub[[2L]]
             } else notj = FALSE
@@ -760,14 +933,15 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     ansvals = chmatch(ansvars, names(x))
                 }
             } else if (is.numeric(j)) {
-                if (all(j == 0L)) return (null.data.table())
-                if (any(abs(j) > ncol(x) | j==0L)) stop("j out of bounds")
-                if (any(j<0L) && any(j>0L)) stop("j mixes positive and negative")
+                j = as.integer(j)
+                if (any(w<-(j>ncol(x)))) stop("Item ",which.first(w)," of j is ",j[which.first(w)]," which is outside the column number range [1,ncol=", ncol(x),"]")
+                j = j[j!=0L]
+                if (!length(j)) return(null.data.table())
+                if (any(j<0L) && any(j>0L)) stop("j mixes positives and negatives")
                 if (any(j<0L)) j = seq_len(ncol(x))[j]
-                ansvars = names(x)[ if (notj) -j else j ]  # DT[,!"columntoexclude",with=FALSE], if a copy is needed, rather than :=NULL
-                # DT[, c(1,3), with=FALSE] should clearly provide both 'x' columns
-                ansvals = if (notj) setdiff(seq_along(x), as.integer(j)) else as.integer(j)
-            }
+                ansvars = names(x)[ if (notj) -j else j ]  # DT[,!"columntoexclude",with=FALSE] if a copy is needed, rather than :=NULL
+                ansvals = if (notj) setdiff(seq_along(x), j) else j
+            } else stop("When with=FALSE, j-argument should be of type logical/character/integer indicating the columns to select.") # fix for #1440.
             if (!length(ansvals)) return(null.data.table())
         } else {   # with=TRUE and byjoin could be TRUE
             bynames = NULL
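The numeric-j branch above (column numbers with `with=FALSE`) coerces j to integer, rejects out-of-range values, drops zeros, forbids mixed signs, and resolves negatives to the complementary column set. A Python sketch of the same validation sequence; the helper name is hypothetical and error wording is paraphrased:

```python
def resolve_j(j, ncol):
    """Validate numeric j as in DT[, j, with=FALSE]: drop 0s, no mixed signs."""
    j = [int(v) for v in j]
    for pos, v in enumerate(j):
        if v > ncol:
            raise ValueError("Item %d of j is %d, outside the column number "
                             "range [1, ncol=%d]" % (pos + 1, v, ncol))
    j = [v for v in j if v != 0]
    if not j:
        return []                             # like returning null.data.table()
    if any(v < 0 for v in j) and any(v > 0 for v in j):
        raise ValueError("j mixes positives and negatives")
    if any(v < 0 for v in j):
        j = sorted(set(range(1, ncol + 1)) - {-v for v in j})
    return j

print(resolve_j([0, 2, 4], 5))  # -> [2, 4]
print(resolve_j([-1, -3], 5))   # -> [2, 4, 5]
```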
@@ -788,7 +962,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 }
                 if (length(bysubl) && identical(bysubl[[1L]],quote(eval))) {    # TO DO: or by=..()
                     bysub = eval(bysubl[[2]], parent.frame(), parent.frame())
-                    bysub = replace_dot(bysub) # fix for #1298
+                    bysub = replace_dot_alias(bysub) # fix for #1298
                     if (is.expression(bysub)) bysub=bysub[[1L]]
                     bysubl = as.list.default(bysub)
                 } else if (is.call(bysub) && as.character(bysub[[1L]]) %chin% c("c","key","names", "intersect", "setdiff")) {
@@ -801,7 +975,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     tt = eval(bysub, parent.frame(), parent.frame())
                     if (!is.character(tt)) stop("by=c(...), key(...) or names(...) must evaluate to 'character'")
                     bysub=tt
-                } else if (is.call(bysub) && !as.character(bysub[[1L]]) %chin% c("list", "as.list", "{", ".")) {
+                } else if (is.call(bysub) && !as.character(bysub[[1L]]) %chin% c("list", "as.list", "{", ".", ":")) {
                     # potential use of function, ex: by=month(date). catch it and wrap with "(", because we need to set "bysameorder" to FALSE as we don't know if the function will return ordered results just because "date" is ordered. Fixes #2670.
                     bysub = as.call(c(as.name('('), list(bysub)))
                     bysubl = as.list.default(bysub)
@@ -817,13 +991,16 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     bysub = parse(text=paste("list(",paste(bysub,collapse=","),")",sep=""))[[1L]]
                     bysubl = as.list.default(bysub)
                 }
-                allbyvars = intersect(all.vars(construct(bysubl), FALSE),names(x))
+                allbyvars = intersect(all.vars(bysub),names(x))
                 
                 orderedirows = .Call(CisOrderedSubset, irows, nrow(x))  # TRUE when irows is NULL (i.e. no i clause)
                 # orderedirows = is.sorted(f__)
                 bysameorder = orderedirows && haskey(x) && all(sapply(bysubl,is.name)) && length(allbyvars) && identical(allbyvars,head(key(x),length(allbyvars)))
                 if (is.null(irows))
-                    byval = eval(bysub, x, parent.frame())
+                    if (is.call(bysub) && length(bysub) == 3L && bysub[[1L]] == ":" && is.name(bysub[[2L]]) && is.name(bysub[[3L]])) {
+                        byval = eval(bysub, setattr(as.list(seq_along(x)), 'names', names(x)), parent.frame())
+                        byval = as.list(x)[byval]
+                    } else byval = eval(bysub, x, parent.frame())
                 else {
                     if (!is.integer(irows)) stop("Internal error: irows isn't integer")  # length 0 when i returns no rows
                     # Passing irows as i to x[] below has been troublesome in a rare edge case.
@@ -840,7 +1017,10 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                         if (verbose) cat("i clause present but columns used in by not detected. Having to subset all columns before evaluating 'by': '",deparse(by),"'\n",sep="")
                         xss = x[irows,nomatch=nomatch,mult=mult,roll=roll,rollends=rollends]
                     }
-                    byval = eval(bysub, xss, parent.frame())
+                    if (is.call(bysub) && length(bysub) == 3L && bysub[[1L]] == ":") {
+                        byval = eval(bysub, setattr(as.list(seq_along(xss)), 'names', names(xss)), parent.frame())
+                        byval = as.list(xss)[byval]
+                    } else byval = eval(bysub, xss, parent.frame())
                     xnrow = nrow(xss)
                     # TO DO: pass xss (x subset) through into dogroups. Still need irows there (for :=), but more condense
                     # and contiguous to use xss to form .SD in dogroups than going via irows
@@ -896,6 +1076,10 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                             # if user doesn't like this inferred name, user has to use by=list() to name the column
                         }
                     }
+                    # Fix for #1334
+                    if (any(duplicated(bynames))) {
+                        bynames = make.unique(bynames)
+                    }
                 }
                 setattr(byval, "names", bynames)  # byval is just a list not a data.table hence setattr not setnames
             }
@@ -923,7 +1107,9 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 } # else empty list is needed for test 468: adding an empty list column
             } # else maybe a call to transform or something which returns a list.
             av = all.vars(jsub,TRUE)  # TRUE fixes bug #1294 which didn't see b in j=fns[[b]](c)
-            if (".SD" %chin% av) {
+            use.I = ".I" %chin% av
+            if (any(c(".SD","eval","get","mget") %chin% av)) {
                 if (missing(.SDcols)) {
                     # here we need to use 'dupdiff' instead of 'setdiff'. Ex: setdiff(c("x", "x"), NULL) will give 'x'.
                     ansvars = dupdiff(names(x),union(bynames,allbyvars))   # TO DO: allbyvars here for vars used by 'by'. Document.
@@ -966,11 +1152,20 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                         # dups = FALSE here. DT[, .SD, .SDcols=c("x", "x")] again doesn't really help with which 'x' to keep (and if '-' which x to remove)
                         ansvals = chmatch(ansvars, names(x))
                     }
-                    # .SDcols might include grouping columns if users wants that, but normally we expect user not to include them in .SDcols
                 }
+                # fix for long standing FR/bug, #495 and #484
+                allcols = c(names(x), xdotprefix, names(i), idotprefix)
+                if ( length(othervars <- setdiff(intersect(av, allcols), c(bynames, ansvars))) ) {
+                    # we have a situation like DT[, c(sum(V1), lapply(.SD, mean)), by=., .SDcols=...] or
+                    # DT[, lapply(.SD, function(x) x * v1), by=, .SDcols=...] etc.
+                    ansvars = union(ansvars, othervars)
+                    ansvals = chmatch(ansvars, names(x))
+                }
+                # .SDcols might include grouping columns if the user wants that, but normally we expect the user not to include them in .SDcols
             } else {
                 if (!missing(.SDcols)) warning("This j doesn't use .SD but .SDcols has been supplied. Ignoring .SDcols. See ?data.table.")
-                ansvars = setdiff(intersect(av,c(names(x),names(i),paste("i.",names(i),sep=""))), bynames)
+                allcols = c(names(x), xdotprefix, names(i), idotprefix)
+                ansvars = setdiff(intersect(av,allcols), bynames)
                 if (verbose) cat("Detected that j uses these columns:",if (!length(ansvars)) "<none>" else paste(ansvars,collapse=","),"\n")
                 # using a few named columns will be faster
                 # Consider:   DT[,max(diff(date)),by=list(month=month(date))]
@@ -979,15 +1174,20 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 ansvals = chmatch(ansvars, names(x))
             }
             # if (!length(ansvars)) Leave ansvars empty. Important for test 607.
+            
+            # TODO remove as (m)get is now folded in above.
             # added 'mget' - fix for #994
             if (any(c("get", "mget") %chin% av)) {
                 if (verbose) {
-                    cat("'(m)get' found in j. ansvars being set to all columns. Use .SDcols or eval(macro) instead. Both will detect the columns used which is important for efficiency.\nOld:", paste(ansvars,collapse=","),"\n")
+                    cat("'(m)get' found in j. ansvars being set to all columns. Use .SDcols or a single j=eval(macro) instead. Both will detect the columns used which is important for efficiency.\nOld:", paste(ansvars,collapse=","),"\n")
                     # get('varname') is too difficult to detect which columns are used in general
                     # eval(macro) column names are detected via the  if jsub[[1]]==eval switch earlier above.
                 }
-                ansvars = setdiff(c(names(x), if (is.data.table(i)) c(names(i), paste("i.", names(i), sep=""))),bynames) # fix for bug #5443
+                if (length(ansvars)) othervars = ansvars # #1744 fix
+                allcols = c(names(x), xdotprefix, names(i), idotprefix)
+                ansvars = setdiff(allcols,bynames) # fix for bug #5443
                 ansvals = chmatch(ansvars, names(x))
+                if (length(othervars)) othervars = setdiff(ansvars, othervars) # #1744 fix
                 if (verbose) cat("New:",paste(ansvars,collapse=","),"\n")
             }
 
@@ -1000,22 +1200,6 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 # Suppress print when returns ok not on error, bug #2376. Thanks to: http://stackoverflow.com/a/13606880/403310
                 # All appropriate returns following this point are wrapped; i.e. return(suppPrint(x)).
                 
-                # FR #4996 - verbose message and return when a join matches nothing with `:=` in j
-                if (byjoin & !notjoin) {
-                    # Note: !notjoin is here only until the notjoin is implemented as a "proper" byjoin
-                    if ((all(is.na(f__)) | (all(f__ == 0L) & nomatch == 0L))) {
-                        if (verbose) cat("No rows pass i clause so quitting := early with no changes made.\n")
-                        return(suppPrint(x))
-                    }
-                }
-                if (!is.null(irows)) {
-                    if (!length(irows)) {
-                        if (verbose) cat("No rows pass i clause so quitting := early with no changes made.\n")
-                        return(suppPrint(x))
-                    } else
-                        if (!with) irows <- irows[!is.na(irows)] # fixes 2445. TO DO: added a message if(verbose) or warning?
-                        if (!missing(keyby)) stop("When i is present, keyby := on a subset of rows doesn't make sense. Either change keyby to by, or remove i")
-                }
                 if (is.null(names(jsub))) {
                     # regular LHS:=RHS usage, or `:=`(...) with no named arguments (an error)
                     # `:=`(LHS,RHS) is valid though, but more because can't see how to detect that, than desire
@@ -1049,6 +1233,23 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     # updates by reference to existing columns
                     cols = as.integer(m)
                     newnames=NULL
+                    if (identical(irows, integer())) {
+                        # Empty integer() means no rows e.g. logical i with only FALSE and NA
+                        # got converted to empty integer() by the which() above
+                        # Short circuit and do nothing since the columns already exist. If some don't
+                        # exist then, for consistency with cases where irows is non-empty, we need to create
+                        # them of the right type and populate them with NA. That happens via the regular
+                        # alternative branches below, to cover #759.
+                        # We need this short circuit purely for convenience. Otherwise users might have to
+                        # fix errors in their RHS on empty edge cases even when the result wouldn't be
+                        # used anyway (so having to fix it would be annoying).
+                        if (verbose) {
+                            cat("No rows match i. No new columns to add so not evaluating RHS of :=\n")
+                            cat("Assigning to 0 row subset of",nrow(x),"rows\n")
+                        }
+                        .global$print = address(x)
+                        return(invisible(x))
+                    }
                 } else {
                     # Adding new column(s). TO DO: move after the first eval in case the jsub has an error.
                     newnames=setdiff(lhs,names(x))
@@ -1057,7 +1258,9 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                     if ((ok<-selfrefok(x,verbose))==0L)   # ok==0 so no warning when loaded from disk (-1) [-1 considered TRUE by R]
                         warning("Invalid .internal.selfref detected and fixed by taking a (shallow) copy of the data.table so that := can add this new column by reference. At an earlier point, this data.table has been copied by R (or been created manually using structure() or similar). Avoid key<-, names<- and attr<- which in R currently (and oddly) may copy the whole data.table. Use set* syntax instead to avoid copying: ?set, ?setnames and ?setattr. Also, in R<=v3.0.2, list(DT1,DT2) cop [...]
                     if ((ok<1L) || (truelength(x) < ncol(x)+length(newnames))) {
-                        n = max(ncol(x)+100, ncol(x)+2*length(newnames))
+                        DT = x  # in case getOption contains "ncol(DT)" as it used to.  TODO: warn and then remove
+                        n = length(newnames) + eval(getOption("datatable.alloccol"))  # TODO: warn about expressions and then drop the eval()
+                        # i.e. reallocate at the size as if the new columns were added followed by alloc.col().
                         name = substitute(x)
                         if (is.name(name) && ok && verbose) { # && NAMED(x)>0 (TO DO)    # ok here includes -1 (loaded from disk)
                             cat("Growing vector of column pointers from truelength ",truelength(x)," to ",n,". A shallow copy has been taken, see ?alloc.col. Only a potential issue if two variables point to the same data (we can't yet detect that well) and if not you can safely ignore this. To avoid this message you could alloc.col() first, deep copy first using copy(), wrap with suppressWarnings() or increase the 'datatable.alloccol' option.\n")
@@ -1096,7 +1299,16 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
         
         if (length(ansvars)) {
             w = ansvals
-            if (length(rightcols) && missing(by)) w[ w %in% rightcols ] = NA
+            if (length(rightcols) && missing(by)) {
+                w[ w %in% rightcols ] = NA
+            }
+            # patch for #1615. Allow 'x.' syntax. Only useful during join op when x's join col needs to be used.
+            # Note that I specifically have not implemented x[y, aa, on=c(aa="bb")] to refer to x's join column 
+            # as well because x[i, col] == x[i][, col] would no longer be TRUE.
+            if ( any(xdotprefixvals <- ansvars %in% xdotprefix)) {
+                w[xdotprefixvals] = chmatch(ansvars[xdotprefixvals], xdotprefix)
+                xdotcols = TRUE
+            }
             if (!any(wna <- is.na(w))) {
                 xcols = w
                 xcolsAns = seq_along(ansvars)
@@ -1125,8 +1337,13 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     }  # end of  if !missing(j)
     
     SDenv = new.env(parent=parent.frame())
-    # hash=TRUE (the default) does seem better as expected using e.g. test 645.  TO DO experiment with 'size' argument
+    # taking care of warnings for posixlt type, #646
+    SDenv$strptime <- function(x, ...) {
+        warning("POSIXlt column type detected and converted to POSIXct. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.")
+        as.POSIXct(base::strptime(x, ...))
+    }
 
+    # hash=TRUE (the default) does seem better as expected using e.g. test 645.  TO DO experiment with 'size' argument
     if (missing(by) || (!byjoin && !length(byval))) {
         # No grouping: 'by' = missing | NULL | character() | "" | list()
         # Considered passing a one-group to dogroups but it doesn't do the recycling of i within group, that's done here
@@ -1134,7 +1351,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             # TO DO: port more of this to C
             ans = vector("list", length(ansvars))
             if (length(i) && length(icols)) {
-                if (allLen1 && (is.na(nomatch) || !any(f__==0L))) {   # nomatch=0 should drop rows in i that have no match
+                if (allLen1 && allGrp1 && (is.na(nomatch) || !any(f__==0L))) {   # nomatch=0 should drop rows in i that have no match
                     for (s in seq_along(icols)) {
                         target = icolsAns[s]
                         source = icols[s]
@@ -1142,7 +1359,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                         if (address(ans[[target]]) == address(i[[source]])) ans[[target]] = copy(ans[[target]])
                     }
                 } else {
-                    ii = rep.int(seq_len(nrow(i)),len__)
+                    ii = rep.int(if (allGrp1) seq_len(nrow(i)) else indices__, len__)
                     for (s in seq_along(icols)) {
                         target = icolsAns[s]
                         source = icols[s]
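When matches are not all length 1, each row of i must be repeated once per matching row of x before the i columns are subset; that is what the `rep.int(seq_len(nrow(i)), len__)` expansion above computes. A base-R sketch with hypothetical match lengths:

```r
# hypothetical per-row match lengths (len__): row 1 of i matched 2 rows of x,
# row 2 matched 1 row, row 3 matched 3 rows
len__ <- c(2L, 1L, 3L)
ii <- rep.int(seq_len(3L), len__)
print(ii)  # 1 1 2 3 3 3  -- index used to expand i's columns to match lengths
```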
@@ -1178,7 +1395,12 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             if (haskey(x)) {
                 keylen = which.first(!key(x) %chin% ansvars)-1L
                 if (is.na(keylen)) keylen = length(key(x))
-                if (keylen > length(rightcols) && !.Call(CisOrderedSubset, irows, nrow(x))) keylen = length(rightcols)
+                len = length(rightcols)
+                # fix for #1268, #1704, #1766 and #1823
+                chk = if (len && !missing(on)) !identical(head(key(x), len), names(on)) else FALSE
+                if ( (keylen>len || chk) && !.Call(CisOrderedSubset, irows, nrow(x))) {
+                    keylen = if (!chk) len else 0L # fix for #1268
+                }
                 if (keylen && ((is.data.table(i) && haskey(i)) || is.logical(i) || (.Call(CisOrderedSubset, irows, nrow(x)) && ((roll == FALSE) || length(irows) == 1L)))) # see #1010. don't set key when i has no key, but irows is ordered and roll != FALSE
                     setattr(ans,"sorted",head(key(x),keylen))
             }
@@ -1187,25 +1409,28 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
             
             if (!with || missing(j)) return(alloc.col(ans))
 
-            SDenv$.SD = ans
+            SDenv$.SDall = ans
+            SDenv$.SD = if (!length(othervars)) SDenv$.SDall else shallow(SDenv$.SDall, setdiff(ansvars, othervars))
             SDenv$.N = nrow(SDenv$.SD)
 
         } else {
-            SDenv$.SD = null.data.table()   # no columns used by j so .SD can be empty. Only needs to exist so that we can rely on it being there when locking it below for example. If .SD were used by j, of course then xvars would be the columns and we wouldn't be in this leaf.
+            SDenv$.SDall = SDenv$.SD = null.data.table()   # no columns used by j so .SD can be empty. Only needs to exist so that we can rely on it being there when locking it below for example. If .SD were used by j, of course then xvars would be the columns and we wouldn't be in this leaf.
             SDenv$.N = if (is.null(irows)) nrow(x) else length(irows) * !identical(suppressWarnings(max(irows)), 0L)
             # Fix for #963.
             # When irows is integer(0), length(irows) = 0 will result in 0 (as expected).
             # Binary search can return all 0 irows when none of the input matches. Instead of doing all(irows==0L) (previous method), which has to allocate a logical vector the size of irows, we can make use of 'max'. If max is 0, we return 0. The condition where only some irows > 0 won't occur.
         }
         # Temp fix for #921. Allocate `.I` only if j-expression uses it.
-        SDenv$.I = if (!missing(j) && ".I" %chin% av) seq_len(SDenv$.N) else 0L
+        SDenv$.I = if (!missing(j) && use.I) seq_len(SDenv$.N) else 0L
         SDenv$.GRP = 1L
         setattr(SDenv$.SD,".data.table.locked",TRUE)   # used to stop := modifying .SD via j=f(.SD), bug#1727. The more common case of j=.SD[,subcol:=1] was already caught when jsub is inspected for :=.
+        setattr(SDenv$.SDall,".data.table.locked",TRUE)
         lockBinding(".SD",SDenv)
+        lockBinding(".SDall",SDenv)
         lockBinding(".N",SDenv)
         lockBinding(".I",SDenv)
         lockBinding(".GRP",SDenv)
-        for (ii in ansvars) assign(ii, SDenv$.SD[[ii]], SDenv)
+        for (ii in ansvars) assign(ii, SDenv$.SDall[[ii]], SDenv)
         # Since .SD is inside SDenv, alongside its columns as variables, R finds .SD symbol more quickly, if used.
         # There isn't a copy of the columns here, the xvar symbols point to the SD columns (copy-on-write).
 
@@ -1214,22 +1439,28 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
         # copy 'jval' when required
         # More speedup - only check + copy if irows is NULL
         if (is.null(irows)) {
-            if (is.atomic(jval)) {
+            if (!is.list(jval)) { # performance improvement when i-arg is S4, but not list, #1438, Thanks @DCEmilberg.
                 jcpy = address(jval) %in% sapply(SDenv$.SD, address) # %chin% errors when RHS is list()
                 if (jcpy) jval = copy(jval)
             } else if (address(jval) == address(SDenv$.SD)) {
                 jval = copy(jval)
             } else if ( length(jcpy <- which(sapply(jval, address) %in% sapply(SDenv, address))) ) {
                 for (jidx in jcpy) jval[[jidx]] = copy(jval[[jidx]])
+            } else if (is.call(jsub) && jsub[[1L]] == "get" && is.list(jval)) {
+                jval = copy(jval) # fix for #1212
+            }
+        } else {
+            if (is.data.table(jval)) {
+                setattr(jval, '.data.table.locked', NULL) # fix for #1341
+                if (!truelength(jval)) alloc.col(jval)
             }
         }
-
-        if (!is.null(lhs)) {   # *** TO DO ***: use set() here now that it can add new column(s) and remove newnames and alloc logic above
-            if (verbose) cat("Assigning to ",if (is.null(irows)) "all " else paste(length(irows),"row subset of "), nrow(x)," rows\n",sep="")
+        if (!is.null(lhs)) {
+            # TODO?: use set() here now that it can add new columns. Then remove newnames and alloc logic above.
             .Call(Cassign,x,irows,cols,newnames,jval,verbose)
             return(suppPrint(x))
         }
-        if ((is.call(jsub) && is.list(jval) && !is.object(jval)) || !missing(by)) {
+        if ((is.call(jsub) && is.list(jval) && jsub[[1L]] != "get" && !is.object(jval)) || !missing(by)) {
             # is.call: selecting from a list column should return list
             # is.object: for test 168 and 168.1 (S4 object result from ggplot2::qplot). Just plain list results should result in data.table
 
@@ -1244,9 +1475,10 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 if (is.null(jvnames)) jvnames=names(jval)
                 # avoid copy if all vectors are already of same lengths, use setDT
                 lenjval = vapply(jval, length, 0L)
-                if (any(lenjval != lenjval[1L]))
+                if (any(lenjval != lenjval[1L])) {
                     jval = as.data.table.list(jval)   # does the vector expansion to create equal length vectors
-                else setDT(jval)
+                    jvnames = jvnames[lenjval != 0L]  # fix for #1477
+                } else setDT(jval)
             }
             if (is.null(jvnames)) jvnames = character(length(jval)-length(bynames))
             ww = which(jvnames=="")
@@ -1279,7 +1511,7 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     assign("print", function(x,...){base::print(x,...);NULL}, SDenv)
     # Now ggplot2 returns data from print, we need a way to throw it away otherwise j accumulates the result
 
-    SDenv$.SD = null.data.table()  # e.g. test 607. Grouping still proceeds even though no .SD e.g. grouping key only tables, or where j consists of .N only
+    SDenv$.SDall = SDenv$.SD = null.data.table()  # e.g. test 607. Grouping still proceeds even though no .SD e.g. grouping key only tables, or where j consists of .N only
    SDenv$.N = as.integer(0)     # not 0L for the reason on the next line:
     SDenv$.GRP = as.integer(1)   # oddly using 1L doesn't work reliably here! Possible R bug? TO DO: create reproducible example and report. To reproduce change to 1L and run test.data.table, test 780 fails. The assign seems ineffective and a previous value for .GRP from a previous test is retained, despite just creating a new SDenv.
 
@@ -1309,52 +1541,84 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     } else {
         # Find the groups, using 'byval' ...
         if (missing(by)) stop("Internal error, by is missing")
-        if (verbose) {last.started.at=proc.time()[3];cat("Finding groups (bysameorder=",bysameorder,") ... ",sep="");flush.console()}
+        
         if (length(byval) && length(byval[[1]])) {
             if (!bysameorder) {
-                o__ = forderv(byval, sort=FALSE, retGrp=TRUE)   # returns integer() (not NULL) if already ordered, to save 1:xnrow for efficiency
+                if (verbose) {last.started.at=proc.time()[3];cat("Finding groups using forderv ... ");flush.console()}
+                o__ = forderv(byval, sort=!missing(keyby), retGrp=TRUE)
+                # The sort= argument is called sortStr at C level. It's just about saving the sort of unique strings at
+                # C level for efficiency (cgroup vs csort) when by= not keyby=. All other types are always sorted. Getting
+                # original order below is the part that retains original order. Always passing sort=TRUE here wouldn't change any
+                # result at all (tested and confirmed to double check); it would just make by= slower when there's a large
+                # number of unique strings. It must be TRUE when keyby= though, since the key is just marked afterwards.
+                # forderv() returns empty integer() if already ordered to save allocating 1:xnrow
                 bysameorder = orderedirows && !length(o__)
+                if (verbose) {
+                    cat(round(proc.time()[3]-last.started.at, 3), "sec\n")
+                    last.started.at=proc.time()[3]
+                    cat("Finding group sizes from the positions (can be avoided to save RAM) ... ")
+                    flush.console()  # for windows
+                }
                 f__ = attr(o__, "starts")
                 len__ = uniqlengths(f__, xnrow)
-                if (!bysameorder) {    # TO DO: lower this into forder.c
+                if (verbose) { cat(round(proc.time()[3]-last.started.at, 3), "sec\n");flush.console()}
+                if (!bysameorder && missing(keyby)) {
+                    # TO DO: lower this into forder.c
+                    if (verbose) {last.started.at=proc.time()[3];cat("Getting back original order ... ");flush.console()}
                     firstofeachgroup = o__[f__]    
                     if (length(origorder <- forderv(firstofeachgroup))) {
                         f__ = f__[origorder]
                         len__ = len__[origorder]
                     }
+                    if (verbose) {cat(round(proc.time()[3]-last.started.at, 3), "sec\n")}
                 }
-                if (!orderedirows && !length(o__)) o__ = 1:xnrow  # temp fix.  TO DO: revist orderedirows
+                if (!orderedirows && !length(o__)) o__ = 1:xnrow  # temp fix.  TODO: revisit orderedirows
             } else {
+                if (verbose) {last.started.at=proc.time()[3];cat("Finding groups using uniqlist ... ");flush.console()}
                 f__ = uniqlist(byval)
+                if (verbose) {
+                    cat(round(proc.time()[3]-last.started.at, 3), "sec\n")
+                    last.started.at=proc.time()[3]
+                    cat("Finding group sizes from the positions (can be avoided to save RAM) ... ")
+                    flush.console()  # for windows
+                }
                 len__ = uniqlengths(f__, xnrow)
                 # TO DO: combine uniqlist and uniquelengths into one call.  Or, just set len__ to NULL when dogroups infers that.
+                if (verbose) { cat(round(proc.time()[3]-last.started.at, 3), "sec\n");flush.console() }
             }
         } else {
             f__=NULL
             len__=0L
             bysameorder=TRUE   # for test 724
         }
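`uniqlengths()` above derives each group's size from the vector of group start positions plus the total row count. A base-R sketch of the same arithmetic, using hypothetical start positions:

```r
# group start positions within the (ordered) rows, plus total row count
xnrow <- 10L
f__ <- c(1L, 4L, 8L)               # hypothetical starts of 3 groups
len__ <- diff(c(f__, xnrow + 1L))  # each length = next start minus this start
print(len__)  # 3 4 3
```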
-        if (verbose) {cat("done in ",round(proc.time()[3]-last.started.at,3),"secs. bysameorder=",bysameorder," and o__ is length ",length(o__),"\n",sep="");flush.console}
         # TO DO: allow secondary keys to be stored, then we see if our by matches one, if so use it, and no need to sort again. TO DO: document multiple keys.
     }
     alloc = if (length(len__)) seq_len(max(len__)) else 0L
     SDenv$.I = alloc
     if (length(xcols)) {
-        SDenv$.SD = .Call(CsubsetDT,x,alloc,xcols)    # i.e. x[alloc, xcols, with=FALSE] but without recursive overhead
-        # Must not shallow copy here. This is the allocation for the largest group. Since i=alloc is passed in here, it won't shallow copy, even in future. Only DT[,xvars,with=FALSE] might ever shallow copy automatically.
+        #  TODO add: if (length(alloc)==nrow(x)) stop("There is no need to deep copy x in this case")
+        SDenv$.SDall = .Call(CsubsetDT,x,alloc,xcols)    # must be deep copy when largest group is a subset
+        if (xdotcols) setattr(SDenv$.SDall, 'names', ansvars[seq_along(xcols)]) # now that we allow 'x.' prefix in 'j'
+        SDenv$.SD = if (!length(othervars)) SDenv$.SDall else shallow(SDenv$.SDall, setdiff(ansvars, othervars))
+    }
+    if (nrow(SDenv$.SDall)==0L) {
+        setattr(SDenv$.SDall,"row.names",c(NA_integer_,0L))
+        setattr(SDenv$.SD,"row.names",c(NA_integer_,0L))
     }
-    if (nrow(SDenv$.SD)==0L) setattr(SDenv$.SD,"row.names",c(NA_integer_,0L))
     # .set_row_names() basically other than not integer() for 0 length, otherwise dogroups has no [1] to modify to -.N
     setattr(SDenv$.SD,".data.table.locked",TRUE)   # used to stop := modifying .SD via j=f(.SD), bug#1727. The more common case of j=.SD[,subcol:=1] was already caught when jsub is inspected for :=.
+    setattr(SDenv$.SDall,".data.table.locked",TRUE)
     lockBinding(".SD",SDenv)
+    lockBinding(".SDall",SDenv)
     lockBinding(".N",SDenv)
     lockBinding(".GRP",SDenv)
     lockBinding(".I",SDenv)
     lockBinding(".iSD",SDenv)
     
     GForce = FALSE
-    if ( (getOption("datatable.optimize")>=1 && is.call(jsub)) || (is.name(jsub) && jsub == ".SD") ) {  # Ability to turn off if problems or to benchmark the benefit
+    if ( getOption("datatable.optimize")>=1 && (is.call(jsub) || (is.name(jsub) && as.character(jsub) %chin% c(".SD",".N"))) ) {  # Ability to turn off if problems or to benchmark the benefit
         # Optimization to reduce overhead of calling lapply over and over for each group
+        ansvarsnew = setdiff(ansvars, othervars)
         oldjsub = jsub
         funi = 1L # Fix for #985
        # converted the lapply(.SD, ...) to a function, used below; makes FR #2722 easier to implement
@@ -1375,14 +1639,14 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
                 if (is.character(fun)) fun = as.name(fun)
                 txt[[1L]] = fun
             }
-            ans = vector("list",length(ansvars)+1L)
+            ans = vector("list",length(ansvarsnew)+1L)
             ans[[1L]] = as.name("list")
-            for (ii in seq_along(ansvars)) {
-                txt[[2L]] = as.name(ansvars[ii])
+            for (ii in seq_along(ansvarsnew)) {
+                txt[[2L]] = as.name(ansvarsnew[ii])
                 ans[[ii+1L]] = as.call(txt)
             }
             jsub = as.call(ans)  # important no names here
-            jvnames = ansvars      # but here instead
+            jvnames = ansvarsnew      # but here instead
             list(jsub, jvnames)
            # It may seem inefficient to construct a potentially long expression. But, consider calling
             # lapply 100000 times. The C code inside lapply does the LCONS stuff anyway, every time it
@@ -1398,136 +1662,154 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
         }
         if (is.name(jsub)) {
             if (jsub == ".SD") {
-                jsub = as.call(c(quote(list), lapply(ansvars, as.name)))
-                jvnames = ansvars
+                jsub = as.call(c(quote(list), lapply(ansvarsnew, as.name)))
+                jvnames = ansvarsnew
             }
-        } else if ( length(jsub) == 3L && (jsub[[1L]] == "[" || jsub[[1L]] == "head") && jsub[[2L]] == ".SD" && (is.numeric(jsub[[3L]]) || jsub[[3L]] == ".N") ) {
-            # optimise .SD[1] or .SD[2L]. Not sure how to test .SD[a] as to whether a is numeric/integer or a data.table, yet.
-            jsub = as.call(c(quote(list), lapply(ansvars, function(x) { jsub[[2L]] = as.name(x); jsub })))
-            jvnames = ansvars
-        } else if (jsub[[1L]]=="lapply" && jsub[[2L]]==".SD" && length(xcols)) {
-            deparse_ans = .massageSD(jsub)
-            jsub = deparse_ans[[1L]]
-            jvnames = deparse_ans[[2L]]
-        } else if (jsub[[1L]] == "c" && length(jsub) > 1L) {
-            # TODO, TO DO: raise the checks for 'jvnames' earlier (where jvnames is set by checking 'jsub') and set 'jvnames' already.
-            # FR #2722 is just about optimisation of j=c(.N, lapply(.SD, .)) that is taken care of here.
-            # FR #735 tries to optimise j-expressions of the form c(...) as long as ... contains
-            # 1) lapply(.SD, ...), 2) simply .SD or .SD[..], 3) .N, 4) list(...) and 5) functions that normally return a single value*
-            # On 5)* the IMPORTANT point to note is that things that are not wrapped within "list(...)" should *always* 
-            # return length 1 output for us to optimise. Else, there's no equivalent to optimising c(...) to list(...) AFAICT.
-            # One issue could be that these functions (e.g., mean) can be "re-defined" by the OP to produce a length > 1 output
-            # Of course this is worrying too much though. If the issue comes up, we'll just remove the relevant optimisations.
-            # For now, we optimise all functions mentioned in 'optfuns' below.
-            optfuns = c("max", "min", "mean", "length", "sum", "median", "sd", "var")
-            is_valid = TRUE
-            any_SD = FALSE
-            jsubl = as.list.default(jsub)
-            oldjvnames = jvnames
-            jvnames = NULL           # TODO: not let jvnames grow, maybe use (number of lapply(.SD, .))*lenght(ansvars) + other jvars ?? not straightforward.
-            # Fix for #744. Don't use 'i' in for-loops. It masks the 'i' from the input!!
-            for (i_ in 2:length(jsubl)) {
-                this = jsub[[i_]]
-                if (is.name(this)) {
-                    if (this == ".SD") { # optimise '.SD' alone
-                        any_SD = TRUE
-                        jsubl[[i_]] = lapply(ansvars, as.name)
-                        jvnames = c(jvnames, ansvars)
-                    } else if (this == ".N") {
-                        # don't optimise .I in c(.SD, .I), it's length can be > 1 
-                        # only c(.SD, list(.I)) should be optimised!! .N is always length 1.
-                        jvnames = c(jvnames, gsub("^[.]([N])$", "\\1", this))   
-                    } else {
-                        # jvnames = c(jvnames, if (is.null(names(jsubl))) "" else names(jsubl)[i_])
-                        is_valid=FALSE
-                        break
-                    }
-                } else if (is.call(this)) {
-                    if (this[[1L]] == "lapply" && this[[2L]] == ".SD" && length(xcols)) {
-                        any_SD = TRUE
-                        deparse_ans = .massageSD(this)
-                        funi = funi + 1L # Fix for #985
-                        jsubl[[i_]] = as.list(deparse_ans[[1L]][-1L]) # just keep the '.' from list(.)
-                        jvnames = c(jvnames, deparse_ans[[2L]])
-                    } else if (this[[1]] == "list") {
-                        # also handle c(lapply(.SD, sum), list()) - silly, yes, but can happen
-                        if (length(this) > 1L) {
-                            jl__ = as.list(jsubl[[i_]])[-1L] # just keep the '.' from list(.)
-                            jn__ = if (is.null(names(jl__))) rep("", length(jl__)) else names(jl__)
-                            idx  = unlist(lapply(jl__, function(x) is.name(x) && x == ".I"))
-                            if (any(idx)) jn__[idx] = ifelse(jn__[idx] == "", "I", jn__[idx])
-                            jvnames = c(jvnames, jn__)
-                            jsubl[[i_]] = jl__
+        } else {
+            if ( length(jsub) == 3L && (jsub[[1L]] == "[" || jsub[[1L]] == "head" || jsub[[1L]] == "tail") && jsub[[2L]] == ".SD" && (is.numeric(jsub[[3L]]) || jsub[[3L]] == ".N") ) {
+                # optimise .SD[1] or .SD[2L]. Not yet sure how to test whether 'a' in .SD[a] is numeric/integer or a data.table.
+                jsub = as.call(c(quote(list), lapply(ansvarsnew, function(x) { jsub[[2L]] = as.name(x); jsub })))
+                jvnames = ansvarsnew
+            } else if (jsub[[1L]]=="lapply" && jsub[[2L]]==".SD" && length(xcols)) {
+                deparse_ans = .massageSD(jsub)
+                jsub = deparse_ans[[1L]]
+                jvnames = deparse_ans[[2L]]
+            } else if (jsub[[1L]] == "c" && length(jsub) > 1L) {
+                # TODO, TO DO: raise the checks for 'jvnames' earlier (where jvnames is set by checking 'jsub') and set 'jvnames' already.
+                # FR #2722 is just about optimisation of j=c(.N, lapply(.SD, .)) that is taken care of here.
+                # FR #735 tries to optimise j-expressions of the form c(...) as long as ... contains
+                # 1) lapply(.SD, ...), 2) simply .SD or .SD[..], 3) .N, 4) list(...) and 5) functions that normally return a single value*
+                # On 5)* the IMPORTANT point to note is that things that are not wrapped within "list(...)" should *always* 
+                # return length 1 output for us to optimise. Else, there's no equivalent to optimising c(...) to list(...) AFAICT.
+                # One issue could be that these functions (e.g., mean) can be "re-defined" by the OP to produce a length > 1 output
+                # Of course this is worrying too much though. If the issue comes up, we'll just remove the relevant optimisations.
+                # For now, we optimise all functions mentioned in 'optfuns' below.
+                optfuns = c("max", "min", "mean", "length", "sum", "median", "sd", "var")
+                is_valid = TRUE
+                any_SD = FALSE
+                jsubl = as.list.default(jsub)
+                oldjvnames = jvnames
+            jvnames = NULL           # TODO: not let jvnames grow, maybe use (number of lapply(.SD, .))*length(ansvarsnew) + other jvars ?? not straightforward.
+                # Fix for #744. Don't use 'i' in for-loops. It masks the 'i' from the input!!
+                for (i_ in 2:length(jsubl)) {
+                    this = jsub[[i_]]
+                    if (is.name(this)) {
+                        if (this == ".SD") { # optimise '.SD' alone
+                            any_SD = TRUE
+                            jsubl[[i_]] = lapply(ansvarsnew, as.name)
+                            jvnames = c(jvnames, ansvarsnew)
+                        } else if (this == ".N") {
+                            # don't optimise .I in c(.SD, .I), its length can be > 1
+                            # only c(.SD, list(.I)) should be optimised!! .N is always length 1.
+                            jvnames = c(jvnames, gsub("^[.]([N])$", "\\1", this))   
+                        } else {
+                            # jvnames = c(jvnames, if (is.null(names(jsubl))) "" else names(jsubl)[i_])
+                            is_valid=FALSE
+                            break
                         }
-                    } else if (is.call(this) && length(this) > 1L && as.character(this[[1L]]) %in% optfuns) {
-                        jvnames = c(jvnames, if (is.null(names(jsubl))) "" else names(jsubl)[i_])
-                    } else if ( length(this) == 3L && (this[[1L]] == "[" || this[[1L]] == "head") && 
-                                    this[[2L]] == ".SD" && (is.numeric(this[[3L]]) || this[[3L]] == ".N") ) {
-                        # optimise .SD[1] or .SD[2L]. Not sure how to test .SD[a] as to whether a is numeric/integer or a data.table, yet.
-                        any_SD = TRUE
-                        jsubl[[i_]] = lapply(ansvars, function(x) { this[[2L]] = as.name(x); this })
-                        jvnames = c(jvnames, ansvars)
-                    } else if (any(all.vars(this) == ".SD")) {
-                        # TODO, TO DO: revisit complex cases (as illustrated below)
-                        # complex cases like DT[, c(.SD[x>1], .SD[J(.)], c(.SD), a + .SD, lapply(.SD, sum)), by=grp]
-                        # hard to optimise such cases (+ difficulty in counting exact columns and therefore names). revert back to no optimisation.
-                        is_valid=FALSE
-                        break
-                    } else { # just to be sure that any other case (I've overlooked) runs smoothly, without optimisation
-                        # TO DO, TODO: maybe a message/warning here so that we can catch the overlooked cases, if any?
-                        is_valid=FALSE
+                    } else if (is.call(this)) {
+                        if (this[[1L]] == "lapply" && this[[2L]] == ".SD" && length(xcols)) {
+                            any_SD = TRUE
+                            deparse_ans = .massageSD(this)
+                            funi = funi + 1L # Fix for #985
+                            jsubl[[i_]] = as.list(deparse_ans[[1L]][-1L]) # just keep the '.' from list(.)
+                            jvnames = c(jvnames, deparse_ans[[2L]])
+                        } else if (this[[1]] == "list") {
+                            # also handle c(lapply(.SD, sum), list()) - silly, yes, but can happen
+                            if (length(this) > 1L) {
+                                jl__ = as.list(jsubl[[i_]])[-1L] # just keep the '.' from list(.)
+                                jn__ = if (is.null(names(jl__))) rep("", length(jl__)) else names(jl__)
+                                idx  = unlist(lapply(jl__, function(x) is.name(x) && x == ".I"))
+                                if (any(idx)) jn__[idx & (jn__ == "")] = "I"
+                                jvnames = c(jvnames, jn__)
+                                jsubl[[i_]] = jl__
+                            }
+                        } else if (is.call(this) && length(this) > 1L && as.character(this[[1L]]) %in% optfuns) {
+                            jvnames = c(jvnames, if (is.null(names(jsubl))) "" else names(jsubl)[i_])
+                        } else if ( length(this) == 3L && (this[[1L]] == "[" || this[[1L]] == "head") && 
+                                        this[[2L]] == ".SD" && (is.numeric(this[[3L]]) || this[[3L]] == ".N") ) {
+                            # optimise .SD[1] or .SD[2L]. Not sure how to test .SD[a] as to whether a is numeric/integer or a data.table, yet.
+                            any_SD = TRUE
+                            jsubl[[i_]] = lapply(ansvarsnew, function(x) { this[[2L]] = as.name(x); this })
+                            jvnames = c(jvnames, ansvarsnew)
+                        } else if (any(all.vars(this) == ".SD")) {
+                            # TODO, TO DO: revisit complex cases (as illustrated below)
+                            # complex cases like DT[, c(.SD[x>1], .SD[J(.)], c(.SD), a + .SD, lapply(.SD, sum)), by=grp]
+                            # hard to optimise such cases (+ difficulty in counting exact columns and therefore names). revert back to no optimisation.
+                            is_valid=FALSE
+                            break
+                        } else { # just to be sure that any other case (I've overlooked) runs smoothly, without optimisation
+                            # TO DO, TODO: maybe a message/warning here so that we can catch the overlooked cases, if any?
+                            is_valid=FALSE
+                            break
+                        }
+                    } else {
+                        is_valid = FALSE
                         break
                     }
+                }
+                if (!is_valid || !any_SD) { # restore if c(...) doesn't contain lapply(.SD, ..) or if it's just invalid
+                    jvnames = oldjvnames           # reset jvnames
+                    jsub = oldjsub                 # reset jsub
+                    jsubl = as.list.default(jsubl) # reset jsubl
                 } else {
-                    is_valid = FALSE
-                    break
+                    setattr(jsubl, 'names', NULL)
+                    jsub = as.call(unlist(jsubl, use.names=FALSE))
+                    jsub[[1L]] = quote(list)
                 }
             }
-            if (!is_valid || !any_SD) { # restore if c(...) doesn't contain lapply(.SD, ..) or if it's just invalid
-                jvnames = oldjvnames           # reset jvnames
-                jsub = oldjsub                 # reset jsub
-                jsubl = as.list.default(jsubl) # reset jsubl
-            } else {
-                setattr(jsubl, 'names', NULL)
-                jsub = as.call(unlist(jsubl, use.names=FALSE))
-                jsub[[1L]] = quote(list)
-            }
         }
         if (verbose) {
             if (!identical(oldjsub, jsub))
-                cat("lapply optimization changed j from '",deparse(oldjsub),"' to '",deparse(jsub,width.cutoff=200),"'\n",sep="")
+                cat("lapply optimization changed j from '",deparse(oldjsub),"' to '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
             else
-                cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200),"'\n",sep="")
+                cat("lapply optimization is on, j unchanged as '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
         }
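For reviewers unfamiliar with this optimisation: the verbose branch above prints the rewritten j expression. A minimal sketch of how to observe it (assuming data.table is attached; the exact message wording may differ between versions):

```r
library(data.table)
DT = data.table(g = c(1L, 1L, 2L), a = 1:3, b = 4:6)
# With verbose=TRUE the branch above reports something along the lines of:
#   lapply optimization changed j from 'lapply(.SD, sum)' to 'list(sum(a), sum(b))'
DT[, lapply(.SD, sum), by = g, verbose = TRUE]
```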
         dotN <- function(x) if (is.name(x) && x == ".N") TRUE else FALSE # For #5760
-        if (getOption("datatable.optimize")>=2 && !byjoin && !length(irows) && length(f__) && length(ansvars) && !length(lhs)) {
-            # Apply GForce
-            gfuns = c("sum","mean",".N", "min", "max") # added .N for #5760
-            .ok <- function(q) {
-                if (dotN(q)) return(TRUE) # For #5760
-                ans = is.call(q) && as.character(q[[1L]]) %chin% gfuns && !is.call(q[[2L]]) && (length(q)==2 || identical("na",substring(names(q)[3L],1,2)))
-                if (is.na(ans)) ans=FALSE
-                ans
-            }
-            if (jsub[[1L]]=="list") {
-                GForce = TRUE
-                for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) GForce = FALSE
-            } else GForce = .ok(jsub)
-            if (GForce) {
-                if (jsub[[1L]]=="list")
-                    for (ii in seq_along(jsub)[-1L]) { 
-                        if (dotN(jsub[[ii]])) next; # For #5760
-                        jsub[[ii]][[1L]] = as.name(paste("g", jsub[[ii]][[1L]], sep=""))
-                        if (length(jsub[[ii]])==3) jsub[[ii]][[3]] = eval(jsub[[ii]][[3]], parent.frame())  # tests 1187.2 & 1187.4
-                    }
-                else {
-                    jsub[[1L]] = as.name(paste("g", jsub[[1L]], sep=""))
-                    if (length(jsub)==3) jsub[[3]] = eval(jsub[[3]], parent.frame())   # tests 1187.3 & 1187.5
+        # FR #971, GForce kicks in on all subsets, no joins yet. Joins could work with
+        # nomatch=0L even now, but not switching it on yet; will deal with it separately.
+        if (getOption("datatable.optimize")>=2 && !is.data.table(i) && !byjoin && length(f__) && !length(lhs)) {
+            if (!length(ansvars) && !use.I) {
+                GForce = FALSE
+                if ( (is.name(jsub) && jsub == ".N") || (is.call(jsub) && length(jsub)==2L && jsub[[1L]] == "list" && jsub[[2L]] == ".N") ) {
+                    GForce = TRUE
+                    if (verbose) cat("GForce optimized j to '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
+                }
+            } else {
+                # Apply GForce
+                gfuns = c("sum", "prod", "mean", "median", "var", "sd", ".N", "min", "max", "head", "last", "first", "tail", "[") # added .N for #5760
+                .ok <- function(q) {
+                    if (dotN(q)) return(TRUE) # For #5760
+                    cond = is.call(q) && as.character(q[[1L]]) %chin% gfuns && !is.call(q[[2L]])
+                    ans  = cond && (length(q)==2 || identical("na",substring(names(q)[3L],1,2)))
+                    if (identical(ans, TRUE)) return(ans)
+                    ans = cond && length(q)==3 && ( as.character(q[[1]]) %chin% c("head", "tail") && 
+                                                         (identical(q[[3]], 1) || identical(q[[3]], 1L)) || 
+                                                    as.character(q[[1]]) %chin% "[" && is.numeric(q[[3]]) && 
+                                                        length(q[[3]])==1 && q[[3]]>0 )
+                    if (is.na(ans)) ans=FALSE
+                    ans
                 }
-                if (verbose) cat("GForce optimized j to '",deparse(jsub,width.cutoff=200),"'\n",sep="")
-            } else if (verbose) cat("GForce is on, left j unchanged\n");
+                if (jsub[[1L]]=="list") {
+                    GForce = TRUE
+                    for (ii in seq_along(jsub)[-1L]) if (!.ok(jsub[[ii]])) GForce = FALSE
+                } else GForce = .ok(jsub)
+                if (GForce) {
+                    if (jsub[[1L]]=="list")
+                        for (ii in seq_along(jsub)[-1L]) { 
+                            if (dotN(jsub[[ii]])) next; # For #5760
+                            jsub[[ii]][[1L]] = as.name(paste("g", jsub[[ii]][[1L]], sep=""))
+                            if (length(jsub[[ii]])==3) jsub[[ii]][[3]] = eval(jsub[[ii]][[3]], parent.frame())  # tests 1187.2 & 1187.4
+                        }
+                    else {
+                        jsub[[1L]] = as.name(paste("g", jsub[[1L]], sep=""))
+                        if (length(jsub)==3) jsub[[3]] = eval(jsub[[3]], parent.frame())   # tests 1187.3 & 1187.5
+                    }
+                    if (verbose) cat("GForce optimized j to '",deparse(jsub,width.cutoff=200L),"'\n",sep="")
+                } else if (verbose) cat("GForce is on, left j unchanged\n");
+            }
         }
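As a sketch of the GForce path this hunk extends (the new gfuns now also cover prod, median, var, sd, head, tail, first, last and `[`): when every call in j is a supported single-value function of a plain column, each call is prefixed with "g" (sum becomes gsum) and evaluated per group in C rather than evaluating j once per group in R. Illustrative only; the verbose output text may vary:

```r
library(data.table)
DT = data.table(g = rep(1:2, each = 2L), v = 1:4)
# Requires getOption("datatable.optimize") >= 2 (the default).
# j = .(s = sum(v), m = mean(v)) is rewritten to .(s = gsum(v), m = gmean(v))
# and dispatched to the C grouping code via gstart()/gend() as above.
DT[, .(s = sum(v), m = mean(v)), by = g, verbose = TRUE]
```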
-        if (!GForce) {
+        if (!GForce && !is.name(jsub)) {
             # Still do the old speedup for mean, for now
             nomeanopt=FALSE  # to be set by .optmean() using <<- inside it
             oldjsub = jsub
@@ -1576,7 +1858,8 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     }
     lockBinding(".xSD", SDenv)
     grporder = o__
-    if (length(irows) && !isTRUE(irows)) {
+    # for #971, added !GForce. if (GForce) we do it much more (memory) efficiently than subset of order vector below.
+    if (length(irows) && !isTRUE(irows) && !GForce) {
         # fix for bug #2758. TO DO: provide a better error message
         if (length(irows) > 1 && length(zo__ <- which(irows == 0)) > 0) stop("i[", zo__[1], "] is 0. While grouping, i=0 is allowed when it's the only value. When length(i) > 1, all i should be > 0.")
         if (length(o__) && length(irows)!=length(o__)) stop("Internal error: length(irows)!=length(o__)")
@@ -1588,22 +1871,24 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
         # for consistency of empty case in test 184
         f__=len__=0L
     }
+    if (verbose) {last.started.at=proc.time()[3];cat("Making each group and running j (GForce ",GForce,") ... ",sep="");flush.console()}
     if (GForce) {
         thisEnv = new.env()  # not parent=parent.frame() so that gsum is found
         for (ii in ansvars) assign(ii, x[[ii]], thisEnv)
         assign(".N", len__, thisEnv) # For #5760
-        gstart(o__, f__, len__)
+        # fix for #1683
+        if (use.I) assign(".I", seq_len(nrow(x)), thisEnv)
+        gstart(o__, f__, len__, irows) # irows needed for #971.
         ans = eval(jsub, thisEnv)
         if (is.atomic(ans)) ans=list(ans)  # won't copy named argument in new version of R, good
         gend()
         gi = if (length(o__)) o__[f__] else f__
         g = lapply(grpcols, function(i) groups[[i]][gi])
         ans = c(g, ans)
-    } else {
-        if (verbose) {last.started.at=proc.time()[3];cat("Starting dogroups ... ");flush.console()}
+    } else {        
         ans = .Call(Cdogroups, x, xcols, groups, grpcols, jiscols, xjiscols, grporder, o__, f__, len__, jsub, SDenv, cols, newnames, !missing(on), verbose)
-        if (verbose) {cat("done dogroups in",round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
     }
+    if (verbose) {cat(round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
     # TO DO: xrows would be a better name for irows: irows means the rows of x that i joins to
     # Grouping by i: icols the joins columns (might not need), isdcols (the non join i and used by j), all __ are length x
     # Grouping by by: i is by val, icols NULL, o__ may be subset of x, f__ points to o__ (or x if !length o__)
@@ -1612,10 +1897,21 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     if (!is.null(lhs)) {
         if (any(names(x)[cols] %chin% key(x)))
             setkey(x,NULL)
+        # fixes #1479. Take care of secondary indices, TODO: cleaner way of doing this
+        attrs = attr(x, 'index')
+        skeys = names(attributes(attrs))
+        if (!is.null(skeys)) {
+            hits  = unlist(lapply(paste("__", names(x)[cols], sep=""), function(x) grep(x, skeys)))
+            hits  = skeys[unique(hits)]
+            for (i in seq_along(hits)) setattr(attrs, hits[i], NULL) # does by reference
+        }
         if (!missing(keyby)) {
             cnames = as.character(bysubl)[-1]
-            if (all(cnames %chin% names(x)))
+            if (all(cnames %chin% names(x))) {
+                if (verbose) {last.started.at=proc.time()[3];cat("setkey() after the := with keyby= ... ");flush.console()}
                 setkeyv(x,cnames)  # TO DO: setkey before grouping to get memcpy benefit.
+                if (verbose) {cat(round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+            }
             else warning(":= keyby not straightforward character column names or list() of column names, treating as a by:",paste(cnames,collapse=","),"\n")
         }
         return(suppPrint(x))
@@ -1638,11 +1934,11 @@ chmatch2 <- function(x, table, nomatch=NA_integer_) {
     } else {
         setnames(ans,seq_along(bynames),bynames)   # TO DO: reinvestigate bynames flowing from dogroups here and simplify
     }
-    if (!missing(keyby)) {
+    if (byjoin && !missing(keyby) && !bysameorder) {
+        if (verbose) {last.started.at=proc.time()[3];cat("setkey() afterwards for keyby=.EACHI ... ");flush.console()}
         setkeyv(ans,names(ans)[seq_along(byval)])
-        # but if 'bykey' and 'bysameorder' then the setattr in branch above will run instead for
-        # speed (because !missing(by) when bykey, too)
-    } else if (haskey(x) && bysameorder) {
+        if (verbose) {cat(round(proc.time()[3]-last.started.at,3),"secs\n");flush.console()}
+    } else if (!missing(keyby) || (haskey(x) && bysameorder)) {
         setattr(ans,"sorted",names(ans)[seq_along(grpcols)])
     }
     alloc.col(ans)   # TO DO: overallocate in dogroups in the first place and remove this line
@@ -1746,162 +2042,6 @@ as.matrix.data.table <- function(x,...)
     X
 }
 
-as.data.table.matrix <- function(x, keep.rownames=FALSE, ...)
-{
-    if (!identical(keep.rownames, FALSE)) {
-        # can specify col name to keep.rownames, #575
-        ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
-        if (is.character(keep.rownames))
-            setnames(ans, 'rn', keep.rownames[1L])
-        return(ans)
-    }
-    d <- dim(x)
-    nrows <- d[1L]
-    ir <- seq_len(nrows)
-    ncols <- d[2L]
-    ic <- seq_len(ncols)
-    dn <- dimnames(x)
-    collabs <- dn[[2L]]
-    if (any(empty <- nchar(collabs) == 0L))
-        collabs[empty] <- paste("V", ic, sep = "")[empty]
-    value <- vector("list", ncols)
-    if (mode(x) == "character") {
-        # fix for #745 - A long overdue SO post: http://stackoverflow.com/questions/17691050/data-table-still-converts-strings-to-factors
-        for (i in ic) value[[i]] <- x[, i]                  # <strike>for efficiency.</strike> For consistency - data.table likes and prefers "character"
-    }
-    else {
-        for (i in ic) value[[i]] <- as.vector(x[, i])       # to drop any row.names that would otherwise be retained inside every column of the data.table
-    }
-    if (length(collabs) == ncols)
-        setattr(value, "names", collabs)
-    else
-        setattr(value, "names", paste("V", ic, sep = ""))
-    setattr(value,"row.names",.set_row_names(nrows))
-    setattr(value,"class",c("data.table","data.frame"))
-    alloc.col(value)
-}
-
-# don't retain classes before "data.frame" while converting 
-# from it.. like base R does. This'll break test #527 (see 
-# tests and as.data.table.data.frame) I've commented #527 
-# for now. This addresses #1078 and #1128
-.resetclass <- function(x, class) {
-    cx = class(x)
-    n  = chmatch(class, cx)
-    cx = unique( c("data.table", "data.frame", tail(cx, length(cx)-n)) )
-}
-
-as.data.table.data.frame <- function(x, keep.rownames=FALSE, ...)
-{
-    if (!identical(keep.rownames, FALSE)) {
-        # can specify col name to keep.rownames, #575
-        ans = data.table(rn=rownames(x), x, keep.rownames=FALSE)
-        if (is.character(keep.rownames))
-            setnames(ans, 'rn', keep.rownames[1L])
-        return(ans)
-    }
-    ans = copy(x)  # TO DO: change this deep copy to be shallow.
-    setattr(ans,"row.names",.set_row_names(nrow(x)))
-
-    ## NOTE: This test (#527) is no longer in effect ##
-    # for nlme::groupedData which has class c("nfnGroupedData","nfGroupedData","groupedData","data.frame")
-    # See test 527.
-    ## 
-
-    # fix for #1078 and #1128, see .resetclass() for explanation.
-    setattr(ans, "class", .resetclass(x, "data.frame"))
-    alloc.col(ans)
-}
-
-as.data.table.list <- function(x, keep.rownames=FALSE, ...) {
-    if (!length(x)) return( null.data.table() )
-    n = vapply(x, length, 0L)
-    mn = max(n)
-    x = copy(x)
-    idx = which(n < mn)
-    if (length(idx)) {
-        for (i in idx) {
-            if (!is.null(x[[i]])) {# avoids warning when a list element is NULL
-                # Implementing FR #4813 - recycle with warning when nr %% nrows[i] != 0L
-                if (!n[i] && mn)
-                    warning("Item ", i, " is of size 0 but maximum size is ", mn, ", therefore recycled with 'NA'")
-                else if (n[i] && mn %% n[i] != 0)
-                    warning("Item ", i, " is of size ", n[i], " but maximum size is ", mn, " (recycled leaving a remainder of ", mn%%n[i], " items)")
-                x[[i]] = rep(x[[i]], length.out=mn)
-            }
-        }
-    }
-    # fix for #842
-    if (mn > 0L) {
-        nz = which(n > 0L)
-        xx = point(vector("list", length(nz)), seq_along(nz), x, nz)
-        if (!is.null(names(x)))
-            setattr(xx, 'names', names(x)[nz])
-        x = xx
-    }
-    if (is.null(names(x))) setattr(x,"names",paste("V",seq_len(length(x)),sep=""))
-    setattr(x,"row.names",.set_row_names(max(n)))
-    setattr(x,"class",c("data.table","data.frame"))
-    alloc.col(x)
-}
-
-as.data.table.data.table <- function(x, keep.rownames=FALSE, ...) {
-    # fix for #1078 and #1128, see .resetclass() for explanation.
-    setattr(x, 'class', .resetclass(x, "data.table"))
-    return(x)
-}
-
-# takes care of logical, character, numeric, integer
-as.data.table.factor <- as.data.table.ordered <- 
-as.data.table.integer <- as.data.table.numeric <- 
-as.data.table.logical <- as.data.table.character <- 
-as.data.table.Date <- function(x, keep.rownames=FALSE, ...) {
-    if (is.matrix(x)) {
-        return(as.data.table.matrix(x, ...))
-    }
-    tt = deparse(substitute(x))[1]
-    nm = names(x)
-    # FR #2356 - transfer names of named vector as "rn" column if required
-    if (!identical(keep.rownames, FALSE) & !is.null(nm)) 
-        x <- list(nm, unname(x))
-    else x <- list(x)
-    if (tt == make.names(tt)) {
-        # can specify col name to keep.rownames, #575
-        nm = if (length(x) == 2L) if (is.character(keep.rownames)) keep.rownames[1L] else "rn"
-        setattr(x, 'names', c(nm, tt))
-    }
-    as.data.table.list(x, FALSE)
-}
-
-R300_provideDimnames <- function (x, sep = "", base = list(LETTERS))   # backported from R3.0.0 so data.table can depend on R 2.14.0 
-{
-    dx <- dim(x)
-    dnx <- dimnames(x)
-    if (new <- is.null(dnx)) 
-        dnx <- vector("list", length(dx))
-    k <- length(M <- vapply(base, length, 1L))
-    for (i in which(vapply(dnx, is.null, NA))) {
-        ii <- 1L + (i - 1L)%%k
-        dnx[[i]] <- make.unique(base[[ii]][1L + 0:(dx[i] - 1L)%%M[ii]], 
-            sep = sep)
-        new <- TRUE
-    }
-    if (new) 
-        dimnames(x) <- dnx
-    x
-}
-
-# as.data.table.table - FR #4848
-as.data.table.table <- function(x, keep.rownames=FALSE, ...) {
-    # Fix for bug #5408 - order of columns are different when doing as.data.table(with(DT, table(x, y)))
-    val = rev(dimnames(R300_provideDimnames(x)))
-    if (is.null(names(val)) || all(nchar(names(val)) == 0L)) 
-        setattr(val, 'names', paste("V", rev(seq_along(val)), sep=""))
-    ans <- data.table(do.call(CJ, c(val, sorted=FALSE)), N = as.vector(x))
-    setcolorder(ans, c(rev(head(names(ans), -1)), "N"))
-    ans
-}
-
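These as.data.table methods are moved out to the new R/as.data.table.R (see the file list above), not deleted. For context, the table method they implement behaves roughly like this (a sketch of the documented behaviour around the #5408 column-order fix):

```r
library(data.table)
tt = table(x = c("a", "a", "b"), y = c(1L, 2L, 1L))
# One row per cell of the table: the dimension columns first, in the
# order of the table's dimnames, then the count column N - the
# setcolorder() call above is what fixes #5408.
as.data.table(tt)
```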
 # bug #2375. fixed. same as head.data.frame and tail.data.frame to deal with negative indices
 head.data.table <- function(x, n=6, ...) {
     if (!cedta()) return(NextMethod())
@@ -1994,17 +2134,6 @@ tail.data.table <- function(x, n=6, ...) {
     `[<-.data.table`(x,j=name,value=value)  # important i is missing here
 }
 
-as.data.table <-function(x, keep.rownames=FALSE, ...)
-{
-    if (is.null(x))
-        return(null.data.table())
-    UseMethod("as.data.table")
-}
-
-as.data.table.default <- function(x, ...){
-  setDT(as.data.frame(x, ...))[]
-}
-
 as.data.frame.data.table <- function(x, ...)
 {
     ans = copy(x)
@@ -2034,7 +2163,8 @@ as.list.data.table <- function(x, ...) {
 
 dimnames.data.table <- function(x) {
     if (!cedta()) {
-        if (!identical(class(x),c("data.table","data.frame"))) stop("data.table inherits from data.frame (from v1.5) but this data.table does not. Has it been created manually (e.g. by using 'structure' rather than 'data.table') or saved to disk using a prior version of data.table? The correct class is c('data.table','data.frame').")
+        if (!inherits(x, "data.frame")) 
+          stop("data.table inherits from data.frame (from v1.5), but this data.table does not. Has it been created manually (e.g. by using 'structure' rather than 'data.table') or saved to disk using a prior version of data.table?")
         return(`dimnames.data.frame`(x))
     }
     list(NULL, names(x))
@@ -2059,7 +2189,8 @@ dimnames.data.table <- function(x) {
     if ( ((tt<-identical(caller,"colnames<-")) && cedta(3)) || cedta() ) {
         if (.R.assignNamesCopiesAll)
             warning("This is R<3.1.0 where ",if(tt)"col","names(x)<-value deep copies the entire table (several times). Please upgrade to R>=3.1.0 and see ?setnames which allows you to change names by name with built-in checks and warnings.")
-    } else x = shallow(x) ## Fix for #476 and #825. Needed for R v3.1.0+.  TO DO: revisit
+    }
+    x = shallow(x) # `names<-` should NOT modify by reference. Related to #1015, #476 and #825. Needed for R v3.1.0+.  TO DO: revisit
     if (is.null(value))
         setattr(x,"names",NULL)   # e.g. plyr::melt() calls base::unname()
     else
@@ -2102,6 +2233,7 @@ transform.data.table <- function (`_data`, ...)
     inx <- chmatch(tags, names(`_data`))
     matched <- !is.na(inx)
     if (any(matched)) {
+        if (isTRUE(attr(`_data`, ".data.table.locked", TRUE))) setattr(`_data`, ".data.table.locked", NULL) # fix for #1641
         `_data`[,inx[matched]] <- e[matched]
         `_data` <- data.table(`_data`)
     }
@@ -2179,12 +2311,14 @@ na.omit.data.table <- function (object, cols = seq_along(object), invert = FALSE
     }
     cols = as.integer(cols)
     ix = .Call(Cdt_na, object, cols)
-    .Call(CsubsetDT, object, which_(ix, bool = invert), seq_along(object))
+    ans = .Call(CsubsetDT, object, which_(ix, bool = invert), seq_along(object))
+    if (any(ix)) setindexv(ans, NULL)[] else ans #1734
     # compare the above to stats:::na.omit.data.frame
 }
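The #1734 change above matters because subsetting away rows invalidates any secondary indices, so the result drops them via setindexv(ans, NULL) when NA rows were actually removed. A usage sketch of the method's arguments as defined above:

```r
library(data.table)
DT = data.table(a = c(1L, NA, 3L), b = c("x", "y", NA))
na.omit(DT)                             # drops any row containing an NA
na.omit(DT, cols = "a")                 # only considers NAs in column 'a'
na.omit(DT, cols = "a", invert = TRUE)  # keeps exactly the rows with NA in 'a'
```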
 
 which_ <- function(x, bool = TRUE) {
-    .Call("Cwhichwrapper", x, bool)
+    # fix for #1467, quotes result in "not resolved in current namespace" error
+    .Call(Cwhichwrapper, x, bool)
 }
 
 is.na.data.table <- function (x) {
@@ -2205,11 +2339,74 @@ Ops.data.table <- function(e1, e2 = NULL)
     ans
 }
 
-
-split.data.table <- function(...) {
-    if (cedta() && getOption("datatable.dfdispatchwarn"))  # or user can use suppressWarnings
-        warning("split is inefficient. It copies memory. Please use [,j,by=list(...)] syntax. See data.table FAQ.")
-    NextMethod()  # allow user to do it though, split object will be data.table's with 'NA' repeated in row.names silently
+split.data.table <- function(x, f, drop = FALSE, by, sorted = FALSE, keep.by = TRUE, flatten = TRUE, ..., verbose = getOption("datatable.verbose")) {
+    if (!is.data.table(x)) stop("x argument must be a data.table")
+    stopifnot(is.logical(drop), is.logical(sorted), is.logical(keep.by),  is.logical(flatten))
+    # split data.frame way, using `f` and not `by` argument
+    if (!missing(f)) {
+        if (!length(f) && nrow(x))
+            stop("group length is 0 but data nrow > 0")
+        if (!missing(by))
+            stop("passing 'f' argument together with 'by' is not allowed, use 'by' when splitting by a column in the data.table and 'f' when splitting by an external factor")
+        # same as split.data.frame - handling all exceptions, factor orders etc. in a single stream of processing was a nightmare for factor and drop consistency
+        return(lapply(split(x = seq_len(nrow(x)), f = f, drop = drop, ...), function(ind) x[ind]))
+    }
+    if (missing(by)) stop("you must provide 'by' or 'f' arguments")
+    # check reserved column names during processing
+    if (".ll.tech.split" %in% names(x)) stop("column '.ll.tech.split' is reserved for split.data.table processing")
+    if (".nm.tech.split" %in% by) stop("column '.nm.tech.split' is reserved for split.data.table processing")
+    if (!all(by %in% names(x))) stop("argument 'by' must refer to data.table column names")
+    if (!all(by.atomic <- sapply(by, function(.by) is.atomic(x[[.by]])))) stop(sprintf("argument 'by' must refer only to atomic type columns, classes of '%s' columns are not atomic type", paste(by[!by.atomic], collapse=", ")))
+    # list of data.tables (flatten) or list of lists of ... data.tables
+    make.levels = function(x, cols, sorted) {
+        by.order = if (!sorted) x[, funique(.SD), .SDcols=cols] # remember original order of the data, needed only when sorted=FALSE
+        ul = lapply(setNames(cols, nm=cols), function(col) if (!is.factor(x[[col]])) unique(x[[col]]) else levels(x[[col]]))
+        r = do.call("CJ", c(ul, sorted=sorted, unique=TRUE))
+        if (!sorted && nrow(by.order)) {
+            ii = r[by.order, on=cols, which=TRUE]
+            r = rbindlist(list(
+                r[ii], # original order from data
+                r[-ii] # empty levels at the end
+            ))
+        }
+        r
+    }
+    .by = by[1L]
+    # this builds the data.table call - much cleaner than handling each case one by one
+    dtq = as.list(call("[", as.name("x")))
+    join = FALSE
+    flatten_any = flatten && any(sapply(by, function(col) is.factor(x[[col]])))
+    nested_current = !flatten && is.factor(x[[.by]])
+    if (!drop && (flatten_any || nested_current)) {
+        dtq[["i"]] = substitute(make.levels(x, cols=.cols, sorted=.sorted), list(.cols=if (flatten) by else .by, .sorted=sorted))
+        join = TRUE
+    }
+    dtq[["j"]] = substitute(
+        list(.ll.tech.split=list(.expr)),
+        list(.expr = if (join) quote(if(.N == 0L) .SD[0L] else .SD) else as.name(".SD")) # simplify when `nomatch` accepts NULL #857 ?
+    )
+    by.or.keyby = if (join) "by" else c("by"[!sorted], "keyby"[sorted])[1L]
+    dtq[[by.or.keyby]] = substitute( # retain order, for `join` and `sorted` it will use order of `i` data.table instead of `keyby`.
+        .expr,
+        list(.expr = if(join) as.name(".EACHI") else if (flatten) by else .by)
+    )
+    dtq[[".SDcols"]] = if (keep.by) names(x) else setdiff(names(x), if (flatten) by else .by)
+    if (join) dtq[["on"]] = if (flatten) by else .by
+    dtq = as.call(dtq)
+    if (isTRUE(verbose)) cat("Processing split.data.table with: ", deparse(dtq, width.cutoff=500L), "\n", sep="")
+    tmp = eval(dtq)
+    # add names on list
+    setattr(ll <- tmp$.ll.tech.split,
+            "names", 
+            as.character(
+                if (!flatten) tmp[[.by]] else tmp[, list(.nm.tech.split=paste(unlist(lapply(.SD, as.character)), collapse = ".")), by=by, .SDcols=by]$.nm.tech.split
+            ))
+    # handle nested split
+    if (flatten || length(by) == 1L) return(
+        lapply(lapply(ll, setattr, '.data.table.locked', NULL), setDT) # alloc.col could handle DT in list as done in: c9c4ff80bdd4c600b0c4eff23b207d53677176bd
+    ) else if (length(by) > 1L) return(
+        lapply(ll, split.data.table, drop=drop, by=by[-1L], sorted=sorted, keep.by=keep.by, flatten=flatten)
+    )
 }
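The split.data.table method above builds a single `[` call and dispatches it once. A minimal usage sketch (a hedged example assuming data.table 1.10.0 is attached; the data and column names are illustrative):

```r
library(data.table)

DT = data.table(grp = c("a", "a", "b"), val = 1:3)

# split into a named list of data.tables, one element per group;
# keep.by=TRUE (the default) retains the grouping column
s = split(DT, by = "grp")

names(s)      # "a" "b"
s[["a"]]$val  # 1 2
```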
 
 # TO DO, add more warnings e.g. for by.data.table(), telling user what the data.table syntax is but letting them dispatch to data.frame if they want
@@ -2218,7 +2415,19 @@ copy <- function(x) {
     newx = .Call(Ccopy,x)  # copies at length but R's duplicate() also copies truelength over.
                            # TO DO: inside Ccopy it could reset tl to 0 or length, but no matter as selfrefok detects it
                            # TO DO: revisit duplicate.c in R 3.0.3 and see where it's at
-    if (!is.data.table(x)) return(newx)   # e.g. in as.data.table.list() the list is copied before changing to data.table
+    if (!is.data.table(x)) {
+	# fix for #1476. TODO: find if a cleaner fix is possible..
+	if (is.list(x)) {
+	    anydt = vapply(x, is.data.table, TRUE, USE.NAMES=FALSE)
+	    if (sum(anydt)) {
+		newx[anydt] = lapply(newx[anydt], function(x) {
+				    setattr(x, ".data.table.locked", NULL)
+				    alloc.col(x)
+				})
+	    }
+	}
+	return(newx)   # e.g. in as.data.table.list() the list is copied before changing to data.table
+    }
     setattr(newx,".data.table.locked",NULL)
     alloc.col(newx)
 }
@@ -2231,7 +2440,7 @@ point <- function(to, to_idx, from, from_idx) {
     .Call(CpointWrapper, to, to_idx, from, from_idx)
 }
 
-.shallow <- function(x, cols = NULL, retain.key = FALSE) {
+.shallow <- function(x, cols = NULL, retain.key = FALSE, unlock = FALSE) {
     isnull = is.null(cols)
     if (!isnull) cols = validate(cols, x)  # NULL is default = all columns
     ans = .Call(Cshallowwrapper, x, cols)  # copies VECSXP only
@@ -2240,6 +2449,7 @@ point <- function(to, to_idx, from, from_idx) {
     cols = names(x)[cols]
     retain.key = retain.key && identical(cols, head(key(x), length(cols)))
     setattr(ans, 'sorted', if (haskey(x) && retain.key) cols else NULL)
+    if (unlock) setattr(ans, '.data.table.locked', NULL)
     ans
     # TODO: check/remove attributes for secondary keys?
 }
@@ -2255,13 +2465,7 @@ alloc.col <- function(DT, n=getOption("datatable.alloccol"), verbose=getOption("
 {
     name = substitute(DT)
     if (identical(name,quote(`*tmp*`))) stop("alloc.col attempting to modify `*tmp*`")
-    ans = .Call(Calloccolwrapper,DT,as.integer(eval(n)),verbose)
-    for (i in seq_along(ans)) {
-        # clear the same excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890).
-        setattr(ans[[i]],"names",NULL)
-        setattr(ans[[i]],"dim",NULL)
-        setattr(ans[[i]],"dimnames",NULL)
-    }
+    ans = .Call(Calloccolwrapper, DT, length(DT)+as.integer(eval(n)), verbose)
     if (is.name(name)) {
         name = as.character(name)
         assign(name,ans,parent.frame(),inherits=TRUE)
@@ -2291,9 +2495,15 @@ setattr <- function(x,name,value) {
         # Using setnames here so that truelength of names can be retained, to carry out integrity checks such as not
         # creating names longer than the number of columns of x, and to change the key, too
         # For convenience so that setattr(DT,"names",allnames) works as expected without requiring a switch to setnames.
-    else .Call(Csetattrib, x, name, value)
-    # If name=="names" and this is the first time names are assigned (e.g. in data.table()), this will be grown by alloc.col very shortly afterwards in the caller.
-
+    else {
+	# fix for R's global TRUE value input, #1281
+	ans = .Call(Csetattrib, x, name, value)
+	# If name=="names" and this is the first time names are assigned (e.g. in data.table()), this will be grown by alloc.col very shortly afterwards in the caller.
+	if (!is.null(ans)) {
+	    warning("Input is a length=1 logical that points to the same address as R's global TRUE value. Therefore the attribute has not been set by reference, rather on a copy. You will need to assign the result back to a variable. See https://github.com/Rdatatable/data.table/issues/1281 for more.")
+	    x = ans
+	}
+    }
     # fix for #1142 - duplicated levels for factors
     if (name == "levels" && is.factor(x) && anyDuplicated(value))
         .Call(Csetlevels, x, (value <- as.character(value)), unique(value))
@@ -2319,11 +2529,12 @@ setnames <- function(x,old,new) {
     } else {
         if (missing(old)) stop("When 'new' is provided, 'old' must be provided too")
         if (!is.character(new)) stop("'new' is not a character vector")
-        if (length(new)!=length(old)) stop("'old' is length ",length(old)," but 'new' is length ",length(new))
         if (is.numeric(old)) {
-            tt = old<1L | old>length(x) | is.na(old)
+            if (length(sgn <- unique(sign(old))) != 1L)
+                stop("'old' is numeric but has both positive and negative indices.")
+            tt = abs(old)<1L | abs(old)>length(x) | is.na(old)
             if (any(tt)) stop("Items of 'old' either NA or outside range [1,",length(x),"]: ",paste(old[tt],collapse=","))
-            i = as.integer(old)
+            i = if (sgn == 1L) as.integer(old) else seq_along(x)[as.integer(old)]
             if (any(duplicated(i))) stop("Some duplicates exist in 'old': ",paste(i[duplicated(i)],collapse=","))
         } else {
             if (!is.character(old)) stop("'old' is type ",typeof(old)," but should be integer, double or character")
@@ -2332,6 +2543,7 @@ setnames <- function(x,old,new) {
             if (any(is.na(i))) stop("Items of 'old' not found in column names: ",paste(old[is.na(i)],collapse=","))
             if (any(tt<-!is.na(chmatch(old,names(x)[-i])))) stop("Some items of 'old' are duplicated (ambiguous) in column names: ",paste(old[tt],collapse=","))
         }
+        if (length(new)!=length(i)) stop("'old' is length ",length(i)," but 'new' is length ",length(new))
     }
     # update the key if the column name being change is in the key
     m = chmatch(names(x)[i], key(x))
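The setnames() hunk above accepts negative numeric indices in `old`, resolved via `seq_along(x)[as.integer(old)]`. A sketch of the intended behaviour under those semantics:

```r
library(data.table)

DT = data.table(a = 1L, b = 2L, c = 3L)

# negative indices select the complement: rename all columns except the first
setnames(DT, old = -1L, new = c("B", "C"))
names(DT)  # "a" "B" "C"
```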
@@ -2389,7 +2601,7 @@ set <- function(x,i=NULL,j,value)  # low overhead, loopable
 }
 
 chmatch <- function(x,table,nomatch=NA_integer_)
-    .Call(Cchmatchwrapper,x,table,as.integer(nomatch),FALSE)
+    .Call(Cchmatchwrapper,x,table,as.integer(nomatch[1L]),FALSE) # [1L] to fix #1672
 
 "%chin%" <- function(x,table) {
     # TO DO  if table has 'ul' then match to that
@@ -2411,32 +2623,23 @@ chgroup <- function(x) {
 .rbind.data.table <- function(..., use.names=TRUE, fill=FALSE, idcol=NULL) {
     # See FAQ 2.23
     # Called from base::rbind.data.frame
-    l = list(...)
-    # if (missing(use.names)) message("Columns will be bound by name for consistency with base. You can supply unnamed lists and the columns will then be joined by position, or set use.names=FALSE. Alternatively, explicitly setting use.names to TRUE will remove this message.")
+    # fix for #1626.. because some packages (like psych) bind an input
+    # data.frame/data.table with a matrix..
+    l = lapply(list(...), function(x) if (is.list(x)) x else as.data.table(x))
     rbindlist(l, use.names, fill, idcol)
 }
 
 rbindlist <- function(l, use.names=fill, fill=FALSE, idcol=NULL) {
-    ans = .Call("Crbindlist", l, use.names, fill)
-    if (!length(ans)) return(null.data.table())
-    setDT(ans)
-    if (!is.null(idcol)) {
+    if (identical(idcol, FALSE)) idcol = NULL
+    else if (!is.null(idcol)) {
         if (isTRUE(idcol)) idcol = ".id"
-        if (!is.character(idcol))
-            stop("idcol must be a logical or character vector of length 1. If logical and 'TRUE' the id column will automatically named '.id'. Else the column will be named with the character value provided in 'idcol'.")
-        if (idcol %in% names(ans))
-            stop(idcol, " is already a column name in the result. Please provide another name for 'idcol'.")
-        nm  = names(l)
-        len = vapply(l, NROW, 0L)
-        idx = which(len > 0L)
-        len = len[idx]
-        if (is.null(nm)) nm = seq_along(len)
-        else nm = nm[idx]
-        ansnames = c(idcol, names(ans))
-        set(ans, j=idcol, value=rep.int(nm, len))
-        setcolorder(ans, ansnames)
+        if (!is.character(idcol)) stop("idcol must be a logical or character vector of length 1. If logical TRUE, the id column will be named '.id'.")
+        idcol = idcol[1L]
     }
-    ans
+    # fix for #1467, quotes result in "not resolved in current namespace" error
+    ans = .Call(Crbindlist, l, use.names, fill, idcol)
+    if (!length(ans)) return(null.data.table())
+    setDT(ans)[]
 }
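The id column is now attached at C level inside `Crbindlist`; the user-facing behaviour of `idcol` is unchanged. A hedged example:

```r
library(data.table)

l = list(first = data.table(x = 1:2), second = data.table(x = 3L))

# idcol adds a column recording which list element each row came from
ans = rbindlist(l, idcol = "src")
ans$src  # "first" "first" "second"
```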
 
 vecseq <- function(x,y,clamp) .Call(Cvecseq,x,y,clamp)
@@ -2534,6 +2737,10 @@ setDT <- function(x, keep.rownames=FALSE, key=NULL, check.names=FALSE) {
         x = null.data.table()
     } else if (is.list(x)) {
         # copied from as.data.table.list - except removed the copy
+	for (i in seq_along(x)) {
+	    if (inherits(x[[i]], "POSIXlt"))
+		stop("Column ", i, " is of POSIXlt type. Please convert it to POSIXct using as.POSIXct and run setDT again. We do not recommend use of POSIXlt at all because it uses 40 bytes to store one date.")
+	}
         n = vapply(x, length, 0L)
         mn = max(n)
         if (any(n<mn))
@@ -2585,33 +2792,75 @@ as_list <- function(x) {
     lx
 }
 
+# FR #1353
+rowid <- function(..., prefix=NULL) {
+    rowidv(list(...), prefix=prefix)
+}
+
+rowidv <- function(x, cols=seq_along(x), prefix=NULL) {
+    if (!is.null(prefix) && (!is.character(prefix) || length(prefix) != 1L))
+        stop("prefix must be NULL or a character vector of length=1.")
+    if (is.atomic(x)) {
+        if (!missing(cols) && !is.null(cols))
+            stop("x is a single vector, non-NULL 'cols' doesn't make sense.")
+        cols = 1L
+        x = as_list(x)
+    } else {
+        if (!length(cols))
+            stop("x is a list, 'cols' can not be 0-length.")
+        if (is.character(cols))
+            cols = chmatch(cols, names(x))
+        cols = as.integer(cols)
+    }
+    xorder = forderv(x, by=cols, sort=FALSE, retGrp=TRUE) # speedup on char with sort=FALSE
+    xstart = attr(xorder, 'starts')
+    if (!length(xorder)) xorder = seq_along(x[[1L]])
+    ids = .Call(Cfrank, xorder, xstart, uniqlengths(xstart, length(xorder)), "sequence")
+    if (!is.null(prefix))
+        ids = paste0(prefix, ids)
+    ids
+}
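rowid()/rowidv() (FR #1353) generate a within-group occurrence counter, reusing the `Cfrank` machinery with ties.method="sequence". For example:

```r
library(data.table)

x = c("a", "a", "b", "a")
rowid(x)                 # 1 2 1 3 : i-th occurrence within each group
rowid(x, prefix = "id")  # "id1" "id2" "id1" "id3"
```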
+
 # FR #686
-rleid <- function(...) {
-    rleidv(list(...))
+rleid <- function(..., prefix=NULL) {
+    rleidv(list(...), prefix=prefix)
 }
 
-rleidv <- function(x, cols=seq_along(x)) {
+rleidv <- function(x, cols=seq_along(x), prefix=NULL) {
+    if (!is.null(prefix) && (!is.character(prefix) || length(prefix) != 1L))
+        stop("prefix must be NULL or a character vector of length=1.")
     if (is.atomic(x)) {
         if (!missing(cols) && !is.null(cols)) 
-            stop("x is a single vector, non-NULL 'cols' doesn't make sense")
+            stop("x is a single vector, non-NULL 'cols' doesn't make sense.")
         cols = 1L
         x = as_list(x)
     } else {
         if (!length(cols))
-            stop("x is a list, 'cols' can not be 0-length")
+            stop("x is a list, 'cols' can not be 0-length.")
         if (is.character(cols)) 
             cols = chmatch(cols, names(x))
         cols = as.integer(cols)
     }
-    x = .shallow(x, cols) # shallow copy even if list..
-    .Call(Crleid, setDT(x), -1L)
+    ids = .Call(Crleid, x, cols)
+    if (!is.null(prefix)) ids = paste0(prefix, ids)
+    ids
 }
 
+# GForce functions
+`g[` <- function(x, n) .Call(Cgnthvalue, x, as.integer(n)) # n is of length=1 here.
+ghead <- function(x, n) .Call(Cghead, x, as.integer(n)) # n is not used at the moment
+gtail <- function(x, n) .Call(Cgtail, x, as.integer(n)) # n is not used at the moment
+gfirst <- function(x) .Call(Cgfirst, x)
+glast <- function(x) .Call(Cglast, x)
 gsum <- function(x, na.rm=FALSE) .Call(Cgsum, x, na.rm)
 gmean <- function(x, na.rm=FALSE) .Call(Cgmean, x, na.rm)
+gprod <- function(x, na.rm=FALSE) .Call(Cgprod, x, na.rm)
+gmedian <- function(x, na.rm=FALSE) .Call(Cgmedian, x, na.rm)
 gmin <- function(x, na.rm=FALSE) .Call(Cgmin, x, na.rm)
 gmax <- function(x, na.rm=FALSE) .Call(Cgmax, x, na.rm)
-gstart <- function(o, f, l) .Call(Cgstart, o, f, l)
+gvar <- function(x, na.rm=FALSE) .Call(Cgvar, x, na.rm)
+gsd <- function(x, na.rm=FALSE) .Call(Cgsd, x, na.rm)
+gstart <- function(o, f, l, rows) .Call(Cgstart, o, f, l, rows)
 gend <- function() .Call(Cgend)
 
 isReallyReal <- function(x) {
diff --git a/R/duplicated.R b/R/duplicated.R
index a139bc7..a57923f 100644
--- a/R/duplicated.R
+++ b/R/duplicated.R
@@ -1,9 +1,11 @@
 
-duplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...) {
+duplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...) {
     if (!cedta()) return(NextMethod("duplicated"))
     if (!identical(incomparables, FALSE)) {
         .NotYetUsed("incomparables != FALSE")
     }
+    if (missing(by) && isTRUE(getOption("datatable.old.unique.by.key")))  #1284
+        by = key(x)
     if (nrow(x) == 0L || ncol(x) == 0L) return(logical(0)) # fix for bug #5582
     if (is.na(fromLast) || !is.logical(fromLast)) stop("'fromLast' must be TRUE or FALSE")
     query <- .duplicated.helper(x, by)
@@ -25,10 +27,13 @@ duplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=key
     res
 }
 
-unique.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...) {
+unique.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...) {
     if (!cedta()) return(NextMethod("unique"))
+    if (missing(by) && isTRUE(getOption("datatable.old.unique.by.key")))  #1284
+        by = key(x)
     dups <- duplicated.data.table(x, incomparables, fromLast, by, ...)
-    .Call(CsubsetDT, x, which_(dups, FALSE), seq_len(ncol(x))) # more memory efficient version of which(!dups)
+    ans <- .Call(CsubsetDT, x, which_(dups, FALSE), seq_len(ncol(x))) # more memory efficient version of which(!dups)
+    if (nrow(x) != nrow(ans)) setindexv(ans, NULL)[] else ans #1760
     # i.e. x[!dups] but avoids [.data.table overhead when unique() is loop'd
     # TO DO: allow logical to be passed through to C level, and allow cols=NULL to mean all, for further speed gain.
     #        See news for v1.9.3 for link to benchmark use-case on datatable-help.
@@ -40,7 +45,7 @@ unique.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=key(x),
 ##
 ## This was dropped into a helper because initial implementation of
 ## unique.data.table and duplicated.data.table both needed this. However,
-## unique.data.table has bene refactored to simply call duplicated.data.table
+## unique.data.table has been refactored to simply call duplicated.data.table
 ## making the refactor unnecessary, but let's leave it here just in case
 .duplicated.helper <- function(x, by) {
     use.sub.cols <- !is.null(by) # && !isTRUE(by) # Fixing bug #5424
@@ -78,22 +83,36 @@ unique.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=key(x),
 # Note that base's anyDuplicated is faster than any(duplicated(.)) (for vectors) - for data.frames it still pastes before calling duplicated
 # In that sense, this anyDuplicated is *not* the same as base's - meaning it's not a different implementation
 # This is just a wrapper. That being said, it should be incredibly fast on data.tables (due to data.table's fast forder)
-anyDuplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...) {
+anyDuplicated.data.table <- function(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...) {
     if (!cedta()) return(NextMethod("anyDuplicated"))
+    if (missing(by) && isTRUE(getOption("datatable.old.unique.by.key")))  #1284
+        by = key(x)
     dups <- duplicated(x, incomparables, fromLast, by, ...)
     if (fromLast) idx = tail(which(dups), 1L) else idx = head(which(dups), 1L)
     if (!length(idx)) idx=0L
     idx
 }
 
-
 # simple straightforward helper function to get the number 
 # of groups in a vector or data.table. Here by data.table, 
 # we really mean `.SD` - used in a grouping operation
-uniqueN <- function(x, by = if (is.data.table(x)) key(x) else NULL) {
+# TODO: optimise uniqueN further with GForce.
+uniqueN <- function(x, by = if (is.list(x)) seq_along(x) else NULL, na.rm=FALSE) { # na.rm, #1455
+    if (missing(by) && is.data.table(x) && isTRUE(getOption("datatable.old.unique.by.key")))  #1284
+        by = key(x)
+    if (is.null(x)) return(0L)
     if (!is.atomic(x) && !is.data.frame(x))
-        return(length(unique(x)))
+        stop("x must be an atomic vector or a data.frame/data.table")
     if (is.atomic(x)) x = as_list(x)
     if (is.null(by)) by = seq_along(x)
-    length(attr(forderv(x, by=by, retGrp=TRUE), 'starts'))
+    o = forderv(x, by=by, retGrp=TRUE, na.last=if (!na.rm) FALSE else NA)
+    starts = attr(o, 'starts')
+    if (!na.rm) {
+        length(starts)
+    } else {
+        # TODO: internal efficient sum
+        # fix for #1771, account for already sorted input
+        sum( (if (length(o)) o[starts] else starts) != 0L)
+    }
 }
+
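uniqueN() now takes `na.rm` (#1455) and defaults `by` to all columns rather than the key (#1284). For instance:

```r
library(data.table)

x = c(1L, 2L, 2L, NA)
uniqueN(x)                # 3 : NA counts as one distinct value
uniqueN(x, na.rm = TRUE)  # 2
```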
diff --git a/R/fcast.R b/R/fcast.R
index 011b0f8..bb3fffc 100644
--- a/R/fcast.R
+++ b/R/fcast.R
@@ -24,6 +24,8 @@ check_formula <- function(formula, varnames, valnames) {
     vars = all.vars(formula)
     vars = vars[!vars %chin% c(".", "...")]
     allvars = c(vars, valnames)
+    if (any(allvars %in% varnames[duplicated(varnames)])) 
+      stop('data.table to cast must have unique column names')
     ans = deparse_formula(as.list(formula)[-1L], varnames, allvars)
 }
 
@@ -67,6 +69,7 @@ aggregate_funs <- function(funs, vals, sep="_", ...) {
             vals = replicate(length(funs), vals)
         else stop("When 'fun.aggregate' and 'value.var' are both lists, 'value.var' must be either of length =1 or =length(fun.aggregate).")
     }
+    only_one_fun = length(unlist(funs)) == 1L
     dots = list(...)
     construct_funs <- function(fun, val) {
         if (!is.list(fun)) fun = list(fun)
@@ -80,7 +83,8 @@ aggregate_funs <- function(funs, vals, sep="_", ...) {
                     expr = c(expr, dots)
                 ans[[k]] = as.call(expr)
                 # changed order of arguments here, #1153
-                nms[k] = paste(j, all.names(i, max.names=1L, functions=TRUE), sep=sep)
+                nms[k] = if (only_one_fun) j else 
+                            paste(j, all.names(i, max.names=1L, functions=TRUE), sep=sep)
                 k = k+1L;
             }
         }
@@ -92,9 +96,8 @@ aggregate_funs <- function(funs, vals, sep="_", ...) {
 
 dcast.data.table <- function(data, formula, fun.aggregate = NULL, sep = "_", ..., margins = NULL, subset = NULL, fill = NULL, drop = TRUE, value.var = guess(data), verbose = getOption("datatable.verbose")) {
     if (!is.data.table(data)) stop("'data' must be a data.table.")
-    if (anyDuplicated(names(data))) stop('data.table to cast must have unique column names')
-    drop = as.logical(drop[1])
-    if (is.na(drop)) stop("'drop' must be logical TRUE/FALSE")
+    drop = as.logical(rep(drop, length.out=2L))
+    if (any(is.na(drop))) stop("'drop' must be logical TRUE/FALSE")
     lvals = value_vars(value.var, names(data))
     valnames = unique(unlist(lvals))
     lvars = check_formula(formula, names(data), valnames)
@@ -177,14 +180,15 @@ dcast.data.table <- function(data, formula, fun.aggregate = NULL, sep = "_", ...
     if (length(rhsnames)) {
         lhs = shallow(dat, lhsnames); rhs = shallow(dat, rhsnames); val = shallow(dat, valnames)
         # handle drop=TRUE/FALSE - Update: Logic moved to R, AND faster than previous version. Take that... old me :-).
-        if (drop) {
+        if (all(drop)) {
             map = setDT(lapply(list(lhsnames, rhsnames), function(cols) frankv(dat, cols=cols, ties.method="dense")))
             maporder = lapply(map, order_)
             mapunique = lapply(seq_along(map), function(i) .Call(CsubsetVector, map[[i]], maporder[[i]]))
             lhs = .Call(CsubsetDT, lhs, maporder[[1L]], seq_along(lhs))
             rhs = .Call(CsubsetDT, rhs, maporder[[2L]], seq_along(rhs))
         } else {
-            lhs_ = cj_uniq(lhs); rhs_ = cj_uniq(rhs)
+            lhs_ = if (!drop[1L]) cj_uniq(lhs) else setkey(unique(lhs, by=names(lhs)))
+            rhs_ = if (!drop[2L]) cj_uniq(rhs) else setkey(unique(rhs, by=names(rhs)))
             map = vector("list", 2L)
             .Call(Csetlistelt, map, 1L, lhs_[lhs, which=TRUE])
             .Call(Csetlistelt, map, 2L, rhs_[rhs, which=TRUE])
@@ -196,10 +200,11 @@ dcast.data.table <- function(data, formula, fun.aggregate = NULL, sep = "_", ...
         }
         maplen = sapply(mapunique, length)
         idx = do.call("CJ", mapunique)[map, I := .I][["I"]] # TO DO: move this to C and avoid materialising the Cross Join.
-        ans = .Call("Cfcast", lhs, val, maplen[[1L]], maplen[[2L]], idx, fill, fill.default, is.null(fun.call))
+        ans = .Call(Cfcast, lhs, val, maplen[[1L]], maplen[[2L]], idx, fill, fill.default, is.null(fun.call))
         allcols = do.call("paste", c(rhs, sep=sep))
         if (length(valnames) > 1L)
-            allcols = do.call("paste", c(CJ(valnames, allcols, sorted=FALSE), sep=sep))
+            allcols = do.call("paste", if (identical(".", allcols)) list(valnames, sep=sep) 
+                        else c(CJ(valnames, allcols, sorted=FALSE), sep=sep))
             # removed 'setcolorder()' here, #1153
         setattr(ans, 'names', c(lhsnames, allcols))
         setDT(ans); setattr(ans, 'sorted', lhsnames)
@@ -217,7 +222,7 @@ dcast.data.table <- function(data, formula, fun.aggregate = NULL, sep = "_", ...
             lhs_ = cj_uniq(lhs)
             idx = lhs_[lhs, I := .I][["I"]]
             lhs_[, I := NULL]
-            ans = .Call("Cfcast", lhs_, val, nrow(lhs_), 1L, idx, fill, fill.default, is.null(fun.call))
+            ans = .Call(Cfcast, lhs_, val, nrow(lhs_), 1L, idx, fill, fill.default, is.null(fun.call))
             setDT(ans); setattr(ans, 'sorted', lhsnames)
             setnames(ans, c(lhsnames, valnames))
         }
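With `drop` recycled to length 2 in dcast.data.table(), empty combinations can be kept on the row and column dimensions independently. A sketch with hypothetical data; the unused factor level shows the retained row:

```r
library(data.table)

DT = data.table(a = factor(c("x", "y"), levels = c("x", "y", "z")),
                b = c("p", "q"), v = 1:2)

# keep unused row levels (drop[1]=FALSE) but drop unobserved columns (drop[2]=TRUE)
wide = dcast(DT, a ~ b, value.var = "v", drop = c(FALSE, TRUE))
nrow(wide)  # 3 : the empty level "z" survives as a row of NAs
```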
diff --git a/R/fmelt.R b/R/fmelt.R
index 348c9f2..1746c4f 100644
--- a/R/fmelt.R
+++ b/R/fmelt.R
@@ -6,8 +6,11 @@ melt <- function(data, ..., na.rm = FALSE, value.name = "value") {
       reshape2::melt(data, ..., na.rm=na.rm, value.name=value.name)
 }
 
-patterns <- function(...) {
+patterns <- function(..., cols=character(0)) {
     p = unlist(list(...), use.names=FALSE)
+    if (!is.character(p))
+        stop("Input patterns must be of type character.")
+    lapply(p, grep, cols)
 }
 
 melt.data.table <- function(data, id.vars, measure.vars, variable.name = "variable", 
@@ -18,22 +21,28 @@ melt.data.table <- function(data, id.vars, measure.vars, variable.name = "variab
     if (missing(measure.vars)) measure.vars = NULL
     measure.sub = substitute(measure.vars)
     if (is.call(measure.sub) && measure.sub[[1L]] == "patterns") {
-        measure.vars = lapply(eval(measure.sub), grep, names(data))    
+        measure.sub = as.list(measure.sub)[-1L]
+        idx = which(names(measure.sub) %in% "cols")
+        if (length(idx)) {
+            cols = eval(measure.sub[["cols"]], parent.frame())
+            measure.sub = measure.sub[-idx]
+        } else cols = names(data)
+        pats = lapply(measure.sub, eval, parent.frame())
+        measure.vars = patterns(pats, cols=cols)
     }
-    if (is.list(measure.vars)) {
+    if (is.list(measure.vars) && length(measure.vars) > 1L) {
         if (length(value.name) == 1L)  
           value.name = paste(value.name, seq_along(measure.vars), sep="")
     }
-    ans <- .Call("Cfmelt", data, id.vars, measure.vars, 
+    ans <- .Call(Cfmelt, data, id.vars, measure.vars, 
             as.logical(variable.factor), as.logical(value.factor), 
             variable.name, value.name, as.logical(na.rm), 
             as.logical(verbose))
     setDT(ans)
     if (any(duplicated(names(ans)))) {
-        cat("Duplicate column names found in molten data.table. Setting unique names using 'make.names'")   
+        cat("Duplicate column names found in molten data.table. Setting unique names using 'make.names'\n")
         setnames(ans, make.unique(names(ans)))
     }
     setattr(ans, 'sorted', NULL)
     ans
 }
-
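patterns() now takes an explicit `cols` argument and is evaluated specially inside melt(); multiple patterns yield multiple value columns. For example:

```r
library(data.table)

DT = data.table(id = 1:2, a1 = 3:4, a2 = 5:6, b1 = 7:8, b2 = 9:10)

# one measure group per regex; value.name expands to value1/value2
m = melt(DT, id.vars = "id", measure.vars = patterns("^a", "^b"))
names(m)  # "id" "variable" "value1" "value2"
```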
diff --git a/R/foverlaps.R b/R/foverlaps.R
index 9f955b6..80fa573 100644
--- a/R/foverlaps.R
+++ b/R/foverlaps.R
@@ -45,6 +45,10 @@ foverlaps <- function(x, y, by.x = if (!is.null(key(x))) key(x) else key(y), by.
         stop("Duplicate columns are not allowed in overlap joins. This may change in the future.")
     if (length(by.x) != length(by.y))
         stop("length(by.x) != length(by.y). Columns specified in by.x should correspond to columns specified in by.y and should be of same lengths.")
+    if (any(dup.x<-duplicated(names(x)))) #1730 - handling the join is possible but would require further setcolorder workarounds; it is better to just rename the duplicated column
+        stop("x has some duplicated column name(s): ",paste(unique(names(x)[dup.x]),collapse=","),". Please remove or rename the duplicate(s) and try again.")
+    if (any(dup.y<-duplicated(names(y))))
+        stop("y has some duplicated column name(s): ",paste(unique(names(y)[dup.y]),collapse=","),". Please remove or rename the duplicate(s) and try again.")
     
     xnames = by.x; xintervals = tail(xnames, 2L);
     ynames = by.y; yintervals = tail(ynames, 2L);
@@ -113,7 +117,7 @@ foverlaps <- function(x, y, by.x = if (!is.null(key(x))) key(x) else key(y), by.
     matches <- function(ii, xx, del, ...) {
         cols = setdiff(names(xx), del)
         xx = shallow(xx, cols)
-        ans = bmerge(xx, ii, seq_along(xx), seq_along(xx), haskey(xx), integer(0), verbose=verbose, ...)
+	ans = bmerge(xx, ii, seq_along(xx), seq_along(xx), haskey(xx), integer(0), mult=mult, ops=rep(1L, length(xx)), integer(0), 1L, verbose=verbose, ...)
         # vecseq part should never run here, but still...
         if (ans$allLen1) ans$starts else vecseq(ans$starts, ans$lens, NULL)
     }
diff --git a/R/frank.R b/R/frank.R
new file mode 100644
index 0000000..b02b1cf
--- /dev/null
+++ b/R/frank.R
@@ -0,0 +1,93 @@
+frankv <- function(x, cols=seq_along(x), order=1L, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min", "dense")) {
+    ties.method = match.arg(ties.method)
+    if (!length(na.last)) stop('length(na.last) = 0')
+    if (length(na.last) != 1L) {
+        warning("length(na.last) > 1, only the first element will be used")
+        na.last = na.last[1L]
+    }
+    keep = (na.last == "keep")
+    na.last = as.logical(na.last)
+    as_list <- function(x) {
+        xx = vector("list", 1L)
+        .Call(Csetlistelt, xx, 1L, x)
+        xx
+    }
+    if (is.atomic(x)) {
+        if (!missing(cols) && !is.null(cols)) 
+            stop("x is a single vector, non-NULL 'cols' doesn't make sense")
+        cols = 1L
+        x = as_list(x)
+    } else {
+        if (!length(cols))
+            stop("x is a list, 'cols' can not be 0-length")
+        if (is.character(cols)) 
+            cols = chmatch(cols, names(x))
+        cols = as.integer(cols)
+    }
+    x = .shallow(x, cols) # shallow copy even if list..
+    setDT(x)
+    cols = seq_along(cols)
+    if (is.na(na.last)) {
+        set(x, j = "..na_prefix..", value = is_na(x, cols))
+        order = if (length(order) == 1L) c(1L, rep(order, length(cols))) else c(1L, order)
+        cols = c(ncol(x), cols)
+        nas  = x[[ncol(x)]]
+    }
+    if (ties.method == "random") {
+        set(x, i = if (is.na(na.last)) which_(nas, FALSE) else NULL, 
+               j = "..stats_runif..", 
+               value = stats::runif(nrow(x)))
+        order = if (length(order) == 1L) c(rep(order, length(cols)), 1L) else c(order, 1L)
+        cols = c(cols, ncol(x))
+    }
+    xorder  = forderv(x, by=cols, order=order, sort=TRUE, retGrp=TRUE, 
+                na.last=if (identical(na.last, FALSE)) na.last else TRUE)
+    xstart  = attr(xorder, 'starts')
+    xsorted = FALSE
+    if (!length(xorder)) {
+        xsorted = TRUE
+        xorder  = seq_along(x[[1L]])
+    }
+    ans = switch(ties.method, 
+           average = , min = , max =, dense = {
+               rank = .Call(Cfrank, xorder, xstart, uniqlengths(xstart, length(xorder)), ties.method)
+           },
+           first = , random = {
+               if (xsorted) xorder else forderv(xorder)
+           }
+         )
+    # take care of na.last="keep"
+    V1 = NULL # for R CMD CHECK warning
+    if (isTRUE(keep)) {
+        ans = (setDT(as_list(ans))[which_(nas, TRUE), V1 := NA])[[1L]]
+    } else if (is.na(na.last)) {
+        ans = ans[which_(nas, FALSE)]
+    }
+    ans
+}
+
+frank <- function(x, ..., na.last=TRUE, ties.method=c("average", "first", "random", "max", "min", "dense")) {
+    cols = substitute(list(...))[-1]
+    if (identical(as.character(cols), "NULL")) {
+        cols  = NULL
+        order = 1L
+    } else if (length(cols)) {
+        cols=as.list(cols)
+        order=rep(1L, length(cols))
+        for (i in seq_along(cols)) {
+            v=as.list(cols[[i]])
+            if (length(v) > 1 && v[[1L]] == "+") v=v[[-1L]]
+            else if (length(v) > 1 && v[[1L]] == "-") {
+                v=v[[-1L]]
+                order[i] = -1L
+            }
+            cols[[i]]=as.character(v)
+        }
+        cols=unlist(cols, use.names=FALSE)
+    } else {
+        cols=colnames(x)
+        order=if (is.null(cols)) 1L else rep(1L, length(cols))
+    }
+    frankv(x, cols=cols, order=order, na.last=na.last, ties.method=ties.method)
+
+}
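frankv()/frank() above implement a fast rank() analogue with the usual ties methods plus "dense". A few sketches:

```r
library(data.table)

frank(c(3, 1, 2))                            # 3 1 2
frank(c(10, 10, 20), ties.method = "dense")  # 1 1 2
frankv(c(2, NA, 1), na.last = "keep")        # 2 NA 1
```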
diff --git a/R/fread.R b/R/fread.R
index 7cf55d4..8efb633 100644
--- a/R/fread.R
+++ b/R/fread.R
@@ -1,13 +1,15 @@
 
-fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.strings="NA",stringsAsFactors=FALSE,verbose=getOption("datatable.verbose"),autostart=1L,skip=0L,select=NULL,drop=NULL,colClasses=NULL,integer64=getOption("datatable.integer64"),dec=if (sep!=".") "." else ",", col.names, check.names=FALSE, encoding="unknown", strip.white=TRUE, showProgress=getOption("datatable.showProgress"),data.table=getOption("datatable.fread.datatable")) {
+fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.strings="NA",file,stringsAsFactors=FALSE,verbose=getOption("datatable.verbose"),autostart=1L,skip=0L,select=NULL,drop=NULL,colClasses=NULL,integer64=getOption("datatable.integer64"),dec=if (sep!=".") "." else ",", col.names, check.names=FALSE, encoding="unknown", quote="\"", strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL, showProgress=getOption("datatable.showProgress"),data.table=getOption("data [...]
+{    
     if (!is.character(dec) || length(dec)!=1L || nchar(dec)!=1) stop("dec must be a single character e.g. '.' or ','")
     # handle encoding, #563
-    if (!encoding %in% c("unknown", "UTF-8", "Latin-1")) {
+    if (length(encoding) != 1L || !encoding %in% c("unknown", "UTF-8", "Latin-1")) {
         stop("Argument 'encoding' must be 'unknown', 'UTF-8' or 'Latin-1'.")
     }
-    if (!strip.white %in% c(TRUE, FALSE)) {
-        stop("Argument 'strip.white' must be logical TRUE/FALSE")
-    }
+    isLOGICAL = function(x) isTRUE(x) || identical(FALSE, x)
+    stopifnot( isLOGICAL(strip.white), isLOGICAL(blank.lines.skip), isLOGICAL(fill), isLOGICAL(showProgress),
+               isLOGICAL(stringsAsFactors), isLOGICAL(verbose), isLOGICAL(check.names) )
+    
     if (getOption("datatable.fread.dec.experiment") && Sys.localeconv()["decimal_point"] != dec) {
         oldlocale = Sys.getlocale("LC_NUMERIC")
         if (verbose) cat("dec='",dec,"' but current locale ('",oldlocale,"') has dec='",Sys.localeconv()["decimal_point"],"'. Attempting to change locale to one that has the desired decimal point.\n",sep="")
@@ -35,7 +37,7 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
                 }
                 if (toupper(tt)!=toupper(i)) {
                     warning(cmd, " returned '",tt,"' != '",i,"' (not NULL not '' and allowing for case differences). This may not be a problem but please report.")
-                } 
+                }
                 if (Sys.localeconv()["decimal_point"] == dec) break
                 if (verbose) cat("Successfully changed locale but it provides dec='",Sys.localeconv()["decimal_point"],"' not the desired dec", sep="")
             }
@@ -45,6 +47,12 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
         }
         if (verbose) cat("This R session's locale is now '",tt,"' which provides the desired decimal point for reading numerics in the file - success! The locale will be restored to what it was ('",oldlocale,") even if the function fails for other reasons.\n")
     }
+    # map file as input
+    if (!missing(file)) {
+        if (!identical(input, "")) stop("You can provide 'input' or 'file', not both.")
+        if (!file.exists(file)) stop(sprintf("Provided file '%s' does not exist.", file))
+        input = file
+    }
 
     is_url <- function(x) grepl("^(http|ftp)s?://", x)
     is_secureurl <- function(x) grepl("^(http|ftp)s://", x)
@@ -56,7 +64,12 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
         on.exit(unlink(tt), add = TRUE)
         # In text mode on Windows-only, R doubles up \r to make \r\r\n line endings. mode="wb" avoids that. See ?connections:"CRLF"
         if (!is_secureurl(input)) {
-            download.file(input, tt, mode = "wb", quiet = !showProgress)
+            #1668 - force "auto" when is_file to
+            #  ensure we don't use an invalid option, e.g. wget
+            method <- if (is_file(input)) "auto" else 
+                getOption("download.file.method", default = "auto")
+            download.file(input, tt, method = method,
+                          mode = "wb", quiet = !showProgress)
         } else {
             if (!requireNamespace("curl", quietly = TRUE))
                 stop("Input URL requires https:// connection for which fread() requires 'curl' package, but cannot be found. Please install the package using 'install.packages()'.")
@@ -65,6 +78,8 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
         input = tt
     } else if (input == "" || length(grep('\\n|\\r', input)) > 0) {
         # text input
+    } else if (isTRUE(file.info(input)$isdir)) { # fix for #989, dir.exists() requires v3.2+
+        stop("'input' cannot be a directory name, but must be a single character string containing a file name, a command, full path to a file, a URL starting 'http[s]://', 'ftp[s]://' or 'file://', or the input data itself.")
     } else if (!file.exists(input)) {
         if (length(grep(' ', input)) == 0) stop("File '",input,"' does not exist. Include one or more spaces to consider the input a system command.")
         tt = tempfile()
@@ -81,8 +96,8 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
     }
     if (identical(header,"auto")) header=NA
     if (identical(sep,"auto")) sep=NULL
-    if (is.atomic(colClasses) && !is.null(names(colClasses))) colClasses = tapply(names(colClasses),colClasses,c,simplify=FALSE)
-    ans = .Call(Creadfile,input,sep,as.integer(nrows),header,na.strings,verbose,as.integer(autostart),skip,select,drop,colClasses,integer64,dec,encoding,strip.white,as.integer(showProgress))
+    if (is.atomic(colClasses) && !is.null(names(colClasses))) colClasses = tapply(names(colClasses),colClasses,c,simplify=FALSE) # named vector handling
+    ans = .Call(Creadfile,input,sep,as.integer(nrows),header,na.strings,verbose,as.integer(autostart),skip,select,drop,colClasses,integer64,dec,encoding,quote,strip.white,blank.lines.skip,fill,showProgress)
     nr = length(ans[[1]])
     if ( integer64=="integer64" && !exists("print.integer64") && any(sapply(ans,inherits,"integer64")) )
         warning("Some columns have been read as type 'integer64' but package bit64 isn't loaded. Those columns will display as strange looking floating point data. There is no need to reload the data. Just require(bit64) to obtain the integer64 print method and print the data again.")
@@ -94,28 +109,58 @@ fread <- function(input="",sep="auto",sep2="auto",nrows=-1L,header="auto",na.str
     } else {
         setattr(ans, "class", "data.frame")
     }
-    if (isTRUE(as.logical(check.names))) {
-        setattr(ans, 'names', make.unique(names(ans)))
+    # #1027, make.unique -> make.names as spotted by @DavidArenberg
+    if (check.names) {
+        setattr(ans, 'names', make.names(names(ans), unique=TRUE))
     }
-    as_factor <- function(x) {
-        lev = forderv(x, retGrp = TRUE)
-        # get levels, also take care of all sorted condition
-        if (length(lev)) lev = x[lev[attributes(lev)$starts]]
-        else lev = x[attributes(lev)$starts]
-        ans = chmatch(x, lev)
-        setattr(ans, 'levels', lev)
-        setattr(ans, 'class', 'factor')
-    }
-    if (isTRUE(as.logical(stringsAsFactors))) {
+    cols = NULL
+    if (stringsAsFactors)
         cols = which(vapply(ans, is.character, TRUE))
-        if (length(cols)) {
-            if (verbose) cat("Converting column(s) [", paste(names(ans)[cols], collapse = ", "), "] from 'char' to 'factor'\n", sep = "")
-            for (j in cols) 
-                set(ans, j = j, value = as_factor(.subset2(ans, j)))
+    else if (length(colClasses)) {
+        if (is.list(colClasses) && "factor" %in% names(colClasses))
+            cols = colClasses[["factor"]]
+        else if (is.character(colClasses) && "factor" %chin% colClasses)
+            cols = which(colClasses=="factor")
+    }
+    setfactor(ans, cols, verbose)
+    if (!missing(select)) {
+        # fix for #1445
+        if (is.numeric(select)) {
+            reorder = if (length(o <- forderv(select))) o else seq_along(select)
+        } else {
+            reorder = select[select %chin% names(ans)]
+            # any missing columns are warned about in fread.c and skipped
         }
+        setcolorder(ans, reorder)
     }
     # FR #768
     if (!missing(col.names))
         setnames(ans, col.names) # setnames checks and errors automatically
+    if (!is.null(key) && data.table) {
+        if (!is.character(key))
+            stop("'key' argument of fread() must be a character vector")
+        if (length(key) == 1L) {
+            key = strsplit(key, split = ",")[[1L]]
+        }
+        setkeyv(ans, key)
+    }
     ans
 }
+
+# for internal use only. Used in `fread` and `data.table` for 'stringsAsFactors' argument
+setfactor <- function(x, cols, verbose) {
+    # simplified but faster version of `factor()` for internal use.
+    as_factor <- function(x) {
+        lev = forderv(x, retGrp = TRUE, na.last = NA)
+        # get levels, also take care of all sorted condition
+        lev = if (length(lev)) x[lev[attributes(lev)$starts]] else x[attributes(lev)$starts]
+        ans = chmatch(x, lev)
+        setattr(ans, 'levels', lev)
+        setattr(ans, 'class', 'factor')
+    }
+    if (length(cols)) {
+        if (verbose) cat("Converting column(s) [", paste(names(x)[cols], collapse = ", "), "] from 'char' to 'factor'\n", sep = "")
+        for (j in cols) set(x, j = j, value = as_factor(.subset2(x, j)))
+    }
+    invisible(x)
+}
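The setfactor helper above builds factor codes by matching each value against its sorted unique levels (forderv to find the levels, chmatch to map values onto them, all at C speed). A minimal sketch of the same idea, in hypothetical Python rather than data.table's internals:

```python
def as_factor(values):
    # Levels are the sorted unique values; each code is the 1-based
    # position of the value within the levels, mirroring chmatch() above.
    levels = sorted(set(values))
    index = {v: i + 1 for i, v in enumerate(levels)}
    codes = [index[v] for v in values]
    return codes, levels

codes, levels = as_factor(["b", "a", "b", "c"])
# codes == [2, 1, 2, 3]; levels == ["a", "b", "c"]
```

This also shows why the conversion is cheap: levels are computed once and every value becomes a small integer code.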
diff --git a/R/fwrite.R b/R/fwrite.R
new file mode 100644
index 0000000..90e2ffc
--- /dev/null
+++ b/R/fwrite.R
@@ -0,0 +1,56 @@
+fwrite <- function(x, file="", append=FALSE, quote="auto",
+                   sep=",", sep2=c("","|",""), eol=if (.Platform$OS.type=="windows") "\r\n" else "\n",
+                   na="", dec=".", row.names=FALSE, col.names=TRUE,
+                   qmethod=c("double","escape"),
+                   logicalAsInt=FALSE, dateTimeAs = c("ISO","squash","epoch","write.csv"),
+                   buffMB=8, nThread=getDTthreads(),
+                   showProgress = getOption("datatable.showProgress"),
+                   verbose = getOption("datatable.verbose"),
+                   ..turbo=TRUE) {
+    isLOGICAL = function(x) isTRUE(x) || identical(FALSE, x)  # it seems there is no isFALSE in R?
+    na = as.character(na[1L]) # fix for #1725
+    if (missing(qmethod)) qmethod = qmethod[1L]
+    if (missing(dateTimeAs)) dateTimeAs = dateTimeAs[1L]
+    else if (length(dateTimeAs)>1) stop("dateTimeAs must be a single string")
+    dateTimeAs = chmatch(dateTimeAs, c("ISO","squash","epoch","write.csv"))-1L
+    if (is.na(dateTimeAs)) stop("dateTimeAs must be 'ISO','squash','epoch' or 'write.csv'")
+    buffMB = as.integer(buffMB)
+    nThread = as.integer(nThread)
+    # write.csv default is 'double' so fwrite follows suit. write.table's default is 'escape'
+    # validate arguments
+    stopifnot(is.list(x), ncol(x) > 0L,
+        identical(quote,"auto") || identical(quote,FALSE) || identical(quote,TRUE),
+        is.character(sep) && length(sep)==1L && nchar(sep) == 1L,
+        is.character(sep2) && length(sep2)==3L && nchar(sep2[2L])==1L,
+        is.character(dec) && length(dec)==1L && nchar(dec) == 1L,
+        dec != sep,  # sep2!=dec and sep2!=sep checked at C level when we know if list columns are present 
+        is.character(eol) && length(eol)==1L,
+        length(qmethod) == 1L && qmethod %in% c("double", "escape"),
+        isLOGICAL(col.names), isLOGICAL(append), isLOGICAL(row.names),
+        isLOGICAL(verbose), isLOGICAL(showProgress), isLOGICAL(logicalAsInt),
+        length(na) == 1L, #1725, handles NULL or character(0) input
+        isLOGICAL(..turbo),
+        is.character(file) && length(file)==1 && !is.na(file),
+        length(buffMB)==1 && !is.na(buffMB) && 1<=buffMB && buffMB<=1024,
+        length(nThread)==1 && !is.na(nThread) && nThread>=1
+        )
+    file <- path.expand(file)  # "~/foo/bar"
+    if (append && missing(col.names) && (file=="" || file.exists(file)))
+        col.names = FALSE  # test 1658.16 checks this
+    if (!..turbo) warning("The ..turbo=FALSE option will be removed in future. Please report any problems with ..turbo=TRUE.")
+    if (identical(quote,"auto")) quote=NA  # logical NA
+    if (file=="") {
+        # console output (Rprintf) isn't thread safe.
+        # Perhaps more so on Windows (as experienced) than Linux
+        nThread=1L
+        showProgress=FALSE
+    }
+   
+    .Call(Cwritefile, x, file, sep, sep2, eol, na, dec, quote, qmethod=="escape", append,
+                      row.names, col.names, logicalAsInt, dateTimeAs, buffMB, nThread,
+                      showProgress, verbose, ..turbo)
+    invisible()
+}
+
+genLookups = function() invisible(.Call(CgenLookups))
+
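fwrite's qmethod argument follows write.csv ("double") rather than write.table ("escape"). A rough sketch of the two quoting methods (assumed semantics for illustration; the real work happens in the C writer):

```python
def quote_field(field, qmethod="double", sep=","):
    # Quote only when the field contains the separator, a quote or a newline;
    # "double" doubles embedded quotes, "escape" backslash-escapes them.
    if not any(ch in field for ch in (sep, '"', '\n')):
        return field
    if qmethod == "double":
        body = field.replace('"', '""')
    else:
        body = field.replace('\\', '\\\\').replace('"', '\\"')
    return '"' + body + '"'

quote_field('plain')          # unchanged: no quoting needed
quote_field('he said "hi"')   # embedded quotes doubled and field wrapped
```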
diff --git a/R/last.R b/R/last.R
index 368ce7d..be28760 100644
--- a/R/last.R
+++ b/R/last.R
@@ -4,10 +4,34 @@
 # We'd like last() on vectors to be fast, so that's a direct x[NROW(x)] as it was in data.table, otherwise use xts's.
 # If xts is loaded higher than data.table, xts::last will work but slower.
 last <- function(x, ...) {
-    if (nargs()==1L || !"package:xts" %in% search()) {
-        if (is.data.frame(x)) return(x[NROW(x),])
-        if (!length(x)) return(x) else return(x[[length(x)]])  # for vectors, [[ works like [
+    if (nargs()==1L) {
+        if (is.vector(x)) {
+            if (!length(x)) return(x) else return(x[[length(x)]])  # for vectors, [[ works like [
+        } else if (is.data.frame(x)) return(x[NROW(x),])
+    }
+    if(!requireNamespace("xts", quietly = TRUE)) {
+        tail(x, n = 1L, ...)
+    } else {
+        # fix with suggestion from Joshua, #1347
+        if (!"package:xts" %in% search()) {
+            tail(x, n = 1L, ...)
+        } else xts::last(x, ...) # UseMethod("last") doesn't find xts's methods, not sure what I did wrong.
     }
-    xts::last(x,...)   # UseMethod("last") doesn't find xts's methods, not sure what I did wrong.
 }
 
+# first(), similar to last(), not sure why this wasn't exported in the first place...
+first <- function(x, ...) {
+    if (nargs()==1L) {
+        if (is.vector(x)) {
+            if (!length(x)) return(x) else return(x[[1L]])
+        } else if (is.data.frame(x)) return(x[1L,])
+    }
+    if(!requireNamespace("xts", quietly = TRUE)) {
+        head(x, n = 1L, ...)
+    } else {
+        # fix with suggestion from Joshua, #1347
+        if (!"package:xts" %in% search()) {
+            head(x, n = 1L, ...)
+        } else xts::first(x, ...)
+    }
+}
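last() and first() above dispatch through vector, data.frame, then tail/head or xts. The core vector behaviour they fast-path can be sketched as (illustrative Python, ignoring the data.frame and xts branches):

```python
def first(x):
    # Empty input is returned as-is, matching the R code above.
    return x if len(x) == 0 else x[0]

def last(x):
    return x if len(x) == 0 else x[-1]
```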
diff --git a/R/merge.R b/R/merge.R
index cde76d2..3832be8 100644
--- a/R/merge.R
+++ b/R/merge.R
@@ -8,6 +8,7 @@ merge.data.table <- function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FA
             by = key(x)
         }
     }
+    if ((x0 <- length(x)==0L) | (y0 <- length(y)==0L)) warning(sprintf("You are trying to join data.tables where %s a 0-column data.table.", if(x0 & y0) "'x' and 'y' arguments are" else if(x0) "'x' argument is" else "'y' argument is"))
     if (any(duplicated(names(x)))) stop("x has some duplicated column name(s): ",paste(names(x)[duplicated(names(x))],collapse=","),". Please remove or rename the duplicate(s) and try again.")
     if (any(duplicated(names(y)))) stop("y has some duplicated column name(s): ",paste(names(y)[duplicated(names(y))],collapse=","),". Please remove or rename the duplicate(s) and try again.")
     
@@ -31,11 +32,12 @@ merge.data.table <- function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FA
         if (is.null(by)) 
             by = key(x)
         if (is.null(by))
-            stop("Can not match keys in x and y to automatically determine appropriate `by` parameter. Please set `by` value explicitly.")
+            by = intersect(names(x), names(y))
         if (length(by) == 0L || !is.character(by))
             stop("A non-empty vector of column names for `by` is required.")
         if (!all(by %in% intersect(colnames(x), colnames(y))))
             stop("Elements listed in `by` must be valid column names in x and y")
+        by = unname(by)
         by.x = by.y = by
     }
     # with i. prefix in v1.9.3, this goes away. Left here for now ...
@@ -50,7 +52,7 @@ merge.data.table <- function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FA
         end[chmatch(dupnames, end, 0L)] = paste(dupnames, suffixes[2L], sep="")
     }
 
-    dt = y[x,nomatch=ifelse(all.x,NA,0),on=by,allow.cartesian=allow.cartesian]   # includes JIS columns (with a i. prefix if conflict with x names)
+    dt = y[x,nomatch = if (all.x) NA else 0,on=by,allow.cartesian=allow.cartesian]   # includes JIS columns (with a i. prefix if conflict with x names)
 
     if (all.y && nrow(y)) {  # If y does not have any rows, no need to proceed
         # Perhaps not very commonly used, so not a huge deal that the join is redone here.
@@ -76,5 +78,9 @@ merge.data.table <- function(x, y, by = NULL, by.x = NULL, by.y = NULL, all = FA
     if (nrow(dt) > 0L) {
       setkeyv(dt, if (sort) by.x else NULL)
     }
+    # merge resets class, #1378. X[Y] is quite clear that X is being *subset* by Y, 
+    # makes sense to therefore retain X's class, unlike `merge`. Hard to tell what 
+    # class to retain for *full join* for example. 
+    setattr(dt, 'class', c("data.table", "data.frame"))
     dt
 }
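With the change above, merge.data.table falls back to the columns common to x and y (a natural join) when neither a key nor `by` is supplied, instead of erroring. A sketch of that default with a hypothetical helper:

```python
def default_by(x_cols, y_cols):
    # intersect(names(x), names(y)) preserving x's column order,
    # erroring when there is nothing to join on, as in the R code above.
    by = [c for c in x_cols if c in y_cols]
    if not by:
        raise ValueError("A non-empty vector of column names for `by` is required.")
    return by

default_by(["id", "a"], ["id", "b"])  # joins on ["id"]
```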
diff --git a/R/onAttach.R b/R/onAttach.R
index 95b0c84..ae2d481 100644
--- a/R/onAttach.R
+++ b/R/onAttach.R
@@ -1,8 +1,26 @@
 .onAttach <- function(libname, pkgname) {
-    # Runs when attached to search() path such as by library() or require()
-    if (interactive()) {
-        packageStartupMessage('data.table ',as.character(packageVersion("data.table")),'  For help type ?data.table or https://github.com/Rdatatable/data.table/wiki')
-        packageStartupMessage('The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way')
+  # Runs when attached to search() path such as by library() or require()
+  if (interactive()) {
+    v = packageVersion("data.table")
+    d = read.dcf(system.file("DESCRIPTION", package="data.table"), fields = c("Packaged", "Built"))
+    if(is.na(d[1])){
+      if(is.na(d[2])){
+        return() #neither field exists
+      } else{
+        d = unlist(strsplit(d[2], split="; "))[3]
+      }
+    } else {
+      d = d[1]
     }
+    dev = as.integer(v[1,3])%%2 == 1  # version number odd => dev
+    packageStartupMessage("data.table ", v, if(dev) paste0(" IN DEVELOPMENT built ", d))
+    if (dev && (Sys.Date() - as.Date(d))>28)
+        packageStartupMessage("**********\nThis development version of data.table was built more than 4 weeks ago. Please update.\n**********")
+    if (!.Call(ChasOpenMP))
+        packageStartupMessage("**********\nThis installation of data.table has not detected OpenMP support. It will still work but in single-threaded mode.\n**********")
+    packageStartupMessage('  The fastest way to learn (by data.table authors): https://www.datacamp.com/courses/data-analysis-the-data-table-way')
+    packageStartupMessage('  Documentation: ?data.table, example(data.table) and browseVignettes("data.table")')
+    packageStartupMessage('  Release notes, videos and slides: http://r-datatable.com')
+  }
 }
 
diff --git a/R/onLoad.R b/R/onLoad.R
index 734e162..3f7dc08 100644
--- a/R/onLoad.R
+++ b/R/onLoad.R
@@ -32,22 +32,30 @@
              "datatable.optimize"="Inf",             # datatable.<argument name>
              "datatable.print.nrows"="100L",         # datatable.<argument name>
              "datatable.print.topn"="5L",            # datatable.<argument name>
+             "datatable.print.class"="FALSE",        # for print.data.table
+             "datatable.print.rownames"="TRUE",      # for print.data.table
              "datatable.allow.cartesian"="FALSE",    # datatable.<argument name>
              "datatable.dfdispatchwarn"="TRUE",                   # not a function argument
              "datatable.warnredundantby"="TRUE",                  # not a function argument
-             "datatable.alloccol"="quote(max(100L,ncol(DT)+64L))",# argument 'n' of alloc.col. Allocate at least 64 spare slots by default. Needs to be 100L floor to save small object reallocs.
+             "datatable.alloccol"="1024L",           # argument 'n' of alloc.col. Over-allocate 1024 spare column slots
              "datatable.integer64"="'integer64'",    # datatable.<argument name>    integer64|double|character
-             "datatable.showProgress"="1L",          # in fread
+             "datatable.showProgress"="TRUE",        # in fread and fwrite
              "datatable.auto.index"="TRUE",          # DT[col=="val"] to auto add index so 2nd time faster
+             "datatable.use.index"="TRUE",           # global switch to address #1422
              "datatable.fread.datatable"="TRUE",
-             "datatable.old.bywithoutby"="FALSE",    # temp rollback method for code migration, will be removed in future
              "datatable.fread.dec.experiment"="TRUE", # temp.  will remove once stable
              "datatable.fread.dec.locale"=if (.Platform$OS.type=="unix") "'fr_FR.utf8'" else "'French_France.1252'",
-             "datatable.prettyprint.char" = NULL # FR #1091
+             "datatable.prettyprint.char" = NULL, # FR #1091
+             "datatable.old.unique.by.key" = "FALSE",  # TODO: warn 1 year, remove after 2 years
+             "datatable.WhenJisSymbolThenCallingScope" = "FALSE"   # TODO: warn (asking user to change to DT[,"someCol"] or DT[["someCol"]]), then change default, then remove.
              )
     for (i in setdiff(names(opts),names(options()))) {
         eval(parse(text=paste("options(",i,"=",opts[i],")",sep="")))
     }
+    
+    if (!is.null(getOption("datatable.old.bywithoutby")))
+        warning("Option 'datatable.old.bywithoutby' has been removed as warned for 2 years. It is now ignored. Please use by=.EACHI instead and stop using this option.")
+    
     # reshape2
     # Tried this :
     # if (!"package:reshape2" %in% search()) {
diff --git a/R/openmp-utils.R b/R/openmp-utils.R
new file mode 100644
index 0000000..ebc7066
--- /dev/null
+++ b/R/openmp-utils.R
@@ -0,0 +1,8 @@
+setDTthreads <- function(threads) {
+    invisible(.Call(CsetDTthreads, as.integer(threads)))
+}
+
+getDTthreads <- function() {
+    .Call(CgetDTthreads)
+}
+
diff --git a/R/setkey.R b/R/setkey.R
index 6be9e53..f229511 100644
--- a/R/setkey.R
+++ b/R/setkey.R
@@ -7,8 +7,18 @@ setkey <- function(x, ..., verbose=getOption("datatable.verbose"), physical=TRUE
     setkeyv(x, cols, verbose=verbose, physical=physical)
 }
 
-set2key <- function(...) setkey(..., physical=FALSE)
-set2keyv <- function(...) setkeyv(..., physical=FALSE)
+# FR #1442
+setindex <- function(...) setkey(..., physical=FALSE)
+setindexv <- function(...) setkeyv(..., physical=FALSE)
+
+set2key <- function(...) {
+    warning("set2key() will be deprecated in the next release. Please use setindex() instead.", call.=FALSE)
+    setkey(..., physical=FALSE)
+}
+set2keyv <- function(...) {
+    warning("set2keyv() will be deprecated in the next release. Please use setindexv() instead.", call.=FALSE)
+    setkeyv(..., physical=FALSE)
+}
 
 setkeyv <- function(x, cols, verbose=getOption("datatable.verbose"), physical=TRUE)
 {
@@ -71,10 +81,17 @@ setkeyv <- function(x, cols, verbose=getOption("datatable.verbose"), physical=TR
 
 key <- function(x) attr(x,"sorted",exact=TRUE)
 key2 <- function(x) {
+    warning("key2() will be deprecated in the next release. Please use indices() instead.", call.=FALSE)
+    ans = names(attributes(attr(x,"index",exact=TRUE)))
+    if (is.null(ans)) return(ans) # otherwise character() gets returned by next line
+    gsub("^__","",ans)
+}
+indices <- function(x) {
     ans = names(attributes(attr(x,"index",exact=TRUE)))
     if (is.null(ans)) return(ans) # otherwise character() gets returned by next line
     gsub("^__","",ans)
 }
+
 get2key <- function(x, col) attr(attr(x,"index",exact=TRUE),paste("__",col,sep=""),exact=TRUE)   # work in progress, not yet exported
 
 "key<-" <- function(x,value) {
@@ -206,10 +223,19 @@ forder <- function(x, ..., na.last=TRUE, decreasing=FALSE)
     o
 }
 
-fsort <- function(x, decreasing = FALSE, na.last = FALSE, ...)
+fsort <- function(x, decreasing = FALSE, na.last = FALSE, internal=FALSE, verbose=FALSE, ...)
 {
-    o = forderv(x, order=!decreasing, na.last=na.last)
-    return( if (length(o)) x[o] else x )   # TO DO: document the nice efficiency here
+    if (typeof(x)=="double" && !decreasing && !na.last) {
+      if (internal) stop("Internal code should not call fsort on type double")
+      return(.Call(Cfsort, x, verbose))
+    } else {
+      # fsort is now exported for testing. Trying to head off complaints "it's slow on integer"
+      # The only places we use fsort internally (3 calls, all on integer) have had internal=TRUE added for now.
+      # TODO: implement integer and character in Cfsort and remove this branch and warning
+      if (!internal) warning("Input is not a vector of type double. New parallel sort has only been done for double vectors so far. Invoking relatively inefficient sort using order first.")
+      o = forderv(x, order=!decreasing, na.last=na.last)
+      return( if (length(o)) x[o] else x )   # TO DO: document this shortcut for already-sorted
+    }
 }
 
 setorder <- function(x, ..., na.last=FALSE)
@@ -300,6 +326,7 @@ CJ <- function(..., sorted = TRUE, unique = FALSE)
     l = list(...)
     if (unique) l = lapply(l, unique)
 
+    dups = FALSE # fix for #1513
     # using rep.int instead of rep speeds things up considerably (but attributes are dropped).
     j = lapply(l, class)  # changed "vapply" to avoid errors with "ordered" "factor" input
     if (length(l)==1L && sorted && length(o <- forderv(l[[1L]])))
@@ -310,7 +337,11 @@ CJ <- function(..., sorted = TRUE, unique = FALSE)
         x = c(rev(take(cumprod(rev(n)))), 1L)
         for (i in seq_along(x)) {
             y = l[[i]]
-            if (sorted && length(o <- forderv(y))) y = y[o]
+            # fix for #1513
+            if (sorted) {
+                if (length(o <- forderv(y, retGrp=TRUE))) y = y[o]
+                if (!dups) dups = attr(o, 'maxgrpn') > 1L 
+            }
             if (i == 1L) 
                 l[[i]] = rep.int(y, times = rep.int(x[i], n[i]))   # i.e. rep(y, each=x[i])
             else if (i == length(n))
@@ -331,326 +362,9 @@ CJ <- function(..., sorted = TRUE, unique = FALSE)
         setattr(l, "names", vnames)
     }
     l <- alloc.col(l)  # a tiny bit wasteful to over-allocate a fixed join table (column slots only), doing it anyway for consistency, and it's possible a user may wish to use SJ directly outside a join and would expect consistent over-allocation.
-    if (sorted) setattr(l, 'sorted', names(l))
-    l
-}
-
-frankv <- function(x, cols=seq_along(x), order=1L, na.last=TRUE, ties.method=c("average", "first", "random", "max", "min", "dense")) {
-    ties.method = match.arg(ties.method)
-    if (!length(na.last)) stop('length(na.last) = 0')
-    if (length(na.last) != 1L) {
-        warning("length(na.last) > 1, only the first element will be used")
-        na.last = na.last[1L]
-    }
-    keep = (na.last == "keep")
-    na.last = as.logical(na.last)
-    as_list <- function(x) {
-        xx = vector("list", 1L)
-        .Call(Csetlistelt, xx, 1L, x)
-        xx
+    if (sorted) {
+        if (!dups) setattr(l, 'sorted', names(l)) 
+        else setkey(l) # fix #1513
     }
-    if (is.atomic(x)) {
-        if (!missing(cols) && !is.null(cols)) 
-            stop("x is a single vector, non-NULL 'cols' doesn't make sense")
-        cols = 1L
-        x = as_list(x)
-    } else {
-        if (!length(cols))
-            stop("x is a list, 'cols' can not be 0-length")
-        if (is.character(cols)) 
-            cols = chmatch(cols, names(x))
-        cols = as.integer(cols)
-    }
-    x = .shallow(x, cols) # shallow copy even if list..
-    setDT(x)
-    cols = seq_along(cols)
-    if (is.na(na.last)) {
-        set(x, j = "..na_prefix..", value = is_na(x, cols))
-        order = if (length(order) == 1L) c(1L, rep(order, length(cols))) else c(1L, order)
-        cols = c(ncol(x), cols)
-        nas  = x[[ncol(x)]]
-    }
-    if (ties.method == "random") {
-        set(x, i = if (is.na(na.last)) which_(nas, FALSE) else NULL, 
-               j = "..stats_runif..", 
-               value = stats::runif(nrow(x)))
-        order = if (length(order) == 1L) c(rep(order, length(cols)), 1L) else c(order, 1L)
-        cols = c(cols, ncol(x))
-    }
-    xorder  = forderv(x, by=cols, order=order, sort=TRUE, retGrp=TRUE, 
-                na.last=if (identical(na.last, FALSE)) na.last else TRUE)
-    xstart  = attr(xorder, 'starts')
-    xsorted = FALSE
-    if (!length(xorder)) {
-        xsorted = TRUE
-        xorder  = seq_along(x[[1L]])
-    }
-    ans = switch(ties.method, 
-           average = , min = , max =, dense = {
-               rank = .Call(Cfrank, xorder, xstart, uniqlengths(xstart, length(xorder)), ties.method)
-           },
-           first = , random = {
-               if (xsorted) xorder else forderv(xorder)
-           }
-         )
-    # take care of na.last="keep"
-    V1 = NULL # for R CMD CHECK warning
-    if (isTRUE(keep)) {
-        ans = (setDT(as_list(ans))[which_(nas, TRUE), V1 := NA])[[1L]]
-    } else if (is.na(na.last)) {
-        ans = ans[which_(nas, FALSE)]
-    }
-    ans
-}
-
-frank <- function(x, ..., na.last=TRUE, ties.method=c("average", "first", "random", "max", "min", "dense")) {
-    cols = substitute(list(...))[-1]
-    if (identical(as.character(cols), "NULL")) {
-        cols  = NULL
-        order = 1L
-    } else if (length(cols)) {
-        cols=as.list(cols)
-        order=rep(1L, length(cols))
-        for (i in seq_along(cols)) {
-            v=as.list(cols[[i]])
-            if (length(v) > 1 && v[[1L]] == "+") v=v[[-1L]]
-            else if (length(v) > 1 && v[[1L]] == "-") {
-                v=v[[-1L]]
-                order[i] = -1L
-            }
-            cols[[i]]=as.character(v)
-        }
-        cols=unlist(cols, use.names=FALSE)
-    } else {
-        cols=colnames(x)
-        order=if (is.null(cols)) 1L else rep(1L, length(cols))
-    }
-    frankv(x, cols=cols, order=order, na.last=na.last, ties.method=ties.method)
-
-}
-
-#########################################################################################
-# Deprecated ...
-#########################################################################################
-
-
-# nocov start
-# don't include functions not used for coverage
-bench <- function(quick=TRUE, testback=TRUE, baseline=FALSE) {
-    if (baseline) testback=FALSE  # when baseline return in fastorder.c is uncommented, baseline must be TRUE
-    # fastorder benchmark forwards vs backwards
-    
-    Levels=Rows=SubGroupN=rand.forw=rand.back=ordT.forw=ordT.back=ordB.forw=ordB.back=rev.forw=rev.back=NULL  
-    x=y=faster1=faster2=faster3=faster4=NULL  # to keep R CMD check quiet
-    
-    if (quick) {Sr = 1:3; Nr = 2:4} else {Sr = 1:5; Nr = 2:8}
-    ans = setkey(CJ(Levels=as.integer(10^Sr),Rows=as.integer(10^Nr)))
-    
-    # TO DO:  add a case   S1 : 1e7 levels    S2 : 1:3   Every level has 3 rows.   1e7 calls to iradix
-    #   Sr:  c(1:3, 10^(1:7))    CJ(Sr,Sr) all combinations
-
-    # fastorder (backwards) doesn't call isSortedList anymore (we removed that at C level). It now proceeds as if
-    # unsorted always. This is always more favourable to fastorder timings, unless, the data is actually perfectly
-    # sorted. Hence we no longer test perfectly ordered here, as that was just testing isSortedList anyway. ordT
-    # and ordB tests dominate.
-
-    ans[, SubGroupN:=format(as.integer(ceiling(Rows/Levels)), big.mark=",")]
-    ans[,Rows:=format(Rows,big.mark=",")]
-    ans[,Levels:=format(Levels,big.mark=",")]
-    ident <- function(x,y) if (length(x)==0) length(y)==0 || identical(y,seq_along(y)) else identical(x,y)
-    IS.SORTED <- function(...)suppressWarnings(is.sorted(...))  # just needed here to ensure the test data construction is working
-    for (i in 1:nrow(ans)) {
-        ttype = c("user.self","sys.self")  # elapsed can sometimes be >> user.self+sys.self. TO DO: three repeats as well.
-        tol = 0.5                          # we don't mind about 0.5s; benefits when almost sorted outweigh
-        S = ans[i,as.integer(gsub(",","",Levels))]
-        N = ans[i,as.integer(gsub(",","",Rows))]
-        DT = setDT(lapply(1:2, function(x){sample(S,N,replace=TRUE)}))
-        
-        if (testback || baseline) ans[i, rand.back := sum(system.time(y<<-fastorder(DT, 1:2))[ttype])]
-        # in baseline mode, Cforder doesn't order, so y is needed to test baseline on ordered DT
-        ans[i, rand.forw := sum(system.time(x<<-forderv(DT))[ttype])]
-        if (testback) ans[i, faster1 := rand.forw<rand.back+tol]
-        if (testback) if (!ident(x,y)) browser()
-        
-        .Call(Creorder,DT, if (baseline) y else x)  # in baseline mode, x is deliberately wrong. And if testback=FALSE, we won't have y
-        if (!IS.SORTED(DT)) stop("Logical error: ordered table is not sorted according to IS.SORTED!")
-        if (baseline) ans[, rand.back := NULL]
-        
-        if (FALSE) {
-          # don't test perfectly ordered case. See note above.
-          if (testback) ans[i, ord.back := sum(system.time(y<<-fastorder(DT, 1:2))[ttype])]
-          ans[i, ord.forw := sum(system.time(x<<-forderv(DT))[ttype])]
-          if (testback) ans[i, faster2 := ord.forw<ord.back+tol]
-          if (testback) if (!ident(x,y)) browser()
-        }
-        
-        if (DT[[1]][1] == DT[[1]][2]) v = 2 else v = 1  # change column 2 if rows 1 and 2 are in the same group by column 1; otherwise change column 1
-        old = DT[[v]][1:2]
-        DT[1:2, (v):=77:76]   # unsorted near the top to trigger full sort, is.sorted detects quickly.
-        if (IS.SORTED(DT)) stop("Table is sorted. Change to the top didn't work.")
-        
-        if (testback) ans[i, ordT.back := sum(system.time(y<<-fastorder(DT, 1:2))[ttype])]   # T for Top
-        ans[i, ordT.forw := sum(system.time(x<<-forderv(DT))[ttype])]
-        if (testback) ans[i, faster2 := ordT.forw<ordT.back+tol]
-        if (testback) if (!ident(x,y)) browser()
-
-        DT[1:2, (v):=old]          # undo the change at the top to make it sorted again
-        if (!IS.SORTED(DT)) stop("Logical error: reverting the small change at the top didn't make DT ordered again")
-        r = c(nrow(DT)-1, nrow(DT))
-        if (DT[[1]][r[1]] == DT[[1]][r[2]]) v = 2 else v = 1
-        old = DT[[v]][r]
-        DT[r, (v):=77:76]    # unsorted near the very end, so is.sorted does full scan.
-        if (IS.SORTED(DT)) stop("Table is sorted. Change to the very bottom didn't work.")
-        
-        if (testback) ans[i, ordB.back := sum(system.time(y<<-fastorder(DT, 1:2))[ttype])]   # B for Bottom
-        ans[i, ordB.forw := sum(system.time(x<<-forderv(DT))[ttype])]
-        if (testback) ans[i, faster3 := ordB.forw<ordB.back+tol]
-        if (testback) if (!ident(x,y)) browser()
-        
-        DT[r, (v):=old]          # undo the change at the bottom to make it sorted again
-        if (!IS.SORTED(DT)) stop("Logical error: reverting the small change at the bottom didn't make DT ordered again")
-        
-        .Call(Creorder,DT,nrow(DT):1)   # Perfect reverse order, some sort algorithms' worst case e.g. O(n^2)
-        if (IS.SORTED(DT)) stop("Logical error: reverse order of table is sorted according to IS.SORTED!")
-        # Adding this test revealed a subtlety: a reverse-ordered vector containing ties would not be stable if simply reversed. isorted was fixed so that -1 is returned only for strictly decreasing order.
-        
-        if (testback) ans[i, rev.back := sum(system.time(y<<-fastorder(DT, 1:2))[ttype])]   # rev = reverse order
-        ans[i, rev.forw := sum(system.time(x<<-forderv(DT))[ttype])]
-        if (testback) ans[i, faster4 := rev.forw<rev.back+tol]
-        if (testback) if (!ident(x,y)) browser()
-        
-        if (i==nrow(ans) || ans[i+1,Levels]!=ans[i,Levels]) print(ans[Levels==Levels[i]])  # print each block as we go along
-    }
-    cat("\nFinished.\n\n")
-    ans
-}
-
-# radixorder1 is used internally, only with fastorder
-# so adding a new argument is okay. Added 'o' for the order vector.
-radixorder1 <- function(x, o=NULL) {
-    if(is.object(x)) x = xtfrm(x) # should take care of handling factors, Date's and others, so we don't need unlist
-    if (!is.null(o)) { # fix for http://stackoverflow.com/questions/21437546/data-table-1-8-11-and-aggregation-issues (moved this if-check to before checking logical)
-        x = copy(x)
-        setreordervec(x, o)
-    }
-    if(typeof(x) == "logical") return(c(which(is.na(x)),which(!x),which(x))) # logical is a special case of radix sort; just 3 buckets known up front. TO DO - could be faster in C but low priority
-    if(typeof(x) != "integer") # this allows factors; we assume the levels are sorted as we always do in data.table
-        stop("radixorder1 is only for integer 'x'")
-    base::sort.list(x, na.last=FALSE, decreasing=FALSE,method="radix")
-    # Always put NAs first, relied on in C binary search by relying on NA_integer_ being -maxint (checked in C).
-}
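The logical special case above deserves spelling out: with only three possible values (NA, FALSE, TRUE) the buckets are known up front, so a stable order can be built in a single pass over the data. A minimal Python sketch of that 3-bucket idea (illustrative only; note it uses 0-based indices, whereas R's which() is 1-based):

```python
def logical_order(x):
    # Stable order putting None (NA) first, then False, then True:
    # the three buckets are known up front, as in radixorder1's
    # special case for typeof(x) == "logical". (Sketch, not the R/C code.)
    nas = [i for i, v in enumerate(x) if v is None]
    falses = [i for i, v in enumerate(x) if v is False]
    trues = [i for i, v in enumerate(x) if v is True]
    return nas + falses + trues

print(logical_order([True, None, False, True, None]))  # [1, 4, 2, 0, 3]
```

Within each bucket the original positions are kept in order, which is what makes the result stable.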
-
-# FOR INTERNAL use only.
-# Note that implementing just "sort" (and not order) takes half of this time; obtaining the order is the more time-consuming part.
-# Slightly slower than R's (improperly named "radix") counting sort, but:
-# 1) works for any data size - not restricted like R's radix, which requires max-min <= 1e5
-# 2) works with any values => also handles negative integers and NA
-# 3) can return the sorted values directly instead of the sort order by setting the last parameter of the C function to FALSE (not accessible via iradixorder)
-# 4) "decreasing=" removed. Use 'setrev' instead to get the reversed order
-iradixorder <- function(x, o=NULL) {
-    # copied from radixorder1 and just changed the call to the correct function
-    # xtfrm converts a Date object to numeric, but this will be called only if it's integer, so do as.integer(.)
-    if(is.object(x)) x = as.integer(xtfrm(x))
-    if(typeof(x) == "logical") {
-        if (!is.null(o)) { # since iradixorder requires a copy this check is better to be inside this if-statement unlike radixorder1
-            x = copy(x)
-            setreordervec(x, o)
-        }
-        return(c(which(is.na(x)), which(!x), which(x)))
-    }
-    if(typeof(x) != "integer") # this allows factors; we assume the levels are sorted as we always do in data.table
-        stop("iradixorder is only for integer 'x'. Try dradixorder for numeric 'x'")
-    if (length(x) == 0L) return(integer(0))
-    # OLD: passing list(x) to C to ensure copy is being made...
-    # NOTE: passing list(x) does not make a copy in >3.0.2 (devel version currently), so explicitly copying
-    x = copy(x)
-    if (!is.null(o)) setreordervec(x, o)
-    ans <- .Call(Cfastradixint, x, TRUE) # TRUE returns indices, FALSE returns sorted value directly
-    ans
-    # NA first as data.table requires
-}
-
-# FOR INTERNAL use only.
-# At least 5-30x faster than ordernumtol and order (depending on the number of groups the tolerance is found on);
-# real-life performance should be towards the faster end of that range.
-dradixorder <- function(x, o=NULL, tol=.Machine$double.eps^0.5) {
-    if (!is.atomic(x) || typeof(x) != "double") stop("'dradixorder' is only for numeric 'x'")
-    if (length(x) == 0) return(integer(0))
-    # OLD: passing list(x) to C to ensure copy is being made...
-    # NOTE: passing list(x) does not make a copy in >3.0.2 (devel version currently), so explicitly copying
-    x = copy(x)
-    if (!is.null(o)) setreordervec(x, o)
-    ans <- .Call(Cfastradixdouble, x, as.numeric(tol), TRUE) # TRUE returns order, FALSE returns sorted vector.
-    ans
-    # NA first followed by NaN as data.table requires
-}
-
-regularorder1 <- function(x) {
-    if(is.object(x)) x = xtfrm(x) # should take care of handling factors, Date's and others, so we don't need unlist
-    base::sort.list(x, na.last=FALSE, decreasing=FALSE)
-}
-
-ordernumtol <- function(x, tol=.Machine$double.eps^0.5) {
-    o = forderv(x)
-    if (length(o)) o else seq_along(x)
-    # was as follows, but we removed Crorder_tol at C level. 
-    #   o=seq_along(x)
-    #   .Call(Crorder_tol,x,o,tol)
-    #   o
-    # Retaining this function (ordernumtol) so that fastorder and bench() still work, and so we can
-    # still test forwards vs backwards through columns, while using the new forderv to sort the
-    # entire numeric column when going backwards with fastorder.
-}
-
-# chorder2 to be used only with fastorder
-# neither are exported
-chorder2 <- function(x, o=NULL) {
-    if (!is.null(o)) {
-        x = copy(x)
-        setreordervec(x, o)
-    }
-    forderv(x,sort=TRUE)   #  was .Call(Ccountingcharacter, x, TRUE) but that's now removed
-}
-
-fastorder <- function(x, by=seq_along(x), verbose=getOption("datatable.verbose"))
-{
-    # x can be a vector, or anything that's stored as a list (inc data.frame and data.table), thus can be accessed with non-copying base::[[.
-    # When x is a list, 'by' may be integers or names
-    # This function uses the backwards approach; i.e. first orders the last column, then orders the 2nd to last column ordered by the order of
-    # the last column, and so on. This vectorized approach is much faster than base::order(...) [src/main/sort.c:ordervector(...,listgreater)]
-    # which is a comparison sort comparing 2 rows using a loop through columns for that row with a switch on each column type.
-    
-    # Now here only for dev testing to compare to forderv, e.g. in bench()
-    # Always orders without testing for sortedness. This is favourable to fastorder (!), unless the data is perfectly ordered. See message in bench().
-
-    if (is.atomic(x)) { by=NULL; v = x; w = 1 }  # w = 1 just for the error message below
-    else { w = last(by); v = x[[w]] }
-    o = switch(typeof(v),
-        "double" = dradixorder(v), # ordernumtol(v),
-        "character" = chorder(v),
-        # Use a radix sort (fast and stable for ties), but it will fail when max-min > 1e5 (and for negative values in base R)
-        tryCatch(radixorder1(v),error=function(e) {
-            if (verbose) cat("Column",w,"failed radixorder1, reverting to 'iradixorder'\n")
-            iradixorder(v) # regularorder1(v)
-        })
-    )
-    if (is.atomic(x)) return(o)
-    # If there is more than one column, run through them backwards
-    for (w in rev(take(by))) {
-        v = x[[w]] # We could make the 'copy' here followed by 'setreordervec' 
-                     # instead of creating 'chorder2'. But 'iradixorder' and 'dradixorder' 
-                     # already take a copy internally, so it's better to avoid copying twice.
-        switch(typeof(v),
-            "double" = setreordervec(o, dradixorder(v, o)), # PREV: o[dradixorder(v[o])], PPREV: o[ordernumto(v[o])]
-            "character" = setreordervec(o, chorder2(v, o)), # TO DO: avoid the copy and reorder, pass in o to C like ordernumtol (still stands??)
-            tryCatch(setreordervec(o, radixorder1(v, o)), error=function(e) {
-                if (verbose) cat("Column",w,"failed radixorder1, reverting to 'iradixorder'\n")
-                setreordervec(o, iradixorder(v, o))         # PREV: o[regularorder1(v[o])]
-                                                            # TO DO: avoid the copy and reorder, pass in o to C like ordernumtol (still holds??)
-            })
-        )
-    }
-    o
+    l
 }
-
-# nocov end
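The "backwards" approach described in the fastorder comments above, i.e. order the last key column first, then stably re-sort by each earlier column, can be sketched language-neutrally. A minimal Python illustration, relying on sort stability just as fastorder relies on stable per-column sorts (sketch only, not the removed R code):

```python
def backwards_order(columns):
    # Order row indices by several key columns using the 'backwards' approach:
    # sort by the last column first, then stably re-sort by each earlier one.
    # Stability is what lets later (lower-priority) keys survive earlier passes.
    order = list(range(len(columns[0])))
    for col in reversed(columns):
        order.sort(key=lambda i: col[i])  # Python's sort is stable
    return order

a = [2, 1, 2, 1]
b = [9, 3, 4, 1]
print(backwards_order([a, b]))  # [3, 1, 2, 0], i.e. rows ordered by (a, b)
```

This vectorised per-column strategy is why it beats a comparison sort that loops over columns for every pair of rows compared.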
diff --git a/R/setops.R b/R/setops.R
index ac78173..de3e4d1 100644
--- a/R/setops.R
+++ b/R/setops.R
@@ -15,7 +15,7 @@ validate <- function(cols, dt) {
     cols
 }
 
-# setdiff for data.tables, internal at the moment #547
+# setdiff for data.tables, internal at the moment #547, used in not-join
 setdiff_ <- function(x, y, by.x=seq_along(x), by.y=seq_along(y), use.names=FALSE) {
     if (!is.data.table(x) || !is.data.table(y)) stop("x and y must both be data.tables")
     if (is.null(x) || !length(x)) return(x)
@@ -40,5 +40,248 @@ setdiff_ <- function(x, y, by.x=seq_along(x), by.y=seq_along(y), use.names=FALSE
     ux = unique(shallow(x, by.x))
     uy = unique(shallow(y, by.y))
     ix = duplicated(rbind(uy, ux, use.names=use.names, fill=FALSE))[-seq_len(nrow(uy))]
-    .Call("CsubsetDT", ux, which_(ix, FALSE), seq_along(ux)) # more memory efficient version of which(!ix)
+    .Call(CsubsetDT, ux, which_(ix, FALSE), seq_along(ux)) # more memory efficient version of which(!ix)
 }
+
+# set operators ----
+
+funique <- function(x) {
+    stopifnot(is.data.table(x))
+    dup = duplicated(x)
+    if (any(dup)) .Call(CsubsetDT, x, which_(dup, FALSE), seq_along(x)) else x
+}
+
+fintersect <- function(x, y, all=FALSE) {
+    if (!is.logical(all) || length(all) != 1L) stop("argument 'all' should be logical of length one")
+    if (!is.data.table(x) || !is.data.table(y)) stop("x and y must both be data.tables")
+    if (!identical(sort(names(x)), sort(names(y)))) stop("x and y must have same column names")
+    if (!identical(names(x), names(y))) stop("x and y must have same column order")
+    bad.type = setNames(c("raw","complex","list") %chin% c(vapply(x, typeof, FUN.VALUE = ""), vapply(y, typeof, FUN.VALUE = "")), c("raw","complex","list"))
+    if (any(bad.type)) stop(sprintf("x and y must not have unsupported column types: %s", paste(names(bad.type)[bad.type], collapse=", ")))
+    if (!identical(lapply(x, class), lapply(y, class))) stop("x and y must have same column classes")
+    if (".seqn" %in% names(x)) stop("None of the datasets to intersect should contain a column named '.seqn'")
+    if (!nrow(x) || !nrow(y)) return(x[0L])
+    if (all) {
+        x = shallow(x)[, ".seqn" := rowidv(x)]
+        y = shallow(y)[, ".seqn" := rowidv(y)]
+        jn.on = c(".seqn",setdiff(names(x),".seqn"))
+        x[y, .SD, .SDcols=setdiff(names(x),".seqn"), nomatch=0L, on=jn.on]
+    } else {
+        x[funique(y), nomatch=0L, on=names(x), mult="first"]
+    }
+}
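The all=TRUE branch above implements multiset intersection: each row is tagged with its occurrence number (rowidv builds the .seqn column) and the join includes that tag, so duplicates are kept only up to the count present in both tables. A Python sketch of the intended semantics, with an occurrence counter standing in for the .seqn tag (semantics only, not the join itself):

```python
from collections import Counter, defaultdict

def multiset_intersect(x, y):
    # Keep each row of x only up to the number of times it also occurs in y.
    # 'seen' plays the role of fintersect(all=TRUE)'s .seqn occurrence number.
    in_y = Counter(y)
    seen = defaultdict(int)
    out = []
    for row in x:
        seen[row] += 1
        if seen[row] <= in_y[row]:
            out.append(row)
    return out

print(multiset_intersect([1, 1, 2, 3, 3], [1, 3, 3, 4]))  # [1, 3, 3]
```

With all=FALSE the occurrence numbers are irrelevant, which is why that branch can simply join against funique(y) with mult="first".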
+
+fsetdiff <- function(x, y, all=FALSE) {
+    if (!is.logical(all) || length(all) != 1L) stop("argument 'all' should be logical of length one")
+    if (!is.data.table(x) || !is.data.table(y)) stop("x and y must both be data.tables")
+    if (!identical(sort(names(x)), sort(names(y)))) stop("x and y must have same column names")
+    if (!identical(names(x), names(y))) stop("x and y must have same column order")
+    bad.type = setNames(c("raw","complex","list") %chin% c(vapply(x, typeof, FUN.VALUE = ""), vapply(y, typeof, FUN.VALUE = "")), c("raw","complex","list"))
+    if (any(bad.type)) stop(sprintf("x and y must not have unsupported column types: %s", paste(names(bad.type)[bad.type], collapse=", ")))
+    if (!identical(lapply(x, class), lapply(y, class))) stop("x and y must have same column classes")
+    if (".seqn" %in% names(x)) stop("None of the datasets to setdiff should contain a column named '.seqn'")
+    if (!nrow(x)) return(x)
+    if (!nrow(y)) return(if (!all) funique(x) else x)
+    if (all) {
+        x = shallow(x)[, ".seqn" := rowidv(x)]
+        y = shallow(y)[, ".seqn" := rowidv(y)]
+        jn.on = c(".seqn",setdiff(names(x),".seqn"))
+        x[!y, .SD, .SDcols=setdiff(names(x),".seqn"), on=jn.on]
+    } else {
+        funique(x[!y, on=names(x)])
+    }
+}
+
+funion <- function(x, y, all=FALSE) {
+    if (!is.logical(all) || length(all) != 1L) stop("argument 'all' should be logical of length one")
+    if (!is.data.table(x) || !is.data.table(y)) stop("x and y must both be data.tables")
+    if (!identical(sort(names(x)), sort(names(y)))) stop("x and y must have same column names")
+    if (!identical(names(x), names(y))) stop("x and y must have same column order")
+    bad.type = setNames(c("raw","complex", if(!all) "list") %chin% c(vapply(x, typeof, FUN.VALUE = ""), vapply(y, typeof, FUN.VALUE = "")), c("raw","complex", if(!all) "list"))
+    if (any(bad.type)) stop(sprintf("x and y must not have unsupported column types: %s", paste(names(bad.type)[bad.type], collapse=", ")))
+    if (!identical(lapply(x, class), lapply(y, class))) stop("x and y must have same column classes")
+    ans = rbindlist(list(x, y))
+    if (!all) ans = funique(ans)
+    ans
+}
+
+fsetequal <- function(x, y) {
+    if (!is.data.table(x) || !is.data.table(y)) stop("x and y must both be data.tables")
+    if (!identical(sort(names(x)), sort(names(y)))) stop("x and y must have same column names")
+    if (!identical(names(x), names(y))) stop("x and y must have same column order")
+    bad.type = setNames(c("raw","complex","list") %chin% c(vapply(x, typeof, FUN.VALUE = ""), vapply(y, typeof, FUN.VALUE = "")), c("raw","complex","list"))
+    if (any(bad.type)) stop(sprintf("x and y must not have unsupported column types: %s", paste(names(bad.type)[bad.type], collapse=", ")))
+    if (!identical(lapply(x, class), lapply(y, class))) stop("x and y must have same column classes")
+    isTRUE(all.equal.data.table(x, y, check.attributes = FALSE, ignore.row.order = TRUE))
+}
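fsetequal above delegates to all.equal(..., ignore.row.order=TRUE): two tables are treated as equal when they contain the same rows in any order, with duplicate counts taken into account. That semantics can be sketched in Python as a multiset comparison of row tuples (a simplification that ignores the tolerance machinery):

```python
from collections import Counter

def rows_equal_ignoring_order(x, y):
    # Two row lists are 'set equal' when they hold the same rows, in any
    # order, the same number of times each - a multiset comparison, loosely
    # mirroring what fsetequal delegates to all.equal(ignore.row.order=TRUE).
    return Counter(x) == Counter(y)

print(rows_equal_ignoring_order([(1, "a"), (2, "b")], [(2, "b"), (1, "a")]))  # True
print(rows_equal_ignoring_order([(1, "a")], [(1, "a"), (1, "a")]))            # False
```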
+
+# all.equal ----
+
+all.equal.data.table <- function(target, current, trim.levels=TRUE, check.attributes=TRUE, ignore.col.order=FALSE, ignore.row.order=FALSE, tolerance=sqrt(.Machine$double.eps), ...) {
+    stopifnot(is.logical(trim.levels), is.logical(check.attributes), is.logical(ignore.col.order), is.logical(ignore.row.order), is.numeric(tolerance))
+    if (!is.data.table(target) || !is.data.table(current)) stop("'target' and 'current' must both be data.tables")
+    
+    msg = character(0)
+    # init checks that detect high level all.equal
+    if (nrow(current) != nrow(target)) msg = "Different number of rows"
+    if (ncol(current) != ncol(target)) msg = c(msg, "Different number of columns")
+    diff.colnames = !identical(sort(names(target)), sort(names(current)))
+    diff.colorder = !identical(names(target), names(current))
+    if (check.attributes && diff.colnames) msg = c(msg, "Different column names")
+    if (!diff.colnames && !ignore.col.order && diff.colorder) msg = c(msg, "Different column order")
+    
+    if (length(msg)) return(msg) # skip check.attributes and further heavy processing
+    
+    # ignore.col.order
+    if (ignore.col.order && diff.colorder) current = setcolorder(shallow(current), names(target))
+    
+    # Always check modes equal, like base::all.equal
+    targetModes = sapply(target, mode)
+    currentModes = sapply(current,  mode)
+    if (any( d<-(targetModes!=currentModes) )) {
+        w = head(which(d),3)
+        return(paste0("Datasets have different column modes. First 3: ",paste(
+         paste(names(targetModes)[w],"(",paste(targetModes[w],currentModes[w],sep="!="),")",sep="")
+                      ,collapse=" ")))
+    }
+    
+    if (check.attributes) {
+        squashClass = function(x) if (is.object(x)) paste(class(x),collapse=";") else mode(x)
+        # else mode() is so that integer==numeric, like base all.equal does.
+        targetTypes = sapply(target, squashClass)
+        currentTypes = sapply(current, squashClass)
+        if (length(targetTypes) != length(currentTypes))
+            stop("Internal error: ncol(current)==ncol(target) was checked above")
+        if (any( d<-(targetTypes != currentTypes))) {
+            w = head(which(d),3)
+            return(paste0("Datasets have different column classes. First 3: ",paste(
+         paste(names(targetTypes)[w],"(",paste(targetTypes[w],currentTypes[w],sep="!="),")",sep="")
+                      ,collapse=" ")))
+        }
+    }
+    
+    if (check.attributes) {
+        # check key
+        k1 = key(target)
+        k2 = key(current)
+        if (!identical(k1, k2)) {
+            return(sprintf("Datasets have different keys. 'target'%s. 'current'%s.",
+                           if(length(k1)) paste0(": ", paste(k1, collapse=", ")) else " has no key",
+                           if(length(k2)) paste0(": ", paste(k2, collapse=", ")) else " has no key"))
+        }
+        # check index
+        i1 = indices(target)
+        i2 = indices(current)
+        if (!identical(i1, i2)) {
+            return(sprintf("Datasets have different indexes. 'target'%s. 'current'%s.",
+                           if(length(i1)) paste0(": ", paste(i1, collapse=", ")) else " has no index",
+                           if(length(i2)) paste0(": ", paste(i2, collapse=", ")) else " has no index"))
+        }
+        
+        # Trim any extra row.names attributes that came from some inheritance
+        # Trim ".internal.selfref" as long as there is no `all.equal.externalptr` method
+        exclude.attrs = function(x, attrs = c("row.names",".internal.selfref")) x[!names(x) %in% attrs]
+        a1 = exclude.attrs(attributes(target))
+        a2 = exclude.attrs(attributes(current))
+        if (length(a1) != length(a2)) return(sprintf("Datasets have a different number of (non-excluded) attributes: target %s, current %s", length(a1), length(a2)))
+        if (!identical(nm1 <- sort(names(a1)), nm2 <- sort(names(a2)))) return(sprintf("Datasets have attributes with different names: %s", paste(setdiff(union(names(a1), names(a2)), intersect(names(a1), names(a2))), collapse=", ")))
+        attrs.r = all.equal(a1[nm1], a2[nm2], ..., check.attributes = check.attributes)
+        if (is.character(attrs.r)) return(paste("Attributes: <", attrs.r, ">")) # skip further heavy processing
+    }
+    
+    if (ignore.row.order) {
+        if (".seqn" %in% names(target))
+            stop("None of the datasets to compare should contain a column named '.seqn'")
+        bad.type = setNames(c("raw","complex","list") %chin% c(vapply(current, typeof, FUN.VALUE = ""), vapply(target, typeof, FUN.VALUE = "")), c("raw","complex","list"))
+        if (any(bad.type))
+            stop(sprintf("Datasets to compare with 'ignore.row.order' must not have unsupported column types: %s", paste(names(bad.type)[bad.type], collapse=", ")))
+        if (between(tolerance, 0, sqrt(.Machine$double.eps), incbounds=FALSE)) {
+            warning(sprintf("Argument 'tolerance' was forced to lowest accepted value `sqrt(.Machine$double.eps)` from provided %s", format(tolerance, scientific=FALSE)))
+            tolerance = sqrt(.Machine$double.eps)
+        }
+        target_dup = as.logical(anyDuplicated(target))
+        current_dup = as.logical(anyDuplicated(current))
+        tolerance.msg = if (identical(tolerance, 0)) ", be aware you are using `tolerance=0`, which may report visually equal data as unequal" else ""
+        if (target_dup || current_dup) {
+            # handling 'tolerance' for duplicate rows - these `msg` will be returned only if equality with tolerance fails
+            if (any(sapply(target,typeof)=="double") && !identical(tolerance, 0)) {
+                if (target_dup && !current_dup) msg = c(msg, "Dataset 'target' has duplicate rows while 'current' doesn't")
+                else if (!target_dup && current_dup) msg = c(msg, "Dataset 'current' has duplicate rows while 'target' doesn't")
+                else { # both
+                    if (!identical(tolerance, sqrt(.Machine$double.eps))) # non-default will raise error
+                        stop("Duplicate rows in datasets, numeric columns and ignore.row.order cannot be used with a non-zero tolerance argument")
+                    msg = c(msg, "Both datasets have duplicate rows and numeric columns; together with ignore.row.order this forces the 'tolerance' argument to 0")
+                    tolerance = 0
+                }
+            } else { # no numeric columns or tolerance==0L
+                if (target_dup && !current_dup)
+                    return(sprintf("Dataset 'target' has duplicate rows while 'current' doesn't%s", tolerance.msg))
+                if (!target_dup && current_dup)
+                    return(sprintf("Dataset 'current' has duplicate rows while 'target' doesn't%s", tolerance.msg))
+            }
+        }
+        jn.on = if (target_dup && current_dup) {
+            target = shallow(target)[, ".seqn" := rowidv(target)]
+            current = shallow(current)[, ".seqn" := rowidv(current)]
+            c(".seqn", setdiff(names(target), ".seqn"))
+        } else names(target)
+        # handling 'tolerance' for factor cols - these `msg` will be returned only if equality with tolerance fails
+        if (any(sapply(target,is.factor)) && !identical(tolerance, 0)) {
+            if (!identical(tolerance, sqrt(.Machine$double.eps))) # non-default will raise error
+                stop("Factor columns and ignore.row.order cannot be used with a non-zero tolerance argument")
+            msg = c(msg, "Using factor columns together with ignore.row.order forces the 'tolerance' argument to 0")
+            tolerance = 0
+        }
+        # roll join to support 'tolerance' argument, conditional to retain support for factor when tolerance=0
+        ans = if (identical(tolerance, 0)) target[current, nomatch=NA, which=TRUE, on=jn.on] else {
+            ans1 = target[current, roll=tolerance, rollends=TRUE, which=TRUE, on=jn.on]
+            ans2 = target[current, roll=-tolerance, rollends=TRUE, which=TRUE, on=jn.on]
+            pmin(ans1, ans2, na.rm=TRUE)
+        }
+        if (any_na(as_list(ans))) {
+            msg = c(msg, sprintf("Dataset 'current' has rows not present in 'target'%s%s", if (target_dup || current_dup) " or present in different quantity" else "", tolerance.msg))
+            return(msg)
+        }
+        ans = if (identical(tolerance, 0)) current[target, nomatch=NA, which=TRUE, on=jn.on] else {
+            ans1 = current[target, roll=tolerance, rollends=TRUE, which=TRUE, on=jn.on]
+            ans2 = current[target, roll=-tolerance, rollends=TRUE, which=TRUE, on=jn.on]
+            pmin(ans1, ans2, na.rm=TRUE)
+        }
+        if (any_na(as_list(ans))) {
+            msg = c(msg, sprintf("Dataset 'target' has rows not present in 'current'%s%s", if (target_dup || current_dup) " or present in different quantity" else "", tolerance.msg))
+            return(msg)
+        }
+    } else {
+        for (i in seq_along(target)) {
+            # trim.levels moved here
+            x = target[[i]]
+            y = current[[i]]
+            if (xor(is.factor(x),is.factor(y)))
+                return("Internal error: factor type mismatch should have been caught earlier")
+            cols.r = TRUE
+            if (is.factor(x)) {
+                if (!identical(levels(x),levels(y))) {
+                    if (trim.levels) {
+                      # do this regardless of check.attributes (that's more about classes, checked above)
+                      x = factor(x)
+                      y = factor(y)
+                      if (!identical(levels(x),levels(y)))
+                        cols.r = "Levels not identical even after refactoring since trim.levels is TRUE"
+                    } else {
+                        cols.r = "Levels not identical. No attempt to refactor because trim.levels is FALSE"
+                    }    
+                } else {
+                    cols.r = all.equal(x, y, check.attributes=check.attributes)
+                    # the check.attributes here refers to everything other than the levels, which are always
+                    # dealt with according to trim.levels
+                }
+            } else {
+                cols.r = all.equal(unclass(x), unclass(y), tolerance=tolerance, ..., check.attributes=check.attributes)
+                # classes were explicitly checked earlier above, so ignore classes here.
+            }
+            if (!isTRUE(cols.r)) return(paste0("Column '", names(target)[i], "': ", paste(cols.r,collapse=" ")))
+        }
+    }
+    TRUE
+}
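The tolerance handling above combines two rolling joins (roll=tolerance and roll=-tolerance) with pmin() so that a match in either direction within the tolerance counts. The underlying nearest-match-within-tolerance idea can be sketched in Python on a single sorted column (illustrative only; the function name is hypothetical and this ignores the multi-column join):

```python
import bisect

def nearest_within_tol(target, current, tol):
    # For each value of 'current', find an index into sorted 'target' lying
    # within 'tol': check the candidate just below and just above, much as
    # the forward and backward rolling joins are combined with pmin().
    srt = sorted(target)
    out = []
    for v in current:
        i = bisect.bisect_left(srt, v)
        hit = None
        for j in (i - 1, i):
            if 0 <= j < len(srt) and abs(srt[j] - v) <= tol:
                hit = j
                break
        out.append(hit)
    return out

print(nearest_within_tol([1.0, 2.0], [1.0000001, 2.5], tol=1e-6))  # [0, None]
```

A None in the result corresponds to the NA that the R code detects to conclude a row of one table has no counterpart in the other.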
+
diff --git a/R/test.data.table.R b/R/test.data.table.R
index 4267562..c66abd8 100644
--- a/R/test.data.table.R
+++ b/R/test.data.table.R
@@ -1,5 +1,5 @@
 
-test.data.table <- function(verbose=FALSE, pkg="pkg") {
+test.data.table <- function(verbose=FALSE, pkg="pkg", silent=FALSE) {
     if (exists("test.data.table",.GlobalEnv,inherits=FALSE)) {
         # package developer
         if ("package:data.table" %in% search()) stop("data.table package loaded")
@@ -20,22 +20,21 @@ test.data.table <- function(verbose=FALSE, pkg="pkg") {
     # Sys.setlocale("LC_CTYPE", "")   # just for CRAN's Mac to get it off C locale (post to r-devel on 16 Jul 2012)
     olddir = setwd(d)
     on.exit(setwd(olddir))
+    envirs <- list()
     for (fn in file.path(d, 'tests.Rraw')) {    # not testthat
         cat("Running",fn,"\n")
-        oldverbose = getOption("datatable.verbose")
-        if (verbose) options(datatable.verbose=TRUE)
-        sys.source(fn,envir=new.env(parent=.GlobalEnv))
-        options(data.table.verbose=oldverbose)
-        # As from v1.7.2, testthat doesn't run the tests.Rraw (hence file name change to .Rraw).
-        # There were environment issues with system.time() (when run by test_package) that only
-        # showed up when CRAN maintainers tested on 64bit. Matt spent a long time including
-        # testing on 64bit in Amazon EC2. Solution was simply to not run the tests.R from
-        # testthat, which probably makes sense anyway to speed it up a bit (was running twice
-        # before).
+        oldverbose = options(datatable.verbose=verbose)
+        envirs[[fn]] = new.env(parent=.GlobalEnv)
+        if(isTRUE(silent)){
+            try(sys.source(fn,envir=envirs[[fn]]), silent=silent)
+        } else {
+            sys.source(fn,envir=envirs[[fn]])
+        }
+        options(oldverbose)
     }
     options(encoding=oldenc)
     # Sys.setlocale("LC_CTYPE", oldlocale)
-    invisible()
+    invisible(sum(sapply(envirs, `[[`, "nfail"))==0)
 }
 
 # Define test() and its globals here, for use in dev
@@ -46,10 +45,18 @@ test.data.table <- function(verbose=FALSE, pkg="pkg") {
 # .devtesting = TRUE
 
 compactprint <- function(DT, topn=2) {
-    cn = paste(" [Key=",paste(key(DT),collapse=",")," Types=",paste(substring(gsub("integer64","i64",sapply(DT,class)),1,3),collapse=","),"]",sep="")
+    tt = sapply(DT,function(x)class(x)[1L])
+    tt[tt=="integer64"] = "i64"
+    cn = paste(" [Key=",paste(key(DT),collapse=","),
+                " Types=",paste(substring(sapply(DT,typeof),1,3),collapse=","),
+                " Classes=",paste(substring(tt,1,3),collapse=","),
+                "]",sep="")
     print(copy(DT)[,(cn):=""], topn=topn)
     invisible()
 }
+
+INT = function(...) { as.integer(c(...)) }   # utility used in tests.Rraw
+
 test <- function(num,x,y,error=NULL,warning=NULL,output=NULL) {
     # Usage:
     # i) tests that x equals y when both x and y are supplied, the most common usage
@@ -67,6 +74,7 @@ test <- function(num,x,y,error=NULL,warning=NULL,output=NULL) {
     # 3) each test has a unique id which we refer to in commit messages, emails etc.
     nfail = get("nfail", parent.frame())   # to cater for both test.data.table() and stepping through tests in dev
     whichfail = get("whichfail", parent.frame())
+    all.equal.result = TRUE
     assign("ntest", get("ntest", parent.frame()) + 1, parent.frame(), inherits=TRUE)   # bump number of tests run
     assign("lastnum", num, parent.frame(), inherits=TRUE)
     v = getOption("datatable.verbose")
@@ -78,14 +86,18 @@ test <- function(num,x,y,error=NULL,warning=NULL,output=NULL) {
     xsub = substitute(x)
     ysub = substitute(y)
     if (is.null(output)) err <<- try(x,TRUE)
-    else out = gsub("NULL$","",paste(capture.output(print(err<<-try(x,TRUE))),collapse=""))
-    if (!is.null(output)) {
-        output = gsub("[[]","<LBRACKET>",output)
-        output = gsub("[]]","<RBRACKET>",output)
-        output = gsub("<LBRACKET>","[[]",output)
-        output = gsub("<RBRACKET>","[]]",output)
-        output = gsub("[(]","[(]",output)
-        output = gsub("[)]","[)]",output)
+    else {
+        out = gsub("NULL$","",paste(capture.output(print(err<<-try(x,TRUE))),collapse=""))
+        out = gsub("\n","",gsub("\r","",out))  # ensure no \r or \n pollution on windows
+        # We use .* to shorten what we test for (so the grep below needs fixed=FALSE)
+        # but other characters should be matched literally
+        output = gsub("\\","\\\\",output,fixed=TRUE)  # e.g numbers like 9.9e+10 should match the + literally
+        output = gsub("[","\\[",output,fixed=TRUE)
+        output = gsub("]","\\]",output,fixed=TRUE)
+        output = gsub("(","\\(",output,fixed=TRUE)
+        output = gsub(")","\\)",output,fixed=TRUE)
+        output = gsub("+","\\+",output,fixed=TRUE)  # e.g numbers like 9.9e+10 should match the + literally
+        output = gsub("\n","",output,fixed=TRUE)  # e.g numbers like 9.9e+10 should match the + literally
         if (!length(grep(output,out))) {
             cat("Test",num,"didn't produce correct output:\n")
             cat(">",deparse(xsub),"\n")
@@ -152,60 +164,24 @@ test <- function(num,x,y,error=NULL,warning=NULL,output=NULL) {
             setattr(xc,"index",NULL)   # too onerous to create test RHS with the correct index as well, just check result
             setattr(yc,"index",NULL)
             if (identical(xc,yc) && identical(key(x),key(y))) return()  # check key on original x and y because := above might have cleared it on xc or yc
-            if (isTRUE(all.equal(xc,yc)) && identical(key(x),key(y))) return()
+            if (isTRUE(all.equal.result<-all.equal(xc,yc)) && identical(key(x),key(y)) &&
+                identical(sapply(xc,typeof), sapply(yc,typeof))) return()
         }
         if (is.factor(x) && is.factor(y)) {
             x = factor(x)
             y = factor(y)
             if (identical(x,y)) return()
         }
-        if (is.atomic(x) && is.atomic(y) && isTRUE(all.equal(x,y))) return()   # For test 617 on r-prerel-solaris-sparc on 7 Mar 2013
+        if (is.atomic(x) && is.atomic(y) && isTRUE(all.equal.result<-all.equal(x,y))) return()
+        # For test 617 on r-prerel-solaris-sparc on 7 Mar 2013
     }
     cat("Test",num,"ran without errors but failed check that x equals y:\n")
     cat("> x =",deparse(xsub),"\n")
     if (is.data.table(x)) compactprint(x) else if (length(x)>6) {cat("First 6 of", length(x),":");print(head(x))} else print(x)
     cat("> y =",deparse(ysub),"\n")
     if (is.data.table(y)) compactprint(y) else if (length(y)>6) {cat("First 6 of", length(y),":");print(head(y))} else print(y)
+    if (!isTRUE(all.equal.result)) cat(all.equal.result,sep="\n")
     assign("nfail", nfail+1, parent.frame(), inherits=TRUE)
     assign("whichfail", c(whichfail, num), parent.frame(), inherits=TRUE)
     invisible()
 }
-
-
-## Tests that two data.tables (`target` and `current`) are equivalent.
-## This method is used primarily to make life easy with a testing harness
-## built around test_that. A call to test_that::{expect_equal|equal} will
-## ultimately dispatch to this method when making an "equality" call.
-all.equal.data.table <- function(target, current, trim.levels=TRUE, ...) {
-    target = copy(target)
-    current = copy(current)
-    if (trim.levels) {
-        ## drop unused levels
-        if (length(target)) {
-            for (i in which(sapply(target, is.factor))) {
-                .xi = factor(target[[i]])
-                target[,(i):=.xi]
-            }
-        }
-        if (length(current)) {
-            for (i in which(sapply(current, is.factor))) {
-                .xi = factor(current[[i]])
-                current[,(i):=.xi]
-            }
-        }
-    }
-
-## Trim any extra row.names attributes that came from some inheritance
-    setattr(target, "row.names", NULL)
-    setattr(current, "row.names", NULL)
-    
-    # all.equal uses unclass, which doesn't know about external pointers; there
-    # doesn't seem to be an all.equal.externalptr method in base.
-    setattr(target, ".internal.selfref", NULL)
-    setattr(current, ".internal.selfref", NULL)
-    
-    all.equal.list(target, current, ...)
-}
-
-
-
diff --git a/R/timetaken.R b/R/timetaken.R
index dc07195..493520a 100644
--- a/R/timetaken.R
+++ b/R/timetaken.R
@@ -16,6 +16,6 @@ timetaken <- function(started.at)
        if (days >= 1) res = sprintf("%d days ", as.integer(days)) else res=""
        paste(res,sprintf("%02.0f:%02.0f:%02.0f", hrs, mins, secs %% 60),sep="")
    } else {
-       sprintf("%1.3fsec", secs)
+       sprintf(if (secs>=10) "%.1fsec" else "%.3fsec", secs)
    }
 }
diff --git a/R/transpose.R b/R/transpose.R
index b693d6b..499cc3a 100644
--- a/R/transpose.R
+++ b/R/transpose.R
@@ -1,18 +1,36 @@
 transpose <- function(l, fill=NA, ignore.empty=FALSE) {
-	ans = .Call(Ctranspose, l, fill, ignore.empty)
-	if (is.data.table(l)) setDT(ans)
-	else if (is.data.frame(l)) { 
-		if (is.null(names(ans))) 
-        	setattr(ans, "names", paste("V", seq_along(ans), sep = ""))
-    	setattr(ans, "row.names", .set_row_names(length(ans[[1L]])))
-    	setattr(ans, "class", "data.frame")
-	}
-	ans[]
+    ans = .Call(Ctranspose, l, fill, ignore.empty)
+    if (is.data.table(l)) setDT(ans)
+    else if (is.data.frame(l)) { 
+        if (is.null(names(ans))) 
+            setattr(ans, "names", paste("V", seq_along(ans), sep = ""))
+        setattr(ans, "row.names", .set_row_names(length(ans[[1L]])))
+        setattr(ans, "class", "data.frame")
+    }
+    ans[]
 }
 
-tstrsplit <- function(x, ..., fill=NA, type.convert=FALSE) {
-	ans = transpose(strsplit(as.character(x), ...), fill=fill, ignore.empty = FALSE)
-	# Implementing #1094, but default FALSE
-  if(type.convert) ans = lapply(ans, type.convert, as.is = TRUE)
+tstrsplit <- function(x, ..., fill=NA, type.convert=FALSE, keep, names=FALSE) {
+    ans = transpose(strsplit(as.character(x), ...), fill=fill, ignore.empty=FALSE)
+    if (!missing(keep)) {
+        keep = suppressWarnings(as.integer(keep))
+        chk = min(keep) >= min(1L, length(ans)) & max(keep) <= length(ans)
+        if (!isTRUE(chk)) # handles NA case too
+            stop("'keep' should contain integer values between ", 
+                min(1L, length(ans)), " and ", length(ans), ".")
+        ans = ans[keep]
+    }
+    # Implementing #1094, but default FALSE
+    if(type.convert) ans = lapply(ans, type.convert, as.is = TRUE)
+    if (identical(names, FALSE)) return(ans)
+    else if (isTRUE(names)) names = paste0("V", seq_along(ans))
+    if (!is.character(names))
+        stop("'names' must be TRUE/FALSE or a character vector.")
+    if (length(names) != length(ans)) {
+        str = if (missing(keep)) "ans" else "keep"
+        stop("length(names) (= ", length(names), 
+            ") is not equal to length(", str, ") (= ", length(ans), ").")
+    }
+    setattr(ans, 'names', names)
   ans
 }
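The new `keep` and `names` arguments added above can be sketched as follows (the input strings and names `id`/`num` are illustrative, not from the source):

```r
library(data.table)

x = c("a|1|extra", "b|2|more")
# keep only the first two split parts and name the resulting list elements
r = tstrsplit(x, "|", fixed = TRUE, keep = 1:2, names = c("id", "num"))
```

`r` is a named list: `r$id` holds the first parts and `r$num` the second parts, with the trailing parts discarded.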
diff --git a/R/xts.R b/R/xts.R
index 135c1d1..21f83b5 100644
--- a/R/xts.R
+++ b/R/xts.R
@@ -1,18 +1,17 @@
-as.data.table.xts <- function(x, keep.rownames = TRUE, ...){
-  stopifnot(requireNamespace("xts"), !missing(x), xts::is.xts(x))
-  if(!keep.rownames) return(setDT(as.data.frame(x, row.names=FALSE))[])
-  if("index" %in% names(x)) stop("Input xts object should not have 'index' column because it would result in duplicate column names. Rename 'index' column in xts or use `keep.rownames=FALSE` and add index manually as another column.")
-  r = setDT(as.data.frame(x, row.names=FALSE))
-  index = NULL # fix for "no visible binding for global variable index"
-  r[, index := zoo::index(x)]
-  setcolorder(r,c("index",names(r)[names(r)!="index"]))[]
+as.data.table.xts <- function(x, keep.rownames = TRUE, ...) {
+    stopifnot(requireNamespace("xts"), !missing(x), xts::is.xts(x))
+    r = setDT(as.data.frame(x, row.names=NULL))
+    if (!keep.rownames) return(r[])
+    if ("index" %in% names(x)) stop("Input xts object should not have 'index' column because it would result in duplicate column names. Rename 'index' column in xts or use `keep.rownames=FALSE` and add index manually as another column.")
+    r[, "index" := zoo::index(x)]
+    setcolorder(r, c("index", setdiff(names(r), "index")))[]
 }
 
-as.xts.data.table <- function(x){
-  stopifnot(requireNamespace("xts"), !missing(x), is.data.table(x))
-  if(!any(class(x[[1]]) %in% c("POSIXct","Date"))) stop("data.table must have a POSIXct or Date column on first position, use `setcolorder` function.")
-  colsNumeric = sapply(x, is.numeric)[-1] # exclude first col, xts index
-  if(any(!colsNumeric)) warning(paste("Following columns are not numeric and will be omitted:",paste(names(colsNumeric)[!colsNumeric],collapse=", ")))
-  r = setDF(x[,.SD,.SDcols=names(colsNumeric)[colsNumeric]])
-  xts::as.xts(r, order.by=x[[1]])
+as.xts.data.table <- function(x, ...) {
+    stopifnot(requireNamespace("xts"), !missing(x), is.data.table(x))
+    if (!any((index_class <- class(x[[1L]])) %in% c("POSIXct","Date"))) stop("data.table must have a POSIXct, Date or IDate column on first position, use `setcolorder` function.")
+    colsNumeric = sapply(x, is.numeric)[-1L] # exclude first col, xts index
+    if (any(!colsNumeric)) warning(paste("Following columns are not numeric and will be omitted:", paste(names(colsNumeric)[!colsNumeric], collapse=", ")))
+    r = setDF(x[, .SD, .SDcols=names(colsNumeric)[colsNumeric]])
+    xts::as.xts(r, order.by=if ("IDate" %in% index_class) as.Date(x[[1L]]) else x[[1L]])
 }
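A minimal sketch of the xts-to-data.table conversion shown above (guarded because the `xts` package may not be installed; the column names `a`/`b` are illustrative):

```r
library(data.table)

if (requireNamespace("xts", quietly = TRUE)) {
  x = xts::xts(matrix(1:4, ncol = 2, dimnames = list(NULL, c("a", "b"))),
               order.by = as.Date("2016-01-01") + 0:1)
  DT = as.data.table(x)   # 'index' column first, then the data columns
}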
diff --git a/README.md b/README.md
index 72a00bb..3441f3f 100644
--- a/README.md
+++ b/README.md
@@ -1,3060 +1,4 @@
 
-**Current stable release** (always even) : [v1.9.6 on CRAN](http://cran.r-project.org/package=data.table), released 19<sup>th</sup> Sep 2015.  
-**Development version** (always odd): [v1.9.7 on GitHub](https://github.com/Rdatatable/data.table/) [![Build Status](https://travis-ci.org/Rdatatable/data.table.svg?branch=master)](https://travis-ci.org/Rdatatable/data.table) [![codecov.io](http://codecov.io/github/Rdatatable/data.table/coverage.svg?branch=master)](http://codecov.io/github/Rdatatable/data.table?branch=master)
- [How to install?](https://github.com/Rdatatable/data.table/wiki/Installation)
-
-<!-- Note this file is displayed on the CRAN page, as well as on GitHub. So the link to GitHub is not to itself when viewed on the CRAN page. -->
-
-**Introduction, installation, documentation, benchmarks etc:** [HOMEPAGE](https://github.com/Rdatatable/data.table/wiki)
-
-**Guidelines for filing issues / pull requests:** Please see the project's [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/Contributing.md).
-
----
-
-### Changes in v1.9.6  (on CRAN 19 Sep 2015)
-
-#### NEW FEATURES
-
-  1. `fread`
-      * passes `showProgress=FALSE` through to `download.file()` (as `quiet=TRUE`). Thanks to a pull request from Karl Broman and Richard Scriven for filing the issue, [#741](https://github.com/Rdatatable/data.table/issues/741).
-      * accepts `dec=','` (and other non-'.' decimal separators), [#917](https://github.com/Rdatatable/data.table/issues/917). A new paragraph has been added to `?fread`. On Windows this should just-work. On Unix it may just-work but if not you will need to read the paragraph for an extra step. In case it somehow breaks `dec='.'`, this new feature can be turned off with `options(datatable.fread.dec.experiment=FALSE)`.
-      * Implemented `stringsAsFactors` argument for `fread()`. When `TRUE`, character columns are converted to factors. Default is `FALSE`. Thanks to Artem Klevtsov for filing [#501](https://github.com/Rdatatable/data.table/issues/501), and to @hmi2015 for [this SO post](http://stackoverflow.com/q/31350209/559784).
-      * gains `check.names` argument, with default value `FALSE`. When `TRUE`, it uses the base function `make.unique()` to ensure that the column names of the data.table read in are all unique. Thanks to David Arenburg for filing [#1027](https://github.com/Rdatatable/data.table/issues/1027).
-      * gains `encoding` argument. Acceptable values are "unknown", "UTF-8" and "Latin-1" with default value of "unknown". Closes [#563](https://github.com/Rdatatable/data.table/issues/563). Thanks to @BenMarwick for the original report and to the many requests from others, and Q on SO.
-      * gains `col.names` argument, and is similar to `base::read.table()`. Closes [#768](https://github.com/Rdatatable/data.table/issues/768). Thanks to @dardesta for filing the FR.
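A minimal sketch of the new `fread` arguments described above, using literal text input (the column values are illustrative):

```r
library(data.table)

# fread treats input containing a newline as literal data rather than a filename
DT = fread("a,b\nx,1\ny,2\n", stringsAsFactors = TRUE, check.names = TRUE)
```

With `stringsAsFactors=TRUE` the character column `a` is read as a factor; `check.names=TRUE` would deduplicate repeated headers via `make.unique()` (not needed here).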
-
-  2. `DT[column == value]` no longer recycles `value` except in the length 1 case (when it still uses DT's key or an automatic secondary key, as introduced in v1.9.4). If `length(value)==length(column)` then it works element-wise as standard in R. Otherwise, a length error is issued to avoid common user errors. `DT[column %in% values]` still uses DT's key (or an automatic secondary key) as before.  Automatic indexing (i.e., optimization of `==` and `%in%`) may still be turned off with [...]
-
-  3. `na.omit` method for data.table is rewritten in C, for speed. It's ~11x faster on bigger data; see examples under `?na.omit`. It also gains two additional arguments: a) `cols` accepts column names (or numbers) on which to check for missing values; b) `invert`, when `TRUE`, returns the rows with any missing values instead. Thanks to the suggestion and PR from @matthieugomez.
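A minimal sketch of the two new `na.omit` arguments (the toy columns `a`/`b` are illustrative):

```r
library(data.table)

DT = data.table(a = c(1, NA, 3), b = c(NA, 2, 3))
r1 = na.omit(DT, cols = "a")      # drops only rows where 'a' is NA
r2 = na.omit(DT, invert = TRUE)   # keeps only rows containing any NA
```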
-
-  4. New function `shift()` implements fast `lead/lag` of *vector*, *list*, *data.frames* or *data.tables*. It takes a `type` argument which can be either *"lag"* (default) or *"lead"*. It enables very convenient usage along with `:=` or `set()`. For example: `DT[, (cols) := shift(.SD, 1L), by=id]`. Please have a look at `?shift` for more info.
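A minimal sketch of `shift()` with `:=` by group, as described above (the column names `id` and `v` are illustrative):

```r
library(data.table)

DT = data.table(id = c("a", "a", "b", "b"), v = 1:4)
# lag 'v' by one position within each id group; first row per group becomes NA
DT[, v_lag := shift(v, 1L, type = "lag"), by = id]
```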
-
-  5. `frank()` is now implemented. It's much faster than `base::rank` and does more. It accepts *vectors*, *lists* with all elements of equal lengths, *data.frames* and *data.tables*, and optionally takes a `cols` argument. In addition to implementing all the `ties.method` methods available from `base::rank`, it also implements *dense rank*. It is also capable of calculating ranks by ordering column(s) in ascending or descending order. See `?frank` for more. Closes [#760](https://github. [...]
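A minimal sketch of the *dense rank* feature mentioned above (input vector is illustrative):

```r
library(data.table)

x = c(2, 1, 2, 3)
r = frank(x, ties.method = "dense")   # dense ranks: ties share a rank, no gaps
```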
-
-  6. `rleid()`, a convenience function for generating a run-length type id column to be used in grouping operations is now implemented. Closes [#686](https://github.com/Rdatatable/data.table/issues/686). Check `?rleid` examples section for usage scenarios.
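A minimal sketch of `rleid()` for grouping (the column name `grp` is illustrative):

```r
library(data.table)

DT = data.table(grp = c("a", "a", "b", "b", "a"), v = 1:5)
# run-length id: increments each time 'grp' changes, so the final "a" is a new run
DT[, run := rleid(grp)]
```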
-  
-  7. Efficient conversion of `xts` to data.table. Closes [#882](https://github.com/Rdatatable/data.table/issues/882). Check examples in `?as.xts.data.table` and `?as.data.table.xts`. Thanks to @jangorecki for the PR.
-
-  8. `rbindlist` gains `idcol` argument which can be used to generate an index column. If `idcol=TRUE`, the column is automatically named `.id`. Instead you can also provide a column name directly. If the input list has no names, indices are automatically generated. Closes [#591](https://github.com/Rdatatable/data.table/issues/591). Also thanks to @KevinUshey for filing [#356](https://github.com/Rdatatable/data.table/issues/356).
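A minimal sketch of the `idcol` argument (list element names are illustrative):

```r
library(data.table)

l = list(one = data.table(x = 1:2), two = data.table(x = 3L))
r = rbindlist(l, idcol = "src")   # 'src' column is taken from the list names
```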
-
-  9. A new helper function `uniqueN` is now implemented. It is equivalent to `length(unique(x))` but much faster. It handles `atomic vectors`, `lists`, `data.frames` and `data.tables` as input and returns the number of unique rows. Closes [#884](https://github.com/Rdatatable/data.table/issues/884). Gains by argument. Closes [#1080](https://github.com/Rdatatable/data.table/issues/1080). Closes [#1224](https://github.com/Rdatatable/data.table/issues/1224). Thanks to @DavidArenburg, @kevinm [...]
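A minimal sketch of `uniqueN` on a data.table and on a vector (toy data, illustrative):

```r
library(data.table)

DT = data.table(a = c(1, 1, 2, 2), b = c(1, 1, 1, 2))
n_rows = uniqueN(DT)      # number of unique rows
n_vals = uniqueN(DT$a)    # equivalent to length(unique(DT$a)) but faster
```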
-
-  10. Implemented `transpose()` to transpose a list and `tstrsplit` which is a wrapper for `transpose(strsplit(...))`. This is particularly useful in scenarios where a column has to be split and the resulting list has to be assigned to multiple columns. See `?transpose` and `?tstrsplit`, [#1025](https://github.com/Rdatatable/data.table/issues/1025) and [#1026](https://github.com/Rdatatable/data.table/issues/1026) for usage scenarios. Closes both #1025 and #1026 issues.
-    * Implemented `type.convert` as suggested by Richard Scriven. Closes [#1094](https://github.com/Rdatatable/data.table/issues/1094).
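The split-and-assign scenario described above can be sketched as follows (column names `id`/`num` are illustrative; with the default `type.convert=FALSE` both new columns stay character):

```r
library(data.table)

DT = data.table(x = c("A/1", "B/2"))
# split 'x' on "/" and assign the transposed pieces to two new columns
DT[, c("id", "num") := tstrsplit(x, "/", fixed = TRUE)]
```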
-
-  11. `melt.data.table` 
-      * can now melt into multiple columns by providing a list of columns to `measure.vars` argument. Closes [#828](https://github.com/Rdatatable/data.table/issues/828). Thanks to Ananda Mahto for the extended email discussions and ideas on generating the `variable` column.
-      * also retains attributes wherever possible. Closes [#702](https://github.com/Rdatatable/data.table/issues/702) and [#993](https://github.com/Rdatatable/data.table/issues/993). Thanks to @richierocks for the report.
-      * Added `patterns.Rd`. Closes [#1294](https://github.com/Rdatatable/data.table/issues/1294). Thanks to @MichaelChirico.
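The melt-into-multiple-columns feature can be sketched as follows (column names `a1`/`a2`/`b1`/`b2` are illustrative):

```r
library(data.table)

DT = data.table(id = 1:2, a1 = 1:2, a2 = 3:4, b1 = 5:6, b2 = 7:8)
# each list element of measure.vars melts into its own value column
long = melt(DT, id.vars = "id",
            measure.vars = list(c("a1", "a2"), c("b1", "b2")),
            value.name = c("a", "b"))
```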
-
-  12. `.SDcols`
-      * understands `!` now, i.e., `DT[, .SD, .SDcols=!"a"]` now works, and is equivalent to `DT[, .SD, .SDcols = -c("a")]`. Closes [#1066](https://github.com/Rdatatable/data.table/issues/1066). 
-      * accepts logical vectors as well. If length is smaller than number of columns, the vector is recycled. Closes [#1060](https://github.com/Rdatatable/data.table/issues/1060). Thanks to @StefanFritsch.
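Both `.SDcols` forms above can be sketched as (toy columns, illustrative):

```r
library(data.table)

DT = data.table(a = 1:2, b = 3:4, c = 5:6)
r1 = DT[, .SD, .SDcols = !"a"]                    # all columns except 'a'
r2 = DT[, .SD, .SDcols = c(FALSE, TRUE, TRUE)]    # logical column selection
```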
-
-  13. `dcast` can now:
-      * cast multiple `value.var` columns simultaneously. Closes [#739](https://github.com/Rdatatable/data.table/issues/739).
-      * accept multiple functions under `fun.aggregate`. Closes [#716](https://github.com/Rdatatable/data.table/issues/716).
-      * supports optional column prefixes as mentioned under [this SO post](http://stackoverflow.com/q/26225206/559784). Closes [#862](https://github.com/Rdatatable/data.table/issues/862). Thanks to @JohnAndrews.
-      * works with undefined variables directly in formula. Closes [#1037](https://github.com/Rdatatable/data.table/issues/1037). Thanks to @DavidArenburg for the MRE.
-      * Naming conventions on multiple columns changed according to [#1153](https://github.com/Rdatatable/data.table/issues/1153). Thanks to @MichaelChirico for the FR.
-      * also has a `sep` argument with default `_` for backwards compatibility. [#1210](https://github.com/Rdatatable/data.table/issues/1210). Thanks to @dbetebenner for the FR.
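A minimal sketch of casting multiple `value.var` columns at once (toy data; the exact output column naming follows the convention referenced in #1153):

```r
library(data.table)

DT = data.table(id = c(1, 1, 2, 2), grp = c("a", "b", "a", "b"),
                v1 = 1:4, v2 = 5:8)
# one wide column per (value.var, grp level) pair
wide = dcast(DT, id ~ grp, value.var = c("v1", "v2"))
```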
-
-  14. `.SDcols` and `with=FALSE` understand `colA:colB` form now. That is, `DT[, lapply(.SD, sum), by=V1, .SDcols=V4:V6]` and `DT[, V5:V7, with=FALSE]` work as intended. This is quite useful for interactive use. Closes [#748](https://github.com/Rdatatable/data.table/issues/748) and [#1216](https://github.com/Rdatatable/data.table/issues/1216). Thanks to @carbonmetrics, @jangorecki and @mtennekes.
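A minimal sketch of the `colA:colB` form (a smaller toy table than the one in the text; column names are illustrative):

```r
library(data.table)

DT = data.table(V1 = c(1, 1, 2), V2 = 1:3, V3 = 4:6, V4 = 7:9)
r   = DT[, lapply(.SD, sum), by = V1, .SDcols = V2:V3]  # columns V2 through V3
sel = DT[, V2:V3, with = FALSE]                          # select a column range
```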
-
-  15. `setcolorder()` and `setorder()` work with `data.frame`s too. Closes [#1018](https://github.com/Rdatatable/data.table/issues/1018).
-
-  16. `as.data.table.*` and `setDT` argument `keep.rownames` can take a column name as well. When `keep.rownames=TRUE`, the column will still be automatically named `rn`. Closes [#575](https://github.com/Rdatatable/data.table/issues/575). 
-
-  17. `setDT` gains a `key` argument so that `setDT(X, key="a")` would convert `X` to a `data.table` by reference *and* key by the columns specified. Closes [#1121](https://github.com/Rdatatable/data.table/issues/1121).
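A minimal sketch of the new `key` argument to `setDT` (toy data.frame, illustrative):

```r
library(data.table)

X = data.frame(a = c(2L, 1L), b = c("y", "x"), stringsAsFactors = FALSE)
setDT(X, key = "a")   # convert to data.table by reference and key by 'a'
```

After the call `X` is a keyed data.table, sorted by `a`.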
-
-  18. `setDF` also converts `list` of equal length to `data.frame` by reference now. Closes [#1132](https://github.com/Rdatatable/data.table/issues/1132).
-
-  19. `CJ` gains logical `unique` argument with default `FALSE`. If `TRUE`, unique values of vectors are automatically computed and used. This is convenient, for example, `DT[CJ(a, b, c, unique=TRUE)]` instead of  doing `DT[CJ(unique(a), unique(b), unique(c))]`. Ultimately, `unique = TRUE` will be default. Closes [#1148](https://github.com/Rdatatable/data.table/issues/1148). 
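A minimal sketch of `CJ(..., unique=TRUE)` (input vectors are illustrative; named arguments are used so the output column names are explicit):

```r
library(data.table)

a = c(1L, 2L, 1L)
b = c("x", "x")
# cross join of unique(a) and unique(b) without writing unique() twice
r = CJ(a = a, b = b, unique = TRUE)
```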
-
-  20. `on=` syntax: data.tables can join now without having to set keys by using the new `on` argument. For example: `DT1[DT2, on=c(x = "y")]` would join column 'y' of `DT2` with 'x' of `DT1`. `DT1[DT2, on="y"]` would join on column 'y' on both data.tables. Closes [#1130](https://github.com/Rdatatable/data.table/issues/1130) partly.
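A minimal sketch of the ad hoc join via `on=` (toy tables; with the default `nomatch=NA` the unmatched row from `DT2` is kept with `v1=NA`):

```r
library(data.table)

DT1 = data.table(x = c("a", "b"), v1 = 1:2)
DT2 = data.table(y = c("b", "c"), v2 = 3:4)
# join DT2's 'y' to DT1's 'x'; neither table needs a key
r = DT1[DT2, on = c(x = "y")]
```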
-
-  21. `merge.data.table` gains arguments `by.x` and `by.y`. Closes [#637](https://github.com/Rdatatable/data.table/issues/637) and [#1130](https://github.com/Rdatatable/data.table/issues/1130). No copies are made even when the specified columns aren't key columns in data.tables, and it is therefore much faster and more memory efficient. Thanks to @blasern for the initial PRs. Also gains logical argument `sort` (like base R). Closes [#1282](https://github.com/Rdatatable/data.table/issues/1282).
-
-  22. `setDF()` gains `rownames` argument for ready conversion to a `data.frame` with user-specified rows. Closes [#1320](https://github.com/Rdatatable/data.table/issues/1320). Thanks to @MichaelChirico for the FR and PR.
-
-  23. `print.data.table` gains `quote` argument (default=`FALSE`). This option surrounds all printed elements with quotes, which helps make whitespace more evident. Closes [#1177](https://github.com/Rdatatable/data.table/issues/1177); thanks to @MichaelChirico for the PR.
-
-  24. `[.data.table` now accepts single column numeric matrix in `i` argument the same way as `data.frame`. Closes [#826](https://github.com/Rdatatable/data.table/issues/826). Thanks to @jangorecki for the PR.
-
-  25. `setDT()` gains `check.names` argument paralleling that of `fread`, `data.table`, and `base` functionality, allowing poorly declared objects to be converted to tidy `data.table`s by reference. Closes [#1338](https://github.com/Rdatatable/data.table/issues/1338); thanks to @MichaelChirico for the FR/PR.
-
-#### BUG FIXES
-
-  1. `if (TRUE) DT[,LHS:=RHS]` no longer prints, [#869](https://github.com/Rdatatable/data.table/issues/869) and [#1122](https://github.com/Rdatatable/data.table/issues/1122). Tests added. To get this to work we've had to live with one downside: if a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` or `print(DT)` is typed at the prompt, nothing will be printed. A repeated `DT` or `print(DT)` will print. To avoid this: include a `DT[]`  [...]
-  
-  2. `DT[FALSE,LHS:=RHS]` no longer prints either, [#887](https://github.com/Rdatatable/data.table/issues/887). Thanks to Jureiss for reporting.
-  
-  3. `:=` no longer prints in knitr for consistency with behaviour at the prompt, [#505](https://github.com/Rdatatable/data.table/issues/505). Output of a test `knit("knitr.Rmd")` is now in data.table's unit tests. Thanks to Corone for the illustrated report.
-  
-  4. `knitr::kable()` works again without needing to upgrade from knitr v1.6 to v1.7, [#809](https://github.com/Rdatatable/data.table/issues/809). Packages which evaluate user code and don't wish to import data.table need to be added to `data.table:::cedta.pkgEvalsUserCode` and now only the `eval` part is made data.table-aware (the rest of such package's code is left data.table-unaware). `data.table:::cedta.override` is now empty and will be deprecated if no need for it arises. Thanks to [...]
-  
-  5. `fread()`:
-      * doubled quotes ("") inside quoted fields including if immediately followed by an embedded newline. Thanks to James Sams for reporting, [#489](https://github.com/Rdatatable/data.table/issues/489). 
-      * quoted fields with embedded newlines in the lines used to detect types, [#810](https://github.com/Rdatatable/data.table/issues/810). Thanks to Vladimir Sitnikov for the scrambled data file which is now included in the test suite.
-      * when detecting types in the middle and end of the file, if the jump lands inside a quoted field with (possibly many) embedded newlines, this is now detected.
-      * if the file doesn't exist the error message is clearer ([#486](https://github.com/Rdatatable/data.table/issues/486))
-      * system commands are now checked to contain at least one space
-      * sep="." now works (when dec!="."), [#502](https://github.com/Rdatatable/data.table/issues/502). Thanks to Ananda Mahto for reporting.
-      * better error message if quoted field is missing an end quote, [#802](https://github.com/Rdatatable/data.table/issues/802). Thanks to Vladimir Sitnikov for the sample file which is now included in the test suite.
-      * providing sep which is not present in the file now reads as if sep="\n" rather than 'sep not found', #738. Thanks to Adam Kennedy for explaining the use-case.
-      * seg fault with errors over 1,000 characters (when long lines are included) is fixed, [#802](https://github.com/Rdatatable/data.table/issues/802). Thanks again to Vladimir Sitnikov.
-      * Missing `integer64` values are properly assigned `NA`s. Closes [#488](https://github.com/Rdatatable/data.table/issues/488). Thanks to @PeterStoyanov and @richierocks for the report.
-      * Column headers with empty strings aren't skipped anymore. [Closes #483](https://github.com/Rdatatable/data.table/issues/483). Thanks to @RobyJoehanes and @kforner.
-      * Detects separator correctly when commas also exist in text fields. Closes [#923](https://github.com/Rdatatable/data.table/issues/923). Thanks to @raymondben for the report.
-      * `NA` values in NA inflated file are read properly. [Closes #737](https://github.com/Rdatatable/data.table/issues/737). Thanks to Adam Kennedy.  
-      * correctly handles `na.strings` argument for all types of columns - it detects possible `NA` values without coercion to character, like in base `read.table`. [fixes #504](https://github.com/Rdatatable/data.table/issues/504). Thanks to @dselivanov for the PR. Also closes [#1314](https://github.com/Rdatatable/data.table/issues/1314), which closes this issue completely, i.e., `na.strings = c("-999", "FALSE")` etc. also work.
-      * deals with quotes more robustly. When reading a quoted field fails, it re-attempts to read the field as if it wasn't quoted. This helps read in those fields that might have unbalanced quotes without erroring immediately, thereby closing issues [#568](https://github.com/Rdatatable/data.table/issues/568), [#1256](https://github.com/Rdatatable/data.table/issues/1256), [#1077](https://github.com/Rdatatable/data.table/issues/1077), [#1079](https://github.com/Rdatatable/data.table/issues/ [...]
-      * gains argument `strip.white` which is `TRUE` by default (unlike `base::read.table`). All unquoted columns' leading and trailing white spaces are automatically removed. If `FALSE`, only trailing spaces of the header are removed. Closes [#1113](https://github.com/Rdatatable/data.table/issues/1113), [#1035](https://github.com/Rdatatable/data.table/issues/1035), [#1000](https://github.com/Rdatatable/data.table/issues/1000), [#785](https://github.com/Rdatatable/data.table/issues/785), [...]
-      * doesn't warn about empty lines when 'nrow' argument is specified and that many rows are read properly. Thanks to @richierocks for the report. Closes [#1330](https://github.com/Rdatatable/data.table/issues/1330).
-      * doesn't error/warn about not being able to read last 5 lines when 'nrow' argument is specified. Thanks to @robbig2871. Closes [#773](https://github.com/Rdatatable/data.table/issues/773).
-
-  6. Auto indexing:
-      * `DT[colA == max(colA)]` now works again without needing `options(datatable.auto.index=FALSE)`. Thanks to Jan Gorecki and kaybenleroll, [#858](https://github.com/Rdatatable/data.table/issues/858). Test added.
-      * `DT[colA %in% c("id1","id2","id2","id3")]` now ignores the RHS duplicates (as before, consistent with base R) without needing `options(datatable.auto.index=FALSE)`. Thanks to Dayne Filer for reporting.
-      * If `DT` contains a column `class` (happens to be a reserved attribute name in R) then `DT[class=='a']` now works again without needing `options(datatable.auto.index=FALSE)`. Thanks to sunnyghkm for reporting, [#871](https://github.com/Rdatatable/data.table/issues/871).
-      * `:=` and `set*` now drop secondary keys (new in v1.9.4) so that `DT[x==y]` works again after a `:=` or `set*` without needing `options(datatable.auto.index=FALSE)`. Only `setkey()` was dropping secondary keys correctly. 23 tests added. Thanks to user36312 for reporting, [#885](https://github.com/Rdatatable/data.table/issues/885).
-      * Automatic indices are not created on `.SD` so that `dt[, .SD[b == "B"], by=a]` works correctly. Fixes [#958](https://github.com/Rdatatable/data.table/issues/958). Thanks to @azag0 for the nice reproducible example.
-      * `i`-operations resulting in 0-length rows ignore `j` on subsets using auto indexing. Closes [#1001](https://github.com/Rdatatable/data.table/issues/1001). Thanks to @Gsee.
-      * `POSIXct` type columns work as expected with auto indexing. Closes [#955](https://github.com/Rdatatable/data.table/issues/955). Thanks to @GSee for the minimal report.
-      * Auto indexing with `!` operator, for e.g., `DT[!x == 1]` works as intended. Closes [#932](https://github.com/Rdatatable/data.table/issues/932). Thanks to @matthieugomez for the minimal example.
-      * While fixing `#932`, issues on subsetting `NA` were also spotted and fixed, for e.g., `DT[x==NA]` or `DT[!x==NA]`. 
-      * Works fine when RHS is of `list` type - quite unusual operation but could happen. Closes [#961](https://github.com/Rdatatable/data.table/issues/961). Thanks to @Gsee for the minimal report.
-      * Auto indexing errored in some cases when LHS and RHS were not of same type. This is fixed now. Closes [#957](https://github.com/Rdatatable/data.table/issues/957). Thanks to @GSee for the minimal report.
-      * `DT[x == 2.5]` where `x` is integer type resulted in `val` being coerced to integer (for binary search) and therefore returned incorrect result. This is now identified using the function `isReallyReal()` and if so, auto indexing is turned off. Closes [#1050](https://github.com/Rdatatable/data.table/issues/1050).
-      * Auto indexing errored during `DT[x %in% val]` when `val` has some values not present in `x`. Closes [#1072](https://github.com/Rdatatable/data.table/issues/1072). Thanks to @CarlosCinelli for asking on [StackOverflow](http://stackoverflow.com/q/28932742/559784).
-
-  7. `as.data.table.list` with list input having 0-length items, e.g. `x = list(a=integer(0), b=3:4)`. `as.data.table(x)` recycles item `a` with `NA`s to fit the length of the longer column `b` (length=2), as before, but now with an additional warning that the item has been recycled with `NA`s. Closes [#847](https://github.com/Rdatatable/data.table/issues/847). Thanks to @tvinodr for the report. This was a regression from 1.9.2.
-
-  8. `DT[i, j]` when `i` returns all `FALSE` and `j` contains some length-0 values (ex: `integer(0)`) now returns an empty data.table as it should. Closes [#758](https://github.com/Rdatatable/data.table/issues/758) and [#813](https://github.com/Rdatatable/data.table/issues/813). Thanks to @tunaaa and @nigmastar for the nice reproducible reports. 
-
-  9. `allow.cartesian` is ignored during joins when:  
-      * `i` has no duplicates and `mult="all"`. Closes [#742](https://github.com/Rdatatable/data.table/issues/742). Thanks to @nigmastar for the report.  
-      * assigning by reference, i.e., `j` has `:=`. Closes [#800](https://github.com/Rdatatable/data.table/issues/800). Thanks to @matthieugomez for the report.
-
-  In both these cases (and during a `not-join` which was already fixed in [1.9.4](https://github.com/Rdatatable/data.table/blob/master/README.md#bug-fixes-1)), `allow.cartesian` can be safely ignored.
-
-  10. `names<-.data.table` works as intended on data.table unaware packages with Rv3.1.0+. Closes [#476](https://github.com/Rdatatable/data.table/issues/476) and [#825](https://github.com/Rdatatable/data.table/issues/825). Thanks to ezbentley for reporting [here](http://stackoverflow.com/q/23256177/559784) on SO and to @narrenfrei.
-  
-  11. `.EACHI` is now an exported symbol (just like `.SD`,`.N`,`.I`,`.GRP` and `.BY` already were) so that packages using `data.table` and `.EACHI` pass `R CMD check` with no NOTE that this symbol is undefined. Thanks to Matt Bannert for highlighting.
-  
-  12. Some optimisations of `.SD` in `j` were done in 1.9.4, refer to [#735](https://github.com/Rdatatable/data.table/issues/735). Due to an oversight, j-expressions of the form c(lapply(.SD, ...), list(...)) were optimised improperly. This is now fixed. Thanks to @mmeierer for filing [#861](https://github.com/Rdatatable/data.table/issues/861).
-
-  13. `j`-expressions in `DT[, col := x$y()]` (or) `DT[, col := x[[1]]()]` are now (re)constructed properly. Thanks to @ihaddad-md for reporting. Closes [#774](https://github.com/Rdatatable/data.table/issues/774).
-
-  14. `format.ITime` now handles negative values properly. Closes [#811](https://github.com/Rdatatable/data.table/issues/811). Thanks to @StefanFritsch for the report along with the fix!
-
-  15. Compatibility with big endian machines (e.g., SPARC and PowerPC) is restored. Most Windows, Linux and Mac systems are little endian; type `.Platform$endian` to confirm. Thanks to Gerhard Nachtmann for reporting and the [QEMU project](http://qemu.org/) for their PowerPC emulator.
-
-  16. `DT[, LHS := RHS]` where RHS is of the form `eval(parse(text = foo[1]))` referring to columns in `DT` is now handled properly. Closes [#880](https://github.com/Rdatatable/data.table/issues/880). Thanks to tyner.
-
-  17. `subset` handles extracting duplicate columns consistently with data.table's rule - if a column name is duplicated, then accessing that column using column number should return that column, whereas accessing by column name (due to ambiguity) will always extract the first column. Closes [#891](https://github.com/Rdatatable/data.table/issues/891). Thanks to @jjzz.
-
-  18. `rbindlist` handles combining levels of data.tables with both ordered and unordered factor columns properly. Closes [#899](https://github.com/Rdatatable/data.table/issues/899). Thanks to @ChristK.
-
-  19. Updating `.SD` by reference using `set` also errors appropriately now; similar to `:=`. Closes [#927](https://github.com/Rdatatable/data.table/issues/927). Thanks to @jrowen for the minimal example.
-
-  20. `X[Y, .N]` returned the same result as `X[Y, .N, nomatch=0L]` when `Y` contained rows that have no matches in `X`. Fixed now. Closes [#963](https://github.com/Rdatatable/data.table/issues/963). Thanks to [this SO post](http://stackoverflow.com/q/27004002/559784) from @Alex which helped discover the bug.
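-    A minimal sketch of the fixed behaviour (hypothetical data):
-    ```R
-    library(data.table)
-    X = data.table(x = c("a", "b"), v = 1:2, key = "x")
-    X[J(c("a", "z")), .N]                # 2: the non-matching "z" row is counted
-    X[J(c("a", "z")), .N, nomatch = 0L]  # 1: non-matches are dropped
-    ```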
-
-  21. `data.table::dcast` handles levels in factor columns properly when `drop = FALSE`. Closes [#893](https://github.com/Rdatatable/data.table/issues/893). Thanks to @matthieugomez for the great minimal example.
-
-  22. `[.data.table` subsets complex and raw type objects again. Thanks to @richierocks for the nice minimal example. Closes [#982](https://github.com/Rdatatable/data.table/issues/982).
-
-  23. Fixed a bug in the internal optimisation of `j-expression` with more than one `lapply(.SD, function(..) ..)` as illustrated [here on SO](http://stackoverflow.com/a/27495844/559784). Closes #985. Thanks to @jadaliha for the report and to @BrodieG for the debugging on SO.
-
-  24. `mget` fetches columns from the default environment `.SD` when called from within the frame of `DT`. That is, `DT[, mget(cols)]`, `DT[, lapply(mget(cols), sum), by=.]` etc. work as intended. Thanks to @Roland for filing this issue. Closes [#994](https://github.com/Rdatatable/data.table/issues/994).
-
-  25. `foverlaps()` did not find overlapping intervals correctly *on numeric ranges* in a special case where both `start` and `end` intervals contained *0.0*. This is now fixed. Thanks to @tdhock for the reproducible example. Closes [#1006](https://github.com/Rdatatable/data.table/issues/1006) partly.
-
-  26. When performing rolling joins, keys are set only when we can be absolutely sure. Closes [#1010](https://github.com/Rdatatable/data.table/issues/1010), which explains cases where keys should not be retained.
-
-  27. Rolling joins with `-Inf` and `Inf` are handled properly. Closes [#1007](https://github.com/Rdatatable/data.table/issues/1007). Thanks to @tdhock for filing [#1006](https://github.com/Rdatatable/data.table/issues/1006) which led to the discovery of this issue.
-
-  28. Overlapping range joins with `-Inf` and `Inf` and 0.0 in them are handled properly now. Closes [#1006](https://github.com/Rdatatable/data.table/issues/1006). Thanks to @tdhock for filing the issue with a nice reproducible example.
-
-  29. Fixed two segfaults in `shift()` when the number of rows in `x` is less than the value of `n`. Closes [#1009](https://github.com/Rdatatable/data.table/issues/1009) and [#1014](https://github.com/Rdatatable/data.table/issues/1014). Thanks to @jangorecki and @ashinm for the reproducible reports.
-
-  30. Attributes are preserved for `sum()` and `mean()` when fast internal (GForce) implementations are used. Closes [#1023](https://github.com/Rdatatable/data.table/issues/1023). Thanks to @DavidArenburg for the nice reproducible example.
-
-  31. `lapply(l, setDT)` is handled properly now; over-allocation isn't lost. Similarly, `for (i in 1:k) setDT(l[[i]])` is handled properly as well. Closes [#480](https://github.com/Rdatatable/data.table/issues/480). 
-
-  32. `rbindlist` stack imbalance on all `NULL` list elements is now fixed. Closes [#980](https://github.com/Rdatatable/data.table/issues/980). Thanks to @ttuggle.
-
-  33. List columns can be assigned to columns of `factor` type by reference. Closes [#936](https://github.com/Rdatatable/data.table/issues/936). Thanks to @richierocks for the minimal example.
-
-  34. After setting the `datatable.alloccol` option, creating a data.table with more than the set `truelength` resulted in error or segfault. This is now fixed. Closes [#970](https://github.com/Rdatatable/data.table/issues/970). Thanks to @caneff for the nice minimal example.
-
-  35. Update by reference using `:=` after loading from disk where the `data.table` exists within a local environment now works as intended. Closes [#479](https://github.com/Rdatatable/data.table/issues/479). Thanks to @ChongWang for the minimal reproducible example.
-
-  36. Issues on merges involving `factor` columns with `NA` and merging `factor` with `character` type with non-identical levels are both fixed. Closes [#499](https://github.com/Rdatatable/data.table/issues/499) and [#945](https://github.com/Rdatatable/data.table/issues/945). Thanks to @AbielReinhart and @stewbasic for the minimal examples.
-
-  37. `as.data.table(ll)` returned a `data.table` with 0 rows when the first element of the list had 0 length, e.g., `ll = list(NULL, 1:2, 3:4)`. This is now fixed by removing those 0-length elements. Closes [#842](https://github.com/Rdatatable/data.table/issues/842). Thanks to @Rick for the nice minimal example.
-
-  38. `as.data.table.factor` redirects to `as.data.table.matrix` when the input is a `matrix` but also of type `factor`. Closes [#868](https://github.com/Rdatatable/data.table/issues/868). Thanks to @mgahan for the example.
-
-  39. `setattr` now returns an error when trying to set `data.table` and/or `data.frame` as class to a *non-list* type object (ex: `matrix`). Closes [#832](https://github.com/Rdatatable/data.table/issues/832). Thanks to @Rick for the minimal example.
-
-  40. `data.table(table)` works as expected. Closes [#1043](https://github.com/Rdatatable/data.table/issues/1043). Thanks to @rnso for the [SO post](http://stackoverflow.com/q/28499359/559784).
-
-  41. Joins and binary search based subsets of the form `x[i]` where `x`'s key column is integer and `i` a logical column threw an error before. This is now fixed by converting the logical column to integer type and then performing the join, so that it works as expected. 
-
-  42. When the `by` expression is, for example, `by = x %% 2`, `data.table` tries to automatically extract meaningful column names from the expression; in this case it would be `x`. However, if the `j-expression` also contains `x`, for example `DT[, last(x), by= x %% 2]`, the original `x` got masked by the expression in `by`. This is now fixed; by-expressions are not simplified in column names for these cases. Closes [#497](https://github.com/Rdatatable/data.table/issues/497). Thanks to @G [...]
-
-  43. `rbindlist` now errors when columns have non-identical class attributes and are not `factor`s, e.g., binding column of class `Date` with `POSIXct`. Previously this returned incorrect results. Closes [#705](https://github.com/Rdatatable/data.table/issues/705). Thanks to @ecoRoland for the minimal report.
-
-  44. Fixed a segfault in `melt.data.table` when `measure.vars` have duplicate names. Closes [#1055](https://github.com/Rdatatable/data.table/issues/1055). Thanks to @ChristK for the minimal report.
-
-  45. Fixed another segfault in `melt.data.table` that was caught due to an issue on Windows. Closes [#1059](https://github.com/Rdatatable/data.table/issues/1059). Thanks again to @ChristK for the minimal report.
-
-  46. `DT[rows, newcol := NULL]` resulted in a segfault on the next assignment by reference. Closes [#1082](https://github.com/Rdatatable/data.table/issues/1082). Thanks to @stevenbagley for the MRE.
-
-  47. `as.matrix(DT)` handles cases where `DT` contains both numeric and logical columns correctly (doesn't coerce to character columns anymore). Closes [#1083](https://github.com/Rdatatable/data.table/issues/1083). Thanks to @bramvisser for the [SO post](http://stackoverflow.com/questions/29068328/correlation-between-numeric-and-logical-variable-gives-intended-error).
-
-  48. Coercion is handled properly on subsets/joins on `integer64` key columns. Closes [#1108](https://github.com/Rdatatable/data.table/issues/1108). Thanks to @vspinu.
-
-  49. `setDT()` and `as.data.table()` both strip *all classes* preceding *data.table*/*data.frame*, to be consistent with base R. Closes [#1078](https://github.com/Rdatatable/data.table/issues/1078) and [#1128](https://github.com/Rdatatable/data.table/issues/1128). Thanks to Jan and @helix123 for the reports.
-
-  50. `setattr(x, 'levels', value)` handles duplicate levels in `value` appropriately. Thanks to Jeffrey Horner for pointing it out [here](http://jeffreyhorner.tumblr.com/post/118297392563/tidyr-challenge-help-me-do-my-job). Closes [#1142](https://github.com/Rdatatable/data.table/issues/1142).
-
-  51. `x[J(vals), .N, nomatch=0L]` also included non-matching rows in the result, [#1074](https://github.com/Rdatatable/data.table/issues/1074). And `x[J(...), col := val, nomatch=0L]` returned a warning with incorrect results when the join had no matches, even though `nomatch=0L` should have no effect with `:=`, [#1092](https://github.com/Rdatatable/data.table/issues/1092). Both issues are fixed now. Thanks to @riabusan and @cguill95 for #1092.
-
-  52. The `.data.table.locked` attribute is now set to NULL in internal function `subsetDT`. Closes [#1154](https://github.com/Rdatatable/data.table/issues/1154). Thanks to @Jan.
-
-  53. Internal function `fastmean()` retains column attributes. Closes [#1160](https://github.com/Rdatatable/data.table/issues/1160). Thanks to @renkun-ken.
-
-  54. Using `.N` in `i`, e.g., `DT[, head(.SD, 3)[1:(.N-1L)]]`, accessed an incorrect value of `.N`. This is now fixed. Closes [#1145](https://github.com/Rdatatable/data.table/issues/1145). Thanks to @claytonstanley.
-  
-  55. `setDT` handles `key=` argument properly when input is already a `data.table`. Closes [#1169](https://github.com/Rdatatable/data.table/issues/1169). Thanks to @DavidArenburg for the PR.
-
-  56. Key is retained properly when joining on factor type columns. Closes [#477](https://github.com/Rdatatable/data.table/issues/477). Thanks to @nachti for the report.
-  
-  57. Over-allocated memory is released more robustly thanks to Karl Miller's investigation and suggested fix.
-  
-  58. `DT[TRUE, colA:=colA*2]` no longer churns through 4 unnecessary allocations as large as one column. This was caused by `i=TRUE` being recycled. Thanks to Nathan Kurz for reporting and investigating. Added provided test to test suite. Only a single vector is allocated now for the RHS (`colA*2`). Closes [#1249](https://github.com/Rdatatable/data.table/issues/1249).
-  
-  59. Thanks to @and3k for the excellent bug report [#1258](https://github.com/Rdatatable/data.table/issues/1258). This was a result of shallow copy retaining keys when it shouldn't. It affected some cases of joins using `on=`. Fixed now.
-
-  60. `set()` and `:=` handle RHS value `NA_integer_` on factor types properly. Closes [#1234](https://github.com/Rdatatable/data.table/issues/1234). Thanks to @DavidArenburg.
-
-  61. `merge.data.table()` didn't set column order (and therefore names) properly in some cases. Fixed now. Closes [#1290](https://github.com/Rdatatable/data.table/issues/1290). Thanks to @ChristK for the minimal example.
-
-  62. `print.data.table` now works for 100+ rows as intended when `row.names=FALSE`. Closes [#1307](https://github.com/Rdatatable/data.table/issues/1307). Thanks to @jangorecki for the PR.
-
-  63. Row numbers are no longer printed in scientific format. Closes [#1167](https://github.com/Rdatatable/data.table/issues/1167). Thanks to @jangorecki for the PR.
-
-  64. Using `.GRP` unnamed in `j` now returns a variable named `GRP` instead of `.GRP` as the period was causing issues. Same for `.BY`. Closes [#1243](https://github.com/Rdatatable/data.table/issues/1243); thanks to @MichaelChirico for the PR.
-
-  65. `DT[, 0, with=FALSE]` returns a null data.table, to be consistent with `data.frame`'s behaviour. Closes [#1140](https://github.com/Rdatatable/data.table/issues/1140). Thanks to @franknarf1.
-
-  66. Evaluating quoted expressions with `.` in `by` works as intended. That is, `dt = data.table(a=c(1,1,2,2), b=1:4); expr=quote(.(a)); dt[, sum(b), eval(expr)]` works now. Closes [#1298](https://github.com/Rdatatable/data.table/issues/1298). Thanks @eddi.
-
-  67. `as.list` method for `IDate` object works properly. Closes [#1315](https://github.com/Rdatatable/data.table/issues/1315). Thanks to @gwerbin.
-
-#### NOTES
-
-  1. Clearer explanation of what `duplicated()` does (borrowed from base). Thanks to @matthieugomez for pointing out. Closes [#872](https://github.com/Rdatatable/data.table/issues/872).
-  
-  2. `?setnames` has been updated now that `names<-` and `colnames<-` shallow (rather than deep) copy from R >= 3.1.0, [#853](https://github.com/Rdatatable/data.table/issues/853).
-  
-  3. [FAQ 1.6](https://github.com/Rdatatable/data.table/wiki/vignettes/datatable-faq.pdf) has been embellished, [#517](https://github.com/Rdatatable/data.table/issues/517). Thanks to a discussion with Vivi and Josh O'Brien.
-  
-  4. `data.table` redefines the `melt` generic and *suggests* `reshape2` instead of *importing* it. As a result, we don't have to load the `reshape2` package to use `melt.data.table` anymore. The reason for this change is that `data.table` requires R >= 2.14, whereas `reshape2` requires R >= 3.0.0. `reshape2`'s melt methods can still be used without any issues by loading the package normally.
-
-  5. `DT[, j, ]` at times made an additional (unnecessary) copy. This is now fixed. This fix also avoids allocating `.I` when `j` doesn't use it. As a result `:=` and other subset operations should be faster (and use less memory). Thanks to @szilard for the nice report. Closes [#921](https://github.com/Rdatatable/data.table/issues/921).
-
-  6. Because `reshape2` requires R >= 3.0.0, and `data.table` works with R >= 2.14.1, we cannot import `reshape2` anymore. Therefore we define a `melt` generic and a `melt.data.table` method for data.tables, and redirect to `reshape2`'s `melt` for other objects. This is to ensure that existing code works fine.
-
-  7. `dcast` is also a generic now in data.table, so we can use `dcast(...)` directly and don't have to spell it out as `dcast.data.table(...)` like before. The `dcast` generic in data.table redirects to `reshape2::dcast` if the input object is not a data.table. But for that you have to load `reshape2` before loading `data.table`. If not, reshape2's `dcast` overwrites data.table's `dcast` generic, in which case you will need the `::` operator, e.g., `data.table::dcast(...)`.
-
-  NB: Ideal situation would be for `dcast` to be a generic in reshape2 as well, but it is not. We have issued a [pull request](https://github.com/hadley/reshape/pull/62) to make `dcast` in reshape2 a generic, but that has not yet been accepted. 
-
-  8. Clarified the use of `bit64::integer64` in `merge.data.table()` and `setNumericRounding()`. Closes [#1093](https://github.com/Rdatatable/data.table/issues/1093). Thanks to @sfischme for the report.
-
-  9. Removed an unnecessary (and silly) `giveNames` argument from `setDT()`. Not sure why I added this in the first place!
-
-  10. `options(datatable.prettyprint.char=5L)` restricts the number of characters to be printed for character columns. For example:
-    
-        options(datatable.prettyprint.char = 5L)
-        DT = data.table(x=1:2, y=c("abcdefghij", "klmnopqrstuv"))
-        DT
-        #    x        y
-        # 1: 1 abcde...
-        # 2: 2 klmno...
-
-  11. `rolltolast` argument in `[.data.table` is now defunct. It was deprecated in 1.9.4.
-  
-  12. `data.table`'s dependency has been moved forward from R 2.14.0 to R 2.14.1, now nearly 4 years old (Dec 2011). As usual, before release to CRAN we ensure data.table passes the test suite on the stated dependency, and we keep this as old as possible for as long as possible, as requested by users in managed environments. For this reason we still don't use `paste0()` internally, since that was added in R 2.15.0.
-
-  13. Warning about `datatable.old.bywithoutby` option (for grouping on join without providing `by`) being deprecated in the next release is in place now. Thanks to @jangorecki for the PR.
-
-  14. Fixed the `allow.cartesian` documentation to say `nrow(x)+nrow(i)` instead of `max(nrow(x), nrow(i))`. Closes [#1123](https://github.com/Rdatatable/data.table/issues/1123).
-
-### Changes in v1.9.4  (on CRAN 2 Oct 2014)
-
-#### NEW FEATURES
-
-  1. `by=.EACHI` runs `j` for each group in `DT` that each row of `i` joins to.
-    ```R
-    setkey(DT, ID)
-    DT[c("id1", "id2"), sum(val)]                # single total across both id1 and id2
-    DT[c("id1", "id2"), sum(val), by = .EACHI]   # sum(val) for each id
-    DT[c("id1", "id2"), sum(val), by = key(DT)]  # same
-    ```
-    In other words, `by-without-by` is now explicit, as requested by users, [#371](https://github.com/Rdatatable/data.table/issues/371). When `i` contains duplicates, `by=.EACHI` is different to `by=key(DT)`; e.g.,
-    ```R
-    setkey(DT, ID)
-    ids = c("id1", "id2", "id1")     # NB: id1 appears twice
-    DT[ids, sum(val), by = ID]       # 2 rows returned
-    DT[ids, sum(val), by = .EACHI]   # 3 rows in the order of ids (result 1 and 3 are not merged)
-    ```
-    `by=.EACHI` can be useful when `i` is event data, where you don't want the events aggregated by common join values but wish the output to be ordered with repeats, or simply to use join-inherited columns as parameters; e.g.:
-    ```R
-    X[Y, head(.SD, i.top), by = .EACHI]
-    ```
-    where `top` is a non-join column in `Y`; i.e., join inherited column. Thanks to many, especially eddi, Sadao Milberg and Gabor Grothendieck for extended discussions. Closes [#538](https://github.com/Rdatatable/data.table/issues/538).
-
-  2. Accordingly, `X[Y, j]` now does what `X[Y][, j]` did. To return the old behaviour: `options(datatable.old.bywithoutby=TRUE)`. This is a temporary option to aid migration and will be removed in future. See [this](http://r.789695.n4.nabble.com/changing-data-table-by-without-by-syntax-to-require-a-quot-by-quot-td4664770.html), [this](http://stackoverflow.com/questions/16093289/data-table-join-and-j-expression-unexpected-behavior) and [this](http://stackoverflow.com/a/16222108/403310) p [...]
-  
-  3. `Overlap joins` ([#528](https://github.com/Rdatatable/data.table/issues/528)) are now here, finally!! Everything is implemented except `type="equal"` and the `maxgap` and `minoverlap` arguments. Check out `?foverlaps` and the examples there on its usage. This is a major feature addition to `data.table`.
-
-  4. `DT[column==value]` and `DT[column %in% values]` are now optimized to use `DT`'s key when `key(DT)[1]=="column"`, otherwise a secondary key (a.k.a. _index_) is automatically added so the next `DT[column==value]` is much faster. No code changes are needed; existing code should automatically benefit. Secondary keys can be added manually using `set2key()` and existence checked using `key2()`. These optimizations and function names/arguments are experimental and may be turned off with ` [...]
-
-  5. `fread()`:
-      * accepts line breaks inside quoted fields. Thanks to Clayton Stanley for highlighting [here](http://stackoverflow.com/questions/21006661/fread-and-a-quoted-multi-line-column-value).
-      * accepts trailing backslash in quoted fields. Thanks to user2970844 for highlighting [here](http://stackoverflow.com/questions/24375832/fread-and-column-with-a-trailing-backslash).
-      * Blank and `"NA"` values in logical columns (`T`,`True`,`TRUE`) no longer cause them to be read as character, [#567](https://github.com/Rdatatable/data.table/issues/567). Thanks to Adam November for reporting.
-      * URLs now work on Windows. R's `download.file()` converts `\r\n` to `\r\r\n` on Windows. Now avoided by downloading in binary mode. Thanks to Steve Miller and Dean MacGregor for reporting, [#492](https://github.com/Rdatatable/data.table/issues/492).
-      * Fixed seg fault in sparse data files when bumping to character, [#796](https://github.com/Rdatatable/data.table/issues/796) and [#722](https://github.com/Rdatatable/data.table/issues/722). Thanks to Adam Kennedy and Richard Cotton for the detailed reproducible reports.
-      * New argument `fread(...,data.table=FALSE)` returns a `data.frame` instead of a `data.table`. This can be set globally: `options(datatable.fread.datatable=FALSE)`.
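-        A minimal sketch of the new argument (using inline literal input, which `fread` accepts):
-      ```R
-      library(data.table)
-      df = fread("a,b\n1,2\n3,4\n", data.table = FALSE)
-      class(df)   # "data.frame"
-      ```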
-
-  6. `.()` can now be used in `j` and is identical to `list()`, for consistency with `i`.
-    ```R
-    DT[,list(MySum=sum(B)),by=...]
-    DT[,.(MySum=sum(B)),by=...]     # same
-    DT[,list(colB,colC,colD)]
-    DT[,.(colB,colC,colD)]          # same
-    ```
-    Similarly, `by=.()` is now a shortcut for `by=list()`, for consistency with `i` and `j`.
-    
-  7. `rbindlist` gains `use.names` and `fill` arguments and is now implemented entirely in C. Closes [#345](https://github.com/Rdatatable/data.table/issues/345):
-      * `use.names` by default is FALSE for backwards compatibility (does not bind by names by default)
-      * `rbind(...)` now just calls `rbindlist()` internally, except that `use.names` is TRUE by default, for compatibility with base (and backwards compatibility).
-      * `fill=FALSE` by default. If `fill=TRUE`, `use.names` has to be TRUE. 
-      * When `use.names=TRUE`, at least one item of the input list has to have non-null column names.
-      * When `fill=TRUE`, all items of the input list have to have non-null column names.
-      * Duplicate columns are bound in the order of occurrence, like base.
-      * Attributes that might exist in individual items are lost in the bound result.
-      * Columns are coerced to the highest SEXPTYPE when they are different, if possible.
-      * And incredibly fast ;).
-      * Documentation updated in much detail. Closes [#333](https://github.com/Rdatatable/data.table/issues/333).
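-
-    A small sketch of `use.names` and `fill` together (hypothetical tables):
-    ```R
-    library(data.table)
-    DT1 = data.table(a = 1L, b = 2L)
-    DT2 = data.table(b = 3L, c = 4L)
-    rbindlist(list(DT1, DT2), use.names = TRUE, fill = TRUE)
-    #     a b  c
-    # 1:  1 2 NA
-    # 2: NA 3  4
-    ```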
-    
-  8. `bit64::integer64` now works in grouping and joins, [#342](https://github.com/Rdatatable/data.table/issues/342). Thanks to James Sams for highlighting UPCs and Clayton Stanley for [this SO post](http://stackoverflow.com/questions/22273321/large-integers-in-data-table-grouping-results-different-in-1-9-2-compared-to-1). `fread()` has been detecting and reading `integer64` for a while.
-
-  9. `setNumericRounding()` may be used to reduce to 1 byte or 0 byte rounding when joining to or grouping columns of type 'numeric', [#342](https://github.com/Rdatatable/data.table/issues/342). See example in `?setNumericRounding` and NEWS item below for v1.9.2. `getNumericRounding()` returns the current setting.
-
-  10. `X[Y]` now names non-join columns from `i` that have the same name as a column in `x`, with an `i.` prefix for consistency with the `i.` prefix that has been available in `j` for some time. This is now documented.
-
-  11. For a keyed table `X` where the key columns are not at the beginning in order, `X[Y]` now retains the original order of columns in X rather than moving the join columns to the beginning of the result.
-
-  12. It is no longer an error to assign to row 0 or row NA.
-    ```R
-    DT[0, colA := 1L]             # now does nothing, silently (was error)
-    DT[NA, colA := 1L]            # now does nothing, silently (was error)
-    DT[c(1, NA, 0, 2), colA:=1L]  # now ignores the NA and 0 silently (was error)
-    DT[nrow(DT) + 1, colA := 1L]  # error (out-of-range) as before
-    ```
-    This is for convenience, to avoid the need for a switch in user code that evaluates various `i` conditions in a loop, passing `i` as an integer vector which may contain `0` or `NA`.
-
-  13. A new function `setorder` is now implemented which uses data.table's internal fast order to reorder rows *by reference*. It returns the result invisibly (like `setkey`) that allows for compound statements; e.g., `setorder(DT, a, -b)[, cumsum(c), by=list(a,b)]`. Check `?setorder` for more info.
-
-  14. `DT[order(x, -y)]` is now by default optimised to use data.table's internal fast order as `DT[forder(DT, x, -y)]`. It can be turned off by setting `datatable.optimize` to < 1L or just calling `base::order` explicitly. It results in a 20x speedup on a data.table of 10 million rows with 2 integer columns, for example. To order character vectors in descending order it's sufficient to do `DT[order(x, -y)]` as opposed to `DT[order(x, -xtfrm(y))]` in base. This closes [#603](https://github. [...]
-     
-  15. `mult="all"` -vs- `mult="first"|"last"` now return consistent types and columns, [#340](https://github.com/Rdatatable/data.table/issues/340). Thanks to Michele Carriero for highlighting.
-
-  16. `duplicated.data.table` and `unique.data.table` gains `fromLast = TRUE/FALSE` argument, similar to base. Default value is FALSE. Closes [#347](https://github.com/Rdatatable/data.table/issues/347).
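-    A quick sketch of the new argument (hypothetical data; unkeyed, so all columns are used):
-    ```R
-    library(data.table)
-    DT = data.table(x = c(1, 1, 2), y = c(1, 1, 3))
-    duplicated(DT)                  # FALSE  TRUE FALSE
-    duplicated(DT, fromLast = TRUE) #  TRUE FALSE FALSE
-    unique(DT, fromLast = TRUE)     # keeps the last of each duplicate group
-    ```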
-
-  17. `anyDuplicated.data.table` is now implemented. Closes [#350](https://github.com/Rdatatable/data.table/issues/350). Thanks to M C (bluemagister) for reporting.
-
-  18. Complex j-expressions of the form `DT[, c(..., lapply(.SD, fun)), by=grp]` are now optimised as long as `.SD` is of the form `lapply(.SD, fun)` or `.SD`, `.SD[1]` or `.SD[1L]`. This resolves [#370](https://github.com/Rdatatable/data.table/issues/370). Thanks to Sam Steingold for reporting.
-      This also completes the first two task lists in [#735](https://github.com/Rdatatable/data.table/issues/735).
-    ```R
-    ## example:
-    DT[, c(.I, lapply(.SD, sum), mean(x), lapply(.SD, log)), by=grp]
-    ## is optimised to
-    DT[, list(.I, x=sum(x), y=sum(y), ..., mean(x), log(x), log(y), ...), by=grp]
-    ## and now... these variations are also optimised internally for speed
-    DT[, c(..., .SD, lapply(.SD, sum), ...), by=grp]
-    DT[, c(..., .SD[1], lapply(.SD, sum), ...), by=grp]
-    DT[, .SD, by=grp]
-    DT[, c(.SD), by=grp]
-    DT[, .SD[1], by=grp] # Note: but not yet DT[, .SD[1,], by=grp]
-    DT[, c(.SD[1]), by=grp]
-    DT[, head(.SD, 1), by=grp] # Note: but not yet DT[, head(.SD, -1), by=grp]
-    # but not yet optimised
-    DT[, c(.SD[a], .SD[x>1], lapply(.SD, sum)), by=grp] # where 'a' is, say, a numeric or a data.table, and also for expressions like x>1
-    ```
-    The underlying message is that `.SD` is being slowly optimised internally wherever possible, for speed, without compromising the nice readable syntax it provides.
-
-  19. `setDT` gains `keep.rownames = TRUE/FALSE` argument, which works only on `data.frame`s. TRUE retains the data.frame's row names as a new column named `rn`.
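-    For example (a minimal sketch with a hypothetical data.frame):
-    ```R
-    library(data.table)
-    df = data.frame(a = 1:2, row.names = c("r1", "r2"))
-    setDT(df, keep.rownames = TRUE)   # converts by reference
-    df
-    #    rn a
-    # 1: r1 1
-    # 2: r2 2
-    ```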
-
-  20. The output of `tables()` now includes `NCOL`. Thanks to @dnlbrky for the suggestion.
-
-  21. `DT[, LHS := RHS]` (or its equivalent in `set`) now provides a warning and returns `DT` as it was, instead of an error, when `length(LHS)` is `0L`, [#343](https://github.com/Rdatatable/data.table/issues/343). For example:
-    ```R
-    DT[, grep("^b", names(DT)) := NULL] # where no columns start with b
-    # warns now and returns DT instead of error
-    ```
-
-  22. GForce is now also optimised for j-expressions with `.N`. Closes [#334](https://github.com/Rdatatable/data.table/issues/334) and part of [#523](https://github.com/Rdatatable/data.table/issues/523).
-    ```R
-    DT[, list(.N, mean(y), sum(y)), by=x] # 1.9.2 - doesn't know to use GForce - will be (relatively) slower
-    DT[, list(.N, mean(y), sum(y)), by=x] # 1.9.3+ - will use GForce.
-    ```
-
-  23. `setDF` is now implemented. It accepts a data.table and converts it to data.frame by reference, [#338](https://github.com/Rdatatable/data.table/issues/338). Thanks to canneff for the discussion [here](http://r.789695.n4.nabble.com/Is-there-any-overhead-to-converting-back-and-forth-from-a-data-table-to-a-data-frame-td4688332.html) on data.table mailing list.
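-    A minimal sketch (hypothetical data):
-    ```R
-    library(data.table)
-    DT = data.table(a = 1:3)
-    setDF(DT)    # converts to data.frame by reference, no copy
-    class(DT)    # "data.frame"
-    ```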
-
-  24. `.I` gets named as `I` (instead of `.I`) wherever possible, similar to `.N`, [#344](https://github.com/Rdatatable/data.table/issues/344).
-  
-  25. `setkey` on `.SD` is now an error, rather than warnings for each group about rebuilding the key. The new error is similar to when attempting to use `:=` in a `.SD` subquery: `".SD is locked. Using set*() functions on .SD is reserved for possible future use; a tortuously flexible way to modify the original data by group."`  Thanks to Ron Hylton for highlighting the issue on datatable-help [here](http://r.789695.n4.nabble.com/data-table-is-asking-for-help-tp4692080.html).
-  
-  26. Looping calls to `unique(DT)` such as in `DT[,unique(.SD),by=group]` are now faster by avoiding the internal overhead of calling `[.data.table`. Thanks again to Ron Hylton for highlighting in the [same thread](http://r.789695.n4.nabble.com/data-table-is-asking-for-help-tp4692080.html). His example is reduced from 28 sec to 9 sec, with identical results.
-  
-  27. Following `gsum` and `gmean`, now `gmin` and `gmax` from GForce are also implemented. Closes part of [#523](https://github.com/Rdatatable/data.table/issues/523). Benchmarks are also provided.
-    ```R
-    DT[, list(sum(x), min(y), max(z), .N), by=...] # runs by default using GForce
-    ```
-    
-  28. `setorder()` and `DT[order(.)]` handle `integer64` type in descending order as well. Closes [#703](https://github.com/Rdatatable/data.table/issues/703).
-  
-  29. `setorder()` and `setorderv()` gain `na.last = TRUE/FALSE`. Closes [#706](https://github.com/Rdatatable/data.table/issues/706).
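-    For example (an illustrative sketch with a hypothetical column containing `NA`):
-    ```R
-    library(data.table)
-    DT = data.table(x = c(2, NA, 1))
-    setorder(DT, x, na.last = TRUE)   # reorders by reference
-    DT$x   # 1 2 NA
-    ```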
-
-  30. `.N` is now available in `i`, [FR#724](https://github.com/Rdatatable/data.table/issues/724). Thanks to newbie indirectly [here](http://stackoverflow.com/a/24649115/403310) and Farrel directly [here](http://stackoverflow.com/questions/24685421/how-do-you-extract-a-few-random-rows-from-a-data-table-on-the-fly).
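-    A quick sketch (hypothetical data):
-    ```R
-    library(data.table)
-    DT = data.table(x = 1:10)
-    DT[.N]             # last row, no need for DT[nrow(DT)]
-    DT[sample(.N, 3)]  # 3 random rows
-    ```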
-
-  31. `by=.EACHI` is now implemented for *not-joins* as well. Closes [#604](https://github.com/Rdatatable/data.table/issues/604). Thanks to Garrett See for filing the FR. As an example:
-    ```R
-    DT = data.table(x=c(1,1,1,1,2,2,3,4,4,4), y=1:10, key="x")
-    DT[!J(c(1,4)), sum(y), by=.EACHI] # is equivalent to DT[J(c(2,3)), sum(y), by=.EACHI]
-    ```
-
-
-#### BUG FIXES
-
-  
-  1.  When joining to fewer columns than the key has, using one of the later key columns explicitly in `j` repeated the first value. A problem introduced by v1.9.2 and not caught by the 1,220 tests, or tests in 37 dependent packages. Test added. Many thanks to Michele Carriero for reporting.
-    ```R
-    DT = data.table(a=1:2, b=letters[1:6], key="a,b")    # keyed by a and b
-    DT[.(1), list(b,...)]    # correct result again (joining just to a not b but using b)
-    ```
-    
-  2.  `setkey` works again when a non-key column is type list (e.g. each cell can itself be a vector), [#54](https://github.com/Rdatatable/data.table/issues/54). Test added. Thanks to James Sams, Michael Nelson and Musx [for the reproducible examples](http://stackoverflow.com/questions/22186798/r-data-table-1-9-2-issue-on-setkey).
-
-  3.  The warning "internal TRUE value has been modified" with recently released R 3.1 when grouping a table containing a logical column *and* where all groups are just 1 row is now fixed and tests added. Thanks to James Sams for the reproducible example. The warning is issued by R and we have asked if it can be upgraded to error (UPDATE: change now made for R 3.1.1 thanks to Luke Tierney).
-
-  4.  `data.table(list())`, `data.table(data.table())` and `data.table(data.frame())` now return a null data.table (no columns) rather than one empty column, [#48](https://github.com/Rdatatable/data.table/issues/48). Test added. Thanks to Shubh Bansal for reporting.
-
-  5.  `unique(<NULL data.table>)` now returns a null data.table, [#44](https://github.com/Rdatatable/data.table/issues/44). Thanks to agstudy for reporting.
-
-  6.  `data.table()` converted POSIXlt to POSIXct, consistent with `base::data.frame()`, but now also provides a helpful warning instead of coercing silently, [#59](https://github.com/Rdatatable/data.table/issues/59). Thanks to Brodie Gaslam, Patrick and Ragy Isaac for reporting [here](http://stackoverflow.com/questions/21487614/error-creating-r-data-table-with-date-time-posixlt) and [here](http://stackoverflow.com/questions/21320215/converting-from-data-frame-to-data-table-i-get-an-err [...]
-
-  7.  If another class inherits from data.table, e.g. `class(DT) == c("UserClass","data.table","data.frame")`, then `DT[...]` now retains `UserClass` in the result. Thanks to Daniel Krizian for reporting, [#64](https://github.com/Rdatatable/data.table/issues/64). Test added.
-
-  8.  An error `object '<name>' not found` could occur in some circumstances, particularly after a previous error. [Reported on SO](http://stackoverflow.com/questions/22128047/how-to-avoid-weird-umlaute-error-when-using-data-table) with non-ASCII characters in a column name, a red herring we hope since non-ASCII characters are fully supported in data.table including in column names. Fix implemented and tests added.
-
-  9.  Column order was reversed in some cases by `as.data.table.table()`, [#43](https://github.com/Rdatatable/data.table/issues/43). Test added. Thanks to Benjamin Barnes for reporting.
-     
-  10.  `DT[, !"missingcol", with=FALSE]` now returns `DT` (rather than a NULL data.table) with a warning that "missingcol" is not present.
-
-  11.  `DT[,y := y * eval(parse(text="1*2"))]` resulted in an error unless `eval()` was wrapped in parentheses. That is, `DT[,y := y * (eval(parse(text="1*2")))]`, **#5423**. Thanks to Wet Feet for reporting and to Simon O'Hanlon for identifying the issue [here on SO](http://stackoverflow.com/questions/22375404/unable-to-use-evalparse-in-data-table-function/22375557#22375557).
-
-  12.  Using `by` columns with attributes (ex: factor, Date) in `j` did not retain the attributes, also in the case of `:=`. This was partially a regression from an earlier fix ([#155](https://github.com/Rdatatable/data.table/issues/155)) due to recent changes for R 3.1.0. Now fixed and clearer tests added. Thanks to Christophe Dervieux for reporting and to Adam B for reporting [here on SO](http://stackoverflow.com/questions/22536586/by-seems-to-not-retain-attribute-of-date-type-columns-in-da [...]
-
-  13.  `.BY` special variable did not retain names of the grouping columns which resulted in not being able to access `.BY$grpcol` in `j`. Ex: `DT[, .BY$x, by=x]`. This is now fixed. Closes **#5415**. Thanks to Stephane Vernede for the bug report.
-
-  14.  Fixed another issue with `eval(parse(...))` in `j` along with assignment by reference `:=`. Closes [#30](https://github.com/Rdatatable/data.table/issues/30). Thanks to Michele Carriero for reporting. 
-
-  15.  `get()` in `j` did not see `i`'s columns when `i` is a data.table, which led to errors while doing operations like `DT1[DT2, list(get('c'))]`. Now, use of `get` makes *all* x's and i's columns visible (fetches all columns). Still, as the verbose message states, using `.SDcols` or `eval(macro)` would be able to select just the columns used, which is better for efficiency. Closes [#34](https://github.com/Rdatatable/data.table/issues/34). Thanks to Eddi for reporting.
-
-  16.  Fixed an edge case with `unique` and `duplicated`, which on empty data.tables returned a 1-row data.table with all NAs. Closes [#28](https://github.com/Rdatatable/data.table/issues/28). Thanks to Shubh Bansal for reporting.
-
-  17.  `dcast.data.table` resulted in an error (because function `CJ()` was not visible) in packages that "import" data.table. This did not happen if the package "depends" on data.table. Closes bug [#31](https://github.com/Rdatatable/data.table/issues/31). Thanks to K Davis for the excellent report.
-
-  18.  `merge(x, y, all=TRUE)` error when `x` is empty data.table is now fixed. Closes [#24](https://github.com/Rdatatable/data.table/issues/24). Thanks to Garrett See for filing the report.
-
-  19.  Implementing #5249 closes bug [#26](https://github.com/Rdatatable/data.table/issues/26), a case where rbind gave an error when binding with empty data.tables. Thanks to Roger for [reporting on SO](http://stackoverflow.com/q/23216033/559784).
-
-  20.  Fixed a segfault during grouping with assignment by reference, ex: `DT[, LHS := RHS, by=.]`, where length(RHS) > group size (.N). Closes [#25](https://github.com/Rdatatable/data.table/issues/25). Thanks to Zachary Long for reporting on datatable-help mailing list.
-
-  21.  Consistent subset rules on data.tables with duplicate columns. In short, if indices are provided directly in `j` or in `.SDcols`, then just those columns are either returned (or deleted if you provide -.SDcols or !j). If instead column names are given and there is more than one occurrence of that column, then it's hard to decide which to keep and which to remove on a subset. Therefore, to remove, all occurrences of that column are removed, and to keep, always the first column is r [...]
-
-       Note that using `by=` to aggregate on duplicate columns may not give intended result still, as it may not operate on the proper column.
-
-  22.  When DT is empty, `DT[, newcol:=max(b), by=a]` now properly adds the column, [#49](https://github.com/Rdatatable/data.table/issues/49). Thanks to Shubh Bansal for filing the report.
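A minimal sketch of the fixed behaviour (the column names here are illustrative):

```r
library(data.table)

# zero-row table: a grouped := should still create the new column
DT = data.table(a = integer(), b = integer())
DT[, newcol := max(b), by = a]
names(DT)   # now includes "newcol"
```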
-
-  23.  When `j` evaluates to `integer(0)/character(0)`, `DT[, j, with=FALSE]` resulted in error, [#21](https://github.com/Rdatatable/data.table/issues/21). Thanks indirectly to Malcolm Cook for [#52](https://github.com/Rdatatable/data.table/issues/52), through which this (recent) regression (from 1.9.3) was found.
-
-  24.  `print(DT)` now respects `digits` argument on list type columns, [#37](https://github.com/Rdatatable/data.table/issues/37). Thanks to Frank for the discussion on the mailing list and to Matthew Beckers for filing the bug report.
-
-  25.  FR #2551 implemented leniency in warning messages when columns are coerced with `DT[, LHS := RHS]` when `length(RHS)==1`. But this was very lenient; e.g., `DT[, a := "bla"]`, where `a` is a logical column, should get a warning. This is now fixed such that only very obvious cases coerce silently; e.g., `DT[, a := 1]` where `a` is `integer`. Closes [#35](https://github.com/Rdatatable/data.table/issues/35). Thanks to Michele Carriero and John Laing for reporting.
-
-  26.  `dcast.data.table` provides better error message when `fun.aggregate` is specified but it returns length != 1. Closes [#693](https://github.com/Rdatatable/data.table/issues/693). Thanks to Trevor Alexander for reporting [here on SO](http://stackoverflow.com/questions/24152733/undocumented-error-in-dcast-data-table).
-
-  27.  `dcast.data.table` tries to preserve attributes wherever possible, except when `value.var` is a `factor` (or ordered factor). For `factor` types, the casted columns will be coerced to type `character`, thereby losing the `levels` attribute. Closes [#688](https://github.com/Rdatatable/data.table/issues/688). Thanks to juancentro for reporting.
-
-  28.  `melt` now returns a friendly error when `measure.vars` are not in the data instead of a segfault. Closes [#699](https://github.com/Rdatatable/data.table/issues/699). Thanks to vsalmendra for [this post on SO](http://stackoverflow.com/q/24326797/559784) and the subsequent bug report.
-
-  29.  `DT[, list(m1 = eval(expr1), m2=eval(expr2)), by=val]` where `expr1` and `expr2` are constructed using `parse(text=.)` now works instead of resulting in error. Closes [#472](https://github.com/Rdatatable/data.table/issues/472). Thanks to Benjamin Barnes for reporting with a nice reproducible example.
-
-  30.  A join of the form `X[Y, roll=TRUE, nomatch=0L]` where some of Y's key columns occur more than once (duplicated keys) could at times return an incorrect join. This was introduced only in 1.9.2 and is fixed now. Closes [#700](https://github.com/Rdatatable/data.table/issues/700). Thanks to Michael Smith for the very nice reproducible example and nice spotting of such a tricky case.
-
-  31.  Fixed an edge case in `DT[order(.)]` internal optimisation to be consistent with base. Closes [#696](https://github.com/Rdatatable/data.table/issues/696). Thanks to Michael Smith and Garrett See for reporting.
-
-  32.  `DT[, list(list(.)), by=.]` and `DT[, col := list(list(.)), by=.]` return correct results in R >= 3.1.0 as well. The bug was due to recent (welcome) changes in R v3.1.0 where `list(.)` does not result in a *copy*. Closes [#481](https://github.com/Rdatatable/data.table/issues/481). Also thanks to KrishnaPG for filing [#728](https://github.com/Rdatatable/data.table/issues/728).
-
-  33.  `dcast.data.table` handles `fun.aggregate` argument properly when called from within a function that accepts `fun.aggregate` argument and passes to `dcast.data.table()`. Closes [#713](https://github.com/Rdatatable/data.table/issues/713). Thanks to mathematicalcoffee for reporting [here](http://stackoverflow.com/q/24542976/559784) on SO.
-
-  34.  `dcast.data.table` now returns a friendly error when fun.aggregate value for missing combinations is 0-length, and 'fill' argument is not provided. Closes [#715](https://github.com/Rdatatable/data.table/issues/715)
-
-  35.  `rbind/rbindlist` binds in the same order of occurrence also when binding tables with duplicate names along with 'fill=TRUE' (previously, it grouped all duplicate columns together). This was the underlying reason for [#725](https://github.com/Rdatatable/data.table/issues/725). Thanks to Stefan Fritsch for the report with a nice reproducible example and discussion.
-
-  36.  `setDT` now provides a friendly error when attempting to change a variable to data.table by reference whose binding is locked (usually when the variable is within a package, ex: CO2). Closes [#475](https://github.com/Rdatatable/data.table/issues/475). Thanks to David Arenburg for filing the report [here](http://stackoverflow.com/questions/23361080/error-in-setdt-from-data-table-package) on SO.
-
-  37.  `X[!Y]` where `X` and `Y` are both data.tables ignores 'allow.cartesian' argument, and rightly so because a not-join (or anti-join) cannot exceed nrow(x). Thanks to @fedyakov for spotting this. Closes [#698](https://github.com/Rdatatable/data.table/issues/698).
-
-  38.  `as.data.table.matrix` does not convert strings to factors by default. `data.table` prefers character vectors over factors. Closes [#745](https://github.com/Rdatatable/data.table/issues/745). Thanks to @fpinter for reporting the issue on the github issue tracker and to vijay for reporting [here](http://stackoverflow.com/questions/17691050/data-table-still-converts-strings-to-factors) on SO.
-
-  39.  Joins of the form `x[y[z]]` resulted in duplicate names when all `x`, `y` and `z` had the same column names as non-key columns. This is now fixed. Closes [#471](https://github.com/Rdatatable/data.table/issues/471). Thanks to Christian Sigg for the nice reproducible example.
-
-  40. `DT[where, someCol:=NULL]` is now an error when `i` is provided, since it makes no sense to delete a column for only a subset of rows. Closes [#506](https://github.com/Rdatatable/data.table/issues/506).
-
-  41. `forder` did not identify `-0` as `0` for numeric types. This is fixed now. Thanks to @arcosdium for nice minimal example. Closes [#743](https://github.com/Rdatatable/data.table/issues/743).
-
-  42. Segfault on joins of the form `X[Y, c(..), by=.EACHI]` is now fixed. Closes [#744](https://github.com/Rdatatable/data.table/issues/744). Thanks to @nigmastar (Michele Carriero) for the excellent minimal example. 
-
-  43. Subset on data.table using `lapply` of the form `lapply(L, "[", Time == 3L)` works now without error due to `[.data.frame` redirection. Closes [#500](https://github.com/Rdatatable/data.table/issues/500). Thanks to Garrett See for reporting.
-
-  44. `id.vars` and `measure.vars` default value of `NULL` was removed to be consistent in behaviour with `reshape2:::melt.data.frame`. Closes [#780](https://github.com/Rdatatable/data.table/issues/780). Thanks to @dardesta for reporting.
-
-  45. Grouping using external variables on keyed data.tables did not return correct results at times. Thanks to @colinfang for reporting. Closes [#762](https://github.com/Rdatatable/data.table/issues/762).
-
-#### NOTES
-
-  1.  Reminder: using `rolltolast` still works but since v1.9.2 now issues the following warning:
-     > 'rolltolast' has been marked 'deprecated' in ?data.table since v1.8.8 on CRAN 3 Mar 2013, see NEWS. Please change to the more flexible 'rollends' instead. 'rolltolast' will be removed in the next version.
-
-  2.  Using `with=FALSE` with `:=` is now deprecated in all cases, given that wrapping the LHS of `:=` with parentheses has been preferred for some time.
-    ```R
-    colVar = "col1"
-    DT[, colVar:=1, with=FALSE]                   # deprecated, still works silently as before
-    DT[, (colVar):=1]                             # please change to this
-    DT[, c("col1","col2"):=1]                     # no change
-    DT[, 2:4 := 1]                                # no change
-    DT[, c("col1","col2"):=list(sum(a),mean(b))]  # no change
-    DT[, `:=`(...), by=...]                       # no change
-    ```
-    The next release will issue a warning when `with=FALSE` is used with `:=`.
-
-  3.  `?duplicated.data.table` explained that `by=NULL` or `by=FALSE` would use all columns, however `by=FALSE` resulted in an error. `by=FALSE` is removed from the help and `duplicated` now returns an error when `by=TRUE/FALSE`. Closes [#38](https://github.com/Rdatatable/data.table/issues/38).
-     
-  4.  More info about distinguishing small numbers from 0.0 in v1.9.2+ is [here](http://stackoverflow.com/questions/22290544/grouping-very-small-numbers-e-g-1e-28-and-0-0-in-data-table-v1-8-10-vs-v1-9-2).
-
-  5.  `?dcast.data.table` now explains how the names are generated for the columns that are being casted. Closes **#5676**.
-  
-  6.  `dcast.data.table(dt, a ~ ... + b)` now generates the column names with values from `b` coming last. Closes **#5675**.
-
-  7.  Added the `x[order(.)]` internal optimisation to `?setorder` (with aliases `?order` and `?forder`), along with how to go back to `base:::order(.)` if one wants to sort by session locale. Closes [#478](https://github.com/Rdatatable/data.table/issues/478) and also [#704](https://github.com/Rdatatable/data.table/issues/704). Thanks to Christian Wolf for the report.
-
-  8.  Added tests (1351.1 and 1351.2) to catch any future regressions on a particular case of binary search based subset reported [here](http://stackoverflow.com/q/24729001/559784) on SO. Thanks to Scott for the post. The regression was contained to v1.9.2 AFAICT. Closes [#734](https://github.com/Rdatatable/data.table/issues/734).
-
-  9.  Added an `.onUnload` method to unload `data.table`'s shared object properly. Since the name of the shared object is 'datatable.so' and not 'data.table.so', 'detach' fails to unload correctly. This was the reason for the issue reported [here](http://stackoverflow.com/questions/23498804/load-detach-re-load-anomaly) on SO. Closes [#474](https://github.com/Rdatatable/data.table/issues/474). Thanks to Matthew Plourde for reporting.
-
-  10.  Updated `BugReports` link in DESCRIPTION. Thanks to @chrsigg for reporting. Closes [#754](https://github.com/Rdatatable/data.table/issues/754).
-
-  11. Added `shiny`, `rmarkdown` and `knitr` to the data.table whitelist. Packages which take user code as input and run it in their own environment (so do not `Depend` or `Import` `data.table` themselves) either need to be added here, or they can define a variable `.datatable.aware <- TRUE` in their namespace, so that data.table can work correctly in those packages. Users can also add to `data.table`'s whitelist themselves using `assignInNamespace()` but these additions upstream remove t [...]
-
-  12. Clarified `with=FALSE` as suggested in [#513](https://github.com/Rdatatable/data.table/issues/513).
-
-  13. Clarified `.I` in `?data.table`. Closes [#510](https://github.com/Rdatatable/data.table/issues/510). Thanks to Gabor for reporting.
-
-  14. Moved `?copy` to its own help page, and documented that `dt_names <- copy(names(DT))` is necessary for `dt_names` not to be modified by reference as a result of updating `DT` by reference (e.g. adding a new column by reference). Closes [#512](https://github.com/Rdatatable/data.table/issues/512). Thanks to Zach for [this SO question](http://stackoverflow.com/q/15913417/559784) and user1971988 for [this SO question](http://stackoverflow.com/q/18662715/559784).
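A small sketch of the gotcha being documented (variable names are illustrative):

```r
library(data.table)

DT = data.table(a = 1)
nm_copy = copy(names(DT))  # an independent snapshot of the column names

DT[, b := 2]               # add a column by reference

nm_copy                    # still just "a", unaffected by the update
names(DT)                  # "a" "b"
```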
-  
-  15. `address(x)` doesn't increment `NAM()` value when `x` is a vector. Using the object as argument to a non-primitive function is sufficient to increment its reference. Closes #824. Thanks to @tarakc02 for the [question on twitter](https://twitter.com/tarakc02/status/513796515026837504) and hint from Hadley.
-  
----
-
-### Changes in v1.9.2 (on CRAN 27 Feb 2014)
-
-#### NEW FEATURES
-
-  1.  Fast methods of `reshape2`'s `melt` and `dcast` have been implemented for `data.table`, **FR #2627**. Most settings are identical to `reshape2`, see `?melt.data.table`.
-    > `melt`: 10 million rows and 5 columns, 61.3 seconds reduced to 1.2 seconds.  
-    > `dcast`: 1 million rows and 4 columns, 192 seconds reduced to 3.6 seconds.  
-     
-      * `melt.data.table` is also capable of melting on columns of type `list`.
-      * `melt.data.table` gains `variable.factor` and `value.factor` which by default are TRUE and FALSE respectively for compatibility with `reshape2`. This allows for directly controlling the output type of "variable" and "value" columns (as factors or not).
-      * `melt.data.table`'s `na.rm = TRUE` parameter is optimised to remove NAs directly during melt and therefore avoids the overhead of subsetting using `!is.na` afterwards on the molten data. 
-      * except for the `margins` argument from `reshape2:::dcast`, all features of dcast are intact. `dcast.data.table` can also accept `value.var` columns of type list.
-    
-    > Reminder of Cologne (Dec 2013) presentation **slide 32** : ["Why not submit a dcast pull request to reshape2?"](http://datatable.r-forge.r-project.org/CologneR_2013.pdf).
-  
-  2.  Joins scale better as the number of rows increases. The binary merge used to start on row 1 of i; it now starts on the middle row of i. Many thanks to Mike Crowe for the suggestion. This has been done within column so scales much better as the number of join columns increases, too.
-
-    > Reminder: bmerge allows the rolling join feature: forwards, backwards, limited and nearest.  
-
-  3.  Sorting (`setkey` and ad-hoc `by=`) is faster and scales better on randomly ordered data and now also adapts to almost sorted data. The remaining comparison sorts have been removed. We use a combination of counting sort and forwards radix (MSD) for all types including double, character and integers with range>100,000; forwards not backwards through columns. This was inspired by [Terdiman](http://codercorner.com/RadixSortRevisited.htm) and [Herf's](http://stereopsis.com/radix.html)  [...]
-
-  4.  `unique` and `duplicated` methods for `data.table` are significantly faster especially for type numeric (i.e. double), and type integer where range > 100,000 or contains negatives.
-     
-  5.  `NA`, `NaN`, `+Inf` and `-Inf` are now considered distinct values, may be in keys, can be joined to and can be grouped. `data.table` defines: `NA` < `NaN` < `-Inf`. Thanks to Martin Liberts for the suggestions, #4684, #4815 and #4883. 
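For example, with the documented ordering `NA` < `NaN` < `-Inf`:

```r
library(data.table)

DT = data.table(x = c(1, -Inf, NaN, NA))
setkey(DT, x)   # NA, NaN and infinities are valid key values
DT$x            # sorted as NA, NaN, -Inf, 1
```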
-  
-  6.  Numeric data is still joined and grouped within tolerance as before but instead of tolerance being `sqrt(.Machine$double.eps) == 1.490116e-08` (the same as `base::all.equal`'s default) the significand is now rounded to the last 2 bytes, apx 11 s.f. This is more appropriate for large (1.23e20) and small (1.23e-20) numerics and is faster via a simple bit twiddle. A few functions provided a 'tolerance' argument but this wasn't being passed through so has been removed. We aim to add a  [...]
-     
-  7.  New optimization: **GForce**. Rather than grouping the data, the group locations are passed into grouped versions of sum and mean (`gsum` and `gmean`) which then compute the result for all groups in a single sequential pass through the column for cache efficiency. Further, since the g* function is called just once, we don't need to find ways to speed up calling sum or mean repetitively for each group. Plan is to add `gmin`, `gmax`, `gsd`, `gprod`, `gwhich.min` and `gwhich.max`. Exa [...]
-    ```R
-    DT[,sum(x,na.rm=TRUE),by=...]                            # yes
-    DT[,list(sum(x,na.rm=TRUE),mean(y,na.rm=TRUE)),by=...]   # yes
-    DT[,lapply(.SD,sum,na.rm=TRUE),by=...]                   # yes
-    DT[,list(sum(x),min(y)),by=...]                          # no. gmin not yet available, only sum and mean so far.
-    ```
-    GForce is a level 2 optimization. To turn it off: `options(datatable.optimize=1)`. Reminder: to see the optimizations and other info, set `verbose=TRUE`
-
-  8.  fread's `integer64` argument implemented. Allows reading of `integer64` data as 'double' or 'character' instead of `bit64::integer64` (which remains the default as before). Thanks to Chris Neff for the suggestion. The default can be changed globally; e.g., `options(datatable.integer64="character")`.
-     
-  9.  fread's `drop`, `select` and `NULL` in `colClasses` are implemented. To drop or select columns by name or by number. See examples in `?fread`.
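A quick sketch of `select`/`drop` (the temp file is for illustration only):

```r
library(data.table)

tf = tempfile(fileext = ".csv")
writeLines("a,b,c\n1,2,3\n4,5,6", tf)

sel = fread(tf, select = c("a", "c"))  # keep only columns a and c, by name
drp = fread(tf, drop = 2L)             # equivalent: drop column b by position
unlink(tf)
sel
```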
-     
-  10.  fread now detects `T`, `F`, `True`, `False`, `TRUE` and `FALSE` as type logical, consistent with `read.csv`, #4766. Thanks to Adam November for highlighting.
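For instance (fread treats a literal string containing `\n` as the input itself):

```r
library(data.table)

DT = fread("a,b\nT,True\nF,False\n")
sapply(DT, class)   # both columns detected as logical
```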
-  
-  11.  fread now accepts quotes (both `'` and `"`) in the middle of fields, whether the field starts with `"` or not, rather than the 'unbalanced quotes' error, #2694. Thanks to baidao for reporting. It was known and documented at the top of `?fread` (now removed). If a field starts with `"` it must end with `"` (necessary to include the field separator itself in the field contents). Embedded quotes can be in column names, too. Newlines (`\n`) still can't be in quoted fields or quoted co [...]
-     
-  12.  fread gains `showProgress`, default TRUE. The global option is `datatable.showProgress`.
-     
-  13.  `fread("1.46761e-313\n")` detected the **ERANGE** error, so read as `character`. It now reads as numeric but with a detailed warning. Thanks to Heather Turner for the detailed report, #4879.
-     
-  14.  fread now understands system commands; e.g., `fread("grep blah file.txt")`.
-
-  15.  `as.data.table` method for `table()` implemented, #4848. Thanks to Frank Pinter for suggesting [here on SO](http://stackoverflow.com/questions/18390947/data-table-of-table-is-very-different-from-data-frame-of-table).
-  
-  16.  `as.data.table` methods added for integer, numeric, character, logical, factor, ordered and Date.
-	 
-  17.  `DT[i,:=,]` now accepts negative indices in `i`. Thanks to Eduard Antonyan. See also bug fix #2697.
-
-  18.  `set()` is now able to add new columns by reference, #2077.
-    ```R
-    DT[3:5, newCol := 5L]
-    set(DT, i=3:5, j="newCol", 5L)   # same
-    ```
-
-  19.  `eval` will now be evaluated anywhere in a `j` expression as long as it has just one argument, #4677. Complex cases will still need to use `.SD` as the environment. Also fixes the bug [here on SO](http://stackoverflow.com/a/19054962/817778).
-
-  20.  `!` at the head of the expression will no longer trigger a not-join if the expression is logical, #4650. Thanks to Arunkumar Srinivasan for reporting.
-
-  21.  `rbindlist` now chooses the highest type per column, not the first, #2456. Up-conversion follows R defaults, with the addition of factors being the highest type. Also fixes #4981 for the specific case of `NA`'s.
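A minimal illustration of taking the highest type per column:

```r
library(data.table)

DT1 = data.table(x = 1:2)          # integer
DT2 = data.table(x = c(2.5, 3.5))  # double
res = rbindlist(list(DT1, DT2))
class(res$x)                       # promoted to double, not integer
```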
-
-  22.  `cbind(x,y,z,...)` now creates a data.table if `x` isn't a `data.table` but `y` or `z` is, unless `x` is a `data.frame` in which case a `data.frame` is returned (use `data.table(DF,DT)` instead for that). 
-  
-  23.  `cbind(x,y,z,...)` and `data.table(x,y,z,...)` now retain keys of any `data.table` inputs directly (no sort needed, for speed). The result's key is `c(key(x), key(y), key(z), ...)`, provided that the data.table inputs that have keys are not recycled and there are no ambiguities (i.e. duplicates) in column names.
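A sketch of the key retention, assuming no recycling and no duplicated column names:

```r
library(data.table)

X = data.table(a = 1:2, key = "a")
Y = data.table(b = 3:4, key = "b")
Z = data.table(X, Y)
key(Z)   # c("a", "b"), carried over without re-sorting
```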
-
-  24.  `rbind/rbindlist` will preserve ordered factors if it's possible to do so; i.e., if a compatible global order exists, #4856 & #5019. Otherwise the result will be a `factor` and a *warning*.
-
-  25.  `rbind` now has a `fill` argument, #4790. When `fill=TRUE` it will behave in a manner similar to plyr's `rbind.fill`. This option is incompatible with `use.names=FALSE`. Thanks to Arunkumar Srinivasan for the base code.
-
-  26.  `rbind` now relies exclusively on `rbindlist` to bind `data.tables` together. This makes rbind'ing factors faster, #2115.
-
-  27.  `DT[, as.factor('x'), with=FALSE]` where `x` is a column in `DT` is now equivalent to `DT[, "x", with=FALSE]` instead of ending up with an error, #4867. Thanks to tresbot for reporting [here on SO](http://stackoverflow.com/questions/18525976/converting-multiple-data-table-columns-to-factors-in-r).
-
-  28.  `format.data.table` now understands 'formula' and displays embedded formulas as expected, FR #2591.
-
-  29.  `{}` around `:=` in `j` now obtains the desired result, but with a warning, #2496. Now,
-    ```R
-    DT[, { `:=`(...)}]             # now works
-    DT[, {`:=`(...)}, by=(...)]    # now works
-    ```
-    Thanks to Alex for reporting [here on SO](http://stackoverflow.com/questions/14541959/expression-syntax-for-data-table-in-r).
-
-  30.  `x[J(2), a]`, where `a` is the key column sees `a` in `j`, #2693 and FAQ 2.8. Also, `x[J(2)]` automatically names the columns from `i` using the key columns of `x`. In cases where the key columns of `x` and `i` are identical, i's columns can be referred to by using `i.name`; e.g., `x[J(2), i.a]`. Thanks to mnel and Gabor for the discussion [here](http://r.789695.n4.nabble.com/Problem-with-FAQ-2-8-tt4668878.html).
-
-  31.  `print.data.table` gains `row.names`, default=TRUE. When FALSE, the row names (along with the :) are not printed, #5020. Thanks to Frank Erickson.
-
-  32.  `.SDcols` is now also able to de-select columns. This works both with column names and column numbers.
-    ```R
-    DT[, lapply(.SD,...), by=..., .SDcols=-c(1,3)]       # .SD all but columns 1 and 3
-    DT[, lapply(.SD,...), by=..., .SDcols=-c("x", "z")]  # .SD all but columns 'x' and 'z'
-    DT[..., .SDcols=c(1, -3)]           # can't mix signs, error
-    DT[, .SD, .SDcols=c("x", -"z")]     # can't mix signs, error
-    ```
-    Thanks to Tonny Peterson for filing FR #4979.
-
-  33.  `as.data.table.list` now issues a warning for those items/columns that result in a remainder due to recycling, #4813. `data.table()` also now issues a warning (instead of an error previously) when recycling leaves a remainder; e.g., `data.table(x=1:2, y=1:3)`.
-
-  34.  `:=` now coerces without warning when precision is not lost and `length(RHS) == 1`, #2551.
-    ```R
-    DT = data.table(x=1:2, y=c(TRUE, FALSE))
-    DT[1, x:=1]   # ok, now silent
-    DT[1, y:=0]   # ok, now silent
-    DT[1, y:=0L]  # ok, now silent
-    ```
-
-  35.  `as.data.table.*(x, keep.rownames=TRUE)`, where `x` is a named vector now adds names of `x` into a new column with default name `rn`. Thanks to Garrett See for FR #2356.
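For a named vector, for example:

```r
library(data.table)

v = c(apple = 1L, banana = 2L)
DT = as.data.table(v, keep.rownames = TRUE)
DT   # first column 'rn' holds the names of v
```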
-
-  36.  `X[Y, col:=value]` when no match exists in the join is now caught early and X is simply returned. Also a message when `datatable.verbose` is TRUE is provided. In addition, if `col` is an existing column, since no update actually takes place, the key is now retained. Thanks to Frank Erickson for suggesting, #4996.
-
-  37.  New function `setDT()` takes a `list` (named and/or unnamed) or `data.frame` and changes its type by reference to `data.table`, *without any copy*. It also has a logical argument `giveNames` which is used for list input. See `?setDT` examples for more. Based on [this FR on SO](http://stackoverflow.com/questions/20345022/convert-a-data-frame-to-a-data-table-without-copy/20346697#20346697).
-     
-  38.  `setnames(DT,"oldname","newname")` no longer complains about any duplicated column names in `DT` so long as oldname is unique and unambiguous. Thanks to Wet Feet for highlighting [here on SO](http://stackoverflow.com/questions/20942905/ignore-safety-check-when-using-setnames).
-
-  39.  `last(x)` where `length(x)=0` now returns 'x' instead of an error, #5152. Thanks to Garrett See for reporting.
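So both of these now work without error:

```r
library(data.table)

last(1:3)        # 3L, the last element
last(integer())  # zero-length input is returned as-is
```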
-
-  40.  `as.ITime.character` no longer complains when given vector input, and will accept mixed format time entries; e.g., `c("12:00", "13:12:25")`.
-     
-  41.  Key is now retained in `NA` subsets; e.g.,
-    ```R
-    DT = data.table(a=1:3,b=4:6,key="a")
-    DT[NA]   # 1-row of NA now keyed by 'a'
-    DT[5]    # 1-row of NA now keyed by 'a'
-    DT[2:4]  # not keyed as before because NA (last row of result) sorts first in keyed data.table
-    ```
-
-  42.  Each column in the result for each group has always been recycled (if necessary) to match the longest column in that group's result. If it didn't recycle exactly, though, it was caught gracefully as an error. Now it is recycled, with remainder, with a warning.
-    ```R  
-    DT = data.table(a=1:2,b=1:6)
-    DT[, list(b,1:2), by=a]        # now recycles the 1:2 with warning to length 3
-    ```
-
-#### BUG FIXES
-
-  1.  Long outstanding (usually small) memory leak in grouping fixed, #2648. When the last group is smaller than the largest group, the difference in those sizes was not being released. Also evident in non-trivial aggregations where each group returns a different number of rows. Most users run a grouping
-     query once and will never have noticed these, but anyone looping calls to grouping (such as when running in parallel, or benchmarking) may have suffered. Tests added. Thanks to many including vc273 and Y T for reporting [here](http://stackoverflow.com/questions/20349159/memory-leak-in-data-table-grouped-assignment-by-reference) and [here](http://stackoverflow.com/questions/15651515/slow-memory-leak-in-data-table-when-returning-named-lists-in-j-trying-to-reshap) on SO.
-
-  2.  In long running computations where data.table is called many times repetitively the following error could sometimes occur, #2647: *"Internal error: .internal.selfref prot is not itself an extptr"*. Now fixed. Thanks to theEricStone, StevieP and JasonB for (difficult) reproducible examples [here](http://stackoverflow.com/questions/15342227/getting-a-random-internal-selfref-error-in-data-table-for-r).
-       
-  3.  If `fread` returns a data error (such as no closing quote on a quoted field) it now closes the file first rather than holding a lock open, a Windows-only problem.
-     Thanks to nigmastar for reporting [here](http://stackoverflow.com/questions/18597123/fread-data-table-locks-files) and Carl Witthoft for the hint. Tests added.
-       
-  4.  `DT[0,col:=value]` is now a helpful error rather than crash, #2754. Thanks to Ricardo Saporta for reporting. `DT[NA,col:=value]`'s error message has also been improved. Tests added.
-     
-  5.  Assigning to the same column twice in the same query is now an error rather than a crash in some circumstances; e.g., `DT[,c("B","B"):=NULL]` (delete by reference the same column twice). Thanks to Ricardo (#2751) and matt_k (#2791) for reporting [here](http://stackoverflow.com/questions/16638484/remove-multiple-columns-from-data-table). Tests added.
-  
-  6.  Crash and/or incorrect aggregate results with negative indexing in `i` is fixed, with a warning when the `abs(negative index) > nrow(DT)`, #2697. Thanks to Eduard Antonyan (eddi) for reporting [here](http://stackoverflow.com/questions/16046696/data-table-bug-causing-a-segfault-in-r). Tests added.
-  
-  7.  `head()` and `tail()` handle negative `n` values correctly now, #2375. Thanks to Garrett See for reporting. Also it results in an error when `length(n) != 1`. Tests added.
-     
-  8.  Crash when assigning empty data.table to multiple columns is fixed, #4731. Thanks to Andrew Tinka for reporting. Tests added.
-     
-  9.  `print(DT, digits=2)` now heeds digits and other parameters, #2535. Thanks to Heather Turner for reporting. Tests added.
-     
-  10.  `print(data.table(table(1:101)))` is now an 'invalid column' error and suggests `print(as.data.table(table(1:101)))` instead, #4847. Thanks to Frank Pinter for reporting. Test added.
-
-  11.  Crash when grouping by character column where `i` is `integer(0)` is now fixed. It now returns an appropriate empty data.table. This fixes bug #2440. Thanks to Malcolm Cook for reporting. Tests added.
-
-  12.  Grouping when `i` contains the value `0` and `length(i) > 1` resulted in a crash; it now returns a friendly error instead. This fixes bug #2758. Thanks to Garrett See for reporting. Tests added.
-
-  13.  `:=` failed when subsetting yielded NA and `with=FALSE`, #2445. Thanks to Damian Betebenner for reporting.
-
-  14.  `by=month(date)` gave incorrect results if `key(DT)=="date"`, #2670. Tests added. 
-    ```R
-    DT[,,by=month(date)]         # now ok if key(DT)=="date"
-    DT[,,by=list(month(date))]   # ok before whether or not key(DT)=="date"
-    ```
-         
-  15.  `rbind` and `rbindlist` could crash if input columns themselves had hidden names, #4890 & #4912. Thanks to Chris Neff and Stefan Fritsch for reporting. Tests added.
-     
-  16.  `data.table()`, `as.data.table()` and other paths to create a data.table now detect and drop hidden names, the root cause of #4890. It was never intended that columns could have hidden names attached.
-
-  17.  Cartesian join (`allow.cartesian = TRUE`) set the resulting key incorrectly when both `x` and `i` were keyed and `length(key(x)) > length(key(i))`. This is now fixed, #2677. Tests added. Thanks to Shir Levkowitz for reporting.
-
-  18.  `:=` (assignment by reference) losing the POSIXct or ITime attribute *while grouping* is now fixed, #2531. Tests added. Thanks to stat quant for reporting [here](http://stackoverflow.com/questions/14604820/why-does-this-posixct-or-itime-loses-its-format-attribute) and to Paul Murray for reporting [here](http://stackoverflow.com/questions/15996692/cannot-assign-columns-as-date-by-reference-in-data-table) on SO.
-
-  19.  `chmatch()` didn't always match non-ascii characters, #2538 and #4818. chmatch is used internally so `DT[is.na(päs), päs := 99L]` now works. Thanks to Benjamin Barnes and Stefan Fritsch for reporting. Tests added.
-
-  20.  `unname(DT)` threw an error when `20 < nrow(DT) <= 100`, bug #4934. This is now fixed. Tests added. Thanks to Ricardo Saporta.
-
-  21.  A special case of not-join and logical TRUE, `DT[!TRUE]`, gave an error whereas it should be identical to `DT[FALSE]`. Now fixed and tests added. Thanks once again to Ricardo Saporta for filing #4930.
-     
-  22.  `X[Y,roll=-Inf,rollends=FALSE]` didn't roll the middle correctly if `Y` was keyed. It was ok if `Y` was unkeyed or rollends left as the default [c(TRUE,FALSE) when roll < 0]. Thanks to user338714 for reporting [here](http://stackoverflow.com/questions/18984179/roll-data-table-with-rollends). Tests added.
-
-  23.  Key is now retained after an order-preserving subset, #295.
-
-  24.  Fixed bug #2584. Columns named after functions, in particular "list", no longer pose problems in `.SD`. Thanks to Zachary Mayer for reporting.
-
-  25.  Fixed bug #4927. Unusual column names in normal quotes, e.g. `by=".Col"`, now work as expected in `by`. Thanks to Ricardo Saporta for reporting.
-
-  26.  `setkey` resulted in error when column names contained ",". This is now fixed. Thanks to Corone for reporting [here](http://stackoverflow.com/a/19166273/817778) on SO.
-
-  27.  `rbind` when at least one argument was a data.table, but not the first, returned the rbind'd data.table with key. This is now fixed, #4995. Thanks to Frank Erickson for reporting.
- 
-  28.  `.SD` not retaining a column's class is now fixed (#2530). Thanks to Corone for reporting [here](http://stackoverflow.com/questions/14753411/why-does-data-table-lose-class-definition-in-sd-after-group-by).
-
-  29.  `eval(quote())` returned an error when the quoted expression was a not-join, #4994. This is now fixed. Tests added.
-
-  30.  `DT[, lapply(.SD, function(x) ...), by=...]` did not see columns of `DT` when optimisation was "on". This is now fixed, #2381. Tests added. Thanks to David F for reporting [here](http://stackoverflow.com/questions/13441868/data-table-and-stratified-means) on SO.
-
-  31.  #4959 - rbind'ing empty data.tables now works
-
-  32.  #5005 - some function expressions were not being correctly evaluated in the `j`-expression. Thanks to Tonny Petersen for reporting.
-
-  33.  Fixed bug #5007: `j` did not see variables declared within a local (function) environment properly. Now, `DT[, lapply(.SD, function(x) fun_const), by=x]`, where `fun_const` is a local variable within a function, works as expected. Thanks to Ricardo Saporta for catching this and providing a very nice reproducible example.
-
-  34.  Fixing #5007 also fixes #4957, where `.N` was not visible during `lapply(.SD, function(x) ...)` in `j`. Thanks to juba for noticing it [here](http://stackoverflow.com/questions/19094771/replace-values-in-each-column-based-on-conditions-according-to-groups-by-rows) on SO.
-  
-  35.  Fixed another case where function expressions were not constructed properly in `j`, while fixing #5007. `DT[, lapply(.SD, function(x) my_const), by=x]` now works as expected instead of ending up in an error.
-
-  36.  Fixed #4990, where `:=` did not generate a recycling warning during `by` when `length(RHS)` was less than the group size but not an integer multiple of it. Now, 
-    ```R
-    DT <- data.table(a=rep(1:2, c(5,2)))
-    DT[, b := c(1:2), by=a]
-    ``` 
-       will generate a warning (here for the first group, since the RHS length (2) is not an integer multiple of the group size (5)).
-
-  37.  Fixed #5069 where `gdata:::write.fwf` returned an error with data.table.
-
-  38.  Fixed #5098 where constructing a `j`-expression with a no-argument function returned the function definition instead of the result of executing the function.
-
-  39.  Fixed #5106 where `DT[, .N, by=y]` failed when `y` is a vector with `length(y) == nrow(DT)` but `y` is not a column in `DT`. Thanks to colinfang for reporting.
-
-  40.  Fixed #5104 which popped out as a side-effect of fixing #2531. `:=` while grouping and assigning columns that are factors resulted in wrong results (and the column not being added). This is now fixed. Thanks to Jonathan Owen for reporting.
-
-  41.  Fixed bug #5114 where modifying columns in particular cases resulted in ".SD is locked" error. Thanks to GSee for the bug report.
-
-  42.  Implementing FR #4979 led to a bug when grouping with `.SDcols`, where the `.SDcols` argument was a variable name. This bug, #5190, is now fixed.
-
-  43.  Fixed #5171, where setting the attribute name to a non-character type resulted in a segfault, e.g. `setattr(x, FALSE, FALSE); x`. This now ends with a friendly error.
-     
-  44.  Dependent packages using `cbind` may now *Import* data.table as intended rather than needing to *Depend*. There was a missing `data.table::` prefix on a call to `key()`. Thanks to Maarten-Jan Kallen for reporting.
-     
-  45.  `chmatch` didn't handle character encodings properly when the string was identical but the encodings were different, e.g. `UTF8` and `Latin1`. This is now fixed (part of bug #5159). Thanks to Stefan Fritsch for reporting.
-
-  46.  Joins (`X[Y]`) on character columns with different encodings now issue a warning that the join may produce unexpected results for indices with different encodings. That is, when "ä" in X's key column and "ä" in Y's key column are of different encodings, a warning is issued. This takes care of bug #5266 and another part of #5159 for the moment. Thanks to Stefan Fritsch once again for reporting.
-
-  47.  Fixed #5117 - segfault when `rbindlist` on empty data.tables. Thanks to Garrett See for reporting.
-
-  48.  Fixed a rare segfault that occurred on >250m rows (integer overflow during memory allocation); closes #5305. Thanks to Guenter J. Hitsch for reporting.
-
-  49.  `rbindlist` with at least one factor column along with the presence of at least one empty data.table resulted in a segfault (or, on linux/mac, an error related to hash tables). This is now fixed, #5355. Thanks to Trevor Alexander for [reporting on SO](http://stackoverflow.com/questions/21591433/merging-really-not-that-large-data-tables-immediately-results-in-r-being-killed) and mnel for filing the bug report.
-     
-  50.  `CJ()` now orders character vectors in a locale consistent with `setkey`, #5375. Typically this affected whether upper case letters were ordered before lower case letters; they were by `setkey()` but not by `CJ()`. This difference started in v1.8.10 with the change "CJ() is 90% faster...", see NEWS below. Test added and avenues for differences closed off and nailed down, with no loss in performance. Many thanks to Malcolm Hawkes for reporting.
-
-#### THANKS FOR BETA TESTING TO:
-
-  1.  Zach Mayer for a reproducible segfault related to radix sorting character strings longer than 20. Test added.
-     
-  2.  Simon Biggs for reporting a bug in fread'ing logicals. Test added.
-  
-  3.  Jakub Szewczyk for reporting that using "." in the formula interface of `dcast.data.table` along with an aggregate function did not produce an aggregated result, #5149. Test added.
-    ```R
-    dcast.data.table(x, a ~ ., mean, value.var="b")
-    ```
-     
-  4.  Jonathan Owen for reporting that `DT[,sum(.SD),by=]` failed with GForce optimization, #5380. A test was added, plus an error message redirecting users to `DT[,lapply(.SD,sum),by=]` or `base::sum`, and explaining how to turn GForce off.
-     
-  5.  Luke Tierney for guidance in finding a corruption of `R_TrueValue` which needed `--enable-strict-barrier`, `gctorture2` and a hardware watchpoint to ferret out. Started after a change in R-devel on 11 Feb 2014, r64973.
-     
-  6.  Minkoo Seo for a new test on rbindlist, #4648.
-  
-  7.  Gsee for reporting that `set()` and `:=` could no longer add columns by reference to an object that inherits from data.table; e.g., `class = c("myclass", "data.table", "data.frame")`, #5115.
-  
-  8.  Clayton Stanley for reporting #5307 [here on SO](http://stackoverflow.com/questions/21437546/data-table-1-8-11-and-aggregation-issues). Aggregating logical types could give wrong results.
-
-  9.  New and very welcome ASAN and UBSAN checks on CRAN detected :
-      * integer64 overflow in test 899 reading integers longer than approximately 18 digits  
-        ```R
-        fread("Col1\n12345678901234567890")   # works as before, bumped to character
-        ```
-      * a memory fault in rbindlist when binding ordered factors, and some items in the list of data.table/list are empty or NULL. We had anticipated and added tests for both cases, which is why ASAN and UBSAN were able to detect the problems for us.
-  
-  10.  Karl Millar for reporting a similar fault that ASAN detected, #5042. Also fixed.
-     
-  11.  Ricardo Saporta for finding a crash when i is empty and a join column is character, #5387. Test added.
-
-#### NOTES
-
-  1.  If `fread` detects data which would be lost if a column were read according to the type supplied in `colClasses` (e.g. a numeric column specified as integer in `colClasses`), the message that it has ignored `colClasses` is upgraded to a warning instead of just a line in `verbose=TRUE` mode.
-  
-  2.  `?last` has been improved and if `xts` is needed but not installed the error message is more helpful, #2728. Thanks to Sam Steingold for reporting.
-     
-  3.  `?between` corrected. It returns a logical, not an integer, vector, #2671. Thanks to Michael Nelson for reporting.
-  
-  4.  `.SD`, `.N`, `.I`, `.GRP` and `.BY` are now exported (as NULL) so that NOTEs aren't produced for them by `R CMD check` or `codetools::checkUsage` via compiler. `utils::globalVariables()` was considered, but exporting was chosen. Thanks to Sam Steingold for raising, #2723.
-     
-  5.  When `DT` is empty, `DT[,col:=""]` is no longer a warning. The warning was:
-      > "Supplied 1 items to be assigned to 0 items of column (1 unused)"  
-
-  6.  Using `rolltolast` still works but now issues the following warning :  
-      > "'rolltolast' has been marked 'deprecated' in ?data.table since v1.8.8 on CRAN 3 Mar 2013, see NEWS. Please
-        change to the more flexible 'rollends' instead. 'rolltolast' will be removed in the next version."
-
-  7.  There are now 1,220 raw tests, as reported by `test.data.table()`.
-
-  8.  `data.table`'s dependency has been moved forward from R 2.12.0 to R 2.14.0, now over 2 years old (Oct 2011). As usual before release to CRAN, we ensure data.table passes `R CMD check` on the stated dependency and keep this as old as possible for as long as possible, as requested by users in managed environments. For this reason we still don't use `paste0()` internally, since that was added in R 2.15.0. 
-  
----
-
-### Changes in v1.8.10 (on CRAN 03 Sep 2013)
-
-#### NEW FEATURES
-
-  *  fread :
-     * If some column names are blank they are now given default names rather than causing
-       the header row to be read as a data row. Thanks to Simon Judes for suggesting.
-
-     * "+" and "-" are now read as character rather than integer 0. Thanks to Alvaro Gonzalez and
-       Roby Joehanes for reporting, #4814.
-       http://stackoverflow.com/questions/15388714/reading-strand-column-with-fread-data-table-package
-
-     * % progress console meter has been removed. The output was inconvenient in batch mode, log files and
-       reports which don't handle \r. It was too difficult to detect where fread is being called from, plus,
-       removing it speeds up fread a little by saving code inside the C for loop (which is why it wasn't
-       made optional instead). Use your operating system's system monitor to confirm fread is progressing.
-       Thanks to Baptiste for highlighting :
-       http://stackoverflow.com/questions/15370993/strange-output-from-fread-when-called-from-knitr
-
-     * colClasses has been added. Same character vector format as read.csv (may be named or unnamed), but
-       additionally may be type list. Type list enables setting ranges of columns by numeric position.
-       NOTE: colClasses is intended for rare overrides, not routine use.
-
-     * fread now supports files larger than 4GB on 64bit Windows (#2767 thanks to Paul Harding) and files
-       between 2GB and 4GB on 32bit Windows (#2655 thanks to Vishal). A C call to GetFileSize() needed to
-       be GetFileSizeEx().
-
-     * When input is the data as a character string, it is no longer truncated to your system's maximum
-       path length, #2649. It was being passed through path.expand() even when it wasn't a filename.
-       Many thanks to Timothee Carayol for the reproducible report. The limit should now be R's character
-       string length limit (2^31-1 bytes = 2GB). Test added.
-
-     * New argument 'skip' overrides automatic banner skipping. When skip>=0, 'autostart' is ignored and
-       line skip+1 will be taken as the first data row, or column names according to header="auto"|TRUE|FALSE
-       as usual. Or, skip="string" uses the first line containing "string" (chosen to be a substring of the
-       column name row unlikely to appear earlier), inspired by read.xls in package gdata.
-       Thanks to Gabor Grothendieck for these suggestions.
-
-     * fread now stops reading if an empty line is encountered, with warning if any text exists after that
-       such as a footer (the first line of which will be included in the warning message).
-
-     * Now reads files that are open in Excel without having to close them first, #2661. And up to 5 attempts
-       are made every 250ms on Windows as recommended here : http://support.microsoft.com/kb/316609.
-
-     * "nan%" observed in output of fread(...,verbose=TRUE) timings are now 0% when fread takes 0.000 seconds.
-
-     * An unintended 50,000 column limit in fread has been removed. Thanks to mpmorley for reporting. Test added.
-       http://stackoverflow.com/questions/18449997/fread-protection-stack-overflow-error
-
-  *  unique() and duplicated() methods gain 'by' to allow testing for uniqueness using any subset of columns,
-     not just the keyed columns (if keyed) or all columns (if not). By default by=key(dt) for backwards
-     compatibility. ?duplicated has been revised and tests added.
-     Thanks to Arunkumar Srinivasan, Ricardo Saporta, and Frank Erickson for useful discussions.
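-     For example (a sketch; the table and column names are illustrative) :
-       DT = data.table(A=c(1,1,2), B=c(4,4,5), C=7:9)
-       unique(DT, by=c("A","B"))    # keeps rows 1 and 3; C ignored for uniqueness
-       duplicated(DT, by="A")       # FALSE TRUE FALSE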
-
-  *  CJ() is 90% faster on 1e6 rows (for example), #4849. The inputs are now sorted first before combining
-     rather than after combining and uses rep.int instead of rep (thanks to Sean Garborg for the ideas,
-     code and benchmark) and only sorted if is.unsorted(), #2321.
-     Reminder: CJ = Cross Join; i.e., joins to all combinations of its inputs. 
-     
-  *  CJ() gains 'sorted' argument, by default TRUE for backwards compatibility. FALSE retains input order and is faster
-     to create the result of CJ() but then slower to join from since unkeyed.
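-     For example (illustrative) :
-       CJ(c("b","a"), 1:2)                  # sorted (the default); result is keyed
-       CJ(c("b","a"), 1:2, sorted=FALSE)    # retains input order; unkeyed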
-     
-  *  New function address() returns the address in RAM of its argument. Sometimes useful in determining whether a value
-     has been copied or not by R, programmatically.
-       http://stackoverflow.com/a/10913296/403310
-
-#### BUG FIXES
-
-  *  merge no longer returns spurious NA row(s) when y is empty and all.y=TRUE (or all=TRUE), #2633. Thanks
-     to Vinicius Almendra for reporting. Test added.
-       http://stackoverflow.com/questions/15566250/merge-data-table-with-all-true-introduces-na-row-is-this-correct
-
-  *  rbind'ing data.tables containing duplicate, "" or NA column names now works, #2726 & #2384.
-     Thanks to Garrett See and Arun Srinivasan for reporting. This also affected the printing of data.tables
-     with duplicate column names since the head and tail are rbind-ed together internally.
-
-  *  rbind, cbind and merge on data.table should now work in packages that Import but do not
-     Depend on data.table. Many thanks to a patch to .onLoad from Ken Williams, and related
-     posts from Victor Kryukov :
-	   http://r.789695.n4.nabble.com/Import-problem-with-data-table-in-packages-tp4665958.html
-
-  *  Mixing adding and updating into one DT[, `:=`(existingCol=...,newCol=...), by=...] now works
-     without error or segfault, #2778 and #2528. Many thanks to Arunkumar Srinivasan for reporting
-     and for the nice reproducible examples. Tests added.
-
-  *  rbindlist() now binds factor columns correctly, #2650. Thanks to many for reporting. Tests added.
-
-  *  Deleting a (0-length) factor column using :=NULL on an empty data.table now works, #4809. Thanks
-     to Frank Pinter for reporting. Test added.
-       http://stackoverflow.com/questions/18089587/error-deleting-factor-column-in-empty-data-table
-
-  *  Writing FUN= in DT[,lapply(.SD,FUN=...),] now works, #4893. Thanks to Jan Wijffels for
-     reporting and Arun for suggesting and testing a fix. Committed and test added.
-       http://stackoverflow.com/questions/18314757/why-cant-i-used-fun-in-lapply-when-grouping-by-using-data-table
-	
-  *  The slowness of transform() on data.table has been fixed, #2599. But, please use :=.
-  
-  *  setkey(DT,`Colname with spaces`) now works, #2452.
-     setkey(DT,"Colname with spaces") worked already.
-     
-  *  mean() in j has been optimized since v1.8.2 (see NEWS below) but wasn't respecting na.rm=TRUE (the default).
-     Many thanks to Colin Fang for reporting. Test added.
-       http://stackoverflow.com/questions/18571774/data-table-auto-remove-na-in-by-for-mean-function
-     
-
-#### USER VISIBLE CHANGES
-
-  *  := on a null data.table now gives a clearer error message :
-       "Cannot use := to add columns to a null data.table (no columns), currently. You can use :=
-        to add (empty) columns to an empty data.table (1 or more columns, all 0 length), though."
-     rather than the untrue :
-       "Cannot use := to add columns to an empty data.table, currently"
-
-  *  Misuse of := and `:=`() is now caught in more circumstances and gives a clearer and shorter error message :
-       ":= and `:=`(...) are defined for use in j, once only and in particular ways. See
-        help(":="). Check is.data.table(DT) is TRUE."
-
-  *  data.table(NULL) now prints "Null data.table (0 rows and 0 cols)" and FAQ 2.5 has been
-     improved. Thanks to:
-       http://stackoverflow.com/questions/15317536/is-null-does-not-work-on-null-data-table-in-r-possible-bug
-
-  *  The braces {} have been removed from rollends's default, to solve a trace() problem. Thanks
-     to Josh O'Brien's investigation :
-       http://stackoverflow.com/questions/15931801/why-does-trace-edit-true-not-work-when-data-table
-
-#### NOTES
-
-  *  Tests 617,646 and 647 could sometimes fail (e.g. r-prerel-solaris-sparc on 7 Mar 2013)
-     due to machine tolerance. Fixed.
-
-  *  The default for datatable.alloccol has changed from max(100L, 2L*ncol(DT)) to max(100L, ncol(DT)+64L).
-     And a pointer to ?truelength has been added to an error message as suggested and thanks to Roland :
-       http://stackoverflow.com/questions/15436356/potential-problems-from-over-allocating-truelength-more-than-1000-times
-       
-  *  For packages wishing to use data.table optionally (e.g. according to user of that package) and therefore
-     not wishing to Depend on data.table (which is the normal determination of data.table-awareness via .Depends),
-     `.datatable.aware` may be set to TRUE in such packages which cedta() will look for, as before. But now it doesn't
-     need to be exported. Thanks to Hadley Wickham for the suggestion and solution.
-
-  *  There are now 1,009 raw tests, as reported by test.data.table().
-  
-  *  Welcome to Arunkumar Srinivasan and Ricardo Saporta who have joined the project and contributed directly
-     by way of commits above.
-     
-  *  v1.8.9 was on R-Forge only. v1.8.10 was released to CRAN.
-     Odd numbers are development, evens on CRAN.
-
-
-### Changes in v1.8.8 (on CRAN 06 Mar 2013)
-
-#### NEW FEATURES
-
-    *   New function fread(), a fast and friendly file reader.
-        *  header, skip, nrows, sep and colClasses are all auto detected.
-        *  integers>2^31 are detected and read natively as bit64::integer64.
-        *  accepts filenames, URLs and "A,B\n1,2\n3,4" directly
-        *  new implementation entirely in C
-        *  with a 50MB .csv, 1 million rows x 6 columns :
-             read.csv("test.csv")                                        # 30-60 sec (varies)
-             read.table("test.csv",<all known tricks and known nrows>)   #    10 sec
-             fread("test.csv")                                           #     3 sec
-        * airline data: 658MB csv (7 million rows x 29 columns)
-             read.table("2008.csv",<all known tricks and known nrows>)   #   360 sec
-             fread("2008.csv")                                           #    40 sec
-        See ?fread. Many thanks to Chris Neff, Garrett See, Hideyoshi Maeda, Patrick
-        Nic, Akhil Behl and Aykut Firat for ideas, discussions and beta testing.
-        ** The fread function is still under development; e.g., dates are read as
-        ** character and embedded quotes ("\"" and """") cause problems.
-
-    *   New argument 'allow.cartesian' (default FALSE) added to X[Y] and merge(X,Y), #2464.
-        Prevents large allocations due to misspecified joins; e.g., duplicate key values in Y
-        joining to the same group in X over and over again. The word 'cartesian' is used loosely
-        for when more than max(nrow(X),nrow(Y)) rows would be returned. The error message is
-        verbose and includes advice. Thanks to a question by Nick Clark, help from user1935457
-        and a detailed reproducible crash report from JR.
-          http://stackoverflow.com/questions/14231737/greatest-n-per-group-reference-with-intervals-in-r-or-sql
-        If the new option affects existing code you can set :
-            options(datatable.allow.cartesian=TRUE)
-        to restore the previous behaviour until you have time to address.
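-        A minimal sketch of the new check (X and Y are illustrative) :
-            X = data.table(k=c(1L,1L), v=1:2, key="k")
-            Y = data.table(k=c(1L,1L))
-            X[Y]                          # error: more than max(nrow(X),nrow(Y)) rows
-            X[Y, allow.cartesian=TRUE]    # 4 rows, explicitly allowed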
-
-    *   In addition to TRUE/FALSE, 'roll' may now be a positive number (roll forwards/LOCF) or
-        negative number (roll backwards/NOCB). A finite number limits the distance a value is
-        rolled (limited staleness). roll=TRUE and roll=+Inf are equivalent.
-        'rollends' is a new parameter holding two logicals. The first observation is rolled
-        backwards if rollends[1] is TRUE. The last observation is rolled forwards if rollends[2]
-        is TRUE. If roll is a finite number, the same limit applies to the ends.
-        New value roll='nearest' joins to the nearest value (either backwards or forwards) when
-        the value falls in a gap, and to the end value according to 'rollends'.
-        'rolltolast' has been deprecated. For backwards compatibility it is converted to
-        {roll=TRUE;rollends=c(FALSE,FALSE)}.
-        This implements FR#2300 & FR#206 and helps several recent S.O. questions :
-            https://r-forge.r-project.org/tracker/?group_id=240&atid=978&func=detail&aid=2300
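-        A minimal sketch of the new rolling-join options (illustrative data) :
-            X = data.table(t=c(2L,10L), v=c("a","b"), key="t")
-            X[J(3L), roll=TRUE]        # rolls t=2 forwards (LOCF): v is "a"
-            X[J(3L), roll="nearest"]   # nearest observation is t=2: v is "a"
-            X[J(4L), roll=1]           # gap of 2 exceeds the limit of 1: v is NA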
-
-#### BUG FIXES
-
-    *   setnames(DT,c(NA,NA)) is now a type error rather than a segfault, #2393.
-        Thanks to Damian Betebenner for reporting.
-
-    *   rbind() no longer warns about inputs having columns in a different order
-        if use.names has been explicitly set TRUE, #2385. Thanks to Simon Judes
-        for reporting.
-
-    *   := by group with 0 length RHS could crash in some circumstances. Thanks to
-        Damien Challet for the reproducible example using obfuscated data and
-        pinpointing the version that regressed. Fixed and test added.
-
-    *   Error 'attempting to roll join on a factor column' could occur when a non-last
-        join column was a factor column, #2450. Thanks to Blue Magister for
-        highlighting.
-
-    *   NA in a join column of type double could cause both X[Y] and merge(X,Y)
-        to return incorrect results, #2453. Due to an errant x==NA_REAL in the C source
-        which should have been ISNA(x). Support for double in keyed joins is a relatively
-        recent addition to data.table, but embarrassing all the same. Fixed and tests added.
-        Many thanks to statquant for the thorough and reproducible report :
-        http://stackoverflow.com/questions/14076065/data-table-inner-outer-join-to-merge-with-na
-
-    *   setnames() of all column names (such as setnames(DT,toupper(names(DT)))) failed on a
-        keyed table where columns 1:length(key) were not the key. Fixed and test added.
-
-    *   setkey could sort 'double' columns (such as POSIXct) incorrectly when not the
-        last column of the key, #2484. In data.table's C code :
-            x[a] > x[b]-tol
-        should have been :
-            x[a]-x[b] > -tol  [or  x[b]-x[a] < tol ]
-        The difference may have been machine/compiler dependent. Many thanks to statquant
-        for the short reproducible example. Test added.
-
-    *   cbind(DT,1:n) returned an invalid data.table (some columns were empty) when DT
-        had one row, #2478. Grouping now warns if j evaluates to an invalid data.table,
-        to aid tracing root causes like this in future. Tests added. Many thanks to
-        statquant for the reproducible example revealed by his interesting solution
-        and to user1935457 for the assistance :
-            http://stackoverflow.com/a/14359701/403310
-
-    *   merge(...,all.y=TRUE) gave a 'setcolorder' error if a y column name included a space
-        and there were rows in y not in x, #2555. The non syntactically valid column names
-        are now preserved as intended. Thanks to Simon Judes for reporting. Tests added.
-
-    *   An error in := no longer suppresses the next print, #2376; i.e.,
-            > DT[,foo:=colnameTypo+1]
-            Error: object 'colnameTypo' not found
-            > DT    # now prints DT ok
-            > DT    # used to have to type DT a second time to see it
-        Many thanks to Charles, Joris Meys, and, Spacedman whose solution is now used
-        by data.table internally (http://stackoverflow.com/a/13606880/403310).
-
-#### NOTES
-
-    *   print(DT,topn=2), where topn is provided explicitly, now prints the top and bottom 2 rows
-        even when nrow(x)<100 [options()$datatable.print.nrows]. And the 'topn' argument is now first
-        for easier interactive use: print(DT,2), head(DT,2) and tail(DT,2).
-
-    *   The J() alias is now removed *outside* DT[...], but will still work inside DT[...];
-        i.e., DT[J(...)] is fine. As warned in v1.8.2 (see below in this file) and deprecated
-        with warning() in v1.8.4. This resolves the conflict with function J() in package
-        XLConnect (#1747) and rJava (#2045).
-        Please use data.table() directly instead of J(), outside DT[...].
-
-    *   ?merge.data.table and FAQ 1.12 have been improved (#2457), and FAQ 2.24 added.
-        Thanks to dnlbrky for highlighting : http://stackoverflow.com/a/14164411/403310.
-
-    *   There are now 943 raw tests, as reported by test.data.table().
-
-    *   v1.8.7 was on R-Forge only. v1.8.8 was released to CRAN.
-        Odd numbers are development, evens on CRAN.
-
-
-### Changes in v1.8.6 (on CRAN 13 Nov 2012)
-
-#### BUG FIXES
-
-    *   A variable in calling scope was not found when combining i, j and by in
-        one query, i used that local variable, and that query occurred inside a
-        function, #2368. This worked in 1.8.2, a regression. Test added.
-
-#### COMPATIBILITY FOR R 2.12.0-2.15.0
-
-    *   setnames used paste0() to construct its error messages, a function
-        added to R 2.15.0. Reverted to use paste(). Tests added.
-
-    *   X[Y] where Y is empty (test 764) failed due to reliance on a pmin()
-        enhancement in R 2.15.1. Removed reliance.
-
-#### NOTES
-
-    *   test.data.table() now passes in 2.12.0, the stated dependency, as well as
-        2.14.0, 2.15.0, 2.15.1, 2.15.2 and R-devel.
-
-    *   Full R CMD check (i.e. including compatibility tests with the 9 Suggest-ed
-        packages and S4 tests run using testthat which in turn depends on packages
-        which depend on R >= 2.14.0) passes ok in 2.14.0 onwards.
-
-    *   There are now 876 raw tests, as reported by test.data.table().
-
-    *   v1.8.5 was on R-Forge only. v1.8.6 was released to CRAN.
-        Odd numbers are development, evens on CRAN.
-
-
-### Changes in v1.8.4 (on CRAN 9 Nov 2012)
-
-#### NEW FEATURES
-
-    *   New printing options have been added :
-            options(datatable.print.nrows=100)
-            options(datatable.print.topn=10)
-        If the table to be printed has more than nrows, the top and bottom topn rows
-        are printed. Otherwise, below nrows, the entire table is printed.
-        Thanks to Allan Engelhardt and Melanie Bacou for useful discussions :
-        http://lists.r-forge.r-project.org/pipermail/datatable-help/2012-September/001303.html
-        and see FAQs 2.11 and 2.22.
-
-    *   When one or more rows in i have no match to x and i is unkeyed, i is now
-        tested to see if it is sorted. If so, the key is retained. As before, when all rows of
-        i match to x, the key is retained if i matches to an ordered subset of keyed x without
-        needing to test i, even if i is unkeyed.
-
-    *   by on a keyed empty table is now keyed by the by columns, for consistency with
-        the non empty case when an ordered grouping is detected.
-
-    *   DT[,`:=`(col1=val1, col2=val2, ...)] is now valid syntax rather than a crash, #2254.
-        Many thanks to Thell Fowler for the suggestion.
-
-    *   with=FALSE is no longer needed to use column names or positions on the LHS of :=, #2120.
-            DT[,c("newcol","existingcol"):=list(1L,NULL)]   # with=FALSE not needed
-            DT[,`:=`(newcol=1L, existingcol:=NULL)]         # same
-        If the LHS is held in a variable, the following equivalent options are retained :
-            mycols = c("existingcol","newcol")
-            DT[,get("mycols"):=1L]
-            DT[,eval(mycols):=1L]                # same
-            DT[,mycols:=1L,with=FALSE]           # same
-            DT[,c("existingcol","newcol"):=1L]   # same (with=FALSE not needed)
-
-    *   Multiple LHS := and `:=`(...) now work by group, and with by-without-by. Implementing
-        or fixing, and thanks to, #2215 (Florian Oswald), #1710 (Farrel Buchinsky) and
-        others.
-            DT[,c("newcol1","newcol2"):=list(mean(col1),sd(col1)), by=grp]
-            DT[,`:=`(newcol1=mean(col1),
-                     newcol2=sd(col1),
-                     ...),  by=grp]                        # same but easier to read
-            DT[c("grp1","grp2"), `:=`(newcol1=mean(col1),
-                                      newcol2=sd(col1))]   # same using by-without-by
-
-    *   with=FALSE now works with a symbol LHS of :=, by group (#2120) :
-            colname = "newcol"
-            DT[,colname:=f(),by=grp,with=FALSE]
-        Thanks to Alex Chernyakov :
-            http://stackoverflow.com/questions/11745169/dynamic-column-names-in-data-table-r
-            http://stackoverflow.com/questions/11680579/assign-multiple-columns-using-in-data-table-by-group
-
-    *   .GRP is a new symbol available to j. Value 1 for the first group, 2 for the 2nd, etc. Thanks
-        to Josh O'Brien for the suggestion :
-            http://stackoverflow.com/questions/13018696/data-table-key-indices-or-group-counter
-
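The .GRP entry above can be sketched as follows (an illustrative example written for this note, not taken from the release; assumes data.table is attached):

```r
library(data.table)

# .GRP is the group counter: 1 for the first group, 2 for the second, ...
DT <- data.table(x = c("a", "a", "b"))
DT[, g := .GRP, by = x]   # add the group number as a column, by reference

DT$g   # 1 1 2
```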
-    *   .I is a new symbol available to j. An integer vector length .N. It contains the group's row
-        locations in DT. This implements FR#1962.
-            DT[,.I[which.max(colB)],by=colA]      # row numbers of maxima by group
-
-    *   A new "!" prefix on i signals 'not-join' (a.k.a. 'not-where'), #1384i.
-            DT[-DT["a", which=TRUE, nomatch=0]]   # old not-join idiom, still works
-            DT[!"a"]                              # same result, now preferred.
-            DT[!J(6),...]                         # !J == not-join
-            DT[!2:3,...]                          # ! on all types of i
-            DT[colA!=6L | colB!=23L,...]          # multiple vector scanning approach (slow)
-            DT[!J(6L,23L)]                        # same result, faster binary search
-        '!' has been used rather than '-' :
-            * to match the 'not-join'/'not-where' nomenclature
-            * with '-', DT[-0] would return DT rather than DT[0] and not be backwards
-              compatible. With '!', DT[!0] returns DT both before (since !0 is TRUE in
-              base R) and after this new feature.
-            * to leave DT[+J...] and DT[-J...] available for future use
-
-    *   When with=FALSE, "!" may also be a prefix on j, #1384ii. This selects all but the named columns.
-            DF[,-match("somecol",names(DF))]              # works when somecol exists. If not, NA causes an error.
-            DF[,-match("somecol",names(DF),nomatch=0)]    # works when somecol exists. Empty data.frame when it doesn't, silently.
-            DT[,-match("somecol",names(DT)),with=FALSE]   # same issues.
-            DT[,setdiff(names(DT),"somecol"),with=FALSE]  # works but you have to know order of arguments, and no warning if doesn't exist
-            - vs -
-            DT[,!"somecol",with=FALSE]                    # works and easy to read. With (helpful) warning if somecol isn't there.
-        Strictly speaking, this (!j) is a "not-select" (!i is 'not-where'). This has no analogy in SQL.
-        Reminder: i is analogous to WHERE, j is analogous to SELECT and `:=` in j changes SELECT to UPDATE.
-        !j when j is column positions is very similar to -j.
-            DF[,-(2:3),drop=FALSE]           # all but columns 2 and 3. Careful, brackets and drop=FALSE are required.
-            DT[,-(2:3),with=FALSE]           # same
-            DT[,!2:3,with=FALSE]             # same
-            copy(DT)[,2:3:=NULL,with=FALSE]  # same
-        !j was introduced for column names really, not positions. It works for both, for consistency :
-            toremove = c("somecol","anothercol")
-            DT[,!toremove,with=FALSE]
-            toremove = 2:3
-            DT[,!toremove,with=FALSE]        # same code works without change
-
-    *   'which' now accepts NA. This means return the row numbers in i that don't match, #1384iii.
-        Thanks to Santosh Srinivas for the suggestion.
-            X[Y,which=TRUE]   # row numbers of X that do match, as before
-            X[!Y,which=TRUE]  # row numbers of X that don't match
-            X[Y,which=NA]     # row numbers of Y that don't match
-            X[!Y,which=NA]    # row numbers of Y that do match (for completeness)
-
-    *   setnames() now works on data.frame, #2273. Thanks to Christian Hudon for the suggestion.
-
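A minimal sketch of the setnames() behaviour mentioned above (illustrative; the data.frame here is hypothetical):

```r
library(data.table)

# setnames() renames columns by reference, and now accepts a data.frame too
DF <- data.frame(a = 1:3, b = 4:6)
setnames(DF, "a", "x")   # rename "a" to "x" in place

names(DF)   # "x" "b"
```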
-#### BUG FIXES
-
-    *   A large slowdown (many minutes instead of a few secs) in X[Y] joins has been fixed, #2216.
-        This occurred where the number of rows in i was large, and at least one row joined to
-        more than one row in x. Possibly in other similar circumstances too. The workaround was
-        to set mult="first" which is no longer required. Test added.
-        Thanks to a question and report from Alex Chernyakov :
-            http://stackoverflow.com/questions/12042779/time-of-data-table-join
-
-    *   Indexing columns of data.table with a logical vector and `with=FALSE` now works as
-        expected, fixing #1797. Thanks to Mani Narayanan for reporting. Test added.
-
-    *   In X[Y,cols,with=FALSE], NA matches are now handled correctly. And if cols
-        includes join columns, NA matches (if any) are now populated from i. For
-        consistency with X[Y] and X[Y,list(...)]. Tests added.
-
-    *   "Internal error" when combining join containing missing groups and group by
-        is fixed, #2162. For example :
-            X[Y,.N,by=NonJoinColumn]
-        where Y contains some rows that don't match to X. This bug could also result in a segfault.
-        Thanks to Andrey Riabushenko and Michael Schermerhorn for reporting. Tests added.
-
-    *   On empty tables, := now changes column type and adds new 0 length columns ok, fixing
-        #2274. Tests added.
-
-    *   Deleting multiple columns out-of-order is no longer a segfault, #2223. Test added.
-            DT[,c(9,2,6):=NULL]
-        Reminder: deleting columns by reference is relatively instant, regardless of table size.
-
-    *   Mixing column adds and deletes in one := gave incorrect results, #2251. Thanks to
-        Michael Nelson for reporting. Test added.
-            DT[,c("newcol","col1"):=list(col1+1L,NULL)]
-            DT[,`:=`(newcol=col1+1L,col1=NULL)]             # same
-
-    *   Out of bound positions in the LHS of := are now caught. Root cause of crash in #2254.
-        Thanks to Thell Fowler for reporting. Tests added.
-            DT[,(ncol(DT)+1):=1L]   # out of bounds error (add new columns by name only)
-            DT[,ncol(DT):=1L]       # ok
-
-    *   A recycled column plonk RHS of := no longer messes up setkey and := when used on that
-        object afterwards, #2298. For example,
-            DT = data.table(a=letters[3:1],x=1:3)
-            DT[,c("x1","x2"):=x]  # ok (x1 and x2 are now copies of x)
-            setkey(DT,a)          # now ok rather than wrong result
-        Thanks to Timothee Carayol for reporting. Tests added.
-
-    *   Join columns are now named correctly when j is .SD, a subset of .SD, or similar, #2281.
-            DT[c("a","b"),.SD[...]]    # result's first column now named key(DT)[1] rather than 'V1'
-
-    *   Joining an empty i table now works without error (#2194). It also retains the key and has the same
-        number and types of empty columns as the non-empty by-without-by case. Tests added.
-
-    *   by-without-by with keyed i where key isn't the 1:n columns of i could crash, #2314. Many thanks
-        to Garrett See for reporting with reproducible example data file. Tests added.
-
-    *   DT[,col1:=X[Y,col2]] was a crash, #2311. Due to RHS being a data.table. mult="first"
-        (or drop=TRUE in future) was likely intended. Thanks to Anoop Shah for reporting with
-        reproducible example. Root cause (recycling of list columns) fixed and tests added.
-
-    *   Grouping by a column which somehow has names, no longer causes an error, #2307.
-          DT = data.table(a=1:3,b=c("a","a","b"))
-          setattr(DT$b, "names", c("a","b","c"))  # not recommended, just to illustrate
-          DT[,sum(a),by=b]  # now ok
-
-    *   gWidgetsWWW wasn't known as data.table-aware, even though it mimics executing
-        code in .GlobalEnv, #2340. So, data.table is now gWidgetsWWW-aware. Further packages
-        can be added if required by changing a new variable :
-            data.table:::cedta.override
-        by using assignInNamespace(). Thanks to Zach Waite and Yihui Xie for investigating and
-        providing reproducible examples :
-            http://stackoverflow.com/questions/13106018/data-table-error-when-used-through-knitr-gwidgetswww
-
-    *   Optimization of lapply when FUN is a character function name now works, #2212.
-            DT[,lapply(.SD, "+", 1), by=id]  # no longer an error
-            DT[,lapply(.SD, `+`, 1), by=id]  # same, worked before
-        Thanks to Michael Nelson for highlighting. Tests added.
-
-    *   Syntactically invalid column names (such as "Some rate (%)") are now preserved in X[Y] joins and
-        merge(), as intended. Thanks to George Kaupas (#2193i) and Yang Zhang (#2090) for reporting.
-        Tests added.
-
-    *   merge() and setcolorder() now check for duplicate column names first rather than a less helpful
-        error later, #2193ii. Thanks to Peter Fine for reporting. Tests added.
-
-    *   Column attributes (such as 'comment') are now retained by X[Y] and merge(), #2270. Thanks to
-        Allan Engelhardt for reporting. Tests added.
-
-    *   A matrix RHS of := is now treated as vector, with warning if it has more than 1 column, #2333.
-        Thanks to Alex Chernyakov for highlighting. Tests added.
-            DT[,b:=scale(a)]   # now works rather than creating an invalid column of type matrix
-        http://stackoverflow.com/questions/13076509/why-error-from-na-omit-after-running-scale-in-r-in-data-table
-
-    *   last() is now S3 generic for compatibility with xts::last, #2312. Strictly speaking, for speed,
-        last(x) deals with vector, list and data.table inputs directly before falling back to
-        S3 dispatch. Thanks to Garrett See for reporting. Tests added.
-
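A short sketch of last() on the inputs it handles directly (illustrative example, not from the original notes):

```r
library(data.table)

# last() handles vectors, lists and data.tables directly before
# falling back to S3 dispatch (e.g. for xts objects)
last(1:5)                  # 5
DT <- data.table(a = 1:3)
last(DT)                   # a one-row data.table: a = 3
```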
-    *   DT[,lapply(.SD,sum)] in the case of no grouping now returns a data.table for consistency, rather
-        than list, #2263. Thanks to Justin and mnel for highlighting. Existing test changed.
-            http://stackoverflow.com/a/12290443/403310
-
-    *   L[[2L]][,newcol:=] now works, where L is a list of data.table objects, #2204. Thanks to Melanie Bacou
-        for reporting. Tests added. A warning is issued when the first column is added if L was created with
-        list(DT1,DT2) since R's list() copies named inputs. Until reflist() is implemented, this warning can be
-        ignored or suppressed.
-            http://lists.r-forge.r-project.org/pipermail/datatable-help/2012-August/001265.html
-
-    *   DT[J(data.frame(...))] now works again, giving the same result as DT[data.frame(...)], #2265.
-        Thanks to Christian Hudon for reporting. Tests added.
-
-    *   A memory leak has been fixed, #2191 and #2284. All data.table objects leaked the over allocated column
-        pointer slots; i.e., when a data.table went out of scope or was rm()'d this memory wasn't released and
-        gc() would report growing Vcells. For a 3 column data.table with a 100 allocation, the growth was
-        1.5Kb per data.table on 64bit (97*8*2 bytes) and 0.75Kb on 32bit (97*4*2 bytes).
-        Many thanks to Xavier Saint-Mleux and Sasha Goodman for the reproducible examples and
-        assistance. Tests added.
-
-    *   rbindlist now skips empty (0 row) items as well as NULL items. So the column types of the result are
-        now taken from the first non-empty data.table. Thanks to Garrett See for reporting. Test added.
-
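The skipping behaviour can be sketched as (illustrative example):

```r
library(data.table)

# NULL and 0-row items are skipped; result column types come from
# the first non-empty data.table
l  <- list(NULL, data.table(a = integer()), data.table(a = 1:2))
DT <- rbindlist(l)

DT$a   # 1 2
```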
-    *   setnames did not update column names correctly when passed integer column positions and those
-        column names contained duplicates, fixed. This affected the column names of queries involving
-        two or more by expressions with a named list inside {}. Thanks to Steve Lianoglou for finding and
-        fixing. Tests added.
-            DT[, {list(name1=sum(v),name2=sum(w))}, by="a,b"]  # now ok, no blank column names in result
-            DT[, list(name1=sum(v),name2=sum(w)), by="a,b"]    # ok before
-
-#### USER VISIBLE CHANGES
-
-    *   J() now issues a warning when used *outside* DT[...], since that usage
-        is deprecated. See item below in v1.8.2.
-        Use data.table() directly instead of J(), outside DT[...]. Or, define
-        an alias yourself. J() will continue to work *inside* DT[...] as documented.
-
-    *   DT[,LHS:=RHS,...] no longer prints DT. This implements #2128 "Try again to get
-        DT[i,j:=value] to return invisibly". Thanks to discussions here :
-            http://stackoverflow.com/questions/11359553/how-to-suppress-output-when-using-in-r-data-table
-            http://r.789695.n4.nabble.com/Avoiding-print-when-using-tp4643076.html
-        FAQs 2.21 and 2.22 have been updated.
-
-    *   DT[] now returns DT rather than an error that either i or j must be supplied.
-        So, ending with [] at the console is a convenience to print the result of :=, rather
-        than wrapping with print(); e.g.,
-            DT[i,j:=value]...oops forgot print...[]
-        is the same as :
-            print(DT[i,j:=value])
-
-    *   A warning is now issued when by is set equal to the by-without-by join columns,
-        causing x to be subset and then grouped again. The warning suggests removing by or
-        changing it, #2282. This can be turned off using options(datatable.warnredundantby=FALSE)
-        in case it occurs after upgrading, until those lines can be modified.
-        Thanks to Ben Barnes for highlighting :
-            http://stackoverflow.com/a/12474211/403310
-
-    *   Description of how join columns are determined in X[Y] syntax has been further clarified
-        in ?data.table. Thanks to Alex :
-            http://stackoverflow.com/questions/12920803/merge-data-table-when-the-number-of-key-columns-are-different
-
-    *   ?transform and example(transform) have been fixed and embellished, #2316.
-        Thanks to Garrett See's suggestion.
-
-    *   ?setattr has been updated to document that it takes any input, not just data.table, and
-        can be used on columns of a data.frame, for example.
-
-    *   Efficiency warnings when joining between a factor column and a character column are now downgraded
-        to messages when verbosity is on, #2265i. Thanks to Christian Hudon for the suggestion.
-
-#### THANKS TO BETA TESTING (bugs caught in 1.8.3 before release to CRAN)
-
-    *   Combining a join with mult="first"|"last" followed by by inside the same [...] gave incorrect
-        results or a crash, #2303. Many thanks to Garrett See for the reproducible example and
-        pinpointing in advance which commit had caused the problem. Tests added.
-
-    *   Examples in ?data.table have been updated now that := no longer prints. Thanks to Garrett See.
-
-#### NOTES
-
-    *   There are now 869 raw tests. test.data.table() should return precisely this number of
-        tests passed. If not, then somehow, a slightly stale version from R-Forge is likely
-        installed; please reinstall from CRAN.
-
-    *   v1.8.3 was an R-Forge only beta release. v1.8.4 was released to CRAN.
-
-
-### Changes in v1.8.2
-
-#### NEW FEATURES
-
-    *   Numeric columns (type 'double') are now allowed in keys and ad hoc
-        by. J() and SJ() no longer coerce 'double' to 'integer'. i join columns
-        which mismatch on numeric type are coerced silently to match
-        the type of x's join column. Two floating point values
-        are considered equal (by grouping and binary search joins) if their
-        difference is within sqrt(.Machine$double.eps), by default. See example
-        in ?unique.data.table. Completes FRs #951, #1609 and #1075. This paves the
-        way for other atomic types which use 'double' (such as POSIXct and bit64).
-        Thanks to Chris Neff for beta testing and finding problems with keys
-        of two numeric columns (bug #2004), fixed and tests added.
-
-    *   := by group is now implemented (FR#1491) and sub-assigning to a new column
-        by reference now adds the column automatically (initialized with NA where
-        the sub-assign doesn't touch) (FR#1997). := by group can be combined with all
-        types of i, so ":= by group" includes grouping by `i` as well as by `by`.
-        Since := by group is by reference, it should be significantly faster than any
-        method that (directly or indirectly) `cbind`s the grouped results to DT, since
-        no copy of the (large) DT is made at all. It's a short and natural syntax that
-        can be compounded with other queries.
-            DT[,newcol:=sum(colB),by=colA]
-
-    *   Prettier printing of list columns. The first 6 items of atomic vectors
-        are collapsed with "," followed by a trailing "," if there are more than
-        6, FR#1608. This difference to data.frame has been added to FAQ 2.17.
-        Embedded objects (such as a data.table) print their class name only to avoid
-        seemingly mangled output, bug #1803. Thanks to Yike Lu for reporting.
-        For example:
-        > data.table(x=letters[1:3],
-                     y=list( 1:10, letters[1:4], data.table(a=1:3,b=4:6) ))
-           x            y
-        1: a 1,2,3,4,5,6,
-        2: b      a,b,c,d
-        3: c <data.table>
-
-    *   Warnings added when joining character to factor, and factor to character.
-        Character to character is now preferred in joins and needs no coercion.
-        Even so, these coercions have been made much more efficient by taking
-        a shallow copy of i internally, avoiding a full deep copy of i.
-
-    *   Ordered subsets now retain x's key. Always for logical and keyed i, using
-        base::is.unsorted() for integer and unkeyed i. Implements FR#295.
-
-    *   mean() is now automatically optimized, #1231. This can speed up grouping
-        by 20 times when there are a large number of groups. See wiki point 3, which
-        you no longer need to know. Turn off optimization by setting
-        options(datatable.optimize=0).
-
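The optimization is transparent: grouped mean() gives the same result, just faster (illustrative sketch):

```r
library(data.table)

# grouped mean() is evaluated by optimized internal code; results unchanged
DT  <- data.table(g = c(1L, 1L, 2L), v = c(1, 3, 5))
res <- DT[, mean(v), by = g]

res$V1   # 2 5
```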
-    *   DT[,lapply(.SD,...),by=...] is now automatically optimized, #2067. This can speed
-        up applying a function by column by group, by over 20 times. See wiki point 5,
-        which you no longer need to know. In other words:
-             DT[,lapply(.SD,sum),by=grp]
-        is now just as fast as :
-             DT[,list(x=sum(x),y=sum(y)),by=grp]
-        Don't forget to use .SDcols when a subset of columns is needed.
-
-    *   The package is now Byte Compiled (when installed in R 2.14.0 or later). Several
-        internal speed improvements were made in this version too, such as avoiding
-        internal copies. If you find 1.8.2 is faster, before attributing that to Byte
-        Compilation, please install the package without Byte Compilation and compare
-        ceteris paribus. If you find cases where speed has slowed, please let us know.
-
-    *   sapply(DT,class) gets a significant speed boost by avoiding a call to unclass()
-        in as.list.data.table() called by lapply(DT,...), which copied the entire object.
-        Thanks to a question by user1393348 on Stack Overflow, implementing #2000.
-        http://stackoverflow.com/questions/10584993/r-loop-over-columns-in-data-table
-
-    *   The J() alias is now deprecated outside DT[...], but will still work inside
-        DT[...], as in DT[J(...)].
-        J() is conflicting with function J() in package XLConnect (#1747)
-        and rJava (#2045). For data.table to change is easier, with some efficiency
-        advantages too. The next version of data.table will issue a warning from J()
-        when used outside DT[...]. The version after will remove it. Only then will
-        the conflict with rJava and XLConnect be resolved.
-        Please use data.table() directly instead of J(), outside DT[...].
-
-    *   New DT[.(...)] syntax (in the style of package plyr) is identical to
-        DT[list(...)], DT[J(...)] and DT[data.table(...)]. We plan to add ..(), too, so
-        that .() and ..() are analogous to the file system's ./ and ../; i.e., .()
-        evaluates within the frame of DT and ..() in the parent scope.
-
-    *   New function rbindlist(l). This does the same as do.call("rbind",l), but much
-        faster.
-
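A minimal sketch of rbindlist() (illustrative example):

```r
library(data.table)

# rbindlist(l) stacks a list of data.tables, much faster than do.call("rbind", l)
l  <- list(data.table(a = 1:2, b = c("x", "y")),
           data.table(a = 3L,  b = "z"))
DT <- rbindlist(l)

nrow(DT)   # 3
```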
-#### BUG FIXES
-
-    *   DT[,f(.SD),by=colA] where f(x)=x[,colB:=1L] was a segfault, bug#1727.
-        This is now a graceful error to say that using := in .SD's j is
-        reserved for future use. This was already caught in most circumstances,
-        other than via f(.SD). Thanks to Leon Baum for reporting. Test added.
-
-    *   If .N is selected by j it is now renamed "N" (no dot) in the output, to
-        avoid a potential conflict in subsequent grouping between a column called
-        ".N" and the special .N variable, fixing #1720. ?data.table updated and
-        FAQ 4.6 added with detailed examples. Tests added.
-
-    *   Moved data.table setup code from .onAttach to .onLoad so that it
-        is also run when data.table is simply `import`ed from within a package,
-        fixing #1916 related to missing data.table options.
-
-    *   Typos fixed in ?":=", thanks to Michael Weylandt for reporting.
-
-    *   base::unname(DT) now works again, as needed by plyr::melt(). Thanks to
-        Christoph Jaeckel for reporting. Test added.
-
-    *   CJ(x=...,y=...) now retains the column names x and y, useful when CJ
-        is used independently (since x[CJ(...)] takes join column names from x).
-        Restores behaviour lost somewhere between 1.7.1 and 1.8.0, thanks
-        to Muhammad Waliji for reporting. Tests added.
-
-    *   A column plonk via set() was only possible by passing NULL as i. The default
-        for i is now NULL so that missing i invokes a column plonk, too (when length(value)
-        == nrow(DT)). A column plonk is much more efficient than creating 1:nrow(DT) and
-        passing that as i to set() or DT[i,:=] (almost infinitely faster). Thanks to
-        testing by Josh O'Brien in comments on Stack Overflow. Test added.
-
-    *   Joining a factor column with unsorted and unused levels to a character column
-        now matches properly, fixing #1922. Thanks to Christoph Jäckel for the reproducible
-        example. Test added.
-
-    *   'by' on an empty table now returns an empty table (#1945) and .N, .SD and .BY are
-        now available in the empty case (also #1945). The column names and types of
-        the returned empty table are consistent with the non empty case. Thanks to
-        Malcolm Cook for reporting. Tests added.
-
-    *   DT[NULL] now returns the NULL data.table, rather than an error. Test added.
-        Use DT[0] to return an empty copy of DT.
-
-    *   .N, .SD and .BY are now available to j when 'by' is missing, "", character()
-        and NULL, fixing #1732. For consistency so that j works unchanged when by is
-        dynamic and passed one of those values all meaning 'don't group'. Thanks
-        to Joseph Voelkel for reporting and Chris Neff for further use cases. Tests added.
-
-    *   chorder(character()) was a segfault, #2026. Fixed and test added.
-
-    *   When grouping by i, if the first row of i had no match, .N was 1 rather than 0.
-        Fixed and tests added. Thanks to a question by user1165199 on Stack Overflow :
-        http://stackoverflow.com/questions/10721517/count-number-of-times-data-is-in-another-dataframe-in-r
-
-    *   All object attributes are now retained by grouping; e.g., tzone of POSIXct is no
-        longer lost, fixing #1704. Test added. Thanks to Karl Ove Hufthammer for reporting.
-
-    *   All object attributes are now retained by recycling assign to a new column (both
-        <- and :=); e.g., POSIXct class is no longer lost, fixing #1712. Test added. Thanks
-        to Leon Baum for reporting.
-
-    *   unique() of ITime no longer coerces to integer, fixing #1719. Test added.
-
-    *   rbind() of DT with an irregular list() now recycles the list items correctly,
-        #2003. Test added.
-
-    *   setcolorder() now produces correct error when passed missing column names. Test added.
-
-    *   merge() with common names, and, all.y=TRUE (or all=TRUE) no longer returns an error, #2011.
-        Tests added. Thanks to a question by Ina on Stack Overflow :
-        http://stackoverflow.com/questions/10618837/joining-two-partial-data-tables-keeping-all-x-and-all-y
-
-    *   Removing or setting datatable.alloccol to NULL is no longer a memory leak, #2014.
-        Tests added. Thanks to a question by Vanja on Stack Overflow :
-        http://stackoverflow.com/questions/10628371/r-importing-data-table-package-namespace-unexplainable-jump-in-memory-consumpt
-
-    *   DT[,2:=someval,with=FALSE] now changes column 2 even if column 1 has the same (duplicate)
-        name, #2025. Thanks to Sean Creighton for reporting. Tests added.
-
-    *   merge() is now correct when all=TRUE but there are no common values in the two
-        data.tables, fixing #2114. Thanks to Karl Ove Hufthammer for reporting.  Tests added.
-
-    *   An as.data.frame method has been added for ITime, so that ITime can be passed to ggplot2
-        without error, #1713. Thanks to Farrel Buchinsky for reporting. Tests added.
-        ITime axis labels are still displayed as integer seconds from midnight; we don't know why ggplot2
-        doesn't invoke ITime's as.character method. One approach is to convert ITime to POSIXct for ggplot2.
-
-    *   setnames(DT,newnames) now works when DT contains duplicate column names, #2103.
-        Thanks to Timothee Carayol for reporting. Tests added.
-
-    *   subset() would crash on a keyed table with non-character 'select', #2131. Thanks
-        to Benjamin Barnes for reporting. The root cause was non-character inputs to chmatch
-        and %chin%. Tests added.
-
-    *   Non-ascii column names now work when passed as character 'by', #2134. Thanks to
-        Karl Ove Hufthammer for reporting. Tests added.
-            DT[, mean(foo), by=ÆØÅ]      # worked before
-            DT[, mean(foo), by="ÆØÅ"]    # now works too
-            DT[, mean(foo), by=colA]     # worked before
-            DT[, mean(foo), by="colA"]   # worked before
-
-#### USER VISIBLE CHANGES
-
-    *   Incorrect syntax error message for := now includes advice to check that
-        DT is a data.table rather than a data.frame. Thanks to a comment by
-        gkaupas on Stack Overflow.
-
-    *   When set() is passed a logical i, the error message now includes advice to
-        wrap with which() and take the which() outside the loop (if any) if possible.
-
-    *   An empty data.table (0 rows, 1+ cols) now prints as "Empty data.table" rather
-        than "NULL data.table". A NULL data.table, returned by data.table(NULL) has
-        0 rows and 0 cols.  DT[0] returns an empty data.table.
-
-    *   0 length by (such as NULL and character(0)) now return a data.table when j
-        is vector, rather than vector, for consistency of return types when by
-        is dynamic and 'don't group' needs to be represented. Bug fix #1599 in
-        v1.7.0 was fixing an error in this case (0 length by).
-
-    *   Default column names for unnamed columns are now consistent between 'by' and
-        non-'by'; e.g. these two queries now name the columns "V1" and "V2" :
-            DT[,list("a","b"),by=x]
-            DT[,list("a","b")]      # used to name the columns 'a' and 'b', oddly.
-
-    *   Typing ?merge now asks whether to display ?merge.data.frame or ?merge.data.table,
-        and ?merge.data.table works directly. Thanks to Chris Neff for suggesting.
-
-    *   Description of how join columns are determined in X[Y] syntax has been clarified
-        in ?data.table. Thanks to Juliet Hannah and Yike Lu.
-
-    *   DT now prints consistent row numbers when the column names are reprinted at the
-        bottom of the output (saves scrolling up). Thanks to Yike Lu for reporting #2015.
-        The tail as well as the head of large tables is now printed.
-
-
-#### THANKS TO BETA TESTING (i.e. bugs caught in 1.8.1 before release to CRAN) :
-
-    *   Florian Oswald for #2094: DT[,newcol:=NA] now adds a new logical column ok.
-        Test added.
-
-    *   A large slowdown (2s went up to 40s) when iterating calls to DT[...] in a
-        for loop, such as in example(":="), was caught and fixed in beta, #2027.
-        Speed regression test added.
-
-    *   Christoph Jäckel for #2078: by=c(...) with i clause broke. Tests added.
-
-    *   Chris Neff for #2065: keyby := now keys, unless an i clause is present or
-        keyby is not straightforward column names (in any format). Tests added.
-
-    *   :=NULL to delete, followed by := by group to add, didn't add the column,
-        #2117. Test added.
-
-    *   Combining i subset with by gave incorrect results, #2118. Tests added.
-
-    *   Benjamin Barnes for #2133: rbindlist not supporting type 'logical'.
-        Tests added.
-
-    *   Chris Neff for #2146: using := to add a column to the result of a simple
-        column subset such as DT[,list(x)], or after changing all column names
-        with setnames(), was an error. Fixed and tests added.
-
-#### NOTES
-
-    *   There are now 717 raw tests, plus S4 tests.
-
-    *   v1.8.1 was an R-Forge only beta release. v1.8.2 was released to CRAN.
-
-
-### Changes in v1.8.0
-
-#### NEW FEATURES
-
-    *   character columns are now allowed in keys and are preferred to
-        factor. data.table() and setkey() no longer coerce character to
-        factor. Factors are still supported. Implements FR#1493, FR#1224
-        and (partially) FR#951.
-
-    *   setkey() no longer sorts factor levels. This should be more convenient
-        and compatible with ordered factors where the levels are 'labels', in
-        some order other than alphabetical. The established advice to paste each
-        level with an ordinal prefix, or use another table to hold the factor
-        labels instead of a factor column, is no longer needed. Solves FR#1420.
-        Thanks to Damian Betebenner and Allan Engelhardt raising on datatable-help
-        and their tests have been added verbatim to the test suite.
-
-    *   unique(DT) and duplicated(DT) are now faster with character columns,
-        on unkeyed tables as well as keyed tables, FR#1724.
-
-    *   New function set(DT,i,j,value) allows fast assignment to elements
-        of DT. Similar to := but avoids the overhead of [.data.table, so is
-        much faster inside a loop. Less flexible than :=, but as flexible
-        as matrix subassignment. Similar in spirit to setnames(), setcolorder(),
-        setkey() and setattr(); i.e., assigns by reference with no copy at all.
-
-            M = matrix(1,nrow=100000,ncol=100)
-            DF = as.data.frame(M)
-            DT = as.data.table(M)
-            system.time(for (i in 1:1000) DF[i,1L] <- i)   # 591.000s
-            system.time(for (i in 1:1000) DT[i,V1:=i])     #   1.158s
-            system.time(for (i in 1:1000) M[i,1L] <- i)    #   0.016s
-            system.time(for (i in 1:1000) set(DT,i,1L,i))  #   0.027s
-
-    *   New functions chmatch() and %chin%, faster versions of match()
-        and %in% for character vectors. R's internal string cache is
-        utilised (no hash table is built). They are about 4 times faster
-        than match() on the example in ?chmatch.
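As a minimal sketch of that equivalence (the toy vectors `x` and `lookup` are hypothetical; assumes the data.table package, v1.8.0 or later, is attached):

```r
library(data.table)

x      = c("apple", "banana", "cherry")
lookup = c("banana", "cherry", "date")

# chmatch() gives the same answer as match(), just faster for character vectors
chmatch(x, lookup)   # NA 1 2
match(x, lookup)     # NA 1 2

# %chin% gives the same answer as %in%
x %chin% lookup      # FALSE TRUE TRUE
```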
-
-    *   Internal function sortedmatch() removed and replaced with chmatch()
-        when matching i levels to x levels for columns of type 'factor'. This
-        preliminary step was causing a (known) significant slowdown when the number
-        of levels of a factor column was large (e.g. >10,000). Exacerbated in
-        tests of joining four such columns, as demonstrated by Wes McKinney
-        (author of Python package Pandas). Matching 1 million strings of which
-        600,000 are unique is now reduced from 16s to 0.5s, for example.
-        Background here :
-        http://stackoverflow.com/questions/8991709/why-are-pandas-merges-in-python-faster-than-data-table-merges-in-r
-
-    *   rbind.data.table() gains a use.names argument, by default TRUE.
-        Set to FALSE to combine columns in order rather than by name. Thanks to
-        a question by Zach on Stack Overflow :
-        http://stackoverflow.com/questions/9315258/aggregating-sub-totals-and-grand-totals-with-data-table
-
-    *   New argument 'keyby'. An ad hoc by just as 'by' but with an additional setkey()
-        on the by columns of the result, for convenience. Not to be confused with a
-        'keyed by' such as DT[...,by=key(DT)] which can be more efficient as explained
-        by FAQ 3.3.  Thanks to Yike Lu for the suggestion and discussion (FR#1780).
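A minimal sketch of 'keyby' versus 'by' (toy table is hypothetical; assumes data.table v1.8.0 or later):

```r
library(data.table)

DT = data.table(grp = c("b", "a", "b", "a"), v = 1:4)

# keyby groups like by, then keys (and therefore sorts) the result by grp
res = DT[, sum(v), keyby = grp]
res                      # groups appear sorted: a (6), then b (4)
key(res)                 # "grp"

# plain by leaves the result unkeyed, groups in order of first appearance
DT[, sum(v), by = grp]
```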
-
-    *   Single by (or keyby) expressions no longer need to be wrapped in list(),
-        for convenience, implementing FR#1743; e.g., these now work :
-            DT[,sum(v),by=a%%2L]
-            DT[,sum(v),by=month(date)]
-        instead of needing :
-            DT[,sum(v),by=list(a%%2L)]
-            DT[,sum(v),by=list(month(date))]
-
-    *   Unnamed 'by' expressions have always been inspected using all.vars() to make
-        a guess at a sensible column name for the result. This guess now includes
-        function names via all.vars(functions=TRUE), for convenience; e.g.,
-            DT[,sum(v),by=month(date)]
-        now returns a column called 'month' rather than 'date'. It is more robust to
-        explicitly name columns, though; e.g.,
-            DT[,sum(v),by=list("Guaranteed name"=month(date))]
-
-    *   For a surprising speed boost in some circumstances, default options such as
-        'datatable.verbose' are now set when the package loads (unless they are already
-        set, by user's profile for example). The 'default' argument of base::getOption()
-        was the culprit and has been removed internally from all 11 calls.
-
-
-#### BUG FIXES
-
-    *   Fixed a `suffixes` handling bug in merge.data.table that was
-        introduced during the recent "fast-merge" reboot.
-        Briefly, the bug was only triggered in scenarios where both
-        tables had identical column names that were not part of `by` and
-        ended with *.1. cf. "merge and auto-increment columns in y[x]"
-        test in tests/test-data.frame-like.R for more information.
-
-    *   Adding a column using := on a data.table just loaded from disk was
-        correctly detected and over allocated, but incorrectly warning about
-        a previous copy. Test 462 tested loading from disk, but suppressed
-        warnings (sadly). Fixed.
-
-    *   data.table-unaware packages that use DF[i] and DF[i]<-value syntax
-        were not compatible with data.table, fixed. Many thanks to Prasad Chalasani
-        for providing a reproducible example with base::droplevels(), and
-        Helge Liebert for providing a reproducible example (#1794) with stats::reshape().
-        Tests added.
-
-    *   as.data.table(DF) already preserved DF's attributes but not any inherited
-        classes such as nlme's groupedData, so nlme was incompatible with
-        data.table. Fixed. Thanks to Dieter Menne for providing a reproducible
-        example. Test added.
-
-    *   The internal row.names attribute of .SD (which exists for compatibility with
-        data.frame only) was not being updated for each group. This caused length errors
-        when calling any non-data.table-aware package from j, by group, when that package
-        used length of row.names. Such as the recent update to ggplot2. Fixed.
-
-    *   When grouped j consists of a print of an object (such as ggplot2), the print is now
-        masked to return NULL rather than the object that ggplot2 returns since the
-        recent update v0.9.0. Otherwise data.table tries to accumulate the (albeit
-        invisible) print object. The print masking applies only during
-        grouping, not generally.
-
-    *   'by' was failing (bug #1880) when passed character column names where one or more
-        included a space. So, this now works :
-            DT[,sum(v),by="column 1"]
-        and j retains spaces in column names rather than replacing spaces with "."; e.g.,
-            DT[,list("a b"=1)]
-        Thanks to Yang Zhang for reporting. Tests added. As before, column names may be
-        back ticked in the usual R way (in i, j and by); e.g.,
-            DT[,sum(`nicely named var`+1),by=month(`long name for date column`)]
-
-    *   unique() on an unkeyed table including character columns now works correctly, fixing
-        #1725. Thanks to Steven Bagley for reporting. Test added.
-
-    *   %like% now returns logical (rather than integer locations) so that it can be
-        combined with other i clauses, fixing #1726. Thanks to Ivan Zhang for reporting. Test
-        added.
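Because %like% now returns logical, it composes with other logical clauses in i (toy table is hypothetical; assumes data.table is attached):

```r
library(data.table)

DT = data.table(name = c("alpha", "beta", "gamma"), v = 1:3)

# a logical result can be combined with & and | inside i
hit = DT[name %like% "^g" & v > 1]
hit$name   # "gamma"
```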
-
-
-#### THANKS TO
-
-    *   Joshua Ulrich for spotting a missing PACKAGE="data.table"
-        in .Call in setkey.R, and suggesting as.list.default() and
-        unique.default() to avoid dispatch for speed, all implemented.
-
-
-#### USER-VISIBLE CHANGES
-
-    *   Providing .SDcols when j doesn't use .SD is downgraded from error to warning,
-        and verbosity now reports which columns have been detected as used by j.
-
-    *   check.names is now FALSE by default, for convenience when working with column
-        names with spaces and other special characters, which are now fully supported.
-        This difference to data.frame has been added to FAQ 2.17.
-
-
-### Changes in v1.7.10
-
-#### NEW FEATURES
-
-    *   New function setcolorder() reorders the columns by name
-        or by number, by reference with no copy. This is (almost)
-        infinitely faster than DT[,neworder,with=FALSE].
-
-    *   The prefix i. can now be used in j to refer to join inherited
-        columns of i that are otherwise masked by columns in x with
-        the same name.
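Both features above can be sketched as follows (toy tables are hypothetical; assumes data.table v1.7.10 or later):

```r
library(data.table)

# setcolorder(): reorder columns by reference, no copy of the data
DT = data.table(a = 1:3, b = 4:6, c = 7:9)
setcolorder(DT, c("c", "a", "b"))
names(DT)   # "c" "a" "b"

# i. prefix: refer to i's column where x has a column of the same name
X = data.table(id = 1:2, v = c(10, 20), key = "id")
Y = data.table(id = 1:2, v = c(1, 2),  key = "id")
res = X[Y, list(x.v = v, i.v = i.v)]   # v is X's column; i.v is Y's
```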
-
-
-#### BUG FIXES
-
-    *   tracemem() in example(setkey) was causing CRAN check errors
-        on machines where R is compiled without memory profiling available,
-        for efficiency. Notably, R for Windows, Ubuntu and Mac have memory
-        profiling enabled which may slow down R on those architectures even
-        when memory profiling is not being requested by the user. The call to
-        tracemem() is now wrapped with try().
-
-    *   merge of unkeyed tables now works correctly after breaking in 1.7.8 and
-        1.7.9. Thanks to Eric and DM for reporting. Tests added.
-
-    *   nomatch=0 was ignored for the first group when j used join inherited
-        scope. Fixed and tests added.
-
-
-#### USER-VISIBLE CHANGES
-
-    *   Updating an existing column using := after a key<- now works without warning
-        or error. This can be useful in interactive use when you forget to use setkey()
-        but don't mind the inefficiency of key<-. Thanks to Chris Neff for
-        providing a convincing use case. Adding a new column using := after key<-
-        issues a warning, shallow copies and proceeds, as before.
-
-    *   The 'datatable.pre.suffixes' option has been removed. It was available to
-        obtain deprecated merge() suffixes pre v1.5.4.
-
-
-### Changes in v1.7.9
-
-#### NEW FEATURES
-
-   *    New function setnames(), referred to in 1.7.8 warning messages.
-        It makes no copy of the whole data object, unlike names<- and
-        colnames<-. It may be more convenient as well since it allows changing
-        a column name, by name; e.g.,
-
-          setnames(DT,"oldcolname","newcolname")  # by name; no match() needed
-          setnames(DT,3,"newcolname")             # by position
-          setnames(DT,2:3,c("A","B"))             # multiple
-          setnames(DT,c("a","b"),c("A","B"))      # multiple by name
-          setnames(DT,toupper(names(DT)))         # replace all
-
-        setnames() maintains truelength of the over-allocated names vector. This
-        allows := to add columns fully by reference without growing the names
-        vector. As before with names<-, if a key column's name is changed,
-        the "sorted" attribute is updated with the new column name.
-
-#### BUG FIXES
-
-   *    Incompatibility with reshape() of 3 column tables fixed
-        (introduced by 1.7.8) :
-          Error in setkey(ans, NULL) : x is not a data.table
-        Thanks to Damian Betebenner for reporting and
-        reproducible example. Tests added to catch in future.
-
-   *    setattr(DT,...) still returns DT, but now invisibly. It returns
-        DT back again for compound syntax to work; e.g.,
-            setattr(DT,...)[i,j,by]
-        Again, thanks to Damian Betebenner for reporting.
-
-
-### Changes in v1.7.8
-
-#### BUG FIXES
-
-   *    unique(DT) now works when DT is keyed and a key
-        column is called 'x' (an internal scoping conflict
-        introduced in v1.6.1). Thanks to Steven Bagley for
-        reporting.
-
-   *    Errors and seg faults could occur in grouping when
-        j contained character or list columns. Many thanks
-        to Jim Holtman for providing a reproducible example.
-
-   *    Setting a key on a table with over 268 million rows
-        (2^31/8) now works (again), #1714. Bug introduced in
-        v1.7.2. setkey works up to the regular R vector limit
-        of 2^31 rows (2 billion). Thanks to Leon Baum
-        for reporting.
-
-   *    Checks in := are now made up front (before starting to
-        modify the data.table) so that the data.table isn't
-        left in an invalid state should an error occur, #1711.
-        Thanks to Chris Neff for reporting.
-
-   *    The 'Chris crash' is fixed. The root cause was that key<-
-        always copies the whole table. The problem with that copy
-        (other than being slower) is that R doesn't maintain the
-        over allocated truelength, but it looks as though it has.
-        key<- was used internally, in particular in merge(). So,
-        adding a column using := after merge() was a memory overwrite,
-        since the over allocated memory wasn't really there after
-        key<-'s copy.
-
-        data.tables now have a new attribute '.internal.selfref' to
-        catch and warn about such copies in future. All internal
-        use of key<- has been replaced with setkey(), or new function
-        setkeyv() which accepts a vector, and do not copy.
-
-        Many thanks to Chris Neff for extended dialogue, providing a
-        reproducible example and his patience. This problem was not just
-        in pre 2.14.0, but post 2.14.0 as well. Thanks also to Christoph
-        Jäckel, Timothée Carayol and DM for investigations and suggestions,
-        which in combination led to the solution.
-
-   *    An example in ?":=" fixed, and j and by descriptions
-        improved in ?data.table. Thanks to Joseph Voelkel for
-        reporting.
-
-#### NEW FEATURES
-
-   *    Multiple new columns can be added by reference using
-        := and with=FALSE; e.g.,
-            DT[,c("foo","bar"):=1L,with=FALSE]
-            DT[,c("foo","bar"):=list(1L,2L),with=FALSE]
-
-   *    := now recycles vectors of non divisible length, with
-        a warning (previously an error).
-
-   *    When setkey coerces a numeric or character column, it
-        no longer makes a copy of the whole table, FR#1744. Thanks
-        to an investigation by DM.
-
-   *    New function setkeyv(DT,v) (v stands for vector) replaces
-        key(DT)<-v syntax. Also added setattr(). See ?copy.
-
-   *    merge() now uses (manual) secondary keys, for speed.
-
-
-#### USER VISIBLE CHANGES
-
-   *    The loc argument of setkey has been removed. This wasn't very
-        useful and didn't warrant a period of deprecation.
-
-   *    datatable.alloccol has been removed. That warning is now
-        controlled by datatable.verbose=TRUE. One option is easier.
-
-   *    If i is a keyed data.table, it is no longer an error if its
-        key is longer than x's key; the first length(key(x)) columns
-        of i's key are used to join.
-
-
-### Changes in v1.7.7
-
-#### BUG FIXES
-
-   *    Previous bug fix for random crash in R <= 2.13.2
-        related to truelength and over-allocation didn't
-        work, 3rd attempt. Thanks to Chris Neff for his
-        patience and testing. This has shown up consistently
-        as error status on CRAN old-rel checks (windows and
-        mac). So if they pass, this issue is fixed.
-
-
-### Changes in v1.7.6
-
-#### NEW FEATURES
-
-   *    An empty list column can now be added with :=, and
-        data.table() accepts empty list().
-            DT[,newcol:=list()]
-            data.table(a=1:3,b=list())
-        Empty list columns contain NULL for all rows.
-
-#### BUG FIXES
-
-   *    Adding a column to a data.table loaded from disk could
-        result in a memory corruption in R <= 2.13.2, revealed
-        and thanks to CRAN checks on windows old-rel.
-
-   *    Adding a factor column with a RHS to be recycled no longer
-        loses its factor attribute, #1691. Thanks to Damian
-        Betebenner for reporting.
-
-
-### Changes in v1.7.5
-
-#### BUG FIXES
-
-   *    merge()-ing a data.table where its key is not the first
-        few columns in order now works correctly and without
-        warning, fixing #1645. Thanks to Timothee Carayol for
-        reporting.
-
-   *    Mixing nomatch=0 and mult="last" (or "first") now works,
-        #1661. Thanks to Johann Hibschman for reporting.
-
-   *    Join Inherited Scope now respects nomatch=0, #1663. Thanks
-        to Johann Hibschman for reporting.
-
-   *    by= could generate a keyed result table with invalid key;
-        e.g., when by= expressions return NA, #1631. Thanks to
-        Muhammad Waliji for reporting.
-
-   *    Adding a column to a data.table loaded from disk resulted
-        in an error that truelength(DT)<length(DT).
-
-   *    CJ() bogus values and logical error fixed, #1689. Thanks to
-        Damian Betebenner and Chris Neff for reporting.
-
-   *    j=list(.SD,newcol=...) now gives friendly error suggesting cbind
-        or merge afterwards until := by group is implemented, rather than
-        treating .SD as a list column, #1647. Thanks to a question by
-        Christoph_J on Stack Overflow.
-
-
-#### USER VISIBLE CHANGES
-
-   *    rbind now cross-refs colnames as data.frame does, rather
-        than always binding by column order, FR#1634. A warning is
-        produced when the colnames are not in a consistent order.
-        Thanks to Damian Betebenner for highlighting. rbind an
-        unnamed list to bind columns by position.
-
-   *    The 'bysameorder' argument has been removed, as intended and
-        warned in ?data.table.
-
-   *    New option datatable.allocwarn. See ?truelength.
-
-#### NOTES
-
-   *    There are now 472 raw tests, plus S4 tests.
-
-
-### Changes in v1.7.4
-
-#### BUG FIXES
-
-   *    v1.7.3 failed CRAN checks (and could crash) in R pre-2.14.0.
-        Over-allocation in v1.7.3 uses truelength which is initialized
-        to 0 by R 2.14.0, but not initialized pre-2.14.0. This was
-        known and coded for but only tested in 2.14.0 before previous
-        release to CRAN.
-
-#### NOTES
-
-   *    Two unused C variables removed to pass warning from one CRAN
-        check machine (r-devel-fedora). -Wno-unused removed from
-        Makevars to catch this in future before submitting to CRAN.
-
-
-### Changes in v1.7.3
-
-#### NEW FEATURES
-
-    *   data.table now over-allocates its vector of column pointer slots
-        (100 by default). This allows := to add columns fully by
-        reference as suggested by Muhammad Waliji, #1646. When the 100
-        slots are used up, more space is automatically allocated.
-
-        Over allocation has negligible overhead. It's just the vector
-        of column pointers, not the columns themselves.
-
-    *   New function alloc.col() pre-allocates column slots. Use
-        this before a loop to add many more than 100 columns, for example,
-        to avoid the warning as data.table grows its column pointer vector
-        every additional 100 columns; e.g.,
-            alloc.col(DT,10000)  # reserve 10,000 column slots
-
-    *   New function truelength() returns the number of column pointer
-        slots allocated, always >= length() other than just after a table
-        has been loaded from disk.
-
-    *   New option 'datatable.nomatch' allows the default for nomatch
-        to be changed from NA to 0, as wished for by Branson Owen.
-
-    *   cbind(DT,...) now retains DT's key, as wished for by Chris Neff
-        and partly implementing FR#295.
-
-#### BUG FIXES
-
-    *   Assignment to factor columns (using :=, [<- or $<-) could cause
-        'variable not found' errors and a seg fault in some circumstances
-        due to a new feature in v1.7.0: "Factor columns on LHS of :=, [<-
-        and $<- can now be assigned new levels", fixing #1664. Thanks to
-        Daniele Signori for reporting.
-
-    *   DT[i,j]<-value no longer crashes when j is a factor column and value
-        is numeric, fixing #1656.
-
-    *   An unnecessarily strict machine tolerance test failed CRAN checks
-        on Mac preventing v1.7.2 availability for Mac (only).
-
-#### USER VISIBLE CHANGES
-
-    *   := now has its own help page in addition to the examples in ?data.table,
-        see help(":=").
-
-    *   The error message from X[Y] when X is unkeyed has been lengthened to
-        include advice to call setkey first and see ?setkey. Thanks to a
-        comment by ilprincipe on Stack Overflow.
-
-    *   Deleting a missing column is now a warning rather than error. Thanks
-        to Chris Neff for suggesting, #1642.
-
-
-### Changes in v1.7.2
-
-#### NEW FEATURES
-
-    *   unique and duplicated methods now work on unkeyed tables (comparing
-        all columns in that case) and both now respect machine tolerance for
-        double precision columns, implementing FR#1626 and fixing bug #1632.
-        Their help page has been updated accordingly with detailed examples.
-        Thanks to questions by Iterator and comments by Allan Engelhardt on
-        Stack Overflow.
-
-    *   A new method as.data.table.list has been added, since passing a (pure)
-        list to data.table() now creates a single list column.
-
-
-#### BUG FIXES
-
-    *   Assigning to a column variable using <- or = in j now
-        works (creating a local copy within j), rather than
-        persisting from group to group and sometimes causing a crash.
-        Non column variables still persist from group to group; e.g.,
-        a group counter. This fixes the remainder of #1624 thanks to
-        Steve Lianoglou for reporting.
-
-    *   A crash bug is fixed when j returns a (strictly) NULL column next
-        to a non-empty column, #1633. This case was anticipated and coded
-        for but an errant LENGTH() should have been length(). Thanks
-        to Dennis Murphy for reporting.
-
-    *   The first column of data.table() can now be a list column, fixing
-        #1640. Thanks to Stavros Macrakis for reporting.
-
-
-### Changes in v1.7.1
-
-#### BUG FIXES
-
-    *   .SD is now locked, partially fixing #1624. It was never
-        the intention to allow assignment to .SD. Take a 'copy(.SD)'
-        first if needed. Now documented in ?data.table and new FAQ 4.5
-        including example. Thanks to Steve Lianoglou for reporting.
-
-    *   := now works with a logical i subset; e.g.,
-            DT[x==1,y:=x]
-        Thanks to Muhammad Waliji for reporting.
-
-#### USER VISIBLE CHANGES
-
-    *   Error message "column <name> of i is not internally type integer"
-        is now more helpful adding "i doesn't need to be keyed, just
-        convert the (likely) character column to factor". Thanks to
-        Christoph_J for his SO question.
-
-
-### Changes in v1.7.0
-
-#### NEW FEATURES
-
-    *   data.table() now accepts list columns directly rather than
-        needing to add list columns to an existing data.table; e.g.,
-
-            DT = data.table(x=1:3,y=list(4:6,3.14,matrix(1:12,3)))
-
-        Thanks to Branson Owen for reminding. As before, list columns
-        can be created via grouping; e.g.,
-
-            DT = data.table(x=c(1,1,2,2,2,3,3),y=1:7)
-            DT2 = DT[,list(list(unique(y))),by=x]
-            DT2
-                 x      V1
-            [1,] 1    1, 2
-            [2,] 2 3, 4, 5
-            [3,] 3    6, 7
-
-        and list columns can be grouped; e.g.,
-
-            DT2[,sum(unlist(V1)),by=list(x%%2)]
-                 x V1
-            [1,] 1 16
-            [2,] 0 12
-
-        Accordingly, one item has been added to FAQ 2.17 (differences
-        between data.frame and data.table): data.frame(list(1:2,"k",1:4))
-        creates 3 columns, data.table creates one list column.
-
-    *   subset, transform and within now retain keys when the expression
-        does not 'touch' key columns, implementing FR #1341.
-
-    *   Recycling list() items on RHS of := now works; e.g.,
-
-            DT[,1:4:=list(1L,NULL),with=FALSE]
-            # set columns 1 and 3 to 1L and remove columns 2 and 4
-
-    *   Factor columns on LHS of :=, [<- and $<- can now be assigned
-        new levels; e.g.,
-
-            DT = data.table(A=c("a","b"))
-            DT[2,"A"] <- "c"  # adds new level automatically
-            DT[2,A:="c"]      # same (faster)
-            DT$A = "newlevel" # adds new level and recycles it
-
-        Thanks to Damian Betebenner and Chris Neff for highlighting.
-        To change the type of a column, provide a full length RHS (i.e.
-        'replace' the column).
-
-#### BUG FIXES
-
-    *   := with i all FALSE no longer sets the whole column, fixing
-        bug #1570. Thanks to Chris Neff for reporting.
-
-    *   0 length by (such as NULL and character(0)) now behave as
-        if by is missing, fixing bug #1599. This is useful when by
-        is dynamic and a 'don't group' needs to be represented.
-        Thanks to Chris Neff for reporting.
-
-    *   NULL j no longer results in 'inconsistent types' error, but
-        instead returns no rows for that group, fixing bug #1576.
-
-    *   matrix i is now an error rather than using i as if it were a
-        vector and obtaining incorrect results. It was undocumented that
-        matrix might have been an acceptable type. matrix i is
-        still acceptable in [<-; e.g.,
-            DT[is.na(DT)] <- 1L
-        and this now works rather than assigning to non-NA items in some
-        cases.
-
-    *   Inconsistent [<- behaviour is now fixed (#1593) so these examples
-        now work :
-            DT[x == "a", ]$y <- 0L
-            DT["a", ]$y <- 0L
-        But, := is highly encouraged instead for speed; i.e.,
-            DT[x == "a", y:=0L]
-            DT["a", y:=0L]
-        Thanks to Leon Baum for reporting.
-
-    *   unique on an unsorted table now works, fixing bug #1601.
-        Thanks to a question by Iterator on Stack Overflow.
-
-    *   Bug fix #1534 in v1.6.5 (see NEWS below) only worked if data.table
-        was higher than IRanges on the search() path, despite the item in
-        NEWS stating otherwise. Fixed.
-
-    *   Compatibility with package sqldf (which can call do.call("rbind",...)
-        on an empty "...") is fixed and test added. data.table was switching
-        on list(...)[[1]] rather than ..1. Thanks to RYogi for reporting #1623.
-
-#### USER VISIBLE CHANGES
-
-    *   cbind and rbind are no longer masked. But, please do read FAQ 2.23,
-        4.4 and 5.1.
-
-
-### Changes in v1.6.6
-
-#### BUG FIXES
-
-    *   Tests using .Call("Rf_setAttrib",...) passed CRAN acceptance
-        checks but failed on many (but not all) platforms. Fixed.
-        Thanks to Prof Brian Ripley for investigating the issue.
-
-### Changes in v1.6.5
-
-#### NEW FEATURES
-
-    *   The LHS of := may now be column names or positions
-        when with=FALSE; e.g.,
-
-            DT[,c("d","e"):=NULL,with=FALSE]
-            DT[,4:5:=NULL,with=FALSE]
-            newcolname="myname"
-            DT[,newcolname:=3.14,with=FALSE]
-
-        This implements FR#1499 'Ability to efficiently remove a
-        vector of column names' by Timothee Carayol in addition to
-        creating and assigning to multiple columns. We still plan
-        to allow multiple := without needing with=FALSE, in future.
-
-    *   setkey(DT,...) now returns DT (invisibly) rather than NULL.
-        This is to allow compound statements; e.g.,
-            setkey(DT,x)["a"]
-
-    *   setkey (and key<-) are now more efficient when the data happens
-        to be already sorted by the key columns; e.g., when data is
-        loaded from ordered files.
-
-    *   If DT is already keyed by the columns passed to setkey (or
-        key<-), the key is now rebuilt and checked rather than skipping
-        for efficiency. This is to save needing to know to drop the key
-        first to rebuild an invalid key. Invalid keys can arise by going
-        'under the hood'; e.g., attr(DT,"sorted")="z", or somehow ending
-        up with unordered factor levels. A warning is issued so the root
-        cause can be fixed. Thanks to Timothee Carayol for highlighting.
-
-    *   A new copy() function has been added, FR#1501. This copies a
-        data.table (retaining its key, if any) and should now be used to
-        copy rather than data.table(). Reminder: data.tables are not
-        copied on write by setkey, key<- or :=.
-
-
-#### BUG FIXES
-
-    *   DT[,z:=a/b] and DT[a>3,z:=a/b] work again, where a and
-        b are columns of DT. Thanks to Chris Neff for reporting,
-        and his patience.
-
-    *   Numeric columns with class attributes are now correctly
-        coerced to integer by setkey and ad hoc by. The error
-        similar to 'fractional data cannot be truncated' should now
-        only occur when that really is true. A side effect of
-        this is that ad hoc by and setkey now work on IDate columns
-        which have somehow become numeric; e.g., via rbind(DF,DF)
-        as reported by Chris Neff.
-
-    *   .N is now 0 (rather than 1) when no rows in x match the
-        row in i, fixing bug #1532. Thanks to Yang Zhang for
-        reporting.
-
-    *   Compatibility with package IRanges has been restored. Both
-        data.table and IRanges mask cbind and rbind. When data.table's
-        cbind is found first (if it is loaded after IRanges) and the
-        first argument is not data.table, it now delegates to the next
-        package on the search path (and above that), one or more of which
-        may also mask cbind (such as IRanges), rather than skipping
-        straight to base::cbind. So, it no longer matters which way around
-        data.table and IRanges are loaded, fixing #1534. Thanks to Steve
-        Lianoglou for reporting.
-
-
-#### USER VISIBLE CHANGES
-
-    *   setkey's verbose messages expanded.
-
-
-### Changes in v1.6.4
-
-#### NEW FEATURES
-
-    *   DT[colA>3,which=TRUE] now returns row numbers rather
-        than a logical vector, for consistency.
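A minimal sketch of the new behaviour (toy table is hypothetical; assumes data.table v1.6.4 or later):

```r
library(data.table)

DT = data.table(colA = c(5, 2, 7))

# which=TRUE returns the matching row numbers, not a logical vector
w = DT[colA > 3, which = TRUE]
w   # 1 3
```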
-
-#### BUG FIXES
-
-    *   Changing a keyed column name now updates the key, too,
-        so an invalid key no longer arises, fixing #1495.
-        Thanks to Chris Neff for reporting.
-
-    *   := already warned when a numeric RHS is coerced to
-        match an integer column's type. Now it also warns when
-        numeric is coerced to logical, and integer is coerced
-        to logical, fixing #1500. Thanks to Chris Neff for
-        reporting.
-
-    *   The result of DT[,newcol:=3.14] now includes the new
-        column correctly, as well as changing DT by reference,
-        fixing #1496. Thanks to Chris Neff for reporting.
-
-    *   :=NULL to remove a column (instantly, regardless of table
-        size) now works rather than causing a segfault in some
-        circumstances, fixing #1497. Thanks to Timothee Carayol
-        for reporting.
-
-    *   Previous within() and transform() behaviour restored; e.g.,
-        can handle multiple columns again. Thanks to Timothee Carayol
-        for reporting.
-
-    *   cbind(DT,DF) now works, as does rbind(DT,DF), fixing #1512.
-        Thanks to Chris Neff for reporting. This was tricky to fix due
-        to nuances of the .Internal dispatch code in cbind and rbind,
-        preventing S3 methods from working in all cases.
-        R will now warn that cbind and rbind have been masked when
-        the data.table package is loaded. These revert to base::cbind
-        and base::rbind when the first argument is not data.table.
-
-    *   Removing multiple columns now works (again) using
-        DT[,c("a","b")]=NULL, or within(DT,rm(a,b)), fixing #1510.
-        Thanks to Timothee Carayol for reporting.
-
-
-#### NOTES
-
-    *   The package uses two features (packageVersion() and \href in Rd)
-        added to R 2.12.0 and is therefore dependent on that release.
-        A 'spurious warning' when checking a package using \href was
-        fixed in R 2.12.2 patched but we believe that warning can safely
-        be ignored in versions >= 2.12.0 and < 2.12.2 patched.
-
-
-### Changes in v1.6.3
-
-#### NEW FEATURES
-
-    *   Ad hoc grouping now returns results in the same order each
-        group first appears in the table, rather than sorting the
-        groups. Thanks to Steve Lianoglou for highlighting. The order
-        of the rows within each group always has and always will be
-        preserved. For larger datasets a 'keyed by' is still faster;
-        e.g., by=key(DT).
-
-    *   The 'key' argument of data.table() now accepts a vector of
-        column names in addition to a single comma separated string
-        of column names, for consistency. Thanks to Steve Lianoglou
-        for highlighting.
-
-    *   A new argument '.SDcols' has been added to [.data.table. This
-        may be character column names or numeric positions and
-        specifies the columns of x included in .SD. This is useful
-        for speed when applying a function through a subset of
-        (possibly very many) columns; e.g.,
-            DT[,lapply(.SD,sum),by="x,y",.SDcols=301:350]
-
-    *   as(character, "IDate") and as(character, "ITime") coercion
-        functions have been added. This enables the user to declare colClasses
-        as "IDate" and "ITime" in the various read.table (and sister)
-        functions. Thanks to Chris Neff for the suggestion.
-
-    *   DT[i,j]<-value is now handled by data.table in C rather
-        than falling through to data.frame methods, FR#200. Thanks to
-        Ivo Welch for raising speed issues on r-devel, to Simon Urbanek
-        for the suggestion, and Luke Tierney and Simon for information
-        on R internals.
-
-        [<- syntax still incurs one working copy of the whole
-        table (as of R 2.13.1) due to R's [<- dispatch mechanism
-        copying to `*tmp*`, so, for ultimate speed and brevity,
-        the operator := may now be used in j as follows.
-
-    *   := is now available to j and means assign to the column by
-        reference; e.g.,
-
-            DT[i,colname:=value]
-
-        This syntax makes no copies of any part of memory at all.
-
-        m = matrix(1,nrow=100000,ncol=100)
-        DF = as.data.frame(m)
-        DT = as.data.table(m)
-
-        system.time(for (i in 1:1000) DF[i,1] <- i)
-             user  system elapsed
-          287.062 302.627 591.984
-
-        system.time(for (i in 1:1000) DT[i,V1:=i])
-             user  system elapsed
-            1.148   0.000   1.158     ( 511 times faster )
-
-        := in j can be combined with all types of i, such as binary
-        search, and used to add and remove columns efficiently.
-        Fast assigning within groups will be implemented in future.
-
-        Reminder that data.frame and data.table both allow columns
-        of mixed types, including columns which themselves may be
-        type list; matrix may be one (atomic) type only.
-
-        *Please note*, := is new and experimental.
-
-
-#### BUG FIXES
-
-    *   merge()ing two data.table's with user-defined `suffixes`
-        was getting tripped up when column names in x ended in
-        '.1'. This resulted in the `suffixes` parameter being
-        ignored.
-
-    *   Mistakenly wrapping a j expression inside quotes; e.g.,
-            DT[,list("sum(a),sum(b)"),by=grp]
-        was appearing to work, but with wrong column names. This
-        now returns a character column (the quotes should not
-        be used). Thanks to Joseph Voelkel for reporting.
-
-    *   setkey has been made robust in several ways to fix issues
-        introduced in 1.6.2: #1465 ('R crashes after setkey')
-        reported by Eugene Tyurin and similar bug #1387 ('paste()
-        by group to create long comma separated strings can crash')
-        reported by Nicolas Servant and Jean-Francois Rami. This
-        bug was not reproducible so we are especially grateful for
-        the patience of these people in helping us find, fix and
-        test it.
-
-    *   Combining a join, j and by together in one query now works
-        rather than giving an error, fixing bug #1468. Discovered
-        indirectly thanks to a post from Jelmer Ypma.
-
-    *   Invalid keys no longer arise when a non-data.table-aware
-        package reorders the data; e.g.,
-            setkey(DT,x,y)
-            plyr::arrange(DT,y)       # same as DT[order(y)]
-        This now drops the key to avoid incorrect results being
-        returned the next time the invalid key is joined to. Thanks
-        to Chris Neff for reporting.
-
-
-#### USER-VISIBLE CHANGES
-
-    *   The startup banner has been shortened to one line.
-
-    *   data.table does not support POSIXlt. Almost unbelievably
-        POSIXlt uses 40 bytes to store a single datetime. If it worked
-        before, that was unintentional. Please see ?IDateTime, or any
-        other date class that uses a single atomic vector. This is
-        regardless of whether the POSIXlt is a key column, or not. This
-        resolves bug #1481 by documenting non support in ?data.table.
-
-
-#### DEPRECATED & DEFUNCT
-
-   *    Use of the DT() alias in j is no longer caught for backwards
-        compatibility and is now fully removed. As warned in NEWS
-        for v1.5.3, v1.4, and FAQs 2.6 and 2.7.
-
-
-### Changes in v1.6.2
-
-#### NEW FEATURES
-
-   *    setkey no longer copies the whole table and should be
-        faster for large tables. Each column is reordered by reference
-        (in C) using one column of working memory, FR#1006. User
-        defined attributes on the original table are now also
-        retained (thanks to Thell Fowler for reporting).
-
-   *    A new symbol .N is now available to j, containing the
-        number of rows in the group. This may be useful when
-        the column names are not known in advance, for
-        convenience generally, and for efficiency.
-
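A hedged sketch of `.N` in use (illustrative data; run against a current data.table, which may differ in detail from v1.6.2):

```r
library(data.table)

DT = data.table(grp = c("a", "a", "b"), v = 1:3)
DT[, .N, by = grp]   # one row per group: grp and its row count N
```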
-
-### Changes in v1.6.1
-
-#### NEW FEATURES
-
-   *    j's environment is now consistently reused so
-        that local variables may be set which persist
-        from group to group; e.g., incrementing a group
-        counter :
-            DT[,list(z,groupInd<-groupInd+1),by=x]
-        Thanks to Andreas Borg for reporting.
-
-   *    A new symbol .BY is now available to j, containing 1 row
-        of the current 'by' variables, type list. 'by' variables
-        may also be used by name, and are now length 1 too. This
-        implements FR#1313. FAQ 2.10 has been updated accordingly.
-        Some examples :
-            DT[,sum(x)*.BY[[1]],by=eval(byexp)]
-            DT[,sum(x)*mylookuptable[J(y),z],by=y]
-            DT[,list(sum(unlist(.BY)),sum(z)),by=list(x,y%%2)]
-
-   *    i may now be type list, and works the same as when i
-        is type data.table. This saves needing J() in as many
-        situations and may be a little more efficient. One
-        application is using .BY directly in j to join to a
-        relatively small lookup table, once per group, for space
-        and time efficiency. For example :
-            DT[,list(GROUPDATA[.BY]$name,sum(v)),by=grp]
-
-
-#### BUG FIXES
-
-   *    A 'by' character vector of column names now
-        works when there are fewer rows than columns; e.g.,
-            DT[,sum(x),by=key(DT)]  where nrow(DT)==1.
-        Many thanks to Andreas Borg for report, proposed
-        fix and tests.
-
-   *    Zero length columns in j no longer cause a crash in
-        some circumstances. Empty columns are filled with NA
-        to match the length of the longest column in j.
-        Thanks to Johann Hibschman for bug report #1431.
-
-   *    unique.data.table now calls the same internal code
-        (in C) that grouping calls. This fixes a bug when
-        unique is called directly by the user and NAs exist
-        in the key (which might be quite rare). Thanks to
-        Damian Betebenner for the bug report. unique should
-        also now be faster.
-
-   *    Variables in calling scope can now be used in j when
-        i is logical or integer, fixing bug #1421. Thanks
-        to Alexander Peterhansl for reporting.
-
-
-#### USER-VISIBLE CHANGES
-
-    *   ?data.table now documents that logical i is not quite
-        the same as i in [.data.frame. NA are treated as FALSE,
-        and DT[NA] returns 1 row of NA, unlike [.data.frame.
-        Three points have been added to FAQ 2.17. Thanks to
-        Johann Hibschman for highlighting.
-
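The logical-`i` behaviour documented above can be sketched as follows (illustrative table; behaviour as described here and in current data.table):

```r
library(data.table)

DT = data.table(a = 1:3, b = letters[1:3])
DT[NA]                   # a single row of NAs, not the whole table
DT[c(TRUE, NA, FALSE)]   # NA treated as FALSE: only row 1 is returned
```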
-    *   Startup banner now uses packageStartupMessage() so the
-        banner can be suppressed by those annoyed by banners,
-        whilst still being helpful to new users.
-
-
-
-### Changes in v1.6
-
-#### NEW FEATURES
-
-   *    data.table now plays nicely with S4 classes. Slots can be
-        defined to be S4 objects, S4 classes can inherit from data.table,
-        and S4 function dispatch works on data.table objects. See the
-        tests in inst/tests/test-S4.R, and from the R prompt:
-        ?"data.table-class"
-
-   *    merge.data.table now works more like merge.data.frame:
-        (i) suffixes are consistent with merge.data.frame; existing users
-        may set options(datatable.pre.suffixes=TRUE) for backwards
-        compatibility.
-        (ii) support for 'by' argument added (FR #1315).
-        However, X[Y] syntax is preferred; some users never use merge.
-
-
-#### BUG FIXES
-
-   *    by=key(DT) now works when the number of rows is not
-        divisible by the number of groups (#1298, an odd bug).
-        Thanks to Steve Lianoglou for reporting.
-
-   *    Combining i and by where i is logical or integer subset now
-        works, fixing bug #1294. Thanks to Johann Hibschman for
-        contributing a new test.
-
-   *    Variable scope inside [[...]] now works without a workaround
-        required. This can be useful for looking up which function
-        to call based on the data e.g. DT[,fns[[fn]](colA),by=ID].
-        Thanks to Damian Betebenner for reporting.
-
-   *    Column names in self joins such as DT[DT] are no longer
-        duplicated, fixing bug #1340. Thanks to Andreas Borg for
-        reporting.
-
-
-#### USER-VISIBLE CHANGES
-
-   *    Additions and updates to FAQ vignette. Thanks to Dennis
-        Murphy for his thorough proof reading.
-
-   *    Welcome to Steve Lianoglou who joins the project
-        contributing S4-ization, testing using testthat, and more.
-
-   *    IDateTime is now linked from ?data.table. data.table users
-        unaware of IDateTime, please do take a look. Tom added
-        IDateTime in v1.5 (see below).
-
-
-### Changes in v1.5.3
-
-#### NEW FEATURES
-
-   *    .SD no longer includes 'by' columns, FR#978. This resolves
-        the long standing annoyance of duplicated 'by' columns
-        when the j expression returns a subset of rows from .SD.
-        For example, the following query no longer contains
-        a redundant 'colA.1' duplicate.
-            DT[,.SD[2],by=colA] #  2nd row of each group
-        Any existing code that uses .SD may require simple
-        changes to remove workarounds.
-
-   *    'by' may now be a character vector of column names.
-        This allows syntax such as DT[,sum(x),by=key(DT)].
-
-   *    X[Y] now includes Y's non-join columns, as most users
-        naturally expect, FR#746. Please do use j in one step
-        (i.e. X[Y,j]) since that merges just the columns j uses and
-        is much more efficient than X[Y][,j] or merge(X,Y)[,j].
-
-   *    The 'Join Inherited Scope' feature is back on, FR#1095. This
-        is consistent with X[Y] including Y's non-join columns, enabling
-        natural progression from X[Y] to X[Y,j]. j sees columns in X
-        first then Y.
-        If the same column name exists in both X and Y, the data in
-        Y can be accessed via a prefix "i." (not yet implemented).
-
-   *    Ad hoc by now coerces double to integer (provided they are
-        all.equal) and character to factor, FR#1051, as setkey
-        already does.
-
-#### USER-VISIBLE CHANGES
-
-   *    The default for mult is now "all", as planned and
-        prior notice given in FAQ 2.2.
-
-   *    ?[.data.table has been merged into ?data.table and updated,
-        simplified, corrected and formatted.
-
-#### DEPRECATED & DEFUNCT
-
-   *    The DT() alias is now fully deprecated, as warned
-        in NEWS for v1.4, and FAQs 2.6 and 2.7.
-
-
-### Changes in v1.5.2
-
-#### NEW FEATURES
-
-   *    'by' now works when DT contains list() columns i.e.
-        where each value in a column may itself be vector
-        or where each value is a different type. FR#1092.
-
-   *    The result from merge() is now keyed. FR#1244.
-
-
-#### BUG FIXES
-
-    *   eval of parse()-ed expressions now works without
-        needing quote() in the expression, bug #1243. Thanks
-        to Joseph Voelkel for reporting.
-
-    *   grouping now works when the result from the first group
-        alone is bigger than the table itself, bug #1245. Thanks
-        to Steve Lianoglou for reporting.
-
-    *   merge on a data.table with only a single keyed column
-        and all=TRUE now works, bug #1241. Thanks to
-        Joseph Voelkel for reporting.
-
-    *   merge()-ing by a column called "x" now works, bug
-        #1229 related to variable scope. Thanks to Steve
-        Lianoglou for reporting.
-
-
-### Changes in v1.5.1
-
-#### BUG FIXES
-
-    *   Fixed inheritance for other packages importing or depending
-        on data.table, bugs #1093 and #1132. Thanks to Koert Kuipers
-        for reporting.
-
-    *   data.table queries can now be used at the debugger() prompt,
-        fixing bug #1131 related to inheritance from data.frame.
-
-
-### Changes in v1.5
-
-#### NEW FEATURES
-
-    *   data.table now *inherits* from data.frame, for functions and
-        packages which _only_ accept data.frame, saving time and
-        memory of conversion. A data.table is a data.frame too;
-        is.data.frame() now returns TRUE.
-
-    *   Integer-based date and time-of-day classes have been
-        introduced. This allows dates and times to be used as keys
-        more easily. See as.IDate, as.ITime, and IDateTime.
-        Conversions to and from POSIXct, Date, and chron are
-        supported.
-
-    *   [<-.data.table and $<-.data.table were revised to check for
-        changes to the key-ed columns. [<-.data.table also now allows
-        data.table-style indexing for i. Both of these changes may
-        introduce incompatibilities for existing code.
-
-    *   Logical columns are now allowed in keys and in 'by', as are expressions
-        that evaluate to logical. Thanks to David Winsemius for highlighting.
-
-
-#### BUG FIXES
-
-    *   DT[,5] now returns 5 as FAQ 1.1 says, for consistency
-        with DT[,c(5)] and DT[,5+0]. DT[,"region"] now returns
-        "region" as FAQ 1.2 says. Thanks to Harish V for reporting.
-
-    *   When a quote()-ed expression q is passed to 'by' using
-        by=eval(q), the group column names now come from the list
-        in the expression rather than the name 'q' (bug #974) and,
-        multiple items work (bug #975). Thanks to Harish V for
-        reporting.
-
-    *   quote()-ed i and j expressions receive similar fixes, bugs
-        #977 and #1058. Thanks to Harish V and Branson Owen for
-        reporting.
-
-    *   Multiple errors (grammar, format and spelling) in intro.Rnw
-        and faqs.Rnw corrected by Dennis Murphy. Thank you.
-
-    *   Memory is now reallocated in rare cases when the up front
-        allocate for the result of grouping is insufficient. Bug
-        #952 raised by Georg V, and also reported by Harish. Thank
-        you.
-
-    *   A function call foo(arg=sum(b)) now finds b in DT when foo
-        contains DT[,eval(substitute(arg)),by=a], fixing bug #1026.
-        Thanks to Harish V for reporting.
-
-    *   If DT contains column 'a' then DT[J(unique(a))] now finds
-        'a', fixing bug #1005. Thanks to Branson Owen for reporting.
-
-    *   'by' on no data (for example when 'i' returns no rows) now
-        works, fixing bug #709.
-
-    *   'by without by' now heeds nomatch=NA, fixing bug #1015.
-        Thanks to Harish V for reporting.
-
-    *   DT[NA] now returns 1 row of NA rather than the whole table
-        via standard NA logical recycling. A single NA logical is
-        a special case and is now replaced by NA_integer_. Thanks
-        to Branson Owen for highlighting the issue.
-
-    *   NROW removed from data.table, since the is.data.frame() in
-        base::NROW now returns TRUE due to inheritance. Fixes bug
-        #1039 reported by Bradley Buchsbaum. Thank you.
-
-    *   setkey() now coerces character to factor and double to
-        integer (provided they are all.equal), fixing bug #953.
-        Thanks to Steve Lianoglou for reporting.
-
-    *   'by' now accepts lists from the calling scope without the
-        work around of wrapping with as.list() or {}, fixing bug
-        #1060. Thanks to Johann Hibschman for reporting.
-
-
-#### NOTES
-
-    *   The package uses the 'default' option of base::getOption,
-        and is therefore dependent on R 2.10.0. Updated DESCRIPTION
-        file accordingly. Thanks to Christian Hudon for reporting.
-
-
-### Changes in v1.4.1
-
-
-#### NEW FEATURES
-
-    *   Vignettes tidied up.
-
-
-#### BUG FIXES
-
-    *   Out of order levels in key columns are now sorted by
-        setkey. Thanks to Steve Lianoglou for reporting.
-
-
-
-
-### Changes in v1.4
-
-
-#### NEW FEATURES
-
-    *   'by' faster. Memory is allocated first for the result, then
-    populated directly by the result of j for each group. Can be 10
-    or more times faster than tapply() and aggregate(), see
-    timings vignette.
-
-    *   j should now be a list(), not DT(), of expressions. Use of
-    j=DT(...) is caught internally and replaced with j=list(...).
-
-    *   'by' may be a list() of expressions. A single column name
-    is automatically list()-ed for convenience. 'by' may still be
-    a comma separated character string, as before.
-        DT[,sum(x),by=region]                     # new
-        DT[,sum(x),by=list(region,month(date))]   # new
-        DT[,sum(x),by="region"]                   # old, ok too
-        DT[,sum(x),by="region,month(date)"]       # old, ok too
-
-    *   key() and key<- added. More R-style alternatives to getkey()
-    and setkey().
-
-    *   haskey() added. Returns TRUE if a table has a key.
-
-    *   radix sorting is now column by column where possible, was
-    previously all or nothing. Helps with keys of many columns.
-
-    *   Added format method.
-
-    *   22 tests added to test.data.table(), now 149.
-
-    *   Three vignettes added: FAQ, Intro & Timings
-
-
-#### DEPRECATED & DEFUNCT
-
-    *   The DT alias is removed. Use 'data.table' instead to create
-    objects. See 2nd new feature above.
-
-    *   RUnit framework removed.
-    test.data.table() is called from examples in .Rd so 'R CMD check'
-    will run it. Simpler. An eval(body(test.data.table))
-    is also in the .Rd, to catch namespace issues.
-
-    *   Dependency on package 'ref' removed.
-
-    *   Arguments removed:  simplify, incbycols and byretn.
-    Grouping is simpler now, these are superfluous.
-
-
-#### BUG FIXES
-
-    *   Column classes are now retained by subset and grouping.
-
-    *   tail no longer fails when a column 'x' exists.
-
-
-#### KNOWN PROBLEMS
-
-    *   Minor : Join Inherited Scope not working, contrary
-        to the documentation.
-
-
-#### NOTES
-
-    *   v1.4 was essentially the branch at rev 44, reintegrated
-    at rev 78.
-
-
-
-
-### Changes in v1.3
-
-
-#### NEW FEATURES
-
-    *   Radix sorting added. Speeds up setkey and ad hoc 'by'
-    by a factor of 10 or more.
-
-    *   Merge method added, much faster than base::merge method
-    of data.frame.
-
-    *   'by' faster. Logic moved from R into C. Memory is
-    allocated for the largest group only, then re-used.
-
-    *   The Sub Data is accessible as a whole by j using object
-    .SD. This should only be used in rare circumstances. See FAQ.
-
-    *   Methods added : duplicated, unique, transform, within,
-    [<-, t, Math, Ops, is.na, na.omit, summary
-
-    *   Column name rules improved; e.g., dots now allowed.
-
-    *   as.data.frame.data.table rownames improved.
-
-    *   29 tests added to test.data.table(), now 127.
-
-
-#### USER-VISIBLE CHANGES
-
-    *   Default of mb changed, now tables(mb=TRUE)
-
-
-#### DEPRECATED & DEFUNCT
-
-    *   ... removed in [.data.table.
-    j may not be a function, so this is now superfluous.
-
-
-#### BUG FIXES
-
-    *   Incorrect version warning with R 2.10+ fixed.
-
-    *   j enclosure raised one level. This fixes some bugs
-    where the j expression previously saw internal variable
-    names. It also speeds up grouping a little.
-
-
-#### NOTES
-
-    *   v1.3 was not released to CRAN. R-Forge repository only.
-
-
-
-### v1.2 released to CRAN in Aug 2008
+### Project overview is on the GitHub Wiki tab, our [HOMEPAGE](https://github.com/Rdatatable/data.table/wiki)
 
 
diff --git a/build/vignette.rds b/build/vignette.rds
index 482882e..0465335 100644
Binary files a/build/vignette.rds and b/build/vignette.rds differ
diff --git a/inst/doc/datatable-faq.R b/inst/doc/datatable-faq.R
index a75f9fd..424ec8e 100644
--- a/inst/doc/datatable-faq.R
+++ b/inst/doc/datatable-faq.R
@@ -1,72 +1,35 @@
-### R code from vignette source 'datatable-faq.Rnw'
-
-###################################################
-### code chunk number 1: datatable-faq.Rnw:15-18
-###################################################
-if (!exists("data.table",.GlobalEnv)) library(data.table)  # see Intro.Rnw for comments on these two lines
-rm(list=as.character(tables()$NAME),envir=.GlobalEnv)
-options(width=70)  # so lines wrap round
-
-
-###################################################
-### code chunk number 2: datatable-faq.Rnw:95-102
-###################################################
-DT = as.data.table(iris)
-setkey(DT,Species)
-myfunction = function(dt, expr) {
-    e = substitute(expr)
-    dt[,eval(e),by=Species]
-}
-myfunction(DT,sum(Sepal.Width))
-
-
-###################################################
-### code chunk number 3: datatable-faq.Rnw:109-115
-###################################################
-DT = as.data.table(iris)
-whatToRun = quote( .(AvgWidth = mean(Sepal.Width),
-                     MaxLength = max(Sepal.Length)) )
-DT[, eval(whatToRun), by=Species]
-DT[, eval(whatToRun), by=.(FirstLetter=substring(Species,1,1))]
-DT[, eval(whatToRun), by=.(Petal.Width=round(Petal.Width,0))]
-
-
-###################################################
-### code chunk number 4: datatable-faq.Rnw:158-165
-###################################################
-X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
-setkey(X,grp)
-Y = data.table(c("b","c"), bar=c(4,2))
+## ---- echo = FALSE, message = FALSE--------------------------------------
+library(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+
+## ------------------------------------------------------------------------
+X = data.table(grp = c("a", "a", "b",
+                       "b", "b", "c", "c"), foo = 1:7)
+setkey(X, grp)
+Y = data.table(c("b", "c"), bar = c(4, 2))
 X
 Y
-X[Y,sum(foo*bar)]
-X[Y,sum(foo*bar),by=.EACHI]
-
+X[Y, sum(foo*bar)]
+X[Y, sum(foo*bar), by = .EACHI]
 
-###################################################
-### code chunk number 5: datatable-faq.Rnw:193-196
-###################################################
-DF = data.frame(x=1:3,y=4:6,z=7:9)
+## ------------------------------------------------------------------------
+DF = data.frame(x = 1:3, y = 4:6, z = 7:9)
 DF
-DF[,c("y","z")]
-
+DF[ , c("y", "z")]
 
-###################################################
-### code chunk number 6: datatable-faq.Rnw:199-201
-###################################################
+## ------------------------------------------------------------------------
 DT = data.table(DF)
-DT[,c(y,z)]
+DT[ , c(y, z)]
 
+## ------------------------------------------------------------------------
+DT[ , .(y, z)]
 
-###################################################
-### code chunk number 7: datatable-faq.Rnw:204-205
-###################################################
-DT[,.(y,z)]
-
-
-###################################################
-### code chunk number 8: datatable-faq.Rnw:214-220
-###################################################
+## ------------------------------------------------------------------------
 data.table(NULL)
 data.frame(NULL)
 as.data.table(NULL)
@@ -74,151 +37,100 @@ as.data.frame(NULL)
 is.null(data.table(NULL))
 is.null(data.frame(NULL))
 
-
-###################################################
-### code chunk number 9: datatable-faq.Rnw:224-227
-###################################################
-DT = data.table(a=1:3,b=c(4,5,6),d=c(7L,8L,9L))
+## ------------------------------------------------------------------------
+DT = data.table(a = 1:3, b = c(4, 5, 6), d = c(7L,8L,9L))
 DT[0]
-sapply(DT[0],class)
+sapply(DT[0], class)
 
-
-###################################################
-### code chunk number 10: datatable-faq.Rnw:249-252
-###################################################
-DT = data.table(x=rep(c("a","b"),c(2,3)),y=1:5)
+## ------------------------------------------------------------------------
+DT = data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)
 DT
-DT[,{z=sum(y);z+3},by=x]
-
-
-###################################################
-### code chunk number 11: datatable-faq.Rnw:258-263
-###################################################
-DT[,{
-  cat("Objects:",paste(objects(),collapse=","),"\n")
-  cat("Trace: x=",as.character(x)," y=",y,"\n")
-  sum(y)
-},by=x]
+DT[ , {z = sum(y); z + 3}, by = x]
 
+## ------------------------------------------------------------------------
+DT[ , {
+  cat("Objects:", paste(objects(), collapse = ","), "\n")
+  cat("Trace: x=", as.character(x), " y=", y, "\n")
+  sum(y)},
+  by = x]
 
-###################################################
-### code chunk number 12: datatable-faq.Rnw:269-271
-###################################################
-DT[,.(g=1,h=2,i=3,j=4,repeatgroupname=x,sum(y)),by=x]
-DT[,.(g=1,h=2,i=3,j=4,repeatgroupname=x[1],sum(y)),by=x]
+## ------------------------------------------------------------------------
+DT[ , .(g = 1, h = 2, i = 3, j = 4, repeatgroupname = x, sum(y)), by = x]
+DT[ , .(g = 1, h = 2, i = 3, j = 4, repeatgroupname = x[1], sum(y)), by = x]
 
-
-###################################################
-### code chunk number 13: datatable-faq.Rnw:290-292
-###################################################
-A = matrix(1:12,nrow=4)
+## ------------------------------------------------------------------------
+A = matrix(1:12, nrow = 4)
 A
 
+## ------------------------------------------------------------------------
+A[c(1, 3), c(2, 3)]
 
-###################################################
-### code chunk number 14: datatable-faq.Rnw:295-296
-###################################################
-A[c(1,3),c(2,3)]
-
-
-###################################################
-### code chunk number 15: datatable-faq.Rnw:303-306
-###################################################
-B = cbind(c(1,3),c(2,3))
+## ------------------------------------------------------------------------
+B = cbind(c(1, 3), c(2, 3))
 B
 A[B]
 
-
-###################################################
-### code chunk number 16: datatable-faq.Rnw:309-314
-###################################################
+## ------------------------------------------------------------------------
 rownames(A) = letters[1:4]
 colnames(A) = LETTERS[1:3]
 A
-B = cbind(c("a","c"),c("B","C"))
+B = cbind(c("a", "c"), c("B", "C"))
 A[B]
 
-
-###################################################
-### code chunk number 17: datatable-faq.Rnw:317-322
-###################################################
-A = data.frame(A=1:4,B=letters[11:14],C=pi*1:4)
+## ------------------------------------------------------------------------
+A = data.frame(A = 1:4, B = letters[11:14], C = pi*1:4)
 rownames(A) = letters[1:4]
 A
 B
 A[B]
 
+## ------------------------------------------------------------------------
+B = data.frame(c("a", "c"), c("B", "C"))
+cat(try(A[B], silent = TRUE))
 
-###################################################
-### code chunk number 18: datatable-faq.Rnw:325-327
-###################################################
-B = data.frame(c("a","c"),c("B","C"))
-cat(try(A[B],silent=TRUE))
-
+## ---- eval = FALSE-------------------------------------------------------
+#  DT[where, select|update, group by][order by][...] ... [...]
 
-###################################################
-### code chunk number 19: datatable-faq.Rnw:406-407
-###################################################
+## ------------------------------------------------------------------------
 base::cbind.data.frame
 
-
-###################################################
-### code chunk number 20: datatable-faq.Rnw:414-417
-###################################################
-foo = data.frame(a=1:3)
-cbind.data.frame = function(...)cat("Not printed\n")
+## ------------------------------------------------------------------------
+foo = data.frame(a = 1:3)
+cbind.data.frame = function(...) cat("Not printed\n")
 cbind(foo)
-
-
-###################################################
-### code chunk number 21: datatable-faq.Rnw:419-420
-###################################################
 rm("cbind.data.frame")
 
-
-###################################################
-### code chunk number 22: datatable-faq.Rnw:474-480
-###################################################
-DT = data.table(a=rep(1:3,1:3),b=1:6,c=7:12)
+## ------------------------------------------------------------------------
+DT = data.table(a = rep(1:3, 1:3), b = 1:6, c = 7:12)
 DT
-DT[,{ mySD = copy(.SD)
-      mySD[1,b:=99L]
-      mySD },
-    by=a]
-
+DT[ , { mySD = copy(.SD)
+      mySD[1, b := 99L]
+      mySD},
+    by = a]
 
-###################################################
-### code chunk number 23: datatable-faq.Rnw:486-492
-###################################################
-DT = data.table(a=c(1,1,2,2,2),b=c(1,2,2,2,1))
+## ------------------------------------------------------------------------
+DT = data.table(a = c(1,1,2,2,2), b = c(1,2,2,2,1))
 DT
-DT[,list(.N=.N),list(a,b)]   # show intermediate result for exposition
+DT[ , list(.N = .N), list(a, b)]   # show intermediate result for exposition
 cat(try(
-    DT[,list(.N=.N),by=list(a,b)][,unique(.N),by=a]   # compound query more typical
-,silent=TRUE))
-
+    DT[ , list(.N = .N), by = list(a, b)][ , unique(.N), by = a]   # compound query more typical
+, silent = TRUE))
 
-###################################################
-### code chunk number 24: datatable-faq.Rnw:496-502
-###################################################
+## ------------------------------------------------------------------------
 if (packageVersion("data.table") >= "1.8.1") {
-    DT[,.N,by=list(a,b)][,unique(N),by=a]
-}
+    DT[ , .N, by = list(a, b)][ , unique(N), by = a]
+  }
 if (packageVersion("data.table") >= "1.9.3") {
-    DT[,.N,by=.(a,b)][,unique(N),by=a]   # same
+    DT[ , .N, by = .(a, b)][ , unique(N), by = a]   # same
 }
 
-
-###################################################
-### code chunk number 25: datatable-faq.Rnw:520-528
-###################################################
-DT = data.table(a=1:5,b=1:5)
+## ------------------------------------------------------------------------
+DT = data.table(a = 1:5, b = 1:5)
 suppressWarnings(
-DT[2,b:=6]        # works (slower) with warning
+DT[2, b := 6]         # works (slower) with warning
 )
-class(6)          # numeric not integer
-DT[2,b:=7L]       # works (faster) without warning
-class(7L)         # L makes it an integer
-DT[,b:=rnorm(5)]  # 'replace' integer column with a numeric column
-
+class(6)              # numeric not integer
+DT[2, b := 7L]        # works (faster) without warning
+class(7L)             # L makes it an integer
+DT[ , b := rnorm(5)]  # 'replace' integer column with a numeric column
 
diff --git a/inst/doc/datatable-faq.Rmd b/inst/doc/datatable-faq.Rmd
new file mode 100644
index 0000000..f78ca19
--- /dev/null
+++ b/inst/doc/datatable-faq.Rmd
@@ -0,0 +1,616 @@
+---
+title: "Frequently Asked Questions about data.table"
+date: "`r Sys.Date()`"
+output:
+  rmarkdown::html_vignette:
+    toc: true
+    number_sections: true
+vignette: >
+  %\VignetteIndexEntry{Frequently asked questions}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+<style>
+h2 {
+    font-size: 20px;
+}
+</style>
+
+```{r, echo = FALSE, message = FALSE}
+library(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+```
+
+The first section, Beginner FAQs, is intended to be read in order, from start to finish. It's just written in a FAQ style to be digested more easily. It doesn't necessarily reflect the most frequently asked questions; a better measure for that is looking on Stack Overflow.
+
+This FAQ is required reading and considered core documentation. Please do not ask questions on Stack Overflow or raise issues on GitHub until you have read it. We can all tell when you ask that you haven't read it. So if you do ask and haven't read it, don't use your real name.
+
+This document has been quickly revised given the changes in v1.9.8 released Nov 2016. Please do submit pull requests to fix mistakes or make improvements. If anyone knows why the table of contents comes out so narrow and squashed when displayed by CRAN, please let us know. This document used to be a PDF and we changed it recently to HTML.
+
+
+# Beginner FAQs
+
+## Why do `DT[ , 5]` and `DT[2, 5]` return a 1-column data.table rather than vectors like `data.frame`? {#j-num}
+
+For consistency so that when you use data.table in functions that accept varying inputs, you can rely on `DT[...]` returning a data.table. You don't have to remember to include `drop=FALSE` like you do in data.frame. data.table was first released in 2006 and this difference from data.frame has been a feature since the very beginning.
+
+You may have heard that it is generally bad practice to refer to columns by number rather than name, though. If your colleague comes along and reads your code later they may have to hunt around to find out which column is number 5. If you or they change the column ordering higher up in your R program, you may produce wrong results with no warning or error if you forget to change all the places in your code which refer to column number 5. That is your fault not R's or data.table's. It's r [...]
+
+Say column 5 is named `"region"` and you really must extract that column as a vector, not a data.table. It is more robust to use the column name and write `DT$region` or `DT[["region"]]`; i.e., the same as base R. Using base R's `$` and `[[` on data.table is encouraged; not when combined with `<-` to assign (use `:=` instead for that), but to select a single column by name they are encouraged.
+
+There are some circumstances where referring to a column by number seems like the only way, such as a sequence of columns. In these situations just like data.frame, you can write `DT[, 5:10]` and `DT[,c(1,4,10)]`. However, again, it is more robust (to future changes in your data's number of and ordering of columns) to use a named range such as `DT[,columnRed:columnViolet]` or name each one `DT[,c("columnRed","columnOrange","columnYellow")]`. It is harder work up front, but you will proba [...]
+
+However, what we really want you to do is `DT[,.(columnRed,columnOrange,columnYellow)]`; i.e., use column names as if they are variables directly inside `DT[...]`. You don't have to prefix each column with `DT$` like you do in data.frame. The `.()` part is just an alias for `list()` and you can use `list()` instead if you prefer. You can place any R expression of column names, using any R package, returning different types of different lengths, right there. We wanted to encourage you to  [...]
+
+Reminder: you can place _any_ R expression inside `DT[...]` using column names as if they are variables; e.g., try `DT[, colA*colB/2]`. That does return a vector because you used column names as if they are variables. Wrap with `.()` to return a data.table; i.e. `DT[,.(colA*colB/2)]`.  Name it: `DT[,.(myResult = colA*colB/2)]`.  And we'll leave it to you to guess how to return two things from this query. It's also quite common to do a bunch of things inside an anonymous body: `DT[, { x<- [...]
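The query forms just described can be sketched as follows (a minimal, hypothetical `DT` with columns `colA` and `colB`, as in the text):

```r
library(data.table)
DT = data.table(colA = 1:5, colB = 6:10)  # hypothetical example data

DT[ , colA*colB/2]                 # an expression of columns: returns a vector
DT[ , .(colA*colB/2)]              # wrapped in .(): returns a 1-column data.table
DT[ , .(myResult = colA*colB/2)]   # the same, named
DT[ , .(res1 = colA*colB/2,        # returning two things from one query
        res2 = colA + colB)]
```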
+
+## Why does `DT[,"region"]` return a 1-column data.table rather than a vector?
+
+See the [answer above](#j-num). Try `DT$region` instead. Or `DT[["region"]]`. 
+
+
+## Why does `DT[, region]` return a vector for the "region" column?  I'd like a 1-column data.table.
+
+Try `DT[ , .(region)]` instead. `.()` is an alias for `list()` and ensures a data.table is returned.
+
+Also continue reading and see the FAQ after next. Skim whole documents before getting stuck in one part.
+
+## Why does `DT[ , x, y, z]` not work? I wanted the 3 columns `x`,`y` and `z`.
+
+The `j` expression is the 2nd argument. Try `DT[ , c("x","y","z")]` or `DT[ , .(x,y,z)]`.
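For example, with a hypothetical table containing those columns, both forms select the same three columns:

```r
library(data.table)
DT = data.table(x = 1:3, y = 4:6, z = 7:9, w = 10:12)  # hypothetical data

DT[ , .(x, y, z)]         # 3-column data.table via column names as variables
DT[ , c("x", "y", "z")]   # same three columns, selected by name
```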
+
+## I assigned a variable `mycol = "x"` but then `DT[ , mycol]` returns `"x"`. How do I get it to look up the column name contained in the `mycol` variable?
+
+In v1.9.8 released Nov 2016 there is an ability to turn on new behaviour: `options(datatable.WhenJisSymbolThenCallingScope=TRUE)`. It will then work as you expected, just like data.frame. If you are a new user of data.table, you should probably do this. You can place this command in your .Rprofile file so you don't have to remember again. See the long item in release notes about this. The release notes are linked at the top of the data.table homepage: [NEWS](https://github.com/Rdatatabl [...]
+
+Without turning on that new behaviour, what happens is that the `j` expression sees objects in the calling scope. The variable `mycol` does not exist as a column name of `DT`, so data.table looks in the calling scope, finds `mycol` there and returns its value `"x"`. This is correct behaviour currently. Had `mycol` been a column name, then that column's data would have been returned. What has been done to date has been `DT[ , mycol, with = FALSE]` which will return the `x` col [...]
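A short sketch of the `with = FALSE` approach, using hypothetical data:

```r
library(data.table)
DT = data.table(x = 1:3, y = 4:6)  # hypothetical data
mycol = "x"

DT[ , mycol, with = FALSE]  # looks up the name in mycol: 1-column data.table
DT[[mycol]]                 # base-R style: extracts the x column as a vector
```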
+
+## What are the benefits of being able to use column names as if they are variables inside `DT[...]`?
+
+`j` doesn't have to be just column names. You can write any R _expression_ of column names directly in `j`, _e.g._, `DT[ , mean(x*y/z)]`.  The same applies to `i`, _e.g._, `DT[x>1000, sum(y*z)]`.
+
+This runs the `j` expression on the set of rows where the `i` expression is true. You don't even need to return data, _e.g._, `DT[x>1000, plot(y, z)]`. You can do `j` by group simply by adding `by = `; e.g., `DT[x>1000, sum(y*z), by = w]`. This runs `j` for each group in column `w` but just over the rows where `x>1000`. By placing the 3 parts of the query (i=where, j=select and by=group by) inside the square brackets, data.table sees this query as a whole before any part of it is evaluat [...]
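Putting the three parts together on a small hypothetical table:

```r
library(data.table)
DT = data.table(w = c("a", "a", "b", "b"),
                x = c(500, 1500, 2000, 800),
                y = 1:4, z = 5:8)        # hypothetical data

# i = where (x > 1000), j = select (sum(y*z)), by = group by (w)
DT[x > 1000, sum(y*z), by = w]   # a: 2*6 = 12, b: 3*7 = 21
```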
+
+## OK, I'm starting to see what data.table is about, but why didn't you just enhance `data.frame` in R? Why does it have to be a new package?
+
+As [highlighted above](#j-num), `j` in `[.data.table` is fundamentally different from `j` in `[.data.frame`. Even if something as simple as `DF[ , 1]` were changed in base R to return a data.frame rather than a vector, that would break existing code in many thousands of CRAN packages and in user code. As soon as we took the step to create a new class that inherited from data.frame, we had the opportunity to change a few things and we did. We want data.table to be slightly different and to work  [...]
+
+Furthermore, data.table _inherits_ from `data.frame`. It _is_ a `data.frame`, too. A data.table can be passed to any package that only accepts `data.frame` and that package can use `[.data.frame` syntax on the data.table. See [this answer](http://stackoverflow.com/a/10529888/403310) for how that is achieved.
+
+We _have_ proposed enhancements to R wherever possible, too. One of these was accepted as a new feature in R 2.12.0 :
+
+> `unique()` and `match()` are now faster on character vectors where all elements are in the global CHARSXP cache and have unmarked encoding (ASCII).  Thanks to Matt Dowle for suggesting improvements to the way the hash code is generated in unique.c.
+
+A second proposal was to use `memcpy` in duplicate.c, which is much faster than a for loop in C. This would improve the _way_ that R copies data internally (on some measures by 13 times). The thread on r-devel is [here](http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html).
+
+A third more significant proposal that was accepted is that R now uses data.table's radix sort code as from R 3.3.0 :
+
+> The radix sort algorithm and implementation from data.table (forder) replaces the previous radix (counting) sort and adds a new method for order(). Contributed by Matt Dowle and Arun Srinivasan, the new algorithm supports logical, integer (even with large values), real, and character vectors. It outperforms all other methods, but there are some caveats (see ?sort).
+
+This was a big event for us and we celebrated until the cows came home. (Not really.)
+
+## Why are the defaults the way they are? Why does it work the way it does?
+
+The simple answer is because the main author originally designed it for his own use. He wanted it that way. He finds it a more natural, faster way to write code, which also executes more quickly.
+
+## Isn't this already done by `with()` and `subset()` in `base`?
+
+Some of the features discussed so far are, yes. The package builds upon base functionality. It does the same sorts of things but with less code required and executes many times faster if used correctly.
+
+## Why does `X[Y]` return all the columns from `Y` too? Shouldn't it return a subset of `X`?
+
+This was changed in v1.5.3 (Feb 2011). Since then `X[Y]` includes `Y`'s non-join columns. We refer to this feature as _join inherited scope_ because not only are `X` columns available to the `j` expression, so are `Y` columns. The downside is that `X[Y]` is less efficient since every item of `Y`'s non-join columns is duplicated to match the (likely large) number of rows in `X` that match. We therefore strongly encourage `X[Y, j]` instead of `X[Y]`. See [next FAQ](#MergeDiff).
+
+## What is the difference between `X[Y]` and `merge(X, Y)`? {#MergeDiff}
+
+`X[Y]` is a join, looking up `X`'s rows using `Y` (or `Y`'s key if it has one) as an index.
+
+`Y[X]` is a join, looking up `Y`'s rows using `X` (or `X`'s key if it has one) as an index.
+
+`merge(X,Y)`[^1] does both ways at the same time. The number of rows of `X[Y]` and `Y[X]` usually differ, whereas the number of rows returned by `merge(X, Y)` and `merge(Y, X)` is the same.
+
+_BUT_ that misses the main point. Most tasks require something to be done on the data after a join or merge. Why merge all the columns of data, only to use a small subset of them afterwards? You may suggest `merge(X[ , ColsNeeded1], Y[ , ColsNeeded2])`, but that requires the programmer to work out which columns are needed. `X[Y, j]` in data.table does all that in one step for you. When you write `X[Y, sum(foo*bar)]`, data.table automatically inspects the `j` expression to see which colum [...]
+
+[^1]: Here we mean either the `merge` _method_ for data.table or the `merge` method for `data.frame` since both methods work in the same way in this respect. See `?merge.data.table` and [below](#r-dispatch) for more information about method dispatch.
+
+## Anything else about `X[Y, sum(foo*bar)]`?
+
+This behaviour changed in v1.9.4 (Sep 2014). It now does the `X[Y]` join and then runs `sum(foo*bar)` over all the rows; i.e., `X[Y][ , sum(foo*bar)]`. It used to run `j` for each _group_ of `X` that each row of `Y` matches to. That can still be done as it's very useful but you now need to be explicit and specify `by = .EACHI`, _i.e._, `X[Y, sum(foo*bar), by = .EACHI]`. We call this _grouping by each `i`_.
+
+For example, (further complicating it by using _join inherited scope_, too):
+
+```{r}
+X = data.table(grp = c("a", "a", "b",
+                       "b", "b", "c", "c"), foo = 1:7)
+setkey(X, grp)
+Y = data.table(c("b", "c"), bar = c(4, 2))
+X
+Y
+X[Y, sum(foo*bar)]
+X[Y, sum(foo*bar), by = .EACHI]
+```
+
+## That's nice. How did you manage to change it given that users depended on the old behaviour?
+
+The request to change came from users. The feeling was that if a query is doing grouping then an explicit `by=` should be present for code readability reasons. An option was provided to return the old behaviour: `options(datatable.old.bywithoutby)`, by default `FALSE`. This enabled upgrading to test the other new features / bug fixes in v1.9.4, with later migration of any by-without-by queries when ready by adding `by=.EACHI` to them. We retained 47 pre-change tests and added them back a [...]
+
+Of the 66 packages on CRAN or Bioconductor that depended on or import data.table at the time of releasing v1.9.4 (it is now over 300), only one was affected by the change. That could be because many packages don't have comprehensive tests, or just that grouping by each row in `i` wasn't being used much by downstream packages. We always test the new version with all dependent packages before release and coordinate any changes with those maintainers. So this release was quite straightforwa [...]
+
+Another compelling reason to make the change was that previously, there was no efficient way to achieve what `X[Y, sum(foo*bar)]` does now. You had to write `X[Y][ , sum(foo*bar)]`. That was suboptimal because `X[Y]` joined all the columns and passed them all to the second compound query without knowing that only `foo` and `bar` are needed. To solve that efficiency problem, extra programming effort was required: `X[Y, list(foo, bar)][ , sum(foo*bar)]`.  The change to `by = .EACHI` has si [...]
+
+# General Syntax
+
+## How can I avoid writing a really long `j` expression? You've said that I should use the column _names_, but I've got a lot of columns.
+
+When grouping, the `j` expression can use column names as variables, as you know, but it can also use a reserved symbol `.SD` which refers to the **S**ubset of the **D**ata.table for each group (excluding the grouping columns). So to sum up all your columns it's just `DT[ , lapply(.SD, sum), by = grp]`. It might seem tricky, but it's fast to write and fast to run. Notice you don't have to create an anonymous function. The `.SD` object is efficiently implemented internally and more effici [...]
+
+So please don't do, for example, `DT[ , sum(.SD[["sales"]]), by = grp]`. That works but is inefficient and inelegant. `DT[ , sum(sales), by = grp]` is what was intended, and it could be 100s of times faster. If you use _all_ of the data in `.SD` for each group (such as in `DT[ , lapply(.SD, sum), by = grp]`) then that's very good usage of `.SD`. If you're using _several_ but not _all_ of the columns, you can combine `.SD` with `.SDcols`; see `?data.table`.
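The `.SDcols` combination can be sketched like so, on a hypothetical table where one column is not summable:

```r
library(data.table)
DT = data.table(grp   = c("a", "a", "b"),
                sales = 1:3, costs = 4:6,
                notes = c("x", "y", "z"))   # hypothetical data

DT[ , sum(sales), by = grp]   # one column: refer to it directly

# several but not all columns: restrict .SD with .SDcols
DT[ , lapply(.SD, sum), by = grp, .SDcols = c("sales", "costs")]
```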
+
+## Why is the default for `mult` now `"all"`?
+
+In v1.5.3 the default was changed to `"all"`. When `i` (or `i`'s key if it has one) has fewer columns than `x`'s key, `mult` was already set to `"all"` automatically. Changing the default makes this clearer and easier for users as it came up quite often.
+
+In versions up to v1.3, `"all"` was slower. Internally, `"all"` was implemented by joining using `"first"`, then again from scratch using `"last"`, after which a diff between them was performed to work out the span of the matches in `x` for each row in `i`. Most often we join to single rows, though, where `"first"`,`"last"` and `"all"` return the same result. We preferred maximum performance for the majority of situations so the default chosen was `"first"`. When working with a non-uniqu [...]
+
+In v1.4 the binary search in C was changed to branch at the deepest level to find first and last. That branch will likely occur within the same final pages of RAM so there should no longer be a speed disadvantage in defaulting `mult` to `"all"`. We warned that the default might change and made the change in v1.5.3.
+
+A future version of data.table may allow a distinction between a key and a _unique key_. Internally `mult = "all"` would perform more like `mult = "first"` when all `x`'s key columns were joined to and `x`'s key was a unique key. data.table would need checks on insert and update to make sure a unique key is maintained. An advantage of specifying a unique key would be that data.table would ensure no duplicates could be inserted, in addition to performance.
+
+## I'm using `c()` in `j` and getting strange results.
+
+This is a common source of confusion. In `data.frame` you are used to, for example:
+
+```{r}
+DF = data.frame(x = 1:3, y = 4:6, z = 7:9)
+DF
+DF[ , c("y", "z")]
+```
+
+which returns the two columns. In data.table you know you can use the column names directly and might try:
+
+```{r}
+DT = data.table(DF)
+DT[ , c(y, z)]
+```
+
+but this returns one vector.  Remember that the `j` expression is evaluated within the environment of `DT` and `c()` returns a vector.  If 2 or more columns are required, use `list()` or `.()` instead:
+
+```{r}
+DT[ , .(y, z)]
+```
+
+`c()` can be useful in a data.table too, but its behaviour is different from that in `[.data.frame`.
+
+## I have built up a complex table with many columns.  I want to use it as a template for a new table; _i.e._, create a new table with no rows, but with the column names and types copied from my table. Can I do that easily?
+
+Yes. If your complex table is called `DT`, try `NEWDT = DT[0]`.
+
+## Is a null data.table the same as `DT[0]`?
+
+No. By "null data.table" we mean the result of `data.table(NULL)` or `as.data.table(NULL)`; _i.e._,
+
+```{r}
+data.table(NULL)
+data.frame(NULL)
+as.data.table(NULL)
+as.data.frame(NULL)
+is.null(data.table(NULL))
+is.null(data.frame(NULL))
+```
+
+The null data.table|`frame` is `NULL` with some attributes attached, which means it's no longer `NULL`. In R only pure `NULL` is `NULL` as tested by `is.null()`. When referring to the "null data.table" we use lower case null to help distinguish from upper case `NULL`. To test for the null data.table, use `length(DT) == 0` or `ncol(DT) == 0` (`length` is slightly faster as it's a primitive function).
+
+An _empty_ data.table (`DT[0]`) has one or more columns, all of which are empty. Those empty columns still have names and types.
+
+```{r}
+DT = data.table(a = 1:3, b = c(4, 5, 6), d = c(7L,8L,9L))
+DT[0]
+sapply(DT[0], class)
+```
+
+## Why has the `DT()` alias been removed? {#DTremove1}
+`DT` was introduced originally as a wrapper for a list of `j` expressions. Since `DT` was an alias for data.table, this was a convenient way to take care of silent recycling in cases where each item of the `j` list evaluated to different lengths. The alias was one reason grouping was slow, though.
+
+As of v1.3, `list()` or `.()` should be passed instead to the `j` argument. These are much faster, especially when there are many groups. Internally, this was a nontrivial change. Vector recycling is now done internally, along with several other speed enhancements for grouping.
+
+## But my code uses `j = DT(...)` and it works. The previous FAQ says that `DT()` has been removed. {#DTremove2}
+
+Then you are using a version prior to 1.5.3. Prior to 1.5.3 `[.data.table` detected use of `DT()` in the `j` and automatically replaced it with a call to `list()`. This was to help the transition for existing users.
+
+## What are the scoping rules for `j` expressions?
+
+Think of the subset as an environment where all the column names are variables. When a variable `foo` is used in the `j` of a query such as `X[Y, sum(foo)]`, `foo` is looked for in the following order :
+
+ 1. The scope of `X`'s subset; _i.e._, `X`'s column names.
+ 2. The scope of each row of `Y`; _i.e._, `Y`'s column names (_join inherited scope_)
+ 3. The scope of the calling frame; _e.g._, the line that appears before the data.table query.
+ 4. Exercise for reader: does it then ripple up the calling frames, or go straight to `globalenv()`?
+ 5. The global environment
+
+This is _lexical scoping_ as explained in [R FAQ 3.3.1](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping). The environment in which the function was created is not relevant, though, because there is _no function_. No anonymous _function_ is passed to `j`. Instead, an anonymous _body_ is passed to `j`; for example,
+
+```{r}
+DT = data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)
+DT
+DT[ , {z = sum(y); z + 3}, by = x]
+```
+
+Some programming languages call this a _lambda_.
+
+## Can I trace the `j` expression as it runs through the groups? {#j-trace}
+
+Try something like this:
+
+```{r}
+DT[ , {
+  cat("Objects:", paste(objects(), collapse = ","), "\n")
+  cat("Trace: x=", as.character(x), " y=", y, "\n")
+  sum(y)},
+  by = x]
+```
+
+## Inside each group, why are the group variables length-1?
+
+[Above](#j-trace), `x` is a grouping variable and (as from v1.6.1) has `length` 1 (if inspected or used in `j`). It's for efficiency and convenience. Therefore, there is no difference between the following two statements:
+
+```{r}
+DT[ , .(g = 1, h = 2, i = 3, j = 4, repeatgroupname = x, sum(y)), by = x]
+DT[ , .(g = 1, h = 2, i = 3, j = 4, repeatgroupname = x[1], sum(y)), by = x]
+```
+
+If you need the size of the current group, use `.N` rather than calling `length()` on any column.
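For example, reusing the `DT` defined just above (repeated here so the snippet stands alone):

```r
library(data.table)
DT = data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)

# .N is the number of rows in the current group; no need for length(y)
DT[ , .(count = .N, total = sum(y)), by = x]   # a: 2 rows, b: 3 rows
```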
+
+## Only the first 10 rows are printed, how do I print more?
+
+There are two things happening here. First, if the number of rows in a data.table is large (`> 100` by default), then a summary of the data.table is printed to the console by default. Second, the summary of a large data.table is printed by taking the top and bottom `n` (`= 5` by default) rows of the data.table and only printing those. Both of these parameters (when to trigger a summary and how much of a table to use as a summary) are configurable by R's `options` mechanism, or by callin [...]
+
+For instance, to enforce the summary of a data.table to only happen when a data.table is greater than 50 rows, you could `options(datatable.print.nrows = 50)`. To disable the summary-by-default completely, you could `options(datatable.print.nrows = Inf)`. You could also call `print` directly, as in `print(your.data.table, nrows = Inf)`.
+
+If you want to show more than just the top (and bottom) 10 rows of a data.table summary (say you like 20), set `options(datatable.print.topn = 20)`, for example. Again, you could also just call `print` directly, as in `print(your.data.table, topn = 20)`.
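The options and `print` arguments just described, in one place (hypothetical data):

```r
library(data.table)
DT = data.table(x = 1:200)                 # hypothetical data

print(DT, nrows = Inf)                     # print every row, this once
print(DT, topn = 20)                       # top and bottom 20 rows, this once

options(datatable.print.nrows = 50)        # summarise only when > 50 rows
options(datatable.print.topn  = 20)        # summaries show 20 rows each end
```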
+
+## With an `X[Y]` join, what if `X` contains a column called `"Y"`?
+
+When `i` is a single name such as `Y` it is evaluated in the calling frame. In all other cases such as calls to `.()` or other expressions, `i` is evaluated within the scope of `X`. This facilitates easy _self-joins_ such as `X[J(unique(colA)), mult = "first"]`.
+
+## `X[Z[Y]]` is failing because `X` contains a column `"Y"`. I'd like it to use the table `Y` in calling scope.
+
+The `Z[Y]` part is not a single name so that is evaluated within the frame of `X` and the problem occurs. Try `tmp = Z[Y]; X[tmp]`. This is robust to `X` containing a column `"tmp"` because `tmp` is a single name. If you often encounter conflicts of this type, one simple solution may be to name all tables in uppercase and all column names in lowercase, or some similar scheme.
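A sketch of the workaround, with hypothetical tables named to trigger the conflict:

```r
library(data.table)
X = data.table(id = 1:3, Y = 7:9, key = "id")          # X has a *column* named "Y"
Z = data.table(id = 2:3, v = c("p", "q"), key = "id")
Y = data.table(id = 3L)                                # the *table* Y in calling scope

tmp = Z[Y]   # evaluated at the prompt, so the table Y is found, not X's column
X[tmp]       # robust: tmp is a single name, looked up in the calling frame
```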
+
+## Can you explain further why data.table is inspired by `A[B]` syntax in `base`?
+
+Consider `A[B]` syntax using an example matrix `A` :
+```{r}
+A = matrix(1:12, nrow = 4)
+A
+```
+
+To obtain cells `(1, 2) = 5` and `(3, 3) = 11` many users (we believe) may try this first :
+```{r}
+A[c(1, 3), c(2, 3)]
+```
+
+However, this returns the union of those rows and columns. To reference the cells, a 2-column matrix is required. `?Extract` says :
+
+> When indexing arrays by `[` a single argument `i` can be a matrix with as many columns as there are dimensions of `x`; the result is then a vector with elements corresponding to the sets of indices in each row of `i`.
+
+Let's try again.
+
+```{r}
+B = cbind(c(1, 3), c(2, 3))
+B
+A[B]
+```
+
+A matrix is a 2-dimensional structure with row names and column names. Can we do the same with names?
+
+```{r}
+rownames(A) = letters[1:4]
+colnames(A) = LETTERS[1:3]
+A
+B = cbind(c("a", "c"), c("B", "C"))
+A[B]
+```
+
+So yes, we can. Can we do the same with a `data.frame`?
+
+```{r}
+A = data.frame(A = 1:4, B = letters[11:14], C = pi*1:4)
+rownames(A) = letters[1:4]
+A
+B
+A[B]
+```
+
+But, notice that the result was coerced to `character`. R coerced `A` to `matrix` first so that the syntax could work, but the result isn't ideal.  Let's try making `B` a `data.frame`.
+
+```{r}
+B = data.frame(c("a", "c"), c("B", "C"))
+cat(try(A[B], silent = TRUE))
+```
+
+So we can't subset a `data.frame` by a `data.frame` in base R. What if we want row names and column names that aren't `character` but `integer` or `float`? What if we want more than 2 dimensions of mixed types? Enter data.table.
+
+Furthermore, matrices, especially sparse matrices, are often stored in a 3-column tuple: `(i, j, value)`. This can be thought of as a key-value pair where `i` and `j` form a 2-column key. If we have more than one value, perhaps of different types, it might look like `(i, j, val1, val2, val3, ...)`. This looks very much like a `data.frame`. Hence data.table extends `data.frame` so that a `data.frame` `X` can be subset by a `data.frame` `Y`, leading to the `X[Y]` syntax.
+
+## Can base be changed to do this then, rather than a new package?
+`data.frame` is used _everywhere_ and so it is very difficult to make _any_ changes to it.
+data.table _inherits_ from `data.frame`. It _is_ a `data.frame`, too. A data.table _can_ be passed to any package that _only_ accepts `data.frame`. When that package uses `[.data.frame` syntax on the data.table, it works. It works because `[.data.table` looks to see where it was called from. If it was called from such a package, `[.data.table` diverts to `[.data.frame`.
+
+## I've heard that data.table syntax is analogous to SQL.
+Yes :
+
+ - `i`  $\Leftrightarrow$ where
+ - `j`  $\Leftrightarrow$  select
+ - `:=`  $\Leftrightarrow$  update
+ - `by`  $\Leftrightarrow$  group by
+ - `i`  $\Leftrightarrow$  order by (in compound syntax)
+ - `i`  $\Leftrightarrow$  having (in compound syntax)
+ - `nomatch = NA`  $\Leftrightarrow$  outer join
+ - `nomatch = 0L`  $\Leftrightarrow$  inner join
+ - `mult = "first"|"last"`  $\Leftrightarrow$  N/A because SQL is inherently unordered
+ - `roll = TRUE`  $\Leftrightarrow$  N/A because SQL is inherently unordered
+
+The general form is :
+
+```{r, eval = FALSE}
+DT[where, select|update, group by][order by][...] ... [...]
+```
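As a small sketch of that general form, on hypothetical data (where, select, group by, then a chained order by):

```r
library(data.table)
DT = data.table(grp = c("a", "a", "b", "b"), x = c(2, 5, 1, 4))  # hypothetical

# i = where, j = select, by = group by; the second [] is the order by
DT[x > 1, .(total = sum(x)), by = grp][order(-total)]
```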
+
+A key advantage of column vectors in R is that they are _ordered_, unlike SQL[^2]. We can use ordered functions in data.table queries such as `diff()` and we can use _any_ R function from any package, not just the functions that are defined in SQL. A disadvantage is that R objects must fit in memory, but with several R packages such as ff, bigmemory, mmap and indexing, this is changing.
+
+[^2]: It may be a surprise to learn that `select top 10 * from ...` does _not_ reliably return the same rows over time in SQL. You do need to include an `order by` clause, or use a clustered index to guarantee row order; _i.e._, SQL is inherently unordered.
+
+## What are the smaller syntax differences between `data.frame` and data.table {#SmallerDiffs}
+
+ - `DT[3]` refers to the 3rd _row_, but `DF[3]` refers to the 3rd _column_
+ - `DT[3, ] == DT[3]`, but `DF[ , 3] == DF[3]` (somewhat confusingly in data.frame, whereas data.table is consistent)
+ - For this reason we say the comma is _optional_ in `DT`, but not optional in `DF`
+ - `DT[[3]] == DF[3] == DF[[3]]`
+ - `DT[i, ]`, where `i` is a single integer, returns a single row, just like `DF[i, ]`, but unlike a matrix single-row subset which returns a vector.
+ - `DT[ , j]` where `j` is a single integer returns a one-column data.table, unlike `DF[, j]` which returns a vector by default
+ - `DT[ , "colA"][[1]] == DF[ , "colA"]`.
+ - `DT[ , colA] == DF[ , "colA"]` (currently in data.table v1.9.8 but is about to change, see release notes)
+ - `DT[ , list(colA)] == DF[ , "colA", drop = FALSE]`
+ - `DT[NA]` returns 1 row of `NA`, but `DF[NA]` returns an entire copy of `DF` containing `NA` throughout. The symbol `NA` is type `logical` in R and is therefore recycled by `[.data.frame`. The user's intention was probably `DF[NA_integer_]`. `[.data.table` diverts to this probable intention automatically, for convenience.
+ - `DT[c(TRUE, NA, FALSE)]` treats the `NA` as `FALSE`, but `DF[c(TRUE, NA, FALSE)]` returns
+  `NA` rows for each `NA`
+ - `DT[ColA == ColB]` is simpler than `DF[!is.na(ColA) & !is.na(ColB) & ColA == ColB, ]`
+ - `data.frame(list(1:2, "k", 1:4))` creates 3 columns, data.table creates one `list` column.
+ - `check.names` is by default `TRUE` in `data.frame` but `FALSE` in data.table, for convenience.
+ - `stringsAsFactors` is by default `TRUE` in `data.frame` but `FALSE` in data.table, for efficiency. Since a global string cache was added to R, character items are a pointer to the single cached string and there is no longer a performance benefit of converting to `factor`.
+ - Atomic vectors in `list` columns are collapsed when printed using `", "` in `data.frame`, but `","` in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
+
+In `[.data.frame` we very often set `drop = FALSE`. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column `data.frame`. In `[.data.table` we took the opportunity to make it consistent and dropped `drop`.
+
+When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.
+
+## I'm using `j` for its side effect only, but I'm still getting data returned. How do I stop that?
+
+In this case `j` can be wrapped with `invisible()`; e.g., `DT[ , invisible(hist(colB)), by = colA]`[^3]
+
+[^3]: _e.g._, `hist()` returns the breakpoints in addition to plotting to the graphics device.
+
+## Why does `[.data.table` now have a `drop` argument from v1.5?
+
+So that data.table can inherit from `data.frame` without using `...`. If we used `...` then invalid argument names would not be caught.
+
+The `drop` argument is never used by `[.data.table`. It is a placeholder for non-data.table-aware packages when they use the `[.data.frame` syntax directly on a data.table.
+
+## Rolling joins are cool and very fast! Was that hard to program?
+The prevailing row on or before the `i` row is the final row the binary search tests anyway. So `roll = TRUE` is essentially just a switch in the binary search C code to return that row.
+
+## Why does `DT[i, col := value]` return the whole of `DT`? I expected either no visible value (consistent with `<-`), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.
+
+This has changed in v1.8.3 to meet your expectations. Please upgrade.
+
+The whole of `DT` is returned (now invisibly) so that compound syntax can work; _e.g._, `DT[i, done := TRUE][ , sum(done)]`. The number of rows updated is returned when `verbose` is `TRUE`, either on a per-query basis or globally using `options(datatable.verbose = TRUE)`.
+
+## OK, thanks. What was so difficult about the result of `DT[i, col := value]` being returned invisibly?
+R internally forces visibility on for `[`. The value of FunTab's eval column (see [src/main/names.c](https://github.com/wch/r-source/blob/trunk/src/main/names.c)) for `[` is `0` meaning "force `R_Visible` on" (see [R-Internals section 1.6](https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Autoprinting) ). Therefore, when we tried `invisible()` or setting `R_Visible` to `0` directly ourselves, `eval` in [src/main/eval.c](https://github.com/wch/r-source/blob/trunk/src/main/eval. [...]
+
+To solve this problem, the key was to stop trying to stop the print method running after a `:=`. Instead, inside `:=` we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.
+
+## Why do I have to type `DT` sometimes twice after using `:=` to print the result to console?
+
+This is an unfortunate downside to get [#869](https://github.com/Rdatatable/data.table/issues/869) to work. If a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` is typed at the prompt, nothing will be printed. A repeated `DT` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `print(DT)` and `DT[]` at the prompt are guaranteed to pri [...]
+
+## I've noticed that `base::cbind.data.frame` (and `base::rbind.data.frame`) appear to be changed by data.table. How is this possible? Why?
+
+It is a temporary, last resort solution until we discover a better way to solve the problems listed below. Essentially, the issue is that data.table inherits from `data.frame`, _and_ `base::cbind` and `base::rbind` (uniquely) do their own S3 dispatch internally as documented by `?cbind`. The change is adding one `for` loop to the start of each function directly in `base`; _e.g._,
+
+```{r}
+base::cbind.data.frame
+```
+
+That modification is made dynamically, _i.e._, the `base` definition of `cbind.data.frame` is fetched, the `for` loop added to the beginning and then assigned back to `base`. This solution is intended to be robust to different definitions of `base::cbind.data.frame` in different versions of R, including unknown future changes. Again, it is a last resort until a better solution is known or made available. The competing requirements are:
+
+ - `cbind(DT, DF)` needs to work. Defining `cbind.data.table` doesn't work because `base::cbind` does its own S3 dispatch and requires that the _first_ `cbind` method for each object it is passed is _identical_. This is not true in `cbind(DT, DF)` because the first method for `DT` is `cbind.data.table` but the first method for `DF` is `cbind.data.frame`. `base::cbind` then falls through to its internal `bind` code which appears to treat `DT` as a regular `list` and returns very odd looki [...]
+
+ - This naturally leads to trying to mask `cbind.data.frame` instead. Since a data.table is a `data.frame`, `cbind` would find the same method for both `DT` and `DF`. However, this doesn't work either because `base::cbind` appears to find methods in `base` first; _i.e._, `base::cbind.data.frame` isn't maskable. This is reproducible as follows :
+
+```{r}
+foo = data.frame(a = 1:3)
+cbind.data.frame = function(...) cat("Not printed\n")
+cbind(foo)
+rm("cbind.data.frame")
+```
+
+ - Finally, we tried masking `cbind` itself (v1.6.5 and v1.6.6). This allowed `cbind(DT, DF)` to work, but introduced compatibility issues with package `IRanges`, since `IRanges` also masks `cbind`. It worked if `IRanges` was lower on the `search()` path than data.table, but if `IRanges` was higher than data.table, then data.table's `cbind` would never be called and the strange-looking `matrix` output occurs again (see [below](#cbinderror)).
+
+If you know of a better solution that still solves all the issues above, then please let us know and we'll gladly change it.
+
+## I've read about method dispatch (_e.g._ `merge` may or may not dispatch to `merge.data.table`) but _how_ does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when? {#r-dispatch}
+
+This comes up quite a lot but it's really earth-shatteringly simple. A function such as `merge` is _generic_ if it consists of a call to `UseMethod`. To check whether a function is _generic_, merely type its name without `()` afterwards and look at the program code inside it; if you see a call to `UseMethod` then it is _generic_. What does `UseMethod` do? It literally slaps the function name together with the class of the first [...]
+
+You might now ask: where is this documented in R? Answer: it's quite clear, but you first need to know to look in `?UseMethod`, and _that_ help file contains :
+
+> When a function calling `UseMethod('fun')` is applied to an object with class attribute `c('first', 'second')`, the system searches for a function called `fun.first` and, if it finds it, applies it to the object. If no such function is found a function called `fun.second` is tried. If no class name produces a suitable function, the function `fun.default` is used, if it exists, or an error results.
+
+Happily, an internet search for "How does R method dispatch work" (at the time of this writing) returns the `?UseMethod` help page in the top few links. Admittedly, other links rapidly descend into the intricacies of S3 vs S4, internal generics and so on.
+
+However, the simplicity of basic S3 dispatch (pasting the function name together with the class name) is one reason some R folk love R. No complicated registration or signature is required and there isn't much to learn. To create the `merge` method for data.table, all that was required was to create a function called `merge.data.table`.
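+
+To illustrate just how simple S3 dispatch is, here is a sketch with an invented generic (`describe` is a made-up name, not a data.table function):
+
+```{r}
+describe = function(x, ...) UseMethod("describe")  # that's all a generic is
+describe.data.frame = function(x, ...) "data.frame method"
+describe.default = function(x, ...) "default method"
+describe(data.frame())  # "describe" + "data.frame" => describe.data.frame
+describe(1:3)           # no describe.integer exists, so describe.default
+```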
+
+# Questions relating to compute time
+
+## I have 20 columns and a large number of rows. Why is an expression of one column so quick?
+
+Several reasons:
+
+ - Only that column is grouped, the other 19 are ignored because data.table inspects the `j` expression and realises it doesn't use the other columns.
+ - One memory allocation is made for the largest group only, then that memory is re-used for the other groups. There is very little garbage to collect.
+ - R is an in-memory column store; i.e., the columns are contiguous in RAM. Page fetches from RAM into L2 cache are minimised.
+
+## I don't have a `key` on a large table, but grouping is still really quick. Why is that?
+
+data.table uses radix sorting. This is significantly faster than other sort algorithms. See [our presentations](http://user2015.math.aau.dk/presentations/234.pdf) on [our homepage](https://github.com/Rdatatable/data.table/wiki) for more information.
+
+This is also one reason why `setkey()` is quick.
+
+When no `key` is set, or we group in a different order from that of the key, we call it an _ad hoc_ `by`.
+
+## Why is grouping by columns in the key faster than an _ad hoc_ `by`?
+
+Because each group is contiguous in RAM, thereby minimising page fetches and memory can be
+copied in bulk (`memcpy` in C) rather than looping in C.
+
+## What are primary and secondary indexes in data.table?
+
+- Manual: [`?setkey`](https://www.rdocumentation.org/packages/data.table/functions/setkey)
+- S.O.: [What is the purpose of setting a key in data.table?](https://stackoverflow.com/questions/20039335/what-is-the-purpose-of-setting-a-key-in-data-table/20057411#20057411)
+
+`setkey(DT, col1, col2)` orders the rows by column `col1` then within each group of `col1` it orders by `col2`. This is a _primary index_. The row order is changed _by reference_ in RAM. Subsequent joins and groups on those key columns then take advantage of the sort order for efficiency. (Imagine how difficult looking for a phone number in a printed telephone directory would be if it wasn't sorted by surname then forename. That's literally all `setkey` does. It sorts the rows by the col [...]
+
+However, you can only have one primary key because data can only be physically sorted in RAM in one way at a time. Choose the primary index to be the one you use most often (e.g. `[id,date]`). Sometimes there isn't an obvious choice for the primary key or you need to join and group many different columns in different orders. Enter a secondary index. This does use memory (`4*nrow` bytes regardless of the number of columns in the index) to store the order of the rows by the columns you spe [...]
+
+We use the words _index_ and _key_ interchangeably.
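+
+A short sketch contrasting the two (the table and values are invented; `setindex()` and `indices()` are available in recent versions of data.table):
+
+```{r}
+library(data.table)
+DT = data.table(id = c(2L, 1L, 2L, 1L), date = c(3L, 1L, 2L, 4L), v = 1:4)
+setkey(DT, id, date)  # primary index: rows physically reordered by reference
+key(DT)
+setindex(DT, v)       # secondary index: the order is stored, rows stay put
+indices(DT)
+DT[.(2L), on = "id"]  # join that uses the id column
+```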
+
+# Error messages
+## "Could not find function `DT`"
+See above [here](#DTremove1) and [here](#DTremove2).
+
+## "unused argument(s) (`MySum = sum(v)`)"
+
+This error is generated by `DT[ , MySum = sum(v)]`. What was intended was `DT[ , .(MySum = sum(v))]` or, equivalently, `DT[ , j = .(MySum = sum(v))]`.
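+
+To see both the error and the intended form (the column name `v` is illustrative):
+
+```{r}
+library(data.table)
+DT = data.table(v = 1:5)
+try(DT[ , MySum = sum(v)])   # error: unused argument (MySum = sum(v))
+DT[ , .(MySum = sum(v))]     # intended: a 1-row, 1-column data.table
+```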
+
+## "`translateCharUTF8` must be called on a `CHARSXP`"
+This error (and similar, _e.g._, "`getCharCE` must be called on a `CHARSXP`") may have nothing to do with character data or locale. Instead, it can be a symptom of an earlier memory corruption. To date these have been reproducible and fixed (quickly). Please report it to our [issues tracker](https://github.com/Rdatatable/data.table/issues).
+
+## `cbind(DT, DF)` returns a strange format, _e.g._ `Integer,5` {#cbinderror}
+
+This occurred prior to v1.6.5, for `rbind(DT, DF)` too. Please upgrade to v1.6.7 or later.
+
+## "cannot change value of locked binding for `.SD`"
+
+`.SD` is locked by design. See `?data.table`. If you'd like to manipulate `.SD` before using it, or returning it, and don't wish to modify `DT` using `:=`, then take a copy first (see `?copy`), _e.g._,
+
+```{r}
+DT = data.table(a = rep(1:3, 1:3), b = 1:6, c = 7:12)
+DT
+DT[ , { mySD = copy(.SD)
+      mySD[1, b := 99L]
+      mySD},
+    by = a]
+```
+
+## "cannot change value of locked binding for `.N`"
+
+Please upgrade to v1.8.1 or later. From this version, if `.N` is returned by `j` it is renamed to `N` to avoid any ambiguity in any subsequent grouping between the `.N` special variable and a column called `".N"`.
+
+The old behaviour can be reproduced by forcing `.N` to be called `.N`, like this :
+```{r}
+DT = data.table(a = c(1,1,2,2,2), b = c(1,2,2,2,1))
+DT
+DT[ , list(.N = .N), list(a, b)]   # show intermediate result for exposition
+cat(try(
+    DT[ , list(.N = .N), by = list(a, b)][ , unique(.N), by = a]   # compound query more typical
+, silent = TRUE))
+```
+
+If you are already running v1.8.1 or later then the error message is now more helpful than the "cannot change value of locked binding" error, as you can see above, since this vignette was produced using v1.8.1 or later.
+
+The more natural syntax now works :
+```{r}
+if (packageVersion("data.table") >= "1.8.1") {
+    DT[ , .N, by = list(a, b)][ , unique(N), by = a]
+  }
+if (packageVersion("data.table") >= "1.9.3") {
+    DT[ , .N, by = .(a, b)][ , unique(N), by = a]   # same
+}
+```
+
+# Warning messages
+## "The following object(s) are masked from `package:base`: `cbind`, `rbind`"
+
+This warning was present in v1.6.5 and v1.6.6 only, when loading the package. The motivation was to allow `cbind(DT, DF)` to work but, as it transpired, this broke (full) compatibility with package `IRanges`. Please upgrade to v1.6.7 or later.
+
+## "Coerced numeric RHS to integer to match the column's type"
+
+Hopefully, this is self-explanatory. The full message is:
+
+> Coerced numeric RHS to integer to match the column's type; may have truncated precision. Either change the column to numeric first by creating a new numeric vector length 5 (nrows of entire table) yourself and assigning that (i.e. 'replace' column), or coerce RHS to integer yourself (e.g. 1L or as.integer) to make your intent clear (and for speed). Or, set the column type correctly up front when you create the table and stick to it, please.
+
+To generate it, try :
+
+```{r}
+DT = data.table(a = 1:5, b = 1:5)
+suppressWarnings(
+DT[2, b := 6]         # works (slower) with warning
+)
+class(6)              # numeric not integer
+DT[2, b := 7L]        # works (faster) without warning
+class(7L)             # L makes it an integer
+DT[ , b := rnorm(5)]  # 'replace' integer column with a numeric column
+```
+
+## Reading data.table from RDS or RData file
+
+`*.RDS` and `*.RData` are file types which can store in-memory R objects on disk efficiently. However, storing a data.table in one of these binary files loses its column over-allocation. This isn't a big deal -- your data.table will be copied in memory on the next _by reference_ operation and a warning will be thrown. Therefore, it is recommended to call `alloc.col()` on each data.table loaded with `readRDS()` or `load()`.
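+
+A minimal sketch of the round trip (the file path and column names are invented):
+
+```{r}
+library(data.table)
+DT = data.table(a = 1:3)
+tmp = tempfile(fileext = ".rds")
+saveRDS(DT, tmp)
+DT2 = readRDS(tmp)     # over-allocation was lost on serialisation
+alloc.col(DT2)         # restore it before any := by reference
+DT2[ , b := a * 2L]    # updates without the extra copy/warning
+unlink(tmp)
+```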
+
+# General questions about the package
+
+## v1.3 appears to be missing from the CRAN archive?
+That is correct. v1.3 was available on R-Forge only. There were several large
+changes internally and these took some time to test in development.
+
+## Is data.table compatible with S-plus?
+
+Not currently.
+
+ - A few core parts of the package are written in C and use internal R functions and R structures.
+ - The package uses lexical scoping which is one of the differences between R and **S-plus** explained by [R FAQ 3.3.1](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping)
+
+## Is it available for Linux, Mac and Windows?
+Yes, for both 32-bit and 64-bit on all platforms, thanks to CRAN. There are no special or OS-specific libraries used.
+
+## I think it's great. What can I do?
+Please file suggestions, bug reports and enhancement requests on our [issues tracker](https://github.com/Rdatatable/data.table/issues). This helps make the package better.
+
+Please do star the package on [GitHub](https://github.com/Rdatatable/data.table/wiki). This helps encourage the developers and helps other R users find the package.
+
+You can submit pull requests to change the code and/or documentation yourself; see our [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/Contributing.md).
+
+## I think it's not great. How do I warn others about my experience?
+Please put your vote and comments on [Crantastic](http://crantastic.org/packages/data-table). Please make it constructive so we have a chance to improve.
+
+## I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?
+Yes, there are two options. You can post to [datatable-help](mailto:datatable-help at lists.r-forge.r-project.org). It's like r-help, but just for this package. Or the [`[data.table]` tag](https://stackoverflow.com/tags/data.table/info) on [Stack Overflow](https://stackoverflow.com/). Feel free to answer questions in those places, too.
+
+## Where are the datatable-help archives?
+The [homepage](https://github.com/Rdatatable/data.table/wiki) contains links to the archives in several formats.
+
+## I'd prefer not to post on the Issues page, can I mail just one or two people privately?
+Sure. You're more likely to get a faster answer from the Issues page or Stack Overflow, though. Further, asking publicly in those places helps build the general knowledge base.
+
+## I have created a package that uses data.table. How do I ensure my package is data.table-aware so that inheritance from `data.frame` works?
+
+Please see [this answer](http://stackoverflow.com/a/10529888/403310).
+
+
diff --git a/inst/doc/datatable-faq.Rnw b/inst/doc/datatable-faq.Rnw
deleted file mode 100644
index 097caa4..0000000
--- a/inst/doc/datatable-faq.Rnw
+++ /dev/null
@@ -1,585 +0,0 @@
-\documentclass[a4paper]{article}
-
-\usepackage[margin=3cm]{geometry}
-%%\usepackage[round]{natbib}
-\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
-
-%%\newcommand{\acronym}[1]{\textsc{#1}}
-%%\newcommand{\class}[1]{\mbox{\textsf{#1}}}
-\newcommand{\code}[1]{\mbox{\texttt{#1}}}
-\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
-\newcommand{\proglang}[1]{\textsf{#1}}
-\SweaveOpts{keep.source=TRUE}
-%% \VignetteIndexEntry{Frequently asked questions}
-
-<<echo=FALSE,results=hide>>=
-if (!exists("data.table",.GlobalEnv)) library(data.table)  # see Intro.Rnw for comments on these two lines
-rm(list=as.character(tables()$NAME),envir=.GlobalEnv)
-options(width=70)  # so lines wrap round
-@
-
-\begin{document}
-\title{FAQs about the \pkg{data.table} package in \proglang{R}}
-\date{Revised: \today\\(A later revision may be available on the \href{https://github.com/Rdatatable/data.table/wiki}{homepage})}
-\maketitle
-
-The first section, Beginner FAQs, is intended to be read in order, from start to finish.
-
-\tableofcontents
-\section{Beginner FAQs}
-
-\subsection{Why does \code{DT[,5]} return \code{5}?}
-Because by default, unlike a \code{data.frame}, the 2nd argument is an \emph{expression}
-which is evaluated within the scope of \code{DT}. 5 evaluates to 5. It is generally bad practice to refer
-to columns by number rather than name. If someone else comes along and reads your code later, they
-may have to hunt around to find out which column is number 5. Furthermore, if you or someone else
-changes the column ordering of \code{DT} higher up in your \proglang{R} program, you might get bugs if you forget to
-change all the places in your code which refer to column number 5.
-
-Say column 5 is called ''region'', just do \code{DT[,region]} instead. Notice there are no quotes around
-the column name. This is what we mean by j being evaluated within the scope of the \code{data.table}. That
-scope consists of an environment where the column names are variables.
-
-You can place \emph{any} \proglang{R} expression in \code{j}; e.g., \code{DT[,colA*colB/2]}. Further, \code{j} may be a set of \proglang{R} expressions (including calls to any \proglang{R} package) wrapped with \code{list()}, \code{.()} or an anonymous code block wrapped with braces \code{{}}. A simple example is \code{DT[,fitdistr(d1-d1,"normal")]}.
-
-Having said this, there are some circumstances where referring to a column by number is ok, such as
-a sequence of columns. In these situations just do \code{DT[,5:10,with=FALSE]} or \newline \code{DT[,c(1,4,10),with=FALSE]}.
-See \code{?data.table} for an explanation of the \code{with} argument. It lets you use \code{data.table} the same way as \code{data.frame}, when you need to.
-
-Note that \code{with()} has been a base function for a long time.  That's one reason we say \code{data.table} builds
-upon base functionality. There is little new here really, \code{data.table} is just making use of \code{with()}
-and building it into the syntax.
-
-\subsection{Why does \code{DT[,"region"]} return \code{"region"}?}
-See answer to 1.1 above. Try \code{DT[,region]} instead. Or \code{DT[,"region",with=FALSE]}.
-
-
-\subsection{Why does \code{DT[,region]} return a vector?  I'd like a 1-column \code{data.table}. There is no \code{drop} argument like I'm used to in \code{data.frame}.}
-Try \code{DT[,.(region)]} instead. \code{.()} is an alias for \code{list()} and ensures a \code{data.table} is returned.
-
-\subsection{Why does \code{DT[,x,y,z]} not work? I wanted the 3 columns \code{x},\code{y} and \code{z}.}
-The \code{j} expression is the 2nd argument. The correct way to do this is \code{DT[,.(x,y,z)]}.
-
-\subsection{I assigned a variable \code{mycol="x"} but then \code{DT[,mycol]} returns \code{"x"}. How do I get it to look up the column name contained in the \code{mycol} variable?}
-This is what we mean when we say the \code{j} expression 'sees' objects in the calling scope. The variable \code{mycol} does not exist as a column name of
-\code{DT} so \proglang{R} then looked in the calling scope and found \code{mycol} there and returned its value \code{"x"}. This is correct behaviour. Had \code{mycol} been a column name,
-then that column's data would have been returned. What you probably meant was \code{DT[,mycol,with=FALSE]}, which will return the \code{x} column's 
-data as you wanted. Alternatively, since a \code{data.table} \emph{is} a \code{list}, too, you can write \code{DT[["x"]]} or \code{DT[[mycol]]}.
-
-\subsection{Ok but I don't know the expressions in advance. How do I programatically pass them in?}
-To create expressions use the \code{quote()} function. We refer to these as \emph{quote()-ed} expressions to
-save confusion with the double quotes used to create a character vector such as \code{c("x")}. The simplest
-quote()-ed expression is just one column name :
-
-\code{q = quote(x)}
-
-\code{DT[,eval(q)]  \# returns the column x as a vector}
-
-\code{q = quote(list(x))}
-
-\code{DT[,eval(q)]  \# returns the column x as a 1-column data.table}
-\newline
-Since these are \emph{expressions}, we are not restricted to column names only :
-
-\code{q = quote(mean(x))}
-
-\code{DT[,eval(q)]  \# identical to DT[,mean(x)]}
-
-\code{q = quote(list(x,sd(y),mean(y*z)))}
-
-\code{DT[,eval(q)]  \# identical to DT[,list(x,sd(y),mean(y*z))]}
-\newline
-However, if it's just simply a vector of column names you need, it may be simpler to pass a character vector to \code{j} and use \code{with=FALSE}.
-
-To pass an expression into your own function, one idiom is as follows :
-<<>>=
-DT = as.data.table(iris)
-setkey(DT,Species)
-myfunction = function(dt, expr) {
-    e = substitute(expr)
-    dt[,eval(e),by=Species]
-}
-myfunction(DT,sum(Sepal.Width))
-@
-
-\code{quote()} and \code{eval()} are like macros in other languages. Instead of \code{j=myfunction()} (which won't work without laboriously passing in all the arguments) it's \code{j=eval(mymacro)}. This can be more efficient than a function call, and convenient. When data.table sees \code{j=eval(mymacro)} it knows to find \code{mymacro} in calling scope so as not to be tripped up if a column name happens to be called \code{mymacro}, too. 
-
-For example, let's make sure that exactly the same \code{j} is run for a set of different grouping criteria :
-
-<<>>=
-DT = as.data.table(iris)
-whatToRun = quote( .(AvgWidth = mean(Sepal.Width),
-                     MaxLength = max(Sepal.Length)) )
-DT[, eval(whatToRun), by=Species]
-DT[, eval(whatToRun), by=.(FirstLetter=substring(Species,1,1))]
-DT[, eval(whatToRun), by=.(Petal.Width=round(Petal.Width,0))]
-@
-
-\subsection{What are the benefits of being able to use column names as if they are variables inside \code{DT[...]}?}
-\code{j} doesn't have to be just column names. You can write any \proglang{R} \emph{expression} of column names directly as the \code{j}; e.g., 
-\code{DT[,mean(x*y/z)]}.  The same applies
-to \code{i}; e.g., \code{DT[x>1000, sum(y*z)]}.
-This runs the \code{j} expression on the set of rows where the \code{i} expression is true. You don't even need to return data; e.g., \code{DT[x>1000, plot(y,z)]}. Finally, you can do \code{j} by group by adding \code{by=}; e.g., \code{DT[x>1000, sum(y*z), by=w]}. This runs \code{j} for each group in column \code{w} but just over the rows where \code{x>1000}. By placing the 3 parts of the query (where, select and group by) inside the square brackets, \code{data.table} sees this query as  [...]
-
-\subsection{OK, I'm starting to see what \code{data.table} is about, but why didn't you enhance \code{data.frame} in \proglang{R}? Why does it have to be a new package?}
-As FAQ 1.1 highlights, \code{j} in \code{[.data.table} is fundamentally different from \code{j} in \code{[.data.frame}. Even something as simple as \code{DF[,1]} would break existing code in many packages and user code.  This is by design. We want it to work this way for more complicated syntax to work. There are other differences, too (see FAQ \ref{faq:SmallerDiffs}).
-
-Furthermore, \code{data.table} \emph{inherits} from \code{data.frame}. It \emph{is} a \code{data.frame}, too. A \code{data.table} can be passed to any package that only accepts \code{data.frame} and that package can use \code{[.data.frame} syntax on the \code{data.table}. 
-
-We \emph{have} proposed enhancements to \proglang{R} wherever possible, too. One of these was accepted as a new feature in \proglang{R} 2.12.0 :
-\begin{quotation}unique() and match() are now faster on character vectors where
-      all elements are in the global CHARSXP cache and have unmarked
-      encoding (ASCII).  Thanks to Matt Dowle for suggesting
-      improvements to the way the hash code is generated in unique.c.\end{quotation}
-
-A second proposal was to use memcpy in duplicate.c, which is much faster than a for loop in C. This would improve the \emph{way} that \proglang{R} copies data internally (on some measures by 13 times). The thread on r-devel is here : \url{http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html}.
-
-\subsection{Why are the defaults the way they are? Why does it work the way it does?}
-The simple answer is because the main author originally designed it for his own use. He wanted it that way. He finds it a more natural, faster way to
-write code, which also executes more quickly.
-
-\subsection{Isn't this already done by \code{with()} and \code{subset()} in base?}
-Some of the features discussed so far are, yes. The package builds upon base functionality. It does the same sorts of things but with
-less code required and executes many times faster if used correctly.
-
-\subsection{Why does \code{X[Y]} return all the columns from \code{Y} too? Shouldn't it return a subset of \code{X}?}
-This was changed in v1.5.3. \code{X[Y]} now includes \code{Y}'s non-join columns. We refer to this feature as \emph{join inherited scope} because not only are \code{X} columns available to the j expression, so are \code{Y} columns. The downside is that \code{X[Y]} is less efficient since every item of \code{Y}'s non-join columns are duplicated to match the (likely large) number of rows in \code{X} that match. We therefore strongly encourage \code{X[Y,j]} instead of \code{X[Y]}. See next FAQ.
-
-\subsection{What is the difference between \code{X[Y]} and \code{merge(X,Y)}?}
-\code{X[Y]} is a join, looking up \code{X}'s rows using \code{Y} (or \code{Y}'s key if it has one) as an index.\newline
-\code{Y[X]} is a join, looking up \code{Y}'s rows using \code{X} (or \code{X}'s key if it has one) as an index.\newline
-\code{merge(X,Y)}\footnote{Here we mean either the \code{merge} \emph{method} for \code{data.table} or the \code{merge} method for \code{data.frame} since both methods work in the same way in this respect. See \code{?merge.data.table} and FAQ 2.24 for more information about method dispatch.} does both ways at the same time. The number of rows of \code{X[Y]} and \code{Y[X]} usually differ; whereas the number of rows returned by \code{merge(X,Y)} and \code{merge(Y,X)} is the same.
-
-\emph{BUT} that misses the main point. Most tasks require something to be done on the data after a join or merge. Why merge all the columns of data, only to use a small subset of them afterwards? You may suggest \code{merge(X[,ColsNeeded1],Y[,ColsNeeded2])}, but that takes copies of the subsets of data and it requires the programmer to work out which columns are needed. \code{X[Y,j]} in data.table does all that in one step for you. When you write \code{X[Y,sum(foo*bar)]}, \code{data.tabl [...]
-
-\subsection{Anything else about \code{X[Y,sum(foo*bar)]}?}
-This behaviour changed in v1.9.4 (Sep 2014). It now does the \code{X[Y]} join and then runs \code{sum(foo*bar)} over all the rows; i.e., \code{X[Y][,sum(foo*bar)]}. It used to run \code{j} for each \emph{group} of \code{X} that each row of \code{Y} matches to. That can still be done as it's very useful but you now need to be explicit and specify \code{by=.EACHI}; i.e., \code{X[Y,sum(foo*bar),by=.EACHI]}. We call this \emph{grouping by each i}.
-For example, and making it complicated by using \emph{join inherited scope}, too :
-<<>>=
-X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
-setkey(X,grp)
-Y = data.table(c("b","c"), bar=c(4,2))
-X
-Y
-X[Y,sum(foo*bar)]
-X[Y,sum(foo*bar),by=.EACHI]
-@
-
-\subsection{That's nice. How did you manage to change it?}
-The request to change came from users. The feeling was that if a query is doing grouping then an explicit `by=` should be present for code readability reasons. An option is provided to return the old behaviour: \code{options(datatable.old.bywithoutby)}, by default FALSE. This enables upgrading to test the other new features / bug fixes in v1.9.4, with later migration of any by-without-by queries when ready (by adding \code{by=.EACHI} to them). We retained 47 pre-change tests and added th [...]
-
-Of the 66 packages on CRAN or Bioconductor that depend or import data.table at the time of releasing v1.9.4, only one was affected by the change. That could be because many packages don't have comprehensive tests, or just that grouping by each row in i wasn't being used much by downstream packages. We always test the new version with all dependent packages before release and coordinate any changes with those maintainers. So this release was quite straightforward in that regard.
-
-Another compelling reason to make the change was that previously, there was no efficient way to achieve what \code{X[Y,sum(foo*bar)]} does now. You had to write \code{X[Y][,sum(foo*bar)]}. That was suboptimal because \code{X[Y]} joined all the columns and passed them all to the second compound query without knowing that only foo and bar are needed. To solve that efficiency problem, extra programming effort was required: \code{X[Y,list(foo,bar)][,sum(foo*bar)]}.  The change to \code{by=.E [...]
-
-\section{General syntax}
-
-\subsection{How can I avoid writing a really long \code{j} expression? You've said I should use the column \emph{names}, but I've got a lot of columns.}
-When grouping, the \code{j} expression can use column names as variables, as you know, but it can also use a reserved symbol \code{.SD} which refers to the {\bf S}ubset of the \code{{\bf D}ata.table} for each group (excluding the grouping columns). So to sum up all your columns it's just \code{DT[,lapply(.SD,sum),by=grp]}. It might seem tricky, but it's fast to write and fast to run. Notice you don't have to create an anonymous \code{function}. The \code{.SD} object is efficiently implem [...]
-So please don't do this, for example, \code{DT[,sum(.SD[["sales"]]),by=grp]}. That works but is inefficient and inelegant. This is what was intended: \code{DT[,sum(sales),by=grp]} and could be 100's of times faster. If you do use all the data in \code{.SD} for each group (such as in \code{DT[,lapply(.SD,sum),by=grp]}) then that's very good usage of \code{.SD}. Also see \code{?data.table} for the \code{.SDcols} argument which allows you to specify a subset of columns for \code{.SD}.
-
-\subsection{Why is the default for \code{mult} now \code{"all"}?}
-In v1.5.3 the default was changed to \code{"all"}. When \code{i} (or \code{i}'s key if it has one) has fewer columns than \code{x}'s key, \code{mult} was already set to \code{"all"} automatically. Changing the default makes this clearer and easier for users as it came up quite often.
-
-In versions up to v1.3, \code{"all"} was slower. Internally, \code{"all"} was implemented by joining using \code{"first"}, then again from scratch using \code{"last"}, after which a diff between them was performed to work out the span of the matches in \code{x} for each row in \code{i}. Most often we join to single rows, though, where \code{"first"},\code{"last"} and \code{"all"} return the same result. We preferred maximum performance for the majority of situations so the default chosen [...]
-
-In v1.4 the binary search in C was changed to branch at the deepest level to find first and last. That branch will likely occur within the same final pages of RAM so there should no longer be a speed disadvantage in defaulting \code{mult} to \code{"all"}. We warned that the default might change and made the change in v1.5.3.
-
-A future version of \code{data.table} may allow a distinction between a key and a \emph{unique key}. Internally \code{mult="all"} would perform more like \code{mult="first"} when all \code{x}'s key columns were joined to and \code{x}'s key was a unique key. \code{data.table} would need checks on insert and update to make sure a unique key is maintained. An advantage of specifying a unique key would be that \code{data.table} would ensure no duplicates could be inserted, in addition to per [...]
-
-
-\subsection{I'm using \code{c()} in the \code{j} and getting strange results.}
-This is a common source of confusion. In \code{data.frame} you are used to, for example:
-<<>>=
-DF = data.frame(x=1:3,y=4:6,z=7:9)
-DF
-DF[,c("y","z")]
-@
-which returns the two columns. In \code{data.table} you know you can use the column names directly and might try :
-<<>>=
-DT = data.table(DF)
-DT[,c(y,z)]
-@
-but this returns one vector.  Remember that the \code{j} expression is evaluated within the environment of \code{DT} and \code{c()} returns a vector.  If 2 or more columns are required, use \code{list()} or \code{.()} instead:
-<<>>=
-DT[,.(y,z)]
-@
-\code{c()} can be useful in a \code{data.table} too, but its behaviour is different from that in \code{[.data.frame}.
-
-\subsection{I have built up a complex table with many columns.  I want to use it as a template for a new table; i.e., create a new table with no rows, but with the column names and types copied from my table. Can I do that easily?}
-Yes. If your complex table is called \code{DT}, try \code{NEWDT = DT[0]}.
-
-\subsection{Is a null data.table the same as \code{DT[0]}?}
-No. By "null data.table" we mean the result of \code{data.table(NULL)} or \code{as.data.table(NULL)}; i.e.,
-<<>>=
-data.table(NULL)
-data.frame(NULL)
-as.data.table(NULL)
-as.data.frame(NULL)
-is.null(data.table(NULL))
-is.null(data.frame(NULL))
-@
-The null \code{data.table|frame} is \code{NULL} with some attributes attached, making it not NULL anymore. In R only pure \code{NULL} is \code{NULL} as tested by \code{is.null()}. When referring to the "null data.table" we use lower case null to help distinguish from upper case \code{NULL}. To test for the null data.table, use \code{length(DT)==0} or \code{ncol(DT)==0} (\code{length} is slightly faster as it's a primitive function).
-An \emph{empty} data.table (\code{DT[0]}) has one or more columns, all of which are empty. Those empty columns still have names and types.
-<<>>=
-DT = data.table(a=1:3,b=c(4,5,6),d=c(7L,8L,9L))
-DT[0]
-sapply(DT[0],class)
-@
-
-\subsection{Why has the \code{DT()} alias been removed?}\label{faq:DTremove1}
-\code{DT} was introduced originally as a wrapper for a list of \code{j} expressions. Since \code{DT} was an alias for \code{data.table}, this was a convenient way to take care of silent recycling in cases where each item of the \code{j} list evaluated to different lengths. The alias was one reason grouping was slow, though.
-As of v1.3, \code{list()} or \code{.()} should be passed instead to the \code{j} argument. These are much faster, especially when there are many groups. Internally, this was a nontrivial change. Vector recycling is now done internally, along with several other speed enhancements for grouping.
-
-\subsection{But my code uses \code{j=DT(...)} and it works. The previous FAQ says that \code{DT()} has been removed.}\label{faq:DTremove2}
-Then you are using a version prior to 1.5.3. Prior to 1.5.3 \code{[.data.table} detected use of \code{DT()} in the \code{j} and automatically replaced it with a call to \code{list()}. This was to help the transition for existing users.
-
-\subsection{What are the scoping rules for \code{j} expressions?}
-Think of the subset as an environment where all the column names are variables. When a variable \code{foo} is used in the \code{j} of a query such as \code{X[Y,sum(foo)]}, \code{foo} is looked for in the following order :
-\begin{enumerate}
-\item The scope of \code{X}'s subset; i.e., \code{X}'s column names.
-\item The scope of each row of \code{Y}; i.e., \code{Y}'s column names (\emph{join inherited scope})
-\item The scope of the calling frame; e.g., the line that appears before the \code{data.table} query.
-\item Exercise for reader: does it then ripple up the calling frames, or go straight to \code{globalenv()}?
-\item The global environment
-\end{enumerate}
-This is \emph{lexical scoping} as explained in \href{http://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping}{R FAQ 3.3.1}.
-The environment in which the function was created is not relevant, though, because there is \emph{no function}. No anonymous \emph{function}
-is passed to the \code{j}. Instead, an anonymous \emph{body} is passed to the \code{j}; for example,
-<<>>=
-DT = data.table(x=rep(c("a","b"),c(2,3)),y=1:5)
-DT
-DT[,{z=sum(y);z+3},by=x]
-@
-Some programming languages call this a \emph{lambda}.
-
-\subsection{Can I trace the \code{j} expression as it runs through the groups?}
-Try something like this:
-<<>>=
-DT[,{
-  cat("Objects:",paste(objects(),collapse=","),"\n")
-  cat("Trace: x=",as.character(x)," y=",y,"\n")
-  sum(y)
-},by=x]
-@
-
-\subsection{Inside each group, why are the group variables length 1?}
-In the previous FAQ, \code{x} is a grouping variable
-and (as from v1.6.1) has length 1 (if inspected or used in \code{j}). It's for efficiency and convenience. Therefore, there is no difference between the following two statements:
-<<>>=
-DT[,.(g=1,h=2,i=3,j=4,repeatgroupname=x,sum(y)),by=x]
-DT[,.(g=1,h=2,i=3,j=4,repeatgroupname=x[1],sum(y)),by=x]
-@
-If you need the size of the current group, use \code{.N} rather than calling \code{length()} on any column.
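-For example, using \code{DT} from the previous FAQ :
-<<>>=
-DT[,.(count=.N, total=sum(y)),by=x]   # .N is the number of rows in the current group
-@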
-
-\subsection{Only the first 10 rows are printed, how do I print more?}
-There are two things happening here. First, if the number of rows in a \code{data.table} is large (\code{> 100} by default), then a summary of the \code{data.table} is printed to the console by default. Second, the summary of a large \code{data.table} is printed by taking the top and bottom \code{n} rows of the \code{data.table} and only printing those. Both of these parameters (when to trigger a summary and how much of a table to use as a summary) are configurable by \proglang{R}'s \cod [...]
-
-For instance, to enforce the summary of a \code{data.table} to only happen when a \code{data.table} is greater than 50 rows, you could set \code{options(datatable.print.nrows=50)}. To disable the summary-by-default completely, you could set \code{options(datatable.print.nrows=Inf)}. You could also call \code{print} directly, as in \code{print(your.data.table, nrows=Inf)}.
-
-If you want to show more than just the top (and bottom) 10 rows of a \code{data.table} summary (say you like 20), set \code{options(datatable.print.topn=20)}, for example. Again, you could also just call \code{print} directly, as in \code{print(your.data.table, topn=20)}.
-
-\subsection{With an \code{X[Y]} join, what if \code{X} contains a column called \code{"Y"}?}
-When \code{i} is a single name such as \code{Y} it is evaluated in the calling frame. In all other cases such as calls to \code{.()} or other expressions, \code{i} is evaluated within the scope of \code{X}. This facilitates easy \emph{self joins} such as \code{X[J(unique(colA)),mult="first"]}.
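-For example (a minimal sketch with illustrative column names) :
-<<>>=
-X = data.table(colA=c(1,1,2,2), v=1:4, key="colA")
-X[J(unique(colA)), mult="first"]   # self join : colA is looked up within X
-@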
-
-\subsection{\code{X[Z[Y]]} is failing because \code{X} contains a column \code{"Y"}. I'd like it to use the table \code{Y} in calling scope.}
-The \code{Z[Y]} part is not a single name so it is evaluated within the frame of \code{X} and the problem occurs. Try \code{tmp=Z[Y];X[tmp]}. This is robust to \code{X} containing a column \code{"tmp"} because \code{tmp} is a single name. If you often encounter conflicts of this type, one simple solution may be to name all tables in uppercase and all column names in lowercase, or some similar scheme.
-
-\subsection{Can you explain further why \code{data.table} is inspired by \code{A[B]} syntax in base?}
-Consider \code{A[B]} syntax using an example matrix \code{A} :
-<<>>=
-A = matrix(1:12,nrow=4)
-A
-@
-To obtain cells (1,2)=5 and (3,3)=11 many users (we believe) may try this first :
-<<>>=
-A[c(1,3),c(2,3)]
-@
-That returns the union of those rows and columns, though. To reference the cells, a 2-column matrix is required. \code{?Extract} says :
-\begin{quotation}
-When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
-\end{quotation}
-Let's try again.
-<<>>=
-B = cbind(c(1,3),c(2,3))
-B
-A[B]
-@
-A matrix is a 2-dimensional structure with row names and column names. Can we do the same with names?
-<<>>=
-rownames(A) = letters[1:4]
-colnames(A) = LETTERS[1:3]
-A
-B = cbind(c("a","c"),c("B","C"))
-A[B]
-@
-So, yes we can. Can we do the same with \code{data.frame}?
-<<>>=
-A = data.frame(A=1:4,B=letters[11:14],C=pi*1:4)
-rownames(A) = letters[1:4]
-A
-B
-A[B]
-@
-But, notice that the result was coerced to character. \proglang{R} coerced \code{A} to matrix first so that the syntax could work, but the result isn't ideal.  Let's try making \code{B} a \code{data.frame}.
-<<>>=
-B = data.frame(c("a","c"),c("B","C"))
-cat(try(A[B],silent=TRUE))
-@
-So we can't subset a \code{data.frame} by a \code{data.frame} in base R. What if we want row names and column names that aren't character but integer or float? What if we want more than 2 dimensions of mixed types? Enter \code{data.table}.
-
-Furthermore, matrices, especially sparse matrices, are often stored in a 3-column tuple: (i,j,value). This can be thought of as a key-value pair where \code{i} and \code{j} form a 2-column key. If we have more than one value, perhaps of different types, it might look like (i,j,val1,val2,val3,...). This looks very much like a \code{data.frame}. Hence \code{data.table} extends \code{data.frame} so that a \code{data.frame X} can be subset by a \code{data.frame Y}, leading to the \code{X[Y]} syntax.
-
-\subsection{Can base be changed to do this then, rather than a new package?}
-\code{data.frame} is used \emph{everywhere} and so it is very difficult to make \emph{any} changes to it.
-\code{data.table} \emph{inherits} from \code{data.frame}. It \emph{is} a \code{data.frame}, too. A \code{data.table} \emph{can} be passed to any package that \emph{only} accepts \code{data.frame}. When that package uses \code{[.data.frame} syntax on the \code{data.table}, it works. It works because \code{[.data.table} looks to see where it was called from. If it was called from such a package, \code{[.data.table} diverts to \code{[.data.frame}.
-
-\subsection{I've heard that \code{data.table} syntax is analogous to SQL.}
-Yes :
-\begin{itemize}
-\item{\code{i}  <==>  where}
-\item{\code{j}  <==>  select}
-\item{\code{:=}  <==>  update}
-\item{\code{by}  <==>  group by}
-\item{\code{i}  <==>  order by (in compound syntax)}
-\item{\code{i}  <==>  having (in compound syntax)}
-\item{\code{nomatch=NA}  <==>  outer join}
-\item{\code{nomatch=0}  <==>  inner join}
-\item{\code{mult="first"|"last"}  <==>  N/A because SQL is inherently unordered}
-\item{\code{roll=TRUE}  <==>  N/A because SQL is inherently unordered}
-\end{itemize}
-The general form is : \newline
-\code{\hspace*{2cm}DT[where,select|update,group by][order by][...] ... [...]}
-\newline\newline
-A key advantage of column vectors in \proglang{R} is that they are \emph{ordered}, unlike SQL\footnote{It may be a surprise to learn that \code{select top 10 * from ...} does \emph{not} reliably return the same rows over time in SQL. You do need to include an \code{order by} clause, or use a clustered index to guarantee row order; i.e., SQL is inherently unordered.}. We can use ordered functions in \code{data.table} queries such as \code{diff()} and we can use \emph{any} \proglang{R} fun [...]
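-For example, a compound query reading much like the general form above (a minimal sketch with illustrative data) :
-<<>>=
-DT = data.table(x=c("b","b","a","a"), v=1:4)
-DT[v>1, .(sum_v=sum(v)), by=x][order(x)]   # where, select, group by, then order by
-@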
-
-
-\subsection{What are the smaller syntax differences between \code{data.frame} and \code{data.table}?}\label{faq:SmallerDiffs}
-\begin{itemize}
-\item{\code{DT[3]} refers to the 3rd row, but \code{DF[3]} refers to the 3rd column}
-\item{\code{DT[3,]} == \code{DT[3]}, but \code{DF[,3]} == \code{DF[3]} (somewhat confusingly)}
-\item{For this reason we say the comma is \emph{optional} in \code{DT}, but not optional in \code{DF}}
-\item{\code{DT[[3]]} == \code{DF[3]} == \code{DF[[3]]}}
-\item{\code{DT[i,]} where \code{i} is a single integer returns a single row, just like \code{DF[i,]}, but unlike a matrix single row subset which returns a vector.}
-\item{\code{DT[,j,with=FALSE]} where \code{j} is a single integer returns a one column \code{data.table}, unlike \code{DF[,j]} which returns a vector by default}
-\item{\code{DT[,"colA",with=FALSE][[1]]} == \code{DF[,"colA"]}.}
-\item{\code{DT[,colA]} == \code{DF[,"colA"]}}
-\item{\code{DT[,list(colA)]} == \code{DF[,"colA",drop=FALSE]}}
-\item{\code{DT[NA]} returns 1 row of \code{NA}, but \code{DF[NA]} returns a copy of \code{DF} containing
-  \code{NA} throughout. The symbol \code{NA} is type logical in \proglang{R} and
-  is therefore recycled by \code{[.data.frame}. The intention was probably \code{DF[NA\_integer\_]}.
-  \code{[.data.table} does this automatically for convenience.}
-\item{\code{DT[c(TRUE,NA,FALSE)]} treats the \code{NA} as \code{FALSE}, but \code{DF[c(TRUE,NA,FALSE)]} returns
-  \code{NA} rows for each \code{NA}}
-\item{\code{DT[ColA==ColB]} is simpler than \code{DF[!is.na(ColA) \& !is.na(ColB) \& ColA==ColB,]}}
-\item{\code{data.frame(list(1:2,"k",1:4))} creates 3 columns, \code{data.table} creates one \code{list} column.}
-\item{\code{check.names} is by default \code{TRUE} in \code{data.frame} but \code{FALSE} in \code{data.table}, for convenience.}
-\item{\code{stringsAsFactors} is by default \code{TRUE} in \code{data.frame} but \code{FALSE} in \code{data.table}, for efficiency. Since a global string cache was added to R, character items are pointers to the single cached string, so there is no longer a performance benefit to converting to factor.}
-\item{Atomic vectors in \code{list} columns are collapsed when printed using ", " in \code{data.frame}, but "," in \code{data.table} with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.}
-\end{itemize}
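-A couple of the differences above, run (a minimal sketch) :
-<<>>=
-DF = data.frame(a=1:3, b=4:6, d=7:9)
-DT = data.table(DF)
-DF[3]                  # 3rd column
-DT[3]                  # 3rd row
-DT[c(TRUE,NA,FALSE)]   # the NA is treated as FALSE
-@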
-In \code{[.data.frame} we very often set \code{drop=FALSE}. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column \code{data.frame}. In \code{[.data.table} we took the opportunity to make it consistent and drop \code{drop}.
-\newline\newline
-When a \code{data.table} is passed to a \code{data.table}-unaware package, that package is not concerned with any of these differences; it just works.
-
-\subsection{I'm using \code{j} for its side effect only, but I'm still getting data returned. How do I stop that?}
-In this case \code{j} can be wrapped with \code{invisible()}; e.g., \code{DT[,invisible(hist(colB)),by=colA]}\footnote{\code{hist()} returns the breakpoints in addition to plotting to the graphics device}.
-
-\subsection{Why does \code{[.data.table} now have a \code{drop} argument from v1.5?}
-So that \code{data.table} can inherit from \code{data.frame} without using \code{\dots}. If we used \code{\dots} then invalid argument names would not be caught.
-
-The \code{drop} argument is never used by \code{[.data.table}. It is a placeholder for non \code{data.table} aware packages when they use the \code{[.data.frame} syntax directly on a \code{data.table}.
-
-\subsection{Rolling joins are cool and very fast! Was that hard to program?}
-The prevailing row on or before the \code{i} row is the final row the binary search tests anyway. So \code{roll=TRUE} is essentially just a switch in the binary search C code to return that row.
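-For example (a minimal sketch with illustrative data) :
-<<>>=
-DT = data.table(t=c(1L,3L,7L), v=c(10,20,30), key="t")
-DT[J(5L), roll=TRUE]   # no row at t==5, so the prevailing row (t==3) rolls forward
-@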
-
-\subsection{Why does \code{DT[i,col:=value]} return the whole of \code{DT}? I expected either no visible value (consistent with \code{<-}), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.}
-This has changed in v1.8.3 to meet your expectations. Please upgrade.
-The whole of \code{DT} is returned (now invisibly) so that compound syntax can work; e.g., \code{DT[i,done:=TRUE][,sum(done)]}. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using \code{options(datatable.verbose=TRUE)}.
-
-\subsection{Ok, thanks. What was so difficult about the result of \code{DT[i,col:=value]} being returned invisibly?}
-\proglang{R} internally forces visibility on for \code{[}. The value of FunTab's eval column (see src/main/names.c) for \code{[} is 0 meaning force \code{R\_Visible} on (see R-Internals section 1.6). Therefore, when we tried \code{invisible()} or setting \code{R\_Visible} to 0 directly ourselves, \code{eval} in src/main/eval.c would force it on again.
-
-To solve this problem, the key was to stop trying to stop the print method running after a \code{:=}. Instead, inside \code{:=} we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.
-
-\subsection{I've noticed that \code{base::cbind.data.frame} (and \code{base::rbind.data.frame}) appear to be changed by \code{data.table}. How is this possible? Why?}
-It is a temporary, last resort solution until we discover a better way to solve the problems listed below. Essentially, the issue is that \code{data.table} inherits from \code{data.frame}, \emph{and}, \code{base::cbind} and \code{base::rbind} (uniquely) do their own S3 dispatch internally as documented by \code{?cbind}. The change is adding one \code{for} loop to the start of each function directly in base; e.g.,
-<<>>=
-base::cbind.data.frame
-@
-That modification is made dynamically; i.e., the base definition of \code{cbind.data.frame} is fetched, the \code{for} loop added to the beginning and then assigned back to base. This solution is intended to be robust to different definitions of \code{base::cbind.data.frame} in different versions of \proglang{R}, including unknown future changes. Again, it is a last resort until a better solution is known or made available. The competing requirements are :
-\begin{itemize}
-\item \code{cbind(DT,DF)} needs to work. Defining \code{cbind.data.table} doesn't work because \code{base::cbind} does its own S3 dispatch and requires that the \emph{first} \code{cbind} method for each object it is passed is \emph{identical}. This is not true in \code{cbind(DT,DF)} because the first method for \code{DT} is \code{cbind.data.table} but the first method for \code{DF} is \code{cbind.data.frame}. \code{base::cbind} then falls through to its internal \code{bind} code which ap [...]
-\item This naturally leads to trying to mask \code{cbind.data.frame} instead. Since a \code{data.table} is a \code{data.frame}, \code{cbind} would find the same method for both \code{DT} and \code{DF}. However, this doesn't work either because \code{base::cbind} appears to find methods in \code{base} first; i.e., \code{base::cbind.data.frame} isn't maskable. This is reproducible as follows :
-\end{itemize}
-<<>>=
-foo = data.frame(a=1:3)
-cbind.data.frame = function(...)cat("Not printed\n")
-cbind(foo)
-@
-<<echo=FALSE>>=
-rm("cbind.data.frame")
-@
-\begin{itemize}
-\item Finally, we tried masking \code{cbind} itself (v1.6.5 and v1.6.6). This allowed \code{cbind(DT,DF)} to work, but introduced compatibility issues with package IRanges, since IRanges also masks \code{cbind}. It worked if IRanges was lower on the search() path than \code{data.table}, but if IRanges was higher then \code{data.table}'s \code{cbind} would never be called and the strange looking matrix output occurs again (FAQ \ref{faq:cbinderror}).
-\end{itemize}
-If you know of a better solution, that still solves all the issues above, then please let us know and we'll gladly change it.
-
-\subsection{I've read about method dispatch (e.g. \code{merge} may or may not dispatch to \code{merge.data.table}) but \emph{how} does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when?}
-This comes up quite a lot, but it's really earth-shatteringly simple. A function such as \code{merge} is \emph{generic} if it consists of a call to \code{UseMethod}. When people discuss whether or not a function is \emph{generic}, they are merely typing the function name without \code{()} afterwards, looking at the program code inside it; if they see a call to \code{UseMethod} then it is \emph{generic}.  What does \code{UseMethod} do? It literally slaps the function [...]
-
-You might now ask: where is this documented in R? Answer: it's quite clear, but, you need to first know to look in \code{?UseMethod} and \emph{that} help file contains : 
-
- "When a function calling UseMethod('fun') is applied to an object with class attribute c('first', 'second'), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used, if it exists, or an error results."
-
-Happily, an internet search for "How does R method dispatch work" (at the time of writing) returns the \code{?UseMethod} help page as the top link. Admittedly, other links rapidly descend into the intricacies of S3 vs S4, internal generics and so on.
-
-However, features like basic S3 dispatch (pasting the function name together with the class name) are why some R folk love R. It's so simple. No complicated registration or signature is required. There isn't much to learn. To create the \code{merge} method for \code{data.table}, all that was required, literally, was to create a function called \code{merge.data.table}.
-
-\section{Questions relating to compute time}
-
-\subsection{I have 20 columns and a large number of rows. Why is an expression of one column so quick?}
-Several reasons:
-\begin{itemize}
-\item Only that column is grouped, the other 19 are ignored because \code{data.table} inspects the \code{j} expression and realises it doesn't use the other columns.
-\item One memory allocation is made for the largest group only, then that memory is re-used for the other groups. There is very little garbage to collect.
-\item \proglang{R} is an in-memory column store; i.e., the columns are contiguous in RAM. Page fetches from RAM into L2 cache are minimised.
-\end{itemize}
-
-\subsection{I don't have a key on a large table, but grouping is still really quick. Why is that?}
-\code{data.table} uses radix sorting. This is significantly faster than other sort algorithms. See our presentations on our homepage for more information.
-
-This is also one reason why \code{setkey()} is quick.
-
-When no key is set, or we group in a different order from that of the key, we call it an \emph{ad hoc by}.
-
-\subsection{Why is grouping by columns in the key faster than an ad hoc by?}
-Because each group is contiguous in RAM, thereby minimising page fetches and memory can be
-copied in bulk (memcpy in C) rather than looping in C.
-
-\section{Error messages}
-\subsection{\code{Could not find function "DT"}}
-See FAQ \ref{faq:DTremove1} and FAQ \ref{faq:DTremove2}.
-
-\subsection{\code{unused argument(s) (MySum = sum(v))}}
-This error is generated by \code{DT[,MySum=sum(v)]}. \code{DT[,.(MySum=sum(v))]} was intended, or \code{DT[,j=.(MySum=sum(v))]}.
-
-\subsection{\code{'translateCharUTF8' must be called on a CHARSXP}}
-This error (and similar; e.g., \code{'getCharCE' must be called on a CHARSXP}) may have nothing to do with character data or locale. Instead, this can be a symptom of an earlier memory corruption. To date these have been reproducible and fixed (quickly). Please report it to datatable-help.
-
-\subsection{\code{cbind(DT,DF) returns a strange format e.g. 'Integer,5'}} \label{faq:cbinderror}
-This occurs prior to v1.6.5, for \code{rbind(DT,DF)} too. Please upgrade to v1.6.7 or later.
-
-\subsection{\code{cannot change value of locked binding for '.SD'}}
-\code{.SD} is locked by design. See \code{?data.table}. If you'd like to manipulate \code{.SD} before using it, or returning it, and don't wish to modify \code{DT} using \code{:=}, then take a copy first (see \code{?copy}); e.g.,
-<<>>=
-DT = data.table(a=rep(1:3,1:3),b=1:6,c=7:12)
-DT
-DT[,{ mySD = copy(.SD)
-      mySD[1,b:=99L]
-      mySD },
-    by=a]
-@
-
-\subsection{\code{cannot change value of locked binding for '.N'}}
-Please upgrade to v1.8.1 or later. From this version, if \code{.N} is returned by \code{j} it is renamed to \code{N} to avoid any ambiguity in any subsequent grouping between the \code{.N} special variable and a column called \code{".N"}.
-The old behaviour can be reproduced by forcing \code{.N} to be called \code{.N}, like this :
-<<>>=
-DT = data.table(a=c(1,1,2,2,2),b=c(1,2,2,2,1))
-DT
-DT[,list(.N=.N),list(a,b)]   # show intermediate result for exposition
-cat(try(
-    DT[,list(.N=.N),by=list(a,b)][,unique(.N),by=a]   # compound query more typical
-,silent=TRUE))
-@
-If you are already running v1.8.1 or later then the error message is now more helpful than the \code{cannot change value of locked binding} error, as you can see above, since this vignette was produced using v1.8.1 or later.
-The more natural syntax now works :
-<<>>=
-if (packageVersion("data.table") >= "1.8.1") {
-    DT[,.N,by=list(a,b)][,unique(N),by=a]
-}
-if (packageVersion("data.table") >= "1.9.3") {
-    DT[,.N,by=.(a,b)][,unique(N),by=a]   # same
-}
-@
-
-\section{Warning messages}
-\subsection{\code{The following object(s) are masked from 'package:base': cbind, rbind}}
-This warning was present in v1.6.5 and v.1.6.6 only, when loading the package. The motivation was to allow \code{cbind(DT,DF)} to work, but as it transpired, broke (full) compatibility with package IRanges. Please upgrade to v1.6.7 or later. 
-
-\subsection{\code{Coerced numeric RHS to integer to match the column's type}}
-Hopefully, this is self-explanatory. The full message is :\newline
-
-\code{Coerced numeric RHS to integer to match the column's type; may have truncated}\newline
-\code{precision. Either change the column to numeric first by creating a new numeric}\newline
-\code{vector length 5 (nrows of entire table) yourself and assigning that (i.e. }\newline
-\code{'replace' column), or coerce RHS to integer yourself (e.g. 1L or as.integer)}\newline
-\code{to make your intent clear (and for speed). Or, set the column type correctly}\newline
-\code{up front when you create the table and stick to it, please.}\newline
-
-To generate it, try :
-<<>>=
-DT = data.table(a=1:5,b=1:5)
-suppressWarnings(
-DT[2,b:=6]        # works (slower) with warning
-)
-class(6)          # numeric not integer
-DT[2,b:=7L]       # works (faster) without warning
-class(7L)         # L makes it an integer
-DT[,b:=rnorm(5)]  # 'replace' integer column with a numeric column
-@
-
-\section{General questions about the package}
-
-\subsection{v1.3 appears to be missing from the CRAN archive?}
-That is correct. v1.3 was available on R-Forge only. There were several large
-changes internally and these took some time to test in development.
-
-\subsection{Is \code{data.table} compatible with S-plus?}
-Not currently.
-\begin{itemize}
-\item A few core parts of the package are written in C and use internal \proglang{R} functions and \proglang{R} structures.
-\item The package uses lexical scoping which is one of the differences between \proglang{R} and \proglang{S-plus} explained by
-\href{http://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping}{R FAQ 3.3.1}.
-\end{itemize}
-
-\subsection{Is it available for Linux, Mac and Windows?}
-Yes, for both 32-bit and 64-bit on all platforms. Thanks to CRAN. There are no special or OS-specific libraries used.
-
-\subsection{I think it's great. What can I do?}
-Please file suggestions, bug reports and enhancement requests on \href{https://github.com/Rdatatable/data.table/issues}{GitHub}. 
-This helps make the package better.
-
-Please do vote for the package on \href{http://crantastic.org/packages/data-table}{Crantastic}. This helps encourage the developers and helps other \proglang{R} users find the package. If you have time to write a comment too, that can help others in the community. Simply clicking that you use the package, though, is much appreciated.
-
-You can submit pull requests to change the code and/or documentation yourself.
-
-\subsection{I think it's not great. How do I warn others about my experience?}
-Please put your vote and comments on \href{http://crantastic.org/packages/data-table}{Crantastic}. Please make it constructive so we have a chance to improve.
-
-\subsection{I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?}
-Yes, there are two options. You can post to \href{mailto:datatable-help at lists.r-forge.r-project.org}{datatable-help}. It's like r-help, but just for this package. Or the \href{http://stackoverflow.com/questions/tagged/data.table}{\code{data.table} tag on Stack Overflow}. Feel free to answer questions in those places, too.
-
-\subsection{Where are the datatable-help archives?}
-The \href{https://github.com/Rdatatable/data.table/wiki}{homepage} contains links to the archives in several formats.
-
-\subsection{I'd prefer not to contact datatable-help, can I mail just one or two people privately?}
-Sure. You're more likely to get a faster answer from datatable-help or Stack Overflow, though. Asking publicly in those places helps build the knowledge base.
-
-\subsection{I have created a package that depends on \code{data.table}. How do I ensure my package is \code{data.table}-aware so that inheritance from \code{data.frame} works?}
-Either i) include \code{data.table} in the \code{Depends:} field of your DESCRIPTION file, or ii) include \code{data.table} in the \code{Imports:} field of your DESCRIPTION file AND \code{import(data.table)} in your NAMESPACE file.
-
-\subsection{Why is this FAQ in pdf format? Can it be moved to HTML?}
-Yes we'd like to move it to a HTML vignette. Just haven't got to that yet. 
-The benefits of vignettes (rather than a wiki) include the following:
-\begin{itemize}
-\item We include \proglang{R} code in the vignettes. This code is \emph{actually run} when the file is created, not copy and pasted.
-\item This document is \emph{reproducible}. Grab the .Rnw and you can run it yourself.
-\item CRAN checks the package (including running vignettes) every night on Linux, Mac and Windows, both 32bit and 64bit. Results are posted to \url{http://cran.r-project.org/web/checks/check_results_data.table.html}. Included there are results from r-devel; i.e., not yet released R. That serves as a very useful early warning system for any potential future issues as \proglang{R} itself develops.
-\item This file is bound into each version of the package. The package is not accepted on CRAN unless this file passes checks. Each version of the package will have its own FAQ file which will be relevant for that version. Contrast this to a single website, which can be ambiguous if the answer depends on the version.
-\item You can open it offline at your \proglang{R} prompt using \code{vignette()}.
-\item You can extract the code from the document and play with it using\newline \code{edit(vignette("datatable-faq"))}.
-\end{itemize}
-
-\end{document}
-
-
diff --git a/inst/doc/datatable-faq.html b/inst/doc/datatable-faq.html
new file mode 100644
index 0000000..472b175
--- /dev/null
+++ b/inst/doc/datatable-faq.html
@@ -0,0 +1,808 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+<meta name="date" content="2016-12-02" />
+
+<title>Frequently Asked Questions about data.table</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+div.sourceCode { overflow-x: auto; }
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+</style>
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Frequently Asked Questions about data.table</h1>
+<h4 class="date"><em>2016-12-02</em></h4>
+
+
+<div id="TOC">
+<ul>
+<li><a href="#beginner-faqs"><span class="toc-section-number">1</span> Beginner FAQs</a><ul>
+<li><a href="#j-num"><span class="toc-section-number">1.1</span> Why do <code>DT[ , 5]</code> and <code>DT[2, 5]</code> return a 1-column data.table rather than vectors like <code>data.frame</code>?</a></li>
+<li><a href="#why-does-dtregion-return-a-1-column-data.table-rather-than-a-vector"><span class="toc-section-number">1.2</span> Why does <code>DT[,"region"]</code> return a 1-column data.table rather than a vector?</a></li>
+<li><a href="#why-does-dt-region-return-a-vector-for-the-region-column-id-like-a-1-column-data.table."><span class="toc-section-number">1.3</span> Why does <code>DT[, region]</code> return a vector for the “region” column? I’d like a 1-column data.table.</a></li>
+<li><a href="#why-does-dt-x-y-z-not-work-i-wanted-the-3-columns-xy-and-z."><span class="toc-section-number">1.4</span> Why does <code>DT[ , x, y, z]</code> not work? I wanted the 3 columns <code>x</code>,<code>y</code> and <code>z</code>.</a></li>
+<li><a href="#i-assigned-a-variable-mycol-x-but-then-dt-mycol-returns-x.-how-do-i-get-it-to-look-up-the-column-name-contained-in-the-mycol-variable"><span class="toc-section-number">1.5</span> I assigned a variable <code>mycol = "x"</code> but then <code>DT[ , mycol]</code> returns <code>"x"</code>. How do I get it to look up the column name contained in the <code>mycol</code> variable?</a></li>
+<li><a href="#what-are-the-benefits-of-being-able-to-use-column-names-as-if-they-are-variables-inside-dt..."><span class="toc-section-number">1.6</span> What are the benefits of being able to use column names as if they are variables inside <code>DT[...]</code>?</a></li>
+<li><a href="#ok-im-starting-to-see-what-data.table-is-about-but-why-didnt-you-just-enhance-data.frame-in-r-why-does-it-have-to-be-a-new-package"><span class="toc-section-number">1.7</span> OK, I’m starting to see what data.table is about, but why didn’t you just enhance <code>data.frame</code> in R? Why does it have to be a new package?</a></li>
+<li><a href="#why-are-the-defaults-the-way-they-are-why-does-it-work-the-way-it-does"><span class="toc-section-number">1.8</span> Why are the defaults the way they are? Why does it work the way it does?</a></li>
+<li><a href="#isnt-this-already-done-by-with-and-subset-in-base"><span class="toc-section-number">1.9</span> Isn’t this already done by <code>with()</code> and <code>subset()</code> in <code>base</code>?</a></li>
+<li><a href="#why-does-xy-return-all-the-columns-from-y-too-shouldnt-it-return-a-subset-of-x"><span class="toc-section-number">1.10</span> Why does <code>X[Y]</code> return all the columns from <code>Y</code> too? Shouldn’t it return a subset of <code>X</code>?</a></li>
+<li><a href="#MergeDiff"><span class="toc-section-number">1.11</span> What is the difference between <code>X[Y]</code> and <code>merge(X, Y)</code>?</a></li>
+<li><a href="#anything-else-about-xy-sumfoobar"><span class="toc-section-number">1.12</span> Anything else about <code>X[Y, sum(foo*bar)]</code>?</a></li>
+<li><a href="#thats-nice.-how-did-you-manage-to-change-it-given-that-users-depended-on-the-old-behaviour"><span class="toc-section-number">1.13</span> That’s nice. How did you manage to change it given that users depended on the old behaviour?</a></li>
+</ul></li>
+<li><a href="#general-syntax"><span class="toc-section-number">2</span> General Syntax</a><ul>
+<li><a href="#how-can-i-avoid-writing-a-really-long-j-expression-youve-said-that-i-should-use-the-column-names-but-ive-got-a-lot-of-columns."><span class="toc-section-number">2.1</span> How can I avoid writing a really long <code>j</code> expression? You’ve said that I should use the column <em>names</em>, but I’ve got a lot of columns.</a></li>
+<li><a href="#why-is-the-default-for-mult-now-all"><span class="toc-section-number">2.2</span> Why is the default for <code>mult</code> now <code>"all"</code>?</a></li>
+<li><a href="#im-using-c-in-j-and-getting-strange-results."><span class="toc-section-number">2.3</span> I’m using <code>c()</code> in <code>j</code> and getting strange results.</a></li>
+<li><a href="#i-have-built-up-a-complex-table-with-many-columns.-i-want-to-use-it-as-a-template-for-a-new-table-i.e.-create-a-new-table-with-no-rows-but-with-the-column-names-and-types-copied-from-my-table.-can-i-do-that-easily"><span class="toc-section-number">2.4</span> I have built up a complex table with many columns. I want to use it as a template for a new table; <em>i.e.</em>, create a new table with no rows, but with the column names and types copied from my table. Can I do that  [...]
+<li><a href="#is-a-null-data.table-the-same-as-dt0"><span class="toc-section-number">2.5</span> Is a null data.table the same as <code>DT[0]</code>?</a></li>
+<li><a href="#DTremove1"><span class="toc-section-number">2.6</span> Why has the <code>DT()</code> alias been removed?</a></li>
+<li><a href="#DTremove2"><span class="toc-section-number">2.7</span> But my code uses <code>j = DT(...)</code> and it works. The previous FAQ says that <code>DT()</code> has been removed.</a></li>
+<li><a href="#what-are-the-scoping-rules-for-j-expressions"><span class="toc-section-number">2.8</span> What are the scoping rules for <code>j</code> expressions?</a></li>
+<li><a href="#j-trace"><span class="toc-section-number">2.9</span> Can I trace the <code>j</code> expression as it runs through the groups?</a></li>
+<li><a href="#inside-each-group-why-are-the-group-variables-length-1"><span class="toc-section-number">2.10</span> Inside each group, why are the group variables length-1?</a></li>
+<li><a href="#only-the-first-10-rows-are-printed-how-do-i-print-more"><span class="toc-section-number">2.11</span> Only the first 10 rows are printed, how do I print more?</a></li>
+<li><a href="#with-an-xy-join-what-if-x-contains-a-column-called-y"><span class="toc-section-number">2.12</span> With an <code>X[Y]</code> join, what if <code>X</code> contains a column called <code>"Y"</code>?</a></li>
+<li><a href="#xzy-is-failing-because-x-contains-a-column-y.-id-like-it-to-use-the-table-y-in-calling-scope."><span class="toc-section-number">2.13</span> <code>X[Z[Y]]</code> is failing because <code>X</code> contains a column <code>"Y"</code>. I’d like it to use the table <code>Y</code> in calling scope.</a></li>
+<li><a href="#can-you-explain-further-why-data.table-is-inspired-by-ab-syntax-in-base"><span class="toc-section-number">2.14</span> Can you explain further why data.table is inspired by <code>A[B]</code> syntax in <code>base</code>?</a></li>
+<li><a href="#can-base-be-changed-to-do-this-then-rather-than-a-new-package"><span class="toc-section-number">2.15</span> Can base be changed to do this then, rather than a new package?</a></li>
+<li><a href="#ive-heard-that-data.table-syntax-is-analogous-to-sql."><span class="toc-section-number">2.16</span> I’ve heard that data.table syntax is analogous to SQL.</a></li>
+<li><a href="#SmallerDiffs"><span class="toc-section-number">2.17</span> What are the smaller syntax differences between <code>data.frame</code> and data.table</a></li>
+<li><a href="#im-using-j-for-its-side-effect-only-but-im-still-getting-data-returned.-how-do-i-stop-that"><span class="toc-section-number">2.18</span> I’m using <code>j</code> for its side effect only, but I’m still getting data returned. How do I stop that?</a></li>
+<li><a href="#why-does-.data.table-now-have-a-drop-argument-from-v1.5"><span class="toc-section-number">2.19</span> Why does <code>[.data.table</code> now have a <code>drop</code> argument from v1.5?</a></li>
+<li><a href="#rolling-joins-are-cool-and-very-fast-was-that-hard-to-program"><span class="toc-section-number">2.20</span> Rolling joins are cool and very fast! Was that hard to program?</a></li>
+<li><a href="#why-does-dti-col-value-return-the-whole-of-dt-i-expected-either-no-visible-value-consistent-with---or-a-message-or-return-value-containing-how-many-rows-were-updated.-it-isnt-obvious-that-the-data-has-indeed-been-updated-by-reference."><span class="toc-section-number">2.21</span> Why does <code>DT[i, col := value]</code> return the whole of <code>DT</code>? I expected either no visible value (consistent with <code><-</code>), or a message or return value containing how m [...]
+<li><a href="#ok-thanks.-what-was-so-difficult-about-the-result-of-dti-col-value-being-returned-invisibly"><span class="toc-section-number">2.22</span> OK, thanks. What was so difficult about the result of <code>DT[i, col := value]</code> being returned invisibly?</a></li>
+<li><a href="#why-do-i-have-to-type-dt-sometimes-twice-after-using-to-print-the-result-to-console"><span class="toc-section-number">2.23</span> Why do I have to type <code>DT</code> sometimes twice after using <code>:=</code> to print the result to console?</a></li>
+<li><a href="#ive-noticed-that-basecbind.data.frame-and-baserbind.data.frame-appear-to-be-changed-by-data.table.-how-is-this-possible-why"><span class="toc-section-number">2.24</span> I’ve noticed that <code>base::cbind.data.frame</code> (and <code>base::rbind.data.frame</code>) appear to be changed by data.table. How is this possible? Why?</a></li>
+<li><a href="#r-dispatch"><span class="toc-section-number">2.25</span> I’ve read about method dispatch (<em>e.g.</em> <code>merge</code> may or may not dispatch to <code>merge.data.table</code>) but <em>how</em> does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when?</a></li>
+</ul></li>
+<li><a href="#questions-relating-to-compute-time"><span class="toc-section-number">3</span> Questions relating to compute time</a><ul>
+<li><a href="#i-have-20-columns-and-a-large-number-of-rows.-why-is-an-expression-of-one-column-so-quick"><span class="toc-section-number">3.1</span> I have 20 columns and a large number of rows. Why is an expression of one column so quick?</a></li>
+<li><a href="#i-dont-have-a-key-on-a-large-table-but-grouping-is-still-really-quick.-why-is-that"><span class="toc-section-number">3.2</span> I don’t have a <code>key</code> on a large table, but grouping is still really quick. Why is that?</a></li>
+<li><a href="#why-is-grouping-by-columns-in-the-key-faster-than-an-ad-hoc-by"><span class="toc-section-number">3.3</span> Why is grouping by columns in the key faster than an <em>ad hoc</em> <code>by</code>?</a></li>
+<li><a href="#what-are-primary-and-secondary-indexes-in-data.table"><span class="toc-section-number">3.4</span> What are primary and secondary indexes in data.table?</a></li>
+</ul></li>
+<li><a href="#error-messages"><span class="toc-section-number">4</span> Error messages</a><ul>
+<li><a href="#could-not-find-function-dt"><span class="toc-section-number">4.1</span> “Could not find function <code>DT</code>”</a></li>
+<li><a href="#unused-arguments-mysum-sumv"><span class="toc-section-number">4.2</span> “unused argument(s) (<code>MySum = sum(v)</code>)”</a></li>
+<li><a href="#translatecharutf8-must-be-called-on-a-charsxp"><span class="toc-section-number">4.3</span> “<code>translateCharUTF8</code> must be called on a <code>CHARSXP</code>”</a></li>
+<li><a href="#cbinddt-df-returns-a-strange-format-e.g.-integer5"><span class="toc-section-number">4.4</span> <code>cbind(DT, DF)</code> returns a strange format, <em>e.g.</em> <code id="cbinderror">Integer,5</code></a></li>
+<li><a href="#cannot-change-value-of-locked-binding-for-.sd"><span class="toc-section-number">4.5</span> “cannot change value of locked binding for <code>.SD</code>”</a></li>
+<li><a href="#cannot-change-value-of-locked-binding-for-.n"><span class="toc-section-number">4.6</span> “cannot change value of locked binding for <code>.N</code>”</a></li>
+</ul></li>
+<li><a href="#warning-messages"><span class="toc-section-number">5</span> Warning messages</a><ul>
+<li><a href="#the-following-objects-are-masked-from-packagebase-cbind-rbind"><span class="toc-section-number">5.1</span> “The following object(s) are masked from <code>package:base</code>: <code>cbind</code>, <code>rbind</code>”</a></li>
+<li><a href="#coerced-numeric-rhs-to-integer-to-match-the-columns-type"><span class="toc-section-number">5.2</span> “Coerced numeric RHS to integer to match the column’s type”</a></li>
+<li><a href="#reading-data.table-from-rds-or-rdata-file"><span class="toc-section-number">5.3</span> Reading data.table from RDS or RData file</a></li>
+</ul></li>
+<li><a href="#general-questions-about-the-package"><span class="toc-section-number">6</span> General questions about the package</a><ul>
+<li><a href="#v1.3-appears-to-be-missing-from-the-cran-archive"><span class="toc-section-number">6.1</span> v1.3 appears to be missing from the CRAN archive?</a></li>
+<li><a href="#is-data.table-compatible-with-s-plus"><span class="toc-section-number">6.2</span> Is data.table compatible with S-plus?</a></li>
+<li><a href="#is-it-available-for-linux-mac-and-windows"><span class="toc-section-number">6.3</span> Is it available for Linux, Mac and Windows?</a></li>
+<li><a href="#i-think-its-great.-what-can-i-do"><span class="toc-section-number">6.4</span> I think it’s great. What can I do?</a></li>
+<li><a href="#i-think-its-not-great.-how-do-i-warn-others-about-my-experience"><span class="toc-section-number">6.5</span> I think it’s not great. How do I warn others about my experience?</a></li>
+<li><a href="#i-have-a-question.-i-know-the-r-help-posting-guide-tells-me-to-contact-the-maintainer-not-r-help-but-is-there-a-larger-group-of-people-i-can-ask"><span class="toc-section-number">6.6</span> I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?</a></li>
+<li><a href="#where-are-the-datatable-help-archives"><span class="toc-section-number">6.7</span> Where are the datatable-help archives?</a></li>
+<li><a href="#id-prefer-not-to-post-on-the-issues-page-can-i-mail-just-one-or-two-people-privately"><span class="toc-section-number">6.8</span> I’d prefer not to post on the Issues page, can I mail just one or two people privately?</a></li>
+<li><a href="#i-have-created-a-package-that-uses-data.table.-how-do-i-ensure-my-package-is-data.table-aware-so-that-inheritance-from-data.frame-works"><span class="toc-section-number">6.9</span> I have created a package that uses data.table. How do I ensure my package is data.table-aware so that inheritance from <code>data.frame</code> works?</a></li>
+</ul></li>
+</ul>
+</div>
+
+<style>
+h2 {
+    font-size: 20px;
+}
+</style>
+<p>The first section, Beginner FAQs, is intended to be read in order, from start to finish. It’s just written in a FAQ style so it can be digested more easily. These aren’t necessarily the most frequently asked questions; a better measure of those is to look on Stack Overflow.</p>
+<p>This FAQ is required reading and considered core documentation. Please do not ask questions on Stack Overflow or raise issues on GitHub until you have read it. We can all tell when you ask that you haven’t read it. So if you do ask and haven’t read it, don’t use your real name.</p>
+<p>This document has been quickly revised given the changes in v1.9.8 released Nov 2016. Please do submit pull requests to fix mistakes or make improvements. If anyone knows why the table of contents comes out so narrow and squashed when displayed by CRAN, please let us know. This document used to be a PDF and we recently changed it to HTML.</p>
+<div id="beginner-faqs" class="section level1">
+<h1><span class="header-section-number">1</span> Beginner FAQs</h1>
+<div id="j-num" class="section level2">
+<h2><span class="header-section-number">1.1</span> Why do <code>DT[ , 5]</code> and <code>DT[2, 5]</code> return a 1-column data.table rather than vectors like <code>data.frame</code>?</h2>
+<p>For consistency so that when you use data.table in functions that accept varying inputs, you can rely on <code>DT[...]</code> returning a data.table. You don’t have to remember to include <code>drop=FALSE</code> like you do in data.frame. data.table was first released in 2006 and this difference to data.frame has been a feature since the very beginning.</p>
+<p>You may have heard that it is generally bad practice to refer to columns by number rather than name, though. If your colleague comes along and reads your code later, they may have to hunt around to find out which column is number 5. If you or they change the column ordering higher up in your R program, you may produce wrong results with no warning or error if you forget to change all the places in your code which refer to column number 5. That is your fault, not R’s or data.table’s. It’ [...]
+<p>Say column 5 is named <code>"region"</code> and you really must extract that column as a vector not a data.table. It is more robust to use the column name and write <code>DT$region</code> or <code>DT[["region"]]</code>; i.e., the same as base R. Using base R’s <code>$</code> and <code>[[</code> on data.table is encouraged. Not when combined with <code><-</code> to assign (use <code>:=</code> instead for that) but just to select a single column by name they are e [...]
+<p>There are some circumstances where referring to a column by number seems like the only way, such as a sequence of columns. In these situations just like data.frame, you can write <code>DT[, 5:10]</code> and <code>DT[,c(1,4,10)]</code>. However, again, it is more robust (to future changes in your data’s number of and ordering of columns) to use a named range such as <code>DT[,columnRed:columnViolet]</code> or name each one <code>DT[,c("columnRed","columnOrange",&quo [...]
+<p>However, what we really want you to do is <code>DT[,.(columnRed,columnOrange,columnYellow)]</code>; i.e., use column names as if they are variables directly inside <code>DT[...]</code>. You don’t have to prefix each column with <code>DT$</code> like you do in data.frame. The <code>.()</code> part is just an alias for <code>list()</code> and you can use <code>list()</code> instead if you prefer. You can place any R expression of column names, using any R package, returning different ty [...]
+<p>Reminder: you can place <em>any</em> R expression inside <code>DT[...]</code> using column names as if they are variables; e.g., try <code>DT[, colA*colB/2]</code>. That does return a vector because you used column names as if they are variables. Wrap with <code>.()</code> to return a data.table; i.e. <code>DT[,.(colA*colB/2)]</code>. Name it: <code>DT[,.(myResult = colA*colB/2)]</code>. And we’ll leave it to you to guess how to return two things from this query. It’s also quite commo [...]
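+<p>For example, a minimal sketch (the table <code>DT</code> and its columns here are hypothetical, made up just for illustration):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT = data.table(colA = 1:3, colB = c(10, 20, 30))
+DT[ , colA*colB/2]                        # a vector: 5 20 45
+DT[ , .(colA*colB/2)]                     # the same, as a 1-column data.table
+DT[ , .(myResult = colA*colB/2)]          # name the result column
+DT[ , .(half = colA*colB/2, both = colA + colB)]  # return two things</code></pre></div>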
+</div>
+<div id="why-does-dtregion-return-a-1-column-data.table-rather-than-a-vector" class="section level2">
+<h2><span class="header-section-number">1.2</span> Why does <code>DT[,"region"]</code> return a 1-column data.table rather than a vector?</h2>
+<p>See the <a href="#j-num">answer above</a>. Try <code>DT$region</code> instead. Or <code>DT[["region"]]</code>.</p>
+</div>
+<div id="why-does-dt-region-return-a-vector-for-the-region-column-id-like-a-1-column-data.table." class="section level2">
+<h2><span class="header-section-number">1.3</span> Why does <code>DT[, region]</code> return a vector for the “region” column? I’d like a 1-column data.table.</h2>
+<p>Try <code>DT[ , .(region)]</code> instead. <code>.()</code> is an alias for <code>list()</code> and ensures a data.table is returned.</p>
+<p>Also continue reading and see the FAQ after next. Skim whole documents before getting stuck in one part.</p>
+</div>
+<div id="why-does-dt-x-y-z-not-work-i-wanted-the-3-columns-xy-and-z." class="section level2">
+<h2><span class="header-section-number">1.4</span> Why does <code>DT[ , x, y, z]</code> not work? I wanted the 3 columns <code>x</code>,<code>y</code> and <code>z</code>.</h2>
+<p>The <code>j</code> expression is the 2nd argument. Try <code>DT[ , c("x","y","z")]</code> or <code>DT[ , .(x,y,z)]</code>.</p>
+</div>
+<div id="i-assigned-a-variable-mycol-x-but-then-dt-mycol-returns-x.-how-do-i-get-it-to-look-up-the-column-name-contained-in-the-mycol-variable" class="section level2">
+<h2><span class="header-section-number">1.5</span> I assigned a variable <code>mycol = "x"</code> but then <code>DT[ , mycol]</code> returns <code>"x"</code>. How do I get it to look up the column name contained in the <code>mycol</code> variable?</h2>
+<p>In v1.9.8 (released Nov 2016) there is the ability to turn on new behaviour: <code>options(datatable.WhenJisSymbolThenCallingScope=TRUE)</code>. It will then work as you expected, just like data.frame. If you are a new user of data.table, you should probably do this. You can place this command in your .Rprofile file so you don’t have to remember it again. See the long item in the release notes about this. The release notes are linked at the top of the data.table homepage: <a href="https://gith [...]
+<p>Without turning on that new behavior, what’s happening is that the <code>j</code> expression sees objects in the calling scope. The variable <code>mycol</code> does not exist as a column name of <code>DT</code> so data.table then looked in the calling scope and found <code>mycol</code> there and returned its value <code>"x"</code>. This is correct behaviour currently. Had <code>mycol</code> been a column name, then that column’s data would have been returned. What has been d [...]
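+<p>A short sketch of the default behaviour (the table and variable names are hypothetical; <code>with = FALSE</code> and <code>[[</code> are the long-standing ways to look the name up):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT = data.table(x = 1:3, y = 4:6)
+mycol = "x"
+DT[ , mycol]                 # "x" : mycol found in calling scope, its value returned
+DT[ , mycol, with = FALSE]   # a 1-column data.table of the x column
+DT[[mycol]]                  # the x column as a vector, same as base R</code></pre></div>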
+</div>
+<div id="what-are-the-benefits-of-being-able-to-use-column-names-as-if-they-are-variables-inside-dt..." class="section level2">
+<h2><span class="header-section-number">1.6</span> What are the benefits of being able to use column names as if they are variables inside <code>DT[...]</code>?</h2>
+<p><code>j</code> doesn’t have to be just column names. You can write any R <em>expression</em> of column names directly in <code>j</code>, <em>e.g.</em>, <code>DT[ , mean(x*y/z)]</code>. The same applies to <code>i</code>, <em>e.g.</em>, <code>DT[x>1000, sum(y*z)]</code>.</p>
+<p>This runs the <code>j</code> expression on the set of rows where the <code>i</code> expression is true. You don’t even need to return data, <em>e.g.</em>, <code>DT[x>1000, plot(y, z)]</code>. You can do <code>j</code> by group simply by adding <code>by =</code>; e.g., <code>DT[x>1000, sum(y*z), by = w]</code>. This runs <code>j</code> for each group in column <code>w</code> but just over the rows where <code>x>1000</code>. By placing the 3 parts of the query (i=where, j=selec [...]
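+<p>A toy example of that form (the table and column names are invented for illustration):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT = data.table(w = c("a", "a", "b"), x = c(500, 1500, 2000), y = 1:3, z = 4:6)
+DT[x > 1000, sum(y*z)]           # j over just the rows where x > 1000
+DT[x > 1000, sum(y*z), by = w]   # the same j, run once per group of w</code></pre></div>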
+</div>
+<div id="ok-im-starting-to-see-what-data.table-is-about-but-why-didnt-you-just-enhance-data.frame-in-r-why-does-it-have-to-be-a-new-package" class="section level2">
+<h2><span class="header-section-number">1.7</span> OK, I’m starting to see what data.table is about, but why didn’t you just enhance <code>data.frame</code> in R? Why does it have to be a new package?</h2>
+<p>As <a href="#j-num">highlighted above</a>, <code>j</code> in <code>[.data.table</code> is fundamentally different from <code>j</code> in <code>[.data.frame</code>. Even if something as simple as <code>DF[ , 1]</code> were changed in base R to return a data.frame rather than a vector, that would break existing code in many thousands of CRAN packages, as well as user code. As soon as we took the step to create a new class that inherited from data.frame, we had the opportunity to change a few things [...]
+<p>Furthermore, data.table <em>inherits</em> from <code>data.frame</code>. It <em>is</em> a <code>data.frame</code>, too. A data.table can be passed to any package that only accepts <code>data.frame</code> and that package can use <code>[.data.frame</code> syntax on the data.table. See <a href="http://stackoverflow.com/a/10529888/403310">this answer</a> for how that is achieved.</p>
+<p>We <em>have</em> proposed enhancements to R wherever possible, too. One of these was accepted as a new feature in R 2.12.0:</p>
+<blockquote>
+<p><code>unique()</code> and <code>match()</code> are now faster on character vectors where all elements are in the global CHARSXP cache and have unmarked encoding (ASCII). Thanks to Matt Dowle for suggesting improvements to the way the hash code is generated in unique.c.</p>
+</blockquote>
+<p>A second proposal was to use <code>memcpy</code> in duplicate.c, which is much faster than a for loop in C. This would improve the <em>way</em> that R copies data internally (on some measures by 13 times). The thread on r-devel is <a href="http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html">here</a>.</p>
+<p>A third, more significant proposal that was accepted is that R now uses data.table’s radix sort code as of R 3.3.0:</p>
+<blockquote>
+<p>The radix sort algorithm and implementation from data.table (forder) replaces the previous radix (counting) sort and adds a new method for order(). Contributed by Matt Dowle and Arun Srinivasan, the new algorithm supports logical, integer (even with large values), real, and character vectors. It outperforms all other methods, but there are some caveats (see ?sort).</p>
+</blockquote>
+<p>This was a big event for us and we celebrated until the cows came home. (Not really.)</p>
+</div>
+<div id="why-are-the-defaults-the-way-they-are-why-does-it-work-the-way-it-does" class="section level2">
+<h2><span class="header-section-number">1.8</span> Why are the defaults the way they are? Why does it work the way it does?</h2>
+<p>The simple answer is because the main author originally designed it for his own use. He wanted it that way. He finds it a more natural, faster way to write code, which also executes more quickly.</p>
+</div>
+<div id="isnt-this-already-done-by-with-and-subset-in-base" class="section level2">
+<h2><span class="header-section-number">1.9</span> Isn’t this already done by <code>with()</code> and <code>subset()</code> in <code>base</code>?</h2>
+<p>Some of the features discussed so far are, yes. The package builds upon base functionality. It does the same sorts of things but with less code required and executes many times faster if used correctly.</p>
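+<p>For example, a rough comparison (hypothetical data; <code>DF</code> is a data.frame and <code>DT</code> the equivalent data.table):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DF = data.frame(x = 1:5, y = 6:10)
+DT = as.data.table(DF)
+with(subset(DF, x > 2), sum(y))   # base: two nested function calls
+DT[x > 2, sum(y)]                 # data.table: one shorter expression</code></pre></div>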
+</div>
+<div id="why-does-xy-return-all-the-columns-from-y-too-shouldnt-it-return-a-subset-of-x" class="section level2">
+<h2><span class="header-section-number">1.10</span> Why does <code>X[Y]</code> return all the columns from <code>Y</code> too? Shouldn’t it return a subset of <code>X</code>?</h2>
+<p>This was changed in v1.5.3 (Feb 2011). Since then <code>X[Y]</code> includes <code>Y</code>’s non-join columns. We refer to this feature as <em>join inherited scope</em> because not only are <code>X</code>’s columns available to the <code>j</code> expression, so are <code>Y</code>’s columns. The downside is that <code>X[Y]</code> is less efficient since every item of <code>Y</code>’s non-join columns is duplicated to match the (likely large) number of rows in <code>X</code> that match. W [...]
+</div>
+<div id="MergeDiff" class="section level2">
+<h2><span class="header-section-number">1.11</span> What is the difference between <code>X[Y]</code> and <code>merge(X, Y)</code>?</h2>
+<p><code>X[Y]</code> is a join, looking up <code>X</code>’s rows using <code>Y</code> (or <code>Y</code>’s key if it has one) as an index.</p>
+<p><code>Y[X]</code> is a join, looking up <code>Y</code>’s rows using <code>X</code> (or <code>X</code>’s key if it has one) as an index.</p>
+<p><code>merge(X,Y)</code><a href="#fn1" class="footnoteRef" id="fnref1"><sup>1</sup></a> does both ways at the same time. The number of rows of <code>X[Y]</code> and <code>Y[X]</code> usually differ, whereas the number of rows returned by <code>merge(X, Y)</code> and <code>merge(Y, X)</code> is the same.</p>
+<p><em>BUT</em> that misses the main point. Most tasks require something to be done on the data after a join or merge. Why merge all the columns of data, only to use a small subset of them afterwards? You may suggest <code>merge(X[ , ColsNeeded1], Y[ , ColsNeeded2])</code>, but that requires the programmer to work out which columns are needed. <code>X[Y, j]</code> in data.table does all that in one step for you. When you write <code>X[Y, sum(foo*bar)]</code>, data.table automatically ins [...]
+</div>
+<div id="anything-else-about-xy-sumfoobar" class="section level2">
+<h2><span class="header-section-number">1.12</span> Anything else about <code>X[Y, sum(foo*bar)]</code>?</h2>
+<p>This behaviour changed in v1.9.4 (Sep 2014). It now does the <code>X[Y]</code> join and then runs <code>sum(foo*bar)</code> over all the rows; i.e., <code>X[Y][ , sum(foo*bar)]</code>. It used to run <code>j</code> for each <em>group</em> of <code>X</code> that each row of <code>Y</code> matches to. That can still be done as it’s very useful but you now need to be explicit and specify <code>by = .EACHI</code>, <em>i.e.</em>, <code>X[Y, sum(foo*bar), by = .EACHI]</code>. We call this < [...]
+<p>For example, (further complicating it by using <em>join inherited scope</em>, too):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">X =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">grp =</span> <span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"a"</span>, <span class="st">"b"</span>,
+                       <span class="st">"b"</span>, <span class="st">"b"</span>, <span class="st">"c"</span>, <span class="st">"c"</span>), <span class="dt">foo =</span> <span class="dv">1</span>:<span class="dv">7</span>)
+<span class="kw">setkey</span>(X, grp)
+Y =<span class="st"> </span><span class="kw">data.table</span>(<span class="kw">c</span>(<span class="st">"b"</span>, <span class="st">"c"</span>), <span class="dt">bar =</span> <span class="kw">c</span>(<span class="dv">4</span>, <span class="dv">2</span>))
+X
+<span class="co">#    grp foo</span>
+<span class="co"># 1:   a   1</span>
+<span class="co"># 2:   a   2</span>
+<span class="co"># 3:   b   3</span>
+<span class="co"># 4:   b   4</span>
+<span class="co"># 5:   b   5</span>
+<span class="co"># 6:   c   6</span>
+<span class="co"># 7:   c   7</span>
+Y
+<span class="co">#    V1 bar</span>
+<span class="co"># 1:  b   4</span>
+<span class="co"># 2:  c   2</span>
+X[Y, <span class="kw">sum</span>(foo*bar)]
+<span class="co"># [1] 74</span>
+X[Y, <span class="kw">sum</span>(foo*bar), by =<span class="st"> </span>.EACHI]
+<span class="co">#    grp V1</span>
+<span class="co"># 1:   b 48</span>
+<span class="co"># 2:   c 26</span></code></pre></div>
+</div>
+<div id="thats-nice.-how-did-you-manage-to-change-it-given-that-users-depended-on-the-old-behaviour" class="section level2">
+<h2><span class="header-section-number">1.13</span> That’s nice. How did you manage to change it given that users depended on the old behaviour?</h2>
+<p>The request to change came from users. The feeling was that if a query is doing grouping then an explicit <code>by=</code> should be present for code readability reasons. An option was provided to return the old behaviour: <code>options(datatable.old.bywithoutby)</code>, by default <code>FALSE</code>. This enabled upgrading to test the other new features / bug fixes in v1.9.4, with later migration of any by-without-by queries when ready by adding <code>by=.EACHI</code> to them. We ret [...]
+<p>Of the 66 packages on CRAN or Bioconductor that depended on or import data.table at the time of releasing v1.9.4 (it is now over 300), only one was affected by the change. That could be because many packages don’t have comprehensive tests, or just that grouping by each row in <code>i</code> wasn’t being used much by downstream packages. We always test the new version with all dependent packages before release and coordinate any changes with those maintainers. So this release was quite [...]
+<p>Another compelling reason to make the change was that previously, there was no efficient way to achieve what <code>X[Y, sum(foo*bar)]</code> does now. You had to write <code>X[Y][ , sum(foo*bar)]</code>. That was suboptimal because <code>X[Y]</code> joined all the columns and passed them all to the second compound query without knowing that only <code>foo</code> and <code>bar</code> are needed. To solve that efficiency problem, extra programming effort was required: <code>X[Y, list(fo [...]
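+<p>As a quick sketch (recreating <code>X</code> and <code>Y</code> from above, with <code>X</code> keyed by <code>grp</code>), the new single query and the old two-step workaround agree:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">X = data.table(grp = c("a","a","b","b","b","c","c"), foo = 1:7, key = "grp")
+Y = data.table(c("b","c"), bar = c(4, 2))
+X[Y, sum(foo*bar)]                       # joins just foo and bar, then sums
+# [1] 74
+X[Y, list(foo, bar)][ , sum(foo*bar)]    # old workaround, same answer
+# [1] 74</code></pre></div>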
+</div>
+</div>
+<div id="general-syntax" class="section level1">
+<h1><span class="header-section-number">2</span> General Syntax</h1>
+<div id="how-can-i-avoid-writing-a-really-long-j-expression-youve-said-that-i-should-use-the-column-names-but-ive-got-a-lot-of-columns." class="section level2">
+<h2><span class="header-section-number">2.1</span> How can I avoid writing a really long <code>j</code> expression? You’ve said that I should use the column <em>names</em>, but I’ve got a lot of columns.</h2>
+<p>When grouping, the <code>j</code> expression can use column names as variables, as you know, but it can also use a reserved symbol <code>.SD</code> which refers to the <strong>S</strong>ubset of the <strong>D</strong>ata.table for each group (excluding the grouping columns). So to sum up all your columns it’s just <code>DT[ , lapply(.SD, sum), by = grp]</code>. It might seem tricky, but it’s fast to write and fast to run. Notice you don’t have to create an anonymous function. The <cod [...]
+<p>So please don’t do, for example, <code>DT[ , sum(.SD[["sales"]]), by = grp]</code>. That works but is inefficient and inelegant. <code>DT[ , sum(sales), by = grp]</code> is what was intended, and it could be 100s of times faster. If you use <em>all</em> of the data in <code>.SD</code> for each group (such as in <code>DT[ , lapply(.SD, sum), by = grp]</code>) then that’s very good usage of <code>.SD</code>. If you’re using <em>several</em> but not <em>all</em> of the columns, [...]
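+<p>For instance (a sketch with a made-up <code>DT</code>), <code>.SDcols</code> restricts which columns are placed in <code>.SD</code>, so only the needed data is grouped:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT = data.table(grp = c("a","a","b"), sales = 1:3, costs = 4:6, notes = c("x","y","z"))
+DT[ , lapply(.SD, sum), by = grp, .SDcols = c("sales", "costs")]
+#    grp sales costs
+# 1:   a     3     9
+# 2:   b     3     6</code></pre></div>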
+</div>
+<div id="why-is-the-default-for-mult-now-all" class="section level2">
+<h2><span class="header-section-number">2.2</span> Why is the default for <code>mult</code> now <code>"all"</code>?</h2>
+<p>In v1.5.3 the default was changed to <code>"all"</code>. When <code>i</code> (or <code>i</code>’s key if it has one) has fewer columns than <code>x</code>’s key, <code>mult</code> was already set to <code>"all"</code> automatically. Changing the default makes this clearer and easier for users as it came up quite often.</p>
+<p>In versions up to v1.3, <code>"all"</code> was slower. Internally, <code>"all"</code> was implemented by joining using <code>"first"</code>, then again from scratch using <code>"last"</code>, after which a diff between them was performed to work out the span of the matches in <code>x</code> for each row in <code>i</code>. Most often we join to single rows, though, where <code>"first"</code>,<code>"last"</code> and <code>&quot [...]
+<p>In v1.4 the binary search in C was changed to branch at the deepest level to find first and last. That branch will likely occur within the same final pages of RAM so there should no longer be a speed disadvantage in defaulting <code>mult</code> to <code>"all"</code>. We warned that the default might change and made the change in v1.5.3.</p>
+<p>A future version of data.table may allow a distinction between a key and a <em>unique key</em>. Internally <code>mult = "all"</code> would perform more like <code>mult = "first"</code> when all <code>x</code>’s key columns were joined to and <code>x</code>’s key was a unique key. data.table would need checks on insert and update to make sure a unique key is maintained. An advantage of specifying a unique key would be that data.table would ensure no duplicates could [...]
+</div>
+<div id="im-using-c-in-j-and-getting-strange-results." class="section level2">
+<h2><span class="header-section-number">2.3</span> I’m using <code>c()</code> in <code>j</code> and getting strange results.</h2>
+<p>This is a common source of confusion. In <code>data.frame</code> you are used to, for example:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">y =</span> <span class="dv">4</span>:<span class="dv">6</span>, <span class="dt">z =</span> <span class="dv">7</span>:<span class="dv">9</span>)
+DF
+<span class="co">#   x y z</span>
+<span class="co"># 1 1 4 7</span>
+<span class="co"># 2 2 5 8</span>
+<span class="co"># 3 3 6 9</span>
+DF[ , <span class="kw">c</span>(<span class="st">"y"</span>, <span class="st">"z"</span>)]
+<span class="co">#   y z</span>
+<span class="co"># 1 4 7</span>
+<span class="co"># 2 5 8</span>
+<span class="co"># 3 6 9</span></code></pre></div>
+<p>which returns the two columns. In data.table you know you can use the column names directly and might try:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(DF)
+DT[ , <span class="kw">c</span>(y, z)]
+<span class="co"># [1] 4 5 6 7 8 9</span></code></pre></div>
+<p>but this returns one vector. Remember that the <code>j</code> expression is evaluated within the environment of <code>DT</code> and <code>c()</code> returns a vector. If 2 or more columns are required, use <code>list()</code> or <code>.()</code> instead:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[ , .(y, z)]
+<span class="co">#    y z</span>
+<span class="co"># 1: 4 7</span>
+<span class="co"># 2: 5 8</span>
+<span class="co"># 3: 6 9</span></code></pre></div>
+<p><code>c()</code> can be useful in a data.table too, but its behaviour is different from that in <code>[.data.frame</code>.</p>
+</div>
+<div id="i-have-built-up-a-complex-table-with-many-columns.-i-want-to-use-it-as-a-template-for-a-new-table-i.e.-create-a-new-table-with-no-rows-but-with-the-column-names-and-types-copied-from-my-table.-can-i-do-that-easily" class="section level2">
+<h2><span class="header-section-number">2.4</span> I have built up a complex table with many columns. I want to use it as a template for a new table; <em>i.e.</em>, create a new table with no rows, but with the column names and types copied from my table. Can I do that easily?</h2>
+<p>Yes. If your complex table is called <code>DT</code>, try <code>NEWDT = DT[0]</code>.</p>
+</div>
+<div id="is-a-null-data.table-the-same-as-dt0" class="section level2">
+<h2><span class="header-section-number">2.5</span> Is a null data.table the same as <code>DT[0]</code>?</h2>
+<p>No. By “null data.table” we mean the result of <code>data.table(NULL)</code> or <code>as.data.table(NULL)</code>; <em>i.e.</em>,</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">data.table</span>(<span class="ot">NULL</span>)
+<span class="co"># Null data.table (0 rows and 0 cols)</span>
+<span class="kw">data.frame</span>(<span class="ot">NULL</span>)
+<span class="co"># data frame with 0 columns and 0 rows</span>
+<span class="kw">as.data.table</span>(<span class="ot">NULL</span>)
+<span class="co"># Null data.table (0 rows and 0 cols)</span>
+<span class="kw">as.data.frame</span>(<span class="ot">NULL</span>)
+<span class="co"># data frame with 0 columns and 0 rows</span>
+<span class="kw">is.null</span>(<span class="kw">data.table</span>(<span class="ot">NULL</span>))
+<span class="co"># [1] FALSE</span>
+<span class="kw">is.null</span>(<span class="kw">data.frame</span>(<span class="ot">NULL</span>))
+<span class="co"># [1] FALSE</span></code></pre></div>
+<p>The null data.table (or <code>data.frame</code>) is <code>NULL</code> with some attributes attached, which means it’s no longer <code>NULL</code>. In R only pure <code>NULL</code> is <code>NULL</code>, as tested by <code>is.null()</code>. When referring to the “null data.table” we use lower case null to help distinguish it from upper case <code>NULL</code>. To test for the null data.table, use <code>length(DT) == 0</code> or <code>ncol(DT) == 0</code> (<code>length</code> is slightly faster as it’s [...]
+<p>An <em>empty</em> data.table (<code>DT[0]</code>) has one or more columns, all of which are empty. Those empty columns still have names and types.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">a =</span> <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">b =</span> <span class="kw">c</span>(<span class="dv">4</span>, <span class="dv">5</span>, <span class="dv">6</span>), <span class="dt">d =</span> <span class="kw">c</span>(7L,8L,9L))
+DT[<span class="dv">0</span>]
+<span class="co"># Empty data.table (0 rows) of 3 cols: a,b,d</span>
+<span class="kw">sapply</span>(DT[<span class="dv">0</span>], class)
+<span class="co">#         a         b         d </span>
+<span class="co"># "integer" "numeric" "integer"</span></code></pre></div>
+</div>
+<div id="DTremove1" class="section level2">
+<h2><span class="header-section-number">2.6</span> Why has the <code>DT()</code> alias been removed?</h2>
+<p><code>DT</code> was introduced originally as a wrapper for a list of <code>j</code> expressions. Since <code>DT</code> was an alias for data.table, this was a convenient way to handle silent recycling in cases where each item of the <code>j</code> list evaluated to a different length. The alias was one reason grouping was slow, though.</p>
+<p>As of v1.3, <code>list()</code> or <code>.()</code> should be passed instead to the <code>j</code> argument. These are much faster, especially when there are many groups. Internally, this was a nontrivial change. Vector recycling is now done internally, along with several other speed enhancements for grouping.</p>
+</div>
+<div id="DTremove2" class="section level2">
+<h2><span class="header-section-number">2.7</span> But my code uses <code>j = DT(...)</code> and it works. The previous FAQ says that <code>DT()</code> has been removed.</h2>
+<p>Then you are using a version prior to 1.5.3. Prior to 1.5.3 <code>[.data.table</code> detected use of <code>DT()</code> in the <code>j</code> and automatically replaced it with a call to <code>list()</code>. This was to help the transition for existing users.</p>
+</div>
+<div id="what-are-the-scoping-rules-for-j-expressions" class="section level2">
+<h2><span class="header-section-number">2.8</span> What are the scoping rules for <code>j</code> expressions?</h2>
+<p>Think of the subset as an environment where all the column names are variables. When a variable <code>foo</code> is used in the <code>j</code> of a query such as <code>X[Y, sum(foo)]</code>, <code>foo</code> is looked for in the following order :</p>
+<ol style="list-style-type: decimal">
+<li>The scope of <code>X</code>’s subset; <em>i.e.</em>, <code>X</code>’s column names.</li>
+<li>The scope of each row of <code>Y</code>; <em>i.e.</em>, <code>Y</code>’s column names (<em>join inherited scope</em>)</li>
+<li>The scope of the calling frame; <em>e.g.</em>, the line that appears before the data.table query.</li>
+<li>Exercise for reader: does it then ripple up the calling frames, or go straight to <code>globalenv()</code>?</li>
+<li>The global environment</li>
+</ol>
+<p>This is <em>lexical scoping</em> as explained in <a href="https://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping">R FAQ 3.3.1</a>. The environment in which the function was created is not relevant, though, because there is <em>no function</em>. No anonymous <em>function</em> is passed to <code>j</code>. Instead, an anonymous <em>body</em> is passed to <code>j</code>; for example,</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">x =</span> <span class="kw">rep</span>(<span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"b"</span>), <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">3</span>)), <span class="dt">y =</span> <span class="dv">1</span>:<span class="dv">5</span>)
+DT
+<span class="co">#    x y</span>
+<span class="co"># 1: a 1</span>
+<span class="co"># 2: a 2</span>
+<span class="co"># 3: b 3</span>
+<span class="co"># 4: b 4</span>
+<span class="co"># 5: b 5</span>
+DT[ , {z =<span class="st"> </span><span class="kw">sum</span>(y); z +<span class="st"> </span><span class="dv">3</span>}, by =<span class="st"> </span>x]
+<span class="co">#    x V1</span>
+<span class="co"># 1: a  6</span>
+<span class="co"># 2: b 15</span></code></pre></div>
+<p>Some programming languages call this a <em>lambda</em>.</p>
+</div>
+<div id="j-trace" class="section level2">
+<h2><span class="header-section-number">2.9</span> Can I trace the <code>j</code> expression as it runs through the groups?</h2>
+<p>Try something like this:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[ , {
+  <span class="kw">cat</span>(<span class="st">"Objects:"</span>, <span class="kw">paste</span>(<span class="kw">objects</span>(), <span class="dt">collapse =</span> <span class="st">","</span>), <span class="st">"</span><span class="ch">\n</span><span class="st">"</span>)
+  <span class="kw">cat</span>(<span class="st">"Trace: x="</span>, <span class="kw">as.character</span>(x), <span class="st">" y="</span>, y, <span class="st">"</span><span class="ch">\n</span><span class="st">"</span>)
+  <span class="kw">sum</span>(y)},
+  by =<span class="st"> </span>x]
+<span class="co"># Objects: Cfastmean,mean,print,strptime,x,y </span>
+<span class="co"># Trace: x= a  y= 1 2 </span>
+<span class="co"># Objects: Cfastmean,mean,print,strptime,x,y </span>
+<span class="co"># Trace: x= b  y= 3 4 5</span>
+<span class="co">#    x V1</span>
+<span class="co"># 1: a  3</span>
+<span class="co"># 2: b 12</span></code></pre></div>
+</div>
+<div id="inside-each-group-why-are-the-group-variables-length-1" class="section level2">
+<h2><span class="header-section-number">2.10</span> Inside each group, why are the group variables length-1?</h2>
+<p><a href="#j-trace">Above</a>, <code>x</code> is a grouping variable and (as from v1.6.1) has <code>length</code> 1 (if inspected or used in <code>j</code>). It’s for efficiency and convenience. Therefore, there is no difference between the following two statements:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[ , .(<span class="dt">g =</span> <span class="dv">1</span>, <span class="dt">h =</span> <span class="dv">2</span>, <span class="dt">i =</span> <span class="dv">3</span>, <span class="dt">j =</span> <span class="dv">4</span>, <span class="dt">repeatgroupname =</span> x, <span class="kw">sum</span>(y)), by =<span class="st"> </span>x]
+<span class="co">#    x g h i j repeatgroupname V6</span>
+<span class="co"># 1: a 1 2 3 4               a  3</span>
+<span class="co"># 2: b 1 2 3 4               b 12</span>
+DT[ , .(<span class="dt">g =</span> <span class="dv">1</span>, <span class="dt">h =</span> <span class="dv">2</span>, <span class="dt">i =</span> <span class="dv">3</span>, <span class="dt">j =</span> <span class="dv">4</span>, <span class="dt">repeatgroupname =</span> x[<span class="dv">1</span>], <span class="kw">sum</span>(y)), by =<span class="st"> </span>x]
+<span class="co">#    x g h i j repeatgroupname V6</span>
+<span class="co"># 1: a 1 2 3 4               a  3</span>
+<span class="co"># 2: b 1 2 3 4               b 12</span></code></pre></div>
+<p>If you need the size of the current group, use <code>.N</code> rather than calling <code>length()</code> on any column.</p>
+</div>
+<div id="only-the-first-10-rows-are-printed-how-do-i-print-more" class="section level2">
+<h2><span class="header-section-number">2.11</span> Only the first 10 rows are printed, how do I print more?</h2>
+<p>There are two things happening here. First, if the number of rows in a data.table is large (<code>&gt; 100</code> by default), then a summary of the data.table is printed to the console rather than the full table. Second, the summary of a large data.table is formed by taking the top and bottom <code>n</code> (<code>= 5</code> by default) rows of the data.table and printing only those. Both of these parameters (when to trigger a summary and how much of a table to use as a summary) are configurable b [...]
+<p>For instance, to trigger the summary only when a data.table has more than 50 rows, set <code>options(datatable.print.nrows = 50)</code>. To disable the summary behaviour completely, set <code>options(datatable.print.nrows = Inf)</code>. You can also call <code>print</code> directly, as in <code>print(your.data.table, nrows = Inf)</code>.</p>
+<p>If you want to show more than just the top (and bottom) 10 rows of a data.table summary (say you like 20), set <code>options(datatable.print.topn = 20)</code>, for example. Again, you could also just call <code>print</code> directly, as in <code>print(your.data.table, topn = 20)</code>.</p>
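+<p>For example (a sketch; the global options and the per-call <code>print</code> arguments can be mixed freely):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT = data.table(x = 1:200)
+options(datatable.print.nrows = 50)   # summarise only when nrow(DT) exceeds 50
+options(datatable.print.topn = 20)    # show the top and bottom 20 rows in a summary
+print(DT, nrows = Inf)                # or override per call: print every row</code></pre></div>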
+</div>
+<div id="with-an-xy-join-what-if-x-contains-a-column-called-y" class="section level2">
+<h2><span class="header-section-number">2.12</span> With an <code>X[Y]</code> join, what if <code>X</code> contains a column called <code>"Y"</code>?</h2>
+<p>When <code>i</code> is a single name such as <code>Y</code> it is evaluated in the calling frame. In all other cases such as calls to <code>.()</code> or other expressions, <code>i</code> is evaluated within the scope of <code>X</code>. This facilitates easy <em>self-joins</em> such as <code>X[J(unique(colA)), mult = "first"]</code>.</p>
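+<p>A small sketch of this rule (hypothetical tables): the bare name <code>Y</code> is found in the calling frame, so a column of the same name in <code>X</code> does not get in the way:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">X = data.table(id = c("a","b"), Y = 1:2, key = "id")
+Y = data.table(id = "b")
+X[Y]   # Y here is the table in calling scope, not X's column Y
+#    id Y
+# 1:  b 2</code></pre></div>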
+</div>
+<div id="xzy-is-failing-because-x-contains-a-column-y.-id-like-it-to-use-the-table-y-in-calling-scope." class="section level2">
+<h2><span class="header-section-number">2.13</span> <code>X[Z[Y]]</code> is failing because <code>X</code> contains a column <code>"Y"</code>. I’d like it to use the table <code>Y</code> in calling scope.</h2>
+<p>The <code>Z[Y]</code> part is not a single name so that is evaluated within the frame of <code>X</code> and the problem occurs. Try <code>tmp = Z[Y]; X[tmp]</code>. This is robust to <code>X</code> containing a column <code>"tmp"</code> because <code>tmp</code> is a single name. If you often encounter conflicts of this type, one simple solution may be to name all tables in uppercase and all column names in lowercase, or some similar scheme.</p>
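+<p>A sketch of the workaround with hypothetical tables, all keyed on <code>id</code>:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">X = data.table(id = 1:3, Y = c("p","q","r"), key = "id")   # X has a column called Y
+Z = data.table(id = 2:3, v = c(10, 20), key = "id")
+Y = data.table(id = 3L, key = "id")
+tmp = Z[Y]   # evaluate the inner join first, in calling scope
+X[tmp]       # tmp is a single name, so it is looked up in calling scope
+#    id Y  v
+# 1:  3 r 20</code></pre></div>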
+</div>
+<div id="can-you-explain-further-why-data.table-is-inspired-by-ab-syntax-in-base" class="section level2">
+<h2><span class="header-section-number">2.14</span> Can you explain further why data.table is inspired by <code>A[B]</code> syntax in <code>base</code>?</h2>
+<p>Consider <code>A[B]</code> syntax using an example matrix <code>A</code> :</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">A =<span class="st"> </span><span class="kw">matrix</span>(<span class="dv">1</span>:<span class="dv">12</span>, <span class="dt">nrow =</span> <span class="dv">4</span>)
+A
+<span class="co">#      [,1] [,2] [,3]</span>
+<span class="co"># [1,]    1    5    9</span>
+<span class="co"># [2,]    2    6   10</span>
+<span class="co"># [3,]    3    7   11</span>
+<span class="co"># [4,]    4    8   12</span></code></pre></div>
+<p>To obtain cells <code>(1, 2) = 5</code> and <code>(3, 3) = 11</code> many users (we believe) may try this first :</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">A[<span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">3</span>), <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">3</span>)]
+<span class="co">#      [,1] [,2]</span>
+<span class="co"># [1,]    5    9</span>
+<span class="co"># [2,]    7   11</span></code></pre></div>
+<p>However, this returns the union of those rows and columns. To reference the cells, a 2-column matrix is required. <code>?Extract</code> says :</p>
+<blockquote>
+<p>When indexing arrays by <code>[</code> a single argument <code>i</code> can be a matrix with as many columns as there are dimensions of <code>x</code>; the result is then a vector with elements corresponding to the sets of indices in each row of <code>i</code>.</p>
+</blockquote>
+<p>Let’s try again.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">B =<span class="st"> </span><span class="kw">cbind</span>(<span class="kw">c</span>(<span class="dv">1</span>, <span class="dv">3</span>), <span class="kw">c</span>(<span class="dv">2</span>, <span class="dv">3</span>))
+B
+<span class="co">#      [,1] [,2]</span>
+<span class="co"># [1,]    1    2</span>
+<span class="co"># [2,]    3    3</span>
+A[B]
+<span class="co"># [1]  5 11</span></code></pre></div>
+<p>A matrix is a 2-dimensional structure with row names and column names. Can we do the same with names?</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">rownames</span>(A) =<span class="st"> </span>letters[<span class="dv">1</span>:<span class="dv">4</span>]
+<span class="kw">colnames</span>(A) =<span class="st"> </span>LETTERS[<span class="dv">1</span>:<span class="dv">3</span>]
+A
+<span class="co">#   A B  C</span>
+<span class="co"># a 1 5  9</span>
+<span class="co"># b 2 6 10</span>
+<span class="co"># c 3 7 11</span>
+<span class="co"># d 4 8 12</span>
+B =<span class="st"> </span><span class="kw">cbind</span>(<span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"c"</span>), <span class="kw">c</span>(<span class="st">"B"</span>, <span class="st">"C"</span>))
+A[B]
+<span class="co"># [1]  5 11</span></code></pre></div>
+<p>So yes, we can. Can we do the same with a <code>data.frame</code>?</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">A =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">A =</span> <span class="dv">1</span>:<span class="dv">4</span>, <span class="dt">B =</span> letters[<span class="dv">11</span>:<span class="dv">14</span>], <span class="dt">C =</span> pi*<span class="dv">1</span>:<span class="dv">4</span>)
+<span class="kw">rownames</span>(A) =<span class="st"> </span>letters[<span class="dv">1</span>:<span class="dv">4</span>]
+A
+<span class="co">#   A B         C</span>
+<span class="co"># a 1 k  3.141593</span>
+<span class="co"># b 2 l  6.283185</span>
+<span class="co"># c 3 m  9.424778</span>
+<span class="co"># d 4 n 12.566371</span>
+B
+<span class="co">#      [,1] [,2]</span>
+<span class="co"># [1,] "a"  "B" </span>
+<span class="co"># [2,] "c"  "C"</span>
+A[B]
+<span class="co"># [1] "k"         " 9.424778"</span></code></pre></div>
+<p>But notice that the result was coerced to <code>character</code>. R coerced <code>A</code> to <code>matrix</code> first so that the syntax could work, but the result isn’t ideal. Let’s try making <code>B</code> a <code>data.frame</code>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">B =<span class="st"> </span><span class="kw">data.frame</span>(<span class="kw">c</span>(<span class="st">"a"</span>, <span class="st">"c"</span>), <span class="kw">c</span>(<span class="st">"B"</span>, <span class="st">"C"</span>))
+<span class="kw">cat</span>(<span class="kw">try</span>(A[B], <span class="dt">silent =</span> <span class="ot">TRUE</span>))
+<span class="co"># Error in `[.default`(A, B) : invalid subscript type 'list'</span></code></pre></div>
+<p>So we can’t subset a <code>data.frame</code> by a <code>data.frame</code> in base R. What if we want row names and column names that aren’t <code>character</code> but <code>integer</code> or <code>float</code>? What if we want more than 2 dimensions of mixed types? Enter data.table.</p>
+<p>Furthermore, matrices, especially sparse matrices, are often stored in a 3-column tuple: <code>(i, j, value)</code>. This can be thought of as a key-value pair where <code>i</code> and <code>j</code> form a 2-column key. If we have more than one value, perhaps of different types, it might look like <code>(i, j, val1, val2, val3, ...)</code>. This looks very much like a <code>data.frame</code>. Hence data.table extends <code>data.frame</code> so that a <code>data.frame</code> <code>X</ [...]
+</div>
+<div id="can-base-be-changed-to-do-this-then-rather-than-a-new-package" class="section level2">
+<h2><span class="header-section-number">2.15</span> Can base be changed to do this then, rather than a new package?</h2>
+<p><code>data.frame</code> is used <em>everywhere</em> and so it is very difficult to make <em>any</em> changes to it. data.table <em>inherits</em> from <code>data.frame</code>. It <em>is</em> a <code>data.frame</code>, too. A data.table <em>can</em> be passed to any package that <em>only</em> accepts <code>data.frame</code>. When that package uses <code>[.data.frame</code> syntax on the data.table, it works. It works because <code>[.data.table</code> looks to see where it was called fro [...]
+</div>
+<div id="ive-heard-that-data.table-syntax-is-analogous-to-sql." class="section level2">
+<h2><span class="header-section-number">2.16</span> I’ve heard that data.table syntax is analogous to SQL.</h2>
+<p>Yes :</p>
+<ul>
+<li><code>i</code> <span class="math inline">\(\Leftrightarrow\)</span> where</li>
+<li><code>j</code> <span class="math inline">\(\Leftrightarrow\)</span> select</li>
+<li><code>:=</code> <span class="math inline">\(\Leftrightarrow\)</span> update</li>
+<li><code>by</code> <span class="math inline">\(\Leftrightarrow\)</span> group by</li>
+<li><code>i</code> <span class="math inline">\(\Leftrightarrow\)</span> order by (in compound syntax)</li>
+<li><code>i</code> <span class="math inline">\(\Leftrightarrow\)</span> having (in compound syntax)</li>
+<li><code>nomatch = NA</code> <span class="math inline">\(\Leftrightarrow\)</span> outer join</li>
+<li><code>nomatch = 0L</code> <span class="math inline">\(\Leftrightarrow\)</span> inner join</li>
+<li><code>mult = "first"|"last"</code> <span class="math inline">\(\Leftrightarrow\)</span> N/A because SQL is inherently unordered</li>
+<li><code>roll = TRUE</code> <span class="math inline">\(\Leftrightarrow\)</span> N/A because SQL is inherently unordered</li>
+</ul>
+<p>The general form is :</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[where, select|update, group by][order by][...] ... [...]</code></pre></div>
+<p>A key advantage of column vectors in R is that they are <em>ordered</em>, unlike SQL<a href="#fn2" class="footnoteRef" id="fnref2"><sup>2</sup></a>. We can use ordered functions in data.table queries, such as <code>diff()</code>, and we can use <em>any</em> R function from any package, not just the functions defined in SQL. A disadvantage is that R objects must fit in memory, but with several R packages such as ff, bigmemory, mmap and indexing, this is changing.</p>
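+<p>As a sketch of the analogy (hypothetical table <code>DT</code>):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"># SQL:  SELECT region, sum(sales) FROM DT WHERE qty != 0 GROUP BY region
+DT = data.table(region = c("n","n","s"), sales = c(10, 20, 5), qty = c(1, 0, 2))
+DT[qty != 0, sum(sales), by = region]   # i = where, j = select, by = group by
+#    region V1
+# 1:      n 10
+# 2:      s  5</code></pre></div>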
+</div>
+<div id="SmallerDiffs" class="section level2">
+<h2><span class="header-section-number">2.17</span> What are the smaller syntax differences between <code>data.frame</code> and data.table</h2>
+<ul>
+<li><code>DT[3]</code> refers to the 3rd <em>row</em>, but <code>DF[3]</code> refers to the 3rd <em>column</em></li>
+<li><code>DT[3, ] == DT[3]</code>, but <code>DF[ , 3] == DF[3]</code> (somewhat confusingly in data.frame, whereas data.table is consistent)</li>
+<li>For this reason we say the comma is <em>optional</em> in <code>DT</code>, but not optional in <code>DF</code></li>
+<li><code>DT[[3]] == DF[3] == DF[[3]]</code></li>
+<li><code>DT[i, ]</code>, where <code>i</code> is a single integer, returns a single row, just like <code>DF[i, ]</code>, but unlike a matrix single-row subset which returns a vector.</li>
+<li><code>DT[ , j]</code> where <code>j</code> is a single integer returns a one-column data.table, unlike <code>DF[, j]</code> which returns a vector by default</li>
+<li><code>DT[ , "colA"][[1]] == DF[ , "colA"]</code>.</li>
+<li><code>DT[ , colA] == DF[ , "colA"]</code> (currently in data.table v1.9.8 but is about to change, see release notes)</li>
+<li><code>DT[ , list(colA)] == DF[ , "colA", drop = FALSE]</code></li>
+<li><code>DT[NA]</code> returns 1 row of <code>NA</code>, but <code>DF[NA]</code> returns an entire copy of <code>DF</code> containing <code>NA</code> throughout. The symbol <code>NA</code> is type <code>logical</code> in R and is therefore recycled by <code>[.data.frame</code>. The user’s intention was probably <code>DF[NA_integer_]</code>. <code>[.data.table</code> diverts to this probable intention automatically, for convenience.</li>
+<li><code>DT[c(TRUE, NA, FALSE)]</code> treats the <code>NA</code> as <code>FALSE</code>, but <code>DF[c(TRUE, NA, FALSE)]</code> returns <code>NA</code> rows for each <code>NA</code></li>
+<li><code>DT[ColA == ColB]</code> is simpler than <code>DF[!is.na(ColA) & !is.na(ColB) & ColA == ColB, ]</code></li>
+<li><code>data.frame(list(1:2, "k", 1:4))</code> creates 3 columns, data.table creates one <code>list</code> column.</li>
+<li><code>check.names</code> is by default <code>TRUE</code> in <code>data.frame</code> but <code>FALSE</code> in data.table, for convenience.</li>
+<li><code>stringsAsFactors</code> is by default <code>TRUE</code> in <code>data.frame</code> but <code>FALSE</code> in data.table, for efficiency. Since a global string cache was added to R, character items are pointers to the single cached string, so there is no longer a performance benefit to converting to <code>factor</code>.</li>
+<li>Atomic vectors in <code>list</code> columns are collapsed when printed using <code>", "</code> in <code>data.frame</code>, but <code>","</code> in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.</li>
+</ul>
+<p>In <code>[.data.frame</code> we very often set <code>drop = FALSE</code>. When we forget, bugs can arise in edge cases where a single column is selected and a vector is suddenly returned rather than a single-column <code>data.frame</code>. In <code>[.data.table</code> we took the opportunity to make subsetting consistent and dropped <code>drop</code>.</p>
+<p>When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.</p>
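+<p>A short sketch of a few of the differences listed above:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DF = data.frame(a = 1:3, b = 4:6, c = 7:9)
+DT = data.table(DF)
+DF[3]   # the 3rd column
+#   c
+# 1 7
+# 2 8
+# 3 9
+DT[3]   # the 3rd row
+#    a b c
+# 1: 3 6 9</code></pre></div>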
+</div>
+<div id="im-using-j-for-its-side-effect-only-but-im-still-getting-data-returned.-how-do-i-stop-that" class="section level2">
+<h2><span class="header-section-number">2.18</span> I’m using <code>j</code> for its side effect only, but I’m still getting data returned. How do I stop that?</h2>
+<p>In this case <code>j</code> can be wrapped with <code>invisible()</code>; e.g., <code>DT[ , invisible(hist(colB)), by = colA]</code><a href="#fn3" class="footnoteRef" id="fnref3"><sup>3</sup></a></p>
+</div>
+<div id="why-does-.data.table-now-have-a-drop-argument-from-v1.5" class="section level2">
+<h2><span class="header-section-number">2.19</span> Why does <code>[.data.table</code> now have a <code>drop</code> argument from v1.5?</h2>
+<p>So that data.table can inherit from <code>data.frame</code> without using <code>...</code>. If we used <code>...</code> then invalid argument names would not be caught.</p>
+<p>The <code>drop</code> argument is never used by <code>[.data.table</code>. It is a placeholder for non-data.table-aware packages when they use the <code>[.data.frame</code> syntax directly on a data.table.</p>
+</div>
+<div id="rolling-joins-are-cool-and-very-fast-was-that-hard-to-program" class="section level2">
+<h2><span class="header-section-number">2.20</span> Rolling joins are cool and very fast! Was that hard to program?</h2>
+<p>The prevailing row on or before the <code>i</code> row is the final row the binary search tests anyway. So <code>roll = TRUE</code> is essentially just a switch in the binary search C code to return that row.</p>
+</div>
+<div id="why-does-dti-col-value-return-the-whole-of-dt-i-expected-either-no-visible-value-consistent-with---or-a-message-or-return-value-containing-how-many-rows-were-updated.-it-isnt-obvious-that-the-data-has-indeed-been-updated-by-reference." class="section level2">
+<h2><span class="header-section-number">2.21</span> Why does <code>DT[i, col := value]</code> return the whole of <code>DT</code>? I expected either no visible value (consistent with <code><-</code>), or a message or return value containing how many rows were updated. It isn’t obvious that the data has indeed been updated by reference.</h2>
+<p>This has changed in v1.8.3 to meet your expectations. Please upgrade.</p>
+<p>The whole of <code>DT</code> is returned (now invisibly) so that compound syntax can work; <em>e.g.</em>, <code>DT[i, done := TRUE][ , sum(done)]</code>. The number of rows updated is returned when <code>verbose</code> is <code>TRUE</code>, either on a per-query basis or globally using <code>options(datatable.verbose = TRUE)</code>.</p>
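+<p>A small runnable illustration of that compound form (the table here is made up for the example):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT = data.table(x = 1:4)
+# := returns DT invisibly, so a second [ ] can chain straight on
+DT[x %% 2 == 0, done := TRUE][ , sum(done, na.rm = TRUE)]
+# [1] 2</code></pre></div>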
+</div>
+<div id="ok-thanks.-what-was-so-difficult-about-the-result-of-dti-col-value-being-returned-invisibly" class="section level2">
+<h2><span class="header-section-number">2.22</span> OK, thanks. What was so difficult about the result of <code>DT[i, col := value]</code> being returned invisibly?</h2>
+<p>R internally forces visibility on for <code>[</code>. The value of FunTab’s eval column (see <a href="https://github.com/wch/r-source/blob/trunk/src/main/names.c">src/main/names.c</a>) for <code>[</code> is <code>0</code> meaning “force <code>R_Visible</code> on” (see <a href="https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Autoprinting">R-Internals section 1.6</a>). Therefore, when we tried <code>invisible()</code> or setting <code>R_Visible</code> to <code>0</code> dir [...]
+<p>To solve this problem, the key was to stop trying to stop the print method running after a <code>:=</code>. Instead, inside <code>:=</code> we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.</p>
+</div>
+<div id="why-do-i-have-to-type-dt-sometimes-twice-after-using-to-print-the-result-to-console" class="section level2">
+<h2><span class="header-section-number">2.23</span> Why do I have to type <code>DT</code> sometimes twice after using <code>:=</code> to print the result to console?</h2>
+<p>This is an unfortunate downside of getting <a href="https://github.com/Rdatatable/data.table/issues/869">#869</a> to work. If a <code>:=</code> is used inside a function with no <code>DT[]</code> before the end of the function, then the next time <code>DT</code> is typed at the prompt, nothing will be printed. A repeated <code>DT</code> will print. To avoid this, include a <code>DT[]</code> after the last <code>:=</code> in your function. If that is not possible (e.g., it’s not a function [...]
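+<p>A sketch of that workaround (the function and column names are invented for illustration):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+f = function(DT) {
+  DT[ , newcol := 1L]   # := inside a function suppresses the next auto-print
+  DT[]                  # so end with DT[] to restore printing at the prompt
+}</code></pre></div>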
+</div>
+<div id="ive-noticed-that-basecbind.data.frame-and-baserbind.data.frame-appear-to-be-changed-by-data.table.-how-is-this-possible-why" class="section level2">
+<h2><span class="header-section-number">2.24</span> I’ve noticed that <code>base::cbind.data.frame</code> (and <code>base::rbind.data.frame</code>) appear to be changed by data.table. How is this possible? Why?</h2>
+<p>It is a temporary, last resort solution until we discover a better way to solve the problems listed below. Essentially, the issue is that data.table inherits from <code>data.frame</code>, <em>and</em> <code>base::cbind</code> and <code>base::rbind</code> (uniquely) do their own S3 dispatch internally as documented by <code>?cbind</code>. The change is adding one <code>for</code> loop to the start of each function directly in <code>base</code>; <em>e.g.</em>,</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">base::cbind.data.frame
+<span class="co"># function (..., deparse.level = 1) </span>
+<span class="co"># {</span>
+<span class="co">#     if (!identical(class(..1), "data.frame")) </span>
+<span class="co">#         for (x in list(...)) {</span>
+<span class="co">#             if (inherits(x, "data.table")) </span>
+<span class="co">#                 return(data.table::data.table(...))</span>
+<span class="co">#         }</span>
+<span class="co">#     data.frame(..., check.names = FALSE)</span>
+<span class="co"># }</span>
+<span class="co"># <environment: namespace:base></span></code></pre></div>
+<p>That modification is made dynamically, <em>i.e.</em>, the <code>base</code> definition of <code>cbind.data.frame</code> is fetched, the <code>for</code> loop added to the beginning and then assigned back to <code>base</code>. This solution is intended to be robust to different definitions of <code>base::cbind.data.frame</code> in different versions of R, including unknown future changes. Again, it is a last resort until a better solution is known or made available. The competing requi [...]
+<ul>
+<li><p><code>cbind(DT, DF)</code> needs to work. Defining <code>cbind.data.table</code> doesn’t work because <code>base::cbind</code> does its own S3 dispatch and requires that the <em>first</em> <code>cbind</code> method for each object it is passed is <em>identical</em>. This is not true in <code>cbind(DT, DF)</code> because the first method for <code>DT</code> is <code>cbind.data.table</code> but the first method for <code>DF</code> is <code>cbind.data.frame</code>. <code>base::cbind< [...]
+<li><p>This naturally leads to trying to mask <code>cbind.data.frame</code> instead. Since a data.table is a <code>data.frame</code>, <code>cbind</code> would find the same method for both <code>DT</code> and <code>DF</code>. However, this doesn’t work either because <code>base::cbind</code> appears to find methods in <code>base</code> first; <em>i.e.</em>, <code>base::cbind.data.frame</code> isn’t maskable. This is reproducible as follows :</p></li>
+</ul>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">foo =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">a =</span> <span class="dv">1</span>:<span class="dv">3</span>)
+cbind.data.frame =<span class="st"> </span>function(...) <span class="kw">cat</span>(<span class="st">"Not printed</span><span class="ch">\n</span><span class="st">"</span>)
+<span class="kw">cbind</span>(foo)
+<span class="co">#   a</span>
+<span class="co"># 1 1</span>
+<span class="co"># 2 2</span>
+<span class="co"># 3 3</span>
+<span class="kw">rm</span>(<span class="st">"cbind.data.frame"</span>)</code></pre></div>
+<ul>
+<li>Finally, we tried masking <code>cbind</code> itself (v1.6.5 and v1.6.6). This allowed <code>cbind(DT, DF)</code> to work, but introduced compatibility issues with package <code>IRanges</code>, since <code>IRanges</code> also masks <code>cbind</code>. It worked if <code>IRanges</code> was lower on the <code>search()</code> path than data.table, but if <code>IRanges</code> was higher, then data.table’s <code>cbind</code> would never be called and the strange-looking <code>matrix</code> [...]
+</ul>
+<p>If you know of a better solution that still solves all the issues above, then please let us know and we’ll gladly change it.</p>
+</div>
+<div id="r-dispatch" class="section level2">
+<h2><span class="header-section-number">2.25</span> I’ve read about method dispatch (<em>e.g.</em> <code>merge</code> may or may not dispatch to <code>merge.data.table</code>) but <em>how</em> does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when?</h2>
+<p>This comes up quite a lot but it’s really earth-shatteringly simple. A function such as <code>merge</code> is <em>generic</em> if it consists of a call to <code>UseMethod</code>. To check whether a function is <em>generic</em>, simply type its name without <code>()</code> afterwards and look at the code inside: if there is a call to <code>UseMethod</code>, it is <em>generic</em>. What does <code>UseMethod</code> [...]
+<p>You might now ask: where is this documented in R? Answer: it’s quite clear, but, you need to first know to look in <code>?UseMethod</code> and <em>that</em> help file contains :</p>
+<blockquote>
+<p>When a function calling <code>UseMethod('fun')</code> is applied to an object with class attribute <code>c('first', 'second')</code>, the system searches for a function called <code>fun.first</code> and, if it finds it, applies it to the object. If no such function is found a function called <code>fun.second</code> is tried. If no class name produces a suitable function, the function <code>fun.default</code> is used, if it exists, or an error results.</p>
+</blockquote>
+<p>Happily, an internet search for “How does R method dispatch work” (at the time of this writing) returns the <code>?UseMethod</code> help page in the top few links. Admittedly, other links rapidly descend into the intricacies of S3 vs S4, internal generics and so on.</p>
+<p>However, features like basic S3 dispatch (pasting the function name together with the class name) are why some R folk love R. It’s so simple. No complicated registration or signature is required. There isn’t much to learn. To create the <code>merge</code> method for data.table, all that was required, literally, was to create a function called <code>merge.data.table</code>.</p>
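+<p>The mechanism that help text describes can be seen with a toy generic (all the names here are invented):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">fun = function(x) UseMethod("fun")   # this call is what makes fun generic
+fun.first = function(x) "dispatched to fun.first"
+fun.default = function(x) "dispatched to fun.default"
+obj = structure(list(), class = c("first", "second"))
+fun(obj)   # the class vector is searched left to right
+# [1] "dispatched to fun.first"
+fun(42)    # no fun.numeric exists, so fun.default is used
+# [1] "dispatched to fun.default"</code></pre></div>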
+</div>
+</div>
+<div id="questions-relating-to-compute-time" class="section level1">
+<h1><span class="header-section-number">3</span> Questions relating to compute time</h1>
+<div id="i-have-20-columns-and-a-large-number-of-rows.-why-is-an-expression-of-one-column-so-quick" class="section level2">
+<h2><span class="header-section-number">3.1</span> I have 20 columns and a large number of rows. Why is an expression of one column so quick?</h2>
+<p>Several reasons:</p>
+<ul>
+<li>Only that column is grouped; the other 19 are ignored because data.table inspects the <code>j</code> expression and realises it doesn’t use the other columns.</li>
+<li>One memory allocation is made for the largest group only, then that memory is re-used for the other groups. There is very little garbage to collect.</li>
+<li>R is an in-memory column store; i.e., the columns are contiguous in RAM. Page fetches from RAM into L2 cache are minimised.</li>
+</ul>
+</div>
+<div id="i-dont-have-a-key-on-a-large-table-but-grouping-is-still-really-quick.-why-is-that" class="section level2">
+<h2><span class="header-section-number">3.2</span> I don’t have a <code>key</code> on a large table, but grouping is still really quick. Why is that?</h2>
+<p>data.table uses radix sorting. This is significantly faster than other sort algorithms. See <a href="http://user2015.math.aau.dk/presentations/234.pdf">our presentations</a> on <a href="https://github.com/Rdatatable/data.table/wiki">our homepage</a> for more information.</p>
+<p>This is also one reason why <code>setkey()</code> is quick.</p>
+<p>When no <code>key</code> is set, or we group in a different order from that of the key, we call it an <em>ad hoc</em> <code>by</code>.</p>
+</div>
+<div id="why-is-grouping-by-columns-in-the-key-faster-than-an-ad-hoc-by" class="section level2">
+<h2><span class="header-section-number">3.3</span> Why is grouping by columns in the key faster than an <em>ad hoc</em> <code>by</code>?</h2>
+<p>Because each group is contiguous in RAM, page fetches are minimised and memory can be copied in bulk (<code>memcpy</code> in C) rather than looping in C.</p>
+</div>
+<div id="what-are-primary-and-secondary-indexes-in-data.table" class="section level2">
+<h2><span class="header-section-number">3.4</span> What are primary and secondary indexes in data.table?</h2>
+<p>Manual: <a href="https://www.rdocumentation.org/packages/data.table/functions/setkey"><code>?setkey</code></a> S.O. : <a href="https://stackoverflow.com/questions/20039335/what-is-the-purpose-of-setting-a-key-in-data-table/20057411#20057411">What is the purpose of setting a key in data.table?</a></p>
+<p><code>setkey(DT, col1, col2)</code> orders the rows by column <code>col1</code> then within each group of <code>col1</code> it orders by <code>col2</code>. This is a <em>primary index</em>. The row order is changed <em>by reference</em> in RAM. Subsequent joins and groups on those key columns then take advantage of the sort order for efficiency. (Imagine how difficult looking for a phone number in a printed telephone directory would be if it wasn’t sorted by surname then forename. Tha [...]
+<p>However, you can only have one primary key because data can only be physically sorted in RAM in one way at a time. Choose the primary index to be the one you use most often (e.g. <code>[id,date]</code>). Sometimes there isn’t an obvious choice for the primary key or you need to join and group many different columns in different orders. Enter a secondary index. This does use memory (<code>4*nrow</code> bytes regardless of the number of columns in the index) to store the order of the ro [...]
+<p>We use the words <em>index</em> and <em>key</em> interchangeably.</p>
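+<p>A brief sketch of both kinds of index (the table is invented; <code>setindex()</code> and <code>indices()</code> assume a recent version of data.table):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT = data.table(id = c(2L, 1L, 2L), date = c(3L, 1L, 2L), val = 1:3)
+setkey(DT, id, date)    # primary index: physically reorders the rows in RAM
+setindex(DT, val)       # secondary index: stores the row order, data stays put
+key(DT)                 # "id" "date"
+indices(DT)             # "val"</code></pre></div>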
+</div>
+</div>
+<div id="error-messages" class="section level1">
+<h1><span class="header-section-number">4</span> Error messages</h1>
+<div id="could-not-find-function-dt" class="section level2">
+<h2><span class="header-section-number">4.1</span> “Could not find function <code>DT</code>”</h2>
+<p>See above <a href="#DTremove1">here</a> and <a href="#DTremove2">here</a>.</p>
+</div>
+<div id="unused-arguments-mysum-sumv" class="section level2">
+<h2><span class="header-section-number">4.2</span> “unused argument(s) (<code>MySum = sum(v)</code>)”</h2>
+<p>This error is generated by <code>DT[ , MySum = sum(v)]</code>. <code>DT[ , .(MySum = sum(v))]</code> was intended, or <code>DT[ , j = .(MySum = sum(v))]</code>.</p>
+</div>
+<div id="translatecharutf8-must-be-called-on-a-charsxp" class="section level2">
+<h2><span class="header-section-number">4.3</span> “<code>translateCharUTF8</code> must be called on a <code>CHARSXP</code>”</h2>
+<p>This error (and similar, <em>e.g.</em>, “<code>getCharCE</code> must be called on a <code>CHARSXP</code>”) may have nothing to do with character data or locale. Instead, this can be a symptom of an earlier memory corruption. To date these have been reproducible and fixed (quickly). Please report it to our <a href="https://github.com/Rdatatable/data.table/issues">issues tracker</a>.</p>
+</div>
+<div id="cbinddt-df-returns-a-strange-format-e.g.-integer5" class="section level2">
+<h2><span class="header-section-number">4.4</span> <code>cbind(DT, DF)</code> returns a strange format, <em>e.g.</em> <code id="cbinderror">Integer,5</code></h2>
+<p>This occurred prior to v1.6.5, for <code>rbind(DT, DF)</code> too. Please upgrade to v1.6.7 or later.</p>
+</div>
+<div id="cannot-change-value-of-locked-binding-for-.sd" class="section level2">
+<h2><span class="header-section-number">4.5</span> “cannot change value of locked binding for <code>.SD</code>”</h2>
+<p><code>.SD</code> is locked by design. See <code>?data.table</code>. If you’d like to manipulate <code>.SD</code> before using it, or returning it, and don’t wish to modify <code>DT</code> using <code>:=</code>, then take a copy first (see <code>?copy</code>), <em>e.g.</em>,</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">a =</span> <span class="kw">rep</span>(<span class="dv">1</span>:<span class="dv">3</span>, <span class="dv">1</span>:<span class="dv">3</span>), <span class="dt">b =</span> <span class="dv">1</span>:<span class="dv">6</span>, <span class="dt">c =</span> <span class="dv">7</span>:<span class="dv">12</span>)
+DT
+<span class="co">#    a b  c</span>
+<span class="co"># 1: 1 1  7</span>
+<span class="co"># 2: 2 2  8</span>
+<span class="co"># 3: 2 3  9</span>
+<span class="co"># 4: 3 4 10</span>
+<span class="co"># 5: 3 5 11</span>
+<span class="co"># 6: 3 6 12</span>
+DT[ , { mySD =<span class="st"> </span><span class="kw">copy</span>(.SD)
+      mySD[<span class="dv">1</span>, b :<span class="er">=</span><span class="st"> </span>99L]
+      mySD},
+    by =<span class="st"> </span>a]
+<span class="co">#    a  b  c</span>
+<span class="co"># 1: 1 99  7</span>
+<span class="co"># 2: 2 99  8</span>
+<span class="co"># 3: 2  3  9</span>
+<span class="co"># 4: 3 99 10</span>
+<span class="co"># 5: 3  5 11</span>
+<span class="co"># 6: 3  6 12</span></code></pre></div>
+</div>
+<div id="cannot-change-value-of-locked-binding-for-.n" class="section level2">
+<h2><span class="header-section-number">4.6</span> “cannot change value of locked binding for <code>.N</code>”</h2>
+<p>Please upgrade to v1.8.1 or later. From this version, if <code>.N</code> is returned by <code>j</code> it is renamed to <code>N</code> to avoid any ambiguity in any subsequent grouping between the <code>.N</code> special variable and a column called <code>".N"</code>.</p>
+<p>The old behaviour can be reproduced by forcing <code>.N</code> to be called <code>.N</code>, like this :</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">a =</span> <span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">2</span>,<span class="dv">2</span>), <span class="dt">b =</span> <span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">2</span>,<span class="dv">2</span>,<span c [...]
+DT
+<span class="co">#    a b</span>
+<span class="co"># 1: 1 1</span>
+<span class="co"># 2: 1 2</span>
+<span class="co"># 3: 2 2</span>
+<span class="co"># 4: 2 2</span>
+<span class="co"># 5: 2 1</span>
+DT[ , <span class="kw">list</span>(<span class="dt">.N =</span> .N), <span class="kw">list</span>(a, b)]   <span class="co"># show intermediate result for exposition</span>
+<span class="co">#    a b .N</span>
+<span class="co"># 1: 1 1  1</span>
+<span class="co"># 2: 1 2  1</span>
+<span class="co"># 3: 2 2  2</span>
+<span class="co"># 4: 2 1  1</span>
+<span class="kw">cat</span>(<span class="kw">try</span>(
+    DT[ , <span class="kw">list</span>(<span class="dt">.N =</span> .N), <span class="dt">by =</span> <span class="kw">list</span>(a, b)][ , <span class="kw">unique</span>(.N), <span class="dt">by =</span> a]   <span class="co"># compound query more typical</span>
+, <span class="dt">silent =</span> <span class="ot">TRUE</span>))
+<span class="co"># Error in `[.data.table`(DT[, list(.N = .N), by = list(a, b)], , unique(.N),  : </span>
+<span class="co">#   The column '.N' can't be grouped because it conflicts with the special .N variable. Try setnames(DT,'.N','N') first.</span></code></pre></div>
+<p>If you are already running v1.8.1 or later then, as you can see above, the error message is more helpful than the old “cannot change value of locked binding” error, since this vignette was produced using v1.8.1 or later.</p>
+<p>The more natural syntax now works :</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">if (<span class="kw">packageVersion</span>(<span class="st">"data.table"</span>) >=<span class="st"> "1.8.1"</span>) {
+    DT[ , .N, by =<span class="st"> </span><span class="kw">list</span>(a, b)][ , <span class="kw">unique</span>(N), by =<span class="st"> </span>a]
+  }
+<span class="co">#    a V1</span>
+<span class="co"># 1: 1  1</span>
+<span class="co"># 2: 2  2</span>
+<span class="co"># 3: 2  1</span>
+if (<span class="kw">packageVersion</span>(<span class="st">"data.table"</span>) >=<span class="st"> "1.9.3"</span>) {
+    DT[ , .N, by =<span class="st"> </span>.(a, b)][ , <span class="kw">unique</span>(N), by =<span class="st"> </span>a]   <span class="co"># same</span>
+}
+<span class="co">#    a V1</span>
+<span class="co"># 1: 1  1</span>
+<span class="co"># 2: 2  2</span>
+<span class="co"># 3: 2  1</span></code></pre></div>
+</div>
+</div>
+<div id="warning-messages" class="section level1">
+<h1><span class="header-section-number">5</span> Warning messages</h1>
+<div id="the-following-objects-are-masked-from-packagebase-cbind-rbind" class="section level2">
+<h2><span class="header-section-number">5.1</span> “The following object(s) are masked from <code>package:base</code>: <code>cbind</code>, <code>rbind</code>”</h2>
+<p>This warning was present in v1.6.5 and v.1.6.6 only, when loading the package. The motivation was to allow <code>cbind(DT, DF)</code> to work, but as it transpired, this broke (full) compatibility with package <code>IRanges</code>. Please upgrade to v1.6.7 or later.</p>
+</div>
+<div id="coerced-numeric-rhs-to-integer-to-match-the-columns-type" class="section level2">
+<h2><span class="header-section-number">5.2</span> “Coerced numeric RHS to integer to match the column’s type”</h2>
+<p>Hopefully, this is self-explanatory. The full message is:</p>
+<p>Coerced numeric RHS to integer to match the column’s type; may have truncated precision. Either change the column to numeric first by creating a new numeric vector length 5 (nrows of entire table) yourself and assigning that (i.e. ‘replace’ column), or coerce RHS to integer yourself (e.g. 1L or as.integer) to make your intent clear (and for speed). Or, set the column type correctly up front when you create the table and stick to it, please.</p>
+<p>To generate it, try :</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">a =</span> <span class="dv">1</span>:<span class="dv">5</span>, <span class="dt">b =</span> <span class="dv">1</span>:<span class="dv">5</span>)
+<span class="kw">suppressWarnings</span>(
+DT[<span class="dv">2</span>, b :<span class="er">=</span><span class="st"> </span><span class="dv">6</span>]         <span class="co"># works (slower) with warning</span>
+)
+<span class="co">#    a b</span>
+<span class="co"># 1: 1 1</span>
+<span class="co"># 2: 2 6</span>
+<span class="co"># 3: 3 3</span>
+<span class="co"># 4: 4 4</span>
+<span class="co"># 5: 5 5</span>
+<span class="kw">class</span>(<span class="dv">6</span>)              <span class="co"># numeric not integer</span>
+<span class="co"># [1] "numeric"</span>
+DT[<span class="dv">2</span>, b :<span class="er">=</span><span class="st"> </span>7L]        <span class="co"># works (faster) without warning</span>
+<span class="co">#    a b</span>
+<span class="co"># 1: 1 1</span>
+<span class="co"># 2: 2 7</span>
+<span class="co"># 3: 3 3</span>
+<span class="co"># 4: 4 4</span>
+<span class="co"># 5: 5 5</span>
+<span class="kw">class</span>(7L)             <span class="co"># L makes it an integer</span>
+<span class="co"># [1] "integer"</span>
+DT[ , b :<span class="er">=</span><span class="st"> </span><span class="kw">rnorm</span>(<span class="dv">5</span>)]  <span class="co"># 'replace' integer column with a numeric column</span>
+<span class="co">#    a          b</span>
+<span class="co"># 1: 1  0.3941705</span>
+<span class="co"># 2: 2 -0.5748240</span>
+<span class="co"># 3: 3  0.8167449</span>
+<span class="co"># 4: 4 -1.3435421</span>
+<span class="co"># 5: 5 -0.2106035</span></code></pre></div>
+</div>
+<div id="reading-data.table-from-rds-or-rdata-file" class="section level2">
+<h2><span class="header-section-number">5.3</span> Reading data.table from RDS or RData file</h2>
+<p><code>*.RDS</code> and <code>*.RData</code> are file types which can store in-memory R objects on disk efficiently. However, storing a data.table in such a binary file loses its column over-allocation. This isn’t a big deal – your data.table will be copied in memory on the next <em>by reference</em> operation, with a warning. Therefore it is recommended to call <code>alloc.col()</code> on each data.table loaded with <code>readRDS()</code> or <code>load()</code>.</p>
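+<p>For example (the file here is a temporary one created just for the sketch):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT = data.table(a = 1:3)
+tmp = tempfile(fileext = ".rds")
+saveRDS(DT, tmp)
+DT2 = readRDS(tmp)   # the over-allocation is lost on serialisation
+alloc.col(DT2)       # restore it so that := updates by reference again
+DT2[ , b := a * 2L]  # no copy, no warning</code></pre></div>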
+</div>
+</div>
+<div id="general-questions-about-the-package" class="section level1">
+<h1><span class="header-section-number">6</span> General questions about the package</h1>
+<div id="v1.3-appears-to-be-missing-from-the-cran-archive" class="section level2">
+<h2><span class="header-section-number">6.1</span> v1.3 appears to be missing from the CRAN archive?</h2>
+<p>That is correct. v1.3 was available on R-Forge only. There were several large changes internally and these took some time to test in development.</p>
+</div>
+<div id="is-data.table-compatible-with-s-plus" class="section level2">
+<h2><span class="header-section-number">6.2</span> Is data.table compatible with S-plus?</h2>
+<p>Not currently.</p>
+<ul>
+<li>A few core parts of the package are written in C and use internal R functions and R structures.</li>
+<li>The package uses lexical scoping, which is one of the differences between R and <strong>S-plus</strong> explained by <a href="https://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping">R FAQ 3.3.1</a>.</li>
+</ul>
+</div>
+<div id="is-it-available-for-linux-mac-and-windows" class="section level2">
+<h2><span class="header-section-number">6.3</span> Is it available for Linux, Mac and Windows?</h2>
+<p>Yes, for both 32-bit and 64-bit on all platforms, thanks to CRAN. There are no special or OS-specific libraries used.</p>
+</div>
+<div id="i-think-its-great.-what-can-i-do" class="section level2">
+<h2><span class="header-section-number">6.4</span> I think it’s great. What can I do?</h2>
+<p>Please file suggestions, bug reports and enhancement requests on our <a href="https://github.com/Rdatatable/data.table/issues">issues tracker</a>. This helps make the package better.</p>
+<p>Please do star the package on <a href="https://github.com/Rdatatable/data.table/wiki">GitHub</a>. This helps encourage the developers and helps other R users find the package.</p>
+<p>You can submit pull requests to change the code and/or documentation yourself; see our <a href="https://github.com/Rdatatable/data.table/blob/master/Contributing.md">Contribution Guidelines</a>.</p>
+</div>
+<div id="i-think-its-not-great.-how-do-i-warn-others-about-my-experience" class="section level2">
+<h2><span class="header-section-number">6.5</span> I think it’s not great. How do I warn others about my experience?</h2>
+<p>Please put your vote and comments on <a href="http://crantastic.org/packages/data-table">Crantastic</a>. Please make it constructive so we have a chance to improve.</p>
+</div>
+<div id="i-have-a-question.-i-know-the-r-help-posting-guide-tells-me-to-contact-the-maintainer-not-r-help-but-is-there-a-larger-group-of-people-i-can-ask" class="section level2">
+<h2><span class="header-section-number">6.6</span> I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?</h2>
+<p>Yes, there are two options. You can post to <a href="mailto:datatable-help at lists.r-forge.r-project.org">datatable-help</a>, which is like r-help but just for this package, or use the <a href="https://stackoverflow.com/tags/data.table/info"><code>[data.table]</code> tag</a> on <a href="https://stackoverflow.com/">Stack Overflow</a>. Feel free to answer questions in those places, too.</p>
+</div>
+<div id="where-are-the-datatable-help-archives" class="section level2">
+<h2><span class="header-section-number">6.7</span> Where are the datatable-help archives?</h2>
+<p>The <a href="https://github.com/Rdatatable/data.table/wiki">homepage</a> contains links to the archives in several formats.</p>
+</div>
+<div id="id-prefer-not-to-post-on-the-issues-page-can-i-mail-just-one-or-two-people-privately" class="section level2">
+<h2><span class="header-section-number">6.8</span> I’d prefer not to post on the Issues page, can I mail just one or two people privately?</h2>
+<p>Sure. You’re more likely to get a faster answer from the Issues page or Stack Overflow, though. Further, asking publicly in those places helps build the general knowledge base.</p>
+</div>
+<div id="i-have-created-a-package-that-uses-data.table.-how-do-i-ensure-my-package-is-data.table-aware-so-that-inheritance-from-data.frame-works" class="section level2">
+<h2><span class="header-section-number">6.9</span> I have created a package that uses data.table. How do I ensure my package is data.table-aware so that inheritance from <code>data.frame</code> works?</h2>
+<p>Please see <a href="http://stackoverflow.com/a/10529888/403310">this answer</a>.</p>
+</div>
+</div>
+<div class="footnotes">
+<hr />
+<ol>
+<li id="fn1"><p>Here we mean either the <code>merge</code> <em>method</em> for data.table or the <code>merge</code> method for <code>data.frame</code> since both methods work in the same way in this respect. See <code>?merge.data.table</code> and <a href="#r-dispatch">below</a> for more information about method dispatch.<a href="#fnref1">↩</a></p></li>
+<li id="fn2"><p>It may be a surprise to learn that <code>select top 10 * from ...</code> does <em>not</em> reliably return the same rows over time in SQL. You do need to include an <code>order by</code> clause, or use a clustered index to guarantee row order; <em>i.e.</em>, SQL is inherently unordered.<a href="#fnref2">↩</a></p></li>
+<li id="fn3"><p><em>e.g.</em>, <code>hist()</code> returns the breakpoints in addition to plotting to the graphics device.<a href="#fnref3">↩</a></p></li>
+</ol>
+</div>
+
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/datatable-faq.pdf b/inst/doc/datatable-faq.pdf
deleted file mode 100644
index 13f3895..0000000
Binary files a/inst/doc/datatable-faq.pdf and /dev/null differ
diff --git a/inst/doc/datatable-intro-vignette.R b/inst/doc/datatable-intro-vignette.R
deleted file mode 100644
index 539af8f..0000000
--- a/inst/doc/datatable-intro-vignette.R
+++ /dev/null
@@ -1,210 +0,0 @@
-## ---- echo = FALSE, message = FALSE--------------------------------------
-require(data.table)
-knitr::opts_chunk$set(
-  comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-# options(datatable.auto.index=FALSE)
-
-## ----echo=FALSE-----------------------------------------------------------------------------------
-options(width=100)
-
-## -------------------------------------------------------------------------------------------------
-flights <- fread("flights14.csv")
-flights
-dim(flights)
-
-## -------------------------------------------------------------------------------------------------
-DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18)
-DT
-class(DT$ID)
-
-## -------------------------------------------------------------------------------------------------
-getOption("datatable.print.nrows")
-
-## ----eval = FALSE---------------------------------------------------------------------------------
-#  DT[i, j, by]
-#  
-#  ##   R:      i                 j        by
-#  ## SQL:  where   select | update  group by
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[origin == "JFK" & month == 6L]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[1:2]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[order(origin, -dest)]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-odt = data.table(col=sample(1e7))
-(t1 <- system.time(ans1 <- odt[base::order(col)]))  ## uses order from base R
-(t2 <- system.time(ans2 <- odt[order(col)]))        ## uses data.table's forder
-(identical(ans1, ans2))
-
-## ----echo = FALSE---------------------------------------------------------------------------------
-rm(odt); rm(ans1); rm(ans2); rm(t1); rm(t2)
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, arr_delay]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, list(arr_delay)]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, .(arr_delay, dep_delay)]
-head(ans)
-
-## alternatively
-# ans <- flights[, list(arr_delay, dep_delay)]
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, .(delay_arr = arr_delay, delay_dep = dep_delay)]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, sum((arr_delay + dep_delay) < 0)]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[origin == "JFK" & month == 6L, 
-               .(m_arr=mean(arr_delay), m_dep=mean(dep_delay))]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[origin == "JFK" & month == 6L, length(dest)]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[origin == "JFK" & month == 6L, .N]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, c("arr_delay", "dep_delay"), with=FALSE]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-DF = data.frame(x = c(1,1,1,2,2,3,3,3), y = 1:8)
-
-## (1) normal way
-DF[DF$x > 1, ] # data.frame needs that ',' as well
-
-## (2) using with
-DF[with(DF, x > 1), ]
-
-## ----eval=FALSE-----------------------------------------------------------------------------------
-#  ## not run
-#  
-#  # returns all columns except arr_delay and dep_delay
-#  ans <- flights[, !c("arr_delay", "dep_delay"), with=FALSE]
-#  # or
-#  ans <- flights[, -c("arr_delay", "dep_delay"), with=FALSE]
-
-## ----eval=FALSE-----------------------------------------------------------------------------------
-#  ## not run
-#  
-#  # returns year,month and day
-#  ans <- flights[, year:day, with=FALSE]
-#  # returns day, month and year
-#  ans <- flights[, day:year, with=FALSE]
-#  # returns all columns except year, month and day
-#  ans <- flights[, -(year:day), with=FALSE]
-#  ans <- flights[, !(year:day), with=FALSE]
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, .(.N), by=.(origin)]
-ans
-
-## or equivalently using a character vector in 'by'
-# ans <- flights[, .(.N), by="origin"]
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, .N, by=origin]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[carrier == "AA", .N, by=origin]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[carrier == "AA", .N, by=.(origin,dest)]
-head(ans)
-
-## or equivalently using a character vector in 'by'
-# ans <- flights[carrier == "AA", .N, by=c("origin", "dest")]
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[carrier == "AA", 
-        .(mean(arr_delay), mean(dep_delay)), 
-        by=.(origin, dest, month)]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[carrier == "AA", 
-        .(mean(arr_delay), mean(dep_delay)), 
-        keyby=.(origin, dest, month)]
-ans
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
-
-## -------------------------------------------------------------------------------------------------
-ans <- ans[order(origin, -dest)]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[carrier == "AA", .N, by=.(origin, dest)][order(origin, -dest)]
-head(ans, 10)
-
-## ----eval=FALSE-----------------------------------------------------------------------------------
-#  DT[ ...
-#   ][ ...
-#   ][ ...
-#   ]
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, .N, .(dep_delay>0, arr_delay>0)]
-ans
-
-## -------------------------------------------------------------------------------------------------
-DT
-
-DT[, print(.SD), by=ID]
-
-## -------------------------------------------------------------------------------------------------
-DT[, lapply(.SD, mean), by=ID]
-
-## -------------------------------------------------------------------------------------------------
-flights[carrier == "AA",                     ## Only on trips with carrier "AA"
-        lapply(.SD, mean),                   ## compute the mean
-        by=.(origin, dest, month),           ## for every 'origin,dest,month'
-        .SDcols=c("arr_delay", "dep_delay")] ## for just those specified in .SDcols
-
-## -------------------------------------------------------------------------------------------------
-ans <- flights[, head(.SD, 2), by=month]
-head(ans)
-
-## -------------------------------------------------------------------------------------------------
-DT[, .(val = c(a,b)), by=ID]
-
-## -------------------------------------------------------------------------------------------------
-DT[, .(val = list(c(a,b))), by=ID]
-
-## -------------------------------------------------------------------------------------------------
-## (1) look at the difference between
-DT[, print(c(a,b)), by=ID]
-
-## (2) and
-DT[, print(list(c(a,b))), by=ID]
-
-## ----eval=FALSE-----------------------------------------------------------------------------------
-#  DT[i, j, by]
-
diff --git a/inst/doc/datatable-intro-vignette.html b/inst/doc/datatable-intro-vignette.html
deleted file mode 100644
index 3b59c6c..0000000
--- a/inst/doc/datatable-intro-vignette.html
+++ /dev/null
@@ -1,974 +0,0 @@
-<!DOCTYPE html>
-
-<html xmlns="http://www.w3.org/1999/xhtml">
-
-<head>
-
-<meta charset="utf-8">
-<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
-<meta name="generator" content="pandoc" />
-
-
-<meta name="date" content="2015-09-18" />
-
-<title>Introduction to data.table</title>
-
-<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4wIHwgKGMpIDIwMDUsIDIwMTQgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
-<meta name="viewport" content="width=device-width, initial-scale=1" />
-<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjEgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE0IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgTUlUIChodHRwczovL2dpdGh1Yi5jb20vdHdicy9ib290c3RyYXAvYmxvYi9tYXN0ZXIvTElDRU5TRSkKICovCmlmKCJ1bmRlZmluZWQiPT10eXBlb2YgalF1ZXJ5KXRocm93IG5ldyBFcnJvcigiQm9vdHN0cmFwJ3MgSmF2YVNjcmlwdCByZXF1aXJlcyBqUXVlcnkiKTsrZnVuY3Rpb24oYSl7dmFyIGI9YS5mbi5qcXVlcnkuc3BsaXQoIiAiKVswXS5zcGxpdCgiLiIpO2lmKGJbMF08MiYmYlsxXTw5fH [...]
-<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
-<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG9jdW1lbnRFbGVtZW50LGQ9Yy5maXJzdEVsZW1lbn [...]
-<style type="text/css">
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 400;
-  src: url(fonts/OpenSans.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 700;
-  src: url(fonts/OpenSansBold.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 400;
-  src: url(fonts/OpenSansItalic.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 700;
-  src: url(fonts/OpenSansBoldItalic.ttf) format('truetype');
-}
-
-/*!
- * bootswatch v3.3.1+1
- * Homepage: http://bootswatch.com
- * Copyright 2012-2014 Thomas Park
- * Licensed under MIT
- * Based on Bootstrap
-*//*! normalize.css v3.0.2 | MIT License | git.io/normalize */html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}body{margin:0}article,aside,details,figcaption,figure,footer,header,hgroup,main,menu,nav,section,summary{display:block}audio,canvas,progress,video{display:inline-block;vertical-align:baseline}audio:not([controls]){display:none;height:0}[hidden],template{display:none}a{background-color:transparent}a:active,a:hover{outline:0}abbr[title]{border-bo [...]
-</style>
-
-
-<style type="text/css">code{white-space: pre;}</style>
-<style type="text/css">
-div.sourceCode { overflow-x: auto; }
-table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
-  margin: 0; padding: 0; vertical-align: baseline; border: none; }
-table.sourceCode { width: 100%; line-height: 100%; }
-td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
-td.sourceCode { padding-left: 5px; }
-code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
-code > span.dt { color: #902000; } /* DataType */
-code > span.dv { color: #40a070; } /* DecVal */
-code > span.bn { color: #40a070; } /* BaseN */
-code > span.fl { color: #40a070; } /* Float */
-code > span.ch { color: #4070a0; } /* Char */
-code > span.st { color: #4070a0; } /* String */
-code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
-code > span.ot { color: #007020; } /* Other */
-code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
-code > span.fu { color: #06287e; } /* Function */
-code > span.er { color: #ff0000; font-weight: bold; } /* Error */
-code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
-code > span.cn { color: #880000; } /* Constant */
-code > span.sc { color: #4070a0; } /* SpecialChar */
-code > span.vs { color: #4070a0; } /* VerbatimString */
-code > span.ss { color: #bb6688; } /* SpecialString */
-code > span.im { } /* Import */
-code > span.va { color: #19177c; } /* Variable */
-code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
-code > span.op { color: #666666; } /* Operator */
-code > span.bu { } /* BuiltIn */
-code > span.ex { } /* Extension */
-code > span.pp { color: #bc7a00; } /* Preprocessor */
-code > span.at { color: #7d9029; } /* Attribute */
-code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
-code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
-code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
-code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
-</style>
-<style type="text/css">
-  pre:not([class]) {
-    background-color: white;
-  }
-</style>
-
-
-<link href="data:text/css;charset=utf-8,code%2C%0Akbd%2C%0Apre%2C%0Asamp%20%7B%0Afont%2Dfamily%3A%20Source%20Code%20Pro%2C%20Inconsolata%2C%20Monaco%2C%20Consolas%2C%20Menlo%2C%20Courier%20New%2C%20monospace%3B%0A%7D%0Acode%20%7B%0Apadding%3A%200px%202px%3B%0Afont%2Dsize%3A%2090%25%3B%0Acolor%3A%20%23c7254e%3B%0Awhite%2Dspace%3A%20nowrap%3B%0Abackground%2Dcolor%3A%20%23f9f2f4%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%200px%3B%0A%7D%0Apre%20%7B%0Adisplay%3A%20block%3B%0Apadding%3A%209% [...]
-
-</head>
-
-<body>
-
-<style type="text/css">
-.main-container {
-  max-width: 940px;
-  margin-left: auto;
-  margin-right: auto;
-}
-code {
-  color: inherit;
-  background-color: rgba(0, 0, 0, 0.04);
-}
-img { 
-  max-width:100%; 
-  height: auto; 
-}
-</style>
-<div class="container-fluid main-container">
-
-
-<div id="header">
-<h1 class="title">Introduction to data.table</h1>
-<h4 class="date"><em>2015-09-18</em></h4>
-</div>
-
-
-<p>This vignette introduces the <em>data.table</em> syntax, its general form, how to <em>subset</em> rows, <em>select and compute</em> on columns and perform aggregations <em>by group</em>. Familiarity with <em>data.frame</em> data structure from base R is useful, but not essential to follow this vignette.</p>
-<hr />
-<div id="data-analysis-using-data.table" class="section level2">
-<h2>Data analysis using data.table</h2>
-<p>Data manipulation operations such as <em>subset</em>, <em>group</em>, <em>update</em>, <em>join</em> etc., are all inherently related. Keeping these <em>related operations together</em> allows for:</p>
-<ul>
-<li><p><em>concise</em> and <em>consistent</em> syntax irrespective of the set of operations you would like to perform to achieve your end goal.</p></li>
-<li><p>performing analysis <em>fluidly</em>, without the cognitive burden of having to map each operation to a particular function from a large set of available functions before performing the analysis.</p></li>
-<li><p><em>automatically</em> optimising operations internally, and very effectively, by knowing precisely the data required for each operation, leading to very fast and memory-efficient code.</p></li>
-</ul>
-<p>Briefly, if you are interested in reducing <em>programming</em> and <em>compute</em> time tremendously, then this package is for you. The philosophy that <em>data.table</em> adheres to makes this possible. Our goal is to illustrate it through this series of vignettes.</p>
-</div>
-<div id="data" class="section level2">
-<h2>Data</h2>
-<p>In this vignette, we will use <a href="https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data">NYC-flights14</a> data. It contains On-Time flights data from the <a href="http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236">Bureau of Transportation Statistics</a> for all the flights that departed from New York City airports in 2014 (inspired by <a href="https://github.com/hadley/nycflights13">nycflights13</a>). The data is available only for Jan-Oct’14.</p>
-<p>We can use <em>data.table’s</em> fast file reader <code>fread</code> to load <em>flights</em> directly as follows:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights <-<span class="st"> </span><span class="kw">fread</span>(<span class="st">"flights14.csv"</span>)
-flights
-<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
-<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
-<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
-<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
-<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
-<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
-<span class="co">#     ---                                                                              </span>
-<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14</span>
-<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8</span>
-<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11</span>
-<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11</span>
-<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8</span>
-<span class="kw">dim</span>(flights)
-<span class="co"># [1] 253316     11</span></code></pre></div>
-<p>Aside: <code>fread</code> accepts <code>http</code> and <code>https</code> URLs directly, as well as the output of operating system commands such as <code>sed</code> and <code>awk</code>. See <code>?fread</code> for examples.</p>
-</div>
-<div id="introduction" class="section level2">
-<h2>Introduction</h2>
-<p>In this vignette, we will</p>
-<ol style="list-style-type: decimal">
-<li><p>start with basics - what is a <em>data.table</em>, its general form, how to <em>subset</em> rows, <em>select and compute</em> on columns</p></li>
-<li><p>and then we will look at performing data aggregations by group,</p></li>
-</ol>
-</div>
-<div id="basics-1" class="section level2">
-<h2>1. Basics</h2>
-<div id="what-is-datatable-1a" class="section level3">
-<h3>a) What is data.table?</h3>
-<p><em>data.table</em> is an R package that provides <strong>an enhanced version</strong> of <em>data.frames</em>. In the <a href="#data">Data</a> section, we already created a <em>data.table</em> using <code>fread()</code>. We can also create one using the <code>data.table()</code> function. Here is an example:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">ID =</span> <span class="kw">c</span>(<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"a"</span>,<span class="st">"a"</span>,<span class="st">"c"</span>), <span class="dt">a =</span> <span class="dv">1</span>:<span class= [...]
-DT
-<span class="co">#    ID a  b  c</span>
-<span class="co"># 1:  b 1  7 13</span>
-<span class="co"># 2:  b 2  8 14</span>
-<span class="co"># 3:  b 3  9 15</span>
-<span class="co"># 4:  a 4 10 16</span>
-<span class="co"># 5:  a 5 11 17</span>
-<span class="co"># 6:  c 6 12 18</span>
-<span class="kw">class</span>(DT$ID)
-<span class="co"># [1] "character"</span></code></pre></div>
-<p>You can also convert existing objects to a <em>data.table</em> using <code>as.data.table()</code>.</p>
-<div id="note-that" class="section level4 bs-callout bs-callout-info">
-<h4>Note that:</h4>
-<ul>
-<li><p>Unlike <em>data.frames</em>, columns of <code>character</code> type are <em>never</em> converted to <code>factors</code> by default.</p></li>
-<li><p>Row numbers are printed with a <code>:</code> in order to visually separate the row number from the first column.</p></li>
-<li><p>When the number of rows to print exceeds the global option <code>datatable.print.nrows</code> (default = 100), it automatically prints only the top 5 and bottom 5 rows (as can be seen in the <a href="#data">Data</a> section).</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">getOption</span>(<span class="st">"datatable.print.nrows"</span>)
-<span class="co"># [1] 100</span></code></pre></div></li>
-<li><p><em>data.table</em> doesn’t set or use <em>row names</em>, ever. We will see why in the <em>“Keys and fast binary search based subset”</em> vignette.</p></li>
-</ul>
-</div>
-</div>
-<div id="enhanced-1b" class="section level3">
-<h3>b) General form - in what way is a data.table <em>enhanced</em>?</h3>
-<p>In contrast to a <em>data.frame</em>, you can do <em>a lot more</em> than just subsetting rows and selecting columns within the frame of a <em>data.table</em>, i.e., within <code>[ ... ]</code>. To understand it we will have to first look at the <em>general form</em> of <em>data.table</em> syntax, as shown below:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[i, j, by]
-
-##   R:      i                 j        by
-## SQL:  where   select | update  group by</code></pre></div>
-<p>Users with an SQL background might immediately relate to this syntax.</p>
-<div id="the-way-to-read-it-out-loud-is" class="section level4 bs-callout bs-callout-info">
-<h4>The way to read it (out loud) is:</h4>
-<p>Take <code>DT</code>, subset rows using <code>i</code>, then calculate <code>j</code>, grouped by <code>by</code>.</p>
-</div>
-</div>
-</div>
-<div id="section" class="section level1">
-<h1></h1>
-<p>Let’s begin by looking at <code>i</code> and <code>j</code> first - subsetting rows and operating on columns.</p>
-<div id="c-subset-rows-in-i" class="section level3">
-<h3>c) Subset rows in <code id="subset-i-1c">i</code></h3>
-<div id="get-all-the-flights-with-jfk-as-the-origin-airport-in-the-month-of-june." class="section level4">
-<h4>– Get all the flights with “JFK” as the origin airport in the month of June.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L]
-<span class="kw">head</span>(ans)
-<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
-<span class="co"># 1: 2014     6   1        -9        -5      AA    JFK  LAX      324     2475    8</span>
-<span class="co"># 2: 2014     6   1       -10       -13      AA    JFK  LAX      329     2475   12</span>
-<span class="co"># 3: 2014     6   1        18        -1      AA    JFK  LAX      326     2475    7</span>
-<span class="co"># 4: 2014     6   1        -6       -16      AA    JFK  LAX      320     2475   10</span>
-<span class="co"># 5: 2014     6   1        -4       -45      AA    JFK  LAX      326     2475   18</span>
-<span class="co"># 6: 2014     6   1        -6       -23      AA    JFK  LAX      329     2475   14</span></code></pre></div>
-</div>
-<div id="section-1" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Within the frame of a <em>data.table</em>, columns can be referred to <em>as if they are variables</em>. Therefore, we simply refer to <code>origin</code> and <code>month</code> as if they are variables. We do not need to add the prefix <code>flights$</code> each time. However, using <code>flights$origin</code> and <code>flights$month</code> would work just fine.</p></li>
-<li><p>The <em>row indices</em> that satisfy the condition <code>origin == "JFK" & month == 6L</code> are computed, and since there is nothing else left to do, a <em>data.table</em> with all columns from <code>flights</code> corresponding to those <em>row indices</em> is simply returned.</p></li>
-<li><p>A comma after the condition in <code>i</code> is not required; <code>flights[origin == "JFK" & month == 6L, ]</code> would also work just fine. In <em>data.frames</em>, however, the comma is necessary.</p></li>
-</ul>
-</div>
-<div id="subset-rows-integer" class="section level4">
-<h4>– Get the first two rows from <code>flights</code>.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="dv">1</span>:<span class="dv">2</span>]
-ans
-<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
-<span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
-<span class="co"># 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span></code></pre></div>
-</div>
-<div id="section-2" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li>In this case, there is no condition. The row indices are already provided in <code>i</code>. We therefore return a <em>data.table</em> with all columns from <code>flights</code> for those <em>row indices</em>.</li>
-</ul>
-</div>
-<div id="sort-flights-first-by-column-origin-in-ascending-order-and-then-by-dest-in-descending-order" class="section level4">
-<h4>– Sort <code>flights</code> first by column <code>origin</code> in <em>ascending</em> order, and then by <code>dest</code> in <em>descending</em> order:</h4>
-<p>We can use the base R function <code>order()</code> to accomplish this.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="kw">order</span>(origin, -dest)]
-<span class="kw">head</span>(ans)
-<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
-<span class="co"># 1: 2014     1   5         6        49      EV    EWR  XNA      195     1131    8</span>
-<span class="co"># 2: 2014     1   6         7        13      EV    EWR  XNA      190     1131    8</span>
-<span class="co"># 3: 2014     1   7        -6       -13      EV    EWR  XNA      179     1131    8</span>
-<span class="co"># 4: 2014     1   8        -7       -12      EV    EWR  XNA      184     1131    8</span>
-<span class="co"># 5: 2014     1   9        16         7      EV    EWR  XNA      181     1131    8</span>
-<span class="co"># 6: 2014     1  13        66        66      EV    EWR  XNA      188     1131    9</span></code></pre></div>
-</div>
-<div id="order-is-internally-optimised" class="section level4 bs-callout bs-callout-info">
-<h4><code>order()</code> is internally optimised</h4>
-<ul>
-<li><p>We can use “-” on <em>character</em> columns within the frame of a <em>data.table</em> to sort in decreasing order.</p></li>
-<li><p>In addition, <code>order(...)</code> within the frame of a <em>data.table</em> uses <em>data.table</em>’s internal fast radix order <code>forder()</code>, which is much faster than <code>base::order</code>. Here’s a small example to highlight the difference.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">odt =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">col=</span><span class="kw">sample</span>(<span class="fl">1e7</span>))
-(t1 <-<span class="st"> </span><span class="kw">system.time</span>(ans1 <-<span class="st"> </span>odt[base::<span class="kw">order</span>(col)]))  ## uses order from base R
-<span class="co">#    user  system elapsed </span>
-<span class="co">#   4.116   0.016   4.128</span>
-(t2 <-<span class="st"> </span><span class="kw">system.time</span>(ans2 <-<span class="st"> </span>odt[<span class="kw">order</span>(col)]))        ## uses data.table's forder
-<span class="co">#    user  system elapsed </span>
-<span class="co">#   0.308   0.012   0.319</span>
-(<span class="kw">identical</span>(ans1, ans2))
-<span class="co"># [1] TRUE</span></code></pre></div></li>
-</ul>
-<p>The speedup here is <strong>~13x</strong>. We will discuss <em>data.table</em>’s fast order in more detail in the <em>data.table internals</em> vignette.</p>
-<ul>
-<li>This is so that you can improve performance tremendously while using already familiar functions.</li>
-</ul>
-</div>
-</div>
-</div>
-<div id="section-3" class="section level1">
-<h1></h1>
-<div id="d-select-columns-in-j" class="section level3">
-<h3>d) Select column(s) in <code id="select-j-1d">j</code></h3>
-<div id="select-arr_delay-column-but-return-it-as-a-vector." class="section level4">
-<h4>– Select <code>arr_delay</code> column, but return it as a <em>vector</em>.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, arr_delay]
-<span class="kw">head</span>(ans)
-<span class="co"># [1]  13  13   9 -26   1   0</span></code></pre></div>
-</div>
-<div id="section-4" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Since columns can be referred to as if they are variables within the frame of data.tables, we directly refer to the <em>variable</em> we want to subset. Since we want <em>all the rows</em>, we simply skip <code>i</code>.</p></li>
-<li><p>It returns <em>all</em> the rows for the column <code>arr_delay</code>.</p></li>
-</ul>
-</div>
-<div id="select-arr_delay-column-but-return-as-a-data.table-instead." class="section level4">
-<h4>– Select <code>arr_delay</code> column, but return as a <em>data.table</em> instead.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">list</span>(arr_delay)]
-<span class="kw">head</span>(ans)
-<span class="co">#    arr_delay</span>
-<span class="co"># 1:        13</span>
-<span class="co"># 2:        13</span>
-<span class="co"># 3:         9</span>
-<span class="co"># 4:       -26</span>
-<span class="co"># 5:         1</span>
-<span class="co"># 6:         0</span></code></pre></div>
-</div>
-<div id="section-5" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>We wrap the <em>variables</em> (column names) within <code>list()</code>, which ensures that a <em>data.table</em> is returned. In the case of a single column name, not wrapping it with <code>list()</code> returns a vector instead, as seen in the <a href="#select-j-1d">previous example</a>.</p></li>
-<li><p><em>data.table</em> also allows wrapping columns with <code>.()</code>. It is an <em>alias</em> for <code>list()</code>; they mean the same thing. Feel free to use whichever you prefer.</p>
-<p>We will continue to use <code>.()</code> from here on.</p></li>
-</ul>
-</div>
-</div>
-</div>
-<div id="section-6" class="section level1">
-<h1></h1>
-<p><em>data.tables</em> (and <em>data.frames</em>) are internally <em>lists</em> as well, but with all their columns of equal length and with a <em>class</em> attribute. Allowing <code>j</code> to return a <em>list</em> enables converting and returning a <em>data.table</em> very efficiently.</p>
-<div id="tip-1" class="section level4 bs-callout bs-callout-warning">
-<h4>Tip:</h4>
-<p>As long as <code>j-expression</code> returns a <em>list</em>, each element of the list will be converted to a column in the resulting <em>data.table</em>. This makes <code>j</code> quite powerful, as we will see shortly.</p>
-</div>
-<div id="select-both-arr_delay-and-dep_delay-columns." class="section level4">
-<h4>– Select both <code>arr_delay</code> and <code>dep_delay</code> columns.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .(arr_delay, dep_delay)]
-<span class="kw">head</span>(ans)
-<span class="co">#    arr_delay dep_delay</span>
-<span class="co"># 1:        13        14</span>
-<span class="co"># 2:        13        -3</span>
-<span class="co"># 3:         9         2</span>
-<span class="co"># 4:       -26        -8</span>
-<span class="co"># 5:         1         2</span>
-<span class="co"># 6:         0         4</span>
-
-## alternatively
-<span class="co"># ans <- flights[, list(arr_delay, dep_delay)]</span></code></pre></div>
-</div>
-<div id="section-7" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li>Wrap both columns within <code>.()</code>, or <code>list()</code>. That’s it.</li>
-</ul>
-</div>
-</div>
-<div id="section-8" class="section level1">
-<h1></h1>
-<div id="select-both-arr_delay-and-dep_delay-columns-and-rename-them-to-delay_arr-and-delay_dep." class="section level4">
-<h4>– Select both <code>arr_delay</code> and <code>dep_delay</code> columns <em>and</em> rename them to <code>delay_arr</code> and <code>delay_dep</code>.</h4>
-<p>Since <code>.()</code> is just an alias for <code>list()</code>, we can name columns as we would while creating a <em>list</em>.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .(<span class="dt">delay_arr =</span> arr_delay, <span class="dt">delay_dep =</span> dep_delay)]
-<span class="kw">head</span>(ans)
-<span class="co">#    delay_arr delay_dep</span>
-<span class="co"># 1:        13        14</span>
-<span class="co"># 2:        13        -3</span>
-<span class="co"># 3:         9         2</span>
-<span class="co"># 4:       -26        -8</span>
-<span class="co"># 5:         1         2</span>
-<span class="co"># 6:         0         4</span></code></pre></div>
-<p>That’s it.</p>
-</div>
-<div id="e-compute-or-do-in-j" class="section level3">
-<h3>e) Compute or <em>do</em> in <code>j</code></h3>
-<div id="how-many-trips-have-had-total-delay-0" class="section level4">
-<h4>– How many trips have had total delay &lt; 0?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">sum</span>((arr_delay +<span class="st"> </span>dep_delay) <<span class="st"> </span><span class="dv">0</span>)]
-ans
-<span class="co"># [1] 141814</span></code></pre></div>
-</div>
-<div id="whats-happening-here" class="section level4 bs-callout bs-callout-info">
-<h4>What’s happening here?</h4>
-<ul>
-<li><em>data.table</em>’s <code>j</code> can handle more than just <em>selecting columns</em> - it can handle <em>expressions</em>, i.e., <em>compute on columns</em>. This shouldn’t be surprising, as columns can be referred to as if they are variables; we should therefore be able to <em>compute</em> by calling functions on those variables. And that’s precisely what happens here.</li>
-</ul>
-</div>
-</div>
-<div id="f-subset-in-i-and-do-in-j" class="section level3">
-<h3>f) Subset in <code>i</code> <em>and</em> do in <code>j</code></h3>
-<div id="calculate-the-average-arrival-and-departure-delay-for-all-flights-with-jfk-as-the-origin-airport-in-the-month-of-june." class="section level4">
-<h4>– Calculate the average arrival and departure delay for all flights with “JFK” as the origin airport in the month of June.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L, 
-               .(<span class="dt">m_arr=</span><span class="kw">mean</span>(arr_delay), <span class="dt">m_dep=</span><span class="kw">mean</span>(dep_delay))]
-ans
-<span class="co">#       m_arr    m_dep</span>
-<span class="co"># 1: 5.839349 9.807884</span></code></pre></div>
-</div>
-<div id="section-9" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>We first subset in <code>i</code> to find matching <em>row indices</em> where <code>origin</code> airport equals <em>“JFK”</em>, and <code>month</code> equals <em>6</em>. At this point, we <em>do not</em> subset the entire <em>data.table</em> corresponding to those rows.</p></li>
-<li><p>Now, we look at <code>j</code> and find that it uses only <em>two columns</em>, and all we have to do is compute their <code>mean()</code>. Therefore we subset just those columns for the matching rows, and compute their <code>mean()</code>.</p></li>
-</ul>
-<p>Because the three main components of the query (<code>i</code>, <code>j</code> and <code>by</code>) are <em>together</em> inside <code>[...]</code>, <em>data.table</em> can see all three and optimise the query as a whole <em>before evaluation</em>, rather than each part separately. We are therefore able to avoid materialising the entire subset, for both speed and memory efficiency.</p>
-</div>
-<div id="how-many-trips-have-been-made-in-2014-from-jfk-airport-in-the-month-of-june" class="section level4">
-<h4>– How many trips have been made in 2014 from “JFK” airport in the month of June?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L, <span class="kw">length</span>(dest)]
-ans
-<span class="co"># [1] 8422</span></code></pre></div>
-<p>The function <code>length()</code> requires an input argument. We just needed to compute the number of rows in the subset, so any other column could have served as the input argument to <code>length()</code>.</p>
-<p>This type of operation occurs so frequently, especially while grouping as we will see in the next section, that <em>data.table</em> provides a <em>special symbol</em> <code>.N</code> for it.</p>
-</div>
-<div id="special-N" class="section level4 bs-callout bs-callout-info">
-<h4>Special symbol <code>.N</code>:</h4>
-<p><code>.N</code> is a special in-built variable that holds the number of observations in the current group. It is particularly useful when combined with <code>by</code> as we’ll see in the next section. In the absence of group by operations, it simply returns the number of rows in the subset.</p>
-</div>
-</div>
-</div>
-<div id="section-10" class="section level1">
-<h1></h1>
-<p>So we can now accomplish the same task by using <code>.N</code> as follows:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L, .N]
-ans
-<span class="co"># [1] 8422</span></code></pre></div>
-<div id="section-11" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Once again, we subset in <code>i</code> to get the <em>row indices</em> where <code>origin</code> airport equals <em>“JFK”</em>, and <code>month</code> equals <em>6</em>.</p></li>
-<li><p>We see that <code>j</code> uses only <code>.N</code> and no other columns. Therefore the entire subset is not materialised. We simply return the number of rows in the subset (which is just the length of row indices).</p></li>
-<li><p>Note that we did not wrap <code>.N</code> with <code>list()</code> or <code>.()</code>. Therefore a vector is returned.</p></li>
-</ul>
-<p>We could have accomplished the same operation by doing <code>nrow(flights[origin == "JFK" & month == 6L])</code>. However, that would first have to subset the entire <em>data.table</em> for the <em>row indices</em> in <code>i</code> <em>and then</em> count the rows using <code>nrow()</code>, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail in the <em>data.table design</em> vignette.</p>
-</div>
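To make the difference concrete, here is a minimal sketch on a hypothetical toy *data.table* (not the `flights` data): both expressions count the same rows, but the `.N` form avoids materialising the subset first.

```r
library(data.table)

# hypothetical toy table standing in for flights
DT <- data.table(origin = c("JFK", "JFK", "LGA"),
                 month  = c(6L, 7L, 6L))

# .N counts the matching rows without materialising any columns
DT[origin == "JFK" & month == 6L, .N]

# nrow() must first build the entire subset, then count its rows
nrow(DT[origin == "JFK" & month == 6L])
```

Both return the same count; on large data the `.N` form is the cheaper of the two.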
-<div id="g-great-but-how-can-i-refer-to-columns-by-names-in-j-like-in-a-data.frame" class="section level3">
-<h3>g) Great! But how can I refer to columns by names in <code>j</code> (like in a <em>data.frame</em>)?</h3>
-<p>You can refer to column names the <em>data.frame</em> way using <code>with = FALSE</code>.</p>
-<div id="select-both-arr_delay-and-dep_delay-columns-the-data.frame-way." class="section level4">
-<h4>– Select both <code>arr_delay</code> and <code>dep_delay</code> columns the <em>data.frame</em> way.</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>), with=<span class="ot">FALSE</span>]
-<span class="kw">head</span>(ans)
-<span class="co">#    arr_delay dep_delay</span>
-<span class="co"># 1:        13        14</span>
-<span class="co"># 2:        13        -3</span>
-<span class="co"># 3:         9         2</span>
-<span class="co"># 4:       -26        -8</span>
-<span class="co"># 5:         1         2</span>
-<span class="co"># 6:         0         4</span></code></pre></div>
-<p>The argument is named <code>with</code> after the R function <code>with()</code> because of similar functionality. Suppose you have a <em>data.frame</em> <code>DF</code> and you’d like to subset all rows where <code>x > 1</code>.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">2</span>,<span class="dv">3</span>,<span class="dv">3</span>,<span class="dv">3</span>), <span class="dt">y =</span> <span class="dv">1</span>:<span class="dv">8</span>)
-
-## (1) normal way
-DF[DF$x ><span class="st"> </span><span class="dv">1</span>, ] <span class="co"># data.frame needs that ',' as well</span>
-<span class="co">#   x y</span>
-<span class="co"># 4 2 4</span>
-<span class="co"># 5 2 5</span>
-<span class="co"># 6 3 6</span>
-<span class="co"># 7 3 7</span>
-<span class="co"># 8 3 8</span>
-
-## (2) using with
-DF[<span class="kw">with</span>(DF, x ><span class="st"> </span><span class="dv">1</span>), ]
-<span class="co">#   x y</span>
-<span class="co"># 4 2 4</span>
-<span class="co"># 5 2 5</span>
-<span class="co"># 6 3 6</span>
-<span class="co"># 7 3 7</span>
-<span class="co"># 8 3 8</span></code></pre></div>
-</div>
-<div id="with_false" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Using <code>with()</code> in (2) allows using <code>DF</code>’s column <code>x</code> as if it were a variable.</p>
-<p>Hence the argument name <code>with</code> in <em>data.table</em>. Setting <code>with=FALSE</code> disables the ability to refer to columns as if they are variables, thereby restoring the “<em>data.frame</em> mode”.</p></li>
-<li><p>We can also <em>deselect</em> columns using <code>-</code> or <code>!</code>. For example:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
-
-<span class="co"># returns all columns except arr_delay and dep_delay</span>
-ans <-<span class="st"> </span>flights[, !<span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>), with=<span class="ot">FALSE</span>]
-<span class="co"># or</span>
-ans <-<span class="st"> </span>flights[, -<span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>), with=<span class="ot">FALSE</span>]</code></pre></div></li>
-<li><p>From <code>v1.9.5+</code>, we can also select by specifying start and end column names, e.g., <code>year:day</code> to select the first three columns.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
-
-<span class="co"># returns year,month and day</span>
-ans <-<span class="st"> </span>flights[, year:day, with=<span class="ot">FALSE</span>]
-<span class="co"># returns day, month and year</span>
-ans <-<span class="st"> </span>flights[, day:year, with=<span class="ot">FALSE</span>]
-<span class="co"># returns all columns except year, month and day</span>
-ans <-<span class="st"> </span>flights[, -(year:day), with=<span class="ot">FALSE</span>]
-ans <-<span class="st"> </span>flights[, !(year:day), with=<span class="ot">FALSE</span>]</code></pre></div>
-<p>This is particularly handy while working interactively.</p></li>
-</ul>
-</div>
-</div>
-</div>
-<div id="section-12" class="section level1">
-<h1></h1>
-<p><code>with = TRUE</code> is the default in <em>data.table</em> because we can do much more by allowing <code>j</code> to handle expressions - especially when combined with <code>by</code>, as we’ll see in a moment.</p>
-<div id="aggregations" class="section level2">
-<h2>2. Aggregations</h2>
-<p>We’ve already seen <code>i</code> and <code>j</code> from <em>data.table</em>’s general form in the previous section. In this section, we’ll see how they can be combined with <code>by</code> to perform operations <em>by group</em>. Let’s look at some examples.</p>
-<div id="a-grouping-using-by" class="section level3">
-<h3>a) Grouping using <code>by</code></h3>
-<div id="how-can-we-get-the-number-of-trips-corresponding-to-each-origin-airport" class="section level4">
-<h4>– How can we get the number of trips corresponding to each origin airport?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .(.N), by=.(origin)]
-ans
-<span class="co">#    origin     N</span>
-<span class="co"># 1:    JFK 81483</span>
-<span class="co"># 2:    LGA 84433</span>
-<span class="co"># 3:    EWR 87400</span>
-
-## or equivalently using a character vector in 'by'
-<span class="co"># ans <- flights[, .(.N), by="origin"]</span></code></pre></div>
-</div>
-<div id="section-13" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>We know <code>.N</code> <a href="#special-N">is a special variable</a> that holds the number of rows in the current group. Grouping by <code>origin</code> obtains the number of rows, <code>.N</code>, for each group.</p></li>
-<li><p>By doing <code>head(flights)</code> you can see that the origin airports occur in the order <em>“JFK”</em>, <em>“LGA”</em> and <em>“EWR”</em>. The original order of grouping variables is preserved in the result.</p></li>
-<li><p>Since we did not provide a name for the column returned in <code>j</code>, it was named <code>N</code> automatically by recognising the special symbol <code>.N</code>.</p></li>
-<li><p><code>by</code> also accepts a character vector of column names. This is particularly useful for programming, e.g., when designing a function that takes the columns to group by as a function argument.</p></li>
-<li><p>When there’s only one column or expression to refer to in <code>j</code> and <code>by</code>, we can drop the <code>.()</code> notation. This is purely for convenience. We could instead do:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .N, by=origin]
-ans
-<span class="co">#    origin     N</span>
-<span class="co"># 1:    JFK 81483</span>
-<span class="co"># 2:    LGA 84433</span>
-<span class="co"># 3:    EWR 87400</span></code></pre></div>
-<p>We’ll use this convenient form wherever applicable hereafter.</p></li>
-</ul>
-</div>
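As a sketch of that programming use, here is a hypothetical helper (the function name `count_by` and its toy input are ours, not part of the package) that simply forwards a character vector to `by`:

```r
library(data.table)

# hypothetical helper: count rows per group, with the grouping
# columns supplied as a character vector
count_by <- function(DT, cols) {
  DT[, .N, by = cols]
}

DT <- data.table(origin = c("JFK", "LGA", "JFK"),
                 dest   = c("LAX", "LAX", "MIA"))
count_by(DT, "origin")
count_by(DT, c("origin", "dest"))
```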
-</div>
-</div>
-</div>
-<div id="section-14" class="section level1">
-<h1></h1>
-<div id="origin-.N" class="section level4">
-<h4>– How can we calculate the number of trips for each origin airport for carrier code <em>“AA”</em>?</h4>
-<p>The unique carrier code <em>“AA”</em> corresponds to <em>American Airlines Inc.</em></p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by=origin]
-ans
-<span class="co">#    origin     N</span>
-<span class="co"># 1:    JFK 11923</span>
-<span class="co"># 2:    LGA 11730</span>
-<span class="co"># 3:    EWR  2649</span></code></pre></div>
-</div>
-<div id="section-15" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>We first obtain the row indices for the expression <code>carrier == "AA"</code> from <code>i</code>.</p></li>
-<li><p>Using those <em>row indices</em>, we obtain the number of rows while grouped by <code>origin</code>. Once again, no columns are actually materialised here, because the <code>j-expression</code> does not require any columns to be subsetted; this makes the operation fast and memory efficient.</p></li>
-</ul>
-</div>
-<div id="origin-dest-.N" class="section level4">
-<h4>– How can we get the total number of trips for each <code>origin, dest</code> pair for carrier code <em>“AA”</em>?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by=.(origin,dest)]
-<span class="kw">head</span>(ans)
-<span class="co">#    origin dest    N</span>
-<span class="co"># 1:    JFK  LAX 3387</span>
-<span class="co"># 2:    LGA  PBI  245</span>
-<span class="co"># 3:    EWR  LAX   62</span>
-<span class="co"># 4:    JFK  MIA 1876</span>
-<span class="co"># 5:    JFK  SEA  298</span>
-<span class="co"># 6:    EWR  MIA  848</span>
-
-## or equivalently using a character vector in 'by'
-<span class="co"># ans <- flights[carrier == "AA", .N, by=c("origin", "dest")]</span></code></pre></div>
-</div>
-<div id="section-16" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><code>by</code> accepts multiple columns. We simply provide all the columns to group by.</li>
-</ul>
-</div>
-<div id="origin-dest-month" class="section level4">
-<h4>– How can we get the average arrival and departure delay for each <code>origin, dest</code> pair for each month for carrier code <em>“AA”</em>?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, 
-        .(<span class="kw">mean</span>(arr_delay), <span class="kw">mean</span>(dep_delay)), 
-        by=.(origin, dest, month)]
-ans
-<span class="co">#      origin dest month         V1         V2</span>
-<span class="co">#   1:    JFK  LAX     1   6.590361 14.2289157</span>
-<span class="co">#   2:    LGA  PBI     1  -7.758621  0.3103448</span>
-<span class="co">#   3:    EWR  LAX     1   1.366667  7.5000000</span>
-<span class="co">#   4:    JFK  MIA     1  15.720670 18.7430168</span>
-<span class="co">#   5:    JFK  SEA     1  14.357143 30.7500000</span>
-<span class="co">#  ---                                        </span>
-<span class="co"># 196:    LGA  MIA    10  -6.251799 -1.4208633</span>
-<span class="co"># 197:    JFK  MIA    10  -1.880184  6.6774194</span>
-<span class="co"># 198:    EWR  PHX    10  -3.032258 -4.2903226</span>
-<span class="co"># 199:    JFK  MCO    10 -10.048387 -1.6129032</span>
-<span class="co"># 200:    JFK  DCA    10  16.483871 15.5161290</span></code></pre></div>
-</div>
-<div id="section-17" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Since we did not provide column names for the expressions in <code>j</code>, they were automatically generated as <code>V1</code> and <code>V2</code>.</p></li>
-<li><p>Once again, note that the input order of grouping columns is preserved in the result.</p></li>
-</ul>
-</div>
-</div>
-<div id="section-18" class="section level1">
-<h1></h1>
-<p>Now what if we would like to order the result by those grouping columns <code>origin</code>, <code>dest</code> and <code>month</code>?</p>
-<div id="b-keyby" class="section level3">
-<h3>b) keyby</h3>
-<p>That <em>data.table</em> retains the original order of groups is intentional and by design. There are cases when preserving the original order is essential. But at times we would like to sort automatically by the variables we grouped by.</p>
-<div id="so-how-can-we-directly-order-by-all-the-grouping-variables" class="section level4">
-<h4>– So how can we directly order by all the grouping variables?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, 
-        .(<span class="kw">mean</span>(arr_delay), <span class="kw">mean</span>(dep_delay)), 
-        keyby=.(origin, dest, month)]
-ans
-<span class="co">#      origin dest month         V1         V2</span>
-<span class="co">#   1:    EWR  DFW     1   6.427673 10.0125786</span>
-<span class="co">#   2:    EWR  DFW     2  10.536765 11.3455882</span>
-<span class="co">#   3:    EWR  DFW     3  12.865031  8.0797546</span>
-<span class="co">#   4:    EWR  DFW     4  17.792683 12.9207317</span>
-<span class="co">#   5:    EWR  DFW     5  18.487805 18.6829268</span>
-<span class="co">#  ---                                        </span>
-<span class="co"># 196:    LGA  PBI     1  -7.758621  0.3103448</span>
-<span class="co"># 197:    LGA  PBI     2  -7.865385  2.4038462</span>
-<span class="co"># 198:    LGA  PBI     3  -5.754098  3.0327869</span>
-<span class="co"># 199:    LGA  PBI     4 -13.966667 -4.7333333</span>
-<span class="co"># 200:    LGA  PBI     5 -10.357143 -6.8571429</span></code></pre></div>
-</div>
-<div id="section-19" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li>All we did was to change <code>by</code> to <code>keyby</code>. This automatically orders the result by the grouping variables in increasing order. Note that <code>keyby</code> is applied after performing the operation, i.e., on the computed result.</li>
-</ul>
-<p><strong>Keys:</strong> Actually <code>keyby</code> does a little more than <em>just ordering</em>. It also <em>sets a key</em> after ordering by setting an <em>attribute</em> called <code>sorted</code>. But we’ll learn more about <code>keys</code> in the next vignette.</p>
-<p>For now, all you have to know is that you can use <code>keyby</code> to automatically order the result by the columns specified in <code>by</code>.</p>
-</div>
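A minimal sketch of the difference on a hypothetical toy table (not `flights`): after `keyby` the result carries a key, visible via `key()`, whereas after `by` it does not.

```r
library(data.table)

DT <- data.table(ID = c("b", "a", "b"), x = 1:3)

ans_by <- DT[, sum(x), by = ID]
key(ans_by)     # NULL: groups kept in order of first appearance

ans_keyby <- DT[, sum(x), keyby = ID]
key(ans_keyby)  # "ID": result is sorted by ID and keyed on it
```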
-</div>
-<div id="c-chaining" class="section level3">
-<h3>c) Chaining</h3>
-<p>Let’s reconsider the task of <a href="#origin-dest-.N">getting the total number of trips for each <code>origin, dest</code> pair for carrier <em>“AA”</em></a>.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by =<span class="st"> </span>.(origin, dest)]</code></pre></div>
-<div id="how-can-we-order-ans-using-the-columns-origin-in-ascending-order-and-dest-in-descending-order" class="section level4">
-<h4>– How can we order <code>ans</code> using the columns <code>origin</code> in ascending order, and <code>dest</code> in descending order?</h4>
-<p>We can store the intermediate result in a variable, and then use <code>order(origin, -dest)</code> on that variable. It seems fairly straightforward.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>ans[<span class="kw">order</span>(origin, -dest)]
-<span class="kw">head</span>(ans)
-<span class="co">#    origin dest    N</span>
-<span class="co"># 1:    EWR  PHX  121</span>
-<span class="co"># 2:    EWR  MIA  848</span>
-<span class="co"># 3:    EWR  LAX   62</span>
-<span class="co"># 4:    EWR  DFW 1618</span>
-<span class="co"># 5:    JFK  STT  229</span>
-<span class="co"># 6:    JFK  SJU  690</span></code></pre></div>
-</div>
-<div id="section-20" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Recall that we can use <code>-</code> on a <em>character</em> column in <code>order()</code> within the frame of a <em>data.table</em>. This is possible due to <em>data.table</em>’s internal query optimisation.</p></li>
-<li><p>Also recall that <code>order(...)</code> within the frame of a <em>data.table</em> is <em>automatically optimised</em> to use <em>data.table</em>’s internal fast radix order <code>forder()</code> for speed. So you can keep using the already <em>familiar</em> base R functions without compromising the speed or memory efficiency that <em>data.table</em> offers. We will cover this in more detail in the <em>data.table internals</em> vignette.</p></li>
-</ul>
-</div>
-</div>
-</div>
-<div id="section-21" class="section level1">
-<h1></h1>
-<p>But this requires assigning the intermediate result and then overwriting it. We can do one better and avoid the intermediate assignment to a variable altogether by <em>chaining</em> expressions.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by=.(origin, dest)][<span class="kw">order</span>(origin, -dest)]
-<span class="kw">head</span>(ans, <span class="dv">10</span>)
-<span class="co">#     origin dest    N</span>
-<span class="co">#  1:    EWR  PHX  121</span>
-<span class="co">#  2:    EWR  MIA  848</span>
-<span class="co">#  3:    EWR  LAX   62</span>
-<span class="co">#  4:    EWR  DFW 1618</span>
-<span class="co">#  5:    JFK  STT  229</span>
-<span class="co">#  6:    JFK  SJU  690</span>
-<span class="co">#  7:    JFK  SFO 1312</span>
-<span class="co">#  8:    JFK  SEA  298</span>
-<span class="co">#  9:    JFK  SAN  299</span>
-<span class="co"># 10:    JFK  ORD  432</span></code></pre></div>
-<div id="section-22" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>We can tack expressions one after another, <em>forming a chain</em> of operations, i.e., <code>DT[ ... ][ ... ][ ... ]</code>.</p></li>
-<li><p>Or you can also chain them vertically:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[ ... 
- ][ ... 
- ][ ... 
- ]</code></pre></div></li>
-</ul>
-</div>
-<div id="d-expressions-in-by" class="section level3">
-<h3>d) Expressions in <code>by</code></h3>
-<div id="can-by-accept-expressions-as-well-or-just-take-columns" class="section level4">
-<h4>– Can <code>by</code> accept <em>expressions</em> as well or just take columns?</h4>
-<p>Yes, it does. As an example, suppose we would like to find out how many flights started late but arrived early (or on time), started and arrived late, and so on.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .N, .(dep_delay><span class="dv">0</span>, arr_delay><span class="dv">0</span>)]
-ans
-<span class="co">#    dep_delay arr_delay      N</span>
-<span class="co"># 1:      TRUE      TRUE  72836</span>
-<span class="co"># 2:     FALSE      TRUE  34583</span>
-<span class="co"># 3:     FALSE     FALSE 119304</span>
-<span class="co"># 4:      TRUE     FALSE  26593</span></code></pre></div>
-</div>
-<div id="section-23" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>The last row corresponds to <code>dep_delay > 0 = TRUE</code> and <code>arr_delay > 0 = FALSE</code>. We can see that 26593 flights started late but arrived early (or on time).</p></li>
-<li><p>Note that we did not provide any names for the <code>by-expression</code>; names were automatically assigned in the result.</p></li>
-<li><p>You can provide other columns along with expressions, for example: <code>DT[, .N, by=.(a, b>0)]</code>.</p></li>
-</ul>
-</div>
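As a sketch of mixing a plain column with an expression in `by`, using a hypothetical toy table:

```r
library(data.table)

DT <- data.table(a = c("x", "x", "y", "y"),
                 b = c(-1L, 2L, 3L, -4L))

# group by the column 'a' together with the expression 'b > 0';
# each value of 'a' splits into its positive and non-positive rows
DT[, .N, by = .(a, b > 0)]
```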
-</div>
-<div id="e-multiple-columns-in-j---.sd" class="section level3">
-<h3>e) Multiple columns in <code>j</code> - <code>.SD</code></h3>
-<div id="do-we-have-to-compute-mean-for-each-column-individually" class="section level4">
-<h4>– Do we have to compute <code>mean()</code> for each column individually?</h4>
-<p>It is of course not practical to have to type <code>mean(myCol)</code> for every column one by one. What if you had 100 columns to compute the <code>mean()</code> of?</p>
-<p>How can we do this efficiently? To get there, recall <a href="#tip-1">this tip</a> - <em>“As long as the j-expression returns a list, each element of the list will be converted to a column in the resulting data.table”</em>. If we can refer to the <em>data subset for each group</em> as a variable <em>while grouping</em>, then we can loop through all the columns of that variable using the already familiar base function <code>lapply()</code>. We don’t have to learn any new functions.</p>
-</div>
-<div id="special-SD" class="section level4 bs-callout bs-callout-info">
-<h4>Special symbol <code>.SD</code>:</h4>
-<p><em>data.table</em> provides a <em>special</em> symbol called <code>.SD</code>. It stands for <strong>S</strong>ubset of <strong>D</strong>ata. It is itself a <em>data.table</em> that holds the data for <em>the current group</em> defined using <code>by</code>.</p>
-<p>Recall that a <em>data.table</em> is internally a list as well, with all its columns of equal length.</p>
-</div>
-</div>
-</div>
-<div id="section-24" class="section level1">
-<h1></h1>
-<p>Let’s use the <a href="#what-is-datatable-1a"><em>data.table</em> <code>DT</code> from before</a> to get a glimpse of what <code>.SD</code> looks like.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT
-<span class="co">#    ID a  b  c</span>
-<span class="co"># 1:  b 1  7 13</span>
-<span class="co"># 2:  b 2  8 14</span>
-<span class="co"># 3:  b 3  9 15</span>
-<span class="co"># 4:  a 4 10 16</span>
-<span class="co"># 5:  a 5 11 17</span>
-<span class="co"># 6:  c 6 12 18</span>
-
-DT[, <span class="kw">print</span>(.SD), by=ID]
-<span class="co">#    a b  c</span>
-<span class="co"># 1: 1 7 13</span>
-<span class="co"># 2: 2 8 14</span>
-<span class="co"># 3: 3 9 15</span>
-<span class="co">#    a  b  c</span>
-<span class="co"># 1: 4 10 16</span>
-<span class="co"># 2: 5 11 17</span>
-<span class="co">#    a  b  c</span>
-<span class="co"># 1: 6 12 18</span>
-<span class="co"># Empty data.table (0 rows) of 1 col: ID</span></code></pre></div>
-<div id="section-25" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p><code>.SD</code> contains all the columns <em>except the grouping columns</em> by default.</p></li>
-<li><p>It is also generated preserving the original order - data corresponding to <code>ID = "b"</code>, then <code>ID = "a"</code>, and then <code>ID = "c"</code>.</p></li>
-</ul>
-</div>
-</div>
-<div id="section-26" class="section level1">
-<h1></h1>
-<p>To compute on (multiple) columns, we can then simply use the base R function <code>lapply()</code>.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, <span class="kw">lapply</span>(.SD, mean), by=ID]
-<span class="co">#    ID   a    b    c</span>
-<span class="co"># 1:  b 2.0  8.0 14.0</span>
-<span class="co"># 2:  a 4.5 10.5 16.5</span>
-<span class="co"># 3:  c 6.0 12.0 18.0</span></code></pre></div>
-<div id="section-27" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p><code>.SD</code> holds the rows corresponding to columns <em>a</em>, <em>b</em> and <em>c</em> for that group. We compute the <code>mean()</code> on each of these columns using the already familiar base function <code>lapply()</code>.</p></li>
-<li><p>For each group, <code>lapply()</code> returns a list of three elements containing the mean values, which become the columns of the resulting <code>data.table</code>.</p></li>
-<li><p>Since <code>lapply()</code> returns a <em>list</em>, there is no need to wrap it with an additional <code>.()</code> (if necessary, refer to <a href="#tip-1">this tip</a>).</p></li>
-</ul>
-</div>
-</div>
-<div id="section-28" class="section level1">
-<h1></h1>
-<p>We are almost there. There is one little thing left to address. In our <code>flights</code> <em>data.table</em>, we only wanted to calculate the <code>mean()</code> of two columns <code>arr_delay</code> and <code>dep_delay</code>. But <code>.SD</code> would contain all the columns other than the grouping variables by default.</p>
-<div id="how-can-we-specify-just-the-columns-we-would-like-to-compute-the-mean-on" class="section level4">
-<h4>– How can we specify just the columns we would like to compute the <code>mean()</code> on?</h4>
-</div>
-<div id="sdcols" class="section level4 bs-callout bs-callout-info">
-<h4>.SDcols</h4>
-<p>Using the argument <code>.SDcols</code>. It accepts either column names or column indices. For example, <code>.SDcols = c("arr_delay", "dep_delay")</code> ensures that <code>.SD</code> contains only these two columns for each group.</p>
-<p>Similar to the <a href="#with_false">with = FALSE section</a>, you can also provide the columns to remove instead of the columns to keep using the <code>-</code> or <code>!</code> sign, as well as select consecutive columns as <code>colA:colB</code> and deselect consecutive columns as <code>!(colA:colB)</code> or <code>-(colA:colB)</code>.</p>
-</div>
-</div>
-<div id="section-29" class="section level1">
-<h1></h1>
-<p>Now let us try to use <code>.SD</code> along with <code>.SDcols</code> to get the <code>mean()</code> of <code>arr_delay</code> and <code>dep_delay</code> columns grouped by <code>origin</code>, <code>dest</code> and <code>month</code>.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[carrier ==<span class="st"> "AA"</span>,                     ## Only on trips with carrier "AA"
-        <span class="kw">lapply</span>(.SD, mean),                   ## compute the mean
-        by=.(origin, dest, month),           ## for every 'origin,dest,month'
-        .SDcols=<span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>)] ## for just those specified in .SDcols
-<span class="co">#      origin dest month  arr_delay  dep_delay</span>
-<span class="co">#   1:    JFK  LAX     1   6.590361 14.2289157</span>
-<span class="co">#   2:    LGA  PBI     1  -7.758621  0.3103448</span>
-<span class="co">#   3:    EWR  LAX     1   1.366667  7.5000000</span>
-<span class="co">#   4:    JFK  MIA     1  15.720670 18.7430168</span>
-<span class="co">#   5:    JFK  SEA     1  14.357143 30.7500000</span>
-<span class="co">#  ---                                        </span>
-<span class="co"># 196:    LGA  MIA    10  -6.251799 -1.4208633</span>
-<span class="co"># 197:    JFK  MIA    10  -1.880184  6.6774194</span>
-<span class="co"># 198:    EWR  PHX    10  -3.032258 -4.2903226</span>
-<span class="co"># 199:    JFK  MCO    10 -10.048387 -1.6129032</span>
-<span class="co"># 200:    JFK  DCA    10  16.483871 15.5161290</span></code></pre></div>
-<div id="f-subset-.sd-for-each-group" class="section level3">
-<h3>f) Subset <code>.SD</code> for each group:</h3>
-<div id="how-can-we-return-the-first-two-rows-for-each-month" class="section level4">
-<h4>– How can we return the first two rows for each <code>month</code>?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">head</span>(.SD, <span class="dv">2</span>), by=month]
-<span class="kw">head</span>(ans)
-<span class="co">#    month year day dep_delay arr_delay carrier origin dest air_time distance hour</span>
-<span class="co"># 1:     1 2014   1        14        13      AA    JFK  LAX      359     2475    9</span>
-<span class="co"># 2:     1 2014   1        -3        13      AA    JFK  LAX      363     2475   11</span>
-<span class="co"># 3:     2 2014   1        -1         1      AA    JFK  LAX      358     2475    8</span>
-<span class="co"># 4:     2 2014   1        -5         3      AA    JFK  LAX      358     2475   11</span>
-<span class="co"># 5:     3 2014   1       -11        36      AA    JFK  LAX      375     2475    8</span>
-<span class="co"># 6:     3 2014   1        -3        14      AA    JFK  LAX      368     2475   11</span></code></pre></div>
-</div>
-<div id="section-30" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p><code>.SD</code> is a <em>data.table</em> that holds all the rows for <em>that group</em>. We simply subset the first two rows as we have seen <a href="#subset-rows-integer">here</a> already.</p></li>
-<li><p>For each group, <code>head(.SD, 2)</code> returns the first two rows as a <em>data.table</em> which is also a list. So we do not have to wrap it with <code>.()</code>.</p></li>
-</ul>
-</div>
-</div>
-<div id="g-why-keep-j-so-flexible" class="section level3">
-<h3>g) Why keep <code>j</code> so flexible?</h3>
-<p>So that we have a consistent syntax and keep using already existing (and familiar) base functions instead of learning new functions. To illustrate, let us use the <em>data.table</em> <code>DT</code> we created at the very beginning, in the <a href="#what-is-datatable-1a">What is a data.table?</a> section.</p>
-<div id="how-can-we-concatenate-columns-a-and-b-for-each-group-in-id" class="section level4">
-<h4>– How can we concatenate columns <code>a</code> and <code>b</code> for each group in <code>ID</code>?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, .(<span class="dt">val =</span> <span class="kw">c</span>(a,b)), by=ID]
-<span class="co">#     ID val</span>
-<span class="co">#  1:  b   1</span>
-<span class="co">#  2:  b   2</span>
-<span class="co">#  3:  b   3</span>
-<span class="co">#  4:  b   7</span>
-<span class="co">#  5:  b   8</span>
-<span class="co">#  6:  b   9</span>
-<span class="co">#  7:  a   4</span>
-<span class="co">#  8:  a   5</span>
-<span class="co">#  9:  a  10</span>
-<span class="co"># 10:  a  11</span>
-<span class="co"># 11:  c   6</span>
-<span class="co"># 12:  c  12</span></code></pre></div>
-</div>
-<div id="section-31" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li>That’s it. There is no special syntax required. All we need to know is the base function <code>c()</code> which concatenates vectors and <a href="#tip-1">the tip from before</a>.</li>
-</ul>
-</div>
-<div id="what-if-we-would-like-to-have-all-the-values-of-column-a-and-b-concatenated-but-returned-as-a-list-column" class="section level4">
-<h4>– What if we would like to have all the values of column <code>a</code> and <code>b</code> concatenated, but returned as a list column?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, .(<span class="dt">val =</span> <span class="kw">list</span>(<span class="kw">c</span>(a,b))), by=ID]
-<span class="co">#    ID         val</span>
-<span class="co"># 1:  b 1,2,3,7,8,9</span>
-<span class="co"># 2:  a  4, 5,10,11</span>
-<span class="co"># 3:  c        6,12</span></code></pre></div>
-</div>
-<div id="section-32" class="section level4 bs-callout bs-callout-info">
-<h4></h4>
-<ul>
-<li><p>Here, we first concatenate the values with <code>c(a,b)</code> for each group, and wrap that with <code>list()</code>. So for each group, we return a list of all concatenated values.</p></li>
-<li><p>Note those commas are for display only. A list column can contain any object in each cell, and in this example, each cell is itself a vector and some cells contain longer vectors than others.</p></li>
-</ul>
-</div>
-</div>
-</div>
-<div id="section-33" class="section level1">
-<h1></h1>
-<p>Once you start internalising usage in <code>j</code>, you will realise how powerful the syntax can be. A very useful way to understand it is by playing around, with the help of <code>print()</code>.</p>
-<p>For example:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## (1) look at the difference between
-DT[, <span class="kw">print</span>(<span class="kw">c</span>(a,b)), by=ID]
-<span class="co"># [1] 1 2 3 7 8 9</span>
-<span class="co"># [1]  4  5 10 11</span>
-<span class="co"># [1]  6 12</span>
-<span class="co"># Empty data.table (0 rows) of 1 col: ID</span>
-
-## (2) and
-DT[, <span class="kw">print</span>(<span class="kw">list</span>(<span class="kw">c</span>(a,b))), by=ID]
-<span class="co"># [[1]]</span>
-<span class="co"># [1] 1 2 3 7 8 9</span>
-<span class="co"># </span>
-<span class="co"># [[1]]</span>
-<span class="co"># [1]  4  5 10 11</span>
-<span class="co"># </span>
-<span class="co"># [[1]]</span>
-<span class="co"># [1]  6 12</span>
-<span class="co"># Empty data.table (0 rows) of 1 col: ID</span></code></pre></div>
-<p>In (1), for each group, a vector is returned (of lengths 6, 4 and 2 here). However, (2) returns a list of length 1 for each group, whose first element holds vectors of lengths 6, 4 and 2. Therefore (1) results in a total length of <code>6+4+2 = 12</code>, whereas (2) returns <code>1+1+1 = 3</code>.</p>
-<div id="summary" class="section level2">
-<h2>Summary</h2>
-<p>The general form of <em>data.table</em> syntax is:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[i, j, by]</code></pre></div>
-<p>We have seen so far that,</p>
-<div id="using-i" class="section level4 bs-callout bs-callout-info">
-<h4>Using <code>i</code>:</h4>
-<ul>
-<li><p>We can subset rows similar to a <em>data.frame</em> - except you don’t have to use <code>DT$</code> repetitively since columns within the frame of a <em>data.table</em> are seen as if they are <em>variables</em>.</p></li>
-<li><p>We can also sort a <em>data.table</em> using <code>order()</code>, which internally uses <em>data.table</em>’s fast order for performance.</p></li>
-</ul>
-<p>We can do much more in <code>i</code> by keying a <em>data.table</em>, which allows blazing fast subsets and joins. We will see this in the <em>“Keys and fast binary search based subsets”</em> and <em>“Joins and rolling joins”</em> vignettes.</p>
-</div>
-<div id="using-j" class="section level4 bs-callout bs-callout-info">
-<h4>Using <code>j</code>:</h4>
-<ol style="list-style-type: decimal">
-<li><p>Select columns the <em>data.table</em> way: <code>DT[, .(colA, colB)]</code>.</p></li>
-<li><p>Select columns the <em>data.frame</em> way: <code>DT[, c("colA", "colB"), with=FALSE]</code>.</p></li>
-<li><p>Compute on columns: <code>DT[, .(sum(colA), mean(colB))]</code>.</p></li>
-<li><p>Provide names if necessary: <code>DT[, .(sA = sum(colA), mB = mean(colB))]</code>.</p></li>
-<li><p>Combine with <code>i</code>: <code>DT[colA > value, sum(colB)]</code>.</p></li>
-</ol>
-</div>
-</div>
-</div>
-<div id="section-34" class="section level1">
-<h1></h1>
-<div id="using-by" class="section level4 bs-callout bs-callout-info">
-<h4>Using <code>by</code>:</h4>
-<ul>
-<li><p>Using <code>by</code>, we can group by columns by specifying a <em>list of columns</em>, a <em>character vector of column names</em>, or even <em>expressions</em>. The flexibility of <code>j</code>, combined with <code>by</code> and <code>i</code>, makes for a very powerful syntax.</p></li>
-<li><p><code>by</code> can handle multiple columns and also <em>expressions</em>.</p></li>
-<li><p>We can use <code>keyby</code> instead of <code>by</code> to automatically sort the result by the grouping columns.</p></li>
-<li><p>We can use <code>.SD</code> and <code>.SDcols</code> in <code>j</code> to operate on multiple columns using already familiar base functions. Here are some examples:</p>
-<ol style="list-style-type: decimal">
-<li><p><code>DT[, lapply(.SD, fun), by=., .SDcols=...]</code> - applies <code>fun</code> to all columns specified in <code>.SDcols</code> while grouping by the columns specified in <code>by</code>.</p></li>
-<li><p><code>DT[, head(.SD, 2), by=.]</code> - return the first two rows for each group.</p></li>
-<li><p><code>DT[col > val, head(.SD, 1), by=.]</code> - combine <code>i</code> along with <code>j</code> and <code>by</code>.</p></li>
-</ol></li>
-</ul>
-</div>
-</div>
-<div id="section-35" class="section level1">
-<h1></h1>
-<div id="and-remember-the-tip" class="section level4 bs-callout bs-callout-warning">
-<h4>And remember the tip:</h4>
-<p>As long as <code>j</code> returns a <em>list</em>, each element of the list will become a column in the resulting <em>data.table</em>.</p>
-</div>
-</div>
-<div id="section-36" class="section level1">
-<h1></h1>
-<p>We will see how to <em>add/update/delete</em> columns <em>by reference</em> and how to combine them with <code>i</code> and <code>by</code> in the next vignette.</p>
-<hr />
-</div>
-
-
-</div>
-
-<script>
-
-// add bootstrap table styles to pandoc tables
-$(document).ready(function () {
-  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
-});
-
-</script>
-
-<!-- dynamically load mathjax for compatibility with self-contained -->
-<script>
-  (function () {
-    var script = document.createElement("script");
-    script.type = "text/javascript";
-    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
-    document.getElementsByTagName("head")[0].appendChild(script);
-  })();
-</script>
-
-</body>
-</html>
diff --git a/inst/doc/datatable-intro.R b/inst/doc/datatable-intro.R
index d7e7f67..56df3c1 100644
--- a/inst/doc/datatable-intro.R
+++ b/inst/doc/datatable-intro.R
@@ -1,206 +1,209 @@
-### R code from vignette source 'datatable-intro.Rnw'
-
-###################################################
-### code chunk number 1: datatable-intro.Rnw:15-19
-###################################################
-if (!exists("data.table",.GlobalEnv)) library(data.table)  
-# In devel won't call library, but R CMD build/check will.
-rm(list=as.character(tables()$NAME),envir=.GlobalEnv)
-# for development when we Sweave this file repeatedly. Otherwise first tables() shows tables from last run
-
-
-###################################################
-### code chunk number 2: datatable-intro.Rnw:36-38
-###################################################
-DF = data.frame(x=c("b","b","b","a","a"),v=rnorm(5))
-DF
-
-
-###################################################
-### code chunk number 3: datatable-intro.Rnw:41-43
-###################################################
-DT = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
-DT
-
-
-###################################################
-### code chunk number 4: datatable-intro.Rnw:47-49
-###################################################
-CARS = data.table(cars)
-head(CARS)
-
-
-###################################################
-### code chunk number 5: datatable-intro.Rnw:53-54
-###################################################
-tables()
-
-
-###################################################
-### code chunk number 6: datatable-intro.Rnw:66-67
-###################################################
-sapply(DT,class)
-
-
-###################################################
-### code chunk number 7: datatable-intro.Rnw:83-85
-###################################################
-tables()
+## ---- echo = FALSE, message = FALSE--------------------------------------
+require(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+
+## ----echo = FALSE---------------------------------------------------------------------------------
+options(width = 100L)
+
+## -------------------------------------------------------------------------------------------------
+flights <- fread("flights14.csv")
+flights
+dim(flights)
+
+## -------------------------------------------------------------------------------------------------
+DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
 DT
-
-
-###################################################
-### code chunk number 8: datatable-intro.Rnw:90-92
-###################################################
-DT[2,]         # select row 2
-DT[x=="b",]    # select rows where column x == "b"
-
-
-###################################################
-### code chunk number 9: datatable-intro.Rnw:98-99
-###################################################
-cat(try(DT["b",],silent=TRUE))
-
-
-###################################################
-### code chunk number 10: datatable-intro.Rnw:103-105
-###################################################
-setkey(DT,x)
+class(DT$ID)
+
+## -------------------------------------------------------------------------------------------------
+getOption("datatable.print.nrows")
+
+## ----eval = FALSE---------------------------------------------------------------------------------
+#  DT[i, j, by]
+#  
+#  ##   R:      i                 j        by
+#  ## SQL:  where   select | update  group by
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[origin == "JFK" & month == 6L]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[1:2]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[order(origin, -dest)]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+odt = data.table(col = sample(1e7))
+(t1 <- system.time(ans1 <- odt[base::order(col)]))  ## uses order from base R
+(t2 <- system.time(ans2 <- odt[order(col)]))        ## uses data.table's forder
+(identical(ans1, ans2))
+
+## ----echo = FALSE---------------------------------------------------------------------------------
+rm(odt); rm(ans1); rm(ans2); rm(t1); rm(t2)
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, arr_delay]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, list(arr_delay)]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, .(arr_delay, dep_delay)]
+head(ans)
+
+## alternatively
+# ans <- flights[, list(arr_delay, dep_delay)]
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, .(delay_arr = arr_delay, delay_dep = dep_delay)]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, sum((arr_delay + dep_delay) < 0)]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[origin == "JFK" & month == 6L,
+               .(m_arr = mean(arr_delay), m_dep = mean(dep_delay))]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[origin == "JFK" & month == 6L, length(dest)]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[origin == "JFK" & month == 6L, .N]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, c("arr_delay", "dep_delay"), with = FALSE]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+DF = data.frame(x = c(1,1,1,2,2,3,3,3), y = 1:8)
+
+## (1) normal way
+DF[DF$x > 1, ] # data.frame needs that ',' as well
+
+## (2) using with
+DF[with(DF, x > 1), ]
+
+## ----eval = FALSE---------------------------------------------------------------------------------
+#  ## not run
+#  
+#  # returns all columns except arr_delay and dep_delay
+#  ans <- flights[, !c("arr_delay", "dep_delay"), with = FALSE]
+#  # or
+#  ans <- flights[, -c("arr_delay", "dep_delay"), with = FALSE]
+
+## ----eval = FALSE---------------------------------------------------------------------------------
+#  ## not run
+#  
+#  # returns year,month and day
+#  ans <- flights[, year:day, with = FALSE]
+#  # returns day, month and year
+#  ans <- flights[, day:year, with = FALSE]
+#  # returns all columns except year, month and day
+#  ans <- flights[, -(year:day), with = FALSE]
+#  ans <- flights[, !(year:day), with = FALSE]
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, .(.N), by = .(origin)]
+ans
+
+## or equivalently using a character vector in 'by'
+# ans <- flights[, .(.N), by = "origin"]
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, .N, by = origin]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[carrier == "AA", .N, by = origin]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[carrier == "AA", .N, by = .(origin,dest)]
+head(ans)
+
+## or equivalently using a character vector in 'by'
+# ans <- flights[carrier == "AA", .N, by = c("origin", "dest")]
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[carrier == "AA",
+        .(mean(arr_delay), mean(dep_delay)),
+        by = .(origin, dest, month)]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[carrier == "AA",
+        .(mean(arr_delay), mean(dep_delay)),
+        keyby = .(origin, dest, month)]
+ans
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
+
+## -------------------------------------------------------------------------------------------------
+ans <- ans[order(origin, -dest)]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[carrier == "AA", .N, by = .(origin, dest)][order(origin, -dest)]
+head(ans, 10)
+
+## ----eval = FALSE---------------------------------------------------------------------------------
+#  DT[ ...
+#   ][ ...
+#   ][ ...
+#   ]
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, .N, .(dep_delay>0, arr_delay>0)]
+ans
+
+## -------------------------------------------------------------------------------------------------
 DT
 
+DT[, print(.SD), by = ID]
 
-###################################################
-### code chunk number 11: datatable-intro.Rnw:113-114
-###################################################
-tables()
-
-
-###################################################
-### code chunk number 12: datatable-intro.Rnw:119-120
-###################################################
-DT["b",]
-
-
-###################################################
-### code chunk number 13: datatable-intro.Rnw:125-127
-###################################################
-DT["b",mult="first"]
-DT["b",mult="last"]
-
-
-###################################################
-### code chunk number 14: datatable-intro.Rnw:132-133
-###################################################
-DT["b"]
-
-
-###################################################
-### code chunk number 15: datatable-intro.Rnw:138-148
-###################################################
-grpsize = ceiling(1e7/26^2)   # 10 million rows, 676 groups
-tt=system.time( DF <- data.frame(
-  x=rep(LETTERS,each=26*grpsize),
-  y=rep(letters,each=grpsize),
-  v=runif(grpsize*26^2),
-  stringsAsFactors=FALSE)
-)
-head(DF,3)
-tail(DF,3)
-dim(DF)
-
-
-###################################################
-### code chunk number 16: datatable-intro.Rnw:157-160
-###################################################
-tt=system.time(ans1 <- DF[DF$x=="R" & DF$y=="h",])   # 'vector scan'
-head(ans1,3)
-dim(ans1)
-
-
-###################################################
-### code chunk number 17: datatable-intro.Rnw:165-167
-###################################################
-DT = as.data.table(DF)       # but normally use fread() or data.table() directly, originally 
-system.time(setkey(DT,x,y))  # one-off cost, usually
-
-
-###################################################
-### code chunk number 18: datatable-intro.Rnw:169-173
-###################################################
-ss=system.time(ans2 <- DT[list("R","h")])   # binary search
-head(ans2,3)
-dim(ans2)
-identical(ans1$v, ans2$v)
-
-
-###################################################
-### code chunk number 19: datatable-intro.Rnw:175-176
-###################################################
-if(!identical(ans1$v, ans2$v)) stop("vector scan vs binary search not equal")
-
-
-###################################################
-### code chunk number 20: datatable-intro.Rnw:186-189
-###################################################
-system.time(ans1 <- DT[x=="R" & y=="h",])   # works but is using data.table badly
-system.time(ans2 <- DF[DF$x=="R" & DF$y=="h",])   # the data.frame way
-mapply(identical,ans1,ans2)
-
-
-###################################################
-### code chunk number 21: datatable-intro.Rnw:204-206
-###################################################
-identical( DT[list("R","h"),],
-           DT[.("R","h"),])
-
-
-###################################################
-### code chunk number 22: datatable-intro.Rnw:208-209
-###################################################
-if(!identical(DT[list("R","h"),],DT[.("R","h"),])) stop("list != . check")
-
-
-###################################################
-### code chunk number 23: datatable-intro.Rnw:229-230
-###################################################
-DT[,sum(v)]
-
-
-###################################################
-### code chunk number 24: datatable-intro.Rnw:235-236
-###################################################
-DT[,sum(v),by=x]
-
-
-###################################################
-### code chunk number 25: datatable-intro.Rnw:241-246
-###################################################
-ttt=system.time(tt <- tapply(DT$v,DT$x,sum)); ttt
-sss=system.time(ss <- DT[,sum(v),by=x]); sss
-head(tt)
-head(ss)
-identical(as.vector(tt), ss$V1)
+## -------------------------------------------------------------------------------------------------
+DT[, lapply(.SD, mean), by = ID]
 
+## -------------------------------------------------------------------------------------------------
+flights[carrier == "AA",                       ## Only on trips with carrier "AA"
+        lapply(.SD, mean),                     ## compute the mean
+        by = .(origin, dest, month),           ## for every 'origin,dest,month'
+        .SDcols = c("arr_delay", "dep_delay")] ## for just those specified in .SDcols
 
-###################################################
-### code chunk number 26: datatable-intro.Rnw:248-249
-###################################################
-if(!identical(as.vector(tt), ss$V1)) stop("by check failed")
+## -------------------------------------------------------------------------------------------------
+ans <- flights[, head(.SD, 2), by = month]
+head(ans)
 
+## -------------------------------------------------------------------------------------------------
+DT[, .(val = c(a,b)), by = ID]
 
-###################################################
-### code chunk number 27: datatable-intro.Rnw:257-262
-###################################################
-ttt=system.time(tt <- tapply(DT$v,list(DT$x,DT$y),sum)); ttt
-sss=system.time(ss <- DT[,sum(v),by="x,y"]); sss
-tt[1:5,1:5]
-head(ss)
-identical(as.vector(t(tt)), ss$V1)
+## -------------------------------------------------------------------------------------------------
+DT[, .(val = list(c(a,b))), by = ID]
 
+## -------------------------------------------------------------------------------------------------
+## (1) look at the difference between
+DT[, print(c(a,b)), by = ID]
 
-###################################################
-### code chunk number 28: datatable-intro.Rnw:264-265
-###################################################
-if(!identical(as.vector(t(tt)), ss$V1)) stop("group check failed")
+## (2) and
+DT[, print(list(c(a,b))), by = ID]
 
+## ----eval = FALSE---------------------------------------------------------------------------------
+#  DT[i, j, by]
 
diff --git a/vignettes/datatable-intro-vignette.Rmd b/inst/doc/datatable-intro.Rmd
similarity index 88%
rename from vignettes/datatable-intro-vignette.Rmd
rename to inst/doc/datatable-intro.Rmd
index 7e2335f..d942281 100644
--- a/vignettes/datatable-intro-vignette.Rmd
+++ b/inst/doc/datatable-intro.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Introduction to data.table"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Introduction to data.table}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,11 +13,10 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-# options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
 This vignette introduces the *data.table* syntax, its general form, how to *subset* rows, *select and compute* on columns and perform aggregations *by group*. Familiarity with *data.frame* data structure from base R is useful, but not essential to follow this vignette.
 
@@ -30,11 +26,11 @@ This vignette introduces the *data.table* syntax, its general form, how to *subs
 
 Data manipulation operations such as *subset*, *group*, *update*, *join* etc., are all inherently related. Keeping these *related operations together* allows for:
 
-* *concise* and *consistent* syntax irrespective of the set of operations you would like to perform to achieve your end goal. 
+* *concise* and *consistent* syntax irrespective of the set of operations you would like to perform to achieve your end goal.
 
-* performing analysis *fluidly* without the cognitive burden of having to map each operation to a particular function from a set of functions available before to perform the analysis. 
+* performing analysis *fluidly* without the cognitive burden of having to map each operation to a particular function from a set of functions available before to perform the analysis.
 
-* *automatically* optimising operations internally, and very effectively, by knowing precisely the data required for each operation and therefore very fast and memory efficient. 
+* *automatically* optimising operations internally, and very effectively, by knowing precisely the data required for each operation and therefore very fast and memory efficient.
 
 Briefly, if you are interested in reducing *programming* and *compute* time tremendously, then this package is for you. The philosophy that *data.table* adheres to makes this possible. Our goal is to illustrate it through this series of vignettes.
 
@@ -45,8 +41,8 @@ In this vignette, we will use [NYC-flights14](https://github.com/arunsrinivasan/
 
 We can use *data.table's* fast file reader `fread` to load *flights* directly as follows:
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ```{r}
@@ -63,7 +59,7 @@ In this vignette, we will
 
 1. start with basics - what is a *data.table*, its general form, how to *subset* rows, *select and compute* on columns
 
-2. and then we will look at performing data aggregations by group, 
+2. and then we will look at performing data aggregations by group,
 
 ## 1. Basics {#basics-1}
 
@@ -72,7 +68,7 @@ In this vignette, we will
 *data.table* is an R package that provides **an enhanced version** of *data.frames*. In the [Data](#data) section, we already created a *data.table* using `fread()`. We can also create one using the `data.table()` function. Here is an example:
 
 ```{r}
-DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18)
+DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
 DT
 class(DT$ID)
 ```
@@ -86,7 +82,7 @@ You can also convert existing objects to a *data.table* using `as.data.table()`.
 * Row numbers are printed with a `:` in order to visually separate the row number from the first column.
 
 * When the number of rows to print exceeds the global option `datatable.print.nrows` (default = `r getOption("datatable.print.nrows")`), it automatically prints only the top 5 and bottom 5 rows (as can be seen in the [Data](#data) section).
-    
+
     ```{.r}
     getOption("datatable.print.nrows")
     ```
@@ -110,7 +106,7 @@ Users who have a SQL background might perhaps immediately relate to this syntax.
 
 Take `DT`, subset rows using `i`, then calculate `j`, grouped by `by`.
 
-# 
+#
 
 Let's begin by looking at `i` and `j` first - subsetting rows and operating on columns.
 
@@ -156,7 +152,7 @@ head(ans)
 * In addition, `order(...)` within the frame of a *data.table* uses *data.table*'s internal fast radix order `forder()`, which is much faster than `base::order`. Here's a small example to highlight the difference.
 
     ```{r}
-    odt = data.table(col=sample(1e7))
+    odt = data.table(col = sample(1e7))
     (t1 <- system.time(ans1 <- odt[base::order(col)]))  ## uses order from base R
     (t2 <- system.time(ans2 <- odt[order(col)]))        ## uses data.table's forder
     (identical(ans1, ans2))
@@ -203,7 +199,7 @@ head(ans)
 
 #
 
-*data.tables* (and *data.frames*) are internally *lists*  as well, but with all its columns of equal length and with a *class* attribute. Allowing `j` to return a *list* enables converting and returning a *data.table* very efficiently. 
+*data.tables* (and *data.frames*) are internally *lists* as well, but with all their columns of equal length and with a *class* attribute. Allowing `j` to return a *list* enables converting and returning a *data.table* very efficiently.
 
 #### Tip: {.bs-callout .bs-callout-warning #tip-1}
 
@@ -223,7 +219,7 @@ head(ans)
 
 * Wrap both columns within `.()`, or `list()`. That's it.
 
-# 
+#
 
 #### -- Select both `arr_delay` and `dep_delay` columns *and* rename them to `delay_arr` and `delay_dep`.
 
@@ -247,15 +243,15 @@ ans
 
 #### What's happening here? {.bs-callout .bs-callout-info}
 
-* *data.table*'s `j` can handle more than just *selecting columns* - it can handle *expressions*, i.e., *compute on columns*. This shouldn't be surprising, as columns can be referred to as if they are variables. Then we should be able to *compute* by calling functions on those variables. And that's what precisely happens here. 
+* *data.table*'s `j` can handle more than just *selecting columns* - it can handle *expressions*, i.e., *compute on columns*. This shouldn't be surprising, as columns can be referred to as if they are variables, so we should be able to *compute* by calling functions on those variables. And that's precisely what happens here.
 
 ### f) Subset in `i` *and* do in `j`
 
 #### -- Calculate the average arrival and departure delay for all flights with "JFK" as the origin airport in the month of June.
 
 ```{r}
-ans <- flights[origin == "JFK" & month == 6L, 
-               .(m_arr=mean(arr_delay), m_dep=mean(dep_delay))]
+ans <- flights[origin == "JFK" & month == 6L,
+               .(m_arr = mean(arr_delay), m_dep = mean(dep_delay))]
 ans
 ```
 
@@ -265,7 +261,7 @@ ans
 
 * Now, we look at `j` and find that it uses only *two columns*, and all we have to do is compute their `mean()`. Therefore we subset just those columns corresponding to the matching rows, and compute their `mean()`.
 
-Because the three main components of the query (`i`, `j` and `by`) are *together* inside `[...]`, *data.table* can see all three and optimise the query altogether *before evaluation*, not each separately. We are able to therefore avoid the entire subset, for both speed and memory efficiency. 
+Because the three main components of the query (`i`, `j` and `by`) are *together* inside `[...]`, *data.table* can see all three and optimise the query altogether *before evaluation*, rather than each part separately. We are therefore able to avoid materialising the entire subset, for both speed and memory efficiency.
 
 #### -- How many trips have been made in 2014 from "JFK" airport in the month of June?
 
@@ -274,7 +270,7 @@ ans <- flights[origin == "JFK" & month == 6L, length(dest)]
 ans
 ```
 
-The function `length()` requires an input argument. We just needed to compute the number of rows in the subset. We could have used any other column as input argument to `length()` really. 
+The function `length()` requires an input argument. We just needed to compute the number of rows in the subset, so really any other column could have served as the input argument to `length()`.
 
 This type of operation occurs so frequently, especially while grouping (as we will see in the next section), that *data.table* provides a *special symbol*, `.N`, for it.
 
@@ -294,7 +290,7 @@ ans
 
 * Once again, we subset in `i` to get the *row indices* where `origin` airport equals *"JFK"*, and `month` equals *6*.
 
-* We see that `j` uses only `.N` and no other columns. Therefore the entire subset is not materialised. We simply return the number of rows in the subset (which is just the length of row indices). 
+* We see that `j` uses only `.N` and no other columns. Therefore the entire subset is not materialised. We simply return the number of rows in the subset (which is just the length of row indices).
 
 * Note that we did not wrap `.N` with `list()` or `.()`. Therefore a vector is returned.
 
@@ -302,12 +298,12 @@ We could have accomplished the same operation by doing `nrow(flights[origin == "
 
 ### g) Great! But how can I refer to columns by names in `j` (like in a *data.frame*)?
 
-You can refer to column names the *data.frame* way using `with = FALSE`. 
+You can refer to column names the *data.frame* way using `with = FALSE`.
 
 #### -- Select both `arr_delay` and `dep_delay` columns the *data.frame* way.
 
 ```{r}
-ans <- flights[, c("arr_delay", "dep_delay"), with=FALSE]
+ans <- flights[, c("arr_delay", "dep_delay"), with = FALSE]
 head(ans)
 ```
 
@@ -325,38 +321,38 @@ DF[with(DF, x > 1), ]
 
 #### {.bs-callout .bs-callout-info #with_false}
 
-* Using `with()` in (2) allows using `DF`'s column `x` as if it were a variable. 
+* Using `with()` in (2) allows using `DF`'s column `x` as if it were a variable.
 
-    Hence the argument name `with` in *data.table*. Setting `with=FALSE` disables the ability to refer to columns as if they are variables, thereby restoring the "*data.frame* mode".
+    Hence the argument name `with` in *data.table*. Setting `with = FALSE` disables the ability to refer to columns as if they are variables, thereby restoring the "*data.frame* mode".
 
 * We can also *deselect* columns using `-` or `!`. For example:
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     ## not run
 
     # returns all columns except arr_delay and dep_delay
-    ans <- flights[, !c("arr_delay", "dep_delay"), with=FALSE]
+    ans <- flights[, !c("arr_delay", "dep_delay"), with = FALSE]
     # or
-    ans <- flights[, -c("arr_delay", "dep_delay"), with=FALSE]
+    ans <- flights[, -c("arr_delay", "dep_delay"), with = FALSE]
     ```
 
 * From `v1.9.5+`, we can also select by specifying start and end column names, e.g., `year:day` to select the first three columns.
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     ## not run
 
     # returns year,month and day
-    ans <- flights[, year:day, with=FALSE]
+    ans <- flights[, year:day, with = FALSE]
     # returns day, month and year
-    ans <- flights[, day:year, with=FALSE]
+    ans <- flights[, day:year, with = FALSE]
     # returns all columns except year, month and day
-    ans <- flights[, -(year:day), with=FALSE]
-    ans <- flights[, !(year:day), with=FALSE]
+    ans <- flights[, -(year:day), with = FALSE]
+    ans <- flights[, !(year:day), with = FALSE]
     ```
 
     This is particularly handy while working interactively.
 
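
As a quick runnable sketch of deselection, re-creating the small `DT` from the [What is a data.table?](#what-is-datatable-1a) section so the chunk is self-contained:

```r
library(data.table)

DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)

# deselect columns 'a' and 'b' the data.frame way; only ID and c remain
ans <- DT[, !c("a", "b"), with = FALSE]
ans
```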
-# 
+#
 
 `with = TRUE` is default in *data.table* because we can do much more by allowing `j` to handle expressions - especially when combined with `by` as we'll see in a moment.
 
@@ -369,11 +365,11 @@ We've already seen `i` and `j` from *data.table*'s general form in the previous
 #### -- How can we get the number of trips corresponding to each origin airport?
 
 ```{r}
-ans <- flights[, .(.N), by=.(origin)]
+ans <- flights[, .(.N), by = .(origin)]
 ans
 
 ## or equivalently using a character vector in 'by'
-# ans <- flights[, .(.N), by="origin"]
+# ans <- flights[, .(.N), by = "origin"]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -389,20 +385,20 @@ ans
 * When there's only one column or expression to refer to in `j` and `by`, we can drop the `.()` notation. This is purely for convenience. We could instead do:
 
     ```{r}
-    ans <- flights[, .N, by=origin]
+    ans <- flights[, .N, by = origin]
     ans
     ```
 
     We'll use this convenient form wherever applicable hereafter.
 
-# 
+#
 
 #### -- How can we calculate the number of trips for each origin airport for carrier code *"AA"*? {#origin-.N}
 
 The unique carrier code *"AA"* corresponds to *American Airlines Inc.*
 
 ```{r}
-ans <- flights[carrier == "AA", .N, by=origin]
+ans <- flights[carrier == "AA", .N, by = origin]
 ans
 ```
 
@@ -415,11 +411,11 @@ ans
 #### -- How can we get the total number of trips for each `origin, dest` pair for carrier code *"AA"*? {#origin-dest-.N}
 
 ```{r}
-ans <- flights[carrier == "AA", .N, by=.(origin,dest)]
+ans <- flights[carrier == "AA", .N, by = .(origin,dest)]
 head(ans)
 
 ## or equivalently using a character vector in 'by'
-# ans <- flights[carrier == "AA", .N, by=c("origin", "dest")]
+# ans <- flights[carrier == "AA", .N, by = c("origin", "dest")]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -429,9 +425,9 @@ head(ans)
 #### -- How can we get the average arrival and departure delay for each `orig,dest` pair for each month for carrier code *"AA"*? {#origin-dest-month}
 
 ```{r}
-ans <- flights[carrier == "AA", 
-        .(mean(arr_delay), mean(dep_delay)), 
-        by=.(origin, dest, month)]
+ans <- flights[carrier == "AA",
+        .(mean(arr_delay), mean(dep_delay)),
+        by = .(origin, dest, month)]
 ans
 ```
 
@@ -447,14 +443,14 @@ Now what if we would like to order the result by those grouping columns `origin`
 
 ### b) keyby
 
-*data.table* retaining the original order of groups is intentional and by design. There are cases when preserving the original order is essential. But at times we would like to automatically sort by the variables we grouped by. 
+*data.table* retains the original order of groups by design. There are cases when preserving the original order is essential. But at times we would like to sort automatically by the variables we grouped by.
 
 #### -- So how can we directly order by all the grouping variables?
 
 ```{r}
-ans <- flights[carrier == "AA", 
-        .(mean(arr_delay), mean(dep_delay)), 
-        keyby=.(origin, dest, month)]
+ans <- flights[carrier == "AA",
+        .(mean(arr_delay), mean(dep_delay)),
+        keyby = .(origin, dest, month)]
 ans
 ```
 
@@ -462,7 +458,7 @@ ans
 
 * All we did was to change `by` to `keyby`. This automatically orders the result by the grouping variables in increasing order. Note that `keyby` is applied after performing the operation, i.e., on the computed result.
 
-**Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an *attribute* called `sorted`. But we'll learn more about `keys` in the next vignette. 
+**Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an *attribute* called `sorted`. But we'll learn more about `keys` in the next vignette.
 
 For now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
 
@@ -476,7 +472,7 @@ ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
 
 #### -- How can we order `ans` using the columns `origin` in ascending order, and `dest` in descending order?
 
-We can store the intermediate result in a variable, and then use `order(origin, -dest)` on that variable. It seems fairly straightforward. 
+We can store the intermediate result in a variable, and then use `order(origin, -dest)` on that variable. It seems fairly straightforward.
 
 ```{r}
 ans <- ans[order(origin, -dest)]
@@ -491,10 +487,10 @@ head(ans)
 
 #
 
-But this requires having to assign the intermediate result and then overwriting that result. We can do one better and avoid this intermediate assignment on to a variable altogther by `chaining` expressions.
+But this requires having to assign the intermediate result and then overwrite it. We can do one better and avoid this intermediate assignment onto a variable altogether by `chaining` expressions.
 
 ```{r}
-ans <- flights[carrier == "AA", .N, by=.(origin, dest)][order(origin, -dest)]
+ans <- flights[carrier == "AA", .N, by = .(origin, dest)][order(origin, -dest)]
 head(ans, 10)
 ```
 
@@ -504,10 +500,10 @@ head(ans, 10)
 
 * Or you can also chain them vertically:
 
-    ```{r eval=FALSE}
-    DT[ ... 
-     ][ ... 
-     ][ ... 
+    ```{r eval = FALSE}
+    DT[ ...
+     ][ ...
+     ][ ...
      ]
     ```
 
@@ -528,7 +524,7 @@ ans
 
 * Note that we did not provide any names to the `by-expression`, and names have been automatically assigned in the result.
 
-* You can provide other columns along with expressions, for example: `DT[, .N, by=.(a, b>0)]`. 
+* You can provide other columns along with expressions, for example: `DT[, .N, by = .(a, b>0)]`.
 
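
To make this concrete, here is a small sketch using the `DT` created earlier (the expression `a > 3` is just an illustrative choice):

```r
library(data.table)

DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)

# group by the column ID together with the logical expression a > 3;
# the expression becomes a grouping column (named 'a') in the result
ans <- DT[, .N, by = .(ID, a > 3)]
ans
```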
 ### e) Multiple columns in `j` - `.SD`
 
@@ -536,27 +532,27 @@ ans
 
 It is of course not practical to have to type `mean(myCol)` for every column one by one. What if you had 100 columns to compute the `mean()` of?
 
-How can we do this efficiently? To get there, refresh on [this tip](#tip-1) - *"As long as j-expression returns a list, each element of the list will be converted to a column in the resulting data.table"*. Suppose we can refer to the *data subset for each group* as a variable *while grouping*, then we can loop through all the columns of that variable using the already familiar base function `lapply()`. We don't have to learn any new function. 
+How can we do this efficiently? To get there, refresh your memory of [this tip](#tip-1) - *"As long as j-expression returns a list, each element of the list will be converted to a column in the resulting data.table"*. If we can refer to the *data subset for each group* as a variable *while grouping*, we can then loop through all the columns of that variable using the already familiar base function `lapply()`. We don't have to learn any new function.
 
 #### Special symbol `.SD`: {.bs-callout .bs-callout-info #special-SD}
 
-*data.table* provides a *special* symbol, called `.SD`. It stands for **S**ubset of **D**ata. It by itself is a *data.table* that holds the data for *the current group* defined using `by`. 
+*data.table* provides a *special* symbol called `.SD`. It stands for **S**ubset of **D**ata. It is itself a *data.table* that holds the data for *the current group* as defined using `by`.
 
 Recall that a *data.table* is internally a list as well with all its columns of equal length.
 
-# 
+#
 
 Let's use the [*data.table* `DT` from before](#what-is-datatable-1a) to get a glimpse of what `.SD` looks like.
 
 ```{r}
 DT
 
-DT[, print(.SD), by=ID]
+DT[, print(.SD), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* `.SD` contains all the columns *except the grouping columns* by default. 
+* `.SD` contains all the columns *except the grouping columns* by default.
 
 * It is also generated by preserving the original order - data corresponding to `ID = "b"`, then `ID = "a"`, and then `ID = "c"`.
 
@@ -565,37 +561,37 @@ DT[, print(.SD), by=ID]
 To compute on (multiple) columns, we can then simply use the base R function `lapply()`.
 
 ```{r}
-DT[, lapply(.SD, mean), by=ID]
+DT[, lapply(.SD, mean), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* `.SD` holds the rows corresponding to columns *a*, *b* and *c* for that group. We compute the `mean()` on each of these columns using the already familiar base function `lapply()`. 
+* `.SD` holds the columns *a*, *b* and *c* for the rows of the current group. We compute the `mean()` on each of these columns using the already familiar base function `lapply()`.
 
 * For each group, this returns a list of three elements containing the mean values, which become the columns of the resulting `data.table`.
 
 * Since `lapply()` returns a *list*, there is no need to wrap it with an additional `.()` (if necessary, refer to [this tip](#tip-1)).
 
-# 
+#
 
-We are almost there. There is one little thing left to address. In our `flights` *data.table*, we only wanted to calculate the `mean()` of two columns `arr_delay` and `dep_delay`. But `.SD` would contain all the columns other than the grouping variables by default. 
+We are almost there. There is one little thing left to address. In our `flights` *data.table*, we only wanted to calculate the `mean()` of two columns `arr_delay` and `dep_delay`. But `.SD` would contain all the columns other than the grouping variables by default.
 
 #### -- How can we specify just the columns we would like to compute the `mean()` on?
 
 #### .SDcols {.bs-callout .bs-callout-info}
 
-Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group. 
+Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group.
 
 Similar to the [with = FALSE section](#with_false), you can also provide the columns to remove instead of the columns to keep using the `-` or `!` sign, as well as select consecutive columns as `colA:colB` and deselect consecutive columns as `!(colA:colB)` or `-(colA:colB)`.
 
-# 
+#
 Now let us try to use `.SD` along with `.SDcols` to get the `mean()` of `arr_delay` and `dep_delay` columns grouped by `origin`, `dest` and `month`.
 
 ```{r}
-flights[carrier == "AA",                     ## Only on trips with carrier "AA"
-        lapply(.SD, mean),                   ## compute the mean
-        by=.(origin, dest, month),           ## for every 'origin,dest,month'
-        .SDcols=c("arr_delay", "dep_delay")] ## for just those specified in .SDcols
+flights[carrier == "AA",                       ## Only on trips with carrier "AA"
+        lapply(.SD, mean),                     ## compute the mean
+        by = .(origin, dest, month),           ## for every 'origin,dest,month'
+        .SDcols = c("arr_delay", "dep_delay")] ## for just those specified in .SDcols
 ```
 
 ### f) Subset `.SD` for each group:
@@ -603,7 +599,7 @@ flights[carrier == "AA",                     ## Only on trips with carrier "AA"
 #### -- How can we return the first two rows for each `month`?
 
 ```{r}
-ans <- flights[, head(.SD, 2), by=month]
+ans <- flights[, head(.SD, 2), by = month]
 head(ans)
 ```
 
@@ -617,10 +613,10 @@ head(ans)
 
 So that we have a consistent syntax and keep using already existing (and familiar) base functions instead of learning new functions. To illustrate, let us use the *data.table* `DT` we created at the very beginning under [What is a data.table?](#what-is-datatable-1a) section.
 
-#### -- How can we concatenate columns `a` and `b` for each group in `ID`? 
+#### -- How can we concatenate columns `a` and `b` for each group in `ID`?
 
 ```{r}
-DT[, .(val = c(a,b)), by=ID]
+DT[, .(val = c(a,b)), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -630,7 +626,7 @@ DT[, .(val = c(a,b)), by=ID]
 #### -- What if we would like to have all the values of column `a` and `b` concatenated, but returned as a list column?
 
 ```{r}
-DT[, .(val = list(c(a,b))), by=ID]
+DT[, .(val = list(c(a,b))), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -639,17 +635,17 @@ DT[, .(val = list(c(a,b))), by=ID]
 
 * Note those commas are for display only. A list column can contain any object in each cell, and in this example, each cell is itself a vector and some cells contain longer vectors than others.
 
-# 
-Once you start internalising usage in `j`, you will realise how powerful the syntax can be. A very useful way to understand it is by playing around, with the help of `print()`. 
+#
+Once you start internalising usage in `j`, you will realise how powerful the syntax can be. A very useful way to understand it is by playing around with it, with the help of `print()`.
 
 For example:
 
 ```{r}
 ## (1) look at the difference between
-DT[, print(c(a,b)), by=ID]
+DT[, print(c(a,b)), by = ID]
 
 ## (2) and
-DT[, print(list(c(a,b))), by=ID]
+DT[, print(list(c(a,b))), by = ID]
 ```
 
 In (1), for each group, a vector is returned, with lengths 6, 4 and 2 here. However, (2) returns a list of length 1 for each group, with its first element holding vectors of length 6, 4 and 2. Therefore (1) results in a total length of 6+4+2 = `r 6+4+2`, whereas (2) returns 1+1+1 = `r 1+1+1`.
@@ -658,7 +654,7 @@ In (1), for each group, a vector is returned, with length = 6,4,2 here. However
 
 The general form of *data.table* syntax is:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 DT[i, j, by]
 ```
 
@@ -676,7 +672,7 @@ We can do much more in `i` by keying a *data.table*, which allows blazing fast s
 
 1. Select columns the *data.table* way: `DT[, .(colA, colB)]`.
 
-2. Select columns the *data.frame* way: `DT[, c("colA", "colB"), with=FALSE]`.
+2. Select columns the *data.frame* way: `DT[, c("colA", "colB"), with = FALSE]`.
 
 3. Compute on columns: `DT[, .(sum(colA), mean(colB))]`.
 
@@ -687,20 +683,20 @@ We can do much more in `i` by keying a *data.table*, which allows blazing fast s
 #
 
 #### Using `by`: {.bs-callout .bs-callout-info}
- 
+
 * Using `by`, we can group by columns by specifying a *list of columns* or a *character vector of column names* or even *expressions*. The flexibility of `j`, combined with `by` and `i` makes for a very powerful syntax.
 
-* `by` can handle multiple columns and also *expressions*. 
+* `by` can handle multiple columns and also *expressions*.
 
-* We can `keyby` grouping columns to automatically sort the grouped result. 
+* We can `keyby` grouping columns to automatically sort the grouped result.
 
 * We can use `.SD` and `.SDcols` in `j` to operate on multiple columns using already familiar base functions. Here are some examples:
 
-    1. `DT[, lapply(.SD, fun), by=., .SDcols=...]` - applies `fun` to all columns specified in `.SDcols` while grouping by the columns specified in `by`.
+    1. `DT[, lapply(.SD, fun), by = ..., .SDcols = ...]` - applies `fun` to all columns specified in `.SDcols` while grouping by the columns specified in `by`.
 
-    2. `DT[, head(.SD, 2), by=.]` - return the first two rows for each group.
+    2. `DT[, head(.SD, 2), by = ...]` - return the first two rows for each group.
 
-    3. `DT[col > val, head(.SD, 1), by=.]` - combine `i` along with `j` and `by`.
+    3. `DT[col > val, head(.SD, 1), by = ...]` - combine `i` along with `j` and `by`.
 
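
A compact runnable recap of items 1 and 2 above, using the small `DT` from the beginning of the vignette (the choice of columns in `.SDcols` is just for illustration):

```r
library(data.table)

DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)

# 1. apply a function to just the columns named in .SDcols, per group
sums <- DT[, lapply(.SD, sum), by = ID, .SDcols = c("a", "b")]
sums

# 2. return the first row of each group
firsts <- DT[, head(.SD, 1), by = ID]
firsts
```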
 #
 
@@ -708,7 +704,7 @@ We can do much more in `i` by keying a *data.table*, which allows blazing fast s
 
 As long as `j` returns a *list*, each element of the list will become a column in the resulting *data.table*.
 
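
For instance, a one-line sketch with the earlier `DT` (the names `mean_a` and `max_b` are made up for illustration):

```r
library(data.table)

DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)

# j returns a named list, so the result is a one-row data.table
# with columns mean_a and max_b
ans <- DT[, .(mean_a = mean(a), max_b = max(b))]
ans
```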
-# 
+#
 
 We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the next vignette.
 
diff --git a/inst/doc/datatable-intro.Rnw b/inst/doc/datatable-intro.Rnw
deleted file mode 100644
index 49627cb..0000000
--- a/inst/doc/datatable-intro.Rnw
+++ /dev/null
@@ -1,293 +0,0 @@
-\documentclass[a4paper]{article}
-
-\usepackage[margin=3cm]{geometry}
-%%\usepackage[round]{natbib}
-\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
-
-%%\newcommand{\acronym}[1]{\textsc{#1}}
-%%\newcommand{\class}[1]{\mbox{\textsf{#1}}}
-\newcommand{\code}[1]{\mbox{\texttt{#1}}}
-\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
-\newcommand{\proglang}[1]{\textsf{#1}}
-\SweaveOpts{keep.source=TRUE, strip.white=all}
-%% \VignetteIndexEntry{Quick introduction}
-
-<<echo=FALSE,results=hide>>=
-if (!exists("data.table",.GlobalEnv)) library(data.table)  
-# In devel won't call library, but R CMD build/check will.
-rm(list=as.character(tables()$NAME),envir=.GlobalEnv)
-# for development when we Sweave this file repeatedly. Otherwise first tables() shows tables from last run
-@
-
-\begin{document}
-\title{Introduction to the \pkg{data.table} package in \proglang{R}}
-\date{Revised: \today\\(A later revision may be available on the \href{https://github.com/Rdatatable/data.table/wiki}{homepage})}
-\maketitle
-
-\section*{Introduction}
-
-This vignette is aimed at those who are already familiar with creating and subsetting \code{data.frame} in \proglang{R}. We aim for this quick introduction
-to be readable in {\bf 10 minutes}, briefly covering a few features:
-1.\,Keys; 2.\,Fast Grouping; and 3.\,Fast \emph{ordered} join.
-
-\section*{Creation}
-
-Recall that we create a \code{data.frame} using the function \code{data.frame()}:
-<<>>=
-DF = data.frame(x=c("b","b","b","a","a"),v=rnorm(5))
-DF
-@
-A \code{data.table} is created in exactly the same way:
-<<>>=
-DT = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
-DT
-@
-Observe that a \code{data.table} prints the row numbers with a colon so as to visually separate the row number from the first column.
-We can easily convert existing \code{data.frame} objects to \code{data.table}.
-<<>>=
-CARS = data.table(cars)
-head(CARS)
-@
-We have just created two \code{data.table}s: \code{DT} and \code{CARS}. It is often useful to see a list of all
-\code{data.table}s in memory:
-<<>>=
-tables()
-@
-
-The MB column is useful to quickly assess memory use and to spot if any redundant tables can be
-removed to free up memory. Just like \code{data.frame}s, \code{data.table}s must fit inside RAM. 
-
-Some users regularly work with 20 or more tables in memory, rather like a database. 
-The result of \code{tables()} is itself a \code{data.table}, returned silently, so that \code{tables()} 
-can be used in programs. \code{tables()} is unrelated to the base function \code{table()}.
-
-To see the column types :
-
-<<>>=
-sapply(DT,class)
-@
-
-You may have noticed the empty column KEY in the result of \code{tables()} above. This is the subject of the next section.
-
-
-\section*{1. Keys}
-
-Let's start by considering \code{data.frame}, specifically \code{rownames}. We know that each row has exactly one row name. However, a person (for example) has at least two names, a first name and a second name. It's useful to organise a telephone directory sorted by surname then first name.
-
-In \code{data.table}, a \emph{key} consists of one \emph{or more} columns. These columns may be integer, factor or numeric as well as character. Furthermore, the rows are sorted by the key. Therefore, a \code{data.table} can have at most one key because it cannot be sorted in more than one way. We can think of a key as like super-charged row names; i.e., mult-column and multi-type.
-
-Uniqueness is not enforced; i.e., duplicate key values are allowed. Since
-the rows are sorted by the key, any duplicates in the key will appear consecutively.
-
-Let's remind ourselves of our tables:
-<<>>=
-tables()
-DT
-@
-
-No keys have been set yet. 
-
-<<>>=
-DT[2,]         # select row 2
-DT[x=="b",]    # select rows where column x == "b"
-@
-
-Aside: notice that we did not need to prefix \code{x} with \code{DT\$x}. In \code{data.table} queries, we can use column names as if they are variables directly.
-
-But since there are no rownames, the following does not work:
-<<>>=
-cat(try(DT["b",],silent=TRUE))
-@
-
-The error message tells us we need to use \code{setkey()}:
-<<>>=
-setkey(DT,x)
-DT
-@
-
-Notice that the rows in \code{DT} have now been re-ordered according to the values of \code{x}. 
-The two \code{"a"} rows have moved to the top.
-We can confirm that \code{DT} does indeed have a key using \code{haskey()}, \code{key()},
-\code{attributes()}, or just running \code{tables()}.
-
-<<>>=
-tables()
-@
-
-Now that we are sure \code{DT} has a key, let's try again:
-
-<<>>=
-DT["b",]
-@
-
-By default all the rows in the group are returned\footnote{In contrast to a \code{data.frame} where only the first rowname is returned when the rownames contain duplicates.}. The \code{mult} argument (short for \emph{multiple}) allows the first or last row of the group to be returned instead.
-
-<<>>=
-DT["b",mult="first"]
-DT["b",mult="last"]
-@
-
-Also, the comma is optional.
-
-<<>>=
-DT["b"]
-@
-
-Let's now create a new \code{data.frame}. We will make it large enough to demonstrate the
-difference between a \emph{vector scan} and a \emph{binary search}.
-<<print=TRUE>>=
-grpsize = ceiling(1e7/26^2)   # 10 million rows, 676 groups
-tt=system.time( DF <- data.frame(
-  x=rep(LETTERS,each=26*grpsize),
-  y=rep(letters,each=grpsize),
-  v=runif(grpsize*26^2),
-  stringsAsFactors=FALSE)
-)
-head(DF,3)
-tail(DF,3)
-dim(DF)
-@
-
-We might say that \proglang{R} has created a 3 column table and \emph{inserted}
-\Sexpr{format(nrow(DF),big.mark=",",scientific=FALSE)} rows. It took \Sexpr{format(tt[3],nsmall=3)} secs, so it inserted
-\Sexpr{format(as.integer(nrow(DF)/tt[3]),big.mark=",",scientific=FALSE)} rows per second. This is normal in base \proglang{R}. Notice that we set \code{stringsAsFactors=FALSE}. This makes it a little faster for a fairer comparison, but feel free to experiment. 
-
-Let's extract an arbitrary group from \code{DF}:
-
-<<print=TRUE>>=
-tt=system.time(ans1 <- DF[DF$x=="R" & DF$y=="h",])   # 'vector scan'
-head(ans1,3)
-dim(ans1)
-@
-
-Now convert to a \code{data.table} and extract the same group:
-
-<<>>=
-DT = as.data.table(DF)       # but normally use fread() or data.table() directly, originally 
-system.time(setkey(DT,x,y))  # one-off cost, usually
-@
-<<print=TRUE>>=
-ss=system.time(ans2 <- DT[list("R","h")])   # binary search
-head(ans2,3)
-dim(ans2)
-identical(ans1$v, ans2$v)
-@
-<<echo=FALSE>>=
-if(!identical(ans1$v, ans2$v)) stop("vector scan vs binary search not equal")
-@
-
-At \Sexpr{format(ss[3],nsmall=3)} seconds, this was {\bf\Sexpr{as.integer(tt[3]/ss[3])}} times faster than \Sexpr{format(tt[3],nsmall=3)} seconds,
-and produced precisely the same result. If you are thinking that a few seconds is not much to save, it's the relative speedup that's important. The
-vector scan is linear, but the binary search is O(log n). It scales. If a task taking 10 hours is sped up by 100 times to 6 minutes, that is
-significant\footnote{We wonder how many people are deploying parallel techniques to code that is vector scanning}. 
-
-We can do vector scans in \code{data.table}, too. In other words we can use data.table \emph{badly}.
-
-<<>>=
-system.time(ans1 <- DT[x=="R" & y=="h",])   # works but is using data.table badly
-system.time(ans2 <- DF[DF$x=="R" & DF$y=="h",])   # the data.frame way
-mapply(identical,ans1,ans2)
-@
-
-
-If the phone book analogy helped, the {\bf\Sexpr{as.integer(tt[3]/ss[3])}} times speedup should not be surprising. We use the key to take advantage of the fact 
-that the table is sorted and use binary search to find the matching rows. We didn't vector scan; we didn't use \code{==}.
-
-When we used \code{x=="R"} we \emph{scanned} the entire column x, testing each and every value to see if it equalled "R". We did
-it again in the y column, testing for "h". Then \code{\&} combined the two logical results to create a single logical vector which was
-passed to the \code{[} method, which in turn searched it for \code{TRUE} and returned those rows. These were \emph{vectorized} operations. They
-occurred internally in R and were very fast, but they were scans. \emph{We} did those scans because \emph{we} wrote that R code.
-
-
-When \code{i} is a \code{list} (and \code{data.table} is a \code{list} too), we say that we are \emph{joining}. In this case, we are joining DT to the 1 row, 2 column table returned by \code{list("R","h")}. Since we do this a lot, there is an alias for \code{list}: \code{.()}.
-
-<<>>=
-identical( DT[list("R","h"),],
-           DT[.("R","h"),])
-@
-<<echo=FALSE>>=
-if(!identical(DT[list("R","h"),],DT[.("R","h"),])) stop("list != . check")
-@
-
-Both vector scanning and binary search are available in \code{data.table}, but one way of using \code{data.table} is much better than the other.
-
-The join syntax is short, fast to write and easy to maintain. Passing a \code{data.table} into a \code{data.table} subset is analogous to \code{A[B]} syntax in base \proglang{R} where \code{A} is a matrix and \code{B} is a 2-column matrix\footnote{Subsetting a keyed \code{data.table} by an n-column 
-\code{data.table} is consistent with subsetting an n-dimensional array by an n-column matrix in base R}. In fact, the \code{A[B]} syntax in base R inspired the \code{data.table} package. There are
-other types of ordered joins and further arguments which are beyond the scope of this quick introduction.
-
-The merge method of \code{data.table} is very similar to \code{X[Y]}, but there are some differences. See FAQ 1.12.
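One illustrative difference can be sketched with two hypothetical keyed tables (not the vignette's DT): `X[Y]` keeps one row per row of `Y`, whereas `merge` defaults to an inner join.

```r
library(data.table)

X <- data.table(id = c("a", "b", "c"), vx = 1:3, key = "id")
Y <- data.table(id = c("b", "c", "d"), vy = 4:6, key = "id")

X[Y]                     # join: one row per row of Y, NA where X has no match
merge(X, Y)              # inner join by default: only ids present in both
merge(X, Y, all = TRUE)  # full outer join: ids from either table
```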
-
-This first section has been about the first argument inside \code{DT[...]}, namely \code{i}. The next section is about the 2nd and 3rd arguments: \code{j} and \code{by}.
-
-
-\section*{2. Fast grouping}
-
-
-The second argument to \code{DT[...]} is \code{j} and may consist of
-one or more expressions whose arguments are (unquoted) column names, as if the column names were variables, just as we saw earlier with \code{i}.
-
-<<>>=
-DT[,sum(v)]
-@
-
-When we supply a \code{j} expression and a 'by' expression, the \code{j} expression is repeated for each 'by' group.
-
-<<>>=
-DT[,sum(v),by=x]
-@
-
-The \code{by} in \code{data.table} is fast.  Let's compare it to \code{tapply}.
-
-<<>>=
-ttt=system.time(tt <- tapply(DT$v,DT$x,sum)); ttt
-sss=system.time(ss <- DT[,sum(v),by=x]); sss
-head(tt)
-head(ss)
-identical(as.vector(tt), ss$V1)
-@
-<<echo=FALSE>>=
-if(!identical(as.vector(tt), ss$V1)) stop("by check failed")
-@
-
-At \Sexpr{sprintf("%0.3f",sss[3])} sec, this was {\bf\Sexpr{as.integer(ttt[3]/sss[3])}} times faster than 
-\Sexpr{sprintf("%0.3f",ttt[3])} sec, and produced precisely the same result.
-
-Next, let's group by two columns:
-
-<<>>=
-ttt=system.time(tt <- tapply(DT$v,list(DT$x,DT$y),sum)); ttt
-sss=system.time(ss <- DT[,sum(v),by="x,y"]); sss
-tt[1:5,1:5]
-head(ss)
-identical(as.vector(t(tt)), ss$V1)
-@
-<<echo=FALSE>>=
-if(!identical(as.vector(t(tt)), ss$V1)) stop("group check failed")
-@
-
-This was {\bf\Sexpr{as.integer(ttt[3]/sss[3])}} times faster, and the syntax is a little simpler and easier to read.
-\newline
-
-
-\section*{3. Fast ordered joins}
-
-This is also known as last observation carried forward (LOCF) or a \emph{rolling join}.
-
-Recall that \code{X[Y]} is a join between \code{data.table} \code{X} and \code{data.table} \code{Y}.  If \code{Y} has 2 columns, the first column is matched
-to the first column of the key of \code{X} and the 2nd column to the 2nd.  An equi-join is performed by default, meaning that the values must be equal.
-
-Instead of an equi-join, a rolling join is:
-
-\code{X[Y,roll=TRUE]}
-
-As before the first column of \code{Y} is matched to \code{X} where the values are equal. The last join column in \code{Y} though, the 2nd one in
-this example, is treated specially. If no match is found, then the row before is returned, provided the first column still matches.
-
-Further controls are rolling forwards, backwards, nearest and limited staleness. 
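A minimal sketch of \code{roll=TRUE}, using a hypothetical keyed table of prices observed at irregular times (the classic LOCF use case):

```r
library(data.table)

prices  <- data.table(time = c(1L, 3L, 7L), price = c(100, 101, 104), key = "time")
lookups <- data.table(time = c(2L, 7L, 9L))

prices[lookups]               # equi-join: times 2 and 9 have no match, so NA
prices[lookups, roll = TRUE]  # rolling join: the row before is carried forward
```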
-
-For examples type \code{example(data.table)} and follow the output at the prompt.
-
-
-\end{document}
-
-
diff --git a/inst/doc/datatable-intro.html b/inst/doc/datatable-intro.html
new file mode 100644
index 0000000..c928e04
--- /dev/null
+++ b/inst/doc/datatable-intro.html
@@ -0,0 +1,907 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+<meta name="date" content="2016-12-02" />
+
+<title>Introduction to data.table</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+div.sourceCode { overflow-x: auto; }
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+</style>
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Introduction to data.table</h1>
+<h4 class="date"><em>2016-12-02</em></h4>
+
+
+
+<p>This vignette introduces the <em>data.table</em> syntax, its general form, how to <em>subset</em> rows, <em>select and compute</em> on columns and perform aggregations <em>by group</em>. Familiarity with <em>data.frame</em> data structure from base R is useful, but not essential to follow this vignette.</p>
+<hr />
+<div id="data-analysis-using-data.table" class="section level2">
+<h2>Data analysis using data.table</h2>
+<p>Data manipulation operations such as <em>subset</em>, <em>group</em>, <em>update</em>, <em>join</em> etc., are all inherently related. Keeping these <em>related operations together</em> allows for:</p>
+<ul>
+<li><p><em>concise</em> and <em>consistent</em> syntax irrespective of the set of operations you would like to perform to achieve your end goal.</p></li>
+<li><p>performing analysis <em>fluidly</em> without the cognitive burden of having to map each operation to a particular function from a set of available functions before performing the analysis.</p></li>
+<li><p><em>automatically</em> optimising operations internally, and very effectively, by knowing precisely the data required for each operation, making analyses very fast and memory efficient.</p></li>
+</ul>
+<p>Briefly, if you are interested in reducing <em>programming</em> and <em>compute</em> time tremendously, then this package is for you. The philosophy that <em>data.table</em> adheres to makes this possible. Our goal is to illustrate it through this series of vignettes.</p>
+</div>
+<div id="data" class="section level2">
+<h2>Data</h2>
+<p>In this vignette, we will use <a href="https://github.com/arunsrinivasan/flights/wiki/NYC-Flights-2014-data">NYC-flights14</a> data. It contains On-Time flights data from the <a href="http://www.transtats.bts.gov/DL_SelectFields.asp?Table_ID=236">Bureau of Transportation Statistics</a> for all the flights that departed from New York City airports in 2014 (inspired by <a href="https://github.com/hadley/nycflights13">nycflights13</a>). The data is available only for Jan-Oct’14.</p>
+<p>We can use <em>data.table’s</em> fast file reader <code>fread</code> to load <em>flights</em> directly as follows:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights <-<span class="st"> </span><span class="kw">fread</span>(<span class="st">"flights14.csv"</span>)
+flights
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co">#     ---                                                                              </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8</span>
+<span class="kw">dim</span>(flights)
+<span class="co"># [1] 253316     11</span></code></pre></div>
+<p>Aside: <code>fread</code> accepts <code>http</code> and <code>https</code> URLs directly as well as operating system commands such as <code>sed</code> and <code>awk</code> output. See <code>?fread</code> for examples.</p>
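For instance, fread also parses a literal string directly, which is handy for small reproducible examples (a hypothetical two-row sample below, not the real flights14.csv):

```r
library(data.table)

sample_flights <- fread(
"year,month,day,dep_delay,origin,dest
2014,1,1,14,JFK,LAX
2014,1,1,-3,JFK,LAX")
sample_flights
dim(sample_flights)   # 2 rows, 6 columns
```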
+</div>
+<div id="introduction" class="section level2">
+<h2>Introduction</h2>
+<p>In this vignette, we will</p>
+<ol style="list-style-type: decimal">
+<li><p>start with basics - what is a <em>data.table</em>, its general form, how to <em>subset</em> rows, <em>select and compute</em> on columns</p></li>
+<li><p>and then we will look at performing data aggregations by group,</p></li>
+</ol>
+</div>
+<div id="basics-1" class="section level2">
+<h2>1. Basics</h2>
+<div id="what-is-datatable-1a" class="section level3">
+<h3>a) What is data.table?</h3>
+<p><em>data.table</em> is an R package that provides <strong>an enhanced version</strong> of <em>data.frames</em>. In the <a href="#data">Data</a> section, we already created a <em>data.table</em> using <code>fread()</code>. We can also create one using the <code>data.table()</code> function. Here is an example:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">ID =</span> <span class="kw">c</span>(<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"a"</span>,<span class="st">"a"</span>,<span class="st">"c"</span>), <span class="dt">a =</span> <span class="dv">1</span>:<span class= [...]
+DT
+<span class="co">#    ID a  b  c</span>
+<span class="co"># 1:  b 1  7 13</span>
+<span class="co"># 2:  b 2  8 14</span>
+<span class="co"># 3:  b 3  9 15</span>
+<span class="co"># 4:  a 4 10 16</span>
+<span class="co"># 5:  a 5 11 17</span>
+<span class="co"># 6:  c 6 12 18</span>
+<span class="kw">class</span>(DT$ID)
+<span class="co"># [1] "character"</span></code></pre></div>
+<p>You can also convert existing objects to a <em>data.table</em> using <code>as.data.table()</code>.</p>
+<div id="note-that" class="section level4 bs-callout bs-callout-info">
+<h4>Note that:</h4>
+<ul>
+<li><p>Unlike <em>data.frames</em>, columns of <code>character</code> type are <em>never</em> converted to <code>factors</code> by default.</p></li>
+<li><p>Row numbers are printed with a <code>:</code> in order to visually separate the row number from the first column.</p></li>
+<li><p>When the number of rows to print exceeds the global option <code>datatable.print.nrows</code> (default = 100), it automatically prints only the top 5 and bottom 5 rows (as can be seen in the <a href="#data">Data</a> section).</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">getOption</span>(<span class="st">"datatable.print.nrows"</span>)
+<span class="co"># [1] 100</span></code></pre></div></li>
+<li><p><em>data.table</em> doesn’t set or use <em>row names</em>, ever. We will see why in the <em>“Keys and fast binary search based subset”</em> vignette.</p></li>
+</ul>
+</div>
+</div>
+<div id="enhanced-1b" class="section level3">
+<h3>b) General form - in what way is a data.table <em>enhanced</em>?</h3>
+<p>In contrast to a <em>data.frame</em>, you can do <em>a lot more</em> than just subsetting rows and selecting columns within the frame of a <em>data.table</em>, i.e., within <code>[ ... ]</code>. To understand it we will have to first look at the <em>general form</em> of <em>data.table</em> syntax, as shown below:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[i, j, by]
+
+##   R:      i                 j        by
+## SQL:  where   select | update  group by</code></pre></div>
+<p>Users who have a SQL background might immediately relate to this syntax.</p>
+<div id="the-way-to-read-it-out-loud-is" class="section level4 bs-callout bs-callout-info">
+<h4>The way to read it (out loud) is:</h4>
+<p>Take <code>DT</code>, subset rows using <code>i</code>, then calculate <code>j</code>, grouped by <code>by</code>.</p>
+</div>
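Read aloud on a tiny hypothetical table (not the flights data), the three parts map directly onto the SQL clauses above:

```r
library(data.table)

DT <- data.table(origin = c("JFK", "JFK", "LGA"),
                 month  = c(6L, 6L, 6L),
                 dep_delay = c(10, -2, 5))

DT[month == 6L,                       # i : subset rows      (SQL: where)
   .(avg_delay = mean(dep_delay)),    # j : calculate        (SQL: select)
   by = origin]                       # by: for each group   (SQL: group by)
```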
+</div>
+</div>
+<div id="section" class="section level1">
+<h1></h1>
+<p>Let’s begin by looking at <code>i</code> and <code>j</code> first - subsetting rows and operating on columns.</p>
+<div id="c-subset-rows-in-i" class="section level3">
+<h3>c) Subset rows in <code id="subset-i-1c">i</code></h3>
+<div id="get-all-the-flights-with-jfk-as-the-origin-airport-in-the-month-of-june." class="section level4">
+<h4>– Get all the flights with “JFK” as the origin airport in the month of June.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L]
+<span class="kw">head</span>(ans)
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     6   1        -9        -5      AA    JFK  LAX      324     2475    8</span>
+<span class="co"># 2: 2014     6   1       -10       -13      AA    JFK  LAX      329     2475   12</span>
+<span class="co"># 3: 2014     6   1        18        -1      AA    JFK  LAX      326     2475    7</span>
+<span class="co"># 4: 2014     6   1        -6       -16      AA    JFK  LAX      320     2475   10</span>
+<span class="co"># 5: 2014     6   1        -4       -45      AA    JFK  LAX      326     2475   18</span>
+<span class="co"># 6: 2014     6   1        -6       -23      AA    JFK  LAX      329     2475   14</span></code></pre></div>
+</div>
+<div id="section-1" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Within the frame of a <em>data.table</em>, columns can be referred to <em>as if they are variables</em>. Therefore, we simply refer to <code>dest</code> and <code>month</code> as if they are variables. We do not need to add the prefix <code>flights$</code> each time. However using <code>flights$dest</code> and <code>flights$month</code> would work just fine.</p></li>
+<li><p>The <em>row indices</em> that satisfy the condition <code>origin == "JFK" & month == 6L</code> are computed, and since there is nothing else left to do, a <em>data.table</em> with all columns from <code>flights</code> corresponding to those <em>row indices</em> is simply returned.</p></li>
+<li><p>A comma after the condition in <code>i</code> is not required, although <code>flights[origin == "JFK" & month == 6L, ]</code> would work just fine. In <em>data.frames</em>, however, the comma is necessary.</p></li>
+</ul>
+</div>
+<div id="subset-rows-integer" class="section level4">
+<h4>– Get the first two rows from <code>flights</code>.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="dv">1</span>:<span class="dv">2</span>]
+ans
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co"># 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span></code></pre></div>
+</div>
+<div id="section-2" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li>In this case, there is no condition. The row indices are already provided in <code>i</code>. We therefore return a <em>data.table</em> with all columns from <code>flights</code> for those <em>row indices</em>.</li>
+</ul>
+</div>
+<div id="sort-flights-first-by-column-origin-in-ascending-order-and-then-by-dest-in-descending-order" class="section level4">
+<h4>– Sort <code>flights</code> first by column <code>origin</code> in <em>ascending</em> order, and then by <code>dest</code> in <em>descending</em> order:</h4>
+<p>We can use the base R function <code>order()</code> to accomplish this.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="kw">order</span>(origin, -dest)]
+<span class="kw">head</span>(ans)
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   5         6        49      EV    EWR  XNA      195     1131    8</span>
+<span class="co"># 2: 2014     1   6         7        13      EV    EWR  XNA      190     1131    8</span>
+<span class="co"># 3: 2014     1   7        -6       -13      EV    EWR  XNA      179     1131    8</span>
+<span class="co"># 4: 2014     1   8        -7       -12      EV    EWR  XNA      184     1131    8</span>
+<span class="co"># 5: 2014     1   9        16         7      EV    EWR  XNA      181     1131    8</span>
+<span class="co"># 6: 2014     1  13        66        66      EV    EWR  XNA      188     1131    9</span></code></pre></div>
+</div>
+<div id="order-is-internally-optimised" class="section level4 bs-callout bs-callout-info">
+<h4><code>order()</code> is internally optimised</h4>
+<ul>
+<li><p>We can use “-” on a <em>character</em> column within the frame of a <em>data.table</em> to sort in decreasing order.</p></li>
+<li><p>In addition, <code>order(...)</code> within the frame of a <em>data.table</em> uses <em>data.table</em>’s internal fast radix order <code>forder()</code>, which is much faster than <code>base::order</code>. Here’s a small example to highlight the difference.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">odt =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">col =</span> <span class="kw">sample</span>(<span class="fl">1e7</span>))
+(t1 <-<span class="st"> </span><span class="kw">system.time</span>(ans1 <-<span class="st"> </span>odt[base::<span class="kw">order</span>(col)]))  ## uses order from base R
+<span class="co">#    user  system elapsed </span>
+<span class="co">#   0.640   0.004   0.645</span>
+(t2 <-<span class="st"> </span><span class="kw">system.time</span>(ans2 <-<span class="st"> </span>odt[<span class="kw">order</span>(col)]))        ## uses data.table's forder
+<span class="co">#    user  system elapsed </span>
+<span class="co">#   0.648   0.004   0.651</span>
+(<span class="kw">identical</span>(ans1, ans2))
+<span class="co"># [1] TRUE</span></code></pre></div></li>
+</ul>
+<p>The speedup here is <strong>~1x</strong>. We will discuss <em>data.table</em>’s fast order in more detail in the <em>data.table internals</em> vignette.</p>
+<ul>
+<li>This is so that you can improve performance tremendously while continuing to use functions you are already familiar with.</li>
+</ul>
+</div>
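Relatedly, when the reordered table itself is all you need, data.table also offers setorder(), which reorders by reference rather than returning a copy (a small hypothetical table below):

```r
library(data.table)

DT <- data.table(origin = c("LGA", "EWR", "JFK"),
                 dest   = c("ATL", "ORD", "LAX"))

ans <- DT[order(origin, -dest)]   # returns a reordered copy (uses forder internally)
setorder(DT, origin, -dest)       # reorders DT itself, by reference, no copy

identical(ans$origin, DT$origin)  # TRUE
```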
+</div>
+</div>
+<div id="section-3" class="section level1">
+<h1></h1>
+<div id="d-select-columns-in-j" class="section level3">
+<h3>d) Select column(s) in <code id="select-j-1d">j</code></h3>
+<div id="select-arr_delay-column-but-return-it-as-a-vector." class="section level4">
+<h4>– Select <code>arr_delay</code> column, but return it as a <em>vector</em>.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, arr_delay]
+<span class="kw">head</span>(ans)
+<span class="co"># [1]  13  13   9 -26   1   0</span></code></pre></div>
+</div>
+<div id="section-4" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Since columns can be referred to as if they are variables within the frame of data.tables, we directly refer to the <em>variable</em> we want to subset. Since we want <em>all the rows</em>, we simply skip <code>i</code>.</p></li>
+<li><p>It returns <em>all</em> the rows for the column <code>arr_delay</code>.</p></li>
+</ul>
+</div>
+<div id="select-arr_delay-column-but-return-as-a-data.table-instead." class="section level4">
+<h4>– Select <code>arr_delay</code> column, but return as a <em>data.table</em> instead.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">list</span>(arr_delay)]
+<span class="kw">head</span>(ans)
+<span class="co">#    arr_delay</span>
+<span class="co"># 1:        13</span>
+<span class="co"># 2:        13</span>
+<span class="co"># 3:         9</span>
+<span class="co"># 4:       -26</span>
+<span class="co"># 5:         1</span>
+<span class="co"># 6:         0</span></code></pre></div>
+</div>
+<div id="section-5" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>We wrap the <em>variables</em> (column names) within <code>list()</code>, which ensures that a <em>data.table</em> is returned. In case of a single column name, not wrapping with <code>list()</code> returns a vector instead, as seen in the <a href="#select-j-1d">previous example</a>.</p></li>
+<li><p><em>data.table</em> also allows wrapping columns with <code>.()</code>, which is an <em>alias</em> for <code>list()</code>; they both mean the same. Feel free to use whichever you prefer.</p>
+<p>We will continue to use <code>.()</code> from here on.</p></li>
+</ul>
+</div>
+</div>
+</div>
+<div id="section-6" class="section level1">
+<h1></h1>
+<p><em>data.tables</em> (and <em>data.frames</em>) are internally <em>lists</em> as well, but with all their columns of equal length and with a <em>class</em> attribute. Allowing <code>j</code> to return a <em>list</em> enables converting and returning a <em>data.table</em> very efficiently.</p>
+<div id="tip-1" class="section level4 bs-callout bs-callout-warning">
+<h4>Tip:</h4>
+<p>As long as the <code>j</code>-expression returns a <em>list</em>, each element of the list will be converted to a column in the resulting <em>data.table</em>. This makes <code>j</code> quite powerful, as we will see shortly.</p>
+</div>
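For example, on a hypothetical two-column table, computed expressions in a named list become columns of the result:

```r
library(data.table)

DT <- data.table(arr_delay = c(13, -26, 1), dep_delay = c(14, -8, 2))

# each element of the list returned by j becomes a column of the result
ans <- DT[, .(total_delay = arr_delay + dep_delay, arrived_late = arr_delay > 0)]
ans
```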
+<div id="select-both-arr_delay-and-dep_delay-columns." class="section level4">
+<h4>– Select both <code>arr_delay</code> and <code>dep_delay</code> columns.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .(arr_delay, dep_delay)]
+<span class="kw">head</span>(ans)
+<span class="co">#    arr_delay dep_delay</span>
+<span class="co"># 1:        13        14</span>
+<span class="co"># 2:        13        -3</span>
+<span class="co"># 3:         9         2</span>
+<span class="co"># 4:       -26        -8</span>
+<span class="co"># 5:         1         2</span>
+<span class="co"># 6:         0         4</span>
+
+## alternatively
+<span class="co"># ans <- flights[, list(arr_delay, dep_delay)]</span></code></pre></div>
+</div>
+<div id="section-7" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li>Wrap both columns within <code>.()</code>, or <code>list()</code>. That’s it.</li>
+</ul>
+</div>
+</div>
+<div id="section-8" class="section level1">
+<h1></h1>
+<div id="select-both-arr_delay-and-dep_delay-columns-and-rename-them-to-delay_arr-and-delay_dep." class="section level4">
+<h4>– Select both <code>arr_delay</code> and <code>dep_delay</code> columns <em>and</em> rename them to <code>delay_arr</code> and <code>delay_dep</code>.</h4>
+<p>Since <code>.()</code> is just an alias for <code>list()</code>, we can name columns as we would while creating a <em>list</em>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .(<span class="dt">delay_arr =</span> arr_delay, <span class="dt">delay_dep =</span> dep_delay)]
+<span class="kw">head</span>(ans)
+<span class="co">#    delay_arr delay_dep</span>
+<span class="co"># 1:        13        14</span>
+<span class="co"># 2:        13        -3</span>
+<span class="co"># 3:         9         2</span>
+<span class="co"># 4:       -26        -8</span>
+<span class="co"># 5:         1         2</span>
+<span class="co"># 6:         0         4</span></code></pre></div>
+<p>That’s it.</p>
+</div>
+<div id="e-compute-or-do-in-j" class="section level3">
+<h3>e) Compute or <em>do</em> in <code>j</code></h3>
+<div id="how-many-trips-have-had-total-delay-0" class="section level4">
+<h4>– How many trips have had total delay < 0?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">sum</span>((arr_delay +<span class="st"> </span>dep_delay) <<span class="st"> </span><span class="dv">0</span>)]
+ans
+<span class="co"># [1] 141814</span></code></pre></div>
+</div>
+<div id="whats-happening-here" class="section level4 bs-callout bs-callout-info">
+<h4>What’s happening here?</h4>
+<ul>
+<li><em>data.table</em>’s <code>j</code> can handle more than just <em>selecting columns</em> - it can handle <em>expressions</em>, i.e., <em>compute on columns</em>. This shouldn’t be surprising, as columns can be referred to as if they are variables. Then we should be able to <em>compute</em> by calling functions on those variables. And that’s precisely what happens here.</li>
+</ul>
+</div>
+</div>
+<div id="f-subset-in-i-and-do-in-j" class="section level3">
+<h3>f) Subset in <code>i</code> <em>and</em> do in <code>j</code></h3>
+<div id="calculate-the-average-arrival-and-departure-delay-for-all-flights-with-jfk-as-the-origin-airport-in-the-month-of-june." class="section level4">
+<h4>– Calculate the average arrival and departure delay for all flights with “JFK” as the origin airport in the month of June.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L,
+               .(<span class="dt">m_arr =</span> <span class="kw">mean</span>(arr_delay), <span class="dt">m_dep =</span> <span class="kw">mean</span>(dep_delay))]
+ans
+<span class="co">#       m_arr    m_dep</span>
+<span class="co"># 1: 5.839349 9.807884</span></code></pre></div>
+</div>
+<div id="section-9" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>We first subset in <code>i</code> to find matching <em>row indices</em> where <code>origin</code> airport equals <em>“JFK”</em>, and <code>month</code> equals <em>6</em>. At this point, we <em>do not</em> subset the entire <em>data.table</em> corresponding to those rows.</p></li>
+<li><p>Now, we look at <code>j</code> and find that it uses only <em>two columns</em>. And what we have to do is to compute their <code>mean()</code>. Therefore we subset just those columns corresponding to the matching rows, and compute their <code>mean()</code>.</p></li>
+</ul>
+<p>Because the three main components of the query (<code>i</code>, <code>j</code> and <code>by</code>) are <em>together</em> inside <code>[...]</code>, <em>data.table</em> can see all three and optimise the query as a whole <em>before evaluation</em>, rather than each part separately. We are therefore able to avoid subsetting the entire table, for both speed and memory efficiency.</p>
+</div>
+<div id="how-many-trips-have-been-made-in-2014-from-jfk-airport-in-the-month-of-june" class="section level4">
+<h4>– How many trips have been made in 2014 from “JFK” airport in the month of June?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L, <span class="kw">length</span>(dest)]
+ans
+<span class="co"># [1] 8422</span></code></pre></div>
+<p>The function <code>length()</code> requires an input argument. We just needed to compute the number of rows in the subset. We could have used any other column as the input argument to <code>length()</code>.</p>
+<p>This type of operation occurs so frequently, especially while grouping as we will see in the next section, that <em>data.table</em> provides a <em>special symbol</em> <code>.N</code> for it.</p>
+</div>
+<div id="special-N" class="section level4 bs-callout bs-callout-info">
+<h4>Special symbol <code>.N</code>:</h4>
+<p><code>.N</code> is a special in-built variable that holds the number of observations in the current group. It is particularly useful when combined with <code>by</code> as we’ll see in the next section. In the absence of group by operations, it simply returns the number of rows in the subset.</p>
+</div>
+</div>
+</div>
+<div id="section-10" class="section level1">
+<h1></h1>
+<p>So we can now accomplish the same task by using <code>.N</code> as follows:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[origin ==<span class="st"> "JFK"</span> &<span class="st"> </span>month ==<span class="st"> </span>6L, .N]
+ans
+<span class="co"># [1] 8422</span></code></pre></div>
+<div id="section-11" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Once again, we subset in <code>i</code> to get the <em>row indices</em> where <code>origin</code> airport equals <em>“JFK”</em>, and <code>month</code> equals <em>6</em>.</p></li>
+<li><p>We see that <code>j</code> uses only <code>.N</code> and no other columns. Therefore the entire subset is not materialised. We simply return the number of rows in the subset (which is just the length of row indices).</p></li>
+<li><p>Note that we did not wrap <code>.N</code> with <code>list()</code> or <code>.()</code>. Therefore a vector is returned.</p></li>
+</ul>
+<p>We could have accomplished the same operation by doing <code>nrow(flights[origin == "JFK" & month == 6L])</code>. However, it would have to subset the entire <em>data.table</em> first, corresponding to the <em>row indices</em> in <code>i</code>, <em>and then</em> count the rows using <code>nrow()</code>, which is unnecessary and inefficient. We will cover this and other optimisation aspects in detail under the <em>data.table design</em> vignette.</p>
+</div>
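+<p>To see the difference a wrapping <code>.()</code> makes, here is a small sketch on a toy <em>data.table</em> (a hypothetical example, not the flights data):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT2 = data.table(x = 1:5)
+
+## .N alone returns a plain length-1 integer vector
+DT2[x > 2L, .N]
+# [1] 3
+
+## wrapping it in .() returns a one-row, one-column data.table instead
+DT2[x > 2L, .(N = .N)]
+#    N
+# 1: 3</code></pre></div>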
+<div id="g-great-but-how-can-i-refer-to-columns-by-names-in-j-like-in-a-data.frame" class="section level3">
+<h3>g) Great! But how can I refer to columns by names in <code>j</code> (like in a <em>data.frame</em>)?</h3>
+<p>You can refer to column names the <em>data.frame</em> way using <code>with = FALSE</code>.</p>
+<div id="select-both-arr_delay-and-dep_delay-columns-the-data.frame-way." class="section level4">
+<h4>– Select both <code>arr_delay</code> and <code>dep_delay</code> columns the <em>data.frame</em> way.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>), with =<span class="st"> </span><span class="ot">FALSE</span>]
+<span class="kw">head</span>(ans)
+<span class="co">#    arr_delay dep_delay</span>
+<span class="co"># 1:        13        14</span>
+<span class="co"># 2:        13        -3</span>
+<span class="co"># 3:         9         2</span>
+<span class="co"># 4:       -26        -8</span>
+<span class="co"># 5:         1         2</span>
+<span class="co"># 6:         0         4</span></code></pre></div>
+<p>The argument is named <code>with</code> after the R function <code>with()</code> because of similar functionality. Suppose you have a <em>data.frame</em> <code>DF</code> and you would like to subset all rows where <code>x > 1</code>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">x =</span> <span class="kw">c</span>(<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">1</span>,<span class="dv">2</span>,<span class="dv">2</span>,<span class="dv">3</span>,<span class="dv">3</span>,<span class="dv">3</span>), <span class="dt">y =</span> <span class="dv">1</span>:<span class="dv">8</span>)
+
+## (1) normal way
+DF[DF$x ><span class="st"> </span><span class="dv">1</span>, ] <span class="co"># data.frame needs that ',' as well</span>
+<span class="co">#   x y</span>
+<span class="co"># 4 2 4</span>
+<span class="co"># 5 2 5</span>
+<span class="co"># 6 3 6</span>
+<span class="co"># 7 3 7</span>
+<span class="co"># 8 3 8</span>
+
+## (2) using with
+DF[<span class="kw">with</span>(DF, x ><span class="st"> </span><span class="dv">1</span>), ]
+<span class="co">#   x y</span>
+<span class="co"># 4 2 4</span>
+<span class="co"># 5 2 5</span>
+<span class="co"># 6 3 6</span>
+<span class="co"># 7 3 7</span>
+<span class="co"># 8 3 8</span></code></pre></div>
+</div>
+<div id="with_false" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Using <code>with()</code> in (2) allows using <code>DF</code>’s column <code>x</code> as if it were a variable.</p>
+<p>Hence the argument name <code>with</code> in <em>data.table</em>. Setting <code>with = FALSE</code> disables the ability to refer to columns as if they are variables, thereby restoring the “<em>data.frame</em> mode”.</p></li>
+<li><p>We can also <em>deselect</em> columns using <code>-</code> or <code>!</code>. For example:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+
+<span class="co"># returns all columns except arr_delay and dep_delay</span>
+ans <-<span class="st"> </span>flights[, !<span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>), with =<span class="st"> </span><span class="ot">FALSE</span>]
+<span class="co"># or</span>
+ans <-<span class="st"> </span>flights[, -<span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>), with =<span class="st"> </span><span class="ot">FALSE</span>]</code></pre></div></li>
+<li><p>From <code>v1.9.5+</code>, we can also select by specifying start and end column names, e.g., <code>year:day</code> to select the first three columns.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+
+<span class="co"># returns year,month and day</span>
+ans <-<span class="st"> </span>flights[, year:day, with =<span class="st"> </span><span class="ot">FALSE</span>]
+<span class="co"># returns day, month and year</span>
+ans <-<span class="st"> </span>flights[, day:year, with =<span class="st"> </span><span class="ot">FALSE</span>]
+<span class="co"># returns all columns except year, month and day</span>
+ans <-<span class="st"> </span>flights[, -(year:day), with =<span class="st"> </span><span class="ot">FALSE</span>]
+ans <-<span class="st"> </span>flights[, !(year:day), with =<span class="st"> </span><span class="ot">FALSE</span>]</code></pre></div>
+<p>This is particularly handy while working interactively.</p></li>
+</ul>
+</div>
+</div>
+</div>
+<div id="section-12" class="section level1">
+<h1></h1>
+<p><code>with = TRUE</code> is default in <em>data.table</em> because we can do much more by allowing <code>j</code> to handle expressions - especially when combined with <code>by</code> as we’ll see in a moment.</p>
+<div id="aggregations" class="section level2">
+<h2>2. Aggregations</h2>
+<p>We’ve already seen <code>i</code> and <code>j</code> from <em>data.table</em>’s general form in the previous section. In this section, we’ll see how they can be combined together with <code>by</code> to perform operations <em>by group</em>. Let’s look at some examples.</p>
+<div id="a-grouping-using-by" class="section level3">
+<h3>a) Grouping using <code>by</code></h3>
+<div id="how-can-we-get-the-number-of-trips-corresponding-to-each-origin-airport" class="section level4">
+<h4>– How can we get the number of trips corresponding to each origin airport?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .(.N), by =<span class="st"> </span>.(origin)]
+ans
+<span class="co">#    origin     N</span>
+<span class="co"># 1:    JFK 81483</span>
+<span class="co"># 2:    LGA 84433</span>
+<span class="co"># 3:    EWR 87400</span>
+
+## or equivalently using a character vector in 'by'
+<span class="co"># ans <- flights[, .(.N), by = "origin"]</span></code></pre></div>
+</div>
+<div id="section-13" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>We know <code>.N</code> <a href="#special-N">is a special variable</a> that holds the number of rows in the current group. Grouping by <code>origin</code> obtains the number of rows, <code>.N</code>, for each group.</p></li>
+<li><p>By doing <code>head(flights)</code> you can see that the origin airports occur in the order <em>“JFK”</em>, <em>“LGA”</em> and <em>“EWR”</em>. The original order of grouping variables is preserved in the result.</p></li>
+<li><p>Since we did not provide a name for the column returned in <code>j</code>, it was named <code>N</code> automatically by recognising the special symbol <code>.N</code>.</p></li>
+<li><p><code>by</code> also accepts a character vector of column names. This is particularly useful for programming, e.g., designing a function that takes the columns to group by as a function argument.</p></li>
+<li><p>When there’s only one column or expression to refer to in <code>j</code> and <code>by</code>, we can drop the <code>.()</code> notation. This is purely for convenience. We could instead do:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .N, by =<span class="st"> </span>origin]
+ans
+<span class="co">#    origin     N</span>
+<span class="co"># 1:    JFK 81483</span>
+<span class="co"># 2:    LGA 84433</span>
+<span class="co"># 3:    EWR 87400</span></code></pre></div>
+<p>We’ll use this convenient form wherever applicable hereafter.</p></li>
+</ul>
+</div>
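+<p>Since <code>by</code> accepts a character vector, the grouping columns can be passed around as ordinary strings. A minimal sketch of such a function on a toy <em>data.table</em> (the helper name <code>count_by</code> is hypothetical, not part of <em>data.table</em>):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+
+## hypothetical helper: count rows per group, with the grouping
+## columns supplied as a character vector argument
+count_by = function(x, cols) {
+  x[, .N, by = cols]
+}
+
+DT3 = data.table(origin = c("JFK", "LGA", "JFK", "EWR"))
+count_by(DT3, "origin")
+#    origin N
+# 1:    JFK 2
+# 2:    LGA 1
+# 3:    EWR 1</code></pre></div>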
+</div>
+</div>
+</div>
+<div id="section-14" class="section level1">
+<h1></h1>
+<div id="origin-.N" class="section level4">
+<h4>– How can we calculate the number of trips for each origin airport for carrier code <em>“AA”</em>?</h4>
+<p>The unique carrier code <em>“AA”</em> corresponds to <em>American Airlines Inc.</em></p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by =<span class="st"> </span>origin]
+ans
+<span class="co">#    origin     N</span>
+<span class="co"># 1:    JFK 11923</span>
+<span class="co"># 2:    LGA 11730</span>
+<span class="co"># 3:    EWR  2649</span></code></pre></div>
+</div>
+<div id="section-15" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>We first obtain the row indices for the expression <code>carrier == "AA"</code> from <code>i</code>.</p></li>
+<li><p>Using those <em>row indices</em>, we obtain the number of rows while grouped by <code>origin</code>. Once again, no columns are materialised here, because the <code>j-expression</code> does not require any columns to be subsetted, and is therefore fast and memory efficient.</p></li>
+</ul>
+</div>
+<div id="origin-dest-.N" class="section level4">
+<h4>– How can we get the total number of trips for each <code>origin, dest</code> pair for carrier code <em>“AA”</em>?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by =<span class="st"> </span>.(origin,dest)]
+<span class="kw">head</span>(ans)
+<span class="co">#    origin dest    N</span>
+<span class="co"># 1:    JFK  LAX 3387</span>
+<span class="co"># 2:    LGA  PBI  245</span>
+<span class="co"># 3:    EWR  LAX   62</span>
+<span class="co"># 4:    JFK  MIA 1876</span>
+<span class="co"># 5:    JFK  SEA  298</span>
+<span class="co"># 6:    EWR  MIA  848</span>
+
+## or equivalently using a character vector in 'by'
+<span class="co"># ans <- flights[carrier == "AA", .N, by = c("origin", "dest")]</span></code></pre></div>
+</div>
+<div id="section-16" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><code>by</code> accepts multiple columns. We simply provide all the columns to group by.</li>
+</ul>
+</div>
+<div id="origin-dest-month" class="section level4">
+<h4>– How can we get the average arrival and departure delay for each <code>origin, dest</code> pair for each month for carrier code <em>“AA”</em>?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>,
+        .(<span class="kw">mean</span>(arr_delay), <span class="kw">mean</span>(dep_delay)),
+        by =<span class="st"> </span>.(origin, dest, month)]
+ans
+<span class="co">#      origin dest month         V1         V2</span>
+<span class="co">#   1:    JFK  LAX     1   6.590361 14.2289157</span>
+<span class="co">#   2:    LGA  PBI     1  -7.758621  0.3103448</span>
+<span class="co">#   3:    EWR  LAX     1   1.366667  7.5000000</span>
+<span class="co">#   4:    JFK  MIA     1  15.720670 18.7430168</span>
+<span class="co">#   5:    JFK  SEA     1  14.357143 30.7500000</span>
+<span class="co">#  ---                                        </span>
+<span class="co"># 196:    LGA  MIA    10  -6.251799 -1.4208633</span>
+<span class="co"># 197:    JFK  MIA    10  -1.880184  6.6774194</span>
+<span class="co"># 198:    EWR  PHX    10  -3.032258 -4.2903226</span>
+<span class="co"># 199:    JFK  MCO    10 -10.048387 -1.6129032</span>
+<span class="co"># 200:    JFK  DCA    10  16.483871 15.5161290</span></code></pre></div>
+</div>
+<div id="section-17" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Since we did not provide column names for the expressions in <code>j</code>, they were automatically generated as <code>V1</code> and <code>V2</code>.</p></li>
+<li><p>Once again, note that the input order of grouping columns is preserved in the result.</p></li>
+</ul>
+</div>
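+<p>Those default names can be overridden by naming the expressions in <code>.()</code>. A small runnable sketch on a toy <em>data.table</em> (a hypothetical example, not the flights data):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT4 = data.table(grp = c("a", "a", "b"), x = c(1, 3, 5))
+
+## name the aggregates directly instead of getting V1, V2
+DT4[, .(mean_x = mean(x), max_x = max(x)), by = grp]
+#    grp mean_x max_x
+# 1:   a      2     3
+# 2:   b      5     5</code></pre></div>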
+</div>
+<div id="section-18" class="section level1">
+<h1></h1>
+<p>Now what if we would like to order the result by those grouping columns <code>origin</code>, <code>dest</code> and <code>month</code>?</p>
+<div id="b-keyby" class="section level3">
+<h3>b) keyby</h3>
+<p><em>data.table</em> retaining the original order of groups is intentional and by design. There are cases when preserving the original order is essential. But at times we would like to automatically sort by the variables we grouped by.</p>
+<div id="so-how-can-we-directly-order-by-all-the-grouping-variables" class="section level4">
+<h4>– So how can we directly order by all the grouping variables?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>,
+        .(<span class="kw">mean</span>(arr_delay), <span class="kw">mean</span>(dep_delay)),
+        keyby =<span class="st"> </span>.(origin, dest, month)]
+ans
+<span class="co">#      origin dest month         V1         V2</span>
+<span class="co">#   1:    EWR  DFW     1   6.427673 10.0125786</span>
+<span class="co">#   2:    EWR  DFW     2  10.536765 11.3455882</span>
+<span class="co">#   3:    EWR  DFW     3  12.865031  8.0797546</span>
+<span class="co">#   4:    EWR  DFW     4  17.792683 12.9207317</span>
+<span class="co">#   5:    EWR  DFW     5  18.487805 18.6829268</span>
+<span class="co">#  ---                                        </span>
+<span class="co"># 196:    LGA  PBI     1  -7.758621  0.3103448</span>
+<span class="co"># 197:    LGA  PBI     2  -7.865385  2.4038462</span>
+<span class="co"># 198:    LGA  PBI     3  -5.754098  3.0327869</span>
+<span class="co"># 199:    LGA  PBI     4 -13.966667 -4.7333333</span>
+<span class="co"># 200:    LGA  PBI     5 -10.357143 -6.8571429</span></code></pre></div>
+</div>
+<div id="section-19" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li>All we did was to change <code>by</code> to <code>keyby</code>. This automatically orders the result by the grouping variables in increasing order. Note that <code>keyby</code> is applied after performing the operation, i.e., on the computed result.</li>
+</ul>
+<p><strong>Keys:</strong> Actually <code>keyby</code> does a little more than <em>just ordering</em>. It also <em>sets a key</em> after ordering by setting an <em>attribute</em> called <code>sorted</code>. But we’ll learn more about <code>keys</code> in the next vignette.</p>
+<p>For now, all you have to know is that you can use <code>keyby</code> to automatically order the result by the columns specified in <code>by</code>.</p>
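+<p>The effect on the key can be checked with <code>key()</code>, the <em>data.table</em> function that retrieves it. A small sketch on a toy <em>data.table</em>:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT5 = data.table(grp = c("b", "a", "b"), x = 1:3)
+
+key(DT5[, sum(x), by = grp])    ## NULL: no key set, original group order kept
+# NULL
+key(DT5[, sum(x), keyby = grp]) ## "grp": result is sorted and keyed by grp
+# [1] "grp"</code></pre></div>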
+</div>
+</div>
+<div id="c-chaining" class="section level3">
+<h3>c) Chaining</h3>
+<p>Let’s reconsider the task of <a href="#origin-dest-.N">getting the total number of trips for each <code>origin, dest</code> pair for carrier <em>“AA”</em></a>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by =<span class="st"> </span>.(origin, dest)]</code></pre></div>
+<div id="how-can-we-order-ans-using-the-columns-origin-in-ascending-order-and-dest-in-descending-order" class="section level4">
+<h4>– How can we order <code>ans</code> using the columns <code>origin</code> in ascending order, and <code>dest</code> in descending order?</h4>
+<p>We can store the intermediate result in a variable, and then use <code>order(origin, -dest)</code> on that variable. It seems fairly straightforward.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>ans[<span class="kw">order</span>(origin, -dest)]
+<span class="kw">head</span>(ans)
+<span class="co">#    origin dest    N</span>
+<span class="co"># 1:    EWR  PHX  121</span>
+<span class="co"># 2:    EWR  MIA  848</span>
+<span class="co"># 3:    EWR  LAX   62</span>
+<span class="co"># 4:    EWR  DFW 1618</span>
+<span class="co"># 5:    JFK  STT  229</span>
+<span class="co"># 6:    JFK  SJU  690</span></code></pre></div>
+</div>
+<div id="section-20" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Recall that we can use <code>-</code> on a <em>character</em> column in <code>order()</code> within the frame of a <em>data.table</em>. This is possible due to <em>data.table</em>’s internal query optimisation.</p></li>
+<li><p>Also recall that <code>order(...)</code> within the frame of a <em>data.table</em> is <em>automatically optimised</em> to use <em>data.table</em>’s internal fast radix order <code>forder()</code> for speed. So you can keep using the already <em>familiar</em> base R functions without compromising the speed or memory efficiency that <em>data.table</em> offers. We will cover this in more detail in the <em>data.table internals</em> vignette.</p></li>
+</ul>
+</div>
+</div>
+</div>
+<div id="section-21" class="section level1">
+<h1></h1>
+<p>But this requires having to assign the intermediate result and then overwrite it. We can do one better and avoid the intermediate assignment to a variable altogether by <code>chaining</code> expressions.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[carrier ==<span class="st"> "AA"</span>, .N, by =<span class="st"> </span>.(origin, dest)][<span class="kw">order</span>(origin, -dest)]
+<span class="kw">head</span>(ans, <span class="dv">10</span>)
+<span class="co">#     origin dest    N</span>
+<span class="co">#  1:    EWR  PHX  121</span>
+<span class="co">#  2:    EWR  MIA  848</span>
+<span class="co">#  3:    EWR  LAX   62</span>
+<span class="co">#  4:    EWR  DFW 1618</span>
+<span class="co">#  5:    JFK  STT  229</span>
+<span class="co">#  6:    JFK  SJU  690</span>
+<span class="co">#  7:    JFK  SFO 1312</span>
+<span class="co">#  8:    JFK  SEA  298</span>
+<span class="co">#  9:    JFK  SAN  299</span>
+<span class="co"># 10:    JFK  ORD  432</span></code></pre></div>
+<div id="section-22" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>We can tack expressions one after another, <em>forming a chain</em> of operations, i.e., <code>DT[ ... ][ ... ][ ... ]</code>.</p></li>
+<li><p>Or you can also chain them vertically:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[ ...
+ ][ ...
+ ][ ...
+ ]</code></pre></div></li>
+</ul>
+</div>
+<div id="d-expressions-in-by" class="section level3">
+<h3>d) Expressions in <code>by</code></h3>
+<div id="can-by-accept-expressions-as-well-or-just-take-columns" class="section level4">
+<h4>– Can <code>by</code> accept <em>expressions</em> as well or just take columns?</h4>
+<p>Yes, it does. For example, suppose we would like to find out how many flights started late but arrived early (or on time), started and arrived late, etc.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, .N, .(dep_delay><span class="dv">0</span>, arr_delay><span class="dv">0</span>)]
+ans
+<span class="co">#    dep_delay arr_delay      N</span>
+<span class="co"># 1:      TRUE      TRUE  72836</span>
+<span class="co"># 2:     FALSE      TRUE  34583</span>
+<span class="co"># 3:     FALSE     FALSE 119304</span>
+<span class="co"># 4:      TRUE     FALSE  26593</span></code></pre></div>
+</div>
+<div id="section-23" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>The last row corresponds to <code>dep_delay > 0 = TRUE</code> and <code>arr_delay > 0 = FALSE</code>. We can see that 26593 flights started late but arrived early (or on time).</p></li>
+<li><p>Note that we did not provide any names for the <code>by-expression</code>; names were automatically assigned in the result.</p></li>
+<li><p>You can provide other columns along with expressions, for example: <code>DT[, .N, by = .(a, b>0)]</code>.</p></li>
+</ul>
+</div>
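+<p>The last point can be sketched on a toy <em>data.table</em> (a hypothetical example, not the flights data):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">library(data.table)
+DT6 = data.table(a = c(1, 1, 2), b = c(-1, 2, 3))
+
+## mix a plain column and an expression in 'by'
+DT6[, .N, by = .(a, b > 0)]
+#    a     b N
+# 1: 1 FALSE 1
+# 2: 1  TRUE 1
+# 3: 2  TRUE 1</code></pre></div>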
+</div>
+<div id="e-multiple-columns-in-j---.sd" class="section level3">
+<h3>e) Multiple columns in <code>j</code> - <code>.SD</code></h3>
+<div id="do-we-have-to-compute-mean-for-each-column-individually" class="section level4">
+<h4>– Do we have to compute <code>mean()</code> for each column individually?</h4>
+<p>It is of course not practical to have to type <code>mean(myCol)</code> for every column one by one. What if you had 100 columns to compute the <code>mean()</code> of?</p>
+<p>How can we do this efficiently? To get there, refresh on <a href="#tip-1">this tip</a> - <em>“As long as the j-expression returns a list, each element of the list will be converted to a column in the resulting data.table”</em>. If we could refer to the <em>data subset for each group</em> as a variable <em>while grouping</em>, then we could loop through all the columns of that variable using the already familiar base function <code>lapply()</code>, without having to learn any new function.</p>
+</div>
+<div id="special-SD" class="section level4 bs-callout bs-callout-info">
+<h4>Special symbol <code>.SD</code>:</h4>
+<p><em>data.table</em> provides a <em>special</em> symbol, called <code>.SD</code>. It stands for <strong>S</strong>ubset of <strong>D</strong>ata. It by itself is a <em>data.table</em> that holds the data for <em>the current group</em> defined using <code>by</code>.</p>
+<p>Recall that a <em>data.table</em> is internally a list as well with all its columns of equal length.</p>
+</div>
+</div>
+</div>
+<div id="section-24" class="section level1">
+<h1></h1>
+<p>Let’s use the <a href="#what-is-datatable-1a"><em>data.table</em> <code>DT</code> from before</a> to get a glimpse of what <code>.SD</code> looks like.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT
+<span class="co">#    ID a  b  c</span>
+<span class="co"># 1:  b 1  7 13</span>
+<span class="co"># 2:  b 2  8 14</span>
+<span class="co"># 3:  b 3  9 15</span>
+<span class="co"># 4:  a 4 10 16</span>
+<span class="co"># 5:  a 5 11 17</span>
+<span class="co"># 6:  c 6 12 18</span>
+
+DT[, <span class="kw">print</span>(.SD), by =<span class="st"> </span>ID]
+<span class="co">#    a b  c</span>
+<span class="co"># 1: 1 7 13</span>
+<span class="co"># 2: 2 8 14</span>
+<span class="co"># 3: 3 9 15</span>
+<span class="co">#    a  b  c</span>
+<span class="co"># 1: 4 10 16</span>
+<span class="co"># 2: 5 11 17</span>
+<span class="co">#    a  b  c</span>
+<span class="co"># 1: 6 12 18</span>
+<span class="co"># Empty data.table (0 rows) of 1 col: ID</span></code></pre></div>
+<div id="section-25" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p><code>.SD</code> contains all the columns <em>except the grouping columns</em> by default.</p></li>
+<li><p>It is also generated by preserving the original order - data corresponding to <code>ID = "b"</code>, then <code>ID = "a"</code>, and then <code>ID = "c"</code>.</p></li>
+</ul>
+</div>
+</div>
+<div id="section-26" class="section level1">
+<h1></h1>
+<p>To compute on (multiple) columns, we can then simply use the base R function <code>lapply()</code>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, <span class="kw">lapply</span>(.SD, mean), by =<span class="st"> </span>ID]
+<span class="co">#    ID   a    b    c</span>
+<span class="co"># 1:  b 2.0  8.0 14.0</span>
+<span class="co"># 2:  a 4.5 10.5 16.5</span>
+<span class="co"># 3:  c 6.0 12.0 18.0</span></code></pre></div>
+<div id="section-27" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p><code>.SD</code> holds the rows corresponding to columns <em>a</em>, <em>b</em> and <em>c</em> for that group. We compute the <code>mean()</code> on each of these columns using the already familiar base function <code>lapply()</code>.</p></li>
+<li><p>For each group, <code>lapply()</code> returns a list of three elements containing the mean values, which become the columns of the resulting <code>data.table</code>.</p></li>
+<li><p>Since <code>lapply()</code> returns a <em>list</em>, there is no need to wrap it with an additional <code>.()</code> (if necessary, refer to <a href="#tip-1">this tip</a>).</p></li>
+</ul>
+</div>
+</div>
+<div id="section-28" class="section level1">
+<h1></h1>
+<p>We are almost there. There is one little thing left to address. In our <code>flights</code> <em>data.table</em>, we only wanted to calculate the <code>mean()</code> of two columns <code>arr_delay</code> and <code>dep_delay</code>. But <code>.SD</code> would contain all the columns other than the grouping variables by default.</p>
+<div id="how-can-we-specify-just-the-columns-we-would-like-to-compute-the-mean-on" class="section level4">
+<h4>– How can we specify just the columns we would like to compute the <code>mean()</code> on?</h4>
+</div>
+<div id="sdcols" class="section level4 bs-callout bs-callout-info">
+<h4>.SDcols</h4>
+<p>Using the argument <code>.SDcols</code>. It accepts either column names or column indices. For example, <code>.SDcols = c("arr_delay", "dep_delay")</code> ensures that <code>.SD</code> contains only these two columns for each group.</p>
+<p>Similar to the <a href="#with_false">with = FALSE section</a>, you can also provide the columns to remove instead of the columns to keep using the <code>-</code> or <code>!</code> sign, as well as select consecutive columns as <code>colA:colB</code> and deselect consecutive columns as <code>!(colA:colB)</code> or <code>-(colA:colB)</code>.</p>
+</div>
+</div>
+<div id="section-29" class="section level1">
+<h1></h1>
+<p>Now let us try to use <code>.SD</code> along with <code>.SDcols</code> to get the <code>mean()</code> of <code>arr_delay</code> and <code>dep_delay</code> columns grouped by <code>origin</code>, <code>dest</code> and <code>month</code>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[carrier ==<span class="st"> "AA"</span>,                       ## Only on trips with carrier "AA"
+        <span class="kw">lapply</span>(.SD, mean),                     ## compute the mean
+        by =<span class="st"> </span>.(origin, dest, month),           ## for every 'origin,dest,month'
+        .SDcols =<span class="st"> </span><span class="kw">c</span>(<span class="st">"arr_delay"</span>, <span class="st">"dep_delay"</span>)] ## for just those specified in .SDcols
+<span class="co">#      origin dest month  arr_delay  dep_delay</span>
+<span class="co">#   1:    JFK  LAX     1   6.590361 14.2289157</span>
+<span class="co">#   2:    LGA  PBI     1  -7.758621  0.3103448</span>
+<span class="co">#   3:    EWR  LAX     1   1.366667  7.5000000</span>
+<span class="co">#   4:    JFK  MIA     1  15.720670 18.7430168</span>
+<span class="co">#   5:    JFK  SEA     1  14.357143 30.7500000</span>
+<span class="co">#  ---                                        </span>
+<span class="co"># 196:    LGA  MIA    10  -6.251799 -1.4208633</span>
+<span class="co"># 197:    JFK  MIA    10  -1.880184  6.6774194</span>
+<span class="co"># 198:    EWR  PHX    10  -3.032258 -4.2903226</span>
+<span class="co"># 199:    JFK  MCO    10 -10.048387 -1.6129032</span>
+<span class="co"># 200:    JFK  DCA    10  16.483871 15.5161290</span></code></pre></div>
+<div id="f-subset-.sd-for-each-group" class="section level3">
+<h3>f) Subset <code>.SD</code> for each group:</h3>
+<div id="how-can-we-return-the-first-two-rows-for-each-month" class="section level4">
+<h4>– How can we return the first two rows for each <code>month</code>?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[, <span class="kw">head</span>(.SD, <span class="dv">2</span>), by =<span class="st"> </span>month]
+<span class="kw">head</span>(ans)
+<span class="co">#    month year day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1:     1 2014   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co"># 2:     1 2014   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co"># 3:     2 2014   1        -1         1      AA    JFK  LAX      358     2475    8</span>
+<span class="co"># 4:     2 2014   1        -5         3      AA    JFK  LAX      358     2475   11</span>
+<span class="co"># 5:     3 2014   1       -11        36      AA    JFK  LAX      375     2475    8</span>
+<span class="co"># 6:     3 2014   1        -3        14      AA    JFK  LAX      368     2475   11</span></code></pre></div>
+</div>
+<div id="section-30" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p><code>.SD</code> is a <em>data.table</em> that holds all the rows for <em>that group</em>. We simply subset the first two rows as we have seen <a href="#subset-rows-integer">here</a> already.</p></li>
+<li><p>For each group, <code>head(.SD, 2)</code> returns the first two rows as a <em>data.table</em> which is also a list. So we do not have to wrap it with <code>.()</code>.</p></li>
+</ul>
+</div>
+</div>
+<div id="g-why-keep-j-so-flexible" class="section level3">
+<h3>g) Why keep <code>j</code> so flexible?</h3>
+<p>So that we have a consistent syntax and keep using already existing (and familiar) base functions instead of learning new functions. To illustrate, let us use the <em>data.table</em> <code>DT</code> we created at the very beginning, in the <a href="#what-is-datatable-1a">What is a data.table?</a> section.</p>
+<div id="how-can-we-concatenate-columns-a-and-b-for-each-group-in-id" class="section level4">
+<h4>– How can we concatenate columns <code>a</code> and <code>b</code> for each group in <code>ID</code>?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, .(<span class="dt">val =</span> <span class="kw">c</span>(a,b)), by =<span class="st"> </span>ID]
+<span class="co">#     ID val</span>
+<span class="co">#  1:  b   1</span>
+<span class="co">#  2:  b   2</span>
+<span class="co">#  3:  b   3</span>
+<span class="co">#  4:  b   7</span>
+<span class="co">#  5:  b   8</span>
+<span class="co">#  6:  b   9</span>
+<span class="co">#  7:  a   4</span>
+<span class="co">#  8:  a   5</span>
+<span class="co">#  9:  a  10</span>
+<span class="co"># 10:  a  11</span>
+<span class="co"># 11:  c   6</span>
+<span class="co"># 12:  c  12</span></code></pre></div>
+</div>
+<div id="section-31" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li>That’s it. There is no special syntax required. All we need to know is the base function <code>c()</code> which concatenates vectors and <a href="#tip-1">the tip from before</a>.</li>
+</ul>
+</div>
+<div id="what-if-we-would-like-to-have-all-the-values-of-column-a-and-b-concatenated-but-returned-as-a-list-column" class="section level4">
+<h4>– What if we would like to have all the values of column <code>a</code> and <code>b</code> concatenated, but returned as a list column?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, .(<span class="dt">val =</span> <span class="kw">list</span>(<span class="kw">c</span>(a,b))), by =<span class="st"> </span>ID]
+<span class="co">#    ID         val</span>
+<span class="co"># 1:  b 1,2,3,7,8,9</span>
+<span class="co"># 2:  a  4, 5,10,11</span>
+<span class="co"># 3:  c        6,12</span></code></pre></div>
+</div>
+<div id="section-32" class="section level4 bs-callout bs-callout-info">
+<h4></h4>
+<ul>
+<li><p>Here, we first concatenate the values with <code>c(a,b)</code> for each group, and wrap that with <code>list()</code>. So for each group, we return a list of all concatenated values.</p></li>
+<li><p>Note those commas are for display only. A list column can contain any object in each cell, and in this example, each cell is itself a vector and some cells contain longer vectors than others.</p></li>
+</ul>
+</div>
+</div>
+</div>
+<div id="section-33" class="section level1">
+<h1></h1>
+<p>Once you start internalising usage in <code>j</code>, you will realise how powerful the syntax can be. A very useful way to understand it is by playing around, with the help of <code>print()</code>.</p>
+<p>For example:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## (1) look at the difference between
+DT[, <span class="kw">print</span>(<span class="kw">c</span>(a,b)), by =<span class="st"> </span>ID]
+<span class="co"># [1] 1 2 3 7 8 9</span>
+<span class="co"># [1]  4  5 10 11</span>
+<span class="co"># [1]  6 12</span>
+<span class="co"># Empty data.table (0 rows) of 1 col: ID</span>
+
+## (2) and
+DT[, <span class="kw">print</span>(<span class="kw">list</span>(<span class="kw">c</span>(a,b))), by =<span class="st"> </span>ID]
+<span class="co"># [[1]]</span>
+<span class="co"># [1] 1 2 3 7 8 9</span>
+<span class="co"># </span>
+<span class="co"># [[1]]</span>
+<span class="co"># [1]  4  5 10 11</span>
+<span class="co"># </span>
+<span class="co"># [[1]]</span>
+<span class="co"># [1]  6 12</span>
+<span class="co"># Empty data.table (0 rows) of 1 col: ID</span></code></pre></div>
+<p>In (1), each group returns a vector, of lengths 6, 4 and 2 here. In (2), each group returns a list of length 1, whose single element holds a vector of length 6, 4 or 2. Therefore (1) results in a total length of <code>6+4+2 = 12</code>, whereas (2) returns <code>1+1+1 = 3</code>.</p>
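The length difference can be checked directly on a small table. A minimal sketch (the toy `DT` below is constructed here for illustration and is not the vignette's own data):

```r
library(data.table)
DT <- data.table(ID = c("b", "b", "a", "c", "a", "b"), a = 1:6, b = 7:12)
ans1 <- DT[, .(val = c(a, b)), by = ID]        # one row per concatenated value
ans2 <- DT[, .(val = list(c(a, b))), by = ID]  # one list cell per group
nrow(ans1)  # 12, i.e. 6 + 4 + 2
nrow(ans2)  # 3, i.e. 1 + 1 + 1
```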
+<div id="summary" class="section level2">
+<h2>Summary</h2>
+<p>The general form of <em>data.table</em> syntax is:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[i, j, by]</code></pre></div>
+<p>We have seen so far that,</p>
+<div id="using-i" class="section level4 bs-callout bs-callout-info">
+<h4>Using <code>i</code>:</h4>
+<ul>
+<li><p>We can subset rows similarly to a <em>data.frame</em>, except that you don’t have to use <code>DT$</code> repetitively, since columns within the frame of a <em>data.table</em> are seen as if they were <em>variables</em>.</p></li>
+<li><p>We can also sort a <em>data.table</em> using <code>order()</code>, which internally uses <em>data.table</em>’s fast order for performance.</p></li>
+</ul>
+<p>We can do much more in <code>i</code> by keying a <em>data.table</em>, which allows blazing fast subsets and joins. We will see this in the <em>“Keys and fast binary search based subsets”</em> and <em>“Joins and rolling joins”</em> vignettes.</p>
+</div>
+<div id="using-j" class="section level4 bs-callout bs-callout-info">
+<h4>Using <code>j</code>:</h4>
+<ol style="list-style-type: decimal">
+<li><p>Select columns the <em>data.table</em> way: <code>DT[, .(colA, colB)]</code>.</p></li>
+<li><p>Select columns the <em>data.frame</em> way: <code>DT[, c("colA", "colB"), with = FALSE]</code>.</p></li>
+<li><p>Compute on columns: <code>DT[, .(sum(colA), mean(colB))]</code>.</p></li>
+<li><p>Provide names if necessary: <code>DT[, .(sA = sum(colA), mB = mean(colB))]</code>.</p></li>
+<li><p>Combine with <code>i</code>: <code>DT[colA > value, sum(colB)]</code>.</p></li>
+</ol>
+</div>
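All five forms can be tried on a toy table. A minimal sketch (the columns `colA` and `colB` mirror the list above; the data are made up):

```r
library(data.table)
DT <- data.table(colA = 1:4, colB = c(10, 20, 30, 40))
DT[, .(colA, colB)]                        # (1) select, data.table way
DT[, c("colA", "colB"), with = FALSE]      # (2) select, data.frame way
DT[, .(sum(colA), mean(colB))]             # (3) compute: V1 = 10, V2 = 25
DT[, .(sA = sum(colA), mB = mean(colB))]   # (4) compute, with names
DT[colA > 2L, sum(colB)]                   # (5) combine with i: 70
```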
+</div>
+</div>
+<div id="section-34" class="section level1">
+<h1></h1>
+<div id="using-by" class="section level4 bs-callout bs-callout-info">
+<h4>Using <code>by</code>:</h4>
+<ul>
+<li><p>Using <code>by</code>, we can group by columns by specifying a <em>list of columns</em> or a <em>character vector of column names</em> or even <em>expressions</em>. The flexibility of <code>j</code>, combined with <code>by</code> and <code>i</code>, makes for a very powerful syntax.</p></li>
+<li><p><code>by</code> can handle multiple columns and also <em>expressions</em>.</p></li>
+<li><p>We can use <code>keyby</code> instead of <code>by</code> to automatically sort the grouped result by the grouping columns.</p></li>
+<li><p>We can use <code>.SD</code> and <code>.SDcols</code> in <code>j</code> to operate on multiple columns using already familiar base functions. Here are some examples:</p>
+<ol style="list-style-type: decimal">
+<li><p><code>DT[, lapply(.SD, fun), by = ..., .SDcols = ...]</code> - applies <code>fun</code> to all columns specified in <code>.SDcols</code> while grouping by the columns specified in <code>by</code>.</p></li>
+<li><p><code>DT[, head(.SD, 2), by = ...]</code> - returns the first two rows for each group.</p></li>
+<li><p><code>DT[col > val, head(.SD, 1), by = ...]</code> - combines <code>i</code> along with <code>j</code> and <code>by</code>.</p></li>
+</ol></li>
+</ul>
+</div>
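The three `.SD` patterns above can be sketched together on a made-up table (the table and column names are illustrative only):

```r
library(data.table)
DT <- data.table(ID = c("a", "a", "b", "b", "b"), x = 1:5, y = 6:10)
DT[, lapply(.SD, mean), by = ID, .SDcols = c("x", "y")]  # per-group means of x and y
DT[, head(.SD, 2), by = ID]        # first two rows of each group
DT[x > 1L, head(.SD, 1), by = ID]  # first row per group, among rows with x > 1
```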
+</div>
+<div id="section-35" class="section level1">
+<h1></h1>
+<div id="and-remember-the-tip" class="section level4 bs-callout bs-callout-warning">
+<h4>And remember the tip:</h4>
+<p>As long as <code>j</code> returns a <em>list</em>, each element of the list will become a column in the resulting <em>data.table</em>.</p>
+</div>
+</div>
+<div id="section-36" class="section level1">
+<h1></h1>
+<p>We will see how to <em>add/update/delete</em> columns <em>by reference</em> and how to combine them with <code>i</code> and <code>by</code> in the next vignette.</p>
+<hr />
+</div>
+
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/doc/datatable-intro.pdf b/inst/doc/datatable-intro.pdf
deleted file mode 100644
index a9d0572..0000000
Binary files a/inst/doc/datatable-intro.pdf and /dev/null differ
diff --git a/inst/doc/datatable-keys-fast-subset.R b/inst/doc/datatable-keys-fast-subset.R
index 16f4bdc..4be8db9 100644
--- a/inst/doc/datatable-keys-fast-subset.R
+++ b/inst/doc/datatable-keys-fast-subset.R
@@ -2,14 +2,13 @@
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 
-## ----echo=FALSE-----------------------------------------------------------------------------------
-options(width=100)
+## ----echo = FALSE---------------------------------------------------------------------------------
+options(width = 100L)
 
 ## -------------------------------------------------------------------------------------------------
 flights <- fread("flights14.csv")
@@ -18,9 +17,9 @@ dim(flights)
 
 ## -------------------------------------------------------------------------------------------------
 set.seed(1L)
-DF = data.frame(ID1 = sample(letters[1:2], 10, TRUE), 
+DF = data.frame(ID1 = sample(letters[1:2], 10, TRUE),
                 ID2 = sample(1:3, 10, TRUE),
-                val = sample(10), 
+                val = sample(10),
                 stringsAsFactors = FALSE,
                 row.names = sample(LETTERS[1:10]))
 DF
@@ -30,7 +29,7 @@ rownames(DF)
 ## -------------------------------------------------------------------------------------------------
 DF["C", ]
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  rownames(DF) = sample(LETTERS[1:5], 10, TRUE)
 #  # Warning: non-unique values when setting 'row.names': 'C', 'D'
 #  # Error in `row.names<-.data.frame`(`*tmp*`, value = value): duplicate 'row.names' are not allowed
@@ -52,9 +51,10 @@ head(flights)
 flights[.("JFK")]
 
 ## alternatively
-# flights[J("JFK")] (or) flights[list("JFK")]
+# flights[J("JFK")] (or) 
+# flights[list("JFK")]
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  flights["JFK"]              ## same as flights[.("JFK")]
 
 ## ----eval = FALSE---------------------------------------------------------------------------------
@@ -88,7 +88,7 @@ key(flights)
 flights[.("LGA", "TPA"), .(arr_delay)]
 
 ## ----eval = FALSE---------------------------------------------------------------------------------
-#  flights[.("LGA", "TPA"), "arr_delay", with=FALSE]
+#  flights[.("LGA", "TPA"), "arr_delay", with = FALSE]
 
 ## -------------------------------------------------------------------------------------------------
 flights[.("LGA", "TPA"), .(arr_delay)][order(-arr_delay)]
@@ -114,33 +114,33 @@ setkey(flights, origin, dest)
 key(flights)
 
 ## -------------------------------------------------------------------------------------------------
-ans <- flights["JFK", max(dep_delay), keyby=month]
+ans <- flights["JFK", max(dep_delay), keyby = month]
 head(ans)
 key(ans)
 
 ## -------------------------------------------------------------------------------------------------
-flights[.("JFK", "MIA"), mult="first"]
+flights[.("JFK", "MIA"), mult = "first"]
 
 ## -------------------------------------------------------------------------------------------------
-flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last"]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last"]
 
 ## -------------------------------------------------------------------------------------------------
-flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last", nomatch = 0L]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", nomatch = 0L]
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  # key by origin,dest columns
 #  flights[.("JFK", "MIA")]
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  flights[origin == "JFK" & dest == "MIA"]
 
 ## -------------------------------------------------------------------------------------------------
 set.seed(2L)
 N = 2e7L
-DT = data.table(x = sample(letters, N, TRUE), 
-                y = sample(1000L, N, TRUE), 
-                val=runif(N), key = c("x", "y"))
-print(object.size(DT), units="Mb")
+DT = data.table(x = sample(letters, N, TRUE),
+                y = sample(1000L, N, TRUE),
+              val = runif(N), key = c("x", "y"))
+print(object.size(DT), units = "Mb")
 
 key(DT)
 
@@ -160,6 +160,6 @@ dim(ans2)
 
 identical(ans1$val, ans2$val)
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  1, 5, 10, 19, 22, 23, 30
 
diff --git a/inst/doc/datatable-keys-fast-subset.Rmd b/inst/doc/datatable-keys-fast-subset.Rmd
index b9ebd7c..d4d6b8d 100644
--- a/inst/doc/datatable-keys-fast-subset.Rmd
+++ b/inst/doc/datatable-keys-fast-subset.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Keys and fast binary search based subset"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Keys and fast binary search based subset}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,14 +13,13 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
 
-This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and  group by using `by`. If you're not familar with these concepts, please read the *"Introduction to data.table"* and *"data.table reference semantics"* vignettes first.
+This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and  group by using `by`. If you're not familiar with these concepts, please read the *"Introduction to data.table"* and *"Reference semantics"* vignettes first.
 
 ***
 
@@ -31,8 +27,8 @@ This vignette is aimed at those who are already familiar with *data.table* synta
 
 We will use the same `flights` data as in the *"Introduction to data.table"* vignette.
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ```{r}
@@ -57,15 +53,15 @@ In this vignette, we will
 
 ### a) What is a *key*?
 
-In the *"Introduction to data.table"* vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*. 
+In the *"Introduction to data.table"* vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.
 
 But first, let's start by looking at *data.frames*. All *data.frames* have a row names attribute. Consider the *data.frame* `DF` below.
 
 ```{r}
 set.seed(1L)
-DF = data.frame(ID1 = sample(letters[1:2], 10, TRUE), 
+DF = data.frame(ID1 = sample(letters[1:2], 10, TRUE),
                 ID2 = sample(1:3, 10, TRUE),
-                val = sample(10), 
+                val = sample(10),
                 stringsAsFactors = FALSE,
                 row.names = sample(LETTERS[1:10]))
 DF
@@ -79,15 +75,15 @@ We can *subset* a particular row using its row name as shown below:
 DF["C", ]
 ```
 
-i.e., row names are more or less *an index* to rows of a *data.frame*. However, 
+i.e., row names are more or less *an index* to rows of a *data.frame*. However,
 
 1. Each row is limited to *exactly one* row name.
 
-    But, a person (for example) has at least two names - a *first* and a *second* name. It is useful to organise a telephone directory by *surname* then *first name*. 
+    But, a person (for example) has at least two names - a *first* and a *second* name. It is useful to organise a telephone directory by *surname* then *first name*.
 
 2. And row names should be *unique*.
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     rownames(DF) = sample(LETTERS[1:5], 10, TRUE)
     # Warning: non-unique values when setting 'row.names': 'C', 'D'
     # Error in `row.names<-.data.frame`(`*tmp*`, value = value): duplicate 'row.names' are not allowed
@@ -102,11 +98,11 @@ DT
 rownames(DT)
 ```
 
-* Note that row names have been reset. 
+* Note that row names have been reset.
 
 * *data.tables* never uses row names. Since *data.tables* **inherit** from *data.frames*, it still has the row names attribute. But it never uses them. We'll see in a moment as to why.
 
-    If you would like to preserve the row names, use `keep.rownames = TRUE` in `as.data.table()` - this will create a new column called `rn` and assign row names to this column. 
+    If you would like to preserve the row names, use `keep.rownames = TRUE` in `as.data.table()` - this will create a new column called `rn` and assign row names to this column.
 
 Instead, in *data.tables* we set and use `keys`. Think of a `key` as **supercharged rownames**.
 
@@ -114,11 +110,11 @@ Instead, in *data.tables* we set and use `keys`. Think of a `key` as **superchar
 
 1. We can set keys on *multiple columns* and the column can be of *different types* -- *integer*, *numeric*, *character*, *factor*, *integer64* etc. *list* and *complex* types are not supported yet.
 
-2. Uniqueness is not enforced, i.e., duplicate key values are allowed. Since rows are sorted by key, any duplicates in the key columns will appear consecutively.  
+2. Uniqueness is not enforced, i.e., duplicate key values are allowed. Since rows are sorted by key, any duplicates in the key columns will appear consecutively.
 
-3. Setting a `key` *does two things*: 
+3. Setting a `key` does *two* things:
 
-    a. reorders the rows of the *data.table* by the column(s) provided *by reference*, always in *increasing* order. 
+    a. physically reorders the rows of the *data.table* by the column(s) provided *by reference*, always in *increasing* order.
 
     b. marks those columns as *key* columns by setting an attribute called `sorted` to the *data.table*.
 
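Both effects of setting a key can be observed directly. A minimal sketch (the toy table is made up for illustration):

```r
library(data.table)
DT <- data.table(x = c("b", "a", "c"), y = 1:3)
setkey(DT, x)        # (a) physically reorders rows by 'x', by reference
DT$x                 # "a" "b" "c"
key(DT)              # (b) key columns, read from the 'sorted' attribute
attr(DT, "sorted")   # "x"
```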
@@ -150,7 +146,7 @@ head(flights)
 
 * The *data.table* is now reordered (or sorted) by the column we provided - `origin`. Since we reorder by reference, we only require additional memory of one column of length equal to the number of rows in the *data.table*, and is therefore very memory efficient.
 
-* You can also set keys directly when creating *data.tables* using the `data.table()` function using `key=` argument. It takes a character vector of column names.
+* You can also set keys directly when creating *data.tables* with the `data.table()` function, using the `key` argument. It takes a character vector of column names.
 
 #### set* and `:=`: {.bs-callout .bs-callout-info}
 
@@ -158,7 +154,7 @@ In *data.table*, the `:=` operator and all the `set*` (e.g., `setkey`, `setorder
 
 #
 
-Once you *key* a *data.table* by certain columns, you can subset by querying those key columns using the `.()` notation in `i`. Recall that `.()` is an *alias to* `list()`. 
+Once you *key* a *data.table* by certain columns, you can subset by querying those key columns using the `.()` notation in `i`. Recall that `.()` is an *alias to* `list()`.
 
 #### -- Use the key column `origin` to subset all rows where the origin airport matches *"JFK"*
 
@@ -166,7 +162,8 @@ Once you *key* a *data.table* by certain columns, you can subset by querying tho
 flights[.("JFK")]
 
 ## alternatively
-# flights[J("JFK")] (or) flights[list("JFK")]
+# flights[J("JFK")] (or) 
+# flights[list("JFK")]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -177,7 +174,7 @@ flights[.("JFK")]
 
 * On single column key of *character* type, you can drop the `.()` notation and use the values directly when subsetting, like subset using row names on *data.frames*.
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     flights["JFK"]              ## same as flights[.("JFK")]
     ```
 
@@ -205,7 +202,7 @@ key(flights)
 
 ### c) Keys and multiple columns
 
-To refresh, *keys* are like *supercharged* rownames. We can set key on multiple columns and they can be of multiple types. 
+To refresh, *keys* are like *supercharged* row names. We can set a key on multiple columns, and they can be of multiple types.
 
 #### -- How can I set keys on both `origin` *and* `dest` columns?
 
@@ -231,11 +228,11 @@ flights[.("JFK", "MIA")]
 
 #### How does the subset work here? {.bs-callout .bs-callout-info #multiple-key-point}
 
-* It is important to undertand how this works internally. *"JFK"* is first matched against the first key column `origin`. And *within those matching rows*, *"MIA"* is matched against the second key column `dest` to obtain *row indices* where both `origin` and `dest` match the given values. 
+* It is important to understand how this works internally. *"JFK"* is first matched against the first key column `origin`. And *within those matching rows*, *"MIA"* is matched against the second key column `dest` to obtain *row indices* where both `origin` and `dest` match the given values.
 
 * Since no `j` is provided, we simply return *all columns* corresponding to those row indices.
 
-# 
+#
 
 #### -- Subset all rows where just the first key column `origin` matches *"JFK"*
 
@@ -257,7 +254,7 @@ flights[.(unique(origin), "MIA")]
 
 #### What's happening here? {.bs-callout .bs-callout-info}
 
-* Read [this](#multiple-key-point) again. The value provided for the second key column *"MIA"* has to find the matching vlaues in `dest` key column *on the matching rows provided by the first key column `origin`*. We can not skip the values of key columns *before*. Therfore we provide *all* unique values from key column `origin`.
+* Read [this](#multiple-key-point) again. The value provided for the second key column *"MIA"* has to find the matching values in `dest` key column *on the matching rows provided by the first key column `origin`*. We can not skip the values of key columns *before*. Therefore we provide *all* unique values from key column `origin`.
 
 * *"MIA"* is automatically recycled to fit the length of `unique(origin)` which is *3*.
 
@@ -276,14 +273,14 @@ flights[.("LGA", "TPA"), .(arr_delay)]
 
 #### {.bs-callout .bs-callout-info}
 
-* The *row indices* corresponding to `origin == "LGA" and `dest == "TPA"` are obtained using *key based subset*.
+* The *row indices* corresponding to `origin == "LGA"` and `dest == "TPA"` are obtained using *key based subset*.
 
 * Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in *Introduction to data.table* vignette.
 
 * We could have returned the result by using `with = FALSE` as well.
- 
+
     ```{r eval = FALSE}
-    flights[.("LGA", "TPA"), "arr_delay", with=FALSE]
+    flights[.("LGA", "TPA"), "arr_delay", with = FALSE]
     ```
 
 ### b) Chaining
@@ -326,16 +323,16 @@ key(flights)
 
 #### {.bs-callout .bs-callout-info}
 
-* We first set `key` to *hour*. This reorders `flights` by the column *hour* and marks that column as the `key` column.
+* We first set `key` to `hour`. This reorders `flights` by the column `hour` and marks that column as the `key` column.
 
-* Now we can subset on *hour* by using the `.()` notation. We subset for the value *24* and obtain the corresponding *row indices*.
+* Now we can subset on `hour` by using the `.()` notation. We subset for the value *24* and obtain the corresponding *row indices*.
 
 * And on those row indices, we replace the `key` column with the value `0`.
 
 * Since we have replaced values on the *key* column, the *data.table* `flights` isn't sorted by `hour` anymore. Therefore, the key has been automatically removed by setting to NULL.
 
 #
-Now, there shouldn't be any *24* in the *hour* column.
+Now, there shouldn't be any *24* in the `hour` column.
 
 ```{r}
 flights[, sort(unique(hour))]
@@ -353,14 +350,14 @@ key(flights)
 #### -- Get the maximum departure delay for each `month` corresponding to `origin = "JFK"`. Order the result by `month`
 
 ```{r}
-ans <- flights["JFK", max(dep_delay), keyby=month]
+ans <- flights["JFK", max(dep_delay), keyby = month]
 head(ans)
 key(ans)
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* We subset on the `key` column *origin* to obtain the *row indices* corresponding to *"JFK"*. 
+* We subset on the `key` column *origin* to obtain the *row indices* corresponding to *"JFK"*.
 
 * Once we obtain the row indices, we only need two columns - `month` to group by and `dep_delay` to obtain `max()` for each group. *data.table's* query optimisation therefore subsets just those two columns corresponding to the *row indices* obtained in `i`, for speed and memory efficiency.
 
@@ -377,29 +374,29 @@ We can choose, for each query, if *"all"* the matching rows should be returned,
 #### -- Subset only the first matching row from all rows where `origin` matches *"JFK"* and `dest` matches *"MIA"*
 
 ```{r}
-flights[.("JFK", "MIA"), mult="first"]
+flights[.("JFK", "MIA"), mult = "first"]
 ```
 
 #### -- Subset only the last matching row of all the rows where `origin` matches *"LGA", "JFK", "EWR"* and `dest` matches *"XNA"*
 
 ```{r}
-flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last"]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last"]
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* The query *"JFK", "XNA"* doesn't match any rows in `flights` and therefore returns `NA`. 
+* The query *"JFK", "XNA"* doesn't match any rows in `flights` and therefore returns `NA`.
 
 * Once again, the query for second key column `dest`,  *"XNA"*, is recycled to fit the length of the query for first key column `origin`, which is of length 3.
 
-### The *nomatch* argument
+### b) The *nomatch* argument
 
 We can choose if queries that do not match should return `NA` or be skipped altogether using the `nomatch` argument.
 
 #### -- From the previous example, Subset all rows only if there's a match
 
 ```{r}
-flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last", nomatch = 0L]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", nomatch = 0L]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -412,30 +409,30 @@ flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last", nomatch = 0L]
 
 We have seen so far how we can set and use keys to subset. But what's the advantage? For example, instead of doing:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 # key by origin,dest columns
 flights[.("JFK", "MIA")]
 ```
 
 we could have done:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 flights[origin == "JFK" & dest == "MIA"]
 ```
 
-One advantage very likely is shorter syntax. But even more than that, *binary search based subsets* are **incredibly fast**. 
+One advantage is clearly shorter syntax. But even more than that, *binary search based subsets* are **incredibly fast**.
 
 ### a) Performance of binary search approach
 
-To illustrate, let's create a sample *data.table* with 20 million rows and three columns and key it by columns `x` and `y`. 
+To illustrate, let's create a sample *data.table* with 20 million rows and three columns and key it by columns `x` and `y`.
 
 ```{r}
 set.seed(2L)
 N = 2e7L
-DT = data.table(x = sample(letters, N, TRUE), 
-                y = sample(1000L, N, TRUE), 
-                val=runif(N), key = c("x", "y"))
-print(object.size(DT), units="Mb")
+DT = data.table(x = sample(letters, N, TRUE),
+                y = sample(1000L, N, TRUE),
+              val = runif(N), key = c("x", "y"))
+print(object.size(DT), units = "Mb")
 
 key(DT)
 ```
@@ -464,7 +461,7 @@ dim(ans2)
 identical(ans1$val, ans2$val)
 ```
 
-* The speedup is **~`r round(t1[3]/t2[3])`x**!
+* The speedup is **~`r round(t1[3]/max(t2[3], .001))`x**!
 
 ### b) Why does keying a *data.table* result in blazing fast susbets?
 
@@ -480,15 +477,15 @@ To understand that, let's first look at what *vector scan approach* (method 1) d
 
 This is what we call a *vector scan approach*. And this is quite inefficient, especially on larger tables and when one needs repeated subsetting, because it has to scan through all the rows each time.
 
-# 
+#
 
-Now let us look at binary search approach (method 2). Recall from [Properties of key](#key-properties) - *setting keys reorders the data.table by key columns*. Since the data is sorted, we don't have to *scan through the entire length of the column*! We can instead use *binary search* to search a value in `O(log n)` as opposed to `O(n)` in case of *vector scan approach*, where `n` is the number of rows in the *data.table*. 
+Now let us look at binary search approach (method 2). Recall from [Properties of key](#key-properties) - *setting keys reorders the data.table by key columns*. Since the data is sorted, we don't have to *scan through the entire length of the column*! We can instead use *binary search* to search a value in `O(log n)` as opposed to `O(n)` in case of *vector scan approach*, where `n` is the number of rows in the *data.table*.
 
 #### Binary search approach: {.bs-callout .bs-callout-info}
 
 Here's a very simple illustration. Let's consider the (sorted) numbers shown below:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 1, 5, 10, 19, 22, 23, 30
 ```
 
@@ -506,7 +503,7 @@ A vector scan approach on the other hand would have to scan through all the valu
 
 #
 
-It can be seen that with every search we reduce the number of searches by half. This is why *binary search* based subsets are **incredibly fast**. Since rows of each column of *data.tables* have contiguous locations in memory, the operations are performed in a very cache efficient manner (also contributes to *speed*). 
+It can be seen that with every search we reduce the number of searches by half. This is why *binary search* based subsets are **incredibly fast**. Since rows of each column of *data.tables* have contiguous locations in memory, the operations are performed in a very cache efficient manner (also contributes to *speed*).
 
 In addition, since we obtain the matching row indices directly without having to create those huge logical vectors (equal to the number of rows in a *data.table*), it is quite **memory efficient** as well.
 
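The halving procedure described above can be sketched as a plain R function. This is a hypothetical standalone illustration of binary search over the sorted numbers from the example, not data.table's internal C implementation:

```r
bsearch <- function(v, target) {
  lo <- 1L; hi <- length(v)
  while (lo <= hi) {
    mid <- (lo + hi) %/% 2L       # probe the middle element
    if (v[mid] == target) return(mid)
    if (v[mid] < target) lo <- mid + 1L else hi <- mid - 1L
  }
  NA_integer_                     # not found
}
v <- c(1L, 5L, 10L, 19L, 22L, 23L, 30L)
bsearch(v, 1L)   # 1, reached in 3 probes rather than a full scan of 7 values
```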
@@ -516,19 +513,15 @@ In this vignette, we have learnt another method to subset rows in `i` by keying
 
 #### {.bs-callout .bs-callout-info}
 
-* set key and subset using the key on a *data.table*. 
+* set key and subset using the key on a *data.table*.
 
 * subset using keys which fetches *row indices* in `i`, but much faster.
 
 * combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before.
 
-Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*.
-
 #
 
-We don't have to set and use keys for aggregation operations in general, unless the data is extremely large and/or the task requires repeated subsetting where key based subsets will be noticeably performant. 
-
-However, keying *data.tables* are essential to *join* two *data.tables* which is the subject of discussion in the next vignette *"Joins and rolling joins"*. We will extend the concept of key based subsets to joining two *data.tables* based on `key` columns.
+Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not always be desirable to set a key and physically reorder the *data.table*. In the next vignette, we will address this using a *new* feature -- *secondary indexes*.
 
 ***
 
diff --git a/inst/doc/datatable-keys-fast-subset.html b/inst/doc/datatable-keys-fast-subset.html
index 616aa91..2c2f003 100644
--- a/inst/doc/datatable-keys-fast-subset.html
+++ b/inst/doc/datatable-keys-fast-subset.html
@@ -8,50 +8,13 @@
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="generator" content="pandoc" />
 
+<meta name="viewport" content="width=device-width, initial-scale=1">
 
-<meta name="date" content="2015-09-18" />
+
+<meta name="date" content="2016-12-02" />
 
 <title>Keys and fast binary search based subset</title>
 
-<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4wIHwgKGMpIDIwMDUsIDIwMTQgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
-<meta name="viewport" content="width=device-width, initial-scale=1" />
-<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjEgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE0IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgTUlUIChodHRwczovL2dpdGh1Yi5jb20vdHdicy9ib290c3RyYXAvYmxvYi9tYXN0ZXIvTElDRU5TRSkKICovCmlmKCJ1bmRlZmluZWQiPT10eXBlb2YgalF1ZXJ5KXRocm93IG5ldyBFcnJvcigiQm9vdHN0cmFwJ3MgSmF2YVNjcmlwdCByZXF1aXJlcyBqUXVlcnkiKTsrZnVuY3Rpb24oYSl7dmFyIGI9YS5mbi5qcXVlcnkuc3BsaXQoIiAiKVswXS5zcGxpdCgiLiIpO2lmKGJbMF08MiYmYlsxXTw5fH [...]
-<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
-<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG9jdW1lbnRFbGVtZW50LGQ9Yy5maXJzdEVsZW1lbn [...]
-<style type="text/css">
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 400;
-  src: url(fonts/OpenSans.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 700;
-  src: url(fonts/OpenSansBold.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 400;
-  src: url(fonts/OpenSansItalic.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 700;
-  src: url(fonts/OpenSansBoldItalic.ttf) format('truetype');
-}
-
-/*!
- * bootswatch v3.3.1+1
- * Homepage: http://bootswatch.com
- * Copyright 2012-2014 Thomas Park
- * Licensed under MIT
- * Based on Bootstrap
-*//*! normalize.css v3.0.2 | MIT License | git.io/normalize */html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}body{margin:0}article,aside,details,figcaption,figure,footer,header,hgroup,main,menu,nav,section,summary{display:block}audio,canvas,progress,video{display:inline-block;vertical-align:baseline}audio:not([controls]){display:none;height:0}[hidden],template{display:none}a{background-color:transparent}a:active,a:hover{outline:0}abbr[title]{border-bo [...]
-</style>
 
 
 <style type="text/css">code{white-space: pre;}</style>
@@ -92,44 +55,24 @@ code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Ann
 code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
 code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
 </style>
-<style type="text/css">
-  pre:not([class]) {
-    background-color: white;
-  }
-</style>
 
 
-<link href="data:text/css;charset=utf-8,code%2C%0Akbd%2C%0Apre%2C%0Asamp%20%7B%0Afont%2Dfamily%3A%20Source%20Code%20Pro%2C%20Inconsolata%2C%20Monaco%2C%20Consolas%2C%20Menlo%2C%20Courier%20New%2C%20monospace%3B%0A%7D%0Acode%20%7B%0Apadding%3A%200px%202px%3B%0Afont%2Dsize%3A%2090%25%3B%0Acolor%3A%20%23c7254e%3B%0Awhite%2Dspace%3A%20nowrap%3B%0Abackground%2Dcolor%3A%20%23f9f2f4%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%200px%3B%0A%7D%0Apre%20%7B%0Adisplay%3A%20block%3B%0Apadding%3A%209% [...]
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
 
 </head>
 
 <body>
 
-<style type="text/css">
-.main-container {
-  max-width: 940px;
-  margin-left: auto;
-  margin-right: auto;
-}
-code {
-  color: inherit;
-  background-color: rgba(0, 0, 0, 0.04);
-}
-img { 
-  max-width:100%; 
-  height: auto; 
-}
-</style>
-<div class="container-fluid main-container">
 
 
-<div id="header">
-<h1 class="title">Keys and fast binary search based subset</h1>
-<h4 class="date"><em>2015-09-18</em></h4>
-</div>
 
+<h1 class="title toc-ignore">Keys and fast binary search based subset</h1>
+<h4 class="date"><em>2016-12-02</em></h4>
 
-<p>This vignette is aimed at those who are already familiar with <em>data.table</em> syntax, its general form, how to subset rows in <code>i</code>, select and compute on columns, add/modify/delete columns <em>by reference</em> in <code>j</code> and group by using <code>by</code>. If you’re not familar with these concepts, please read the <em>“Introduction to data.table”</em> and <em>“data.table reference semantics”</em> vignettes first.</p>
+
+
+<p>This vignette is aimed at those who are already familiar with <em>data.table</em> syntax, its general form, how to subset rows in <code>i</code>, select and compute on columns, add/modify/delete columns <em>by reference</em> in <code>j</code> and group by using <code>by</code>. If you’re not familiar with these concepts, please read the <em>“Introduction to data.table”</em> and <em>“Reference semantics”</em> vignettes first.</p>
 <hr />
 <div id="data" class="section level2">
 <h2>Data</h2>
@@ -163,9 +106,9 @@ img {
 <p>In the <em>“Introduction to data.table”</em> vignette, we saw how to subset rows in <code>i</code> using logical expressions, row numbers and using <code>order()</code>. In this section, we will look at another way of subsetting incredibly fast - using <em>keys</em>.</p>
 <p>But first, let’s start by looking at <em>data.frames</em>. All <em>data.frames</em> have a row names attribute. Consider the <em>data.frame</em> <code>DF</code> below.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(1L)
-DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">ID1 =</span> <span class="kw">sample</span>(letters[<span class="dv">1</span>:<span class="dv">2</span>], <span class="dv">10</span>, <span class="ot">TRUE</span>), 
+DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">ID1 =</span> <span class="kw">sample</span>(letters[<span class="dv">1</span>:<span class="dv">2</span>], <span class="dv">10</span>, <span class="ot">TRUE</span>),
                 <span class="dt">ID2 =</span> <span class="kw">sample</span>(<span class="dv">1</span>:<span class="dv">3</span>, <span class="dv">10</span>, <span class="ot">TRUE</span>),
-                <span class="dt">val =</span> <span class="kw">sample</span>(<span class="dv">10</span>), 
+                <span class="dt">val =</span> <span class="kw">sample</span>(<span class="dv">10</span>),
                 <span class="dt">stringsAsFactors =</span> <span class="ot">FALSE</span>,
                 <span class="dt">row.names =</span> <span class="kw">sample</span>(LETTERS[<span class="dv">1</span>:<span class="dv">10</span>]))
 DF
@@ -224,9 +167,9 @@ DT
 <ol style="list-style-type: decimal">
 <li><p>We can set keys on <em>multiple columns</em> and the column can be of <em>different types</em> – <em>integer</em>, <em>numeric</em>, <em>character</em>, <em>factor</em>, <em>integer64</em> etc. <em>list</em> and <em>complex</em> types are not supported yet.</p></li>
 <li><p>Uniqueness is not enforced, i.e., duplicate key values are allowed. Since rows are sorted by key, any duplicates in the key columns will appear consecutively.</p></li>
-<li><p>Setting a <code>key</code> <em>does two things</em>:</p>
+<li><p>Setting a <code>key</code> does <em>two</em> things:</p>
 <ol style="list-style-type: lower-alpha">
-<li><p>reorders the rows of the <em>data.table</em> by the column(s) provided <em>by reference</em>, always in <em>increasing</em> order.</p></li>
+<li><p>physically reorders the rows of the <em>data.table</em> by the column(s) provided <em>by reference</em>, always in <em>increasing</em> order.</p></li>
 <li><p>marks those columns as <em>key</em> columns by setting an attribute called <code>sorted</code> to the <em>data.table</em>.</p></li>
 </ol>
 <p>Since the rows are reordered, a <em>data.table</em> can have at most one key because it can not be sorted in more than one way.</p></li>
@@ -261,7 +204,7 @@ DT
 <li><p>Alternatively you can pass a character vector of column names to the function <code>setkeyv()</code>. This is particularly useful while designing functions to pass columns to set key on as function arguments.</p></li>
 <li><p>Note that we did not have to assign the result back to a variable. This is because like the <code>:=</code> function we saw in the <em>“Introduction to data.table”</em> vignette, <code>setkey()</code> and <code>setkeyv()</code> modify the input <em>data.table</em> <em>by reference</em>. They return the result invisibly.</p></li>
 <li><p>The <em>data.table</em> is now reordered (or sorted) by the column we provided - <code>origin</code>. Since we reorder by reference, we only require additional memory of one column of length equal to the number of rows in the <em>data.table</em>, and is therefore very memory efficient.</p></li>
-<li><p>You can also set keys directly when creating <em>data.tables</em> using the <code>data.table()</code> function using <code>key=</code> argument. It takes a character vector of column names.</p></li>
+<li><p>You can also set keys directly when creating <em>data.tables</em> with the <code>data.table()</code> function, using the <code>key</code> argument. It takes a character vector of column names.</p></li>
 </ul>
 </div>
 <div id="set-and" class="section level4 bs-callout bs-callout-info">
@@ -290,7 +233,8 @@ DT
 <span class="co"># 81483: 2014    10  31        -6       -38      UA    JFK  LAX      323     2475   11</span>
 
 ## alternatively
-<span class="co"># flights[J("JFK")] (or) flights[list("JFK")]</span></code></pre></div>
+<span class="co"># flights[J("JFK")] (or) </span>
+<span class="co"># flights[list("JFK")]</span></code></pre></div>
 </div>
 <div id="section-3" class="section level4 bs-callout bs-callout-info">
 <h4></h4>
@@ -319,7 +263,7 @@ DT
 </div>
 <div id="c-keys-and-multiple-columns" class="section level3">
 <h3>c) Keys and multiple columns</h3>
-<p>To refresh, <em>keys</em> are like <em>supercharged</em> rownames. We can set key on multiple columns and they can be of multiple types.</p>
+<p>To refresh, <em>keys</em> are like <em>supercharged</em> row names. We can set key on multiple columns and they can be of multiple types.</p>
 <div id="how-can-i-set-keys-on-both-origin-and-dest-columns" class="section level4">
 <h4>– How can I set keys on both <code>origin</code> <em>and</em> <code>dest</code> columns?</h4>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">setkey</span>(flights, origin, dest)
@@ -415,7 +359,7 @@ flights[.(<span class="st">"JFK"</span>)] ## or in this case simply fl
 <div id="whats-happening-here" class="section level4 bs-callout bs-callout-info">
 <h4>What’s happening here?</h4>
 <ul>
-<li><p>Read <a href="#multiple-key-point">this</a> again. The value provided for the second key column <em>“MIA”</em> has to find the matching vlaues in <code>dest</code> key column <em>on the matching rows provided by the first key column <code>origin</code></em>. We can not skip the values of key columns <em>before</em>. Therfore we provide <em>all</em> unique values from key column <code>origin</code>.</p></li>
+<li><p>Read <a href="#multiple-key-point">this</a> again. The value provided for the second key column <em>“MIA”</em> has to find the matching values in <code>dest</code> key column <em>on the matching rows provided by the first key column <code>origin</code></em>. We can not skip the values of key columns <em>before</em>. Therefore we provide <em>all</em> unique values from key column <code>origin</code>.</p></li>
 <li><p><em>“MIA”</em> is automatically recycled to fit the length of <code>unique(origin)</code> which is <em>3</em>.</p></li>
 </ul>
 </div>
@@ -445,10 +389,10 @@ flights[.(<span class="st">"LGA"</span>, <span class="st">"TPA&qu
 <div id="section-8" class="section level4 bs-callout bs-callout-info">
 <h4></h4>
 <ul>
-<li><p>The <em>row indices</em> corresponding to <code>origin == "LGA" and</code>dest == “TPA”` are obtained using <em>key based subset</em>.</p></li>
+<li><p>The <em>row indices</em> corresponding to <code>origin == "LGA"</code> and <code>dest == "TPA"</code> are obtained using <em>key based subset</em>.</p></li>
 <li><p>Once we have the row indices, we look at <code>j</code> which requires only the <code>arr_delay</code> column. So we simply select the column <code>arr_delay</code> for those <em>row indices</em> in the exact same way as we have seen in <em>Introduction to data.table</em> vignette.</p></li>
 <li><p>We could have returned the result by using <code>with = FALSE</code> as well.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"LGA"</span>, <span class="st">"TPA"</span>), <span class="st">"arr_delay"</span>, with=<span class="ot">FALSE</span>]</code></pre></div></li>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"LGA"</span>, <span class="st">"TPA"</span>), <span class="st">"arr_delay"</span>, with =<span class="st"> </span><span class="ot">FALSE</span>]</code></pre></div></li>
 </ul>
 </div>
 </div>
@@ -496,13 +440,25 @@ flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
 <span class="kw">key</span>(flights)
 <span class="co"># [1] "hour"</span>
 flights[.(<span class="dv">24</span>), hour :<span class="er">=</span><span class="st"> </span>0L]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co">#      1: 2014     4  15       598       602      DL    EWR  ATL      104      746    0</span>
+<span class="co">#      2: 2014     5  22       289       267      DL    EWR  ATL      102      746    0</span>
+<span class="co">#      3: 2014     7  14       277       253      DL    EWR  ATL      101      746    0</span>
+<span class="co">#      4: 2014     2  14       128       117      EV    EWR  BDL       27      116    0</span>
+<span class="co">#      5: 2014     6  17       127       119      EV    EWR  BDL       24      116    0</span>
+<span class="co">#     ---                                                                              </span>
+<span class="co"># 253312: 2014     8   3         1       -13      DL    JFK  SJU      196     1598    0</span>
+<span class="co"># 253313: 2014    10   8         1         1      B6    JFK  SJU      199     1598    0</span>
+<span class="co"># 253314: 2014     7  14       211       219      B6    JFK  SLC      282     1990    0</span>
+<span class="co"># 253315: 2014     7   3       440       418      FL    LGA  ATL      107      762    0</span>
+<span class="co"># 253316: 2014     6  13       300       280      DL    LGA  PBI      140     1035    0</span>
 <span class="kw">key</span>(flights)
 <span class="co"># NULL</span></code></pre></div>
 <div id="section-10" class="section level4 bs-callout bs-callout-info">
 <h4></h4>
 <ul>
-<li><p>We first set <code>key</code> to <em>hour</em>. This reorders <code>flights</code> by the column <em>hour</em> and marks that column as the <code>key</code> column.</p></li>
-<li><p>Now we can subset on <em>hour</em> by using the <code>.()</code> notation. We subset for the value <em>24</em> and obtain the corresponding <em>row indices</em>.</p></li>
+<li><p>We first set <code>key</code> to <code>hour</code>. This reorders <code>flights</code> by the column <code>hour</code> and marks that column as the <code>key</code> column.</p></li>
+<li><p>Now we can subset on <code>hour</code> by using the <code>.()</code> notation. We subset for the value <em>24</em> and obtain the corresponding <em>row indices</em>.</p></li>
 <li><p>And on those row indices, we replace the <code>key</code> column with the value <code>0</code>.</p></li>
 <li><p>Since we have replaced values on the <em>key</em> column, the <em>data.table</em> <code>flights</code> isn’t sorted by <code>hour</code> anymore. Therefore, the key has been automatically removed by setting to NULL.</p></li>
 </ul>
@@ -512,7 +468,7 @@ flights[.(<span class="dv">24</span>), hour :<span class="er">=</span><span clas
 </div>
 <div id="section-11" class="section level1">
 <h1></h1>
-<p>Now, there shouldn’t be any <em>24</em> in the <em>hour</em> column.</p>
+<p>Now, there shouldn’t be any <em>24</em> in the <code>hour</code> column.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
 <span class="co">#  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23</span></code></pre></div>
 <div id="e-aggregation-using-by" class="section level3">
@@ -523,7 +479,7 @@ flights[.(<span class="dv">24</span>), hour :<span class="er">=</span><span clas
 <span class="co"># [1] "origin" "dest"</span></code></pre></div>
 <div id="get-the-maximum-departure-delay-for-each-month-corresponding-to-origin-jfk.-order-the-result-by-month" class="section level4">
 <h4>– Get the maximum departure delay for each <code>month</code> corresponding to <code>origin = "JFK"</code>. Order the result by <code>month</code></h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="st">"JFK"</span>, <span class="kw">max</span>(dep_delay), keyby=month]
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="st">"JFK"</span>, <span class="kw">max</span>(dep_delay), keyby =<span class="st"> </span>month]
 <span class="kw">head</span>(ans)
 <span class="co">#    month   V1</span>
 <span class="co"># 1:     1  881</span>
@@ -552,13 +508,13 @@ flights[.(<span class="dv">24</span>), hour :<span class="er">=</span><span clas
 <p>We can choose, for each query, if <em>“all”</em> the matching rows should be returned, or just the <em>“first”</em> or <em>“last”</em> using the <code>mult</code> argument. The default value is <em>“all”</em> - what we’ve seen so far.</p>
 <div id="subset-only-the-first-matching-row-from-all-rows-where-origin-matches-jfk-and-dest-matches-mia" class="section level4">
 <h4>– Subset only the first matching row from all rows where <code>origin</code> matches <em>“JFK”</em> and <code>dest</code> matches <em>“MIA”</em></h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"JFK"</span>, <span class="st">"MIA"</span>), mult=<span class="st">"first"</span>]
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"JFK"</span>, <span class="st">"MIA"</span>), mult =<span class="st"> "first"</span>]
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
 <span class="co"># 1: 2014     1   1         6         3      AA    JFK  MIA      157     1089    5</span></code></pre></div>
 </div>
 <div id="subset-only-the-last-matching-row-of-all-the-rows-where-origin-matches-lga-jfk-ewr-and-dest-matches-xna" class="section level4">
 <h4>– Subset only the last matching row of all the rows where <code>origin</code> matches <em>“LGA”, “JFK”, “EWR”</em> and <code>dest</code> matches <em>“XNA”</em></h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="kw">c</span>(<span class="st">"LGA"</span>, <span class="st">"JFK"</span>, <span class="st">"EWR"</span>), <span class="st">"XNA"</span>), mult=<span class="st">"last"</span>]
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="kw">c</span>(<span class="st">"LGA"</span>, <span class="st">"JFK"</span>, <span class="st">"EWR"</span>), <span class="st">"XNA"</span>), mult =<span class="st"> "last"</span>]
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
 <span class="co"># 1: 2014     5  23       163       148      MQ    LGA  XNA      158     1147   18</span>
 <span class="co"># 2:   NA    NA  NA        NA        NA      NA    JFK  XNA       NA       NA   NA</span>
@@ -572,12 +528,12 @@ flights[.(<span class="dv">24</span>), hour :<span class="er">=</span><span clas
 </ul>
 </div>
 </div>
-<div id="the-nomatch-argument" class="section level3">
-<h3>The <em>nomatch</em> argument</h3>
+<div id="b-the-nomatch-argument" class="section level3">
+<h3>b) The <em>nomatch</em> argument</h3>
 <p>We can choose if queries that do not match should return <code>NA</code> or be skipped altogether using the <code>nomatch</code> argument.</p>
 <div id="from-the-previous-example-subset-all-rows-only-if-theres-a-match" class="section level4">
 <h4>– From the previous example, Subset all rows only if there’s a match</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="kw">c</span>(<span class="st">"LGA"</span>, <span class="st">"JFK"</span>, <span class="st">"EWR"</span>), <span class="st">"XNA"</span>), mult=<span class="st">"last"</span>, nomatch =<span class="st"> </span>0L]
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="kw">c</span>(<span class="st">"LGA"</span>, <span class="st">"JFK"</span>, <span class="st">"EWR"</span>), <span class="st">"XNA"</span>), mult =<span class="st"> "last"</span>, nomatch =<span class="st"> </span>0L]
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
 <span class="co"># 1: 2014     5  23       163       148      MQ    LGA  XNA      158     1147   18</span>
 <span class="co"># 2: 2014     2   3       231       268      EV    EWR  XNA      184     1131   12</span></code></pre></div>
@@ -604,10 +560,10 @@ flights[.(<span class="st">"JFK"</span>, <span class="st">"MIA&qu
 <p>To illustrate, let’s create a sample <em>data.table</em> with 20 million rows and three columns and key it by columns <code>x</code> and <code>y</code>.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(2L)
 N =<span class="st"> </span>2e7L
-DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">x =</span> <span class="kw">sample</span>(letters, N, <span class="ot">TRUE</span>), 
-                <span class="dt">y =</span> <span class="kw">sample</span>(1000L, N, <span class="ot">TRUE</span>), 
-                <span class="dt">val=</span><span class="kw">runif</span>(N), <span class="dt">key =</span> <span class="kw">c</span>(<span class="st">"x"</span>, <span class="st">"y"</span>))
-<span class="kw">print</span>(<span class="kw">object.size</span>(DT), <span class="dt">units=</span><span class="st">"Mb"</span>)
+DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">x =</span> <span class="kw">sample</span>(letters, N, <span class="ot">TRUE</span>),
+                <span class="dt">y =</span> <span class="kw">sample</span>(1000L, N, <span class="ot">TRUE</span>),
+              <span class="dt">val =</span> <span class="kw">runif</span>(N), <span class="dt">key =</span> <span class="kw">c</span>(<span class="st">"x"</span>, <span class="st">"y"</span>))
+<span class="kw">print</span>(<span class="kw">object.size</span>(DT), <span class="dt">units =</span> <span class="st">"Mb"</span>)
 <span class="co"># 381.5 Mb</span>
 
 <span class="kw">key</span>(DT)
@@ -618,7 +574,7 @@ DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt"
 t1 <-<span class="st"> </span><span class="kw">system.time</span>(ans1 <-<span class="st"> </span>DT[x ==<span class="st"> "g"</span> &<span class="st"> </span>y ==<span class="st"> </span>877L])
 t1
 <span class="co">#    user  system elapsed </span>
-<span class="co">#   0.636   0.020   0.652</span>
+<span class="co">#   0.172   0.016   0.188</span>
 <span class="kw">head</span>(ans1)
 <span class="co">#    x   y       val</span>
 <span class="co"># 1: g 877 0.3946652</span>
@@ -634,7 +590,7 @@ t1
 t2 <-<span class="st"> </span><span class="kw">system.time</span>(ans2 <-<span class="st"> </span>DT[.(<span class="st">"g"</span>, 877L)])
 t2
 <span class="co">#    user  system elapsed </span>
-<span class="co">#       0       0       0</span>
+<span class="co">#   0.000   0.004   0.000</span>
 <span class="kw">head</span>(ans2)
 <span class="co">#    x   y       val</span>
 <span class="co"># 1: g 877 0.3946652</span>
@@ -649,7 +605,7 @@ t2
 <span class="kw">identical</span>(ans1$val, ans2$val)
 <span class="co"># [1] TRUE</span></code></pre></div>
 <ul>
-<li>The speedup is <strong>~x</strong>!</li>
+<li>The speedup is <strong>~188x</strong>!</li>
 </ul>
 </div>
 <div id="b-why-does-keying-a-data.table-result-in-blazing-fast-susbets" class="section level3">
@@ -698,28 +654,16 @@ t2
 <li><p>subset using keys which fetches <em>row indices</em> in <code>i</code>, but much faster.</p></li>
 <li><p>combine key based subsets with <code>j</code> and <code>by</code>. Note that the <code>j</code> and <code>by</code> operations are exactly the same as before.</p></li>
 </ul>
-<p>Key based subsets are <strong>incredibly fast</strong> and are particularly useful when the task involves <em>repeated subsetting</em>.</p>
 </div>
 </div>
 </div>
 <div id="section-18" class="section level1">
 <h1></h1>
-<p>We don’t have to set and use keys for aggregation operations in general, unless the data is extremely large and/or the task requires repeated subsetting where key based subsets will be noticeably performant.</p>
-<p>However, keying <em>data.tables</em> are essential to <em>join</em> two <em>data.tables</em> which is the subject of discussion in the next vignette <em>“Joins and rolling joins”</em>. We will extend the concept of key based subsets to joining two <em>data.tables</em> based on <code>key</code> columns.</p>
+<p>Key based subsets are <strong>incredibly fast</strong> and are particularly useful when the task involves <em>repeated subsetting</em>. But it may not always be desirable to set a key and physically reorder the <em>data.table</em>. In the next vignette, we will address this using a <em>new</em> feature – <em>secondary indexes</em>.</p>
 <hr />
 </div>
 
 
-</div>
-
-<script>
-
-// add bootstrap table styles to pandoc tables
-$(document).ready(function () {
-  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
-});
-
-</script>
 
 <!-- dynamically load mathjax for compatibility with self-contained -->
 <script>
diff --git a/inst/doc/datatable-reference-semantics.R b/inst/doc/datatable-reference-semantics.R
index ebfdb79..0e16fc7 100644
--- a/inst/doc/datatable-reference-semantics.R
+++ b/inst/doc/datatable-reference-semantics.R
@@ -2,14 +2,13 @@
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-# options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 
-## ----echo=FALSE-----------------------------------------------------------------------------------
-options(width=100)
+## ----echo = FALSE---------------------------------------------------------------------------------
+options(width = 100L)
 
 ## -------------------------------------------------------------------------------------------------
 flights <- fread("flights14.csv")
@@ -17,29 +16,29 @@ flights
 dim(flights)
 
 ## -------------------------------------------------------------------------------------------------
-DF = data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18)
+DF = data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
 DF
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  DF$c <- 18:13               # (1) -- replace entire column
 #  # or
 #  DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c'
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  DT[, c("colA", "colB", ...) := list(valA, valB, ...)]
 #  
 #  # when you have only one column to assign to you
 #  # can drop the quotes and list(), for convenience
 #  DT[, colA := valA]
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  DT[, `:=`(colA = valA, # valA is assigned to colA
 #            colB = valB, # valB is assigned to colB
 #            ...
 #  )]
 
 ## -------------------------------------------------------------------------------------------------
-flights[, `:=`(speed = distance / (air_time/60), # speed in km/hr
+flights[, `:=`(speed = distance / (air_time/60), # speed in mph (mi/h)
                delay = arr_delay + dep_delay)]   # delay in minutes
 head(flights)
 
@@ -68,11 +67,11 @@ head(flights)
 ## or using the functional form
 # flights[, `:=`(delay = NULL)]
 
-## ----eval=FALSE-----------------------------------------------------------------------------------
+## ----eval = FALSE---------------------------------------------------------------------------------
 #  flights[, delay := NULL]
 
 ## -------------------------------------------------------------------------------------------------
-flights[, max_speed := max(speed), by=.(origin, dest)]
+flights[, max_speed := max(speed), by = .(origin, dest)]
 head(flights)
 
 ## -------------------------------------------------------------------------------------------------
@@ -89,7 +88,7 @@ head(flights)
 ## -------------------------------------------------------------------------------------------------
 foo <- function(DT) {
   DT[, speed := distance / (air_time/60)]
-  DT[, .(max_speed = max(speed)), by=month]
+  DT[, .(max_speed = max(speed)), by = month]
 }
 ans = foo(flights)
 head(flights)
@@ -100,28 +99,28 @@ flights[, speed := NULL]
 
 ## -------------------------------------------------------------------------------------------------
 foo <- function(DT) {
-  DT <- copy(DT)                             ## deep copy
-  DT[, speed := distance / (air_time/60)]    ## doesn't affect 'flights'
-  DT[, .(max_speed = max(speed)), by=month]
+  DT <- copy(DT)                              ## deep copy
+  DT[, speed := distance / (air_time/60)]     ## doesn't affect 'flights'
+  DT[, .(max_speed = max(speed)), by = month]
 }
 ans <- foo(flights)
 head(flights)
 head(ans)
 
 ## -------------------------------------------------------------------------------------------------
-DT = data.table(x=1, y=2)
+DT = data.table(x = 1L, y = 2L)
 DT_n = names(DT)
 DT_n
 
 ## add a new column by reference
-DT[, z := 3]
+DT[, z := 3L]
 
 ## DT_n also gets updated
 DT_n
 
 ## use `copy()`
 DT_n = copy(names(DT))
-DT[, w := 4]
+DT[, w := 4L]
 
 ## DT_n doesn't get updated
 DT_n
diff --git a/inst/doc/datatable-reference-semantics.Rmd b/inst/doc/datatable-reference-semantics.Rmd
index 2a66bc6..1854e8b 100644
--- a/inst/doc/datatable-reference-semantics.Rmd
+++ b/inst/doc/datatable-reference-semantics.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Reference semantics"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Reference semantics}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,13 +13,12 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-# options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
-This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familar with these concepts, please read the *"Introduction to data.table"* vignette first.
+This vignette discusses *data.table*'s reference semantics, which allow you to *add/update/delete* columns of a *data.table by reference*, and to combine these operations with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the *"Introduction to data.table"* vignette first.
 
 ***
 
@@ -30,8 +26,8 @@ This vignette discusses *data.table*'s reference semantics which allows to *add/
 
 We will use the same `flights` data as in the *"Introduction to data.table"* vignette.
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ```{r}
@@ -59,13 +55,13 @@ All the operations we have seen so far in the previous vignette resulted in a ne
 Before we look at *reference semantics*, consider the *data.frame* shown below:
 
 ```{r}
-DF = data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18)
+DF = data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
 DF
 ```
 
 When we did:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 DF$c <- 18:13               # (1) -- replace entire column
 # or
 DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c'
@@ -86,21 +82,21 @@ With *data.table's* `:=` operator, absolutely no copies are made in *both* (1) a
 
 ### b) The `:=` operator
 
-It can be used in `j` in two ways: 
+It can be used in `j` in two ways:
 
 (a) The `LHS := RHS` form
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     DT[, c("colA", "colB", ...) := list(valA, valB, ...)]
 
-    # when you have only one column to assign to you 
+    # when you have only one column to assign to you
     # can drop the quotes and list(), for convenience
     DT[, colA := valA]
 	  ```
 
 (b) The functional form
 
-	```{r eval=FALSE}
+	```{r eval = FALSE}
 	DT[, `:=`(colA = valA, # valA is assigned to colA
 	          colB = valB, # valB is assigned to colB
 	          ...
@@ -111,15 +107,15 @@ It can be used in `j` in two ways:
 
 Note that the code above explains how `:=` can be used. These are not working examples. We will start using `:=` on the `flights` *data.table* from the next section.
 
-# 
+#
 
 #### {.bs-callout .bs-callout-info}
 
-* Form (a) is usually easy to program with and is particularly useful when you don't know the columns to assign values to in advance.
+* In (a), `LHS` takes a character vector of column names and `RHS` a *list of values*. `RHS` just needs to be a `list`, irrespective of how it's generated (e.g., using `lapply()`, `list()`, `mget()`, `mapply()`, etc.). This form is usually easy to program with and is particularly useful when you don't know the columns to assign values to in advance.
 
-* On the other hand, form (b) is handy if you would like to jot some comments down for later.
+* On the other hand, (b) is handy if you would like to jot some comments down for later.
 
-* The result is returned *invisibly*. 
+* The result is returned *invisibly*.
 
 * Since `:=` is available in `j`, we can combine it with `i` and `by` operations just like the aggregation operations we saw in the previous vignette.
 
@@ -136,7 +132,7 @@ For the rest of the vignette, we will work with `flights` *data.table*.
 #### -- How can we add columns *speed* and *total delay* of each flight to `flights` *data.table*?
 
 ```{r}
-flights[, `:=`(speed = distance / (air_time/60), # speed in km/hr
+flights[, `:=`(speed = distance / (air_time/60), # speed in mph (mi/h)
                delay = arr_delay + dep_delay)]   # delay in minutes
 head(flights)
 
@@ -148,7 +144,7 @@ head(flights)
 
 * We did not have to assign the result back to `flights`.
 
-* The `flights` *data.table* now contains the two newly added columns. This is what we mean by *added by reference*. 
+* The `flights` *data.table* now contains the two newly added columns. This is what we mean by *added by reference*.
 
 * We used the functional form so that we could add comments on the side to explain what the computation does. You can also see the `LHS := RHS` form (commented).
 
@@ -190,6 +186,12 @@ Let's look at all the `hours` to verify.
 flights[, sort(unique(hour))]
 ```
 
+#### Exercise: {.bs-callout .bs-callout-warning #update-by-reference-question}
+
+What is the difference between `flights[hour == 24L, hour := 0L]` and `flights[hour == 24L][, hour := 0L]`? Hint: The latter needs an assignment (`<-`) if you want to use the result later.
+
+If you can't figure it out, have a look at the `Note` section of `?":="`.
+
 ### c) Delete column by reference
 
 #### -- Remove `delay` column
@@ -210,7 +212,7 @@ head(flights)
 
 * When there is just one column to delete, we can drop the `c()` and double quotes and just use the column name *unquoted*, for convenience. That is:
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     flights[, delay := NULL]
     ```
 
@@ -223,7 +225,7 @@ We have already seen the use of `i` along with `:=` in [Section 2b](#ref-i-j). L
 #### -- How can we add a new column which contains for each `orig,dest` pair the maximum speed?
 
 ```{r}
-flights[, max_speed := max(speed), by=.(origin, dest)]
+flights[, max_speed := max(speed), by = .(origin, dest)]
 head(flights)
 ```
 
@@ -275,7 +277,7 @@ Let's say we would like to create a function that would return the *maximum spee
 ```{r}
 foo <- function(DT) {
   DT[, speed := distance / (air_time/60)]
-  DT[, .(max_speed = max(speed)), by=month]
+  DT[, .(max_speed = max(speed)), by = month]
 }
 ans = foo(flights)
 head(flights)
@@ -299,10 +301,10 @@ The `copy()` function *deep* copies the input object and therefore any subsequen
 
 There are two particular places where the `copy()` function is essential:
 
-1. Contrary to the situation we have seen in the previous point, we may not want the input data.table to a function to be modified *by reference*. As an example, let's consider the task in the previous section, except we don't want to modify `flghts` by reference. 
+1. Contrary to the situation we have seen in the previous point, we may not want the data.table passed to a function to be modified *by reference*. As an example, let's consider the task in the previous section, except we don't want to modify `flights` by reference.
 
     Let's first delete the `speed` column we generated in the previous section.
-    
+
     ```{r}
     flights[, speed := NULL]
     ```
@@ -310,9 +312,9 @@ There are two particular places where `copy()` function is essential:
 
     ```{r}
     foo <- function(DT) {
-      DT <- copy(DT)                             ## deep copy
-      DT[, speed := distance / (air_time/60)]    ## doesn't affect 'flights'
-      DT[, .(max_speed = max(speed)), by=month]
+      DT <- copy(DT)                              ## deep copy
+      DT[, speed := distance / (air_time/60)]     ## doesn't affect 'flights'
+      DT[, .(max_speed = max(speed)), by = month]
     }
     ans <- foo(flights)
     head(flights)
@@ -332,19 +334,19 @@ However we could improve this functionality further by *shallow* copying instead
 2. When we store the column names in a variable, e.g., `DT_n = names(DT)`, and then *add/update/delete* column(s) *by reference*, it would also modify `DT_n`, unless we do `copy(names(DT))`.
 
     ```{r}
-    DT = data.table(x=1, y=2)
+    DT = data.table(x = 1L, y = 2L)
     DT_n = names(DT)
     DT_n
 
     ## add a new column by reference
-    DT[, z := 3]
+    DT[, z := 3L]
 
     ## DT_n also gets updated
     DT_n
 
     ## use `copy()`
     DT_n = copy(names(DT))
-    DT[, w := 4]
+    DT[, w := 4L]
 
     ## DT_n doesn't get updated
     DT_n
@@ -360,9 +362,9 @@ However we could improve this functionality further by *shallow* copying instead
 
 * We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference.
 
-# 
+#
 
-So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette *"Keys and fast binary search based subset"* to peform *blazing fast subsets* by *keying data.tables*. 
+So far we have seen a whole lot in `j`, how to combine it with `by`, and a little of `i`. Let's turn our attention back to `i` in the next vignette *"Keys and fast binary search based subset"* to perform *blazing fast subsets* by *keying data.tables*.
 
 ***
 
diff --git a/inst/doc/datatable-reference-semantics.html b/inst/doc/datatable-reference-semantics.html
index 146dfc3..6a3e201 100644
--- a/inst/doc/datatable-reference-semantics.html
+++ b/inst/doc/datatable-reference-semantics.html
@@ -8,50 +8,13 @@
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="generator" content="pandoc" />
 
+<meta name="viewport" content="width=device-width, initial-scale=1">
 
-<meta name="date" content="2015-09-18" />
 
-<title>Reference semantics</title>
+<meta name="date" content="2016-12-02" />
 
-<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4wIHwgKGMpIDIwMDUsIDIwMTQgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
-<meta name="viewport" content="width=device-width, initial-scale=1" />
-<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjEgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE0IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgTUlUIChodHRwczovL2dpdGh1Yi5jb20vdHdicy9ib290c3RyYXAvYmxvYi9tYXN0ZXIvTElDRU5TRSkKICovCmlmKCJ1bmRlZmluZWQiPT10eXBlb2YgalF1ZXJ5KXRocm93IG5ldyBFcnJvcigiQm9vdHN0cmFwJ3MgSmF2YVNjcmlwdCByZXF1aXJlcyBqUXVlcnkiKTsrZnVuY3Rpb24oYSl7dmFyIGI9YS5mbi5qcXVlcnkuc3BsaXQoIiAiKVswXS5zcGxpdCgiLiIpO2lmKGJbMF08MiYmYlsxXTw5fH [...]
-<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
-<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG9jdW1lbnRFbGVtZW50LGQ9Yy5maXJzdEVsZW1lbn [...]
-<style type="text/css">
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 400;
-  src: url(fonts/OpenSans.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 700;
-  src: url(fonts/OpenSansBold.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 400;
-  src: url(fonts/OpenSansItalic.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 700;
-  src: url(fonts/OpenSansBoldItalic.ttf) format('truetype');
-}
+<title>Reference semantics</title>
 
-/*!
- * bootswatch v3.3.1+1
- * Homepage: http://bootswatch.com
- * Copyright 2012-2014 Thomas Park
- * Licensed under MIT
- * Based on Bootstrap
-*//*! normalize.css v3.0.2 | MIT License | git.io/normalize */html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}body{margin:0}article,aside,details,figcaption,figure,footer,header,hgroup,main,menu,nav,section,summary{display:block}audio,canvas,progress,video{display:inline-block;vertical-align:baseline}audio:not([controls]){display:none;height:0}[hidden],template{display:none}a{background-color:transparent}a:active,a:hover{outline:0}abbr[title]{border-bo [...]
-</style>
 
 
 <style type="text/css">code{white-space: pre;}</style>
@@ -92,44 +55,24 @@ code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Ann
 code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
 code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
 </style>
-<style type="text/css">
-  pre:not([class]) {
-    background-color: white;
-  }
-</style>
 
 
-<link href="data:text/css;charset=utf-8,code%2C%0Akbd%2C%0Apre%2C%0Asamp%20%7B%0Afont%2Dfamily%3A%20Source%20Code%20Pro%2C%20Inconsolata%2C%20Monaco%2C%20Consolas%2C%20Menlo%2C%20Courier%20New%2C%20monospace%3B%0A%7D%0Acode%20%7B%0Apadding%3A%200px%202px%3B%0Afont%2Dsize%3A%2090%25%3B%0Acolor%3A%20%23c7254e%3B%0Awhite%2Dspace%3A%20nowrap%3B%0Abackground%2Dcolor%3A%20%23f9f2f4%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%200px%3B%0A%7D%0Apre%20%7B%0Adisplay%3A%20block%3B%0Apadding%3A%209% [...]
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
 
 </head>
 
 <body>
 
-<style type="text/css">
-.main-container {
-  max-width: 940px;
-  margin-left: auto;
-  margin-right: auto;
-}
-code {
-  color: inherit;
-  background-color: rgba(0, 0, 0, 0.04);
-}
-img { 
-  max-width:100%; 
-  height: auto; 
-}
-</style>
-<div class="container-fluid main-container">
 
 
-<div id="header">
-<h1 class="title">Reference semantics</h1>
-<h4 class="date"><em>2015-09-18</em></h4>
-</div>
 
+<h1 class="title toc-ignore">Reference semantics</h1>
+<h4 class="date"><em>2016-12-02</em></h4>
 
-<p>This vignette discusses <em>data.table</em>’s reference semantics which allows to <em>add/update/delete</em> columns of a <em>data.table by reference</em>, and also combine them with <code>i</code> and <code>by</code>. It is aimed at those who are already familiar with <em>data.table</em> syntax, its general form, how to subset rows in <code>i</code>, select and compute on columns, and perform aggregations by group. If you’re not familar with these concepts, please read the <em>“Intro [...]
+
+
+<p>This vignette discusses <em>data.table</em>’s reference semantics, which allow you to <em>add/update/delete</em> columns of a <em>data.table by reference</em>, and to combine these operations with <code>i</code> and <code>by</code>. It is aimed at those who are already familiar with <em>data.table</em> syntax, its general form, how to subset rows in <code>i</code>, select and compute on columns, and perform aggregations by group. If you’re not familiar with these concepts, please read the <em>“Intr [...]
 <hr />
 <div id="data" class="section level2">
 <h2>Data</h2>
@@ -166,7 +109,7 @@ flights
 <div id="a-background" class="section level3">
 <h3>a) Background</h3>
 <p>Before we look at <em>reference semantics</em>, consider the <em>data.frame</em> shown below:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">ID =</span> <span class="kw">c</span>(<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"a"</span>,<span class="st">"a"</span>,<span class="st">"c"</span>), <span class="dt">a =</span> <span class="dv">1</span>:<span class= [...]
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DF =<span class="st"> </span><span class="kw">data.frame</span>(<span class="dt">ID =</span> <span class="kw">c</span>(<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"b"</span>,<span class="st">"a"</span>,<span class="st">"a"</span>,<span class="st">"c"</span>), <span class="dt">a =</span> <span class="dv">1</span>:<span class= [...]
 DF
 <span class="co">#   ID a  b  c</span>
 <span class="co"># 1  b 1  7 13</span>
@@ -198,7 +141,7 @@ DF$c[DF$ID ==<span class="st"> "b"</span>] <-<span class="st"> </sp
 <li><p>The <code>LHS := RHS</code> form</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT[, <span class="kw">c</span>(<span class="st">"colA"</span>, <span class="st">"colB"</span>, ...) :<span class="er">=</span><span class="st"> </span><span class="kw">list</span>(valA, valB, ...)]
 
-<span class="co"># when you have only one column to assign to you </span>
+<span class="co"># when you have only one column to assign to you</span>
 <span class="co"># can drop the quotes and list(), for convenience</span>
 DT[, colA :<span class="er">=</span><span class="st"> </span>valA]</code></pre></div></li>
 <li><p>The functional form</p>
@@ -218,8 +161,8 @@ DT[, colA :<span class="er">=</span><span class="st"> </span>valA]</code></pre><
 <div id="section-3" class="section level4 bs-callout bs-callout-info">
 <h4></h4>
 <ul>
-<li><p>Form (a) is usually easy to program with and is particularly useful when you don’t know the columns to assign values to in advance.</p></li>
-<li><p>On the other hand, form (b) is handy if you would like to jot some comments down for later.</p></li>
+<li><p>In (a), <code>LHS</code> takes a character vector of column names and <code>RHS</code> a <em>list of values</em>. <code>RHS</code> just needs to be a <code>list</code>, irrespective of how it’s generated (e.g., using <code>lapply()</code>, <code>list()</code>, <code>mget()</code>, <code>mapply()</code>, etc.). This form is usually easy to program with and is particularly useful when you don’t know the columns to assign values to in advance.</p></li>
+<li><p>On the other hand, (b) is handy if you would like to jot some comments down for later.</p></li>
 <li><p>The result is returned <em>invisibly</em>.</p></li>
 <li><p>Since <code>:=</code> is available in <code>j</code>, we can combine it with <code>i</code> and <code>by</code> operations just like the aggregation operations we saw in the previous vignette.</p></li>
 </ul>
@@ -235,8 +178,32 @@ DT[, colA :<span class="er">=</span><span class="st"> </span>valA]</code></pre><
 <h3>a) Add columns by reference</h3>
 <div id="how-can-we-add-columns-speed-and-total-delay-of-each-flight-to-flights-data.table" class="section level4">
 <h4>– How can we add columns <em>speed</em> and <em>total delay</em> of each flight to <code>flights</code> <em>data.table</em>?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, <span class="st">`</span><span class="dt">:=</span><span class="st">`</span>(<span class="dt">speed =</span> distance /<span class="st"> </span>(air_time/<span class="dv">60</span>), <span class="co"># speed in km/hr</span>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, <span class="st">`</span><span class="dt">:=</span><span class="st">`</span>(<span class="dt">speed =</span> distance /<span class="st"> </span>(air_time/<span class="dv">60</span>), <span class="co"># speed in mph (mi/h)</span>
                <span class="dt">delay =</span> arr_delay +<span class="st"> </span>dep_delay)]   <span class="co"># delay in minutes</span>
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11 409.0909</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19 423.0769</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7 395.5414</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13 424.2857</span>
+<span class="co">#     ---                                                                                       </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14 422.6866</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8 444.4444</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11 311.5663</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11 401.6000</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8 359.4545</span>
+<span class="co">#         delay</span>
+<span class="co">#      1:    27</span>
+<span class="co">#      2:    10</span>
+<span class="co">#      3:    11</span>
+<span class="co">#      4:   -34</span>
+<span class="co">#      5:     3</span>
+<span class="co">#     ---      </span>
+<span class="co"># 253312:   -29</span>
+<span class="co"># 253313:   -19</span>
+<span class="co"># 253314:     8</span>
+<span class="co"># 253315:    11</span>
+<span class="co"># 253316:    -4</span>
 <span class="kw">head</span>(flights)
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed delay</span>
 <span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490    27</span>
@@ -268,7 +235,31 @@ flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
 <div id="replace-those-rows-where-hour-24-with-the-value-0" class="section level4">
 <h4>– Replace those rows where <code>hour == 24</code> with the value <code>0</code></h4>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># subassign by reference</span>
-flights[hour ==<span class="st"> </span>24L, hour :<span class="er">=</span><span class="st"> </span>0L]</code></pre></div>
+flights[hour ==<span class="st"> </span>24L, hour :<span class="er">=</span><span class="st"> </span>0L]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11 409.0909</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19 423.0769</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7 395.5414</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13 424.2857</span>
+<span class="co">#     ---                                                                                       </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14 422.6866</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8 444.4444</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11 311.5663</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11 401.6000</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8 359.4545</span>
+<span class="co">#         delay</span>
+<span class="co">#      1:    27</span>
+<span class="co">#      2:    10</span>
+<span class="co">#      3:    11</span>
+<span class="co">#      4:   -34</span>
+<span class="co">#      5:     3</span>
+<span class="co">#     ---      </span>
+<span class="co"># 253312:   -29</span>
+<span class="co"># 253313:   -19</span>
+<span class="co"># 253314:     8</span>
+<span class="co"># 253315:    11</span>
+<span class="co"># 253316:    -4</span></code></pre></div>
 </div>
 <div id="section-5" class="section level4 bs-callout bs-callout-info">
 <h4></h4>
@@ -312,11 +303,28 @@ flights[hour ==<span class="st"> </span>24L, hour :<span class="er">=</span><spa
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># check again for '24'</span>
 flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
 <span class="co">#  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23</span></code></pre></div>
+<div id="update-by-reference-question" class="section level4 bs-callout bs-callout-warning">
+<h4>Exercise:</h4>
+<p>What is the difference between <code>flights[hour == 24L, hour := 0L]</code> and <code>flights[hour == 24L][, hour := 0L]</code>? Hint: The latter needs an assignment (<code><-</code>) if you want to use the result later.</p>
+<p>If you can’t figure it out, have a look at the <code>Note</code> section of <code>?":="</code>.</p>
+</div>
 <div id="c-delete-column-by-reference" class="section level3">
 <h3>c) Delete column by reference</h3>
 <div id="remove-delay-column" class="section level4">
 <h4>– Remove <code>delay</code> column</h4>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, <span class="kw">c</span>(<span class="st">"delay"</span>) :<span class="er">=</span><span class="st"> </span><span class="ot">NULL</span>]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11 409.0909</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19 423.0769</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7 395.5414</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13 424.2857</span>
+<span class="co">#     ---                                                                                       </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14 422.6866</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8 444.4444</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11 311.5663</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11 401.6000</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8 359.4545</span>
 <span class="kw">head</span>(flights)
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed</span>
 <span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490</span>
@@ -345,7 +353,31 @@ flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
 <p>We have already seen the use of <code>i</code> along with <code>:=</code> in <a href="#ref-i-j">Section 2b</a>. Let’s now see how we can use <code>:=</code> along with <code>by</code>.</p>
 <div id="how-can-we-add-a-new-column-which-contains-for-each-origdest-pair-the-maximum-speed" class="section level4">
 <h4>– How can we add a new column which contains for each <code>orig,dest</code> pair the maximum speed?</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, max_speed :<span class="er">=</span><span class="st"> </span><span class="kw">max</span>(speed), by=.(origin, dest)]
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, max_speed :<span class="er">=</span><span class="st"> </span><span class="kw">max</span>(speed), by =<span class="st"> </span>.(origin, dest)]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11 409.0909</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19 423.0769</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7 395.5414</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13 424.2857</span>
+<span class="co">#     ---                                                                                       </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14 422.6866</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8 444.4444</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11 311.5663</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11 401.6000</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8 359.4545</span>
+<span class="co">#         max_speed</span>
+<span class="co">#      1:  526.5957</span>
+<span class="co">#      2:  526.5957</span>
+<span class="co">#      3:  526.5957</span>
+<span class="co">#      4:  517.5000</span>
+<span class="co">#      5:  526.5957</span>
+<span class="co">#     ---          </span>
+<span class="co"># 253312:  508.7425</span>
+<span class="co"># 253313:  538.4615</span>
+<span class="co"># 253314:  445.8621</span>
+<span class="co"># 253315:  456.3636</span>
+<span class="co"># 253316:  434.5055</span>
 <span class="kw">head</span>(flights)
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed max_speed</span>
 <span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490  526.5957</span>
@@ -374,6 +406,30 @@ flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">in_cols  =<span class="st"> </span><span class="kw">c</span>(<span class="st">"dep_delay"</span>, <span class="st">"arr_delay"</span>)
 out_cols =<span class="st"> </span><span class="kw">c</span>(<span class="st">"max_dep_delay"</span>, <span class="st">"max_arr_delay"</span>)
 flights[, <span class="kw">c</span>(out_cols) :<span class="er">=</span><span class="st"> </span><span class="kw">lapply</span>(.SD, max), by =<span class="st"> </span>month, .SDcols =<span class="st"> </span>in_cols]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11 409.0909</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19 423.0769</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7 395.5414</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13 424.2857</span>
+<span class="co">#     ---                                                                                       </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14 422.6866</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8 444.4444</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11 311.5663</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11 401.6000</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8 359.4545</span>
+<span class="co">#         max_speed max_dep_delay max_arr_delay</span>
+<span class="co">#      1:  526.5957           973           996</span>
+<span class="co">#      2:  526.5957           973           996</span>
+<span class="co">#      3:  526.5957           973           996</span>
+<span class="co">#      4:  517.5000           973           996</span>
+<span class="co">#      5:  526.5957           973           996</span>
+<span class="co">#     ---                                      </span>
+<span class="co"># 253312:  508.7425          1498          1494</span>
+<span class="co"># 253313:  538.4615          1498          1494</span>
+<span class="co"># 253314:  445.8621          1498          1494</span>
+<span class="co"># 253315:  456.3636          1498          1494</span>
+<span class="co"># 253316:  434.5055          1498          1494</span>
 <span class="kw">head</span>(flights)
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour    speed max_speed</span>
 <span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9 413.6490  526.5957</span>
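[Editor's note: the vignette output above shows `lapply(.SD, max)` with `.SDcols` adding several per-group columns at once. A minimal, self-contained sketch of the same idiom, using toy data in place of the flights dataset (values are illustrative, not from the real data):]

```r
library(data.table)

# toy stand-in for flights: two months, two delay columns
DT = data.table(month     = c(1L, 1L, 2L, 2L),
                dep_delay = c(14L, -3L, 2L, -8L),
                arr_delay = c(13L, 13L, 9L, -26L))

in_cols  = c("dep_delay", "arr_delay")
out_cols = c("max_dep_delay", "max_arr_delay")

# .SD is the Subset of Data for each group, restricted to .SDcols;
# lapply(.SD, max) yields one max per input column, recycled across the
# group's rows, and := assigns them to out_cols by reference (no copy).
DT[, c(out_cols) := lapply(.SD, max), by = month, .SDcols = in_cols]

DT
#    month dep_delay arr_delay max_dep_delay max_arr_delay
# 1:     1        14        13            14            13
# 2:     1        -3        13            14            13
# 3:     2         2         9             2             9
# 4:     2        -8       -26             2             9
```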
@@ -405,6 +461,18 @@ flights[, <span class="kw">c</span>(out_cols) :<span class="er">=</span><span cl
 <p>Before moving on to the next section, let’s clean up the newly created columns <code>speed</code>, <code>max_speed</code>, <code>max_dep_delay</code> and <code>max_arr_delay</code>.</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># RHS gets automatically recycled to length of LHS</span>
 flights[, <span class="kw">c</span>(<span class="st">"speed"</span>, <span class="st">"max_speed"</span>, <span class="st">"max_dep_delay"</span>, <span class="st">"max_arr_delay"</span>) :<span class="er">=</span><span class="st"> </span><span class="ot">NULL</span>]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co">#     ---                                                                              </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8</span>
 <span class="kw">head</span>(flights)
 <span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
 <span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
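[Editor's note: the hunk above deletes several columns at once with `:= NULL`. A tiny standalone sketch of that idiom (hypothetical columns, not the flights data):]

```r
library(data.table)

DT = data.table(x = 1:3, y = 4:6, z = 7:9)

# Assigning NULL to a character vector of column names removes those
# columns by reference; the NULL RHS is recycled to the length of the LHS.
DT[, c("y", "z") := NULL]

names(DT)
# [1] "x"
```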
@@ -421,7 +489,7 @@ flights[, <span class="kw">c</span>(<span class="st">"speed"</span>, <
 <p>Let’s say we would like to create a function that would return the <em>maximum speed</em> for each month. But at the same time, we would also like to add the column <code>speed</code> to <em>flights</em>. We could write a simple function as follows:</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">foo <-<span class="st"> </span>function(DT) {
   DT[, speed :<span class="er">=</span><span class="st"> </span>distance /<span class="st"> </span>(air_time/<span class="dv">60</span>)]
-  DT[, .(<span class="dt">max_speed =</span> <span class="kw">max</span>(speed)), by=month]
+  DT[, .(<span class="dt">max_speed =</span> <span class="kw">max</span>(speed)), by =<span class="st"> </span>month]
 }
 ans =<span class="st"> </span><span class="kw">foo</span>(flights)
 <span class="kw">head</span>(flights)
@@ -462,14 +530,26 @@ ans =<span class="st"> </span><span class="kw">foo</span>(flights)
 <h1></h1>
 <p>There are two particular places where <code>copy()</code> function is essential:</p>
 <ol style="list-style-type: decimal">
-<li><p>Contrary to the situation we have seen in the previous point, we may not want the input data.table to a function to be modified <em>by reference</em>. As an example, let’s consider the task in the previous section, except we don’t want to modify <code>flghts</code> by reference.</p>
+<li><p>Contrary to the situation we have seen in the previous point, we may not want the input data.table to a function to be modified <em>by reference</em>. As an example, let’s consider the task in the previous section, except we don’t want to modify <code>flights</code> by reference.</p>
 <p>Let’s first delete the <code>speed</code> column we generated in the previous section.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, speed :<span class="er">=</span><span class="st"> </span><span class="ot">NULL</span>]</code></pre></div>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, speed :<span class="er">=</span><span class="st"> </span><span class="ot">NULL</span>]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co">#     ---                                                                              </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8</span></code></pre></div>
 <p>Now, we could accomplish the task as follows:</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">foo <-<span class="st"> </span>function(DT) {
-  DT <-<span class="st"> </span><span class="kw">copy</span>(DT)                             ## deep copy
-  DT[, speed :<span class="er">=</span><span class="st"> </span>distance /<span class="st"> </span>(air_time/<span class="dv">60</span>)]    ## doesn't affect 'flights'
-  DT[, .(<span class="dt">max_speed =</span> <span class="kw">max</span>(speed)), by=month]
+  DT <-<span class="st"> </span><span class="kw">copy</span>(DT)                              ## deep copy
+  DT[, speed :<span class="er">=</span><span class="st"> </span>distance /<span class="st"> </span>(air_time/<span class="dv">60</span>)]     ## doesn't affect 'flights'
+  DT[, .(<span class="dt">max_speed =</span> <span class="kw">max</span>(speed)), by =<span class="st"> </span>month]
 }
 ans <-<span class="st"> </span><span class="kw">foo</span>(flights)
 <span class="kw">head</span>(flights)
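[Editor's note: the `foo()` shown above relies on `copy()` so the caller's table survives the `:=`. A runnable miniature of the same pattern, with a toy table standing in for `flights` (values illustrative):]

```r
library(data.table)

flights_toy = data.table(month    = c(1L, 1L),
                         distance = c(2475, 1035),
                         air_time = c(359, 157))

foo = function(DT) {
  DT = copy(DT)                               # deep copy: caller's table untouched
  DT[, speed := distance / (air_time / 60)]   # := acts only on the local copy
  DT[, .(max_speed = max(speed)), by = month]
}

ans = foo(flights_toy)
"speed" %in% names(flights_toy)
# [1] FALSE  -- the original table was not modified by reference
```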
@@ -502,13 +582,15 @@ ans <-<span class="st"> </span><span class="kw">foo</span>(flights)
 <h1></h1>
 <ol start="2" style="list-style-type: decimal">
 <li><p>When we store the column names on to a variable, e.g., <code>DT_n = names(DT)</code>, and then <em>add/update/delete</em> column(s) <em>by reference</em>. It would also modify <code>DT_n</code>, unless we do <code>copy(names(DT))</code>.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">x=</span><span class="dv">1</span>, <span class="dt">y=</span><span class="dv">2</span>)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">x =</span> 1L, <span class="dt">y =</span> 2L)
 DT_n =<span class="st"> </span><span class="kw">names</span>(DT)
 DT_n
 <span class="co"># [1] "x" "y"</span>
 
 ## add a new column by reference
-DT[, z :<span class="er">=</span><span class="st"> </span><span class="dv">3</span>]
+DT[, z :<span class="er">=</span><span class="st"> </span>3L]
+<span class="co">#    x y z</span>
+<span class="co"># 1: 1 2 3</span>
 
 ## DT_n also gets updated
 DT_n
@@ -516,7 +598,9 @@ DT_n
 
 ## use `copy()`
 DT_n =<span class="st"> </span><span class="kw">copy</span>(<span class="kw">names</span>(DT))
-DT[, w :<span class="er">=</span><span class="st"> </span><span class="dv">4</span>]
+DT[, w :<span class="er">=</span><span class="st"> </span>4L]
+<span class="co">#    x y z w</span>
+<span class="co"># 1: 1 2 3 4</span>
 
 ## DT_n doesn't get updated
 DT_n
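[Editor's note: the hunk above shows that a variable bound via `DT_n = names(DT)` tracks by-reference column additions unless snapshotted with `copy()`. A compact sketch contrasting the two bindings side by side:]

```r
library(data.table)

DT = data.table(x = 1L, y = 2L)

DT_live = names(DT)        # bound to DT's internal names vector
DT_snap = copy(names(DT))  # decoupled snapshot

DT[, z := 3L]              # add a column by reference

DT_live
# [1] "x" "y" "z"   -- updated along with DT
DT_snap
# [1] "x" "y"       -- unchanged
```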
@@ -536,21 +620,11 @@ DT_n
 </div>
 <div id="section-16" class="section level1">
 <h1></h1>
-<p>So far we have seen a whole lot in <code>j</code>, and how to combine it with <code>by</code> and little of <code>i</code>. Let’s turn our attention back to <code>i</code> in the next vignette <em>“Keys and fast binary search based subset”</em> to peform <em>blazing fast subsets</em> by <em>keying data.tables</em>.</p>
+<p>So far we have seen a whole lot in <code>j</code>, and how to combine it with <code>by</code> and little of <code>i</code>. Let’s turn our attention back to <code>i</code> in the next vignette <em>“Keys and fast binary search based subset”</em> to perform <em>blazing fast subsets</em> by <em>keying data.tables</em>.</p>
 <hr />
 </div>
 
 
-</div>
-
-<script>
-
-// add bootstrap table styles to pandoc tables
-$(document).ready(function () {
-  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
-});
-
-</script>
 
 <!-- dynamically load mathjax for compatibility with self-contained -->
 <script>
diff --git a/inst/doc/datatable-reshape.R b/inst/doc/datatable-reshape.R
index 884d818..39056ed 100644
--- a/inst/doc/datatable-reshape.R
+++ b/inst/doc/datatable-reshape.R
@@ -2,29 +2,29 @@
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 
-## ----echo=FALSE-----------------------------------------------------------------------------------
-options(width=100)
+## ----echo = FALSE---------------------------------------------------------------------------------
+options(width = 100L)
 
 ## -------------------------------------------------------------------------------------------------
 DT = fread("melt_default.csv")
-DT 
+DT
 ## dob stands for date of birth.
 
 str(DT)
 
 ## -------------------------------------------------------------------------------------------------
-DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"), 
+DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"),
                 measure.vars = c("dob_child1", "dob_child2", "dob_child3"))
 DT.m1
 str(DT.m1)
 
 ## -------------------------------------------------------------------------------------------------
-DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"), 
+DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
                variable.name = "child", value.name = "dob")
 DT.m1
 
@@ -37,19 +37,19 @@ dcast(DT.m1, family_id ~ ., fun.agg = function(x) sum(!is.na(x)), value.var = "d
 ## -------------------------------------------------------------------------------------------------
 DT = fread("melt_enhanced.csv")
 DT
-## 1 = female, 2 = male 
+## 1 = female, 2 = male
 
 ## -------------------------------------------------------------------------------------------------
 DT.m1 = melt(DT, id = c("family_id", "age_mother"))
-DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed=TRUE)]
+DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)]
 DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value")
 DT.c1
 
 str(DT.c1) ## gender column is character type now!
 
 ## -------------------------------------------------------------------------------------------------
-colA = paste("dob_child", 1:3, sep="")
-colB = paste("gender_child", 1:3, sep="")
+colA = paste("dob_child", 1:3, sep = "")
+colB = paste("gender_child", 1:3, sep = "")
 DT.m2 = melt(DT, measure = list(colA, colB), value.name = c("dob", "gender"))
 DT.m2
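[Editor's note: the reshape hunks above exercise the enhanced `melt` (a list passed to `measure.vars`) and the multi-`value.var` `dcast`. A self-contained round trip on hypothetical data standing in for `melt_enhanced.csv` (two children per family rather than three, values invented):]

```r
library(data.table)

DT = data.table(family_id     = 1:2,
                age_mother    = c(30L, 27L),
                dob_child1    = c("1998-11-26", "1996-06-22"),
                dob_child2    = c("2000-01-29", NA),
                gender_child1 = c(1L, 2L),
                gender_child2 = c(2L, NA))

colA = paste("dob_child",    1:2, sep = "")
colB = paste("gender_child", 1:2, sep = "")

# melt both column sets simultaneously; each set keeps its own type
DT.m2 = melt(DT, measure = list(colA, colB), value.name = c("dob", "gender"))

# cast back with multiple value.var (available from v1.9.6)
DT.c2 = dcast(DT.m2, family_id + age_mother ~ variable,
              value.var = c("dob", "gender"))
```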
 
diff --git a/inst/doc/datatable-reshape.Rmd b/inst/doc/datatable-reshape.Rmd
index bfde7d4..d731601 100644
--- a/inst/doc/datatable-reshape.Rmd
+++ b/inst/doc/datatable-reshape.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Efficient reshaping using data.tables"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Efficient reshaping using data.tables}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,36 +13,36 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
 
 This vignette discusses the default usage of reshaping functions `melt` (wide to long) and `dcast` (long to wide) for *data.tables* as well as the **new extended functionalities** of melting and casting on *multiple columns* available from `v1.9.6`.
 
 ***
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ## Data
 
-We will load the data sets directly within sections. 
+We will load the data sets directly within sections.
 
 ## Introduction
 
-The `melt` and `dcast` functions for *data.tables* are extensions of the corresponding functions from the [reshape2](http://cran.r-project.org/package=reshape2) package.  
+The `melt` and `dcast` functions for *data.tables* are extensions of the corresponding functions from the [reshape2](https://cran.r-project.org/package=reshape2) package.
 
 In this vignette, we will
 
 1. first briefly look at the default *melting* and *casting* of *data.tables* to convert them from *wide* to *long* format and vice versa,
 
-2. then look at scenarios where the current functionalities becomes cumbersome and inefficient, 
+2. then look at scenarios where the current functionalities becomes cumbersome and inefficient,
 
 3. and finally look at the new improvements to both `melt` and `dcast` methods for *data.tables* to handle multiple columns simultaneously.
-    
+
 The extended functionalities are in line with *data.table's* philosophy of performing operations efficiently and in a straightforward manner.
 
 #### Note: {.bs-callout .bs-callout-info}
@@ -61,7 +58,7 @@ Suppose we have a `data.table` (artificial data) as shown below:
 
 ```{r}
 DT = fread("melt_default.csv")
-DT 
+DT
 ## dob stands for date of birth.
 
 str(DT)
@@ -76,7 +73,7 @@ str(DT)
 We could accomplish this using `melt()` by specifying `id.vars` and `measure.vars` arguments as follows:
 
 ```{r}
-DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"), 
+DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"),
                 measure.vars = c("dob_child1", "dob_child2", "dob_child3"))
 DT.m1
 str(DT.m1)
@@ -88,7 +85,7 @@ str(DT.m1)
 
 * We can also specify column *indices* instead of *names*.
 
-* By default, `variable` column is of type `factor`. Set `variable.factor` argument to `FALSE` if you'd like to return a *character* vector instead. `variable.factor` argument is only available in `melt` from `data.table` and not in the [`reshape2` package](http://github.com/hadley/reshape).
+* By default, `variable` column is of type `factor`. Set `variable.factor` argument to `FALSE` if you'd like to return a *character* vector instead. `variable.factor` argument is only available in `melt` from `data.table` and not in the [`reshape2` package](https://github.com/hadley/reshape).
 
 * By default, the molten columns are automatically named `variable` and `value`.
 
@@ -97,17 +94,17 @@ str(DT.m1)
 #
 
 #### - Name the `variable` and `value` columns to `child` and `dob` respectively
- 
+
 
 ```{r}
-DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"), 
+DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
                variable.name = "child", value.name = "dob")
 DT.m1
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* By default, when one of `id.vars` or `measure.vars` is missing, the rest of the columns are *automatically assigned* to the missing argument. 
+* By default, when one of `id.vars` or `measure.vars` is missing, the rest of the columns are *automatically assigned* to the missing argument.
 
 * When neither `id.vars` nor `measure.vars` are specified, as mentioned under `?melt`, all *non*-`numeric`, `integer`, `logical` columns will be assigned to `id.vars`.
 
@@ -117,7 +114,7 @@ DT.m1
 
 In the previous section, we saw how to get from wide form to long form. Let's see the reverse operation in this section.
 
-#### - How can we get back to the original data table `DT` from `DT.m`? 
+#### - How can we get back to the original data table `DT` from `DT.m`?
 
 That is, we'd like to collect all *child* observations corresponding to each `family_id, age_mother` together under the same row. We can accomplish it using `dcast` as follows:
 
@@ -127,16 +124,16 @@ dcast(DT.m1, family_id + age_mother ~ child, value.var = "dob")
 
 #### {.bs-callout .bs-callout-info}
 
-* `dcast` uses *formula* interface. The variables on the *LHS* of formula represents the *id* vars and *RHS* the *measure*  vars. 
+* `dcast` uses *formula* interface. The variables on the *LHS* of formula represents the *id* vars and *RHS* the *measure*  vars.
 
 * `value.var` denotes the column to be filled in with while casting to wide format.
 
 * `dcast` also tries to preserve attributes in result wherever possible.
- 
-# 
+
+#
 
 #### - Starting from `DT.m`, how can we get the number of children in each family?
- 
+
 You can also pass a function to aggregate by in `dcast` with the argument `fun.aggregate`. This is particularly essential when the formula provided does not identify single observation for each cell.
 
 ```{r}
@@ -154,14 +151,14 @@ However, there are situations we might run into where the desired operation is n
 ```{r}
 DT = fread("melt_enhanced.csv")
 DT
-## 1 = female, 2 = male 
-``` 
+## 1 = female, 2 = male
+```
 
-And you'd like to combine (melt) all the `dob` columns together, and `gender` columns together. Using the current functionalty, we can do something like this:
+And you'd like to combine (melt) all the `dob` columns together, and `gender` columns together. Using the current functionality, we can do something like this:
 
 ```{r}
 DT.m1 = melt(DT, id = c("family_id", "age_mother"))
-DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed=TRUE)]
+DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)]
 DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value")
 DT.c1
 
@@ -185,7 +182,7 @@ str(DT.c1) ## gender column is character type now!
 In fact, `base::reshape` is capable of performing this operation in a very straightforward manner. It is an extremely useful and often underrated function. You should definitely give it a try!
 
 ## 3. Enhanced (new) functionality
-		
+
 ### a) Enhanced `melt`
 
 Since we'd like for *data.tables* to perform this operation straightforward and efficient using the same interface, we went ahead and implemented an *additional functionality*, where we can `melt` to multiple columns *simultaneously*.
@@ -195,13 +192,13 @@ Since we'd like for *data.tables* to perform this operation straightforward and
 The idea is quite simple. We pass a list of columns to `measure.vars`, where each element of the list contains the columns that should be combined together.
 
 ```{r}
-colA = paste("dob_child", 1:3, sep="")
-colB = paste("gender_child", 1:3, sep="")
+colA = paste("dob_child", 1:3, sep = "")
+colB = paste("gender_child", 1:3, sep = "")
 DT.m2 = melt(DT, measure = list(colA, colB), value.name = c("dob", "gender"))
 DT.m2
 
 str(DT.m2) ## col type is preserved
-``` 
+```
 
 #### - Using `patterns()`
 
@@ -223,8 +220,8 @@ That's it!
 ### b) Enhanced `dcast`
 
 Okay great! We can now melt into multiple columns simultaneously. Now given the data set `DT.m2` as shown above, how can we get back to the same format as the original data we started with?
- 
-If we use the current functionality of `dcast`, then we'd have to cast twice and bind the results together. But that's once again verbose, not straightforward and is also inefficient. 
+
+If we use the current functionality of `dcast`, then we'd have to cast twice and bind the results together. But that's once again verbose, not straightforward and is also inefficient.
 
 #### - Casting multiple `value.var`s simultaneously
 
@@ -242,13 +239,13 @@ DT.c2
 
 * Everything is taken care of internally, and efficiently. In addition to being fast, it is also very memory efficient.
 
-# 
+#
 
 #### Multiple functions to `fun.aggregate`: {.bs-callout .bs-callout-info}
 
 You can also provide *multiple functions* to `fun.aggregate` to `dcast` for *data.tables*. Check the examples in `?dcast` which illustrates this functionality.
 
-# 
+#
 
 ***
 
diff --git a/inst/doc/datatable-reshape.html b/inst/doc/datatable-reshape.html
index 98a8776..28f3c87 100644
--- a/inst/doc/datatable-reshape.html
+++ b/inst/doc/datatable-reshape.html
@@ -8,50 +8,13 @@
 <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
 <meta name="generator" content="pandoc" />
 
+<meta name="viewport" content="width=device-width, initial-scale=1">
 
-<meta name="date" content="2015-09-18" />
+
+<meta name="date" content="2016-12-02" />
 
 <title>Efficient reshaping using data.tables</title>
 
-<script src="data:application/x-javascript;base64,LyohIGpRdWVyeSB2MS4xMS4wIHwgKGMpIDIwMDUsIDIwMTQgalF1ZXJ5IEZvdW5kYXRpb24sIEluYy4gfCBqcXVlcnkub3JnL2xpY2Vuc2UgKi8KIWZ1bmN0aW9uKGEsYil7Im9iamVjdCI9PXR5cGVvZiBtb2R1bGUmJiJvYmplY3QiPT10eXBlb2YgbW9kdWxlLmV4cG9ydHM/bW9kdWxlLmV4cG9ydHM9YS5kb2N1bWVudD9iKGEsITApOmZ1bmN0aW9uKGEpe2lmKCFhLmRvY3VtZW50KXRocm93IG5ldyBFcnJvcigialF1ZXJ5IHJlcXVpcmVzIGEgd2luZG93IHdpdGggYSBkb2N1bWVudCIpO3JldHVybiBiKGEpfTpiKGEpfSgidW5kZWZpbmVkIiE9dHlwZW9mIHdpbmRvdz93aW5kb3c6dG [...]
-<meta name="viewport" content="width=device-width, initial-scale=1" />
-<script src="data:application/x-javascript;base64,LyohCiAqIEJvb3RzdHJhcCB2My4zLjEgKGh0dHA6Ly9nZXRib290c3RyYXAuY29tKQogKiBDb3B5cmlnaHQgMjAxMS0yMDE0IFR3aXR0ZXIsIEluYy4KICogTGljZW5zZWQgdW5kZXIgTUlUIChodHRwczovL2dpdGh1Yi5jb20vdHdicy9ib290c3RyYXAvYmxvYi9tYXN0ZXIvTElDRU5TRSkKICovCmlmKCJ1bmRlZmluZWQiPT10eXBlb2YgalF1ZXJ5KXRocm93IG5ldyBFcnJvcigiQm9vdHN0cmFwJ3MgSmF2YVNjcmlwdCByZXF1aXJlcyBqUXVlcnkiKTsrZnVuY3Rpb24oYSl7dmFyIGI9YS5mbi5qcXVlcnkuc3BsaXQoIiAiKVswXS5zcGxpdCgiLiIpO2lmKGJbMF08MiYmYlsxXTw5fH [...]
-<script src="data:application/x-javascript;base64,LyoqCiogQHByZXNlcnZlIEhUTUw1IFNoaXYgMy43LjIgfCBAYWZhcmthcyBAamRhbHRvbiBAam9uX25lYWwgQHJlbSB8IE1JVC9HUEwyIExpY2Vuc2VkCiovCi8vIE9ubHkgcnVuIHRoaXMgY29kZSBpbiBJRSA4CmlmICghIXdpbmRvdy5uYXZpZ2F0b3IudXNlckFnZW50Lm1hdGNoKCJNU0lFIDgiKSkgewohZnVuY3Rpb24oYSxiKXtmdW5jdGlvbiBjKGEsYil7dmFyIGM9YS5jcmVhdGVFbGVtZW50KCJwIiksZD1hLmdldEVsZW1lbnRzQnlUYWdOYW1lKCJoZWFkIilbMF18fGEuZG9jdW1lbnRFbGVtZW50O3JldHVybiBjLmlubmVySFRNTD0ieDxzdHlsZT4iK2IrIjwvc3R5bGU+IixkLm [...]
-<script src="data:application/x-javascript;base64,LyohIFJlc3BvbmQuanMgdjEuNC4yOiBtaW4vbWF4LXdpZHRoIG1lZGlhIHF1ZXJ5IHBvbHlmaWxsICogQ29weXJpZ2h0IDIwMTMgU2NvdHQgSmVobAogKiBMaWNlbnNlZCB1bmRlciBodHRwczovL2dpdGh1Yi5jb20vc2NvdHRqZWhsL1Jlc3BvbmQvYmxvYi9tYXN0ZXIvTElDRU5TRS1NSVQKICogICovCgppZiAoISF3aW5kb3cubmF2aWdhdG9yLnVzZXJBZ2VudC5tYXRjaCgiTVNJRSA4IikpIHsKIWZ1bmN0aW9uKGEpeyJ1c2Ugc3RyaWN0IjthLm1hdGNoTWVkaWE9YS5tYXRjaE1lZGlhfHxmdW5jdGlvbihhKXt2YXIgYixjPWEuZG9jdW1lbnRFbGVtZW50LGQ9Yy5maXJzdEVsZW1lbn [...]
-<style type="text/css">
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 400;
-  src: url(fonts/OpenSans.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: normal;
-  font-weight: 700;
-  src: url(fonts/OpenSansBold.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 400;
-  src: url(fonts/OpenSansItalic.ttf) format('truetype');
-}
-@font-face {
-  font-family: 'Open Sans';
-  font-style: italic;
-  font-weight: 700;
-  src: url(fonts/OpenSansBoldItalic.ttf) format('truetype');
-}
-
-/*!
- * bootswatch v3.3.1+1
- * Homepage: http://bootswatch.com
- * Copyright 2012-2014 Thomas Park
- * Licensed under MIT
- * Based on Bootstrap
-*//*! normalize.css v3.0.2 | MIT License | git.io/normalize */html{font-family:sans-serif;-ms-text-size-adjust:100%;-webkit-text-size-adjust:100%}body{margin:0}article,aside,details,figcaption,figure,footer,header,hgroup,main,menu,nav,section,summary{display:block}audio,canvas,progress,video{display:inline-block;vertical-align:baseline}audio:not([controls]){display:none;height:0}[hidden],template{display:none}a{background-color:transparent}a:active,a:hover{outline:0}abbr[title]{border-bo [...]
-</style>
 
 
 <style type="text/css">code{white-space: pre;}</style>
@@ -92,41 +55,21 @@ code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Ann
 code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
 code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
 </style>
-<style type="text/css">
-  pre:not([class]) {
-    background-color: white;
-  }
-</style>
 
 
-<link href="data:text/css;charset=utf-8,code%2C%0Akbd%2C%0Apre%2C%0Asamp%20%7B%0Afont%2Dfamily%3A%20Source%20Code%20Pro%2C%20Inconsolata%2C%20Monaco%2C%20Consolas%2C%20Menlo%2C%20Courier%20New%2C%20monospace%3B%0A%7D%0Acode%20%7B%0Apadding%3A%200px%202px%3B%0Afont%2Dsize%3A%2090%25%3B%0Acolor%3A%20%23c7254e%3B%0Awhite%2Dspace%3A%20nowrap%3B%0Abackground%2Dcolor%3A%20%23f9f2f4%3B%0Aborder%2Dradius%3A%203px%3B%0Aborder%3A%200px%3B%0A%7D%0Apre%20%7B%0Adisplay%3A%20block%3B%0Apadding%3A%209% [...]
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
 
 </head>
 
 <body>
 
-<style type="text/css">
-.main-container {
-  max-width: 940px;
-  margin-left: auto;
-  margin-right: auto;
-}
-code {
-  color: inherit;
-  background-color: rgba(0, 0, 0, 0.04);
-}
-img { 
-  max-width:100%; 
-  height: auto; 
-}
-</style>
-<div class="container-fluid main-container">
 
 
-<div id="header">
-<h1 class="title">Efficient reshaping using data.tables</h1>
-<h4 class="date"><em>2015-09-18</em></h4>
-</div>
+
+<h1 class="title toc-ignore">Efficient reshaping using data.tables</h1>
+<h4 class="date"><em>2016-12-02</em></h4>
+
 
 
 <p>This vignette discusses the default usage of reshaping functions <code>melt</code> (wide to long) and <code>dcast</code> (long to wide) for <em>data.tables</em> as well as the <strong>new extended functionalities</strong> of melting and casting on <em>multiple columns</em> available from <code>v1.9.6</code>.</p>
@@ -137,7 +80,7 @@ img {
 </div>
 <div id="introduction" class="section level2">
 <h2>Introduction</h2>
-<p>The <code>melt</code> and <code>dcast</code> functions for <em>data.tables</em> are extensions of the corresponding functions from the <a href="http://cran.r-project.org/package=reshape2">reshape2</a> package.</p>
+<p>The <code>melt</code> and <code>dcast</code> functions for <em>data.tables</em> are extensions of the corresponding functions from the <a href="https://cran.r-project.org/package=reshape2">reshape2</a> package.</p>
 <p>In this vignette, we will</p>
 <ol style="list-style-type: decimal">
 <li><p>first briefly look at the default <em>melting</em> and <em>casting</em> of <em>data.tables</em> to convert them from <em>wide</em> to <em>long</em> format and vice versa,</p></li>
@@ -156,7 +99,7 @@ img {
 <h3>a) <code>melt</code>ing <em>data.tables</em> (wide to long)</h3>
 <p>Suppose we have a <code>data.table</code> (artificial data) as shown below:</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT =<span class="st"> </span><span class="kw">fread</span>(<span class="st">"melt_default.csv"</span>)
-DT 
+DT
 <span class="co">#    family_id age_mother dob_child1 dob_child2 dob_child3</span>
 <span class="co"># 1:         1         30 1998-11-26 2000-01-29         NA</span>
 <span class="co"># 2:         2         27 1996-06-22         NA         NA</span>
@@ -180,7 +123,7 @@ DT
 <div id="convert-dt-to-long-form-where-each-dob-is-a-separate-observation." class="section level4">
 <h4>- Convert <code>DT</code> to <em>long</em> form where each <code>dob</code> is a separate observation.</h4>
 <p>We could accomplish this using <code>melt()</code> by specifying <code>id.vars</code> and <code>measure.vars</code> arguments as follows:</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT.m1 =<span class="st"> </span><span class="kw">melt</span>(DT, <span class="dt">id.vars =</span> <span class="kw">c</span>(<span class="st">"family_id"</span>, <span class="st">"age_mother"</span>), 
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT.m1 =<span class="st"> </span><span class="kw">melt</span>(DT, <span class="dt">id.vars =</span> <span class="kw">c</span>(<span class="st">"family_id"</span>, <span class="st">"age_mother"</span>),
                 <span class="dt">measure.vars =</span> <span class="kw">c</span>(<span class="st">"dob_child1"</span>, <span class="st">"dob_child2"</span>, <span class="st">"dob_child3"</span>))
 DT.m1
 <span class="co">#     family_id age_mother   variable      value</span>
@@ -212,7 +155,7 @@ DT.m1
 <ul>
 <li><p><code>measure.vars</code> specifies the set of columns we would like to collapse (or combine) together.</p></li>
 <li><p>We can also specify column <em>indices</em> instead of <em>names</em>.</p></li>
-<li><p>By default, <code>variable</code> column is of type <code>factor</code>. Set <code>variable.factor</code> argument to <code>FALSE</code> if you’d like to return a <em>character</em> vector instead. <code>variable.factor</code> argument is only available in <code>melt</code> from <code>data.table</code> and not in the <a href="http://github.com/hadley/reshape"><code>reshape2</code> package</a>.</p></li>
+<li><p>By default, <code>variable</code> column is of type <code>factor</code>. Set <code>variable.factor</code> argument to <code>FALSE</code> if you’d like to return a <em>character</em> vector instead. <code>variable.factor</code> argument is only available in <code>melt</code> from <code>data.table</code> and not in the <a href="https://github.com/hadley/reshape"><code>reshape2</code> package</a>.</p></li>
 <li><p>By default, the molten columns are automatically named <code>variable</code> and <code>value</code>.</p></li>
 <li><p><code>melt</code> preserves column attributes in the result.</p></li>
 </ul>
@@ -222,7 +165,7 @@ DT.m1
 <h1></h1>
 <div id="name-the-variable-and-value-columns-to-child-and-dob-respectively" class="section level4">
 <h4>- Name the <code>variable</code> and <code>value</code> columns to <code>child</code> and <code>dob</code> respectively</h4>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT.m1 =<span class="st"> </span><span class="kw">melt</span>(DT, <span class="dt">measure.vars =</span> <span class="kw">c</span>(<span class="st">"dob_child1"</span>, <span class="st">"dob_child2"</span>, <span class="st">"dob_child3"</span>), 
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT.m1 =<span class="st"> </span><span class="kw">melt</span>(DT, <span class="dt">measure.vars =</span> <span class="kw">c</span>(<span class="st">"dob_child1"</span>, <span class="st">"dob_child2"</span>, <span class="st">"dob_child3"</span>),
                <span class="dt">variable.name =</span> <span class="st">"child"</span>, <span class="dt">value.name =</span> <span class="st">"dob"</span>)
 DT.m1
 <span class="co">#     family_id age_mother      child        dob</span>
@@ -300,15 +243,46 @@ DT
 <span class="co"># 3:         3         26 2002-07-11 2004-04-05 2007-09-02             2             2             1</span>
 <span class="co"># 4:         4         32 2004-10-10 2009-08-27 2012-07-21             1             1             1</span>
 <span class="co"># 5:         5         29 2000-12-05 2005-02-28         NA             2             1            NA</span>
-## 1 = female, 2 = male </code></pre></div>
-<p>And you’d like to combine (melt) all the <code>dob</code> columns together, and <code>gender</code> columns together. Using the current functionalty, we can do something like this:</p>
+## 1 = female, 2 = male</code></pre></div>
+<p>And you’d like to combine (melt) all the <code>dob</code> columns together, and <code>gender</code> columns together. Using the current functionality, we can do something like this:</p>
 <div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">DT.m1 =<span class="st"> </span><span class="kw">melt</span>(DT, <span class="dt">id =</span> <span class="kw">c</span>(<span class="st">"family_id"</span>, <span class="st">"age_mother"</span>))
 <span class="co"># Warning in melt.data.table(DT, id = c("family_id", "age_mother")): 'measure.vars' [dob_child1,</span>
-<span class="co"># dob_child2, dob_child3, gender_child1, gender_child2, gender_child3] are not all of the same</span>
-<span class="co"># type. By order of hierarchy, the molten data value column will be of type 'character'. All measure</span>
-<span class="co"># variables not of type 'character' will be coerced to. Check DETAILS in ?melt.data.table for more on</span>
-<span class="co"># coercion.</span>
-DT.m1[, <span class="kw">c</span>(<span class="st">"variable"</span>, <span class="st">"child"</span>) :<span class="er">=</span><span class="st"> </span><span class="kw">tstrsplit</span>(variable, <span class="st">"_"</span>, <span class="dt">fixed=</span><span class="ot">TRUE</span>)]
+<span class="co"># dob_child2, dob_child3, gender_child1, ...] are not all of the same type. By order of hierarchy, the</span>
+<span class="co"># molten data value column will be of type 'character'. All measure variables not of type 'character'</span>
+<span class="co"># will be coerced to. Check DETAILS in ?melt.data.table for more on coercion.</span>
+DT.m1[, <span class="kw">c</span>(<span class="st">"variable"</span>, <span class="st">"child"</span>) :<span class="er">=</span><span class="st"> </span><span class="kw">tstrsplit</span>(variable, <span class="st">"_"</span>, <span class="dt">fixed =</span> <span class="ot">TRUE</span>)]
+<span class="co">#     family_id age_mother variable      value  child</span>
+<span class="co">#  1:         1         30      dob 1998-11-26 child1</span>
+<span class="co">#  2:         2         27      dob 1996-06-22 child1</span>
+<span class="co">#  3:         3         26      dob 2002-07-11 child1</span>
+<span class="co">#  4:         4         32      dob 2004-10-10 child1</span>
+<span class="co">#  5:         5         29      dob 2000-12-05 child1</span>
+<span class="co">#  6:         1         30      dob 2000-01-29 child2</span>
+<span class="co">#  7:         2         27      dob         NA child2</span>
+<span class="co">#  8:         3         26      dob 2004-04-05 child2</span>
+<span class="co">#  9:         4         32      dob 2009-08-27 child2</span>
+<span class="co"># 10:         5         29      dob 2005-02-28 child2</span>
+<span class="co"># 11:         1         30      dob         NA child3</span>
+<span class="co"># 12:         2         27      dob         NA child3</span>
+<span class="co"># 13:         3         26      dob 2007-09-02 child3</span>
+<span class="co"># 14:         4         32      dob 2012-07-21 child3</span>
+<span class="co"># 15:         5         29      dob         NA child3</span>
+<span class="co"># 16:         1         30   gender          1 child1</span>
+<span class="co"># 17:         2         27   gender          2 child1</span>
+<span class="co"># 18:         3         26   gender          2 child1</span>
+<span class="co"># 19:         4         32   gender          1 child1</span>
+<span class="co"># 20:         5         29   gender          2 child1</span>
+<span class="co"># 21:         1         30   gender          2 child2</span>
+<span class="co"># 22:         2         27   gender         NA child2</span>
+<span class="co"># 23:         3         26   gender          2 child2</span>
+<span class="co"># 24:         4         32   gender          1 child2</span>
+<span class="co"># 25:         5         29   gender          1 child2</span>
+<span class="co"># 26:         1         30   gender         NA child3</span>
+<span class="co"># 27:         2         27   gender         NA child3</span>
+<span class="co"># 28:         3         26   gender          1 child3</span>
+<span class="co"># 29:         4         32   gender          1 child3</span>
+<span class="co"># 30:         5         29   gender         NA child3</span>
+<span class="co">#     family_id age_mother variable      value  child</span>
 DT.c1 =<span class="st"> </span><span class="kw">dcast</span>(DT.m1, family_id +<span class="st"> </span>age_mother +<span class="st"> </span>child ~<span class="st"> </span>variable, <span class="dt">value.var =</span> <span class="st">"value"</span>)
 DT.c1
 <span class="co">#     family_id age_mother  child        dob gender</span>
@@ -360,8 +334,8 @@ DT.c1
 <div id="melt-multiple-columns-simultaneously" class="section level4">
 <h4>- <code>melt</code> multiple columns simultaneously</h4>
 <p>The idea is quite simple. We pass a list of columns to <code>measure.vars</code>, where each element of the list contains the columns that should be combined together.</p>
-<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">colA =<span class="st"> </span><span class="kw">paste</span>(<span class="st">"dob_child"</span>, <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">sep=</span><span class="st">""</span>)
-colB =<span class="st"> </span><span class="kw">paste</span>(<span class="st">"gender_child"</span>, <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">sep=</span><span class="st">""</span>)
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">colA =<span class="st"> </span><span class="kw">paste</span>(<span class="st">"dob_child"</span>, <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">sep =</span> <span class="st">""</span>)
+colB =<span class="st"> </span><span class="kw">paste</span>(<span class="st">"gender_child"</span>, <span class="dv">1</span>:<span class="dv">3</span>, <span class="dt">sep =</span> <span class="st">""</span>)
 DT.m2 =<span class="st"> </span><span class="kw">melt</span>(DT, <span class="dt">measure =</span> <span class="kw">list</span>(colA, colB), <span class="dt">value.name =</span> <span class="kw">c</span>(<span class="st">"dob"</span>, <span class="st">"gender"</span>))
 DT.m2
 <span class="co">#     family_id age_mother variable        dob gender</span>
@@ -461,16 +435,6 @@ DT.c2
 </div>
 
 
-</div>
-
-<script>
-
-// add bootstrap table styles to pandoc tables
-$(document).ready(function () {
-  $('tr.header').parent('thead').parent('table').addClass('table table-condensed');
-});
-
-</script>
 
 <!-- dynamically load mathjax for compatibility with self-contained -->
 <script>
diff --git a/inst/doc/datatable-secondary-indices-and-auto-indexing.R b/inst/doc/datatable-secondary-indices-and-auto-indexing.R
new file mode 100644
index 0000000..d4377a8
--- /dev/null
+++ b/inst/doc/datatable-secondary-indices-and-auto-indexing.R
@@ -0,0 +1,112 @@
+## ---- echo = FALSE, message = FALSE--------------------------------------
+require(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+
+## ----echo = FALSE---------------------------------------------------------------------------------
+options(width = 100L)
+
+## -------------------------------------------------------------------------------------------------
+flights <- fread("flights14.csv")
+head(flights)
+dim(flights)
+
+## -------------------------------------------------------------------------------------------------
+setindex(flights, origin)
+head(flights)
+
+## alternatively we can provide character vectors to the function 'setindexv()'
+# setindexv(flights, "origin") # useful to program with
+
+# 'index' attribute added
+names(attributes(flights))
+
+## -------------------------------------------------------------------------------------------------
+indices(flights)
+
+setindex(flights, origin, dest)
+indices(flights)
+
+## ---- eval = FALSE--------------------------------------------------------------------------------
+#  ## not run
+#  setkey(flights, origin)
+#  flights["JFK"] # or flights[.("JFK")]
+
+## ---- eval = FALSE--------------------------------------------------------------------------------
+#  ## not run
+#  setkey(flights, dest)
+#  flights["LAX"]
+
+## -------------------------------------------------------------------------------------------------
+flights["JFK", on = "origin"]
+
+## alternatively
+# flights[.("JFK"), on = "origin"] (or) 
+# flights[list("JFK"), on = "origin"]
+
+## -------------------------------------------------------------------------------------------------
+setindex(flights, origin)
+flights["JFK", on = "origin", verbose = TRUE][1:5]
+
+## -------------------------------------------------------------------------------------------------
+flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]
+
+## -------------------------------------------------------------------------------------------------
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")]
+
+## -------------------------------------------------------------------------------------------------
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")][order(-arr_delay)]
+
+## -------------------------------------------------------------------------------------------------
+flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]
+
+## -------------------------------------------------------------------------------------------------
+# get all 'hours' in flights
+flights[, sort(unique(hour))]
+
+## -------------------------------------------------------------------------------------------------
+flights[.(24L), hour := 0L, on = "hour"]
+
+## -------------------------------------------------------------------------------------------------
+flights[, sort(unique(hour))]
+
+## -------------------------------------------------------------------------------------------------
+ans <- flights["JFK", max(dep_delay), keyby = month, on = "origin"]
+head(ans)
+
+## -------------------------------------------------------------------------------------------------
+flights[c("BOS", "DAY"), on = "dest", mult = "first"]
+
+## -------------------------------------------------------------------------------------------------
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), on = c("origin", "dest"), mult = "last"]
+
+## -------------------------------------------------------------------------------------------------
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
+
+## -------------------------------------------------------------------------------------------------
+set.seed(1L)
+dt = data.table(x = sample(1e5L, 1e7L, TRUE), y = runif(100L))
+print(object.size(dt), units = "Mb")
+
+## -------------------------------------------------------------------------------------------------
+## have a look at all the attribute names
+names(attributes(dt))
+
+## run the first time
+(t1 <- system.time(ans <- dt[x == 989L]))
+head(ans)
+
+## secondary index is created
+names(attributes(dt))
+
+indices(dt)
+
+## -------------------------------------------------------------------------------------------------
+## successive subsets
+(t2 <- system.time(dt[x == 989L]))
+system.time(dt[x %in% 1989:2012])
+
diff --git a/inst/doc/datatable-secondary-indices-and-auto-indexing.Rmd b/inst/doc/datatable-secondary-indices-and-auto-indexing.Rmd
new file mode 100644
index 0000000..a880625
--- /dev/null
+++ b/inst/doc/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -0,0 +1,327 @@
+---
+title: "Secondary indices and auto indexing"
+date: "`r Sys.Date()`"
+output: 
+  rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Secondary indices and auto indexing}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+```{r, echo = FALSE, message = FALSE}
+require(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+```
+
+This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*, *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first.
+
+***
+
+## Data {#data}
+
+We will use the same `flights` data as in the *"Introduction to data.table"* vignette.
+
+```{r echo = FALSE}
+options(width = 100L)
+```
+
+```{r}
+flights <- fread("flights14.csv")
+head(flights)
+dim(flights)
+```
+
+## Introduction
+
+In this vignette, we will
+
+* discuss *secondary indices* and provide rationale as to why we need them by citing cases where setting keys is not necessarily ideal,
+
+* perform fast subsetting, once again, but using the new `on` argument, which computes secondary indices internally for the task (temporarily) and reuses an existing one if available,
+
+* and finally look at *auto indexing* which goes a step further and creates secondary indices automatically, but does so on native R syntax for subsetting.
+
+## 1. Secondary indices
+
+### a) What are secondary indices?
+
+Secondary indices are similar to `keys` in *data.table*, except for two major differences:
+
+* A secondary index *doesn't* physically reorder the entire data.table in RAM. Instead, it only computes the order for the set of columns provided and stores that *order vector* in an additional attribute called `index`.
+
+* There can be more than one secondary index for a data.table (as we will see below).
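These two differences can be seen directly on a small table. A minimal sketch (the table `DT` and its columns `a` and `b` are made up for illustration):

```r
library(data.table)

# hypothetical toy table to contrast the two behaviours
DT <- data.table(a = c(3L, 1L, 2L), b = c("x", "y", "z"))

setindex(DT, a)
DT$a                   # still 3 1 2 -- rows are not reordered
names(attributes(DT))  # an "index" attribute now holds the order vector

setkey(DT, a)
DT$a                   # now 1 2 3 -- setkey() physically reorders the rows
```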
+
+### b) Set and get secondary indices
+
+#### -- How can we set the column `origin` as a secondary index in the *data.table* `flights`?
+
+```{r}
+setindex(flights, origin)
+head(flights)
+
+## alternatively we can provide character vectors to the function 'setindexv()'
+# setindexv(flights, "origin") # useful to program with
+
+# 'index' attribute added
+names(attributes(flights))
+```
+
+* `setindex()` and `setindexv()` allow adding a secondary index to the data.table.
+
+* Note that `flights` is **not** physically reordered in increasing order of `origin`, as would have been the case with `setkey()`.
+
+* Also note that the attribute `index` has been added to `flights`. 
+
+* `setindex(flights, NULL)` would remove all secondary indices.
+
+#### -- How can we get all the secondary indices set so far in `flights`?
+
+```{r}
+indices(flights)
+
+setindex(flights, origin, dest)
+indices(flights)
+```
+
+* The function `indices()` returns all current secondary indices in the data.table. If none exists, `NULL` is returned.
+
+* Note that by creating another index on the columns `origin, dest`, we do not lose the first index created on the column `origin`, i.e., we can have multiple secondary indices.
+
+### c) Why do we need secondary indices?
+
+#### -- Reordering a data.table can be expensive and not always ideal
+
+Consider the case where you would like to perform a fast key based subset on the `origin` column for the value "JFK". We'd do this as:
+
+```{r, eval = FALSE}
+## not run
+setkey(flights, origin)
+flights["JFK"] # or flights[.("JFK")]
+```
+
+#### `setkey()` requires: {.bs-callout .bs-callout-info}
+
+a) computing the order vector for the column(s) provided, here, `origin`, and
+
+b) reordering the entire data.table, by reference, based on the order vector computed.
+
+# 
+
+Computing the order isn't the time-consuming part, since data.table uses true radix sorting on integer, character and numeric vectors. However, reordering the data.table could be time-consuming (depending on the number of rows and columns).
+
+Unless our task involves repeated subsetting on the same column, fast key based subsetting could effectively be nullified by the time to reorder, depending on our data.table dimensions.
+
+#### -- There can be at most one `key`
+
+Now if we would like to repeat the same operation but on the `dest` column instead, for the value "LAX", then we have to `setkey()`, *again*.
+
+```{r, eval = FALSE}
+## not run
+setkey(flights, dest)
+flights["LAX"]
+```
+
+And this reorders `flights` by `dest`, *again*. What we would really like is to perform the fast subsetting while eliminating the reordering step.
+
+And this is precisely what *secondary indices* allow for!
+
+#### -- Secondary indices can be reused
+
+Since there can be multiple secondary indices, and creating an index is as simple as storing the order vector as an attribute, this allows us to even eliminate the time to recompute the order vector if an index already exists.
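A minimal sketch of this reuse, on a hypothetical toy table (`DT` and its columns are made up for illustration):

```r
library(data.table)

# hypothetical toy table
DT <- data.table(origin = c("JFK", "LGA", "JFK"), delay = c(5L, 2L, 9L))

setindex(DT, origin)  # order vector computed once, stored as an attribute

DT["JFK", on = "origin"]                  # subsequent 'on' subsets can reuse the stored index
DT["LGA", on = "origin", verbose = TRUE]  # verbose output reports whether an index was used
```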
+
+#### -- The new `on` argument allows for cleaner syntax and automatic creation and reuse of secondary indices
+
+As we will see in the next section, the `on` argument provides several advantages:
+
+#### `on` argument {.bs-callout .bs-callout-info}
+
+* enables subsetting by computing secondary indices on the fly. This eliminates having to do `setindex()` every time.
+
+* allows easy reuse of existing indices by just checking the attributes.
+
+* allows for a cleaner syntax by having the columns on which the subset is performed as part of the syntax. This makes the code easier to follow when revisiting it at a later point.
+
+    Note that the `on` argument can also be used on keyed subsets. In fact, we encourage providing the `on` argument even when subsetting using keys, for better readability.
+
+# 
+
+## 2. Fast subsetting using `on` argument and secondary indices
+
+### a) Fast subsets in `i`
+
+#### -- Subset all rows where the origin airport matches *"JFK"* using `on`
+
+```{r}
+flights["JFK", on = "origin"]
+
+## alternatively
+# flights[.("JFK"), on = "origin"] (or) 
+# flights[list("JFK"), on = "origin"]
+```
+
+* This statement performs a fast binary search based subset as well, by computing the index on the fly. However, note that it doesn't save the index as an attribute automatically. This may change in the future.
+
+* If we had already created a secondary index, using `setindex()`, then `on` would reuse it instead of (re)computing it. We can see that by using `verbose = TRUE`:
+
+    ```{r}
+    setindex(flights, origin)
+    flights["JFK", on = "origin", verbose = TRUE][1:5]
+    ```
+
+#### -- How can I subset based on `origin` *and* `dest` columns?
+
+For example, if we want to subset the `"JFK", "LAX"` combination, then:
+
+```{r}
+flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]
+```
+
+* The `on` argument accepts a character vector of column names, in the order corresponding to the values provided in the `i` argument.
+
+* Since the time to compute the secondary index is quite small, we don't have to use `setindex()`, unless, once again, the task involves repeated subsetting on the same column.
+
+### b) Select in `j`
+
+All the operations we will discuss below are no different from the ones we already saw in the *Keys and fast binary search based subset* vignette, except that we'll use the `on` argument instead of setting keys.
+
+#### -- Return `arr_delay` column alone as a data.table corresponding to `origin = "LGA"` and `dest = "TPA"`
+
+```{r}
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")]
+```
+
+### c) Chaining
+
+#### -- On the result obtained above, use chaining to order the column in decreasing order.
+
+```{r}
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")][order(-arr_delay)]
+```
+
+### d) Compute or *do* in `j`
+
+#### -- Find the maximum arrival delay corresponding to `origin = "LGA"` and `dest = "TPA"`.
+
+```{r}
+flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]
+```
+
+### e) *sub-assign* by reference using `:=` in `j`
+
+We have seen this example already in the *Reference semantics* and *Keys and fast binary search based subset* vignettes. Let's take a look at all the `hours` available in the `flights` *data.table*:
+
+```{r}
+# get all 'hours' in flights
+flights[, sort(unique(hour))]
+```
+
+We see that there are `25` unique values in total in the data. Both *0* and *24* hours seem to be present. Let's go ahead and replace *24* with *0*, but this time using `on` instead of setting keys.
+
+```{r}
+flights[.(24L), hour := 0L, on = "hour"]
+```
+
+Now, let's check if `24` is replaced with `0` in the `hour` column.
+
+```{r}
+flights[, sort(unique(hour))]
+```
+
+* This is a particularly big advantage of secondary indices. Previously, just to update a few rows of `hour`, we had to `setkey()` on it, which inevitably reorders the entire data.table. With `on`, the order is preserved, and the operation is much faster! Looking at the code, the task we wanted to perform is also quite clear.
+
+### f) Aggregation using `by`
+
+#### -- Get the maximum departure delay for each `month` corresponding to `origin = "JFK"`. Order the result by `month`
+
+```{r}
+ans <- flights["JFK", max(dep_delay), keyby = month, on = "origin"]
+head(ans)
+```
+
+* We would have had to set the `key` back to `origin, dest` again, if we did not use `on`, which internally builds secondary indices on the fly.
+
+### g) The *mult* argument
+
+The other arguments, including `mult`, work exactly the same way as we saw in the *Keys and fast binary search based subset* vignette. The default value for `mult` is "all". We can instead choose to return only the "first" or "last" matching row.
+
+#### -- Subset only the first matching row where `dest` matches *"BOS"* and *"DAY"*
+
+```{r}
+flights[c("BOS", "DAY"), on = "dest", mult = "first"]
+```
+
+#### -- Subset only the last matching row where `origin` matches *"LGA", "JFK", "EWR"* and `dest` matches *"XNA"*
+
+```{r}
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), on = c("origin", "dest"), mult = "last"]
+```
+
+### h) The *nomatch* argument
+
+We can choose whether queries that do not match should return `NA` or be skipped altogether using the `nomatch` argument.
+
+#### -- From the previous example, subset all rows only if there's a match
+
+```{r}
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
+```
+
+* There are no flights connecting "JFK" and "XNA". Therefore, that row is skipped in the result.
+
+## 3. Auto indexing
+
+First, we looked at how to perform fast subsets via binary search using *keys*. Then we saw that we could improve performance even further and get cleaner syntax by using secondary indices. What could be better than that? The answer is to optimise *native R syntax* to use secondary indices internally, so that we get the same performance without having to use newer syntax.
+
+That is what *auto indexing* does. At the moment, it is only implemented for the binary operators `==` and `%in%`, and only on a single column. An index is automatically created *and* saved as an attribute. That is, unlike the `on` argument, which computes the index on the fly each time, here a persistent secondary index is created.
+
+Let's start by creating a data.table big enough to highlight the advantage.
+
+```{r}
+set.seed(1L)
+dt = data.table(x = sample(1e5L, 1e7L, TRUE), y = runif(100L))
+print(object.size(dt), units = "Mb")
+```
+
+When we use `==` or `%in%` on a single column for the first time, a secondary index is created automatically, and it is used to perform the subset.
+
+```{r}
+## have a look at all the attribute names
+names(attributes(dt))
+
+## run the first time
+(t1 <- system.time(ans <- dt[x == 989L]))
+head(ans)
+
+## secondary index is created
+names(attributes(dt))
+
+indices(dt)
+```
+
+The time to subset the first time is the time to create the index + the time to subset. Since creating a secondary index involves only creating the order vector, this combined operation is faster than vector scans in many cases. But the real advantage comes in successive subsets. They are extremely fast.
+
+```{r}
+## successive subsets
+(t2 <- system.time(dt[x == 989L]))
+system.time(dt[x %in% 1989:2012])
+```
+
+* Running the first time took `r sprintf("%.3f", t1["elapsed"])` seconds, whereas the second time took `r sprintf("%.3f", t2["elapsed"])` seconds.
+
+* Auto indexing can be disabled by setting the global option `options(datatable.auto.index = FALSE)`.
+
+* Disabling auto indexing still allows using indices created explicitly with `setindex()` or `setindexv()`. You can disable the use of indices entirely by setting the global option `options(datatable.use.index = FALSE)`.
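A brief sketch of toggling these two options, reusing the `dt` created above:

```r
# with auto indexing off, new subsets no longer create indices automatically
options(datatable.auto.index = FALSE)

# with index use off as well, even existing indices are ignored (plain vector scan)
options(datatable.use.index = FALSE)
system.time(dt[x == 989L])

# restore the defaults
options(datatable.auto.index = TRUE, datatable.use.index = TRUE)
```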
+
+# 
+
+In the future, we plan to extend auto indexing to expressions involving more than one column. We are also working on extending binary search to more binary operators, such as `<`, `<=`, `>` and `>=`. Once that is done, auto indexing could be extended to those operators as well.
+
+We will extend fast *subsets* using keys and secondary indices to *joins* in the next vignette, *"Joins and rolling joins"*.
+
+***
diff --git a/inst/doc/datatable-secondary-indices-and-auto-indexing.html b/inst/doc/datatable-secondary-indices-and-auto-indexing.html
new file mode 100644
index 0000000..eb6371d
--- /dev/null
+++ b/inst/doc/datatable-secondary-indices-and-auto-indexing.html
@@ -0,0 +1,453 @@
+<!DOCTYPE html>
+
+<html xmlns="http://www.w3.org/1999/xhtml">
+
+<head>
+
+<meta charset="utf-8">
+<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
+<meta name="generator" content="pandoc" />
+
+<meta name="viewport" content="width=device-width, initial-scale=1">
+
+
+<meta name="date" content="2016-12-02" />
+
+<title>Secondary indices and auto indexing</title>
+
+
+
+<style type="text/css">code{white-space: pre;}</style>
+<style type="text/css">
+div.sourceCode { overflow-x: auto; }
+table.sourceCode, tr.sourceCode, td.lineNumbers, td.sourceCode {
+  margin: 0; padding: 0; vertical-align: baseline; border: none; }
+table.sourceCode { width: 100%; line-height: 100%; }
+td.lineNumbers { text-align: right; padding-right: 4px; padding-left: 4px; color: #aaaaaa; border-right: 1px solid #aaaaaa; }
+td.sourceCode { padding-left: 5px; }
+code > span.kw { color: #007020; font-weight: bold; } /* Keyword */
+code > span.dt { color: #902000; } /* DataType */
+code > span.dv { color: #40a070; } /* DecVal */
+code > span.bn { color: #40a070; } /* BaseN */
+code > span.fl { color: #40a070; } /* Float */
+code > span.ch { color: #4070a0; } /* Char */
+code > span.st { color: #4070a0; } /* String */
+code > span.co { color: #60a0b0; font-style: italic; } /* Comment */
+code > span.ot { color: #007020; } /* Other */
+code > span.al { color: #ff0000; font-weight: bold; } /* Alert */
+code > span.fu { color: #06287e; } /* Function */
+code > span.er { color: #ff0000; font-weight: bold; } /* Error */
+code > span.wa { color: #60a0b0; font-weight: bold; font-style: italic; } /* Warning */
+code > span.cn { color: #880000; } /* Constant */
+code > span.sc { color: #4070a0; } /* SpecialChar */
+code > span.vs { color: #4070a0; } /* VerbatimString */
+code > span.ss { color: #bb6688; } /* SpecialString */
+code > span.im { } /* Import */
+code > span.va { color: #19177c; } /* Variable */
+code > span.cf { color: #007020; font-weight: bold; } /* ControlFlow */
+code > span.op { color: #666666; } /* Operator */
+code > span.bu { } /* BuiltIn */
+code > span.ex { } /* Extension */
+code > span.pp { color: #bc7a00; } /* Preprocessor */
+code > span.at { color: #7d9029; } /* Attribute */
+code > span.do { color: #ba2121; font-style: italic; } /* Documentation */
+code > span.an { color: #60a0b0; font-weight: bold; font-style: italic; } /* Annotation */
+code > span.cv { color: #60a0b0; font-weight: bold; font-style: italic; } /* CommentVar */
+code > span.in { color: #60a0b0; font-weight: bold; font-style: italic; } /* Information */
+</style>
+
+
+
+<link href="data:text/css;charset=utf-8,body%20%7B%0Abackground%2Dcolor%3A%20%23fff%3B%0Amargin%3A%201em%20auto%3B%0Amax%2Dwidth%3A%20700px%3B%0Aoverflow%3A%20visible%3B%0Apadding%2Dleft%3A%202em%3B%0Apadding%2Dright%3A%202em%3B%0Afont%2Dfamily%3A%20%22Open%20Sans%22%2C%20%22Helvetica%20Neue%22%2C%20Helvetica%2C%20Arial%2C%20sans%2Dserif%3B%0Afont%2Dsize%3A%2014px%3B%0Aline%2Dheight%3A%201%2E35%3B%0A%7D%0A%23header%20%7B%0Atext%2Dalign%3A%20center%3B%0A%7D%0A%23TOC%20%7B%0Aclear%3A%20bot [...]
+
+</head>
+
+<body>
+
+
+
+
+<h1 class="title toc-ignore">Secondary indices and auto indexing</h1>
+<h4 class="date"><em>2016-12-02</em></h4>
+
+
+
+<p>This vignette assumes that the reader is familiar with data.table’s <code>[i, j, by]</code> syntax, and how to perform fast key based subsets. If you’re not familiar with these concepts, please read the <em>“Introduction to data.table”</em>, <em>“Reference semantics”</em> and <em>“Keys and fast binary search based subset”</em> vignettes first.</p>
+<hr />
+<div id="data" class="section level2">
+<h2>Data</h2>
+<p>We will use the same <code>flights</code> data as in the <em>“Introduction to data.table”</em> vignette.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights <-<span class="st"> </span><span class="kw">fread</span>(<span class="st">"flights14.csv"</span>)
+<span class="kw">head</span>(flights)
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co"># 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co"># 3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co"># 4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
+<span class="co"># 5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co"># 6: 2014     1   1         4         0      AA    EWR  LAX      339     2454   18</span>
+<span class="kw">dim</span>(flights)
+<span class="co"># [1] 253316     11</span></code></pre></div>
+</div>
+<div id="introduction" class="section level2">
+<h2>Introduction</h2>
+<p>In this vignette, we will</p>
+<ul>
+<li><p>discuss <em>secondary indices</em> and provide rationale for why we need them, by citing cases where setting keys is not necessarily ideal,</p></li>
+<li><p>perform fast subsetting, once again, but using the new <code>on</code> argument, which computes secondary indices internally for the task (temporarily), and reuses an existing index if one is already available,</p></li>
+<li><p>and finally look at <em>auto indexing</em>, which goes a step further and creates secondary indices automatically, but does so for native R subsetting syntax.</p></li>
+</ul>
+</div>
+<div id="secondary-indices" class="section level2">
+<h2>1. Secondary indices</h2>
+<div id="a-what-are-secondary-indices" class="section level3">
+<h3>a) What are secondary indices?</h3>
+<p>Secondary indices are similar to <code>keys</code> in <em>data.table</em>, except for two major differences:</p>
+<ul>
+<li><p>A secondary index <em>doesn’t</em> physically reorder the entire data.table in RAM. Instead, it only computes the order for the set of columns provided and stores that <em>order vector</em> in an additional attribute called <code>index</code>.</p></li>
+<li><p>There can be more than one secondary index for a data.table (as we will see below).</p></li>
+</ul>
+</div>
+<div id="b-set-and-get-secondary-indices" class="section level3">
+<h3>b) Set and get secondary indices</h3>
+<div id="how-can-we-set-the-column-origin-as-a-secondary-index-in-the-data.table-flights" class="section level4">
+<h4>– How can we set the column <code>origin</code> as a secondary index in the <em>data.table</em> <code>flights</code>?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">setindex</span>(flights, origin)
+<span class="kw">head</span>(flights)
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co"># 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co"># 3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co"># 4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
+<span class="co"># 5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co"># 6: 2014     1   1         4         0      AA    EWR  LAX      339     2454   18</span>
+
+## alternatively we can provide character vectors to the function 'setindexv()'
+<span class="co"># setindexv(flights, "origin") # useful to program with</span>
+
+<span class="co"># 'index' attribute added</span>
+<span class="kw">names</span>(<span class="kw">attributes</span>(flights))
+<span class="co"># [1] "names"             "row.names"         "class"             ".internal.selfref"</span>
+<span class="co"># [5] "index"</span></code></pre></div>
+<ul>
+<li><p><code>setindex()</code> and <code>setindexv()</code> allow adding a secondary index to the data.table.</p></li>
+<li><p>Note that <code>flights</code> is <strong>not</strong> physically reordered in increasing order of <code>origin</code>, as would have been the case with <code>setkey()</code>.</p></li>
+<li><p>Also note that the attribute <code>index</code> has been added to <code>flights</code>.</p></li>
+<li><p><code>setindex(flights, NULL)</code> would remove all secondary indices.</p></li>
+</ul>
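+<p>As a quick sketch (marked <em>not run</em>, so that the index set above stays in place for the rest of this vignette), removing all indices and verifying would look like:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+# setindex(flights, NULL) # removes all secondary indices
+# indices(flights)        # would then return NULL</code></pre></div>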
+</div>
+<div id="how-can-we-get-all-the-secondary-indices-set-so-far-in-flights" class="section level4">
+<h4>– How can we get all the secondary indices set so far in <code>flights</code>?</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">indices</span>(flights)
+<span class="co"># [1] "origin"</span>
+
+<span class="kw">setindex</span>(flights, origin, dest)
+<span class="kw">indices</span>(flights)
+<span class="co"># [1] "origin"       "origin__dest"</span></code></pre></div>
+<ul>
+<li><p>The function <code>indices()</code> returns all current secondary indices in the data.table. If none exists, <code>NULL</code> is returned.</p></li>
+<li><p>Note that by creating another index on the columns <code>origin, dest</code>, we do not lose the first index created on the column <code>origin</code>, i.e., we can have multiple secondary indices.</p></li>
+</ul>
+</div>
+</div>
+<div id="c-why-do-we-need-secondary-indices" class="section level3">
+<h3>c) Why do we need secondary indices?</h3>
+<div id="reordering-a-data.table-can-be-expensive-and-not-always-ideal" class="section level4">
+<h4>– Reordering a data.table can be expensive and not always ideal</h4>
+<p>Consider the case where you would like to perform a fast key based subset on the <code>origin</code> column for the value “JFK”. We’d do this as:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+<span class="kw">setkey</span>(flights, origin)
+flights[<span class="st">"JFK"</span>] <span class="co"># or flights[.("JFK")]</span></code></pre></div>
+</div>
+<div id="setkey-requires" class="section level4 bs-callout bs-callout-info">
+<h4><code>setkey()</code> requires:</h4>
+<ol style="list-style-type: lower-alpha">
+<li><p>computing the order vector for the column(s) provided, here, <code>origin</code>, and</p></li>
+<li><p>reordering the entire data.table, by reference, based on the order vector computed.</p></li>
+</ol>
+</div>
+</div>
+</div>
+<div id="section" class="section level1">
+<h1></h1>
+<p>Computing the order isn’t the time consuming part, since data.table uses true radix sorting on integer, character and numeric vectors. However, reordering the data.table could be time consuming (depending on the number of rows and columns).</p>
+<p>Unless our task involves repeated subsetting on the same column, fast key based subsetting could effectively be nullified by the time to reorder, depending on our data.table dimensions.</p>
+<div id="there-can-be-only-one-key-at-the-most" class="section level4">
+<h4>– There can be only one <code>key</code> at the most</h4>
+<p>Now if we would like to repeat the same operation, but on the <code>dest</code> column instead, for the value “LAX”, then we have to <code>setkey()</code>, <em>again</em>.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+<span class="kw">setkey</span>(flights, dest)
+flights[<span class="st">"LAX"</span>]</code></pre></div>
+<p>And this reorders <code>flights</code> by <code>dest</code>, <em>again</em>. What we would really like is to be able to perform the fast subsetting by eliminating the reordering step.</p>
+<p>And this is precisely what <em>secondary indices</em> allow for!</p>
+</div>
+<div id="secondary-indices-can-be-reused" class="section level4">
+<h4>– Secondary indices can be reused</h4>
+<p>Since there can be multiple secondary indices, and creating an index is as simple as storing the order vector as an attribute, this even allows us to eliminate the time needed to recompute the order vector if an index already exists.</p>
+</div>
+<div id="the-new-on-argument-allows-for-cleaner-syntax-and-automatic-creation-and-reuse-of-secondary-indices" class="section level4">
+<h4>– The new <code>on</code> argument allows for cleaner syntax and automatic creation and reuse of secondary indices</h4>
+<p>As we will see in the next section, the <code>on</code> argument provides several advantages:</p>
+</div>
+<div id="on-argument" class="section level4 bs-callout bs-callout-info">
+<h4><code>on</code> argument</h4>
+<ul>
+<li><p>enables subsetting by computing secondary indices on the fly. This eliminates having to do <code>setindex()</code> every time.</p></li>
+<li><p>allows easy reuse of existing indices by just checking the attributes.</p></li>
+<li><p>allows for a cleaner syntax by having the columns on which the subset is performed as part of the syntax. This makes the code easier to follow when revisiting it at a later point.</p>
+<p>Note that the <code>on</code> argument can also be used with keyed subsets. In fact, we encourage providing the <code>on</code> argument even when subsetting using keys, for better readability.</p></li>
+</ul>
+</div>
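+<p>The last point can be sketched as follows (not run, since <code>setkey()</code> would reorder <code>flights</code>):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+# setkey(flights, origin)
+# flights["JFK"]                 # works, but the subset column is implicit
+# flights["JFK", on = "origin"]  # same result, and self-documenting</code></pre></div>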
+</div>
+<div id="section-1" class="section level1">
+<h1></h1>
+<div id="fast-subsetting-using-on-argument-and-secondary-indices" class="section level2">
+<h2>2. Fast subsetting using <code>on</code> argument and secondary indices</h2>
+<div id="a-fast-subsets-in-i" class="section level3">
+<h3>a) Fast subsets in <code>i</code></h3>
+<div id="subset-all-rows-where-the-origin-airport-matches-jfk-using-on" class="section level4">
+<h4>– Subset all rows where the origin airport matches <em>“JFK”</em> using <code>on</code></h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[<span class="st">"JFK"</span>, on =<span class="st"> "origin"</span>]
+<span class="co">#        year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co">#     1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co">#     2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co">#     3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co">#     4: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co">#     5: 2014     1   1        -2       -18      AA    JFK  LAX      338     2475   21</span>
+<span class="co">#    ---                                                                              </span>
+<span class="co"># 81479: 2014    10  31        -4       -21      UA    JFK  SFO      337     2586   17</span>
+<span class="co"># 81480: 2014    10  31        -2       -37      UA    JFK  SFO      344     2586   18</span>
+<span class="co"># 81481: 2014    10  31         0       -33      UA    JFK  LAX      320     2475   17</span>
+<span class="co"># 81482: 2014    10  31        -6       -38      UA    JFK  SFO      343     2586    9</span>
+<span class="co"># 81483: 2014    10  31        -6       -38      UA    JFK  LAX      323     2475   11</span>
+
+## alternatively
+<span class="co"># flights[.("JFK"), on = "origin"] (or) </span>
+<span class="co"># flights[list("JFK"), on = "origin"]</span></code></pre></div>
+<ul>
+<li><p>This statement performs a fast binary search based subset as well, by computing the index on the fly. However, note that it doesn’t save the index as an attribute automatically. This may change in the future.</p></li>
+<li><p>If we had already created a secondary index, using <code>setindex()</code>, then <code>on</code> would reuse it instead of (re)computing it. We can see that by using <code>verbose = TRUE</code>:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">setindex</span>(flights, origin)
+flights[<span class="st">"JFK"</span>, on =<span class="st"> "origin"</span>, verbose =<span class="st"> </span><span class="ot">TRUE</span>][<span class="dv">1</span>:<span class="dv">5</span>]
+<span class="co"># on= matches existing index, using index</span>
+<span class="co"># Starting bmerge ...done in 0 secs</span>
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co"># 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co"># 3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co"># 4: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co"># 5: 2014     1   1        -2       -18      AA    JFK  LAX      338     2475   21</span></code></pre></div></li>
+</ul>
+</div>
+<div id="how-can-i-subset-based-on-origin-and-dest-columns" class="section level4">
+<h4>– How can I subset based on <code>origin</code> <em>and</em> <code>dest</code> columns?</h4>
+<p>For example, if we want to subset <code>"JFK", "LAX"</code> combination, then:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"JFK"</span>, <span class="st">"LAX"</span>), on =<span class="st"> </span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>)][<span class="dv">1</span>:<span class="dv">5</span>]
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co"># 2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co"># 3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co"># 4: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co"># 5: 2014     1   1        -2       -18      AA    JFK  LAX      338     2475   21</span></code></pre></div>
+<ul>
+<li><p>The <code>on</code> argument accepts a character vector of column names, corresponding to the order of values provided in <code>i</code>.</p></li>
+<li><p>Since the time to compute the secondary index is quite small, we don’t have to use <code>setindex()</code>, unless, once again, the task involves repeated subsetting on the same column.</p></li>
+</ul>
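+<p>For such repeated subsets, a sketch (the index on <code>origin, dest</code> was in fact already created in section 1b):</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+# setindex(flights, origin, dest)                    # compute the order vector once
+# flights[.("JFK", "LAX"), on = c("origin", "dest")] # reuses the stored index
+# flights[.("LGA", "TPA"), on = c("origin", "dest")] # and reuses it again</code></pre></div>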
+</div>
+</div>
+<div id="b-select-in-j" class="section level3">
+<h3>b) Select in <code>j</code></h3>
+<p>All the operations we will discuss below are no different from the ones we already saw in the <em>Keys and fast binary search based subset</em> vignette, except that we’ll be using the <code>on</code> argument instead of setting keys.</p>
+<div id="return-arr_delay-column-alone-as-a-data.table-corresponding-to-origin-lga-and-dest-tpa" class="section level4">
+<h4>– Return <code>arr_delay</code> column alone as a data.table corresponding to <code>origin = "LGA"</code> and <code>dest = "TPA"</code></h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"LGA"</span>, <span class="st">"TPA"</span>), .(arr_delay), on =<span class="st"> </span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>)]
+<span class="co">#       arr_delay</span>
+<span class="co">#    1:         1</span>
+<span class="co">#    2:        14</span>
+<span class="co">#    3:       -17</span>
+<span class="co">#    4:        -4</span>
+<span class="co">#    5:       -12</span>
+<span class="co">#   ---          </span>
+<span class="co"># 1848:        39</span>
+<span class="co"># 1849:       -24</span>
+<span class="co"># 1850:       -12</span>
+<span class="co"># 1851:        21</span>
+<span class="co"># 1852:       -11</span></code></pre></div>
+</div>
+</div>
+<div id="c-chaining" class="section level3">
+<h3>c) Chaining</h3>
+<div id="on-the-result-obtained-above-use-chaining-to-order-the-column-in-decreasing-order." class="section level4">
+<h4>– On the result obtained above, use chaining to order the column in decreasing order.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"LGA"</span>, <span class="st">"TPA"</span>), .(arr_delay), on =<span class="st"> </span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>)][<span class="kw">order</span>(-arr_delay)]
+<span class="co">#       arr_delay</span>
+<span class="co">#    1:       486</span>
+<span class="co">#    2:       380</span>
+<span class="co">#    3:       351</span>
+<span class="co">#    4:       318</span>
+<span class="co">#    5:       300</span>
+<span class="co">#   ---          </span>
+<span class="co"># 1848:       -40</span>
+<span class="co"># 1849:       -43</span>
+<span class="co"># 1850:       -46</span>
+<span class="co"># 1851:       -48</span>
+<span class="co"># 1852:       -49</span></code></pre></div>
+</div>
+</div>
+<div id="d-compute-or-do-in-j" class="section level3">
+<h3>d) Compute or <em>do</em> in <code>j</code></h3>
+<div id="find-the-maximum-arrival-delay-correspondong-to-origin-lga-and-dest-tpa." class="section level4">
+<h4>– Find the maximum arrival delay corresponding to <code>origin = "LGA"</code> and <code>dest = "TPA"</code>.</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="st">"LGA"</span>, <span class="st">"TPA"</span>), <span class="kw">max</span>(arr_delay), on =<span class="st"> </span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>)]
+<span class="co"># [1] 486</span></code></pre></div>
+</div>
+</div>
+<div id="e-sub-assign-by-reference-using-in-j" class="section level3">
+<h3>e) <em>sub-assign</em> by reference using <code>:=</code> in <code>j</code></h3>
+<p>We have seen this example already in the <em>Reference semantics</em> and <em>Keys and fast binary search based subset</em> vignettes. Let’s take a look at all the <code>hours</code> available in the <code>flights</code> <em>data.table</em>:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="co"># get all 'hours' in flights</span>
+flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
+<span class="co">#  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24</span></code></pre></div>
+<p>We see that there are <code>25</code> unique values in total in the data. Both <em>0</em> and <em>24</em> hours seem to be present. Let’s go ahead and replace <em>24</em> with <em>0</em>, but this time using <code>on</code> instead of setting keys.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(24L), hour :<span class="er">=</span><span class="st"> </span>0L, on =<span class="st"> "hour"</span>]
+<span class="co">#         year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co">#      1: 2014     1   1        14        13      AA    JFK  LAX      359     2475    9</span>
+<span class="co">#      2: 2014     1   1        -3        13      AA    JFK  LAX      363     2475   11</span>
+<span class="co">#      3: 2014     1   1         2         9      AA    JFK  LAX      351     2475   19</span>
+<span class="co">#      4: 2014     1   1        -8       -26      AA    LGA  PBI      157     1035    7</span>
+<span class="co">#      5: 2014     1   1         2         1      AA    JFK  LAX      350     2475   13</span>
+<span class="co">#     ---                                                                              </span>
+<span class="co"># 253312: 2014    10  31         1       -30      UA    LGA  IAH      201     1416   14</span>
+<span class="co"># 253313: 2014    10  31        -5       -14      UA    EWR  IAH      189     1400    8</span>
+<span class="co"># 253314: 2014    10  31        -8        16      MQ    LGA  RDU       83      431   11</span>
+<span class="co"># 253315: 2014    10  31        -4        15      MQ    LGA  DTW       75      502   11</span>
+<span class="co"># 253316: 2014    10  31        -5         1      MQ    LGA  SDF      110      659    8</span></code></pre></div>
+<p>Now, let’s check if <code>24</code> is replaced with <code>0</code> in the <code>hour</code> column.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[, <span class="kw">sort</span>(<span class="kw">unique</span>(hour))]
+<span class="co">#  [1]  0  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23</span></code></pre></div>
+<ul>
+<li>This is a particularly huge advantage of secondary indices. Previously, just to update a few rows of <code>hour</code>, we had to <code>setkey()</code> on it, which inevitably reorders the entire data.table. With <code>on</code>, the order is preserved, and the operation is much faster! Looking at the code, the task we wanted to perform is also quite clear.</li>
+</ul>
+</div>
+<div id="f-aggregation-using-by" class="section level3">
+<h3>f) Aggregation using <code>by</code></h3>
+<div id="get-the-maximum-departure-delay-for-each-month-corresponding-to-origin-jfk.-order-the-result-by-month" class="section level4">
+<h4>– Get the maximum departure delay for each <code>month</code> corresponding to <code>origin = "JFK"</code>. Order the result by <code>month</code></h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">ans <-<span class="st"> </span>flights[<span class="st">"JFK"</span>, <span class="kw">max</span>(dep_delay), keyby =<span class="st"> </span>month, on =<span class="st"> "origin"</span>]
+<span class="kw">head</span>(ans)
+<span class="co">#    month   V1</span>
+<span class="co"># 1:     1  881</span>
+<span class="co"># 2:     1 1014</span>
+<span class="co"># 3:     1  920</span>
+<span class="co"># 4:     1 1241</span>
+<span class="co"># 5:     1  853</span>
+<span class="co"># 6:     1  798</span></code></pre></div>
+<ul>
+<li>Had we not used <code>on</code>, which internally builds a secondary index on the fly, we would have had to set the key again to perform this operation.</li>
+</ul>
+</div>
+</div>
+<div id="g-the-mult-argument" class="section level3">
+<h3>g) The <em>mult</em> argument</h3>
+<p>The other arguments, including <code>mult</code>, work exactly the same way as we saw in the <em>Keys and fast binary search based subset</em> vignette. The default value for <code>mult</code> is “all”. We can choose, instead, that only the “first” or “last” matching row should be returned.</p>
+<div id="subset-only-the-first-matching-row-where-dest-matches-bos-and-day" class="section level4">
+<h4>– Subset only the first matching row where <code>dest</code> matches <em>“BOS”</em> and <em>“DAY”</em></h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[<span class="kw">c</span>(<span class="st">"BOS"</span>, <span class="st">"DAY"</span>), on =<span class="st"> "dest"</span>, mult =<span class="st"> "first"</span>]
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014     1   1         3         1      AA    JFK  BOS       39      187   12</span>
+<span class="co"># 2: 2014     1   1        25        35      EV    EWR  DAY      102      533   17</span></code></pre></div>
+</div>
+<div id="subset-only-the-last-matching-row-where-origin-matches-lga-jfk-ewr-and-dest-matches-xna" class="section level4">
+<h4>– Subset only the last matching row where <code>origin</code> matches <em>“LGA”, “JFK”, “EWR”</em> and <code>dest</code> matches <em>“XNA”</em></h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="kw">c</span>(<span class="st">"LGA"</span>, <span class="st">"JFK"</span>, <span class="st">"EWR"</span>), <span class="st">"XNA"</span>), on =<span class="st"> </span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>), mult =<span class="st"> "last"</span>]
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014    10  31        -5       -11      MQ    LGA  XNA      165     1147    6</span>
+<span class="co"># 2:   NA    NA  NA        NA        NA      NA    JFK  XNA       NA       NA   NA</span>
+<span class="co"># 3: 2014    10  31        -2       -25      EV    EWR  XNA      160     1131    6</span></code></pre></div>
+</div>
+</div>
+<div id="h-the-nomatch-argument" class="section level3">
+<h3>h) The <em>nomatch</em> argument</h3>
+<p>We can choose if queries that do not match should return <code>NA</code> or be skipped altogether using the <code>nomatch</code> argument.</p>
+<div id="from-the-previous-example-subset-all-rows-only-if-theres-a-match" class="section level4">
+<h4>– From the previous example, subset all rows only if there’s a match</h4>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">flights[.(<span class="kw">c</span>(<span class="st">"LGA"</span>, <span class="st">"JFK"</span>, <span class="st">"EWR"</span>), <span class="st">"XNA"</span>), mult =<span class="st"> "last"</span>, on =<span class="st"> </span><span class="kw">c</span>(<span class="st">"origin"</span>, <span class="st">"dest"</span>), nomatch =<span class=" [...]
+<span class="co">#    year month day dep_delay arr_delay carrier origin dest air_time distance hour</span>
+<span class="co"># 1: 2014    10  31        -5       -11      MQ    LGA  XNA      165     1147    6</span>
+<span class="co"># 2: 2014    10  31        -2       -25      EV    EWR  XNA      160     1131    6</span></code></pre></div>
+<ul>
+<li>There are no flights connecting “JFK” and “XNA”. Therefore, that row is skipped in the result.</li>
+</ul>
+</div>
+</div>
+</div>
+<div id="auto-indexing" class="section level2">
+<h2>3. Auto indexing</h2>
+<p>First we looked at how to perform fast subsetting using binary search with <em>keys</em>. Then we figured out that we could improve performance even further and have cleaner syntax by using secondary indices. What could be better than that? The answer is to optimise <em>native R syntax</em> to use secondary indices internally, so that we can get the same performance without having to use new syntax.</p>
+<p>That is what <em>auto indexing</em> does. At the moment, it is only implemented for the binary operators <code>==</code> and <code>%in%</code>, and only on a single column. An index is automatically created <em>and</em> saved as an attribute. That is, unlike the <code>on</code> argument, which computes the index on the fly each time, a persistent secondary index is created here.</p>
+<p>Let’s start by creating a data.table big enough to highlight the advantage.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r"><span class="kw">set.seed</span>(1L)
+dt =<span class="st"> </span><span class="kw">data.table</span>(<span class="dt">x =</span> <span class="kw">sample</span>(1e5L, 1e7L, <span class="ot">TRUE</span>), <span class="dt">y =</span> <span class="kw">runif</span>(100L))
+<span class="kw">print</span>(<span class="kw">object.size</span>(dt), <span class="dt">units =</span> <span class="st">"Mb"</span>)
+<span class="co"># 114.4 Mb</span></code></pre></div>
+<p>When we use <code>==</code> or <code>%in%</code> on a single column for the first time, a secondary index is created automatically, and it is used to perform the subset.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## have a look at all the attribute names
+<span class="kw">names</span>(<span class="kw">attributes</span>(dt))
+<span class="co"># [1] "names"             "row.names"         "class"             ".internal.selfref"</span>
+
+## run the first time
+(t1 <-<span class="st"> </span><span class="kw">system.time</span>(ans <-<span class="st"> </span>dt[x ==<span class="st"> </span>989L]))
+<span class="co">#    user  system elapsed </span>
+<span class="co">#    0.16    0.00    0.16</span>
+<span class="kw">head</span>(ans)
+<span class="co">#      x         y</span>
+<span class="co"># 1: 989 0.5372007</span>
+<span class="co"># 2: 989 0.5642786</span>
+<span class="co"># 3: 989 0.7151100</span>
+<span class="co"># 4: 989 0.3920405</span>
+<span class="co"># 5: 989 0.9547465</span>
+<span class="co"># 6: 989 0.2914710</span>
+
+## secondary index is created
+<span class="kw">names</span>(<span class="kw">attributes</span>(dt))
+<span class="co"># [1] "names"             "row.names"         "class"             ".internal.selfref"</span>
+<span class="co"># [5] "index"</span>
+
+<span class="kw">indices</span>(dt)
+<span class="co"># [1] "x"</span></code></pre></div>
+<p>The time to subset the first time is the time to create the index + the time to subset. Since creating a secondary index involves only creating the order vector, this combined operation is faster than vector scans in many cases. But the real advantage comes in successive subsets. They are extremely fast.</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## successive subsets
+(t2 <-<span class="st"> </span><span class="kw">system.time</span>(dt[x ==<span class="st"> </span>989L]))
+<span class="co">#    user  system elapsed </span>
+<span class="co">#   0.000   0.000   0.001</span>
+<span class="kw">system.time</span>(dt[x %in%<span class="st"> </span><span class="dv">1989</span>:<span class="dv">2012</span>])
+<span class="co">#    user  system elapsed </span>
+<span class="co">#   0.000   0.000   0.001</span></code></pre></div>
+<ul>
+<li><p>Running the first time took 0.160 seconds, whereas the second time took 0.001 seconds.</p></li>
+<li><p>Auto indexing can be disabled by setting the global option <code>options(datatable.auto.index = FALSE)</code>.</p></li>
+<li><p>Disabling auto indexing still allows the use of indices created explicitly with <code>setindex()</code> or <code>setindexv()</code>. You can disable indices fully by setting the global option <code>options(datatable.use.index = FALSE)</code>.</p></li>
+</ul>
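+<p>The two options above can be sketched together as:</p>
+<div class="sourceCode"><pre class="sourceCode r"><code class="sourceCode r">## not run
+# options(datatable.auto.index = FALSE) # stop creating indices automatically on == / %in%
+# options(datatable.use.index = FALSE)  # ignore all indices, explicit ones included</code></pre></div>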
+</div>
+</div>
+<div id="section-2" class="section level1">
+<h1></h1>
+<p>In the future, we plan to extend auto indexing to expressions involving more than one column. Also, we are working on extending binary search to work with more binary operators, such as <code><</code>, <code><=</code>, <code>></code> and <code>>=</code>. Once done, it would be straightforward to extend auto indexing to these operators as well.</p>
+<p>We will extend fast <em>subsets</em> using keys and secondary indices to <em>joins</em> in the next vignette, <em>“Joins and rolling joins”</em>.</p>
+<hr />
+</div>
+
+
+
+<!-- dynamically load mathjax for compatibility with self-contained -->
+<script>
+  (function () {
+    var script = document.createElement("script");
+    script.type = "text/javascript";
+    script.src  = "https://cdn.mathjax.org/mathjax/latest/MathJax.js?config=TeX-AMS-MML_HTMLorMML";
+    document.getElementsByTagName("head")[0].appendChild(script);
+  })();
+</script>
+
+</body>
+</html>
diff --git a/inst/tests/1680-fread-header-encoding.csv b/inst/tests/1680-fread-header-encoding.csv
new file mode 100644
index 0000000..13f7281
--- /dev/null
+++ b/inst/tests/1680-fread-header-encoding.csv
@@ -0,0 +1,5 @@
+Ort;Stra�e;Bezeichnung
+Vienna;Testgasse 1;"Ministerium ""Pestalozzi"""
+Graz;Teststra�e 3;HS
+Salzburg;Beispielstra�e 9;"NMS ""Die Schlauen"""
+Vienna;Wolfgang-Stra�e 7;"Wirtshaus ""Wien III"""
diff --git a/inst/tests/530_fread.txt b/inst/tests/530_fread.txt
new file mode 100644
index 0000000..1756214
--- /dev/null
+++ b/inst/tests/530_fread.txt
@@ -0,0 +1,51 @@
+a,b,c,d
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2
+1,2,3
+a,b
+
+1,3
+2,4
diff --git a/inst/tests/536_fread_fill_1.txt b/inst/tests/536_fread_fill_1.txt
new file mode 100644
index 0000000..763b49d
--- /dev/null
+++ b/inst/tests/536_fread_fill_1.txt
@@ -0,0 +1,29 @@
+a,b,c
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+4,5
+
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+1
+
+
+1,2,qq
+
+1,2,qq
+1,2,qq
+1,2,qq
+1
+
+1
+1
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,er
+
diff --git a/inst/tests/536_fread_fill_2.txt b/inst/tests/536_fread_fill_2.txt
new file mode 100644
index 0000000..717b641
--- /dev/null
+++ b/inst/tests/536_fread_fill_2.txt
@@ -0,0 +1,28 @@
+a,b,c
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+4,5
+
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+1
+
+
+1,2,qq
+
+1,2,qq
+1,2,qq
+1,2,qq
+1
+
+1
+1
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,er
diff --git a/inst/tests/536_fread_fill_3_extreme.txt b/inst/tests/536_fread_fill_3_extreme.txt
new file mode 100644
index 0000000..011826e
--- /dev/null
+++ b/inst/tests/536_fread_fill_3_extreme.txt
@@ -0,0 +1,22 @@
+a,b,c
+1,"first,,,,,,,,,,,,,,,,
+
+second,,,,,,,
+
+
+
+
+
+
+
+
+
+third",2
+
+
+
+2,"foo"
+3
+
+
+
diff --git a/inst/tests/536_fread_fill_4.txt b/inst/tests/536_fread_fill_4.txt
new file mode 100644
index 0000000..5a51bf8
--- /dev/null
+++ b/inst/tests/536_fread_fill_4.txt
@@ -0,0 +1,30 @@
+a,b,c
+
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+4,5
+
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,qq
+1
+
+
+1,2,qq
+
+1,2,qq
+1,2,qq
+1,2,qq
+1
+
+1
+1
+1,2,qq
+1,2,qq
+1,2,qq
+1,2,er
+
diff --git a/inst/tests/fread_blank.txt b/inst/tests/fread_blank.txt
new file mode 100644
index 0000000..cc7ffcc
--- /dev/null
+++ b/inst/tests/fread_blank.txt
@@ -0,0 +1,48 @@
+a,b,c
+1,2,3
+1,2,3
+1,2,3
+1,2,3
+1,2,3
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+1,2,3
+1,2,3
+1,2,3
diff --git a/inst/tests/fread_blank2.txt b/inst/tests/fread_blank2.txt
new file mode 100644
index 0000000..255978a
--- /dev/null
+++ b/inst/tests/fread_blank2.txt
@@ -0,0 +1,32 @@
+a,b,c
+1,2,3
+1,2,3
+1,2,3
+1,2,3
+1,2,3
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
+
diff --git a/inst/tests/fread_blank3.txt b/inst/tests/fread_blank3.txt
new file mode 100644
index 0000000..c574684
--- /dev/null
+++ b/inst/tests/fread_blank3.txt
@@ -0,0 +1,12 @@
+a,b,c
+1,2,3
+1,2,3
+1,2,3
+1,2,3
+1,2,3
+
+
+
+
+
+
diff --git a/inst/tests/issue_1087_utf8_bom.csv b/inst/tests/issue_1087_utf8_bom.csv
new file mode 100644
index 0000000..9c60e4f
--- /dev/null
+++ b/inst/tests/issue_1087_utf8_bom.csv
@@ -0,0 +1,2 @@
+a,b,c
+1,2,3
diff --git a/inst/tests/issue_1116_fread_few_lines.txt b/inst/tests/issue_1116_fread_few_lines.txt
new file mode 100644
index 0000000..f1b8879
--- /dev/null
+++ b/inst/tests/issue_1116_fread_few_lines.txt
@@ -0,0 +1,133 @@
+x,y
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
+1,"a
+,,,,,,,,,,
+_b"
diff --git a/inst/tests/issue_1116_fread_few_lines_2.txt b/inst/tests/issue_1116_fread_few_lines_2.txt
new file mode 100644
index 0000000..4226ea6
--- /dev/null
+++ b/inst/tests/issue_1116_fread_few_lines_2.txt
@@ -0,0 +1,177 @@
+x,y
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
+1,"a
+
+,,,,,,,,
+_b"
diff --git a/inst/tests/issue_1164_json.txt b/inst/tests/issue_1164_json.txt
new file mode 100644
index 0000000..ee3829a
--- /dev/null
+++ b/inst/tests/issue_1164_json.txt
@@ -0,0 +1,2 @@
+json1, string1
+"{""f1"":""value1"",""f2"":""double quote escaped with a backslash [ \"" ]""}", "string field"
diff --git a/inst/tests/issue_1462_fread_quotes.txt b/inst/tests/issue_1462_fread_quotes.txt
new file mode 100644
index 0000000..f1b09d6
--- /dev/null
+++ b/inst/tests/issue_1462_fread_quotes.txt
@@ -0,0 +1,4 @@
+897145298																										urn:occurrence:Arctos:MSB:Host:9010:1861932									en							http://arctos.database.museum/guid/MSB:Host:9010				http://arctosdb.org/home/data/								PhysicalObject								PRESERVED_SPECIMEN			9010	Mammalia	Host (of parasite) specimens		NORTH_AMERICA	US					2014-05-28T00:00Z	14	63.57609	-170.87962		sex=female ; weight=4 g; reproductive data=immature ; examined for parasites=yes ; parasites found=no ; hind foot with claw=12 mm; tail length=33 mm; total  [...]
+897145318																										urn:occurrence:Arctos:MSB:Host:9011:1861933									en							http://arctos.database.museum/guid/MSB:Host:9011				http://arctosdb.org/home/data/								PhysicalObject								PRESERVED_SPECIMEN			9011	Mammalia	Host (of parasite) specimens		NORTH_AMERICA	US					2014-05-28T00:00Z	14	63.57609	-170.87962		sex=female ; weight=4.2 g; reproductive data=immature ; examined for parasites=yes ; parasites found=no ; tail length=13 mm; total length=36     mm						226 [...]
+897145322																										urn:occurrence:Arctos:MSB:Host:927:1853849									en							http://arctos.database.museum/guid/MSB:Host:927				http://arctosdb.org/home/data/								PhysicalObject				(host of) MSB:Para http://arctos.database.museum/guid/MSB:Para:6247				PRESERVED_SPECIMEN			927	Mammalia	Host (of parasite) specimens		NORTH_AMERICA	US					1951-08-12T00:00Z	12	60.08292	-166.39397		sex=male ; weight=4.2 g; examined for parasites=yes ; parasites found=yes						224		1951-08- [...]
+897145342																										urn:occurrence:Arctos:MSB:Host:9012:1861934									en							http://arctos.database.museum/guid/MSB:Host:9012				http://arctosdb.org/home/data/								PhysicalObject								PRESERVED_SPECIMEN			9012	Mammalia	Host (of parasite) specimens		NORTH_AMERICA	US					2014-05-28T00:00Z	14	63.57609	-170.87962		sex=male ; weight=4 g; reproductive data=immature; testis 1 x 1; thymus large ; examined for parasites=yes ; parasites found=no ; hind foot with claw=13 mm; [...]
diff --git a/inst/tests/issue_1573_fill.txt b/inst/tests/issue_1573_fill.txt
new file mode 100644
index 0000000..e868824
--- /dev/null
+++ b/inst/tests/issue_1573_fill.txt
@@ -0,0 +1,8 @@
+SD1    ST1 SMS1    SD2 ST2 SMS2    SD3 ST3 SMS3    SD4 ST4 SMS4
+01-11-2015   00:00:01   323  2015-11-01      00:00:01   551
+01-11-2015   00:00:02   289  2015-11-01      00:00:02   618
+01-11-2015   01:13:16   253  2015-11-01      01:13:25   511  2015-11-01      01:13:33   489  2015-11-01      01:13:44   870
+01-11-2015   00:00:11   986  2015-11-01      00:00:12   602
+01-11-2015   00:00:27   48   2015-11-01      00:00:27   391  2015-11-01      00:00:27   429
+01-11-2015   00:00:13   750  2015-11-01      00:00:14   255
+01-11-2015   00:00:28   773  2015-11-01      00:00:29   114
diff --git a/inst/tests/melt-warning-1752.tsv b/inst/tests/melt-warning-1752.tsv
new file mode 100644
index 0000000..c34bcf6
--- /dev/null
+++ b/inst/tests/melt-warning-1752.tsv
@@ -0,0 +1,2 @@
+Id	Id2	Geography	RECORD CODES - File Identification	RECORD CODES - State/US-Abbreviation (USPS)	RECORD CODES - Summary Level	RECORD CODES - Geographic Component	RECORD CODES - Characteristic Iteration	RECORD CODES - Characteristic Iteration File Sequence Number	RECORD CODES - Logical Record Number	GEOGRAPHIC AREA CODES - Region	GEOGRAPHIC AREA CODES - Division	GEOGRAPHIC AREA CODES - State (FIPS)	GEOGRAPHIC AREA CODES - County	GEOGRAPHIC AREA CODES - FIPS County Class Code	GEOGRAPHIC ARE [...]
+310M100US10180	10180	Abilene, TX Metro Area	UR1US	US	310	0	0		407913																															10180	18		999																							�.).0-*(+,))+(0(E-314	�.).0-*(+,))+(0(E-316	Abilene, TX Metro Area	S		165252	69721	3.24520222E1	-9.97187428E1	M1																							0		1			
diff --git a/inst/tests/tests.Rraw b/inst/tests/tests.Rraw
index a00c2b9..91b6fcf 100644
--- a/inst/tests/tests.Rraw
+++ b/inst/tests/tests.Rraw
@@ -1,26 +1,15 @@
 
-# This file should be clean of non-ASCII characters; e.g. CRAN Solaris
-# Use:  grep --color='auto' -P -n "[\x80-\xFF]" tests.Rraw
-
 if (!exists("test.data.table",.GlobalEnv,inherits=FALSE)) {
     require(data.table)   # in dev the package should not be loaded
     options(warn=0)  # use require() so it warns but doesn't halt if not available
-    require(plyr)
-    require(ggplot2)
-    require(hexbin)
-    require(nlme)
-    require(xts)
-    require(bit64)
-    require(gdata)
-    require(GenomicRanges)
-    require(caret)
-    require(knitr)
-    require(plm)
+    inst_pkgs = rownames(installed.packages())
+    sugg_pkgs = c("chron", "ggplot2", "plyr", "reshape", "reshape2", "testthat", "hexbin", "fastmatch", "nlme", "GenomicRanges", "xts", "bit64", "gdata", "caret", "knitr", "curl", "zoo", "plm", "rmarkdown", "parallel")
+    lapply(setNames(sugg_pkgs, nm = sugg_pkgs), function(pkg) if(pkg %in% inst_pkgs) require(pkg, character.only=TRUE))
     # reshape2 ahead of reshape ...
     try(detach(package:reshape2),silent=TRUE)
     try(detach(package:reshape),silent=TRUE)
-    library(reshape2, pos="package:base", logical.return=TRUE)
-    library(reshape, pos="package:base", logical.return=TRUE)
+    if("reshape2" %in% inst_pkgs) library(reshape2, pos="package:base", logical.return=TRUE)
+    if("reshape" %in% inst_pkgs) library(reshape, pos="package:base", logical.return=TRUE)
     .devtesting=FALSE
 } else {
     # Matt has suppressMessages(require(bit64)) in .Rprofile
@@ -36,23 +25,29 @@ if (!exists("test.data.table",.GlobalEnv,inherits=FALSE)) {
 }
 options(warn=2)
 
+setDTthreads(2)
+# Tests are small and quick so should themselves switch to 1 thread, but explicitly limit to 2
+# so as not to breach CRAN policy, just in case.
+
 nfail = ntest = lastnum = 0
 whichfail = NULL
 
 .timingtests = FALSE
 started.at = Sys.time()
-oldbwb = options(datatable.old.bywithoutby=FALSE)  # in case user set it, or set in dev
+
+# Test default values in case user set global option. These are restored
+# at the end of this file.
+oldalloccol = options(datatable.alloccol=1024L)
+oldWhenJsymbol = options(datatable.WhenJisSymbolThenCallingScope=FALSE)
 
 if (!.devtesting) {
     test = data.table:::test
+    INT = data.table:::INT
     compactprint = data.table:::compactprint
     is.sorted = data.table:::is.sorted
     forderv = data.table:::forderv
     forder = data.table:::forder
     null.data.table = data.table:::null.data.table
-    ordernumtol = data.table:::ordernumtol   # TO DO: deprecated, remove
-    iradixorder = data.table:::iradixorder   # TO DO: deprecated, remove
-    dradixorder = data.table:::dradixorder   # TO DO: deprecated, remove
     uniqlist = data.table:::uniqlist
     uniqlengths = data.table:::uniqlengths
     setrev = data.table:::setrev
@@ -64,14 +59,14 @@ if (!.devtesting) {
     .R.subassignCopiesOthers = data.table:::.R.subassignCopiesOthers
     .R.subassignCopiesVecsxp = data.table:::.R.subassignCopiesVecsxp
     setdiff_ = data.table:::setdiff_
-    frankv = data.table:::frankv
+    frankv = data.table::frankv
     is_na = data.table:::is_na
     shallow = data.table:::shallow # until exported
     chmatch2 = data.table:::chmatch2
     which_ = data.table:::which_
-    shift = data.table:::shift
+    shift = data.table::shift
     any_na = data.table:::any_na
-    replace_dot = data.table:::replace_dot
+    replace_dot_alias = data.table:::replace_dot_alias
     isReallyReal = data.table:::isReallyReal
     between = data.table::between
     which.first = data.table:::which.first
@@ -80,6 +75,8 @@ if (!.devtesting) {
     `%+%.default` = data.table:::`%+%.default`
     .shallow = data.table:::.shallow
     getdots = data.table:::getdots
+    second = data.table::second # avoid S4Vectors::second
+    binary = data.table:::binary
 }
 
 # test for covering tables.R 100%, we need to run tables() before creating any data.tables to return null data.table
@@ -95,7 +92,7 @@ setkey(TESTDT,a,b)
 #       [5,] 4 9 5
 #       [6,] 4 9 6
 #       [7,] 7 2 7
-INT = function(...) { as.integer(c(...)) }
+
 ##########################
 
 test(1, TESTDT[SJ(4,6),v,mult="first"], 3L)
@@ -228,24 +225,29 @@ a = "d"
 test(70, TESTDT[eval(J(a)),v,by=.EACHI], data.table(a="d",v=3:6,key="a"))   # the eval() enabled you to use the 'a' in the calling scope, not 'a' in the TESTDT.  TO DO: document this.
 test(71, TESTDT[eval(SJ(a)),v,by=.EACHI], data.table(a="d",v=3:6,key="a"))
 test(72, TESTDT[eval(CJ(a)),v,by=.EACHI], data.table(a="d",v=3:6,key="a"))
-
-test(73, TESTDT[,v], 1:7)
-test(74, TESTDT[,3], 3)
-test(74.5, TESTDT[,3L], 3L)
-test(75, TESTDT[,"v"], "v")
-test(76, TESTDT[,2:3], 2:3)  # See ?[.data.table that with=FALSE is required for the likely intended result
+test(73, TESTDT[,v], 1:7)   # still old behaviour for 1 year. WhenJsymbol option was set to FALSE at the top of this file
+test(74, TESTDT[,3], data.table(v=1:7))
+test(74.1, TESTDT[,4], error="outside the column number range.*1,ncol=3")
+test(74.2, TESTDT[,3L], data.table(v=1:7))
+test(74.3, TESTDT[,0], null.data.table())
+test(75, TESTDT[,"v"], data.table(v=1:7))
+test(76, TESTDT[,2:3], TESTDT[,2:3,with=FALSE])
 test(77, TESTDT[,2:3,with=FALSE], data.table(b=c("e","e","f","f","i","i","b"),v=1:7))
-test(78, TESTDT[,c("b","v"),with=FALSE], data.table(b=c("e","e","f","f","i","i","b"),v=1:7))
+test(78, TESTDT[,c("b","v")], data.table(b=c("e","e","f","f","i","i","b"),v=1:7))
 colsVar = c("b","v")
-test(79, TESTDT[,colsVar], colsVar)
-test(80, TESTDT[,colsVar,with=FALSE], data.table(b=c("e","e","f","f","i","i","b"),v=1:7))
+test(79.1, TESTDT[,colsVar], colsVar)
+test(79.2, TESTDT[,colsVar,with=FALSE], data.table(b=c("e","e","f","f","i","i","b"),v=1:7))
+options(datatable.WhenJisSymbolThenCallingScope=TRUE)
+test(80.1, TESTDT[,colsVar], data.table(b=c("e","e","f","f","i","i","b"),v=1:7))
+test(80.2, TESTDT[,colsVar,with=TRUE], error="is turned on.*Please")
+options(datatable.WhenJisSymbolThenCallingScope=FALSE)
 
 # works in test.data.table, but not eval(body(test.data.table)) when in R CMD check ... test(81, TESTDT[1:2,c(a,b)], factor(c("a","c","e","e")))
 # It is expected the above to be common source of confusion. c(a,b) is evaluated within
 # the frame of TESTDT, and c() creates one vector, not 2 column subset as in data.frame's.
 # If 2 columns were required use list(a,b).  c() can be useful too, but is different.
 
-test(82, TESTDT[,c("a","b")], c("a","b"))
+test(82, TESTDT[,c("a","b")], data.table(a=TESTDT[[1]], b=TESTDT[[2]], key=c("a","b")))
 test(83, TESTDT[,list("a","b")], data.table(V1="a",V2="b"))
 test(83.1, TESTDT[,list("sum(a),sum(b)")], data.table("sum(a),sum(b)"))
 test(83.2, TESTDT[,list("sum(a),sum(b)"),by=a], {tt=data.table(a=c("a","c","d","g"),V1="sum(a),sum(b)",key="a");tt$V1=as.character(tt$V1);tt})
@@ -277,7 +279,7 @@ test(98, TESTDT[SJ(c("f","i","b")),list(GroupSum=sum(v)),by=.EACHI], data.table(
 # line above is the way to group, sort by group and setkey on the result by group.
 
 dt <- data.table(A = rep(1:3, each=4), B = rep(11:14, each=3), C = rep(21:22, 6), key = "A,B")
-test(99, unique(dt), data.table(dt[c(1L, 4L, 5L, 7L, 9L, 10L)], key="A,B"))
+test(99, unique(dt, by=key(dt)), data.table(dt[c(1L, 4L, 5L, 7L, 9L, 10L)], key="A,B"))
 
 # test [<- for column assignment
 dt1 <- dt2 <- dt
@@ -347,7 +349,7 @@ TESTDT = data.table(NULL)
 test(122, TESTDT[1], TESTDT)
 test(123, TESTDT[0], TESTDT)
 test(124, TESTDT[1:10], TESTDT)
-test(125, TESTDT["k"], error="x must be keyed")
+test(125, TESTDT["k"], error="the columns to join by must be specified either using")
 # test 126 no longer needed now that test() has 'error' argument
 
 TESTDT = data.table(a=3L,v=2L,key="a")  # testing 1-row table
@@ -445,12 +447,12 @@ test(165, subset(DT,a>2), DT[a>2])
 test(166, suppressWarnings(split(DT,DT$grp)[[2]]), DT[grp==2])
 
 if ("package:ggplot2" %in% search()) {
-    test(167,names(print(ggplot(DT,aes(b,f))+geom_point())),c("data","panel","plot"))
+    test(167, names(print(ggplot(DT,aes(b,f))+geom_point()))[c(1,3)], c("data","plot"))
     # The names() is a stronger test that it has actually plotted, but also because test() sees the invisible result
     test(167.1,DT[,print(ggplot(.SD,aes(b,f))+geom_point()),by=list(grp%%2L)],data.table(grp=integer()))  # %%2 because there are 5 groups in DT data at this stage, just need 2 to test
     # New test reported by C Neff on 11 Oct 2011
     if ("package:hexbin" %in% search())
-       test(167.2, names(print(ggplot(DT) + geom_hex(aes(b, f)) + facet_wrap(~grp))), c("data","panel","plot"))
+       test(167.2, names(print(ggplot(DT) + geom_hex(aes(b, f)) + facet_wrap(~grp)))[c(1,3)], c("data","plot"))
     else
        cat("Test 167.2 not run. If required call library(hexbin) first.\n")
 
@@ -559,6 +561,10 @@ test(205, x[y,list(d),mult="all"][,d], c(1,2,NA,NA))
 # with data.frame).
 TESTDT = data.table(a=1:3,v=1:3,key="a")
 test(206, TESTDT[NA], data.table(a=NA_integer_,v=NA_integer_,key="a"))  # NA are now allowed in keys, so retains key
+# TESTDT[NA] is expected to return a row of NA since nobody remembers that NA is different to NA_integer_
+# Then user tries TESTDT[c(1,NA,2)] and it feels consistent to them since they see that row of NA in the middle
+# But only the NA symbol is caught and replaced with NA_integer_, for this convenience.
+# Otherwise logical expressions returning a single NA logical will still return empty, for consistency, #1252.
 setkey(TESTDT,NULL)
 test(207, TESTDT[NA], data.table(a=NA_integer_,v=NA_integer_))
 
@@ -798,7 +804,7 @@ test(286, DT[,list(sum(unlist(.BY)),sum(z)),by=groupExpr], ans)
 # Bug fix from Damian B on 25 June 2011 :
 DT = data.table(X=c(NA,1,2,3), Y=c(NA,2,1,3))
 setkeyv(DT,c("X","Y"))
-test(287, unique(DT), DT)
+test(287, unique(DT, by=key(DT)), DT)
 
 # Bug fix #1421: using vars in calling scope in j when i is logical or integer.
 DT = data.table(A=c("a","b","b"),B=c(4,5,NA))
@@ -860,20 +866,22 @@ test(297,DT[,list(A1=sum(A1),A2=sum(A2),A3=sum(A3)),by=grp], DT[,lapply(.SD,sum)
 
 DT = data.table(a=1:3,b=4:6)
 test(298, {DT$b<-NULL;DT}, data.table(a=1:3))  # delete column
-test(299, DT$c <- as.character(DT$c), error="zero length")  # to simulate RHS which could (due to user error) be non NULL but zero length. This copies DT too, so the next test checks that a subsequent := detects and fixes that.
-test(299.1, DT[,c:=42L], data.table(a=1:3,c=42L), warning="Invalid .internal.selfref detected and fixed")
-test(299.2, truelength(DT)>length(DT))   # the := over-allocated, by 100 by default, but user may have changed default so just check '>'
+test(299.01, {DT$c<-as.character(DT$c);DT}, data.table(a=1:3, c=NA_character_)) # Column c is missing, so DT$c is NULL.
+test(299.02, DT[,c:=""], data.table(a=1:3,c=""))
+test(299.03, truelength(DT)>length(DT))   # the := over-allocated, by 100 by default, but user may have changed default so just check '>'
 # FR #2551 - old 299.3 and 299.5 are changed to include length(RHS) > 1 to issue the warning
-test(299.3, DT[2:3,c:=c(42, 42)], data.table(a=1:3,c=42L), warning="Coerced 'double' RHS to 'integer' to match the column's type.*length 3 (nrows of entire table)")
+DT[,c:=rep(42L,.N)] # plonk
+test(299.04, DT, data.table(a=1:3, c=42L))
+test(299.05, DT[2:3,c:=c(42, 42)], data.table(a=1:3,c=42L), warning="Coerced 'double' RHS to 'integer' to match the column's type.*length 3 (nrows of entire table)")
 # FR #2551 - length(RHS) = 1 - no warning for type conversion
-test(299.7, DT[2,c:=42], data.table(a=1:3,c=42L))
+test(299.06, DT[2,c:=42], data.table(a=1:3,c=42L))
 # also see tests 302 and 303.  (Ok, new test file for fast assign would be tidier).
-test(299.4, DT[,c:=rep(FALSE,nrow(DT))], data.table(a=1:3,c=FALSE))  # replace c column with logical
-test(299.5, DT[2:3,c:=c(42,0)], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'double' RHS to 'logical' to match the column's type.*length 3 (nrows of entire table)")
+test(299.07, DT[,c:=rep(FALSE,nrow(DT))], data.table(a=1:3,c=FALSE))  # replace c column with logical
+test(299.08, DT[2:3,c:=c(42,0)], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'double' RHS to 'logical' to match the column's type.*length 3 (nrows of entire table)")
 # FR #2551 is now changed to fit in / fix bug #5442. Stricter warnings are in place now. Check tests 1294.1-34 below.
-test(299.8, DT[2,c:=42], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'double' RHS to 'logical' to match")
-test(299.9, DT[2,c:=42L], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'integer' RHS to 'logical' to match")
-test(299.6, DT[2:3,c:=c(0L, 0L)], data.table(a=1:3,c=FALSE), warning="Coerced 'integer' RHS to 'logical' to match the column's type.*length 3 (nrows of entire table)")
+test(299.09, DT[2,c:=42], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'double' RHS to 'logical' to match")
+test(299.11, DT[2,c:=42L], data.table(a=1:3,c=c(FALSE,TRUE,FALSE)), warning="Coerced 'integer' RHS to 'logical' to match")
+test(299.12, DT[2:3,c:=c(0L, 0L)], data.table(a=1:3,c=FALSE), warning="Coerced 'integer' RHS to 'logical' to match the column's type.*length 3 (nrows of entire table)")
 
 
 # Test bug fix #1468, combining i and by.
@@ -1121,7 +1129,8 @@ test(384, DT[,{mySD=copy(.SD);mySD[1,b:=99L];mySD},by=a], data.table(a=rep(1:3,1
 # somehow missed testing := on logical subset with mixed TRUE/FALSE, reported by Muhammad Waliji
 DT = data.table(x=1:2, y=1:6)
 test(385, DT[x==1, y := x], data.table(x=1:2,y=c(1L,2L,1L,4L,1L,6L)))
-test(386, DT[c(FALSE,TRUE),y:=99L], data.table(x=1:2,y=c(1L,99L,1L,99L,1L,99L)))
+test(386.1, DT[c(FALSE,TRUE)], error="i evaluates to.*Recycling of logical i is no longer allowed.*use rep.*[.]N")
+test(386.2, DT[rep(c(FALSE,TRUE),length=.N),y:=99L], data.table(x=1:2,y=c(1L,99L,1L,99L,1L,99L)))
 
 # test that column names have the appearance of being local in j (can assign to them ok), bug #1624
 DT = data.table(name=c(rep('a', 3), rep('b', 2), rep('c', 5)), flag=FALSE)
@@ -1138,12 +1147,18 @@ test(388, DT[,{ans = score[1]
 
 # Test unique.data.table for numeric columns within tolerance, for consistency with
 # with unique.data.frame which does this using paste.
+old_rounding = getNumericRounding()
 DT = data.table(a=tan(pi*(1/4 + 1:10)),b=42L)
 # tan(...) from example in ?all.equal.
 test(395, all.equal(DT$a, rep(1,10)))
 test(396, length(unique(DT$a))>1)  # 10 unique values on all CRAN machines (as of Nov 2011) other than mac (5 unique)
-test(397, unique(DT), DT[1])  # before v1.7.2 unique would return all 10 rows. For stability within tolerance, data.table has it's own modified numeric sort.
-test(398, duplicated(DT), c(FALSE,rep(TRUE,9)))
+# commenting these two as they give different results on os x and linux.
+# test(397.1, unique(DT), DT[duplicated(DT)])  # default, no rounding
+# test(398.1, duplicated(DT), c(FALSE, FALSE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE, TRUE, FALSE))
+setNumericRounding(2L)
+test(397.2, unique(DT), DT[1])  # before v1.7.2 unique would return all 10 rows. For stability within tolerance, data.table has its own modified numeric sort.
+test(398.2, duplicated(DT), c(FALSE,rep(TRUE,9)))
+setNumericRounding(old_rounding)
 
 DT = data.table(a=c(3.142, 4.2, 4.2, 3.142, 1.223, 1.223), b=rep(1,6))
 test(399, unique(DT), DT[c(1,2,5)])
@@ -1200,15 +1215,14 @@ test(415, x:=1, error="defined for use in j, once only and in particular ways")
 
 # Somehow never tested that X[Y] is error if X is unkeyed.
 DT = data.table(a=1:3,b=4:6)
-test(416, DT[J(2)], error="x must be keyed")
+test(416, DT[J(2)], error="the columns to join by must be specified either using")
 
-# Test shallow copy warning from := adding a column, and (TO DO) only when X is NAMED.
+# Test shallow copy verbose message from := adding a column, and (TO DO) only when X is NAMED.
 DT = data.table(a=1:3,b=4:6)
-test(417, alloc.col(DT,3), DT, warning="Attempt to reduce allocation from.*to 3 ignored. Can only increase allocation via shallow copy")
-old = getOption("datatable.alloccol")   # search for "r-devel" note in this file why not in one step here
-options(datatable.alloccol=3L)
+test(417, alloc.col(DT,3,verbose=TRUE), DT, output="Attempt to reduce allocation from.*to 5 ignored. Can only increase allocation via shallow copy")
+old = options(datatable.alloccol=1L)
 DT = data.table(a=1:3,b=4:6)
-options(datatable.alloccol=old)
+options(old)
 DT2 = DT
 test(418, length(DT)==2 && truelength(DT)==3)
 DT[,c:=7L]   # uses final slot
@@ -1216,7 +1230,7 @@ test(419, DT, DT2)
 test(420, length(DT)==3 && truelength(DT)==3 && length(DT2)==3 && truelength(DT2)==3)
 test(421, DT[,d:=8L,verbose=TRUE], output="Growing vector of column pointers from")
 test(422, length(DT)==4)
-test(423, truelength(DT)>=4)  # with default alloccol, new tl will be 103. But user might have set that higher and then be running test.data.table(), or user might have set alloccol to just ncol(DT)+1. Hence just >=4.
+test(423, truelength(DT), 1028L)
 
 # Test crash bug fixed, #1656, introduced with the 1.7.0 feature
 DT <- data.table(a = factor(c("A", "Z")), b = 1:4)
@@ -1233,35 +1247,42 @@ test(429, DT, data.table(a=factor(c("Z","Z","A","Z")),b=1:4))
 test(430, DT[1,1]<- 3L, NA_integer_, warning="RHS contains 3 which is outside the levels range.*1,2.*of column 1, NAs generated")
 test(431, DT[1,1:=4L], data.table(a=factor(c(NA,"Z","A","Z")),b=1:4), warning="RHS contains 4 which is outside the levels range.*1,2.*of column 1, NAs generated")
 
-# simple realloc test
-if (is.null(getOption("datatable.alloccol"))) {
-    DT = data.table(a=1:3,b=4:6)
-    test(432, truelength(DT), 100L)
-    alloc.col(DT,200)
-    test(433, truelength(DT), 200L)
-    DT = alloc.col(DT,300)  # superfluous in this example, but shouldn't fail
-    test(434, truelength(DT), 300L)
-    DT2 = alloc.col(DT,400)
-    test(435, truelength(DT), 400L)
-    test(436, truelength(DT2), 400L)
-}
 
-# test that alloc.col assigns to wherever object is
+old = getOption("datatable.alloccol")
+options(datatable.alloccol=NULL)   # In this =NULL case, R 3.0.0 returns TRUE rather than the old value.
+                                   # Hence split out into separate getOption() first.
+                                   # This was an R bug fixed in R 3.1.1.
+test(432.1, data.table(a=1:3,b=4:6), error="Has getOption('datatable.alloccol') somehow become unset?")
+options(datatable.alloccol=old)
+
+# simple realloc test
 DT = data.table(a=1:3,b=4:6)
+test(432.2, truelength(DT), 1026L)
+alloc.col(DT,200)   # should have no effect since 200<1024
+test(433, truelength(DT), 1026L)
+DT = alloc.col(DT,2000)  # test the superfluous DT =
+test(434, truelength(DT), 2002L)
+DT2 = alloc.col(DT,3000)  #  DT changed then DT2 pointed to it
+test(435, truelength(DT), 3002L)
+test(436, truelength(DT2), 3002L)
+
+# test that alloc.col assigns from within functions too (i.e. to wherever that object is)
+DT = data.table(a=1:3,b=4:6)   # tl 1024 now by default
+test(437.1, truelength(DT), 1026L)
 f = function() {
-    alloc.col(DT,200)  # DT isn't local so (via inherits=TRUE) it finds in frame above
+    alloc.col(DT,2042)  # DT isn't local so (via inherits=TRUE) it finds in frame above.
     invisible()
 }
 f()
-test(437, truelength(DT), 200L)
+test(437.2, truelength(DT), 2044L)
 
 # quick test that [<- over allocates (again) after the copy of length via *tmp*
 DT = data.table(a=1:3,b=4:6)
 tl = truelength(DT)
 DT$foo = 7L
-test(438, truelength(DT), tl)
+test(438, truelength(DT), tl+1L)   # the (not recommended) $<- calls a new alloc.col, hence tl becomes +1
 DT[,"bar"] = 8L
-test(439, truelength(DT), tl)
+test(439, truelength(DT), tl+2L)
 test(440, DT, data.table(a=1:3,b=4:6,foo=7L,bar=8L))
 
 # Test rbind works by colname now, for consistency with base, FR#1634
@@ -1347,7 +1368,7 @@ test(474, DT, data.table(a=1:4,b=5:8,c=factor(c("a","a","a","a")),d=factor(c("a"
 
 # Test scoping error introduced at 1.6.1, unique(DT) when key column is 'x'
 DT=data.table(x=c("a", "a", "b", "b"), y=c("a", "a", "b", "b"), key="x")
-test(475, unique(DT), data.table(x=c("a","b"),y=c("a","b"),key="x"))
+test(475, unique(DT, by=key(DT)), data.table(x=c("a","b"),y=c("a","b"),key="x"))
 
 # Test character and list columns in tables with many small groups
 N = if (.devtesting) 1000L else 100L
@@ -1382,8 +1403,10 @@ test(482, DT[, c("foo","baz"):=list(12L,15:18)], data.table(x=1:4,foo=12L,bar=11
 
 # Test that errors in := do not leave DT in bad state, #1711
 DT = data.table(x=1:4)
-test(483, DT[,c("foo","bar"):=list(20L,numeric())], error="RHS of assignment to new column.*bar.*is zero length but not empty list")
-test(484, DT, data.table(x=1:4))  # i.e. DT as it was before, without foo being added as it did in v1.7.7-
+test(483.1, DT[,c("foo","bar"):=list(20L,stop('user error'))], error="user error")
+test(483.2, DT, data.table(x=1:4))  # i.e. DT as it was before, without foo being added as it did in v1.7.7-
+# The test used to be as follows but as from v1.9.8, the empty numeric() now works and creates a NA_real_ column
+test(484, DT[,c("foo","bar"):=list(20L,numeric())], data.table(x=1:4, foo=20L, bar=NA_real_))
 
 # Test i's key longer than x's
 d1 <- data.table(a=1:2, b=11:14, key="a,b")
@@ -1678,7 +1701,6 @@ test(597, unique(DT), DT[c(1,2,4)])
 test(598, DT[,list(count=.N),by=c("x","y")], data.table(x=0.0,y=c(0.0,0.1,0.2),count=c(3L,1L,1L)))
 
 # And that numeric NAs sort stably to the beginning. Whether NAs are allowed in keys, another issue but
-# ordernumtol needs to deal with NA anyway for add hoc by and unique.
 DT = data.table( c(1.34, 1.34, 1.34,   NA, 2.22, 2.22, 1.34, NA,  NA, 1.34, 0.999), c(75.1,   NA, 75.1, 75.1,  2.3,  2.4,  2.5, NA, 1.1,   NA, 7.9 ))
 test(599, DT[c(8,9,4,11,2,10,7,1,3,5,6)], setkey(setkey(DT),NULL))
 
@@ -1730,13 +1752,17 @@ test(615, DT[J(2),z:=list(list(c(10L,11L)))]$z, rep(list(NULL, 10:11, NULL),each
 test(616, DT[a>1,p:=sum(b)]$p, rep(c(NA,3.3),c(3,6)))
 test(617, DT[a>1,q:=sum(b),by=a]$q, rep(c(NA,1.5,1.8),each=3))
 
-# Empty i clause, #2034. Thanks to Chris for testing, tests from him.
-test(618, copy(DT)[a>3,r:=sum(b)], DT)
-test(619, copy(DT)[J(-1),r:=sum(b)], DT)
-test(620, copy(DT)[J(-1),r:=sum(b),nomatch=0], DT)
+# Empty i clause, #2034. Thanks to Chris for testing, tests from him. Plus changes from #759
+ans = copy(DT)[,r:=NA_real_]
+test(618, copy(DT)[a>3,r:=sum(b)],     ans)
+test(619, copy(DT)[J(-1),r:=sum(b)],   ans)
+test(620.1, copy(DT)[NA,r:=sum(b)],    ans)
+test(620.2, copy(DT)[0,r:=sum(b)],     ans)
+test(620.3, copy(DT)[NULL,r:=sum(b)],  null.data.table())
+
 DT = data.table(x=letters, key="x")
 test(621, copy(DT)[J("bb"), x:="foo"], DT)  # when no update, key should be retained
-test(622, copy(DT)[J("bb"), x:="foo",nomatch=0], DT)
+test(622, copy(DT)[J("bb"), x:="foo",nomatch=0], DT, warning="ignoring nomatch")
 
 set.seed(2)
 DT = data.table(a=rnorm(5)*10, b=1:5)
@@ -1789,7 +1815,7 @@ test(632.1, merge(DT1,DT2,all=TRUE), setkey(adt(merge(adf(DT1),adf(DT2),by="a",a
 
 # Test that unsetting datatable.alloccol is caught, #2014
 old = getOption("datatable.alloccol")
-options(datatable.alloccol=NULL)  # the return value here seems to be TRUE rather than the old expression TO DO: follow up with r-devel
+options(datatable.alloccol=NULL)   # search above for R bug fix in 3.1.1 - why split into getOption first here.
 test(633, data.table(a=1:3), error="n must be integer length 1")
 options(datatable.alloccol=old)
 
@@ -1828,7 +1854,7 @@ test(645, setkey(DT,b), error="Column 2 is length 2 which differs from length of
 # Test faster mean.  Example from (now not needed as much) data.table wiki point 3.
 # Example is a lot of very small groups.
 set.seed(100)
-n=1e4  # small n so as not to overload daily CRAN checks.
+n=1e5  # small n so as not to overload daily CRAN checks.
 DT=data.table(grp1=sample(1:750, n, replace=TRUE),
               grp2=sample(1:750, n, replace=TRUE),
               x=rnorm(n),
@@ -1845,8 +1871,8 @@ test(647, ans1, ans3)
 # http://valgrind.org/docs/manual/manual-core.html#manual-core.limits
 # http://comments.gmane.org/gmane.comp.debugging.valgrind/10340
 test(648, any(is.na(ans1$V1)) && !any(is.nan(ans1$V1)))
-if (.devtesting) test(649, tt1["user.self"] < 10*tt2["user.self"])   # should be same speed, but *10 as large margin
-if (.devtesting) test(650, tt1["user.self"] < tt3["user.self"]/2)   # 10 times faster, but test 2 times faster as large margin
+# test 649 removed as compared 1.1s to 1.1s
+if (.devtesting) test(650, tt1["user.self"] < tt3["user.self"])
 
 tt1 = system.time(ans1<-DT[,list(mean(x,na.rm=TRUE),mean(y,na.rm=TRUE)),by=list(grp1,grp2)])   # 2.0s
 tt2 = system.time(ans2<-DT[,list(mean.default(x,na.rm=TRUE),mean.default(y,na.rm=TRUE)),by=list(grp1,grp2)])  # 5.0s
@@ -1893,7 +1919,7 @@ test(667, DT[a<3,sum(b),by=paste("a")], error='Otherwise, by=eval(paste("a")) sh
 test(668, DT[a<3,sum(b),by=eval(paste("a"))], DT[a<3,sum(b),by=a])
 test(669, DT[a<3,sum(b),by=c(2)], error="must evaluate to 'character'")
 
-# Test := keyby does key, #2065
+# Test := keyby does setkey, #2065
 DT = data.table(x=1:2, y=1:6)
 ans = data.table(x=rep(1:2,each=3),y=c(1L,3L,5L,2L,4L,6L),z=rep(c(9L,12L),each=3),key="x")
 test(670, DT[,z:=sum(y),keyby=x], ans)
@@ -1904,7 +1930,7 @@ test(672, DT[,z:=sum(y),keyby=x%%2], data.table(x=1:2,y=1:6,z=c(9L,12L)), warnin
 DT = data.table(x=1:2, y=1:6)
 test(673, DT[,z:=sum(y),by=x%%2], data.table(x=1:2,y=1:6,z=c(9L,12L)))
 DT = data.table(x=1:2, y=1:6)
-test(674, DT[x>1,z:=sum(y),keyby=x], error="When i is present, keyby := on a subset of rows doesn't make sense. Either change keyby to by, or remove i")
+test(674, DT[x>1,z:=sum(y),keyby=x], error=":= with keyby is only possible when i is not supplied since")
 
 # Test new .()
 DT = data.table(x=1:2, y=1:6, key="x")
@@ -2181,7 +2207,8 @@ test(788, DT[!J(2)], data.table(A=c(1L,1L,3L,3L),B=c(1L,2L,5L,6L),key="A"))
 test(789, DT[!(2:6)], DT[1])
 test(790, DT[!(2:6)], DT[!2:6])   # nicer than DT[-2:6] applying - to 2 first
 test(791, DT[!6], DT[1:5])
-test(792, DT[!c(TRUE,FALSE)], DT[c(FALSE,TRUE)])
+test(792.1, DT[!rep(c(TRUE,FALSE),length=.N)], DT[rep(c(FALSE,TRUE),length=.N)])
+test(792.2, DT[!A>=2], DT[A<2])
 test(793, setkey(DT[,A:=letters[A]],A)[!c("b","c")], DT["a"])
 test(794, DT[!"b"], DT[c("a","c")])
 test(795, DT[!0], DT)
@@ -2292,7 +2319,8 @@ if ("package:xts" %in% search()) {  # e.g. when run via R CMD check
 } else {
     cat("Test 841 not run. If required call library(xts) first.\n")
     # So these won't run from R CMD check (deliberately, for now) ...
-    test(842, last(list("a",1:2,89)), 89)  # xts's last returns a one item list here. Would prefer it to return the item itself.
+    ans = if ("package:gdata" %in% search()) list(89) else 89
+    test(842, last(list("a",1:2,89)), ans)  # xts's last and gdata::last return a one-item list here. Would prefer them to return the item itself.
     DT = data.table(a=1:3)
     test(842.1, last(DT), DT[3L])
     # xts's last returns a 3L atomic here for 1 column data.frame, strangely. We wish for the last row, consistently. I tried
@@ -2306,22 +2334,22 @@ l = list(data.table(a=1:3), data.table(b=4:6))
 test(843, l[[2L]][,c:=7:9], data.table(b=4:6,c=7:9))
 test(844, l, list(data.table(a=1:3), data.table(b=4:6,c=7:9)))
 names(l) = c("foo","bar")   # R >= 3.1 no longer copies all the contents, yay
-test(845, l[["foo"]][2,d:=4], data.table(a=1:3,d=c(NA,4L,NA)),
+test(845, l[["foo"]][2,d:=4L], data.table(a=1:3,d=c(NA,4L,NA)),
     warning= if (!.R.assignNamesCopiesAll) NULL else "Invalid .internal.selfref detected and fixed")
 l = list(data.table(a=1:3), data.table(b=4:6))
 setattr(l,"names",c("foo","bar"))
 test(846, l[["foo"]][2,d:=4], data.table(a=1:3,d=c(NA,4,NA)))
 test(847, l, list(foo=data.table(a=1:3,d=c(NA,4,NA)), bar=data.table(b=4:6)))
-old = getOption("datatable.alloccol")
-options(datatable.alloccol=2L)  # the return value here seems to be TRUE rather than the old expression TO DO: follow up with r-devel
+old = options(datatable.alloccol=0L)
 l = list(foo=data.table(a=1:3,b=4:6),bar=data.table(c=7:9,d=10:12))   # list() doesn't copy the NAMED==0 objects here
 test(848, truelength(l[[1L]]), 2L)
 test(849, {l[[1L]][,e:=13:15]; l[[1L]]}, data.table(a=1:3,b=4:6)[,e:=13:15])
-test(850, truelength(l[[1L]]), 102L)
+test(850, truelength(l[[1L]]), 3L)
 test(851, truelength(l[[2L]]), 2L)
+options(datatable.alloccol=1L)
 l[["bar"]][,f:=16:18]
-test(852, truelength(l[[2L]]), 102L)
-options(datatable.alloccol=old)
+test(852, truelength(l[[2L]]), 4L)
+options(old)
 # Now create the list from named objects
 DT1 = data.table(a=1:3, b=4:6)
 DT2 = data.table(c=7:9)
@@ -2382,7 +2410,7 @@ test(864.3, rbindlist(list(data.table(logical(0),logical(0)), DT<-data.table(baz
 # Steve's find that setnames failed for numeric 'old' when pointing to duplicated names
 DT = data.table(a=1:3,b=1:3,v=1:6,w=1:6)
 test(865, ans1<-DT[,{list(name1=sum(v),name2=sum(w))},by="a,b",verbose=TRUE],
-          output="result of j is a named list. It's very inefficient.*removed and put back")
+          output="GForce optimized.*gsum(v), gsum(w)")  # v1.9.7 treats wrapped {} better, so this is now optimized
 test(866, names(ans1), c("a","b","name1","name2"))
 test(867, names(ans2<-DT[,list(name1=sum(v),name2=sum(w)),by="a,b"]), c("a","b","name1","name2"))  # list names extracted here
 test(868, ans1, ans2)
@@ -2487,6 +2515,7 @@ for (eol in if (.Platform$OS.type=="unix") c("\n","\r\n") else "\n") {
     unlink(f3)
 }}}
 if ("package:bit64" %in% search()) {
+    n = 2000
     DT = data.table( a=sample(1:1000,n,replace=TRUE),
                      b=sample(as.integer64(2)^35 * 1:10, n, replace=TRUE),
                      c=sample(c("foo","bar","baz"),n,replace=TRUE) )
@@ -2494,17 +2523,16 @@ if ("package:bit64" %in% search()) {
     test(897, class(DT$b), "integer64")
     test(898, fread(f), DT)
     unlink(f)
-
     # Test all mid read bump coercions
     DT[,a2:=as.integer64(a)][,a3:=as.double(a)][,a4:=gsub(" ","",format(a))]
     DT[,b2:=as.double(b)][,b3:=gsub(" ","",format(b))]
     DT[,r:=a/100][,r2:=gsub(" ","",format(r))]
-    DT[12, a2:=as.integer64(12345678901234)]   # start on row 12 to avoid first 5, middle 5 and last 5 test rows
-    DT[13, a3:=3.14]
-    DT[14, a4:="123A"]
-    DT[15, b2:=1234567890123.45]
-    DT[16, b3:="12345678901234567890A"]  # A is needed otherwise read as double with loss of precision (TO DO: should detect and bump to STR)
-    DT[17, r2:="3.14A"]
+    DT[112, a2:=as.integer64(12345678901234)]   # start on row 112 to avoid the first 100 rows sampled
+    DT[113, a3:=3.14]
+    DT[114, a4:="123A"]
+    DT[115, b2:=1234567890123.45]
+    DT[116, b3:="12345678901234567890A"]  # A is needed otherwise read as double with loss of precision (TO DO: should detect and bump to STR)
+    DT[117, r2:="3.14A"]
     write.table(DT,f<-tempfile(),sep=",",row.names=FALSE,quote=FALSE)
     test(899, fread(f), DT, warning="Bumped column.*to type character.*may not be lossless")
     unlink(f)
@@ -2520,10 +2548,10 @@ f = "1206FUT.txt"    # a CRLF line ending file (DOS)
 test(901.1, DT<-fread(f,strip.white=FALSE), setDT(read.table(f,sep="\t",header=TRUE,colClasses=as.vector(sapply(DT,class)))))
 test(901.2, DT<-fread(f), setDT(read.table(f,sep="\t",header=TRUE,colClasses=as.vector(sapply(DT,class)),strip.white=TRUE)))
 
-# Tests the coerce of column 23 to character on line 179 due to the 'A' for the first time :
+# Test the coerce of column 23 to character on line 179 due to the 'A' for the first time.
+# As from v1.9.8 the columns are guessed better and there is no longer a warning.  Test 899 tests the warning.
 f = "2008head.csv"
-test(902, fread(f), as.data.table(read.csv(f,stringsAsFactors=FALSE)), warning="Bumped column 23 to type character.*may not be lossless")
-
+test(902, fread(f), as.data.table(read.csv(f,stringsAsFactors=FALSE)))
 test(903, fread("A,B\n1,3,foo,5\n2,4,barbaz,6"), data.table(1:2,3:4,c("foo","barbaz"),5:6),
           warning="Starting data input on line 2 and discarding line 1 because.*: A,B")  # invalid colnames (too short)
 test(904, fread("A,B,C,D\n1,3,foo,5\n2,4,barbaz,6"), DT<-data.table(A=1:2,B=3:4,C=c("foo","barbaz"),D=5:6))  # ok
@@ -2725,7 +2753,7 @@ test(985.2, rbindlist(list(data.table(c("A","B")), data.table(factor(c("C",NA)))
 ## uniqueness
 dt <- data.table(A = rep(1:3, each=4), B = rep(11:14, each=3), C = rep(21:22, 6), key = "A,B")
 df <- as.data.frame(dt)
-test(986, unique(dt), dt[!duplicated(df[, key(dt)]),])
+test(986, unique(dt, by=key(dt)), dt[!duplicated(df[, key(dt)]),])
 test(987, unique(dt, by='A'), dt[!duplicated(df[, 'A'])])
 test(988, unique(dt, by='B'), dt[!duplicated(df[, 'B'])])
 test(989, unique(dt, by='C'), dt[!duplicated(df[, 'C'])])
@@ -2806,7 +2834,7 @@ test(1008.2, rbindlist(list(DT1, DT3)), data.table(a = c(1L,2L), b = c(2, 2.5)))
 
 # optimized mean() respects na.rm=TRUE by default, as intended
 DT = data.table(a=c(NA,NA,FALSE,FALSE), b=c(1,1,2,2))
-test(1009, DT[,list(mean(a), sum(a)),by=b], data.table(b=c(1,2),V1=c(NA,0),V2=c(NA,0)))
+test(1009, DT[,list(mean(a), sum(a)),by=b], data.table(b=c(1,2),V1=c(NA,0),V2=c(NA_integer_,0L))) # sum(logical()) should be integer, not real
 
 # an fread error shouldn't hold a lock on the file on Windows.
 f = tempfile()
@@ -2823,9 +2851,9 @@ test(1013, fread("A,B\n123,123\n", integer64="integer"), error="integer64='%s' w
 test(1014, fread("A,B\n123456789123456,21\n", integer64="character"), data.table(A="123456789123456",B=21L))
 test(1015, fread("A,B\n123456789123456,21\n", integer64="double"), data.table(A=as.double("123456789123456"),B=21L))
 # and that mid read bumps respect integer64 control too ..
-x = sample(1:1000,100,replace=TRUE)
+x = sample(1:1000,2000,replace=TRUE)
 DT = data.table( A=as.character(x), B=1:100)
-DT[15, A:="123456789123456"]  # row 15 is outside the top, middle and last 5 rows.
+DT[115, A:="123456789123456"]  # row 115 is outside the 100 rows sampled at 10 points.
 write.table(DT,f<-tempfile(),sep=",",row.names=FALSE,quote=FALSE)
 test(1016, fread(f,integer64="numeric"), copy(DT)[,A:=as.numeric(A)])
 test(1017, fread(f,integer64="character"), DT, warning="Bumped column.*to type character.*may not be lossless")
@@ -2935,10 +2963,10 @@ if ("package:reshape2" %in% search()) {
 
     # bug #699 - melt segfaults when vars are not in dt
     x = data.table(a=c(1,2),b=c(2,3),c=c(3,4))
-    test(1316.1, melt(x, id="d"), error="Column 'd' not found in 'data'")
-    test(1316.2, melt(x, measure="d"), error="Column 'd' not found in 'data'")
-    test(1316.3, melt(x, id="a", measure="d"), error="Column 'd' not found in 'data'")
-    test(1316.4, melt(x, id="d", measure="a"), error="Column 'd' not found in 'data'")
+    test(1316.1, melt(x, id="d"), error="One or more values")
+    test(1316.2, melt(x, measure="d"), error="One or more values")
+    test(1316.3, melt(x, id="a", measure="d"), error="One or more values")
+    test(1316.4, melt(x, id="d", measure="a"), error="One or more values")
 
     # fix for #780.
     DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
@@ -2959,6 +2987,18 @@ if ("package:reshape2" %in% search()) {
     ## to ensure there's no segfault...
     ans <- melt(dt, measure.vars=names(dt), na.rm=TRUE)
     test(1509, ans, ans)
+
+    # improper levels fix, #1359
+    dt = data.table(id=1:3, x=NA_character_, y=c('a', NA_character_, 'c'))
+    test(1563, melt(dt, id.var="id", na.rm=TRUE), data.table(id=c(1L,3L), variable=factor(c("y", "y")), value=c("a", "c")))
+
+    # fixing segfault due to negative id and measure vars that I detected by accident
+    dt = data.table(x=1:5, y=6:10, z=11:15)
+    test(1569.1, melt(dt, id=-1, measure=NULL), error="One or more values in 'id.vars'")
+    test(1569.2, melt(dt, id=-1, measure=-1), error="One or more values in 'id.vars'")
+    test(1569.3, melt(dt, id=NULL, measure=-1), error="One or more values in 'measure.vars'")
+    test(1569.4, melt(dt, id=5, measure=-1), error="One or more values in 'id.vars'")
+    test(1569.5, melt(dt, id=1, measure=-1), error="One or more values in 'measure.vars'")
 }
 
 # sorting and grouping of Inf, -Inf, NA and NaN,  #4684, #4815 & #4883
@@ -2976,7 +3016,7 @@ test(1041, rbindlist(l), data.table(i1=factor(letters[1:5]),val=1:5))
 
 # negative indexing in *i* leads to crash/wrong aggregates when dogroups is called. bug #2697
 DT = data.table(x = c(1,2,3,4,5), group = c(1,1,2,2,3))
-test(1042, DT[-5, mean(x), by = group], data.table(group=1:2, V1=c(1.5, 3.5)))
+test(1042, DT[-5, mean(x), by = group], data.table(group=c(1,2), V1=c(1.5, 3.5)))
 # Test when abs(negative index) > nrow(dt) - should warn
 test(1042.1, DT[-10], DT, warning="Item 1 of i is -10 but there are only 5 rows. Ignoring this and 0 more like it out of 1.")
 test(1042.2, DT[c(-5, -10), mean(x), by = group], data.table(group=c(1,2),V1=c(1.5,3.5)), warning="Item 2 of i is -10 but there are only 5 rows. Ignoring this and 0 more like it out of 2.") 
@@ -3117,10 +3157,22 @@ test(1087, class(DT$last.x1), "ITime")
 
 # Tests 1088-1093 were non-ASCII. Now in DtNonAsciiTests
 
-# print of unnamed DT with >20 <= 100 rows, #4934
+# print of unnamed DT with >20 <= 100 rows, #97 (RF#4934)
 DT <- data.table(x=1:25, y=letters[1:25])
 DT.unnamed <- unname(copy(DT))
-test(1094, print(DT.unnamed), output="NA NA 1:  1  a 2:  2  b 3:  3  c")
+test(1094.1, capture.output(print(DT.unnamed)),
+     c("        ", " 1:  1 a", " 2:  2 b", " 3:  3 c", " 4:  4 d", 
+       " 5:  5 e", " 6:  6 f", " 7:  7 g", " 8:  8 h", " 9:  9 i",
+       "10: 10 j", "11: 11 k", "12: 12 l", "13: 13 m", "14: 14 n",
+       "15: 15 o", "16: 16 p", "17: 17 q", "18: 18 r", "19: 19 s", 
+       "20: 20 t", "21: 21 u", "22: 22 v", "23: 23 w", "24: 24 x",
+       "25: 25 y", "        "))
+
+# print of blank-named DT (eliminating matrix notation)
+#   #545 (RF#5253) and part of #1523
+DT <- data.table(x = 1:3)
+setnames(DT, "")
+test(1094.2, capture.output(print(DT)), c("    ", "1: 1", "2: 2", "3: 3"))
 
 # DT[!TRUE] or DT[!TRUE, which=TRUE], #4930. !TRUE still can be a recycling operation with !(all TRUE)
 DT <- data.table(x=1:3, y=4:6)
@@ -3330,7 +3382,7 @@ if ("package:reshape2" %in% search()) {
     ans = ans[ , .(min=min(value), max=max(value)), by=.(Month, variable)]
     ans = melt(ans, id=1:2, variable.name="variable2")
     ans = dcast(ans, Month ~ variable + variable2)
-    setnames(ans, c("Month", paste(names(ans)[-1L], ".", sep="_")))
+    setnames(ans, c("Month", paste(names(ans)[-1L], sep="_")))
     valvars = c("Ozone", "Solar.R", "Wind", "Temp")
     ans2 <- suppressWarnings(dcast(dt, Month ~ ., fun=list(min, max), na.rm=TRUE, value.var=valvars))
     setcolorder(ans, names(ans2))
@@ -3348,7 +3400,14 @@ x1 <- data.table(a = c(1:5), b = c(1:5))
 f <- tempfile()
 write.csv(x1, f, row.names = FALSE)
 if (.Platform$OS.type == "unix") {
-    test(1105, x1[a != 3], fread(paste('grep -v 3 ', f, sep="")))
+    gl = identical(Sys.getenv("CI_SERVER_NAME"), "GitLab CI")
+    if(gl){
+        # skip test which fails in CI, data.table#1506
+        x2 = try(fread(paste('grep -v 3 ', f, sep="")), silent = TRUE)
+        if(is.data.table(x2)) test(1105, x1[a != 3], x2)
+    } else {
+        test(1105, x1[a != 3], fread(paste('grep -v 3 ', f, sep="")))
+    }
 } else {
     # x2 <- fread(paste('more ', f, sep=""))
     # Doesn't work on winbuilder. Relies on 'more' available in DOS via Cygwin?
@@ -3441,7 +3500,7 @@ test(1125, rbindlist(list(DT1, DT1)), data.table(factor(c('a','a'), levels = c('
 DT = data.table(a = 1:2, b = 1:2)
 DT1 = data.table(a = 3:4, c = 1:2)
 
-test(1126, rbind(DT, DT1, fill = TRUE), data.table(a = 1:4, b = c(1, 2, NA, NA), c = c(NA, NA, 1, 2)))
+test(1126, rbind(DT, DT1, fill = TRUE), data.table(a = 1:4, b = c(1:2, NA, NA), c = c(NA, NA, 1:2)))
 
 ## check for #4959 - rbind'ing empty data.table's
 DT = data.table(a=character())
@@ -3489,8 +3548,10 @@ test(1133.2, DT[, new := c(1,2), by=x], data.table(x=c(1,1,1,1,1,2,2), new=c(1,2
 # Fix for FR #2496 - catch `{` in `:=` expression in `j`:
 DT <- data.table(x=c("A", "A", "B", "B"), val =1:4)
 DT2 <- copy(DT)[, a := 1L]
-test(1134.1, DT[, {a := 1L}], DT2, warning="Caught and removed")
-test(1134.2, DT[, {b := 2L}, by=x], DT2[, b:=2L, by=x], warning="Caught and removed")
+test(1134.1, DT[, {a := 1L}], DT2)
+test(1134.2, DT[, {a := 1L; NULL}], error="You have wrapped.*which is ok.*Consider")
+test(1134.3, DT[, {b := 2L}, by=x], DT2[, b:=2L, by=x])
+test(1134.4, DT[, {b := 2L; sum(val)}, by=x], error="You have wrapped.*which is ok.*Consider")
 
 # fix for bug #5069 
 if ("package:gdata" %in% search()) {
@@ -3557,7 +3618,7 @@ test(1139, capture.output(print(DT)), c("           x y", "1:     a ~ b 1", "2:
 
 # FR #4813 - provide warnings if there are remainders for both as.data.table.list(.) and data.table(.)
 X = list(a = 1:2, b = 1:3)
-test(1140, as.data.table(X), data.table(a=c(1,2,1), b=c(1,2,3)), warning="Item 1 is of size 2 but maximum")
+test(1140, as.data.table(X), data.table(a=c(1:2,1L), b=c(1:3)), warning="Item 1 is of size 2 but maximum")
 test(1141.1, data.table(a=1:2, b=1:3), data.table(a=c(1L,2L,1L), b=1:3), warning="Item 1 is of size 2 but maximum")
 test(1141.2, data.table(a=1:2, data.table(x=1:5, y=6:10)), data.table(a=c(1L,2L,1L,2L,1L), x=1:5, y=6:10), warning="Item 1 is of size 2 but maximum")
 test(1141.3, data.table(a=1:5, data.table(x=c(1,2), y=c(3,4))), data.table(a=c(1:5), x=c(1,2,1,2,1), y=c(3,4,3,4,3)), warning="Item 2 is of size 2 but maximum")
@@ -3619,21 +3680,44 @@ DF <- as.data.frame(DT)
 test(1146.2, {set(DF, i=NULL, j=1L, value=seq_len(nrow(DF)));setattr(DF,"reference",NULL);DF}, data.frame(Time=1:nrow(BOD), demand=BOD$demand))
 test(1146.3, set(DF, i=NULL, j="bla", value=seq_len(nrow(DF))), error="set() on a data.frame is for changing existing columns, not adding new ones. Please use a data.table for that.")
 
-# Feature - implemented fast radix order for numeric types (both +ve and -ve numerics).
-# note that if "x" is already a list, then the values will be modified by reference!
-# Note: 'ordernumtol' doesn't distinguish between NA and NaN whereas this one does!
-# R-wrapper is dradixorder
-set.seed(45)
-x <- rnorm(1e6)*1e4
-test(1147.1, base::order(x), dradixorder(x, tol=numeric(0))) # base::order doesn't test with tolerance
-test(1147.2, ordernumtol(x), dradixorder(x))
-tol = .Machine$double.eps^0.5
-x <- c(8, NaN, Inf, -7.18918, 5.18909+0.07*tol, NA, -7.18918111, -Inf, NA, 5.18909, NaN, 5.18909-1.2*tol, 5.18909-0.04*tol)
-test(1147.3, dradixorder(x), c(6L, 9L, 2L, 11L, 8L, 7L, 4L, 12L, 5L, 10L, 13L, 1L, 3L))
+if (.Machine$sizeof.longdouble == 16) {
+  # To not run on CRAN's solaris-sparc 32bit where sizeof.longdouble==0
+  
+  old = getNumericRounding()
+  
+  set.seed(6)
+  x = rnorm(1e6)*1e4
+  ans = base::sort.list(x, method="shell")
+  setNumericRounding(0)
+  test(1147.1, ans, forderv(x))
+  setNumericRounding(1)
+  test(1147.2, ans, forderv(x))
+  setNumericRounding(2)
+  test(1147.3, sum(ans != forderv(x)), 2)
+
+  tol = 3.000214e-13 
+  x = c(8, NaN, Inf, -7.18918, 5.18909+0.07*tol, NA, -7.18918111, -Inf, NA, 5.18909, NaN, 5.18909-1.2*tol, 5.18909-0.04*tol)
+  # cat(data.table:::binary(x[c(5,10,12,13)]),sep="\n")
+  # 0 10000000001 010011000001101000001100111100011000 00000000 11000000
+  # 0 10000000001 010011000001101000001100111100011000 00000000 10101000
+  # 0 10000000001 010011000001101000001100111100010111 11111111 00010011
+  # 0 10000000001 010011000001101000001100111100011000 00000000 10011010
+  
+  setNumericRounding(0)
+  test(1147.4, forderv(x), INT(6, 9, 2, 11, 8, 7, 4, 12, 13, 10, 5, 1, 3))
+  
+  setNumericRounding(1)
+  test(1147.5, forderv(x), INT(6, 9, 2, 11, 8, 7, 4, 12, 5, 10, 13, 1, 3))
+  
+  setNumericRounding(2)
+  test(1147.6, forderv(x), INT(6, 9, 2, 11, 8, 7, 4, 5, 10, 12, 13, 1, 3))
+  # rounds item 12 at bit 48, doesn't just truncate
+  
+  setNumericRounding(old)
+}
 
-# test for `iradixorder` when input is integer(0) and numeric(0)
-test(1149.1, iradixorder(integer(0)), integer(0)) 
-test(1149.2, iradixorder(numeric(0)), error="iradixorder is only for integer") 
+test(1149.1, forderv(integer(0)), integer(0)) 
+test(1149.2, forderv(numeric(0)), integer(0)) 
 
 # test uniqlengths
 set.seed(45)
@@ -3689,7 +3773,7 @@ DT = data.table(A=rep(1:2,c(100000,1)), B=runif(100001))
 before = gc()["Vcells",2]
 for (i in 1:50) DT[, sum(B), by=A]
 after = gc()["Vcells",2]
-test(1157, after < before+1)  # +1 = 1MB
+test(1157, after < before+3)  # +3 = 3MB
 # Before the patch, Vcells grew dramatically from 6MB to 60MB. Now stable at 6MB. Increase 50 to 1000 and it grew to over 1GB for this case.
 
 # Similar for when dogroups writes less rows than allocated, #2648.
@@ -3697,7 +3781,7 @@ DT = data.table(k = 1:50, g = 1:20, val = rnorm(1e4))
 before = gc()["Vcells",2]
 for (i in 1:50) DT[ , unlist(.SD), by = 'k']
 after = gc()["Vcells",2]
-test(1158, after < before+1)
+test(1158, after < before+3)  # 177.6MB => 179.2MB. Needs to be +3 now from v1.9.8 with alloccol up from 100 to 1024
 
 # tests for 'setDT' - convert list, DF to DT without copy
 x <- data.frame(a=1:4, b=5:8)
@@ -3865,8 +3949,8 @@ set(DT,c(5L,9L),"v",NA)
 set(DT,10:12,"y",NA)
 set(DT,10:12,"v",NA)
 options(datatable.optimize=1)  # turn off GForce
-test(1184.1, DT[, sum(v), by=x, verbose=TRUE], output="dogroups")
-test(1184.2, DT[, mean(v), by=x, verbose=TRUE], output="dogroups")
+test(1184.1, DT[, sum(v), by=x, verbose=TRUE], output="(GForce FALSE)")
+test(1184.2, DT[, mean(v), by=x, verbose=TRUE], output="(GForce FALSE)")
 test(1185.1, DT[, list(sum(y), sum(v), sum(y,na.rm=TRUE), sum(v,na.rm=TRUE)), by=x],
            data.table(x=c("a","b","c","d"), V1=c(NA,10L,NA,NA), V2=c(6,NA,NA,NA), V3=c(4L,10L,7L,0L), V4=c(6,10,15,0)))
 options(datatable.optimize=0)  # turn off fastmean optimization to get the answer to match to
@@ -3919,8 +4003,13 @@ test(1194, DT[.(x),.N], 1)  # tests bmerge uses twiddle
 DT[3, val:=0.0275016249291963]
 setkey(DT, NULL)  # val[3] and val[4] are now equal, within 2 byte rounding
 test(1195, DT[,.N,keyby=val], setkey(DT,val)[,.N,by=val])
-test(1196, DT[,.N,by=val]$N, INT(1,1,2))
-test(1197, DT[.(x),.N], 2)
+old_rounding = getNumericRounding() # default is 0
+test(1196.1, DT[,.N,by=val]$N, INT(1,1,1,1))
+test(1197.1, DT[.(x),.N], 1)
+setNumericRounding(2L)
+test(1196.2, DT[,.N,by=val]$N, INT(1,1,2))
+test(1197.2, DT[.(x),.N], 2)
+setNumericRounding(old_rounding)
 
 DT = data.table(id=1:2, val1=6:1, val2=6:1)   # 5380
 test(1199, DT[, sum(.SD), by=id], error="GForce sum can only be applied to columns, not .SD or similar.*looking for.*lapply(.SD")
@@ -3940,6 +4029,7 @@ DT = data.table(a=6:1, b=1:2)
 test(1206, DT[order(b,a)], data.table(a=INT(2,4,6,1,3,5),b=INT(1,1,1,2,2,2)))
 
 # Test joining to Inf, -Inf and mixed non-finites, and grouping
+old_rounding = getNumericRounding()
 DT = data.table(A=c(1,2,-Inf,+Inf,3,-1.1,NaN,NA,3.14,NaN,2.8,NA), B=1:12, key="A")
 for (i in 1:2) {
     setNumericRounding(if (i==1L) 0L else 2L)
@@ -3952,6 +4042,7 @@ for (i in 1:2) {
     test(1213+i*0.1, DT[,sum(B),keyby=list(g=abs(trunc(A)))], data.table(g=c(NA,NaN,1,2,3,Inf),V1=INT(20,17,7,13,14,7),key="g"))
     # test(1214+i*0.1, DT[.(-200.0),roll=TRUE]$B, 3L)  # TO DO: roll to -Inf.  Also remove -Inf and test rolling to NaN and NA
 }
+setNumericRounding(old_rounding)
 
 # that fread reads unescaped (but balanced) quotes in the middle of fields ok, #2694
 test(1215,
@@ -4100,25 +4191,48 @@ setnames(ans2, names(ans1))
 test(1240.1, ans1, setDT(as.data.frame(with(DT, table(XX, yy)), stringsAsFactors=FALSE)))
 test(1240.2, ans2, ans1)
 
+# R 3.3.0 started to use data.table's radix sort by default for order() on integer/factors.
+# Therefore we check against the non-data.table method ('shell') for correctness (otherwise we'd be
+# checking data.table code against itself) as well as checking data.table's ported code in R;
+# i.e. a three-way match.
+if (base::getRversion() < "3.3.0") {
+    base_order <- base::order
+} else {
+    base_order <- function(..., na.last=TRUE, method=c("shell","radix")) {
+        ans1 = base::order(..., na.last=na.last, method="shell")
+        if (!is.na(na.last) || base::getRversion()>"3.3.2") {
+            ans2 = base::order(..., na.last=na.last, method="radix")
+            if (!identical(ans1,ans2)) stop("Base R's order(,method='shell') != order(,method='radix')")
+        } else {
+            # Only when na.last=NA in just R 3.3.0-3.3.2 we don't check shell==radix
+            # because there was a problem in base R's port of data.table code then when :
+            #     1) 2 or more vectors were passed to base::order(,method="radix")
+            # AND 2) na.last=NA
+            # AND 3) there is a subgroup of size exactly 2
+            # AND 4) one of those 2 items in the subgroup is NA and the other is not NA
+            # See tests 1728.3 and 1728.13.
+        }
+        ans1
+    }
+}
+
 # Test for optimisation of 'order' to 'forder'
 set.seed(45L)
 DT <- data.table(x=sample(1e2, 1e6,TRUE), y=sample(1e2, 1e6,TRUE))
-# with optimisation -> order will be optimised to forder
-optim = getOption("datatable.optimize")
-options(datatable.optimize=Inf)
-t1 = unname(system.time(ans1 <- DT[order(x,-y)])['elapsed'])
-# without optimisation
-options(datatable.optimize=0L)
-t2 = unname(system.time(ans2 <- DT[order(x,-y)])['elapsed'])
+old = options(datatable.optimize=Inf)
+t1 = system.time(ans1 <- DT[order(x,-y)])[['elapsed']]   # optimized to forder()
+t2 = system.time(ans2 <- DT[base_order(x,-y)])[['elapsed']]  # not optimized
 test(1241.1, ans1, ans2)
-test(1241.2, t1 < t2, TRUE)  # with optimisation must be faster
-# restore optimisation
-options(datatable.optimize=optim)
+if (.devtesting) test(1241.2, t1 < t2+0.1)
+# 0.2 < 3.8 on Matt's laptop seems safe enough to test.
+# Even so, 1241.2 has been known to fail, perhaps if system swaps and this R sessions pauses or something?
+# We shouldn't have timing tests here that run on CRAN for this reason.  Hence wrapping with .devtesting
+options(old)
 
-# check no warning yet for with=FALSE and :=.  To be helpful warning, then helpful error.
 DT = data.table(a=1:3, b=4:6)
 myCol = "a"
-test(1242, DT[2,myCol:=6L,with=FALSE], data.table(a=INT(1,6,3), b=4:6))
+test(1242.1, DT[2,myCol:=6L,with=FALSE], data.table(a=INT(1,6,3), b=4:6), warning="with=FALSE together with := was deprecated in v1.9.4 released Oct 2014. Please")
+test(1242.2, DT[2,(myCol):=7L], data.table(a=INT(1,7,3), b=4:6))
 
 # consistency of output type of mult, #5378
 DT = data.table(id=rep(1:2,each=2), var=rnorm(4), key="id")
@@ -4192,9 +4306,9 @@ test(1248.1, DT[, y := y * eval(parse(text="1*2"))], data.table(x=seq(1,10,1), y
 # fix in 1248 was not complete. resurfaced again as bug #5527. Fixed now, test added here below:
 DT <- data.table(id=1:5, var=letters[1:5])
 ans <- copy(DT)
-id <- "va"
-test(1248.2, DT[, eval(parse(text=paste(id,"r",sep="")))], letters[1:5])
-test(1248.3, DT[, id2:=eval(parse(text=paste(id,"r",sep="")))], ans[, id2 := var])
+idPrefix <- "va"   # if this variable were named 'id' then the paste(id) below would see the 'id' _column_.
+test(1248.2, DT[, eval(parse(text=paste(idPrefix,"r",sep="")))], letters[1:5])
+test(1248.3, DT[, id2:=eval(parse(text=paste(idPrefix,"r",sep="")))], ans[, id2 := var])
 
 # test to make sure DT[order(...)] works fine when it's already sorted (forgot the case where forder returns integer(0) before)
 DT <- data.table(x=rep(1:4, each=5), y=1:20)
@@ -4260,7 +4374,7 @@ for (i in seq_along(names(DT))) {
     ans[[i]] = combn(names(DT), i, function(x) {
         tmp = apply(cj, 1, function(y) {
             test_no <<- signif(test_no+.001, 7)
-            ll = as.call(c(as.name("order"), 
+            ll = as.call(c(as.name("base_order"),
                     lapply(seq_along(x), function(j) {
                         if (y[j] == 1L) 
                             as.name(x[j]) 
@@ -4288,10 +4402,11 @@ ans = NULL
 if (nfail > oldnfail) cat(seedInfo, "\n")  # to reproduce
 
 ###############
-
+old_rounding = getNumericRounding()
 # turning off tolerance for UPCs (> 11 s.f. stored in numeric), #5369
 DT <- data.table(upc = c(301426027592, 301426027593, 314775802939, 314775802940, 314775803490, 314775803491, 314775815510, 314775815511, 314933000171, 314933000172),
                  year = 2006:2007)
+setNumericRounding(2L)
 test(1253, DT[,.N,by=upc]$N, rep.int(2L,5L))
 setNumericRounding(0)
 test(1254, DT[,.N,by=upc], data.table(upc=DT$upc, N=1L))
@@ -4317,7 +4432,7 @@ test(1264, DT[,.N,by=id]$N, 4L)  # 1 byte rounding isn't enough
 setNumericRounding(0)
 test(1265, DT[,.N,by=id]$N, INT(1,1,1,1))
 test(1266, getNumericRounding(), 0L)
-setNumericRounding(2)
+setNumericRounding(old_rounding)
 
 # fread reading NA in logical columns, #4766
 DF = data.frame(I=1:3, L=c(T,F,NA), R=3.14)
@@ -4384,6 +4499,7 @@ test(1268.8, dt[, c(.I, lapply(.SD, mean)), by=x, verbose=TRUE], ans2,
 ### FR #2722 tests end here ###
 
 # Wide range numeric and integer64, to test all bits
+old_rounding = getNumericRounding()
 x = sample( c(seq(-1e100, 1e100, length=1e5), c(seq(-1e-100,1e-100,length=1e5))) )
 setNumericRounding(0)
 test(1269, forderv(x), base::order(x))
@@ -4400,12 +4516,14 @@ if ("package:bit64" %in% search()) {
     test(1274, duplicated(DT, by="a"), rep(c(FALSE,TRUE),each=3))
     setkey(DT,a)
     test(1275, DT[.(as.integer64(35184372088834))], DT[3:4])
-    test(1276, unique(DT), DT[c(1,3,5)])
-    test(1277, duplicated(DT), rep(c(FALSE,TRUE),3))
+    test(1276, unique(DT, by=key(DT)), DT[c(1,3,5)])
+    test(1277, duplicated(DT, by=key(DT)), rep(c(FALSE,TRUE),3))
 }
+setNumericRounding(old_rounding)
 
 # distinguishing small numbers from 0.0 as from v1.9.2,  test from Rick
 # http://stackoverflow.com/questions/22290544/grouping-very-small-numbers-e-g-1e-28-and-0-0-in-data-table-v1-8-10-vs-v1-9-2
+old_rounding = getNumericRounding()
 test_no = 1278.001
 for (dround in c(0,2)) {
     setNumericRounding(dround)  # rounding should not affect the result here because although small, it's very accurate (1 s.f.)
@@ -4415,9 +4533,11 @@ for (dround in c(0,2)) {
         test_no = test_no + 0.001
     }
 }
+setNumericRounding(old_rounding)
 
 # rounding of milliseconds, workaround, TO DO: #5445
 # http://stackoverflow.com/questions/22356957/rounding-milliseconds-of-posixct-in-data-table-v1-9-2-ok-in-1-8-10
+old_rounding = getNumericRounding()
 DT = data.table(timestamp=as.POSIXct(
         c("2013-01-01 17:51:00.707",
           "2013-01-01 17:51:59.996",
@@ -4429,7 +4549,7 @@ setNumericRounding(2)
 test(1279, duplicated(DT), rep(c(FALSE,TRUE), c(4,2)))
 setNumericRounding(1)
 test(1280, duplicated(DT), rep(FALSE, 6))
-setNumericRounding(2)
+setNumericRounding(old_rounding)
 
 # FR #5465, keep.rownames argument for setDT, just for data.frames:
 DF <- data.frame(x=1:5, y=10:6)
@@ -4461,8 +4581,8 @@ test(1284.2, dt[order(-abs(x))], dt[3:1])
 
 # fix for bug #5582 - unique/duplicated on empty data.table returned NA
 dt <- data.table(x=numeric(0), y=character(0), key="x")
-test(1285.1, duplicated(dt), duplicated.data.frame(dt))
-test(1285.2, unique(dt), dt)
+test(1285.1, duplicated(dt, by=key(dt)), duplicated.data.frame(dt))
+test(1285.2, unique(dt, by=key(dt)), dt)
 
 # BUG #5672 fix
 a <- data.table(BOD, key="Time")
@@ -4518,10 +4638,10 @@ test(1288.16, rbindlist(ll, fill=TRUE), error="fill=TRUE, but names of input lis
 # TO DO: think of and add more tests for rbindlist
 
 # fix for #5647
-dt <- data.table(x=1L, y=1:10)
+dt = data.table(x=1L, y=1:10)
 cp = copy(dt)
 test(1289.1, dt[,z := c(rep(NA, 5), y), by=x], cp[, z := c(rep(NA, 5), y[1:5])], warning="RHS 1 is length 15")
-dt<-data.table(x=c(1:2), y=1:10)
+dt = data.table(x=c(1:2), y=1:10)
 cp = copy(dt)
 test(1289.2, dt[, z := c(rep(NA, 5),y), by=x], cp[, z := rep(NA_integer_, 10)], warning="RHS 1 is length 10")
 
@@ -4671,7 +4791,7 @@ test(1294.31, dt[, e := list("bla2")]$e, rep("bla2", 3))
 
 # FR #5357, when LHS evaluates to integer(0), provide warning and return dt, not an error.
 dt = data.table(a = 1:5, b1 = 1:5, b2 = 1:5)
-test(1295, dt[, grep("c", names(d)) := NULL], dt, warning="length(LHS) = 0, meaning no columns to delete or assign RHS to")
+test(1295, dt[, grep("c", names(d)) := NULL], dt, warning="length(LHS)==0; no columns to delete or assign RHS to")
 
 # Updating logical column in one-row DT (corruption of new R 3.1 internal globals for TRUE, FALSE and NA)
 DT = data.table(a=1:6, b=c(TRUE,FALSE))
@@ -5124,14 +5244,19 @@ test(1364.17, setdiff_(X[, list(a)], Y[, list(a)]), data.table(a=c(1,2)))
 # not join along with by=.EACHI, #604
 DT <- data.table(A=c(1,1,1,2,2,2,2,3,3,4,5,5))[, `:=`(B=as.integer(A), C=c("c", "e", "a", "d"), D=factor(c("c", "e", "a", "d")), E=1:12)]
 setkey(DT, A)
-test(1365.1, DT[!J(c(2,5)), sum(E), by=.EACHI], DT[J(c(1,3,4)), sum(E), by=.EACHI])
+test(1365.1, suppressMessages(DT[!J(c(2,5)), sum(E), by=.EACHI]), 
+            suppressMessages(DT[J(c(1,3,4)), sum(E), by=.EACHI]))
 setkey(DT, B)
-test(1365.2, DT[!J(c(4:5)), list(.N, sum(E)), by=.EACHI], DT[J(1:3), list(.N, sum(E)), by=.EACHI])
+test(1365.2, suppressMessages(DT[!J(c(4:5)), list(.N, sum(E)), by=.EACHI]), 
+            suppressMessages(DT[J(1:3), list(.N, sum(E)), by=.EACHI]))
 setkey(DT, C)
-test(1365.3, copy(DT)[!"c", f := .N, by=.EACHI], copy(DT)[c("a", "d", "e"), f := .N, by=.EACHI])
+test(1365.3, suppressMessages(copy(DT)[!"c", f := .N, by=.EACHI]), 
+            suppressMessages(copy(DT)[c("a", "d", "e"), f := .N, by=.EACHI]))
 setkey(DT, D)
-test(1365.4, DT[!J(factor("c")), .N, by=.EACHI], DT[J(factor(c("a", "d", "e"))), .N, by=.EACHI])
-test(1365.5, DT[!"c", lapply(.SD, sum), by=.EACHI, .SDcols=c("B", "E")], DT[c("a", "d", "e"), lapply(.SD, sum), by=.EACHI, .SDcols=c("B", "E")])
+test(1365.4, suppressMessages(DT[!J(factor("c")), .N, by=.EACHI]), 
+            suppressMessages(DT[J(factor(c("a", "d", "e"))), .N, by=.EACHI]))
+test(1365.5, suppressMessages(DT[!"c", lapply(.SD, sum), by=.EACHI, .SDcols=c("B", "E")]), 
+            suppressMessages(DT[c("a", "d", "e"), lapply(.SD, sum), by=.EACHI, .SDcols=c("B", "E")]))
 
 # uniqlengths doesn't error on 0-length input
 test(1366, uniqlengths(integer(0), 0L), integer(0))
@@ -5341,30 +5466,47 @@ test(1375.3, DT[,mean(Petal.Width),by=Species][V1>1,Species:=toupper(Species)]$S
 
 # Secondary keys a.k.a indexes ...
 DT = data.table(a=1:10,b=10:1)
-test(1376.1, key2(DT), NULL)
+test(1376.1, indices(DT), NULL)
 test(1376.2, DT[b==7L,verbose=TRUE], DT[4L], output="Creating new index 'b'")
-test(1376.3, key2(DT), "b")
+test(1376.3, indices(DT), "b")
 test(1376.4, DT[b==8L,verbose=TRUE], DT[3L], output="Using existing index 'b'")
 test(1376.5, DT[a==7L,verbose=TRUE], DT[7L], output="Creating new index")  # add 2nd secondary key
-test(1376.6, key2(DT), c("b","a"))  # 2 secondary keys of single columns
+test(1376.6, indices(DT), c("b","a"))  # 2 secondary keys of single columns
 test(1376.7, DT[a==7L,verbose=TRUE], DT[7L], output="Using existing index 'a'")
 setkey(DT,b)
-test(1376.8, key2(DT), NULL)
-test(1376.9, list(DT[a==2L], key2(DT)), list(DT[9L],"a"))  # create key2 for next test
-set2key(DT,NULL)
-test(1376.10, list(key(DT), key2(DT)), list("b", NULL))
+test(1376.8, indices(DT), NULL)
+test(1376.9, list(DT[a==2L], indices(DT)), list(DT[9L],"a"))  # create indices for next test
+setindex(DT,NULL)
+test(1376.10, list(key(DT), indices(DT)), list("b", NULL))
 options(datatable.auto.index = FALSE)
-test(1376.11, list(DT[a==2L], key2(DT)), list(DT[9L],NULL))
+test(1376.11, list(DT[a==2L], indices(DT)), list(DT[9L],NULL))
 options(datatable.auto.index = TRUE)
-test(1376.12, list(DT[a==2L], key2(DT)), list(DT[9L],"a"))
+test(1376.12, list(DT[a==2L], indices(DT)), list(DT[9L],"a"))
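The `key2()`/`set2key()` to `indices()`/`setindex()` renames above reflect the public secondary-index API. A minimal standalone sketch of the behaviour these tests check (assumes the default `options(datatable.auto.index=TRUE)`):

```r
library(data.table)
DT = data.table(a = 1:10, b = 10:1)
i0 = indices(DT)     # NULL: no secondary indexes yet
DT[b == 7L]          # first == subset auto-creates an index on 'b'
i1 = indices(DT)     # "b"
setindex(DT, a)      # add an explicit index on 'a'
i2 = indices(DT)     # now both "b" and "a"
setindex(DT, NULL)   # drop all indexes
i3 = indices(DT)     # NULL again
```

Unlike `setkey()`, creating an index does not reorder the rows, which is why several indexes can coexist on one table.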
 
-# When i is FALSE, it shouldn't matter if .SDcols is wrong. Package vardpoor relies on this in example(vardchanges).
+# When i is FALSE and a column is being added by reference, for consistency with cases when i is not FALSE,
+# we should still add the column. But we need to know what type it should be, so the user-supplied RHS of :=
+# needs to work on empty input to tell us the column type. Package vardpoor in example(vardchanges) used to
+# rely on DT[FALSE,...] neither adding the column nor evaluating the RHS, but it no longer does, so we can
+# make this consistent now. If that old usage is required, the user should use if(FALSE) DT[...] instead.
 DT = data.table(a=1:3, b=4:6)
-test(1377.1, DT[FALSE, foo:=7], DT)
-test(1377.2, DT[0, foo:=7], DT)
-test(1377.3, DT[, foo := Reduce(function(x,y)paste(x,y,sep="__"), .SD), .SDcols=c("a","b")], data.table(a=1:3, b=4:6, foo=c("1__4","2__5","3__6")))
-test(1377.4, DT[, bar := Reduce(function(x,y)paste(x,y,sep="__"), .SD), .SDcols=c("a","zz")], error="Some items of .SDcols are not column names")
-test(1377.5, DT[FALSE, bar := Reduce(function(x,y)paste(x,y,sep="__"), .SD), .SDcols=c("a","zz")], DT)
+ans = copy(DT)[, foo:=NA_real_]
+test(1377.1, copy(DT)[FALSE, foo:=7], ans)
+test(1377.2, copy(DT)[0, foo:=7], ans)
+test(1377.3, copy(DT)[, foo := Reduce(function(x,y)paste(x,y,sep="__"), .SD), .SDcols=c("a","b")],
+             data.table(a=1:3, b=4:6, foo=c("1__4","2__5","3__6")))
+err = "Some items of .SDcols are not column names"
+# .SDcols should always be checked even if RHS (which uses .SDcols) isn't eval'd due to i==FALSE
+test(1377.4, copy(DT)[, bar := Reduce(function(x,y)paste(x,y,sep="__"), .SD), .SDcols=c("a","zz")],
+             error=err)
+test(1377.5, copy(DT)[FALSE, bar := Reduce(function(x,y)paste(x,y,sep="__"), .SD), .SDcols=c("a","zz")],
+             error=err)
+test(1377.6, DT, data.table(a=1:3, b=4:6))  # check that the original hasn't been changed by these tests
+test(1377.7, copy(DT)[FALSE, bar:=stop("eval'd")], error="eval'd")
+DT[,bar:=NA]   # create the column so the RHS needn't be eval'd to know its type. We don't allow type changes anyway.
+               # Now there's no need to eval the RHS (and therefore hit its error), as relied on by package treemap
+               # in example(random.hierarchical.data) in the do.call of fun=="addRange" where it's called on
+               # an empty subset and LB <- x[[1]][1] results in NA which causes seq(LB, UB, ...) to error.
+test(1377.8, copy(DT)[FALSE, bar:=stop("eval'd")], DT)
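The comment block above explains the new consistency rule; a standalone sketch (plain R, not part of the patch) of what it means in practice:

```r
library(data.table)
DT = data.table(a = 1:3, b = 4:6)
DT[FALSE, foo := 7]   # no rows selected, but the column is still created...
col = DT$foo          # ...filled with NA of the RHS's type (here numeric)
```

This is why the RHS must evaluate even on the empty subset: its type determines the type of the new column.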
 
 #====================================
 # fread issue with http download on Windows, thanks to Steve Miller for highlighting.
@@ -5378,126 +5520,13 @@ test(1378.1, dim(fread("russellCRLF.csv")), c(19,4))
 f = paste("file://",getwd(),"/russellCRLF.csv",sep="")
 # simulates an http:// request as far as download.file() and unlink() go, without internet
 # download.file() in fread() changes the input data from \r\n to \n, on Windows.
-test(1378.2, dim(fread(f)), c(19,4))
+test(1378.2, dim(fread(f, showProgress=FALSE)), c(19,4))
 
 f = paste("file://",getwd(),"/russellCRCRLF.csv",sep="")
 # actually has 3 \r in the file, download.file() from file:// changes that to \r\r\n, so we can simulate download.file from http: in text mode.
-test(1378.3, fread(f), error="Line ending is .*r.*r.*n. R's download.file() appears to add the extra .*r in text mode on Windows. Please download again in binary mode (mode='wb') which might be faster too. Alternatively, pass the URL directly to fread and it will download the file in binary mode for you.")
-#====================================
-
-
-#====================================
-# Return to old bywithoutby behaviour. TO DO: delete these tests after Sep 2015
-
-options(datatable.old.bywithoutby=TRUE)
-deprecated_warn = "The data.table option 'datatable.old.bywithoutby' for grouping on join without providing `by` will be deprecated in the next release, use `by=.EACHI`."
-# Old tests from before commit: 0be720956fdd9c274e46133e154d4bbd5b2c7840
-# TO DO: address `allow.cartesian`.  Some differences below...
-off = -1000
-TESTDT = data.table(a=as.integer(c(1,3,4,4,4,4,7)), b=as.integer(c(5,5,6,6,9,9,2)), v=1:7)
-setkey(TESTDT,a,b)
-test(off-8, TESTDT[SJ(c(-9,1,4,4,8),c(1,4,4,10,1)),v]$v, INT(NA,NA,NA,NA,NA), warning=deprecated_warn)
-test(off-9, TESTDT[SJ(c(-9,1,4,4,8),c(1,4,4,10,1)),v,roll=TRUE]$v, INT(NA,NA,NA,6,NA), warning=deprecated_warn)
-test(off-10, TESTDT[SJ(c(-9,1,4,4,8),c(1,4,4,10,1)),v,roll=TRUE,rollends=FALSE]$v, INT(NA,NA,NA,NA,NA), warning=deprecated_warn)
-test(off-16, TESTDT[SJ(c(4)),v][[2]], INT(3,4,5,6), warning=deprecated_warn)
-test(off-18, TESTDT[SJ(c(-3,2,4,8)),v,mult="all",nomatch=0][[2]], INT(3:6), warning=deprecated_warn)
-test(off-185, TESTDT[SJ(c(-3,2,4,8)),v,mult="all",nomatch=NA][[2]], INT(NA,NA,3:6,NA), warning=deprecated_warn)
-test(off-19, TESTDT[SJ(c(-3,2,4,8)),v,mult="all",roll=TRUE,nomatch=0][[2]], INT(1,3:6,7), warning=deprecated_warn)
-test(off-186, TESTDT[SJ(c(-3,2,4,8)),v,mult="all",roll=TRUE,nomatch=NA][[2]], INT(NA,1,3:6,7), warning=deprecated_warn)
-test(off-20, TESTDT[SJ(c(-3,2,4,8)),v,mult="all",roll=TRUE,rollends=FALSE,nomatch=0][[2]], INT(1,3:6), warning=deprecated_warn)
-test(off-187, TESTDT[SJ(c(-3,2,4,8)),v,mult="all",roll=TRUE,rollends=FALSE,nomatch=NA][[2]], INT(NA,1,3:6,NA), warning=deprecated_warn)
-test(off-21, TESTDT[SJ(c(-9,1,4,4,4,4,8),c(1,5,5,6,7,10,3)),v,mult="all",nomatch=0][[3]], INT(1,3:4), warning=deprecated_warn)
-test(off-188, TESTDT[SJ(c(-9,1,4,4,4,4,8),c(1,5,5,6,7,10,3)),v,mult="all",nomatch=NA][[3]], INT(NA,1,NA,3:4,NA,NA,NA), warning=deprecated_warn)
-test(off-22, TESTDT[SJ(c(-9,1,4,4,4,4,8),c(1,5,5,6,7,10,3)),v,mult="all",roll=TRUE,nomatch=0][[3]], INT(1,3:4,4,6), warning=deprecated_warn)
-test(off-189, TESTDT[SJ(c(-9,1,4,4,4,4,8),c(1,5,5,6,7,10,3)),v,mult="all",roll=TRUE,nomatch=NA][[3]], INT(NA,1,NA,3:4,4,6,NA), warning=deprecated_warn)
-test(off-23, TESTDT[SJ(c(-9,1,4,4,4,4,8),c(1,5,5,6,7,10,3)),v,mult="all",roll=TRUE,rollends=FALSE,nomatch=0][[3]], INT(1,3:4,4), warning=deprecated_warn)
-test(off-190, TESTDT[SJ(c(-9,1,4,4,4,4,8),c(1,5,5,6,7,10,3)),v,mult="all",roll=TRUE,rollends=FALSE,nomatch=NA][[3]], INT(NA,1,NA,3:4,4,NA,NA), warning=deprecated_warn)
-test(off-24, TESTDT[SJ(c(1,NA,4,NA,NA,4,4),c(5,5,6,6,7,9,10)),v,mult="all",roll=TRUE,nomatch=0][[3]], INT(1,3:4,5:6,6), warning=deprecated_warn)
-test(off-191, TESTDT[SJ(c(1,NA,4,NA,NA,4,4),c(5,5,6,6,7,9,10)),v,mult="all",roll=TRUE,nomatch=NA][[3]], INT(NA,NA,NA,1,3:4,5:6,6), warning=deprecated_warn)
-
-TESTDT[, a:=letters[a]]
-TESTDT[, b:=letters[b]]
-setkey(TESTDT,a,b)
-
-a = "d"
-# Variable Twister.  a in this scope has same name as a inside DT scope.
-# Aug 2010 : As a result of bug 1005, and consistency with 'j' and 'by' we now allow self joins (test 183) in 'i'.
-test(off-70, TESTDT[eval(J(a)),v], data.table(a="d",v=3:6,key="a"), warning=deprecated_warn)   # the eval() enabled you to use the 'a' in the calling scope, not 'a' in the TESTDT.  TO DO: document this.
-test(off-71, TESTDT[eval(SJ(a)),v], data.table(a="d",v=3:6,key="a"), warning=deprecated_warn)
-test(off-72, TESTDT[eval(CJ(a)),v], data.table(a="d",v=3:6,key="a"), warning=deprecated_warn)
-
-DT = data.table(a=rep(1:3,each=2),b=c(TRUE,FALSE),v=1:6)
-setkey(DT,a,b)
-test(off-180, DT[J(2,FALSE),v]$v, 4L, warning=deprecated_warn)
-
-DT = data.table(A = c("o", "x"), B = 1:10, key = "A")
-test(off-183, DT[J(unique(A)), B]$B, DT$B, warning=deprecated_warn)
-
-# Tests of bug 1015 highlight by Harish
-# See thread "'by without by' now heeds nomatch=NA"
-# Tests 185-201 were added in above next to originals
-x <- data.table(a=c("a","b","d","e"),b=c("A","A","B","B"),d=c(1,2,3,4), key="a,b")
-y <- data.table(g=c("a","b","c","d"),h=c("A","A","A","A"))
-test(off-202, x[y], suppressWarnings(x[y,mult="all"]), warning=deprecated_warn)
-test(off-203, x[y,d]$d, c(1,2,NA,NA), warning=deprecated_warn)
-test(off-204, x[y,list(d)], suppressWarnings(x[y,d]), warning=deprecated_warn)
-test(off-205, x[y,list(d),mult="all"][,d], c(1,2,NA,NA), warning=deprecated_warn)
-
-DF = data.frame(a=LETTERS[1:10], b=1:10, stringsAsFactors=FALSE)
-DT = data.table(DF)
-setkey(DT,a)    # used to complain about character
-test(off-215, DT["C",b]$b, 3L, warning=deprecated_warn)
-DT = data.table(DF,key="a")
-test(off-216, DT["C",b]$b, 3L, warning=deprecated_warn)
-DT = data.table(a=c(1,2,3),v=1:3,key="a")
-test(off-217, DT[J(2),v]$v, 2L, warning=deprecated_warn)
-DT = data.table(a=c(1,2.1,3),v=1:3,key="a")
-test(off-218, DT[J(2.1),v]$v, 2L, warning=deprecated_warn)
-
-DT = data.table(a=1:5,b=6:10,key="a")
-q = quote(a>3)
-test(off-220, DT[eval(q),b], 9:10)
-test(off-221, DT[eval(parse(text="a>4")),b], 10L)
-test(off-222, DT[eval(parse(text="J(2)")),b]$b, 7L, warning=deprecated_warn)
-
-# Join Inherited Scope, and X[Y] including Y's non-join columns
-X=data.table(a=rep(1:3,c(3,3,2)),foo=1:8,key="a")
-Y=data.table(a=2:3,bar=6:7)
-test(off-239, X[Y,sum(foo)], data.table(a=2:3,V1=c(15L,15L),key="a"), warning=deprecated_warn)
-test(off-240, X[Y,sum(foo*bar)], data.table(a=2:3,V1=c(90L,105L),key="a"), warning=deprecated_warn)
-test(off-241, X[Y], data.table(a=rep(2:3,3:2),foo=4:8,bar=rep(6:7,3:2),key="a"), warning=deprecated_warn)
-test(off-242, X[Y,list(foo,bar)][,sum(foo*bar)], 195L, warning=deprecated_warn)
-
-
-X=data.table(a=rep(LETTERS[1:2],2:3),b=1:5,v=10:14,key="a,b")
-test(off-246, X["A"], {tt=X[1:2];setkey(tt,a);tt}, warning=deprecated_warn)  # key="a,b" is retained in 1.9.2 and 1.9.4; just old.bywithoutby=TRUE in 1.9.4 keeps "a" only, unfortunately.
-
-# Test .N==0 with nomatch=NA|0
-DT = data.table(a=1:2,b=1:6,key="a")
-test(off-349, DT[J(2:3),.N,nomatch=NA]$N, c(3L,0L), warning=deprecated_warn)
-test(off-350, DT[J(2:3),.N,nomatch=0]$N, c(3L), warning=deprecated_warn)
-# Test first .N==0 with nomatch=NA|0
-test(off-350.1, DT[J(4),.N]$N, 0L, warning=deprecated_warn)
-test(off-350.2, DT[J(0:4),.N]$N, c(0L,3L,3L,0L,0L), warning=deprecated_warn)
-
-# Test printing on nested data.table, bug #1803
-DT = data.table(x=letters[1:3],y=list(1:10,letters[1:4],data.table(a=1:3,b=4:6)))
-test(off-558, capture.output(print(DT)), c("   x            y","1: a 1,2,3,4,5,6,","2: b      a,b,c,d","3: c <data.table>"))
-test(off-559, setkey(DT,x)["a",y][[2]][[1]], 1:10, warning=deprecated_warn)   # y is symbol representing list column, specially detected in dogroups
-
-# another test linked from #2162
-DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1L,3L,6L), v=1:9, key="x")
-test(off-725, DT[c("a","b","d"),v][,list(v)], DT[J(c("a","b","d")),"v",with=FALSE], warning=deprecated_warn)  # unfiled bug fix for NA matches; see NEWS 1.8.3
-
-DT = data.table(a=1:3,b=1:6,key="a")
-test(off-869, suppressWarnings(DT[J(2,42,84),print(.SD)]), output="   b1: 22: 5.*Empty data.table (0 rows) of 3 cols: a,V2,V3")
-
-rm(deprecated_warn)
-options(datatable.old.bywithoutby=FALSE)
-# End (test reverting to old bywithoutby behaviour)  TO DO: delete these tests after Sep 2015
+test(1378.3, fread(f, showProgress=FALSE), error="Line ending is .*r.*r.*n. R's download.file() appears to add the extra .*r in text mode on Windows. Please download again in binary mode (mode='wb') which might be faster too. Alternatively, pass the URL directly to fread and it will download the file in binary mode for you.")
 #====================================
 
-
 oldv = options(datatable.fread.datatable = FALSE)
 test(1379.1, fread("A,B\n1,3\n2,4\n"), data.frame(A=1:2,B=3:4))
 test(1379.2, fread("A,B\n1,3\n2,4\n",data.table=TRUE), data.table(A=1:2,B=3:4))
@@ -5658,11 +5687,11 @@ DT = data.table(a=1:3, b=4:6)
 test(1396, DT[a==2, verbose=TRUE], DT[2], output="Creating new index 'a'")
 test(1397, DT[b==6, verbose=TRUE], DT[3], output="Creating new index 'b'")
 test(1398, DT[b==6, verbose=TRUE], DT[3], output="Using existing index 'b'")
-test(1399, key2(DT), c("a","b"))
+test(1399, indices(DT), c("a","b"))
 test(1400, DT[2, a:=4L, verbose=TRUE], data.table(a=c(1L,4L,3L),b=4:6), output="Dropping index 'a' due to update on 'a' (column 1)")
-test(1401, key2(DT), "b")
+test(1401, indices(DT), "b")
 test(1402, DT[,b:=NULL,verbose=TRUE], data.table(a=c(1L,4L,3L)), output="Dropping index 'b' due to delete of 'b' (column 2)")
-test(1403, key2(DT), NULL)
+test(1403, indices(DT), NULL)
 DT = data.table(x=1:5)
 test(1404, DT[, y := x <= 2L], data.table(x=1:5, y=c(TRUE,TRUE,FALSE,FALSE,FALSE)))
 test(1405, DT[y == TRUE, .N, verbose=TRUE], 2L, output="Creating new index")
@@ -5670,40 +5699,40 @@ test(1406, DT[, y := x <= 3L, verbose=TRUE], data.table(x=1:5, y=c(TRUE,TRUE,TRU
 test(1407, DT[y == TRUE, .N], 3L)
 DT = data.table(x=1:5, y=10:6)
 test(1408, DT[x==3,verbose=TRUE], DT[3], output="Creating")
-test(1409, key2(DT), "x")
+test(1409, indices(DT), "x")
 set(DT,1:3,1L,-10L)
-test(1410, key2(DT), NULL)
+test(1410, indices(DT), NULL)
 test(1411, DT[x==5], DT[5])
 setorder(DT, y)
-test(1412, key2(DT), NULL)
+test(1412, indices(DT), NULL)
 test(1413, DT[x==5], DT[1])
 DT = data.table(foo=1:3, bar=4:6, baz=9:7)
-set2key(DT,foo,bar,baz)
-test(1414, key2(DT), c("foo__bar__baz"))
+setindex(DT,foo,bar,baz)
+test(1414, indices(DT), c("foo__bar__baz"))
 test(1415, DT[2,bar:=10L,verbose=TRUE], output="Dropping index 'foo__bar__baz' due to update on 'bar'")  # test middle
-test(1416, key2(DT), NULL)
-set2key(DT,foo,bar,baz)
+test(1416, indices(DT), NULL)
+setindex(DT,foo,bar,baz)
 test(1417, DT[2,baz:=10L,verbose=TRUE], output="Dropping index 'foo__bar__baz' due to update on 'baz'")  # test last
-set2key(DT,bar,baz)
+setindex(DT,bar,baz)
 test(1418, DT[2,c("foo","bar"):=10L,verbose=TRUE], output="Dropping index.* due to update on 'bar'")     # test 2nd to 1st
-set2key(DT,bar,baz)
+setindex(DT,bar,baz)
 test(1419, DT[2,c("foo","baz"):=10L,verbose=TRUE], output="Dropping index.* due to update on 'baz'")     # test 2nd to 2nd
 
 # setnames updates secondary key
 DT = data.table(a=1:5,b=10:6)
-set2key(DT,b)
-test(1420, key2(DT), "b")
+setindex(DT,b)
+test(1420, indices(DT), "b")
 setnames(DT,"b","foo")
-test(1421, key2(DT), "foo")
+test(1421, indices(DT), "foo")
 test(1422, DT[foo==9, verbose=TRUE], DT[2], output="Using existing index 'foo'")
-set2key(DT,a,foo)
-test(1423, key2(DT), c("foo","a__foo"))   # tests as well that order of attributes is retained although we don't use that property currently.
-test(1424, key2(setnames(DT,"foo","bar")), c("bar","a__bar"))
-test(1425, key2(setnames(DT,"a","baz")), c("bar","baz__bar"))
+setindex(DT,a,foo)
+test(1423, indices(DT), c("foo","a__foo"))   # tests as well that order of attributes is retained although we don't use that property currently.
+test(1424, indices(setnames(DT,"foo","bar")), c("bar","a__bar"))
+test(1425, indices(setnames(DT,"a","baz")), c("bar","baz__bar"))
 test(1426, DT[baz==4L, verbose=TRUE], output="Creating new index 'baz'")
-test(1427, key2(DT), c("bar","baz__bar","baz"))
+test(1427, indices(DT), c("bar","baz__bar","baz"))
 test(1428, DT[bar==9L, verbose=TRUE], output="Using existing index 'bar'")
-test(1429, key2(setnames(DT,"bar","a")), c("baz", "a", "baz__a"))
+test(1429, indices(setnames(DT,"bar","a")), c("baz", "a", "baz__a"))
 
 # Finalised == and %in% optimization in i
 DT = data.table(a=1:3,b=c(0,2,3,0,0,2))
@@ -5746,7 +5775,6 @@ if (.Platform$OS.type=="windows" ||
 }
 
 # doubled quote inside a quoted field followed by an embedded newline
-# This file is 36 rows to move that line outside the top, middle and bottom 5 test rows
 test(1445, fread("doublequote_newline.csv")[7:10], data.table(A=c(1L,1L,2L,1L), B=c("a","embedded \"\"field\"\"\nwith some embedded new\nlines as well","not this one","a")))
 # the example from #489 directly :
 test(1446, fread('A,B,C\n233,"AN ""EMBEDDED"" QUOTE FIELD",morechars\n'), data.table(A=233L, B='AN ""EMBEDDED"" QUOTE FIELD', C='morechars'))
@@ -5901,6 +5929,13 @@ test(1463.30, shift(x,1L, type="lead"),     c(x[-1L], NA))
 test(1463.31, shift(x,1:2, type="lead"),    list(c(x[-1L],NA), c(x[-(1:2)],NA,NA)))
 test(1463.32, shift(x,1L, 0L, type="lead"), c(x[-(1)], FALSE))
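A quick standalone illustration (not part of the patch) of the `shift()` lag/lead/fill behaviour exercised by the tests above:

```r
library(data.table)
x = 1:5
lag1   = shift(x, 1L)                  # lag is the default type: NA 1 2 3 4
lead1  = shift(x, 1L, type = "lead")   # 2 3 4 5 NA
filled = shift(x, 2L, fill = 0L)       # 0 0 1 2 3
multi  = shift(x, 1:2)                 # a vector n returns a list, one element per lag
```

Passing a vector for `n` is what makes `shift()` convenient inside `j` for creating several lag columns at once.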
 
+# for list of list, #1595
+x = data.table(foo = c(list(c("a","b","c")), list(c("b","c")), list(c("a","b")), list(c("a"))), id = c(1,1,2,2))
+test(1463.33, x[, shift(list(foo)), by=id],
+    data.table(id=c(1,1,2,2), V1=list(NA, c("a", "b", "c"), NA, c("a", "b"))))
+test(1463.34, x[, shift(list(foo), type="lead", fill=NA_integer_), by=id],
+    data.table(id=c(1,1,2,2), V1=list(c("b", "c"), NA_integer_, c("a"), NA_integer_)))
+
 # Fix for #1009 segfault in shift
 val = runif(1)
 test(1463.33, shift(val, 2L), NA_real_)
@@ -5919,7 +5954,8 @@ setattr(ans, 'names', nm)
 test(1463.27, shift(x, 1:2, give.names=TRUE), ans)
 
 # FR #686
-DT = data.table(a=rep(c("A", "B", "C", "A", "B"), c(2,2,3,1,2)))
+DT = data.table(a=rep(c("A", "B", "C", "A", "B"), c(2,2,3,1,2)), foo=1:10)
+# Seemingly superfluous 'foo' is needed to test fix for #1942
 DT[, b := as.integer(factor(a))][, c := as.numeric(factor(a))]
 test(1464.1, rleidv(DT, "a"), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L))
 test(1464.2, rleid(DT$a), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L))
@@ -5928,19 +5964,28 @@ test(1464.4, rleid(DT$b), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L))
 test(1464.5, rleidv(DT, "c"), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L))
 test(1464.6, rleid(DT$c), c(1L, 1L, 2L, 2L, 3L, 3L, 3L, 4L, 5L, 5L))
 test(1464.7, rleid(as.complex(c(1,0+5i,0+5i,1))), error="Type 'complex' not supported")
+test(1464.8, rleidv(DT, 0), error="outside range")
+test(1464.9, rleidv(DT, 5), error="outside range")
+test(1464.11, rleidv(DT, 1:4), 1:nrow(DT))
+set.seed(1)
+DT = data.table( sample(1:2,20,replace=TRUE), sample(1:2,20,replace=TRUE), sample(1:2,20, replace=TRUE))
+test(1464.12, rleidv(DT, 1:4), error="outside range")
+test(1464.13, rleidv(DT, 1:2), ans<-INT(1,2,3,4,5,6,6,6,7,8,8,9,10,11,12,13,14,15,16,17))
+test(1464.14, rleidv(DT, 2:1), ans)
+test(1464.15, rleidv(DT, c(3,1)), INT(1,1,2,2,3,4,5,5,6,7,8,9,10,11,12,13,14,15,16,17))
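The new `rleidv()` bound checks above guard the cols argument; the basic run-length id behaviour, as a standalone sketch:

```r
library(data.table)
v = c("A", "A", "B", "B", "C", "C", "C", "A", "B", "B")
ids = rleid(v)   # each consecutive run gets the next integer id
# note the second run of "A" gets a new id (4), unlike a plain group id
```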
 
 # data.table-xts conversion #882
 
 if ("package:xts" %in% search()) {
     # Date index
-    dt = data.table(index = as.Date((as.Date("2014-12-12")-49):as.Date("2014-12-12"),origin="1970-01-01"),quantity = rep(c(1:5),10),value = rep(c(1:10)*100,5))
+    dt = data.table(index = as.Date((as.Date("2014-12-12")-49):as.Date("2014-12-12"),origin="1970-01-01"),quantity = as.numeric(rep(c(1:5),10)),value = rep(c(1:10)*100,5))
     xt = as.xts(matrix(data = c(dt$quantity, dt$value),ncol = 2,dimnames = list(NULL,c("quantity","value"))),order.by = dt$index)
     dt_xt = as.data.table(xt)
     xt_dt = as.xts.data.table(dt)
     test(1465.1, all.equal(dt, dt_xt, check.attributes = FALSE))
     test(1465.2, xt, xt_dt)
     # POSIXct index
-    dt <- data.table(index = as.POSIXct(as.Date((as.Date("2014-12-12")-49):as.Date("2014-12-12"),origin="1970-01-01"),origin="1970-01-01"),quantity = rep(c(1:5),10),value = rep(c(1:10)*100,5))
+    dt <- data.table(index = as.POSIXct(as.Date((as.Date("2014-12-12")-49):as.Date("2014-12-12"),origin="1970-01-01"),origin="1970-01-01"),quantity = as.numeric(rep(c(1:5),10)),value = rep(c(1:10)*100,5))
     xt = as.xts(matrix(data = c(dt$quantity, dt$value),ncol = 2,dimnames = list(NULL,c("quantity","value"))),order.by = dt$index)
     dt_xt = as.data.table(xt)
     xt_dt = as.xts.data.table(dt)
@@ -5954,12 +5999,17 @@ test(1466.1, as.data.table(as.data.frame(ar)), as.data.table(ar)) # array type
 x <- rep(Sys.time(),3)
 test(1466.2, as.data.table(as.data.frame(x)), as.data.table(x)) # posix type
 
-# fix for #1001
-options(datatable.auto.index=TRUE)
-DT <- data.table(a=1:2)
-test(1467.1, DT[a==3, b:=d+1], DT)
-# restore
-options(datatable.auto.index=FALSE)
+# fix for #1001, #1002 and #759
+# When adding a column, even if i results in no rows, the RHS still needs to be evaluated so we know the
+# column type to create. Always create the column, for consistency that does not depend on the data in i.
+for (bool in c(FALSE,TRUE)) {
+  options(datatable.auto.index=bool)
+  DT = data.table(a=1:2)
+  test(1467.01 + bool*0.03, copy(DT)[a==3, b:=notExist+1], error="notExist")
+  test(1467.02 + bool*0.03, copy(DT)[a==3, b:=a+5L], data.table(a=1:2, b=NA_integer_))
+  test(1467.03 + bool*0.03, copy(DT)[a==3, b:=a+5], data.table(a=1:2, b=NA_real_))
+}
+test(1467.07, getOption("datatable.auto.index"))  # ensure to leave TRUE
 
 # fix for first bug reported in #1006 on 'foverlaps()'
 x <- c(-0.1, 0, 0.1)
@@ -5994,9 +6044,9 @@ test(1469.6, key(DT[J(c(2,0)), roll="nearest"]), NULL)
 
 # 1007 fix, dealing with Inf and -Inf correctly in rolling joins.
 DT = data.table(x=c(-Inf, 3, Inf), y=1:3, key="x")
-test(1470.1, DT[J(c(2,-Inf,5,Inf)), roll=Inf], data.table(x=c(2,-Inf,5,Inf), y=c(1, 1, 2, 3)))
+test(1470.1, DT[J(c(2,-Inf,5,Inf)), roll=Inf], data.table(x=c(2,-Inf,5,Inf), y=c(1L, 1:3)))
 test(1470.2, DT[J(c(2,-Inf,5,Inf)), roll=10], data.table(x=c(2,-Inf,5,Inf), y=INT(c(NA, 1, 2, 3))))
-test(1470.3, DT[SJ(c(2,-Inf,5,Inf)), roll=Inf], data.table(x=c(-Inf,2,5,Inf), y=c(1, 1, 2, 3), key="x"))
+test(1470.3, DT[SJ(c(2,-Inf,5,Inf)), roll=Inf], data.table(x=c(-Inf,2,5,Inf), y=c(1L, 1:3), key="x"))
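These tests pin down rolling-join behaviour at ±Inf; a standalone sketch with values mirroring test 1470.1:

```r
library(data.table)
DT = data.table(x = c(-Inf, 3, Inf), y = 1:3, key = "x")
res = DT[J(c(2, -Inf, 5, Inf)), roll = Inf]   # unlimited LOCF roll
# x=2 rolls back to the -Inf row -> y=1; -Inf and Inf match exactly; x=5 rolls from 3 -> y=2
```

With a finite roll window such as `roll=10`, the first lookup fails instead (the gap from -Inf to 2 is infinite), which is exactly what test 1470.2 asserts.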
 
 # 1006, second bug with -Inf, now that #1007 is fixed.
 x <- c(-Inf, -0.1, 0, 0.1, Inf)
@@ -6066,7 +6116,7 @@ test(1477.8, transpose(list(list(1:5))), error="Item 1 of list input is")
 # #480 `setDT` and 'lapply'
 ll = list(data.frame(a=1), data.frame(x=1, y=2), NULL, list())
 ll <- lapply(ll, setDT)
-test(1478.1, sapply(ll, truelength), rep(100L, 4L))
+test(1478.1, sapply(ll, truelength), c(1025L, 1026L, 1024L, 1024L))
 test(1478.2, sapply(ll, length), INT(1,2,0,0))
 
 # rbindlist stack imbalance issue, #980.
@@ -6079,13 +6129,18 @@ test(1480, DT[, x := strsplit(as.character(x), " ")], data.table(x=list("a", let
 
 # #970, over-allocation issue
 a=data.frame(matrix(1,ncol=101L))
-options(datatable.alloccol=100L)
+old = options(datatable.alloccol=100L)
 ans1 = data.table(a)
 options(datatable.alloccol=101L)
 ans2 = data.table(a)
-test(1481, ans1, ans2)
-# Global option. so reset back, else test 1478 fails.
-options(datatable.alloccol=100L)
+test(1481.1, ans2, ans1)
+options(datatable.alloccol=0L)
+ans3 = data.table(a)
+test(1481.2, ans3, ans1)
+options(datatable.alloccol=1L)
+ans4 = data.table(a)
+test(1481.3, ans4, ans1)
+options(old)
 
 # #479, check := assignment in environment (actual case is when loaded from disk, but we'll just simulate a scenario here).
 ee = new.env()
@@ -6093,7 +6148,9 @@ ee$DT = data.frame(x=1L, y=1:3)
 setattr(ee$DT, 'class', c("data.table", "data.frame"))
 test(1482.1, truelength(ee$DT), 0L) # make sure that the simulated environment is right.
 test(1482.2, ee$DT[, z := 3:1], data.table(x=1L, y=1:3, z=3:1), warning="Invalid .internal.selfref detected and")
-test(1482.3, truelength(ee$DT) >= 100L, TRUE) # truelength restored?
+test(1482.3, truelength(ee$DT), 1027L)
+test(1482.4, ee$DT[, za := 4:6], data.table(x=1L, y=1:3, z=3:1, za=4:6))  
+test(1482.5, truelength(ee$DT), 1027L)   # should have used spare slot i.e. no increase in tl
 
 # Fix for #499 and #945
 x <- data.table(k=as.factor(c(NA,1,2)),v=c(0,1,2), key="k")
@@ -6158,13 +6215,13 @@ test(1489, DT[.(TRUE)], DT[1L])
 # Fix for #932
 DT <- data.table(v1 = c(1:3, NA), v2 = c(1,NA,2.5,NaN), v3=c(NA, FALSE, NA, TRUE), v4=c("a", NA, "b", "c"))
 options(datatable.auto.index = TRUE) # just to be sure
-set2key(DT, v1)
+setindex(DT, v1)
 test(1490.1,  DT[v1==3],      subset(DT, v1==3))
 test(1490.2,  DT[!v1==3],     subset(DT, !v1==3))
 test(1490.3,  DT[v1==NA],     subset(DT, v1==NA))
 test(1490.4,  DT[!v1==NA],    subset(DT, !v1==NA))
 
-set2key(DT, v2)
+setindex(DT, v2)
 test(1490.5,  DT[v2==2.5],    subset(DT, v2==2.5))
 test(1490.6,  DT[!v2==2.5],   subset(DT, !v2==2.5))
 test(1490.7,  DT[v2==NA],     subset(DT, v2==NA))
@@ -6172,7 +6229,7 @@ test(1490.8,  DT[!v2==NA],    subset(DT, !v2==NA))
 test(1490.9,  DT[v2==NaN],    subset(DT, v2==NaN))
 test(1490.10, DT[!v2==NaN],   subset(DT, !v2==NaN))
 
-set2key(DT, v3)
+setindex(DT, v3)
 test(1490.11, DT[v3==FALSE],  subset(DT, v3==FALSE))
 test(1490.12, DT[!v3==FALSE], subset(DT, !v3==FALSE))
 test(1490.13, DT[v3==TRUE],   subset(DT, v3==TRUE))
@@ -6182,7 +6239,7 @@ test(1490.16, DT[!v3==NA],    subset(DT, !v3==NA))
 test(1490.17, DT[(v3)],       subset(DT, v3==TRUE))
 test(1490.18, DT[!(v3)],      subset(DT, !v3==TRUE))
 
-set2key(DT, v4)
+setindex(DT, v4)
 test(1490.19, DT[v4=="b"],    subset(DT, v4=="b"))
 test(1490.20, DT[!v4=="b"],   subset(DT, !v4=="b"))
 test(1490.21, DT[v4==NA],     subset(DT, v4==NA))
@@ -6211,12 +6268,12 @@ ee3 = quote(.(v1=.(.SD), v2=.(lm(. ~ xx)), v3=.(.(x)), v4=.(x^2)))
 ee4 = quote(c("a", "b") := .(.SD))
 ee5 = quote(c("a", "b") := .(v1=x^2, v2 = .(.SD[[1L]])))
 ee6 = quote(.(v1=.(.SD), v2=.(lm(. ~ xx)), v3=list(.(x)), v4=.(x^2)))
-test(1491.1, replace_dot(ee1), quote(list(val = lm(x ~ .))))
-test(1491.2, replace_dot(ee2), quote(list(v1=list(.SD), v2=list(min(y)), v3=list(list(x)), v4=list(x))))
-test(1491.3, replace_dot(ee3), quote(list(v1=list(.SD), v2=list(lm(. ~ xx)), v3=list(list(x)), v4=list(x^2))))
-test(1491.4, replace_dot(ee4), quote(c("a", "b") := list(.SD)))
-test(1491.5, replace_dot(ee5), quote(c("a", "b") := list(v1=x^2, v2 = list(.SD[[1L]]))))
-test(1491.6, replace_dot(ee6), quote(list(v1=list(.SD), v2=list(lm(. ~ xx)), v3=list(list(x)), v4=list(x^2))))
+test(1491.1, replace_dot_alias(ee1), quote(list(val = lm(x ~ .))))
+test(1491.2, replace_dot_alias(ee2), quote(list(v1=list(.SD), v2=list(min(y)), v3=list(list(x)), v4=list(x))))
+test(1491.3, replace_dot_alias(ee3), quote(list(v1=list(.SD), v2=list(lm(. ~ xx)), v3=list(list(x)), v4=list(x^2))))
+test(1491.4, replace_dot_alias(ee4), quote(c("a", "b") := list(.SD)))
+test(1491.5, replace_dot_alias(ee5), quote(c("a", "b") := list(v1=x^2, v2 = list(.SD[[1L]]))))
+test(1491.6, replace_dot_alias(ee6), quote(list(v1=list(.SD), v2=list(lm(. ~ xx)), v3=list(list(x)), v4=list(x^2))))
 
 # Fix for #1050
 dt = data.table(x=1:5, y=6:10)
@@ -6271,8 +6328,8 @@ if ("package:bit64" %in% search()) {
     x = c("12345678901234", rep("NA", 178), "a")
     y = sample(letters, length(x), TRUE)
     ll = paste(x,y, sep=",", collapse="\n")
-    test(1500.3, fread(ll), 
-        data.table(V1=c("12345678901234", rep("", 178), "a"), V2=y), warning="Bumped column 1 to type character on data")
+    test(1500.3, fread(ll),
+        data.table(V1=c("12345678901234", rep(NA, 178), "a"), V2=y))
 
     x = c("12345678901234", rep("NA", 178), "0.5")
     y = sample(letters, length(x), TRUE)
@@ -6290,8 +6347,8 @@ test(1502.2, dt1["a", z := 42L], dt2["a", z := 42L])
 
 # fix for #1080
 dt = data.table(col1 = c(1,2,3,2,5,3,2), col2 = c(0,9,8,9,6,5,4), key=c("col1"))
-test(1503.1, uniqueN(dt), 4L) # default on key columns
-test(1503.2, uniqueN(dt, by=NULL), 6L) # on all columns
+test(1503.1, uniqueN(dt, by=key(dt)), 4L) # default on key columns
+test(1503.2, uniqueN(dt), 6L) # on all columns
 test(1503.3, uniqueN(dt$col1), 4L) # on just that column
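For readers skimming this patch, the uniqueN semantics exercised above can be sketched in base R (illustrative only, not part of the upstream file):

```r
# uniqueN(v) counts distinct values of a vector; uniqueN(DT) counts
# distinct rows over all columns (unless by= restricts the columns).
col1 <- c(1, 2, 3, 2, 5, 3, 2)
col2 <- c(0, 9, 8, 9, 6, 5, 4)
length(unique(col1))                  # 4 distinct values, like test 1503.3
nrow(unique(data.frame(col1, col2)))  # 6 distinct rows, like test 1503.2
```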
 
 # .SDcols and with=FALSE understands colstart:colend syntax
@@ -6305,11 +6362,15 @@ test(1504.3, dt[, lapply(.SD, sum), by=V1, .SDcols=-(V8:V10)],
              dt[, lapply(.SD, sum), by=V1, .SDcols=-(8:10)])
 test(1504.4, dt[, lapply(.SD, sum), by=V1, .SDcols=!(V8:V10)], 
              dt[, lapply(.SD, sum), by=V1, .SDcols=!(8:10)])
-# with=FALSE
-test(1504.5, dt[, V8:V10, with=FALSE],    dt[, 8:10, with=FALSE])
-test(1504.6, dt[, V10:V8, with=FALSE],    dt[, 10:8, with=FALSE])
-test(1504.7, dt[, -(V8:V10), with=FALSE], dt[, -(8:10), with=FALSE])
-test(1504.8, dt[, !(V8:V10), with=FALSE], dt[, !(8:10), with=FALSE])
+# with=FALSE and auto with=FALSE tests as from v1.9.8
+test(1504.5, dt[, V8:V10, with=FALSE],     dt[, 8:10, with=FALSE])
+test(1504.6, dt[, V8:V10],                 dt[, 8:10, with=FALSE])
+test(1504.7, dt[, V10:V8, with=FALSE],     dt[, 10:8, with=FALSE])
+test(1504.8, dt[, V10:V8],                 dt[, 10:8, with=FALSE])
+test(1504.9, dt[, -(V8:V10), with=FALSE],  dt[, -(8:10), with=FALSE])
+test(1504.11, dt[, -(V8:V10)],             dt[, -(8:10), with=FALSE])
+test(1504.12, dt[, !(V8:V10), with=FALSE], dt[, !(8:10), with=FALSE])
+test(1504.13, dt[, !(V8:V10)],             dt[, !(8:10), with=FALSE])
 
 # Fix for #1083
 dt = data.table(x=1:4, y=c(TRUE,FALSE))
@@ -6435,7 +6496,7 @@ x = data.table(x=c(1,1,1,2), y=1:4, key="x")
 test(1519.1, x[.(2:3), .N, nomatch=0L], 1L)
 x = data.table(k = INT(0,2,3,7), o = "b", key = "k")
 y = data.table(k = 1:5, n = paste("n", 1:5, sep=""), key = "k")
-test(1519.2, x[y, o := n, nomatch = 0], data.table(k = INT(0,2,3,7), o = c("b","n2","n3","b"), key = "k"))
+test(1519.2, x[y, o := n], data.table(k = INT(0,2,3,7), o = c("b","n2","n3","b"), key = "k"))
 
 # Fix for #1141 (thanks to @yvanrichard)
 x <- data.table(zxc = 1:3, vbn = 4:6)
@@ -6485,11 +6546,12 @@ text = "x,y\n1,a\n2,a\n3,b\n4,b\n5,a\n"
 test(1527.3, dt[, y := factor(y)], fread(text, stringsAsFactors=TRUE))
 
 # #1027, check.names argument to fread
-nm1 = names(data.table(a=1:2, a=3:4))
+nm1 = names(fread("a,a\n1,2\n3,4", check.names=FALSE))
 nm2 = names(fread("a,a\n1,2\n3,4", check.names=TRUE))
-nm3 = names(fread("a,a\n1,2\n3,4", check.names=FALSE))
-test(1528.1, make.unique(nm1), nm2)
-test(1528.2, nm1, nm3)
+nm3 = names(fread("a b,a b\n1,2\n3,4", check.names=TRUE))
+test(1528.1, c("a", "a"), nm1)
+test(1528.2, c("a", "a.1"), nm2)
+test(1528.3, c("a.b", "a.b.1"), nm3)
 
 # add tests for between
 x = sample(10, 20, TRUE)
@@ -6555,9 +6617,9 @@ test(1537 , names(melt(dt, id=1L, variable.name = "x", value.name="x")), c("x",
 # test for tables()
 test(1538, tables(), output = "Total:")
 
-# uniqueN could supports list input #1224
+# uniqueN does not support list-of-list input; reverted #1224
 d1 <- data.table(a = 1:4, l = list(list(letters[1:2]),list(Sys.time()),list(1:10),list(letters[1:2])))
-test(1539, d1[,uniqueN(l)], 3L)
+test(1539, d1[,uniqueN(l)], error = "x must be an atomic vector or data.frames/data.tables")
 
 # feature #1130 - joins without setting keys
 # can't test which=TRUE with DT1.copy's results..
@@ -6820,7 +6882,7 @@ str = 'L1\tsome\tunquoted\tstuff\nL2\tsome\t"half" quoted\tstuff\nL3\tthis\t"sho
 test(1551.4, fread(str), data.table(L1 = c("L2", "L3"), some = c("some", "this"), unquoted = c("\"half\" quoted", "should work"), stuff = c("stuff", "ok thought")))
 #1095
 rhs = read.table("issue_1095_fread.txt", sep=",", comment.char="", stringsAsFactors=FALSE, quote="", strip.white=TRUE)
-test(1551.5, fread("issue_1095_fread.txt"), setDT(rhs), warning="Bumped column 47 to type character on data row")
+test(1551.5, fread("issue_1095_fread.txt"), setDT(rhs))
 
 # FR #1314 rest of na.strings issue
 str = "a,b,c,d\n#N/A,+1,5.5,FALSE\n#N/A,5,6.6,TRUE\n#N/A,+1,#N/A,-999\n#N/A,#N/A,-999,FALSE\n#N/A,1,NA,TRUE"
@@ -6949,8 +7011,2780 @@ test(1557.2, dt[, paste0("index", 1:i), with=FALSE], dt[, 1:2, with=FALSE])
 test(1557.3, dt[, 5:4, with=FALSE], dt[, i:s, with=FALSE])
 test(1557.4, dt[, .SD, .SDcols=paste0("index", 1:i)], dt[, .SD, .SDcols=index1:index2])
 
+# fix for #1354
+test(1558, as.ITime(NA), setattr(NA_integer_, 'class', 'ITime'))
+
+if (!"package:xts" %in% search()) {
+    # #1347, xts issue from Joshua
+    x = as.Date(1:5, origin="2015-01-01")
+    test(1559.11, last(x), tail(x, 1L))
+} else {
+    test(1559.12, last(.xts(1:3,1:3)), .xts(1:3, 1:3)[3, ])
+}
+
+# fix for #1352
+dt1 = data.table(a=1:5, b=6:10, c=11:15)
+dt2 = data.table(a=3:6, b=8:11, d=1L)
+by_cols = c(x="a", y="b")
+test(1560, merge(dt1,dt2, by=by_cols, sort=FALSE), dt1[dt2, nomatch=0L, on=unname(by_cols)])
+
+# FR #1353
+DT = data.table(x=c(20,10,10,30,30,20), y=c("a", "a", "a", "b", "b", "b"), z=1:6)
+
+test(1561.1, rowid(DT$x), as.integer(c(1,1,2,1,2,2)))
+test(1561.2, rowidv(DT, cols="x"), as.integer(c(1,1,2,1,2,2)))
+test(1561.3, rowid(DT$x, prefix="group"), paste("group", as.integer(c(1,1,2,1,2,2)), sep=""))
+test(1561.4, rowid(DT$x, DT$y), as.integer(c(1,1,2,1,2,1)))
+test(1561.5, rowidv(DT, cols=c("x","y")), as.integer(c(1,1,2,1,2,1)))
+# convenient usage with dcast
+test(1561.6, dcast(DT, x ~ rowid(x, prefix="group"), value.var="z"), data.table(x=c(10,20,30), group1=c(2L,1L,4L), group2=c(3L,6L,5L), key="x"))
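rowid() assigns a within-group row counter in order of appearance; a base-R equivalent using ave() (an illustrative sketch, not the package implementation):

```r
x <- c(20, 10, 10, 30, 30, 20)
y <- c("a", "a", "a", "b", "b", "b")
# like rowid(x): count rows within each value of x
ave(seq_along(x), x, FUN = seq_along)     # 1 1 2 1 2 2
# like rowid(x, y): group by the combination of x and y
ave(seq_along(x), x, y, FUN = seq_along)  # 1 1 2 1 2 1
```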
+
+# Fix for #1346
+DT = data.table(id=1:3, g1=4:6, g2=7:9)
+test(1562, melt(DT, measure=patterns("^g[12]"), variable.factor=FALSE), data.table(id=1:3, variable=rep(c("g1","g2"),each=3L), value=4:9))
+
+# test 1563 added for melt above, fix for #1359.
+
+# fix for #1341
+dt <- data.table(a = 1:10)
+test(1564.1, truelength(dt[, .SD]), 1025L)
+test(1564.2, truelength(dt[a==5, .SD]), 1025L)
+test(1564.3, dt[a==5, .SD][, b := 1L], data.table(a=5L, b=1L))
+
+# Fix for #1251, DT[, .N, by=a] and DT[, .(.N), by=a] uses GForce now
+dt = data.table(a=sample(3,20,TRUE), b=1:10)
+old = options(datatable.optimize = Inf)
+ans1 = dt[, .N, by=a]
+ans2 = capture.output(dt[, .N, by=a, verbose=TRUE])
+test(1565.1, length(grep("GForce optimized j to", ans2))>0L, TRUE) # make sure GForce optimisation works
+options(datatable.optimize = 1L) # make sure result is right
+test(1565.2, ans1, dt[, .N, by=a])
+options(old)
+
+# Fix for #1212
+set.seed(123)
+dt <- data.table(a=c("abc", "def", "ghi"), b=runif(3))[, c:=list(list(data.table(d=runif(1), e=runif(1))))]
+test(1566.1, dt[, c], dt[, get("c")])
+test(1566.2, dt[, .(c=c)], dt[, .(c=get("c"))])
+test(1566.3, address(dt$c) == address(dt[, get("c")]), FALSE)
+
+# Fix for #1207
+d1 <- data.table(a = character(), b = list())
+test(1567.1, d1[, b, by=a], d1)
+test(1567.2, d1[, b, keyby=a], data.table(d1, key="a"))
+
+# Fix for #1334
+dt = data.table(x=ordered(rep(1:3,each=5)),y=ordered(rep(c("B","A","C"),5),levels=c("B","A","C")),z=1:15)
+test(1568, dt[, sum(z), keyby=.(I(x), I(y))], data.table(I=I(ordered(rep(1:3,each=3))), I.1=I(ordered(rep(c("B","A","C"),3),levels=c("B","A","C"))),V1=c(5L, 7L, 3L, 17L, 8L, 15L, 13L, 25L, 27L), key=c("I", "I.1")))
+
+# Test 1569 is written under melt above.
+
+# fix for #1378, merge resets class
+X = data.table(a=1:3, b=4:6)
+Y = data.table(a=1L, c=5L)
+setattr(Y, 'class', c("custom","data.table","data.frame"))
+test(1570.1, class(merge(X, Y, all=TRUE, by="a")), class(X))
+test(1570.2, class(merge(Y, X, all=TRUE, by="a")), class(X))
+
+# #1379, tstrsplit gains names argument
+X = data.table(a=c("ABC", "DEFG"))
+test(1571.1, names(tstrsplit(X$a, "", fixed=TRUE, names=TRUE)), paste("V", 1:4, sep=""))
+test(1571.2, names(tstrsplit(X$a, "", fixed=TRUE, names=letters[1:3])), error="is not equal to ")
+test(1571.3, names(tstrsplit(X$a, "", fixed=TRUE, names=letters[1:4])), letters[1:4])
+# tstrsplit also gains 'keep' argument
+test(1571.4, tstrsplit(X$a, "", fixed=TRUE, keep=c(2,4)), list(c("B", "E"), c(NA, "G")))
+test(1571.5, tstrsplit(X$a, "", fixed=TRUE, keep=c(2,7)), error="should contain integer")
+test(1571.6, tstrsplit(X$a, "", fixed=TRUE, keep=c(2,4), names=letters[1:5]), error="is not equal to")
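tstrsplit() is essentially a transposed strsplit(); a base-R sketch of the idea (illustrative only, padding short strings with NA as the tests above expect):

```r
a <- c("ABC", "DEFG")
parts <- strsplit(a, "", fixed = TRUE)      # one character vector per string
n <- max(lengths(parts))
padded <- lapply(parts, `length<-`, n)      # pad shorter splits with NA
# element i of the result holds the i-th character of every input string
lapply(seq_len(n), function(i) vapply(padded, `[`, character(1), i))
```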
+
+# fix for #1367, quote="" argument in use. With embedded quotes, the example below reads the
+# first two columns as one. I couldn't find a way to avoid introducing the quote argument.
+test(1572, fread('"abcd efgh." ijkl.\tmnop "qrst uvwx."\t45\n', quote=""), 
+           setDT(read.table(text='"abcd efgh." ijkl.\tmnop "qrst uvwx."\t45\n', sep="\t", stringsAsFactors=FALSE, quote="")))
+
+# Fix for #1384, fread with empty new line, initial checks failed due to extra spaces.
+test(1573, fread('a,b
+       1,2
+       '), data.table(a=1L, b=2L))
+
+# Fix for #1375
+X = data.table(a=1:3,b=4:6,c=c("foo","bar","baz"))
+test(1574.1, X[.(5), on="b"], X[2])
+
+X = data.table(A=1:3,b=4:6,c=c("foo","bar","baz"))
+Y = data.table(A=2:4, B=5:7)
+test(1574.2, X[Y, on=c("A",b="B")], X[Y, on=c(A="A", b="B")])
+test(1574.3, X[Y, on=c(b="B", "A")], X[Y, on=c(b="B", A="A")])
+test(1574.4, X["bar", on="c"], X[2L]) # missed previously
+
+# fix for #1376
+X = data.table(a=1:3,b=4:6,c=c("foo","bar","baz"))
+Y = data.table(A=2:4, B=5:7)
+test(1575.1, X[Y, on=c(A="a")], error="not found in x")
+test(1575.2, X[Y, on=c(a="a")], error="not found in i")
+
+# work around for issue introduced in v1.9.4, #1396
+X = data.table(x=5:1, y=6:10)
+setattr(X, 'index', integer(0))
+setattr(attr(X, 'index'), 'x', 5:1) # auto indexed attribute as created from v1.9.4
+test(1576, X[, z := 1:5, verbose=TRUE],
+    output = "Dropping index 'x' as.*beginning of its name.*very likely created by v1.9.4 of data.table")
+
+# fix for #1408
+X = fread("a|b|c|d
+          this|is|row|1
+          this|is|row|2
+          this|NA|NA|3
+          this|is|row|4", stringsAsFactors = TRUE)
+test(1577.1, is.na(X[3, b]), TRUE)
+test(1577.2, levels(X$b), "is")
+X = fread("a|b|c|d
+          this|NA|row|1
+          this|NA|row|2
+          this|NA|NA|3
+          this|NA|row|4", colClasses="character", stringsAsFactors = TRUE)
+test(1577.3, levels(X$b), character(0))
+
+# FR #530, skip blank lines
+input = "a,b\n\n1,3\n2,4"
+test(1578.1, fread(input), data.table(V1=1:2, V2=3:4))
+test(1578.2, fread(input, blank.lines.skip=TRUE), data.table( a=1:2,  b=3:4))
+input = "a,b\n\n\n1,3\n2,4"
+test(1578.3, fread(input, blank.lines.skip=TRUE), data.table( a=1:2,  b=3:4))
+input = "a,b\n\n\n1,3\n\n2,4\n\n"
+test(1578.4, fread(input, blank.lines.skip=TRUE), data.table( a=1:2,  b=3:4))
+
+test(1578.5, fread("530_fread.txt", skip=47L), data.table(V1=1:2, V2=3:4))
+test(1578.6, fread("530_fread.txt", skip=47L, blank.lines.skip=TRUE), data.table(a=1:2, b=3:4))
+
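The blank.lines.skip=TRUE behaviour tested above amounts to dropping blank lines before parsing; a base-R sketch (illustrative only; fread's C parser does this internally):

```r
input <- "a,b\n\n\n1,3\n\n2,4\n\n"
lines <- strsplit(input, "\n", fixed = TRUE)[[1]]
kept  <- lines[nzchar(trimws(lines))]        # drop blank/whitespace-only lines
read.csv(text = paste(kept, collapse = "\n"))
#   a b
# 1 1 3
# 2 2 4
```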
+# gforce optimisations
+dt = data.table(x  = sample(letters, 300, TRUE), 
+                i1 = sample(-10:10, 300, TRUE),
+                i2 = sample(c(-10:10, NA), 300, TRUE),
+                d1 = as.numeric(sample(-10:10, 300, TRUE)),
+                d2 = as.numeric(sample(c(NA, NaN, -10:10), 300, TRUE)))
+if ('package:bit64' %in% search()) {
+    dt[, `:=`(d3 = as.integer64(sample(-10:10, 300, TRUE)))]
+    dt[, `:=`(d4 = as.integer64(sample(c(-10:10,NA), 300, TRUE)))]
+}
+
+# make sure gforce is on
+optim = getOption("datatable.optimize")
+options(datatable.optimize=2L)
+
+# testing gforce::gmedian
+test(1579.1, dt[, lapply(.SD, median), by=x], 
+             dt[, lapply(.SD, function(x) median(as.numeric(x))), by=x])
+test(1579.2, dt[, lapply(.SD, median, na.rm=TRUE), by=x], 
+             dt[, lapply(.SD, function(x) median(as.numeric(x), na.rm=TRUE)), by=x])
+test(1579.3, dt[, lapply(.SD, median), keyby=x], 
+             dt[, lapply(.SD, function(x) median(as.numeric(x))), keyby=x])
+test(1579.4, dt[, lapply(.SD, median, na.rm=TRUE), keyby=x], 
+             dt[, lapply(.SD, function(x) median(as.numeric(x), na.rm=TRUE)), keyby=x])
+ans = capture.output(dt[, lapply(.SD, median), by=x, verbose=TRUE])
+test(1579.5, any(grepl("GForce optimized", ans)), TRUE)
+
+# testing gforce::ghead and gforce::gtail 
+# head(.SD, 1) and tail(.SD, 1) optimisation
+test(1579.6,  dt[, head(.SD,1),  by=x],    dt[, utils::head(.SD,1),  by=x])
+test(1579.7,  dt[, head(.SD,1),  by=x],    dt[, utils::head(.SD,1),  by=x])
+test(1579.8,  dt[, head(.SD,1),  keyby=x], dt[, utils::head(.SD,1),  keyby=x])
+test(1579.9,  dt[, head(.SD,1),  keyby=x], dt[, utils::head(.SD,1),  keyby=x])
+test(1579.10, dt[, head(.SD,1L), by=x],    dt[, utils::head(.SD,1L), by=x])
+test(1579.11, dt[, head(.SD,1L), by=x],    dt[, utils::head(.SD,1L), by=x])
+test(1579.12, dt[, head(.SD,1L), keyby=x], dt[, utils::head(.SD,1L), keyby=x])
+test(1579.13, dt[, head(.SD,1L), keyby=x], dt[, utils::head(.SD,1L), keyby=x])
+
+test(1579.14, dt[, tail(.SD,1),  by=x],    dt[, utils::tail(.SD,1),  by=x])
+test(1579.15, dt[, tail(.SD,1),  by=x],    dt[, utils::tail(.SD,1),  by=x])
+test(1579.16, dt[, tail(.SD,1),  keyby=x], dt[, utils::tail(.SD,1),  keyby=x])
+test(1579.17, dt[, tail(.SD,1),  keyby=x], dt[, utils::tail(.SD,1),  keyby=x])
+test(1579.18, dt[, tail(.SD,1L), by=x],    dt[, utils::tail(.SD,1L), by=x])
+test(1579.19, dt[, tail(.SD,1L), by=x],    dt[, utils::tail(.SD,1L), by=x])
+test(1579.20, dt[, tail(.SD,1L), keyby=x], dt[, utils::tail(.SD,1L), keyby=x])
+test(1579.21, dt[, tail(.SD,1L), keyby=x], dt[, utils::tail(.SD,1L), keyby=x])
+
+mysub <- function(x, n) x[n]
+test(1579.22, dt[, .SD[2],  by=x],    dt[, mysub(.SD,2),  by=x])
+test(1579.23, dt[, .SD[2],  by=x],    dt[, mysub(.SD,2),  by=x])
+test(1579.24, dt[, .SD[2],  keyby=x], dt[, mysub(.SD,2),  keyby=x])
+test(1579.25, dt[, .SD[2],  keyby=x], dt[, mysub(.SD,2),  keyby=x])
+test(1579.26, dt[, .SD[2L], by=x],    dt[, mysub(.SD,2L), by=x])
+test(1579.27, dt[, .SD[2L], by=x],    dt[, mysub(.SD,2L), by=x])
+test(1579.28, dt[, .SD[2L], keyby=x], dt[, mysub(.SD,2L), keyby=x])
+test(1579.29, dt[, .SD[2L], keyby=x], dt[, mysub(.SD,2L), keyby=x])
+
+ans = capture.output(dt[, .SD[2], by=x, verbose=TRUE])
+test(1579.30, any(grepl("GForce optimized", ans)), TRUE)
+
+options(datatable.optimize=optim)
+
+# test for #1419, rleid doesn't remove names attribute
+x = c("a"=TRUE, "b"=FALSE)
+nx = copy(names(x))
+r = rleid(x)
+test(1580, nx, names(x))
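rleid() gives consecutive runs of equal values the same group id; for NA-free vectors a base-R sketch is (illustrative only):

```r
x <- c(TRUE, TRUE, FALSE, TRUE)
# start a new id wherever the value changes from the previous element
cumsum(c(TRUE, x[-1] != x[-length(x)]))  # 1 1 2 3, like rleid(x)
```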
+
+# FR #971, partly addressed (only subsets in 'i')
+# make sure GForce kicks in and the results are identical
+dt = dt[, .(x, d1, d2)]
+old = options(datatable.optimize=1L)
+
+test(1581.1, ans1 <- dt[x %in% letters[15:20], 
+                        c(.N, lapply(.SD, sum, na.rm=TRUE), 
+                              lapply(.SD, min, na.rm=TRUE), 
+                              lapply(.SD, max, na.rm=TRUE), 
+                              lapply(.SD, mean, na.rm=TRUE), 
+                              lapply(.SD, median, na.rm=TRUE)
+                          ), by=x, verbose=TRUE],
+             output = "(GForce FALSE)")
+options(datatable.optimize=2L)
+test(1581.2, ans2 <- dt[x %in% letters[15:20], 
+                        c(.N, lapply(.SD, sum, na.rm=TRUE), 
+                              lapply(.SD, min, na.rm=TRUE), 
+                              lapply(.SD, max, na.rm=TRUE), 
+                              lapply(.SD, mean, na.rm=TRUE), 
+                              lapply(.SD, median, na.rm=TRUE)
+                          ), by=x, verbose=TRUE],
+             output = "GForce optimized j")
+test(1581.3, ans1, ans2)
+
+# subsets in 'i' for head and tail
+options(datatable.optimize=1L)
+test(1581.4, ans1 <- dt[x %in% letters[15:20], head(.SD,1), by=x, verbose=TRUE],
+             output = "(GForce FALSE)")
+options(datatable.optimize=2L)
+test(1581.5, ans2 <- dt[x %in% letters[15:20], head(.SD,1), by=x, verbose=TRUE],
+             output = "GForce optimized j")
+test(1581.6, ans1, ans2)
+
+options(datatable.optimize=1L)
+test(1581.7, ans1 <- dt[x %in% letters[15:20], tail(.SD,1), by=x, verbose=TRUE],
+             output = "(GForce FALSE)")
+options(datatable.optimize=2L)
+test(1581.8, ans2 <- dt[x %in% letters[15:20], tail(.SD,1), by=x, verbose=TRUE],
+             output = "GForce optimized j")
+test(1581.9, ans1, ans2)
+
+options(datatable.optimize=1L)
+test(1581.10, ans1 <- dt[x %in% letters[15:20], .SD[2], by=x, verbose=TRUE],
+              output = "(GForce FALSE)")
+options(datatable.optimize=2L)
+test(1581.11, ans2 <- dt[x %in% letters[15:20], .SD[2], by=x, verbose=TRUE],
+              output = "GForce optimized j")
+test(1581.12, ans1, ans2)
+
+options(old)
+
+# handle NULL value correctly #1429
+test(1582, uniqueN(NULL), 0L)
+
+# bug fix #1461
+dt = data.table(x=c(1,1,1,2,2,2,3,3,3,4,4,4,5), y=c(NaN,1,2, 2,NaN,1, NA,NaN,2, NaN,NA,NaN, NaN))
+optim = getOption("datatable.optimize")
+# make sure gforce is on
+options(datatable.optimize=Inf)
+ans1 = suppressWarnings(dt[, base::min(y, na.rm=TRUE), by=x])
+ans2 = suppressWarnings(dt[, base::max(y, na.rm=TRUE), by=x])
+test(1583.1, dt[, min(y, na.rm=TRUE), by=x], ans1, warning="No non-missing values found")
+test(1583.2, dt[, max(y, na.rm=TRUE), by=x], ans2, warning="No non-missing values found")
+ans3 = suppressWarnings(dt[, base::min(y), by=x])
+ans4 = suppressWarnings(dt[, base::max(y), by=x])
+test(1583.3, dt[, min(y), by=x], ans3)
+test(1583.4, dt[, max(y), by=x], ans4)
+# restore optimisation
+options(datatable.optimize=optim)
+
+# Fixed a minor bug in fread when blank.lines.skip=TRUE
+f1 <- function(x, f=TRUE, b=FALSE) fread(x, fill=f, blank.lines.skip=b, data.table=FALSE)
+f2 <- function(x, f=TRUE, b=FALSE) read.table(x, fill=f, blank.lines.skip=b, sep=",", header=TRUE, stringsAsFactors=FALSE)
+test(1584.1, f1("fread_blank.txt", f=FALSE, b=TRUE), f2("fread_blank.txt", f=FALSE, b=TRUE))
+test(1584.2, f1("fread_blank2.txt", f=FALSE, b=TRUE), f2("fread_blank2.txt", f=FALSE, b=TRUE))
+test(1584.3, f1("fread_blank3.txt", f=FALSE, b=TRUE), f2("fread_blank3.txt", f=FALSE, b=TRUE))
+
+# fread fill=TRUE, #536. Also takes care of #1124.
+test(1585.1, f1("536_fread_fill_1.txt"), f2("536_fread_fill_1.txt"))
+test(1585.2, f1("536_fread_fill_1.txt", b=TRUE), f2("536_fread_fill_1.txt", b=TRUE))
+
+test(1585.3, f1("536_fread_fill_2.txt"), f2("536_fread_fill_2.txt"))
+test(1585.4, f1("536_fread_fill_2.txt", b=TRUE), f2("536_fread_fill_2.txt", b=TRUE))
+
+test(1585.5, f1("536_fread_fill_3_extreme.txt"), f2("536_fread_fill_3_extreme.txt"))
+test(1585.6, f1("536_fread_fill_3_extreme.txt", b=TRUE), f2("536_fread_fill_3_extreme.txt", b=TRUE))
+# No warning about bumping type: when fill=TRUE, column type detection starts at the first non-empty line (which makes sense).
+test(1585.7, f1("536_fread_fill_4.txt"), f2("536_fread_fill_4.txt"))
+test(1585.8, f1("536_fread_fill_4.txt", b=TRUE), f2("536_fread_fill_4.txt", b=TRUE))
+
+# TODO: add a test when fill=FALSE, but blank.lines.skip=TRUE, when the same effect should happen
+# TODO: fix and add test for cases like this:
+# a,b,c
+# 1,2,3
+# 4,5,6
+# 7,8,9,6 # extra column, but we've only detected 3 cols
+# 1,2,3
+# ...
+
+# fix for #721
+text="x,y\n1,a\n2,b\n"
+test(1586.1, fread(text, colClasses=c("integer", "factor")), data.table(x=1:2, y=factor(letters[1:2])))
+test(1586.2, fread(text, colClasses=c(x="factor")), data.table(x=factor(1:2), y=letters[1:2]))
+
+# FR #590
+text="x,y\n2,a\n1,q\n3,c\n"
+test(1587, fread(text, key="y"), setDT(fread(text), key="y"))
+
+# fix for #1361
+dt = data.table(i=1:10, f=as.factor(1:10))
+test(1588.1, dt[f %in% 3:4], dt[3:4])
+test(1588.2, dt[f == 3], dt[3])
+
+# fix for #1484
+if ("package:xts" %in% search()) {
+    x = xts::as.xts(8, order.by = as.Date("2016-01-03"))
+    test(1589, all.equal(as.data.table(x), data.table(index = as.Date("2016-01-03"), V1 = 8), check.attributes=FALSE))
+}
+
+# encoding issue in forder
+x <- "fa\xE7ile"
+Encoding(x)
+Encoding(x) <- "latin1"
+xx <- iconv(x, "latin1", "UTF-8")
+y = sample(c(x,xx), 10, TRUE)
+oy = if (length(oy <- forderv(y))) oy else seq_along(y)
+test(1590.4, oy, order(y))
+Encoding(xx) = "unknown"
+y = sample(c(x,xx), 10, TRUE)
+oy = if (length(oy <- forderv(y))) oy else seq_along(y)
+test(1590.5, oy, order(y))
+
+# #1432 test
+list_1 = list(a = c(44,47), dens = c(2331,1644))
+list_2 = list(a=66, dens= 1890)
+list_3 = list(a=c(44,46,48,50), dens=c(8000,1452,1596,7521)) 
+mylist = list(list_1, list_2, list_3)
+setattr(mylist, 'names', c("ID_1","ID_2","ID_3"))
+ans = data.table(id=rep(c("ID_1","ID_2","ID_3"), c(2,1,4)), 
+                  a=c(44,47,66,44,46,48,50), 
+               dens=c(2331,1644,1890,8000,1452,1596,7521))
+test(1591, rbindlist(mylist, idcol="id"), ans)
+
+# FR #1443
+DT <- data.table(x = 1:3, y = 4:6, z = 7:9)
+test(1592.1, setnames(DT, -5, "bla"), error="Items of 'old'")
+test(1592.2, names(setnames(DT, -1, c("m", "n"))), c("x", "m", "n"))
+
+# fix for #1513
+test(1593, CJ(c(1,2,2), c(1,2,3)), data.table(V1=rep(c(1,2), c(3,6)), V2=c(1,2,3,1,1,2,2,3,3), key=c("V1", "V2")))
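CJ() builds the sorted cross product of its inputs, keeping duplicates (as test 1593 checks); a base-R sketch via expand.grid (illustrative only):

```r
v1 <- c(1, 2, 2); v2 <- c(1, 2, 3)
g <- expand.grid(V2 = v2, V1 = v1)[, c("V1", "V2")]  # all 9 combinations
g[order(g$V1, g$V2), ]                               # sorted like CJ's key
```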
+
+# FR #523, var, sd and prod
+options(datatable.optimize = Inf) # ensure gforce is on
+DT = data.table(x=sample(5, 100, TRUE), 
+               y1=sample(6, 100, TRUE), 
+               y2=sample(c(1:10,NA), 100, TRUE), 
+               z1=runif(100), 
+               z2=sample(c(runif(10),NA,NaN), 100, TRUE))
+test(1594.1, DT[, lapply(.SD, var, na.rm=FALSE), by=x], DT[, lapply(.SD, stats::var, na.rm=FALSE), by=x])
+test(1594.2, DT[, lapply(.SD, var, na.rm=TRUE), by=x], DT[, lapply(.SD, stats::var, na.rm=TRUE), by=x])
+test(1594.3, DT[, lapply(.SD, var, na.rm=TRUE), by=x, verbose=TRUE], output="GForce optimized j to.*gvar")
+
+test(1594.4, DT[, lapply(.SD, sd, na.rm=FALSE), by=x], DT[, lapply(.SD, stats::sd, na.rm=FALSE), by=x])
+test(1594.5, DT[, lapply(.SD, sd, na.rm=TRUE), by=x], DT[, lapply(.SD, stats::sd, na.rm=TRUE), by=x])
+test(1594.6, DT[, lapply(.SD, sd, na.rm=TRUE), by=x, verbose=TRUE], output="GForce optimized j to.*gsd")
+
+test(1594.7, DT[, lapply(.SD, prod, na.rm=FALSE), by=x], DT[, lapply(.SD, base::prod, na.rm=FALSE), by=x])
+test(1594.8, DT[, lapply(.SD, prod, na.rm=TRUE), by=x], DT[, lapply(.SD, base::prod, na.rm=TRUE), by=x])
+test(1594.9, DT[, lapply(.SD, prod, na.rm=TRUE), by=x, verbose=TRUE], output="GForce optimized j to.*gprod")
+
+# FR #1517
+dt1 = data.table(x=c(1,1,2), y=1:3)
+dt2 = data.table(x=c(2,3,4), z=4:6)
+test(1595, merge(dt1,dt2), merge(dt1,dt2, by="x"))
+
+# FR 1512, drop argument for dcast.data.table
+DT <- data.table(v1 = c(1.1, 1.1, 1.1, 2.2, 2.2, 2.2),
+                 v2 = factor(c(1L, 1L, 1L, 3L, 3L, 3L), levels=1:3), 
+                 v3 = factor(c(2L, 3L, 5L, 1L, 2L, 6L), levels=1:6), 
+                 v4 = c(3L, 2L, 2L, 5L, 4L, 3L)) 
+ans1 <- dcast(DT, v1+v2~v3, value.var="v4", drop=FALSE)
+test(1596.1, dcast(DT, v1+v2~v3, value.var="v4", drop=c(FALSE, TRUE)), ans1[, -6, with=FALSE])
+test(1596.2, dcast(DT, v1+v2~v3, value.var="v4", drop=c(TRUE, FALSE)), ans1[c(1,6)])
+
+# bug fix #1495
+dt  = data.table(id=1:30, nn = paste0('A', 1:30))
+smp = sample(30, size =10)
+lgl = dt$id %in% smp
+test(1597, dt[lgl, ], dt[id %in% smp])
+
+# FR #643
+vv = sample(letters[1:3], 10, TRUE)
+test(1599.1, data.table(x=vv, y=1:10, stringsAsFactors=TRUE)$x, factor(vv))
+vv = sample(c(letters[1:3], NA), 10, TRUE)
+test(1599.2, data.table(x=vv, y=1:10, stringsAsFactors=TRUE)$x, factor(vv))
+
+# bug #1477 fix
+DT <- data.table(a = 0L:1L, b = c(1L, 1L))
+test(1600.1, DT[ , lapply(.SD, function(x) if (all(x)) x)], data.table(b=c(1L, 1L)))
+# this fix wasn't entirely nice as it introduced another issue.
+# it's fixed now, but adding a test for that issue as well to catch it early next time.
+set.seed(17022016L)
+DT1 = data.table(id1 = c("c", "a", "b", "b", "b", "c"), 
+                  z1 = sample(100L, 6L), 
+                  z2 = sample(letters, 6L))
+DT2 = data.table(id1=c("c", "w", "b"), val=50:52)
+test(1600.2, names(DT1[DT2, .(id1=id1, val=val, bla=sum(z1, na.rm=TRUE)), on="id1"]), c("id1", "val", "bla"))
+
+# warn when merge empty data.table #597
+test(1601.1, merge(data.table(a=1),data.table(a=1), by="a"), data.table(a=1, key="a"))
+test(1601.2, tryCatch(merge(data.table(a=1),data.table(NULL), by="a"), warning = function(w) w$message), "You are trying to join data.tables where 'y' argument is 0 columns data.table.")
+test(1601.3, tryCatch(merge(data.table(NULL),data.table(a=1), by="a"), warning = function(w) w$message), "You are trying to join data.tables where 'x' argument is 0 columns data.table.")
+test(1601.4, tryCatch(merge(data.table(NULL),data.table(NULL), by="a"), warning = function(w) w$message), "You are trying to join data.tables where 'x' and 'y' arguments are 0 columns data.table.")
+
+# migrate `chron` dependency to Suggests #1558
+dd = as.IDate("2016-02-28")
+tt = as.ITime("03:04:43")
+if(!requireNamespace("chron", quietly = TRUE)){
+    # Since chron is a recommended package and installed by default, it's probably ok to use requireNamespace
+    test(1602.1, as.chron.IDate(dd), error = "Install suggested `chron` package to use `as.chron.IDate` function.")
+    test(1602.2, as.chron.ITime(tt), error = "Install suggested `chron` package to use `as.chron.ITime` function.")
+} else {
+    test(1602.3, as.chron.IDate(dd), chron::as.chron(as.Date(dd)))
+    test(1602.4, class(as.chron.ITime(tt)), "times")
+}
+
+# fix for #1549
+d1 <- data.table(v1=1:2,x=x)
+d2 <- data.table(v1=3:4)
+test(1603.1, rbindlist(list(d2, d1), fill=TRUE), rbindlist(list(d1,d2), fill=TRUE)[c(3:4, 1:2)])
+
+# fix for #1440
+DT = data.table(a=1:3, b=4:6)
+myCol = "b"
+test(1604, DT[,.(myCol),with=F], error="When with=FALSE,")
+
+# fix for segfault #1531
+DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
+test(1605, DT[order(-x, "D")], error="Column 2 is length 1 which differs")
+
+# fix for #1503, fread's fill argument polishing
+test(1606, fread("2\n1,a,b", fill=TRUE), data.table(V1=2:1, V2=c("","a"), V3=c("","b")))
+
+# fix for #1476
+dt = data.table(resp=c(1:5))
+wide = copy(list(metrics = dt))$metrics # copy here copies the list of data.tables and therefore didn't over-allocate before the fix
+test(1607, wide[, id := .I], data.table(resp = 1:5, id = 1:5))
+
+# better fix for #1462, + improved error message (if this better fix fails)
+# no need for quote="" and sep="\t"..
+test(1608, dim(fread('issue_1462_fread_quotes.txt', header=FALSE)), c(4L, 224L))
+
+# fix for #1164
+test(1609, fread("issue_1164_json.txt"), data.table(json1='{""f1"":""value1"",""f2"":""double quote escaped with a backslash [ \\"" ]""}', string1="string field"))
+
+# set of enhancements to print.data.table for #1523
+# dplyr-like column summary
+icol = 1L:3L
+Dcol = as.Date(paste0("2016-01-0", 1:3))
+DT1  = data.table(lcol = list(list(1:3), list(1:3), list(1:3)),
+                  icol, ncol = as.numeric(icol), ccol = c("a", "b", "c"), 
+                  xcol = as.complex(icol), ocol = factor(icol, ordered = TRUE),
+                  fcol = factor(icol))
+test(1610.1, capture.output(print(DT1, class=TRUE)),
+     c("     lcol  icol  ncol   ccol   xcol  ocol   fcol",
+       "   <list> <int> <num> <char> <cplx> <ord> <fctr>", 
+       "1: <list>     1     1      a   1+0i     1      1", 
+       "2: <list>     2     2      b   2+0i     2      2", 
+       "3: <list>     3     3      c   3+0i     3      3"))
+# Fails on Travis and AppVeyor; no idea why. Passes on my Mac and Windows machines.
+# test(1610.2, capture.output(print(DT2, class=TRUE))
+#      c("         Dcol                Pcol   gcol       Icol   ucol", 
+#        "       <Date>              <POSc> <lgcl>     <IDat> <asdf>", 
+#        "1: 2016-01-01 2016-01-01 01:00:00   TRUE 2016-01-01      1", 
+#        "2: 2016-01-02 2016-01-02 01:00:00   TRUE 2016-01-02      2", 
+#        "3: 2016-01-03 2016-01-03 01:00:00   TRUE 2016-01-03      3"))
+
+# fix for #833
+l1 = list(a=seq_len(5), matrix(seq_len(25),ncol = 5, nrow = 5))
+l2 = list(seq_len(5), matrix(seq_len(25),ncol = 5, nrow = 5))
+test(1611.1, as.data.table(l1), setnames(setDT(as.data.frame(l1)), c("a", paste("V", 1:5, sep=""))))
+test(1611.2, as.data.table(l2), setnames(setDT(as.data.frame(l2)), c("V1", "V1.1", paste("V", 2:5, sep=""))))
+
+# fix for #646
+# tz= is explicitly specified otherwise CRAN's Solaris machines (both sparc and x86) fail. It may not be Solaris per se
+# but something related to the timezone of the two Solaris machines. I guess one or the other of as.POSIXct or
+# as.POSIXlt creates the 'tzone' attribute differently for default tz="", just on Solaris. I checked that test.data.table
+# already uses all.equal(), not identical(), so I don't think it is an accuracy problem. But I could be wrong.
+ll = list(a=as.POSIXlt("2015-01-01", tz='UTC'), b=1:5)
+test(1612.1, as.data.table(ll), data.table(a=as.POSIXct("2015-01-01", tz='UTC'), b=1:5), warning="POSIXlt column type detected")
+dt = data.table(d1="1984-03-17")
+ans = data.table(d1="1984-03-17", d2=as.POSIXct("1984-03-17", tz='UTC'))
+test(1612.2, dt[, d2 := strptime(d1, "%Y-%m-%d", tz='UTC')], ans, warning="POSIXlt column type detected and converted")
+ll = list(a=as.POSIXlt("2015-01-01"), b=2L)
+test(1612.3, setDT(ll), error="Column 1 is of POSIXlt type")
+
+# tests for all.equal.data.table #1106
+# diff nrow
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(a = c(1:4,4L), b = letters[c(1:4,4L)])
+test(1613.1, all.equal(DT1, DT2), "Different number of rows")
+# diff ncol
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(a = 1:4)
+test(1613.2, all.equal(DT1, DT2), c("Different number of columns", "Different column names"))
+# diff colnames
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(aa = 1:4, bb = letters[1:4])
+test(1613.3, all.equal(DT1, DT2), "Different column names")
+# diff column order
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(b = letters[1:4], a = 1:4)
+test(1613.4, all.equal(DT1, DT2), "Different column order")
+test(1613.5, all.equal(DT1, DT2, ignore.col.order=TRUE), TRUE)
+# diff row order
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(a = 4:1, b = letters[4:1])
+test(1613.6, all.equal(DT1, DT2), "Column 'a': Mean relative difference: 0.8")
+test(1613.7, all.equal(DT1, DT2, ignore.row.order=TRUE), TRUE)
+# diff column order and diff row order
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(b = letters[4:1], a = 4:1)
+test(1613.8, all.equal(DT1, DT2), "Different column order")
+test(1613.9, all.equal(DT1, DT2, ignore.row.order=TRUE), "Different column order")
+test(1613.10, all.equal(DT1, DT2, ignore.col.order=TRUE), "Column 'a': Mean relative difference: 0.8")
+test(1613.11, all.equal(DT1, DT2, ignore.row.order=TRUE, ignore.col.order=TRUE), TRUE)
+# non-overlapping duplicates
+DT1 <- data.table(a = c(1:4,1:2), b = letters[c(1:4,1:2)])
+DT2 <- data.table(a = c(1:4,3:4), b = letters[c(1:4,3:4)])
+test(1613.12, all.equal(DT1, DT2), "Column 'a': Mean relative difference: 1.333333")
+test(1613.13, all.equal(DT1, DT2, ignore.row.order=TRUE), "Dataset 'current' has rows not present in 'target' or present in different quantity")
+# overlapping duplicates
+DT1 <- data.table(a = c(1:4,1:2), b = letters[c(1:4,1:2)])
+DT2 <- data.table(a = c(1:4,2:1), b = letters[c(1:4,2:1)])
+test(1613.14, all.equal(DT1, DT2), "Column 'a': Mean relative difference: 0.6666667")
+test(1613.15, all.equal(DT1, DT2, ignore.row.order=TRUE), TRUE)
+# mixed overlapping duplicates
+DT1 <- data.table(a = c(1:4,1:2), b = letters[c(1:4,1:2)])
+DT2 <- data.table(a = c(1:4,2:3), b = letters[c(1:4,2:3)])
+test(1613.16, all.equal(DT1, DT2, ignore.row.order = TRUE), "Dataset 'current' has rows not present in 'target' or present in different quantity")
+# overlapping duplicates not equal in count
+DT1 <- data.table(a = c(1:4, rep(1L,3), rep(2L,2)), b = letters[c(1:4, rep(1L,3), rep(2L,2))])
+DT2 <- data.table(a = c(1:4, rep(1L,2), rep(2L,3)), b = letters[c(1:4, rep(1L,2), rep(2L,3))])
+test(1613.17, all.equal(DT1, DT2, ignore.row.order = TRUE), "Dataset 'current' has rows not present in 'target' or present in different quantity")
+# overlapping duplicates equal in count
+DT1 <- data.table(a = c(1:4, 1L, 2L, 1L, 2L), b = letters[c(1:4, 1L, 2L, 1L, 2L)])
+DT2 <- data.table(a = c(2L, 1L, 1L, 2L, 1:4), b = letters[c(2L, 1L, 1L, 2L, 1:4)])
+test(1613.18, all.equal(DT1, DT2, ignore.row.order = TRUE), TRUE)
+# subset with overlapping duplicates
+DT1 <- data.table(a = c(1:3,3L), b = letters[c(1:3,3L)])
+DT2 <- data.table(a = c(1:4), b = letters[c(1:4)])
+test(1613.19, all.equal(DT1, DT2, ignore.row.order = TRUE), "Dataset 'target' has duplicate rows while 'current' doesn't")
+# different number of unique rows
+DT1 <- data.table(a = c(1:3,2:3), b = letters[c(1:3,2:3)])
+DT2 <- data.table(a = c(1L,1:4), b = letters[c(1L,1:4)])
+test(1613.20, all.equal(DT1, DT2, ignore.row.order = TRUE), "Dataset 'current' has rows not present in 'target' or present in different quantity")
+test(1613.21, all.equal(DT2, DT1, ignore.row.order = TRUE), "Dataset 'current' has rows not present in 'target' or present in different quantity")
+# test attributes: key
+DT1 <- data.table(a = 1:4, b = letters[1:4], key = "a")
+DT2 <- data.table(a = 1:4, b = letters[1:4])
+test(1613.22, all.equal(DT1, DT2), "Datasets has different keys. 'target': a. 'current' has no key.")
+test(1613.23, all.equal(DT1, DT2, check.attributes = FALSE), TRUE)
+test(1613.24, all.equal(DT1, setkeyv(DT2, "a"), check.attributes = TRUE), TRUE)
+# test attributes: index
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(a = 1:4, b = letters[1:4])
+setindexv(DT1, "b")
+test(1613.25, all.equal(DT1, DT2), "Datasets has different indexes. 'target': b. 'current' has no index.")
+test(1613.26, all.equal(DT1, DT2, check.attributes = FALSE), TRUE)
+test(1613.27, all.equal(DT1, setindexv(DT2, "a")), "Datasets has different indexes. 'target': b. 'current': a.")
+test(1613.28, all.equal(DT1, setindexv(DT2, "b")), "Datasets has different indexes. 'target': b. 'current': a, b.")
+test(1613.29, all.equal(DT1, setindexv(setindexv(DT2, NULL), "b")), TRUE)
+# test custom attribute
+DT1 <- data.table(a = 1:4, b = letters[1:4])
+DT2 <- data.table(a = 1:4, b = letters[1:4])
+setattr(DT1, "custom", 1L)
+test(1613.30, all.equal(DT1, DT2), "Datasets has different number of (non-excluded) attributes: target 3, current 2")
+test(1613.31, all.equal(DT1, DT2, check.attributes = FALSE), TRUE)
+setattr(DT2, "custom2", 2L)
+test(1613.32, all.equal(DT1, DT2), "Datasets has attributes with different names: custom, custom2")
+setattr(DT1, "custom2", 2L)
+setattr(DT2, "custom", 0L)
+if (base::getRversion() > "3.0.0") test(1613.33, all.equal(DT1, DT2), paste0("Attributes: < Component ", dQuote("custom"), ": Mean relative difference: 1 >"))
+setattr(DT2, "custom", 1L)
+test(1613.34, all.equal(DT1, DT2), TRUE)
+# trim.levels
+dt1 <- data.table(A = factor(letters[1:10])[1:4]) # 10 levels
+dt2 <- data.table(A = factor(letters[1:5])[1:4]) # 5 levels
+test(1613.35, all.equal(dt1, dt2))
+test(1613.36, !isTRUE(all.equal(dt1, dt2, trim.levels = FALSE)))
+test(1613.37, !isTRUE(all.equal(dt1, dt2, trim.levels = FALSE, check.attributes = FALSE)))
+test(1613.38, all.equal(dt1, dt2, trim.levels = FALSE, ignore.row.order = TRUE))
+test(1613.39, length(levels(dt1$A)) == 10L && length(levels(dt2$A)) == 5L, TRUE) # dt1 and dt2 not updated by reference
+# unsupported column types: list
+dt = data.table(V1 = 1:4, V2 = letters[1:4], V3 = lapply(1:4, function(x) new.env()))
+test(1613.40, all.equal(dt, dt), TRUE)
+test(1613.41, all.equal(dt, dt, ignore.row.order = TRUE), error = "Datasets to compare with 'ignore.row.order' must not have unsupported column types: list")
+# unsupported type in set-ops: complex, raw
+dt = data.table(V1 = 1:4, V2 = letters[1:4], V3 = as.complex(1:4), V4 = as.raw(1:4), V5 = lapply(1:4, function(x) NULL))
+test(1613.42, all.equal(dt, dt), TRUE)
+test(1613.43, all.equal(dt, dt, ignore.row.order = TRUE), error = "Datasets to compare with 'ignore.row.order' must not have unsupported column types: raw, complex, list")
+# supported types multi column test
+dt = data.table(
+    V1 = 1:4,
+    V2 = as.numeric(1:4),
+    V3 = letters[rep(1:2, 2)],
+    V4 = factor(c("a","a","b","b")),
+    V5 = as.POSIXct("2016-03-05 12:00:00", origin="1970-01-01")+(1:4)*3600,
+    V6 = as.Date("2016-03-05", origin="1970-01-01")+(1:4)
+)[, V7 := as.IDate(V6)
+  ][, V8 := as.ITime(V5)]
+test(1613.441, all.equal(dt, dt), TRUE)
+test(1613.442, all.equal(dt, dt, ignore.row.order = TRUE), TRUE)
+test(1613.443, all.equal(dt[c(1:4,1L)], dt[c(1:4,1L)]), TRUE)
+test(1613.444, all.equal(dt[c(1:4,1L)], dt[c(1L,1:4)]), "Column 'V1': Mean relative difference: 0.6")
+test(1613.445, all.equal(dt[c(1:4,1L)], dt[c(1L,1:4)], ignore.row.order = TRUE), TRUE)
+test(1613.45, all.equal(dt[c(1:4,1:2)], dt[c(1L,1L,1:4)], ignore.row.order = TRUE), c("Both datasets have duplicate rows, they also have numeric columns, together with ignore.row.order this force 'tolerance' argument to 0", "Dataset 'current' has rows not present in 'target' or present in different quantity"))
+test(1613.46, all.equal(dt[c(1:2,1:4,1:2)], dt[c(1:2,1:2,1:4)], ignore.row.order = TRUE), TRUE)
+# supported type all.equal: integer64
+if ("package:bit64" %in% search()) {
+    dt = data.table(V1 = 1:4, V2 = letters[1:4], V3 = bit64::as.integer64("90000000000")+1:4)
+    test(1613.47, all.equal(dt, dt), TRUE)
+    test(1613.48, all.equal(dt, dt, ignore.row.order = TRUE), TRUE)
+    test(1613.49, all.equal(dt[c(1:4,1L)], dt[c(1:4,1L)]), TRUE)
+    test(1613.50, all.equal(dt[c(1:4,1L)], dt[c(1L,1:4)]), "Column 'V1': Mean relative difference: 0.6")
+    test(1613.51, all.equal(dt[c(1:4,1L)], dt[c(1L,1:4)], ignore.row.order = TRUE), TRUE)
+    test(1613.52, all.equal(dt[c(1:4,1:2)], dt[c(1L,1L,1:4)], ignore.row.order = TRUE), c("Both datasets have duplicate rows, they also have numeric columns, together with ignore.row.order this force 'tolerance' argument to 0","Dataset 'current' has rows not present in 'target' or present in different quantity"))
+    test(1613.53, all.equal(dt[c(1:2,1:4,1:2)], dt[c(1:2,1:2,1:4)], ignore.row.order = TRUE), TRUE)
+}
+# all.equal - new argument 'tolerance' #1737
+x = data.table(1) # test numeric after adding 'tolerance' argument
+y = data.table(2)
+test(1613.5411, !isTRUE(all.equal(x, y, ignore.row.order = FALSE)))
+test(1613.5412, !isTRUE(all.equal(x, y, ignore.row.order = TRUE)))
+x = data.table(c(1,1))
+y = data.table(c(2,2))
+test(1613.5421, !isTRUE(all.equal(x, y, ignore.row.order = FALSE)))
+test(1613.5422, !isTRUE(all.equal(x, y, ignore.row.order = TRUE)))
+x = data.table(c(1,2))
+y = data.table(c(2,2))
+test(1613.5431, !isTRUE(all.equal(x, y, ignore.row.order = FALSE)))
+test(1613.5432, !isTRUE(all.equal(x, y, ignore.row.order = TRUE)))
+x = data.table(as.factor(1)) # test factor adding 'tolerance' argument
+y = data.table(as.factor(2))
+test(1613.5511, !isTRUE(all.equal(x,y)))
+test(1613.5512, !isTRUE(all.equal(x, y, ignore.row.order = FALSE)))
+test(1613.5513, !isTRUE(all.equal(x, y, ignore.row.order = TRUE)))
+x = data.table(as.factor(c(1,1)))
+y = data.table(as.factor(c(2,2)))
+test(1613.5521, !isTRUE(all.equal(x, y, ignore.row.order = FALSE)))
+test(1613.5522, !isTRUE(all.equal(x, y, ignore.row.order = TRUE)))
+x = data.table(as.factor(c(1,2)))
+y = data.table(as.factor(c(2,2)))
+test(1613.5531, !isTRUE(all.equal(x, y, ignore.row.order = FALSE)))
+test(1613.5532, !isTRUE(all.equal(x, y, ignore.row.order = TRUE)))
+x = data.table(-0.000189921844659375) # tolerance in action
+y = data.table(-0.000189921844655161)
+test(1613.561, all(all.equal(x, y, ignore.row.order = FALSE), all.equal(x, y, ignore.row.order = TRUE)))
+test(1613.562, all(is.character(all.equal(x, y, ignore.row.order = FALSE, tolerance = 0)), is.character(all.equal(x, y, ignore.row.order = TRUE, tolerance = 0))))
+test(1613.563, all(all.equal(rbind(x,y), rbind(y,y), ignore.row.order = FALSE), all.equal(rbind(x,y), rbind(y,y), ignore.row.order = TRUE)))
+test(1613.564, all(is.character(all.equal(rbind(x,y), rbind(y,y), ignore.row.order = FALSE, tolerance = 0)), is.character(all.equal(rbind(x,y), rbind(y,y), ignore.row.order = TRUE, tolerance = 0))))
+test(1613.565, all(all.equal(rbind(x,x,y), rbind(y,y,x), ignore.row.order = FALSE), is.character(r<-all.equal(rbind(x,x,y), rbind(y,y,x), ignore.row.order = TRUE)) && any(grepl("force 'tolerance' argument to 0", r)))) # no match because duplicate rows force tolerance=0
+test(1613.566, all(all.equal(rbind(x,y,y), rbind(x,y,y), ignore.row.order = FALSE, tolerance = 0), all.equal(rbind(x,y,y), rbind(x,y,y), ignore.row.order = TRUE, tolerance = 0)))
+test(1613.567, all(is.character(all.equal(rbind(x,x,y), rbind(y,y,x), ignore.row.order = FALSE, tolerance = 0)), is.character(all.equal(rbind(x,x,y), rbind(y,y,x), ignore.row.order = TRUE, tolerance = 0))))
+test(1613.571, all(all.equal(cbind(x, list(factor(1))), cbind(y, list(factor(1))), ignore.row.order = FALSE), is.character(r<-all.equal(cbind(x, list(factor(1))), cbind(y, list(factor(1))), ignore.row.order = TRUE)) && any(grepl("force 'tolerance' argument to 0", r)))) # no match because the factor column forces tolerance=0
+test(1613.572, all(all.equal(cbind(x, list(factor(1))), cbind(x, list(factor(1))), ignore.row.order = FALSE), all.equal(cbind(x, list(factor(1))), cbind(x, list(factor(1))), ignore.row.order = TRUE))) # x to x with factor equality
+test(1613.573, all.equal(cbind(x, list(factor(1))), cbind(x, list(factor(1))), ignore.row.order = TRUE, tolerance = 1), error = "Factor columns and ignore.row.order cannot be used with non 0 tolerance argument") # error due to provided non zero tolerance
+test(1613.581, all(all.equal(x, y, ignore.row.order = FALSE, tolerance = 1), all.equal(x, y, ignore.row.order = TRUE, tolerance = 1)))
+test(1613.582, all(all.equal(x, y, ignore.row.order = FALSE, tolerance = sqrt(.Machine$double.eps)/2), all.equal(x, y, ignore.row.order = TRUE, tolerance = sqrt(.Machine$double.eps)/2)), warning = "Argument 'tolerance' was forced")
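+# Illustrative sketch (not a test; 'u' and 'w' are hypothetical): with duplicate rows and
+# numeric or factor columns, ignore.row.order must match rows exactly, so 'tolerance' is
+# forced to 0, as exercised above, e.g.
+#   u = data.table(v = 1); w = data.table(v = 1 + 1e-10)
+#   all.equal(u, w, ignore.row.order = TRUE)                     # within default tolerance
+#   all.equal(rbind(u, u), rbind(w, w), ignore.row.order = TRUE) # duplicates force tolerance=0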
+
+if ("package:bit64" %in% search()) {
+    # fix for #1405, handles roll with -ve int64 values properly
+    dt = data.table(x=as.integer64(c(-1000, 0)), y=c(5,10))
+    val = c(-1100,-900,100)
+    ans = data.table(x=as.integer64(val))
+    test(1614.1, dt[.(val), roll=Inf,  on="x"], ans[, y:=c(NA,5,10)])
+    test(1614.2, dt[.(val), roll=Inf, on="x", rollends=TRUE], ans[, y:=c(5,5,10)])
+    test(1614.3, dt[.(val), roll=-Inf, on="x"], ans[, y:=c(5,10,NA)])
+    test(1614.4, dt[.(val), roll=-Inf, on="x", rollends=TRUE], ans[, y:=c(5,10,10)])
+}
+
+# fix for #1571
+x = data.table(c(1,1,2,7,2,3,4,4,7), 1:9)
+y = data.table(c(2,3,4,4,4,5))
+test(1615.1, x[!y, on="V1", mult="first"], data.table(V1=c(1,7), V2=INT(c(1,4))))
+test(1615.2, x[!y, on="V1", mult="last"], data.table(V1=c(1,7), V2=INT(c(2,9))))
+test(1615.3, x[!y, on="V1", mult="all"], data.table(V1=c(1,1,7,7), V2=INT(c(1,2,4,9))))
+
+# fix for #1287 and #1271
+set.seed(1L)
+dt = data.table(a=c(1,1,2), b=sample(10,3), c=sample(10,3))
+test(1616.1, dt[.(1:2), if (c-b > 0L) b, on="a", by=.EACHI, mult="first"], data.table(a=c(1,2), V1=c(3L,5L)))
+test(1616.2, dt[.(1:2), if (c-b > 0L) b, on="a", by=.EACHI, mult="last"], data.table(a=c(2), V1=5L))
+test(1616.3, dt[.(1:2), c := if (c-b > 0L) b, by=.EACHI, mult="first", on="a"], 
+             data.table(a=dt$a, b=dt$b, c=c(3L,2L,5L)) )
+
+# fix for #1281
+x <- 3 > 0
+ans = setattr(copy(x), "foo", "bar")
+if (base::getRversion() > "3.0.0") test(1617, setattr(x, "foo", "bar"), ans, warning = "Input is a length=1 logical that")
+
+# fix for #1445
+test(1618.1, fread("a,c,b\n1,2,3", select=c("b", "c")), data.table(b=3L, c=2L))
+test(1618.2, fread("a,c,b\n1,2,3", select=c("c", "b")), data.table(c=2L, b=3L))
+test(1618.3, fread("a,c,b\n1,2,3", select=c(3,2)), data.table(b=3L, c=2L))
+test(1618.4, fread("a,c,b\n1,2,3", select=c(2:3)), data.table(c=2L, b=3L))
+test(1618.5, fread("a,c,b\n1,2,3", select=c("b", "c"), col.names=c("q", "r")), data.table(q=3L, r=2L))
+test(1618.6, fread("a,c,b\n1,2,3", select=c("b", "z")), data.table(b=3L), warning="Column name 'z' not found.*skipping")
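+# Illustrative note (not a test): as of this fix, fread() returns the selected columns in
+# the order given in 'select' (whether by name or by position), not in file order, and it
+# warns about (and skips) requested names missing from the header.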
+
+# fix for #1270. There have been problems with R before vs after 3.1.0 here, but it is now ok in all R versions.
+DT = data.table(x=1:2, y=5:6)
+test(1619.1, DT[, .BY, by=x]$BY, as.list(1:2))
+test(1619.2, DT[, bycol := .BY, by=x]$bycol, as.list(1:2))
+
+# fix for #473
+DT = data.frame(x=1, y=2)
+setattr(DT, 'class', c('data.table', 'data.frame')) # simulates over-allocation lost scenario
+if (!truelength(DT)) test(1620, truelength(as.data.table(DT)), 1026L)
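+# Illustrative note (not a test): data.tables are over-allocated so := can add columns by
+# reference; truelength() reports the allocated column slots. The setattr() above simulates
+# losing that over-allocation (truelength 0), and as.data.table() restores it: the default
+# allocation of 1024 slots plus the 2 existing columns gives the expected 1026L.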
+
+# fix for #1116, (#1239 and #1201)
+test(1621.1, fread("issue_1116_fread_few_lines.txt"), setDT(read.delim("issue_1116_fread_few_lines.txt", stringsAsFactors=FALSE, sep=",", check.names=FALSE)))
+test(1621.2, fread("issue_1116_fread_few_lines_2.txt"), setDT(read.delim("issue_1116_fread_few_lines_2.txt", stringsAsFactors=FALSE, sep=",", check.names=FALSE)))
+
+# fix for #1573
+ans1 = fread("issue_1573_fill.txt", fill=TRUE, na.strings="")
+ans2 = setDT(read.table("issue_1573_fill.txt", header=TRUE, fill=TRUE, stringsAsFactors=FALSE, na.strings=""))
+test(1622.1, ans1, ans2)
+test(1622.2, ans1, fread("issue_1573_fill.txt", fill=TRUE, sep=" ", na.strings=""))
+
+# fix for #989
+# error_msg = if (base::getRversion() < "3.4") "can not be a directory name" else "does not exist"
+# Until R v3.3, file.info("~") returned TRUE for isdir. This seems to return NA in current devel. However, it
+# correctly identifies that "~" is not a file, which leads to a different error message. So the expected error
+# message is removed so that the test errors properly on both R versions. This seems fine since we just need it to error. Tested.
+test(1623, fread("~"), error="")
+
+# testing print.rownames option, #1097 (part of #1523)
+old = getOption("datatable.print.rownames")
+options(datatable.print.rownames = FALSE)
+DT <- data.table(a = 1:3)
+test(1624, capture.output(print(DT)), c(" a", " 1", " 2", " 3"))
+options(datatable.print.rownames = old)
+
+# fix for #1575
+text = "colA: dataA\ncolB: dataB\ncolC: dataC\n\nColA: dataA\nColB: dataB\nColC: dataC"
+test(1625.1, fread(text, header=FALSE, sep=":", blank.lines.skip=TRUE, strip.white=FALSE), 
+    setDT(read.table(text=text, header=FALSE, sep=":", blank.lines.skip=TRUE, stringsAsFactors=FALSE)))
+test(1625.2, fread(text, header=FALSE, sep=":", blank.lines.skip=TRUE), 
+    setDT(read.table(text=text, header=FALSE, sep=":", blank.lines.skip=TRUE, stringsAsFactors=FALSE, strip.white=TRUE)))
+
+# set-operators #547
+# setops basic check all
+x = data.table(c(1,2,2,2,3,4,4))
+y = data.table(c(2,3,4,4,4,5))
+test(1626.1, fintersect(x, y), data.table(c(2,3,4))) # intersect
+test(1626.2, fintersect(x, y, all=TRUE), data.table(c(2,3,4,4))) # intersect all
+test(1626.3, fsetdiff(x, y), data.table(c(1))) # setdiff (except)
+test(1626.4, fsetdiff(x, y, all=TRUE), data.table(c(1,2,2))) # setdiff all (except all)
+test(1626.5, funion(x, y), data.table(c(1,2,3,4,5))) # union
+test(1626.6, funion(x, y, all=TRUE), data.table(c(1,2,2,2,3,4,4,2,3,4,4,4,5))) # union all
+test(1626.7, fsetequal(x, y), FALSE) # setequal
+# setops check two cols
+x = data.table(c(1,2,2,2,3,4,4), c(1,1,1,3,3,3,3))
+y = data.table(c(2,3,4,4,4,5), c(1,1,2,3,3,3))
+test(1626.8, fintersect(x, y), data.table(c(2,4), c(1,3))) # intersect
+test(1626.9, fintersect(x, y, all=TRUE), data.table(c(2,4,4), c(1,3,3))) # intersect all
+test(1626.10, fsetdiff(x, y), data.table(c(1,2,3), c(1,3,3))) # setdiff (except)
+test(1626.11, fsetdiff(x, y, all=TRUE), data.table(c(1,2,2,3), c(1,1,3,3))) # setdiff all (except all)
+test(1626.12, funion(x, y), data.table(c(1,2,2,3,4,3,4,5), c(1,1,3,3,3,1,2,3))) # union
+test(1626.13, funion(x, y, all=TRUE), data.table(c(1,2,2,2,3,4,4,2,3,4,4,4,5), c(1,1,1,3,3,3,3,1,1,2,3,3,3))) # union all
+test(1626.14, fsetequal(x, y), FALSE) # setequal
+# setops on unique sets
+x = unique(x)
+y = unique(y)
+test(1626.15, fintersect(x, y), data.table(c(2,4), c(1,3))) # intersect
+test(1626.16, fintersect(x, y, all=TRUE), data.table(c(2,4), c(1,3))) # intersect all
+test(1626.17, fsetdiff(x, y), data.table(c(1,2,3), c(1,3,3))) # setdiff (except)
+test(1626.18, fsetdiff(x, y, all=TRUE), data.table(c(1,2,3), c(1,3,3))) # setdiff all (except all)
+test(1626.19, funion(x, y), data.table(c(1,2,2,3,4,3,4,5), c(1,1,3,3,3,1,2,3))) # union
+test(1626.20, funion(x, y, all=TRUE), data.table(c(1,2,2,3,4,2,3,4,4,5), c(1,1,3,3,3,1,1,2,3,3))) # union all
+test(1626.21, fsetequal(x, y), FALSE) # setequal
+# intersect precise duplicate handling
+dt = data.table(a=1L)
+test(1626.22, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,0)])), 0L)
+test(1626.23, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,0)], all=TRUE)), 0L)
+test(1626.24, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,1)])), 1L)
+test(1626.25, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,1)], all=TRUE)), 1L)
+test(1626.26, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,2)])), 1L)
+test(1626.27, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,2)], all=TRUE)), 2L)
+test(1626.28, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,3)])), 1L)
+test(1626.29, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,3)], all=TRUE)), 3L)
+test(1626.30, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,4)])), 1L)
+test(1626.31, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,4)], all=TRUE)), 4L)
+test(1626.32, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,5)])), 1L)
+test(1626.33, nrow(fintersect(dt[rep(1L,4)], dt[rep(1L,5)], all=TRUE)), 4L)
+# setdiff precise duplicate handling
+dt = data.table(a=1L)
+test(1626.34, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,0)])), 1L)
+test(1626.35, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,0)], all=TRUE)), 4L)
+test(1626.36, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,1)])), 0L)
+test(1626.37, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,1)], all=TRUE)), 3L)
+test(1626.38, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,2)])), 0L)
+test(1626.39, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,2)], all=TRUE)), 2L)
+test(1626.40, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,3)])), 0L)
+test(1626.41, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,3)], all=TRUE)), 1L)
+test(1626.42, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,4)])), 0L)
+test(1626.43, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,4)], all=TRUE)), 0L)
+test(1626.44, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,5)])), 0L)
+test(1626.45, nrow(fsetdiff(dt[rep(1L,4)], dt[rep(1L,5)], all=TRUE)), 0L)
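+# Illustrative note (not a test): with all=TRUE the set operations work on row counts, as in
+# SQL's INTERSECT ALL / EXCEPT ALL. If a row occurs nx times in x and ny times in y, then
+# fintersect(x, y, all=TRUE) keeps it min(nx, ny) times and fsetdiff(x, y, all=TRUE) keeps it
+# max(nx - ny, 0) times, matching the counts checked above.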
+# unsupported type in set-ops: list (except UNION ALL)
+dt = data.table(V1 = 1:4, V2 = letters[1:4], V3 = lapply(1:4, function(x) new.env()))
+x = dt[c(2:4,2L,2L)]
+y = dt[c(1:3,2L)]
+test(1626.46, fintersect(x, y), error = "x and y must not have unsupported column types: list")
+test(1626.47, fintersect(x, y, all=TRUE), error = "x and y must not have unsupported column types: list")
+test(1626.48, fsetdiff(x, y), error = "x and y must not have unsupported column types: list")
+test(1626.49, fsetdiff(x, y, all=TRUE), error = "x and y must not have unsupported column types: list")
+test(1626.50, funion(x, y), error = "x and y must not have unsupported column types: list")
+test(1626.51, funion(x, y, all=TRUE), dt[c(2:4,2L,2L,1:3,2L)])
+test(1626.52, fsetequal(x, y), error = "x and y must not have unsupported column types: list")
+test(1626.53, fsetequal(dt[c(1:2,2L)], dt[c(1:2,2L)]), error = "x and y must not have unsupported column types: list")
+# unsupported type in set-ops: complex, raw
+dt = data.table(V1 = 1:4, V2 = letters[1:4], V3 = as.complex(1:4), V4 = as.raw(1:4), V5 = lapply(1:4, function(x) NULL))
+x = dt[c(2:4,2L,2L)]
+y = dt[c(1:3,2L)]
+test(1626.54, fintersect(x, y), error = "x and y must not have unsupported column types: raw, complex, list")
+test(1626.55, fintersect(x, y, all=TRUE), error = "x and y must not have unsupported column types: raw, complex, list")
+test(1626.56, fsetdiff(x, y), error = "x and y must not have unsupported column types: raw, complex, list")
+test(1626.57, fsetdiff(x, y, all=TRUE), error = "x and y must not have unsupported column types: raw, complex, list")
+test(1626.58, funion(x, y), error = "x and y must not have unsupported column types: raw, complex, list")
+test(1626.59, funion(x, y, all=TRUE), error = "x and y must not have unsupported column types: raw, complex") # no 'list' here because list columns are supported when all=TRUE
+test(1626.60, fsetequal(x, y), error = "x and y must not have unsupported column types: raw, complex, list")
+test(1626.61, fsetequal(dt[c(1:2,2L)], dt[c(1:2,2L)]), error = "x and y must not have unsupported column types: raw, complex, list")
+# supported types multi column test
+dt = data.table(
+    V1 = 1:4,
+    V2 = as.numeric(1:4),
+    V3 = letters[rep(1:2, 2)],
+    V4 = factor(c("a","a","b","b")),
+    V5 = as.POSIXct("2016-03-05 12:00:00", origin="1970-01-01")+(1:4)*3600,
+    V6 = as.Date("2016-03-05", origin="1970-01-01")+(1:4)
+)[, V7 := as.IDate(V6)
+  ][, V8 := as.ITime(V5)]
+x = dt[c(2:4,2L,2L)]
+y = dt[c(1:3,2L)]
+test(1626.62, fintersect(x, y), dt[2:3])
+test(1626.63, fintersect(x, y, all=TRUE), dt[c(2:3,2L)])
+test(1626.635, fsetdiff(x, y), dt[4L])
+test(1626.64, fsetdiff(x, y, all=TRUE), dt[c(4L,2L)])
+test(1626.65, funion(x, y), dt[c(2:4,1L)])
+test(1626.66, funion(x, y, all=TRUE), dt[c(2:4,2L,2L,1:3,2L)])
+test(1626.67, fsetequal(x, y), FALSE)
+test(1626.68, fsetequal(dt[c(2:3,3L)], dt[c(2:3,3L)]), TRUE)
+# supported type in set-ops: integer64
+if ("package:bit64" %in% search()) {
+    dt = data.table(V1 = 1:4, V2 = letters[1:4], V3 = bit64::as.integer64("90000000000")+1:4)
+    x = dt[c(2:4,2L,2L)]
+    y = dt[c(1:3,2L)]
+    test(1626.69, fintersect(x, y), dt[2:3])
+    test(1626.70, fintersect(x, y, all=TRUE), dt[c(2:3,2L)])
+    test(1626.71, fsetdiff(x, y), dt[4L])
+    test(1626.72, fsetdiff(x, y, all=TRUE), dt[c(4L,2L)])
+    test(1626.73, funion(x, y), dt[c(2:4,1L)])
+    test(1626.74, funion(x, y, all=TRUE), dt[c(2:4,2L,2L,1:3,2L)])
+    test(1626.75, fsetequal(x, y), FALSE)
+    test(1626.76, fsetequal(dt[c(2:3,3L)], dt[c(2:3,3L)]), TRUE)
+} else {
+    cat("Tests 1626.[69-76] not run. If required call library(bit64) first.\n")
+}
+
+# fix for #1087 and #1465
+test(1627, charToRaw(names(fread("issue_1087_utf8_bom.csv"))[1L]), as.raw(97L))
+
+# uniqueN gains na.rm argument, #1455
+set.seed(1L)
+dt = data.table(x=sample(c(1:3,NA),25,TRUE), y=sample(c(NA,"a", "b"), 25,TRUE), z=sample(2,25,TRUE))
+test(1628.1, uniqueN(dt, by=1:2, na.rm=TRUE), nrow(na.omit(dt[, .N, by=.(x,y)])))
+test(1628.2, uniqueN(dt, na.rm=TRUE), nrow(na.omit(dt[, .N, by=.(x,y,z)])))
+test(1628.3, dt[, uniqueN(y, na.rm=TRUE), by=z], dt[, length(unique(na.omit(y))), by=z])
+test(1628.4, dt[, uniqueN(.SD, na.rm=TRUE), by=z], dt[, nrow(na.omit(.SD[, .N, by=.(x,y)])), by=z])
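+# Illustrative note (not a test): for a vector, uniqueN(x, na.rm=TRUE) counts distinct non-NA
+# values; for a data.table it counts distinct rows after dropping rows containing any NA in
+# the considered columns, matching the na.omit() reference computations above.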
+
+# fix for long-standing FR/bug, #495
+# most likely I'm missing some tests, but we'll fix/add them as we go along.
+dt = data.table(grp=c(2,3,3,1,1,2,3), v1=1:7, v2=7:1, v3=10:16)
+test(1629.1, dt[, .SD*v1, .SDcols=v2:v3], dt[, .(v2=v2*v1, v3=v3*v1)])
+test(1629.2, dt[, lapply(.SD, function(x) x*v1), .SDcols=v2:v3], dt[, .(v2=v2*v1, v3=v3*v1)])
+test(1629.3, dt[, lapply(.SD, function(x) mean(x)*sum(v1)), .SDcols=v2:v3], data.table(v2=112, v3=364))
+test(1629.4, dt[, c(sum(v1), lapply(.SD, mean)), .SDcols=v2:v3], data.table(V1=28L, v2=4, v3=13))
+test(1629.5, dt[, c(v1=sum(v1), lapply(.SD, mean)), .SDcols=v2:v3], data.table(v1=28L, v2=4, v3=13))
+test(1629.6, dt[, .(v1=sum(v1), lapply(.SD, mean)), .SDcols=v2:v3], data.table(v1=28L, V2=list(4,13)))
+test(1629.7, dt[0][, .SD*v1, .SDcols=v2:v3], dt[0][, .SD, .SDcols=v2:v3])
+# add/update
+dt2 = copy(dt)
+test(1629.8, dt2[, c("v2", "v3") := .SD*v1, .SDcols=v2:v3], dt[, .(grp, v1, v2=v2*v1, v3=v3*v1)])
+# grouping operations
+oldopts = getOption("datatable.optimize") # backup
+options(datatable.optimize = 1L) # no gforce
+test(1629.9, dt[, .SD*sum(v1), by=grp, .SDcols=v2:v3], dt[, .SD*sum(v1), by=grp][, v1 := NULL])
+ans1 = dt[, sum(v1), by=grp]
+ans2 = dt[, base::max(.SD), by=grp, .SDcols=v2:v3]
+test(1629.10, dt[, max(.SD)*sum(v1), by=grp, .SDcols=v2:v3], ans1[, .(grp, V1=V1*ans2$V1)])
+test(1629.11, dt[, lapply(.SD, function(x) weighted.mean(x, w=v2)), .SDcols=c("v1","v3"), by=grp], 
+         dt[, .(v1=weighted.mean(v1,w=v2), v3=weighted.mean(v3, w=v2)), by=grp])
+test(1629.12, dt[, c(v1=max(v1), lapply(.SD, base::min)), by=grp, .SDcols=v2:v3], dt[, .(v1=max(v1), v2=min(v2), v3=min(v3)), by=grp])
+# gforce
+options(datatable.optimize = Inf) # Inf
+test(1629.13, dt[, c(v1=max(v1), lapply(.SD, min)), by=grp, .SDcols=v2:v3], dt[, .(v1=max(v1), v2=min(v2), v3=min(v3)), by=grp])
+# even more complex, shouldn't run any optimisation
+dt[, v4 := v1/2]
+test(1629.14, dt[, c(.(v1=v1*min(v4)), lapply(.SD, function(x) x*max(v4))), by=grp, .SDcols=v2:v3], 
+    dt[, .(v1=v1*min(v4), v2=v2*max(v4), v3=v3*max(v4)), by=grp])
+test(1629.15, copy(dt)[, c("a", "b", "c") := c(min(v1), lapply(.SD, function(x) max(x)*min(v1))), by=grp, .SDcols=v3:v4], copy(dt)[, c("a", "b", "c") := .(min(v1), max(v3)*min(v1), max(v4)*min(v1)), by=grp])
+options(datatable.optimize = oldopts)
+# by=.EACHI and operations with 'i'
+test(1629.16, dt[.(2:3), c(.(sum(v1)), lapply(.SD, function(x) mean(x)*min(v1))), by=.EACHI, .SDcols=v2:v3, on="grp"], dt[grp %in% 2:3, c(.(sum(v1)), lapply(.SD, function(x) mean(x)*min(v1))), by=grp, .SDcols=v2:v3])
+test(1629.17, dt[.(2:3), c(sum(v1), lapply(.SD, function(x) mean(x)*v1)), .SDcols=v2:v3, on="grp"][order(V1,v2,v3)], dt[grp %in% 2:3, c(sum(v1), lapply(.SD, function(x) mean(x)*v1)), .SDcols=v2:v3][order(V1,v2,v3)])
+
+# #759, add new cols on :=
+dt1 <- data.table(id = 1:2, x = 3:4)
+dt2 <- data.table(id = 3:4, y = c(5,6))
+# when updating using :=, nomatch=0 or NA should make no difference; i.e., new columns should always
+# be added. Otherwise there would be an inconsistent number of columns in the result, depending on the data.
+ans = copy(dt1)[,z:=NA_real_]  # NA_real_ because :=2 below is type double
+test(1630.1,  copy(dt1)[id>5, z:=2,                      nomatch=0L], ans, warning="ignoring nomatch")
+test(1630.2,  copy(dt1)[dt2,  z:=2, on="id",             nomatch=0L], ans, warning="ignoring nomatch")
+test(1630.3,  copy(dt1)[dt2,  z:=y, on="id",             nomatch=0L], ans, warning="ignoring nomatch")
+test(1630.4,  copy(dt1)[dt2,  z:=y, on="id", by=.EACHI,  nomatch=0L], ans, warning="ignoring nomatch")
+test(1630.5,  copy(dt1)[id>5, z:=2,                      nomatch=NA], ans, warning="ignoring nomatch")
+test(1630.6,  copy(dt1)[dt2,  z:=2, on="id",             nomatch=NA], ans, warning="ignoring nomatch")
+test(1630.7,  copy(dt1)[dt2,  z:=y, on="id",             nomatch=NA], ans, warning="ignoring nomatch")
+test(1630.8,  copy(dt1)[dt2,  z:=y, on="id", by=.EACHI,  nomatch=NA], ans, warning="ignoring nomatch")
+test(1630.9,  copy(dt1)[id>5, z:=2L,                     nomatch=0L], copy(dt1)[,z:=NA_integer_], warning="ignoring nomatch")
+test(1630.11, copy(dt1)[id>5, z:=2L,                     nomatch=NA], copy(dt1)[,z:=NA_integer_], warning="ignoring nomatch")
+
+# fix for #1268, on= retains keys correctly.
+A = data.table(site=rep(c("A","B"), each=3), date=rep(1:3, times=2), x=rep(1:3*10, times=2), key="site,date")
+B = data.table(x=c(10,20), y=c(100,200), key="x")
+test(1631, key(A[B, on="x"]), NULL)
+
+# fix for #1479, secondary keys are removed when necessary
+dt = data.table(a = rep(c(F,F,T,F,F,F,F,F,F), 3), b = c("x", "y", "z"))
+setindex(dt, a)
+dt[, a := as.logical(sum(a)), by = b]
+test(1632.1, names(attributes(attr(dt, 'index'))), NULL)
+dt = data.table(a = rep(c(F,F,T,F,F,F,F,F,F), 3), b = c("x", "y", "z"))
+setindex(dt, b)
+dt[, a := as.logical(sum(a)), by = b]
+test(1632.2, names(attributes(attr(dt, 'index'))), "__b")
+dt = data.table(a = rep(c(F,F,T,F,F,F,F,F,F), 3), b = c("x", "y", "z"))
+test(1632.3, copy(dt)[, c := !a, by=b], copy(dt)[, c := c(T,T,F,T,T,T,T,T,T)])
+
+# by accepts colA:colB for interactive scenarios, #1395
+dt = data.table(x=rep(1,18), y=rep(1:2, each=9), z=rep(1:3,each=6), a=rep(1:6, each=3))[, b := 6]
+test(1633.1, dt[, sum(b), by=x:a], dt[, sum(b), by=.(x,y,z,a)])
+test(1633.2, dt[, sum(b), by=y:a], dt[, sum(b), by=.(y,z,a)])
+test(1633.3, dt[, sum(b), by=a:y], dt[, sum(b), by=.(a,z,y)])
+test(1633.4, dt[, .SD, by=1:nrow(dt)], data.table(nrow=1:nrow(dt), dt)) # make sure this works
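+# Illustrative note (not a test): by=colA:colB groups by the contiguous run of columns between
+# colA and colB by position, so a reversed range like by=a:y above expands to .(a, z, y).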
+
+# reuse secondary indices
+dt = data.table(x=sample(3, 10, TRUE), y=1:10)
+v1 = capture.output(ans1 <- dt[.(3:2), on="x", verbose=TRUE])
+setindex(dt, x)
+v2 = capture.output(ans2 <- dt[.(3:2), on="x", verbose=TRUE])
+test(1634.1, any(grepl("ad hoc", v1)), TRUE)
+test(1634.2, any(grepl("existing index", v2)), TRUE)
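+# Illustrative note (not a test): before setindex(dt, x), the on="x" join builds a temporary
+# ("ad hoc") index each time; after setindex(dt, x) the stored secondary index is reused,
+# as the verbose outputs captured above confirm.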
+
+# fread's fill argument detects separator better in complex cases as well, #1573
+text = "a	b	c	d	e	f	g	h	i	j	k	l\n1	P	P;A;E;	Y	YW;	H().	1-3 pro\n2	Q9	a;a;a;a;	YB	YH;	M().	13 pn ba\n1	P3	P;	Y	Y;	R().	14 p\n53	P	P6;B;D;0;5;a;X;a;4R;	Y	Y;	H().	13 pe e\n1	P	P;O;O;a;a;a;	HLA-A	HLA-A;;	H().	HcIha,A-n\n102	P	P;O;P;P;P;P;P;P;a;a;a;a;a;a;a;a;a;a;	H-A	H-A;;	H().	HcIha,A"
+test(1635.1, ans1 <- fread(text, fill=TRUE), setDT(read.table(text=text, stringsAsFactors=FALSE, fill=TRUE, sep="\t", header=TRUE)))
+text = "a	b	c	d	e\n1	P	P;A;E;	Y	YW;	H().	1-3 pro\n2	Q9	a;a;a;a;	YB	YH;	M().	13 pn ba\n1	P3	P;	Y	Y;	R().	14 p\n53	P	P6;B;D;0;5;a;X;a;4R;	Y	Y;	H().	13 pe e\n1	P	P;O;O;a;a;a;	HLA-A	HLA-A;;	H().	HcIha,A-n\n102	P	P;O;P;P;P;P;P;P;a;a;a;a;a;a;a;a;a;a;	H-A	H-A;;	H().	HcIha,A"
+test(1635.2, fread(text, fill=TRUE), setnames(ans1[, 1:7, with=FALSE], c(letters[1:5], paste("V", 6:7, sep=""))))
+
+# testing function type in dt, #518
+dt = data.table(x=1, y=sum)
+test(1636.1, class(dt$y), "list")
+test(1636.2, any(grepl("1: 1 <function>", capture.output(print(dt)))), TRUE)
+dt = data.table(x=1:2, y=sum)
+test(1636.3, class(dt$y), "list")
+test(1636.4, any(grepl("2: 2 <function>", capture.output(print(dt)))), TRUE)
+dt = data.table(x=1:2, y=c(sum, min))
+test(1636.5, class(dt$y), "list")
+test(1636.6, any(grepl("2: 2 <function>", capture.output(print(dt)))), TRUE)
+
+# #484 fix (related to #495 fix above)
+dt = data.table(a = 1, b = 1)
+test(1637.1, dt[, data.table(a, .SD), by = cumsum(a)], data.table(cumsum=1, a=1, b=1))
+test(1637.2, dt[, data.table(a, .SD), by = cumsum(a), .SDcols=a:b], data.table(cumsum=1, a=1, a=1, b=1))
+test(1637.3, dt[, data.table(a, .SD), by = a], data.table(a=1,a=1,b=1))
+test(1637.4, dt[, data.table(b, .SD), by = cumsum(a)], data.table(cumsum=1, b=1, b=1))
+test(1637.5, dt[, data.table(a, b), by = cumsum(a)], data.table(cumsum=1, a=1, b=1))
+
+# when datatable.optimize<1, no optimisation of j should take place:
+old = options(datatable.optimize=0L)
+dt = data.table(x=1:5, y=6:10, z=c(1,1,1,2,2))
+test(1638, dt[, .SD, by=z, verbose=TRUE], output="All optimizations are turned off")
+options(old)
+
+#1389 - split.data.table - big chunk of unit tests
+set.seed(123)
+dt = data.table(x1 = rep(letters[1:2], 6), x2 = rep(letters[3:5], 4), x3 = rep(letters[5:8], 3), y = rnorm(12))
+dt = dt[sample(.N)]
+df = as.data.frame(dt)
+# - [x] split by factor the same as `split.data.frame` - `f` argument ----
+test(1639.1, lapply(split(df, as.factor(1:2)), setDT), split(dt, as.factor(1:2))) # drop=FALSE on same factor
+test(1639.2, lapply(split(df, as.factor(1:2), drop=TRUE), setDT), split(dt, as.factor(1:2), drop=TRUE)) # drop=TRUE on same factor
+test(1639.3, lapply(split(df, as.factor(1:4)[3:2]), setDT), split(dt, as.factor(1:4)[3:2])) # drop=FALSE on same factor with empty levels
+test(1639.4, lapply(split(df, as.factor(1:4)[3:2], drop=TRUE), setDT), split(dt, as.factor(1:4)[3:2], drop=TRUE)) # drop=TRUE on same factor with empty levels
+test(1639.5, lapply(split(df, as.factor(1:12)), setDT), split(dt, as.factor(1:12))) # drop=FALSE factor length of nrow
+test(1639.6, lapply(split(df, as.factor(1:12), drop=TRUE), setDT), split(dt, as.factor(1:12), drop=TRUE)) # drop=TRUE factor length of nrow
+ord = sample(2:13)
+test(1639.7, lapply(split(df, as.factor(1:14)[ord]), setDT), split(dt, as.factor(1:14)[ord])) # drop=FALSE factor length of nrow with empty levels
+test(1639.8, lapply(split(df, as.factor(1:14)[ord], drop=TRUE), setDT), split(dt, as.factor(1:14)[ord], drop=TRUE)) # drop=TRUE factor length of nrow with empty levels
+test(1639.9, lapply(split(df, list(as.factor(1:2), as.factor(3:2))), setDT), split(dt, list(as.factor(1:2), as.factor(3:2)))) # `f` list object drop=FALSE
+test(1639.10, lapply(split(df, list(as.factor(1:2), as.factor(3:2)), drop=TRUE), setDT), split(dt, list(as.factor(1:2), as.factor(3:2)), drop=TRUE)) # `f` list object drop=TRUE
+test(1639.11, split(dt, as.factor(integer())), error = "group length is 0 but data nrow > 0") # factor length 0L
+test(1639.12, split(dt, as.factor(integer()), drop=TRUE), error = "group length is 0 but data nrow > 0")
+test(1639.13, split(dt, as.factor(1:2)[0L]), error = "group length is 0 but data nrow > 0") # factor length 0L with empty levels
+test(1639.14, split(dt, as.factor(1:2)[0L], drop=TRUE), error = "group length is 0 but data nrow > 0")
+# - [x] edge cases for `f` argument ----
+if (base::getRversion() > "3.0.0") test(1639.15, split(df, as.factor(NA)), split(dt, as.factor(NA))) # factor NA
+if (base::getRversion() > "3.0.0") test(1639.16, split(df, as.factor(NA), drop=TRUE), split(dt, as.factor(NA), drop=TRUE))
+if (base::getRversion() > "3.0.0") test(1639.17, lapply(split(df, as.factor(1:2)[0L][1L]), setDT), split(dt, as.factor(1:2)[0L][1L])) # factor NA with empty levels
+if (base::getRversion() > "3.0.0") test(1639.18, split(df, as.factor(1:2)[0L][1L], drop=TRUE), split(dt, as.factor(1:2)[0L][1L], drop=TRUE))
+test(1639.19, lapply(split(df, as.factor(c(1L,NA,2L))), setDT), split(dt, as.factor(c(1L,NA,2L)))) # factor has NA
+test(1639.20, lapply(split(df, as.factor(c(1L,NA,2L)), drop=TRUE), setDT), split(dt, as.factor(c(1L,NA,2L)), drop=TRUE))
+test(1639.21, lapply(split(df, as.factor(c(1L,NA,2:4))[1:3]), setDT), split(dt, as.factor(c(1L,NA,2:4))[1:3])) # factor has NA with empty levels
+test(1639.22, lapply(split(df, as.factor(c(1L,NA,2:4))[1:3], drop=TRUE), setDT), split(dt, as.factor(c(1L,NA,2:4))[1:3], drop=TRUE))
+test(1639.23, lapply(split(df, letters[c(1L,NA,2L)]), setDT), split(dt, letters[c(1L,NA,2L)])) # character as `f` arg
+test(1639.24, lapply(split(df, letters[c(1L,NA,2L)], drop=TRUE), setDT), split(dt, letters[c(1L,NA,2L)], drop=TRUE))
+test(1639.25, lapply(split(df, "z"), setDT), split(dt, "z")) # character as `f` arg, length 1L
+test(1639.26, lapply(split(df, "z", drop=TRUE), setDT), split(dt, "z", drop=TRUE))
+test(1639.27, lapply(split(df, letters[c(1L,NA)]), setDT), split(dt, letters[c(1L,NA)])) # character as `f` arg, length 1L of non-NA
+test(1639.28, lapply(split(df, letters[c(1L,NA)], drop=TRUE), setDT), split(dt, letters[c(1L,NA)], drop=TRUE))
+test(1639.29, lapply(split(df[0L,], "z"), setDT), split(dt[0L], "z")) # nrow 0, f length 1-2
+test(1639.30, lapply(split(df[0L,], c("z1","z2")), setDT), split(dt[0L], c("z1","z2")))
+test(1639.31, lapply(split(df[0L,], "z", drop=TRUE), setDT), split(dt[0L], "z", drop=TRUE))
+test(1639.32, lapply(split(df[0L,], c("z1","z2"), drop=TRUE), setDT), split(dt[0L], c("z1","z2"), drop=TRUE))
+test(1639.33, lapply(split(df[1L,], "z"), setDT), split(dt[1L], "z")) # nrow 1, f length 1-2
+test(1639.34, lapply(suppressWarnings(split(df[1L,], c("z1","z2"))), setDT), suppressWarnings(split(dt[1L], c("z1","z2"))))
+test(1639.35, lapply(split(df[1L,], "z", drop=TRUE), setDT), split(dt[1L], "z", drop=TRUE) )
+test(1639.36, lapply(suppressWarnings(split(df[1L,], c("z1","z2"), drop=TRUE)), setDT), suppressWarnings(split(dt[1L], c("z1","z2"), drop=TRUE)))
+if (base::getRversion() > "3.0.0") test(1639.37, lapply(split(df[0L,], as.factor(NA_character_)), setDT), split(dt[0L], as.factor(NA_character_))) # nrow 0, f factor length 1L NA
+if (base::getRversion() > "3.0.0") test(1639.38, lapply(split(df[0L,], as.factor(NA_character_), drop=TRUE), setDT), split(dt[0L], as.factor(NA_character_), drop=TRUE))
+if (base::getRversion() > "3.0.0") test(1639.39, lapply(split(df[0L,], as.factor(1:2)[0L][1L]), setDT), split(dt[0L], as.factor(1:2)[0L][1L])) # nrow 0, f factor length 1L NA with empty levels
+if (base::getRversion() > "3.0.0") test(1639.40, lapply(split(df[0L,], as.factor(1:2)[0L][1L], drop=TRUE), setDT), split(dt[0L], as.factor(1:2)[0L][1L], drop=TRUE))
+test(1639.41, lapply(split(df[0L,], as.factor(integer())), setDT), split(dt[0L], as.factor(integer()))) # nrow 0, f factor length 0L
+test(1639.42, lapply(split(df[0L,], as.factor(integer()), drop=TRUE), setDT), split(dt[0L], as.factor(integer()), drop=TRUE))
+if (base::getRversion() > "3.0.0") test(1639.43, lapply(split(df[0L,], as.factor(1:2)[0L]), setDT), split(dt[0L], as.factor(1:2)[0L])) # nrow 0, f factor length 0L with empty levels
+if (base::getRversion() > "3.0.0") test(1639.44, lapply(split(df[0L,], as.factor(1:2)[0L], drop=TRUE), setDT), split(dt[0L], as.factor(1:2)[0L], drop=TRUE))
+test(1639.45, lapply(split(df[0L,], as.factor(1:3)[c(2L,NA,3L)]), setDT), split(dt[0L], as.factor(1:3)[c(2L,NA,3L)])) # nrow 0, f factor with empty levels and NA
+test(1639.46, lapply(split(df[0L,], as.factor(1:3)[c(2L,NA,3L)], drop=TRUE), setDT), split(dt[0L], as.factor(1:3)[c(2L,NA,3L)], drop=TRUE)) # nrow 0, f character length 1L NA
+if (base::getRversion() > "3.0.0") test(1639.47, lapply(split(df[0L,], NA_character_), setDT), split(dt[0L], NA_character_))
+if (base::getRversion() > "3.0.0") test(1639.48, lapply(split(df[0L,], NA_character_, drop=TRUE), setDT), split(dt[0L], NA_character_, drop=TRUE))
+test(1639.49, lapply(split(df[0L,], letters[c(NA,1:3)]), setDT), split(dt[0L], letters[c(NA,1:3)])) # nrow 0, f length > 1L, with NA
+test(1639.50, lapply(split(df[0L,], letters[c(NA,1:3)], drop=TRUE), setDT), split(dt[0L], letters[c(NA,1:3)], drop=TRUE))
+# - [x] split by reference to column names - `by` - for factor column ----
+fdt = dt[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3]
+l = split(fdt, by = "x1", flatten=FALSE) # single col
+test(1639.51, TRUE, all(is.list(l), identical(names(l), c("b","a")), sapply(l, is.data.table), sapply(l, nrow) == c(b=6L, a=6L), sapply(l, ncol) == c(b=4L, a=4L)))
+l = split(fdt, by = "x2", flatten=FALSE)
+test(1639.52, TRUE, all(is.list(l), identical(names(l), c("d","e","c")), sapply(l, is.data.table), sapply(l, nrow) == c(d=4L, e=4L, c=4L), sapply(l, ncol) == c(d=4L, e=4L, c=4L)))
+l = split(fdt, by = "x3", flatten=FALSE)
+test(1639.53, TRUE, all(is.list(l), identical(names(l), c("h","f","g","e")), sapply(l, is.data.table), sapply(l, nrow) == c(h=3L, f=3L, g=3L, e=3L), sapply(l, ncol) == c(h=4L, f=4L, g=4L, e=4L)))
+l = split(fdt, by = c("x1","x2"), flatten=FALSE) # multi col
+test(1639.54, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(b=c("d","e","c"), a=c("e","d","c"))),
+    sapply(l, sapply, nrow) == rep(2L, 6),
+    sapply(l, sapply, ncol) == rep(4L, 6)
+))
+l = split(fdt, by = c("x1","x3"), flatten=FALSE) # empty levels appear due to subsetting x3 within x1 groups
+test(1639.55, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(b=c("h","f","e","g"), a=c("g","e","f","h"))),
+    sapply(l, sapply, nrow) == rep(c(3L,3L,0L,0L), 2),
+    sapply(l, sapply, ncol) == rep(4L, 8)
+))
+l = split(fdt, by = c("x2","x3"), flatten=FALSE)
+test(1639.56, TRUE, all(
+    is.list(l), identical(names(l), c("d","e","c")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(d=c("h","f","e","g"), e=c("h","f","g","e"), c=c("f","h","e","g"))),
+    sapply(l, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(fdt, by = c("x1","x2","x3"), flatten=FALSE) # empty levels in x3 after subset are expanded
+test(1639.57, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(b=list(d=c("h","f","e","g"), e=c("h","f","e","g"), c=c("f","h","e","g")), a=list(e=c("g","e","f","h"), d=c("e","g","f","h"), c=c("e","g","f","h")))),
+    sapply(l, sapply, sapply, nrow) == rep(c(1L,1L,0L,0L), 6),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 24)
+))
+l = split(fdt, by = c("x3","x1"), drop=TRUE, flatten=FALSE) # multi col rev
+test(1639.58, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 8)
+))
+l = split(fdt, by = c("x3","x1"), flatten=FALSE) # x1 has empty levels after split on x3 first
+test(1639.59, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(h=c("b","a"), f=c("b","a"), g=c("a","b"), e=c("a","b"))),
+    sapply(l, sapply, nrow) == rep(c(3L,0L), 4),
+    sapply(l, sapply, ncol) == rep(4L, 8)
+))
+l = split(fdt, by = c("x3","x2","x1"), drop = TRUE, flatten=FALSE)
+test(1639.60, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(h=list(d=c("b"), e=c("b"), c=c("b")), f=list(e=c("b"), c=c("b"), d=c("b")), g=list(e=c("a"), d=c("a"), c=c("a")), e=list(e=c("a"), d=c("a"), c=c("a")))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 12)
+))
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x3)) # split.data.frame match
+test(1639.61, unlist(split(fdt, by = c("x1","x3"), sorted = TRUE, flatten=FALSE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=FALSE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x3), drop=TRUE)
+test(1639.62, unlist(split(fdt, by = c("x1","x3"), sorted = TRUE, drop=TRUE, flatten=FALSE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=TRUE
+fdt = dt[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L], # empty levels in factor and drop=FALSE
+             x2 = as.factor(c("a", as.character(x2)))[-1L],
+             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
+             y = y)]
+l = split(fdt, by = "x1")
+test(1639.63, TRUE, all(is.list(l), identical(names(l), c("b","a","c")), sapply(l, is.data.table), sapply(l, nrow) == c(b=6L, a=6L, c=0L), sapply(l, ncol) == c(b=4L, a=4L, c=4L)))
+l = split(fdt, by = "x2")
+test(1639.64, TRUE, all(is.list(l), identical(names(l), c("d","e","c","a")), sapply(l, is.data.table), sapply(l, nrow) == c(d=4L, e=4L, c=4L, a=0L), sapply(l, ncol) == c(d=4L, e=4L, c=4L, a=4L)))
+l = split(fdt, by = c("x3","x1"), flatten=FALSE)
+test(1639.65, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e","a","z")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(h=c("b","a","c"), f=c("b","a","c"), g=c("a","b","c"), e=c("a","b","c"), a=c("a","b","c"), z=c("a","b","c"))),
+    sapply(l, sapply, nrow) == c(rep(c(3L,0L,0L), 4), rep(0L, 6)),
+    sapply(l, sapply, ncol) == rep(4L, 18)
+))
+l = split(fdt, by = "x1", drop=TRUE) # empty levels in factor and drop=TRUE
+test(1639.66, TRUE, all(is.list(l), identical(names(l), c("b","a")), sapply(l, is.data.table), sapply(l, nrow) == c(b=6L, a=6L), sapply(l, ncol) == c(b=4L, a=4L)))
+l = split(fdt, by = "x2", drop=TRUE)
+test(1639.67, TRUE, all(is.list(l), identical(names(l), c("d","e","c")), sapply(l, is.data.table), sapply(l, nrow) == c(d=4L, e=4L, c=4L), sapply(l, ncol) == c(d=4L, e=4L, c=4L)))
+l = split(fdt, by = c("x3","x1"), drop=TRUE, flatten=FALSE)
+test(1639.68, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(fdt, by = c("x3","x1"), sorted=TRUE, flatten=FALSE) # test order for empty levels in factor and drop=FALSE
+test(1639.69, TRUE, all(
+    is.list(l), identical(names(l), c("a","e","f","g","h","z")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), setNames(rep(list(c("a","b","c")), 6), c("a","e","f","g","h","z"))),
+    sapply(l, sapply, nrow) == c(0L,0L,0L,3L,0L,0L,0L,3L,0L,3L,0L,0L,0L,3L,0L,0L,0L,0L),
+    sapply(l, sapply, ncol) == rep(4L, 18)
+))
+l = split(fdt, by = c("x3","x1"), sorted=TRUE, drop=TRUE, flatten=FALSE) # test order for empty levels in factor and drop=TRUE
+test(1639.70, TRUE, all(
+    is.list(l), identical(names(l), c("e","f","g","h")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(e=c("a"), f=c("b"), g=c("a"), h=c("b"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+sdf = split(as.data.frame(fdt), list(fdt$x3, fdt$x1)) # split.data.frame match on by = 2L and empty levels, drop=FALSE
+test(1639.71, unlist(split(fdt, by = c("x3","x1"), sorted=TRUE, flatten=FALSE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT))
+sdf = split(as.data.frame(fdt), list(fdt$x3, fdt$x1), drop=TRUE) # split.data.frame match on by = 2L and empty levels, drop=TRUE
+test(1639.72, unlist(split(fdt, by = c("x3","x1"), sorted=TRUE, drop=TRUE, flatten=FALSE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT))
+# - [x] split by reference to column names - `by` - factor and character column ----
+fdt = dt[, .(x1 = x1,
+             x2 = x2,
+             x3 = as.factor(x3),
+             y = y)]
+l = split(fdt, by = c("x2","x3"), flatten=FALSE)
+test(1639.73, TRUE, all(
+    is.list(l), identical(names(l), c("d","e","c")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(d=c("h","f","e","g"), e=c("h","f","g","e"), c=c("f","h","e","g"))),
+    sapply(l, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(fdt, by = c("x1","x2","x3"), flatten=FALSE) # empty levels in x3 after subset on x1, x2
+test(1639.74, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(b=list(d=c("h","f","e","g"), e=c("h","f","e","g"), c=c("f","h","e","g")), a=list(e=c("g","e","f","h"), d=c("e","g","f","h"), c=c("e","g","f","h")))),
+    sapply(l, sapply, sapply, nrow) == rep(c(1L,1L,0L,0L), 6),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 24)
+))
+l = split(fdt, by = c("x1","x2","x3"), drop=TRUE, flatten=FALSE)
+test(1639.75, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(b=list(d=c("h","f"), e=c("h","f"), c=c("f","h")), a=list(e=c("g","e"), d=c("e","g"), c=c("e","g")))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 12)
+))
+l = split(fdt, by = c("x3","x1"), flatten=FALSE) # multi col rev
+test(1639.76, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(fdt, by = c("x3","x2","x1"), flatten=FALSE)
+test(1639.77, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(h=list(d=c("b"), e=c("b"), c=c("b")), f=list(e=c("b"), c=c("b"), d=c("b")), g=list(e=c("a"), d=c("a"), c=c("a")), e=list(e=c("a"), d=c("a"), c=c("a")))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 12)
+))
+fdt = dt[, .(x1 = x1, # empty levels in factor and drop=FALSE
+             x2 = x2,
+             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
+             y = y)]
+l = split(fdt, by = c("x3","x1"), flatten=FALSE) # empty levels in factor and drop=FALSE
+test(1639.78, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e","a","z")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"), a=character(), z=character())),
+    identical(lapply(l, lapply, nrow), list(h=list(b=3L), f=list(b=3L), g=list(a=3L), e=list(a=3L), a=structure(list(), .Names = character(0)), z=structure(list(), .Names = character(0)))),
+    identical(lapply(l, lapply, ncol), list(h=list(b=4L), f=list(b=4L), g=list(a=4L), e=list(a=4L), a=structure(list(), .Names = character(0)), z=structure(list(), .Names = character(0))))
+))
+
+l = split(fdt, by = c("x3","x1"), drop=TRUE, flatten=FALSE) # empty levels in factor and drop=TRUE
+test(1639.79, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(fdt, by = c("x3","x1"), sorted=TRUE, flatten=FALSE) # test order for empty levels in factor and drop=FALSE
+test(1639.80, TRUE, all(
+    is.list(l), identical(names(l), c("a","e","f","g","h","z")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(a=character(), e=c("a"), f=c("b"), g=c("a"), h=c("b"), z=character())),
+    identical(lapply(l, lapply, nrow), list(a=structure(list(), .Names = character(0)), e=list(a=3L), f=list(b=3L), g=list(a=3L), h=list(b=3L), z=structure(list(), .Names = character(0)))),
+    identical(lapply(l, lapply, ncol), list(a=structure(list(), .Names = character(0)), e=list(a=4L), f=list(b=4L), g=list(a=4L), h=list(b=4L), z=structure(list(), .Names = character(0))))
+))
+l = split(fdt, by = c("x3","x1"), sorted=TRUE, drop=TRUE, flatten=FALSE) # test order for empty levels in factor and drop=TRUE
+test(1639.81, TRUE, all(
+    is.list(l), identical(names(l), c("e","f","g","h")),
+    sapply(l, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, names), list(e=c("a"), f=c("b"), g=c("a"), h=c("b"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+# - [x] split by reference to column names - `by` - for character column ----
+l = split(dt, by = "x1") # single col
+test(1639.82, TRUE, all(is.list(l), identical(names(l), c("b","a")), sapply(l, is.data.table), sapply(l, nrow) == c(b=6L, a=6L), sapply(l, ncol) == c(b=4L, a=4L)))
+l = split(dt, by = "x2")
+test(1639.83, TRUE, all(is.list(l), identical(names(l), c("d","e","c")), sapply(l, is.data.table), sapply(l, nrow) == c(d=4L, e=4L, c=4L), sapply(l, ncol) == c(d=4L, e=4L, c=4L)))
+l = split(dt, by = "x3")
+test(1639.84, TRUE, all(is.list(l), identical(names(l), c("h","f","g","e")), sapply(l, is.data.table), sapply(l, nrow) == c(h=3L, f=3L, g=3L, e=3L), sapply(l, ncol) == c(h=4L, f=4L, g=4L, e=4L)))
+l = split(dt, by = c("x1","x2"), flatten=FALSE) # multi col
+test(1639.85, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(b=c("d","e","c"), a=c("e","d","c"))),
+    sapply(l, sapply, nrow) == rep(2L, 6),
+    sapply(l, sapply, ncol) == rep(4L, 6)
+))
+l = split(dt, by = c("x1","x3"), flatten=FALSE)
+test(1639.86, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(b=c("h","f"), a=c("g","e"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(dt, by = c("x2","x3"), flatten=FALSE)
+test(1639.87, TRUE, all(
+    is.list(l), identical(names(l), c("d","e","c")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(d=c("h","f","e","g"), e=c("h","f","g","e"), c=c("f","h","e","g"))),
+    sapply(l, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(dt, by = c("x1","x2","x3"), flatten=FALSE)
+test(1639.88, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(b=list(d=c("h","f"), e=c("h","f"), c=c("f","h")), a=list(e=c("g","e"), d=c("e","g"), c=c("e","g")))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 12)
+))
+l = split(dt, by = c("x3","x1"), flatten=FALSE) # multi col rev
+test(1639.89, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(4L, 4)
+))
+l = split(dt, by = c("x3","x2","x1"), flatten=FALSE)
+test(1639.90, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(h=list(d="b", e="b", c="b"), f=list(e="b", c="b", d="b"), g=list(e="a", d="a", c="a"), e=list(e="a",d="a",c="a"))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(4L, 12)
+))
+# - [x] allow to keep or drop field on which we split - `keep.by` argument ----
+l = split(dt, by = "x1", keep.by = FALSE)
+test(1639.91, TRUE, all(is.list(l), identical(names(l), c("b","a")), sapply(l, is.data.table), sapply(l, nrow) == c(b=6L, a=6L), sapply(l, ncol) == c(b=3L, a=3L)))
+l = split(dt, by = "x2", keep.by = FALSE)
+test(1639.92, TRUE, all(is.list(l), identical(names(l), c("d","e","c")), sapply(l, is.data.table), sapply(l, nrow) == c(d=4L, e=4L, c=4L), sapply(l, ncol) == c(d=3L, e=3L, c=3L)))
+l = split(dt, by = "x3", keep.by = FALSE)
+test(1639.93, TRUE, all(is.list(l), identical(names(l), c("h","f","g","e")), sapply(l, is.data.table), sapply(l, nrow) == c(h=3L, f=3L, g=3L, e=3L), sapply(l, ncol) == c(h=3L, f=3L, g=3L, e=3L)))
+l = split(dt, by = c("x1","x2"), keep.by = FALSE, flatten=FALSE) # multi col
+test(1639.94, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(b=c("d","e","c"), a=c("e","d","c"))),
+    sapply(l, sapply, nrow) == rep(2L, 6),
+    sapply(l, sapply, ncol) == rep(2L, 6)
+))
+l = split(dt, by = c("x1","x3"), keep.by = FALSE, flatten=FALSE)
+test(1639.95, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(b=c("h","f"), a=c("g","e"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(2L, 4)
+))
+l = split(dt, by = c("x2","x3"), keep.by = FALSE, flatten=FALSE)
+test(1639.96, TRUE, all(
+    is.list(l), identical(names(l), c("d","e","c")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(d=c("h","f","e","g"), e=c("h","f","g","e"), c=c("f","h","e","g"))),
+    sapply(l, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, ncol) == rep(2L, 12)
+))
+l = split(dt, by = c("x1","x2","x3"), keep.by = FALSE, flatten=FALSE)
+test(1639.97, TRUE, all(
+    is.list(l), identical(names(l), c("b","a")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(b=list(d=c("h","f"), e=c("h","f"), c=c("f","h")), a=list(e=c("g","e"), d=c("e","g"), c=c("e","g")))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(1L, 12)
+))
+l = split(dt, by = c("x3","x1"), keep.by = FALSE, flatten=FALSE) # multi col rev
+test(1639.98, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    identical(lapply(l, names), list(h=c("b"), f=c("b"), g=c("a"), e=c("a"))),
+    sapply(l, sapply, nrow) == rep(3L, 4),
+    sapply(l, sapply, ncol) == rep(2L, 4)
+))
+l = split(dt, by = c("x3","x2","x1"), keep.by = FALSE, flatten=FALSE)
+test(1639.99, TRUE, all(
+    is.list(l), identical(names(l), c("h","f","g","e")), 
+    sapply(l, function(x) !is.data.table(x) && is.list(x)), 
+    sapply(l, sapply, function(x) !is.data.table(x) && is.list(x)),
+    identical(lapply(l, lapply, names), list(h=list(d="b", e="b", c="b"), f=list(e="b", c="b", d="b"), g=list(e="a", d="a", c="a"), e=list(e="a",d="a",c="a"))),
+    sapply(l, sapply, sapply, nrow) == rep(1L, 12),
+    sapply(l, sapply, sapply, ncol) == rep(1L, 12)
+))
+# - [x] support recursive split into nested lists for `length(by) >= 2L` via `flatten=FALSE`; `flatten` arg (default TRUE) produces a non-nested list of data.tables ----
+fdt = dt[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3] # factors; flatten consistent with non-flatten when length(by)==1L
+test(1639.100, split(fdt, by = "x1"), split(fdt, by = "x1", flatten = FALSE)) # length(by) == 1L should be same as flatten=FALSE # ref data already checked in above test
+test(1639.101, split(fdt, by = "x2"), split(fdt, by = "x2", flatten = FALSE))
+test(1639.102, split(fdt, by = "x3"), split(fdt, by = "x3", flatten = FALSE))
+test(1639.103, split(fdt, by = "x1", sorted = TRUE), split(fdt, by = "x1", flatten = FALSE, sorted = TRUE))
+test(1639.104, split(fdt, by = "x3", sorted = TRUE), split(fdt, by = "x3", flatten = FALSE, sorted = TRUE))
+test(1639.105, split(fdt, by = "x1", sorted = TRUE, drop = TRUE), split(fdt, by = "x1", flatten = FALSE, sorted = TRUE, drop = TRUE))
+test(1639.106, split(fdt, by = "x1", sorted = TRUE, keep.by = FALSE), split(fdt, by = "x1", flatten = FALSE, sorted = TRUE, keep.by = FALSE))
+test(1639.107, unlist(split(fdt, by = c("x1","x2"), sorted = TRUE, flatten = FALSE), recursive = FALSE), split(fdt, by = c("x1","x2"), sorted = TRUE)) # by two variables - match after unlist nested one # sorted=TRUE
+test(1639.108, unlist(split(fdt, by = c("x1","x2"), sorted = FALSE, flatten = FALSE), recursive = FALSE), split(fdt, by = c("x1","x2"), sorted = FALSE)) # sorted=FALSE
+test(1639.109, unlist(split(fdt, by = c("x1","x2"), sorted = TRUE, keep.by = FALSE, flatten = FALSE), recursive = FALSE), split(fdt, by = c("x1","x2"), sorted = TRUE, keep.by = FALSE)) # keep.by=FALSE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2)) # vs split.data.frame by 2L # this dispatches to `interaction(x1, x2)`, which results in a different order; see: levels(interaction(1:2,1:2)) vs CJ(1:2,1:2)
+test(1639.110, split(fdt, by = c("x1","x2"), sorted = TRUE), lapply(sdf[sort(names(sdf))], setDT))# vs split.data.frame by 2L drop=FALSE
+test(1639.111, unlist(split(fdt, by = c("x1","x2"), flatten = FALSE, sorted = TRUE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT))# vs split.data.frame by 2L drop=FALSE, flatten=FALSE + unlist
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2), drop=TRUE)
+test(1639.112, split(fdt, by = c("x1","x2"), sorted = TRUE, drop=TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=TRUE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2, fdt$x3)) # vs split.data.frame by 3L
+test(1639.113, split(fdt, by = c("x1","x2","x3"), flatten = TRUE, sorted = TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 3L drop=FALSE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2, fdt$x3), drop=TRUE)
+test(1639.114, split(fdt, by = c("x1","x2","x3"), flatten = TRUE, sorted = TRUE, drop=TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 3L drop=TRUE
+fdt = dt[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L], # empty levels in factors
+             x2 = as.factor(c("a", as.character(x2)))[-1L],
+             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
+             y = y)]
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2)) # vs split.data.frame by 2L # this dispatches to `interaction(x1, x2)`, which results in a different order; see: levels(interaction(1:2,1:2)) vs CJ(1:2,1:2)
+test(1639.115, split(fdt, by = c("x1","x2"), sorted = TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=FALSE
+test(1639.116, unlist(split(fdt, by = c("x1","x2"), flatten = FALSE, sorted = TRUE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=FALSE, flatten=FALSE + unlist
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2), drop=TRUE)
+test(1639.117, split(fdt, by = c("x1","x2"), sorted = TRUE, drop=TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=TRUE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2, fdt$x3)) # vs split.data.frame by 3L
+test(1639.118, split(fdt, by = c("x1","x2","x3"), flatten = TRUE, sorted = TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 3L drop=FALSE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2, fdt$x3), drop=TRUE)
+test(1639.119, split(fdt, by = c("x1","x2","x3"), flatten = TRUE, sorted = TRUE, drop=TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 3L drop=TRUE
+sdf = split(as.data.frame(fdt[, .SD, .SDcols=c("x3","y")]), f=list(fdt$x1, fdt$x2)) # flatten, keep.by=FALSE and empty lists # this dispatches to `interaction(x1, x2)`, which results in a different order; see: levels(interaction(1:2,1:2)) vs CJ(1:2,1:2)
+test(1639.120, split(fdt, by = c("x1","x2"), sorted = TRUE, keep.by = FALSE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=FALSE
+test(1639.121, unlist(split(fdt, by = c("x1","x2"), flatten = FALSE, sorted = TRUE, keep.by = FALSE), recursive = FALSE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=FALSE, flatten=FALSE + unlist
+sdf = split(as.data.frame(fdt[, .SD, .SDcols=c("x3","y")]), f=list(fdt$x1, fdt$x2), drop=TRUE)
+test(1639.122, split(fdt, by = c("x1","x2"), sorted = TRUE, drop=TRUE, keep.by = FALSE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=TRUE
+# - [x] edge cases for `by` and `sorted`, 0 rows, 1 unique value in cols, drop ----
+test(1639.123, length(split(dt[0L], by = "x1")), 0L) # drop=FALSE vs split.data.frame: expanding the list with empty levels won't work on characters; use a factor with defined levels, including unused ones.
+test(1639.124, length(split(as.data.frame(dt[0L]), df$x1)), 2L) # unlike data.frame because character != factor
+fdt = dt[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3] # factors no empty levels
+test(1639.125, length(split(fdt[0L], by = "x1")), 2L)
+test(1639.126, length(split(as.data.frame(fdt[0L]), df$x1)), 2L) # match on factors work
+test(1639.127, split(fdt[0L], by = "x1"), lapply(split(as.data.frame(fdt[0L]), df$x1), setDT)) # also match on the complete structure
+fdt = dt[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L], # factors empty levels
+             x2 = as.factor(c("a", as.character(x2)))[-1L],
+             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
+             y = y)]
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2)) # vs split.data.frame by 2L; this dispatches to `interaction(x1, x2)`, which yields a different order, see: levels(interaction(1:2,1:2)) vs CJ(1:2,1:2)
+test(1639.128, split(fdt, by = c("x1","x2"), sorted = TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=FALSE
+sdf = split(as.data.frame(fdt), f=list(fdt$x1, fdt$x2), drop=TRUE)
+test(1639.129, split(fdt, by = c("x1","x2"), sorted = TRUE, drop=TRUE), lapply(sdf[sort(names(sdf))], setDT)) # vs split.data.frame by 2L drop=TRUE
+test(1639.130, split(dt[0L], by = "x1"), structure(list(), .Names = character(0))) # 0 rows; character column has no factor levels to expand, so the result is an empty list
+test(1639.131, split(fdt[0L], by = "x1"), lapply(c(a=1L,b=2L,c=3L), function(i) data.table(x1=factor(levels = c("a","b","c")),x2=factor(levels = c("a","c","d","e")),x3=factor(levels = c("a","e","f","g","h","z")),y=numeric()))) # expand empty levels
+test(1639.132, split(dt[0L], by = "x1", sorted = TRUE), structure(list(), .Names = character(0)))
+test(1639.133, split(fdt[0L], by = "x1", sorted = TRUE), lapply(c(a=1L,b=2L,c=3L), function(i) data.table(x1=factor(levels = c("a","b","c")),x2=factor(levels = c("a","c","d","e")),x3=factor(levels = c("a","e","f","g","h","z")),y=numeric()))) # same as non-sorted: with no data, all groups are appended at the end in sorted order anyway
+dt2 = copy(dt)[, "l" := lapply(1:12, function(i) i)] # non-atomic type to 'by' should raise error
+test(1639.134, split(dt2, by = "l"), error = "argument 'by' must refer only to atomic type columns, classes of 'l' columns are not atomic type")
+# - [x] additional tests for names consistency with data.frame, and current examples in SO
+df = data.frame(product = c("b", "a", "b", "a"),
+                value = c(sample(1:10,4)),
+                year = c(2001, 2001, 2000, 2000))
+tmp = as.data.table(df)[, list(grp=list(.SD)), by=.(product, year), .SDcols=names(df)] # http://stackoverflow.com/a/33068928/2490497
+setattr(ans <- tmp$grp, 'names', paste(tmp$product, tmp$year, sep="."))
+dt = as.data.table(df) # http://stackoverflow.com/q/33068791/2490497
+dt[, grp := .GRP, by = list(product,year)] 
+setkey(dt, grp)
+o2 = dt[, list(list(.SD)), by = grp]$V1
+setattr(o2, 'names', paste(tmp$product, tmp$year, sep=".")) # names reused
+test(1639.135, o2, ans)
+lapply(ans, setattr, ".data.table.locked", NULL)
+sort.by.names = function(x) x[sort(names(x))]
+test(1639.136, sort.by.names(ans), sort.by.names(split(as.data.table(df), f=list(df$product, df$year))))
+test(1639.137, sort.by.names(ans), sort.by.names(unlist(split(setDT(df), by=c("product","year"), flatten = FALSE), recursive = FALSE)))
+test(1639.138, ans, split(as.data.table(df), by=c("product","year")))
+test(1639.139, sort.by.names(ans), sort.by.names(unlist(split(as.data.table(df), by=c("product","year"), flatten=FALSE), recursive = FALSE)))
+# test that split preallocates columns in results #1908
+if (base::getRversion() > "3.0.0") {
+  dt = data.table(x=rexp(100),y=rep(LETTERS[1:10], 10))
+  dtL = split(dt, by = "y")
+  test(1639.140, dim(dtL[[1]][, x2 := -x]), c(10L,3L))
+  test(1639.141, all(sapply(dtL, truelength) > 1000))
+}
+
+# allow x's cols (specifically x's join cols) to be referred to using 'x.' syntax
+# patch for #1615. Note that I specifically have not implemented x[y, aa, on=c(aa="bb")]
+# to refer to x's join column as well, because x[i, col] == x[i][, col] would no longer be TRUE.
+x <- data.table(aa = 1:3, cc = letters[1:3])
+y <- data.table(bb = 3:5, dd = 3:1)
+test(1640.1, x[y, x.aa, on=c(aa="bb")], INT(3,NA,NA))
+test(1640.2, x[y, c(.SD, .(x.aa=x.aa)), on=c(aa="bb")], data.table(aa=3:5, cc=c("c", NA,NA), x.aa=INT(3,NA,NA)))
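As a standalone illustration of the 'x.' prefix exercised in tests 1640.x above (a minimal sketch outside the numbered suite; the result comment is descriptive, not an asserted output): in a join the result's join column carries i's values, so the 'x.' prefix is the way to recover x's side of the join column.

```r
library(data.table)
X <- data.table(aa = 1:3, cc = letters[1:3])
Y <- data.table(bb = 3:5, dd = 3:1)
# 'aa' in j shows the join values taken from Y$bb; 'x.aa' shows X's own aa,
# which is NA where Y$bb had no match in X$aa
X[Y, .(aa, x.aa), on = c(aa = "bb")]
```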
+
+# tests for non-equi joins
+# function to create a random data.table with all necessary columns
+# set.seed(45L) # for testing..
+set.seed(unclass(Sys.time()))
+nq_fun = function(n=100L) {
+    i1 = sample(sample(n, 10L), n, TRUE)
+    i2 = sample(-n/2:n/2, n, TRUE)
+    i3 = sample(-1e6:1e6, n, TRUE)
+    i4 = sample(c(NA_integer_, sample(-n:n, 10L, FALSE)), n, TRUE)
+
+    d1 = sample(rnorm(10L), n, TRUE)
+    d2 = sample(rnorm(50), n, TRUE)
+    d3 = sample(c(Inf, -Inf, NA, NaN, runif(10L)), n, TRUE)
+    d4 = sample(c(NA, NaN, rnorm(10L)), n, TRUE)
+
+    c1 = sample(letters[1:5], n, TRUE)
+    c2 = sample(LETTERS[1:15], n, TRUE)
+
+    dt = data.table(i1,i2,i3,i4, d1,d2,d3,d4, c1,c2)
+    if ("package:bit64" %in% search()) {
+        I1 = as.integer64(sample(sample(n, 10L), n, TRUE))
+        I2 = as.integer64(sample(-n/2:n/2, n, TRUE))
+        I3 = as.integer64(sample(-1e6:1e6, n, TRUE))
+        I4 = as.integer64(sample(c(NA_integer_, sample(-n:n, 10L, FALSE)), n, TRUE))
+        dt = cbind(dt, data.table(I1,I2,I3,I4))
+    }
+    dt
+}
+
+dt1 = nq_fun(400L)
+dt2 = nq_fun(50L)
+x = na.omit(dt1)
+y = na.omit(dt2)
+nqjoin_test <- function(x, y, k=1L, test_no, mult="all") {
+    ops = c("==", ">=", "<=", ">", "<")
+    xclass = sapply(x, class)
+    runcmb = combn(names(x), k)
+    runcmb = as.data.table(runcmb[, 1:min(100L, ncol(runcmb)), drop=FALSE]) # max 100 combinations to test
+    runops = lapply(runcmb, function(cols) {
+        thisops = sample(ops, k, TRUE)
+        thisops[grepl("^c", cols)] = "=="
+        thisops
+    })
+    is_only_na <- function(x) is.na(x) & !is.nan(x)
+    is_int64 <- function(x) "integer64" %in% class(x)
+    construct <- function(cols, vals, ops) {
+        expr = lapply(seq_along(cols), function(i) {
+            if (is_int64(vals[[i]])) {
+                if (is.na.integer64(vals[[i]])) if (ops[i] %in% c(">", "<")) quote(integer(0)) else as.call(list(quote(is.na.integer64), as.name(cols[[i]])))
+                else as.call(list(as.name(ops[[i]]), as.name(cols[[i]]), as.integer(vals[[i]])))
+                # don't know how to construct a call with int64 -- vals[[i]] gets converted to NaN
+            } else {
+                if (is.nan(vals[[i]])) if (ops[i] %in% c(">", "<")) quote(integer(0)) else as.call(list(quote(is.nan), as.name(cols[[i]])))
+                else if (is_only_na(vals[[i]])) if (ops[i] %in% c(">", "<")) quote(integer(0)) else as.call(list(quote(is_only_na), as.name(cols[[i]])))
+                else as.call(list(as.name(ops[[i]]), as.name(cols[[i]]), vals[[i]]))
+            }
+        })
+        ans = expr[[1L]]
+        lapply(expr[-1L], function(e) ans <<- as.call(list(quote(`&`), ans, e)))
+        ans
+    }
+    check <- function(x, y, cols, ops, mult="all") {
+        expr = lapply(1:nrow(y), function(i) {
+            expr = construct(cols, as.list(y[i, cols, with=FALSE]), ops)
+        })
+        ans = lapply(expr, function(e) {
+                val = x[eval(e)]
+                if (!nrow(val)) return(val)
+                val = if (mult=="first") val[1L] else if (mult=="last") val[.N] else val
+            })
+        rbindlist(ans)
+    }
+    nq <- function(x, y, cols, ops, nomatch=0L, mult="all") {
+        sd_cols = c(paste("x.", cols, sep=""), setdiff(names(x), cols))
+        ans = x[y, mget(sd_cols, as.environment(-1)), on = paste(cols, ops, cols, sep=""), allow=TRUE, nomatch=nomatch, mult=mult]
+        setnames(ans, gsub("^x[.]", "", names(ans)))
+        setcolorder(ans, names(x))[]
+    }
+    for (i in seq_along(runcmb)) {
+        thiscols = runcmb[[i]]
+        thisops = runops[[i]]
+        # cat("k = ", k, "\ti = ", i, "\t thiscols = [", paste(thiscols,collapse=","), "]\t thisops = [", paste(thisops,collapse=","), "]\t ", sep="")
+        ans1 = nq(x, y, thiscols, thisops, 0L, mult=mult)
+        ans2 = check(x, y, thiscols, thisops, mult=mult)
+        test_no = signif(test_no+.001, 7)
+        test(test_no, all.equal(ans1,ans2,ignore.row.order=TRUE), TRUE)
+        # if (identical(all.equal(ans1,ans2,ignore.row.order=TRUE), TRUE)) cat("successful\n") else stop("failed\n")
+    }
+}
+
+if (TRUE) {  # set to FALSE temporarily if needed to rerun valgrind, as these tests are very slow
+  # without NAs in x and i
+  nqjoin_test(x, y, 1L, 1641.0, mult="all")
+  nqjoin_test(x, y, 2L, 1642.0, mult="all")
+  nqjoin_test(x, y, 1L, 1643.0, mult="first")
+  nqjoin_test(x, y, 2L, 1644.0, mult="first")
+  nqjoin_test(x, y, 1L, 1645.0, mult="last")
+  nqjoin_test(x, y, 2L, 1646.0, mult="last")
+
+  # with NAs in x and i
+  nqjoin_test(dt1, dt2, 1L, 1647.0, mult="all")
+  nqjoin_test(dt1, dt2, 2L, 1648.0, mult="all")
+  nqjoin_test(dt1, dt2, 1L, 1649.0, mult="first")
+  nqjoin_test(dt1, dt2, 2L, 1650.0, mult="first")
+  nqjoin_test(dt1, dt2, 1L, 1651.0, mult="last")
+  nqjoin_test(dt1, dt2, 2L, 1652.0, mult="last")
+}
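The randomized `nqjoin_test` harness above is hard to read in isolation; a minimal hand-written non-equi join sketch (assumed data, not part of the suite) shows the `on=` operator syntax it exercises:

```r
library(data.table)
dt  <- data.table(x = c(1, 4, 7), v = c("a", "b", "c"))
rng <- data.table(lo = 2, hi = 8)
# rows of dt whose x satisfies x >= lo and x <= hi; the result's join
# columns report rng's bounds, so select x.x to recover dt's original x
dt[rng, .(x.x, v), on = .(x >= lo, x <= hi), nomatch = 0L]
```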
+
+# TODO: add tests for nomatch=NA
+# tested, but takes quite some time, so commented out for now
+# nqjoin_test(x, y, 3L,1643.0)
+# nqjoin_test(dt1,dt2,3L,1652.0)
+
+# nqjoin_test(  x,dt2,1L,1644.0) # without NA only in x
+# nqjoin_test(  x,dt2,2L,1645.0)
+# nqjoin_test(  x,dt2,3L,1646.0)
+# nqjoin_test(dt1,  y,1L,1647.0) # without NA only in i
+# nqjoin_test(dt1,  y,2L,1648.0)
+# nqjoin_test(dt1,  y,3L,1649.0)
+
+# test for the issues Jan spotted...
+dt = data.table(id="x", a=as.integer(c(3,8,8,15,15,15,16,22,22,25,25)), b=as.integer(c(9,10,25,19,22,25,38,3,9,7,28)), c=as.integer(c(22,33,44,14,49,44,40,25,400,52,77)))
+set.seed(1L)
+dt=dt[sample(.N)]
+test(1653.1, uniqueN(dt[dt, .(x.id, x.a, x.b, x.c, i.id, i.a, i.b, i.c), which=FALSE, on = c("id==id","a>=a","b>=b"), allow.cartesian=TRUE]), 42L)
+test(1653.2, x[y, .(x.i1, x.i2, x.i3, x.i4, x.d1, x.d2, x.d3, x.d4, x.c1, x.c2, i.i1, i.i2, i.i3, i.i4, i.d1, i.d2, i.d3, i.d4, i.c1, i.c2), on = c("i4==i4", "i1>=i1", "d4<=d4", "i3==i3", "d3>d3", "i2>i2", "d2>=d2", "d1>d1"), allow.cartesian = TRUE], x[y, .(x.i1, x.i2, x.i3, x.i4, x.d1, x.d2, x.d3, x.d4, x.c1, x.c2, i.i1, i.i2, i.i3, i.i4, i.d1, i.d2, i.d3, i.d4, i.c1, i.c2), on = c("i4==i4", "i1>=i1", "d4<=d4", "i3==i3", "d3>d3", "i2>i2", "d2>=d2", "d1>d1"), allow.cartesian = TRUE]) # e [...]
+
+# error on any op other than "==" on char type
+dt1 = data.table(x=sample(letters[1:2], 10, TRUE), y=sample(c(1L,5L,7L), 10, TRUE), z=1:10, k=11:20)
+dt2 = data.table(x=c("b", "a"), y=c(1L, 9L))
+test(1654, dt1[dt2, on="x>x"], error="Only '==' operator")
+
+# on= with .() syntax, #1257
+dt1 = data.table(x=sample(letters[1:2], 10, TRUE), y=sample(c(1L,5L,7L), 10, TRUE), z=1:10, k=11:20)
+dt2 = data.table(x=c("b", "a"), y=c(1L, 9L))
+test(1655.1, dt1[dt2, on=.(x)], dt1[dt2, on="x"])
+test(1655.2, dt1[dt2, on=.(x==x)], dt1[dt2, on=c("x==x")])
+test(1655.3, dt1[dt2, on=.(x==x)], dt1[dt2, on=c("x"="x")])
+test(1655.4, dt1[dt2, on=.(y>=y)], dt1[dt2, on=c("y>=y")])
+test(1655.5, dt1[dt2, on=.(x==x, y>=y)], dt1[dt2, on=c("x==x", "y>=y")])
+
+# Patching another issue spotted by Jan
+dt = data.table(id="x", a=as.integer(c(3,8,8,15,15,15,16,22,22,25,25)), 
+                b=as.integer(c(9,10,25,19,22,25,38,3,9,7,28)),
+                c=as.integer(c(22,33,44,14,49,44,40,25,400,52,77)))
+set.seed(1L)
+dt=dt[sample(.N)][, row_id := 1:.N]
+test(1656, nrow(dt[dt, .(x.id, x.a, x.b, x.c, x.row_id, i.id, i.a, i.b, i.c, i.row_id), on = .(c,b<=b,id,a>=a), allow.cartesian = TRUE]), 12L) # just to check that there's no warning
+
+# between is vectorised, #534
+set.seed(1L)
+dt = data.table(x=sample(3,10,TRUE), y=sample(2,10,TRUE), z=sample(5,10,TRUE))
+test(1657, dt[x %between% list(y,z)], dt[x>=y & x<=z])
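A minimal sketch of the vectorised form tested above (assumed data, no asserted output): with a scalar pair `%between%` recycles the bounds, while a list of two vectors gives per-row bounds.

```r
library(data.table)
dt <- data.table(x = c(2, 5), lo = c(1, 6), hi = c(3, 9))
dt[x %between% c(1, 3)]        # scalar bounds, recycled across rows
dt[x %between% list(lo, hi)]   # per-row bounds: keeps rows with lo <= x <= hi
```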
+
+oldverbose = options(datatable.verbose=FALSE)
+
+# fwrite tests
+# without quoting
+test(1658.1, fwrite(data.table(a=c(NA, 2, 3.01), b=c('foo', NA, 'bar'))),
+             output=paste(c("a,b",",foo","2,","3.01,bar"),collapse=""))
+
+# with quoting and qmethod="escape"
+test(1658.2, fwrite(data.table(
+    a=c(NA, 2, 3.01),
+    `other column`=c('foo bar', NA, 'quote" and \\ bs \n and newline')),
+    quote=TRUE, qmethod="escape"),
+    output='"a","other column","foo bar"2,3.01,"quote\\" and \\\\ bs  and newline"')
+
+# with quoting and qmethod="double" (default)
+test(1658.3, fwrite(data.table(
+    a=c(NA, 1.2e-100, 3.01),
+    "other \"column"=c('foo bar', NA, 'quote" and \\ bs')),
+    quote=TRUE, qmethod="double"),
+    output='"a","other ""column"\n,"foo bar"\n1.2e-100,\n3.01,"quote"" and \\ bs"\n')
+
+# presence of " triggers auto quoting as well, #1925
+test(1658.4, fwrite(data.table(a=1:4, b=c('"foo','ba"r','baz"','a "quoted" region'))),
+    output='a,b\n1,"""foo"\n2,"ba""r"\n3,"baz"""\n4,"a ""quoted"" region"')
+test(1658.5, fwrite(data.table(a=1:4, b=c('"foo','ba"r','baz"','a "quoted" region')), qmethod='escape'),
+    output='a,b\n1,"\\"foo"\n2,"ba\\"r"\n3,"baz\\""\n4,"a \\"quoted\\" region"')
+# NB: sep2[2] triggering quoting when list columns are present is tested in test 1736
+
+# changing sep
+DT = data.table(a="foo", b="ba\"r")
+ans = '"a";"b"\n"foo";"ba""r"\n'
+test(1658.41, fwrite(DT, sep=";", quote=TRUE, qmethod="double"), output=ans)
+test(1658.42, write.table(DT, sep=";", qmethod="double", row.names=FALSE), output=ans)
+ans = '"a";"b"\n"foo";"ba\\"r"\n'
+test(1658.43, fwrite(DT, sep=";", quote=TRUE, qmethod="escape"), output=ans)
+test(1658.44, write.table(DT, sep=";", qmethod="escape", row.names=FALSE), output=ans)
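The two `qmethod` values compared above differ only in how an embedded quote is written inside a quoted field; a minimal console sketch with assumed input (comments describe intent, not asserted output):

```r
library(data.table)
DT <- data.table(a = 'he said "hi"')
fwrite(DT, qmethod = "double")  # CSV-standard: embedded " is doubled to ""
fwrite(DT, qmethod = "escape")  # embedded " is backslash-escaped instead
```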
+
+if (.Platform$OS.type=="unix") {
+  # on Linux we can create Windows-format files if we want
+  test(1658.45, fwrite(data.table(a="foo", b="bar"), eol="\r\n", quote=TRUE),
+               output = '"a","b""foo","bar"')
+}
+
+# changing NA
+test(1658.6, fwrite(data.table(a=c("foo", NA), b=c(1, NA)), na="NA", quote=TRUE),
+            output='"a","b"\n"foo",1\nNA,NA\n')
+
+# no col.names
+test(1658.7, fwrite(data.table(a="foo", b="bar"), col.names=F, quote=TRUE),
+            output='"foo","bar"\n')
+
+test(1658.8, fwrite(data.table(a=c(1:5), b=c(1:5)), quote=TRUE),
+            output='"a","b"\n1,1\n2,2\n3,3\n4,4\n5,5\n')
+
+# block size equal to number of rows
+test(1658.9, fwrite(data.table(a=c(1:3), b=c(1:3)), quote=TRUE),
+            output='"a","b"\n1,1\n2,2\n3,3\n')
+
+# block size one bigger than number of rows
+test(1658.11, fwrite(data.table(a=c(1:3), b=c(1:3)), quote=TRUE),
+            output='"a","b"\n1,1\n2,2\n3,3\n')
+
+# block size one less than number of rows
+test(1658.12, fwrite(data.table(a=c(1:3), b=c(1:3)), quote=TRUE),
+            output='"a","b"\n1,1\n2,2\n3,3\n')
+
+# writing a data.frame
+test(1658.13, fwrite(data.frame(a="foo", b="bar"), quote=TRUE),
+            output='"a","b"\n"foo","bar"\n')
+
+# single-column data.table
+test(1658.14, fwrite(data.table(a=c(1,2,3)), quote=TRUE),
+            output='"a"\n1\n2\n3\n')
+
+# single-column data.frame
+test(1658.15, fwrite(data.frame(a=c(1,2,3)), quote=TRUE),
+            output='"a"\n1\n2\n3\n')
+
+# different column types
+test(1658.16, fwrite(data.table(
+    factor1=as.factor(c('foo', 'bar')),
+    factor2=as.factor(c(NA, "baz")),
+    bool=c(TRUE,NA),
+    ints=as.integer(c(NA, 5))), na='na', quote=TRUE),
+  output='"factor1","factor2","bool","ints"\n"foo",na,TRUE,na\n"bar","baz",na,5\n')
+
+# empty data table (headers but no rows)
+empty_dt <- data.table(a=1, b=2)[0,]
+test(1658.17, fwrite(empty_dt, quote=TRUE), output='"a","b"\n')
+
+# data.table with duplicate column names
+test(1658.18, fwrite(data.table(a=1, a=2), quote=TRUE), output='"a","a"\n1,2\n')
+
+# number of significant digits = 15
+test(1658.19, fwrite(data.table(a=1/0.9), quote=TRUE), output='"a"\n1.11111111111111\n')
+
+# test append
+f = tempfile()
+fwrite(data.table(a=c(1,2), b=c('a', 'b')), f, quote=TRUE)
+fwrite(data.table(a=c(3,4), b=c('c', 'd')), f, append=TRUE, quote=TRUE)
+test(1658.21, readLines(f), c('"a","b"','1,"a"','2,"b"','3,"c"','4,"d"'))
+unlink(f)
+
+# simple data table (reference for the error cases below)
+ok_dt <- data.table(foo="bar")
+test(1658.22, fwrite(ok_dt, quote=TRUE), output='"foo"\n"bar"\n')
+
+options(oldverbose)
+
+# wrong argument types
+test(1658.23, fwrite(ok_dt, 1), error="is.character(file).*not TRUE")
+test(1658.24, fwrite(ok_dt, quote=123), error="identical(quote.*auto.*FALSE.*TRUE")
+test(1658.25, fwrite(ok_dt, sep="..."), error="nchar(sep)")
+test(1658.26, fwrite(ok_dt, qmethod=c("double", "double")), error="length(qmethod)")
+test(1658.27, fwrite(ok_dt, col.names="foobar"), error="isLOGICAL(col.names)")
+
+# null data table (no columns)
+test(1658.28, fwrite(data.table(a=1)[NULL,]), error="ncol(x) > 0L is not TRUE")
+
+## End fwrite tests
+
+# tests for #679, inrange(), FR #707
+dt = data.table(a=c(8,3,10,7,-10), val=runif(5))
+range = data.table(start = 1:5, end = 6:10)
+test(1659.1, dt[a %inrange% range], dt[1:4])
+test(1659.2, dt[inrange(a, range$start, range$end)], dt[1:4])
+test(1659.3, dt[inrange(a, range$start, range$end, incbounds=FALSE)], dt[c(1,2,4)])
+range[4, `:=`(start=-12L, end=-4L)]
+test(1659.4, dt[a %inrange% range], dt)
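Unlike `%between%`, `%inrange%` (and `inrange()`) tests whether each value falls in *any* of a set of intervals; a minimal sketch with assumed values, following the tests above:

```r
library(data.table)
x <- c(3, 20)
inrange(x, lower = c(1, 10), upper = c(5, 15))                    # in [1,5] or [10,15]?
inrange(x, lower = c(1, 10), upper = c(5, 15), incbounds = FALSE) # open intervals instead
```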
+
+# tests for non-equi joins returning columns correctly when j is missing
+dt1 = fread('Chr    Start    End    Region
+chr6    3324    3360   Region1
+chr4    2445    2455   Region2
+chr1    1034    1090   Region4')
+dt2 = fread('Site    Chr     Location    Gene    
+Site1   chr4    2447        GeneB   
+Site2   chr9    1153        GeneT   
+Site3   chr6    3350        GeneM   
+Site4   chr1    1034        GeneC   
+Site5   chr1    2000        GeneU   
+Site6   chr6    3359        GeneF   
+Site7   chr7    1158        GeneI   
+Site8   chr4    2451        GeneO
+Site9   chr6    3367        GeneZ   ')
+test(1660.1, names(dt2[dt1, on=.(Chr, Location>=Start, Location<=End)]), c(names(dt2), "Location.1", "Region"))
+test(1660.2, names(dt1[dt2, on=.(Chr, Start<=Location, End>=Location)]), c(names(dt1), "Site", "Gene"))
+
+# `names<-` should NOT modify by reference #1015
+DT = data.table(x=1, y=2)
+nn = names(DT)
+test(1661.1, {names(DT) <- c("k", "m"); nn}, c("x","y"), warning=if (base::getRversion()>="3.1.0") NULL else "Please upgrade")
+test(1661.2, names(DT), c("k","m"))
+
+# as.Date.IDate won't change the class if xts package loaded #1500 
+if ("package:zoo" %in% search()) {
+    x = as.IDate("2016-01-15")
+    require(zoo)
+    test(1662, class(as.Date(x)), "Date")
+} else {
+    cat("Test 1662 not run. If required call library(zoo) first.\n")
+}
+
+# IDate support in as.xts.data.table #1499
+if ("package:xts" %in% search()) {
+    dt <- data.table(date = c(as.IDate("2014-12-31"),
+                              as.IDate("2015-12-31"),
+                              as.IDate("2016-12-31")), 
+                     nav = c(100,101,99),
+                     key = "date")
+    dt.xts <- as.xts.data.table(dt)
+    test(1663, dt.xts[1L], xts::xts(data.table(nav=100), order.by=as.Date("2014-12-31")))
+} else {
+    cat("Test 1663 not run. If required call library(xts) first.\n")
+}
+
+# fwrite crash on very large number of columns (say 100k)
+set.seed(123)
+m <- matrix(runif(3*100000), nrow = 3)
+DT <- as.data.table(m)
+f <- tempfile()
+system.time(fwrite(DT, f, eol='\n', quote=TRUE))  # eol fixed so size test passes on Windows
+system.time(fwrite(DT, f, eol='\n', quote=TRUE))  # run again to force seg fault
+test(1664, abs(file.info(f)$size %/% 100000 - 62) <= 1.5)  # file size appears to be 34 bytes bigger on Windows (6288931 vs 6288965)
+unlink(f)
+
+# rbindlist support for complex type
+dt1 = data.table(x=1L, y=2+3i)
+dt2 = data.table(x=0:101, y=3+sample(102)*1i)
+test(1665.1, rbindlist(list(dt1,dt2)), setDT(rbind(as.data.frame(dt1), as.data.frame(dt2))))
+# print method now works (when rows > 100 it uses rbind/rbindlist internally)
+test(1665.2, ans <- capture.output(dt2), ans) # just checking that it doesn't error, really.
+
+# Use existing index even when auto index is disabled #1422
+d = data.table(k=3:1) # subset - no index
+options("datatable.use.index"=TRUE, "datatable.auto.index"=TRUE)
+test(1666.1, d[k==1L, verbose=TRUE], d[3L], output="Creating new index 'k'")
+d = data.table(k=3:1)
+options("datatable.use.index"=TRUE, "datatable.auto.index"=FALSE)
+test(1666.2, grep("Creating new index", capture.output(d[k==1L, verbose=TRUE])), integer(0)) # do not create index
+d = data.table(k=3:1)
+options("datatable.use.index"=FALSE, "datatable.auto.index"=FALSE)
+test(1666.3, grep("Creating new index", capture.output(d[k==1L, verbose=TRUE])), integer(0))
+d = data.table(k=3:1)
+options("datatable.use.index"=FALSE, "datatable.auto.index"=TRUE)
+test(1666.4, grep("Creating new index", capture.output(d[k==1L, verbose=TRUE])), integer(0))
+d = data.table(k=3:1) # subset - index
+setindex(d, k)
+options("datatable.use.index"=TRUE, "datatable.auto.index"=TRUE)
+test(1666.5, d[k==1L, verbose=TRUE], d[3L], output="Using existing index 'k'")
+options("datatable.use.index"=TRUE, "datatable.auto.index"=FALSE)
+test(1666.6, d[k==1L, verbose=TRUE], d[3L], output="Using existing index 'k'")
+options("datatable.use.index"=FALSE, "datatable.auto.index"=FALSE)
+test(1666.7, grep("Using existing index", capture.output(d[k==1L, verbose=TRUE])), integer(0)) # not using existing index
+options("datatable.use.index"=FALSE, "datatable.auto.index"=TRUE)
+test(1666.8, grep("Using existing index", capture.output(d[k==1L, verbose=TRUE])), integer(0))
+d1 = data.table(k=3:1) # join - no index
+d2 = data.table(k=2:4)
+options("datatable.use.index"=TRUE, "datatable.auto.index"=TRUE)
+test(1666.9, d1[d2, on="k", verbose=TRUE], d1[d2, on="k"], output="ad hoc")
+options("datatable.use.index"=TRUE, "datatable.auto.index"=FALSE)
+test(1666.11, d1[d2, on="k", verbose=TRUE], d1[d2, on="k"], output="ad hoc")
+options("datatable.use.index"=FALSE, "datatable.auto.index"=FALSE)
+test(1666.12, grep("Looking for existing (secondary) index", capture.output(d1[d2, on="k", verbose=TRUE])), integer(0)) # not looking for index
+options("datatable.use.index"=FALSE, "datatable.auto.index"=TRUE)
+test(1666.13, grep("Looking for existing (secondary) index", capture.output(d1[d2, on="k", verbose=TRUE])), integer(0))
+d1 = data.table(k=3:1,v1=10:12) # join - index
+d2 = data.table(k=2:4,v2=20:22)
+setindex(d1, k)
+ans = data.table(k=2:4, v1=c(11L,10L,NA), v2=20:22)
+options("datatable.use.index"=TRUE, "datatable.auto.index"=TRUE)
+test(1666.14, d1[d2, on="k", verbose=TRUE], ans, output="existing index")
+options("datatable.use.index"=TRUE, "datatable.auto.index"=FALSE)
+test(1666.15, d1[d2, on="k", verbose=TRUE], ans, output="existing index")
+options("datatable.use.index"=FALSE, "datatable.auto.index"=FALSE)
+test(1666.16, d1[d2, on="k", verbose=TRUE], ans, output='ad hoc')
+options("datatable.use.index"=FALSE, "datatable.auto.index"=TRUE)
+test(1666.17, d1[d2, on="k", verbose=TRUE], ans, output='ad hoc')
+# reset defaults
+options("datatable.use.index"=TRUE, "datatable.auto.index"=TRUE)
+
+
+# testing fix to #1654 (dcast should only error when _using_ duplicated names)
+DT <- data.table(a = 1:4, a = 1:4, id = rep(1:4, 2), V1 = 8:1)
+test(1667.1, dcast(DT, id ~ rowid(id), value.var = "V1"), 
+     output = "   id 1 21:  1 8 42:  2 7 33:  3 6 24:  4 5 1")
+DT <- data.table(a = 1:4, id = 1:4, id = rep(1:4, 2), V1 = 8:1)
+test(1667.2, dcast(DT, id ~ rowid(id), value.var = "V1"), error = "data.table to cast")
+
+# fix for #1672
+test(1668, chmatch(c("a","b"), c("a","c"), nomatch = integer()), c(1L, NA_integer_))
+
+# fix for #1650, segfault in rolling joins resulting from fixing #1405.
+x = data.table(Date = as.Date(c("2015-12-29", "2015-12-29", "2015-12-29", "2015-12-29", "2016-01-30", "2016-01-30", 
+                                "2016-01-30", "2016-01-30", "2016-02-29", "2016-02-29", "2016-02-29", "2016-02-29", 
+                                "2016-03-26", "2016-03-26", "2016-03-26", "2016-03-26")), 
+                 ID = c("A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D", "A", "B", "C", "D"), 
+              Value = c("A201512", "B201512", "C201512", "D201512", "A201601", "B201601", "C201601", "D201601", 
+                        "A201602", "B201602", "C201602", "D201602", "A201603", "B201603", "C201603", "D201603"), 
+              key = c('Date', 'ID'))
+y = CJ(Date = as.Date(c("2015-12-31", "2016-01-31", "2016-02-29", "2016-03-31")), ID = unique(x$ID))
+test(1669, x[y, on=c("ID", "Date"), roll=TRUE, which=TRUE], 1:16)
+
+# 1680 fix, fread header encoding issue
+x = "Stra\xdfe"
+Encoding(x) = "latin1"
+nm = names(fread("1680-fread-header-encoding.csv", encoding="Latin-1"))
+test(1670, nm[2], x)
+
+# as.data.table must return a copy even if 'x' is a data.table
+x = data.table(a=1, b=2)
+test(1670.1, address(x) != address(as.data.table(x)), TRUE)
+setattr(x, 'class', c('a', class(x)))
+test(1670.2, class(as.data.table(x)), class(x)[2:3])
+
+# #1676, `:=` with by shouldn't add cols of unsupported types
+dt = data.table(x=1, y=2)
+test(1671, dt[, z := sd, by=x], error="invalid type/length (closure/1)")
+
+# 1683
+DT <- data.table(V1 = rep(1:2, 3), V2 = 1:6)
+test(1672.1, DT[ , .(.I[1L], V2[1L]), by = V1],
+     output = "   V1 V1 V21:  1  1  12:  2  2  2")
+# make sure GForce is operating
+test(1672.2, DT[ , .(.I[1L], V2[1L]), by = V1, verbose = TRUE],
+     output = "GForce optimized j")
+# make sure it works on .I by itself
+test(1672.3, DT[ , .I[1L], by = V1],
+     output = "   V1 V11:  1  12:  2  2")
+# make sure GForce kicks in here as well
+test(1672.4, DT[ , .I[1L], by = V1, verbose = TRUE],
+     output = "GForce optimized j")
+# make sure it works with order
+test(1672.5, DT[order(V1), .I[1L], by = V1],
+     output = "   V1 V11:  1  12:  2  2")
+# should also work with subsetting
+test(1672.6, DT[1:5, .(.I[1L], V2[1L]), by = V1],
+     output = "   V1 V1 V21:  1  1  12:  2  2  2")
+     
+# tests for #1528
+TT <- as.IDate("2016-04-25")
+test(1673.1, TT + 4L, as.IDate("2016-04-29"))
+test(1673.2, TT + 4, as.IDate("2016-04-29"))
+test(1673.3, TT - 3, as.IDate("2016-04-22"))
+test(1673.4, TT - 3L, as.IDate("2016-04-22"))
+test(1673.5, as.IDate("2016-04-28") - as.IDate("2016-04-20"), 8L)
+
+
+# test for radix integer order when MAXINT is present AND decreasing=TRUE AND na.last=FALSE
+# https://bugs.r-project.org/bugzilla/show_bug.cgi?id=16925
+# It seems this 'just' fails ASAN, but also results in seg fault under some compilers
+# https://github.com/rstudio/shiny/issues/1200
+test(1674, forderv(c(2147483645L, 2147483646L, 2147483647L, 2147483644L), order=-1L), c(3,2,1,4))
+
+# fix for #1718
+A = data.table(foo = c(1, 2, 3), bar = c(4, 5, 6))
+B = data.table(foo = c(1, 2, 3, 4, 5, 6), bar = c(NA, NA, NA, 4, 5, 6))
+A[, bar := factor(bar, levels = c(4, 5), labels = c("Boop", "Beep"), exclude = 6)]
+B[, bar := factor(bar, levels = c(4, 5, NA), labels = c("Boop", "Beep", NA), exclude = NULL)]
+test(1675.1, as.integer(B[A, bar := i.bar, on="foo"]$bar), c(1:3,1:2,NA))
+A = data.table(foo = c(1, 2, 3), bar = c(4, 5, 6))
+B = data.table(foo = c(1, 2, 3, 4, 5, 6), bar = c(NA, NA, NA, 4, 5, 6))
+A[, bar := factor(bar, levels = c(4, 5), labels = c("Boop", "Beep"), exclude = 6)]
+B[, bar := factor(bar, levels = c(4, 5), labels = c("Boop", "Beep"), exclude = 6)]
+test(1675.2, as.integer(B[A, bar := i.bar, on="foo"]$bar), c(1:2,NA,1:2,NA))
+
+# fwrite na arg segfault fix, #1725
+dt = data.table(x=1:2, y=c(NA,"a"))
+f = tempfile()
+test(1676.1, fwrite(dt, f, na=NULL), error="is not TRUE")
+fwrite(dt, f, na=NA)
+test(1676.2, fread(f), data.table(x=1:2, y=c(NA, "a")))
+unlink(f)
+
+# duplicate names in foverlaps #1730
+a = data.table(start = 1:5, end = 2:6, c2 = rnorm(10), c2 = rnorm(10), key=c("start","end"))
+b = data.table(start = 1:5, end = 2:6, c3 = rnorm(5), key=c("start","end"))
+test(1677.1, foverlaps(a, b), error="x has some duplicated column")
+test(1677.2, foverlaps(b, a), error="y has some duplicated column")
+
+# na.omit.data.table removes indices #1734
+dt = data.table(a=4:1, b=c(letters[c(1L,NA,2:3)]))
+setindexv(dt, "a")
+test(1678.1, indices(dt2 <- na.omit(dt, cols="b")), NULL)
+setindexv(dt2, "a")
+test(1678.2, indices(na.omit(dt2, cols="b")), "a")
+
+# rleid gains `prefix` argument, similar to rowid
+x = sample(3,10,TRUE)
+test(1679.1, rleid(x, prefix="id"), paste0("id", rleid(x)))
+test(1679.2, rleidv(x, prefix="id"), paste0("id", rleidv(x)))
+
+# melt.data.table call along with patterns from within a function, #1749
+x = data.table(x1=1:2, x2=3:4, y1=5:6, y2=7:8, z1=9:10, z2=11:12)
+foo <- function(x) {
+    pats = c("^y", "z")
+    melt(x, measure.vars=patterns(pats))
+}
+test(1680.1, foo(x), melt(x, measure.vars=patterns("^y", "^z")))
+
+# melt warning prints only first 5 cols, #1752
+DT = fread("melt-warning-1752.tsv")
+ans = suppressWarnings(melt(DT[, names(DT) %like% "(^Id[0-9]*$)|GEOGRAPHIC AREA CODES", with=FALSE], id=1:2))
+test(1681, melt(DT[, names(DT) %like% "(^Id[0-9]*$)|GEOGRAPHIC AREA CODES", with=FALSE], id=1:2), 
+    ans, warning="are not all of the same type")
+
+# non-equi joins with by=.EACHI; not as exhaustive, but given how thorough the
+# previous tests were, this should be fine. We'll add tests as we go along.
+set.seed(45L)
+dt1 = data.table(x=sample(8,20,TRUE), y=sample(8,20,TRUE), z=1:20)
+dt2 = data.table(c(2,5), c(5,7), c(2,4))
+dt3 = data.table(c(12,5), c(15,7), c(2,4))
+
+test(1682.1, dt1[dt2, .N, by=.EACHI, on=.(x>=V1, y<=V2)], dt1[dt2, on=.(x>=V1, y<=V2)][, .N, by=.(x,y)])
+test(1682.2, dt1[dt2, sum(z), by=.EACHI, on=.(x>=V1, y<=V2)], dt1[dt2, on=.(x>=V1, y<=V2)][, sum(z), by=.(x,y)])
+test(1682.3, dt1[dt2, as.numeric(median(z)), by=.EACHI, on=.(x>=V1, y<=V2)], dt1[dt2, on=.(x>=V1, y<=V2)][, median(z), by=.(x,y)])
+test(1682.4, dt1[dt3, .N, by=.EACHI, on=.(x>=V1, y<=V2)], dt1[dt3, on=.(x>=V1, y<=V2)][, .(N=sum(!is.na(z))), by=.(x,y)])
+test(1682.5, dt1[dt3, .N, by=.EACHI, on=.(x>=V1, y<=V2), nomatch=0L], dt1[dt3, on=.(x>=V1, y<=V2), nomatch=0L][, .N, by=.(x,y)])
+test(1682.6, dt1[dt2, on=.(x>=V1, y<=V2), sum(z)*V3, by=.EACHI], dt1[dt2, on=.(x>=V1, y<=V2)][, sum(z)*V3[1L], by=.(x,y)])
+test(1682.7, dt1[dt3, on=.(x>=V1, y<=V2), sum(z)*V3, by=.EACHI], dt1[dt3, on=.(x>=V1, y<=V2)][, sum(z)*V3[1L], by=.(x,y)])
+# add test for update operation
+idx = dt1[dt2[1], which=TRUE, on=.(x>=V1, y<=V2)]
+test(1682.8, copy(dt1)[dt2[1], z := 2L*z, by=.EACHI, on=.(x>=V1, y<=V2)], copy(dt1)[(idx), z := 2L*z])
+# test for add by reference
+test(1682.9, copy(dt1)[dt2[1], foo := z, by=.EACHI, on=.(x>=V1, y<=V2)], copy(dt1)[(idx), foo := z])
+# test for nomatch=0L with by=.EACHI fix for non-equi joins
+dt = data.table(x=c(1,4,7,10), y=c(6,12,18,24), z=4:1)
+test(1683.1, dt[.(c(2,15), c(100,25)), sum(z), on=.(x>=V1, y<=V2), by=.EACHI], data.table(x=c(2,15), y=c(100,25), V1=c(6L, NA)))
+test(1683.2, dt[.(c(2,15), c(100,25)), sum(z), on=.(x>=V1, y<=V2), by=.EACHI, nomatch=0L], data.table(x=2, y=100, V1=6L))
+
+# unique should remove index #1760
+dt <- data.table(a = c("1", "1", "2", "2", "3", "4", "4", "4"), 
+                 b = letters[1:8], 
+                 d = c(TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE))
+dt[d == TRUE, `:=`(b = "M")] # create index
+udt <- unique(dt, by = c("a", "b"))
+test(1684, nrow(udt[d == TRUE]), 2)
+
+# #1758, data.table print issue
+foo <- function(annot=c("a", "b")) {
+  dt = data.table(x=annot, y=NA)
+  ro = structure(list(dt=dt), class="dtu")
+  suppressWarnings(ro$dt[, flag := TRUE])
+  ro
+}
+old = options(datatable.verbose=FALSE)
+test(1685, grep("dtu", capture.output(foo())), 7L)
+options(old)
+
+# fix for #1771
+test(1686.1, uniqueN(1L), 1L)
+test(1686.2, uniqueN(1L, na.rm=TRUE), 1L)
+
+# fix for #1744
+DT = data.table(ID = 1:2, A = 3:4, B = 5:6)
+test(1686.3, DT[, .(A,B)], DT[, c(mget("A"), .SD), .SDcols="B"])
+test(1686.4, DT[, .(V1=A,B)], DT[, c(.(get("A")), .SD), .SDcols="B"])
+
+# tests for first
+test(1687.1, first(1:5), 1L)
+test(1687.2, first(data.table(x=1:5, y=6:10)), data.table(x=1L, y=6L))
+
+if ("package:bit64" %in% search()) {
+    # fix for #1385 and part of #1459
+    x1 = data.table(id=1, value=as.integer64(1))
+    x2 = data.table(id=c(1,2))
+    test(1688.1, merge(x2, x1, by="id", all.x=TRUE)$value, as.integer64(c(1,NA)))
+
+    x1 = data.table(x = c(1),y = integer64(1))
+    x2 = data.table(x = c(1,2))
+    test(1688.2, merge(x1, x2, all=TRUE, by="x")$y, as.integer64(c(0, NA)))
+}
+
+test(1689, capture.output(IDateTime(as.POSIXct("2016/01/13 17:00", tz = "America/Los_Angeles"))),
+     c("        idate    itime", "1: 2016-01-13 17:00:00"))
+
+# fix for #1766 and #1704
+A = data.table(i = 1:6, j = rep(1:2, 3), x = letters[1:6], key = "i")
+B = data.table(j = 1:2, y = letters[1:2], key = "j")
+test(1690.1, key(A[B, on = "j"]), NULL)
+test(1690.2, key(A[B, on = "j"]), NULL)
+
+dt <-  data.table(
+        origin = c("A", "A", "A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "B", "B", "B", "B", "B", "C", "C", "B", "A", "C", "C", "C", "C", "C", "A", "A", "C", "C", "B", "B"),
+        destination = c("A", "A", "A", "A", "B", "B", "A", "A", "C", "C", "A", "A", "B", "B", "B", "C", "C", "B", "B", "A", "B", "C", "C", "C", "A", "A", "C", "C", "B", "B", "C", "C"),
+        points_in_dest = c(5, 5, 5, 5, 4, 4, 5, 5, 3, 3, 5, 5, 4, 4, 4, 3, 3, 4, 4, 5, 4, 3, 3, 3, 5,5, 3, 3, 4, 4, 3, 3),
+        depart_time = c(7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 18, 7, 8, 16, 7, 8, 16, 18, 8, 16, 7, 8, 18, 7, 8, 16, 18, 7, 8, 16, 18), 
+        travel_time = c(0, 0, 0, 0, 70, 10, 70, 10, 10, 10, 70, 70, 0, 0, 0, 70, 10, 10, 70, 70, 10, 0, 0, 0, 10, 70, 10, 70, 10, 70, 70, 10))
+dt[ depart_time<=8  & travel_time < 60, condition1 := TRUE]
+dt[ depart_time>=16 & travel_time < 60, condition2 := TRUE] 
+setkey(dt, origin, destination)
+res <- unique(dt[(condition1)],by=key(dt))[unique(dt[(condition2)], by=key(dt)), 
+                                on = c(destination = "origin", origin = "destination"), 
+                                nomatch = 0L]
+test(1690.3, res[, .(points = sum(points_in_dest)),  keyby = origin], data.table(origin=LETTERS[1:3], points=c(9,7,12), key="origin"))
+
+# fix for #1626 (so that rbind plays nicely with non-list inputs, e.g., package 
+# psych creates a list with the input data.frame/data.table and a matrix it 
+# creates...)
+dt = data.table(x=1:5, y=6:10)
+test(1691, rbind(dt, dt), rbind(dt, as.matrix(dt)))
+
+# For #1783 -- subsetting a data.table by an ITime object
+test(1692, capture.output(as.data.table(structure(57600L, class = "ITime"))),
+     c("         V1", "1: 16:00:00"))
+
+# testing all time part extraction routines (subsumes #874)
+t <- "2016-08-03 01:02:03.45"
+test(1693.1, second(t), 3L)
+test(1693.2, minute(t), 2L)
+test(1693.3, hour(t), 1L)
+test(1693.4, yday(t), 216L)
+test(1693.5, wday(t), 4L)
+test(1693.6, week(t), 31L)
+test(1693.7, month(t), 8L)
+test(1693.8, quarter(t), 3L)
+test(1693.9, year(t), 2016L)
+
+# fix for #1740 - sub-assigning NAs for factors
+dt = data.table(x = 1:5, y = factor(c("","a","b","a", "")), z = 5:9)
+ans = data.table(x = 1:5, y = factor(c(NA,"a","b","a", NA)), z = 5:9)
+test(1694.0, dt[y=="", y := NA], ans)
+
+# more tests for between()
+x = c(NaN, NA, 1, 5, -Inf, Inf)
+test(1695.1, x %between% c(3, 7), c(NA, NA, FALSE, TRUE, FALSE, FALSE))
+test(1695.2, x %between% c(NA, 7), c(NA, NA, NA, NA, NA, FALSE))
+test(1695.3, x %between% c(3, NA), c(NA, NA, FALSE, NA, FALSE, NA))
+test(1695.4, x %between% c(NA, NA), rep(NA, 6L))
+
+x = c(NA, 1L, 5L)
+test(1695.5, x %between% c(3, 7), c(NA, FALSE, TRUE))
+test(1695.6, x %between% c(NA, 7), c(NA, NA, NA))
+test(1695.7, x %between% c(3, NA), c(NA, FALSE, NA))
+test(1695.8, x %between% c(NA, NA), rep(NA, 3L))
+
+x = rep(NA_integer_, 3)
+test(1695.9, x %between% c(3, 7), rep(NA, 3L))
+test(1695.10, x %between% c(NA, 7), rep(NA, 3L))
+test(1695.11, x %between% c(3, NA), rep(NA, 3L))
+test(1695.12, x %between% c(NA, NA), rep(NA, 3L))
+
+x = integer(0)
+test(1695.13, x %between% c(3, 7), logical(0))
+
+# test for #1819, verbose message for bmerge
+old_opt = getOption("datatable.verbose")
+options(datatable.verbose = TRUE)
+x = data.table(A = 10:17)
+test(1696.0, any(grepl("bmerge", capture.output(x[A %inrange% 13:14]))), TRUE)
+# restore verbosity
+options(datatable.verbose = old_opt)
+
+# adding a test for #1825 (the issue is about correctness, not timing, when
+# joining keyed tables using the 'on' argument)
+x = data.table(a=1:3, b=4:6, key="a")
+y = data.table(a=2:4, c=7:9)
+test(1697.1, x[y], x[y, on=key(x)])
+y = data.table(m=2:4, c=7:9, key="m")
+test(1697.2, x[y], x[y, on=c(a="m")])
+
+# #1823, fix for 'on='' on keyed anti-joins loses key
+x = data.table(id = 1:10, val = letters[1:10], key = "id")
+y = data.table(id = 3:6, key = "id")
+test(1698.1, key(x[!y]), key(x[!y, on = "id"]))
+
+# minor enhancement to dcast, #1821
+dt = data.table(x=c(1,1,1,2,2,2), y=1:6, z=6:1)
+test(1699.1, dcast(dt, x ~ ., value.var="z", fun=list(sd, mean)), data.table(x=c(1,2), z_sd=1, z_mean=c(5,2), key="x"))
+
+# minor enhancement to dcast, #1810
+dt = data.table(
+    var1 = c("a","b","c","b","d","e","f"),
+    var2 = c("aa","bb","cc","dd","ee","ee","ff"),
+    subtype = c("1","2","2","2","1","1","2"),
+    type = c("A","A","A","A","B","B","B")
+)
+test(1700.1, dcast(dt, type ~ subtype, value.var = c("var1", "var2"), fun = function(v) paste0(unique(v), collapse = "|")), 
+    data.table(type=c("A","B"), var1_1=c("a", "d|e"), var1_2=c("b|c", "f"), 
+                var2_1=c("aa", "ee"), var2_2=c("bb|cc|dd","ff"), key="type"))
+
+# fixing regression introduced when adding support for the 'x.' prefix in 'j' (for non-equi joins)
+A = data.table(x=c(1,1,1,2,2), y=1:5, z=5:1)
+B = data.table(x=c(2,3), val=4)
+col1 = "y"
+col2 = "x.y"
+test(1701.1, A[, .(length(x), length(y)), by=x], data.table(x=c(1,2), V1=1L, V2=c(3:2)))
+test(1701.2, A[, .(x), by=x], data.table(x=c(1,2), x=c(1,2)))
+test(1701.3, A[B, x.x, on="x"], c(2,2,NA))
+test(1701.4, A[B, x.y, on="x"], c(4:5,NA))
+test(1701.5, A[B, .(get("x"), get("x.x")), on="x"], data.table(V1=c(2,2,3), V2=c(2,2,NA)))
+test(1701.6, A[B, mget(c("x", "x.x")), on="x"], data.table(x=c(2,2,3), x.x=c(2,2,NA)))
+# 1761 fix as well
+test(1701.7, A[B, .(x.x, get("x.x"), x.y), on="x", by=.EACHI], data.table(x=c(2,2,3), x.x=c(2,2,NA), V2=c(2,2,NA), x.y=c(4:5,NA)))
+dt = data.table(a=1L)
+test(1701.8, dt[dt, .(xa=x.a, ia=i.a), by=.EACHI, on="a"], data.table(a=1L, xa=1L, ia=1L))
+
+# ISO 8601-consistent week numbering, #1765
+#  test cases via https://en.wikipedia.org/wiki/ISO_week_date
+test_cases <- c("2005-01-01", "2005-01-02", "2005-12-31",
+                "2007-01-01", "2007-12-30", "2007-12-31",
+                "2008-01-01", "2008-12-28", "2008-12-29",
+                "2008-12-30", "2008-12-31", "2009-01-01", 
+                "2009-12-31", "2010-01-01",
+                "2010-01-02", "2010-01-03")
+
+test_values <- c(53L, 53L, 52L, 1L, 52L, 1L, 1L, 
+                 52L, 1L, 1L, 1L, 1L, 53L, 53L, 53L, 53L)
+
+test(1702, isoweek(test_cases), test_values)
+
+# fread, ensure no shell commands #1702
+if (.Platform$OS.type=="unix") {
+    cat("a,b\n1,2", file=f<-tempfile())
+    cmd <- sprintf("cat %s", f)
+    test(1703.1, fread(cmd), data.table(a=1L, b=2L))
+    test(1703.2, fread(file=cmd), error=sprintf("Provided file '%s' does not exists", cmd))
+    unlink(f)
+}
+
+# Ensure all.equal respects 'check.attributes' w.r.t. column names, as testthat::check_equivalent relies on this
+# (used by package popEpi in its tests)
+test(1704, all.equal(data.table( a=1:3, b=4:6 ), data.table( A=1:3, B=4:6 ), check.attributes=FALSE))
+
+if (.Platform$OS.type!="windows") {
+  # On Windows: "'mc.cores' > 1 is not supported on Windows".
+  # parallel package isn't parallel on Windows, but data.table is.
+  if ("package:parallel" %in% search()) {   #1745 and #1727
+    test(1705, getDTthreads()<=2) # this was set at the top of tests.Rraw.  CRAN's policy is max two.
+                                  # not 1, to pass on Rdevel clang UBSAN and ASAN without OpenMP
+    lx <- replicate(4, runif(1e5), simplify=FALSE)
+    f <- function(mc.cores = 2, threads = 2) {
+      setDTthreads(threads)
+      invisible(mclapply(lx, function(x) fsort(x), mc.cores = mc.cores))
+    }
+    f(1, 1) # was always ok
+    f(2, 1) # was always ok
+    f(1, 2) # was always ok
+    f(2, 2) # used to hang. Now should not because data.table auto switches to single threaded
+            # commenting out avoid_openmp_hang_within_fork() confirms this test catches the hang
+    test(1706, getDTthreads()<=2) # Tests that it reverts to previous state after use of mclapply  
+  } else {
+    cat("Tests 1705 and 1706 not run. If required call library(parallel) first.\n")
+  }
+}
+
+# all.equal.data.table should consider modes equal like base R (detected via Bioc's flowWorkspace tests)
+# If strict testing is required, then use identical().
+# TODO: add strict.numeric (default FALSE) to all.equal.data.table() ?
+test(1707.1, all.equal( data.frame(a=0L), data.frame(a=0) ) )
+test(1707.2, all.equal( data.table(a=0L), data.table(a=0) ) )
+test(1708.1, !isTRUE(all.equal( data.frame(a=0L), data.frame(a=FALSE) )))
+test(1708.2, all.equal( data.table(a=0L), data.table(a=FALSE) ),
+             "Datasets have different column modes. First 3: a(numeric!=logical)")
+x = data.frame(a=0L)
+y = data.frame(a=0)
+setattr(y[[1]],"class",c("hello","world"))
+test(1709.1, !isTRUE(all.equal(x,y,check.attributes=TRUE)))   # desired
+test(1709.2, !isTRUE(all.equal(x,y,check.attributes=FALSE)))  # not desired
+x = as.data.table(x)
+y = as.data.table(y)
+test(1710.1, mode(x[[1]]) == mode(y[[1]]))
+test(1710.2, storage.mode(x[[1]]) != storage.mode(y[[1]]))
+test(1710.3, class(y[[1]]), c("hello","world"))
+test(1710.4, all.equal(x,y,check.attributes=TRUE),            # desired
+             "Datasets have different column classes. First 3: a(numeric!=hello;world)")
+test(1710.5, isTRUE(all.equal(x,y,check.attributes=FALSE)))   # desired
+
+# Include tests as-is from #1252 (unexpected NA row from logical subsets with 1-row containing NA)
+DT = data.table(a=1, d=NA)
+test(1711, DT[!is.na(a) & d == "3"], DT[0]) 
+DT = data.table(a = c(1,2), d = c(NA,3))
+test(1712, DT[!is.na(a) & d == "3"], DT[2])
+test(1713, DT[d==3], DT[2])
+
+# Test new helpful error message suggested by Jan
+test(1714, !exists("aColumnOfDT"))   # use a long column name to be sure not used before in this test script
+DT = data.table(a=1:3, aColumnOfDT=4:6)
+options(datatable.WhenJisSymbolThenCallingScope=FALSE)
+test(1715, DT[,aColumnOfDT], 4:6)    # old behaviour for sure tested before but here for context
+options(datatable.WhenJisSymbolThenCallingScope=TRUE)
+test(1716, DT[,aColumnOfDT], error=".*datatable.WhenJisSymbolThenCallingScope.*TRUE.*aColumnOfDT.*wrap with quotes.*aColumnOfDT.*aColumnOfDT.*aColumnOfDT")   # The .* are to check the salient points of the error (e.g. that it names the variable correctly 4 times), without getting too picky over the exact wording and punctuation of the full error
+options(datatable.WhenJisSymbolThenCallingScope=FALSE)
+
+# Test out-of-bounds error on numeric j
+DT = data.table(a=1:3, b=4:6, c=7:9)
+test(1717, DT[,4], error="Item 1 of j is 4 which is outside the column number range.*ncol=3")
+test(1718, DT[,0], null.data.table())
+test(1719, DT[,c(2,0,1)], data.table(b=4:6, a=1:3))
+test(1720, DT[,c(-2,2)], error="j mixes positives and negatives")
+test(1721, DT[,c(1,0,5)], error="Item 3 of j is 5 which.*ncol=3")  # to check it says Item 3 even though 0 gets removed internally
+
+# Tests to ensure auto with=FALSE of ! and - only allows symbols around : (i.e. DT[,!(colB:colE)]) and not any other symbol usage inside ! and -. Thanks to Mark L #1864 and confirmed by Michael C; both tests added as-is
+DT = data.table(FieldName = c(1,2,NA,4,NA,6), rowId=1:6, removalIndex=c(2,7,0,5,10,0)) 
+test(1722.1, DT[,!is.na(as.numeric(FieldName))],   c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE))
+test(1722.2, DT[,(!is.na(as.numeric(FieldName)))], c(TRUE,TRUE,FALSE,TRUE,FALSE,TRUE))
+test(1723.1, DT[removalIndex>0,rowId-(2*removalIndex-1)], c(-2,-11,-5,-14))
+test(1723.2, DT[removalIndex>0,(rowId-(2*removalIndex-1))], c(-2,-11,-5,-14))
+DT = data.table(FieldName = c("1", "2", "3", "four", "five", "6"))
+test(1724.1, DT[, is.na(as.numeric(FieldName))], c(FALSE,FALSE,FALSE,TRUE,TRUE,FALSE), warning="NAs introduced by coercion")
+test(1724.2, DT[, !is.na(as.numeric(FieldName))], c(TRUE,TRUE,TRUE,FALSE,FALSE,TRUE), warning="NAs introduced by coercion")
+
+# Ensure NAs are added properly when a new column is added, not all the target rows are joined to, and the number of i
+# rows is equal to or greater than the number of rows in the target table.
+DT = data.table(a=1:3, key="a")
+DT[.(4), add0:=1.1][]          # didn't break due to 95e438c on 29 Sep 2016
+DT[.(c(3,4)), add1:=1.1][]     # didn't break
+DT[.(c(3,3,4)), add2:=1.1][]   # did break
+DT[.(2:4), add3:=1.1][]        # did break
+test(1725, DT, data.table(a=1:3, add0=NA_real_, add1=c(NA,NA,1.1), add2=c(NA,NA,1.1), add3=c(NA,1.1,1.1), key="a"))
+
+# keyby= runs groups in sorted order, #606. Only relevant when j does something that depends on previous group, perhaps
+# by using <<-. To run in appearance order use by=. See also #1880.
+# It wasn't useful to always run groups in appearance order. Now we have the option and it's consistent.
+DT = data.table(grp=rep(3:1,each=3), val=1:9)
+lastGrp = 0L
+test(1726.1, DT[, {ans=mean(val)+lastGrp; lastGrp<<-min(val); .(ans, .GRP)}, keyby=grp],
+           data.table(grp=1:3, V1=c(8,12,6), V2=1:3, key="grp") )
+test(1726.2, lastGrp, 1L)
+lastGrp = -1L
+test(1726.3, DT[, {ans=mean(val)+lastGrp; lastGrp<<-min(val); .(ans, .GRP)}, by=grp],
+           data.table(grp=3:1, V1=c(1,6,12), V2=1:3) )
+test(1726.4, lastGrp, 7L)
+rm(lastGrp)
+
+# better := verbose messages, #1808
+DT = data.table(a = 1:10)
+test(1727.1, DT[a < 5, a := 5L, verbose=TRUE], output="Assigning to 4 row subset of 10 rows")
+test(1727.2, DT[a < 5, a := 5L, verbose=TRUE], output="No rows match i.*Assigning to 0 row subset of 10 rows")
+test(1727.3, DT[0, d:=1, verbose=TRUE], data.table(a=c(rep(5L,5L),6:10), d=NA_real_),
+             output = "Assigning to 0 row subset of 10 rows.*Added 1 new column initialized with all-NA")            
+test(1727.4, DT[.(a=11L), on="a", c("f","g"):=.(1L,"dummy"), verbose=TRUE],
+             data.table(a=c(rep(5L,5L),6:10), d=NA_real_, f=NA_integer_, g=NA_character_),
+             output = "Assigning to 0 row subset of 10 rows.*Added 2 new columns initialized with all-NA")
+             
+# Add tests for na.last=NA: a case that already worked (subgroup size 2 containing 1 NA)
+# and 2 cases that randomly failed (na.last=NA, size 2 with 1 NA) due to use of uninitialized memory
+DT = data.table(x=INT(2,2,2,1,1), y=INT(1,NA,3,2,NA))
+test(1728.1, DT[order(x,y,na.last=TRUE)], data.table(x=INT(1,1,2,2,2), y=INT(2,NA,1,3,NA)))
+test(1728.2, DT[order(x,y,na.last=FALSE)], data.table(x=INT(1,1,2,2,2), y=INT(NA,2,NA,1,3)))
+test(1728.3, DT[order(x,y,na.last=NA)], data.table(x=INT(1,2,2), y=INT(2,1,3)))
+# 1 row
+DT = data.table(x=NA_integer_, y=1)
+test(1728.4, DT[order(x,y,na.last=TRUE)], DT)
+test(1728.5, DT[order(x,y,na.last=FALSE)], DT)
+test(1728.6, DT[order(x,y,na.last=NA)], DT[0])
+# 2 row with 1 NA 
+DT = data.table(x=as.integer(c(NA,1)), y=2:3)
+test(1728.7, DT[order(x,y,na.last=TRUE)], DT[c(2,1)])
+test(1728.8, DT[order(x,y,na.last=FALSE)], DT)
+test(1728.9, DT[order(x,y,na.last=NA)], DT[2]) # was randomly wrong
+test(1728.11, DT[order(x,na.last=TRUE)], DT[c(2,1)])
+test(1728.12, DT[order(x,na.last=FALSE)], DT)
+test(1728.13, DT[order(x,na.last=NA)], DT[2])  # was randomly wrong
+
+# fwrite wrong and crash on 9.9999999999999982236431605, #1847
+oldverbose=options(datatable.verbose=FALSE)
+test(1729.1, fwrite(data.table(V1=c(1), V2=c(9.9999999999999982236431605997495353221893310546875))),
+             output="V1,V21,10")
+test(1729.2, fwrite(data.table(V2=c(9.9999999999999982236431605997495353221893310546875), V1=c(1))),
+             output="V2,V110,1")
+DT = data.table(V1=c(9999999999.99, 0.00000000000000099, 0.0000000000000000000009, 0.9, 9.0, 9.1, 99.9,
+                     0.000000000000000000000999999999999999999999999,
+                     99999999999999999999999999999.999999))
+ans = "V19999999999.999.9e-169e-220.999.199.91e-211e+29"
+test(1729.3, fwrite(DT), output=ans)
+test(1729.4, write.csv(DT,row.names=FALSE,quote=FALSE), output=ans)
+options(oldverbose)
+
+# same decimal/scientific rule (shortest format) as write.csv
+DT = data.table(V1=c(-00000.00006, -123456789.123456789,
+                     seq.int(-1000,1000,17),
+                     seq(-1000,1000,pi*87),
+                     -1.2345678912345 * 10^(c((-30):30)),
+                     +1.2345678912345 * 10^(c((-30):30)),
+                     -1.2345 * 10^((-20):20),
+                     +1.2345 * 10^((-20):20),
+                     -1.7 * 10^((-20):20),
+                     +1.7 * 10^((-20):20),
+                     -7 * 10^((-20):20),
+                     +7 * 10^((-20):20),
+                     0, NA, NaN, Inf, -Inf,
+                     5.123456789e-290, -5.123456789e-290,
+                     5.123456789e-307, -5.123456789e-307,
+                     5.123456789e+307, -5.123456789e+307))
+test(1729.5, nrow(DT), 507)
+
+oldverbose=options(datatable.verbose=FALSE)
+# capture.output() exact tests must not be polluted with verbosity
+
+x = capture.output(fwrite(DT,na="NA"))[-1]   # -1 to remove the column name V1
+y = capture.output(write.csv(DT,row.names=FALSE,quote=FALSE))[-1]
+# One kind of mismatch that seems to be an accuracy issue in base R's write.csv
+# tmp = cbind(row=1:length(x), `fwrite`=x, `write.csv`=y)
+# tmp[x!=y,]
+# row  fwrite                  write.csv       
+# 177  "-1234567891234500000"  "-1234567891234499840"
+# 238  "1234567891234500000"   "1234567891234499840"
+# looking in surrounding rows for the first one shows the switch point :
+# tmp[175:179,]
+# row  fwrite                  write.csv       
+# 175  "-12345678912345000"    "-12345678912345000"     # ok
+# 176  "-123456789123450000"   "-123456789123450000"    # ok
+# 177  "-1234567891234500000"  "-1234567891234499840"   # e+18 last before switch to scientific
+# 178  "-1.2345678912345e+19"  "-1.2345678912345e+19"   # ok
+# 179  "-1.2345678912345e+20"  "-1.2345678912345e+20"   # ok
+test(1729.6, x[c(177,238)], c("-1234567891234500000","1234567891234500000"))
+x = x[-c(177,238)]
+y = y[-c(177,238)]
+test(1729.7, length(x), 505)
+test(1729.8, x, y)
+if (!identical(x,y)) print(data.table(row=1:length(x), `fwrite`=x, `write.csv`=y)[x!=y])
+
+DT = data.table(c(5.123456789e+300, -5.123456789e+300,
+                  1e-305,1e+305, 1.2e-305,1.2e+305, 1.23e-305,1.23e+305))
+ans = c("V1","5.123456789e+300","-5.123456789e+300",
+        "1e-305","1e+305","1.2e-305","1.2e+305","1.23e-305","1.23e+305")
+# explicitly check against ans rather than just comparing fwrite to write.csv so that :
+# i) we can easily see intended results right here in future without needing to run
+# ii) we don't get a false pass if fwrite and write.csv agree but are both wrong because of
+#     a problem with the test mechanism itself or something else strange or unexpected
+# Exactly the same binary representation on both linux and windows (so any differences in
+# output are not because the value itself is stored differently) :
+# > cat(binary(DT[[1]]),sep="\n")
+# 0 11111100101 111010011010000100010111101110000100 11110100 00000100
+# 1 11111100101 111010011010000100010111101110000100 11110100 00000100
+# 0 00000001001 110000010110110001011100010100100101 00110101 01110101
+# 0 11111110100 001000111010010100010110111010000010 11011001 10111010
+# 0 00000001010 000011011010011101101010100101111100 10111001 10101101
+# 0 11111110100 010111011111100101001110101100000011 01101011 10101100
+# 0 00000001010 000101000110010100110011101010000110 00111110 01010001
+# 0 11111110100 011001101011100100100011110110110000 01001110 01011101
+test(1729.9, fwrite(DT), output=paste(ans,collapse=""))
+test(1729.11, write.csv(DT,row.names=FALSE,quote=FALSE), output=paste(ans,collapse=""))
+DT = data.table(unlist(.Machine[c("double.eps","double.neg.eps","double.xmin","double.xmax")]))
+#    double.eps double.neg.eps    double.xmin    double.xmax
+#  2.220446e-16   1.110223e-16  2.225074e-308  1.797693e+308
+test(1729.12, typeof(DT[[1L]]), "double")
+test(1729.13, capture.output(fwrite(DT)), capture.output(write.csv(DT,row.names=FALSE,quote=FALSE)))
+
+if ("package:bit64" %in% search()) {
+  test(1730.1, typeof(-2147483647L), "integer")
+  test(1730.2, as.integer(-2147483648), NA_integer_, warning="coercion")
+  test(1730.3, as.integer("-2147483647"), -2147483647L)
+  test(1730.4, as.integer("-2147483648"), NA_integer_, warning="coercion")
+  test(1730.5, as.integer64("-2147483648"), as.integer64(-2147483648))
+  if (.devtesting) {
+    # these don't pass UBSAN/ASAN because of the overflow being tested here, so just in dev not on CRAN
+    test(1730.6, as.character((as.integer64(2^62)-1)*2+1), "9223372036854775807")
+    test(1730.7, as.character((as.integer64(2^62)-1)*2+2), NA_character_, warning="integer64 overflow")
+    test(1730.8, as.character(-(as.integer64(2^62)-1)*2-1), "-9223372036854775807")
+    test(1730.9, as.character(-(as.integer64(2^62)-1)*2-2), NA_character_, warning="integer64.*flow")
+  }
+  # Currently bit64 truncs to extremes in character coercion. Don't test that in case bit64 changes in future.
+  # as.integer64("-9223372036854775808") == NA
+  # as.integer64("-9223372036854775999") == NA
+  # as.integer64("+9223372036854775808") == 9223372036854775807
+  # as.integer64("+9223372036854775999") == 9223372036854775807  
+  DT = data.table( as.integer64(c(
+    "-9223372036854775807",  # integer64 min  -(2^63-1)
+    "+9223372036854775807",  # integer64 max
+    "-9223372036854775806","+9223372036854775806",  # 1 below extreme just to check
+    "0","-1","1",
+    "NA",NA,
+    "-2147483646", # 1 below extreme to check
+    "-2147483647", # smallest integer in R
+    "-2147483648", # NA_INTEGER == INT_MIN but valid integer64
+    "-2147483649",
+    "+2147483646", # positives as well just in case
+    "+2147483647",
+    "+2147483648",
+    "+2147483649"
+  )))
+  ans = c("V1","-9223372036854775807","9223372036854775807","-9223372036854775806","9223372036854775806",
+          "0","-1","1","__NA__","__NA__",
+          "-2147483646","-2147483647","-2147483648","-2147483649",
+          "2147483646","2147483647","2147483648","2147483649")
+  test(1731.1, class(DT[[1L]]), "integer64")
+  test(1731.2, fwrite(DT,na="__NA__"), output=paste(ans,collapse=""))
+  f = tempfile()
+  test(1731.3, fwrite(DT,f, na="__NA__",..turbo=FALSE), NULL, warning="turbo.*will be removed")
+  test(1731.4, readLines(f), ans)
+  unlink(f)
+  test(1731.5, write.csv(DT,na="__NA__",row.names=FALSE,quote=FALSE), output=paste(ans,collapse=""))
+  # write.csv works on integer64 because it calls bit64's as.character method
+} else {
+  cat("Tests 1730 & 1731 not run. If required call library(bit64) first.\n")
+}
+
+# fwrite quote='auto' and qmethod
+DT = data.table(x=c("fo,o", "foo", 'b"ar', NA, "", "NA"),
+                "ColName,WithComma"=1:6,
+                'Three\nLine\nColName'=c('bar\n', "noNeedToQuote", 'a\nlong\n"sentence"', "0000", " \n  ", '   "\n  '))
+x = capture.output(fwrite(DT,na="NA",quote=TRUE, qmethod='escape'))
+y = capture.output(write.table(DT,row.names=FALSE,quote=TRUE,sep=",",qmethod='escape'))
+test(1732.1, x, y)
+x = capture.output(fwrite(DT,na="NA",quote=TRUE,qmethod='double'))
+y = capture.output(write.table(DT,row.names=FALSE,quote=TRUE,sep=",",qmethod='double'))
+test(1732.2, x, y)
+x = capture.output(fwrite(DT,na="NA",quote=FALSE))
+y = capture.output(write.csv(DT,row.names=FALSE,quote=FALSE))
+test(1732.3, x, y)
+f = tempfile()
+fwrite(DT,f,quote='auto',qmethod='escape')
+# write.csv / write.table don't do field-by-field quoting so can't compare to them.
+ans = c('x,"ColName,WithComma","Three', 'Line', 'ColName"',
+        '"fo,o",1,"bar','"',
+        'foo,2,noNeedToQuote',
+        '"b\\"ar",3,"a',  'long', "\\\"sentence\\\"\"",
+        ',4,0000',
+        ',5," ','  "',
+        "NA,6,\"   \\\"", "  \"")
+test(1732.4, readLines(f), ans)
+fwrite(DT,f,quote='auto',qmethod='double')
+ans[7] = '"b""ar",3,"a'
+ans[9] = "\"\"sentence\"\"\""
+ans[13] = "NA,6,\"   \"\""
+test(1732.5, readLines(f), ans)
+DT = data.table(A=c("foo","ba,r","baz"), B=c("AA","BB","CC"), C=c("DD","E\nE","FF"))
+test(1732.6, fwrite(DT, quote='auto'), output='A,B,Cfoo,AA,DD"ba,r",BB,"EE"baz,CC,FF')
+unlink(f)
+
+# dec=","
+test(1733.1, fwrite(data.table(pi),dec=","), error="dec != sep is not TRUE")
+test(1733.2, fwrite(data.table(c(1.2,-8.0,pi,67.99),1:4),dec=",",sep=";"),
+           output="V1;V21,2;1-8;23,14159265358979;367,99;4")
+
+# fwrite implied and actual row.names
+DT = data.table(foo=1:3,bar=c(1.2,9.8,-6.0))
+test(1734.1, capture.output(fwrite(DT,row.names=TRUE,quote=FALSE)),
+             capture.output(write.csv(DT,quote=FALSE)))
+test(1734.2, capture.output(fwrite(DT,row.names=TRUE,quote=TRUE)),
+             capture.output(write.csv(DT)))
+test(1734.3, fwrite(DT,row.names=TRUE,quote='auto'),   # same other than 'foo' and 'bar' column names not quoted
+             output="\"\",foo,bar\"1\",1,1.2\"2\",2,9.8\"3\",3,-6")
+DF = as.data.frame(DT)
+test(1734.4, capture.output(fwrite(DF,row.names=TRUE,quote=FALSE)),
+             capture.output(write.csv(DF,quote=FALSE)))
+test(1734.5, capture.output(fwrite(DF,row.names=TRUE,quote=TRUE)),
+             capture.output(write.csv(DF)))
+rownames(DF)[2] = "someName"
+rownames(DF)[3] = "another"
+test(1734.6, capture.output(fwrite(DF,row.names=TRUE,quote=FALSE)),
+             capture.output(write.csv(DF,quote=FALSE)))
+test(1734.7, capture.output(fwrite(DF,row.names=TRUE,quote=TRUE)),
+             capture.output(write.csv(DF)))
+
+# fwrite showProgress test 1735. Turned off as too long/big for CRAN.
+if (FALSE) {
+  N = 6e8  # approx 6GB
+  DT = data.table(C1=sample(100000,N,replace=TRUE), C2=sample(paste0(LETTERS,LETTERS,LETTERS), N, replace=TRUE))
+  gc()
+  d = "/dev/shm/"
+  # and
+  d = "/tmp/"
+  f = paste0(d,"test.txt")
+  system.time(fwrite(DT, f, nThread=1))
+  file.info(f)$size/1024^3
+  unlink(f)
+  # ensure progress meter itself isn't taking time; e.g. too many calls to time() or clock()
+  system.time(fwrite(DT, f, showProgress=FALSE, nThread=1))
+  system.time(fwrite(DT, f, nThread=2))
+  system.time(fwrite(DT, f, nThread=4))
+  system.time(fwrite(DT, f, verbose=TRUE))
+  f2 = paste0(d,"test2.txt")
+  system.time(fwrite(DT, f2, verbose=TRUE))  # test 'No space left on device'
+  unlink(f)
+  unlink(f2)
+  system.time(fwrite(DT, f2))  # try again, should work now that space is freed
+  file.info(f2)$size/1024^3
+  unlink(f2)
+}
+
+# list columns and sep2
+set.seed(1)
+DT = data.table(A=1:4,
+                B=list(1:10,15:18,7,9:10),
+                C=list(letters[19:23],c(1.2,2.3,3.4,pi,-9),c("foo","bar"),c(TRUE,TRUE,FALSE)))
+test(1736.1, capture.output(fwrite(DT)), c("A,B,C", "1,1|2|3|4|5|6|7|8|9|10,s|t|u|v|w",
+            "2,15|16|17|18,1.2|2.3|3.4|3.14159265358979|-9", "3,7,foo|bar", "4,9|10,TRUE|TRUE|FALSE"))
+test(1736.2, fwrite(DT, sep2=","), error="length(sep2)")
+test(1736.3, fwrite(DT, sep2=c("",",","")), error="sep.*,.*sep2.*,.*must all be different")
+test(1736.4, fwrite(DT, sep2=c("","||","")), error="nchar.*sep2.*2")
+test(1736.5, capture.output(fwrite(DT, sep='|', sep2=c("c(",",",")"))), c("A|B|C", "1|c(1,2,3,4,5,6,7,8,9,10)|c(s,t,u,v,w)", 
+ "2|c(15,16,17,18)|c(1.2,2.3,3.4,3.14159265358979,-9)", "3|c(7)|c(foo,bar)", "4|c(9,10)|c(TRUE,TRUE,FALSE)"))
+test(1736.6, capture.output(fwrite(DT, sep='|', sep2=c("{",",","}"), logicalAsInt=TRUE)),
+ c("A|B|C", "1|{1,2,3,4,5,6,7,8,9,10}|{s,t,u,v,w}", 
+ "2|{15,16,17,18}|{1.2,2.3,3.4,3.14159265358979,-9}", "3|{7}|{foo,bar}", "4|{9,10}|{1,1,0}"))
+DT = data.table(A=c("foo","ba|r","baz"))
+test(1736.7, capture.output(fwrite(DT)), c("A","foo","ba|r","baz"))  # no list column so no need to quote
+DT = data.table(A=c("foo","ba|r","baz"), B=list(1:3,1:4,c("fo|o","ba,r","baz"))) # now list column and need to quote
+test(1736.8, capture.output(fwrite(DT)), c("A,B", "foo,1|2|3", "\"ba|r\",1|2|3|4", "baz,\"fo|o\"|\"ba,r\"|baz"))
+test(1736.9, capture.output(fwrite(DT,quote=TRUE)), c("\"A\",\"B\"", "\"foo\",1|2|3", "\"ba|r\",1|2|3|4", "\"baz\",\"fo|o\"|\"ba,r\"|\"baz\""))
+
+# any list of same length vector input
+test(1737.1, fwrite(list()), NULL, warning="fwrite was passed an empty list of no columns")
+test(1737.2, fwrite(list(1.2)), output="1.2")
+test(1737.3, fwrite(list(1.2,B="foo")), output=",B1.2,foo")
+test(1737.4, fwrite(list("A,Name"=1.2,B="fo,o")), output="\"A,Name\",B1.2,\"fo,o\"")
+test(1737.5, fwrite(list(1.2,B=c("foo","bar"))), error="Column 2's length (2) is not the same as column 1's length (1)") 
+
+# fwrite ITime, Date, IDate
+DT = data.table(A=as.ITime(c("23:59:58","23:59:59","12:00:00","00:00:01",NA,"00:00:00")))
+test(1738.1, capture.output(fwrite(DT)), c("A","23:59:58","23:59:59","12:00:00","00:00:01","","00:00:00"))
+test(1738.2, capture.output(fwrite(DT)), capture.output(write.csv(DT,row.names=FALSE,quote=FALSE, na="")))
+dts = c("1901-05-17","1907-10-22","1929-10-24","1962-05-28","1987-10-19","2008-09-15",
+        "1968-12-30","1968-12-31","1969-01-01","1969-01-02")
+DT = data.table(A=as.Date(dts), B=as.IDate(dts))
+test(1738.3, sapply(DT,typeof), c(A="double",B="integer"))
+test(1738.4, capture.output(fwrite(DT)), capture.output(write.csv(DT,row.names=FALSE,quote=FALSE)))
+test(1738.5, as.integer(as.Date(c("0000-03-01","9999-12-31"))), c(-719468L,2932896L))
+
+if (FALSE) {
+  # Full range takes too long for CRAN.
+  dts = seq.Date(as.Date("0000-03-01"),as.Date("9999-12-31"),by="day")
+  dtsCh = as.character(dts)   # 36s
+  dtsCh = gsub(" ","0",sprintf("%10s",dtsCh))  # R does not 0 pad years < 1000
+  test(1739.1, length(dtsCh)==3652365 && identical(dtsCh[c(1,3652365)],c("0000-03-01","9999-12-31")))
+} else {
+  # test on CRAN a reduced but important range  
+  dts = seq.Date(as.Date("1899-12-31"),as.Date("2100-01-01"),by="day")
+  dtsCh = as.character(dts)
+  test(1739.1, length(dtsCh)==73051 && identical(dtsCh[c(1,73051)],c("1899-12-31","2100-01-01")))
+}
+DT = data.table(A=dts, B=as.IDate(dts))
+test(1739.2, sapply(DT,typeof), c(A="double",B="integer"))
+test(1739.3, typeof(dts), "double")
+f = tempfile()
+g = tempfile()                               # Full range
+fwrite(DT,f)                                 #     0.092s
+write.csv(DT,g,row.names=FALSE,quote=FALSE)  #    65.250s
+test(1739.4, readLines(f), c("A,B",paste(dtsCh,dtsCh,sep=",")))
+test(1739.5, readLines(f), readLines(g))
+unlink(f)
+unlink(g)
+
+# dateTimeAs
+DT = data.table(
+  A = as.Date(d<-c("1907-10-21","1907-10-22","1907-10-22","1969-12-31","1970-01-01","1970-01-01",
+                   "1972-02-29","1999-12-31","2000-02-29","2016-09-12")),
+  B = as.IDate(d),
+  C = as.ITime(t<-c("23:59:59","00:00:00","00:00:01", "23:59:58", "00:00:00","00:00:01",
+                   "12:00:00", "01:23:45", "23:59:59","01:30:30")),
+  D = as.POSIXct(dt<-paste(d,t), tz="UTC"),
+  E = as.POSIXct(paste0(dt,c(".999",".0",".5",".111112",".123456",".023",".0",".999999",".99",".0009")), tz="UTC"))
+
+test(1740.1, fwrite(DT,dateTimeAs="iso"), error="dateTimeAs must be 'ISO','squash','epoch' or 'write.csv'")
+test(1740.2, capture.output(fwrite(DT,dateTimeAs="ISO")), c(
+"A,B,C,D,E",
+"1907-10-21,1907-10-21,23:59:59,1907-10-21T23:59:59Z,1907-10-21T23:59:59.999Z",
+"1907-10-22,1907-10-22,00:00:00,1907-10-22T00:00:00Z,1907-10-22T00:00:00Z",
+"1907-10-22,1907-10-22,00:00:01,1907-10-22T00:00:01Z,1907-10-22T00:00:01.500Z",
+"1969-12-31,1969-12-31,23:59:58,1969-12-31T23:59:58Z,1969-12-31T23:59:58.111112Z",
+"1970-01-01,1970-01-01,00:00:00,1970-01-01T00:00:00Z,1970-01-01T00:00:00.123456Z",
+"1970-01-01,1970-01-01,00:00:01,1970-01-01T00:00:01Z,1970-01-01T00:00:01.023Z",
+"1972-02-29,1972-02-29,12:00:00,1972-02-29T12:00:00Z,1972-02-29T12:00:00Z",
+"1999-12-31,1999-12-31,01:23:45,1999-12-31T01:23:45Z,1999-12-31T01:23:45.999999Z",
+"2000-02-29,2000-02-29,23:59:59,2000-02-29T23:59:59Z,2000-02-29T23:59:59.990Z",
+"2016-09-12,2016-09-12,01:30:30,2016-09-12T01:30:30Z,2016-09-12T01:30:30.000900Z"))
+test(1740.3, capture.output(fwrite(DT,dateTimeAs="squash")), c(
+"A,B,C,D,E",
+"19071021,19071021,235959,19071021235959000,19071021235959999",
+"19071022,19071022,000000,19071022000000000,19071022000000000",
+"19071022,19071022,000001,19071022000001000,19071022000001500",
+"19691231,19691231,235958,19691231235958000,19691231235958111",
+"19700101,19700101,000000,19700101000000000,19700101000000123",
+"19700101,19700101,000001,19700101000001000,19700101000001023",
+"19720229,19720229,120000,19720229120000000,19720229120000000",
+"19991231,19991231,012345,19991231012345000,19991231012345999",
+"20000229,20000229,235959,20000229235959000,20000229235959990",
+"20160912,20160912,013030,20160912013030000,20160912013030000"))
+test(1740.4, capture.output(fwrite(DT,dateTimeAs="epoch")), c(
+"A,B,C,D,E",
+"-22718,-22718,86399,-1962748801,-1962748800.001",
+"-22717,-22717,0,-1962748800,-1962748800",
+"-22717,-22717,1,-1962748799,-1962748798.5",
+"-1,-1,86398,-2,-1.888888",
+"0,0,0,0,0.123456",
+"0,0,1,1,1.023",
+"789,789,43200,68212800,68212800",
+"10956,10956,5025,946603425,946603425.999999",
+"11016,11016,86399,951868799,951868799.99",
+"17056,17056,5430,1473643830,1473643830.0009"))
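The "epoch" values tested above are simply the columns' underlying storage values: Date/IDate count days and POSIXct counts (possibly fractional) seconds since 1970-01-01 UTC. A minimal sketch of that correspondence (base R only, not part of the test suite):

```r
# Dates are stored as days since the 1970-01-01 epoch, so earlier dates are
# negative; POSIXct is (possibly fractional) seconds since the same epoch.
d  <- as.Date("1907-10-21")
ct <- as.POSIXct("1907-10-21 23:59:59", tz = "UTC")
stopifnot(as.integer(d) == -22718L)       # first "epoch" value of column A above
stopifnot(as.numeric(ct) == -1962748801)  # first "epoch" value of column D above
```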
+
+test(1741.1, attr(DT[[4]],"tzone"), "UTC")
+test(1741.2, attr(DT[[5]],"tzone"), "UTC")
+# Remove tzone attribute to make write.csv write in local time.
+# That local time will vary on the boxes this test runs on, so we just compare to
+# write.csv rather than fixed strings as above.
+setattr(DT[[4]], "tzone", NULL)
+setattr(DT[[5]], "tzone", NULL)
+
+
+if (base::getRversion() >= "3.0.2") {
+  # "format() now supports digits = 0, to display nsmall decimal places."
+  old = options(digits.secs=0)
+  test(1741.3, x1<-capture.output(fwrite(DT,dateTimeAs="write.csv")),
+               capture.output(write.csv(DT,row.names=FALSE,quote=FALSE)))
+  options(digits.secs=3)
+  test(1741.4, x2<-capture.output(fwrite(DT,dateTimeAs="write.csv")),
+               capture.output(write.csv(DT,row.names=FALSE,quote=FALSE)))
+  options(digits.secs=6)
+  test(1741.5, x3<-capture.output(fwrite(DT,dateTimeAs="write.csv")),
+               capture.output(write.csv(DT,row.names=FALSE,quote=FALSE)))
+  options(old)
+  # check that extra digits made it into output
+  test(1741.6, sum(nchar(x1)) < sum(nchar(x2)) && sum(nchar(x2)) < sum(nchar(x3)))
+}
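The width comparison in test 1741.6 works because `options(digits.secs=)` controls how many fractional-second digits `format()` (and hence `write.csv`) emits for POSIXct. A small sketch of that mechanism (base R only; the format string is written out explicitly to side-step automatic format selection):

```r
ct  <- as.POSIXct("1970-01-01 00:00:00.123456", tz = "UTC")
fmt <- "%Y-%m-%d %H:%M:%OS"       # %OS honours getOption("digits.secs")
old <- options(digits.secs = 0); s0 <- format(ct, fmt)
options(digits.secs = 6);        s6 <- format(ct, fmt)
options(old)
stopifnot(nchar(s6) > nchar(s0))  # more digits.secs => wider output
```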
+
+# > 1e6 columns (there used to be VLAs at C level that caused stack overflow), #1903
+set.seed(1)
+L = lapply(1:1e6, sample, x=100, size=2)
+x = capture.output(fwrite(L))
+test(1742.1, nchar(x), c(2919861L, 2919774L))   # tests 2 very long lines, too
+test(1742.2, substring(x,1,10), c("27,58,21,9","38,91,90,6"))
+test(1742.3, L[[1L]], c(27L,38L))
+test(1742.4, L[[1000000L]], c(76L, 40L))
+test(1742.5, substring(x,nchar(x)-10,nchar(x)), c("50,28,95,76","62,87,23,40"))
+
+options(oldverbose)  # last capture.output(fwrite()) has happened now. TODO: tidy up and remove.
+
+# fread should properly handle NA in colClasses argument #1910
+test(1743.1, sapply(fread("a,b\n1,a", colClasses=c(NA, "factor"), verbose=TRUE), class), c(a="integer", b="factor"), output="Argument colClasses is ignored as requested by provided NA value")
+test(1743.2, sapply(fread("a,b\n1,a", colClasses=c(NA, NA), verbose=TRUE), class), c(a="integer", b="character"), output="Argument colClasses is ignored as requested by provided NA values")
+test(1743.3, fread("a,b\n1,a", colClasses=c(NA, TRUE)), error="when colClasses is logical it must be all NA")
+# also unknown issue in mixed character/factor output and colClasses vector
+test(1743.4, sapply(fread("a,b\n1,a", colClasses=c("character", "factor")), class), c(a="character", b="factor"))
+
+# rolling join stopped working for double with fractions, #1904
+DT = data.table(A=c(1999.917,2000.417,2000.917,2001.417,2001.917))
+setkey(DT,A)
+x = c(2000.167,2000.417,2000.667,2000.917,2001.167)
+test(1744.1, DT[.(x),roll=FALSE,which=TRUE], INT(NA,2,NA,3,NA))
+test(1744.2, DT[.(x),roll=TRUE, which=TRUE], INT(1,2,2,3,3))
+test(1744.3, DT[.(x),roll=1/12, which=TRUE], INT(NA,2,NA,3,NA))
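The roll= tests above rely on the rule that a numeric `roll` limits how far the preceding key value may be carried forward, while `roll=TRUE` carries it forward without limit. A minimal sketch of those semantics (assumes the data.table package; the values here are illustrative, not from the tests):

```r
library(data.table)
DT <- data.table(A = c(1.0, 2.0), key = "A")
x  <- c(1.4, 2.05)   # gaps of 0.4 and 0.05 to the preceding key values
stopifnot(identical(DT[.(x), roll = FALSE, which = TRUE], c(NA_integer_, NA_integer_)))
stopifnot(identical(DT[.(x), roll = TRUE,  which = TRUE], c(1L, 2L)))
# with roll = 0.1 only the 0.05 gap is within the window
stopifnot(identical(DT[.(x), roll = 0.1,   which = TRUE], c(NA_integer_, 2L)))
```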
+
+# 0's at the end of a non-empty subset of empty DT, #1937
+test(1745.1, data.table(a=character(0))[c(1,0)], data.table(a=NA_character_))
+test(1745.2, data.table(a=numeric(0))[c(1,0)], data.table(a=NA_real_))
+test(1745.3, data.table(a=integer(0))[c(1,0)], data.table(a=NA_integer_))
+
+# Long standing crash when by=.EACHI, nomatch=0, the first item in i has no match
+# AND j has function call that is passed a key column, #1933.
+DT = data.table(A=letters[1:5],B=1:5,key="A")
+ids = c("p","q","r","c","s","d")
+test(1746.1, DT[ids, A, by=.EACHI, nomatch=0], data.table(A=c("c","d"),A=c("c","d")))   # was always ok
+test(1746.2, DT[ids, print(A), by=.EACHI, nomatch=0],                                   # reliable crash in v1.9.6 and v1.9.8
+             data.table(A=character(0)), output="\"c\".*\"d\"")
+test(1746.3, DT[ids, {print(A);A}, by=.EACHI, nomatch=0],                               # reliable crash in v1.9.6 and v1.9.8
+             data.table(A=c("c","d"),V1=c("c","d")), output="\"c\".*\"d\"")
+
+# combining on= with by= and keyby=, #1943
+freshDT = data.table(x = rep(c("a", "b"), each = 4), y = 1:0, z = c(3L, 6L, 8L, 5L, 4L, 1L, 2L, 7L))
+DT = copy(freshDT)
+test(1747.1, DT["b", max(z), by    = y, on = "x"], ans1<-data.table(y=1:0, V1=c(4L,7L)))
+test(1747.2, DT["b", max(z), keyby = y, on = "x"], ans2<-data.table(y=0:1, V1=c(7L,4L), key="y"))
+test(1747.3, DT[x=="b", max(z), by = y], ans1)
+test(1747.4, DT[x=="b", max(z), keyby = y], ans2)
+DT = copy(freshDT)  # to clear any auto indexes
+test(1747.5, DT[x=="b", max(z), by = y], ans1)
+test(1747.6, DT[x=="b", max(z), keyby = y], ans2)
+setkey(DT, x)
+test(1747.7, DT["b", max(z), by = y], ans1)
+test(1747.8, DT["b", max(z), keyby = y], ans2)
+DT = copy(freshDT)  # and again without the == having run before the setkey
+setkey(DT, x)
+test(1747.9,  DT["b", max(z), by = y], ans1)
+test(1747.11, DT["b", max(z), keyby = y], ans2)
+
+DT = as.data.table(mtcars[mtcars$cyl %in% c(6, 8), c("am", "vs", "hp")])
+test(1748.1, DT[.(0), max(hp), by    = vs, on = "am"], ans1<-data.table(vs=c(1,0), V1=c(123,245)))
+test(1748.2, DT[.(0), max(hp), keyby = vs, on = "am"], ans2<-data.table(vs=c(0,1), V1=c(245,123), key="vs"))
+DT = as.data.table(mtcars[mtcars$cyl %in% c(6, 8), c("am", "vs", "hp")])
+test(1748.3, DT[am==0, max(hp), by=vs], ans1)
+test(1748.4, DT[am==0, max(hp), keyby=vs], ans2)
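Tests 1747-1748 pivot on the equivalence that an `on=` join subset combined with `by=` aggregates the same rows as the corresponding vector-scan subset. A small self-contained sketch of that equivalence (assumes the data.table package; `all.equal` is used rather than `identical` so internal attributes are ignored):

```r
library(data.table)
DT <- data.table(x = c("a","a","b","b"), y = c(1L,2L,1L,2L), z = 1:4)
a1 <- DT["b", sum(z), on = "x", by = y]   # join subset, then group
a2 <- DT[x == "b", sum(z), by = y]        # vector-scan subset, then group
stopifnot(isTRUE(all.equal(a1, a2)))
```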
+
+
+
 ##########################
 
+# TODO: Tests involving GForce functions need to be run with optimisation level 1 and 2, so that both functions are tested all the time.
 
 # TO DO: Add test for fixed bug #5519 - dcast returned error when a package imported data.table, but didn't happen when "depends" on data.table. This is fixed (commit 1263 v1.9.3), but not sure how to add a test.
 
@@ -6978,17 +9812,19 @@ test(1557.4, dt[, .SD, .SDcols=paste0("index", 1:i)], dt[, .SD, .SDcols=index1:i
 
 ##########################
 options(warn=0)
-options(oldbwb) # set at top of this file
-plat = paste("endian=",.Platform$endian,", sizeof(long double)==",.Machine$sizeof.longdouble,sep="")
+setDTthreads(0)
+options(oldalloccol)  # set at top of this file
+options(oldWhenJsymbol)
+plat = paste("endian==",.Platform$endian,", sizeof(long double)==",.Machine$sizeof.longdouble,
+             ", sizeof(pointer)==",.Machine$sizeof.pointer, sep="")
 if (nfail > 0) {
     if (nfail>1) {s1="s";s2="s: "} else {s1="";s2=" "}
     cat("\r")
     stop(nfail," error",s1," out of ",ntest, " (lastID=",lastnum,", ",plat, ") in inst/tests/tests.Rraw on ",date(),". Search tests.Rraw for test number",s2,paste(whichfail,collapse=", "),".")
     # important to stop() here, so that 'R CMD check' fails
 }
-cat("\rAll ",ntest," tests (lastID=",lastnum,") in inst/tests/tests.Rraw completed ok in ",timetaken(started.at)," on ",date()," (",plat,")\n",sep="")
-# Reporting lastnum rather than ntest makes it easier to check user has the latest version, assuming
-# each release or patch has extra tests.
+cat("\n",plat,"\n\nAll ",ntest," tests in inst/tests/tests.Rraw completed ok in ",timetaken(started.at)," on ",date(),"\n",sep="")
 # date() is included so we can tell when CRAN checks were run (in particular if they have been rerun since
 # an update to Rdevel itself; data.table doesn't have any other dependency) since there appears to be no other
 # way to see the timestamp that CRAN checks were run. Some CRAN machines lag by several days.
+
diff --git a/man/IDateTime.Rd b/man/IDateTime.Rd
index 5bca483..4a65a37 100644
--- a/man/IDateTime.Rd
+++ b/man/IDateTime.Rd
@@ -29,12 +29,15 @@
 \alias{round.IDate}
 \alias{seq.IDate}
 \alias{split.IDate}
-\alias{hour}   
-\alias{yday}   
-\alias{wday}   
-\alias{mday}   
-\alias{week}   
-\alias{month}  
+\alias{second}
+\alias{minute}
+\alias{hour}
+\alias{yday}
+\alias{wday}
+\alias{mday}
+\alias{week}
+\alias{isoweek}
+\alias{month}
 \alias{quarter}
 \alias{year}
 \alias{IDate-class}
@@ -64,14 +67,17 @@ as.ITime(x, ...)
 IDateTime(x, ...)
 \method{IDateTime}{default}(x, ...)
 
-hour(x)   
-yday(x)   
-wday(x)   
-mday(x)   
-week(x)   
-month(x)  
+second(x)
+minute(x)
+hour(x)
+yday(x)
+wday(x)
+mday(x)
+week(x)
+isoweek(x)
+month(x)
 quarter(x)
-year(x)   
+year(x)
 
 }
 
@@ -89,7 +95,7 @@ year(x)
 \code{IDate} is a date class derived from \code{Date}. It has the same
 internal representation as the \code{Date} class, except the storage
 mode is integer. \code{IDate} is a relatively simple wrapper, and it
-should work in almost all situations as a replacement for \code{Date}. 
+should work in almost all situations as a replacement for \code{Date}.
 
 Functions that use \code{Date} objects generally work for
 \code{IDate} objects. This package provides specific methods for
@@ -118,7 +124,7 @@ following: \code{as.POSIXct(time, date)} or \code{as.POSIXct(date,
 time)}.
 
 \code{IDateTime} takes a date-time input and returns a data table with
-columns \code{date} and \code{time}. 
+columns \code{date} and \code{time}.
 
 Using integer storage allows dates and/or times to be used as data table
 keys. With positive integers with a range less than 100,000, grouping
@@ -131,7 +137,7 @@ intervals. \code{as.POSIXlt} is also useful. For example,
 \code{as.POSIXlt(x)$mon} is the integer month. The R base convenience
 functions \code{weekdays}, \code{months}, and \code{quarters} can also
 be used, but these return character values, so they must be converted to
-factors for use with data.table.
+factors for use with data.table. \code{isoweek} is ISO 8601-consistent.
 
 The \code{round} method for IDate's is useful for grouping and plotting. It can
 round to weeks, months, quarters, and years.
@@ -148,11 +154,14 @@ round to weeks, months, quarters, and years.
    For \code{IDateTime}, a data table with columns \code{idate} and
    \code{itime} in \code{IDate} and \code{ITime} format.
 
-   \code{hour}, code{yday}, \code{wday}, \code{mday}, \code{week},
-   \code{month}, \code{quarter}, and \code{year} return integer values
-   for hour, day of year, day of week, day of month, week, month,
-   quarter, and year.
-   
+   \code{second}, \code{minute}, \code{hour}, \code{yday}, \code{wday},
+   \code{mday}, \code{week}, \code{month}, \code{quarter},
+   and \code{year} return integer values
+   for second, minute, hour, day of year, day of week,
+   day of month, week, month, quarter, and year, respectively.
+
+   These values are all taken directly from the \code{POSIXlt} representation of \code{x}, with the notable difference that while \code{yday}, \code{wday}, and \code{mon} are all 0-based, here they are 1-based.
+
 }
 \references{
 
@@ -160,7 +169,8 @@ round to weeks, months, quarters, and years.
   R News, vol. 4, no. 1, June 2004.
 
   H. Wickham, http://gist.github.com/10238.
-  
+
+  ISO 8601, http://www.iso.org/iso/home/standards/iso8601.htm
 }
 
 \author{ Tom Short, t.short at ieee.org }
@@ -177,7 +187,7 @@ round to weeks, months, quarters, and years.
 
 # S4 coercion also works
 identical(as.IDate("2001-01-01"), as("2001-01-01", "IDate"))
- 
+
 # create ITime:
 (t <- as.ITime("10:45"))
 
@@ -190,18 +200,18 @@ identical(as.ITime("10:45"), as("10:45", "ITime"))
 
 as.POSIXct("2001-01-01") + as.ITime("10:45")
 
-datetime <- seq(as.POSIXct("2001-01-01"), as.POSIXct("2001-01-03"), by = "5 hour")    
+datetime <- seq(as.POSIXct("2001-01-01"), as.POSIXct("2001-01-03"), by = "5 hour")
 (af <- data.table(IDateTime(datetime), a = rep(1:2, 5), key = "a,idate,itime"))
 
 af[, mean(a), by = "itime"]
 af[, mean(a), by = list(hour = hour(itime))]
-af[, mean(a), by = list(wday = factor(weekdays(idate)))] 
-af[, mean(a), by = list(wday = wday(idate))] 
+af[, mean(a), by = list(wday = factor(weekdays(idate)))]
+af[, mean(a), by = list(wday = wday(idate))]
 
-as.POSIXct(af$idate) 
-as.POSIXct(af$idate, time = af$itime) 
-as.POSIXct(af$idate, af$itime) 
-as.POSIXct(af$idate, time = af$itime, tz = "GMT") 
+as.POSIXct(af$idate)
+as.POSIXct(af$idate, time = af$itime)
+as.POSIXct(af$idate, af$itime)
+as.POSIXct(af$idate, time = af$itime, tz = "GMT")
 
 as.POSIXct(af$itime, af$idate)
 as.POSIXct(af$itime) # uses today's date
@@ -216,7 +226,7 @@ if (require(chron)) {
     as.chron(as.ITime("10:45"), as.IDate("2000-01-01"))
     as.ITime(chron(times = "11:01:01"))
     IDateTime(chron("12/31/98","10:45:00"))
-}  
+}
 
 }
 \keyword{utilities}
diff --git a/man/J.Rd b/man/J.Rd
index ea2a122..d860f50 100644
--- a/man/J.Rd
+++ b/man/J.Rd
@@ -14,7 +14,8 @@ CJ(..., sorted = TRUE, unique = FALSE)  # DT[CJ(...)]
 }
 
 \arguments{
-  \item{\dots}{ Each argument is a vector. Generally each vector is the same length but if they are not then usual silent repitition is applied. }
+  \item{\dots}{ Each argument is a vector. Generally each vector is the
+  same length but if they are not then the usual silent repetition is applied. }
   \item{sorted}{ logical. Should the input order be retained?}
   \item{unique}{ logical. When \code{TRUE}, only unique values of each vector are used (automatically). }
 }
@@ -45,7 +46,7 @@ DT[list("b")]   # same
 # CJ usage examples
 CJ(c(5,NA,1), c(1,3,2)) # sorted and keyed data.table
 do.call(CJ, list(c(5,NA,1), c(1,3,2))) # same as above
-CJ(c(5,NA,1), c(1,3,2), sorted=FALSE) # same order as input, unkeyed 
+CJ(c(5,NA,1), c(1,3,2), sorted=FALSE) # same order as input, unkeyed
 # use for 'unique=' argument
 x = c(1,1,2)
 y = c(4,6,4)
diff --git a/man/all.equal.data.table.Rd b/man/all.equal.data.table.Rd
index fdb157b..e27fde1 100644
--- a/man/all.equal.data.table.Rd
+++ b/man/all.equal.data.table.Rd
@@ -3,12 +3,13 @@
 \alias{all.equal.data.table}
 \title{ Equality Test Between Two Data Tables }
 \description{
-  Performs some factor level ``stripping'' and other operations to allow
-  for a convenient test of data equality between \code{data.table} objects.
+  Convenient test of data equality between \code{data.table} objects. Performs some factor level \emph{stripping}.
 }
 
 \usage{
-  \method{all.equal}{data.table}(target, current, trim.levels = TRUE, ...)
+  \method{all.equal}{data.table}(target, current, trim.levels=TRUE, check.attributes=TRUE, 
+    ignore.col.order=FALSE, ignore.row.order=FALSE, tolerance=sqrt(.Machine$double.eps),
+    ...)
 }
 
 \arguments{
@@ -18,18 +19,32 @@
 
   \item{trim.levels}{
     A logical indicating whether or not to remove all unused levels in columns
-    that are factors before running equality check.
+    that are factors before running the equality check. It takes effect only when \code{check.attributes} is TRUE and \code{ignore.row.order} is FALSE.
+  }
+    
+  \item{check.attributes}{
+    A logical indicating whether or not to check attributes; this applies not only to the data.table itself but also to the attributes of its columns. The \code{c("row.names",".internal.selfref")} data.table attributes are skipped.
+  }
+  
+  \item{ignore.col.order}{
+    A logical indicating whether or not to ignore the column order in the \code{data.table}.
+  }
+  
+  \item{ignore.row.order}{
+    A logical indicating whether or not to ignore the row order in the \code{data.table}. This option requires datasets to use data types on which a join can be made; there is no support for \emph{list}, \emph{complex} or \emph{raw} types, but \link[bit64]{integer64} is still supported.
+  }
+  
+  \item{tolerance}{
+    A numeric value used when comparing numeric columns, by default \code{sqrt(.Machine$double.eps)}. Unless a non-default value is provided, it will be forced to \code{0} if used together with \code{ignore.row.order} when duplicate rows are detected or factor columns are present.
   }
 
   \item{\dots}{
-    Passed down to internal call of \code{\link{all.equal.list}}
+    Passed down to internal call of \code{\link[base]{all.equal}}.
   }
 }
 
 \details{
-  This function is used primarily to make life easy with a testing harness
-  built around \code{test_that}. A call to \code{test_that::(expect_equal|equal)}
-  will ultimately dispatch to this method when making an "equality" check.
+  For efficiency, the data.table method exits early on the first detected non-equality, unlike most \code{\link[base]{all.equal}} methods which continue processing further checks. It also handles the most time-consuming case, \code{ignore.row.order = TRUE}, very efficiently.
 }
 
 \value{
@@ -38,13 +53,38 @@
 }
 
 \seealso{
-  \code{\link{all.equal.list}}
+  \code{\link[base]{all.equal}}
 }
 
 \examples{
 dt1 <- data.table(A = letters[1:10], X = 1:10, key = "A")
 dt2 <- data.table(A = letters[5:14], Y = 1:10, key = "A")
-identical(all.equal(dt1, dt1), TRUE)
+isTRUE(all.equal(dt1, dt1))
 is.character(all.equal(dt1, dt2))
+
+# ignore.col.order
+x <- copy(dt1)
+y <- dt1[, .(X, A)]
+all.equal(x, y)
+all.equal(x, y, ignore.col.order = TRUE)
+
+# ignore.row.order
+x <- setkeyv(copy(dt1), NULL)
+y <- dt1[sample(nrow(dt1))]
+all.equal(x, y)
+all.equal(x, y, ignore.row.order = TRUE)
+
+# check.attributes
+x = copy(dt1)
+y = setkeyv(copy(dt1), NULL)
+all.equal(x, y)
+all.equal(x, y, check.attributes = FALSE)
+
+# trim.levels
+x <- data.table(A = factor(letters[1:10])[1:4]) # 10 levels
+y <- data.table(A = factor(letters[1:5])[1:4]) # 5 levels
+all.equal(x, y, trim.levels = FALSE)
+all.equal(x, y, trim.levels = FALSE, check.attributes = FALSE)
+all.equal(x, y)
 }
 
diff --git a/man/as.data.table.Rd b/man/as.data.table.Rd
new file mode 100644
index 0000000..66ca962
--- /dev/null
+++ b/man/as.data.table.Rd
@@ -0,0 +1,79 @@
+\name{as.data.table}
+\alias{as.data.table}
+\alias{as.data.table.matrix}
+\alias{as.data.table.list}
+\alias{as.data.table.data.frame}
+\alias{as.data.table.data.table}
+\alias{as.data.table.factor}
+\alias{as.data.table.ordered}
+\alias{as.data.table.integer}
+\alias{as.data.table.numeric}
+\alias{as.data.table.logical}
+\alias{as.data.table.character}
+\alias{as.data.table.Date}
+\alias{is.data.table}
+\title{Coerce to data.table}
+\description{
+Functions to check if an object is a \code{data.table}, and to coerce it if possible.
+
+}
+\usage{
+as.data.table(x, keep.rownames=FALSE, \dots)
+
+\method{as.data.table}{data.table}(x, \dots)
+
+is.data.table(x)
+
+}
+\arguments{
+  \item{x}{An R object.}
+  \item{keep.rownames}{Default is \code{FALSE}. If \code{TRUE}, adds the input object's names as a separate column named \code{"rn"}. \code{keep.rownames = "id"} names the column \code{"id"} instead.}
+  \item{\dots}{Additional arguments to be passed to or from other methods.}
+}
+\details{
+
+  \code{as.data.table} is a generic function with many methods, and other packages can supply further methods. 
+
+  If a \code{list} is supplied, each element is converted to a column in the \code{data.table} with shorter elements recycled automatically. Similarly, each column of a \code{matrix} is converted separately.
+
+  Unlike \code{as.data.frame}, \code{character} objects are \emph{not} converted to \code{factor} types.
+
+  If a \code{data.frame} is supplied, all classes preceding \code{"data.frame"} are stripped. Similarly, for \code{data.table} as input, all classes preceding \code{"data.table"} are stripped. \code{as.data.table} methods return a \emph{copy} of the original data. To modify by reference see \code{\link{setDT}} and \code{\link{setDF}}.
+
+  The \code{keep.rownames} argument can be used to preserve the (row)names attribute in the resulting \code{data.table}.
+}
+\seealso{ 
+  \code{\link{data.table}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setkey}}, \code{\link{J}}, \code{\link{SJ}}, \code{\link{CJ}}, \code{\link{merge.data.table}}, \code{\link{:=}}, \code{\link{alloc.col}}, \code{\link{truelength}}, \code{\link{rbindlist}}, \code{\link{setNumericRounding}}, \code{\link{datatable-optimize}}
+}
+\examples{
+nn = c(a=0.1, b=0.2, c=0.3, d=0.4)
+as.data.table(nn)
+as.data.table(nn, keep.rownames=TRUE)
+as.data.table(nn, keep.rownames="rownames")
+
+# char object not converted to factor
+cc = c(X="a", Y="b", Z="c")
+as.data.table(cc)
+as.data.table(cc, keep.rownames=TRUE)
+as.data.table(cc, keep.rownames="rownames")
+
+mm = matrix(1:4, ncol=2, dimnames=list(c("r1", "r2"), c("c1", "c2")))
+as.data.table(mm)
+as.data.table(mm, keep.rownames=TRUE)
+as.data.table(mm, keep.rownames="rownames")
+
+ll = list(a=1:2, b=3:4)
+as.data.table(ll)
+as.data.table(ll, keep.rownames=TRUE)
+as.data.table(ll, keep.rownames="rownames")
+
+df = data.frame(x=rep(c("x","y","z"),each=2), y=c(1,3,6), row.names=LETTERS[1:6])
+as.data.table(df)
+as.data.table(df, keep.rownames=TRUE)
+as.data.table(df, keep.rownames="rownames")
+
+dt = data.table(x=rep(c("x","y","z"),each=2), y=c(1:6))
+as.data.table(dt)
+}
+\keyword{ data }
+
diff --git a/man/as.xts.data.table.Rd b/man/as.xts.data.table.Rd
index 2f6b843..a80803a 100644
--- a/man/as.xts.data.table.Rd
+++ b/man/as.xts.data.table.Rd
@@ -5,10 +5,11 @@
   Efficient conversion of a data.table to xts; the data.table must have a \emph{POSIXct} or \emph{Date} type in the first column.
 }
 \usage{
-as.xts.data.table(x)
+as.xts.data.table(x, ...)
 }
 \arguments{
 \item{x}{data.table to convert to xts, must have \emph{POSIXct} or \emph{Date} in the first column. All other non-numeric columns will be omitted with a warning.}
+\item{\dots}{ignored, just for consistency with generic method.}
 }
 \seealso{ \code{\link{as.data.table.xts}} }
 \examples{
diff --git a/man/assign.Rd b/man/assign.Rd
index 0e361c8..06bc96c 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -3,93 +3,116 @@
 \alias{set}
 \title{ Assignment by reference }
 \description{
-    Fast add, remove and modify subsets of columns, by reference.
+    Fast add, remove and update subsets of columns, by reference. The \code{:=} operator can be used in two ways: the \code{LHS := RHS} form, and the functional form. See \code{Usage}.
+
+    \code{set} is a low-overhead loopable version of \code{:=}. It is particularly useful for repetitively updating rows of certain columns by reference (using a for-loop). See \code{Examples}. It cannot perform grouping operations.
+
 }
 \usage{
-#   DT[i, LHS:=RHS, by=...]
+# 1. LHS := RHS form
+# DT[i, LHS := RHS, by = ...]
+# DT[i, c("LHS1", "LHS2") := list(RHS1, RHS2), by = ...]
 
-#   DT[i, c("LHS1","LHS2") := list(RHS1, RHS2), by=...]
+# 2. Functional form
+# DT[i, `:=`(LHS1 = RHS1,
+#            LHS2 = RHS2,
+#            ...), by = ...]
 
-#   DT[i, `:=`(LHS1=RHS1,
-#              LHS2=RHS2,
-#              ...), by=...]
-    
-    set(x, i=NULL, j, value)
+set(x, i = NULL, j, value)
 }
 \arguments{
-\item{LHS}{ A single column name. Or, when \code{with=FALSE}, a vector of column names or numeric positions (or a variable that evaluates as such). If the column doesn't exist, it is added, by reference. }
-\item{RHS}{ A vector of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. Or, when \code{with=FALSE}, a \code{list} of replacement vectors which are applied (the \code{list} is recycled if necessary) to each column of \code{LHS} . To remove a column use \code{NULL}. }
+\item{LHS}{ A character vector of column names (or numeric positions) or a variable that evaluates as such. If the column doesn't exist, it is added, \emph{by reference}. }
+\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any.  To remove a column use \code{NULL}. }
 \item{x}{ A \code{data.table}. Or, \code{set()} accepts \code{data.frame}, too. }
-\item{i}{ Optional. In set(), integer row numbers to be assigned \code{value}. \code{NULL} represents all rows more efficiently than creating a vector such as \code{1:nrow(x)}. }
-\item{j}{ In set(), integer column number to be assigned \code{value}. }
-\item{value}{ Value to assign by reference to \code{x[i,j]}. }
+\item{i}{ Optional. Indicates the rows on which the values must be updated. If not provided, \emph{all rows} are implied. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}.
+
+    In \code{set}, only integer type is allowed in \code{i}, indicating the rows to which \code{value} should be assigned. \code{NULL} represents all rows more efficiently than creating a vector such as \code{1:nrow(x)}. }
+\item{j}{ Column name(s) (character) or number(s) (integer) to be assigned \code{value} when the column(s) already exist; only column name(s) are allowed if the columns are to be newly added. }
+\item{value}{ A list of replacement values to assign by reference to \code{x[i, j]}. }
 }
 \details{
-\code{:=} is defined for use in \code{j} only. It \emph{updates} or \emph{adds} the column(s) by reference. It makes no copies of any part of memory at all. Typical usages are :
+\code{:=} is defined for use in \code{j} only. It \emph{adds}, \emph{updates} or \emph{removes} column(s) by reference. It makes no copies of any part of memory at all. Read the \href{../doc/datatable-reference-semantics.html}{Reference Semantics HTML vignette} to follow along with examples. Some typical usages are:
+
 \preformatted{
-    DT[i,colname:=value]                      # update (or add at the end if doesn't exist) a column called "colname" with value where i and (when new column) NA elsewhere
-    DT[i,"colname \%":=value]                  # same. column called "colname \%"
-    DT[i,(3:6):=value]                        # update existing columns 3:6 with value. Aside: parens are not required here since : already makes LHS a call rather than a symbol
-    DT[i,colnamevector:=value,with=FALSE]     # old syntax. The contents of colnamevector in calling scope determine the column names or positions to update (or add)
-    DT[i,(colnamevector):=value]              # same, shorthand. Now preferred. The parens are enough to stop the LHS being a symbol
-    DT[i,colC:=mean(colB),by=colA]            # update (or add) column called "colC" by reference by group. A major feature of `:=`.
-    DT[,`:=`(new1=sum(colB), new2=sum(colC))] # multiple :=.  
+    DT[, col := val]                              # update (or add at the end if doesn't exist) a column called "col" with value "val" (recycled if necessary).
+    DT[i, col := val]                             # same as above, but only for those rows specified in i and (for new columns) NA elsewhere.
+    DT[i, "col a" := val]                         # same. column is called "col a"
+    DT[i, (3:6) := val]                           # update existing columns 3:6 with value. Aside: parens are not required here since : already makes LHS a call rather than a symbol.
+    DT[i, colvector := val, with = FALSE]         # OLD syntax. The contents of "colvector" in calling scope determine the column(s).
+    DT[i, (colvector) := val]                     # same (NOW PREFERRED) shorthand syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector).
+    DT[i, colC := mean(colB), by = colA]          # update (or add) column called "colC" by reference by group. A major feature of `:=`.
+    DT[,`:=`(new1 = sum(colB), new2 = sum(colC))] # Functional form
 }
-The following all result in a friendly error (by design) :
+
+All of the following result in a friendly error (by design) :
+
 \preformatted{
-    x := 1L                                   # friendly error
-    DT[i,colname] := value                    # friendly error
-    DT[i]$colname := value                    # friendly error
-    DT[,{col1:=1L;col2:=2L}]                  # friendly error. Use `:=`() instead for multiple := (see above)
+    x := 1L
+    DT[i, col] := val
+    DT[i]$col := val
+    DT[, {col1 := 1L; col2 := 2L}]                # Use the functional form, `:=`(), instead (see above).
 }
 
-\code{:=} in \code{j} can be combined with all types of \code{i} (such as binary search), and all types of \code{by}. This a one reason why \code{:=} has been implemented in \code{j}. See FAQ 2.16 for analogies to SQL. \cr\cr  % for some reason in this .Rd file (but not the others), newlines seem to be ignored.
+For additional resources, check the \href{../doc/datatable-faq.html}{FAQs vignette}. Also have a look at StackOverflow's \href{http://stackoverflow.com/search?q=\%5Bdata.table\%5D+reference}{data.table tag}.
 
-When \code{LHS} is a factor column and \code{RHS} is a character vector with items missing from the factor levels, the new level(s) are automatically added (by reference, efficiently), unlike base methods.\cr\cr
+\code{:=} in \code{j} can be combined with all types of \code{i} (such as binary search), and all types of \code{by}. This is one reason why \code{:=} has been implemented in \code{j}. See the \href{../doc/datatable-reference-semantics.html}{Reference Semantics HTML vignette} and also \code{FAQ 2.16} for analogies to SQL.
 
-Unlike \code{<-} for \code{data.frame}, the (potentially large) LHS is not coerced to match the type of the (often small) RHS. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given (whether or not fractional data is truncated). The motivation for this is efficiency. It is best to get the column types correct up front and stick to them. Changing a column type is possible but [...]
+When \code{LHS} is a factor column and \code{RHS} is a character vector with items missing from the factor levels, the new level(s) are automatically added (by reference, efficiently), unlike base methods.
 
-\code{data.table}s are \emph{not} copied-on-change by \code{:=}, \code{setkey} or any of the other \code{set*} functions. See \code{\link{copy}}.\cr\cr
+Unlike \code{<-} for \code{data.frame}, the (potentially large) LHS is not coerced to match the type of the (often small) RHS. Instead the RHS is coerced to match the type of the LHS, if necessary. Where this involves double precision values being coerced to an integer column, a warning is given (whether or not fractional data is truncated). The motivation for this is efficiency. It is best to get the column types correct up front and stick to them. Changing a column type is possible but [...]
 
-Additional resources: search for "\code{:=}" in the \href{../doc/datatable-faq.pdf}{FAQs vignette} (3 FAQs mention \code{:=}), search Stack Overflow's \href{http://stackoverflow.com/search?q=\%5Bdata.table\%5D+reference}{data.table tag for "reference"} (6 questions).\cr\cr
+\code{data.table}s are \emph{not} copied-on-change by \code{:=}, \code{setkey} or any of the other \code{set*} functions. See \code{\link{copy}}.
+}
 
-Advanced (internals) : sub assigning to existing columns is easy to see how that is done internally. Removing columns by reference is also straightforward by modifying the vector of column pointers only (using memmove in C). Adding columns is more tricky to see how that can be grown by reference: the list vector of column pointers is over-allocated, see \code{\link{truelength}}. By defining \code{:=} in \code{j} we believe update synax is natural, and scales, but also it bypasses \code{[ [...]
+\section{Advanced (internals):}{It is easy to see how \emph{sub-assigning} to existing columns is done internally. Removing columns by reference is also straightforward by modifying the vector of column pointers only (using memmove in C). However adding (new) columns is more tricky as to how the \code{data.table} can be grown \emph{by reference}: the list vector of column pointers is \emph{over-allocated}, see \code{\link{truelength}}. By defining \code{:=} in \code{j} we believe update  [...]
 
-Since \code{[.data.table} incurs overhead to check the existence and type of arguments (for example), \code{set()} provides direct (but less flexible) assignment by reference with low overhead, appropriate for use inside a \code{for} loop. See examples. \code{:=} is more flexible than \code{set()} because \code{:=} is intended to be combined with \code{i} and \code{by} in single queries on large datasets.
+Since \code{[.data.table} incurs overhead to check the existence and type of arguments (for example), \code{set()} provides direct (but less flexible) assignment by reference with low overhead, appropriate for use inside a \code{for} loop. See examples. \code{:=} is more powerful and flexible than \code{set()} because \code{:=} is intended to be combined with \code{i} and \code{by} in single queries on large datasets.
+}
+\section{Note:}{
+    \code{DT[a > 4, b := c]} is different from \code{DT[a > 4][, b := c]}. The first expression updates (or adds) column \code{b} with the value \code{c} on those rows where \code{a > 4} evaluates to \code{TRUE}. \code{DT} is updated \emph{by reference}, therefore no assignment is needed.
 
+    The second expression on the other hand updates a \emph{new} \code{data.table} that is returned by the subset operation. Since the subsetted \code{data.table} is ephemeral (it is not assigned to a symbol), the result would be lost unless it is assigned, for example, as follows: \code{ans <- DT[a > 4][, b := c]}.
 }
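
The note above can be sketched in a short R session (a minimal illustration; the table and column values are hypothetical):

```r
library(data.table)
DT <- data.table(a = 1:6, b = 0L)

DT[a > 4, b := 99L]      # updates DT itself, by reference
DT$b                     # rows where a > 4 now hold 99L

DT[a > 4][, b := -1L]    # operates on an ephemeral subset; DT is unchanged
ans <- DT[a > 4][, b := -1L][]   # assign the result to keep the modified subset
```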
 \value{
-    \code{DT} is modified by reference and the new value is returned. If you require a copy, take a copy first (using \code{DT2=copy(DT)}). Recall that this package is for large data (of mixed column types, with multi-column keys) where updates by reference can be many orders of magnitude faster than copying the entire table.   
+\code{DT} is modified by reference and returned invisibly. If you require a copy, take a \code{\link{copy}} first (using \code{DT2 = copy(DT)}).
 }
 \seealso{ \code{\link{data.table}}, \code{\link{copy}}, \code{\link{alloc.col}}, \code{\link{truelength}}, \code{\link{set}}
 }
 \examples{
-DT = data.table(a=LETTERS[c(1,1:3)],b=4:7,key="a")
-DT[,c:=8]        # add a numeric column, 8 for all rows
-DT[,d:=9L]       # add an integer column, 9L for all rows
-DT[,c:=NULL]     # remove column c
-DT[2,d:=10L]     # subassign by reference to column d
-DT               # DT changed by reference
-
-DT[b>4,b:=d*2L]  # subassign to b using d, where b>4
-DT["A",b:=0L]    # binary search for group "A" and set column b
-
-DT[,e:=mean(d),by=a]  # add new column by group by reference
-DT["B",f:=mean(d)]    # subassign to new column, NA initialized
+DT = data.table(a = LETTERS[c(3L,1:3)], b = 4:7)
+DT[, c := 8]                # add a numeric column, 8 for all rows
+DT[, d := 9L]               # add an integer column, 9L for all rows
+DT[, c := NULL]             # remove column c
+DT[2, d := -8L]             # subassign by reference to d; 2nd row is -8L now
+DT                          # DT changed by reference
+DT[2, d := 10L][]           # shorthand for update and print
+
+DT[b > 4, b := d * 2L]      # subassign to b with d*2L on those rows where b > 4 is TRUE
+DT[b > 4][, b := d * 2L]    # different from above. [, := ] is performed on the subset
+                            # which is a new (ephemeral) data.table. Result needs to be
+                            # assigned to a variable (using `<-`).
+
+DT[, e := mean(d), by = a]  # add new column by group by reference
+DT["A", b := 0L, on = "a"]  # ad-hoc update of column b for group "A" using
+                            # joins-as-subsets with binary search and 'on='
+# same as above but using keys
+setkey(DT, a)
+DT["A", b := 0L]            # binary search for group "A" and set column b using keys
+DT["B", f := mean(d)]       # subassign to new column, NA initialized
 
 \dontrun{
 # Speed example ...
     
-m = matrix(1,nrow=100000,ncol=100)
+m = matrix(1, nrow = 2e6L, ncol = 100L)
 DF = as.data.frame(m)
 DT = as.data.table(m)    
 
-system.time(for (i in 1:1000) DF[i,1] <- i)
-# 591 seconds      
-system.time(for (i in 1:1000) DT[i,V1:=i])
-# 2.4 seconds  ( 246 times faster, 2.4 is overhead in [.data.table )
-system.time(for (i in 1:1000) set(DT,i,1L,i))
-# 0.03 seconds  ( 19700 times faster, overhead of [.data.table is avoided )
+system.time(for (i in 1:1000) DF[i, 1] = i)
+# 15.856 seconds
+system.time(for (i in 1:1000) DT[i, V1 := i])
+# 0.279 seconds  (57 times faster)
+system.time(for (i in 1:1000) set(DT, i, 1L, i))
+# 0.002 seconds  (7930 times faster, overhead of [.data.table is avoided)
 
 # However, normally, we call [.data.table *once* on *large* data, not many times on small data.
 # The above is to demonstrate overhead, not to recommend looping in this way. But the option
diff --git a/man/between.Rd b/man/between.Rd
index d4d2cec..03d6daf 100644
--- a/man/between.Rd
+++ b/man/between.Rd
@@ -1,29 +1,72 @@
 \name{between}
 \alias{between}
 \alias{\%between\%}
-\title{ Convenience function for range subset logic. }
+\alias{inrange}
+\alias{\%inrange\%}
+\title{ Convenience functions for range subsets. }
 \description{
-  Intended for use in [.data.table i.
+Intended for use in \code{i} in \code{[.data.table}.
+
+\code{between} is equivalent to \code{x >= lower & x <= upper} when 
+\code{incbounds=TRUE}, or \code{x > lower & x < upper} when \code{FALSE}.
+
+\code{inrange} checks whether each value in \code{x} is in between any of 
+the intervals provided in \code{lower,upper}.
 }
 \usage{
-between(x,lower,upper,incbounds=TRUE)
-x %between% c(lower,upper)
+between(x, lower, upper, incbounds=TRUE)
+x \%between\% y
+
+inrange(x, lower, upper, incbounds=TRUE)
+x \%inrange\% y
 }
 \arguments{
-   \item{x}{ Any vector e.g. numeric, character, date, ... }
-   \item{lower}{ Lower range bound. }
-   \item{upper}{ Upper range bound. }
-   \item{incbounds}{ \code{TRUE} means inclusive bounds i.e. [lower,upper]. \code{FALSE} means exclusive bounds i.e. (lower,upper). }
+\item{x}{ Any orderable vector, i.e., those with relevant methods for 
+\code{`<=`}, such as \code{numeric}, \code{character}, \code{Date}, etc. in 
+case of \code{between} and a numeric vector in case of \code{inrange}.}
+\item{lower}{ Lower range bound.}
+\item{upper}{ Upper range bound.}
+\item{y}{ A length-2 \code{vector} or \code{list}, with \code{y[[1]]} 
+interpreted as \code{lower} and \code{y[[2]]} as \code{upper}.} 
+\item{incbounds}{ \code{TRUE} means inclusive bounds, i.e., [lower,upper]. 
+\code{FALSE} means exclusive bounds, i.e., (lower,upper). 
+
+It is set to \code{TRUE} by default for infix notations.}
+}
+\details{
+
+From \code{v1.9.8+}, \code{between} is vectorised. \code{lower} and 
+\code{upper} are recycled to \code{length(x)} if necessary.
+
+\emph{non-equi} joins were recently implemented in \code{v1.9.8}. They extend 
+binary search based joins in \code{data.table} to other binary operators 
+including \code{>=, <=, >, <}. \code{inrange} makes use of this new 
+functionality and performs a range join.
+
 }
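+As a quick sketch of the equivalences described above (illustrative values only):

```r
library(data.table)
x <- c(3, 7, 12)

# closed bounds (default): x >= lower & x <= upper
identical(between(x, 5, 10), x >= 5 & x <= 10)

# open bounds: x > lower & x < upper
between(x, 5, 10, incbounds = FALSE)

# vectorised form (v1.9.8+): lower and upper recycled to length(x)
between(x, c(1, 6, 10), c(5, 8, 15))
```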
-% \details{
-% }
 \value{
-    Logical vector as the same length as \code{x} with value \code{TRUE} for those that lie within the range [lower,upper] or (lower,upper).
+Logical vector of the same length as \code{x} with value \code{TRUE} for those 
+that lie within the specified range.
+}
+\note{ Current implementation does not make use of ordered keys for 
+\code{\%between\%}. }
+\seealso{
+\code{\link{data.table}}, \code{\link{like}}, \code{\link{\%chin\%}} 
 }
-\note{ Current implementation does not make use of ordered keys. }
-\seealso{ \code{\link{data.table}}, \code{\link{like}} }
 \examples{
-DT = data.table(a=1:5, b=6:10)
-DT[b \%between\% c(7,9)]
+X = data.table(a=1:5, b=6:10, c=c(5:1))
+X[b \%between\% c(7,9)]
+X[between(b, 7, 9)] # same as above
+# NEW feature in v1.9.8, vectorised between
+X[c \%between\% list(a,b)]
+X[between(c, a, b)] # same as above
+X[between(c, a, b, incbounds=FALSE)] # open interval
+
+# inrange()
+Y = data.table(a=c(8,3,10,7,-10), val=runif(5))
+range = data.table(start = 1:5, end = 6:10)
+Y[a \%inrange\% range]
+Y[inrange(a, range$start, range$end)] # same as above
+Y[inrange(a, range$start, range$end, incbounds=FALSE)] # open interval
 }
 \keyword{ data }
diff --git a/man/data.table.Rd b/man/data.table.Rd
index 85aad18..4fb543b 100644
--- a/man/data.table.Rd
+++ b/man/data.table.Rd
@@ -2,23 +2,23 @@
 \alias{data.table-package}
 \docType{package}
 \alias{data.table}
-\alias{as.data.table}
-\alias{is.data.table}
 \alias{Ops.data.table}
 \alias{is.na.data.table}
 \alias{[.data.table}
 \title{ Enhanced data.frame }
 \description{
-   \code{data.table} \emph{inherits} from \code{data.frame}. It offers fast subset, fast grouping, fast update, fast ordered joins and list columns in a short and flexible syntax, for faster development. It is inspired by \code{A[B]} syntax in \R where \code{A} is a matrix and \code{B} is a 2-column matrix. Since a \code{data.table} \emph{is} a \code{data.frame}, it is compatible with \R functions and packages that \emph{only} accept \code{data.frame}.
+   \code{data.table} \emph{inherits} from \code{data.frame}. It offers a fast and memory efficient file reader and writer, aggregations, updates, and equi, non-equi, rolling, range and interval joins, in a short and flexible syntax, for faster development. 
+
+   It is inspired by \code{A[B]} syntax in \R where \code{A} is a matrix and \code{B} is a 2-column matrix. Since a \code{data.table} \emph{is} a \code{data.frame}, it is compatible with \R functions and packages that accept \emph{only} \code{data.frame}s.
    
-   The 10 minute quick start guide to \code{data.table} may be a good place to start: \href{../doc/datatable-intro.pdf}{\code{vignette("datatable-intro")}}. Or, the first section of FAQs is intended to be read from start to finish and is considered core documentation: \href{../doc/datatable-faq.pdf}{\code{vignette("datatable-faq")}}. If you have read and searched these documents and the help page below, please feel free to ask questions on \href{http://r.789695.n4.nabble.com/datatable-he [...]
+   Type \code{vignette(package="data.table")} to get started. The \href{../doc/datatable-intro.html}{Introduction to data.table} vignette introduces \code{data.table}'s \code{x[i, j, by]} syntax and is a good place to start. If you have read the vignettes and the help page below, please feel free to ask questions on Stack Overflow \href{http://stackoverflow.com/questions/tagged/data.table}{data.table tag} or on \href{http://r.789695.n4.nabble.com/datatable-help-f2315188.html}{datatable-h [...]
    
-   Please check the \href{http://datatable.r-forge.r-project.org/}{homepage} for up to the minute \href{https://github.com/Rdatatable/data.table/blob/master/README.md}{news}.
+   Please check the \href{https://github.com/Rdatatable/data.table/wiki}{homepage} for up-to-the-minute live NEWS.
    
-   Tip: one of the quickest ways to learn the features is to type \code{example(data.table)} and study the output at the prompt.
+   Tip: one of the \emph{quickest} ways to learn the features is to type \code{example(data.table)} and study the output at the prompt.
 }
 \usage{
-data.table(..., keep.rownames=FALSE, check.names=FALSE, key=NULL)
+data.table(..., keep.rownames=FALSE, check.names=FALSE, key=NULL, stringsAsFactors=FALSE)
 
 \method{[}{data.table}(x, i, j, by, keyby, with = TRUE,
   nomatch = getOption("datatable.nomatch"),                   # default: NA_integer_
@@ -31,277 +31,399 @@ data.table(..., keep.rownames=FALSE, check.names=FALSE, key=NULL)
   .SDcols,
   verbose = getOption("datatable.verbose"),                   # default: FALSE
   allow.cartesian = getOption("datatable.allow.cartesian"),   # default: FALSE
-  drop = NULL, 
-  on = NULL # join without setting keys, new feature from v1.9.6+
-  )
+  drop = NULL, on = NULL)
 }
 \arguments{
-  \item{\dots}{ Just as \code{\dots} in \code{\link{data.frame}}. Usual recycling rules are applied to vectors of different lengths to create a list of equal length vectors.
+    \item{\dots}{ Just as \code{\dots} in \code{\link{data.frame}}. Usual recycling rules are applied to vectors of different lengths to create a list of equal length vectors.}
 
-}
-  \item{keep.rownames}{ If \code{\dots} is a \code{matrix} or \code{data.frame}, \code{TRUE} will retain the rownames of that object in a column named \code{rn}.
+    \item{keep.rownames}{ If \code{\dots} is a \code{matrix} or \code{data.frame}, \code{TRUE} will retain the rownames of that object in a column named \code{rn}.}
 
-}
-  \item{check.names}{ Just as \code{check.names} in \code{\link{data.frame}}.
+    \item{check.names}{ Just as \code{check.names} in \code{\link{data.frame}}.}
 
-}
-  \item{key}{ Character vector of one or more column names which is passed to \code{\link{setkey}}. It may be a single comma separated string such as \code{key="x,y,z"}, or a vector of names such as \code{key=c("x","y","z")}.
+    \item{key}{ Character vector of one or more column names which is passed to \code{\link{setkey}}. It may be a single comma separated string such as \code{key="x,y,z"}, or a vector of names such as \code{key=c("x","y","z")}.}
 
-}
-  \item{x}{ A \code{data.table}.
+    \item{stringsAsFactors}{Logical (default is \code{FALSE}). Convert all \code{character} columns to \code{factor}s?}
 
-}
-  \item{i}{ Integer, logical or character vector, single column numeric \code{matrix}, expression of column names, \code{list} or \code{data.table}.
+    \item{x}{ A \code{data.table}.}
+
+    \item{i}{ Integer, logical or character vector, single column numeric \code{matrix}, expression of column names, \code{list}, \code{data.frame} or \code{data.table}.
+
+        \code{integer} and \code{logical} vectors work the same way they do in \code{\link{[.data.frame}} except logical \code{NA}s are treated as \code{FALSE}.
+
+        \code{expression} is evaluated within the frame of the \code{data.table} (i.e. it sees column names as if they are variables) and can evaluate to any of the other types.
+
+        \code{character}, \code{list} and \code{data.frame} input to \code{i} is converted into a \code{data.table} internally using \code{\link{as.data.table}}. 
 
-  integer and logical vectors work the same way they do in \code{\link{[.data.frame}}. Other than \code{NA}s in logical \code{i} are treated as \code{FALSE} and a single \code{NA} logical is not recycled to match the number of rows, as it is in \code{[.data.frame}.
+        If \code{i} is a \code{data.table}, the columns in \code{i} to be matched against \code{x} can be specified using one of these ways:
 
-  character is matched to the first column of \code{x}'s key.
+        \itemize{
+            \item{\code{on} argument (see below). It allows for both \code{equi-} and the newly implemented \code{non-equi} joins.}
 
-  expression is evaluated within the frame of the \code{data.table} (i.e. it sees column names as if they are variables) and can evaluate to any of the other types.
+            \item{If not, \code{x} \emph{must be keyed}. Key can be set using \code{\link{setkey}}. If \code{i} is also keyed, then the first \emph{key} column of \code{i} is matched against the first \emph{key} column of \code{x}, the second against the second, etc.
 
-  When \code{i} is a \code{data.table}, \code{x} must have a key. \code{i} is \emph{joined} to \code{x} using \code{x}'s key and the rows in \code{x} that match are returned. An equi-join is performed between each column in \code{i} to each column in \code{x}'s key; i.e., column 1 of \code{i} is matched to the 1st column of \code{x}'s key, column 2 to the second, etc. The match is a binary search in compiled C in O(log n) time. If \code{i} has \emph{fewer} columns than \code{x}'s key the [...]
+            If \code{i} is not keyed, then the first column of \code{i} is matched against the first \emph{key} column of \code{x}, the second column of \code{i} against the second \emph{key} column of \code{x}, etc.
+
+            This is summarised in code as \code{min(length(key(x)), if (haskey(i)) length(key(i)) else ncol(i))}.}
+        }
+        Using \code{on=} is recommended (even during keyed joins) as it makes the code easier to follow and also allows for \emph{non-equi} joins.
   
-  All types of `i` may be prefixed with \code{!}. This signals a \emph{not-join} or \emph{not-select} should be performed. Throughout \code{data.table} documentation, where we refer to the type of `i`, we mean the type of `i` \emph{after} the `!`, if present. See examples.
+        When the binary operator \code{==} alone is used, an \emph{equi} join is performed. In SQL terms, \code{x[i]} then performs a \emph{right join} by default. \code{i} prefixed with \code{!} signals a \emph{not-join} or \emph{not-select}. 
 
-  Advanced: When \code{i} is an expression of column names that evaluates to \code{data.table} or \code{list}, a join is performed. We call this a \emph{self join}.
+        Support for \emph{non-equi} join was recently implemented, which allows for other binary operators \code{>=, >, <= and <}.
 
-  Advanced: When \code{i} is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.
+        See \href{../doc/datatable-keys-fast-subset.html}{Keys and fast binary search based subset} and \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{Secondary indices and auto indexing}.
 
-}
-  \item{j}{ A single column name, single expresson of column names, \code{list()} of expressions of column names, an expression or function call that evaluates to \code{list} (including \code{data.frame} and \code{data.table} which are \code{list}s, too), or (when \code{with=FALSE}) a vector of names or positions to select.
+        \emph{Advanced:} When \code{i} is a single variable name, it is not considered an expression of column names and is instead evaluated in calling scope.
+    }
 
-  \code{j} is evaluated within the frame of the \code{data.table}; i.e., it sees column names as if they are variables. Use \code{j=list(...)} to return multiple columns and/or expressions of columns. A single column or single expression returns that type, usually a vector. See the examples.
+    \item{j}{When \code{with=TRUE} (default), \code{j} is evaluated within the frame of the data.table; i.e., it sees column names as if they are variables. This allows not just \emph{selecting} columns in \code{j}, but also \emph{computing} on them e.g., \code{x[, a]} and \code{x[, sum(a)]} return \code{x$a} and \code{sum(x$a)} as vectors, respectively. \code{x[, .(a, b)]} and \code{x[, .(sa=sum(a), sb=sum(b))]} each return a two-column data.table, the first simply \emph{selecting} colu [...]
 
-}
-  \item{by}{ A single unquoted column name, a \code{list()} of expressions of column names, a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end), or a character vector of column names.
+    The expression \code{.()} is a \emph{shorthand} alias to \code{list()}; they both mean the same. As long as \code{j} returns a \code{list}, each element of the list becomes a column in the resulting \code{data.table}. This is the default \emph{enhanced} mode.
 
-  The \code{list()} of expressions is evaluated within the frame of the \code{data.table} (i.e. it sees column names as if they are variables). The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{j=list(...)} may be omitted when there is just one expression, for convenience, typically a single expression such as \code{sum(colB)}; e.g., \code{DT[,sum(c [...]
+    When \code{with=FALSE}, \code{j} can only be a vector of column names or positions to select (as in \code{data.frame}). 
 
-  When \code{by} contains the first \code{n} columns of \code{x}'s key, we call this a \emph{keyed by}. In a keyed by the groups appear contiguously in RAM and memory is copied in bulk internally, for extra speed. Otherwise, we call it an \emph{ad hoc by}. Ad hoc by is still many times faster than \code{tapply}, for example, but just not as fast as keyed by when datasets are very large, in particular when the size of \emph{each group} is large. Not to be confused with \code{keyby=} defin [...]
+    \emph{Advanced:} \code{j} also allows the use of special \emph{read-only} symbols: \code{\link{.SD}}, \code{\link{.N}}, \code{\link{.I}}, \code{\link{.GRP}}, \code{\link{.BY}}.
 
-  Advanced: When \code{i} is a \code{data.table}, \code{DT[i,j,by=.EACHI]} evaluates \code{j} for the groups in `DT` that each row in \code{i} joins to. That is, you can join (in \code{i}) and aggregate (in \code{j}) simultaneously. We call this \emph{grouping by each i}. It is particularly memory efficient as you don't have to materialise the join result only to aggregate later. Please refer to \href{http://stackoverflow.com/a/27004566/559784}{this Stackoverflow answer} for a more detai [...]
+    \emph{Advanced:} When \code{i} is a \code{data.table}, the columns of \code{i} can be referred to in \code{j} by using the prefix \code{i.}, e.g., \code{X[Y, .(val, i.val)]}. Here \code{val} refers to \code{X}'s column and \code{i.val} to \code{Y}'s. 
 
-  Advanced: When grouping, symbols \code{.SD}, \code{.BY}, \code{.N}, \code{.I} and \code{.GRP} may be used in the \code{j} expression, defined as follows.
-  
-  \code{.SD} is a \code{data.table} containing the \bold{S}ubset of \code{x}'s \bold{D}ata for each group, excluding any columns used in \code{by} (or \code{keyby}).
-  
-  \code{.BY} is a \code{list} containing a length 1 vector for each item in \code{by}. This can be useful when \code{by} is not known in advance. The \code{by} variables are also available to \code{j} directly by name; useful for example for titles of graphs if \code{j} is a plot command, or to branch with \code{if()} depending on the value of a group variable.
-  
-  \code{.N} is an integer, length 1, containing the number of rows in the group. This may be useful when the column names are not known in advance and for convenience generally. When grouping by \code{i}, \code{.N} is the number of rows in \code{x} matched to, for each row of \code{i}, regardless of whether \code{nomatch} is \code{NA} or \code{0}. It is renamed to \code{N} (no dot) in the result (otherwise a column called \code{".N"} could conflict with the \code{.N} variable, see FAQ 4. [...]
-  
-  \code{.I} is an integer vector equal to \code{seq_len(nrow(x))}. While grouping, it holds for each item in the group, it's row location in \code{x}. This is useful to subset in \code{j}; e.g. \code{DT[, .I[which.max(somecol)], by=grp]}.
-  
-  \code{.GRP} is an integer, length 1, containing a simple group counter. 1 for the 1st group, 2 for the 2nd, etc.
-  
-  \code{.SD}, \code{.BY}, \code{.N}, \code{.I} and \code{.GRP} are \emph{read only}. Their bindings are locked and attempting to assign to them will generate an error. If you wish to manipulate \code{.SD} before returning it, take a \code{copy(.SD)} first (see FAQ 4.5). Using \code{:=} in the \code{j} of \code{.SD} is reserved for future use as a (tortuously) flexible way to update \code{DT} by reference by group (even when groups are not contiguous in an ad hoc by).
+    \emph{Advanced:} Columns of \code{x} can now be referred to using the prefix \code{x.}, which is particularly useful during joins to refer to \code{x}'s \emph{join} columns as they are otherwise masked by \code{i}'s. For example, \code{X[Y, .(x.a-i.a, b), on="a"]}.
 
-  Advanced: In the \code{X[Y,j]} form of grouping, the \code{j} expression sees variables in \code{X} first, then \code{Y}. We call this \emph{join inherited scope}. If the variable is not in \code{X} or \code{Y} then the calling frame is searched, its calling frame, and so on in the usual way up to and including the global environment.
+    See \href{../doc/datatable-intro.html}{Introduction to data.table} vignette and examples.} 
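
A minimal sketch of the \code{i.} and \code{x.} prefixes during a join (toy tables; the values are made up):

```r
library(data.table)
X <- data.table(a = 1:3, val = c(10, 20, 30))
Y <- data.table(a = 2:4, val = c(200, 300, 400))

# right join of Y to X; the two 'val' columns are disambiguated
# with the x. and i. prefixes
X[Y, .(a, x.val, i.val), on = "a"]
```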
 
-}
-  \item{keyby}{ An \emph{ad-hoc-by} or \emph{keyed-by} (just as \code{by=} defined above) but with an additional \code{setkey()} run on the \code{by} columns of the result afterwards, for convenience. It is common practice to use `keyby=` routinely when you wish the result to be sorted. Out loud we read \code{keyby=} as \emph{by= then setkey}. Otherwise, `by=` can be relied on to return the groups in the order they appear in the data.
-  
-}
-  \item{with}{ By default \code{with=TRUE} and \code{j} is evaluated within the frame of \code{x}; column names can be used as variables. When \code{with=FALSE} \code{j} is a character vector of column names or a numeric vector of column positions to select, and the value returned is always a \code{data.table}. \code{with=FALSE} is often useful in \code{data.table} to select columns dynamically.
+    \item{by}{ Column names are seen as if they are variables (as in \code{j} when \code{with=TRUE}). The \code{data.table} is then grouped by the \code{by} and \code{j} is evaluated within each group. The order of the rows within each group is preserved, as is the order of the groups. \code{by} accepts: 
 
-}
-  \item{nomatch}{ Same as \code{nomatch} in \code{\link{match}}. When a row in \code{i} has no match to \code{x}'s key, \code{nomatch=NA} (default) means \code{NA} is returned for \code{x}'s non-join columns for that row of \code{i}. \code{0} means no rows will be returned for that row of \code{i}. The default value (used when \code{nomatch} is not supplied) can be changed from \code{NA} to \code{0} using \code{options(datatable.nomatch=0)}.
+    \itemize{
+        \item{A single unquoted column name: e.g., \code{DT[, .(sa=sum(a)), by=x]}}
 
-}
-  \item{mult}{ When \emph{multiple} rows in \code{x} match to the row in \code{i}, \code{mult} controls which are returned: \code{"all"} (default), \code{"first"} or \code{"last"}.
+        \item{a \code{list()} of expressions of column names: e.g., \code{DT[, .(sa=sum(a)), by=.(x=x>0, y)]}}
 
-}
-  \item{roll}{ Applies to the last join column, generally a date but can be any ordered variable, irregular and including gaps. If \code{roll=TRUE} and \code{i}'s row matches to all but the last \code{x} join column, and its value in the last \code{i} join column falls in a gap (including after the last observation in \code{x} for that group), then the \emph{prevailing} value in \code{x} is \emph{rolled} forward. This operation is particularly fast using a modified binary search. The ope [...]
-  When \code{roll} is a positive number, this limits how far values are carried forward. \code{roll=TRUE} is equivalent to \code{roll=+Inf}.
-  When \code{roll} is a negative number, values are rolled backwards; i.e., next observation carried backwards (NOCB). Use \code{-Inf} for unlimited roll back.
-  When \code{roll} is \code{"nearest"}, the nearest value is joined to.
-  
-}
-  \item{rollends}{ A logical vector length 2 (a single logical is recycled). When rolling forward (e.g. \code{roll=TRUE}) if a value is past the \emph{last} observation within each group defined by the join columns, \code{rollends[2]=TRUE} will roll the last value forwards. \code{rollends[1]=TRUE} will roll the first value backwards if the value is before it. If \code{rollends=FALSE} the value of \code{i} must fall in a gap in \code{x} but not after the end or before the beginning of the [...]
-  
-}
-  \item{which}{ \code{TRUE} returns the row numbers of \code{x} that \code{i} matches to. \code{NA} returns the row numbers of \code{i} that have no match in \code{x}. By default \code{FALSE} and the rows in \code{x} that match are returned.
+        \item{a single character string containing comma separated column names (where spaces are significant since column names may contain spaces even at the start or end): e.g., \code{DT[, sum(a), by="x,y,z"]}}
 
-}
-  \item{.SDcols}{ Advanced. Specifies the columns of \code{x} included in \code{.SD}. May be character column names or numeric positions. This is useful for speed when applying a function through a subset of (possible very many) columns; e.g., \code{DT[,lapply(.SD,sum),by="x,y",.SDcols=301:350]}.
+        \item{a character vector of column names: e.g., \code{DT[, sum(a), by=c("x", "y")]}}
+
+        \item{or of the form \code{startcol:endcol}: e.g., \code{DT[, sum(a), by=x:z]}}
+    }
+
+    \emph{Advanced:} When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}), \code{DT[i, j, by=.EACHI]} evaluates \code{j} for the groups in `DT` that each row in \code{i} joins to. That is, you can join (in \code{i}) and aggregate (in \code{j}) simultaneously. We call this \emph{grouping by each i}. See \href{http://stackoverflow.com/a/27004566/559784}{this StackOverflow answer} for a more detailed explanation until we \href{https://github.com/Rdatatable/data.table/i [...]
+
+    \emph{Advanced:} In the \code{X[Y, j]} form of grouping, the \code{j} expression sees variables in \code{X} first, then \code{Y}. We call this \emph{join inherited scope}. If the variable is not in \code{X} or \code{Y} then the calling frame is searched, its calling frame, and so on in the usual way up to and including the global environment.}
+
+    \item{keyby}{ Same as \code{by}, but with an additional \code{setkey()} run on the \code{by} columns of the result, for convenience. It is common practice to use `keyby=` routinely when you wish the result to be sorted.}
+
+    \item{with}{ By default \code{with=TRUE} and \code{j} is evaluated within the frame of \code{x}; column names can be used as variables. 
+
+        When \code{with=FALSE} \code{j} is a character vector of column names, a numeric vector of column positions to select or of the form \code{startcol:endcol}, and the value returned is always a \code{data.table}. \code{with=FALSE} is often useful in \code{data.table} to select columns dynamically. Note that \code{x[, cols, with=FALSE]} is equivalent to \code{x[, .SD, .SDcols=cols]}.}
+
+    \item{nomatch}{ Same as \code{nomatch} in \code{\link{match}}. When a row in \code{i} has no match to \code{x}, \code{nomatch=NA} (default) means \code{NA} is returned. \code{0} means no rows will be returned for that row of \code{i}. Use \code{options(datatable.nomatch=0)} to change the default value (used when \code{nomatch} is not supplied).}
+
+    \item{mult}{ When \code{i} is a \code{list} (or \code{data.frame} or \code{data.table}) and \emph{multiple} rows in \code{x} match to the row in \code{i}, \code{mult} controls which are returned: \code{"all"} (default), \code{"first"} or \code{"last"}.}
+
+    \item{roll}{ When \code{i} is a \code{data.table} and its row matches to all but the last \code{x} join column, and its value in the last \code{i} join column falls in a gap (including after the last observation in \code{x} for that group), then:
+
+        \itemize{
+            \item{\code{+Inf} (or \code{TRUE}) rolls the \emph{prevailing} value in \code{x} forward. It is also known as last observation carried forward (LOCF).}
+            \item{\code{-Inf} rolls backwards instead; i.e., next observation carried backward (NOCB).}
+            \item{A finite positive or negative number limits how far values are carried forward or backward.}
+            \item{\code{"nearest"} rolls to the nearest value instead.}
+        }
+        Rolling joins apply to the last join column, generally a date but can be any variable. It is particularly fast using a modified binary search.
+        
+        A common idiom is to select a contemporaneous regular time series (\code{dts}) across a set of identifiers (\code{ids}):  \code{DT[CJ(ids,dts),roll=TRUE]} where \code{DT} has a 2-column key (id,date) and \code{\link{CJ}} stands for \emph{cross join}.}
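
The rolling-join behaviour and the \code{CJ()} idiom above might be sketched as follows (the ids, dates and values are hypothetical):

```r
library(data.table)
DT <- data.table(id   = c("A", "A", "B"),
                 date = as.IDate(c("2016-01-01", "2016-01-05", "2016-01-02")),
                 val  = c(1.0, 1.5, 2.0),
                 key  = c("id", "date"))

# "A" has no observation on 2016-01-03: roll=TRUE carries the
# prevailing value (from 2016-01-01) forward (LOCF)
DT[.("A", as.IDate("2016-01-03")), roll = TRUE]

# contemporaneous regular grid across all ids and dates via cross join
ids <- c("A", "B"); dts <- as.IDate("2016-01-01") + 0:4
DT[CJ(ids, dts), roll = TRUE]
```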
+
+    \item{rollends}{ A logical vector length 2 (a single logical is recycled) indicating whether values falling before the first value or after the last value for a group should be rolled as well.
+        \itemize{
+            \item{If \code{rollends[2]=TRUE}, it will roll the last value forward. \code{TRUE} by default for LOCF and \code{FALSE} for NOCB rolls.}
+            \item{If \code{rollends[1]=TRUE}, it will roll the first value backward. \code{TRUE} by default for NOCB and \code{FALSE} for LOCF rolls.}
+        }
+        When \code{roll} is a finite number, that limit is also applied when rolling the ends.}
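An illustrative sketch of these defaults (the table and values are made up, not from this page):

```r
library(data.table)
DT = data.table(x = c("a", "a"), y = c(2, 4), v = 1:2)

# roll=TRUE is LOCF, so rollends defaults to c(FALSE, TRUE):
DT[.("a", c(1, 5)), on = .(x, y), roll = TRUE]
# y=1 lies before the first observation and is NOT rolled (rollends[1]=FALSE)
# y=5 lies after the last observation and IS rolled forward (rollends[2]=TRUE)

# Flipping the ends rolls the first value backward instead:
DT[.("a", c(1, 5)), on = .(x, y), roll = TRUE, rollends = c(TRUE, FALSE)]
```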
+
+    \item{which}{\code{TRUE} returns the row numbers of \code{x} that \code{i} matches to. If \code{NA}, returns the row numbers of \code{i} that have no match in \code{x}. The default \code{FALSE} returns the matching rows of \code{x} (the join result).}
+
+    \item{.SDcols}{ Specifies the columns of \code{x} to be included in the special symbol \code{\link{.SD}}, which stands for \code{Subset of data.table}. May be character column names or numeric positions. This is useful for speed when applying a function through a subset of (possibly very many) columns; e.g., \code{DT[, lapply(.SD, sum), by="x,y", .SDcols=301:350]}.
  
+    For convenient interactive use, the form \code{startcol:endcol} is also allowed (as in \code{by}), e.g., \code{DT[, lapply(.SD, sum), by=x:y, .SDcols=a:f]}
 }
  \item{verbose}{ \code{TRUE} turns on status and information messages to the console. It can be turned on by default using \code{options(datatable.verbose=TRUE)}. The quantity and types of verbosity may be expanded in future.
 
 }  
   \item{allow.cartesian}{ \code{FALSE} prevents joins that would result in more than \code{nrow(x)+nrow(i)} rows. This is usually caused by duplicate values in \code{i}'s join columns, each of which join to the same group in `x` over and over again: a \emph{misspecified} join. Usually this was not intended and the join needs to be changed. The word 'cartesian' is used loosely in this context. The traditional cartesian join is (deliberately) difficult to achieve in \code{data.table}: wher [...]
   
-  \item{drop}{ Never used by \code{data.table}. Do not use. It needs to be here because \code{data.table} inherits from \code{data.frame}. See \code{vignette("datatable-faq")}.
-  
-}
-  \item{on}{ A named atomic vector of column names indicating which columns in \code{i} should be joined to which columns in \code{x}. See \code{Examples}.}
+  \item{drop}{ Never used by \code{data.table}. Do not use. It needs to be here because \code{data.table} inherits from \code{data.frame}. See \href{../doc/datatable-faq.html}{datatable-faq}.}
+
+  \item{on}{ Indicate which columns in \code{i} should be joined with which columns in \code{x}, along with the type of binary operator to join with. When specified, this overrides the keys set on \code{x} and \code{i}. There are multiple ways of specifying the \code{on} argument: 
+        \itemize{
+            \item{As a character vector, e.g., \code{X[Y, on=c("a", "b")]}. This assumes both these columns are present in \code{X} and \code{Y}.}
+            \item{As a \emph{named} character vector, e.g., \code{X[Y, on=c(x="a", y="b")]}. This is useful when column names to join by are different between the two tables. 
+
+            NB: \code{X[Y, on=c("a", y="b")]} is also possible if column \code{"a"} is common between the two tables.}
+            \item{For convenience during interactive scenarios, it is also possible to use \code{.()} syntax as \code{X[Y, on=.(a, b)]}.}
+            \item{From v1.9.8, (non-equi) joins using binary operators \code{>=, >, <=, <} are also possible, e.g., \code{X[Y, on=c("x>=a", "y<=b")]}, or for interactive use as \code{X[Y, on=.(x>=a, y<=b)]}.}
+        }
+        See examples as well as \href{../doc/datatable-secondary-indices-and-auto-indexing.html}{Secondary indices and auto indexing}.
+    }
 }
 \details{
-\code{data.table} builds on base \R functionality to reduce 2 types of time :
+\code{data.table} builds on base \R functionality to reduce 2 types of time:\cr
+
 \enumerate{
-\item programming time (easier to write, read, debug and maintain)
-\item compute time
+    \item{programming time (easier to write, read, debug and maintain), and}
+    \item{compute time (fast and memory efficient).}
 }
 
-It combines database like operations such as \code{\link{subset}}, \code{\link{with}} and \code{\link{by}} and provides similar joins that \code{\link{merge}} provides but faster. This is achieved by using R's column based ordered in-memory \code{data.frame} structure, \code{eval} within the environment of a \code{list}, the \code{[.data.table} mechanism to condense the features, and compiled C to make certain operations fast.
+The general form of data.table syntax is:\cr
 
-The package can be used just for rapid programming (compact syntax). Largest compute time benefits are on 64bit platforms with plentiful RAM, or when smaller datasets are repeatedly queried within a loop, or when other methods use so much working memory that they fail with an out of memory error.
-
-As with \code{[.data.frame}, \emph{compound queries} can be concatenated on one line; e.g., 
 \preformatted{
-    DT[,sum(v),by=colA][V1<300][tail(order(V1))]
-    # sum(v) by colA then return the 6 largest which are under 300
+    DT[ i,  j,  by ] # + extra arguments
+        |   |   |
+        |   |    -------> grouped by what?
+        |    -------> what to do?
+         ---> on which rows?
 }
-The \code{j} expression does not have to return data; e.g.,
-\preformatted{
-    DT[,plot(colB,colC),by=colA]
-    # produce a set of plots (likely to pdf) returning no data
-}
-Multiple \code{data.table}s (e.g. \code{X}, \code{Y} and \code{Z}) can be joined in many ways; e.g.,
+
+The way to read this out loud is: "Take \code{DT}, subset rows by \code{i}, \emph{then} compute \code{j} grouped by \code{by}." Here are some basic usage examples expanding on this definition. See the vignette (and examples) for working examples.
+
 \preformatted{
-    X[Y][Z]
-    X[Z][Y]
-    X[Y[Z]]
-    X[Z[Y]]
+    X[, a]                      # return col 'a' from X as vector. If not found, search in parent frame.
+    X[, .(a)]                   # same as above, but return as a data.table.
+    X[, sum(a)]                 # return sum(a) as a vector (with same scoping rules as above)
+    X[, .(sum(a)), by=c]        # get sum(a) grouped by 'c'.
+    X[, sum(a), by=c]           # same as above; .() can be omitted in by for a single expression, for convenience
+    X[, sum(a), by=c:f]         # get sum(a) grouped by all columns in between 'c' and 'f' (both inclusive)
+
+    X[, sum(a), keyby=b]        # get sum(a) grouped by 'b', and sort that result by the grouping column 'b'
+    X[, sum(a), by=b][order(b)] # same order as above, but by chaining compound expressions
+    X[c>1, sum(a), by=c]        # get rows where c>1 is TRUE, and on those rows, get sum(a) grouped by 'c'
+    X[Y, .(a, b), on="c"]       # get rows where Y$c == X$c, and select columns 'X$a' and 'X$b' for those rows
+    X[Y, .(a, i.a), on="c"]     # get rows where Y$c == X$c, and then select 'X$a' and 'Y$a' (=i.a)
+    X[Y, sum(a*i.a), on="c", by=.EACHI] # for *each* 'Y$c', get sum(a*i.a) on matching rows in 'X$c'
+
+    X[, plot(a, b), by=c]       # j accepts any expression, generates plot for each group and returns no data
+    # see ?assign to add/update/delete columns by reference using the same consistent interface
 }
+
 A \code{data.table} is a \code{list} of vectors, just like a \code{data.frame}. However :
 \enumerate{
-\item it never has rownames. Instead it may have one \emph{key} of one or more columns. This key can be used for row indexing instead of rownames.
+\item it never has or uses rownames. Rownames-based indexing can be done by setting a \emph{key} of one or more columns, or done \emph{ad hoc} using the \code{on} argument (now preferred).
 \item it has enhanced functionality in \code{[.data.table} for fast joins of keyed tables, fast aggregation, fast last observation carried forward (LOCF) and fast add/modify/delete of columns by reference with no copy at all.
 }
 
-Since a \code{list} \emph{is} a \code{vector}, \code{data.table} columns may be type \code{list}. Columns of type \code{list} can contain mixed types. Each item in a column of type \code{list} may be different lengths. This is true of \code{data.frame}, too.
+See the \emph{See Also} section for the several other \emph{methods} that are available for operating on \code{data.table}s efficiently.
 
-Several \emph{methods} are provided for \code{data.table}, including \code{is.na}, \code{na.omit},
-\code{t}, \code{rbind}, \code{cbind}, \code{merge} and others.
 }
 \references{
-\code{data.table} homepage: \url{http://datatable.r-forge.r-project.org/}\cr
-User reviews: \url{http://crantastic.org/packages/data-table}\cr
-\url{http://en.wikipedia.org/wiki/Binary_search}\cr
-\url{http://en.wikipedia.org/wiki/Radix_sort}             
+\url{https://github.com/Rdatatable/data.table/wiki} (\code{data.table} homepage)\cr
+\url{http://crantastic.org/packages/data-table} (User reviews)\cr
+\url{http://en.wikipedia.org/wiki/Binary_search}
 }
-\note{ If \code{keep.rownames} or \code{check.names} are supplied they must be written in full because \R does not allow partial argument names after `\code{\dots}`. For example, \code{data.table(DF,keep=TRUE)} will create a
-column called \code{"keep"} containing \code{TRUE} and this is correct behaviour; \code{data.table(DF,keep.rownames=TRUE)} was intended.
+\note{ If \code{keep.rownames} or \code{check.names} are supplied they must be written in full because \R does not allow partial argument names after `\code{\dots}`. For example, \code{data.table(DF, keep=TRUE)} will create a
+column called \code{"keep"} containing \code{TRUE} and this is correct behaviour; \code{data.table(DF, keep.rownames=TRUE)} was intended.
 
-POSIXlt is not supported as a column type because it uses 40 bytes to store a single datetime. Unexpected errors may occur if you manage to create a column of type POSIXlt. Please see \href{http://r-forge.r-project.org/scm/viewvc.php/pkg/NEWS?view=markup&root=datatable}{NEWS} for 1.6.3, and \code{\link{IDateTime}} instead. IDateTime has methods to convert to and from POSIXlt.
-}
-\seealso{ \code{\link{data.frame}}, \code{\link{[.data.frame}} , \code{\link{setkey}}, \code{\link{J}}, \code{\link{SJ}}, \code{\link{CJ}}, \code{\link{merge.data.table}}, \code{\link{tables}}, \code{\link{test.data.table}}, \code{\link{IDateTime}}, \code{\link{unique.data.table}}, \code{\link{copy}}, \code{\link{:=}}, \code{\link{alloc.col}}, \code{\link{truelength}}, \code{\link{rbindlist}}, \code{\link{setNumericRounding}}
+\code{POSIXlt} is not supported as a column type because it uses 40 bytes to store a single datetime. \code{POSIXlt} columns are implicitly converted to \code{POSIXct} with a \emph{warning}. You may also be interested in \code{\link{IDateTime}} instead; it has methods to convert to and from \code{POSIXlt}.
 }
+\seealso{ \code{\link{special-symbols}}, \code{\link{data.frame}}, \code{\link{[.data.frame}}, \code{\link{as.data.table}}, \code{\link{setkey}}, \code{\link{setorder}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{J}}, \code{\link{SJ}}, \code{\link{CJ}}, \code{\link{merge.data.table}}, \code{\link{tables}}, \code{\link{test.data.table}}, \code{\link{IDateTime}}, \code{\link{unique.data.table}}, \code{\link{copy}}, \code{\link{:=}}, \code{\link{alloc.col}}, \code{\link{truelengt [...]
 \examples{
 \dontrun{
 example(data.table)  # to run these examples at the prompt}
 
-DF = data.frame(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
-DT = data.table(x=rep(c("a","b","c"),each=3), y=c(1,3,6), v=1:9)
+DF = data.frame(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
+DT = data.table(x=rep(c("b","a","c"),each=3), y=c(1,3,6), v=1:9)
 DF
 DT
-identical(dim(DT),dim(DF)) # TRUE
-identical(DF$a, DT$a)      # TRUE
-is.list(DF)                # TRUE
-is.list(DT)                # TRUE
+identical(dim(DT), dim(DF))    # TRUE
+identical(DF$a, DT$a)          # TRUE
+is.list(DF)                    # TRUE
+is.list(DT)                    # TRUE
 
-is.data.frame(DT)          # TRUE
+is.data.frame(DT)              # TRUE
 
 tables()
 
-DT[2]                      # 2nd row
-DT[,v]                     # v column (as vector)
-DT[,list(v)]               # v column (as data.table)
-DT[2:3,sum(v)]             # sum(v) over rows 2 and 3
-DT[2:5,cat(v,"\n")]        # just for j's side effect
-DT[c(FALSE,TRUE)]          # even rows (usual recycling)
-
-DT[,2,with=FALSE]          # 2nd column
+# basic row subset operations
+DT[2]                          # 2nd row
+DT[3:2]                        # 3rd and 2nd row
+DT[order(x)]                   # no need for order(DT$x)
+DT[order(x), ]                 # same as above. The ',' is optional
+DT[y>2]                        # all rows where DT$y > 2
+DT[y>2 & v>5]                  # compound logical expressions
+DT[!2:4]                       # all rows other than 2:4
+DT[-(2:4)]                     # same
+
+# select|compute columns data.table way
+DT[, v]                        # v column (as vector)
+DT[, list(v)]                  # v column (as data.table)
+DT[, .(v)]                     # same as above, .() is a shorthand alias to list()
+DT[, sum(v)]                   # sum of column v, returned as vector
+DT[, .(sum(v))]                # same, but return data.table (column autonamed V1)
+DT[, .(sv=sum(v))]             # same, but column named "sv"
+DT[, .(v, v*2)]                # return two column data.table, v and v*2
+
+# subset rows and select|compute data.table way
+DT[2:3, sum(v)]                # sum(v) over rows 2 and 3, return vector
+DT[2:3, .(sum(v))]             # same, but return data.table with column V1
+DT[2:3, .(sv=sum(v))]          # same, but return data.table with column sv 
+DT[2:5, cat(v, "\n")]          # just for j's side effect
+
+# select columns the data.frame way
+DT[, 2, with=FALSE]            # 2nd column, returns a data.table always
 colNum = 2
-DT[,colNum,with=FALSE]     # same
-
-setkey(DT,x)               # set a 1-column key. No quotes, for convenience.
-setkeyv(DT,"x")            # same (v in setkeyv stands for vector)
-v="x"
-setkeyv(DT,v)              # same
-# key(DT)<-"x"             # copies whole table, please use set* functions instead
-
-DT["a"]                    # binary search (fast)
-DT[x=="a"]                 # same; i.e. binary search (fast)
-
-DT[,sum(v),by=x]           # keyed by
-DT[,sum(v),by=key(DT)]     # same
-DT[,sum(v),by=y]           # ad hoc by
-
-DT["a",sum(v)]                    # j for one group
-DT[c("a","b"),sum(v),by=.EACHI]   # j for two groups
-
-X = data.table(c("b","c"),foo=c(4,2))
+DT[, colNum, with=FALSE]       # same, equivalent to DT[, .SD, .SDcols=colNum]
+DT[["v"]]                      # same as DT[, v] but much faster
+
+# grouping operations - j and by
+DT[, sum(v), by=x]             # ad hoc by, order of groups preserved in result
+DT[, sum(v), keyby=x]          # same, but order the result on by cols
+DT[, sum(v), by=x][order(x)]   # same but by chaining expressions together
+
+# fast ad hoc row subsets (subsets as joins)
+DT["a", on="x"]                # same as x == "a" but uses binary search (fast)
+DT["a", on=.(x)]               # same, for convenience, no need to quote every column
+DT[.("a"), on="x"]             # same
+DT[x=="a"]                     # same, single "==" internally optimised to use binary search (fast)
+DT[x!="b" | y!=3]              # not yet optimized, currently vector scan subset
+DT[.("b", 3), on=c("x", "y")]  # join on columns x,y of DT; uses binary search (fast)
+DT[.("b", 3), on=.(x, y)]      # same, but using on=.()
+DT[.("b", 1:2), on=c("x", "y")]             # no match returns NA
+DT[.("b", 1:2), on=.(x, y), nomatch=0]      # no match row is not returned
+DT[.("b", 1:2), on=c("x", "y"), roll=Inf]   # locf, nomatch row gets rolled by previous row
+DT[.("b", 1:2), on=.(x, y), roll=-Inf]      # nocb, nomatch row gets rolled by next row
+DT["b", sum(v*y), on="x"]                   # on rows where DT$x=="b", calculate sum(v*y)
+
+# all together now
+DT[x!="a", sum(v), by=x]                    # get sum(v) by "x" for each i != "a"
+DT[!"a", sum(v), by=.EACHI, on="x"]         # same, but using subsets-as-joins
+DT[c("b","c"), sum(v), by=.EACHI, on="x"]   # same
+DT[c("b","c"), sum(v), by=.EACHI, on=.(x)]  # same, using on=.()
+
+# joins as subsets
+X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))
 X
 
-DT[X]                      # join
-DT[X,sum(v),by=.EACHI]     # join and eval j for each row in i
-DT[X,mult="first"]         # first row of each group
-DT[X,mult="last"]          # last row of each group
-DT[X,sum(v)*foo,by=.EACHI] # join inherited scope
-
-setkey(DT,x,y)             # 2-column key
-setkeyv(DT,c("x","y"))     # same
-
-DT["a"]                    # join to 1st column of key
-DT[.("a")]                 # same, .() is an alias for list()
-DT[list("a")]              # same
-DT[.("a",3)]               # join to 2 columns
-DT[.("a",3:6)]             # join 4 rows (2 missing)
-DT[.("a",3:6),nomatch=0]   # remove missing
-DT[.("a",3:6),roll=TRUE]   # rolling join (locf)
-
-DT[,sum(v),by=.(y\%\%2)]   # by expression
-DT[,.SD[2],by=x]           # 2nd row of each group
-DT[,tail(.SD,2),by=x]      # last 2 rows of each group
-DT[,lapply(.SD,sum),by=x]  # apply through columns by group
-
-DT[,list(MySum=sum(v),
-         MyMin=min(v),
-         MyMax=max(v)),
-    by=.(x,y\%\%2)]        # by 2 expressions
-
-DT[,sum(v),x][V1<20]       # compound query
-DT[,sum(v),x][order(-V1)]  # ordering results
-
-print(DT[,z:=42L])         # add new column by reference
-print(DT[,z:=NULL])        # remove column by reference
-print(DT["a",v:=42L])      # subassign to existing v column by reference
-print(DT["b",v2:=84L])     # subassign to new column by reference (NA padded)
-
-DT[,m:=mean(v),by=x][]     # add new column by reference by group
-                           # NB: postfix [] is shortcut to print()
-
-DT[,.SD[which.min(v)],by=x][]  # nested query by group
-
-DT[!.("a")]                # not join
-DT[!"a"]                   # same
-DT[!2:4]                   # all rows other than 2:4
-DT[x!="b" | y!=3]          # not yet optimized, currently vector scans
-DT[!.("b",3)]              # same result but much faster
-
-
-# new feature: 'on' argument, from v1.9.6+
-DT1 = data.table(x=c("c", "a", "b", "a", "b"), a=1:5)
-DT2 = data.table(x=c("d", "c", "b"), mul=6:8)
-
-DT1[DT2, on=c(x="x")] # join on columns 'x'
-DT1[DT2, on="x"] # same as above
-DT1[DT2, .(sum(a) * mul), by=.EACHI, on="x"] # using by=.EACHI
-DT1[DT2, .(sum(a) * mul), by=.EACHI, on="x", nomatch=0L] # using by=.EACHI
-
-# Follow r-help posting guide, support is here (*not* r-help) :
+DT[X, on="x"]                         # right join
+X[DT, on="x"]                         # left join
+DT[X, on="x", nomatch=0]              # inner join
+DT[!X, on="x"]                        # not join
+DT[X, on=.(y<=foo)]                   # NEW non-equi join (v1.9.8+)
+DT[X, on="y<=foo"]                    # same as above
+DT[X, on=c("y<=foo")]                 # same as above
+DT[X, on=.(y>=foo)]                   # NEW non-equi join (v1.9.8+)
+DT[X, on=.(x, y<=foo)]                # NEW non-equi join (v1.9.8+)
+DT[X, .(x,y,x.y,v), on=.(x, y>=foo)]  # Select x's join columns as well
+
+DT[X, on="x", mult="first"]           # first row of each group
+DT[X, on="x", mult="last"]            # last row of each group
+DT[X, sum(v), by=.EACHI, on="x"]      # join and eval j for each row in i
+DT[X, sum(v)*foo, by=.EACHI, on="x"]  # join inherited scope
+DT[X, sum(v)*i.v, by=.EACHI, on="x"]  # 'i.v' refers to X's v column
+DT[X, on=.(x, v>=v), sum(y)*foo, by=.EACHI] # NEW non-equi join with by=.EACHI (v1.9.8+)
+
+# setting keys
+kDT = copy(DT)                        # (deep) copy DT to kDT to work with it.
+setkey(kDT,x)                         # set a 1-column key. No quotes, for convenience.
+setkeyv(kDT,"x")                      # same (v in setkeyv stands for vector)
+v="x"
+setkeyv(kDT,v)                        # same
+# key(kDT)<-"x"                       # copies whole table, please use set* functions instead
+haskey(kDT)                           # TRUE
+key(kDT)                              # "x"
+
+# fast *keyed* subsets
+kDT["a"]                              # subset-as-join on *key* column 'x'
+kDT["a", on="x"]                      # same, being explicit using 'on=' (preferred)
+
+# all together
+kDT[!"a", sum(v), by=.EACHI]          # get sum(v) for each i != "a"
+
+# multi-column key
+setkey(kDT,x,y)                       # 2-column key
+setkeyv(kDT,c("x","y"))               # same
+
+# fast *keyed* subsets on multi-column key
+kDT["a"]                              # join to 1st column of key
+kDT["a", on="x"]                      # on= is optional, but is preferred
+kDT[.("a")]                           # same, .() is an alias for list()
+kDT[list("a")]                        # same
+kDT[.("a", 3)]                        # join to 2 columns
+kDT[.("a", 3:6)]                      # join 4 rows (2 missing)
+kDT[.("a", 3:6), nomatch=0]           # remove missing
+kDT[.("a", 3:6), roll=TRUE]           # locf rolling join
+kDT[.("a", 3:6), roll=Inf]            # same as above
+kDT[.("a", 3:6), roll=-Inf]           # nocb rolling join
+kDT[!.("a")]                          # not join
+kDT[!"a"]                             # same
+
+# more on special symbols, see also ?"special-symbols"
+DT[.N]                                # last row
+DT[, .N]                              # total number of rows in DT
+DT[, .N, by=x]                        # number of rows in each group
+DT[, .SD, .SDcols=x:y]                # select columns 'x' and 'y'
+DT[, .SD[1]]                          # first row of all columns
+DT[, .SD[1], by=x]                    # first row of 'y' and 'v' for each group in 'x'
+DT[, c(.N, lapply(.SD, sum)), by=x]   # get rows *and* sum columns 'v' and 'y' by group
+DT[, .I[1], by=x]                     # row number in DT corresponding to each group
+DT[, grp := .GRP, by=x]               # add a group counter column
+X[, DT[.BY, y, on="x"], by=x]         # join within each group
+
+# add/update/delete by reference (see ?assign)
+print(DT[, z:=42L])                   # add new column by reference
+print(DT[, z:=NULL])                  # remove column by reference
+print(DT["a", v:=42L, on="x"])        # subassign to existing v column by reference
+print(DT["b", v2:=84L, on="x"])       # subassign to new column by reference (NA padded)
+
+DT[, m:=mean(v), by=x][]              # add new column by reference by group
+                                      # NB: postfix [] is shortcut to print()
+# advanced usage
+DT = data.table(x=rep(c("b","a","c"),each=3), v=c(1,1,1,2,2,1,1,2,2), y=c(1,3,6), a=1:9, b=9:1)
+
+DT[, sum(v), by=.(y\%\%2)]              # expressions in by
+DT[, sum(v), by=.(bool = y\%\%2)]       # same, using a named list to change by column name
+DT[, .SD[2], by=x]                    # get 2nd row of each group
+DT[, tail(.SD,2), by=x]               # last 2 rows of each group
+DT[, lapply(.SD, sum), by=x]          # sum of all (other) columns for each group
+DT[, .SD[which.min(v)], by=x]         # nested query by group
+
+DT[, list(MySum=sum(v),
+          MyMin=min(v),
+          MyMax=max(v)),
+    by=.(x, y\%\%2)]                    # by 2 expressions
+
+DT[, .(a = .(a), b = .(b)), by=x]     # list columns
+DT[, .(seq = min(a):max(b)), by=x]    # j is not limited to just aggregations
+DT[, sum(v), by=x][V1<20]             # compound query
+DT[, sum(v), by=x][order(-V1)]        # ordering results
+DT[, c(.N, lapply(.SD,sum)), by=x]    # get number of observations and sum per group
+DT[, {tmp <- mean(y); 
+      .(a = a-tmp, b = b-tmp)
+      }, by=x]                        # anonymous lambda in 'j'; j accepts any valid 
+                                      # expression. TO REMEMBER: every element of 
+                                      # the list becomes a column in the result.
+pdf("new.pdf")
+DT[, plot(a,b), by=x]                 # can also plot in 'j'
+dev.off()
+
+# using rleid, get max(y) and min of all cols in .SDcols for each consecutive run of 'v'
+DT[, c(.(y=max(y)), lapply(.SD, min)), by=rleid(v), .SDcols=v:b]
+
+# Follow r-help posting guide, SUPPORT is here (*not* r-help) :
 # http://stackoverflow.com/questions/tagged/data.table
 # or
 # datatable-help at lists.r-forge.r-project.org
 
 \dontrun{
 vignette("datatable-intro")
+vignette("datatable-reference-semantics")
+vignette("datatable-keys-fast-subset")
+vignette("datatable-secondary-indices-and-auto-indexing")
+vignette("datatable-reshape")
 vignette("datatable-faq")
 
-test.data.table()          # over 1,300 low level tests
 
-update.packages()          # keep up to date
+test.data.table()          # over 5700 low level tests
+
+# keep up to date with latest stable version on CRAN
+update.packages()
+# get the latest devel (needs Rtools for windows, xcode for mac)
+install.packages("data.table", repos = "https://Rdatatable.github.io/data.table", type = "source")
+
 }}
 \keyword{ data }
 
diff --git a/man/datatable-optimize.Rd b/man/datatable-optimize.Rd
new file mode 100644
index 0000000..008b518
--- /dev/null
+++ b/man/datatable-optimize.Rd
@@ -0,0 +1,148 @@
+\name{datatable.optimize}
+\alias{datatable-optimize}
+\alias{datatable.optimize}
+\alias{data.table-optimize}
+\alias{data.table.optimize}
+\alias{gforce}
+\alias{GForce}
+\alias{autoindex}
+\alias{autoindexing}
+\alias{auto-index}
+\alias{auto-indexing}
+\alias{rounding}
+\title{Optimisations in data.table}
+\description{
+\code{data.table} internally optimises certain expressions in order to improve 
+performance. This section briefly summarises those optimisations.
+
+Note that there's no additional input needed from the user to take advantage 
+of these optimisations. They happen automatically.
+
+Run the code under the \emph{example} section to get a feel for the performance 
+benefits from these optimisations.
+
+}
+\details{
+\code{data.table} reads the global option \code{datatable.optimize} to figure 
+out what level of optimisation is required. The default value \code{Inf} 
+activates \emph{all} available optimisations.
+
+At optimisation level \code{>= 1}, i.e., \code{getOption("datatable.optimize")} 
+>= 1, the following optimisations apply:
+
+\itemize{
+    \item The base function \code{order} is internally replaced with 
+    \code{data.table}'s \emph{fast ordering}. That is, \code{dt[order(...)]} 
+    gets internally optimised to \code{dt[forder(...)]}. 
+
+    \item The expression \code{dt[, lapply(.SD, fun), by=.]} gets optimised 
+    to \code{dt[, list(fun(a), fun(b), ...), by=.]} where \code{a,b, ...} are 
+    columns in \code{.SD}. This improves performance tremendously.
+
+    \item Similarly, the expression \code{dt[, c(.N, lapply(.SD, fun)), by=.]} 
+    gets optimised to \code{dt[, list(.N, fun(a), fun(b), ...)]}. \code{.N} is 
+    just for example here. 
+
+    \item The \code{base::mean} function is internally optimised to use 
+    \code{data.table}'s \code{fastmean} function, because \code{mean()} from 
+    \code{base} is an S3 generic and gets slow with many groups.
+}
+
+At optimisation level \code{>= 2}, i.e., \code{getOption("datatable.optimize")} >= 2, further optimisations are implemented on top of those shown above. 
+
+\itemize{
+
+    \item When an expression in \code{j} contains only the functions 
+    \code{min, max, mean, median, var, sd, prod}, for example, 
+    \code{dt[, list(mean(x), median(x), min(y), max(y)), by=z]}, it is 
+    optimised very effectively using what we call \emph{GForce}. These functions 
+    are replaced with \code{gmean, gmedian, gmin, gmax} internally. 
+
+    Normally, once the rows belonging to each group are identified, the values 
+    corresponding to that group are gathered and the \code{j}-expression is 
+    evaluated. This can be improved by computing the result directly, without 
+    gathering the values or evaluating the expression for each group 
+    (which can get costly with a large number of groups), by implementing it 
+    specifically for a particular function. As a result, it is extremely fast.
+
+    \item In addition to all the functions above, \code{.N} is also optimised 
+    to use GForce, both when used on its own and when combined with the 
+    functions mentioned above.
+
+    \item Expressions of the form \code{DT[i, j, by]} are also optimised when 
+    \code{i} is a \emph{subset} operation and \code{j} is any/all of the functions 
+    discussed above.
+}
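One way to check whether GForce kicked in for a given query is verbose output; a minimal sketch (the exact message wording varies between versions):

```r
library(data.table)
dt = data.table(g = rep(1:3, each = 4L), x = rnorm(12L))
# verbose=TRUE reports how 'j' was (or was not) optimised; with the default
# datatable.optimize level, the aggregations below are eligible for GForce
dt[, .(m = mean(x), lo = min(x)), by = g, verbose = TRUE]
```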
+
+\bold{Auto indexing:} \code{data.table} also allows for blazing fast subsets by 
+creating an \emph{index} on the first run. Any successive subsets on the same 
+column then reuse this index for a \emph{binary search} (instead of a 
+\emph{vector scan}) and are therefore fast.
+
+At the moment, expressions of the form \code{dt[col == val]} and 
+\code{dt[col \%in\% val]} are both optimised. We plan to expand this to more 
+operators and conditions in the future.
+
+Auto indexing can be switched off with the global option 
+\code{options(datatable.auto.index = FALSE)}. To disable the use of existing 
+indices, set the global option \code{options(datatable.use.index = FALSE)}.
+}
+\seealso{ \code{\link{setNumericRounding}}, \code{\link{getNumericRounding}} }
+\examples{
+\dontrun{
+# Generate a big data.table with relatively many columns
+set.seed(1L)
+dt = lapply(1:20, function(x) sample(c(-100:100), 5e6L, TRUE))
+setDT(dt)[, id := sample(1e5, 5e6, TRUE)]
+print(object.size(dt), units="Mb") # 400MB, not huge, but will do
+
+# 'order' optimisation
+options(datatable.optimize = 1L) # optimisation 'on'
+system.time(ans1 <- dt[order(id)])
+options(datatable.optimize = 0L) # optimisation 'off'
+system.time(ans2 <- dt[order(id)])
+identical(ans1, ans2)
+
+# optimisation of 'lapply(.SD, fun)'
+options(datatable.optimize = 1L) # optimisation 'on'
+system.time(ans1 <- dt[, lapply(.SD, min), by=id])
+options(datatable.optimize = 0L) # optimisation 'off'
+system.time(ans2 <- dt[, lapply(.SD, min), by=id])
+identical(ans1, ans2)
+
+# optimisation of 'mean'
+options(datatable.optimize = 1L) # optimisation 'on'
+system.time(ans1 <- dt[, lapply(.SD, mean), by=id])
+system.time(ans2 <- dt[, lapply(.SD, base::mean), by=id])
+identical(ans1, ans2)
+
+# optimisation of 'c(.N, lapply(.SD, ))'
+options(datatable.optimize = 1L) # optimisation 'on'
+system.time(ans1 <- dt[, c(.N, lapply(.SD, min)), by=id])
+options(datatable.optimize = 0L) # optimisation 'off'
+system.time(ans2 <- dt[, c(N=.N, lapply(.SD, min)), by=id])
+identical(ans1, ans2)
+
+# GForce
+options(datatable.optimize = 2L) # optimisation 'on'
+system.time(ans1 <- dt[, lapply(.SD, median), by=id])
+system.time(ans2 <- dt[, lapply(.SD, function(x) as.numeric(stats::median(x))), by=id])
+identical(ans1, ans2)
+
+# restore optimization
+options(datatable.optimize = Inf)
+
+# auto indexing
+options(datatable.auto.index = FALSE)
+system.time(ans1 <- dt[id == 100L]) # vector scan
+system.time(ans2 <- dt[id == 100L]) # vector scan
+system.time(dt[id \%in\% 100:500])    # vector scan
+
+options(datatable.auto.index = TRUE)
+system.time(ans1 <- dt[id == 100L]) # index + binary search subset
+system.time(ans2 <- dt[id == 100L]) # only binary search subset
+system.time(dt[id \%in\% 100:500])    # only binary search subset again
+
+}}
+\keyword{ data }
+
diff --git a/man/dcast.data.table.Rd b/man/dcast.data.table.Rd
index 64b18ca..bdcd55e 100644
--- a/man/dcast.data.table.Rd
+++ b/man/dcast.data.table.Rd
@@ -5,9 +5,9 @@
 \description{
   \code{dcast.data.table} is a much faster version of \code{reshape2::dcast}, but for \code{data.table}s. More importantly, it's capable of handling very large data quite efficiently in terms of memory usage in comparison to \code{reshape2::dcast}. 
 
-  From 1.9.6, \code{dcast} is a implemented as a S3 generic in \code{data.table}. To melt or cast data.tables, it is not necessary to load \code{reshape2} anymore. If you have to, then load \code{reshape2} package before loading \code{data.table}. 
+  From 1.9.6, \code{dcast} is implemented as an S3 generic in \code{data.table}. To melt or cast \code{data.table}s, it is not necessary to load \code{reshape2} anymore. If you have to load \code{reshape2}, do so before loading \code{data.table} to prevent unwanted masking. 
 
-  \bold{NEW}: \code{dcast.data.table} can now cast multiple \code{value.var} columns and also accepts multiple functions under \code{fun.aggregate} argument. See \code{examples} for more.
+  \bold{NEW}: \code{dcast.data.table} can now cast multiple \code{value.var} columns and also accepts multiple functions to \code{fun.aggregate}. See Examples for more.
 }
 
 % \method{dcast}{data.table}
@@ -19,23 +19,26 @@
 }
 \arguments{
   \item{data}{ A \code{data.table}.}
-  \item{formula}{A formula of the form LHS ~ RHS to cast, see details.}
-  \item{fun.aggregate}{Should the data be aggregated before casting? If the formula doesn't identify single observation for each cell, then aggregation defaults to \code{length} with a message.
+  \item{formula}{A formula of the form LHS ~ RHS to cast, see Details.}
+  \item{fun.aggregate}{Should the data be aggregated before casting? If the formula doesn't identify a single observation for each cell, then aggregation defaults to \code{length} with a message.
 
-  \bold{NEW}: it is possible to provide a list of functions to \code{fun.aggregate} argument. See \code{examples}.}
-  \item{sep}{Default is \code{_} for backwards compatibility. Character vector of length 1, indicating the separating character in variable names generated during casting.}
-  \item{...}{Any other arguments that maybe passed to the aggregating function.}
+  \bold{NEW}: it is possible to provide a list of functions to \code{fun.aggregate}. See Examples. }
+  \item{sep}{Character vector of length 1, indicating the separating character in variable names generated during casting. Default is \code{_} for backwards compatibility. }
+  \item{...}{Any other arguments that may be passed to the aggregating function.}
   \item{margins}{Not implemented yet. Should take variable names to compute margins on. A value of \code{TRUE} would compute all margins.}
-  \item{subset}{Specified if casting should be done on subset of the data. Ex: subset = .(col1 <= 5) or subset = .(variable != "January").}
-  \item{fill}{Value to fill missing cells with. If \code{fun.aggregate} is present, takes the value by applying the function on 0-length vector.}
-  \item{drop}{\code{FALSE} will cast by including all missing combinations.}
+  \item{subset}{Specified if casting should be done on a subset of the data. Ex: \code{subset = .(col1 <= 5)} or \code{subset = .(variable != "January")}.}
+  \item{fill}{Value with which to fill missing cells. If \code{fun.aggregate} is present, takes the value by applying the function on a 0-length vector.}
+  \item{drop}{\code{FALSE} will cast by including all missing combinations. 
+
+  \bold{NEW:} Following \href{https://github.com/Rdatatable/data.table/issues/1512}{#1512}, \code{c(FALSE, TRUE)} will only include all missing combinations of the formula \code{LHS}, and \code{c(TRUE, FALSE)} will only include all missing combinations of the formula \code{RHS}. See Examples.}
+  
   \item{value.var}{Name of the column whose values will be filled to cast. Function `guess()` tries to, well, guess this column automatically, if none is provided.
 
-  \bold{NEW}: it is possible to cast multiple \code{value.var} columns simultaneously now. See \code{examples}.}
-  \item{verbose}{Not used yet. Maybe dropped in the future or used to provide information messages onto the console.}
+  \bold{NEW}: it is now possible to cast multiple \code{value.var} columns simultaneously. See Examples. }
+  \item{verbose}{Not used yet. May be dropped in the future or used to provide informative messages through the console.}
 }
 \details{
-The cast formula takes the form \code{LHS ~ RHS}	, ex: \code{var1 + var2 ~ var3}. The order of entries in the formula is essential. There are two special variables: \code{.} and \code{...}. Their functionality is identical to that of \code{reshape2::dcast}. 
+The cast formula takes the form \code{LHS ~ RHS}, ex: \code{var1 + var2 ~ var3}. The order of entries in the formula is essential. There are two special variables: \code{.} and \code{...}. \code{.} represents no variable; \code{...} represents all variables not otherwise mentioned in \code{formula}; see Examples. 
 
 \code{dcast} also allows \code{value.var} columns of type \code{list}.
 
@@ -68,6 +71,28 @@ dcast(DT, diet+chick ~ time, drop=FALSE, fill=0)
 # using subset
 dcast(DT, chick ~ time, fun=mean, subset=.(time < 10 & chick < 20))
 
+# drop argument, #1512
+DT <- data.table(v1 = c(1.1, 1.1, 1.1, 2.2, 2.2, 2.2),
+                 v2 = factor(c(1L, 1L, 1L, 3L, 3L, 3L), levels=1:3), 
+                 v3 = factor(c(2L, 3L, 5L, 1L, 2L, 6L), levels=1:6), 
+                 v4 = c(3L, 2L, 2L, 5L, 4L, 3L)) 
+# drop=TRUE
+dcast(DT, v1 + v2 ~ v3)                      # default is drop=TRUE
+dcast(DT, v1 + v2 ~ v3, drop=FALSE)          # all missing combinations of both LHS and RHS
+dcast(DT, v1 + v2 ~ v3, drop=c(FALSE, TRUE)) # all missing combinations of only LHS
+dcast(DT, v1 + v2 ~ v3, drop=c(TRUE, FALSE)) # all missing combinations of only RHS
+
+# using . and ...
+DT <- data.table(v1 = rep(1:2, each = 6),
+                 v2 = rep(rep(1:3, 2), each = 2),
+                 v3 = rep(1:2, 6),
+                 v4 = rnorm(6))
+dcast(DT, ... ~ v3, value.var = "v4") # same as v1 + v2 ~ v3, value.var = "v4"
+dcast(DT, v1 + v2 + v3 ~ ., value.var = "v4")
+
+## for each combination of (v1, v2), add up all values of v4
+dcast(DT, v1 + v2 ~ ., value.var = "v4", fun.aggregate = sum)
+
 \dontrun{
 # benchmark against reshape2's dcast, minimum of 3 runs
 set.seed(45)
@@ -93,7 +118,7 @@ dcast(dt, x + y ~ z, fun=list(sum, mean), value.var=c("d1", "d2"))
 dcast(dt, x + y ~ z, fun=list(sum, mean), value.var=list("d1", "d2"))
 }
 \seealso{
-  \code{\link{melt.data.table}}, \url{http://cran.r-project.org/package=reshape}
+  \code{\link{melt.data.table}}, \code{\link{rowid}}, \url{https://cran.r-project.org/package=reshape}
 }
 \keyword{data}
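[Editor's note, not part of the patch: since \code{rowid} now appears under See Also, a short sketch of the idiom it supports may help — using a within-group counter in the cast formula to spread duplicate observations into separate columns instead of aggregating them. The table and column names below are illustrative, not from the package examples.]

```r
library(data.table)

DT <- data.table(id = c("a", "a", "b"), val = 1:3)

# rowid(id) numbers the rows within each id group (1, 2, 1 here),
# giving dcast a unique cell per row instead of forcing aggregation
res <- dcast(DT, id ~ rowid(id, prefix = "val"), value.var = "val")
res
```

Without \code{rowid}, the duplicated \code{id == "a"} rows would trigger aggregation with \code{length} and a message; with it, they land in columns \code{val1} and \code{val2}.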
 
diff --git a/man/duplicated.Rd b/man/duplicated.Rd
index 9cd68f3..fa9052b 100644
--- a/man/duplicated.Rd
+++ b/man/duplicated.Rd
@@ -8,62 +8,80 @@
 \alias{uniqueN}
 \title{ Determine Duplicate Rows }
 \description{
-     \code{duplicated} returns a logical vector indicating which rows of a \code{data.table} (by 
-     key columns or when no key all columns) are duplicates of a row with smaller subscripts.
+\code{duplicated} returns a logical vector indicating which rows of a 
+\code{data.table} are duplicates of a row with smaller subscripts.
 
-     \code{unique} returns a \code{data.table} with duplicated rows (by key) removed, or
-     (when no key) duplicated rows by all columns removed.
+\code{unique} returns a \code{data.table} with duplicated rows removed, by 
+columns specified in the \code{by} argument. When no \code{by} is supplied,
+duplicated rows across all columns are removed.
 
-     \code{anyDuplicated} returns the \emph{index} \code{i} of the first duplicated entry if there is one, and 0 otherwise. 
+\code{anyDuplicated} returns the \emph{index} \code{i} of the first duplicated 
+entry if there is one, and 0 otherwise. 
 
-     \code{uniqueN} is equivalent to \code{length(unique(x))} but much faster for \code{atomic vectors}, \code{data.frames} and \code{data.tables}, for other types it dispatch to \code{length(unique(x))}. The number of unique rows are computed directly without materialising the intermediate unique data.table and is therefore memory efficient as well.
+\code{uniqueN} is equivalent to \code{length(unique(x))} when x is an 
+\code{atomic vector}, and \code{nrow(unique(x))} when x is a \code{data.frame} 
+or \code{data.table}. The number of unique rows is computed directly without 
+materialising the intermediate unique data.table and is therefore faster and 
+more memory-efficient.
 
 }
 \usage{
-\method{duplicated}{data.table}(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...)
+\method{duplicated}{data.table}(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...)
 
-\method{unique}{data.table}(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...)
+\method{unique}{data.table}(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...)
 
-\method{anyDuplicated}{data.table}(x, incomparables=FALSE, fromLast=FALSE, by=key(x), ...)
+\method{anyDuplicated}{data.table}(x, incomparables=FALSE, fromLast=FALSE, by=seq_along(x), ...)
 
-uniqueN(x, by=if (is.data.table(x)) key(x) else NULL)
+uniqueN(x, by=if (is.list(x)) seq_along(x) else NULL, na.rm=FALSE)
 }
 \arguments{
-  \item{x}{ Atomic vectors, lists, data.frames or data.tables.}
-  \item{\dots}{ Not used at this time. }
-  \item{incomparables}{ Not used. Here for S3 method consistency. }
-  \item{fromLast}{ logical indicating if duplication should be considered from the reverse side, i.e., the last (or rightmost) of identical elements would correspond to \code{duplicated = FALSE}.}
-  \item{by}{
-    \code{character} or \code{integer} vector indicating which combinations of
-    columns form \code{x} to use for uniqueness checks. Defaults to
-    \code{key(x))} which, by default, only uses the keyed columns. \code{by=NULL}
-    uses all columns and acts like the analogous
-    \code{data.frame} methods.
-  }
+\item{x}{ A data.table. \code{uniqueN} accepts atomic vectors and data.frames 
+as well.}
+\item{\dots}{ Not used at this time. }
+\item{incomparables}{ Not used. Here for S3 method consistency. }
+\item{fromLast}{ logical indicating if duplication should be considered from 
+the reverse side, i.e., the last (or rightmost) of identical elements would 
+correspond to \code{duplicated = FALSE}.}
+\item{by}{\code{character} or \code{integer} vector indicating which combinations 
+of columns from \code{x} to use for uniqueness checks. By default all columns
+are used. This was changed recently for consistency with data.frame methods.
+In versions \code{< 1.9.8} the default was \code{key(x)}.}
+\item{na.rm}{Logical (default is \code{FALSE}). Should missing values (including 
+\code{NaN}) be removed?}
 }
 \details{
-  Because data.tables are usually sorted by key, tests for duplication are   especially quick when only the keyed columns are considered. Unlike \code{\link[base]{unique.data.frame}}, \code{paste} is not used to ensure equality of floating point data. It is instead accomplished directly (for speed) whilst avoiding unexpected behaviour due to floating point representation by rounding the last two bytes off the significand (default) as explained in \code{\link{setNumericRounding}}.
-
-  \code{v1.9.4} introduces \code{anyDuplicated} method for data.tables and is similar to base in functionality. It also implements the logical argument \code{fromLast} for all three functions, with default value \code{FALSE}. 
-
-  Any combination of columns can be used to test for uniqueness (not just the
-  key columns) and are specified via the \code{by} parameter. To get
-  the analagous \code{data.frame} functionality, set \code{by} to \code{NULL}.
+Because data.tables are usually sorted by key, tests for duplication are 
+especially quick when only the keyed columns are considered. Unlike 
+\code{\link[base]{unique.data.frame}}, \code{paste} is not used to ensure 
+equality of floating point data. It is instead accomplished directly and is 
+therefore quite fast. data.table provides \code{\link{setNumericRounding}} to 
+handle cases where limitations in floating point representation are undesirable.
+
+\code{v1.9.4} introduces the \code{anyDuplicated} method for data.tables, 
+similar in functionality to base. It also implements the logical argument 
+\code{fromLast} for all three functions, with default value \code{FALSE}. 
 }
 \value{
-     \code{duplicated} returns a logical vector of length \code{nrow(x)}
-  indicating which rows are duplicates.
+\code{duplicated} returns a logical vector of length \code{nrow(x)}
+indicating which rows are duplicates.
 
-     \code{unique} returns a data table with duplicated rows removed.
+\code{unique} returns a data table with duplicated rows removed.
 
-     \code{anyDuplicated} returns a integer value with the index of first duplicate. If none exists, 0L is returned.
+\code{anyDuplicated} returns an integer value with the index of the first duplicate. 
+If none exists, 0L is returned.
 
-     \code{uniqueN} returns the number of unique elements in the vector, \code{data.frame} or \code{data.table}.
+\code{uniqueN} returns the number of unique elements in the vector, 
+\code{data.frame} or \code{data.table}.
 
 }
-\seealso{ \code{\link{setNumericRounding}}, \code{\link{data.table}}, \code{\link{duplicated}}, \code{\link{unique}}, \code{\link{all.equal}} }
+\seealso{ \code{\link{setNumericRounding}}, \code{\link{data.table}}, 
+\code{\link{duplicated}}, \code{\link{unique}}, \code{\link{all.equal}}, 
+\code{\link{fsetdiff}}, \code{\link{funion}}, \code{\link{fintersect}}, 
+\code{\link{fsetequal}}
+}
 \examples{
-DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6), key = "A,B")
+DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), 
+                  C = rep(1:2, 6), key = "A,B")
 duplicated(DT)
 unique(DT)
 
@@ -87,7 +105,8 @@ identical(unique(DT),DT[1])  # TRUE, stable within tolerance
 identical(unique(DT),DT[10]) # FALSE
 
 # fromLast=TRUE
-DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), C = rep(1:2, 6), key = "A,B")
+DT <- data.table(A = rep(1:3, each=4), B = rep(1:4, each=3), 
+                 C = rep(1:2, 6), key = "A,B")
 duplicated(DT, by="B", fromLast=TRUE)
 unique(DT, by="B", fromLast=TRUE)
 
@@ -96,11 +115,15 @@ anyDuplicated(DT, by=c("A", "B"))    # 3L
 any(duplicated(DT, by=c("A", "B")))  # TRUE
 
 # uniqueN, unique rows on key columns
+uniqueN(DT, by = key(DT))
+# uniqueN, unique rows on all columns
 uniqueN(DT)
-# uniqueN, unique rows on all all columns
-uniqueN(DT, by=NULL)
 # uniqueN while grouped by "A"
 DT[, .(uN=uniqueN(.SD)), by=A]
-}
-\keyword{ data }
 
+# uniqueN's na.rm=TRUE
+x = sample(c(NA, NaN, runif(3)), 10, TRUE)
+uniqueN(x, na.rm=FALSE) # default; NA and NaN count as distinct values
+uniqueN(x, na.rm=TRUE)  # NA and NaN removed before counting
+}
+\keyword{ data }
\ No newline at end of file
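[Editor's note, not part of the patch: the \code{na.rm} example above samples without a fixed seed, so its exact results vary between runs. A reproducible variant follows; the seed value is arbitrary.]

```r
library(data.table)

set.seed(42L)  # arbitrary seed, purely for reproducibility
x <- sample(c(NA, NaN, runif(3)), 10L, TRUE)

# NA and NaN each count as a distinct value by default
uniqueN(x, na.rm = FALSE)
# with na.rm=TRUE both are dropped before counting
uniqueN(x, na.rm = TRUE)
```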
diff --git a/man/first.Rd b/man/first.Rd
new file mode 100644
index 0000000..6e6a6ba
--- /dev/null
+++ b/man/first.Rd
@@ -0,0 +1,34 @@
+\name{first}
+\alias{first}
+\title{ First item of an object }
+\description{
+Returns the first item of a vector or list, or the first row of a data.frame 
+or data.table.
+}
+\usage{
+first(x, ...)
+}
+\arguments{
+\item{x}{ A vector, list, data.frame or data.table. Otherwise the S3 method 
+of \code{xts::first} is deployed. }
+\item{...}{ Not applicable for \code{data.table::first}. Any arguments here 
+are passed through to \code{xts::first}. }
+}
+\value{
+If no other arguments are supplied it depends on the type of x. The first item 
+of a vector or list. The first row of a \code{data.frame} or \code{data.table}. 
+Otherwise, whatever \code{xts::first} returns (if package xts has been loaded, 
+otherwise a helpful error).
+
+If any argument is supplied in addition to \code{x} (such as \code{n} or 
+\code{keep} in \code{xts::first}), regardless of \code{x}'s type, then 
+\code{xts::first} is called if xts has been loaded, otherwise a helpful error.
+}
+\seealso{ \code{\link{NROW}}, \code{\link{head}}, \code{\link{tail}}, 
+\code{\link{last}} }
+\examples{
+first(1:5) # [1] 1
+x = data.table(x=1:5, y=6:10)
+first(x) # same as x[1]
+}
+\keyword{ data }
diff --git a/man/foverlaps.Rd b/man/foverlaps.Rd
index 30f009a..648b18f 100644
--- a/man/foverlaps.Rd
+++ b/man/foverlaps.Rd
@@ -2,51 +2,124 @@
 \alias{foverlaps}
 \title{Fast overlap joins}
 \description{
-  A \emph{fast} binary-search based \emph{overlap join} of two \code{data.table}s. This is very much inspired by \code{findOverlaps} function from the bioconductor package \code{IRanges} (see link below under \code{See Also}).
-    
-  Usually, \code{x} is a very large data.table with small interval ranges, and \code{y} is much smaller \emph{keyed} \code{data.table} with relatively larger interval spans. For an usage in \code{genomics}, see the examples section.
-  
-  NOTE: This is still under development, meaning it's stable, but some features are yet to be implemented. Also, some arguments and/or the function name itself could be changed.
+A \emph{fast} binary-search based \emph{overlap join} of two \code{data.table}s.
+This is very much inspired by \code{findOverlaps} function from the Bioconductor
+package \code{IRanges} (see link below under \code{See Also}).
+
+Usually, \code{x} is a very large data.table with small interval ranges, and
+\code{y} is much smaller \emph{keyed} \code{data.table} with relatively larger
+interval spans. For example usage in genomics, see the examples section.
+
+NOTE: This is still under development, meaning it's stable, but some features
+are yet to be implemented. Also, some arguments and/or the function name itself
+could be changed.
 }
 
 \usage{
-foverlaps(x, y, by.x = if (!is.null(key(x))) key(x) else key(y), 
-    by.y = key(y), maxgap = 0L, minoverlap = 1L, 
-    type = c("any", "within", "start", "end", "equal"), 
-    mult = c("all", "first", "last"), 
-    nomatch = getOption("datatable.nomatch"), 
+foverlaps(x, y, by.x = if (!is.null(key(x))) key(x) else key(y),
+    by.y = key(y), maxgap = 0L, minoverlap = 1L,
+    type = c("any", "within", "start", "end", "equal"),
+    mult = c("all", "first", "last"),
+    nomatch = getOption("datatable.nomatch"),
     which = FALSE, verbose = getOption("datatable.verbose"))
 }
 \arguments{
-  \item{x, y}{ \code{data.table}s. \code{y} needs to be keyed, but not necessarily \code{x}. See examples. }
-  \item{by.x, by.y}{A vector of column names (or numbers) to compute the overlap joins. The last two columns in both \code{by.x} and \code{by.y} should each correspond to the \code{start} and \code{end} interval columns in \code{x} and \code{y} respectively. And the \code{start} column should always be <= \code{end} column. If \code{x} is keyed,  \code{by.x} is equal to \code{key(x)}, else \code{key(y)}. \code{by.y} defaults to \code{key(y)}. }
-  \item{maxgap}{It should be a non-negative integer value, >= 0. Default is 0 (no gap). For intervals \code{[a,b]} and \code{[c,d]}, where \code{a<=b} and \code{c<=d}, when \code{c > b} or \code{d < a}, the two intervals don't overlap. If the gap between these two intervals is \code{<= maxgap}, these two intervals are considered as overlapping. Note: This is not yet implemented.}
-  \item{minoverlap}{ It should be a positive integer value, > 0. Default is 1. For intervals \code{[a,b]} and \code{[c,d]}, where \code{a<=b} and \code{c<=d}, when \code{c<=b} and \code{d>=a}, the two intervals overlap. If the length of overlap between these two intervals is \code{>= minoverlap}, then these two intervals are considered to be overlapping. Note: This is not yet implemented.}
-  \item{type}{ Default value is \code{any}. Allowed values are \code{any}, \code{within}, \code{start}, \code{end} and \code{equal}. Note: \code{equal} is not yet implemented. But this is just a normal join of the type \code{y[x, ...]}, unless you require also using \code{maxgap} and \code{minoverlap} arguments. 
-  
-      The types shown here are identical in functionality to the function \code{findOverlaps} in the bioconductor package \code{IRanges}. Let \code{[a,b]} and \code{[c,d]} be intervals in \code{x} and \code{y} with \code{a<=b} and \code{c<=d}. For \code{type="start"}, the intervals overlap iff \code{a == c}. For \code{type="end"}, the intervals overlap iff \code{b == d}. For \code{type="within"}, the intervals overlap iff \code{a>=c and b<=d}. For \code{type="equal"}, the intervals overl [...]
-      
-      NB: \code{maxgap} argument, when > 0, is to be interpreted according to the type of the overlap. This will be updated once \code{maxgap} is implemented.}
-  \item{mult}{ When multiple rows in \code{y} match to the row in \code{x}, \code{mult=.} controls which values are returned - \code{"all"} (default), \code{"first"} or \code{"last"}.}
-  \item{nomatch}{ Same as \code{nomatch} in \code{\link{match}}. When a row (with interval say, \code{[a,b]}) in \code{x} has no match in \code{y}, \code{nomatch=NA} (default) means \code{NA} is returned for \code{y}'s non-\code{by.y} columns for that row of \code{x}. \code{nomatch=0} means no rows will be returned for that row of \code{x}. The default value (used when \code{nomatch} is not supplied) can be changed from \code{NA} to \code{0} using \code{options(datatable.nomatch=0)}. }
-  \item{which}{ When \code{TRUE}, if \code{mult="all"} returns a two column \code{data.table} with the first column corresponding to \code{x}'s row number and the second corresponding to \code{y}'s. when \code{nomatch=NA}, no matches return \code{NA} for \code{y}, and if \code{nomatch=0}, those rows where no match is found will be skipped; if \code{mult="first" or "last"}, a vector of length equal to the number of rows in \code{x} is returned, with no-match entries filled with \code{NA}  [...]
-  \item{verbose}{ \code{TRUE} turns on status and information messages to the console. Turn this on by default using \code{options(datatable.verbose=TRUE)}. The quantity and types of verbosity may be expanded in future.}
+\item{x, y}{ \code{data.table}s. \code{y} needs to be keyed, but not necessarily
+\code{x}. See examples. }
+\item{by.x, by.y}{A vector of column names (or numbers) to compute the overlap
+joins. The last two columns in both \code{by.x} and \code{by.y} should each
+correspond to the \code{start} and \code{end} interval columns in \code{x} and
+\code{y} respectively. And the \code{start} column should always be <= \code{end}
+column. If \code{x} is keyed, \code{by.x} defaults to \code{key(x)}, else
+\code{key(y)}. \code{by.y} defaults to \code{key(y)}. }
+\item{maxgap}{It should be a non-negative integer value, >= 0. Default is 0 (no
+gap). For intervals \code{[a,b]} and \code{[c,d]}, where \code{a<=b} and
+\code{c<=d}, when \code{c > b} or \code{d < a}, the two intervals don't overlap.
+If the gap between these two intervals is \code{<= maxgap}, these two intervals
+are considered as overlapping. Note: This is not yet implemented.}
+\item{minoverlap}{ It should be a positive integer value, > 0. Default is 1. For
+intervals \code{[a,b]} and \code{[c,d]}, where \code{a<=b} and \code{c<=d}, when
+\code{c<=b} and \code{d>=a}, the two intervals overlap. If the length of overlap
+between these two intervals is \code{>= minoverlap}, then these two intervals are
+considered to be overlapping. Note: This is not yet implemented.}
+\item{type}{ Default value is \code{any}. Allowed values are \code{any},
+\code{within}, \code{start}, \code{end} and \code{equal}. Note: \code{equal} is
+not yet implemented. But this is just a normal join of the type \code{y[x, ...]},
+unless you require also using \code{maxgap} and \code{minoverlap} arguments.
+
+The types shown here are identical in functionality to the function
+\code{findOverlaps} in the Bioconductor package \code{IRanges}. Let \code{[a,b]}
+and \code{[c,d]} be intervals in \code{x} and \code{y} with \code{a<=b} and
+\code{c<=d}. For \code{type="start"}, the intervals overlap iff \code{a == c}.
+For \code{type="end"}, the intervals overlap iff \code{b == d}. For
+\code{type="within"}, the intervals overlap iff \code{a>=c and b<=d}. For
+\code{type="equal"}, the intervals overlap iff \code{a==c and b==d}. For
+\code{type="any"}, as long as \code{c<=b and d>=a}, they overlap. In addition
+to these requirements, they also have to satisfy the \code{minoverlap} argument
+as explained above.
+
+NB: \code{maxgap} argument, when > 0, is to be interpreted according to the type
+of the overlap. This will be updated once \code{maxgap} is implemented.}
+
+\item{mult}{ When multiple rows in \code{y} match to the row in \code{x},
+\code{mult=.} controls which values are returned - \code{"all"} (default),
+\code{"first"} or \code{"last"}.}
+\item{nomatch}{ Same as \code{nomatch} in \code{\link{match}}. When a row (with
+interval say, \code{[a,b]}) in \code{x} has no match in \code{y}, \code{nomatch=NA}
+(default) means \code{NA} is returned for \code{y}'s non-\code{by.y} columns for
+that row of \code{x}. \code{nomatch=0} means no rows will be returned for that
+row of \code{x}. The default value (used when \code{nomatch} is not supplied)
+can be changed from \code{NA} to \code{0} using \code{options(datatable.nomatch=0)}.}
+\item{which}{ When \code{TRUE}, if \code{mult="all"} returns a two column
+\code{data.table} with the first column corresponding to \code{x}'s row number
+and the second corresponding to \code{y}'s. When \code{nomatch=NA}, no matches
+return \code{NA} for \code{y}, and if \code{nomatch=0}, those rows where no
+match is found will be skipped; if \code{mult="first" or "last"}, a vector of
+length equal to the number of rows in \code{x} is returned, with no-match entries
+filled with \code{NA} or \code{0} corresponding to the \code{nomatch} argument.
+Default is \code{FALSE}, which returns a join with the rows in \code{y}.}
+\item{verbose}{ \code{TRUE} turns on status and information messages to the
+console. Turn this on by default using \code{options(datatable.verbose=TRUE)}.
+The quantity and types of verbosity may be expanded in future.}
 }
 \details{
-Very briefly, \code{foverlaps()} collapses the two-column interval in \code{y} to one-column of \emph{unique} values to generate a \code{lookup} table, and then performs the join depending on the type of \code{overlap}, using the already available \code{binary search} feature of \code{data.table}. The time (and space) required to generate the \code{lookup} is therefore proportional to the number of unique values present in the interval columns of \code{y} when combined together. 
+Very briefly, \code{foverlaps()} collapses the two-column interval in \code{y}
+to one column of \emph{unique} values to generate a \code{lookup} table, and
+then performs the join depending on the type of \code{overlap}, using the
+already available \code{binary search} feature of \code{data.table}. The time
+(and space) required to generate the \code{lookup} is therefore proportional
+to the number of unique values present in the interval columns of \code{y}
+when combined together.
 
-Overlap joins takes advantage of the fact that \code{y} is sorted to speed-up finding overlaps. Therefore \code{y} has to be keyed (see \code{?setkey}) prior to running \code{foverlaps()}. A key on \code{x} is not necessary, although it \emph{might} speed things further. The columns in \code{by.x} argument should correspond to the columns specified in \code{by.y}. The last two columns should be the \emph{interval} columns in both \code{by.x} and \code{by.y}. The first interval column in  [...]
+Overlap joins take advantage of the fact that \code{y} is sorted to speed up
+finding overlaps. Therefore \code{y} has to be keyed (see \code{?setkey})
+prior to running \code{foverlaps()}. A key on \code{x} is not necessary,
+although it \emph{might} speed things further. The columns in \code{by.x}
+argument should correspond to the columns specified in \code{by.y}. The last
+two columns should be the \emph{interval} columns in both \code{by.x} and
+\code{by.y}. The first interval column in \code{by.x} should always be <= the
+second interval column in \code{by.x}, and likewise for \code{by.y}. The
+\code{\link{storage.mode}} of the interval columns must be either \code{double}
+or \code{integer}. It therefore works with \code{bit64::integer64} type as well.
 
-The \code{lookup} generation step could be quite time consuming if the number of unique values in \code{y} are too large (ex: in the order of tens of millions). There might be improvements possible by constructing lookup using RLE, which is a pending feature request. However most scenarios will not have too many unique values for \code{y}.
-
-Columns of \code{numeric} types (i.e., \code{double}) have their last two bytes rounded off while computing overlap joins, by defalult, to avoid any unexpected behaviour due to limitations in representing floating point numbers precisely. Have a look at \code{\link{setNumericRounding}} to learn more.
+The \code{lookup} generation step could be quite time-consuming if the number 
+of unique values in \code{y} is too large (ex: in the order of tens of millions).
+There might be improvements possible by constructing lookup using RLE, which is
+a pending feature request. However most scenarios will not have too many unique
+values for \code{y}.
 }
 \value{
-    A new \code{data.table} by joining over the interval columns (along with other additional identifier columns) specified in \code{by.x} and \code{by.y}. 
-    
-    NB: When \code{which=TRUE}: \code{a)} \code{mult="first" or "last"} returns a \code{vector} of matching row numbers in \code{y}, and \code{b)} when \code{mult="all"} returns a data.table with two columns with the first containing row numbers of \code{x} and the second column with corresponding row numbers of \code{y}.
-    
-    \code{nomatch=NA or 0} also influences whether non-matching rows are returned or not, as explained above.
+A new \code{data.table} by joining over the interval columns (along with other
+additional identifier columns) specified in \code{by.x} and \code{by.y}.
+
+NB: When \code{which=TRUE}: \code{a)} \code{mult="first" or "last"} returns a
+\code{vector} of matching row numbers in \code{y}, and \code{b)} when
+\code{mult="all"} returns a data.table with two columns with the first
+containing row numbers of \code{x} and the second column with corresponding
+row numbers of \code{y}.
+
+\code{nomatch=NA or 0} also influences whether non-matching rows are returned
+or not, as explained above.
 }
 
 \examples{
@@ -61,9 +134,9 @@ foverlaps(x, y, type="any", mult="first") ## returns only first match
 foverlaps(x, y, type="within") ## matches iff 'x' is within 'y'
 
 ## with extra identifiers (ex: in genomics)
-x = data.table(chr=c("Chr1", "Chr1", "Chr2", "Chr2", "Chr2"), 
+x = data.table(chr=c("Chr1", "Chr1", "Chr2", "Chr2", "Chr2"),
                start=c(5,10, 1, 25, 50), end=c(11,20,4,52,60))
-y = data.table(chr=c("Chr1", "Chr1", "Chr2"), start=c(1, 15,1), 
+y = data.table(chr=c("Chr1", "Chr1", "Chr2"), start=c(1, 15,1),
                end=c(4, 18, 55), geneid=letters[1:3])
 setkey(y, chr, start, end)
 foverlaps(x, y, type="any", which=TRUE)
@@ -74,16 +147,17 @@ foverlaps(x, y, type="within")
 foverlaps(x, y, type="start")
 
 ## x and y have different column names - specify by.x
-x = data.table(seq=c("Chr1", "Chr1", "Chr2", "Chr2", "Chr2"), 
+x = data.table(seq=c("Chr1", "Chr1", "Chr2", "Chr2", "Chr2"),
                start=c(5,10, 1, 25, 50), end=c(11,20,4,52,60))
-y = data.table(chr=c("Chr1", "Chr1", "Chr2"), start=c(1, 15,1), 
+y = data.table(chr=c("Chr1", "Chr1", "Chr2"), start=c(1, 15,1),
                end=c(4, 18, 55), geneid=letters[1:3])
 setkey(y, chr, start, end)
-foverlaps(x, y, by.x=c("seq", "start", "end"), 
+foverlaps(x, y, by.x=c("seq", "start", "end"),
             type="any", which=TRUE)
 }
 \seealso{
-  \code{\link{data.table}}, \url{http://www.bioconductor.org/packages/release/bioc/html/IRanges.html}, \code{\link{setNumericRounding}}
+\code{\link{data.table}},
+\url{http://www.bioconductor.org/packages/release/bioc/html/IRanges.html},
+\code{\link{setNumericRounding}}
 }
 \keyword{ data }
-
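[Editor's note, not part of the patch: a minimal self-contained sketch of the \code{type} argument described above, with made-up intervals; \code{nomatch=0L} drops non-matching rows of \code{x} to keep the results easy to read.]

```r
library(data.table)

x <- data.table(start = c(5L, 31L), end = c(10L, 40L))
y <- data.table(start = c(1L, 30L), end = c(6L, 50L), id = c("a", "b"))
setkey(y, start, end)   # y must be keyed on its interval columns

# type="any": [5,10] overlaps [1,6] and [31,40] overlaps [30,50]
foverlaps(x, y, type = "any", nomatch = 0L)

# type="within": only [31,40] lies entirely inside [30,50]
foverlaps(x, y, type = "within", nomatch = 0L)
```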
diff --git a/man/fread.Rd b/man/fread.Rd
index e797dd0..b428f2f 100644
--- a/man/fread.Rd
+++ b/man/fread.Rd
@@ -9,12 +9,13 @@
    `fread` is for \emph{regular} delimited files; i.e., where every row has the same number of columns. In future, secondary separator (\code{sep2}) may be specified \emph{within} each column. Such columns will be read as type \code{list} where each cell is itself a vector.
 }
 \usage{
-fread(input, sep="auto", sep2="auto", nrows=-1L, header="auto", na.strings="NA",
+fread(input, sep="auto", sep2="auto", nrows=-1L, header="auto", na.strings="NA", file,
 stringsAsFactors=FALSE, verbose=getOption("datatable.verbose"), autostart=1L,
 skip=0L, select=NULL, drop=NULL, colClasses=NULL,
 integer64=getOption("datatable.integer64"),         # default: "integer64"
 dec=if (sep!=".") "." else ",", col.names, 
-check.names=FALSE, encoding="unknown", strip.white=TRUE, 
+check.names=FALSE, encoding="unknown", quote="\"", 
+strip.white=TRUE, fill=FALSE, blank.lines.skip=FALSE, key=NULL, 
 showProgress=getOption("datatable.showProgress"),   # default: TRUE
 data.table=getOption("datatable.fread.datatable")   # default: TRUE
 )
@@ -26,27 +27,32 @@ data.table=getOption("datatable.fread.datatable")   # default: TRUE
   \item{nrows}{ The number of rows to read, by default -1 means all. Unlike \code{read.table}, it doesn't help speed to set this to the number of rows in the file (or an estimate), since the number of rows is automatically determined and is already fast. Only set \code{nrows} if you require the first 10 rows, for example. `nrows=0` is a special case that just returns the column names and types; e.g., a dry run for a large file or to quickly check format consistency of a set of files befo [...]
   \item{header}{ Does the first data line contain column names? Defaults according to whether every non-empty field on the first data line is type character. If so, or TRUE is supplied, any empty column names are given a default name. }
   \item{na.strings}{ A character vector of strings which are to be interpreted as \code{NA} values. By default \code{",,"} for columns read as type character is read as a blank string (\code{""}) and \code{",NA,"} is read as \code{NA}. Typical alternatives might be \code{na.strings=NULL} (no coercion to NA at all!) or perhaps \code{na.strings=c("NA","N/A","null")}. }
+  \item{file}{ File path, useful when we want to ensure that no shell commands will be executed. The file path can also be provided to the \code{input} argument. }
   \item{stringsAsFactors}{ Convert all character columns to factors? }
   \item{verbose}{ Be chatty and report timings? }
   \item{autostart}{ Any line number within the region of machine readable delimited text, by default 30. If the file is shorter or this line is empty (e.g. short files with trailing blank lines) then the last non empty line (with a non empty line above that) is used. This line and the lines above it are used to auto detect \code{sep}, \code{sep2} and the number of fields. It's extremely unlikely that \code{autostart} should ever need to be changed, we hope. }
-  \item{skip}{ If -1 (default) use the procedure described below starting on line \code{autostart} to find the first data row. \code{skip>=0} means ignore \code{autostart} and take line \code{skip+1} as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). \code{skip="string"} searches for \code{"string"} in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata). }
+  \item{skip}{ If 0 (default) use the procedure described below starting on line \code{autostart} to find the first data row. \code{skip>0} means ignore \code{autostart} and take line \code{skip+1} as the first data row (or column names according to header="auto"|TRUE|FALSE as usual). \code{skip="string"} searches for \code{"string"} in the file (e.g. a substring of the column names row) and starts on that line (inspired by read.xls in package gdata). }
   \item{select}{ Vector of column names or numbers to keep, drop the rest. }
   \item{drop}{ Vector of column names or numbers to drop, keep the rest. }
   \item{colClasses}{ A character vector of classes (named or unnamed), as read.csv. Or a named list of vectors of column names or numbers, see examples. colClasses in fread is intended for rare overrides, not for routine use. fread will only promote a column to a higher type if colClasses requests it. It won't downgrade a column to a lower type since NAs would result. You have to coerce such columns afterwards yourself, if you really require data loss. }
   \item{integer64}{ "integer64" (default) reads columns detected as containing integers larger than 2^31 as type \code{bit64::integer64}. Alternatively, \code{"double"|"numeric"} reads as \code{base::read.csv} does; i.e., possibly with loss of precision and if so silently. Or, "character". }
   \item{dec}{ The decimal separator as in \code{base::read.csv}. If not "." (default) then usually ",". See details. }
   \item{col.names}{ A vector of optional names for the variables (columns). The default is to use the header column if present or detected, or if not "V" followed by the column number. }
-  \item{check.names}{ default is \code{FALSE}. If \code{TRUE}, it uses the base function \code{\link{make.unique}} to ensure that column names are all unique.}
-  \item{encoding}{ default is \code{"unknown"}. Other possible options are \code{"UTF-8"} and \code{"Latin-1"}.  }
+  \item{check.names}{default is \code{FALSE}. If \code{TRUE} then the names of the variables in the \code{data.table} are checked to ensure that they are syntactically valid variable names. If necessary they are adjusted (by \code{\link{make.names}}) so that they are, and also to ensure that there are no duplicates.}
+  \item{encoding}{ default is \code{"unknown"}. Other possible options are \code{"UTF-8"} and \code{"Latin-1"}. Note: it is not used to re-encode the input; rather, it enables handling of encoded strings in their native encoding. }
+  \item{quote}{ By default (\code{"\""}), if a field starts with a doublequote, \code{fread} handles embedded quotes robustly as explained under \code{Details}. If it fails, then another attempt is made to read the field \emph{as is}, i.e., as if quotes are disabled. By setting \code{quote=""}, the field is always read as if quotes are disabled. }
   \item{strip.white}{ default is \code{TRUE}. Strips leading and trailing whitespaces of unquoted fields. If \code{FALSE}, only header trailing spaces are removed. }
-  \item{showProgress}{ TRUE displays progress on the console using \code{\\r}. It is produced in fread's C code where the very nice (but R level) txtProgressBar and tkProgressBar are not easily available. }
+  \item{fill}{logical (default is \code{FALSE}). If \code{TRUE} then in case the rows have unequal length, blank fields are implicitly filled.}
+  \item{blank.lines.skip}{\code{logical}, default is \code{FALSE}. If \code{TRUE} blank lines in the input are ignored.}
+  \item{key}{Character vector of one or more column names which is passed to \code{\link{setkey}}. It may be a single comma separated string such as \code{key="x,y,z"}, or a vector of names such as \code{key=c("x","y","z")}. Only valid when argument \code{data.table=TRUE}.}
+  \item{showProgress}{ \code{TRUE} displays progress on the console using \code{\\r}. It is produced in fread's C code where the very nice (but R level) txtProgressBar and tkProgressBar are not easily available. }
   \item{data.table}{ TRUE returns a \code{data.table}. FALSE returns a \code{data.frame}. }
 }
 \details{
 
 Once the separator is found on line \code{autostart}, the number of columns is determined. Then the file is searched backwards from \code{autostart} until a row is found that doesn't have that number of columns. Thus, the first data row is found and any human readable banners are automatically skipped. This feature can be particularly useful for loading a set of files which may not all have consistently sized banners. Setting \code{skip>0} overrides this feature by setting \code{autostar [...]
 
-The first 5 rows, middle 5 rows and last 5 rows are then read to determine column types. The lowest type for each column is chosen from the ordered list \code{integer}, \code{integer64}, \code{double}, \code{character}. This enables \code{fread} to allocate exactly the right number of rows, with columns of the right type, up front once. The file may of course \emph{still} contain data of a different type in rows other than first, middle and last 5. In that case, the column types are bump [...]
+A sample of 1,000 rows is used to determine column types (100 rows at each of 10 points in the file). The lowest type for each column is chosen from the ordered list: \code{logical}, \code{integer}, \code{integer64}, \code{double}, \code{character}. This enables \code{fread} to allocate exactly the right number of rows, with columns of the right type, up front once. The file may of course still contain data of a higher type in rows outside the sample. In that case, the column types are bumped mid read and [...]
 
 There is no line length limit, not even a very large one. Since we are encouraging \code{list} columns (i.e. \code{sep2}) this has the potential to encourage longer line lengths. So the approach of scanning each line into a buffer first and then rescanning that buffer is not used. There are no buffers used in \code{fread}'s C code at all. The field width is limited by R itself: the maximum width of a character string (currently 2^31-1 bytes, 2GB).
 
@@ -60,10 +66,12 @@ If an empty line is encountered then reading stops there, with warning if any te
 
 On Windows, "French_France.1252" is tried which should be available as standard (any locale with comma decimal separator would suffice) and on unix "fr_FR.utf8" (you may need to install this locale on unix). \code{fread()} is very careful to set the locale back again afterwards, even if the function fails with an error. The choice of locale is determined by \code{options()$datatable.fread.dec.locale}. This may be a \emph{vector} of locale names and if so they will be tried in turn until  [...]
 
-\bold{Quotes:}
+\bold{Quotes:} 
+
+When \code{quote} is a single character, 
 
   \itemize{
-      \item{Spaces and othe whitespace (other than \code{sep} and \code{\\n}) may appear in unquoted character fields, e.g., \code{...,2,Joe Bloggs,3.14,...}.}
+      \item{Spaces and other whitespace (other than \code{sep} and \code{\\n}) may appear in unquoted character fields, e.g., \code{...,2,Joe Bloggs,3.14,...}.}
 
       \item{When \code{character} columns are \emph{quoted}, they must start and end with that quoting character immediately followed by \code{sep} or \code{\\n}, e.g., \code{...,2,"Joe Bloggs",3.14,...}. 
 
@@ -71,20 +79,19 @@ On Windows, "French_France.1252" is tried which should be available as standard
 
       If an embedded quote is followed by the separator inside a quoted field, the embedded quotes up to that point in that field must be balanced; e.g. \code{...,2,"www.blah?x="one",y="two"",3.14,...}.
 
-      Quoting may be used to signify that numeric data should be read as text.
-
       On those fields that do not satisfy these conditions, e.g., fields with unbalanced quotes, \code{fread} re-attempts that field as if it isn't quoted. This is quite useful in reading files that contains fields with unbalanced quotes as well, automatically.}
   }
 
+To read fields \emph{as is} instead, use \code{quote = ""}.
 }
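The quote rules above can be sketched with a couple of hypothetical one-liners (behaviour as described in this help page; these are not taken from the package's own examples):

```r
library(data.table)

# Embedded quotes inside a quoted field are doubled and should
# read back as a single field: Joe "JB" Bloggs
fread('A,B\n1,"Joe ""JB"" Bloggs"\n')

# quote="" disables quote handling entirely; the field is read
# as is, with the quote characters kept in the value:
fread('A,B\n1,"plain text"\n', quote="")
```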
 \value{
     A \code{data.table} by default. A \code{data.frame} when argument \code{data.table=FALSE}; e.g. \code{options(datatable.fread.datatable=FALSE)}.
 }
 \references{
 Background :\cr
-\url{http://cran.r-project.org/doc/manuals/R-data.html}\cr
+\url{https://cran.r-project.org/doc/manuals/R-data.html}\cr
 \url{http://stackoverflow.com/questions/1727772/quickly-reading-very-large-tables-as-dataframes-in-r}\cr
-\url{www.biostat.jhsph.edu/~rpeng/docs/R-large-tables.html}\cr
+\url{http://www.biostat.jhsph.edu/~rpeng/docs/R-large-tables.html}\cr
 \url{https://stat.ethz.ch/pipermail/r-help/2007-August/138315.html}\cr
 \url{http://www.cerebralmastication.com/2009/11/loading-big-data-into-r/}\cr
 \url{http://stackoverflow.com/questions/9061736/faster-than-scan-with-rcpp}\cr
@@ -256,6 +263,18 @@ fread(data, drop=2:3)             # same using column numbers
 fread(data, select=c("A","D"))    # less typing, easier to read
 fread(data, select=c(1,4))        # same using column numbers
 
+# skip blank lines
+fread("a,b\n1,a\n2,b\n\n\n3,c\n", blank.lines.skip=TRUE)
+# fill
+fread("a,b\n1,a\n2\n3,c\n", fill=TRUE)
+fread("a,b\n\n1,a\n2\n\n3,c\n\n", fill=TRUE)
+
+# fill with skip blank lines
+fread("a,b\n\n1,a\n2\n\n3,c\n\n", fill=TRUE, blank.lines.skip=TRUE)
+
+# check.names usage
+fread("a b,a b\n1,2\n")
+fread("a b,a b\n1,2\n", check.names=TRUE) # no duplicates + syntactically valid names
 }
 \keyword{ data }
 
diff --git a/man/fsort.Rd b/man/fsort.Rd
new file mode 100644
index 0000000..07aa2a4
--- /dev/null
+++ b/man/fsort.Rd
@@ -0,0 +1,32 @@
+\name{fsort}
+\alias{fsort}
+\title{Fast parallel sort}
+\description{
+  Similar to \code{base::sort} but parallel. Experimental.
+}
+
+\usage{
+fsort(x, decreasing = FALSE, na.last = FALSE, internal=FALSE, verbose=FALSE, ...)
+}
+\arguments{
+  \item{x}{ A vector. Type double, currently. }
+  \item{decreasing}{ Decreasing order? }
+  \item{na.last}{ Control treatment of \code{NA}s. If \code{TRUE}, missing values in the data are put last; if \code{FALSE}, they are put first; if \code{NA}, they are removed; if \code{"keep"} they are kept with rank \code{NA}. }
+  \item{internal}{ Internal use only. Temporary variable. Will be removed. }
+  \item{verbose}{ Print tracing information. }
+  \item{...}{ Not sure yet. Should be consistent with base R.}
+}
+\details{
+  Returns the input in sorted order. Fast using parallelism.
+}
+\value{    
+  The input in sorted order.
+}
+
+\examples{
+x = runif(1e6)
+system.time(ans1 <- sort(x, method="quick"))
+system.time(ans2 <- fsort(x))
+identical(ans1, ans2)
+}
+
diff --git a/man/fwrite.Rd b/man/fwrite.Rd
new file mode 100644
index 0000000..a1251e9
--- /dev/null
+++ b/man/fwrite.Rd
@@ -0,0 +1,118 @@
+\name{fwrite}
+\alias{fwrite}
+\title{Fast CSV writer}
+\description{
+As \code{write.csv} but much faster (e.g. 2 seconds versus 1 minute) and just as flexible. Modern machines almost surely have more than one CPU so \code{fwrite} uses them; on all operating systems including Linux, Mac and Windows.
+
+This is new functionality as of Nov 2016. We may need to refine argument names and defaults.
+}
+\usage{
+fwrite(x, file = "", append = FALSE, quote = "auto",
+  sep = ",", sep2 = c("","|",""),
+  eol = if (.Platform$OS.type=="windows") "\r\n" else "\n",
+  na = "", dec = ".", row.names = FALSE, col.names = TRUE,
+  qmethod = c("double","escape"),
+  logicalAsInt = FALSE, dateTimeAs = c("ISO","squash","epoch","write.csv"),
+  buffMB = 8L, nThread = getDTthreads(),
+  showProgress = getOption("datatable.showProgress"),
+  verbose = getOption("datatable.verbose"),
+  ..turbo=TRUE)
+}
+\arguments{
+  \item{x}{Any \code{list} of same-length vectors; e.g. \code{data.frame} and \code{data.table}.}
+  \item{file}{Output file name. \code{""} indicates output to the console. }
+  \item{append}{If \code{TRUE}, the file is opened in append mode and column names (header row) are not written.}
+  \item{quote}{When \code{"auto"}, character fields, factor fields and column names will only be surrounded by double quotes when they need to be; i.e., when the field contains the separator \code{sep}, a line ending \code{\\n}, the double quote itself or (when \code{list} columns are present) \code{sep2[2]} (see \code{sep2} below). If \code{FALSE} the fields are not wrapped with quotes even if this would break the CSV due to the contents of the field. If \code{TRUE} double quotes are al [...]
+  \item{sep}{The separator between columns. Default is \code{","}.}
+  \item{sep2}{For columns of type \code{list} where each item is an atomic vector, \code{sep2} controls how to separate items \emph{within} the column. \code{sep2[1]} is written at the start of the output field, \code{sep2[2]} is placed between each item and \code{sep2[3]} is written at the end. \code{sep2[1]} and \code{sep2[3]} may be any length strings including empty \code{""} (default). \code{sep2[2]} must be a single character and (when \code{list} columns are present and therefore  [...]
+  \item{eol}{Line separator. Default is \code{"\r\n"} for Windows and \code{"\n"} otherwise.}
+  \item{na}{The string to use for missing values in the data. Default is a blank string \code{""}.}
+  \item{dec}{The decimal separator, by default \code{"."}. See link in references. Cannot be the same as \code{sep}.}
+  \item{row.names}{Should row names be written? For compatibility with \code{data.frame} and \code{write.csv} since \code{data.table} never has row names. Hence default \code{FALSE} unlike \code{write.csv}.} 
+  \item{col.names}{Should the column names (header row) be written? If missing, then when \code{append=TRUE} and the file already exists, the default is set to \code{FALSE} for convenience, to prevent column names appearing again mid file.}
+  \item{qmethod}{A character string specifying how to deal with embedded double quote characters when quoting strings.
+      \itemize{
+	\item{"escape" - the quote character (as well as the backslash character) is escaped in C style by a backslash, or}
+	\item{"double" (default, same as \code{write.csv}), in which case the double quote is doubled with another one.}
+      }}
+  \item{logicalAsInt}{Should \code{logical} values be written as \code{1} and \code{0} rather than \code{"TRUE"} and \code{"FALSE"}?}
+  \item{dateTimeAs}{ How \code{Date}/\code{IDate}, \code{ITime} and \code{POSIXct} items are written. 
+      \itemize{
+	\item{"ISO" (default) - \code{2016-09-12}, \code{18:12:16} and \code{2016-09-12T18:12:16.999999Z}. 0, 3 or 6 digits of fractional seconds are printed if and when present for convenience, regardless of any R options such as \code{digits.secs}. The idea being that if milli and microseconds are present then you most likely want to retain them. R's internal UTC representation is written faithfully to encourage ISO standards, stymie timezone ambiguity and for speed. An option to consider is  [...]
+	\item{"squash" - \code{20160912}, \code{181216} and \code{20160912181216999}. This option allows fast and simple extraction of \code{yyyy}, \code{mm}, \code{dd} and (most commonly to group by) \code{yyyymm} parts using integer div and mod operations. In R for example, one line helper functions could use \code{\%/\%10000}, \code{\%/\%100\%\%100}, \code{\%\%100} and \code{\%/\%100} respectively. POSIXct UTC is squashed to 17 digits (including 3 digits of milliseconds always, even if \code [...]
+	\item{"epoch" - \code{17056}, \code{65536} and \code{1473703936.999999}. The underlying number of days or seconds since the relevant epoch (1970-01-01, 00:00:00 and 1970-01-01T00:00:00Z respectively), negative before that (see \code{?Date}). 0, 3 or 6 digits of fractional seconds are printed if and when present.}
+	\item{"write.csv" - this currently affects \code{POSIXct} only. It is written as \code{write.csv} does by using the \code{as.character} method which heeds \code{digits.secs} and converts from R's internal UTC representation back to local time (or the \code{"tzone"} attribute) as of that historical date. Accordingly this can be slow. All other column types (including \code{Date}, \code{IDate} and \code{ITime} which are independent of timezone) are written as the "ISO" option using fast C [...]
+      }
+  The first three options are fast due to new specialized C code. The epoch to date-part conversion uses a fast approach by Howard Hinnant (see references) using a day-of-year starting on 1 March. You should not be able to notice any difference in write speed between those three options. The date range supported for \code{Date} and \code{IDate} is [0000-03-01, 9999-12-31]. Every one of these 3,652,365 dates have been tested and compared to base R including all 2,790 leap days in this ran [...]
+  This option applies to vectors of date/time in list column cells, too. \cr \cr
+  A fully flexible format string (such as \code{"\%m/\%d/\%Y"}) is not supported. This is to encourage use of ISO standards and because that flexibility is not known how to make fast at C level. We may be able to support one or two more specific options if required.
+  }
+  \item{buffMB}{The buffer size (MB) per thread in the range 1 to 1024, default 8MB. Experiment to see what works best for your data on your hardware.}
+  \item{nThread}{The number of threads to use. Experiment to see what works best for your data on your hardware.}
+  \item{showProgress}{ Display a progress meter on the console? Ignored when \code{file==""}. }
+  \item{verbose}{Be chatty and report timings?}
+  \item{..turbo}{Use specialized custom C code to format numeric, integer and integer64 columns. This reduces call overhead to the C library and avoids copies. Try with and without to see the difference it makes on your machine and please report any differences in output. If you do find cases where \code{..turbo=FALSE} is needed, please report them as bugs, since this option WILL BE REMOVED in future. Hence why it has the \code{..} prefix.}
+}
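The integer div/mod extraction described under \code{dateTimeAs="squash"} amounts to plain R arithmetic; a small sketch of the helper expressions mentioned above:

```r
d <- 20160912L                 # a Date written with dateTimeAs="squash"
yyyy   <- d %/% 10000L         # 2016
mm     <- d %/% 100L %% 100L   # 9
dd     <- d %% 100L            # 12
yyyymm <- d %/% 100L           # 201609, convenient for grouping by month
```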
+\details{
+\code{fwrite} began as a community contribution with \href{https://github.com/Rdatatable/data.table/pull/1613}{pull request #1613} by Otto Seiskari. This gave Matt Dowle the impetus to specialize the numeric formatting and to parallelize: \url{http://blog.h2o.ai/2016/04/fast-csv-writing-for-r/}. Final items were tracked in \href{https://github.com/Rdatatable/data.table/issues/1664}{issue #1664} such as automatic quoting, \code{bit64::integer64} support, decimal/scientific formatting exac [...]
+}
+\seealso{
+  \code{\link{setDTthreads}}, \code{\link{fread}}, \code{\link[utils]{write.csv}}, \code{\link[utils]{write.table}}, \href{https://CRAN.R-project.org/package=bit64}{\code{bit64::integer64}}
+}
+\references{
+  \url{http://howardhinnant.github.io/date_algorithms.html}\cr
+  \url{https://en.wikipedia.org/wiki/Decimal_mark}
+}
+\examples{
+
+DF = data.frame(A=1:3, B=c("foo","A,Name","baz"))
+fwrite(DF)
+write.csv(DF, row.names=FALSE, quote=FALSE)  # same
+
+fwrite(DF, row.names=TRUE, quote=TRUE)
+write.csv(DF)                                # same
+
+DF = data.frame(A=c(2.1,-1.234e-307,pi), B=c("foo","A,Name","bar"))
+fwrite(DF, quote='auto')        # Just DF[2,2] is auto quoted 
+write.csv(DF, row.names=FALSE)  # same numeric formatting
+
+DT = data.table(A=c(2,5.6,-3),B=list(1:3,c("foo","A,Name","bar"),round(pi*1:3,2)))
+fwrite(DT)
+fwrite(DT, sep="|", sep2=c("{",",","}"))
+
+\dontrun{
+
+set.seed(1)
+DT = as.data.table( lapply(1:10, sample,
+         x=as.numeric(1:5e7), size=5e6))                            #     382MB
+system.time(fwrite(DT, "/dev/shm/tmp1.csv"))                        #      0.8s
+system.time(write.csv(DT, "/dev/shm/tmp2.csv",                      #     60.6s
+                      quote=FALSE, row.names=FALSE))
+system("diff /dev/shm/tmp1.csv /dev/shm/tmp2.csv")                  # identical
+
+set.seed(1)
+N = 1e7
+DT = data.table(
+  str1=sample(sprintf("%010d",sample(N,1e5,replace=TRUE)), N, replace=TRUE),
+  str2=sample(sprintf("%09d",sample(N,1e5,replace=TRUE)), N, replace=TRUE),
+  str3=sample(sapply(sample(2:30, 100, TRUE), function(n)
+     paste0(sample(LETTERS, n, TRUE), collapse="")), N, TRUE),
+  str4=sprintf("%05d",sample(sample(1e5,50),N,TRUE)),
+  num1=sample(round(rnorm(1e6,mean=6.5,sd=15),2), N, replace=TRUE),
+  num2=sample(round(rnorm(1e6,mean=6.5,sd=15),10), N, replace=TRUE),
+  str5=sample(c("Y","N"),N,TRUE),
+  str6=sample(c("M","F"),N,TRUE),
+  int1=sample(ceiling(rexp(1e6)), N, replace=TRUE),
+  int2=sample(N,N,replace=TRUE)-N/2
+)                                                                   #     774MB
+system.time(fwrite(DT,"/dev/shm/tmp1.csv"))                         #      1.1s
+system.time(write.csv(DT,"/dev/shm/tmp2.csv",row.names=FALSE,quote=FALSE)) # 63.2s
+system("diff /dev/shm/tmp1.csv /dev/shm/tmp2.csv")                  # identical
+
+unlink("/dev/shm/tmp1.csv")
+unlink("/dev/shm/tmp2.csv")
+}
+
+}
+\keyword{ data }
+
diff --git a/man/last.Rd b/man/last.Rd
index 91dbd64..ee6979f 100644
--- a/man/last.Rd
+++ b/man/last.Rd
@@ -2,23 +2,35 @@
 \alias{last}
 \title{ Last item of an object }
 \description{
-   Returns the last item of a vector or list, or the last row of a data.frame or data.table.
+Returns the last item of a vector or list, or the last row of a data.frame or 
+data.table.
 }
 \usage{
-last(x,...)
+last(x, ...)
 }
 \arguments{
-  \item{x}{ A vector, list, data.frame or data.table. Otherwise the S3 method of \code{xts::last} is deployed. }
-  \item{...}{ Not applicable for \code{data.table::last}. Any arguments here are passed through to \code{xts::last}. }
+\item{x}{ A vector, list, data.frame or data.table. Otherwise the S3 method of 
+\code{xts::last} is deployed. }
+\item{...}{ Not applicable for \code{data.table::last}. Any arguments here are 
+passed through to \code{xts::last}. }
 }
 % \details{
 % }
 \value{
-    If no other arguments are supplied it depends on the type of x. The last item of a vector or list. The last row of a \code{data.frame} or \code{data.table}. Otherwise, whatever \code{xts::last} returns (if package xts has been loaded, otherwise a helpful error).
-    If any argument is supplied in addition to \code{x} (such as \code{n} or \code{keep} in \code{xts::last}), regardless of \code{x}'s type, then \code{xts::last} is called if xts has been loaded, otherwise a helpful error.
+If no other arguments are supplied it depends on the type of x. The last item 
+of a vector or list. The last row of a \code{data.frame} or \code{data.table}. 
+Otherwise, whatever \code{xts::last} returns (if package xts has been loaded, 
+otherwise a helpful error).
+
+If any argument is supplied in addition to \code{x} (such as \code{n} or 
+\code{keep} in \code{xts::last}), regardless of \code{x}'s type, then 
+\code{xts::last} is called if xts has been loaded, otherwise a helpful error.
+}
+\seealso{ \code{\link{NROW}}, \code{\link{head}}, \code{\link{tail}}, 
+\code{\link{first}} }
+\examples{
+last(1:5) # [1] 5
+x = data.table(x=1:5, y=6:10)
+last(x) # same as x[5]
 }
-\seealso{ \code{\link{NROW}}, \code{\link{head}}, \code{\link{tail}} }
-% \examples{
-% }
 \keyword{ data }
-
diff --git a/man/like.Rd b/man/like.Rd
index 1a17954..22f88f7 100644
--- a/man/like.Rd
+++ b/man/like.Rd
@@ -3,14 +3,14 @@
 \alias{\%like\%}
 \title{ Convenience function for calling grepl. }
 \description{
-  Intended for use in [.data.table i.
+  Intended for use in \code{i} in \code{[.data.table}.
 }
 \usage{
 like(vector,pattern)
-vector %like% pattern
+vector \%like\% pattern
 }
 \arguments{
-   \item{vector}{ Either a character vector or a factor. A factor is faster. }
+   \item{vector}{ Either a \code{character} vector or a \code{factor}. A \code{factor} is faster. }
    \item{pattern}{ Passed on to \code{\link{grepl}}. }
 }
 % \details{
@@ -25,4 +25,3 @@ DT = data.table(Name=c("Mary","George","Martha"), Salary=c(2,3,4))
 DT[Name \%like\% "^Mar"]
 }
 \keyword{ data }
-
diff --git a/man/melt.data.table.Rd b/man/melt.data.table.Rd
index 5fae5db..ab5d4ac 100644
--- a/man/melt.data.table.Rd
+++ b/man/melt.data.table.Rd
@@ -3,12 +3,12 @@
 \alias{melt}
 \title{Fast melt for data.table}
 \description{
-  An S3 method for melting \code{data.table}s written entirely in C for speed. It also avoids any unnecessary copies by handling all arguments internally in a memory efficient manner.
-
-  From 1.9.6, to melt or cast data.tables, it is not necessary to load \code{reshape2} anymore. If you have to, then load \code{reshape2} package before loading \code{data.table}. 
-
-  \bold{NEW}: \code{melt.data.table} now allows melting into multiple columns simultaneously. See the \code{details} and \code{examples} section.
+An S3 method for melting \code{data.table}s written in C for speed and memory 
+efficiency. Since \code{v1.9.6}, \code{melt.data.table} allows melting into 
+multiple columns simultaneously.
 
+It is no longer necessary to load \code{reshape2} to melt or cast data.tables. 
+If you have to, then load the \code{reshape2} package \emph{before} loading 
+\code{data.table}. 
 }
 \usage{
 ## fast melt a data.table
@@ -19,32 +19,67 @@
     verbose = getOption("datatable.verbose"))
 }
 \arguments{
-  \item{data}{ A \code{data.table} object to melt.}
-  \item{id.vars}{vector of id variables. Can be integer (corresponding id column numbers) or character (id column names) vector. If missing, all non-measure columns will be assigned to it.}
-  \item{measure.vars}{vector of measure variables. Can be integer (corresponding measure column numbers) or character (measure column names) vector. If missing, all non-id columns will be assigned to it.
-  
-  \bold{NEW:} \code{measure.vars} also now accepts a list of character/integer vectors to melt into multiple columns - i.e., melt into more than one \code{value} columns simultaneously. Use the function \code{patterns} to provide multiple patterns conveniently. See the examples section. }
-  \item{variable.name}{name for the measured variable names column. The default name is 'variable'.}
-  \item{value.name}{name for the molten data values column. The default name is 'value'.}
-  \item{na.rm}{If \code{TRUE}, \code{NA} values will be removed from the molten data.}
-  \item{variable.factor}{If \code{TRUE}, the \code{variable} column will be converted to \code{factor}, else it will be a \code{character} column.}
-  \item{value.factor}{If \code{TRUE}, the \code{value} column will be converted to \code{factor}, else the molten value type is left unchanged.}
-  \item{verbose}{\code{TRUE} turns on status and information messages to the console. Turn this on by default using \code{options(datatable.verbose=TRUE)}. The quantity and types of verbosity may be expanded in future.}
-  \item{...}{any other arguments to be passed to/from other methods.}
+\item{data}{ A \code{data.table} object to melt.}
+\item{id.vars}{vector of id variables. Can be integer (corresponding id 
+column numbers) or character (id column names) vector. If missing, all 
+non-measure columns will be assigned to it.}
+\item{measure.vars}{vector of measure variables. Can be integer (corresponding 
+measure column numbers) or character (measure column names) vector. If missing, 
+all non-id columns will be assigned to it.
+
+\code{measure.vars} also now accepts a list of character/integer vectors to 
+melt into multiple columns - i.e., melt into more than one \code{value} columns 
+simultaneously. Use \code{\link{patterns}} to provide multiple patterns 
+conveniently. See also \code{Examples}.}
+\item{variable.name}{name for the measured variable names column. The default 
+name is 'variable'.}
+\item{value.name}{name for the molten data values column. The default name is 
+'value'.}
+\item{na.rm}{If \code{TRUE}, \code{NA} values will be removed from the molten 
+data.}
+\item{variable.factor}{If \code{TRUE}, the \code{variable} column will be 
+converted to \code{factor}, else it will be a \code{character} column.}
+\item{value.factor}{If \code{TRUE}, the \code{value} column will be converted 
+to \code{factor}, else the molten value type is left unchanged.}
+\item{verbose}{\code{TRUE} turns on status and information messages to the 
+console. Turn this on by default using \code{options(datatable.verbose=TRUE)}. 
+The quantity and types of verbosity may be expanded in future.}
+\item{...}{any other arguments to be passed to/from other methods.}
 }
 \details{
-If \code{id.vars} and \code{measure.vars} are both missing, all non-\code{numeric/integer/logical} columns are assigned as id variables and the rest as measure variables. If only one of \code{id.vars} or \code{measure.vars} is supplied, the rest of the columns will be assigned to the other. Both \code{id.vars} and \code{measure.vars} can have the same column more than once and the same column can be both as id and measure variables. 
-
-\code{melt.data.table} also accepts \code{list} columns for both id and measure variables. 
-
-When all \code{measure.vars} are not of the same type, they'll be coerced according to the hierarchy \code{list} > \code{character} > \code{numeric > integer > logical}. For example, if any of the measure variables is a \code{list}, then entire value column will be coerced to a list. Note that, if the type of \code{value} column is a list, \code{na.rm = TRUE} will have no effect.
-
-From version \code{1.9.6}, \code{melt} gains a feature with \code{measure.vars} accepting a list of \code{character} or \code{integer} vectors as well to melt into multiple columns in a single function call efficiently. See the \code{examples} section for the usage.
-
-Attributes are preserved if all \code{value} columns are of the same type. By default, if any of the columns to be melted are of type \code{factor}, it'll be coerced to \code{character} type. This is to be compatible with \code{reshape2}'s \code{melt.data.frame}. To get a \code{factor} column, set \code{value.factor = TRUE}. \code{melt.data.table} also preserves \code{ordered} factors.
+If \code{id.vars} and \code{measure.vars} are both missing, all 
+non-\code{numeric/integer/logical} columns are assigned as id variables and 
+the rest as measure variables. If only one of \code{id.vars} or 
+\code{measure.vars} is supplied, the rest of the columns will be assigned to 
+the other. Both \code{id.vars} and \code{measure.vars} can include the same 
+column more than once, and the same column can appear as both an id and a 
+measure variable. 
+
+\code{melt.data.table} also accepts \code{list} columns for both id and measure 
+variables. 
+
+When the \code{measure.vars} are not all of the same type, they'll be coerced 
+according to the hierarchy \code{list} > \code{character} > \code{numeric > 
+integer > logical}. For example, if any of the measure variables is a 
+\code{list}, then the entire value column will be coerced to a list. Note that 
+if the type of the \code{value} column is \code{list}, \code{na.rm = TRUE} 
+will have no effect.
+
+From version \code{1.9.6}, \code{measure.vars} also accepts a list of 
+\code{character} or \code{integer} vectors, to melt into multiple columns 
+efficiently in a single function call. The function \code{\link{patterns}} 
+can be used to provide regular expression patterns. When used along with 
+\code{melt}, if the \code{cols} argument is not provided, the patterns are 
+matched against \code{names(data)} for convenience.
+
+Attributes are preserved if all \code{value} columns are of the same type. By 
+default, if any of the columns to be melted are of type \code{factor}, they 
+will be coerced to \code{character} type. This is for compatibility with 
+\code{reshape2}'s \code{melt.data.frame}. To get a \code{factor} column, set 
+\code{value.factor = TRUE}. \code{melt.data.table} also preserves 
+\code{ordered} factors.
 }
 \value{
-    An unkeyed \code{data.table} containing the molten data.
+An unkeyed \code{data.table} containing the molten data.
 }
 
 \examples{
@@ -83,7 +118,7 @@ melt(DT, id=1, measure=c("c_1", "i_2")) # i2 coerced to char
 # on na.rm=TRUE. NAs are removed efficiently, from within C
 melt(DT, id=1, measure=c("c_1", "i_2"), na.rm=TRUE) # remove NA
 
-# NEW FEATURE: measure.vars can be a list
+# measure.vars can also be a list
 # melt "f_1,f_2" and "d_1,d_2" simultaneously, retain 'factor' attribute
 # convenient way using internal function patterns()
 melt(DT, id=1:2, measure=patterns("^f_", "^d_"), value.factor=TRUE)
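+# equivalently (a sketch assuming DT has columns f_1, f_2, d_1 and d_2, as
+# the patterns above suggest), the column groups can be given explicitly:
+melt(DT, id=1:2, measure=list(c("f_1","f_2"), c("d_1","d_2")), value.factor=TRUE)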
diff --git a/man/merge.Rd b/man/merge.Rd
index 6f3bccd..7cc0683 100644
--- a/man/merge.Rd
+++ b/man/merge.Rd
@@ -1,16 +1,19 @@
 \name{merge}
 \alias{merge}
 \alias{merge.data.table}
-\title{ Merge Two Data Tables }
+\title{Merge two data.tables}
 \description{
-  Fast merge of two \code{data.table}s.
-
-  This \code{merge} method for \code{data.table} behaves very similarly to that
-  of \code{data.frame}s with one major exception: By default, 
-  the columns used to merge the \code{data.table}s are the shared key columns
-  rather than the shared columns with the same names. Set the \code{by}, or \code{by.x}, 
-  \code{by.y} arguments explicitly to override this default.
-  
+Fast merge of two \code{data.table}s. The \code{data.table} method behaves 
+very similarly to that of \code{data.frame}s except that, by default, it attempts to merge
+
+\itemize{
+  \item first, on the shared key columns; if there are none, 
+  \item then on the key columns of the first argument \code{x}; if there 
+  are none, 
+  \item then on the common columns between the two \code{data.table}s.
+}
+
+Set the \code{by}, or \code{by.x} and \code{by.y} arguments explicitly to override this default.
 }
 
 \usage{
@@ -21,83 +24,59 @@ allow.cartesian=getOption("datatable.allow.cartesian"),  # default FALSE
 }
 
 \arguments{
-  \item{x, y}{
-    \code{data table}s. \code{y} is coerced to a \code{data.table} if
-    it isn't one already.
-  }
-
-  \item{by}{
-    A vector of shared column names in \code{x} and \code{y} to merge on.
-    This defaults to the shared key columns between the two tables.
-    If \code{y} has no key columns, this defaults to the key of \code{x}.
-  }
-  
-  \item{by.x, by.y}{
-    Vectors of column names in \code{x} and \code{y} to merge on.
-  }
-
-  \item{all}{
-    logical; \code{all = TRUE} is shorthand to save setting both \code{all.x = TRUE} and
-    \code{all.y = TRUE}.
-  }
-
-  \item{all.x}{
-    logical; if \code{TRUE}, then extra rows will be added to the
-    output, one for each row in \code{x} that has no matching row in
-    \code{y}.  These rows will have 'NA's in those columns that are
-    usually filled with values from \code{y}.  The default is \code{FALSE},
-    so that only rows with data from both \code{x} and \code{y} are
-    included in the output.
-  }
-
-  \item{all.y}{
-    logical; analogous to \code{all.x} above.
-  }
-  
-  \item{sort}{
-    logical. If \code{TRUE} (default), the merged \code{data.table} is sorted 
-    by setting the key to the \code{by / by.x} columns. If \code{FALSE}, the 
-    result is not sorted.
-  }
-  
-  \item{suffixes}{
-    A \code{character(2)} specifying the suffixes to be used for making
-    non-\code{by} column names unique. The suffix behavior works in a similar 
-    fashion as the \code{\link{merge.data.frame}} method does.
-  }
-  
-  \item{allow.cartesian}{
-    See \code{allow.cartesian} in \code{\link{[.data.table}}.
-  }
-
-  \item{\dots}{
-    Not used at this time.
-  }
+\item{x, y}{\code{data.table}s. \code{y} is coerced to a \code{data.table} if 
+it isn't one already.}
+\item{by}{A vector of shared column names in \code{x} and \code{y} to merge on.
+This defaults to the shared key columns between the two tables.
+If \code{y} has no key columns, this defaults to the key of \code{x}.}
+\item{by.x, by.y}{Vectors of column names in \code{x} and \code{y} to merge on.}
+\item{all}{logical; \code{all = TRUE} is shorthand to save setting both 
+\code{all.x = TRUE} and \code{all.y = TRUE}.}
+\item{all.x}{logical; if \code{TRUE}, then extra rows will be added to the 
+output, one for each row in \code{x} that has no matching row in \code{y}.  
+These rows will have 'NA's in those columns that are usually filled with values 
+from \code{y}.  The default is \code{FALSE}, so that only rows with data from both 
+\code{x} and \code{y} are included in the output.}
+\item{all.y}{logical; analogous to \code{all.x} above.}
+\item{sort}{logical. If \code{TRUE} (default), the merged \code{data.table} is 
+sorted by setting the key to the \code{by / by.x} columns. If \code{FALSE}, the 
+result is not sorted.}
+\item{suffixes}{A \code{character(2)} specifying the suffixes to be used for 
+making non-\code{by} column names unique. The suffix behavior works in a similar 
+fashion as the \code{\link{merge.data.frame}} method does.}
+\item{allow.cartesian}{See \code{allow.cartesian} in \code{\link{[.data.table}}.}
+\item{\dots}{Not used at this time.}
 }
 
 \details{
-  \code{\link{merge}} is a generic function in base R. It dispatches to either the
-  \code{merge.data.frame} method or \code{merge.data.table} method depending on the class of its first argument.
-  
-  In versions \code{< v1.9.6}, if the specified columns in \code{by} was not the key (or head of the key) of \code{x} or \code{y}, then a \code{\link{copy}} is first rekeyed prior to performing the merge. This was less performant and memory inefficient. 
-
-  In version \code{v1.9.4} secondary keys was implemented. In \code{v1.9.6}, the concept of secondary keys has been 
-  extended to \code{merge}. No deep copies are made anymore and therefore very performant and memory efficient. Also there is better control for providing the columns to merge on with the help of newly implemented \code{by.x} and \code{by.y} arguments.
-
-  For a more \code{data.table}-centric way of merging two \code{data.table}s, see \code{\link{[.data.table}}; e.g., \code{x[y, ...]}. See FAQ 1.12 for a detailed comparison of \code{merge} and \code{x[y, ...]}.
-
-  Merges on numeric columns: Columns of numeric types (i.e., double) have their last two bytes rounded off while computing order, by defalult, to avoid any unexpected behaviour due to limitations in representing floating point numbers precisely. For large numbers (integers > 2^31), we recommend using \code{bit64::integer64}. Have a look at \code{\link{setNumericRounding}} to learn more.
-  
+\code{\link{merge}} is a generic function in base R. It dispatches to either the
+\code{merge.data.frame} method or \code{merge.data.table} method depending on 
+the class of its first argument. Note that, unlike \code{SQL}, \code{NA} is 
+matched against \code{NA} (and \code{NaN} against \code{NaN}) while merging.
+
+In versions \code{<= v1.9.4}, if the specified columns in \code{by} were not 
+the key (or the head of the key) of \code{x} or \code{y}, then a 
+\code{\link{copy}} was first rekeyed prior to performing the merge. This was 
+both slower and memory inefficient. The concept of secondary keys (implemented 
+in \code{v1.9.4}) has been used to overcome this limitation from 
+\code{v1.9.6}+. No deep copies are made any more, making the merge faster and 
+more memory efficient. There is also better control for providing the columns 
+to merge on, with the help of the newly implemented \code{by.x} and 
+\code{by.y} arguments.
+
+For a more \code{data.table}-centric way of merging two \code{data.table}s, see 
+\code{\link{[.data.table}}; e.g., \code{x[y, ...]}. See FAQ 1.12 for a detailed 
+comparison of \code{merge} and \code{x[y, ...]}.
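+
+For example, a minimal sketch (the tables \code{d1} and \code{d2} below are 
+made up for illustration) of merging on differently named columns with 
+\code{by.x} and \code{by.y}:
+\preformatted{
+d1 = data.table(a = 1:3, v1 = 1:3)
+d2 = data.table(b = 2:4, v2 = 4:6)
+merge(d1, d2, by.x = "a", by.y = "b") # by default, only matching rows (a = 2, 3)
+}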
 }
 
 \value{
-  A new \code{data.table} based on the merged \code{data table}s, sorted by the
-  columns set (or inferred for) the \code{by} argument.
+A new \code{data.table} based on the merged \code{data.table}s, sorted by the 
+columns set in (or inferred for) the \code{by} argument if the \code{sort} 
+argument is \code{TRUE}.
 }
 
 \seealso{
-  \code{\link{data.table}}, \code{\link{[.data.table}},
-  \code{\link{merge.data.frame}}
+\code{\link{data.table}}, \code{\link{as.data.table}}, \code{\link{[.data.table}}, 
+\code{\link{merge.data.frame}}
 }
 
 \examples{
diff --git a/man/openmp-utils.Rd b/man/openmp-utils.Rd
new file mode 100644
index 0000000..ab5b797
--- /dev/null
+++ b/man/openmp-utils.Rd
@@ -0,0 +1,19 @@
+\name{setDTthreads}
+\alias{setDTthreads}
+\alias{getDTthreads}
+\title{ Set or get number of threads that data.table should use }
+\description{
+Set and get number of threads to be used in \code{data.table} functions that are parallelized with OpenMP. Default value 0 means to utilize all CPU available with an appropriate number of threads calculated by OpenMP. \code{getDTthreads()} returns the number of threads that will be used. This affects \code{data.table} only and does not change R itself or other packages using OpenMP. The most common usage expected is \code{setDTthreads(1)} to limit \code{data.table} to one thread for pre- [...]
+}
+\usage{
+setDTthreads(threads)
+getDTthreads()
+}
+\arguments{
+  \item{threads}{ An integer >= 0. Default 0 means use all available CPUs and leave the operating system to multi-task. }
+}
+\value{
+A length 1 \code{integer}. The old value is returned by \code{setDTthreads} so you can store it and pass it back to \code{setDTthreads} afterwards, to restore the previous setting after the section of your code that you limited (most likely to one thread).
+}
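+\examples{
+# a hedged sketch: temporarily limit data.table to one thread, then
+# restore the previous setting returned by setDTthreads()
+old = setDTthreads(1)
+# ... code to run single-threaded ...
+setDTthreads(old)
+getDTthreads()
+}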
+\keyword{ data }
+
diff --git a/man/patterns.Rd b/man/patterns.Rd
index 0ff1c6e..c35d42d 100644
--- a/man/patterns.Rd
+++ b/man/patterns.Rd
@@ -1,24 +1,33 @@
 \name{patterns}
 \alias{patterns}
-\title{Regex patterns to extract columns from data.table}
+\title{Obtain matching indices corresponding to patterns}
 \description{
-  From \code{v1.9.6}, \code{\link{melt.data.table}} has a new enhanced functionality in which \code{measure.vars} argument can accept a \emph{list of column names} and melt them into separate columns. See the \code{Efficient reshaping using data.tables} vignette linked below to learn more.
+\code{patterns} returns the matching indices in the argument \code{cols} 
+corresponding to the regular expression patterns provided. The patterns must be 
+supported by \code{\link[base]{grep}}.
 
-  \code{patterns} is designed purely for convenience, to be used only within the \code{measure.vars} argument of \code{melt.data.table}. Column names corresponding to each pattern from the \code{data.table} is melted into a separate column.
+From \code{v1.9.6}, \code{\link{melt.data.table}} has enhanced functionality 
+in which the \code{measure.vars} argument can accept a \emph{list of column names} 
+and melt them into separate columns. See the \code{Efficient reshaping using 
+data.tables} vignette linked below to learn more.
 }
 \usage{
-patterns(...)
+patterns(..., cols=character(0))
 }
 \arguments{
-  \item{...}{ A set of patterns. See example. }
+  \item{...}{A set of regular expression patterns.}
+  \item{cols}{A character vector of names to which each pattern is matched.}
 }
 \seealso{ 
-  \code{\link{melt}}, \url{https://github.com/Rdatatable/data.table/wiki/Getting-started}
+  \code{\link{melt}}, 
+  \url{https://github.com/Rdatatable/data.table/wiki/Getting-started}
 }
 \examples{
-# makes sense only in the context of melt at the moment
 dt = data.table(x1 = 1:5, x2 = 6:10, y1 = letters[1:5], y2 = letters[6:10])
 # melt all columns that begin with 'x' & 'y', respectively, into separate columns
+melt(dt, measure.vars = patterns("^x", "^y", cols=names(dt)))
+# when used with melt, 'cols' is implicitly assumed to be the names of the 
+# input data.table, if not provided.
 melt(dt, measure.vars = patterns("^x", "^y"))
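+# outside melt, patterns() simply returns the matching indices in 'cols'
+# (a list of integer vectors, one per pattern)
+patterns("^x", "^y", cols=names(dt))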
 }
-\keyword{ data }
+\keyword{data}
diff --git a/man/print.data.table.Rd b/man/print.data.table.Rd
new file mode 100644
index 0000000..8726f29
--- /dev/null
+++ b/man/print.data.table.Rd
@@ -0,0 +1,49 @@
+\name{print.data.table}
+\alias{print.data.table}
+\title{ data.table Printing Options }
+\description{
+  \code{print.data.table} extends the functionalities of \code{print.data.frame}.
+
+  Key enhancements include automatic output compression of many observations and a concise column-wise \code{class} summary.
+}
+\usage{
+  \method{print}{data.table}(x,
+    topn=getOption("datatable.print.topn"),          # default: 5
+    nrows=getOption("datatable.print.nrows"),        # default: 100
+    class=getOption("datatable.print.class"),        # default: FALSE
+    row.names=getOption("datatable.print.rownames"), # default: TRUE
+    quote=FALSE,...)
+}
+\arguments{
+  \item{x}{ A \code{data.table}. }
+  \item{topn}{ The number of rows to be printed from the beginning and end of tables with more than \code{nrows} rows. }
+  \item{nrows}{ The number of rows which will be printed before truncation is enforced. }
+  \item{class}{ If \code{TRUE}, the resulting output will include above each column its storage class (or a self-evident abbreviation thereof). }
+  \item{row.names}{ If \code{TRUE}, row indices will be printed alongside \code{x}. }
+  \item{quote}{ If \code{TRUE}, all output will appear in quotes, as in \code{print.default}. }
+  \item{\dots}{ Other arguments ultimately passed to \code{format}. }
+}
+\details{
+  By default, with an eye to the typically large number of observations in a \code{data.table}, only the beginning and end of the object are displayed (specifically, \code{head(x, topn)} and \code{tail(x, topn)} are displayed unless \code{nrow(x) < nrows}, in which case all rows will print).
+}
+\seealso{\code{\link{print.default}}}
+\examples{
+  #output compression
+  DT <- data.table(a = 1:1000)
+  print(DT, nrows = 100, topn = 4)
+  
+  #`quote` can be used to identify whitespace
+  DT <- data.table(blanks = c(" 12", " 34"),
+                   noblanks = c("12", "34"))
+  print(DT, quote = TRUE)
+  
+  #`class` provides handy column type summaries at a glance
+  DT <- data.table(a = vector("integer", 3), 
+                   b = vector("complex", 3),
+                   c = as.IDate(paste0("2016-02-0", 1:3)))
+  print(DT, class = TRUE)
+  
+  #`row.names` can be eliminated to save space
+  DT <- data.table(a = 1:3)
+  print(DT, row.names = FALSE)
+}
\ No newline at end of file
diff --git a/man/rbindlist.Rd b/man/rbindlist.Rd
index eae4d1d..fe20aec 100644
--- a/man/rbindlist.Rd
+++ b/man/rbindlist.Rd
@@ -34,7 +34,7 @@ Note that any additional attributes that might exist on individual items of the
 \value{
     An unkeyed \code{data.table} containing a concatenation of all the items passed in.
 }
-\seealso{ \code{\link{data.table}} }
+\seealso{ \code{\link{data.table}}, \code{\link{split.data.table}} }
 \examples{
 # default case
 DT1 = data.table(A=1:3,B=letters[1:3])
diff --git a/man/rleid.Rd b/man/rleid.Rd
index 6a98dad..2881041 100644
--- a/man/rleid.Rd
+++ b/man/rleid.Rd
@@ -1,36 +1,41 @@
 \name{rleid}
 \alias{rleid}
 \alias{rleidv}
-\title{ Generate run-length type group id}
+\title{Generate run-length type group id}
 \description{
    A convenience function for generating a \emph{run-length} type \emph{id} column to be used in grouping operations. It accepts atomic vectors, lists, data.frames or data.tables as input.
 }
 \usage{
-rleid(...)
-rleidv(x, cols=seq_along(x))
+rleid(..., prefix=NULL)
+rleidv(x, cols=seq_along(x), prefix=NULL)
 }
 \arguments{
   \item{x}{ A vector, list, data.frame or data.table. }
   \item{...}{ A sequence of numeric, integer64, character or logical vectors, all of same length. For interactive use.}
   \item{cols}{ Only meaningful for lists, data.frames or data.tables. A character vector of column names (or numbers) of x. }
+  \item{prefix}{ Either \code{NULL} (default) or a character vector of length=1 which is prefixed to the row ids, returning a character vector (instead of an integer vector).}
 }
 \details{
-    At times aggregation (or grouping) operations need to be performed where consecutive runs of identical values should belong to the same group (See \code{\link[base]{rle}}). The use for such a function has come up repeatedly on StackOverflow, see the \code{See Also} section. This function allows to generate \emph{"run-length"} groups directly.
+    At times aggregation (or grouping) operations need to be performed where consecutive runs of identical values should belong to the same group (see \code{\link[base]{rle}}). The use case for such a function has come up repeatedly on StackOverflow, see the \code{See Also} section. This function allows you to generate "run-length" groups directly.
 
     \code{rleid} is designed for interactive use and accepts a sequence of vectors as arguments. For programming, \code{rleidv} might be more useful. 
 }
 \value{
-	An integer vector with same length as \code{NROW(x)}.
+    When \code{prefix = NULL}, an integer vector with the same length as \code{NROW(x)}; otherwise, a character vector with the value of \code{prefix} prepended to the ids.
 }
 \examples{
 DT = data.table(grp=rep(c("A", "B", "C", "A", "B"), c(2,2,3,1,2)), value=1:10)
 rleid(DT$grp) # get run-length ids
 rleidv(DT, "grp") # same as above
+
+rleid(DT$grp, prefix="grp") # prefix with 'grp'
+
 # get sum of value over run-length groups
 DT[, sum(value), by=.(grp, rleid(grp))]
+DT[, sum(value), by=.(grp, rleid(grp, prefix="grp"))]
 
 }
 \seealso{
-  \code{\link{data.table}}, \url{http://stackoverflow.com/q/21421047/559784}
+  \code{\link{data.table}}, \code{\link{rowid}}, \url{http://stackoverflow.com/q/21421047/559784}
 }
 \keyword{ data }
diff --git a/man/rowid.Rd b/man/rowid.Rd
new file mode 100644
index 0000000..1fc17a6
--- /dev/null
+++ b/man/rowid.Rd
@@ -0,0 +1,49 @@
+\name{rowid}
+\alias{rowid}
+\alias{rowidv}
+\title{ Generate unique row ids within each group}
+\description{
+   Convenience functions for generating unique row ids within each group. They accept atomic vectors, lists, data.frames or data.tables as input.
+
+   \code{rowid} is intended for interactive use, particularly along with the function \code{dcast} to generate unique ids directly in the formula. 
+
+   \code{rowidv(dt, cols=c("x", "y"))} is equivalent to column \code{N} in the code \code{dt[, N := seq_len(.N), by=c("x", "y")]}.
+   
+   See examples for more.
+}
+\usage{
+rowid(..., prefix=NULL)
+rowidv(x, cols=seq_along(x), prefix=NULL)
+}
+\arguments{
+  \item{x}{ A vector, list, data.frame or data.table. }
+  \item{...}{ A sequence of numeric, integer64, character or logical vectors, all of same length. For interactive use.}
+  \item{cols}{ Only meaningful for lists, data.frames or data.tables. A character vector of column names (or numbers) of x. }
+  \item{prefix}{ Either \code{NULL} (default) or a character vector of length=1 which is prefixed to the row ids, returning a character vector (instead of an integer vector).}
+}
+\value{
+    When \code{prefix = NULL}, an integer vector with the same length as \code{NROW(x)}; otherwise, a character vector with the value of \code{prefix} prepended to the ids.
+}
+\examples{
+DT = data.table(x=c(20,10,10,30,30,20), y=c("a", "a", "a", "b", "b", "b"), z=1:6)
+
+rowid(DT$x) # 1,1,2,1,2,2
+rowidv(DT, cols="x") # same as above
+
+rowid(DT$x, prefix="group") # prefixed with 'group'
+
+rowid(DT$x, DT$y) # 1,1,2,1,2,1
+rowidv(DT, cols=c("x","y")) # same as above
+DT[, .(N=seq_len(.N)), by=.(x,y)]$N # same as above
+
+# convenient usage with dcast
+dcast(DT, x ~ rowid(x, prefix="group"), value.var="z")
+#     x group1 group2
+# 1: 10      2      3
+# 2: 20      1      6
+# 3: 30      4      5
+}
+\seealso{
+  \code{\link{dcast.data.table}}, \code{\link{rleid}}
+}
+\keyword{ data }
diff --git a/man/setDF.Rd b/man/setDF.Rd
index 68f7af7..d50d16f 100644
--- a/man/setDF.Rd
+++ b/man/setDF.Rd
@@ -1,6 +1,6 @@
 \name{setDF}
 \alias{setDF}
-\title{Convert a data.table to data.frame by reference}
+\title{Coerce a data.table to data.frame by reference}
 \description{
   In \code{data.table} parlance, all \code{set*} functions change their input \emph{by reference}. That is, no copy is made at all, other than temporary working memory, which is as large as one column.. The only other \code{data.table} operator that modifies input by reference is \code{\link{:=}}. Check out the \code{See Also} section below for other \code{set*} function \code{data.table} provides.
 
@@ -15,7 +15,7 @@ setDF(x, rownames=NULL)
 }
 
 \details{
-  This feature request came up on the data.table mailing list: \url{http://bit.ly/1xkokNQ}. All \code{data.table} attributes including any keys of the input data.table are stripped off.
+  This feature request came up on the \href{http://r.789695.n4.nabble.com/Is-there-any-overhead-to-converting-back-and-forth-from-a-data-table-to-a-data-frame-td4688332.html}{data.table mailing list}. All \code{data.table} attributes including any keys of the input data.table are stripped off.
   
   When using \code{rownames}, recall that the row names of a \code{data.frame} must be unique. By default, the assigned set of row names is simply the sequence 1, ..., \code{nrow(x)} (or \code{length(x)} for \code{list}s).
 }
@@ -24,7 +24,7 @@ setDF(x, rownames=NULL)
     The input \code{data.table} is modified by reference to a \code{data.frame} and returned (invisibly). If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See \code{?copy}..
 }
 
-\seealso{ \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{copy}}, \code{\link{setDT}}
+\seealso{ \code{\link{data.table}}, \code{\link{as.data.table}}, \code{\link{setDT}}, \code{\link{copy}}, \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}
 }
 \examples{
 X = data.table(x=1:5, y=6:10)
diff --git a/man/setDT.Rd b/man/setDT.Rd
index 4f352e9..d0cfd0e 100644
--- a/man/setDT.Rd
+++ b/man/setDT.Rd
@@ -1,6 +1,6 @@
 \name{setDT}
 \alias{setDT}
-\title{Convert lists and data.frames to data.table by reference}
+\title{Coerce lists and data.frames to data.table by reference}
 \description{
   In \code{data.table} parlance, all \code{set*} functions change their input \emph{by reference}. That is, no copy is made at all, other than temporary working memory, which is as large as one column.. The only other \code{data.table} operator that modifies input by reference is \code{\link{:=}}. Check out the \code{See Also} section below for other \code{set*} function \code{data.table} provides.
 
@@ -25,7 +25,7 @@ setDT(x, keep.rownames=FALSE, key=NULL, check.names=FALSE)
     The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., \code{setDT(X)[, sum(B), by=A]}. If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See \code{?copy}.
 }
 
-\seealso{ \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, \code{\link{copy}}, \code{\link{setDF}}
+\seealso{ \code{\link{data.table}}, \code{\link{as.data.table}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}
 }
 \examples{
 
diff --git a/man/setNumericRounding.Rd b/man/setNumericRounding.Rd
index f437a51..70d8dad 100644
--- a/man/setNumericRounding.Rd
+++ b/man/setNumericRounding.Rd
@@ -3,34 +3,40 @@
 \alias{getNumericRounding}
 \title{ Change or turn off numeric rounding }
 \description{
-  Change rounding to 0, 1 or 2 bytes when joining, grouping or ordering numeric (i.e. double, POSIXct) columns.
+Change rounding to 0, 1 or 2 bytes when joining, grouping or ordering numeric 
+(i.e. double, POSIXct) columns.
 }
 \usage{
 setNumericRounding(x)
 getNumericRounding()
 }
 \arguments{
-  \item{x}{ integer or numeric vector: 2 (default), 1 or 0 byte rounding }
+  \item{x}{ integer or numeric vector: 0 (default), 1 or 2 byte rounding }
 }
 \details{
-  Computers cannot represent some floating point numbers (such as 0.6) precisely, using base 2. This leads to unexpected behaviour when
-  joining or grouping columns of type 'numeric'; i.e. 'double', see example below.  To deal with this automatically for convenience, 
-  when joining or grouping, data.table rounds such data to apx 11 s.f. which is plenty of digits for many cases. This is achieved by
-  rounding the last 2 bytes off the significand.  Where this is not enough, \code{setNumericRounding} can be used to reduce to 1 byte
-  rounding, or no rounding (0 bytes rounded) for full precision.
-  
-  It's bytes rather than bits because it's tied in with the radix sort algorithm for sorting numerics which sorts byte by byte. With the
-  default rounding of 2 bytes, at most 6 passes are needed. With no rounding, at most 8 passes are needed and hence may be slower. The
-  choice of default is not for speed however, but to avoid surprising results such as in the example below.
+Computers cannot represent some floating point numbers (such as 0.6) 
+precisely in base 2. This leads to unexpected behaviour when joining or 
+grouping columns of type 'numeric'; i.e. 'double', see the example below. In 
+cases where this is undesirable, data.table allows rounding such data up to 
+approximately 11 s.f. which is plenty of digits for many cases. This is 
+achieved by rounding the last 2 bytes off the significand. Other possible 
+values are 1 byte rounding, or no rounding (full precision, default).
+ 
+It's bytes rather than bits because it's tied in with the radix sort 
+algorithm for sorting numerics which sorts byte by byte. With the default 
+rounding of 0 bytes, at most 8 passes are needed. With rounding of 2 bytes, at 
+most 6 passes are needed (and therefore might be a tad faster).
 
-  For large numbers (integers > 2^31), we recommend using \code{bit64::integer64} rather than setting rounding to \code{0}.
-  
-  If you're using \code{POSIXct} type column with \emph{millisecond} (or lower) resolution, you might want to consider setting \code{setNumericRounding(1)} . This'll become the default for \code{POSIXct} types in the future, instead of the default \code{2}.
-}
+For large numbers (integers > 2^31), we recommend using 
+\code{bit64::integer64}, even though the default is to round off 0 bytes (full 
+precision).
+ }
 \value{
-    \code{setNumericRounding} returns no value; the new value is applied. \code{getNumericRounding} returns the current value: 0, 1 or 2.
+\code{setNumericRounding} returns no value; the new value is applied. 
+\code{getNumericRounding} returns the current value: 0, 1 or 2.
 }
 \seealso{
+\code{\link{datatable-optimize}}\cr
 \url{http://en.wikipedia.org/wiki/Double-precision_floating-point_format}\cr
 \url{http://en.wikipedia.org/wiki/Floating_point}\cr
 \url{http://docs.oracle.com/cd/E19957-01/806-3568/ncg_goldberg.html}
@@ -38,22 +44,20 @@ getNumericRounding()
 \examples{
 DT = data.table(a=seq(0,1,by=0.2),b=1:2, key="a")
 DT
-setNumericRounding(0)   # turn off rounding
+setNumericRounding(0)   # By default, rounding is turned off
 DT[.(0.4)]   # works
-DT[.(0.6)]   # no match, confusing since 0.6 is clearly there in DT
+DT[.(0.6)]   # no match, can be confusing since 0.6 is clearly there in DT
+             # happens due to floating point representation limitations
 
-setNumericRounding(2)   # restore default
-DT[.(0.6)]   # works as expected
+setNumericRounding(2)   # round off last 2 bytes
+DT[.(0.6)]   # works
 
 # using type 'numeric' for integers > 2^31 (typically ids)
 DT = data.table(id = c(1234567890123, 1234567890124, 1234567890125), val=1:3)
 print(DT, digits=15)
-DT[,.N,by=id]   # 1 row
+DT[,.N,by=id]   # 1 row, (last 2 bytes rounded)
 setNumericRounding(0)
-DT[,.N,by=id]   # 3 rows
+DT[,.N,by=id]   # 3 rows, (no rounding, default)
 # better to use bit64::integer64 for such ids
-setNumericRounding(2)
 }
 \keyword{ data }
-
-
diff --git a/man/setattr.Rd b/man/setattr.Rd
index a6522f0..2c75146 100644
--- a/man/setattr.Rd
+++ b/man/setattr.Rd
@@ -14,7 +14,7 @@ setnames(x,old,new)
   \item{name}{ The character attribute name. }
   \item{value}{ The value to assign to the attribute or \code{NULL} removes the attribute, if present. }
   \item{old}{ When \code{new} is provided, character names or numeric positions of column names to change. When \code{new} is not provided, the new column names, which must be the same length as the number of columns. See examples. }
-  \item{new}{ Optional. New column names, the same length as \code{old}. } 
+  \item{new}{ Optional. New column names, must be the same length as the columns provided to the \code{old} argument. } 
 }
 \details{
 
@@ -49,6 +49,9 @@ setnames(DT,2:3,c("D","E"))        # multiple
 setnames(DT,c("a","E"),c("A","F")) # multiple by name (warning if either "a" or "E" is missing)
 setnames(DT,c("X","Y","Z"))        # replace all (length of names must be == ncol(DT))
 
+DT <- data.table(x = 1:3, y = 4:6, z = 7:9)
+setnames(DT, -2, c("a", "b"))      # NEW FR #1443, allows negative indices in 'old' argument
+
 DT = data.table(a=1:3, b=4:6)
 f = function(...) {
     # ...
diff --git a/man/setkey.Rd b/man/setkey.Rd
index 857603a..5116881 100644
--- a/man/setkey.Rd
+++ b/man/setkey.Rd
@@ -6,55 +6,110 @@
 \alias{haskey}
 \alias{set2key}
 \alias{set2keyv}
+\alias{setindex}
+\alias{setindexv}
 \alias{key2}
+\alias{indices}
 \title{ Create key on a data table }
 \description{
-  In \code{data.table} parlance, all \code{set*} functions change their input \emph{by reference}. That is, no copy is made at all, other than temporary working memory, which is as large as one column.. The only other \code{data.table} operator that modifies input by reference is \code{\link{:=}}. Check out the \code{See Also} section below for other \code{set*} function \code{data.table} provides.
+In \code{data.table} parlance, all \code{set*} functions change their input 
+\emph{by reference}. That is, no copy is made at all, other than temporary 
+working memory, which is as large as one column. The only other \code{data.table} 
+operator that modifies input by reference is \code{\link{:=}}. Check out the 
+\code{See Also} section below for other \code{set*} functions \code{data.table} 
+provides.
 
-  \code{setkey()} sorts a \code{data.table} and marks it as sorted (with an attribute \code{sorted}). The sorted columns are the key. The key can be any columns in any order. The columns are sorted in ascending order always. The table is changed \emph{by reference} and is therefore very memory efficient.
-  
-  \code{key()} returns the \code{data.table}'s key if it exists, and \code{NULL} if none exist.
-  
-  \code{haskey()} returns a logical \code{TRUE}/\code{FALSE} depending on whether the \code{data.table} has a key (or not).
+\code{setkey()} sorts a \code{data.table} and marks it as sorted (with an 
+attribute \code{sorted}). The sorted columns are the key. The key can be any 
+columns in any order. The columns are always sorted in ascending order. The table 
+is changed \emph{by reference} and is therefore very memory efficient.
 
+\code{key()} returns the \code{data.table}'s key if it exists, and \code{NULL} 
+if none exists.
+
+\code{haskey()} returns a logical \code{TRUE}/\code{FALSE} depending on whether 
+the \code{data.table} has a key (or not).
 }
 \usage{
 setkey(x, ..., verbose=getOption("datatable.verbose"), physical = TRUE)
 setkeyv(x, cols, verbose=getOption("datatable.verbose"), physical = TRUE)
-set2key(...)
-set2keyv(...)
+setindex(...)
+setindexv(...)
 key(x)
-key2(x)
+indices(x)
 haskey(x)
 key(x) <- value   #  DEPRECATED, please use setkey or setkeyv instead.
 }
 \arguments{
-  \item{x}{ A \code{data.table}. }
-  \item{\dots}{ The columns to sort by. Do not quote the column names. If \code{\dots} is missing (i.e. \code{setkey(DT)}), all the columns are used. \code{NULL} removes the key. }
-  \item{cols}{ A character vector (only) of column names. }
-  \item{value}{ In (deprecated) \code{key<-}, a character vector (only) of column names.}
-  \item{verbose}{ Output status and information. }
-  \item{physical}{ TRUE changes the order of the data in RAM. FALSE adds a secondary key a.k.a. index. }
+\item{x}{ A \code{data.table}. }
+\item{\dots}{ The columns to sort by. Do not quote the column names. If 
+\code{\dots} is missing (i.e. \code{setkey(DT)}), all the columns are used. 
+\code{NULL} removes the key. }
+\item{cols}{ A character vector (only) of column names. }
+\item{value}{ In (deprecated) \code{key<-}, a character vector (only) of column 
+names.}
+\item{verbose}{ Output status and information. }
+\item{physical}{ TRUE changes the order of the data in RAM. FALSE adds a 
+secondary key a.k.a. index. }
 }
 \details{
-  \code{setkey} reorders (or sorts) the rows of a data.table by the columns provided. In versions \code{1.9+}, for \code{integer} columns, a modified version of base's counting sort is implemented, which allows negative values as well. It is extremely fast, but is limited by the range of integer values being <= 1e5. If that fails, it falls back to a (fast) 4-pass radix sort for integers, implemented based on Pierre Terdiman's and Michael Herf's code (see links below). Similarly, a very f [...]
+\code{setkey} reorders (or sorts) the rows of a data.table by the columns 
+provided. In versions \code{1.9+}, for \code{integer} columns, a modified version 
+of base's counting sort is implemented, which allows negative values as well. It 
+is extremely fast, but is limited by the range of integer values being <= 1e5. If 
+that fails, it falls back to a (fast) 4-pass radix sort for integers, implemented 
+based on Pierre Terdiman's and Michael Herf's code (see links below). Similarly, 
+a very fast 6-pass radix order for columns of type \code{double} is also implemented. 
+This gives a speed-up of about 5-8x compared to \code{1.8.10} on \code{setkey} 
+and all internal \code{order}/\code{sort} operations. Fast radix sorting is also 
+implemented for \code{character} and \code{bit64::integer64} types.
+
+The sort is \emph{stable}; i.e., the order of ties (if any) is preserved, in both 
+versions - \code{<=1.8.10} and \code{>= 1.9.0}.
 
-  Note that columns of \code{numeric} types (i.e., \code{double}) have their last two bytes rounded off while computing order, by defalult, to avoid any unexpected behaviour due to limitations in representing floating point numbers precisely. Have a look at \code{\link{setNumericRounding}} to learn more.
+In \code{data.table} versions \code{<= 1.8.10}, for columns of type \code{integer}, 
+the sort is attempted with the very fast \code{"radix"} method in 
+\code{\link[base]{sort.list}}. If that fails, the sort reverts to the default 
+method in \code{\link[base]{order}}. For character vectors, \code{data.table} 
+takes advantage of R's internal global string cache and implements a very efficient 
+order, also exported as \code{\link{chorder}}.
 
-  The sort is \emph{stable}; i.e., the order of ties (if any) is preserved, in both versions - \code{<=1.8.10} and \code{>= 1.9.0}.
+In v1.7.8, the \code{key<-} syntax was deprecated. The \code{<-} method copies 
+the whole table and we know of no way to avoid that copy without a change in 
+\R itself. Please use the \code{set}* functions instead, which make no copy at 
+all. \code{setkey} accepts unquoted column names for convenience, whilst 
+\code{setkeyv} accepts one vector of column names.
 
-  In \code{data.table} versions \code{<= 1.8.10}, for columns of type \code{integer}, the sort is attempted with the very fast \code{"radix"} method in \code{\link[base]{sort.list}}. If that fails, the sort reverts to the default method in \code{\link[base]{order}}. For character vectors, \code{data.table} takes advantage of R's internal global string cache and implements a very efficient order, also exported as \code{\link{chorder}}.
+The problem (for \code{data.table}) with the copy by \code{key<-} (other than 
+being slower) is that \R doesn't maintain the over allocated truelength, but it 
+looks as though it has. Adding a column by reference using \code{:=} after a 
+\code{key<-} was therefore a memory overwrite and eventually a segfault; the 
+over allocated memory wasn't really there after \code{key<-}'s copy. \code{data.table}s 
+now have an attribute \code{.internal.selfref} to catch and warn about such copies. 
+This attribute has been implemented in a way that is friendly with 
+\code{identical()} and \code{object.size()}. 
 
-  In v1.7.8, the \code{key<-} syntax was deprecated. The \code{<-} method copies the whole table and we know of no way to avoid that copy without a change in \R itself. Please use the \code{set}* functions instead, which make no copy at all. \code{setkey} accepts unquoted column names for convenience, whilst \code{setkeyv} accepts one vector of column names.
-  
-  The problem (for \code{data.table}) with the copy by \code{key<-} (other than being slower) is that \R doesn't maintain the over allocated truelength, but it looks as though it has. Adding a column by reference using \code{:=} after a \code{key<-} was therefore a memory overwrite and eventually a segfault; the over allocated memory wasn't really there after \code{key<-}'s copy. \code{data.table}s now have an attribute \code{.internal.selfref} to catch and warn about such copies. This a [...]
+For the same reason, please use the other \code{set*} functions which modify 
+objects by reference, rather than using the \code{<-} operator which results 
+in copying the entire object. 
 
-  For the same reason, please use the other \code{set*} functions which modify objects by reference, rather than using the \code{<-} operator which results in copying the entire object. 
-  
-  It isn't good programming practice, in general, to use column numbers rather than names. This is why \code{setkey} and \code{setkeyv} only accept column names. If you use column numbers then bugs (possibly silent) can more easily creep into your code as time progresses if changes are made elsewhere in your code; e.g., if you add, remove or reorder columns in a few months time, a \code{setkey} by column number will then refer to a different column, possibly returning incorrect results w [...]
-  }
+It isn't good programming practice, in general, to use column numbers rather 
+than names. This is why \code{setkey} and \code{setkeyv} only accept column names. 
+If you use column numbers then bugs (possibly silent) can more easily creep into 
+your code as time progresses if changes are made elsewhere in your code; e.g., if 
+you add, remove or reorder columns in a few months time, a \code{setkey} by column 
+number will then refer to a different column, possibly returning incorrect results 
+with no warning. (A similar concept exists in SQL, where \code{"select * from ..."} 
+is considered poor programming style when a robust, maintainable system is 
+required.)  If you really wish to use column numbers, it's possible but 
+deliberately a little harder; e.g., \code{setkeyv(DT,colnames(DT)[1:2])}.
+}
 \value{
-    The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., \code{setkey(DT,a)[J("foo")]}. If you require a copy, take a copy first (using \code{DT2=copy(DT)}). \code{copy()} may also sometimes be useful before \code{:=} is used to subassign to a column by reference. See \code{?copy}.
+The input is modified by reference, and returned (invisibly) so it can be used 
+in compound statements; e.g., \code{setkey(DT,a)[J("foo")]}. If you require a 
+copy, take a copy first (using \code{DT2=copy(DT)}). \code{copy()} may also 
+sometimes be useful before \code{:=} is used to subassign to a column by 
+reference. See \code{?copy}.
 }
 \references{
 \url{http://en.wikipedia.org/wiki/Radix_sort}\cr
@@ -62,9 +117,22 @@ key(x) <- value   #  DEPRECATED, please use setkey or setkeyv instead.
 \url{http://cran.at.r-project.org/web/packages/bit/index.html}\cr
 \url{http://stereopsis.com/radix.html}
 }
-\note{ Despite its name, \code{base::sort.list(x,method="radix")} actually invokes a \emph{counting sort} in R, not a radix sort. See do_radixsort in src/main/sort.c. A counting sort, however, is particularly suitable for sorting integers and factors, and we like it. In fact we like it so much that \code{data.table} contains a counting sort algorithm for character vectors using R's internal global string cache. This is particularly fast for character vectors containing many duplicates, s [...]
+\note{ Despite its name, \code{base::sort.list(x,method="radix")} actually 
+invokes a \emph{counting sort} in R, not a radix sort. See do_radixsort in 
+src/main/sort.c. A counting sort, however, is particularly suitable for 
+sorting integers and factors, and we like it. In fact we like it so much 
+that \code{data.table} contains a counting sort algorithm for character vectors 
+using R's internal global string cache. This is particularly fast for character 
+vectors containing many duplicates, such as grouped data in a key column. This 
+means that character is often preferred to factor. Factors are still fully 
+supported, in particular ordered factors (where the levels are not in 
+alphabetic order).
 }
-\seealso{ \code{\link{data.table}}, \code{\link{tables}}, \code{\link{J}}, \code{\link[base]{sort.list}}, \code{\link{copy}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{set}} \code{\link{:=}}, \code{\link{setorder}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{chorder}}, \code{\link{setNumericRounding}}
+\seealso{ \code{\link{data.table}}, \code{\link{tables}}, \code{\link{J}}, 
+\code{\link[base]{sort.list}}, \code{\link{copy}}, \code{\link{setDT}}, 
+\code{\link{setDF}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setorder}}, 
+\code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, 
+\code{\link{chorder}}, \code{\link{setNumericRounding}}
 }
 \examples{
 # Type 'example(setkey)' to run these at prompt and browse output
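As a supplement, a brief sketch (assuming data.table is installed; the table and column names are ours) of \code{setkey}/\code{setkeyv} and the by-position idiom mentioned in Details:

```r
library(data.table)
DT <- data.table(b = c(3L, 1L, 2L), a = c("z", "x", "y"))
setkey(DT, b)                   # sorts DT by 'b' in place and marks it sorted
key(DT)                         # "b"
haskey(DT)                      # TRUE
setkeyv(DT, colnames(DT)[1:2])  # keying by position, made explicit via names
key(DT)                         # "b" "a"
```

Going through \code{colnames(DT)} keeps the column-number route explicit, which is exactly the deliberate friction the Details section describes.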
diff --git a/man/setops.Rd b/man/setops.Rd
new file mode 100644
index 0000000..d590ec8
--- /dev/null
+++ b/man/setops.Rd
@@ -0,0 +1,59 @@
+\name{setops}
+\alias{setops}
+\alias{intersect}
+\alias{fintersect}
+\alias{setdiff}
+\alias{fsetdiff}
+\alias{except}
+\alias{fexcept}
+\alias{union}
+\alias{funion}
+\alias{setequal}
+\alias{fsetequal}
+\title{ Set operations for data tables }
+\description{
+  Similar to base's set functions, \code{union}, \code{intersect}, \code{setdiff} and \code{setequal}, but for \code{data.table}s. The additional \code{all} argument controls if/how duplicate rows are returned. \code{bit64::integer64} is also supported.
+
+  Unlike SQL, these data.table functions retain the order of rows in the result.
+}
+\usage{
+fintersect(x, y, all = FALSE)
+fsetdiff(x, y, all = FALSE)
+funion(x, y, all = FALSE)
+fsetequal(x, y)
+}
+\arguments{
+	\item{x,y}{\code{data.table}s.}
+	\item{all}{Logical. Default is \code{FALSE} and removes duplicate rows from the result. When \code{TRUE}, if there are \code{xn} copies of a particular row in \code{x} and \code{yn} copies of the same row in \code{y}, then:
+		\itemize{
+
+			\item{\code{fintersect} will return \code{min(xn, yn)} copies of that row.}
+
+			\item{\code{fsetdiff} will return \code{max(0, xn-yn)} copies of that row.}
+
+			\item{\code{funion} will return \code{xn+yn} copies of that row.}}
+		}
+}
+\details{
+  Columns of type \code{complex} and \code{list} are not supported except for \code{funion}.
+}
+\value{
+    A data.table in case of \code{fintersect}, \code{funion} and \code{fsetdiff}. Logical \code{TRUE} or \code{FALSE} for \code{fsetequal}.
+}
+\seealso{ \code{\link{data.table}}, \code{\link{rbindlist}}, \code{\link{all.equal.data.table}}, \code{\link{unique}}, \code{\link{duplicated}}, \code{\link{uniqueN}}, \code{\link{anyDuplicated}}
+}
+\references{
+\url{https://db.apache.org/derby/papers/Intersect-design.html}
+}
+\examples{
+x = data.table(c(1,2,2,2,3,4,4))
+y = data.table(c(2,3,4,4,4,5))
+fintersect(x, y)            # intersect
+fintersect(x, y, all=TRUE)  # intersect all
+fsetdiff(x, y)              # except
+fsetdiff(x, y, all=TRUE)    # except all
+funion(x, y)                # union
+funion(x, y, all=TRUE)      # union all
+fsetequal(x, y)             # setequal
+}
+\keyword{ data }
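The \code{all=TRUE} counting rules above can be checked by hand. A base-R sketch (vector names \code{xs}/\code{ys} are ours) computing the expected row counts for the value 4 from the example data:

```r
# Expected all=TRUE row counts for one particular row value, following the
# documented rules: min(xn, yn), max(0, xn - yn) and xn + yn.
xs <- c(1, 2, 2, 2, 3, 4, 4)    # x's column from the example
ys <- c(2, 3, 4, 4, 4, 5)       # y's column from the example
xn <- sum(xs == 4)              # 2 copies of 4 in x
yn <- sum(ys == 4)              # 3 copies of 4 in y
c(fintersect = min(xn, yn),     # 2 rows of value 4 in fintersect(x, y, all=TRUE)
  fsetdiff   = max(0, xn - yn), # 0 rows of value 4 in fsetdiff(x, y, all=TRUE)
  funion     = xn + yn)         # 5 rows of value 4 in funion(x, y, all=TRUE)
```

This mirrors the SQL-style INTERSECT ALL / EXCEPT ALL / UNION ALL semantics referenced in the Derby design paper linked above.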
diff --git a/man/setorder.Rd b/man/setorder.Rd
index b133919..7e2762f 100644
--- a/man/setorder.Rd
+++ b/man/setorder.Rd
@@ -7,13 +7,23 @@
 
 \title{Fast row reordering of a data.table by reference}
 \description{
-  In \code{data.table} parlance, all \code{set*} functions change their input \emph{by reference}. That is, no copy is made at all, other than temporary working memory, which is as large as one column.. The only other \code{data.table} operator that modifies input by reference is \code{\link{:=}}. Check out the \code{See Also} section below for other \code{set*} function \code{data.table} provides.
-  
-  \code{setorder} (and \code{setorderv}) reorders the rows of a \code{data.table} based on the columns (and column order) provided. It reorders the table \emph{by reference} and is therefore very memory efficient.
-  
-  Also \code{x[order(.)]} is now optimised internally to use data.table's fast order by default. data.table always reorders in C-locale. To sort by session locale, use \code{x[base::order(.)]} instead.
-  
-  \code{bit64::integer64} type is also supported for reordering rows of a \code{data.table}.
+In \code{data.table} parlance, all \code{set*} functions change their input 
+\emph{by reference}. That is, no copy is made at all, other than temporary 
+working memory, which is as large as one column. The only other 
+\code{data.table} operator that modifies input by reference is \code{\link{:=}}. 
+Check out the \code{See Also} section below for other \code{set*} functions 
+\code{data.table} provides.
+
+\code{setorder} (and \code{setorderv}) reorders the rows of a \code{data.table} 
+based on the columns (and column order) provided. It reorders the table 
+\emph{by reference} and is therefore very memory efficient.
+
+Also \code{x[order(.)]} is now optimised internally to use data.table's fast 
+order by default. data.table always reorders in C-locale. To sort by session 
+locale, use \code{x[base::order(.)]} instead.
+
+\code{bit64::integer64} type is also supported for reordering rows of a 
+\code{data.table}.
 }
 
 \usage{
@@ -23,27 +33,60 @@ setorderv(x, cols, order=1L, na.last=FALSE)
 # x[order(., na.last=TRUE)]
 }
 \arguments{
-  \item{x}{ A \code{data.table}. }
-  \item{...}{ The columns to sort by. Do not quote column names. If \code{...} is missing (ex: \code{setorder(x)}), \code{x} is rearranged based on all columns in ascending order by default. To sort by a column in descending order prefix a \code{"-"}, i.e., \code{setorder(x, a, -b, c)}. The \code{-b} works when \code{b} is of type \code{character} as well. }
-  \item{cols}{ A character vector of column names of \code{x}, to which to order by. Do not add \code{"-"} here. Use \code{order} argument instead.}
-  \item{order}{ An integer vector with only possible values of \code{1} and \code{-1}, corresponding to ascending and descending order. The length of \code{order} must be either \code{1} or equal to that of \code{cols}. If \code{length(order) == 1}, it's recycled to \code{length(cols)}. }
-  \item{na.last}{logical. If \code{TRUE}, missing values in the data are placed last; if \code{FALSE}, they are placed first; if \code{NA} they are removed. \code{na.last=NA} is valid only for \code{x[order(., na.last)]} and it's default is \code{TRUE}. \code{setorder} and \code{setorderv} only accept TRUE/FALSE with default \code{FALSE}.}
+\item{x}{ A \code{data.table}. }
+\item{...}{ The columns to sort by. Do not quote column names. If \code{...} 
+is missing (e.g., \code{setorder(x)}), \code{x} is rearranged based on all 
+columns in ascending order by default. To sort by a column in descending order 
+prefix a \code{"-"}, i.e., \code{setorder(x, a, -b, c)}. The \code{-b} works 
+when \code{b} is of type \code{character} as well. }
+\item{cols}{ A character vector of column names of \code{x}, to which to order 
+by. Do not add \code{"-"} here. Use \code{order} argument instead.}
+\item{order}{ An integer vector with only possible values of \code{1} and 
+\code{-1}, corresponding to ascending and descending order. The length of 
+\code{order} must be either \code{1} or equal to that of \code{cols}. If 
+\code{length(order) == 1}, it's recycled to \code{length(cols)}. }
+\item{na.last}{logical. If \code{TRUE}, missing values in the data are placed 
+last; if \code{FALSE}, they are placed first; if \code{NA} they are removed. 
+\code{na.last=NA} is valid only for \code{x[order(., na.last)]} and its 
+default is \code{TRUE}. \code{setorder} and \code{setorderv} only accept 
+TRUE/FALSE with default \code{FALSE}.}
 }
 \details{
-  \code{data.table} implements fast radix based ordering. In versions <= 1.9.2, it was only capable of increasing order (ascending). From 1.9.4 on, the functionality has been extended to decreasing order (descending) as well. Columns of \code{numeric} types (i.e., \code{double}) have their last two bytes rounded off while computing order, by defalult, to avoid any unexpected behaviour due to limitations in representing floating point numbers precisely. Have a look at \code{\link{setNumer [...]
+\code{data.table} implements fast radix based ordering. In versions <= 1.9.2, 
+it was only capable of increasing order (ascending). From 1.9.4 on, the 
+functionality has been extended to decreasing order (descending) as well. 
 
-  \code{setorder} accepts unquoted column names (with names preceded with a \code{-} sign for descending order) and reorders data.table rows \emph{by reference}, for e.g., \code{setorder(x, a, -b, c)}. Note that \code{-b} also works with columns of type \code{character} unlike \code{base::order}, which requires \code{-xtfrm(y)} instead (which is slow). \code{setorderv} in turn accepts a  character vector of column names and an integer vector of column order separately.
+\code{setorder} accepts unquoted column names (with names preceded with a 
+\code{-} sign for descending order) and reorders data.table rows 
+\emph{by reference}, e.g., \code{setorder(x, a, -b, c)}. Note that 
+\code{-b} also works with columns of type \code{character} unlike 
+\code{base::order}, which requires \code{-xtfrm(y)} instead (which is slow). 
+\code{setorderv} in turn accepts a character vector of column names and an 
+integer vector of column order separately.
 
-  Note that \code{\link{setkey}} still requires and will always sort only in ascending order, and is different from \code{setorder} in that it additionally sets the \code{sorted} attribute. 
-  
-  \code{na.last} argument, by default, is \code{FALSE} for \code{setorder} and \code{setorderv} to be consistent with \code{data.table}'s \code{setkey} and is \code{TRUE} for \code{x[order(.)]} to be consistent with \code{base::order}. Only \code{x[order(.)]} can have \code{na.last = NA} as it's a subset operation as opposed to \code{setorder} or \code{setorderv} which reorders the data.table by reference.
-  
-  If \code{setorder} results in reordering of the rows of a keyed \code{data.table}, then it's key will be set to \code{NULL}.
-  }
+Note that \code{\link{setkey}} still requires and will always sort only in 
+ascending order, and is different from \code{setorder} in that it additionally 
+sets the \code{sorted} attribute. 
+
+\code{na.last} argument, by default, is \code{FALSE} for \code{setorder} and 
+\code{setorderv} to be consistent with \code{data.table}'s \code{setkey} and 
+is \code{TRUE} for \code{x[order(.)]} to be consistent with \code{base::order}. 
+Only \code{x[order(.)]} can have \code{na.last = NA} as it's a subset operation 
+as opposed to \code{setorder} or \code{setorderv} which reorders the data.table 
+by reference.
+
+If \code{setorder} results in reordering of the rows of a keyed \code{data.table}, 
+then its key will be set to \code{NULL}.
+}
 \value{
-    The input is modified by reference, and returned (invisibly) so it can be used in compound statements; e.g., \code{setorder(DT,a,-b)[, cumsum(c), by=list(a,b)]}. If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See \code{?copy}.
+The input is modified by reference, and returned (invisibly) so it can be used 
+in compound statements; e.g., \code{setorder(DT,a,-b)[, cumsum(c), by=list(a,b)]}. 
+If you require a copy, take a copy first (using \code{DT2 = copy(DT)}). See 
+\code{?copy}.
 }
-\seealso{ \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, \code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setDT}}, \code{\link{setDF}}, \code{\link{copy}}, \code{\link{setNumericRounding}}
+\seealso{ \code{\link{setkey}}, \code{\link{setcolorder}}, \code{\link{setattr}}, 
+\code{\link{setnames}}, \code{\link{set}}, \code{\link{:=}}, \code{\link{setDT}}, 
+\code{\link{setDF}}, \code{\link{copy}}, \code{\link{setNumericRounding}}
 }
 \examples{
 
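A short sketch (assuming data.table is installed; table and column names are ours) of the descending-order prefix described in Details, including on a character column:

```r
library(data.table)
DT <- data.table(a = c(2L, 1L, 2L), b = c("x", "y", "z"))
setorder(DT, a, -b)   # ascending 'a', descending 'b'; reorders by reference
DT                    # rows: (1,"y"), (2,"z"), (2,"x")
```

Note \code{-b} works here even though \code{b} is character, which \code{base::order} would only support via the slower \code{-xtfrm(b)}.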
diff --git a/man/shift.Rd b/man/shift.Rd
index 31f37d0..9f0546b 100644
--- a/man/shift.Rd
+++ b/man/shift.Rd
@@ -14,15 +14,15 @@ shift(x, n=1L, fill=NA, type=c("lag", "lead"), give.names=FALSE)
 }
 \arguments{
   \item{x}{ A vector, list, data.frame or data.table. }
-  \item{n}{ Non-negative integer vector providing the periods to lead/lag by. To create multiple lead/lag vectors, provide multiple values to \code{n}. }
+  \item{n}{ Non-negative integer vector denoting the offset to lead or lag the input by. To create multiple lead/lag vectors, provide multiple values to \code{n}. }
   \item{fill}{ Value to pad by. }
   \item{type}{ default is \code{"lag"}. The other possible value is \code{"lead"}. }
   \item{give.names}{default is \code{FALSE} which returns an unnamed list. When \code{TRUE}, names are automatically generated corresponding to \code{type} and \code{n}. }
 }
 \details{
-  \code{shift} accepts vectors, lists, data.frames or data.tables. It always returns a list except when the input is a \code{vector} and \code{length(n) == 1} in which case a \code{vector} is returned, for convenience. This is so that it can be used conveniently within data.table's syntax. For example, \code{DT[, (cols) := shift(.SD, 1L), by=id]} would lag every column of \code{.SD} by 1 period for each group and \code{DT[, newcol := colA + shift(colB)]} would assign the sum of two \emph [...]
+  \code{shift} accepts vectors, lists, data.frames or data.tables. It always returns a list except when the input is a \code{vector} and \code{length(n) == 1} in which case a \code{vector} is returned, for convenience. This is so that it can be used conveniently within data.table's syntax. For example, \code{DT[, (cols) := shift(.SD, 1L), by=id]} would lag every column of \code{.SD} by 1 for each group and \code{DT[, newcol := colA + shift(colB)]} would assign the sum of two \emph{vector [...]
 
-  Argument \code{n} allows multiple values. For example, \code{DT[, (cols) := shift(.SD, 1:2), by=id]} would lag every column of \code{.SD} by \code{1} and \code{2} periods for each group. If \code{.SD} contained four columns, the first two elements of the list would correspond to \code{lag=1} and \code{lag=2} for the first column of \code{.SD}, the next two for second column of \code{.SD} and so on. Please see examples for more.
+  Argument \code{n} allows multiple values. For example, \code{DT[, (cols) := shift(.SD, 1:2), by=id]} would lag every column of \code{.SD} by \code{1} and \code{2} for each group. If \code{.SD} contained four columns, the first two elements of the list would correspond to \code{lag=1} and \code{lag=2} for the first column of \code{.SD}, the next two for second column of \code{.SD} and so on. Please see examples for more.
 
   \code{shift} is designed mainly for use in data.tables along with \code{:=} or \code{set}. Therefore, it returns an unnamed list by default as assigning names for each group over and over can be quite time consuming with many groups. It may be useful to set names automatically in other cases, which can be done by setting \code{give.names} to \code{TRUE}.
 }
@@ -33,9 +33,9 @@ shift(x, n=1L, fill=NA, type=c("lag", "lead"), give.names=FALSE)
 \examples{
 # on vectors, returns a vector as long as length(n) == 1, #1127
 x = 1:5
-# lag with period=1 and pad with NA (returns vector)
+# lag with n=1 and pad with NA (returns vector)
 shift(x, n=1, fill=NA, type="lag")
-# lag with period=1 and 2, and pad with 0 (returns list)
+# lag with n=1 and 2, and pad with 0 (returns list)
 shift(x, n=1:2, fill=0, type="lag")
 
 # on data.tables
diff --git a/man/shouldPrint.Rd b/man/shouldPrint.Rd
new file mode 100644
index 0000000..80851f5
--- /dev/null
+++ b/man/shouldPrint.Rd
@@ -0,0 +1,25 @@
+\name{shouldPrint}
+\alias{shouldPrint}
+\title{ For use by packages that mimic/divert auto printing e.g. IRkernel and knitr }
+\description{
+  Not for use by users. Exported only for use by IRkernel (Jupyter) and knitr.
+}
+\usage{
+  shouldPrint(x)
+}
+\arguments{
+  \item{x}{ A \code{data.table}. }
+}
+\details{
+  Should IRkernel/Jupyter print a data.table returned invisibly by DT[,:=]?
+  This is a read-once function since it resets an internal flag. If you need the value more than once in your logic, store the value from the first call.
+}
+\value{
+  TRUE or FALSE.
+}
+\references{
+  \url{https://github.com/IRkernel/IRkernel/issues/127}\cr
+  \url{https://github.com/Rdatatable/data.table/issues/933}\cr
+}
+
+
diff --git a/man/special-symbols.Rd b/man/special-symbols.Rd
new file mode 100644
index 0000000..4d5adf1
--- /dev/null
+++ b/man/special-symbols.Rd
@@ -0,0 +1,50 @@
+\name{special-symbols}
+\alias{special-symbols}
+\alias{datatable-symbols}
+\alias{.SD}
+\alias{.I}
+\alias{.GRP}
+\alias{.BY}
+\alias{.N}
+\title{ Special symbols }
+\description{
+    \code{.SD}, \code{.BY}, \code{.N}, \code{.I} and \code{.GRP} are \emph{read-only} symbols for use in \code{j}. \code{.N} can be used in \code{i} as well. See the vignettes and examples here and in \code{\link{data.table}}. 
+}
+\details{
+    The bindings of these variables are locked and attempting to assign to them will generate an error. If you wish to manipulate \code{.SD} before returning it, take a \code{copy(.SD)} first (see FAQ 4.5). Using \code{:=} in the \code{j} of \code{.SD} is reserved for future use as a (tortuously) flexible way to update \code{DT} by reference by group (even when groups are not contiguous in an ad hoc by).
+
+    These symbols are used in \code{j} and defined as follows.
+
+    \itemize{
+        \item{\code{.SD} is a \code{data.table} containing the \bold{S}ubset of \code{x}'s \bold{D}ata for each group, excluding any columns used in \code{by} (or \code{keyby}).}
+        \item{\code{.BY} is a \code{list} containing a length 1 vector for each item in \code{by}. This can be useful when \code{by} is not known in advance. The \code{by} variables are also available to \code{j} directly by name; useful for example for titles of graphs if \code{j} is a plot command, or to branch with \code{if()} depending on the value of a group variable.}
+        \item{\code{.N} is an integer, length 1, containing the number of rows in the group. This may be useful when the column names are not known in advance and for convenience generally. When grouping by \code{i}, \code{.N} is the number of rows in \code{x} matched to, for each row of \code{i}, regardless of whether \code{nomatch} is \code{NA} or \code{0}. It is renamed to \code{N} (no dot) in the result (otherwise a column called \code{".N"} could conflict with the \code{.N} variable [...]
+        \item{\code{.I} is an integer vector equal to \code{seq_len(nrow(x))}. While grouping, it holds, for each item in the group, its row location in \code{x}. This is useful to subset in \code{j}; e.g. \code{DT[, .I[which.max(somecol)], by=grp]}.}
+        \item{\code{.GRP} is an integer, length 1, containing a simple group counter. 1 for the 1st group, 2 for the 2nd, etc.}
+    }
+}
+\seealso{
+    \code{\link{data.table}}, \code{\link{:=}}, \code{\link{set}}, \code{\link{datatable-optimize}}
+}
+\examples{
+\dontrun{
+DT = data.table(x=rep(c("b","a","c"),each=3), v=c(1,1,1,2,2,1,1,2,2), y=c(1,3,6), a=1:9, b=9:1)
+DT
+X = data.table(x=c("c","b"), v=8:7, foo=c(4,2))
+X
+
+DT[.N]                                 # last row, only special symbol allowed in 'i'
+DT[, .N]                               # total number of rows in DT
+DT[, .N, by=x]                         # number of rows in each group
+DT[, .SD, .SDcols=x:y]                 # select columns 'x' and 'y'
+DT[, .SD[1]]                           # first row of all columns
+DT[, .SD[1], by=x]                     # first row of 'y' and 'v' for each group in 'x'
+DT[, c(.N, lapply(.SD, sum)), by=x]    # get rows *and* sum columns 'v' and 'y' by group
+DT[, .I[1], by=x]                      # row number in DT corresponding to each group
+DT[, .N, by=rleid(v)]                  # get count of consecutive runs of 'v'
+DT[, c(.(y=max(y)), lapply(.SD, min)), 
+        by=rleid(v), .SDcols=v:b]      # compute 'j' for each consecutive run of 'v'
+DT[, grp := .GRP, by=x]                # add a group counter
+X[, DT[.BY, y, on="x"], by=x]          # join within each group
+}}
+\keyword{ data }
diff --git a/man/split.Rd b/man/split.Rd
new file mode 100644
index 0000000..0dd2445
--- /dev/null
+++ b/man/split.Rd
@@ -0,0 +1,76 @@
+\name{split}
+\alias{split}
+\alias{split.data.table}
+\title{ Split data.table into chunks in a list }
+\description{
+  Split method for data.table. Faster and more flexible than the data.frame method. Be aware that processing a list of data.tables will generally be much slower than manipulating a single data.table by group using the \code{by} argument; read more on \code{\link{data.table}}.
+}
+\usage{
+\method{split}{data.table}(x, f, drop = FALSE,
+      by, sorted = FALSE, keep.by = TRUE, flatten = TRUE, 
+      ..., verbose = getOption("datatable.verbose"))
+}
+\arguments{
+  \item{x}{data.table }
+  \item{f}{factor or list of factors. Same as \code{\link[base]{split.data.frame}}. Prefer the \code{by} argument; \code{f} is provided only for consistency with the data.frame method.}
+  \item{drop}{logical. Default \code{FALSE} will not drop empty list elements caused by factor levels not referred to by those factors. Also works with the new arguments of the split data.table method.}
+  \item{by}{character vector. Column names on which the split should be made. For \code{length(by) > 1L} and \code{flatten} FALSE it results in nested lists with data.tables as the leaves.}
+  \item{sorted}{The default \code{FALSE} retains the order of the groups being split on; when \code{TRUE}, sorted list(s) are returned. Has no effect when the \code{f} argument is used.}
+  \item{keep.by}{logical, default \code{TRUE}. Keep the columns provided to the \code{by} argument.}
+  \item{flatten}{logical, default \code{TRUE}, will unlist nested lists of data.tables. When using \code{f}, results are always flattened to a list of data.tables.}
+  \item{\dots}{passed to the data.frame method when the \code{f} argument is used.}
+  \item{verbose}{logical, default \code{FALSE}. When \code{TRUE} it prints to the console the data.table query used to split the data.}
+}
+\details{
+    Argument \code{f} is provided only for consistency with the data.frame method. Using the \code{by} argument instead is recommended: it is faster, more flexible, and by default preserves the order of groups as they appear in the data.
+}
+\value{
+    A list of \code{data.table}s. If \code{flatten} is FALSE and \code{length(by) > 1L}, a recursively nested list with \code{data.table}s as the leaves, grouped according to the \code{by} argument.
+}
+\seealso{ \code{\link{data.table}}, \code{\link{rbindlist}} }
+\examples{
+set.seed(123)
+dt = data.table(x1 = rep(letters[1:2], 6), 
+                x2 = rep(letters[3:5], 4), 
+                x3 = rep(letters[5:8], 3), 
+                y = rnorm(12))
+dt = dt[sample(.N)]
+df = as.data.frame(dt)
+
+# split consistency with data.frame: `x, f, drop`
+all.equal(
+    split(dt, list(dt$x1, dt$x2)),
+    lapply(split(df, list(df$x1, df$x2)), setDT)
+)
+
+# nested list using `flatten` arguments
+split(dt, by=c("x1", "x2"))
+split(dt, by=c("x1", "x2"), flatten=FALSE)
+
+# dealing with factors
+fdt = dt[, c(lapply(.SD, as.factor), list(y=y)), .SDcols=x1:x3]
+fdf = as.data.frame(fdt)
+sdf = split(fdf, list(fdf$x1, fdf$x2))
+all.equal(
+    split(fdt, by=c("x1", "x2"), sorted=TRUE),
+    lapply(sdf[sort(names(sdf))], setDT)
+)
+
+# factors having unused levels, drop FALSE, TRUE
+fdt = dt[, .(x1 = as.factor(c(as.character(x1), "c"))[-13L],
+             x2 = as.factor(c("a", as.character(x2)))[-1L],
+             x3 = as.factor(c("a", as.character(x3), "z"))[c(-1L,-14L)],
+             y = y)]
+fdf = as.data.frame(fdt)
+sdf = split(fdf, list(fdf$x1, fdf$x2))
+all.equal(
+    split(fdt, by=c("x1", "x2"), sorted=TRUE),
+    lapply(sdf[sort(names(sdf))], setDT)
+)
+sdf = split(fdf, list(fdf$x1, fdf$x2), drop=TRUE)
+all.equal(
+    split(fdt, by=c("x1", "x2"), sorted=TRUE, drop=TRUE),
+    lapply(sdf[sort(names(sdf))], setDT)
+)
+}
+\keyword{ data }
diff --git a/man/test.data.table.Rd b/man/test.data.table.Rd
index 9d383ef..60ef51c 100644
--- a/man/test.data.table.Rd
+++ b/man/test.data.table.Rd
@@ -5,17 +5,18 @@
   Runs a set of tests to check data.table is working correctly.
 }
 \usage{
-test.data.table(verbose=FALSE, pkg="pkg")
+test.data.table(verbose=FALSE, pkg="pkg", silent=FALSE)
 }
 \arguments{
 \item{verbose}{ If TRUE sets datatable.verbose to TRUE for the duration of the tests. }
 \item{pkg}{Root directory name under which all package content (ex: DESCRIPTION, src/, R/, inst/ etc..) resides.}
+\item{silent}{Logical, default FALSE; when TRUE it will not raise an error if a test fails.}
 }
 \details{
    Runs a series of tests. These can be used to see features and examples of usage, too. Running test.data.table will tell you the full location of the test file(s) to open.
 }
 \value{
-    TRUE if all tests were successful. FALSE otherwise.
+When \code{silent} is TRUE, it returns TRUE if all tests were successful and FALSE otherwise. When \code{silent} is FALSE, it returns TRUE if all tests were successful and raises an error otherwise.
 }
 \seealso{ \code{\link{data.table}} }
 \examples{
diff --git a/man/truelength.Rd b/man/truelength.Rd
index 04312d8..5ac64e4 100644
--- a/man/truelength.Rd
+++ b/man/truelength.Rd
@@ -8,24 +8,24 @@
 \usage{
 truelength(x)
 alloc.col(DT,
-    n = getOption("datatable.alloccol"),        # default: quote(max(100L,ncol(DT)+64L))
+    n = getOption("datatable.alloccol"),        # default: 1024L
     verbose = getOption("datatable.verbose"))   # default: FALSE
 }
 \arguments{
 \item{x}{ Any type of vector, including \code{data.table} which is a \code{list} vector of column pointers. }
 \item{DT}{ A \code{data.table}. }
-\item{n}{ The number of column pointer slots to reserve in memory, including existing columns. May be a numeric, or a quote()-ed expression (see default). If \code{DT} is a 10 column \code{data.table}, \code{n=1000} means grow the spare slots from 90 to 990, assuming the default of 100 has not been changed. }
+\item{n}{ The number of spare column pointer slots to ensure are available. If \code{DT} is a 1,000 column \code{data.table} with 24 spare slots remaining, \code{n=1024L} means grow the 24 spare slots to be 1024. \code{truelength(DT)} will then be 2024 in this example. }
 \item{verbose}{ Output status and information. }
 }
 \details{
     When adding columns by reference using \code{:=}, we \emph{could} simply create a new column list vector (one longer) and memcpy over the old vector, with no copy of the column vectors themselves. That requires negligible use of space and time, and is what v1.7.2 did.  However, that copy of the list vector of column pointers only (but not the columns themselves), a \emph{shallow copy}, resulted in inconsistent behaviour in some circumstances. So, as from v1.7.3 data.table over allocat [...]
 
-    When the allocated column pointer slots are used up, to add a new column \code{data.table} must reallocate that vector. If two or more variables are bound to the same data.table this shallow copy may or may not be desirable, but we don't think this will be a problem very often (more discussion may be required on datatable-help). Setting \code{options(datatable.verbose=TRUE)} includes messages if and when a shallow copy is taken. To avoid shallow copies there are several options: use  [...]
+    When the allocated column pointer slots are used up, to add a new column \code{data.table} must reallocate that vector. If two or more variables are bound to the same data.table this shallow copy may or may not be desirable, but we don't think this will be a problem very often (more discussion may be required on datatable-help). Setting \code{options(datatable.verbose=TRUE)} includes messages if and when a shallow copy is taken. To avoid shallow copies there are several options: use  [...]
 
     Please note : over allocation of the column pointer vector is not for efficiency per se. It's so that \code{:=} can add columns by reference without a shallow copy.
 }
 \value{
-    \code{truelength(x)} returns the length of the vector allocated in memory. \code{length(x)} of those items are in use. Currently, it's just the list vector of column pointers that is over-allocated (i.e. \code{truelength(DT)}), not the column vectors themselves, which would in future allow fast row \code{insert()}. For tables loaded from disk however, \code{truelength} is 0 in \R 2.14.0 and random in \R <= 2.13.2; i.e., in both cases perhaps unexpected. \code{data.table} detects this [...]
+    \code{truelength(x)} returns the length of the vector allocated in memory. \code{length(x)} of those items are in use. Currently, it's just the list vector of column pointers that is over-allocated (i.e. \code{truelength(DT)}), not the column vectors themselves, which would in future allow fast row \code{insert()}. For tables loaded from disk however, \code{truelength} is 0 in \R 2.14.0+ (and random in \R <= 2.13.2), which is perhaps unexpected. \code{data.table} detects this state a [...]
     
    \code{alloc.col} \emph{reallocates} \code{DT} by reference. This may be useful for efficiency if you know you are about to add a lot of columns in a loop. It also returns the new \code{DT}, for convenience in compound queries.
 }
@@ -33,12 +33,12 @@ alloc.col(DT,
 \examples{
 DT = data.table(a=1:3,b=4:6)
 length(DT)                 # 2 column pointer slots used
-truelength(DT)             # 100 column pointer slots allocated
-alloc.col(DT,200)
+truelength(DT)             # 1026 column pointer slots allocated
+alloc.col(DT,2048)
 length(DT)                 # 2 used
-truelength(DT)             # 200 allocated, 198 free
+truelength(DT)             # 2050 allocated, 2048 free
 DT[,c:=7L]                 # add new column by assigning to spare slot
-truelength(DT)-length(DT)  # 197 slots spare
+truelength(DT)-length(DT)  # 2047 slots spare
 }
 \keyword{ data }
 
diff --git a/man/tstrsplit.Rd b/man/tstrsplit.Rd
index 7944cc6..58bf6da 100644
--- a/man/tstrsplit.Rd
+++ b/man/tstrsplit.Rd
@@ -8,19 +8,23 @@
 }
 
 \usage{
-tstrsplit(x, ..., fill=NA, type.convert=FALSE)
+tstrsplit(x, ..., fill=NA, type.convert=FALSE, keep, names=FALSE)
 }
 \arguments{
 	\item{x}{The vector to split (and transpose).}
   \item{...}{ All the arguments to be passed to \code{strsplit}. }
   \item{fill}{ Default is \code{NA}. It is used to fill shorter list elements so as to return each element of the transposed result of equal lengths. }
   \item{type.convert}{\code{TRUE} calls \code{\link{type.convert}} with \code{as.is=TRUE} on the columns.}
+  \item{keep}{Specify indices corresponding to just those list elements to retain in the transposed result. Default is to return all.}
+  \item{names}{\code{TRUE} auto names the list with \code{V1, V2} etc. Default (\code{FALSE}) is to return an unnamed list.}
 }
 \details{
-  It internally calls \code{strsplit} first, and then \code{\link{transpose}} on the result.
+  It internally calls \code{strsplit} first, and then \code{\link{transpose}} on the result. 
+
+  The \code{names} argument can be used to return an auto-named list, although it has no effect when used with \code{:=}, which requires names to be provided explicitly. It might be useful in other scenarios.
 }
 \value{
-  A transposed list.
+  A transposed list after splitting by the pattern provided.
 }
 
 \examples{
@@ -29,7 +33,14 @@ strsplit(x, "", fixed=TRUE)
 tstrsplit(x, "", fixed=TRUE)
 tstrsplit(x, "", fixed=TRUE, fill="<NA>")
 
+# using keep to return just 1,3,5
+tstrsplit(x, "", fixed=TRUE, keep=c(1,3,5))
+
+# names argument
+tstrsplit(x, "", fixed=TRUE, keep=c(1,3,5), names=LETTERS[1:3])
+
 DT = data.table(x=c("A/B", "A", "B"), y=1:3)
+DT[, c("c1") := tstrsplit(x, "/", fixed=TRUE, keep=1L)][]
 DT[, c("c1", "c2") := tstrsplit(x, "/", fixed=TRUE)][]
 }
 \seealso{
diff --git a/src/Makevars b/src/Makevars
index df3b51e..5162e62 100644
--- a/src/Makevars
+++ b/src/Makevars
@@ -1,5 +1,9 @@
 
+PKG_CFLAGS = $(SHLIB_OPENMP_CFLAGS)
+PKG_LIBS = $(SHLIB_OPENMP_CFLAGS)
+
 all: $(SHLIB)
 	mv $(SHLIB) datatable$(SHLIB_EXT)
+	if [ "$(OS)" != "Windows_NT" ] && [ `uname -s` = 'Darwin' ]; then install_name_tool -id datatable$(SHLIB_EXT) datatable$(SHLIB_EXT); fi
 
 
diff --git a/src/assign.c b/src/assign.c
index c9688cf..a1fb47d 100644
--- a/src/assign.c
+++ b/src/assign.c
@@ -213,17 +213,29 @@ SEXP alloccol(SEXP dt, R_len_t n, Rboolean verbose)
     tl = TRUELENGTH(dt);
     if (tl<0) error("Internal error, tl of class is marked but tl<0.");  // R <= 2.13.2 and we didn't catch uninitialized tl somehow
     if (tl>0 && tl<l) error("Internal error, please report (including result of sessionInfo()) to datatable-help: tl (%d) < l (%d) but tl of class is marked.", tl, l);
-    if (tl>l+1000) warning("tl (%d) is greater than 1000 items over-allocated (l = %d). If you didn't set the datatable.alloccol option to be very large, please report this to datatable-help including the result of sessionInfo().",tl,l);
+    if (tl>l+10000) warning("tl (%d) is greater than 10,000 items over-allocated (l = %d). If you didn't set the datatable.alloccol option to be very large, please report this to datatable-help including the result of sessionInfo().",tl,l);
     if (n>tl) return(shallow(dt,R_NilValue,n)); // usual case (increasing alloc)
-    if (n<tl) warning("Attempt to reduce allocation from %d to %d ignored. Can only increase allocation via shallow copy.",tl,n);
+    if (n<tl && verbose) Rprintf("Attempt to reduce allocation from %d to %d ignored. Can only increase allocation via shallow copy. Please do not use DT[...]<- or DT$someCol<-. Use := inside DT[...] instead.",tl,n);
               // otherwise the finalizer can't clear up the Large Vector heap
     return(dt);
 }
 
 SEXP alloccolwrapper(SEXP dt, SEXP newncol, SEXP verbose) {
-    if (!isInteger(newncol) || length(newncol)!=1) error("n must be integer length 1. Has datatable.alloccol somehow become unset?");
+    if (!isInteger(newncol) || length(newncol)!=1) error("n must be integer length 1. Has getOption('datatable.alloccol') somehow become unset?");
     if (!isLogical(verbose) || length(verbose)!=1) error("verbose must be TRUE or FALSE"); 
-    return(alloccol(dt, INTEGER(newncol)[0], LOGICAL(verbose)[0]));
+    
+    SEXP ans = PROTECT(alloccol(dt, INTEGER(newncol)[0], LOGICAL(verbose)[0]));
+    
+    for (R_len_t i = 0; i < LENGTH(ans); i++) {
+        // clear the same attributes excluded by copyMostAttrib(). Primarily for data.table and as.data.table, but added here centrally (see #4890).
+
+        setAttrib(VECTOR_ELT(ans, i), R_NamesSymbol, R_NilValue);
+        setAttrib(VECTOR_ELT(ans, i), R_DimSymbol, R_NilValue);
+        setAttrib(VECTOR_ELT(ans, i), R_DimNamesSymbol, R_NilValue);
+    }
+
+    UNPROTECT(1);
+    return ans;
 }
 
 SEXP shallowwrapper(SEXP dt, SEXP cols) {
@@ -309,27 +321,33 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
     nrow = length(VECTOR_ELT(dt,0));
     if (isNull(rows)) {
         targetlen = nrow;
+        if (verbose) Rprintf("Assigning to all %d rows\n", nrow);
         // fast way to assign to whole column, without creating 1:nrow(x) vector up in R, or here in C
     } else {
-        if (!LENGTH(rows)) return(dt);
         if (isReal(rows)) {
             rows = PROTECT(rows = coerceVector(rows, INTSXP));
             protecti++;
             warning("Coerced i from numeric to integer. Please pass integer for efficiency; e.g., 2L rather than 2");
-        }    
+        }
         if (!isInteger(rows))
             error("i is type '%s'. Must be integer, or numeric is coerced with warning. If i is a logical subset, simply wrap with which(), and take the which() outside the loop if possible for efficiency.", type2char(TYPEOF(rows)));
         targetlen = length(rows);
-        Rboolean anyToDo = FALSE;
-        for (i=0;i<targetlen;i++) {
+        int numToDo = 0;
+        for (i=0; i<targetlen; i++) {
             if ((INTEGER(rows)[i]<0 && INTEGER(rows)[i]!=NA_INTEGER) || INTEGER(rows)[i]>nrow)
                 error("i[%d] is %d which is out of range [1,nrow=%d].",i+1,INTEGER(rows)[i],nrow);
-            if (INTEGER(rows)[i]>=1) anyToDo = TRUE;
+            if (INTEGER(rows)[i]>=1) numToDo++;
+        }
+        if (verbose) Rprintf("Assigning to %d row subset of %d rows\n", numToDo, nrow);
+        // TODO: include in message if any rows are assigned several times (e.g. by=.EACHI with dups in i)
+        if (numToDo==0) {
+            if (!length(newcolnames)) return(dt); // all items of rows either 0 or NA. !length(newcolnames) for #759
+            if (verbose) Rprintf("Added %d new column%s initialized with all-NA\n",
+                                 length(newcolnames), (length(newcolnames)>1)?"s":"");
         }
-        if (!anyToDo) return(dt);  // all items of rows either 0 or NA, nothing to do.  
     }
     if (!length(cols)) {
-        warning("length(LHS) = 0, meaning no columns to delete or assign RHS to.");
+        warning("length(LHS)==0; no columns to delete or assign RHS to.");
         return(dt);
     }
     // FR #2077 - set able to add new cols by reference
@@ -406,7 +424,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
                     warning("Adding new column '%s' then assigning NULL (deleting it).",CHAR(STRING_ELT(newcolnames,newcolnum)));
                     continue;
                 }
-                error("RHS of assignment to new column '%s' is zero length but not empty list(). For new columns the RHS must either be empty list() to create an empty list column, or, have length > 0; e.g. NA_integer_, 0L, etc.", CHAR(STRING_ELT(newcolnames,newcolnum)));
+                // RHS of assignment to new column is zero length but we'll use its type to create all-NA column of that type
             }
         }
         if (!(isVectorAtomic(thisvalue) || isNewList(thisvalue)))  // NULL had a continue earlier above
@@ -414,23 +432,23 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
         if (isMatrix(thisvalue) && (j=INTEGER(getAttrib(thisvalue, R_DimSymbol))[1]) > 1)  // matrix passes above (considered atomic vector)
             warning("%d column matrix RHS of := will be treated as one vector", j);
         if ((coln+1)<=oldncol && isFactor(VECTOR_ELT(dt,coln)) &&
-            !isString(thisvalue) && TYPEOF(thisvalue)!=INTSXP && !isReal(thisvalue) && !isNewList(thisvalue))  // !=INTSXP includes factor
+            !isString(thisvalue) && TYPEOF(thisvalue)!=INTSXP && TYPEOF(thisvalue)!=LGLSXP && !isReal(thisvalue) && !isNewList(thisvalue))  // !=INTSXP includes factor
             error("Can't assign to column '%s' (type 'factor') a value of type '%s' (not character, factor, integer or numeric)", CHAR(STRING_ELT(names,coln)),type2char(TYPEOF(thisvalue)));
-        if (nrow>0) {
+        if (nrow>0 && targetlen>0) {
             if (vlen>targetlen)
                 warning("Supplied %d items to be assigned to %d items of column '%s' (%d unused)", vlen, targetlen,CHAR(colnam),vlen-targetlen);  
             else if (vlen>0 && targetlen%vlen != 0)
                 warning("Supplied %d items to be assigned to %d items of column '%s' (recycled leaving remainder of %d items).",vlen,targetlen,CHAR(colnam),targetlen%vlen);
         }
     }
-    // having now checked the inputs, from this point there should be no errors,
-    // so we can now proceed to modify DT by reference.
+    // having now checked the inputs, from this point there should be no errors, so we can now proceed to
+    // modify DT by reference. The exception is if new columns are being added and the allocation fails with
+    // out-of-memory; in that case the user will receive a hard halt and know to rerun.
     if (length(newcolnames)) {
-        //if (length(rows)!=0) error("Attempt to add new column(s) and set subset of rows at the same time. Create the new column(s) first, and then you'll be able to assign to a subset. If i is set to 1:nrow(x) then please remove that (no need, it's faster without).");
         oldtncol = TRUELENGTH(dt);   // TO DO: oldtncol can be just called tl now, as we won't realloc here any more.
         
         if (oldtncol<oldncol) error("Internal error, please report (including result of sessionInfo()) to datatable-help: oldtncol (%d) < oldncol (%d) but tl of class is marked.", oldtncol, oldncol);
-        if (oldtncol>oldncol+1000L) warning("truelength (%d) is greater than 1000 items over-allocated (length = %d). See ?truelength. If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().",oldtncol, oldncol); 
+        if (oldtncol>oldncol+10000L) warning("truelength (%d) is greater than 10,000 items over-allocated (length = %d). See ?truelength. If you didn't set the datatable.alloccol option very large, please report this to datatable-help including the result of sessionInfo().",oldtncol, oldncol); 
         
         if (oldtncol < oldncol+LENGTH(newcolnames))
             error("Internal logical error. DT passed to assign has not been allocated enough column slots. l=%d, tl=%d, adding %d", oldncol, oldtncol, LENGTH(newcolnames));
@@ -458,9 +476,9 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
             continue;   // delete column(s) afterwards, below this loop
         }
         vlen = length(thisvalue);
-        if (length(rows)==0 && targetlen==vlen) {
+        if (length(rows)==0 && targetlen==vlen && (vlen>0 || nrow==0)) {
             if (  NAMED(thisvalue)==2 ||  // set() protects the NAMED of atomic vectors from .Call setting arguments to 2 by wrapping with list
-                 (TYPEOF(values)==VECSXP && i>LENGTH(values)-1)) { // recycled RHS would have columns pointing to others, #2298.
+                 (TYPEOF(values)==VECSXP && i>LENGTH(values)-1)) { // recycled RHS would have columns pointing to others, #185.
                 if (verbose) {
                     if (NAMED(thisvalue)==2) Rprintf("RHS for item %d has been duplicated because NAMED is %d, but then is being plonked.\n",i+1, NAMED(thisvalue));
                     else Rprintf("RHS for item %d has been duplicated because the list of RHS values (length %d) is being recycled, but then is being plonked.\n", i+1, length(values));
@@ -476,10 +494,10 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
             continue;
         }
         if (coln+1 > oldncol) {  // new column
-            if (length(rows)) 
-                newcol = allocNAVector(TYPEOF(thisvalue),nrow);  // fill with NAs first for where 'rows' (a subset) doesn't touch
-            else              
-                newcol = allocVector(TYPEOF(thisvalue),nrow);    // PROTECT not needed, protected by SET_VECTOR_ELT on next line
+            newcol = allocNAVector(TYPEOF(thisvalue),nrow);
+            // initialize with NAs for the elements that 'rows' (a subset) doesn't touch.
+            // Do not try to save the NA-fill time (it is a contiguous, branch-free assignment anyway), since being
+            // sure all items would be written to (isNull(rows), length(rows), vlen<1, targetlen) is not worth the risk.
             SET_VECTOR_ELT(dt,coln,newcol);
             if (isVectorAtomic(thisvalue)) copyMostAttrib(thisvalue,newcol);  // class etc but not names
             // else for lists (such as data.frame and data.table) treat them as raw lists and drop attribs
@@ -529,8 +547,9 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
                                 PROTECT(addlevels = growVector(addlevels, length(addlevels)+1000));
                                 protecti++;
                             }
-                            SET_STRING_ELT(addlevels,addi,thisv);
-                            SET_TRUELENGTH(thisv,++addi+length(targetlevels));  
+                            SET_STRING_ELT(addlevels,addi++,thisv);
+                            // if-else for #1718 fix
+                            SET_TRUELENGTH(thisv, (thisv != NA_STRING) ? (addi+length(targetlevels)) : NA_INTEGER);
                         }
                         INTEGER(RHS)[j] = TRUELENGTH(thisv);
                     }
@@ -546,12 +565,13 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
                     savetl_end();
                 } else {
                     // value is either integer or numeric vector
-                    if (TYPEOF(thisvalue)!=INTSXP && !isReal(thisvalue))
+                    if (TYPEOF(thisvalue)!=INTSXP && TYPEOF(thisvalue)!=LGLSXP && !isReal(thisvalue))
                         error("Internal logical error. Up front checks (before starting to modify DT) didn't catch type of RHS ('%s') assigning to factor column '%s'. Please report to datatable-help.", type2char(TYPEOF(thisvalue)), CHAR(STRING_ELT(names,coln)));
-                    if (isReal(thisvalue)) {
+                    if (isReal(thisvalue) || TYPEOF(thisvalue)==LGLSXP) {
                         PROTECT(RHS = coerceVector(thisvalue,INTSXP));
                         protecti++;
-                        warning("Coerced 'double' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.");
+                        // silence warning on singleton NAs
+                        if (INTEGER(RHS)[0] != NA_INTEGER) warning("Coerced '%s' RHS to 'integer' to match the factor column's underlying type. Character columns are now recommended (can be in keys), or coerce RHS to integer or character first.", type2char(TYPEOF(thisvalue)));
                     } else RHS = thisvalue;
                     for (j=0; j<length(RHS); j++) {
                         if ( (INTEGER(RHS)[j]<1 || INTEGER(RHS)[j]>LENGTH(targetlevels)) 
@@ -589,7 +609,7 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
                 }
             }
         }
-        memrecycle(targetcol, rows, 0, length(rows) ? LENGTH(rows) : LENGTH(targetcol), RHS);  // also called from dogroups
+        memrecycle(targetcol, rows, 0, targetlen, RHS);  // also called from dogroups where these arguments are used more
     }
     key = getAttrib(dt,install("sorted"));
     if (length(key)) {
@@ -619,7 +639,15 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
             for (i=0; i<LENGTH(cols); i++) {
                 tc2 = c2 = CHAR(STRING_ELT(names, INTEGER(cols)[i]-1));  // the column name being updated; e.g. "col1"
                 while (*tc1) {
-                    if (*tc1!='_' || *(tc1+1)!='_') error("Internal error: __ not found in index name");
+                    if (*tc1!='_' || *(tc1+1)!='_') {
+                        // fix for #1396
+                        if (verbose) {
+                            Rprintf("Dropping index '%s' as it doesn't have '__' at the beginning of its name. It was very likely created by v1.9.4 of data.table.\n", c1);
+                        }
+                        setAttrib(index, a, R_NilValue);
+                        i = LENGTH(cols);
+                        break;
+                    }
                     tc1 += 2;
                     if (*tc1=='\0') error("Internal error: index name ends with trailing __");
                     while (*tc1 && *tc2 && *tc1 == *tc2) { tc1++; tc2++; }
@@ -687,16 +715,41 @@ SEXP assign(SEXP dt, SEXP rows, SEXP cols, SEXP newcolnames, SEXP values, SEXP v
     return(dt);  // needed for `*tmp*` mechanism (when := isn't used), and to return the new object after a := for compound syntax.
 }
 
+static Rboolean anyNamed(SEXP x) {
+    if (NAMED(x)) return TRUE;
+    if (isNewList(x)) for (int i=0; i<LENGTH(x); i++)
+        if (anyNamed(VECTOR_ELT(x,i))) return TRUE;
+    return FALSE;
+}
+
 void memrecycle(SEXP target, SEXP where, int start, int len, SEXP source)
 // like memcpy but recycles source and takes care of aging
 // 'where' a 1-based INTEGER vector subset of target to assign to,  or NULL or integer()
 // assigns to target[start:start+len-1] or target[where[start:start+len-1]]  where start is 0-based
 {
-    int r = 0, w;
+    int r=0, w, protecti=0;
     if (len<1) return;
     int slen = length(source) > len ? len : length(source); // fix for 5647. when length(source) > len, slen must be len.
     if (slen<1) return;
     if (TYPEOF(target) != TYPEOF(source)) error("Internal error: TYPEOF(target)['%s']!=TYPEOF(source)['%s']", type2char(TYPEOF(target)),type2char(TYPEOF(source)));
+    if (isNewList(source)) {
+        // A list() column; i.e. target is a column of pointers to SEXPs rather than the much more common case
+        // where memrecycle copies the DATAPTR data to the atomic target from the atomic source.
+        // If any item within the list is NAMED then take a fresh copy. So far this has occurred from dogroups.c when
+        // j returns .BY or similar specials as-is within a list(). Those specials are static inside
+        // dogroups so if we don't copy now the last value written to them by dogroups becomes repeated in the result;
+        // i.e. the wrong result.
+        // If source is itself recycled later (many list() column items pointing to the same object) we are ok with that
+        // since we now have a fresh copy and := will not assign with a list() column's cell value; := only changes the
+        // SEXP pointed to.
+        // If source is already not named (because j already created a fresh unnamed vector within a list()) we don't want to
+        // duplicate unnecessarily, hence checking for named rather than duplicating always.
+        // See #481 and #1270
+        if (anyNamed(source)) {
+            source = PROTECT(duplicate(source));
+            protecti++;
+        }
+    }
     size_t size = SIZEOF(target);
     if (!length(where)) {
         switch (TYPEOF(target)) {
@@ -772,12 +825,13 @@ void memrecycle(SEXP target, SEXP where, int start, int len, SEXP source)
         }
         // if slen>10 it may be worth memcpy, but we'd need to first know if 'where' was a contiguous subset
     }
+    UNPROTECT(protecti);
 }
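The comments above explain why `memrecycle` takes a fresh copy of a `list()` source when any element is NAMED: the static specials written by `dogroups` would otherwise be repeated in the result. A minimal standalone sketch of the recursive check, using a toy node struct in place of `SEXP` (the names and types here are hypothetical; the real code walks R objects with `NAMED()` and `VECTOR_ELT()`):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for SEXP: a node is either a leaf with a 'named'
 * reference mark, or a list node with children. */
typedef struct Node {
    int named;            /* >0 means some binding still references this value */
    int nchild;           /* 0 for leaves */
    struct Node **child;  /* non-NULL only for list nodes */
} Node;

/* Mirrors anyNamed() above: true if this node or any descendant is named,
 * in which case the caller should duplicate before assigning. */
static bool any_named(const Node *x) {
    if (x->named) return true;
    for (int i = 0; i < x->nchild; i++)
        if (any_named(x->child[i])) return true;
    return false;
}
```

The point of checking rather than always duplicating is to avoid needless copies when `j` already produced a fresh, unnamed vector inside the `list()` column.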
 
 SEXP allocNAVector(SEXPTYPE type, R_len_t n)
 {
     // an allocVector following with initialization to NA since a subassign to a new column using :=
-    // routinely leaves untouched items (rather than 0 or "" as allocVector does with it's memset)
+    // routinely leaves untouched items (rather than 0 or "" as allocVector does with its memset)
     // We guess that author of allocVector would have liked to initialize with NA but was prevented since memset
     // is restricted to one byte.
     R_len_t i;
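The comment above hints at why the fill must be element-wise: `memset` writes one repeated byte, which can produce `0.0` (all-zero bytes) but not R's `NA_real_`, whose 8-byte NaN payload is not a single repeated byte. A standalone sketch of the idea, with plain `NAN` standing in for `NA_real_` (`fill_na` is a hypothetical name, not the package's API):

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Element-wise NA fill: a multi-byte NaN pattern cannot be written by
 * memset, so each double is assigned individually. */
static void fill_na(double *p, size_t n) {
    for (size_t i = 0; i < n; i++)
        p[i] = NAN;  /* stand-in for NA_real_ */
}
```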
diff --git a/src/between.c b/src/between.c
new file mode 100644
index 0000000..43b2455
--- /dev/null
+++ b/src/between.c
@@ -0,0 +1,104 @@
+#include "data.table.h"
+#include <Rdefines.h>
+
+static double l=0.0, u=0.0;
+
+Rboolean int_upper_closed(SEXP x, R_len_t i) {
+    return (INTEGER(x)[i] == NA_INTEGER || (double)INTEGER(x)[i] <= u ? NA_LOGICAL : FALSE);
+}
+
+Rboolean int_upper_open(SEXP x, R_len_t i) {
+    return (INTEGER(x)[i] == NA_INTEGER || (double)INTEGER(x)[i] < u ? NA_LOGICAL : FALSE);
+}
+
+Rboolean int_lower_closed(SEXP x, R_len_t i) {
+    return (INTEGER(x)[i] == NA_INTEGER || (double)INTEGER(x)[i] >= l ? NA_LOGICAL : FALSE);
+}
+
+Rboolean int_lower_open(SEXP x, R_len_t i) {
+    return (INTEGER(x)[i] == NA_INTEGER || (double)INTEGER(x)[i] > l ? NA_LOGICAL : FALSE);
+}
+
+Rboolean int_both_closed(SEXP x, R_len_t i) {
+    return (INTEGER(x)[i] == NA_INTEGER ? NA_LOGICAL : ((double)INTEGER(x)[i] >= l && (double)INTEGER(x)[i] <= u));
+}
+
+Rboolean int_both_open(SEXP x, R_len_t i) {
+    return (INTEGER(x)[i] == NA_INTEGER ? NA_LOGICAL : ((double)INTEGER(x)[i] > l && (double)INTEGER(x)[i] < u));
+}
+
+Rboolean double_upper_closed(SEXP x, R_len_t i) {
+    return (ISNAN(REAL(x)[i]) || REAL(x)[i] <= u ? NA_LOGICAL : FALSE);
+}
+
+Rboolean double_upper_open(SEXP x, R_len_t i) {
+    return (ISNAN(REAL(x)[i]) || REAL(x)[i] < u ? NA_LOGICAL : FALSE);
+}
+
+Rboolean double_lower_closed(SEXP x, R_len_t i) {
+    return (ISNAN(REAL(x)[i]) || REAL(x)[i] >= l ? NA_LOGICAL : FALSE);
+}
+
+Rboolean double_lower_open(SEXP x, R_len_t i) {
+    return (ISNAN(REAL(x)[i]) || REAL(x)[i] > l ? NA_LOGICAL : FALSE);
+}
+
+Rboolean double_both_closed(SEXP x, R_len_t i) {
+    return (ISNAN(REAL(x)[i]) ? NA_LOGICAL : (REAL(x)[i] >= l && REAL(x)[i] <= u));
+}
+
+Rboolean double_both_open(SEXP x, R_len_t i) {
+    return (ISNAN(REAL(x)[i]) ? NA_LOGICAL : (REAL(x)[i] > l && REAL(x)[i] < u));
+}
+
+SEXP between(SEXP x, SEXP lower, SEXP upper, SEXP bounds) {
+
+    R_len_t i, nx = length(x), nl = length(lower), nu = length(upper);
+    l = 0.0; u = 0.0;
+    SEXP ans;
+    Rboolean (*flower)(), (*fupper)(), (*fboth)();
+    if (!nx || !nl || !nu)
+        return (allocVector(LGLSXP, 0));
+    if (nl != 1 && nl != nx)
+        error("length(lower) (%d) must be either 1 or length(x) (%d)", nl, nx);
+    if (nu != 1 && nu != nx)
+        error("length(upper) (%d) must be either 1 or length(x) (%d)", nu, nx);
+    if (!isLogical(bounds) || LOGICAL(bounds)[0] == NA_LOGICAL)
+        error("incbounds must be logical TRUE/FALSE.");
+
+    // no support for int64 yet (only handling most common cases)
+    // coerce to also get NA values properly
+    lower = PROTECT(coerceVector(lower, REALSXP)); l = REAL(lower)[0];
+    upper = PROTECT(coerceVector(upper, REALSXP)); u = REAL(upper)[0];
+    ans   = PROTECT(allocVector(LGLSXP, nx));
+
+    if (LOGICAL(bounds)[0]) {
+        fupper = isInteger(x) ? &int_upper_closed : &double_upper_closed;
+        flower = isInteger(x) ? &int_lower_closed : &double_lower_closed;
+        fboth  = isInteger(x) ? &int_both_closed  : &double_both_closed;
+    } else {
+        fupper = isInteger(x) ? &int_upper_open : &double_upper_open;
+        flower = isInteger(x) ? &int_lower_open : &double_lower_open;
+        fboth  = isInteger(x) ? &int_both_open  : &double_both_open;
+    }
+
+    if ( ISNAN(REAL(lower)[0]) ) {
+        if ( ISNAN(REAL(upper)[0]) ) {
+            #pragma omp parallel for num_threads(getDTthreads())
+            for (i=0; i<nx; i++) LOGICAL(ans)[i] = NA_LOGICAL;
+        } else {
+            #pragma omp parallel for num_threads(getDTthreads())
+            for (i=0; i<nx; i++) LOGICAL(ans)[i] = fupper(x, i);
+        }
+    } else {
+        if ( ISNAN(REAL(upper)[0]) ) {
+            #pragma omp parallel for num_threads(getDTthreads())
+            for (i=0; i<nx; i++) LOGICAL(ans)[i] = flower(x, i);
+        } else {
+            #pragma omp parallel for num_threads(getDTthreads())
+            for (i=0; i<nx; i++) LOGICAL(ans)[i] = fboth(x, i);
+        }
+    }
+    UNPROTECT(3);
+    return(ans);
+}
diff --git a/src/bmerge.c b/src/bmerge.c
index 4fd1dc2..012ac9b 100644
--- a/src/bmerge.c
+++ b/src/bmerge.c
@@ -1,4 +1,5 @@
 #include "data.table.h"
+#include <signal.h> // the debugging machinery + breakpoint aid
 
 /*
 Implements binary search (a.k.a. divide and conquer).
@@ -19,29 +20,30 @@ Differences over standard binary search (e.g. bsearch in stdlib.h) :
 #define ENC_KNOWN(x) (LEVELS(x) & 12)
 // 12 = LATIN1_MASK (1<<2) | UTF8_MASK (1<<3)  // Would use these definitions from Defn.h, but that appears to be private to R. Hence 12.
 
-static SEXP i, x;
-static int ncol, *icols, *xcols, *o, *xo, *retFirst, *retLength, *allLen1, *rollends;
+#define EQ 1
+#define LE 2
+#define LT 3
+#define GE 4
+#define GT 5
+
+static SEXP i, x, nqgrp;
+static int ncol, *icols, *xcols, *o, *xo, *retFirst, *retLength, *retIndex, *allLen1, *allGrp1, *rollends, ilen, anslen;
+static int *op, nqmaxgrp, *tmpptr, scols;
+static int ctr, nomatch; // populating matches for non-equi joins
+enum {ALL, FIRST, LAST} mult = ALL;
 static double roll, rollabs;
-static Rboolean nearest=FALSE, enc_warn=TRUE;
+static Rboolean rollToNearest=FALSE;
 #define XIND(i) (xo ? xo[(i)]-1 : i)
 
-void bmerge_r(int xlow, int xupp, int ilow, int iupp, int col, int lowmax, int uppmax);
+void bmerge_r(int xlow, int xupp, int ilow, int iupp, int col, int thisgrp, int lowmax, int uppmax);
 
-SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SEXP xoArg, SEXP rollarg, SEXP rollendsArg, SEXP nomatch, SEXP retFirstArg, SEXP retLengthArg, SEXP allLen1Arg)
-{
+SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SEXP xoArg, SEXP rollarg, SEXP rollendsArg, SEXP nomatchArg, SEXP multArg, SEXP opArg, SEXP nqgrpArg, SEXP nqmaxgrpArg) {
     int xN, iN, protecti=0;
-    roll = 0.0;
-    nearest = FALSE;
-    enc_warn = TRUE;
-    if (isString(rollarg)) {
-        if (strcmp(CHAR(STRING_ELT(rollarg,0)),"nearest") != 0) error("roll is character but not 'nearest'");
-        roll=1.0; nearest=TRUE;       // the 1.0 here is just any non-0.0, so roll!=0.0 can be used later
-    } else {
-        if (!isReal(rollarg)) error("Internal error: roll is not character or double");
-        roll = REAL(rollarg)[0];   // more common case (rolling forwards or backwards) or no roll when 0.0
-    }
-    rollabs = fabs(roll);
-    
+    ctr=0; // needed for non-equi join case
+    SEXP retFirstArg, retLengthArg, retIndexArg, allLen1Arg, allGrp1Arg;
+    retFirstArg = retLengthArg = retIndexArg = R_NilValue; // suppress gcc msg
+
+    // iArg, xArg, icolsArg and xcolsArg
     i = iArg; x = xArg;  // set globals so bmerge_r can see them.
     if (!isInteger(icolsArg)) error("Internal error: icols is not integer vector");
     if (!isInteger(xcolsArg)) error("Internal error: xcols is not integer vector");
@@ -49,7 +51,7 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE
     icols = INTEGER(icolsArg);
     xcols = INTEGER(xcolsArg);
     xN = LENGTH(VECTOR_ELT(x,0));
-    iN = LENGTH(VECTOR_ELT(i,0));
+    iN = ilen = anslen = LENGTH(VECTOR_ELT(i,0));
     ncol = LENGTH(icolsArg);    // there may be more sorted columns in x than involved in the join
     for(int col=0; col<ncol; col++) {
         if (icols[col]==NA_INTEGER) error("Internal error. icols[%d] is NA", col);
@@ -60,25 +62,84 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE
         int xt = TYPEOF(VECTOR_ELT(x, xcols[col]-1));
         if (it != xt) error("typeof x.%s (%s) != typeof i.%s (%s)", CHAR(STRING_ELT(getAttrib(x,R_NamesSymbol),xcols[col]-1)), type2char(xt), CHAR(STRING_ELT(getAttrib(i,R_NamesSymbol),icols[col]-1)), type2char(it));
     }
-    if (!isInteger(retFirstArg) || LENGTH(retFirstArg)!=iN) error("retFirst must be integer vector the same length as nrow(i)");
-    retFirst = INTEGER(retFirstArg);
-    if (!isInteger(retLengthArg) || LENGTH(retLengthArg)!=iN) error("retLength must be integer vector the same length as nrow(i)");
-    retLength = INTEGER(retLengthArg);
-    if (!isLogical(allLen1Arg) || LENGTH(allLen1Arg) != 1) error("allLen1 must be a length 1 logical vector");
-    allLen1 = LOGICAL(allLen1Arg);
-    if (!isLogical(rollendsArg) || LENGTH(rollendsArg) != 2) error("rollends must be a length 2 logical vector");
+    // raise(SIGINT);
+
+    // rollArg, rollendsArg
+    roll = 0.0; rollToNearest = FALSE;
+    if (isString(rollarg)) {
+        if (strcmp(CHAR(STRING_ELT(rollarg,0)),"nearest") != 0) error("roll is character but not 'nearest'");
+        roll=1.0; rollToNearest=TRUE;       // the 1.0 here is just any non-0.0, so roll!=0.0 can be used later
+    } else {
+        if (!isReal(rollarg)) error("Internal error: roll is not character or double");
+        roll = REAL(rollarg)[0];   // more common case (rolling forwards or backwards) or no roll when 0.0
+    }
+    rollabs = fabs(roll);
+    if (!isLogical(rollendsArg) || LENGTH(rollendsArg) != 2)
+        error("rollends must be a length 2 logical vector");
     rollends = LOGICAL(rollendsArg);
-    
-    if (nearest && TYPEOF(VECTOR_ELT(i, icols[ncol-1]-1))==STRSXP) error("roll='nearest' can't be applied to a character column, yet.");
-         
-    for (int j=0; j<iN; j++) {
+    if (rollToNearest && TYPEOF(VECTOR_ELT(i, icols[ncol-1]-1))==STRSXP)
+        error("roll='nearest' can't be applied to a character column, yet.");
+
+    // nomatch arg
+    nomatch = INTEGER(nomatchArg)[0];
+
+    // mult arg
+    if (!strcmp(CHAR(STRING_ELT(multArg, 0)), "all")) mult = ALL;
+    else if (!strcmp(CHAR(STRING_ELT(multArg, 0)), "first")) mult = FIRST;
+    else if (!strcmp(CHAR(STRING_ELT(multArg, 0)), "last")) mult = LAST;
+    else error("Internal error: invalid value for 'mult'. Please report to datatable-help");
+
+    // opArg
+    if (!isInteger(opArg) || length(opArg) != ncol)
+        error("Internal error: opArg is not an integer vector of length equal to length(on)");
+    op = INTEGER(opArg);
+    if (!isInteger(nqgrpArg))
+        error("Internal error: nqgrpArg must be an integer vector");
+    nqgrp = nqgrpArg; // set global for bmerge_r
+    scols = (!length(nqgrpArg)) ? 0 : -1; // starting col index, -1 is external group column for non-equi join case
+
+    // nqmaxgrpArg
+    if (!isInteger(nqmaxgrpArg) || length(nqmaxgrpArg) != 1 || INTEGER(nqmaxgrpArg)[0] <= 0)
+        error("Internal error: nqmaxgrpArg is not a positive length-1 integer vector");
+    nqmaxgrp = INTEGER(nqmaxgrpArg)[0];
+    if (nqmaxgrp>1 && mult == ALL) {
+        // non-equi case with mult=ALL, may need reallocation
+        anslen = 1.1 * ((iN > 1000) ? iN : 1000);
+        retFirst = Calloc(anslen, int); // anslen is set above
+        retLength = Calloc(anslen, int);
+        retIndex = Calloc(anslen, int);
+        if (retFirst==NULL || retLength==NULL || retIndex==NULL)
+            error("Internal error in allocating memory for non-equi join");
+        // initialise retIndex here directly, as next loop is meant for both equi and non-equi joins
+        for (int j=0; j<anslen; j++) retIndex[j] = j+1;
+    } else { // equi joins (or) non-equi join but no multiple matches
+        retFirstArg = PROTECT(allocVector(INTSXP, anslen));
+        retFirst = INTEGER(retFirstArg);
+        retLengthArg = PROTECT(allocVector(INTSXP, anslen)); // TODO: no need to allocate length at all when
+        retLength = INTEGER(retLengthArg);                   // mult = "first" / "last"
+        retIndexArg = PROTECT(allocVector(INTSXP, 0));
+        retIndex = INTEGER(retIndexArg);
+        protecti += 3;
+    }
+    for (int j=0; j<anslen; j++) {
         // defaults need to be populated here as bmerge_r may well not touch many locations, say if the last row of i is before the first row of x.
-        retFirst[j] = INTEGER(nomatch)[0];   // default to no match for NA goto below
+        retFirst[j] = nomatch;   // default to no match for NA goto below
         // retLength[j] = 0;   // TO DO: do this to save the branch below and later branches at R level to set .N to 0
-        retLength[j] = INTEGER(nomatch)[0]==0 ? 0 : 1;
+        retLength[j] = nomatch==0 ? 0 : 1;
     }
+
+    // allLen1Arg
+    allLen1Arg = PROTECT(allocVector(LGLSXP, 1));
+    allLen1 = LOGICAL(allLen1Arg);
     allLen1[0] = TRUE;  // All-0 and All-NA are considered all length 1 according to R code currently. Really, it means any(length>1).
-    
+
+    // allGrp1Arg, if TRUE, out of all nested group ids, only one of them matches 'x'. Might be rare, but helps to be more efficient in that case.
+    allGrp1Arg = PROTECT(allocVector(LGLSXP, 1));
+    allGrp1 = LOGICAL(allGrp1Arg);
+    allGrp1[0] = TRUE;
+    protecti += 2;
+
+    // isorted arg
     o = NULL;
     if (!LOGICAL(isorted)[0]) {
         SEXP order = PROTECT(vec_init(length(icolsArg), ScalarInteger(1))); // rep(1, length(icolsArg))
@@ -86,44 +147,93 @@ SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, SE
         protecti += 2;
         if (!LENGTH(oSxp)) o = NULL; else o = INTEGER(oSxp);
     }
+
+    // xo arg
     xo = NULL;
     if (length(xoArg)) {
         if (!isInteger(xoArg)) error("Internal error: xoArg is not an integer vector");
         xo = INTEGER(xoArg);
     }
-    
-    if (iN) bmerge_r(-1,xN,-1,iN,0,1,1);
-    
+
+    // start bmerge
+    if (iN) {
+        // embarrassingly parallel if we have storage space for nqmaxgrp*iN
+        for (int kk=0; kk<nqmaxgrp; kk++) {
+            bmerge_r(-1,xN,-1,iN,scols,kk+1,1,1);
+        }
+    }
+    ctr += iN;
+    if (nqmaxgrp > 1 && mult == ALL) {
+        // memcpy ret* to SEXP
+        retFirstArg = PROTECT(allocVector(INTSXP, ctr));
+        retLengthArg = PROTECT(allocVector(INTSXP, ctr));
+        retIndexArg = PROTECT(allocVector(INTSXP, ctr));
+        protecti += 3;
+        memcpy(INTEGER(retFirstArg), retFirst, sizeof(int)*ctr);
+        memcpy(INTEGER(retLengthArg), retLength, sizeof(int)*ctr);
+        memcpy(INTEGER(retIndexArg), retIndex, sizeof(int)*ctr);
+    }
+    SEXP ans = PROTECT(allocVector(VECSXP, 5)); protecti++;
+    SEXP ansnames = PROTECT(allocVector(STRSXP, 5)); protecti++;
+    SET_VECTOR_ELT(ans, 0, retFirstArg);
+    SET_VECTOR_ELT(ans, 1, retLengthArg);
+    SET_VECTOR_ELT(ans, 2, retIndexArg);
+    SET_VECTOR_ELT(ans, 3, allLen1Arg);
+    SET_VECTOR_ELT(ans, 4, allGrp1Arg);
+    SET_STRING_ELT(ansnames, 0, mkChar("starts"));
+    SET_STRING_ELT(ansnames, 1, mkChar("lens"));
+    SET_STRING_ELT(ansnames, 2, mkChar("indices"));
+    SET_STRING_ELT(ansnames, 3, mkChar("allLen1"));
+    SET_STRING_ELT(ansnames, 4, mkChar("allGrp1"));
+    setAttrib(ans, R_NamesSymbol, ansnames);
+    if (nqmaxgrp > 1 && mult == ALL) {
+        Free(retFirst);
+        Free(retLength);
+        Free(retIndex);
+    }
     UNPROTECT(protecti);
-    return(R_NilValue);
+    return (ans);
 }
 
 static union {
   int i;
   double d;
-  unsigned long long ll;
+  unsigned long long ull;
+  long long ll;
   SEXP s;
 } ival, xval;
 
 static int mid, tmplow, tmpupp;  // global to save them being added to recursive stack. Maybe optimizer would do this anyway.
-static SEXP ic, xc, class;
+static SEXP ic, xc;
+
+// If we find a non-ASCII, non-NA, non-UTF8 encoding, we try to convert it to UTF-8. That is, marked non-ASCII/non-UTF8 encodings will always be compared in UTF-8. This seems to be the best fix available to put the encoding issues to rest.
+// Since the if-statement fails on its first condition check in "normal" ASCII cases, there shouldn't be a significant penalty for the default setup.
+// Fix for #66, #69, #469 and #1293
+// TODO: compare 1.9.6 performance with 1.9.7 with a huge number of ASCII strings.
+SEXP ENC2UTF8(SEXP s) {
+    if (!IS_ASCII(s) && s != NA_STRING && !IS_UTF8(s))
+        s = mkCharCE(translateCharUTF8(s), CE_UTF8);
+    return (s);
+}
 
-void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int lowmax, int uppmax)
+void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int thisgrp, int lowmax, int uppmax)
 // col is >0 and <=ncol-1 if this range of [xlow,xupp] and [ilow,iupp] match up to but not including that column
 // lowmax=1 if xlowIn is the lower bound of this group (needed for roll)
 // uppmax=1 if xuppIn is the upper bound of this group (needed for roll)
+// new: col starts with -1 for non-equi joins, which gathers rows from nested id group counter 'thisgrp'
 {
     int xlow=xlowIn, xupp=xuppIn, ilow=ilowIn, iupp=iuppIn, j, k, ir, lir, tmp;
+    Rboolean isInt64=FALSE;
     ir = lir = ilow + (iupp-ilow)/2;           // lir = logical i row.
     if (o) ir = o[lir]-1;                      // ir = the actual i row if i were ordered
-
-    ic = VECTOR_ELT(i,icols[col]-1);  // ic = i column
-    xc = VECTOR_ELT(x,xcols[col]-1);  // xc = x column
+    if (col>-1) {
+        ic = VECTOR_ELT(i,icols[col]-1);  // ic = i column
+        xc = VECTOR_ELT(x,xcols[col]-1);  // xc = x column
     // it was checked in bmerge() that the types are equal
-    
+    } else xc = nqgrp;
     switch (TYPEOF(xc)) {
     case LGLSXP : case INTSXP :   // including factors
-        ival.i = INTEGER(ic)[ir];
+        ival.i = (col>-1) ? INTEGER(ic)[ir] : thisgrp;
         while(xlow < xupp-1) {
             mid = xlow + (xupp-xlow)/2;   // Same as (xlow+xupp)/2 but without risk of overflow
             xval.i = INTEGER(xc)[XIND(mid)];
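Unlike a standard binary search that stops at the first hit, the search above branches again from mid so that xlow and xupp end up strictly surrounding the whole run of equal keys. A standalone sketch of that equal-range search on a sorted int array (hypothetical names, no SEXPs):

```c
#include <assert.h>

/* Given sorted xs[0..n-1], set *lo and *hi so they exclusively surround
   the run equal to key, mirroring how bmerge_r leaves xlow/xupp around
   the matched group (lo == hi-1 means no match). */
void equal_range(const int *xs, int n, int key, int *lo, int *hi) {
    int xlow = -1, xupp = n, mid, tmplow, tmpupp;
    while (xlow < xupp - 1) {
        mid = xlow + (xupp - xlow) / 2;   /* avoids (xlow+xupp)/2 overflow */
        if (xs[mid] < key)      xlow = mid;
        else if (xs[mid] > key) xupp = mid;
        else {
            /* found one; branch from mid to find both ends of the group */
            tmplow = mid; tmpupp = mid;
            while (tmplow < xupp - 1) {
                mid = tmplow + (xupp - tmplow) / 2;
                if (xs[mid] == key) tmplow = mid; else xupp = mid;
            }
            while (xlow < tmpupp - 1) {
                mid = xlow + (tmpupp - xlow) / 2;
                if (xs[mid] == key) tmpupp = mid; else xlow = mid;
            }
            break;
        }
    }
    *lo = xlow; *hi = xupp;
}
```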
@@ -131,7 +241,8 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int lowma
                 xlow=mid;
             } else if (xval.i>ival.i) {   // TO DO: is *(&xlow, &xupp)[0|1]=mid more efficient than branch?
                 xupp=mid;
-            } else { // xval.i == ival.i  including NA_INTEGER==NA_INTEGER
+            } else {
+                // xval.i == ival.i  including NA_INTEGER==NA_INTEGER
                 // branch mid to find start and end of this group in this column
                 // TO DO?: not if mult=first|last and col<ncol-1
                 tmplow = mid;
@@ -150,45 +261,59 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int lowma
                 break;
             }
         }
+        if (col>-1 && op[col] != EQ) {
+            switch (op[col]) {
+                case LE : xlow = xlowIn; break;
+                case LT : xupp = xlow + 1; xlow = xlowIn; break;
+                case GE : if (ival.i != NA_INTEGER) xupp = xuppIn; break;
+                case GT : xlow = xupp - 1; if (ival.i != NA_INTEGER) xupp = xuppIn; break;
+            }
+            // for LE/LT cases, we need to ensure xlow excludes NA indices, != EQ is checked above already
+            if (op[col] <= 3 && xlow<xupp-1 && ival.i != NA_INTEGER && INTEGER(xc)[XIND(xlow+1)] == NA_INTEGER) {
+                tmplow = xlow; tmpupp = xupp;
+                while (tmplow < tmpupp-1) {
+                    mid = tmplow + (tmpupp-tmplow)/2;
+                    xval.i = INTEGER(xc)[XIND(mid)];
+                    if (xval.i == NA_INTEGER) tmplow = mid; else tmpupp = mid;
+                }
+                xlow = tmplow; // tmplow is the index of last NA value
+            }
+        }
         tmplow = lir;
         tmpupp = lir;
-        while(tmplow<iupp-1) {   // TO DO: could double up from lir rather than halving from iupp
-            mid = tmplow + (iupp-tmplow)/2;
-            xval.i = INTEGER(ic)[ o ? o[mid]-1 : mid ];   // reuse xval to search in i
-            if (xval.i == ival.i) tmplow=mid; else iupp=mid;
-        }
-        while(ilow<tmpupp-1) {
-            mid = ilow + (tmpupp-ilow)/2;
-            xval.i = INTEGER(ic)[ o ? o[mid]-1 : mid ];
-            if (xval.i == ival.i) tmpupp=mid; else ilow=mid;
+        if (col>-1) {
+            while(tmplow<iupp-1) {   // TO DO: could double up from lir rather than halving from iupp
+                mid = tmplow + (iupp-tmplow)/2;
+                xval.i = INTEGER(ic)[ o ? o[mid]-1 : mid ];   // reuse xval to search in i
+                if (xval.i == ival.i) tmplow=mid; else iupp=mid;
+                // if we could guarantee ivals to be *always* sorted for all columns independently (= max(nestedid) = 1), then we can speed this up by 2x by adding checks for GE,GT,LE,LT separately.
+            }
+            while(ilow<tmpupp-1) {
+                mid = ilow + (tmpupp-ilow)/2;
+                xval.i = INTEGER(ic)[ o ? o[mid]-1 : mid ];
+                if (xval.i == ival.i) tmpupp=mid; else ilow=mid;
+            }
         }
         // ilow and iupp now surround the group in ic, too
         break;
     case STRSXP :
-        ival.s = STRING_ELT(ic,ir);
+        if (op[col] != EQ) error("Only '==' operator is supported for columns of type %s.", type2char(TYPEOF(xc)));
+        ival.s = ENC2UTF8(STRING_ELT(ic,ir));
         while(xlow < xupp-1) {
             mid = xlow + (xupp-xlow)/2;
-            xval.s = STRING_ELT(xc, XIND(mid));
-            if (enc_warn && (ENC_KNOWN(ival.s) || ENC_KNOWN(xval.s))) {
-                // The || is only done here to avoid the warning message being repeating in this code.
-                warning("A known encoding (latin1 or UTF-8) was detected in a join column. data.table compares the bytes currently, so doesn't support *mixed* encodings well; i.e., using both latin1 and UTF-8, or if any unknown encodings are non-ascii and some of those are marked known and others not. But if either latin1 or UTF-8 is used exclusively, and all unknown encodings are ascii, then the result should be ok. In future we will check for you and avoid this warning if everything is [...]
-                // TO DO: check and warn in forder whether any strings are non-ascii (>127) but unknown encoding
-                //        check in forder whether both latin1 and UTF-8 have been used
-                //        See bugs #5159 and #5266 and related #5295 to revisit
-                enc_warn = FALSE;  // just warn once
-            }
+            xval.s = ENC2UTF8(STRING_ELT(xc, XIND(mid)));
             tmp = StrCmp(xval.s, ival.s);  // uses pointer equality first, NA_STRING are allowed and joined to, then uses strcmp on CHAR().
             if (tmp == 0) {                // TO DO: deal with mixed encodings and locale optionally
                 tmplow = mid;
                 tmpupp = mid;
                 while(tmplow<xupp-1) {
                     mid = tmplow + (xupp-tmplow)/2;
-                    xval.s = STRING_ELT(xc, XIND(mid));
-                    if (ival.s == xval.s) tmplow=mid; else xupp=mid;  // the == here assumes (within this column) no mixing of latin1 and UTF-8, and no unknown non-ascii
-                }                                                     // TO DO: add checks to forder, see above.
+                    xval.s = ENC2UTF8(STRING_ELT(xc, XIND(mid)));
+                    if (ival.s == xval.s) tmplow=mid; else xupp=mid;  // the == here handles encodings as well. Marked non-utf8 encodings are converted to utf-8 using ENC2UTF8.
+                }
                 while(xlow<tmpupp-1) {
                     mid = xlow + (tmpupp-xlow)/2;
-                    xval.s = STRING_ELT(xc, XIND(mid));
+                    xval.s = ENC2UTF8(STRING_ELT(xc, XIND(mid)));
                     if (ival.s == xval.s) tmpupp=mid; else xlow=mid;  // see above re ==
                 }
                 break;
@@ -202,75 +327,144 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int lowma
         tmpupp = lir;
         while(tmplow<iupp-1) {
             mid = tmplow + (iupp-tmplow)/2;
-            xval.s = STRING_ELT(ic, o ? o[mid]-1 : mid);
+            xval.s = ENC2UTF8(STRING_ELT(ic, o ? o[mid]-1 : mid));
             if (xval.s == ival.s) tmplow=mid; else iupp=mid;   // see above re ==
         }
         while(ilow<tmpupp-1) {
             mid = ilow + (tmpupp-ilow)/2;
-            xval.s = STRING_ELT(ic, o ? o[mid]-1 : mid);
+            xval.s = ENC2UTF8(STRING_ELT(ic, o ? o[mid]-1 : mid));
             if (xval.s == ival.s) tmpupp=mid; else ilow=mid;   // see above re == 
         }
         break;
     case REALSXP :
-        class = getAttrib(xc, R_ClassSymbol);
-        twiddle = (isString(class) && STRING_ELT(class, 0)==char_integer64) ? &i64twiddle : &dtwiddle;
-        ival.ll = twiddle(DATAPTR(ic), ir, 1);
+        isInt64 = INHERITS(xc, char_integer64);
+        twiddle = isInt64 ? &i64twiddle : &dtwiddle;
+        ival.ull = twiddle(DATAPTR(ic), ir, 1);
         while(xlow < xupp-1) {
             mid = xlow + (xupp-xlow)/2;
-            xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
-            if (xval.ll<ival.ll) {
+            xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
+            if (xval.ull<ival.ull) {
                 xlow=mid;
-            } else if (xval.ll>ival.ll) {
+            } else if (xval.ull>ival.ull) {
                 xupp=mid;
-            } else { // xval.ll == ival.ll) 
+            } else { // xval.ull == ival.ull) 
                 tmplow = mid;
                 tmpupp = mid;
                 while(tmplow<xupp-1) {
                     mid = tmplow + (xupp-tmplow)/2;
-                    xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
-                    if (xval.ll == ival.ll) tmplow=mid; else xupp=mid;
+                    xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
+                    if (xval.ull == ival.ull) tmplow=mid; else xupp=mid;
                 }
                 while(xlow<tmpupp-1) {
                     mid = xlow + (tmpupp-xlow)/2;
-                    xval.ll = twiddle(DATAPTR(xc), XIND(mid), 1);
-                    if (xval.ll == ival.ll) tmpupp=mid; else xlow=mid;
+                    xval.ull = twiddle(DATAPTR(xc), XIND(mid), 1);
+                    if (xval.ull == ival.ull) tmpupp=mid; else xlow=mid;
                 }
                 break;
             }
         }
+        if (col>-1 && op[col] != EQ) {
+            Rboolean isivalNA = !isInt64 ? ISNAN(REAL(ic)[ir]) : (*(long long *)&REAL(ic)[ir] == NAINT64);
+            switch (op[col]) {
+            case LE : if (!isivalNA) xlow = xlowIn; break;
+            case LT : xupp = xlow + 1; if (!isivalNA) xlow = xlowIn; break;
+            case GE : if (!isivalNA) xupp = xuppIn; break;
+            case GT : xlow = xupp - 1; if (!isivalNA) xupp = xuppIn; break;
+            }
+            // for LE/LT cases, we need to ensure xlow excludes NA indices, != EQ is checked above already
+            if (op[col] <= 3 && xlow<xupp-1 && !isivalNA && (!isInt64 ? ISNAN(REAL(xc)[XIND(xlow+1)]) : (*(long long *)&REAL(xc)[XIND(xlow+1)] == NAINT64))) {
+                tmplow = xlow; tmpupp = xupp;
+                while (tmplow < tmpupp-1) {
+                    mid = tmplow + (tmpupp-tmplow)/2;
+                    xval.d = REAL(xc)[XIND(mid)];
+                    if (!isInt64 ? ISNAN(xval.d) : xval.ll == NAINT64) tmplow = mid; else tmpupp = mid;
+                }
+                xlow = tmplow; // tmplow is the index of last NA value
+            }
+        }
         tmplow = lir;
         tmpupp = lir;
-        while(tmplow<iupp-1) {
-            mid = tmplow + (iupp-tmplow)/2;
-            xval.ll = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
-            if (xval.ll == ival.ll) tmplow=mid; else iupp=mid;
-        }
-        while(ilow<tmpupp-1) {
-            mid = ilow + (tmpupp-ilow)/2;
-            xval.ll = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
-            if (xval.ll == ival.ll) tmpupp=mid; else ilow=mid;
+        if (col>-1) {
+            while(tmplow<iupp-1) {
+                mid = tmplow + (iupp-tmplow)/2;
+                xval.ull = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
+                if (xval.ull == ival.ull) tmplow=mid; else iupp=mid;
+            }
+            while(ilow<tmpupp-1) {
+                mid = ilow + (tmpupp-ilow)/2;
+                xval.ull = twiddle(DATAPTR(ic), o ? o[mid]-1 : mid, 1 );
+                if (xval.ull == ival.ull) tmpupp=mid; else ilow=mid;
+            }
         }
+        // ilow and iupp now surround the group in ic, too
         break;
     default:
         error("Type '%s' not supported as key column", type2char(TYPEOF(xc)));
     }
-    
-    if (xlow<xupp-1) {      // if value found, low and upp surround it, unlike standard binary search where low falls on it
-        if (col<ncol-1) bmerge_r(xlow, xupp, ilow, iupp, col+1, 1, 1);  // final two 1's are lowmax and uppmax
-        else {
+    if (xlow<xupp-1) { // if value found, low and upp surround it, unlike standard binary search where low falls on it
+        if (col<ncol-1) {
+            bmerge_r(xlow, xupp, ilow, iupp, col+1, thisgrp, 1, 1);
+            // final two 1's are lowmax and uppmax
+        } else {
             int len = xupp-xlow-1;
-            if (len>1) allLen1[0] = FALSE;
-            for (j=ilow+1; j<iupp; j++) {   // usually iterates once only for j=ir
-                k = o ? o[j]-1 : j;
-                retFirst[k] = xlow+2;       // extra +1 for 1-based indexing at R level
-                retLength[k]= len; 
+            if (mult==ALL && len>1) allLen1[0] = FALSE;
+            if (nqmaxgrp == 1) {
+                for (j=ilow+1; j<iupp; j++) {   // usually iterates once only for j=ir
+                    k = o ? o[j]-1 : j;
+                    retFirst[k] = (mult != LAST) ? xlow+2 : xupp; // extra +1 for 1-based indexing at R level
+                    retLength[k]= (mult == ALL) ? len : 1;
+                    // retIndex initialisation is taken care of in bmerge and doesn't change for thisgrp=1
+                }
+            } else {
+                // non-equi join
+                for (j=ilow+1; j<iupp; j++) {
+                    k = o ? o[j]-1 : j;
+                    if (retFirst[k] != nomatch) {
+                        if (mult == ALL) {
+                            // for this irow, we've matches on more than one group
+                            allGrp1[0] = FALSE;
+                            retFirst[ctr+ilen] = xlow+2;
+                            retLength[ctr+ilen] = len;
+                            retIndex[ctr+ilen] = k+1;
+                            ++ctr;
+                            if (ctr+ilen >= anslen) {
+                                anslen = 1.1*anslen;
+                                tmpptr = Realloc(retFirst, anslen, int);
+                                if (tmpptr != NULL) retFirst = tmpptr; 
+                                else error("Error in reallocating memory in non-equi joins.\n");
+                                tmpptr = Realloc(retLength, anslen, int);
+                                if (tmpptr != NULL) retLength = tmpptr; 
+                                else error("Error in reallocating memory in non-equi joins.\n");
+                                tmpptr = Realloc(retIndex, anslen, int);
+                                if (tmpptr != NULL) retIndex = tmpptr; 
+                                else error("Error in reallocating memory in non-equi joins.\n");
+                            }
+                        } else if (mult == FIRST) {
+                            retFirst[k] = (XIND(retFirst[k]-1) > XIND(xlow+1)) ? xlow+2 : retFirst[k];
+                            retLength[k] = 1;
+                        } else {
+                            retFirst[k] = (XIND(retFirst[k]-1) < XIND(xupp-1)) ? xupp : retFirst[k];
+                            retLength[k] = 1;
+                        }
+                    } else {
+                        // none of the groups so far have filled in for this index. So use it!
+                        if (mult == ALL) {
+                            retFirst[k] = xlow+2;
+                            retLength[k] = len;
+                            retIndex[k] = k+1;
+                            // no need to increment ctr of course
+                        } else {
+                            retFirst[k] = (mult == FIRST) ? xlow+2 : xupp;
+                            retLength[k] = 1;
+                        }
+                    }
+                }
             }
         }
-    }
-    else if (roll!=0.0 && col==ncol-1) {
+    } else if (roll!=0.0 && col==ncol-1) {
         // runs once per i row (not each search test), so not hugely time critical
         if (xlow != xupp-1 || xlow<xlowIn || xupp>xuppIn) error("Internal error: xlow!=xupp-1 || xlow<xlowIn || xupp>xuppIn");
-        if (nearest) {   // value of roll ignored currently when nearest
+        if (rollToNearest) {   // value of roll ignored currently when nearest
             if ( (!lowmax || xlow>xlowIn) && (!uppmax || xupp<xuppIn) ) {
                 if (  ( TYPEOF(ic)==REALSXP && REAL(ic)[ir]-REAL(xc)[XIND(xlow)] <= REAL(xc)[XIND(xupp)]-REAL(ic)[ir] )
                    || ( TYPEOF(ic)<=INTSXP && INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)] <= INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir] )) {
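The roll branch above runs once per unmatched i row: with roll>0 the preceding x observation is carried forward, provided the gap is within rollabs (with a 1e-6 tolerance for doubles). A standalone sketch of that forward roll on sorted doubles (hypothetical names; the real code also handles rollends, roll<0 and roll="nearest"):

```c
#include <assert.h>

/* For a query value q in sorted xs, roll forward: return the index of the
   last xs[j] <= q if the gap is within rollabs, else -1 (no match),
   echoing retFirst's nomatch default. */
int roll_forward(const double *xs, int n, double q, double rollabs) {
    int lo = -1, hi = n;
    while (lo < hi - 1) {            /* lo ends on the last element <= q */
        int mid = lo + (hi - lo) / 2;
        if (xs[mid] <= q) lo = mid; else hi = mid;
    }
    if (lo < 0) return -1;           /* q precedes all of xs */
    /* 1e-6 tolerance mirrors the rollabs comparison above */
    return (q - xs[lo] - rollabs < 1e-6) ? lo : -1;
}
```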
@@ -288,37 +482,68 @@ void bmerge_r(int xlowIn, int xuppIn, int ilowIn, int iuppIn, int col, int lowma
                 retLength[ir] = 1;
             }
         } else {
+            // Regular roll=TRUE|+ve|-ve
+            // Fixed issues: #1405, #1650, #1007
+            // TODO: incorporate the twiddle logic for roll as well, instead of a tolerance?
             if ( (   (roll>0.0 && (!lowmax || xlow>xlowIn) && (xupp<xuppIn || !uppmax || rollends[1]))
                   || (roll<0.0 && xupp==xuppIn && uppmax && rollends[1]) )
-              && (   (TYPEOF(ic)==REALSXP && (REAL(ic)[ir]-REAL(xc)[XIND(xlow)]-rollabs<1e-6 || 
-                                              REAL(ic)[ir]-REAL(xc)[XIND(xlow)] == rollabs)) // #1007 fix
-                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)])-rollabs<1e-6 ) 
+              && (   (TYPEOF(ic)==REALSXP &&
+                      (ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xlow)], 1) &&
+                     (( !isInt64 &&
+                        (ival.d-xval.d-rollabs < 1e-6 || 
+                         ival.d-xval.d == rollabs /*#1007*/))
+                   || ( isInt64 &&
+                        (double)(ival.ll-xval.ll)-rollabs < 1e-6 ) ))  // cast to double for when rollabs==Inf
+                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(ic)[ir]-INTEGER(xc)[XIND(xlow)])-rollabs < 1e-6 )
                   || (TYPEOF(ic)==STRSXP)   )) {
                 retFirst[ir] = xlow+1;
                 retLength[ir] = 1;
             } else if
                (  (  (roll<0.0 && (!uppmax || xupp<xuppIn) && (xlow>xlowIn || !lowmax || rollends[0]))
                   || (roll>0.0 && xlow==xlowIn && lowmax && rollends[0]) )
-              && (   (TYPEOF(ic)==REALSXP && (REAL(xc)[XIND(xupp)]-REAL(ic)[ir]-rollabs<1e-6 || 
-                                              REAL(xc)[XIND(xupp)]-REAL(ic)[ir] == rollabs)) // 1007 fix
-                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs<1e-6 )
+              && (   (TYPEOF(ic)==REALSXP &&
+                      (ival.d = REAL(ic)[ir], xval.d = REAL(xc)[XIND(xupp)], 1) &&
+                     (( !isInt64 &&
+                        (xval.d-ival.d-rollabs < 1e-6 || 
+                         xval.d-ival.d == rollabs /*#1007*/))
+                   || ( isInt64 &&
+                        (double)(xval.ll-ival.ll)-rollabs < 1e-6 ) ))
+                  || (TYPEOF(ic)<=INTSXP && (double)(INTEGER(xc)[XIND(xupp)]-INTEGER(ic)[ir])-rollabs < 1e-6 )
                   || (TYPEOF(ic)==STRSXP)   )) {
                 retFirst[ir] = xupp+1;   // == xlow+2
                 retLength[ir] = 1;
             }
         }
-        if (iupp-ilow > 2 && retFirst[ir]!=NA_INTEGER) {  // >=2 equal values in the last column being rolling to the same point.  
-            for (j=ilow+1; j<iupp; j++) {                 // will rewrite retFirst[ir] to itself, but that's ok
+        if (iupp-ilow > 2 && retFirst[ir]!=NA_INTEGER) {
+            // >=2 equal values in the last column being rolled to the same point.
+            for (j=ilow+1; j<iupp; j++) {
+                // will rewrite retFirst[ir] to itself, but that's ok
                 if (o) k=o[j]-1; else k=j;
                 retFirst[k] = retFirst[ir];
                 retLength[k]= retLength[ir]; 
             }
         }
     }
-    if (ilow>ilowIn && (xlow>xlowIn || (roll!=0.0 && col==ncol-1)))
-        bmerge_r(xlowIn, xlow+1, ilowIn, ilow+1, col, lowmax, uppmax && xlow+1==xuppIn);
-    if (iupp<iuppIn && (xupp<xuppIn || (roll!=0.0 && col==ncol-1)))
-        bmerge_r(xupp-1, xuppIn, iupp-1, iuppIn, col, lowmax && xupp-1==xlowIn, uppmax);
+    switch (op[col]) {
+    case EQ:
+        if (ilow>ilowIn && (xlow>xlowIn || ((roll!=0.0 || op[col] != EQ) && col==ncol-1)))
+            bmerge_r(xlowIn, xlow+1, ilowIn, ilow+1, col, 1, lowmax, uppmax && xlow+1==xuppIn);
+        if (iupp<iuppIn && (xupp<xuppIn || ((roll!=0.0 || op[col] != EQ) && col==ncol-1)))
+            bmerge_r(xupp-1, xuppIn, iupp-1, iuppIn, col, 1, lowmax && xupp-1==xlowIn, uppmax);
+    break;
+    case LE: case LT: case GE: case GT:
+        // roll is not yet implemented for non-equi operators; both branches recurse over the full x range
+        if (ilow>ilowIn)
+            bmerge_r(xlowIn, xuppIn, ilowIn, ilow+1, col, 1, lowmax, uppmax && xlow+1==xuppIn);
+        if (iupp<iuppIn)
+            bmerge_r(xlowIn, xuppIn, iupp-1, iuppIn, col, 1, lowmax && xupp-1==xlowIn, uppmax);
+    break;
+    }
 }
-
-
diff --git a/src/data.table.h b/src/data.table.h
index c6cbc31..354b1e7 100644
--- a/src/data.table.h
+++ b/src/data.table.h
@@ -1,10 +1,29 @@
 #include <R.h>
 #define USE_RINTERNALS
 #include <Rinternals.h>
-// #include <omp.h>
+#include <Rversion.h>
+#ifdef _OPENMP
+  #include <omp.h>
+#else // so it still compiles on machines whose compilers lack OpenMP support
+  #define omp_get_num_threads() 1
+  #define omp_get_thread_num() 0
+#endif
 // #include <signal.h> // the debugging machinery + breakpoint aidee
 // raise(SIGINT);
 
+// Fixes R-Forge #5150, and #1641
+// a simple check of the R version to decide whether the type should be R_len_t or
+// R_xlen_t; long vector support was added in R 3.0.0
+#if defined(R_VERSION) && R_VERSION >= R_Version(3, 0, 0)
+  typedef R_xlen_t RLEN;
+#else
+  typedef R_len_t RLEN;
+#endif
+
+#define IS_UTF8(x)  (LEVELS(x) & 8)
+#define IS_ASCII(x) (LEVELS(x) & 64)
+#define IS_LATIN(x) (LEVELS(x) & 4)
+
 #define SIZEOF(x) sizes[TYPEOF(x)]
 #ifdef MIN
 #undef MIN
@@ -15,6 +34,11 @@
 // init.c
 void setSizes();
 SEXP char_integer64;
+SEXP char_ITime;
+SEXP char_IDate;
+SEXP char_Date;
+SEXP char_POSIXct;
+Rboolean INHERITS(SEXP x, SEXP char_); 
 
 // dogroups.c
 SEXP keepattr(SEXP to, SEXP from);
@@ -66,13 +90,26 @@ SEXP alloccol(SEXP dt, R_len_t n, Rboolean verbose);
 void memrecycle(SEXP target, SEXP where, int r, int len, SEXP source);
 SEXP shallowwrapper(SEXP dt, SEXP cols);
 
-SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEXP xjiscols, SEXP grporder, SEXP order, SEXP starts, SEXP lens, SEXP jexp, SEXP env, SEXP lhs, SEXP newnames, SEXP on, SEXP verbose);
+SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, 
+                SEXP xjiscols, SEXP grporder, SEXP order, SEXP starts, 
+                SEXP lens, SEXP jexp, SEXP env, SEXP lhs, SEXP newnames, 
+                SEXP on, SEXP verbose);
 
 // bmerge.c
-SEXP bmerge(SEXP left, SEXP right, SEXP leftcols, SEXP rightcols, SEXP isorted, SEXP xoArg, SEXP rollarg, SEXP rollends, SEXP nomatch, SEXP retFirst, SEXP retLength, SEXP allLen1);
-
-// fcast.c
-SEXP coerce_to_char(SEXP s, SEXP env);
+SEXP bmerge(SEXP iArg, SEXP xArg, SEXP icolsArg, SEXP xcolsArg, SEXP isorted, 
+                SEXP xoArg, SEXP rollarg, SEXP rollendsArg, SEXP nomatchArg, 
+                SEXP multArg, SEXP opArg, SEXP nqgrpArg, SEXP nqmaxgrpArg);
+SEXP ENC2UTF8(SEXP s);
 
 // rbindlist.c
-SEXP combineFactorLevels(SEXP factorLevels, int * factorType, Rboolean * isRowOrdered);
+SEXP combineFactorLevels(SEXP factorLevels, int *factorType, Rboolean *isRowOrdered);
+
+// quickselect
+double dquickselect(double *x, int n, int k);
+double iquickselect(int *x, int n, int k);
+
+// openmp-utils.c
+int getDTthreads();
+void avoid_openmp_hang_within_fork();
+
+
diff --git a/src/dogroups.c b/src/dogroups.c
index c23e41a..255d49c 100644
--- a/src/dogroups.c
+++ b/src/dogroups.c
@@ -27,13 +27,8 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
 {
     R_len_t i, j, k, rownum, ngrp, njval=0, ngrpcols, ansloc=0, maxn, estn=-1, r, thisansloc, grpn, thislen, igrp, vlen, origIlen=0, origSDnrow=0;
     int protecti=0;
-    SEXP names, names2, xknames, bynames, dtnames, ans=NULL, jval, thiscol, SD, BY, N, I, GRP, iSD, xSD, rownames, s, RHS, listwrap, target, source;
-    SEXP *nameSyms, *xknameSyms;
+    SEXP names, names2, xknames, bynames, dtnames, ans=NULL, jval, thiscol, SDall, BY, N, I, GRP, iSD, xSD, rownames, s, RHS, listwrap, target, source, tmp;
     Rboolean wasvector, firstalloc=FALSE, NullWarnDone=FALSE, recycleWarn=TRUE;
-    #if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
-        SEXP dupcol;
-        int named=0;
-    #endif
     size_t size; // must be size_t, otherwise bug #5305 (integer overflow in memcpy)
     clock_t tstart=0, tblock[10]={0}; int nblock[10]={0};
 
@@ -46,7 +41,8 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
     if(!isEnvironment(env)) error("'env' should be an environment");
     ngrp = length(starts);  // the number of groups  (nrow(groups) will be larger when by)
     ngrpcols = length(grpcols);
-    SD = findVar(install(".SD"), env);
+    // fix for longstanding FR/bug #495. E.g., DT[, c(sum(v1), lapply(.SD, mean)), by=grp, .SDcols=v2:v3] resulted in an error. The idea: 1) we create .SDall, which is normally == .SD. But if extra vars other than .SD are detected in jexp, then .SD becomes a shallow copy of .SDall with only .SDcols in .SD. Since internally we don't make a copy, changing .SDall will be reflected in .SD. Hopefully this'll work out :-).
+    SDall = findVar(install(".SDall"), env);
     
     defineVar(install(".BY"), BY = allocVector(VECSXP, ngrpcols), env);
     bynames = PROTECT(allocVector(STRSXP, ngrpcols));  protecti++;   // TO DO: do we really need bynames, can we assign names afterwards in one step?
@@ -75,7 +71,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
     dtnames = getAttrib(dt, R_NamesSymbol); // added here to fix #4990 - `:=` did not issue recycling warning during "by"
     // fetch rownames of .SD.  rownames[1] is set to -thislen for each group, in case .SD is passed to
     // non data.table aware package that uses rownames
-    for (s = ATTRIB(SD); s != R_NilValue && TAG(s)!=R_RowNamesSymbol; s = CDR(s));
+    for (s = ATTRIB(SDall); s != R_NilValue && TAG(s)!=R_RowNamesSymbol; s = CDR(s));
     // getAttrib0 basically but that's hidden in attrib.c
     if (s==R_NilValue) error("row.names attribute of .SD not found");
     rownames = CAR(s);
@@ -83,25 +79,23 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
     
     // fetch names of .SD and prepare symbols. In case they are copied-on-write by user assigning to those variables
     // using <- in j (which is valid, useful and tested), they are repointed to the .SD cols for each group.
-    names = getAttrib(SD, R_NamesSymbol);
-    if (length(names) != length(SD)) error("length(names)!=length(SD)");
-    nameSyms = Calloc(length(names), SEXP);
-    if (!nameSyms) error("Calloc failed to allocate %d nameSyms in dogroups",length(names));
-    for(i = 0; i < length(SD); i++) {
-        if (SIZEOF(VECTOR_ELT(SD, i))==0)
-            error("Type %d in .SD column %d", TYPEOF(VECTOR_ELT(SD, i)), i);
+    names = getAttrib(SDall, R_NamesSymbol);
+    if (length(names) != length(SDall)) error("length(names)!=length(SDall)");
+    SEXP *nameSyms = (SEXP *)R_alloc(length(names), sizeof(SEXP));
+    for(i = 0; i < length(SDall); i++) {
+        if (SIZEOF(VECTOR_ELT(SDall, i))==0)
+            error("Type %d in .SD column %d", TYPEOF(VECTOR_ELT(SDall, i)), i);
         nameSyms[i] = install(CHAR(STRING_ELT(names, i)));
         // fixes http://stackoverflow.com/questions/14753411/why-does-data-table-lose-class-definition-in-sd-after-group-by
-        copyMostAttrib(VECTOR_ELT(dt,INTEGER(dtcols)[i]-1), VECTOR_ELT(SD,i));  // not names, otherwise test 778 would fail
+        copyMostAttrib(VECTOR_ELT(dt,INTEGER(dtcols)[i]-1), VECTOR_ELT(SDall,i));  // not names, otherwise test 778 would fail
     }
     
     origIlen = length(I);  // test 762 has length(I)==1 but nrow(SD)==0
-    if (length(SD)) origSDnrow = length(VECTOR_ELT(SD, 0));
+    if (length(SDall)) origSDnrow = length(VECTOR_ELT(SDall, 0));
 
     xknames = getAttrib(xSD, R_NamesSymbol);
     if (length(xknames) != length(xSD)) error("length(xknames)!=length(xSD)");
-    xknameSyms = Calloc(length(xknames), SEXP);
-    if (!xknameSyms) error("Calloc failed to allocate %d xknameSyms in dogroups",length(xknames));
+    SEXP *xknameSyms = (SEXP *)R_alloc(length(xknames), sizeof(SEXP));
     for(i = 0; i < length(xSD); i++) {
         if (SIZEOF(VECTOR_ELT(xSD, i))==0)
             error("Type %d in .xSD column %d", TYPEOF(VECTOR_ELT(xSD, i)), i);
@@ -118,7 +112,13 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
     
     ansloc = 0;
     for(i=0; i<ngrp; i++) {   // even for an empty i table, ngroup is length 1 (starts is value 0), for consistency of empty cases
-        if (INTEGER(starts)[i] == 0 && i>0) continue; // replaced (i>0 || !isNull(lhs)) with i>0 to fix #5376
+
+        if (INTEGER(starts)[i]==0 && (i<ngrp-1 || estn>-1)) continue;
+        // Previously had replaced (i>0 || !isNull(lhs)) with i>0 to fix #5376
+        // The above is now to fix #1993, see test 1746.
+        // In cases where no i rows match, '|| estn>-1' ensures that the last empty group creates an empty result.
+        // TODO: revisit and tidy
+        
         if (!isNull(lhs) &&
                (INTEGER(starts)[i] == NA_INTEGER ||
                 (LENGTH(order) && INTEGER(order)[ INTEGER(starts)[i]-1 ]==NA_INTEGER)))
@@ -134,7 +134,10 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
                    (char *)DATAPTR(VECTOR_ELT(groups,INTEGER(jiscols)[j]-1))+i*size,
                    size);  
         }
-        igrp = (length(grporder) && !LOGICAL(on)[0]) ? INTEGER(grporder)[INTEGER(starts)[i]-1]-1 : (isNull(jiscols) ? INTEGER(starts)[i]-1 : i);
+        if (LOGICAL(on)[0])
+            igrp = (length(grporder) && isNull(jiscols)) ? INTEGER(grporder)[INTEGER(starts)[i]-1]-1 : i;
+        else
+            igrp = (length(grporder)) ? INTEGER(grporder)[INTEGER(starts)[i]-1]-1 : (isNull(jiscols) ? INTEGER(starts)[i]-1 : i);
         if (igrp >= 0) for (j=0; j<length(BY); j++) {    // igrp can be -1 so 'if' is important, otherwise memcpy crash
             size = SIZEOF(VECTOR_ELT(BY,j));
             memcpy((char *)DATAPTR(VECTOR_ELT(BY,j)),  // ok use of memcpy size 1. Loop'd through columns not rows
@@ -142,19 +145,19 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
                    size);
         }
         if (INTEGER(starts)[i] == NA_INTEGER || (LENGTH(order) && INTEGER(order)[ INTEGER(starts)[i]-1 ]==NA_INTEGER)) {
-            for (j=0; j<length(SD); j++) {
-                switch (TYPEOF(VECTOR_ELT(SD, j))) {
+            for (j=0; j<length(SDall); j++) {
+                switch (TYPEOF(VECTOR_ELT(SDall, j))) {
                 case LGLSXP :
-                    LOGICAL(VECTOR_ELT(SD,j))[0] = NA_LOGICAL;
+                    LOGICAL(VECTOR_ELT(SDall,j))[0] = NA_LOGICAL;
                     break;
                 case INTSXP :
-                    INTEGER(VECTOR_ELT(SD,j))[0] = NA_INTEGER;
+                    INTEGER(VECTOR_ELT(SDall,j))[0] = NA_INTEGER;
                     break;
                 case REALSXP :
-                    REAL(VECTOR_ELT(SD,j))[0] = NA_REAL;
+                    REAL(VECTOR_ELT(SDall,j))[0] = NA_REAL;
                     break;
                 case STRSXP :
-                    SET_STRING_ELT(VECTOR_ELT(SD,j),0,NA_STRING);
+                    SET_STRING_ELT(VECTOR_ELT(SDall,j),0,NA_STRING);
                     break;
                 default:
                     error("Logical error. Type of column should have been checked by now");
@@ -184,9 +187,9 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
             if (LOGICAL(verbose)[0]) tstart = clock();
             if (LENGTH(order)==0) {
                 rownum = INTEGER(starts)[i]-1;
-                for (j=0; j<length(SD); j++) {
-                    size = SIZEOF(VECTOR_ELT(SD,j));
-                    memcpy((char *)DATAPTR(VECTOR_ELT(SD,j)),  // direct memcpy best here, for usually large size groups. by= each row is slow and not recommended anyway, so we don't mind there's no switch here for grpn==1
+                for (j=0; j<length(SDall); j++) {
+                    size = SIZEOF(VECTOR_ELT(SDall,j));
+                    memcpy((char *)DATAPTR(VECTOR_ELT(SDall,j)),  // direct memcpy best here, for usually large size groups. by= each row is slow and not recommended anyway, so we don't mind there's no switch here for grpn==1
                        (char *)DATAPTR(VECTOR_ELT(dt,INTEGER(dtcols)[j]-1))+rownum*size,
                        grpn*size);
                     // SD is our own alloc'd memory, and the source (DT) is protected throughout, so no need for SET_* overhead
@@ -202,9 +205,9 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
             } else {
                 // Fairly happy with this block. No need for SET_* here. See comment above. 
                 for (k=0; k<grpn; k++) INTEGER(I)[k] = INTEGER(order)[ INTEGER(starts)[i]-1 + k ];
-                for (j=0; j<length(SD); j++) {
-                    size = SIZEOF(VECTOR_ELT(SD,j));
-                    target = VECTOR_ELT(SD,j);
+                for (j=0; j<length(SDall); j++) {
+                    size = SIZEOF(VECTOR_ELT(SDall,j));
+                    target = VECTOR_ELT(SDall,j);
                     source = VECTOR_ELT(dt,INTEGER(dtcols)[j]-1);
                     if (size==4) {
                         for (k=0; k<grpn; k++) {
@@ -223,9 +226,9 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
             }
         }
         INTEGER(rownames)[1] = -grpn;  // the .set_row_names() of .SD. Not .N when nomatch=NA and this is a nomatch
-        for (j=0; j<length(SD); j++) {
-            SETLENGTH(VECTOR_ELT(SD,j), grpn);
-            defineVar(nameSyms[j], VECTOR_ELT(SD, j), env);
+        for (j=0; j<length(SDall); j++) {
+            SETLENGTH(VECTOR_ELT(SDall,j), grpn);
+            defineVar(nameSyms[j], VECTOR_ELT(SDall, j), env);
             // In case user's j assigns to the columns names (env is static) (tests 387 and 388)
             // nameSyms pre-stored to save repeated install() for efficiency.
         }
@@ -271,9 +274,12 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
                     // first time adding to new column
                     if (isNull(RHS)) error("RHS is NULL when grouping :=. Makes no sense to delete a column by group. Perhaps use an empty vector instead.");
                     if (TRUELENGTH(dt) < INTEGER(lhs)[j]) error("Internal error: Trying to add new column by reference but tl is full; alloc.col should have run first at R level before getting to this point in dogroups");
+                    tmp = PROTECT(allocNAVector(TYPEOF(RHS), LENGTH(VECTOR_ELT(dt,0))));
+                    // increment length only if the allocation passes, #1676
                     SETLENGTH(dtnames, LENGTH(dtnames)+1);
                     SETLENGTH(dt, LENGTH(dt)+1);
-                    SET_VECTOR_ELT(dt, INTEGER(lhs)[j]-1, allocNAVector(TYPEOF(RHS), LENGTH(VECTOR_ELT(dt,0))));
+                    SET_VECTOR_ELT(dt, INTEGER(lhs)[j]-1, tmp);
+                    UNPROTECT(1);
                     // Even if we could know reliably to switch from allocNAVector to allocVector for slight speedup, user code could still contain a switched halt, and in that case we'd want the groups not yet done to have NA rather than uninitialized or 0.
                     // dtnames = getAttrib(dt, R_NamesSymbol); // commented this here and added it on the beginning to fix #4990
                     SET_STRING_ELT(dtnames, INTEGER(lhs)[j]-1, STRING_ELT(newnames, INTEGER(lhs)[j]-origncol-1));
@@ -283,42 +289,22 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
                 size = SIZEOF(target);
                 vlen = length(RHS);
                 if (vlen==0) continue;
-                if (vlen>grpn && j<LENGTH(jval)) warning("RHS %d is length %d (greater than the size (%d) of group %d). The last %d element(s) will be discarded.", j+1, vlen, grpn, i+1, vlen-grpn);
+                if (grpn>0 && vlen>grpn && j<LENGTH(jval)) warning("RHS %d is length %d (greater than the size (%d) of group %d). The last %d element(s) will be discarded.", j+1, vlen, grpn, i+1, vlen-grpn);
                 // fix for #4990 - `:=` did not issue recycling warning during "by" operation.
                 if (vlen<grpn && vlen>0 && grpn%vlen != 0) 
                     warning("Supplied %d items to be assigned to group %d of size %d in column '%s' (recycled leaving remainder of %d items).",vlen,i+1,grpn,CHAR(STRING_ELT(dtnames,INTEGER(lhs)[j]-1)),grpn%vlen);
-                // fix for issues/481 for := case
-                // missed it in commit: https://github.com/Rdatatable/data.table/commit/86276f48798491d328caa72f6ebcce4d51649440
-                // see that link (or scroll down for the non := version) for comments
-                #if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
-                named=0;
-                if (isNewList(RHS) && NAMED(RHS) != 2) {
-                    dupcol = VECTOR_ELT(RHS, 0);
-                    named  = NAMED(dupcol);
-                    while(isNewList(dupcol)) {
-                        if (named == 2) break;
-                        else {
-                            dupcol = VECTOR_ELT(dupcol, 0);
-                            named = NAMED(dupcol);
-                        }
-                    }
-                    if (named == 2) RHS = PROTECT(duplicate(RHS));
-                }
-                memrecycle(target, order, INTEGER(starts)[i]-1, grpn, RHS);
-                if (named == 2) UNPROTECT(1);
-                #else
+                
                 memrecycle(target, order, INTEGER(starts)[i]-1, grpn, RHS);
-                #endif
                 
-                // fixes bug #2531. Got to set the class back. See comment below for explanation. This is the new fix. Works great!
-                // Also fix for #5437 (bug due to regression in 1.9.2+)
-                copyMostAttrib(RHS, target);  // not names, otherwise test 778 would fail
+                copyMostAttrib(RHS, target);  // not names, otherwise test 778 would fail.  
                 /* OLD FIX: commented now. The fix below resulted in segfault on factor columns because I dint set the "levels"
                    Instead of fixing that, I just removed setting class if it's factor. Not appropriate fix.
                    Correct fix of copying all attributes (except names) added above. Now, everything should be alright.
                    Test 1144 (#5104) will provide the right output now. Modified accordingly.
                 OUTDATED: if (!isFactor(RHS)) setAttrib(target, R_ClassSymbol, getAttrib(RHS, R_ClassSymbol));
-                OUTDATED: // added !isFactor(RHS) to fix #5104 (side-effect of fixing #2531) */
+                OUTDATED: // added !isFactor(RHS) to fix #5104 (side-effect of fixing #2531)
+                   See also #155 and #36 */
+
             }
             UNPROTECT(1);
             continue;
@@ -333,7 +319,6 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
         if (ansloc + maxn > estn) {
             if (estn == -1) {
                 // Given first group and j's result on it, make a good guess for size of result required.
-                // This was 'byretn' in R in v1.8.0, now here in C from v1.8.1 onwards.
                 if (grpn==0)
                     estn = maxn = 0;   // empty case e.g. test 184. maxn is 1 here due to sum(integer()) == 0L
                 else if (maxn==1) // including when grpn==1 we default to assuming it's an aggregate
@@ -422,6 +407,9 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
                 case STRSXP :
                     for (r=0; r<maxn; r++) SET_STRING_ELT(target,thisansloc+r,NA_STRING);
                     break;
+                case VECSXP :
+                    for (r=0; r<maxn; r++) SET_VECTOR_ELT(target,thisansloc+r,R_NilValue);  // NULL, not NA_STRING, marks a missing list element
+                    break;
                 default:
                     error("Logical error. Type of column should have been checked by now");
                 }
@@ -433,35 +421,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
                 warning("Column %d of result for group %d is length %d but the longest column in this result is %d. Recycled leaving remainder of %d items. This warning is once only for the first group with this issue.",j+1,i+1,thislen,maxn,maxn%thislen);
                 recycleWarn = FALSE;
             }
-            // fix for issues/481
-            #if defined(R_VERSION) && R_VERSION >= R_Version(3, 1, 0)
-            // added version because, for ex: DT[, list(list(unique(y))), by=x] gets duplicated
-            // because unique(y) returns NAMED(2). So, do it only if v>= 3.1.0. If <3.1.0,
-            // it gets duplicated anyway, so avoid copying twice!
-            named=0;
-            if (isNewList(source) && NAMED(source) != 2) {
-                // NAMED(source) != 2 prevents DT[, list(y), by=x] where 'y' is already a list 
-                // or data.table and 99% of cases won't clear the if-statement above.
-                dupcol = VECTOR_ELT(source, 0);
-                named  = NAMED(dupcol);
-                while(isNewList(dupcol)) {
-                    // while loop basically peels each list() layer one by one until there's no 
-                    // list() wrapped anymore. Ex: consider DT[, list(list(list(sum(y)))), by=x] - 
-                    // here, we don't need to duplicate, but we won't know that until we reach 
-                    // 'sum(y)' and know that it's NAMED() != 2.
-                    if (named == 2) break;
-                    else {
-                        dupcol = VECTOR_ELT(dupcol, 0);
-                        named = NAMED(dupcol);
-                    }
-                }
-                if (named == 2) source = PROTECT(duplicate(source));
-            }
-            memrecycle(target, R_NilValue, thisansloc, maxn, source);
-            if (named == 2) UNPROTECT(1);
-            #else
             memrecycle(target, R_NilValue, thisansloc, maxn, source);
-            #endif
         }
         ansloc += maxn;
         if (firstalloc) {
@@ -480,7 +440,7 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
         }
     } else ans = R_NilValue;
     // Now reset length of .SD columns and .I to length of largest group, otherwise leak if the last group is smaller (often is).
-    for (j=0; j<length(SD); j++) SETLENGTH(VECTOR_ELT(SD,j), origSDnrow);
+    for (j=0; j<length(SDall); j++) SETLENGTH(VECTOR_ELT(SDall,j), origSDnrow);
     SETLENGTH(I, origIlen);
     if (LOGICAL(verbose)[0]) {
         if (nblock[0] && nblock[1]) error("Internal error: block 0 [%d] and block 1 [%d] have both run", nblock[0], nblock[1]);
@@ -490,8 +450,6 @@ SEXP dogroups(SEXP dt, SEXP dtcols, SEXP groups, SEXP grpcols, SEXP jiscols, SEX
         Rprintf("  eval(j) took %.3fs for %d calls\n", 1.0*tblock[2]/CLOCKS_PER_SEC, nblock[2]);
     }
     UNPROTECT(protecti);
-    Free(nameSyms);
-    Free(xknameSyms);
     return(ans);
 }
 
diff --git a/src/fastradixdouble.c b/src/fastradixdouble.c
deleted file mode 100644
index 0d8720f..0000000
--- a/src/fastradixdouble.c
+++ /dev/null
@@ -1,253 +0,0 @@
-#include <R.h>
-#define USE_RINTERNALS
-#include <Rinternals.h>
-// #include <signal.h> // the debugging machinery + breakpoint aidee
-
-// Tested on:
-// R-3.0.2 osx 10.8.5 64-bit gcc 4.2.1, 
-// R-2.15.2 debian 64-bit gcc-4.7.2, 
-// R-2.15.3 osx 10.8.5 32-bit gcc-4.2.1, 
-// R-2.15.3 osx 10.8.5 64-bit gcc-4.2.1
-
-// for tolerance
-extern SEXP fastradixint(SEXP vec, SEXP return_index);
-
-// adapted from Michael Herf's code - http://stereopsis.com/radix.html
-// TO IMPLEMENT (probably) - R's long vector support (R_xlen_t)
-// 1) logical argument 'return_index' = TRUE returns 'order' (indices) and FALSE returns sorted value directly instead of indices
-// 2) also allows ordering/sorting with 'tolerance' (another pass through length of input vector will happen + multiple integer radix sort calls). 
-// The performance will depend on the number of groups that have to be sorted because under given tolerance they become identical. In theory, 
-// most of the times the last pass shouldn't affect the performance at all.
-// the "decreasing=" feature has been removed. Use 'setrev' instead to get the reverse order if required (small performance improvement by doing this)
-
-// Hack for 32-bit and 64-bit versions of R (or architectures):
-// ------------------------------------------------------------
-// data.table requires that NA and NaN be sorted before all other numbers and that NA is before NaN:
-// 1) 'unsigned long' in 64-bit is 64-bit, but 32-bit in 32-bit. Therefore we've to use 'unsigned long long' which seems 64-bit in both 32-and 64-bit
-// 2) Inf=0x7ff0000000000000, -Inf=0xfff0000000000000, NA=0x7ff80000000007a2 and NaN=0x7ff8000000000000 in 32-bit unsigned long long
-// 3) Inf=0x7ff0000000000000, -Inf=0xfff0000000000000, NA=0x7ff00000000007a2 and NaN=0x7ff8000000000000 in 64-bit unsigned long long
-// Therefore in 32-bit, NA gets sorted before NaN and in 64-bit, NaN gets sorted before NA in 64 bit.
-// 4) also, sometimes NaN seems to be 0xfff8000000000000 instead of 0x7ff8000000000000.
-// As of now the solution is to 'set' the first 16 bits to 7ff8 if NA/NaN for both 32/64-bit
-// so that all these differences are nullified and then the usual checks are done.
-// At the end, we need NA to be 0x7ff80000000007a2 and NaN to be 0x7ff8000000000000 and that's what we'll make sure of.
-
-// HACK for NA/NaN sort before all the other numbers:
-// --------------------------------------------------
-// Once we set this to be NA/NaN, to make sure that NA/NaN gets sorted before 
-// ALL the other numbers, we just flip the sign bit so that, 
-// NA and NaN become 0xfff80000000007a2 and 0xfff8000000000000
-// NA now is smaller than NaN and will be sorted before any other number (including -Inf)
-
-#define FLIP_SIGN_BIT   0x8000000000000000
-#define RESET_NA_NAN    0x0000ffffffffffff
-#define SET_NA_NAN_COMP 0xfff8000000000000 // we can directly set complement in hack_na_nan
-
-unsigned long long flip_double(unsigned long long f) {
-    unsigned long long mask = -(long long)(f >> 63) | FLIP_SIGN_BIT;
-    return f ^ mask;
-}
-
-void flip_double_ref(unsigned long long *f) {
-    unsigned long long mask = -(long long)(*f >> 63) | FLIP_SIGN_BIT;
-    *f ^= mask;
-}
-
-unsigned long long invert_flip_double(unsigned long long f) {
-    unsigned long long mask = ((f >> 63) - 1) | FLIP_SIGN_BIT;
-    return f ^ mask;
-}
-
-void hack_na_nan(unsigned long long *f) {
-    // same as flip_double_ref but with bit twiddling first for NA/NaN
-    // removed mask setting as sign bit is always 0, so we just flip it
-    // since we've to flip the sign bit, why not directly set it in #define
-    *f &= RESET_NA_NAN;
-    *f |= SET_NA_NAN_COMP;
-}
-
-// utils for accessing 11-bit quantities
-#define _0(x) (x & 0x7FF)
-#define _1(x) (x >> 11 & 0x7FF)
-#define _2(x) (x >> 22 & 0x7FF)
-#define _3(x) (x >> 33 & 0x7FF)
-#define _4(x) (x >> 44 & 0x7FF)
-#define _5(x) (x >> 55)
-
-#define STACK_HIST 2048
-
-// x should be of type numeric
-SEXP fastradixdouble(SEXP x, SEXP tol, SEXP return_index) {
-    int i;
-    unsigned long long pos, fi, si, n;
-    unsigned long long sum0 = 0, sum1 = 0, sum2 = 0, sum3 = 0, sum4 = 0, sum5 = 0, tsum;    
-    SEXP xtmp, order, ordertmp;
-    
-    n = length(x);
-    if (!isReal(x) || n <= 0) error("Argument 'x' to 'fastradixdouble' must be non-empty and of type 'numeric'");
-    if (TYPEOF(return_index) != LGLSXP || length(return_index) != 1) error("Argument 'return_index' to 'fastradixdouble' must be logical TRUE/FALSE");
-    if (TYPEOF(tol) != REALSXP) error("Argument 'tol' to 'fastradixdouble' must be a numeric vector of length 1");
-
-
-    xtmp  = PROTECT(allocVector(REALSXP, n));
-    ordertmp = PROTECT(allocVector(INTSXP, n));
-    order = PROTECT(allocVector(INTSXP, n));
-    
-    unsigned long long *array = (unsigned long long*)REAL(x);
-    unsigned long long *sort = (unsigned long long*)REAL(xtmp);
-            
-    // 6 histograms on the stack:
-    unsigned long long b0[STACK_HIST * 6];
-    unsigned long long *b1 = b0 + STACK_HIST;
-    unsigned long long *b2 = b1 + STACK_HIST;
-    unsigned long long *b3 = b2 + STACK_HIST;
-    unsigned long long *b4 = b3 + STACK_HIST;
-    unsigned long long *b5 = b4 + STACK_HIST;
-
-    // definitely faster on big data than a for-loop
-    memset(b0, 0, STACK_HIST*6*sizeof(unsigned long long));
-
-    // Step 1:  parallel histogramming pass
-    for (i=0;i<n;i++) {
-        // flip NaN/NA sign bit so that they get sorted in the front (special for data.table)
-        if (ISNAN(REAL(x)[i])) hack_na_nan(&array[i]);
-        fi = flip_double((unsigned long long)array[i]);
-        b0[_0(fi)]++;
-        b1[_1(fi)]++;
-        b2[_2(fi)]++;
-        b3[_3(fi)]++;
-        b4[_4(fi)]++;
-        b5[_5(fi)]++;
-    }
-    
-    // Step 2:  Sum the histograms -- each histogram entry records the number of values preceding itself.
-    for (i=0;i<STACK_HIST;i++) {
-
-        tsum = b0[i] + sum0;
-        b0[i] = sum0 - 1;
-        sum0 = tsum;
-
-        tsum = b1[i] + sum1;
-        b1[i] = sum1 - 1;
-        sum1 = tsum;
-
-        tsum = b2[i] + sum2;
-        b2[i] = sum2 - 1;
-        sum2 = tsum;
-
-        tsum = b3[i] + sum3;
-        b3[i] = sum3 - 1;
-        sum3 = tsum;
-
-        tsum = b4[i] + sum4;
-        b4[i] = sum4 - 1;
-        sum4 = tsum;
-
-        tsum = b5[i] + sum5;
-        b5[i] = sum5 - 1;
-        sum5 = tsum;
-    }
-
-    for (i=0;i<n;i++) {
-        fi = array[i];
-        flip_double_ref(&fi);
-        pos = _0(fi);
-        sort[++b0[pos]] = fi;
-        INTEGER(ordertmp)[b0[pos]] = i;
-    }
-
-    for (i=0;i<n;i++) {
-        si = sort[i];
-        pos = _1(si);
-        array[++b1[pos]] = si;
-        INTEGER(order)[b1[pos]] = INTEGER(ordertmp)[i];
-    }
-
-    for (i=0;i<n;i++) {
-        fi = array[i];
-        pos = _2(fi);
-        sort[++b2[pos]] = fi;
-        INTEGER(ordertmp)[b2[pos]] = INTEGER(order)[i];
-    }
-
-    for (i=0;i<n;i++) {
-        si = sort[i];
-        pos = _3(si);
-        array[++b3[pos]] = si;
-        INTEGER(order)[b3[pos]] = INTEGER(ordertmp)[i];
-    }
-
-    for (i=0;i<n;i++) {
-        fi = array[i];
-        pos = _4(fi);
-        sort[++b4[pos]] = fi;
-        INTEGER(ordertmp)[b4[pos]] = INTEGER(order)[i];
-    }
-
-    for (i=0;i<n;i++) {
-        si = sort[i];
-        pos = _5(si);
-        array[++b5[pos]] = invert_flip_double(si);
-        INTEGER(order)[b5[pos]] = INTEGER(ordertmp)[i]+1;
-    }
-
-    // NOTE: the result won't be 'exactly' identical to ordernumtol if there are many values that are 'very close' to each other.
-    // However, this version is the correct one in those cases. Why? If 3 numbers sort distinctly with no tolerance, and under tolerance
-    // those numbers are equal, then all 3 indices *must* come out sorted. That is the right order. But in some close cases, 'ordernumtol' doesn't do this.
-    // To test: x <- rnorm(1e6); run fastradixdouble and ordernumtol with the same tolerance and compare results.
-
-    // check for tolerance and reorder wherever necessary
-    if (length(tol) > 0) {
-        i=1;
-        int j, start=0, end=0;
-        SEXP st,dst,rt,sq,ridx;
-        PROTECT(ridx = allocVector(LGLSXP, 1));
-        LOGICAL(ridx)[0] = TRUE;
-        while(i<n) {
-            if (!R_FINITE(REAL(x)[i]) || !R_FINITE(REAL(x)[i-1])) { i++; continue; }
-            // hack to skip checking Inf=Inf, -Inf=-Inf, NA=NA and NaN=NaN... using unsigned int
-            if (REAL(x)[i]-REAL(x)[i-1] > REAL(tol)[0] || array[i] == array[i-1]) { i++; continue; }
-            start = i-1;
-            i++;
-            while(i < n && REAL(x)[i] - REAL(x)[i-1] < REAL(tol)[0]) { i++; }
-            end = i-1;
-            i++;
-            if (end-start+1 == 1) continue;
-            PROTECT(st = allocVector(INTSXP, end-start+1));
-            PROTECT(sq = allocVector(REALSXP, end-start+1));
-            // To investigate: a simple bubble sort or shell sort may be quicker than a 3-pass radix on groups with < 10 items
-            // Can't rely on base R's radix order: even with just two items in a group, if one is 3 and the other is 1e6, its max-min restriction is exceeded and it won't work!
-            // Special-casing small groups alone gives a 4x speed-up (from 37 to 6-9 seconds)
-            if (end-start+1 == 2) {
-                // avoid radix sort on 2 items
-                if (INTEGER(order)[start] > INTEGER(order)[end]) {
-                    // then just swap
-                    INTEGER(st)[0] = INTEGER(order)[start];
-                    INTEGER(order)[start] = INTEGER(order)[end];
-                    INTEGER(order)[end] = INTEGER(st)[0];
-            
-                    REAL(sq)[0] = REAL(x)[start];
-                    REAL(x)[start] = REAL(x)[end];
-                    REAL(x)[end] = REAL(sq)[0];
-                }
-                UNPROTECT(2); // st, sq
-                continue;
-            }
-            for (j=0; j<end-start+1; j++) {
-                INTEGER(st)[j] = INTEGER(order)[j+start];
-                REAL(sq)[j] = REAL(x)[j+start];
-            }
-            PROTECT(dst = duplicate(st));
-            PROTECT(rt = fastradixint(dst, ridx));
-            for (j=0; j<end-start+1; j++) {
-                INTEGER(order)[j+start] = INTEGER(st)[INTEGER(rt)[j]-1];
-                REAL(x)[j+start] = REAL(sq)[INTEGER(rt)[j]-1];
-            }
-            UNPROTECT(4); // st, dst, sq, rt
-        }
-        UNPROTECT(1); // ridx
-    }
-    UNPROTECT(3); // xtmp, order, ordertmp
-    if (LOGICAL(return_index)[0]) return(order); 
-    return(x);
-}
diff --git a/src/fastradixint.c b/src/fastradixint.c
deleted file mode 100644
index 37b8936..0000000
--- a/src/fastradixint.c
+++ /dev/null
@@ -1,100 +0,0 @@
-#include <R.h>
-#define USE_RINTERNALS
-#include <Rinternals.h>
-
-// adapted from Michael Herf's code - http://stereopsis.com/radix.html
-// TO IMPLEMENT (probably) - R's long vector support
-// returns sort 'order' or 'value' depending on the 'return_index' parameter (but not both, currently)
-// allows negative integer values and NA (which base R's radix sort doesn't handle)
-// imposes no range restriction (the stack histograms are small and it is 3-pass), unlike base R's radix sort (max-min <= 1e5)
-// a candidate to replace regularorder1 (base R seems to use 1-pass IIUC and will therefore be faster wherever it applies)
-#define _0(x) (x & 0x7FF)
-#define _1(x) (x >> 11 & 0x7FF)
-#define _2(x) (x >> 22)
-
-unsigned int flip_int(unsigned int f) {
-    return ((int)(f ^ 0x80000000));
-} 
-
-void flip_int_ref(unsigned int *f) {
-    *f = ((int)(*f ^ 0x80000000));
-}
-unsigned int invert_flip_int(unsigned int f) {
-    return ((int)(f ^ 0x80000000));
-}
-
-#define STACK_HIST 2048
-
-SEXP fastradixint(SEXP x, SEXP return_index) {
-    int i;
-    unsigned int pos, fi, si, n;
-    unsigned int sum0 = 0, sum1 = 0, sum2 = 0, tsum;    
-    SEXP ans, order, ordertmp;
-    
-    n = length(x);
-    if (!isInteger(x) || n <= 0) error("Argument 'x' to 'fastradixint' must be non-empty and of type 'integer'");
-    if (TYPEOF(return_index) != LGLSXP || length(return_index) != 1) error("Argument 'return_index' to 'fastradixint' must be logical TRUE/FALSE");
-    
-    ans  = PROTECT(allocVector(INTSXP, n));
-    order = PROTECT(allocVector(INTSXP, n));
-    ordertmp = PROTECT(allocVector(INTSXP, n));
-    
-    unsigned int *array = (unsigned int*)INTEGER(x);
-    unsigned int *sort = (unsigned int*)INTEGER(ans);     
-
-    // 3 histograms on the stack:
-    unsigned int b0[STACK_HIST * 3];
-    unsigned int *b1 = b0 + STACK_HIST;
-    unsigned int *b2 = b1 + STACK_HIST;
-    
-    // definitely faster on big data than a for-loop
-    memset(b0, 0, STACK_HIST*3*sizeof(unsigned int));
-
-    // Step 1:  parallel histogramming pass
-    for (i=0;i<n;i++) {
-        fi = flip_int((unsigned int)array[i]);
-        b0[_0(fi)]++;
-        b1[_1(fi)]++;
-        b2[_2(fi)]++;
-    }
-    
-    for (i=0;i<STACK_HIST;i++) {
-
-        tsum = b0[i] + sum0;
-        b0[i] = sum0 - 1;
-        sum0 = tsum;
-
-        tsum = b1[i] + sum1;
-        b1[i] = sum1 - 1;
-        sum1 = tsum;
-
-        tsum = b2[i] + sum2;
-        b2[i] = sum2 - 1;
-        sum2 = tsum;
-    }
-        
-    for (i=0;i<n;i++) {
-        fi = array[i];
-        flip_int_ref(&fi);
-        pos = _0(fi);
-        sort[++b0[pos]] = fi;
-        INTEGER(order)[b0[pos]] = i;
-    }
-    
-    for (i=0;i<n;i++) {
-        si = sort[i];
-        pos = _1(si);
-        array[++b1[pos]] = si;
-        INTEGER(ordertmp)[b1[pos]] = INTEGER(order)[i];
-    }
-
-    for (i=0;i<n;i++) {
-        fi = array[i];
-        pos = _2(fi);
-        sort[++b2[pos]] = invert_flip_int(fi);
-        INTEGER(order)[b2[pos]] = INTEGER(ordertmp)[i]+1;
-    }
-    UNPROTECT(3); // order, ordertmp, ans
-    if (LOGICAL(return_index)[0]) return(order);
-    return(ans);
-}
diff --git a/src/fcast.c b/src/fcast.c
index 0245311..96daeb8 100644
--- a/src/fcast.c
+++ b/src/fcast.c
@@ -3,213 +3,9 @@
 // #include <signal.h> // the debugging machinery + breakpoint aidee
 // raise(SIGINT);
 
-static SEXP subsetVectorRaw(SEXP x, SEXP idx, int l, int tl)
-// Only for use by subsetDT() or subsetVector() below, hence static
-// l is the count of non-zero (including NAs) in idx i.e. the length of the result
-// tl is the amount to be allocated,  tl>=l
-// TO DO: if no 0 or NA detected up front in subsetDT() below, could switch to a faster subsetVectorRawNo0orNA()
-{
-    int i, this, ansi=0, max=length(x);
-    if (tl<l) error("Internal error: tl<n passed to subsetVectorRaw");
-    SEXP ans = PROTECT(allocVector(TYPEOF(x), tl));
-    SETLENGTH(ans, l);
-    SET_TRUELENGTH(ans, tl);
-    switch(TYPEOF(x)) {
-    case INTSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this==0) continue;
-            INTEGER(ans)[ansi++] = (this==NA_INTEGER || this>max) ? NA_INTEGER : INTEGER(x)[this-1];
-        }
-        break;
-    case REALSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this==0) continue;
-            REAL(ans)[ansi++] = (this==NA_INTEGER || this>max) ? NA_REAL : REAL(x)[this-1];
-        }
-        break;
-    case LGLSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this==0) continue;
-            LOGICAL(ans)[ansi++] = (this==NA_INTEGER || this>max) ? NA_LOGICAL : LOGICAL(x)[this-1];
-        }
-        break;
-    case STRSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this==0) continue;
-            SET_STRING_ELT(ans, ansi++, (this==NA_INTEGER || this>max) ? NA_STRING : STRING_ELT(x, this-1));
-        }
-        break;
-    case VECSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this==0) continue;
-            SET_VECTOR_ELT(ans, ansi++, (this==NA_INTEGER || this>max) ? R_NilValue : VECTOR_ELT(x, this-1));
-        }
-        break;
-    // Fix for #982
-    // source: https://github.com/wch/r-source/blob/fbf5cdf29d923395b537a9893f46af1aa75e38f3/src/main/subset.c    
-    case CPLXSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this == 0) continue;
-            if (this == NA_INTEGER || this>max) {
-                COMPLEX(ans)[ansi].r = NA_REAL;
-                COMPLEX(ans)[ansi].i = NA_REAL;
-            } else COMPLEX(ans)[ansi] = COMPLEX(x)[this-1];
-            ansi++;
-        }
-        break;
-    case RAWSXP :
-        for (i=0; i<LENGTH(idx); i++) {
-            this = INTEGER(idx)[i];
-            if (this == 0) continue;
-            RAW(ans)[ansi++] = (this == NA_INTEGER || this>max) ? (Rbyte) 0 : RAW(x)[this-1];
-        }
-        break;
-    default :
-        error("Unknown column type '%s'", type2char(TYPEOF(x)));
-    }
-    if (ansi != l) error("Internal error: ansi [%d] != l [%d] at the end of subsetVector", ansi, l);
-    copyMostAttrib(x, ans);
-    UNPROTECT(1);
-    return(ans);
-}
-
-static int check_idx(SEXP idx, int n)
-{
-    int i, this, ans=0;
-    if (!isInteger(idx)) error("Internal error. 'idx' is type '%s' not 'integer'", type2char(TYPEOF(idx)));
-    for (i=0; i<LENGTH(idx); i++) {  // check idx once up front and count the non-0 so we know how long the answer will be
-        this = INTEGER(idx)[i];
-        if (this==0) continue;
-        if (this!=NA_INTEGER && this<0) error("Internal error: item %d of idx is %d. Negatives should have been dealt with earlier.", i+1, this);
-        // this>n is treated as NA for consistency with [.data.frame and things like cbind(DT[w],DT[w+1])
-        ans++;
-    }
-    return ans;
-}
-
-SEXP convertNegativeIdx(SEXP idx, SEXP maxArg)
-{
-    int this;
-    // + more precise and helpful error messages telling user exactly where the problem is (saving user debugging time)
-    // + a little more efficient than negativeSubscript in src/main/subscript.c (it's private to R so we can't call it anyway)
-    
-    if (!isInteger(idx)) error("Internal error. 'idx' is type '%s' not 'integer'", type2char(TYPEOF(idx)));
-    if (!isInteger(maxArg) || length(maxArg)!=1) error("Internal error. 'maxArg' is type '%s' and length %d, should be an integer singleton", type2char(TYPEOF(maxArg)), length(maxArg));
-    int max = INTEGER(maxArg)[0];
-    if (max<0) error("Internal error. max is %d, must be >= 0.", max);  // NA also an error which'll print as INT_MIN
-    int firstNegative = 0, firstPositive = 0, firstNA = 0, num0 = 0;
-    int i=0;
-    for (i=0; i<LENGTH(idx); i++) {
-        this = INTEGER(idx)[i];
-        if (this==NA_INTEGER) { if (firstNA==0) firstNA = i+1;  continue; }
-        if (this==0)          { num0++;  continue; }
-        if (this>0)           { if (firstPositive==0) firstPositive=i+1; continue; }
-        if (firstNegative==0) firstNegative=i+1;
-    }
-    if (firstNegative==0) return(idx);  // 0's and NA can be mixed with positives, there are no negatives present, so we're done
-    if (firstPositive) error("Item %d of i is %d and item %d is %d. Cannot mix positives and negatives.",
-                             firstNegative, INTEGER(idx)[firstNegative-1], firstPositive, INTEGER(idx)[firstPositive-1]);
-    if (firstNA)       error("Item %d of i is %d and item %d is NA. Cannot mix negatives and NA.",
-                             firstNegative, INTEGER(idx)[firstNegative-1], firstNA);
-    
-    // idx is all negative without any NA but perhaps 0 present (num0) ...
-    
-    char *tmp = Calloc(max, char);    // 4 times less memory than INTSXP in src/main/subscript.c
-    int firstDup = 0, numDup = 0, firstBeyond = 0, numBeyond = 0;
-    for (i=0; i<LENGTH(idx); i++) {
-        this = -INTEGER(idx)[i];
-        if (this==0) continue;
-        if (this>max) {
-            numBeyond++;
-            if (firstBeyond==0) firstBeyond=i+1;
-            continue;
-        }
-        if (tmp[this-1]==1) {
-            numDup++;
-            if (firstDup==0) firstDup=i+1;
-        } else tmp[this-1] = 1;
-    }
-    if (numBeyond)
-        warning("Item %d of i is %d but there are only %d rows. Ignoring this and %d more like it out of %d.", firstBeyond, INTEGER(idx)[firstBeyond-1], max, numBeyond-1, LENGTH(idx));
-    if (numDup)
-        warning("Item %d of i is %d which has occurred before. Ignoring this and %d other duplicates out of %d.", firstDup, INTEGER(idx)[firstDup-1], numDup-1, LENGTH(idx));
-    
-    SEXP ans = PROTECT(allocVector(INTSXP, max-LENGTH(idx)+num0+numDup+numBeyond));
-    int ansi = 0;
-    for (i=0; i<max; i++) {
-        if (tmp[i]==0) INTEGER(ans)[ansi++] = i+1;
-    }
-    Free(tmp);
-    UNPROTECT(1);
-    if (ansi != max-LENGTH(idx)+num0+numDup+numBeyond) error("Internal error: ansi[%d] != max[%d]-LENGTH(idx)[%d]+num0[%d]+numDup[%d]+numBeyond[%d] in convertNegativeIdx",ansi,max,LENGTH(idx),num0,numDup,numBeyond);
-    return(ans);
-}
-
-SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) { // rows and cols are 1-based passed from R level
-
-// Originally for subsetting vectors in fcast and now the beginnings of [.data.table ported to C
-// Immediate need is for R 3.1 as lglVec[1] now returns R's global TRUE and we don't want := to change that global [think 1 row data.tables]
-// Could do it other ways but may as well go to C now as we were going to do that anyway
-    
-    SEXP ans, tmp;
-    R_len_t i, j, ansn=0;
-    int this;
-    if (!isNewList(x)) error("Internal error. Argument 'x' to CsubsetDT is type '%s' not 'list'", type2char(TYPEOF(rows)));
-    if (!length(x)) return(x);  // return empty list
-    ansn = check_idx(rows, length(VECTOR_ELT(x,0)));  // check once up front before looping calls to subsetVectorRaw below
-    if (!isInteger(cols)) error("Internal error. Argument 'cols' to Csubset is type '%s' not 'integer'", type2char(TYPEOF(cols)));
-    for (i=0; i<LENGTH(cols); i++) {
-        this = INTEGER(cols)[i];
-        if (this<1 || this>LENGTH(x)) error("Item %d of 'cols' is %d which is outside 1-based range [1,ncol(x)=%d]", i+1, this, LENGTH(x));
-    }
-    ans = PROTECT(allocVector(VECSXP, LENGTH(cols)+64));  // just do alloc.col directly, eventually alloc.col can be deprecated.
-    copyMostAttrib(x, ans);  // other than R_NamesSymbol, R_DimSymbol and R_DimNamesSymbol  
-                             // so includes row.names (oddly, given other dims aren't) and "sorted", dealt with below
-    SET_TRUELENGTH(ans, LENGTH(ans));
-    SETLENGTH(ans, LENGTH(cols));
-    for (i=0; i<LENGTH(cols); i++) {
-        SET_VECTOR_ELT(ans, i, subsetVectorRaw(VECTOR_ELT(x, INTEGER(cols)[i]-1), rows, ansn, ansn));  // column vectors aren't over allocated yet
-    }
-    setAttrib(ans, R_NamesSymbol, subsetVectorRaw( getAttrib(x, R_NamesSymbol), cols, LENGTH(cols), LENGTH(cols)+64 ));
-    tmp = PROTECT(allocVector(INTSXP, 2));
-    INTEGER(tmp)[0] = NA_INTEGER;
-    INTEGER(tmp)[1] = -ansn;
-    setAttrib(ans, R_RowNamesSymbol, tmp);  // The contents of tmp must be set before being passed to setAttrib(). setAttrib looks at tmp value and copies it in the case of R_RowNamesSymbol. Caused hard to track bug around 28 Sep 2014.
-    // maintain key if ordered subset ...
-    SEXP key = getAttrib(x, install("sorted"));
-    if (length(key)) {
-        SEXP in = PROTECT(chmatch(key,getAttrib(ans,R_NamesSymbol), 0, TRUE)); // (nomatch ignored when in=TRUE)
-        i = 0;  while(i<LENGTH(key) && LOGICAL(in)[i]) i++;
-        UNPROTECT(1);
-        // i is now the keylen that can be kept. 2 lines above much easier in C than R
-        if (i==0) {
-            setAttrib(ans, install("sorted"), R_NilValue);
-            // clear key that was copied over by copyMostAttrib() above
-        } else if (isOrderedSubset(rows, ScalarInteger(length(VECTOR_ELT(x,0))))) {
-            setAttrib(ans, install("sorted"), tmp=allocVector(STRSXP, i));
-            for (j=0; j<i; j++) SET_STRING_ELT(tmp, j, STRING_ELT(key, j));
-        }
-    }
-    setAttrib(ans, install(".data.table.locked"), R_NilValue);
-    setselfref(ans);
-    UNPROTECT(2);
-    return ans;
-}
-
-SEXP subsetVector(SEXP x, SEXP idx) { // idx is 1-based passed from R level
-    int n = check_idx(idx, length(x));
-    return subsetVectorRaw(x, idx, n, n);
-}
-
 // TO DO: margins
 SEXP fcast(SEXP lhs, SEXP val, SEXP nrowArg, SEXP ncolArg, SEXP idxArg, SEXP fill, SEXP fill_d, SEXP is_agg) {
-    
+
     int nrows=INTEGER(nrowArg)[0], ncols=INTEGER(ncolArg)[0];
     int i,j,k, nlhs=length(lhs), nval=length(val), *idx = INTEGER(idxArg), thisidx;;
     SEXP thiscol, target, ans, thisfill;
@@ -299,20 +95,7 @@ SEXP fcast(SEXP lhs, SEXP val, SEXP nrowArg, SEXP ncolArg, SEXP idxArg, SEXP fil
     return(ans);
 }
 
-// internal functions that are not used anymore..
-
-// # nocov start
-// Note: all these functions below are internal functions and are designed specific to fcast.
-SEXP zero_init(R_len_t n) {
-    R_len_t i;
-    SEXP ans;
-    if (n < 0) error("Input argument 'n' to 'zero_init' must be >= 0");
-    ans = PROTECT(allocVector(INTSXP, n));
-    for (i=0; i<n; i++) INTEGER(ans)[i] = 0;
-    UNPROTECT(1);
-    return(ans);
-}
-
+// used in bmerge.c
 SEXP vec_init(R_len_t n, SEXP val) {
 
     SEXP ans;
@@ -341,79 +124,95 @@ SEXP vec_init(R_len_t n, SEXP val) {
     return(ans);
 }
 
-SEXP cast_order(SEXP v, SEXP env) {
-    R_len_t len;
-    SEXP call, ans;
-    if (TYPEOF(env) != ENVSXP) error("Argument 'env' to (data.table internals) 'cast_order' must be an environment");
-    if (TYPEOF(v) == VECSXP) len = length(VECTOR_ELT(v, 0));
-    else len = length(v);
-    PROTECT(call = lang2(install("forder"), v)); // TODO: save the 'eval' by calling directly the C-function.
-    ans = PROTECT(eval(call, env));
-    if (length(ans) == 0) { // forder returns integer(0) if already sorted
-        UNPROTECT(1); // ans
-        ans = PROTECT(seq_int(len, 1));
-    }
-    UNPROTECT(2);
-    return(ans);
-}
-
-SEXP cross_join(SEXP s, SEXP env) {
-    // Calling CJ is faster, and we don't have to worry about sorting or setting the key.
-    SEXP call, r;
-    if (!isNewList(s) || isNull(s)) error("Argument 's' to 'cross_join' must be a list of length > 0");
-    PROTECT(call = lang3(install("do.call"), install("CJ"), s));
-    r = eval(call, env);
-    UNPROTECT(1);
-    return(r);
-}
-
-SEXP diff_int(SEXP x, R_len_t n) {
-
-    R_len_t i;
-    SEXP ans;
-    if (TYPEOF(x) != INTSXP) error("Argument 'x' to 'diff_int' must be an integer vector");
-    ans = PROTECT(allocVector(INTSXP, length(x)));
-    for (i=1; i<length(x); i++)
-        INTEGER(ans)[i-1] = INTEGER(x)[i] - INTEGER(x)[i-1];
-    INTEGER(ans)[length(x)-1] = n - INTEGER(x)[length(x)-1] + 1;
-    UNPROTECT(1);
-    return(ans);
-}
-
-SEXP intrep(SEXP x, SEXP len) {
-
-    R_len_t i,j,l=0, k=0;
-    SEXP ans;
-    if (TYPEOF(x) != INTSXP || TYPEOF(len) != INTSXP) error("Arguments 'x' and 'len' to 'intrep' should both be integer vectors");
-    if (length(x) != length(len)) error("'x' and 'len' must be of same length");
-    // assuming both are of length >= 1
-    for (i=0; i<length(len); i++)
-        l += INTEGER(len)[i]; // assuming positive values for len. internal use - can't bother to check.
-    ans = PROTECT(allocVector(INTSXP, l));
-    for (i=0; i<length(len); i++) {
-        for (j=0; j<INTEGER(len)[i]; j++) {
-            INTEGER(ans)[k++] = INTEGER(x)[i];
-        }
-    }
-    UNPROTECT(1); // ans
-    return(ans);
-}
-
-// taken match_transform() from base:::unique.c and modified
-SEXP coerce_to_char(SEXP s, SEXP env)
-{
-    if(OBJECT(s)) {
-    if(inherits(s, "factor")) return asCharacterFactor(s);
-    else if(getAttrib(s, R_ClassSymbol) != R_NilValue) {
-        SEXP call, r;
-        PROTECT(call = lang2(install("as.character"), s));
-        r = eval(call, env);
-        UNPROTECT(1);
-        return r;
-    }
-    }
-    /* else */
-    return coerceVector(s, STRSXP);
-}
-
-// # nocov end
+// commenting out all unused functions rather than deleting them, just in case
+
+// // internal functions that are not used anymore..
+
+// // # nocov start
+// // Note: all these functions below are internal functions and are designed specific to fcast.
+// SEXP zero_init(R_len_t n) {
+//     R_len_t i;
+//     SEXP ans;
+//     if (n < 0) error("Input argument 'n' to 'zero_init' must be >= 0");
+//     ans = PROTECT(allocVector(INTSXP, n));
+//     for (i=0; i<n; i++) INTEGER(ans)[i] = 0;
+//     UNPROTECT(1);
+//     return(ans);
+// }
+
+// SEXP cast_order(SEXP v, SEXP env) {
+//     R_len_t len;
+//     SEXP call, ans;
+//     if (TYPEOF(env) != ENVSXP) error("Argument 'env' to (data.table internals) 'cast_order' must be an environment");
+//     if (TYPEOF(v) == VECSXP) len = length(VECTOR_ELT(v, 0));
+//     else len = length(v);
+//     PROTECT(call = lang2(install("forder"), v)); // TODO: save the 'eval' by calling directly the C-function.
+//     ans = PROTECT(eval(call, env));
+//     if (length(ans) == 0) { // forder returns integer(0) if already sorted
+//         UNPROTECT(1); // ans
+//         ans = PROTECT(seq_int(len, 1));
+//     }
+//     UNPROTECT(2);
+//     return(ans);
+// }
+
+// SEXP cross_join(SEXP s, SEXP env) {
+//     // Calling CJ is faster, and we don't have to worry about sorting or setting the key.
+//     SEXP call, r;
+//     if (!isNewList(s) || isNull(s)) error("Argument 's' to 'cross_join' must be a list of length > 0");
+//     PROTECT(call = lang3(install("do.call"), install("CJ"), s));
+//     r = eval(call, env);
+//     UNPROTECT(1);
+//     return(r);
+// }
+
+// SEXP diff_int(SEXP x, R_len_t n) {
+
+//     R_len_t i;
+//     SEXP ans;
+//     if (TYPEOF(x) != INTSXP) error("Argument 'x' to 'diff_int' must be an integer vector");
+//     ans = PROTECT(allocVector(INTSXP, length(x)));
+//     for (i=1; i<length(x); i++)
+//         INTEGER(ans)[i-1] = INTEGER(x)[i] - INTEGER(x)[i-1];
+//     INTEGER(ans)[length(x)-1] = n - INTEGER(x)[length(x)-1] + 1;
+//     UNPROTECT(1);
+//     return(ans);
+// }
+
+// SEXP intrep(SEXP x, SEXP len) {
+
+//     R_len_t i,j,l=0, k=0;
+//     SEXP ans;
+//     if (TYPEOF(x) != INTSXP || TYPEOF(len) != INTSXP) error("Arguments 'x' and 'len' to 'intrep' should both be integer vectors");
+//     if (length(x) != length(len)) error("'x' and 'len' must be of same length");
+//     // assuming both are of length >= 1
+//     for (i=0; i<length(len); i++)
+//         l += INTEGER(len)[i]; // assuming positive values for len. internal use - can't bother to check.
+//     ans = PROTECT(allocVector(INTSXP, l));
+//     for (i=0; i<length(len); i++) {
+//         for (j=0; j<INTEGER(len)[i]; j++) {
+//             INTEGER(ans)[k++] = INTEGER(x)[i];
+//         }
+//     }
+//     UNPROTECT(1); // ans
+//     return(ans);
+// }
+
+// // taken match_transform() from base:::unique.c and modified
+// SEXP coerce_to_char(SEXP s, SEXP env)
+// {
+//     if(OBJECT(s)) {
+//     if(inherits(s, "factor")) return asCharacterFactor(s);
+//     else if(getAttrib(s, R_ClassSymbol) != R_NilValue) {
+//         SEXP call, r;
+//         PROTECT(call = lang2(install("as.character"), s));
+//         r = eval(call, env);
+//         UNPROTECT(1);
+//         return r;
+//     }
+//     }
+//     /* else */
+//     return coerceVector(s, STRSXP);
+// }
+
+// // # nocov end
diff --git a/src/fmelt.c b/src/fmelt.c
index ff6f11e..da967cb 100644
--- a/src/fmelt.c
+++ b/src/fmelt.c
@@ -17,12 +17,12 @@ SEXP seq_int(int n, int start) {
 // very specific "set_diff" for integers
 SEXP set_diff(SEXP x, int n) {
     SEXP ans, xmatch;
-    int i, j = 0, *buf;
+    int i, j = 0;
     if (TYPEOF(x) != INTSXP) error("'x' must be an integer");
     if (n <= 0) error("'n' must be a positive integer");
     xmatch = match(x, seq_int(n, 1), 0); // took a while to realise: matches vec against x - thanks to comment from Matthew in assign.c!
     
-    buf = (int *) R_alloc(n, sizeof(int));
+    int *buf = (int *) R_alloc(n, sizeof(int));
     for (i=0; i<n; i++) {
         if (INTEGER(xmatch)[i] == 0) {
             buf[j++] = i+1;
@@ -30,7 +30,7 @@ SEXP set_diff(SEXP x, int n) {
     }
     n = j;
     PROTECT(ans = allocVector(INTSXP, n));
-    memcpy(INTEGER(ans), buf, sizeof(int) * n); // sizeof is of type size_t - no integer overflow issues
+    if (n) memcpy(INTEGER(ans), buf, sizeof(int) * n); // sizeof is of type size_t - no integer overflow issues
     UNPROTECT(1);
     return(ans);
 }
@@ -39,7 +39,7 @@ SEXP set_diff(SEXP x, int n) {
 // for melt's `na.rm=TRUE` option
 SEXP which_notNA(SEXP x) {
     SEXP v, ans;
-    int i, j=0, n = length(x), *buf;
+    int i, j=0, n = length(x);
     
     PROTECT(v = allocVector(LGLSXP, n));
     switch (TYPEOF(x)) {
@@ -60,7 +60,7 @@ SEXP which_notNA(SEXP x) {
             "which_notNA", type2char(TYPEOF(x)));
     }
     
-    buf = (int *) R_alloc(n, sizeof(int));
+    int *buf = (int *) R_alloc(n, sizeof(int));
     for (i = 0; i < n; i++) {
         if (LOGICAL(v)[i] == TRUE) {
             buf[j] = i + 1;
@@ -69,7 +69,7 @@ SEXP which_notNA(SEXP x) {
     }
     n = j;
     PROTECT(ans = allocVector(INTSXP, n));
-    memcpy(INTEGER(ans), buf, sizeof(int) * n);
+    if (n) memcpy(INTEGER(ans), buf, sizeof(int) * n);
     
     UNPROTECT(2);
     return(ans);
@@ -77,10 +77,10 @@ SEXP which_notNA(SEXP x) {
 
 SEXP which(SEXP x, Rboolean bool) {
     
-    int i, j=0, n = length(x), *buf;
+    int i, j=0, n = length(x);
     SEXP ans;
     if (!isLogical(x)) error("Argument to 'which' must be logical");
-    buf = (int *) R_alloc(n, sizeof(int));
+    int *buf = (int *) R_alloc(n, sizeof(int));
     for (i = 0; i < n; i++) {
         if (LOGICAL(x)[i] == bool) {
             buf[j] = i + 1;
@@ -89,7 +89,7 @@ SEXP which(SEXP x, Rboolean bool) {
     }
     n = j;
     PROTECT(ans = allocVector(INTSXP, n));
-    memcpy(INTEGER(ans), buf, sizeof(int) * n);
+    if (n) memcpy(INTEGER(ans), buf, sizeof(int) * n);
     
     UNPROTECT(1);
     return(ans);
@@ -106,7 +106,7 @@ SEXP whichwrapper(SEXP x, SEXP bool) {
 SEXP concat(SEXP vec, SEXP idx) {
     
     SEXP s, t, v;
-    int i;
+    int i, nidx=length(idx);
     
     if (TYPEOF(vec) != STRSXP) error("concat: 'vec' must be a character vector");
     if (!isInteger(idx) || length(idx) < 0) error("concat: 'idx' must be an integer vector of length >= 0");
@@ -114,10 +114,11 @@ SEXP concat(SEXP vec, SEXP idx) {
         if (INTEGER(idx)[i] < 0 || INTEGER(idx)[i] > length(vec)) 
             error("concat: 'idx' must take values between 0 and length(vec); 0 <= idx <= length(vec)");
     }
-    PROTECT(v = allocVector(STRSXP, length(idx)));
-    for (i=0; i<length(idx); i++) {
+    PROTECT(v = allocVector(STRSXP, nidx > 5 ? 5 : nidx));
+    for (i=0; i<length(v); i++) {
         SET_STRING_ELT(v, i, STRING_ELT(vec, INTEGER(idx)[i]-1));
     }
+    if (nidx > 5) SET_STRING_ELT(v, 4, mkChar("..."));
     PROTECT(t = s = allocList(3));
     SET_TYPEOF(t, LANGSXP);
     SETCAR(t, install("paste")); t = CDR(t);
@@ -193,8 +194,8 @@ SEXP checkVars(SEXP DT, SEXP id, SEXP measure, Rboolean verbose) {
         }
         booltmp = PROTECT(duplicated(tmp, FALSE)); protecti++;
         for (i=0; i<length(tmp); i++) {
-            if (INTEGER(tmp)[i] <= 0) error("Column '%s' not found in 'data'", CHAR(STRING_ELT(id, i)));
-            else if (INTEGER(tmp)[i] > ncol) error("id.vars value exceeds ncol(data)");
+            if (INTEGER(tmp)[i] <= 0 || INTEGER(tmp)[i] > ncol) 
+                error("One or more values in 'id.vars' is invalid.");
             else if (!LOGICAL(booltmp)[i]) targetcols++;
             else continue;
         }
@@ -227,8 +228,8 @@ SEXP checkVars(SEXP DT, SEXP id, SEXP measure, Rboolean verbose) {
         }
         booltmp = PROTECT(duplicated(tmp, FALSE)); protecti++;
         for (i=0; i<length(tmp); i++) {
-            if (INTEGER(tmp)[i] <= 0) error("Column '%s' not found in 'data'", CHAR(STRING_ELT(measure, i)));
-            else if (INTEGER(tmp)[i] > ncol) error("measure.vars value exceeds ncol(data)");
+            if (INTEGER(tmp)[i] <= 0 || INTEGER(tmp)[i] > ncol) 
+                error("One or more values in 'measure.vars' is invalid.");
             else if (!LOGICAL(booltmp)[i]) targetcols++;
             else continue;
         }
@@ -257,8 +258,8 @@ SEXP checkVars(SEXP DT, SEXP id, SEXP measure, Rboolean verbose) {
             default : error("Unknown 'id.vars' type %s, must be character or integer vector", type2char(TYPEOF(id)));
         }
         for (i=0; i<length(tmp); i++) {
-            if (INTEGER(tmp)[i] <= 0) error("Column '%s' not found in 'data'", CHAR(STRING_ELT(id, i)));
-            else if (INTEGER(tmp)[i] > ncol) error("measure.vars value exceeds ncol(data)");
+            if (INTEGER(tmp)[i] <= 0 || INTEGER(tmp)[i] > ncol) 
+                error("One or more values in 'id.vars' is invalid.");
         }
         idcols = PROTECT(tmp); protecti++;
         switch(TYPEOF(measure)) {
@@ -273,8 +274,8 @@ SEXP checkVars(SEXP DT, SEXP id, SEXP measure, Rboolean verbose) {
             tmp = PROTECT(unlist_(tmp2)); protecti++;
         }
         for (i=0; i<length(tmp); i++) {
-            if (INTEGER(tmp)[i] <= 0) error("Column '%s' not found in 'data'", CHAR(STRING_ELT(measure, i)));
-            else if (INTEGER(tmp)[i] > ncol) error("measure.vars value exceeds ncol(data)");
+            if (INTEGER(tmp)[i] <= 0 || INTEGER(tmp)[i] > ncol) 
+                error("One or more values in 'measure.vars' is invalid.");
         }
         if (isNewList(measure)) valuecols = tmp2; 
         else {
@@ -316,14 +317,16 @@ static void preprocess(SEXP DT, SEXP id, SEXP measure, SEXP varnames, SEXP valna
     }
     if (length(varnames) != 1)
         error("'variable.name' must be a character/integer vector of length=1.");
-    data->leach = malloc(sizeof(int) * data->lvalues);
-    data->isidentical = malloc(sizeof(int) * data->lvalues);
-    data->isfactor = calloc(sizeof(int), data->lvalues);
-    data->maxtype = calloc(sizeof(SEXPTYPE), data->lvalues);
+    data->leach = (int *)R_alloc(data->lvalues, sizeof(int));
+    data->isidentical = (int *)R_alloc(data->lvalues, sizeof(int));
+    data->isfactor = (int *)R_alloc(data->lvalues, sizeof(int));
+    data->maxtype = (SEXPTYPE *)R_alloc(data->lvalues, sizeof(SEXPTYPE));
     for (i=0; i<data->lvalues; i++) {
         tmp = VECTOR_ELT(data->valuecols, i);
         data->leach[i] = length(tmp);
-        data->isidentical[i] = 1;
+        data->isidentical[i] = 1;  // TODO - why 1 and not Rboolean TRUE?
+        data->isfactor[i] = 0;  // appears to hold 2 below, so not strictly an Rboolean FALSE. TODO - better name for this variable?
+        data->maxtype[i] = 0;   // R_alloc doesn't initialize, so be careful to do so here; relied on below
         data->lmax = (data->lmax > data->leach[i]) ? data->lmax : data->leach[i];
         data->lmin = (data->lmin < data->leach[i]) ? data->lmin : data->leach[i];
         for (j=0; j<data->leach[i]; j++) {
@@ -354,7 +357,7 @@ SEXP getvaluecols(SEXP DT, SEXP dtnames, Rboolean valfactor, Rboolean verbose, s
     
     int i, j, k, protecti=0, counter=0, thislen=0;
     SEXP tmp, seqcols, thiscol, thisvaluecols, target, ansvals, thisidx=R_NilValue, flevels, clevels;
-    Rboolean coerced=FALSE, thisfac=FALSE, *isordered, copyattr = FALSE, thisvalfactor;
+    Rboolean coerced=FALSE, thisfac=FALSE, copyattr = FALSE, thisvalfactor;
     size_t size;
     for (i=0; i<data->lvalues; i++) {
         thisvaluecols = VECTOR_ELT(data->valuecols, i);
@@ -384,7 +387,7 @@ SEXP getvaluecols(SEXP DT, SEXP dtnames, Rboolean valfactor, Rboolean verbose, s
         }
     } else data->totlen = data->nrow * data->lmax;
     flevels = PROTECT(allocVector(VECSXP, data->lmax)); protecti++;
-    isordered = malloc(sizeof(Rboolean) * data->lmax);
+    Rboolean *isordered = (Rboolean *)R_alloc(data->lmax, sizeof(Rboolean));
     ansvals = PROTECT(allocVector(VECSXP, data->lvalues)); protecti++;
     for (i=0; i<data->lvalues; i++) {
         thisvalfactor = (data->maxtype[i] == VECSXP) ? FALSE : valfactor;
@@ -475,13 +478,12 @@ SEXP getvaluecols(SEXP DT, SEXP dtnames, Rboolean valfactor, Rboolean verbose, s
         }
     }
     UNPROTECT(protecti);
-    free(isordered);
     return(ansvals);
 }
 
 SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, struct processData *data) {
     
-    int i,j,k,cnt=0,nrows=0, nlevels=0, protecti=0, thislen;
+    int i,j,k,cnt=0,nrows=0, nlevels=0, protecti=0, thislen, zerolen=0;
     SEXP ansvars, thisvaluecols, levels, target, matchvals, thisnames;
 
     ansvars = PROTECT(allocVector(VECSXP, 1)); protecti++;
@@ -499,10 +501,11 @@ SEXP getvarcols(SEXP DT, SEXP dtnames, Rboolean varfactor, Rboolean verbose, str
             for (j=0; j<data->lmax; j++) {
                 thislen = length(VECTOR_ELT(data->naidx, j));
                 for (k=0; k<thislen; k++)
-                    INTEGER(target)[nrows + k] = INTEGER(matchvals)[j];
+                    INTEGER(target)[nrows + k] = INTEGER(matchvals)[j - zerolen]; // fix for #1359
                 nrows += thislen;
-                nlevels += (thislen != 0);
-            } 
+                zerolen += (thislen == 0);
+            }
+            nlevels = data->lmax - zerolen;
         } else {
             for (j=0; j<data->lmax; j++) {
                 for (k=0; k<data->nrow; k++) 
@@ -691,11 +694,6 @@ SEXP fmelt(SEXP DT, SEXP id, SEXP measure, SEXP varfactor, SEXP valfactor, SEXP
         }
         setAttrib(ans, R_NamesSymbol, ansnames);
     }
-    // should be 'free', not 'Free'. Fixes #1059
-    free(data.isfactor);
-    free(data.maxtype);
-    free(data.leach);
-    free(data.isidentical);
     UNPROTECT(protecti);
     return(ans);
 }
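The first hunk above changes concat() so that error/print output shows at most the first five names, with the fifth slot replaced by "..." when more were supplied. A minimal standalone sketch of that truncation rule in plain C (hypothetical helper name, no R API; not the actual data.table code):

```c
#include <string.h>

/* Sketch: keep at most 5 elements, overwrite the 5th with "..." when more
   than 5 were requested, and join the kept elements with ", ". */
static void concat_preview(const char **vec, int nidx, char *out, size_t outsz) {
    int keep = nidx > 5 ? 5 : nidx;
    out[0] = '\0';
    for (int i = 0; i < keep; i++) {
        const char *s = (nidx > 5 && i == 4) ? "..." : vec[i];
        if (i) strncat(out, ", ", outsz - strlen(out) - 1);
        strncat(out, s, outsz - strlen(out) - 1);
    }
}
```

With seven inputs this yields `a, b, c, d, ...`; with three it yields `a, b, c`, matching the patched behaviour of keeping four real values plus the ellipsis marker.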
diff --git a/src/forder.c b/src/forder.c
index a15b599..32b99ae 100644
--- a/src/forder.c
+++ b/src/forder.c
@@ -93,15 +93,14 @@ static void gsfree() {
      3. Separated setRange so forder can redirect to iradix
 */
 
-static int range, off;                                                      // used by both icount and forder
+static int range, xmin;                                                  // used by both icount and forder
 static void setRange(int *x, int n)
 {
     int i, tmp;
-    int xmin = NA_INTEGER, xmax = NA_INTEGER;
+    xmin = NA_INTEGER;     // used by forder
+    int xmax = NA_INTEGER; // declared locally as we only need xmin outside
     double overflow;
     
-    off = (nalast == 1) ? 0 : 1;   // nalast^decreasing ? 0 : 1;            // off=0 will store values starting from index 0. NAs will go last.
-                                                                            // off=1 will store values starting from index 1. NAs will be at 0th index.
     i = 0;
     while(i<n && x[i]==NA_INTEGER) i++;
     if (i<n) xmax = xmin = x[i];
@@ -116,8 +115,7 @@ static void setRange(int *x, int n)
     overflow = (double)xmax - (double)xmin + 1;                             // ex: x=c(-2147483647L, NA_integer_, 1L) results in overflowing int range.
     if (overflow > INT_MAX) {range = INT_MAX; return;}                      // detect and force iradix here, since icount is out of the picture
     range = xmax-xmin+1;
-    off = order==1 ? -xmin+off : xmax+off;                                  // so that  off+order*x[i]  (below in icount)
-                                                                            // => (x[i]-xmin)+0|1  or  (xmax-x[i])+0|1
+    
     return;
 }
 
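The setRange() hunks above compute the range in double precision first so that inputs like `c(-2147483647L, NA_integer_, 1L)` cannot overflow int; INT_MAX is then the signal that forces iradix instead of icount. A sketch of just that guard (helper name invented, not in the diff):

```c
#include <limits.h>

/* Sketch: xmax - xmin + 1 can overflow int, so do the arithmetic in
   double first; returning INT_MAX tells the caller "range too wide for
   counting sort, fall back to radix". */
static int safe_range(int xmin, int xmax) {
    double overflow = (double)xmax - (double)xmin + 1;
    if (overflow > INT_MAX) return INT_MAX;
    return xmax - xmin + 1;
}
```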
@@ -135,31 +133,41 @@ static void icount(int *x, int *o, int n)
 */
 {
     int i=0, tmp;
-    int napos = (nalast == 1) ? range : 0;      // increasing ? 0 : range   // take care of 'nalast' argument
+    int napos = range;  // always count NA in last bucket and we'll account for nalast option in due course
     static unsigned int counts[N_RANGE+1] = {0};                            // static is IMPORTANT, counting sort is called repetitively.
     /* counts are set back to 0 at the end efficiently. 1e5 = 0.4MB, i.e. 
     tiny. We'll only use the front part of it, as large as range. So it's 
     just reserving space, not using it. Have defined N_RANGE to be 100000.*/
     if (range > N_RANGE) Error("Internal error: range = %d; isorted can't handle range > %d", range, N_RANGE);
     for(i=0; i<n; i++) {
-        if (x[i] == NA_INTEGER) counts[napos]++;                            // For nalast=NA case, we won't remove/skip NAs, rather set 'o' indices
-        else counts[off + order*x[i]]++;                                     // to 0. subset will skip them. We can't know how many NAs to skip 
-    }                                                                       // beforehand - i.e. while allocating "ans" vector
+        if (x[i] == NA_INTEGER) counts[napos]++;             // For nalast=NA case, we won't remove/skip NAs, rather set 'o' indices
+        else counts[x[i]-xmin]++;                            // to 0. subset will skip them. We can't know how many NAs to skip 
+    }                                                        // beforehand - i.e. while allocating "ans" vector
     // TO DO: at this point if the last count==n then it's all the same number and we can stop now.
     // Idea from Terdiman, then improved on that by not needing to loop through counts.
     
-    tmp = 0;                                                                // *** BLOCK 4 ***
-    for (i=0; i<=range; i++) 
+    tmp = 0;
+    if (nalast!=1 && counts[napos]) {
+        push(counts[napos]);
+        tmp += counts[napos];
+    }
+    int w = (order==1) ? 0 : range-1;                                   // *** BLOCK 4 ***
+    for (i=0; i<range; i++) 
     /* no point in adding tmp<n && i<=range, since range includes max, 
        need to go to max, unlike 256 loops elsewhere in forder.c */
     {
-        if (counts[i]) {                                                    // cumulate but not through 0's. Helps resetting zeros when n<range, below.
-            push(counts[i]);
-            counts[i] = (tmp += counts[i]);
+        if (counts[w]) {                                                    // cumulate but not through 0's. Helps resetting zeros when n<range, below.
+            push(counts[w]);
+            counts[w] = (tmp += counts[w]);
         }
+        w += order; // order is +1 or -1
+    }
+    if (nalast==1 && counts[napos]) {
+        push(counts[napos]);
+        counts[napos] = (tmp += counts[napos]);
     }
     for(i=n-1; i>=0; i--) {
-        o[--counts[(x[i] == NA_INTEGER) ? napos : off+order*x[i]]] = (int)(i+1);    // This way na.last=TRUE/FALSE cases will have just a single if-check overhead.
+        o[--counts[(x[i] == NA_INTEGER) ? napos : x[i]-xmin]] = (int)(i+1);    // This way na.last=TRUE/FALSE cases will have just a single if-check overhead.
     }
     if (nalast == 0)                                                        // nalast = 1, -1 are both taken care already.
         for (i=0; i<n; i++) o[i] = (x[o[i]-1] == NA_INTEGER) ? 0 : o[i];    // nalast = 0 is dealt with separately as it just sets o to 0
@@ -172,7 +180,7 @@ static void icount(int *x, int *o, int n)
         doesn't matter if we set to 0 several times on any repeats */
         counts[napos]=0;
         for (i=0; i<n; i++) {
-            if (x[i]!=NA_INTEGER) counts[off + order*x[i]]=0;
+            if (x[i]!=NA_INTEGER) counts[x[i]-xmin]=0;
         }
     } else {
         memset(counts, 0, (range+1)*sizeof(int));                           // *** BLOCK 6 ***
@@ -414,7 +422,8 @@ static void iradix_r(int *xsub, int *osub, int n, int radix)
 // + changed to MSD and hooked into forder framework here.
 // + replaced tolerance with rounding s.f.
 
-static int dround = 2;
+// No rounding by default, for now. Handles #1642, #1728, #1463, #485
+static int dround = 0;
 static unsigned long long dmask1;
 static unsigned long long dmask2;
 
@@ -704,7 +713,7 @@ int StrCmp2(SEXP x, SEXP y) {    // same as StrCmp but also takes into account '
     if (x == y) return 0;                   // same cached pointer (including NA_STRING==NA_STRING)
     if (x == NA_STRING) return nalast;      // if x=NA, nalast=1 ? then x > y else x < y (Note: nalast == 0 is already taken care of in 'csorted', won't be 0 here)
     if (y == NA_STRING) return -nalast;     // if y=NA, nalast=1 ? then y > x
-    return order*strcmp(CHAR(x), CHAR(y));  // same as explanation in StrCmp
+    return order*strcmp(CHAR(ENC2UTF8(x)), CHAR(ENC2UTF8(y)));  // same as explanation in StrCmp
 }
 
 int StrCmp(SEXP x, SEXP y)            // also used by bmerge and chmatch
@@ -712,9 +721,13 @@ int StrCmp(SEXP x, SEXP y)            // also used by bmerge and chmatch
     if (x == y) return 0;             // same cached pointer (including NA_STRING==NA_STRING)
     if (x == NA_STRING) return -1;    // x<y
     if (y == NA_STRING) return 1;     // x>y
-    return strcmp(CHAR(x), CHAR(y));  // can return 0 here for the same string in known and unknown encodings, good if the unknown string is in that encoding but not if not
-}                                     // ordering is ascii only (C locale). TO DO: revisit and allow user to change to strcoll, and take account of Encoding
-                                      // see comments in bmerge().  10k calls of strcmp = 0.37s, 10k calls of strcoll = 4.7s. See ?Comparison, ?Encoding, Scollate in R internals.
+    return strcmp(CHAR(ENC2UTF8(x)), CHAR(ENC2UTF8(y))); // ENC2UTF8 handles encoding issues by converting all marked non-utf8 encodings alone to utf8 first. The function could be wrapped in the first if-statement already instead of at the last stage, but this is to ensure that all-ascii cases are handled with maximum efficiency.
+    // This seems to fix the issues as far as I've checked. Will revisit if necessary.
+    
+    // OLD COMMENT: can return 0 here for the same string in known and unknown encodings, good if the unknown string is in that encoding but not if not ordering is ascii only (C locale). TO DO: revisit and allow user to change to strcoll, and take account of Encoding. see comments in bmerge().  10k calls of strcmp = 0.37s, 10k calls of strcoll = 4.7s. See ?Comparison, ?Encoding, Scollate in R internals.
+
+}
+
 // TO DO: check that all unknown encodings are ascii; i.e. no non-ascii unknowns are present, and that either Latin1
 //        or UTF-8 is used by user, not both. Then error if not. If ok, then can proceed with byte level. ascii is never marked known by R, but non-ascii (i.e. knowable encoding) could be marked unknown.
 //        does R internals have is_ascii function exported?  If not, simple enough.
@@ -1021,6 +1034,7 @@ static void isort(int *x, int *o, int n)
 {
     if (n<=2) {
         if (nalast == 0 && n == 2) {                        // nalast = 0 and n == 2 (check bottom of this file for explanation)
+            if (o[0]==-1) { o[0]=1; o[1]=2; }
             for (int i=0; i<n; i++) if (x[i] == NA_INTEGER) o[i] = 0; 
             push(1); push(1);
             return;
@@ -1052,6 +1066,7 @@ static void dsort(double *x, int *o, int n)
 {
     if (n <= 2) {                                           // nalast = 0 and n == 2 (check bottom of this file for explanation)
         if (nalast == 0 && n == 2) {                        // don't have to twiddle here.. at least one will be NA and 'n' WILL BE 2.
+            if (o[0]==-1) { o[0]=1; o[1]=2; }
             for (int i=0; i<n; i++) if (is_nan(x, i)) o[i] = 0;
             push(1); push(1);
             return;
@@ -1082,9 +1097,14 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrp, SEXP sortStrArg, SEXP orderArg, SEXP
     if (isNewList(DT)) {
         if (!length(DT)) error("DT is an empty list() of 0 columns");
         if (!isInteger(by) || !length(by)) error("DT has %d columns but 'by' is either not integer or length 0", length(DT));  // seq_along(x) at R level
-        for (i=0; i<LENGTH(by); i++) if (INTEGER(by)[i] < 1 || INTEGER(by)[i] > length(DT)) error("'by' value %d out of range [1,%d]", INTEGER(by)[i], length(DT));
         n = length(VECTOR_ELT(DT,0));
-        x = VECTOR_ELT(DT,INTEGER(by)[0]-1);        
+        for (i=0; i<LENGTH(by); i++) {
+            if (INTEGER(by)[i] < 1 || INTEGER(by)[i] > length(DT)) 
+                error("'by' value %d out of range [1,%d]", INTEGER(by)[i], length(DT));
+            if ( n != length(VECTOR_ELT(DT, INTEGER(by)[i]-1)) )
+                error("Column %d is length %d which differs from length of column 1 (%d)\n", INTEGER(by)[i], length(VECTOR_ELT(DT, INTEGER(by)[i]-1)), n);
+        }
+        x = VECTOR_ELT(DT,INTEGER(by)[0]-1);
     } else {
         if (!isNull(by)) error("Input is a single vector but 'by' is not NULL");
         n = length(DT);
@@ -1105,7 +1125,6 @@ SEXP forder(SEXP DT, SEXP by, SEXP retGrp, SEXP sortStrArg, SEXP orderArg, SEXP
     o[0] = -1;                                  // so [i|c|d]sort know they can populate o directly with no working memory needed to reorder existing order
                                                // had to replace this from '0' to '-1' because 'nalast = 0' replaces 'o[.]' with 0 values.
     xd = DATAPTR(x);
-    
     stackgrps = length(by)>1 || LOGICAL(retGrp)[0];
     savetl_init();   // from now on use Error not error.
 
diff --git a/src/frank.c b/src/frank.c
index a8581e6..df8a68a 100644
--- a/src/frank.c
+++ b/src/frank.c
@@ -71,13 +71,14 @@ SEXP dt_na(SEXP x, SEXP cols) {
 SEXP frank(SEXP xorderArg, SEXP xstartArg, SEXP xlenArg, SEXP ties_method) {
     int i=0, j=0, k=0, n;
     int *xstart = INTEGER(xstartArg), *xlen = INTEGER(xlenArg), *xorder = INTEGER(xorderArg);
-    enum {MEAN, MAX, MIN, DENSE} ties = MEAN; // RUNLENGTH
+    enum {MEAN, MAX, MIN, DENSE, SEQUENCE} ties = MEAN; // RUNLENGTH
     SEXP ans;
 
     if (!strcmp(CHAR(STRING_ELT(ties_method, 0)), "average"))  ties = MEAN;
     else if (!strcmp(CHAR(STRING_ELT(ties_method, 0)), "max")) ties = MAX;
     else if (!strcmp(CHAR(STRING_ELT(ties_method, 0)), "min")) ties = MIN;
     else if (!strcmp(CHAR(STRING_ELT(ties_method, 0)), "dense")) ties = DENSE;
+    else if (!strcmp(CHAR(STRING_ELT(ties_method, 0)), "sequence")) ties = SEQUENCE;
     // else if (!strcmp(CHAR(STRING_ELT(ties_method, 0)), "runlength")) ties = RUNLENGTH;
     else error("Internal error: invalid ties.method for frankv(), should have been caught before. Please report to datatable-help");
     n = length(xorderArg);
@@ -110,6 +111,13 @@ SEXP frank(SEXP xorderArg, SEXP xstartArg, SEXP xlenArg, SEXP ties_method) {
                 k++;
             }
             break;
+            case SEQUENCE :
+            for (i = 0; i < length(xstartArg); i++) {
+                k=1;
+                for (j = xstart[i]-1; j < xstart[i]+xlen[i]-1; j++)
+                    INTEGER(ans)[xorder[j]-1] = k++;
+            }
+            break;
             // case RUNLENGTH :
             // for (i = 0; i < length(xstartArg); i++) {
             //     k=1;
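The new SEQUENCE branch above assigns a running counter 1, 2, 3, ... within each group of ties, scattered back to original positions through the order vector. A standalone sketch of just that loop, using the same 1-based `xorder`/`xstart`/`xlen` convention as the diff (plain C, not the R-level frankv API):

```c
/* xorder: 1-based permutation that sorts x; xstart/xlen: 1-based start and
   length of each tie group within that sorted order; ans: per-element rank. */
static void rank_sequence(const int *xorder, const int *xstart,
                          const int *xlen, int ngroups, int *ans) {
    for (int i = 0; i < ngroups; i++) {
        int k = 1;
        for (int j = xstart[i] - 1; j < xstart[i] + xlen[i] - 1; j++)
            ans[xorder[j] - 1] = k++;   /* 1,2,3,... inside each tie group */
    }
}
```

For `x = {10, 20, 10}` (so `xorder = {1,3,2}`, groups `{start=1,len=2}` and `{start=3,len=1}`) the two tied 10s get 1 and 2 in order of appearance and the lone 20 gets 1.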
diff --git a/src/fread.c b/src/fread.c
index d5716ef..4283c7e 100644
--- a/src/fread.c
+++ b/src/fread.c
@@ -8,6 +8,7 @@
 #include <windows.h>
 #include <stdio.h>
 #include <tchar.h>
+#include <inttypes.h>  // for PRId64
 #else
 #include <sys/mman.h>
 #include <sys/stat.h>
@@ -26,7 +27,7 @@ And even more diagnostics to verbose=TRUE so we can see where crashes are.
 colClasses shouldn't be ignored but rather respected and then warn if data accuracy is lost. See first NOTE in NEWS.
 Detect and coerce dates and times. By searching for - and :, and dateTtime etc, or R's own method or fasttime. POSIXct default, for microseconds? : http://stackoverflow.com/questions/14056370/cast-string-to-idatetime
 Fill in too-short lines :  http://stackoverflow.com/questions/21124372/fread-doesnt-like-lines-with-less-fields-than-other-lines
-Allow to increase to top 500, middle 500 and bottom 500.
+Allow to increase from 100 rows at 10 points
 madvise is too eager when reading just the top 10 rows.
 Add as.colClasses to fread.R after return from C level (e.g. for colClasses "Date", although as slow as read.csv via character)
 Allow comment char to ignore. Important in format detection. But require valid line data before comment character in the read loop? See http://stackoverflow.com/a/18922269/403310
@@ -69,7 +70,8 @@ static union {double d; long long l; int b;} u;   // b=boolean, can hold NA_LOGI
 static const char *fieldStart, *fieldEnd;
 static int fieldLen;
 #define NUT        8   // Number of User Types (just for colClasses where "numeric"/"double" are equivalent)
-static const char UserTypeName[NUT][10] = {"logical", "integer", "integer64", "numeric", "character", "NULL", "double", "CLASS" };  // important that first 6 correspond to TypeName.  "CLASS" is the fall back to character then as.class at R level ("CLASS" string is just a placeholder).
+static const char UserTypeName[NUT][10] = {"logical", "integer", "integer64", "numeric", "character", "NULL", "double", "CLASS" };
+// important that first 6 correspond to TypeName.  "CLASS" is the fall back to character then as.class at R level ("CLASS" string is just a placeholder).
 static int UserTypeNameMap[NUT] = { SXP_LGL, SXP_INT, SXP_INT64, SXP_REAL, SXP_STR, SXP_NULL, SXP_REAL, SXP_STR };
 // quote
 const char *quote;
@@ -210,13 +212,13 @@ static inline void skip_spaces() {
 static inline void Field()
 {
     quoteStatus=0;
-    if (*ch=='\"') { // protected, now look for the next ", so long as it doesn't leave unbalanced unquoted regions
+    if (*ch==quote[0]) { // protected, now look for the next ", so long as it doesn't leave unbalanced unquoted regions
         quoteStatus=1;
         fieldStart = ch+1;
         int eolCount=0;  // just >0 is used currently but may as well count
         Rboolean noEmbeddedEOL=FALSE, quoteProblem=FALSE;
         while(++ch<eof) {
-            if (*ch!='\"') {
+            if (*ch!=quote[0]) {
                 if (noEmbeddedEOL && *ch==eol) { quoteProblem=TRUE; break; }
                 eolCount+=(*ch==eol);
                 continue;  // fast return in most cases of characters
@@ -225,13 +227,15 @@ static inline void Field()
             // " followed by sep|eol|eof dominates a field ending with \" (for support of Windows style paths)
             
             if (*(ch-1)!='\\') {
-                if (ch+1<eof && *(ch+1)=='\"') { ch++; continue; }  // skip doubled-quote
+                if (ch+1<eof && *(ch+1)==quote[0]) { ch++; continue; }  // skip doubled-quote
                 // unescaped subregion
                 if (eolCount) {ch++; quoteProblem=TRUE; break;}
-                while (++ch<eof && (*ch!='\"' || *(ch-1)=='\\') && *ch!=eol);
-                if (ch==eof || *ch==eol) {quoteProblem=TRUE; break;}
+                // *ch!=sep needed for detecting cases mentioned in SO post under #1462
+                while (++ch<eof && (*ch!=quote[0] || *(ch-1)=='\\') && *ch!=eol && *ch!=sep);
+                if (ch==eof || *ch==eol || *ch==sep) {quoteProblem=TRUE; break;}
                 noEmbeddedEOL = 1;
-            }
+            } else if (ch+1<eof && *(ch+1)==quote[0] && ch+2<eof && *(ch+2)!=sep) { ch++; continue; }
+            // above else if is necessary for #1164. ch+2 condition is to take care of cases like <"blabla \"",> where sep=',' (test 1336.1)
         }
         if (quoteProblem || ch==eof) {
             // "..." logic has failed. Delegate to normal routine instead of erroring. Solves many cases, especially when files have both proper balanced, and imbalanced quotes. I think this is more towards fread's philosophy than having an explicit 'quote' argument..
@@ -432,7 +436,7 @@ static SEXP coerceVectorSoFar(SEXP v, int oldtype, int newtype, R_len_t sofar, R
     if (sizes[TypeSxp[oldtype]]<4) STOP("Internal error: SIZEOF oldtype %d < 4", oldtype);
     if (sizes[TypeSxp[newtype]]<4) STOP("Internal error: SIZEOF newtype %d < 4", newtype);
     if (sizes[TypeSxp[oldtype]] == sizes[TypeSxp[newtype]] && newtype != SXP_STR) {   // after && is quick fix. TO DO: revisit
-        TYPEOF(v) = TypeSxp[newtype];
+        SET_TYPEOF(v, TypeSxp[newtype]);  // SET_TYPEOF() not TYPEOF= for Karl Millar and rho.
         newv=v;
     } else {
         clock_t tCoerceAlloc0 = clock();
@@ -477,7 +481,7 @@ static SEXP coerceVectorSoFar(SEXP v, int oldtype, int newtype, R_len_t sofar, R
         }
         break;
     case SXP_STR:
-        warning("Bumped column %d to type character on data row %d, field contains '%.*s'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Pleas [...]
+        warning("Bumped column %d to type character on data row %d, field contains '%.*s'. Coercing previously read values in this column from logical, integer or numeric back to character which may not be lossless; e.g., if '00' and '000' occurred before they will now be just '0', and there may be inconsistencies with treatment of ',,' and ',NA,' too (if they occurred in this column before the bump). If this matters please rerun and set 'colClasses' to 'character' for this column. Pleas [...]
         static char buffer[129];  // 25 to hold [+-]2^63, with spare space to be safe and snprintf too
         switch(oldtype) {
         case SXP_LGL : case SXP_INT :
@@ -497,9 +501,9 @@ static SEXP coerceVectorSoFar(SEXP v, int oldtype, int newtype, R_len_t sofar, R
                     SET_STRING_ELT(newv,i,R_BlankString);
                 else {
                     #ifdef WIN32
-                        snprintf(buffer,128,"%I64d",*(long long *)&REAL(v)[i]);
+                        snprintf(buffer,128,"%" PRId64,*(long long *)&REAL(v)[i]);
                     #else
-                       snprintf(buffer,128,"%lld",*(long long *)&REAL(v)[i]);
+                        snprintf(buffer,128,"%lld",    *(long long *)&REAL(v)[i]);
                     #endif
                     SET_STRING_ELT(newv, i, mkChar(buffer));
                 }
@@ -528,17 +532,17 @@ static SEXP coerceVectorSoFar(SEXP v, int oldtype, int newtype, R_len_t sofar, R
     return(newv);
 }
 
-SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastrings, SEXP verbosearg, SEXP autostart, SEXP skip, SEXP select, SEXP drop, SEXP colClasses, SEXP integer64, SEXP dec, SEXP encoding, SEXP stripWhiteArg, SEXP showProgressArg)
+SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastrings, SEXP verbosearg, SEXP autostart, SEXP skip, SEXP select, SEXP drop, SEXP colClasses, SEXP integer64, SEXP dec, SEXP encoding, SEXP quoteArg, SEXP stripWhiteArg, SEXP skipEmptyLinesArg, SEXP fillArg, SEXP showProgressArg)
 // can't be named fread here because that's already a C function (from which the R level fread function took its name)
 {
-    SEXP thiscol, ans, thisstr;
-    R_len_t i, resi, j, resj, k, protecti=0, nrow=0, ncol=0;
-    int thistype;
+    SEXP ans, thisstr;
+    R_len_t i, resi, j, k, protecti=0, nrow=0, ncol=0;
     const char *pos, *ch2, *lineStart;
-    Rboolean header, allchar;
+    Rboolean header, allchar, skipEmptyLines, fill;
     verbose=LOGICAL(verbosearg)[0];
     clock_t t0 = clock();
     ERANGEwarning = FALSE;  // just while detecting types, then TRUE before the read data loop
+    PROTECT_INDEX pi;
 
     // Encoding, #563: Borrowed from do_setencoding from base R
     // https://github.com/wch/r-source/blob/ca5348f0b5e3f3c2b24851d7aff02de5217465eb/src/main/util.c#L1115
@@ -549,13 +553,17 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     else ienc = CE_NATIVE;
 
     stripWhite = LOGICAL(stripWhiteArg)[0];
+    skipEmptyLines = LOGICAL(skipEmptyLinesArg)[0];
+    fill = LOGICAL(fillArg)[0];
+
+    // quoteArg for those rare cases when the default scenario doesn't cut it; FR #568
+    if (!isString(quoteArg) || LENGTH(quoteArg)!=1 || strlen(CHAR(STRING_ELT(quoteArg,0))) > 1)
+        error("quote must either be empty or a single character");
+    quote = CHAR(STRING_ELT(quoteArg,0));
 
-    // Extra tracing for apparent 32bit Windows problem: https://github.com/Rdatatable/data.table/issues/1111
-    if (!isInteger(showProgressArg)) error("showProgress is not type integer but type '%s'. Please report.", type2char(TYPEOF(showProgressArg)));
-    if (LENGTH(showProgressArg)!=1) error("showProgress is not length 1 but length %d. Please report.", LENGTH(showProgressArg));
-    int showProgress = INTEGER(showProgressArg)[0];
-    if (showProgress!=0 && showProgress!=1)
-        error("showProgress is not 0 or 1 but %d. Please report.", showProgress);    
+    if (!isLogical(showProgressArg) || LENGTH(showProgressArg)!=1 || LOGICAL(showProgressArg)[0]==NA_LOGICAL)
+        error("Internal error: showProgress is not TRUE or FALSE. Please report.");
+    const Rboolean showProgress = LOGICAL(showProgressArg)[0];
     
     if (!isString(dec) || LENGTH(dec)!=1 || strlen(CHAR(STRING_ELT(dec,0))) != 1)
         error("dec must be a single character");
@@ -582,7 +590,7 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
          ||(isString(skip) && LENGTH(skip)==1))) error("'skip' must be a length 1 vector of type numeric or integer >=0, or single character search string");
     if (!isNull(separg)) {
         if (!isString(separg) || LENGTH(separg)!=1 || strlen(CHAR(STRING_ELT(separg,0)))!=1) error("'sep' must be 'auto' or a single character");
-        if (*CHAR(STRING_ELT(separg,0))=='\"') error("sep = '%c' = quote, is not an allowed separator.",'\"');
+        if (*CHAR(STRING_ELT(separg,0))==quote[0]) error("sep = '%c' = quote, is not an allowed separator.",quote[0]);
         if (*CHAR(STRING_ELT(separg,0)) == decChar) error("The two arguments to fread 'dec' and 'sep' are equal ('%c').", decChar);
     }
     if (!isString(integer64) || LENGTH(integer64)!=1) error("'integer64' must be a single character string");
@@ -598,10 +606,10 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     if( ! isNull(nastrings)) {
       FLAG_NA_STRINGS_NULL = 0;
       NASTRINGS_LEN = LENGTH(nastrings);
-      NA_MASK = (int *)malloc(NASTRINGS_LEN * sizeof(int));
+      NA_MASK = (int *)R_alloc(NASTRINGS_LEN, sizeof(int));
       NA_MAX_NCHAR = get_maxlen(nastrings);
-      NA_STRINGS = malloc(NASTRINGS_LEN * sizeof(char *));
-      EACH_NA_STRING_LEN = malloc(NASTRINGS_LEN * sizeof(int));
+      NA_STRINGS = (const char **)R_alloc(NASTRINGS_LEN, sizeof(char *));
+      EACH_NA_STRING_LEN = (int *)R_alloc(NASTRINGS_LEN, sizeof(int));
       for (int i = 0; i < NASTRINGS_LEN; i++) {
         NA_STRINGS[i] = CHAR(STRING_ELT(nastrings, i));
         EACH_NA_STRING_LEN[i] = strlen(CHAR(STRING_ELT(nastrings, i)));
@@ -692,9 +700,11 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     // ********************************************************************************************
     //   Auto detect eol, first eol where there are two (i.e. CRLF)
     // ********************************************************************************************
+    // take care of UTF8 BOM, #1087 and #1465
+    if (!memcmp(mmp, "\xef\xbb\xbf", 3)) mmp += 3;
     ch = mmp;
     while (ch<eof && *ch!='\n' && *ch!='\r') {
-        if (*ch=='\"') while(++ch<eof && *ch!='\"') {};  // allows protection of \n and \r inside column names
+        if (*ch==quote[0]) while(++ch<eof && *ch!=quote[0]) {};  // allows protection of \n and \r inside column names
         ch++;                                            // this 'if' needed in case opening protection is not closed before eof
     }
     if (ch>=eof) {
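The BOM hunk above skips a leading UTF-8 byte order mark (EF BB BF) before eol auto-detection; the diff's memcmp assumes the mapped file is at least 3 bytes long. A sketch of the same check with an explicit length guard added (the guard is my addition, not in the diff):

```c
#include <stddef.h>
#include <string.h>

/* Return a pointer past a leading UTF-8 BOM, if present; otherwise
   return the start pointer unchanged. */
static const char *skip_utf8_bom(const char *p, size_t n) {
    if (n >= 3 && memcmp(p, "\xef\xbb\xbf", 3) == 0)
        return p + 3;
    return p;
}
```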
@@ -794,18 +804,21 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
         if (verbose) Rprintf("Using supplied sep '%s' ... ", seps[0]=='\t'?"\\t":seps);
     }
     int nseps = strlen(seps);
-    
+    int *maxcols = (int *)R_alloc(nseps, sizeof(int)); // if (fill) grab longest col stretch as topNcol
     const char *topStart=ch, *thisStart=ch;
     char topSep=seps[0];
     int topLine=0, topLen=0, topNcol=-1;
     for (int s=0; s<nseps; s++) {
+        maxcols[s] = 0;  // the R_alloc above doesn't initialize
         if (seps[s] == decChar) continue;
         ch=pos; sep=seps[s];
         i=0;
         int thisLine=line, thisLen=0, thisNcol=-1;  // this* = this run's starting *
         while(ch<=eof && ++i<=30) {
+            if (*ch==eol && skipEmptyLines && i<30) {ch++; continue;}
             lineStart = ch;
             ncol = countfields();
+            maxcols[s] = (fill && ncol > maxcols[s]) ? ncol : maxcols[s];
             if (ncol==-1) {
                 if (thisNcol==-1) break;  // if first row has quote problem, move straight on to test a different sep
                 ncol=thisNcol;   // skip the quote problem row for now (consider part of current run)
@@ -814,12 +827,12 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
             if (ch==eof || i==30 || ncol!=thisNcol) {
                 // this* still refers to the previous run which has just finished
                 // Rprintf("\nRun: s='%c' thisLine=%d thisLen=%d thisNcol=%d", sep, thisLine, thisLen, thisNcol);
-                if (thisNcol>1 && (thisLen>topLen ||     // longest run wins
+                if (thisNcol>1 && ( thisLen>topLen || // longest run wins
                                    (thisLen==topLen && sep==topSep && thisNcol>topNcol))) {  // if tied, the one that divides it more (test 1328, 2 rows)
                     topStart = thisStart;
                     topLine = thisLine;
                     topLen = thisLen;
-                    topNcol = thisNcol;
+                    topNcol = (!fill) ? thisNcol : maxcols[s]; // if fill=TRUE, longest column stretch
                     topSep = sep;
                 }
                 if (lineStart==eof) break;
@@ -838,16 +851,19 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
         ncol=1;
     } else {
         sep=topSep;
-        ch=pos=topStart;
-        line=topLine;
         ncol=topNcol;
+        if (!fill) { ch=pos=topStart; line=topLine; }
+        else { ch=pos; line=1; }
         if (verbose) {
             if (isNull(separg)) { if (sep=='\t') Rprintf("'\\t'\n"); else Rprintf("'%c'\n", sep); }
             else Rprintf("found ok\n");
         } 
     }
     if (verbose) {
-        if (sep!=eol) Rprintf("Detected %d columns. Longest stretch was from line %d to line %d\n",ncol,line,line+topLen-1);
+        if (sep!=eol) {
+            if (!fill) Rprintf("Detected %d columns. Longest stretch was from line %d to line %d\n",ncol,line,line+topLen-1);
+            else Rprintf("Detected %d (maximum) columns (fill=TRUE)\n", ncol);
+        }
         ch2 = ch; while(++ch2<eof && *ch2!=eol && ch2-ch<10);
         Rprintf("Starting data input on line %d (either column names or first row of data). First 10 characters: %.*s\n", line, (int)(ch2-ch), ch);
     }
@@ -874,9 +890,9 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     allchar=TRUE;
     for (i=0; i<ncol; i++) {
         skip_spaces(); // remove trailing spaces
-        if (ch<eof && *ch=='\"') {
-            while(++ch<eof && (*ch!='\"' || (ch+1<eof && *(ch+1)!=sep && *(ch+1)!=eol))) {};
-            if (ch<eof && *ch++!='\"') STOP("Internal error: quoted field ends before EOF but not with \"sep");
+        if (ch<eof && *ch==quote[0]) {
+            while(++ch<eof && (*ch!=quote[0] || (ch+1<eof && *(ch+1)!=sep && *(ch+1)!=eol))) {};
+            if (ch<eof && *ch++!=quote[0]) STOP("Internal error: quoted field ends before EOF but not with \"sep");
         } else {                              // if field reads as double ok then it's INT/INT64/REAL; i.e., not character (and so not a column name)
             if (*ch!=sep && *ch!=eol && Strtod())  // blank column names (,,) considered character and will get default names
                 allchar=FALSE;                     // considered testing at least one isalpha, but we want 1E9 to be a value not a column name
@@ -884,8 +900,9 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
                 while(ch<eof && *ch!=eol && *ch!=sep) ch++;  // skip over unquoted character field
         }
         if (i<ncol-1) {   // not the last column (doesn't have a separator after it)
-            if (ch<eof && *ch!=sep) STOP("Unexpected character ending field %d of line %d: %.*s", i+1, line, ch-pos+5, pos);
-            else if (ch<eof) ch++;
+            if (ch<eof && *ch!=sep) {
+                if (!fill) STOP("Unexpected character ending field %d of line %d: %.*s", i+1, line, ch-pos+5, pos);
+            } else if (ch<eof) ch++;
         } 
     }
     // discard any whitespace after last column name on first row before the eol (was a TODO)
@@ -909,12 +926,12 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
            // Also Field() takes care of leading spaces. Easier to understand the logic.
             skip_spaces(); Field();
             if (fieldLen) {
-                SET_STRING_ELT(names, i, mkCharLen(fieldStart, fieldLen));
+                SET_STRING_ELT(names, i, mkCharLenCE(fieldStart, fieldLen, ienc)); // #1680 fix, respect encoding on header col
             } else {
                 sprintf(buff,"V%d",i+1);
                 SET_STRING_ELT(names, i, mkChar(buff));
             }
-            if (i<ncol-1) ch++; // move the beginning char of next field
+            if (ch<eof && *ch!=eol && i<ncol-1) ch++; // move the beginning char of next field
         }
         while (ch<eof && *ch!=eol) ch++; // no need for skip_spaces() here
         if (ch<eof && *ch==eol) ch+=eolLen;  // now on first data row (row after column names)
@@ -926,7 +943,7 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     //   Count number of rows
     // ********************************************************************************************
     i = INTEGER(nrowsarg)[0];
-    if (pos==eof || *pos==eol) {
+    if (pos==eof || (*pos==eol && !fill && !skipEmptyLines)) {
         nrow=0;
         if (verbose) Rprintf("Byte after header row is eof or eol, 0 data rows present.\n");
     } else if (i>-1) {
@@ -935,27 +952,57 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
         // Intended for nrow=10 to see top 10 rows quickly without touching remaining pages
     } else {
         long long neol=1, nsep=0, tmp;
-        while (ch<eof) {   
-            // goal: quick row count with no branches while still coping with embedded sep and \n, below
-            neol+=(*ch==eol);
-            nsep+=(*ch++==sep);
+        // handle most frequent case first
+        if (!fill) {
+            // goal: quick row count with no branches and cope with embedded sep and \n
+            while (ch<eof) {
+                neol+=(*ch==eol);
+                nsep+=(*ch++==sep);
+            }
+        } else {
+            // goal: don't count newlines within quotes, 
+            // don't rely on 'sep' because it'll provide an underestimate
+            if (!skipEmptyLines) {
+                while (ch<eof) {
+                    if (*ch!=quote[0]) neol += (*ch==eol);
+                    else while(ch+1<eof && *(++ch)!=quote[0]); // quits at next quote
+                    ch++;
+                }
+            } else {
+                while (ch<eof) {
+                    if (*ch!=quote[0]) neol += (*ch==eol && *(ch-1)!=eol);
+                    else while(ch+1<eof && *(++ch)!=quote[0]); // quits at next quote
+                    ch++;
+                }                
+            }
         }
         if (ch!=eof) STOP("Internal error: ch!=eof after counting sep and eol");
-        i=0; int nblank=0;
-        while (i==0 && ch>pos) {   
-            // count blank lines at the end
-            i=0; while (ch>pos && *--ch!=eol2) i += !isspace(*ch);
-            nblank += (i==0);
-            ch -= eolLen-1;
-        }
-        // if (nblank==0) There is non white after the last eol. Ok and dealt with. TO DO: reference test id here in comment
-        if (ncol==1) tmp = neol-nblank;
-        else tmp = MIN( nsep / (ncol-1),  neol-nblank );   // good quick estimate with embedded sep and eol in mind
+        i=0; int endblanks=0;
+        if (!fill) {
+            while (i==0 && ch>pos) {
+                // count blank lines at the end
+                i=0; while (ch>pos && *--ch!=eol2) i += !isspace(*ch);
+                endblanks += (i==0);
+                ch -= eolLen-1;
+            }
+        } else if (fill && !skipEmptyLines) endblanks = (*(ch-1) == eol);
+
+        // if (endblanks==0) There is non-whitespace after the last eol. Ok and dealt with. TO DO: reference test id here in comment
+        if (ncol==1 || fill) tmp = neol-endblanks;
+        else tmp = MIN( nsep/(ncol-1),  neol-endblanks );   // good quick estimate with embedded sep and eol in mind
         if (verbose || tmp>INT_MAX) {
-            Rprintf("Count of eol: %lld (including %d at the end)\n",neol,nblank);
-            Rprintf("Count of sep: %lld\n",nsep);
-            if (ncol==1) Rprintf("ncol==1 so sep count ignored\n");
-            else Rprintf("nrow = MIN( nsep [%lld] / ncol [%d] -1, neol [%lld] - nblank [%d] ) = %lld\n", nsep, ncol, neol, nblank, tmp);
+            if (!fill) {
+                Rprintf("Count of eol: %lld (including %d at the end)\n",neol,endblanks);
+                if (ncol==1) Rprintf("ncol==1 so sep count ignored\n");
+                else {
+                    Rprintf("Count of sep: %lld\n",nsep);
+                    Rprintf("nrow = MIN( nsep [%lld] / (ncol [%d] -1), neol [%lld] - endblanks [%d] ) = %lld\n", nsep, ncol, neol, endblanks, tmp);
+                }
+            } else {
+                if (!skipEmptyLines) 
+                    Rprintf("nrow = neol [%lld] - endblanks [%d] = %lld\n", neol, endblanks, tmp);
+                else Rprintf("nrow = neol (after discarding blank lines) = %lld\n", tmp);
+            }
             if (tmp > INT_MAX) STOP("nrow larger than current 2^31 limit");
         }
         nrow = tmp;
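The non-fill estimate in this hunk reduces to `MIN(nsep/(ncol-1), neol-endblanks)`. A minimal standalone version of that arithmetic (the function name is ours, not fread's):

```c
/* Upper-bound row estimate used before allocation: a clean data row
   contributes exactly ncol-1 separators and one eol, and embedded
   sep/eol characters inside quoted fields can only inflate the raw
   counts, so the smaller of the two quotients is a safe allocation. */
static long long estimate_nrow(long long nsep, long long neol,
                               int ncol, int endblanks) {
    long long byEol = neol - endblanks;   /* eol count minus trailing blank lines */
    if (ncol == 1) return byEol;          /* sep count meaningless for one column */
    long long bySep = nsep / (ncol - 1);
    return bySep < byEol ? bySep : byEol;
}
```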
@@ -968,18 +1015,24 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     }
     clock_t tRowCount = clock();
     
-    // ********************************************************************************************
-    //   Make best guess at column types using first 5 rows, middle 5 rows and last 5 rows
-    // ********************************************************************************************
+    // *********************************************************************************************************
+    //   Make best guess at column types using 100 rows at 10 points, including the very first and very last row
+    // *********************************************************************************************************
     int type[ncol]; for (i=0; i<ncol; i++) type[i]=0;   // default type is lowest.
-    const char *str, *thispos;
-    for (j=0; j<(nrow>15?3:1); j++) {
-        switch(j) {
-        case 0: ch = pos;                  str="   first";  break;  // str same width so the codes line up vertically
-        case 1: ch = pos + 1*(eof-pos)/3;  str="+ middle";  break;
-        case 2: ch = pos + 2*(eof-pos)/3;  str="+   last";  break;  // 2/3 way through rather than end ... easier
+    const char *thispos;
+    int numPoints = nrow>1000 ? 11  : 1;
+    int eachNrows = nrow>1000 ? 100 : nrow;  // if nrow<=1000, test all the rows in a single iteration
+    for (j=0; j<numPoints; j++) {
+        if (j<10) {
+            ch = pos + j*(eof-pos)/10;
+        } else {
+            ch = eof - 50*(eof-pos)/nrow;
+            // include very last line by setting last point approx 50 lines from
+            // end and testing 100 lines from there until eof.
         }
-        if (j) {  
+        // detect types by starting at the first non-empty line
+        if (fill || skipEmptyLines) while (ch<eof && *ch==eol) ch++;
+        if (j) {
             // we may have landed inside quoted field containing embedded sep and/or embedded \n
             // find next \n and see if 5 good lines follow. If not try next \n, and so on, until we find the real \n
             // We don't know which line number this is because we jumped straight to it
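This hunk replaces "5 rows at 3 points" with 100 rows at each of 10 evenly spaced points, plus an 11th point near eof. The offset arithmetic, isolated under the assumption that `mapLen` stands for `eof-pos` (names are ours):

```c
/* File offset where the j-th type-detection sample begins: points
   0..9 are evenly spaced through the mapped region; point 10 starts
   about 50 average row-lengths before eof so the final rows (where a
   type often first changes) are included in the 100 rows tested. */
static long long sample_offset(int j, long long mapLen, long long nrow) {
    if (j < 10) return j * mapLen / 10;
    return mapLen - 50 * mapLen / nrow;
}
```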
@@ -996,14 +1049,14 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
                     break;
                 }
             }
+            // change warning to verbose, avoids confusion, see #1124
             if (i<5) {
-                if (INTEGER(nrowsarg)[0]==-1)
-                    warning("Unable to find 5 lines with expected number of columns (%s)\n", str);
+                if (verbose) Rprintf("Couldn't guess column types from test point %d\n", j);
                 continue;
             }
         }
         i = 0;
-        while(i<5 && ch<eof && *ch!=eol) {
+        while(i<eachNrows && ch<eof && *ch!=eol) {  // Test 100 lines from each of the 10 points in the file
             i++;
             lineStart = ch;
             for (field=0;field<ncol;field++) {
@@ -1029,15 +1082,16 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
                     Field();   // don't do err=1 here because we don't know 'line' when j=1|2. Leave error to throw in data read step.
                 }
                 if (ch<eof && *ch==sep && field<ncol-1) {ch++; continue;}  // done, next field
-                if (field<ncol-1) {
-                    if (*ch>31) STOP("Expected sep ('%c') but '%c' ends field %d when detecting types (%s): %.*s", sep, *ch, field, str, ch-lineStart+1, lineStart);
-                    else STOP("Expected sep ('%c') but new line, EOF (or other non printing character) ends field %d when detecting types (%s): %.*s", sep, field, str, ch-lineStart+1, lineStart);
+                if (field<ncol-1 && !fill) {
+                    if (*ch>31) STOP("Expected sep ('%c') but '%c' ends field %d when detecting types from point %d: %.*s", sep, *ch, field, j, ch-lineStart+1, lineStart);
+                    else STOP("Expected sep ('%c') but new line, EOF (or other non printing character) ends field %d when detecting types from point %d: %.*s", sep, field, j, ch-lineStart+1, lineStart);
                 }
             }
             while (ch<eof && *ch!=eol) ch++;
             if (ch<eof && *ch==eol) ch+=eolLen;
+            if (stripWhite) skip_spaces();
         }
-        if (verbose) { Rprintf("Type codes (%s 5 rows): ",str); for (i=0; i<ncol; i++) Rprintf("%d",type[i]); Rprintf("\n"); }
+        if (verbose) { Rprintf("Type codes (point %2d): ",j); for (i=0; i<ncol; i++) Rprintf("%d",type[i]); Rprintf("\n"); }
     }
     ch = pos;
     
@@ -1047,10 +1101,13 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     int numNULL = 0;
     SEXP colTypeIndex, items, itemsInt, UserTypeNameSxp;
     int tmp[ncol]; for (i=0; i<ncol; i++) tmp[i]=0;  // used to detect ambiguities (dups) in user's input
-    if (length(colClasses)) {
+    if (isLogical(colClasses)) {
+        // allNA only valid logical input
+        for (int k=0; k<LENGTH(colClasses); k++) if (LOGICAL(colClasses)[k] != NA_LOGICAL) STOP("when colClasses is logical it must be all NA. Position %d contains non-NA: %d", k+1, LOGICAL(colClasses)[k]);
+        if (verbose) Rprintf("Argument colClasses is ignored as requested by provided NA values\n");
+    } else if (length(colClasses)) {
         UserTypeNameSxp = PROTECT(allocVector(STRSXP, NUT));
         protecti++;
-        int thisType;
         for (i=0; i<NUT; i++) SET_STRING_ELT(UserTypeNameSxp, i, mkChar(UserTypeName[i]));
         if (isString(colClasses)) {
            // this branch unusual for fread: column types for all columns in one long unnamed character vector
@@ -1058,8 +1115,12 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
             if (LENGTH(colClasses)!=1 && LENGTH(colClasses)!=ncol) STOP("colClasses is unnamed and length %d but there are %d columns. See ?data.table for colClasses usage.", LENGTH(colClasses), ncol);
             colTypeIndex = PROTECT(chmatch(colClasses, UserTypeNameSxp, NUT, FALSE));  // if type not found then read as character then as. at R level
             protecti++;
-            for (k=0; k<ncol; k++) {
-                thisType = UserTypeNameMap[ INTEGER(colTypeIndex)[ LENGTH(colClasses)==1 ? 0 : k] -1 ];
+            for (int k=0; k<ncol; k++) {
+                if (STRING_ELT(colClasses, LENGTH(colClasses)==1 ? 0 : k) == NA_STRING) {
+                    if (verbose) Rprintf("Column %d ('%s') was detected as type '%s'. Argument colClasses is ignored as requested by provided NA value\n", k+1, CHAR(STRING_ELT(names,k)), UserTypeName[type[k]] );
+                    continue;
+                }
+                int thisType = UserTypeNameMap[ INTEGER(colTypeIndex)[ LENGTH(colClasses)==1 ? 0 : k] -1 ];
                 if (type[k]<thisType) {
                     if (verbose) Rprintf("Column %d ('%s') was detected as type '%s' but bumped to '%s' as requested by colClasses\n", k+1, CHAR(STRING_ELT(names,k)), UserTypeName[type[k]], UserTypeName[thisType] );
                     type[k]=thisType;
@@ -1072,7 +1133,7 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
             colTypeIndex = PROTECT(chmatch(getAttrib(colClasses, R_NamesSymbol), UserTypeNameSxp, NUT, FALSE));
             protecti++;
             for (i=0; i<LENGTH(colClasses); i++) {
-                thisType = UserTypeNameMap[INTEGER(colTypeIndex)[i]-1];
+                int thisType = UserTypeNameMap[INTEGER(colTypeIndex)[i]-1];
                 items = VECTOR_ELT(colClasses,i);
                 if (thisType == SXP_NULL) {
                     if (!isNull(drop) || !isNull(select)) STOP("Can't use NULL in colClasses when select or drop is used as well.");
@@ -1132,7 +1193,13 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     if (length(select)) {
         if (any_duplicated(select,FALSE)) STOP("Duplicates detected in select");
         if (isString(select)) {
-            itemsInt = PROTECT(chmatch(names, select, NA_INTEGER, FALSE)); protecti++;
+            // invalid cols check part of #1445 moved here (makes sense before reading the file)
+            itemsInt = PROTECT(chmatch(select, names, NA_INTEGER, FALSE));
+            for (i=0; i<length(select); i++) if (INTEGER(itemsInt)[i]==NA_INTEGER) 
+                warning("Column name '%s' not found in column name header (case sensitive), skipping.", CHAR(STRING_ELT(select, i)));
+            UNPROTECT(1);
+            PROTECT_WITH_INDEX(itemsInt, &pi);
+            REPROTECT(itemsInt = chmatch(names, select, NA_INTEGER, FALSE), pi); protecti++;
             for (i=0; i<ncol; i++) if (INTEGER(itemsInt)[i]==NA_INTEGER) { type[i]=SXP_NULL; numNULL++; }
         } else {
             itemsInt = PROTECT(coerceVector(select, INTSXP)); protecti++;
@@ -1167,8 +1234,7 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     }
     for (i=0,resi=0; i<ncol; i++) {
         if (type[i] == SXP_NULL) continue;
-        thistype = TypeSxp[ type[i] ];
-        thiscol = allocVector(thistype,nrow);
+        SEXP thiscol = allocVector(TypeSxp[ type[i] ], nrow);
         SET_VECTOR_ELT(ans,resi++,thiscol);  // no need to PROTECT thiscol, see R-exts 5.9.1
         if (type[i]==SXP_INT64) setAttrib(thiscol, R_ClassSymbol, ScalarString(mkChar("integer64")));
         SET_TRUELENGTH(thiscol, nrow);
@@ -1183,35 +1249,41 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
     ERANGEwarning = TRUE;
     clock_t nexttime = t0+2*CLOCKS_PER_SEC;  // start printing % done after a few seconds. If doesn't appear then you know mmap is taking a while.
                                              // We don't want to be bothered by progress meter for quick tasks
-    int batchend; Rboolean hasPrinted=FALSE, whileBreak=FALSE;
+    Rboolean hasPrinted=FALSE, whileBreak=FALSE;
     i = 0;
     while (i<nrow && ch<eof) {
-        if (showProgress==1 && clock()>nexttime) {
+        if (showProgress && clock()>nexttime) {
             Rprintf("\rRead %.1f%% of %d rows", (100.0*i)/nrow, nrow);   // prints straight away if the mmap above took a while, is the idea
             R_FlushConsole();    // for Windows
             nexttime = clock()+CLOCKS_PER_SEC;
             hasPrinted = TRUE;
         }
         R_CheckUserInterrupt();
-        batchend = MIN(i+10000, nrow);    // batched into 10k rows to save (expensive) calls to clock()
-        for (; i<batchend && ch<eof; i++) {
+        int batchend = MIN(i+10000, nrow);    // batched into 10k rows to save (expensive) calls to clock()
+        while(i<batchend && ch<eof) {
             //Rprintf("Row %d : %.10s\n", i+1, ch);
+            if (stripWhite) skip_spaces(); // #1575 fix
             if (*ch==eol) {
-                // blank line causes early stop.  TO DO: allow blank line skips
-                whileBreak = TRUE;  // break the enclosing while too, without changing i
-                break;              // break this for
+                if (skipEmptyLines) { ch++; continue; }
+                else if (!fill) {
+                    whileBreak = TRUE;  // break the enclosing while too, without changing i
+                    break;              // break this while
+                }
             }
-            for (j=0,resj=0; j<ncol; resj+=(type[j]!=SXP_NULL),j++) {
-                //Rprintf("Field %d: '%.10s' as type %d\n", j+1, ch, type[j]);
+            for (int j=0, resj=-1; j<ncol; j++) {
+                // Rprintf("Field %d: '%.10s' as type %d\n", j+1, ch, type[j]);
+                // TODO: fill="auto" to automatically fill when end of line/file and != ncol?
                 if (stripWhite) skip_spaces();
-                thiscol = VECTOR_ELT(ans, resj);
+                SEXP thiscol = (type[j]!=SXP_NULL) ? VECTOR_ELT(ans, ++resj) : NULL;
                 switch (type[j]) {
                 case SXP_LGL:
-                    if (Strtob()) { LOGICAL(thiscol)[i] = u.b; break; }
+                    if (fill && (*ch==eol || ch==eof)) { LOGICAL(thiscol)[i] = NA_LOGICAL; break; }
+                    else if (Strtob()) { LOGICAL(thiscol)[i] = u.b; break; }
                     SET_VECTOR_ELT(ans, resj, thiscol = coerceVectorSoFar(thiscol, type[j]++, SXP_INT, i, j));
                 case SXP_INT:
                     ch2=ch; u.l=NA_INTEGER;
-                    if (Strtoll() && INT_MIN<=u.l && u.l<=INT_MAX) {  // relies on INT_MIN==NA.INTEGER, checked earlier
+                    if (fill && (*ch2==eol || ch2==eof)) { INTEGER(thiscol)[i] = u.l; break; }
+                    else if (Strtoll() && INT_MIN<=u.l && u.l<=INT_MAX) {  // relies on INT_MIN==NA.INTEGER, checked earlier
                         INTEGER(thiscol)[i] = (int)u.l;
                         break;   //  Most common case. Done with this field. Strtoll already moved ch for us to sit on next sep or eol.
                     }
@@ -1223,20 +1295,26 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
                 case SXP_INT64:
                    // fix for #488. PREVIOUSLY: u.d = NA_REAL;
                     u.l = NAINT64; // NAINT64 is defined in data.table.h, = LLONG_MIN
-                    if (Strtoll()) { REAL(thiscol)[i] = u.d; break; }
+                    if (fill && (*ch==eol || ch==eof)) { REAL(thiscol)[i] = u.d; break; }
+                    else if (Strtoll()) { REAL(thiscol)[i] = u.d; break; }
                     SET_VECTOR_ELT(ans, resj, thiscol = coerceVectorSoFar(thiscol, type[j]++, SXP_REAL, i, j));
                     // A bump from INT to STR will bump through INT64 and then REAL before STR, coercing each time. Deliberately done this way. It's
                     // a small and very rare cost (see comments in coerceVectorSoFar), for better speed 99% of the time (saving deep branches).
                     // TO DO: avoid coercing several times and bump straight to the new type once, somehow.
                 case SXP_REAL: case_SXP_REAL:
-                    if (Strtod()) { REAL(thiscol)[i] = u.d; break; }
+                    if (fill && (*ch==eol || ch==eof)) { REAL(thiscol)[i] = NA_REAL;  break; }
+                    else if (Strtod()) { REAL(thiscol)[i] = u.d; break; }
                     SET_VECTOR_ELT(ans, resj, thiscol = coerceVectorSoFar(thiscol, type[j]++, SXP_STR, i, j));
                 case SXP_STR: case SXP_NULL: case_SXP_STR:
-                    Field();
-                    if (type[j]==SXP_STR) SET_STRING_ELT(thiscol, i, mkCharLenCE(fieldStart, fieldLen, ienc));
+                    if (fill && (*ch==eol || ch==eof)) {
+                        if (type[j]==SXP_STR) SET_STRING_ELT(thiscol, i, mkChar(""));
+                    } else {
+                        Field();
+                        if (type[j]==SXP_STR) SET_STRING_ELT(thiscol, i, mkCharLenCE(fieldStart, fieldLen, ienc));
+                    }
                 }
                 if (ch<eof && *ch==sep && j<ncol-1) {ch++; continue;}  // done, next field
-                if (j<ncol-1) {
+                if (j<ncol-1 && !fill) {
                     if (*ch>31) STOP("Expected sep ('%c') but '%c' ends field %d on line %d when reading data: %.*s", sep, *ch, j+1, line, ch-pos+1, pos);
                     else STOP("Expected sep ('%c') but new line or EOF ends field %d on line %d when reading data: %.*s", sep, j+1, line, ch-pos+1, pos);
                     // print whole line here because it's often something earlier in the line that messed up
@@ -1247,15 +1325,16 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
             if (ch<eof && *ch!=eol) {
                 // TODO: skip spaces here if strip.white=TRUE (arg to be added) and then check+warn
                 // TODO: warn about uncommented text here
-                error("Expecting %d cols, but line %d contains text after processing all cols. It is very likely that this is due to one or more fields having embedded sep='%c' and/or (unescaped) '\\n' characters within unbalanced unescaped quotes. fread cannot handle such ambiguous cases and those lines may not have been read in as expected. Please read the section on quotes in ?fread.", ncol, line, sep);
+                error("Expecting %d cols, but line %d contains text after processing all cols. Try again with fill=TRUE. Another reason could be that fread's logic in distinguishing one or more fields having embedded sep='%c' and/or (unescaped) '\\n' characters within unbalanced unescaped quotes has failed. If quote='' doesn't help, please file an issue to figure out if the logic could be improved.", ncol, line, sep);
             }
             ch+=eolLen; // now that we error here, the if-statement isn't needed -> // if (ch<eof && *ch==eol) ch+=eolLen;
             pos = ch;  // start of line position only needed to include the whole line in any error message
             line++;
+            i++;
         }
         if (whileBreak) break;
     }
-    if (showProgress==1 && hasPrinted) {
+    if (showProgress && hasPrinted) {
         j = 1+(clock()-t0)/CLOCKS_PER_SEC;
         Rprintf("\rRead %d rows and %d (of %d) columns from %.3f GB file in %02d:%02d:%02d\n", i, ncol-numNULL, ncol, 1.0*filesize/(1024*1024*1024), j/3600, (j%3600)/60, j%60);
         R_FlushConsole();
@@ -1270,29 +1349,26 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
         if (INTEGER(nrowsarg)[0] == -1 || i < nrow) warning("Stopped reading at empty line %d but text exists afterwards (discarded): %.*s", line, ch2-ch, ch);
     }
     if (i<nrow) {
-        if (nrow-i > 100 && (double)i/nrow < 0.95)
-            warning("Read less rows (%d) than were allocated (%d). Run again with verbose=TRUE and please report.",i,nrow);
-        else if (verbose)
-            Rprintf("Read slightly fewer rows (%d) than were allocated (%d).\n", i, nrow);
        // the condition above usually happens when the file contains many newlines. This is not necessarily something to be worried about. The warning part is therefore commented out and the verbose message retained. If there are cases where lines don't get read in, we can revisit this warning. Fixes #1116.
+        // if (nrow-i > 100 && (double)i/nrow < 0.95)
+            // warning("Read less rows (%d) than were allocated (%d). Run again with verbose=TRUE and please report.",i,nrow);
+        // else if (verbose)
+        if (verbose)
+            Rprintf("Read fewer rows (%d) than were allocated (%d).\n", i, nrow);
         nrow = i;
     } else {
         if (i!=nrow) STOP("Internal error: i [%d] > nrow [%d]", i, nrow);
         if (verbose) Rprintf("Read %d rows. Exactly what was estimated and allocated up front\n", i);
     }
     for (j=0; j<ncol-numNULL; j++) SETLENGTH(VECTOR_ELT(ans,j), nrow);
-    // release memory from NA handling operations
-    if(!FLAG_NA_STRINGS_NULL) {
-      free(NA_MASK);
-      free(NA_STRINGS);
-      free(EACH_NA_STRING_LEN);
-    }
+    
     // ********************************************************************************************
     //   Convert na.strings to NA for character columns
     // ********************************************************************************************
     for (k=0; k<length(nastrings); k++) {
         thisstr = STRING_ELT(nastrings,k);
         for (j=0; j<ncol-numNULL; j++) {
-            thiscol = VECTOR_ELT(ans,j);
+            SEXP thiscol = VECTOR_ELT(ans,j);
             if (TYPEOF(thiscol)==STRSXP) {
                 for (i=0; i<nrow; i++)
                     if (STRING_ELT(thiscol,i)==thisstr) SET_STRING_ELT(thiscol, i, NA_STRING);
@@ -1305,7 +1381,7 @@ SEXP readfile(SEXP input, SEXP separg, SEXP nrowsarg, SEXP headerarg, SEXP nastr
         Rprintf("%8.3fs (%3.0f%%) Memory map (rerun may be quicker)\n", 1.0*(tMap-t0)/CLOCKS_PER_SEC, 100.0*(tMap-t0)/tot);
         Rprintf("%8.3fs (%3.0f%%) sep and header detection\n", 1.0*(tLayout-tMap)/CLOCKS_PER_SEC, 100.0*(tLayout-tMap)/tot);
         Rprintf("%8.3fs (%3.0f%%) Count rows (wc -l)\n", 1.0*(tRowCount-tLayout)/CLOCKS_PER_SEC, 100.0*(tRowCount-tLayout)/tot);
-        Rprintf("%8.3fs (%3.0f%%) Column type detection (first, middle and last 5 rows)\n", 1.0*(tColType-tRowCount)/CLOCKS_PER_SEC, 100.0*(tColType-tRowCount)/tot);
+        Rprintf("%8.3fs (%3.0f%%) Column type detection (100 rows at 10 points)\n", 1.0*(tColType-tRowCount)/CLOCKS_PER_SEC, 100.0*(tColType-tRowCount)/tot);
         Rprintf("%8.3fs (%3.0f%%) Allocation of %dx%d result (xMB) in RAM\n", 1.0*(tAlloc-tColType)/CLOCKS_PER_SEC, 100.0*(tAlloc-tColType)/tot, nrow, ncol);
         Rprintf("%8.3fs (%3.0f%%) Reading data\n", 1.0*(tRead-tAlloc-tCoerce)/CLOCKS_PER_SEC, 100.0*(tRead-tAlloc-tCoerce)/tot);
         Rprintf("%8.3fs (%3.0f%%) Allocation for type bumps (if any), including gc time if triggered\n", 1.0*tCoerceAlloc/CLOCKS_PER_SEC, 100.0*tCoerceAlloc/tot);
diff --git a/src/fsort.c b/src/fsort.c
new file mode 100644
index 0000000..7b63c7a
--- /dev/null
+++ b/src/fsort.c
@@ -0,0 +1,302 @@
+#include "data.table.h"
+
+#define INSERT_THRESH 200  // TODO: expose via api and test
+
+static void dinsert(double *x, int n) {   // TODO: if and when twiddled, double => ull
+  if (n<2) return;
+  for (int i=1; i<n; i++) {
+    double xtmp = x[i];
+    int j = i-1;
+    if (xtmp<x[j]) {
+      x[j+1] = x[j];
+      j--;
+      while (j>=0 && xtmp<x[j]) { x[j+1] = x[j]; j--; }
+      x[j+1] = xtmp;
+    }
+  }
+}
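`dinsert` above is a textbook insertion sort on doubles, used below the `INSERT_THRESH` cutoff where it beats further radix passes. Reproduced verbatim here with a tiny harness to sanity-check it:

```c
/* Insertion sort for small runs of doubles; identical to the dinsert
   added in fsort.c. Elements already in place (xtmp >= x[j]) are not
   moved, which keeps the common nearly-sorted case cheap. */
static void dinsert(double *x, int n) {
  if (n<2) return;
  for (int i=1; i<n; i++) {
    double xtmp = x[i];
    int j = i-1;
    if (xtmp<x[j]) {
      x[j+1] = x[j];
      j--;
      while (j>=0 && xtmp<x[j]) { x[j+1] = x[j]; j--; }
      x[j+1] = xtmp;
    }
  }
}
```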
+
+static union {
+  double d;
+  unsigned long long ull;
+} u;
+static unsigned long long minULL;
+
+static void dradix_r(  // single-threaded recursive worker
+  double *in,          // n doubles to be sorted
+  double *working,     // working memory to put the sorted items before copying over *in; must not overlap *in
+  R_xlen_t n,          // number of items to sort.  *in and *working must be at least n long
+  int fromBit,         // After twiddle to ordered ull, the bits [fromBit,toBit] are used to count  
+  int toBit,           //   fromBit<toBit; bit 0 is the least significant; fromBit is right shift amount too
+  R_xlen_t *counts     // already zero'd counts vector, 2^(toBit-fromBit+1) long. A stack of these is reused.
+) {
+  unsigned long long width = 1ULL<<(toBit-fromBit+1);
+  unsigned long long mask = width-1;
+  
+  const double *tmp=in;
+  for (R_xlen_t i=0; i<n; i++) {
+    counts[(*(unsigned long long *)tmp - minULL) >> fromBit & mask]++;
+    tmp++;
+  }
+  int last = (*(unsigned long long *)--tmp - minULL) >> fromBit & mask;
+  if (counts[last] == n) {
+    // Single value for these bits here. All counted in one bucket which must be the bucket for the last item.
+    counts[last] = 0;  // clear ready for reuse. All other counts must be zero already so save time by not setting to 0. 
+    if (fromBit > 0)   // move on to next bits (if any remain) to resolve 
+      dradix_r(in, working, n, fromBit<8 ? 0 : fromBit-8, toBit-8, counts+256);
+    return;
+  }
+
+  R_xlen_t cumSum=0;
+  for (R_xlen_t i=0; cumSum<n; i++) { // cumSum<n better than i<width as may return early
+    unsigned long long tmp;
+    if ((tmp=counts[i])) {  // don't cumulate through 0s, important below to save a wasteful memset to zero
+      counts[i] = cumSum;
+      cumSum += tmp;
+    }
+  } // leaves cumSum==n && 0<i && i<=width
+  
+  tmp=in;
+  for (R_xlen_t i=0; i<n; i++) {  // go forwards not backwards to give cpu pipeline better chance
+    int thisx = (*(unsigned long long *)tmp - minULL) >> fromBit & mask;
+    working[ counts[thisx]++ ] = *tmp;
+    tmp++;
+  }
+  
+  memcpy(in, working, n*sizeof(double));
+  
+  if (fromBit==0) {
+    // nothing left to do other than reset the counts to 0, ready for next recursion
+    // the final bucket must contain n and it might be close to the start. After that must be all 0 so no need to reset.
+    // Also this way, we don't need to know how big thisCounts is and therefore no possibility of getting that wrong.
+    // wasteful thisCounts[i]=0 even when already 0 is better than a branch. We are highly recursive at this point
+    // so avoiding memset() is known to be worth it.
+    for (int i=0; counts[i]<n; i++) counts[i]=0;
+    return;
+  }
+  
+  cumSum=0;
+  for (int i=0; cumSum<n; i++) {   // again, cumSum<n better than i<width as it can return early
+    if (counts[i] == 0) continue;
+      R_xlen_t thisN = counts[i] - cumSum;  // undo cumulate; i.e. diff
+    if (thisN <= INSERT_THRESH) {
+      dinsert(in+cumSum, thisN);  // for thisN==1 this'll return instantly. Probably better than several branches here.
+    } else {
+      dradix_r(in+cumSum, working, thisN, fromBit<=8 ? 0 : fromBit-8, toBit-8, counts+256);
+    }  
+    cumSum = counts[i];
+    counts[i] = 0; // reset to 0 to save wasteful memset afterwards
+  }
+}
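The recursion in dradix_r() is easier to follow from one isolated pass; a sketch under simplified assumptions (unsigned keys, fixed 8-bit digit, none of the early-exit or count-reset optimisations above):

```c
#include <string.h>
#include <stddef.h>

/* One stable counting pass on bits [fromBit, fromBit+7]: count digit
 * occurrences, turn counts into starting offsets, scatter into the
 * working buffer, copy back. dradix_r() repeats this from the high
 * bits downward within each bucket. */
static void radix_pass(unsigned long long *in, unsigned long long *work,
                       size_t n, int fromBit) {
  size_t counts[256] = {0};
  for (size_t i = 0; i < n; i++) counts[(in[i] >> fromBit) & 0xff]++;
  size_t cum = 0;
  for (int b = 0; b < 256; b++) { size_t c = counts[b]; counts[b] = cum; cum += c; }
  for (size_t i = 0; i < n; i++) work[counts[(in[i] >> fromBit) & 0xff]++] = in[i];
  memcpy(in, work, n * sizeof(unsigned long long));
}
```

Running the pass once per byte, least significant first, yields a full LSD radix sort; fsort goes MSD-first instead so each bucket can recurse independently in parallel.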
+
+R_xlen_t *qsort_data;
+// would have liked to define cmp inside fsort where qsort is called but wasn't sure that's portable
+int qsort_cmp(const void *a, const void *b) {
+  // return >0 if the element a goes after the element b
+  // doesn't matter whether it's stable or not
+  R_xlen_t x = qsort_data[*(int *)a];
+  R_xlen_t y = qsort_data[*(int *)b];
+  // return x-y;  would like this, but this is long and the cast to int return may not preserve sign
+  // We have long vectors in mind (1e10(74GB), 1e11(740GB)) where extreme skew may feasibly mean the largest count
+  // is greater than 2^32. The first split is (currently) 16 bits so should be very rare but to be safe keep 64bit counts.    
+  return (x<y)-(x>y);   // largest first in a safe branchless way casting long to int
+}
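The `(x<y)-(x>y)` idiom deserves isolating: returning `x-y` from a comparator on 64-bit counts can overflow, and the implicit cast to int can flip the sign, whereas subtracting two boolean comparisons cannot. A standalone sketch with illustrative names:

```c
#include <stdlib.h>

/* Descending comparator for long long values that avoids the
 * overflow/truncation trap of `return x - y;`.
 * Returns -1 when x > y (so larger values sort first). */
static int cmp_desc(const void *a, const void *b) {
  long long x = *(const long long *)a;
  long long y = *(const long long *)b;
  return (x < y) - (x > y);
}
```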
+
+SEXP fsort(SEXP x, SEXP verboseArg) {
+  if (!isLogical(verboseArg) || LENGTH(verboseArg)!=1 || LOGICAL(verboseArg)[0]==NA_LOGICAL)
+    error("verbose must be TRUE or FALSE");
+  Rboolean verbose = LOGICAL(verboseArg)[0];
+  if (!isNumeric(x)) error("x must be a vector of type 'double' currently");
+  // TODO: not only detect if already sorted, but if it is, just return x to save the duplicate
+  
+  SEXP ansVec = PROTECT(allocVector(REALSXP, xlength(x)));
+  double *ans = REAL(ansVec);
+  // allocate early in case fails if not enough RAM
+  // TODO: document this is much cheaper than a copy followed by in-place.
+  
+  int nth = getDTthreads();
+  int nBatch=nth*2;  // at least nth; more to reduce last-man-home; but not too large to keep counts small in cache
+  if (verbose) Rprintf("nth=%d, nBatch=%d\n",nth,nBatch);
+  
+  R_xlen_t batchSize = (xlength(x)-1)/nBatch + 1;
+  if (batchSize < 1024) batchSize = 1024; // simple attempt to work reasonably for short vector. 1024*8 = 2 4kb pages
+  nBatch = (xlength(x)-1)/batchSize + 1;
+  R_xlen_t lastBatchSize = xlength(x) - (nBatch-1)*batchSize;
+  // could be that lastBatchSize == batchSize when i) xlength(x) is multiple of nBatch
+  // and ii) for small vectors with just one batch 
+  
+  double mins[nBatch], maxs[nBatch];
+  #pragma omp parallel for schedule(dynamic) num_threads(nth)
+  for (int batch=0; batch<nBatch; batch++) {
+    R_xlen_t thisLen = (batch==nBatch-1) ? lastBatchSize : batchSize;
+    double *d = &REAL(x)[batchSize * batch];
+    double myMin=*d, myMax=*d;
+    d++;
+    for (R_xlen_t j=1; j<thisLen; j++) {
+      // TODO: test for sortedness here as well.
+      if (*d<myMin) myMin=*d;
+      else if (*d>myMax) myMax=*d;
+      d++;
+    }
+    mins[batch] = myMin;
+    maxs[batch] = myMax;
+  }
+  double min=mins[0], max=maxs[0];
+  for (int i=1; i<nBatch; i++) {
+    // TODO: if boundaries are sorted then we only need sort the unsorted batches known above
+    if (mins[i]<min) min=mins[i];
+    if (maxs[i]>max) max=maxs[i];
+  }
+  if (verbose) Rprintf("Range = [%g,%g]\n", min, max);
+  if (min < 0.0) error("Cannot yet handle negatives.");
+  // TODO: -0ULL should allow negatives
+  //       avoid twiddle function call as expensive in recent tests (0.34 vs 2.7)
+  //       possibly twiddle once to *ans, then untwiddle at the end in a fast parallel sweep
+  u.d = max;
+  unsigned long long maxULL = u.ull;
+  u.d = min;
+  minULL = u.ull;  // set static global for use by dradix_r
+  
+  int maxBit = floor(log(maxULL-minULL) / log(2));  // 0 is the least significant bit
+  int MSBNbits = maxBit > 15 ? 16 : maxBit+1;       // how many bits make up the MSB
+  int shift = maxBit + 1 - MSBNbits;                // the right shift to leave the MSB bits remaining
+  int MSBsize = 1<<MSBNbits;                        // the number of possible MSB values (16 bits => 65,536)
+  if (verbose) Rprintf("maxBit=%d; MSBNbits=%d; shift=%d; MSBsize=%d\n", maxBit, MSBNbits, shift, MSBsize);
+  
+  R_xlen_t *counts = calloc(nBatch*(size_t)MSBsize, sizeof(R_xlen_t));
+  if (counts==NULL) error("Unable to allocate working memory");
+  // provided MSBsize>=9, each batch is a multiple of at least one 4k page, so no page overlap
+  // TODO: change all calloc, malloc and free to Calloc and Free to be robust to error() and catch ooms. 
+  
+  if (verbose) Rprintf("counts is %dMB (%d pages per nBatch=%d, batchSize=%lld, lastBatchSize=%lld)\n",
+                       nBatch*MSBsize*sizeof(R_xlen_t)/(1024*1024), nBatch*MSBsize*sizeof(R_xlen_t)/(4*1024*nBatch),
+                       nBatch, batchSize, lastBatchSize);
+  
+  #pragma omp parallel for num_threads(nth)
+  for (int batch=0; batch<nBatch; batch++) {
+    R_xlen_t thisLen = (batch==nBatch-1) ? lastBatchSize : batchSize;
+    double *tmp = &REAL(x)[batchSize * (size_t)batch];
+    R_xlen_t *thisCounts = counts + batch*(size_t)MSBsize;
+    for (R_xlen_t j=0; j<thisLen; j++) {
+      thisCounts[(*(unsigned long long *)tmp - minULL) >> shift]++;
+      tmp++;
+    }
+  }
+  
+  // cumulate columnwise; parallel histogram; small so no need to parallelize
+  R_xlen_t rollSum=0;
+  for (int msb=0; msb<MSBsize; msb++) {
+    int j = msb;
+    for (int batch=0; batch<nBatch; batch++) {
+      R_xlen_t tmp = counts[j];
+      counts[j] = rollSum;
+      rollSum += tmp;
+      j += MSBsize;  // deliberately non-contiguous here
+    }
+  }  // leaves msb cumSum in the last batch i.e. last row of the matrix
+  
+  #pragma omp parallel for num_threads(nth)
+  for (int batch=0; batch<nBatch; batch++) {
+    R_xlen_t thisLen = (batch==nBatch-1) ? lastBatchSize : batchSize;
+    double *source = &REAL(x)[batchSize * batch];
+    R_xlen_t *thisCounts = counts + batch*MSBsize;
+    for (R_xlen_t j=0; j<thisLen; j++) {
+      ans[ thisCounts[(*(unsigned long long *)source - minULL) >> shift]++ ] = *source;
+      // This assignment to ans is not random access as it may seem, but cache efficient by
+      // design since target pages are written to contiguously. MSBsize * 4k < cache.
+      // TODO: therefore 16 bit MSB seems too big for this step. Time this step and reduce 16 a lot.
+      //       20MB cache / nth / 4k => MSBsize=160
+      source++;
+    }
+  }
+  // Done with batches now. Will not use batch dimension again.
+  
+  // TODO: add a timing point up to here
+  
+  if (shift > 0) { // otherwise, no more bits left to resolve ties and we're done  
+    int toBit = shift-1;
+    int fromBit = toBit>7 ? toBit-7 : 0;
+
+    // sort bins by size, largest first to minimise last-man-home
+    R_xlen_t *msbCounts = counts + (nBatch-1)*(size_t)MSBsize;
+    // msbCounts currently contains the ending position of each MSB (the starting location of the next) even across empty
+    if (msbCounts[MSBsize-1] != xlength(x)) error("Internal error: counts[nBatch-1][MSBsize-1] != length(x)");
+    R_xlen_t *msbFrom = malloc(MSBsize*sizeof(R_xlen_t));
+    int *order = malloc(MSBsize*sizeof(int));
+    R_xlen_t cumSum = 0;
+    for (int i=0; i<MSBsize; i++) {
+      msbFrom[i] = cumSum;
+      msbCounts[i] = msbCounts[i] - cumSum;
+      cumSum += msbCounts[i];
+      order[i] = i;
+    }
+    qsort_data = msbCounts;
+    qsort(order, MSBsize, sizeof(int), qsort_cmp);  // find order of the sizes, largest first
+    // Would have liked to define qsort_cmp() inside this function right here, but not sure that's fully portable.
+    // TODO: time this qsort but likely insignificant.
+    
+    if (verbose) {
+      Rprintf("Top 5 MSB counts: "); for(int i=0; i<5; i++) Rprintf("%lld ", msbCounts[order[i]]); Rprintf("\n");
+      Rprintf("Reduced MSBsize from %d to ", MSBsize);
+    }
+    while (MSBsize>0 && msbCounts[order[MSBsize-1]] < 2) MSBsize--;
+    if (verbose) {
+      Rprintf("%d by excluding 0 and 1 counts\n", MSBsize);
+    }
+    
+    #pragma omp parallel num_threads(getDTthreads())
+    {
+      R_xlen_t *counts = calloc((toBit/8 + 1)*256, sizeof(R_xlen_t));
+      // each thread has its own (small) stack of counts
+      // don't use VLAs here: perhaps too big for stack yes but more that VLAs apparently fail with schedule(dynamic)
+      
+      double *working=NULL;
+      // the working memory (for the largest groups) is allocated the first time the thread is assigned to
+      // an iteration.
+     
+      #pragma omp for schedule(dynamic,1)    
+      // All we assume here is that a thread can never be assigned to an earlier iteration; i.e. threads 0:(nth-1)
+      // get iterations 0:(nth-1) possibly out of order, then first-come-first-served in order after that.
+      // If a thread deals with an msb lower than the first one it dealt with, then its *working will be too small.
+      for (int msb=0; msb<MSBsize; msb++) {
+      
+        R_xlen_t from= msbFrom[order[msb]];
+        R_xlen_t thisN = msbCounts[order[msb]];
+        
+        if (working==NULL) working = malloc(thisN * sizeof(double)); // TODO: check succeeded otherwise exit gracefully 
+        // Depends on msbCounts being sorted largest first before this parallel loop
+        // Could be significant RAM saving if the largest msb is
+        // a lot larger than the 2nd largest msb, especially as nth grows to perhaps 128 on X1.
+        // However, the initial split is so large (16bits => 65,536) that the largest MSB should be
+        // relatively small anyway (n/65,536 if uniformly distributed).
+        // For msb>=nth, that thread's *working will already be big
+        // enough because the smallest *working (for thread nth-1) is big enough for all iterations following.
+        // Progressively, less and less of the working will be needed by the thread (just the first thisN will be
+        // used) and the unused pages will simply not be cached.
+        // TODO: Calloc isn't thread-safe. But this deep malloc should be ok here as no possible error() points
+        //       before free. Just need to add the check and exit thread safely somehow.
+
+        if (thisN <= INSERT_THRESH) {
+          dinsert(ans+from, thisN);
+        } else {
+          dradix_r(ans+from, working, thisN, fromBit, toBit, counts);
+        }
+      }
+      free(counts);
+      free(working);
+    }
+    free(msbFrom);
+    free(order);
+  }
+  
+  free(counts);
+  
+  // TODO: parallel sweep to check sorted using <= on original input. Feasible that twiddling messed up.
+  //       After a few years of heavy use remove this check for speed, and move into unit tests.
+  //       It's a perfectly contiguous and cache efficient parallel scan so should be relatively negligible.
+  
+  UNPROTECT(1);
+  return(ansVec);
+}
+
+
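fsort.c leans on a property worth stating explicitly: for non-negative finite IEEE-754 doubles, ordering the raw 64-bit patterns as unsigned integers gives the same order as the doubles themselves, which is why radix-counting the bits (after subtracting minULL) sorts the values directly, and why negatives are rejected for now. A sketch; the memcpy form is the strictly portable equivalent of the union punning used above:

```c
#include <string.h>

/* Reinterpret a double's bits as an unsigned 64-bit integer. */
static unsigned long long d2ull(double d) {
  unsigned long long u;
  memcpy(&u, &d, sizeof u);
  return u;
}
```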
diff --git a/src/fwrite.c b/src/fwrite.c
new file mode 100644
index 0000000..8962023
--- /dev/null
+++ b/src/fwrite.c
@@ -0,0 +1,1110 @@
+#include "data.table.h"
+#include "fwriteLookups.h"
+#include <errno.h>
+#include <unistd.h>  // for access()
+#include <fcntl.h>
+#include <time.h>
+#ifdef WIN32
+#include <sys/types.h>
+#include <sys/stat.h>
+#include <io.h>
+#define WRITE _write
+#define CLOSE _close
+#else
+#define WRITE write
+#define CLOSE close
+#endif
+
+#define NUM_SF   15
+#define SIZE_SF  1000000000000000ULL  // 10^NUM_SF
+
+// Globals for this file only (written once to hold parameters passed from R level)                   
+static const char *na;                 // by default "" or if set then usually "NA"
+static char sep;                       // comma in .csv files
+static char sep2;                      // ; in list column vectors
+static char dec;                       // the '.' in the number 3.1416. In Europe often: 3,1416
+static Rboolean verbose=FALSE;         // be chatty?
+static Rboolean quote=FALSE;           // whether to surround fields with double quote ". NA means 'auto' (default)
+static Rboolean qmethod_escape=TRUE;   // when quoting fields, how to manage double quote in the field contents
+static Rboolean logicalAsInt=FALSE;    // logical as 0/1 or "TRUE"/"FALSE"
+static Rboolean squash=FALSE;          // 0=ISO(yyyy-mm-dd) 1=squash(yyyymmdd)
+static int dateTimeAs=0;               // 0=ISO(yyyy-mm-dd) 1=squash(yyyymmdd), 2=epoch, 3=write.csv
+#define DATETIMEAS_EPOCH     2
+#define DATETIMEAS_WRITECSV  3
+#define ET_DATE    1   // extraType values
+#define ET_ITIME   2
+#define ET_INT64   3
+#define ET_POSIXCT 4
+#define ET_FACTOR  5
+
+static inline void writeInteger(long long x, char **thisCh)
+{
+  char *ch = *thisCh;
+  // both integer and integer64 are passed to this function so careful
+  // to test for NA_INTEGER in the calling code. INT_MIN (NA_INTEGER) is
+  // a valid non-NA in integer64
+  if (x == 0) {
+    *ch++ = '0';
+  } else {
+    if (x<0) { *ch++ = '-'; x=-x; }
+    // avoid log() call for speed. write backwards then reverse when we know how long
+    int width = 0;
+    while (x>0) { *ch++ = '0'+x%10; x /= 10; width++; }
+    for (int i=width/2; i>0; i--) {
+      char tmp=*(ch-i);
+      *(ch-i) = *(ch-width+i-1);
+      *(ch-width+i-1) = tmp;
+    }
+  }
+  *thisCh = ch;
+}
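A standalone sketch of the same digits-backwards-then-reverse trick, writing into a caller-supplied buffer and returning the length (hypothetical helper name; writeInteger() above advances a shared cursor instead):

```c
/* Format a long long without sprintf(): emit digits least-significant
 * first, then reverse them in place. Returns chars written; no
 * terminating '\0' is added. */
static int fmt_ll(long long x, char *buf) {
  char *ch = buf;
  if (x == 0) { *ch++ = '0'; return 1; }
  if (x < 0) { *ch++ = '-'; x = -x; }
  int width = 0;
  while (x > 0) { *ch++ = '0' + (int)(x % 10); x /= 10; width++; }
  for (int i = 0; i < width / 2; i++) {   /* reverse the digit run only */
    char tmp = ch[-1 - i];
    ch[-1 - i] = ch[-width + i];
    ch[-width + i] = tmp;
  }
  return (int)(ch - buf);
}
```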
+
+static inline void writeChars(const char *x, char **thisCh)
+{
+  // similar to C's strcpy but i) doesn't copy \0 and ii) moves destination along
+  char *ch = *thisCh;
+  while (*x) *ch++=*x++;
+  *thisCh = ch;
+}
+
+static inline void writeLogical(Rboolean x, char **thisCh)
+{
+  char *ch = *thisCh;
+  if (x == NA_LOGICAL) {
+    writeChars(na, &ch);
+  } else if (logicalAsInt) {
+    *ch++ = '0'+x;
+  } else if (x) {
+    *ch++='T'; *ch++='R'; *ch++='U'; *ch++='E';
+  } else {
+    *ch++='F'; *ch++='A'; *ch++='L'; *ch++='S'; *ch++='E';
+  }
+  *thisCh = ch;
+}
+
+SEXP genLookups() {
+  Rprintf("genLookups commented out of the package so it's clear it isn't needed to build. The hooks are left in so it's easy to put back in development should we need to.\n");
+  // e.g. ldexpl may not be available on some platforms, or if it is it may not be accurate.
+  return R_NilValue;
+}
+/*
+  FILE *f = fopen("/tmp/fwriteLookups.h", "w"); 
+  fprintf(f, "//\n\
+// Generated by fwrite.c:genLookups()\n\
+//\n\
+// 3 vectors: sigparts, expsig and exppow\n\
+// Includes precision higher than double; leave this compiler on this machine\n\
+// to parse the literals at reduced precision.\n\
+// 2^(-1023:1024) is held more accurately than double provides by storing its\n\
+// exponent separately (expsig and exppow)\n\
+// We don't want to depend on 'long double' (>64bit) availability to generate\n\
+// these at runtime; libraries and hardware vary.\n\
+// These small lookup tables are used for speed.\n\
+//\n\n");
+  fprintf(f, "double sigparts[53] = {\n0.0,\n");
+  for (int i=1; i<=52; i++) {
+    fprintf(f, "%.40Le%s\n",ldexpl(1.0L,-i), i==52?"":",");
+  }
+  fprintf(f, "};\n\ndouble expsig[2048] = {\n");
+  char x[2048][60];
+  for (int i=0; i<2048; i++) {
+    sprintf(x[i], "%.40Le", ldexpl(1.0L, i-1023));
+    fprintf(f, "%.*s%s\n", (int)(strchr(x[i],'e')-x[i]), x[i], (i==2047?"":",") );
+  }
+  fprintf(f, "};\n\nint exppow[2048] = {\n");
+  for (int i=0; i<2048; i++) {
+    fprintf(f, "%d%s", atoi(strchr(x[i],'e')+1), (i==2047?"":",") );
+  }
+  fprintf(f, "};\n\n");
+  fclose(f);
+  return R_NilValue;
+}
+*/
+
+static union {
+  double d;
+  unsigned long long ull;
+} u;
+
+static inline void writeNumeric(double x, char **thisCh)
+{
+  // hand-rolled / specialized for speed
+  // *thisCh is safely the output destination with enough space (ensured via calculating maxLineLen up front)
+  // technique similar to base R (format.c:formatReal and printutils.c:EncodeReal0)
+  // differences/tricks :
+  //   i) no buffers. writes straight to the final file buffer passed to write()
+  //  ii) no C library calls such as sprintf() where the fmt string has to be interpreted over and over
+  // iii) no need to return variables or flags.  Just writes.
+  //  iv) shorter, easier to read and reason with. In one self contained place.
+  char *ch = *thisCh;
+  if (!R_FINITE(x)) {
+    if (ISNAN(x)) {
+      writeChars(na, &ch);
+    } else if (x>0) {
+      *ch++ = 'I'; *ch++ = 'n'; *ch++ = 'f';
+    } else {
+      *ch++ = '-'; *ch++ = 'I'; *ch++ = 'n'; *ch++ = 'f';
+    }
+  } else if (x == 0.0) {
+    *ch++ = '0';   // and we're done.  so much easier rather than passing back special cases
+  } else {
+    if (x < 0.0) { *ch++ = '-'; x = -x; }  // and we're done on sign, already written. no need to pass back sign
+    u.d = x;
+    unsigned long long fraction = u.ull & 0xFFFFFFFFFFFFF;  // (1ULL<<52)-1;
+    int exponent = (int)((u.ull>>52) & 0x7FF);              // [0,2047]
+
+    // Now sum the appropriate powers 2^-(1:52) of the fraction 
+    // Important for accuracy to start with the smallest first; i.e. 2^-52
+    // Exact powers of 2 (1.0, 2.0, 4.0, etc) are represented precisely with fraction==0
+    // Skip over trailing zeros for exactly representable numbers such as 0.5, 0.75
+    // Underflow here (0u-1u = all 1s) is on an unsigned type which is ok by C standards
+    // sigparts[0] arranged to be 0.0 in genLookups() to enable branch free loop here
+    double acc = 0;  // 'long double' not needed
+    int i = 52;
+    if (fraction) {
+      while ((fraction & 0xFF) == 0) { fraction >>= 8; i-=8; } 
+      while (fraction) {
+        acc += sigparts[(((fraction&1u)^1u)-1u) & i];
+        i--;
+        fraction >>= 1;
+      }
+    }
+    // 1.0+acc is in range [1.5,2.0) by IEEE754
+    // expsig is in range [1.0,10.0) by design of fwriteLookups.h
+    // Therefore y in range [1.5,20.0)
+    // Avoids (potentially inaccurate and potentially slow) log10/log10l, pow/powl, ldexp/ldexpl
+    // By design we can just lookup the power from the tables
+    double y = (1.0+acc) * expsig[exponent];  // low magnitude mult
+    int exp = exppow[exponent];
+    if (y>=9.99999999999999) { y /= 10; exp++; }
+    unsigned long long l = y * SIZE_SF;  // low magnitude mult 10^NUM_SF
+    // l now contains NUM_SF+1 digits as integer where repeated /10 below is accurate
+
+    // if (verbose) Rprintf("\nTRACE: acc=%.20Le ; y=%.20Le ; l=%llu ; e=%d     ", acc, y, l, exp);    
+
+    if (l%10 >= 5) l+=10; // use the last digit to round
+    l /= 10;
+    if (l == 0) {
+      if (*(ch-1)=='-') ch--;
+      *ch++ = '0';
+    } else {
+      // Count trailing zeros and therefore s.f. present in l
+      int trailZero = 0;
+      while (l%10 == 0) { l /= 10; trailZero++; }
+      int sf = NUM_SF - trailZero;
+      if (sf==0) {sf=1; exp++;}  // e.g. l was 9999999[5-9] rounded to 10000000 which added 1 digit
+      
+      // l is now an unsigned long that doesn't start or end with 0
+      // sf is the number of digits now in l
+      // exp is e<exp> were l to be written with the decimal sep after the first digit
+      int dr = sf-exp-1; // how many characters to print to the right of the decimal place
+      int width=0;       // field width were it written in decimal format. Used to decide between decimal and scientific.
+      int dl0=0;         // how many 0's to add to the left of the decimal place before starting l
+      if (dr<=0) { dl0=-dr; dr=0; width=sf+dl0; }  // 1, 10, 100, 99000
+      else {
+        if (sf>dr) width=sf+1;                     // 1.234 and 123.4
+        else { dl0=1; width=dr+1+dl0; }            // 0.1234, 0.0001234
+      }
+      // So:  3.1416 => l=31416, sf=5, exp=0     dr=4; dl0=0; width=6
+      //      30460  => l=3046, sf=4, exp=4      dr=0; dl0=1; width=5
+      //      0.0072 => l=72, sf=2, exp=-3       dr=4; dl0=1; width=6
+      if (width <= sf + (sf>1) + 2 + (abs(exp)>99?3:2)) {
+         //              ^^^^ to not include 1 char for dec in -7e-04 where sf==1
+         //                      ^ 2 for 'e+'/'e-'
+         // decimal format ...
+         ch += width-1;
+         if (dr) {
+           while (dr && sf) { *ch--='0'+l%10; l/=10; dr--; sf--; }
+           while (dr) { *ch--='0'; dr--; }
+           *ch-- = dec;
+         }
+         while (dl0) { *ch--='0'; dl0--; }
+         while (sf) { *ch--='0'+l%10; l/=10; sf--; }
+         // ch is now 1 before the first char of the field so position it afterward again, and done
+         ch += width+1;
+      } else {
+        // scientific ...
+        ch += sf;  // sf-1 + 1 for dec
+        for (int i=sf; i>1; i--) {
+          *ch-- = '0' + l%10;   
+          l /= 10;
+        }
+        if (sf == 1) ch--; else *ch-- = dec;
+        *ch = '0' + l;
+        ch += sf + (sf>1);
+        *ch++ = 'e';  // lower case e to match base::write.csv
+        if (exp < 0) { *ch++ = '-'; exp=-exp; }
+        else { *ch++ = '+'; }  // to match base::write.csv
+        if (exp < 100) {
+          *ch++ = '0' + (exp / 10);
+          *ch++ = '0' + (exp % 10);
+        } else {
+          *ch++ = '0' + (exp / 100);
+          *ch++ = '0' + (exp / 10) % 10;
+          *ch++ = '0' + (exp % 10);
+        }
+      }
+    }
+  }
+  *thisCh = ch;
+}
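The fraction/exponent split at the top of that branch follows directly from the IEEE-754 binary64 layout (1 sign bit, 11 exponent bits, 52 fraction bits); a minimal sketch of the same field extraction:

```c
#include <string.h>

/* Split a positive double into its biased exponent [0,2047] and
 * 52-bit fraction, using the same masks as writeNumeric() above. */
static void d_fields(double x, int *exponent, unsigned long long *fraction) {
  unsigned long long u;
  memcpy(&u, &x, sizeof u);
  *fraction = u & 0xFFFFFFFFFFFFFULL;   /* (1ULL<<52)-1 */
  *exponent = (int)((u >> 52) & 0x7FF);
}
```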
+
+static inline void writeString(SEXP x, char **thisCh)
+{
+  char *ch = *thisCh;
+  if (x == NA_STRING) {
+    // NA is not quoted by write.csv even when quote=TRUE to distinguish from "NA"
+    writeChars(na, &ch);
+  } else {
+    Rboolean q = quote;
+    if (q==NA_LOGICAL) { // quote="auto"
+      const char *tt = CHAR(x);
+      while (*tt!='\0' && *tt!=sep && *tt!=sep2 && *tt!='\n' && *tt!='"') *ch++ = *tt++;
+      // windows includes \n in its \r\n so looking for \n only is sufficient
+      // sep2 is set to '\0' when no list columns are present
+      if (*tt=='\0') {
+        // most common case: no sep, newline or " contained in string
+        *thisCh = ch;  // advance caller over the field already written
+        return;
+      }
+      ch = *thisCh; // rewind the field written since it needs to be quoted
+      q = TRUE;
+    }
+    if (q==FALSE) {
+      writeChars(CHAR(x), &ch);
+    } else {
+      *ch++ = '"';
+      const char *tt = CHAR(x);
+      if (qmethod_escape) {
+        while (*tt!='\0') {
+          if (*tt=='"' || *tt=='\\') *ch++ = '\\';
+          *ch++ = *tt++;
+        }
+      } else {
+        // qmethod='double'
+        while (*tt!='\0') {
+          if (*tt=='"') *ch++ = '"';
+          *ch++ = *tt++;
+        }
+      }
+      *ch++ = '"';
+    }
+  }
+  *thisCh = ch;
+}
+
+// DATE/TIME
+static inline void writeITime(int x, char **thisCh)
+{
+  char *ch = *thisCh;
+  if (x<0) writeChars(na, &ch);  // <0 covers NA_INTEGER too
+  else if (dateTimeAs == DATETIMEAS_EPOCH) writeInteger(x, &ch);
+  else {
+    int hh = x/3600;
+    int mm = (x - hh*3600) / 60;
+    int ss = x%60;
+    *ch++ = '0'+hh/10;
+    *ch++ = '0'+hh%10;
+    *ch++ = ':';
+    ch -= squash;
+    *ch++ = '0'+mm/10;
+    *ch++ = '0'+mm%10;
+    *ch++ = ':';
+    ch -= squash;
+    *ch++ = '0'+ss/10;
+    *ch++ = '0'+ss%10;
+  }
+  *thisCh = ch;
+}
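The seconds-to-clock arithmetic is simple enough to verify in isolation (illustrative helper; writeITime() above additionally handles NA and the squash variants):

```c
/* Decompose seconds-since-midnight into hh:mm:ss, matching the
 * arithmetic in writeITime() above. */
static void split_itime(int x, int *hh, int *mm, int *ss) {
  *hh = x / 3600;
  *mm = (x - *hh * 3600) / 60;
  *ss = x % 60;
}
```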
+
+static inline void writeDate(int x, char **thisCh)
+{
+  // From base ?Date :
+  //  "  Dates are represented as the number of days since 1970-01-01, with negative values
+  // for earlier dates. They are always printed following the rules of the current Gregorian calendar,
+  // even though that calendar was not in use long ago (it was adopted in 1752 in Great Britain and its
+  // colonies)  "
+
+  // The algorithm here was taken from civil_from_days() here :
+  //   http://howardhinnant.github.io/date_algorithms.html
+  // donated to the public domain thanks to Howard Hinnant, 2013.
+  // The rebase to 1 March 0000 is inspired; avoids needing isleap at all.
+  // The only small modifications here are :
+  //   1) no need for era
+  //   2) impose date range of [0000-03-01, 9999-12-31]. All 3,652,365 dates tested in test 1739
+  //   3) use direct lookup for mmdd rather than the math using 153, 2 and 5
+  //   4) use true/false value (md/100)<3 rather than ?: branch
+  // The end result is 5 lines of simple branch free integer math with no library calls.
+  // as.integer(as.Date(c("0000-03-01","9999-12-31"))) == c(-719468,+2932896)
+
+  char *ch = *thisCh;
+  if (x< -719468 || x>2932896) writeChars(na, &ch);  // NA_INTEGER<(-719468) too (==INT_MIN checked in init.c)
+  else if (dateTimeAs == DATETIMEAS_EPOCH) writeInteger(x, &ch);
+  else {
+    x += 719468;  // convert days from 1970-01-01 to days from 0000-03-01 (the day after 29 Feb 0000)
+    int y = (x - x/1461 + x/36525 - x/146097) / 365;  // year of the preceding March 1st
+    int z =  x - y*365 - y/4 + y/100 - y/400 + 1;     // days from March 1st in year y
+    int md = monthday[z];  // See fwriteLookups.h for how the 366 item lookup 'monthday' is arranged
+    y += z && (md/100)<3;  // The +1 above turned z=-1 to 0 (meaning Feb29 of year y not Jan or Feb of y+1)
+    
+    ch += 7 + 2*!squash;
+    *ch-- = '0'+md%10; md/=10;
+    *ch-- = '0'+md%10; md/=10;
+    *ch-- = '-';
+    ch += squash;
+    *ch-- = '0'+md%10; md/=10;
+    *ch-- = '0'+md%10; md/=10;
+    *ch-- = '-';
+    ch += squash;
+    *ch-- = '0'+y%10; y/=10;
+    *ch-- = '0'+y%10; y/=10;
+    *ch-- = '0'+y%10; y/=10;
+    *ch   = '0'+y%10; y/=10;
+    ch += 8 + 2*!squash;
+  }
+  *thisCh = ch;
+}
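The comment credits Howard Hinnant's civil_from_days(); for cross-checking, the unmodified reference algorithm (before the range restriction and the mmdd lookup table applied above) can be sketched as:

```c
/* Reference civil_from_days() (public domain, Howard Hinnant):
 * days since 1970-01-01 -> Gregorian y/m/d, rebased internally to
 * 0000-03-01 so no leap-year branch is needed. */
static void civil_from_days(long long z, int *y, int *m, int *d) {
  z += 719468;                                              /* rebase to 0000-03-01 */
  long long era = (z >= 0 ? z : z - 146096) / 146097;
  unsigned doe = (unsigned)(z - era * 146097);              /* day of era [0, 146096] */
  unsigned yoe = (doe - doe/1460 + doe/36524 - doe/146096) / 365;  /* year of era [0, 399] */
  long long yy = (long long)yoe + era * 400;
  unsigned doy = doe - (365*yoe + yoe/4 - yoe/100);         /* day of year from Mar 1 [0, 365] */
  unsigned mp  = (5*doy + 2) / 153;                         /* month from Mar [0, 11] */
  *d = (int)(doy - (153*mp + 2)/5 + 1);
  *m = (int)(mp < 10 ? mp + 3 : mp - 9);
  *y = (int)(yy + (*m <= 2));
}
```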
+
+
+static inline void writePOSIXct(double x, char **thisCh)
+{
+  char *ch = *thisCh;
+  
+  // Write ISO8601 UTC by default to encourage ISO standards, stymie ambiguity and for speed.
+  // R internally represents POSIX datetime in UTC always. Its 'tzone' attribute can be ignored.
+  // R's representation ignores leap seconds too which is POSIX compliant, convenient and fast.
+  // Aside: an often overlooked option for users is to start R in UTC: $ TZ='UTC' R
+  // All positive integers up to 2^53 (9e15) are exactly representable by double which is relied
+  // on in the ops here; number of seconds since epoch.
+  
+  if (!R_FINITE(x)) { writeChars(na, &ch); }
+  else if (dateTimeAs==DATETIMEAS_EPOCH) writeNumeric(x, &ch);
+  else {
+    int xi, d, t;
+    if (x>=0) {
+      xi = (int)x;
+      d = xi / 86400;
+      t = xi % 86400;
+    } else {
+      // before 1970-01-01T00:00:00Z
+      xi = (int)floor(x);
+      d = (xi+1)/86400 - 1;
+      t = xi - d*86400;  // xi and d are both negative here; t becomes the positive number of seconds into the day
+    }
+    int m = (int)((x-xi)*10000000); // 7th digit used to round up if 9
+    m += (m%10);  // a trailing 9 is numerical inaccuracy so round up; 8 or less means truncate to the last microsecond
+    m /= 10;
+    writeDate(d, &ch);
+    *ch++ = 'T';
+    ch -= squash;
+    writeITime(t, &ch);
+    if (squash || (m && m%1000==0)) {
+       // when squash always write 3 digits of milliseconds even if 000, for consistent scale of squash integer64
+       // don't use writeInteger() because it doesn't 0 pad which we need here
+       // integer64 is big enough for squash with milli but not micro; trunc (not round) micro when squash
+       m /= 1000;
+       *ch++ = '.';
+       ch -= squash;
+       *(ch+2) = '0'+m%10; m/=10;
+       *(ch+1) = '0'+m%10; m/=10;
+       *ch     = '0'+m;
+       ch += 3;
+    } else if (m) {
+       // microseconds are present and !squash
+       *ch++ = '.';
+       *(ch+5) = '0'+m%10; m/=10;
+       *(ch+4) = '0'+m%10; m/=10;
+       *(ch+3) = '0'+m%10; m/=10;
+       *(ch+2) = '0'+m%10; m/=10;
+       *(ch+1) = '0'+m%10; m/=10;
+       *ch     = '0'+m;
+       ch += 6;
+    }
+    *ch++ = 'Z';
+    ch -= squash;
+  }
+  *thisCh = ch;
+}
+
+
+static int failed = 0;
+static int rowsPerBatch;
+
+static inline void checkBuffer(
+  char **buffer,       // this thread's buffer
+  size_t *myAlloc,     // the size of this buffer
+  char **ch,           // the end of the last line written to the buffer by this thread
+  size_t myMaxLineLen  // the longest line seen so far by this thread
+  // Initial size for the thread's buffer is twice as big as needed for rowsPerBatch based on
+  // maxLineLen from the sample; i.e. only 50% of the buffer should be used.
+  // If we get to 75% used, we'll realloc.
+  // i.e. very cautious and grateful to the OS for not fetching untouched pages of buffer.
+  // Plus, more caution ... myMaxLineLen is tracked and if that grows we'll realloc too.
+  // Very long lines are caught up front and rowsPerBatch is set to 1 in that case.
+  // This checkBuffer() is called after every line.
+) {
+  if (failed) return;  // another thread already failed. Fall through and error().
+  size_t thresh = 0.75*(*myAlloc);
+  if ((*ch > (*buffer)+thresh) ||
+      (rowsPerBatch*myMaxLineLen > thresh )) {
+    size_t off = *ch-*buffer;
+    *myAlloc = 1.5*(*myAlloc);
+    *buffer = realloc(*buffer, *myAlloc);
+    if (*buffer==NULL) {
+      failed = -errno;    // - for malloc/realloc errno, + for write errno
+    } else {
+      *ch = *buffer+off;  // in case realloc moved the allocation
+    }
+  }
+}
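The offset save/restore around realloc() is the part that is easy to get wrong: realloc() may move the block, invalidating any cursor into it, so the cursor must be saved as an offset and rebased afterwards. A minimal sketch with hypothetical names (checkBuffer() above additionally folds the failure into a shared flag for the parallel loop):

```c
#include <stdlib.h>
#include <string.h>

/* Grow a buffer by 1.5x while keeping a write cursor valid across a
 * possible move by realloc(). Returns 0 on success, -1 on allocation
 * failure (in which case the old buffer is still valid). */
static int grow_buffer(char **buffer, size_t *alloc, char **cursor) {
  size_t off = (size_t)(*cursor - *buffer);  /* save before the pointer can dangle */
  size_t newAlloc = *alloc + *alloc / 2;
  char *p = realloc(*buffer, newAlloc);
  if (p == NULL) return -1;
  *buffer = p;
  *alloc = newAlloc;
  *cursor = p + off;                         /* rebase cursor into the possibly-moved block */
  return 0;
}
```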
+
+SEXP writefile(SEXP DFin,               // any list of same length vectors; e.g. data.frame, data.table
+               SEXP filename_Arg,
+               SEXP sep_Arg,
+               SEXP sep2_Arg,
+               SEXP eol_Arg,
+               SEXP na_Arg,
+               SEXP dec_Arg,
+               SEXP quote_Arg,          // 'auto'=NA_LOGICAL|TRUE|FALSE
+               SEXP qmethod_escapeArg,  // TRUE|FALSE
+               SEXP append,             // TRUE|FALSE
+               SEXP row_names,          // TRUE|FALSE
+               SEXP col_names,          // TRUE|FALSE
+               SEXP logicalAsInt_Arg,   // TRUE|FALSE
+               SEXP dateTimeAs_Arg,     // 0=ISO(yyyy-mm-dd),1=squash(yyyymmdd),2=epoch,3=write.csv
+               SEXP buffMB_Arg,         // [1-1024] default 8MB
+               SEXP nThread,
+               SEXP showProgress_Arg,
+               SEXP verbose_Arg,
+               SEXP turbo_Arg)
+{
+  if (!isNewList(DFin)) error("fwrite must be passed an object of type list; e.g. data.frame, data.table");
+  RLEN ncol = length(DFin);
+  if (ncol==0) {
+    warning("fwrite was passed an empty list of no columns. Nothing to write.");
+    return R_NilValue;
+  }
+  RLEN nrow = length(VECTOR_ELT(DFin, 0));
+
+  const Rboolean showProgress = LOGICAL(showProgress_Arg)[0];
+  time_t start_time = time(NULL);
+  time_t next_time = start_time+2; // start printing progress meter in 2 sec if not completed by then
+  
+  verbose = LOGICAL(verbose_Arg)[0];
+  const Rboolean turbo = LOGICAL(turbo_Arg)[0];
+  
+  sep = *CHAR(STRING_ELT(sep_Arg, 0));  // DO NOT DO: allow multichar separator (bad idea)
+  const char *sep2start = CHAR(STRING_ELT(sep2_Arg, 0));
+  sep2 = *CHAR(STRING_ELT(sep2_Arg, 1));
+  const char *sep2end = CHAR(STRING_ELT(sep2_Arg, 2));
+  
+  const char *eol = CHAR(STRING_ELT(eol_Arg, 0));
+  // someone might want a trailer on every line so allow any length string as eol
+  
+  na = CHAR(STRING_ELT(na_Arg, 0));
+  dec = *CHAR(STRING_ELT(dec_Arg,0));
+  quote = LOGICAL(quote_Arg)[0];
+  qmethod_escape = LOGICAL(qmethod_escapeArg)[0];
+  const char *filename = CHAR(STRING_ELT(filename_Arg, 0));
+  logicalAsInt = LOGICAL(logicalAsInt_Arg)[0];
+  dateTimeAs = INTEGER(dateTimeAs_Arg)[0];
+  squash = (dateTimeAs==1);
+  int nth = INTEGER(nThread)[0];
+  int firstListColumn = 0;
+  clock_t t0=clock();
+
+  SEXP DF = DFin;
+  int protecti = 0;
+  if (dateTimeAs == DATETIMEAS_WRITECSV) {
+    int j=0; while(j<ncol && !INHERITS(VECTOR_ELT(DFin,j), char_POSIXct)) j++;
+    if (j<ncol) {
+      // dateTimeAs=="write.csv" && there exist some POSIXct columns; coerce them
+      DF = PROTECT(allocVector(VECSXP, ncol));
+      protecti++;
+      // potentially large if ncol=1e6 as reported in #1903 where using large VLA caused stack overflow
+      SEXP s = PROTECT(allocList(2));
+      SET_TYPEOF(s, LANGSXP);
+      SETCAR(s, install("format.POSIXct"));
+      for (int j=0; j<ncol; j++) {
+        SEXP column = VECTOR_ELT(DFin, j);
+        if (INHERITS(column, char_POSIXct)) {
+          SETCAR(CDR(s), column);
+          SET_VECTOR_ELT(DF, j, eval(s, R_GlobalEnv));
+        } else {
+          SET_VECTOR_ELT(DF, j, column);
+        }
+      }
+      UNPROTECT(1);  // s, not DF
+    }
+  }
+  
+  int sameType = TYPEOF(VECTOR_ELT(DF, 0)); // use DF (post-coercion) to avoid the deep switch() later
+
+  // Store column type tests in lookup for efficiency
+  // ET_INT64, ET_ITIME, ET_DATE, ET_POSIXCT, ET_FACTOR
+  char *extraType = (char *)R_alloc(ncol, sizeof(char)); // not a VLA as ncol could be > 1e6 columns
+  
+  for (int j=0; j<ncol; j++) {
+    SEXP column = VECTOR_ELT(DF, j);
+    if (nrow != length(column))
+      error("Column %d's length (%d) is not the same as column 1's length (%d)", j+1, length(column), nrow);
+    extraType[j] = 0;
+    if (isFactor(column)) {
+      extraType[j] = ET_FACTOR;
+    } else if (INHERITS(column, char_integer64)) {
+      if (TYPEOF(column)!=REALSXP) error("Column %d inherits from 'integer64' but is type '%s' not REALSXP", j+1, type2char(TYPEOF(column)));
+      extraType[j] = ET_INT64;
+    } else if (INHERITS(column, char_ITime)) {
+      extraType[j] = ET_ITIME;
+    } else if (INHERITS(column, char_Date)) {  // including IDate which inherits from Date
+      extraType[j] = ET_DATE;
+    } else if (INHERITS(column, char_POSIXct)) {
+      if (dateTimeAs==DATETIMEAS_WRITECSV) error("Internal error: column should have already been coerced to character");
+      extraType[j] = ET_POSIXCT;
+    }
+    if (TYPEOF(column)!=sameType || getAttrib(column,R_ClassSymbol)!=R_NilValue) {
+      sameType = 0;  // only all-plain-INTSXP or all-plain-REALSXP columns avoid the deep switch() below
+    }
+    if (firstListColumn==0 && TYPEOF(column)==VECSXP) firstListColumn = j+1;
+  }
+  
+  if (!firstListColumn) {
+    if (verbose) Rprintf("No list columns are present. Setting sep2='' otherwise quote='auto' would quote fields containing sep2.\n");
+    sep2='\0';
+  } else {
+    if (verbose) Rprintf("If quote='auto' all fields will be quoted if the field contains either sep ('%c') or sep2[2] ('%c'). Column %d is a list column.\n", sep, sep2, firstListColumn );
+    if (dec==sep) error("Internal error: dec != sep was checked at R level");
+    if (dec==sep2 || sep==sep2)
+      error("sep ('%c'), sep2[2L] ('%c') and dec ('%c') must all be different when list columns are present. Column %d is a list column.", sep, sep2, dec, firstListColumn); 
+  }
+  
+  // user may want row names even when they don't exist (implied row numbers as row names)
+  Rboolean doRowNames = LOGICAL(row_names)[0];
+  SEXP rowNames = NULL;
+  if (doRowNames) {
+    rowNames = getAttrib(DFin, R_RowNamesSymbol);
+    if (!isString(rowNames)) rowNames=NULL;
+  }
+  
+  // Estimate max line length of a 1000 row sample (100 rows in 10 places).
+  // It's only an estimate, even for the sampled rows, because quote='auto' may add quotes and escape embedded quotes.
+  // Buffers are resized later if lines turn out longer than the sample suggested, anyway.
+  // maxLineLen is just used to determine rowsPerBatch.
+  int maxLineLen = 0;
+  int na_len = strlen(na);
+  int step = nrow<1000 ? 100 : nrow/10;
+  for (int start=0; start<nrow; start+=step) {
+    int end = (nrow-start)<100 ? nrow : start+100;
+    for (int i=start; i<end; i++) {
+      int thisLineLen=0;
+      if (doRowNames) {
+        if (rowNames) thisLineLen += LENGTH(STRING_ELT(rowNames,i));
+        else thisLineLen += 1+(int)log10(nrow);
+        if (quote==TRUE) thisLineLen+=2;
+        thisLineLen++; // sep
+      }
+      for (int j=0; j<ncol; j++) {
+        SEXP column = VECTOR_ELT(DF, j);
+        static char tmp[32]; // +- 15digits dec e +- nnn \0 = 23 + 9 safety = 32. Covers integer64 too (20 digits).
+        char *ch=tmp;
+        switch(TYPEOF(column)) {
+        case LGLSXP:
+          thisLineLen += logicalAsInt ? 1/*0|1*/ : 5/*FALSE*/;  // na_len might be 2 (>1) but ok; this is estimate
+          break;
+        case INTSXP: {
+          int i32 = INTEGER(column)[i];
+          if (i32 == NA_INTEGER) ch += na_len;
+          else if (extraType[j] == ET_FACTOR) ch += LENGTH(STRING_ELT(getAttrib(column,R_LevelsSymbol), i32-1));
+          else if (extraType[j] == ET_ITIME) ch += 8;
+          else if (extraType[j] == ET_DATE) writeDate(i32, &ch);
+          else writeInteger(i32, &ch); }
+          thisLineLen += (int)(ch-tmp);
+          break;          
+        case REALSXP:
+          if (extraType[j] == ET_INT64) {
+            long long i64 = *(long long *)&REAL(column)[i];
+            if (i64==NAINT64) ch += na_len;
+            else writeInteger(i64, &ch);
+          }
+          else if (extraType[j]==ET_DATE) writeDate((int)REAL(column)[i], &ch);
+          else if (extraType[j]==ET_POSIXCT) writePOSIXct(REAL(column)[i], &ch);
+          else writeNumeric(REAL(column)[i], &ch);
+          thisLineLen += (int)(ch-tmp);
+          break;
+        case STRSXP:
+          thisLineLen += LENGTH(STRING_ELT(column, i));
+          break;
+        case VECSXP: {
+          // a list column containing atomic vectors in each cell
+          SEXP v = VECTOR_ELT(column,i);
+          thisLineLen += strlen(sep2start);
+          switch(TYPEOF(v)) {
+          case LGLSXP :
+            thisLineLen += LENGTH(v) * (logicalAsInt ? 1 : 5);
+            break;
+          case INTSXP:
+            if (isFactor(v)) {
+              SEXP l = getAttrib(v, R_LevelsSymbol);
+              for (int k=0; k<LENGTH(v); k++) {
+                thisLineLen += INTEGER(v)[k]==NA_INTEGER ? na_len : LENGTH(STRING_ELT(l, INTEGER(v)[k]-1));
+              }
+            }
+            else if (INHERITS(v, char_ITime)) thisLineLen += LENGTH(v) * (6 + 2*!squash);
+            else if (INHERITS(v, char_Date))  thisLineLen += LENGTH(v) * (8 + 2*!squash);
+            else {
+              for (int k=0; k<LENGTH(v); k++) {
+                if ( INTEGER(v)[k]==NA_INTEGER ) thisLineLen += na_len;
+                else { writeInteger(INTEGER(v)[k], &ch); thisLineLen+=(int)(ch-tmp); ch=tmp; }
+              }
+            }
+            break;
+          case REALSXP:
+            if (INHERITS(v, char_integer64)) {
+              for (int k=0; k<LENGTH(v); k++) {
+                long long i64 = *(long long *)&REAL(v)[k];
+                if (i64==NAINT64) thisLineLen += na_len;
+                else {
+                  writeInteger(i64, &ch);
+                  thisLineLen += (int)(ch-tmp);
+                  ch=tmp;
+                }
+              }
+            }
+            else if (INHERITS(v, char_Date)) thisLineLen += LENGTH(v) * (8 + 2*!squash);
+            else if (INHERITS(v, char_POSIXct)) {
+              for (int k=0; k<LENGTH(v); k++) { writePOSIXct(REAL(v)[k], &ch); thisLineLen+=(int)(ch-tmp); ch=tmp; }
+            } else {
+              for (int k=0; k<LENGTH(v); k++) { writeNumeric(REAL(v)[k], &ch); thisLineLen+=(int)(ch-tmp); ch=tmp; }
+            }
+            break;
+          case STRSXP:
+            for (int k=0; k<LENGTH(v); k++) thisLineLen += LENGTH(STRING_ELT(v, k));
+            break;
+          default:
+            error("Column %d is a list column but on row %d is type '%s' - not yet implemented. fwrite() can write list columns containing atomic vectors of type logical, integer, integer64, double, character and factor, currently.", j+1, type2char(TYPEOF(v)));
+          }  // end switch on atomic vector type in a list column 
+          thisLineLen += LENGTH(v)/*sep2 after each field*/ +strlen(sep2end); }
+          // LENGTH(v) could be 0 so we don't subtract one for the last sep2 (just an estimate anyway)
+          break;  // from case VECSXP for list column
+        default:
+          error("Column %d's type is '%s' - not yet implemented.", j+1, type2char(TYPEOF(column)) );
+        }
+        thisLineLen++; // column sep
+      } // next column
+      if (thisLineLen > maxLineLen) maxLineLen = thisLineLen;
+    }
+  }
+  maxLineLen += strlen(eol);
+  if (verbose) Rprintf("maxLineLen=%d from sample. Found in %.3fs\n", maxLineLen, 1.0*(clock()-t0)/CLOCKS_PER_SEC);
+  
+  int f;
+  if (*filename=='\0') {
+    f=-1;  // file="" means write to standard output
+    eol = "\n";  // We'll use Rprintf(); it knows itself about \r\n on Windows
+  } else { 
+#ifdef WIN32
+    f = _open(filename, _O_WRONLY | _O_BINARY | _O_CREAT | (LOGICAL(append)[0] ? _O_APPEND : _O_TRUNC), _S_IWRITE);
+    // eol must be passed from R level as '\r\n' on Windows since write() only auto-converts \n to \r\n in
+    // _O_TEXT mode. We use _O_BINARY for full control, and perhaps speed, since _O_TEXT must branch on every '\n'.
+#else
+    f = open(filename, O_WRONLY | O_CREAT | (LOGICAL(append)[0] ? O_APPEND : O_TRUNC), 0644);
+#endif
+    if (f == -1) {
+      int erropen = errno;
+      if( access( filename, F_OK ) != -1 )
+        error("%s: '%s'. Failed to open existing file for writing. Do you have write permission to it? Is this Windows and does another process such as Excel have it open?", strerror(erropen), filename);
+      else
+        error("%s: '%s'. Unable to create new file for writing (it does not exist already). Do you have permission to write here, is there space on the disk and does the path exist?", strerror(erropen), filename); 
+    }
+  }
+  t0=clock();
+    
+  if (verbose) {
+    Rprintf("Writing column names ... ");
+    if (f==-1) Rprintf("\n");
+  }
+  if (LOGICAL(col_names)[0]) {
+    SEXP names = getAttrib(DFin, R_NamesSymbol);  
+    if (names!=R_NilValue) {
+      if (LENGTH(names) != ncol) error("Internal error: length of column names is not equal to the number of columns. Please report.");
+      // allow space to quote and escape every name even if quoting turns out not to be needed
+      int buffSize = 2/*""*/ +1/*,*/;
+      for (int j=0; j<ncol; j++) buffSize += 1/*"*/ +2*LENGTH(STRING_ELT(names, j)) +1/*"*/ +1/*,*/;
+      //     in case every name full of quotes(!) to be escaped ^^
+      buffSize += strlen(eol) +1/*\0*/;
+      char *buffer = malloc(buffSize);
+      if (buffer == NULL) error("Unable to allocate %d buffer for column names", buffSize);
+      char *ch = buffer;
+      if (doRowNames) {
+        if (quote!=FALSE) { *ch++='"'; *ch++='"'; } // to match write.csv
+        *ch++ = sep;
+      }
+      for (int j=0; j<ncol; j++) {
+        writeString(STRING_ELT(names, j), &ch);
+        *ch++ = sep;
+      }
+      ch--;  // backup onto the last sep after the last column
+      writeChars(eol, &ch);  // replace it with the newline 
+      if (f==-1) { *ch='\0'; Rprintf("%s", buffer); }  // "%s" so '%' in the data isn't treated as a format spec
+      else if (WRITE(f, buffer, (int)(ch-buffer))==-1) {
+        int errwrite=errno;
+        close(f); // the close might fail too but we want to report the write error
+        free(buffer);
+        error("%s: '%s'", strerror(errwrite), filename);
+      }
+      free(buffer);
+    }
+  }
+  if (verbose) Rprintf("done in %.3fs\n", 1.0*(clock()-t0)/CLOCKS_PER_SEC);
+  if (nrow == 0) {
+    if (verbose) Rprintf("No data rows present (nrow==0)\n");
+    if (f!=-1 && CLOSE(f)) error("%s: '%s'", strerror(errno), filename);
+    UNPROTECT(protecti);
+    return(R_NilValue);
+  }
+
+  // Decide buffer size and rowsPerBatch for each thread
+  // Once rowsPerBatch is decided it can't be changed, but we can increase buffer size if the lines
+  // turn out to be longer than estimated from the sample.
+  // buffSize large enough to fit many lines to i) reduce calls to write() and ii) reduce thread sync points
+  // It doesn't need to be small in cache because it's written contiguously.
+  // If we don't use all the buffer for any reasons that's ok as OS will only page in the pages touched.
+  // So, generally the larger the better up to max filesize/nth to use all the threads. A few times
+  //   smaller than that though, to achieve some load balancing across threads since schedule(dynamic).
+  int buffMB = INTEGER(buffMB_Arg)[0]; // checked at R level between 1 and 1024
+  if (buffMB<1 || buffMB>1024) error("buffMB=%d outside [1,1024]", buffMB); // check it again even so
+  size_t buffSize = 1024*1024*buffMB;
+  if (maxLineLen > buffSize) buffSize=2*maxLineLen;  // A very long line; at least 1,048,576 characters
+  rowsPerBatch =
+    (10*maxLineLen > buffSize) ? 1 :  // for very long lines (100,000+ characters) just do one row at a time
+    0.5 * buffSize/maxLineLen;        // Aim for 50% buffer usage. See checkBuffer for comments.
+  if (rowsPerBatch > nrow) rowsPerBatch=nrow;
+  int numBatches = (nrow-1)/rowsPerBatch + 1;
+  if (numBatches < nth) nth = numBatches;
+  if (verbose) {
+    Rprintf("Writing %d rows in %d batches of %d rows (each buffer size %dMB, turbo=%d, showProgress=%d, nth=%d) ... ",
+    nrow, numBatches, rowsPerBatch, buffMB, turbo, showProgress, nth);
+    if (f==-1) Rprintf("\n");
+  }
+  t0 = clock();
+  
+  failed=0;  // static global so checkBuffer can set it. -errno for malloc or realloc fails, +errno for write fail
+  Rboolean hasPrinted=FALSE;
+  Rboolean anyBufferGrown=FALSE;
+  int maxBuffUsedPC=0;
+  
+  #pragma omp parallel num_threads(nth)
+  {
+    char *ch, *buffer;               // local to each thread
+    ch = buffer = malloc(buffSize);  // each thread has its own buffer
+    // Don't use any R API alloc here (e.g. R_alloc); they are
+    // not thread-safe as per last sentence of R-exts 6.1.1.
+    
+    if (buffer==NULL) {failed=-errno;}
+    // Do not rely on availability of '#omp cancel' new in OpenMP v4.0 (July 2013).
+    // OpenMP v4.0 is in gcc 4.9+ (https://gcc.gnu.org/wiki/openmp) but
+    // not yet in clang as of v3.8 (http://openmp.llvm.org/)
+    // If not-me failed, I'll see shared 'failed', fall through loop, free my buffer
+    // and after parallel section, single thread will call R API error() safely.
+    
+    size_t myAlloc = buffSize;
+    size_t myMaxLineLen = maxLineLen;
+    // so we can realloc(). Should only be needed if a single CHARSXP is much longer than any that
+    // occurred in the sample for maxLineLen, or if list() columns contain vectors much longer
+    // than any that occurred in the sample.
+    
+    #pragma omp single
+    {
+      nth = omp_get_num_threads();  // update nth with the actual nth (might be different than requested)
+    }
+    int me = omp_get_thread_num();
+    
+    #pragma omp for ordered schedule(dynamic)
+    for(RLEN start=0; start<nrow; start+=rowsPerBatch) {
+      if (failed) continue;  // Not break. See comments above about #omp cancel
+      int end = ((nrow-start)<rowsPerBatch) ? nrow : start+rowsPerBatch;
+      
+      // all-integer and all-double deep switch() avoidance. We could code up all-integer64
+      // as well but that seems even less likely in practice than all-integer or all-double
+      if (turbo && sameType==REALSXP && !doRowNames) {
+        // avoid deep switch() on type. turbo switches on both sameType and specialized writeNumeric
+        for (RLEN i=start; i<end; i++) {
+          char *lineStart = ch;
+          for (int j=0; j<ncol; j++) {
+            SEXP column = VECTOR_ELT(DF, j);
+            writeNumeric(REAL(column)[i], &ch);
+            *ch++ = sep;
+          }
+          ch--;  // backup onto the last sep after the last column
+          writeChars(eol, &ch);  // replace it with the newline.
+          
+          size_t thisLineLen = ch-lineStart;
+          if (thisLineLen > myMaxLineLen) myMaxLineLen=thisLineLen;
+          checkBuffer(&buffer, &myAlloc, &ch, myMaxLineLen);
+          if (failed) break;
+        }
+      } else if (turbo && sameType==INTSXP && !doRowNames) {
+        for (RLEN i=start; i<end; i++) {
+          char *lineStart = ch;
+          for (int j=0; j<ncol; j++) {
+            SEXP column = VECTOR_ELT(DF, j);
+            if (INTEGER(column)[i] == NA_INTEGER) {
+              writeChars(na, &ch);
+            } else {
+              writeInteger(INTEGER(column)[i], &ch);
+            }
+            *ch++ = sep;
+          }
+          ch--;
+          writeChars(eol, &ch);
+          
+          size_t thisLineLen = ch-lineStart;
+          if (thisLineLen > myMaxLineLen) myMaxLineLen=thisLineLen;
+          checkBuffer(&buffer, &myAlloc, &ch, myMaxLineLen);
+          if (failed) break;
+        }
+      } else {
+        // mixed types. switch() on every cell value since must write row-by-row
+        for (RLEN i=start; i<end; i++) {
+          char *lineStart = ch;
+          if (doRowNames) {
+            if (rowNames==NULL) {
+              if (quote!=FALSE) *ch++='"';  // default 'auto' will quote the row.name numbers
+              writeInteger(i+1, &ch);
+              if (quote!=FALSE) *ch++='"';
+            } else {
+              writeString(STRING_ELT(rowNames, i), &ch);
+            }
+            *ch++=sep;
+          }
+          for (int j=0; j<ncol; j++) {
+            SEXP column = VECTOR_ELT(DF, j);
+            switch(TYPEOF(column)) {
+            case LGLSXP:
+              writeLogical(LOGICAL(column)[i], &ch);
+              break;
+            case INTSXP:
+              if (INTEGER(column)[i] == NA_INTEGER) {
+                writeChars(na, &ch);
+              } else if (extraType[j] == ET_FACTOR) {
+                writeString(STRING_ELT(getAttrib(column,R_LevelsSymbol), INTEGER(column)[i]-1), &ch);
+              } else if (extraType[j] == ET_ITIME) {
+                writeITime(INTEGER(column)[i], &ch);
+              } else if (extraType[j] == ET_DATE) {
+                writeDate(INTEGER(column)[i], &ch);
+              } else {
+                writeInteger(INTEGER(column)[i], &ch);
+              }
+              break;
+            case REALSXP:
+              if (extraType[j] == ET_INT64) {
+                long long i64 = *(long long *)&REAL(column)[i];
+                if (i64 == NAINT64) {
+                  writeChars(na, &ch);
+                } else {
+                  writeInteger(i64, &ch);
+                }
+              } else {
+                if (extraType[j] == ET_DATE) {
+                  writeDate( R_FINITE(REAL(column)[i]) ? (int)REAL(column)[i] : NA_INTEGER, &ch);
+                } else if (extraType[j] == ET_POSIXCT) {
+                  writePOSIXct(REAL(column)[i], &ch);
+                } else if (turbo) {
+                  writeNumeric(REAL(column)[i], &ch); // handles NA, Inf etc within it
+                } else {
+                  // if there are any problems with the specialized writeNumeric, user can revert to (slower) standard library
+                  if (ISNAN(REAL(column)[i])) {
+                    writeChars(na, &ch);
+                  } else {
+                    ch += sprintf(ch, "%.15g", REAL(column)[i]);
+                  }
+                }
+              }
+              break;
+            case STRSXP:
+              writeString(STRING_ELT(column, i), &ch);
+              break;
+              
+            case VECSXP: {
+              // a list column containing atomic vectors in each cell
+              SEXP v = VECTOR_ELT(column,i);
+              writeChars(sep2start, &ch);
+              switch(TYPEOF(v)) {
+              case LGLSXP :
+                for (int k=0; k<LENGTH(v); k++) {
+                  writeLogical(LOGICAL(v)[k], &ch);
+                  *ch++ = sep2;
+                }
+                break;
+              case INTSXP:
+                if (isFactor(v)) {
+                  SEXP l = getAttrib(v, R_LevelsSymbol);
+                  for (int k=0; k<LENGTH(v); k++) {
+                    if (INTEGER(v)[k]==NA_INTEGER) writeChars(na, &ch);
+                    else writeString(STRING_ELT(l, INTEGER(v)[k]-1), &ch);
+                    *ch++ = sep2;
+                  }
+                } else if (INHERITS(v, char_ITime)) {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    writeITime(INTEGER(v)[k], &ch);
+                    *ch++ = sep2;
+                  }
+                } else if (INHERITS(v, char_Date)) {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    writeDate(INTEGER(v)[k], &ch);
+                    *ch++ = sep2;
+                  }
+                } else {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    if (INTEGER(v)[k]==NA_INTEGER ) writeChars(na, &ch);
+                    else writeInteger(INTEGER(v)[k], &ch);
+                    *ch++ = sep2;
+                  }
+                }
+                break;
+              case REALSXP:
+                if (INHERITS(v, char_integer64)) {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    long long i64 = *(long long *)&REAL(v)[k];
+                    if (i64==NAINT64) writeChars(na, &ch);
+                    else writeInteger(i64, &ch);
+                    *ch++ = sep2;
+                  }
+                } else if (INHERITS(v, char_Date)) {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    writeDate(R_FINITE(REAL(v)[k]) ? (int)REAL(v)[k] : NA_INTEGER, &ch);
+                    *ch++ = sep2;
+                  }
+                } else if (INHERITS(v, char_POSIXct)) {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    writePOSIXct(REAL(v)[k], &ch);
+                    *ch++ = sep2;
+                  }
+                } else {
+                  for (int k=0; k<LENGTH(v); k++) {
+                    writeNumeric(REAL(v)[k], &ch);
+                    *ch++ = sep2;
+                  }
+                }
+                break;
+              case STRSXP:
+                for (int k=0; k<LENGTH(v); k++) {
+                  writeString(STRING_ELT(v, k), &ch);
+                  *ch++ = sep2;
+                }
+                break;
+              default:
+                error("Column %d is a list column but on row %d is type '%s' - not yet implemented. fwrite() can write list columns containing atomic vectors of type logical, integer, integer64, double, character and factor, currently.", j+1, type2char(TYPEOF(v)));
+              }  // end switch on atomic vector type in a list column
+              if (LENGTH(v)) ch--; // backup over the last sep2 after the last item
+              writeChars(sep2end, &ch); }
+              break;  // from case VECSXP for list column
+              
+            default:
+              error("Internal error: unsupported column type should have been thrown above when calculating maxLineLen");
+            }
+            *ch++ = sep; // next column
+          }
+          ch--;  // backup onto the last sep after the last column. 0-columns was caught and returned earlier, so >=1 cols.
+          writeChars(eol, &ch);  // replace it with the newline.
+          
+          // Track longest line seen so far. If we start to see longer lines than we saw in the
+          // sample, we'll realloc the buffer. The rowsPerBatch chosen based on the (very good) sample,
+          // must fit in the buffer. Can't early write and reset buffer because the
+          // file output would be out-of-order. Can't change rowsPerBatch after the 'parallel for' started.
+          size_t thisLineLen = ch-lineStart;
+          if (thisLineLen > myMaxLineLen) myMaxLineLen=thisLineLen;
+          checkBuffer(&buffer, &myAlloc, &ch, myMaxLineLen);
+          if (failed) break; // don't write any more rows, fall through to clear up and error() below
+        }
+      }
+      #pragma omp ordered
+      {
+        if (!failed) { // a thread ahead of me could have failed below while I was working or waiting above
+          if (f==-1) {
+            *ch='\0';  // standard C string end marker so Rprintf knows where to stop
+            Rprintf("%s", buffer);  // "%s" so '%' in the data isn't treated as a format spec
+            // nth==1 at this point since when file=="" (f==-1 here) fwrite.R calls setDTthreads(1)
+            // Although this ordered section is one-at-a-time it seems that calling Rprintf() here, even with a
+            // R_FlushConsole() too, causes corruptions on Windows but not on Linux. At least, as observed so
+            // far using capture.output(). Perhaps Rprintf() updates some state or allocation that cannot be done
+            // by slave threads, even when one-at-a-time. Anyway, made this single-threaded when output to console
+            // to be safe (setDTthreads(1) in fwrite.R) since output to console doesn't need to be fast.
+          } else {
+            if (WRITE(f, buffer, (int)(ch-buffer)) == -1) {
+              failed=errno;
+            }
+            if (myAlloc > buffSize) anyBufferGrown = TRUE;
+            int used = 100*((double)(ch-buffer))/buffSize;  // percentage of original buffMB
+            if (used > maxBuffUsedPC) maxBuffUsedPC = used;
+            time_t now;
+            if (me==0 && showProgress && (now=time(NULL))>=next_time && !failed) {
+              // See comments above inside the f==-1 clause.
+              // Not only is this ordered section one-at-a-time but we'll also Rprintf() here only from the
+              // master thread (me==0) and hopefully this will work on Windows. If not, user should set
+              // showProgress=FALSE until this can be fixed or removed.
+              int ETA = (int)((nrow-end)*(((double)(now-start_time))/end));
+              if (hasPrinted || ETA >= 2) {
+                if (verbose && !hasPrinted) Rprintf("\n"); 
+                Rprintf("\rWritten %.1f%% of %d rows in %d secs using %d thread%s. "
+                        "anyBufferGrown=%s; maxBuffUsed=%d%%. Finished in %d secs.      ",
+                         (100.0*end)/nrow, nrow, (int)(now-start_time), nth, nth==1?"":"s", 
+                         anyBufferGrown?"yes":"no", maxBuffUsedPC, ETA);
+                R_FlushConsole();    // for Windows
+                next_time = now+1;
+                hasPrinted = TRUE;
+              }
+            }
+            // May be possible for master thread (me==0) to call R_CheckUserInterrupt() here.
+            // Something like: 
+            // if (me==0) {
+            //   failed = TRUE;  // inside ordered here; the slaves are before ordered and not looking at 'failed'
+            //   R_CheckUserInterrupt();
+            //   failed = FALSE; // no user interrupt so return state
+            // }
+            // But I fear the slaves will hang waiting for the master (me==0) to complete the ordered
+            // section which may not happen if the master thread has been interrupted. Rather than
+            // seeing failed=TRUE and falling through to free() and close() as intended.
+            // Could register a finalizer to free() and close() perhaps :
+            // http://r.789695.n4.nabble.com/checking-user-interrupts-in-C-code-tp2717528p2717722.html
+            // Conclusion for now: do not provide ability to interrupt.
+            // write() errors and malloc() fails will be caught and cleaned up properly, however.
+          }
+          ch = buffer;  // back to the start of my buffer ready to fill it up again
+        }
+      }
+    }
+    free(buffer);
+    // all threads will call this free on their buffer, even if one or more threads had malloc
+    // or realloc fail. If the initial malloc failed, free(NULL) is ok and does nothing.
+  }
+  // Finished parallel region and can call R API safely now.
+  if (hasPrinted) {
+    if (!failed) {
+      // clear the progress meter
+      Rprintf("\r                                                                       "
+              "                                                              \r");
+      R_FlushConsole();  // for Windows
+    } else {
+      // don't clear on failure; leave the progress line visible so anyBufferGrown and maxBuffUsedPC can be seen
+      Rprintf("\n");
+    }
+  }
+  if (f!=-1 && CLOSE(f) && !failed)
+    error("%s: '%s'", strerror(errno), filename);
+  // quoted '%s' in case of trailing spaces in the filename
+  // If a write failed, the line above tries close() to clean up, but that might fail as well. So the
+  // '&& !failed' is to not report the error as just 'closing file' but the next line for more detail
+  // from the original error.
+  if (failed<0) {
+    error("%s. One or more threads failed to malloc or realloc their private buffer. nThread=%d and initial buffMB per thread was %d.\n", strerror(-failed), nth, buffMB);
+  } else if (failed>0) {
+    error("%s: '%s'", strerror(failed), filename);
+  }
+  if (verbose) Rprintf("done (actual nth=%d, anyBufferGrown=%s, maxBuffUsed=%d%%)\n",
+                       nth, anyBufferGrown?"yes":"no", maxBuffUsedPC);
+  UNPROTECT(protecti);
+  return(R_NilValue);
+}
+
+
diff --git a/src/fwriteLookups.h b/src/fwriteLookups.h
new file mode 100644
index 0000000..ed6a9bb
--- /dev/null
+++ b/src/fwriteLookups.h
@@ -0,0 +1,2142 @@
+//
+// Generated by fwrite.c:genLookups()
+//
+// 3 vectors: sigparts, expsig and exppow
+// The literals include more precision than a double holds; the compiler on each
+// machine is left to parse them down to the precision it supports.
+// 2^(-1023:1024) is held more accurately than double provides by storing its
+// exponent separately (expsig and exppow)
+// We don't want to depend on 'long double' (>64bit) availability to generate
+// these at runtime; libraries and hardware vary.
+// These small lookup tables are used for speed.
+//
+
+double sigparts[53] = {
+0.0,
+5.0000000000000000000000000000000000000000e-01,
+2.5000000000000000000000000000000000000000e-01,
+1.2500000000000000000000000000000000000000e-01,
+6.2500000000000000000000000000000000000000e-02,
+3.1250000000000000000000000000000000000000e-02,
+1.5625000000000000000000000000000000000000e-02,
+7.8125000000000000000000000000000000000000e-03,
+3.9062500000000000000000000000000000000000e-03,
+1.9531250000000000000000000000000000000000e-03,
+9.7656250000000000000000000000000000000000e-04,
+4.8828125000000000000000000000000000000000e-04,
+2.4414062500000000000000000000000000000000e-04,
+1.2207031250000000000000000000000000000000e-04,
+6.1035156250000000000000000000000000000000e-05,
+3.0517578125000000000000000000000000000000e-05,
+1.5258789062500000000000000000000000000000e-05,
+7.6293945312500000000000000000000000000000e-06,
+3.8146972656250000000000000000000000000000e-06,
+1.9073486328125000000000000000000000000000e-06,
+9.5367431640625000000000000000000000000000e-07,
+4.7683715820312500000000000000000000000000e-07,
+2.3841857910156250000000000000000000000000e-07,
+1.1920928955078125000000000000000000000000e-07,
+5.9604644775390625000000000000000000000000e-08,
+2.9802322387695312500000000000000000000000e-08,
+1.4901161193847656250000000000000000000000e-08,
+7.4505805969238281250000000000000000000000e-09,
+3.7252902984619140625000000000000000000000e-09,
+1.8626451492309570312500000000000000000000e-09,
+9.3132257461547851562500000000000000000000e-10,
+4.6566128730773925781250000000000000000000e-10,
+2.3283064365386962890625000000000000000000e-10,
+1.1641532182693481445312500000000000000000e-10,
+5.8207660913467407226562500000000000000000e-11,
+2.9103830456733703613281250000000000000000e-11,
+1.4551915228366851806640625000000000000000e-11,
+7.2759576141834259033203125000000000000000e-12,
+3.6379788070917129516601562500000000000000e-12,
+1.8189894035458564758300781250000000000000e-12,
+9.0949470177292823791503906250000000000000e-13,
+4.5474735088646411895751953125000000000000e-13,
+2.2737367544323205947875976562500000000000e-13,
+1.1368683772161602973937988281250000000000e-13,
+5.6843418860808014869689941406250000000000e-14,
+2.8421709430404007434844970703125000000000e-14,
+1.4210854715202003717422485351562500000000e-14,
+7.1054273576010018587112426757812500000000e-15,
+3.5527136788005009293556213378906250000000e-15,
+1.7763568394002504646778106689453125000000e-15,
+8.8817841970012523233890533447265625000000e-16,
+4.4408920985006261616945266723632812500000e-16,
+2.2204460492503130808472633361816406250000e-16
+};
+
+double expsig[2048] = {
+1.1125369292536006915451163586662020321096,
+2.2250738585072013830902327173324040642192,
+4.4501477170144027661804654346648081284384,
+8.9002954340288055323609308693296162568769,
+1.7800590868057611064721861738659232513754,
+3.5601181736115222129443723477318465027507,
+7.1202363472230444258887446954636930055015,
+1.4240472694446088851777489390927386011003,
+2.8480945388892177703554978781854772022006,
+5.6961890777784355407109957563709544044012,
+1.1392378155556871081421991512741908808802,
+2.2784756311113742162843983025483817617605,
+4.5569512622227484325687966050967635235210,
+9.1139025244454968651375932101935270470419,
+1.8227805048890993730275186420387054094084,
+3.6455610097781987460550372840774108188168,
+7.2911220195563974921100745681548216376335,
+1.4582244039112794984220149136309643275267,
+2.9164488078225589968440298272619286550534,
+5.8328976156451179936880596545238573101068,
+1.1665795231290235987376119309047714620214,
+2.3331590462580471974752238618095429240427,
+4.6663180925160943949504477236190858480855,
+9.3326361850321887899008954472381716961709,
+1.8665272370064377579801790894476343392342,
+3.7330544740128755159603581788952686784684,
+7.4661089480257510319207163577905373569367,
+1.4932217896051502063841432715581074713873,
+2.9864435792103004127682865431162149427747,
+5.9728871584206008255365730862324298855494,
+1.1945774316841201651073146172464859771099,
+2.3891548633682403302146292344929719542198,
+4.7783097267364806604292584689859439084395,
+9.5566194534729613208585169379718878168790,
+1.9113238906945922641717033875943775633758,
+3.8226477813891845283434067751887551267516,
+7.6452955627783690566868135503775102535032,
+1.5290591125556738113373627100755020507006,
+3.0581182251113476226747254201510041014013,
+6.1162364502226952453494508403020082028026,
+1.2232472900445390490698901680604016405605,
+2.4464945800890780981397803361208032811210,
+4.8929891601781561962795606722416065622421,
+9.7859783203563123925591213444832131244841,
+1.9571956640712624785118242688966426248968,
+3.9143913281425249570236485377932852497936,
+7.8287826562850499140472970755865704995873,
+1.5657565312570099828094594151173140999175,
+3.1315130625140199656189188302346281998349,
+6.2630261250280399312378376604692563996698,
+1.2526052250056079862475675320938512799340,
+2.5052104500112159724951350641877025598679,
+5.0104209000224319449902701283754051197359,
+1.0020841800044863889980540256750810239472,
+2.0041683600089727779961080513501620478943,
+4.0083367200179455559922161027003240957887,
+8.0166734400358911119844322054006481915774,
+1.6033346880071782223968864410801296383155,
+3.2066693760143564447937728821602592766310,
+6.4133387520287128895875457643205185532619,
+1.2826677504057425779175091528641037106524,
+2.5653355008114851558350183057282074213048,
+5.1306710016229703116700366114564148426095,
+1.0261342003245940623340073222912829685219,
+2.0522684006491881246680146445825659370438,
+4.1045368012983762493360292891651318740876,
+8.2090736025967524986720585783302637481752,
+1.6418147205193504997344117156660527496350,
+3.2836294410387009994688234313321054992701,
+6.5672588820774019989376468626642109985402,
+1.3134517764154803997875293725328421997080,
+2.6269035528309607995750587450656843994161,
+5.2538071056619215991501174901313687988322,
+1.0507614211323843198300234980262737597664,
+2.1015228422647686396600469960525475195329,
+4.2030456845295372793200939921050950390657,
+8.4060913690590745586401879842101900781314,
+1.6812182738118149117280375968420380156263,
+3.3624365476236298234560751936840760312526,
+6.7248730952472596469121503873681520625052,
+1.3449746190494519293824300774736304125010,
+2.6899492380989038587648601549472608250021,
+5.3798984761978077175297203098945216500041,
+1.0759796952395615435059440619789043300008,
+2.1519593904791230870118881239578086600017,
+4.3039187809582461740237762479156173200033,
+8.6078375619164923480475524958312346400066,
+1.7215675123832984696095104991662469280013,
+3.4431350247665969392190209983324938560026,
+6.8862700495331938784380419966649877120053,
+1.3772540099066387756876083993329975424011,
+2.7545080198132775513752167986659950848021,
+5.5090160396265551027504335973319901696042,
+1.1018032079253110205500867194663980339208,
+2.2036064158506220411001734389327960678417,
+4.4072128317012440822003468778655921356834,
+8.8144256634024881644006937557311842713668,
+1.7628851326804976328801387511462368542734,
+3.5257702653609952657602775022924737085467,
+7.0515405307219905315205550045849474170934,
+1.4103081061443981063041110009169894834187,
+2.8206162122887962126082220018339789668374,
+5.6412324245775924252164440036679579336747,
+1.1282464849155184850432888007335915867349,
+2.2564929698310369700865776014671831734699,
+4.5129859396620739401731552029343663469398,
+9.0259718793241478803463104058687326938796,
+1.8051943758648295760692620811737465387759,
+3.6103887517296591521385241623474930775518,
+7.2207775034593183042770483246949861551037,
+1.4441555006918636608554096649389972310207,
+2.8883110013837273217108193298779944620415,
+5.7766220027674546434216386597559889240829,
+1.1553244005534909286843277319511977848166,
+2.3106488011069818573686554639023955696332,
+4.6212976022139637147373109278047911392663,
+9.2425952044279274294746218556095822785327,
+1.8485190408855854858949243711219164557065,
+3.6970380817711709717898487422438329114131,
+7.3940761635423419435796974844876658228261,
+1.4788152327084683887159394968975331645652,
+2.9576304654169367774318789937950663291305,
+5.9152609308338735548637579875901326582609,
+1.1830521861667747109727515975180265316522,
+2.3661043723335494219455031950360530633044,
+4.7322087446670988438910063900721061266087,
+9.4644174893341976877820127801442122532175,
+1.8928834978668395375564025560288424506435,
+3.7857669957336790751128051120576849012870,
+7.5715339914673581502256102241153698025740,
+1.5143067982934716300451220448230739605148,
+3.0286135965869432600902440896461479210296,
+6.0572271931738865201804881792922958420592,
+1.2114454386347773040360976358584591684118,
+2.4228908772695546080721952717169183368237,
+4.8457817545391092161443905434338366736473,
+9.6915635090782184322887810868676733472947,
+1.9383127018156436864577562173735346694589,
+3.8766254036312873729155124347470693389179,
+7.7532508072625747458310248694941386778357,
+1.5506501614525149491662049738988277355671,
+3.1013003229050298983324099477976554711343,
+6.2026006458100597966648198955953109422686,
+1.2405201291620119593329639791190621884537,
+2.4810402583240239186659279582381243769074,
+4.9620805166480478373318559164762487538149,
+9.9241610332960956746637118329524975076297,
+1.9848322066592191349327423665904995015259,
+3.9696644133184382698654847331809990030519,
+7.9393288266368765397309694663619980061038,
+1.5878657653273753079461938932723996012208,
+3.1757315306547506158923877865447992024415,
+6.3514630613095012317847755730895984048830,
+1.2702926122619002463569551146179196809766,
+2.5405852245238004927139102292358393619532,
+5.0811704490476009854278204584716787239064,
+1.0162340898095201970855640916943357447813,
+2.0324681796190403941711281833886714895626,
+4.0649363592380807883422563667773429791251,
+8.1298727184761615766845127335546859582503,
+1.6259745436952323153369025467109371916501,
+3.2519490873904646306738050934218743833001,
+6.5038981747809292613476101868437487666002,
+1.3007796349561858522695220373687497533200,
+2.6015592699123717045390440747374995066401,
+5.2031185398247434090780881494749990132802,
+1.0406237079649486818156176298949998026560,
+2.0812474159298973636312352597899996053121,
+4.1624948318597947272624705195799992106241,
+8.3249896637195894545249410391599984212483,
+1.6649979327439178909049882078319996842497,
+3.3299958654878357818099764156639993684993,
+6.6599917309756715636199528313279987369986,
+1.3319983461951343127239905662655997473997,
+2.6639966923902686254479811325311994947995,
+5.3279933847805372508959622650623989895989,
+1.0655986769561074501791924530124797979198,
+2.1311973539122149003583849060249595958396,
+4.2623947078244298007167698120499191916791,
+8.5247894156488596014335396240998383833583,
+1.7049578831297719202867079248199676766717,
+3.4099157662595438405734158496399353533433,
+6.8198315325190876811468316992798707066866,
+1.3639663065038175362293663398559741413373,
+2.7279326130076350724587326797119482826746,
+5.4558652260152701449174653594238965653493,
+1.0911730452030540289834930718847793130699,
+2.1823460904061080579669861437695586261397,
+4.3646921808122161159339722875391172522794,
+8.7293843616244322318679445750782345045589,
+1.7458768723248864463735889150156469009118,
+3.4917537446497728927471778300312938018235,
+6.9835074892995457854943556600625876036471,
+1.3967014978599091570988711320125175207294,
+2.7934029957198183141977422640250350414588,
+5.5868059914396366283954845280500700829177,
+1.1173611982879273256790969056100140165835,
+2.2347223965758546513581938112200280331671,
+4.4694447931517093027163876224400560663341,
+8.9388895863034186054327752448801121326683,
+1.7877779172606837210865550489760224265337,
+3.5755558345213674421731100979520448530673,
+7.1511116690427348843462201959040897061346,
+1.4302223338085469768692440391808179412269,
+2.8604446676170939537384880783616358824538,
+5.7208893352341879074769761567232717649077,
+1.1441778670468375814953952313446543529815,
+2.2883557340936751629907904626893087059631,
+4.5767114681873503259815809253786174119262,
+9.1534229363747006519631618507572348238523,
+1.8306845872749401303926323701514469647705,
+3.6613691745498802607852647403028939295409,
+7.3227383490997605215705294806057878590818,
+1.4645476698199521043141058961211575718164,
+2.9290953396399042086282117922423151436327,
+5.8581906792798084172564235844846302872655,
+1.1716381358559616834512847168969260574531,
+2.3432762717119233669025694337938521149062,
+4.6865525434238467338051388675877042298124,
+9.3731050868476934676102777351754084596248,
+1.8746210173695386935220555470350816919250,
+3.7492420347390773870441110940701633838499,
+7.4984840694781547740882221881403267676998,
+1.4996968138956309548176444376280653535400,
+2.9993936277912619096352888752561307070799,
+5.9987872555825238192705777505122614141598,
+1.1997574511165047638541155501024522828320,
+2.3995149022330095277082311002049045656639,
+4.7990298044660190554164622004098091313279,
+9.5980596089320381108329244008196182626558,
+1.9196119217864076221665848801639236525312,
+3.8392238435728152443331697603278473050623,
+7.6784476871456304886663395206556946101246,
+1.5356895374291260977332679041311389220249,
+3.0713790748582521954665358082622778440498,
+6.1427581497165043909330716165245556880997,
+1.2285516299433008781866143233049111376199,
+2.4571032598866017563732286466098222752399,
+4.9142065197732035127464572932196445504797,
+9.8284130395464070254929145864392891009595,
+1.9656826079092814050985829172878578201919,
+3.9313652158185628101971658345757156403838,
+7.8627304316371256203943316691514312807676,
+1.5725460863274251240788663338302862561535,
+3.1450921726548502481577326676605725123070,
+6.2901843453097004963154653353211450246141,
+1.2580368690619400992630930670642290049228,
+2.5160737381238801985261861341284580098456,
+5.0321474762477603970523722682569160196913,
+1.0064294952495520794104744536513832039383,
+2.0128589904991041588209489073027664078765,
+4.0257179809982083176418978146055328157530,
+8.0514359619964166352837956292110656315060,
+1.6102871923992833270567591258422131263012,
+3.2205743847985666541135182516844262526024,
+6.4411487695971333082270365033688525052048,
+1.2882297539194266616454073006737705010410,
+2.5764595078388533232908146013475410020819,
+5.1529190156777066465816292026950820041639,
+1.0305838031355413293163258405390164008328,
+2.0611676062710826586326516810780328016655,
+4.1223352125421653172653033621560656033311,
+8.2446704250843306345306067243121312066622,
+1.6489340850168661269061213448624262413324,
+3.2978681700337322538122426897248524826649,
+6.5957363400674645076244853794497049653297,
+1.3191472680134929015248970758899409930659,
+2.6382945360269858030497941517798819861319,
+5.2765890720539716060995883035597639722638,
+1.0553178144107943212199176607119527944528,
+2.1106356288215886424398353214239055889055,
+4.2212712576431772848796706428478111778110,
+8.4425425152863545697593412856956223556221,
+1.6885085030572709139518682571391244711244,
+3.3770170061145418279037365142782489422488,
+6.7540340122290836558074730285564978844976,
+1.3508068024458167311614946057112995768995,
+2.7016136048916334623229892114225991537991,
+5.4032272097832669246459784228451983075981,
+1.0806454419566533849291956845690396615196,
+2.1612908839133067698583913691380793230392,
+4.3225817678266135397167827382761586460785,
+8.6451635356532270794335654765523172921570,
+1.7290327071306454158867130953104634584314,
+3.4580654142612908317734261906209269168628,
+6.9161308285225816635468523812418538337256,
+1.3832261657045163327093704762483707667451,
+2.7664523314090326654187409524967415334902,
+5.5329046628180653308374819049934830669805,
+1.1065809325636130661674963809986966133961,
+2.2131618651272261323349927619973932267922,
+4.4263237302544522646699855239947864535844,
+8.8526474605089045293399710479895729071687,
+1.7705294921017809058679942095979145814337,
+3.5410589842035618117359884191958291628675,
+7.0821179684071236234719768383916583257350,
+1.4164235936814247246943953676783316651470,
+2.8328471873628494493887907353566633302940,
+5.6656943747256988987775814707133266605880,
+1.1331388749451397797555162941426653321176,
+2.2662777498902795595110325882853306642352,
+4.5325554997805591190220651765706613284704,
+9.0651109995611182380441303531413226569408,
+1.8130221999122236476088260706282645313882,
+3.6260443998244472952176521412565290627763,
+7.2520887996488945904353042825130581255526,
+1.4504177599297789180870608565026116251105,
+2.9008355198595578361741217130052232502211,
+5.8016710397191156723482434260104465004421,
+1.1603342079438231344696486852020893000884,
+2.3206684158876462689392973704041786001768,
+4.6413368317752925378785947408083572003537,
+9.2826736635505850757571894816167144007074,
+1.8565347327101170151514378963233428801415,
+3.7130694654202340303028757926466857602830,
+7.4261389308404680606057515852933715205659,
+1.4852277861680936121211503170586743041132,
+2.9704555723361872242423006341173486082264,
+5.9409111446723744484846012682346972164527,
+1.1881822289344748896969202536469394432905,
+2.3763644578689497793938405072938788865811,
+4.7527289157378995587876810145877577731622,
+9.5054578314757991175753620291755155463244,
+1.9010915662951598235150724058351031092649,
+3.8021831325903196470301448116702062185297,
+7.6043662651806392940602896233404124370595,
+1.5208732530361278588120579246680824874119,
+3.0417465060722557176241158493361649748238,
+6.0834930121445114352482316986723299496476,
+1.2166986024289022870496463397344659899295,
+2.4333972048578045740992926794689319798590,
+4.8667944097156091481985853589378639597181,
+9.7335888194312182963971707178757279194361,
+1.9467177638862436592794341435751455838872,
+3.8934355277724873185588682871502911677745,
+7.7868710555449746371177365743005823355489,
+1.5573742111089949274235473148601164671098,
+3.1147484222179898548470946297202329342196,
+6.2294968444359797096941892594404658684391,
+1.2458993688871959419388378518880931736878,
+2.4917987377743918838776757037761863473757,
+4.9835974755487837677553514075523726947513,
+9.9671949510975675355107028151047453895026,
+1.9934389902195135071021405630209490779005,
+3.9868779804390270142042811260418981558010,
+7.9737559608780540284085622520837963116021,
+1.5947511921756108056817124504167592623204,
+3.1895023843512216113634249008335185246408,
+6.3790047687024432227268498016670370492817,
+1.2758009537404886445453699603334074098563,
+2.5516019074809772890907399206668148197127,
+5.1032038149619545781814798413336296394253,
+1.0206407629923909156362959682667259278851,
+2.0412815259847818312725919365334518557701,
+4.0825630519695636625451838730669037115403,
+8.1651261039391273250903677461338074230805,
+1.6330252207878254650180735492267614846161,
+3.2660504415756509300361470984535229692322,
+6.5321008831513018600722941969070459384644,
+1.3064201766302603720144588393814091876929,
+2.6128403532605207440289176787628183753858,
+5.2256807065210414880578353575256367507715,
+1.0451361413042082976115670715051273501543,
+2.0902722826084165952231341430102547003086,
+4.1805445652168331904462682860205094006172,
+8.3610891304336663808925365720410188012345,
+1.6722178260867332761785073144082037602469,
+3.3444356521734665523570146288164075204938,
+6.6888713043469331047140292576328150409876,
+1.3377742608693866209428058515265630081975,
+2.6755485217387732418856117030531260163950,
+5.3510970434775464837712234061062520327901,
+1.0702194086955092967542446812212504065580,
+2.1404388173910185935084893624425008131160,
+4.2808776347820371870169787248850016262320,
+8.5617552695640743740339574497700032524641,
+1.7123510539128148748067914899540006504928,
+3.4247021078256297496135829799080013009856,
+6.8494042156512594992271659598160026019713,
+1.3698808431302518998454331919632005203943,
+2.7397616862605037996908663839264010407885,
+5.4795233725210075993817327678528020815770,
+1.0959046745042015198763465535705604163154,
+2.1918093490084030397526931071411208326308,
+4.3836186980168060795053862142822416652616,
+8.7672373960336121590107724285644833305232,
+1.7534474792067224318021544857128966661046,
+3.5068949584134448636043089714257933322093,
+7.0137899168268897272086179428515866644186,
+1.4027579833653779454417235885703173328837,
+2.8055159667307558908834471771406346657674,
+5.6110319334615117817668943542812693315349,
+1.1222063866923023563533788708562538663070,
+2.2444127733846047127067577417125077326139,
+4.4888255467692094254135154834250154652279,
+8.9776510935384188508270309668500309304558,
+1.7955302187076837701654061933700061860912,
+3.5910604374153675403308123867400123721823,
+7.1821208748307350806616247734800247443646,
+1.4364241749661470161323249546960049488729,
+2.8728483499322940322646499093920098977459,
+5.7456966998645880645292998187840197954917,
+1.1491393399729176129058599637568039590983,
+2.2982786799458352258117199275136079181967,
+4.5965573598916704516234398550272158363934,
+9.1931147197833409032468797100544316727867,
+1.8386229439566681806493759420108863345573,
+3.6772458879133363612987518840217726691147,
+7.3544917758266727225975037680435453382294,
+1.4708983551653345445195007536087090676459,
+2.9417967103306690890390015072174181352918,
+5.8835934206613381780780030144348362705835,
+1.1767186841322676356156006028869672541167,
+2.3534373682645352712312012057739345082334,
+4.7068747365290705424624024115478690164668,
+9.4137494730581410849248048230957380329336,
+1.8827498946116282169849609646191476065867,
+3.7654997892232564339699219292382952131734,
+7.5309995784465128679398438584765904263469,
+1.5061999156893025735879687716953180852694,
+3.0123998313786051471759375433906361705388,
+6.0247996627572102943518750867812723410775,
+1.2049599325514420588703750173562544682155,
+2.4099198651028841177407500347125089364310,
+4.8198397302057682354815000694250178728620,
+9.6396794604115364709630001388500357457240,
+1.9279358920823072941926000277700071491448,
+3.8558717841646145883852000555400142982896,
+7.7117435683292291767704001110800285965792,
+1.5423487136658458353540800222160057193158,
+3.0846974273316916707081600444320114386317,
+6.1693948546633833414163200888640228772634,
+1.2338789709326766682832640177728045754527,
+2.4677579418653533365665280355456091509053,
+4.9355158837307066731330560710912183018107,
+9.8710317674614133462661121421824366036214,
+1.9742063534922826692532224284364873207243,
+3.9484127069845653385064448568729746414486,
+7.8968254139691306770128897137459492828971,
+1.5793650827938261354025779427491898565794,
+3.1587301655876522708051558854983797131588,
+6.3174603311753045416103117709967594263177,
+1.2634920662350609083220623541993518852635,
+2.5269841324701218166441247083987037705271,
+5.0539682649402436332882494167974075410542,
+1.0107936529880487266576498833594815082108,
+2.0215873059760974533152997667189630164217,
+4.0431746119521949066305995334379260328433,
+8.0863492239043898132611990668758520656866,
+1.6172698447808779626522398133751704131373,
+3.2345396895617559253044796267503408262747,
+6.4690793791235118506089592535006816525493,
+1.2938158758247023701217918507001363305099,
+2.5876317516494047402435837014002726610197,
+5.1752635032988094804871674028005453220395,
+1.0350527006597618960974334805601090644079,
+2.0701054013195237921948669611202181288158,
+4.1402108026390475843897339222404362576316,
+8.2804216052780951687794678444808725152631,
+1.6560843210556190337558935688961745030526,
+3.3121686421112380675117871377923490061053,
+6.6243372842224761350235742755846980122105,
+1.3248674568444952270047148551169396024421,
+2.6497349136889904540094297102338792048842,
+5.2994698273779809080188594204677584097684,
+1.0598939654755961816037718840935516819537,
+2.1197879309511923632075437681871033639074,
+4.2395758619023847264150875363742067278147,
+8.4791517238047694528301750727484134556294,
+1.6958303447609538905660350145496826911259,
+3.3916606895219077811320700290993653822518,
+6.7833213790438155622641400581987307645036,
+1.3566642758087631124528280116397461529007,
+2.7133285516175262249056560232794923058014,
+5.4266571032350524498113120465589846116028,
+1.0853314206470104899622624093117969223206,
+2.1706628412940209799245248186235938446411,
+4.3413256825880419598490496372471876892823,
+8.6826513651760839196980992744943753785645,
+1.7365302730352167839396198548988750757129,
+3.4730605460704335678792397097977501514258,
+6.9461210921408671357584794195955003028516,
+1.3892242184281734271516958839191000605703,
+2.7784484368563468543033917678382001211407,
+5.5568968737126937086067835356764002422813,
+1.1113793747425387417213567071352800484563,
+2.2227587494850774834427134142705600969125,
+4.4455174989701549668854268285411201938250,
+8.8910349979403099337708536570822403876501,
+1.7782069995880619867541707314164480775300,
+3.5564139991761239735083414628328961550600,
+7.1128279983522479470166829256657923101201,
+1.4225655996704495894033365851331584620240,
+2.8451311993408991788066731702663169240480,
+5.6902623986817983576133463405326338480961,
+1.1380524797363596715226692681065267696192,
+2.2761049594727193430453385362130535392384,
+4.5522099189454386860906770724261070784769,
+9.1044198378908773721813541448522141569537,
+1.8208839675781754744362708289704428313907,
+3.6417679351563509488725416579408856627815,
+7.2835358703127018977450833158817713255630,
+1.4567071740625403795490166631763542651126,
+2.9134143481250807590980333263527085302252,
+5.8268286962501615181960666527054170604504,
+1.1653657392500323036392133305410834120901,
+2.3307314785000646072784266610821668241801,
+4.6614629570001292145568533221643336483603,
+9.3229259140002584291137066443286672967206,
+1.8645851828000516858227413288657334593441,
+3.7291703656001033716454826577314669186882,
+7.4583407312002067432909653154629338373765,
+1.4916681462400413486581930630925867674753,
+2.9833362924800826973163861261851735349506,
+5.9666725849601653946327722523703470699012,
+1.1933345169920330789265544504740694139802,
+2.3866690339840661578531089009481388279605,
+4.7733380679681323157062178018962776559209,
+9.5466761359362646314124356037925553118419,
+1.9093352271872529262824871207585110623684,
+3.8186704543745058525649742415170221247368,
+7.6373409087490117051299484830340442494735,
+1.5274681817498023410259896966068088498947,
+3.0549363634996046820519793932136176997894,
+6.1098727269992093641039587864272353995788,
+1.2219745453998418728207917572854470799158,
+2.4439490907996837456415835145708941598315,
+4.8878981815993674912831670291417883196630,
+9.7757963631987349825663340582835766393261,
+1.9551592726397469965132668116567153278652,
+3.9103185452794939930265336233134306557304,
+7.8206370905589879860530672466268613114609,
+1.5641274181117975972106134493253722622922,
+3.1282548362235951944212268986507445245843,
+6.2565096724471903888424537973014890491687,
+1.2513019344894380777684907594602978098337,
+2.5026038689788761555369815189205956196675,
+5.0052077379577523110739630378411912393350,
+1.0010415475915504622147926075682382478670,
+2.0020830951831009244295852151364764957340,
+4.0041661903662018488591704302729529914680,
+8.0083323807324036977183408605459059829359,
+1.6016664761464807395436681721091811965872,
+3.2033329522929614790873363442183623931744,
+6.4066659045859229581746726884367247863487,
+1.2813331809171845916349345376873449572697,
+2.5626663618343691832698690753746899145395,
+5.1253327236687383665397381507493798290790,
+1.0250665447337476733079476301498759658158,
+2.0501330894674953466158952602997519316316,
+4.1002661789349906932317905205995038632632,
+8.2005323578699813864635810411990077265264,
+1.6401064715739962772927162082398015453053,
+3.2802129431479925545854324164796030906106,
+6.5604258862959851091708648329592061812211,
+1.3120851772591970218341729665918412362442,
+2.6241703545183940436683459331836824724884,
+5.2483407090367880873366918663673649449769,
+1.0496681418073576174673383732734729889954,
+2.0993362836147152349346767465469459779908,
+4.1986725672294304698693534930938919559815,
+8.3973451344588609397387069861877839119630,
+1.6794690268917721879477413972375567823926,
+3.3589380537835443758954827944751135647852,
+6.7178761075670887517909655889502271295704,
+1.3435752215134177503581931177900454259141,
+2.6871504430268355007163862355800908518282,
+5.3743008860536710014327724711601817036563,
+1.0748601772107342002865544942320363407313,
+2.1497203544214684005731089884640726814625,
+4.2994407088429368011462179769281453629251,
+8.5988814176858736022924359538562907258501,
+1.7197762835371747204584871907712581451700,
+3.4395525670743494409169743815425162903401,
+6.8791051341486988818339487630850325806801,
+1.3758210268297397763667897526170065161360,
+2.7516420536594795527335795052340130322720,
+5.5032841073189591054671590104680260645441,
+1.1006568214637918210934318020936052129088,
+2.2013136429275836421868636041872104258176,
+4.4026272858551672843737272083744208516353,
+8.8052545717103345687474544167488417032705,
+1.7610509143420669137494908833497683406541,
+3.5221018286841338274989817666995366813082,
+7.0442036573682676549979635333990733626164,
+1.4088407314736535309995927066798146725233,
+2.8176814629473070619991854133596293450466,
+5.6353629258946141239983708267192586900931,
+1.1270725851789228247996741653438517380186,
+2.2541451703578456495993483306877034760373,
+4.5082903407156912991986966613754069520745,
+9.0165806814313825983973933227508139041490,
+1.8033161362862765196794786645501627808298,
+3.6066322725725530393589573291003255616596,
+7.2132645451451060787179146582006511233192,
+1.4426529090290212157435829316401302246638,
+2.8853058180580424314871658632802604493277,
+5.7706116361160848629743317265605208986554,
+1.1541223272232169725948663453121041797311,
+2.3082446544464339451897326906242083594622,
+4.6164893088928678903794653812484167189243,
+9.2329786177857357807589307624968334378486,
+1.8465957235571471561517861524993666875697,
+3.6931914471142943123035723049987333751394,
+7.3863828942285886246071446099974667502789,
+1.4772765788457177249214289219994933500558,
+2.9545531576914354498428578439989867001116,
+5.9091063153828708996857156879979734002231,
+1.1818212630765741799371431375995946800446,
+2.3636425261531483598742862751991893600892,
+4.7272850523062967197485725503983787201785,
+9.4545701046125934394971451007967574403570,
+1.8909140209225186878994290201593514880714,
+3.7818280418450373757988580403187029761428,
+7.5636560836900747515977160806374059522856,
+1.5127312167380149503195432161274811904571,
+3.0254624334760299006390864322549623809142,
+6.0509248669520598012781728645099247618285,
+1.2101849733904119602556345729019849523657,
+2.4203699467808239205112691458039699047314,
+4.8407398935616478410225382916079398094628,
+9.6814797871232956820450765832158796189255,
+1.9362959574246591364090153166431759237851,
+3.8725919148493182728180306332863518475702,
+7.7451838296986365456360612665727036951404,
+1.5490367659397273091272122533145407390281,
+3.0980735318794546182544245066290814780562,
+6.1961470637589092365088490132581629561124,
+1.2392294127517818473017698026516325912225,
+2.4784588255035636946035396053032651824449,
+4.9569176510071273892070792106065303648899,
+9.9138353020142547784141584212130607297798,
+1.9827670604028509556828316842426121459560,
+3.9655341208057019113656633684852242919119,
+7.9310682416114038227313267369704485838238,
+1.5862136483222807645462653473940897167648,
+3.1724272966445615290925306947881794335295,
+6.3448545932891230581850613895763588670590,
+1.2689709186578246116370122779152717734118,
+2.5379418373156492232740245558305435468236,
+5.0758836746312984465480491116610870936472,
+1.0151767349262596893096098223322174187294,
+2.0303534698525193786192196446644348374589,
+4.0607069397050387572384392893288696749178,
+8.1214138794100775144768785786577393498356,
+1.6242827758820155028953757157315478699671,
+3.2485655517640310057907514314630957399342,
+6.4971311035280620115815028629261914798685,
+1.2994262207056124023163005725852382959737,
+2.5988524414112248046326011451704765919474,
+5.1977048828224496092652022903409531838948,
+1.0395409765644899218530404580681906367790,
+2.0790819531289798437060809161363812735579,
+4.1581639062579596874121618322727625471158,
+8.3163278125159193748243236645455250942316,
+1.6632655625031838749648647329091050188463,
+3.3265311250063677499297294658182100376927,
+6.6530622500127354998594589316364200753853,
+1.3306124500025470999718917863272840150771,
+2.6612249000050941999437835726545680301541,
+5.3224498000101883998875671453091360603082,
+1.0644899600020376799775134290618272120616,
+2.1289799200040753599550268581236544241233,
+4.2579598400081507199100537162473088482466,
+8.5159196800163014398201074324946176964932,
+1.7031839360032602879640214864989235392986,
+3.4063678720065205759280429729978470785973,
+6.8127357440130411518560859459956941571946,
+1.3625471488026082303712171891991388314389,
+2.7250942976052164607424343783982776628778,
+5.4501885952104329214848687567965553257556,
+1.0900377190420865842969737513593110651511,
+2.1800754380841731685939475027186221303023,
+4.3601508761683463371878950054372442606045,
+8.7203017523366926743757900108744885212090,
+1.7440603504673385348751580021748977042418,
+3.4881207009346770697503160043497954084836,
+6.9762414018693541395006320086995908169672,
+1.3952482803738708279001264017399181633934,
+2.7904965607477416558002528034798363267869,
+5.5809931214954833116005056069596726535738,
+1.1161986242990966623201011213919345307148,
+2.2323972485981933246402022427838690614295,
+4.4647944971963866492804044855677381228590,
+8.9295889943927732985608089711354762457180,
+1.7859177988785546597121617942270952491436,
+3.5718355977571093194243235884541904982872,
+7.1436711955142186388486471769083809965744,
+1.4287342391028437277697294353816761993149,
+2.8574684782056874555394588707633523986298,
+5.7149369564113749110789177415267047972595,
+1.1429873912822749822157835483053409594519,
+2.2859747825645499644315670966106819189038,
+4.5719495651290999288631341932213638378076,
+9.1438991302581998577262683864427276756153,
+1.8287798260516399715452536772885455351231,
+3.6575596521032799430905073545770910702461,
+7.3151193042065598861810147091541821404922,
+1.4630238608413119772362029418308364280984,
+2.9260477216826239544724058836616728561969,
+5.8520954433652479089448117673233457123938,
+1.1704190886730495817889623534646691424788,
+2.3408381773460991635779247069293382849575,
+4.6816763546921983271558494138586765699150,
+9.3633527093843966543116988277173531398300,
+1.8726705418768793308623397655434706279660,
+3.7453410837537586617246795310869412559320,
+7.4906821675075173234493590621738825118640,
+1.4981364335015034646898718124347765023728,
+2.9962728670030069293797436248695530047456,
+5.9925457340060138587594872497391060094912,
+1.1985091468012027717518974499478212018982,
+2.3970182936024055435037948998956424037965,
+4.7940365872048110870075897997912848075930,
+9.5880731744096221740151795995825696151860,
+1.9176146348819244348030359199165139230372,
+3.8352292697638488696060718398330278460744,
+7.6704585395276977392121436796660556921488,
+1.5340917079055395478424287359332111384298,
+3.0681834158110790956848574718664222768595,
+6.1363668316221581913697149437328445537190,
+1.2272733663244316382739429887465689107438,
+2.4545467326488632765478859774931378214876,
+4.9090934652977265530957719549862756429752,
+9.8181869305954531061915439099725512859504,
+1.9636373861190906212383087819945102571901,
+3.9272747722381812424766175639890205143802,
+7.8545495444763624849532351279780410287603,
+1.5709099088952724969906470255956082057521,
+3.1418198177905449939812940511912164115041,
+6.2836396355810899879625881023824328230083,
+1.2567279271162179975925176204764865646017,
+2.5134558542324359951850352409529731292033,
+5.0269117084648719903700704819059462584066,
+1.0053823416929743980740140963811892516813,
+2.0107646833859487961480281927623785033626,
+4.0215293667718975922960563855247570067253,
+8.0430587335437951845921127710495140134506,
+1.6086117467087590369184225542099028026901,
+3.2172234934175180738368451084198056053802,
+6.4344469868350361476736902168396112107605,
+1.2868893973670072295347380433679222421521,
+2.5737787947340144590694760867358444843042,
+5.1475575894680289181389521734716889686084,
+1.0295115178936057836277904346943377937217,
+2.0590230357872115672555808693886755874434,
+4.1180460715744231345111617387773511748867,
+8.2360921431488462690223234775547023497734,
+1.6472184286297692538044646955109404699547,
+3.2944368572595385076089293910218809399094,
+6.5888737145190770152178587820437618798187,
+1.3177747429038154030435717564087523759637,
+2.6355494858076308060871435128175047519275,
+5.2710989716152616121742870256350095038550,
+1.0542197943230523224348574051270019007710,
+2.1084395886461046448697148102540038015420,
+4.2168791772922092897394296205080076030840,
+8.4337583545844185794788592410160152061680,
+1.6867516709168837158957718482032030412336,
+3.3735033418337674317915436964064060824672,
+6.7470066836675348635830873928128121649344,
+1.3494013367335069727166174785625624329869,
+2.6988026734670139454332349571251248659738,
+5.3976053469340278908664699142502497319475,
+1.0795210693868055781732939828500499463895,
+2.1590421387736111563465879657000998927790,
+4.3180842775472223126931759314001997855580,
+8.6361685550944446253863518628003995711160,
+1.7272337110188889250772703725600799142232,
+3.4544674220377778501545407451201598284464,
+6.9089348440755557003090814902403196568928,
+1.3817869688151111400618162980480639313786,
+2.7635739376302222801236325960961278627571,
+5.5271478752604445602472651921922557255142,
+1.1054295750520889120494530384384511451028,
+2.2108591501041778240989060768769022902057,
+4.4217183002083556481978121537538045804114,
+8.8434366004167112963956243075076091608228,
+1.7686873200833422592791248615015218321646,
+3.5373746401666845185582497230030436643291,
+7.0747492803333690371164994460060873286582,
+1.4149498560666738074232998892012174657316,
+2.8298997121333476148465997784024349314633,
+5.6597994242666952296931995568048698629266,
+1.1319598848533390459386399113609739725853,
+2.2639197697066780918772798227219479451706,
+4.5278395394133561837545596454438958903413,
+9.0556790788267123675091192908877917806825,
+1.8111358157653424735018238581775583561365,
+3.6222716315306849470036477163551167122730,
+7.2445432630613698940072954327102334245460,
+1.4489086526122739788014590865420466849092,
+2.8978173052245479576029181730840933698184,
+5.7956346104490959152058363461681867396368,
+1.1591269220898191830411672692336373479274,
+2.3182538441796383660823345384672746958547,
+4.6365076883592767321646690769345493917095,
+9.2730153767185534643293381538690987834189,
+1.8546030753437106928658676307738197566838,
+3.7092061506874213857317352615476395133676,
+7.4184123013748427714634705230952790267351,
+1.4836824602749685542926941046190558053470,
+2.9673649205499371085853882092381116106941,
+5.9347298410998742171707764184762232213881,
+1.1869459682199748434341552836952446442776,
+2.3738919364399496868683105673904892885552,
+4.7477838728798993737366211347809785771105,
+9.4955677457597987474732422695619571542210,
+1.8991135491519597494946484539123914308442,
+3.7982270983039194989892969078247828616884,
+7.5964541966078389979785938156495657233768,
+1.5192908393215677995957187631299131446754,
+3.0385816786431355991914375262598262893507,
+6.0771633572862711983828750525196525787014,
+1.2154326714572542396765750105039305157403,
+2.4308653429145084793531500210078610314806,
+4.8617306858290169587063000420157220629611,
+9.7234613716580339174126000840314441259223,
+1.9446922743316067834825200168062888251845,
+3.8893845486632135669650400336125776503689,
+7.7787690973264271339300800672251553007378,
+1.5557538194652854267860160134450310601476,
+3.1115076389305708535720320268900621202951,
+6.2230152778611417071440640537801242405903,
+1.2446030555722283414288128107560248481181,
+2.4892061111444566828576256215120496962361,
+4.9784122222889133657152512430240993924722,
+9.9568244445778267314305024860481987849444,
+1.9913648889155653462861004972096397569889,
+3.9827297778311306925722009944192795139778,
+7.9654595556622613851444019888385590279555,
+1.5930919111324522770288803977677118055911,
+3.1861838222649045540577607955354236111822,
+6.3723676445298091081155215910708472223644,
+1.2744735289059618216231043182141694444729,
+2.5489470578119236432462086364283388889458,
+5.0978941156238472864924172728566777778915,
+1.0195788231247694572984834545713355555783,
+2.0391576462495389145969669091426711111566,
+4.0783152924990778291939338182853422223132,
+8.1566305849981556583878676365706844446265,
+1.6313261169996311316775735273141368889253,
+3.2626522339992622633551470546282737778506,
+6.5253044679985245267102941092565475557012,
+1.3050608935997049053420588218513095111402,
+2.6101217871994098106841176437026190222805,
+5.2202435743988196213682352874052380445609,
+1.0440487148797639242736470574810476089122,
+2.0880974297595278485472941149620952178244,
+4.1761948595190556970945882299241904356487,
+8.3523897190381113941891764598483808712975,
+1.6704779438076222788378352919696761742595,
+3.3409558876152445576756705839393523485190,
+6.6819117752304891153513411678787046970380,
+1.3363823550460978230702682335757409394076,
+2.6727647100921956461405364671514818788152,
+5.3455294201843912922810729343029637576304,
+1.0691058840368782584562145868605927515261,
+2.1382117680737565169124291737211855030522,
+4.2764235361475130338248583474423710061043,
+8.5528470722950260676497166948847420122086,
+1.7105694144590052135299433389769484024417,
+3.4211388289180104270598866779538968048835,
+6.8422776578360208541197733559077936097669,
+1.3684555315672041708239546711815587219534,
+2.7369110631344083416479093423631174439068,
+5.4738221262688166832958186847262348878135,
+1.0947644252537633366591637369452469775627,
+2.1895288505075266733183274738904939551254,
+4.3790577010150533466366549477809879102508,
+8.7581154020301066932733098955619758205016,
+1.7516230804060213386546619791123951641003,
+3.5032461608120426773093239582247903282007,
+7.0064923216240853546186479164495806564013,
+1.4012984643248170709237295832899161312803,
+2.8025969286496341418474591665798322625605,
+5.6051938572992682836949183331596645251210,
+1.1210387714598536567389836666319329050242,
+2.2420775429197073134779673332638658100484,
+4.4841550858394146269559346665277316200968,
+8.9683101716788292539118693330554632401937,
+1.7936620343357658507823738666110926480387,
+3.5873240686715317015647477332221852960775,
+7.1746481373430634031294954664443705921549,
+1.4349296274686126806258990932888741184310,
+2.8698592549372253612517981865777482368620,
+5.7397185098744507225035963731554964737240,
+1.1479437019748901445007192746310992947448,
+2.2958874039497802890014385492621985894896,
+4.5917748078995605780028770985243971789792,
+9.1835496157991211560057541970487943579583,
+1.8367099231598242312011508394097588715917,
+3.6734198463196484624023016788195177431833,
+7.3468396926392969248046033576390354863667,
+1.4693679385278593849609206715278070972733,
+2.9387358770557187699218413430556141945467,
+5.8774717541114375398436826861112283890933,
+1.1754943508222875079687365372222456778187,
+2.3509887016445750159374730744444913556373,
+4.7019774032891500318749461488889827112747,
+9.4039548065783000637498922977779654225493,
+1.8807909613156600127499784595555930845099,
+3.7615819226313200254999569191111861690197,
+7.5231638452626400509999138382223723380395,
+1.5046327690525280101999827676444744676079,
+3.0092655381050560203999655352889489352158,
+6.0185310762101120407999310705778978704316,
+1.2037062152420224081599862141155795740863,
+2.4074124304840448163199724282311591481726,
+4.8148248609680896326399448564623182963453,
+9.6296497219361792652798897129246365926905,
+1.9259299443872358530559779425849273185381,
+3.8518598887744717061119558851698546370762,
+7.7037197775489434122239117703397092741524,
+1.5407439555097886824447823540679418548305,
+3.0814879110195773648895647081358837096610,
+6.1629758220391547297791294162717674193219,
+1.2325951644078309459558258832543534838644,
+2.4651903288156618919116517665087069677288,
+4.9303806576313237838233035330174139354575,
+9.8607613152626475676466070660348278709151,
+1.9721522630525295135293214132069655741830,
+3.9443045261050590270586428264139311483660,
+7.8886090522101180541172856528278622967321,
+1.5777218104420236108234571305655724593464,
+3.1554436208840472216469142611311449186928,
+6.3108872417680944432938285222622898373857,
+1.2621774483536188886587657044524579674771,
+2.5243548967072377773175314089049159349543,
+5.0487097934144755546350628178098318699085,
+1.0097419586828951109270125635619663739817,
+2.0194839173657902218540251271239327479634,
+4.0389678347315804437080502542478654959268,
+8.0779356694631608874161005084957309918536,
+1.6155871338926321774832201016991461983707,
+3.2311742677852643549664402033982923967415,
+6.4623485355705287099328804067965847934829,
+1.2924697071141057419865760813593169586966,
+2.5849394142282114839731521627186339173932,
+5.1698788284564229679463043254372678347863,
+1.0339757656912845935892608650874535669573,
+2.0679515313825691871785217301749071339145,
+4.1359030627651383743570434603498142678291,
+8.2718061255302767487140869206996285356581,
+1.6543612251060553497428173841399257071316,
+3.3087224502121106994856347682798514142632,
+6.6174449004242213989712695365597028285265,
+1.3234889800848442797942539073119405657053,
+2.6469779601696885595885078146238811314106,
+5.2939559203393771191770156292477622628212,
+1.0587911840678754238354031258495524525642,
+2.1175823681357508476708062516991049051285,
+4.2351647362715016953416125033982098102570,
+8.4703294725430033906832250067964196205139,
+1.6940658945086006781366450013592839241028,
+3.3881317890172013562732900027185678482056,
+6.7762635780344027125465800054371356964111,
+1.3552527156068805425093160010874271392822,
+2.7105054312137610850186320021748542785645,
+5.4210108624275221700372640043497085571289,
+1.0842021724855044340074528008699417114258,
+2.1684043449710088680149056017398834228516,
+4.3368086899420177360298112034797668457031,
+8.6736173798840354720596224069595336914062,
+1.7347234759768070944119244813919067382812,
+3.4694469519536141888238489627838134765625,
+6.9388939039072283776476979255676269531250,
+1.3877787807814456755295395851135253906250,
+2.7755575615628913510590791702270507812500,
+5.5511151231257827021181583404541015625000,
+1.1102230246251565404236316680908203125000,
+2.2204460492503130808472633361816406250000,
+4.4408920985006261616945266723632812500000,
+8.8817841970012523233890533447265625000000,
+1.7763568394002504646778106689453125000000,
+3.5527136788005009293556213378906250000000,
+7.1054273576010018587112426757812500000000,
+1.4210854715202003717422485351562500000000,
+2.8421709430404007434844970703125000000000,
+5.6843418860808014869689941406250000000000,
+1.1368683772161602973937988281250000000000,
+2.2737367544323205947875976562500000000000,
+4.5474735088646411895751953125000000000000,
+9.0949470177292823791503906250000000000000,
+1.8189894035458564758300781250000000000000,
+3.6379788070917129516601562500000000000000,
+7.2759576141834259033203125000000000000000,
+1.4551915228366851806640625000000000000000,
+2.9103830456733703613281250000000000000000,
+5.8207660913467407226562500000000000000000,
+1.1641532182693481445312500000000000000000,
+2.3283064365386962890625000000000000000000,
+4.6566128730773925781250000000000000000000,
+9.3132257461547851562500000000000000000000,
+1.8626451492309570312500000000000000000000,
+3.7252902984619140625000000000000000000000,
+7.4505805969238281250000000000000000000000,
+1.4901161193847656250000000000000000000000,
+2.9802322387695312500000000000000000000000,
+5.9604644775390625000000000000000000000000,
+1.1920928955078125000000000000000000000000,
+2.3841857910156250000000000000000000000000,
+4.7683715820312500000000000000000000000000,
+9.5367431640625000000000000000000000000000,
+1.9073486328125000000000000000000000000000,
+3.8146972656250000000000000000000000000000,
+7.6293945312500000000000000000000000000000,
+1.5258789062500000000000000000000000000000,
+3.0517578125000000000000000000000000000000,
+6.1035156250000000000000000000000000000000,
+1.2207031250000000000000000000000000000000,
+2.4414062500000000000000000000000000000000,
+4.8828125000000000000000000000000000000000,
+9.7656250000000000000000000000000000000000,
+1.9531250000000000000000000000000000000000,
+3.9062500000000000000000000000000000000000,
+7.8125000000000000000000000000000000000000,
+1.5625000000000000000000000000000000000000,
+3.1250000000000000000000000000000000000000,
+6.2500000000000000000000000000000000000000,
+1.2500000000000000000000000000000000000000,
+2.5000000000000000000000000000000000000000,
+5.0000000000000000000000000000000000000000,
+1.0000000000000000000000000000000000000000,
+2.0000000000000000000000000000000000000000,
+4.0000000000000000000000000000000000000000,
+8.0000000000000000000000000000000000000000,
+1.6000000000000000000000000000000000000000,
+3.2000000000000000000000000000000000000000,
+6.4000000000000000000000000000000000000000,
+1.2800000000000000000000000000000000000000,
+2.5600000000000000000000000000000000000000,
+5.1200000000000000000000000000000000000000,
+1.0240000000000000000000000000000000000000,
+2.0480000000000000000000000000000000000000,
+4.0960000000000000000000000000000000000000,
+8.1920000000000000000000000000000000000000,
+1.6384000000000000000000000000000000000000,
+3.2768000000000000000000000000000000000000,
+6.5536000000000000000000000000000000000000,
+1.3107200000000000000000000000000000000000,
+2.6214400000000000000000000000000000000000,
+5.2428800000000000000000000000000000000000,
+1.0485760000000000000000000000000000000000,
+2.0971520000000000000000000000000000000000,
+4.1943040000000000000000000000000000000000,
+8.3886080000000000000000000000000000000000,
+1.6777216000000000000000000000000000000000,
+3.3554432000000000000000000000000000000000,
+6.7108864000000000000000000000000000000000,
+1.3421772800000000000000000000000000000000,
+2.6843545600000000000000000000000000000000,
+5.3687091200000000000000000000000000000000,
+1.0737418240000000000000000000000000000000,
+2.1474836480000000000000000000000000000000,
+4.2949672960000000000000000000000000000000,
+8.5899345920000000000000000000000000000000,
+1.7179869184000000000000000000000000000000,
+3.4359738368000000000000000000000000000000,
+6.8719476736000000000000000000000000000000,
+1.3743895347200000000000000000000000000000,
+2.7487790694400000000000000000000000000000,
+5.4975581388800000000000000000000000000000,
+1.0995116277760000000000000000000000000000,
+2.1990232555520000000000000000000000000000,
+4.3980465111040000000000000000000000000000,
+8.7960930222080000000000000000000000000000,
+1.7592186044416000000000000000000000000000,
+3.5184372088832000000000000000000000000000,
+7.0368744177664000000000000000000000000000,
+1.4073748835532800000000000000000000000000,
+2.8147497671065600000000000000000000000000,
+5.6294995342131200000000000000000000000000,
+1.1258999068426240000000000000000000000000,
+2.2517998136852480000000000000000000000000,
+4.5035996273704960000000000000000000000000,
+9.0071992547409920000000000000000000000000,
+1.8014398509481984000000000000000000000000,
+3.6028797018963968000000000000000000000000,
+7.2057594037927936000000000000000000000000,
+1.4411518807585587200000000000000000000000,
+2.8823037615171174400000000000000000000000,
+5.7646075230342348800000000000000000000000,
+1.1529215046068469760000000000000000000000,
+2.3058430092136939520000000000000000000000,
+4.6116860184273879040000000000000000000000,
+9.2233720368547758080000000000000000000000,
+1.8446744073709551616000000000000000000000,
+3.6893488147419103232000000000000000000000,
+7.3786976294838206464000000000000000000000,
+1.4757395258967641292800000000000000000000,
+2.9514790517935282585600000000000000000000,
+5.9029581035870565171200000000000000000000,
+1.1805916207174113034240000000000000000000,
+2.3611832414348226068480000000000000000000,
+4.7223664828696452136960000000000000000000,
+9.4447329657392904273920000000000000000000,
+1.8889465931478580854784000000000000000000,
+3.7778931862957161709568000000000000000000,
+7.5557863725914323419136000000000000000000,
+1.5111572745182864683827200000000000000000,
+3.0223145490365729367654400000000000000000,
+6.0446290980731458735308800000000000000000,
+1.2089258196146291747061760000000000000000,
+2.4178516392292583494123520000000000000000,
+4.8357032784585166988247040000000000000000,
+9.6714065569170333976494080000000000000000,
+1.9342813113834066795298816000000000000000,
+3.8685626227668133590597632000000000000000,
+7.7371252455336267181195264000000000000000,
+1.5474250491067253436239052800000000000000,
+3.0948500982134506872478105600000000000000,
+6.1897001964269013744956211200000000000000,
+1.2379400392853802748991242240000000000000,
+2.4758800785707605497982484480000000000000,
+4.9517601571415210995964968960000000000000,
+9.9035203142830421991929937920000000000000,
+1.9807040628566084398385987584000000000000,
+3.9614081257132168796771975168000000000000,
+7.9228162514264337593543950336000000000000,
+1.5845632502852867518708790067200000000000,
+3.1691265005705735037417580134400000000000,
+6.3382530011411470074835160268800000000000,
+1.2676506002282294014967032053760000000000,
+2.5353012004564588029934064107520000000000,
+5.0706024009129176059868128215040000000000,
+1.0141204801825835211973625643008000000000,
+2.0282409603651670423947251286016000000000,
+4.0564819207303340847894502572032000000000,
+8.1129638414606681695789005144064000000000,
+1.6225927682921336339157801028812800000000,
+3.2451855365842672678315602057625600000000,
+6.4903710731685345356631204115251200000000,
+1.2980742146337069071326240823050240000000,
+2.5961484292674138142652481646100480000000,
+5.1922968585348276285304963292200960000000,
+1.0384593717069655257060992658440192000000,
+2.0769187434139310514121985316880384000000,
+4.1538374868278621028243970633760768000000,
+8.3076749736557242056487941267521536000000,
+1.6615349947311448411297588253504307200000,
+3.3230699894622896822595176507008614400000,
+6.6461399789245793645190353014017228800000,
+1.3292279957849158729038070602803445760000,
+2.6584559915698317458076141205606891520000,
+5.3169119831396634916152282411213783040000,
+1.0633823966279326983230456482242756608000,
+2.1267647932558653966460912964485513216000,
+4.2535295865117307932921825928971026432000,
+8.5070591730234615865843651857942052864000,
+1.7014118346046923173168730371588410572800,
+3.4028236692093846346337460743176821145600,
+6.8056473384187692692674921486353642291200,
+1.3611294676837538538534984297270728458240,
+2.7222589353675077077069968594541456916480,
+5.4445178707350154154139937189082913832960,
+1.0889035741470030830827987437816582766592,
+2.1778071482940061661655974875633165533184,
+4.3556142965880123323311949751266331066368,
+8.7112285931760246646623899502532662132736,
+1.7422457186352049329324779900506532426547,
+3.4844914372704098658649559801013064853094,
+6.9689828745408197317299119602026129706189,
+1.3937965749081639463459823920405225941238,
+2.7875931498163278926919647840810451882476,
+5.5751862996326557853839295681620903764951,
+1.1150372599265311570767859136324180752990,
+2.2300745198530623141535718272648361505980,
+4.4601490397061246283071436545296723011961,
+8.9202980794122492566142873090593446023922,
+1.7840596158824498513228574618118689204784,
+3.5681192317648997026457149236237378409569,
+7.1362384635297994052914298472474756819137,
+1.4272476927059598810582859694494951363827,
+2.8544953854119197621165719388989902727655,
+5.7089907708238395242331438777979805455310,
+1.1417981541647679048466287755595961091062,
+2.2835963083295358096932575511191922182124,
+4.5671926166590716193865151022383844364248,
+9.1343852333181432387730302044767688728496,
+1.8268770466636286477546060408953537745699,
+3.6537540933272572955092120817907075491398,
+7.3075081866545145910184241635814150982797,
+1.4615016373309029182036848327162830196559,
+2.9230032746618058364073696654325660393119,
+5.8460065493236116728147393308651320786237,
+1.1692013098647223345629478661730264157247,
+2.3384026197294446691258957323460528314495,
+4.6768052394588893382517914646921056628990,
+9.3536104789177786765035829293842113257980,
+1.8707220957835557353007165858768422651596,
+3.7414441915671114706014331717536845303192,
+7.4828883831342229412028663435073690606384,
+1.4965776766268445882405732687014738121277,
+2.9931553532536891764811465374029476242553,
+5.9863107065073783529622930748058952485107,
+1.1972621413014756705924586149611790497021,
+2.3945242826029513411849172299223580994043,
+4.7890485652059026823698344598447161988086,
+9.5780971304118053647396689196894323976171,
+1.9156194260823610729479337839378864795234,
+3.8312388521647221458958675678757729590468,
+7.6624777043294442917917351357515459180937,
+1.5324955408658888583583470271503091836187,
+3.0649910817317777167166940543006183672375,
+6.1299821634635554334333881086012367344750,
+1.2259964326927110866866776217202473468950,
+2.4519928653854221733733552434404946937900,
+4.9039857307708443467467104868809893875800,
+9.8079714615416886934934209737619787751599,
+1.9615942923083377386986841947523957550320,
+3.9231885846166754773973683895047915100640,
+7.8463771692333509547947367790095830201279,
+1.5692754338466701909589473558019166040256,
+3.1385508676933403819178947116038332080512,
+6.2771017353866807638357894232076664161024,
+1.2554203470773361527671578846415332832205,
+2.5108406941546723055343157692830665664409,
+5.0216813883093446110686315385661331328819,
+1.0043362776618689222137263077132266265764,
+2.0086725553237378444274526154264532531528,
+4.0173451106474756888549052308529065063055,
+8.0346902212949513777098104617058130126110,
+1.6069380442589902755419620923411626025222,
+3.2138760885179805510839241846823252050444,
+6.4277521770359611021678483693646504100888,
+1.2855504354071922204335696738729300820178,
+2.5711008708143844408671393477458601640355,
+5.1422017416287688817342786954917203280710,
+1.0284403483257537763468557390983440656142,
+2.0568806966515075526937114781966881312284,
+4.1137613933030151053874229563933762624568,
+8.2275227866060302107748459127867525249137,
+1.6455045573212060421549691825573505049827,
+3.2910091146424120843099383651147010099655,
+6.5820182292848241686198767302294020199309,
+1.3164036458569648337239753460458804039862,
+2.6328072917139296674479506920917608079724,
+5.2656145834278593348959013841835216159448,
+1.0531229166855718669791802768367043231890,
+2.1062458333711437339583605536734086463779,
+4.2124916667422874679167211073468172927558,
+8.4249833334845749358334422146936345855116,
+1.6849966666969149871666884429387269171023,
+3.3699933333938299743333768858774538342046,
+6.7399866667876599486667537717549076684093,
+1.3479973333575319897333507543509815336819,
+2.6959946667150639794667015087019630673637,
+5.3919893334301279589334030174039261347274,
+1.0783978666860255917866806034807852269455,
+2.1567957333720511835733612069615704538910,
+4.3135914667441023671467224139231409077819,
+8.6271829334882047342934448278462818155639,
+1.7254365866976409468586889655692563631128,
+3.4508731733952818937173779311385127262256,
+6.9017463467905637874347558622770254524511,
+1.3803492693581127574869511724554050904902,
+2.7606985387162255149739023449108101809804,
+5.5213970774324510299478046898216203619609,
+1.1042794154864902059895609379643240723922,
+2.2085588309729804119791218759286481447844,
+4.4171176619459608239582437518572962895687,
+8.8342353238919216479164875037145925791374,
+1.7668470647783843295832975007429185158275,
+3.5336941295567686591665950014858370316550,
+7.0673882591135373183331900029716740633099,
+1.4134776518227074636666380005943348126620,
+2.8269553036454149273332760011886696253240,
+5.6539106072908298546665520023773392506479,
+1.1307821214581659709333104004754678501296,
+2.2615642429163319418666208009509357002592,
+4.5231284858326638837332416019018714005184,
+9.0462569716653277674664832038037428010367,
+1.8092513943330655534932966407607485602073,
+3.6185027886661311069865932815214971204147,
+7.2370055773322622139731865630429942408294,
+1.4474011154664524427946373126085988481659,
+2.8948022309329048855892746252171976963317,
+5.7896044618658097711785492504343953926635,
+1.1579208923731619542357098500868790785327,
+2.3158417847463239084714197001737581570654,
+4.6316835694926478169428394003475163141308,
+9.2633671389852956338856788006950326282616,
+1.8526734277970591267771357601390065256523,
+3.7053468555941182535542715202780130513046,
+7.4106937111882365071085430405560261026093,
+1.4821387422376473014217086081112052205219,
+2.9642774844752946028434172162224104410437,
+5.9285549689505892056868344324448208820874,
+1.1857109937901178411373668864889641764175,
+2.3714219875802356822747337729779283528350,
+4.7428439751604713645494675459558567056699,
+9.4856879503209427290989350919117134113399,
+1.8971375900641885458197870183823426822680,
+3.7942751801283770916395740367646853645360,
+7.5885503602567541832791480735293707290719,
+1.5177100720513508366558296147058741458144,
+3.0354201441027016733116592294117482916288,
+6.0708402882054033466233184588234965832575,
+1.2141680576410806693246636917646993166515,
+2.4283361152821613386493273835293986333030,
+4.8566722305643226772986547670587972666060,
+9.7133444611286453545973095341175945332120,
+1.9426688922257290709194619068235189066424,
+3.8853377844514581418389238136470378132848,
+7.7706755689029162836778476272940756265696,
+1.5541351137805832567355695254588151253139,
+3.1082702275611665134711390509176302506279,
+6.2165404551223330269422781018352605012557,
+1.2433080910244666053884556203670521002511,
+2.4866161820489332107769112407341042005023,
+4.9732323640978664215538224814682084010046,
+9.9464647281957328431076449629364168020091,
+1.9892929456391465686215289925872833604018,
+3.9785858912782931372430579851745667208036,
+7.9571717825565862744861159703491334416073,
+1.5914343565113172548972231940698266883215,
+3.1828687130226345097944463881396533766429,
+6.3657374260452690195888927762793067532858,
+1.2731474852090538039177785552558613506572,
+2.5462949704181076078355571105117227013143,
+5.0925899408362152156711142210234454026287,
+1.0185179881672430431342228442046890805257,
+2.0370359763344860862684456884093781610515,
+4.0740719526689721725368913768187563221029,
+8.1481439053379443450737827536375126442059,
+1.6296287810675888690147565507275025288412,
+3.2592575621351777380295131014550050576823,
+6.5185151242703554760590262029100101153647,
+1.3037030248540710952118052405820020230729,
+2.6074060497081421904236104811640040461459,
+5.2148120994162843808472209623280080922918,
+1.0429624198832568761694441924656016184584,
+2.0859248397665137523388883849312032369167,
+4.1718496795330275046777767698624064738334,
+8.3436993590660550093555535397248129476668,
+1.6687398718132110018711107079449625895334,
+3.3374797436264220037422214158899251790667,
+6.6749594872528440074844428317798503581335,
+1.3349918974505688014968885663559700716267,
+2.6699837949011376029937771327119401432534,
+5.3399675898022752059875542654238802865068,
+1.0679935179604550411975108530847760573014,
+2.1359870359209100823950217061695521146027,
+4.2719740718418201647900434123391042292054,
+8.5439481436836403295800868246782084584108,
+1.7087896287367280659160173649356416916822,
+3.4175792574734561318320347298712833833643,
+6.8351585149469122636640694597425667667287,
+1.3670317029893824527328138919485133533457,
+2.7340634059787649054656277838970267066915,
+5.4681268119575298109312555677940534133829,
+1.0936253623915059621862511135588106826766,
+2.1872507247830119243725022271176213653532,
+4.3745014495660238487450044542352427307063,
+8.7490028991320476974900089084704854614127,
+1.7498005798264095394980017816940970922825,
+3.4996011596528190789960035633881941845651,
+6.9992023193056381579920071267763883691301,
+1.3998404638611276315984014253552776738260,
+2.7996809277222552631968028507105553476521,
+5.5993618554445105263936057014211106953041,
+1.1198723710889021052787211402842221390608,
+2.2397447421778042105574422805684442781216,
+4.4794894843556084211148845611368885562433,
+8.9589789687112168422297691222737771124866,
+1.7917957937422433684459538244547554224973,
+3.5835915874844867368919076489095108449946,
+7.1671831749689734737838152978190216899893,
+1.4334366349937946947567630595638043379979,
+2.8668732699875893895135261191276086759957,
+5.7337465399751787790270522382552173519914,
+1.1467493079950357558054104476510434703983,
+2.2934986159900715116108208953020869407966,
+4.5869972319801430232216417906041738815931,
+9.1739944639602860464432835812083477631863,
+1.8347988927920572092886567162416695526373,
+3.6695977855841144185773134324833391052745,
+7.3391955711682288371546268649666782105490,
+1.4678391142336457674309253729933356421098,
+2.9356782284672915348618507459866712842196,
+5.8713564569345830697237014919733425684392,
+1.1742712913869166139447402983946685136878,
+2.3485425827738332278894805967893370273757,
+4.6970851655476664557789611935786740547514,
+9.3941703310953329115579223871573481095027,
+1.8788340662190665823115844774314696219005,
+3.7576681324381331646231689548629392438011,
+7.5153362648762663292463379097258784876022,
+1.5030672529752532658492675819451756975204,
+3.0061345059505065316985351638903513950409,
+6.0122690119010130633970703277807027900817,
+1.2024538023802026126794140655561405580163,
+2.4049076047604052253588281311122811160327,
+4.8098152095208104507176562622245622320654,
+9.6196304190416209014353125244491244641308,
+1.9239260838083241802870625048898248928262,
+3.8478521676166483605741250097796497856523,
+7.6957043352332967211482500195592995713046,
+1.5391408670466593442296500039118599142609,
+3.0782817340933186884593000078237198285219,
+6.1565634681866373769186000156474396570437,
+1.2313126936373274753837200031294879314087,
+2.4626253872746549507674400062589758628175,
+4.9252507745493099015348800125179517256350,
+9.8505015490986198030697600250359034512699,
+1.9701003098197239606139520050071806902540,
+3.9402006196394479212279040100143613805080,
+7.8804012392788958424558080200287227610159,
+1.5760802478557791684911616040057445522032,
+3.1521604957115583369823232080114891044064,
+6.3043209914231166739646464160229782088128,
+1.2608641982846233347929292832045956417626,
+2.5217283965692466695858585664091912835251,
+5.0434567931384933391717171328183825670502,
+1.0086913586276986678343434265636765134100,
+2.0173827172553973356686868531273530268201,
+4.0347654345107946713373737062547060536402,
+8.0695308690215893426747474125094121072803,
+1.6139061738043178685349494825018824214561,
+3.2278123476086357370698989650037648429121,
+6.4556246952172714741397979300075296858243,
+1.2911249390434542948279595860015059371649,
+2.5822498780869085896559191720030118743297,
+5.1644997561738171793118383440060237486594,
+1.0328999512347634358623676688012047497319,
+2.0657999024695268717247353376024094994638,
+4.1315998049390537434494706752048189989275,
+8.2631996098781074868989413504096379978551,
+1.6526399219756214973797882700819275995710,
+3.3052798439512429947595765401638551991420,
+6.6105596879024859895191530803277103982840,
+1.3221119375804971979038306160655420796568,
+2.6442238751609943958076612321310841593136,
+5.2884477503219887916153224642621683186272,
+1.0576895500643977583230644928524336637254,
+2.1153791001287955166461289857048673274509,
+4.2307582002575910332922579714097346549018,
+8.4615164005151820665845159428194693098036,
+1.6923032801030364133169031885638938619607,
+3.3846065602060728266338063771277877239214,
+6.7692131204121456532676127542555754478429,
+1.3538426240824291306535225508511150895686,
+2.7076852481648582613070451017022301791371,
+5.4153704963297165226140902034044603582743,
+1.0830740992659433045228180406808920716549,
+2.1661481985318866090456360813617841433097,
+4.3322963970637732180912721627235682866194,
+8.6645927941275464361825443254471365732389,
+1.7329185588255092872365088650894273146478,
+3.4658371176510185744730177301788546292955,
+6.9316742353020371489460354603577092585911,
+1.3863348470604074297892070920715418517182,
+2.7726696941208148595784141841430837034364,
+5.5453393882416297191568283682861674068729,
+1.1090678776483259438313656736572334813746,
+2.2181357552966518876627313473144669627491,
+4.4362715105933037753254626946289339254983,
+8.8725430211866075506509253892578678509966,
+1.7745086042373215101301850778515735701993,
+3.5490172084746430202603701557031471403986,
+7.0980344169492860405207403114062942807973,
+1.4196068833898572081041480622812588561595,
+2.8392137667797144162082961245625177123189,
+5.6784275335594288324165922491250354246378,
+1.1356855067118857664833184498250070849276,
+2.2713710134237715329666368996500141698551,
+4.5427420268475430659332737993000283397103,
+9.0854840536950861318665475986000566794205,
+1.8170968107390172263733095197200113358841,
+3.6341936214780344527466190394400226717682,
+7.2683872429560689054932380788800453435364,
+1.4536774485912137810986476157760090687073,
+2.9073548971824275621972952315520181374146,
+5.8147097943648551243945904631040362748291,
+1.1629419588729710248789180926208072549658,
+2.3258839177459420497578361852416145099317,
+4.6517678354918840995156723704832290198633,
+9.3035356709837681990313447409664580397266,
+1.8607071341967536398062689481932916079453,
+3.7214142683935072796125378963865832158906,
+7.4428285367870145592250757927731664317813,
+1.4885657073574029118450151585546332863563,
+2.9771314147148058236900303171092665727125,
+5.9542628294296116473800606342185331454250,
+1.1908525658859223294760121268437066290850,
+2.3817051317718446589520242536874132581700,
+4.7634102635436893179040485073748265163400,
+9.5268205270873786358080970147496530326800,
+1.9053641054174757271616194029499306065360,
+3.8107282108349514543232388058998612130720,
+7.6214564216699029086464776117997224261440,
+1.5242912843339805817292955223599444852288,
+3.0485825686679611634585910447198889704576,
+6.0971651373359223269171820894397779409152,
+1.2194330274671844653834364178879555881830,
+2.4388660549343689307668728357759111763661,
+4.8777321098687378615337456715518223527322,
+9.7554642197374757230674913431036447054644,
+1.9510928439474951446134982686207289410929,
+3.9021856878949902892269965372414578821857,
+7.8043713757899805784539930744829157643715,
+1.5608742751579961156907986148965831528743,
+3.1217485503159922313815972297931663057486,
+6.2434971006319844627631944595863326114972,
+1.2486994201263968925526388919172665222994,
+2.4973988402527937851052777838345330445989,
+4.9947976805055875702105555676690660891978,
+9.9895953610111751404211111353381321783955,
+1.9979190722022350280842222270676264356791,
+3.9958381444044700561684444541352528713582,
+7.9916762888089401123368889082705057427164,
+1.5983352577617880224673777816541011485433,
+3.1966705155235760449347555633082022970866,
+6.3933410310471520898695111266164045941731,
+1.2786682062094304179739022253232809188346,
+2.5573364124188608359478044506465618376693,
+5.1146728248377216718956089012931236753385,
+1.0229345649675443343791217802586247350677,
+2.0458691299350886687582435605172494701354,
+4.0917382598701773375164871210344989402708,
+8.1834765197403546750329742420689978805416,
+1.6366953039480709350065948484137995761083,
+3.2733906078961418700131896968275991522166,
+6.5467812157922837400263793936551983044333,
+1.3093562431584567480052758787310396608867,
+2.6187124863169134960105517574620793217733,
+5.2374249726338269920211035149241586435466,
+1.0474849945267653984042207029848317287093,
+2.0949699890535307968084414059696634574187,
+4.1899399781070615936168828119393269148373,
+8.3798799562141231872337656238786538296746,
+1.6759759912428246374467531247757307659349,
+3.3519519824856492748935062495514615318698,
+6.7039039649712985497870124991029230637397,
+1.3407807929942597099574024998205846127479,
+2.6815615859885194199148049996411692254959,
+5.3631231719770388398296099992823384509917,
+1.0726246343954077679659219998564676901983,
+2.1452492687908155359318439997129353803967,
+4.2904985375816310718636879994258707607934,
+8.5809970751632621437273759988517415215868,
+1.7161994150326524287454751997703483043174,
+3.4323988300653048574909503995406966086347,
+6.8647976601306097149819007990813932172694,
+1.3729595320261219429963801598162786434539,
+2.7459190640522438859927603196325572869078,
+5.4918381281044877719855206392651145738155,
+1.0983676256208975543971041278530229147631,
+2.1967352512417951087942082557060458295262,
+4.3934705024835902175884165114120916590524,
+8.7869410049671804351768330228241833181049,
+1.7573882009934360870353666045648366636210,
+3.5147764019868721740707332091296733272420,
+7.0295528039737443481414664182593466544839,
+1.4059105607947488696282932836518693308968,
+2.8118211215894977392565865673037386617936,
+5.6236422431789954785131731346074773235871,
+1.1247284486357990957026346269214954647174,
+2.2494568972715981914052692538429909294348,
+4.4989137945431963828105385076859818588697,
+8.9978275890863927656210770153719637177394,
+1.7995655178172785531242154030743927435479,
+3.5991310356345571062484308061487854870958,
+7.1982620712691142124968616122975709741915,
+1.4396524142538228424993723224595141948383,
+2.8793048285076456849987446449190283896766,
+5.7586096570152913699974892898380567793532,
+1.1517219314030582739994978579676113558706,
+2.3034438628061165479989957159352227117413,
+4.6068877256122330959979914318704454234826,
+9.2137754512244661919959828637408908469651,
+1.8427550902448932383991965727481781693930,
+3.6855101804897864767983931454963563387861,
+7.3710203609795729535967862909927126775721,
+1.4742040721959145907193572581985425355144,
+2.9484081443918291814387145163970850710288,
+5.8968162887836583628774290327941701420577,
+1.1793632577567316725754858065588340284115,
+2.3587265155134633451509716131176680568231,
+4.7174530310269266903019432262353361136462,
+9.4349060620538533806038864524706722272923,
+1.8869812124107706761207772904941344454585,
+3.7739624248215413522415545809882688909169,
+7.5479248496430827044831091619765377818338,
+1.5095849699286165408966218323953075563668,
+3.0191699398572330817932436647906151127335,
+6.0383398797144661635864873295812302254671,
+1.2076679759428932327172974659162460450934,
+2.4153359518857864654345949318324920901868,
+4.8306719037715729308691898636649841803737,
+9.6613438075431458617383797273299683607473,
+1.9322687615086291723476759454659936721495,
+3.8645375230172583446953518909319873442989,
+7.7290750460345166893907037818639746885979,
+1.5458150092069033378781407563727949377196,
+3.0916300184138066757562815127455898754391,
+6.1832600368276133515125630254911797508783,
+1.2366520073655226703025126050982359501757,
+2.4733040147310453406050252101964719003513,
+4.9466080294620906812100504203929438007026,
+9.8932160589241813624201008407858876014053,
+1.9786432117848362724840201681571775202811,
+3.9572864235696725449680403363143550405621,
+7.9145728471393450899360806726287100811242,
+1.5829145694278690179872161345257420162248,
+3.1658291388557380359744322690514840324497,
+6.3316582777114760719488645381029680648994,
+1.2663316555422952143897729076205936129799,
+2.5326633110845904287795458152411872259597,
+5.0653266221691808575590916304823744519195,
+1.0130653244338361715118183260964748903839,
+2.0261306488676723430236366521929497807678,
+4.0522612977353446860472733043858995615356,
+8.1045225954706893720945466087717991230712,
+1.6209045190941378744189093217543598246142,
+3.2418090381882757488378186435087196492285,
+6.4836180763765514976756372870174392984569,
+1.2967236152753102995351274574034878596914,
+2.5934472305506205990702549148069757193828,
+5.1868944611012411981405098296139514387656,
+1.0373788922202482396281019659227902877531,
+2.0747577844404964792562039318455805755062,
+4.1495155688809929585124078636911611510124,
+8.2990311377619859170248157273823223020249,
+1.6598062275523971834049631454764644604050,
+3.3196124551047943668099262909529289208100,
+6.6392249102095887336198525819058578416199,
+1.3278449820419177467239705163811715683240,
+2.6556899640838354934479410327623431366480,
+5.3113799281676709868958820655246862732959,
+1.0622759856335341973791764131049372546592,
+2.1245519712670683947583528262098745093184,
+4.2491039425341367895167056524197490186367,
+8.4982078850682735790334113048394980372735,
+1.6996415770136547158066822609678996074547,
+3.3992831540273094316133645219357992149094,
+6.7985663080546188632267290438715984298188,
+1.3597132616109237726453458087743196859638,
+2.7194265232218475452906916175486393719275,
+5.4388530464436950905813832350972787438550,
+1.0877706092887390181162766470194557487710,
+2.1755412185774780362325532940389114975420,
+4.3510824371549560724651065880778229950840,
+8.7021648743099121449302131761556459901681,
+1.7404329748619824289860426352311291980336,
+3.4808659497239648579720852704622583960672,
+6.9617318994479297159441705409245167921344,
+1.3923463798895859431888341081849033584269,
+2.7846927597791718863776682163698067168538,
+5.5693855195583437727553364327396134337076,
+1.1138771039116687545510672865479226867415,
+2.2277542078233375091021345730958453734830,
+4.4555084156466750182042691461916907469660,
+8.9110168312933500364085382923833814939321,
+1.7822033662586700072817076584766762987864,
+3.5644067325173400145634153169533525975728,
+7.1288134650346800291268306339067051951457,
+1.4257626930069360058253661267813410390291,
+2.8515253860138720116507322535626820780583,
+5.7030507720277440233014645071253641561165,
+1.1406101544055488046602929014250728312233,
+2.2812203088110976093205858028501456624466,
+4.5624406176221952186411716057002913248932,
+9.1248812352443904372823432114005826497865,
+1.8249762470488780874564686422801165299573,
+3.6499524940977561749129372845602330599146,
+7.2999049881955123498258745691204661198292,
+1.4599809976391024699651749138240932239658,
+2.9199619952782049399303498276481864479317,
+5.8399239905564098798606996552963728958633,
+1.1679847981112819759721399310592745791727,
+2.3359695962225639519442798621185491583453,
+4.6719391924451279038885597242370983166907,
+9.3438783848902558077771194484741966333813,
+1.8687756769780511615554238896948393266763,
+3.7375513539561023231108477793896786533525,
+7.4751027079122046462216955587793573067051,
+1.4950205415824409292443391117558714613410,
+2.9900410831648818584886782235117429226820,
+5.9800821663297637169773564470234858453641,
+1.1960164332659527433954712894046971690728,
+2.3920328665319054867909425788093943381456,
+4.7840657330638109735818851576187886762912,
+9.5681314661276219471637703152375773525825,
+1.9136262932255243894327540630475154705165,
+3.8272525864510487788655081260950309410330,
+7.6545051729020975577310162521900618820660,
+1.5309010345804195115462032504380123764132,
+3.0618020691608390230924065008760247528264,
+6.1236041383216780461848130017520495056528,
+1.2247208276643356092369626003504099011306,
+2.4494416553286712184739252007008198022611,
+4.8988833106573424369478504014016396045222,
+9.7977666213146848738957008028032792090445,
+1.9595533242629369747791401605606558418089,
+3.9191066485258739495582803211213116836178,
+7.8382132970517478991165606422426233672356,
+1.5676426594103495798233121284485246734471,
+3.1352853188206991596466242568970493468942,
+6.2705706376413983192932485137940986937885,
+1.2541141275282796638586497027588197387577,
+2.5082282550565593277172994055176394775154,
+5.0164565101131186554345988110352789550308,
+1.0032913020226237310869197622070557910062,
+2.0065826040452474621738395244141115820123,
+4.0131652080904949243476790488282231640246,
+8.0263304161809898486953580976564463280492,
+1.6052660832361979697390716195312892656098,
+3.2105321664723959394781432390625785312197,
+6.4210643329447918789562864781251570624394,
+1.2842128665889583757912572956250314124879,
+2.5684257331779167515825145912500628249758,
+5.1368514663558335031650291825001256499515,
+1.0273702932711667006330058365000251299903,
+2.0547405865423334012660116730000502599806,
+4.1094811730846668025320233460001005199612,
+8.2189623461693336050640466920002010399224,
+1.6437924692338667210128093384000402079845,
+3.2875849384677334420256186768000804159690,
+6.5751698769354668840512373536001608319379,
+1.3150339753870933768102474707200321663876,
+2.6300679507741867536204949414400643327752,
+5.2601359015483735072409898828801286655503,
+1.0520271803096747014481979765760257331101,
+2.1040543606193494028963959531520514662201,
+4.2081087212386988057927919063041029324403,
+8.4162174424773976115855838126082058648805,
+1.6832434884954795223171167625216411729761,
+3.3664869769909590446342335250432823459522,
+6.7329739539819180892684670500865646919044,
+1.3465947907963836178536934100173129383809,
+2.6931895815927672357073868200346258767618,
+5.3863791631855344714147736400692517535235,
+1.0772758326371068942829547280138503507047,
+2.1545516652742137885659094560277007014094,
+4.3091033305484275771318189120554014028188,
+8.6182066610968551542636378241108028056377,
+1.7236413322193710308527275648221605611275,
+3.4472826644387420617054551296443211222551,
+6.8945653288774841234109102592886422445101,
+1.3789130657754968246821820518577284489020,
+2.7578261315509936493643641037154568978041,
+5.5156522631019872987287282074309137956081,
+1.1031304526203974597457456414861827591216,
+2.2062609052407949194914912829723655182432,
+4.4125218104815898389829825659447310364865,
+8.8250436209631796779659651318894620729730,
+1.7650087241926359355931930263778924145946,
+3.5300174483852718711863860527557848291892,
+7.0600348967705437423727721055115696583784,
+1.4120069793541087484745544211023139316757,
+2.8240139587082174969491088422046278633514,
+5.6480279174164349938982176844092557267027,
+1.1296055834832869987796435368818511453405,
+2.2592111669665739975592870737637022906811,
+4.5184223339331479951185741475274045813622,
+9.0368446678662959902371482950548091627243,
+1.8073689335732591980474296590109618325449,
+3.6147378671465183960948593180219236650897,
+7.2294757342930367921897186360438473301795,
+1.4458951468586073584379437272087694660359,
+2.8917902937172147168758874544175389320718,
+5.7835805874344294337517749088350778641436,
+1.1567161174868858867503549817670155728287,
+2.3134322349737717735007099635340311456574,
+4.6268644699475435470014199270680622913149,
+9.2537289398950870940028398541361245826297,
+1.8507457879790174188005679708272249165259,
+3.7014915759580348376011359416544498330519,
+7.4029831519160696752022718833088996661038,
+1.4805966303832139350404543766617799332208,
+2.9611932607664278700809087533235598664415,
+5.9223865215328557401618175066471197328830,
+1.1844773043065711480323635013294239465766,
+2.3689546086131422960647270026588478931532,
+4.7379092172262845921294540053176957863064,
+9.4758184344525691842589080106353915726128,
+1.8951636868905138368517816021270783145226,
+3.7903273737810276737035632042541566290451,
+7.5806547475620553474071264085083132580903,
+1.5161309495124110694814252817016626516181,
+3.0322618990248221389628505634033253032361,
+6.0645237980496442779257011268066506064722,
+1.2129047596099288555851402253613301212944,
+2.4258095192198577111702804507226602425889,
+4.8516190384397154223405609014453204851778,
+9.7032380768794308446811218028906409703555,
+1.9406476153758861689362243605781281940711,
+3.8812952307517723378724487211562563881422,
+7.7625904615035446757448974423125127762844,
+1.5525180923007089351489794884625025552569,
+3.1050361846014178702979589769250051105138,
+6.2100723692028357405959179538500102210275,
+1.2420144738405671481191835907700020442055,
+2.4840289476811342962383671815400040884110,
+4.9680578953622685924767343630800081768220,
+9.9361157907245371849534687261600163536441,
+1.9872231581449074369906937452320032707288,
+3.9744463162898148739813874904640065414576,
+7.9488926325796297479627749809280130829153,
+1.5897785265159259495925549961856026165831,
+3.1795570530318518991851099923712052331661,
+6.3591141060637037983702199847424104663322,
+1.2718228212127407596740439969484820932664,
+2.5436456424254815193480879938969641865329,
+5.0872912848509630386961759877939283730658,
+1.0174582569701926077392351975587856746132,
+2.0349165139403852154784703951175713492263,
+4.0698330278807704309569407902351426984526,
+8.1396660557615408619138815804702853969052,
+1.6279332111523081723827763160940570793810,
+3.2558664223046163447655526321881141587621,
+6.5117328446092326895311052643762283175242,
+1.3023465689218465379062210528752456635048,
+2.6046931378436930758124421057504913270097,
+5.2093862756873861516248842115009826540193,
+1.0418772551374772303249768423001965308039,
+2.0837545102749544606499536846003930616077,
+4.1675090205499089212999073692007861232155,
+8.3350180410998178425998147384015722464309,
+1.6670036082199635685199629476803144492862,
+3.3340072164399271370399258953606288985724,
+6.6680144328798542740798517907212577971448,
+1.3336028865759708548159703581442515594290,
+2.6672057731519417096319407162885031188579,
+5.3344115463038834192638814325770062377158,
+1.0668823092607766838527762865154012475432,
+2.1337646185215533677055525730308024950863,
+4.2675292370431067354111051460616049901726,
+8.5350584740862134708222102921232099803453,
+1.7070116948172426941644420584246419960691,
+3.4140233896344853883288841168492839921381,
+6.8280467792689707766577682336985679842762,
+1.3656093558537941553315536467397135968552,
+2.7312187117075883106631072934794271937105,
+5.4624374234151766213262145869588543874210,
+1.0924874846830353242652429173917708774842,
+2.1849749693660706485304858347835417549684,
+4.3699499387321412970609716695670835099368,
+8.7398998774642825941219433391341670198736,
+1.7479799754928565188243886678268334039747,
+3.4959599509857130376487773356536668079494,
+6.9919199019714260752975546713073336158989,
+1.3983839803942852150595109342614667231798,
+2.7967679607885704301190218685229334463595,
+5.5935359215771408602380437370458668927191,
+1.1187071843154281720476087474091733785438,
+2.2374143686308563440952174948183467570876,
+4.4748287372617126881904349896366935141753,
+8.9496574745234253763808699792733870283505,
+1.7899314949046850752761739958546774056701,
+3.5798629898093701505523479917093548113402,
+7.1597259796187403011046959834187096226804,
+1.4319451959237480602209391966837419245361,
+2.8638903918474961204418783933674838490722,
+5.7277807836949922408837567867349676981443,
+1.1455561567389984481767513573469935396289,
+2.2911123134779968963535027146939870792577,
+4.5822246269559937927070054293879741585155,
+9.1644492539119875854140108587759483170310,
+1.8328898507823975170828021717551896634062,
+3.6657797015647950341656043435103793268124,
+7.3315594031295900683312086870207586536248,
+1.4663118806259180136662417374041517307250,
+2.9326237612518360273324834748083034614499,
+5.8652475225036720546649669496166069228998,
+1.1730495045007344109329933899233213845800,
+2.3460990090014688218659867798466427691599,
+4.6921980180029376437319735596932855383198,
+9.3843960360058752874639471193865710766397,
+1.8768792072011750574927894238773142153279,
+3.7537584144023501149855788477546284306559,
+7.5075168288047002299711576955092568613118,
+1.5015033657609400459942315391018513722624,
+3.0030067315218800919884630782037027445247,
+6.0060134630437601839769261564074054890494,
+1.2012026926087520367953852312814810978099,
+2.4024053852175040735907704625629621956198,
+4.8048107704350081471815409251259243912395,
+9.6096215408700162943630818502518487824791,
+1.9219243081740032588726163700503697564958,
+3.8438486163480065177452327401007395129916,
+7.6876972326960130354904654802014790259832,
+1.5375394465392026070980930960402958051966,
+3.0750788930784052141961861920805916103933,
+6.1501577861568104283923723841611832207866,
+1.2300315572313620856784744768322366441573,
+2.4600631144627241713569489536644732883146,
+4.9201262289254483427138979073289465766293,
+9.8402524578508966854277958146578931532585,
+1.9680504915701793370855591629315786306517,
+3.9361009831403586741711183258631572613034,
+7.8722019662807173483422366517263145226068,
+1.5744403932561434696684473303452629045214,
+3.1488807865122869393368946606905258090427,
+6.2977615730245738786737893213810516180855,
+1.2595523146049147757347578642762103236171,
+2.5191046292098295514695157285524206472342,
+5.0382092584196591029390314571048412944684,
+1.0076418516839318205878062914209682588937,
+2.0152837033678636411756125828419365177874,
+4.0305674067357272823512251656838730355747,
+8.0611348134714545647024503313677460711494,
+1.6122269626942909129404900662735492142299,
+3.2244539253885818258809801325470984284598,
+6.4489078507771636517619602650941968569195,
+1.2897815701554327303523920530188393713839,
+2.5795631403108654607047841060376787427678,
+5.1591262806217309214095682120753574855356,
+1.0318252561243461842819136424150714971071,
+2.0636505122486923685638272848301429942142,
+4.1273010244973847371276545696602859884285,
+8.2546020489947694742553091393205719768570,
+1.6509204097989538948510618278641143953714,
+3.3018408195979077897021236557282287907428,
+6.6036816391958155794042473114564575814856,
+1.3207363278391631158808494622912915162971,
+2.6414726556783262317616989245825830325942,
+5.2829453113566524635233978491651660651885,
+1.0565890622713304927046795698330332130377,
+2.1131781245426609854093591396660664260754,
+4.2263562490853219708187182793321328521508,
+8.4527124981706439416374365586642657043016,
+1.6905424996341287883274873117328531408603,
+3.3810849992682575766549746234657062817206,
+6.7621699985365151533099492469314125634412,
+1.3524339997073030306619898493862825126882,
+2.7048679994146060613239796987725650253765,
+5.4097359988292121226479593975451300507530,
+1.0819471997658424245295918795090260101506,
+2.1638943995316848490591837590180520203012,
+4.3277887990633696981183675180361040406024,
+8.6555775981267393962367350360722080812048,
+1.7311155196253478792473470072144416162410,
+3.4622310392506957584946940144288832324819,
+6.9244620785013915169893880288577664649638,
+1.3848924157002783033978776057715532929928,
+2.7697848314005566067957552115431065859855,
+5.5395696628011132135915104230862131719711,
+1.1079139325602226427183020846172426343942,
+2.2158278651204452854366041692344852687884,
+4.4316557302408905708732083384689705375769,
+8.8633114604817811417464166769379410751537,
+1.7726622920963562283492833353875882150307,
+3.5453245841927124566985666707751764300615,
+7.0906491683854249133971333415503528601230,
+1.4181298336770849826794266683100705720246,
+2.8362596673541699653588533366201411440492,
+5.6725193347083399307177066732402822880984,
+1.1345038669416679861435413346480564576197,
+2.2690077338833359722870826692961129152393,
+4.5380154677666719445741653385922258304787,
+9.0760309355333438891483306771844516609574,
+1.8152061871066687778296661354368903321915,
+3.6304123742133375556593322708737806643830,
+7.2608247484266751113186645417475613287659,
+1.4521649496853350222637329083495122657532,
+2.9043298993706700445274658166990245315064,
+5.8086597987413400890549316333980490630127,
+1.1617319597482680178109863266796098126025,
+2.3234639194965360356219726533592196252051,
+4.6469278389930720712439453067184392504102,
+9.2938556779861441424878906134368785008204,
+1.8587711355972288284975781226873757001641,
+3.7175422711944576569951562453747514003282,
+7.4350845423889153139903124907495028006563,
+1.4870169084777830627980624981499005601313,
+2.9740338169555661255961249962998011202625,
+5.9480676339111322511922499925996022405250,
+1.1896135267822264502384499985199204481050,
+2.3792270535644529004768999970398408962100,
+4.7584541071289058009537999940796817924200,
+9.5169082142578116019075999881593635848401,
+1.9033816428515623203815199976318727169680,
+3.8067632857031246407630399952637454339360,
+7.6135265714062492815260799905274908678721,
+1.5227053142812498563052159981054981735744,
+3.0454106285624997126104319962109963471488,
+6.0908212571249994252208639924219926942976,
+1.2181642514249998850441727984843985388595,
+2.4363285028499997700883455969687970777191,
+4.8726570056999995401766911939375941554381,
+9.7453140113999990803533823878751883108762,
+1.9490628022799998160706764775750376621752,
+3.8981256045599996321413529551500753243505,
+7.7962512091199992642827059103001506487010,
+1.5592502418239998528565411820600301297402,
+3.1185004836479997057130823641200602594804,
+6.2370009672959994114261647282401205189608,
+1.2474001934591998822852329456480241037922,
+2.4948003869183997645704658912960482075843,
+4.9896007738367995291409317825920964151686,
+9.9792015476735990582818635651841928303373,
+1.9958403095347198116563727130368385660675,
+3.9916806190694396233127454260736771321349,
+7.9833612381388792466254908521473542642698,
+1.5966722476277758493250981704294708528540,
+3.1933444952555516986501963408589417057079,
+6.3866889905111033973003926817178834114158,
+1.2773377981022206794600785363435766822832,
+2.5546755962044413589201570726871533645663,
+5.1093511924088827178403141453743067291327,
+1.0218702384817765435680628290748613458265,
+2.0437404769635530871361256581497226916531,
+4.0874809539271061742722513162994453833061,
+8.1749619078542123485445026325988907666123,
+1.6349923815708424697089005265197781533225,
+3.2699847631416849394178010530395563066449,
+6.5399695262833698788356021060791126132898,
+1.3079939052566739757671204212158225226580,
+2.6159878105133479515342408424316450453159,
+5.2319756210266959030684816848632900906319,
+1.0463951242053391806136963369726580181264,
+2.0927902484106783612273926739453160362527,
+4.1855804968213567224547853478906320725055,
+8.3711609936427134449095706957812641450110,
+1.6742321987285426889819141391562528290022,
+3.3484643974570853779638282783125056580044,
+6.6969287949141707559276565566250113160088,
+1.3393857589828341511855313113250022632018,
+2.6787715179656683023710626226500045264035,
+5.3575430359313366047421252453000090528070,
+1.0715086071862673209484250490600018105614,
+2.1430172143725346418968500981200036211228,
+4.2860344287450692837937001962400072422456,
+8.5720688574901385675874003924800144844912,
+1.7144137714980277135174800784960028968982,
+3.4288275429960554270349601569920057937965,
+6.8576550859921108540699203139840115875930,
+1.3715310171984221708139840627968023175186,
+2.7430620343968443416279681255936046350372,
+5.4861240687936886832559362511872092700744,
+1.0972248137587377366511872502374418540149,
+2.1944496275174754733023745004748837080298,
+4.3888992550349509466047490009497674160595,
+8.7777985100699018932094980018995348321190,
+1.7555597020139803786418996003799069664238,
+3.5111194040279607572837992007598139328476,
+7.0222388080559215145675984015196278656952,
+1.4044477616111843029135196803039255731390,
+2.8088955232223686058270393606078511462781,
+5.6177910464447372116540787212157022925562,
+1.1235582092889474423308157442431404585112,
+2.2471164185778948846616314884862809170225,
+4.4942328371557897693232629769725618340449,
+8.9884656743115795386465259539451236680899,
+1.7976931348623159077293051907890247336180
+};
+
+int exppow[2048] = {
+-308,-308,-308,-308,-307,-307,-307,-306,-306,-306,-305,-305,-305,-305,-304,-304,-304,-303,-303,-303,-302,-302,-302,-302,-301,-301,-301,-300,-300,-300,-299,-299,-299,-299,-298,-298,-298,-297,-297,-297,-296,-296,-296,-296,-295,-295,-295,-294,-294,-294,-293,-293,-293,-292,-292,-292,-292,-291,-291,-291,-290,-290,-290,-289,-289,-289,-289,-288,-288,-288,-287,-287,-287,-286,-286,-286,-286,-285,-285,-285,-284,-284,-284,-283,-283,-283,-283,-282,-282,-282,-281,-281,-281,-280,-280,-280,-280,-279,-2 [...]
+
+
+int monthday[366] =
+{ 229,301,302,303,304,305,306,307,308,309,310,311,312,313,314,315,316,317,318,319,320,321,322,323,324,325,326,
+327,328,329,330,331,401,402,403,404,405,406,407,408,409,410,411,412,413,414,415,416,417,418,419,420,421,422,423,424,
+425,426,427,428,429,430,501,502,503,504,505,506,507,508,509,510,511,512,513,514,515,516,517,518,519,520,521,522,523,
+524,525,526,527,528,529,530,531,601,602,603,604,605,606,607,608,609,610,611,612,613,614,615,616,617,618,619,620,621,
+622,623,624,625,626,627,628,629,630,701,702,703,704,705,706,707,708,709,710,711,712,713,714,715,716,717,718,719,720,
+721,722,723,724,725,726,727,728,729,730,731,801,802,803,804,805,806,807,808,809,810,811,812,813,814,815,816,817,818,
+819,820,821,822,823,824,825,826,827,828,829,830,831,901,902,903,904,905,906,907,908,909,910,911,912,913,914,915,916,
+917,918,919,920,921,922,923,924,925,926,927,928,929,930,1001,1002,1003,1004,1005,1006,1007,1008,1009,1010,1011,1012,
+1013,1014,1015,1016,1017,1018,1019,1020,1021,1022,1023,1024,1025,1026,1027,1028,1029,1030,1031,1101,1102,1103,1104,
+1105,1106,1107,1108,1109,1110,1111,1112,1113,1114,1115,1116,1117,1118,1119,1120,1121,1122,1123,1124,1125,1126,1127,
+1128,1129,1130,1201,1202,1203,1204,1205,1206,1207,1208,1209,1210,1211,1212,1213,1214,1215,1216,1217,1218,1219,1220,
+1221,1222,1223,1224,1225,1226,1227,1228,1229,1230,1231,101,102,103,104,105,106,107,108,109,110,111,112,113,114,115,
+116,117,118,119,120,121,122,123,124,125,126,127,128,129,130,131,201,202,203,204,205,206,207,208,209,210,211,212,213,
+214,215,216,217,218,219,220,221,222,223,224,225,226,227,228 };
+
+
+
diff --git a/src/gsumm.c b/src/gsumm.c
index 83a1e6a..bfe4f50 100644
--- a/src/gsumm.c
+++ b/src/gsumm.c
@@ -3,10 +3,27 @@
 
 static int *grp = NULL;      // the group of each x item, like a factor
 static int ngrp = 0;         // number of groups
-static int *grpsize = NULL;  // size of each group, used by gmean not gsum
+static int *grpsize = NULL;  // size of each group, used by gmean (and gmedian) not gsum
 static int grpn = 0;         // length of underlying x == length(grp)
+static int *irows;           // GForce support for subsets in 'i' (TODO: joins in 'i')
+static int irowslen = -1;    // -1 is for irows = NULL
 
-SEXP gstart(SEXP o, SEXP f, SEXP l) {
+// for gmedian
+static int maxgrpn = 0;
+static int *oo = NULL;
+static int *ff = NULL;
+static int isunsorted = 0;
+static union {double d;
+              long long ll;} u;
+
+// from R's src/cov.c (for variance / sd)
+#ifdef HAVE_LONG_DOUBLE
+# define SQRTL sqrtl
+#else
+# define SQRTL sqrt
+#endif
+
+SEXP gstart(SEXP o, SEXP f, SEXP l, SEXP irowsArg) {
     int i, j, g, *this;
     // clock_t start = clock();
     if (!isInteger(o)) error("o is not integer vector");
@@ -18,9 +35,9 @@ SEXP gstart(SEXP o, SEXP f, SEXP l) {
     grpsize = INTEGER(l);  // l will be protected in calling R scope until gend(), too
     for (i=0; i<ngrp; i++) grpn+=grpsize[i];
     if (LENGTH(o) && LENGTH(o)!=grpn) error("o has length %d but sum(l)=%d", LENGTH(o), grpn);
-    grp = malloc(grpn * sizeof(int));
-    if (!grp) error("Unable to allocate %d * %d bytes in gstart", grpn, sizeof(int));
+    grp = (int *)R_alloc(grpn, sizeof(int));
     if (LENGTH(o)) {
+        isunsorted = 1; // for gmedian
         for (g=0; g<ngrp; g++) {
             this = INTEGER(o) + INTEGER(f)[g]-1;
             for (j=0; j<grpsize[g]; j++)  grp[ this[j]-1 ] = g;
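The grp-filling loop in this hunk can be exercised standalone. A minimal sketch, outside the R API (the name `fill_grp` is illustrative; `o` and `f` are 1-based as produced by R-side ordering, matching the loop above):

```c
#include <assert.h>

/* Fill grp[row] = group id, given 1-based order vector o, 1-based group
 * start positions f (indices into o), and group sizes l. Mirrors the
 * LENGTH(o) branch of gstart. */
static void fill_grp(const int *o, const int *f, const int *l,
                     int ngrp, int *grp) {
    for (int g = 0; g < ngrp; g++) {
        const int *rows = o + f[g] - 1;   /* rows of group g, still 1-based */
        for (int j = 0; j < l[g]; j++)
            grp[rows[j] - 1] = g;         /* convert to 0-based row index */
    }
}
```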
@@ -31,12 +48,21 @@ SEXP gstart(SEXP o, SEXP f, SEXP l) {
             for (j=0; j<grpsize[g]; j++)  this[j] = g;
         }
     }
+    // for gmedian
+    // initialise maxgrpn
+    maxgrpn = INTEGER(getAttrib(o, install("maxgrpn")))[0];
+    oo = INTEGER(o);
+    ff = INTEGER(f);
+
+    irows = INTEGER(irowsArg);
+    if (!isNull(irowsArg)) irowslen = length(irowsArg);
+
     // Rprintf("gstart took %8.3f\n", 1.0*(clock()-start)/CLOCKS_PER_SEC);
     return(R_NilValue);
 }
 
 SEXP gend() {
-    free(grp); grp = NULL; ngrp = 0;
+    ngrp = 0; maxgrpn = 0; irowslen = -1; isunsorted = 0;
     return(R_NilValue);
 }
 
@@ -46,11 +72,12 @@ SEXP gsum(SEXP x, SEXP narm)
 {
     if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
     if (!isVectorAtomic(x)) error("GForce sum can only be applied to columns, not .SD or similar. To sum all items in a list such as .SD, either add the prefix base::sum(.SD) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lapply(.SD,sum),by=,.SDcols=]'");
-    int i, thisgrp;
-    int n = LENGTH(x);
+    if (inherits(x, "factor")) error("sum is not meaningful for factors.");
+    int i, ix, thisgrp;
+    int n = (irowslen == -1) ? length(x) : irowslen;
     //clock_t start = clock();
     SEXP ans;
-    if (grpn != length(x)) error("grpn [%d] != length(x) [%d] in gsum", grpn, length(x));
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gsum", grpn, n);
     long double *s = malloc(ngrp * sizeof(long double));
     if (!s) error("Unable to allocate %d * %d bytes for gsum", ngrp, sizeof(long double));
     memset(s, 0, ngrp * sizeof(long double)); // all-0 bits == (long double)0, checked in init.c
@@ -58,11 +85,12 @@ SEXP gsum(SEXP x, SEXP narm)
     case LGLSXP: case INTSXP:
         for (i=0; i<n; i++) {
             thisgrp = grp[i];
-            if(INTEGER(x)[i] == NA_INTEGER) { 
+            ix = (irowslen == -1) ? i : irows[i]-1;
+            if(INTEGER(x)[ix] == NA_INTEGER) { 
                 if (!LOGICAL(narm)[0]) s[thisgrp] = NA_REAL;  // Let NA_REAL propagate from here. R_NaReal is IEEE.
                 continue;
             }
-            s[thisgrp] += INTEGER(x)[i];  // no under/overflow here, s is long double (like base)
+            s[thisgrp] += INTEGER(x)[ix];  // no under/overflow here, s is long double (like base)
         }
         ans = PROTECT(allocVector(INTSXP, ngrp));
         for (i=0; i<ngrp; i++) {
@@ -83,8 +111,9 @@ SEXP gsum(SEXP x, SEXP narm)
         ans = PROTECT(allocVector(REALSXP, ngrp));
         for (i=0; i<n; i++) {
             thisgrp = grp[i];
-            if(ISNAN(REAL(x)[i]) && LOGICAL(narm)[0]) continue;  // else let NA_REAL propogate from here
-            s[thisgrp] += REAL(x)[i];  // done in long double, like base
+            ix = (irowslen == -1) ? i : irows[i]-1;
+            if(ISNAN(REAL(x)[ix]) && LOGICAL(narm)[0]) continue;  // else let NA_REAL propagate from here
+            s[thisgrp] += REAL(x)[ix];  // done in long double, like base
         }
         for (i=0; i<ngrp; i++) {
             if (s[i] > DBL_MAX) REAL(ans)[i] = R_PosInf;
@@ -106,10 +135,11 @@ SEXP gsum(SEXP x, SEXP narm)
 SEXP gmean(SEXP x, SEXP narm)
 {
     SEXP ans;
-    int i, protecti=0, thisgrp, n;
+    int i, ix, protecti=0, thisgrp, n;
     //clock_t start = clock();
     if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
     if (!isVectorAtomic(x)) error("GForce mean can only be applied to columns, not .SD or similar. Likely you're looking for 'DT[,lapply(.SD,mean),by=,.SDcols=]'. See ?data.table.");
+    if (inherits(x, "factor")) error("mean is not meaningful for factors.");
     if (!LOGICAL(narm)[0]) {
         ans = PROTECT(gsum(x,narm)); protecti++;
         switch(TYPEOF(ans)) {
@@ -125,8 +155,8 @@ SEXP gmean(SEXP x, SEXP narm)
         return(ans);
     }
     // na.rm=TRUE.  Similar to gsum, but we need to count the non-NA as well for the divisor
-    n = LENGTH(x);
-    if (grpn != n) error("grpn [%d] != length(x) [%d] in gsum", grpn, length(x));
+    n = (irowslen == -1) ? length(x) : irowslen;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gmean", grpn, n);
 
     long double *s = malloc(ngrp * sizeof(long double));
     if (!s) error("Unable to allocate %d * %d bytes for sum in gmean na.rm=TRUE", ngrp, sizeof(long double));
@@ -140,16 +170,18 @@ SEXP gmean(SEXP x, SEXP narm)
     case LGLSXP: case INTSXP:
         for (i=0; i<n; i++) {
             thisgrp = grp[i];
-            if(INTEGER(x)[i] == NA_INTEGER) continue;
-            s[thisgrp] += INTEGER(x)[i];  // no under/overflow here, s is long double
+            ix = (irowslen == -1) ? i : irows[i]-1;
+            if(INTEGER(x)[ix] == NA_INTEGER) continue;
+            s[thisgrp] += INTEGER(x)[ix];  // no under/overflow here, s is long double
             c[thisgrp]++;
         }
         break;
     case REALSXP:
         for (i=0; i<n; i++) {
             thisgrp = grp[i];
-            if (ISNAN(REAL(x)[i])) continue;
-            s[thisgrp] += REAL(x)[i];
+            ix = (irowslen == -1) ? i : irows[i]-1;
+            if (ISNAN(REAL(x)[ix])) continue;
+            s[thisgrp] += REAL(x)[ix];
             c[thisgrp]++;
         }
         break;
@@ -172,55 +204,44 @@ SEXP gmean(SEXP x, SEXP narm)
     return(ans);
 }
 
-// TO DO: gsd, gprod, gwhich.min, gwhich.max
-
 // gmin
 SEXP gmin(SEXP x, SEXP narm)
 {
     if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
     if (!isVectorAtomic(x)) error("GForce min can only be applied to columns, not .SD or similar. To find min of all items in a list such as .SD, either add the prefix base::min(.SD) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lapply(.SD,min),by=,.SDcols=]'");
-    R_len_t i, thisgrp=0;
-    int n = LENGTH(x);
+    if (inherits(x, "factor")) error("min is not meaningful for factors.");
+    R_len_t i, ix, thisgrp=0;
+    int n = (irowslen == -1) ? length(x) : irowslen;
     //clock_t start = clock();
     SEXP ans;
-    if (grpn != length(x)) error("grpn [%d] != length(x) [%d] in gmin", grpn, length(x));
-    char *update = Calloc(ngrp, char);
-    if (update == NULL) error("Unable to allocate %d * %d bytes for gmin", ngrp, sizeof(char));
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gmin", grpn, n);
     switch(TYPEOF(x)) {
     case LGLSXP: case INTSXP:
         ans = PROTECT(allocVector(INTSXP, ngrp));
-        for (i=0; i<ngrp; i++) INTEGER(ans)[i] = 0;
         if (!LOGICAL(narm)[0]) {
+            for (i=0; i<ngrp; i++) INTEGER(ans)[i] = INT_MAX;
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (INTEGER(x)[i] != NA_INTEGER && INTEGER(ans)[thisgrp] != NA_INTEGER) {
-                    if ( update[thisgrp] != 1 || INTEGER(ans)[thisgrp] > INTEGER(x)[i] ) {
-                        INTEGER(ans)[thisgrp] = INTEGER(x)[i];
-                        if (update[thisgrp] != 1) update[thisgrp] = 1;
-                    }
-                } else INTEGER(ans)[thisgrp] = NA_INTEGER;
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (INTEGER(x)[ix] < INTEGER(ans)[thisgrp])   // NA_INTEGER==INT_MIN checked in init.c
+                    INTEGER(ans)[thisgrp] = INTEGER(x)[ix];
             }
         } else {
+            for (i=0; i<ngrp; i++) INTEGER(ans)[i] = NA_INTEGER;
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (INTEGER(x)[i] != NA_INTEGER) {
-                    if ( update[thisgrp] != 1 || INTEGER(ans)[thisgrp] > INTEGER(x)[i] ) {
-                        INTEGER(ans)[thisgrp] = INTEGER(x)[i];
-                        if (update[thisgrp] != 1) update[thisgrp] = 1;
-                    }
-                } else {
-                    if (update[thisgrp] != 1) {
-                        INTEGER(ans)[thisgrp] = NA_INTEGER;
-                    }
-                }
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (INTEGER(x)[ix] == NA_INTEGER) continue;
+                if (INTEGER(ans)[thisgrp] == NA_INTEGER || INTEGER(x)[ix] < INTEGER(ans)[thisgrp])
+                    INTEGER(ans)[thisgrp] = INTEGER(x)[ix];
             }
             for (i=0; i<ngrp; i++) {
-                if (update[i] != 1)  {// equivalent of INTEGER(ans)[thisgrp] == NA_INTEGER
+                if (INTEGER(ans)[i] == NA_INTEGER) {
                     warning("No non-missing values found in at least one group. Coercing to numeric type and returning 'Inf' for such groups to be consistent with base");
                     UNPROTECT(1);
                     ans = PROTECT(coerceVector(ans, REALSXP));
                     for (i=0; i<ngrp; i++) {
-                        if (update[i] != 1) REAL(ans)[i] = R_PosInf;
+                        if (ISNA(REAL(ans)[i])) REAL(ans)[i] = R_PosInf;
                     }
                     break;
                 }
@@ -229,33 +250,33 @@ SEXP gmin(SEXP x, SEXP narm)
         break;
     case STRSXP:
         ans = PROTECT(allocVector(STRSXP, ngrp));
-        for (i=0; i<ngrp; i++) SET_STRING_ELT(ans, i, mkChar(""));
         if (!LOGICAL(narm)[0]) {
+            for (i=0; i<ngrp; i++) SET_STRING_ELT(ans, i, R_BlankString);
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (STRING_ELT(x, i) != NA_STRING && STRING_ELT(ans, thisgrp) != NA_STRING) {
-                    if ( update[thisgrp] != 1 || strcmp(CHAR(STRING_ELT(ans, thisgrp)), CHAR(STRING_ELT(x, i))) > 0 ) {
-                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, i));
-                        if (update[thisgrp] != 1) update[thisgrp] = 1;
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (STRING_ELT(x, ix) == NA_STRING) {
+                    SET_STRING_ELT(ans, thisgrp, NA_STRING);
+                } else {
+                    if (STRING_ELT(ans, thisgrp) == R_BlankString ||
+                        (STRING_ELT(ans, thisgrp) != NA_STRING && strcmp(CHAR(STRING_ELT(x, ix)), CHAR(STRING_ELT(ans, thisgrp))) < 0 )) {
+                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, ix));
                     }
-                } else SET_STRING_ELT(ans, thisgrp, NA_STRING);
+                }
             }
         } else {
+            for (i=0; i<ngrp; i++) SET_STRING_ELT(ans, i, NA_STRING);
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (STRING_ELT(x, i) != NA_STRING) {
-                    if ( update[thisgrp] != 1 || strcmp(CHAR(STRING_ELT(ans, thisgrp)), CHAR(STRING_ELT(x, i))) > 0 ) {
-                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, i));
-                        if (update[thisgrp] != 1) update[thisgrp] = 1;
-                    }
-                } else {
-                    if (update[thisgrp] != 1) {
-                        SET_STRING_ELT(ans, thisgrp, NA_STRING);
-                    }
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (STRING_ELT(x, ix) == NA_STRING) continue;
+                if (STRING_ELT(ans, thisgrp) == NA_STRING || 
+                    strcmp(CHAR(STRING_ELT(x, ix)), CHAR(STRING_ELT(ans, thisgrp))) < 0) {
+                    SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, ix));
                 }
             }
             for (i=0; i<ngrp; i++) {
-                if (update[i] != 1)  {// equivalent of INTEGER(ans)[thisgrp] == NA_INTEGER
+                if (STRING_ELT(ans, i)==NA_STRING) {
                     warning("No non-missing values found in at least one group. Returning 'NA' for such groups to be consistent with base");
                     break;
                 }
@@ -264,35 +285,27 @@ SEXP gmin(SEXP x, SEXP narm)
         break;
     case REALSXP:
         ans = PROTECT(allocVector(REALSXP, ngrp));
-        for (i=0; i<ngrp; i++) REAL(ans)[i] = 0;
-        if (!LOGICAL(narm)[0]) {
+        if (!LOGICAL(narm)[0]) {    
+            for (i=0; i<ngrp; i++) REAL(ans)[i] = R_PosInf;
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if ( !ISNA(REAL(x)[i]) && !ISNA(REAL(ans)[thisgrp]) ) {
-                    if ( update[thisgrp] != 1 || REAL(ans)[thisgrp] > REAL(x)[i] ) {
-                        REAL(ans)[thisgrp] = REAL(x)[i];
-                        if (update[thisgrp] != 1) update[thisgrp] = 1;
-                    }
-                } else REAL(ans)[thisgrp] = NA_REAL;
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (ISNAN(REAL(x)[ix]) || REAL(x)[ix] < REAL(ans)[thisgrp])
+                    REAL(ans)[thisgrp] = REAL(x)[ix];
             }
         } else {
+            for (i=0; i<ngrp; i++) REAL(ans)[i] = NA_REAL;
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if ( !ISNA(REAL(x)[i]) ) {
-                    if ( update[thisgrp] != 1 || REAL(ans)[thisgrp] > REAL(x)[i] ) {
-                        REAL(ans)[thisgrp] = REAL(x)[i];
-                        if (update[thisgrp] != 1) update[thisgrp] = 1;
-                    }
-                } else {
-                    if (update[thisgrp] != 1) {
-                        REAL(ans)[thisgrp] = R_PosInf;
-                    }
-                }
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (ISNAN(REAL(x)[ix])) continue;
+                if (ISNAN(REAL(ans)[thisgrp]) || REAL(x)[ix] < REAL(ans)[thisgrp])
+                    REAL(ans)[thisgrp] = REAL(x)[ix];
             }
-            // everything taken care of already. Just warn if all NA groups have occurred at least once
             for (i=0; i<ngrp; i++) {
-                if (update[i] != 1)  {// equivalent of REAL(ans)[thisgrp] == R_PosInf
+                if (ISNAN(REAL(ans)[i])) {
                     warning("No non-missing values found in at least one group. Returning 'Inf' for such groups to be consistent with base");
+                    for (; i<ngrp; i++) if (ISNAN(REAL(ans)[i])) REAL(ans)[i] = R_PosInf;
                     break;
                 }
             }
@@ -303,7 +316,6 @@ SEXP gmin(SEXP x, SEXP narm)
     }
     copyMostAttrib(x, ans); // all but names,dim and dimnames. And if so, we want a copy here, not keepattr's SET_ATTRIB.
     UNPROTECT(1);
-    Free(update);
     // Rprintf("this gmin took %8.3f\n", 1.0*(clock()-start)/CLOCKS_PER_SEC);
     return(ans);
 }
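The reworked na.rm=TRUE integer path above (initialise every group to NA, skip NA inputs, overwrite on a smaller value) can be checked outside the R API. A minimal sketch, where `grp_min_narm` is an illustrative name and NA is modelled as INT_MIN per the "NA_INTEGER==INT_MIN checked in init.c" comment in the hunk:

```c
#include <assert.h>
#include <limits.h>

#define TOY_NA_INT INT_MIN  /* stands in for R's NA_INTEGER */

/* Grouped integer min with na.rm=TRUE, following the reworked gmin:
 * groups left at NA only when they contain no non-NA value. */
static void grp_min_narm(const int *x, const int *grp, int n,
                         int ngrp, int *ans) {
    for (int g = 0; g < ngrp; g++) ans[g] = TOY_NA_INT;
    for (int i = 0; i < n; i++) {
        if (x[i] == TOY_NA_INT) continue;
        if (ans[grp[i]] == TOY_NA_INT || x[i] < ans[grp[i]])
            ans[grp[i]] = x[i];
    }
}
```

gmin would then warn and coerce any remaining NA groups to `Inf`, matching base R.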
@@ -313,13 +325,17 @@ SEXP gmax(SEXP x, SEXP narm)
 {
     if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
     if (!isVectorAtomic(x)) error("GForce max can only be applied to columns, not .SD or similar. To find max of all items in a list such as .SD, either add the prefix base::max(.SD) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lapply(.SD,max),by=,.SDcols=]'");
-    R_len_t i, thisgrp=0;
-    int n = LENGTH(x);
+    if (inherits(x, "factor")) error("max is not meaningful for factors.");
+    R_len_t i, ix, thisgrp=0;
+    int n = (irowslen == -1) ? length(x) : irowslen;
     //clock_t start = clock();
     SEXP ans;
-    if (grpn != length(x)) error("grpn [%d] != length(x) [%d] in gmax", grpn, length(x));
-    char *update = Calloc(ngrp, char);
-    if (update == NULL) error("Unable to allocate %d * %d bytes for gmax", ngrp, sizeof(char));
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gmax", grpn, n);
+    
+    // TODO rework gmax in the same way as gmin and remove this *update
+    char *update = (char *)R_alloc(ngrp, sizeof(char));
+    for (int i=0; i<ngrp; i++) update[i] = 0;
+    
     switch(TYPEOF(x)) {
     case LGLSXP: case INTSXP:
         ans = PROTECT(allocVector(INTSXP, ngrp));
@@ -327,9 +343,10 @@ SEXP gmax(SEXP x, SEXP narm)
         if (!LOGICAL(narm)[0]) { // simple case - deal in a straightforward manner first
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (INTEGER(x)[i] != NA_INTEGER && INTEGER(ans)[thisgrp] != NA_INTEGER) {
-                    if ( update[thisgrp] != 1 || INTEGER(ans)[thisgrp] < INTEGER(x)[i] ) {
-                        INTEGER(ans)[thisgrp] = INTEGER(x)[i];
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (INTEGER(x)[ix] != NA_INTEGER && INTEGER(ans)[thisgrp] != NA_INTEGER) {
+                    if ( update[thisgrp] != 1 || INTEGER(ans)[thisgrp] < INTEGER(x)[ix] ) {
+                        INTEGER(ans)[thisgrp] = INTEGER(x)[ix];
                         if (update[thisgrp] != 1) update[thisgrp] = 1;
                     }
                 } else  INTEGER(ans)[thisgrp] = NA_INTEGER;
@@ -337,9 +354,10 @@ SEXP gmax(SEXP x, SEXP narm)
         } else {
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (INTEGER(x)[i] != NA_INTEGER) {
-                    if ( update[thisgrp] != 1 || INTEGER(ans)[thisgrp] < INTEGER(x)[i] ) {
-                        INTEGER(ans)[thisgrp] = INTEGER(x)[i];
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (INTEGER(x)[ix] != NA_INTEGER) {
+                    if ( update[thisgrp] != 1 || INTEGER(ans)[thisgrp] < INTEGER(x)[ix] ) {
+                        INTEGER(ans)[thisgrp] = INTEGER(x)[ix];
                         if (update[thisgrp] != 1) update[thisgrp] = 1;
                     }
                 } else {
@@ -367,9 +385,10 @@ SEXP gmax(SEXP x, SEXP narm)
         if (!LOGICAL(narm)[0]) { // simple case - deal in a straightforward manner first
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (STRING_ELT(x,i) != NA_STRING && STRING_ELT(ans, thisgrp) != NA_STRING) {
-                    if ( update[thisgrp] != 1 || strcmp(CHAR(STRING_ELT(ans, thisgrp)), CHAR(STRING_ELT(x,i))) < 0 ) {
-                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, i));
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (STRING_ELT(x,ix) != NA_STRING && STRING_ELT(ans, thisgrp) != NA_STRING) {
+                    if ( update[thisgrp] != 1 || strcmp(CHAR(STRING_ELT(ans, thisgrp)), CHAR(STRING_ELT(x,ix))) < 0 ) {
+                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, ix));
                         if (update[thisgrp] != 1) update[thisgrp] = 1;
                     }
                 } else  SET_STRING_ELT(ans, thisgrp, NA_STRING);
@@ -377,9 +396,10 @@ SEXP gmax(SEXP x, SEXP narm)
         } else {
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if (STRING_ELT(x, i) != NA_STRING) {
-                    if ( update[thisgrp] != 1 || strcmp(CHAR(STRING_ELT(ans, thisgrp)), CHAR(STRING_ELT(x, i))) < 0 ) {
-                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, i));
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if (STRING_ELT(x, ix) != NA_STRING) {
+                    if ( update[thisgrp] != 1 || strcmp(CHAR(STRING_ELT(ans, thisgrp)), CHAR(STRING_ELT(x, ix))) < 0 ) {
+                        SET_STRING_ELT(ans, thisgrp, STRING_ELT(x, ix));
                         if (update[thisgrp] != 1) update[thisgrp] = 1;
                     }
                 } else {
@@ -402,9 +422,11 @@ SEXP gmax(SEXP x, SEXP narm)
         if (!LOGICAL(narm)[0]) {
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if ( !ISNA(REAL(x)[i]) && !ISNA(REAL(ans)[thisgrp]) ) {
-                    if ( update[thisgrp] != 1 || REAL(ans)[thisgrp] < REAL(x)[i] ) {
-                        REAL(ans)[thisgrp] = REAL(x)[i];
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if ( !ISNA(REAL(x)[ix]) && !ISNA(REAL(ans)[thisgrp]) ) {
+                    if ( update[thisgrp] != 1 || REAL(ans)[thisgrp] < REAL(x)[ix] || 
+                         (ISNAN(REAL(x)[ix]) && !ISNAN(REAL(ans)[thisgrp])) ) { // #1461
+                        REAL(ans)[thisgrp] = REAL(x)[ix];
                         if (update[thisgrp] != 1) update[thisgrp] = 1;
                     }
                 } else REAL(ans)[thisgrp] = NA_REAL;
@@ -412,9 +434,10 @@ SEXP gmax(SEXP x, SEXP narm)
         } else {
             for (i=0; i<n; i++) {
                 thisgrp = grp[i];
-                if ( !ISNA(REAL(x)[i]) ) {
-                    if ( update[thisgrp] != 1 || REAL(ans)[thisgrp] < REAL(x)[i] ) {
-                        REAL(ans)[thisgrp] = REAL(x)[i];
+                ix = (irowslen == -1) ? i : irows[i]-1;
+                if ( !ISNAN(REAL(x)[ix]) ) { // #1461
+                    if ( update[thisgrp] != 1 || REAL(ans)[thisgrp] < REAL(x)[ix] ) {
+                        REAL(ans)[thisgrp] = REAL(x)[ix];
                         if (update[thisgrp] != 1) update[thisgrp] = 1;
                     }
                 } else {
@@ -437,7 +460,577 @@ SEXP gmax(SEXP x, SEXP narm)
     }
     copyMostAttrib(x, ans); // all but names,dim and dimnames. And if so, we want a copy here, not keepattr's SET_ATTRIB.
     UNPROTECT(1);
-    Free(update);
     // Rprintf("this gmax took %8.3f\n", 1.0*(clock()-start)/CLOCKS_PER_SEC);
     return(ans);
 }
+
+// gmedian, always returns numeric type (to avoid as.numeric() wrap..)
+SEXP gmedian(SEXP x, SEXP narm) {
+
+    if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
+    if (!isVectorAtomic(x)) error("GForce median can only be applied to columns, not .SD or similar. To find median of all items in a list such as .SD, either add the prefix stats::median(.SD) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lapply(.SD,median),by=,.SDcols=]'");
+    if (inherits(x, "factor")) error("median is not meaningful for factors.");
+    R_len_t i=0, j=0, k=0, imed=0, thisgrpsize=0, medianindex=0, nacount=0;
+    double val = 0.0;
+    Rboolean isna = FALSE, isint64 = FALSE;
+    SEXP ans, sub, class;
+    void *ptr;
+    int n = (irowslen == -1) ? length(x) : irowslen;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gmedian", grpn, n);
+    switch(TYPEOF(x)) {
+    case REALSXP:
+        class = getAttrib(x, R_ClassSymbol);
+        isint64 = (isString(class) && STRING_ELT(class, 0) == char_integer64);
+        ans = PROTECT(allocVector(REALSXP, ngrp));
+        sub = PROTECT(allocVector(REALSXP, maxgrpn)); // allocate once upfront
+        ptr = DATAPTR(sub);
+        if (!LOGICAL(narm)[0]) {
+            for (i=0; i<ngrp; i++) {
+                isna = FALSE;
+                thisgrpsize = grpsize[i];
+                SETLENGTH(sub, thisgrpsize);
+                for (j=0; j<thisgrpsize; j++) {
+                    k = ff[i]+j-1;
+                    if (isunsorted) k = oo[k]-1;
+                    k = (irowslen == -1) ? k : irows[k]-1;
+                    // TODO: raise this if-statement?
+                    if (!isint64) {
+                        if (!ISNAN(REAL(x)[k])) {
+                            REAL(sub)[j] = REAL(x)[k];
+                        } else {
+                            REAL(ans)[i] = NA_REAL;
+                            isna = TRUE; break;
+                        }
+                    } else {
+                        u.d = REAL(x)[k];
+                        if (u.ll != NAINT64) {
+                            REAL(sub)[j] = (double)u.ll;
+                        } else {
+                            REAL(ans)[i] = NA_REAL;
+                            isna = TRUE; break;
+                        }
+                    } 
+                }
+                if (isna) continue;
+                medianindex = (R_len_t)(ceil((double)(thisgrpsize)/2));
+                REAL(ans)[i] = dquickselect(ptr, thisgrpsize, medianindex-1); // 0-indexed
+                // all elements to the left of thisgrpsize/2 are < the value at that index
+                // we just need to get min of last half
+                if (thisgrpsize % 2 == 0) {
+                    val = REAL(sub)[medianindex]; // 0-indexed
+                    for (imed=medianindex+1; imed<thisgrpsize; imed++) {
+                        val = REAL(sub)[imed] > val ? val : REAL(sub)[imed];
+                    }
+                    REAL(ans)[i] = (REAL(ans)[i] + val)/2.0;
+                }
+            }
+        } else {
+            for (i=0; i<ngrp; i++) {
+                nacount = 0;
+                thisgrpsize = grpsize[i];
+                for (j=0; j<thisgrpsize; j++) {
+                    k = ff[i]+j-1;
+                    if (isunsorted) k = oo[k]-1;
+                    k = (irowslen == -1) ? k : irows[k]-1;
+                    // TODO: raise this if-statement?
+                    if (!isint64) {
+                        if (!ISNAN(REAL(x)[k])) {
+                            REAL(sub)[j-nacount] = REAL(x)[k];
+                        } else { nacount++; continue; }
+                    } else {
+                        u.d = REAL(x)[k];
+                        if (u.ll != NAINT64) {
+                            REAL(sub)[j-nacount] = (double)u.ll;
+                        } else { nacount++; continue; }
+                    }
+                }
+                if (nacount == thisgrpsize) {
+                    REAL(ans)[i] = NA_REAL; // all NAs
+                    continue;
+                }
+                thisgrpsize -= nacount;
+                SETLENGTH(sub, thisgrpsize);
+                medianindex = (R_len_t)(ceil((double)(thisgrpsize)/2));
+                REAL(ans)[i] = dquickselect(ptr, thisgrpsize, medianindex-1);
+                if (thisgrpsize % 2 == 0) {
+                    // all elements to the left of thisgrpsize/2 are < the value at that index
+                    // we just need to get min of last half
+                    val = REAL(sub)[medianindex]; // 0-indexed
+                    for (imed=medianindex+1; imed<thisgrpsize; imed++) {
+                        val = REAL(sub)[imed] > val ? val : REAL(sub)[imed];
+                    }
+                    REAL(ans)[i] = (REAL(ans)[i] + val)/2.0;
+                }
+            }            
+        }
+        SETLENGTH(sub, maxgrpn);
+        break;
+    case LGLSXP: case INTSXP: 
+        ans = PROTECT(allocVector(REALSXP, ngrp));
+        sub = PROTECT(allocVector(INTSXP, maxgrpn)); // allocate once upfront
+        ptr = DATAPTR(sub);
+        if (!LOGICAL(narm)[0]) {
+            for (i=0; i<ngrp; i++) {
+                isna = FALSE;
+                thisgrpsize = grpsize[i];
+                SETLENGTH(sub, thisgrpsize);
+                for (j=0; j<thisgrpsize; j++) {
+                    k = ff[i]+j-1;
+                    if (isunsorted) k = oo[k]-1;
+                    k = (irowslen == -1) ? k : irows[k]-1;
+                    if (INTEGER(x)[k] != NA_INTEGER) {
+                        INTEGER(sub)[j] = INTEGER(x)[k];
+                    } else {
+                        REAL(ans)[i] = NA_REAL;
+                        isna = TRUE; break;
+                    }
+                }
+                if (isna) continue;
+                medianindex = (R_len_t)(ceil((double)(thisgrpsize)/2));
+                REAL(ans)[i] = iquickselect(ptr, thisgrpsize, medianindex-1); // 0-indexed
+                // all elements to the left of thisgrpsize/2 are < the value at that index
+                // we just need to get min of last half
+                if (thisgrpsize % 2 == 0) {
+                    val = INTEGER(sub)[medianindex]; // 0-indexed
+                    for (imed=medianindex+1; imed<thisgrpsize; imed++) {
+                        val = INTEGER(sub)[imed] > val ? val : INTEGER(sub)[imed];
+                    }
+                    REAL(ans)[i] = (REAL(ans)[i] + val)/2.0;
+                }
+            }
+        } else {
+            for (i=0; i<ngrp; i++) {
+                nacount = 0;
+                thisgrpsize = grpsize[i];
+                for (j=0; j<thisgrpsize; j++) {
+                    k = ff[i]+j-1;
+                    if (isunsorted) k = oo[k]-1;
+                    k = (irowslen == -1) ? k : irows[k]-1;
+                    if (INTEGER(x)[k] != NA_INTEGER) {
+                        INTEGER(sub)[j-nacount] = INTEGER(x)[k];
+                    } else { nacount++; continue; }
+                }
+                if (nacount == thisgrpsize) {
+                    REAL(ans)[i] = NA_REAL; // all NAs
+                    continue;
+                }
+                thisgrpsize -= nacount;
+                SETLENGTH(sub, thisgrpsize);
+                medianindex = (R_len_t)(ceil((double)(thisgrpsize)/2));
+                REAL(ans)[i] = iquickselect(ptr, thisgrpsize, medianindex-1);
+                if (thisgrpsize % 2 == 0) {
+                    // all elements to the left of thisgrpsize/2 are < the value at that index
+                    // we just need to get min of last half
+                    val = INTEGER(sub)[medianindex]; // 0-indexed
+                    for (imed=medianindex+1; imed<thisgrpsize; imed++) {
+                        val = INTEGER(sub)[imed] > val ? val : INTEGER(sub)[imed];
+                    }
+                    REAL(ans)[i] = (REAL(ans)[i] + val)/2.0;
+                }
+            }            
+        }
+        SETLENGTH(sub, maxgrpn);
+        break;
+    default:
+        error("Type '%s' not supported by GForce median (gmedian). Either add the prefix stats::median(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));
+    }
+    UNPROTECT(2);
+    return(ans);
+}
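The even-group branch in gmedian above relies on a property of quickselect: after selecting the k-th smallest, everything to its left is no larger, so the upper median is just the minimum of the right half. A standalone sketch of that idea (hypothetical helper names; `iquickselect` in src/quickselect.c is the real selector, and this sketch caps the group size at 256 for simplicity):

```c
#include <assert.h>
#include <string.h>

static void swap_int(int *a, int *b) { int t = *a; *a = *b; *b = t; }

/* Hoare-style quickselect: places the k-th smallest (0-based) at index k,
   with no larger element to its left. Illustrative stand-in only. */
static double select_kth(int *x, int n, int k) {
    int l = 0, r = n - 1;
    while (l < r) {
        int pivot = x[k], i = l, j = r;
        while (i <= j) {
            while (x[i] < pivot) i++;
            while (x[j] > pivot) j--;
            if (i <= j) { swap_int(&x[i], &x[j]); i++; j--; }
        }
        if (j < k) l = i;
        if (k < i) r = j;
    }
    return (double)x[k];
}

/* Median of one group, mirroring gmedian's even/odd handling. */
double group_median(const int *grp, int n) {
    int buf[256];                              /* assumes n <= 256 here */
    memcpy(buf, grp, n * sizeof(int));
    int medianindex = (n + 1) / 2;             /* ceil(n/2), 1-based */
    double med = select_kth(buf, n, medianindex - 1);
    if (n % 2 == 0) {
        /* min of the upper half is the next order statistic */
        int val = buf[medianindex];
        for (int i = medianindex + 1; i < n; i++)
            if (buf[i] < val) val = buf[i];
        med = (med + val) / 2.0;
    }
    return med;
}
```

Copying into `buf` first matters because quickselect rearranges its input, just as gmedian gathers each group into `sub` before selecting.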
+
+SEXP glast(SEXP x) {
+
+    if (!isVectorAtomic(x)) error("GForce tail can only be applied to columns, not .SD or similar. To get tail of all items in a list such as .SD, either add the prefix utils::tail(.SD) or turn off GForce optimization using options(datatable.optimize=1).");
+
+    R_len_t i,k;
+    int n = (irowslen == -1) ? length(x) : irowslen;
+    SEXP ans;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gtail", grpn, n);
+    switch(TYPEOF(x)) {
+    case LGLSXP: 
+        ans = PROTECT(allocVector(LGLSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]+grpsize[i]-2;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            LOGICAL(ans)[i] = LOGICAL(x)[k];
+        }
+    break;
+    case INTSXP:
+        ans = PROTECT(allocVector(INTSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]+grpsize[i]-2;
+            if (isunsorted) k = oo[k]-1;            
+            k = (irowslen == -1) ? k : irows[k]-1;
+            INTEGER(ans)[i] = INTEGER(x)[k];
+        }
+    break;
+    case REALSXP:
+        ans = PROTECT(allocVector(REALSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]+grpsize[i]-2;
+            if (isunsorted) k = oo[k]-1;            
+            k = (irowslen == -1) ? k : irows[k]-1;
+            REAL(ans)[i] = REAL(x)[k];
+        }
+    break;
+    case STRSXP:
+        ans = PROTECT(allocVector(STRSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]+grpsize[i]-2;
+            if (isunsorted) k = oo[k]-1;            
+            k = (irowslen == -1) ? k : irows[k]-1;
+            SET_STRING_ELT(ans, i, STRING_ELT(x, k));
+        }
+    break;
+    case VECSXP:
+        ans = PROTECT(allocVector(VECSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]+grpsize[i]-2;
+            if (isunsorted) k = oo[k]-1;            
+            k = (irowslen == -1) ? k : irows[k]-1;
+            SET_VECTOR_ELT(ans, i, VECTOR_ELT(x, k));
+        }
+    break;
+    default:
+        error("Type '%s' not supported by GForce tail (gtail). Either add the prefix utils::tail(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));
+    }
+    copyMostAttrib(x, ans);
+    UNPROTECT(1);
+    return(ans);
+}
+
+SEXP gfirst(SEXP x) {
+
+    if (!isVectorAtomic(x)) error("GForce head can only be applied to columns, not .SD or similar. To get head of all items in a list such as .SD, either add the prefix utils::head(.SD) or turn off GForce optimization using options(datatable.optimize=1).");
+
+    R_len_t i,k;
+    int n = (irowslen == -1) ? length(x) : irowslen;
+    SEXP ans;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in ghead", grpn, n);
+    switch(TYPEOF(x)) {
+    case LGLSXP: 
+        ans = PROTECT(allocVector(LGLSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]-1;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            LOGICAL(ans)[i] = LOGICAL(x)[k];
+        }
+    break;
+    case INTSXP:
+        ans = PROTECT(allocVector(INTSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]-1;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            INTEGER(ans)[i] = INTEGER(x)[k];
+        }
+    break;
+    case REALSXP:
+        ans = PROTECT(allocVector(REALSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]-1;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            REAL(ans)[i] = REAL(x)[k];
+        }
+    break;
+    case STRSXP:
+        ans = PROTECT(allocVector(STRSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]-1;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            SET_STRING_ELT(ans, i, STRING_ELT(x, k));
+        }
+    break;
+    case VECSXP:
+        ans = PROTECT(allocVector(VECSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            k = ff[i]-1;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            SET_VECTOR_ELT(ans, i, VECTOR_ELT(x, k));
+        }
+    break;
+    default:
+        error("Type '%s' not supported by GForce head (ghead). Either add the prefix utils::head(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));
+    }
+    copyMostAttrib(x, ans);
+    UNPROTECT(1);
+    return(ans);
+}
+
+SEXP gtail(SEXP x, SEXP valArg) {
+    if (!isInteger(valArg) || LENGTH(valArg)!=1 || INTEGER(valArg)[0]!=1) error("Internal error, gtail is only implemented for n=1. This should have been caught before. Please report to datatable-help.");
+    return (glast(x));
+}
+
+SEXP ghead(SEXP x, SEXP valArg) {
+    if (!isInteger(valArg) || LENGTH(valArg)!=1 || INTEGER(valArg)[0]!=1) error("Internal error, ghead is only implemented for n=1. This should have been caught before. Please report to datatable-help.");
+    return (gfirst(x));
+}
+
+SEXP gnthvalue(SEXP x, SEXP valArg) {
+
+    if (!isInteger(valArg) || LENGTH(valArg)!=1 || INTEGER(valArg)[0]<=0) error("Internal error, `g[` (gnthvalue) is only implemented for single-value subsets with a positive index, e.g., .SD[2]. This should have been caught before. Please report to datatable-help.");
+    R_len_t i,k, val=INTEGER(valArg)[0];
+    int n = (irowslen == -1) ? length(x) : irowslen;
+    SEXP ans;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gnthvalue", grpn, n);
+    switch(TYPEOF(x)) {
+    case LGLSXP: 
+        ans = PROTECT(allocVector(LGLSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            if (val > grpsize[i]) { LOGICAL(ans)[i] = NA_LOGICAL; continue; }
+            k = ff[i]+val-2;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            LOGICAL(ans)[i] = LOGICAL(x)[k];
+        }
+    break;
+    case INTSXP:
+        ans = PROTECT(allocVector(INTSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            if (val > grpsize[i]) { INTEGER(ans)[i] = NA_INTEGER; continue; }
+            k = ff[i]+val-2;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            INTEGER(ans)[i] = INTEGER(x)[k];
+        }
+    break;
+    case REALSXP:
+        ans = PROTECT(allocVector(REALSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            if (val > grpsize[i]) { REAL(ans)[i] = NA_REAL; continue; }
+            k = ff[i]+val-2;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            REAL(ans)[i] = REAL(x)[k];
+        }
+    break;
+    case STRSXP:
+        ans = PROTECT(allocVector(STRSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            if (val > grpsize[i]) { SET_STRING_ELT(ans, i, NA_STRING); continue; }
+            k = ff[i]+val-2;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            SET_STRING_ELT(ans, i, STRING_ELT(x, k));
+        }
+    break;
+    case VECSXP:
+        ans = PROTECT(allocVector(VECSXP, ngrp));
+        for (i=0; i<ngrp; i++) {
+            if (val > grpsize[i]) { SET_VECTOR_ELT(ans, i, R_NilValue); continue; }
+            k = ff[i]+val-2;
+            if (isunsorted) k = oo[k]-1;
+            k = (irowslen == -1) ? k : irows[k]-1;
+            SET_VECTOR_ELT(ans, i, VECTOR_ELT(x, k));
+        }
+    break;
+    default:
+        error("Type '%s' not supported by GForce subset `[` (gnthvalue). Either add the prefix utils::head(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));
+    }
+    copyMostAttrib(x, ans);
+    UNPROTECT(1);
+    return(ans);    
+}
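The per-group indexing in gnthvalue above is all 1-based bookkeeping: ff[i] is the 1-based first row of group i and val the 1-based position within the group, so the 0-based source row is ff[i]+val-2, with an NA result when the group is shorter than val. A minimal sketch of just that arithmetic (illustrative names; a sentinel stands in for a real NA, and the sorted/no-irows case is assumed):

```c
#include <assert.h>

/* nth value per group: ff[] holds 1-based first-row positions, grpsize[]
   the group lengths. Returns the caller's sentinel na when the group is
   shorter than val, mirroring the NA branch in gnthvalue. */
int nth_per_group(const int *x, const int *ff, const int *grpsize,
                  int group, int val, int na) {
    if (val > grpsize[group]) return na;   /* group too short: NA */
    int k = ff[group] + val - 2;           /* 1-based -> 0-based row */
    return x[k];
}
```

For example, with rows {10,20,30 | 40,50} split as groups of sizes 3 and 2, ff would be {1,4}.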
+
+// TODO: gwhich.min, gwhich.max
+// Implemented similarly to gmedian to balance speed and memory usage. There's just one extra allocation (of size maxgrpn) and that's it, which speeds things up considerably since we don't have to re-collect x's values for each group at each step (mean, residuals, corrected mean and then variance).
+SEXP gvarsd1(SEXP x, SEXP narm, Rboolean isSD)
+{
+    if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
+    if (!isVectorAtomic(x)) error("GForce var/sd can only be applied to columns, not .SD or similar. To find var/sd of all items in a list such as .SD, either add the prefix stats::var(.SD) (or stats::sd(.SD)) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lapply(.SD,var),by=,.SDcols=]'");
+    if (inherits(x, "factor")) error("var/sd is not meaningful for factors.");
+    long double m, s, v;
+    R_len_t i, j, ix, thisgrpsize = 0, n = (irowslen == -1) ? length(x) : irowslen;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gvar", grpn, n);
+    SEXP sub, ans = PROTECT(allocVector(REALSXP, ngrp));
+    Rboolean ans_na;
+    switch(TYPEOF(x)) {
+        case LGLSXP: case INTSXP:
+        sub = PROTECT(allocVector(INTSXP, maxgrpn)); // allocate once upfront
+        if (!LOGICAL(narm)[0]) {
+            for (i=0; i<ngrp; i++) {
+                m=0.; s=0.; v=0.; ans_na = FALSE;
+                if (grpsize[i] != 1) {
+                    thisgrpsize = grpsize[i];
+                    SETLENGTH(sub, thisgrpsize); // to gather this group's data
+                    for (j=0; j<thisgrpsize; j++) {
+                        ix = ff[i]+j-1;
+                        if (isunsorted) ix = oo[ix]-1;
+                        ix = (irowslen == -1) ? ix : irows[ix]-1;
+                        if (INTEGER(x)[ix] == NA_INTEGER) { ans_na = TRUE; break; }
+                        INTEGER(sub)[j] = INTEGER(x)[ix];
+                        m += INTEGER(sub)[j]; // sum
+                    }
+                    if (ans_na) { REAL(ans)[i] = NA_REAL; continue; }
+                    m = m/thisgrpsize; // mean, first pass
+                    for (j=0; j<thisgrpsize; j++) s += (INTEGER(sub)[j]-m); // residuals
+                    m += (s/thisgrpsize); // mean, second pass
+                    for (j=0; j<thisgrpsize; j++) { // variance
+                        v += (INTEGER(sub)[j]-(double)m) * (INTEGER(sub)[j]-(double)m);
+                    }
+                    REAL(ans)[i] = (double)v/(thisgrpsize-1);
+                    if (isSD) REAL(ans)[i] = SQRTL(REAL(ans)[i]);
+                } else REAL(ans)[i] = NA_REAL;
+            }
+        } else {
+            for (i=0; i<ngrp; i++) {
+                m=0.; s=0.; v=0.; thisgrpsize = 0;
+                if (grpsize[i] != 1) {
+                    SETLENGTH(sub, grpsize[i]); // to gather this group's data
+                    for (j=0; j<grpsize[i]; j++) {
+                        ix = ff[i]+j-1;
+                        if (isunsorted) ix = oo[ix]-1;
+                        ix = (irowslen == -1) ? ix : irows[ix]-1;
+                        if (INTEGER(x)[ix] == NA_INTEGER) continue;
+                        INTEGER(sub)[thisgrpsize] = INTEGER(x)[ix];
+                        m += INTEGER(sub)[thisgrpsize]; // sum
+                        thisgrpsize++;
+                    }
+                    if (thisgrpsize <= 1) { REAL(ans)[i] = NA_REAL; continue; }
+                    m = m/thisgrpsize; // mean, first pass
+                    for (j=0; j<thisgrpsize; j++) s += (INTEGER(sub)[j]-m); // residuals
+                    m += (s/thisgrpsize); // mean, second pass
+                    for (j=0; j<thisgrpsize; j++) { // variance
+                        v += (INTEGER(sub)[j]-(double)m) * (INTEGER(sub)[j]-(double)m);
+                    }
+                    REAL(ans)[i] = (double)v/(thisgrpsize-1);
+                    if (isSD) REAL(ans)[i] = SQRTL(REAL(ans)[i]);
+                } else REAL(ans)[i] = NA_REAL;
+            }
+        }
+        SETLENGTH(sub, maxgrpn);
+        break;
+        case REALSXP:
+        sub = PROTECT(allocVector(REALSXP, maxgrpn)); // allocate once upfront
+        if (!LOGICAL(narm)[0]) {
+            for (i=0; i<ngrp; i++) {
+                m=0.; s=0.; v=0.; ans_na = FALSE;
+                if (grpsize[i] != 1) {
+                    thisgrpsize = grpsize[i];
+                    SETLENGTH(sub, thisgrpsize); // to gather this group's data
+                    for (j=0; j<thisgrpsize; j++) {
+                        ix = ff[i]+j-1;
+                        if (isunsorted) ix = oo[ix]-1;
+                        ix = (irowslen == -1) ? ix : irows[ix]-1;
+                        if (ISNAN(REAL(x)[ix])) { ans_na = TRUE; break; }
+                        REAL(sub)[j] = REAL(x)[ix];
+                        m += REAL(sub)[j]; // sum
+                    }
+                    if (ans_na) { REAL(ans)[i] = NA_REAL; continue; }
+                    m = m/thisgrpsize; // mean, first pass
+                    for (j=0; j<thisgrpsize; j++) s += (REAL(sub)[j]-m); // residuals
+                    m += (s/thisgrpsize); // mean, second pass
+                    for (j=0; j<thisgrpsize; j++) { // variance
+                        v += (REAL(sub)[j]-(double)m) * (REAL(sub)[j]-(double)m);
+                    }
+                    REAL(ans)[i] = (double)v/(thisgrpsize-1);
+                    if (isSD) REAL(ans)[i] = SQRTL(REAL(ans)[i]);
+                } else REAL(ans)[i] = NA_REAL;
+            }
+        } else {
+            for (i=0; i<ngrp; i++) {
+                m=0.; s=0.; v=0.; thisgrpsize = 0;
+                if (grpsize[i] != 1) {
+                    SETLENGTH(sub, grpsize[i]); // to gather this group's data
+                    for (j=0; j<grpsize[i]; j++) {
+                        ix = ff[i]+j-1;
+                        if (isunsorted) ix = oo[ix]-1;
+                        ix = (irowslen == -1) ? ix : irows[ix]-1;
+                        if (ISNAN(REAL(x)[ix])) continue;
+                        REAL(sub)[thisgrpsize] = REAL(x)[ix];
+                        m += REAL(sub)[thisgrpsize]; // sum
+                        thisgrpsize++;
+                    }
+                    if (thisgrpsize <= 1) { REAL(ans)[i] = NA_REAL; continue; }
+                    m = m/thisgrpsize; // mean, first pass
+                    for (j=0; j<thisgrpsize; j++) s += (REAL(sub)[j]-m); // residuals
+                    m += (s/thisgrpsize); // mean, second pass
+                    for (j=0; j<thisgrpsize; j++) { // variance
+                        v += (REAL(sub)[j]-(double)m) * (REAL(sub)[j]-(double)m);
+                    }
+                    REAL(ans)[i] = (double)v/(thisgrpsize-1);
+                    if (isSD) REAL(ans)[i] = SQRTL(REAL(ans)[i]);
+                } else REAL(ans)[i] = NA_REAL;
+            }
+        }
+        SETLENGTH(sub, maxgrpn);
+        break;
+        default: 
+            if (isSD) {
+                error("Type '%s' not supported by GForce var (gvar). Either add the prefix stats::var(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));
+            } else {
+                error("Type '%s' not supported by GForce sd (gsd). Either add the prefix stats::sd(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));                
+            }
+    }
+    UNPROTECT(2);
+    return (ans);
+}
+
+SEXP gvar(SEXP x, SEXP narm) {
+    return (gvarsd1(x, narm, FALSE));
+}
+
+SEXP gsd(SEXP x, SEXP narm) {
+    return (gvarsd1(x, narm, TRUE));
+}
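The var/sd kernel above uses the classic corrected two-pass algorithm: compute the mean, fold the (ideally zero) mean of the residuals back in to cancel rounding error, then accumulate squared deviations over n-1. A standalone sketch under the assumption of a plain double array with no groups and no NA handling:

```c
#include <assert.h>
#include <math.h>

/* Corrected two-pass sample variance, as in gvarsd1: long double
   accumulators, a residual correction to the mean, then sum of squared
   deviations divided by n-1. Illustrative stand-in, not the R entry point. */
double twopass_var(const double *x, int n) {
    long double m = 0.0L, s = 0.0L, v = 0.0L;
    if (n < 2) return NAN;                 /* variance undefined for n < 2 */
    for (int j = 0; j < n; j++) m += x[j];
    m /= n;                                /* mean, first pass */
    for (int j = 0; j < n; j++) s += x[j] - m;   /* residuals */
    m += s / n;                            /* mean, corrected second pass */
    for (int j = 0; j < n; j++) v += (x[j] - m) * (x[j] - m);
    return (double)(v / (n - 1));
}
```

The residual correction costs one extra pass but removes most of the cancellation error a naive sum-of-squares formula would accumulate.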
+
+SEXP gprod(SEXP x, SEXP narm)
+{
+    if (!isLogical(narm) || LENGTH(narm)!=1 || LOGICAL(narm)[0]==NA_LOGICAL) error("na.rm must be TRUE or FALSE");
+    if (!isVectorAtomic(x)) error("GForce prod can only be applied to columns, not .SD or similar. To multiply all items in a list such as .SD, either add the prefix base::prod(.SD) or turn off GForce optimization using options(datatable.optimize=1). More likely, you may be looking for 'DT[,lapply(.SD,prod),by=,.SDcols=]'");
+    if (inherits(x, "factor")) error("prod is not meaningful for factors.");
+    int i, ix, thisgrp;
+    int n = (irowslen == -1) ? length(x) : irowslen;
+    //clock_t start = clock();
+    SEXP ans;
+    if (grpn != n) error("grpn [%d] != length(x) [%d] in gprod", grpn, n);
+    long double *s = malloc(ngrp * sizeof(long double));
+    if (!s) error("Unable to allocate %d * %zu bytes for gprod", ngrp, sizeof(long double));
+    for (i=0; i<ngrp; i++) s[i] = 1.0;
+    ans = PROTECT(allocVector(REALSXP, ngrp));
+    switch(TYPEOF(x)) {
+    case LGLSXP: case INTSXP:
+        for (i=0; i<n; i++) {
+            thisgrp = grp[i];
+            ix = (irowslen == -1) ? i : irows[i]-1;
+            if(INTEGER(x)[ix] == NA_INTEGER) { 
+                if (!LOGICAL(narm)[0]) s[thisgrp] = NA_REAL;  // Let NA_REAL propagate from here. R_NaReal is IEEE.
+                continue;
+            }
+            s[thisgrp] *= INTEGER(x)[ix];  // no under/overflow here, s is long double (like base)
+        }
+        for (i=0; i<ngrp; i++) {
+            if (s[i] > DBL_MAX) REAL(ans)[i] = R_PosInf;
+            else if (s[i] < -DBL_MAX) REAL(ans)[i] = R_NegInf;
+            else REAL(ans)[i] = (double)s[i];
+        }
+        break;
+    case REALSXP:
+        for (i=0; i<n; i++) {
+            thisgrp = grp[i];
+            ix = (irowslen == -1) ? i : irows[i]-1;
+            if(ISNAN(REAL(x)[ix]) && LOGICAL(narm)[0]) continue;  // else let NA_REAL propagate from here
+            s[thisgrp] *= REAL(x)[ix];  // done in long double, like base
+        }
+        for (i=0; i<ngrp; i++) {
+            if (s[i] > DBL_MAX) REAL(ans)[i] = R_PosInf;
+            else if (s[i] < -DBL_MAX) REAL(ans)[i] = R_NegInf;
+            else REAL(ans)[i] = (double)s[i];
+        }
+        break;
+    default:
+        free(s);
+        error("Type '%s' not supported by GForce prod (gprod). Either add the prefix base::prod(.) or turn off GForce optimization using options(datatable.optimize=1)", type2char(TYPEOF(x)));
+    }
+    free(s);
+    copyMostAttrib(x, ans);
+    UNPROTECT(1);
+    // Rprintf("this gprod took %8.3f\n", 1.0*(clock()-start)/CLOCKS_PER_SEC);
+    return(ans);
+}
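gprod's long double accumulator means an intermediate product can exceed double range without losing a final answer that fits, and the clamp to +/-Inf happens only on the way back to double. A hypothetical standalone version of that accumulate-then-clamp step:

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Product in long double with a final clamp, as in gprod: values that
   overflow double on the way out become +/-Inf rather than garbage.
   Illustrative helper, not the R entry point. */
double prod_clamped(const double *x, int n) {
    long double s = 1.0L;
    for (int i = 0; i < n; i++) s *= x[i];   /* no overflow check mid-loop */
    if (s > DBL_MAX) return INFINITY;
    if (s < -DBL_MAX) return -INFINITY;
    return (double)s;
}
```

Note the caveat that on platforms where long double is the same as double (e.g. MSVC) the extra headroom disappears, which is also true of the upstream code.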
diff --git a/src/init.c b/src/init.c
index 3773f53..eb7c35f 100644
--- a/src/init.c
+++ b/src/init.c
@@ -17,6 +17,8 @@ SEXP setcharvec();
 SEXP setcolorder();
 SEXP chmatchwrapper();
 SEXP readfile();
+SEXP writefile();
+SEXP genLookups();
 SEXP reorder();
 SEXP rbindlist();
 SEXP vecseq();
@@ -29,8 +31,6 @@ SEXP fmelt();
 SEXP fcast();
 SEXP uniqlist();
 SEXP uniqlengths();
-SEXP fastradixdouble();
-SEXP fastradixint();
 SEXP setrev();
 SEXP forder();
 SEXP fsorted();
@@ -60,6 +60,24 @@ SEXP anyNA();
 SEXP isReallyReal();
 SEXP setlevels();
 SEXP rleid();
+SEXP gmedian();
+SEXP gtail();
+SEXP ghead();
+SEXP glast();
+SEXP gfirst();
+SEXP gnthvalue();
+SEXP dim();
+SEXP gvar();
+SEXP gsd();
+SEXP gprod();
+SEXP nestedid();
+SEXP setDTthreads();
+SEXP getDTthreads_R();
+SEXP nqnewindices();
+SEXP fsort();
+SEXP inrange();
+SEXP between();
+SEXP hasOpenMP();
 
 // .Externals
 SEXP fastmean();
@@ -79,6 +97,8 @@ R_CallMethodDef callMethods[] = {
 {"Csetcolorder", (DL_FUNC) &setcolorder, -1},
 {"Cchmatchwrapper", (DL_FUNC) &chmatchwrapper, -1},
 {"Creadfile", (DL_FUNC) &readfile, -1},
+{"Cwritefile", (DL_FUNC) &writefile, -1},
+{"CgenLookups", (DL_FUNC) &genLookups, -1},
 {"Creorder", (DL_FUNC) &reorder, -1},
 {"Crbindlist", (DL_FUNC) &rbindlist, -1},
 {"Cvecseq", (DL_FUNC) &vecseq, -1},
@@ -91,8 +111,6 @@ R_CallMethodDef callMethods[] = {
 {"Cfcast", (DL_FUNC) &fcast, -1}, 
 {"Cuniqlist", (DL_FUNC) &uniqlist, -1},
 {"Cuniqlengths", (DL_FUNC) &uniqlengths, -1},
-{"Cfastradixdouble", (DL_FUNC) &fastradixdouble, -1}, 
-{"Cfastradixint", (DL_FUNC) &fastradixint, -1},
 {"Csetrev", (DL_FUNC) &setrev, -1},
 {"Cforder", (DL_FUNC) &forder, -1},
 {"Cfsorted", (DL_FUNC) &fsorted, -1},
@@ -122,7 +140,24 @@ R_CallMethodDef callMethods[] = {
 {"CisReallyReal", (DL_FUNC) &isReallyReal, -1},
 {"Csetlevels", (DL_FUNC) &setlevels, -1},
 {"Crleid", (DL_FUNC) &rleid, -1},
-
+{"Cgmedian", (DL_FUNC) &gmedian, -1},
+{"Cgtail", (DL_FUNC) &gtail, -1},
+{"Cghead", (DL_FUNC) &ghead, -1},
+{"Cglast", (DL_FUNC) &glast, -1},
+{"Cgfirst", (DL_FUNC) &gfirst, -1},
+{"Cgnthvalue", (DL_FUNC) &gnthvalue, -1},
+{"Cdim", (DL_FUNC) &dim, -1},
+{"Cgvar", (DL_FUNC) &gvar, -1},
+{"Cgsd", (DL_FUNC) &gsd, -1},
+{"Cgprod", (DL_FUNC) &gprod, -1},
+{"Cnestedid", (DL_FUNC) &nestedid, -1},
+{"CsetDTthreads", (DL_FUNC) &setDTthreads, -1},
+{"CgetDTthreads", (DL_FUNC) &getDTthreads_R, -1},
+{"Cnqnewindices", (DL_FUNC) &nqnewindices, -1},
+{"Cfsort", (DL_FUNC) &fsort, -1},
+{"Cinrange", (DL_FUNC) &inrange, -1},
+{"Cbetween", (DL_FUNC) &between, -1},
+{"ChasOpenMP", (DL_FUNC) &hasOpenMP, -1},
 {NULL, NULL, 0}
 };
 
@@ -139,7 +174,7 @@ void attribute_visible R_init_datatable(DllInfo *info)
     R_registerRoutines(info, NULL, callMethods, NULL, externalMethods);
     R_useDynamicSymbols(info, FALSE);
     setSizes();
-    const char *msg = "... failed. Please forward this message to maintainer('data.table') or datatable-help.";
+    const char *msg = "... failed. Please forward this message to maintainer('data.table').";
     if (NA_INTEGER != INT_MIN) error("Checking NA_INTEGER [%d] == INT_MIN [%d] %s", NA_INTEGER, INT_MIN, msg);
     if (NA_INTEGER != NA_LOGICAL) error("Checking NA_INTEGER [%d] == NA_LOGICAL [%d] %s", NA_INTEGER, NA_LOGICAL, msg);
     if (sizeof(int) != 4) error("Checking sizeof(int) [%d] is 4 %s", sizeof(int), msg);
@@ -168,9 +203,53 @@ void attribute_visible R_init_datatable(DllInfo *info)
     memset(&ld, 0, sizeof(long double));
     if (ld != 0.0) error("Checking memset(&ld, 0, sizeof(long double)); ld == (long double)0.0 %s", msg);
     
-    setNumericRounding(ScalarInteger(2));
+    setNumericRounding(ScalarInteger(0)); // #1642, #1728, #1463, #485
+    
+    // create needed strings in advance for speed, same technique as R_*Symbol
+    // Following R-exts 5.9.4; paragraph and example starting "Using install ..."
+    // either use PRINTNAME(install()) or R_PreserveObject(mkChar()) here.
+    char_integer64 = PRINTNAME(install("integer64"));
+    char_ITime =     PRINTNAME(install("ITime"));
+    char_Date =      PRINTNAME(install("Date"));   // used for IDate too since IDate inherits from Date
+    char_POSIXct =   PRINTNAME(install("POSIXct"));
+    if (TYPEOF(char_integer64) != CHARSXP) {
+      // checking one is enough in case of any R-devel changes
+      error("PRINTNAME(install(\"integer64\")) has returned %s not %s",
+            type2char(TYPEOF(char_integer64)), type2char(CHARSXP));
+    }
     
-    char_integer64 = mkChar("integer64");  // for speed, similar to R_*Symbol.
+    avoid_openmp_hang_within_fork();
+}
+
+
+inline Rboolean INHERITS(SEXP x, SEXP char_) {
+  // Thread safe inherits() by pre-calling install() above in init first then
+  // passing those char_* in here for simple and fast non-API pointer compare.
+  // The thread-safety aspect here is only currently actually needed for list columns in
+  // fwrite() where the class of the cell's vector is tested; the class of the column
+  // itself is pre-stored by fwrite (for example in isInteger64[] and isITime[]).
+  // Thread safe in the limited sense of correct and intended usage :
+  // i) no API call such as install() or mkChar() must be passed in.
+  // ii) no attrib writes must be possible in other threads.
+  SEXP class;
+  if (isString(class = getAttrib(x, R_ClassSymbol))) {
+    for (int i=0; i<LENGTH(class); i++) {
+      if (STRING_ELT(class, i) == char_) return TRUE;
+    }
+  }
+  return FALSE;
+}
+
+SEXP hasOpenMP() {
+  // Just for use by onAttach to avoid an RPRINTF from C level which isn't suppressible by CRAN
+  // There is now a 'grep' in CRAN_Release.cmd to detect any use of RPRINTF in init.c, which is
+  // why RPRINTF is capitalized in this comment to avoid that grep.
+  // TODO: perhaps .Platform or .Machine in R itself could contain whether OpenMP is available.
+  #ifdef _OPENMP
+  return ScalarLogical(TRUE);
+  #else
+  return ScalarLogical(FALSE);
+  #endif
 }
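The INHERITS() helper above works because R interns CHARSXPs: equal class-name strings share one cell, so class membership reduces to a pointer compare. A minimal non-R sketch of the interning idea (tiny fixed pool, illustrative names only, no claim about R internals beyond what the comment above states):

```c
#include <assert.h>
#include <string.h>

/* Stand-in for R's string interning: intern() returns one canonical
   pointer per distinct string (like mkChar/install), so inherits_fast()
   only compares pointers and never walks characters. This mirrors why
   INHERITS() is cheap, and thread safe only while no other thread is
   installing strings or writing attributes. */
#define NPOOL 16
static const char *pool[NPOOL];
static int npool = 0;

const char *intern(const char *s) {
    for (int i = 0; i < npool; i++)
        if (strcmp(pool[i], s) == 0) return pool[i];  /* already interned */
    return pool[npool++] = s;                         /* new canonical cell */
}

int inherits_fast(const char *const *cls, int n, const char *interned) {
    for (int i = 0; i < n; i++)
        if (cls[i] == interned) return 1;             /* pointer compare only */
    return 0;
}
```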
 
 
diff --git a/src/inrange.c b/src/inrange.c
new file mode 100644
index 0000000..7c6fd28
--- /dev/null
+++ b/src/inrange.c
@@ -0,0 +1,35 @@
+#include "data.table.h"
+#include <Rdefines.h>
+
+
+SEXP inrange(SEXP ansArg, SEXP xoArg, SEXP startsArg, SEXP lenArg) {
+
+    int *ans = INTEGER(ansArg), *xo = INTEGER(xoArg);
+    int *starts = INTEGER(startsArg), *len = INTEGER(lenArg);
+    R_len_t i, j, n = length(startsArg), nxo = length(xoArg);
+    for (i = 0; i < n; i++) {
+        for (j = starts[i]-1; j < starts[i]-1+len[i]; j++) {
+            ans[nxo ? xo[j]-1 : j] = 1;
+        }
+    }
+    // The old, more complicated logic below is only really useful when matches
+    // contain a lot of overlapping indices, which is rare in real examples,
+    // so we switched to the simpler logic above; it is retained, commented out, for now.
+
+    // R_len_t i =0,j, ss,ee,new_ss,new_ee;
+    // while(i < n && starts[i] == 0) i++;
+    // while (i < n) {
+    //     ss = starts[i]-1;
+    //     ee = ss + len[i]-1;
+    //     // Rprintf("Starting at %d, start=%d, end=%d\n", i, ss, ee);
+    //     // ss[i+1] >= ss[i] due to ordering from R-side
+    //     // if ee[i] >= ss[i+1], then there's overlap, pick largest of ee[i], ee[i+1]
+    //     while(++i < n && ee >= (new_ss = starts[i]-1)) {
+    //         new_ee = new_ss + len[i]-1;
+    //         ee = ee > new_ee ? ee : new_ee;
+    //     }
+    //     // Rprintf("Moved to %d, start=%d, end=%d\n", i, ss, ee);
+    //     for (j=ss; j<=ee; j++) ans[nxo ? xo[j]-1 : j] = 1;
+    // }
+    return (R_NilValue);
+}
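The marking loop above is compact enough to restate standalone: each match i covers len[i] rows starting at 1-based starts[i], and rows are flagged through the order vector xo when one is supplied (nxo > 0). A sketch with illustrative names:

```c
#include <assert.h>

/* Flag every row covered by a run of matches, as in src/inrange.c:
   starts[] and xo[] are 1-based (coming from R), ans[] is 0-based.
   Pass xo = NULL and nxo = 0 when the rows are already in order. */
void mark_inrange(int *ans, const int *xo, int nxo,
                  const int *starts, const int *len, int n) {
    for (int i = 0; i < n; i++) {
        for (int j = starts[i] - 1; j < starts[i] - 1 + len[i]; j++) {
            ans[nxo ? xo[j] - 1 : j] = 1;   /* indirect via order if present */
        }
    }
}
```

Overlapping runs simply re-set the same flag, which is why the simpler logic wins unless overlaps are extreme.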
diff --git a/src/openmp-utils.c b/src/openmp-utils.c
new file mode 100644
index 0000000..ca4871e
--- /dev/null
+++ b/src/openmp-utils.c
@@ -0,0 +1,64 @@
+#include "data.table.h"
+#ifdef _OPENMP
+#include <pthread.h>
+#endif
+
+/* GOALS:
+* 1) By default use all CPU for end-user convenience in most usage scenarios.
+* 2) But not on CRAN - two threads max is policy
+* 3) And not if user doesn't want to:
+*    i) Respect env variable OMP_NUM_THREADS (which just calls (ii) on startup)
+*    ii) Respect omp_set_num_threads()
+*    iii) Provide a way to restrict data.table's threads independently of base R and
+*         other packages using OpenMP
+* 4) Avoid user needing to remember to unset this control after their use of data.table
+* 5) Automatically drop down to 1 thread when called from parallel package (e.g. mclapply) to
+*    avoid the deadlock/hang (#1745 and #1727) and return to prior state afterwards.
+*/
+
+static int DTthreads = 0;
+// Never read directly, hence static. Always go via getDTthreads() and check that in
+// future using grep's in CRAN_Release.cmd
+
+int getDTthreads() {
+#ifdef _OPENMP
+    return DTthreads == 0 ? omp_get_max_threads() : DTthreads;  
+#else
+    return 1;
+#endif
+}
+
+SEXP getDTthreads_R() {
+    return ScalarInteger(getDTthreads());
+}
+
+SEXP setDTthreads(SEXP threads) {
+    if (!isInteger(threads) || length(threads) != 1 || INTEGER(threads)[0] < 0) {
+        // catches NA too since NA is -ve
+        error("Argument to setDTthreads must be a single integer >= 0. "
+              "Default 0 is recommended to use all CPU.");
+    }
+    // do not call omp_set_num_threads() here as that affects other openMP 
+    // packages and base R as well potentially.
+    int old = DTthreads;
+    DTthreads = INTEGER(threads)[0];
+    return ScalarInteger(old);
+}
+
+// auto avoid deadlock when data.table called from parallel::mclapply
+static int preFork_DTthreads = 0;
+void when_fork() {
+    preFork_DTthreads = DTthreads;
+    DTthreads = 1;
+}
+void when_fork_end() {
+    DTthreads = preFork_DTthreads;
+}
+void avoid_openmp_hang_within_fork() {
+    // Called once on loading data.table from init.c
+#ifdef _OPENMP
+    pthread_atfork(&when_fork, &when_fork_end, NULL);
+#endif
+}
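The when_fork()/when_fork_end() pattern above can be exercised without an actual fork(): the handlers just stash and restore one static int. A direct-call sketch (names suffixed _sketch are stand-ins, not data.table's symbols; the real code registers the handlers once via pthread_atfork()):

```c
#include <assert.h>

/* Simulates the fork guard in openmp-utils.c: the thread count is forced
   to 1 while a fork is in flight, so no OpenMP region spins up threads in
   the child, then restored in the parent afterwards. */
static int DTthreads_sketch = 4;   /* stand-in for the static DTthreads */
static int preFork_sketch = 0;

void fork_begin_sketch(void) { preFork_sketch = DTthreads_sketch; DTthreads_sketch = 1; }
void fork_end_sketch(void)   { DTthreads_sketch = preFork_sketch; }
int  threads_sketch(void)    { return DTthreads_sketch; }
```

In the real code fork_begin_sketch/fork_end_sketch correspond to the prepare and parent handlers; the child keeps the single-thread setting, which is the point of the guard.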
+
+
diff --git a/src/quickselect.c b/src/quickselect.c
new file mode 100644
index 0000000..668c8c2
--- /dev/null
+++ b/src/quickselect.c
@@ -0,0 +1,102 @@
+#include "data.table.h"
+#include <Rdefines.h>
+//#include <sys/mman.h>
+#include <Rversion.h>
+#include <fcntl.h>
+#include <time.h>
+
+// from good ol' Numerical Recipes in C
+#define SWAP(a,b) temp=(a);(a)=(b);(b)=temp;
+
+double dquickselect(double *x, int n, int k) {
+    unsigned long i,ir,j,l,mid;
+    double a,temp;
+
+    l=0;
+    ir=n-1;
+    for(;;) {
+        if (ir <= l+1) {
+            if (ir == l+1 && x[ir] < x[l]) {
+                SWAP(x[l],x[ir]);
+            }
+            return x[k];
+        } else {
+            mid=(l+ir) >> 1; 
+            SWAP(x[mid],x[l+1]);
+            if (x[l] > x[ir]) {
+                SWAP(x[l],x[ir]);
+            }
+            if (x[l+1] > x[ir]) {
+                SWAP(x[l+1],x[ir]);
+            }
+            if (x[l] > x[l+1]) {
+                SWAP(x[l],x[l+1]);
+            }
+            i=l+1; 
+            j=ir;
+            a=x[l+1]; 
+            for (;;) {
+                do i++; while (x[i] < a);
+                do j--; while (x[j] > a);
+                if (j < i) break;
+                SWAP(x[i],x[j]);
+            }
+            x[l+1]=x[j]; 
+            x[j]=a;
+            if (j >= k) ir=j-1; 
+            if (j <= k) l=i;
+        }
+    }
+}
+
+double iquickselect(int *x, int n, int k) {
+    unsigned long i,ir,j,l,mid;
+    int a,temp;
+
+    l=0;
+    ir=n-1;
+    for(;;) {
+        if (ir <= l+1) {
+            if (ir == l+1 && x[ir] < x[l]) {
+                SWAP(x[l],x[ir]);
+            }
+            return (double)(x[k]);
+        } else {
+            mid=(l+ir) >> 1; 
+            SWAP(x[mid],x[l+1]);
+            if (x[l] > x[ir]) {
+                SWAP(x[l],x[ir]);
+            }
+            if (x[l+1] > x[ir]) {
+                SWAP(x[l+1],x[ir]);
+            }
+            if (x[l] > x[l+1]) {
+                SWAP(x[l],x[l+1]);
+            }
+            i=l+1; 
+            j=ir;
+            a=x[l+1]; 
+            for (;;) {
+                do i++; while (x[i] < a);
+                do j--; while (x[j] > a);
+                if (j < i) break;
+                SWAP(x[i],x[j]);
+            }
+            x[l+1]=x[j]; 
+            x[j]=a;
+            if (j >= k) ir=j-1; 
+            if (j <= k) l=i;
+        }
+    }
+}
+
+
+// SEXP quickselect(SEXP xArg, SEXP n, SEXP k) {
+
+//     void *x = DATAPTR(xArg);
+//     SEXP ans = PROTECT(allocVector(REALSXP, 1L));
+//     REAL(ans)[0] = quickselectwrapper(x, INTEGER(n)[0], INTEGER(k)[0]-1);
+
+//     UNPROTECT(1);
+//     return(ans);
+// }
diff --git a/src/rbindlist.c b/src/rbindlist.c
index f149ce7..b5188ed 100644
--- a/src/rbindlist.c
+++ b/src/rbindlist.c
@@ -1,21 +1,11 @@
 #include "data.table.h"
 #include <Rdefines.h>
-#include <Rversion.h>
 #include <stdint.h>
 // #include <signal.h> // the debugging machinery + breakpoint aidee
 // raise(SIGINT);
 
 /* Eddi's hash setup for combining factor levels appropriately - untouched from previous state (except made combineFactorLevels static) */
 
-// Fixes #5150
-// a simple check for R version to decide if the type should be R_len_t or R_xlen_t
-// long vector support was added in R 3.0.0
-#if defined(R_VERSION) && R_VERSION >= R_Version(3, 0, 0)
-  typedef R_xlen_t RLEN;
-#else
-  typedef R_len_t RLEN;
-#endif
-
 // a simple linked list, will use this when finding global order for ordered factors
 // will keep two ints
 struct llist {
@@ -135,11 +125,12 @@ static void HashTableSetup(HashData *d, RLEN n)
     d->hash = shash;
     d->equal = sequal;
     MKsetup(d, n);
-    d->HashTable = malloc(sizeof(struct llist *) * (d->M));
-    if (d->HashTable == NULL) error("malloc failed in rbindlist.c. This part of the code will be reworked.");
+    //d->HashTable = malloc(sizeof(struct llist *) * (d->M));
+    //if (d->HashTable == NULL) error("malloc failed in rbindlist.c. This part of the code will be reworked.");
+    d->HashTable = (struct llist **)R_alloc(d->M, sizeof(struct llist *));
     for (RLEN i = 0; i < d->M; i++) d->HashTable[i] = NULL;
 }
-
+/*
 static void CleanHashTable(HashData *d)
 {
     struct llist * root, * tmp;
@@ -154,6 +145,7 @@ static void CleanHashTable(HashData *d)
     }
     free(d->HashTable);
 }
+*/
 
 // factorType is 1 for factor and 2 for ordered
 // will simply unique normal factors and attempt to find global order for ordered ones
@@ -204,8 +196,7 @@ SEXP combineFactorLevels(SEXP factorLevels, int * factorType, Rboolean * isRowOr
             }
             if (data.nmax-- < 0) error("hash table is full");
 
-            pl = malloc(sizeof(struct llist));
-            if (pl == NULL) error("malloc failed in rbindlist.c. This part of the code will be reworked.");
+            pl = (struct llist *)R_alloc(1, sizeof(struct llist));
             pl->next = NULL;
             pl->i = i;
             pl->j = j;
@@ -221,9 +212,10 @@ SEXP combineFactorLevels(SEXP factorLevels, int * factorType, Rboolean * isRowOr
     SEXP finalLevels = PROTECT(allocVector(STRSXP, uniqlen));
     R_len_t counter = 0;
     if (*factorType == 2) {
-        int * locs = malloc(sizeof(int) * len);
-        if (locs == NULL) error("malloc failed in rbindlist.c. This part of the code will be reworked.");
-        for (i = 0; i < len; ++i) locs[i] = 0;
+        int *locs = (int *)R_alloc(len, sizeof(int));
+        for (int i=0; i<len; i++) locs[i] = 0;
+        // note there's a goto (!!) to normalFactor below. When locs was allocated with malloc, the goto jumped over the
+        // old free() and caused a leak. Now uses the safer R_alloc.  TODO - review all this logic.
 
         R_len_t k;
         SEXP tmp;
@@ -299,7 +291,6 @@ SEXP combineFactorLevels(SEXP factorLevels, int * factorType, Rboolean * isRowOr
                 if (h[idx] == NULL) error("internal hash error, please report to datatable-help");
             }
         }
-        free (locs);
     }
 
  normalFactor:
@@ -325,7 +316,8 @@ SEXP combineFactorLevels(SEXP factorLevels, int * factorType, Rboolean * isRowOr
         }
     }
 
-    CleanHashTable(&data);
+    // CleanHashTable(&data);   No longer needed now that we use R_alloc(). But the hash table approach
+    // will be removed completely at some point.
 
     return finalLevels;
 }
@@ -502,12 +494,14 @@ static void preprocess(SEXP l, Rboolean usenames, Rboolean fill, struct preproce
     
     data->first = -1; data->lcount = 0; data->n_rows = 0; data->n_cols = 0; data->protecti = 0;
     data->max_type = NULL; data->is_factor = NULL; data->ans_ptr = R_NilValue; data->mincol=0;
-    data->fn_rows = Calloc(LENGTH(l), int); data->colname = R_NilValue;
+    data->fn_rows = (int *)R_alloc(LENGTH(l), sizeof(int));
+    data->colname = R_NilValue;
 
     // get first non null name, 'rbind' was doing a 'match.names' for each item.. which is a bit more time consuming.
     // And warning that it'll be matched by names is not necessary, I think, as that's the default for 'rbind'. We 
     // should instead document it.
     for (i=0; i<LENGTH(l); i++) { // isNull is checked already in rbindlist
+        data->fn_rows[i] = 0;  // careful to initialize before any 'continue' below, as R_alloc above doesn't initialize
         li = VECTOR_ELT(l, i);
         if (isNull(li)) continue;
         if (TYPEOF(li) != VECSXP) error("Item %d of list input is not a data.frame, data.table or list",i+1);
@@ -565,10 +559,12 @@ static void preprocess(SEXP l, Rboolean usenames, Rboolean fill, struct preproce
     
     // decide type of each column
     // initialize the max types - will possibly increment later
-    data->max_type  = Calloc(data->n_cols, SEXPTYPE);
-    data->is_factor = Calloc(data->n_cols, int);
+    data->max_type  = (SEXPTYPE *)R_alloc(data->n_cols, sizeof(SEXPTYPE));
+    data->is_factor = (int *)R_alloc(data->n_cols, sizeof(int));
     for (i = 0; i< data->n_cols; i++) {
         thisClass = R_NilValue;
+        data->max_type[i] = 0;
+        data->is_factor[i] = 0;
         if (usenames) f_ind = VECTOR_ELT(findices, i);
         for (j=data->first; j<LENGTH(l); j++) {
             if (data->is_factor[i] == 2) break;
@@ -585,7 +581,7 @@ static void preprocess(SEXP l, Rboolean usenames, Rboolean fill, struct preproce
             } else {
                 // Fix for #705, check attributes and error if non-factor class and not identical
                 if (!data->is_factor[i] && 
-                    !R_compute_identical(thisClass, getAttrib(thiscol, R_ClassSymbol), 0)) {
+                    !R_compute_identical(thisClass, getAttrib(thiscol, R_ClassSymbol), 0) && !fill) {
                     error("Class attributes at column %d of input list at position %d does not match with column %d of input list at position %d. Coercion of objects of class 'factor' alone is handled internally by rbind/rbindlist at the moment.", i+1, j+1, i+1, data->first+1);
                 }
                 type = TYPEOF(thiscol);
@@ -595,16 +591,27 @@ static void preprocess(SEXP l, Rboolean usenames, Rboolean fill, struct preproce
     }
 }
 
-SEXP rbindlist(SEXP l, SEXP sexp_usenames, SEXP sexp_fill) {
+// function does c(idcol, nm), where length(idcol)=1
+// fix for #1432, + more efficient to move the logic to C
+SEXP add_idcol(SEXP nm, SEXP idcol, int cols) {
+    SEXP ans = PROTECT(allocVector(STRSXP, cols+1));
+    SET_STRING_ELT(ans, 0, STRING_ELT(idcol, 0));
+    for (int i=0; i<cols; i++) {
+        SET_STRING_ELT(ans, i+1, STRING_ELT(nm, i));
+    }
+    UNPROTECT(1);
+    return (ans);
+}
+
+SEXP rbindlist(SEXP l, SEXP sexp_usenames, SEXP sexp_fill, SEXP idcol) {
     
     R_len_t jj, ansloc, resi, i,j,r, idx, thislen;
     struct preprocessData data; 
-    Rboolean usenames, fill, to_copy = FALSE, coerced=FALSE;
+    Rboolean usenames, fill, to_copy = FALSE, coerced=FALSE, isidcol = !isNull(idcol);
     SEXP fnames = R_NilValue, findices = R_NilValue, f_ind = R_NilValue, ans, lf, li, target, thiscol, levels;
     SEXP factorLevels = R_NilValue, finalFactorLevels;
-    Rboolean *isRowOrdered = NULL;
-    R_len_t protecti;
-    
+    R_len_t protecti=0;
+
     // first level of error checks
     if (!isLogical(sexp_usenames) || LENGTH(sexp_usenames)!=1 || LOGICAL(sexp_usenames)[0]==NA_LOGICAL)
         error("use.names should be TRUE or FALSE");
@@ -620,26 +627,31 @@ SEXP rbindlist(SEXP l, SEXP sexp_usenames, SEXP sexp_fill) {
         warning("Resetting 'use.names' to TRUE. 'use.names' can not be FALSE when 'fill=TRUE'.\n");
         usenames=TRUE;
     }
+
     // check for factor, get max types, and when usenames=TRUE get the answer 'names' and column indices for proper reordering.
     preprocess(l, usenames, fill, &data);
     fnames   = VECTOR_ELT(data.ans_ptr, 0);
     findices = VECTOR_ELT(data.ans_ptr, 1);
-    protecti = data.protecti;
+    protecti = data.protecti;   // TODO very ugly and doesn't seem right. Assign items to list instead, perhaps.
     if (data.n_rows == 0 && data.n_cols == 0) {
         UNPROTECT(protecti);
         return(R_NilValue);
     }
-
+    if (isidcol) {
+        fnames = PROTECT(add_idcol(fnames, idcol, data.n_cols));
+        protecti++;
+    }
     factorLevels = PROTECT(allocVector(VECSXP, data.lcount));
-    isRowOrdered = Calloc(data.lcount, Rboolean);
+    Rboolean *isRowOrdered = (Rboolean *)R_alloc(data.lcount, sizeof(Rboolean));
+    for (int i=0; i<data.lcount; i++) isRowOrdered[i] = FALSE;
     
-    ans = PROTECT(allocVector(VECSXP, data.n_cols)); protecti++;
+    ans = PROTECT(allocVector(VECSXP, data.n_cols+isidcol)); protecti++;
     setAttrib(ans, R_NamesSymbol, fnames);
     lf = VECTOR_ELT(l, data.first);
     for(j=0; j<data.n_cols; j++) {
         if (fill) target = allocNAVector(data.max_type[j], data.n_rows);
         else target = allocVector(data.max_type[j], data.n_rows);
-        SET_VECTOR_ELT(ans, j, target);
+        SET_VECTOR_ELT(ans, j+isidcol, target);
         
         if (usenames) {
             to_copy = TRUE;
@@ -714,6 +726,11 @@ SEXP rbindlist(SEXP l, SEXP sexp_usenames, SEXP sexp_fill) {
                 for (r=0; r<thislen; r++)
                     SET_VECTOR_ELT(target, ansloc+r, VECTOR_ELT(thiscol,r));
                 break;
+	    case CPLXSXP : // #1659 fix
+		if (TYPEOF(thiscol) != TYPEOF(target)) error("Internal logical error in rbindlist.c, type of 'thiscol' should have already been coerced to 'target'. Please report to datatable-help.");
+		for (r=0; r<thislen; r++)
+		    COMPLEX(target)[ansloc+r] = COMPLEX(thiscol)[r];
+		break;
             case REALSXP:
             case INTSXP:
             case LGLSXP:
@@ -735,16 +752,34 @@ SEXP rbindlist(SEXP l, SEXP sexp_usenames, SEXP sexp_fill) {
             finalFactorLevels = combineFactorLevels(factorLevels, &(data.is_factor[j]), isRowOrdered);
             SEXP factorLangSxp = PROTECT(lang3(install(data.is_factor[j] == 1 ? "factor" : "ordered"),
                                                target, finalFactorLevels));
-            SET_VECTOR_ELT(ans, j, eval(factorLangSxp, R_GlobalEnv));
+            SET_VECTOR_ELT(ans, j+isidcol, eval(factorLangSxp, R_GlobalEnv));
             UNPROTECT(2);  // finalFactorLevels, factorLangSxp
         }
     }
     if (factorLevels != R_NilValue) UNPROTECT_PTR(factorLevels);
 
-    Free(data.max_type);
-    Free(data.is_factor);
-    Free(data.fn_rows);
-    Free(isRowOrdered);
+    // fix for #1432, + more efficient to move the logic to C
+    if (isidcol) {
+        R_len_t runidx = 1, cntridx = 0;
+        SEXP lnames = getAttrib(l, R_NamesSymbol);
+        if (isNull(lnames)) {
+            target = allocVector(INTSXP, data.n_rows);
+            SET_VECTOR_ELT(ans, 0, target);
+            for (i=0; i<LENGTH(l); i++) {
+                for (j=0; j<data.fn_rows[i]; j++)
+                    INTEGER(target)[cntridx++] = runidx;
+                runidx++;
+            }
+        } else {
+            target = allocVector(STRSXP, data.n_rows);
+            SET_VECTOR_ELT(ans, 0, target);
+            for (i=0; i<LENGTH(l); i++) {
+                for (j=0; j<data.fn_rows[i]; j++)
+                    SET_STRING_ELT(target, cntridx++, STRING_ELT(lnames, i));
+            }
+        }
+    }
+
     UNPROTECT(protecti);
     return(ans);
 }
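The new idcol branch above expands one id per input table across however many rows that table contributed (fn_rows), i.e. rep(seq_along(l), fn_rows) done in C. A minimal sketch of the integer case (hypothetical names, not the shipped code):

```c
#include <stddef.h>

/* Sketch of the integer idcol fill above: for each of the ntbl input
   tables, write its 1-based index into 'out' once per row that table
   contributed. 'out' must have room for sum(fn_rows) ints.
   Returns the total number of rows written. */
size_t fill_idcol(int *out, const int *fn_rows, int ntbl)
{
    size_t cntridx = 0;
    int runidx = 1;
    for (int i = 0; i < ntbl; i++) {
        for (int j = 0; j < fn_rows[i]; j++)
            out[cntridx++] = runidx;   /* same id repeated fn_rows[i] times */
        runidx++;                      /* next input table, even if empty */
    }
    return cntridx;
}
```

Note the run index advances even for zero-row inputs, matching the loop in rbindlist above.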
diff --git a/src/reorder.c b/src/reorder.c
index 74008d8..bcc2a58 100644
--- a/src/reorder.c
+++ b/src/reorder.c
@@ -1,5 +1,107 @@
 #include "data.table.h"
-#include <Rdefines.h>
+
+SEXP reorder(SEXP x, SEXP order)
+{
+    // For internal use only by setkey().
+    // 'order' must strictly be a permutation of 1:n (i.e. no repeats, zeros or NAs)
+    // If only a small subset in the middle is reordered the ends are moved in: [start,end].
+    // x may be a vector, or a list of same-length vectors such as data.table
+    
+    R_len_t nrow, ncol;
+    int maxSize = 0;
+    if (isNewList(x)) {
+      nrow = length(VECTOR_ELT(x,0));
+      ncol = length(x);
+      for (int i=0; i<ncol; i++) {
+        SEXP v = VECTOR_ELT(x,i);
+        if (SIZEOF(v)!=4 && SIZEOF(v)!=8)
+          error("Item %d of list is type '%s' which isn't yet supported", i+1, type2char(TYPEOF(v)));
+        if (length(v)!=nrow)
+          error("Column %d is length %d which differs from length of column 1 (%d). Invalid data.table.", i+1, length(v), nrow);
+        if (SIZEOF(v) > maxSize)
+          maxSize=SIZEOF(v);
+      }
+    } else {
+      if (SIZEOF(x)!=4 && SIZEOF(x)!=8)
+        error("reorder accepts vectors but this non-VECSXP is type '%s' which isn't yet supported", type2char(TYPEOF(x)));
+      maxSize = SIZEOF(x);
+      nrow = length(x);
+      ncol = 1;
+    }
+    if (!isInteger(order)) error("order must be an integer vector");
+    if (length(order) != nrow) error("nrow(x)[%d]!=length(order)[%d]",nrow,length(order));
+    
+    R_len_t start = 0;
+    while (start<nrow && INTEGER(order)[start] == start+1) start++;
+    if (start==nrow) return(R_NilValue);  // input is 1:n, nothing to do
+    R_len_t end = nrow-1;
+    while (INTEGER(order)[end] == end+1) end--;
+    for (R_len_t i=start; i<=end; i++) { 
+      int itmp = INTEGER(order)[i]-1;
+      if (itmp<start || itmp>end) error("order is not a permutation of 1:nrow[%d]", nrow);
+    }
+    // Creorder is for internal use (so we should get the input right!), but the check above seems sensible, otherwise
+    // would be a segfault below. The for loop above should run in negligible time (sequential) and will also catch NAs.
+    // It won't catch duplicates in order, but that's ok. Checking that would be going too far given this is for internal use only.
+    
+    // Enough working ram for one column of the largest type, for every thread.
+    // Up to a limit of 1GB total. It's system dependent how to find out the truly free RAM - TODO.
+    // Without a limit it could easily start swapping and not work at all.
+    int nth = MIN(getDTthreads(), ncol);
+    size_t oneTmpSize = (end-start+1)*(size_t)maxSize;
+    size_t totalLimit = 1024*1024*(size_t)1024;  // 1GB
+    nth = MIN(totalLimit/oneTmpSize, nth);
+    if (nth==0) nth=1;  // if one column's worth is very big, we'll just have to try
+    char *tmp[nth];  // VLA ok because small; limited to max getDTthreads() not ncol which could be > 1e6
+    int ok=0; for (; ok<nth; ok++) {
+      tmp[ok] = malloc(oneTmpSize);
+      if (tmp[ok] == NULL) break;
+    }
+    if (ok==0) error("unable to allocate %d * %d bytes of working memory for reordering data.table", end-start+1, maxSize);
+    nth = ok;  // as many threads for which we have a successful malloc
+    // So we can still reorder a 10GB table in 16GB of RAM, as long as we have at least one column's worth of tmp
+    
+    #pragma omp parallel for schedule(dynamic) num_threads(nth)
+    for (int i=0; i<ncol; i++) {
+      const SEXP v = isNewList(x) ? VECTOR_ELT(x,i) : x;
+      const int size = SIZEOF(v);
+      const int me = omp_get_thread_num();
+      const int *vi = INTEGER(order)+start;
+      if (size==4) {
+        const int *vd = (const int *)DATAPTR(v);
+        int *tmpp = (int *)tmp[me];
+        for (int j=start; j<=end; j++) {
+          *tmpp++ = vd[*vi++ -1];  // just copies 4 bytes, including pointers on 32bit
+        }
+      } else {
+        const double *vd = (const double *)DATAPTR(v);
+        double *tmpp = (double *)tmp[me];
+        for (int j=start; j<=end; j++) {
+          *tmpp++ = vd[*vi++ -1];  // just copies 8 bytes, pointers too including STRSXP and VECSXP
+        }
+      }
+      // How is it possible to not only ignore the write barrier but also run in parallel?
+      // Only because this reorder() function accepts and checks a unique permutation of 1:nrow. It
+      // performs an in-place shuffle. This operation in the end does not change gcgen, mark or
+      // named/refcnt. They all stay the same even for STRSXP and VECSXP because it's just a data shuffle.
+      //
+      // Theory:
+      // The write to tmp is contiguous and io efficient (so less threads should not help that)
+      // The read from vd is as io efficient as order is ordered (the more threads the better when close
+      // to ordered but less threads may help when not very ordered).
+      // TODO large data benchmark to confirm theory and auto tune.
+      // io probably limits too much but at least this is our best shot (e.g. branchless) in finding out
+      // on other platforms with faster bus, perhaps
+      
+      // copy the reordered data back into the original vector
+      memcpy((char *)DATAPTR(v) + start*(size_t)size,
+             tmp[me],
+             (end-start+1)*(size_t)size);
+      // size_t, otherwise #5305 (integer overflow in memcpy)
+    }
+    for (int i=0; i<nth; i++) free(tmp[i]);
+    return(R_NilValue);
+}
 
 // reverse a vector - equivalent of rev(x) in base, but implemented in C and about 12x faster (on 1e8)
 SEXP setrev(SEXP x) {
@@ -35,109 +137,4 @@ SEXP setrev(SEXP x) {
     return(R_NilValue);
 }
 
-SEXP reorder(SEXP x, SEXP order)
-{
-    // For internal use only by setkey().
-    // Reordering a vector in-place doesn't change generations so we can skip SET_STRING_ELT overhead etc.
-    // Speed is dominated by page fetch when input is randomly ordered so we're at the software limit here (better bus etc should shine).
-    // 'order' must strictly be a permutation of 1:n (i.e. no repeats, zeros or NAs)
-    // If only a small subset is reordered, this is detected using start and end.
-    // x may be a vector, or a list of vectors e.g. data.table
-    char *tmp, *tmpp, *vd;
-    SEXP v;
-    R_len_t i, j, itmp, nrow, ncol, start, end;
-    size_t size; // must be size_t, otherwise bug #5305 (integer overflow in memcpy)
-
-    if (isNewList(x)) {
-        nrow = length(VECTOR_ELT(x,0));
-        ncol = length(x);
-        for (i=0;i<ncol;i++) {
-            v = VECTOR_ELT(x,i);
-            if (SIZEOF(v) == 0) error("Item %d of list is type '%s' which isn't yet supported", i+1, type2char(TYPEOF(v)));
-            if (length(v)!=nrow) error("Column %d is length %d which differs from length of column 1 (%d). Invalid data.table.", i+1, length(v), nrow);
-        }
-    } else {
-        if (SIZEOF(x) == 0) error("reorder accepts vectors but this non-VECSXP is type '%s' which isn't yet supported", type2char(TYPEOF(x)));
-        nrow = length(x);
-        ncol = 1;
-    }
-    if (!isInteger(order)) error("order must be an integer vector");
-    if (length(order) != nrow) error("nrow(x)[%d]!=length(order)[%d]",nrow,length(order));
-    
-    start = 0;
-    while (start<nrow && INTEGER(order)[start] == start+1) start++;
-    if (start==nrow) return(R_NilValue);  // input is 1:n, nothing to do
-    end = nrow-1;
-    while (INTEGER(order)[end] == end+1) end--;
-    for (i=start; i<=end; i++) { itmp=INTEGER(order)[i]-1; if (itmp<start || itmp>end) error("order is not a permutation of 1:nrow[%d]", nrow); }
-    // Creorder is for internal use (so we should get the input right!), but the check above seems sensible, otherwise
-    // would be segfault below. The for loop above should run in neglible time (sequential) and will also catch NAs.
-    // It won't catch duplicates in order, but that's ok. Checking that would be going too far given this is for internal use only.
-    
-    tmp=(char *)malloc((end-start+1)*sizeof(double));   // Enough working space for one column of the largest type. setSizes() has a check too.
-                                                        // So we can reorder a 10GB table in 16GB of RAM
-    if (!tmp) error("unable to allocate %d * %d bytes of working memory for reordering data.table", end-start+1, sizeof(double));
-    for (i=0; i<ncol; i++) {
-        v = isNewList(x) ? VECTOR_ELT(x,i) : x;
-        size = SIZEOF(v);
-        if (!size) error("don't know how to reorder type '%s' of column %d. Please send this message to datatable-help",type2char(TYPEOF(v)),i+1);
-        tmpp=tmp;
-        vd = (char *)DATAPTR(v);
-        if (size==4) {
-            for (j=start;j<=end;j++) {
-                *(int *)tmpp = ((int *)vd)[INTEGER(order)[j]-1];  // just copies 4 bytes (pointers on 32bit too)
-                tmpp += 4;
-            }
-        } else {
-            if (size!=8) error("Size of column %d's type isn't 4 or 8", i+1);
-            for (j=start;j<=end;j++) {
-                *(double *)tmpp = ((double *)vd)[INTEGER(order)[j]-1];  // just copies 8 bytes (pointers on 64bit too)
-                tmpp += 8;
-            }
-        }
-        memcpy(vd + start*size, tmp, (end-start+1) * size);
-    }
-    free(tmp);
-    return(R_NilValue);
-}
-
-
-/* 
-used to be : 
-for (j=0;j<nrow;j++) {
-  memcpy((char *)tmpp, (char *)DATAPTR(VECTOR_ELT(dt,i)) + ((size_t)(INTEGER(order)[j]-1))*size, size);
-  tmpp += size;
-}
-This added 5s in 4e8 calls (see below) [-O3 on]. That 5s is insignificant vs page fetch, though, unless already
-ordered when page fetch goes away. Perhaps memcpy(dest, src, 4) and memcpy(dest, src, 8) would be optimized to remove
-call overhead but memcpy(dest, src, size) isn't since optimizer doesn't know value of 'size' variable up front.
-
-DT = setDT(lapply(1:4, function(x){sample(1e5,1e8,replace=TRUE)}))
-o = fastorder(DT, 1L)
-none = 1:1e8                           
-                                       
-# worst case 5% faster, thoroughly random     # before   after
-system.time(.Call(Creorder,DT,o))             # 102.301  97.082
-system.time(.Call(Creorder,DT,o))             # 102.602  97.001
-
-# best case 50% faster, already ordered       # before   after
-system.time(.Call(Creorder,DT,none))          # 9.310    4.187
-system.time(.Call(Creorder,DT,none))          # 9.295    4.077
-
-# Somewhere inbetween 5%-50% would be e.g. grouped data where we're reordering the blocks.
-# But, likely the worst case speedup most of the time. On my slow netbook anyway, with slow RAM.
-
-# However, on decent laptop (faster RAM/bus etc) it looks better at 40% speedup consistently ...
-
-# thoroughly random                           # before   after
-system.time(.Call("Creorder",DT,o))           # 33.216   19.184
-system.time(.Call("Creorder",DT,o))           # 30.556   18.667
-
-# already ordered                             # before   after
-system.time(.Call("Creorder",DT,none))        # 4.172    2.384
-system.time(.Call("Creorder",DT,none))        # 4.076    2.356
-
-*/
-
-
 
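The kernel of the new reorder() above is a gather into a scratch buffer followed by one memcpy back, so writes stay contiguous. A standalone single-column sketch (hypothetical names, plain C, no R internals):

```c
#include <stdlib.h>
#include <string.h>

/* Sketch of the reorder() kernel above: apply a 1-based permutation
   'order' to a double column in place, via a scratch buffer so reads are
   gathered and the write back is one contiguous memcpy.
   Returns 0 on success, -1 if the scratch allocation fails. */
int reorder_dbl(double *x, const int *order, int n)
{
    double *tmp = malloc((size_t)n * sizeof(double));
    if (tmp == NULL) return -1;
    for (int i = 0; i < n; i++)
        tmp[i] = x[order[i] - 1];   /* gather: random reads, sequential writes */
    memcpy(x, tmp, (size_t)n * sizeof(double));
    free(tmp);
    return 0;
}
```

The real function additionally trims the already-ordered head and tail ([start,end]) and runs one such gather per column across OpenMP threads.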
diff --git a/src/shift.c b/src/shift.c
index df23917..255d8cb 100644
--- a/src/shift.c
+++ b/src/shift.c
@@ -110,6 +110,18 @@ SEXP shift(SEXP obj, SEXP k, SEXP fill, SEXP type) {
                         copyMostAttrib(this, tmp);
                     }
                 break;
+
+		case VECSXP :
+		    thisfill = PROTECT(coerceVector(fill, VECSXP));
+		    for (j=0; j<nk; j++) {
+			tmp = allocVector(VECSXP, xrows);
+			SET_VECTOR_ELT(ans, i*nk+j, tmp);
+			for (m=0; m<xrows; m++)
+			    SET_VECTOR_ELT(tmp, m, (m < INTEGER(k)[j]) ? VECTOR_ELT(thisfill, 0) : VECTOR_ELT(this, m - INTEGER(k)[j]));
+			copyMostAttrib(this, tmp);
+		    }
+		break;
+
                 default :
                     error("Unsupported type '%s'", type2char(TYPEOF(this)));
             }
@@ -193,6 +205,18 @@ SEXP shift(SEXP obj, SEXP k, SEXP fill, SEXP type) {
                         copyMostAttrib(this, tmp);
                     }
                 break;
+
+		case VECSXP :
+		    thisfill = PROTECT(coerceVector(fill, VECSXP));
+		    for (j=0; j<nk; j++) {
+			tmp = allocVector(VECSXP, xrows);
+			SET_VECTOR_ELT(ans, i*nk+j, tmp);
+			for (m=0; m<xrows; m++)
+			    SET_VECTOR_ELT(tmp, m, (xrows-m <= INTEGER(k)[j]) ? VECTOR_ELT(thisfill, 0) : VECTOR_ELT(this, m + INTEGER(k)[j]));
+			copyMostAttrib(this, tmp);
+		    }
+		break;
+
     	        default :
     	            error("Unsupported type '%s'", type2char(TYPEOF(this)));
         	}
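The VECSXP branches added to shift() above use the rule (m < k ? fill : x[m-k]) for a lag. The same rule on a plain int column, as a small sketch (hypothetical names, not the shipped code):

```c
/* Sketch of the "lag" rule above for a plain int column: shift x right by
   k positions into out, filling the vacated head with 'fill'. */
void shift_lag_int(int *out, const int *x, int n, int k, int fill)
{
    for (int m = 0; m < n; m++)
        out[m] = (m < k) ? fill : x[m - k];
}
```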
diff --git a/src/subset.c b/src/subset.c
new file mode 100644
index 0000000..1a12128
--- /dev/null
+++ b/src/subset.c
@@ -0,0 +1,313 @@
+#include "data.table.h"
+
+static SEXP subsetVectorRaw(SEXP target, SEXP source, SEXP idx, Rboolean any0orNA)
+// Only for use by subsetDT() or subsetVector() below, hence static
+{
+    if (!length(target)) return target;
+
+    const int max=length(source);
+    switch(TYPEOF(source)) {
+    case INTSXP : case LGLSXP :
+        if (any0orNA) {
+          // any 0 or NA *in idx*; if there's 0 or NA in the data that's just regular data to be copied
+          for (int i=0, ansi=0; i<LENGTH(idx); i++) {
+              int this = INTEGER(idx)[i];
+              if (this==0) continue;
+              INTEGER(target)[ansi++] = (this==NA_INTEGER || this>max) ? NA_INTEGER : INTEGER(source)[this-1];
+              // negatives are checked before (in check_idx()) not to have reached here
+              // NA_INTEGER == NA_LOGICAL is checked in init.c
+          }
+        } else {
+          // totally branch free to give optimizer/hardware best chance on all platforms
+          // We keep the branchless version together here inside the same switch to keep
+          // the code together by type
+          // INTEGER and LENGTH are up front to isolate in preparation to stop using USE_RINTERNALS
+          int *vd = INTEGER(source);
+          int *vi = INTEGER(idx);
+          int *p =  INTEGER(target);
+          const int upp = LENGTH(idx);
+          for (int i=0; i<upp; i++) *p++ = vd[vi[i]-1];
+        }
+        break;
+    case REALSXP :
+        if (any0orNA) {
+          // define needed vars just when we need them. To registerize and to limit scope related bugs 
+          union { double d; long long ll; } naval;
+          if (INHERITS(source, char_integer64)) naval.ll = NAINT64;
+          else naval.d = NA_REAL;
+          for (int i=0, ansi=0; i<LENGTH(idx); i++) {
+              int this = INTEGER(idx)[i];
+              if (this==0) continue;
+              REAL(target)[ansi++] = (this==NA_INTEGER || this>max) ? naval.d : REAL(source)[this-1];
+          }
+        } else {
+          double *vd = REAL(source);
+          int *vi =    INTEGER(idx);
+          double *p =  REAL(target);
+          const int upp = LENGTH(idx);
+          for (int i=0; i<upp; i++) *p++ = vd[vi[i]-1];
+        }
+        break;
+    case STRSXP : {
+        #pragma omp critical
+        // write barrier is not thread safe. We can, and do, process non-STRSXP columns at the same time, though.
+        // we don't strictly need the critical since subsetDT has been written to dispatch one-thread only to
+        // do all the STRSXP columns, but keep the critical here anyway for safety. So long as it's once at high
+        // level as it is here and not deep.
+        // We could go parallel here but would need access to NODE_IS_OLDER, at least. Given gcgen, mark and named
+        // are upper bounded and max 3, REFCNT==REFCNTMAX could be checked first and then critical SET_ if not.
+        // Inside that critical just before SET_ it could check REFCNT<REFCNTMAX still held. Similarly for gcgen.
+        // TODO - discuss with Luke Tierney. Produce benchmarks on integer/double to see if it's worth making a safe
+        // API interface for package use for STRSXP.
+        {
+          if (any0orNA) {
+            for (int i=0, ansi=0; i<LENGTH(idx); i++) {
+                int this = INTEGER(idx)[i];
+                if (this==0) continue;
+                SET_STRING_ELT(target, ansi++, (this==NA_INTEGER || this>max) ? NA_STRING : STRING_ELT(source, this-1));
+            }
+          } else {
+            SEXP *vd = (SEXP *)DATAPTR(source);
+            int *vi =    INTEGER(idx);
+            const int upp = LENGTH(idx);
+            for (int i=0; i<upp; i++) SET_STRING_ELT(target, i, vd[vi[i]-1]);
+            // Aside: setkey() knows it always receives a permutation (it does a shuffle in-place) and so doesn't
+            // need to use SET_*. setkey() can do its own parallelism therefore, including STRSXP and VECSXP.
+          }
+        }}
+        break;
+    case VECSXP : {
+        #pragma omp critical
+        {
+          if (any0orNA) {
+            for (int i=0, ansi=0; i<LENGTH(idx); i++) {
+                int this = INTEGER(idx)[i];
+                if (this==0) continue;
+                SET_VECTOR_ELT(target, ansi++, (this==NA_INTEGER || this>max) ? R_NilValue : VECTOR_ELT(source, this-1));
+            }
+          } else {
+            for (int i=0; i<LENGTH(idx); i++) {
+                SET_VECTOR_ELT(target, i, VECTOR_ELT(source, INTEGER(idx)[i]-1));
+            }
+          }
+        }}
+        break;
+    case CPLXSXP :
+        if (any0orNA) {
+          for (int i=0, ansi=0; i<LENGTH(idx); i++) {
+              int this = INTEGER(idx)[i];
+              if (this==0) continue;
+              if (this==NA_INTEGER || this>max) {
+                  COMPLEX(target)[ansi].r = NA_REAL;
+                  COMPLEX(target)[ansi++].i = NA_REAL;
+              } else COMPLEX(target)[ansi++] = COMPLEX(source)[this-1];
+          }
+        } else {
+          for (int i=0; i<LENGTH(idx); i++)
+              COMPLEX(target)[i] = COMPLEX(source)[INTEGER(idx)[i]-1];
+        }
+        break;
+    case RAWSXP :
+        if (any0orNA) {
+          for (int i=0, ansi=0; i<LENGTH(idx); i++) {
+              int this = INTEGER(idx)[i];
+              if (this==0) continue;
+              RAW(target)[ansi++] = (this==NA_INTEGER || this>max) ? (Rbyte) 0 : RAW(source)[this-1];
+          }
+        } else {
+          for (int i=0; i<LENGTH(idx); i++)
+              RAW(target)[i] = RAW(source)[INTEGER(idx)[i]-1];
+        }
+        break;
+    // default :
+    // no error() needed here as caught earlier when single threaded; error() here not thread-safe.
+    }
+    return target;
+}
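When check_idx has established there are no 0s, NAs or out-of-range entries, subsetVectorRaw above takes the branch-free path: a tight gather with no per-element test. That path for INTSXP, as a standalone sketch (hypothetical names):

```c
/* Sketch of the branch-free INTSXP path above: idx is 1-based and known to
   contain no 0s, NAs or values beyond the source length, so the gather
   needs no per-element check. */
void subset_int_nocheck(int *target, const int *source, const int *idx, int n)
{
    for (int i = 0; i < n; i++)
        target[i] = source[idx[i] - 1];
}
```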
+
+static void check_idx(SEXP idx, int max, /*outputs...*/int *ansLen, Rboolean *any0orNA)
+// count non-0 in idx => the length of the subset result stored in *ansLen
+// return whether any 0, NA (or >max) exist and set any0orNA if so, for branchless subsetVectorRaw
+// >max is treated as NA for consistency with [.data.frame and operations like cbind(DT[w],DT[w+1])
+// if any negatives then error since they should have been dealt with by convertNegativeIdx() called
+// from R level first. 
+// do this once up-front and reuse the result for each column
+// single cache efficient sweep so no need to go parallel (well, very low priority to go parallel)
+{
+    if (!isInteger(idx)) error("Internal error. 'idx' is type '%s' not 'integer'", type2char(TYPEOF(idx)));
+    Rboolean anyNeg=FALSE, anyNA=FALSE;
+    int ans=0;
+    for (int i=0; i<LENGTH(idx); i++) {
+        int this = INTEGER(idx)[i];
+        ans += (this!=0);
+        anyNeg |= this<0 && this!=NA_INTEGER;
+        anyNA |= this==NA_INTEGER || this>max;
+    }
+    if (anyNeg) error("Internal error: idx contains negatives. Should have been dealt with earlier.");
+    *ansLen = ans;
+    *any0orNA = ans<LENGTH(idx) || anyNA;
+}
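The single sweep in check_idx above both counts the non-zero entries (the result length) and decides whether the branchy copy path is needed. A plain-C sketch, using INT_MIN for NA_INTEGER as R does (names are hypothetical):

```c
#include <limits.h>

#define SKETCH_NA_INTEGER INT_MIN  /* R's NA_INTEGER is INT_MIN */

/* Sketch of check_idx above: one pass over idx counting non-zero entries
   into *ansLen and setting *any0orNA when a 0, NA or >max entry means the
   checked copy path must be used. */
void check_idx_sketch(const int *idx, int n, int max, int *ansLen, int *any0orNA)
{
    int ans = 0, anyNA = 0;
    for (int i = 0; i < n; i++) {
        int this = idx[i];
        ans += (this != 0);
        anyNA |= (this == SKETCH_NA_INTEGER || this > max);
    }
    *ansLen = ans;
    *any0orNA = (ans < n) || anyNA;   /* ans < n means a 0 was present */
}
```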
+
+// TODO - currently called from R level first. Can it be called from check_idx instead?
+SEXP convertNegativeIdx(SEXP idx, SEXP maxArg)
+{
+    // + more precise and helpful error messages telling the user exactly where the problem is (saving user debugging time)
+    // + a little more efficient than negativeSubscript in src/main/subscript.c (it's private to R so we can't call it anyway)
+
+    if (!isInteger(idx)) error("Internal error. 'idx' is type '%s' not 'integer'", type2char(TYPEOF(idx)));
+    if (!isInteger(maxArg) || length(maxArg)!=1) error("Internal error. 'maxArg' is type '%s' and length %d, should be an integer singleton", type2char(TYPEOF(maxArg)), length(maxArg));
+    int max = INTEGER(maxArg)[0];
+    if (max<0) error("Internal error. max is %d, must be >= 0.", max);  // NA also an error which'll print as INT_MIN
+    int firstNegative = 0, firstPositive = 0, firstNA = 0, num0 = 0;
+    for (int i=0; i<LENGTH(idx); i++) {
+        int this = INTEGER(idx)[i];
+        if (this==NA_INTEGER) { if (firstNA==0) firstNA = i+1;  continue; }
+        if (this==0)          { num0++;  continue; }
+        if (this>0)           { if (firstPositive==0) firstPositive=i+1; continue; }
+        if (firstNegative==0) firstNegative=i+1;
+    }
+    if (firstNegative==0) return(idx);  // 0's and NA can be mixed with positives, there are no negatives present, so we're done
+    if (firstPositive) error("Item %d of i is %d and item %d is %d. Cannot mix positives and negatives.",
+                 firstNegative, INTEGER(idx)[firstNegative-1], firstPositive, INTEGER(idx)[firstPositive-1]);
+    if (firstNA)       error("Item %d of i is %d and item %d is NA. Cannot mix negatives and NA.",
+                 firstNegative, INTEGER(idx)[firstNegative-1], firstNA);
+
+    // idx is all negative without any NA but perhaps 0 present (num0) ...
+
+    char *tmp = (char *)R_alloc(max, sizeof(char));    // 4 times less memory than INTSXP in src/main/subscript.c
+    for (int i=0; i<max; i++) tmp[i] = 0;
+    // Not using Calloc as valgrind shows it leaking (I don't see why) - just changed to R_alloc to be done with it.
+    // Maybe R needs to be rebuilt with valgrind before Calloc's Free can be matched up by valgrind?
+    int firstDup = 0, numDup = 0, firstBeyond = 0, numBeyond = 0;
+    for (int i=0; i<LENGTH(idx); i++) {
+        int this = -INTEGER(idx)[i];
+        if (this==0) continue;
+        if (this>max) {
+            numBeyond++;
+            if (firstBeyond==0) firstBeyond=i+1;
+            continue;
+        }
+        if (tmp[this-1]==1) {
+            numDup++;
+            if (firstDup==0) firstDup=i+1;
+        } else tmp[this-1] = 1;
+    }
+    if (numBeyond)
+        warning("Item %d of i is %d but there are only %d rows. Ignoring this and %d more like it out of %d.", firstBeyond, INTEGER(idx)[firstBeyond-1], max, numBeyond-1, LENGTH(idx));
+    if (numDup)
+        warning("Item %d of i is %d which has occurred before. Ignoring this and %d other duplicates out of %d.", firstDup, INTEGER(idx)[firstDup-1], numDup-1, LENGTH(idx));
+
+    SEXP ans = PROTECT(allocVector(INTSXP, max-LENGTH(idx)+num0+numDup+numBeyond));
+    int ansi = 0;
+    for (int i=0; i<max; i++) if (tmp[i]==0) INTEGER(ans)[ansi++] = i+1;
+    UNPROTECT(1);
+    if (ansi != max-LENGTH(idx)+num0+numDup+numBeyond) error("Internal error: ansi[%d] != max[%d]-LENGTH(idx)[%d]+num0[%d]+numDup[%d]+numBeyond[%d] in convertNegativeIdx",ansi,max,LENGTH(idx),num0,numDup,numBeyond);
+    return(ans);
+}
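Outside of R's C API, the complement-set step of `convertNegativeIdx` can be sketched in plain C: mark each negated index in a one-byte-per-row table, then emit the unmarked positions in ascending order. The helper name `complement_idx` and the heap buffers are hypothetical, purely for illustration; duplicates and out-of-range values are silently skipped, mirroring the warnings (not errors) above.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical standalone sketch: idx holds negative 1-based indices
   (zeros allowed and skipped); returns the ascending positive indices
   NOT excluded, writing their count to *outn. */
static int *complement_idx(const int *idx, int n, int max, int *outn)
{
    char *seen = calloc(max, 1);        /* one byte per row, like the R_alloc'd tmp */
    int excluded = 0;
    for (int i = 0; i < n; i++) {
        int v = -idx[i];                /* negate to get the row to drop */
        if (v <= 0 || v > max) continue;        /* skip 0s and out-of-range */
        if (!seen[v-1]) { seen[v-1] = 1; excluded++; }  /* count each row once */
    }
    int *ans = malloc((size_t)(max - excluded) * sizeof(int));
    int k = 0;
    for (int i = 0; i < max; i++)
        if (!seen[i]) ans[k++] = i + 1; /* 1-based, ascending */
    free(seen);
    *outn = k;
    return ans;
}
```

For example, with `max = 5` and `idx = {-2, -4, -2, 0}` the excluded rows are 2 and 4, giving `{1, 3, 5}`.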
+
+/*
+* subsetDT - Subsets a data.table
+* NOTE:
+*   1) 'rows' and 'cols' are 1-based, passed from R level
+*   2) Originally for subsetting vectors in fcast and now the beginnings of 
+*       [.data.table ported to C
+*   3) Immediate need is for R 3.1 as lglVec[1] now returns R's global TRUE 
+*       and we don't want := to change that global [think 1 row data.tables]
+*   4) Could do it other ways but may as well go to C now as we were going to 
+*       do that anyway
+*/
+SEXP subsetDT(SEXP x, SEXP rows, SEXP cols) {
+    if (!isNewList(x)) error("Internal error. Argument 'x' to CsubsetDT is type '%s' not 'list'", type2char(TYPEOF(x)));
+    if (!length(x)) return(x);  // return empty list
+    
+    // check index once up front for 0 or NA, for branchless subsetVectorRaw 
+    R_len_t ansn=0;
+    Rboolean any0orNA=FALSE;
+    check_idx(rows, length(VECTOR_ELT(x,0)), &ansn, &any0orNA);
+
+    if (!isInteger(cols)) error("Internal error. Argument 'cols' to Csubset is type '%s' not 'integer'", type2char(TYPEOF(cols)));
+    for (int i=0; i<LENGTH(cols); i++) {
+        int this = INTEGER(cols)[i];
+        if (this<1 || this>LENGTH(x)) error("Item %d of 'cols' is %d which is outside 1-based range [1,ncol(x)=%d]", i+1, this, LENGTH(x));
+    }
+    SEXP ans = PROTECT(allocVector(VECSXP, LENGTH(cols)+64));  // just do alloc.col directly, eventually alloc.col can be deprecated.
+    copyMostAttrib(x, ans);  // other than R_NamesSymbol, R_DimSymbol and R_DimNamesSymbol  
+                             // so includes row.names (oddly, given other dims aren't) and "sorted", dealt with below
+    SET_TRUELENGTH(ans, LENGTH(ans));
+    SETLENGTH(ans, LENGTH(cols));
+    for (int i=0; i<LENGTH(cols); i++) {
+        SEXP source, target;
+        target = PROTECT(allocVector(TYPEOF(source=VECTOR_ELT(x, INTEGER(cols)[i]-1)), ansn));
+        SETLENGTH(target, ansn);
+        SET_TRUELENGTH(target, ansn);
+        copyMostAttrib(source, target);
+        SET_VECTOR_ELT(ans, i, target);
+        UNPROTECT(1);
+    }
+    #pragma omp parallel num_threads(MIN(getDTthreads(),LENGTH(cols)))
+    {
+      #pragma omp master
+      // this thread and this thread only handles all the STRSXP and VECSXP columns, one by one
+      // it doesn't have to be master; the directive is just convenient.
+      for (int i=0; i<LENGTH(cols); i++) {
+        SEXP target = VECTOR_ELT(ans, i);
+        if (isString(target) || isNewList(target))
+          subsetVectorRaw(target, VECTOR_ELT(x, INTEGER(cols)[i]-1), rows, any0orNA);
+      }
+      #pragma omp for schedule(dynamic)
+      // slaves get on with the other non-STRSXP non-VECSXP columns at the same time.
+      // master may join in when it's finished, straight away if there are no STRSXP or VECSXP columns
+      for (int i=0; i<LENGTH(cols); i++) {
+        SEXP target = VECTOR_ELT(ans, i);
+        if (!isString(target) && !isNewList(target))
+          subsetVectorRaw(target, VECTOR_ELT(x, INTEGER(cols)[i]-1), rows, any0orNA);
+      }
+    }
+    SEXP tmp = PROTECT(allocVector(STRSXP, LENGTH(cols)+64));
+    SET_TRUELENGTH(tmp, LENGTH(tmp));
+    SETLENGTH(tmp, LENGTH(cols));
+    setAttrib(ans, R_NamesSymbol, tmp);
+    subsetVectorRaw(tmp, getAttrib(x, R_NamesSymbol), cols, /*any0orNA=*/FALSE);
+    UNPROTECT(1);
+    
+    tmp = PROTECT(allocVector(INTSXP, 2));
+    INTEGER(tmp)[0] = NA_INTEGER;
+    INTEGER(tmp)[1] = -ansn;
+    setAttrib(ans, R_RowNamesSymbol, tmp);  // The contents of tmp must be set before being passed to setAttrib(). setAttrib looks at tmp value and copies it in the case of R_RowNamesSymbol. Caused hard to track bug around 28 Sep 2014.
+    UNPROTECT(1);    
+
+    // maintain key if ordered subset ...
+    SEXP key = getAttrib(x, install("sorted"));
+    if (length(key)) {
+        SEXP in = PROTECT(chmatch(key,getAttrib(ans,R_NamesSymbol), 0, TRUE)); // (nomatch ignored when in=TRUE)
+        int i = 0;  while(i<LENGTH(key) && LOGICAL(in)[i]) i++;
+        UNPROTECT(1);
+        // i is now the keylen that can be kept. The 2 lines above are much easier in C than in R.
+        if (i==0) {
+            setAttrib(ans, install("sorted"), R_NilValue);
+            // clear key that was copied over by copyMostAttrib() above
+        } else if (isOrderedSubset(rows, ScalarInteger(length(VECTOR_ELT(x,0))))) {
+            setAttrib(ans, install("sorted"), tmp=allocVector(STRSXP, i));
+            for (int j=0; j<i; j++) SET_STRING_ELT(tmp, j, STRING_ELT(key, j));
+        }
+    }
+    setAttrib(ans, install(".data.table.locked"), R_NilValue);
+    setselfref(ans);
+    UNPROTECT(1);
+    return ans;
+}
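The key-retention step above computes the longest prefix of the old key whose columns survive in the result (a key is only valid as a prefix). Reduced to plain C, with the hypothetical helper name `keylen_prefix` and a plain int array standing in for the `chmatch` logical result:

```c
#include <assert.h>

/* Returns the length of the leading run of 1s in present[],
   i.e. how many key columns (as a prefix) can be kept. */
static int keylen_prefix(const int *present, int n)
{
    int i = 0;
    while (i < n && present[i]) i++;
    return i;
}
```

If the first key column is dropped the whole key is lost (length 0), even when later key columns remain.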
+
+SEXP subsetVector(SEXP x, SEXP idx) { // idx is 1-based passed from R level
+    int ansn;
+    Rboolean any0orNA;
+    check_idx(idx, length(x), &ansn, &any0orNA);
+    SEXP ans = PROTECT(allocVector(TYPEOF(x), ansn));
+    SETLENGTH(ans, ansn);
+    SET_TRUELENGTH(ans, ansn);
+    copyMostAttrib(x, ans);
+    subsetVectorRaw(ans, x, idx, any0orNA);
+    UNPROTECT(1);
+    return ans;
+}
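Away from SEXPs, the branchless fast path that `check_idx` enables (no 0s or NAs, so every index maps straight through) reduces to a simple gather loop over 1-based indices. A hypothetical plain-C equivalent for an integer column:

```c
#include <assert.h>

/* Gather out[i] = x[idx[i]-1] for 1-based idx; valid only after a
   check_idx-style pass has confirmed idx has no 0s, NAs or
   out-of-range values, so no per-element branching is needed. */
static void gather_int(const int *x, const int *idx, int n, int *out)
{
    for (int i = 0; i < n; i++)
        out[i] = x[idx[i] - 1];   /* convert 1-based R index to 0-based C */
}
```

The point of the up-front check is exactly this: the common case pays no per-element cost for the 0/NA handling.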
+
+
diff --git a/src/transpose.c b/src/transpose.c
index 3bfded7..093cf4f 100644
--- a/src/transpose.c
+++ b/src/transpose.c
@@ -4,10 +4,10 @@
 
 SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg) {
 
-    R_len_t i, j, k=0, ln, *len, maxlen=0, zerolen=0, anslen;
+    R_len_t i, j, k=0, maxlen=0, zerolen=0, anslen;
     SEXP li, thisi, ans;
     SEXPTYPE type, maxtype=0;
-    Rboolean ignore, coerce = FALSE;
+    Rboolean coerce = FALSE;
 
     if (!isNewList(l)) 
         error("l must be a list.");
@@ -17,11 +17,11 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg) {
         error("ignore.empty should be logical TRUE/FALSE.");
     if (length(fill) != 1)
         error("fill must be NULL or length=1 vector.");
-    ln = LENGTH(l);
-    ignore = LOGICAL(ignoreArg)[0];
+    R_len_t ln = LENGTH(l);
+    Rboolean ignore = LOGICAL(ignoreArg)[0];
 
     // preprocessing
-    len  = Calloc(ln, R_len_t);
+    R_len_t *len  = (R_len_t *)R_alloc(ln, sizeof(R_len_t));
     for (i=0; i<ln; i++) {
         li = VECTOR_ELT(l, i);
         if (!isVectorAtomic(li) && !isNull(li)) 
@@ -92,7 +92,6 @@ SEXP transpose(SEXP l, SEXP fill, SEXP ignoreArg) {
         }
         k++;
     }
-    Free(len);
     UNPROTECT(2);
     return(ans);
 }
diff --git a/src/uniqlist.c b/src/uniqlist.c
index bca42e3..b714ead 100644
--- a/src/uniqlist.c
+++ b/src/uniqlist.c
@@ -37,7 +37,9 @@ SEXP uniqlist(SEXP l, SEXP order)
             case INTSXP : case LGLSXP :
                 b=INTEGER(v)[thisi]==INTEGER(v)[previ]; break;
             case STRSXP :
-                b=STRING_ELT(v,thisi)==STRING_ELT(v,previ); break;  // forder checks no non-ascii unknown, and either UTF-8 or Latin1 but not both. So == pointers is ok given that check.
+                // fix for #469, when key is set, duplicated calls uniqlist, where encoding 
+                // needs to be taken care of.
+                b=ENC2UTF8(STRING_ELT(v,thisi))==ENC2UTF8(STRING_ELT(v,previ)); break;  // marked non-utf8 encodings are converted to utf8 so as to match properly when inputs are of different encodings.
             case REALSXP :
                 ulv = (unsigned long long *)REAL(v);  
                 b = ulv[thisi] == ulv[previ]; // (gives >=2x speedup)
@@ -83,53 +85,168 @@ SEXP uniqlengths(SEXP x, SEXP n) {
 
 // we could compute `uniqlist` and `uniqlengths` and then construct the result
 // but that seems unnecessary waste of memory and roundabout..
-// so, we'll do it directly here.. 'order()' is implemented in C-api, but not used 
-// in R side yet, but if need be, it can be added easily.
-SEXP rleid(SEXP l, SEXP order)
+// so, we'll do it directly here..
+SEXP rleid(SEXP l, SEXP cols)
 {
-    Rboolean b, byorder;
-    unsigned long long *ulv; // for numeric check speed-up
-    SEXP v, ans, class;
-    R_len_t nrow = length(VECTOR_ELT(l,0)), ncol = length(l);
-    R_len_t i, j, len = 1, thisi, previ;
-
-    if (!nrow || !ncol) return (allocVector(INTSXP, 0));
-    ans = PROTECT(allocVector(INTSXP, nrow));
-    if (NA_INTEGER != NA_LOGICAL || sizeof(NA_INTEGER)!=sizeof(NA_LOGICAL)) 
-        error("Have assumed NA_INTEGER == NA_LOGICAL (currently R_NaInt). If R changes this in future (seems unlikely), an extra case is required; a simple change.");
+  R_len_t nrow = length(VECTOR_ELT(l,0)), ncol = length(l);
+  if (!nrow || !ncol) return (allocVector(INTSXP, 0));
+  if (!isInteger(cols) || LENGTH(cols)==0) error("cols must be an integer vector with length >= 1");
+  for (int i=0; i<LENGTH(cols); i++) {
+    int this = INTEGER(cols)[i];
+    if (this<1 || this>LENGTH(l)) error("Item %d of cols is %d which is outside range of l [1,length(l)=%d]", i+1, this, LENGTH(l));
+  }
+  for (int i=1; i<ncol; i++) {
+    if (length(VECTOR_ELT(l,i)) != nrow) error("All elements of the input list must be the same length. Element [%d] has length %d != length of first element = %d.", i+1, length(VECTOR_ELT(l,i)), nrow);
+  }
+  SEXP ans = PROTECT(allocVector(INTSXP, nrow));
+  R_len_t grp = 1;
+  INTEGER(ans)[0] = grp; // first row is always the first of first group
+  for (int i=1; i<nrow; i++) {
+    Rboolean b = TRUE;
+    int j = LENGTH(cols);
+    // the last column varies the most frequently so check that first and work backwards
+    while (--j>=0 && b) {
+      SEXP v = VECTOR_ELT(l, INTEGER(cols)[j]-1);
+      switch (TYPEOF(v)) {
+      case INTSXP : case LGLSXP :
+        b = INTEGER(v)[i]==INTEGER(v)[i-1];
+        break;
+      case STRSXP :
+        b = STRING_ELT(v,i)==STRING_ELT(v,i-1);
+        // TODO: do we want to check encodings here now that forder seems to?
+        // Old comment : forder checks no non-ascii unknown, and either UTF-8 or Latin1 but not both.
+        //               So == pointers is ok given that check
+        break;
+      case REALSXP : {
+        long long *ll = (long long *)DATAPTR(v);  
+        b = ll[i]==ll[i-1]; }
+        // 8 bytes of bits are identical. For real (no rounding currently) and integer64
+        // long long == 8 bytes checked in init.c
+        break;
+      default :
+        error("Type '%s' not supported", type2char(TYPEOF(v))); 
+      }
+    }
+    INTEGER(ans)[i] = (grp+=!b);
+  }
+  UNPROTECT(1);
+  return(ans);
+}
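The core of the new `rleid` is a run-length group counter: row `i` starts a new group whenever any checked column differs from row `i-1`, via the `grp += !b` idiom above. A dependency-free sketch over a single integer column (the column set, type dispatch and hypothetical name `rleid1` are simplifications for illustration):

```c
#include <assert.h>

/* Minimal sketch of rleid over one int column: ans[i] is the
   1-based id of the run that row i belongs to. n must be >= 1. */
static void rleid1(const int *col, int n, int *ans)
{
    int grp = 1;
    ans[0] = grp;                       /* first row always starts group 1 */
    for (int i = 1; i < n; i++) {
        int b = (col[i] == col[i-1]);   /* b==1 while the run continues */
        ans[i] = (grp += !b);           /* bump the id only on a change */
    }
}
```

Note the ids restart counting on every change, so a value that reappears later (after an interruption) gets a fresh id rather than its old one.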
 
-    INTEGER(ans)[0] = 1; // first row is always the first of first group
-    byorder = INTEGER(order)[0] != -1;
-    thisi = byorder ? INTEGER(order)[0]-1 : 0;
-    for (i=1; i<nrow; i++) {
-        previ = thisi;
-        thisi = byorder ? INTEGER(order)[i]-1 : i;
-        j = ncol;  // the last column varies the most frequently so check that first and work backwards
-        b = TRUE;
-        while (--j>=0 && b) {
-            v=VECTOR_ELT(l,j);
-            switch (TYPEOF(v)) {
-            case INTSXP : case LGLSXP :
-                b=INTEGER(v)[thisi]==INTEGER(v)[previ]; break;
-            case STRSXP :
-                b=STRING_ELT(v,thisi)==STRING_ELT(v,previ); break;  // forder checks no non-ascii unknown, and either UTF-8 or Latin1 but not both. So == pointers is ok given that check.
-            case REALSXP :
-                ulv = (unsigned long long *)REAL(v);  
-                b = ulv[thisi] == ulv[previ]; // (gives >=2x speedup)
-                if (!b) {
-                    class = getAttrib(v, R_ClassSymbol);
-                    twiddle = (isString(class) && STRING_ELT(class, 0)==char_integer64) ? &i64twiddle : &dtwiddle;
-                    b = twiddle(ulv, thisi, 1) == twiddle(ulv, previ, 1);
+SEXP nestedid(SEXP l, SEXP cols, SEXP order, SEXP grps, SEXP resetvals, SEXP multArg) {
+    Rboolean b, byorder = length(order);
+    SEXP v, ans, class;
+    R_len_t nrows = length(VECTOR_ELT(l,0)), ncols = length(cols);
+    R_len_t i, j, k, thisi, previ, ansgrpsize=1000, nansgrp=0;
+    R_len_t *ptr, *ansgrp = Calloc(ansgrpsize, R_len_t), tmp, starts, grplen;
+    R_len_t ngrps = length(grps), *i64 = Calloc(ncols, R_len_t);
+    R_len_t resetctr=0, rlen = length(resetvals) ? INTEGER(resetvals)[0] : 0;
+    if (!isInteger(cols) || ncols == 0)
+        error("cols must be an integer vector of positive length");
+    // mult arg
+    enum {ALL, FIRST, LAST} mult = ALL;
+    if (!strcmp(CHAR(STRING_ELT(multArg, 0)), "all")) mult = ALL;
+    else if (!strcmp(CHAR(STRING_ELT(multArg, 0)), "first")) mult = FIRST;
+    else if (!strcmp(CHAR(STRING_ELT(multArg, 0)), "last")) mult = LAST;
+    else error("Internal error: invalid value for 'mult'. Please report to datatable-help");
+    // integer64
+    for (j=0; j<ncols; j++) {
+        class = getAttrib(VECTOR_ELT(l, INTEGER(cols)[j]-1), R_ClassSymbol);
+        i64[j] = isString(class) && STRING_ELT(class, 0) == char_integer64;
+    }
+    ans  = PROTECT(allocVector(INTSXP, nrows));
+    int *ians = INTEGER(ans), *igrps = INTEGER(grps);
+    grplen = (ngrps == 1) ? nrows : igrps[1]-igrps[0];
+    starts = igrps[0]-1 + (mult != LAST ? 0 : grplen-1);
+    ansgrp[0] = byorder ? INTEGER(order)[starts]-1 : starts;
+    for (j=0; j<grplen; j++) {
+        ians[byorder ? INTEGER(order)[igrps[0]-1+j]-1 : igrps[0]-1+j] = 1;
+    }
+    nansgrp = 1;
+    for (i=1; i<ngrps; i++) {
+        // "first"=add next grp to current grp iff min(next) >= min(current)
+        // "last"=add next grp to current grp iff max(next) >= max(current)
+        // in addition to this thisi >= previ should be satisfied
+        // could result in more groups.. so this is done only for the first/last cases 
+        // as it allows extracting indices directly in bmerge.
+        grplen = (i+1 < ngrps) ? igrps[i+1]-igrps[i] : nrows-igrps[i]+1;
+        starts = igrps[i]-1 + (mult != LAST ? 0 : grplen-1);
+        thisi = byorder ? INTEGER(order)[starts]-1 : starts;
+        for (k=0; k<nansgrp; k++) {
+            j = ncols;
+            previ = ansgrp[k];
+            // b=TRUE is ideal for mult=ALL; it results in fewer groups
+            b = mult == ALL || (thisi >= previ);
+            // >= 0 is not necessary as first col will always be in 
+            // increasing order. NOTE: all "==" cols are already skipped for 
+            // computing nestedid during R-side call, for efficiency.
+            while(b && --j>0) {
+                v = VECTOR_ELT(l,INTEGER(cols)[j]-1);
+                switch(TYPEOF(v)) {
+                    case INTSXP: case LGLSXP:
+                    b = INTEGER(v)[thisi] >= INTEGER(v)[previ];
+                    break;
+                    case STRSXP :
+                    b = ENC2UTF8(STRING_ELT(v,thisi)) == ENC2UTF8(STRING_ELT(v,previ));
+                    break;
+                    case REALSXP:
+                    twiddle = i64[j] ? &i64twiddle : &dtwiddle;
+                    b = twiddle(DATAPTR(v), thisi, 1) >= twiddle(DATAPTR(v), previ, 1);
+                    break;
+                    default:
+                    error("Type '%s' not supported", type2char(TYPEOF(v)));
                 }
-                break;
-                // TO DO: store previ twiddle call, but it'll need to be vector since this is in a loop through columns. Hopefully the first == will short circuit most often
-            default :
-                error("Type '%s' not supported", type2char(TYPEOF(v))); 
             }
+            if (b) break;
+        }
+        // TODO: make this the outer for-loop and parallelise..
+        // but preferably wait to see whether problems with big non-equi
+        // group sizes really do occur that commonly before investing time here.
+        if (rlen != starts) {
+            tmp = b ? k : nansgrp++;
+        } else { // we're wrapping up this group, reset nansgrp
+            tmp = 0; nansgrp = 1;
+            rlen += INTEGER(resetvals)[++resetctr];
+        }
+        if (nansgrp >= ansgrpsize) {
+            ansgrpsize = 1.1*ansgrpsize*nrows/i;
+            ptr = Realloc(ansgrp, ansgrpsize, int);
+            if (ptr != NULL) ansgrp = ptr; 
+            else error("Error in reallocating memory in 'nestedid'\n");
         }
-        INTEGER(ans)[i] = b ? len : ++len;
+        for (j=0; j<grplen; j++) {
+            ians[byorder ? INTEGER(order)[igrps[i]-1+j]-1 : igrps[i]-1+j] = tmp+1;
+        }
+        ansgrp[tmp] = thisi;
     }
+    Free(ansgrp);
+    Free(i64);
     UNPROTECT(1);
     return(ans);
 }
 
+SEXP nqnewindices(SEXP xo, SEXP len, SEXP indices, SEXP nArg) {
+
+    R_len_t i, n = INTEGER(nArg)[0], nas=0, tmp=0;
+    SEXP ans, newstarts, newlen;
+    ans = PROTECT(allocVector(VECSXP, 2));
+    SET_VECTOR_ELT(ans, 0, (newstarts = allocVector(INTSXP, n)));
+    SET_VECTOR_ELT(ans, 1, (newlen = allocVector(INTSXP, n)));
+
+    for (i=0; i<n; i++) INTEGER(newlen)[i] = 0;
+    for (i=0; i<length(indices); i++) {
+        if (INTEGER(xo)[i] != NA_INTEGER)
+            INTEGER(newlen)[INTEGER(indices)[i]-1] += INTEGER(len)[i];
+        else INTEGER(newlen)[INTEGER(indices)[i]-1] = INTEGER(len)[i] != 0;
+    }
+    for (i=0; i<n; i++) {
+        if (INTEGER(xo)[nas++] == NA_INTEGER) {
+            INTEGER(newstarts)[i] = NA_INTEGER;
+        } else {
+            INTEGER(newstarts)[i] = INTEGER(newlen)[i] ? tmp+1 : 0;
+            tmp += INTEGER(newlen)[i];
+        }
+    }
+    UNPROTECT(1);
+    return (ans);
+}
diff --git a/src/wrappers.c b/src/wrappers.c
index 52b58b7..f8fff6c 100644
--- a/src/wrappers.c
+++ b/src/wrappers.c
@@ -14,6 +14,12 @@ SEXP setattrib(SEXP x, SEXP name, SEXP value)
          isString(value) && (strcmp(CHAR(STRING_ELT(value, 0)), "data.table") == 0 || 
          strcmp(CHAR(STRING_ELT(value, 0)), "data.frame") == 0) )
         error("Internal structure doesn't seem to be a list. Can't set class to be 'data.table' or 'data.frame'. Use 'as.data.table()' or 'as.data.frame()' methods instead.");
+    if (isLogical(x) && x == ScalarLogical(TRUE)) {  // don't modify R's global TRUE value (e.g. lglVec[1] returns it as from R 3.1)
+        x = PROTECT(duplicate(x));
+        setAttrib(x, name, NAMED(value) ? duplicate(value) : value);
+        UNPROTECT(1);
+        return(x);
+    }
     setAttrib(x, name,
         NAMED(value) ? duplicate(value) : value);
         // duplicate is temp fix to restore R behaviour prior to R-devel change on 10 Jan 2014 (r64724).
@@ -96,3 +102,26 @@ SEXP copyNamedInList(SEXP x)
 	return R_NilValue;
 }
 
+
+
+SEXP dim(SEXP x)
+{
+    // fast implementation of dim.data.table
+
+    if (TYPEOF(x) != VECSXP) {
+	error("dim.data.table expects a data.table as input (which is a list), but seems to be of type %s", 
+	    type2char(TYPEOF(x)));
+    }
+    
+    SEXP ans = allocVector(INTSXP, 2);
+    if(length(x) == 0) {
+	INTEGER(ans)[0] = 0;
+	INTEGER(ans)[1] = 0;
+    }
+    else {
+	INTEGER(ans)[0] = length(VECTOR_ELT(x, 0));
+	INTEGER(ans)[1] = length(x);
+    }
+
+    return ans;
+}
diff --git a/tests/autoprint.R b/tests/autoprint.R
index c98eba0..4709cd1 100644
--- a/tests/autoprint.R
+++ b/tests/autoprint.R
@@ -40,7 +40,7 @@ DT[1,a:=as.integer(a)]                # no
 DT[1,a:=10L][]                        # yes. ...[] == oops, forgot print(...)
 
 # Test that error in := doesn't suppress next valid print, bug #2376
-try(DT[,foo:=ColumnNameTypo])         # error: not found.
+tryCatch(DT[,foo:=ColumnNameTypo], error=function(e) e$message)         # error: not found.
 DT                                    # yes
 DT                                    # yes
 
diff --git a/tests/autoprint.Rout.save b/tests/autoprint.Rout.save
index 13e2f6d..60ae505 100644
--- a/tests/autoprint.Rout.save
+++ b/tests/autoprint.Rout.save
@@ -107,8 +107,8 @@ NULL
 2: 10
 > 
 > # Test that error in := doesn't suppress next valid print, bug #2376
-> try(DT[,foo:=ColumnNameTypo])         # error: not found.
-Error in eval(expr, envir, enclos) : object 'ColumnNameTypo' not found
+> tryCatch(DT[,foo:=ColumnNameTypo], error=function(e) e$message)         # error: not found.
+[1] "object 'ColumnNameTypo' not found"
 > DT                                    # yes
     a
 1: 10
diff --git a/tests/knitr.R b/tests/knitr.R
index e70a79a..5ead0c4 100644
--- a/tests/knitr.R
+++ b/tests/knitr.R
@@ -1,4 +1,8 @@
-require(knitr)
-knit("knitr.Rmd", quiet=TRUE)
-cat(readLines("knitr.md"),sep="\n")
+if (suppressPackageStartupMessages(requireNamespace("knitr", quietly = TRUE))) {
+    require(knitr)
+    knit("knitr.Rmd", quiet=TRUE)
+    cat(readLines("knitr.md"), sep="\n")
+} else {
+    cat(readLines("knitr.Rout.mock", warn = FALSE), sep="\n")
+}
 
diff --git a/tests/knitr.Rout.mock b/tests/knitr.Rout.mock
new file mode 100644
index 0000000..1f17724
--- /dev/null
+++ b/tests/knitr.Rout.mock
@@ -0,0 +1,41 @@
+Loading required package: knitr
+Loading required package: data.table
+
+```r
+require(data.table)              # print?
+DT = data.table(x=1:3, y=4:6)    # no
+DT                               # yes
+```
+
+```
+##    x y
+## 1: 1 4
+## 2: 2 5
+## 3: 3 6
+```
+
+```r
+DT[, z := 7:9]                   # no
+print(DT[, z := 10:12])          # yes
+```
+
+```
+##    x y  z
+## 1: 1 4 10
+## 2: 2 5 11
+## 3: 3 6 12
+```
+
+```r
+if (1 < 2) DT[, a := 1L]         # no
+DT                               # yes
+```
+
+```
+##    x y  z a
+## 1: 1 4 10 1
+## 2: 2 5 11 1
+## 3: 3 6 12 1
+```
+Some text.
+
diff --git a/tests/knitr.Rout.save b/tests/knitr.Rout.save
index 8aa8004..252480d 100644
--- a/tests/knitr.Rout.save
+++ b/tests/knitr.Rout.save
@@ -15,12 +15,15 @@ Type 'demo()' for some demos, 'help()' for on-line help, or
 'help.start()' for an HTML browser interface to help.
 Type 'q()' to quit R.
 
-> require(knitr)
+> if (suppressPackageStartupMessages(requireNamespace("knitr", quietly = TRUE))) {
++     require(knitr)
++     knit("knitr.Rmd", quiet=TRUE)
++     cat(readLines("knitr.md"), sep="\n")
++ } else {
++     cat(readLines("knitr.Rout.mock", warn = FALSE), sep="\n")
++ }
 Loading required package: knitr
-> knit("knitr.Rmd", quiet=TRUE)
 Loading required package: data.table
-[1] "knitr.md"
-> cat(readLines("knitr.md"),sep="\n")
 
 ```r
 require(data.table)              # print?
diff --git a/tests/tests.R b/tests/main.R
similarity index 76%
rename from tests/tests.R
rename to tests/main.R
index 0aac7d2..b81a446 100644
--- a/tests/tests.R
+++ b/tests/main.R
@@ -1,7 +1,7 @@
 require(data.table)
-test.data.table()
+test.data.table()  # runs the main test suite of 5,000+ tests in /inst/tests/tests.Rraw
 
-# Turn off verbose repeat to save time (particularly Travis, but also CRAN)
+# Turn off verbose repeat to save time (particularly Travis, but also CRAN) :
 # test.data.table(verbose=TRUE)
 # Calling it again in the past revealed some memory bugs but also verbose mode checks the verbose messages run ok
 # TO DO: check we test each verbose message at least once, instead of a full repeat of all tests
diff --git a/tests/test-all.R b/tests/test-all.R
deleted file mode 100644
index 8772cc0..0000000
--- a/tests/test-all.R
+++ /dev/null
@@ -1,4 +0,0 @@
-library(testthat)
-library(data.table)
-
-test_package("data.table")
diff --git a/tests/testthat.R b/tests/testthat.R
new file mode 100644
index 0000000..c457c8f
--- /dev/null
+++ b/tests/testthat.R
@@ -0,0 +1,6 @@
+if(requireNamespace("testthat", quietly = TRUE)){
+    library(testthat)
+    library(data.table)
+    test_check("data.table")
+}
+
diff --git a/inst/tests/test-S4.R b/tests/testthat/test-S4.R
similarity index 100%
rename from inst/tests/test-S4.R
rename to tests/testthat/test-S4.R
diff --git a/inst/tests/test-data.frame-like.R b/tests/testthat/test-data.frame-like.R
similarity index 97%
rename from inst/tests/test-data.frame-like.R
rename to tests/testthat/test-data.frame-like.R
index 9994da1..8c7ec40 100644
--- a/inst/tests/test-data.frame-like.R
+++ b/tests/testthat/test-data.frame-like.R
@@ -22,11 +22,11 @@ test_that("`xkey` column names are valid in merge (bug#1299", {
 test_that("one column merges work (bug #1241)", {
     dt <- data.table(a=rep(1:2,each=3), b=1:6, key="a")
     y <- data.table(a=c(0,1), bb=c(10,11), key="a")
-    expect_equal(merge(y, dt), data.table(a=1L, bb=11L, b=1:3, key="a"),
+    expect_equal(merge(y, dt), data.table(a=1L, bb=11, b=1:3, key="a"),
                  info="Original test #231")
     expect_equal(merge(y, dt, all=TRUE),
                  data.table(a=rep(c(0L,1L,2L),c(1,3,3)),
-                            bb=rep(c(10L,11L,NA_integer_),c(1,3,3)),
+                            bb=rep(c(10,11,NA_real_),c(1,3,3)),
                             b=c(NA_integer_,1:6), key="a"),
                  info="Original test #232")
 
diff --git a/vignettes/Makefile b/vignettes/Makefile
index 333ec94..bdc2822 100644
--- a/vignettes/Makefile
+++ b/vignettes/Makefile
@@ -1,15 +1,7 @@
 # Makefile to use knitr for package vignettes
 
-# put all PDF targets here, separated by spaces
-PDFS= datatable-faq.pdf datatable-intro.pdf datatable-intro-vignette.html
-
-all: $(PDFS) 
-
 clean:
 	rm -rf *.tex *.bbl *.blg *.aux *.out *.toc *.log *.spl *tikzDictionary *.md figure/
 
-%.pdf: %.Rnw
-	$(R_HOME)/bin/Rscript -e "tools::texi2pdf('$*.tex')"
-
 %.html: %.Rmd
 	$(R_HOME)/bin/Rscript -e "if (getRversion() < '3.0.0') knitr::knit2html('$*.Rmd')"
diff --git a/vignettes/datatable-faq.Rmd b/vignettes/datatable-faq.Rmd
new file mode 100644
index 0000000..f78ca19
--- /dev/null
+++ b/vignettes/datatable-faq.Rmd
@@ -0,0 +1,616 @@
+---
+title: "Frequently Asked Questions about data.table"
+date: "`r Sys.Date()`"
+output:
+  rmarkdown::html_vignette:
+    toc: true
+    number_sections: true
+vignette: >
+  %\VignetteIndexEntry{Frequently asked questions}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+<style>
+h2 {
+    font-size: 20px;
+}
+</style>
+
+```{r, echo = FALSE, message = FALSE}
+library(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+```
+
+The first section, Beginner FAQs, is intended to be read in order, from start to finish. It is written in a FAQ style simply so it can be digested more easily; the entries aren't necessarily the most frequently asked questions. A better measure of those is to look on Stack Overflow.
+
+This FAQ is required reading and considered core documentation. Please do not ask questions on Stack Overflow or raise issues on GitHub until you have read it. We can all tell when you ask that you haven't read it. So if you do ask and haven't read it, don't use your real name.
+
+This document has been quickly revised given the changes in v1.9.8 released Nov 2016. Please do submit pull requests to fix mistakes or make improvements. If anyone knows why the table of contents comes out so narrow and squashed when displayed on CRAN, please let us know. This document used to be a PDF and we recently changed it to HTML.
+
+
+# Beginner FAQs
+
+## Why do `DT[ , 5]` and `DT[2, 5]` return a 1-column data.table rather than vectors like `data.frame`? {#j-num}
+
+For consistency, so that when you use data.table in functions that accept varying inputs, you can rely on `DT[...]` returning a data.table. You don't have to remember to include `drop=FALSE` as you do with data.frame. data.table was first released in 2006 and this difference from data.frame has been a feature since the very beginning.
+
+You may have heard that it is generally bad practice to refer to columns by number rather than name, though. If your colleague comes along and reads your code later they may have to hunt around to find out which column is number 5. If you or they change the column ordering higher up in your R program, you may produce wrong results with no warning or error if you forget to change all the places in your code which refer to column number 5. That is your fault not R's or data.table's. It's r [...]
+
+Say column 5 is named `"region"` and you really must extract that column as a vector, not a data.table. It is more robust to use the column name and write `DT$region` or `DT[["region"]]`; i.e., the same as base R. Using base R's `$` and `[[` on a data.table is encouraged: not when combined with `<-` to assign (use `:=` instead for that), but simply to select a single column by name.
+
+There are some circumstances where referring to a column by number seems like the only way, such as a sequence of columns. In these situations just like data.frame, you can write `DT[, 5:10]` and `DT[,c(1,4,10)]`. However, again, it is more robust (to future changes in your data's number of and ordering of columns) to use a named range such as `DT[,columnRed:columnViolet]` or name each one `DT[,c("columnRed","columnOrange","columnYellow")]`. It is harder work up front, but you will proba [...]
+
+However, what we really want you to do is `DT[,.(columnRed,columnOrange,columnYellow)]`; i.e., use column names as if they are variables directly inside `DT[...]`. You don't have to prefix each column with `DT$` like you do in data.frame. The `.()` part is just an alias for `list()` and you can use `list()` instead if you prefer. You can place any R expression of column names, using any R package, returning different types of different lengths, right there. We wanted to encourage you to  [...]
+
+Reminder: you can place _any_ R expression inside `DT[...]` using column names as if they are variables; e.g., try `DT[, colA*colB/2]`. That does return a vector because you used column names as if they are variables. Wrap with `.()` to return a data.table; i.e. `DT[,.(colA*colB/2)]`.  Name it: `DT[,.(myResult = colA*colB/2)]`.  And we'll leave it to you to guess how to return two things from this query. It's also quite common to do a bunch of things inside an anonymous body: `DT[, { x<- [...]
+
+## Why does `DT[,"region"]` return a 1-column data.table rather than a vector?
+
+See the [answer above](#j-num). Try `DT$region` instead. Or `DT[["region"]]`. 
+
+
+## Why does `DT[, region]` return a vector for the "region" column?  I'd like a 1-column data.table.
+
+Try `DT[ , .(region)]` instead. `.()` is an alias for `list()` and ensures a data.table is returned.
+
+Also continue reading and see the FAQ after next. Skim whole documents before getting stuck in one part.
+
+## Why does `DT[ , x, y, z]` not work? I wanted the 3 columns `x`,`y` and `z`.
+
+The `j` expression is the 2nd argument. Try `DT[ , c("x","y","z")]` or `DT[ , .(x,y,z)]`.
+
+## I assigned a variable `mycol = "x"` but then `DT[ , mycol]` returns `"x"`. How do I get it to look up the column name contained in the `mycol` variable?
+
+In v1.9.8 released Nov 2016 there is the ability to turn on new behaviour: `options(datatable.WhenJisSymbolThenCallingScope=TRUE)`. It will then work as you expected, just like data.frame. If you are a new user of data.table, you should probably do this. You can place this command in your .Rprofile file so you don't have to remember it again. See the long item in release notes about this. The release notes are linked at the top of the data.table homepage: [NEWS](https://github.com/Rdatatabl [...]
+
+Without turning on that new behavior, what's happening is that the `j` expression sees objects in the calling scope. The variable `mycol` does not exist as a column name of `DT` so data.table then looked in the calling scope and found `mycol` there and returned its value `"x"`. This is correct behaviour currently. Had `mycol` been a column name, then that column's data would have been returned. What has been done to date has been `DT[ , mycol, with = FALSE]` which will return the `x` col [...]
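+
+A minimal sketch of this lookup behaviour, using a hypothetical two-column table (current behaviour, without the new option turned on):
+
+```{r}
+DT = data.table(x = 1:3, y = 4:6)
+mycol = "x"
+DT[ , mycol]                 # "x": mycol is not a column, so it is found in calling scope
+DT[ , mycol, with = FALSE]   # 1-column data.table containing column x
+DT[[mycol]]                  # column x as a plain vector
+```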
+
+## What are the benefits of being able to use column names as if they are variables inside `DT[...]`?
+
+`j` doesn't have to be just column names. You can write any R _expression_ of column names directly in `j`, _e.g._, `DT[ , mean(x*y/z)]`.  The same applies to `i`, _e.g._, `DT[x>1000, sum(y*z)]`.
+
+This runs the `j` expression on the set of rows where the `i` expression is true. You don't even need to return data, _e.g._, `DT[x>1000, plot(y, z)]`. You can do `j` by group simply by adding `by = `; e.g., `DT[x>1000, sum(y*z), by = w]`. This runs `j` for each group in column `w` but just over the rows where `x>1000`. By placing the 3 parts of the query (i=where, j=select and by=group by) inside the square brackets, data.table sees this query as a whole before any part of it is evaluat [...]
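+
+For example, with some made-up data:
+
+```{r}
+DT = data.table(w = c("a", "a", "b", "b"),
+                x = c(500, 1500, 2000, 3000), y = 1:4, z = 4:1)
+DT[x > 1000, sum(y*z), by = w]   # runs j per group of w, over rows where x > 1000 only
+```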
+
+## OK, I'm starting to see what data.table is about, but why didn't you just enhance `data.frame` in R? Why does it have to be a new package?
+
+As [highlighted above](#j-num), `j` in `[.data.table` is fundamentally different from `j` in `[.data.frame`. Even if something as simple as `DF[ , 1]` were changed in base R to return a data.frame rather than a vector, that would break existing code in many thousands of CRAN packages and user code. As soon as we took the step to create a new class that inherited from data.frame, we had the opportunity to change a few things and we did. We want data.table to be slightly different and to work  [...]
+
+Furthermore, data.table _inherits_ from `data.frame`. It _is_ a `data.frame`, too. A data.table can be passed to any package that only accepts `data.frame` and that package can use `[.data.frame` syntax on the data.table. See [this answer](http://stackoverflow.com/a/10529888/403310) for how that is achieved.
+
+We _have_ proposed enhancements to R wherever possible, too. One of these was accepted as a new feature in R 2.12.0 :
+
+> `unique()` and `match()` are now faster on character vectors where all elements are in the global CHARSXP cache and have unmarked encoding (ASCII).  Thanks to Matt Dowle for suggesting improvements to the way the hash code is generated in unique.c.
+
+A second proposal was to use `memcpy` in duplicate.c, which is much faster than a for loop in C. This would improve the _way_ that R copies data internally (on some measures by 13 times). The thread on r-devel is [here](http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html).
+
+A third more significant proposal that was accepted is that R now uses data.table's radix sort code as from R 3.3.0 :
+
+> The radix sort algorithm and implementation from data.table (forder) replaces the previous radix (counting) sort and adds a new method for order(). Contributed by Matt Dowle and Arun Srinivasan, the new algorithm supports logical, integer (even with large values), real, and character vectors. It outperforms all other methods, but there are some caveats (see ?sort).
+
+This was a big event for us and we celebrated until the cows came home. (Not really.)
+
+## Why are the defaults the way they are? Why does it work the way it does?
+
+The simple answer is because the main author originally designed it for his own use. He wanted it that way. He finds it a more natural, faster way to write code, which also executes more quickly.
+
+## Isn't this already done by `with()` and `subset()` in `base`?
+
+Some of the features discussed so far are, yes. The package builds upon base functionality. It does the same sorts of things but with less code required and executes many times faster if used correctly.
+
+## Why does `X[Y]` return all the columns from `Y` too? Shouldn't it return a subset of `X`?
+
+This was changed in v1.5.3 (Feb 2011). Since then `X[Y]` includes `Y`'s non-join columns. We refer to this feature as _join inherited scope_ because not only are `X` columns available to the `j` expression, so are `Y` columns. The downside is that `X[Y]` is less efficient since every item of `Y`'s non-join columns is duplicated to match the (likely large) number of rows in `X` that match. We therefore strongly encourage `X[Y, j]` instead of `X[Y]`. See [next FAQ](#MergeDiff).
+
+## What is the difference between `X[Y]` and `merge(X, Y)`? {#MergeDiff}
+
+`X[Y]` is a join, looking up `X`'s rows using `Y` (or `Y`'s key if it has one) as an index.
+
+`Y[X]` is a join, looking up `Y`'s rows using `X` (or `X`'s key if it has one) as an index.
+
+`merge(X,Y)`[^1] does both ways at the same time. The number of rows of `X[Y]` and `Y[X]` usually differ, whereas the number of rows returned by `merge(X, Y)` and `merge(Y, X)` is the same.
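+
+A small illustration of the row counts, using made-up keyed tables:
+
+```{r}
+X = data.table(id = 1:3, xv = c("a", "b", "c"), key = "id")
+Y = data.table(id = c(2L, 4L), yv = c("p", "q"), key = "id")
+nrow(X[Y])         # 2: one result row per row of Y (id 4 is unmatched, giving an NA row)
+nrow(Y[X])         # 3: one result row per row of X
+nrow(merge(X, Y))  # 1: inner join by default; only id 2 appears in both
+```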
+
+_BUT_ that misses the main point. Most tasks require something to be done on the data after a join or merge. Why merge all the columns of data, only to use a small subset of them afterwards? You may suggest `merge(X[ , ColsNeeded1], Y[ , ColsNeeded2])`, but that requires the programmer to work out which columns are needed. `X[Y, j]` in data.table does all that in one step for you. When you write `X[Y, sum(foo*bar)]`, data.table automatically inspects the `j` expression to see which colum [...]
+
+[^1]: Here we mean either the `merge` _method_ for data.table or the `merge` method for `data.frame` since both methods work in the same way in this respect. See `?merge.data.table` and [below](#r-dispatch) for more information about method dispatch.
+
+## Anything else about `X[Y, sum(foo*bar)]`?
+
+This behaviour changed in v1.9.4 (Sep 2014). It now does the `X[Y]` join and then runs `sum(foo*bar)` over all the rows; i.e., `X[Y][ , sum(foo*bar)]`. It used to run `j` for each _group_ of `X` that each row of `Y` matches to. That can still be done as it's very useful but you now need to be explicit and specify `by = .EACHI`, _i.e._, `X[Y, sum(foo*bar), by = .EACHI]`. We call this _grouping by each `i`_.
+
+For example, (further complicating it by using _join inherited scope_, too):
+
+```{r}
+X = data.table(grp = c("a", "a", "b",
+                       "b", "b", "c", "c"), foo = 1:7)
+setkey(X, grp)
+Y = data.table(c("b", "c"), bar = c(4, 2))
+X
+Y
+X[Y, sum(foo*bar)]
+X[Y, sum(foo*bar), by = .EACHI]
+```
+
+## That's nice. How did you manage to change it given that users depended on the old behaviour?
+
+The request to change came from users. The feeling was that if a query is doing grouping then an explicit `by=` should be present for code readability reasons. An option was provided to return the old behaviour: `options(datatable.old.bywithoutby)`, by default `FALSE`. This enabled upgrading to test the other new features / bug fixes in v1.9.4, with later migration of any by-without-by queries when ready by adding `by=.EACHI` to them. We retained 47 pre-change tests and added them back a [...]
+
+Of the 66 packages on CRAN or Bioconductor that depended on or import data.table at the time of releasing v1.9.4 (it is now over 300), only one was affected by the change. That could be because many packages don't have comprehensive tests, or just that grouping by each row in `i` wasn't being used much by downstream packages. We always test the new version with all dependent packages before release and coordinate any changes with those maintainers. So this release was quite straightforwa [...]
+
+Another compelling reason to make the change was that previously, there was no efficient way to achieve what `X[Y, sum(foo*bar)]` does now. You had to write `X[Y][ , sum(foo*bar)]`. That was suboptimal because `X[Y]` joined all the columns and passed them all to the second compound query without knowing that only `foo` and `bar` are needed. To solve that efficiency problem, extra programming effort was required: `X[Y, list(foo, bar)][ , sum(foo*bar)]`.  The change to `by = .EACHI` has si [...]
+
+# General Syntax
+
+## How can I avoid writing a really long `j` expression? You've said that I should use the column _names_, but I've got a lot of columns.
+
+When grouping, the `j` expression can use column names as variables, as you know, but it can also use a reserved symbol `.SD` which refers to the **S**ubset of the **D**ata.table for each group (excluding the grouping columns). So to sum up all your columns it's just `DT[ , lapply(.SD, sum), by = grp]`. It might seem tricky, but it's fast to write and fast to run. Notice you don't have to create an anonymous function. The `.SD` object is efficiently implemented internally and more effici [...]
+
+So please don't do, for example, `DT[ , sum(.SD[["sales"]]), by = grp]`. That works but is inefficient and inelegant. `DT[ , sum(sales), by = grp]` is what was intended, and it could be 100s of times faster. If you use _all_ of the data in `.SD` for each group (such as in `DT[ , lapply(.SD, sum), by = grp]`) then that's very good usage of `.SD`. If you're using _several_ but not _all_ of the columns, you can combine `.SD` with `.SDcols`; see `?data.table`.
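+
+For example, with hypothetical column names, `.SDcols` restricts `.SD` to just the columns needed:
+
+```{r}
+DT = data.table(grp = c("a", "a", "b"), sales = c(10, 20, 30),
+                units = c(1L, 2L, 3L), note = c("x", "y", "z"))
+DT[ , lapply(.SD, sum), by = grp, .SDcols = c("sales", "units")]
+```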
+
+## Why is the default for `mult` now `"all"`?
+
+In v1.5.3 the default was changed to `"all"`. When `i` (or `i`'s key if it has one) has fewer columns than `x`'s key, `mult` was already set to `"all"` automatically. Changing the default makes this clearer and easier for users as it came up quite often.
+
+In versions up to v1.3, `"all"` was slower. Internally, `"all"` was implemented by joining using `"first"`, then again from scratch using `"last"`, after which a diff between them was performed to work out the span of the matches in `x` for each row in `i`. Most often we join to single rows, though, where `"first"`,`"last"` and `"all"` return the same result. We preferred maximum performance for the majority of situations so the default chosen was `"first"`. When working with a non-uniqu [...]
+
+In v1.4 the binary search in C was changed to branch at the deepest level to find first and last. That branch will likely occur within the same final pages of RAM so there should no longer be a speed disadvantage in defaulting `mult` to `"all"`. We warned that the default might change and made the change in v1.5.3.
+
+A future version of data.table may allow a distinction between a key and a _unique key_. Internally `mult = "all"` would perform more like `mult = "first"` when all `x`'s key columns were joined to and `x`'s key was a unique key. data.table would need checks on insert and update to make sure a unique key is maintained. An advantage of specifying a unique key would be that data.table would ensure no duplicates could be inserted, in addition to performance.
+
+## I'm using `c()` in `j` and getting strange results.
+
+This is a common source of confusion. In `data.frame` you are used to, for example:
+
+```{r}
+DF = data.frame(x = 1:3, y = 4:6, z = 7:9)
+DF
+DF[ , c("y", "z")]
+```
+
+which returns the two columns. In data.table you know you can use the column names directly and might try:
+
+```{r}
+DT = data.table(DF)
+DT[ , c(y, z)]
+```
+
+but this returns one vector.  Remember that the `j` expression is evaluated within the environment of `DT` and `c()` returns a vector.  If 2 or more columns are required, use `list()` or `.()` instead:
+
+```{r}
+DT[ , .(y, z)]
+```
+
+`c()` can be useful in a data.table too, but its behaviour is different from that in `[.data.frame`.
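+
+One legitimate pattern, for example, is concatenating `.SD` with a `list()` of computed columns; both are lists, and a `j` returning a list yields a data.table. Continuing with the `DT` above:
+
+```{r}
+DT[ , c(.SD, list(total = y + z))]   # all columns of DT plus a computed column
+```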
+
+## I have built up a complex table with many columns.  I want to use it as a template for a new table; _i.e._, create a new table with no rows, but with the column names and types copied from my table. Can I do that easily?
+
+Yes. If your complex table is called `DT`, try `NEWDT = DT[0]`.
+
+## Is a null data.table the same as `DT[0]`?
+
+No. By "null data.table" we mean the result of `data.table(NULL)` or `as.data.table(NULL)`; _i.e._,
+
+```{r}
+data.table(NULL)
+data.frame(NULL)
+as.data.table(NULL)
+as.data.frame(NULL)
+is.null(data.table(NULL))
+is.null(data.frame(NULL))
+```
+
+A null data.table (or data.frame) is `NULL` with some attributes attached, which means it is no longer `NULL`. In R only pure `NULL` is `NULL`, as tested by `is.null()`. When referring to the "null data.table" we use lower case null to help distinguish it from upper case `NULL`. To test for the null data.table, use `length(DT) == 0` or `ncol(DT) == 0` (`length` is slightly faster as it's a primitive function).
+
+An _empty_ data.table (`DT[0]`) has one or more columns, all of which are empty. Those empty columns still have names and types.
+
+```{r}
+DT = data.table(a = 1:3, b = c(4, 5, 6), d = c(7L,8L,9L))
+DT[0]
+sapply(DT[0], class)
+```
+
+## Why has the `DT()` alias been removed? {#DTremove1}
+`DT` was introduced originally as a wrapper for a list of `j` expressions. Since `DT` was an alias for data.table, this was a convenient way to take care of silent recycling in cases where each item of the `j` list evaluated to different lengths. The alias was one reason grouping was slow, though.
+
+As of v1.3, `list()` or `.()` should be passed instead to the `j` argument. These are much faster, especially when there are many groups. Internally, this was a nontrivial change. Vector recycling is now done internally, along with several other speed enhancements for grouping.
+
+## But my code uses `j = DT(...)` and it works. The previous FAQ says that `DT()` has been removed. {#DTremove2}
+
+Then you are using a version prior to 1.5.3. Prior to 1.5.3 `[.data.table` detected use of `DT()` in the `j` and automatically replaced it with a call to `list()`. This was to help the transition for existing users.
+
+## What are the scoping rules for `j` expressions?
+
+Think of the subset as an environment where all the column names are variables. When a variable `foo` is used in the `j` of a query such as `X[Y, sum(foo)]`, `foo` is looked for in the following order :
+
+ 1. The scope of `X`'s subset; _i.e._, `X`'s column names.
+ 2. The scope of each row of `Y`; _i.e._, `Y`'s column names (_join inherited scope_)
+ 3. The scope of the calling frame; _e.g._, the line that appears before the data.table query.
+ 4. Exercise for reader: does it then ripple up the calling frames, or go straight to `globalenv()`?
+ 5. The global environment
+
+This is _lexical scoping_ as explained in [R FAQ 3.3.1](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping). The environment in which the function was created is not relevant, though, because there is _no function_. No anonymous _function_ is passed to `j`. Instead, an anonymous _body_ is passed to `j`; for example,
+
+```{r}
+DT = data.table(x = rep(c("a", "b"), c(2, 3)), y = 1:5)
+DT
+DT[ , {z = sum(y); z + 3}, by = x]
+```
+
+Some programming languages call this a _lambda_.
+
+## Can I trace the `j` expression as it runs through the groups? {#j-trace}
+
+Try something like this:
+
+```{r}
+DT[ , {
+  cat("Objects:", paste(objects(), collapse = ","), "\n")
+  cat("Trace: x=", as.character(x), " y=", y, "\n")
+  sum(y)},
+  by = x]
+```
+
+## Inside each group, why are the group variables length-1?
+
+[Above](#j-trace), `x` is a grouping variable and (as from v1.6.1) has `length` 1 (if inspected or used in `j`). It's for efficiency and convenience. Therefore, there is no difference between the following two statements:
+
+```{r}
+DT[ , .(g = 1, h = 2, i = 3, j = 4, repeatgroupname = x, sum(y)), by = x]
+DT[ , .(g = 1, h = 2, i = 3, j = 4, repeatgroupname = x[1], sum(y)), by = x]
+```
+
+If you need the size of the current group, use `.N` rather than calling `length()` on any column.
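+
+Continuing the example above:
+
+```{r}
+DT[ , .N, by = x]   # one row per group with the group size; here a = 2 and b = 3
+```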
+
+## Only the first 10 rows are printed, how do I print more?
+
+There are two things happening here. First, if the number of rows in a data.table is large (`> 100` by default), then a summary of the data.table is printed to the console by default. Second, the summary of a large data.table is printed by taking the top and bottom `n` (`= 5` by default) rows of the data.table and only printing those. Both of these parameters (when to trigger a summary and how much of a table to use as a summary) are configurable by R's `options` mechanism, or by callin [...]
+
+For instance, to ensure the summary of a data.table only happens when a data.table has more than 50 rows, you could set `options(datatable.print.nrows = 50)`. To disable the summary-by-default completely, you could set `options(datatable.print.nrows = Inf)`. You could also call `print` directly, as in `print(your.data.table, nrows = Inf)`.
+
+If you want to show more than just the top (and bottom) 10 rows of a data.table summary (say you like 20), set `options(datatable.print.topn = 20)`, for example. Again, you could also just call `print` directly, as in `print(your.data.table, topn = 20)`.
+
+## With an `X[Y]` join, what if `X` contains a column called `"Y"`?
+
+When `i` is a single name such as `Y` it is evaluated in the calling frame. In all other cases such as calls to `.()` or other expressions, `i` is evaluated within the scope of `X`. This facilitates easy _self-joins_ such as `X[J(unique(colA)), mult = "first"]`.
+
+## `X[Z[Y]]` is failing because `X` contains a column `"Y"`. I'd like it to use the table `Y` in calling scope.
+
+The `Z[Y]` part is not a single name so that is evaluated within the frame of `X` and the problem occurs. Try `tmp = Z[Y]; X[tmp]`. This is robust to `X` containing a column `"tmp"` because `tmp` is a single name. If you often encounter conflicts of this type, one simple solution may be to name all tables in uppercase and all column names in lowercase, or some similar scheme.
+
+## Can you explain further why data.table is inspired by `A[B]` syntax in `base`?
+
+Consider `A[B]` syntax using an example matrix `A` :
+```{r}
+A = matrix(1:12, nrow = 4)
+A
+```
+
+To obtain cells `(1, 2) = 5` and `(3, 3) = 11` many users (we believe) may try this first :
+```{r}
+A[c(1, 3), c(2, 3)]
+```
+
+However, this returns the union of those rows and columns. To reference the cells, a 2-column matrix is required. `?Extract` says :
+
+> When indexing arrays by `[` a single argument `i` can be a matrix with as many columns as there are dimensions of `x`; the result is then a vector with elements corresponding to the sets of indices in each row of `i`.
+
+Let's try again.
+
+```{r}
+B = cbind(c(1, 3), c(2, 3))
+B
+A[B]
+```
+
+A matrix is a 2-dimensional structure with row names and column names. Can we do the same with names?
+
+```{r}
+rownames(A) = letters[1:4]
+colnames(A) = LETTERS[1:3]
+A
+B = cbind(c("a", "c"), c("B", "C"))
+A[B]
+```
+
+So yes, we can. Can we do the same with a `data.frame`?
+
+```{r}
+A = data.frame(A = 1:4, B = letters[11:14], C = pi*1:4)
+rownames(A) = letters[1:4]
+A
+B
+A[B]
+```
+
+But notice that the result was coerced to `character`. R coerced `A` to `matrix` first so that the syntax could work, but the result isn't ideal. Let's try making `B` a `data.frame`.
+
+```{r}
+B = data.frame(c("a", "c"), c("B", "C"))
+cat(try(A[B], silent = TRUE))
+```
+
+So we can't subset a `data.frame` by a `data.frame` in base R. What if we want row names and column names that aren't `character` but `integer` or `float`? What if we want more than 2 dimensions of mixed types? Enter data.table.
+
+Furthermore, matrices, especially sparse matrices, are often stored in a 3-column tuple: `(i, j, value)`. This can be thought of as a key-value pair where `i` and `j` form a 2-column key. If we have more than one value, perhaps of different types, it might look like `(i, j, val1, val2, val3, ...)`. This looks very much like a `data.frame`. Hence data.table extends `data.frame` so that a `data.frame` `X` can be subset by a `data.frame` `Y`, leading to the `X[Y]` syntax.
+
+## Can base be changed to do this then, rather than a new package?
+`data.frame` is used _everywhere_ and so it is very difficult to make _any_ changes to it.
+data.table _inherits_ from `data.frame`. It _is_ a `data.frame`, too. A data.table _can_ be passed to any package that _only_ accepts `data.frame`. When that package uses `[.data.frame` syntax on the data.table, it works. It works because `[.data.table` looks to see where it was called from. If it was called from such a package, `[.data.table` diverts to `[.data.frame`.
+
+## I've heard that data.table syntax is analogous to SQL.
+Yes :
+
+ - `i`  $\Leftrightarrow$ where
+ - `j`  $\Leftrightarrow$  select
+ - `:=`  $\Leftrightarrow$  update
+ - `by`  $\Leftrightarrow$  group by
+ - `i`  $\Leftrightarrow$  order by (in compound syntax)
+ - `i`  $\Leftrightarrow$  having (in compound syntax)
+ - `nomatch = NA`  $\Leftrightarrow$  outer join
+ - `nomatch = 0L`  $\Leftrightarrow$  inner join
+ - `mult = "first"|"last"`  $\Leftrightarrow$  N/A because SQL is inherently unordered
+ - `roll = TRUE`  $\Leftrightarrow$  N/A because SQL is inherently unordered
+
+The general form is :
+
+```{r, eval = FALSE}
+DT[where, select|update, group by][order by][...] ... [...]
+```
+
+A key advantage of column vectors in R is that they are _ordered_, unlike SQL[^2]. We can use ordered functions in data.table queries, such as `diff()`, and we can use _any_ R function from any package, not just the functions defined in SQL. A disadvantage is that R objects must fit in memory, but with several R packages such as ff, bigmemory, mmap and indexing, this is changing.
+
+[^2]: It may be a surprise to learn that `select top 10 * from ...` does _not_ reliably return the same rows over time in SQL. You do need to include an `order by` clause, or use a clustered index to guarantee row order; _i.e._, SQL is inherently unordered.
+
+## What are the smaller syntax differences between `data.frame` and data.table? {#SmallerDiffs}
+
+ - `DT[3]` refers to the 3rd _row_, but `DF[3]` refers to the 3rd _column_
+ - `DT[3, ] == DT[3]`, but `DF[ , 3] == DF[3]` (somewhat confusingly in data.frame, whereas data.table is consistent)
+ - For this reason we say the comma is _optional_ in `DT`, but not optional in `DF`
+ - `DT[[3]] == DF[ , 3] == DF[[3]]`
+ - `DT[i, ]`, where `i` is a single integer, returns a single row, just like `DF[i, ]`, but unlike a matrix single-row subset which returns a vector.
+ - `DT[ , j]` where `j` is a single integer returns a one-column data.table, unlike `DF[, j]` which returns a vector by default
+ - `DT[ , "colA"][[1]] == DF[ , "colA"]`.
+ - `DT[ , colA] == DF[ , "colA"]` (currently in data.table v1.9.8 but is about to change, see release notes)
+ - `DT[ , list(colA)] == DF[ , "colA", drop = FALSE]`
+ - `DT[NA]` returns 1 row of `NA`, but `DF[NA]` returns an entire copy of `DF` containing `NA` throughout. The symbol `NA` is type `logical` in R and is therefore recycled by `[.data.frame`. The user's intention was probably `DF[NA_integer_]`. `[.data.table` diverts to this probable intention automatically, for convenience.
+ - `DT[c(TRUE, NA, FALSE)]` treats the `NA` as `FALSE`, but `DF[c(TRUE, NA, FALSE)]` returns
+  `NA` rows for each `NA`
+ - `DT[ColA == ColB]` is simpler than `DF[!is.na(ColA) & !is.na(ColB) & ColA == ColB, ]`
+ - `data.frame(list(1:2, "k", 1:4))` creates 3 columns; data.table creates one `list` column.
+ - `check.names` is by default `TRUE` in `data.frame` but `FALSE` in data.table, for convenience.
+ - `stringsAsFactors` is by default `TRUE` in `data.frame` but `FALSE` in data.table, for efficiency. Since a global string cache was added to R, character items are pointers to the single cached string and there is no longer a performance benefit from converting to `factor`.
+ - Atomic vectors in `list` columns are collapsed when printed using `", "` in `data.frame`, but `","` in data.table with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.
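+
+A few of the differences above can be checked directly:
+
+```{r}
+DF = data.frame(a = 1:3, b = 4:6, c = 7:9)
+DT = as.data.table(DF)
+DF[3]                                    # 3rd column of the data.frame
+DT[3]                                    # 3rd row of the data.table
+identical(DT[ , "a"][[1]], DF[ , "a"])   # TRUE
+```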
+
+In `[.data.frame` we very often set `drop = FALSE`. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column `data.frame`. In `[.data.table` we took the opportunity to make it consistent and dropped `drop`.
+
+When a data.table is passed to a data.table-unaware package, that package is not concerned with any of these differences; it just works.
+
+## I'm using `j` for its side effect only, but I'm still getting data returned. How do I stop that?
+
+In this case `j` can be wrapped with `invisible()`; e.g., `DT[ , invisible(hist(colB)), by = colA]`[^3]
+
+[^3]: _e.g._, `hist()` returns the breakpoints in addition to plotting to the graphics device.
+
+## Why does `[.data.table` now have a `drop` argument from v1.5?
+
+So that data.table can inherit from `data.frame` without using `...`. If we used `...` then invalid argument names would not be caught.
+
+The `drop` argument is never used by `[.data.table`. It is a placeholder for non-data.table-aware packages when they use the `[.data.frame` syntax directly on a data.table.
+
+## Rolling joins are cool and very fast! Was that hard to program?
+The prevailing row on or before the `i` row is the final row the binary search tests anyway. So `roll = TRUE` is essentially just a switch in the binary search C code to return that row.
+
+## Why does `DT[i, col := value]` return the whole of `DT`? I expected either no visible value (consistent with `<-`), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.
+
+This has changed in v1.8.3 to meet your expectations. Please upgrade.
+
+The whole of `DT` is returned (now invisibly) so that compound syntax can work; _e.g._, `DT[i, done := TRUE][ , sum(done)]`. The number of rows updated is returned when `verbose` is `TRUE`, either on a per-query basis or globally using `options(datatable.verbose = TRUE)`.
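+
+For example:
+
+```{r}
+DT = data.table(x = 1:5)
+DT[x > 2, done := TRUE][ , sum(done, na.rm = TRUE)]   # update by reference, then aggregate
+```
+
+Here three rows are flagged, so the compound query returns `3`.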
+
+## OK, thanks. What was so difficult about the result of `DT[i, col := value]` being returned invisibly?
+R internally forces visibility on for `[`. The value of FunTab's eval column (see [src/main/names.c](https://github.com/wch/r-source/blob/trunk/src/main/names.c)) for `[` is `0` meaning "force `R_Visible` on" (see [R-Internals section 1.6](https://cran.r-project.org/doc/manuals/r-release/R-ints.html#Autoprinting) ). Therefore, when we tried `invisible()` or setting `R_Visible` to `0` directly ourselves, `eval` in [src/main/eval.c](https://github.com/wch/r-source/blob/trunk/src/main/eval. [...]
+
+To solve this problem, the key was to stop trying to stop the print method running after a `:=`. Instead, inside `:=` we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.
+
+## Why do I have to type `DT` sometimes twice after using `:=` to print the result to console?
+
+This is an unfortunate downside to get [#869](https://github.com/Rdatatable/data.table/issues/869) to work. If a `:=` is used inside a function with no `DT[]` before the end of the function, then the next time `DT` is typed at the prompt, nothing will be printed. A repeated `DT` will print. To avoid this: include a `DT[]` after the last `:=` in your function. If that is not possible (e.g., it's not a function you can change) then `print(DT)` and `DT[]` at the prompt are guaranteed to pri [...]
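+
+A sketch of that workaround inside a function (the function and column names here are invented for illustration):
+
+```{r}
+f = function(d) {
+  d[ , newcol := 1L]
+  d[]   # returning d[] (not just d) ensures the next automatic print at the prompt works
+}
+```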
+
+## I've noticed that `base::cbind.data.frame` (and `base::rbind.data.frame`) appear to be changed by data.table. How is this possible? Why?
+
+It is a temporary, last resort solution until we discover a better way to solve the problems listed below. Essentially, the issue is that data.table inherits from `data.frame`, _and_ `base::cbind` and `base::rbind` (uniquely) do their own S3 dispatch internally as documented by `?cbind`. The change is adding one `for` loop to the start of each function directly in `base`; _e.g._,
+
+```{r}
+base::cbind.data.frame
+```
+
+That modification is made dynamically, _i.e._, the `base` definition of `cbind.data.frame` is fetched, the `for` loop added to the beginning and then assigned back to `base`. This solution is intended to be robust to different definitions of `base::cbind.data.frame` in different versions of R, including unknown future changes. Again, it is a last resort until a better solution is known or made available. The competing requirements are:
+
+ - `cbind(DT, DF)` needs to work. Defining `cbind.data.table` doesn't work because `base::cbind` does its own S3 dispatch and requires that the _first_ `cbind` method for each object it is passed is _identical_. This is not true in `cbind(DT, DF)` because the first method for `DT` is `cbind.data.table` but the first method for `DF` is `cbind.data.frame`. `base::cbind` then falls through to its internal `bind` code which appears to treat `DT` as a regular `list` and returns very odd looki [...]
+
+ - This naturally leads to trying to mask `cbind.data.frame` instead. Since a data.table is a `data.frame`, `cbind` would find the same method for both `DT` and `DF`. However, this doesn't work either because `base::cbind` appears to find methods in `base` first; _i.e._, `base::cbind.data.frame` isn't maskable. This is reproducible as follows :
+
+```{r}
+foo = data.frame(a = 1:3)
+cbind.data.frame = function(...) cat("Not printed\n")
+cbind(foo)
+rm("cbind.data.frame")
+```
+
+ - Finally, we tried masking `cbind` itself (v1.6.5 and v1.6.6). This allowed `cbind(DT, DF)` to work, but introduced compatibility issues with package `IRanges`, since `IRanges` also masks `cbind`. It worked if `IRanges` was lower on the `search()` path than data.table, but if `IRanges` was higher than data.table, data.table's `cbind` would never be called and the strange-looking `matrix` output occurred again (see [below](#cbinderror)).
+
+If you know of a better solution that still solves all the issues above, then please let us know and we'll gladly change it.
+
+## I've read about method dispatch (_e.g._ `merge` may or may not dispatch to `merge.data.table`) but _how_ does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when? {#r-dispatch}
+
+This comes up quite a lot but it's really earth-shatteringly simple. A function such as `merge` is _generic_ if it consists of a call to `UseMethod`. When you see people talking about whether or not functions are _generic_ functions they are merely typing the function without `()` afterwards, looking at the program code inside it and if they see a call to `UseMethod` then it is _generic_.  What does `UseMethod` do? It literally slaps the function name together with the class of the first [...]
+
+You might now ask: where is this documented in R? Answer: it's quite clear, but, you need to first know to look in `?UseMethod` and _that_ help file contains :
+
+> When a function calling `UseMethod('fun')` is applied to an object with class attribute `c('first', 'second')`, the system searches for a function called `fun.first` and, if it finds it, applies it to the object. If no such function is found a function called `fun.second` is tried. If no class name produces a suitable function, the function `fun.default` is used, if it exists, or an error results.
+
+Happily, an internet search for "How does R method dispatch work" (at the time of this writing) returns the `?UseMethod` help page in the top few links. Admittedly, other links rapidly descend into the intricacies of S3 vs S4, internal generics and so on.
+
+However, features like basic S3 dispatch (pasting the function name together with the class name) are why some R folk love R. It's so simple. No complicated registration or signature is required. There isn't much to learn. To create the `merge` method for data.table, all that was required was to create a function called `merge.data.table`.
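+
+The mechanism can be seen with a toy generic (the names here are invented for illustration):
+
+```{r}
+area = function(x) UseMethod("area")
+area.circle = function(x) pi * x$r^2          # found by pasting "area" and class "circle"
+area.default = function(x) stop("no method")
+circ = structure(list(r = 2), class = "circle")
+area(circ)   # dispatches to area.circle
+```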
+
+# Questions relating to compute time
+
+## I have 20 columns and a large number of rows. Why is an expression of one column so quick?
+
+Several reasons:
+
+ - Only that column is grouped; the other 19 are ignored because data.table inspects the `j` expression and realises it doesn't use the other columns.
+ - One memory allocation is made for the largest group only, then that memory is re-used for the other groups. There is very little garbage to collect.
+ - R is an in-memory column store; i.e., the columns are contiguous in RAM. Page fetches from RAM into L2 cache are minimised.
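+
+For example (a hypothetical table, not one from this vignette), only `v` is touched below; `w` is never grouped:
+
+```r
+library(data.table)
+DT = data.table(grp = rep(1:2, 3), v = 1:6, w = 101:106)
+DT[ , sum(v), by = grp]   # data.table inspects j and sees only v (and grp) are needed
+```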
+
+## I don't have a `key` on a large table, but grouping is still really quick. Why is that?
+
+data.table uses radix sorting. This is significantly faster than other sort algorithms. See [our presentations](http://user2015.math.aau.dk/presentations/234.pdf) on [our homepage](https://github.com/Rdatatable/data.table/wiki) for more information.
+
+This is also one reason why `setkey()` is quick.
+
+When no `key` is set, or we group in a different order from that of the key, we call it an _ad hoc_ `by`.
+
+## Why is grouping by columns in the key faster than an _ad hoc_ `by`?
+
+Because each group is contiguous in RAM, page fetches are minimised and memory can be
+copied in bulk (`memcpy` in C) rather than looping in C.
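+
+A rough sketch of the comparison (the column names are invented, and timings vary by machine so none are claimed):
+
+```r
+library(data.table)
+N = 1e7
+DT = data.table(id = sample(1e4L, N, TRUE), v = rnorm(N))
+system.time(DT[ , sum(v), by = id])   # ad hoc 'by': no key set
+setkey(DT, id)                        # sort the rows by id, once, by reference
+system.time(DT[ , sum(v), by = id])   # each group is now contiguous in RAM
+```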
+
+## What are primary and secondary indexes in data.table?
+
+ - Manual: [`?setkey`](https://www.rdocumentation.org/packages/data.table/functions/setkey)
+ - S.O.: [What is the purpose of setting a key in data.table?](https://stackoverflow.com/questions/20039335/what-is-the-purpose-of-setting-a-key-in-data-table/20057411#20057411)
+
+`setkey(DT, col1, col2)` orders the rows by column `col1` then within each group of `col1` it orders by `col2`. This is a _primary index_. The row order is changed _by reference_ in RAM. Subsequent joins and groups on those key columns then take advantage of the sort order for efficiency. (Imagine how difficult looking for a phone number in a printed telephone directory would be if it wasn't sorted by surname then forename. That's literally all `setkey` does. It sorts the rows by the col [...]
+
+However, you can only have one primary key because data can only be physically sorted in RAM in one way at a time. Choose the primary index to be the one you use most often (e.g. `[id,date]`). Sometimes there isn't an obvious choice for the primary key or you need to join and group many different columns in different orders. Enter a secondary index. This does use memory (`4*nrow` bytes regardless of the number of columns in the index) to store the order of the rows by the columns you spe [...]
+
+We use the words _index_ and _key_ interchangeably.
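+
+A small sketch of the two side by side (assuming a version of data.table recent enough to provide `setindex()` and `indices()`):
+
+```r
+library(data.table)
+DT = data.table(id = c(2L, 1L, 2L, 1L), date = c(3L, 4L, 1L, 2L), v = 1:4)
+setkey(DT, id, date)   # primary key: the rows are physically reordered, by reference
+setindex(DT, v)        # secondary index: only the row order is stored; data not moved
+key(DT)                # "id" "date"
+indices(DT)            # "v"
+```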
+
+# Error messages
+## "Could not find function `DT`"
+See above [here](#DTremove1) and [here](#DTremove2).
+
+## "unused argument(s) (`MySum = sum(v)`)"
+
+This error is generated by `DT[ , MySum = sum(v)]`. `DT[ , .(MySum = sum(v))]` was intended, or `DT[ , j = .(MySum = sum(v))]`.
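+
+To see it, with a throwaway table:
+
+```r
+library(data.table)
+DT = data.table(v = 1:5)
+# DT[ , MySum = sum(v)]      # error: unused argument (MySum = sum(v))
+DT[ , .(MySum = sum(v))]     # wrap in .() so MySum names the result column
+```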
+
+## "`translateCharUTF8` must be called on a `CHARSXP`"
+This error (and similar, _e.g._, "`getCharCE` must be called on a `CHARSXP`") may have nothing to do with character data or locale. Instead, it can be a symptom of earlier memory corruption. To date these have been reproducible and fixed (quickly). Please report it to our [issues tracker](https://github.com/Rdatatable/data.table/issues).
+
+## `cbind(DT, DF)` returns a strange format, _e.g._ `Integer,5` {#cbinderror}
+
+This occurs prior to v1.6.5, for `rbind(DT, DF)` too. Please upgrade to v1.6.7 or later.
+
+## "cannot change value of locked binding for `.SD`"
+
+`.SD` is locked by design. See `?data.table`. If you'd like to manipulate `.SD` before using it, or returning it, and don't wish to modify `DT` using `:=`, then take a copy first (see `?copy`), _e.g._,
+
+```{r}
+DT = data.table(a = rep(1:3, 1:3), b = 1:6, c = 7:12)
+DT
+DT[ , { mySD = copy(.SD)
+      mySD[1, b := 99L]
+      mySD},
+    by = a]
+```
+
+## "cannot change value of locked binding for `.N`"
+
+Please upgrade to v1.8.1 or later. From this version, if `.N` is returned by `j` it is renamed to `N` to avoid any ambiguity in any subsequent grouping between the `.N` special variable and a column called `".N"`.
+
+The old behaviour can be reproduced by forcing `.N` to be called `.N`, like this :
+```{r}
+DT = data.table(a = c(1,1,2,2,2), b = c(1,2,2,2,1))
+DT
+DT[ , list(.N = .N), list(a, b)]   # show intermediate result for exposition
+cat(try(
+    DT[ , list(.N = .N), by = list(a, b)][ , unique(.N), by = a]   # compound query more typical
+, silent = TRUE))
+```
+
+If you are already running v1.8.1 or later then the error message is now more helpful than the "cannot change value of locked binding" error, as you can see above, since this vignette was produced using v1.8.1 or later.
+
+The more natural syntax now works :
+```{r}
+if (packageVersion("data.table") >= "1.8.1") {
+    DT[ , .N, by = list(a, b)][ , unique(N), by = a]
+  }
+if (packageVersion("data.table") >= "1.9.3") {
+    DT[ , .N, by = .(a, b)][ , unique(N), by = a]   # same
+}
+```
+
+# Warning messages
+## "The following object(s) are masked from `package:base`: `cbind`, `rbind`"
+
+This warning was present in v1.6.5 and v1.6.6 only, when loading the package. The motivation was to allow `cbind(DT, DF)` to work but, as it transpired, this broke (full) compatibility with package `IRanges`. Please upgrade to v1.6.7 or later.
+
+## "Coerced numeric RHS to integer to match the column's type"
+
+Hopefully, this is self-explanatory. The full message is:
+
+> Coerced numeric RHS to integer to match the column's type; may have truncated precision. Either change the column to numeric first by creating a new numeric vector length 5 (nrows of entire table) yourself and assigning that (i.e. 'replace' column), or coerce RHS to integer yourself (e.g. 1L or as.integer) to make your intent clear (and for speed). Or, set the column type correctly up front when you create the table and stick to it, please.
+
+
+To generate it, try :
+
+```{r}
+DT = data.table(a = 1:5, b = 1:5)
+suppressWarnings(
+DT[2, b := 6]         # works (slower) with warning
+)
+class(6)              # numeric not integer
+DT[2, b := 7L]        # works (faster) without warning
+class(7L)             # L makes it an integer
+DT[ , b := rnorm(5)]  # 'replace' integer column with a numeric column
+```
+
+## Reading data.table from RDS or RData file
+
+`*.RDS` and `*.RData` are file types which can store in-memory R objects on disk efficiently. However, storing a data.table in one of these binary files loses its column over-allocation. This isn't a big deal -- your data.table will be copied in memory on the next _by reference_ operation, with a warning. It is therefore recommended to call `alloc.col()` on each data.table loaded via `readRDS()` or `load()`.
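+
+For example, a sketch using a temporary file:
+
+```r
+library(data.table)
+DT = data.table(a = 1:3)
+f = tempfile(fileext = ".rds")
+saveRDS(DT, f)
+DT2 = readRDS(f)       # over-allocation is lost in the round trip
+alloc.col(DT2)         # restore it before any := by reference
+DT2[ , b := a * 2L]    # no warning about an internal copy now
+```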
+
+# General questions about the package
+
+## v1.3 appears to be missing from the CRAN archive?
+That is correct. v1.3 was available on R-Forge only. There were several large
+changes internally and these took some time to test in development.
+
+## Is data.table compatible with S-plus?
+
+Not currently.
+
+ - A few core parts of the package are written in C and use internal R functions and R structures.
+ - The package uses lexical scoping, which is one of the differences between R and **S-plus** explained in [R FAQ 3.3.1](https://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping).
+
+## Is it available for Linux, Mac and Windows?
+Yes, for both 32-bit and 64-bit on all platforms, thanks to CRAN. There are no special or OS-specific libraries used.
+
+## I think it's great. What can I do?
+Please file suggestions, bug reports and enhancement requests on our [issues tracker](https://github.com/Rdatatable/data.table/issues). This helps make the package better.
+
+Please do star the package on [GitHub](https://github.com/Rdatatable/data.table/wiki). This helps encourage the developers and helps other R users find the package.
+
+You can submit pull requests to change the code and/or documentation yourself; see our [Contribution Guidelines](https://github.com/Rdatatable/data.table/blob/master/Contributing.md).
+
+## I think it's not great. How do I warn others about my experience?
+Please put your vote and comments on [Crantastic](http://crantastic.org/packages/data-table). Please make it constructive so we have a chance to improve.
+
+## I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?
+Yes, there are two options. You can post to [datatable-help](mailto:datatable-help at lists.r-forge.r-project.org). It's like r-help, but just for this package. Or the [`[data.table]` tag](https://stackoverflow.com/tags/data.table/info) on [Stack Overflow](https://stackoverflow.com/). Feel free to answer questions in those places, too.
+
+## Where are the datatable-help archives?
+The [homepage](https://github.com/Rdatatable/data.table/wiki) contains links to the archives in several formats.
+
+## I'd prefer not to post on the Issues page, can I mail just one or two people privately?
+Sure. You're more likely to get a faster answer from the Issues page or Stack Overflow, though. Further, asking publicly in those places helps build the general knowledge base.
+
+## I have created a package that uses data.table. How do I ensure my package is data.table-aware so that inheritance from `data.frame` works?
+
+Please see [this answer](http://stackoverflow.com/a/10529888/403310).
+
+
diff --git a/vignettes/datatable-faq.Rnw b/vignettes/datatable-faq.Rnw
deleted file mode 100644
index 097caa4..0000000
--- a/vignettes/datatable-faq.Rnw
+++ /dev/null
@@ -1,585 +0,0 @@
-\documentclass[a4paper]{article}
-
-\usepackage[margin=3cm]{geometry}
-%%\usepackage[round]{natbib}
-\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
-
-%%\newcommand{\acronym}[1]{\textsc{#1}}
-%%\newcommand{\class}[1]{\mbox{\textsf{#1}}}
-\newcommand{\code}[1]{\mbox{\texttt{#1}}}
-\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
-\newcommand{\proglang}[1]{\textsf{#1}}
-\SweaveOpts{keep.source=TRUE}
-%% \VignetteIndexEntry{Frequently asked questions}
-
-<<echo=FALSE,results=hide>>=
-if (!exists("data.table",.GlobalEnv)) library(data.table)  # see Intro.Rnw for comments on these two lines
-rm(list=as.character(tables()$NAME),envir=.GlobalEnv)
-options(width=70)  # so lines wrap round
-@
-
-\begin{document}
-\title{FAQs about the \pkg{data.table} package in \proglang{R}}
-\date{Revised: \today\\(A later revision may be available on the \href{https://github.com/Rdatatable/data.table/wiki}{homepage})}
-\maketitle
-
-The first section, Beginner FAQs, is intended to be read in order, from start to finish.
-
-\tableofcontents
-\section{Beginner FAQs}
-
-\subsection{Why does \code{DT[,5]} return \code{5}?}
-Because by default, unlike a \code{data.frame}, the 2nd argument is an \emph{expression}
-which is evaluated within the scope of \code{DT}. 5 evaluates to 5. It is generally bad practice to refer
-to columns by number rather than name. If someone else comes along and reads your code later, they
-may have to hunt around to find out which column is number 5. Furthermore, if you or someone else
-changes the column ordering of \code{DT} higher up in your \proglang{R} program, you might get bugs if you forget to
-change all the places in your code which refer to column number 5.
-
-Say column 5 is called ``region'', just do \code{DT[,region]} instead. Notice there are no quotes around
-the column name. This is what we mean by j being evaluated within the scope of the \code{data.table}. That
-scope consists of an environment where the column names are variables.
-
-You can place \emph{any} \proglang{R} expression in \code{j}; e.g., \code{DT[,colA*colB/2]}. Further, \code{j} may be a set of \proglang{R} expressions (including calls to any \proglang{R} package) wrapped with \code{list()}, \code{.()} or an anonymous code block wrapped with braces \code{{}}. A simple example is \code{DT[,fitdistr(d1-d1,"normal")]}.
-
-Having said this, there are some circumstances where referring to a column by number is ok, such as
-a sequence of columns. In these situations just do \code{DT[,5:10,with=FALSE]} or \newline \code{DT[,c(1,4,10),with=FALSE]}.
-See \code{?data.table} for an explanation of the \code{with} argument. It lets you use \code{data.table} the same way as \code{data.frame}, when you need to.
-
-Note that \code{with()} has been a base function for a long time.  That's one reason we say \code{data.table} builds
-upon base functionality. There is little new here really, \code{data.table} is just making use of \code{with()}
-and building it into the syntax.
-
-\subsection{Why does \code{DT[,"region"]} return \code{"region"}?}
-See answer to 1.1 above. Try \code{DT[,region]} instead. Or \code{DT[,"region",with=FALSE]}.
-
-
-\subsection{Why does \code{DT[,region]} return a vector?  I'd like a 1-column \code{data.table}. There is no \code{drop} argument like I'm used to in \code{data.frame}.}
-Try \code{DT[,.(region)]} instead. \code{.()} is an alias for \code{list()} and ensures a \code{data.table} is returned.
-
-\subsection{Why does \code{DT[,x,y,z]} not work? I wanted the 3 columns \code{x},\code{y} and \code{z}.}
-The \code{j} expression is the 2nd argument. The correct way to do this is \code{DT[,.(x,y,z)]}.
-
-\subsection{I assigned a variable \code{mycol="x"} but then \code{DT[,mycol]} returns \code{"x"}. How do I get it to look up the column name contained in the \code{mycol} variable?}
-This is what we mean when we say the \code{j} expression 'sees' objects in the calling scope. The variable \code{mycol} does not exist as a column name of
-\code{DT} so \proglang{R} then looked in the calling scope and found \code{mycol} there and returned its value \code{"x"}. This is correct behaviour. Had \code{mycol} been a column name,
-then that column's data would have been returned. What you probably meant was \code{DT[,mycol,with=FALSE]}, which will return the \code{x} column's 
-data as you wanted. Alternatively, since a \code{data.table} \emph{is} a \code{list}, too, you can write \code{DT[["x"]]} or \code{DT[[mycol]]}.
-
-\subsection{Ok but I don't know the expressions in advance. How do I programmatically pass them in?}
-To create expressions use the \code{quote()} function. We refer to these as \emph{quote()-ed} expressions to
-save confusion with the double quotes used to create a character vector such as \code{c("x")}. The simplest
-quote()-ed expression is just one column name :
-
-\code{q = quote(x)}
-
-\code{DT[,eval(q)]  \# returns the column x as a vector}
-
-\code{q = quote(list(x))}
-
-\code{DT[,eval(q)]  \# returns the column x as a 1-column data.table}
-\newline
-Since these are \emph{expressions}, we are not restricted to column names only :
-
-\code{q = quote(mean(x))}
-
-\code{DT[,eval(q)]  \# identical to DT[,mean(x)]}
-
-\code{q = quote(list(x,sd(y),mean(y*z)))}
-
-\code{DT[,eval(q)]  \# identical to DT[,list(x,sd(y),mean(y*z))]}
-\newline
-However, if it's just simply a vector of column names you need, it may be simpler to pass a character vector to \code{j} and use \code{with=FALSE}.
-
-To pass an expression into your own function, one idiom is as follows :
-<<>>=
-DT = as.data.table(iris)
-setkey(DT,Species)
-myfunction = function(dt, expr) {
-    e = substitute(expr)
-    dt[,eval(e),by=Species]
-}
-myfunction(DT,sum(Sepal.Width))
-@
-
-\code{quote()} and \code{eval()} are like macros in other languages. Instead of \code{j=myfunction()} (which won't work without laboriously passing in all the arguments) it's \code{j=eval(mymacro)}. This can be more efficient than a function call, and convenient. When data.table sees \code{j=eval(mymacro)} it knows to find \code{mymacro} in calling scope so as not to be tripped up if a column name happens to be called \code{mymacro}, too. 
-
-For example, let's make sure that exactly the same \code{j} is run for a set of different grouping criteria :
-
-<<>>=
-DT = as.data.table(iris)
-whatToRun = quote( .(AvgWidth = mean(Sepal.Width),
-                     MaxLength = max(Sepal.Length)) )
-DT[, eval(whatToRun), by=Species]
-DT[, eval(whatToRun), by=.(FirstLetter=substring(Species,1,1))]
-DT[, eval(whatToRun), by=.(Petal.Width=round(Petal.Width,0))]
-@
-
-\subsection{What are the benefits of being able to use column names as if they are variables inside \code{DT[...]}?}
-\code{j} doesn't have to be just column names. You can write any \proglang{R} \emph{expression} of column names directly as the \code{j}; e.g., 
-\code{DT[,mean(x*y/z)]}.  The same applies
-to \code{i}; e.g., \code{DT[x>1000, sum(y*z)]}.
-This runs the \code{j} expression on the set of rows where the \code{i} expression is true. You don't even need to return data; e.g., \code{DT[x>1000, plot(y,z)]}. Finally, you can do \code{j} by group by adding \code{by=}; e.g., \code{DT[x>1000, sum(y*z), by=w]}. This runs \code{j} for each group in column \code{w} but just over the rows where \code{x>1000}. By placing the 3 parts of the query (where, select and group by) inside the square brackets, \code{data.table} sees this query as  [...]
-
-\subsection{OK, I'm starting to see what \code{data.table} is about, but why didn't you enhance \code{data.frame} in \proglang{R}? Why does it have to be a new package?}
-As FAQ 1.1 highlights, \code{j} in \code{[.data.table} is fundamentally different from \code{j} in \code{[.data.frame}. Even something as simple as \code{DF[,1]} would break existing code in many packages and user code.  This is by design. We want it to work this way for more complicated syntax to work. There are other differences, too (see FAQ \ref{faq:SmallerDiffs}).
-
-Furthermore, \code{data.table} \emph{inherits} from \code{data.frame}. It \emph{is} a \code{data.frame}, too. A \code{data.table} can be passed to any package that only accepts \code{data.frame} and that package can use \code{[.data.frame} syntax on the \code{data.table}. 
-
-We \emph{have} proposed enhancements to \proglang{R} wherever possible, too. One of these was accepted as a new feature in \proglang{R} 2.12.0 :
-\begin{quotation}unique() and match() are now faster on character vectors where
-      all elements are in the global CHARSXP cache and have unmarked
-      encoding (ASCII).  Thanks to Matt Dowle for suggesting
-      improvements to the way the hash code is generated in unique.c.\end{quotation}
-
-A second proposal was to use memcpy in duplicate.c, which is much faster than a for loop in C. This would improve the \emph{way} that \proglang{R} copies data internally (on some measures by 13 times). The thread on r-devel is here : \url{http://tolstoy.newcastle.edu.au/R/e10/devel/10/04/0148.html}.
-
-\subsection{Why are the defaults the way they are? Why does it work the way it does?}
-The simple answer is because the main author originally designed it for his own use. He wanted it that way. He finds it a more natural, faster way to
-write code, which also executes more quickly.
-
-\subsection{Isn't this already done by \code{with()} and \code{subset()} in base?}
-Some of the features discussed so far are, yes. The package builds upon base functionality. It does the same sorts of things but with
-less code required and executes many times faster if used correctly.
-
-\subsection{Why does \code{X[Y]} return all the columns from \code{Y} too? Shouldn't it return a subset of \code{X}?}
-This was changed in v1.5.3. \code{X[Y]} now includes \code{Y}'s non-join columns. We refer to this feature as \emph{join inherited scope} because not only are \code{X} columns available to the j expression, so are \code{Y} columns. The downside is that \code{X[Y]} is less efficient since every item of \code{Y}'s non-join columns are duplicated to match the (likely large) number of rows in \code{X} that match. We therefore strongly encourage \code{X[Y,j]} instead of \code{X[Y]}. See next FAQ.
-
-\subsection{What is the difference between \code{X[Y]} and \code{merge(X,Y)}?}
-\code{X[Y]} is a join, looking up \code{X}'s rows using \code{Y} (or \code{Y}'s key if it has one) as an index.\newline
-\code{Y[X]} is a join, looking up \code{Y}'s rows using \code{X} (or \code{X}'s key if it has one) as an index.\newline
-\code{merge(X,Y)}\footnote{Here we mean either the \code{merge} \emph{method} for \code{data.table} or the \code{merge} method for \code{data.frame} since both methods work in the same way in this respect. See \code{?merge.data.table} and FAQ 2.24 for more information about method dispatch.} does both ways at the same time. The number of rows of \code{X[Y]} and \code{Y[X]} usually differ; whereas the number of rows returned by \code{merge(X,Y)} and \code{merge(Y,X)} is the same.
-
-\emph{BUT} that misses the main point. Most tasks require something to be done on the data after a join or merge. Why merge all the columns of data, only to use a small subset of them afterwards? You may suggest \code{merge(X[,ColsNeeded1],Y[,ColsNeeded2])}, but that takes copies of the subsets of data and it requires the programmer to work out which columns are needed. \code{X[Y,j]} in data.table does all that in one step for you. When you write \code{X[Y,sum(foo*bar)]}, \code{data.tabl [...]
-
-\subsection{Anything else about \code{X[Y,sum(foo*bar)]}?}
-This behaviour changed in v1.9.4 (Sep 2014). It now does the \code{X[Y]} join and then runs \code{sum(foo*bar)} over all the rows; i.e., \code{X[Y][,sum(foo*bar)]}. It used to run \code{j} for each \emph{group} of \code{X} that each row of \code{Y} matches to. That can still be done as it's very useful but you now need to be explicit and specify \code{by=.EACHI}; i.e., \code{X[Y,sum(foo*bar),by=.EACHI]}. We call this \emph{grouping by each i}.
-For example, and making it complicated by using \emph{join inherited scope}, too :
-<<>>=
-X = data.table(grp=c("a","a","b","b","b","c","c"), foo=1:7)
-setkey(X,grp)
-Y = data.table(c("b","c"), bar=c(4,2))
-X
-Y
-X[Y,sum(foo*bar)]
-X[Y,sum(foo*bar),by=.EACHI]
-@
-
-\subsection{That's nice. How did you manage to change it?}
-The request to change came from users. The feeling was that if a query is doing grouping then an explicit `by=` should be present for code readability reasons. An option is provided to return the old behaviour: \code{options(datatable.old.bywithoutby)}, by default FALSE. This enables upgrading to test the other new features / bug fixes in v1.9.4, with later migration of any by-without-by queries when ready (by adding \code{by=.EACHI} to them). We retained 47 pre-change tests and added th [...]
-
-Of the 66 packages on CRAN or Bioconductor that depend or import data.table at the time of releasing v1.9.4, only one was affected by the change. That could be because many packages don't have comprehensive tests, or just that grouping by each row in i wasn't being used much by downstream packages. We always test the new version with all dependent packages before release and coordinate any changes with those maintainers. So this release was quite straightforward in that regard.
-
-Another compelling reason to make the change was that previously, there was no efficient way to achieve what \code{X[Y,sum(foo*bar)]} does now. You had to write \code{X[Y][,sum(foo*bar)]}. That was suboptimal because \code{X[Y]} joined all the columns and passed them all to the second compound query without knowing that only foo and bar are needed. To solve that efficiency problem, extra programming effort was required: \code{X[Y,list(foo,bar)][,sum(foo*bar)]}.  The change to \code{by=.E [...]
-
-\section{General syntax}
-
-\subsection{How can I avoid writing a really long \code{j} expression? You've said I should use the column \emph{names}, but I've got a lot of columns.}
-When grouping, the \code{j} expression can use column names as variables, as you know, but it can also use a reserved symbol \code{.SD} which refers to the {\bf S}ubset of the \code{{\bf D}ata.table} for each group (excluding the grouping columns). So to sum up all your columns it's just \code{DT[,lapply(.SD,sum),by=grp]}. It might seem tricky, but it's fast to write and fast to run. Notice you don't have to create an anonymous \code{function}. The \code{.SD} object is efficiently implem [...]
-So please don't do this, for example, \code{DT[,sum(.SD[["sales"]]),by=grp]}. That works but is inefficient and inelegant. This is what was intended: \code{DT[,sum(sales),by=grp]}, which could be hundreds of times faster. If you do use all the data in \code{.SD} for each group (such as in \code{DT[,lapply(.SD,sum),by=grp]}) then that's very good usage of \code{.SD}. Also see \code{?data.table} for the \code{.SDcols} argument which allows you to specify a subset of columns for \code{.SD}.
-
-\subsection{Why is the default for \code{mult} now \code{"all"}?}
-In v1.5.3 the default was changed to \code{"all"}. When \code{i} (or \code{i}'s key if it has one) has fewer columns than \code{x}'s key, \code{mult} was already set to \code{"all"} automatically. Changing the default makes this clearer and easier for users as it came up quite often.
-
-In versions up to v1.3, \code{"all"} was slower. Internally, \code{"all"} was implemented by joining using \code{"first"}, then again from scratch using \code{"last"}, after which a diff between them was performed to work out the span of the matches in \code{x} for each row in \code{i}. Most often we join to single rows, though, where \code{"first"},\code{"last"} and \code{"all"} return the same result. We preferred maximum performance for the majority of situations so the default chosen [...]
-
-In v1.4 the binary search in C was changed to branch at the deepest level to find first and last. That branch will likely occur within the same final pages of RAM so there should no longer be a speed disadvantage in defaulting \code{mult} to \code{"all"}. We warned that the default might change and made the change in v1.5.3.
-
-A future version of \code{data.table} may allow a distinction between a key and a \emph{unique key}. Internally \code{mult="all"} would perform more like \code{mult="first"} when all \code{x}'s key columns were joined to and \code{x}'s key was a unique key. \code{data.table} would need checks on insert and update to make sure a unique key is maintained. An advantage of specifying a unique key would be that \code{data.table} would ensure no duplicates could be inserted, in addition to per [...]
-
-
-\subsection{I'm using \code{c()} in the \code{j} and getting strange results.}
-This is a common source of confusion. In \code{data.frame} you are used to, for example:
-<<>>=
-DF = data.frame(x=1:3,y=4:6,z=7:9)
-DF
-DF[,c("y","z")]
-@
-which returns the two columns. In \code{data.table} you know you can use the column names directly and might try :
-<<>>=
-DT = data.table(DF)
-DT[,c(y,z)]
-@
-but this returns one vector.  Remember that the \code{j} expression is evaluated within the environment of \code{DT} and \code{c()} returns a vector.  If 2 or more columns are required, use \code{list()} or \code{.()} instead:
-<<>>=
-DT[,.(y,z)]
-@
-\code{c()} can be useful in a \code{data.table} too, but its behaviour is different from that in \code{[.data.frame}.
-
-\subsection{I have built up a complex table with many columns.  I want to use it as a template for a new table; i.e., create a new table with no rows, but with the column names and types copied from my table. Can I do that easily?}
-Yes. If your complex table is called \code{DT}, try \code{NEWDT = DT[0]}.
-
-\subsection{Is a null data.table the same as \code{DT[0]}?}
-No. By "null data.table" we mean the result of \code{data.table(NULL)} or \code{as.data.table(NULL)}; i.e.,
-<<>>=
-data.table(NULL)
-data.frame(NULL)
-as.data.table(NULL)
-as.data.frame(NULL)
-is.null(data.table(NULL))
-is.null(data.frame(NULL))
-@
-The null \code{data.table|frame} is \code{NULL} with some attributes attached, making it not NULL anymore. In R only pure \code{NULL} is \code{NULL} as tested by \code{is.null()}. When referring to the "null data.table" we use lower case null to help distinguish from upper case \code{NULL}. To test for the null data.table, use \code{length(DT)==0} or \code{ncol(DT)==0} (\code{length} is slightly faster as it's a primitive function).
-An \emph{empty} data.table (\code{DT[0]}) has one or more columns, all of which are empty. Those empty columns still have names and types.
-<<>>=
-DT = data.table(a=1:3,b=c(4,5,6),d=c(7L,8L,9L))
-DT[0]
-sapply(DT[0],class)
-@
-
-\subsection{Why has the \code{DT()} alias been removed?}\label{faq:DTremove1}
-\code{DT} was introduced originally as a wrapper for a list of \code{j} expressions. Since \code{DT} was an alias for \code{data.table}, this was a convenient way to take care of silent recycling in cases where each item of the \code{j} list evaluated to different lengths. The alias was one reason grouping was slow, though.
-As of v1.3, \code{list()} or \code{.()} should be passed instead to the \code{j} argument. These are much faster, especially when there are many groups. Internally, this was a nontrivial change. Vector recycling is now done internally, along with several other speed enhancements for grouping.
-
-\subsection{But my code uses \code{j=DT(...)} and it works. The previous FAQ says that \code{DT()} has been removed.}\label{faq:DTremove2}
-Then you are using a version prior to 1.5.3. Prior to 1.5.3 \code{[.data.table} detected use of \code{DT()} in the \code{j} and automatically replaced it with a call to \code{list()}. This was to help the transition for existing users.
-
-\subsection{What are the scoping rules for \code{j} expressions?}
-Think of the subset as an environment where all the column names are variables. When a variable \code{foo} is used in the \code{j} of a query such as \code{X[Y,sum(foo)]}, \code{foo} is looked for in the following order :
-\begin{enumerate}
-\item The scope of \code{X}'s subset; i.e., \code{X}'s column names.
-\item The scope of each row of \code{Y}; i.e., \code{Y}'s column names (\emph{join inherited scope})
-\item The scope of the calling frame; e.g., the line that appears before the \code{data.table} query.
-\item Exercise for reader: does it then ripple up the calling frames, or go straight to \code{globalenv()}?
-\item The global environment
-\end{enumerate}
-This is \emph{lexical scoping} as explained in \href{http://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping}{R FAQ 3.3.1}.
-The environment in which the function was created is not relevant, though, because there is \emph{no function}. No anonymous \emph{function}
-is passed to the \code{j}. Instead, an anonymous \emph{body} is passed to the \code{j}; for example,
-<<>>=
-DT = data.table(x=rep(c("a","b"),c(2,3)),y=1:5)
-DT
-DT[,{z=sum(y);z+3},by=x]
-@
-Some programming languages call this a \emph{lambda}.
-
-\subsection{Can I trace the \code{j} expression as it runs through the groups?}
-Try something like this:
-<<>>=
-DT[,{
-  cat("Objects:",paste(objects(),collapse=","),"\n")
-  cat("Trace: x=",as.character(x)," y=",y,"\n")
-  sum(y)
-},by=x]
-@
-
-\subsection{Inside each group, why are the group variables length 1?}
-In the previous FAQ, \code{x} is a grouping variable
-and (as from v1.6.1) has length 1 (if inspected or used in \code{j}). It's for efficiency and convenience. Therefore, there is no difference between the following two statements:
-<<>>=
-DT[,.(g=1,h=2,i=3,j=4,repeatgroupname=x,sum(y)),by=x]
-DT[,.(g=1,h=2,i=3,j=4,repeatgroupname=x[1],sum(y)),by=x]
-@
-If you need the size of the current group, use \code{.N} rather than calling \code{length()} on any column.
-
-\subsection{Only the first 10 rows are printed, how do I print more?}
-There are two things happening here. First, if the number of rows in a \code{data.table} is large (\code{> 100} by default), then a summary of the \code{data.table} is printed to the console by default. Second, the summary of a large \code{data.table} is printed by taking the top and bottom \code{n} rows of the \code{data.table} and only printing those. Both of these parameters (when to trigger a summary and how much of a table to use as a summary) are configurable by \proglang{R}'s \cod [...]
-
-For instance, to enforce the summary of a \code{data.table} to only happen when a \code{data.table} is greater than 50 rows, you could \code{options(datatable.print.nrows=50)}. To disable the summary-by-default completely, you could \code{options(datatable.print.nrows=Inf)}. You could also call \code{print} directly, as in \code{print(your.data.table, nrows=Inf)}.
-
-If you want to show more than just the top (and bottom) 10 rows of a \code{data.table} summary (say you like 20), set \code{options(datatable.print.topn=20)}, for example. Again, you could also just call \code{print} directly, as in \code{print(your.data.table, topn=20)}.
-
-\subsection{With an \code{X[Y]} join, what if \code{X} contains a column called \code{"Y"}?}
-When \code{i} is a single name such as \code{Y} it is evaluated in the calling frame. In all other cases such as calls to \code{.()} or other expressions, \code{i} is evaluated within the scope of \code{X}. This facilitates easy \emph{self joins} such as \code{X[J(unique(colA)),mult="first"]}.
-
-\subsection{\code{X[Z[Y]]} is failing because \code{X} contains a column \code{"Y"}. I'd like it to use the table \code{Y} in calling scope.}
-The \code{Z[Y]} part is not a single name, so it is evaluated within the frame of \code{X} and the problem occurs. Try \code{tmp=Z[Y];X[tmp]}. This is robust to \code{X} containing a column \code{"tmp"} because \code{tmp} is a single name. If you often encounter conflicts of this type, one simple solution may be to name all tables in uppercase and all column names in lowercase, or some similar scheme.
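-A minimal sketch of the workaround (the tables here are hypothetical, constructed only for illustration):
-<<>>=
-X = data.table(Y=1:3, v=4:6, key="Y")   # X has a column called "Y"
-Z = data.table(Y=2:3, key="Y")
-Y = data.table(Y=3L, key="Y")           # the table Y in calling scope
-tmp = Z[Y]      # i is a single name, so Y is found in the calling frame
-X[tmp]          # likewise robust even though X has a column "Y"
-@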
-
-\subsection{Can you explain further why \code{data.table} is inspired by \code{A[B]} syntax in base?}
-Consider \code{A[B]} syntax using an example matrix \code{A} :
-<<>>=
-A = matrix(1:12,nrow=4)
-A
-@
-To obtain cells (1,2)=5 and (3,3)=11 many users (we believe) may try this first :
-<<>>=
-A[c(1,3),c(2,3)]
-@
-That returns the union of those rows and columns, though. To reference the cells, a 2-column matrix is required. \code{?Extract} says :
-\begin{quotation}
-When indexing arrays by [ a single argument i can be a matrix with as many columns as there are dimensions of x; the result is then a vector with elements corresponding to the sets of indices in each row of i.
-\end{quotation}
-Let's try again.
-<<>>=
-B = cbind(c(1,3),c(2,3))
-B
-A[B]
-@
-A matrix is a 2-dimension structure with row names and column names. Can we do the same with names?
-<<>>=
-rownames(A) = letters[1:4]
-colnames(A) = LETTERS[1:3]
-A
-B = cbind(c("a","c"),c("B","C"))
-A[B]
-@
-So, yes we can. Can we do the same with \code{data.frame}?
-<<>>=
-A = data.frame(A=1:4,B=letters[11:14],C=pi*1:4)
-rownames(A) = letters[1:4]
-A
-B
-A[B]
-@
-But, notice that the result was coerced to character. \proglang{R} coerced \code{A} to matrix first so that the syntax could work, but the result isn't ideal.  Let's try making \code{B} a \code{data.frame}.
-<<>>=
-B = data.frame(c("a","c"),c("B","C"))
-cat(try(A[B],silent=TRUE))
-@
-So we can't subset a \code{data.frame} by a \code{data.frame} in base R. What if we want row names and column names that aren't character but integer or float? What if we want more than 2 dimensions of mixed types? Enter \code{data.table}.
-
-Furthermore, matrices, especially sparse matrices, are often stored in a 3 column tuple: (i,j,value). This can be thought of as a key-value pair where \code{i} and \code{j} form a 2-column key. If we have more than one value, perhaps of different types it might look like (i,j,val1,val2,val3,...). This looks very much like a \code{data.frame}. Hence \code{data.table} extends \code{data.frame} so that a \code{data.frame X} can be subset by a \code{data.frame Y}, leading to the \code{X[Y]} syntax.
-
-\subsection{Can base be changed to do this then, rather than a new package?}
-\code{data.frame} is used \emph{everywhere} and so it is very difficult to make \emph{any} changes to it.
-\code{data.table} \emph{inherits} from \code{data.frame}. It \emph{is} a \code{data.frame}, too. A \code{data.table} \emph{can} be passed to any package that \emph{only} accepts \code{data.frame}. When that package uses \code{[.data.frame} syntax on the \code{data.table}, it works. It works because \code{[.data.table} looks to see where it was called from. If it was called from such a package, \code{[.data.table} diverts to \code{[.data.frame}.
-
-\subsection{I've heard that \code{data.table} syntax is analogous to SQL.}
-Yes :
-\begin{itemize}
-\item{\code{i}  <==>  where}
-\item{\code{j}  <==>  select}
-\item{\code{:=}  <==>  update}
-\item{\code{by}  <==>  group by}
-\item{\code{i}  <==>  order by (in compound syntax)}
-\item{\code{i}  <==>  having (in compound syntax)}
-\item{\code{nomatch=NA}  <==>  outer join}
-\item{\code{nomatch=0}  <==>  inner join}
-\item{\code{mult="first"|"last"}  <==>  N/A because SQL is inherently unordered}
-\item{\code{roll=TRUE}  <==>  N/A because SQL is inherently unordered}
-\end{itemize}
-The general form is : \newline
-\code{\hspace*{2cm}DT[where,select|update,group by][order by][...] ... [...]}
-\newline\newline
-A key advantage of column vectors in \proglang{R} is that they are \emph{ordered}, unlike SQL\footnote{It may be a surprise to learn that \code{select top 10 * from ...} does \emph{not} reliably return the same rows over time in SQL. You do need to include an \code{order by} clause, or use a clustered index to guarantee row order; i.e., SQL is inherently unordered.}. We can use ordered functions in \code{data.table} queries such as \code{diff()} and we can use \emph{any} \proglang{R} fun [...]
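-The general form above can be sketched with a toy table (hypothetical data, for illustration only):
-<<>>=
-DT = data.table(grp=c("a","a","b","b"), v=1:4)
-DT[v>1, .(total=sum(v)), by=grp][order(-total)]   # where, select, group by, then order by
-@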
-
-
-\subsection{What are the smaller syntax differences between \code{data.frame} and \code{data.table}?}\label{faq:SmallerDiffs}
-\begin{itemize}
-\item{\code{DT[3]} refers to the 3rd row, but \code{DF[3]} refers to the 3rd column}
-\item{\code{DT[3,]} == \code{DT[3]}, but \code{DF[,3]} == \code{DF[3]} (somewhat confusingly)}
-\item{For this reason we say the comma is \emph{optional} in \code{DT}, but not optional in \code{DF}}
-\item{\code{DT[[3]]} == \code{DF[3]} == \code{DF[[3]]}}
-\item{\code{DT[i,]} where \code{i} is a single integer returns a single row, just like \code{DF[i,]}, but unlike a matrix single row subset which returns a vector.}
-\item{\code{DT[,j,with=FALSE]} where \code{j} is a single integer returns a one column \code{data.table}, unlike \code{DF[,j]} which returns a vector by default}
-\item{\code{DT[,"colA",with=FALSE][[1]]} == \code{DF[,"colA"]}.}
-\item{\code{DT[,colA]} == \code{DF[,"colA"]}}
-\item{\code{DT[,list(colA)]} == \code{DF[,"colA",drop=FALSE]}}
-\item{\code{DT[NA]} returns 1 row of \code{NA}, but \code{DF[NA]} returns a copy of \code{DF} containing
-  \code{NA} throughout. The symbol \code{NA} is type logical in \proglang{R} and
-  is therefore recycled by \code{[.data.frame}. The intention was probably \code{DF[NA\_integer\_]}.
-  \code{[.data.table} does this automatically for convenience.}
-\item{\code{DT[c(TRUE,NA,FALSE)]} treats the \code{NA} as \code{FALSE}, but \code{DF[c(TRUE,NA,FALSE)]} returns
-  \code{NA} rows for each \code{NA}}
-\item{\code{DT[ColA==ColB]} is simpler than \code{DF[!is.na(ColA) \& !is.na(ColB) \& ColA==ColB,]}}
-\item{\code{data.frame(list(1:2,"k",1:4))} creates 3 columns, \code{data.table} creates one \code{list} column.}
-\item{\code{check.names} is by default \code{TRUE} in \code{data.frame} but \code{FALSE} in \code{data.table}, for convenience.}
-\item{\code{stringsAsFactors} is by default \code{TRUE} in \code{data.frame} but \code{FALSE} in \code{data.table}, for efficiency. Since a global string cache was added to R, character items are pointers to the single cached string and there is no longer a performance benefit of converting to factor.}
-\item{Atomic vectors in \code{list} columns are collapsed when printed using ", " in \code{data.frame}, but "," in \code{data.table} with a trailing comma after the 6th item to avoid accidental printing of large embedded objects.}
-\end{itemize}
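-A few of these differences can be seen directly (a small sketch, for illustration only):
-<<>>=
-DF = data.frame(a=1:4, b=5:8, c=9:12)
-DT = as.data.table(DF)
-DF[3]                         # third column of the data.frame
-DT[3]                         # third row of the data.table
-identical(DF[,"a"], DT[,a])   # both return the column as a vector
-@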
-In \code{[.data.frame} we very often set \code{drop=FALSE}. When we forget, bugs can arise in edge cases where single columns are selected and all of a sudden a vector is returned rather than a single column \code{data.frame}. In \code{[.data.table} we took the opportunity to make it consistent and drop \code{drop}.
-\newline\newline
-When a \code{data.table} is passed to a \code{data.table}-unaware package, that package is not concerned with any of these differences; it just works.
-
-\subsection{I'm using \code{j} for its side effect only, but I'm still getting data returned. How do I stop that?}
-In this case \code{j} can be wrapped with \code{invisible()}; e.g., \code{DT[,invisible(hist(colB)),by=colA]}\footnote{\code{hist()} returns the breakpoints in addition to plotting to the graphics device}.
-
-\subsection{Why does \code{[.data.table} now have a \code{drop} argument from v1.5?}
-So that \code{data.table} can inherit from \code{data.frame} without using \code{\dots}. If we used \code{\dots} then invalid argument names would not be caught.
-
-The \code{drop} argument is never used by \code{[.data.table}. It is a placeholder for non \code{data.table} aware packages when they use the \code{[.data.frame} syntax directly on a \code{data.table}.
-
-\subsection{Rolling joins are cool and very fast! Was that hard to program?}
-The prevailing row on or before the \code{i} row is the final row the binary search tests anyway. So \code{roll=TRUE} is essentially just a switch in the binary search C code to return that row.
-
-\subsection{Why does \code{DT[i,col:=value]} return the whole of \code{DT}? I expected either no visible value (consistent with \code{<-}), or a message or return value containing how many rows were updated. It isn't obvious that the data has indeed been updated by reference.}
-This has changed in v1.8.3 to meet your expectations. Please upgrade.
-The whole of \code{DT} is returned (now invisibly) so that compound syntax can work; e.g., \code{DT[i,done:=TRUE][,sum(done)]}. The number of rows updated is returned when verbosity is on, either on a per query basis or globally using \code{options(datatable.verbose=TRUE)}.
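-A short sketch of that compound form (hypothetical column names, for illustration):
-<<>>=
-DT = data.table(a=1:3)
-DT[2, done:=TRUE][, sum(done, na.rm=TRUE)]   # update by reference, then aggregate, in one query
-@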
-
-\subsection{Ok, thanks. What was so difficult about the result of \code{DT[i,col:=value]} being returned invisibly?}
-\proglang{R} internally forces visibility on for \code{[}. The value of FunTab's eval column (see src/main/names.c) for \code{[} is 0 meaning force \code{R\_Visible} on (see R-Internals section 1.6). Therefore, when we tried \code{invisible()} or setting \code{R\_Visible} to 0 directly ourselves, \code{eval} in src/main/eval.c would force it on again.
-
-To solve this problem, the key was to stop trying to stop the print method running after a \code{:=}. Instead, inside \code{:=} we now (from v1.8.3) set a global flag which the print method uses to know whether to actually print or not.
-
-\subsection{I've noticed that \code{base::cbind.data.frame} (and \code{base::rbind.data.frame}) appear to be changed by \code{data.table}. How is this possible? Why?}
-It is a temporary, last resort solution until we discover a better way to solve the problems listed below. Essentially, the issue is that \code{data.table} inherits from \code{data.frame}, \emph{and}, \code{base::cbind} and \code{base::rbind} (uniquely) do their own S3 dispatch internally as documented by \code{?cbind}. The change is adding one \code{for} loop to the start of each function directly in base; e.g.,
-<<>>=
-base::cbind.data.frame
-@
-That modification is made dynamically; i.e., the base definition of \code{cbind.data.frame} is fetched, the \code{for} loop added to the beginning and then assigned back to base. This solution is intended to be robust to different definitions of \code{base::cbind.data.frame} in different versions of \proglang{R}, including unknown future changes. Again, it is a last resort until a better solution is known or made available. The competing requirements are :
-\begin{itemize}
-\item \code{cbind(DT,DF)} needs to work. Defining \code{cbind.data.table} doesn't work because \code{base::cbind} does its own S3 dispatch and requires that the \emph{first} \code{cbind} method for each object it is passed is \emph{identical}. This is not true in \code{cbind(DT,DF)} because the first method for \code{DT} is \code{cbind.data.table} but the first method for \code{DF} is \code{cbind.data.frame}. \code{base::cbind} then falls through to its internal \code{bind} code which ap [...]
-\item This naturally leads to trying to mask \code{cbind.data.frame} instead. Since a \code{data.table} is a \code{data.frame}, \code{cbind} would find the same method for both \code{DT} and \code{DF}. However, this doesn't work either because \code{base::cbind} appears to find methods in \code{base} first; i.e., \code{base::cbind.data.frame} isn't maskable. This is reproducible as follows :
-\end{itemize}
-<<>>=
-foo = data.frame(a=1:3)
-cbind.data.frame = function(...)cat("Not printed\n")
-cbind(foo)
-@
-<<echo=FALSE>>=
-rm("cbind.data.frame")
-@
-\begin{itemize}
-\item Finally, we tried masking \code{cbind} itself (v1.6.5 and v1.6.6). This allowed \code{cbind(DT,DF)} to work, but introduced compatibility issues with package IRanges, since IRanges also masks \code{cbind}. It worked if IRanges was lower on the search() path than \code{data.table}, but if IRanges was higher then \code{data.table}'s \code{cbind} would never be called and the strange looking matrix output occurs again (FAQ \ref{faq:cbinderror}).
-\end{itemize}
-If you know of a better solution, that still solves all the issues above, then please let us know and we'll gladly change it.
-
-\subsection{I've read about method dispatch (e.g. \code{merge} may or may not dispatch to \code{merge.data.table}) but \emph{how} does R know how to dispatch? Are dots significant or special? How on earth does R know which function to dispatch and when?}
-This comes up quite a lot, but it's really earth-shatteringly simple. A function such as \code{merge} is \emph{generic} if it consists of a call to \code{UseMethod}. When people discuss whether or not a function is \emph{generic}, they merely type the function name, without \code{()} afterwards, look at the program code inside it, and check whether it calls \code{UseMethod}; if it does, it is \emph{generic}. What does \code{UseMethod} do? It literally slaps the function [...]
-
-You might now ask: where is this documented in R? Answer: it's quite clear, but you need to first know to look in \code{?UseMethod}, and \emph{that} help file contains :
-
- "When a function calling UseMethod('fun') is applied to an object with class attribute c('first', 'second'), the system searches for a function called fun.first and, if it finds it, applies it to the object. If no such function is found a function called fun.second is tried. If no class name produces a suitable function, the function fun.default is used, if it exists, or an error results."
-
-Happily, an internet search for "How does R method dispatch work" (at the time of writing) returns the \code{?UseMethod} help page as the top link. Admittedly, other links rapidly descend into the intricacies of S3 vs S4, internal generics and so on.
-
-However, features like basic S3 dispatch (pasting the function name together with the class name) are why some R folk love R. It's so simple. No complicated registration or signature is required. There isn't much to learn. To create the \code{merge} method for \code{data.table}, all that was required was to create a function called \code{merge.data.table}.
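-The mechanism can be seen with a tiny example of our own (the class and method names here are made up):
-<<>>=
-summary.myclass = function(object, ...) cat("dispatched to summary.myclass\n")
-obj = structure(list(), class="myclass")
-summary(obj)   # summary() calls UseMethod("summary"), which finds summary.myclass
-@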
-
-\section{Questions relating to compute time}
-
-\subsection{I have 20 columns and a large number of rows. Why is an expression of one column so quick?}
-Several reasons:
-\begin{itemize}
-\item Only that column is grouped, the other 19 are ignored because \code{data.table} inspects the \code{j} expression and realises it doesn't use the other columns.
-\item One memory allocation is made for the largest group only, then that memory is re-used for the other groups. There is very little garbage to collect.
-\item \proglang{R} is an in-memory column store; i.e., the columns are contiguous in RAM. Page fetches from RAM into L2 cache are minimised.
-\end{itemize}
-
-\subsection{I don't have a key on a large table, but grouping is still really quick. Why is that?}
-\code{data.table} uses radix sorting. This is significantly faster than other sort algorithms. See our presentations on our homepage for more information.
-
-This is also one reason why \code{setkey()} is quick.
-
-When no key is set, or we group in a different order from that of the key, we call it an \emph{ad hoc by}.
-
-\subsection{Why is grouping by columns in the key faster than an ad hoc by?}
-Because each group is contiguous in RAM, thereby minimising page fetches and memory can be
-copied in bulk (memcpy in C) rather than looping in C.
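-An illustrative (and machine-dependent) timing sketch, using made-up data:
-<<>>=
-DT = data.table(g=sample(1e3, 1e6, TRUE), v=runif(1e6))
-system.time(DT[, sum(v), by=g])   # ad hoc by
-setkey(DT, g)
-system.time(DT[, sum(v), by=g])   # grouping by columns of the key
-@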
-
-\section{Error messages}
-\subsection{\code{Could not find function "DT"}}
-See FAQ \ref{faq:DTremove1} and FAQ \ref{faq:DTremove2}.
-
-\subsection{\code{unused argument(s) (MySum = sum(v))}}
-This error is generated by \code{DT[,MySum=sum(v)]}. The intention was probably \code{DT[,.(MySum=sum(v))]}, or \code{DT[,j=.(MySum=sum(v))]}.
-
-\subsection{\code{'translateCharUTF8' must be called on a CHARSXP}}
-This error (and similar; e.g., \code{'getCharCE' must be called on a CHARSXP}) may have nothing to do with character data or locale. Instead, this can be a symptom of an earlier memory corruption. To date these have been reproducible and fixed (quickly). Please report it to datatable-help.
-
-\subsection{\code{cbind(DT,DF) returns a strange format e.g. 'Integer,5'}} \label{faq:cbinderror}
-This occurs prior to v1.6.5, for \code{rbind(DT,DF)} too. Please upgrade to v1.6.7 or later.
-
-\subsection{\code{cannot change value of locked binding for '.SD'}}
-\code{.SD} is locked by design. See \code{?data.table}. If you'd like to manipulate \code{.SD} before using it, or returning it, and don't wish to modify \code{DT} using \code{:=}, then take a copy first (see \code{?copy}); e.g.,
-<<>>=
-DT = data.table(a=rep(1:3,1:3),b=1:6,c=7:12)
-DT
-DT[,{ mySD = copy(.SD)
-      mySD[1,b:=99L]
-      mySD },
-    by=a]
-@
-
-\subsection{\code{cannot change value of locked binding for '.N'}}
-Please upgrade to v1.8.1 or later. From this version, if \code{.N} is returned by \code{j} it is renamed to \code{N} to avoid any ambiguity in any subsequent grouping between the \code{.N} special variable and a column called \code{".N"}.
-The old behaviour can be reproduced by forcing \code{.N} to be called \code{.N}, like this :
-<<>>=
-DT = data.table(a=c(1,1,2,2,2),b=c(1,2,2,2,1))
-DT
-DT[,list(.N=.N),list(a,b)]   # show intermediate result for exposition
-cat(try(
-    DT[,list(.N=.N),by=list(a,b)][,unique(.N),by=a]   # compound query more typical
-,silent=TRUE))
-@
-If you are already running v1.8.1 or later then the error message is now more helpful than the \code{cannot change value of locked binding} error, as can be seen above, since this vignette was produced using v1.8.1 or later.
-The more natural syntax now works :
-<<>>=
-if (packageVersion("data.table") >= "1.8.1") {
-    DT[,.N,by=list(a,b)][,unique(N),by=a]
-}
-if (packageVersion("data.table") >= "1.9.3") {
-    DT[,.N,by=.(a,b)][,unique(N),by=a]   # same
-}
-@
-
-\section{Warning messages}
-\subsection{\code{The following object(s) are masked from 'package:base': cbind, rbind}}
-This warning was present in v1.6.5 and v1.6.6 only, when loading the package. The motivation was to allow \code{cbind(DT,DF)} to work but, as it transpired, it broke (full) compatibility with package IRanges. Please upgrade to v1.6.7 or later.
-
-\subsection{\code{Coerced numeric RHS to integer to match the column's type}}
-Hopefully, this is self explanatory. The full message is :\newline
-
-\code{Coerced numeric RHS to integer to match the column's type; may have truncated}\newline
-\code{precision. Either change the column to numeric first by creating a new numeric}\newline
-\code{vector length 5 (nrows of entire table) yourself and assigning that (i.e. }\newline
-\code{'replace' column), or coerce RHS to integer yourself (e.g. 1L or as.integer)}\newline
-\code{to make your intent clear (and for speed). Or, set the column type correctly}\newline
-\code{up front when you create the table and stick to it, please.}\newline
-
-To generate it, try :
-<<>>=
-DT = data.table(a=1:5,b=1:5)
-suppressWarnings(
-DT[2,b:=6]        # works (slower) with warning
-)
-class(6)          # numeric not integer
-DT[2,b:=7L]       # works (faster) without warning
-class(7L)         # L makes it an integer
-DT[,b:=rnorm(5)]  # 'replace' integer column with a numeric column
-@
-
-\section{General questions about the package}
-
-\subsection{v1.3 appears to be missing from the CRAN archive?}
-That is correct. v1.3 was available on R-Forge only. There were several large
-changes internally and these took some time to test in development.
-
-\subsection{Is \code{data.table} compatible with S-plus?}
-Not currently.
-\begin{itemize}
-\item A few core parts of the package are written in C and use internal \proglang{R} functions and \proglang{R} structures.
-\item The package uses lexical scoping which is one of the differences between \proglang{R} and \proglang{S-plus} explained by
-\href{http://cran.r-project.org/doc/FAQ/R-FAQ.html#Lexical-scoping}{R FAQ 3.3.1}.
-\end{itemize}
-
-\subsection{Is it available for Linux, Mac and Windows?}
-Yes, for both 32-bit and 64-bit on all platforms. Thanks to CRAN. There are no special or OS-specific libraries used.
-
-\subsection{I think it's great. What can I do?}
-Please file suggestions, bug reports and enhancement requests on \href{https://github.com/Rdatatable/data.table/issues}{GitHub}. 
-This helps make the package better.
-
-Please do vote for the package on \href{http://crantastic.org/packages/data-table}{Crantastic}. This helps encourage the developers and helps other \proglang{R} users find the package. If you have time to write a comment too, that can help others in the community. Simply clicking that you use the package, though, is much appreciated.
-
-You can submit pull requests to change the code and/or documentation yourself.
-
-\subsection{I think it's not great. How do I warn others about my experience?}
-Please put your vote and comments on \href{http://crantastic.org/packages/data-table}{Crantastic}. Please make it constructive so we have a chance to improve.
-
-\subsection{I have a question. I know the r-help posting guide tells me to contact the maintainer (not r-help), but is there a larger group of people I can ask?}
-Yes, there are two options. You can post to \href{mailto:datatable-help at lists.r-forge.r-project.org}{datatable-help}. It's like r-help, but just for this package. Or the \href{http://stackoverflow.com/questions/tagged/data.table}{\code{data.table} tag on Stack Overflow}. Feel free to answer questions in those places, too.
-
-\subsection{Where are the datatable-help archives?}
-The \href{https://github.com/Rdatatable/data.table/wiki}{homepage} contains links to the archives in several formats.
-
-\subsection{I'd prefer not to contact datatable-help, can I mail just one or two people privately?}
-Sure. You're more likely to get a faster answer from datatable-help or Stack Overflow, though. Asking publicly in those places helps build the knowledge base.
-
-\subsection{I have created a package that depends on \code{data.table}. How do I ensure my package is \code{data.table}-aware so that inheritance from \code{data.frame} works?}
-Either i) include \code{data.table} in the \code{Depends:} field of your DESCRIPTION file, or ii) include \code{data.table} in the \code{Imports:} field of your DESCRIPTION file AND \code{import(data.table)} in your NAMESPACE file.
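-For option ii), the two files would contain lines like these (a sketch; the remainder of each file is omitted):
-\begin{verbatim}
-# in DESCRIPTION
-Imports: data.table
-
-# in NAMESPACE
-import(data.table)
-\end{verbatim}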
-
-\subsection{Why is this FAQ in pdf format? Can it be moved to HTML?}
-Yes, we'd like to move it to an HTML vignette; we just haven't got to that yet.
-The benefits of vignettes (rather than a wiki) include the following:
-\begin{itemize}
-\item We include \proglang{R} code in the vignettes. This code is \emph{actually run} when the file is created, not copy and pasted.
-\item This document is \emph{reproducible}. Grab the .Rnw and you can run it yourself.
-\item CRAN checks the package (including running vignettes) every night on Linux, Mac and Windows, both 32-bit and 64-bit. Results are posted to \url{http://cran.r-project.org/web/checks/check_results_data.table.html}. Included there are results from r-devel; i.e., not-yet-released R. That serves as a very useful early warning system for any potential future issues as \proglang{R} itself develops.
-\item This file is bound into each version of the package. The package is not accepted on CRAN unless this file passes checks. Each version of the package will have its own FAQ file which will be relevant for that version. Contrast this to a single website, which can be ambiguous if the answer depends on the version.
-\item You can open it offline at your \proglang{R} prompt using \code{vignette()}.
-\item You can extract the code from the document and play with it using\newline \code{edit(vignette("datatable-faq"))}.
-\end{itemize}
-
-\end{document}
-
-
diff --git a/inst/doc/datatable-intro-vignette.Rmd b/vignettes/datatable-intro.Rmd
similarity index 88%
rename from inst/doc/datatable-intro-vignette.Rmd
rename to vignettes/datatable-intro.Rmd
index 7e2335f..d942281 100644
--- a/inst/doc/datatable-intro-vignette.Rmd
+++ b/vignettes/datatable-intro.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Introduction to data.table"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Introduction to data.table}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,11 +13,10 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-# options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
 This vignette introduces the *data.table* syntax, its general form, how to *subset* rows, *select and compute* on columns and perform aggregations *by group*. Familiarity with *data.frame* data structure from base R is useful, but not essential to follow this vignette.
 
@@ -30,11 +26,11 @@ This vignette introduces the *data.table* syntax, its general form, how to *subs
 
 Data manipulation operations such as *subset*, *group*, *update*, *join* etc., are all inherently related. Keeping these *related operations together* allows for:
 
-* *concise* and *consistent* syntax irrespective of the set of operations you would like to perform to achieve your end goal. 
+* *concise* and *consistent* syntax irrespective of the set of operations you would like to perform to achieve your end goal.
 
-* performing analysis *fluidly* without the cognitive burden of having to map each operation to a particular function from a set of functions available before to perform the analysis. 
+* performing analysis *fluidly* without the cognitive burden of having to map each operation to a particular function from a set of functions available before to perform the analysis.
 
-* *automatically* optimising operations internally, and very effectively, by knowing precisely the data required for each operation and therefore very fast and memory efficient. 
+* *automatically* optimising operations internally, and very effectively, by knowing precisely the data required for each operation and therefore very fast and memory efficient.
 
 Briefly, if you are interested in reducing *programming* and *compute* time tremendously, then this package is for you. The philosophy that *data.table* adheres to makes this possible. Our goal is to illustrate it through this series of vignettes.
 
@@ -45,8 +41,8 @@ In this vignette, we will use [NYC-flights14](https://github.com/arunsrinivasan/
 
 We can use *data.table's* fast file reader `fread` to load *flights* directly as follows:
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ```{r}
@@ -63,7 +59,7 @@ In this vignette, we will
 
 1. start with basics - what is a *data.table*, its general form, how to *subset* rows, *select and compute* on columns
 
-2. and then we will look at performing data aggregations by group, 
+2. and then we will look at performing data aggregations by group,
 
 ## 1. Basics {#basics-1}
 
@@ -72,7 +68,7 @@ In this vignette, we will
 *data.table* is an R package that provides **an enhanced version** of *data.frames*. In the [Data](#data) section, we already created a *data.table* using `fread()`. We can also create one using the `data.table()` function. Here is an example:
 
 ```{r}
-DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18)
+DT = data.table(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
 DT
 class(DT$ID)
 ```
@@ -86,7 +82,7 @@ You can also convert existing objects to a *data.table* using `as.data.table()`.
 * Row numbers are printed with a `:` in order to visually separate the row number from the first column.
 
 * When the number of rows to print exceeds the global option `datatable.print.nrows` (default = `r getOption("datatable.print.nrows")`), it automatically prints only the top 5 and bottom 5 rows (as can be seen in the [Data](#data) section).
-    
+
     ```{.r}
     getOption("datatable.print.nrows")
     ```
@@ -110,7 +106,7 @@ Users who have a SQL background might perhaps immediately relate to this syntax.
 
 Take `DT`, subset rows using `i`, then calculate `j`, grouped by `by`.
 
-# 
+#
 
 Let's begin by looking at `i` and `j` first - subsetting rows and operating on columns.
 
@@ -156,7 +152,7 @@ head(ans)
 * In addition, `order(...)` within the frame of a *data.table* uses *data.table*'s internal fast radix order `forder()`, which is much faster than `base::order`. Here's a small example to highlight the difference.
 
     ```{r}
-    odt = data.table(col=sample(1e7))
+    odt = data.table(col = sample(1e7))
     (t1 <- system.time(ans1 <- odt[base::order(col)]))  ## uses order from base R
     (t2 <- system.time(ans2 <- odt[order(col)]))        ## uses data.table's forder
     (identical(ans1, ans2))
@@ -203,7 +199,7 @@ head(ans)
 
 #
 
-*data.tables* (and *data.frames*) are internally *lists*  as well, but with all its columns of equal length and with a *class* attribute. Allowing `j` to return a *list* enables converting and returning a *data.table* very efficiently. 
+*data.tables* (and *data.frames*) are internally *lists*  as well, but with all its columns of equal length and with a *class* attribute. Allowing `j` to return a *list* enables converting and returning a *data.table* very efficiently.
 
 #### Tip: {.bs-callout .bs-callout-warning #tip-1}
 
@@ -223,7 +219,7 @@ head(ans)
 
 * Wrap both columns within `.()`, or `list()`. That's it.
 
-# 
+#
 
 #### -- Select both `arr_delay` and `dep_delay` columns *and* rename them to `delay_arr` and `delay_dep`.
 
@@ -247,15 +243,15 @@ ans
 
 #### What's happening here? {.bs-callout .bs-callout-info}
 
-* *data.table*'s `j` can handle more than just *selecting columns* - it can handle *expressions*, i.e., *compute on columns*. This shouldn't be surprising, as columns can be referred to as if they are variables. Then we should be able to *compute* by calling functions on those variables. And that's what precisely happens here. 
+* *data.table*'s `j` can handle more than just *selecting columns* - it can handle *expressions*, i.e., *compute on columns*. This shouldn't be surprising, as columns can be referred to as if they are variables. Then we should be able to *compute* by calling functions on those variables. And that's what precisely happens here.
 
 ### f) Subset in `i` *and* do in `j`
 
 #### -- Calculate the average arrival and departure delay for all flights with "JFK" as the origin airport in the month of June.
 
 ```{r}
-ans <- flights[origin == "JFK" & month == 6L, 
-               .(m_arr=mean(arr_delay), m_dep=mean(dep_delay))]
+ans <- flights[origin == "JFK" & month == 6L,
+               .(m_arr = mean(arr_delay), m_dep = mean(dep_delay))]
 ans
 ```
 
@@ -265,7 +261,7 @@ ans
 
 * Now, we look at `j` and find that it uses only *two columns*. And what we have to do is to compute their `mean()`. Therefore we subset just those columns corresponding to the matching rows, and compute their `mean()`.
 
-Because the three main components of the query (`i`, `j` and `by`) are *together* inside `[...]`, *data.table* can see all three and optimise the query altogether *before evaluation*, not each separately. We are able to therefore avoid the entire subset, for both speed and memory efficiency. 
+Because the three main components of the query (`i`, `j` and `by`) are *together* inside `[...]`, *data.table* can see all three and optimise the query altogether *before evaluation*, not each separately. We are able to therefore avoid the entire subset, for both speed and memory efficiency.
 
 #### -- How many trips have been made in 2014 from "JFK" airport in the month of June?
 
@@ -274,7 +270,7 @@ ans <- flights[origin == "JFK" & month == 6L, length(dest)]
 ans
 ```
 
-The function `length()` requires an input argument. We just needed to compute the number of rows in the subset. We could have used any other column as input argument to `length()` really. 
+The function `length()` requires an input argument. We just needed to compute the number of rows in the subset. We could just as well have used any other column as the input argument to `length()`.
 
 This type of operation occurs so frequently, especially while grouping as we will see in the next section, that *data.table* provides a *special symbol* `.N` for it.
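
As a minimal, self-contained sketch of `.N` (assuming the *data.table* package is installed; a toy table stands in for `flights` here):

```r
library(data.table)

# toy stand-in for the flights table
dt <- data.table(origin = c("JFK", "JFK", "LGA"),
                 month  = c(6L, 6L, 6L))

# .N counts the rows of the subset without needing an arbitrary column
dt[origin == "JFK" & month == 6L, .N]
# equivalent, but length() needs some column as input
dt[origin == "JFK" & month == 6L, length(month)]
```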
 
@@ -294,7 +290,7 @@ ans
 
 * Once again, we subset in `i` to get the *row indices* where `origin` airport equals *"JFK"*, and `month` equals *6*.
 
-* We see that `j` uses only `.N` and no other columns. Therefore the entire subset is not materialised. We simply return the number of rows in the subset (which is just the length of row indices). 
+* We see that `j` uses only `.N` and no other columns. Therefore the entire subset is not materialised. We simply return the number of rows in the subset (which is just the length of row indices).
 
 * Note that we did not wrap `.N` with `list()` or `.()`. Therefore a vector is returned.
 
@@ -302,12 +298,12 @@ We could have accomplished the same operation by doing `nrow(flights[origin == "
 
 ### g) Great! But how can I refer to columns by names in `j` (like in a *data.frame*)?
 
-You can refer to column names the *data.frame* way using `with = FALSE`. 
+You can refer to column names the *data.frame* way using `with = FALSE`.
 
 #### -- Select both `arr_delay` and `dep_delay` columns the *data.frame* way.
 
 ```{r}
-ans <- flights[, c("arr_delay", "dep_delay"), with=FALSE]
+ans <- flights[, c("arr_delay", "dep_delay"), with = FALSE]
 head(ans)
 ```
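
As a self-contained sketch of the two selection styles side by side (assuming the *data.table* package is installed; the toy column names are made up to mirror `flights`):

```r
library(data.table)

dt <- data.table(arr_delay = c(5, -2), dep_delay = c(1, 0), carrier = c("AA", "DL"))

a1 <- dt[, .(arr_delay, dep_delay)]                   # data.table way: unquoted names in .()
a2 <- dt[, c("arr_delay", "dep_delay"), with = FALSE] # data.frame way: character vector

# both return the same two-column data.table
a1
a2
```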
 
@@ -325,38 +321,38 @@ DF[with(DF, x > 1), ]
 
 #### {.bs-callout .bs-callout-info #with_false}
 
-* Using `with()` in (2) allows using `DF`'s column `x` as if it were a variable. 
+* Using `with()` in (2) allows using `DF`'s column `x` as if it were a variable.
 
-    Hence the argument name `with` in *data.table*. Setting `with=FALSE` disables the ability to refer to columns as if they are variables, thereby restoring the "*data.frame* mode".
+    Hence the argument name `with` in *data.table*. Setting `with = FALSE` disables the ability to refer to columns as if they are variables, thereby restoring the "*data.frame* mode".
 
 * We can also *deselect* columns using `-` or `!`. For example:
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     ## not run
 
     # returns all columns except arr_delay and dep_delay
-    ans <- flights[, !c("arr_delay", "dep_delay"), with=FALSE]
+    ans <- flights[, !c("arr_delay", "dep_delay"), with = FALSE]
     # or
-    ans <- flights[, -c("arr_delay", "dep_delay"), with=FALSE]
+    ans <- flights[, -c("arr_delay", "dep_delay"), with = FALSE]
     ```
 
 * From `v1.9.5+`, we can also select by specifying start and end column names, e.g., `year:day` to select the first three columns.
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     ## not run
 
     # returns year, month and day
-    ans <- flights[, year:day, with=FALSE]
+    ans <- flights[, year:day, with = FALSE]
     # returns day, month and year
-    ans <- flights[, day:year, with=FALSE]
+    ans <- flights[, day:year, with = FALSE]
     # returns all columns except year, month and day
-    ans <- flights[, -(year:day), with=FALSE]
-    ans <- flights[, !(year:day), with=FALSE]
+    ans <- flights[, -(year:day), with = FALSE]
+    ans <- flights[, !(year:day), with = FALSE]
     ```
 
     This is particularly handy while working interactively.
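
A runnable toy version of these "not run" idioms (assumes the *data.table* package is installed; made-up column names mirror `flights`):

```r
library(data.table)

dt <- data.table(year = 2014L, month = 1L, day = 2L,
                 arr_delay = 5, dep_delay = 1)

names(dt[, !c("arr_delay", "dep_delay"), with = FALSE]) # drop two columns
names(dt[, year:day, with = FALSE])                     # select a consecutive range
names(dt[, -(year:day), with = FALSE])                  # everything except that range
```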
 
-# 
+#
 
 `with = TRUE` is default in *data.table* because we can do much more by allowing `j` to handle expressions - especially when combined with `by` as we'll see in a moment.
 
@@ -369,11 +365,11 @@ We've already seen `i` and `j` from *data.table*'s general form in the previous
 #### -- How can we get the number of trips corresponding to each origin airport?
 
 ```{r}
-ans <- flights[, .(.N), by=.(origin)]
+ans <- flights[, .(.N), by = .(origin)]
 ans
 
 ## or equivalently using a character vector in 'by'
-# ans <- flights[, .(.N), by="origin"]
+# ans <- flights[, .(.N), by = "origin"]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -389,20 +385,20 @@ ans
 * When there's only one column or expression to refer to in `j` and `by`, we can drop the `.()` notation. This is purely for convenience. We could instead do:
 
     ```{r}
-    ans <- flights[, .N, by=origin]
+    ans <- flights[, .N, by = origin]
     ans
     ```
 
     We'll use this convenient form wherever applicable hereafter.
 
-# 
+#
 
 #### -- How can we calculate the number of trips for each origin airport for carrier code *"AA"*? {#origin-.N}
 
 The unique carrier code *"AA"* corresponds to *American Airlines Inc.*
 
 ```{r}
-ans <- flights[carrier == "AA", .N, by=origin]
+ans <- flights[carrier == "AA", .N, by = origin]
 ans
 ```
 
@@ -415,11 +411,11 @@ ans
 #### -- How can we get the total number of trips for each `origin, dest` pair for carrier code *"AA"*? {#origin-dest-.N}
 
 ```{r}
-ans <- flights[carrier == "AA", .N, by=.(origin,dest)]
+ans <- flights[carrier == "AA", .N, by = .(origin,dest)]
 head(ans)
 
 ## or equivalently using a character vector in 'by'
-# ans <- flights[carrier == "AA", .N, by=c("origin", "dest")]
+# ans <- flights[carrier == "AA", .N, by = c("origin", "dest")]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -429,9 +425,9 @@ head(ans)
 #### -- How can we get the average arrival and departure delay for each `orig,dest` pair for each month for carrier code *"AA"*? {#origin-dest-month}
 
 ```{r}
-ans <- flights[carrier == "AA", 
-        .(mean(arr_delay), mean(dep_delay)), 
-        by=.(origin, dest, month)]
+ans <- flights[carrier == "AA",
+        .(mean(arr_delay), mean(dep_delay)),
+        by = .(origin, dest, month)]
 ans
 ```
 
@@ -447,14 +443,14 @@ Now what if we would like to order the result by those grouping columns `origin`
 
 ### b) keyby
 
-*data.table* retaining the original order of groups is intentional and by design. There are cases when preserving the original order is essential. But at times we would like to automatically sort by the variables we grouped by. 
+*data.table* retaining the original order of groups is intentional and by design. There are cases when preserving the original order is essential. But at times we would like to automatically sort by the variables we grouped by.
 
 #### -- So how can we directly order by all the grouping variables?
 
 ```{r}
-ans <- flights[carrier == "AA", 
-        .(mean(arr_delay), mean(dep_delay)), 
-        keyby=.(origin, dest, month)]
+ans <- flights[carrier == "AA",
+        .(mean(arr_delay), mean(dep_delay)),
+        keyby = .(origin, dest, month)]
 ans
 ```
 
@@ -462,7 +458,7 @@ ans
 
 * All we did was to change `by` to `keyby`. This automatically orders the result by the grouping variables in increasing order. Note that `keyby` is applied after performing the operation, i.e., on the computed result.
 
-**Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an *attribute* called `sorted`. But we'll learn more about `keys` in the next vignette. 
+**Keys:** Actually `keyby` does a little more than *just ordering*. It also *sets a key* after ordering by setting an *attribute* called `sorted`. But we'll learn more about `keys` in the next vignette.
 
 For now, all you have to know is that you can use `keyby` to automatically order the result by the columns specified in `by`.
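
A toy sketch of `by` versus `keyby` (assumes the *data.table* package is installed; columns `g` and `v` are made up):

```r
library(data.table)

dt <- data.table(g = c("b", "a", "b", "a"), v = 1:4)

dt[, sum(v), by = g]            # groups in order of first appearance: b, then a
ans <- dt[, sum(v), keyby = g]  # groups sorted, and a key is set on the result
ans
key(ans)
```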
 
@@ -476,7 +472,7 @@ ans <- flights[carrier == "AA", .N, by = .(origin, dest)]
 
 #### -- How can we order `ans` using the columns `origin` in ascending order, and `dest` in descending order?
 
-We can store the intermediate result in a variable, and then use `order(origin, -dest)` on that variable. It seems fairly straightforward. 
+We can store the intermediate result in a variable, and then use `order(origin, -dest)` on that variable. It seems fairly straightforward.
 
 ```{r}
 ans <- ans[order(origin, -dest)]
@@ -491,10 +487,10 @@ head(ans)
 
 #
 
-But this requires having to assign the intermediate result and then overwriting that result. We can do one better and avoid this intermediate assignment on to a variable altogther by `chaining` expressions.
+But this requires having to assign the intermediate result and then overwriting that result. We can do one better and avoid this intermediate assignment to a variable altogether by `chaining` expressions.
 
 ```{r}
-ans <- flights[carrier == "AA", .N, by=.(origin, dest)][order(origin, -dest)]
+ans <- flights[carrier == "AA", .N, by = .(origin, dest)][order(origin, -dest)]
 head(ans, 10)
 ```
 
@@ -504,10 +500,10 @@ head(ans, 10)
 
 * Or you can also chain them vertically:
 
-    ```{r eval=FALSE}
-    DT[ ... 
-     ][ ... 
-     ][ ... 
+    ```{r eval = FALSE}
+    DT[ ...
+     ][ ...
+     ][ ...
      ]
     ```
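
A runnable toy version of chaining (assumes the *data.table* package is installed; made-up `origin`/`dest` values):

```r
library(data.table)

dt <- data.table(origin = c("JFK", "JFK", "LGA", "LGA"),
                 dest   = c("LAX", "SFO", "LAX", "ORD"))

# count per pair, then order the result, in one chained expression
ans <- dt[, .N, by = .(origin, dest)][order(origin, -dest)]
ans
```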
 
@@ -528,7 +524,7 @@ ans
 
 * Note that we did not provide any names to the `by` expression; names have been automatically assigned in the result.
 
-* You can provide other columns along with expressions, for example: `DT[, .N, by=.(a, b>0)]`. 
+* You can provide other columns along with expressions, for example: `DT[, .N, by = .(a, b>0)]`.
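
For example, a runnable sketch of grouping by an expression (assumes the *data.table* package is installed; `a` and `b` are made-up columns):

```r
library(data.table)

dt <- data.table(a = c(1, 1, 2), b = c(-1, 2, 3))

# the logical expression b > 0 becomes a grouping column
dt[, .N, by = .(a, b > 0)]
```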
 
 ### e) Multiple columns in `j` - `.SD`
 
@@ -536,27 +532,27 @@ ans
 
 It is of course not practical to have to type `mean(myCol)` for every column one by one. What if you had 100 columns to compute `mean()` of?
 
-How can we do this efficiently? To get there, refresh on [this tip](#tip-1) - *"As long as j-expression returns a list, each element of the list will be converted to a column in the resulting data.table"*. Suppose we can refer to the *data subset for each group* as a variable *while grouping*, then we can loop through all the columns of that variable using the already familiar base function `lapply()`. We don't have to learn any new function. 
+How can we do this efficiently? To get there, recall [this tip](#tip-1) - *"As long as the j-expression returns a list, each element of the list will be converted to a column in the resulting data.table"*. If we can refer to the *data subset for each group* as a variable *while grouping*, then we can loop through all the columns of that variable using the already familiar base function `lapply()`. We don't have to learn any new function.
 
 #### Special symbol `.SD`: {.bs-callout .bs-callout-info #special-SD}
 
-*data.table* provides a *special* symbol, called `.SD`. It stands for **S**ubset of **D**ata. It by itself is a *data.table* that holds the data for *the current group* defined using `by`. 
+*data.table* provides a *special* symbol, called `.SD`. It stands for **S**ubset of **D**ata. It is itself a *data.table* that holds the data for *the current group* defined using `by`.
 
 Recall that a *data.table* is internally a list as well with all its columns of equal length.
 
-# 
+#
 
 Let's use the [*data.table* `DT` from before](#what-is-datatable-1a) to get a glimpse of what `.SD` looks like.
 
 ```{r}
 DT
 
-DT[, print(.SD), by=ID]
+DT[, print(.SD), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* `.SD` contains all the columns *except the grouping columns* by default. 
+* `.SD` contains all the columns *except the grouping columns* by default.
 
 * It is also generated by preserving the original order - data corresponding to `ID = "b"`, then `ID = "a"`, and then `ID = "c"`.
 
@@ -565,37 +561,37 @@ DT[, print(.SD), by=ID]
 To compute on (multiple) columns, we can then simply use the base R function `lapply()`.
 
 ```{r}
-DT[, lapply(.SD, mean), by=ID]
+DT[, lapply(.SD, mean), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* `.SD` holds the rows corresponding to columns *a*, *b* and *c* for that group. We compute the `mean()` on each of these columns using the already familiar base function `lapply()`. 
+* `.SD` holds the rows corresponding to columns *a*, *b* and *c* for that group. We compute the `mean()` on each of these columns using the already familiar base function `lapply()`.
 
 * Each group returns a list of three elements containing the mean values, which will become the columns of the resulting `data.table`.
 
 * Since `lapply()` returns a *list*, there is no need to wrap it with an additional `.()` (if necessary, refer to [this tip](#tip-1)).
 
-# 
+#
 
-We are almost there. There is one little thing left to address. In our `flights` *data.table*, we only wanted to calculate the `mean()` of two columns `arr_delay` and `dep_delay`. But `.SD` would contain all the columns other than the grouping variables by default. 
+We are almost there. There is one little thing left to address. In our `flights` *data.table*, we only wanted to calculate the `mean()` of two columns `arr_delay` and `dep_delay`. But `.SD` would contain all the columns other than the grouping variables by default.
 
 #### -- How can we specify just the columns we would like to compute the `mean()` on?
 
 #### .SDcols {.bs-callout .bs-callout-info}
 
-Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group. 
+Using the argument `.SDcols`. It accepts either column names or column indices. For example, `.SDcols = c("arr_delay", "dep_delay")` ensures that `.SD` contains only these two columns for each group.
 
 Similar to the [with = FALSE section](#with_false), you can also provide the columns to remove instead of columns to keep using the `-` or `!` sign, as well as select consecutive columns as `colA:colB` and deselect consecutive columns as `!(colA:colB)` or `-(colA:colB)`.
 
-# 
+#
 Now let us try to use `.SD` along with `.SDcols` to get the `mean()` of `arr_delay` and `dep_delay` columns grouped by `origin`, `dest` and `month`.
 
 ```{r}
-flights[carrier == "AA",                     ## Only on trips with carrier "AA"
-        lapply(.SD, mean),                   ## compute the mean
-        by=.(origin, dest, month),           ## for every 'origin,dest,month'
-        .SDcols=c("arr_delay", "dep_delay")] ## for just those specified in .SDcols
+flights[carrier == "AA",                       ## Only on trips with carrier "AA"
+        lapply(.SD, mean),                     ## compute the mean
+        by = .(origin, dest, month),           ## for every 'origin,dest,month'
+        .SDcols = c("arr_delay", "dep_delay")] ## for just those specified in .SDcols
 ```
 
 ### f) Subset `.SD` for each group:
@@ -603,7 +599,7 @@ flights[carrier == "AA",                     ## Only on trips with carrier "AA"
 #### -- How can we return the first two rows for each `month`?
 
 ```{r}
-ans <- flights[, head(.SD, 2), by=month]
+ans <- flights[, head(.SD, 2), by = month]
 head(ans)
 ```
 
@@ -617,10 +613,10 @@ head(ans)
 
 So that we have a consistent syntax and keep using already existing (and familiar) base functions instead of learning new functions. To illustrate, let us use the *data.table* `DT` we created at the very beginning under [What is a data.table?](#what-is-datatable-1a) section.
 
-#### -- How can we concatenate columns `a` and `b` for each group in `ID`? 
+#### -- How can we concatenate columns `a` and `b` for each group in `ID`?
 
 ```{r}
-DT[, .(val = c(a,b)), by=ID]
+DT[, .(val = c(a,b)), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -630,7 +626,7 @@ DT[, .(val = c(a,b)), by=ID]
 #### -- What if we would like to have all the values of column `a` and `b` concatenated, but returned as a list column?
 
 ```{r}
-DT[, .(val = list(c(a,b))), by=ID]
+DT[, .(val = list(c(a,b))), by = ID]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -639,17 +635,17 @@ DT[, .(val = list(c(a,b))), by=ID]
 
 * Note those commas are for display only. A list column can contain any object in each cell, and in this example, each cell is itself a vector and some cells contain longer vectors than others.
 
-# 
-Once you start internalising usage in `j`, you will realise how powerful the syntax can be. A very useful way to understand it is by playing around, with the help of `print()`. 
+#
+Once you start internalising usage in `j`, you will realise how powerful the syntax can be. A very useful way to understand it is by playing around, with the help of `print()`.
 
 For example:
 
 ```{r}
 ## (1) look at the difference between
-DT[, print(c(a,b)), by=ID]
+DT[, print(c(a,b)), by = ID]
 
 ## (2) and
-DT[, print(list(c(a,b))), by=ID]
+DT[, print(list(c(a,b))), by = ID]
 ```
 
 In (1), for each group, a vector is returned, with lengths 6, 4 and 2 here. However (2) returns a list of length 1 for each group, with its first element holding vectors of lengths 6, 4 and 2. Therefore (1) results in a total length of `r 6+4+2` (= 6+4+2), whereas (2) returns `r 1+1+1` (= 1+1+1).
@@ -658,7 +654,7 @@ In (1), for each group, a vector is returned, with length = 6,4,2 here. However
 
 The general form of *data.table* syntax is:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 DT[i, j, by]
 ```
 
@@ -676,7 +672,7 @@ We can do much more in `i` by keying a *data.table*, which allows blazing fast s
 
 1. Select columns the *data.table* way: `DT[, .(colA, colB)]`.
 
-2. Select columns the *data.frame* way: `DT[, c("colA", "colB"), with=FALSE]`.
+2. Select columns the *data.frame* way: `DT[, c("colA", "colB"), with = FALSE]`.
 
 3. Compute on columns: `DT[, .(sum(colA), mean(colB))]`.
 
@@ -687,20 +683,20 @@ We can do much more in `i` by keying a *data.table*, which allows blazing fast s
 #
 
 #### Using `by`: {.bs-callout .bs-callout-info}
- 
+
 * Using `by`, we can group by columns by specifying a *list of columns* or a *character vector of column names* or even *expressions*. The flexibility of `j`, combined with `by` and `i` makes for a very powerful syntax.
 
-* `by` can handle multiple columns and also *expressions*. 
+* `by` can handle multiple columns and also *expressions*.
 
-* We can `keyby` grouping columns to automatically sort the grouped result. 
+* We can `keyby` grouping columns to automatically sort the grouped result.
 
 * We can use `.SD` and `.SDcols` in `j` to operate on multiple columns using already familiar base functions. Here are some examples:
 
-    1. `DT[, lapply(.SD, fun), by=., .SDcols=...]` - applies `fun` to all columns specified in `.SDcols` while grouping by the columns specified in `by`.
+    1. `DT[, lapply(.SD, fun), by = ..., .SDcols = ...]` - applies `fun` to all columns specified in `.SDcols` while grouping by the columns specified in `by`.
 
-    2. `DT[, head(.SD, 2), by=.]` - return the first two rows for each group.
+    2. `DT[, head(.SD, 2), by = ...]` - return the first two rows for each group.
 
-    3. `DT[col > val, head(.SD, 1), by=.]` - combine `i` along with `j` and `by`.
+    3. `DT[col > val, head(.SD, 1), by = ...]` - combine `i` along with `j` and `by`.
 
 #
 
@@ -708,7 +704,7 @@ We can do much more in `i` by keying a *data.table*, which allows blazing fast s
 
 As long as `j` returns a *list*, each element of the list will become a column in the resulting *data.table*.
 
-# 
+#
 
 We will see how to *add/update/delete* columns *by reference* and how to combine them with `i` and `by` in the next vignette.
 
diff --git a/vignettes/datatable-intro.Rnw b/vignettes/datatable-intro.Rnw
deleted file mode 100644
index 49627cb..0000000
--- a/vignettes/datatable-intro.Rnw
+++ /dev/null
@@ -1,293 +0,0 @@
-\documentclass[a4paper]{article}
-
-\usepackage[margin=3cm]{geometry}
-%%\usepackage[round]{natbib}
-\usepackage[colorlinks=true,urlcolor=blue]{hyperref}
-
-%%\newcommand{\acronym}[1]{\textsc{#1}}
-%%\newcommand{\class}[1]{\mbox{\textsf{#1}}}
-\newcommand{\code}[1]{\mbox{\texttt{#1}}}
-\newcommand{\pkg}[1]{{\normalfont\fontseries{b}\selectfont #1}}
-\newcommand{\proglang}[1]{\textsf{#1}}
-\SweaveOpts{keep.source=TRUE, strip.white=all}
-%% \VignetteIndexEntry{Quick introduction}
-
-<<echo=FALSE,results=hide>>=
-if (!exists("data.table",.GlobalEnv)) library(data.table)  
-# In devel won't call library, but R CMD build/check will.
-rm(list=as.character(tables()$NAME),envir=.GlobalEnv)
-# for development when we Sweave this file repeatedly. Otherwise first tables() shows tables from last run
-@
-
-\begin{document}
-\title{Introduction to the \pkg{data.table} package in \proglang{R}}
-\date{Revised: \today\\(A later revision may be available on the \href{https://github.com/Rdatatable/data.table/wiki}{homepage})}
-\maketitle
-
-\section*{Introduction}
-
-This vignette is aimed at those who are already familiar with creating and subsetting \code{data.frame} in \proglang{R}. We aim for this quick introduction
-to be readable in {\bf 10 minutes}, briefly covering a few features:
-1.\,Keys; 2.\,Fast Grouping; and 3.\,Fast \emph{ordered} join.
-
-\section*{Creation}
-
-Recall that we create a \code{data.frame} using the function \code{data.frame()}:
-<<>>=
-DF = data.frame(x=c("b","b","b","a","a"),v=rnorm(5))
-DF
-@
-A \code{data.table} is created in exactly the same way:
-<<>>=
-DT = data.table(x=c("b","b","b","a","a"),v=rnorm(5))
-DT
-@
-Observe that a \code{data.table} prints the row numbers with a colon so as to visually separate the row number from the first column.
-We can easily convert existing \code{data.frame} objects to \code{data.table}.
-<<>>=
-CARS = data.table(cars)
-head(CARS)
-@
-We have just created two \code{data.table}s: \code{DT} and \code{CARS}. It is often useful to see a list of all
-\code{data.table}s in memory:
-<<>>=
-tables()
-@
-
-The MB column is useful to quickly assess memory use and to spot if any redundant tables can be
-removed to free up memory. Just like \code{data.frame}s, \code{data.table}s must fit inside RAM. 
-
-Some users regularly work with 20 or more tables in memory, rather like a database. 
-The result of \code{tables()} is itself a \code{data.table}, returned silently, so that \code{tables()} 
-can be used in programs. \code{tables()} is unrelated to the base function \code{table()}.
-
-To see the column types:
-
-<<>>=
-sapply(DT,class)
-@
-
-You may have noticed the empty column KEY in the result of \code{tables()} above. This is the subject of the next section.
-
-
-\section*{1. Keys}
-
-Let's start by considering \code{data.frame}, specifically \code{rownames}. We know that each row has exactly one row name. However, a person (for example) has at least two names, a first name and a second name. It's useful to organise a telephone directory sorted by surname then first name.
-
-In \code{data.table}, a \emph{key} consists of one \emph{or more} columns. These columns may be integer, factor or numeric as well as character. Furthermore, the rows are sorted by the key. Therefore, a \code{data.table} can have at most one key because it cannot be sorted in more than one way. We can think of a key as super-charged row names; i.e., multi-column and multi-type.
-
-Uniqueness is not enforced; i.e., duplicate key values are allowed. Since
-the rows are sorted by the key, any duplicates in the key will appear consecutively.
-
-Let's remind ourselves of our tables:
-<<>>=
-tables()
-DT
-@
-
-No keys have been set yet. 
-
-<<>>=
-DT[2,]         # select row 2
-DT[x=="b",]    # select rows where column x == "b"
-@
-
-Aside: notice that we did not need to prefix \code{x} with \code{DT\$x}. In \code{data.table} queries, we can use column names as if they are variables directly.
-
-But since there are no rownames, the following does not work:
-<<>>=
-cat(try(DT["b",],silent=TRUE))
-@
-
-The error message tells us we need to use \code{setkey()}:
-<<>>=
-setkey(DT,x)
-DT
-@
-
-Notice that the rows in \code{DT} have now been re-ordered according to the values of \code{x}. 
-The two \code{"a"} rows have moved to the top.
-We can confirm that \code{DT} does indeed have a key using \code{haskey()}, \code{key()},
-\code{attributes()}, or just running \code{tables()}.
-
-<<>>=
-tables()
-@
-
-Now that we are sure \code{DT} has a key, let's try again:
-
-<<>>=
-DT["b",]
-@
-
-By default all the rows in the group are returned\footnote{In contrast to a \code{data.frame} where only the first rowname is returned when the rownames contain duplicates.}. The \code{mult} argument (short for \emph{multiple}) allows the first or last row of the group to be returned instead.
-
-<<>>=
-DT["b",mult="first"]
-DT["b",mult="last"]
-@
-
-Also, the comma is optional.
-
-<<>>=
-DT["b"]
-@
-
-Let's now create a new \code{data.frame}. We will make it large enough to demonstrate the
-difference between a \emph{vector scan} and a \emph{binary search}.
-<<print=TRUE>>=
-grpsize = ceiling(1e7/26^2)   # 10 million rows, 676 groups
-tt=system.time( DF <- data.frame(
-  x=rep(LETTERS,each=26*grpsize),
-  y=rep(letters,each=grpsize),
-  v=runif(grpsize*26^2),
-  stringsAsFactors=FALSE)
-)
-head(DF,3)
-tail(DF,3)
-dim(DF)
-@
-
-We might say that \proglang{R} has created a 3 column table and \emph{inserted}
-\Sexpr{format(nrow(DF),big.mark=",",scientific=FALSE)} rows. It took \Sexpr{format(tt[3],nsmall=3)} secs, so it inserted
-\Sexpr{format(as.integer(nrow(DF)/tt[3]),big.mark=",",scientific=FALSE)} rows per second. This is normal in base \proglang{R}. Notice that we set \code{stringsAsFactors=FALSE}. This makes it a little faster for a fairer comparison, but feel free to experiment. 
-
-Let's extract an arbitrary group from \code{DF}:
-
-<<print=TRUE>>=
-tt=system.time(ans1 <- DF[DF$x=="R" & DF$y=="h",])   # 'vector scan'
-head(ans1,3)
-dim(ans1)
-@
-
-Now convert to a \code{data.table} and extract the same group:
-
-<<>>=
-DT = as.data.table(DF)       # but normally use fread() or data.table() directly
-system.time(setkey(DT,x,y))  # one-off cost, usually
-@
-<<print=TRUE>>=
-ss=system.time(ans2 <- DT[list("R","h")])   # binary search
-head(ans2,3)
-dim(ans2)
-identical(ans1$v, ans2$v)
-@
-<<echo=FALSE>>=
-if(!identical(ans1$v, ans2$v)) stop("vector scan vs binary search not equal")
-@
-
-At \Sexpr{format(ss[3],nsmall=3)} seconds, this was {\bf\Sexpr{as.integer(tt[3]/ss[3])}} times faster than \Sexpr{format(tt[3],nsmall=3)} seconds,
-and produced precisely the same result. If you are thinking that a few seconds is not much to save, it's the relative speedup that's important. The
-vector scan is linear, but the binary search is O(log n). It scales. If a task taking 10 hours is sped up by 100 times to 6 minutes, that is
-significant\footnote{We wonder how many people are deploying parallel techniques to code that is vector scanning}. 
-
-We can do vector scans in \code{data.table}, too. In other words we can use data.table \emph{badly}.
-
-<<>>=
-system.time(ans1 <- DT[x=="R" & y=="h",])   # works but is using data.table badly
-system.time(ans2 <- DF[DF$x=="R" & DF$y=="h",])   # the data.frame way
-mapply(identical,ans1,ans2)
-@
-
-
-If the phone book analogy helped, the {\bf\Sexpr{as.integer(tt[3]/ss[3])}} times speedup should not be surprising. We use the key to take advantage of the fact 
-that the table is sorted and use binary search to find the matching rows. We didn't vector scan; we didn't use \code{==}.
-
-When we used \code{x=="R"} we \emph{scanned} the entire column x, testing each and every value to see if it equalled "R". We did
-it again in the y column, testing for "h". Then \code{\&} combined the two logical results to create a single logical vector which was
-passed to the \code{[} method, which in turn searched it for \code{TRUE} and returned those rows. These were \emph{vectorized} operations. They
-occurred internally in R and were very fast, but they were scans. \emph{We} did those scans because \emph{we} wrote that R code.
-
-
-When \code{i} is a \code{list} (and \code{data.table} is a \code{list} too), we say that we are \emph{joining}. In this case, we are joining DT to the 1 row, 2 column table returned by \code{list("R","h")}. Since we do this a lot, there is an alias for \code{list}: \code{.()}.
-
-<<>>=
-identical( DT[list("R","h"),],
-           DT[.("R","h"),])
-@
-<<echo=FALSE>>=
-if(!identical(DT[list("R","h"),],DT[.("R","h"),])) stop("list != . check")
-@
-
-Both vector scanning and binary search are available in \code{data.table}, but one way of using \code{data.table} is much better than the other.
-
-The join syntax is short, fast to write and easy to maintain. Passing a \code{data.table} into a \code{data.table} subset is analogous to the \code{A[B]} syntax in base \proglang{R} where \code{A} is a matrix and \code{B} is a 2-column matrix\footnote{Subsetting a keyed \code{data.table} by an n-column 
-\code{data.table} is consistent with subsetting an n-dimensional array by an n-column matrix in base R}. In fact, the \code{A[B]} syntax in base R inspired the \code{data.table} package. There are
-other types of ordered joins and further arguments which are beyond the scope of this quick introduction.
-
-The merge method of \code{data.table} is very similar to \code{X[Y]}, but there are some differences. See FAQ 1.12.
-
-This first section has been about the first argument inside \code{DT[...]}, namely \code{i}. The next section is about the 2nd and 3rd arguments: \code{j} and \code{by}.
-
-
-\section*{2. Fast grouping}
-
-
-The second argument to \code{DT[...]} is \code{j} and may consist of
-one or more expressions whose arguments are (unquoted) column names, as if the column names were variables, just as we saw earlier in \code{i}.
-
-<<>>=
-DT[,sum(v)]
-@
-
-When we supply a \code{j} expression and a 'by' expression, the \code{j} expression is repeated for each 'by' group.
-
-<<>>=
-DT[,sum(v),by=x]
-@
-
-The \code{by} in \code{data.table} is fast.  Let's compare it to \code{tapply}.
-
-<<>>=
-ttt=system.time(tt <- tapply(DT$v,DT$x,sum)); ttt
-sss=system.time(ss <- DT[,sum(v),by=x]); sss
-head(tt)
-head(ss)
-identical(as.vector(tt), ss$V1)
-@
-<<echo=FALSE>>=
-if(!identical(as.vector(tt), ss$V1)) stop("by check failed")
-@
-
-At \Sexpr{sprintf("%0.3f",sss[3])} sec, this was {\bf\Sexpr{as.integer(ttt[3]/sss[3])}} times faster than 
-\Sexpr{sprintf("%0.3f",ttt[3])} sec, and produced precisely the same result.
-
-Next, let's group by two columns:
-
-<<>>=
-ttt=system.time(tt <- tapply(DT$v,list(DT$x,DT$y),sum)); ttt
-sss=system.time(ss <- DT[,sum(v),by="x,y"]); sss
-tt[1:5,1:5]
-head(ss)
-identical(as.vector(t(tt)), ss$V1)
-@
-<<echo=FALSE>>=
-if(!identical(as.vector(t(tt)), ss$V1)) stop("group check failed")
-@
-
-This was {\bf\Sexpr{as.integer(ttt[3]/sss[3])}} times faster, and the syntax is a little simpler and easier to read.
-\newline
-
-
-\section*{3. Fast ordered joins}
-
-A fast ordered join is also known as \emph{last observation carried forward} (LOCF) or a \emph{rolling join}.
-
-Recall that \code{X[Y]} is a join between \code{data.table} \code{X} and \code{data.table} \code{Y}.  If \code{Y} has 2 columns, the first column is matched
-to the first column of the key of \code{X} and the 2nd column to the 2nd.  An equi-join is performed by default, meaning that the values must be equal.
-
-Instead of an equi-join, a rolling join is:
-
-\code{X[Y,roll=TRUE]}
-
-As before, the first column of \code{Y} is matched to \code{X} where the values are equal. The last join column in \code{Y}, though (the 2nd one in
-this example), is treated specially. If no match is found, then the row before is returned, provided the first column still matches.
-
-Further controls include rolling forwards, backwards, to the nearest value, and with limited staleness.
-
-For examples type \code{example(data.table)} and follow the output at the prompt.
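A minimal sketch of a rolling join, using a small invented table (the names \code{prices}, \code{q} and their columns are illustrative, not taken from the package's examples):

<<eval=FALSE>>=
library(data.table)
# prices observed at irregular times, keyed by id then time
prices = data.table(id = "A", time = c(1L, 5L, 9L), price = c(10, 11, 12),
                    key = c("id", "time"))
# query times that need not match exactly
q = data.table(id = c("A", "A"), time = c(2L, 6L))
prices[q, roll = TRUE]
# time 2 carries price 10 forward (from time 1); time 6 carries price 11 (from time 5)
@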
-
-
-\end{document}
-
-
diff --git a/vignettes/datatable-keys-fast-subset.Rmd b/vignettes/datatable-keys-fast-subset.Rmd
index b9ebd7c..d4d6b8d 100644
--- a/vignettes/datatable-keys-fast-subset.Rmd
+++ b/vignettes/datatable-keys-fast-subset.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Keys and fast binary search based subset"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Keys and fast binary search based subset}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,14 +13,13 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
 
-This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and  group by using `by`. If you're not familar with these concepts, please read the *"Introduction to data.table"* and *"data.table reference semantics"* vignettes first.
+This vignette is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, add/modify/delete columns *by reference* in `j` and  group by using `by`. If you're not familiar with these concepts, please read the *"Introduction to data.table"* and *"Reference semantics"* vignettes first.
 
 ***
 
@@ -31,8 +27,8 @@ This vignette is aimed at those who are already familiar with *data.table* synta
 
 We will use the same `flights` data as in the *"Introduction to data.table"* vignette.
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ```{r}
@@ -57,15 +53,15 @@ In this vignette, we will
 
 ### a) What is a *key*?
 
-In the *"Introduction to data.table"* vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*. 
+In the *"Introduction to data.table"* vignette, we saw how to subset rows in `i` using logical expressions, row numbers and using `order()`. In this section, we will look at another way of subsetting incredibly fast - using *keys*.
 
 But first, let's start by looking at *data.frames*. All *data.frames* have a row names attribute. Consider the *data.frame* `DF` below.
 
 ```{r}
 set.seed(1L)
-DF = data.frame(ID1 = sample(letters[1:2], 10, TRUE), 
+DF = data.frame(ID1 = sample(letters[1:2], 10, TRUE),
                 ID2 = sample(1:3, 10, TRUE),
-                val = sample(10), 
+                val = sample(10),
                 stringsAsFactors = FALSE,
                 row.names = sample(LETTERS[1:10]))
 DF
@@ -79,15 +75,15 @@ We can *subset* a particular row using its row name as shown below:
 DF["C", ]
 ```
 
-i.e., row names are more or less *an index* to rows of a *data.frame*. However, 
+i.e., row names are more or less *an index* to rows of a *data.frame*. However,
 
 1. Each row is limited to *exactly one* row name.
 
-    But, a person (for example) has at least two names - a *first* and a *second* name. It is useful to organise a telephone directory by *surname* then *first name*. 
+    But, a person (for example) has at least two names - a *first* and a *second* name. It is useful to organise a telephone directory by *surname* then *first name*.
 
 2. And row names should be *unique*.
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     rownames(DF) = sample(LETTERS[1:5], 10, TRUE)
     # Warning: non-unique values when setting 'row.names': 'C', 'D'
     # Error in `row.names<-.data.frame`(`*tmp*`, value = value): duplicate 'row.names' are not allowed
@@ -102,11 +98,11 @@ DT
 rownames(DT)
 ```
 
-* Note that row names have been reset. 
+* Note that row names have been reset.
 
 * *data.tables* never use row names. Since *data.tables* **inherit** from *data.frames*, they still have the row names attribute. But they never use them. We'll see in a moment why.
 
-    If you would like to preserve the row names, use `keep.rownames = TRUE` in `as.data.table()` - this will create a new column called `rn` and assign row names to this column. 
+    If you would like to preserve the row names, use `keep.rownames = TRUE` in `as.data.table()` - this will create a new column called `rn` and assign row names to this column.
 
 Instead, in *data.tables* we set and use `keys`. Think of a `key` as **supercharged rownames**.
 
@@ -114,11 +110,11 @@ Instead, in *data.tables* we set and use `keys`. Think of a `key` as **superchar
 
 1. We can set keys on *multiple columns* and the column can be of *different types* -- *integer*, *numeric*, *character*, *factor*, *integer64* etc. *list* and *complex* types are not supported yet.
 
-2. Uniqueness is not enforced, i.e., duplicate key values are allowed. Since rows are sorted by key, any duplicates in the key columns will appear consecutively.  
+2. Uniqueness is not enforced, i.e., duplicate key values are allowed. Since rows are sorted by key, any duplicates in the key columns will appear consecutively.
 
-3. Setting a `key` *does two things*: 
+3. Setting a `key` does *two* things:
 
-    a. reorders the rows of the *data.table* by the column(s) provided *by reference*, always in *increasing* order. 
+    a. physically reorders the rows of the *data.table* by the column(s) provided *by reference*, always in *increasing* order.
 
     b. marks those columns as *key* columns by setting an attribute called `sorted` to the *data.table*.
 
@@ -150,7 +146,7 @@ head(flights)
 
 * The *data.table* is now reordered (or sorted) by the column we provided - `origin`. Since we reorder by reference, we only require additional memory of one column of length equal to the number of rows in the *data.table*, and it is therefore very memory efficient.
 
-* You can also set keys directly when creating *data.tables* using the `data.table()` function using `key=` argument. It takes a character vector of column names.
+* You can also set keys directly when creating *data.tables*, by passing the `key` argument to the `data.table()` function. It takes a character vector of column names.
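    For instance, a minimal sketch (the toy table here is invented for illustration):

    ```{r eval = FALSE}
    DT = data.table(x = c("b", "a", "a"), y = 1:3, key = "x")
    key(DT)   # "x" -- the rows are already sorted by x
    ```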
 
 #### set* and `:=`: {.bs-callout .bs-callout-info}
 
@@ -158,7 +154,7 @@ In *data.table*, the `:=` operator and all the `set*` (e.g., `setkey`, `setorder
 
 #
 
-Once you *key* a *data.table* by certain columns, you can subset by querying those key columns using the `.()` notation in `i`. Recall that `.()` is an *alias to* `list()`. 
+Once you *key* a *data.table* by certain columns, you can subset by querying those key columns using the `.()` notation in `i`. Recall that `.()` is an *alias to* `list()`.
 
 #### -- Use the key column `origin` to subset all rows where the origin airport matches *"JFK"*
 
@@ -166,7 +162,8 @@ Once you *key* a *data.table* by certain columns, you can subset by querying tho
 flights[.("JFK")]
 
 ## alternatively
-# flights[J("JFK")] (or) flights[list("JFK")]
+# flights[J("JFK")] (or) 
+# flights[list("JFK")]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -177,7 +174,7 @@ flights[.("JFK")]
 
 * On single column key of *character* type, you can drop the `.()` notation and use the values directly when subsetting, like subset using row names on *data.frames*.
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     flights["JFK"]              ## same as flights[.("JFK")]
     ```
 
@@ -205,7 +202,7 @@ key(flights)
 
 ### c) Keys and multiple columns
 
-To refresh, *keys* are like *supercharged* rownames. We can set key on multiple columns and they can be of multiple types. 
+To refresh, *keys* are like *supercharged* row names. We can set key on multiple columns and they can be of multiple types.
 
 #### -- How can I set keys on both `origin` *and* `dest` columns?
 
@@ -231,11 +228,11 @@ flights[.("JFK", "MIA")]
 
 #### How does the subset work here? {.bs-callout .bs-callout-info #multiple-key-point}
 
-* It is important to undertand how this works internally. *"JFK"* is first matched against the first key column `origin`. And *within those matching rows*, *"MIA"* is matched against the second key column `dest` to obtain *row indices* where both `origin` and `dest` match the given values. 
+* It is important to understand how this works internally. *"JFK"* is first matched against the first key column `origin`. And *within those matching rows*, *"MIA"* is matched against the second key column `dest` to obtain *row indices* where both `origin` and `dest` match the given values.
 
 * Since no `j` is provided, we simply return *all columns* corresponding to those row indices.
 
-# 
+#
 
 #### -- Subset all rows where just the first key column `origin` matches *"JFK"*
 
@@ -257,7 +254,7 @@ flights[.(unique(origin), "MIA")]
 
 #### What's happening here? {.bs-callout .bs-callout-info}
 
-* Read [this](#multiple-key-point) again. The value provided for the second key column *"MIA"* has to find the matching vlaues in `dest` key column *on the matching rows provided by the first key column `origin`*. We can not skip the values of key columns *before*. Therfore we provide *all* unique values from key column `origin`.
+* Read [this](#multiple-key-point) again. The value provided for the second key column *"MIA"* has to find the matching values in the `dest` key column *on the matching rows provided by the first key column `origin`*. We cannot skip the values of key columns *before* it. Therefore we provide *all* unique values from key column `origin`.
 
 * *"MIA"* is automatically recycled to fit the length of `unique(origin)` which is *3*.
 
@@ -276,14 +273,14 @@ flights[.("LGA", "TPA"), .(arr_delay)]
 
 #### {.bs-callout .bs-callout-info}
 
-* The *row indices* corresponding to `origin == "LGA" and `dest == "TPA"` are obtained using *key based subset*.
+* The *row indices* corresponding to `origin == "LGA"` and `dest == "TPA"` are obtained using *key based subset*.
 
 * Once we have the row indices, we look at `j` which requires only the `arr_delay` column. So we simply select the column `arr_delay` for those *row indices* in the exact same way as we have seen in *Introduction to data.table* vignette.
 
 * We could have returned the result by using `with = FALSE` as well.
- 
+
     ```{r eval = FALSE}
-    flights[.("LGA", "TPA"), "arr_delay", with=FALSE]
+    flights[.("LGA", "TPA"), "arr_delay", with = FALSE]
     ```
 
 ### b) Chaining
@@ -326,16 +323,16 @@ key(flights)
 
 #### {.bs-callout .bs-callout-info}
 
-* We first set `key` to *hour*. This reorders `flights` by the column *hour* and marks that column as the `key` column.
+* We first set `key` to `hour`. This reorders `flights` by the column `hour` and marks that column as the `key` column.
 
-* Now we can subset on *hour* by using the `.()` notation. We subset for the value *24* and obtain the corresponding *row indices*.
+* Now we can subset on `hour` by using the `.()` notation. We subset for the value *24* and obtain the corresponding *row indices*.
 
 * And on those row indices, we replace the `key` column with the value `0`.
 
 * Since we have replaced values on the *key* column, the *data.table* `flights` isn't sorted by `hour` anymore. Therefore, the key has been automatically removed by setting it to NULL.
 
 #
-Now, there shouldn't be any *24* in the *hour* column.
+Now, there shouldn't be any *24* in the `hour` column.
 
 ```{r}
 flights[, sort(unique(hour))]
@@ -353,14 +350,14 @@ key(flights)
 #### -- Get the maximum departure delay for each `month` corresponding to `origin = "JFK"`. Order the result by `month`
 
 ```{r}
-ans <- flights["JFK", max(dep_delay), keyby=month]
+ans <- flights["JFK", max(dep_delay), keyby = month]
 head(ans)
 key(ans)
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* We subset on the `key` column *origin* to obtain the *row indices* corresponding to *"JFK"*. 
+* We subset on the `key` column *origin* to obtain the *row indices* corresponding to *"JFK"*.
 
 * Once we obtain the row indices, we only need two columns - `month` to group by and `dep_delay` to obtain `max()` for each group. *data.table's* query optimisation therefore subsets just those two columns corresponding to the *row indices* obtained in `i`, for speed and memory efficiency.
 
@@ -377,29 +374,29 @@ We can choose, for each query, if *"all"* the matching rows should be returned,
 #### -- Subset only the first matching row from all rows where `origin` matches *"JFK"* and `dest` matches *"MIA"*
 
 ```{r}
-flights[.("JFK", "MIA"), mult="first"]
+flights[.("JFK", "MIA"), mult = "first"]
 ```
 
 #### -- Subset only the last matching row of all the rows where `origin` matches *"LGA", "JFK", "EWR"* and `dest` matches *"XNA"*
 
 ```{r}
-flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last"]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last"]
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* The query *"JFK", "XNA"* doesn't match any rows in `flights` and therefore returns `NA`. 
+* The query *"JFK", "XNA"* doesn't match any rows in `flights` and therefore returns `NA`.
 
 * Once again, the query for second key column `dest`,  *"XNA"*, is recycled to fit the length of the query for first key column `origin`, which is of length 3.
 
-### The *nomatch* argument
+### b) The *nomatch* argument
 
 We can choose if queries that do not match should return `NA` or be skipped altogether using the `nomatch` argument.
 
 #### -- From the previous example, subset all rows only if there's a match
 
 ```{r}
-flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last", nomatch = 0L]
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", nomatch = 0L]
 ```
 
 #### {.bs-callout .bs-callout-info}
@@ -412,30 +409,30 @@ flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult="last", nomatch = 0L]
 
 We have seen so far how we can set and use keys to subset. But what's the advantage? For example, instead of doing:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 # key by origin,dest columns
 flights[.("JFK", "MIA")]
 ```
 
 we could have done:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 flights[origin == "JFK" & dest == "MIA"]
 ```
 
-One advantage very likely is shorter syntax. But even more than that, *binary search based subsets* are **incredibly fast**. 
+One advantage very likely is shorter syntax. But even more than that, *binary search based subsets* are **incredibly fast**.
 
 ### a) Performance of binary search approach
 
-To illustrate, let's create a sample *data.table* with 20 million rows and three columns and key it by columns `x` and `y`. 
+To illustrate, let's create a sample *data.table* with 20 million rows and three columns and key it by columns `x` and `y`.
 
 ```{r}
 set.seed(2L)
 N = 2e7L
-DT = data.table(x = sample(letters, N, TRUE), 
-                y = sample(1000L, N, TRUE), 
-                val=runif(N), key = c("x", "y"))
-print(object.size(DT), units="Mb")
+DT = data.table(x = sample(letters, N, TRUE),
+                y = sample(1000L, N, TRUE),
+              val = runif(N), key = c("x", "y"))
+print(object.size(DT), units = "Mb")
 
 key(DT)
 ```
@@ -464,7 +461,7 @@ dim(ans2)
 identical(ans1$val, ans2$val)
 ```
 
-* The speedup is **~`r round(t1[3]/t2[3])`x**!
+* The speedup is **~`r round(t1[3]/max(t2[3], .001))`x**!
 
 ### b) Why does keying a *data.table* result in blazing fast subsets?
 
@@ -480,15 +477,15 @@ To understand that, let's first look at what *vector scan approach* (method 1) d
 
 This is what we call a *vector scan approach*. And this is quite inefficient, especially on larger tables and when one needs repeated subsetting, because it has to scan through all the rows each time.
 
-# 
+#
 
-Now let us look at binary search approach (method 2). Recall from [Properties of key](#key-properties) - *setting keys reorders the data.table by key columns*. Since the data is sorted, we don't have to *scan through the entire length of the column*! We can instead use *binary search* to search a value in `O(log n)` as opposed to `O(n)` in case of *vector scan approach*, where `n` is the number of rows in the *data.table*. 
+Now let us look at binary search approach (method 2). Recall from [Properties of key](#key-properties) - *setting keys reorders the data.table by key columns*. Since the data is sorted, we don't have to *scan through the entire length of the column*! We can instead use *binary search* to search a value in `O(log n)` as opposed to `O(n)` in case of *vector scan approach*, where `n` is the number of rows in the *data.table*.
 
 #### Binary search approach: {.bs-callout .bs-callout-info}
 
 Here's a very simple illustration. Let's consider the (sorted) numbers shown below:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 1, 5, 10, 19, 22, 23, 30
 ```
 
@@ -506,7 +503,7 @@ A vector scan approach on the other hand would have to scan through all the valu
 
 #
 
-It can be seen that with every search we reduce the number of searches by half. This is why *binary search* based subsets are **incredibly fast**. Since rows of each column of *data.tables* have contiguous locations in memory, the operations are performed in a very cache efficient manner (also contributes to *speed*). 
+It can be seen that with every search we reduce the number of searches by half. This is why *binary search* based subsets are **incredibly fast**. Since rows of each column of *data.tables* have contiguous locations in memory, the operations are performed in a very cache efficient manner (also contributes to *speed*).
 
 In addition, since we obtain the matching row indices directly without having to create those huge logical vectors (equal to the number of rows in a *data.table*), it is quite **memory efficient** as well.
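The halving described above can be sketched as a plain R function (for illustration only; *data.table*'s actual search is implemented in C and works on key columns, not on a bare vector):

```{r eval = FALSE}
bsearch <- function(x, value) {   # x must be sorted in increasing order
  lo = 1L; hi = length(x)
  while (lo <= hi) {
    mid = lo + (hi - lo) %/% 2L   # midpoint of the remaining range
    if (x[mid] == value) return(mid)
    if (x[mid] < value) lo = mid + 1L else hi = mid - 1L
  }
  NA_integer_                     # no match
}
bsearch(c(1, 5, 10, 19, 22, 23, 30), 1)  # found in 3 halvings instead of a full scan
```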
 
@@ -516,19 +513,15 @@ In this vignette, we have learnt another method to subset rows in `i` by keying
 
 #### {.bs-callout .bs-callout-info}
 
-* set key and subset using the key on a *data.table*. 
+* set key and subset using the key on a *data.table*.
 
 * subset using keys which fetches *row indices* in `i`, but much faster.
 
 * combine key based subsets with `j` and `by`. Note that the `j` and `by` operations are exactly the same as before.
 
-Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*.
-
 #
 
-We don't have to set and use keys for aggregation operations in general, unless the data is extremely large and/or the task requires repeated subsetting where key based subsets will be noticeably performant. 
-
-However, keying *data.tables* are essential to *join* two *data.tables* which is the subject of discussion in the next vignette *"Joins and rolling joins"*. We will extend the concept of key based subsets to joining two *data.tables* based on `key` columns.
+Key based subsets are **incredibly fast** and are particularly useful when the task involves *repeated subsetting*. But it may not always be desirable to set a key and physically reorder the *data.table*. In the next vignette, we will address this using a *new* feature -- *secondary indexes*.
 
 ***
 
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index 2a66bc6..1854e8b 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Reference semantics"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Reference semantics}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,13 +13,12 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
-# options(datatable.auto.index=FALSE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
-This vignette discusses *data.table*'s reference semantics which allows to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familar with these concepts, please read the *"Introduction to data.table"* vignette first.
+This vignette discusses *data.table*'s reference semantics which allows us to *add/update/delete* columns of a *data.table by reference*, and also combine them with `i` and `by`. It is aimed at those who are already familiar with *data.table* syntax, its general form, how to subset rows in `i`, select and compute on columns, and perform aggregations by group. If you're not familiar with these concepts, please read the *"Introduction to data.table"* vignette first.
 
 ***
 
@@ -30,8 +26,8 @@ This vignette discusses *data.table*'s reference semantics which allows to *add/
 
 We will use the same `flights` data as in the *"Introduction to data.table"* vignette.
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ```{r}
@@ -59,13 +55,13 @@ All the operations we have seen so far in the previous vignette resulted in a ne
 Before we look at *reference semantics*, consider the *data.frame* shown below:
 
 ```{r}
-DF = data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c=13:18)
+DF = data.frame(ID = c("b","b","b","a","a","c"), a = 1:6, b = 7:12, c = 13:18)
 DF
 ```
 
 When we did:
 
-```{r eval=FALSE}
+```{r eval = FALSE}
 DF$c <- 18:13               # (1) -- replace entire column
 # or
 DF$c[DF$ID == "b"] <- 15:13 # (2) -- subassign in column 'c'
@@ -86,21 +82,21 @@ With *data.table's* `:=` operator, absolutely no copies are made in *both* (1) a
 
 ### b) The `:=` operator
 
-It can be used in `j` in two ways: 
+It can be used in `j` in two ways:
 
 (a) The `LHS := RHS` form
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     DT[, c("colA", "colB", ...) := list(valA, valB, ...)]
 
-    # when you have only one column to assign to you 
+    # when you have only one column to assign to you
     # can drop the quotes and list(), for convenience
     DT[, colA := valA]
 	  ```
 
 (b) The functional form
 
-	```{r eval=FALSE}
+	```{r eval = FALSE}
 	DT[, `:=`(colA = valA, # valA is assigned to colA
 	          colB = valB, # valB is assigned to colB
 	          ...
@@ -111,15 +107,15 @@ It can be used in `j` in two ways:
 
 Note that the code above explains how `:=` can be used. They are not working examples. We will start using them on `flights` *data.table* from the next section.
 
-# 
+#
 
 #### {.bs-callout .bs-callout-info}
 
-* Form (a) is usually easy to program with and is particularly useful when you don't know the columns to assign values to in advance.
+* In (a), `LHS` takes a character vector of column names and `RHS` a *list of values*. `RHS` just needs to be a `list`, irrespective of how it's generated (e.g., using `lapply()`, `list()`, `mget()`, `mapply()` etc.). This form is usually easy to program with and is particularly useful when you don't know the columns to assign values to in advance.
 
-* On the other hand, form (b) is handy if you would like to jot some comments down for later.
+* On the other hand, (b) is handy if you would like to jot some comments down for later.
 
-* The result is returned *invisibly*. 
+* The result is returned *invisibly*.
 
 * Since `:=` is available in `j`, we can combine it with `i` and `by` operations just like the aggregation operations we saw in the previous vignette.
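As a sketch of how form (a) combines with a computed list of values (the table and column names here are illustrative):

```{r eval = FALSE}
DT = data.table(a = 1:3, b = 4:6)
cols = c("a", "b")
# multiply several columns by 100, assigning back by reference
DT[, (cols) := lapply(.SD, function(x) x * 100L), .SDcols = cols]
```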
 
@@ -136,7 +132,7 @@ For the rest of the vignette, we will work with `flights` *data.table*.
 #### -- How can we add columns *speed* and *total delay* of each flight to `flights` *data.table*?
 
 ```{r}
-flights[, `:=`(speed = distance / (air_time/60), # speed in km/hr
+flights[, `:=`(speed = distance / (air_time/60), # speed in mph (mi/h)
                delay = arr_delay + dep_delay)]   # delay in minutes
 head(flights)
 
@@ -148,7 +144,7 @@ head(flights)
 
 * We did not have to assign the result back to `flights`.
 
-* The `flights` *data.table* now contains the two newly added columns. This is what we mean by *added by reference*. 
+* The `flights` *data.table* now contains the two newly added columns. This is what we mean by *added by reference*.
 
 * We used the functional form so that we could add comments on the side to explain what the computation does. You can also see the `LHS := RHS` form (commented).
 
@@ -190,6 +186,12 @@ Let's look at all the `hours` to verify.
 flights[, sort(unique(hour))]
 ```
 
+#### Exercise: {.bs-callout .bs-callout-warning #update-by-reference-question}
+
+What is the difference between `flights[hour == 24L, hour := 0L]` and `flights[hour == 24L][, hour := 0L]`? Hint: The latter needs an assignment (`<-`) if you would want to use the result later.
+
+If you can't figure it out, have a look at the `Note` section of `?":="`.
+
 ### c) Delete column by reference
 
 #### -- Remove `delay` column
@@ -210,7 +212,7 @@ head(flights)
 
 * When there is just one column to delete, we can drop the `c()` and double quotes and just use the column name *unquoted*, for convenience. That is:
 
-    ```{r eval=FALSE}
+    ```{r eval = FALSE}
     flights[, delay := NULL]
     ```
 
@@ -223,7 +225,7 @@ We have already seen the use of `i` along with `:=` in [Section 2b](#ref-i-j). L
 #### -- How can we add a new column which contains for each `orig,dest` pair the maximum speed?
 
 ```{r}
-flights[, max_speed := max(speed), by=.(origin, dest)]
+flights[, max_speed := max(speed), by = .(origin, dest)]
 head(flights)
 ```
 
@@ -275,7 +277,7 @@ Let's say we would like to create a function that would return the *maximum spee
 ```{r}
 foo <- function(DT) {
   DT[, speed := distance / (air_time/60)]
-  DT[, .(max_speed = max(speed)), by=month]
+  DT[, .(max_speed = max(speed)), by = month]
 }
 ans = foo(flights)
 head(flights)
@@ -299,10 +301,10 @@ The `copy()` function *deep* copies the input object and therefore any subsequen
 
 There are two particular places where `copy()` function is essential:
 
-1. Contrary to the situation we have seen in the previous point, we may not want the input data.table to a function to be modified *by reference*. As an example, let's consider the task in the previous section, except we don't want to modify `flghts` by reference. 
+1. Contrary to the situation we have seen in the previous point, we may not want the input data.table to a function to be modified *by reference*. As an example, let's consider the task in the previous section, except we don't want to modify `flights` by reference.
 
     Let's first delete the `speed` column we generated in the previous section.
-    
+
     ```{r}
     flights[, speed := NULL]
     ```
@@ -310,9 +312,9 @@ There are two particular places where `copy()` function is essential:
 
     ```{r}
     foo <- function(DT) {
-      DT <- copy(DT)                             ## deep copy
-      DT[, speed := distance / (air_time/60)]    ## doesn't affect 'flights'
-      DT[, .(max_speed = max(speed)), by=month]
+      DT <- copy(DT)                              ## deep copy
+      DT[, speed := distance / (air_time/60)]     ## doesn't affect 'flights'
+      DT[, .(max_speed = max(speed)), by = month]
     }
     ans <- foo(flights)
     head(flights)
@@ -332,19 +334,19 @@ However we could improve this functionality further by *shallow* copying instead
 2. When we store the column names on to a variable, e.g., `DT_n = names(DT)`, and then *add/update/delete* column(s) *by reference*. It would also modify `DT_n`, unless we do `copy(names(DT))`.
 
     ```{r}
-    DT = data.table(x=1, y=2)
+    DT = data.table(x = 1L, y = 2L)
     DT_n = names(DT)
     DT_n
 
     ## add a new column by reference
-    DT[, z := 3]
+    DT[, z := 3L]
 
     ## DT_n also gets updated
     DT_n
 
     ## use `copy()`
     DT_n = copy(names(DT))
-    DT[, w := 4]
+    DT[, w := 4L]
 
     ## DT_n doesn't get updated
     DT_n
@@ -360,9 +362,9 @@ However we could improve this functionality further by *shallow* copying instead
 
 * We can use `:=` for its side effect or use `copy()` to not modify the original object while updating by reference.
 
-# 
+#
 
-So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette *"Keys and fast binary search based subset"* to peform *blazing fast subsets* by *keying data.tables*. 
+So far we have seen a whole lot in `j`, and how to combine it with `by` and little of `i`. Let's turn our attention back to `i` in the next vignette *"Keys and fast binary search based subset"* to perform *blazing fast subsets* by *keying data.tables*.
 
 ***
 
diff --git a/vignettes/datatable-reshape.Rmd b/vignettes/datatable-reshape.Rmd
index bfde7d4..d731601 100644
--- a/vignettes/datatable-reshape.Rmd
+++ b/vignettes/datatable-reshape.Rmd
@@ -1,13 +1,10 @@
 ---
 title: "Efficient reshaping using data.tables"
 date: "`r Sys.Date()`"
-output: 
-  rmarkdown::html_document:
-    theme: spacelab
-    highlight: pygments
-    css : css/bootstrap.css
+output:
+  rmarkdown::html_vignette
 vignette: >
-  %\VignetteIndexEntry{Vignette Title}
+  %\VignetteIndexEntry{Efficient reshaping using data.tables}
   %\VignetteEngine{knitr::rmarkdown}
   \usepackage[utf8]{inputenc}
 ---
@@ -16,36 +13,36 @@ vignette: >
 require(data.table)
 knitr::opts_chunk$set(
   comment = "#",
-  error = FALSE,
-  tidy = FALSE,
-  cache = FALSE,
-  collapse=TRUE)
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
 ```
 
 This vignette discusses the default usage of reshaping functions `melt` (wide to long) and `dcast` (long to wide) for *data.tables* as well as the **new extended functionalities** of melting and casting on *multiple columns* available from `v1.9.6`.
 
 ***
 
-```{r echo=FALSE}
-options(width=100)
+```{r echo = FALSE}
+options(width = 100L)
 ```
 
 ## Data
 
-We will load the data sets directly within sections. 
+We will load the data sets directly within sections.
 
 ## Introduction
 
-The `melt` and `dcast` functions for *data.tables* are extensions of the corresponding functions from the [reshape2](http://cran.r-project.org/package=reshape2) package.  
+The `melt` and `dcast` functions for *data.tables* are extensions of the corresponding functions from the [reshape2](https://cran.r-project.org/package=reshape2) package.
 
 In this vignette, we will
 
 1. first briefly look at the default *melting* and *casting* of *data.tables* to convert them from *wide* to *long* format and vice versa,
 
-2. then look at scenarios where the current functionalities becomes cumbersome and inefficient, 
+2. then look at scenarios where the current functionality becomes cumbersome and inefficient,
 
 3. and finally look at the new improvements to both `melt` and `dcast` methods for *data.tables* to handle multiple columns simultaneously.
-    
+
 The extended functionalities are in line with *data.table's* philosophy of performing operations efficiently and in a straightforward manner.
 
 #### Note: {.bs-callout .bs-callout-info}
@@ -61,7 +58,7 @@ Suppose we have a `data.table` (artificial data) as shown below:
 
 ```{r}
 DT = fread("melt_default.csv")
-DT 
+DT
 ## dob stands for date of birth.
 
 str(DT)
@@ -76,7 +73,7 @@ str(DT)
 We could accomplish this using `melt()` by specifying `id.vars` and `measure.vars` arguments as follows:
 
 ```{r}
-DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"), 
+DT.m1 = melt(DT, id.vars = c("family_id", "age_mother"),
                 measure.vars = c("dob_child1", "dob_child2", "dob_child3"))
 DT.m1
 str(DT.m1)
@@ -88,7 +85,7 @@ str(DT.m1)
 
 * We can also specify column *indices* instead of *names*.
 
-* By default, `variable` column is of type `factor`. Set `variable.factor` argument to `FALSE` if you'd like to return a *character* vector instead. `variable.factor` argument is only available in `melt` from `data.table` and not in the [`reshape2` package](http://github.com/hadley/reshape).
+* By default, `variable` column is of type `factor`. Set `variable.factor` argument to `FALSE` if you'd like to return a *character* vector instead. `variable.factor` argument is only available in `melt` from `data.table` and not in the [`reshape2` package](https://github.com/hadley/reshape).
 
 * By default, the molten columns are automatically named `variable` and `value`.
 
@@ -97,17 +94,17 @@ str(DT.m1)
 #
 
 #### - Name the `variable` and `value` columns to `child` and `dob` respectively
- 
+
 
 ```{r}
-DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"), 
+DT.m1 = melt(DT, measure.vars = c("dob_child1", "dob_child2", "dob_child3"),
                variable.name = "child", value.name = "dob")
 DT.m1
 ```
 
 #### {.bs-callout .bs-callout-info}
 
-* By default, when one of `id.vars` or `measure.vars` is missing, the rest of the columns are *automatically assigned* to the missing argument. 
+* By default, when one of `id.vars` or `measure.vars` is missing, the rest of the columns are *automatically assigned* to the missing argument.
 
 * When neither `id.vars` nor `measure.vars` are specified, as mentioned under `?melt`, all *non*-`numeric`, `integer`, `logical` columns will be assigned to `id.vars`.
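A quick sketch of this rule, using a hypothetical toy table (`tmp` is illustrative and not part of the data above): the character column is guessed as `id.vars`, and the remaining integer columns are melted.

```r
library(data.table)

# toy table: the character column 'name' is guessed as id.vars,
# the integer columns 'a' and 'b' are melted (melt warns about the guess)
tmp = data.table(name = c("x", "y"), a = 1:2, b = 3:4)
melt(tmp)
```

The result has one row per `name`/measure-column combination, with the default `variable` and `value` columns.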
 
@@ -117,7 +114,7 @@ DT.m1
 
 In the previous section, we saw how to get from wide form to long form. Let's see the reverse operation in this section.
 
-#### - How can we get back to the original data table `DT` from `DT.m`? 
+#### - How can we get back to the original data table `DT` from `DT.m`?
 
 That is, we'd like to collect all *child* observations corresponding to each `family_id, age_mother` together under the same row. We can accomplish it using `dcast` as follows:
 
@@ -127,16 +124,16 @@ dcast(DT.m1, family_id + age_mother ~ child, value.var = "dob")
 
 #### {.bs-callout .bs-callout-info}
 
-* `dcast` uses *formula* interface. The variables on the *LHS* of formula represents the *id* vars and *RHS* the *measure*  vars. 
+* `dcast` uses the *formula* interface. The variables on the *LHS* of the formula represent the *id* vars and those on the *RHS* the *measure* vars.
 
 * `value.var` denotes the column to be filled in with while casting to wide format.
 
 * `dcast` also tries to preserve attributes in result wherever possible.
- 
-# 
+
+#
 
 #### - Starting from `DT.m`, how can we get the number of children in each family?
- 
+
 You can also pass a function to aggregate by in `dcast` with the argument `fun.aggregate`. This is particularly essential when the formula provided does not identify a single observation for each cell.
 
 ```{r}
@@ -154,14 +151,14 @@ However, there are situations we might run into where the desired operation is n
 ```{r}
 DT = fread("melt_enhanced.csv")
 DT
-## 1 = female, 2 = male 
-``` 
+## 1 = female, 2 = male
+```
 
-And you'd like to combine (melt) all the `dob` columns together, and `gender` columns together. Using the current functionalty, we can do something like this:
+And you'd like to combine (melt) all the `dob` columns together, and `gender` columns together. Using the current functionality, we can do something like this:
 
 ```{r}
 DT.m1 = melt(DT, id = c("family_id", "age_mother"))
-DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed=TRUE)]
+DT.m1[, c("variable", "child") := tstrsplit(variable, "_", fixed = TRUE)]
 DT.c1 = dcast(DT.m1, family_id + age_mother + child ~ variable, value.var = "value")
 DT.c1
 
@@ -185,7 +182,7 @@ str(DT.c1) ## gender column is character type now!
 In fact, `base::reshape` is capable of performing this operation in a very straightforward manner. It is an extremely useful and often underrated function. You should definitely give it a try!
 
 ## 3. Enhanced (new) functionality
-		
+
 ### a) Enhanced `melt`
 
 Since we'd like *data.tables* to perform this operation straightforwardly and efficiently using the same interface, we went ahead and implemented *additional functionality*, where we can `melt` to multiple columns *simultaneously*.
@@ -195,13 +192,13 @@ Since we'd like for *data.tables* to perform this operation straightforward and
 The idea is quite simple. We pass a list of columns to `measure.vars`, where each element of the list contains the columns that should be combined together.
 
 ```{r}
-colA = paste("dob_child", 1:3, sep="")
-colB = paste("gender_child", 1:3, sep="")
+colA = paste("dob_child", 1:3, sep = "")
+colB = paste("gender_child", 1:3, sep = "")
 DT.m2 = melt(DT, measure = list(colA, colB), value.name = c("dob", "gender"))
 DT.m2
 
 str(DT.m2) ## col type is preserved
-``` 
+```
 
 #### - Using `patterns()`
 
@@ -223,8 +220,8 @@ That's it!
 ### b) Enhanced `dcast`
 
 Okay great! We can now melt into multiple columns simultaneously. Now given the data set `DT.m2` as shown above, how can we get back to the same format as the original data we started with?
- 
-If we use the current functionality of `dcast`, then we'd have to cast twice and bind the results together. But that's once again verbose, not straightforward and is also inefficient. 
+
+If we use the current functionality of `dcast`, then we'd have to cast twice and bind the results together. But that's once again verbose, not straightforward, and also inefficient.
 
 #### - Casting multiple `value.var`s simultaneously
 
@@ -242,13 +239,13 @@ DT.c2
 
 * Everything is taken care of internally, and efficiently. In addition to being fast, it is also very memory efficient.
 
-# 
+#
 
 #### Multiple functions to `fun.aggregate`: {.bs-callout .bs-callout-info}
 
 You can also provide *multiple functions* to `fun.aggregate` to `dcast` for *data.tables*. Check the examples in `?dcast` which illustrates this functionality.
 
-# 
+#
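As a brief sketch of this feature (with hypothetical toy data, not the `DT` used above), passing a list of functions to `fun.aggregate` yields one output column per function per RHS value:

```r
library(data.table)

# hypothetical toy data: cast 'z' with two aggregation functions at once
DT = data.table(x = c("a", "a", "b", "b"), y = c(1L, 2L, 1L, 2L), z = 1:4)
dcast(DT, x ~ y, fun.aggregate = list(sum, mean), value.var = "z")
# one row per 'x' value; one column per (function, y-value) pair
```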
 
 ***
 
diff --git a/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
new file mode 100644
index 0000000..a880625
--- /dev/null
+++ b/vignettes/datatable-secondary-indices-and-auto-indexing.Rmd
@@ -0,0 +1,327 @@
+---
+title: "Secondary indices and auto indexing"
+date: "`r Sys.Date()`"
+output: 
+  rmarkdown::html_vignette
+vignette: >
+  %\VignetteIndexEntry{Secondary indices and auto indexing}
+  %\VignetteEngine{knitr::rmarkdown}
+  \usepackage[utf8]{inputenc}
+---
+
+```{r, echo = FALSE, message = FALSE}
+require(data.table)
+knitr::opts_chunk$set(
+  comment = "#",
+    error = FALSE,
+     tidy = FALSE,
+    cache = FALSE,
+ collapse = TRUE)
+```
+
+This vignette assumes that the reader is familiar with data.table's `[i, j, by]` syntax, and how to perform fast key based subsets. If you're not familiar with these concepts, please read the *"Introduction to data.table"*, *"Reference semantics"* and *"Keys and fast binary search based subset"* vignettes first.
+
+***
+
+## Data {#data}
+
+We will use the same `flights` data as in the *"Introduction to data.table"* vignette.
+
+```{r echo = FALSE}
+options(width = 100L)
+```
+
+```{r}
+flights <- fread("flights14.csv")
+head(flights)
+dim(flights)
+```
+
+## Introduction
+
+In this vignette, we will
+
+* discuss *secondary indices* and provide rationale as to why we need them by citing cases where setting keys is not necessarily ideal,
+
+* perform fast subsetting, once again, but using the new `on` argument, which computes secondary indices internally for the task (temporarily), and reuses them if they already exist,
+
+* and finally look at *auto indexing* which goes a step further and creates secondary indices automatically, but does so on native R syntax for subsetting.
+
+## 1. Secondary indices
+
+### a) What are secondary indices?
+
+Secondary indices are similar to `keys` in *data.table*, except for two major differences:
+
+* They *don't* physically reorder the entire data.table in RAM. Instead, they only compute the order for the set of columns provided and store that *order vector* in an additional attribute called `index`.
+
+* There can be more than one secondary index for a data.table (as we will see below).
+
+### b) Set and get secondary indices
+
+#### -- How can we set the column `origin` as a secondary index in the *data.table* `flights`?
+
+```{r}
+setindex(flights, origin)
+head(flights)
+
+## alternatively we can provide character vectors to the function 'setindexv()'
+# setindexv(flights, "origin") # useful to program with
+
+# 'index' attribute added
+names(attributes(flights))
+```
+
+* `setindex()` and `setindexv()` allow adding a secondary index to the data.table.
+
+* Note that `flights` is **not** physically reordered in increasing order of `origin`, as would have been the case with `setkey()`.
+
+* Also note that the attribute `index` has been added to `flights`. 
+
+* `setindex(flights, NULL)` would remove all secondary indices.
+
+#### -- How can we get all the secondary indices set so far in `flights`?
+
+```{r}
+indices(flights)
+
+setindex(flights, origin, dest)
+indices(flights)
+```
+
+* The function `indices()` returns all current secondary indices in the data.table. If none exists, `NULL` is returned.
+
+* Note that by creating another index on the columns `origin, dest`, we do not lose the first index created on the column `origin`, i.e., we can have multiple secondary indices.
+
+### c) Why do we need secondary indices?
+
+#### -- Reordering a data.table can be expensive and not always ideal
+
+Consider the case where you would like to perform a fast key based subset on the `origin` column for the value "JFK". We'd do this as:
+
+```{r, eval = FALSE}
+## not run
+setkey(flights, origin)
+flights["JFK"] # or flights[.("JFK")]
+```
+
+#### `setkey()` requires: {.bs-callout .bs-callout-info}
+
+a) computing the order vector for the column(s) provided, here, `origin`, and
+
+b) reordering the entire data.table, by reference, based on the order vector computed.
+
+# 
+
+Computing the order isn't the time consuming part, since data.table uses true radix sorting on integer, character and numeric vectors. However, reordering the data.table could be time consuming (depending on the number of rows and columns).
+
+Unless our task involves repeated subsetting on the same column, the benefit of fast key based subsetting could effectively be nullified by the time to reorder, depending on our data.table dimensions.
+
+#### -- There can be only one `key` at the most
+
+Now if we would like to repeat the same operation but on `dest` column instead, for the value "LAX", then we have to `setkey()`, *again*. 
+
+```{r, eval = FALSE}
+## not run
+setkey(flights, dest)
+flights["LAX"]
+```
+
+And this reorders `flights` by `dest`, *again*. What we would really like is to be able to perform the fast subsetting by eliminating the reordering step. 
+
+And this is precisely what *secondary indices* allow for!
+
+#### -- Secondary indices can be reused
+
+Since there can be multiple secondary indices, and creating an index is as simple as storing the order vector as an attribute, this allows us to even eliminate the time to recompute the order vector if an index already exists.
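To make the "order vector stored as an attribute" point concrete, here is a minimal sketch on a hypothetical toy table (the data and column name are assumptions for illustration):

```r
library(data.table)

# toy table, deliberately unsorted
DT = data.table(origin = c("JFK", "LGA", "JFK", "EWR"))
setindex(DT, origin)

# the order vector lives inside the "index" attribute,
# under a name derived from the indexed column(s)
names(attributes(attr(DT, "index", exact = TRUE)))
```

No physical reordering happens here; `DT` keeps its original row order.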
+
+#### -- The new `on` argument allows for cleaner syntax and automatic creation and reuse of secondary indices
+
+As we will see in the next section, the `on` argument provides several advantages:
+
+#### `on` argument {.bs-callout .bs-callout-info}
+
+* enables subsetting by computing secondary indices on the fly. This eliminates having to do `setindex()` every time.
+
+* allows easy reuse of existing indices by just checking the attributes.
+
+* allows for a cleaner syntax by having the columns on which the subset is performed as part of the syntax. This makes the code easier to follow when revisiting it later.
+
+    Note that the `on` argument can also be used on keyed subsets. In fact, we encourage providing the `on` argument even when subsetting using keys, for better readability.
+
+# 
+
+## 2. Fast subsetting using `on` argument and secondary indices
+
+### a) Fast subsets in `i`
+
+#### -- Subset all rows where the origin airport matches *"JFK"* using `on`
+
+```{r}
+flights["JFK", on = "origin"]
+
+## alternatively
+# flights[.("JFK"), on = "origin"] (or) 
+# flights[list("JFK"), on = "origin"]
+```
+
+* This statement performs a fast binary search based subset as well, by computing the index on the fly. However, note that it doesn't save the index as an attribute automatically. This may change in the future.
+
+* If we had already created a secondary index, using `setindex()`, then `on` would reuse it instead of (re)computing it. We can see that by using `verbose = TRUE`:
+
+    ```{r}
+    setindex(flights, origin)
+    flights["JFK", on = "origin", verbose = TRUE][1:5]
+    ```
+
+#### -- How can I subset based on `origin` *and* `dest` columns?
+
+For example, if we want to subset `"JFK", "LAX"` combination, then:
+
+```{r}
+flights[.("JFK", "LAX"), on = c("origin", "dest")][1:5]
+```
+
+* The `on` argument accepts a character vector of column names corresponding to the order provided to the `i` argument.
+
+* Since the time to compute the secondary index is quite small, we don't have to use `setindex()`, unless, once again, the task involves repeated subsetting on the same column.
+
+### b) Select in `j`
+
+All the operations we will discuss below are no different from the ones we already saw in the *Keys and fast binary search based subset* vignette, except that we'll be using the `on` argument instead of setting keys.
+
+#### -- Return `arr_delay` column alone as a data.table corresponding to `origin = "LGA"` and `dest = "TPA"`
+
+```{r}
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")]
+```
+
+### c) Chaining
+
+#### -- On the result obtained above, use chaining to order the column in decreasing order.
+
+```{r}
+flights[.("LGA", "TPA"), .(arr_delay), on = c("origin", "dest")][order(-arr_delay)]
+```
+
+### d) Compute or *do* in `j`
+
+#### -- Find the maximum arrival delay corresponding to `origin = "LGA"` and `dest = "TPA"`.
+
+```{r}
+flights[.("LGA", "TPA"), max(arr_delay), on = c("origin", "dest")]
+```
+
+### e) *sub-assign* by reference using `:=` in `j`
+
+We have seen this example already in the *Reference semantics* and *Keys and fast binary search based subset* vignettes. Let's take a look at all the `hours` available in the `flights` *data.table*:
+
+```{r}
+# get all 'hours' in flights
+flights[, sort(unique(hour))]
+```
+
+We see that there are `25` unique values in total in the data. Both *0* and *24* hours seem to be present. Let's go ahead and replace *24* with *0*, but this time using `on` instead of setting keys.
+
+```{r}
+flights[.(24L), hour := 0L, on = "hour"]
+```
+
+Now, let's check if `24` is replaced with `0` in the `hour` column.
+
+```{r}
+flights[, sort(unique(hour))]
+```
+
+* This is a particularly huge advantage of secondary indices. Previously, just to update a few rows of `hour`, we had to `setkey()` on it, which inevitably reorders the entire data.table. With `on`, the order is preserved, and the operation is much faster! Looking at the code, the task we wanted to perform is also quite clear.
+
+### f) Aggregation using `by`
+
+#### -- Get the maximum departure delay for each `month` corresponding to `origin = "JFK"`. Order the result by `month`
+
+```{r}
+ans <- flights["JFK", max(dep_delay), keyby = month, on = "origin"]
+head(ans)
+```
+
+* If we had not used `on`, which internally builds secondary indices on the fly, we would have had to set the `key` back to `origin, dest` again.
+
+### g) The *mult* argument
+
+The other arguments, including `mult`, work exactly the same way as we saw in the *Keys and fast binary search based subset* vignette. The default value for `mult` is "all". We can choose, instead, that only the "first" or "last" matching row be returned.
+
+#### -- Subset only the first matching row where `dest` matches *"BOS"* and *"DAY"*
+
+```{r}
+flights[c("BOS", "DAY"), on = "dest", mult = "first"]
+```
+
+#### -- Subset only the last matching row where `origin` matches *"LGA", "JFK", "EWR"* and `dest` matches *"XNA"*
+
+```{r}
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), on = c("origin", "dest"), mult = "last"]
+```
+
+### h) The *nomatch* argument
+
+We can choose if queries that do not match should return `NA` or be skipped altogether using the `nomatch` argument.
+
+#### -- From the previous example, subset all rows only if there's a match
+
+```{r}
+flights[.(c("LGA", "JFK", "EWR"), "XNA"), mult = "last", on = c("origin", "dest"), nomatch = 0L]
+```
+
+* There are no flights connecting "JFK" and "XNA". Therefore, that row is skipped in the result.
+
+## 3. Auto indexing
+
+First we looked at fast subsetting via binary search using *keys*. Then we figured out that we could improve performance even further and have cleaner syntax by using secondary indices. What could be better than that? The answer is to optimise *native R syntax* to use secondary indices internally, so that we can have the same performance without having to use new syntax.
+
+That is what *auto indexing* does. At the moment, it is implemented only for the binary operators `==` and `%in%`, and only on a single column. An index is automatically created *and* saved as an attribute. That is, unlike the `on` argument, which computes the index on the fly each time, here a secondary index is created and saved for reuse.
+
+Let's start by creating a data.table big enough to highlight the advantage.
+
+```{r}
+set.seed(1L)
+dt = data.table(x = sample(1e5L, 1e7L, TRUE), y = runif(100L))
+print(object.size(dt), units = "Mb")
+```
+
+When we use `==` or `%in%` on a single column for the first time, a secondary index is created automatically, and it is used to perform the subset.
+
+```{r}
+## have a look at all the attribute names
+names(attributes(dt))
+
+## run the first time
+(t1 <- system.time(ans <- dt[x == 989L]))
+head(ans)
+
+## secondary index is created
+names(attributes(dt))
+
+indices(dt)
+```
+
+The time to subset the first time is the time to create the index + the time to subset. Since creating a secondary index involves only creating the order vector, this combined operation is faster than vector scans in many cases. But the real advantage comes in successive subsets. They are extremely fast.
+
+```{r}
+## successive subsets
+(t2 <- system.time(dt[x == 989L]))
+system.time(dt[x %in% 1989:2012])
+```
+
+* Running the first time took `r sprintf("%.3f", t1["elapsed"])` seconds whereas the second time took `r sprintf("%.3f", t2["elapsed"])` seconds.
+
+* Auto indexing can be disabled by setting the global option `options(datatable.auto.index = FALSE)`.
+
+* Disabling auto indexing still allows using indices created explicitly with `setindex()` or `setindexv()`. You can disable indices fully by setting the global option `options(datatable.use.index = FALSE)`.
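For example, both options can be toggled at run time (shown here with the defaults restored at the end):

```r
# turn off automatic creation of indices, but keep using existing ones
options(datatable.auto.index = FALSE)

# turn off the use of any index (subsets fall back to vector scans)
options(datatable.use.index = FALSE)

# restore the defaults
options(datatable.auto.index = TRUE, datatable.use.index = TRUE)
```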
+
+# 
+
+In the future, we plan to extend auto indexing to expressions involving more than one column. We are also working on extending binary search to more binary operators, such as `<`, `<=`, `>` and `>=`; once done, it would be straightforward to extend auto indexing to these operators as well.
+
+We will extend fast *subsets* using keys and secondary indices to *joins* in the next vignette, *"Joins and rolling joins"*.
+
+***

-- 
Alioth's /usr/local/bin/git-commit-notice on /srv/git.debian.org/git/debian-med/r-cran-data.table.git


