diff --git a/man/assign.Rd b/man/assign.Rd
index b0c038349..80cd98420 100644
--- a/man/assign.Rd
+++ b/man/assign.Rd
@@ -35,13 +35,13 @@ set(x, i = NULL, j, value)
 }
 \arguments{
 \item{LHS}{ A character vector of column names (or numeric positions) or a variable that evaluates as such. If the column doesn't exist, it is added, \emph{by reference}. }
-\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column use \code{NULL}. }
+\item{RHS}{ A list of replacement values. It is recycled in the usual way to fill the number of rows satisfying \code{i}, if any. To remove a column, use \code{NULL}. A zero-length \code{RHS} other than \code{NULL} is an error, except when grouping with \code{by}: a zero-length \code{RHS} for a group is then a no-op for that group. }
 \item{x}{ A \code{data.table}. Or, \code{set()} accepts \code{data.frame}, too. }
 \item{i}{ Optional. Indicates the rows on which the values must be updated. If not \code{NULL}, implies \emph{all rows}. Missing or zero values are ignored. The \code{:=} form is more powerful as it allows adding/updating columns by reference based on \emph{subsets} and \code{joins}. See \code{Details}. In \code{set}, only integer type is allowed in \code{i} indicating which rows \code{value} should be assigned to. \code{NULL} represents all rows more efficiently than creating a vector such as \code{1:nrow(x)}. }
-\item{j}{ Column name(s) (character) or number(s) (integer) to be assigned \code{value} when column(s) already exist, and only column name(s) if they are to be created. }
-\item{value}{ A list of replacement values to assign by reference to \code{x[i, j]}. }
+\item{j}{ Column name(s) (character) or number(s) (integer). For \code{set}, these specify the columns of \code{x} to be updated; columns that do not yet exist can only be created by name. }
+\item{value}{ A list or vector of replacement values to be assigned by reference to \code{x[i, j]}. For \code{set}, if multiple columns are specified in \code{j}, \code{value} should be a list. }
 }
 \details{
 \code{:=} is defined for use in \code{j} only. It \emph{adds} or \emph{updates} or \emph{removes} column(s) by reference. It makes no copies of any part of memory at all. Please read \href{../doc/datatable-reference-semantics.html}{\code{vignette("datatable-reference-semantics")}} and follow with examples. Some typical usages are:
@@ -54,6 +54,7 @@ set(x, i = NULL, j, value)
   DT[i, colvector := val, with = FALSE] # OLD syntax. The contents of "colvector" in calling scope determine the column(s).
   DT[i, (colvector) := val] # same (NOW PREFERRED) shorthand syntax. The parens are enough to stop the LHS being a symbol; same as c(colvector).
   DT[i, colC := mean(colB), by = colA] # update (or add) column called "colC" by reference by group. A major feature of `:=`.
+  DT[, colD := if (.N > 2) sum(colB) else integer(0), by = colA] # groups with 2 or fewer rows return a zero-length RHS, which is a no-op for those groups
   DT[,`:=`(new1 = sum(colB), new2 = sum(colC))] # Functional form
   DT[, let(new1 = sum(colB), new2 = sum(colC))] # New alias for functional form.
 }
diff --git a/vignettes/datatable-reference-semantics.Rmd b/vignettes/datatable-reference-semantics.Rmd
index 8b93085a1..49d20ceb2 100644
--- a/vignettes/datatable-reference-semantics.Rmd
+++ b/vignettes/datatable-reference-semantics.Rmd
@@ -232,6 +232,10 @@ head(flights)
 
 * We could have also provided `by` with a *character vector* as we saw in the [`vignette("datatable-intro", package="data.table")`](datatable-intro.html) vignette, e.g., `by = c("origin", "dest")`.
 
+#### Note on zero-length RHS and `by`
+
+* If the `RHS` of a `:=` assignment evaluates to a zero-length vector other than `NULL` (which deletes the column), an error is normally raised. When `:=` is used with `by`, however, a zero-length result for a group is treated as a no-op: that group's rows are not assigned and no error is thrown. This lets a grouped update skip the groups for which there is nothing to assign.
+
 #
 ### e) Multiple columns and `:=`
 
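A minimal sketch of the behavior documented above, assuming a data.table release that includes it. The table `DT` and the columns `g`, `v`, `total` and `w` are hypothetical. Group `"a"` (3 rows) receives its sum, while group `"b"` (2 rows) returns a zero-length RHS and is skipped. Because `total` is a new column, the rows of group `"b"` keep the `NA` the column starts with. The same zero-length RHS without `by` is expected to raise an error.

```r
library(data.table)

# Hypothetical example table: group "a" has 3 rows, group "b" has 2.
DT <- data.table(g = c("a", "a", "a", "b", "b"), v = c(1, 2, 3, 4, 5))

# Grouped `:=`: groups with more than 2 rows get sum(v); smaller groups
# return numeric(0), a zero-length RHS, which is a no-op for that group.
DT[, total := if (.N > 2L) sum(v) else numeric(0), by = g]
print(DT$total)  # group "b" was never assigned, so its rows keep NA

# Without `by`, a zero-length RHS (other than NULL) is an error.
err <- tryCatch({
  DT[, w := numeric(0)]
  NULL
}, error = function(e) conditionMessage(e))
print(err)
```

Using `numeric(0)` rather than `integer(0)` in the sketch keeps both branches of the `if` the same type as `sum(v)` on a double column, so the choice of which group happens to run first does not affect the type of the new column.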