document use of sparse data in recipes
EmilHvitfeldt committed Sep 10, 2024
1 parent a9f8d0d commit a954200
Showing 6 changed files with 52 additions and 6 deletions.
9 changes: 6 additions & 3 deletions R/recipe.R
@@ -41,7 +41,8 @@ recipe.default <- function(x, ...) {
#' `cbind`; see Examples). A model formula may not be the best choice for
#' high-dimensional data with many columns, because of problems with memory.
#' @param x,data A data frame, tibble, or sparse matrix from the `Matrix`
-#' package of the *template* data set.
+#' package of the *template* data set. See [sparse_data] for more information
+#' about use of sparse data.
#' (see below).
#' @return An object of class `recipe` with sub-objects:
#' \item{var_info}{A tibble containing information about the original data
@@ -321,7 +322,8 @@ prep <- function(x, ...) {
#' parameters from a training set that can be later applied to other data
#' sets.
#' @param training A data frame, tibble, or sparse matrix from the `Matrix`
-#' package, that will be used to estimate parameters for preprocessing.
+#' package, that will be used to estimate parameters for preprocessing. See
+#' [sparse_data] for more information about use of sparse data.
#' @param fresh A logical indicating whether already trained operation should be
#' re-trained. If `TRUE`, you should pass in a data set to the argument
#' `training`.
@@ -605,7 +607,8 @@ bake <- function(object, ...) {
#' @param new_data A data frame, tibble, or sparse matrix from the `Matrix`
#' package for whom the preprocessing will be applied. If `NULL` is given to
#' `new_data`, the pre-processed _training data_ will be returned (assuming
-#' that `prep(retain = TRUE)` was used).
+#' that `prep(retain = TRUE)` was used). See [sparse_data] for more
+#' information about use of sparse data.
#' @param ... One or more selector functions to choose which variables will be
#' returned by the function. See [selections()] for more details.
#' If no selectors are given, the default is to use
19 changes: 19 additions & 0 deletions R/sparsevctrs.R
@@ -1,3 +1,22 @@
+#' Using sparse data with recipes
+#'
+#' [recipe()], [prep()], and [bake()] all accept sparse tibbles from the
+#' `sparsevctrs` package and sparse matrices from the `Matrix` package. Sparse
+#' matrices are converted to sparse tibbles internally as each step expects a
+#' tibble as its input, and is expected to return a tibble as well.
+#'
+#' Several steps work with sparse data. A step can either work with sparse
+#' data, ruin sparsity, or create sparsity. The documentation for each step
+#' will indicate whether it will work with sparse data or create sparse columns.
+#' If nothing is listed it is assumed to ruin sparsity.
+#'
+#' Sparse tibbles or data.frames will be returned from [bake()] if sparse columns
+#' are present in data, either from being generated in steps or because sparse
+#' data was passed into [recipe()], [prep()], or [bake()].
+#'
+#' @name sparse_data
+NULL
+
is_sparse_tibble <- function(x) {
any(vapply(x, sparsevctrs::is_sparse_vector, logical(1)))
}
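
As an illustration of the behaviour documented under `sparse_data`, here is a minimal sketch that is not part of the commit. It assumes that `Matrix`, `sparsevctrs`, and a version of `recipes` that includes this change are installed. The toy matrix `x` and the names `sparse_tbl`, `rec`, and `baked` are made up for the example, and `sparsevctrs::coerce_to_sparse_tibble()` is used on the assumption that it mirrors the internal matrix-to-tibble conversion described above.

```r
library(recipes)

# Hypothetical toy data: a 5 x 3 sparse matrix with three non-zero entries.
# Column names are supplied because the tibble conversion relies on them.
x <- Matrix::sparseMatrix(
  i = c(1, 3, 5),
  j = c(1, 2, 3),
  x = c(2.5, 1, 4),
  dims = c(5, 3),
  dimnames = list(NULL, c("a", "b", "c"))
)

# Same helper as the one shown in R/sparsevctrs.R: TRUE if any column is a
# sparse vector.
is_sparse_tibble <- function(x) {
  any(vapply(x, sparsevctrs::is_sparse_vector, logical(1)))
}

# Explicit conversion, standing in for what the docs say happens internally.
sparse_tbl <- sparsevctrs::coerce_to_sparse_tibble(x)
is_sparse_tibble(sparse_tbl)

# The sparse matrix is passed directly to recipe(), prep(), and bake().
rec <- recipe(x)
rec <- prep(rec, training = x)
baked <- bake(rec, new_data = x)

# Check whether the baked result still carries sparse columns.
is_sparse_tibble(baked)
```

No steps are added here, so nothing in the recipe should ruin sparsity. With real steps, each step's documentation states whether it keeps or creates sparse columns.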
3 changes: 2 additions & 1 deletion man/bake.Rd

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

3 changes: 2 additions & 1 deletion man/prep.Rd

3 changes: 2 additions & 1 deletion man/recipe.Rd

21 changes: 21 additions & 0 deletions man/sparse_data.Rd
