From 41c481f41ad322341d0698d001b4af5c98c5dbac Mon Sep 17 00:00:00 2001
From: Bryce Mecum
Date: Sat, 14 Sep 2024 12:38:47 -0700
Subject: [PATCH] GH-44069: [Docs][R] Add note to to_arrow() docs about collect/compute (#44094)

### Rationale for this change

Improves the documentation for the `to_arrow()` function for the use case referenced in https://github.com/apache/arrow/issues/44069.

### What changes are included in this PR?

Just docs.

### Are these changes tested?

Yes. Built and tested locally.

### Are there any user-facing changes?

Just docs.

* GitHub Issue: #44069

Authored-by: Bryce Mecum
Signed-off-by: Nic Crane
---
 r/R/duckdb.R      | 8 +++++++-
 r/man/to_arrow.Rd | 9 ++++++++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/r/R/duckdb.R b/r/R/duckdb.R
index a2bf62de2fde2..65c70243e7ab3 100644
--- a/r/R/duckdb.R
+++ b/r/R/duckdb.R
@@ -137,7 +137,13 @@ duckdb_disconnector <- function(con, tbl_name) {
 
 #' Create an Arrow object from a DuckDB connection
 #'
-#' This can be used in pipelines that pass data back and forth between Arrow and DuckDB
+#' This can be used in pipelines that pass data back and forth between Arrow and
+#' DuckDB.
+#'
+#' Note that you can only call `collect()` or `compute()` on the result of this
+#' function once. To work around this limitation, you should either only call
+#' `collect()` as the final step in a pipeline or call `as_arrow_table()` on the
+#' result to materialize the entire Table in-memory.
 #'
 #' @param .data the object to be converted
 #' @return A `RecordBatchReader`.
diff --git a/r/man/to_arrow.Rd b/r/man/to_arrow.Rd
index aed40609a5161..87b8fea36eeda 100644
--- a/r/man/to_arrow.Rd
+++ b/r/man/to_arrow.Rd
@@ -13,7 +13,14 @@ to_arrow(.data)
 A \code{RecordBatchReader}.
 }
 \description{
-This can be used in pipelines that pass data back and forth between Arrow and DuckDB
+This can be used in pipelines that pass data back and forth between Arrow and
+DuckDB.
+}
+\details{
+Note that you can only call \code{collect()} or \code{compute()} on the result of this
+function once. To work around this limitation, you should either only call
+\code{collect()} as the final step in a pipeline or call \code{as_arrow_table()} on the
+result to materialize the entire Table in-memory.
 }
 \examples{
 \dontshow{if (getFromNamespace("run_duckdb_examples", "arrow")()) (if (getRversion() >= "3.4") withAutoprint else force)(\{ # examplesIf}
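A minimal sketch of the `as_arrow_table()` workaround described in the new note, assuming the arrow, dplyr, duckdb, and dbplyr packages are installed. The `mtcars` summary and the variable names (`reader`, `summary_tbl`, `n_groups`, `high_mpg`) are illustrative only and are not part of this change.

```r
library(arrow, warn.conflicts = FALSE)
library(dplyr, warn.conflicts = FALSE)
# to_duckdb() additionally requires the duckdb and dbplyr packages.

# Aggregate in DuckDB, then hand the result back to Arrow.
# to_arrow() returns a stream-backed object, so it can be consumed only once.
reader <- arrow_table(mtcars) %>%
  to_duckdb() %>%
  group_by(cyl) %>%
  summarize(mean_mpg = mean(mpg, na.rm = TRUE)) %>%
  to_arrow()

# Workaround: materialize the stream into an in-memory Arrow Table.
# This consumes `reader`, so only `summary_tbl` is used from here on.
summary_tbl <- as_arrow_table(reader)

# The in-memory Table can be queried and collected as many times as needed.
n_groups <- summary_tbl %>% collect() %>% nrow()
high_mpg <- summary_tbl %>% filter(mean_mpg > 20) %>% collect()
```

When the result is needed only once, ending the pipeline with `to_arrow() %>% collect()` remains the simpler option, since it skips building the intermediate Arrow Table.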