Expose a std::shared_ptr<arrow::Table> to R SEXP #36274
Comments
I think what you may be looking for is the C data interface. Arrow C++ can export a table as an ABI-stable stream of record batches. This is not quite the same as a table, but it lets you export the Table from the arrow R package and import it in C++ code elsewhere.
# These are specific to my system (homebrew on MacOS M1)
arrow_include <- "-I/opt/homebrew/Cellar/apache-arrow/12.0.0_1/include"
arrow_libs <- "-L/opt/homebrew/Cellar/apache-arrow/12.0.0_1/lib -larrow"
Sys.setenv("PKG_CXXFLAGS" = arrow_include)
Sys.setenv("PKG_LIBS" = arrow_libs)
cpp11::cpp_source(code = '
#include <arrow/table.h>
#include <arrow/c/bridge.h>
#include <cpp11.hpp>
using namespace arrow;
// Version that returns a Result<> so we can use Arrow C++-style error handling
// macros
Result<int> count_rows_internal(SEXP array_stream_xptr) {
auto array_stream = reinterpret_cast<struct ArrowArrayStream*>(
R_ExternalPtrAddr(array_stream_xptr));
ARROW_ASSIGN_OR_RAISE(auto reader, ImportRecordBatchReader(array_stream));
std::shared_ptr<Table> table;
ARROW_RETURN_NOT_OK(reader->ReadAll(&table));
return table->num_rows();
}
// Version that uses cpp11 error handling
[[cpp11::register]]
int count_rows(SEXP array_stream_xptr) {
Result<int> num_rows = count_rows_internal(array_stream_xptr);
if (num_rows.ok()) {
return *num_rows;
} else {
cpp11::stop("Arrow C++ error: %s", num_rows.status().ToString().c_str());
}
}
', cxx_std = "CXX17")
library(arrow, warn.conflicts = FALSE)
#> Some features are not enabled in this build of Arrow. Run `arrow_info()` for more information.
library(nanoarrow)
tab <- arrow_table(x = 1:10)
(array_stream <- as_nanoarrow_array_stream(tab))
#> <nanoarrow_array_stream struct<x: int32>>
#> $ get_schema:function ()
#> $ get_next :function (schema = x$get_schema(), validate = TRUE)
#> $ release :function ()
count_rows(array_stream)
#> [1] 10
Created on 2023-06-26 with reprex v2.0.2
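For readers new to the C data interface, here is a minimal sketch of the stream contract that the reprex above relies on. It needs no Arrow installation. The three struct layouts follow the Arrow C data and C stream interface specifications. Everything in the `toy` namespace, and the `count_rows_in_stream()` helper, is hypothetical and only for illustration: the toy producer emits batches that carry a length but no column data. In real code, `ImportRecordBatchReader()` plays the consumer role and `ExportRecordBatchReader()` the producer role.

```cpp
#include <cassert>
#include <cstdint>

// Struct layouts as given in the Arrow C data / C stream interface specs.
struct ArrowSchema {
  const char* format;
  const char* name;
  const char* metadata;
  int64_t flags;
  int64_t n_children;
  struct ArrowSchema** children;
  struct ArrowSchema* dictionary;
  void (*release)(struct ArrowSchema*);
  void* private_data;
};

struct ArrowArray {
  int64_t length;
  int64_t null_count;
  int64_t offset;
  int64_t n_buffers;
  int64_t n_children;
  const void** buffers;
  struct ArrowArray** children;
  struct ArrowArray* dictionary;
  void (*release)(struct ArrowArray*);
  void* private_data;
};

struct ArrowArrayStream {
  int (*get_schema)(struct ArrowArrayStream*, struct ArrowSchema* out);
  int (*get_next)(struct ArrowArrayStream*, struct ArrowArray* out);
  const char* (*get_last_error)(struct ArrowArrayStream*);
  void (*release)(struct ArrowArrayStream*);
  void* private_data;
};

// Hypothetical toy producer (not part of Arrow): yields a fixed number of
// zero-column struct batches so the consumer loop has something to drain.
namespace toy {

struct State {
  int batches_left;
  int64_t rows_per_batch;
};

// A released struct is marked by setting its release callback to null.
void release_array(ArrowArray* array) { array->release = nullptr; }
void release_schema(ArrowSchema* schema) { schema->release = nullptr; }

int get_schema(ArrowArrayStream*, ArrowSchema* out) {
  *out = ArrowSchema{};
  out->format = "+s";  // struct type, here with zero children
  out->name = "";
  out->release = &release_schema;
  return 0;
}

int get_next(ArrowArrayStream* stream, ArrowArray* out) {
  auto* state = static_cast<State*>(stream->private_data);
  *out = ArrowArray{};  // release == nullptr signals end of stream
  if (state->batches_left > 0) {
    static const void* no_buffers[1] = {nullptr};  // absent validity buffer
    --state->batches_left;
    out->length = state->rows_per_batch;
    out->n_buffers = 1;
    out->buffers = no_buffers;
    out->release = &release_array;
  }
  return 0;
}

const char* get_last_error(ArrowArrayStream*) { return nullptr; }

void release_stream(ArrowArrayStream* stream) {
  delete static_cast<State*>(stream->private_data);
  stream->release = nullptr;
}

// Fill a caller-allocated stream struct, as an exporter would.
void make_stream(ArrowArrayStream* out, int n_batches, int64_t rows_per_batch) {
  out->get_schema = &get_schema;
  out->get_next = &get_next;
  out->get_last_error = &get_last_error;
  out->release = &release_stream;
  out->private_data = new State{n_batches, rows_per_batch};
}

}  // namespace toy

// Hypothetical consumer: drains the stream, releases every batch and then the
// stream itself, and returns the total row count (-1 if get_next() fails).
int64_t count_rows_in_stream(ArrowArrayStream* stream) {
  assert(stream->release != nullptr);  // the stream must still be valid
  int64_t total = 0;
  for (;;) {
    ArrowArray batch;
    if (stream->get_next(stream, &batch) != 0) {
      total = -1;
      break;
    }
    if (batch.release == nullptr) break;  // end of stream
    total += batch.length;
    batch.release(&batch);
  }
  stream->release(stream);
  return total;
}
```

Calling `toy::make_stream(&stream, 2, 5)` and then `count_rows_in_stream(&stream)` drains two batches of five rows each. Once the consumer has released the stream, its `release` member is null. That is the same ownership hand-off that happens when an R external pointer to an `ArrowArrayStream` is passed into C++.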
Could you expand on the answer @paleolimbot? In my use case, for example, I have some C++ code which creates a
Sorry for the delay here... I was taking some time away from the keyboard. It seems like you are interested in the reverse problem... the reprex above demos taking an Arrow Table from R and doing a computation in C++ that doesn't return a Table. Below I've tweaked it a bit to illustrate the reverse process (i.e., if you have a Table in Arrow C++, how to communicate it back to the Arrow R package to get a Table object):
# These are specific to my system (homebrew on MacOS M1)
arrow_include <- "-I/opt/homebrew/Cellar/apache-arrow/12.0.1/include"
arrow_libs <- "-L/opt/homebrew/Cellar/apache-arrow/12.0.1/lib -larrow"
Sys.setenv("PKG_CXXFLAGS" = arrow_include)
Sys.setenv("PKG_LIBS" = arrow_libs)
cpp11::cpp_source(code = '
#include <arrow/table.h>
#include <arrow/c/bridge.h>
#include <cpp11.hpp>
using namespace arrow;
// Version that returns a Result<> so we can use Arrow C++-style error handling
// macros
Result<std::shared_ptr<Table>> array_stream_to_table(SEXP array_stream_xptr) {
auto array_stream = reinterpret_cast<struct ArrowArrayStream*>(
R_ExternalPtrAddr(array_stream_xptr));
ARROW_ASSIGN_OR_RAISE(auto reader, ImportRecordBatchReader(array_stream));
return reader->ToTable();
}
Status table_to_array_stream(const std::shared_ptr<Table>& table, SEXP array_stream_xptr) {
auto reader = std::make_shared<arrow::TableBatchReader>(table);
auto array_stream = reinterpret_cast<struct ArrowArrayStream*>(
R_ExternalPtrAddr(array_stream_xptr));
return ExportRecordBatchReader(reader, array_stream);
}
// Version that uses cpp11 error handling
[[cpp11::register]]
void slice_table(SEXP array_stream_xptr_in, int offset, int length, SEXP array_stream_xptr_out) {
Result<std::shared_ptr<Table>> maybe_input = array_stream_to_table(array_stream_xptr_in);
if (!maybe_input.ok()) {
cpp11::stop("Arrow C++ error: %s", maybe_input.status().ToString().c_str());
}
std::shared_ptr<Table> input = *maybe_input;
std::shared_ptr<Table> output = input->Slice(offset, length);
Status status = table_to_array_stream(output, array_stream_xptr_out);
if (!status.ok()) {
cpp11::stop("Arrow C++ error: %s", status.ToString().c_str());
}
}
', cxx_std = "CXX17")
library(arrow, warn.conflicts = FALSE)
library(nanoarrow)
# Prepare input
tab <- arrow_table(x = 1:10)
array_stream_in <- as_nanoarrow_array_stream(tab)
array_stream_out <- nanoarrow_allocate_array_stream()
# Call C++ function
slice_table(array_stream_in, 2, 7, array_stream_out)
# convert output to Table
as_arrow_table(as_record_batch_reader(array_stream_out))
#> Table
#> 7 rows x 1 columns
#> $x <int32>
Created on 2023-08-06 with reprex v2.0.2
@paleolimbot You mentioned that the Perhaps I am misunderstanding the Cheers,
I am actually not sure of the time complexity of
If you control the builds of both the Arrow R package and whatever C++ you're writing (e.g., via setting
Describe the usage question you have. Please include as many useful details as possible.
Hello,
I'm trying to build bindings around a library that exposes a std::shared_ptr<arrow::Table> to R scripts.
Is there a way to access the function in the arrow R package that converts an arrow::Table to an R SEXP?
Regards
Component(s)
R