Reuse connection to extract, consume and execute substrait query plans #113

Open
wants to merge 9 commits into
base: main

Collaborator

@pdet pdet commented Sep 24, 2024

Up until now, we have been creating new connections to interact with substrait. This was necessary to circumvent the client context lock.

However, this was a brittle solution, and especially problematic for temporary objects, since these are not visible to other connections.
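To illustrate the lock constraint described above, here is a minimal sketch of why nested work on one connection needs care. `ToyContext`, `Run`, and `RunNestedOnSameContext` are hypothetical names invented for this illustration, not DuckDB's classes; the sketch only assumes that the context lock is a non-recursive mutex and that a caller can say whether it already holds it.

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical stand-in for a client context guarded by a non-recursive lock.
struct ToyContext {
    std::mutex context_lock;
    std::vector<std::string> log;

    // Runs `work`, taking the lock only if the caller does not already hold it.
    void Run(const std::string &name, bool acquire_lock, const std::function<void()> &work) {
        std::unique_lock<std::mutex> guard(context_lock, std::defer_lock);
        if (acquire_lock) {
            guard.lock();
        }
        log.push_back(name);
        if (work) {
            work();
        }
    }
};

// Runs an outer step that needs a nested step on the same context.
inline std::vector<std::string> RunNestedOnSameContext() {
    ToyContext context;
    context.Run("outer", true, [&context] {
        // The outer call already holds the lock, so the nested call must not
        // lock again: re-locking a std::mutex from the same thread is
        // undefined behavior and typically deadlocks.
        context.Run("inner", false, nullptr);
    });
    return context.log;
}
```

Opening a second connection sidesteps the problem because it comes with its own lock, but it also comes with its own temporary objects, which is the drawback noted above.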

This PR lets us reuse the same connection through a new, flexible relation binder that can consume and execute substrait plans from within that connection. It also changes the from_substrait functions to use bind replace.

Additionally, plan extraction now goes through a specialized code path so that it can use the same connection.

Since substrait also uses special options (e.g., some optimizations are disabled), these are set and reset on the original connection.
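One way to set and reset options on a shared connection is a scope guard that restores the previous value when it goes out of scope. The sketch below assumes a toy settings map; `ToyConnection`, `ScopedOption`, `PlanWithOptionDisabled`, and the option name `toy_optimizer_pass` are all hypothetical and do not reflect DuckDB's configuration API or how this PR implements the reset.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <utility>

// Hypothetical stand-in for per-connection settings; not DuckDB's config API.
struct ToyConnection {
    std::map<std::string, bool> options;
};

// Overrides one option on construction and restores the previous value on
// destruction, so the override is undone even if the guarded code throws.
// An option that was absent is read as false and restored as false.
class ScopedOption {
public:
    ScopedOption(ToyConnection &con, std::string name, bool value)
        : con_(con), name_(std::move(name)), previous_(con.options[name_]) {
        con_.options[name_] = value;
    }
    ~ScopedOption() {
        con_.options[name_] = previous_;
    }
    ScopedOption(const ScopedOption &) = delete;
    ScopedOption &operator=(const ScopedOption &) = delete;

private:
    ToyConnection &con_;
    std::string name_;
    bool previous_;
};

// Returns the option value that is in effect while the plan would be handled.
inline bool PlanWithOptionDisabled(ToyConnection &con) {
    ScopedOption guard(con, "toy_optimizer_pass", false);
    return con.options["toy_optimizer_pass"];
}
```

Because the guard restores the value in its destructor, the caller's connection keeps its own setting after the call returns, which matters more once the user's connection is the one being modified.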

vector<Value> parameters {Value::LIST(parquet_files)};
auto scan_rel = make_shared_ptr<TableFunctionRelation>(
    context, "parquet_scan", parameters, std::move(named_parameters), nullptr, true, acquire_lock);
auto rel = static_cast<Relation *>(scan_rel.get());

Probably beyond the scope of this PR but does this handle emitting multiple names for struct types?

Collaborator Author

Hey David,
I don't think this PR will alter anything related to multiple names for structs. The goal here is to support temporary objects (e.g., a pyarrow object that has been registered in the python client) in substrait!
