[PERF] Predicate Pushdown into Scan Operator #1730

Merged
samster25 merged 13 commits into main from sammy/predicate-pushdown-into-scan
Dec 16, 2023

Conversation

samster25
Member

@samster25 samster25 commented Dec 15, 2023

  • Implements predicate pushdown into the scan operator when using native downloads
  • Implements Limit-Limit folding
  • Fixes a bug in statistics evaluation where bitwise ANDs and ORs were not being performed
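
For readers unfamiliar with the term, Limit-Limit folding collapses directly nested limits into a single limit that carries the smaller bound. The following is a minimal sketch of that idea only; the `Plan` enum and `fold_limits` function are hypothetical names invented for illustration and are not the types or the rule added in this PR.

```rust
// Hypothetical, simplified logical plan (an assumption, not this project's types).
#[derive(Debug, Clone, PartialEq)]
enum Plan {
    Source { name: String },
    Limit { input: Box<Plan>, limit: u64 },
}

/// Collapse directly nested limits into one limit with the smaller bound.
fn fold_limits(plan: Plan) -> Plan {
    match plan {
        Plan::Limit { input, limit } => match fold_limits(*input) {
            // The child is also a limit: keep a single node with min(outer, inner).
            Plan::Limit { input: inner, limit: inner_limit } => Plan::Limit {
                input: inner,
                limit: limit.min(inner_limit),
            },
            // The child is something else: keep the limit as-is.
            other => Plan::Limit { input: Box::new(other), limit },
        },
        other => other,
    }
}

fn main() {
    let source = Plan::Source { name: "t".to_string() };
    let nested = Plan::Limit {
        input: Box::new(Plan::Limit { input: Box::new(source), limit: 5 }),
        limit: 10,
    };
    println!("{:?}", fold_limits(nested));
}
```

Taking the minimum is safe because the outer limit can never return more rows than the inner one produces.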


codecov bot commented Dec 15, 2023

Codecov Report

Merging #1730 (b6d6669) into main (b833b25) will increase coverage by 0.00%.
Report is 1 commits behind head on main.
The diff coverage is n/a.

@@           Coverage Diff           @@
##             main    #1730   +/-   ##
=======================================
  Coverage   85.02%   85.02%           
=======================================
  Files          55       55           
  Lines        5515     5517    +2     
=======================================
+ Hits         4689     4691    +2     
  Misses        826      826           

see 2 files with indirect coverage changes

@samster25 samster25 changed the title [PERF] predicate pushdown rule into scan [PERF] Predicate Pushdown into Scan Operator Dec 15, 2023
Contributor

@clarkzinzow clarkzinzow left a comment

LGTM, feel free to disregard the nits around the plan reprs, since we have a decent bit of general reworking to do there anyways!

@@ -53,8 +53,7 @@ impl Source {
 partitioning_keys,
 pushdowns,
 })) => {
-res.push("Source:".to_string());
-res.push(format!("Scan op = {}", scan_op));
+res.push(format!("Source: Operator = {}", scan_op));
Contributor

Is this more readable when visualizing the plan? In the past, we've tried to keep the first line of the logical op repr super concise, essentially just a name for the logical op, but it looks like the scan operator repr can be pretty long:

write!(f, "AnonymousScanOperator: File paths=[{}], Format-specific config = {:?}, Storage config = {:?}", self.files.join(", "), self.file_format_config, self.storage_config)

IMO each string in this returned vec should be pretty atomic/granular, and it should be up to the display mode (e.g. tree plan visualization vs. single-line summary) to condense it as desired.
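
To make the suggestion concrete, here is a rough sketch of keeping repr lines atomic and letting the display mode decide how to condense them. `DisplayMode` and `render` are hypothetical names made up for this illustration; they are an assumption and not part of the codebase under review.

```rust
/// Hypothetical display modes (an assumption for illustration).
enum DisplayMode {
    Tree,
    SingleLine,
}

/// Render atomic repr lines according to the chosen display mode.
/// The first line is treated as the operator name.
fn render(lines: &[String], mode: DisplayMode) -> String {
    match mode {
        // Tree view: name on the first line, details indented beneath it.
        DisplayMode::Tree => {
            let mut out = String::new();
            for (i, line) in lines.iter().enumerate() {
                if i > 0 {
                    out.push_str("\n  ");
                }
                out.push_str(line);
            }
            out
        }
        // Summary view: everything condensed onto one line.
        DisplayMode::SingleLine => lines.join(", "),
    }
}

fn main() {
    let lines = vec![
        "Source".to_string(),
        "Operator = AnonymousScanOperator".to_string(),
        "File paths = [a.parquet]".to_string(),
    ];
    println!("{}", render(&lines, DisplayMode::Tree));
    println!("{}", render(&lines, DisplayMode::SingleLine));
}
```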

Member Author

It does make it more readable imo! But the main reason to do this was to bring it much closer to the Legacy Source repr, which makes the repr-based tests much easier to write. Both should be about the same length.

Comment on lines +100 to +114
predicate.apply(&mut |e: &Expr| {
    match e {
        #[cfg(feature = "python")]
        Expr::Function{func: FunctionExpr::Python(..), .. } => {
            has_udf = true;
            Ok(VisitRecursion::Stop)
        },
        Expr::Function{func: FunctionExpr::Uri(..), .. } => {
            has_udf = true;
            Ok(VisitRecursion::Stop)
        },
        _ => Ok(VisitRecursion::Continue)
    }
})?;
Contributor

So nice!
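
For context, the snippet under review walks the predicate's expression tree and stops as soon as it finds a UDF-like node. Below is a self-contained sketch of that visitor pattern. The `Expr` and `VisitRecursion` types and the `apply` and `contains_udf` functions here are simplified stand-ins written for illustration; they are assumptions and do not reproduce the project's real definitions, which return a `Result` and match on `FunctionExpr` variants.

```rust
// Simplified stand-in expression tree (an assumption, not the real Expr type).
enum Expr {
    Column(String),
    Literal(i64),
    BinaryOp { left: Box<Expr>, right: Box<Expr> },
    Udf { args: Vec<Expr> },
}

#[derive(PartialEq)]
enum VisitRecursion {
    Continue,
    Stop,
}

impl Expr {
    /// Pre-order traversal that halts as soon as the callback returns Stop.
    fn apply<F: FnMut(&Expr) -> VisitRecursion>(&self, f: &mut F) -> VisitRecursion {
        if f(self) == VisitRecursion::Stop {
            return VisitRecursion::Stop;
        }
        match self {
            Expr::BinaryOp { left, right } => {
                if left.apply(f) == VisitRecursion::Stop {
                    return VisitRecursion::Stop;
                }
                right.apply(f)
            }
            Expr::Udf { args } => {
                for arg in args {
                    if arg.apply(f) == VisitRecursion::Stop {
                        return VisitRecursion::Stop;
                    }
                }
                VisitRecursion::Continue
            }
            Expr::Column(_) | Expr::Literal(_) => VisitRecursion::Continue,
        }
    }
}

/// Returns true if any node in the predicate is a UDF.
fn contains_udf(predicate: &Expr) -> bool {
    let mut has_udf = false;
    predicate.apply(&mut |e: &Expr| match e {
        Expr::Udf { .. } => {
            has_udf = true;
            VisitRecursion::Stop
        }
        _ => VisitRecursion::Continue,
    });
    has_udf
}

fn main() {
    let predicate = Expr::BinaryOp {
        left: Box::new(Expr::Column("a".to_string())),
        right: Box::new(Expr::Udf { args: vec![Expr::Literal(1)] }),
    };
    println!("contains UDF: {}", contains_udf(&predicate));
}
```

Stopping early avoids visiting the rest of the tree once the answer is known.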

@@ -33,7 +33,7 @@ impl AnonymousScanOperator {

 impl Display for AnonymousScanOperator {
 fn fmt(&self, f: &mut std::fmt::Formatter<'_>) -> std::fmt::Result {
-write!(f, "{:#?}", self)
+write!(f, "AnonymousScanOperator: File paths=[{}], Format-specific config = {:?}, Storage config = {:?}", self.files.join(", "), self.file_format_config, self.storage_config)
Contributor

We may want to do the same for GlobScanOperator as well!

write!(f, "{:#?}", self)

Member Author

Yeah, was thinking that; can do that in a follow-up!

@samster25 samster25 merged commit fa782d8 into main Dec 16, 2023
49 checks passed
@samster25 samster25 deleted the sammy/predicate-pushdown-into-scan branch December 16, 2023 00:39