How does Daft do logical optimization? #2626
htcd-subham
Hi! Daft pushes filter predicates down into the scan. This lets us filter rows as we read the data instead of loading everything first and filtering afterwards, which makes the query much more memory efficient :)
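To make the idea concrete, here is a minimal toy sketch of a filter pushdown rule in plain Python. It is not Daft's actual optimizer code: the `Scan` and `Filter` classes, the `pushed_filter` field, the `push_down_filters` function, and the `data.pq` path are all hypothetical names made up for illustration. It only assumes that a logical plan is a tree and that a rewrite rule can fold a filter sitting directly above a scan into the scan itself.

```python
from dataclasses import dataclass, replace
from typing import Optional, Union


@dataclass(frozen=True)
class Scan:
    """Hypothetical scan node: reads files, optionally filtering while reading."""
    path: str
    pushed_filter: Optional[str] = None


@dataclass(frozen=True)
class Filter:
    """Hypothetical filter node: keeps rows of `child` matching `predicate`."""
    child: "Plan"
    predicate: str


Plan = Union[Scan, Filter]


def push_down_filters(plan: Plan) -> Plan:
    """Fold every Filter that sits directly on top of a Scan into that Scan."""
    if isinstance(plan, Filter):
        # Optimize the subtree first so stacked filters collapse one by one.
        child = push_down_filters(plan.child)
        if isinstance(child, Scan):
            if child.pushed_filter is None:
                combined = plan.predicate
            else:
                # Both predicates must hold, so combine them with a logical AND.
                combined = f"({child.pushed_filter}) & ({plan.predicate})"
            return replace(child, pushed_filter=combined)
        return Filter(child, plan.predicate)
    return plan


# Illustrative plan: a filter on `country` above a scan of a made-up path.
unoptimized = Filter(Scan("data.pq"), 'col(country) == lit("Canada")')
optimized = push_down_filters(unoptimized)
print(optimized)
```

After the rewrite, the standalone filter node is gone and the scan carries the predicate, which mirrors the `Filter pushdown = ...` line that appears in the optimized plan in the question.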
== Unoptimized Logical Plan ==
|
| Glob paths = [s3://daft-public-data/tutorials/10-min/sample-data-dog-owners-
| partitioned.pq/**]
| Coerce int96 timestamp unit = Nanoseconds
| IO config = S3 config = { Max connections = 8, Retry initial backoff ms = 1000,
| Connect timeout ms = 30000, Read timeout ms = 30000, Max retries = 25, Retry
| mode = adaptive, Anonymous = false, Use SSL = true, Verify SSL = true, Check
| hostname SSL = true, Requester pays = false, Force Virtual Addressing = false },
| Azure config = { Anonymous = false, Use SSL = true }, GCS config = { Anonymous =
| false }, HTTP config = { user_agent = daft/0.0.1 }
| Use multithreading = true
| File schema = first_name#Utf8, last_name#Utf8, age#Int64, DoB#Date,
| country#Utf8, has_dog#Boolean
| Partitioning keys = []
| Output schema = first_name#Utf8, last_name#Utf8, age#Int64, DoB#Date,
| country#Utf8, has_dog#Boolean
== Optimized Logical Plan ==
| Glob paths = [s3://daft-public-data/tutorials/10-min/sample-data-dog-owners-
| partitioned.pq/**]
| Coerce int96 timestamp unit = Nanoseconds
| IO config = S3 config = { Max connections = 8, Retry initial backoff ms = 1000,
| Connect timeout ms = 30000, Read timeout ms = 30000, Max retries = 25, Retry
| mode = adaptive, Anonymous = false, Use SSL = true, Verify SSL = true, Check
| hostname SSL = true, Requester pays = false, Force Virtual Addressing = false },
| Azure config = { Anonymous = false, Use SSL = true }, GCS config = { Anonymous =
| false }, HTTP config = { user_agent = daft/0.0.1 }
| Use multithreading = true
| File schema = first_name#Utf8, last_name#Utf8, age#Int64, DoB#Date,
| country#Utf8, has_dog#Boolean
| Partitioning keys = []
| Filter pushdown = col(country) == lit("Canada")
| Output schema = first_name#Utf8, last_name#Utf8, age#Int64, DoB#Date,
| country#Utf8, has_dog#Boolean
== Physical Plan ==
| Num Scan Tasks = 1
| Estimated Scan Bytes = 6336
| Clustering spec = { Num partitions = 1 }
How did Daft do this optimization?