[FEAT] Outer joins for native executor #2860
base: main
CodSpeed Performance Report: merging #2860 will not alter performance.
Codecov Report
Attention: Patch coverage is …
@@            Coverage Diff             @@
##             main    #2860      +/-   ##
==========================================
+ Coverage   78.14%   78.21%   +0.06%
==========================================
  Files         610      611       +1
  Lines       72146    72345     +199
==========================================
+ Hits        56381    56584     +203
+ Misses      15765    15761       -4
    input: &Arc<MicroPartition>,
    state: &mut InnerHashJoinProbeState,
) -> DaftResult<Arc<MicroPartition>> {
    let (probe_table, tables) = state.get_probeable_and_table();
This may be worth making a struct type for:
struct ProbeState {
    probe_table,
    tables,
}
Makes sense! Implemented it
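A minimal sketch of the suggested struct, assuming hypothetical placeholder types: `ProbeTable` and `Table` below are stand-ins, not Daft's actual types, and the getters are illustrative rather than the API that was merged. The point is that the probe-side state travels as one named value instead of the bare `(probe_table, tables)` tuple returned by `get_probeable_and_table`.

```rust
use std::sync::Arc;

// Hypothetical stand-ins for the real probe-table and build-side table types.
#[derive(Debug)]
pub struct ProbeTable {
    pub num_keys: usize,
}

#[derive(Debug)]
pub struct Table {
    pub num_rows: usize,
}

// Bundles the two pieces of build-side state the probe needs,
// instead of passing them around as an anonymous tuple.
#[derive(Debug, Clone)]
pub struct ProbeState {
    probe_table: Arc<ProbeTable>,
    tables: Arc<Vec<Table>>,
}

impl ProbeState {
    pub fn new(probe_table: Arc<ProbeTable>, tables: Arc<Vec<Table>>) -> Self {
        Self { probe_table, tables }
    }

    pub fn get_probeable(&self) -> &Arc<ProbeTable> {
        &self.probe_table
    }

    pub fn get_tables(&self) -> &Arc<Vec<Table>> {
        &self.tables
    }
}

fn main() {
    let state = ProbeState::new(
        Arc::new(ProbeTable { num_keys: 3 }),
        Arc::new(vec![Table { num_rows: 2 }, Table { num_rows: 5 }]),
    );
    let total_rows: usize = state.get_tables().iter().map(|t| t.num_rows).sum();
    // prints "3 keys, 7 build-side rows"
    println!("{} keys, {} build-side rows", state.get_probeable().num_keys, total_rows);
}
```

Because both fields are `Arc`s, cloning a `ProbeState` to hand it to another worker only bumps reference counts.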
src/daft-local-execution/src/intermediate_ops/inner_hash_join_probe.rs
        let mut build_side_growable =
            GrowableTable::new(&tables.iter().collect::<Vec<_>>(), true, 20)?;

        for (table_idx, row_idx) in merged_bitmap.get_unused_indices() {
This can be much more performant using a BitmapIter:
pub struct BitmapIter<'a> {
It would coalesce adjacent valid bits, so we can reduce the number of calls to extend.
You can also convert the bitmap into a BooleanArray and use mask_filter (https://github.com/Eventual-Inc/Daft/blob/b1ea3b9749e01512f48dfd45f9899a329fc9799f/src/daft-table/src/lib.rs#L321) instead of iterating.
I thought BitmapIter returns an iterator over the individual bits? I just checked, and there is also SlicesIterator (https://github.com/Eventual-Inc/Daft/blob/b1ea3b9749e01512f48dfd45f9899a329fc9799f/src/arrow2/src/bitmap/utils/slice_iterator.rs), which is an iterator over a bitmap that returns slices of set regions. Did you mean this one?
Whoops, yup, that's the one.
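To illustrate why coalescing helps, here is a self-contained sketch that uses a plain `&[bool]` in place of an arrow2 bitmap. `set_runs` is a hypothetical helper that only mimics what SlicesIterator yields (runs of set bits as `(start, len)`); it is not arrow2's API, and `extend_from_slice` stands in for a growable's per-range extend.

```rust
// Coalesce set bits into (start, len) runs, in the spirit of SlicesIterator,
// so a caller can issue one bulk extend per run instead of one per row.
fn set_runs(bits: &[bool]) -> Vec<(usize, usize)> {
    let mut runs = Vec::new();
    let mut i = 0;
    while i < bits.len() {
        if bits[i] {
            let start = i;
            while i < bits.len() && bits[i] {
                i += 1;
            }
            runs.push((start, i - start));
        } else {
            i += 1;
        }
    }
    runs
}

fn main() {
    // true = this build-side row was NOT matched during probing.
    let unused = [true, true, false, false, true, true, true, false];
    let build_rows = ["a", "b", "c", "d", "e", "f", "g", "h"];

    let mut output: Vec<&str> = Vec::new();
    let mut extend_calls = 0;
    for (start, len) in set_runs(&unused) {
        // One bulk copy per run (stand-in for extending a growable by a range).
        output.extend_from_slice(&build_rows[start..start + len]);
        extend_calls += 1;
    }
    // prints ["a", "b", "e", "f", "g"] in 2 extend calls
    println!("{:?} in {} extend calls", output, extend_calls);
}
```

Five unmatched rows are copied with two extend calls instead of five; the saving grows with the length of the runs.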
Actually, I prefer the BooleanArray with the mask_filter method; it's a lot cleaner. Went with that in the latest commit.
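A toy sketch of the mask-based approach, assuming a hypothetical `ToyTable` of `i64` columns: its `mask_filter` only mirrors the idea of filtering a table with a boolean mask and is not Daft's `Table::mask_filter`. The matched bitmap is inverted into a mask that selects the rows that never matched, and one filter call replaces the per-index loop.

```rust
// A toy column-oriented table: each column is a Vec<i64> of equal length.
#[derive(Debug, Clone, PartialEq)]
struct ToyTable {
    columns: Vec<Vec<i64>>,
}

impl ToyTable {
    // Keep only the rows whose mask entry is true.
    fn mask_filter(&self, mask: &[bool]) -> ToyTable {
        let columns: Vec<Vec<i64>> = self
            .columns
            .iter()
            .map(|col| {
                assert_eq!(col.len(), mask.len(), "mask length must match row count");
                col.iter()
                    .zip(mask.iter())
                    .filter(|(_, keep)| **keep)
                    .map(|(value, _)| *value)
                    .collect::<Vec<i64>>()
            })
            .collect();
        ToyTable { columns }
    }
}

fn main() {
    let build_side = ToyTable {
        columns: vec![vec![1, 2, 3, 4], vec![10, 20, 30, 40]],
    };
    // Bits set during probing for rows that found a match.
    let matched = [true, false, true, false];
    // Invert to select the rows that never matched.
    let unmatched_mask: Vec<bool> = matched.iter().map(|m| !m).collect();
    let unmatched = build_side.mask_filter(&unmatched_mask);
    // prints [[2, 4], [20, 40]]
    println!("{:?}", unmatched.columns);
}
```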
Implement outer joins for Swordfish.
(Yes, this PR is a little big. But: …)
Outer join probes (and now left/right join probes too) are implemented as a Streaming Sink:
- In the `execute` phase of the streaming sink, probing is done concurrently by workers (the same implementation as all the other join types). The only difference is that, during probing, workers save the indices on the left side that have matches (using a mutable bitmap).
- In the `finalize` phase, we merge the bitmaps from all the concurrent workers (via a bitwise OR) to get a global view of every index that had a match. Then we take all the indices that didn't get a match and return them, with nulls for the right side. This is the same logic we currently use for the Python runner.
- … `used_indices` bitmaps for left/right joins as well.

Note: I had to make Streaming Sink concurrency-aware to allow this. In particular:
- … `max concurrency`; currently only LIMIT has this set to 1.
- `execute` accepts some `mut state`, and `finalize` consolidates all of the state, i.e. `Vec<Box<dyn State>>`.
- … the `finalize` method doesn't get called before the workers are done with their `execute`s.
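A minimal, single-threaded sketch of the execute/finalize split described above, under stated assumptions: `State`, `OuterProbeState`, `execute`, and `finalize` are hypothetical names, not Swordfish's actual streaming-sink API; the bitmap is a plain `Vec<u64>` rather than an arrow2 mutable bitmap; and the "workers" run one after another instead of concurrently. Each worker marks the left (build-side) rows it matched, and `finalize` ORs the per-worker bitmaps and returns the rows that never matched.

```rust
use std::any::Any;
use std::collections::HashMap;

// Hypothetical per-worker state, consumed by finalize as Vec<Box<dyn State>>.
trait State: Any {
    fn as_any(&self) -> &dyn Any;
}

// One bit per left row, set once that row matched at least one right row.
struct OuterProbeState {
    matched: Vec<u64>,
}

impl OuterProbeState {
    fn new(num_build_rows: usize) -> Self {
        Self { matched: vec![0; (num_build_rows + 63) / 64] }
    }

    fn mark(&mut self, idx: usize) {
        self.matched[idx / 64] |= 1u64 << (idx % 64);
    }
}

impl State for OuterProbeState {
    fn as_any(&self) -> &dyn Any {
        self
    }
}

// execute: probe one chunk of right-side keys against the build-side index,
// return (left_idx, key) matches, and record which left rows matched.
fn execute(
    state: &mut OuterProbeState,
    build_index: &HashMap<i64, Vec<usize>>,
    probe_keys: &[i64],
) -> Vec<(usize, i64)> {
    let mut matches = Vec::new();
    for &key in probe_keys {
        if let Some(left_rows) = build_index.get(&key) {
            for &left_idx in left_rows {
                state.mark(left_idx);
                matches.push((left_idx, key));
            }
        }
    }
    matches
}

// finalize: bitwise-OR all worker bitmaps, then return the left rows that
// never matched (these are emitted with nulls for the right side).
fn finalize(states: Vec<Box<dyn State>>, num_build_rows: usize) -> Vec<usize> {
    let mut merged = vec![0u64; (num_build_rows + 63) / 64];
    for state in &states {
        let state = state
            .as_any()
            .downcast_ref::<OuterProbeState>()
            .expect("unexpected state type");
        for (word, worker_word) in merged.iter_mut().zip(&state.matched) {
            *word |= *worker_word;
        }
    }
    (0..num_build_rows)
        .filter(|&i| (merged[i / 64] & (1u64 << (i % 64))) == 0)
        .collect()
}

fn main() {
    let build_keys = [1i64, 2, 3, 4, 5];
    let mut build_index: HashMap<i64, Vec<usize>> = HashMap::new();
    for (i, &key) in build_keys.iter().enumerate() {
        build_index.entry(key).or_default().push(i);
    }

    // Two "workers", run sequentially here, each probing its own chunk.
    let chunks: [&[i64]; 2] = [&[2, 9], &[4, 2]];
    let mut states: Vec<Box<dyn State>> = Vec::new();
    for chunk in chunks.iter() {
        let mut state = OuterProbeState::new(build_keys.len());
        let matches = execute(&mut state, &build_index, chunk);
        println!("worker matches (left_idx, key): {:?}", matches);
        states.push(Box::new(state));
    }

    // Left keys 1, 3 and 5 never matched, so they get a null right side.
    for i in finalize(states, build_keys.len()) {
        println!("left key {} -> right side: null", build_keys[i]);
    }
}
```

Keeping one bitmap per worker means probing needs no shared lock; the only cross-worker step is the OR in `finalize`, which is also why `finalize` must not run until every worker's `execute` has finished.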