Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add PhysicalJoinType #572

Closed

Conversation

danepitkin
Copy link
Contributor

@danepitkin danepitkin commented Nov 8, 2023

BREAKING CHANGE: the JoinType for physical operators HashJoin, MergeJoin, and
NestedLoopJoin has been refactored into a shared enum called PhysicalJoinType.
It is wire compatible and a new is_null_aware field is added to the physical join rels.

Closes #563

@danepitkin
Copy link
Contributor Author

Question: Is it better for the original ANTI_JOIN enums to default to NO_NULLS or WITH_NULLs?

Copy link
Member

@EpsilonPrime EpsilonPrime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is null handling part of the type or is it a separate setting/option?

JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
}
PhysicalJoinType type = 7;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For protobuf compatibility it is better to make this 8 and mark 7 as deprecated.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose if we keep all of the values the same as you've done here it would be wire compatible which mean we wouldn't need to renumber the field for the "type" change.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep, it's still technically wire compatible! The Anti Join will have to default to one of the two options though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I change this to be

  PHYSICAL_JOIN_TYPE_LEFT_ANTI = 7;
  PHYSICAL_JOIN_TYPE_RIGHT_ANTI = 8;
  PHYSICAL_JOIN_TYPE_LEFT_ANTI_NULL_AWARE = 9;
  PHYSICAL_JOIN_TYPE_RIGHT_ANTI_NULL_AWARE = 10;

?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated to this ^

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated again to use a separate is_null_aware field in the rels

@danepitkin
Copy link
Contributor Author

Is null handling part of the type or is it a separate setting/option?

This Velox documentation makes me think that these are considered different types of anti joins (rather than a separate config): https://facebookincubator.github.io/velox/develop/anti-join.html

@danepitkin
Copy link
Contributor Author

danepitkin commented Nov 8, 2023

Hmm, although they did implement it slightly differently, using JoinType::kAnti and JoinType::kNullAwareAnti, presumably with the NULL config separate.

Edit: actually, this is similar to this PR, but the left/right option is configured elsewhere it seems.

Copy link

github-actions bot commented Nov 8, 2023

ACTION NEEDED

Substrait follows the Conventional Commits
specification
for
release automation.

The PR title and description are used as the merge commit message. Please update your PR title and description to match the specification.

@danepitkin
Copy link
Contributor Author

I see @westonpace 's old PR here does use a null_is_match bool instead of expanding the jointype enum #467

@danepitkin
Copy link
Contributor Author

Hmm, maybe it would be better to have bool null_aware in each physical join so that other join types can use it.

@jacques-n
Copy link
Contributor

This shouldn't be a breaking binary change, right?

@danepitkin
Copy link
Contributor Author

danepitkin commented Nov 28, 2023

This shouldn't be a breaking binary change, right?

Yes, good call!

Edit: The "Breaking Changes Check" does flag this, though.

JOIN_TYPE_RIGHT_ANTI = 8;
}
PhysicalJoinType type = 7;
bool is_null_aware = 8;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What exactly does this boolean enable? It would be good to document it.

@westonpace
Copy link
Member

I've created a new PR specifically for dealing with the null_is_match / null_aware stuff at #585

Do you want to rescope this PR to just dealing with the join type?

@westonpace westonpace added awaiting-user-input This issue is waiting on further input from users and removed awaiting SMC approval labels Jan 23, 2024
@jacques-n
Copy link
Contributor

No progress has been made in more than six months. Closing without prejudice.

@jacques-n jacques-n closed this Aug 8, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
awaiting-user-input This issue is waiting on further input from users
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Refactor JoinType for physical operators
5 participants