Skip to content

Commit

Permalink
feat: normalize the join types (#662)
Browse files Browse the repository at this point in the history
There are two groups of join types in the definition with differing
enums.

This PR leaves JoinRel's SEMI and ANTI as the canonical names for
LEFT_SEMI and LEFT_ANTI.  Aliases are not allowed due to JSON (and
text) serialization behavior.

This PR also adds RIGHT_SEMI and RIGHT_ANTI to JoinRel's JoinType.

RIGHT_SINGLE is added to all types.  The PR correspondingly renames
SINGLE to LEFT_SINGLE, ANTI TO LEFT_ANTI, and SEMI to LEFT_SEMI.
Finally this PR adds LEFT_SINGLE to all of the other join types.

BREAKING CHANGE:    JoinRel's type enum now has LEFT_SINGLE
instead of SINGLE.  Similarly there is now LEFT_ANTI and LEFT_SEMI.
Other values are available in all join type enums. This affects JSON and
text formats only (binary plans -- the interoperable part of Substrait --
will still be compatible before and after this change).

---------

Co-authored-by: Jacques Nadeau <[email protected]>
  • Loading branch information
EpsilonPrime and jacques-n authored Aug 8, 2024
1 parent 77bf3a1 commit bed84ec
Show file tree
Hide file tree
Showing 2 changed files with 20 additions and 10 deletions.
17 changes: 12 additions & 5 deletions proto/substrait/algebra.proto
Original file line number Diff line number Diff line change
Expand Up @@ -173,11 +173,12 @@ message JoinRel {
JOIN_TYPE_OUTER = 2;
JOIN_TYPE_LEFT = 3;
JOIN_TYPE_RIGHT = 4;
JOIN_TYPE_SEMI = 5;
JOIN_TYPE_ANTI = 6;
// This join is useful for nested sub-queries where we need exactly one record in output (or throw exception)
// See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf
JOIN_TYPE_SINGLE = 7;
JOIN_TYPE_LEFT_SEMI = 5;
JOIN_TYPE_LEFT_ANTI = 6;
JOIN_TYPE_LEFT_SINGLE = 7;
JOIN_TYPE_RIGHT_SEMI = 8;
JOIN_TYPE_RIGHT_ANTI = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand Down Expand Up @@ -654,6 +655,8 @@ message HashJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_LEFT_SINGLE = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand Down Expand Up @@ -700,6 +703,8 @@ message MergeJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_LEFT_SINGLE = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand All @@ -726,6 +731,8 @@ message NestedLoopJoinRel {
JOIN_TYPE_RIGHT_SEMI = 6;
JOIN_TYPE_LEFT_ANTI = 7;
JOIN_TYPE_RIGHT_ANTI = 8;
JOIN_TYPE_LEFT_SINGLE = 9;
JOIN_TYPE_RIGHT_SINGLE = 10;
}

substrait.extensions.AdvancedExtension advanced_extension = 10;
Expand Down
13 changes: 8 additions & 5 deletions site/docs/relations/logical_relations.md
Original file line number Diff line number Diff line change
Expand Up @@ -222,15 +222,18 @@ The join operation will combine two separate inputs into a single output, based

### Join Types

| Type | Description |
| ----- | ------------------------------------------------------------ |
| Type | Description |
| ----- |-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Inner | Return records from the left side only if they match the right side. Return records from the right side only when they match the left side. For each cross input match, return a record including the data from both sides. Non-matching records are ignored. |
| Outer | Return all records from both the left and right inputs. For each cross input match, return a record including the data from both sides. For any remaining non-match records, return the record from the corresponding input along with nulls for the opposite input. |
| Left | Return all records from the left input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the left input, return the left record along with nulls for the right input. |
| Right | Return all records from the right input. For each cross input match, return a record including the data from both sides. For any remaining non-matching records from the right input, return the right record along with nulls for the left input. |
| Semi | Returns records from the left input. These are returned only if the records have a join partner on the right side. |
| Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side. |
| Single | Returns one join partner per entry on the left input. If more than one join partner exists, there are two valid semantics. 1) Only the first match is returned. 2) The system throws an error. If there is no match between the left and right inputs, NULL is returned. |
| Left Semi | Returns records from the left input. These are returned only if the records have a join partner on the right side. |
| Right Semi | Returns records from the right input. These are returned only if the records have a join partner on the left side. |
| Left Anti | Return records from the left input. These are returned only if the records do not have a join partner on the right side. |
| Right Anti | Return records from the right input. These are returned only if the records do not have a join partner on the left side. |
| Left Single | Return all records from the left input with no join expansion. If at least one record from the right input matches the left, return one arbitrary matching record from the right input. For any left records without matching right records, return the left record along with nulls for the right input. Similar to a left outer join but only returns one right match at most. Useful for nested sub-queries where we need exactly one record in output (or throw exception). See Section 3.2 of https://15721.courses.cs.cmu.edu/spring2018/papers/16-optimizer2/hyperjoins-btw2017.pdf for more information. |
| Right Single | Same as left single except that the right and left inputs are switched. |


=== "JoinRel Message"
Expand Down

0 comments on commit bed84ec

Please sign in to comment.