-
Notifications
You must be signed in to change notification settings - Fork 6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add aggregate relation tests #103
Conversation
] | ||
} | ||
}, | ||
"expressions": [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These project after aggregate expressions in Datafusion and DuckDB are technically correct but are weird. Also these add new fields (but don't restrict the output) which mean that the total number of names is going to be wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do you think that's something that the duckdb/datafusion folks should "fix"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's something that should be fixed.
"project": { | ||
"common": { | ||
"emit": { | ||
"outputMapping": [3, 4, 5] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This project is hiding whether or not the aggregate outputs an additional column.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you explain a little more? how is it doing that?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, so the output fields were 0, 1, 2 and they are being reexposed here as three fields. If there was an extra column (needed if there are two or more grouping sets) then the columns should be 4, 5, 6. It's not clear if they're treating the extra column as the first remapped column. I suppose the correctness tests and types could validate that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For the future: the producer tests are not satisfying to me as a test methodology. Version numbers should only really matter if we're testing at a particular Substrait version. The validator is testing plan generation validity and the tests validate correctness (at least for the platforms we try it against). If the validator is doing its job then we shouldn't need to store the generated plans. For correctness I'd argue that a Producer has to run its plan against its own system and at least one alternative successfully for it to be correct.
Yeah, that was the original idea for this testing framework. Hopefully we can get to that point eventually where producers/consumers are mature enough to do that. |
No description provided.