Proposal to define a test file format #681

Open
scgkiran opened this issue Aug 14, 2024 · 3 comments

Comments

@scgkiran
Contributor

  • Define a simple, human-readable test file format that is also easy to parse programmatically.
  • Define an ANTLR grammar for the test file format so that parsers can be generated for multiple languages.
  • Define the literal format for the various data types, both simple and complex.
  • Cover test cases for all function types: scalar, aggregate, and window.
  • This will make it possible to build a tool that reports test coverage for each Substrait function.
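
To make the goals above concrete, here is a minimal Python sketch of what parsing one test case could look like. The line shape `name(value::type, ...) = value::type` is purely an assumption for illustration and is not taken from the draft PR; `Literal`, `FunctionCase`, `parse_literal`, and `parse_case` are hypothetical names. It only handles simple scalar literals; complex types or string values containing commas would need a real grammar, which is where ANTLR would come in.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical line shape (an assumption, not the proposed format):
#   name(value::type, value::type) = value::type
CASE_RE = re.compile(r"^(?P<func>\w+)\((?P<args>.*)\)\s*=\s*(?P<result>.+)$")


@dataclass
class Literal:
    value: str      # raw text of the value, left uninterpreted
    type: str       # type suffix such as "i8"


@dataclass
class FunctionCase:
    func: str
    args: List[Literal]
    result: Literal


def parse_literal(text: str) -> Literal:
    """Split 'value::type' on the last '::' separator."""
    value, sep, type_name = text.strip().rpartition("::")
    if not sep:
        raise ValueError(f"literal is missing a '::type' suffix: {text!r}")
    return Literal(value.strip(), type_name.strip())


def parse_case(line: str) -> FunctionCase:
    """Parse one test-case line; simple scalar arguments only."""
    match = CASE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"not a test case: {line!r}")
    raw_args = match["args"]
    # Naive comma split: breaks on nested or quoted values.
    args = [parse_literal(a) for a in raw_args.split(",")] if raw_args.strip() else []
    return FunctionCase(match["func"], args, parse_literal(match["result"]))


case = parse_case("add(120::i8, 5::i8) = 125::i8")
print(case.func, [a.value for a in case.args], case.result.type)
```

A one-line-per-case shape like this keeps each case readable at a glance, which is the property the proposal asks for; the trade-off is that anything beyond flat scalars needs a proper parser.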

@scgkiran
Contributor Author

Created a draft PR #680

@EpsilonPrime
Member

I have a few questions about the proposed test file format mainly stemming from not knowing what the format's intended use would be.

  • What use cases would this test format handle that aren't already covered by the test format in substrait-io/bft (which handles tests of functions)?
  • Is there provision for testing how relations work as defined in substrait-io/consumer-testing?
  • Do the use cases require cross-language compatibility? If cross-language capability is required, would a protobuf definition suffice or is an ANTLR grammar truly necessary?
  • Should the test file format need to live in the specification repository (here)?

@jacques-n
Contributor

I'll answer some of the high-level questions here. We can discuss more in the sync tomorrow.

What use cases would this test format handle that aren't already covered by the test format in substrait-io/bft (which handles tests of functions)?

The intention is for this to supplant the BFT test files. These files define the semantics of the functions and are really an extended part of the documentation of function semantics. We are working on updates so that the BFT framework will source these files for testing. The current format in BFT is extremely verbose, which makes it burdensome to build test cases and difficult to scan them and accurately assess the quality of the coverage.

Is there provision for testing how relations work as defined in substrait-io/consumer-testing?

We're focused on these one at a time. We figured we'd start with scalar functions then move through other things one at a time. I'm not sure we'd be likely to get to relations.

Do the use cases require cross-language compatibility? If cross-language capability is required, would a protobuf definition suffice or is an ANTLR grammar truly necessary?

As mentioned above, we brainstormed several iterations. The main thing we struggled with is that the human readability and intuitiveness get lost the moment you focus on making this easy for a machine to consume.

Should the test file format need to live in the specification repository (here)?

Yes, I think it should. It is a specification as much as a test: it states how a function behaves under certain conditions. I think of it as simply an extension of the specification in the YAML files that helps clear up ambiguities.

We also intend to introduce a coverage tool that gives us an overview of the number of test cases per function. Most of our functions are underspecified at the moment. Once we get to coverage, I would recommend that new functions be required to include clearly specified behavior for common cases, edge cases, and different option combinations. Otherwise, we aren't really specifying anything concrete.
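
As a rough illustration of the coverage idea, the sketch below counts test cases per function and lists functions that fall short of a minimum. It assumes a hypothetical format with one case per line starting with `function_name(` and `#` for comments; that shape, and the names `count_cases` and `uncovered`, are assumptions for illustration, not part of any existing tool.

```python
import re
from collections import Counter

# Assumption: each test case is one line that starts with "function_name(".
CASE_START = re.compile(r"^(\w+)\(")


def count_cases(lines):
    """Return a Counter mapping function name to number of test cases."""
    counts = Counter()
    for line in lines:
        stripped = line.strip()
        # Skip blank lines and comment lines.
        if not stripped or stripped.startswith("#"):
            continue
        match = CASE_START.match(stripped)
        if match:
            counts[match.group(1)] += 1
    return counts


def uncovered(known_functions, counts, minimum=1):
    """List known functions that have fewer than `minimum` test cases."""
    return sorted(f for f in known_functions if counts[f] < minimum)


sample = [
    "# basic examples",
    "add(1::i8, 2::i8) = 3::i8",
    "add(0::i8, 0::i8) = 0::i8",
    "",
    "subtract(3::i8, 1::i8) = 2::i8",
]
counts = count_cases(sample)
print(dict(counts))
print(uncovered(["add", "subtract", "multiply"], counts))
```

A real tool would presumably take the list of known functions from the extension YAML files and could raise `minimum` to enforce coverage of edge cases and option combinations.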
