GH-459: Add Variant logical type annotation #460
base: master
* The top level must be a group annotated with `VARIANT` that contains a
  `binary` field named `metadata`, and a `binary` field named `value`.
* Additional fields which start with `_` (underscore) can be ignored.
Why is this needed? None of the other types allow writing columns that should be ignored.
This was desired in case there were some additional (but redundant) metadata or values we might store, and still allow it to be a valid Variant value (group).
I don't think that we want to add ignored columns. If we need to update the spec because something is missing, we should just do that directly instead of working around it with unspecified columns that only work in certain proprietary cases.
/**
 * Embedded Variant logical type annotation
 */
struct VariantType {
Looks good to me.
@rdblue Thanks! I updated the PR.
### VARIANT

`VARIANT` is used for a Variant value. It must annotate a group. The group must
contain a `binary` field named `metadata`, and a `binary` field named `value`.
We have been using `BYTE_ARRAY` instead of `binary` in this doc.
Ah, you're right. The type is `BYTE_ARRAY` in thrift but `binary` in actual type definitions.
I think that `binary` is more clear, but we should mention that they are synonyms at a minimum. How about this?
The group must contain a field named `metadata` and a field named `value`. Both fields must have type `binary`, which is also called `BYTE_ARRAY` in the Parquet thrift definition.
This looks close to me. I think we just need to fix two things:
Rationale for this change
Add a variant logical type.
What changes are included in this PR?
Additions to the types thrift definition, and the description of the logical type. The actual Variant spec documents are unchanged, and will be addressed later in a separate PR.
Closes #459
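
For readers following the review thread, the group layout under discussion (a group annotated with `VARIANT` that contains a `metadata` field and a `value` field, both of type `binary`, also called `BYTE_ARRAY` in the thrift definition) could be checked roughly as in the sketch below. This is only an illustration: the `Field` class and `is_valid_variant_group` function are hypothetical names invented here, not part of any Parquet library, and the sketch takes no position on additional fields, which the thread above leaves unresolved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Field:
    """Hypothetical, simplified stand-in for a Parquet schema element."""
    name: str
    physical_type: Optional[str] = None   # e.g. "BYTE_ARRAY"; None for a group
    logical_type: Optional[str] = None    # e.g. "VARIANT"
    children: List["Field"] = field(default_factory=list)


def is_valid_variant_group(group: Field) -> bool:
    """Check only the two requirements stated in the PR text:
    the annotation sits on a group, and the group has BYTE_ARRAY
    fields named `metadata` and `value`. Extra fields are not examined."""
    # VARIANT must annotate a group, i.e. an element with no physical type.
    if group.logical_type != "VARIANT" or group.physical_type is not None:
        return False
    by_name = {child.name: child for child in group.children}
    return all(
        name in by_name and by_name[name].physical_type == "BYTE_ARRAY"
        for name in ("metadata", "value")
    )


# Example: a well-formed group and one missing the `value` field.
good = Field("v", logical_type="VARIANT", children=[
    Field("metadata", physical_type="BYTE_ARRAY"),
    Field("value", physical_type="BYTE_ARRAY"),
])
missing_value = Field("v", logical_type="VARIANT", children=[
    Field("metadata", physical_type="BYTE_ARRAY"),
])
```

With these definitions, `is_valid_variant_group(good)` is true and `is_valid_variant_group(missing_value)` is false. Whether field repetition or extra fields should also be validated depends on the final wording of the spec, which this PR does not settle.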