Unable to set dictionary_page_offset when encoding_stats are missing #2962
Comments
Thanks for reporting the issue! I think there is a similar effort to resolve it, but the problem is more complicated than it appears: #1340
mothukur added a commit to mothukur/parquet-java that referenced this issue Sep 13, 2024: …ing_stats are missing
I've submitted a PR with the fix. Could you please review it?
wgtmac pushed a commit that referenced this issue Sep 24, 2024
Describe the bug, including details regarding any error messages, version, and platform.
I am hitting an issue while splitting a Parquet file into multiple files with the ParquetFileWriter.appendRowGroups API: the dictionary page offsets are not set correctly in the new files. On further investigation, I found that ParquetMetadataConverter.addRowGroup assumes EncodingStats is always available. According to the format specification, encoding_stats is optional. Is it possible to remove this requirement?
https://github.com/apache/parquet-java/blob/master/parquet-hadoop/src/main/java/org/apache/parquet/format/converter/ParquetMetadataConverter.java#L559
https://github.com/apache/parquet-format/blob/master/src/main/thrift/parquet.thrift#L826
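To make the request concrete, here is a minimal, self-contained sketch of one possible fallback when encoding_stats is absent. It is not the parquet-java implementation or the submitted fix: `DictionaryOffsetSketch`, `ColumnChunk`, and `hasDictionaryPage` are hypothetical names invented for illustration. The assumption is that, without encoding_stats, a dictionary page can be inferred from a dictionary encoding in the chunk's encodings list together with a positive dictionary page offset (a Parquet file begins with a 4-byte magic number, so no page can start at offset 0).

```java
import java.util.EnumSet;
import java.util.Set;

/** Illustrative sketch only; these types are hypothetical stand-ins, not parquet-java classes. */
final class DictionaryOffsetSketch {

    /** Subset of Parquet encodings relevant to this example. */
    enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, RLE_DICTIONARY }

    /** Simplified view of one column chunk's metadata. */
    static final class ColumnChunk {
        final Boolean statsReportDictionaryPage; // null models a missing encoding_stats field
        final Set<Encoding> encodings;
        final long dictionaryPageOffset;

        ColumnChunk(Boolean statsReportDictionaryPage, Set<Encoding> encodings, long dictionaryPageOffset) {
            this.statsReportDictionaryPage = statsReportDictionaryPage;
            this.encodings = encodings;
            this.dictionaryPageOffset = dictionaryPageOffset;
        }
    }

    /** Decides whether dictionary_page_offset should be written for this chunk. */
    static boolean hasDictionaryPage(ColumnChunk chunk) {
        // When encoding_stats is present, trust what it reports.
        if (chunk.statsReportDictionaryPage != null) {
            return chunk.statsReportDictionaryPage;
        }
        // Assumed fallback: infer from the encodings list and a positive offset.
        boolean dictionaryEncoded = chunk.encodings.contains(Encoding.PLAIN_DICTIONARY)
                || chunk.encodings.contains(Encoding.RLE_DICTIONARY);
        return dictionaryEncoded && chunk.dictionaryPageOffset > 0;
    }

    public static void main(String[] args) {
        ColumnChunk withoutStats =
                new ColumnChunk(null, EnumSet.of(Encoding.PLAIN, Encoding.RLE_DICTIONARY), 4L);
        ColumnChunk plainOnly = new ColumnChunk(null, EnumSet.of(Encoding.PLAIN), 0L);
        System.out.println(hasDictionaryPage(withoutStats)); // true
        System.out.println(hasDictionaryPage(plainOnly));    // false
    }
}
```

With a check like this, a chunk written without encoding_stats would still keep its dictionary page offset when row groups are appended to a new file, while plain-encoded chunks would be left untouched.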
Component(s)
No response