PARQUET-2464: Fix dictionaryPageOffset flag setting in toParquetMetadata method #1340
Issue
The toParquetMetadata method converts org.apache.parquet.hadoop.metadata.ParquetMetadata to org.apache.parquet.format.FileMetaData, but it does not set the dictionary page offset bit in FileMetaData.
When a FileMetaData object is serialized into the footer and then deserialized, the dictionary offset is lost because the dictionary page offset bit was never set.
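The loss on round trip can be illustrated with a minimal sketch. It assumes the usual pattern for an optional field in a generated struct: a value plus an "is set" flag, where the writer skips the field when the flag is unset. `ColumnMetaSketch`, `write` and `read` are hypothetical stand-ins, not the actual Parquet or Thrift classes.

```java
import java.util.HashMap;
import java.util.Map;

final class IsSetRoundTripSketch {
    /** Hypothetical stand-in for a generated struct with one optional field. */
    static final class ColumnMetaSketch {
        private long dictionaryPageOffset;
        private boolean dictionaryPageOffsetIsSet;

        // The setter stores the value and flips the "is set" flag.
        void setDictionaryPageOffset(long offset) {
            this.dictionaryPageOffset = offset;
            this.dictionaryPageOffsetIsSet = true;
        }

        boolean isSetDictionaryPageOffset() {
            return dictionaryPageOffsetIsSet;
        }

        long getDictionaryPageOffset() {
            return dictionaryPageOffset;
        }
    }

    /** Assumed writer behaviour: an optional field is emitted only if its flag is set. */
    static Map<String, Long> write(ColumnMetaSketch meta) {
        Map<String, Long> wire = new HashMap<>();
        if (meta.isSetDictionaryPageOffset()) {
            wire.put("dictionary_page_offset", meta.getDictionaryPageOffset());
        }
        return wire;
    }

    /** Reader: the field is populated only if it was present on the wire. */
    static ColumnMetaSketch read(Map<String, Long> wire) {
        ColumnMetaSketch meta = new ColumnMetaSketch();
        Long offset = wire.get("dictionary_page_offset");
        if (offset != null) {
            meta.setDictionaryPageOffset(offset);
        }
        return meta;
    }

    public static void main(String[] args) {
        ColumnMetaSketch neverSet = new ColumnMetaSketch(); // converter skipped the setter
        ColumnMetaSketch withOffset = new ColumnMetaSketch();
        withOffset.setDictionaryPageOffset(4L);

        System.out.println(read(write(neverSet)).isSetDictionaryPageOffset());   // false
        System.out.println(read(write(withOffset)).isSetDictionaryPageOffset()); // true
    }
}
```

If the converter never calls the setter, nothing downstream can recover the offset after the footer is read back.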
PARQUET-1850 tried to fix this, but the fix was only partial.
It calls setDictionary_page_offset only if getEncodingStats is present.
Fix
However, it should call setDictionary_page_offset even when getEncodingStats is not present but encodings are present.
It should reuse the check implemented in ColumnChunkMetaData.
The new change in ParquetMetadataConverter should follow that implementation.
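The intended behaviour can be sketched as follows. This is a simplified illustration under stated assumptions, not the code from this PR: `Encoding`, `Stats`, `ThriftColumnSketch` and `convert` are hypothetical stand-ins for the real Parquet classes. The check prefers encoding stats when they exist and otherwise falls back to the set of encodings.

```java
import java.util.EnumSet;
import java.util.Set;

final class DictionaryOffsetSketch {
    /** Stand-in for the Parquet encoding enum; only the values needed here. */
    enum Encoding { PLAIN, RLE, PLAIN_DICTIONARY, RLE_DICTIONARY }

    /** Hypothetical stand-in for encoding stats, reduced to the two facts the check needs. */
    static final class Stats {
        final boolean hasDictionaryPages;
        final boolean hasDictionaryEncodedPages;

        Stats(boolean hasDictionaryPages, boolean hasDictionaryEncodedPages) {
            this.hasDictionaryPages = hasDictionaryPages;
            this.hasDictionaryEncodedPages = hasDictionaryEncodedPages;
        }
    }

    /** Hypothetical stand-in for the serialized column metadata with its "is set" flag. */
    static final class ThriftColumnSketch {
        private long dictionaryPageOffset;
        private boolean dictionaryPageOffsetIsSet;

        void setDictionaryPageOffset(long offset) {
            this.dictionaryPageOffset = offset;
            this.dictionaryPageOffsetIsSet = true;
        }

        boolean isSetDictionaryPageOffset() {
            return dictionaryPageOffsetIsSet;
        }

        long getDictionaryPageOffset() {
            return dictionaryPageOffset;
        }
    }

    /** Use stats when present; otherwise fall back to the encodings. */
    static boolean hasDictionaryPage(Stats stats, Set<Encoding> encodings) {
        if (stats != null) {
            return stats.hasDictionaryPages && stats.hasDictionaryEncodedPages;
        }
        return encodings.contains(Encoding.PLAIN_DICTIONARY)
            || encodings.contains(Encoding.RLE_DICTIONARY);
    }

    /** Converter step: set the offset whenever the check above says a dictionary exists. */
    static ThriftColumnSketch convert(Stats stats, Set<Encoding> encodings, long offset) {
        ThriftColumnSketch out = new ThriftColumnSketch();
        if (hasDictionaryPage(stats, encodings)) {
            out.setDictionaryPageOffset(offset);
        }
        return out;
    }

    public static void main(String[] args) {
        // No stats, but a dictionary encoding is present: the offset is now set.
        ThriftColumnSketch a = convert(null, EnumSet.of(Encoding.RLE_DICTIONARY, Encoding.RLE), 4L);
        System.out.println(a.isSetDictionaryPageOffset()); // true

        // No stats and no dictionary encoding: the offset stays unset.
        ThriftColumnSketch b = convert(null, EnumSet.of(Encoding.PLAIN), 4L);
        System.out.println(b.isSetDictionaryPageOffset()); // false
    }
}
```

Keeping the decision in one shared check means the writer and any reader-side logic agree on when a column chunk has a dictionary page.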
Test
Added a new unit test for this scenario. All existing unit tests should also pass.