
[C++][Parquet] Dataset: ParquetFileFragment::EvaluateStatisticsAsExpression should better check Statistics::HasNullCount #43712

Closed
mapleFU opened this issue Aug 15, 2024 · 1 comment

Comments

@mapleFU
Member

mapleFU commented Aug 15, 2024

Describe the enhancement requested

`ParquetFileFragment::EvaluateStatisticsAsExpression` filters Parquet files using Parquet statistics. The relevant check is:

if (statistics.num_values() == 0 && statistics.null_count() > 0) {

`statistics.null_count()` is used here; however, there are rare cases where `!statistics.HasNullCount()`. So this function should check `statistics.HasNullCount()` before using the null count.

`!statistics.HasNullCount()` rarely happens, since parquet-java and parquet-c++ always write the null count, even when it is 0. However, parquet-rs previously did not write it when the null count was 0, and some legacy files may also lack it.

As a result, we need to handle the `!statistics.HasNullCount()` case here.
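A minimal sketch of the guard the issue asks for. `ColumnStats` is a hypothetical stand-in that models only the three accessors named above (`num_values()`, `HasNullCount()`, `null_count()`); it is not `parquet::Statistics`, and `NullGuess` and `ClassifyAllNull` are illustrative names, not the code from the actual fix.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical stand-in for parquet::Statistics (NOT the real Arrow API).
// Only the accessors discussed in the issue are modeled.
class ColumnStats {
 public:
  ColumnStats(int64_t num_values, std::optional<int64_t> null_count)
      : num_values_(num_values), null_count_(null_count) {}

  // Number of non-null values.
  int64_t num_values() const { return num_values_; }
  // False when the writer omitted the null count.
  bool HasNullCount() const { return null_count_.has_value(); }
  // Only meaningful when HasNullCount() is true; returns 0 otherwise.
  int64_t null_count() const { return null_count_.value_or(0); }

 private:
  int64_t num_values_;
  std::optional<int64_t> null_count_;
};

// Hypothetical outcome of the "all values are null" fast path.
enum class NullGuess { kAllNull, kUnknown };

NullGuess ClassifyAllNull(const ColumnStats& stats) {
  // Conclude "all null" only when there are no non-null values AND the
  // writer actually recorded a positive null count.
  if (stats.num_values() == 0 && stats.HasNullCount() &&
      stats.null_count() > 0) {
    return NullGuess::kAllNull;
  }
  // Missing null count, empty column chunk, or some non-null values:
  // the fast path cannot conclude anything.
  return NullGuess::kUnknown;
}
```

The design choice is to treat a missing null count as "unknown" rather than as zero, so the fast path never relies on a value the writer did not record.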

Component(s)

C++, Parquet

mapleFU added a commit that referenced this issue Sep 6, 2024
…ly when !HasNullCount() (#43726)

### Rationale for this change

See issue. When `!HasNullCount`, we cannot guarantee that nulls exist.

### What changes are included in this PR?

Handle HasNullCount in dataset expr

### Are these changes tested?

Yes

### Are there any user-facing changes?

Minimal

* GitHub Issue: #43712

Lead-authored-by: mwish <[email protected]>
Co-authored-by: mwish <[email protected]>
Co-authored-by: Antoine Pitrou <[email protected]>
Signed-off-by: mwish <[email protected]>
mapleFU added this to the 18.0.0 milestone Sep 6, 2024
@mapleFU
Member Author

mapleFU commented Sep 6, 2024

Issue resolved by pull request 43726
#43726

mapleFU closed this as completed Sep 6, 2024
khwilson pushed a commit to khwilson/arrow that referenced this issue Sep 14, 2024
…orrectly when !HasNullCount() (apache#43726)