Commit
GH-23870: [Python] Ensure parquet.write_to_dataset doesn't create empty files for non-observed dictionary (category) values (#36465)

### What changes are included in this PR?

If we partition on a categorical variable with "unobserved" categories (values present in the dictionary, but not in the actual data), the legacy path in `pq.write_to_dataset` currently creates empty files. The new dataset-based path already has the preferred behavior, and this PR fixes it for the legacy path and adds a test for both as well.

This also fixes one of the pandas deprecation warnings listed in #36412.

### Are these changes tested?

Yes

### Are there any user-facing changes?

Yes, this no longer creates a hive-style directory with one empty file (parquet file with 0 rows) when users have unobserved categories. However, this aligns the legacy path with the new and default dataset-based path.

* Closes: #23870

Authored-by: Joris Van den Bossche <[email protected]>
Signed-off-by: Joris Van den Bossche <[email protected]>
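
To illustrate what "unobserved" categories mean for partitioning, here is a minimal pure-Python sketch. It is not the pyarrow implementation: the function `partition_rows`, its `skip_unobserved` flag, and the sample data are hypothetical names introduced only for this illustration. It models a dictionary-encoded column as a list of category values plus per-row indices, and shows how creating one group per dictionary value yields an empty partition, while grouping only by values that actually occur does not.

```python
from collections import defaultdict


def partition_rows(dictionary, indices, rows, skip_unobserved=True):
    """Group row payloads by the category value each index refers to.

    Hypothetical helper for illustration only (not a pyarrow API).
    dictionary: all category values, some of which may be unused
    indices:    per-row position into `dictionary`
    rows:       per-row payload
    """
    groups = defaultdict(list)
    if not skip_unobserved:
        # Pre-create one group per dictionary value, so values that
        # never occur in the data still produce an (empty) partition.
        for value in dictionary:
            groups[value]
    for idx, row in zip(indices, rows):
        groups[dictionary[idx]].append(row)
    return dict(groups)


dictionary = ["a", "b", "c"]  # "c" is in the dictionary but never used
indices = [0, 1, 0]
rows = [10, 20, 30]

observed_only = partition_rows(dictionary, indices, rows)
with_unobserved = partition_rows(dictionary, indices, rows, skip_unobserved=False)

print(observed_only)    # partitions for "a" and "b" only
print(with_unobserved)  # additionally contains an empty partition for "c"
```

In this sketch, the empty `"c"` group corresponds to the 0-row parquet file described above; skipping unobserved values corresponds to the behavior this change adopts for the legacy path.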