[BUG] dataproc serverless Integration tests failing in json_matrix_test.py #11500

Closed
yinqingh opened this issue Sep 26, 2024 · 4 comments
Labels
bug Something isn't working

Comments

@yinqingh
Collaborator

Describe the bug
Seeing test failures in rapids-it-dataproc-serverless-2.2#32. The full test logs are in the Dataproc Serverless job named "rapids-it-dataproc-serverless-22-32-3-20240925124028".

FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_allow_unquoted_control_chars_off[read_json_df][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_allow_unquoted_control_chars_off[read_json_sql][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_allow_unquoted_control_chars_off[DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_bytes[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_bytes[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_bytes[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_shorts[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_shorts[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_shorts[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_ints[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_ints[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_ints[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_longs[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_longs[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_longs[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_decs[read_json_df-bad_whitespace.json-DecimalType(38,0)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_decs[read_json_df-bad_whitespace.json-DecimalType(38,10)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_decs[read_json_df-bad_whitespace.json-DecimalType(10,2)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_decs[read_json_df-scan_emtpy_lines.json-DecimalType(38,0)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_decs[read_json_df-scan_emtpy_lines.json-DecimalType(38,10)][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_decs[read_json_df-scan_emtpy_lines.json-DecimalType(10,2)][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_decs[bad_whitespace.json-DecimalType(38,0)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_decs[bad_whitespace.json-DecimalType(38,10)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_decs[bad_whitespace.json-DecimalType(10,2)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_strings[read_json_df-boolean_formatted.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_strings[read_json_df-invalid_ridealong_columns.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_strings[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_strings[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_strings[boolean_formatted.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_strings[invalid_ridealong_columns.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_strings[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_bools[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_bools[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_bools[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_floats[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_floats[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, APPROXIMATE_FLOAT]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_floats[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_doubles[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, APPROXIMATE_FLOAT]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_doubles[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, APPROXIMATE_FLOAT]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_doubles[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, APPROXIMATE_FLOAT, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_corrected_dates[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_corrected_timestamps[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_long_arrays[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_long_arrays[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_long_arrays[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_string_arrays[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_string_arrays[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_string_arrays[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_long_structs[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_long_structs[read_json_df-nested_escaped_strings.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_long_structs[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_long_structs[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_long_structs[nested_escaped_strings.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_string_structs[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_string_structs[read_json_df-nested_escaped_strings.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_string_structs[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_string_structs[bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_string_structs[nested_escaped_strings.json][DATAGEN_SEED=0, TZ=UTC, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_dec_arrays[read_json_df-bad_whitespace.json-DecimalType(38,0)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_dec_arrays[read_json_df-bad_whitespace.json-DecimalType(10,2)][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_dec_arrays[read_json_df-scan_emtpy_lines.json-DecimalType(38,0)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_dec_arrays[read_json_df-scan_emtpy_lines.json-DecimalType(10,2)][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_dec_arrays[bad_whitespace.json-DecimalType(38,0)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_from_json_dec_arrays[bad_whitespace.json-DecimalType(10,2)][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, ALLOW_NON_GPU(FileSourceScanExec)]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_mixed_struct[read_json_df-bad_whitespace.json][DATAGEN_SEED=0, TZ=UTC]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_mixed_struct[read_json_df-nested_escaped_strings.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
FAILED rapids-it-dataproc-serverless-32/integration_tests/src/main/python/json_matrix_test.py::test_scan_json_mixed_struct[read_json_df-scan_emtpy_lines.json][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]
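
A list this long is easier to triage when grouped by input data file. The following is a minimal sketch, not part of the original report, that counts pytest `FAILED` summary lines per `.json` data file named in the first parameter block. The names `FAILED_RE`, `DATA_FILE_RE`, and `summarize` are hypothetical, and the parsing assumes the line shape shown above (no nested brackets in the parameter ids).

```python
import re
from collections import Counter

# Matches pytest summary lines of the form:
#   FAILED <path>::<test_name>[<param ids>][<run settings>]
FAILED_RE = re.compile(r"FAILED \S+?::(?P<test>\w+)\[(?P<params>[^\]]*)\]")

# Picks out a data file name such as bad_whitespace.json from the param ids.
DATA_FILE_RE = re.compile(r"[\w.]+\.json")


def summarize(lines):
    """Count failed tests per JSON data file named in the param ids."""
    by_file = Counter()
    for line in lines:
        match = FAILED_RE.search(line)
        if not match:
            continue  # not a FAILED summary line
        data = DATA_FILE_RE.search(match.group("params"))
        by_file[data.group(0) if data else "(no data file)"] += 1
    return by_file


# Three lines copied from the report above.
PREFIX = ("FAILED rapids-it-dataproc-serverless-32/integration_tests/"
          "src/main/python/json_matrix_test.py::")
sample = [
    PREFIX + "test_scan_json_allow_unquoted_control_chars_off[read_json_sql]"
             "[DATAGEN_SEED=0, TZ=UTC]",
    PREFIX + "test_scan_json_bytes[read_json_df-bad_whitespace.json]"
             "[DATAGEN_SEED=0, TZ=UTC]",
    PREFIX + "test_scan_json_decs[read_json_df-bad_whitespace.json-DecimalType(38,0)]"
             "[DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]",
]

for name, count in summarize(sample).most_common():
    print(f"{count:3d}  {name}")
```

Feeding the full log through `summarize` would show which data files account for most of the failures.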

Different errors were observed in the report:

Caused by: java.lang.IllegalStateException: No empty row count provided and the table read has no row count or columns
Caused by: ai.rapids.cudf.CudfException: CUDF failure at: target/libcudf-install/include/cudf/column/column_factories.hpp:342: Invalid, non-fixed-width type.

Environment details (please complete the following information)

  • Dataproc Serverless version 2.2.20
  • Scala213

@yinqingh added the "? - Needs Triage" and "bug" labels on Sep 26, 2024
@revans2
Collaborator

revans2 commented Sep 26, 2024

@yinqingh do you have a more complete stack trace of the exceptions?

Scan and from_json were fixed for the "No empty row count provided" errors in PR #11464. The same PR also removed the xfail markers from some of the tests that are now failing.

Is it possible that you are running a version of the tests that is newer than the version of the plugin that is installed on dataproc?

@nartal1
Collaborator

nartal1 commented Sep 26, 2024

Seeing similar failures in json_test.py on Dataproc Serverless:

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_infer_schema_round_trip[-String][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT, ALLOW_NON_GPU(FileSourceScanExec)]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_infer_schema_round_trip[json-String][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT, ALLOW_NON_GPU(FileSourceScanExec)]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_round_trip[json-Byte][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM, APPROXIMATE_FLOAT]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip[-None][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip[json-yyyy/MM/dd'T'HH:mm][DATAGEN_SEED=0, TZ=UTC]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip[json-yyyy-MM'T'HH:mm[:ss]][DATAGEN_SEED=0, TZ=UTC]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip[json-MM-yyyy'T'HH:mm:ss][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip[json-MM/yyyy'T'HH:mm:ss[.SSS]][DATAGEN_SEED=0, TZ=UTC]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip[json-dd/MM/yyyy'T'HH:mm:ss.SSSXXX][DATAGEN_SEED=0, TZ=UTC, INJECT_OOM]

[2024-09-26T18:29:47.838Z] FAILED rapids-it-dataproc-serverless-33/integration_tests/src/main/python/json_test.py::test_json_ts_formats_round_trip_ntz_v1[TIMESTAMP_LTZ-yyyy/MM/dd][DATAGEN_SEED=0, TZ=UTC, 

@yinqingh
Collaborator Author

Confirmed that rapids-it-dataproc-serverless-2.2#32 ran with a plugin jar (00cd422) and an IT package (a34f33e) built from different revisions because the pre-release version had shifted. That mismatch caused these failures.

However, for the failures @nartal1 reported in rapids-it-dataproc-serverless-2.2#33, the tests and the plugin were confirmed to be at the same revision (6a9731f).
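
A revision mismatch like the one in job #32 could be caught before the tests start. The sketch below is illustrative only: it assumes the plugin jar carries a properties file with a `revision` key, and the entry name `rapids4spark-version-info.properties` is an assumption to verify against your build. The helper names `read_revision` and `revisions_match` are hypothetical.

```python
import zipfile

# ASSUMPTION: name of the build-info entry inside the plugin jar.
# Check your jar's contents and adjust if it differs.
VERSION_INFO = "rapids4spark-version-info.properties"


def read_revision(jar, entry=VERSION_INFO):
    """Return the `revision` value from a properties entry in a jar, or None.

    `jar` may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(jar) as archive:
        text = archive.read(entry).decode("utf-8")
    for line in text.splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "revision":
            return value.strip()
    return None


def revisions_match(plugin_rev, tests_rev):
    """Compare two git revisions, tolerating abbreviated hashes."""
    if not plugin_rev or not tests_rev:
        return False
    a, b = plugin_rev.lower(), tests_rev.lower()
    return a.startswith(b) or b.startswith(a)


# The two revisions reported for job #32 do not match.
print(revisions_match("00cd422", "a34f33e"))
```

A CI step could call `read_revision` on the installed jar and abort early when `revisions_match` returns false for the revision of the checked-out tests.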

@mattahrens removed the "? - Needs Triage" label on Oct 1, 2024
@mattahrens
Collaborator

Closing as subsequent jobs have passed.
