[BUG] test_yyyyMMdd_format_for_legacy_mode[DATAGEN_SEED=1727619674, TZ=UTC] failed GPU and CPU are not both null #11543

Closed
pxLi opened this issue Sep 30, 2024 · 5 comments · Fixed by #11545
Labels: ? - Needs Triage (Need team to review and classify), bug (Something isn't working), test (Only impacts tests)

Comments

@pxLi
Collaborator

pxLi commented Sep 30, 2024

Describe the bug
First seen in rapids_integration-scala213-pre_release-github, run 127, with no other occurrences so far.

The failure may be specific to DATAGEN_SEED=1727619674.

[2024-09-29T16:31:20.704Z] FAILED ../../src/main/python/date_time_test.py::test_yyyyMMdd_format_for_legacy_mode[DATAGEN_SEED=1727619674, TZ=UTC] - AssertionError: GPU and CPU are not both null at [1733, 'unix_timestamp(a, yyyyMMdd)']
[2024-09-29T16:31:20.705Z] = 1 failed, 30669 passed, 1886 skipped, 598 xfailed, 658 xpassed, 15102 warnings in 7805.95s (2:10:05) =
[2024-09-29T16:31:20.703Z]         elif (cpu == None):
[2024-09-29T16:31:20.703Z] >           assert cpu == gpu, "GPU and CPU are not both null at {}".format(path)
[2024-09-29T16:31:20.703Z] E           AssertionError: GPU and CPU are not both null at [1733, 'unix_timestamp(a, yyyyMMdd)']


@pxLi added the "? - Needs Triage", "bug", and "test" labels Sep 30, 2024
@firestarman
Collaborator

Another issue related to yyyyMMdd: #11539

@res-life
Collaborator

TZ=UTC

GPU and CPU behave differently for the UTC timezone.

@res-life
Collaborator

Diff is:

CPU: -Row(a='15821010', unix_timestamp(a, yyyyMMdd)=None)
GPU: +Row(a='15821010', unix_timestamp(a, yyyyMMdd)=-12219724800)

We already documented that LEGACY mode has several limitations:

LEGACY timeParserPolicy support has the following limitations when running on the GPU:

  • Only 4-digit years are supported.
  • The proleptic Gregorian calendar is used instead of the hybrid Julian+Gregorian calendar that Spark uses in legacy mode.
  • When the format is yyyyMMdd, the GPU only supports 8-digit strings. Spark also accepts 7-digit strings such as 2024101, which the GPU does not support.
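
The calendar limitation above lines up with the diff for '15821010'. As a rough sketch (not the plugin's or Spark's actual code), Python's `datetime.date` is proleptic Gregorian, so it reproduces the GPU value. The helper names below are hypothetical, and the idea that the CPU returned null because October 10, 1582 falls in the days skipped at the Julian-to-Gregorian cutover (October 5 to 14, 1582) is an inference, not something confirmed in this thread.

```python
from datetime import date

SECONDS_PER_DAY = 86400
EPOCH = date(1970, 1, 1)

def proleptic_unix_timestamp(s):
    """Parse an 8-digit yyyyMMdd string and return UTC epoch seconds.

    Uses the proleptic Gregorian calendar (what datetime.date implements),
    which is the calendar the GPU path is documented to use.
    Hypothetical helper for illustration only.
    """
    if len(s) != 8 or not s.isdigit():
        raise ValueError("expected an 8-digit yyyyMMdd string")
    d = date(int(s[0:4]), int(s[4:6]), int(s[6:8]))
    return (d - EPOCH).days * SECONDS_PER_DAY

def in_hybrid_calendar_gap(s):
    """True if the date falls in the days skipped by the hybrid calendar.

    In the hybrid Julian+Gregorian calendar, October 4, 1582 is followed
    directly by October 15, 1582. Assumption: a strict legacy parser
    rejects dates in that gap, which would explain the CPU null.
    """
    d = date(int(s[0:4]), int(s[4:6]), int(s[6:8]))
    return date(1582, 10, 5) <= d <= date(1582, 10, 14)

gpu_like = proleptic_unix_timestamp("15821010")
skipped_on_cpu = in_hybrid_calendar_gap("15821010")
```

Here `gpu_like` equals -12219724800, the value in the GPU row, and `skipped_on_cpu` is true.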

@mattahrens
Collaborator

Re-opened to make sure this feature is not enabled by default, given its incompatibilities with Spark CPU results.

@revans2
Collaborator

revans2 commented Oct 1, 2024

Actually, this is already disabled by default. The test turns on spark.rapids.sql.incompatibleDateFormats.enabled to let this run on the GPU, so we are good to go.
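
For anyone trying to reproduce the test conditions, a minimal sketch of the opt-in settings might look like the following. The `spark.rapids.sql.incompatibleDateFormats.enabled` key comes from this comment and `spark.sql.legacy.timeParserPolicy` is the standard Spark SQL setting for legacy parsing. The exact set of confs the test uses is an assumption, and `to_submit_args` is a hypothetical helper, not part of any Spark or RAPIDS API.

```python
# Assumed opt-in settings for running the legacy yyyyMMdd path on the GPU.
# The incompatibleDateFormats key is off by default, per the comment above.
LEGACY_GPU_CONFS = {
    "spark.sql.legacy.timeParserPolicy": "LEGACY",
    "spark.rapids.sql.incompatibleDateFormats.enabled": "true",
}

def to_submit_args(confs):
    """Render a conf dict as a flat list of --conf key=value arguments.

    Hypothetical helper; keys are sorted so the output is deterministic.
    """
    args = []
    for key, value in sorted(confs.items()):
        args += ["--conf", f"{key}={value}"]
    return args

submit_args = to_submit_args(LEGACY_GPU_CONFS)
```

The resulting list can be appended to a `spark-submit` or `pyspark` command line. Leaving the second key unset keeps the incompatible behavior off.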

@revans2 revans2 closed this as completed Oct 1, 2024