[FEA] Implement kernel to support Africa/Casablanca time zone in LEGACY mode. #11562

Open
pxLi opened this issue Oct 7, 2024 · 3 comments
Labels: bug, feature request, P0, test

Comments

@pxLi
Collaborator

pxLi commented Oct 7, 2024

Describe the bug

This non-UTC test is failing in 24.10. It was first seen in rapids_it-non-utc-pre_release, run:123

FAILED ../../src/main/python/date_time_test.py::test_yyyyMMdd_format_for_legacy_mode[DATAGEN_SEED=1728130368, TZ=Africa/Casablanca] - AssertionError: GPU and CPU int values are different at [14, 'unix_timestamp(a, yyyyMMdd)']

cpu = 231388876800, gpu = 231388873200

@pytest.mark.skipif(not is_supported_time_zone(), reason="not all time zones are supported now, refer to https://github.com/NVIDIA/spark-rapids/issues/6839, please update after all time zones are supported")
    # Test years after 1900, refer to issues: https://github.com/NVIDIA/spark-rapids/issues/11543, https://github.com/NVIDIA/spark-rapids/issues/11539
    def test_yyyyMMdd_format_for_legacy_mode():
        gen = StringGen('(19[0-9]{2}|[2-9][0-9]{3})([0-9]{4})')
>       assert_gpu_and_cpu_are_equal_sql(
            lambda spark : unary_op_df(spark, gen),
            "tab",
            '''select unix_timestamp(a, 'yyyyMMdd'),
                      from_unixtime(unix_timestamp(a, 'yyyyMMdd'), 'yyyyMMdd'),
                      date_format(to_timestamp(a, 'yyyyMMdd'), 'yyyyMMdd')
               from tab
            ''',
            {'spark.sql.legacy.timeParserPolicy': 'LEGACY',
             'spark.rapids.sql.incompatibleDateFormats.enabled': True})

../../src/main/python/date_time_test.py:466: 
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 
../../src/main/python/asserts.py:641: in assert_gpu_and_cpu_are_equal_sql
    assert_gpu_and_cpu_are_equal_collect(do_it_all, conf, is_cpu_first=is_cpu_first)
../../src/main/python/asserts.py:599: in assert_gpu_and_cpu_are_equal_collect
    _assert_gpu_and_cpu_are_equal(func, 'COLLECT', conf=conf, is_cpu_first=is_cpu_first, result_canonicalize_func_before_compare=result_canonicalize_func_before_compare)
../../src/main/python/asserts.py:521: in _assert_gpu_and_cpu_are_equal
    assert_equal(from_cpu, from_gpu)
../../src/main/python/asserts.py:111: in assert_equal
    _assert_equal(cpu, gpu, float_check=get_float_check(), path=[])
../../src/main/python/asserts.py:43: in _assert_equal
    _assert_equal(cpu[index], gpu[index], float_check, path + [index])
../../src/main/python/asserts.py:36: in _assert_equal
    _assert_equal(cpu[field], gpu[field], float_check, path + [field])
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ 

cpu = 231388876800, gpu = 231388873200
float_check = <function get_float_check.<locals>.<lambda> at 0x7f7b98cebbe0>
path = [14, 'unix_timestamp(a, yyyyMMdd)']
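
A quick arithmetic check of the two reported values helps narrow down the mismatch. The sketch below only uses the numbers from the assertion message; the helper name `to_utc` is illustrative and not part of the test framework.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def to_utc(epoch_seconds: int) -> datetime:
    """Convert epoch seconds to an aware UTC datetime using plain arithmetic."""
    return EPOCH + timedelta(seconds=epoch_seconds)

cpu = 231388876800  # value from the CPU run in the assertion message
gpu = 231388873200  # value from the GPU run in the assertion message

delta_seconds = cpu - gpu
cpu_utc = to_utc(cpu)
gpu_utc = to_utc(gpu)

# The gap is exactly one hour, and only the CPU value lands on a UTC day boundary.
print("delta seconds:", delta_seconds)
print("CPU time of day (UTC):", cpu_utc.time())
print("GPU time of day (UTC):", gpu_utc.time())
```

The CPU result falls exactly on midnight UTC while the GPU result is one hour earlier, which suggests the two sides applied UTC offsets that differ by one hour for this row rather than parsing the digits differently.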

Expected behavior
Pass or ignore the case

@pxLi added the bug, P0, and test labels Oct 7, 2024
@pxLi
Collaborator Author

pxLi commented Oct 7, 2024

Also cc @res-life for help; this case is still unstable in non-UTC environments.

@res-life
Collaborator

res-life commented Oct 8, 2024

Spark behavior:

scala> spark.conf.set("spark.sql.session.timeZone", "Africa/Casablanca")

scala> spark.conf.set("spark.sql.legacy.timeParserPolicy", "CORRECTED")

scala> spark.sql("select unix_timestamp('42481005', 'yyyyMMdd')").show()
+----------------------------------+
|unix_timestamp(42481005, yyyyMMdd)|
+----------------------------------+
|                       71910716400|
+----------------------------------+


scala> 

scala> spark.conf.set("spark.sql.legacy.timeParserPolicy", "LEGACY")

scala> spark.sql("select unix_timestamp('42481005', 'yyyyMMdd')").show()
+----------------------------------+
|unix_timestamp(42481005, yyyyMMdd)|
+----------------------------------+
|                       71910720000|
+----------------------------------+
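
The two results can be read as two different UTC offsets applied to the same local midnight. The following sketch derives the offset implied by each value using only the numbers shown above; `implied_utc_offset` is a hypothetical helper, and the code does not consult any time zone database, so it says nothing about why each parser picked its offset.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def implied_utc_offset(local_wall_time: datetime, epoch_seconds: int) -> timedelta:
    """Return the UTC offset that maps the local wall time to the given epoch seconds."""
    wall_as_utc = local_wall_time.replace(tzinfo=timezone.utc)
    return (wall_as_utc - EPOCH) - timedelta(seconds=epoch_seconds)

# Same input string and pattern as in the spark-shell session above.
local_midnight = datetime.strptime("42481005", "%Y%m%d")

corrected_offset = implied_utc_offset(local_midnight, 71910716400)  # CORRECTED result
legacy_offset = implied_utc_offset(local_midnight, 71910720000)     # LEGACY result

print("CORRECTED implies offset:", corrected_offset)
print("LEGACY implies offset:", legacy_offset)
```

Under this reading, CORRECTED mode treated 4248-10-05 in Africa/Casablanca as UTC+01:00, while LEGACY mode treated it as UTC+00:00.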

GPU limitation

The GPU kernel is consistent with CORRECTED mode and does not fully support LEGACY mode; in the example above the two modes differ by 3600 seconds (one hour).

Follow-up

LEGACY mode needs a GPU implementation that matches Spark's legacy parser.
Spark code for LEGACY mode: Link
It uses SimpleDateFormat:

class LegacySimpleTimestampFormatter(
    pattern: String,
    zoneId: ZoneId,
    locale: Locale,
    lenient: Boolean = true) extends TimestampFormatter {
  @transient private lazy val sdf = {
    val formatter = new SimpleDateFormat(pattern, locale)
    formatter.setTimeZone(TimeZone.getTimeZone(zoneId))
    formatter.setLenient(lenient)
    formatter
  }

  override def parse(s: String): Long = {
    fromJavaTimestamp(new Timestamp(sdf.parse(s).getTime))
  }
  // ... remaining members omitted from this excerpt
}

Workaround

Disable this case on branch 24.10 when TZ is not UTC or Asia/Shanghai.
Update the documentation to clarify that not all non-DST (daylight saving time) time zones are supported; only the Asia/Shanghai time zone has been tested.
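
As a rough illustration of the first workaround item, the gate could look like the sketch below. The helper `is_legacy_tested_time_zone` and the constant `LEGACY_TESTED_TIME_ZONES` are hypothetical names, not existing integration-test utilities, and reading the zone from the `TZ` environment variable (defaulting to UTC when unset) is an assumption based on the `TZ=Africa/Casablanca` test ID.

```python
import os
from typing import Optional

# Assumption: the only zones in which the LEGACY-mode case is expected to pass,
# taken from the workaround described above.
LEGACY_TESTED_TIME_ZONES = frozenset({"UTC", "Asia/Shanghai"})

def is_legacy_tested_time_zone(tz: Optional[str] = None) -> bool:
    """Hypothetical helper: True when the LEGACY-mode case should run for this zone."""
    if tz is None:
        # Assumption: the test run exposes its time zone through the TZ variable.
        tz = os.environ.get("TZ", "UTC")
    return tz in LEGACY_TESTED_TIME_ZONES

print(is_legacy_tested_time_zone("Africa/Casablanca"))
print(is_legacy_tested_time_zone("Asia/Shanghai"))
```

A test could then be guarded with something like `@pytest.mark.skipif(not is_legacy_tested_time_zone(), reason="LEGACY mode is only tested in UTC and Asia/Shanghai")`, in the same style as the existing `is_supported_time_zone()` guard.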

@res-life
Collaborator

res-life commented Oct 8, 2024

Let's use this issue to track the support for Africa/Casablanca time zone in legacy mode.

@res-life changed the title from "[BUG] test_yyyyMMdd_format_for_legacy_mode[DATAGEN_SEED=1728130368, TZ=Africa/Casablanca] - AssertionError: GPU and CPU int values are different" to "[FEA] Implement kernel to support Africa/Casablanca time zone in LEGACY mode." Oct 8, 2024
@res-life added the feature request label Oct 8, 2024