feat: Datetime(time_unit, time_zone) and Duration(time_unit) types #960

Open
wants to merge 34 commits into
base: main

FBruzzesi
Member

@FBruzzesi FBruzzesi commented Sep 13, 2024

What type of PR is this? (check all applicable)

  • 💾 Refactor
  • ✨ Feature
  • 🐛 Bug Fix
  • 🔧 Optimization
  • 📝 Documentation
  • ✅ Test
  • 🐳 Other

Related issues

Checklist

  • Code follows style guide (ruff)
  • Tests added
  • Documented the changes

If you have comments or can explain your changes, please do so below.

Introduces time units and time zones in the Datetime type.

@github-actions github-actions bot added the enhancement New feature or request label Sep 13, 2024
@@ -75,13 +75,13 @@ def test_cast_date_datetime_pandas() -> None:
df = df.select(nw.col("a").cast(nw.Datetime))
result = nw.to_native(df)
expected = pd.DataFrame({"a": [datetime(2020, 1, 1), datetime(2020, 1, 2)]}).astype(
-{"a": "timestamp[ns][pyarrow]"}
+{"a": "timestamp[us][pyarrow]"}
Member Author

This changes the default time unit to match the Polars one ("us" instead of "ns").

nw.col("date")
.cast(nw.Datetime("ms", time_zone="Europe/Rome"))
.cast(nw.String())
.str.slice(offset=0, length=19)
Member Author

19 is the number of characters in "2024-01-01 01:00:00". The format after that point differs for each backend.
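A minimal sketch of why slicing to 19 characters makes the comparison backend-independent. It uses only the standard library; the fixed +01:00 offset stands in for Europe/Rome in winter, and the three renderings are illustrative assumptions, not the actual output of any backend.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed +01:00 offset stands in for Europe/Rome on 2024-01-01,
# so the example does not need a time zone database.
rome_winter = timezone(timedelta(hours=1))
dt = datetime(2024, 1, 1, 1, 0, 0, tzinfo=rome_winter)

# Illustrative renderings only: they differ after the seconds field,
# much like the string casts of different backends do.
renderings = [
    str(dt),
    dt.strftime("%Y-%m-%d %H:%M:%S.%f%z"),
    dt.strftime("%Y-%m-%d %H:%M:%S%z"),
]

# Keeping the first 19 characters drops the part that varies.
prefixes = {text[:19] for text in renderings}
print(prefixes)
```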

Comment on lines +227 to +230
pd_datetime_rgx = (
r"^datetime64\[(?P<time_unit>ms|us|ns)(?:, (?P<time_zone>[a-zA-Z\/]+))?\]$"
)
pa_datetime_rgx = r"^timestamp\[(?P<time_unit>ms|us|ns)(?:, tz=(?P<time_zone>[a-zA-Z\/]+))?\]\[pyarrow\]$"
Member Author

Please try to break these 🙈
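One way to probe the two patterns, sketched with the standard `re` module. The `parse` helper and the sample dtype strings are assumptions added for illustration. The character class `[a-zA-Z\/]+` accepts only letters and slashes, so IANA names containing an underscore, digit, `+` or `-` (for example America/New_York or Etc/GMT+1) are not matched, and neither is a time unit outside ms/us/ns.

```python
import re

# Patterns copied from the diff under review.
pd_datetime_rgx = (
    r"^datetime64\[(?P<time_unit>ms|us|ns)(?:, (?P<time_zone>[a-zA-Z\/]+))?\]$"
)
pa_datetime_rgx = r"^timestamp\[(?P<time_unit>ms|us|ns)(?:, tz=(?P<time_zone>[a-zA-Z\/]+))?\]\[pyarrow\]$"


def parse(pattern, dtype_string):
    """Hypothetical helper: return (time_unit, time_zone), or None if no match."""
    match = re.match(pattern, dtype_string)
    if match is None:
        return None
    return match.group("time_unit"), match.group("time_zone")


# Strings the patterns handle as intended.
ok_naive = parse(pd_datetime_rgx, "datetime64[ns]")
ok_rome = parse(pd_datetime_rgx, "datetime64[us, Europe/Rome]")
ok_arrow = parse(pa_datetime_rgx, "timestamp[ms, tz=Asia/Kathmandu][pyarrow]")

# Strings that slip through: underscore, digits and "+", and the "s" unit.
miss_underscore = parse(pd_datetime_rgx, "datetime64[ns, America/New_York]")
miss_offset = parse(pa_datetime_rgx, "timestamp[us, tz=Etc/GMT+1][pyarrow]")
miss_seconds = parse(pd_datetime_rgx, "datetime64[s]")
```

Widening the class to something like `[a-zA-Z0-9_+\-\/]+` would cover these names, though whether that is the right fix for this PR is left open here.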

Comment on lines 441 to 448
# Pandas does not support "ms" or "us" time units before version 1.5.0
# Let's overwrite with "ns"
if implementation is Implementation.PANDAS and backend_version < (
    1,
    5,
    0,
):  # pragma: no cover
    time_unit = "ns"
Member Author

I don't think we can do much else here
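The fallback relies on Python's element-wise tuple comparison. A minimal standalone sketch follows; `resolve_time_unit` and its `is_pandas` flag are hypothetical names, and the real code checks `implementation is Implementation.PANDAS` instead.

```python
def resolve_time_unit(time_unit, backend_version, is_pandas=True):
    """Hypothetical helper mirroring the version guard in the diff.

    `backend_version` is a tuple such as (1, 4, 4); tuples compare
    element by element, so (1, 4, 4) < (1, 5, 0) is True.
    """
    if is_pandas and backend_version < (1, 5, 0):
        # Older pandas: overwrite the requested unit with "ns".
        return "ns"
    return time_unit


print(resolve_time_unit("ms", (1, 4, 4)))  # falls back
print(resolve_time_unit("ms", (2, 2, 0)))  # keeps the requested unit
```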

@FBruzzesi
Member Author

@MarcoGorelli any experience on how to locate timezone database at "C:\Users\runneradmin\Downloads\tzdata" ? 😂

@FBruzzesi FBruzzesi changed the title from "feat: time zone aware Datetime type" to "feat: Datetime(time_unit, time_zone) and Duration(time_unit) types" Sep 14, 2024
@MarcoGorelli
Member

wow, nice one! i'll try breaking it a bit, but this is amazing, been wanting to do this for a while 🚀

@MarcoGorelli
Member

I think the dtype comparison isn't quite right:

In [17]: s
Out[17]:
shape: (1,)
Series: '' [datetime[μs, Asia/Kathmandu]]
[
        2019-12-31 18:15:00 +0545
]

In [18]: nw.from_native(s, allow_series=True).dtype == nw.Datetime('us')
Out[18]: True

In [19]: s.dtype == pl.Datetime('us')
Out[19]: False

@FBruzzesi
Member Author

Nice catch! Thanks! Will fix later on 👌
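The comparison reported above can be illustrated with a toy dtype whose equality takes both parameters into account. This is a sketch under assumptions, not the Narwhals implementation: the class below is a stand-in with the same constructor arguments as in the PR title.

```python
class Datetime:
    """Toy stand-in for a parametrised datetime dtype (not the Narwhals class)."""

    def __init__(self, time_unit="us", time_zone=None):
        self.time_unit = time_unit
        self.time_zone = time_zone

    def __eq__(self, other):
        if not isinstance(other, Datetime):
            return NotImplemented
        # Both the unit and the zone must agree for two dtypes to be equal.
        return (self.time_unit, self.time_zone) == (other.time_unit, other.time_zone)

    def __hash__(self):
        # Keep the hash consistent with __eq__.
        return hash((Datetime, self.time_unit, self.time_zone))


aware = Datetime("us", "Asia/Kathmandu")
naive = Datetime("us")
print(aware == naive)
```

With this definition a time zone aware dtype no longer compares equal to a naive one with the same unit, which is the behaviour Polars shows in the session above.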

@MarcoGorelli
Member

🤔 interesting, so elif dtype in {nw.Datetime, nw.Date}: breaks if we add the time_unit and time_zone attributes

@MarcoGorelli
Member

do we need to keep this as return hash(self.__class__) so that

nw.Datetime('us', 'foo') in {nw.Datetime}

keeps working?

It's unfortunate that I'd written the code in Altair like that, but I also don't think it's too bad, it's just a little annoying that it departs from what Polars does

this might actually be a really good use-case for our stable v1 api. keep Altair's code working as-is, but make the __hash__ behaviour align with Polars' in the main Narwhals namespace and in v2 (when we get there)
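To make the trade-off concrete, here is a sketch of the two hashing strategies using toy classes. The names `StrictDatetime` and `ClassHashDatetime` are hypothetical and the equality rule for bare classes is an assumption; neither is the Narwhals or Polars code. Set membership first compares hashes, so an instance can only be found in `{SomeClass}` if it hashes like the class itself.

```python
class StrictDatetime:
    """Toy dtype whose hash includes its parameters."""

    def __init__(self, time_unit="us", time_zone=None):
        self.time_unit = time_unit
        self.time_zone = time_zone

    def __eq__(self, other):
        if isinstance(other, type):
            # Assumption: comparing with the bare class means "any such dtype".
            return other is type(self)
        if isinstance(other, StrictDatetime):
            return (self.time_unit, self.time_zone) == (
                other.time_unit,
                other.time_zone,
            )
        return NotImplemented

    def __hash__(self):
        return hash((type(self), self.time_unit, self.time_zone))


class ClassHashDatetime(StrictDatetime):
    """Same equality, but hashes like its class: return hash(self.__class__)."""

    def __hash__(self):
        return hash(self.__class__)


strict = StrictDatetime("us", "foo")
lenient = ClassHashDatetime("us", "foo")

# Equality with the bare class holds in both variants...
equal_to_class = (strict == StrictDatetime, lenient == ClassHashDatetime)

# ...but only the class-hash variant is found in a set of classes.
strict_in_set = strict in {StrictDatetime}
lenient_in_set = lenient in {ClassHashDatetime}
print(equal_to_class, strict_in_set, lenient_in_set)
```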

@FBruzzesi
Member Author

I wonder what that would look like concretely. Would we re-define the type in /stable/v1.py with the current behavior and evolve it in the main codebase?

@FBruzzesi
Member Author

Since the Altair CI is failing (dtype in {nw.Datetime, nw.Date} evaluates to False because of the equality method), I think the easiest way to preserve stability is to keep the previous implementation of Datetime and Duration in /stable/v1.py.

@MarcoGorelli WDYT?

@MarcoGorelli
Member

i think this isn't super-easy because there's a few places where we do from narwhals import dtypes

@MarcoGorelli
Member

ok this is going to take some more effort

(Pdb) p type(column)
<class 'narwhals.stable.v1.Series'>
(Pdb) p type(column.dtype)
<class 'narwhals.dtypes.Datetime'>

I think the second one should be

<class 'narwhals.stable.v1.Datetime'>

this is going to be tricky to get right, but I think it'll be worth it. i'll spend some more time on this
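One possible shape for the behaviour described here, sketched with toy classes. Everything below is hypothetical (`MainSeries`, `V1Series`, `V1Datetime`, and the assumption that the v1 dtype compares by class only); it is not the approach taken in this PR or in #1046, only an illustration of a v1 wrapper handing back v1 dtypes.

```python
class Datetime:
    """Toy main-namespace dtype: equality and hash depend on the parameters."""

    def __init__(self, time_unit="us", time_zone=None):
        self.time_unit = time_unit
        self.time_zone = time_zone

    def __eq__(self, other):
        if not isinstance(other, Datetime):
            return NotImplemented
        return (self.time_unit, self.time_zone) == (other.time_unit, other.time_zone)

    def __hash__(self):
        return hash((Datetime, self.time_unit, self.time_zone))


class V1Datetime(Datetime):
    """Toy stable dtype. Assumption: it keeps class-only equality and hashing."""

    def __eq__(self, other):
        if isinstance(other, type):
            return other is V1Datetime
        return isinstance(other, V1Datetime)

    def __hash__(self):
        return hash(V1Datetime)


class MainSeries:
    def __init__(self, dtype):
        self._dtype = dtype

    @property
    def dtype(self):
        return self._dtype


class V1Series(MainSeries):
    @property
    def dtype(self):
        main = super().dtype
        # Re-wrap main-namespace dtypes so v1 users only ever see v1 dtypes.
        if isinstance(main, Datetime) and not isinstance(main, V1Datetime):
            return V1Datetime(main.time_unit, main.time_zone)
        return main


column = V1Series(Datetime("us", "Asia/Kathmandu"))
print(type(column.dtype).__name__)
```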

@FBruzzesi
Member Author

Could we use the approach suggested in #1046?

@MarcoGorelli
Member

Looks like this might be it 🥳 🍾 can't believe it...this took hours...worth it though. It means we can change the main narwhals namespace, with zero impact on Altair users. as it turns out, the stable api was really worth doing... this is so rewarding 🕺

i think the nightly ci failure is unrelated

i'll check this again tomorrow, then hopefully we can make a release with this in on Tuesday

@FBruzzesi
Member Author

That's awesome! I will take some time this upcoming week to check what happened in detail!
Also, plotly has a test with a specific time zone, so I am looking forward to the release of this feature 😁

However, to use the "edge" dtypes (or features in general), should we

+ import narwhals as nw
- import narwhals.stable.v1 as nw

?

Labels
enhancement New feature or request
Development

Successfully merging this pull request may close these issues.

[Enh]: Add time units and time zone specifics
2 participants