Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[FEAT]: Interval dtype #3018

Open
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

universalmind303
Copy link
Collaborator

@universalmind303 universalmind303 commented Oct 7, 2024

Description

implement the arrow interval datatype in daft.

Why?

Our current duration dtype is useful for various operations, but it only designed to handle fixed durations. Users will often want to perform operations using relative times such as years, months, and weeks, but this is currently only possible by incorporating additional libraries to handle the relative to fixed conversions for us.

The addition of Interval means that we can natively operate on relative times. This opens up a lot of possibilities for SQL, as we can now use the SQL INTERVAL operator to operate on datetimes.

Still TODO

  • implement ops for Interval
    • add
    • sub
  • python tests

Note for reviewer:

arrow's interval spec is kinda weird, and is actually 3 different implementations depending on the interval

  • years and months are represented as a logical type over i32
  • days/millis is a physical type over (i32, i32)
  • month/day/nano is a physical type over (i32, i32, i64)

Since most of the kernels are built around the month/day/nano variant, I created the daft Interval type around that one. This introduces a slight amount of overhead when using only year/month (extra i32 & i64) or days/millis (extra i64)

But considering the most common pattern for interval is to do date math on Timestamp columns, this should not be a huge concern.

@github-actions github-actions bot added the enhancement New feature or request label Oct 7, 2024
Copy link

codspeed-hq bot commented Oct 7, 2024

CodSpeed Performance Report

Merging #3018 will not alter performance

Comparing universalmind303:interval-dtype (51fda30) with main (c2397bf)

Summary

✅ 17 untouched benchmarks

@universalmind303 universalmind303 marked this pull request as ready for review October 8, 2024 17:10
Comment on lines +143 to +145
// ----------------
// Python
// ----------------
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these match statements were very difficult to comprehend. the comments IMO make it much easier to follow the matching logic.

Copy link

codecov bot commented Oct 8, 2024

Codecov Report

Attention: Patch coverage is 72.26463% with 109 lines in your changes missing coverage. Please review.

Project coverage is 78.38%. Comparing base (0c39007) to head (51fda30).
Report is 4 commits behind head on main.

Files with missing lines Patch % Lines
src/daft-core/src/datatypes/interval.rs 80.00% 46 Missing ⚠️
src/daft-core/src/series/ops/arithmetic.rs 38.46% 16 Missing ⚠️
src/daft-core/src/array/ops/repr.rs 0.00% 10 Missing ⚠️
src/daft-core/src/array/from_iter.rs 50.00% 9 Missing ⚠️
src/daft-core/src/array/serdes.rs 0.00% 9 Missing ⚠️
src/daft-dsl/src/lit.rs 56.25% 7 Missing ⚠️
src/daft-core/src/series/serdes.rs 0.00% 4 Missing ⚠️
src/daft-schema/src/python/datatype.rs 0.00% 3 Missing ⚠️
src/daft-core/src/array/ops/sort.rs 0.00% 2 Missing ⚠️
src/daft-table/src/repr_html.rs 0.00% 2 Missing ⚠️
... and 1 more
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3018      +/-   ##
==========================================
- Coverage   78.39%   78.38%   -0.02%     
==========================================
  Files         603      604       +1     
  Lines       71443    71897     +454     
==========================================
+ Hits        56007    56355     +348     
- Misses      15436    15542     +106     
Files with missing lines Coverage Δ
daft/expressions/__init__.py 100.00% <100.00%> (ø)
daft/expressions/expressions.py 93.78% <100.00%> (+0.02%) ⬆️
src/daft-core/src/array/growable/arrow_growable.rs 100.00% <ø> (ø)
src/daft-core/src/array/growable/mod.rs 100.00% <ø> (ø)
src/daft-core/src/array/ops/as_arrow.rs 100.00% <ø> (ø)
src/daft-core/src/array/ops/compare_agg.rs 66.50% <ø> (ø)
src/daft-core/src/array/ops/get.rs 78.57% <ø> (ø)
src/daft-core/src/array/ops/take.rs 99.41% <ø> (ø)
src/daft-core/src/array/ops/time.rs 85.63% <100.00%> (+1.26%) ⬆️
src/daft-core/src/datatypes/infer_datatype.rs 86.00% <100.00%> (+0.08%) ⬆️
... and 19 more

... and 4 files with indirect coverage changes

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant