Releases: Eventual-Inc/Daft

v0.3.8

11 Oct 15:34
ab1b772

Changes

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.3.7

11 Oct 02:30
f5d365b

Changes

👾 Bug Fixes

  • [BUG] Fix reading of logical types from streaming parquet @colin-ho (#3027)
  • [BUG] Fix reading of logical types from Parquet files in s3 @jaychia (#3026)

🧰 Maintenance

v0.3.6

08 Oct 22:14
64b8699

Changes

✨ New Features

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.3.5

01 Oct 18:14
fe4553f

Changes

✨ New Features

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.3.4

25 Sep 20:00
a8602a2

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

🧰 Maintenance

⬆️ Dependencies

v0.3.3

18 Sep 23:01
6766955

Changes

✨ New Features

🚀 Performance Improvements

  • [PERF] Use to_arrow_iter in to_arrow to avoid unnecessary array concats @jaychia (#2780)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.3.2

05 Sep 19:54
fa6a482

Changes

✨ New Features

🚀 Performance Improvements

  • [PERF] Make merging of ScanTasks be more conservative when provided with a LIMIT @jaychia (#2758)

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.3.1

24 Aug 00:40
bf5c853

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

  • [BUG] Use python logging level @colin-ho (#2705)
  • [BUG] Add a with_execution/planning_config context manager and fix tests for splitting of parquet @jaychia (#2713)
  • [BUG] Fix Resource Request Serialization and factor out Serialize Object as bincode @samster25 (#2707)

📖 Documentation

  • [DOCS] Partitioning user guide and small doc fixes @jaychia (#2717)
  • [FEAT] (ACTORS-2) Add optimization pass to split Project into ActorPoolProject @jaychia (#2627)
  • [BUG] Add a with_execution/planning_config context manager and fix tests for splitting of parquet @jaychia (#2713)
  • Update PreCommit Hooks @samster25 (#2715)
  • [FEAT]: huggingface integration @universalmind303 (#2701)

🧰 Maintenance

v0.3.0

20 Aug 22:13
b3f5260

‼️ v0.2 → v0.3 Migration Guide ‼️

We're proud to release version 0.3.0 of Daft! Please note that with this minor version increment, v0.3 contains several breaking changes:

  • daft.read_delta_lake
    • This function was deprecated in favor of daft.read_deltalake in v0.2.26 and is now removed. (#2663)
  • daft.read_parquet / daft.read_csv / daft.read_json
    • Schema hints are deprecated in favor of two parameters: infer_schema (whether to turn on schema inference) and schema (the definitive schema if infer_schema is False; otherwise a schema hint that is applied after inference). (#2326)
  • Expression.str.normalize()
    • All parameters now default to False and must be toggled on individually. (#2647)
  • DataFrame.agg / GroupedDataFrame.agg
    • Tuple syntax for aggregations was deprecated in v0.2.18 and is now no longer supported. Please use aggregation expressions instead. (#2663)
    • Ex: df.agg([(col("x"), "sum"), (col("y"), "mean")]) should be written instead as df.agg(col("x").sum(), col("y").mean())
  • DataFrame.count
    • Calling .count() with no arguments now returns a DataFrame with a single "count" column containing the length of the entire DataFrame, instead of a count for each column. (#1996)
  • DataFrame.with_column
    • Resource requests should now be specified on UDF expressions (@udf(num_gpus=…)) instead of on Projections (through .with_column(..., resource_request=...)). (#2654)
  • DataFrame.join
    • When joining two DataFrames, columns will now be merged only if they exactly match join keys. (#2631)
    • Ex:
import daft
from daft import col

df1 = daft.from_pydict({
    "a": ["x", "y"],
    "b": [1, 2],
})

df2 = daft.from_pydict({
    "a": ["y", "z"],
    "b": [20, 30],
})

result_df = df1.join(
    df2,
    left_on=[col("a"), col("b")],
    # NOTE THE "/10": the right key is an expression, not the bare column "b"
    right_on=[col("a"), col("b") / 10],
    how="outer",
)

result_df.sort("a").collect()
# before
╭──────┬───────╮
│ a    ┆ b     │
│ ---  ┆ ---   │
│ Utf8 ┆ Int64 │
╞══════╪═══════╡
│ x    ┆ 1     │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ y    ┆ 2     │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┤
│ z    ┆ 30    │
╰──────┴───────╯

# after
╭──────┬───────┬─────────╮
│ a    ┆ b     ┆ right.b │
│ ---  ┆ ---   ┆ ---     │
│ Utf8 ┆ Int64 ┆ Int64   │
╞══════╪═══════╪═════════╡
│ x    ┆ 1     ┆ None    │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ y    ┆ 2     ┆ 20      │
├╌╌╌╌╌╌┼╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌┤
│ z    ┆ None  ┆ 30      │
╰──────┴───────┴─────────╯

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

📖 Documentation

🧰 Maintenance

v0.2.33

02 Aug 21:31
9bb4b3a

Changes

✨ New Features

🚀 Performance Improvements

👾 Bug Fixes

📖 Documentation

🧰 Maintenance