From 8e4e301fdbc6620bebbfcdeddeb93ecfed1c5cb0 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 17 Jul 2024 11:18:55 +0200 Subject: [PATCH 01/16] Website: Add blog post for 17.0.0 --- _posts/2024-07-16-17.0.0-release.md | 81 +++++++++++++++++++++++++++++ 1 file changed, 81 insertions(+) create mode 100644 _posts/2024-07-16-17.0.0-release.md diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md new file mode 100644 index 000000000000..aadf8681d3f9 --- /dev/null +++ b/_posts/2024-07-16-17.0.0-release.md @@ -0,0 +1,81 @@ +--- +layout: post +title: "Apache Arrow 17.0.0 Release" +date: "2024-07-16 00:00:00" +author: pmc +categories: [release] +--- + + + +The Apache Arrow team is pleased to announce the 17.0.0 release. This covers +over 3 months of development work and includes [**331 resolved issues**][1] +on [**529 distinct commits**][2] from [**92 distinct contributors**][2]. +See the [Install Page](https://arrow.apache.org/install/) +to learn how to get the libraries for your platform. + +The release notes below are not exhaustive and only expose selected highlights +of the release. Many other bugfixes and improvements have been made: we refer +you to the [complete changelog][3]. + +## Community + +Since the 16.0.0 release, Dane Pitkin has been invited to be a committer. +No new members have joined the Project Management Committee (PMC). + +Thanks for your contributions and participation in the project! + +## C Data Interface notes + +## Arrow Flight RPC notes + +## C++ notes + +## C# notes + +## Go Notes + +## Java notes + +## JavaScript notes + +## Python notes + +## R notes + +For more on what’s in the 17.0.0 R package, see the [R changelog][4]. + +## Ruby and C GLib notes + +### Ruby + +### C GLib + +## Rust notes + +The Rust projects have moved to separate repositories outside the +main Arrow monorepo. For notes on the latest release of the Rust +implementation, see the latest [Arrow Rust changelog][5]. 
+ +[1]: https://github.com/apache/arrow/milestone/62?closed=1 +[2]: {{ site.baseurl }}/release/17.0.0.html#contributors +[3]: {{ site.baseurl }}/release/17.0.0.html#changelog +[4]: {{ site.baseurl }}/docs/r/news/ +[5]: https://github.com/apache/arrow-rs/tags From 10c89aa7301ea8874d306e3d6f5bd70ab75b0613 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 17 Jul 2024 15:41:01 +0200 Subject: [PATCH 02/16] Apply suggestions from code review Co-authored-by: David Li --- _posts/2024-07-16-17.0.0-release.md | 11 +++++++++++ 1 file changed, 11 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index aadf8681d3f9..619c931f21fd 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -46,6 +46,13 @@ Thanks for your contributions and participation in the project! ## Arrow Flight RPC notes +- Flight SQL was formally stabilized (GH-39204). +- Flight SQL added a bulk ingestion command (GH-38255). +- The JDBC Flight SQL driver now accepts "catalog" as a connection parameter (GH-41947). +- "Stateless" prepared statements are now supported (GH-37220, GH-41262). +- Java added `FlightStatusCode.RESOURCE_EXHAUSTED` (GH-35888). +- C++ has some basic support for logging with OpenTelemetry (GH-39898). + ## C++ notes ## C# notes @@ -54,6 +61,10 @@ Thanks for your contributions and participation in the project! ## Java notes +**Some changes are coming up in the next version, Arrow 18. Java 8 support will be removed. The version of the flight-core artifact with shaded gRPC will no longer be distributed.** + +- Basic support for ListView (GH-41287) and StringView (GH-40339) has been added. These types should still be considered experimental. 
+ ## JavaScript notes ## Python notes From dc6b2bae44db525109815d65ad6b5c6ece02a89f Mon Sep 17 00:00:00 2001 From: Antoine Pitrou Date: Wed, 17 Jul 2024 17:28:07 +0200 Subject: [PATCH 03/16] C++ additions --- _posts/2024-07-16-17.0.0-release.md | 83 +++++++++++++++++++++++++++++ 1 file changed, 83 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 619c931f21fd..829d1f84b695 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -55,6 +55,89 @@ Thanks for your contributions and participation in the project! ## C++ notes +- Half-float values can now be parsed and formatted correctly (GH-41089). +- Record batches can now be converted to row-major tensors, not only column-major (GH-40866). +- The CSV writer is now able to write large string arrays that are larger than + 2 GiB (GH-40270). +- A possible invalid memory access in `BooleanArray.true_count()` has been fixed (GH-41016). +- A new method `FlattenRecursively` allows recursive flattening of nested list and + fixed-size list arrays (GH-41055). +- The scratch space in some `Scalar` subclasses is now immutable. This is required + for proper concurrent access to `Scalar` instances (GH-40069). +- Calling the `bit_width` or `byte_width` method of an extension type now defers + to the underlying storage type (GH-41353). +- Fixed a bug where `MapArray::FromArrays` would behave incorrectly if the given + offsets array has a non-zero offset (GH-40750). +- `MapArray::FromArrays` now accepts an optional null bitmap argument + (GH-41684). +- The `ARROW_NO_DEPRECATED_API` macro was unused and has been removed (GH-41343). +- Building with libc++ and C++20 enabled has been fixed (GH-43095). +- mimalloc is now preferred over jemalloc as the default memory pool (GH-43254). + +### Acero + +- The left anti join filter no longer crashes when the filter rows are empty (GH-41121). +- A race condition was fixed in the asof join (GH-41149). 
+- A potential stack overflow has been fixed (GH-41334, GH-41738). +- Potential crashes on very large data have been fixed (GH-41813, GH-43046). +- A potential data corruption on very large data has been fixed (GH-43202). + +### Compute + +- List views and maps are now supported by the `if_else`, `case_when` and + `coalesce` functions (GH-41418). +- List views are now supported by the functions `list_slice` (GH-42065), + `list_parent_indices` (GH-42235), `take` and `filter` (GH-42116). +- `list_flatten` can now be recursive based on a new optional argument + (GH-41183, GH-41055). +- The `take` and `filter` functions have been made significantly faster on fixed-width + types, including fixed-size lists of fixed-width types (GH-39798). + +### Dataset + +- Repeated scanning of an encrypted Parquet dataset now works correctly (GH-41431). + +### Filesystems + +- Standard filesystem implementations are now tracked in a global registry which + also allows loading third-party filesystem implementations, for example from + runtime-loaded DLLs (GH-40342, +- Directory metadata operations on Azure filesystems are now more aligned with + the common expectations for filesystems (GH-41034). +- `CopyFile` is now supported for Azure filesystems with hierarchical namespace + enabled (GH-41095). +- Azure credentials can now be loaded explicitly from the environment (GH-39345), + or using the Azure CLI (GH-39344). +- A potential deadlock was fixed when closing an S3 output stream (GH-41862). + +### GPU + +- Non-CPU data can now be pretty-printed (GH-41664). +- Non-CPU data with offsets, such as list and binary data, can now be properly + sent over IPC (GH-42198). + +### IPC + +- Flatbuffers serialization is now more deterministic (GH-40361). + +### Parquet + +- A crash was fixed when reading an invalid Parquet file where columns claim to + be of different lengths (GH-41317). 
+- Definition and repetition levels are now more strictly checked, avoiding later + crashes when reading an invalid Parquet file (GH-41321). +- A crash was fixed when reading an invalid encrypted Parquet file (GH-43070). +- Fixed a bug where the BYTE_STREAM_SPLIT decoder could behave incorrectly + when nulls are present in a column (GH-41562). +- Fixed a bug where `DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize` could + return an invalid estimate in some situations (GH-41545). +- Delimiting records is now faster for columns with nested repeating (GH-41361). + +### Substrait + +- Support for more Arrow data types was added: some temporal types, half floats, + large string and large binary (GH-40695). + ## C# notes ## Go Notes From 77f8ef008874e9cf686d5c50f288ab57cd639711 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Wed, 17 Jul 2024 17:34:04 +0200 Subject: [PATCH 04/16] Update _posts/2024-07-16-17.0.0-release.md Co-authored-by: Felipe Oliveira Carvalho --- _posts/2024-07-16-17.0.0-release.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 829d1f84b695..8a1bcee4cc60 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -44,6 +44,7 @@ Thanks for your contributions and participation in the project! ## C Data Interface notes + - `ArrowDeviceArrayStream` can now be imported and exported (GH-40078) ## Arrow Flight RPC notes - Flight SQL was formally stabilized (GH-39204). 
From 6aaa6aa69da0ff95c40f2288a960f7534e17fee8 Mon Sep 17 00:00:00 2001 From: Antoine Pitrou Date: Wed, 17 Jul 2024 18:47:46 +0200 Subject: [PATCH 05/16] Remove 18.0.0 changes --- _posts/2024-07-16-17.0.0-release.md | 5 +---- 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 8a1bcee4cc60..fed18780e234 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -72,16 +72,13 @@ Thanks for your contributions and participation in the project! - `MapArray::FromArrays` now accepts an optional null bitmap argument (GH-41684). - The `ARROW_NO_DEPRECATED_API` macro was unused and has been removed (GH-41343). -- Building with libc++ and C++20 enabled has been fixed (GH-43095). -- mimalloc is now preferred over jemalloc as the default memory pool (GH-43254). ### Acero - The left anti join filter no longer crashes when the filter rows are empty (GH-41121). - A race condition was fixed in the asof join (GH-41149). - A potential stack overflow has been fixed (GH-41334, GH-41738). -- Potential crashes on very large data have been fixed (GH-41813, GH-43046). -- A potential data corruption on very large data has been fixed (GH-43202). +- A potential crash on very large data has been fixed (GH-41813). ### Compute From 571e6d797aae0e438eaf473151e4835436d17383 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:41:31 +0900 Subject: [PATCH 06/16] Add Linux packages/Ruby/GLib notes --- _posts/2024-07-16-17.0.0-release.md | 15 +++++++++++++-- 1 file changed, 13 insertions(+), 2 deletions(-) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index fed18780e234..a935a9d1d5cb 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -42,9 +42,14 @@ No new members have joined the Project Management Committee (PMC). Thanks for your contributions and participation in the project! 
+## Linux packages notes + +- We dropped support for Debian GNU/Linux bullseye + ## C Data Interface notes - - `ArrowDeviceArrayStream` can now be imported and exported (GH-40078) +- `ArrowDeviceArrayStream` can now be imported and exported (GH-40078) + ## Arrow Flight RPC notes - Flight SQL was formally stabilized (GH-39204). @@ -143,7 +148,7 @@ Thanks for your contributions and participation in the project! ## Java notes **Some changes are coming up in the next version, Arrow 18. Java 8 support will be removed. The version of the flight-core artifact with shaded gRPC will no longer be distributed.** - + - Basic support for ListView (GH-41287) and StringView (GH-40339) has been added. These types should still be considered experimental. ## JavaScript notes @@ -158,8 +163,14 @@ For more on what’s in the 17.0.0 R package, see the [R changelog][4]. ### Ruby +- Improved `Arrow::Table#to_s` format + - This is a breaking change + ### C GLib +- Added support for Microsoft Visual C++ +- Added `gadataset_dataset_to_record_batch_reader()` + ## Rust notes The Rust projects have moved to separate repositories outside the From b02c0259f042615b2b40e87b37e59c6caa881041 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:42:19 +0900 Subject: [PATCH 07/16] Add R notes Co-authored-by: Bryce Mecum --- _posts/2024-07-16-17.0.0-release.md | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index a935a9d1d5cb..8d536255acb6 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -157,6 +157,11 @@ Thanks for your contributions and participation in the project! ## R notes +* R functions that users write that use functions that Arrow supports in dataset queries now can be used in queries too. Previously, only functions that used arithmetic operators worked. 
For example, `time_hours <- function(mins) mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` did not; now both work. These are automatic translations rather than true user-defined functions (UDFs); for UDFs, see `register_scalar_function()`. [GH-41223](https://github.com/apache/arrow/issues/41223) +* `summarize()` supports more complex expressions, and correctly handles cases where column names are reused in expressions. [GH-41323](https://github.com/apache/arrow/issues/41323) +* The `na_matches` argument to the `dplyr::*_join()` functions is now supported. This argument controls whether `NA` values are considered equal when joining. [GH-41358](https://github.com/apache/arrow/issues/41358) +* R metadata, stored in the Arrow schema to support round-tripping data between R and Arrow/Parquet, is now serialized and deserialized more strictly. This makes it safer to load data from files from unknown sources into R data.frames. [GH-41969](https://github.com/apache/arrow/issues/41969) + For more on what’s in the 17.0.0 R package, see the [R changelog][4]. ## Ruby and C GLib notes From 0ff2eb6bf8c9579bd286c92d42c76e0e12eea04f Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:42:32 +0900 Subject: [PATCH 08/16] Add C# notes Co-authored-by: Adam Reeve --- _posts/2024-07-16-17.0.0-release.md | 6 ++++++ 1 file changed, 6 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 8d536255acb6..e8497408770d 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -143,6 +143,12 @@ Thanks for your contributions and participation in the project! 
## C# notes +- The performance of building Decimal arrays using SqlDecimal values was improved for .NET 7+ (GH-41349) +- Scalar arrays now implement `ICollection` (GH-38692) +- Concatenating arrays with a non-zero offset with ArrowArrayConcatenator was fixed (GH-41164) +- Concatenating union arrays with ArrowArrayConcatenator was fixed (GH-41198) +- Accessing values of decimal arrays with a non-zero offset was fixed (GH-41199) + ## Go Notes ## Java notes From 7e10c33a37f779f6c859425e34d72b59c1f5ec33 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:42:46 +0900 Subject: [PATCH 09/16] Add Go notes Co-authored-by: Joel Lubinitsky <33523178+joellubi@users.noreply.github.com> --- _posts/2024-07-16-17.0.0-release.md | 23 +++++++++++++++++++++++ 1 file changed, 23 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index e8497408770d..5e322cd81bcb 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -151,6 +151,29 @@ Thanks for your contributions and participation in the project! 
## Go Notes +### Bug Fixes + +#### Arrow + +- Prevent exposure of invalid Go pointers in CGO code ([GH-43062](https://github.com/apache/arrow/issues/43062)) +- Fix memory leak for 0-length C array imports ([GH-41534](https://github.com/apache/arrow/issues/41534)) +- Ensure statement handle is updated so stateless prepared statements work properly ([GH-41427](https://github.com/apache/arrow/issues/41427)) + +#### Parquet + +- Fix memory leak in BufferedPageWriter ([GH-41697](https://github.com/apache/arrow/issues/41697)) +- Fix performance regression in PooledBufferWriter ([GH-41541](https://github.com/apache/arrow/issues/41541)) + +### Enhancements + +#### Arrow + +- Arrow Schemas and Records can now be created from Protobuf messages ([GH-40494](https://github.com/apache/arrow/issues/40494)) + +#### Parquet + +- Performance improvement for BitWriter VlqInt ([GH-41160](https://github.com/apache/arrow/pull/41160)) + ## Java notes **Some changes are coming up in the next version, Arrow 18. Java 8 support will be removed. The version of the flight-core artifact with shaded gRPC will no longer be distributed.** From 1da39de9054db24b74613e35aa6b1543333b4c05 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:43:17 +0900 Subject: [PATCH 10/16] Add Python notes Co-authored-by: Dane Pitkin --- _posts/2024-07-16-17.0.0-release.md | 3 +++ 1 file changed, 3 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 5e322cd81bcb..03e8ff2d9a36 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -184,6 +184,9 @@ Thanks for your contributions and participation in the project! ## Python notes +- Added support for Numpy 2.0 (GH-42170, GH-41924). +- Added support for Emscripten via Pyodide (GH-41910). + ## R notes * R functions that users write that use functions that Arrow supports in dataset queries now can be used in queries too. 
Previously, only functions that used arithmetic operators worked. For example, `time_hours <- function(mins) mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` did not; now both work. These are automatic translations rather than true user-defined functions (UDFs); for UDFs, see `register_scalar_function()`. [GH-41223](https://github.com/apache/arrow/issues/41223) From f571d0c6ba049a23cb8551348a5ce28f83980c24 Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:44:28 +0900 Subject: [PATCH 11/16] Add one more C++ Filesystem note Co-authored-by: Rossi Sun --- _posts/2024-07-16-17.0.0-release.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 03e8ff2d9a36..eee200c4bd3f 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -84,6 +84,7 @@ Thanks for your contributions and participation in the project! - A race condition was fixed in the asof join (GH-41149). - A potential stack overflow has been fixed (GH-41334, GH-41738). - A potential crash on very large data has been fixed (GH-41813). +- Asof join and sort merge join now support single threaded mode (GH-41190). ### Compute From 933523c5f49358a13a1124e9a4f9650dc346e2ba Mon Sep 17 00:00:00 2001 From: Sutou Kouhei Date: Thu, 18 Jul 2024 17:45:25 +0900 Subject: [PATCH 12/16] Remove needless Parquet notes Co-authored-by: Adam Reeve --- _posts/2024-07-16-17.0.0-release.md | 2 -- 1 file changed, 2 deletions(-) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index eee200c4bd3f..bfe07d1c12fd 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -131,8 +131,6 @@ Thanks for your contributions and participation in the project! - Definition and repetition levels are now more strictly checked, avoiding later crashes when reading an invalid Parquet file (GH-41321). 
- A crash was fixed when reading an invalid encrypted Parquet file (GH-43070). -- Fixed a bug where the BYTE_STREAM_SPLIT decoder could behave incorrectly - when nulls are present in a column (GH-41562). - Fixed a bug where `DeltaLengthByteArrayEncoder::EstimatedDataEncodedSize` could return an invalid estimate in some situations (GH-41545). - Delimiting records is now faster for columns with nested repeating (GH-41361). From 1a2adf90316379d56dbf09305cb168fa7eda7d21 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Thu, 18 Jul 2024 16:34:30 +0200 Subject: [PATCH 13/16] Update JS notes --- _posts/2024-07-16-17.0.0-release.md | 2 ++ 1 file changed, 2 insertions(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index bfe07d1c12fd..f827c4327c9f 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -181,6 +181,8 @@ Thanks for your contributions and participation in the project! ## JavaScript notes +- General maintenance. Clean up packaging ([GH-39722](https://github.com/apache/arrow/issues/39722)), update dependencies ([GH-41905](https://github.com/apache/arrow/issues/41905)). + ## Python notes - Added support for Numpy 2.0 (GH-42170, GH-41924). From ed2a2a517b18e65abb7704c2ac2d4bfc7e80dd6e Mon Sep 17 00:00:00 2001 From: Bryce Mecum Date: Thu, 18 Jul 2024 09:15:49 -0800 Subject: [PATCH 14/16] Update _posts/2024-07-16-17.0.0-release.md --- _posts/2024-07-16-17.0.0-release.md | 1 + 1 file changed, 1 insertion(+) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index f827c4327c9f..7ffa35af315d 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -191,6 +191,7 @@ Thanks for your contributions and participation in the project! ## R notes * R functions that users write that use functions that Arrow supports in dataset queries now can be used in queries too. 
Previously, only functions that used arithmetic operators worked. For example, `time_hours <- function(mins) mins / 60` worked, but `time_hours_rounded <- function(mins) round(mins / 60)` did not; now both work. These are automatic translations rather than true user-defined functions (UDFs); for UDFs, see `register_scalar_function()`. [GH-41223](https://github.com/apache/arrow/issues/41223) +* `mutate()` expressions can now include aggregations, such as `x - mean(x)`. [GH-41350](https://github.com/apache/arrow/pull/41350) * `summarize()` supports more complex expressions, and correctly handles cases where column names are reused in expressions. [GH-41323](https://github.com/apache/arrow/issues/41323) * The `na_matches` argument to the `dplyr::*_join()` functions is now supported. This argument controls whether `NA` values are considered equal when joining. [GH-41358](https://github.com/apache/arrow/issues/41358) * R metadata, stored in the Arrow schema to support round-tripping data between R and Arrow/Parquet, is now serialized and deserialized more strictly. This makes it safer to load data from files from unknown sources into R data.frames. [GH-41969](https://github.com/apache/arrow/issues/41969) From cfd2532a48c01edfcb92347c03cd6ff7b947d234 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Fri, 19 Jul 2024 11:01:35 +0200 Subject: [PATCH 15/16] Update Python notes Co-authored-by: Dane Pitkin --- _posts/2024-07-16-17.0.0-release.md | 25 +++++++++++++++++++++++-- 1 file changed, 23 insertions(+), 2 deletions(-) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index 7ffa35af315d..ee85ba51cc62 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -185,8 +185,29 @@ Thanks for your contributions and participation in the project! ## Python notes -- Added support for Numpy 2.0 (GH-42170, GH-41924). -- Added support for Emscripten via Pyodide (GH-41910). 
+Compatibility notes: +* To ensure Python 3.13 compatibility, _Py_IsFinalizing has been replaced with a public API (GH-41475). +* The C Data Interface now supports CUDA devices (GH-40384). + +New features: +* Added support for Emscripten via Pyodide (GH-41910). + +Other improvements: +* The ParquetWriter added the store_decimal_as_integer option (GH-42168). +* The Float16 logical type is supported in Parquet (GH-42016). +* Exposed bit_width and byte_width to extension types (GH-41389). +* Added bindings for Device and MemoryManager classes (GH-41126). +* The PyCapsule interface now exposes the device interface (GH-38325). +* Various PyArrow APIs have been updated to work with non-CPU architectures gracefully. (GH-42112, GH-41664, GH-41662, + +Relevant bug fixes: +* Fixed Numpy 2.0 compatibility issues (GH-42170, GH-41924). +* Fixed sporadic as_of join failures (GH-41149). +* Fixed a bug in RecordBatch.filter() when passing a ChunkedArray, which would cause a segfault (GH-38770). +* Fixed a bug in RecordBatch.from_arrays() when passing a storage array, which would cause a segfault (GH-37669). +* Fixed a bug where constructing a MapArray from Array could drop nulls (GH-41684). +* Fixed a bug where RunEndEncodedArray.from_arrays fails if run_ends are pyarrow.Array (GH-40560). +* Fixed a regression introduced in PyArrow v16 in RecordBatchReader.cast() (GH-41884). ## R notes From 6737037a491084dff36082313fbb27dc8d196b05 Mon Sep 17 00:00:00 2001 From: =?UTF-8?q?Ra=C3=BAl=20Cumplido?= Date: Fri, 19 Jul 2024 11:05:55 +0200 Subject: [PATCH 16/16] Minor formatting issues --- _posts/2024-07-16-17.0.0-release.md | 12 ++++++++---- 1 file changed, 8 insertions(+), 4 deletions(-) diff --git a/_posts/2024-07-16-17.0.0-release.md b/_posts/2024-07-16-17.0.0-release.md index ee85ba51cc62..ffc29694b7f8 100644 --- a/_posts/2024-07-16-17.0.0-release.md +++ b/_posts/2024-07-16-17.0.0-release.md @@ -61,6 +61,10 @@ Thanks for your contributions and participation in the project! 
## C++ notes +For C++ notes refer to the full changelog. + +### Highlights + - Half-float values can now be parsed and formatted correctly (GH-41089). - Record batches can now be converted to row-major tensors, not only column-major (GH-40866). - The CSV writer is now able to write large string arrays that are larger than @@ -185,14 +189,14 @@ Thanks for your contributions and participation in the project! ## Python notes -Compatibility notes: +### Compatibility notes: * To ensure Python 3.13 compatibility, _Py_IsFinalizing has been replaced with a public API (GH-41475). * The C Data Interface now supports CUDA devices (GH-40384). -New features: +### New features: * Added support for Emscripten via Pyodide (GH-41910). -Other improvements: +### Other improvements: * The ParquetWriter added the store_decimal_as_integer option (GH-42168). * The Float16 logical type is supported in Parquet (GH-42016). * Exposed bit_width and byte_width to extension types (GH-41389). @@ -200,7 +204,7 @@ Other improvements: * The PyCapsule interface now exposes the device interface (GH-38325). * Various PyArrow APIs have been updated to work with non-CPU architectures gracefully. (GH-42112, GH-41664, GH-41662, -Relevant bug fixes: +### Relevant bug fixes: * Fixed Numpy 2.0 compatibility issues (GH-42170, GH-41924). * Fixed sporadic as_of join failures (GH-41149). * Fixed a bug in RecordBatch.filter() when passing a ChunkedArray, which would cause a segfault (GH-38770).
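
Reviewer note on the `store_decimal_as_integer` item in the Python notes above: the Parquet format lets a DECIMAL column be stored as its unscaled integer, in an INT32 column for precision up to 9 or an INT64 column for precision up to 18. The sketch below illustrates only that mapping with the Python standard library. It does not call PyArrow, and the helper names `to_unscaled` and `physical_type_for` are hypothetical, chosen for this illustration.

```python
from decimal import Decimal


def to_unscaled(value: Decimal, scale: int) -> int:
    """Return the integer that represents `value` at the given decimal scale.

    Hypothetical helper for illustration only; not part of PyArrow.
    """
    # Shift the decimal point right by `scale` digits, e.g. 1.23 -> 123 for scale=2.
    shifted = value.scaleb(scale)
    # Reject values that carry more fractional digits than the scale allows.
    if shifted != shifted.to_integral_value():
        raise ValueError(f"{value} does not fit in scale {scale}")
    return int(shifted)


def physical_type_for(precision: int) -> str:
    """Name the Parquet physical type an integer-backed decimal could use.

    Hypothetical helper; the thresholds follow the Parquet DECIMAL rules
    (INT32 up to 9 digits, INT64 up to 18 digits).
    """
    if precision < 1:
        raise ValueError("precision must be at least 1")
    if precision <= 9:
        return "INT32"
    if precision <= 18:
        return "INT64"
    # Wider decimals cannot be held in a 64-bit integer.
    return "FIXED_LEN_BYTE_ARRAY"
```

With these helpers, a `decimal(9, 2)` value such as `Decimal("1.23")` maps to the integer `123` in an INT32 column, which is the kind of layout the option is meant to enable.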