[R] Invalid: Float value was truncated converting to int32 #44041

Open
MislavSag opened this issue Sep 10, 2024 · 1 comment

Comments

@MislavSag
Describe the bug, including details regarding any error messages, version, and platform.

I am using the R arrow package to retrieve partitioned Parquet data. The data is partitioned by the symbol variable. I try to retrieve some data using dplyr syntax:

  ds = open_dataset(path, format = "parquet")
  s_ = schema(ds)
  s_$TSFEL_0_Spectral_roll_on_66
  x <- ds %>%
    dplyr::mutate(TSFEL_0_Spectral_roll_on_66 = arrow::cast(TSFEL_0_Spectral_roll_on_66, int32(), safe = FALSE)) %>%
    dplyr::select(any_of(c("symbol", "date", "TSFEL_0_Spectral_roll_on_66"))) %>%
    dplyr::filter(symbol == "app") %>%
    collect()

The error I get:

Error in `compute.arrow_dplyr_query()`:
! Invalid: Float value 1.51515 was truncated converting to int32
Run `rlang::last_trace()` to see where the error occurred.

If I use some other symbol (I haven't tried all of them), the above code works, but for the symbol app it doesn't. I figured out that the problem is in the above variable, which has type int32.

I am using the latest version of arrow (1.17).

Component(s)

R

@kou changed the title from "Invalid: Float value was truncated converting to int32" to "[R] Invalid: Float value was truncated converting to int32" on Sep 10, 2024
@amoeba
Member

amoeba commented Sep 12, 2024

Hi @MislavSag, are you able to share your Parquet file or (preferably) a minimal sample of it that we could use to reproduce this?

When I tried to make a minimal reproduction, I didn't get the error you describe when I set safe = FALSE as you do. See:

library(arrow)
library(dplyr)

# w/ safe = FALSE, works fine (truncates)
arrow_table(data.frame(x=rnorm(10, 0, 10))) |> 
  mutate(x_trunc = cast(x, int32(), safe = FALSE)) |> 
  collect()

# only w/ safe = TRUE does this error (see error below)
arrow_table(data.frame(x=rnorm(10, 0, 10))) |> 
  mutate(x_trunc = cast(x, int32(), safe = TRUE)) |> 
  collect()
# Error in `compute.arrow_dplyr_query()`:
#   ! Invalid: Float value -2.13185 was truncated converting to int32
# Run `rlang::last_trace()` to see where the error occurred.

I wonder whether the safe argument isn't being respected somewhere, or whether there's a typo in your code snippet?
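
One further thing that may be worth ruling out, purely as an assumption and not something confirmed in this thread: the column could be stored with different types in different partition files (for example int32 in most files but double in the one for app). The sketch below builds a tiny hypothetical dataset (the column name v and the symbol values are made up for illustration) and prints the stored type of that column for each file, using `ds$files` and a per-file `open_dataset()` call so that only metadata is read.

```r
library(arrow)

# Hypothetical two-partition dataset: `v` is stored as int32 in one file
# and as double in the other. Names and values are illustrative only.
tmp <- tempfile()
dir.create(file.path(tmp, "symbol=aaa"), recursive = TRUE)
dir.create(file.path(tmp, "symbol=app"), recursive = TRUE)
write_parquet(data.frame(v = 1:3),
              file.path(tmp, "symbol=aaa", "part-0.parquet"))
write_parquet(data.frame(v = c(1.51515, 2, 3)),
              file.path(tmp, "symbol=app", "part-0.parquet"))

ds <- open_dataset(tmp, format = "parquet")

# For every file backing the dataset, look up the stored type of `v`.
types <- vapply(ds$files, function(f) {
  file_schema <- open_dataset(f, format = "parquet")$schema
  file_schema$GetFieldByName("v")$type$ToString()
}, character(1))

print(types)
```

If the types differ between files on the real data, that would at least narrow down which file and column to include in a minimal sample.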
