Handle dask groupby warning #6391

Open: wants to merge 1 commit into base: main

@maximlt (Member) commented Sep 27, 2024

Pandas 2.2.0 deprecated, with a FutureWarning, passing a name that is not a tuple to grouped_by.get_group() when the grouping was done with a length-1 list-like, e.g. df.groupby(['carrier']).get_group('OH') (pandas-dev/pandas#54155).

Dask appears to call Pandas internally, performing a group-by followed by a get_group() call, so Dask users see this warning too (dask/dask#10572).

The change worked locally (let's see what the CI says), but I'm not sure it will work with all the versions of Pandas and Dask that HoloViews supports.
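One way to hedge against the version question is to pick the key shape at runtime. This is a hypothetical sketch: get_group_compat and PANDAS_GE_2_2 are not HoloViews or pandas APIs. It assumes pandas before 2.2 accepts the bare scalar (as the warning text implies) and pandas 2.2+ wants a length-1 tuple.

```python
import pandas as pd


def _pandas_major_minor():
    # Assumption: pandas.__version__ starts with "<major>.<minor>", e.g. "2.2.3".
    major, minor = pd.__version__.split(".")[:2]
    return int(major), int(minor)


PANDAS_GE_2_2 = _pandas_major_minor() >= (2, 2)


def get_group_compat(grouped, name, by):
    """Hypothetical helper: call grouped.get_group(name) with the key shape
    the installed pandas expects. `by` is what was passed to DataFrame.groupby."""
    if isinstance(by, list) and len(by) == 1:
        if PANDAS_GE_2_2 and not isinstance(name, tuple):
            # pandas >= 2.2: wrap the scalar to avoid the FutureWarning.
            name = (name,)
        elif not PANDAS_GE_2_2 and isinstance(name, tuple) and len(name) == 1:
            # Older pandas: unwrap to the scalar form it already accepts.
            name = name[0]
    return grouped.get_group(name)


df = pd.DataFrame({"carrier": ["OH", "F9", "OH"], "depdelay": [5, 10, 15]})
grouped = df.groupby(["carrier"])
oh = get_group_compat(grouped, "OH", ["carrier"])
print(oh["depdelay"].tolist())  # [5, 15]
```

Gating on the version keeps older pandas on the code path it already supports, at the cost of a branch that CI has to exercise on both sides.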

(Sorry for the hvPlot reproducer only!)

import hvplot.dask  # noqa

from hvplot.sample_data import airline_flights

flights = airline_flights.to_dask().persist()
flight_subset = flights[flights.carrier.isin(['OH', 'F9'])]
flight_subset.hvplot(x='distance', y='depdelay', by='carrier', kind='scatter', alpha=0.2, persist=True)
/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask/dataframe/groupby.py:270: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  return grouped.get_group(get_key)
/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask/dataframe/groupby.py:270: FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
  return grouped.get_group(get_key)

The full traceback when turning the warning into an error:

Traceback (most recent call last):
  File "/Users/mliquet/dev/hvplot/.mltmess/issue_dask_groupby.py", line 9, in <module>
    flight_subset.hvplot(x='distance', y='depdelay', by='carrier', kind='scatter', alpha=0.2, persist=True)
  File "/Users/mliquet/dev/hvplot/hvplot/plotting/core.py", line 95, in __call__
    return self._get_converter(x, y, kind, **kwds)(kind, x, y)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 1723, in __call__
    obj = method(x, y)
          ^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 2251, in scatter
    return self.chart(Scatter, x, y, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 2200, in chart
    return self.single_chart(element, x, y, data)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/hvplot/converter.py", line 2072, in single_chart
    Dataset(data, self.by + kdims, vdims).to(element, kdims, vdims, self.by),
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/__init__.py", line 145, in __call__
    group = selected.groupby(groupby, container_type=HoloMap,
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/__init__.py", line 196, in pipelined_fn
    result = method_fn(*args, **kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/__init__.py", line 1000, in groupby
    return self.interface.groupby(self, dim_names, container_type,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/holoviews/core/data/dask.py", line 223, in groupby
    group = group_type(groupby.get_group(coord), **group_kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_groupby.py", line 1639, in get_group
    return new_collection(GetGroup(self.obj.expr, key, self._slice, *self.by))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_collection.py", line 4799, in new_collection
    meta = expr._meta
           ^^^^^^^^^^
  File "/opt/homebrew/Cellar/python@3.11/3.11.8/Frameworks/Python.framework/Versions/3.11/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_expr.py", line 496, in _meta
    return self.operation(*args, **self._kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask_expr/_groupby.py", line 1089, in groupby_get_group
    return _groupby_get_group(df, list(by_key), get_key, columns)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/dask/dataframe/groupby.py", line 270, in _groupby_get_group
    return grouped.get_group(get_key)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/mliquet/dev/hvplot/.venv/lib/python3.11/site-packages/pandas/core/groupby/groupby.py", line 1103, in get_group
    warnings.warn(
FutureWarning: When grouping with a length-1 list-like, you will need to pass a length-1 tuple to get_group in a future version of pandas. Pass `(name,)` instead of `name` to silence this warning.
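The traceback above is what you get once the FutureWarning is escalated to an exception. With the standard library this can be done via warnings.simplefilter, or by running the script with python -W error::FutureWarning. The snippet below uses a hypothetical stand-in function instead of the real pandas call, so it runs without pandas or Dask.

```python
import warnings

MESSAGE = (
    "When grouping with a length-1 list-like, you will need to pass a length-1 "
    "tuple to get_group in a future version of pandas."
)


def deprecated_get_group():
    # Hypothetical stand-in for the pandas call that emits the FutureWarning.
    warnings.warn(MESSAGE, FutureWarning, stacklevel=2)
    return "group"


with warnings.catch_warnings():
    # Escalate FutureWarning to an exception only inside this block.
    warnings.simplefilter("error", FutureWarning)
    try:
        deprecated_get_group()
    except FutureWarning as exc:
        caught = str(exc)
    else:
        caught = None

print(caught is not None)  # True
```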

@@ -218,8 +218,6 @@ def groupby(cls, dataset, dimensions, container_type, group_type, **kwargs):
for coord in indices:
    if any(isinstance(c, float) and np.isnan(c) for c in coord):
        continue
    if len(coord) == 1:

@hoxbro (Member), Sep 27, 2024

Based on the filterwarnings, let's only ignore this when pandas is greater than or equal to version 2.2.

@hoxbro added the type: compatibility Compability with upstream packages label Sep 27, 2024
2 participants