Row-wise mapping to custom type not supported in DataFrame.apply #1001

Open
bluenote10 opened this issue Sep 13, 2024 · 2 comments
@bluenote10

Describe the bug

Using DataFrame.apply to map rows to a custom type seems to be a valid/supported pattern in pandas (i.e., works at runtime). It looks like the overloads of apply currently do not support this pattern at type checking time.

To Reproduce

  1. Provide a minimal runnable pandas example that is not properly checked by the stubs.

The following example maps a data frame row-wise to a custom type SomeType. Ideally, the type checker would infer that list_of_instances is of type list[SomeType] (which it is at runtime).

from dataclasses import dataclass
import pandas as pd

@dataclass
class SomeType:
    a: int
    b: int

df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [2, 3, 4],
    }
)

list_of_instances = list(df.apply(lambda row: SomeType(a=row["a"], b=row["b"]), axis=1))

for x in list_of_instances:
    assert isinstance(x, SomeType)
    print(x)

  2. Indicate which type checker you are using (mypy or pyright).

The behavior seems to be the same with mypy and pyright.

  3. Show the error message received from that type checker while checking your example.

mypy:

check_dataframe_apply.py:16: error: No overload variant of "apply" of "DataFrame" matches argument types "Callable[[Any], SomeType]", "int"  [call-overload]
check_dataframe_apply.py:16: note: Possible overload variants:
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Series[Any]], axis: Literal['index', 0] = ..., raw: bool = ..., result_type: None = ..., args: Any = ..., **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note:     def [S1] apply(self, f: Callable[..., S1 | NAType], axis: Literal['index', 0] = ..., raw: bool = ..., result_type: None = ..., args: Any = ..., **kwargs: Any) -> Series[S1]
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., Mapping[Any, Any]], axis: Literal['index', 0] = ..., raw: bool = ..., result_type: None = ..., args: Any = ..., **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note:     def [S1] apply(self, f: Callable[..., S1 | NAType], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['expand', 'reduce'], **kwargs: Any) -> Series[S1]
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Series[Any] | Mapping[Any, Any]], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['expand'], **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Mapping[Any, Any]], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['reduce'], **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Series[Any] | str | bytes | date | datetime | timedelta | <7 more items> | complex | Mapping[Any, Any]], axis: Literal['index', 0] | Literal['columns', 1] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['broadcast'], **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., Series[Any]], axis: Literal['index', 0] = ..., raw: bool = ..., args: Any = ..., *, result_type: Literal['reduce'], **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note:     def [S1] apply(self, f: Callable[..., S1 | NAType], raw: bool = ..., result_type: None = ..., args: Any = ..., *, axis: Literal['columns', 1], **kwargs: Any) -> Series[S1]
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., MutableSequence[Any] | ndarray[Any, Any] | tuple[Any, ...] | Index[Any] | Mapping[Any, Any]], raw: bool = ..., result_type: None = ..., args: Any = ..., *, axis: Literal['columns', 1], **kwargs: Any) -> Series[Any]
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., Series[Any]], raw: bool = ..., result_type: None = ..., args: Any = ..., *, axis: Literal['columns', 1], **kwargs: Any) -> DataFrame
check_dataframe_apply.py:16: note:     def apply(self, f: Callable[..., Series[Any]], raw: bool = ..., args: Any = ..., *, axis: Literal['columns', 1], result_type: Literal['reduce'], **kwargs: Any) -> DataFrame

pyright:

No overloads for "apply" match the provided arguments Pylance(reportGeneralTypeIssues)
frame.pyi(1344, 9): Overload 11 is the closest match
Argument of type "(row: Any) -> SomeType" cannot be assigned to parameter "f" of type "(...) -> Series[Any]" in function "apply"
  Type "(row: Any) -> SomeType" cannot be assigned to type "(...) -> Series[Any]"
    Function return type "SomeType" is incompatible with type "Series[Any]"
      "SomeType" is incompatible with "Series[Any]" Pylance(reportGeneralTypeIssues)

Please complete the following information:

  • OS: Linux
  • OS Version: Ubuntu 20.04
  • python version: 3.10.13
  • version of type checker: mypy 1.11.2
  • version of installed pandas-stubs: 2.2.2.240909 (latest as of writing)
@twoertwein
Member

I think we might be able to avoid the map overload issue, but it would probably need to return Series[Unknown/Any] and not Series[SomeType]. pandas-stubs assumes that a Series contains only common types, which is what lets it type-check Series operators and some methods.
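
To illustrate the trade-off described above, here is a toy model, not the actual pandas-stubs code. ToySeries, apply_typed and apply_any are hypothetical names, and the constraint list on S1 is an assumed, much shorter stand-in for the element types the stubs really allow. The sketch only shows why a callable returning SomeType cannot bind a restricted type variable, and why falling back to an Any-typed result would accept it.

```python
from dataclasses import dataclass
from typing import Any, Callable, Generic, TypeVar

# Hypothetical stand-in for the stubs' S1: constrained to a few common
# element types (the real list in pandas-stubs is longer).
S1 = TypeVar("S1", int, float, str)

Row = dict[str, int]


class ToySeries(Generic[S1]):
    """Toy container used only for this sketch; not the real pandas Series."""

    def __init__(self, values: list[Any]) -> None:
        self.values = values


@dataclass
class SomeType:
    a: int
    b: int


def apply_typed(rows: list[Row], f: Callable[[Row], S1]) -> ToySeries[S1]:
    # Mirrors the shape of the "Callable[..., S1] -> Series[S1]" overloads:
    # the return type of f must be one of the constrained types.
    return ToySeries([f(row) for row in rows])


def apply_any(rows: list[Row], f: Callable[[Row], Any]) -> "ToySeries[Any]":
    # Mirrors the proposed fallback: accept any return type, but give up
    # element-type information in the result.
    return ToySeries([f(row) for row in rows])


rows: list[Row] = [{"a": 1, "b": 2}, {"a": 2, "b": 3}, {"a": 3, "b": 4}]

# Fine for a type checker: the lambda returns int, which S1 allows.
sums = apply_typed(rows, lambda row: row["a"] + row["b"])

# A type checker should reject apply_typed(rows, <callable returning SomeType>)
# because SomeType is not among the constraints of S1. The Any-based variant
# accepts it, at the cost of an untyped result.
instances = apply_any(rows, lambda row: SomeType(a=row["a"], b=row["b"]))
```

Under this model the caller would still need an explicit annotation or cast to get back to list[SomeType], which is the cost of returning Series[Any].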

@Dr-Irv
Collaborator

Dr-Irv commented Sep 24, 2024

See the discussion in #1002. This is the same issue. A PR with the fix suggested by @twoertwein at #1001 (comment), including tests, is welcome.
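
Until such a fix lands, one possible workaround is to build the list without DataFrame.apply. This is a sketch rather than a recommendation from the maintainers. It reuses the SomeType and df definitions from the report and should sidestep the apply overloads entirely, although how each type checker infers the column element types has not been verified here.

```python
from dataclasses import dataclass

import pandas as pd


@dataclass
class SomeType:
    a: int
    b: int


df = pd.DataFrame(
    {
        "a": [1, 2, 3],
        "b": [2, 3, 4],
    }
)

# Iterate over the two columns in parallel instead of calling apply(axis=1).
# int(...) converts the NumPy integer scalars to plain Python ints, which
# differs slightly from the original snippet (it kept the NumPy scalars).
list_of_instances: list[SomeType] = [
    SomeType(a=int(a), b=int(b)) for a, b in zip(df["a"], df["b"])
]

for x in list_of_instances:
    assert isinstance(x, SomeType)
    print(x)
```

The explicit list[SomeType] annotation states the intended element type directly, so it does not rely on what the stubs infer for apply.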
