
add default comparison #1114

Open
andresliszt wants to merge 2 commits into main from andresliszt:add/default-comparison

Conversation

@andresliszt (Contributor) commented Dec 3, 2023

This PR tries to solve this pydantic issue. In my work I use objects of type np.ndarray or pd.DataFrame a lot, and excluding default values has been a bit annoying due to the error mentioned in the issue. The idea of this PR is to add an extra optional argument to Field, which I called default_comparison: a callable that receives two arguments, value and default, and replaces the __eq__ call on the object with a call to this function.
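For context, the underlying problem is that == on a NumPy array compares elementwise, so any truth-value check against the result raises; a dedicated comparison callable such as np.array_equal collapses it to a single bool. A minimal illustration, independent of this PR:

import numpy as np

value = np.array([1, 2, 3])
default = np.array([1, 2, 3])

# Elementwise comparison yields an array whose truth value is ambiguous:
try:
    bool(value == default)
except ValueError as exc:
    print(exc)  # The truth value of an array with more than one element is ambiguous...

# A comparison callable returns a single bool instead:
print(np.array_equal(value, default))  # True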

Checklist

  • [x] Unit tests for the changes exist
  • [ ] Documentation reflects the changes where applicable
  • [x] Pydantic tests pass with this pydantic-core (except for expected changes)
  • [x] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @adriangb

fn compare_with_default(&self, py: Python, value: &PyAny) -> PyResult<bool> {
    if let Some(default) = self.get_default(py)? {
        if let Some(default_comparison) = &self.default_comparison {
            // Call the user-supplied callable as default_comparison(value, default)
            // and coerce its result to a bool.
            return default_comparison.call(py, (value, default), None)?.extract::<bool>(py);
@andresliszt (Contributor, Author)

I'm passing the values as positional arguments here; I'm not sure if that is the best way to call the comparison operator.
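In practice this just means the callable has to accept the two objects positionally and return something interpretable as a bool, so existing helpers fit directly. An illustrative sketch of the expected contract (not code from this PR):

import numpy as np

# Any f(value, default) -> bool works, e.g. np.array_equal can be passed as-is.
def default_comparison(value, default) -> bool:
    return np.array_equal(value, default)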

codecov bot commented Dec 3, 2023

Codecov Report

Merging #1114 (d1f2343) into main (7fa450d) will increase coverage by 0.01%.
The diff coverage is 95.45%.

❗ Current head d1f2343 differs from pull request most recent head 0614011. Consider uploading reports for the commit 0614011 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1114      +/-   ##
==========================================
+ Coverage   89.70%   89.72%   +0.01%     
==========================================
  Files         106      106              
  Lines       16364    16380      +16     
  Branches       35       35              
==========================================
+ Hits        14680    14697      +17     
+ Misses       1677     1676       -1     
  Partials        7        7              
Files Coverage Δ
python/pydantic_core/core_schema.py 94.76% <ø> (ø)
src/serializers/fields.rs 95.02% <100.00%> (+0.37%) ⬆️
src/serializers/shared.rs 78.02% <100.00%> (-1.31%) ⬇️
src/serializers/type_serializers/with_default.rs 81.66% <94.11%> (+4.39%) ⬆️

... and 3 files with indirect coverage changes


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7fa450d...0614011.

codspeed-hq bot commented Dec 3, 2023

CodSpeed Performance Report

Merging #1114 will degrade performance by 13.46%

Comparing andresliszt:add/default-comparison (0614011) with main (7fa450d)

Summary

❌ 1 regressions
✅ 139 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                     main      andresliszt:add/default-comparison   Change
test_small_class_core_model   50.2 µs   58 µs                                 -13.46%

@andresliszt (Contributor, Author)

please review

@davidhewitt (Contributor)

Hi @andresliszt, thanks for the PR! I spoke to @adriangb a bit this morning and we think this is potentially reasonable to move forward with; we can't see a better solution for the exclude_defaults situation with numpy. My only question is how this addition would get exposed on the Python side in pydantic?

Also, on the linked issue I commented about exclude_unset; maybe this solves the problem without needing to "fix" exclude_defaults.
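For reference, exclude_unset sidesteps the comparison entirely, because it only consults __pydantic_fields_set__ and never calls __eq__ on the default. A minimal sketch with current pydantic:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    my_field: np.ndarray = Field(default_factory=lambda: np.array([]))

# The default was never set, so it is skipped without ever being compared:
print(Model().model_dump(exclude_unset=True))  # {}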

@andresliszt (Contributor, Author)

Hi @davidhewitt! I was thinking of exposing it through pydantic's Field, adding a new argument default_comparison:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    my_field: np.ndarray = Field(default=np.array([]), default_comparison=np.array_equal)
    model_config = ConfigDict(arbitrary_types_allowed=True)

@davidhewitt (Contributor)

That would definitely be the obvious choice. Another idea could be to make a special Default type, to avoid adding yet another parameter to the already-bloated Field:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field, Default

class Model(BaseModel):
    my_field: np.ndarray = Field(default=Default(np.array([]), compare_as=np.array_equal))
    model_config = ConfigDict(arbitrary_types_allowed=True)

(credit @adriangb)

@andresliszt (Contributor, Author)

After I sent my PR I started to think that excluding defaults is not the only place where the Python __eq__ gets invoked. A simple model comparison will also fail with numpy:

Model(my_field=np.array([1, 2, 3])) == Model(my_field=np.array([1, 2, 3]))
# ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I think adding compare_as to Field would solve this by using it everywhere __eq__ is invoked.

@davidhewitt (Contributor)

Interesting observation, I like the generality of this solution. attrs has this functionality also, so there is prior art.

https://www.attrs.org/en/stable/comparison.html#customization
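For comparison, the attrs approach looks roughly like this (a sketch assuming a recent attrs release, where cmp_using is available):

import numpy as np
from attrs import define, field, cmp_using

@define
class Data:
    # Equality for this field is delegated to np.array_equal instead of ndarray.__eq__
    values: np.ndarray = field(eq=cmp_using(eq=np.array_equal))

assert Data(np.array([1, 2])) == Data(np.array([1, 2]))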

@andresliszt (Contributor, Author) commented Dec 15, 2023

I was poking around pydantic.BaseModel.__eq__ and, because of this comparison, I think implementing compare_as in Field is a bit complicated. I was reading the comments there, and dict comprehensions are not an option due to their impact on performance.

My first idea was to create a class, for instance:

from typing import Any, Callable

class FieldComparator:
    def __init__(self, value: Any, compare_as: Callable):
        self.value = value
        self.compare_as = compare_as

    def __eq__(self, other):
        return self.compare_as(self.value, other)

And then replace the values in __dict__ for those fields that need to overwrite the __eq__ method:

# Naive way
self_dict = {
    k: FieldComparator(value=v, compare_as=self.model_fields[k].compare_as)
    if self.model_fields[k].compare_as is not None
    else v
    for k, v in self.__dict__.items()
}

Then using self_dict instead of self.__dict__ downstream would address the original issue.

I didn't try this and I don't know how it impacts performance, but what about creating a new attribute on the Rust side, say __pydantic_fields_overwrite_eq__, built in the same place that __pydantic_fields_set__ is built (here (?)), to make building self_dict easier?
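A quick way to check that the wrapper idea works at the dict level, independently of how it would be wired into BaseModel.__eq__ (a standalone sketch using the FieldComparator class above):

import numpy as np

left = {"my_field": FieldComparator(value=np.array([1, 2, 3]), compare_as=np.array_equal)}
right = {"my_field": np.array([1, 2, 3])}

# dict equality calls FieldComparator.__eq__ on the wrapped value, which returns a
# plain bool instead of an elementwise array, so no ValueError is raised:
print(left == right)  # True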
