
add default comparison #1114

Open
andresliszt wants to merge 2 commits into main from andresliszt:add/default-comparison

Conversation

@andresliszt (Contributor) commented Dec 3, 2023

This PR tries to solve this pydantic issue. In my work I use objects of type np.ndarray or pd.DataFrame a lot, and excluding default values has been a bit annoying due to the error mentioned in the issue. The idea of this PR is to add an extra optional argument to Field, which I called default_comparison: a callable that receives two arguments, value and default, and replaces the __eq__ call on the object with a call to this function.
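For context, the underlying problem is that == on a NumPy array compares elementwise, so any truth-value check against the result raises; a dedicated comparison callable such as np.array_equal collapses it to a single bool. A minimal illustration, independent of this PR:

import numpy as np

value = np.array([1, 2, 3])
default = np.array([1, 2, 3])

# Elementwise comparison yields an array whose truth value is ambiguous:
try:
    bool(value == default)
except ValueError as exc:
    print(exc)  # The truth value of an array with more than one element is ambiguous...

# A comparison callable returns a single bool instead:
print(np.array_equal(value, default))  # True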

Checklist

  • [x] Unit tests for the changes exist
  • [ ] Documentation reflects the changes where applicable
  • [x] Pydantic tests pass with this pydantic-core (except for expected changes)
  • [x] My PR is ready to review, please add a comment including the phrase "please review" to assign reviewers

Selected Reviewer: @adriangb

fn compare_with_default(&self, py: Python, value: &PyAny) -> PyResult<bool> {
    if let Some(default) = self.get_default(py)? {
        if let Some(default_comparison) = &self.default_comparison {
            // Call the user-supplied callable as default_comparison(value, default)
            // and coerce its result to a bool.
            return default_comparison.call(py, (value, default), None)?.extract::<bool>(py);
@andresliszt (Contributor, Author)

I'm passing the values as positional arguments here; I'm not sure if that is the best way to call the comparison operator.
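In practice this just means the callable has to accept the two objects positionally and return something interpretable as a bool, so existing helpers fit directly. An illustrative sketch of the expected contract (not code from this PR):

import numpy as np

# Any f(value, default) -> bool works, e.g. np.array_equal can be passed as-is.
def default_comparison(value, default) -> bool:
    return np.array_equal(value, default)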

codecov bot commented Dec 3, 2023

Codecov Report

Merging #1114 (d1f2343) into main (7fa450d) will increase coverage by 0.01%.
The diff coverage is 95.45%.

❗ Current head d1f2343 differs from pull request most recent head 0614011. Consider uploading reports for the commit 0614011 to get more accurate results

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1114      +/-   ##
==========================================
+ Coverage   89.70%   89.72%   +0.01%     
==========================================
  Files         106      106              
  Lines       16364    16380      +16     
  Branches       35       35              
==========================================
+ Hits        14680    14697      +17     
+ Misses       1677     1676       -1     
  Partials        7        7              
Files Coverage Δ
python/pydantic_core/core_schema.py 94.76% <ø> (ø)
src/serializers/fields.rs 95.02% <100.00%> (+0.37%) ⬆️
src/serializers/shared.rs 78.02% <100.00%> (-1.31%) ⬇️
src/serializers/type_serializers/with_default.rs 81.66% <94.11%> (+4.39%) ⬆️

... and 3 files with indirect coverage changes


Legend
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 7fa450d...0614011.

codspeed-hq bot commented Dec 3, 2023

CodSpeed Performance Report

Merging #1114 will degrade performance by 13.46%

Comparing andresliszt:add/default-comparison (0614011) with main (7fa450d)

Summary

❌ 1 regressions
✅ 139 untouched benchmarks

⚠️ Please fix the performance issues or acknowledge them on CodSpeed.

Benchmarks breakdown

Benchmark                     main      andresliszt:add/default-comparison   Change
test_small_class_core_model   50.2 µs   58 µs                                 -13.46%

@andresliszt (Contributor, Author)

please review

@davidhewitt (Contributor)

Hi @andresliszt, thanks for the PR! I spoke to @adriangb a bit this morning and we think this is potentially reasonable to move forward with; we can't see a better solution for the exclude_defaults situation with numpy. My only question is how this addition would get exposed on the Python side in pydantic?

Also, on the linked issue I commented about exclude_unset; maybe this solves the problem without needing to "fix" exclude_defaults.
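For reference, exclude_unset sidesteps the comparison entirely, because it only consults __pydantic_fields_set__ and never calls __eq__ on the default. A minimal sketch with current pydantic:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    model_config = ConfigDict(arbitrary_types_allowed=True)
    my_field: np.ndarray = Field(default_factory=lambda: np.array([]))

# The default was never set, so it is skipped without ever being compared:
print(Model().model_dump(exclude_unset=True))  # {}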

@andresliszt (Contributor, Author)

Hi @davidhewitt! I was thinking of exposing it through pydantic's Field, adding a new argument default_comparison:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field

class Model(BaseModel):
    my_field: np.ndarray = Field(default=np.array([]), default_comparison=np.array_equal)
    model_config = ConfigDict(arbitrary_types_allowed=True)

@davidhewitt (Contributor)

That would definitely be the obvious choice. Another idea could be to make a special Default type, to avoid adding yet another parameter to the already-bloated Field:

import numpy as np
from pydantic import BaseModel, ConfigDict, Field, Default

class Model(BaseModel):
    my_field: np.ndarray = Field(default=Default(np.array([]), compare_as=np.array_equal))
    model_config = ConfigDict(arbitrary_types_allowed=True)

(credit @adriangb)

@andresliszt (Contributor, Author)

After I sent my PR I started to think that excluding defaults is not the only place where the Python __eq__ gets invoked. A simple model comparison will also fail with numpy:

Model(my_field=np.array([1, 2, 3])) == Model(my_field=np.array([1, 2, 3]))
# ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()

I think adding compare_as to Field would solve this by using it everywhere __eq__ is invoked.

@davidhewitt (Contributor)

Interesting observation, I like the generality of this solution. attrs has this functionality also, so there is prior art.

https://www.attrs.org/en/stable/comparison.html#customization
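For comparison, the attrs approach looks roughly like this (a sketch assuming a recent attrs release, where cmp_using is available):

import numpy as np
from attrs import define, field, cmp_using

@define
class Data:
    # Equality for this field is delegated to np.array_equal instead of ndarray.__eq__
    values: np.ndarray = field(eq=cmp_using(eq=np.array_equal))

assert Data(np.array([1, 2])) == Data(np.array([1, 2]))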

@andresliszt (Contributor, Author) commented Dec 15, 2023

I was poking around pydantic.BaseModel.__eq__ and, because of this comparison, I think implementing compare_as in Field is a bit complicated. I was reading the comments there, and dict comprehensions are not an option due to their impact on performance.

My first idea was to create a class, for instance:

from typing import Any, Callable

class FieldComparator:
    def __init__(self, value: Any, compare_as: Callable):
        self.value = value
        self.compare_as = compare_as

    def __eq__(self, other):
        return self.compare_as(self.value, other)

And then replace the values in __dict__ for those fields that need to overwrite the __eq__ method:

# Naive way
self_dict = {
    k: FieldComparator(value=v, compare_as=self.model_fields[k].compare_as)
    if self.model_fields[k].compare_as is not None
    else v
    for k, v in self.__dict__.items()
}

Then using self_dict instead of self.__dict__ downstream would address the original issue.

I didn't try this and I don't know how it impacts performance, but what about creating a new attribute on the Rust side, say __pydantic_fields_overwrite_eq__, built in the same place that __pydantic_fields_set__ is built (here (?)), to make building self_dict easier?
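A quick way to check that the wrapper idea works at the dict level, independently of how it would be wired into BaseModel.__eq__ (a standalone sketch using the FieldComparator class above):

import numpy as np

left = {"my_field": FieldComparator(value=np.array([1, 2, 3]), compare_as=np.array_equal)}
right = {"my_field": np.array([1, 2, 3])}

# dict equality calls FieldComparator.__eq__ on the wrapped value, which returns a
# plain bool instead of an elementwise array, so no ValueError is raised:
print(left == right)  # True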
