Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce a schema variant to reuse Validators, Serializers and CoreSchema #1414

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

BoxyUwU
Copy link
Contributor

@BoxyUwU BoxyUwU commented Aug 20, 2024

Introduces a nested CoreSchema variant that reuses the specified model's schema/validator/serializer. As an example, the core schema for the following pydantic models:

class A(BaseModel):
    field3: 'B'

class B(BaseModel):
    field1: list[A]

class Wrapper(BaseModel):
    a: A
    b: B
{
    'type': 'model',
    'cls': <class '__main__.A'>,
    'schema': {
        'type': 'model-fields',
        'fields': {
            'field3': {
                'type': 'model-field',
                'schema': {
                    'type': 'nested',
                    'cls': <class '__main__.B'>,
                    'get_info': <function GenerateSchema._generate_schema_from_property.<locals>.get_model_info at 0x7f9ec650be20>
                }
            }
        },
        'model_name': 'A'
    },
    'ref': '__main__.A:20628480'
}
{
    'type': 'model',
    'cls': <class '__main__.B'>,
    'schema': {
        'type': 'model-fields',
        'fields': {
            'field1': {
                'type': 'model-field',
                'schema': {
                    'type': 'list',
                    'items_schema': {
                        'type': 'nested',
                        'cls': <class '__main__.A'>,
                        'get_info': <function GenerateSchema._generate_schema_from_property.<locals>.get_model_info at 0x7f9ec650ba60>
                    }
                }
            }
        },
        'model_name': 'B'
    },
    'ref': '__main__.B:20632592'
}
{
    'type': 'model',
    'cls': <class '__main__.Wrapper'>,
    'schema': {
        'type': 'model-fields',
        'fields': {
            'a': {
                'type': 'model-field',
                'schema': {
                    'type': 'nested',
                    'cls': <class '__main__.A'>,
                    'get_info': <function GenerateSchema._generate_schema_from_property.<locals>.get_model_info at 0x7f9ec650bec0>
                }
            },
            'b': {
                'type': 'model-field',
                'schema': {
                    'type': 'nested',
                    'cls': <class '__main__.B'>,
                    'get_info': <function GenerateSchema._generate_schema_from_property.<locals>.get_model_info at 0x7f9ec65c8040>
                }
            }
        },
        'model_name': 'Wrapper'
    },
    'ref': '__main__.Wrapper:20635728'
}
Compare to the schema generated on main

{
    'type': 'definitions',
    'schema': {'type': 'definition-ref', 'schema_ref': '__main__.A:37460656'},
    'definitions': [
        {
            'type': 'model',
            'cls': <class '__main__.A'>,
            'schema': {
                'type': 'model-fields',
                'fields': {'field3': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.B:37433488'}}},
                'model_name': 'A'
            },
            'ref': '__main__.A:37460656'
        },
        {
            'type': 'model',
            'cls': <class '__main__.B'>,
            'schema': {
                'type': 'model-fields',
                'fields': {'field1': {'type': 'model-field', 'schema': {'type': 'list', 'items_schema': {'type': 'definition-ref', 'schema_ref': '__main__.A:37460656'}}}},
                'model_name': 'B'
            },
            'ref': '__main__.B:37433488'
        }
    ]
}
{
    'type': 'definitions',
    'schema': {'type': 'definition-ref', 'schema_ref': '__main__.B:37433488'},
    'definitions': [
        {
            'type': 'model',
            'cls': <class '__main__.A'>,
            'schema': {
                'type': 'model-fields',
                'fields': {'field3': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.B:37433488'}}},
                'model_name': 'A'
            },
            'ref': '__main__.A:37460656'
        },
        {
            'type': 'model',
            'cls': <class '__main__.B'>,
            'schema': {
                'type': 'model-fields',
                'fields': {'field1': {'type': 'model-field', 'schema': {'type': 'list', 'items_schema': {'type': 'definition-ref', 'schema_ref': '__main__.A:37460656'}}}},
                'model_name': 'B'
            },
            'ref': '__main__.B:37433488'
        }
    ]
}
{
    'type': 'definitions',
    'schema': {
        'type': 'model',
        'cls': <class '__main__.Wrapper'>,
        'schema': {
            'type': 'model-fields',
            'fields': {
                'a': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.A:37460656'}},
                'b': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.B:37433488'}}
            },
            'model_name': 'Wrapper'
        },
        'ref': '__main__.Wrapper:37440960'
    },
    'definitions': [
        {
            'type': 'model',
            'cls': <class '__main__.A'>,
            'schema': {
                'type': 'model-fields',
                'fields': {'field3': {'type': 'model-field', 'schema': {'type': 'definition-ref', 'schema_ref': '__main__.B:37433488'}}},
                'model_name': 'A'
            },
            'ref': '__main__.A:37460656'
        },
        {
            'type': 'model',
            'cls': <class '__main__.B'>,
            'schema': {
                'type': 'model-fields',
                'fields': {'field1': {'type': 'model-field', 'schema': {'type': 'list', 'items_schema': {'type': 'definition-ref', 'schema_ref': '__main__.A:37460656'}}}},
                'model_name': 'B'
            },
            'ref': '__main__.B:37433488'
        }
    ]
}

This is similar to the current definitions-ref setup but it has a number of benefits:

  • The nested model does not need to have its schema built already in order to reuse it. Currently sometimes (I think) we build the core schema for a model twice if it has not been built by the first time it is referenced in another model.
  • The nested models schema is not inlined so the built CoreSchema is smaller which means walking it can be significantly faster in cases with lots of nested models when we do not need to recurse into `nested-model
  • There is no "flattening" step after generating the schema like with definitions schemas where we walk the built schema and ensure that any nested models' schemas does not have a definitions schema. This is what I believe is a significant portion of the performance wins since we now have to do significantly less complex tree traversals over python datastructures in python.
  • We can reuse validators and serializers with this setup which in theory should result in less memory overhead in cases with deeply recursive model definitions where the innermost model may have its validator constructed significantly
    more times than once

See pydantic/pydantic#10246 for the implementation of generating this new schema variant and also the benchmak results.

Before Merging

  • We don't propagate errors out of serde_serialize
  • The PyGcTraversible implementation for Result ignores the error variant as PyErr is not traversible
  • Make sure that the nested schema variant is defined flexibly enough to be useable for more than just models.
  • Make sure that Pydantic side of reusing validators and serializers pydantic#10246 can be merged with positive perf wins otherwise there's no point :3

There is an issue pydantic/pydantic#10394 tracking future work on the pydantic side to use this schema more

src/schema_cache.rs Outdated Show resolved Hide resolved
@sydney-runkle
Copy link
Member

@adriangb, would love to get your feedback on this as well!

Copy link
Member

@sydney-runkle sydney-runkle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting concepts here. I have some reservations about application for things like schemas with forward refs, but this looks like a promising start.

A few questions:

  • What type of benchmarking have you been able to do to in order to look at both time / memory comparisons against main?
  • This strikes me as pretty similar to our current pattern used here, at least conceptually. Why should we go about it in this way, instead? Perhaps the memory usage is better because we're not actually rebuilding the schema validator / serializer itself?
  • I feel like integration with pydantic here would be a bit of a challenge, especially in the namespace realm. Have you thought about how we might handle refs to other models (schemas) in different modules, functions, etc? Definitely ok if not, just wanted to gauge progress here.

I think some helpful next steps would be:

  1. Get some feedback from @samuelcolvin, @davidhewitt, and @adriangb.
  2. Post folks familiarizing themselves with said changes, we could set aside some time to chat about this in our next oss sync 🚀.

python/pydantic_core/core_schema.py Outdated Show resolved Hide resolved
python/pydantic_core/core_schema.py Outdated Show resolved Hide resolved
python/pydantic_core/core_schema.py Outdated Show resolved Hide resolved
src/schema_cache.rs Outdated Show resolved Hide resolved
src/schema_cache.rs Outdated Show resolved Hide resolved
src/schema_cache.rs Outdated Show resolved Hide resolved
src/serializers/type_serializers/nested_model.rs Outdated Show resolved Hide resolved
src/serializers/type_serializers/nested_model.rs Outdated Show resolved Hide resolved
src/validators/nested_model.rs Outdated Show resolved Hide resolved
@BoxyUwU BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 3c5a8ad to 0961627 Compare August 22, 2024 16:23
@BoxyUwU BoxyUwU marked this pull request as draft August 22, 2024 16:24
@BoxyUwU BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 0961627 to b65d178 Compare August 22, 2024 16:25
Copy link

codspeed-hq bot commented Aug 22, 2024

CodSpeed Performance Report

Merging #1414 will not alter performance

Comparing boxy/validator_serializer_reuse (f6508d2) with main (ba8eab4)

Summary

✅ 155 untouched benchmarks

@BoxyUwU BoxyUwU force-pushed the boxy/validator_serializer_reuse branch from 8ddba74 to f6508d2 Compare September 12, 2024 14:56
@BoxyUwU BoxyUwU changed the title WIP model validator/serializer/core schema reuse Introduce a schema variant to reuse Validators, Serializers and CoreSchema Sep 12, 2024
@BoxyUwU BoxyUwU marked this pull request as ready for review September 12, 2024 15:10
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants