Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Switch the default uuidRepresentation from "pythonLegacy" to "unspecified" (not "standard") #2728

Open
ShaneHarvey opened this issue Jan 20, 2023 · 8 comments

Comments

@ShaneHarvey
Copy link
Contributor

First off, I want to apologize for this whole uuidRepresentation situation. The issue stems from a mistake made in certain MongoDB drivers a decade ago and it could have been addressed a long time ago. I'm sorry that this problem has bubbled over into mongoengine.

I noticed this uuidRepresentation warning which states that mongoengine plans to switch the default uuidRepresentation from "pythonLegacy" to "standard":

    if "uuidrepresentation" not in keys:
        warnings.warn(
            "No uuidRepresentation is specified! Falling back to "
            "'pythonLegacy' which is the default for pymongo 3.x. "
            "For compatibility with other MongoDB drivers this should be "
            "specified as 'standard' or '{java,csharp}Legacy' to work with "
            "older drivers in those languages. This will be changed to "
            "'standard' in a future release.",
            DeprecationWarning,
        )
        kwargs["uuidRepresentation"] = "pythonLegacy"

I'm concerned about this plan and the pain it will cause users. In particular, switching the default from "pythonLegacy" to "standard" will mean that application UUID data will be silently rewritten from subtype 3 to 4 (AKA a mild form of data corruption) and queries may break.

I believe the only safe decision is for mongoengine to either keep using "pythonLegacy" as the default or switch to "unspecified". Switching to "unspecified" avoids data corruption by forcing the user to make a choice if they want to encode UUIDs. They'll need to explicitly add "pythonLegacy" if they want to work with UUIDs already stored with the default behavior, or switch to "standard" if they are storing UUIDs for the first time. Applications also have the ability to migrate from "pythonLegacy" to "standard" if they rewrite all UUID data (or adjust their queries to match either subtype 3 or 4). This is why we switched the default in PyMongo from "pythonLegacy" to "unspecified".

For more background see: https://pymongo.readthedocs.io/en/stable/examples/uuid.html

Please let me know if you have any questions.

@bagerard
Copy link
Collaborator

bagerard commented Mar 1, 2023

Argh 😬 indeed this is a tricky situation and for sure switching to 'standard' is not a good idea. On top of uuidRepresentation, we also have this 'json_options' that was introduced and that we probably need to change as well so that its default value is aligned with uuidRepresentation (I guess). I must say that I overlooked this whole thing when it got merged...

Switching to 'unspecified' sounds like the best long-term fix (or eventually not set any value and let Pymongo impose its default). We would have to make sure to write decent instruction in the changelog also.

Would you have time to open a PR @ShaneHarvey ?

@ShaneHarvey
Copy link
Contributor Author

Sure thing, I opened #2739 to address the deprecation warnings. Good call on the json options, I updated those as well.

I suppose this ticket can stay open until you're ready to actually change the default values (some future breaking release).

@bagerard
Copy link
Collaborator

bagerard commented Mar 3, 2023

I'm releasing 0.27.0 right now, feel free to do the change on master when you have the time

@ShaneHarvey
Copy link
Contributor Author

Oh that's sooner than I expected. I'll discuss with my team and get back to you.

@bagerard
Copy link
Collaborator

bagerard commented Mar 3, 2023

No worries, could be in a few weeks/month as well 😄 but on my end I m not too concerned about it. End users are expected to read the changelog when upgrading and the way we plan to change it should be the least error prone and break existing code early

@bagerard
Copy link
Collaborator

ping on this topic @ShaneHarvey

@ShaneHarvey
Copy link
Contributor Author

I could help work on this. It could be good to include in a 1.0 release if that's coming soon.

@bagerard
Copy link
Collaborator

bagerard commented Aug 30, 2024

Yep, please do if you have a chance 👍

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants