Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support for BCP 47 java.util.Locale serialization/deserialization #3259

Closed
abrav9595 opened this issue Aug 25, 2021 · 15 comments
Closed

Support for BCP 47 java.util.Locale serialization/deserialization #3259

abrav9595 opened this issue Aug 25, 2021 · 15 comments
Milestone

Comments

@abrav9595
Copy link

Describe the bug

Locale with BCP 47 extensions are not serialized / de-serialized as expected using ObjectMapper.

Version information

Which Jackson version(s) was this for?
All Jackson versions.

To Reproduce

If you have a way to reproduce this with:

Brief code sample/snippet: include here in pre-formatted/code section

Locale locale = Locale.forLanguageTag("en-US-x-debug");
// ^^^ above locale will have x-debug as extensions
// This locale is a BCP 47 compliant locale.

ObjectMapper objectMapper = new ObjectMapper();

String serializedLocale = objectMapper.writeValueAsString(locale);
// ^^^ Will be en_US-#x-debug. Conforms to toString() method of Locale.
// This is not correct. There is a rogue # is added before extensions. 

Locale deserializedLocale = objectMapper.readValue(serializedLocale, Locale.class); 
// ^^^ Due to the rogue # before the extensions, when we de-serialize it, the extensions get added as a Variant instead of extensions.

Longer example stored somewhere else (diff repo, snippet), add a link

N/A. Reproducible with the above piece of code.

Textual explanation:

  • forLanguageTag locale is used to create a locale object from a BCP 47 compliant locale string. The x-debug gets added as extensions in the Locale.
  • However, when we use ObjectMapper to serialize the string, objectMapper uses toString() method of locale to convert it to String.
  • As Per Locale Javadocs, for BCP 47 locales, we must be using toLanguageTag() and forLanguageTag() instead to convert to / from String.
  • Due to this, when we try to de-serialize this malformed Locale, the x-debug gets added as a Variant instead of extensions, which is not expected.

Expected behavior

  • BCP 47 extensions should be added as extensions and not as Variants when serializing and de-serializing.

Additional Details

  • Due to this drawback, we are forced to write our own serializer / de-serializer. It would be nice if ObjectMapper provides us a Module which does this for us.
@abrav9595 abrav9595 added the to-evaluate Issue that has been received but not yet evaluated label Aug 25, 2021
@cowtowncoder cowtowncoder added 2.13 good first issue Issue that seems easy to resolve and is likely a good candidate for contributors new to project and removed to-evaluate Issue that has been received but not yet evaluated labels Aug 25, 2021
@cowtowncoder
Copy link
Member

cowtowncoder commented Aug 25, 2021

Sounds like we have sub-optimal behavior; would be great to improve handling.
One possible complication is between addressing this only on deserializer (backwards compatible) vs changing serialization (and possibly deserialization) (backwards incompatible -- but probably ok for a new minor release)

I think this is something that might be relatively easy to support so marking as "good first issue".

@abrav9595
Copy link
Author

@cowtowncoder Thanks for responding to this issue. Would it be a good idea to create a separate module for BCP 47? For clients who want to use BCP 47 can register this new LocaleModule ? Not sure if this is a good idea though. Would love to know your thoughts.

@cowtowncoder
Copy link
Member

I often recommend new modules for additions, but in this case it seems like bit of an overkill, if it only affects handling of one JDK class. Especially if it was possible to address this wrt all users in core jackson-databind.

@praneethjreddy
Copy link
Contributor

@cowtowncoder , Can I pick this up? Looks like we need to write custom serializer/de-serializer for java.util.Locale class

@cowtowncoder
Copy link
Member

cowtowncoder commented Aug 30, 2021

@praneethjreddy Sure. I do think we have custom (de)serializer already, but it might be part of a multi-type (de)serializer... let me look.

Ok, so:

  • for deserialization, it's FromStringDeserializer
  • for serialization, basic ToStringSerializer is used (which as per name only calls Locale.toString())

So, it may be necessary to have separate serializer implementation if deserializer-only handling can not be made to work.

@cowtowncoder
Copy link
Member

CLA received for the PR; hoping to review and merge the fix RSN!

@cowtowncoder cowtowncoder changed the title Support for BCP 47 java.util.Locale Serialization / De-serialization Support for BCP 47 java.util.Locale serialization/deserialization Sep 8, 2021
@cowtowncoder cowtowncoder removed the good first issue Issue that seems easy to resolve and is likely a good candidate for contributors new to project label Sep 8, 2021
@cowtowncoder cowtowncoder added this to the 2.13.0 milestone Sep 8, 2021
cowtowncoder added a commit that referenced this issue Sep 8, 2021
@cowtowncoder
Copy link
Member

Uhhhh. Should have remember to link #1600 here; Jackson 3.0 already serializes using "toLanguageTag()".

So 3.0 (master) will probably need bit different deserializer as all new tests are now failing.

cowtowncoder added a commit that referenced this issue Sep 11, 2021
@garretwilson
Copy link

garretwilson commented Jul 3, 2023

Can someone clarify exactly what came out of this?

Are you saying that for language tag serialization we need to create our own custom serializer for Locale that calls Locale.toLanguageTag(), but that FromStringSerializer now supports deserializing either syntax automatically?

@cowtowncoder
Copy link
Member

@garretwilson At this point I think full answer would require going through changes including #3265 but your assessment sounds correct. As far as I can see, Locale is still (as of 2.15.x) by default serialized using ToStringSerializer.

@garretwilson
Copy link

garretwilson commented Jul 5, 2023

@cowtowncoder thanks for the confirmation. This lopsided implementation of RFC 5646 (the latest language tag RFC for BCP 47) is less than ideal. Language tags are a core identifier. Furthermore there is no reason to use the outdated Locale serialization anymore. (In hindsight I'm sure Java would have went with BCP 47 to begin with.)

As someone already suggested, there should be a module that serializes and deserializes Locale using language tags automatically, like we have for e.g. Instant. I know you said:

I often recommend new modules for additions, but in this case it seems like bit of an overkill, if it only affects handling of one JDK class.

However in this case the alternative is much, much worse than having a module with just a single class in it—we force the developer to implement Locale serialization using toLanguageTag() over and over with custom serializers. Again it seems lopsided to then have gone to the other extreme and built in deserialization of the BCP 47 form. I know, you can detect which one it is at runtime, but that doesn't speak to the consistency of it: deserialization is built into the framework, but yet serialization requires you to create a custom serializer.

This also brings up the fact that there doesn't seem to be a simple way to create a to-string serializer using some other method. (Nor for serializers; see #4004.) Overwriting an abstract class for such a simple use case is much too verbose.

I'll be happy to do this, along with #4004. I propose to:

  • Add a utility to-string serializer class that allows a to-string string method e.g. ToStringSerializer.as(Locale::toLanguageTag). (I can also create a version that looks up a method name and creates one of those preconstructed accessors which I think your library has.)
  • Add a utility from-string deserializer class that allows a static factory method e.g. ToStringDeserializer.as(Locale::FromLanguageTag). (I can also create a version that looks up a method name and creates one of those preconstructed accessors which I think your library has.) This is already filed as Custom static factory-based deserializer? #4004.
  • Create a separate module in e.g. https://github.com/FasterXML/jackson-modules-java8 (or somewhere else) for automatically serializing and deserializing Locale using language tags (BCP 47) — in other words, re-opening this current ticket.

Obviously I can just write these and publish them in my own library, but these use cases are by no means esoteric—this seems to me to be bread-and-butter utilities almost everyone would need. It seems better all around if this were included in Jackson.

@cowtowncoder
Copy link
Member

Ok, some notes:

  1. On new module; maybe, but not in jackson-modules-java8 as it's not JDK dependant (now that baseline is Java 8 anyway)
  2. On ToStringSerializer / FromStringDeserializer... I am mostly worried API aspects, although I can see value in adding these. More then a question of how instances would be created (factory class, or just sub-classing?).
  3. Let's not re-open this issue; for new features, new issue
  4. No need to strongly couple of (1) and (2) I suspect, i.e. Locale serializer/deserializer need not be based on new configurable To/FromString/[De]Serializer

@garretwilson
Copy link

On new module; maybe, but not in jackson-modules-java8

Fair enough.

More then a question of how instances would be created (factory class, or just sub-classing?).

I'll have a better idea of which what best once I start implementing it.

No need to strongly couple of (1) and (2) I suspect, i.e. Locale serializer/deserializer need not be based on new configurable To/FromString/[De]Serializer

Not "strongly couple", but I just suspect that once the configurable to/from string utilities are in place, they will be so nice that they will be an obvious choice.

So without making you sign on the dotted line for the specific implementation details, the thing I want to get to here is whether you're open to reviewing and accepting these conceptually. If so, I propose the following:

  1. I implement Custom static factory-based deserializer? #4004 and file a pull request. If that goes well and gets approved, then
  2. I create a ticket for a similar toString serializer helper and file a pull request. If that goes well and gets approved, then
  3. I create a new ticket for a Locale module using BCP 47 language tags. In that ticket we can decide which repository it should go in, and whether the new string serde helpers are appropriate to use in this module.

Do those sound like reasonable next steps?

@cowtowncoder
Copy link
Member

@garretwilson Apologies for slow follow-up. Yes, those sound like good next steps.

@garretwilson
Copy link

garretwilson commented Aug 14, 2023

Related to this issue, I just discovered that even if I install a custom serializer for Locale in a module using .addSerializer(…), I have to add a .addKeySerializer(…) as well or Locale won't be serialized correctly in map keys (e.g. returning Map<Locale, Long> for counts of each language). See e.g. Serialize and Deserialize Map<Object, Object> Jackson.

On top of that, it turns out the key serializer, even though it is an implementation of JsonSerializer<Locale> just like in the addSerializer(…), has to call gen.writeFieldName(…) rather than gen.writeString(…) to get it to work. So this requires two completely different serializers that both do exactly the same thing functionally and semantically, except that one has to call a different method simply because it's being used as a map key.

I'll leave discussion of that design for elsewhere, but here I think it goes to show that at the very least Jackson needs a separate module for Locale serialization and deserialization, seeing all the hoops to jump through and gotches that one can't guess untill one runs into them.

@cowtowncoder
Copy link
Member

@garretwilson Yes, the division between Value and Key serializers/deserializers is unfortunate, but not easily removable -- partly since JsonTokens (and read/write methods thereby) are different between String values (VALUE_STRING) and Object property names (FIELD_NAME).

In hindsight I suppose JsonSerializer and JsonDeserializer probably should have contained methods for Key handling too. But I think that ship has sailed; not sure it's worth the complications to change even for 3.0.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

4 participants