Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use ConcurrentLinkedHashMap as backing for LRUMap #3531

Merged
merged 41 commits into from
Jul 1, 2022

Conversation

pjfanning
Copy link
Member

@pjfanning pjfanning commented Jun 26, 2022

Relates to #2502

@cowtowncoder if this seems like a good way to go, I can add more tests

@pjfanning pjfanning marked this pull request as draft June 26, 2022 15:12
pom.xml Outdated Show resolved Hide resolved
* http://code.google.com/p/concurrentlinkedhashmap/</a>
*/
@ThreadSafe
public final class ConcurrentLinkedHashMap<K, V> extends AbstractMap<K, V>

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fwiw, there is an embedded fork in Groovy (and then taken by Micronaut) that adds support for computeIfAbsent and friends. Otherwise you inherit the non-atomic version from ConcurrentMap. That's fine in your usage, but it may be a desirable enhancement. However, since I did not add or review it all I know is the developers seem good but I don't know if their changes are good or not. But if helpful then you might check it out. I'd probably run Lincheck as a sanity test when code diving.

@cowtowncoder
Copy link
Member

Ok, sounds good to me. LMK when things are ready, happy to merge. Hopefully can be merged relatively cleanly to 3.0 too.

protected Object readResolve() {
return new LRUMap<Object,Object>(_jdkSerializeMaxEntries, _jdkSerializeMaxEntries);
return new LRUMap<K,V>(_initialEntries, _maxEntries);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Possibly not a big deal, but the idea here was that JDK serialization would retain size settings, if possible.
But perhaps CLHM does not expose these settings after construction?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

long capacity() and setCapacity(long) expose those if helpful. CLHM is serializable where it retains the entries (to mirror ConcurrentHashMap) and discards the eviction order. That differs in Guava/Caffeine, fwiw, which don't keep the entries and just the configuration.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I hit issues with the old _jdkSerializeMaxEntries values - after my initial changes, the cloned LRUMap was being constructed with _jdkSerializeMaxEntries set to 0 and this meant that CLHM immediately evicted all new entries.

@cowtowncoder would it be ok to allow the backing CLHM be non-transient, then we wouldn't need any special serialization logic? As @ben-manes highlights, the eviction order would be lost but that isn't too bad.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think cached content serialization should not be attempted, since contents are not necessarily serializable -- and at least for serializers, deserializers, are not. Alternatively if we really want to retain contents in some cases, could make it configurable. But at this point I think I'd prefer just wiping out contents.

I think losing some of the settings is probably fine: ability to JDK serialize Jackson components is bit of an odd feature in general. Although, to be honest, I think the main important thing there is to retain configuration: as far as I know, this is relied upon by frameworks like Spark etc where workers are being sent serialized set up of handlers.

@cowtowncoder
Copy link
Member

Ok, one non-code thing I'd like to figure out is how to give Ben credit for the code. An entry in CREDITS-2.x would be a start I guess, but probably he should be mentioned as co-author in that file as well, for specific subset of classes as they are copied?

Similarly, I think we should add @pjfanning as a co-author, and from this author(s) of fast FP read/write packages.

I am open to suggestions on how to do this beyond Javadocs (I think class Javadocs, or maybe package-info, should definitely have author notes too).

WDYT?

@ben-manes
Copy link

Thanks, but I don't have any strong opinions on credit and for me it is enough to retain the author tag. Happy just to contribute good code as I've benefited from so many other's open-source work (including jackson).

@pjfanning
Copy link
Member Author

I'm happy to leave the source author tag with just Ben's name - I have copied his code and made very few changes

@cowtowncoder
Copy link
Member

I'm happy to leave the source author tag with just Ben's name - I have copied his code and made very few changes

Sounds good. I just want to make sure to give where due -- that's the least we can do and easy to do.

@pjfanning pjfanning marked this pull request as ready for review June 28, 2022 19:39
@pjfanning pjfanning changed the title WIP: use ConcurrentLinkedHashMap as backing for LRUMap use ConcurrentLinkedHashMap as backing for LRUMap Jun 28, 2022
@cowtowncoder
Copy link
Member

Reading through the code, I think there is the remaining work to strip the implementation down to struts -- only the minimal interface for LookupCache / LRUMap needs to be supported.
Note that I am fine merging a full(er) implementation first, and then starting to trim things down further, given decent test coverage. So not all of my suggestions here need to be done right away.

And as to JDK serialization: I think that is what LRUMap should implement, keeping track of size values it needs, and re-constructing backing CLHM but keeping that non-serializable to remove any need for it to be serialized on its own.

The very last thing I'd want would be for users to start using Jackson's copy of CLHM -- I know it is difficult to hide it (pre-module info :) ) but let's do our best.

@ben-manes
Copy link

ben-manes commented Jun 29, 2022

The very last thing I'd want would be for users to start using Jackson's copy of CLHM -- I know it is difficult to hide it (pre-module info :) ) but let's do our best.

Eclipse hides the internal package (I think by default), no clue about IntelliJ.

Screen Shot 2022-06-29 at 4 47 44 PM

@cowtowncoder
Copy link
Member

Eclipse hides the internal package (I think by default), no clue about IntelliJ.

Interesting.

At one point I used shading to add ".private." to prevent source code references (class loader is not bothered by keywords in package names) but that was apparently frowned upon. :)

I guess we can also use some more obscure name since it's really the auto-completion that introduces lots of unintended (or poorly understood) accidental reuse of internal packages.

@pjfanning
Copy link
Member Author

One option is to make jackson's CLHM non-serializable (saving code) because LRUMap does treats the underlying CLHM instance as transient.

@cowtowncoder
Copy link
Member

Looks good! Seems like there might be just 2 more things:

  1. Renamed the main ConcurrentLinkedHashMap to something more obscure to avoid IDE auto-completion if anyone is looking for the OG one. I don't greatly care what the name is since this is effectively internal class, not part of API.
  2. If possible, remove various Weighers given that we will use basic equal-weighted entries logic. With the current usage there is no real way to customize caches wrt variable weights.

Other than that, assuming tests pass reliably, I'd be ready to merge this in, I think?

@pjfanning
Copy link
Member Author

I removed more Weigher logic and renamed the class as com.fasterxml.jackson.databind.util.internal.PrivateMaxEntriesMap

@cowtowncoder
Copy link
Member

Ok, good. How about I merge this now: we can continue tweaks if and as necessary.

@cowtowncoder cowtowncoder merged commit adf022c into FasterXML:2.14 Jul 1, 2022
@ben-manes
Copy link

Close FasterXML/jackson-module-scala#428?

@cowtowncoder
Copy link
Member

Alas, I did not realize that the Map implementation here is apparently very heavy memory user, as per #3665... half a meg for all of 4 caches? Phew. I was hoping to see if the initial sizes might be too big but that doesn't seem to be the case; all are between 16 and 64.

I would like to figure out something; further discussion on #3665.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants