Hybrid (G1) GC uses the default 200ms MaxGCPauseMillis on JDK-21+ #1706

Draft
wants to merge 6 commits into
base: develop

carterkozak
Contributor

It seems that the G1 heuristics have gone a bit sideways in JDK 21, causing unnecessary degenerate full GC pauses when we configure -XX:MaxGCPauseMillis=500; setting the default 200ms value resolves this behavior.

Note that we deployed the increase from 200ms to 500ms at a time when container CPU shares informed the JDK's processor count, and thus the number of GC threads, etc., which is no longer the case. As such, it should be safe (and generally more stable) to use the hardened default values from the JDK.
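To check what a given JVM actually ends up with, a small probe along these lines may help. This is only a sketch: the class name `GcSettingsProbe` is made up for illustration, and it assumes a HotSpot-based JDK where `com.sun.management.HotSpotDiagnosticMXBean` is available. It reports the processor count the JVM sees, the active collectors, and the effective values of a few GC flags together with where each value came from.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import com.sun.management.VMOption;
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of this PR.
final class GcSettingsProbe {

    /** Collects human-readable lines describing the GC-related settings of the running JVM. */
    static List<String> describe() {
        List<String> lines = new ArrayList<>();

        // Processor count as seen by the JVM; GC thread counts are derived from it.
        lines.add("availableProcessors=" + Runtime.getRuntime().availableProcessors());

        // Names of the collectors that are actually active in this JVM.
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            lines.add("collector=" + gc.getName());
        }

        // HotSpot-specific: effective flag values and their origin (default, ergonomic, command line, ...).
        HotSpotDiagnosticMXBean hotspot =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
        if (hotspot != null) {
            for (String flag : List.of("MaxGCPauseMillis", "ParallelGCThreads", "ConcGCThreads")) {
                try {
                    VMOption option = hotspot.getVMOption(flag);
                    lines.add(flag + "=" + option.getValue() + " (origin: " + option.getOrigin() + ")");
                } catch (IllegalArgumentException e) {
                    // Thrown when the flag does not exist in this JVM.
                    lines.add(flag + "=<not available>");
                }
            }
        }
        return lines;
    }

    public static void main(String[] args) {
        describe().forEach(System.out::println);
    }
}
```

Running this inside a container with and without an explicit `-XX:MaxGCPauseMillis` value is one way to compare the configured and default behavior side by side.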

==COMMIT_MSG==
Hybrid (G1) GC uses the default 200ms MaxGCPauseMillis on JDK-21+
==COMMIT_MSG==

Possible downsides?

Performance could change in unexpected ways!

Alternatives:

We could change the default across the board; however, it's a bit safer to apply this only to the Java versions where our previous default value causes problems.

We could slowly ratchet the value down over time; however, that makes it more difficult to root-cause changes, as they would be less abrupt. We're currently setting this explicitly (via the configuration DSL) in a subset of services of varying sizes to validate that metrics look the same or better before rolling it out more broadly.
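The version-scoped default described here could be sketched roughly as follows. This is not the PR's actual implementation: the class `HybridGcSketch`, its constants, and the use of an `int` feature version in place of a `JavaVersion` type are all assumptions made to keep the example self-contained. An explicitly configured value always wins; otherwise JDK 21+ gets 200ms and older versions keep the previous 500ms.

```java
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch, not the code in this PR.
final class HybridGcSketch {
    // G1's own default pause target.
    private static final int JDK_DEFAULT_PAUSE_MILLIS = 200;
    // The previously deployed value, kept for Java versions before 21.
    private static final int LEGACY_PAUSE_MILLIS = 500;

    // Assumption: an explicit value may be supplied through configuration.
    private final OptionalInt configuredPauseMillis;

    HybridGcSketch(OptionalInt configuredPauseMillis) {
        this.configuredPauseMillis = configuredPauseMillis;
    }

    /** Explicit configuration wins; otherwise the default depends on the Java feature version. */
    int maxGcPauseMillis(int javaFeatureVersion) {
        if (configuredPauseMillis.isPresent()) {
            return configuredPauseMillis.getAsInt();
        }
        return javaFeatureVersion >= 21 ? JDK_DEFAULT_PAUSE_MILLIS : LEGACY_PAUSE_MILLIS;
    }

    List<String> gcJvmOpts(int javaFeatureVersion) {
        return List.of(
                "-XX:+UseG1GC",
                "-XX:+UseNUMA",
                "-XX:MaxGCPauseMillis=" + maxGcPauseMillis(javaFeatureVersion));
    }

    public static void main(String[] args) {
        HybridGcSketch unconfigured = new HybridGcSketch(OptionalInt.empty());
        // Runtime.version().feature() is the feature version of the JVM running this sketch.
        System.out.println(unconfigured.gcJvmOpts(Runtime.version().feature()));
    }
}
```

Scoping the change by version keeps behavior on older JDKs unchanged, so any metric shift can be attributed to JDK 21+ services only.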


changelog-app bot commented Sep 25, 2024

Generate changelog in changelog/@unreleased

What do the change types mean?
  • feature: A new feature of the service.
  • improvement: An incremental improvement in the functionality or operation of the service.
  • fix: Remedies the incorrect behaviour of a component of the service in a backwards-compatible way.
  • break: Has the potential to break consumers of this service's API, inclusive of both Palantir services
    and external consumers of the service's API (e.g. customer-written software or integrations).
  • deprecation: Advertises the intention to remove service functionality without any change to the
    operation of the service itself.
  • manualTask: Requires the possibility of manual intervention (running a script, eyeballing configuration,
    performing database surgery, ...) at the time of upgrade for it to succeed.
  • migration: A fully automatic upgrade migration task with no engineer input required.

Note: only one type should be chosen.

How are new versions calculated?
  • ❗The break and manual task changelog types will result in a major release!
  • 🐛 The fix changelog type will result in a minor release in most cases, and a patch release version for patch branches. This behaviour is configurable in autorelease.
  • ✨ All others will result in a minor version release.

Type

  • Feature
  • Improvement
  • Fix
  • Break
  • Deprecation
  • Manual task
  • Migration

Description

Hybrid (G1) GC uses the default 200ms MaxGCPauseMillis on JDK-21+

Check the box to generate changelog(s)

  • Generate changelog entry

-        return ImmutableList.of("-XX:+UseG1GC", "-XX:+UseNUMA", "-XX:MaxGCPauseMillis=" + maxGCPauseMillis);
+    public final List<String> gcJvmOpts(JavaVersion javaVersion) {
+        return ImmutableList.of(
+                "-XX:+UseG1GC", "-XX:+UseNUMA", "-XX:MaxGCPauseMillis=" + getMaxGCPauseMillis(javaVersion));
Contributor Author


Perhaps it would be best to exclude this parameter entirely when it's not specified? That way, if the default changes in the JDK, we follow it. I don't have a strong preference -- we could update this once we drop support for JDK 17 to exclude the value unless specified.
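A minimal sketch of that alternative, assuming the configured value is modelled as an `OptionalInt` (the class and method names here are hypothetical and the real builder may look quite different): the pause flag is only emitted when a value was explicitly set, so the JVM falls back to whatever default its own version ships with.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.OptionalInt;

// Hypothetical sketch of the "omit unless specified" idea; not code from this PR.
final class OmitUnsetPauseSketch {

    /** Builds G1 options, adding -XX:MaxGCPauseMillis only when a value was explicitly configured. */
    static List<String> gcJvmOpts(OptionalInt configuredPauseMillis) {
        List<String> opts = new ArrayList<>(List.of("-XX:+UseG1GC", "-XX:+UseNUMA"));
        // When empty, no flag is added and the JDK's built-in default applies.
        configuredPauseMillis.ifPresent(millis -> opts.add("-XX:MaxGCPauseMillis=" + millis));
        return List.copyOf(opts);
    }

    public static void main(String[] args) {
        System.out.println(gcJvmOpts(OptionalInt.empty()));
        System.out.println(gcJvmOpts(OptionalInt.of(500)));
    }
}
```

The trade-off is that behavior then depends on the JDK in use, which is the point of the suggestion but also means a future JDK default change would take effect without a code change here.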

5 participants