Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Build tuning #1492

Merged
merged 1 commit into from
Sep 27, 2024
Merged

Build tuning #1492

merged 1 commit into from
Sep 27, 2024

Conversation

kaeawc
Copy link
Contributor

@kaeawc kaeawc commented Jun 6, 2024

What I have done and why

It was pointed out in a discussion that NowInAndroid uses a 6GB heap size which seemed high. I did a couple perf runs with current settings and no build cache or config cache. I did this at a few heap sizes like 12G, 6G, 4G, and 3G. Only at 3G do we observe a regression in build speed which is pretty easily explained by the 27 seconds spent in garbage collection - the build process was encountering GC thrashing in attempting to complete the build.

I then changed the JVM arg settings to be what I would use for tuning and while the first build at 3G was only slightly faster the subsequent ones are proving to be just as fast as all the other higher heap size ones.

Settings changed so far is just to this:

org.gradle.jvmargs=-Dfile.encoding=UTF-8 -XX:+UseG1GC -XX:SoftRefLRUPolicyMSPerMB=1 -XX:ReservedCodeCacheSize=512m -XX:+HeapDumpOnOutOfMemoryError -XX:+UnlockExperimentalVMOptions -Xmx3g -Xms3g -XX:MetaspaceSize=1g
kotlin.daemon.jvmargs=<same as Gradle>

In this PR I also want to add jemalloc to the GitHub CI as that should give roughly 10% memory savings by fixing native memory fragmentation. I did this through a custom action that caches jemalloc.

I also tried out other heap sizes with the new configuration. Anything higher than 3G didn't make a performance difference and anything lower (2G) ran the risk of the build not completing.

I'm happy to alter anything about this contribution, the goal is to help show how its possible to reduce resource usage while increasing performance.

Copy link

google-cla bot commented Jun 6, 2024

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

@kaeawc kaeawc force-pushed the kaeawc/build-tuning branch 2 times, most recently from fbf3f37 to d88f85e Compare June 6, 2024 04:37
@alexvanyo alexvanyo self-requested a review June 20, 2024 18:04
@alexvanyo
Copy link
Contributor

Could you split this PR into the gradle.properties updates and the jemalloc updates separately?

The gradle.properties updates look good to me and would approve, but would like to discuss the jemalloc change in more detail.

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 15, 2024

@alexvanyo sorry I didn't see your message, happy to split up the PR. Do you also want a separate issue?

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 15, 2024

I've removed jemalloc from this PR.

gradle.properties Outdated Show resolved Hide resolved
@SimonMarquis
Copy link
Contributor

@kaeawc could you explain the reason for each new flag? I'm not sure they are doing anything worth including in this project. Also, since this project is meant as a showcase, many devs will likely copy/paste the code, it would be best to not add "magic" and unexplained JVM tuning. 🙏

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 17, 2024

@SimonMarquis of course!

Metaspace, CodeCache, and Heap are all separate memory stores in the JVM. This goes all the way back to JDK 8.

Regarding the removal of MaxMetaspaceSize and addition of MetaspaceSize as 256MB - I outlined my research on the topic here for why Gradle projects that target JDK versions newer than 17 should try removing it. Setting a maximum without understanding of what the maximum limits in a project others will copy is arguably worse than allowing users to run into classloader memory leaks that have mostly been fixed.

CodeCache has a similar story that I'm also putting together my research for. The default for a 64bit system of 48MB is small relative to the 64-96MB that starter Android projects could take up and benefit from performance from. CodeCache is meant to have a relatively small limit.

Lastly SoftRefLRUPolicyMSPerMB is a pretty straightforward option - based on whether the JVM is configured to run as client or server mode it will release soft references in a number of milliseconds per megabyte of current heap or maximum heap sizes. The default value is 1000, which means for projects with 4GB heap size the JVM will release soft references after 68 minutes (~1 hour). 8GB heaps would release it after 2 hours. Hopefully everyone's builds are faster than that, which means their builds never release any soft references. I have consistently seen setting this to 1 instead of the default of 1000 is a win for releasing memory which can then be used for Metaspace, CodeCache, or the Heap.

Copy link
Contributor

@alexvanyo alexvanyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes and explantions look good to me, thank you for the research done here!

It looks like the CLA check is failing, if you can accept that I can merge these changes in.

And yes, please open a new issue and PR for the jemalloc change to discuss further, that'd be great.

gradle.properties Outdated Show resolved Hide resolved
Copy link
Collaborator

@dturner dturner left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See suggestion to add link to blog post so that developers can learn more about why these memory options were chosen.

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 18, 2024

I'm gonna pause on merging this because the JVM args for Kotlin Daemon are not behaving as I expected, see https://youtrack.jetbrains.com/issue/KT-71564/Kotlin-Daemon-JVM-args-behavior-does-not-match-documentation

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 21, 2024

So after a lot of digging I found all the JVM inheritance and defaults for Kotlin daemon and have a fix up for Kotlin's documentation on the subject JetBrains/kotlin-web-site#4454. Knowing that we should definitely also set the same args in both as that was the original intention, however specifically for CacheCode and Metaspace:

  • -XX:MetaspaceSize Sets the size of the allocated class metadata space that will trigger a garbage collection the first time it is exceeded. This threshold for a garbage collection is increased or decreased depending on the amount of metadata used. The default size depends on the platform. Therefore we should just not set this value as setting it delays GC events on Metaspace which is a tiny performance win but making a very strange change to the JVM's behavior.
  • -XX:ReservedCodeCacheSize is already set by the Kotlin daemon and their setting is 320mb. I was previously under the impression that neither Gradle not JetBrains were setting defaults for the code cache, it would be confusing if we set one for an open source project like that that others copy and make it different. Unfortunately there is no way today to set the Gradle JVM arg and still using Kotlin Gradle Plugin's default. Therefore Gradle should be set to 256 and Kotlin to 320.

Those changes are now in this PR

Copy link
Contributor

@alexvanyo alexvanyo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the extra investigation!

@alexvanyo alexvanyo merged commit 6b87bb2 into android:main Sep 27, 2024
4 checks passed
@G00fY2
Copy link

G00fY2 commented Sep 27, 2024

@kaeawc Did you actually measure performance of G1 vs ParallelGC? Because you replaced UseParallelGC with the default GC (which should be obsolete).
Previously benchmarks showed that ParallelGC actually performed best, at least for debug builds: https://dev.to/cdsap/jdk-17-using-parellgc-for-the-kotlin-process-in-android-builds-24af

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 27, 2024

Yes I have, both in this and other projects. It does result in a faster build time, but at the cost of larger memory footprint. The difference in speed is not very large in the latest JVMs and in a project like this that will be copied for its settings the community should not blindly follow ParallelGC when most projects will not have the resources to spend on huge CI runners and the most expensive developer machines.

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 27, 2024

G1GC is not obsolete. It is the default for a reason. Parallel used to be the default in JDK8 and prior versions, G1GC serves the widest number of applications the best, especially in the latest JDK versions.

@G00fY2
Copy link

G00fY2 commented Sep 28, 2024

@kaeawc Hmm, your results about GC seem not to match with the benchmarks I found so far.
In addition to the previously posted benchmark which uses NiA this one tested 3 different projects: https://bitrise.io/blog/post/switch-to-jdk-17-parallel-gc-and-speed-up-your-android-builds-by-9-20

when most projects will not have the resources to spend on huge CI runners and the most expensive developer machines

That's based on assumptions.

G1GC is not obsolete

I didn't meant G1GC itself is obsolete but the jvmArg -XX:+UseG1GC since it's default since Java 11. I would not expect adding arguments which match to the default.

Also another aspect: Google itself also recommends testing ParallelGC.
https://developer.android.com/build/optimize-your-build#experiment-with-the-jvm-parallel-garbage-collector
It could be confusing to find different GC without any explanation in the project.

Don't get me wrong. I really like this PR and thanks again for all the investigation and explanations in your blog posts. I am really into this whole build optimization topic.
Just the part about the GC felt short from my opinion.

Therefore I would suggest an follow up PR which either removes
-XX:+UseG1GC or reverts to -XX:+UseParallelGC.
Or we would maybe need some additional comments in the gradle.properties why choosing G1GC over ParallelGC even though there are benchmarks that suggest otherwise.

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 28, 2024

@G00fY2 I hear you on using something that is the JVM default. I think let's try to establish the goals more clearly and we can figure out which path is best to take. If we do remove it from the args I'd still like to call out in a comment that we're using G1GC.

@alexvanyo @dturner what are your goals be for NowInAndroid? Would you rather have the project be efficient on resources or speed, even if that speed is actually the same? I can redo the perf tests on this size project but also want to do them on a few other sizes to show what these settings look like when scaling over time.

@dturner
Copy link
Collaborator

dturner commented Sep 29, 2024

@kaeawc Our goals for the NowInAndroid build are in order of priority:

  1. reliability
  2. simplicity of maintenance / easy to understand for developers
  3. speed
  4. resource usage

@kaeawc
Copy link
Contributor Author

kaeawc commented Sep 30, 2024

@dturner in that case, considering @G00fY2's comment:

  1. Reliability - G1 is more likely to be reliable. As a codebase grows you'll hit heap size limitations faster with Parallel which will break a build.
  2. Simplicity - I can either see removing G1GC from the list of options just so its shorter, but then as a developer you need to understand what the JVM defaults are for the JDK you're running your project against. There are still a bunch of developers on old Gradle (7.x) and old JDK versions. So if we remove it I think keeping it as a comment is worthwhile.
  3. Speed - Profiled it again and the speed difference is not statistically significant for cached builds, but is significant (1 second) for clean builds in NowInAndroid. While Parallel is proven to be the faster option and recommended by Google, the docs there refer to this as "Experiment with the JVM parallel garbage collector" and "potentially improve build performance". I have seen cases where Parallel is not the best GC for speed and the cause is usually due to a lack of JVM tuning experience in a fast growing project. I think it is worth citing that link and noting that improved speed is possible.
  4. Resource usage - G1 is better for this, but its at the bottom of the list.

Based on all that I think this project should keep G1GC arg even though its a default, comment about Google's recommendation for using Parallel. If y'all want to discuss it more lets keep that going. If everyone thinks that's a decent path forward give a 👍🏻 and I'll make a PR with those changes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants