Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Inprocess memory leak #11406

Open
wants to merge 21 commits into
base: master
Choose a base branch
from
Open

Conversation

shivaspeaks
Copy link
Member

@shivaspeaks shivaspeaks commented Jul 24, 2024

Calculating size of the message for InProcess transport and calling outboundWireSize() updating the buffer limit. This solves the memory leak issue mentioned in #8712

Now that the sizes are being calculated for InProcess, it makes sense for asserting getOutboundWireSize() greater than 0, hence the change in unit test case.

Tested the code changes by publishing it into MavenLocal as 1.66.0-SNAPSHOT version and tried to reproduce the issue mentioned in #8712. We do not see any issue of memory leak with this SNAPSHOT version.
To double check I reversed the code changes done in this PR #9361 and then tested the SNAPSHOT version. This means we can safely enable default retry in InProcess transport.

Copy link

linux-foundation-easycla bot commented Jul 24, 2024

CLA Signed

The committers listed above are authorized under a signed CLA.

@larry-safran
Copy link
Contributor

This PR should include reversing the workaround since it is no longer needed. Also, we need a test case that enables retries for an InProcessTransport and then sends a bunch of messages so it can verify that there wasn't a memory leak.

@ejona86
Copy link
Member

ejona86 commented Aug 1, 2024

Also, we need a test case that enables retries for an InProcessTransport and then sends a bunch of messages so it can verify that there wasn't a memory leak.

That sounds like it'll be too much effort. Tests that look at memory use directly tend to be a bad idea, and "run until oom" is hard to get right. It is easiest to reproduce the issue using a real stub+channel, but I don't think RetryableStream offers a real way to see that it has committed in that case. And even if we did it at the transport layer instead, things aren't much better. Wiring that all up seems too much work for a first PR in gRPC, and probably excessive for you and me as well for the amount of value it'll provide.

AbstractTransportTest already has tests for the outbound/inbound size tracer callbacks (sizesReported() == true). They are simply disabled for InProcessTransport today. If we pass those tests, RetryableStream will be satisfied. So let's just do that, which does mean we'll implement outboundUncompressedSize, inboundWireSize, inboundUncompressedSize as well, but that's incrementally pretty easy. (The wire sizes and the uncompressed sizes are the same for in-process.)

We should still manually verify the memory use is fixed using the reproduction. But it seems too much work for little gain to make it automated.

Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We're going to need a way to disable the new behavior, for performance cases. Even if the InputStream is KnownLength, a protobuf has to walk its tree and calculate its size. I'm not quite sure what form would be best, but all of them should be an option on InProcess{Channel,Server}Builder.

I'm toying with the idea of "assumedMessageSize(int)". The main issue with disabling the new behavior is obviously retries will be broken. The option could also disable retries, but it seems it might be nicer to have the user provide us with a message size, and we'll just use that instead. Dunno if we need "assumedMessageSize(int sent, int received)" or any other more fancy options. And I don't know if the idea itself is too fancy and it'd be easy to just accept no retries. Although it also impacts tracing, and more.

@larry-safran, any ideas?

@larry-safran
Copy link
Contributor

I think that assumedMessageSize(int bytes) is the way to go. Having different values for sending and receiving when they are expected to be identical seems overly complicated. Since the size tracing for both is being driven from the client side, we probably only need this method on the InProcessChannelBuilder.

@ejona86
Copy link
Member

ejona86 commented Aug 7, 2024

Since the size tracing for both is being driven from the client side, we probably only need this method on the InProcessChannelBuilder.

Right now it is being driven by the sender, which is both client and server. I don't think we'd want to mix the configuration between the channel and server; so we'd want to configure it on both sides. That'd be a bit annoying, but seems like the alternative is too confusing.

@larry-safran
Copy link
Contributor

larry-safran commented Aug 7, 2024 via email

@larry-safran
Copy link
Contributor

To fix the census test, in CensusModulesTest.java change the class declaration to
private static class StringInputStream extends InputStream implements KnownLength { and add to the StringInputStream class


    @Override
    public int available() throws IOException {
      return string == null ? 0 : string.length();
    }

That will allow the test to behave in the way it intends.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There isn't a need to have a completely separate class. Just add tests to the regular InProcessTransportTest where you check that sizes are correctly reported for various scenarios: Specified an assumedSize that is different from the actual size; didn't specify the assumedSize; sent several messages.

Copy link
Member Author

@shivaspeaks shivaspeaks Aug 22, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might not be able do that because we need to have an instance of InProcessTransport specified with TEST_MESSAGE_LENGTH for this test. So if we do that within InProcessTransportTest itself then we are expecting other tests like AnonymousInProcessTransportTest with the expected TEST_MESSAGE_LENGTH.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You just need to do the full channel creation in the specific tests instead of relying on the shared setup.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm trying to take reference from basicStream() and creating a new test function in InProcessTransportTest naming it basicStreamInProcess(). All the lines before line 857, I'm almost taking every line for the setup except the Assertion statements and verification statements.

And at last I'll just be Asserting with what we need to test

assertEquals(TEST_MESSAGE_LENGTH, streamTracerSender.getOutboundWireSize());
assertEquals(TEST_MESSAGE_LENGTH, streamTracerSender.getOutboundUncompressedSize());
assertEquals(TEST_MESSAGE_LENGTH, streamTracerReceiver.getInboundWireSize());
assertEquals(TEST_MESSAGE_LENGTH, streamTracerReceiver.getInboundUncompressedSize());

So by doing this, a lot of variables and inner private classes of AbstractTransportTest comes into picture. I made all of them as protected or public wherever necessary. Otherwise we will be ending up re-writing all the classes again provided in AbstractTransportTest. Adding all this will pollute the class even more and that too just for 1 test! Also, I'm running into more unseen errors by full channel creation within test. Maybe we should fallback to the current approach itself of having a different class and we cans specify in the javadoc that this test specifically runs for assumed size in InProcess transport? Its getting complex here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @larry-safran, we are waiting for your reply. We don't want to copy the private classes TestServerStreamTracer and TestClienttreamTracer in AbstractTransportTest, and moving them out to separate classes defeats the purpose of avoiding creating new test files unnecessarily.

@larry-safran
Copy link
Contributor

larry-safran commented Sep 9, 2024 via email

Copy link
Contributor

@larry-safran larry-safran left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is getting very close.

@larry-safran larry-safran self-requested a review October 8, 2024 20:06
Copy link
Member

@ejona86 ejona86 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

When merging, we should rework the commit message. Saying "Fix inprocess memory leak" doesn't really say what leak is being fixed or what change is being done. Take a look at CONTRIBUTING.md for formatting commit messages, but this could be "inprocess: Fix retry unlimited buffering" or "inprocess: Support tracing message sizes". In the description it would then talk about the other aspect, so "This adds support for tracing message sizes so retry can measure how much memory is buffered" or "This makes it safe to use with retry which can now measure how much memory is buffered," respectively. It should also talk about how measuring message sizes can be expensive so an API was added to avoid that cost.

We should also use "Fixes #8712" syntax in the description so the commit/PR and the issue are linked (https://docs.github.com/en/get-started/writing-on-github/working-with-advanced-formatting/using-keywords-in-issues-and-pull-requests#linking-a-pull-request-to-an-issue).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants