-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make delegates immutable #99200
base: main
Are you sure you want to change the base?
Make delegates immutable #99200
Conversation
I'm not sure what's up with the failures here, tests that are failing on the CI seem to pass on my machine. |
src/coreclr/tools/aot/ILCompiler.ReadyToRun/JitInterface/CorInfoImpl.ReadyToRun.cs
Outdated
Show resolved
Hide resolved
Co-authored-by: Jan Kotas <[email protected]>
/azp run runtime-coreclr gcstress0x3-gcstress0xc |
Azure Pipelines successfully started running 1 pipeline(s). |
Could you please collect some perf numbers to give us an idea about the improvements and regressions in the affected areas? We may want to do some optimizations to mitigate the regressions. |
The existing code tries to compare MethodInfos as a cheap fast path. Most delegates do not have cached MethodInfo, so this fast path is hit rarely - but it is very cheap, so it is still worth it. This cheap fast path is not cheap anymore with this change. It may be best to delete the fast path that is trying to compare the MethodInfos and potentially optimize |
I think the thing that'd need benchmarking here are equality checks and maybe the impact of collectible delegates being stored in the CWT on the GC, the rest of things shouldn't be performance sensitive enough to matter I think? I'm not fully sure what'd be the proper way for benchmarking the latter.
I am going to benchmark the impact of the equality change tomorrow, I feel like if the impact won't be big, potential optimizations to it can be done later. |
Write a small program that loads an assembly as collectible and calls a method in it. The method in collectible assembly can create delegates in a loop. (If you would like to do it under benchmarknet, it works too - but it is probably more work.) |
We should be able to minimize the perf hit for collectible assemblies by piggy backing on the |
And if correctness for collectible assemblies can be taken care of by reusing the existing delegate field, ConditionalWeakTable becomes just a throughput optimization for repeated Delegate.Method calls. It may be fine to take a throughput regression for this case. I think it rare for Delegate.Method to be called repeatedly. And when it is called, it is going to be used together with other reflection that is generally slow so it does not need to be super-fast. |
I've finally gotten around to benchmarking the Equality here and the perf regression here is noticeable:
|
@jkotas Would it be possible to repurpose |
Hmm, it says |
Right. This table has the basic delegate kinds. In addition to that, there are multicast delegates and marshalled interop delegates. The delegate fields and implementation are split between Delegate and MulticastDelegate for historic reasons. If you would like to explore playing tricks with overloading the fields, I think it would help to move all fields to Delegate in a separate prep-work PR. Similar mechanical refactoring was done for native AOT in #97959. |
One issue with doing that is that it'd cause the runtime to reorder fields and put the invocationList before the pointer fields, we'd need to use some other tricks like fixed layout to prevent this. |
Yes, that's fine. It would be a temporary measure until something forces us to take R2R version reset |
@jkotas I recently had the idea to cache the MethodDesc* for equality checks instead like I pushed here today, is this approach sound enough? |
@@ -130,12 +139,7 @@ public override bool Equals([NotNullWhen(true)] object? obj) | |||
// fall through method handle check |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could any of the checks above be removed now that the "slow" path is not that slow?
@@ -1725,8 +1729,10 @@ extern "C" void QCALLTYPE Delegate_Construct(QCall::ObjectHandleOnStack _this, Q | |||
if (COMDelegate::NeedsWrapperDelegate(pMeth)) | |||
refThis = COMDelegate::CreateWrapperDelegate(refThis, pMeth); | |||
|
|||
refThis->SetMethodDesc(pMethOrig); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not 100% positive I use the correct descs when setting the field from VM in places like these.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this trying to be a perf fix or correctness fix?
Delegate_Construct
is a rare slow path. I do not think we need to bother optimizing it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this trying to be a perf fix or correctness fix?
Delegate_Construct
is a rare slow path. I do not think we need to bother optimizing it.
It was supposed to be a perf improvement, but as per discussion with @AndyAyersMS, if we could guarantee the desc is always reliably set by the VM when a delegate is created and never lazily initialized, delegate GDV could then be improved by switching to it. I feel like guaranteeing that should be left to a future PR though since it needs somebody with more VM/reflection knowledge to check all the possible paths.
if (pMeth->GetLoaderAllocator()->IsCollectible()) | ||
refThis->SetMethodBase(pMeth->GetLoaderAllocator()->GetExposedObject()); | ||
refThis->SetInvocationList(pMeth->GetLoaderAllocator()->GetExposedObject()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
InvocationList
seems to have already been used for LoaderAllocators
in specific cases so I just moved all of them there, I'm not 100% sure if there might be a case where VM puts one there now while the field is already occupied.
Did you try to implement and measure this variant? This change is about perf trade-offs. We should be talking numbers. Micro-benchmark numbers for affected methods and also how often are the affected methods called in real-world. For example, it would be useful to collect some numbers for how often is Delegate equality or Delegate MethodInfo called in real-world apps. |
This idea came from thinking about that, in fact I just wanted to partially inline Delegate_InternalEqualMethodHandles into managed, moving the handle fetching didn't seem trivial to do so I just cached it in a field.
I assume using Delegates as dictionary keys would be the most popular usage for equality.
That is actually why I like the current variant of it, it makes the least difference to numbers, only slightly regressing .Method while keeping size unchanged and Equality probably even slightly faster, while still unlocking placement of delegates on the FOH. I feel like any other simplifications could wait for later while this would unlock implementing #85014.
So, by affected methods you'd be referring to improvements in Equals and GetHashCode speed and regressions in .Method? Since AFAIR all other APIs should remain unaffected. |
I've benchmarked the APIs, had to massage the code a bit to make the JIT happy though:
GetHashCode:
Method:
|
I'm not sure if the System.Text.Json failures here are related, I saw it here before with Windows x86 Debug and now it's on OSX x64 Debug, couldn't find an issue for them either. |
For context, my only motivation with this is unblocking #85014 cause I already got non PGO delegate inlining mostly working locally, but it currently only handles static readonly delegates and requires that to work for lambdas and #108606 and #108579 to work better with delegates in general. (I also don't have the NativeAOT VM implementation for my new JIT-EE API finished yet) |
First attempt at making delegates immutable in CoreCLR so that they can be allocated on the NonGC heap.
I've checked it with a simple app and corerun locally with a delegate from an unloadable ALC and it seemed to not crash, assert nor unload the ALC from under the delegate, however I couldn't actually find any runtime tests that would verify delegates from unloadable ALCs work so the CI coverage might be missing.
One small point of concern is that this might make delegate equality checks slower since they rely on checking the methods in the last "slow path" check, which is however always hit for different delegates AFAIR.
Contributes to #85014.
cc @jkotas