Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid crashing in case of alias detection failure #607

Merged

Conversation

BrzVlad
Copy link
Member

@BrzVlad BrzVlad commented Aug 1, 2024

Simply use the conservative may alias value instead.

Simply use the conservative may alias value instead.
@BrzVlad
Copy link
Member Author

BrzVlad commented Aug 1, 2024

Should fix crashes in implicit null checks pass dotnet/runtime#105319

cc @radekdoulik

@lambdageek
Copy link
Member

@BrzVlad is this a bug upstream? I don't think the implicit null checks pass is heavily used. Can we isolate a test case where the operand doesn't have a value?

@akoeplinger
Copy link
Member

We actually added the same check for MMO2 in f1f846f, maybe it is something specific to us?

@akoeplinger akoeplinger merged commit b9b4464 into dotnet:dotnet/main-19.x Aug 1, 2024
15 checks passed
@lambdageek
Copy link
Member

maybe it is something specific to us

I'm not sure. the reason MMO1->getValue() might return null is because MachineMemOperand stores a PointerUnion<Value*, PseudoSourceValue*> and getValue will only extract the Value* if it's stored in there, if there's a PseudoSourceValue stored there, it'll return null.

/// Return the base address of the memory access. This may either be a normal
/// LLVM IR Value, or one of the special values used in CodeGen.
/// Special values are those obtained via
/// PseudoSourceValue::getFixedStack(int), PseudoSourceValue::getStack, and
/// other PseudoSourceValue member functions which return objects which stand
/// for frame/stack pointer relative references and other special references
/// which are not representable in the high-level IR.
const Value *getValue() const { return PtrInfo.V.dyn_cast<const Value*>(); }
const PseudoSourceValue *getPseudoValue() const {
return PtrInfo.V.dyn_cast<const PseudoSourceValue*>();
}

So maybe we're the only ones who do implicit null checks of stack pointer offsets, for example?

@lambdageek
Copy link
Member

ImplicitNullChecks only fires on branches tagged with MD_make_implicit which emit_cond_system_exception in mini-llvm.c only does for NullReferenceException - MONO_EMIT_NEW_COND_EXC (cfg,cond,"NullReferenceException") is the macro that emits it. we use it pretty liberally without really checking if the argument is a heap or stack reference. so it's pretty likely we're doing something that the pass doesn't expect.

For example we emit it for the Interlocked.Increment (ref int arg) intrinsic:

https://github.com/dotnet/runtime/blob/1337553ef2f1cb7a78bb1c19a85b8a9c929064b2/src/mono/mono/mini/intrinsics.c#L1424-L1452

which will end up with a stack operand if someone is using atomic increment on a local for some misguided reason.

@BrzVlad
Copy link
Member Author

BrzVlad commented Aug 2, 2024

@lambdageek I agree this is likely not a proper fix. I gave up quickly trying to undertand what exactly is leading to this crash since it didn't seem trivial and avoiding crashes is high priority. Further investigation would be preferable at some point.

@matouskozak
Copy link
Member

How long does it take for this change to become effective inside dotnet/runtime? We are still seeing failures on the CI https://dev.azure.com/dnceng-public/public/_build/results?buildId=766538&view=logs&j=bb18f708-6684-5381-2c5c-c995f6debaa3&t=644a1944-383a-5910-653c-070240832801

@BrzVlad
Copy link
Member Author

BrzVlad commented Aug 6, 2024

@matouskozak It has to be bumped manually dotnet/runtime#105867. I just triggered a new run there since it was still crashing before due to some simd intrinsic failures, which might have been caused by one of @tannergooding changes while llvm x64 ci was broken.

@matouskozak
Copy link
Member

@matouskozak It has to be bumped manually dotnet/runtime#105867. I just triggered a new run there since it was still crashing before due to some simd intrinsic failures, which might have been caused by one of @tannergooding changes while llvm x64 ci was broken.

The failures were fixed in dotnet/runtime#105538 but we verified it only on arm64. Let's see how the new run goes.

@BrzVlad
Copy link
Member Author

BrzVlad commented Aug 6, 2024

I already had that commit when testing locally. Also I fixed a crash by replacing new_args with args as it can be seen in that PR but there still seems to be some other issue. So I disabled emitting of that intrinsic entirely to see if it fixes the crashes.

@tannergooding
Copy link
Member

Is there still a crash or issue that I need to look at here?

Was this missed or masked by CI due to the broader LLVM issue that existed?

@BrzVlad
Copy link
Member Author

BrzVlad commented Aug 7, 2024

@tannergooding The crash was caused by dotnet/runtime@4fbd498cc25c. It is x64 specific. I reproduced it on MacOS with the llvm bump from dotnet/runtime#105867. I suspect it reproduces the same on linux.

The steps to repro should look something like this:
// Build runtime
./build.sh -subset mono+libs+clr.hosts -c Release /p:MonoEnableLLVM=true /p:MonoAOTEnableLLVM=true
// Build core root
src/tests/build.sh -mono Release generatelayoutonly
// Compile System.IO.Hashing.dll
cd artifacts/tests/coreclr/osx.x64.Release/Tests/Core_Root
export MONO_PATH=..../artifacts/tests/coreclr/osx.x64.Release/Tests/Core_Root
export PATH=$PATH:.../artifacts/bin/mono/osx.x64.Release // in order to find opt and llc
export MONO_ENV_OPTIONS="--aot=llvm"
..../artifacts/bin/mono/osx.x64.Release/cross/osx-x64/mono-aot-cross System.IO.Hashing.dll

@BrzVlad
Copy link
Member Author

BrzVlad commented Aug 7, 2024

Also note that I tried fixing the crash (the second commit in the linked PR), which seemed to be caused by accidental unitialized passing of new_args. However it was still crashing so I disabled emitting that intrinsic altogether. Note that I'm not yet convinced that the bug comes from your commit. It is possible there is an issue somewhere else, maybe in our already existing implementation of PSHUFD for example.

radekdoulik pushed a commit that referenced this pull request Aug 16, 2024
Simply use the conservative may alias value instead.
radekdoulik pushed a commit that referenced this pull request Sep 2, 2024
Simply use the conservative may alias value instead.
radekdoulik pushed a commit that referenced this pull request Sep 12, 2024
Simply use the conservative may alias value instead.
radekdoulik pushed a commit that referenced this pull request Sep 19, 2024
Simply use the conservative may alias value instead.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants