Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove valgrind call causing GPU problems #4239

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

multitalentloes
Copy link

Avoid having Valgrind calls/throw inside GPU kernels

@multitalentloes
Copy link
Author

Jenkins build this please

@blattms
Copy link
Member

blattms commented Oct 1, 2024

I think it would be cleaner to introduce a CMake check to turn those checks on and off. Off by default

@multitalentloes
Copy link
Author

Having many preprocessor statements is far from neat, so a cleaner solution that also avoids the compilation errors caused by calling host code in a device function is very welcome. How would a CMake check instead look and possibly avoid this issue in a more clean way?

@akva2
Copy link
Member

akva2 commented Oct 1, 2024

cmake would not avoid the preprocessor stuff, it would just give an option to flip it on/off.

this option sorta already exists - -DCMAKE_DISABLE_FIND_PACKAGE_Valgrind=1 make it into noops.

@multitalentloes
Copy link
Author

Here I would think we want it to compile all the time. If the valgrind call is allowed in a device function then we cannot compile in debug mode for gpus, so having a macro that removes in undonditionally only on the gpu seems like the right move? If the option would allow this code on gpu we would introduce a sort of invalid build configuration that wont compile

@atgeirr
Copy link
Member

atgeirr commented Oct 1, 2024

Could we move the preprocessor logic into Valgrind::CheckDefined() instead and make it a no-op on devices?

@multitalentloes
Copy link
Author

Oh I thought that function would be a part of the valgrind implementation but we have one layer inbetween where I can place this macro instead, that would allow me to clean up some code where this solution is already in place...

@multitalentloes
Copy link
Author

Jenkins build this please

@@ -105,7 +105,8 @@ inline bool CheckDefined([[maybe_unused]] const T& value)
template <class T>
inline bool CheckAddressable([[maybe_unused]] const T& value)
{
#if !defined NDEBUG && HAVE_VALGRIND
// if we run in debug mode AND we have valgrind AND we are NOT in a gpu function
#if !defined NDEBUG && HAVE_VALGRIND && !OPM_IS_INSIDE_DEVICE_FUNCTION
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OPM_IS_INSIDE_DEVICE_FUNCTION will always be false if the function does not have a device decorator?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah if the preprocessor statement is evaluated before the function is inlined then this seems to have no impact

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do not understand this statement, apart from "Yeah"...?

The code looks fine to me.

Copy link
Contributor

@kjetilly kjetilly Oct 1, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The very high level idea is that you don't want to call any Valgrind-function from within a device kernel.

This is accomplished by wrapping said statement within a preprocessor #if !OPM_IS_INSIDE_DEVICE_FUNCTION, where
OPM_IS_INSIDE_DEVICE_FUNCTION is true if and only if the function is being invoked from a device kernel.

However, a GPU kernel is not allowed to call a function which is not decorated with __device__ (with the exception of constexpr-functions with certain flags enabled), and the function in question, CheckAddressable does not have any such decorator, hence it will never be called from a device kernel, in other words, the current preprocessor test is not doing anything before the __device__ decorator (through the OPM wrapper macro) is added.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the current calls to this function would call compilation errors on GPU since they are not decorated, which means that either we should decorate it, or we go with the original solution of avoiding the calls by lots of #ifs. I prefer the first (add decorator).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, that is kind of the gist of my comment. I'm a bit perplexed if this even solved the compilation in Debug problems, @multitalentloes ? Or is the function in question not called from any device kernel at the moment?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will look more into this early next week.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants