-
Notifications
You must be signed in to change notification settings - Fork 112
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
remove valgrind call causing GPU problems #4239
base: master
Are you sure you want to change the base?
remove valgrind call causing GPU problems #4239
Conversation
Jenkins build this please |
I think it would be cleaner to introduce a CMake check to turn those checks on and off. Off by default |
Having many preprocessor statements is far from neat, so a cleaner solution that also avoids the compilation errors caused by calling host code in a device function is very welcome. How would a CMake check instead look and possibly avoid this issue in a more clean way? |
cmake would not avoid the preprocessor stuff, it would just give an option to flip it on/off. this option sorta already exists - |
Here I would think we want it to compile all the time. If the valgrind call is allowed in a device function then we cannot compile in debug mode for gpus, so having a macro that removes in undonditionally only on the gpu seems like the right move? If the option would allow this code on gpu we would introduce a sort of invalid build configuration that wont compile |
Could we move the preprocessor logic into Valgrind::CheckDefined() instead and make it a no-op on devices? |
Oh I thought that function would be a part of the valgrind implementation but we have one layer inbetween where I can place this macro instead, that would allow me to clean up some code where this solution is already in place... |
Jenkins build this please |
@@ -105,7 +105,8 @@ inline bool CheckDefined([[maybe_unused]] const T& value) | |||
template <class T> | |||
inline bool CheckAddressable([[maybe_unused]] const T& value) | |||
{ | |||
#if !defined NDEBUG && HAVE_VALGRIND | |||
// if we run in debug mode AND we have valgrind AND we are NOT in a gpu function | |||
#if !defined NDEBUG && HAVE_VALGRIND && !OPM_IS_INSIDE_DEVICE_FUNCTION |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OPM_IS_INSIDE_DEVICE_FUNCTION
will always be false if the function does not have a device decorator?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah if the preprocessor statement is evaluated before the function is inlined then this seems to have no impact
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do not understand this statement, apart from "Yeah"...?
The code looks fine to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The very high level idea is that you don't want to call any Valgrind-function from within a device kernel.
This is accomplished by wrapping said statement within a preprocessor #if !OPM_IS_INSIDE_DEVICE_FUNCTION
, where
OPM_IS_INSIDE_DEVICE_FUNCTION
is true if and only if the function is being invoked from a device kernel.
However, a GPU kernel is not allowed to call a function which is not decorated with __device__
(with the exception of constexpr-functions with certain flags enabled), and the function in question, CheckAddressable
does not have any such decorator, hence it will never be called from a device kernel, in other words, the current preprocessor test is not doing anything before the __device__
decorator (through the OPM wrapper macro) is added.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So the current calls to this function would call compilation errors on GPU since they are not decorated, which means that either we should decorate it, or we go with the original solution of avoiding the calls by lots of #if
s. I prefer the first (add decorator).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that is kind of the gist of my comment. I'm a bit perplexed if this even solved the compilation in Debug problems, @multitalentloes ? Or is the function in question not called from any device kernel at the moment?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will look more into this early next week.
Avoid having Valgrind calls/throw inside GPU kernels