Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix thread unsafety in occasional logging when using std::atomic #808

Closed

Conversation

Nimrod0901
Copy link

This PR is to address #804

Note:

  1. Declare LOG_OCCURRENCES_MOD_N as a local, non-atomic variable and use fetch_add % n to get the exact modulo remainder.
  2. Initialize SOME_KIND_OF_LOG_EVERY_N from 0 to 1 since fetch_add will get the previous value.
  3. Add an extra condition when n == 1.
  4. Introduce a new variable LOG_OCCURRENCES_MOD_N in the macro SOME_KIND_OF_LOG_FIRST_N. Although the name seems unrelated, I prefer not to introduce another macro to name it.
  5. Remove the usage AnnotateBenignRaceSized for variables unlikely to be under a multi-thread situation.

I only fix the atomic version and I have no idea if other versions have a similar problem.
Would you mind taking a look? @sergiud @drigz Thanks.

Aside question:
I doubt if AnnotateBenignRaceSized will take effect on std::atomic since no data race will happen.

++LOG_OCCURRENCES; \
if (++LOG_OCCURRENCES_MOD_N > n) LOG_OCCURRENCES_MOD_N -= n; \
if (LOG_OCCURRENCES_MOD_N == 1) \
int LOG_OCCURRENCES_MOD_N = LOG_OCCURRENCES.fetch_add(1, std::memory_order_relaxed) % n; \
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not clear to me why changing to fetch_add provides a solution here. op++ also calls fetch_add albeit with different memory ordering. Consequently, you cannot avoid the problem you are trying to solve. To be able to, you would need to perform all the involved operations atomically by protecting concurrent access in a critical section.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem was "a series of atomic operations of LOG_OCCURRENCES_MOD_N, but in whole, they are not atomic, which unexpected". Here only the LOG_OCCURRENCES is static, the only place we need to protect. Specifically, we only do one atomic operation fetch_add on LOG_OCCURRENCES.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My question is: in what way do the changes fix the problem? The operations are still not atomic as a whole.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The operations are still not atomic as a whole.

Yes. But they don't need to be. There is only one operation that needs to be atomic, access to the static variable LOG_OCCURRENCES.

in what way do the changes fix the problem?

For every use of this macro (under different threads), a uniqueLOG_OCCURRENCES value is assigned, incremented by 1. We use a local variable LOG_OCCURRENCES_MOD_N to get the reminder of % n. By now, there is no need for protection or being atmoic. It can be written like

int LOG_OCCURRENCES_MOD_N = LOG_OCCURRENCES.fetch_add(1, std::memory_order_relaxed);
if (LOG_OCCURRENCES_MOD_N % n == 1)

They are the same.

Copy link
Collaborator

@sergiud sergiud Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You cannot do that. Once LOG_OCCURRENCES overflows you will hit undefined behavior.

Also, the change from op++ to fetch_add is not clear to me. Why is this necessary? If you need the previous value, you can use the post increment.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The main idea is to promise every call can have the exact value in the if condition statements to represent if it should be logged. When the static atomic variable is fetch_add to a local variable, the state is recorded. You'll never get the same value.

Copy link
Author

@Nimrod0901 Nimrod0901 Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Once LOG_OCCURRENCES overflows you will hit undefined behavior.

You're right. This may need more consideration.

the change from op++ to fetch_add is not clear to me. Why is this necessary?

int val = op++; // not atomic, there are two operations. self increment and assignment. ++op is the same
int val = op.fetch_add(1) // atomic

Copy link
Collaborator

@sergiud sergiud Mar 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not correct. With post increment you will get the previous value since it essentially calls fetch_add as well.

Please refer to the documentation of std::atomic::op++ before we continue the discussion. Thanks.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, It's my bad. You're right. I agree there is no need to use fetch_add. ++ is enough.

Once LOG_OCCURRENCES overflows you will hit undefined behavior.

Use atomic_uint instead. Unsigned integer overflow is not a UB. Will this help?

@sergiud
Copy link
Collaborator

sergiud commented Mar 18, 2022

Since a fix is not straightforward to get right, I suggest that you add unit tests.

@Nimrod0901
Copy link
Author

I make some updates. Please take a look if you have time.
The tests I added are most likely to fail on the master branch.

@sergiud
Copy link
Collaborator

sergiud commented Apr 3, 2022

I'm very sorry for the delay. I will look at the PR right after 0.6 release.

@sergiud sergiud added this to the next milestone Apr 3, 2022
@Nimrod0901
Copy link
Author

I'm very sorry for the delay. I will look at the PR right after 0.6 release.

No problem.

@sergiud sergiud removed this from the 0.7 milestone Oct 14, 2023
@sergiud sergiud linked an issue Jan 3, 2024 that may be closed by this pull request
@sergiud
Copy link
Collaborator

sergiud commented Jun 11, 2024

I'll close the PR since it's heavily diverged from master. @Nimrod0901 if you want continue working on this please rebase.

@sergiud sergiud closed this Jun 11, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

LOG_EVERY_N seems not thread safe in C++11 and newer
2 participants