Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Question] Is it safe to traverse Harris-*'s lists? #574

Closed
jessyoon14 opened this issue Dec 9, 2021 · 8 comments
Closed

[Question] Is it safe to traverse Harris-*'s lists? #574

jessyoon14 opened this issue Dec 9, 2021 · 8 comments
Assignees
Labels
lecture question/discussion related to lectures question Further information is requested

Comments

@jessyoon14
Copy link

jessyoon14 commented Dec 9, 2021

I have a question about slide 99 (Linearizability video 26:58).

In the video, professor mentions the following content:
During the traversal of Harris-*'s lists, the read operation may read stale values that is already overwritten in the shared memory, as it may read from its own cache. It may read a stale value that points to the already detached node. These complications are not yet resolved yet in literature.

I have two questions about this.

  1. Is it the case that Harris-*'s list traversal is actually dangerous (points to stale values in practice)? Or is it the case that in practice, list traversal is safe, but it's just not mathematically proven? I'm confused whether Harris-*'s lists are an unsafe CDS or a safe CDS.
  2. How is it possible for a thread to read a stale value from its cache that's already overwritten in the shared memory? When the shared memory is updated, isn't the cache rewritten to reflect the current shared memory? Otherwise, we wouldn't be able to use caches.

I would appreciate any help in understanding this topic.
Thank you in advance to anyone who answers :)

@jessyoon14 jessyoon14 added the question Further information is requested label Dec 9, 2021
@tomtomjhj tomtomjhj assigned jeehoonkang and unassigned kyeongmincho Dec 9, 2021
@poppindouble
Copy link

Here is my understanding, hope it helps. :)

  1. What is the meaning of "safe" CDS.

Let's say you write a CDS, you try all different kinds of way to break this CDS, and this CDS is really well designed that it doesn't have any memory issue. Can I say this CDS is "safe"?

If yes, then the word "safe" is related to the way how you manage your memory to make sure CDS don't have memory issues, like hazard pointers or epoch based memory reclamation, etc.

If we are considering the word "safe" as: It is guaranteed that the Harris-'s lists write operation order = the order of successful CAS on the head and node's next pointer. Well, yes, we are still considering Harris-'s lists is "safe" in a way that it guarantee gives us some properties.

However the remaining question is Harris-*'s lists read operation order, which leads to our second question.

  1. Where stale values come from?

Let's say you have 3 concurrent operations going in your list. Remove Node A, Read Node A, Change Node A's value, can we still get some solid equations like above? No, We don't know where this Read Node A operation should be placed. We can't decide the result should be null or new value or original value, all results make sense.

Besides above, from the computer architecture perspective, each core has their own cache, and the value of it might be out of sync with the value in memory, it depends on the protocol of how cores are communicating with each other, like MESI.

@tomtomjhj
Copy link
Member

Thanks @poppindouble for great explanation. I'd like to add some more stuff.

The problem is that the methodology for mathematically specifying libraries under relaxed memory model is not mature enough.

Let me give a simple example. pop method of Treiber's stack in relaxed memory model can return None even if the stack is in fact not empty, because head.load can read an old null pointer. So pop in relaxed memory model doesn't exactly do what we expected it to do. You could modify the code of pop to work exactly as expected, but that will incur more synchronization overhead, so this is not an option.

What we want is to have an efficient implementation of stack, and at the same time reason about the stack's behavior. So we should come up with a specification that explains such a weak behavior. For example, we can specify that if pop returns None, then the thread must not have "seen" a push that is not already popped.

@jessyoon14
Copy link
Author

jessyoon14 commented Dec 11, 2021

Thank you both for a very clear explanation! I understand the limitations of the current model.

I have a followup question about @tomtomjhj 's comment:

Let me give a simple example. pop method of Treiber's stack in relaxed memory model can return None even if the stack is in fact not empty, because head.load can read an old null pointer. So pop in relaxed memory model doesn't exactly do what we expected it to do. You could modify the code of pop to work exactly as expected, but that will incur more synchronization overhead, so this is not an option.

I'm confused by what you mean by a relaxed memory model. In the cs431 implementation of Treiber's stack, we use acquire/release synchronization, so I feel like this problem cannot happen. Are you saying that pop does not behave as expected for the cs431 Treiber stack code? Or are you saying that the current code does not have this problem, but if we replace all release/acquire synchronization in the code with Ordering::Relaxed, then pop would not behave as expected?

@poppindouble
Copy link

@jessyoon14 I gave out my own understand in #582, hope it helps. :)

@poppindouble
Copy link

@jessyoon14
First try to understand the relation between L0 cache and the main memory. CSAPP chapter 6 might help. Back to your question, Let's say you run Treiber's stack on a really weak memory model ISA machine, if you specify acquire and release in your C++ code or Rust, the CPU shouldn't do some reordering stuff, however if it is relaxed, reordering might happen and that's where the None comes from.

@tomtomjhj
Copy link
Member

we use acquire/release synchronization, so I feel like this problem cannot happen

Using acquire/release access on the atomic location doesn't guarantee that load(Acquire) on that location will always read the latest value. Only atomic RMWs (e.g. CAS) and SeqCst accesses/fences can guarantee that.

Are you saying that pop does not behave as expected for the cs431 Treiber stack code?

Note that a successful pop is done by a successful CAS. So if you ignore pops that return None (which don't try CAS in the first place), the Treiber's stack implementation works exactly like stack.

@poppindouble
Copy link

Thanks for help me clear some of my misunderstandings

@jessyoon14
Copy link
Author

Thank you!

@Lee-Janggun Lee-Janggun added the lecture question/discussion related to lectures label Sep 23, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
lecture question/discussion related to lectures question Further information is requested
Projects
None yet
Development

No branches or pull requests

6 participants