[Question] Is it safe to traverse Harris-*'s lists? #574

jessyoon14 · 2021-12-09T08:05:00Z

I have a question about slide 99 (Linearizability video 26:58).

In the video, professor mentions the following content:
During the traversal of Harris-*'s lists, the read operation may read stale values that is already overwritten in the shared memory, as it may read from its own cache. It may read a stale value that points to the already detached node. These complications are not yet resolved yet in literature.

I have two questions about this.

Is it the case that Harris-*'s list traversal is actually dangerous (points to stale values in practice)? Or is it the case that in practice, list traversal is safe, but it's just not mathematically proven? I'm confused whether Harris-*'s lists are an unsafe CDS or a safe CDS.
How is it possible for a thread to read a stale value from its cache that's already overwritten in the shared memory? When the shared memory is updated, isn't the cache rewritten to reflect the current shared memory? Otherwise, we wouldn't be able to use caches.

I would appreciate any help in understanding this topic.
Thank you in advance to anyone who answers :)

poppindouble · 2021-12-10T08:14:28Z

Here is my understanding, hope it helps. :)

What is the meaning of "safe" CDS.

Let's say you write a CDS, you try all different kinds of way to break this CDS, and this CDS is really well designed that it doesn't have any memory issue. Can I say this CDS is "safe"?

If yes, then the word "safe" is related to the way how you manage your memory to make sure CDS don't have memory issues, like hazard pointers or epoch based memory reclamation, etc.

If we are considering the word "safe" as: It is guaranteed that the Harris-'s lists write operation order = the order of successful CAS on the head and node's next pointer. Well, yes, we are still considering Harris-'s lists is "safe" in a way that it guarantee gives us some properties.

However the remaining question is Harris-*'s lists read operation order, which leads to our second question.

Where stale values come from?

Let's say you have 3 concurrent operations going in your list. Remove Node A, Read Node A, Change Node A's value, can we still get some solid equations like above? No, We don't know where this Read Node A operation should be placed. We can't decide the result should be null or new value or original value, all results make sense.

Besides above, from the computer architecture perspective, each core has their own cache, and the value of it might be out of sync with the value in memory, it depends on the protocol of how cores are communicating with each other, like MESI.

tomtomjhj · 2021-12-10T08:21:51Z

Thanks @poppindouble for great explanation. I'd like to add some more stuff.

The problem is that the methodology for mathematically specifying libraries under relaxed memory model is not mature enough.

Let me give a simple example. pop method of Treiber's stack in relaxed memory model can return None even if the stack is in fact not empty, because head.load can read an old null pointer. So pop in relaxed memory model doesn't exactly do what we expected it to do. You could modify the code of pop to work exactly as expected, but that will incur more synchronization overhead, so this is not an option.

What we want is to have an efficient implementation of stack, and at the same time reason about the stack's behavior. So we should come up with a specification that explains such a weak behavior. For example, we can specify that if pop returns None, then the thread must not have "seen" a push that is not already popped.

jessyoon14 · 2021-12-11T16:20:11Z

Thank you both for a very clear explanation! I understand the limitations of the current model.

I have a followup question about @tomtomjhj 's comment:

Let me give a simple example. pop method of Treiber's stack in relaxed memory model can return None even if the stack is in fact not empty, because head.load can read an old null pointer. So pop in relaxed memory model doesn't exactly do what we expected it to do. You could modify the code of pop to work exactly as expected, but that will incur more synchronization overhead, so this is not an option.

I'm confused by what you mean by a relaxed memory model. In the cs431 implementation of Treiber's stack, we use acquire/release synchronization, so I feel like this problem cannot happen. Are you saying that pop does not behave as expected for the cs431 Treiber stack code? Or are you saying that the current code does not have this problem, but if we replace all release/acquire synchronization in the code with Ordering::Relaxed, then pop would not behave as expected?

poppindouble · 2021-12-12T07:02:39Z

@jessyoon14 I gave out my own understand in #582, hope it helps. :)

poppindouble · 2021-12-12T09:04:19Z

@jessyoon14
First try to understand the relation between L0 cache and the main memory. CSAPP chapter 6 might help. Back to your question, Let's say you run Treiber's stack on a really weak memory model ISA machine, if you specify acquire and release in your C++ code or Rust, the CPU shouldn't do some reordering stuff, however if it is relaxed, reordering might happen and that's where the None comes from.

tomtomjhj · 2021-12-12T12:56:06Z

we use acquire/release synchronization, so I feel like this problem cannot happen

Using acquire/release access on the atomic location doesn't guarantee that load(Acquire) on that location will always read the latest value. Only atomic RMWs (e.g. CAS) and SeqCst accesses/fences can guarantee that.

Are you saying that pop does not behave as expected for the cs431 Treiber stack code?

Note that a successful pop is done by a successful CAS. So if you ignore pops that return None (which don't try CAS in the first place), the Treiber's stack implementation works exactly like stack.

poppindouble · 2021-12-12T21:10:03Z

Thanks for help me clear some of my misunderstandings

jessyoon14 · 2021-12-13T05:54:01Z

Thank you!

jessyoon14 added the question Further information is requested label Dec 9, 2021

jessyoon14 assigned kyeongmincho Dec 9, 2021

tomtomjhj assigned jeehoonkang and unassigned kyeongmincho Dec 9, 2021

poppindouble mentioned this issue Dec 12, 2021

What is memory model? #582

Open

jessyoon14 closed this as completed Dec 13, 2021

Lee-Janggun added the lecture question/discussion related to lectures label Sep 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[Question] Is it safe to traverse Harris-*'s lists? #574

[Question] Is it safe to traverse Harris-*'s lists? #574

jessyoon14 commented Dec 9, 2021 •

edited

Loading

poppindouble commented Dec 10, 2021

tomtomjhj commented Dec 10, 2021

jessyoon14 commented Dec 11, 2021 •

edited

Loading

poppindouble commented Dec 12, 2021

poppindouble commented Dec 12, 2021

tomtomjhj commented Dec 12, 2021

poppindouble commented Dec 12, 2021

jessyoon14 commented Dec 13, 2021

[Question] Is it safe to traverse Harris-*'s lists? #574

[Question] Is it safe to traverse Harris-*'s lists? #574

Comments

jessyoon14 commented Dec 9, 2021 • edited Loading

poppindouble commented Dec 10, 2021

tomtomjhj commented Dec 10, 2021

jessyoon14 commented Dec 11, 2021 • edited Loading

poppindouble commented Dec 12, 2021

poppindouble commented Dec 12, 2021

tomtomjhj commented Dec 12, 2021

poppindouble commented Dec 12, 2021

jessyoon14 commented Dec 13, 2021

jessyoon14 commented Dec 9, 2021 •

edited

Loading

jessyoon14 commented Dec 11, 2021 •

edited

Loading