-
Notifications
You must be signed in to change notification settings - Fork 76
Persistent-aware self_relative_ptr and Single-writer-multi-reader (SWMR) skiplist #1174
base: master
Are you sure you want to change the base?
Conversation
… swmr_map Add some related tests
… swmr_map Add some related tests
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@yjrobin Thank you very much for the patch! I really like the direction of this. My main comment would be to think about reusing existing components in pa_self_relative_ptr and swmr_skip_list. For pa_self_relative_ptr I think it would be easier to implement the flush_needed on top of the existing self_relative_ptr implementation (see comment below). That way it would also be easier to tell if the implementation is correct.
Regarding the skip_list implementation, I think it would be best to have everything in one place (e.g. in concurrent_map/concurrent_skip_list_impl.hpp) and just add some (template) parameters to control the concurrency setting (swmr vs mwmr) or just have separate insert methods for swmr and mwmr modes. Do you think this is possible?
Reviewed 1 of 18 files at r1.
Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @yjrobin)
include/libpmemobj++/experimental/atomic_pa_self_relative_ptr.hpp, line 17 at r3 (raw file):
{ /** * Atomic specialization for pa_self_relative_ptr
Does it make sense to have pa_self_relative_ptr on it's own? (without atomic around it)? I'm wondering if we couldn't just use self_relative_ptr and implement the flush_needed logic on top of it in this atomic specialization. Please take a look at include/libpmemobj++/detail/tagged_ptr.hpp. It's quite similar.
Maybe you could even use tagged_ptr directly in atomic<pa_self_relative_ptr> implementation?
include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):
next(size_type level) { assert(level < height());
I think that this kind of logic is pretty generic and this could be implemented in some load() method in the atomic<pa_self_relative_ptr> class perhaps?
This is the right way to do it, I create a new one because I want to show you the idea and logic without changing existing file. I'll try to merge the pa_self_relative_ptr into self_relative_ptr with some extra overloaded APIs.
This is possible as they are quite similar. I'll merge them.
If merging pa_self_relative_ptr into self_relative_ptr cannot be done, I'll look into the tagged_ptr.
Implementing this logic in load() may introduce some problems for "const" iterators as load() function generally does not modify anything. My idea is allow some read-only (not the read in the write-after-read) operation such as const iterators to read the those dirty self_relative_ptr without flushing and resetting the flag. How do you think? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great! In general, I think that having two separate classes: self_relative_ptr and pa_self_relative_ptr might be a good idea. pa_self_relative could provide only methods like store/load/compare_exchange/... which ensure atomicity is the same as visibility. If we have all API in self_relateive_ptr it might be easy to accidentally mix the API by a user.
The flush_neede/dirty flag should be an implementation detail, hidden from the user IMO.
Reviewable status: 1 of 19 files reviewed, 2 unresolved discussions (waiting on @igchor and @yjrobin)
include/libpmemobj++/experimental/swmr_skip_list_impl.hpp, line 125 at r3 (raw file):
Previously, yjrobin wrote…
Implementing this logic in load() may introduce some problems for "const" iterators as load() function generally does not modify anything. My idea is allow some read-only (not the read in the write-after-read) operation such as const iterators to read the those dirty self_relative_ptr without flushing and resetting the flag. How do you think?
Right, the load should not modify anything - but I believe you should still put this logic somewhere into the pa_self_relative_ptr (some separate method). Regarding reading dirty self_relative_ptrs, how would you ensure consistency of the data structure? Maybe for const load() we could have something similar to this implementation: https://github.com/pmem/pmdk/blob/master/src/examples/libpmem2/ringbuf/ringbuf.c#L247
The load
method there only persists the value, but does not clear the dirty flag.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)
a discussion (no related file):
Getting rid of transactions is quite tricky and I would suggest to focus on locks first. We need to use transactions, at least for allocations (make_persistent only works within a transaction). Replacing current implementation with Flush-on-Read approach is not so simple because you need to think about memory leaks. We can discuss this in more detail if you want.
As a first step, I would suggest removing locks and replacing enumerable_thread_specific with a single variable for SWMR case. To remove locks you can just create a dummy version of try_lock_nodes method which just do nothing in SWMR case. For enumerable_thread_specific, you could also create a dummy class, which always returns the same reference in local() method (since we only have one writer, we do not need per-thread storage).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)
a discussion (no related file):
Previously, igchor (Igor Chorążewicz) wrote…
Getting rid of transactions is quite tricky and I would suggest to focus on locks first. We need to use transactions, at least for allocations (make_persistent only works within a transaction). Replacing current implementation with Flush-on-Read approach is not so simple because you need to think about memory leaks. We can discuss this in more detail if you want.
As a first step, I would suggest removing locks and replacing enumerable_thread_specific with a single variable for SWMR case. To remove locks you can just create a dummy version of try_lock_nodes method which just do nothing in SWMR case. For enumerable_thread_specific, you could also create a dummy class, which always returns the same reference in local() method (since we only have one writer, we do not need per-thread storage).
Yes, it is not necessary to remove the txn for pmem allocation. I just focus on the internal locks of the SWMR skiplist.
Dummy version of try_lock_nodes method and per-thread class is what I'm doing. Thanks for the suggestion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 1 of 19 files reviewed, 3 unresolved discussions (waiting on @igchor and @yjrobin)
a discussion (no related file):
Previously, yjrobin wrote…
Yes, it is not necessary to remove the txn for pmem allocation. I just focus on the internal locks of the SWMR skiplist.
Dummy version of try_lock_nodes method and per-thread class is what I'm doing. Thanks for the suggestion.
@yjrobin just wanted to let you know that I will be out of the office for the next 3 or 4 weeks. If you have any questions please feel free to reach out to [email protected] or [email protected] My team will also review the changes regarding locks and tls once they are ready.
Once I'm back we can discuss how to further optimize the skip list, perhaps using flush-on-read.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
Hi @lukaszstolarczuk and @pbalcer , I've updated the code according to the previous discussion with @igchor. He said you can help to review the code when he's on vacation. Currently, there are some problems in the UTs of radix_tree which uses tagged_ptr based on self_relative_ptr. I've tried to modify it accordingly using "self_relative_ptr<void, std::false_type>". It makes the compilation successful but the testcases of radix_tree cannot pass. Could you help me with this? Thanks a lot. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: 0 of 30 files reviewed, 5 unresolved discussions (waiting on @igchor and @yjrobin)
include/libpmemobj++/detail/self_relative_ptr_base_impl.hpp, line 208 at r5 (raw file):
} void set_dirty_flag(bool dirty)
Why flag have to be 0
when set and 1
when not? Wouldn't be logic cleaner otherwise?
This method may be just something like
void
set_dirty_flag(bool dirty)
{
if(dirty)
offset |= dirty_flag;
}
include/libpmemobj++/detail/self_relative_ptr_base_impl.hpp, line 249 at r5 (raw file):
uintptr_t ptr = static_cast<uintptr_t>( reinterpret_cast<intptr_t>(this) + (other_offset | ~dirty_flag) + 1);
This line cause your problems with radix.
Address, which is checked in this assert
assert(get<P1>() == rhs.get()); |
is increased by 2 (and probably points to some garbage) when you unconditionally change this bit.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
…d parameter. Implement the single-writer-multi-reader skip-list based on the persistent-aware self_relative_ptr in existing concurrent_skip_list_impl as a templated parameter. Add some UTs.
The definition of the
I've fixed this problem. This is quite interesting. The I also removed the testcase for pa_self_relative_ptr_atomic, only pa_self_relative_ptr_atomic_pmem is tested. This is because persistent-aware self_relative_ptr must be on PMem. |
Hi @yjrobin, I have a few follow-up questions regarding the design and implementation. Would you be available for a meeting/call? I think it would be easier to discuss in real time than on github. If you could, please send me an email at [email protected] so we could schedule something. |
Codecov Report
@@ Coverage Diff @@
## master #1174 +/- ##
==========================================
- Coverage 93.65% 90.13% -3.52%
==========================================
Files 53 53
Lines 4567 4175 -392
==========================================
- Hits 4277 3763 -514
- Misses 290 412 +122
Flags with carried forward coverage won't be shown. Click here to find out more.
Continue to review full report at Codecov.
|
Dear all,
It has been a long time since we talked earlier. I create this PR as the kick-start to discuss with you about how we can contribute to this repo in order to make a new storage engine (based on a single-writer-multi-reader skiplist) for PmemKV.
As we discussed earlier, we both found the traditional
persistent_ptr
in PMDK is too expensive (16 bytes) and not convenient for atomic operations. So I looked into the experimentalself_relative_ptr
you mentioned and come up with some idea to modify it to facilitate the implementation of Persistent Compare-and-swap. This PR includes the following components:1. pa_self_relative_ptr
2. swmr_skip_list
next
ptr of a skiplist node.Before I move on to implement the final swmr_skip_list, I'd like to discuss with you about the
pa_self_relative_ptr
to see if there are any correctness problems. And also how you think the existing MWMR concurrent_skip_list can usepa_self_relative_ptr
to remove unnecessary locks and txns in SWMR semantics.Thanks!
Best rgds,
Jun Yang
This change is