-
Notifications
You must be signed in to change notification settings - Fork 259
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(PR2) Adds the support for Flexible Data Placement(FDP) over NVMe into the Cachelib #277
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if build is OK with removing liburing package?
@@ -541,6 +543,8 @@ class NavyConfig { | |||
// ============ Device settings ============= | |||
// Set the device block size, i.e., minimum unit of IO | |||
void setBlockSize(uint64_t blockSize) noexcept { blockSize_ = blockSize; } | |||
// Set the FDP data placment mode in device | |||
void setFDPMode(bool enable) noexcept { fdp_ = enable; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. setFDPEnabled
cachelib/navy/Factory.cpp
Outdated
if(isFDPEnabled) { | ||
f = openNvmeCharFile(path); | ||
} else { | ||
f = openCacheFile(path, fdSize, truncateFile); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How can you support the partition? I.e., we want to use only [start_lba, start_lba + partition_size)
W.r.t. this, would it be possible to take the block device name and map it to char dev? i.e., "nvmen[p]" to "ngn"?
The rationale is that this will allow us to apply the size check and fallocate thing the same. i.e., we can do:
f = openCacheFile(path, ...);
if (isFDPEnabled) {
f.close();
f = openNvmeCharFile(getNvmeCharDevice(path), ...);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like there were some formatting error. I meant the mapping: "nvmeXnY[pZ]" to "ngXnY".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IIUC, your requirement is as follows,
There are partitions like nvmeXnYp1, nvmeXnYp2 ... , and one of them is used in this cachelib instance.
(Ex: nvmCachePath = "/dev/nvmeXnYp2")
OR is it just one partition in that namespace, but it happened to be partition-formatted ?
The rationale is that this will allow us to apply the size check and fallocate thing the same
Is this steps used for device simulation scenarios?
Anyway the proposed approach is doable and making sense.
cachelib/navy/common/Device.cpp
Outdated
auto remainingSize = size; | ||
auto maxWriteSize = (maxWriteSize_ == 0) ? remainingSize : maxWriteSize_; | ||
maxWriteSize = (maxIOSize_ == 0) ? maxWriteSize : maxIOSize_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove this after fix setting maxIOSize_
.
cachelib/navy/common/Device.cpp
Outdated
#ifndef CACHELIB_IOURING_DISABLE | ||
if (isFDPEnabled_) { | ||
fdpNvme_ = std::make_shared<FdpNvme>(fvec_[0].fd()); | ||
setMaxIOSize(fdpNvme_->getMaxTfrSize()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix maxWriteSize_
too.
maxIOSize_ = fdpNvme_->getMaxTfrSize();
maxWriteSize_ = min(maxWriteSize_, maxIOSize_);
cachelib/navy/common/Device.cpp
Outdated
AsyncIoContext::prepUringCmdAsyncIo(int fd, uint8_t opType, void* data, | ||
uint32_t size, uint64_t offset, void* userdata, int handle) { | ||
std::unique_ptr<folly::AsyncBaseOp> asyncOp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to guard this code appropriately using CACHELIB_IOURING_DISABLE
.
So, suggest to call prepUringCmdAsyncIo
inside the prepAsyncIo
with guarding this function with #ifndef CACHELIB_IOURING_DISABLE
cachelib/navy/common/Device.cpp
Outdated
} else { | ||
fdpNvme_->prepWriteUringCmdSqe(sqe, fd, data, size, offset, handle); | ||
} | ||
asyncOp = std::move(iouringCmdOp); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do you need to have asyncOp
separately? We can just return iouringCmdOp
.
cachelib/navy/common/FdpNvme.cpp
Outdated
#include <linux/nvme_ioctl.h> | ||
#include <sys/ioctl.h> | ||
#include <errno.h> | ||
#include <cstring> | ||
#include <numeric> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. in alphabetical order?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you check if this compiles in supported platforms here https://github.com/facebook/CacheLib/actions?
cachelib/navy/common/FdpNvme.cpp
Outdated
auto pid = getFdpPID(phndl); | ||
|
||
return static_cast<int>(pid); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to expose the placement id here? Why not just give out the phndl
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, possible.
FdpNvme.cpp is not protected for this case. I guess you want to bring them under -DCACHELIB_IOURING_DISABLE, right? |
As per the status, |
Oh, my bad. I just started the workflows. I expect some of them to be failed, so please fix it as it appears. |
c68f234
to
3c649e9
Compare
cachelib/navy/common/Device.cpp
Outdated
if (result) { | ||
bytesRead_.add(readSize); | ||
} else { | ||
readIOErrors_.inc(); | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. return early to reduce the indent depth.
if (!result) {
...
return false;
}
bytesRead_.add(readSize);
cachelib/navy/common/Device.cpp
Outdated
: asyncBase_(std::move(asyncBase)), | ||
id_(id), | ||
qDepth_(capacity), | ||
useIoUring_(useIoUring) { | ||
useIoUring_(useIoUring), | ||
fdpNvme_(fdpNvme) { | ||
#ifdef CACHELIB_IOURING_DISABLE | ||
// io_uring is not available on the system | ||
XDCHECK(!useIoUring_); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. XDCHECK(!useIoUring_ && !fdbNvme_)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you meant XDCHECK(!useIoUring_ || !fdbNvme_) ?
It is an error here, if either of them is enabled.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If CACHELIB_IOURING_DISABLE
, both of useIoUring_
and fdbNvme_
should be nullptr, right?
cachelib/navy/common/Device.cpp
Outdated
} | ||
#endif | ||
|
||
bool AsyncIoContext::submitIo(IOOp& op) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Could you move this back to original place, i.e., above prepAsyncIo
?
cachelib/navy/common/Device.cpp
Outdated
#ifndef CACHELIB_IOURING_DISABLE | ||
std::unique_ptr<folly::AsyncBaseOp> AsyncIoContext::prepNvmeIo(IOOp& op) { | ||
std::unique_ptr<folly::IoUringOp> iouringCmdOp; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is more simpler to guard the body to minimize the use of CACHELIB_IOURING_DISABLE
, i.e.,
std::unique_ptr<folly::AsyncBaseOp> AsyncIoContext::prepNvmeIo(IOOp& op) {
#ifndef CACHELIB_IOURING_DISABLE
...
#else
return nullptr;
#endif
}
cachelib/navy/common/Device.h
Outdated
void setMaxWriteSize(uint32_t maxWriteSize) { maxWriteSize_ = maxWriteSize; } | ||
|
||
void setMaxIOSize(uint32_t maxIOSize) { maxIOSize_ = maxIOSize; } | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove those after moving FdpNvme
construction to createDirectIoFileDevice
?
cachelib/navy/common/Device.cpp
Outdated
if (!!fdpNvme_) { | ||
auto maxIOSize = fdpNvme_->getMaxTfrSize(); | ||
if (maxIOSize) { | ||
setMaxIOSize(maxIOSize); | ||
maxDeviceWriteSize = std::min<size_t>(maxDeviceWriteSize, maxIOSize); | ||
setMaxWriteSize(maxDeviceWriteSize); | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I feel like it would be better to move this logic to createDirectIoFileDevice
and pass FdpNvme
and maxIOSize
as parameters instead of isFDPEnabled
.
Also, could you move this if inside of try
without if (!!fdpNvme_)
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A query;
you meant to have something like this in the FileDevice creation, right?
FileDevice::FileDevice(...,
uint32_t maxDeviceWriteSize,
uint32_t maxIOSize,
std::shared_ptr fdpNvme,
.....)
: Device(...,
maxDeviceWriteSize,
maxIOSize)
maxIOSize would be set the Device initiation itself, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, something like that. Both maxIOSize_
and maxDeviceWriteSize_
would be set in the constructor initializer list.
cachelib/navy/common/FdpNvme.cpp
Outdated
} else { | ||
throw std::bad_alloc(); | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why we need this? Isn't this an assert?
Rather, I guess we need to raise some exception (invalid_argument
?) if ruh_status->nruhsd
is 0 to disable fdp?
if (!ruh_status->nruhsd) {
throw std::invalid_argument("Failed to initialize FDP; nruhsd is 0");
}
cachelib/navy/Factory.cpp
Outdated
size_t devPos = blockDevice.find_first_of("0123456789"); | ||
size_t pPos = blockDevice.find('p', devPos); | ||
|
||
return "/dev/ng" + blockDevice.substr(devPos, pPos - devPos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is not enough to support the partition; we have to set the offset and the size too. Not sure whether there is some library with which we can get the start lba and size of partition from sysfs, i.e., /sys/block/nvme1n1/nvme1n1p2/start
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right. Sorry, I missed the 'offset'.
And the 'offset' need to be passed to the createDirectIoFileDevice(), right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I think it would be good to have all nvme specific code in Device.cpp
or FdpNvme.h
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A thought in the same line:
Is it ok if we keep all the nvme char device handling inside FdpNvme.cpp?
-
That might require two sets of folly:file.
one that is kept(file1) by FileDevice, which is of /dev/nvmeXnYpZ; the other (file2) kept by FdpNvme which is of /dev/ngXnY.
And map is kept by FdpNvme, "file1" : "file2, offset, size".
What do you say? Do you foresee any statistics collection related issues if we keep two sets of folly::file. -
The other way is to have all the logic in Device.cpp:createFdpFileDevice() which in turn makes use of different functions exported through FdpNvme.h, such as openNvmeCharFile(blkdev_path) (Returns folly::file, part_offset, size etc)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it ok if we keep all the nvme char device handling inside FdpNvme.cpp?
Sounds good to me and prefer option 1. However, I don't think you need a map. Instead, I think we can make FdpNvme
represents an FDP NVMe block device which could be a partition or a whole namespace. The char dev fd and partition offset could be property of the FdpNvme
.
W.r.t. this, I can see you are initializing FdpNvme
for the first device only. So, I think we can restrict FdpNvme
is only supported for single block device only for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay, sure.
So, I think we can restrict FdpNvme is only supported for single block device only for now.
Yes, I am adding explicit checks and exceptions to limit the FDP support for single device for now.
@jaesoo-fb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
1 similar comment
@jaesoo-fb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
3c649e9
to
4fa7366
Compare
@arungeorge83 has updated the pull request. You must reimport the pull request before landing. |
@jaesoo-fb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall looks good. Thanks for the work.
@@ -549,6 +551,8 @@ class NavyConfig { | |||
// ============ Device settings ============= | |||
// Set the device block size, i.e., minimum unit of IO | |||
void setBlockSize(uint64_t blockSize) noexcept { blockSize_ = blockSize; } | |||
// Set the NVMe FDP Device data placement mode in the Cachelib | |||
void setFDPMode(bool enable) noexcept { fdpEnabled_ = enable; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. setEnableFDP
.
Could you also change other instances of FDPMode
to EnableFDP
?
@@ -227,6 +227,9 @@ struct CacheConfig : public JSONConfig { | |||
// Navy will split it into multiple IOs. | |||
uint32_t deviceMaxWriteSize{1024 * 1024}; | |||
|
|||
// Enable the FDP Data placement mode in the device, if it is capable. | |||
bool fdpMode{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. deviceEnableFDP
@@ -694,6 +698,9 @@ class NavyConfig { | |||
// Whether to use write size (instead of parcel size) for Navy admission | |||
// policy. | |||
bool useEstimatedWriteSize_{false}; | |||
// Whether Navy support the NVMe FDP data placement(TP4146) directives or not. | |||
// Reference: https://nvmexpress.org/nvmeflexible-data-placement-fdp-blog/ | |||
bool fdpEnabled_{false}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. enableFDP_
cachelib/navy/common/Device.cpp
Outdated
#else | ||
XDCHECK(!isFDPEnabled); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. not needed.
cachelib/navy/common/Device.h
Outdated
// Some devices have this transfer size limit due to DMA size limitations. | ||
// This limit is applicable for both writes and reads. | ||
const uint32_t maxIOSize_{0}; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit. Could you move this right above maxWriteSize_
to match with the initialization order in the constructor?
cachelib/navy/common/FdpNvme.h
Outdated
// as of now; and not supported through conventional block interfaces. | ||
class FdpNvme { | ||
public: | ||
FdpNvme(const std::string& fileName); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this explicit, i.e., explicit FdpNvme(...)
cachelib/navy/common/FdpNvme.cpp
Outdated
#include <sys/ioctl.h> | ||
|
||
#include <cstring> | ||
#include <numeric> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this used?
@@ -788,6 +844,32 @@ bool AsyncIoContext::submitIo(IOOp& op) { | |||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I cannot add comment above. Could you remove req
which is not used.
@@ -773,9 +829,8 @@
}
bool AsyncIoContext::submitIo(IOOp& op) {
- IOReq& req = op.parent_;
-
op.startTime_ = getSteadyClock();
+
while (numOutstanding_ >= qDepth_) {
if (qDepth_ > 1) {
XLOG_EVERY_MS(ERR, 10000) << fmt::format(
cachelib/navy/testing/MockDevice.h
Outdated
@@ -44,6 +44,14 @@ class MockDevice : public Device { | |||
MOCK_METHOD3(readImpl, bool(uint64_t, uint32_t, void*)); | |||
MOCK_METHOD3(writeImpl, bool(uint64_t, uint32_t, const void*)); | |||
MOCK_METHOD0(flushImpl, void()); | |||
MOCK_METHOD0(allocatePlacementHandle, int()); | |||
|
|||
virtual bool writeImpl(uint64_t offset, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
remove virtual
here and add override
instead. i.e., bool writeImpl(...) override {
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This results in compilation error (due to gtest restrictions on default argument - int handle - I guess).
How about adding a line ?
MOCK_METHOD4(writeImpl, bool(uint64_t, uint32_t, const void*, int));
cachelib/navy/common/FdpNvme.cpp
Outdated
size_t bPos = blockDevice.find_last_of("/") + 1; | ||
size_t pPos = blockDevice.find('p', bPos); | ||
|
||
XDCHECK(bPos < std::string::npos); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove this.
I could not find the file TARGETS. Is it something not present in open source version? |
Yeah, this is for internal build. Please ignore. |
cachelib/navy/common/FdpNvme.cpp
Outdated
} | ||
|
||
bool isValidNvmeDevice(const std::string& fileName) { | ||
return std::regex_match(fileName, std::regex("^/dev/nvme\\d+n\\d+(p\\d+)?$")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you replace this with the equivalent (re2::RE2::FullMatch
?) from RE2? https://github.com/google/re2
We cannot use regex_match
due to its vulnerability to segfault https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61582&fbclid=IwAR16Xwiru8QryYObTnoATM5WwCSTX9IRVorMwDxYq9E0sxrsBQTw-H1iKH4
4fa7366
to
cabe1c2
Compare
@arungeorge83 has updated the pull request. You must reimport the pull request before landing. |
…nto Cachelib Summary: This commit adds the device layer support for NVMe-FDP semantics and adds the RUH-awareness feature of NVMe-FDP in the upper layers of Navy. This allows the BlockCache(large items) and BigHash(small items) of Navy to segregate their data streams in physical NAND media by using the FDP placement Identifiers. With this changes, the Cachelib can reduce the Device Write Amplification (WAF) significantly even in high SSD utilization scenarios ("nvmCacheSizeMB" above 50% of the SSD capacity) in most of the cachelib workloads. This commit introduces a 'placementHandle' concept for data placement, which can be used by both BC and BH of Navy on device write() calls, especially for FDP placements. The 'placementHandle' have to be allocated from the device. io_uring_cmd interface(through nvme char device) is used to send FDP directives to Linux kernel, as sending it through the conventional block interfaces is not suported yet. The user can select the NVMe block device (Namespace/partition) as usual (Ex: "nvmCachePaths": ["/dev/nvme0n1p1"]), and the cachelib will pick the corresponding NVMe char device internally. This commit adds a new config 'deviceEnableFDP' to enable FDP. The user needs to select this along with iOUring I/O Engine options. ("navyEnableIoUring": true, "navyQDepth": 1, "deviceEnableFDP": true). Signed-off-by: Arun George <[email protected]> Signed-off-by: Vikash Kumar <[email protected]>
@jaesoo-fb has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator. |
options.sqe128 = true; | ||
options.cqe32 = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I cannot see the big_cqe
is used anymore. Is this still needed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
right, we can remove the line "options.cqe32 = true;" for now. We don't have the big_cqe use cases in the current code.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arungeorge83 Actually, I would like to remove this option completely. So, we don't need sqe128
, either, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jaesoo-fb Actually both sqe128 and cqe32 are mandatory for NVMe passthrough commands. Else the kernel NVMe driver will reject those ios.
Reference: https://elixir.bootlin.com/linux/latest/source/drivers/nvme/host/ioctl.c#L742, Iceber/iouring-go#21
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also from Kanchan Joshi's words (Author of the IO passthru patch - https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=456cba386e94f22fa1b1426303fdcac9e66b1417),
NVMe passthrough commands are of 72 bytes in size. This can not fit into regular SQE. Therefore big SQE (128 bytes) is required.
Also passthrough commands may return more than one result to userspace. Returning addtional result is not possible with regular CQE. Hence big CQE is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for clarification. Yeah, it looks clear here https://elixir.bootlin.com/linux/v6.7/source/drivers/nvme/host/ioctl.c#L742
#ifndef CACHELIB_IOURING_DISABLE | ||
#include <liburing.h> | ||
#endif |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI. Actually, we should not include liburing.h
directly here due to some internal dependency issues causing test failures on some targets. I will remove this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok.
Then should we include <liburing.h> through <folly/experimental/io/IoUring.h> which include <liburing.h> in the file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@arungeorge83 Yeah, right. It will include the liburing.h
from the repo instead of that installed on the system.
FYI, I am taking care of minor changes including those required for the internal build, so you don't need to update the commit.
@jaesoo-fb merged this pull request in 009e89b. |
Submitted with the fixup b5d70a5 |
This adds the device layer support for NVMe-FDP semantics and adds the RUH-awareness feature of NVMe-FDP in
the upper layers of Navy. This allows the BlockCache(large items) and BigHash(small items) of Navy to segregate their data streams in physical NAND media by using the FDP placement Identifiers.
With this changes, the Cachelib can reduce the Device Write Amplification (WAF) significantly even in high SSD utilization scenarios("nvmCacheSizeMB" above 50% of the SSD capacity) in most of the cachelib workloads.
This commit introduces a 'placementHandle' concept for data placement, which can be used by both BC and BH of Navy on
device write() calls, especially for FDP placements. The 'placementHandle' have to be allocated from the device.
io_uring_cmd interface(through nvme char device) is used to send FDP directives to Linux kernel, as sending it through the
conventional block interfaces is not suported yet. The user can select the NVMe block device (Namespace/partition) as usual
(Ex: "nvmCachePaths": ["/dev/nvme0n1p1"]), and the cachelib will pick the corresponding NVMe char device internally.
This commit adds a new config 'fdpMode' to enable FDP. The user needs to select the fdpMode along with iOUring I/O Engine options ("navyEnableIoUring": true, "navyQDepth": 1, "fdpMode": true).
This second PR consist of the code which can be integrated to the 'iouring async-io' support in the Navy.