Adds documentation on the FDP-CacheLib integration
Summary:
This adds the documentation for the FDP integration in CacheLib, which was
merged in commit 009e89b.

This commit adds a separate page containing all the information about
FDP and its use in CacheLib. Additionally, this page walks a potential
CacheLib user through all the relevant steps to set up an FDP SSD and to
run CacheLib with FDP enabled.

The parameter to enable FDP in CacheBench, "deviceEnableFDP", has been
added to the CacheBench configuration page.

The parameter to enable FDP in the Navy config,
"navyConfig.setEnableFDP(enableFDP)", has been added to the Hybrid Cache
configuration page.

Information on the FdpNvme class has been added to the Navy Architecture Guide
page.

Signed-off-by: Roshan Nair <[email protected]>
Roshan Nair committed Apr 29, 2024
1 parent c86e189 commit 07732e3
Showing 7 changed files with 118 additions and 0 deletions.
@@ -15,6 +15,7 @@ Navy is the SSD optimized cache engine leveraged for Hybrid Cache. Navy is plugg
- Different admission policies to optimize write endurance, hit ratio, IOPS
- Supports Direct IO and raw devices
- Supports async IO with either [io_uring](https://lwn.net/Articles/776703/) or libaio
- Supports Flexible Data Placement (FDP) to improve write endurance of the device

## Design overview
There are three over-arching goals in the design of Navy
@@ -128,3 +129,7 @@ All IO operations within Navy happen over a range of block offsets. `Device` pro
`FileDevice` implements `Device` over one or more regular or block device files. When multiple regular or block devices are used, `FileDevice` operates like a software RAID-0 where a single IO can be split into multiple IOs in units of a fixed `stripe` size. Note that this striping is orthogonal to the chunking that happens with `Device`. Usually, the stripe size is set to the size of a Navy region (16-64MB).
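
As a concrete illustration of the striping arithmetic, here is a minimal sketch (not the actual `FileDevice` code); `mapToFile` and its parameters are hypothetical names, and it assumes a single IO does not cross a stripe boundary.

```cpp
#include <cstdint>
#include <utility>

// Map a logical byte offset onto (file index, offset within that file) for a
// RAID-0 style layout over `numFiles` files with a fixed `stripeSize`.
std::pair<uint32_t, uint64_t> mapToFile(uint64_t offset,
                                        uint64_t stripeSize,
                                        uint32_t numFiles) {
  const uint64_t stripe = offset / stripeSize;    // global stripe index
  const uint32_t fileIndex = stripe % numFiles;   // which backing file/device
  const uint64_t fileStripe = stripe / numFiles;  // stripe index within that file
  const uint64_t fileOffset = fileStripe * stripeSize + offset % stripeSize;
  return {fileIndex, fileOffset};
}
```

For example, with two files and a 16MB stripe size, an IO at offset 48MB maps to file 1 at offset 16MB within that file.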

For actual IO operations, `FileDevice` supports both sync and async operations. For async operations, `FileDevice` supports [`io_uring`](https://lwn.net/Articles/776703/) and `libaio` which are supported by [folly](https://github.com/facebook/folly) as `folly::IoUring` and `folly::AsyncIO`, respectively.

#### FdpNvme

`FdpNvme` embeds the FDP semantics and FDP-specific IO handling. IO with FDP semantics needs to be sent through the io_uring_cmd interface, and `FdpNvme` implements interfaces to allocate FDP specific `placementHandle`s, to prepare the io_uring_cmd SQE, and so on. When FDP is enabled, `FileDevice` makes use of `FdpNvme` to use the FDP enabled file device path. For more information, read [FDP enabled Cache](/docs/Cache_Library_User_Guides/FDP_enabled_Cache.md).
1 change: 1 addition & 0 deletions website/docs/Cache_Library_User_Guides/CacheLib_configs.md
@@ -41,6 +41,7 @@ Configs to initialize NVM cache lives in `CacheAllocatorConfig::nvmConfig` and a
* `setNvmCacheAdmissionPolicy`/`enableRejectFirstAPForNvm`: Sets the NvmAdmissionPolicy. Notice that the field lives with CacheAllocatorConfig.
* `setNvmAdmissionMinTTL`: Sets the NVM admission min TTL. Similarly this lives directly with CacheAllocatorConfig.
* `enableNvmCache`: Sets `CacheAllocatorConfig::nvmConfig` directly. This function should be called first if you intend to turn on NVM cache. And the other functions above would correctly modify the nvmConfig.
* `deviceEnableFDP`: Enables Flexible Data Placement (FDP) in Navy. This ensures segregation of the LOC (BlockCache) and SOC (BigHash) data within the SSD. Note that this works only if the SSD supports FDP.

### WORKERS

10 changes: 10 additions & 0 deletions website/docs/Cache_Library_User_Guides/Configure_HybridCache.md
@@ -82,6 +82,16 @@ Optionally, when `NavyRequestScheduler` is used, the queue depth and IO engine o

Select the IO engine, either io_uring or libaio. See [Architecture Guide - Device](/docs/Cache_Library_Architecture_Guide/navy_overview#device) for more details.

Optionally, Flexible Data Placement (FDP) support can be enabled in the `Device` layer of Navy.

```cpp
navyConfig.setEnableFDP(enableFDP);
```

* `enableFDP` = `true`

When set to `true`, FDP is enabled and the BigHash and BlockCache device writes get segregated within the SSD. For more details, refer to [FDP enabled Cache](/docs/Cache_Library_User_Guides/FDP_enabled_Cache.md).
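
Putting this together with the rest of the device settings, a minimal sketch of an FDP-enabled Navy configuration could look like the following. `setEnableFDP` comes from this page; `setSimpleFile` is the file setup call assumed from the hybrid cache configuration above, and the device path and size are placeholders.

```cpp
facebook::cachelib::navy::NavyConfig navyConfig;

// Back Navy with a single regular file or raw block device (placeholder
// path and a 100GB cache size).
navyConfig.setSimpleFile("/dev/nvme0n1", 100ULL * 1024 * 1024 * 1024);

// FDP IO is issued through io_uring_cmd, so select the io_uring IO engine
// as described earlier in this section.

// Segregate BigHash and BlockCache writes on an FDP-capable SSD.
navyConfig.setEnableFDP(true);
```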

### 2. Common Settings - Job Scheduler

Two types of Job scheduler are supported (see [Architecture Guide - Navy overview](/docs/Cache_Library_Architecture_Guide/navy_overview#job-scheduler)). Common settings are as follows.
@@ -228,6 +228,7 @@ Number of shards used for request ordering. The default is 21, corresponding to
Truncates item to allocated size to optimize write performance.
* `deviceMaxWriteSize`
This controls the largest IO size we will write to the device. Any IO above this size will be split up into multiple IOs.
* `deviceEnableFDP`
This enables the use of FDP in Navy, which segregates the BigHash and BlockCache writes within the SSD.

### Small item engine parameters

100 changes: 100 additions & 0 deletions website/docs/Cache_Library_User_Guides/FDP_enabled_Cache.md
@@ -0,0 +1,100 @@
---
id: FDP_enabled_Cache
title: FDP enabled cache
---

### NVMe® Flexible Data Placement (NVMe® FDP)
NVM Express® (NVMe®) released the ratified technical proposal TP4146a Flexible Data Placement (FDP), which defines a new method for placing host data (logical blocks) into the SSD in an effort to reduce the SSD [Write Amplification Factor (WAF)](https://nvmexpress.org/nvmeflexible-data-placement-fdp-blog/). It provides the host a mechanism to control which of its data is placed into different sets of physical locations on the SSD (NAND blocks) called Reclaim Units. The host can write into multiple such Reclaim Units at a time, allowing it to isolate data with different lifetimes. For more information on NVMe® FDP and various use cases, refer to this document: [Introduction to FDP](https://download.semiconductor.samsung.com/resources/white-paper/FDP_Whitepaper_102423_Final_10130020003525.pdf).

### How does CacheLib use FDP?
CacheLib's BigHash and BlockCache produce distinct IO patterns on the SSD: BigHash generates a random write pattern while BlockCache generates a sequential one. In a conventional SSD, these writes get mixed in the physical NAND media, and this intermixing can lead to a higher SSD Write Amplification Factor (WAF). To combat this in production environments, CacheLib uses up to [50% of the SSD as host over-provisioning](https://www.usenix.org/system/files/osdi20-berg.pdf). The Flexible Data Placement (FDP) support within CacheLib aims to segregate the BigHash and BlockCache data within the SSD. This reduces the device WAF even when configured with 0% host over-provisioning and improves device endurance. FDP support within Navy is optional.
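
For reference, the WAF mentioned here is the usual ratio of media writes to host writes, so a WAF of 1.0 means the device writes no extra data beyond what the host sends:

```
WAF = bytes written to the NAND media / bytes written by the host
```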

| ![](alternate_fdp_navy.png) |
|:--:|
| *CacheLib IO Flow with and without FDP* |

Since FDP directives are not yet supported by the Linux kernel block layer interface, we use the Linux kernel [I/O Passthru](https://www.usenix.org/system/files/fast24-joshi.pdf) mechanism (which leverages the [io_uring_cmd interface](https://www.usenix.org/system/files/fast24-joshi.pdf)), as seen in the figure.

### FDP support within Navy
The `FdpNvme` class embeds all the FDP related semantics and APIs and is used by the `FileDevice` class. This can be extended to other modules in the future if they wish to use FDP support. Below are some key functions added to the `FdpNvme` class related to FDP and io_uring_cmd.

```cpp
// Allocates an FDP specific placement handle to modules using FdpNvme like Block Cache and Big Hash.
// This handle will be interpreted by the device for data placement.
int allocateFdpHandle();

// Prepares the Uring_Cmd sqe for read/write command with FDP directives.
void prepFdpUringCmdSqe(struct io_uring_sqe& sqe,
                        void* buf,
                        size_t size,
                        off_t start,
                        uint8_t opcode,
                        uint8_t dtype,
                        uint16_t dspec);
```
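
As a rough illustration of how these two calls fit together, here is a conceptual sketch (not the actual Navy code) of issuing one FDP-tagged write. The `writeWithFdp` helper, the ring setup, and the opcode/dtype values are assumptions for illustration only.

```cpp
#include <liburing.h>

// Conceptual only: issue a single FDP-tagged write through FdpNvme.
// Assumes `ring` was initialized for NVMe passthrough (big SQEs, e.g.
// IORING_SETUP_SQE128) and omits error handling.
void writeWithFdp(FdpNvme& fdp, struct io_uring& ring,
                  void* buf, size_t size, off_t start) {
  // Each engine (BigHash, BlockCache) would hold its own placement handle so
  // the device keeps their data in separate Reclaim Units.
  static const int handle = fdp.allocateFdpHandle();

  struct io_uring_sqe* sqe = io_uring_get_sqe(&ring);
  fdp.prepFdpUringCmdSqe(*sqe, buf, size, start,
                         /*opcode=*/0x01, // NVMe write (illustrative)
                         /*dtype=*/0x2,   // FDP placement directive (illustrative)
                         /*dspec=*/static_cast<uint16_t>(handle));
  io_uring_submit(&ring);
}
```
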
### When should FDP be enabled?
When using Navy, FDP can help improve SSD endurance compared to the non-FDP path. If the workload has both small and large objects, FDP's WAF gains will most likely be evident because FDP segregates the two in the SSD. In cases like CacheLib's CDN workload, where BigHash is typically not configured, this FDP based segregation makes no difference to SSD WAF. On the other hand, with CacheLib's KV Cache workload, where both BigHash and BlockCache are configured, we see the gains from using FDP. We showcase the WAF gains in the results section below.
### Building the code for FDP
> **_Note on building the code:_** As the FDP path uses io_uring for I/Os as mentioned above, make sure to install the [liburing library](https://github.com/axboe/liburing) before building the CacheLib code.

The method to build the CacheLib code remains unchanged; refer to [Build and Installation](/docs/installation/installation.md) for the detailed steps. After building the code, to run a CacheLib instance with FDP, make sure to enable FDP both in the SSD and in CacheLib.
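
One possible sequence (a sketch; package locations and build flags may differ on your system) is:

```bash
# Build and install liburing first, since the FDP path uses io_uring_cmd.
git clone https://github.com/axboe/liburing.git
cd liburing && ./configure && make && sudo make install && cd ..

# Then build CacheLib as usual.
git clone https://github.com/facebook/CacheLib
cd CacheLib
./contrib/build.sh -d -j -v
```
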
### How to enable FDP in the SSD?
```bash
# Delete any pre-existing namespaces
nvme delete-ns /dev/nvme0 -n 1
# Disable FDP
nvme set-feature /dev/nvme0 -f 0x1D -c 0 -s
# Enable FDP
nvme set-feature /dev/nvme0 -f 0x1D -c 1 -s
# Verify whether FDP has been enabled/disabled
nvme get-feature /dev/nvme0 -f 0x1D -H
# Get the capacity of the drive and use it for further calculations
nvme id-ctrl /dev/nvme0 | grep nvmcap | sed "s/,//g" | awk '{print $3/4096}'
# Create a namespace. Use the capacity value from the above command in --nsze, e.g. a capacity of 459076086
nvme create-ns /dev/nvme0 -b 4096 --nsze=459076086 --ncap=459076086 -p 0,1,2,3 -n 4
# Attach the namespace, e.g. NS id = 1, controller id = 0x7
nvme attach-ns /dev/nvme0 --namespace-id=1 --controllers=0x7
# Deallocate
nvme dsm /dev/nvme0n1 -n 1 -b 459076086
```

### How to enable FDP in CacheLib?
To enable Flexible Data Placement (FDP) support in the `Device` layer of Navy, use the following configuration:

```cpp
navyConfig.setEnableFDP(enableFDP);
```
* When set to `true`, FDP is enabled and the BigHash and BlockCache device writes get segregated within the SSD.

Apart from enabling FDP explicitly, the steps to set up and run a CacheLib instance remain unchanged.

* To enable FDP in the CacheBench config file, add the following lines to the `cache_config` parameters:

```json
"navyEnableIoUring": true,
"navyQDepth": 1,
"deviceEnableFDP" : true
```
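
For context, a pared-down `cache_config` with FDP enabled might look like the sketch below; the sizes, device path, and the non-FDP fields are illustrative placeholders rather than recommended values.

```json
"cache_config": {
  "cacheSizeMB": 43000,
  "nvmCachePaths": ["/dev/nvme0n1"],
  "nvmCacheSizeMB": 950000,
  "navyBigHashSizePct": 4,
  "navyEnableIoUring": true,
  "navyQDepth": 1,
  "deviceEnableFDP": true
}
```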

### Qemu FDP emulation
Even if an FDP enabled SSD is unavailable, a hybrid cache instance can be spun up with FDP enabled using a Qemu emulated FDP SSD. This allows for experimentation with an FDP enabled cache, but the WAF gains won't be visible because Qemu doesn't emulate the SSD-internal operations that lead to write amplification. However, this can be a helpful tool for understanding how to enable FDP in CacheLib and for studying other aspects that come with it. To set up a Qemu emulated FDP SSD, follow the steps documented here: [Qemu NVMe Emulation](https://qemu-project.gitlab.io/qemu/system/devices/nvme.html#flexible-data-placement).

### Results with and without FDP
We run experiments with the [key-value cache traces](/docs/Cache_Library_User_Guides/Cachebench_FB_HW_eval/) on a 1.88TB SSD that supports FDP and observe the following:
* NVM Cache Size of 930GB (50% of the 1.88TB device) and RAM Size 43GB with SOC size set to 4%
  * SSD WAF FDP - 1.03, SSD WAF Non-FDP - 1.22
* NVM Cache Size of 1.88TB (100% of the 1.88TB device) and RAM Size 43GB with SOC size set to 4%
  * SSD WAF FDP - 1.03, SSD WAF Non-FDP - 3.22
* Further results can be found [here](https://download.semiconductor.samsung.com/resources/white-paper/FDP_Whitepaper_102423_Final_10130020003525.pdf).
Binary image file `alternate_fdp_navy.png` (figure: *CacheLib IO Flow with and without FDP*) added; not shown in the diff.
1 change: 1 addition & 0 deletions website/sidebars.js
@@ -120,6 +120,7 @@ module.exports = {
'Cache_Library_User_Guides/chained_items',
'Cache_Library_User_Guides/compact_cache',
'Cache_Library_User_Guides/Structured_Cache',
'Cache_Library_User_Guides/FDP_enabled_Cache',
],
},
{
