Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

WIP: route discards via the I/O queue #2386

Draft
wants to merge 8 commits into
base: master
Choose a base branch
from
45 changes: 38 additions & 7 deletions include/seastar/core/internal/io_request.hh
Original file line number Diff line number Diff line change
Expand Up @@ -36,7 +36,7 @@ namespace internal {

class io_request {
public:
enum class operation : char { read, readv, write, writev, fdatasync, recv, recvmsg, send, sendmsg, accept, connect, poll_add, poll_remove, cancel };
enum class operation : char { read, readv, write, writev, fdatasync, recv, recvmsg, send, sendmsg, accept, connect, poll_add, poll_remove, cancel, discard };
private:
// the upper layers give us void pointers, but storing void pointers here is just
// dangerous. The constructors seem to be happy to convert other pointers to void*,
Expand Down Expand Up @@ -108,7 +108,12 @@ private:
int fd;
char* addr;
};

struct discard_op {
operation op;
int fd;
uint64_t offset;
uint64_t length;
};
union {
read_op _read;
readv_op _readv;
Expand All @@ -124,6 +129,7 @@ private:
poll_add_op _poll_add;
poll_remove_op _poll_remove;
cancel_op _cancel;
discard_op _discard;
};

public:
Expand Down Expand Up @@ -287,6 +293,17 @@ public:
return req;
}

static io_request make_discard(int fd, uint64_t offset, uint64_t length) {
io_request req;
req._discard = {
.op = operation::discard,
.fd = fd,
.offset = offset,
.length = length
};
return req;
}

bool is_read() const {
switch (opcode()) {
case operation::read:
Expand Down Expand Up @@ -364,6 +381,9 @@ public:
if constexpr (Op == operation::cancel) {
return _cancel;
}
if constexpr (Op == operation::discard) {
return _discard;
}
}

struct part;
Expand Down Expand Up @@ -405,6 +425,12 @@ private:
return sub_req;
}
std::vector<part> split_iovec(size_t max_length);

io_request sub_req_discard(size_t relative_offset, size_t new_length) const {
auto& op = _discard;
return make_discard(op.fd, op.offset + relative_offset, new_length);
}
std::vector<part> split_discard(size_t max_length);
};

struct io_request::part {
Expand All @@ -415,18 +441,23 @@ struct io_request::part {

// Helper pair of IO direction and length
struct io_direction_and_length {
size_t _directed_length; // bit 0 is R/W flag
size_t _directed_length; // bits 0 and 1 depict R/W/D flags

static constexpr int direction_bits_count = 2;
static constexpr int direction_bits_mask = (1 << direction_bits_count) - 1;

public:
size_t length() const noexcept { return _directed_length >> 1; }
int rw_idx() const noexcept { return _directed_length & 0x1; }
size_t length() const noexcept { return _directed_length >> direction_bits_count; }
int rwd_idx() const noexcept { return _directed_length & direction_bits_mask; }

static constexpr int discard_idx = 2;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that discard request should carry write_idx direction and whether or not to submit it via sink or via punch_hole() should be decided based on request::op()

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, I can revert the change and use io_direction_and_length::write for discards. The new direction does not look right in the case of duplex mode -- I would not expect three directions in duplex mode.

However, if there is no discard direction, I see the following three issues that are described below.

Firstly, if the write direction is used, then the calculation in internal::request_tokens() will need to contain a new if statement. It is going to distinguish between true write requests and discard requests (based on operation type), because the coefficients are different for write/discard operations.

Secondly, I am not sure about the usage of _group->_max_request_length[]. If I understand correctly, the maximum request length is different for discards and for writes. So, simple usage of _group->_max_request_length[direction] will be invalid. Somehow, I feel that it would end up with creation of a new private function that would look similar to the function below:

size_t get_max_request_length(io_direction_and_length dnl, const internal::io_request& req) {
    if (req.opcode() != internal::io_request::operation::discard) {
    	return _group->_max_request_length[dnl.rw_idx()];
    }
    
    return _group->_max_discard_request_length;
}

Lastly, the way how statistics are accumulated in io_queue::priority_class_data::on_dispatch() will mix discards and writes. The code looks as follows: _rwstat[dnl.rwd_idx()].add(dnl.length());.

If the three mentioned points do not cause a problem, then I will introduce the change.

static constexpr int read_idx = 1;
static constexpr int write_idx = 0;

io_direction_and_length(int idx, size_t val) noexcept
: _directed_length((val << 1) | idx)
: _directed_length((val << direction_bits_count) | idx)
{
assert(idx == read_idx || idx == write_idx);
assert(idx == read_idx || idx == write_idx || idx == discard_idx);
}
};

Expand Down
17 changes: 11 additions & 6 deletions include/seastar/core/io_queue.hh
Original file line number Diff line number Diff line change
Expand Up @@ -65,7 +65,7 @@ class io_sink;
using shard_id = unsigned;
using stream_id = unsigned;

class io_desc_read_write;
class queued_io_request_completion;
class queued_io_request;
class io_group;

Expand Down Expand Up @@ -93,7 +93,7 @@ public:
private:
std::vector<std::unique_ptr<priority_class_data>> _priority_classes;
io_group_ptr _group;
boost::container::static_vector<fair_queue, 2> _streams;
boost::container::static_vector<fair_queue, 3> _streams;
internal::io_sink& _sink;

friend struct ::io_queue_for_tests;
Expand All @@ -102,6 +102,7 @@ private:
priority_class_data& find_or_create_class(internal::priority_class pc);
future<size_t> queue_request(internal::priority_class pc, internal::io_direction_and_length dnl, internal::io_request req, io_intent* intent, iovec_keeper iovs) noexcept;
future<size_t> queue_one_request(internal::priority_class pc, internal::io_direction_and_length dnl, internal::io_request req, io_intent* intent, iovec_keeper iovs) noexcept;
void submit_blocks_discarding(queued_io_request_completion* desc, internal::io_request req) noexcept;

// The fields below are going away, they are just here so we can implement deprecated
// functions that used to be provided by the fair_queue and are going away (from both
Expand Down Expand Up @@ -147,6 +148,8 @@ public:
unsigned long blocks_count_rate = std::numeric_limits<int>::max();
unsigned disk_req_write_to_read_multiplier = read_request_base_count;
unsigned disk_blocks_write_to_read_multiplier = read_request_base_count;
unsigned disk_req_discard_to_read_multiplier = read_request_base_count;
unsigned disk_blocks_discard_to_read_multiplier = read_request_base_count;
size_t disk_read_saturation_length = std::numeric_limits<size_t>::max();
size_t disk_write_saturation_length = std::numeric_limits<size_t>::max();
sstring mountpoint = "undefined";
Expand All @@ -168,11 +171,12 @@ public:
size_t len, internal::io_request req, io_intent* intent, iovec_keeper iovs = {}) noexcept;
future<size_t> submit_io_write(internal::priority_class priority_class,
size_t len, internal::io_request req, io_intent* intent, iovec_keeper iovs = {}) noexcept;
future<size_t> submit_io_discard(internal::priority_class pc, size_t len, internal::io_request req, io_intent* intent) noexcept;

void submit_request(io_desc_read_write* desc, internal::io_request req) noexcept;
void submit_request(queued_io_request_completion* desc, internal::io_request req) noexcept;
void cancel_request(queued_io_request& req) noexcept;
void complete_cancelled_request(queued_io_request& req) noexcept;
void complete_request(io_desc_read_write& desc, std::chrono::duration<double> delay) noexcept;
void complete_request(queued_io_request_completion& desc, std::chrono::duration<double> delay) noexcept;

[[deprecated("I/O queue users should not track individual requests, but resources (weight, size) passing through the queue")]]
size_t queued_requests() const {
Expand Down Expand Up @@ -203,6 +207,7 @@ public:
struct request_limits {
size_t max_read;
size_t max_write;
size_t max_discard;
};

request_limits get_request_limits() const noexcept;
Expand All @@ -227,8 +232,8 @@ private:
friend const fair_group& internal::get_fair_group(const io_queue& ioq, unsigned stream);

const io_queue::config _config;
size_t _max_request_length[2];
boost::container::static_vector<fair_group, 2> _fgs;
size_t _max_request_length[3];
boost::container::static_vector<fair_group, 3> _fgs;
std::vector<std::unique_ptr<priority_class_data>> _priority_classes;
util::spinlock _lock;
const shard_id _allocated_on;
Expand Down
1 change: 1 addition & 0 deletions include/seastar/core/reactor.hh
Original file line number Diff line number Diff line change
Expand Up @@ -505,6 +505,7 @@ public:
future<> link_file(std::string_view oldpath, std::string_view newpath) noexcept;
future<> chmod(std::string_view name, file_permissions permissions) noexcept;

future<> punch_hole(int fd, uint64_t offset, uint64_t length) noexcept;
future<size_t> read_directory(int fd, char* buffer, size_t buffer_size);

future<int> inotify_add_watch(int fd, std::string_view path, uint32_t flags);
Expand Down
12 changes: 12 additions & 0 deletions include/seastar/util/file.hh
Original file line number Diff line number Diff line change
Expand Up @@ -102,6 +102,18 @@ future<std::vector<temporary_buffer<char>>> read_entire_file(std::filesystem::pa
/// \param path path of the file to be read.
future<sstring> read_entire_file_contiguous(std::filesystem::path path);

/// Removes a regular file via blocks discarding.
///
/// \param name name of the file to remove.
///
/// \note
/// The removal is processed by the I/O queue according to its configuration.
/// The removal may be delayed when the bandwidth is not available. The user
/// must ensure, that the file is not used until the removal finishes.
///
/// Effectively: opens a file, unlinks it and punches holes until nothing is left.
future<> remove_file_via_blocks_discarding(std::string_view name) noexcept;

SEASTAR_MODULE_EXPORT_END
/// @}

Expand Down
10 changes: 3 additions & 7 deletions src/core/file.cc
Original file line number Diff line number Diff line change
Expand Up @@ -268,13 +268,9 @@ posix_file_impl::fcntl_short(int op, uintptr_t arg) noexcept {

future<>
posix_file_impl::discard(uint64_t offset, uint64_t length) noexcept {
return engine()._thread_pool->submit<syscall_result<int>>([this, offset, length] () mutable {
return wrap_syscall<int>(::fallocate(_fd, FALLOC_FL_PUNCH_HOLE|FALLOC_FL_KEEP_SIZE,
offset, length));
}).then([] (syscall_result<int> sr) {
sr.throw_if_error();
return make_ready_future<>();
});
internal::maybe_priority_class_ref io_priority_class;
auto req = internal::io_request::make_discard(_fd, offset, length);
return _io_queue.submit_io_discard(internal::priority_class(io_priority_class), length, std::move(req), nullptr).discard_result();
}

future<>
Expand Down
Loading