commitlog: Handle oversized entries

Refs #18161

Yet another approach to dealing with large commitlog submissions.

We handle an oversized single mutation by adding yet another entry
type: fragmented. In this case we only add a fragment (aha) of
the data that needs storing into each entry, along with metadata
to correlate and reconstruct the full entry on replay.
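A minimal C++ sketch of the fragmenting idea follows. It assumes a hypothetical per-fragment header of (chain id, index, count). The commit does not spell out the actual on-disk metadata, and `fragment_header` and `make_fragments` are illustrative names only:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical per-fragment metadata (not the real commitlog layout).
struct fragment_header {
    uint64_t chain_id;  // correlates all fragments of one logical entry
    uint32_t index;     // position of this fragment within the chain
    uint32_t count;     // total number of fragments in the chain
};

struct fragment {
    fragment_header header;
    std::string data;   // a slice of the full serialized entry
};

// Splits one serialized entry into fragments carrying at most
// max_payload bytes each, tagging each with its correlation metadata.
std::vector<fragment> make_fragments(uint64_t chain_id, const std::string& entry,
                                     size_t max_payload) {
    assert(max_payload > 0);
    const size_t count = (entry.size() + max_payload - 1) / max_payload;
    std::vector<fragment> out;
    out.reserve(count);
    for (size_t i = 0; i < count; ++i) {
        const size_t off = i * max_payload;
        const size_t len = std::min(max_payload, entry.size() - off);
        out.push_back(fragment{
            fragment_header{chain_id, static_cast<uint32_t>(i), static_cast<uint32_t>(count)},
            entry.substr(off, len)});
    }
    return out;
}
```

For example, a 10-byte entry with a 4-byte payload limit yields three fragments of 4, 4 and 2 bytes, each of which records `count == 3`.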

Because these fragmented entries are spread over N segments, we
also need to add references from the first segment in a chain
to the subsequent ones. These references are released once we clear
the relevant cf_id count in the base segment.
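The reference-keeping can be modeled with `std::shared_ptr`. In this sketch the base segment holds references to the later segments of each chain, keyed by cf_id, and drops them when that cf_id's count reaches zero. This is a hypothetical model; `segment`, `add_chain` and `release` are not the commitlog's API:

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical segment model: holding a shared_ptr to a segment keeps it
// from being freed or recycled.
struct segment {
    uint64_t id;
    std::map<uint64_t, int> cf_counts;  // cf_id -> outstanding entry count
    // cf_id -> later segments of fragment chains that started in this segment
    std::map<uint64_t, std::vector<std::shared_ptr<segment>>> chain_refs;

    explicit segment(uint64_t i) : id(i) {}

    // Called on the base (first) segment when a chain for cf_id starts here.
    void add_chain(uint64_t cf_id, const std::vector<std::shared_ptr<segment>>& rest) {
        ++cf_counts[cf_id];
        auto& refs = chain_refs[cf_id];
        refs.insert(refs.end(), rest.begin(), rest.end());
    }

    // Clears one entry for cf_id; once the count drops to zero, the
    // references to the subsequent chain segments are released.
    void release(uint64_t cf_id) {
        auto it = cf_counts.find(cf_id);
        if (it == cf_counts.end()) {
            return;  // nothing outstanding for this cf_id
        }
        if (--it->second == 0) {
            cf_counts.erase(it);
            chain_refs.erase(cf_id);
        }
    }
};
```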
This approach has the downside that, due to how serialization etc.
works w.r.t. mutations, we need to create an intermediate buffer
to hold the full serialized target entry. This is then incrementally
written into entries of < max_mutation_size, successively requesting
more segments.
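The incremental write loop can be sketched as follows. This hypothetical model ignores entry headers and checksums, treats each segment as a fixed-capacity string, and caps each chunk at `max_entry_size - 1` bytes to honor the strict `< max_mutation_size` bound. `segment_buf` and `write_fragmented` are illustrative names:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for a segment: a fixed-capacity byte buffer.
struct segment_buf {
    size_t capacity;
    std::string data;
    size_t free_space() const { return capacity - data.size(); }
};

// Writes the full serialized entry (the intermediate buffer) as a series of
// chunks. Each chunk is strictly smaller than max_entry_size and fits in the
// current segment; a new segment is "requested" whenever the current one is
// full. Returns the number of chunks written.
size_t write_fragmented(std::vector<segment_buf>& segments, const std::string& buf,
                        size_t max_entry_size, size_t segment_capacity) {
    assert(max_entry_size > 1 && segment_capacity > 0);
    size_t off = 0;
    size_t chunks = 0;
    while (off < buf.size()) {
        if (segments.empty() || segments.back().free_space() == 0) {
            segments.push_back(segment_buf{segment_capacity, {}});
        }
        segment_buf& seg = segments.back();
        const size_t len = std::min({buf.size() - off, max_entry_size - 1, seg.free_space()});
        seg.data.append(buf, off, len);
        off += len;
        ++chunks;
    }
    return chunks;
}
```

With a 25-byte buffer, `max_entry_size` 11 and 16-byte segments, this writes chunks of 10, 6 and 9 bytes spread over two segments.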

On replay, when encountering a fragment chain, the fragment is
added to a "state", i.e. a mapping of fragment chains currently
being processed. Once we've found all fragments and concatenated
the buffers into a single fragmented one, we can issue a
replay callback as usual.
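A hypothetical sketch of that replay-side bookkeeping, keyed by chain id, is shown below. It tolerates fragments arriving out of order and ignores duplicates. Unlike the description above, it concatenates into a contiguous `std::string` rather than a fragmented buffer, purely for brevity; `fragment_replay_state` is an illustrative name:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>
#include <optional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical replay-side state: fragment chains currently being processed.
class fragment_replay_state {
public:
    // Feeds one fragment. Returns the reassembled entry once every fragment
    // of the chain has been seen; otherwise returns std::nullopt.
    std::optional<std::string> add(uint64_t chain_id, uint32_t index, uint32_t count,
                                   std::string data) {
        if (count == 0 || index >= count) {
            return std::nullopt;  // malformed metadata: ignored in this sketch
        }
        chain& c = _chains[chain_id];
        if (c.parts.empty()) {
            c.parts.resize(count);
        }
        if (c.parts.size() != count) {
            return std::nullopt;  // inconsistent chain metadata: ignored
        }
        if (!c.parts[index]) {  // ignore duplicate fragments
            c.parts[index] = std::move(data);
            ++c.received;
        }
        if (c.received < count) {
            return std::nullopt;
        }
        std::string full;
        for (const auto& p : c.parts) {
            full += *p;
        }
        _chains.erase(chain_id);
        return full;
    }

    size_t pending() const { return _chains.size(); }

private:
    struct chain {
        std::vector<std::optional<std::string>> parts;
        uint32_t received = 0;
    };
    std::map<uint64_t, chain> _chains;
};
```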

Note that a replay caller will need to create and provide such
a state object. The old-signature replay function remains for tests
and the like.
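The state is presumably caller-provided because a chain can span several segment files, so it has to outlive a single file read. The sketch below (all names hypothetical, synchronous instead of future-based) shows an old-style overload that builds a throwaway state and therefore only delivers chains fully contained in one file:

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical fragment record as read from one segment file.
struct frag {
    uint64_t chain_id;
    uint32_t index;
    uint32_t count;
    std::string data;
};

// Hypothetical caller-owned state: chain id -> (fragment index -> data).
struct replay_state_sketch {
    std::map<uint64_t, std::map<uint32_t, std::string>> pending;
};

using reader_func = std::function<void(const std::string&)>;

// New-style overload: partial chains survive in the caller's state.
void read_log(replay_state_sketch& state, const std::vector<frag>& file, const reader_func& fn) {
    for (const frag& f : file) {
        auto& parts = state.pending[f.chain_id];
        parts[f.index] = f.data;
        if (parts.size() == f.count) {
            std::string full;
            for (const auto& kv : parts) {  // std::map iterates in index order
                full += kv.second;
            }
            state.pending.erase(f.chain_id);
            fn(full);
        }
    }
}

// Old-style overload: a fresh state per call, so cross-file chains never complete.
void read_log(const std::vector<frag>& file, const reader_func& fn) {
    replay_state_sketch state;
    read_log(state, file, fn);
}
```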

This approach bumps the file format (docs to come).

To ensure "atomicity", we both force synchronization and, should
the whole op fail, restore segment state (rewinding), thus
discarding all the data we wrote.
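The rewind-on-failure part can be sketched as a wrapper that records the segment count and the last segment's write position before the operation, and restores both if it throws. This is a hypothetical in-memory model: it assumes an append-only log where only the last existing segment and newly allocated ones are written, and it does not model the forced sync. `seg` and `write_atomically` are illustrative names:

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical in-memory segment: just the bytes written so far.
struct seg {
    std::string data;
};

// Runs op, which may append to the last segment and allocate new ones.
// If op throws, segments allocated during op are dropped, the last
// pre-existing segment is truncated back to its old size, and the
// exception is rethrown. Durability (the forced sync) is not modeled.
template <typename WriteOp>
void write_atomically(std::vector<seg>& segments, WriteOp op) {
    const size_t old_count = segments.size();
    const size_t old_pos = old_count ? segments.back().data.size() : 0;
    try {
        op(segments);
    } catch (...) {
        segments.resize(old_count);  // drop segments created by op
        if (old_count) {
            segments.back().data.resize(old_pos);  // rewind the write position
        }
        throw;
    }
}
```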

v2:
* Improve some bookkeeping: ensure we keep track of segments and flush
  properly, to get the counter correct
Calle Wilund
2024-04-30 14:22:09 +00:00
parent 2556e902b1
commit 05bf2ae5d7
4 changed files with 766 additions and 60 deletions


@@ -111,6 +111,7 @@ public:
bool use_o_dsync = false;
bool warn_about_segments_left_on_disk_after_shutdown = true;
bool allow_going_over_size_limit = true;
+bool allow_fragmented_entries = false;
// The base segment ID to use.
// The segment IDs of newly allocated segments will be issued sequentially
@@ -136,7 +137,8 @@ public:
static inline constexpr uint32_t segment_version_1 = 1u;
static inline constexpr uint32_t segment_version_2 = 2u;
static inline constexpr uint32_t segment_version_3 = 3u;
-static inline constexpr uint32_t current_version = segment_version_3;
+static inline constexpr uint32_t segment_version_4 = 4u;
+static inline constexpr uint32_t current_version = segment_version_4;
descriptor(descriptor&&) noexcept = default;
descriptor(const descriptor&) = default;
@@ -378,7 +380,7 @@ public:
// (Re-)set data mix lifetime.
void update_max_data_lifetime(std::optional<uint64_t> commitlog_data_max_lifetime_in_seconds);
-typedef std::function<future<>(buffer_and_replay_position)> commit_load_reader_func;
+using commit_load_reader_func = std::function<future<>(buffer_and_replay_position)>;
class segment_error : public std::exception {};
@@ -424,7 +426,18 @@ public:
const char* what() const noexcept override;
};
+class replay_state {
+public:
+replay_state();
+~replay_state();
+private:
+friend class commitlog;
+class impl;
+std::unique_ptr<impl> _impl;
+};
static future<> read_log_file(sstring filename, sstring prefix, commit_load_reader_func, position_type = 0, const db::extensions* = nullptr);
+static future<> read_log_file(const replay_state&, sstring filename, sstring prefix, commit_load_reader_func, position_type = 0, const db::extensions* = nullptr);
private:
commitlog(config);