scylladb/message/shared_dict.hh
Nadav Har'El 926089746b message: move RPC compression from utils/ to message/
The directory utils/ is supposed to contain general-purpose utility
classes and functions, which are either already used across the project,
or are designed to be used across the project.

This patch moves 8 files out of utils/:

    utils/advanced_rpc_compressor.hh
    utils/advanced_rpc_compressor.cc
    utils/advanced_rpc_compressor_protocol.hh
    utils/stream_compressor.hh
    utils/stream_compressor.cc
    utils/dict_trainer.cc
    utils/dict_trainer.hh
    utils/shared_dict.hh

These 8 files together implement the compression feature of RPC.
None of them are used by any other Scylla component (e.g., sstables have
a different compression), or are ready to be used by another component,
so this patch moves all of them into message/, where RPC is implemented.

Theoretically, we may want in the future to use this cluster of classes
for some other component, but even then, we shouldn't just have these
files individually in utils/ - these are not useful stand-alone
utilities. One cannot use "shared_dict.hh" assuming it is some sort of
general-purpose shared hash table or something - it is completely
specific to compression and zstd, and specifically to its use in those
other classes.

Beyond moving these 8 files, this patch also contains changes to:
1. Fix includes to the 5 moved header files (.hh).
2. Fix configure.py, utils/CMakeLists.txt and message/CMakeLists.txt
   for the three moved source files (.cc).
3. In the moved files, change the namespace from "utils::" to "netw::",
   the namespace used by RPC. This also required updating a number of
   callers to the new namespace, and adding an explicit "utils::" in
   several places that previously relied on the current namespace
   being "utils::".

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#25149
2025-09-30 17:03:09 +03:00


/*
 * Copyright (C) 2023-present ScyllaDB
 */
/*
 * SPDX-License-Identifier: LicenseRef-ScyllaDB-Source-Available-1.0
 */
#pragma once
#define ZSTD_STATIC_LINKING_ONLY
#include <zstd.h>
#define LZ4_STATIC_LINKING_ONLY
#include <lz4.h>
#include "utils/UUID.hh"
#include <array>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <span>
#include <vector>
namespace netw {
// For performance reasons (cache pressure), it is desirable to have only
// one instance of a particular dictionary on a node.
//
// `shared_dict` takes a raw dictionary buffer (which preferably contains
// a dictionary in zstd format, but any content is fine), and wraps around
// it with compressor-specific dictionary types. (Each compressor attaches
// some algorithm-specific hash indices and entropy tables to it).
//
// This way different compressors and decompressors can share the same
// raw dictionary buffer.
//
// Dictionaries are always read-only, so it's fine (and strongly preferable)
// to share this object between shards.
struct shared_dict {
    struct dict_id {
        uint64_t timestamp = 0;
        utils::UUID origin_node{};
        std::array<std::byte, 32> content_sha256{};
        bool operator==(const dict_id&) const = default;
    };
    dict_id id{};
    std::vector<std::byte> data;
    std::unique_ptr<ZSTD_DDict, decltype(&ZSTD_freeDDict)> zstd_ddict{nullptr, ZSTD_freeDDict};
    std::unique_ptr<ZSTD_CDict, decltype(&ZSTD_freeCDict)> zstd_cdict{nullptr, ZSTD_freeCDict};
    std::unique_ptr<LZ4_stream_t, decltype(&LZ4_freeStream)> lz4_cdict{nullptr, LZ4_freeStream};
    std::span<const std::byte> lz4_ddict;
    // I got burned by an LZ4 bug (`<` used instead of `<=`) once when dealing
    // with exactly 64 KiB prefixes, so I'm using 64 KiB - 1 because of the trauma.
    // But 64 KiB would probably work for this use case too.
    constexpr static size_t max_lz4_dict_size = 64 * 1024 - 1;
    shared_dict() = default;
    shared_dict(std::span<const std::byte> d, uint64_t timestamp, utils::UUID origin_node, int zstd_compression_level = 1);
};
} // namespace netw