This patch addresses an issue where the buffer offset becomes incorrect when a request is retried. The new request uses an offset that has already been advanced, causing misalignment. This fix ensures the buffer offset is correctly reset, preventing such errors.
Closesscylladb/scylladb#22729
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent from further changes of
non-liveupdatable parameters
- Implement ScyllaRESTAPIClient::get_config() that obtains a current
value of given configuration parameter via /v2/config REST API
- Implement test to confirm that only liveupdatable parameters are
changed when SIGHUP is sent after configuration file change
Function set_initialization_completed() is called only once in main.cc,
and the effect is expected to be visible in all shards, as a side effect
of cfg->broadcast_to_all_shards() that is called shortly after. The same
technique was already used for enable_3_1_0_compatibility_mode() call.
Fixesscylladb/scylladb#5382
No backport - minor fix.
Closesscylladb/scylladb#22655
* github.com:scylladb/scylladb:
test: SIGHUP doesn't change non-liveupdatable configuration
test: implement ScyllaRESTAPIClient::get_config()
config: prevent SIGHUP from changing non-liveupdatable parameters
config: remove unused set_value_on_all_shards(const YAML::Node&)
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent from further changes of
non-liveupdatable parameters
Fixesscylladb/scylladb#5382
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
This change:
- Remove unused set_value_on_all_shards(const YAML::Node&) member
function in class config_file::named_value
The function logic was flawed, in a similar way
named_value<T>::set_value(const YAML::Node& node) is flawed: the config
source verification is insufficient for liveupdatable parameters,
allowing overwriting of non-liveupdatable config parameters (refer to
scylladb#5382). As the function was not used, it was removed instead of
fixing.
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.
In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it out-of-scope for this test.
Fixes#20322
* The test was flaky forever, so backport is required for all live versions.
Closesscylladb/scylladb#22064
* github.com:scylladb/scylladb:
tests: loading_cache_test: use manual_clock
utils: loading_cache: make clock_type a template parameter
test: loading_cache_test: use function-scope loader
test: loading_cache_test: simlute loader using sleep
test: lib: eventually: add sleep function param
test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
these misspellings were identified by codespell. let's fix them.
one of them is a part of a user visble string.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22443
Specialize config_from_string() for sstring to resolve lexical_cast
stream state parsing limitation. This enables correct handling of empty
string configurations, such as setting an empty value in CQL:
```cql
UPDATE system.config SET value='' WHERE
name='allowed_repair_based_node_ops';
```
Previous implementation using boost::lexical_cast would fail due to
EOF stream state, incorrectly rejecting valid empty string conversions.
Fixesscylladb/scylladb#22491
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22492
So the unit test can use manual_clock rather than lowres_clock
which can be flaky (in particular in debug mode).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Add a custom implementation of boost::adaptors::uniqued that is compatible
with C++20 ranges library. This bridges the gap between Boost.Range and
the C++ standard library ranges until std::views::unique becomes available
in C++26. Currently, the unique view is included in
[P2214](https://www.open-std.org/jtc1/sc22/wg21/docs/papers/2023/p2760r0.html)
"A Plan for C++ Ranges Evolution", which targets C++26.
The implementation provides:
- A lazy view adaptor that presents unique consecutive elements
- No modification of source range
- Compatibility with C++20 range views and concepts
- Lighter header dependencies compared to Boost
This resolves compilation errors when piping C++20 range views to
boost::adaptors::uniqued, which fails due to concept requirements
mismatch. For example:
```c++
auto range = std::views::take(n) | boost::adaptors::uniqued; // fails
```
This change also offers us a lightweight solution in terms of smaller
header dependency.
While std::ranges::unique exists in C++23, it's an eager algorithm that
modifies the source range in-place, unlike boost::adaptors::uniqued which
is a lazy view. The proposed std::views::unique (P2214) targeting C++26
would provide this functionality, but is not yet available.
This implementation serves as an interim solution for filtering consecutive
duplicate elements using range views until std::views::unique is
standardized.
For more details on the differences between `std::ranges::unique` and
`boost::adaptors::uniqued`:
- boost::adaptors::uniqued is a view adaptor that creates a lazy view over the original range. It:
* Doesn't modify the source range
* Returns a view that presents unique consecutive elements
* Is non-destructive and lazy-evaluated
* Can be composed with other views
- std::ranges::unique is an algorithm that:
* Modifies the source range in-place
* Removes consecutive duplicates by shifting elements
* Returns an iterator to the new logical end
* Cannot be used as a view or composed with other range adaptors
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22201
Declaring-but-not-defining a fully specialized template is a great way to
cut dependencies between users and providers, but unfortunately not
supported for variable templates. Clang 18 does support it, but
apparently it is a misinterpretation of the standard, and was removed
in clang 19.
We started using this non-feature in 7ed89266b3.
The fix is to use function templates. This is more verbose as
each specialization needs to define a static variable to return,
but is fully supported.
Closesscylladb/scylladb#22299
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
please note, because quite a few source files relied on
`utils/to_string.hh` to pull in the specialization of
`fmt::formatter<std::optional<T>>`, after removing
`#include <fmt/std.h>` from `utils/to_string.hh`, we have to
include `fmt/std.h` directly.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
In Scylla there are two options that control IO bandwidth limit -- the /storage_service/(compaction|stream)_throughput REST API endpoints. The endpoints are partially implemented and have no counterparts in the nodetool.
This set implements the missing bits and adds tests for new functionality.
Closesscylladb/scylladb#21877
* github.com:scylladb/scylladb:
nodetool: Implement [gs]etstreamthroughput commands
nodetool: Implement [gs]etcompationthroughput commands
test: Add validation of how IO-updating endpoints work
api: Implement /storage_service/(stream|compaction)_throughput endpoints
api: Disqualify const config reference
api: Implement /storage_service/stream_throughput endpoint
api: Move stream throughput set/get endpoints from storage service block
api: Move set_compaction_throughput_mb_per_sec to config block
util: Include fmt/ranges.h in config_file.hh
Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631
EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data.
Introduces OpenSSL based file level encrypted storage, managed via a set of providers
ranging from local files to cloud KMS providers.
For a more comprehensive explanation, see the included docs (or if possible, original
source tree).
Manual bulk merge of EAR feature from enterprise repo to main scylla repo.
Breaks some features apart, but main EAR is still a humongous commit, because to separate this
I would have to mess with code incrementally, adding time and risk.
This PR includes the local file gen tool, tests and also p11 validation.
Note: CI will not execute the full tests unless master CI is set to provide the same environment
as the enterprise one. Not sure about the status of this ATM.
Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to
check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality
will be enabled, otherwise not.
Closesscylladb/scylladb#22233
* github.com:scylladb/scylladb:
docs: Add EAR docs
main/build: Add p11-kit and initialize
tools: Add local-file-key-generator tool
tests: Add EAR tests
tmpdir: shorten test tempdir path
EAR: port the ear feature from enterprise
cql_test_env: Add optional query timeout
schema/migration_manager: Add schema validate
sstables: add get_shared_components accessor
config/config_file: Add exports and definitions of config_type_for<>
Instantiated only on shard 0.
Currently, only subscribe from unit test
Manual unit test using loop mount was added.
Note that the test requires sudo access
and root access to /dev/loop, so it cannot
run in rootless podman instance, and it'd
fail with Permission denied.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#21523
This PR extends authentication with 2 mechanisms:
- a new role_manager subclass, which allows managing users via
LDAP server,
- a new authenticator, which delegates plaintext authentication
to a running saslauthd daemon.
The features have been ported from the enterprise repository
with their test.py tests and the documentation as part of
changing license to source available.
Fixes: scylladb/scylla-enterprise#5000Fixes: scylladb/scylla-enterprise#5001Closesscylladb/scylladb#22030
Replace boost with a standard facility; this reduces dependencies as lexical_cast depends on boost ranges.
Since std::from_chars() is chatty, we introduce utils::from_chars_exactly() to trade some flexibility for conciseness.
Small build time improvement, no backport needed.
Closesscylladb/scylladb#22164
* github.com:scylladb/scylladb:
error_injection: replace boost::lexical_cast with std::from_chars
utils: introduce from_chars_exactly()
Replace boost with a standard facility; this reduces dependencies
as lexical_cast depends on boost ranges.
As a side effect the exception error message is improved.
This is a replacement for boost::lexical_cast (but without its
long dependency chain). It wraps std::from_chars(), providing a less
flexible but also more concise interface.
The dict publication routine might throw raft::request_aborted when the node is
aborted. This doesn't deserve an ERROR log. Let's demote the log printed in this
case from ERROR to DEBUG.
Fixesscylladb/scylladb#22081Closesscylladb/scylladb#22211
now that we are allowed to use C++23. we now have the luxury of using
`std::views::reverse`.
- replace `boost::adaptors::transformed` with `std::views::transform`
- remove unused `#include <boost/range/adaptor/reversed.hpp>`
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
- utils: phased_barrier: advance_and_await: allocate new gate only when needed
- utils: phased_barrier: add close() method
- and use in existing services
* Improvement. No backport needed
Closesscylladb/scylladb#22018
* github.com:scylladb/scylladb:
utils: phased_barrier: add close() method
utils: phased_barrier: advance_and_await: allocate new gate only when needed
This is a forward port (from scylla-enterprise) of additional compression options (zstd, dictionaries shared across messages) for inter-node network traffic. It works as follows:
After the patch, messaging_service (Scylla's interface for all inter-node communication)
compresses its network traffic with compressors managed by
the new advanced_rpc_compression::tracker. Those compressors compress with lz4,
but can also be configured to use zstd as long as a CPU usage limit isn't crossed.
A precomputed compression dictionary can be fed to the tracker. Each connection
handled by the tracker will then start a negotiation with the other end to switch
to this dictionary, and when it succeeds, the connection will start being compressed using that dictionary.
All traffic going through the tracker is passed as a single merged "stream" through dict_sampler.
dictionary_service has access to the dict_sampler.
On chosen nodes (in the "usual" configuration: the Raft leader), it uses the sampler to maintain
a random multi-megabyte sample of the sampler's stream. Every several minutes,
it copies the sample, trains a compression dictionary on it (by calling zstd's
training library via the alien_worker thread) and publishes the new dictionary
to system.dicts via Raft's write_mutation command.
This update triggers (eventually) a callback on all nodes, which feeds the new dictionary
to advanced_rpc_compression::tracker, and this switches (eventually) all inter-node connections
to this dictionary.
Closesscylladb/scylladb#22032
* github.com:scylladb/scylladb:
messaging_service: use advanced_rpc_compression::tracker for compression
message/dictionary_service: introduce dictionary_service
service: make Raft group 0 aware of system.dicts
db/system_keyspace: add system.dicts
utils: add advanced_rpc_compressor
utils: add dict_trainer
utils: introduce reservoir_sampling
utils: introduce alien_worker
utils: add stream_compressor
When services are stopped we generally want to call
advance_and_await(), but we should also prevent starting
new operations, so close() would do that be closing the
phased_barrier active gate (which implicitly also awaits
past operations similar to advance_and_await()).
Add unit tests for that and use in existing services.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
If there are no opearions in progress, there is no
need to close the current gate and allocate a new one.
The current gate can be reused for the new phase just as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So we can use it for defining other small_vector
deriving their internal capacity from another small_vector
type.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Adds glue needed to pass lz4 and zstd with streaming and/or dictionaries
as the network traffic compressors for Seastar's RPC servers.
The main jobs of this glue are:
1. Implementing the API expected by Seastar from RPC compressors.
2. Expose metrics about the effectiveness of the compression.
3. Allow dynamically switching algorithms and dictionaries on a running
connection, without any extra waits.
The biggest design decision here is that the choice of algorithm and dictionary
is negotiated by both sides of the connection, not dictated unilaterally by the
sender.
The negotiation algorithm is fairly complicated (a TLA+ model validating
it is included in the commit). Unilateral compression choice would be much simpler.
However, negotiation avoids re-sending the same dictionary over every
connection in the cluster after dictionary updates (with one-way communication,
it's the only reliable way to ensure that our receiver possesses the dictionary
we are about to start using), lets receivers ask for a cheaper compression mode
if they want, and lets them refuse to update a dictionary if they don't think
they have enough free memory for that.
In hindsight, those properties probably weren't worth the extra complexity and
extra development effort.
Zstd can be quite expensive, so this patch also includes a mechanism which
temporarily downgrades the compressor from zstd to lz4 if zstd has been
using too much CPU in a given slice of time. But it should be noted that
this can't be treated as a reliable "protection" from negative performance
effects of zstd, since a downgrade can happen on the sender side,
and receivers are at the mercy of senders.
We are planning to improve some usages of compression in Scylla
(in which we compress small blocks of data) by pre-training
compression dictionaries on similar data seen so far.
For example, many RPC messages have similar structure
(and likely similar data), so the similarity could be exploited
for better compression. This can be achieved e.g. by training
a dictionary on the RPC traffic, and compressing subsequent
RPC messages against that dictionary.
To work well, the training should be fed a representative sample
of the compressible data. Such a sample can be approached by
taking a random subset (of some given reasonable size) of the data,
with uniform probability.
For our purposes, we need an online algorithm for this -- one
which can select the random k-subset from a stream of arbitrary
size (e.g. all RPC traffic over an hour), while requiring only
the necessary minimum of memory.
This is a known problem, called "reservoir sampling".
This PR introduces `reservoir_sampler`, which implements
an optimal algorithm for reservoir sampling.
Additionally, it introduces `page_sampler` -- a wrapper for `reservoir_sampler`,
which uses it to select a random sample of pages from a stream of bytes.
Introduces a util which launches a new OS thread and accepts
callables for concurrent execution.
Meant to be created once at startup and used until shutdown,
for running nonpreemptible, 3rd party, non-interactive code.
Note: this new utility is almost identical to wasm::alien_thread_runner.
Maybe we should unify them.
Adds utilities for "advanced" methods of compression with lz4
and zstd -- with streaming (a history buffer persisted across messages)
and/or precomputed dictionaries.
This patch is mostly just glue needed to use the underlying
libraries with discontiguous input and output buffers, and for reusing the
same compressor context objects across messages. It doesn't contain
any innovations of its own.
There is one "design decision" in the patch. The block format of LZ4
doesn't contain the length of the compressed blocks. At decompression
time, that length must be delivered to the decompressor by a channel
separate to the compressed block itself. In `lz4_cstream`, we deal
with that by prepending a variable-length integer containing the
compressed size to each compressed block. This is suboptimal for
single-fragment messages, since the user of lz4_cstream is likely
going to remember the length of the whole message anyway,
which makes the length prepended to the block redundant.
But a loss of 1 byte is probably acceptable for most uses.
Fixes#20717
Enables abortable interface and propagates abort_source to all s3 objects used for reading the restore data.
Note: because restore is done on each shard, we have to maintain a per-shard abort source proxy for each, and do a background per-shard abort on abort call. This is synced at the end of "run()".
Abort source is added as an optional parameter to s3 storage and the s3 path in distributed loader.
There is no attempt to "clean up" an aborted restore. As we read on a mutation level from remote sstables, we should not cause incomplete sstables as such, even though we might end up of course with partial data restored.
Closesscylladb/scylladb#21567
* github.com:scylladb/scylladb:
test_backup: Add restore abort test case
sstables_loader: Make restore task abortable
distributed_loader: Add optional abort_source to get_sstables_from_object_store
s3_storage: Add optional abort_source to params/object
s3::client: Make "readable_file" abortable
The operator() of named_value() prints the allowed values on error which
can be a vector, so the ranges formatting should be there.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
since fedora 38 is EOL. and fedora 39 comes with fmt v10.0.0, also,
we've switched to the build image based on fedora 40, which ships
fmt-devel v10.2.1, there is no need to support fmt < 10.
in this change, we drop the support fmt < 10.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21847
Currently, the tri-compare operator for big_decimal (operator <=>), uses
a precise but potentially very expensive algorithm for comparing the
numbers: it first brings them to the same scale, then compares the
normalized unscaled values. big_decimal has abritrary precisions,
therefore the stored numbers can be arbitrarily large.
In extreme cases, comparing two numbers can result in huge amount of
memory allocated and stalls. If this type is used int he primary key of
a table, these comparisons can make the node completely unresponsive.
This patch adds the following fast-paths to operator <=>:
* An early return for the case of equal scales.
* An early return for different signs.
* An early return for the case where one or both of the numbers are 0.
* A fast algorithm for detecting the case where the there is a big
difference between the two numbers. This algorithm works only with the
scales and is able to compare the two numbers by using only one division
and some additions and substractions. This algorithm is imprecise and
when the numbers are closer than its confidence window, it will
fall-back to the current slow but precise tri-compare.
All but the last case should have been fast before as well, but the
scale-compare algorithm makes a huge difference. Numbers, which would
previously make the node unresponsive, now compare in constant-time.
Fixes: scylladb/scylladb#21716Closesscylladb/scylladb#21715
Adds optional abortable source to "readable_file" interface.
Note: the abortable aspect is not preserved across a "dup()" call
however, since these objects are generally not used in a cross-shard
fashion, it should be ok.
Schema of system tables is defined statically and table_schema_version needs to be explicitly set in code like this:
```
builder.with_version(system_keyspace::generate_schema_version(table_id, version_offset));
```
Whenever schema is changed, the schema version needs to change, otherwise we hit undefined behavior when trying to interpret mutation data created with the old schema using the new schema.
It's not obvious that one needs to do that and developers often forget to do that. There were several instances of mistakes of omission, some caught during review, some not, e.g.: 31ea74b96e.
This patch changes definitions to call the new `schema_builder::with_hash_version()`, which will make the schema builder compute version from schema definition so that changes of the schema will automatically change the version. This way we no longer rely on the developer to remember to bump the version offset.
All nodes should arrive at the same version, which is verified by existing `test_group0_schema_versioning` and a new unit test: `test_system_schema_version_is_stable`.
Closesscylladb/scylladb#21602
* github.com:scylladb/scylladb:
system_tables: Compute schema version automatically
schema_builder: Introduce with_hash_version()
schema: Store raw_view_info in schema::raw_schema
schema: Remove dead comment
hashing: Add hasher for unordered_map
hashing: Add hasher for unique_ptr
hashing: Add hasher for double
[avi: add missing include <memory> to hashing.hh]
Our "sstring_view" is an historic alias for the standard std::string_view.
The patch changes the last remaining random uses of this old alias across
our source directory to the standard type name.
After this patch, there are no more uses of the "sstring_view" alias.
It will be removed in the following patch.
Refs #4062.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>