This update addresses an issue in the mutation diff calculation
algorithm used during read repair. Previously, the algorithm used
`token` as the hashmap key. Since `token` is calculated basing on the
Murmur3 hash function, it could generate duplicate values for different
partition keys, causing corruption in the affected rows' values.
Fixesscylladb/scylladb#19101
Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2
Closesscylladb/scylladb#21996
* github.com:scylladb/scylladb:
storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function.
storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap.
test: Add test case for checking read repair diff calculation when having conflicting keys.
Since we manage ip to id mapping directly in gossiper now we need to
load the mapping on boot. We already do it anyway, but only due to a bug
which checks raft topology mode config before it is set, so the code
thinks that it is in the gossiper mode and loads peers table into the
gossiper and token metadata. Fix the bug and load peers into the gossiper
only since token metadata is managed by raft.
The series also removes address map related test that no longer checks
anything and replace it with unit test.
It also adds the dc/rack check to "join node" rpc. The check is done
during shadow round now, but for it to work it requires dc/rack to be
propagated through the gossiper and we want to eventually drop it.
Ref: scylladb/scylladb#21777
* 'load-peers' of https://github.com/gleb-cloudius/scylla:
topology coordinator: reject replace request if topology does not match
gossiper: fix the logic of shadow_round parameter
storage_service: do not add endpoint to the gossiper during topology loading.
storage_service: load peers into gossiper on boot in raft topology mode
storage_service: set raft topology change mode before using it in join_cluster
locator: drop inet_address usage to figure out per dc/rack replication
test: drop test_old_ip_notification_repro.py
test: address_map: check generation handling during entry addition
Developers using asyncio.gather() often assume that it waits for all futures (awaitables) givens.
But this isn't true when the return_exceptions parameter is False, which is the default.
In that case, as soon as one future completes with an exception, the gather() call will return this exception immediately, and some of the finished tasks may continue to run in the background.
This is bad for applications that use gather() to ensure that a list of background tasks has all completed.
So such applications must use asyncio.gather() with return_exceptions=True, to wait for all given futures to complete either successfully or unsuccessfully.
Closesscylladb/scylladb#22252
Said fields in statistics are of type
`disk_array<uint32_t, disk_string<uint16_t>>` and currently are handled
as array of regular strings. However these fields store exploded
clustering keys, so the elements store binary data and converting to
string can yield invalid UTF-8 characters that certain JSON parsers (jq,
or python's json) can choke on. Fix this by treating them as binary and
using `to_hex()` to convert them to string. This requires some massaging
of the json_dumper: passing field offset to all visit() methods and
using a caller-provided disk-string to sstring converter to convert disk
strings to sstring, so in the case of statistics, these fields can be
intercepted and properly handled.
While at it, the type of these fields is also fixed in the
documentation.
Before:
"min_column_names": [
"��Z���\u0011�\u0012ŷ4^��<",
"�2y\u0000�}\u007f"
],
"max_column_names": [
"��Z���\u0011�\u0012ŷ4^��<",
"}��B\u0019l%^"
],
After:
"min_column_names": [
"9dd55a92bc8811ef12c5b7345eadf73c",
"80327900e2827d7f"
],
"max_column_names": [
"9dd55a92bc8811ef12c5b7345eadf73c",
"7df79242196c255e"
],
Fixes: #22078Closesscylladb/scylladb#22225
scylla-sstable tries to read scylla.yaml via the following sequence:
1) Use user-provided location is provided (--scylla-yaml-file parameter)
2) Use the environment variables SCYLLA_HOME and/or SCYLLA_CONF if set
3) Use the default location ./conf/scylla.yaml
Step 3 is fine on dev machines, where the binaries are usually invoked
from scylla.git, which does have conf/scylla.yaml, but it doesn't work
on production machines, where the default location for scylla.yaml is
/etc/scylla/scylla.yaml. To reduce friction when used on production
machines, add another fallback in case (3) fails, which tries to read
scylla.yaml from /etc/scylla/scylla.yaml location.
Fixes: scylladb/scylladb#22202Closesscylladb/scylladb#22241
When destroying a test cluster, ScyllaCluster.stop() calls ScyllaServer.stop()
for each running server. Previously, non-zero exit status codes from scylla
servers were silently ignored during test teardown.
This change modifies the logging behavior to print the exit status code when
a scylla server exits with a non-zero status. This helps developers quickly
identify potential issues or unexpected terminations during test runs.
Differences in handling:
- Before: Non-zero exit codes were not logged
- After: Non-zero exit codes are printed, providing visibility into
server termination errors
This improvement aids in diagnosing intermittent test failures or
unexpected server shutdowns during test execution.
Refs #21742
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21934
Previously, we created a vector<utils_json::histogram> and returned it
by copying into a future. Since histogram is a JSON representation of
ihistogram, it can be heavyweight, making the vector copy overhead
significant.
Now we move the vector into the returned future instead of copying it,
eliminating the deep copy overhead. The APIs backed by this function
are marked deprecated, so this performance improvement is not that
important.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22004
In 0b0e661a85, we brought abseil back as a submodule, and we
added absl::headers as an interface library for importing
abseil headers' include directory. And:
```console
$ patchelf --print-rpath build/RelWithDebInfo/scylla
/home/kefu/dev/scylla/idl/absl::headers
```
In this change, we remove `absl::headers` from
`target_link_directories()` as it's an interface library that only
provides header files, not linkable libraries. This fixes the incorrect
inclusion of absl::headers in the rpath of the scylla executable.
Additionally, remove abseil library dependencies from the idl target
since none of the idl source files directly include abseil headers.
After this change,
```console
$ patchelf --print-rpath build/RelWithDebInfo/scylla
```
the output of `pathelf` is now empty.
Fixes#22265
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22266
We restore a snapshot of table by streaming the sstables of
the given snapshot of the table using
`sstable_streamer::stream_sstable_mutations()` in batches. This function
reads mutations from a set of sstables, and streams them to the target
nodes. Due to the limit of this function, we are not able to track the
progress in bytes.
Previously, progress tracking used individual sstables as units, which caused
inaccuracies with tablet-distributed tables, where:
- An sstable spanning multiple tablets could be counted multiple times
- Progress reporting could become misleading (e.g., showing "40" progress
for a table with 10 sstables)
This change introduces a more robust progress tracking method:
- Use "batch" as the unit of progress instead of individual sstables.
Each batch represents a tablet when restoring a table snapshot if
the tablet being restored is distributed with tablets. When it comes
to tables distributed with vnode, each batch represents an sstable.
- Stream sstables for each tablet separately, handling both partially and
fully contained sstables
- Calculate progress based on the total number of sstables being streamed
- Skip tablet IDs with no owned tokens
For vnode-distributed tables, the number of "batches" directly corresponds
to the number of sstables, ensuring:
- Consistent progress reporting across different table distribution models
- Simplified implementation
- Accurate representation of restore progress
The new approach provides a more reliable and uniform method of tracking
restoration progress across different table distribution strategies.
Also, Corrected the use of `_sstables.size()` in
`sstable_streamer::stream_sstables()`. It addressed a review comment
from Pavel that was inadvertently overlooked during previous rebasing
the commit of 5ab4932f34.
Fixesscylladb/scylladb#21816
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#21841
Fixes https://github.com/scylladb/scylla-enterprise/issues/5016#issuecomment-2558464631
EAR - encryption at rest. Allows on-disk file encryption of sstables and commitlog data.
Introduces OpenSSL based file level encrypted storage, managed via a set of providers
ranging from local files to cloud KMS providers.
For a more comprehensive explanation, see the included docs (or if possible, original
source tree).
Manual bulk merge of EAR feature from enterprise repo to main scylla repo.
Breaks some features apart, but main EAR is still a humongous commit, because to separate this
I would have to mess with code incrementally, adding time and risk.
This PR includes the local file gen tool, tests and also p11 validation.
Note: CI will not execute the full tests unless master CI is set to provide the same environment
as the enterprise one. Not sure about the status of this ATM.
Note: Includes code to compile against cryptsoft kmipc SDK, but not the SDK. If you happen to
check out this tree in the scylla folder and configure, it will be linked against and KMIP functionality
will be enabled, otherwise not.
Closesscylladb/scylladb#22233
* github.com:scylladb/scylladb:
docs: Add EAR docs
main/build: Add p11-kit and initialize
tools: Add local-file-key-generator tool
tests: Add EAR tests
tmpdir: shorten test tempdir path
EAR: port the ear feature from enterprise
cql_test_env: Add optional query timeout
schema/migration_manager: Add schema validate
sstables: add get_shared_components accessor
config/config_file: Add exports and definitions of config_type_for<>
the changes porting enterprise features to oss brought some
used include to the tree. so let's remove them. these unused
includes were identified by clang-include-cleaner.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22246
Instantiated only on shard 0.
Currently, only subscribe from unit test
Manual unit test using loop mount was added.
Note that the test requires sudo access
and root access to /dev/loop, so it cannot
run in rootless podman instance, and it'd
fail with Permission denied.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#21523
This PR extends authentication with 2 mechanisms:
- a new role_manager subclass, which allows managing users via
LDAP server,
- a new authenticator, which delegates plaintext authentication
to a running saslauthd daemon.
The features have been ported from the enterprise repository
with their test.py tests and the documentation as part of
changing license to source available.
Fixes: scylladb/scylla-enterprise#5000Fixes: scylladb/scylla-enterprise#5001Closesscylladb/scylladb#22030
Replace boost with a standard facility; this reduces dependencies as lexical_cast depends on boost ranges.
Since std::from_chars() is chatty, we introduce utils::from_chars_exactly() to trade some flexibility for conciseness.
Small build time improvement, no backport needed.
Closesscylladb/scylladb#22164
* github.com:scylladb/scylladb:
error_injection: replace boost::lexical_cast with std::from_chars
utils: introduce from_chars_exactly()
When starting the view builder, we find all existing views in
`calculate_shard_build_step` and then register a listener for new views.
Between these steps we may yield and create a new view, then we miss
initializing the view build step for the new view, and we won't start
building it.
To fix this we first register the listener and then read existing views,
so a view can't be missed.
Fixesscylladb/scylladb#20338Closesscylladb/scylladb#22184
Bulk transfer of EAR functionality. Includes all providers etc.
Could maybe break up into smaller blocks, but once it gets down to
the core of it, would require messing with code instead of just moving.
So this is it.
Note: KMIP support is disabled unless you happen to have the kmipc
SDK in your scylla dir.
Adds optional encryption of sstables and commitlog, using block
level file encryption. Provides key sourcing from various sources,
such as local files or popular KMS systems.
Replace boost with a standard facility; this reduces dependencies
as lexical_cast depends on boost ranges.
As a side effect the exception error message is improved.
This is a replacement for boost::lexical_cast (but without its
long dependency chain). It wraps std::from_chars(), providing a less
flexible but also more concise interface.
The dict publication routine might throw raft::request_aborted when the node is
aborted. This doesn't deserve an ERROR log. Let's demote the log printed in this
case from ERROR to DEBUG.
Fixesscylladb/scylladb#22081Closesscylladb/scylladb#22211
Add the test file name to `ScyllaClusterManager` log file names alongside the test function name.
This avoids race conditions when tests with the same function names are executed simultaneously.
Fixesscylladb/scylladb#21807
Backport: not needed since this is a fix in the testing scripts.
Closesscylladb/scylladb#22192
these unused includes were identifier by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22199
Move the Scylla executable existence check from PythonTestSuite's constructor
to test execution time. This allows running unit tests that don't depend on
the scylla executable without building it first.
Previously, PythonTestSuite's constructor would fail if the Scylla executable
was missing, preventing even unrelated unit tests from running. Now, only
tests that actually require Scylla will fail if the executable is
missing.
Fixesscylladb/scylladb#22168
Refs scylladb/scylladb#19486
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22224
replica writes are delayed according to the view update backlog in order
to apply backpressure and reduce the rate of incoming base writes when
the backlog is large, allowing slow replicas to catch up.
previously the backlog calculation considered only the pending targets,
excluding targets that replied successfuly, probably due to confusion in
the code. instead, we want to consider the backlog of all the targets
participating in the write.
Fixesscylladb/scylladb#21672Closesscylladb/scylladb#21935
this unused include was identifier by clang-include-cleaner. after
auditing task_manager_module.hh, the report has been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22200
The definition of the template is in a source translation unit, but there
are also uses outside the translation unit. Without lto/pgo it worked due
to the definition in the translation unit, but with lto/pgo we can presume
the definition was inlined, so callers outside the translation unit did not
have anything to link with.
Fix by explicitly instantiating the template function.
Closesscylladb/scylladb#22136
* seastar 3133ecdd...a9bef537 (24):
> file: add file_system_space
> future: avoid inheriting from future payload type
> treewide: include fmt/ostream.h for using fmt::print()
> build: remove messages used for debugging
> demos: Rename websocket demo to websocket_server demo
> demos: Add a way to set port from cmd line in websocket demo
> tls: Add optional builder + future-wait to cert reload callback + expose rebuild
> rwlock: add try_hold_{read,write}_lock methods
> json: add moving push to json_list
> github: add a step to build "check-include-style"
> build: add a target for checking include style
> scheduling_group: use map for key configs instead of vector
> scheduling_group: fix indentation
> scheduling_group: fix race between scheduling group and key creation
> http: Make request writing functions public
> http: Expose connection_factory implementations
> metrics: Use separate type for shared metadata
> file: unexpected throw from inside noexcept
> metrics: Internalize metric label sets
> thread: optimize maybe_yield
> reactor: fix crash in pending registration task after poller dtor
> net: Fix ipv6 socket_address comparision
> reactor, linux-aio: factor out get_smp_count() lambda
> reactor, linux-aio: restore "available_aio" meaning after "reserve_iocbs"
Fixed usage of seastar metric label sets due to:
scylladb/seastar@733420d57 Merge 'metrics: Internalize metric label sets' from Stephan Dollberg
Closesscylladb/scylladb#22076
remove the "ScyllaDB Enterprise" labels in document. because
there is no need to differentiate ScyllaDB Enterprise from its OSS
variant, let's stop adding the "ScyllaDB Enterprise" labels to
enterprise-only features. this helps to reduce the confusion.
as we are still in the process of porting the enterprise features
to this repo, this change does not fixscylladb/scylladb#22175.
we will review the document again when completing the migration.
we also take this opportunity to stop referencing "Enterprise" in
the changed paragraph.
Refs scylladb/scylladb#22175
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22177
in 047ce136, we cherry-picked the change adding
garbage-collection-ics.rst to the document. but it was still
referencing the git sha1 and version number in enterprise.
this change updates kb/garbage-collection-ics.rst, so that it
* references the git commit sha1 in this repo
* do not reference the version introducing this feature, as
per Anna Stuchlik
> As a rule, we should avoid documenting when something was
> introduced or set as a default because our documentation
> was versioned. Per-version information should be listed in
> the release notes.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22195
Usually, the smaller the messsage, the higher the CPU cost per each network byte
saved by compression, so it often makes sense to reserve heavier compression
for bigger messages (where it can make the biggest impact for a given CPU budget)
and use ligher compression for smaller messages.
There is a knob -- internode_compression_zstd_min_message_size -- which
excludes RPC messages below certain size from being compressed with zstd.
We arbitrarily set its default to 0 bytes before.
Now we want to arbitrarily set it to 1024 bytes.
This is based purely on intuition and isn't backed by any solid data.
Fixesscylladb/scylla-enterprise#4731Closesscylladb/scylla-enterprise#4990Closesscylladb/scylladb#22204
We still have a number of issues to be solved for views with tablets.
Until they are fixed, we should prevent users from creating them,
and use the vnode-based views instead.
This patch prepares the feature for enabling views with tablets. The
feature is disabled by default, but currently it has no effect.
After all tests are adjusted to use the feature, we should depend
on the feature for deciding whether we can create materialized views
in tablet-enabled keyspaces.
The unit tests are adjusted to enable this feature explicitly, and it's
also added to the scylla sstable tool config - this tool treats all
tables as if they were tablet-based (surprisingly, with SimpleStrategy),
so for it to work on views, the new feature must be enabled.
Refs scylladb/scylladb#21832Closesscylladb/scylladb#21833
The raft voters api implementation only allowed to make a node to be
a non-voter, but for the "limited voters" feature we need to
also have the option to make the node a voter (from within the topology
coordinator).
Modifying the api to allow both adding and removing voters.
This in particular tries to simplify the API by not having to add
another set of new functions to make a voter, but having a single setter
that allows to modify the node configuration to either become a voter or
a non-voter.
Fixes: scylladb/scylladb#21914
Refs: scylladb/scylladb#18793Closesscylladb/scylladb#21899
In case of error, repair will be moved into the end_repair stage. We
should not remove repair_task_info in this case because the repair task
requested by the user is not finished yet.
To fix, we should remove repair_task_info at the end of repair stage.
Tests are added to ensure failed repair is not reported as finished.
Closesscylladb/scylladb#21973
Avoid using temporary names and instead treat the final image tag
as a temporary.
The new procedure is more or less
remote-final := local-x86_64
local-aarch64 += remote-final
remote-final := local-aarch64 (which now contains the x86_64 image too)
Closesscylladb/scylladb#21981
So that both mutation and file streaming will have the same log for
tablet streaming which simplifies the dtest checking.
Closesscylladb/scylladb#22176
Test the limited voters feature by creating a cluster with 3 DCs, one of
them disproportionately larger than the others. The raft majority should
not be lost in case the large DC goes down.
Fixes: scylladb/scylla#21915
Refs: scylladb/scylla#18793Closesscylladb/scylladb#21901