This extraction will make it easier later when co-located tablets
are introduced in load balancer.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
That will help visualize co-location of sibling tablets for a table
that is undergoing merge.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Currently, the tri-compare operator for big_decimal (operator <=>) uses
a precise but potentially very expensive algorithm for comparing the
numbers: it first brings them to the same scale, then compares the
normalized unscaled values. big_decimal has arbitrary precision,
therefore the stored numbers can be arbitrarily large.
In extreme cases, comparing two numbers can result in a huge amount of
memory being allocated and in stalls. If this type is used in the primary key of
a table, these comparisons can make the node completely unresponsive.
This patch adds the following fast-paths to operator <=>:
* An early return for the case of equal scales.
* An early return for different signs.
* An early return for the case where one or both of the numbers are 0.
* A fast algorithm for detecting the case where there is a big
difference between the two numbers. This algorithm works only with the
scales and is able to compare the two numbers by using only one division
and some additions and subtractions. This algorithm is imprecise and
when the numbers are closer than its confidence window, it will
fall back to the current slow but precise tri-compare.
All but the last case should have been fast before as well, but the
scale-compare algorithm makes a huge difference. Numbers, which would
previously make the node unresponsive, now compare in constant-time.
Fixes: scylladb/scylladb#21716
Closes scylladb/scylladb#21715
The topology request table may change between the code reading it and
calling cv::when(), since reading is a preemption point. In this
case cv::signal can be missed. Detect that there was no signal in between
reading and waiting by introducing reload_count, which is increased each
time the state is reloaded and signaled. If the counter is different
before and after reading, the state may have changed, so re-check it
instead of sleeping.
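The counter check described above can be modelled without any coroutine machinery. The following is a minimal single-threaded sketch, not the actual topology coordinator code: `topology_state_model`, `next_step` and `decide` are hypothetical names, and the preemption point is simulated by a callback that may reload the state while the read is in progress.

```cpp
#include <cstdint>
#include <functional>

// Hypothetical model of the shared state; the real code uses Seastar
// coroutines and a condition variable, which are not reproduced here.
struct topology_state_model {
    uint64_t reload_count = 0;    // bumped on every reload-and-signal
    bool request_pending = false;

    void reload(bool pending) {
        request_pending = pending;
        ++reload_count;           // the real code would also signal waiters here
    }
};

enum class next_step { handle_request, recheck, sleep };

// `read_requests` models reading the topology request table. It is a
// preemption point: the state may be reloaded while it runs.
inline next_step decide(topology_state_model& st,
                        const std::function<bool()>& read_requests) {
    const uint64_t before = st.reload_count;
    const bool found = read_requests();
    if (found) {
        return next_step::handle_request;
    }
    if (st.reload_count != before) {
        return next_step::recheck;  // a signal may have been missed, do not sleep
    }
    return next_step::sleep;        // nothing changed, waiting cannot miss a signal
}
```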
Closes scylladb/scylladb#21713
* github.com:scylladb/scylladb:
topology_coordinator: introduce reload_count in topology state and use it to prevent race
storage_service: use conditional_variable::when in co-routines consistently
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21754
now that we are allowed to use C++23. we now have the luxury of using
`std::views::transform`.
in this change, we:
- replace `boost::adaptors::transformed` with `std::views::transform`
- use `fmt::join()` when appropriate where `boost::algorithm::join()`
is not applicable to a range view returned by `std::views::transform`.
- use `std::ranges::fold_left()` to accumulate the range returned by
`std::views::transform`
- use `std::ranges::fold_left()` to get the maximum element in the
range returned by `std::views::transform`
- use `std::ranges::min()` to get the minimal element in the range
returned by `std::views::transform`
- use `std::ranges::equal()` to compare the range views returned
by `std::views::transform`
- remove unused `#include <boost/range/adaptor/transformed.hpp>`
- use `std::ranges::subrange()` instead of `boost::make_iterator_range()`,
to feed `std::views::transform()` a view range.
to reduce the dependency to boost for better maintainability, and
leverage standard library features for better long-term support.
this change is part of our ongoing effort to modernize our codebase
and reduce external dependencies where possible.
limitations:
there are still a couple places where we are still using
`boost::adaptors::transformed` due to the lack of a C++23 alternative
for `boost::join()` and `boost::adaptors::uniqued`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21700
This commit addresses inconsistent spelling annotations that triggered
codespell warnings in our codebase.
Problem:
- Previous annotations like "CREATEing" and "DROPing" were flagged as
misspellings by the codespell workflow
- These annotations were used to describe CQL statement execution contexts
Solution:
- Updated annotations to "CREAT'ing" and "DROP'ing"
- Preserves the intent of the original annotations
- Silences codespell warnings without changing the underlying meaning
- Ensures consistent and spell-checker-friendly code documentation
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21741
Add tablet task manager module and keep it in storage_service.
Introduce tablet_virtual_task that covers tablet repair.
Thanks to a repair virtual task, a user can check the list of pending
repairs, get the status of a specific repair, or abort it using the task
manager API.
Fixes: #21368.
No backport, new feature
Closes scylladb/scylladb#21624
* github.com:scylladb/scylladb:
test: add test to check tablet repair tasks
test: topology_tasks: enable tablets
service: keep tablets module in storage_service
service: rename storage_service::_task_manager_module
service: add tablet_virtual_task
tasks: utilize preliminary virtual task lookup
In commit 8bf62a0 we introduced a test/pytest.ini which affects every
run of pytest in the project. One specific line in that file
log_cli = true
Overrides pytest's standard CLI output, which is traditionally short
unless the "-v" (verbose) option is used, to be always long and spammy.
There is absolutely no reason to do that - if the user wants to run
"pytest -v", they can do that - it doesn't need to be the default.
Moreover, as https://docs.pytest.org/en/stable/how-to/logging.html
explains, the "log_cli = true" was added in pytest 3.4 to revert to
pytest 3.3 behavior that "community feedback" showed was NOT LIKED.
Why would we want to revert to behavior that wasn't liked?
After this patch, which removes that line, the output of commands
like
cd test/cqlpy; pytest
return to what they used to be before commit 8bf62a0 and what the
pytest developers intended. Users who like verbose output can use
"pytest -v".
Fixes #21712
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21717
$without_systemd_check is an incorrect variable name; it should be
$skip_systemd_check.
Because of this bug, "systemctl --user daemon-reload" is unexpectedly
skipped on nonroot mode installation.
This is likely the root cause of issue #21720.
Fixes #21720
Closes scylladb/scylladb#21747
Scrub compaction can pick up input sstables from the maintenance sstable set,
but on compaction completion it doesn't update the maintenance set,
leaving the original sstable in the set after it has been scrubbed. To fix
this, compaction completion has to update the maintenance sstable set if
the input originated from there. This PR solves the issue by updating the
correct sstable_sets on compaction completion.
Fixes #20030
This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2.
Closes scylladb/scylladb#21582
* github.com:scylladb/scylladb:
compaction: remove unused `update_sstable_lists_on_off_strategy_completion`
compaction_group: replace `update_sstable_lists_on_off_strategy_completion`
compaction_group: rename `update_main_sstable_list_on_compaction_completion`
compaction_group: update maintenance sstable set on scrub compaction completion
compaction_group: store table::sstable_list_builder::result in replacement_desc
table::sstable_list_builder: remove old sstables only from current list
table::sstable_list_builder: return removed sstables from build_new_list
In commit 9ff9cd37c3 we added in
test/alternator/test_number.py a workaround for a boto3 bug that
prevented us (and still prevents us) from testing numbers with high
precision. Because the workaround was so bizarre, the three lines it
requires - two imports and an assignment - were preceded by a 5-line
comment explaining it.
Unfortunately, a later commit 93b9b85c12
went and arbitrarily moved import lines around to satisfy some PEP-8
"requirements", resulting in the comment being separated from the lines
it was supposed to explain.
This patch moves the comment in front of the main line it explains.
The two imports that are needed just for this line and aren't used
elsewhere remain in their current place (where the PEP8 police demands
they stay), but this is less important for the understanding of this
trick so it's fine.
No functionality of the test was changed.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closes scylladb/scylladb#21635
Currently, 'test_replace_with_encryption' only confirms the replacement with inter-dc encryption
for normal nodes. This commit increases the coverage of the test by parametrizing it to confirm the behavior
for zero token node replacement as well. This test also implicitly provides
coverage for bootstrap with encryption of zero token nodes.
This PR increases coverage for existing code, so it should be backported. Since only version 6.2 has zero
token node support, we only backport it to 6.2.
Fixes: scylladb/scylladb#21096
Closes scylladb/scylladb#21609
Since we dropped scylla-jmx at 3cd2a61, Wants=scylla-jmx.service is not
needed anymore.
This line also causes an issue on nonroot mode installation (#21720), so
we need to drop it now.
Fixes #21720
Closes scylladb/scylladb#21721
The topology request table may change between the code reading it and
calling cv::when(), since reading is a preemption point. In this
case cv::signal can be missed. Detect that there was no signal in between
reading and waiting by introducing reload_count, which is increased each
time the state is reloaded and signaled. If the counter is different
before and after reading, the state may have changed, so re-check it
instead of sleeping.
Fixes: scylladb/scylladb#19994
Building upon commit 69b47694, this change addresses a subtle synchronization
weakness in node visibility checks during recovery mode testing.
Previous Approach:
- Waited only for the first node to see its peers
- Insufficient to guarantee full cluster consistency
Current Solution:
1. Implement comprehensive node visibility verification
2. Ensure all nodes mutually recognize each other
3. Prevent potential schema propagation race conditions
Key Improvements:
- Robust cluster state validation before keyspace creation
- Eliminate partial visibility scenarios
Fixes scylladb/scylladb#21724
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21726
Task status information from nodetool commands is not retained permanently:
- Status of completed tasks is only kept for `task_ttl_in_seconds`
- Status is removed after being queried, making it a one-time operation
This behavior is important for users to understand since subsequent
queries for the same completed task will not return any information.
Add documentation to make this clear to users.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21386
these unused includes are identified by clang-include-cleaner. after auditing the source files, all of the reports have been confirmed.
please note, because `mutation/mutation.hh` does not include `seastar/coroutine/maybe_yield.hh` anymore, and quite a few source files were relying on this header to bring in the declaration of `maybe_yield()`, we have to include this header in the places where this symbol is used. the same applies to `seastar/core/when_all.hh`.
---
it's a cleanup, hence no need to backport.
Closes scylladb/scylladb#21727
* github.com:scylladb/scylladb:
.github: add "mutation" to CLEANER_DIR
mutation: remove unused "#include"s
in order to prevent future inclusion of unused headers, let's add the
"mutation" subdirectory to CLEANER_DIR, so that this workflow can
identify regressions in the future.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
these unused includes are identified by clang-include-cleaner. after
auditing the source files, all of the reports have been confirmed.
please note, because `mutation/mutation.hh` does not include
`seastar/coroutine/maybe_yield.hh` anymore, and quite a few source
files were relying on this header to bring in the declaration of
`maybe_yield()`, we have to include this header in the places where
this symbol is used. the same applies to `seastar/core/when_all.hh`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Demote --scylla-data-dir and --scylla-yaml-file to schema source
helpers, rather than schema sources in themselves. This practically means
that when these options are used, they won't define where the tool will
attempt to load the schema from, they will just be helpers to help locate
the schema, for whichever schema source the tool was instructed to use
(or left to choose).
--scylla-data-dir and --scylla-yaml-file being schema sources were
problematic with encryption at rest and for S3 support (not yet
implemented). With encryption, the tool needs access to the
configuration, so --scylla-yaml-file is often used to provide the path
to the configuration file, which contains encryption configuration,
needed for the tool to decrypt the sstable. Currently, using this option
implies forcing the tool to read the schema from the schema tables,
which is a problematic option for tests -- Scylla might be compacting a
schema sstable and this will make the tool fail to load the schema.
Demoting these options to schema helpers allows providing them, while
at the same time having the option to use a different schema source.
To allow the user to force the tool to load the schema from the schema
tables, a new --schema-tables option is added. Similarly, a
--sstable-schema option is introduced to force the tool to load the
schema from the sstable itself.
With this, each of the 4 schema sources now has an option to force the use of
said schema source. There are various helper options to be used along
with these.
The documentation as well as the tests are updated with the changes.
The schema related documentation gets a rather extensive facelift
because it was a bit out-of-date and incomplete.
Fixes: scylladb/scylladb#20534
Closes scylladb/scylladb#21678
This change improves dependency management by explicitly specifying
library linkage visibility in CMake targets.
Previously, some ScyllaDB targets used `target_link_libraries()`
without `PUBLIC` or `PRIVATE` keywords, which resulted in transitive
library dependencies by default. This unintentionally exposed
non-public dependencies to downstream targets.
Changes:
- Always use explicit `PRIVATE` or `PUBLIC` keywords with
`target_link_libraries()`
- Tighten build dependency tree
- Enforce a more modular linkage model
See: [CMake documentation on library dependencies](https://cmake.org/cmake/help/latest/command/target_link_libraries.html)
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21686
Once e.g. `ALTER KEYSPACE` is performed, all in-memory objects should be updated accordingly, but this is not entirely true for keyspace metadata object. The reason for that is that keyspace metadata are stored in 2 system tables: `system_schema.keyspaces` and `system_schema.scylla_keyspaces`. Up until now the in-memory keyspace metadata object has been updated only with entries from the first table, and missed updates when entries from the 2nd table changed. These entries were e.g. initial tablets or storage options.
This change fixes this oversight by considering both tables when checking if keyspace metadata need to be updated. From the implementation point of view, the change is simple: we're considering `system_schema.scylla_keyspaces` also in `merge_keyspaces()` and if old and new schemas have any differences, we include that when altering ks.
Fixes #20768
Backport: no need, I don't think the issue is severe, atm it seems like it can only influence the tablets number, which should not bring the cluster down nor result in returning bad data, it can mostly influence the speed of the db.
Closes scylladb/scylladb#20852
The checksummed file data source uses the chunk size to enforce that the
reads from the underlying file input stream will be aligned at the chunk
boundary. This is necessary so that we can validate the checksum of each
chunk.
However, a mismatch in the numeric types caused a bug where the
underlying file input stream would read a smaller portion of the data
file than expected.
The bug is located in the following lines:
```
auto start = _beg_pos & ~(chunk_size - 1);
auto end = (_end_pos & ~(chunk_size - 1)) + chunk_size;
```
`_beg_pos` and `_end_pos` are `uint64_t`, whereas `chunk_size` is
`uint32_t`. When executing the AND operation, the compiler converts the
right operand from `uint32_t` to `uint64_t`. Since the integer is
unsigned, the four most-significant bytes are filled with zeros, thus
erroneously truncating the corresponding bytes of the position.
Fix the bug by explicitly converting the chunk size to `uint64_t` before
any arithmetic operations. Also, replace the handwritten alignment
implementations with the `align_up()` and `align_down()` helpers.
Finally, restrict the file end position to not exceed the file length.
Since the last chunk can be smaller than the chunk size, it could happen
that the end position exceeds the file length after the round-up. This
is not a bug on its own since `make_file_input_stream()` can accept
lengths that go beyond end-of-file, but still it makes the code more
error prone and should be avoided.
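The truncation can be reproduced in isolation. The sketch below contrasts the buggy mask with a widened one and adds the clamp to the file length; `align_down_64`, `align_up_64`, `read_range` and `chunk_aligned_range` are hypothetical local helpers that assume a power-of-two chunk size. They only mimic the spirit of `align_down()` and `align_up()` and are not the actual helpers used by the fix.

```cpp
#include <algorithm>
#include <cstdint>

// Buggy form: ~(chunk_size - 1) is evaluated in 32 bits and then
// zero-extended, so the upper 32 bits of the mask are 0 and the upper half
// of pos is lost.
constexpr uint64_t align_down_buggy(uint64_t pos, uint32_t chunk_size) {
    return pos & ~(chunk_size - 1);
}

// Hypothetical 64-bit helpers (power-of-two alignment assumed).
constexpr uint64_t align_down_64(uint64_t pos, uint64_t align) {
    return pos & ~(align - 1);
}

constexpr uint64_t align_up_64(uint64_t pos, uint64_t align) {
    return align_down_64(pos + align - 1, align);
}

struct read_range {
    uint64_t start;
    uint64_t end;
};

// Widen the chunk size first, align to chunk boundaries, then clamp the end
// so that it never exceeds the file length.
constexpr read_range chunk_aligned_range(uint64_t beg_pos, uint64_t end_pos,
                                         uint32_t chunk_size, uint64_t file_length) {
    const uint64_t cs = chunk_size;
    return {align_down_64(beg_pos, cs), std::min(align_up_64(end_pos, cs), file_length)};
}
```

With a 64 KiB chunk size, any position at or above 4 GiB loses its upper 32 bits in the buggy variant, which matches the "smaller portion of the data file" symptom described above.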
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closes scylladb/scylladb#21665
Tablets are no longer an experimental feature, but topology_tasks test
suite treats them as if they were.
Enable tablets with their own config option in topology_tasks suite.
Rename storage_service::_task_manager_module to _node_ops_module.
In the following patches, storage service will keep two different
task manager modules.
Before these changes, we dereferenced `app_state` in
`manager::endpoint_downtime_not_bigger_than()` before checking that it's
not a null pointer. We fix that.
Fixes scylladb/scylladb#21699
Closes scylladb/scylladb#21676
When API user requests status of a virtual task, we first need to find
which virtual_task instance tracks given operation. While doing this we
gather some info regarding the task, but we don't utilize it.
Add virtual_task_hint that keeps info that was gathered during virtual
task lookup and pass it to virtual_task's methods so the info doesn't
need to be retrieved twice.
Previously, the progress of download_task_impl launched by the "restore" API
was not tracked. Since restore operations can involve large data transfers,
this makes it difficult for users to monitor progress.
The restore process happens in two sequential steps:
1. Open specified SSTables from object storage
2. Download and stream mutation fragments from the opened SSTables to
mapped destinations
While both steps contribute to overall progress, they use different units
of measurement, making a unified progress metric challenging. Because
the load-and-stream step (step 2) is the most time-consuming part of the
restore, this change implements progress tracking for this step as an
initial improvement to provide users with partial visibility into the
restore operation.
Fixes scylladb/scylladb#21427
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
---
this is part of an experimental feature, hence no need to backport.
Closes scylladb/scylladb#21562
* github.com:scylladb/scylladb:
test/object_store: Enable tablets to match production settings
sstables_loader: Track download progress of download_task_impl
sstables_loader: improve batch tracking using ranges library
sstables_loader: print streaming progress with moving range
sstables_loader: mark sstable_streamer::stream_sstable_mutations() private
sstables_loader: fix indentation in stream_sstable_mutations()
The "--use-cmake" option currently hardwires the build directory as
"$source_dir/build". Adhere to the "--build-dir" option's argument
instead:
- If the option is not specified, its argument defaults to "build"; thus,
there is no change in behavior.
- If the option specifies a relative pathname, append it to $source_dir.
- If the option specifies an absolute pathname, use it as-is.
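The three rules above amount to a small path-resolution function. Here is a sketch of that logic using `std::filesystem`; `resolve_build_dir` is a hypothetical name, and configure.py itself is a Python script whose code is not reproduced here.

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical helper mirroring the rules above: the default is "build", a
// relative path is appended to the source directory, and an absolute path is
// used as-is.
inline fs::path resolve_build_dir(const fs::path& source_dir,
                                  const fs::path& build_dir = "build") {
    if (build_dir.is_absolute()) {
        return build_dir;
    }
    return source_dir / build_dir;
}
```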
This is especially useful for keeping the build directory on a filesystem
separate from the source directory (without resorting to creating "build"
as a symlink, before running "configure.py"). For example, the source tree
can be accessed remotely over sshfs, from a build host, while keeping the
build artifacts (and hence the link stage) local to the build host.
Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
Closes scylladb/scylladb#21694
tombstone_gc.hh is relatively lightweight and is used in many places,
but it includes the heavyweight boost/icl/interval_map.hh. Lighten
the load for its users by wrapping lw_shared_ptr<some icl map type>
in a forward-declared class. Define the class in a new header
tombstone_gc-internals.hh, to be used by the two translation units
that need it.
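The general technique is to let the widely used header see only a forward declaration and to keep the full definition in an internals header. The single-file sketch below illustrates that split under stated assumptions: `interval_map_holder` and `gc_state_sketch` are hypothetical names, `std::shared_ptr` stands in for `lw_shared_ptr`, and `std::map` stands in for the heavyweight boost::icl map.

```cpp
#include <map>
#include <memory>

// --- what the lightweight header would contain ---
class interval_map_holder;  // forward declaration only, no heavy include

struct gc_state_sketch {
    std::shared_ptr<interval_map_holder> history;  // fine with an incomplete type
};

gc_state_sketch make_gc_state();
void record(gc_state_sketch& s, int key, int value);
int lookup(const gc_state_sketch& s, int key, int fallback);

// --- what the internals header would contain ---
class interval_map_holder {
public:
    std::map<int, int> map;  // stand-in for the heavyweight map type
};

// --- the few translation units that include the internals header ---
gc_state_sketch make_gc_state() {
    return {std::make_shared<interval_map_holder>()};
}

void record(gc_state_sketch& s, int key, int value) {
    s.history->map[key] = value;
}

int lookup(const gc_state_sketch& s, int key, int fallback) {
    const auto it = s.history->map.find(key);
    return it == s.history->map.end() ? fallback : it->second;
}
```

Users of the lightweight part never need the definition of the map type, so they no longer pay for parsing it.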
Ref #1.
Closes scylladb/scylladb#21706
Update the tablestats documentation to correctly describe the "Number of
partitions" metric. The previous documentation incorrectly referred to
"estimated row count" when the command actually shows estimated partition count.
Before:
```
Number of keys (estimate) | The estimated row count
```
After:
```
Number of partitions (estimate) | The estimated partition count
```
This distinction is important since a partition (identified by its partition
key) can contain multiple rows in ScyllaDB. The updated format also matches
Cassandra's nodetool output for better compatibility.
Fixes scylladb/scylladb#21586
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closes scylladb/scylladb#21598
Now that `update_sstable_sets_on_compaction_completion` can update both
the main and maintenance sets, callers of
`update_sstable_lists_on_off_strategy_completion` can replace it with
the former.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Rename `update_main_sstable_list_on_compaction_completion` to
`update_sstable_sets_on_compaction_completion` as the method updates
both main and maintenance sstable sets now.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Scrub compaction can pick up input sstables from the maintenance sstable set,
but on compaction completion it doesn't update the maintenance set,
leaving the original sstable in the set after it has been scrubbed. To fix
this, compaction completion has to update the maintenance sstable set if
the input originated from there.
This patch modifies the `update_sstable_sets_on_compaction_completion`
to remove the input sstable from the maintenance sstable set if it
exists in that set.
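The intended behaviour can be modelled with two plain sets. This is a toy sketch, not the compaction_group code: `sstable_sets_model` and `on_scrub_completion` are hypothetical names, sstables are identified by name, and the scrubbed output is assumed to land in the main set.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical model of the main and maintenance sstable sets.
struct sstable_sets_model {
    std::set<std::string> main_set;
    std::set<std::string> maintenance_set;

    void on_scrub_completion(const std::vector<std::string>& inputs,
                             const std::vector<std::string>& outputs) {
        for (const auto& in : inputs) {
            // Remove the input from whichever set it originated from.
            if (maintenance_set.erase(in) == 0) {
                main_set.erase(in);
            }
        }
        // Assumption: the scrubbed output is added to the main set.
        main_set.insert(outputs.begin(), outputs.end());
    }
};
```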
Also added a testcase to verify the fix.
Fixes #20030
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Directly store the result of `build_new_list` in `replacement_desc`
instead of storing just the newly built sstable_set. Adjust the
`backlog_tracker_adjust_charges` to use the removed sstables list
returned by the `build_new_list`, so that when the next patch updates
the `update_main_sstable_list_on_compaction_completion` to also update
the maintenance sstable set, only sstables removed from main sstable set
will be removed from the backlog tracker.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The `build_new_list()` method previously joined the current and new
sstable ranges, removing old sstables from the combined result. This
patch updates the method to treat them separately, ensuring old sstables
are removed only from the current sstable list.
This change enables the method to return the correct set of removed
sstables in cases where an sstable is directly moved from the
maintenance set to the main set.
Updated the method table::sstable_list_builder::build_new_list() to
return the list of sstables that were removed along with the newly built
sstable set. This change will be used to unify the
`update_sstable_lists` variants in a following patch.
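To make the "remove only from the current list, and report what was removed" behaviour concrete, here is a toy model using integer ids. `build_result` and `build_new_list_model` are hypothetical names and the real method works on sstable sets, so treat this as an illustration of the idea only.

```cpp
#include <set>
#include <vector>

struct build_result {
    std::set<int> new_list;    // surviving current sstables plus the new ones
    std::vector<int> removed;  // old sstables that were actually in the current list
};

// Hypothetical model: old sstables are filtered out of the current list
// only; new sstables are added untouched.
inline build_result build_new_list_model(const std::set<int>& current,
                                         const std::vector<int>& new_sstables,
                                         const std::set<int>& old_sstables) {
    build_result r;
    for (int id : current) {
        if (old_sstables.count(id) != 0) {
            r.removed.push_back(id);
        } else {
            r.new_list.insert(id);
        }
    }
    r.new_list.insert(new_sstables.begin(), new_sstables.end());
    return r;
}
```

In this model, an sstable that appears both in the old and in the new list (for example one moved from another set) is kept in the result and is not reported as removed, since it was never part of the current list.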
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Enable the `enable_tablets` configuration flag in object store tests to better
align with production environments, where it is enabled by default via the
`scylla.yaml` in Scylla's relocatable tarball. This change will improve test
coverage of tablet-related features.
Previously, `enable_tablets` defaulted to false in tests, creating a mismatch
with typical production deployments.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>