Today, when the `Fixes` prefix is missing or the developer is not a collaborator with `scylladbbot` we remove the backport labels to prevent the process from starting and notifying the developers.
Developers are worried that removing these backport labels will cause us to forget we need to do these backports. @nyh suggested to add a `scylladbbot/backport_error` label instead
Applied those changes, so when a `Fixes` prefix is missing we will add a `scylladbbot/backport_error` label and stop the process
When a user doesn't accept the invite we will still open the PR but he will not be assigned and will not be able to edit the branch when we have conflicts
Fixes: https://github.com/scylladb/scylla-pkg/issues/4898
Fixes: https://github.com/scylladb/scylla-pkg/issues/4897
Before we were using a marketplace Github action which had some limitations.
With this pull request we are updating the github action using curl option which will gives us full control of the flow instead of relying on pre made github action.
Fixes: scylladb#23088
Closesscylladb/scylladb#23215
Claim that building with CMake files is just 'not supported' instead of
not intended, especially that there are attempts to enable this.
Remove the obsolete mention of the `FOR_IDE` flag.
Closesscylladb/scylladb#22890
This commit adds documentation for zero-token nodes and an explanation
of how to use them to set up an arbiter DC to prevent a quorum loss
in multi-DC deployments.
The commit adds two documents:
- The one in Architecture describes zero-token nodes.
- The other in Cluster Management explains how to use them.
We need separate documents because zero-token nodes may be used
for other purposes in the future.
In addition, the documents are cross-linked, and the link is added
to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document.
Refs https://github.com/scylladb/scylladb/pull/19684
Fixes https://github.com/scylladb/scylladb/issues/20294Closesscylladb/scylladb#21348
In commit 4812a57f, the fmt-based formatter for gossip_digest_syn had
formatting code for cluster_id, partitioner, and group0_id
accidentally commented out, preventing these fields from being included
in the output. This commit restores the formatting by uncommenting the
code, ensuring full visibility of all fields in the gossip_digest_syn
message when logging permits.
This fixes a regression introduced in 4812a57f, which obscured these
fields and reduced debugging insight. Backporting is recommended for
improved observability.
Fixes#23142
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23155
During development of #22428 we decided that we have
no need for `object-storage.yaml`, and we'd rather store
the endpoints in `scylla.yaml` and get a REST api to exopose
the endpoints for free.
This patch removes the credentials provider used to read the
aws keys from this yaml file.
Followup work will remove the `object-storage.yaml` file
altogether and move the endpoints to `scylla.yaml`.
Signed-off-by: Robert Bindar <robert.bindar@scylladb.com>
Closesscylladb/scylladb#22951
This action will help preventing next-trigger for running every 15 minutes.
This action will run on push for a specific branch (next, next-enterprise, 2024.x, x.x)
Fixes: scylladb#23088
update action
Closesscylladb/scylladb#23141
The scylla-sstable dump-* command suite has proven invaluable in many investigations. In certain cases however, I found that `dump-data` is quite cumbersome. An example would be trying to find certain values in an sstable, or trying to read the content of system tables when a node is down. For these cases, `dump-data` is very cumbersome: one has to trudge through tons of uninteresting metadata and do compaction in their heads. This PR introduces the new scylla-sstable query command, specifically targeted at situations like this: it allows executing queries on sstables, exposing to the user all the power of CQL, to tailor the output as they see fit.
Select everything from a table:
$ scylla sstable query --system-schema /path/to/data/system_schema/keyspaces-*/*-big-Data.db
keyspace_name | durable_writes | replication
-------------------------------+----------------+-------------------------------------------------------------------------------------
system_replicated_keys | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
system_auth | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 1})
system_schema | true | ({class : org.apache.cassandra.locator.LocalStrategy})
system_distributed | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 3})
system | true | ({class : org.apache.cassandra.locator.LocalStrategy})
ks | true | ({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
system_traces | true | ({class : org.apache.cassandra.locator.SimpleStrategy}, {replication_factor : 2})
system_distributed_everywhere | true | ({class : org.apache.cassandra.locator.EverywhereStrategy})
Select everything from a single SSTable, use the JSON output (filtered through [jq](https://jqlang.github.io/jq/) for better readability):
$ scylla sstable query --system-schema --output-format=json /path/to/data/system_schema/keyspaces-*/me-3gm7_127s_3ndxs28xt4llzxwqz6-big-Data.db | jq
[
{
"keyspace_name": "system_schema",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
},
{
"keyspace_name": "system",
"durable_writes": true,
"replication": {
"class": "org.apache.cassandra.locator.LocalStrategy"
}
}
]
Select a specific field in a specific partition using the command-line:
$ scylla sstable query --system-schema --query "select replication from scylla_sstable.keyspaces where keyspace_name='ks'" ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
Select a specific field in a specific partition using ``--query-file``:
$ echo "SELECT replication FROM scylla_sstable.keyspaces WHERE keyspace_name='ks';" > query.cql
$ scylla sstable query --system-schema --query-file=./query.cql ./scylla-workdir/data/system_schema/keyspaces-*/*-Data.db
replication
-------------------------------------------------------------------------------------
({class : org.apache.cassandra.locator.NetworkTopologyStrategy}, {datacenter1 : 1})
New functionality: no backport needed.
Closesscylladb/scylladb#22007
* github.com:scylladb/scylladb:
docs/operating-scylla: document scylla-sstable query
test/cqlpy/test_tools.py: add tests for scylla-sstable query
test/cqlpy/test_tools.py: make scylla_sstable() return table name also
scylla-sstable: introduce the query command
tools/utils: get_selected_operation(): use std::string for operation_options
utils/rjson: streaming_writer: add RawValue()
cql3/type_json: add to_json_type()
test/lib/cql_test_env: introduce do_with_cql_env_noreentrant_in_thread()
There are several API endpoints that walk a specific list of sstables and sum up their bytes_on_disk() values. All those endpoints accumulate a map of sstable names to their sizes, then squashe the maps together and, finally, sum up the map values to report it back. Maintaining these intermediate collections is the waste of CPU and memory, the usage values can be summed up instantly.
Also add a test for per-cf endpoints to validate the change, and generalize the helper functions while at it.
Closesscylladb/scylladb#23143
* github.com:scylladb/scylladb:
api: Generalize disk space counting for table and system
api: Use map_reduce_cf_raw() overload with table name
api: Don't collect sstables map to count disk space usage
test: Add unit test for total/live sstable sizes
There are two semaphores in table for synchronizing changes to sstable list:
sstable_set_mutation_sem: used to serialize two concurrent operations updating
the list, to prevent them from racing with each other.
sstable_deletion_sem: A deletion guard, used to serialize deletion and
iteration over the list, to prevent iteration from finding deleted files on
disk.
they're always taken in this order to avoid deadlocks:
sstable_set_mutation_sem -> sstable_deletion_sem.
problem:
A = tablet cleanup
B = take_snapshot()
1) A acquires sstable_set_mutation_sem for updating list
2) A acquires sstable_deletion_sem, then delete sstable before updating list
3) A releases sstable_deletion_sem, then yield
4) B acquires sstable_deletion_sem
5) B iterates through list and bumps sstable deleted in step 2
6) B fails since it cannot find the file on disk
Initial reaction is to say that no procedure must delete sstable before
updating the list, that's true.
But we want a iteration, running concurrently to cleanup, to not find sstables
being removed from the system. Otherwise, e.g. snapshot works with sstables
of a tablet that was just cleaned up. That's achieved by serializing iteration
with list update.
Since sstable_deletion_sem is used within the scope of deletion only, it's
useless for achieving this. Cleanup could acquire the deletion sem when
preparing list updates, and then pass the "permit" to deletion function, but
then sstable_deletion_sem would essentially become sstable_set_mutation_sem,
which was created exactly to protect the list update.
That being said, it makes sense to merge both semaphores. Also things become
easier to reason about, and we don't have to worry about deadlocks anymore.
The deletion goes through sstable_list_builder, which holds a permit throughout
its lifetime, which guarantees that list updates and deletion are atomic to
other concurrent operations. The interface becomes less error prone with that.
It allowed us to find discard_sstables() was doing deletion without any permit,
meaning another race could happen between truncate and snapshot.
So we're fixing race of (truncate|cleanup) with take_snapshot, as far as we
know. It's possible another unknown races are fixed as well.
Fixes#23049.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#23117
To simplify aborting scylla while starting the services,
add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.
This patch prevents gate_closed_exception to escape handling
when start-up is aborted early with the stop signal,
causing https://github.com/scylladb/scylladb/issues/23153
The regression is apparently due to a25c3eaa1c
Fixes https://github.com/scylladb/scylladb/issues/23153
* Requires backport to 2025.1 due to a25c3eaa1cClosesscylladb/scylladb#23103
* github.com:scylladb/scylladb:
main: add checkpoints
main: safely check stop_signal in-between starting services
main: move prometheus start message
main: move per-shard database start message
Since it is requirement for Red Hat OpenShift Certification, we need to
run the container as non-root user.
Related scylladb/scylla-pkg#4858
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Before starting significant services that didn't
have a corresponding call to supervisor::notify
before them.
Fixes#23153
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
To simplify aborting scylla while starting the services,
Add a _ready state to stop_signal, so that until
main is ready to be stopped by the abort_source,
just register that the signal is caught, and
let a check() method poll that and request abort
and throw respective exception only then, in controlled
points that are in-between starting of services
after the service started successfully and a deferred
stop action was installed.
Refs #23153
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
The `prometheus_server` is started only conditionally
but the notification message is sent and logged
unconditionally.
Move it inside the condtional code block.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Now that we support suite subfolders, there is no
need to create an own suite for object_store and auth_cluster, topology, topology_custom.
this PR merge all these folders into one: 'cluster"
this pr also introduce and apply 'prepare_3_nodes_cluster' fixture that allow preparing non-dirty 3 nodes cluster
that can be reused between tests(for tests that was in topology folder)
number of tests in master
release -3461
dev -3472
debug -3446
number of tests in this PR
release -3460
dev -3471
debug -3445
There is a minus one test in each mode because It was 2 test_topology_failure_recovery files(topology and topology_custom) with the same utility functions but different test cases. This PR merged them into one
Closesscylladb/scylladb#22917
* github.com:scylladb/scylladb:
test.py: merge object_store into cluster folder
test.py: merge auth_cluster into cluster folter
test.py: rename topology_custom folder to cluster
test.py: merge topology test suite into topology_custom
test.py delete conftest in topology_custom
test.py apply prepare_3_nodes_cluster in topology
test.py: introduce prepare_3_nodes_cluster marker
Now when the bodies of both map-reduce reducers are the same, they can
be generalized with each other.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The existing helper that counds disk space usage for a table map-reduces
the table object "by hand". Its peer that counts the usage for all
tables uses the map_reduce_cf_raw() helper. The latter exists for
specific table as well, so the first counter can benefit from using it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All the API calls that collect disk usage of sstables accumulate
map<sstable name, disk size>, then merges shard maps into one, then
counts the "disk size" values and drops the map itself on the floor.
This is waste of CPU cycles, disk usage can be just summed up along
cf/sstables iterations, no need to accumulate map with names for that.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The pair of column_family/metrics/(total|live)_disk_space_used/{name}
reports the disk usage by sstables. The test creates table, populates,
flushes and checks that the size corresonds to what stat(2) reports for
the respective files.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
For the limited voters feature to work properly we need to make sure that we are only managing the voter status through the topology coordinator. This means that we should not change the node votership from the storage_service module for the raft topology directly.
We can drop the voter status changes from the storage_service module because the topology coordinator will handle the votership changes eventually. The calls in the storage_service module were not essential and were only used for optimization (improving the HA under certain conditions).
Furthermore, the other bundled commit improves the reaction again by reacting to the node `on_up()` and `on_down()` events, which again shortens the reaction time and improves the HA.
The change has effect on the timing in the tablets migration test though, as it previously relied on the node being made non-voter from the service_storage `raft_removenode()` function. The fix is to add another server to the topology to make sure we will keep the quorum.
Previously the test worked because the test waits for an injection to be reached and it was ensured that the injection (log line) has only been triggered after the node has been made non-voter from the `raft_removenode()`. This is not the case anymore. An alternative fix would be to wait for the first node to be made non-voter before stopping the second server, but this would make the test more complex (and it is not strictly required to only use 4 servers in the test, it has been only done for optimization purposes).
Fixes: scylladb/scylladb#22860
Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969
No backport: Part of the limited voters new feature, so this shouldn't to be backported.
Closesscylladb/scylladb#22847
* https://github.com/scylladb/scylladb:
raft: use direct return of future for `run_op_with_retry`
raft: adjust the voters interface to allow atomic changes
raft topology: drop removing the node from raft config via storage_service
raft topology: drop changing the raft voters config via storage_service
In commit 2463e524ed, Scylla's default changed
from starting with one tablet per shard to starting 10 per shard. The
functional tests don't need more tablets and it can only slow down the
tests, so the patch added --tablets-initial-scale-factor=1 to test/*/suite.yaml
but forgot to add it to test/cqlpy/run.py (to affect test/cqlpy/run) so
this patch does this now.
This patch should *only* be about making tests faster, although to be
honest, I don't see any measurable improvement in test speed (10 isn't
so many). But, unfortunately, this is only part of the story. Over time
we allowed a few cqlpy tests to be written in a way that relies on having
only a small number of tablets or even exactly one tablet per shard (!).
These tests are buggy and should be fixed - see issues #23115 and #23116
as examples. But adding the option --tablets-initial-scale-factor=1 also
to run.py will make these bugs not affect test/cqlpy/run in the same way
as it doesn't affect test.py.
These buggy tests will still break with `pytest cqlpy` against a Scylla
you ran yourself manually, so eventually will still need to fix those
test bugs.
Refs #23115
Refs #23116Closesscylladb/scylladb#23125
The cqlpy test test_compaction.py::test_compactionstats_after_major_compaction
was written to assume we have just one tablet per shard - if there are many
tablets compaction splitting the data, the test scenario might not need
compaction in the way that the test assumes it does.
Recently (commit 2463e524ed) Scylla's default
was changed to have 10 tablets per shard - not one. This broke this test.
The same commit modified test/cqlpy/suite.yaml, but that affects only test.py
and not test/cqlpy/run, and also not manual runs against a manually-installed
Scylla. If this test absolutely requires a keyspace with 1 and not 10
tablets, then it should create one explicitly. So this is what this test
does (but only if tablets are in use; if vnodes are used that's fine
too).
Before this patch,
test/cqlpy/run test_compaction.py::test_compactionstats_after_major_compaction
fails. After the patch, it passes.
Fixes#23116Closesscylladb/scylladb#23121
The currently used versions of "wasmtime", "idna", "cap-std" and
"cap-primitives" packages had low to moderate security issues.
In this patch we update the dependencies to versions with these
issues fixed.
The update was performed by changing the "wasmtime" (and "wasmtime-wasi")
version in rust/wasmtime_bindings/Cargo.toml and updating rust/Cargo.lock
using the "cargo update" command with the affected package. To fix an
issue with different dependencies having different versions of
sub-dependencies, the package "smallvec" was also updated to "1.13.1".
After the dependency update, the Rust code also needed to be updated
because of the slightly changed API. One Wasm test case needed to be
updated, as it was actually using an incorrect Wat module and not
failing before. The crate also no longer allows multiple tables in
Wasm modules by default - it is now enabled by setting the "gc" crate
feature and configuring the Engine with config.wasm_reference_types(true).
Fixes https://github.com/scylladb/scylladb/issues/23127Closesscylladb/scylladb#23128
It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up.
Fixes: scylladb/scylladb#22919
Bug is present in all live versions, backports are required.
Closesscylladb/scylladb#23044
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: register_inactive_read(): handle aborted permit
test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now()
Range scans are expected to go though lots of tombstones, no need to
spam the logs about this. The tombstone warning log is demoted to debug
level, if somebody wants to see it they can bump the logger to debug
level.
Fixes: https://github.com/scylladb/scylladb/issues/23093Closesscylladb/scylladb#23094
These redundant `std::move()` calls were identified by GCC-14.
In general, copy elision applies to these places, so adding
`std::move()` is not only unnecessary but can actually prevent
the compiler from performing copy elision, as it causes the
return statement to fail to satisfy the requirements for
copy elision optimization.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23063
This series is part of the effort to reduce the overall overhead originating from metrics reporting, both on the Scylla side and the metrics collecting server (Prometheus or similar)
The idea in this series is to create an equivalent of levels with a label.
First, label a subset of the metrics used by the dashboards.
Second, the per-table metrics that are now off by default will be marked with a different label.
The following specific optional features: CDC, CAS, and Alternator have a dedicated label now.
This will allow users to disable all metrics of features that are not in use.
All the rest of the metrics are left unlabeled.
Without any changes, users would get the same metrics they are getting today.
But you could pass the `__level=1` and get only those metrics the dashboard needs. That reduces between 50% and 70% (many metrics are hidden if not used, so the overall number of metrics varies).
The labels are not reported based on the seastar feature of hiding labels that start with an underscore.
Closesscylladb/scylladb#12246
* github.com:scylladb/scylladb:
db/view/view.cc: label metrics with basic_level
transport/server.cc: label metrics with basic_level
service/storage_proxy.cc: label metrics with basic_level and cas
main.cc: label metrics with basic_level
streaming/stream_manager.cc: label metrics with basic_level
repair/repair.cc: label metrics with basic_level
service/storage_service.cc: label metrics with basic_level
gms/gossiper.cc: label metrics with basic_level
replica/database.cc: label metrics with basic_level
cdc/log.cc: label metrics with basic_level and cdc
alternator: label metrics with basic_level and alternator
row_cache.cc: label metrics with basic_level
query_processor.cc: label metrics with basic_level
sstables.cc: label metrics with basic_level
utils/logalloc.cc label metrics with basic_level
commitlog.cc: label metrics with basic_level
compaction_manager.cc: label metrics with basic_level
Adding the __level and features labels
Refs #22916
Adds an "enable_session_tickets" option to TLS setup for our server
endpoints (not documented for internode RPC, as we don't handle it
on the client side there), allowing enabling of TLS3 client session
ticket, i.e. quicker reconnect.
Session tickets are valid within a time frame or until a node
restarts, whichever comes first.
v2:
Use "TLS1.3" in help message
Closesscylladb/scylladb#22928
The following metrics will be marked with basic_level label:
scylla_transport_cql_errors_total
scylla_transport_current_connections
scylla_transport_requests_served
scylla_transport_requests_shed
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_scylladb_current_version
scylla_reactor_utilization
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_gossip_heart_beat
scylla_gossip_live
scylla_gossip_unreachable
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_cdc_operations_failed
scylla_cdc_operations_total
All metrics are labeld with the __cdc label.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_alternator_operation
scylla_alternator_op_latency_bucket
scylla_alternator_op_latency_count
scylla_alternator_op_latency_sum
scylla_alternator_total_operations
scylla_alternator_batch_item_count
scylla_alternator_op_latency
scylla_alternator_op_latency_summary
scylla_expiration_items_deleted
All alternator metrics are marked with __alternator label.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_sstables_cell_tombstone_writes
scylla_sstables_range_tombstone_reads
scylla_sstables_range_tombstone_writes
scylla_sstables_row_tombstone_reads
scylla_sstables_tombstone_writes
The following metrics will be marked with basic_level label:
scylla_lsa_total_space_bytes
scylla_lsa_non_lsa_used_space_bytes
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
The following metrics will be marked with basic_level label:
scylla_commitlog_segments
scylla_commitlog_allocating_segments
scylla_commitlog_unused_segments
scylla_commitlog_alloc
scylla_commitlog_flush
scylla_commitlog_bytes_written
scylla_commitlog_pending_allocations
scylla_commitlog_requests_blocked_memory
scylla_commitlog_flush_limit_exceeded
scylla_commitlog_disk_total_bytes
scylla_commitlog_disk_active_bytes
scylla_commitlog_disk_slack_end_bytes
Scylla generates many metrics, and when multiplied by the number of
shards, the total number of metrics adds a significant load to a
monitoring server.
With multi-tier monitoring, it is helpful to have a smaller subset of
metrics users care about and allow them to get only those.
This patch adds two kind of labels, the a __level label, currently with
a single value, but we can add more in the future.
The second kind, is a cross feature label, curently for alternator, cdc
and cas.
We will use the __level label to mark the interesting user-facing metrics.
The current level value is:
basic - metrics for Scylla monitoring
In this phase, basic will mark all metrics used in the dashboards.
In practice, without any configuration change, Prometheus would get the
same metrics as it gets today.
While it is possible to filter by the label, e.g.:
curl http://localhost:9180/metrics?__level=basic
The labels themselves are not reported thanks to label filtering of
labels begin with __.
The feature labels:
__cdc, __cas and __alternator can be an easy way to disable a set of
metrics when not using a feature.
Signed-off-by: Amnon Heiman <amnon@scylladb.com>
Clean up the code by using direct return of future for `run_op_with_retry`.
This can be done as the `run_op_with_retry` function is already returning
a future that we can reuse directly. What needs to be taken care of is
to not use temporaries referenced from inside the lambda passed to the
`run_op_with_retry`.
Allow setting the voters and non-voters in a single operation. This
ensures that the configuration changes are done atomically.
In particular, we don't want to set voters and non-voters separately
because it could lead to inconsistencies or even the loss of quorum.
This change also partially reverts the commit 115005d, as we will only
need the convenience wrappers for removing the voters (not for adding
them).
Refs: scylladb/scylladb#18793
For the limited voters feature to work properly we need to make sure
that we are only managing the voter status through the topology
coordinator. This means that we should not change the node votership
from the storage_service module for the raft topology directly.
This needs to be done in addition to dropping of the votership change
from the storage_service module.
The `remove_from_raft_config` is redundant and can be removed because
a successfully completed `removenode` operation implies that the node
has been removed from group 0 by the topology coordinator.
Refs: scylladb/scylladb#22860
Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969
For the limited voters feature to work properly we need to make sure
that we are only managing the voter status through the topology
coordinator. This means that we should not change the node votership
from the storage_service module for the raft topology directly.
We can drop the voter status changes from the storage_service module
because the topology coordinator will handle the votership changes
eventually. The calls in the storage_service module were not essential
and were only used for optimization (improving the HA under certain
conditions).
This has effect on the timing in the tablets migration test though,
as it relied on the node being made non-voter from the service_storage
`raft_removenode()` function. The fix is to add another server to the
topology to make sure we will keep the quorum.
Previously the test worked because the test waits for an injection to be
reached and it was ensured that the injection (log line) has only been
triggered after the node has been made non-voter from the
`raft_removenode()`. This is not the case anymore. An alternative fix
would be to wait for the first node to be made non-voter before stopping
the second server, but this would make the test more complex (and it is
not strictly required to only use 4 servers in the test, it has been
only done for optimization purposes).
Fixes: scylladb/scylladb#22860
Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969
This is continuation of #21533
There are two almost identical helpers in api/ -- validate_table(ks, cf) and get_uuid(ks, cf). Both check if the ks:cf table exists, throwing bad_param_exception if it doesn't. There's slight difference in their usage, namely -- callers of the latter one get the table_id found and make use of it, while the former helper is void and its callers need to re-search for the uuid again if the need (spoiler: they do).
This PR merges two helpers together, so there's less code to maintain. As a nice side effect, the existing validate_table() callers save one re-lookup of the ks:cf pair in database mappings.
Affected endpoints are validated by existing tests:
* column_family/{autocompation|tombstone_gc|compaction_strategy}, validated by the tests described in #21533
* /storage_service/{range_to_endpoint_map|describe_ring|ownership}, validated by nodetool tests
* /storage_service/tablets/{move|repair}, validated by tablets move and repair tests
Closesscylladb/scylladb#22742
* github.com:scylladb/scylladb:
api: Remove get_uuid() local helper
api: Make use of validate_table()'s table_id
api: Make validate_table() helper return table_id after validation
api: Change validate_table()'s ctx argument to database
Previously, variables were marked as const, causing std::move() calls to
be redundant as reported by GCC warnings. This change either removes
const qualifiers or marks related lambdas as mutable, allowing the
compiler to properly utilize move constructors for better performance.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23066
As a part of the moving to bare pytest we need to extract the required test
environment preparation steps into pytest's hooks/fixtures.
Do this for S3 mock stuff (MinioServer, MockS3Server, and S3ProxyServer)
and for directories with test artifacts.
For compatibility reason add --test-py-init CLI option for bare pytest
test runner: need to add it to pytest command if you need test.py
stuff in your tests (boost, topology, etc.)
Also, postpone initialization of TestSuite.artifacts and TestSuite.hosts
from import-time to runtime.
Closesscylladb/scylladb#23087
Fix GCC warning about moving from a const reference in mp_row_consumer_k_l::flush_if_needed.
Since position_in_partition::key() returns a const reference, std::move has no effect.
Considered adding an rvalue reference overload (clustering_key_prefix&& key() &&) but
since the "me" sstable format is mandatory since 63b266e9, this approach offers no benefit.
This change simply removes the redundant std::move() call to silence the warning and
improve code clarity.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23085
Currently for system keyspace part of config members are configured outside of this helper, in the caller. It's more consistent to have full config initialization in one place.
Closesscylladb/scylladb#22975
* github.com:scylladb/scylladb:
replica: Mark database::make_keyspace_config() private
replica: Prepare full keyspace config in make_keyspace_config()
Enhance how the script handles remote repository selection for a given
SHA1 commit hash.
Previously, in 3bdbe620, the script fetched from all remotes containing
the product name, which could lead to inefficiencies and errors,
especially with multiple matching remotes. Now, it first checks if the
SHA1 is in any local remote-tracking branch, using that remote if found,
and otherwise fetches from each remote sequentially to find the first
one containing the SHA1. This approach minimizes unnecessary fetches,
making the script more efficient for debugging coredumps in repositories
with multiple remotes.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23026
LIMIT and PER PARTITION LIMIT limit the number of rows returned or taken
into consideration by a query. It makes no logical sense to have this
value at less than 1. Cassandra also has this requirement.
This patch ensures that the limit value is strictly positive and adds
an explicit test for it - it was only tested in a test ported from
Cassandra, that is disabled due to other issues.
Closesscylladb/scylladb#23013
If hosts and/or dcs filters are specified for tablet repair and
some replicas match these filters, choose the replica that will
be the repair master according to round-robin principle
(currently it's always the first replica).
If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, the repair succeeds and
the repair request is removed (currently an exception is thrown
and tablet repair scheduler reschedules the repair forever).
Fixes: https://github.com/scylladb/scylladb/issues/23100.
Needs backport to 2025.1 that introduces hosts and dcs filters for tablet repair
Closesscylladb/scylladb#23101
* github.com:scylladb/scylladb:
test: add new cases to tablet_repair tests
test: extract repiar check to function
locator: add round-robin selection of filtered replicas
locator: add tablet_task_info::selected_by_filters
service: finish repair successfully if no matching replica found
This commit adds the upgrade guides relevant in version 2025.1:
- From 6.2 to 2025.1
- From 2024.x to 2025.1
It also removes the upgrade guides that are not relevant in 2025.1 source available:
- Open Source upgrade guides
- From Open Source to Enterprise upgrade guides
- Links to the Enterprise upgrade guides
Also, as part of this PR, the remaining relevant content has been moved to
the new About Upgrade page.
WHAT NEEDS TO BE REVIEWED
- Review the instructions in the 6.2-to-2025.1 guide
- Review the instructions in the 2024.x-to-2025.1 guide
- Verify that there are no references to Open Source and Enterprise.
The scope of this PR does not have to include metrics - the info can be added
in a follow-up PR.
Fixes https://github.com/scylladb/scylladb/issues/22208
Fixes https://github.com/scylladb/scylladb/issues/22209
Fixes https://github.com/scylladb/scylladb/issues/23072
Fixes https://github.com/scylladb/scylladb/issues/22346Closesscylladb/scylladb#22352
Previously, when result_generator's default constructor was called, the
_stats member variable remained uninitialized. This could lead to
undefined behavior in release builds where uninitialized values are
unpredictable, making issues difficult to debug.
This change initializes the pointer to nullptr, ensuring consistent
behavior across all build types and preventing potential memory-related
bugs.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23073
When implementing the copy constructor for `sstable_set` (derived from
`enable_lw_shared_from_this`), we intentionally need the parent's default
constructor rather than its copy constructor. This is because each new
`sstable_set` instance maintains its own reference count and owns a clone
of the source object's implementation (`x._impl->clone()`).
Although this behavior is correct, GCC warns about not calling the parent's
copy constructor. This change explicitly calls the parent's default constructor
to:
1. Silence GCC warnings
2. Clearly document our intention to use the default constructor
3. Follow best practices for constructor initialization
The functionality remains unchanged, but the code is now more explicit about
its design and free of compiler warnings.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23083
If hosts and/or dcs filters are specified for tablet repair and
no replica matches these filters, an exception is thrown. The repair
fails and tablet repair scheduler reschedules it forever.
Such a repair should actually succeed (as all specified relpicas were
repaired) and the repair request should be removed.
Treat the repair as successful if the filters were specified and
selected no replica.
It is possible that the permit handed in to register_inactive_read() is
already aborted (currently only possible if permit timed out).
If the permit also happens to have wait for memory, the current code
will attempt to call promise<>::set_exception() on the permit's promise
to abort its waiters. But if the permit was already aborted via timeout,
this promise will already have an exception and this will trigger an
assert. Add a separate case for checking if the permit is aborted
already. If so, treat it as immediate eviction: close the reader and
clean up.
Fixes: scylladb/scylladb#22919
Unless the test in question actually wants to test timeouts. Timeouts
will have more pronounced consequences soon and thus using
db::timeout_clock::now() becomes a sure way to make tests flaky.
To avoid this, use db::no_timeout in the tests that don't care about
timeouts.
This commit adds a link to the Limitations section on the Tablets page
to the CQL pag, the tablets option.
This is actually the place where the user will need the information:
when creating a keyspace.
In addition, I've reorganized the section for better readability
(otherwise, the section about limitations was easy to miss)
and moved the section up on the page.
Note that I've removed the updated content from the `_common` folder
(which I deleted) to the .rst page - we no longer split OSS and Enterprise,
so there's no need to keep using the `scylladb_include_flag` directive
to include OSS- and Ent-specific content.
Fixes https://github.com/scylladb/scylladb/issues/22892
Fixes https://github.com/scylladb/scylladb/issues/22940Closesscylladb/scylladb#22939
Previously, the clang-tidy.yaml workflow would cancel the clang-tidy job
when a comment wasn't prefixed with "/clang-tidy", instead of skipping it.
This cancellation triggered unnecessary email notifications for developers
with GitHub action notifications enabled.
This change modifies the workflow to only run clang-tidy when the
read-toolchain job succeeds, reducing notification noise by properly
skipping the job rather than cancelling it.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23084
after introducing the test.py subfolders support,
test.py start creating weird log files like
testlog/topology_custom.mv/tablets/test_mv_tablets.1
that affect failed test collection logic
this commit fixes this and test.py logs as previously in testlog directory
without any subfolders: topology_custom.mv_tablets_test_mv_tablets.1
Closesscylladb/scylladb#23009
Fix a bug where std::same_as<...> constraint was incorrectly used as a
simple requirement instead of a nested requirement or part of a
conjunction. This caused the constraint to be always satisfied
regardless of the actual types involved.
This change promotes std::same_as<...> to a top-level constraint,
ensuring proper type checking while improving code readability.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23068
Replace boost::accumulate() calls with std::ranges facilities. This
change reduces external dependencies and modernizes the codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23062
The tree code have const and non-const overloads for searching methods
like find(), lower_bound(), etc. Not to implement them twice, it's coded
like
const_iterator find() const {
... // the implementation itself
}
iterator find() {
return iterator(const_cast<const *>(this)->find());
}
i.e. -- const overload is called, and returned by it const_iterator is
converted into a non-const iterator. For that the latter has dedicated
constructor with two inaccuracies: it's not marked as explicit and it
accepts const rvalue reference.
This patch fixes both.
Althogh this disables implicit const -> non-const conversion of
iterators, the constructor in question is public, which still opens a
way for conversion (without const_cast<>). This constructor is better
be marked private, but there's double_decker class that uses bptree
and exploits the same hacks in its finding methods, so it needs this
constructor to be callable. Alas.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#23069
Replace value-based exception catching with reference-based catching to address
GCC warnings about polymorphic type slicing:
```
warning: catching polymorphic type ‘class seastar::rpc::stream_closed’ by value [-Wcatch-value=]
```
When catching polymorphic exceptions by value, the C++ runtime copies the
thrown exception into a new instance of the specified type, slicing the
actual exception and potentially losing important information. This change
ensures all polymorphic exceptions are caught by reference to preserve the
complete exception state.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#23064
If user fails to supply the AttributeDefinitions parameter when creating
a table, Scylla used to fail on RAPIDJSON_ASSERT. Now it calls a polite
exception, which is fully in-line with what DynamoDB does.
The commit supplies also a new, relevant test routine.
Fixes#23043Closesscylladb/scylladb#23041
Fixes#22314
Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.
Bundles together the setup of "always on" schema extensions into a single call, and uses this from the three (3) init points.
Could have opted for static reg via `configurables`, but since we are moving to a single code base, the need for this is going away, hence explicit init seems more in line.
Closesscylladb/scylladb#22327
* github.com:scylladb/scylladb:
tools: Add standard extensions and propagate to schema load
cql_test_env: Use add all extensions instead of inidividually
main: Move extensions adding to function
tomstone_gc: Make validate work for tools
class clustering_range is a range of Clustering Key Prefixes implemented
as interval<clustering_key_prefix>. However, due to the nature of
Clustering Key Prefix, the ordering of clustering_range is complex and
does not satisfy the invariant of interval<>. To be more specific, as a
comment in interval<> implementation states: “The end bound can never be
smaller than the start bound”. As a range of CKP violates the invariant,
some algorithms, like intersection(), can return incorrect results.
For more details refer to scylladb#8157, scylladb#21604, scylladb#22817.
This commit:
- Add a WARNING comment to discourage usage of clustering_range
- Add WARNING comments to potentially incorrect uses of
interval<clustering_key_prefix> non-trivial methods
- Add a FIXME comment to incorrect use of
interval<clustering_key_prefix_view>::deoverlap and WARNING comments
to related interval<clustering_key_prefix_view> misuse.
Closesscylladb/scylladb#22913
Currently, when we add servers to the cluster in the test, we use
a 60s timeout which proved to be not enough in one of the debug runs.
There is no reason for this test to use a shorter timeout than all
the other tests, so in this patch we reset it to the higher default.
Fixes https://github.com/scylladb/scylladb/issues/23047Closesscylladb/scylladb#23048
Currently for system keyspace part of config members are configured
outside of this helper, in the caller. It's more consistent to have full
config initialization in one place.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Drop it from files that obviously don't need it. Also kill some forward
declarations while at it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#22979
The test `test_mv_topology_change` is a regression test for
scylladb/scylladb#19529. The problem was that CL=ANY writes issued when
all replicas were down would be kept in memory until the timeout. In
particular, MV updates are CL=ANY writes and have a 5 minute timeout.
When doing topology operations for vnodes or when migrating tablet
replicas, the cluster goes through stages where the replica sets for
writes undergo changes, and the writes started with the old replica set
need to be drained first.
Because of the aforementioned MV updates, the removenode operation could
be delayed by 5 minutes or more. Therefore, the
`test_mv_topology_change` test uses a short timeout for the removenode
operation, i.e. 30s. Apparently, this is too low for the debug mode and
the test has been observed to time out even though the removenode
operation is progressing fine.
Increase the timeout to 60s. This is the lowest timeout for the
removenode operation that we currently use among the in-repo tests, and
is lower than 5 minutes so the test will still serve its purpose.
Fixes: scylladb/scylladb#22953Closesscylladb/scylladb#22958
While generally better to reduce inline code, here we get
rid of the clustering_interval_set.hh dependency, which in turns
depends on boost interval_set, a large dependency.
incremental_compaction_test.cc is adjusted for a missing header.
Closesscylladb/scylladb#22957
Refs scylla-enterprise#5185
Fixes#22901
If a tls socket gets EPIPE the error is not translated to a specific
gnutls error code, but only a generic ERROR_PULL/PUSH. Since we treat
EPIPE as ignorable for plain sockets, we need to unwind nested exception
here to detect that the error was in fact due to this, so we can suppress
log output for this.
Closesscylladb/scylladb#22888
This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22997
When a backup upload is aborted due to instance shutdown, change the log
level from ERROR to INFO since this is expected behavior. Previously,
`abort_requested_exception` during upload would trigger an ERROR log, causing
test failures since error logs indicate unexpected issues.
This change:
- Catches `abort_requested_exception` specifically during file uploads
- Logs these shutdown-triggered aborts at INFO level instead of ERROR
- Aligns with how `abort_requested_exception` is handled elsewhere in the service
This prevents false test failures while still informing administrators
about aborted uploads during shutdown.
Fixesscylladb/scylladb#22391
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22995
This series achieves two things:
1) changes default number of tablet replicas per shard to be 10 in order to reduce load imbalance between shards
This will result in new tables having at least 10 tablet replicas per
shard by default.
We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.
Fixes https://github.com/scylladb/scylladb/issues/21967
2) introduces a global goal for tablet replica count per shard and adds logic to tablet scheduler to respect it by controlling per-table tablet count
The per-shard goal is enforced by controlling average per-shard tablet replica
count in a given DC, which is controlled by per-table tablet
count. This is effective in respecting the limit on individual shards
as long as tablet replicas are distributed evenly between shards.
There is no attempt to move tablets around in order to enforce limits
on individual shards in case of imbalance between shards.
If the average per-shard tablet count exceeds the limit, all tables
which contribute to it (have replicas in the DC) are scaled down
by the same factor. Due to rounding up to the nearest power of 2,
we may overshoot the per-shard goal by at most a factor of 2.
The scaling is applied after computing desired tablet count due to
all other factors: per-table tablet count hints, defaults, average tablet size.
If different DCs want different scale factors of a given table, the
lowest scale factor is chosen for a given table.
When creating a new table, its tablet count is determined by tablet
scheduler using the scheduler logic, as if the table was already created.
So any scaling due to per-shard tablet count goal is reflected immediately
when creating a table. It may however still take some time for the system
to shrink existing tables. We don't reject requests to create new tables.
Fixes#21458Closesscylladb/scylladb#22522
* github.com:scylladb/scylladb:
config, tablets: Allow tablets_initial_scale_factor to be a fraction
test: tablets_test: Test scaling when creating lots of tables
test: tablets_test: Test tablet count changes on per-table option and config changes
test: tablets_test: Add support for auto-split mode
test: cql_test_env: Expose db config
config: Make tablets_initial_scale_factor live-updateable
tablets: load_balancer: Pick initial_scale_factor from config
tablets, load_balancer: Fix and improve logging of resize decisions
tablets, load_balancer: Log reason for target tablet count
tablets: load_balancer: Move hints processing to tablet scheduler
tablets: load_balancer: Scale down tablet count to respect per-shard tablet count goal
tablets: Use scheduler's make_sizing_plan() to decide about tablet count of a new table
tablets: load_balancer: Determine desired count from size separately from count from options
tablets: load_balancer: Determine resize decision from target tablet count
tablets: load_balancer: Allow splits even if table stats not available
tablets: load_balancer: Extract make_sizing_plan()
tablets: Add formatter for resize_decision::way_type
tablets: load_balancer: Simplify resize_urgency_cmp()
tablets: load_balancer: Keep config items as instance members
locator: network_topology_strategy: Simplify calculate_initial_tablets_from_topology()
tablets: Change the meaning of initial_scale to mean min-avg-tablets-per-shard
tablets: Set default initial tablet count scale to 10
tablets: network_topology_stragy: Coroutinize calculate_initial_tablets_from_topology()
tablets: load_balancer: Extract get_schema_and_rs()
tablets: load_balancer: Drop test_mode
Altering a keyspace (that has tablets enabled) without changing
tablets attributes, i.e. no `AND tablets = {...}` results in incorrect
"Update Keyspace..." log message being printed. The printed log
contains "tablets={"enabled":false}".
Refs https://github.com/scylladb/scylladb/issues/22261Closesscylladb/scylladb#22324
Tablet sequeunce number was part of the tablet identifier together
with last token, so on split and merge all ids changed and it appeared
in the simulator as all tablets of a table dropping and being created
anew. That's confusing. After this change, only last token is part of
the id, so split appears as adding tablets and merge appears as
removing half the tablets, which is more accurate.
Also includes an enhancement to make showing of tablet id text
optional in table mode.
Closesscylladb/scylladb#22981
* github.com:scylladb/scylladb:
tablet-mon.py: Don't show merges and splits as full table recreations
tablet-mon.py: Add toggle for tablet ids
We currently depends on hostname command to get local IP, but we can do
this on Python API.
After the change, we can drop the package.
Closesscylladb/scylladb#22909
This commit removes the OSS version name, which is irrelevant
and confusing for 2025.1 and later users.
Also, it updates the warning to avoid specifying the release
when the deprecated feature will be removed.
Fixes https://github.com/scylladb/scylladb/issues/22839Closesscylladb/scylladb#22936
There's a sstable_directory::create_pending_deletion_log() helper method that's called by sstable's filesystem_storage atomic-delete methods and that prepares the deletion log for a bunch of sstables. For that method to do its job it needs to get private sstable->_storage field (which is always the filesystem_storage one), also the atomic-delete transparent context object is leaked into the sstable_directory code and low-level sstable storage code needs to include higher-level sstable_directory header.
This patch unties these knots. As the result:
- friendship between sstable and sstable_directory is removed
- transparent atomic_delete_context is encapsulated in storage.(cc|hh) code
- less code for create_pending_deletion_log() to dump TOC filename into log
Closesscylladb/scylladb#22823
* github.com:scylladb/scylladb:
sstable: Unfriend sstable_directory class
sstable_directory: Move sstable_directory::pending_delete_result
sstable_directory: Calculate prefixes outside of create_pending_deletion_log()
sstable_directory: Introduce local pending_delete_log variable
sstable_directory: Relax toc file dumping to deletion log
Before this patch we silently allowed and ignored PER PARTITION LIMIT.
SELECT DISTINCT requires all the partition key columns, which means that
setting PER PARTITION LIMIT is redundant - only one result will be
returned from every partition anyway.
Cassandra behaves the same way, so this patch also ensures
compatibility.
Fixesscylladb/scylladb#15109Closesscylladb/scylladb#22950
pip_packages is an associative array, which in bash is constructed
as ([key]=value...). In our case the value is often empty (indicating
no version constraint). Shellcheck warns against it, since `[key]= x`
could be a mistype of `[key]=x`. It's not in our case, but shellcheck
doesn't know that.
Make shellcheck happier by specifying the empty values explicitly.
Closesscylladb/scylladb#22990
Add verbose logging to identify failing test combinations in multi-DC
setup:
- Log replication factor (RF) and consistency level (CL) for each test
iteration
- Add validation checks for empty result sets
Improve error handling:
- Before indexing in a list, use `assert` to check for its emptiness
- Use assertion failures instead of exceptions for clearer test diagnostics
This change helps debug test failures by showing which RF/CL
combinations cause inconsistent results between zero-token and regular
nodes.
Refs scylladb/scylladb#22967
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22968
The test simulates the cluster getting stuck during upgrade to raft
topology due to majority loss, and then verifies that it's possible to
get out of the situation by performing recovery and redoing the upgrade.
Fixes: #17410Closesscylladb/scylladb#17675
* https://github.com/scylladb/scylladb:
test/topology_experimental_raft: add test_topology_upgrade_stuck
test.py: bump minimum python version to 3.11
test.py: move gather_safely to pylib utils
cdc: generation: don't capture token metadata when retrying update
test.py: topology: ignore hosts when waiting for group0 consistency
raft: add error injection that drops append_entries
topology_coordinator: add injection which makes upgrade get stuck
because we don't care about the exact output of grep, let's silence
its output. also, no need to check for the string is empty, so let's
just use the status code of the grep for the return value of the
function, more idiomatic this way.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22737
In some cases the paused/unpaused node can hang not after 30s timeout.
This make the test flaky. Change the condition to always check the
coordinator's log if there is a hung node.
Add `stop_after_streaming` to the list of error injections which can
cause a node's hang.
Also add a wait for a new coordinator election in cluster events
which cause such elections.
Closesscylladb/scylladb#22825
`slice.is_reversed()` was falsely flagged as accessing moved data, since
the underlying enum_set remains valid after move. However, to improve code
clarity and silence the warning, now reference `command->slice` directly
instead, which is guaranteed to be valid as the move target.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22971
Eliminate one try/catch block around call to wr.close()
by using coroutine::as_future.
Mark error paths as `[[unlikely]]`.
Use `coroutine::return_exception_ptr` to avoid rethrowing
the final exception.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#22831
Before these changes, the script didn't update the listed pip packages
if they were already installed. If the latest version of Scylla started
using new features and required an updated Python driver, for example,
the developers (and possibly the user) were forced to update it manually.
In this commit, we modify the script so that it updates the installed
packages when run. This should make things easier for everyone.
Closesscylladb/scylladb#22912
Tablet sequeunce number was part of the tablet identifier together
with last token, so on split and merge all ids changed and it appeared
in the simulator as all tablets of a table dropping and being created
anew. That's confusing. After this change, only last token is part of
the id, so split appears as adding tablets and merge appears as
removing half the tablets, which is more accurate.
Replace complex boolean expression:
```py
not driver_response_future.has_more_pages or not all_pages
```
with clearer equivalent:
```py
driver_response_future.has_more_pages and all_pages
```
The new expression is more intuitive as it directly checks for both
conditions (having more pages and wanting all pages) rather than using double
negation.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22969
Before the limited voters feature, the "raft_ignore_nodes" test was
relying upon the fact that all nodes will become voters.
With the limited voters feature, the test needs to be adjusted to
ensure that we do not lose the majority of the cluster. This could
happen when there are 7 nodes, but only 5 of them are voters - then if
we kill 3 nodes randomly we might end up with only 2 voters left.
Therefore we need to ensure that we only stop the appropriate number of
voter nodes. So we need to determine which nodes became voters and which
ones are non-voters, and select the nodes to be stopped based on that.
That means with 7 nodes and 5 voters, we can stop up to 2 voter nodes,
but at least one of the stopped nodes must be a non-voter.
Fixes: scylladb/scylladb#22902
Refs: scylladb/scylladb#18793
Refs: scylladb/scylladb#21969Closesscylladb/scylladb#22904
Now that we support suite subfolders, there is no
need to create an own suite for topology_tasks and topology_random_failures.
Closesscylladb/scylladb#22879
* https://github.com/scylladb/scylladb:
test.py: merge topology_tasks suite into topology_custom suite
test.py: merge topology_random_failures suite into topology_customs
In the current scenario, the problem discovered is that there is a time
gap between group0 creation and raft_initialize_discovery_leader call.
Because of that, the group0 snapshot/apply entry enters wrong values
from the disk(null) and updates the in-memory variables to wrong values.
During the above time gap, the in-memory variables have wrong values and
perform absurd actions.
This PR removes the variable `_manage_topology_change_kind_from_group0`
which was used earlier as a work around for correctly handling
`topology_change_kind` variable, it was brittle and had some bugs
(causing issues like scylladb/scylladb#21114). The reason for this bug
that _manage_topology_change_kind used to block reading from disk and
was enabled after group0 initialization and starting raft server for the
restart case. Similarly, it was hard to manage `topology_change_kind`
using `_manage_topology_change_kind_from_group0` correctly in bug free
manner.
Post `_manage_topology_change_kind_from_group0` removal, careful
management of `topology_change_kind` variable was needed for maintaining
correct `topology_change_kind` in all scenarios. So this PR also
performs a refactoring to populate all init data to system tables even
before group0 creation(via `raft_initialize_discovery_leader` function).
Now because `raft_initialize_discovery_leader` happens before the group
0 creation, we write mutations directly to system tables instead of a
group 0 command. Hence, post group0 creation, the node can read the
correct values from system tables and correct values are maintained
throughout.
Added a new function `initialize_done_topology_upgrade_state` which
takes care of updating the correct upgrade state to system tables before
starting group0 server. This ensures that the node can read the correct
values from system tables and correct values are maintained throughout.
By moving `raft_initialize_discovery_leader` logic to happen before
starting group0 server, and not as group0 command post server start, we
also get rid of the potential problem of init group0 command not being
the 1st command on the server. Hence ensuring full integrity as expected
by programmer.
This PR fixes a bug. Hence we need to backport it.
Fixes: scylladb/scylladb#21114Closesscylladb/scylladb#22484
* https://github.com/scylladb/scylladb:
storage_service: Remove the variable _manage_topology_change_kind_from_group0
storage_service: fix indentation after the previous commit
raft topology: Add support for raft topology system tables initialization to happen before group0 initialization
service/raft: Refactor mutation writing helper functions.
Currently, maybe_switch_to_new_writer resets _current_writer
only in a continuation after closing the current writer.
This leaves a window of vulnerability if close() yields,
and token_group_based_splitting_mutation_writer::close()
is called. Seeing the engaged _current_writer, close()
will call _current_writer->close() - which must be called
exactly once.
Solve this when switching to a new writer by resetting
_current_writer before closing it and potentially yielding.
Fixes#22715
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Closesscylladb/scylladb#22922
Replace boost::range::find() calls with std::ranges::find(). This
change reduces external dependencies and modernizes the codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22942
The split monitor wasn't handling the scenario where the table being
split is dropped. The monitor would be unable to find the tablet map
of such a table, and the error would be treated as a retryable one
causing the monitor to fall into an endless retry loop, with sleeps
in between. And that would block further splits, since the monitor
would be busy with the retries. The fix is about detecting table
was dropped and skipping to the next candidate, if any.
Fixes#21859.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#22933
This PR improves and refactors the test.topology.util new_test_keyspace generator
and adds a corresponding create_new_test_keyspace function to be used by most if not
all topology unit tests in order to standardize the way the tests create keyspaces
and to mitigate the python driver create keyspace retry issue: https://github.com/scylladb/python-driver/issues/317Fixes#22342Fixes#21905
Refs https://github.com/scylladb/scylla-enterprise/issues/5060
* No backport required, though may be desired to stabilize CI also in release branches.
Closesscylladb/scylladb#22399
* github.com:scylladb/scylladb:
test_tablet_repair_scheduler: prepare_multi_dc_repair: use create_new_test_keyspace
test/repair: create_table_insert_data_for_repair: create keyspace with unique name
topology_tasks/test_tablet_tasks: use new_test_keyspace
topology_tasks/test_node_ops_tasks: use new_test_keyspace
topology_custom/test_zero_token_nodes_no_replication: use create_new_test_keyspace
topology_custom/test_zero_token_nodes_multidc: use create_new_test_keyspace
topology_custom/test_view_build_status: use new_test_keyspace
topology_custom/test_truncate_with_tablets: use new_test_keyspace
topology_custom/test_topology_failure_recovery: use new_test_keyspace
topology_custom/test_tablets_removenode: use create_new_test_keyspace
topology_custom/test_tablets_migration: use new_test_keyspace
topology_custom/test_tablets_merge: use new_test_keyspace
topology_custom/test_tablets_intranode: use new_test_keyspace
topology_custom/test_tablets_cql: use new_test_keyspace
topology_custom/test_tablets2: use *new_test_keyspace
topology_custom/test_tablets2: test_schema_change_during_cleanup: drop unused check function
topology_custom/test_tablets: use new_test_keyspace
topology_custom/test_table_desc_read_barrier: use new_test_keyspace
topology_custom/test_shutdown_hang: use new_test_keyspace
topology_custom/test_select_from_mutation_fragments: use new_test_keyspace
topology_custom/test_rpc_compression: use new_test_keyspace
topology_custom/test_reversed_queries_during_simulated_upgrade_process: use new_test_keyspace
topology_custom/test_raft_snapshot_truncation: use create_new_test_keyspace
topology_custom/test_raft_no_quorum: use new_test_keyspace
topology_custom/test_raft_fix_broken_snapshot: use new_test_keyspace
topology_custom/test_query_rebounce: use new_test_keyspace
topology_custom/test_not_enough_token_owners: use new_test_keyspace
topology_custom/test_node_shutdown_waits_for_pending_requests: use new_test_keyspace
topology_custom/test_node_isolation: use create_new_test_keyspace
topology_custom/test_mv_topology_change: use new_test_keyspace
topology_custom/test_mv_tablets_replace: use new_test_keyspace
topology_custom/test_mv_tablets_empty_ip: use new_test_keyspace
topology_custom/test_mv_tablets: use new_test_keyspace
topology_custom/test_mv_read_concurrency: use new_test_keyspace
topology_custom/test_mv_fail_building: use new_test_keyspace
topology_custom/test_mv_delete_partitions: use new_test_keyspace
topology_custom/test_mv_building: use new_test_keyspace
topology_custom/test_mv_backlog: use new_test_keyspace
topology_custom/test_mv_admission_control: use new_test_keyspace
topology_custom/test_major_compaction: use new_test_keyspace
topology_custom/test_maintenance_mode: use new_test_keyspace
topology_custom/test_lwt_semaphore: use new_test_keyspace
topology_custom/test_ip_mappings: use new_test_keyspace
topology_custom/test_hints: use new_test_keyspace
topology_custom/test_group0_schema_versioning: use new_test_keyspace
topology_custom/test_data_resurrection_after_cleanup: use new_test_keyspace
topology_custom/test_read_repair_with_conflicting_hash_keys: use new_test_keyspace
topology_custom/test_read_repair: use new_test_keyspace
topology_custom/test_compacting_reader_tombstone_gc_with_data_in_memtable: use new_test_keyspace
topology_custom/test_commitlog_segment_data_resurrection: use new_test_keyspace
topology_custom/test_change_replication_factor_1_to_0: use new_test_keyspace
topology/test_tls: test_upgrade_to_ssl: use new_test_keyspace
test/topology/util: new_test_keyspace: drop keyspace only on success
test/topology/util: refactor new_test_keyspace
test/topology/util: CREATE KEYSPACE IF NOT EXISTS
test/topology/util: new_test_keyspace: accept ManagerClient
Replace boost::accumulate() calls with std::ranges::fold_left(). This
change reduces external dependencies and modernizes the codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22924
Replace boost::find() calls with std::ranges::find() and std::ranges::contains()
to leverage modern C++ standard library features. This change reduces external
dependencies and modernizes the codebase.
The following changes were made:
- Replaced boost::find() with std::ranges::find() where index/iterator is needed
- Used std::ranges::contains() for simple element presence checks
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22920
rebalance_tablets() was performing migrations and merges automatically
but not splits, because splits need to be acked by replicas via
load_stats. It's inconvenient in tests which want to rebalance to the
equilibrium point. This patch changes rebalance_tablets() to split
automatically by default, can be disabled for tests which expect
differently.
shared_load_stats was introduced to provide a stable holder of
load_stats which can be reused across rebalance_tablets() calls.
Resize is no longer only due to avg tablet size. Log avg tablet size as an
information, not the reason, and log the true reason for target tablet
count.
Hints have common meaning for all strategies, so the logic
belongs more to make_sizing_plan().
As a side effect, we can reuse shard capacity computation across
tables, which reduces computational complexity from O(tables*nodes) to
O(tables * DCs + nodes)
The limit is enforced by controlling average per-shard tablet replica
count in a given DC, which is controlled by per-table tablet
count. This is effective in respecting the limit on individual shards
as long as tablet replicas are distributed evenly between shards.
There is no attempt to move tablets around in order to enforce limits
on individual shards in case of imbalance between shards.
If the average per-shard tablet count exceeds the limit, all tables
which contribute to it (have replicas in the DC) are scaled down
by the same factor. Due to rounding up to the nearest power of 2,
we may overshoot the per-shard goal by at most a factor of 2.
If different DCs want different scale factors of a given table, the
lowest scale factor is chosen for a given table.
The limit is configurable. It's a global per-cluster config which
controls how many tablet replicas per shard in total we consider to be
still ok. It controls tablet allocator behavior, when choosing initial
tablet count. Even though it's a per-node config, we don't support
different limits per node. All nodes must have the same value of that
config. It's similar in that regard to other scheduler config items
like tablets_initial_scale_factor and target_tablet_size_in_bytes.
This makes decisions made by the scheduler consistent with decisions
made on table creation, with regard to tablet count.
We want to avoid over-allocation of tablets when table is created,
which would then be reduced by the scheduler's scaling logic. Not just
to avoid wasteful migrations post table creation, but to respect the
per-shard goal. To respect the per-shard goal, the algorithm will no
longer be as simple as looking at hints, and we want to share the
algorithm between the scheduler and initial tablet allocator. So
invoke the scheduler to get the tablet count when table is created.
This is in preparation for using the sizing plan during table creation
where we never have size stats, and hints are the only determining
factor for target tablet count.
Resize plan making will now happen in two stages:
1) Determine desired tablet counts per table (sizing plan)
2) Schedule resize decisions
We need intermediate step in the resize plan making, which gives us
the planned tablet counts, so that we can plug this part of the
algorithm into initial tablet allocation on table construction.
We want decisisons made by the scheduler to be consistent with
decisions made on table creation. We want to avoid over-allocation of
tablets when table is created, which would then be reduced by the
scheduler. Not just to avoid wasteful migrations post table creation,
but to respect the per-shard goal. To respect the per-shard goal, the
algorithm will no longer be as simple as looking at hints, and we want
to share the algorithm between the scheduler and initial tablet
allocator.
Also, this sizing plan will be later plugged into a virtual table for
observability.
Logic is preserved since target tablet size is constant for all
tables.
Dropping d.target_max_tablet_size() will allow us to move it
to the load_balancer scope.
Currently the scale is applied post rounding up of tablet count so
that tablet count per shard is at least 1. In order to be able to use
the scale to increase tablet count per shard, we need to apply it
prior to division by RF, otherwise we will overshoot per-shard tablet
replica count.
Example:
4 nodes, -c1, rf=3, initial_tablets_scale=10
Before: initial_tablet_count=20, tablet-per-shard=15
After: initial_tablet_count=14, tablets-per-shard=10.5
This will result in new tables having at least 10 tablet replicas per
shard by default.
We want this to reduce tablet load imbalance due to differences in
tablet count per shard, where some shards have 1 tablet and some
shards have 2 tablets. With higher tablet count per shard, this
difference-by-one is less relevant.
Fixes#21967
In some tests, we explicity set the initial scale to 1 as some of the
existing tests assume 1 compaction group per shard.
test.py uses a lower default. Having many tablets per shard slows down
certain topology operations like decommission/replace/removenode,
where the running time is proportional to tablet count, not data size,
because constant cost (latency) of migration dominates. This latency
is due to group0 operations and barriers. This is especially
pronounced in debug mode. Scheduler allows at most 2 migrations per
shard, so this latency becomes a determining factor for decommission
speed.
To avoid this problem in tests, we use lower default for tablet count per
shard, 2 in debug/dev mode and 4 in release mode. Alternatively, we
could compensate by allowing more concurrency when migrating small
tablets, but there's no infrastructure for that yet.
I observed that with 10 tablets per shard, debug-mode
topology_custom.mv/test_mv_topology_change starts to time-out during
removenode (30 s).
wrapped writer in seastar::futurize_invoke to make sure that the close() for the mutation_reader can be executed before destruction.
Fixes#22790Closesscylladb/scylladb#22812
It was only needed there for create_pending_deletion_log() method to get
private "_storage" from sstable. Now it's all gone and friendship can be
broken.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
... to where it belongs -- to the filesystem storage driver itself.
Continuation of the previous patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method in question walks the list of sstables and accumulates
sstables' prefixes into a set on pending_delete_result object. The set
in question is not used at all in this method and is in fact alien to it
-- the p.d._result object is used by the filesystem storage driver as
atomic deletion prepare/commit transparent context.
Said that, move the whole pending_delete_result to where it belongs and
relax the create_pending_deletion_log() to only return the log
directory path string.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The current code takes sstable prefix() (e.g. the /foo/bar string), then
trims from its fron the basedir (e.g. the /foo/ string) and then writes
the remainder, a slash and TOC component name (e.g. the xxx-TOC.txt
string). The final result is "bar/xxx-TOC.txt" string.
The taking into account sstable.toc_filename() renders into
sstable.prefix + \slash + component-name, the above result can be
achieved by trimming basedir directory from toc_filename().
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The lambda which dumps the diagnostics for each semaphore, is static.
Considering that said lambda captures a local (writeln) by reference, this
is wrong on two levels:
* The writeln captured on the shard which happens to initialize this
static, will be used on all shards.
* The writeln captured on the first dump, will be used on later dumps,
possibly triggering a segfault.
Drop the `static` to make the lambda local and resolve this problem.
Fixes: scylladb/scylladb#22756Closesscylladb/scylladb#22776
Recently, when running Alternator tests we get hundreds of warnings like
the following from basically all test files:
/usr/lib/python3.12/site-packages/botocore/crt/auth.py:59:
DeprecationWarning: datetime.datetime.utcnow() is deprecated and
scheduled for removal in a future version. Use timezone-aware objects
to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
/usr/local/lib/python3.12/site-packages/pytest_elk_reporter.py:299:
DeprecationWarning: datetime.datetime.utcnow() is deprecated and
scheduled for removal in a future version. Use timezone-aware objects
to represent datetimes in UTC: datetime.datetime.now(datetime.UTC).
These warnings all come from two libraries that we use in the tests -
botocore is used by Alternator tests, and elk reporter is a plugin that
we don't actually use, but it is installed by dtest and we often see
it in our runs as well. These warnings have zero interest to us - not
only do we not care if botocore uses some deprecated Python APIs and
will need to be updated in the future, all these warnings are hiding
*real* warnings about deprecated things we actually use in our own
test code.
The patch modifies test/pytest.ini (used by all our Python tests,
including but not limited to Alternator tests) to ignore deprecation
warnings from *inside* these two libraries, botocore and elk_reporter.
After this patch, test/alternator/run finishes without any warnings
at all. test/cqlpy does still have a few warnings left, which earlier
were hidden by the thousands of spammy warning eliminated in this patch.
We fix one of these warnings in this patch:
ResultSet indexing support will be removed in 4.0.
Consider using ResultSet.one()
by doing exactly what the warning recommended.
Some deprecation warnings in test/cqlpy remain in calls to
get_query_trace(). The "blame" for these warning is misplaced - this
function is part of the cassandra driver, but Python seems to think it's
part of our test code so I can't avoid them with the pytest.ini trick,
I'm not sure why. So I don't know yet how to eliminate these last warnings.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22881
Modify CMake configuration to only apply "-Xclang" options when building
with the Clang compiler. These options are Clang-specific and can cause
errors or warnings when used with other compilers like g++.
This change:
- Adds compiler detection to conditionally apply Clang-specific flags
- Prevents build failures when using non-Clang compilers
Previously, the build system would apply these flags universally, which
could lead to compilation errors with other compilers.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22899
Use std::to_underlying() when comparing unsigned types with enumeration values
to fix type mismatch warnings in GCC-14. This specifically addresses an issue in
utils/advanced_rpc_compressor.hh where comparing a uint8_t with 0 triggered a
'-Werror=type-limits' warning:
```
error: comparison is always false due to limited range of data type [-Werror=type-limits]
if (x < 0 || x >= static_cast<underlying>(type::COUNT))
~~^~~
```
Using std::to_underlying() provides clearer type semantics and avoids these kind
of comparison warnings. This change improves code readability while maintaining
the same behavior.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22898
Using the new_test_keyspace fixture is awkward for this test
as it is written to explicitly drop the created keyspaces
at certain points.
Therefore, just use create_new_test_keyspace to standardize the
creation procedure.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
new_test_keyspace is problematic here since
the presence of the banned node can fail the automatic drop of
the test keyspace due to NoHostAvailable (in debug mode for
some reason)
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Define create_new_test_keyspace that can be used in
cases we cannot automatically drop the newly created keyspace
due to e.g. loss of raft majority at the end of the test.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Following patch will convert topology tests to use
new_test_keyspace and friends.
Some tests restart server and reset the driver connection
so we cannot use the original cql Session for
dropping the created keyspace in the `finally` block.
Pass the ManagerClient instead to get a new cql
session for dropping the keyspace.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, we can not have more than one global topology operation at the same time. This means that we can not have concurrent truncate operations because truncate is implemented as a global topology operation.
Truncate excludes with other topology operations, and has to wait for those to complete before truncate starts executing. This can lead to truncate timeouts. In these cases the client retries the truncate operation, which will check for ongoing global topology operations, and will fail with
an "Another global topology request is ongoing, please retry." error.
This can be avoided by truncate checking if the ongoing global topology operation is a truncate running for the same table who's truncate has just been requested again. In this case, we can wait for the ongoing truncate to complete instead of immediately failing the operation, and
provide a better user experience.
This is an improvement, backport is not needed.
Closes#22166Closesscylladb/scylladb#22371
* github.com:scylladb/scylladb:
test: add test for re-cycling ongoing truncate operations
truncate: add additional logging and improve error message during truncate
storage_proxy: wait on already running truncate for the same table
storage_proxy: allow multiple truncate table fibers per shard
Return after executing the global metadata barrier to allow the topology
handler to handle any transitions that might have started by a
concurrect transaction.
Fixes#22792
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#22793
Refs #22628
Adds exception handler + cleanup for the case where we have a bad config/env vars (hint minio) or similar, such that we fail with exception during setting up the EAR context. In a normal startup, this is ok. We will report the exception, and the do a exit(1).
In tests however, we don't and active context will instead be freed quite proper, in which case we need to call stop to ensure we don't crash on shared pointer destruction on wrong shard. Doing so will hide the real issue from whomever runs the test.
Adds some verbosity to track issues with the network proxy used to test EAR connector difficulties. Also adds an earlier close in input stream to help network usage.
Note: This is a diagnostic helper. Still cannot repro the issue above.
Closesscylladb/scylladb#22810
* github.com:scylladb/scylladb:
gcp/aws kms: Promote service_error to recoverable + use malformed_response_error
encryption_at_rest_test: Add verbosity + earlier stream close to proxy
encryption: Add exception handler to context init (for tests)
Replace boost::accumulate() with the standard library's alternatives
to reduce external dependencies and simplify the codebase. This
change eliminates the requirement for boost::range and makes the
implementation more maintainable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22856
Disable seastar's built in handlers for SIGINT and SIGTERM and thus
fall-back to the OS's default handlers, which terminate the process.
This makes tool applications interruptable by SIGINT and SIGTERM.
The default handler just terminates the tool app immediately and
doesn't allow for cleanup, but this is fine: the tools have no important
data to save or any critical cleanup to do before exiting.
Fixes: scylladb/scylladb#16954Closesscylladb/scylladb#22838
The config variable `components_memory_reclaim_threshold` limits the
memory available to the sstable bloom filters. Any change to its value
is not immediately propagated to the sstable manager, despite it being
a LiveUpdate variable. The updated value takes effect only when a new
sstable is created or deleted.
This PR first refactors the reclaim and reload logic into a single
background fiber. It then updates the sstable manager to subscribe to
changes in the `components_memory_reclaim_threshold` configuration value
and immediately triggers the reclaim/reload fiber when a change is
detected.
Fixes#21947
This is an improvement and does not need to be backported.
Closesscylladb/scylladb#22725
* github.com:scylladb/scylladb:
sstables_manager: trigger reclaim/reload on `components_memory_reclaim_threshold` update
sstables_manager: maybe_reclaim_components: yield between iterations
sstables_manager: rename `increment_total_reclaimable_memory_and_maybe_reclaim()`
sstables_manager: move reclaim logic into `components_reclaim_reload_fiber()`
sstables_manager: rename `_sstable_deleted_event` condition variable
sstables_manager: rename `components_reloader_fiber()`
sstables_manager: fix `maybe_reclaim_components()` indentation
sstables_manager: reclaim components memory until usage falls below threshold
sstables_manager: introduce `get_components_memory_reclaim_threshold()`
sstables_manager: extract `maybe_reclaim_components()`
sstables_manager: fix `maybe_reload_components()` indentation
sstables_manager: extract out `maybe_reload_components()`
The config variable `components_memory_reclaim_threshold` limits the
memory available to the sstable bloom filters. Any change to its value
is not immediately propagated to the sstable manager, despite it being
a LiveUpdate variable. The updated value takes effect only when a new
sstable is created or deleted.
This patch updates the sstable manager to subscribe to any changes in
the above mentioned config value and immediately trigger the
reclaim/reload fiber when a change occurs. Also, adds a testcase to
verify the fix.
Fixes#21947
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Refs #22628
Mark problems parsing response (partial message, network error without exception etc
- hello testing), as "malformed_response_error", and promote this as well as
general "service_error" to recoverable exceptions (don't isolate node on error).
This to better handle intermittent network issues as well as making error-testing
more deterministic.
Refs #22628
Adds some verbosity to track issues with the network proxy used to test
EAR connector difficulties. Also adds an earlier close in input stream
to help network usage.
Note: This is a diagnostic helper. Still cannot repro the issue above.
Adds exception handler + cleanup for the case where we have a
bad config/env vars (hint minio) or similar, such that we fail
with exception during setting up the EAR context.
In a normal startup, this is ok. We will report the exception,
and the do a exit(1).
In tests however, we don't and active context will instead be
freed quite proper, in which case we need to call stop to ensure
we don't crash on shared pointer destruction on wrong shard.
Doing so will hide the real issue from whomever runs the test.
Serializing `raft::append_request` for transmission requires approximately the same amount of memory as its size. This means when the Raft library replicates a log item to M servers, the log item is effectively copied M times. To prevent excessive memory usage and potential out-of-memory issues, we limit the total memory consumption of in-flight `raft::append_request` messages.
Fixesscylladb/scylladb#14411Closesscylladb/scylladb#22835
* github.com:scylladb/scylladb:
raft_rpc::send_append_entries: limit memory usage
fms: extract entry_size to log_entry::get_size
Allows querying the content of sstables. Simple queries can be
constructed on the command-line. More advanced queries can be passed in
a file. The output can be text (similar to CQLSH) or json (similar to
SELECT JSON).
Uses a cql_test_env behind the scenes to set-up a query pipeline. The
queried sstables are not registered into cql_test_env, instead they are
queried via the virtual-table interface. This is to isolate the sstables
from any accidental modifications cql_test_env might want to do to them.
tool_app_template::run() calls get_selected_operation() to obtain the
operation (command) the user selected. To do this,
get_selected_operation() does a CLI pre-parsing pass, with a minimal
boost::program_options, so things like mixed positional/non-positional
args are correctly handled. This code use `sstring` for generic
operation-options. The problem is that boost doesn't allow values with
spaces inside for non-std::string types. This therefore prevents such
values from being used for any option downstream, because parsing would
fail at this stage. Change the type to std::string to solve this
problem.
Exposes the RawValue() method of the underlying rapidjson::Writer. This
method allows writing a pre-formatted json value to the stream. This
will allow using cql3/type_json.hh to pre-format CQL3 types, then write
these pre-formatted values into a json stream.
This variant of do_with_cql_env(), forgoes the reentrancy support in the
regular do_with_cql_env() variants, and re-uses the caller's exsting
seastar thread. This is an optimized version for callers which don't
need reentrancy and already have a thread.
The test simulates the cluster getting stuck during upgrade to raft
topology due to majority loss, and then verifies that it's possible
to get out of the situation by performing recovery and redoing the
upgrade.
Fixes: scylladb/scylladb#17410
Python 3.11 introduces asyncio.TaskGroup, which I would like to use in a
test that I'll introduce in the next commit. Modify the python version
check in test.py to prevent from accidentally running with an older
version of python.
The gather_safely function was originally defined in the
test.pylib.scylla_cluster module, but it is a generic concurrency
combinator which is not tied to the concept of Scylla clusters at all.
Move it to test.pylib.util to make this fact more clear.
In legacy topology mode, on startup, a node will attempt to insert data
of the newest CDC generation into the legacy distributed tables. In case
of any errors, the operation will be retried until success in 60s
intervals. While the node waits for the operation to be retried, it
keeps a token_metadata_ptr instance. This is a problem for two reasons:
- The tmptr instance is used in a lambda which determines the cluster
size. This lambda is used to determine the consistency level when
inserting the generation to the distributed tables - if there is only
one node, CL=ONE should be used instead of CL=QUORUM. The tmptr is
immutable so it can technically happen the the cluster is shrinked
while the code waits for the generation to be inserted.
- Token metadata instance keeps a version tracker that which prevents
topology operations from proceeding while the tracker exists. This is
a very niche problem, but it might happen that a leftover instance of
token metadata held by update_streams_description might delay a
topology operation which happens after upgrade to raft topology
happens. This actually slows down the test which simulates upgrade to
raft topology getting stuck (to be introduced in later commits).
Instead of capturing a token_metadata_ptr instance, capture a reference
to shared_token_metadata and use a freshly issued token_metadata_ptr
when computing the cluster size in order to choose the consistency
level.
Now, check_system_topology_and_cdc_generations_v3_consistency has an
additional list argument and will ignore hosts from that list if some of
them are found to be in the "left" state. Additionally, the function now
requires that the set of the live hosts in the cluster is exactly
`live_hosts` - no more, no less.
It will be needed for the test which simulates upgrade procedure getting
stuck - "un-stucking" the procedure requires removing some nodes via
legacy removenode procedure which marks them as "left" in gossip, and
then those nodes might get inserted as "left" nodes into raft topology
by the gossiper orphan remover fiber.
Some of the existing tests had to be adjusted because of the changes:
- test_unpublished_cdc_generations_arent_cleared passed only one of the
cluster's live hosts, now it passes all of them.
- test_topology_recovery_after_majority_loss removes some nodes during
the test, so they need to be put into the ignore_nodes list.
- test_topology_upgrade_basic did not include the last-added node to the
check_system_topology_and_cdc_generations_v3_consistency call, now it
does.
It will be needed for a test that simulates the cluster getting stuck
during upgrade. Specifically, it will be used to simulate network
isolation and to prevent raft commands from reaching that node.
The injection will necessary for the test, introduced in the next
commit, which verifies that it's possible to recover from an upgrade of
raft topology which gets stuck.
Reorder member variable initialization sequence to ensure `pw` is accessed
before being moved. While the current use-after-move warning from clang-tidy
is a false positive, this change:
- Makes the initialization order more logical
- Eliminates misleading static analysis warnings
- Prevents potential future issues if class structure changes
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22830
This commit removes the variable _manage_topology_change_kind_from_group0
which was used earlier as a work around for correctly handling
topology_change_kind variable, it was brittle and had some bugs. Earlier commits
made some modifications to deal with handling topology_change_kind variable
post _manage_topology_change_kind_from_group0 removal
This patch adds to the fetch_scylla.py script, used by the "--release"
option of test/{cqlpy,alternator}/run, the ability to download the new
2025.1 releases.
In the new single-stream releases, the number looks like the old
Scylla Enterprise releases, but the location of the artifacts in the
S3 bucket look like the old open-source releases (without the word
"-enterprise" in the paths). So this patch introduces a new "if"
for the (major >= 2025) case.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22778
This change adds two log messages. One for the creation of the truncate
global topology request, and another for the truncate timeout. This is
added in order to help with tracking truncate operation events.
It also extends the "Another global topology request is ongoing, please
retry." error message with more information: keyspace and table name.
Currently, we can not have more than one global topology operation at
the same time. This means that we can not have concurrent truncate
operations because truncate is implemented as a global topology
operation.
Truncate excludes with other topology operations, and has to wait for
those to complete before truncate starts executing. This can lead to
truncate timeouts. In these cases the client retries the truncate operation,
which will check for ongoing global topology operations, and will fail with
an "Another global topology request is ongoing, please retry." error.
This can be avoided by truncate checking if we have a truncate for the same
table already queued. In this case, we can wait for the ongoing truncate to
complete instead of immediatelly failing the operation, and provide a better
user experience.
Both, repair and streaming depend on view builder, but since the builder is started too late, both keep sharded<> reference on it and apply `if (view_builder.local_is_initialized())` safety checks.
However, view builder can do its sharded start much earlier, there's currently nothing that prevents it from doing so. This PR moves view builder start up together with some other of its dependencies, and relaxes the way repair and streaming use their view-builder references, in particular -- removes those ugly initialization checks.
refs: scylladb/scylladb#2737Closesscylladb/scylladb#22676
* github.com:scylladb/scylladb:
streaming: Relax streaming::make_streamig_consumer() view builder arg
streaming: Keep non-sharded view_builder dependency reference
streaming: Remove view_builder.local_is_initialized() checks
repair: Keep non-sharded view_builder dependency reference
repair: Remove view_builder.local_is_initialized() checks
main: Start sharded<view_builder> earlier
test/cql_env: Move stream manager start lower
Replace legacy shell test operator (-o) with more portable OR (||) syntax.
Fix fragile file handling in find loop by using while read loop instead.
Warnings fixed:
- SC2166: Replace [ p -o q ] with [ p ] || [ q ]
- SC2044: Replace for loop over find with while read loop
While no issues were observed with the current code, these changes improve
robustness and portability across different shell environments.
also, set the pipefail option, so that we can catch the unexpected
failure of `find` command call.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22385
This helper now fully duplicates the validate_table() one, so it
can be removed. Two callers are updated respectively.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
There are several places that validate_table() and then call
database::find_column_family(ks, cf) which goes and repeats the search
done by validate_table() before that.
To remove the unneeded work, re-use the table_id found by
validate_table() helper.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper calls database::find_column_family() and ignores the result.
The intention of this is just to check if the c.f. in question exists.
The find_column_family() in turn calls find_uuid() and then finds the
c.f. object using the uuid found. The latter search is not supposed to
fail, if it does, the on_internal_error() is called.
Said that, replacing find_column_family() with find_uuid() is
idempotent. And returning the found table_id will be used by next patch.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to be in-sync with another get_uuid() helper from API. This, in
turn, is to ease the unification of those two, because they are
effectively identical (see next patches)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It is redundant with reader_permit::impl::_ttl_timer. Use the latter for
TTL of inactive reads too. The usage of the two exclude each other, at
any point in time, either one or the other is used, so no reason to keep
both.
Closesscylladb/scylladb#22863
Demote do-nothing decisions to debug level, but keep them at info
if we did decide to do nothing (such as migrate a tablet). Information
about more major events (like split/merge) is kept at info level.
Once log line that logs node information now also logs the datacenter,
which was previously supplied by a log line that is now debug-only.
Closesscylladb/scylladb#22783
Currently, the tablet repair scheduler repairs all replicas of a tablet. It does not support hosts or DCs selection. It should be enough for most cases. However, users might still want to limit the repair to certain hosts or DCs in production. https://github.com/scylladb/scylladb/pull/21985 added the preparation work to add the config options for the selection. This patch adds the hosts or DCs selection support.
Fixes https://github.com/scylladb/scylladb/issues/22417
New feature. No backport is needed.
Closesscylladb/scylladb#22621
* github.com:scylladb/scylladb:
test: add test to check dcs and hosts repair filter
test: add repair dc selection to test_tablet_metadata_persistence
repair: Introduce Host and DC filter support
docs: locator: update the docs and formatter of tablet_task_info
result_set_row is a heavyweight object containing multiple cell types:
regular columns, partition keys, and static values. To prevent expensive
accidental copies, delete the copy constructor and replace it with:
1. A move constructor for efficient vector reallocation
2. An explicit copy() method when copies are actually needed
This change reduces overhead in some non-hot paths by eliminating implicit
deep copies. Please note, previously, in `create_view_from_mutation()`,
we kept a copy of `result_set_row`, and then reused `table_rs` for
holding the mutation for `scylla_tables`. Because we don't copy
the `result_set_row` in this change, in order to avoid invalidating
the `row` after reusing `table_rs` in the outer scope, we define a
new `table_rs` shadowing the one in the out scope.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22741
The existing test measures latencies of object GET-s. That's nice (though incomplete), but we want to measure upload performance. Here it is.
refs: #22460Closesscylladb/scylladb#22480
* github.com:scylladb/scylladb:
test/perf/s3: Add --part-size-mb option for upload test
test/perf/s3: Add uploading test
test/perf/s3: Some renames not to be download-centric
test/perf/s3: Make object/file name configurable
test/perf/s3: Configure maximum number of sockets
test/perf/s3: Remove parallelizm
s3/client: Make http client connections limit configurable
Given two sets of equivalent types, return the set
intersection.
This is a generic function which adapts to the actual
input type.
A unit test is added.
Closesscylladb/scylladb#22763
this PR is propper(pythonic) chance of commit 288a47f815
Creating an own folder used to be needed for two reasons:
we want a separate test suite, with its own settings
we want to structure tests, e.g. tablets, raft, schema, gossip.
We've been creating many folders recently. However, test suite
infrastructure is expensive in test.py - each suite has its own
pool of servers, concurrency settings and so on.
Make it possible to structure tests without too many suites,
by supporting subfolders within a suite.
As an example, this PR move mv tests into a separate folder
custom test.py lookup also works.
tests can be run as:
1. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets/test_mv_tablets_empty_ip
2. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets
3. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv
Fixes https://github.com/scylladb/scylladb/issues/20570Closesscylladb/scylladb#22816
* github.com:scylladb/scylladb:
test.py: move mv tests into a separate folder
test.py: suport subfolders
Seems tox is not used anywhere, so there is no need to have it then.
Especially when it messes with pytest. In some cases it can change the
config dir in pytest run.
Closesscylladb/scylladb#22819
Table updates that try to enable stream (while changing or not the
StreamViewType) on a table that already has the stream enabled
will result in ValidationError.
Table updates that try to disable stream on a table that does not
have the stream enabled will result in ValidationError.
Add two tests to verify the above.
Mark the test for changing the existing stream's StreamViewType
not to xfail.
Fixesscylladb/scylladb#6939Closesscylladb/scylladb#22827
Switch to using schema_ptr wrapper when handling schema references in
scylla_read_stats function. The existing fallback for older versions
(where schema is already a raw pointer) remains preserved.
Fixes#18700
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Closesscylladb/scylladb#22726
Fixes#22688
If we set a dc rf to zero, the options map will still retain a dc=0 entry.
If this dc is decommissioned, any further alters of keyspace will fail,
because the union of new/old options will now contained an unknown keyword.
Change alter ks options processing to simply remove any dc with rf=0 on
alter, and treat this as an implicit dc=0 in nw-topo strategy.
This means we change the reallocate_tablets routine to not rely on
the strategy objects dc mapping, but the full replica topology info
for dc:s to consider for reallocation. Since we verify the input
on attribute processing, the amount of rf/tablets moved should still
be legal.
v2:
* Update docs as well.
v3:
* Simplify dc processing
* Reintroduce options empty check, but do early in ks_prop_defs
* Clean up unit test some
Closesscylladb/scylladb#22693
The test
topology_custom/test_alternator::test_localnodes_broadcast_rpc_address
sets up nodes with a silly "broadcast rpc address" and checks that
Alternator's "/localnodes" requests returns it correctly.
The problem is that although we don't use CQL in this test, the test
framework does open a CQL connection when the test starts, and closes
it when it ends. It turns out that when we set a silly "broadcast RPC
address", the driver tends to try to connect to it when shutting down,
I'm not even sure why. But the choice of the silly address was 1.2.3.4
is unfortunate, because this IP address is actually routable - and
the driver hangs until it times out (in practice, in a bit over two
minutes). This trivial patch changes 1.2.3.4 to 127.0.0.0 - and equally
silly address but one to which connections fail immediately.
Before this patch, the test often takes more than 2 minutes to finish
on my laptop, after this patch, it always finishes in 4-5 seconds.
Fixes#22744
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22746
The code currently assumes that a session has both sender and receiver
streams, but it is possible to have just one or the other.
Change the test to include this scenario and remove this assumption from
the code.
Fixes: #22770Closesscylladb/scylladb#22771
This PR addresses two related issues in our task system:
1. Prepares for asynchronous resource cleanup by converting release_resources() to a coroutine. This refactoring enables future improvements in how we handle resource cleanup.
2. Fixes a cross-shard resource cleanup issue in the SSTable loader where destruction of per-shard progress elements could trigger "shared_ptr accessed on non-owner cpu" errors in multi-shard environments. The fix uses coroutines to ensure resources are released on their owner shards.
Fixes#22759
---
this change addresses a regression introduced by d815d7013c, which is contained by 2025.1 and master branches. so it should be backported to 2025.1 branch.
Closesscylladb/scylladb#22791
* github.com:scylladb/scylladb:
sstable_loader: fix cross-shard resource cleanup in download_task_impl
tasks: make release_resources() a coroutine
This commit eliminates unused boost header includes from the tree.
Removing these unnecessary includes reduces dependencies on the
external Boost.Adapters library, leading to faster compile times
and a slightly cleaner codebase.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22857
In a rolling upgrade, nodes that weren't upgraded yet will not recognize
the new tablet_resize_finalization state, that serves both split and
merges, leading to a crash. To fix that, coordinator will pick the
old tablet_split_finalization state for serving split finalization,
until the cluster agrees on merge, so it can start using the new
generic state for resize finalization introduced in merge series.
Regression was introduced in e00798f.
Fixes#22840.
Reported-by: Tomasz Grabiec <tgrabiec@scylladb.com>
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#22845
The script fetch_scylla.py is used by the "--release" option of
test/cqlpy/run and test/alternator/run to fetch a given release of
Scylla. The release is fetched from S3, and the script assumed that the
user properly set up $HOME/.aws/config and $HOME/.aws/credentials
to determine the source of that download and the credentials to do this.
But this is unnecessary - Scylla's "downloads.scylladb.com" bucket
actually allows **anonymous** downloads, and this is what we should use.
After this patch, fetch_scylla.py (and the "--release" option of the
run scripts) work correctly even for a user that doesn't have $HOME/.aws
set up at all.
This fix is especially important to new developers, who might not even
have AWS credentials to put into these files.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22773
Two callers of it -- repair and stream-manager -- both have non-sharded
reference and can just use it as argument. The helper in question gets
sharded<> one by itself.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Continuation of the previous path -- view builder is started early
enough and construction of stream manager can happen with non-sharded
reference on it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Continuation of the previous path -- view builder is started early
enough and construction of repair service can happen with non-sharded
reference on it.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The view_builder service is needed by repair service, but is started
after it. It's OK in a sense that repair service holds a sharded
reference on it and checks whether local_is_initialized() before using
it, which is not nice.
Fortunately, starting sharded view buidler can be done early enough,
because most of its dependencies would be already started by that time.
Two exceptions are -- view_update_generator and
system_distributed_keyspace. Both can be moved up too with the same
justification.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This is to keep it in-sync with main code, where stream manager is
started after storage_proxy's and query_processor's remotes. This
doesn't change nothing for now, but next patches will move other
services around main/cql_test_env and early start of stream manager in
cql_test_env will be problematic.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Renamed the aboved mentioned method to `increment_total_reclaimable_memory()`
as it doesn't directly reclaim memory anymore.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Move the sstable reclaim logic into `components_reclaim_reload_fiber()`
in preparation for the fix for #21947. This also simplifies the overall
reclaim/reload logic by preventing multiple fibers from attempting to
reclaim/reload component memory concurrently.
Also, update the existing test cases to adapt to this change.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Rename the `_sstable_deleted_event` condition variable to
`_components_memory_change_event` as it will be used by future patches
to signal changes in sstable component memory consumption, (i.e.)
during sstable create and delete, and also when the
`components_memory_reclaim_threshold` config value is changed.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
A future patch will move components reclaim logic into the current
`components_reloader_fiber()`, so to reflect its new purpose, rename it
to `components_reclaim_reload_fiber()`.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
The current implementation reclaims memory from SSTables only when a new
SSTable is created. An upcoming patch will move this reclaim logic into
the existing component reloader fiber. To support this change, the
`maybe_reclaim_components()` method is updated to reclaim memory until
the total memory consumption falls below the configured threshold.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Introduce `get_components_memory_reclaim_threshold()`, which returns
the components' memory threshold based on the total available memory.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Extract the code from
`increment_total_reclaimable_memory_and_maybe_reclaim()` that reclaims
the components memory into `maybe_reclaim_components()`. The extracted
new method will be used by a following patch to handle reclaim within
the components reload fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Extract the logic that reloads reclaimed components into memory in the
`components_reloader_fiber()` method into a separate method. This is in
preparation for moving the reclaim logic into the same fiber.
Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com>
Test now uses default internal part size, but for performance
comparisons its good to make it configurable.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test picks up a file and uploads it into the bucket, then prints the
time it took and uploading speed. For now it's enough, with existing S3
latencies more timing details can be obtained by turning on trace
logging on s3 logger.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now this test is all about reading objects. Rename some bits in it so
that they can be re-used by future uploading test as well.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Now the download test first creates a temporary object and then reads
data from it. It's good to have an option to download pre-existing file.
This option will also be used for uploading test (next patches)
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Add the --sockets NR option that limits the number of sockets the
underlying http client is configured to have.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The test spawns several fibers that read the same file in parallel.
There's not much point in it, just makes the code harder to maintain.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
In order to allow concurrent truncate table operations (for the time being,
only for a single table) we have to remove the limitation allowing only one
truncate table fiber per shard.
This change adds the ability to collect the active truncate fibers in
storage_proxy::remote into std::list<> instead of having just a single
truncate fiber. These fibers are waited for completion during
storage_proxy::remote::stop().
In the current scenario, topology_change_kind variable, was been handled using
_manage_topology_change_kind_from_group0 variable. This method was brittle
and had some bugs(e.g. for restart case, it led to a time gap between group0
server start and topology_change_kind being managed via group0)
Post _manage_topology_change_kind_from_group0 removal, careful management of
topology_change_kind variable was needed for maintaining correct
topology_change_kind in all scenarios. So this PR also performs a refactoring
to populate all init data to system tables even before group0 creation(via
raft_initialize_discovery_leader function). Now because raft_initialize_discovery_leader
happens before the group 0 creation, we write mutations directly to system
tables instead of a group 0 command. Hence, post group0 creation, the node
can read the correct values from system tables and correct values are
maintained throughout.
Added a new function initialize_done_topology_upgrade_state which takes
care of updating the correct upgrade state to system tables before starting
group0 server. This ensures that the node can read the correct values from
system tables and correct values are maintained throughout.
By moving raft_initialize_discovery_leader logic to happen before starting
group0 server, and not as group0 command post server start, we also get rid
of the potential problem of init group0 command not being the 1st command on
the server. Hence ensuring full integrity as expected by programmer.
Fixes: scylladb/scylladb#21114
Currently, the tablet repair scheduler repairs all replicas of a tablet.
It does not support hosts or DCs selection. It should be enough for most
cases. However, users might still want to limit the repair to certain
hosts or DCs in production. #21985 added the preparation work to add the
config options for the selection. This patch adds the hosts or DCs
selection support.
Fixes#22417
Previously, download_task_impl's destructor would destroy per-shard progress
elements on whatever shard the task was destroyed on. In multi-shard
environments, this caused "shared_ptr accessed on non-owner cpu" errors when
attempting to free memory allocated on a different shard.
Fix by:
- Convert progress_per_shard into a sharded service
- Stop the service on owner shards during cleanup using coroutines
- Add operator+= to stream_progress to leverage seastar's built-in adder
instead of a custom adder struct
Alternative approaches considered:
1. Using foreign_ptr: Rejected as it would require interface changes
that complicate stream delegation. foreign_ptr manages the underlying
pointee with another smart pointer but does not expose the smart
pointer instance in its APIs, making it impossible to use
shared_ptr<stream_progress> in the interface.
2. Using vector<stream_progress>: Rejected for similar interface
compatibility reasons.
This solution maintains the existing interfaces while ensuring proper
cross-shard cleanup.
Fixesscylladb/scylladb#22759
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Convert tasks::task_manager::task::impl::release_resources() to a coroutine
to prepare for upcoming changes that will implement asynchronous resource
release.
This is a preparatory refactoring that enables future coroutine-based
implementation of resource cleanup logic.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
The timeout of 10 seconds is too small for CI.
I didn't mean to make it so short, it was an accident.
Fix that by changing the timeout to 10 minutes.
Fixesscylladb/scylladb#22832Closesscylladb/scylladb#22836
Token metadata API now depend on gossiper to do ip to host id mappings,
so initialized it after the gossiper is initialized and de-initialized
it before gossiper is stopped.
Fixes: scylladb/scylladb#22743Closesscylladb/scylladb#22760
We need to allow replacing nodetool from scylla-enterprise-tools < 2024.2,
just like we did for scylla-tools < 5.5.
This is required to make packages able to upgrade from 2024.1.
Fixes#22820Closesscylladb/scylladb#22821
Serializing raft::append_request for transmission requires approximately
the same amount of memory as its size. This means when the Raft
library replicates a log item to M servers, the log item is
effectively copied M times. To prevent excessive memory usage
and potential out-of-memory issues, we limit the total memory
consumption of in-flight raft::append_request messages.
Fixes [scylladb/scylladb#14411](https://github.com/scylladb/scylladb/issues/14411)
test_complex_null_values is currently flaky, causing many failures
in CI. The reason for the failures is unclear, and a fix might not
be simple, so because UDFs are experimental, for now let's skip
this test until the corresponding issue is fixed.
Refs scylladb/scylladb#22799Closesscylladb/scylladb#22818
Now that we support suite subfolders,
As an example, this commit move mv tests into a separate folder
custom test.py lookup also works.
tests can be run as:
1. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets/test_mv_tablets_empty_ip
2. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv/tablets
3. ./tools/toolchain/dbuild ./test.py --no-gather-metrics --mode=dev topology_custom/mv
Creating an own folder used to be needed for two reasons:
- we want a separate test suite, with its own settings
- we want to structure tests, e.g. tablets, raft, schema, gossip.
We've been creating many folders recently. However, test suite
infrastructure is expensive in test.py - each suite has its own
pool of servers, concurrency settings and so on.
Make it possible to structure tests without too many suites,
by supporting subfolders within a suite.
Fixes#20570
Said method passes down its `diff` input to `mutate_internal()`, after
some std::ranges massaging. Said massaging is destructive -- it moves
items from the diff. If the output range is iterated-over multiple
times, only the first time will see the actual output, further
iterations will get an empty range.
When trace-level logging is enabled, this is exactly what happens:
`mutate_internal()` iterates over the range multiple times, first to log
its content, then to pass it down the stack. This ends up resulting in
a range with moved-from elements being pased down and consequently write
handlers being created with nullopt mutations.
Make the range re-entrant by materializing it into a vector before
passing it to `mutate_internal()`.
Fixes: scylladb/scylladb#21907Fixes: scylladb/scylladb#21714Closesscylladb/scylladb#21910
When building Scylla with ThinLTO enabled (default with Clang), the linker
spawns threads equal to the number of CPU cores during linking. This high
parallelism can cause out-of-memory (OOM) issues in CI environments,
potentially freezing the build host or triggering the OOM killer.
In this change:
1. Rename `LINK_MEM_PER_JOB` to `Scylla_RAM_PER_LINK_JOB` and make it
user-configurable
2. Add `Scylla_PARALLEL_LINK_JOBS` option to directly control concurrent
link jobs (useful for hosts with large RAM)
3. Increase the default value of `Scylla_PARALLEL_LINK_JOBS` to 16 GiB
when LTO is enabled
4. Default to 2 parallel link jobs when LTO is enabled if the calculated
number if less than 2 for faster build.
Notes:
- Host memory is shared across job pools, so pool separation alone doesn't help
- Ninja lacks per-job memory quota support
- Only affects link parallelism in LTO-enabled builds
See
https://clang.llvm.org/docs/ThinLTO.html#controlling-backend-parallelismFixesscylladb/scylladb#22275
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22383
Oversized materialized view and index names are rejected;
Materialized view names with invalid symbols are rejected.
fixes: #20755Closesscylladb/scylladb#21746
The Intel Optimizaton Manual states that branches with relative offsets
greater than 2GB suffer a penalty. They cite a 6% improvement when this
is avoided. Our code doesn't rely heavily on dynamically linked
libraries, so I don't expect a similar win, but it's still better to do
it than not.
Eliminate long branches by asking the dynamic linker to restrict itself
to the lower 4GB of the address space. I saw that it maps libraries
at 1GB+ addresses, so this satisfies the limitation.
Fix is from the Intel Optimization Manual as well.
This change was ported from ScyllaDB Enterprise.
Closesscylladb/scylladb#22498
This command exists but is not registered. There is a test for it, but
it happens to work only because scylla table is a prefix of scylla
tables (another command), so gdb invokes that other command instead.
Replace boost::remove_if() with the standard library's std::erase_if() or std::ranges::remove_if() to reduce external dependencies and simplify the codebase. This change eliminates the requirement for boost::range and makes the implementation more maintainable.
---
it's a cleanup, hence no need to backport.
Closesscylladb/scylladb#22788
* github.com:scylladb/scylladb:
service: migrate from boost::range::remove_if() to std::ranges::remove_if
sstable: migrate from boost::remove_if() to std::erase_if()
Replace boost::copy() with the standard library's std::ranges::copy()
to reduce external dependencies and simplify the codebase. This change
eliminates the requirement for boost::range and makes the implementation
more maintainable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22789
Set true to wait for the repair to complete. Set false to skip waiting
for the repair to complete. When the option is not provided, it defaults
to false.
It is useful for management tool that wants the api to be async.
Fixes#22418Closesscylladb/scylladb#22436
prepare helpfully prints out the path where optimized clang is stored,
but a couple of typos mean it prints out an empty string. Fix that.
Closesscylladb/scylladb#22714
Developers are expected to run new cqlpy tests against Cassandra - to
verify that the new test itself is correct. Usually there is no need
to run the entire cqlpy test suite against Cassandra, but when users do
this, it isn't confidence-inspiring to see hundreds of tests failing.
In this patch I fix many but not all of these failures.
Refs #11690 (which will remain open until we fix all the failures on
Cassandra)
* Fixed the "compact_storage" fixture recently introduced to enable the
deprecated feature in Scylla for the tests. This fixture was broken on
Cassandra and caused all compact-storage related tests to fail
on Cassandra.
* Marked all tests in test_tombstone_limit.py as scylla_only - as they
check the Scylla-only query_tombstone_page_limit configuration option.
* Marked all tests in test_service_level_api.py as scylla_only - as they
check the Scylla-only service levels feature.
* Marked a test specific to the Scylla-only IncrementalCompactionStrategy
as scylla_only. Some tests mix STCS and ICS testing in one test - this
is a mistake and isn't fixed in this patch.
* Various tests in test_tablets.py forgot to use skip_without_tablets
to skip them on Cassandra or older Scylla that doesn't have the
tablets feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
x
Closesscylladb/scylladb#22561
Use host_id in a children list of a task in task manager to indicate
a node on which the child was created.
Move TASKS_CHILDREN_REQUEST to IDL. Send it by host_id.
Fixes: https://github.com/scylladb/scylladb/issues/22284.
Ip to host_id transition; backport isn't needed.
Closesscylladb/scylladb#22487
* github.com:scylladb/scylladb:
tasks: drop task_manager::config::broadcast_address as it's unused
tasks: replace ip with host_id in task_identity
api: task_manager: pass gossiper to api::set_task_manager
tasks: keep host_id in task_manager
tasks: move tasks_get_children to IDL
The series fixes a regression and demotes a barrier_and_drain logging error to a warning since this particular condition may happen during normal operation.
We want to backport both since one is a bug fix and another is trivial and reduces CI flakiness.
Closesscylladb/scylladb#22650
* https://github.com/scylladb/scylladb:
topology_coordinator: demote barrier_and_drain rpc failure to warning
topology_coordinator: read peers table only once during topology state application
In this series we implement the UpdateTable operation to add a GSI to an existing table, or remove a GSI from a table. As the individual commit messages will explained, this required changing how Alternator stores materialized view keys - instead of insisting that these key must be real columns (that is **not** the case when adding a GSI to an existing table), the materialized view can now take as its key any Alternator attribute serialized inside the ":attrs" map holding all non-key attributes. Fixes#11567.
We also fix the IndexStatus and Backfilling attributes returned by DescribeTable - as DynamoDB API users use this API to discover when a newly added GSI completed its "backfilling" (what we call "view building") stage. Fixes#11471.
This series should not be backported lightly - it's a new feature and required fairly large and intrusive changes that can introduce bugs to use cases that don't even use Alternator or its UpdateTable operations - every user of CQL materialized views or secondary indexes, as well as Alternator GSI or LSI, will use modified code. **It should be backported to 2025.1**, though - this version was actually branched long after this PR was sent, and it provides a feature that was promised for 2025.1.
Closesscylladb/scylladb#21989
* github.com:scylladb/scylladb:
alternator: fix view build on oversized GSI key attribute
mv: clean up do_delete_old_entry
test/alternator: unflake test for IndexStatus
test/alternator: work around unrelated bug causing test flakiness
docs/alternator: adding a GSI is no longer an unimplemented feature
test/alternator: remove xfail from all tests for issue 11567
alternator: overhaul implementation of GSIs and support UpdateTable
mv: support regular_column_transformation key columns in view
alternator: add new materialized-view computed column for item in map
build: in cmake build, schema needs alternator
build: build tests with Alternator
alternator: add function serialized_value_if_type()
mv: introduce regular_column_transformation, a new type of computed column
alternator: add IndexStatus/Backfilling in DescribeTable
alternator: add "LimitExceededException" error type
docs/alternator: document two more unimplemented Alternator features
Replace boost::range::remove_if() with the standard library's
std::ranges::remove_if() to reduce external dependencies and simplify
the codebase. This change eliminates the requirement for boost::range
and makes the implementation more maintainable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Replace boost::remove_if() with the standard library's std::erase_if()
to reduce external dependencies and simplify the codebase. This change
eliminates the requirement for boost::range and makes the implementation
more maintainable.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
We are supposed to be loading the most recent RPC compression dictionary
on startup, but we forgot to port the relevant piece of logic during
the source-available port. This causes a restarted node not to use the
dictionary for RPC compression until the next dictionary update.
Fix that.
Fixesscylladb/scylladb#22738
This is more of a bugfix than an improvement, so it should be backported to 2025.1.
Closesscylladb/scylladb#22739
* github.com:scylladb/scylladb:
test_rpc_compression.py: test the dictionaries are loaded on startup
raft/group0_state_machine: load current RPC compression dict on startup
Before these changes, shutting down a node could be prolonged because of
mapreduce_service. `mapreduce_service::stop()` uninitializes messaging
service, which includes waiting for all ongoing RPC handlers. We already
had a mechanism for cancelling local mapreduce tasks, but we were missing
one for cancelling external queries.
In this commit, we modify the signature of the request so it supports
cancelling via an abort source. We also provide a reproducer test
for the problem.
Fixesscylladb/scylladb#22337Closesscylladb/scylladb#22651
The following is observed in pytest:
1) node1, stream master, tried to pull data from node3
2) node3, stream follower, found node1 restarted
3) node3 killed the rpc stream
4) node1 did not get the stream session failure message from node3. This
failure message was supposed to kill the stream plan on node1. That's the
reason node1 failed the stream session much later at "2024-08-19 21:07:45,539".
Note, node3 failed the stream on its side, so it should have sent the stream
session failure message.
```
$ cat node1.log |grep f890bea0-5e68-11ef-99ae-e5bca04385fc
INFO 2024-08-19 20:24:01,162 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers={127.0.34.3}, master
ERROR 2024-08-19 20:24:01,190 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to handle STREAM_MUTATION_FRAGMENTS (receive and distribute phase) for ks=ks, cf=cf, peer=127.0.34.3: seastar::nested_exception: seastar::rpc::stream_closed (rpc stream was closed by peer) (while cleaning up after seastar::rpc::stream_closed (rpc stream was closed by peer))
WARN 2024-08-19 21:07:45,539 [shard 0:main] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.3}, tx=0 KiB, 0.00 KiB/s, rx=484 KiB, 0.18 KiB/s
$ cat node3.log |grep f890bea0-5e68-11ef-99ae-e5bca04385fc
INFO 2024-08-19 20:24:01,163 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Executing streaming plan for Tablet migration-ks-index-0 with peers=127.0.34.1, slave
INFO 2024-08-19 20:24:01,164 [shard 1:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Start sending ks=ks, cf=cf, estimated_partitions=2560, with new rpc streaming
WARN 2024-08-19 20:24:01,187 [shard 0: gms] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming plan for Tablet migration-ks-index-0 failed, peers={127.0.34.1}, tx=633 KiB, 26506.81 KiB/s, rx=0 KiB, 0.00 KiB/s
WARN 2024-08-19 20:24:01,188 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] stream_transfer_task: Fail to send to 127.0.34.1:0: seastar::rpc::stream_closed (rpc stream was closed by peer)
WARN 2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Failed to send: seastar::rpc::stream_closed (rpc stream was closed by peer)
WARN 2024-08-19 20:24:01,189 [shard 0:strm] stream_session - [Stream #f890bea0-5e68-11ef-99ae-e5bca04385fc] Streaming error occurred, peer=127.0.34.1
```
To be safe in case the stream fail message is not received, node1 could fail
the stream plan as soon as the rpc stream is aborted in the
stream_mutation_fragments handler.
Fixes#20227Closesscylladb/scylladb#21960
bash error handling and reporting is atrocious. Without -e it will
just ignore errors. With -e it will stop on errors, but not report
where the error happened (apart from exiting itself with an error code).
Improve that with the `trap ERR` command. Note that this won't be invoked
on intentional error exit with `exit 1`.
We apply this on every bash script that contains -e or that it appears
trivial to set it in. Non-trivial scripts without -e are left unmodified,
since they might intentionally invoke failing scripts.
Closesscylladb/scylladb#22747
The partition key had been renamed and its type changed some time ago,
but the doc wasn't updated. Fix it.
refs: #20998
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#22683
This PR converts boost load balancer tests in preparation for load balancer changes
which add per-table tablet hints. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.
Topology is created in tests via group0 commands, which is abstracted by
the new `topology_builder` class.
Tests cannot modify token_metadata only in memory now as it needs to be
consistent with the schema and on-disk metadata. That's why modifications to
tablet metadata are now made under group0 guard and save back metadata to disk.
Closesscylladb/scylladb#22648
* github.com:scylladb/scylladb:
test: tablets: Drop keyspace after do_test_load_balancing_merge_colocation() scenario
tests: tablets: Set initial tablets to 1 to exit growing mode
test: tablets_test: Create proper schema in load balancer tests
test: lib: Introduce topology_builder
test: cql_test_env: Expose topology_state_machine
topology_state_machine: Introduce lock transition
This patch removes expansion of "SELECT *" in DESC MATERIALIZED VIEW.
Instead of explicitly printing each column, DESC command will now just
use SELECT *, if view was created with it. Also, adds a correspodning test.
Fixes#21154Closesscylladb/scylladb#21962
Replace boost::algorithm::all_of_equal() to std::ranges::all_of()
In order to reduce the header dependency to boost ranges library, let's
use the utility from the standard library when appropriate.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22730
In few test cases of test_view_build_status we create a view, wait for
it and then query the view_build_status table and expect it to have all
rows for each node and view.
But it may fail because it could happen that the wait_for_view query and
the following queries are done on different nodes, and some of the nodes
didn't apply all the table updates yet, so they have missing rows.
To fix it, we change the assert to work in the eventual consistency
sense, retrying until the number of rows is as expectd.
Fixesscylladb/scylladb#22644Closesscylladb/scylladb#22654
The "make-pr-ready-for-review" workflow was failing with an "Input
required and not supplied: token" error. This was due to GitHub Actions
security restrictions preventing access to the token when the workflow
is triggered in a fork:
```
Error: Input required and not supplied: token
```
This commit addresses the issue by:
- Running the workflow in the base repository instead of the fork. This
grants the workflow access to the required token with write permissions.
- Simplifying the workflow by using a job-level `if` condition to
controlexecution, as recommended in the GitHub Actions documentation
(https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution).
This is cleaner than conditional steps.
- Removing the repository checkout step, as the source code is not required for this workflow.
This change resolves the token error and ensures the
"make-pr-ready-for-review" workflow functions correctly.
Fixesscylladb/scylladb#22765
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22766
In c5668d99, a new source file row_cache.cc was added to the `db` target,
but with an extraneous trailing comma. In CMake's target_sources(),
source files should be space-separated - any comma is interpreted as
part of the filename, causing build failures like:
```
CMake Error at db/CMakeLists.txt:2 (target_sources):
Cannot find source file:
row_cache.cc,
```
Fix the issue by removing the trailing comma.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22754
Add the possibility to run boost and unit tests with pytest
test.py should follow the next paradigm - the ability to run all test cases sequentially by ONE pytest command.
With this paradigm, to have the better performance, we can split this 1 command into 2,3,4,5,100,200... whatever we want
It's a new functionality that does not touch test.py way of executing the boost and unit tests.
It supports the main features of test.py way of execution: automatic discovery of modes, repeats.
There is an additional requirement to execute tests in parallel: pytest-xdist. To install it, execute `pip install pytest-xdist`
To run test with pytest execute `pytest test/boost`. To execute only one file, provide the path filename `pytest test/boost/aggregate_fcts_test.cc` since it's a normal path, autocompletion will work on the terminal. To provide a specific mode, use the next parameter `--mode dev`, if parameter will not be provided pytest will try to use `ninja mode_list` to find out the compiled modes.
Parallel execution controlled by pyest-xdist and the parameter `-n 12`.
The useful command to discover the tests in the file or directory is `pytest --collect-only -q --mode dev test/boost/aggregate_fcts_test.cc`. That will return all test functions in the file. To execute only one function from the test, you can invoke the output from the previous command, but suffix for mode should be skipped, for example output will be `test/boost/aggregate_fcts_test.cc::test_aggregate_avg.dev`, so to execute this specific test function, please use the next command `pytest --mode dev test/boost/aggregate_fcts_test.cc::test_aggregate_avg`
There is a parameter `--repeat` that used to repeat the test case several times in the same way as test.py did.
It's not possible to run both boost and unit tests directories with one command, so we need to provide explicitly which directory should be executed. Like this `pytest --mode dev test/unit` or `pytest --mode dev test/boost`
Fixes: https://github.com/scylladb/qa-tasks/issues/1775Closesscylladb/scylladb#21108
* github.com:scylladb/scylladb:
test.py: Add possibility to run ldap tests from pytest
test.py: Add the possibility to run unit tests from pytest
test.py: Add the possibility to run boost test from pytest
test.py: Add discovery for C++ tests for pytest
test.py: Modify s3 server mock
test.py: Add method to get environment variables from MinIO wrapper
test.py: Move get configured modes to common lib
When upgrading for example from `2024.1` to `2025.1` the package name is
not identical casuing the upgrade command to fail:
```
Command: 'sudo DEBIAN_FRONTEND=noninteractive apt-get dist-upgrade scylla -y -o Dpkg::Options::="--force-confdef" -o Dpkg::Options::="--force-confold"'
Exit code: 100
Stdout:
Selecting previously unselected package scylla.
Preparing to unpack .../6-scylla_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb ...
Unpacking scylla (2025.1.0~dev-0.20250118.1ef2d9d07692-1) ...
Errors were encountered while processing:
/tmp/apt-dpkg-install-JbOMav/0-scylla-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/1-scylla-python3_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/2-scylla-server_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/3-scylla-kernel-conf_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/4-scylla-node-exporter_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
/tmp/apt-dpkg-install-JbOMav/5-scylla-cqlsh_2025.1.0~dev-0.20250118.1ef2d9d07692-1_amd64.deb
Stderr:
E: Sub-process /usr/bin/dpkg returned an error code (1)
```
Adding `Obsoletes` (for rpm) and `Replaces` (for deb)
Fixes: https://github.com/scylladb/scylladb/issues/22420Closesscylladb/scylladb#22457
* tools/python3 8415caf4...3e0b8932 (2):
> reloc: collect package files correctly if the package has an optional dependency
> dist: support smooth upgrade from enterprise to source availalbe
Closesscylladb/scylladb#22517
This series extends the table schema with per-table tablet options.
The options are used as hints for initial tablet allocation on table creation and later for resize (split or merge) decisions,
when the table size changes.
* New feature, no backport required
Closesscylladb/scylladb#22090
* github.com:scylladb/scylladb:
tablets: resize_decision: get rid of initial_decision
tablet_allocator: consider tablet options for resize decision
tablet_allocator: load_balancer: table_size_desc: keep target_tablet_size as member
network_topology_strategy: allocate_tablets_for_new_table: consider tablet options
network_topology_strategy: calculate_initial_tablets_from_topology: precalculate shards per dc using for_each_token_owner
network_topology_strategy: calculate_initial_tablets_from_topology: set default rf to 0
cql3: data_dictionary: format keyspace_metadata: print "enabled":true when initial_tablets=0
cql3/create_keyspace_statement: add deprecation warning for initial tablets
test: cqlpy: test_tablets: add tests for per-table tablet options
schema: add per-table tablet options
feature_service: add TABLET_OPTIONS cluster schema feature
`set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature eviction.
Fixes: https://github.com/scylladb/scylladb/issues/22629
Backport required to all active releases, they are all affected.
Closesscylladb/scylladb#22701
* github.com:scylladb/scylladb:
reader_concurrency_semaphore: set_notify_handler(): disable timeout
reader_permit: mark check_abort() as const
Add posibility to run ldap tests with pytest.
LDAP server will be created for each worker if xdist will be used.
For one thread one LDAP server will be used for all tests.
Add the possibility to run boost test from pytest.
Boost facade based on code from https://github.com/pytest-dev/pytest-cpp, but enhanced and rewritten to suite better.
Code based on https://github.com/pytest-dev/pytest-cpp. Updated, customized, enhanced to suit current needs.
Modify generate report to not modify the names, since it will break
xdist way of working. Instead modification will be done in post collect
but before executing the tests.
Add the possibility to return environment as a dict to use it later it subprocess created by xdist, without starting another s3 mock server for each thread.
Add method to retrieve MinIO server wrapper environment variables for
later processing.
This change will allow to sharing connection information with other
processes and allow reusing the server across multiple tests.
This scenario is invoked in a loop in the
test_load_balancing_merge_colocation_with_random_load test case, which
will cause accumulation of tablet maps making each reload slower in
subsequent iterations.
It wasn't a problem before because we overwritten tablet_metadata in
each iteration to contain only tablets for the current table, but now
we need to keep it consistent with the schema and don't do that.
After tablet hints, there is no notion of leaving growing mode and
tablet count is sustained continuously by initial tablet option, so we
need to lower it for merge to happen.
This is in preparation for load balancer changes needed to respect
per-table tablet hints and respecting per-shard tablet count
goal. After those changes, load balancer consults with the replication
strategy in the database, so we need to create proper schema in the
database. To do that, we need proper topology for replication
strategies which use RF > 1, otherwise keyspace creation will fail.
Will be used by load balancer tests which need more than a single-node
topology, and which want to create proper schema in the database which
depends on that topology, in particular creating keyspaces with
replication factor > 1.
We need to do that because load balancer will use replication strategy
from the database as part of plan making.
Will be used in load balancer tests to prevent concurrent topology
operations, in particular background load balancing.
load balancer will be invoked explicitly by the test. Disabling load
balancer in topology is not a solution, because we want the explicit
call to perform the load balancing.
This patch series contains improvements to our GitHub license header check workflow.
The first patch grants necessary write permissions to the workflow, allowing it to comment directly on pull requests when license header issues are found. This addresses a permissions-related error that previously prevented the workflow from creating comments.
The second patch optimizes the workflow by skipping the license check step when no relevant files have been modified in the pull request. This prevents unnecessary workflow failures that occurred when the check was run without any files to analyze.
Together, these changes make the license header checking process more robust and efficient. The workflow now properly communicates findings through PR comments and avoids running unnecessary checks.
---
no need to backport, as the workflow updated by this change only exists in master.
Closesscylladb/scylladb#22736
* github.com:scylladb/scylladb:
.github: grant write permissions for PR comments in license check workflow
.github: skip license check when no relevant files changed
It's possible to modify 'memtable_flush_period_in_ms' option only and as
single option, not with any other options together
Refs #20999Fixes#21223Closesscylladb/scylladb#22536
Grant write permissions to the check-license-header workflow to enable
commenting on pull requests. This fixes the "Resource not accessible by
integration" HTTP error that occurred when the workflow attempted to
create comments.
The permission is required according to GitHub's API documentation for
creating issue comments.
see also https://docs.github.com/en/rest/issues/comments?apiVersion=2022-11-28#create-an-issue-comment
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Skip the license header check step in `check-license-header.yaml` workflow
when no files with configured extensions were changed in the pull request.
Previously, the workflow would fail in this case since the --files
argument requires at least one file path:
```
check-license.py: error: argument --files: expected at least one argument
```
Add `if` condition to only run the check when steps.changed-files.outputs.files
is not empty.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Until now this action checked if we have a `backport/none` or `backport/x.y` label only, since we moved to the source available and the releases like 2025.1 don't match this regex this action keeps failing
Closesscylladb/scylladb#22734
set_notify_handler() is called after a querier was inserted into the
querier cache. It has two purposes: set a callback for eviction and set
a TTL for the cache entry. This latter was not disabling the
pre-existing timeout of the permit (if any) and this would lead to
premature eviction of the cache entry if the timeout was shorter than
TTL (which his typical).
Disable the timeout before setting the TTL to prevent premature
eviction.
Fixes: #scylladb/scylladb#22629
This patch addresses an issue where the buffer offset becomes incorrect when a request is retried. The new request uses an offset that has already been advanced, causing misalignment. This fix ensures the buffer offset is correctly reset, preventing such errors.
Closesscylladb/scylladb#22729
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customer
couldn't authenticate as the superuser properily. The purpose of this
change is to improve the superuser initialization mechanism to wait for
superuser default password, just as for the superuser creation.
This change:
- Introduce authenticator::ensure_superuser_is_created() to allow
waiting for complete initialization of super user authentication
- Implement ensure_superuser_is_created in password_authenticator, so
waiting for superuser password initialization is possible
- Implement ensure_superuser_is_create in transitional_authenticator,
so the implementation from password_authenticator is used
- Implement no-op ensure_superuser_is_create for other authenticators
- Extend service::ensure_superuser_is_created to wait for superuser
initialization in authenticator, just as it was implemented earlier
for role_manager
- Add injected error (sleep) in password_authenticator::start to
reproduce a case of delayed password creation
- Implement test_delayed_deafult_password to verify the correctness of the fix
- Ensure superuser is created in single_node_cql_env::run_in_thread to
make single_node_cql more similar to scylla_main in main.cc
Fixesscylladb/scylladb#20566
Backport not needed - a minor bugfix
Closesscylladb/scylladb#22532
* github.com:scylladb/scylladb:
test: implement test_auth_password_ensured
test: implement connect_driver argument in ManagerClient::server_add
auth: ensure default superuser password is set before serving CQL
auth: added password_authenticator_start_pause injected error
We are supposed to be loading the most recent RPC compression dictionary
on startup, but we forgot to port the relevant piece of logic during
the source-available port.
This pull request is an implementation of vector data type similar to one used by Apache Cassandra.
The patch contains:
- implementation of vector_type_impl class
- necessary functionalities similar to other data types
- support for serialization and deserialization of vectors
- support for Lua and JSON format
- valid CQL syntax for `vector<>` type
- `type_parser` support for vectors
- expression adjustments such as:
- add `collection_constructor::style_type::vector`
- rename `collection_constructor::style_type::list` to `collection_constructor::style_type::list_or_vector`
- vector type encoding (for drivers)
- unit tests
- cassandra compatibility tests
- necessary documentation
Co-authored-by: @janpiotrlakomy
Fixes https://github.com/scylladb/scylladb/issues/19455Closesscylladb/scylladb#22488
* github.com:scylladb/scylladb:
docs: add vector type documentation
cassandra_tests: translate tests covering the vector type
type_codec: add vector type encoding
boost/expr_test: add vector expression tests
expression: adjust collection constructor list style
expression: add vector style type
test/boost: add vector type cql_env boost tests
test/boost: add vector type_parser tests
type_parser: support vector type
cql3: add vector type syntax
types: implement vector_type_impl
Do not merge tablets if that would drop the tablet_count
below the minimum provided by hints.
Split tablets if the current tablet_count is less than
the minimum tablet count calculated using the table's tablet options.
TODO: override min_tablet_count if the tablet count per shard
is greater than the maximum allowed. In this case
the tables tablet counts should be scaled down proportionally.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
In column_family.cc and storage_service.cc there exist a bunch of helpers that parse and/or validate ks/cf names, and different endpoints use different combinations of those, duplicating the functionality of each other and generating some mess. This PR cleans the endpoints from column_family.cc that parse and validate fully qualified table name (the '$ks:$cf' string).
A visible "improvement" is that `validate_table()` helper usage in the api/ directory is narrowed down to storage_service.cc file only (with the intent to remove that helper completely), and the aforementioned `for_tables_on_all_shards()` helper becomes shorter and tiny bit faster, because it doesn't perform some re-lookups of tables, that had been performed by validation sanity checks before it.
There's more to be done in those helpers, this PR wraps only one part of this mess.
Below is the list of endpoints this PR affects and the tests that validate the changes:
|endpoint|test|
|-|-|
|column_family/autocompaction|rest_api/test_column_family::test_column_family_auto_compaction_table|
|column_family/tombstone_gc|rest_api/test_column_family::test_column_family_tombstone_gc_api|
|column_family/compaction_strategy|rest_api/test_column_family/test_column_family_compaction_strategy|
|compaction_manager/stop_keyspace_compaction/|rest_api/test_compaction_manager::{test_compaction_manager_stop_keyspace_compaction,test_compaction_manager_stop_keyspace_compaction_tables}|
Closesscylladb/scylladb#21533
* github.com:scylladb/scylladb:
api: Hide parse_tables() helper
api: Use parse_table_infos() in stop_keyspace_compaction handler
api: Re-use parse_table_info() in column_family API
api: Make get_uuid() return table_info (and rename)
api: Remove keyspace argument from for_table_on_all_shards()
api: Switch for_table_on_all_shards() to use table_info-s
api: Hide validate_table() helper
api: Tables vector is never empty now in for_table_on_all_shards()
api: Move vectors of tables, not copy
api: Add table validation to set_compaction_strategy_class endpoint
api: Use get_uuid() to validate_table() in column family API
api: Use parse_table_infos() in column family API
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
also, took this opportunity to remove an unused namespace alias. and
add an include which is used actually. please note,
`std::ranges::pop_heap()` and friends are actually provided by
`<algorithm>` not `<ranges>`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22716
Before fix of scylladb#20566, CQL was served irrespectively of default
superuser password creation, which led to an incorrect product behavior
and sporadic test failures. This test verifies race condition of serving
CQL and creating default superuser password. Injected failure is used to
ensure CQL use is attempted before default superuser password creation,
however, the attempt is expected to fail because scylladb#20566 is
fixed. Following that, the injected error is notified, so CQL driver can
be started correctly. Finally, CREATE USER query is executed to confirm
successful superuser authentication.
This change:
- Implement test_auth_password_ensured.py
The test starts a server without expecting CQL serving, because
expected_server_up_state=ServerUpState.HOST_ID_QUERIED and
connect_driver=False. Error password_authenticator_start_pause is
injected to block superuser password setup during server startup.
Next, the test waits for a log to confirm that the code implementing
injected error is reached. When the server startup procedure is
unfinished, some operations might not complete on a first try, so
waiting for driver connection is wrapped in repeat_if_host_unavailable.
This commit introduces connect_driver argument in
ManagerClient::server_add. The argument allow skipping CQL driver
initialization part during server start. Starting a server without
the driver is necessary to implement some test scenarios related
to system initialization.
After stopping a server, ManagerClient::server_start can be used to
start the server again, so connect_driver argument is also added here to
allow preventing connecting the driver after a server restart.
This change:
- Implement connect_driver argument in ManagerClient::server_add
- Implement connect_driver argument in ManagerClient::server_start
Before this change, it was ensured that a default superuser is created
before serving CQL. However, the mechanism didn't wait for default
password initialization, so effectively, for a short period, customer
couldn't authenticate as the superuser properily. The purpose of this
change is to improve the superuser initialization mechanism to wait for
superuser default password, just as for the superuser creation.
This change:
- Introduce authenticator::ensure_superuser_is_created() to allow
waiting for complete initialization of super user authentication
- Implement ensure_superuser_is_created in password_authenticator, so
waiting for superuser password initialization is possible
- Implement ensure_superuser_is_create in transitional_authenticator,
so the implementation from password_authenticator is used
- Implement no-op ensure_superuser_is_create for other authenticators
- Modify service::ensure_superuser_is_created to wait for superuser
initialization in authenticator, just as it was implemented earlier
for role_manager
Fixesscylladb/scylladb#20566
This change:
- Implement password_authenticator_start_pause injected error to allow
deterministic blocking of default superuser password creation
This change facilitates manual testing of system behavior when default
superuser password is being initialized. Moreover, this mechanism will
be used in next commits to implement a test to verify a fix for
erroneous CQL serving before default superuser password creation.
this workflow checks the first 10 lines for
"LicenseRef-ScyllaDB-Source-Available-1.0" in newly introduced files
when a new pull request is created against "master" or "next".
if "LicenseRef-ScyllaDB-Source-Available-1.0" is not found, the
workflow fails. for the sake of simplicity, instead of parsing the
header for SPDX License ID, we just check to see if the
"LicenseRef-ScyllaDB-Source-Available-1.0" is included.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22065
Before this patch, the regular_column_transformation constructor, which
we used in Alternator GSIs to generates a view key from a regular-column
cell, accepted a cell of any size. As a reviewer (Avi) noticed, very
long cells are possible, well beyond what Scylla allows for keys (64KB),
and because regular_column_transformation stores such values in a
contiguous "bytes" object it can cause stalls.
But allowing oversized attributes creates an even more accute problem:
While view building (backfilling in DynamoDB jargon), if we encounter
an oversized (>64KB) key, the view building step will fail and the
entire view building will hang forever.
This patch fixes both problems by adding to regular_column_transformation's
constructor the check that if the cell is 64KB or larger, an empty value
is returned for the key. This causes the backfilling to silently skip
this item, which is what we expect to happen (backfilling cannot do
anything to fix or reject the pre-existing items in the best table).
A test test_gsi_updatetable.py::test_gsi_backfill_oversized_key is
introduced to reproduce this problem and its fix. The test adds a 65KB
attribute to a base table, and then adds GSIs to this table with this
attribute as its partition key or its sort key. Before this patch, the
backfilling process for the new GSIs hangs, and never completes.
After this patch, the backfilling completes and as expected contains
other base-table items but not the item with the oversized attribute.
The new test also passes on DynamoDB.
However, while implementing this fix I realized that issue #10347 also
exists for GSIs. Issue #10347 is about the fact that DynamoDB limits
partition key and sort key attributes to 2048 and 1024 bytes,
respectively. In the fix described above we only handled the accute case
of lengths above 64 KB, but we should actually skip items whose GSI
keys are over 2048 or 1024 bytes - not 64KB. This extra checking is
not handled in this patch, and is part of a wider existing issue:
Refs #10347
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The function do_delete_old_entry() had an if() which was supposedly for
the case of collection column indexing, and which our previous patch
that improved this function to support caller-specified deletion_ts
left behind.
As a reviewer noticed, the new tombstone-setting code was in an "else"
of that existing if(), and it wasn't clear what happens if we get to that
else in the collection column indexing. So I reviewed the code and added
breakpoints and realized that in fact, do_delete_old_entry() is never
called for the collection-indexing case, which has its own
update_entry_for_computed_column() which view_updates::generate_update()
calls instead of the do_delete_old_entry() function and its friends.
So it appears that do_delete_old_entry() doesn't need that special
case at all, which simplifies it.
We should eventually simplify this code further. In particular, the
function generate_update() already knows the key of the rows it
adds or deletes so do_delete_old_entry() and its friends don't need
to call get_view_rows() to get it again. But these simplifications
and other will need to come in a later patch series, this one is
already long enough :-)
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The test for IndexStatus verifies that on a newly created table and GSI,
the IndexStatus is "ACTIVE". However, in Alternator, this doesn't strictly
need to happen *immediately* - view building, even for an empty table -
can take a short while in debug mode. This make the test test
test_gsi_describe_indexstatus flaky in debug mode.
The fix is to wait for the GSI to become active with wait_for_gsi()
before checking it is active. This is sort of silly and redundant,
but the important point that if the IndexStatus is incorrect this test
will fail, it doesn't really matter whether the wait_for_gsi() or
the DescribeTable assertion is what fails.
Now that wait_for_gsi() is used in two test files, this patch moves it
(and its friend, wait_for_gsi_gone()) to util.py.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The alternator test test_gsi_updatetable.py::test_gsi_delete_with_lsi
Creates a GSI together with a table, and then deletes it. We have a
bug unrelated to the purpose of this test - #9059 - that causes view
building to sometimes crash Scylla if the view is deleted while the
view build is starting. We see specifically in debug builds that even
view building of an *empty* table might not finish before the test
deletes the view - so this bug happens.
Work around that bug by waiting for the GSI to build after creating
the table with the GSI. This shouldn't be necessary (in DynamoDB,
a GSI created with the table always begins ready with the table),
but doesn't hurt either.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The previous patches implemented issue #11567 - adding a GSI to a
pre-existing table. So we can finally remove the mention of this
feature as an "unimplemented feature" in docs/alternator/compatibility.md.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The previous patches fully implemented issue 11567 - supporting
UpdateTable to add or delet a GSI on an existing Alternator table.
All 14 tests that were marked xfail because of this issue now pass,
so this patch removes their xfail. There are no more xfailing tests
referring to this issue.
These 14 tests, most of them in test/alternator/test_gsi_updatetable.py,
cover all aspects of this feature, including adding a GSI, deleting a
GSI, interactions between GSI and LSI, RBAC when adding or deleting a GSI,
data type limitation on an attribute that becomes a GSI key or stops
being one, GSI backfill, DescribeTable and backfill, various error
conditions, and more.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
The main goal of this patch is to fully support UpdateTable's ability
to add a GSI to an existing table, and delete a GSI from an existing
table. But to achieve this, this patch first needs to overhaul how GSIs
are implemented:
Remember that in Alternator's data model, key attributes in a table
are stored as real CQL columns (with a known type), but all other
attributes of an item are stored in one map called ":attrs".
* Before this patch, the GSI's key columns were made into real columns
in the table's schema, and the materialized view used that column as
the view's key.
* After this patch, the GSI's key columns usually (when they are not
the base table's keys, and not any LSI's key) are left in the ":attrs"
map, just like any other non-key column. We use a new type of computed
column (added in the previous patch) to extract the desired element from
this map.
This overhaul of the GSI implementation doesn't change anything in the
functionality of GSIs (and the Alternator test suite tries very hard to
ensure that), but finally allows us to add a GSI to an already-existing
table. This is now possible because the GSI will be able to pick up
existing data from inside the ":attrs" map where it is stored, instead
of requiring the data in the map to be moved to a stand-alone column as
the previous implementation needed.
So this patch also finally implements the UpdateTable operations
(Create and Delete) to add or delete a GSI on an existing table,
as this is now fairly straightfoward. For the process of "backfilling"
the existing data into the new GSI we don't need to do anything - this
is just the materialized-view "view building" process that already
exists.
Fixes#11567.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In an earlier patch, we introduced regular_column_transformation,
a new type of computed column that does a computation on a cell in
regular column in the base and returns a potentially transformed cell
(value or deletion, timestamp and ttl). In *this* patch, we wire the
materialized view code to support this new kind of computed column that
is usable as a materialized-view key column. This new type of computed
column is not yet used in this patch - this will come in the next
patch, where we will use it for Alternator GSIs.
Before this patch, the logic of deciding when the view update needs
to create a new row or delete a new one, and which timestamp and ttl
to give to the new row, could depend on one (or two - in Alternator)
cells read from base-table regular columns. In this patch, this logic
is rewritten - the notion of "base table regular columns" is generalized
to the notion of "updatable view key columns" - these are view key
columns that an update may change - because they really are base regular
columns, or a computed function of one (regular_column_transformation).
In some sense, the new code is easier to understand - there is no longer
a separate "compute_row_marker()" function, rather the top-level
generate_update() is now in charge of finding the "updatable view key
columns" and calculate the row marker (timestamp and ttl) as part
of deciding what needs to be done.
But unfortunately the code still has separate code paths for "collection
secondary indexing", and also for old-style column_computation (basically,
only token_column_computation). Perhaps in the future this can be further
simplified.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds a new computed column class for materialized views,
extract_from_attrs_column_computation
which is Alternator-specific and knows how to extract a value (of a
known type) from an attribute stored in Alternator's map-of-all-nonkey-
attributes ":attrs".
We'll use this new computed column in the next patch to reimplement GSI.
The new computed-column class is based on regular_column_transformation
introduced in the previous patch. It is not yet wired to anything:
The MV code cannot handle any regular_column_transformation yet, and
Alternator will not yet use it to create a GSI. We'll do those things
in the following patches.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch is to cmake what the previous patch was to configure.py.
In the next patch we want to make schema/schema.o depend on
alternator/executor.o - because when the schema has an Alternator
computed column, the schema code needs to construct the computed column
object (extract_from_attrs_column_computation) and that lives in
alternator/executor.o.
In the cmake-based build, all the schema/* objects are put into one
library "libschema.a". But code that uses this library (e.g., tests)
can't just use that library alone, because it depends on other code
not in schema/. So CMakeLists.txt lists other "libraries" that
libschema.a depends on - including for example "cql3". We now need
to add "alternator" to this dependency list. The dependency is marked
"PRIVATE" - schema needs alternator for its own internal uses, but
doesn't need to export alternator's APIs to its own users.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
For an unknown (to me) reason, configure.py has two separate source
file lists - "scylla_core" and "alternator". Scylla, and almost all tests,
are compiled with both lists, but just a couple of tests were compiled
with just scylla_core without alternator.
In the next patch we want to make schema/schema.o depened on
alternator/executor.o because when the schema has an Alternator
computed column, the schema code needs to construct the computed column
object (extract_from_attrs_column_computation) and that lives in
alternator/executor.o.
This change will break the build of the two tests that do not include
the Alternator objects. So let's just add the "alternator" dependencies
to the couple of tests that were missing it.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch introduces a function serialized_value_if_type() which takes
a serialized value stored in the ":attrs" map, and converts it into a
serialized *CQL* type if it matches a particular type (S, B or N) - or
returns null the value has the wrong type.
We will use this function in the following patch for deserializing
values stored in the ":attrs" map to use them as a materialized view
key. If the value has the right type, it will be converted to the CQL
type and used as the key - but if it has the wrong type the key will
be null and it will not appear in the view. This is exactly how GSI
is supposed to behave.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
In the patches that follow, we want Alternator to be able to use as a
key for a materialized view (GSI) not a real column from the schema,
but rather an attribute value deserialized from a member of the ":attrs"
map.
For this, we need the ability for materialized view to define a key
column which is computed as function of a real column (":attrs").
We already have an MV feature which we called "computed column"
(column_computation), but it is wholy inadequate for this job:
column_computation can only take a partition key, and produce a value -
while we need it to take a regular column (one member of ":attrs"),
not just the partition key, and return a cell - value or deletion,
timestamp and TTL.
So in this patch we introduce a new type of computed column, which we
called "regular_column_transformation" since it intends to perform some
sort of transformation on a single column (or more accurately, a single
atomic cell). The limitation that this function transforms a single
column only is important - if we had a function of multiple columns,
we wouldn't know which timestamp or ttl it should use for the result
if the two columns had different timestamps or TTLs.
The new class isn't wired to anything yet: The MV code cannot handle
it yet, and the Alternator code will not use it yet.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds the missing IndexStatus and Backfilling fields for the
GSIs listed by a DescribeTable request. These fields allow an application
to check whether a GSI has been fully built (IndexStatus=ACTIVE) or
currently being built (IndexStatus=CREATING, Backfilling=true).
This feature is necessary when a GSI can be added to an existing table
so its backfilling might take time - and the application might want to
wait for it.
One test - test_gsi.py::test_gsi_describe_indexstatus - begins to pass
with this fix, so the xfail tag is removed from it.
Fixes#11471.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
This patch adds to Alternator's api_error type yet another type of
error, api_error::limit_exceeded (error code "LimitExceededException").
DynamoDB returns this error code in certain situations where certain
low limits were exceeded, such as the case we'll need in a following
patch - an UpdateTable that tries to create more than one GSI at once.
The LimitExceededException error type should not be confused with
other similarly-named but different error messages like
ProvisionedThroughputExceededException or RequestLimitExceeded.
In general, we make an attempt to return the same error code that
DynamoDB returns for a given error.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Two new features were added to DynamoDB this month - MultiRegionConsistency
and WarmThroughput. Document them as unimplemented - and link to the
relevant issue in our bug tracker - in docs/alternator/compatibility.md.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent from further changes of
non-liveupdatable parameters
- Implement ScyllaRESTAPIClient::get_config() that obtains a current
value of given configuration parameter via /v2/config REST API
- Implement test to confirm that only liveupdatable parameters are
changed when SIGHUP is sent after configuration file change
Function set_initialization_completed() is called only once in main.cc,
and the effect is expected to be visible in all shards, as a side effect
of cfg->broadcast_to_all_shards() that is called shortly after. The same
technique was already used for enable_3_1_0_compatibility_mode() call.
Fixesscylladb/scylladb#5382
No backport - minor fix.
Closesscylladb/scylladb#22655
* github.com:scylladb/scylladb:
test: SIGHUP doesn't change non-liveupdatable configuration
test: implement ScyllaRESTAPIClient::get_config()
config: prevent SIGHUP from changing non-liveupdatable parameters
config: remove unused set_value_on_all_shards(const YAML::Node&)
This update introduces four types of credential providers:
1. Environment variables
2. Configuration file
3. AWS STS
4. EC2 Metadata service
The first two providers should only be used for testing and local runs. **They must NEVER be used in production.**
The last two providers are intended for use on real EC2 instances:
- **AWS STS**: Preferred method for obtaining temporary credentials using IAM roles.
- **EC2 Metadata Service**: Should be used as a last resort.
Additionally, a simple credentials provider chain is created. It queries each provider sequentially until valid credentials are obtained. If all providers fail, it returns an empty result.
fixes: #21828Closesscylladb/scylladb#21830
* github.com:scylladb/scylladb:
docs: update the `object_storage.md` and `admin.rst`
aws creds: add STS and Instance Metadata service credentials providers
aws creds: add env. and file credentials providers
s3 creds: move credentials out of endpoint config
Rather than target_max_tablet_size. We need both the target
as well as max and min tablet sizes, so there is no sense
in keeping the max and deriving the target and the minimum
for the max value.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Use the keyspace initial_tablets for min_tablet_count, if the latter
isn't set, then take the maximum of the option-based tablet counts:
- min_tablet_count
- and expected_data_size_in_gb / target_tablet_size
- min_per_shard_tablet_count (via
calculate_initial_tablets_from_topology)
If none of the hints produce a positive tablet_count,
fall back to calculate_initial_tablets_from_topology * initial_scale.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Current implementation is inefficient as it calls
get_datacenter_token_owners_ips and then find_node(ep)
while for_each_node easily provides a host_id for
is_normal_token_owner.
Then, since we're interested only in datacenters
configure with a replication factor (but it still might be 0),
simply iterate over the dc->rf map.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, if a datacenter has no replication_factor option
we consider its replication factor to be 1 in
calculate_initial_tablets_from_topology, but since
we're not going to have any replica on it, it should
be 0.
This is very minor since in the worst case, it
will pessimize the calculation and calculate a value
for initial_tablets that's higher than it could be.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Keyspace `initial` tablets option is deprecated
and may be removed in the future.
Rather than relying on `initial`:0 to always enabled
tablets, explicitly print "enabled":true when tablets
are enabled and initial_tablets=0, same as keyspace_metadata::describe.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Per-table hints should be used instead.
Note: the warning is produced by check_against_restricted_replication_strategies
which is called also from alter_keyspace_statement.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Test specifying of per-table tablet options on table creation
and alter table.
Also, add a negative test for atempting to use tablet options
with vnodes (that should fail).
And add a basic test for testing tablet options also with
materialized views.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Unlike with vnodes, each tablet is served only by a single
shard, and it is associated with a memtable that, when
flushed, it creates sstables which token-range is confined
to the tablet owning them.
On one hand, this allows for far better agility and elasticity
since migration of tablets between nodes or shards does not
require rewriting most if not all of the sstables, as required
with vnodes (at the cleanup phase).
Having too few tablets might limit performance due not
being served by all shards or by imbalance between shards
caused by quantization. The number of tabelts per table has to be
a power of 2 with the current design, and when divided by the
number of shards, some shards will serve N tablets, while others
may serve N+1, and when N is small N+1/N may be significantly
larger than 1. For example, with N=1, some shards will serve
2 tablet replicas and some will serve only 1, causing an imbalance
of 100%.
Now, simply allocating a lot more tablets for each table may
theoretically address this problem, but practically:
a. Each tablet has memory overhead and having too many tablets
in the system with many tables and many tablets for each of them
may overwhelm the system's and cause out-of-memory errors.
b. Too-small tablets cause a proliferation of small sstables
that are less efficient to acces, have higher metadata overhead
(due to per-sstable overhead), and might exhaust the system's
open file-descriptors limitations.
The options introduced in this change can help the user tune
the system in two ways:
1. Sizing the table to prevent unnecessary tablet splits
and migrations. This can be done when the table is created,
or later on, using ALTER TABLE.
2. Controlling min_per_shard_tablet_count to improve
tablet balancing, for hot tables.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
For example, nodes which are being decommissioned should not be
consider as available capacity for new tables. We don't allocate
tablets on such nodes.
Would result in higher per-shard load then planned.
Closesscylladb/scylladb#22657
in order to reduce the external header dependency, let's switch to
the standardlized std::ranges::min_element().
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22572
This config item controls how many CPU-bound reads are allowed to run in
parallel. The effective concurrency of a single CPU core is 1, so
allowing more than one CPU-bound reads to run concurrently will just
result in time-sharing and both reads having higher latency.
However, restricting concurrency to 1 means that a CPU bound read that
takes a lot of time to complete can block other quick reads while it is
running. Increase this default setting to 2 as a compromise between not
over-using time-sharing, while not allowing such slow reads to block the
queue behind them.
Fixes: #22450Closesscylladb/scylladb#22679
One of the design goals of the Alternator test suite (test/alternator)
is that developers should be able to run the tests against some already
running installation by running `cd test/alternator; pytest [--url ...]`.
Some of our presentations and documents recommend running Alternator
via docker as:
docker run --name scylla -d -p 8000:8000 scylladb/scylla:latest
--alternator-port=8000 --alternator-write-isolation=always
This only makes port 8000 available to the host - the CQL port is
blocked. We had a bug in conftest.py's get_valid_alternator_role()
which caused it to fail (and fail every single test) when CQL is
not available. What we really want is that when CQL is not available
and we can't figure out a correct secret key to connect to Alternator,
we just try a connect with a fake key - and hope that the option
alternator-enforce-authorization is turned off. In fact, this is what
the code comments claim was already happening - but we failed to
handle the case that CQL is not available at all.
After this patch, one can run Alternator with the above docker
command, and then run tests against it.
By the way, this provides another way for running any old release of
Scylla and running Alternator tests against it. We already supported
a similar feature via test/alternator/run's "--release" option, but
its implementation doesn't use docker.
Fixes#22591
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22592
On short-pages, cut short because of a tombstone prefix.
When page-results are filtered and the filter drops some rows, the
last-position is taken from the page visitor, which does the filtering.
This means that last partition and row position will be that of the last
row the filter saw. This will not match the last position of the
replica, when the replica cut the page due to tombstones.
When fetching the next page, this means that all the tombstone suffix of
the last page, will be re-fetched. Worse still: the last position of the
next page will not match that of the saved reader left on the replica, so
the saved reader will be dropped and a new one created from scratch.
This wasted work will show up as elevated tail latencies.
Fix by always taking the last position from raw query results.
Fixes: #22620Closesscylladb/scylladb#22622
The `which` command is typically not installed on cloud OS images
and so requires the user to remember to install it (or to be prompted
by a failure to install it).
Replace it with the built-in `type` that is always there. Wrap it
in a function to make it clear what it does.
Closesscylladb/scylladb#22594
Since mid December, tests started failing with ENOMEM while
submitting I/O requests.
Logs of failed tests show IO uring was used as backend, but we
never deliberately switched to IO uring. Investigation pointed
to it happening accidentaly in commit 1bac6b75dc,
which turned on IO uring for allowing native tool in production,
and picked linux-aio backend explicitly when initializing Scylla.
But it missed that seastar-based tests would pick the default
backend, which is io_uring once enabled.
There's a reason we never made io_uring the default, which is
that it's not stable enough, and turns out we made the right
choice back then and it apparently continue to be unstable
causing flakiness in the tests.
Let's undo that accidental change in tests by explicitly
picking the linux-aio backend for seastar-based tests.
This should hopefully bring back stability.
Refs #21968.
Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>
Closesscylladb/scylladb#22695
This commit introduces two new credentials providers: STS and Instance Metadata Service. The S3 client's provider chain has been updated to incorporate these new providers. Additionally, unit tests have been added to ensure coverage of the new functionality.
This commit entirely removes credentials from the endpoint configuration. It also eliminates all instances of manually retrieving environment credentials. Instead, the construction of file and environment credentials has been moved to their respective providers. Additionally, a new aws_credentials_provider_chain class has been introduced to support chaining of multiple credential providers.
Keep host_id of a node in task manager. If host_id wasn't resolved
yet, task manager will keep an empty id.
It's a preparation for the following changes.
Before this change, it was possible to change non-liveupdatable config
parameter without process restart. This erroneous behavior not only
contradicts the documentation but is potentially dangerous, as various
components theoretically might not be prepared for a change of
configuration parameter value without a restart. The issue came from
a fact that liveupdatability verification check was skipped for default
configuration parameters (those without its initial values
in configuration file during process start).
This change:
- Introduce _initialization_completed member in config_file
- Set _initialization_completed=true when config file is processed on
server start
- Verify config_file's initialization status during config update - if
config_file was initialized, prevent from further changes of
non-liveupdatable parameters
Fixesscylladb/scylladb#5382
Clean the code validating if a replication strategy can be used.
This PR consists of a bunch of unmerged https://github.com/scylladb/scylladb/pull/20088 commits - the solution to the problem that the linked PR tried to solve has been accomplished in another PR, leaving the refactor commits unmerged. The commits introduced in this PR have already been reviewed in the old PR.
No need to backport, it's just a refactor.
Closesscylladb/scylladb#22516
* github.com:scylladb/scylladb:
cql: restore validating replication strategies options
cql: change validating NetworkTopologyStrategy tags to internal_error
cql: inline abstract_replication_strategy::validate_replication_strategy
cql: clean redundant code validating replication strategy options
Currently, the session ID under which the truncate for tablets request is
running is created during the request creation and queuing. This is a problem
because this could overwrite the session ID of any ongoing operation on
system.topology#session
This change moves the creation of the session ID for truncate from the request
creation to the request handling.
Fixes#22613Closesscylladb/scylladb#22615
with_permit() creates a permit, with a self-reference, to avoid
attaching a continuation to the permit's run function. This
self-reference is used to keep the permit alive, until the execution
loop processes it. This self reference has to be carefully cleared on
error-paths, otherwise the permit will become a zombie, effectively
leaking memory.
Instead of trying to handle all loose ends, get rid of this
self-reference altogether: ask caller to provide a place to save the
permit, where it will survive until the end of the call. This makes the
call-site a little bit less nice, but it gets rid of a whole class of
possible bugs.
Fixes: #22588Closesscylladb/scylladb#22624
This commit refactors the way AWS credentials are managed in Scylla. Previously, credentials were included in the endpoint configuration. However, since credentials and endpoint configurations serve different purposes and may have different lifetimes, it’s more logical to manage them separately. Moving forward, credentials will be completely removed from the endpoint_config to ensure clear separation of concerns.
This change:
- Remove unused set_value_on_all_shards(const YAML::Node&) member
function in class config_file::named_value
The function logic was flawed, in a similar way
named_value<T>::set_value(const YAML::Node& node) is flawed: the config
source verification is insufficient for liveupdatable parameters,
allowing overwriting of non-liveupdatable config parameters (refer to
scylladb#5382). As the function was not used, it was removed instead of
fixing.
As of right now, materialized views (and consequently secondary
indexes), lwt and counters are unsupported or experimental with tablets.
Since by defaults tablets are enabled, training cases using those
features are currently broken.
The right thing to do here is to disable tablets in those cases.
Fixes https://github.com/scylladb/scylladb/issues/22638Closesscylladb/scylladb#22661
`validate_options` needs to be extended with
`topology` parameter, because NetworkTopologyStrategy needs to validate if every
explicitly listed DC is really existing. I did cut corner a bit and
trimmed the message thrown when it's not the case, just to avoid passing
and extra parameter (ks name) to the `validate_options`
function, as I find the longer message to be a bit redundant (the driver will
receive info which KS modification failed).
The tests that have been commented out in the previous commit have been
restored.
The check for `replication_factor` tag in
`network_topology_strategy::validate_options` is redundant for 2 reasons:
- before we reach this part of the code, the `replication_factor` tag
is replaced with specific DC names
- we actually do allow for `replication_factor` tag in
NetworkTopologyStrategy for keyspaces that have tablets disabled.
This code is unreachable, hence changing it to an internal error, which
means this situation should never occur.
The place that unrolls `replication_factor` tag checked for presence of
this tag ignoring the case, which lead to an unexpected behaviour:
- `replication_factor` tag (note the lowercase) was unrolled, as
explained above,
- the same tag but written in any other case resulted in throwing a vague
message: "replication_factor is an option for SimpleStrategy, not
NetworkTopologyStrategy".
So we're changing this validation to accept and unroll only the
lowercase version of this tag. We can't ignore the case here, as this
tag is present inside a json, and json is case-sensitive, even though the
CQL itself is case insensitive.
Added a test that passes for both scylla and cassandra.
Fixes: #15336
task_stats contains short info about a task. To get a list of task_stats
in the module, one needs to request /task_manager/list_module_tasks/{module}.
To make identification and navigation between tasks easier, extend
task_stats to contain shard, start_time, and end_time.
Closesscylladb/scylladb#22351
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.
tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.
No backport needed; new functionality
Closesscylladb/scylladb#22372
* github.com:scylladb/scylladb:
test: add test to check tablet repair child
service: add child for tablet repair virtual task
Currently, when the tablet repair is started, info regarding
the operation is kept in the system.tablets. The new tablet states
are reflected in memory after load_topology_state is called.
Before that, the data in the table and the memory aren't consistent.
To check the supported operations, tablet_virtual_task uses in-memory
tablet_metadata. Hence, it may not see the operation, even though
its info is already kept in system.tablets table.
Run read barrier in tablet_virtual_task::contains to ensure it will
see the latest data. Add a test to check it.
Fixes: #21975.
Closesscylladb/scylladb#21995
This was originally an attempt to reduce the compile time of this
translation unit, but apparently it doesn't work. Still, it has
the effect of converting stack traces that say "set_storage_service"
and refer to some lambda to stack traces that refer to the operation
being performed, so it's a net positive.
To faciliate the change, we introduce new functions rest_bind(),
similar to (and in fact wrapping) std::bind_front(), that capture
references like the lambdas did originally. We can't use
std::bind_front directly since the call to
seastar::httpd::path_description::set() cannot be disambiguated
after the function is obscured by the template returned by
std::bind_front. The new function rest_bind() has constraints
to understand which overload is in use.
Closesscylladb/scylladb#22526
This PR enhances the internode_compression configuration option in two ways:
1. Add validation for option values
Previously, we silently defaulted to 'none' when given invalid values. Now we
explicitly validate against the three supported values (all, dc, none) and
reject invalid inputs. This provides better error messages when users
misconfigure the option.
2. Fix documentation rendering
The help text for this option previously used C++ escape sequences which
rendered incorrectly in Sphinx-generated HTML. We now use bullet points with
'*' prefix to list the available values, matching our documentation style
for other config options. This ensures consistent rendering in both CLI
and HTML outputs.
Note: The current documentation format puts type/default/liveness information
in the same bullet list as option values. This affects other config options
as well and will need to be addressed in a separate change.
---
this improves the handling of invalid option values, and improves the doc rendering, neither of which is critical. hence no need to backport.
Closesscylladb/scylladb#22548
* github.com:scylladb/scylladb:
config: validate internode_compression option values
config: start available options with '*'
Bug https://bugs.python.org/issue26789 is resolved in python 3.10.
The frozen tool chain uses python 3.12. Since this is a supported and
recommended way for work environment, removing workaround and bumping
requirements for a newer python version.
Closesscylladb/scylladb#22627
Following the work done in ed4bfad5c3, the action is failing with the
following error:
```
Error: Input required and not supplied: token
```
It is due ot missing permissions in the workflow, adding it
Closesscylladb/scylladb#22630
During topology state application peers table may be updated with the
new ip->id mapping. The update is not atomic: it adds new mapping and
then removes the old one. If we call get_host_id_to_ip_map while this is
happening it may trigger an internal error there. This is a regression
since ef929c5def. Before that commit the
code read the peers table only once before starting the update loop.
This patch restores the behaviour.
Fixes: scylladb/scylladb#22578
tablet_repair_task_impl is run as a part of tablet repair. Make it
a child of tablet repair virtual task.
tablet_repair_task_impl started by /storage_service/repair_async API
(vnode repair) does not have a parent, as it is the top-level task
in that case.
* seastar 71036ebcc0...5b95d1d798 (3):
> rpc stream: do not abort stream queue if stream connection was closed without error
> resource: fallback to sysconf when failed to detect memory size from hwloc
> Merge 'scheduling_group: improve scheduling group creation exception safety' from Michael Litvak
scylla-gdb.py adjusted for scheduling_group_specific data structure
changes in Seastar. As part of that, a gratuitous dereference of
std::unique_ptr, which fails for std::unique_ptr<void*, ...>, was
removed.
The test expects and asserts that after wait_for_view is completed we
read the view_build_status table and get a row for each node and view.
But this is wrong because wait_for_view may have read the table on one
node, and then we query the table on a different node that didn't insert
all the rows yet, so the assert could fail.
To fix it we change the test to retry and check that eventually all
expected rows are found and then eventually removed on the same host.
Fixesscylladb/scylladb#22547Closesscylladb/scylladb#22585
The view builder builds a view by going over the entire token ring,
consuming the base table partitions, and generating view updates for
each partition.
A view is considered as built when we complete a full cycle of the
token ring. Suppose we start to build a view at a token F. We will
consume all partitions with tokens starting at F until the maximum
token, then go back to the minimum token and consume all partitions
until F, and then we detect that we pass F and complete building the
view. This happens in the view builder consumer in
`check_for_built_views`.
The problem is that we check if we pass the first token F with the
condition `_step.current_token() >= it->first_token` whenever we consume
a new partition or the current_token goes back to the minimum token.
But suppose that we don't have any partitions with a token greater than
or equal to the first token (this could happen if the partition with
token F was moved to another node for example), then this condition will never be
satisfied, and we don't detect correctly when we pass F. Instead, we
go back to the minimum token, building the same token ranges again,
in a possibly infinite loop.
To fix this we add another step when reaching the end of the reader's
stream. When this happens it means we don't have any more fragments to
consume until the end of the range, so we advance the current_token to
the end of the range, simulating a partition, and check for built views
in that range.
Fixesscylladb/scylladb#21829Closesscylladb/scylladb#22493
Add two cqlpy tests that reproduce a bug where a secondary index query
returns more rows than the specified limit. This occurs when the indexed
column is a partition key column or the first clustering key column,
the query result spans multiple partitions, and the last partition
causes the limit to be exceeded.
`test/cqlpy/run --release ...` shows that the tests fail for Scylla
versions all the way back to 4.4.0. Older Scylla versions fail with a
syntax error in CQL query which suggests some incompatibility in the
CQL protocol. That said, this bug is not a regression.
The tests pass in Cassandra 5.0.2.
Refs #22158.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
Closesscylladb/scylladb#22513
std::any_of was included by C++11, and boost::algorithm::any_of() is
provided by Boost for users stuck in the pre-C++11 era. in our case,
we've moved into C++23, where the ranges variant of this algorithm
is available.
in order to reduce the header dependency, let's switch to
`std::ranges::any_of()`.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22503
Materialized views with tablets are not stable yet, but we want
them available as an experimental feature, mainly for teseting.
The feature was added in scylladb/scylladb#21833,
but currently it has no effect. All tests have been updated to use the
feature, so we should finally make it work.
This patch prevents users from creating materialized views in keyspaces
using tablets when the VIEWS_WITH_TABLETS feature is not enabled - such
requests will now get rejected.
Fixesscylladb/scylladb#21832Closesscylladb/scylladb#22217
This commit addresses issue #21825, where invalid PERCENTILE values for
the `speculative_retry` setting were not properly handled, causing potential
server crashes. The valid range for PERCENTILE is between 0 and 100, as defined
in the documentation for speculative retry options, where values above 100 or
below 0 are invalid and should be rejected.
The added validation ensures that such invalid values are rejected with a clear
error message, improving system stability and user experience.
Fixes#21825Closesscylladb/scylladb#21879
Moving a PR out of draft is only allowed to users with write access,
adding a github action to switch PR to `ready for review` once the
`conflicts` label was removed
Closesscylladb/scylladb#22446
This patch adds an Alternator test for the case of UpdateItem attempting
to insert in invalid B (bytes) value into an item. Values of type B
use base64 encoding, and an attempt to insert a value which isn't
valid base64 should be rejected, and this is what this test verifies.
The new tests reproduce issue #17539, which claimed we have a bug in
this area. However, test/alternator/run with the "--release" option
shows that this bug existed in Scylla 5.2, but but fixed long ago, in
5.3 and doesn't exist in master. But we never had a regression test this
issue, so now we do.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22029
Enabled with the tablets_rack_aware_view_pairing cluster feature
rack-aware pairing pairs base to view replicas that are in the
same dc and rack, using their ordinality in the replica map
We distinguish between 2 cases:
- Simple rack-aware pairing: when the replication factor in the dc
is a multiple of the number of racks and the minimum number of nodes
per rack in the dc is greater than or equal to rf / nr_racks.
In this case (that includes the single rack case), all racks would
have the same number of replicas, so we first filter all replicas
by dc and rack, retaining their ordinality in the process, and
finally, we pair between the base replicas and view replicas,
that are in the same rack, using their original order in the
tablet-map replica set.
For example, nr_racks=2, rf=4:
base_replicas = { N00, N01, N10, N11 }
view_replicas = { N11, N12, N01, N02 }
pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 }
Note that we don't optimize for self-pairing if it breaks pairing ordinality.
- Complex rack-aware pairing: when the replication factor is not
a multiple of nr_racks. In this case, we attempt best-match
pairing in all racks, using the minimum number of base or view replicas
in each rack (given their global ordinality), while pairing all the other
replicas, across racks, sorted by their ordinality.
For example, nr_racks=4, rf=3:
base_replicas = { N00, N10, N20 }
view_replicas = { N11, N21, N31 }
pairing would be: { N00, N31 }\*, { N10, N11 }, { N20, N21 }
\* cross-rack pair
If we'd simply stable-sort both base and view replicas by rack,
we might end up with much worse pairing across racks:
{ N00, N11 }\*, { N10, N21 }\*, { N20, N31 }\*
\* cross-rack pair
Fixesscylladb/scylladb#17147
* This is an improvement so no backport is required
Closesscylladb/scylladb#21453
* github.com:scylladb/scylladb:
network_topology_strategy_test: add tablets rack_aware_view_pairing tests
view: get_view_natural_endpoint: implement rack-aware pairing for tablets
view: get_view_natural_endpoint: handle case when there are too few view replicas
view: get_view_natural_endpoint: track replica locator::nodes
locator: topology: consult local_dc_rack if node not found by host_id
locator: node: add dc and rack getters
feature_service: add tablet_rack_aware_view_pairing feature
view: get_view_natural_endpoint: refactor predicate function
view: get_view_natural_endpoint: clarify documentation
view: mutate_MV: optimize remote_endpoints filtering check
view: mutate_MV: lookup base and view erms synchronously
view: mutate_MV: calculate keyspace-dependent flags once
When a replica get a write request it performs get_schema_for_write,
which waits until the schema is synced. However, database::add_column_family
marks a schema as synced before the table is added. Hence, the write may
see the schema as synced, but hit no_such_column_family as the table
hasn't been added yet.
Mark schema as synced after the table is added to database::_tables_metadata.
Fixes: #22347.
Closesscylladb/scylladb#22348
If start_time/end_time is unspecified for a task, task_manager API
returns epoch. Nodetool prints the value in task status.
Fix nodetool tasks commands to print empty string for start_time/end_time
if it isn't specified.
Modify nodetool tasks status docs to show empty end_time.
Fixes: #22373.
Closesscylladb/scylladb#22370
Fixes#22401
In the fix for scylladb/scylla-enterprise#892, the extraction and check for sstable component encryption mask was copied
to a subroutine for description purposes, but a very important 1 << <value> shift was somehow
left on the floor.
Without this, the check for whether we actually contain a component encrypted can be wholly
broken for some components.
Closesscylladb/scylladb#22398
This change:
- Remove code that prevented audit from starting if audit_categories,
audit_tables, and audit_keyspaces are not configured
- Set liveness::LiveUpdate for audit_categories, audit_tables,
and audit_keyspaces
- Keep const reference to db::config in audit, so current config values
can be obtained by audit implementation
- Implement function audit::update_config to parse given string, update
audit datastructures when needed, and log the changes.
- Add observers to call audit::update_config when categories,
tables, or keyspaces configuration changes
New functionality, so no backport needed.
Fixes https://github.com/scylladb/scylla-enterprise/issues/1789Closesscylladb/scylladb#22449
* github.com:scylladb/scylladb:
audit: make categories, tables, and keyspaces liveupdatable
audit: move static parsing functions above audit constructors
audit: move statement_category to string conversion to static function
audit: start audit even with empty categories/tables/keyspaces
Currently, when the status of a task is queried and the task is already finished,
it gets unregistered. Getting the status shouldn't be a one-time operation.
Stop removing the task after its status is queried. Adjust tests not to rely
on this behavior. Add task_manager/drain API and nodetool tasks drain
command to remove finished tasks in the module.
Fixes: https://github.com/scylladb/scylladb/issues/21388.
It's a fix to task_manager API, should be backported to all branches
Closesscylladb/scylladb#22310
* github.com:scylladb/scylladb:
api: task_manager: do not unregister tasks on get_status
api: task_manager: add /task_manager/drain
Repair service is started after storage service, while storage service needs to reference repair one for its needs. Recently it was noticed, that this reverse order may cause troubles and was fixed with the help of an extra gate. That's not nice and makes the start-stop mess even worse. The correct fix is to fix the order both services start/stop in.
Closesscylladb/scylladb#22368
* github.com:scylladb/scylladb:
Revert "repair: add repair_service gate"
main: Start repair before storage service
repair: Check for sharded<view-builder> when constructing row_level_repair
This patch adds extensive functional tests for the DynamoDB multi-item
transactions feature - the TransactWriteItems and TransactGetItems
requests. We add 43 test functions, spanning more than 1000 lines of code,
covering the different parameters and corner cases of these requests.
Because we don't support the transaction feature in Alternator yet (this
is issue #5064), all of these tests fail on Alternator but all of them
were tested to pass on DynamoDB. So all new tests are marked "xfail".
These tests will be handy for whoever will implement this feature as
an acceptance test, and can also be useful for whoever will just want to
understand this feature better - the tests are short and simple and
heavily commented.
Note that these tests only check the correct functionality of individual
calls of these requests - these tests cannot and do not check the
consistency or isolation guarantees of concurrent invocations of
several requests. Such tests would require a different test framework,
such as the one requested in issue #6350, and are therefore not part of
this patch.
Note that this patch includes ONLY tests, and does not mean that an
implementation of the feature will soon follow. In fact, nobody is
currently working on implementing this feature.
Signed-off-by: Nadav Har'El <nyh@scylladb.com>
Closesscylladb/scylladb#22239
Introduce `defer_verbose_shutdown` in `cql_test_env` which logs
a message before and after shutting down a service, distinguishing
between success and failure.
The function is similar to the one in `main` but skips special error
handling logic applicable only to the main Scylla binary. The purpose
of the `cql_test_env` version of this function is only more verbose
logging. If necessary it can be extended in the future with additional
logic.
I estimated the impact on the size of produced log files using
`cdc_test` as an example:
```
$ build/dev/test/boost/combined_tests --run_test=cdc_test -- --smp=2 \
>logfile 2>&1
$ du -b logfile
```
the result before this commit: 1964064 bytes, after: 2196432 bytes,
so estimated ~12% increase of log file size for boost tests that use
`cql_test_env`, assuming that the number of logs printed by each test is
similar to the logs printed by `cdc_test` (but I believe `cdc_test` is
one of the less verbose tests so this is an overestimate).
The motivation for this change is easier debugging of shutdown issues.
When investigating scylladb/scylladb#21983, where an exception is
thrown somewhere during the shutdown procedure, I found it hard to
pinpoint the service from which the exception originates. This change
will make it easier to debug issues like that by wrapping shutdown of
each service in a pair of messages logged when shutdown starts and when
it finishes (including when it fails). We should get more details on
this issue when it reproduces again in CI after this commit is merged
into `master`. (I failed to reproduce it locally with 1000 runs.)
Ref scylladb/scylladb#21983Closesscylladb/scylladb#22566
Fixes#22236
If reading a file and not stopping on block bounds returned by `size()`, we could allow reading from (_file_size+<1-15>) (if crossing block boundary) and try to decrypt this buffer (last one).
Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return `_file_size` - `read offset` -> wraparound -> rather larger number than we expected (not to mention the data in question is junk/zero).
Check on last block in `transform` would wrap around size due to us being >= file size (l).
Just do an early bounds check and return zero if we're past the actual data limit.
Closesscylladb/scylladb#22395
* github.com:scylladb/scylladb:
encrypted_file_test: Test reads beyond decrypted file length
encrypted_file_impl: Check for reads on or past actual file length in transform
This commit adds the information about SStable version support in 2025.1
by replacing "2022.2" with "2022.2 and above".
In addition, this commit removes information about versions that are
no longer supported.
Fixes https://github.com/scylladb/scylladb/issues/22485Closesscylladb/scylladb#22486
test_coro_frame is flaky, as if
`service::topology_coordinator::run() [clone .resume]` wasn't running
on the shard. But it's supposed to.
Perhaps this is a bug in `find_vptrs()`?
This patch asks `scylla find` for a second opinion, and also prints
all `find_vptrs()`, to see if it's the only coroutine missing
from there.
Closesscylladb/scylladb#22534
since C++20, std::string and std::string_view started providing
`ends_with()` member function, the same applies to `seastar::sstring`,
so there is no need to use `boost::ends_with()` anymore.
in this change, we switch from `boost::ends_with()` to the member
functions variant to
- improve the readability
- reduce the header dependency
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22502
* seastar 18221366...71036ebc (2):
> Merge 'fair_queue: make the fair_group token grabbing discipline more fair' from Michał Chojnowski
apps/io_tester: add some test cases for the IO scheduler
test: in fair_queue_test, ensure that tokens are only replenished by test_env
fair_queue: track the total capacity of queued requests
fair_queue: make the fair_group token grabbing discipline more fair
> scheduling: auto-detect scheduling group key rename() method
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Closesscylladb/scylladb#22504
Previously, the internode_compression option silently defaulted to 'none'
for any unrecognized value instead of validating input. It only compared
against 'all' and 'dc', making it error-prone.
Add explicit validation for the three supported values:
- all
- dc
- none
This ensures invalid values are rejected both in command line and YAML
configuration, providing better error messages to users.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
use '*' prefix for config option values instead of escape sequences
The custom Sphinx extension that generates documentation from config.cc
help messages has issues with C++ escape sequences. For example,
"\tall: All traffic" renders incorrectly as "tall: All traffic" in HTML
output.
Instead of using escape sequences, switch to bullet-point style with '*'
prefix which works better in both CLI and HTML rendering. This matches
our existing documentation style for available option values in other
configs.
Note: This change puts type/default/liveness info in the same bullet
list as option values. This limitation affects other similar config
options and will need to be addressed comprehensively in a future
change.
Refs scylladb/scylladb#22423
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Add missing vector type documentation including: definition of vector,
adjustment of term definition, JSON encoding, Lua and cql3 type
mapping, vector dimension limit, and keyword specification.
Add cql_vector_test which tests the basic functionalities of
the vector type using CQL.
Add vectors_test which tests if descending ordering of vector
is supported.
This change has been introduced to enable CQL drivers to recognize
vector type in query results.
The encoding has been imported from Apache Cassandra implementation
to match Cassandra's and latest drivers' behaviour.
Co-authored-by: Dawid Pawlik <501149991dp@gmail.com>
Like mentioned in the previous commit, this changes introduce usage
of vector style type and adjusts the functions using list style type
to distinguish vectors from lists.
Rename collection constructor style list to list_or_vector.
Motivation for this changes is to provide a distinguishable interface
for vector type expressions.
The square bracket literal is ambigious for lists and vectors,
so that we need to perform a distinction not using CQL layer.
At first we should use the collection constructor to manage
both lists and vectors (although a vector is not a collection).
Later during preparation of expressions we should be able to get
to know the exact type using given receiver (column specification).
Knowing the type of expression we may use their respective style type
(in this case the vector style type being introduced),
which would make the implementation more precise and allow us to
evaluate the expressions properly.
This commit introduces vector style type and functions making use of it.
However vector style type is not yet used anywhere,
the next commit should adjust collection constructor and make use
of the new vector style type and it's features.
These tests check serialization and deserialization (including JSON),
basic inserts and selects, aggregate functions, element validation,
vector usage in user defined types and functions.
test_vector_between_user_types is a translated Apache Cassandra test
to check if it is handled properly internally.
This change is introduced due to lack of support for vector class name,
used by type_parser to create data_type based on given class name
(especially compound class name with inner types or other parameters).
Add function that parses vector type parameters from a class name.
Introduce vector_type CQL syntax: VECTOR<`cql_type`, `integer`>.
The parameters are respectively a type of elements of the vector
and the vector's dimension (number of elements).
Co-authored-by: Jan Łakomy <janpiotrlakomy@gmail.com>
During raft upgrade, a node may gossip about a new CDC generation that
was propagated through raft. The node that receives the generation by
gossip may have not applied the raft update yet, and it will not find
the generation in the system tables. We should consider this error
non-fatal and retry to read until it succeeds or becomes obsolete.
Another issue is when we fail with a "fatal" exception and not retrying
to read, the cdc metadata is left in an inconsistent state that causes
further attempts to insert this CDC generation to fail.
What happens is we complete preparing the new generation by calling `prepare`,
we insert an empty entry for the generation's timestamp, and then we fail. The
next time we try to insert the generation, we skip inserting it because we see
that it already has an entry in the metadata and we determine that
there's nothing to do. But this is wrong, because the entry is empty,
and we should continue to insert the generation.
To fix it, we change `prepare` to return `true` when the entry already
exists but it's empty, indicating we should continue to insert the
generation.
Fixesscylladb/scylladb#21227Closesscylladb/scylladb#22093
Since now topology does not contain ip addresses there is no need to
create topology on an ip address change. Only peers table has to be
updated. The series factors out peers table update code from
sync_raft_topology_nodes() and calls it on topology and ip address
updates. As a side effect it fixes#22293 since now topology loading
does not require IP do be present, so the assert that is triggered in
this bug is removed.
Fixes: scylladb/scylladb#22293Closesscylladb/scylladb#22519
* github.com:scylladb/scylladb:
topology coordinator: do not update topology on address change
topology coordinator: split out the peer table update functionality from raft state application
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.
Skip the keyspace repair if no_such_keyspace is thrown during preparations.
Fixes: #22073.
Requires backport to 6.1 and 6.2 as they contain the bug
Closesscylladb/scylladb#22473
* github.com:scylladb/scylladb:
test: add test to check if repair handles no_such_keyspace
repair: handle keyspace dropped
Previously, during backup, SSTable components are preserved in the
snapshot directory even after being uploaded. This leads to redundant
uploads in case of failed backups or restarts, wasting time and
resources (S3 API calls).
This change removes SSTable components from the snapshot directory once
they are successfully uploaded to the target location. This prevents
re-uploading the same files and reduces disk usage.
This change only "Refs" https://github.com/scylladb/scylladb/issues/20655, because, we can further optimize
the backup process, consider:
- Sending HEAD requests to S3 to check for existing files before uploading.
- Implementing support for resuming partially uploaded files.
Fixes https://github.com/scylladb/scylladb/issues/21799
Refs https://github.com/scylladb/scylladb/issues/20655
---
the backup API is not used in production yet, so no need to backport.
Closesscylladb/scylladb#22285
* github.com:scylladb/scylladb:
backup_task: remove a component once it is uploaded
backup_task: extract component upload logic into dedicated function
snapshot-ctl: change snapshot_ctl::run_snapshot_modify_operation() to regular func
ExternalProject automatically creates BINARY_DIR for Seastar, but generator
expressions are not supported in this setting. This caused CMake to create
an unused "build/$<CONFIG>/seastar" directory.
Instead, define a dedicated variable matching configure.py's naming and use
it in supported options like BUILD_COMMAND. This:
- Creates build files in the standard "Seastar-prefix/src/Seastar-build"
directory instead of "build/$<CONFIG>/seastar". see
https://cmake.org/cmake/help/latest/module/ExternalProject.html#directory-options
- Makes it clearer that the variable should match configure.py settings
No functional changes to the Seastar build process - purely a cleanup to
reduce confusion when inspecting the build directory.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22437
these unused includes were identified by clang-include-cleaner. after
auditing these source files, all of the reports have been confirmed.
in which, instead of using `seastarx.hh`, `readers/mutation_reader.hh`,
use `using seastar::future` to include `future` in the global namespace,
this makes `readers/mutation_reader.hh` a header exposing `future<>`,
but this is not a good practice, because, unlike `seastarx.hh` or
`seastar/core/future.hh`, `reader/mutation_reader.hh` is not
responsible for exposing seastar declarations. so, we trade the
using statement for `#include "seastarx.hh"` in that file to decouple
the source files including it from this header because of this statement.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22439
Refs https://github.com/scylladb/seastar/issues/2513
Reloadable certificates use inotify instances. On a loaded test (CI) server, we've seen cases where we literally run out of capacity. This patch uses the extended callback and reload capability of seastar TLS to only create actual reloadable certificate objects on shard 0 for our main TLS points (encryption only does TLS on shard 0 already).
Closesscylladb/scylladb#22425
* github.com:scylladb/scylladb:
alternator: Make server peering sharded and reuse reloadable certs
messaging_service: Share reloadability of certificates across shards
redis/controller: Reuse shard 0 reloadable certificates for all shards
controller: Reuse shard 0 reloadable certificates for all shards
generic_server: Allow sharing reloadability of certificates across shards
Truncate table for tablets is implemented as a global topology operation. However, it does not have a transition state associated with it, and performs the truncate logic in `topology_coordinator::handle_global_request()` while `topology::tstate` remains empty. This creates problems because `topology::is_busy()` uses transition_state to determine if the topology state machine is busy, and will return false even though a truncate operation is ongoing.
This change introduces a new topology transition `topology::transition_state::truncate_table` and moves the truncate logic to a new method `topology_coordinator::handle_truncate_table()`. This method is now called as a handler of the `truncate_table` transition state instead of a handler of the `trunacate_table` global topology request.
This PR is a bugfix for truncate with tables and needs to be backported to 2025.1
Closesscylladb/scylladb#22452
* github.com:scylladb/scylladb:
truncate: trigger truncate logic from transition state instead of global request handler
truncate: add truncate_table transition state
The flush api could not detect if the node is down and fail the flush
before the timeout. This patch detects if there is down node and skip
the flush if so, since the flush will fail after the timeout in this
case anyway.
The slowness due to the flush timeout in
compaction_test.py::TestCompaction::test_delete_tombstone_gc_node_down
is fixed with this patch.
Fixes#22413Closesscylladb/scylladb#22445
Adds an optional callback to "listen", returning the shard local object
instance. If provided, instead of creating a "full" reloadable cerificate
object, only do so on shard 0, and use callback to reload other shards
"manually".
In the spirit of using standard-library types, instead of boost ones
where possible.
Although a disk type, it is serialized/deserialized with custom code, so
the change shouldn't cause any changes in the disk representation.
It is almost always a bad idea to run removenode force. This means a
node is removed without the remaining nodes to stream data that they
should own after the removal. This will make the cluster into a worse
state than a node being down.
One can use one of the following procedure instead:
1) Fix the dead node and move it back to the cluster
2) Run replace ops to replace the dead node
3) Run removenode ops again
We have seen misuse of nodetool removenode force by users again and
again. This patch rejects it so it can not be misused anymore.
Fixesscylladb/scylladb#15833Closesscylladb/scylladb#15834
This commit removes the information about FIPS out of the '.. only:: enterprise' directive.
As a result, the information will now show in the doc in the ScyllaDB repo
(previously, the directive included the note in the Entrprise docs only).
Refs https://github.com/scylladb/scylla-enterprise/issues/5020Closesscylladb/scylladb#22374
Fixes#21993
Removes configuration_encryptor mention from docs.
The tool itself (java) is not included in the main branch
java tools, thus need not remove from there. Only the words.
Closesscylladb/scylladb#22427
Add a test to reproduce a bug in the read DMA API of
`encrypted_file_impl` (the file implementation for Encryption-at-Rest).
The test creates an encrypted file that contains padding, and then
attempts to read from an offset within the padding area. Although this
offset is invalid on the decrypted file, the `encrypted_file_impl` makes
no checks and proceeds with the decryption of padding data, which
eventually leads to bogus results.
Refs #22236.
Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com>
(cherry picked from commit 8f936b2cbc)
Fixes#22236
If reading a file and not stopping on block bounds returned by `size()`, we could
allow reading from (_file_size+1-15) (block boundary) and try to decrypt this
buffer (last one).
Check on last block in `transform` would wrap around size due to us being >=
file size (l).
Simplest example:
Actual data size: 4095
Physical file size: 4095 + key block size (typically 16)
Read from 4096: -> 15 bytes (padding) -> transform return _file_size - read offset
-> wraparound -> rather larger number than we expected
(not to mention the data in question is junk/zero).
Just do an early bounds check and return zero if we're past the actual data limit.
v2:
* Moved check to a min expression instead
* Added lengthy comment
* Added unit test
v3:
* Fixed read_dma_bulk handling of short, unaligned read
* Added test for unaligned read
v4:
* Added another unaligned test case
`tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups.
The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that
`tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but
`tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should exclude with compaction group creation. i.e. DROP TABLE/DROP KEYSPACE
Fixes#22431
This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1
Closesscylladb/scylladb#22330
* github.com:scylladb/scylladb:
test: add reproducer and test for fix to split ready CG creation
table: run set_split_mode() on all storage groups during all_storage_groups_split()
This commit adds the OS support information for version 2025.1.
In addition, the OS support page is reorganized so that:
- The content is moved from the include page _common/os-support-info.rst
to the regular os-support.rst page. The include page was necessary
to document different support for OSS and Enterprise versions, so
we don't need it anymore.
- I skipped the entries for versions that won't be supported when 2025.1
is released: 6.1 and 2023.1.
- I moved the definition of "supported" to the end of the page for better
readability.
- I've renamed the index entry to "OS Support" to be shorter on the left menu.
Fixes https://github.com/scylladb/scylladb/issues/22474Closesscylladb/scylladb#22476
This series exposes a Clock template parameter for loading_cache so that the test could use
the manual_clock rather than the lowres_clock, since relying on the latter is flaky.
In addition, the test load function is simplified to sleep some small random time and co_return the expected string,
rather than reading it from a real file, since the latter's timing might also be flaky, and it out-of-scope for this test.
Fixes#20322
* The test was flaky forever, so backport is required for all live versions.
Closesscylladb/scylladb#22064
* github.com:scylladb/scylladb:
tests: loading_cache_test: use manual_clock
utils: loading_cache: make clock_type a template parameter
test: loading_cache_test: use function-scope loader
test: loading_cache_test: simlute loader using sleep
test: lib: eventually: add sleep function param
test: lib: eventually: make *EVENTUALLY_EQUAL inline functions
Most of the code from `recognized_options` is either incorrect or lacks
any implementation, for example:
- comments for Everywhere and Local strategies are contradictory, first
says to allow all options, second says that the strategy doesn't accept
any options, even though both functions have the same implementation,
- for Local & Everywhere strategies the same logic is repeated in
`validate_options` member functions, i.e. this function does nothing,
- for NetworkTopology this function returns DC names and tablet options, but tablet
options are empty; OTOH this strategy also accepts 'replication_factor'
tag, which was ommitted,
- for SimpleStrategy this function returns `replication_factor`, but this is also validated
in `validate_options` function called just before the removed
function.
All of it makes `validate_replication_strategy` work incorrectly.
That being said, 3 tests fail because of this logic's removal, so it did
something after all. The failing tests are commented out, so that the CI
passes, and will be restored in the next commit(s).
This change:
- Set liveness::LiveUpdate for audit_categories, audit_tables,
and audit_keyspaces
- Keep const reference to db::config in audit, so current config values
can be obtained by audit implementation
- Implement function audit::update_config to parse given string, update
audit datastructures when needed, and log the changes.
- Add observers to call audit::update_config when categories,
tables, or keyspaces configuration changes
Fixesscylladb/scylla-enterprise#1789
This change:
- Swap static function and audit constructors in audit.cc
This is a preparatory commit for enabling liveupdate of audit
categories, tables, and keyspaces. It allows future use of static
parsing functions in audit constructor.
This change:
- Move audit_info::category_string to a new static function
- Start using the new function in audit_info::category_string
This is a preparatory commit for enabling liveupdate of audit
categories, tables, and keyspaces. The newly created static function
will be required for proper logging of audit categories.
This change:
- Remove code that prevented audit from starting if audit_categories,
audit_tables, and audit_keyspaces are not configured
This is a preparatory commit for enabling liveupdate of audit
categories, tables, and keyspaces. Without this change, audit is
not started for particular categories/tables/keyspaces setting and
it is unwanted behavior if customer can change audit configuration via
liveupdate.
This commit has performance implications if audit sink is set (meaning
"audit"="table" or "audit"="syslog" in the config) but categories,
tables, and keyspaces are not set to audit anything. Before this commit,
audit was not started, so some operations (like creating audit_info or
lookup in empty collections) were omitted.
Currently, /task_manager/task_status_recursive/{task_id} and
/task_manager/task_status/{task_id} unregister queries task if it
has already finished.
The status should not disappear after being queried. Do not unregister
finished task when its status or recursive status is queried.
In the following patches, get_status won't be unregistering finished
tasks. However, tests need a functionality to drop a task, so that
they could manipulate only with the tasks for operations that were
invoked by these tests.
Add /task_manager/drain/{module} to unregister all finished tasks
from the module. Add respective nodetool command.
Currently, data sync repair handles most no_such_keyspace exceptions,
but it omits the preparation phase, where the exception could be thrown
during make_global_effective_replication_map.
Skip the keyspace repair if no_such_keyspace is thrown during preparations.
- To make Scylla able to run in FIPS-compliant system, add .hmac files for
crypto libraries on relocatable/rpm/deb packages.
- Currently we just write hmac value on *.hmac files, but there is new
.hmac file format something like this:
```
[global]
format-version = 1
[lib.xxx.so.yy]
path = /lib64/libxxx.so.yy
hmac = <hmac>
```
Seems like GnuTLS rejects fips selftest on .libgnutls.so.30.hmac when
file format is older one.
Since we need to absolute path on "path" directive, we need to generate
.libgnutls.so.30.hmac in older format on create-relocatable-script.py,
Signed-off-by: Takuya ASADA <syuu@scylladb.com>
Closesscylladb/scylladb#22384
The vector is a fixed-length array of non-null
specified type elements.
Implement serialization, deserialization, comparison,
JSON and Lua support, and other functionalities.
Co-authored-by: Dawid Pawlik <501149991dp@gmail.com>
Since now topology does not contain ip addresses there is no need to
create topology on an ip address change. Only peers table has to be
updated, so call a function that does peers table update only.
Raft topology state application does two things: re-creates token metadata
and updates peers table if needed. The code for both task is intermixed
now. The patch separates it into separate functions. Will be needed in
the next patch.
There's such a reference on storage_service itself, it can use this->_sys_dist_ks instead thus making its API (both internal and external) a bit simpler.
Closesscylladb/scylladb#22483
* github.com:scylladb/scylladb:
storage_service: Drop sys_dist_ks argument from track_upgrade_progress_to_topology_coordinator()
storage_service: Drop sys_dist_ks argument from raft_state_monitor_fiber()
storage_service: Drop sys_dist_ks argument from join_topology()
storage_service: Drop sys_dist_ks argument from join_cluster()
these misspellings were identified by codespell. let's fix them.
one of them is a part of a user visble string.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22443
Specialize config_from_string() for sstring to resolve lexical_cast
stream state parsing limitation. This enables correct handling of empty
string configurations, such as setting an empty value in CQL:
```cql
UPDATE system.config SET value='' WHERE
name='allowed_repair_based_node_ops';
```
Previous implementation using boost::lexical_cast would fail due to
EOF stream state, incorrectly rejecting valid empty string conversions.
Fixesscylladb/scylladb#22491
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22492
Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be
initialized before compressor::namespace_prefix due to undefined
global variable initialization order across translation units. This
was causing ZstdCompressor to be unregistered in release builds,
making it impossible to create tables with Zstd compression.
Replace the global namespace_prefix variable with a function that
returns the fully qualified compressor name. This ensures proper
initialization order and fixes the registration of the ZstdCompressor.
Fixesscylladb/scylladb#22444
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Closesscylladb/scylladb#22451
Relying on a real-time clock like lowres_clock
can be flaky (in particular in debug mode).
Use manual_clock instead to harden the test against
timing issues.
Fixes#20322
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
So the unit test can use manual_clock rather than lowres_clock
which can be flaky (in particular in debug mode).
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than a global function, accessing a thread-local `load_count`.
The thread-local load_count cannot be used when multiple test
cases run in parallel.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This test isn't about reading values from file,
but rather it's about the loading_cache.
Reading from the file can sometimes take longer than
the expected refresh times, causing flakiness (see #20322).
Rather than reading a string from a real file, just
sleep a random, short time, and co_return the string.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
This reverts commit 32ab58cdea.
Now repair service starts before and stops after storage server, so the
problem described in the commit is no longer relevant.
The latter service uses repair, but not the vice-versa, so the correct
(de)initialization order should be the same.
refs: #2737
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Currently initialization order of repair and view-builder is not
correct, so there are several places in repair code that check for v.b.
to be initialized before doing anything. There's one more place that
needs that care -- the construction of row_level_repair object.
The class instantiates helper objects that reply on view_builder to be
fully initialized and is itself created by many other task types from
repair code.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
rather then macros.
This is a first cleanup step before adding a sleep function
parameter to support also manual_clock.
Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL,
respectively, to make an error more visible in the test log
since those entry points print the offending values
when not equal.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
request handler
Before this change, the logic of truncate for tablets was triggered from
topology_coordinator::handle_global_request(). This was done without
using a topology transition state which remained empty throughout the
truncate handler's execution.
This change moves the truncate logic to a new method
topology_coordinator::handle_truncate_table(). This method is now called
as a handler of the truncate_table topology transition state instead of
a handler of the trunacate_table global topology request.
Truncate table for tablets is implemented as a global topology operation.
However, it does not have a transition state associated with it, and
performs the truncate logic in handle_global_request() while
topology::tstate remains empty. This creates problems because
topology::is_busy() uses transition_state to determine if the topology
state machine is busy, and will return false even though a truncate
operation is ongoing.
This change adds a new transition state: truncate_table
Test the simple case of base/view pairing with replication_factor
that is a multiple of the number of racks.
As well as the complex case when simple_tablets_rack_aware_view_pairing
is not possible.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Enabled with the tablets_rack_aware_view_pairing cluster feature
rack-aware pairing pairs base to view replicas that are in the
same dc and rack, using their ordinality in the replica map
We distinguish between 2 cases:
- Simple rack-aware pairing: when the replication factor in the dc
is a multiple of the number of racks and the minimum number of nodes
per rack in the dc is greater than or equal to rf / nr_racks.
In this case (that includes the single rack case), all racks would
have the same number of replicas, so we first filter all replicas
by dc and rack, retaining their ordinality in the process, and
finally, we pair between the base replicas and view replicas,
that are in the same rack, using their original order in the
tablet-map replica set.
For example, nr_racks=2, rf=4:
base_replicas = { N00, N01, N10, N11 }
view_replicas = { N11, N12, N01, N02 }
pairing would be: { N00, N01 }, { N01, N02 }, { N10, N11 }, { N11, N12 }
Note that we don't optimize for self-pairing if it breaks pairing ordinality.
- Complex rack-aware pairing: when the replication factor is not
a multiple of nr_racks. In this case, we attempt best-match
pairing in all racks, using the minimum number of base or view replicas
in each rack (given their global ordinality), while pairing all the other
replicas, across racks, sorted by their ordinality.
For example, nr_racks=4, rf=3:
base_replicas = { N00, N10, N20 }
view_replicas = { N11, N21, N31 }
pairing would be: { N00, N31 }*, { N10, N11 }, { N20, N21 }
* cross-rack pair
If we'd simply stable-sort both base and view replicas by rack,
we might end up with much worse pairing across racks:
{ N00, N11 }*, { N10, N21 }*, { N20, N31 }*
* cross-rack pair
Fixesscylladb/scylladb#17147
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently, when reducing RF, we may drop replicas from
the view before dropping replicas from the base table.
Since get_view_natural_endpoint is allowed to return
a disengaged optional if it can't find a pair for the
base replica, replcace the exiting assertion with code
handling this case, and count those events in a new
table metric: total_view_updates_failed_pairing.
Note that this does not fix the root cause for the issue
which is the unsynchronized dropping of replicas, that
should be atomic, using a single group0 transaction.
Refs scylladb/scylladb#21492
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Rather than tracking only the replica host_id, keep
track of the locator:::node& to prepare for
rack-aware pairing.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Like get_location by inet_address, there is a case, when
a node is replaced that the node cannot be found by host_id.
Currently get_location would return a reference based on the nullptr
which might cause a segfault as seen in testing. Instead,
if the host_id is of the location, revert to calling
get_location() which consults this_node or _cfg.local_dc_rack.
Otherwise, throw a runtime_error.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Simplify the function logic by calculating the predicate
function once, before scanning all base and view replicas,
rather than testing the different options in the inner loop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
"self-pairing" is enabled only when use_legacy_self_pairing
is enabled. That is currently unclear in the documentation
comment for this function.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Currently we always lookup both `my_address` and *target_endpoint
in remote_endpoints. But if my_address is in remote_endpoints
in some cases the second lookup is not needed, so do it only
to decide whether to swap target_endpoint with my_address, if
found in remote_endpoints, or to remove that match, if
*target_endpoint is already pending as well.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Although at the moment storage_service::replicate_to_all_cores
may yield between updating the base and view tables with
a new effective_replication_map, scylladb/scylladb#21781
was submitted to change that so that they are updated
atomically together.
This change prepares for the above change, and is harmless
at the moment.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
All view live in the same keyspace as their base
table, so calculate the keyspace-dependent flags
once, outside the per-view update loop.
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
Previously, during backup, SSTable components are preserved in the
snapshot directory even after being uploaded. This leads to redundant
uploads in case of failed backups or restarts, wasting time and
resources (S3 API calls).
This change
- adds an optional query parameter named "move_files" to
"/storage_service/backup" API. if it is set to "true", SSTable
components are removed once they are backed up to object storage.
- conditionally removes SSTable components from the snapshot directory once
they are successfully uploaded to the target location. This prevents
re-uploading the same files and reduces disk usage.
This change only "Refs" #20655, because, we can move further optimize
the backup process, consider:
- Sending HEAD requests to S3 to check for existing files before uploading.
- Implementing support for resuming partially uploaded files.
Fixes#21799
Refs #20655
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
Extract upload_component() from backup_task_impl::do_backup() to improve
readability and prepare for optional post-upload cleanup. This refactoring
simplifies the main backup flow by isolating the upload logic into its own
function.
The change is motivated by an upcoming feature that will allow optional
deletion of components after successful upload, which would otherwise add
complexity to do_backup().
Refs scylladb/scylladb#21799
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
instead of implementing `snapshot_ctl::run_snapshot_modify_operation()` as
a template function, let it accept as plain noncopyable_function
instance, and return `future<>`.
Previously, `snapshot_ctl::run_snapshot_modify_operation` was a template
function that accepted a templated functor parameter. This approach
limited its usability because callers needed to be defined in the same
translation unit as the template implementation.
however, `backup_task_impl` is defined in another translation unit,
and we intend to call `snapshot_ctl::run_snapshot_modify_operation()`
in its implementation. so in order to cater this need, there are two
options:
1. to move the definition of the template function into the header file.
but the downside is that this slows down the compilation by increaing
the size of header.
2. to change the template function to a regular function. This change
restricts the function's parameter to a specific signature. However, all
current callers already return a `future<>` object, so there's minimal
impact.
in this change, we implement the second option. this allows us to call
this function from another translation unit.
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
This adds a reproducer for #22431
In cases where a tablet storage group manager had more than one storage
group, it was possible to create compaction groups outside the group0
guard, which could create problems with operations which should exclude
with compaction group creation.
tablet_storage_group_manager::all_storage_groups_split() calls set_split_mode()
for each of its storage groups to create split ready compaction groups. It does
this by iterating through storage groups using std::ranges::all_of() which is
not guaranteed to iterate through the entire range, and will stop iterating on
the first occurance of the predicate (set_split_mode()) returning false.
set_split_mode() creates the split compaction groups and returns false if the
storage group's main compaction group or merging groups are not empty. This
means that in cases where the tablet storage group manager has non-empty
storage groups, we could have a situation where split compaction groups are not
created for all storage groups.
The missing split compaction groups are later created in
tablet_storage_group_manager::split_all_storage_groups() which also calls
set_split_mode(), and that is the reason why split completes successfully. The
problem is that tablet_storage_group_manager::all_storage_groups_split() runs
under a group0 guard, and tablet_storage_group_manager::split_all_storage_groups()
does not. This can cause problems with operations which should exclude with
compaction group creation. i.e. DROP TABLE/DROP KEYSPACE
Fixes#22314
Adds expected schema extensions to the tools extension set (if used). Also uses the source config extensions in schema loader instead of temp one, to ensure we can, for example, load a schema.cql with things like `tombstone_gc` or encryption attributes in them.
It now parses only table names from its "cf" argument. Parsing
table_infos has two benefits -- it makes it possible to hide
parse_tables() thus keeping less validation code around, and the
subsequent db.find_column_family() call can avoid re-lookup of table
uuid by its ks:table pair.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Several places call parse_fully_qualified_cf_name() and get_uuid()
helpers one after another. Previous patch introduced the
parse_table_info() one that wraps both.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The method gets "fully qualified" table name, which is 'ks:cf' string
and returns back the resolved table_id value. Some callers will benefit
from knowing the parsed 'cf' part of it (see next patch).
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This argument is needed to find table by ks:cf prair. The "table" part
is taken from the vector of table_info-s, but table_info-s have table_id
value onboard, and the table can be found by this id. So keyspace is not
needed any longer.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
All callers of it already have one. Next patch will make even more use
of those passed table_info-s.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
It's no longer used outside of api/storage_service.cc. It's not yet
possible to remove it completely, but it's better not to encourage
others to use it outside of its current .cc file.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
Callers of this method provide vectors of two kinds:
- explicitly single-entry one from endpoints that work on single table
- vector returned by parse_table_infos()
The latter helper, if it gets empty list of tables from user, populates
its return value with all tables from the given keyspace.
The removed check became obsolete after recent changes. Prior to those,
the 2nd case provided vector from another helper called parse_tables(),
which could return empty result.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The set_tables_...() helper called here accept vector by value, so the
existing code copies it. It's better to move, all the more so next
changes will make this place pass vectors with more data onboard.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This handler doesn't check if the requested table exists. If it doesn't
it will throw later anyway, but most of other endpoints that work with
tables check table early. This early check allows throwing bad-param
exception on missing table, not internal-server-error one.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
This helper returns uuid, but also "Validates" the table exists by
calling db.find_uuid() and throwing bad_param exception on error.
This change will allow making for_table_on_all_shards() smaller a bit
later.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
The one is the same as parse_tables(), but returns back name:id pairs.
This change will allow making for_table_on_all_shards() smaller a bit
later, as well as removing the parse_tables() code eventually.
Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>
2025-01-13 11:32:07 +03:00
1037 changed files with 20555 additions and 25426 deletions
parser.add_argument('--commits',default=None,type=str,help='Range of promoted commits.')
parser.add_argument('--pull-request',type=int,help='Pull request number to be backported')
parser.add_argument('--head-commit',type=str,required=is_pull_request(),help='The HEAD of target branch after the pull request specified by --pull-request is merged')
body: `❌ License header check failed. Please ensure all new files include the header within the first ${{ env.HEADER_CHECK_LINES }} lines:\n\`\`\`\n${license}\n\`\`\`\nSee action logs for details.`
if ! curl -X POST "$JENKINS_URL/job/$JOB_NAME/buildWithParameters" --fail --user "$JENKINS_USER:$JENKINS_API_TOKEN" -i -v; then
echo "Error: Jenkins job trigger failed"
# Send Slack message
curl -X POST -H 'Content-type: application/json' \
-H "Authorization: Bearer $SLACK_BOT_TOKEN" \
--data '{
"channel": "#releng-team",
"text": "🚨 @here '$JOB_NAME' failed to be triggered, please check https://github.com/${{ github.repository }}/actions/runs/${{ github.run_id }} for more details",
@@ -12,7 +12,7 @@ Please use the [issue tracker](https://github.com/scylladb/scylla/issues/) to re
## Contributing code to Scylla
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
Before you can contribute code to Scylla for the first time, you should sign the [Contributor License Agreement](https://www.scylladb.com/open-source/contributor-agreement/) and send the signed form cla@scylladb.com. You can then submit your changes as patches to the [scylladb-dev mailing list](https://groups.google.com/forum/#!forum/scylladb-dev) or as a pull request to the [Scylla project on github](https://github.com/scylladb/scylla).
If you need help formatting or sending patches, [check out these instructions](https://github.com/scylladb/scylla/wiki/Formatting-and-sending-patches).
The Scylla C++ source code uses the [Seastar coding style](https://github.com/scylladb/seastar/blob/master/coding-style.md) so please adhere to that in your patches. Note that Scylla code is written with `using namespace seastar`, so should not explicitly add the `seastar::` prefix to Seastar symbols. You will usually not need to add `using namespace seastar` to new source files, because most Scylla header files have `#include "seastarx.hh"`, which does this.
@@ -280,21 +280,45 @@ Once the patch set is ready to be reviewed, push the branch to the public remote
### Development environment and source code navigation
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt`, for use only with development environments (not for building) so that they can properly analyze the source code.
Scylla includes a [CMake](https://cmake.org/) file, `CMakeLists.txt` that can be used with development environments so
that they can properly analyze the source code. However, building with CMake is not yet officially supported.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice for code hygiene, though its C++ parser sometimes makes errors and flags false issues.
Good IDEs that have support for CMake build toolchain are [CLion](https://www.jetbrains.com/clion/),
[KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
Other good options that directly parse CMake files are [KDevelop](https://www.kdevelop.org/) and [QtCreator](https://wiki.qt.io/Qt_Creator).
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects and its
C++ parser has many issues.
To use the `CMakeLists.txt` file with these programs, define the `FOR_IDE` CMake variable or shell environmental variable.
#### CLion
[Eclipse](https://eclipse.org/cdt/) is another open-source option. It doesn't natively work with CMake projects, and its C++ parser has many similar issues as CLion.
[CLion](https://www.jetbrains.com/clion/) is a commercial IDE offers reasonably good source code navigation and advice
for code hygiene, though its C++ parser sometimes makes errors and flags false issues. In order to enable proper code
analysis in CLion, the following steps are needed:
1. Get the ScyllaDB source code by following the [Getting the source code](#getting-the-source-code).
2. Follow the steps in [Dependencies](#dependencies) in order to install the required tools natively into your system.
**Don't** follow the *frozen toolchain* part described there, since CMake checks for the build dependencies installed
in the system, not in the container image provided by the toolchain.
3. In CLion, select `File`→`Open` and select the main ScyllaDB directory in order to open the CMake project there. The
project should open and fail to process the `CMakeLists.txt`. That's expected.
4. In CLion, open `File`→`Settings`.
5. Find and click on `Toolchains` (type *toolchains* into search box).
6. Select the toolchain you will use, for instance the `Default` one.
7. Type in the following system-installed tools to be used:
-`CMake`: *cmake*
-`Build Tool`: *ninja*
-`C Compiler`: *clang*
-`C++ Compiler`: *clang*
8. On the `CMake` panel/tab, click on `Reload CMake Project`
After that, CLion should successfully initialize the CMake project (marked by `[Finished]` in the console) and the
source code editor should provide code analysis support normally from now on.
### Distributed compilation: `distcc` and `ccache`
Scylla's compilations times can be long. Two tools help somewhat:
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and re-uses them when possible
- [ccache](https://ccache.samba.org/) caches compiled object files on disk and reuses them when possible
- [distcc](https://github.com/distcc/distcc) distributes compilation jobs to remote machines
A reasonably-powered laptop acts as the coordinator for compilation. A second, more powerful, machine acts as a passive compilation server.
@@ -49,7 +49,7 @@ The terms "**You**" or "**Licensee**" refer to any individual accessing or using
* **Ownership:** Licensor retains sole and exclusive ownership of all rights, interests and title in the Software and any scripts, processes, techniques, methodologies, inventions, know-how, concepts, formatting, arrangements, visual attributes, ideas, database rights, copyrights, patents, trade secrets, and other intellectual property related thereto, and all derivatives, enhancements, modifications and improvements thereof. Except for the limited license rights granted herein, Licensee has no rights in or to the Software and/ or Licensor’s trademarks, logo, or branding and You acknowledge that such Software, trademarks, logo, or branding is the sole property of Licensor.
* **Feedback:** Licensee is not required to provide any suggestions, enhancement requests, recommendations or other feedback regarding the Software ("Feedback"). If, notwithstanding this policy, Licensee submits Feedback, Licensee understands and acknowledges that such Feedback is not submitted in confidence and Licensor assumes no obligation, expressed or implied, by considering it. All right in any trademark or logo of Licensor or its affiliates and You shall make no claim of right to the Software or any part thereof to be supplied by Licensor hereunder and acknowledges that as between Licensor and You, such Software is the sole proprietary, title and interest in and to Licensor.such Feedback shall be assigned to, and shall become the sole and exclusive property of, Licensor upon its creation.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between the You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* Except for the rights expressly granted to You under this Agreement, You are not granted any other licenses or rights in the Software or otherwise. This Agreement constitutes the entire agreement between You and the Licensor with respect to the subject matter hereof and supersedes all prior or contemporaneous communications, representations, or agreements, whether oral or written.
* **Third-Party Software:** Customer acknowledges that the Software may contain open and closed source components (“OSS Components”) that are governed separately by certain licenses, in each case as further provided by Company upon request. Any applicable OSS Component license is solely between Licensee and the applicable licensor of the OSS Component and Licensee shall comply with the applicable OSS Component license.
* If any provision of this Agreement is held to be invalid or unenforceable, such provision shall be struck and the remaining provisions shall remain in full force and effect.
seastar::metrics::description("number of operations via Alternator API"),{op(CamelCaseName)}).set_skip_when_empty(),
seastar::metrics::description("number of operations via Alternator API"),{op(CamelCaseName),alternator_label,basic_level}).set_skip_when_empty(),
#define OPERATION_LATENCY(name, CamelCaseName) \
seastar::metrics::make_histogram("op_latency", \
seastar::metrics::description("Latency histogram of an operation via Alternator API"),{op(CamelCaseName)},[this]{returnto_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::description("Latency histogram of an operation via Alternator API"),{op(CamelCaseName),alternator_label,basic_level},[this]{returnto_metrics_histogram(api_operations.name.histogram());}).aggregate({seastar::metrics::shard_label}).set_skip_when_empty(), \
seastar::metrics::description("Latency summary of an operation via Alternator API"),[this]{returnto_metrics_summary(api_operations.name.summary());})(op(CamelCaseName)).set_skip_when_empty(),
seastar::metrics::description("Latency summary of an operation via Alternator API"),[this]{returnto_metrics_summary(api_operations.name.summary());})(op(CamelCaseName))(basic_level)(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements")),
seastar::metrics::description("number writes that had to be bounced from this shard because of LWT requirements"))(alternator_label).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units"),{op("DeleteItem")}).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("DeleteItem")})(alternator_label).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units"),{op("UpdateItem")}).set_skip_when_empty(),
seastar::metrics::description("total number of consumed write units, counted as half units"),{op("UpdateItem")})(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number of rows read and dropped during filtering operations")),
seastar::metrics::description("number of rows read and dropped during filtering operations"))(alternator_label).set_skip_when_empty(),
seastar::metrics::make_counter("batch_item_count",seastar::metrics::description("The total number of items processed across all batches"),{op("BatchWriteItem")},
seastar::metrics::make_counter("batch_item_count",seastar::metrics::description("The total number of items processed across all batches"),{op("BatchGetItem")},
seastar::metrics::description("number of table scans (counting each scan of each table that enabled expiration)")),
seastar::metrics::description("number of table scans (counting each scan of each table that enabled expiration)"))(alternator_label).set_skip_when_empty(),
seastar::metrics::description("number of token ranges scanned by this node while their primary owner was down")),
seastar::metrics::description("number of token ranges scanned by this node while their primary owner was down"))(alternator_label).set_skip_when_empty(),
"description":"Move component files instead of copying them",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
@@ -976,7 +984,7 @@
]
},
{
"path":"/storage_service/cleanup_all/",
"path":"/storage_service/cleanup_all",
"operations":[
{
"method":"POST",
@@ -986,30 +994,6 @@
"produces":[
"application/json"
],
"parameters":[
{
"name":"global",
"description":"true if cleanup of entire cluster is requested",
"required":false,
"allowMultiple":false,
"type":"boolean",
"paramType":"query"
}
]
}
]
},
{
"path":"/storage_service/mark_node_as_clean",
"operations":[
{
"method":"POST",
"summary":"Mark the node as clean. After that the node will not be considered as needing cleanup during automatic cleanup which is triggered by some topology operations",
"description":"Set to \"true\" to flush all memtables and force tombstone garbage collection to check only the sstables being compacted (false by default). The memtable, commitlog and other uncompacted sstables will not be checked during tombstone garbage collection.",
plogger.warn("Failed to execute maybe_create_default_password due to guard conflict.{}.",retries?" Retrying":" Number of retries exceeded, giving up");
if(retries--){
continue;
}
// Log error but don't crash the whole node startup sequence.
plogger.error("Failed to create default superuser password due to guard conflict.");
throwexceptions::invalid_request_exception(format("Cannot add column {} because a column with the same name was dropped too recently. Please retry after {} seconds",
# aws_use_ec2_credentials: <bool> (default false) If true, KMS queries will use the credentials provided by ec2 instance role metadata as initial access key.
# aws_use_ec2_region: <bool> (default false) If true, KMS queries will use the AWS region indicated by ec2 instance metadata
# aws_assume_role_arn: <aws role arn> (optional) If set, any KMS query will first attempt to assume this role.
# master_key: <named KMS key for encrypting data keys> (required)
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.