Commit Graph

47522 Commits

Author SHA1 Message Date
Botond Dénes
22a28ca1db wip 2025-04-17 03:01:17 -04:00
Botond Dénes
19b4f10598 test/cluster/test_read_repair: make incremental test work with tablets
There are two tests which test incremental read repair: one with row and the
other with partition tombstones. The tests currently force vnodes, by
creating the test keyspace with {'enabled': false}. Even so, the tests
were found to be flaky so one of them is marked for skip.
This commit does the following changes:
* Make the tests use tablets by creating the test keyspace with tablets.
* Change the way the tests write data so it works with tablets:
  currently the tests use scylla-sstable write + upload but this won't
  work with tablets since upload with tablets implies --load-and-stream
  which means data is streamed to all replicas (no difference created
  between nodes). Switch to the classic stop-node + write to other
  replica with CL=ONE.
* Remove the skip added to the partition-tombstone test variant.

Also add tracing to the read-repair query, to make debugging the test
easier if it fails.

Fixes: #21179
2025-04-17 02:01:17 -04:00
Pavel Emelyanov
8b2cababb6 generic_server: Don't mess with db::config
The db::config is top-level configuration of scylla, we generally try to
avoid using it even in scylla components: each uses its own config
initialized by the service creator out of the db::config itself. The
generic_server is not an exception, all the more so, it already has its
own config.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23705
2025-04-16 17:02:30 +03:00
Anna Stuchlik
0b4740f3d7 doc: add info about Scylla Doctor Automation to the docs
Fixes https://github.com/scylladb/scylladb/issues/23642

Closes scylladb/scylladb#23745
2025-04-16 11:44:35 +03:00
Pavel Emelyanov
70ac5828a8 Update seastar submodule
* seastar 099cf616...e44af9b0 (19):
  > Add assertion to `get_local_service`
  > http_client: Improve handling of server response parsing errors
  > util: include used header
  > core: Fix module linkage by using `inline constexpr` for shared constants
  > build: fix P2582R1 detection for GCC compiler compatibility
  > app-template: remove production warning
  > ioinfo: Extend printed data a bit more
  > reactor: Fix indentation after previous patch
  > reactor: Configure multiple mountpoints per disk
  > io_queue, resource, reactor: Rename dev_t -> unsigned
  > resource: Rename mountpoint to disk in resources
  > reactor: Keep queues as shared_ptr-s
  > io_queue: Drop device ID
  > io_intent: Use unsigned queue id as a key
  > io_queue: Keep unsigned queue id on an io_queue
  > file: Keep device_id on posix file impl
  > io_queue: Print mountpoint in latency goal bump message
  > io_intent: Rename qid to cid
  > reactor: Move engine()._num_io_groups assignment and check

Changes in io-queue call for scylla-gdb update as well -- now the
reactor map of device to io-queue uses seastar::shared_ptr, not
std::unique_ptr.

Closes scylladb/scylladb#23733
2025-04-16 09:44:37 +03:00
Botond Dénes
f5125ffa18 Merge 'Ensure raft group0 RPCs use the gossip scheduling group.' from Sergey Zolotukhin
Scylla operations use concurrency semaphores to limit the number of concurrent operations and prevent resource exhaustion. The semaphore is selected based on the current scheduling group.

For RAFT group operations, it is essential to use a system semaphore to avoid queuing behind user operations. This patch ensures that RAFT operations use the `gossip` scheduling group to leverage the system semaphore.

Fixes scylladb/scylladb#21637

Backport: 6.2 and 6.1

Closes scylladb/scylladb#22779

* github.com:scylladb/scylladb:
  Ensure raft group0 RPCs use the gossip scheduling group
  Move RAFT operations verbs to GOSSIP group.
2025-04-16 09:11:29 +03:00
Lakshmipathi
42ed6a87bf test: Test truncate during topology change
Add a new node; during the topology change, issue a truncate call and
verify that all nodes have empty data after tablet migration.

Fixes: https://github.com/scylladb/scylla-dtest/issues/5317

Signed-off-by: Lakshmipathi Ganapathi <lakshmipathi.ganapathi@scylladb.com>

Closes scylladb/scylladb#22595
2025-04-16 09:10:22 +03:00
Tomasz Grabiec
001d3b2415 Merge 'storage_service: preserve state of busy topology when transiting tablet' from Łukasz Paszkowski
Commit 876478b84f ("storage_service: allow concurrent tablet migration in tablets/move API", 2024-02-08) introduced a code path on which the topology state machine would be busy -- in "tablet_draining" or "tablet_migration" state -- at the time of starting tablet migration. The pre-commit code would unconditionally transition the topology to "tablet_migration" state, assuming the topology had been idle previously. On the new code path, this state change would be idempotent if the topology state machine had been busy in "tablet_migration", but the state change would incorrectly overwrite the "tablet_draining" state otherwise.

Restrict the state change to when the topology state machine is idle.

In addition, add the topology update to the "updates" vector with plain push_back(). emplace_back() is not helpful here, as topology_mutation_builder::build() cannot construct in-place, and so we invoke the "canonical_mutation" move constructor once, either way.

Unit test:

Start a two node cluster. Create a single tablet on one of the nodes. Start decommissioning that node, but block decommissioning at once. In that state (i.e., in "tablet_draining"), move the tablet manually to the other node. Check that transit_tablet() leaves the topology transition state alone.

Fixes https://github.com/scylladb/scylladb/issues/20073.

Commit 876478b84f was first released in scylla-6.0.0, so we might want to backport this patch accordingly.

Closes scylladb/scylladb#23751

* github.com:scylladb/scylladb:
  storage_service: add unit test for mid-decommission transit_tablet()
  storage_service: preserve state of busy topology when transiting tablet
2025-04-16 00:19:24 +02:00
Pavel Emelyanov
b79137eaa4 storage_service: Use this->_features directly
This dependency is already there, so the storage service doesn't need to
take a detour via the database reference to get to the features.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23739
2025-04-15 21:11:12 +03:00
Laszlo Ersek
841ca652a0 storage_service: add unit test for mid-decommission transit_tablet()
Start a two node cluster. Create a single tablet on one of the nodes.
Start decommissioning that node, but block decommissioning at once. In
that state (i.e., in "tablet_draining"), move the tablet manually to the
other node. Check that transit_tablet() leaves the topology transition
state alone.

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2025-04-15 15:15:25 +02:00
Michał Chojnowski
b3d951517d test/scylla_gdb: generate a coredump when coro_task fails
This test fails sometimes, but rarely and unreliably.
We want to get a coredump from it the next time it fails.
Sending a SIGSEGV should induce that.

Refs https://github.com/scylladb/scylladb/issues/22501

Closes scylladb/scylladb#23256
2025-04-15 15:16:38 +03:00
Calle Wilund
abd2d8a58b test_tools: Manual merge of local key gen tool test from enterprise
Fixes scylladb/scylla-enterprise#5358

Transposed tool test for local file generator, originally java test.
Then enterprise test. Now here.

Closes scylladb/scylladb#23726
2025-04-15 15:14:08 +03:00
Laszlo Ersek
e1186f0ae6 storage_service: preserve state of busy topology when transiting tablet
Commit 876478b84f ("storage_service: allow concurrent tablet migration
in tablets/move API", 2024-02-08) introduced a code path on which the
topology state machine would be busy -- in "tablet_draining" or
"tablet_migration" state -- at the time of starting tablet migration. The
pre-commit code would unconditionally transition the topology to
"tablet_migration" state, assuming the topology had been idle previously.
On the new code path, this state change would be idempotent if the
topology state machine had been busy in "tablet_migration", but the state
change would incorrectly overwrite the "tablet_draining" state otherwise.

Restrict the state change to when the topology state machine is idle.

In addition, add the topology update to the "updates" vector with plain
push_back(). emplace_back() is not helpful here, as
topology_mutation_builder::build() cannot construct in-place, and so we
invoke the "canonical_mutation" move constructor once, either way.
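
The two changes described above can be illustrated with a minimal, self-contained sketch. It is not the storage_service code: `topology`, `tracked` and the helper functions are hypothetical stand-ins, and an empty optional is assumed to mean "idle". Only the state names and the push_back()/emplace_back() argument come from the commit message.

```cpp
#include <cassert>
#include <optional>
#include <vector>

// State names taken from the commit message; everything else is illustrative.
enum class transition_state { tablet_draining, tablet_migration };

// Hypothetical stand-in for the topology state machine: no value means idle.
struct topology {
    std::optional<transition_state> tstate;
};

// Pre-fix shape: overwrites whatever state the topology is in.
inline void start_migration_unconditional(topology& t) {
    t.tstate = transition_state::tablet_migration;
}

// Post-fix shape: only an idle topology is moved to "tablet_migration".
inline void start_migration_if_idle(topology& t) {
    if (!t.tstate) {
        t.tstate = transition_state::tablet_migration;
    }
}

// Hypothetical move-counting type, used to compare push_back/emplace_back.
struct tracked {
    static inline int moves = 0;
    tracked() = default;
    tracked(const tracked&) = delete;
    tracked(tracked&&) noexcept { ++moves; }
};

// Returns by value, like a builder that cannot construct in place.
inline tracked build_tracked() {
    return tracked{};
}

// Counts the move constructions caused by one push_back or emplace_back.
inline int moves_for_one_insert(bool use_emplace) {
    std::vector<tracked> v;
    v.reserve(1);  // rule out reallocation moves
    tracked::moves = 0;
    if (use_emplace) {
        v.emplace_back(build_tracked());
    } else {
        v.push_back(build_tracked());
    }
    assert(v.size() == 1);
    return tracked::moves;
}
```

With a by-value builder, both insertion calls perform one move construction in this sketch, which is the reason plain push_back() is sufficient.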

Signed-off-by: Laszlo Ersek <laszlo.ersek@scylladb.com>
2025-04-15 13:44:45 +02:00
Piotr Dulikowski
22e3b8eccd Merge 'test/cqlpy: Adjust tests to RF-rack-valid keyspaces' from Dawid Mędrek
In this PR, we adjust tests in the cqlpy test suite so they
only use RF-rack-valid keyspaces. After that, we enable
the configuration option `rf_rack_valid_keyspaces` in the
suite by default.

Refs scylladb/scylladb#23428

Backport: backporting to 2025.1 so we can test the option there too.

Closes scylladb/scylladb#23489

* github.com:scylladb/scylladb:
  test/cqlpy: Enable rf_rack_valid_keyspaces by default
  test: Move test_alter_tablet_keyspace_rf to cluster suite
  test/cqlpy: Adjust tests to RF-rack-valid keyspaces
  test/cqlpy/cassandra_tests: Adjust to RF-rack-valid keyspaces
2025-04-15 12:43:11 +02:00
Avi Kivity
b4d4e48381 scylla-gdb: small-objects: fix for very small objects
Because of rounding and alignment, there are multiple pools for small
sizes (e.g. 4 for size 32). Because the pool selection algorithm
ignores alignment, different pools can be chosen for different object
sizes. For example, an object size of 29 will choose the first pool
of size 32, while an object size of 32 will choose the fourth pool of
size 32.

The small-objects command doesn't know about this and always considers
just the first pool for a given size. This causes it to miss out on
sister pools.

While it's possible to adjust pool selection to always choose one of the
pools, it may eat a precious cycle. So instead let's compensate in the
small-objects command. Instead of finding one pool for a given size,
find all of them, and iterate over all those pools.
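
The idea can be sketched as follows. The actual command lives in the Python gdb script; this C++ fragment only illustrates the lookup change. `small_pool` and both functions are hypothetical, and the pool list is assumed to be sorted by ascending object size.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical description of one small-object pool.
struct small_pool {
    std::size_t index;        // position in the allocator's pool array
    std::size_t object_size;  // rounded object size served by this pool
};

// Old behaviour: stop at the first pool that can hold the object.
inline std::vector<std::size_t> first_pool_only(
        const std::vector<small_pool>& pools, std::size_t size) {
    for (const auto& p : pools) {
        if (p.object_size >= size) {
            return {p.index};
        }
    }
    return {};
}

// New behaviour: collect every pool sharing the rounded size of that first
// match, so that sister pools are visited as well.
inline std::vector<std::size_t> all_sister_pools(
        const std::vector<small_pool>& pools, std::size_t size) {
    std::vector<std::size_t> out;
    std::size_t rounded = 0;
    for (const auto& p : pools) {
        if (p.object_size < size) {
            continue;
        }
        if (out.empty()) {
            rounded = p.object_size;  // size of the first pool that fits
        }
        if (p.object_size == rounded) {
            out.push_back(p.index);
        }
    }
    return out;
}
```

For a layout with four pools of size 32, a request for size 29 yields one pool with the first function and all four with the second.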

Fixes #23603

Closes scylladb/scylladb#23604
2025-04-15 11:16:52 +03:00
Emil Maskovsky
3930ee8e3c raft: fix data center remaining nodes initialization
The `_remaining_nodes` attribute of the data center information was not
initialized correctly. The parameter was passed by value to the
initialization function instead of by reference or pointer.

As a result, `_remaining_nodes` was left initialized to zero, causing an
underflow when decrementing its value.

This bug did not significantly impact behavior because other safeguards,
such as capping the maximum voters per data center by the total number
of nodes, masked the issue. However, it could lead to inefficiencies, as
the remaining nodes check would not trigger correctly.
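
The class of bug described above can be reproduced in a few lines. This is a sketch, not the actual voter code: `datacenter_info` and the functions are hypothetical, only the `_remaining_nodes` name comes from the commit message, and the counter is assumed to be unsigned.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for the per-data-center bookkeeping.
struct datacenter_info {
    std::size_t _remaining_nodes = 0;
};

// Buggy shape: `info` is a copy, so the caller's object is never updated.
inline void init_by_value(datacenter_info info, std::size_t node_count) {
    info._remaining_nodes = node_count;
}

// Fixed shape: the caller's object is updated through the reference.
inline void init_by_reference(datacenter_info& info, std::size_t node_count) {
    info._remaining_nodes = node_count;
}

// Decrementing an unsigned zero wraps around to the maximum value.
inline std::size_t decrement(std::size_t remaining) {
    return remaining - 1;
}
```

After init_by_value() the caller's counter is still zero, so the first decrement wraps around to a huge value instead of counting down from the real node count.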

Fixes: scylladb/scylladb#23702

No backport: The bug is only present in the master branch, so no backport
is required.

Closes scylladb/scylladb#23704
2025-04-15 09:58:32 +02:00
Nadav Har'El
fbcf77d134 raft: make group0 Raft operation timeout configurable
A recent commit 370707b111 (re)introduced
a timeout for every group0 Raft operation. This timeout was set to 60
seconds, which, paraphrasing Bill Gates, "ought to be enough for anybody".

However, one of the things we do as a group0 operation is schema
changes, and we already noticed a few years ago, see commit
0b2cf21932, that in some extremely
overloaded test machines where tests run hundreds of times (!) slower
than usual, a single big schema operation - such as Alternator's
DeleteTable deleting a table and multiple of its CDC or view tables -
sometimes takes more than 60 seconds. The above fix changed the
client's timeout to wait for 300 seconds instead of 60 seconds,
but now we also need to increase our Raft timeout, or the server can
time out. We've seen this happening recently making some tests flaky
in CI (issue #23543).

So let's make this timeout configurable, as a new configuration option
group0_raft_op_timeout_in_ms. This option defaults to 60000 (i.e.,
60 seconds), the same as the existing default. The test framework
overrides this default with a higher 300 second timeout, matching
the client-side timeout.
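
As a rough illustration of the option's semantics, the sketch below models a configuration value in milliseconds with the default named above. `raft_config` and `group0_op_timeout` are hypothetical names, not the real db::config plumbing; only the option name and the 60000 and 300 second values come from the commit message.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical config holder; the field mirrors the new option's name.
struct raft_config {
    std::uint32_t group0_raft_op_timeout_in_ms = 60000;  // default: 60 seconds
};

// Converts the configured value into a duration usable by timeout logic.
inline std::chrono::milliseconds group0_op_timeout(const raft_config& cfg) {
    return std::chrono::milliseconds(cfg.group0_raft_op_timeout_in_ms);
}
```

A test setup would override the field with 300000 to match the 300 second client-side timeout.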

Before this patch, this timeout was already configurable in a strange
way, using injections. But this was a misstep: We already have more
than a dozen timeouts configurable through the normal configuration,
and this one should have been configured in the same way. There is
nothing "holy" about the default of 60 seconds we chose, and who
knows maybe in the future we might need to tweak it in the field,
just like we made the other timeouts tweakable. Injections cannot
be used in release mode, but configuration options can.

Fixes #23543

Signed-off-by: Nadav Har'El <nyh@scylladb.com>

Closes scylladb/scylladb#23717
2025-04-15 10:57:39 +03:00
Kefu Chai
3e3f583b84 docs/dev/tombstone.md: fix a typo
s/alwas/always/

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23734
2025-04-15 10:54:42 +03:00
Avi Kivity
5e1cf90a51 build: replace tools/java submodule with packaged cassandra-stress
We no longer use tools/java (scylladb/scylla-tools-java.git) for
nodetool or cqlsh; only cassandra-stress. Since that is available
in package form, install that and excise the tools/java submodule
from the source tree.

pgo/ is adjusted to use the packaged cassandra-stress (and the cqlsh
submodule).

A few jmx references are dropped as well.

Frozen toolchain regenerated.

Optimized clang from

  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz
  https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz

Closes scylladb/scylladb#23698
2025-04-15 10:11:28 +03:00
Jenkins Promoter
9699c3ded4 Update pgo profiles - aarch64 2025-04-15 04:45:34 +03:00
Jenkins Promoter
8472aa9e53 Update pgo profiles - x86_64 2025-04-15 04:29:24 +03:00
Pavel Emelyanov
b25cb5af0c Merge 'Use named gates' from Benny Halevy
Name the gates and phased barriers we use
to make it easy to debug gate_closed_exception
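
To show why a name helps, here is a toy gate whose "closed" error carries the gate's name. It is deliberately not the Seastar API: `toy_named_gate` and `gate_closed_error` are hypothetical, synchronous stand-ins that model only the naming aspect.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical error type; the message identifies which gate was closed.
class gate_closed_error : public std::runtime_error {
public:
    using std::runtime_error::runtime_error;
};

// Toy gate: tracks entries and refuses new ones once closed.
class toy_named_gate {
    std::string _name;
    std::size_t _count = 0;
    bool _closed = false;
public:
    explicit toy_named_gate(std::string name) : _name(std::move(name)) {}

    void enter() {
        if (_closed) {
            throw gate_closed_error(_name + " gate closed");
        }
        ++_count;
    }

    void leave() {
        assert(_count > 0);
        --_count;
    }

    void close() { _closed = true; }
};
```

With dozens of gates in one process, an error message that names the gate points directly at the component that was shutting down.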

Refs https://github.com/scylladb/seastar/pull/2688

* Enhancement only, no backport needed

Closes scylladb/scylladb#23329

* github.com:scylladb/scylladb:
  utils: loading_cache: use named_gate
  utils: flush_queue: use named_gate
  sstables_manager: use named gate
  sstables_loader: use named gate
  utils: phased_barrier, pluggable: use named gate
  utils: s3::client::multipart_upload: use named gate
  utils: s3::client: use named_gate
  transport: controller: use named gate
  tracing: trace_keyspace_helper: use named gate
  task_manager: module: use named gate
  topology_coordinator: use named gate
  storage_service: use named gate
  storage_proxy: wait_for_hint_sync_point: use named gate
  storage_proxy: remote: use named gate
  service: session: use named gate
  service: raft: raft_rpc: use named gate
  service: raft: raft_group0: use named gate
  service: raft: persistent_discovery: use named gate
  service: raft: group0_state_machine: use named gate
  service: migration_manager: use named gate
  replica: table: use named gate
  replica: compaction_group, storage_group: use named gate
  redis: query_processor: use named gate
  repair: repair_meta: use named gate
  reader_concurrency_semaphore: use named gate
  raft: server_impl: use named gate
  querier_cache: use named gate
  gms: gossiper: use named gate
  generic_server: use named gate
  db: sstables_format_listener: use named gate
  db: snapshot: backup_task: use named gate
  db: snapshot_ctl: use named gate
  hints: hints_sender: use named gate
  hints: manager: use named gate
  hints: hint_endpoint_manager: use named gate
  commitlog: segment_manager: use named gate
  db: batchlog_manager: use named gate
  query_processor: remote: use named gate
  compaction: compaction_state: use named gate
  alternator/server: use named_gate
2025-04-14 20:56:32 +03:00
Sergey Zolotukhin
e05c082002 Ensure raft group0 RPCs use the gossip scheduling group
Scylla operations use concurrency semaphores to limit the number
of concurrent operations and prevent resource exhaustion. The
semaphore is selected based on the current scheduling group.
For Raft group operations, it is essential to use a system semaphore to
avoid queuing behind user operations.
This commit adds a check to ensure that the raft group0 RPCs are
executed with the `gossip` scheduling group.
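
The selection rule can be pictured with a small sketch. All names here (`sched_group`, `semaphore_kind`, `select_semaphore`, `group0_rpc_group_ok`) are hypothetical; the sketch assumes only two groups and assumes the gossip group maps to the system semaphore, as the message describes.

```cpp
#include <cassert>

// Hypothetical, reduced set of scheduling groups.
enum class sched_group { statement, gossip };

// Hypothetical semaphore classes: one for user work, one for system work.
enum class semaphore_kind { user, system };

// The semaphore is chosen from the scheduling group the caller runs in.
inline semaphore_kind select_semaphore(sched_group current) {
    return current == sched_group::gossip ? semaphore_kind::system
                                          : semaphore_kind::user;
}

// The kind of check the commit adds: group0 RPCs must run in the gossip group.
inline bool group0_rpc_group_ok(sched_group current) {
    return current == sched_group::gossip;
}
```

Under these assumptions, an operation running in the statement group would queue on the user semaphore, which is what the check guards against.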
2025-04-14 17:10:46 +02:00
Sergey Zolotukhin
60f1053087 Move RAFT operations verbs to GOSSIP group.
In order for RAFT operations to use the gossip system semaphore, moving RAFT
verbs to the gossip group in `do_get_rpc_client_idx`,  messaging_service.

Fixes scylladb/scylladb#21637
2025-04-14 17:09:49 +02:00
Pavel Emelyanov
1bd991a111 test: Inherit sstable_assertions from sstables::test
The latter class is invented to let tests access private fields of an
sstable (mostly methods). The former is in fact an extended version of
that also does some checks. However, they don't inherit from each
other, and the sstable_assertions partially duplicates some functionality
of the test one.

Add the inheritance, remove the duplicated methods from the child class,
update the callers (the test class returns future<>s, the assertions one
"knows" it runs in seastar thread) and mark sstable::read_toc() private.

Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>

Closes scylladb/scylladb#23697
2025-04-14 13:45:14 +03:00
Kefu Chai
b3f709bed7 s3: remove an extraneous space
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23714
2025-04-14 13:02:58 +03:00
Michał Chojnowski
6e2795a843 Update seastar submodule
* seastar ed8952fb...099cf616 (10):
  > reactor: Disable hot polling if wakeup granularity is too high
  > smp: add shard_to_numa_node_mapping()
  > tests/unit/httpd_test: fix the handling of NUL bytes in the parser
  > fstream: skip allocation in no write_behinds case
  > `http`: add `xml` support to `http::mime_types::mappings`
  > Print incrementally in sigsegv handler
  > reactor: use 0x for hex addresses
  > tls: Make session resume key shared across credentials builders creds
  > build: fix CMAKE_REQUIRED_FLAGS format for sanitizer detection
  > reactor: Remove sched_debug() related code

Closes scylladb/scylladb#23703
2025-04-14 12:54:19 +03:00
Andrei Chekun
8e33d7ab81 test.py: Make the testpy log files in pytest follow the same format
Fix the incorrect log file names between conftest and scylla_manager.
This regression was introduced in #22960.

Currently, scylla manager writes its logs to a file named after the
following pattern:
suite_name.path_to_the_test_file_with_subfolders.run_id.function_name.mode.run_id_cluster.log
At the same time, pytest looks for this log under the following name:
suite_name.file_name_without_subfolders_path.py.run_id.function_name.mode.run_id_cluster.log

Because of this inconsistency, when a test fails, the scylla manager log
file is not copied to the failed_test directory and the test raises an
exception on teardown.

Closes scylladb/scylladb#23596
2025-04-14 12:52:48 +03:00
Evgeniy Naydanov
d6b64642c5 test.py: print out path to Scylla log for Python test suites
Test suites with `type: Python` use a single Scylla node
created by test.py, but it's handy to print the path to its log
file in pytest log too to make it easier to find the file
on failures.

Closes scylladb/scylladb#23683
2025-04-14 11:15:37 +03:00
Kefu Chai
69de816b1b scylla-gdb.py: fix a typo in gdb command description
replace "runnign" with "running".

Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>

Closes scylladb/scylladb#23716
2025-04-14 10:59:21 +03:00
Benny Halevy
8d7e4d6c36 utils: loading_cache: use named_gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:09 +03:00
Benny Halevy
46f2a24772 utils: flush_queue: use named_gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:02 +03:00
Benny Halevy
d665bb4f8b sstables_manager: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Benny Halevy
7969293dcf sstables_loader: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Benny Halevy
e1fe82ed33 utils: phased_barrier, pluggable: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Benny Halevy
d3f498ae59 utils: s3::client::multipart_upload: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:47:00 +03:00
Benny Halevy
eea83464c7 utils: s3::client: use named_gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:46:51 +03:00
Benny Halevy
79e967e2f5 transport: controller: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:29:48 +03:00
Benny Halevy
3d87b67d0e tracing: trace_keyspace_helper: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:29:48 +03:00
Benny Halevy
bfdd8a98ca task_manager: module: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:29:48 +03:00
Benny Halevy
5e864b6277 topology_coordinator: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:29:46 +03:00
Benny Halevy
a67ed59399 storage_service: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
39f1175451 storage_proxy: wait_for_hint_sync_point: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
e228a112fe storage_proxy: remote: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
0a1e7de6ea service: session: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
747446cb25 service: raft: raft_rpc: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
01bb3980fc service: raft: raft_group0: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
6118150d44 service: raft: persistent_discovery: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
e430df6332 service: raft: group0_state_machine: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00
Benny Halevy
5f8b5724e6 service: migration_manager: use named gate
Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
2025-04-12 11:28:49 +03:00