Commit Graph

42904 Commits

Author SHA1 Message Date
Botond Dénes
d4563e2b28 test/pylib: rest_client: add get_injection()
The /v2/error_injection/{injection} endpoint now has a GET method too,
expose this.

(cherry picked from commit 0c61b1822c)
2024-06-11 17:32:37 +00:00
Botond Dénes
bb18a8152e api/error_injection: add getter for error_injection
Allow external code to obtain information about an error injection
point, including whether it is enabled, and importantly, what its
parameters are. Together with the `set_parameter()` added in the
previous patch, this allows tests to read out the values of internal
parameters, via a set_parameter() injection point.

(cherry picked from commit feea609e37)
2024-06-11 17:32:37 +00:00
Botond Dénes
1947290c74 utils/error_injection: add set_parameter()
Allow injection points to write values into the parameter map, which
external code can then examine. This allows exfiltrating the values if
internal variables, to be examined by tests, without exposing these
variables via an "official" path.

(cherry picked from commit 4590026b38)
2024-06-11 17:32:36 +00:00
Botond Dénes
d121fc1264 replica/database: fix live-update enable_compacting_data_for_streaming_and_repair
This config item is propagated to the table object via table::config.
Although the field in table::config, used to propagate the value, was
utils::updateable_value<T>, it was assigned a constant and so the
live-update chain was broken.
This patch fixes this.

(cherry picked from commit dbccb61636)
2024-06-11 17:32:36 +00:00
Guilherme Nogueira
1ace370ecd Remove comma that breaks CQL DML on tablets.rst
The current sample reads:

```cql
CREATE KEYSPACE my_keyspace
WITH replication = {
    'class': 'NetworkTopologyStrategy',
    'replication_factor': 3,
} AND tablets = {
    'enabled': false
};
```

The additional comma after `'replication_factor': 3` breaks the query execution.

(cherry picked from commit cf157e4423)

Closes scylladb/scylladb#19194
2024-06-10 20:24:22 +03:00
Kefu Chai
3e7de910ab docs: correct the link pointing to Scylla U
before this change it points to
https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/
which then redirects the browser to
https://university.scylladb.com/courses/scylla-operations/,
but it should have point to
https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/

in this change, the hyperlink is corrected.

Fixes #19163
Refs 6e97b83b60
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit b5dce7e3d0)

Closes scylladb/scylladb#19198
2024-06-10 20:23:08 +03:00
Kefu Chai
9cf0d618d0 build: populate cxxflags to abseil
before this change, when building abseil, we don't pass cxxflags
to compiler, and abseil libraries are build with the default
optimization level. in the case of clang, its default optimization
level is `-O0`, it compiles the fastest, but the performance of
the emitted code is not optimized for runtime performance. but we
expect good performance for the release build. a typical command line
for building abseil looks like
```
clang++  -I/home/kefu/dev/scylladb/master/abseil -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -MF absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o.d -o absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/base/internal/scoped_set_env.cc
```

so, in this change, we populate cxxflags to abseil, so that the
per-mode `-O` option can be populated when building abseil.

after this change, the command line building abseil in release mode
looks like

```
clang++  -I/home/kefu/dev/scylladb/master/abseil -ffunction-sections -fdata-sections  -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -MF absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o.d -o absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/flags/internal/commandlineflag.cc
```

Refs 0b0e661a85
Fixes #19161
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 535f2b2134)

Closes scylladb/scylladb#19200
2024-06-10 20:22:00 +03:00
Nadav Har'El
4810937ddf test/alternator: fix flaky test test_item_latency
The Alternator test test_metrics.py::test_item_latency confirms that
for several operation types (PutItem, GetItem, DeleteItem, UpdateItem)
we did not forget to measure their latencies.

The test checked that a latency was updated by checking that two metrics
increases:
    scylla_alternator_op_latency_count
    scylla_alternator_op_latency_sum

However, it turns out that the "sum" is only an approximate sum of all
latencies, and when the total sum grows large it sometimes does *not*
increase when a short latency is added to the statistics. When this
happens, this test fails on the assertion that the "sum" increases after
an operation. We saw this happening sometimes in CI runs.

The simple fix is to stop checking _sum at all, and only verify that
the _count increases - this is really an integer counter that
unconditionally increases when a latency is added to the histogram.

Don't worry that the strength of this test is reduced - this test was
never meant to check the accuracy or correctness of the histograms -
we should have different (and better) tests for that, unrelated to
Alternator. The purpose of *this* test is only to verify that for some
specific operation like PutItem, Alternator didn't forget to measure its
latency and update the histogram. We want to avoid a bug like we had
in counters in the past (#9406).

Fixes #18847.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 13cf6c543d)

Closes scylladb/scylladb#19193
2024-06-10 20:20:54 +03:00
Tomasz Grabiec
a3e4dc7b6c test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout
The check query may be executed on a node which doesn't yet see that
the downed server is down, as it is not shut down gracefully. The
query coordinator can choose the down node as a CL=1 replica for read
and time out.

To fix, wait for all nodes to notice the node is down before executing
the checking query.

Fixes #17938

(cherry picked from commit c8f71f4825)

Closes scylladb/scylladb#19199
2024-06-10 20:12:56 +03:00
Botond Dénes
7a6ff12ace Merge '[Backport 6.0] alternator: keep TTL work in the maintenance scheduling group' from ScyllaDB
Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row have reached its end-of-life and deletes it if it did. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads, negatively affecting its latencies. This was found to be causes by the reads and writes done on behalf of the alternator TTL, which looses its maintenance scheduling group when these have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group, when statement verbs like read or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and sytsem (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group).
This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group.

Fixes: #18719

- [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this.

(cherry picked from commit 5d3f7c13f9)

(cherry picked from commit 1fe8f22d89)

 Refs #18729

Closes scylladb/scylladb#19196

* github.com:scylladb/scylladb:
  alternator, scheduler: test reproducing RPC scheduling group bug
  main: add maintenance tenant to messaging_service's scheduling config
2024-06-10 19:58:38 +03:00
Anna Stuchlik
e38d675cb9 doc: mark tablets as GA in the CREATE KEYSPACE section
This commit removes the information that tablets are an experimental feature
from the CREATE KEYSPACE section.

In addition, it removes the notes and cautions that are redundant when
a feature is GA, especially the information and warnings about the future
plans.

Fixes https://github.com/scylladb/scylladb/issues/18670

Closes scylladb/scylladb#19063

(cherry picked from commit 55ed18db07)
2024-06-10 18:53:47 +03:00
Gleb Natapov
45ff4d2c41 group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group
Currently they both run in streaming group and it may become busy during
repair/mv building and affect group0 functionality. Move it to the
gossiper group where it should have more time to run.

Fixes #18863

(cherry picked from commit a74fbab99a)

Closes scylladb/scylladb#19175
2024-06-10 10:34:29 +02:00
Nadav Har'El
0662e80917 alternator, scheduler: test reproducing RPC scheduling group bug
This patch adds a test for issue #18719: Although the Alternator TTL
work is supposedly done in the "streaming" scheduling group, it turned
out we had a bug where work sent on behalf of that code to other nodes
failed to inherit the correct scheduling group, and was done in the
normal ("statement") group.

Because this problem only happens when more than one node is involved,
the test is in the multi-node test framework test/topology_experimental_raft.

The test uses the Alternator API. We already had in that framework a
test using the Alternator API (a test for alternator+tablets), so in
this patch we move the common Alternator utility functions to a common
file, test_alternator.py, where I also put the new test.

The test is based on metrics: We write expiring data, wait for it to expire,
and then check the metrics on how much CPU work was done in the wrong
scheduling group ("statement"). Before #18719 was fixed, a lot of work
was done there (more than half of the work done in the right group).
After the issue was fixed in the previous patch, the work on the wrong
scheduling group went down to zero.

Signed-off-by: Nadav Har'El <nyh@scylladb.com>
(cherry picked from commit 1fe8f22d89)
2024-06-10 07:42:23 +00:00
Botond Dénes
5b546ad4b1 main: add maintenance tenant to messaging_service's scheduling config
Currently only the user tenant (statement scheduling group) and system
(default scheduling group) tenants exist, as we used to have only
user-initiated operations and sytem (internal) ones. Now there is need
to distinguish between two kinds of system operation: foreground and
background ones. The former should use the system tenant while the
latter will use the new maintenance tenant (streaming scheduling group).

(cherry picked from commit 5d3f7c13f9)
2024-06-10 07:42:22 +00:00
Piotr Dulikowski
e04378fdf0 Merge ' [Backport 6.0] db/hints: Use host ID to IP mappings to choose the ep manager to drain when node is leaving' from Dawid Mędrek
In [d0f5873](d0f58736c8), we introduced mappings IP–host ID between hint directories and the hint endpoint managers managing them. As a consequence, it may happen that one hint directory stores hints towards multiple nodes at the same time. If any of those nodes leaves the cluster, we should drain the hint directory. However, before these changes that doesn't happen – we only drain it when the node of the same host ID as the hint endpoint manager leaves the cluster.

This PR fixes that draining issue in the pre-host-ID-based hinted handoff. Now no matter which of the nodes corresponding to a hint directory leaves the cluster, the directory will be drained.

We also introduce error injections to be able to test that it indeed happens.

Fixes scylladb/scylladb#18761

(cherry picked from commit [745a9c6](745a9c6ab8))

(cherry picked from commit [e855794](e855794327))

Refs scylladb/scylladb#18764

Closes scylladb/scylladb#19114

* github.com:scylladb/scylladb:
  db/hints: Introduce an error injection to test draining
  db/hints: Ensure that draining happens
2024-06-10 09:11:07 +02:00
Tomasz Grabiec
f8243cbf19 Merge '[Backport 6.0] Serialize repair with tablet migration' from ScyllaDB
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.

One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.

Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.

Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requests start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.

A blocked tablet migration (e.g. due to down node) will block repair,
whereas before it would fail. Once admin resolves the cause of blocked migration,
repair will continue.

Fixes #17658.
Fixes #18561.

(cherry picked from commit 6c64cf33df)

(cherry picked from commit 1513d6f0b0)

(cherry picked from commit 476c076a21)

(cherry picked from commit c45ce41330)

(cherry picked from commit e97acf4e30)

(cherry picked from commit 98323be296)

(cherry picked from commit 5ca54a6e88)

 Refs #18641

Closes scylladb/scylladb#19144

* github.com:scylladb/scylladb:
  test: pylib: Do not block async reactor while removing directories
  repair: Exclude tablet migrations with tablet repair
  repair_service: Propagate topology_state_machine to repair_service
  main, storage_service: Move topology_state_machine outside storage_service
  storage_srvice, toplogy: Extract topology_state_machine::await_quiesced()
  tablet_scheduler: Make disabling of balancing interrupt shuffle mode
  tablet_scheduler: Log whether balancing is considered as enabled
2024-06-09 00:20:44 +02:00
Tomasz Grabiec
27f01bf4e3 test: pylib: Do not block async reactor while removing directories
This fixes a problem where suite cleanup schedules lots of uninstall()
tasks for servers started in the suite, which schedules lots of tasks,
which synchronously call rmtree(). These take over a minute to finish,
which blocks other tasks for tests which are still executing.

In particular, this was observed to case
ManagerClient.server_stop_gracefully() to time-out. It has a timeout
of 60 seconds. The server was stopped quickly, but the RESTful API
response was not processed in time and the call timed out when it got
the async reactor.

(cherry picked from commit 5ca54a6e88)
2024-06-08 16:31:18 +02:00
Tomasz Grabiec
ded9aca6ee repair: Exclude tablet migrations with tablet repair
We want to exclude repair with tablet migrations to avoid races
between repair reads and writes with replica movement. Repair is not
prepared to handle topology transitions in the middle.

One reason why it's not safe is that repair may successfully write to
a leaving replica post streaming phase and consider all replicas to be
repaired, but in fact they are not, the new replica would not be
repaired.

Other kinds of races could result in repair failures. If repair writes
to a leaving replica which was already cleaned up, such writes will
fail, causing repair to fail.

Excluding works by keeping effective_replication_map_ptr in a version
which doesn't have table's tablets in transitions. That prevents later
transitions from starting because topology coordinator's barrier will
wait for that erm before moving to a stage later than
allow_write_both_read_old, so before any requets start using the new
topology. Also, if transitions are already running, repair waits for
them to finish.

Fixes #17658.
Fixes #18561.

(cherry picked from commit 98323be296)
2024-06-08 16:31:18 +02:00
Tomasz Grabiec
ccd441a4de repair_service: Propagate topology_state_machine to repair_service
(cherry picked from commit e97acf4e30)
2024-06-08 16:31:15 +02:00
Jenkins Promoter
79e4e411b3 Update ScyllaDB version to: 6.0.1 2024-06-07 09:31:05 +03:00
Kefu Chai
f8ba94a960 doc: document "enable_tablets" option
it sets the cluster feature of tablets, and is a prerequisite for
using tablets.

Refs #18670
Fixes #19157
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit bac7e1e942)

Closes scylladb/scylladb#19158
2024-06-07 07:03:30 +03:00
Tzach Livyatan
dfe89157c6 Docs: fix start command in Update replace-dead-node.rst
Fix #18920

(cherry picked from commit c30f81c389)

Closes scylladb/scylladb#19142
2024-06-07 07:02:02 +03:00
Kefu Chai
50d8fa6b77 topology_coordinator: handle/wait futures when stopping topology_coordinator
before this change, unlike other services in scylla,
topology_coordinator is not properly stopped when it is aborted,
because the scylla instance is no longer a leader or is being shut down.
its `run()` method just stops the grand loop and bails out before
topology_coordinator is destroyed. but we are tracking the migration
state of tablets using a bunch of futures, which might not be
handled yet, and some of them could carry failures. in that case,
when the `future` instances with failure state get destroyed,
seastar calls `report_failed_future`. and seastar considers this
practice a source a bug -- as one just fails to handle an error.
that's why we have following error:

```
WARN  2024-05-19 23:00:42,895 [shard 0:strm] seastar - Exceptional future ignored: seastar::rpc::unknown_verb_error (unknown verb), backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c14e /home/bhalevy/.ccm/scylla-repository/local_tarball/libre
loc/libseastar.so+0x56c770 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56ca58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x38c6ad 0x29cdd07 0x29b376b 0x29a5b65 0x108105a /home/bhalevy/.ccm/scylla-repository/local_tarbal
l/libreloc/libseastar.so+0x3ff1df /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x400367 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff838 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36de58
 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d092 0x1017cba 0x1055080 0x1016ba7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x1015524
```
and the backtrace looks like:
```
seastar::current_backtrace_tasklocal() at ??:?
seastar::current_tasktrace() at ??:?
seastar::current_backtrace() at ??:?
seastar::report_failed_future(seastar::future_state_base::any&&) at ??:?
service::topology_coordinator::tablet_migration_state::~tablet_migration_state() at topology_coordinator.cc:?
service::topology_coordinator::~topology_coordinator() at topology_coordinator.cc:?
service::run_topology_coordinator(seastar::sharded<db::system_distributed_keyspace>&, gms::gossiper&, netw::messaging_service&, locator::shared_token_metadata&, db::system_keyspace&, replica::database&, service::raft_group0&, service::topology_state_machine&, seastar::abort_source&, raft::server&, seastar::noncopyable_function<seastar::future<service::raft_topology_cmd_result> (utils::tagged_tagged_integer<raft::internal::non_final, raft::term_tag, unsigned long>, unsigned long, service::raft_topology_cmd const&)>, service::tablet_allocator&, std::chrono::duration<long, std::ratio<1l, 1000l> >, service::endpoint_lifecycle_notifier&) [clone .resume] at topology_coordinator.cc:?
seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:?
seastar::reactor::run_some_tasks() at ??:?
seastar::reactor::do_run() at ??:?
seastar::reactor::run() at ??:?
seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ??:?
```

and even worse, these futures are indirectly owned by `topology_coordinator`.
so there are chances that they could be used even after `topology_coordinator`
is destroyed. this is a use-after-free issue. because the
`run_topology_coordinator` fiber exits when the scylla instance retires
from the leader's role, this use-after-free could be fatal to a
running instance due to undefined behavior of use after free.

so, in this change, we handle the futures in `_tablets`, and note
down the failures carried by them if any.

Fixes #18745
Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>
(cherry picked from commit 4a36918989)

Closes scylladb/scylladb#19139
2024-06-07 07:00:25 +03:00
Jenkins Promoter
a77615adf3 Update ScyllaDB version to: 6.0.0 scylla-6.0.0-candidate-20240606102200 scylla-6.0.0-candidate-20240606081124 scylla-6.0.0 2024-06-06 16:03:39 +03:00
Tomasz Grabiec
e518bb68b2 main, storage_service: Move topology_state_machine outside storage_service
It will be propagated to repair_service to avoid cyclic dependency:

storage_service <-> repair_service

(cherry picked from commit c45ce41330)
2024-06-06 13:01:19 +00:00
Tomasz Grabiec
af2caeb2de storage_srvice, toplogy: Extract topology_state_machine::await_quiesced()
Will be used later in a place which doesn't have access to storage_service
but has to toplogy_state_machine.

It's not necessary to start group0 operation around polling because
the busy() state can be checked atomically and if it's false it means
the topology is no longer busy.

(cherry picked from commit 476c076a21)
2024-06-06 13:01:19 +00:00
Tomasz Grabiec
d5ebfea1ff tablet_scheduler: Make disabling of balancing interrupt shuffle mode
Tests will rely on that, they will run in shuffle mode, and disable
balancing around section which otherwise would be infinitely blocked
by ongoing shuffling (like repair).

(cherry picked from commit 1513d6f0b0)
2024-06-06 13:01:18 +00:00
Tomasz Grabiec
3fec9e1344 tablet_scheduler: Log whether balancing is considered as enabled
(cherry picked from commit 6c64cf33df)
2024-06-06 13:01:18 +00:00
Kamil Braun
5d3dde50f4 Merge '[Backport 6.0] Fail bootstrap if ip mapping is missing during double write stage' from ScyllaDB
If a node restart just before it stores bootstrapping node's IP it will
not have ID to IP mapping for bootstrapping node which may cause failure
on a write path. Detect this and fail bootstrapping if it happens.

(cherry picked from commit 1faef47952)

(cherry picked from commit 27445f5291)

(cherry picked from commit 6853b02c00)

(cherry picked from commit f91db0c1e4)

 Refs #18927

Closes scylladb/scylladb#19118

* github.com:scylladb/scylladb:
  raft topology: fix indentation after previous commit
  raft topology: do not add bootstrapping node without IP as pending
  test: add test of bootstrap where the coordinator crashes just before storing IP mapping
  schema_tables: remove unused code
2024-06-06 11:35:13 +02:00
Tomasz Grabiec
b7fe4412d0 test: pylib: Fetch all pages by default in run_async
Fetching only the first page is not the intuitive behavior expected by users.

This causes flakiness in some tests which generate variable amount of
keys depending on execution speed and verify later that all keys were
written using a single SELECT statement. When the amount of keys
becomes larger than page size, the test fails.

Fixes #18774

(cherry picked from commit 2c3f7c996f)

Closes scylladb/scylladb#19130
2024-06-06 08:22:45 +03:00
Benny Halevy
fd7284ec06 gms: endpoint_state: get_dc_rack: do not assign to uninitialized memory
Assigning to a member of an uninitialized optional
does not initialize the object before assigning to it.
This resulted in the AddressSanitizer detecting attempt
to double-free when the uninitialized string contained
apprently a bogus pointer.

The change emplaces the returned optional when needed
without resorting to the copy-assignment operator.
So it's not suceptible to assigning to uninitialized
memory, and it's more efficient as well...

Fixes scylladb/scylladb#19041

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit b2fa954d82)

Closes scylladb/scylladb#19117
2024-06-06 08:21:05 +03:00
Botond Dénes
8d12eeee62 Merge '[Backport 6.0] tasks: introduce task manager's task folding' from Aleksandra Martyniuk
Task manager's tasks stay in memory after they are finished.
Moreover, even if a child task is unregistered from task manager,
it is still alive since its parent keeps a foreign pointer to it. Also,
when a task has finished successfully there is no point in keeping
all of its descendants in memory.

The patch introduces folding of task manager's tasks. Whenever
a task which has a parent is finished it is unregistered from task
manager and foreign_ptr to it (kept in its parent) is replaced
with its status. Children's statuses of the task are dropped unless
they or one of their descendants failed. So for each operation we
keep a tree of tasks which contains:
- a root task and its direct children (status if they are finished, a task
  otherwise);
- running tasks and their direct children (same as above);
- a statuses path from root to failed tasks.

/task_manager/wait_task/ does not unregister tasks anymore.

Refs: https://github.com/scylladb/scylladb/issues/16694.

- [ ] ** Backport reason (please explain below if this patch should be backported or not) **
Requires backport to 6.0 as task number exploded with tablets.

(cherry picked from commit 6add9edf8a)

(cherry picked from commit 319e799089)

(cherry picked from commit e6c50ad2d0)

(cherry picked from commit a82a2f0624)

(cherry picked from commit c1b2b8cb2c)

(cherry picked from commit 30f97ea133)

(cherry picked from commit fc0796f684)

(cherry picked from commit d7e80a6520)

(cherry picked from commit beef77a778)

Refs https://github.com/scylladb/scylladb/pull/18735

Closes scylladb/scylladb#19104

* github.com:scylladb/scylladb:
  docs: describe task folding
  test: rest_api: add test for task tree structure
  test: rest_api: modify new_test_module
  tasks: test: modify test_task methods
  api: task_manager: do not unregister task in /task_manager/wait_task/
  tasks: unregister tasks with parents when they are finished
  tasks: fold finished tasks info their parents
  tasks: make task_manager::task::impl::finish_failed noexcept
  tasks: change _children type
2024-06-06 07:56:12 +03:00
Gleb Natapov
e11827f37e raft topology: fix indentation after previous commit
(cherry picked from commit f91db0c1e4)
2024-06-05 13:55:29 +00:00
Gleb Natapov
0acfc223ab raft topology: do not add bootstrapping node without IP as pending
If there is no mapping from host id to ip while a node is in bootstrap
state there is no point adding it to pending endpoint since write
handler will not be able to map it back to host id anyway. If the
transition sate requires double writes though we still want to fail.
In case the state is write_both_read_old we fail the barrier that will
cause topology operation to rollback and in case of write_both_read_new
we assert but this should not happen since the mapping is persisted by
this point (or we failed in write_both_read_old state).

Fixes: scylladb/scylladb#18676
(cherry picked from commit 6853b02c00)
2024-06-05 13:55:28 +00:00
Gleb Natapov
c53cd98a41 test: add test of bootstrap where the coordinator crashes just before storing IP mapping
On the next boot there is no host ID to IP mapping which causes node to
crash again with "No mapping for :: in the passed effective replication map"
assertion.

(cherry picked from commit 27445f5291)
2024-06-05 13:55:28 +00:00
Gleb Natapov
fa6a7cf144 schema_tables: remove unused code
(cherry picked from commit 1faef47952)
2024-06-05 13:55:28 +00:00
Patryk Jędrzejczak
65021c4b1c [Backport 6.0] test: test_topology_ops: run correctly without tablets
The values of `tablets_enabled` were nonempty strings, so they
always evaluated to `True` in the if statement responsible for
enabling writing workers only if tablets are disabled. Hence, the
writing workers were always disabled.

The original commit, ea4717da65,
contains one more change, which is not needed (and conflicting)
in 6.0 because scylladb/scylladb#18898 has been backported first.

Closes scylladb/scylladb#19111
2024-06-05 15:15:00 +02:00
Botond Dénes
341c29bd74 Merge '[Backport 6.0] storage_service: Fix race between tablet split and stats retrieval' from Raphael "Raph" Carvalho
Retrieval of tablet stats must be serialized with mutation to token metadata, as the former requires tablet id stability.
If tablet split is finalized while retrieving stats, the saved erm, used by all shards, can have a lower tablet count than the one in a particular shard, causing an abort as tablet map requires that any id feeded into it is lower than its current tablet count.

Fixes https://github.com/scylladb/scylladb/issues/18085.

(cherry picked from commit abcc68dbe7)

(cherry picked from commit 551bf9dd58)

(cherry picked from commit e7246751b6)

Refs https://github.com/scylladb/scylladb/pull/18287

Closes scylladb/scylladb#19095

* github.com:scylladb/scylladb:
  topology_experimental_raft/test_tablets: restore usage of check_with_down
  test: Fix flakiness in topology_experimental_raft/test_tablets
  service: Use tablet read selector to determine which replica to account table stats
  storage_service: Fix race between tablet split and stats retrieval
2024-06-05 13:06:32 +03:00
Aleksandra Martyniuk
e963631859 docs: describe task folding
(cherry picked from commit beef77a778)
2024-06-05 10:09:13 +02:00
Jenkins Promoter
c6f0a3267e Update ScyllaDB version to: 6.0.0-rc3 scylla-6.0.0-rc3 scylla-6.0.0-rc3-candidate-20240605025744 2024-06-05 10:03:47 +03:00
Marcin Maliszkiewicz
f02f2fef40 docs: remove note about performance degradation with default superuser
This doesn't apply for auth-v2 as we improved data placement and
removed cassandra quirk which was setting different CL for some
default superuser involved operations.

Fixes #18773

(cherry picked from commit 9adf74ae6c)

Closes scylladb/scylladb#18860
2024-06-05 09:04:45 +03:00
Benny Halevy
f8ae38a68c data_dictionary: keyspace_metadata: format: print also initial_tablets
Currently, there is no indication of tablets in the logged KSMetaData.
Print the tablets configuration of either the`initial`  number of tablets,
if enabled, or {'enabled':false} otherwise.

For example:
```
migration_manager - Create new Keyspace: KSMetaData{name=tablets_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"initial":0}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004d446a8}

migration_manager - Create new Keyspace: KSMetaData{name=vnodes_ks, strategyClass=org.apache.cassandra.locator.NetworkTopologyStrategy, strategyOptions={"datacenter1": "1"}, cfMetaData={}, durable_writes=true, tablets={"enabled":false}, userTypes=org.apache.cassandra.config.UTMetaData@0x600004c33ea8}

Signed-off-by: Benny Halevy <bhalevy@scylladb.com>
(cherry picked from commit 4fe700a962)

Closes scylladb/scylladb#19009
2024-06-05 08:31:21 +03:00
Botond Dénes
8a064daccf Update tools/java submodule
* tools/java 4ee15fd9...6dfc187a (1):
  > Update Scylla Java driver to 3.11.5.3.

[botond: regenerate frozen toolchain]

Closes scylladb/scylladb#18999
2024-06-05 08:00:19 +03:00
Botond Dénes
7f540407c9 Merge '[Backport 6.0] repair: Introduce new primary replica selection algorithm for tablets' from ScyllaDB
Tablet allocation does not guarantee fairness of
the first replica in the replicas set across dcs.
The lack of this fix cause the following dtest to fail:
repair_additional_test.py::TestRepairAdditional::test_repair_option_pr_multi_dc

Use the tablet_map get_primary_replica or get_primary_replica_within_dc,
respectively to see if this node is the primary replica for each tablet
or not.

Fixes https://github.com/scylladb/scylladb/issues/17752

No backport is required before 6.0 as tablets (and tablet repair) are introduced in 6.0

(cherry picked from commit c52f70f92c)

(cherry picked from commit 2de79c39dc)

(cherry picked from commit 84761acc31)

(cherry picked from commit 009767455d)

(cherry picked from commit 18df36d920)

 Refs #18784

Closes scylladb/scylladb#19068

* github.com:scylladb/scylladb:
  repair: repair_tablets: use get_primary_replica
  repair: repair_tablets: no need to check ranges_specified per tablet
  locator: tablet_map: add get_primary_replica_within_dc
  locator: tablet_map: get_primary_replica: do not copy tablet info
  locator: tablet_map: get_primary_replica: return tablet_replica
2024-06-05 07:47:24 +03:00
Aleksandra Martyniuk
50e1369d1d test: rest_api: add test for task tree structure
Add test which checks whether the tasks are folded into their parent
as expected.

(cherry picked from commit d7e80a6520)
2024-06-04 14:42:10 +00:00
Aleksandra Martyniuk
21e860453c test: rest_api: modify new_test_module
Remove remaining test tasks when a test module is removed, so that
a node could shutdown even if a test fails.

(cherry picked from commit fc0796f684)
2024-06-04 14:42:10 +00:00
Dawid Medrek
fc3d2d8fde db/hints: Introduce an error injection to test draining
We want to verify that a hint directory is drained
when any of the nodes correspodning to it leaves
the cluster. The test scenario should happen before
the whole cluster has been migrated to
the host-ID-based hinted handoff, so when we still
rely on the mappings between hint endpoint managers
and the hint directories managed by them.

To make such a test possible, in these changes we
introduce an error injection rejecting incoming
hints. We want to test a scenario when:

1. hints are saved towards a given node -- node N1,
2. N1 changes its IP to a different one,
3. some other node -- node N2 -- changes its IP
   to the original IP of N1,
4. hints are saved towards N2 and they are stored
   in the same directory as the hints saved towards
   N1 before,
5. we start draining N2.

Because at some point N2 needs to be stopped,
it may happen that some mutations towards
a distributed system table generate a hint
to N2 BEFORE it has finished changing its IP,
effectively creating another hint directory
where ALL of the hints towards the node
will be stored from there on. That would disturb
the test scenario. Hence, this error injection is
necessary to ensure that all of the steps in the
test proceed as expected.

(cherry picked from commit e855794327)
2024-06-04 14:42:09 +00:00
Aleksandra Martyniuk
1d34da21a9 tasks: test: modify test_task methods
Wait until the task is done in test_task::finish_failed and
test_task::finish to ensure that it is folded into its parent.

(cherry picked from commit 30f97ea133)
2024-06-04 14:42:09 +00:00
Aleksandra Martyniuk
377bc345f1 api: task_manager: do not unregister task in /task_manager/wait_task/
If /task_manager/wait_task/ unregisters the task, then there is no
way to examine children failures, since their statuses can be checked
only through their parent.

(cherry picked from commit c1b2b8cb2c)
2024-06-04 14:42:09 +00:00
Aleksandra Martyniuk
607be221b8 tasks: unregister tasks with parents when they are finished
Unregister children that are finished from task manager. They can be
examined through they parents.

(cherry picked from commit a82a2f0624)
2024-06-04 14:42:09 +00:00