scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-05-24 00:32:15 +00:00

Author	SHA1	Message	Date
Botond Dénes	5139e74058	Merge '[Backport 6.0] Improve handling of outdated --experimental-features' from ScyllaDB Some time ago it turned out that if unrecognized feature name is met in scylla.yaml, the whole experimental features list is ignored, but scylla continues to boot. There's UNUSED feature which is the proper way to deprecate a feature, and this PR improves its handling in several ways. 1. The recently removed "tablets" feature is partially brought back, but marked as UNUSED 2. Any UNUSED features met while parsing are printed into logs 3. The enum_option<> helper is enlightened along the way refs: #18968 (cherry picked from commit `f56cdb1cac`) (cherry picked from commit `0c0a7d9b9a`) (cherry picked from commit `b85a02a3fe`) (cherry picked from commit `b2520b8185`) Refs #19230 Closes scylladb/scylladb#19266 * github.com:scylladb/scylladb: config: Mark tablets feature as unused main: Warn unused features enum_option: Carry optional key on board enum_option: Remove on-board _map member	2024-06-14 15:43:17 +03:00
Michał Chojnowski	ddcaefefdc	test_tablets: add test_tablet_storage_freeing Tests that tablet storage is freed after it is migrated away. Fixes #16946 (cherry picked from commit `823da140dd`)	2024-06-14 10:19:32 +00:00
Michał Chojnowski	f466dcfa5f	test: pylib: add get_sstables_disk_usage() Adds an util for measuring the disk usage of the given table on the given node. Will be used in a follow-up patch for testing that sstables are freed properly. (cherry picked from commit `7741491b47`)	2024-06-14 10:19:32 +00:00
Benny Halevy	6122f9454d	storage_service: join_token_ring: reject replace on different dc or rack Do not allow replacing a node on one dc/rack with a node on a different dc/rack as this violates the assumption of replace node operation that all token ranges previously owned by the dead node would be rebuilt on the new node. Fixes #16858 Refs scylladb/scylla-enterprise#3518 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `34dfa4d3a3`) Closes scylladb/scylladb#19281	2024-06-14 07:43:58 +03:00
Botond Dénes	b18d9e5d0d	Merge '[Backport 6.0] make enable_compacting_data_for_streaming_and_repair truly live-update' from ScyllaDB This config item is propagated to the table object via table::config. Although the field in `table::config`, used to propagate the value, was `utils::updateable_value<T>`, it was assigned a constant and so the live-update chain was broken. This series fixes this and adds a test which fails before the patch and passes after. The test needed new test infrastructure, around the failure injection api, namely the ability to exfiltrate the value of internal variable. This infrastructure is also added in this series. Fixes: https://github.com/scylladb/scylladb/issues/18674 - [x] This patch has to be backported because it fixes broken functionality (cherry picked from commit `dbccb61636`) (cherry picked from commit `4590026b38`) (cherry picked from commit `feea609e37`) (cherry picked from commit `0c61b1822c`) (cherry picked from commit `8ef4fbdb87`) Refs #18705 Closes scylladb/scylladb#19240 * github.com:scylladb/scylladb: test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update test/pylib: rest_client: add get_injection() api/error_injection: add getter for error_injection utils/error_injection: add set_parameter() replica/database: fix live-update enable_compacting_data_for_streaming_and_repair	2024-06-13 12:45:23 +03:00
Kamil Braun	cb6a97d0dc	raft: fsm: add details to on_internal_error_noexcept message If we receive a message in the same term but from a different leader than we expect, we print: ``` Got append request/install snapshot/read_quorum from an unexpected leader ``` For some reason the message did not include the details (who the leader was and who the sender was) which requires almost zero effort and might be useful for debugging. So let's include them. Ref: scylladb/scylla-enterprise#4276 (cherry picked from commit `99a0599e1e`) Closes scylladb/scylladb#19265	2024-06-13 11:25:11 +02:00
Wojciech Mitros	813fef44d3	exceptions: make view update timeouts inherit from timed_out_error Currently, when generating and propagating view updates, if we notice that we've already exceeded the time limit, we throw an exception inheriting from `request_timeout_exception`, to later catch and log it when finishing request handling. However, when catching, we only check timeouts by matching the `timed_out_error` exception, so the exception thrown in the view update code is not registered as a timeout exception, but an unknown one. This can cause tests which were based on the log output to start failing, as in the past we were noticing the timeout at the end of the request handling and using the `timed_out_error` to keep processing it and now, even though we do notice the timeout even earlier, due to its type we log an error to the log, instead of treating it as a regular timeout. In this patch we make the error thrown on timeout during view updates inherit from `timed_out_error` instead of the `request_timeout_exception` (it is also moved from the "exceptions" directory, where we define exceptions returned to the user). Aside from helping with the issue described above, we also improve our metrics, as the `request_timeout_exception` is also not checked for in the `is_timeout_exception` method, and because we're using it to check whether we should update write timeout metrics, they will only start getting updated after this patch. Fixes #19261 (cherry picked from commit `4aa7ada771`) Closes scylladb/scylladb#19262	2024-06-13 12:01:12 +03:00
Botond Dénes	1c67c6cf78	Merge '[Backport 6.0] test: memtable_test: increase unspooled_dirty_soft_limit ' from ScyllaDB before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes https://github.com/scylladb/scylladb/issues/19034 --- the issue applies to both 5.4 and 6.0, and this issue hurts the CI stability, hence we should backport it. (cherry picked from commit `2df4e9cfc2`) (cherry picked from commit `223fba3243`) Refs #19252 Closes scylladb/scylladb#19258 * github.com:scylladb/scylladb: test: memtable_test: increase unspooled_dirty_soft_limit test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE	2024-06-13 07:26:43 +03:00
Pavel Emelyanov	5811df4d4b	config: Mark tablets feature as unused This feature used to be there for a while, but then it was removed by `83d491af02`. This patch partially takes it back, but maps it to UNUSED, so that if it is met in config, a warning is printed, but other features are parsed as well. refs: #18968 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `b2520b8185`)	2024-06-12 18:35:32 +00:00
Pavel Emelyanov	cb9d6e080c	main: Warn unused features When seeing an UNUSED feature -- print it into log. This is where the enum_option::key is in use. The thing is that experimental features map different unused feature names into the single UNUSED feature enum value, so once the feature is parsed its configured name only persists in the option's key member (saved by previous patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `b85a02a3fe`)	2024-06-12 18:35:32 +00:00
Pavel Emelyanov	86068790ec	enum_option: Carry optional key on board It facilitates option formatting, but the main purpose is to be able to find out the exact keys, not values, later (see next patch). Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `0c0a7d9b9a`)	2024-06-12 18:35:31 +00:00
Pavel Emelyanov	3501ede024	enum_option: Remove on-board _map member The map in question is immutable and can be obtained from the Mapper type at any time, so there's no need to keep a copy of it on each enum_option Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `f56cdb1cac`)	2024-06-12 18:35:31 +00:00
Anna Stuchlik	bc89aac9d0	doc: reorganize ToC of the Reference section This commit adds a proper ToC to the Reference section to improve how it renders. (cherry picked from commit `63084c6798`) Closes scylladb/scylladb#19257 scylla-6.0.1 scylla-6.0.1-candidate-20240613060935	2024-06-12 19:12:53 +02:00
Kefu Chai	b39c0a1d15	test: memtable_test: increase unspooled_dirty_soft_limit before this change, when performing memtable_test, we expect that the memtables of ks.cf are the only memtables being flushed. and we inject 4 failures in the code path of flush, and wait until 4 of them are triggered. but in the background, `dirty_memory_manager` performs flush on all tables when necessary. so, the total number of failures is not necessarily the total number of failures triggered when flushing ks.cf, some of them could be triggered when flushing system tables. that's why we have sporadic test failures from this test, as we might check `t.min_memtable_timestamp()` too soon. after this change, we increase `unspooled_dirty_soft_limit` setting, in order to disable `dirty_memory_manager`, so that the only flush is performed by the test. Fixes #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `223fba3243`)	2024-06-12 15:44:11 +00:00
Kefu Chai	548fd01bd4	test: memtable_test: replace BOOST_ASSERT with BOOST_REQURE before this change, we verify the behavior of design under test using `BOOST_ASSERT()`, which is a wrapper around `assert()`, so if a test fails, the test just aborts. this is not very helpful for postmortem debugging. after this change, we use `BOOST_REQUIRE` macro for verifying the behavior, so that Boost.Test prints out the condition if it does not hold when we test it. Refs #19034 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `2df4e9cfc2`)	2024-06-12 15:44:11 +00:00
Pavel Emelyanov	2306c3b522	test: Reduce failure detector timeout for failed tablets migration test Most of the time this test spends waiting for a node to die. Helps 3x times Was real 9m21,950s user 1m11,439s sys 1m26,022s Now real 3m37,780s user 0m58,439s sys 1m13,698s refs: #17764 Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> (cherry picked from commit `a4e8f9340a`) Closes scylladb/scylladb#19233	2024-06-12 10:02:45 +03:00
Tomasz Grabiec	6d90ff84d9	Merge '[Backport 6.0] tablets: Filter-out left nodes in get_natural_endpoints()' from ScyllaDB The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 (cherry picked from commit `3e1ba4c859`) (cherry picked from commit `0d596a425c`) Refs #18955 Closes scylladb/scylladb#19143 * github.com:scylladb/scylladb: tablets: Filter-out left nodes in get_natural_endpoints() test: pylib: Extract start_writes() load generator utility	2024-06-12 01:31:38 +02:00
Botond Dénes	0d13c51dd4	test/topology_custom: add test for enable_compacting_data_for_streaming_and_repair live-update Prevent the live-update feature of this config item from breaking silently. (cherry picked from commit `8ef4fbdb87`)	2024-06-11 17:32:37 +00:00
Botond Dénes	d4563e2b28	test/pylib: rest_client: add get_injection() The /v2/error_injection/{injection} endpoint now has a GET method too, expose this. (cherry picked from commit `0c61b1822c`)	2024-06-11 17:32:37 +00:00
Botond Dénes	bb18a8152e	api/error_injection: add getter for error_injection Allow external code to obtain information about an error injection point, including whether it is enabled, and importantly, what its parameters are. Together with the `set_parameter()` added in the previous patch, this allows tests to read out the values of internal parameters, via a set_parameter() injection point. (cherry picked from commit `feea609e37`)	2024-06-11 17:32:37 +00:00
Botond Dénes	1947290c74	utils/error_injection: add set_parameter() Allow injection points to write values into the parameter map, which external code can then examine. This allows exfiltrating the values of internal variables, to be examined by tests, without exposing these variables via an "official" path. (cherry picked from commit `4590026b38`)	2024-06-11 17:32:36 +00:00
Botond Dénes	d121fc1264	replica/database: fix live-update enable_compacting_data_for_streaming_and_repair This config item is propagated to the table object via table::config. Although the field in table::config, used to propagate the value, was utils::updateable_value<T>, it was assigned a constant and so the live-update chain was broken. This patch fixes this. (cherry picked from commit `dbccb61636`)	2024-06-11 17:32:36 +00:00
Michał Chojnowski	80ac0da11c	storage_proxy: avoid infinite growth of _throttled_writes storage_proxy has a throttling mechanism which attempts to limit the number of background writes by forcefully raising CL to ALL (it's not implemented exactly like that, but that's the effect) when the amount of background and queued writes is above some fixed threshold. If this is applied to a write, it becomes "throttled", and its ID is appended to _throttled_writes. Whenever the amount of background and queued writes falls below the threshold, writes are "unthrottled": some IDs are popped from _throttled_writes and the writes represented by these IDs (if their handlers still exist) have their CL lowered back. The problem here is that IDs are only ever removed from _throttled_writes if the number of queued and background writes falls below the threshold. But this doesn't have to happen in any finite time, if there's constant write pressure. And in fact, in one load test, it hasn't happened in 3 hours, eventually causing the buffer to grow into gigabytes and trigger OOM. This patch is intended to be a good-enough-in-practice fix for the problem. Fixes #17476 Fixes #1834 (cherry picked from commit `fee48f67ef`) Closes scylladb/scylladb#19180	2024-06-11 18:33:38 +03:00
Raphael S. Carvalho	d4c3a43b34	replica: Refresh mutation source when allocating tablet replicas Consider the following: 1) table A has N tablets and views 2) migration starts for a tablet of A from node 1 to 2. 3) migration is at write_both_read_old stage 4) coordinator will push writes to both nodes (pending and leaving) 5) A has view, so writes to it will also result in reads (table::push_view_replica_updates()) 6) tablet's update_effective_replication_map() is not refreshing tablet sstable set (for new tablet migrating in) 7) so read on step 5 is not being able to find sstable set for tablet migrating in Causes the following error: "tablets - SSTable set wasn't found for tablet 21 of table mview.users" which means loss of write on pending replica. The fix will refresh the table's sstable set (tablet_sstable_set) and cache's snapshot. It's not a problem to refresh the cache snapshot as long as the logical state of the data hasn't changed, which is true when allocating new tablet replicas. That's also done in the context of compactions for example. Fixes #19052. Fixes #19033. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `7b41630299`) Closes scylladb/scylladb#19229	2024-06-11 18:12:43 +03:00
Kefu Chai	31ba5561e7	build: remove coverage compiling options from the cxx_flags in `44e85c7d`, we remove coverage compiling options from the cflags when building abseil. but in `535f2b21`, these options were brought back as parts of cxx_flags. so we need to remove them again from cxx_flags. Fixes #19219 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `d05db52d11`) Closes scylladb/scylladb#19237	2024-06-11 18:11:35 +03:00
Tomasz Grabiec	7479167af2	tablets: Filter-out left nodes in get_natural_endpoints() The API already promises this, the comment on effective_replication_map says: "Excludes replicas which are in the left state". Tablet replicas on the replaced node are rebuilt after the node already left. We may no longer have the IP mapping for the left node so we should not include that node in the replica set. Otherwise, storage_proxy may try to use the empty IP and fail: storage_proxy - No mapping for :: in the passed effective replication map It's fine to not include it, because storage proxy uses keyspace RF and not replica list size to determine quorum. The node is not coming up, so no one should need to contact it. Users which need replica list stability should use the host_id-based API. Fixes #18843 (cherry picked from commit `0d596a425c`)	2024-06-11 12:18:17 +02:00
Tomasz Grabiec	e35ab96f8b	test: pylib: Extract start_writes() load generator utility (cherry picked from commit `3e1ba4c859`)	2024-06-11 12:18:17 +02:00
Guilherme Nogueira	1ace370ecd	Remove comma that breaks CQL DML on tablets.rst The current sample reads: ```cql CREATE KEYSPACE my_keyspace WITH replication = { 'class': 'NetworkTopologyStrategy', 'replication_factor': 3, } AND tablets = { 'enabled': false }; ``` The additional comma after `'replication_factor': 3` breaks the query execution. (cherry picked from commit `cf157e4423`) Closes scylladb/scylladb#19194	2024-06-10 20:24:22 +03:00
Kefu Chai	3e7de910ab	docs: correct the link pointing to Scylla U before this change it points to https://university.scylladb.com/courses/scylla-operations/lessons/change-data-capture-cdc/ which then redirects the browser to https://university.scylladb.com/courses/scylla-operations/, but it should have pointed to https://university.scylladb.com/courses/data-modeling/lessons/change-data-capture-cdc/ in this change, the hyperlink is corrected. Fixes #19163 Refs `6e97b83b60` Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `b5dce7e3d0`) Closes scylladb/scylladb#19198	2024-06-10 20:23:08 +03:00
Kefu Chai	9cf0d618d0	build: populate cxxflags to abseil before this change, when building abseil, we don't pass cxxflags to compiler, and abseil libraries are build with the default optimization level. in the case of clang, its default optimization level is `-O0`, it compiles the fastest, but the performance of the emitted code is not optimized for runtime performance. but we expect good performance for the release build. a typical command line for building abseil looks like ``` clang++ -I/home/kefu/dev/scylladb/master/abseil -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -MF absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o.d -o absl/base/CMakeFiles/scoped_set_env.dir/internal/scoped_set_env.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/base/internal/scoped_set_env.cc ``` so, in this change, we populate cxxflags to abseil, so that the per-mode `-O` option can be populated when building abseil. 
after this change, the command line building abseil in release mode looks like ``` clang++ -I/home/kefu/dev/scylladb/master/abseil -ffunction-sections -fdata-sections -O3 -mllvm -inline-threshold=2500 -fno-slp-vectorize -DSCYLLA_BUILD_MODE=release -g -gz -ffile-prefix-map=/home/kefu/dev/scylladb/master=. -march=westmere -std=gnu++20 -Wall -Wextra -Wcast-qual -Wconversion -Wfloat-overflow-conversion -Wfloat-zero-conversion -Wfor-loop-analysis -Wformat-security -Wgnu-redeclared-enum -Winfinite-recursion -Winvalid-constexpr -Wliteral-conversion -Wmissing-declarations -Woverlength-strings -Wpointer-arith -Wself-assign -Wshadow-all -Wshorten-64-to-32 -Wsign-conversion -Wstring-conversion -Wtautological-overlap-compare -Wtautological-unsigned-zero-compare -Wundef -Wuninitialized -Wunreachable-code -Wunused-comparison -Wunused-local-typedefs -Wunused-result -Wvla -Wwrite-strings -Wno-float-conversion -Wno-implicit-float-conversion -Wno-implicit-int-float-conversion -Wno-unknown-warning-option -DNOMINMAX -MD -MT absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -MF absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o.d -o absl/flags/CMakeFiles/flags_commandlineflag_internal.dir/internal/commandlineflag.cc.o -c /home/kefu/dev/scylladb/master/abseil/absl/flags/internal/commandlineflag.cc ``` Refs `0b0e661a85` Fixes #19161 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `535f2b2134`) Closes scylladb/scylladb#19200	2024-06-10 20:22:00 +03:00
Nadav Har'El	4810937ddf	test/alternator: fix flaky test test_item_latency The Alternator test test_metrics.py::test_item_latency confirms that for several operation types (PutItem, GetItem, DeleteItem, UpdateItem) we did not forget to measure their latencies. The test checked that a latency was updated by checking that two metrics increase: scylla_alternator_op_latency_count scylla_alternator_op_latency_sum However, it turns out that the "sum" is only an approximate sum of all latencies, and when the total sum grows large it sometimes does not increase when a short latency is added to the statistics. When this happens, this test fails on the assertion that the "sum" increases after an operation. We saw this happening sometimes in CI runs. The simple fix is to stop checking _sum at all, and only verify that the _count increases - this is really an integer counter that unconditionally increases when a latency is added to the histogram. Don't worry that the strength of this test is reduced - this test was never meant to check the accuracy or correctness of the histograms - we should have different (and better) tests for that, unrelated to Alternator. The purpose of this test is only to verify that for some specific operation like PutItem, Alternator didn't forget to measure its latency and update the histogram. We want to avoid a bug like we had in counters in the past (#9406). Fixes #18847. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `13cf6c543d`) Closes scylladb/scylladb#19193	2024-06-10 20:20:54 +03:00
Tomasz Grabiec	a3e4dc7b6c	test: tablets: Fix flakiness of test_removenode_with_ignored_node due to read timeout The check query may be executed on a node which doesn't yet see that the downed server is down, as it is not shut down gracefully. The query coordinator can choose the down node as a CL=1 replica for read and time out. To fix, wait for all nodes to notice the node is down before executing the checking query. Fixes #17938 (cherry picked from commit `c8f71f4825`) Closes scylladb/scylladb#19199	2024-06-10 20:12:56 +03:00
Botond Dénes	7a6ff12ace	Merge '[Backport 6.0] alternator: keep TTL work in the maintenance scheduling group' from ScyllaDB Alternator has a custom TTL implementation. This is based on a loop, which scans existing rows in the table, then decides whether each row has reached its end-of-life and deletes it if it has. This work is done in the background, and therefore it uses the maintenance (streaming) scheduling group. However, it was observed that part of this work leaks into the statement scheduling group, competing with user workloads and negatively affecting their latencies. This was found to be caused by the reads and writes done on behalf of the alternator TTL, which lose their maintenance scheduling group when they have to go to a remote node. This is because the messaging service was not configured to recognize the streaming scheduling group when statement verbs like reads or writes are invoked. The messaging service currently recognizes two statement "tenants": the user tenant (statement scheduling group) and system (default scheduling group), as we used to have only user-initiated operations and system (internal) ones. With alternator TTL, there is now a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). This series adds a streaming tenant to the messaging service configuration and it adds a test which confirms that with this change, alternator TTL is entirely contained in the maintenance scheduling group. Fixes: #18719 - [x] Scans executed on behalf of alternator TTL are running in the statement group, disturbing user-workloads, this PR has to be backported to fix this. 
(cherry picked from commit `5d3f7c13f9`) (cherry picked from commit `1fe8f22d89`) Refs #18729 Closes scylladb/scylladb#19196 * github.com:scylladb/scylladb: alternator, scheduler: test reproducing RPC scheduling group bug main: add maintenance tenant to messaging_service's scheduling config	2024-06-10 19:58:38 +03:00
Anna Stuchlik	e38d675cb9	doc: mark tablets as GA in the CREATE KEYSPACE section This commit removes the information that tablets are an experimental feature from the CREATE KEYSPACE section. In addition, it removes the notes and cautions that are redundant when a feature is GA, especially the information and warnings about the future plans. Fixes https://github.com/scylladb/scylladb/issues/18670 Closes scylladb/scylladb#19063 (cherry picked from commit `55ed18db07`)	2024-06-10 18:53:47 +03:00
Gleb Natapov	45ff4d2c41	group0, topology coordinator: run group0 and the topology coordinator in gossiper scheduling group Currently they both run in streaming group and it may become busy during repair/mv building and affect group0 functionality. Move it to the gossiper group where it should have more time to run. Fixes #18863 (cherry picked from commit `a74fbab99a`) Closes scylladb/scylladb#19175	2024-06-10 10:34:29 +02:00
Nadav Har'El	0662e80917	alternator, scheduler: test reproducing RPC scheduling group bug This patch adds a test for issue #18719: Although the Alternator TTL work is supposedly done in the "streaming" scheduling group, it turned out we had a bug where work sent on behalf of that code to other nodes failed to inherit the correct scheduling group, and was done in the normal ("statement") group. Because this problem only happens when more than one node is involved, the test is in the multi-node test framework test/topology_experimental_raft. The test uses the Alternator API. We already had in that framework a test using the Alternator API (a test for alternator+tablets), so in this patch we move the common Alternator utility functions to a common file, test_alternator.py, where I also put the new test. The test is based on metrics: We write expiring data, wait for it to expire, and then check the metrics on how much CPU work was done in the wrong scheduling group ("statement"). Before #18719 was fixed, a lot of work was done there (more than half of the work done in the right group). After the issue was fixed in the previous patch, the work on the wrong scheduling group went down to zero. Signed-off-by: Nadav Har'El <nyh@scylladb.com> (cherry picked from commit `1fe8f22d89`)	2024-06-10 07:42:23 +00:00
Botond Dénes	5b546ad4b1	main: add maintenance tenant to messaging_service's scheduling config Currently only the user tenant (statement scheduling group) and system (default scheduling group) tenants exist, as we used to have only user-initiated operations and system (internal) ones. Now there is a need to distinguish between two kinds of system operation: foreground and background ones. The former should use the system tenant while the latter will use the new maintenance tenant (streaming scheduling group). (cherry picked from commit `5d3f7c13f9`)	2024-06-10 07:42:22 +00:00
Piotr Dulikowski	e04378fdf0	Merge ' [Backport 6.0] db/hints: Use host ID to IP mappings to choose the ep manager to drain when node is leaving' from Dawid Mędrek In [`d0f5873`](`d0f58736c8`), we introduced mappings IP–host ID between hint directories and the hint endpoint managers managing them. As a consequence, it may happen that one hint directory stores hints towards multiple nodes at the same time. If any of those nodes leaves the cluster, we should drain the hint directory. However, before these changes that doesn't happen – we only drain it when the node of the same host ID as the hint endpoint manager leaves the cluster. This PR fixes that draining issue in the pre-host-ID-based hinted handoff. Now no matter which of the nodes corresponding to a hint directory leaves the cluster, the directory will be drained. We also introduce error injections to be able to test that it indeed happens. Fixes scylladb/scylladb#18761 (cherry picked from commit [`745a9c6`](`745a9c6ab8`)) (cherry picked from commit [`e855794`](`e855794327`)) Refs scylladb/scylladb#18764 Closes scylladb/scylladb#19114 * github.com:scylladb/scylladb: db/hints: Introduce an error injection to test draining db/hints: Ensure that draining happens	2024-06-10 09:11:07 +02:00
Tomasz Grabiec	f8243cbf19	Merge '[Backport 6.0] Serialize repair with tablet migration' from ScyllaDB We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. A blocked tablet migration (e.g. due to down node) will block repair, whereas before it would fail. Once admin resolves the cause of blocked migration, repair will continue. Fixes #17658. Fixes #18561. 
(cherry picked from commit `6c64cf33df`) (cherry picked from commit `1513d6f0b0`) (cherry picked from commit `476c076a21`) (cherry picked from commit `c45ce41330`) (cherry picked from commit `e97acf4e30`) (cherry picked from commit `98323be296`) (cherry picked from commit `5ca54a6e88`) Refs #18641 Closes scylladb/scylladb#19144 * github.com:scylladb/scylladb: test: pylib: Do not block async reactor while removing directories repair: Exclude tablet migrations with tablet repair repair_service: Propagate topology_state_machine to repair_service main, storage_service: Move topology_state_machine outside storage_service storage_srvice, toplogy: Extract topology_state_machine::await_quiesced() tablet_scheduler: Make disabling of balancing interrupt shuffle mode tablet_scheduler: Log whether balancing is considered as enabled	2024-06-09 00:20:44 +02:00
Tomasz Grabiec	27f01bf4e3	test: pylib: Do not block async reactor while removing directories This fixes a problem where suite cleanup schedules lots of uninstall() tasks for servers started in the suite, which schedules lots of tasks, which synchronously call rmtree(). These take over a minute to finish, which blocks other tasks for tests which are still executing. In particular, this was observed to cause ManagerClient.server_stop_gracefully() to time out. It has a timeout of 60 seconds. The server was stopped quickly, but the RESTful API response was not processed in time and the call timed out when it got the async reactor. (cherry picked from commit `5ca54a6e88`)	2024-06-08 16:31:18 +02:00
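The blocking pattern described in the commit above can be illustrated with a minimal asyncio sketch. This is not the patch itself: `remove_tree_async` and `rmtree_demo` are hypothetical names, and the use of `asyncio.to_thread` is an assumption about one reasonable way to keep a synchronous `shutil.rmtree()` call off the event loop.

```python
import asyncio
import pathlib
import shutil
import tempfile


async def remove_tree_async(path: pathlib.Path) -> None:
    """Hypothetical helper: run the blocking rmtree() in a worker thread."""
    # asyncio.to_thread (Python 3.9+) lets the event loop keep servicing
    # other tasks while the directory tree is deleted.
    await asyncio.to_thread(shutil.rmtree, path, ignore_errors=True)


async def rmtree_demo() -> bool:
    # Build a small throwaway directory tree, then delete it asynchronously.
    root = pathlib.Path(tempfile.mkdtemp())
    (root / "data").mkdir()
    (root / "data" / "sstable.tmp").write_text("x")
    await remove_tree_async(root)
    # Report whether the directory still exists after removal.
    return root.exists()
```

Calling `asyncio.run(rmtree_demo())` exercises the helper end to end. Calling `shutil.rmtree()` directly inside a coroutine would instead stall every other task until the deletion finished.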
Tomasz Grabiec	ded9aca6ee	repair: Exclude tablet migrations with tablet repair We want to exclude repair with tablet migrations to avoid races between repair reads and writes with replica movement. Repair is not prepared to handle topology transitions in the middle. One reason why it's not safe is that repair may successfully write to a leaving replica post streaming phase and consider all replicas to be repaired, but in fact they are not, the new replica would not be repaired. Other kinds of races could result in repair failures. If repair writes to a leaving replica which was already cleaned up, such writes will fail, causing repair to fail. Excluding works by keeping effective_replication_map_ptr in a version which doesn't have table's tablets in transitions. That prevents later transitions from starting because topology coordinator's barrier will wait for that erm before moving to a stage later than allow_write_both_read_old, so before any requests start using the new topology. Also, if transitions are already running, repair waits for them to finish. Fixes #17658. Fixes #18561. (cherry picked from commit `98323be296`)	2024-06-08 16:31:18 +02:00
Tomasz Grabiec	ccd441a4de	repair_service: Propagate topology_state_machine to repair_service (cherry picked from commit `e97acf4e30`)	2024-06-08 16:31:15 +02:00
Jenkins Promoter	79e4e411b3	Update ScyllaDB version to: 6.0.1	2024-06-07 09:31:05 +03:00
Kefu Chai	f8ba94a960	doc: document "enable_tablets" option it sets the cluster feature of tablets, and is a prerequisite for using tablets. Refs #18670 Fixes #19157 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `bac7e1e942`) Closes scylladb/scylladb#19158	2024-06-07 07:03:30 +03:00
Tzach Livyatan	dfe89157c6	Docs: fix start command in Update replace-dead-node.rst Fix #18920 (cherry picked from commit `c30f81c389`) Closes scylladb/scylladb#19142	2024-06-07 07:02:02 +03:00
Kefu Chai	50d8fa6b77	topology_coordinator: handle/wait futures when stopping topology_coordinator before this change, unlike other services in scylla, topology_coordinator is not properly stopped when it is aborted, because the scylla instance is no longer a leader or is being shut down. its `run()` method just stops the grand loop and bails out before topology_coordinator is destroyed. but we are tracking the migration state of tablets using a bunch of futures, which might not be handled yet, and some of them could carry failures. in that case, when the `future` instances with failure state get destroyed, seastar calls `report_failed_future`. and seastar considers this practice a source of bugs -- as one just fails to handle an error. that's why we have following error: ``` WARN 2024-05-19 23:00:42,895 [shard 0:strm] seastar - Exceptional future ignored: seastar::rpc::unknown_verb_error (unknown verb), backtrace: /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56c14e /home/bhalevy/.ccm/scylla-repository/local_tarball/libre loc/libseastar.so+0x56c770 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x56ca58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x38c6ad 0x29cdd07 0x29b376b 0x29a5b65 0x108105a /home/bhalevy/.ccm/scylla-repository/local_tarbal l/libreloc/libseastar.so+0x3ff1df /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x400367 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x3ff838 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36de58 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libseastar.so+0x36d092 0x1017cba 0x1055080 0x1016ba7 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27b89 /home/bhalevy/.ccm/scylla-repository/local_tarball/libreloc/libc.so.6+0x27c4a 0x1015524 ``` and the backtrace looks like: ``` seastar::current_backtrace_tasklocal() at ??:? 
seastar::current_tasktrace() at ??:? seastar::current_backtrace() at ??:? seastar::report_failed_future(seastar::future_state_base::any&&) at ??:? service::topology_coordinator::tablet_migration_state::~tablet_migration_state() at topology_coordinator.cc:? service::topology_coordinator::~topology_coordinator() at topology_coordinator.cc:? service::run_topology_coordinator(seastar::sharded<db::system_distributed_keyspace>&, gms::gossiper&, netw::messaging_service&, locator::shared_token_metadata&, db::system_keyspace&, replica::database&, service::raft_group0&, service::topology_state_machine&, seastar::abort_source&, raft::server&, seastar::noncopyable_function<seastar::future<service::raft_topology_cmd_result> (utils::tagged_tagged_integer<raft::internal::non_final, raft::term_tag, unsigned long>, unsigned long, service::raft_topology_cmd const&)>, service::tablet_allocator&, std::chrono::duration<long, std::ratio<1l, 1000l> >, service::endpoint_lifecycle_notifier&) [clone .resume] at topology_coordinator.cc:? seastar::internal::coroutine_traits_base<void>::promise_type::run_and_dispose() at main.cc:? seastar::reactor::run_some_tasks() at ??:? seastar::reactor::do_run() at ??:? seastar::reactor::run() at ??:? seastar::app_template::run_deprecated(int, char**, std::function<void ()>&&) at ??:? ``` and even worse, these futures are indirectly owned by `topology_coordinator`. so there are chances that they could be used even after `topology_coordinator` is destroyed. this is a use-after-free issue. because the `run_topology_coordinator` fiber exits when the scylla instance retires from the leader's role, this use-after-free could be fatal to a running instance due to undefined behavior of use after free. so, in this change, we handle the futures in `_tablets`, and note down the failures carried by them if any. 
Fixes #18745 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `4a36918989`) Closes scylladb/scylladb#19139	2024-06-07 07:00:25 +03:00
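The idea in the commit above, waiting on every tracked future before its owner goes away and noting down any failures, can be sketched by analogy in Python's asyncio. This is not Seastar code and not the actual fix: `CoordinatorSketch`, its methods, and `drain_demo` are hypothetical names used only for illustration.

```python
import asyncio
import logging

log = logging.getLogger("coordinator_sketch")


class CoordinatorSketch:
    """Hypothetical stand-in for an object that owns background tasks."""

    def __init__(self) -> None:
        self._tasks = []

    def start(self, coro) -> None:
        # Track the task so that stop() can wait for it later.
        self._tasks.append(asyncio.ensure_future(coro))

    async def stop(self) -> list:
        # Wait for every owned task. return_exceptions=True turns a failure
        # into a returned value, so no exception is left unobserved.
        results = await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks.clear()
        failures = [r for r in results if isinstance(r, BaseException)]
        for exc in failures:
            log.warning("background task failed: %r", exc)
        return failures


async def _ok() -> int:
    return 1


async def _fail() -> None:
    raise RuntimeError("unknown verb")


async def drain_demo() -> list:
    coordinator = CoordinatorSketch()
    coordinator.start(_ok())
    coordinator.start(_fail())
    # Only after stop() returns is it safe to drop the coordinator.
    return [str(exc) for exc in await coordinator.stop()]
```

Running `asyncio.run(drain_demo())` collects the single failure rather than leaving it to be reported when the task object is garbage collected.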
Jenkins Promoter	a77615adf3	Update ScyllaDB version to: 6.0.0 scylla-6.0.0-candidate-20240606102200 scylla-6.0.0-candidate-20240606081124 scylla-6.0.0	2024-06-06 16:03:39 +03:00
Tomasz Grabiec	e518bb68b2	main, storage_service: Move topology_state_machine outside storage_service It will be propagated to repair_service to avoid cyclic dependency: storage_service <-> repair_service (cherry picked from commit `c45ce41330`)	2024-06-06 13:01:19 +00:00
Tomasz Grabiec	af2caeb2de	storage_srvice, toplogy: Extract topology_state_machine::await_quiesced() Will be used later in a place which doesn't have access to storage_service but has access to topology_state_machine. It's not necessary to start group0 operation around polling because the busy() state can be checked atomically and if it's false it means the topology is no longer busy. (cherry picked from commit `476c076a21`)	2024-06-06 13:01:19 +00:00
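A wait-until-not-busy helper of the kind this commit describes can be modelled with a condition variable. The sketch below is an asyncio analogy under stated assumptions, not the C++ implementation: `TopologyStateSketch`, `set_busy`, and `quiesce_demo` are hypothetical, and only the names `busy()` and `await_quiesced()` come from the commit message.

```python
import asyncio


class TopologyStateSketch:
    """Hypothetical model: a busy flag plus a condition to wait on."""

    def __init__(self) -> None:
        self._busy = False
        self._changed = asyncio.Condition()

    def busy(self) -> bool:
        # A single flag read; no surrounding operation is needed.
        return self._busy

    async def set_busy(self, value: bool) -> None:
        async with self._changed:
            self._busy = value
            self._changed.notify_all()

    async def await_quiesced(self) -> None:
        async with self._changed:
            # wait_for re-checks the predicate after each notification.
            await self._changed.wait_for(lambda: not self._busy)


async def quiesce_demo() -> list:
    events = []
    state = TopologyStateSketch()
    await state.set_busy(True)

    async def waiter() -> None:
        await state.await_quiesced()
        events.append("quiesced")

    task = asyncio.create_task(waiter())
    await asyncio.sleep(0)  # give the waiter a chance to start waiting
    events.append("still busy")
    await state.set_busy(False)
    await task
    return events
```

In this model the waiter resumes only after the flag has been cleared, so "quiesced" is always recorded after "still busy".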
Tomasz Grabiec	d5ebfea1ff	tablet_scheduler: Make disabling of balancing interrupt shuffle mode Tests will rely on that, they will run in shuffle mode, and disable balancing around section which otherwise would be infinitely blocked by ongoing shuffling (like repair). (cherry picked from commit `1513d6f0b0`)	2024-06-06 13:01:18 +00:00


43077 Commits