scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-06-06 15:03:06 +00:00

Author	SHA1	Message	Date
Kefu Chai	cf71fd3977	test: topology_custom: ensure node visibility before keyspace creation Building upon commit `69b47694`, this change addresses a subtle synchronization weakness in node visibility checks during recovery mode testing. Previous Approach: - Waited only for the first node to see its peers - Insufficient to guarantee full cluster consistency Current Solution: 1. Implement comprehensive node visibility verification 2. Ensure all nodes mutually recognize each other 3. Prevent potential schema propagation race conditions Key Improvements: - Robust cluster state validation before keyspace creation - Eliminate partial visibility scenarios Fixes scylladb/scylladb#21724 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21726 (cherry picked from commit `65949ce607`) Closes scylladb/scylladb#21733	2024-12-04 13:57:55 +02:00
André LFA	9cd356d66c	Update report-scylla-problem.rst removing references to old Health Check Report Closes scylladb/scylladb#21467 Fixes scylladb/scylladb#21599 (cherry picked from commit `703e6f3b1f`) Closes scylladb/scylladb#21590	2024-12-04 13:55:31 +02:00
Jenkins Promoter	dd9dcb28a3	Update ScyllaDB version to: 6.1.5	2024-12-01 15:58:56 +02:00
Botond Dénes	3771405482	Merge 'repair: fix task_manager_module::abort_all_repairs' from Aleksandra Martyniuk Currently, task_manager_module::abort_all_repairs marks top-level repairs as aborted (but does not abort them) and aborts all existing shard tasks. A running repair checks whether its id isn't contained in _aborted_pending_repairs and then proceeds to create shard tasks. If abort_all_repairs is executed after _aborted_pending_repairs is checked but before shard tasks are created, then those new tasks won't be aborted. The issue is the most severe for tablet_repair_task_impl that checks the _aborted_pending_repairs content from different shards, that do not see the top-level task. Hence the repair isn't stopped but it creates shard repair tasks on all shards but the one that initialized repair. Abort top-level tasks in abort_all_repairs. Fix the shard on which the task abort is checked. Fixes: #21612. Needs backport to 6.1 and 6.2 as they contain the bug. Closes scylladb/scylladb#21616 * github.com:scylladb/scylladb: test: add test to check if repair is properly aborted repair: add shard param to task_manager_module::is_aborted repair: use task abort source to abort repair repair: drop _aborted_pending_repairs and utilize tasks abort mechanism repair: fix task_manager_module::abort_all_repairs (cherry picked from commit `5ccbd500e0`) Closes scylladb/scylladb#21641	2024-11-25 11:01:12 +02:00
Nadav Har'El	506b366e5d	alternator: fix "/localnodes" to not return down nodes Alternator's "/localnodes" HTTP request is supposed to return the list of nodes in the local DC to which the user can send requests. Before commit `bac7c33313` we used the gossiper is_alive() method to determine if a node should be returned. That commit changed the check to is_normal() - because a node can be alive but in non-normal (e.g., joining) state and not ready for requests. However, it turns out that checking is_normal() is not enough, because if a node is stopped abruptly, other nodes will still consider it "normal", but down (this is the so-called "DN" state). So we need to check both is_alive() and is_normal(). This patch also adds a test reproducing this case, where a node is shut down abruptly. Before this patch, the test failed ("/localnodes" continued to return the dead node), and after it, it passes. Fixes #21538 Signed-off-by: Nadav Har'El <nyh@scylladb.com> Closes scylladb/scylladb#21540 (cherry picked from commit `7607f5e33e`) Closes scylladb/scylladb#21633 scylla-6.1.4-candidate-20241124103828 scylla-6.1.4	2024-11-21 08:50:44 +02:00
Anna Stuchlik	f2bed0f362	doc: add the 6.0-to-2024.2 upgrade guide-from-6 This commit adds an upgrade guide from ScyllaDB 6.0 to ScyllaDB Enterprise 2024.2. Fixes https://github.com/scylladb/scylladb/issues/20063 Fixes https://github.com/scylladb/scylladb/issues/20062 Refs https://github.com/scylladb/scylla-enterprise/issues/4544 (cherry picked from commit `3d4b7e41ef`) Closes scylladb/scylladb#21619	2024-11-18 17:22:12 +02:00
Raphael S. Carvalho	b0bb40e8d4	replica: Fix schema change during migration cleanup During migration cleanup, there's a small window in which the storage group was stopped but not yet removed from the list. So concurrent operations traversing the list could work with stopped groups. During a test which emitted schema changes during migrations, a failure happened when updating the compaction strategy of a table, but since the group was stopped, the compaction manager was unable to find the state for that group. In order to fix it, we'll skip stopped groups when traversing the list since they're unused at this stage of migration and going away soon. Fixes #20699. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `b8d6f864bc`) Closes scylladb/scylladb#21203	2024-11-15 10:40:21 +02:00
Calle Wilund	5058d6af41	cql_test_env/gossip: Prevent double shutdown call crash Fixes #21159 When an exception is thrown in sstable write etc such that storage_manager::isolate is initiated, we start a shutdown chain for message service, gossip etc. These are synced (properly) in storage_manager::stop, but if we somehow call gossiper::shutdown outside the normal service::stop cycle, we can end up running the method simultaneously, intertwined (missing the guard because of the state change between check and set). We then end up co_awaiting an invalid future (_failure_detector_loop_done) - a second wait. Fixed by a.) Remove superfluous gossiper::shutdown in cql_test_env. This was added in `20496ed`, ages ago. However, it should not be needed nowadays. b.) Ensure _failure_detector_loop_done is always waitable. Just to be sure. (cherry picked from commit `c28a5173d9`) Closes scylladb/scylladb#21394	2024-11-15 10:40:04 +02:00
Emil Maskovsky	730d39df40	test/topology_custom: fix the flaky test_raft_recovery_stuck The test is only sending a subset of the running servers for the rolling restart. The rolling restart checks the visibility of the restarted node against the other nodes, but if that set is incomplete, some of the running servers might not have seen the restarted node yet. Improved the manager client rolling restart method to consider all the running nodes for checking the restarted node visibility. Fixes: scylladb/scylladb#19959 Closes scylladb/scylladb#21477 (cherry picked from commit `92db2eca0b`) Closes scylladb/scylladb#21555	2024-11-15 10:39:18 +02:00
Botond Dénes	78ad345f7f	Merge 'scylla_raid_setup: fix failure on SELinux package installation' from Takuya ASADA After merging `5a470b2bfb`, we found that scylla_raid_setup fails on offline mode installation. This is because pkg_install() just prints an error and exits the script in offline mode instead of installing packages, since offline mode is not supposed to be able to connect to the internet. This seems to occur because of the missing "policycoreutils-python-utils" package, which is the package for the "semanage" command. So we need to implement the relabeling patch without using the command. Fixes https://github.com/scylladb/scylladb/issues/21441 Also, since Amazon Linux 2 has a different package name for semanage, we need to adjust the package name. Fixes https://github.com/scylladb/scylladb/issues/21351 Closes scylladb/scylladb#21474 * github.com:scylladb/scylladb: scylla_raid_setup: support installing semanage on Amazon Linux 2 scylla_raid_setup: fix failure on SELinux package installation (cherry picked from commit `1c212df62d`) Closes scylladb/scylladb#21546	2024-11-14 15:57:47 +02:00
Botond Dénes	4610dde4da	streaming: stream-session: switch to tracking permit The stream-session is the receiving end of streaming, it reads the mutation fragment stream from an RPC stream and writes it onto the disk. As such, this part does no disk IO and therefore, using a permit with count resources is superfluous. Furthermore, after `d98708013c`, the count resources on this permit can cause a deadlock on the receiver end, via the `db::view::check_view_update_path()`, which wants to read the content of a system table and therefore has to obtain a permit of its own. Switch to a tracking-only permit, primarily to resolve the deadlock, but also because admission is not necessary for a read which does no IO. Refs: scylladb/scylladb#20885 (partial fix, solves only one of the deadlocks) Fixes: scylladb/scylladb#21264 Fixes: scylladb/scylladb#21570 Closes scylladb/scylladb#21059 (cherry picked from commit `7c75fc599f`) Closes scylladb/scylladb#21571	2024-11-14 12:45:03 +02:00
Botond Dénes	ecb9cb374e	Merge '[Backport 6.1] compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors' from ScyllaDB stop() methods, like destructors, must always succeed, and returning errors from them is futile as there is nothing else we can do with them but continue with shutdown. stop_ongoing_compactions, in particular, currently returns the status of stopped compaction tasks from `stop_tasks`, but still all tasks must be stopped after it, even if they failed, so assert that and ignore the errors. Fixes scylladb/scylladb#21159 * Needs backport to 6.2 and 6.1, as commit `8cc99973eb` causes handles storage that might cause compaction tasks to fail and eventually terminate on shutdown when the exceptions are thrown in noexcept context in the deferred stop destructor body (cherry picked from commit `e942c074f2`) (cherry picked from commit `d8500472b3`) (cherry picked from commit `c08ba8af68`) (cherry picked from commit `a7a55298ea`) (cherry picked from commit `6cce67bec8`) Refs #21299 Closes scylladb/scylladb#21435 * github.com:scylladb/scylladb: compaction_manager: stop: await _stop_future if engaged compaction_manager: really_do_stop: assert that no tasks are left behind compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors compaction/compaction_manager: stop_tasks(): unlink stopped tasks compaction/compaction_manager: make _tasks an intrusive list	2024-11-14 07:00:28 +02:00
Benny Halevy	5f9b3b08f4	compaction_manager: stop: await _stop_future if engaged The current condition that consults the compaction manager state for awaiting `_stop_future` works since _stop_future is assigned after the state is set to `stopped`, but it is incidental. What matters is that `_stop_future` is engaged. While at it, exchange _stop_future with a ready future so that stop() can be safely called multiple times. And dropped the superfluous co_return. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `6cce67bec8`)	2024-11-12 15:21:04 +02:00
Benny Halevy	fe03c9b724	compaction_manager: really_do_stop: assert that no tasks are left behind stop_ongoing_compactions now ignores any errors returned by tasks, and it should leave no task left behind. Assert that here, before the compaction_manager is destroyed. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `a7a55298ea`)	2024-11-12 15:21:00 +02:00
Benny Halevy	cbddf18727	compaction_manager: stop_tasks, stop_ongoing_compactions: ignore errors stop() methods, like destructors must always succeed, and returning errors from them is futile as there is nothing else we can do with them but continue with shutdown. Leaked errors on the stop path may cause termination on shutdown, when called in a deferred action destructor. Fixes scylladb/scylladb#21298 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `c08ba8af68`)	2024-11-12 15:14:21 +02:00
Botond Dénes	2a32e2ae82	compaction/compaction_manager: stop_tasks(): unlink stopped tasks Stopped tasks currently linger in _tasks until the fiber that created the task is scheduled again and unlinks the task. This window between stop and remove prevents reliable checks for empty _tasks list after all tasks are stopped. Unlink the task early so really_do_stop() can safely check for an empty _tasks list (next patch). (cherry picked from commit `d8500472b3`)	2024-11-12 15:13:32 +02:00
Botond Dénes	d63b9efa7e	compaction/compaction_manager: make _tasks an intrusive list _tasks is currently std::list<shared_ptr<compaction_task_executor>>, but it has no role in keeping the instances alive, this is done by the fibers which create the task (and pin a shared ptr instance). This lends itself to an intrusive list, avoiding that extra allocation upon push_back(). Using an intrusive list also makes it simpler and much cheaper (O(1) vs. O(N)) to remove tasks from the _tasks list. This will be made use of in the next patch. Code using _task has to be updated because the value_type changes from shared_ptr<compaction_task_executor> to compaction_task_executor&. (cherry picked from commit `e942c074f2`)	2024-11-12 11:42:34 +02:00
Yaron Kaikov	a1fea6b225	./github/workflows/add-label-when-promoted.yaml: Run auto-backport only on default branch In https://github.com/scylladb/scylladb/pull/21496#event-15221789614 ``` scylladbbot force-pushed the backport/21459/to-6.1 branch from 414691c to `59a4ccd` Compare 2 days ago ``` Backport automation triggered by `push` but also should either start from `master` branch (or `enterprise` branch from Enterprise), we need to verify it by checking also the default branch. Fixes: https://github.com/scylladb/scylladb/issues/21514 Closes scylladb/scylladb#21515 (cherry picked from commit `2596d1577b`) Closes scylladb/scylladb#21530	2024-11-11 17:44:41 +02:00
Michał Chojnowski	04b3d96259	mvcc_test: fix a benign failure of test_apply_to_incomplete_respects_continuity For performance reasons, mutation_partition_v2::maybe_drop(), and by extension also mutation_partition_v2::apply_monotonically(mutation_partition_v2&&) can evict empty row entries, and hence change the continuity of the merged entry. For checking that apply_to_incomplete respects continuity, test_apply_to_incomplete_respects_continuity obtains the continuity of the partition entry before and after apply_to_incomplete by calling e.squashed().get_continuity(). But squashed() uses apply_monotonically(), so in some circumstances the result of squashed() can have smaller continuity than the argument of squashed(), which messes with the thing that the test is trying to check, and causes spurious failures. This patch changes the method of calculating the continuity set, so that it matches the entry exactly, fixing the test failures. Fixes scylladb/scylladb#13757 Closes scylladb/scylladb#21459 (cherry picked from commit `35921eb67e`) Closes scylladb/scylladb#21496	2024-11-08 15:33:20 +01:00
Yaron Kaikov	236b235a89	.github/scripts/auto-backport.py: update method to get closed prs `commit.get_pulls()` in PyGithub returns pull requests that are directly associated with the given commit Since in closed PR. the relevant commit is an event type, the backport automation didn't get the PR info for backporting Ref: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21468 (cherry picked from commit `ef104b7b96`) Closes scylladb/scylladb#21482	2024-11-08 10:26:44 +02:00
Yaron Kaikov	3ddb61c90e	.github/script/auto-backport.py: push backport PR to `scylladbbot` fork Since Scylla is a public repo, when we create a fork, it doesn't fork the team and permissions (unlike private repos where it does). When we have a backport PR with conflicts, the developers need to be able to update the branch to fix the conflicts. To do so, we modified the logic of the backport automation as follows: - Every backport PR (with and without conflicts) will be opened directly on the `scylladbbot` fork repo - When there are conflicts, an email will be sent to the original PR author with an invitation to become a contributor in the `scylladbbot` fork with `push` permissions. This will happen only once, if the author is not already a contributor. - Together with sending the invite, all backport labels will be removed and a comment will be added to the original PR with instructions - The PR author must add the backport labels after the invitation is accepted Fixes: https://github.com/scylladb/scylladb/issues/18973 Closes scylladb/scylladb#21401 (cherry picked from commit `77604b4ac7`) Closes scylladb/scylladb#21465	2024-11-07 15:05:56 +02:00
Yaron Kaikov	160823ccaf	github: add script for backports automation instead of Mergify Adding an auto-backport.py script to handle backport automation instead of Mergify. The rules of backport are as follows: * Merged or Closed PRs with any backport/x.y label (one or more) and promoted-to-master label * Backport PR will be automatically assigned to the original PR author * In case of conflicts the backport PR will be opened in the original author's fork in draft mode. This will give the PR owner the option to resolve conflicts and push those changes to the PR branch (Today in Scylla when we have conflicts, the developers are forced to open another PR and manually close the backport PR opened by Mergify) * Fixing cherry-pick of the wrong commit SHA. With the new script, we always take the SHA from the stable branch * Support backport for enterprise releases (from Enterprise branch) Fixes: https://github.com/scylladb/scylladb/issues/18973 (cherry picked from commit `f9e171c7af`) Closes scylladb/scylladb#21470	2024-11-07 06:58:16 +02:00
Jenkins Promoter	9ff31c6c4e	Update ScyllaDB version to: 6.1.4	2024-11-06 16:08:17 +02:00
Botond Dénes	6a66faab41	Merge '[Backport 6.1] repair: Fix finished ranges metrics for removenode' from ScyllaDB The skipped ranges should be multiplied by the number of tables Otherwise the finished ranges ratio will not reach 100%. Fixes #21174 (cherry picked from commit `cffe3dc49f`) (cherry picked from commit `1392a6068d`) (cherry picked from commit `9868ccbac0`) Refs #21252 Closes scylladb/scylladb#21314 * github.com:scylladb/scylladb: test: Add test_node_ops_metrics.py repair: Make the ranges more consistent in the log repair: Fix finished ranges metrics for removenode	2024-11-05 09:44:29 +02:00
Tzach Livyatan	c1e42cacac	Update os-support-info.rst - add CentOS ScyllaDB support RHEL 9 and derivatives, including CentOS 9. Fix https://github.com/scylladb/scylladb/issues/21309 (cherry picked from commit `1878af9399`) Closes scylladb/scylladb#21333	2024-11-05 09:43:51 +02:00
Benny Halevy	baa4d1a6e7	compaction_manager: compaction_disabled: return true if not in compaction_state When a compaction_group is removed via `compaction_manager::remove`, it is erased from `_compaction_state`, and therefore compaction is definitely not enabled on it. This triggers an internal error if tablets are cleaned up during drop/truncate, which checks that compaction is disabled in all compaction groups. Note that the callers of `compaction_disabled` aren't really interested in compaction being actively disabled on the compaction_group, but rather in whether it's enabled or not. A follow-up patch can be considered to reverse the logic and expose `compaction_enabled` rather than `compaction_disabled`. Fixes scylladb/scylladb#20060 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `78ceaeabca`) Closes scylladb/scylladb#21405	2024-11-05 09:42:01 +02:00
Kamil Braun	b057168dd0	Merge '[Backport 6.1] cql/tablets: fix retrying ALTER tablets KEYSPACE' from Marcin Maliszkiewicz ALTER tablets-enabled KEYSPACES (KS) may fail due to group0_concurrent_modification, in which case it's repeated by a for loop surrounding the code. But because raft's add_entry consumes the raft's guard (by std::move'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned for loop altogether and rethrow the exception, as the rf_change event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. Note: refactor is implemented in the follow-up commit. Fixes: https://github.com/scylladb/scylladb/issues/21102 Should be backported to every 6.x branch, as it may lead to a crash. (cherry picked from commit `de511f56ac`) (cherry picked from commit `3f4c8a30e3`) (cherry picked from commit `522bede8ec`) Refs https://github.com/scylladb/scylladb/pull/21121 Closes scylladb/scylladb#21340 * github.com:scylladb/scylladb: test: topology: add disable_schema_agreement_wait utility function test: add UT to test retrying ALTER tablets KEYSPACE cql/tablets: fix indentation in `rf_change` event handler cql/tablets: fix retrying ALTER tablets KEYSPACE	2024-11-04 12:23:47 +01:00
Benny Halevy	7dbe39a9a5	storage_service: on_change: update_peer_info only if peer info changed Return an optional peer_info from get_peer_info_for_update when the `app_state_map` arg does not change peer_info, so that we can skip calling update_peer_info, if it didn't change. Fixes scylladb/scylladb#20991 Refs scylladb/scylladb#16376 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21152 (cherry picked from commit `04d741bcbb`)	2024-11-04 11:44:05 +02:00
Tomasz Grabiec	eec3e22c6a	node-exporter: Disable hwmon collector This collector reads nvme temperature sensor, which was observed to cause bad performance on Azure cloud following the reading of the sensor for ~6 seconds. During the event, we can see elevated system time (up to 30%) and softirq time. CPU utilization is high, with nvm_queue_rq taking several orders of magnitude more time than normally. There are signs of contention, we can see __pv_queued_spin_lock_slowpath in the perf profile, called. This manifests as latency spikes and potentially also throughput drop due to reduced CPU capacity. By default, the monitoring stack queries it once every 60s. (cherry picked from commit `93777fa907`) Closes scylladb/scylladb#21305	2024-10-31 14:05:38 +01:00
Marcin Maliszkiewicz	7d87f744ea	test: topology: add disable_schema_agreement_wait utility function Code extracted from `fa45fdf5f7` as it's being used by test_alter_tablets_keyspace_concurrent_modification and we're backporting it.	2024-10-30 16:57:19 +01:00
Piotr Smaron	d8e36873cf	test: add UT to test retrying ALTER tablets KEYSPACE The newly added testcase is based on the already existing `test_alter_dropped_tablets_keyspace`. A new error injection is created, which stops the ALTER execution just before the changes are submitted to RAFT. In the meantime, a new schema change is performed using the 2nd node in the cluster, thus causing the 1st node to retry the ALTER statement. (cherry picked from commit `522bede8ec`)	2024-10-30 16:49:33 +01:00
Piotr Smaron	1dddd2a8ca	cql/tablets: fix indentation in `rf_change` event handler Just moved the code that previously was under a `for` loop by 1 tab, i.e. 4 spaces, to the left. (cherry picked from commit `3f4c8a30e3`)	2024-10-30 16:49:33 +01:00
Piotr Smaron	ab333f2453	cql/tablets: fix retrying ALTER tablets KEYSPACE ALTER tablets-enabled KEYSPACES (KS) may fail due to `group0_concurrent_modification`, in which case it's repeated by a `for` loop surrounding the code. But because raft's `add_entry` consumes the raft's guard (by `std::move`'ing the guard object), retries of ALTER KS will use a moved-from guard object, which is UB, potentially a crash. The fix is to remove the before mentioned `for` loop altogether and rethrow the exception, as the `rf_change` event will be repeated by the topology state machine if it receives the concurrent modification exception, because the event will remain present in the global requests queue, hence it's going to be executed as the very next event. `topology_coordinator::handle_topology_coordinator_error` handling the case of `group0_concurrent_modification` has been extended with logging in order not to write catch-log-throw boilerplate. Note: refactor is implemented in the follow-up commit. Fixes: scylladb/scylladb#21102 (cherry picked from commit `de511f56ac`)	2024-10-30 16:49:33 +01:00
Gleb Natapov	0b502a2610	topology coordinator: take a copy of a replication state in raft_topology_cmd_handler Current code takes a reference and holds it past preemption points. And while the state itself is not supposed to change, the reference may become stale because the state is re-created on each raft topology command. Fix it by taking a copy instead. This is a slow path anyway. Fixes: scylladb/scylladb#21220 (cherry picked from commit `fb38bfa35d`) Closes scylladb/scylladb#21373	2024-10-30 14:12:44 +01:00
Kamil Braun	51f7ff8697	Merge '[Backport 6.1] storage_proxy: Add conditions checking to avoid UB in speculating read executors.' from ScyllaDB During the investigation of scylladb/scylladb#20282, it was discovered that implementations of speculating read executors have undefined behavior when called with an incorrect number of read replicas. This PR introduces two levels of condition checking: - Condition checking in speculating read executors for the number of replicas. - Checking the consistency of the Effective Replication Map in filter_for_query(): the map is considered incorrect if the list of replicas contains a node from a data center whose replication factor is 0. Please note: This PR does not fix the issue found in scylladb/scylladb#20282; it only adds condition checks to prevent undefined behavior in cases of inconsistent inputs. Refs scylladb/scylladb#20625 As this issue applies to the releases versions and can affect clients, we need backports to 6.0, 6.1, 6.2. (cherry picked from commit `132358dc92`) (cherry picked from commit `ae23d42889`) (cherry picked from commit `ad93cf5753`) (cherry picked from commit `8db6d6bd57`) (cherry picked from commit `c373edab2d`) Refs #20851 Closes scylladb/scylladb#21068 * github.com:scylladb/scylladb: Add conditions checking for get_read_executor Avoid an extra call to block_for in db::filter_for_query. Improve code readability in consistency_level.cc and storage_proxy.cc tools: Add build_info header with functions providing build type information tests: Add tests for alter table with RF=1 to RF=0	2024-10-29 12:32:48 +01:00
Asias He	9fdc596ff7	test: Add test_node_ops_metrics.py It tests the node_ops_metrics_done metric reaches 100% when a node ops is done. Refs: #21174 (cherry picked from commit `9868ccbac0`)	2024-10-28 09:54:30 +00:00
Asias He	5a2196b94a	repair: Make the ranges more consistent in the log Consider the number of tables for the number of ranges logging. Make it more consistent with the log when the ops starts. (cherry picked from commit `1392a6068d`)	2024-10-28 09:54:30 +00:00
Asias He	34cb594dd5	repair: Fix finished ranges metrics for removenode The skipped ranges should be multiplied by the number of tables. Otherwise the finished ranges ratio will not reach 100%. Fixes #21174 (cherry picked from commit `cffe3dc49f`)	2024-10-28 09:54:30 +00:00
Lakshmi Narayanan Sreethar	91c693bf93	[Backport 6.1] replica/table: check memtable before discarding tombstone during read On the read path, the compacting reader is applied only to the sstable reader. This can cause an expired tombstone from an sstable to be purged from the request before it has a chance to merge with deleted data in the memtable leading to data resurrection. Fix this by checking the memtables before deciding to purge tombstones from the request on the read path. A tombstone will not be purged if a key exists in any of the table's memtables with a minimum live timestamp that is lower than the maximum purgeable timestamp. Fixes #20916 `perf-simple-query` stats before and after this fix : `build/Dev/scylla perf-simple-query --smp=1 --flush` : ``` // Before this Fix // --------------- 94941.79 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59393 insns/op, 24029 cycles/op, 0 errors) 97551.14 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59376 insns/op, 23966 cycles/op, 0 errors) 96599.92 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59367 insns/op, 23998 cycles/op, 0 errors) 97774.91 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59370 insns/op, 23968 cycles/op, 0 errors) 97796.13 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59368 insns/op, 23947 cycles/op, 0 errors) throughput: mean=96932.78 standard-deviation=1215.71 median=97551.14 median-absolute-deviation=842.13 maximum=97796.13 minimum=94941.79 instructions_per_op: mean=59374.78 standard-deviation=10.78 median=59369.59 median-absolute-deviation=6.36 maximum=59393.12 minimum=59367.02 cpu_cycles_per_op: mean=23981.67 standard-deviation=32.29 median=23967.76 median-absolute-deviation=16.33 maximum=24029.38 minimum=23947.19 // After this Fix // -------------- 95313.53 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59392 insns/op, 24058 cycles/op, 0 errors) 97311.48 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59375 insns/op, 24005 cycles/op, 0 errors) 98043.10 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59381 insns/op, 23941 cycles/op, 0 errors) 96750.31 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59396 insns/op, 24025 cycles/op, 0 errors) 93381.21 tps ( 71.1 allocs/op, 0.0 logallocs/op, 14.1 tasks/op, 59390 insns/op, 24097 cycles/op, 0 errors) throughput: mean=96159.93 standard-deviation=1847.88 median=96750.31 median-absolute-deviation=1151.55 maximum=98043.10 minimum=93381.21 instructions_per_op: mean=59386.60 standard-deviation=8.78 median=59389.55 median-absolute-deviation=6.02 maximum=59396.40 minimum=59374.73 cpu_cycles_per_op: mean=24025.13 standard-deviation=58.39 median=24025.17 median-absolute-deviation=32.67 maximum=24096.66 minimum=23941.22 ``` This PR fixes a regression introduced in `ce96b472d3` and should be backported to older versions. Closes scylladb/scylladb#20985 * github.com:scylladb/scylladb: topology-custom: add test to verify tombstone gc in read path replica/table: check memtable before discarding tombstone during read compaction_group: track maximum timestamp across all sstables (cherry picked from commit `519e167611`) Backported from #20985 to 6.1. Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#21250 scylla-6.1.3 scylla-6.1.3-candidate-20241028103030	2024-10-25 11:13:54 +03:00
Piotr Dulikowski	77f0533a01	SCYLLA-VERSION-GEN: correct the logic for skipping SCYLLA--FILE The SCYLLA-VERSION-GEN file skips updating the SCYLLA--FILE files if the commit hash from SCYLLA-RELEASE-FILE is the same. The original reason for this was to prevent the date in the version string from changing if multiple modes are built across midnight (scylladb/scylla-pkg#826). However - intentionally or not - it serves another purpose: it prevents an infinite loop in the build process. If the build.ninja file needs to be rebuilt, the configure.py script unconditionally calls ./SCYLLA-VERSION-GEN. On the other hand, if one of the SCYLLA-*-FILE files is updated then this triggers rebuild of build.ninja. Apparently, this is sufficient for ninja to enter an infinite loop. However, the check assumes that the RELEASE is in the format <build identifier>.<date>.<commit hash> and assumes that none of the components have a dot inside - otherwise it breaks and just works incorrectly. Specifically, when building a private version, it is recommended to set the build identifier to `count.yourname`. Previously, before `85219e9`, this problem wasn't noticed most likely because reconfigure process was broken and stopped overwriting the build.ninja file after the first iteration. Fix the problem by fixing the logic that extracts the commit hash - instead of looking at the third dot-separated field counting from the left side, look at the last field. Fixes: scylladb/scylladb#21027 (cherry picked from commit `64ca58125e`) Closes scylladb/scylladb#21104	2024-10-25 11:09:51 +03:00
Benny Halevy	145230e032	storage_service: rebuild: warn about tablets-enabled keyspaces Until we automatically support rebuild for tablets-enabled keyspaces, warn the user about them. The reason this is not an error, is that after increasing RF in a new datacenter, the current procedure is to run `nodetool rebuild` on all nodes in that dc to rebuild the new vnode replicas. This is not required for tablets, since the additional replicas are rebuilt automatically as part of ALTER KS. However, `nodetool rebuild` is also run after local data loss (e.g. due to corruption and removal of sstables). In this case, rebuild is not supported for tablets-enabled keyspaces, as tablet replicas that had lost data may have already been migrated to other nodes, and rebuilding the requested node will not know about it. It is advised to repair all nodes in the datacenter instead. Refs scylladb/scylladb#17575 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `ed1e9a1543`) Closes scylladb/scylladb#20723	2024-10-25 11:06:38 +03:00
Tomasz Grabiec	39c1a448f6	Merge '[Backport 6.1] replica: Fix tombstone GC during tablet split preparation' from Raphael Raph Carvalho During split prepare phase, there will be more than 1 compaction group with overlapping token range for a given replica. Assume tablet 1 has sstable A containing deleted data, and sstable B containing a tombstone that shadows data in A. Then split starts: (1) sstable B is split first, and moved from the main (unsplit) group to a split-ready group; (2) compaction then runs in the split-ready group before sstable A is split. Tombstone GC logic today only looks at the underlying group, so the compaction in step 2 will disregard the deleted data in A, since it belongs to another group (the unsplit one), and so the tombstone can be purged incorrectly. To fix it, compaction will now work with all uncompacting sstables that belong to the same replica, since tombstone GC requires all sstables that possibly contain shadowed data to be available for a correct decision to be made. Fixes https://github.com/scylladb/scylladb/issues/20044. Branches 6.0, 6.1 and 6.2 are vulnerable, so backport is needed. (cherry picked from commit `bcd358595f`) (cherry picked from commit `93815e0649`) Refs https://github.com/scylladb/scylladb/pull/20939 Closes scylladb/scylladb#21205 * github.com:scylladb/scylladb: replica: Fix tombstone GC during tablet split preparation service: Improve error handling for split	2024-10-23 11:41:36 +02:00
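The scenario in `39c1a448f6` can be modelled with a small sketch. It is a deliberate simplification, not ScyllaDB's implementation: the `SSTable` type, the group names, the timestamps, and the rule "a tombstone is purgeable only if every sstable outside the compaction holds strictly newer data" are assumptions for illustration, and real tombstone GC also considers grace periods and key overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SSTable:
    name: str
    group: str          # hypothetical compaction group label
    min_timestamp: int  # oldest write timestamp stored in the sstable


def can_purge(tombstone_ts: int, compacting: set, visible: list) -> bool:
    """Purge only if no visible, non-compacting sstable may hold shadowed (older) data."""
    return all(s.min_timestamp > tombstone_ts for s in visible if s not in compacting)


# sstable A holds the deleted data and is still in the unsplit group;
# sstable B holds the tombstone and was already moved to a split-ready group.
a = SSTable("A", group="main", min_timestamp=10)
b = SSTable("B", group="split-ready", min_timestamp=20)
replica_sstables = [a, b]
tombstone_ts = 20

# Before the fix: only sstables of the compacting group are consulted.
same_group_only = [s for s in replica_sstables if s.group == b.group]
purged_before_fix = can_purge(tombstone_ts, {b}, same_group_only)

# After the fix: all uncompacting sstables of the same replica are consulted.
purged_after_fix = can_purge(tombstone_ts, {b}, replica_sstables)
```

In this model the group-local check purges the tombstone while A still contains the data it shadows, whereas the replica-wide check keeps it.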
Botond Dénes	03f370e971	Merge '[Backport 6.1] Check system.tablets update before putting it into the table' from ScyllaDB Having tablet metadata with more than 1 pending replica will prevent this metadata from being (re)loaded due to sanity check on load. This patch fails the operation which tries to save the wrong metadata with a similar sanity check. For that, changes submitted to raft are validated, and if it's topology_change that affects system.tablets, the new "replicas" and "new_replicas" values are checked similarly to how they will be on (re)load. fixes #20043 (cherry picked from commit `f09fe4f351`) (cherry picked from commit `e5bf376cbc`) (cherry picked from commit `1863ccd900`) Refs #21020 Closes scylladb/scylladb#21110 * github.com:scylladb/scylladb: tablets: Validate system.tablets update group0_client: Introduce change validation group0_client: Add shared_token_metadata dependency replica/tablets: Add to_tablet_metadata_(row_)?key helpers replica/tablets: extract tablet_replica_set_from_cell()	2024-10-23 10:02:13 +03:00
Pavel Emelyanov	c52e5a8a87	tablets: Validate system.tablets update Implement change validation for raft topology_change command. For now the only check is that the "pending replicas" contains at most one entry. The check mirrors similar one in `process_one_row` function. If not passed, this prevents system.tablets from being updated with the mutation(s) that will not be loaded later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:17:00 +03:00
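A rough sketch of the check that `c52e5a8a87` describes, under stated assumptions: replicas are modelled as hypothetical `(host, shard)` tuples, "pending replicas" is taken to mean entries present in `new_replicas` but not in `replicas`, and the function names are invented for illustration. The real validation operates on mutations of system.tablets inside the group0 client.

```python
def pending_replicas(replicas: list, new_replicas: list) -> set:
    """Replicas that appear only in the proposed new set (assumed definition)."""
    return set(new_replicas) - set(replicas)


def validate_tablet_update(replicas: list, new_replicas: list) -> None:
    """Reject an update that would leave more than one pending replica."""
    pending = pending_replicas(replicas, new_replicas)
    if len(pending) > 1:
        raise ValueError(f"too many pending replicas: {sorted(pending)}")


current = [("host-a", 0), ("host-b", 1)]

# Moving one replica yields a single pending entry and passes.
validate_tablet_update(current, [("host-a", 0), ("host-c", 2)])

# Moving two replicas at once yields two pending entries and is rejected
# before it would be written.
try:
    validate_tablet_update(current, [("host-c", 2), ("host-d", 3)])
    rejected = False
except ValueError:
    rejected = True
```

Rejecting the change at submission time mirrors the intent stated in the commit: metadata that would fail the sanity check on (re)load never reaches the table.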
Pavel Emelyanov	337c777635	group0_client: Introduce change validation Add validate_change() methods (well, a template and an overload) that are called by prepare_command() and are supposed to validate the proposed change before it hits persistent storage Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:16:56 +03:00
Pavel Emelyanov	881ec8600f	group0_client: Add shared_token_metadata dependency It will be needed later to get tablet_metadata from. The dependency is "OK", shared_token_metadata is low-level sharded service. Client already references db::system_keyspace, which in turn references replica::database which, finally, references token_metadata Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:16:52 +03:00
Pavel Emelyanov	4bed029b56	replica/tablets: Add to_tablet_metadata_(row_)?key helpers Extracted from larger patch `f5976aa87b` (replica/tablets: add get_tablet_metadata_change_hint() and update_tablet_metadata_change_hint()) by Botond. The helpers are needed to decode mutations with tablets update to validate them later. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com>	2024-10-22 13:16:47 +03:00
Kefu Chai	751f1fda16	replica/tablets: extract tablet_replica_set_from_cell() so it can be reused to implement a low-level tool which reads tablets data from sstables Refs scylladb/scylladb#16488 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com>	2024-10-22 13:16:44 +03:00
Botond Dénes	0d41447e1a	Merge '[Backport 6.1] atomic_delete: allow deletion of sstables from several prefixes' from ScyllaDB Allow create_pending_deletion_log to delete a bunch of sstables that potentially reside in different prefixes (e.g. in the base directory and under staging/). The motivation arises from table::cleanup_tablet that calls compaction_group::cleanup on all cg:s via cleanup_compaction_groups. Cleanup, in turn, calls delete_sstables_atomically on all sstables in the compaction_group, in all states, including the normal state as well as staging - hence the requirement to support deleting sstables in different sub-directories. Also, apparently truncate calls delete_atomically for all sstables too, via table::discard_sstables, so if it happened to be executed during view update generation, i.e. when there are sstables in staging, it should hit the assertion failure reported in https://github.com/scylladb/scylladb/issues/18862 as well (although I haven't seen it yet, I see no reason why it wouldn't happen). So the issue was apparently present since the initial implementation of the pending_delete_log. It's just that with tablet migration it is more likely to be hit. Fixes scylladb/scylladb#18862 Needs backport to 6.0 since tablets require this capability (cherry picked from commit `a7b92d7b6f`) (cherry picked from commit `027e64876a`) (cherry picked from commit `44bd183187`) (cherry picked from commit `f47b5e60bc`) Refs #19555 Closes scylladb/scylladb#20644 * github.com:scylladb/scylladb: sstable_directory: create_pending_deletion_log: place pending_delete log under the base directory sstables: storage: keep base directory in base class sstables: storage: define opened_directory in header file sstable_directory: use only dirlog	2024-10-22 09:17:26 +03:00
Benny Halevy	71d90b2fbc	view: check_needs_view_update_path: get token_metadata_ptr check_needs_view_update_path is async and might yield so the token_metadata reference passed to it must be kept alive throughout the call. Fixes scylladb/scylladb#20979 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `d34878e96c`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#21039	2024-10-22 09:16:40 +03:00

43780 Commits