scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-27 20:05:10 +00:00

Author	SHA1	Message	Date
Michael Litvak	36b1a486de	cdc: fix handling of new generation during raft upgrade During raft upgrade, a node may gossip about a new CDC generation that was propagated through raft. The node that receives the generation by gossip may have not applied the raft update yet, and it will not find the generation in the system tables. We should consider this error non-fatal and retry to read until it succeeds or becomes obsolete. Another issue is when we fail with a "fatal" exception and not retrying to read, the cdc metadata is left in an inconsistent state that causes further attempts to insert this CDC generation to fail. What happens is we complete preparing the new generation by calling `prepare`, we insert an empty entry for the generation's timestamp, and then we fail. The next time we try to insert the generation, we skip inserting it because we see that it already has an entry in the metadata and we determine that there's nothing to do. But this is wrong, because the entry is empty, and we should continue to insert the generation. To fix it, we change `prepare` to return `true` when the entry already exists but it's empty, indicating we should continue to insert the generation. Fixes scylladb/scylladb#21227 Closes scylladb/scylladb#22093 (cherry picked from commit `4f5550d7f2`) Closes scylladb/scylladb#22545	2025-01-29 19:52:18 +02:00
Kefu Chai	7da4223411	compress: fix compressor initialization order by making namespace_prefix a function Fixes a race condition where COMPRESSOR_NAME in zstd.cc could be initialized before compressor::namespace_prefix due to undefined global variable initialization order across translation units. This was causing ZstdCompressor to be unregistered in release builds, making it impossible to create tables with Zstd compression. Replace the global namespace_prefix variable with a function that returns the fully qualified compressor name. This ensures proper initialization order and fixes the registration of the ZstdCompressor. Fixes scylladb/scylladb#22444 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22451 (cherry picked from commit `4a268362b9`) Closes scylladb/scylladb#22510	2025-01-27 19:50:23 +02:00
Calle Wilund	2b40f405f0	commitlog: Fix assertion in oversized_alloc Fixes #20633 Cannot assert on actual request_controller when releasing permit, as the release, if we have waiters in queue, will subtract some units to hand to them. Instead assert on permit size + waiter status (and if zero, also controller value) * v2 - use SCYLLA_ASSERT (cherry picked from commit `58087ef427`) Closes scylladb/scylladb#22455	2025-01-26 16:46:35 +02:00
Kamil Braun	399325f0f0	Merge '[Backport 6.2] raft: Handle non-critical config update errors in when changing voter status.' from Sergey Z When a node is bootstrapped, joins a cluster as a non-voter, and then changes its role to a voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried. Fixes scylladb/scylladb#20814 Backport: This issue occurs frequently and disrupts the CI workflow to some extent. Backports are needed for versions 6.1 and 6.2. - (cherry picked from commit `775411ac56`) - (cherry picked from commit `16053a86f0`) - (cherry picked from commit `8c48f7ad62`) - (cherry picked from commit `3da4848810`) - (cherry picked from commit `228a66d030`) Parent PR: #22253 Closes scylladb/scylladb#22358 * github.com:scylladb/scylladb: raft: refactor `remove_from_raft_config` to use a timed `modify_config` call. raft: Refactor functions using `modify_config` to use a common wrapper for retrying. raft: Handle non-critical config update errors in when changing status to voter. test: Add test to check that a node does not fail on unknown commit status error when starting up. raft: Add run_op_with_retry in raft_group0.	2025-01-24 17:05:50 +01:00
Sergey Zolotukhin	9730f98d34	raft: refactor `remove_from_raft_config` to use a timed `modify_config` call. To avoid potential hangs during the `remove_from_raft_config` operation, use a timed `modify_config` call. This ensures the operation doesn't get stuck indefinitely. (cherry picked from commit `228a66d030`)	2025-01-22 09:54:28 +01:00
Sergey Zolotukhin	d419fb4a0c	raft: Refactor functions using `modify_config` to use a common wrapper for retrying. There are several places in `raft_group0` where almost identical code is used for retrying `modify_config` in case of `commit_status_unknown` error. To avoid code duplication all these places were changed to use a new wrapper `run_op_with_retry`. (cherry picked from commit `3da4848810`)	2025-01-22 09:54:26 +01:00
Nadav Har'El	bff9ddde12	Merge '[Backport 6.2] view_builder: write status to tables before starting to build' from null When adding a new view for building, first write the status to the system tables and then add the view building step that will start building it. Otherwise, if we start building it before the status is written to the table, it may happen that we complete building the view, write the SUCCESS status, and then overwrite it with the STARTED status. The view_build_status table will remain in an incorrect state indicating the view building is not complete. Fixes #20638 The PR contains a few additional small fixes in separate commits related to the view build status table. It addresses flakiness issues in tests that use the view build status table to determine when view building is complete. The table may be in an incorrect state due to these issues, having a row with status STARTED when it actually finished building the view, which will cause us to wait in `wait_for_view` until it times out. For testing I used a test similar to `test_view_build_status_with_replace_node`, but it only creates the views and calls `wait_for_view`. Without these commits it failed in 4/1024 runs, and with the commits it passed 2048/2048. Backport to fix the bugs that affect previous versions and improve CI stability - (cherry picked from commit `b1be2d3c41`) - (cherry picked from commit `1104411f83`) - (cherry picked from commit `7a6aec1a6c`) Parent PR: #22307 Closes scylladb/scylladb#22356 * github.com:scylladb/scylladb: view_builder: hold semaphore during entire startup view_builder: pass view name by value to write_view_build_status view_builder: write status to tables before starting to build scylla-6.2.3-candidate-20250119105533 scylla-6.2.3	2025-01-19 15:36:44 +02:00
Michael Litvak	e8fde3b0a3	view_builder: hold semaphore during entire startup Guard the whole view builder startup routine by holding the semaphore until it's done instead of releasing it early, so that it's not intercepted by migration notifications. (cherry picked from commit `7a6aec1a6c`)	2025-01-19 11:03:53 +02:00
Michael Litvak	a7f8842776	view_builder: pass view name by value to write_view_build_status The function write_view_build_status takes two lambda functions and chooses which of them to run depending on the upgrade state. It might run both of them. The parameters ks_name and view_name should be passed by value instead of by reference because they are moved inside each lambda function. Otherwise, if both lambdas are run, the second call operates on invalid values that were moved. (cherry picked from commit `1104411f83`)	2025-01-19 11:03:53 +02:00
Piotr Dulikowski	b88e0f8f74	Merge '[Backport 6.2] main, view: Pair view builder drain with its start' from null In this PR, we pair draining the view builder with its start. To better understand what was done and why, let's first look at the situation before this commit and the context of it: (a) The following things happened in order: 1. The view builder would be constructed. 2. Right after that, a deferred lambda would be created to stop the view builder during shutdown. 3. group0_service would be started. 4. A deferred lambda stopping group0_service would be created right after that. 5. The view builder would be started. (b) Because the view builder depends on group0_client, it couldn't be started before starting group0_service. On the other hand, other services depend on the view builder, e.g. the stream manager. That makes changing the order of initialization a difficult problem, so we want to avoid doing that unless we're sure it's the right choice. (c) Since the view builder uses group0_client, there was a possibility of running into a segmentation fault issue in the following scenario: 1. A call to `view_builder::mark_view_build_success()` is issued. 2. We stop group0_service. 3. `view_builder::mark_view_build_success()` calls `announce_with_raft()`, which leads to a use-after-free because group0_service has already been destroyed. This very scenario took place in scylladb/scylladb#20772. Initially, we decided to solve the issue by initializing group0_service a bit earlier (scylladb/scylladb@7bad8378c7). Unfortunately, it led to other issues described in scylladb/scylladb#21534, so we revert that patch. These changes are the second attempt to the problem where we want to solve it in a safer manner. The solution we came up with is to pair the start of the view builder with a deferred lambda that deinitializes it by calling `view_builder::drain()`. No other component of the system should be able to use the view builder anymore, so it's safe to do that. 
Furthermore, that pairing makes the analysis of initialization/deinitialization order much easier. We also solve the aforementioned use-after-free issue because the view builder itself will no longer attempt to use group0_client. Note that we still pair a deferred lambda calling `view_builder::stop()` with the construction of the view builder; that function will also call `view_builder::drain()`. Another notable thing is that `view_builder::drain()` may be called earlier by `storage_service::do_drain()`. In other words, these changes cover the situation when Scylla runs into a problem when starting up. Backport: The patch I'm reverting made it to 6.2, so we want to backport this one there too. Fixes scylladb/scylladb#20772 Fixes scylladb/scylladb#21534 - (cherry picked from commit `a5715086a4`) - (cherry picked from commit `06ce976370`) - (cherry picked from commit `d1f960eee2`) Parent PR: #21909 Closes scylladb/scylladb#22331 * github.com:scylladb/scylladb: test/topology_custom: Add test for Scylla with disabled view building main, view: Pair view builder drain with its start Revert "main,cql_test_env: start group0_service before view_builder"	2025-01-17 09:57:36 +01:00
Sergey Zolotukhin	2ea97d8c19	raft: Handle non-critical config update errors in when changing status to voter. When a node is bootstrapped and joins a cluster as a non-voter, errors can occur while committing a new Raft record, for instance, if the Raft leader changes during this time. These errors are not critical and should not cause a node crash, as the action can be retried. Fixes scylladb/scylladb#20814 (cherry picked from commit `8c48f7ad62`)	2025-01-16 20:09:31 +00:00
Sergey Zolotukhin	36ff3e8f5f	test: Add test to check that a node does not fail on unknown commit status error when starting up. Test that a node is starting successfully if while joining a cluster and becoming a voter, it receives an unknown commit status error. Test for scylladb/scylladb#20814 (cherry picked from commit `16053a86f0`)	2025-01-16 20:09:31 +00:00
Sergey Zolotukhin	325cdd3ebc	raft: Add run_op_with_retry in raft_group0. Calls to `modify_config` often need to be retried. To avoid code duplication, a function wrapper was added that allows a function to be called with automatic retries in case of failures. (cherry picked from commit `775411ac56`)	2025-01-16 20:09:30 +00:00
Michael Litvak	c5bdc9e58f	view_builder: write status to tables before starting to build When adding a new view for building, first write the status to the system tables and then add the view building step that will start building it. Otherwise, if we start building it before the status is written to the table, it may happen that we complete building the view, write the SUCCESS status, and then overwrite it with the STARTED status. The view_build_status table will remain in incorrect state indicating the view building is not complete. Fixes scylladb/scylladb#20638 (cherry picked from commit `b1be2d3c41`)	2025-01-16 20:08:36 +00:00
Kamil Braun	cbef20e977	Merge '[Backport 6.2] Fix possible data corruption due to token keys clashing in read repair.' from Sergey This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values. Fixes scylladb/scylladb#19101 Since the issue affects all the relevant scylla versions, backport to: 6.1, 6.2 - (cherry picked from commit `e577f1d141`) - (cherry picked from commit `39785c6f4e`) - (cherry picked from commit `155480595f`) Parent PR: #21996 Closes scylladb/scylladb#22298 * github.com:scylladb/scylladb: storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function. storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap. test: Add test case for checking read repair diff calculation when having conflicting keys.	2025-01-16 17:13:12 +01:00
Dawid Mędrek	833ea91940	test/topology_custom: Add test for Scylla with disabled view building Before this commit, there doesn't seem to have been a test verifying that starting and shutting down Scylla behave correctly when the configuration option `view_building` is set to false. In these changes, we add one. (cherry picked from commit `d1f960eee2`)	2025-01-16 14:12:31 +01:00
Dawid Mędrek	1200d3b735	main, view: Pair view builder drain with its start In these changes, we pair draining the view builder with its start. To better understand what was done and why, let's first look at the situation before this commit and the context of it: (a) The following things happened in order: 1. The view builder would be constructed. 2. Right after that, a deferred lambda would be created to stop the view builder during shutdown. 3. group0_service would be started. 4. A deferred lambda stopping group0_service would be created right after that. 5. The view builder would be started. (b) Because the view builder depends on group0_client, it couldn't be started before starting group0_service. On the other hand, other services depend on the view builder, e.g. the stream manager. That makes changing the order of initialization a difficult problem, so we want to avoid doing that unless we're sure it's the right choice. (c) Since the view builder uses group0_client, there was a possibility of running into a segmentation fault issue in the following scenario: 1. A call to `view_builder::mark_view_build_success()` is issued. 2. We stop group0_service. 3. `view_builder::mark_view_build_success()` calls `announce_with_raft()`, which leads to a use-after-free because group0_service has already been destroyed. This very scenario took place in scylladb/scylladb#20772. Initially, we decided to solve the issue by initializing group0_service a bit earlier (scylladb/scylladb@7bad8378c7). Unfortunately, it led to other issues described in scylladb/scylladb#21534. We reverted that change in the previous commit. These changes are the second attempt to the problem where we want to solve it in a safer manner. The solution we came up with is to pair the start of the view builder with a deferred lambda that deinitializes it by calling `view_builder::drain()`. No other component of the system should be able to use the view builder anymore, so it's safe to do that. 
Furthermore, that pairing makes the analysis of initialization/deinitialization order much easier. We also solve the aforementioned use-after-free issue because the view builder itself will no longer attempt to use group0_client. Note that we still pair a deferred lambda calling `view_builder::stop()` with the construction of the view builder; that function will also call `view_builder::drain()`. Another notable thing is that `view_builder::drain()` may be called earlier by `storage_service::do_drain()`. In other words, these changes cover the situation when Scylla runs into a problem when starting up. Fixes scylladb/scylladb#20772 (cherry picked from commit `06ce976370`)	2025-01-16 12:37:04 +01:00
Dawid Mędrek	84b774515b	Revert "main,cql_test_env: start group0_service before view_builder" The patch solved a problem related to an initialization order (scylladb/scylladb#20772), but we ran into another one: scylladb/scylladb#21534. After moving the initialization of group0_service, it ended up being destroyed AFTER the CDC generation service would. Since CDC generations are accessed in `storage_service::topology_state_load()`: ``` for (const auto& gen_id : _topology_state_machine._topology.committed_cdc_generations) { rtlogger.trace("topology_state_load: process committed cdc generation {}", gen_id); co_await _cdc_gens.local().handle_cdc_generation(gen_id); ``` we started getting the following failure: ``` Service &seastar::sharded<cdc::generation_service>::local() [Service = cdc::generation_service]: Assertion `local_is_initialized()' failed. ``` We're reverting the patch to go back to a more stable version of Scylla and in the following commit, we'll solve the original issue in a more systematic way. This reverts commit `7bad8378c7`. (cherry picked from commit `a5715086a4`)	2025-01-16 12:36:41 +01:00
Sergey Zolotukhin	06a8956174	test: Include parent test name in `ScyllaClusterManager` log file names. Add the test file name to `ScyllaClusterManager` log file names alongside the test function name. This avoids race conditions when tests with the same function names are executed simultaneously. Fixes scylladb/scylladb#21807 Backport: not needed since this is a fix in the testing scripts. Closes scylladb/scylladb#22192 (cherry picked from commit `2f1731c551`) Closes scylladb/scylladb#22249	2025-01-14 16:33:27 +01:00
Sergey Zolotukhin	f0f833e8ab	storage_proxy/read_repair: Remove redundant 'schema' parameter from `data_read_resolver::resolve` function. The `data_read_resolver` class inherits from `abstract_read_resolver`, which already includes the `schema_ptr _schema` member. Therefore, using a separate function parameter in `data_read_resolver::resolve` initialized with the same variable in `abstract_read_executor` is redundant. (cherry picked from commit `155480595f`)	2025-01-14 14:43:52 +01:00
Sergey Zolotukhin	b04b6aad9e	storage_proxy/read_repair: Use `partition_key` instead of `token` key for mutation diff calculation hashmap. This update addresses an issue in the mutation diff calculation algorithm used during read repair. Previously, the algorithm used `token` as the hashmap key. Since `token` is calculated based on the Murmur3 hash function, it could generate duplicate values for different partition keys, causing corruption in the affected rows' values. Fixes scylladb/scylladb#19101 (cherry picked from commit `39785c6f4e`)	2025-01-14 14:37:47 +01:00
Sergey Zolotukhin	12ee41869a	test: Add test case for checking read repair diff calculation when having conflicting keys. The test updates two rows with keys that result in a Murmur3 hash collision, which is used to generate Scylla tokens. These tokens are involved in read repair diff calculations. Due to the identical token values, a hash map key collision occurs. Consequently, an incorrect value from the second row (with a different primary key) is then sent for writing as 'repaired', causing data corruption. (cherry picked from commit `e577f1d141`)	2025-01-13 22:05:32 +00:00
Kamil Braun	ef93c3a8d7	Merge '[Backport 6.2] cache_algorithm_test: fix flaky failures' from Michał Chojnowski This series attempts to get rid of flakiness in cache_algorithm_test by solving two problems. Problem 1: The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form: 0x0000000000000000000000000000000000000000... 0x0100000000000000000000000000000000000000... 0x0200000000000000000000000000000000000000... But instead, unintentionally, it creates partially initialized keys of the form: 0x0000000000000000garbagegarbagegarbagegar... 0x0100000000000000garbagegarbagegarbagegar... 0x0200000000000000garbagegarbagegarbagegar... Each of these keys is created several times and -- for the test to pass -- the result must be the same each time. By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses. Problem 2: Cache stats are global, so there's no good way to reliably verify that e.g. a given read causes 0 cache misses, because something done by Scylla in the background can trigger a cache miss. This can cause the test to fail spuriously. With how the test framework and the cache are designed, there's probably no good way to test this properly. It would require ensuring that cache stats are per-read, or at least per-table, and that Scylla's background activity doesn't cause enough memory pressure to evict the tested rows. This patch tries to deal with the flakiness without deleting the test altogether by letting it retry after a failure if it notices that it can be explained by a read which wasn't done by the test. (Though, if the test can't be written well, maybe it just shouldn't be written...) 
Fixes scylladb/scylladb#21536 (cherry picked from commit `1fffd976a4`) (cherry picked from commit `6caaead4ac`) Parent PR: scylladb/scylladb#21948 Closes scylladb/scylladb#22228 * github.com:scylladb/scylladb: cache_algorithm_test: harden against stats being confused by background activity cache_algorithm_test: fix a use of an uninitialized variable	2025-01-09 14:30:31 +01:00
Aleksandra Martyniuk	a59c4653fe	repair: check tasks local to given shard Currently task_manager_module::is_aborted checks the tasks local to caller's shard on a given shard. Fix the method to check the task map local to the given shard. Fixes: #22156. Closes scylladb/scylladb#22161 (cherry picked from commit `a91e03710a`) Closes scylladb/scylladb#22197	2025-01-08 13:07:38 +02:00
Yaron Kaikov	2e87e317d9	.github/scripts/auto-backport.py: Add comment to PR when conflicts apply When we open a PR with conflicts, the PR owner gets a notification about the assignment but has no idea if this PR is with conflicts or not (in Scylla it's important since CI will not start on draft PR) Let's add a comment to notify the user we have conflicts Closes scylladb/scylladb#21939 (cherry picked from commit `2e6755ecca`) Closes scylladb/scylladb#22190	2025-01-08 13:07:08 +02:00
Botond Dénes	f73f7c17ec	Merge 'sstables_manager: do not reclaim unlinked sstables' from Lakshmi Narayanan Sreethar When an sstable is unlinked, it remains in the _active list of the sstable manager. Its memory might be reclaimed and later reloaded, causing issues since the sstable is already unlinked. This patch updates the on_unlink method to reclaim memory from the sstable upon unlinking, remove it from memory tracking, and thereby prevent the issues described above. Added a testcase to verify the fix. Fixes #21887 This is a bug fix in the bloom filter reload/reclaim mechanism and should be backported to older versions. Closes scylladb/scylladb#21895 * github.com:scylladb/scylladb: sstables_manager: reclaim memory from sstables on unlink sstables_manager: introduce reclaim_memory_and_stop_tracking_sstable() sstables: introduce disable_component_memory_reload() sstables_manager: log sstable name when reclaiming components (cherry picked from commit `d4129ddaa6`) Closes scylladb/scylladb#21998	2025-01-08 13:06:24 +02:00
Michał Chojnowski	379f23d854	cache_algorithm_test: harden against stats being confused by background activity Cache stats are global, so there's no good way to reliably verify that e.g. a given read causes 0 cache misses, because something done by Scylla in the background can trigger a cache miss. This can cause the test to fail spuriously. With how the test framework and the cache are designed, there's probably no good way to test this properly. It would require ensuring that cache stats are per-read, or at least per-table, and that Scylla's background activity doesn't cause enough memory pressure to evict the tested rows. This patch tries to deal with the flakiness without deleting the test altogether by letting it retry after a failure if it notices that it can be explained by a read which wasn't done by the test. (Though, if the test can't be written well, maybe it just shouldn't be written...) (cherry picked from commit `6caaead4ac`)	2025-01-08 11:48:23 +01:00
Michał Chojnowski	10815d2599	cache_algorithm_test: fix a use of an uninitialized variable The test needs to create some arbitrary partition keys of a given size. It intends to create keys of the form: 0x0000000000000000000000000000000000000000... 0x0100000000000000000000000000000000000000... 0x0200000000000000000000000000000000000000... But instead, unintentionally, it creates partially initialized keys of the form: 0x0000000000000000garbagegarbagegarbagegar... 0x0100000000000000garbagegarbagegarbagegar... 0x0200000000000000garbagegarbagegarbagegar... Each of these keys is created several times and -- for the test to pass -- the result must be the same each time. By coincidence, this is usually the case, since the same allocator slots are used. But if some background task happens to overwrite the allocator slot during a preemption, the keys used during "SELECT" will be different than the keys used during "INSERT", and the test will fail due to extra cache misses. (cherry picked from commit `1fffd976a4`)	2025-01-08 11:48:17 +01:00
Patryk Jędrzejczak	a63a0eac1e	[Backport 6.2] raft: improve logs for abort while waiting for apply New logs allow us to easily distinguish two cases in which waiting for apply times out: - the node didn't receive the entry it was waiting for, - the node received the entry but didn't apply it in time. Distinguishing these cases simplifies reasoning about failures. The first case indicates that something went wrong on the leader. The second case indicates that something went wrong on the node on which waiting for apply timed out. As it turns out, many different bugs result in the `read_barrier` (which calls `wait_for_apply`) timeout. This change should help us in debugging bugs like these. We want to backport this change to all supported branches so that it helps us in all tests. Fixes scylladb/scylladb#22160 Closes scylladb/scylladb#22159	2025-01-07 17:01:22 +01:00
Kamil Braun	afd588d4c7	Merge '[Backport 6.2] Do not reset quarantine list in non raft mode' from Gleb The series contains small fixes to the gossiper, one of which fixes #21930. The others I noticed while debugging the issue. Fixes: #21930 (cherry picked from commit `91cddcc17f`) Parent PR: #21956 Closes scylladb/scylladb#21991 * github.com:scylladb/scylladb: gossiper: do not reset _just_removed_endpoints in non raft mode gossiper: do not call apply for the node's old state	2025-01-03 16:28:08 +01:00
Abhinav	f5bce45399	Fix gossiper orphan node floating problem by adding a remover fiber Currently, if a node crashes during startup after initiating gossip but before joining group0, it keeps floating in the gossiper forever, because the raft-based gossiper purging logic only takes effect once the node joins group0. This orphan node prevents a successor node with the same IP from joining the cluster, since it collides with it during the gossiper shadow round. This commit fixes the issue by adding a background thread which periodically checks for such orphan entries in the gossiper and removes them. A test is also added to verify this logic. The test fails without this background thread enabled, hence verifying the behavior. Fixes: scylladb/scylladb#20082 Closes scylladb/scylladb#21600 (cherry picked from commit `6c90a25014`) Closes scylladb/scylladb#21822	2025-01-02 14:57:46 +01:00
Gleb Natapov	cda997fe59	gossiper: do not reset _just_removed_endpoints in non raft mode By the time the function is called during start it may already be populated. Fixes: scylladb/scylladb#21930 (cherry picked from commit `e318dfb83a`)	2024-12-25 12:01:16 +02:00
Gleb Natapov	155a0462d5	gossiper: do not call apply for the node's old state If a node changed its address, an old state may still be in the gossiper, so ignore it. (cherry picked from commit `e80355d3a1`)	2024-12-23 11:47:12 +02:00
Piotr Dulikowski	76b1173546	Merge 'service/topology_coordinator: migrate view builder only if all nodes are up' from Michał Jadwiszczak The migration process is doing read with consistency level ALL, requiring all nodes to be alive. Fixes scylladb/scylladb#20754 The PR should be backported to 6.2, this version has view builder on group0. Closes scylladb/scylladb#21708 * github.com:scylladb/scylladb: test/topology_custom/test_view_build_status: add reproducer service/topology_coordinator: migrate view builder only if all nodes are up (cherry picked from commit `def51e252d`) Closes scylladb/scylladb#21850	2024-12-19 14:10:55 +01:00
Piotr Dulikowski	e5a37d63c0	Merge 'transport/server: revert using async function in `for_each_gently()`' from Michał Jadwiszczak This patch reverts `324b3c43c0` and adds synchronous versions of `service_level_controller::find_effective_service_level()` and `client_state::maybe_update_per_service_level_params()`. It isn't safe to do asynchronous calls in `for_each_gently`, as the connection may be disconnected while a call in callback preempts. Fixes scylladb/scylladb#21801 Closes scylladb/scylladb#21761 * github.com:scylladb/scylladb: Revert "generic_server: use async function in `for_each_gently()`" transport/server: use synchronous calls in `for_each_gently` callback service/client_state: add synchronous method to update service level params qos/service_level_controller: add `find_cached_effective_service_level` (cherry picked from commit `c601f7a359`) Closes scylladb/scylladb#21849	2024-12-19 14:10:31 +01:00
Tomasz Grabiec	0851e3fba7	Merge '[Backport 6.2] utils: cached_file: Mark permit as awaiting on page miss' from ScyllaDB Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO (relies on https://github.com/scylladb/scylladb/pull/20522). But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or partition index IO involved because that IO will signal read concurrency semaphore to invite more concurrency. Fixes #21325 (cherry picked from commit `868f5b59c4`) (cherry picked from commit `0f2101b055`) Refs #21323 Closes scylladb/scylladb#21358 * github.com:scylladb/scylladb: utils: cached_file: Mark permit as awaiting on page miss utils: cached_file: Push resource_unit management down to cached_file	2024-12-16 19:55:00 +01:00
Michael Litvak	99f190f699	service/qos/service_level_controller: update cache on startup Update the service level cache in the node startup sequence, after the service level and auth service are initialized. The cache update depends on the service level data accessor being set and the auth service being initialized. Before the commit, it may happen that a cache update is not triggered after the initialization. The commit adds an explicit call to update the cache where it is guaranteed to be ready. Fixes scylladb/scylladb#21763 Closes scylladb/scylladb#21773 (cherry picked from commit `373855b493`) Closes scylladb/scylladb#21893	2024-12-16 14:19:06 +01:00
Michael Litvak	04e8506cbb	service/qos: increase timeout of internal get_service_levels queries The function get_service_levels is used to retrieve all service levels and it is called from multiple different contexts. Importantly, it is called internally from the context of group0 state reload, where it should be executed with a long timeout, similarly to other internal queries, because a failure of this function affects the entire group0 client, and a longer timeout can be tolerated. The function is also called in the context of the user command LIST SERVICE LEVELS, and perhaps other contexts, where a shorter timeout is preferred. The commit introduces a function parameter to indicate whether the context is internal or not. For internal context, a long timeout is chosen for the query. Otherwise, the timeout is shorter, the same as before. When the distinction is not important, a default value is chosen which maintains the same behavior. The main purpose is to fix the case where the timeout is too short and causes a failure that propagates and fails the group0 client. Fixes scylladb/scylladb#20483 Closes scylladb/scylladb#21748 (cherry picked from commit `53224d90be`) Closes scylladb/scylladb#21890	2024-12-16 14:15:26 +01:00
Yaron Kaikov	8e606a239f	github: check if PR is closed instead of merge In Scylla, PRs can be either `closed` or `merged`. Based on that, we decide when to start the backport process if the label was added after the PR was closed (or merged). In https://github.com/scylladb/scylladb/pull/21876, adding the proper backport label didn't trigger the backport automation. https://github.com/scylladb/scylladb/pull/21809/ caused this; we should have left `state=closed` (which includes both closed and merged PRs). Fixing it. Closes scylladb/scylladb#21906 (cherry picked from commit `b4b7617554`) Closes scylladb/scylladb#21922	2024-12-16 14:07:32 +02:00
Jenkins Promoter	8cdff8f52f	Update ScyllaDB version to: 6.2.3	2024-12-15 15:55:40 +02:00
Kamil Braun	3ec741dbac	Merge 'topology_coordinator: introduce reload_count in topology state and use it to prevent race' from Gleb Natapov Topology request table may change between the code reading it and calling cv::when(), since reading is a preemption point. In this case cv::signal can be missed. Detect that there was no signal in between reading and waiting by introducing reload_count, which is increased each time the state is reloaded and signaled. If the counter differs before and after reading, the state may have changed, so re-check it instead of sleeping. Closes scylladb/scylladb#21713 * github.com:scylladb/scylladb: topology_coordinator: introduce reload_count in topology state and use it to prevent race storage_service: use conditional_variable::when in co-routines consistently (cherry picked from commit `8f858325b6`) Closes scylladb/scylladb#21803	2024-12-12 15:45:31 +01:00
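The reload_count idea in the commit above can be sketched in Python with asyncio. This is only an illustration of the general technique, not the ScyllaDB code: `TopologyState`, `read_requests`, `wait_for_work`, and `demo` are hypothetical names, and `asyncio.Condition` stands in for the condition variable mentioned in the message.

```python
import asyncio


class TopologyState:
    """Hypothetical stand-in for the shared topology state."""

    def __init__(self):
        self.requests = []
        self.reload_count = 0
        self.cond = asyncio.Condition()

    async def reload(self, requests):
        # Every reload bumps the counter and signals waiters.
        async with self.cond:
            self.requests = list(requests)
            self.reload_count += 1
            self.cond.notify_all()


async def read_requests(state):
    # Simulated table read: the await is a preemption point, so a
    # reload (and its signal) can land after the snapshot is taken.
    snapshot = list(state.requests)
    await asyncio.sleep(0)
    return snapshot


async def wait_for_work(state):
    while True:
        seen = state.reload_count
        pending = await read_requests(state)
        if pending:
            return pending
        async with state.cond:
            # If the counter moved during the read, the signal was
            # already sent; re-check instead of sleeping.
            if state.reload_count != seen:
                continue
            await state.cond.wait()


async def demo():
    state = TopologyState()
    waiter = asyncio.create_task(wait_for_work(state))
    await asyncio.sleep(0)          # waiter snapshots the empty table and yields
    await state.reload(["join"])    # reload lands in the middle of the read
    return await asyncio.wait_for(waiter, timeout=1)
```

In `demo`, the reload happens while the waiter is still inside its read. Without the counter comparison the waiter would go to sleep on a signal that has already been delivered; with it, the waiter loops and picks up the new request.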
Anna Stuchlik	e3e7ac16e9	doc: remove wrong image upgrade info (5.2-to-2023.1) This commit removes the information about the recommended way of upgrading ScyllaDB images - by updating ScyllaDB and OS packages in one step. This upgrade procedure is not supported (it was implemented, but then reverted). Refs https://github.com/scylladb/scylladb/issues/15733 Closes scylladb/scylladb#21876 Fixes https://github.com/scylladb/scylla-enterprise/issues/5041 Fixes https://github.com/scylladb/scylladb/issues/21898 (cherry picked from commit `98860905d8`)	2024-12-12 15:22:26 +02:00
Tomasz Grabiec	81d6d88016	utils: cached_file: Mark permit as awaiting on page miss Otherwise, the read will be considered as on-cpu during promoted index search, which will severely underutilize the disk because by default on-cpu concurrency is 1. I verified this patch on the worst case scenario, where the workload reads missing rows from a large partition. So partition index is cached (no IO) and there is no data file IO. But there is IO during promoted index search (via cached_file). Before the patch this workload was doing 4k req/s, after the patch it does 30k req/s. The problem is much less pronounced if there is data file or index file IO involved because that IO will signal read concurrency semaphore to invite more concurrency. (cherry picked from commit `0f2101b055`)	2024-12-09 23:18:00 +01:00
Tomasz Grabiec	56f93dd434	utils: cached_file: Push resource_unit management down to cached_file It saves us permit operations on the hot path when we hit in cache. Also, it will lay the ground for marking the permit as awaiting later. (cherry picked from commit `868f5b59c4`)	2024-12-09 23:17:56 +01:00
Kefu Chai	28a32f9c50	github: do not nest ${{}} inside condition In commit `2596d157`, we added a condition to run auto-backport.py only when the GitHub Action is triggered by a push to the default branch. However, this introduced an unexpected error due to incorrect condition handling. Problem: - `github.event.before` evaluates to an empty string - GitHub Actions' single-pass expression evaluation system causes the step to always execute, regardless of `github.event_name` Despite GitHub's documentation suggesting that ${{ }} can be omitted, it recommends using explicit ${{}} expressions for compound conditions. Changes: - Use explicit ${{}} expression for compound conditions - Avoid string interpolation in conditional statements Root Cause: The previous implementation failed because of how GitHub Actions evaluates conditional expressions, leading to an unintended script execution and a 404 error when attempting to compare commits. Example Error: ``` python .github/scripts/auto-backport.py --repo scylladb/scylladb --base-branch refs/heads/master --commits ..2b07d93beac7bc83d955dadc20ccc307f13f20b6 shell: /usr/bin/bash -e {0} env: DEFAULT_BRANCH: master GITHUB_TOKEN: *** Traceback (most recent call last): File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 201, in <module> main() File "/home/runner/work/scylladb/scylladb/.github/scripts/auto-backport.py", line 162, in main commits = repo.compare(start_commit, end_commit).commits File "/usr/lib/python3/dist-packages/github/Repository.py", line 888, in compare headers, data = self._requester.requestJsonAndCheck( File "/usr/lib/python3/dist-packages/github/Requester.py", line 353, in requestJsonAndCheck return self.__check( File "/usr/lib/python3/dist-packages/github/Requester.py", line 378, in __check raise self.__createException(status, responseHeaders, output) github.GithubException.UnknownObjectException: 404 {"message": "Not Found", "documentation_url": "https://docs.github.com/rest/commits/commits#compare-two-commits", "status": "404"} ``` Fixes scylladb/scylladb#21808 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21809 (cherry picked from commit `e04aca7efe`) Closes scylladb/scylladb#21820	2024-12-06 16:34:59 +02:00
Avi Kivity	20b2d5b7c9	Merge 'compaction: update maintenance sstable set on scrub compaction completion' from Lakshmi Narayanan Sreethar Scrub compaction can pick up input sstables from maintenance sstable set but on compaction completion, it doesn't update the maintenance set leaving the original sstable in set after it has been scrubbed. To fix this, on compaction completion has to update the maintenance sstable if the input originated from there. This PR solves the issue by updating the correct sstable_sets on compaction completion. Fixes #20030 This issue has existed since the introduction of main and maintenance sstable sets into scrub compaction. It would be good to have the fix backported to versions 6.1 and 6.2. Closes scylladb/scylladb#21582 * github.com:scylladb/scylladb: compaction: remove unused `update_sstable_lists_on_off_strategy_completion` compaction_group: replace `update_sstable_lists_on_off_strategy_completion` compaction_group: rename `update_main_sstable_list_on_compaction_completion` compaction_group: update maintenance sstable set on scrub compaction completion compaction_group: store table::sstable_list_builder::result in replacement_desc table::sstable_list_builder: remove old sstables only from current list table::sstable_list_builder: return removed sstables from build_new_list (cherry picked from commit `58baeac0ad`) Closes scylladb/scylladb#21790	2024-12-06 10:36:46 +02:00
Michael Pedersen	f37deb7e98	docs: correct the storage size for n2-highmem-32 to 9000GB updated storage size for n2-highmem-32 to 9000GB as this is default in SC Fixes scylladb/scylladb#21785 Closes scylladb/scylladb#21537 (cherry picked from commit `309f1606ae`) Closes scylladb/scylladb#21595	2024-12-05 09:51:11 +02:00
Tomasz Grabiec	933ec7c6ab	utils: UUID: Make get_time_UUID() respect the clock offset schema_change_test currently fails due to failure to start a cql test env in unit tests after the point where this is called (in one of the test cases): forward_jump_clocks(std::chrono::seconds(60*60*24*31)); The problem manifests with a failure to join the cluster due to missing_column exception ("missing_column: done") being thrown from system_keyspace::get_topology_request_state(). It's a symptom of join request being missing in system.topology_requests. It's missing because the row is expired. When request is created, we insert the mutations with intended TTL of 1 month. The actual TTL value is computed like this: ttl_opt topology_request_tracking_mutation_builder::ttl() const { return std::chrono::duration_cast<std::chrono::seconds>(std::chrono::microseconds(_ts)) + std::chrono::months(1) - std::chrono::duration_cast<std::chrono::seconds>(gc_clock::now().time_since_epoch()); } _ts comes from the request_id, which is supposed to be a timeuuid set from current time when request starts. It's set using utils::UUID_gen::get_time_UUID(). It reads the system clock without adding the clock offset, so after forward_jump_clocks(), _ts and gc_clock::now() may be far off. In some cases the accumulated offset is larger than 1 month and the ttl becomes negative, causing the request row to expire immediately and failing the boot sequence. The fix is to use db_clock, which respects offsets and is consistent with gc_clock. The test doesn't fail in CI because there each test case runs in a separate process, so there is no bootstrap attempt (by new cql test env) after forward_jump_clocks(). Closes scylladb/scylladb#21558 (cherry picked from commit `1d0c6aa26f`) Closes scylladb/scylladb#21584 Fixes #21581	2024-12-04 14:18:16 +01:00
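The TTL arithmetic described in the commit above can be worked through with a small Python sketch. It mirrors the quoted expression only; `request_ttl_seconds` and `demo` are hypothetical names, the wall-clock value is arbitrary, and the 31-day forward jump is an assumption chosen for illustration. The month constant is the length of `std::chrono::months(1)` in seconds (an average Gregorian month).

```python
# std::chrono::months(1) expressed in seconds (1/12 of an average Gregorian year).
MONTH_SECONDS = 2_629_746


def request_ttl_seconds(request_ts_us: int, gc_now_s: int) -> int:
    """Mirror of the quoted expression: seconds(_ts) + months(1) - gc_clock::now()."""
    return request_ts_us // 1_000_000 + MONTH_SECONDS - gc_now_s


def demo():
    wall_clock_s = 1_733_000_000      # arbitrary illustrative epoch time
    offset_s = 60 * 60 * 24 * 31      # assumed accumulated forward jump: 31 days

    # Before the fix: the timeuuid timestamp ignores the clock offset,
    # while the gc clock includes it.
    ttl_before = request_ttl_seconds(wall_clock_s * 1_000_000,
                                     wall_clock_s + offset_s)

    # After the fix: both sides observe the same offset, so it cancels out.
    ttl_after = request_ttl_seconds((wall_clock_s + offset_s) * 1_000_000,
                                    wall_clock_s + offset_s)
    return ttl_before, ttl_after
```

Under these assumptions the jump (2,678,400 s) exceeds one month (2,629,746 s), so the pre-fix TTL comes out at -48,654 s and the row is already expired when written. Once both timestamps share the offset, the TTL is the full month again.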
Kefu Chai	2b5cd10b66	docs: explain task status retention and one-time query behavior Task status information from nodetool commands is not retained permanently: - Status of completed tasks is only kept for `task_ttl_in_seconds` - Status is removed after being queried, making it a one-time operation This behavior is important for users to understand since subsequent queries for the same completed task will not return any information. Add documentation to make this clear to users. Fixes scylladb/scylladb#21757 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21386 (cherry picked from commit `afeff0a792`) Closes scylladb/scylladb#21759	2024-12-04 13:49:24 +02:00
Kefu Chai	bf47de9f7f	test: topology_custom: ensure node visibility before keyspace creation Building upon commit `69b47694`, this change addresses a subtle synchronization weakness in node visibility checks during recovery mode testing. Previous Approach: - Waited only for the first node to see its peers - Insufficient to guarantee full cluster consistency Current Solution: 1. Implement comprehensive node visibility verification 2. Ensure all nodes mutually recognize each other 3. Prevent potential schema propagation race conditions Key Improvements: - Robust cluster state validation before keyspace creation - Eliminate partial visibility scenarios Fixes scylladb/scylladb#21724 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#21726 (cherry picked from commit `65949ce607`) Closes scylladb/scylladb#21734	2024-12-04 13:46:26 +02:00


44704 Commits