scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Gleb Natapov	aa75444438	test: test that an expired erm held for too long triggers a notification (cherry picked from commit `5dcdaa6f66`)	2025-11-26 15:08:41 +00:00
Gleb Natapov	e2d59df166	token_metadata: fix notification about expiring erm held for too long Commit `6e4803a750` broke notification about expired erms held for too long since it resets the tracker without calling its destructor (where the notification is triggered). Fix the assignment operator to call the destructor. (cherry picked from commit `9f97c376f1`)	2025-11-26 15:08:41 +00:00
Ernest Zaslavsky	7e6b653e5c	streaming: fix loop break condition in tablet_sstable_streamer::stream Correct the loop termination logic that previously caused certain SSTables to be prematurely excluded, resulting in lost mutations. This change ensures all relevant SSTables are properly streamed and their mutations preserved. (cherry picked from commit `dedc8bdf71`) Closes scylladb/scylladb#27153 Fixes: #26979 Parent PR: #26980 Unfortunately, the pytest-based test cannot be backported because of changes made to the testing harness and scylla-tools	2025-11-25 11:59:01 +03:00
Avi Kivity	84b7e06268	tools: toolchain: prepare: replace 'reg' with 'skopeo' The prepare script uses 'reg' to verify we're not going to overwrite an existing image. The 'reg' command is not available in Fedora 43. Use 'skopeo' instead. Skopeo is part of the podman ecosystem, so it will hopefully live longer. Fixes #27178. Closes scylladb/scylladb#27179 (cherry picked from commit `d6ef5967ef`) Closes scylladb/scylladb#27199	2025-11-24 16:32:04 +02:00
Jenkins Promoter	812fc721cd	Update ScyllaDB version to: 2025.3.5	2025-11-24 15:50:44 +02:00
Raphael S. Carvalho	867cb1e7ac	replica: Fail timed-out single-key read on cleaned up tablet replica Consider the following: 1) single-key read starts, blocks on replica e.g. waiting for memory. 2) the same replica is migrated away 3) single-key read expires, coordinator abandons it, releases erm. 4) migration advances to cleanup stage, barrier doesn't wait on timed-out read 5) compaction group of the replica is deallocated on cleanup 6) that single-key read resumes, but doesn't find sstable set (post cleanup) 7) with abort-on-internal-error turned on, node crashes. It's fine for abandoned (= timed out) reads to fail, since the coordinator is gone. For active reads (non timed out), the barrier will wait for them since their coordinator holds erm. This solution consists of failing reads whose underlying tablet replica has been cleaned up, by converting the internal error into a plain exception. Fixes #26229. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#27078 (cherry picked from commit `74ecedfb5c`) Closes scylladb/scylladb#27155	2025-11-21 17:48:21 +03:00
Patryk Jędrzejczak	a9fc235aee	test: test_raft_recovery_stuck: ensure mutual visibility before using driver Not waiting for nodes to see each other as alive can cause the driver to fail the request sent in `wait_for_upgrade_state()`. scylladb/scylladb#19771 has already replaced concurrent restarts with `ManagerClient.rolling_restart()`, but it has missed this single place, probably because we do concurrent starts here. Fixes #27055 Closes scylladb/scylladb#27075 (cherry picked from commit `e35ba974ce`) Closes scylladb/scylladb#27109	2025-11-20 10:41:58 +02:00
Botond Dénes	78ecb8854a	Merge '[Backport 2025.3] Automatic cleanup improvements' from Scylladb[bot] This series allows an operator to reset the 'cleanup needed' flag if they have already cleaned up the node, so that automatic cleanup will not do it again. We also change 'nodetool cleanup' back to run cleanup on one node only (and reset the 'cleanup needed' flag at the end), but the new '--global' option allows running cleanup simultaneously on all nodes that need it. Fixes https://github.com/scylladb/scylladb/issues/26866 Backport to all supported versions, since the current automatic cleanup behaviour may create load the operator does not expect during cluster resizing. - (cherry picked from commit `e872f9cb4e`) - (cherry picked from commit `0f0ab11311`) Parent PR: #26868 Closes scylladb/scylladb#27093 * github.com:scylladb/scylladb: cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster cleanup: Add RESTful API to allow reset cleanup needed flag	2025-11-20 10:41:04 +02:00
Botond Dénes	d2d9140029	Merge '[Backport 2025.3] encryption::kms_host: Add exponential backoff-retry for 503 errors' from Scylladb[bot] Refs #26822 Fixes #27062 AWS says to treat 503 errors, at least in the case of ec2 metadata query, as backoff-retry (generally, we do _not_ retry on provider level, but delegate this to higher levels). This patch adds special treatment for 503:s (service unavailable) for both ec2 meta and actual endpoint, doing exponential backoff. Note: we do _not_ retry forever. Not tested as such, since I don't get any errors when testing (doh!). Should try to set up a mock ec2 meta with injected errors maybe. - (cherry picked from commit `190e3666cb`) - (cherry picked from commit `d22e0acf0b`) Parent PR: #26934 Closes scylladb/scylladb#27063 * github.com:scylladb/scylladb: encryption::kms_host: Add exponential backoff-retry for 503 errors encryption::kms_host: Include http error code in kms_error	2025-11-20 10:40:20 +02:00
Botond Dénes	91e6efdde8	Merge '[Backport 2025.3] service/qos: Fall back to default scheduling group when using maintenance socket' from Scylladb[bot] The service level controller relies on `auth::service` to collect information about roles and the relation between them and the service levels (those attached to them). Unfortunately, the service level controller is initialized way earlier than `auth::service` and so we had to prevent potential invalid queries of user service levels (cf. `46193f5e79`). Unfortunately, that came at a price: it made the maintenance socket incompatible with the current implementation of the service level controller. The maintenance socket starts early, before the `auth::service` is fully initialized and registered, and is exposed almost immediately. If the user attempts to connect to Scylla within this time window, via the maintenance socket, one of the things that will happen is choosing the right service level for the connection. Since the `auth::service` is not registered, Scylla will fail an assertion and crash. A similar scenario occurs when using maintenance mode. The maintenance socket is how the user communicates with the database, and we're not prepared for that either. To avoid unnecessary crashes, we add new branches if the passed user is absent or if it corresponds to the anonymous role. Since the role corresponding to a connection via the maintenance socket is the anonymous role, that solves the problem. Some accesses to `auth::service` are not affected and we do not modify those. Fixes scylladb/scylladb#26816 Backport: yes. This is a fix of a regression. - (cherry picked from commit `c0f7622d12`) - (cherry picked from commit `222eab45f8`) - (cherry picked from commit `394207fd69`) - (cherry picked from commit `b357c8278f`) Parent PR: #26856 Closes scylladb/scylladb#27039 * github.com:scylladb/scylladb: test/cluster/test_maintenance_mode.py: Wait for initialization test: Disable maintenance mode correctly in test_maintenance_mode.py test: Fix keyspace in test_maintenance_mode.py service/qos: Do not crash Scylla if auth_integration absent	2025-11-20 10:39:48 +02:00
Botond Dénes	a067723f55	Merge '[Backport 2025.3] cdc: set column drop timestamp in the future' from Scylladb[bot] When dropping a column from a CDC log table, set the column drop timestamp several seconds into the future. If a value is written to a column concurrently with dropping that column, the value's timestamp may be after the column drop timestamp. If this value is also flushed to an SSTable, the SSTable would be corrupted, because it considers the column missing after the drop timestamp and doesn't allow values for it. While this issue affects general tables, it especially impacts CDC tables because this scenario can occur when writing to a table with CDC preimage enabled while dropping a column from the base table. This happens even if the base mutation doesn't write to the dropped column, because CDC log mutations can generate values for a column even if the base mutation doesn't. For general tables, this issue can be avoided by simply not writing to a column while dropping it. We fix this for the more problematic case of CDC log tables by setting the column drop timestamp several seconds into the future, ensuring that writes concurrent with column drops are much less likely to have timestamps greater than the column drop timestamp. Fixes https://github.com/scylladb/scylladb/issues/26340 the issue affects all previous releases, backport to improve stability - (cherry picked from commit `eefae4cc4e`) - (cherry picked from commit `48298e38ab`) - (cherry picked from commit `039323d889`) - (cherry picked from commit `e85051068d`) Parent PR: #26533 Closes scylladb/scylladb#27036 * github.com:scylladb/scylladb: test: test concurrent writes with column drop with cdc preimage cdc: check if recreating a column too soon cdc: set column drop timestamp in the future	2025-11-20 10:39:18 +02:00
Gleb Natapov	b53bf43844	cleanup: introduce "nodetool cluster cleanup" command to run cleanup on all dirty nodes in the cluster `97ab3f6622` changed "nodetool cleanup" (without arguments) to run cleanup on all dirty nodes in the cluster. This was somewhat unexpected, so this patch changes it back to run cleanup on the target node only (and reset "cleanup needed" flag afterwards) and it adds "nodetool cluster cleanup" command that runs the cleanup on all dirty nodes in the cluster. (cherry picked from commit `0f0ab11311`)	2025-11-19 10:53:42 +02:00
Gleb Natapov	3d60e5e825	cleanup: Add RESTful API to allow reset cleanup needed flag Cleaning up a node using the per-keyspace/table interface does not reset the "cleanup needed" flag in the topology. The assumption was that running cleanup on an already clean node does nothing and completes quickly. But due to https://github.com/scylladb/scylladb/issues/12215 (which is closed as WONTFIX) this is not the case. This patch provides the ability to reset the flag in the topology if the operator has already cleaned up the node manually. (cherry picked from commit `e872f9cb4e`)	2025-11-19 10:44:30 +02:00
Avi Kivity	e9e849c2bf	Merge '[Backport 2025.3] Synchronize tablet split and load-and-stream' from Scylladb[bot] Load-and-stream is broken when running concurrently with the finalization step of tablet split. Consider this: 1) split starts 2) split finalization executes barrier and succeeds 3) load-and-stream runs now, starts writing sstable (pre-split) 4) split finalization publishes changes to tablet metadata 5) load-and-stream finishes writing sstable 6) sstable cannot be loaded since it spans two tablets. Two possible fixes (maybe both): 1) load-and-stream waits for the topology to quiesce 2) perform split compaction on sstable that spans both sibling tablets This patch implements option 1. By waiting for the topology to quiesce, we guarantee that load-and-stream only starts when there's no chance the coordinator is handling some topology operation like split finalization. Fixes https://github.com/scylladb/scylladb/issues/26455. - (cherry picked from commit `3abc66da5a`) - (cherry picked from commit `4654cdc6fd`) Parent PR: #26456 Closes scylladb/scylladb#26648 * github.com:scylladb/scylladb: sstables_loader: Don't bypass synchronization with busy topology test: Add reproducer for l-a-s and split synchronization issue sstables_loader: Synchronize tablet split and load-and-stream	2025-11-17 17:14:36 +02:00
Calle Wilund	484e7aed2c	encryption::kms_host: Add exponential backoff-retry for 503 errors Refs #26822 AWS says to treat 503 errors, at least in the case of ec2 metadata query, as backoff-retry (generally, we do _not_ retry on provider level, but delegate this to higher levels). This patch adds special treatment for 503:s (service unavailable) for both ec2 meta and actual endpoint, doing exponential backoff. Note: we do _not_ retry forever. Not tested as such, since I don't get any errors when testing (doh!). Should try to set up a mock ec2 meta with injected errors maybe. v2: * Use utils::exponential_backoff_retry (cherry picked from commit `d22e0acf0b`)	2025-11-17 11:48:42 +00:00
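The bounded exponential backoff described in this commit can be illustrated with a short Python sketch. It is a minimal illustration only, assuming a hypothetical `ServiceUnavailable` exception that stands in for an HTTP 503 and hypothetical parameter names and delays; it is not Scylla's `utils::exponential_backoff_retry` and does not reproduce its exact timing.

```python
import time


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for an HTTP 503 (service unavailable) response."""


def call_with_backoff(request, max_retries=5, base_delay=0.1, max_delay=5.0,
                      sleep=time.sleep):
    """Call request(), retrying only on 503 with exponentially growing delays.

    Retries are bounded: after max_retries retries the last error is re-raised
    so that higher layers can decide what to do. Other errors propagate at once.
    """
    delay = base_delay
    for attempt in range(max_retries + 1):
        try:
            return request()
        except ServiceUnavailable:
            if attempt == max_retries:
                raise  # do not retry forever
            sleep(delay)
            delay = min(delay * 2, max_delay)  # double the wait, up to a cap
```

Only the 503 case is retried here, mirroring the commit's point that the provider level otherwise leaves retries to higher layers.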
Calle Wilund	77407fd704	encryption::kms_host: Include http error code in kms_error Keep track of actual HTTP failure. (cherry picked from commit `190e3666cb`)	2025-11-17 11:48:41 +00:00
Benny Halevy	898f193ef6	scylla-sstable: correctly dump sharding_metadata This patch fixes 2 issues in one go: First, sstables::load currently clears the sharding metadata (via open_data()), and so scylla-sstable always prints an empty array for it. Second, printing token values would generate invalid JSON as they are currently printed as binary bytes, and they should be printed simply as numbers, as we do elsewhere, for example, for the first and last keys. Fixes #26982 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26991 (cherry picked from commit `f9ce98384a`) Closes scylladb/scylladb#27037 scylla-2025.3.4-candidate-20251117021558 scylla-2025.3.4	2025-11-16 15:43:35 +02:00
Michael Litvak	ba40c1eba9	test: test concurrent writes with column drop with cdc preimage add a test that writes to a table concurrently with dropping a column, where the table has CDC enabled with preimage. the test reproduces issue #26340 where this results in a malformed sstable. (cherry picked from commit `e85051068d`)	2025-11-16 10:03:07 +01:00
Michael Litvak	28eaa12af9	cdc: check if recreating a column too soon When we drop a column from a CDC log table, we set the column drop timestamp a few seconds into the future. This can cause unexpected problems if a user tries to recreate a CDC column too soon, before the drop timestamp has passed. To prevent this issue, when creating a CDC column we check its creation timestamp against the existing drop timestamp, if any, and fail with an informative error if the recreation attempt is too soon. (cherry picked from commit `039323d889`)	2025-11-16 10:03:07 +01:00
Michael Litvak	c37d224db6	cdc: set column drop timestamp in the future When dropping a column from a CDC log table, set the column drop timestamp several seconds into the future. If a value is written to a column concurrently with dropping that column, the value's timestamp may be after the column drop timestamp. If this value is also flushed to an SSTable, the SSTable would be corrupted, because it considers the column missing after the drop timestamp and doesn't allow values for it. While this issue affects general tables, it especially impacts CDC tables because this scenario can occur when writing to a table with CDC preimage enabled while dropping a column from the base table. This happens even if the base mutation doesn't write to the dropped column, because CDC log mutations can generate values for a column even if the base mutation doesn't. For general tables, this issue can be avoided by simply not writing to a column while dropping it. We fix this for the more problematic case of CDC log tables by setting the column drop timestamp several seconds into the future, ensuring that writes concurrent with column drops are much less likely to have timestamps greater than the column drop timestamp. Fixes scylladb/scylladb#26340 (cherry picked from commit `48298e38ab`)	2025-11-16 09:34:51 +01:00
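The timestamp reasoning behind this commit, and the follow-up recreation check in `28eaa12af9`, can be sketched in Python. Timestamps are in microseconds; the 5-second margin is an assumption standing in for "several seconds", and all function names are hypothetical, not Scylla's schema code.

```python
# Hypothetical margin standing in for "several seconds", in microseconds.
DROP_MARGIN_US = 5_000_000


def column_drop_timestamp(now_us, margin_us=DROP_MARGIN_US):
    """Place the drop timestamp in the future so that concurrent writes,
    whose timestamps are close to 'now', most likely fall before it."""
    return now_us + margin_us


def write_is_covered_by_drop(write_ts_us, drop_ts_us):
    """True if the cell's timestamp is not after the drop timestamp.

    A cell with a later timestamp would be a value for a column that the
    sstable treats as absent, which is the corruption this commit avoids.
    """
    return write_ts_us <= drop_ts_us


def check_recreate(create_ts_us, drop_ts_us):
    """Reject recreating a column whose drop timestamp has not passed yet."""
    if drop_ts_us is not None and create_ts_us <= drop_ts_us:
        raise ValueError(
            f"column dropped at {drop_ts_us}; recreate it after that timestamp")
```

The trade-off is visible in the sketch: pushing the drop timestamp forward protects concurrent writes, but it also opens a window during which recreating the column must be refused.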
Dawid Mędrek	7b32c277fe	test/cluster/test_maintenance_mode.py: Wait for initialization If we try to perform queries too early, before the call to `storage_service::start_maintenance_mode` has finished, we will fail with the following error: ``` ERROR 2025-11-12 20:32:27,064 [shard 0:sl:d] token_metadata - sorted_tokens is empty in first_token_index! ``` To avoid that, we should wait until initialization is complete. (cherry picked from commit `b357c8278f`)	2025-11-15 22:10:28 +00:00
Dawid Mędrek	6d6f870a5f	test: Disable maintenance mode correctly in test_maintenance_mode.py Although setting the value of `maintenance_mode` to the string `"false"` disables maintenance mode, the testing framework misinterprets the value and thinks that it's actually enabled. As a result, it might try to connect to Scylla via the maintenance socket, which we don't want. (cherry picked from commit `394207fd69`)	2025-11-15 22:10:28 +00:00
Dawid Mędrek	7112e0bfba	test: Fix keyspace in test_maintenance_mode.py The keyspace used in the test is not necessarily called `ks`. (cherry picked from commit `222eab45f8`)	2025-11-15 22:10:28 +00:00
Dawid Mędrek	c96bd48fd0	service/qos: Do not crash Scylla if auth_integration absent If the user connects to Scylla via the maintenance socket, it may happen that `auth_integration` has not been registered in the service level controller yet. One example is maintenance mode when that will never happen; another when the connection occurs before Scylla is fully initialized. To avoid unnecessary crashes, we add new branches if the passed user is absent or if it corresponds to the anonymous role. Since the role corresponding to a connection via the maintenance socket is the anonymous role, that solves the problem. In those cases, we completely circumvent any calls to `auth_integration` and handle them separately. The modified methods are: * `get_user_scheduling_group`, * `with_user_service_level`, * `describe_service_levels`. For the first two, the new behavior is in line with the previous implementation of those functions. The last behaves differently now, but since it's a soft error, crashing the node is not necessary anyway. We throw an exception instead, whose error message should give the user a hint of what might be wrong. The other uses of `auth_integration` within the service level controller are not problematic: * `find_effective_service_level`, * `find_cached_effective_service_level`. They take the name of a role as their argument. Since the anonymous role doesn't have a name, it's not possible to call them with it. Fixes scylladb/scylladb#26816 (cherry picked from commit `c0f7622d12`)	2025-11-15 22:10:28 +00:00
Jenkins Promoter	e6e3678e00	Update pgo profiles - aarch64	2025-11-15 05:11:05 +02:00
Jenkins Promoter	b5f03af147	Update pgo profiles - x86_64	2025-11-15 04:30:41 +02:00
Raphael S. Carvalho	d63e9342ef	sstables_loader: Don't bypass synchronization with busy topology The patch `c543059f86` fixed the synchronization issue between tablet split and load-and-stream. The synchronization worked only with raft topology, and therefore was disabled with gossip. To do the check, storage_service::raft_topology_change_enabled() was used, but the topology kind is only available/set on shard 0, so the synchronization was bypassed when load-and-stream ran on any shard other than 0. The reproducer didn't catch it because it was restricted to a single CPU. It now runs with multiple CPUs and catches the observed problem. Fixes #22707 Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#26730 (cherry picked from commit `7f34366b9d`) (cherry picked from commit `e8a74d0fb3`) Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com>	2025-11-14 10:49:30 -03:00
Botond Dénes	420242646b	Merge '[Backport 2025.3] [schema] Speculative retry rounding fix' from Scylladb[bot] This patch series re-enables support for speculative retry values `0` and `100`. These values were supported some time ago, before [schema: fix issue 21825: add validation for PERCENTILE values in speculative_retry configuration. #21879 ](https://github.com/scylladb/scylladb/pull/21879). When that PR prevented using invalid `101PERCENTILE` values, the valid `100PERCENTILE` and `0PERCENTILE` values were rejected too. The reproduction steps from [[Bug]: drop schema and all tables after apply speculative_retry = '99.99PERCENTILE' #26369](https://github.com/scylladb/scylladb/issues/26369) no longer reproduce the issue after the fix. A test is added to make sure the inclusive border values `0` and `100` are supported. Documentation is updated to give more information to the users. It now states that these border values are inclusive, and also that the precision, with automatic rounding, is 1 decimal digit. Fixes #26369 This is a bug fix. If at any time a client tries to use a value >= 99.5 and < 100, the raft error will happen. Backport is needed. The code which introduced the inconsistency was introduced in 2025.2, so no backporting to 2025.1. - (cherry picked from commit `da2ac90bb6`) - (cherry picked from commit `5d1913a502`) - (cherry picked from commit `aba4c006ba`) - (cherry picked from commit `85f059c148`) - (cherry picked from commit `7ec9e23ee3`) Parent PR: #26909 Closes scylladb/scylladb#27014 * github.com:scylladb/scylladb: test: cqlpy: add test case for non-numeric PERCENTILE value schema: speculative_retry: update exception type for sstring ops docs: cql: ddl.rst: update speculative-retry-options test: cqlpy: add test for valid speculative_retry values schema: speculative_retry: allow 0 and 100 PERCENTILE values	2025-11-14 10:32:19 +02:00
Botond Dénes	8dd5cc3891	Merge '[Backport 2025.3] cql3: Fix std::bad_cast when deserializing vectors of collections' from Scylladb[bot] cql3: Fix std::bad_cast when deserializing vectors of collections This PR fixes a bug where attempting to INSERT a vector containing collections (e.g., `vector<set<int>,1>`) would fail. On the client side, this manifested as a `ServerError: std::bad_cast`. The cause was a "type slicing" issue in the reserialize_value function. When retrieving the vector's element type, the result was being assigned by value (using auto) instead of by reference. This "sliced" the polymorphic abstract_type object, stripping it of its actual derived type information. As a result, a subsequent dynamic_cast would fail, even if the underlying type was correct. To prevent this entire class of bugs from happening again, I've made the polymorphic base class `abstract_type` explicitly uncopyable. Fixes: #26704 This fix needs to be backported as these releases are affected: `2025.4`, `2025.3`. - (cherry picked from commit `960fe3da60`) - (cherry picked from commit `77da4517d2`) Parent PR: #26740 Closes scylladb/scylladb#26997 * github.com:scylladb/scylladb: cql3: Make abstract_type explicitly noncopyable cql3: Fix std::bad_cast when deserializing vectors of collections	2025-11-14 10:30:55 +02:00
Yaron Kaikov	d4861c8068	install-dependencies.sh: update node_exporter to 1.10.2 Update node exporter to solve CVE-2025-22871 [regenerate frozen toolchain with optimized clang from https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-aarch64.tar.gz https://devpkg.scylladb.com/clang/clang-19.1.7-Fedora-41-x86_64.tar.gz ] Fixes: https://scylladb.atlassian.net/browse/SCYLLADB-5 Closes scylladb/scylladb#26916 (cherry picked from commit `c601371b57`) Closes scylladb/scylladb#26952	2025-11-14 10:28:56 +02:00
Dario Mirovic	143b903203	test: cqlpy: add test case for non-numeric PERCENTILE value Add a test case for a non-numeric PERCENTILE value, which raises a different error from the one raised for out-of-range values. The regex in test_invalid_percentile_speculative_retry_values is expanded. Refs #26369 (cherry picked from commit `7ec9e23ee3`)	2025-11-13 19:44:43 +00:00
Dario Mirovic	6237b13959	schema: speculative_retry: update exception type for sstring ops Change speculative_retry::to_sstring and speculative_retry::from_sstring to throw exceptions::configuration_exception instead of std::invalid_argument. These errors can be triggered by CQL, so appropriate CQL exception should be used. Reference: https://github.com/scylladb/scylladb/issues/24748#issuecomment-3025213304 Refs #26369 (cherry picked from commit `85f059c148`)	2025-11-13 19:44:43 +00:00
Dario Mirovic	ee0f821ed2	docs: cql: ddl.rst: update speculative-retry-options Clarify how the value of `XPERCENTILE` is handled: - Values 0 and 100 are supported - The percentile value is rounded to the nearest 0.1 (1 decimal place) Refs #26369 (cherry picked from commit `aba4c006ba`)	2025-11-13 19:44:43 +00:00
Dario Mirovic	8b1547df9c	test: cqlpy: add test for valid speculative_retry values test_valid_percentile_speculative_retry_values is introduced to test that valid values for speculative_retry are properly accepted. Some of the values are moved from the test_invalid_percentile_speculative_retry_values test, because the previous commit added support for them. Refs #26369 (cherry picked from commit `5d1913a502`)	2025-11-13 19:44:43 +00:00
Dario Mirovic	f75c15e076	schema: speculative_retry: allow 0 and 100 PERCENTILE values This patch allows specifying 0 and 100 PERCENTILE values in speculative_retry. It was possible to specify these values before #21825. #21825 prevented specifying invalid values, like -1 and 101, but also prevented using 0 and 100. On top of that, speculative_retry::to_sstring function did rounding when formatting the string, which introduced inconsistency. Fixes #26369 (cherry picked from commit `da2ac90bb6`)	2025-11-13 19:44:43 +00:00
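The validation and round-trip behaviour described in this series can be sketched in Python. This is a hypothetical parser, assuming values from `0` to `100` inclusive are accepted and rounded to one decimal place as the updated docs (`ee0f821ed2`) describe; Scylla's exact string formatting is not reproduced. It shows one way the inconsistency can arise: once output is rounded, the bounds must be inclusive, otherwise `99.99PERCENTILE` is written back as `100.0PERCENTILE` and a strict `< 100` check would reject it on re-parse.

```python
import re

# Hypothetical pattern: a non-negative decimal number followed by PERCENTILE.
_PERCENTILE_RE = re.compile(r"^(\d+(?:\.\d+)?)PERCENTILE$", re.IGNORECASE)


def parse_percentile(option):
    """Parse e.g. '99.99PERCENTILE', accept 0..100 inclusive, round to 0.1."""
    m = _PERCENTILE_RE.match(option)
    if not m:
        # Non-numeric or negative values fail here, before the range check.
        raise ValueError(f"invalid speculative_retry value: {option!r}")
    value = float(m.group(1))
    if not 0.0 <= value <= 100.0:
        raise ValueError(f"percentile out of range [0, 100]: {value}")
    return round(value, 1)


def format_percentile(value):
    """Format with one decimal so that parsing the result yields value again."""
    return f"{value:.1f}PERCENTILE"
```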
Karol Nowacki	b78c9ec5de	cql3: Make abstract_type explicitly noncopyable The polymorphic abstract_type class serves as an interface and should not be copied. To prevent accidental and unsafe copies, make it explicitly uncopyable. (cherry picked from commit `77da4517d2`)	2025-11-13 11:51:22 +01:00
Karol Nowacki	a8135cf239	cql3: Fix std::bad_cast when deserializing vectors of collections When deserializing a vector whose elements are collections (e.g., set, list), the operation raises a `std::bad_cast` exception. This was caused by type slicing due to an incorrect assignment of a polymorphic type by value instead of by reference. This resulted in a failed `dynamic_cast` even when the underlying type was correct. (cherry picked from commit `960fe3da60`)	2025-11-13 11:51:18 +01:00
Raphael S. Carvalho	bf359388b1	test: Add reproducer for l-a-s and split synchronization issue Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `4654cdc6fd`)	2025-11-12 22:16:41 -03:00
Raphael S. Carvalho	d3ce390e4d	sstables_loader: Synchronize tablet split and load-and-stream Load-and-stream is broken when running concurrently with the finalization step of tablet split. Consider this: 1) split starts 2) split finalization executes barrier and succeeds 3) load-and-stream runs now, starts writing sstable (pre-split) 4) split finalization publishes changes to tablet metadata 5) load-and-stream finishes writing sstable 6) sstable cannot be loaded since it spans two tablets. Two possible fixes (maybe both): 1) load-and-stream waits for the topology to quiesce 2) perform split compaction on sstable that spans both sibling tablets This patch implements option 1. By waiting for the topology to quiesce, we guarantee that load-and-stream only starts when there's no chance the coordinator is handling some topology operation like split finalization. Fixes #26455. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `3abc66da5a`)	2025-11-12 22:16:38 -03:00
Yaron Kaikov	9a3d5da553	auto-backport: Add support for JIRA issue references - Added support for JIRA issue references in PR body and commit messages - Supports both short format (PKG-92) and full URL format - Maintains existing GitHub issue reference support - JIRA pattern matches https://scylladb.atlassian.net/browse/{PROJECT-ID} - Allows backporting for PRs that reference JIRA issues with 'fixes' keyword Fixes: https://github.com/scylladb/scylladb/issues/26955 Closes scylladb/scylladb#26954 (cherry picked from commit `3ade3d8f5b`) Closes scylladb/scylladb#26965	2025-11-12 22:37:09 +02:00
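The kind of matching this commit describes (short keys such as `PKG-92` or full `https://scylladb.atlassian.net/browse/...` URLs, preceded by a `fixes` keyword) can be sketched with Python's `re` module. The pattern below is a hypothetical illustration, assuming project keys are uppercase letters and digits; it is not the regex the auto-backport script actually uses.

```python
import re

# Hypothetical JIRA key shape: uppercase project key, dash, issue number.
_JIRA_KEY = r"[A-Z][A-Z0-9]+-\d+"

# Case-insensitive 'fixes' keyword, optional colon, optional browse URL prefix.
_JIRA_FIXES = re.compile(
    r"\b(?i:fixes):?\s+(?:https://scylladb\.atlassian\.net/browse/)?("
    + _JIRA_KEY + r")\b"
)


def jira_fixes(text):
    """Return the JIRA keys referenced with a 'fixes' keyword in text."""
    return _JIRA_FIXES.findall(text)
```

Only the keyword is case-insensitive (via the scoped `(?i:...)` group), so lowercase words after `fixes` are not mistaken for project keys; GitHub references such as `#26955` are left to the existing handling.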
Botond Dénes	e6b721dfd6	service/storage_proxy: send batches with CL=EACH_QUORUM Batches that fail on the initial send are retried later, until they succeed. These retries happen with CL=ALL, regardless of what the original CL of the batch was. This is unnecessarily strict. We tried to follow Cassandra here, but Cassandra has a big caveat in their use of CL=ALL for batches. They accept saving just a hint for any/all of the endpoints, so a batch which was just logged in hints is good enough for them. We do not plan on replicating this usage of hints at this time, so as a middle ground, the CL is changed to EACH_QUORUM. Fixes: scylladb/scylladb#25432 Closes scylladb/scylladb#26304 (cherry picked from commit `d9c3772e20`) Closes scylladb/scylladb#26929	2025-11-11 10:38:11 +03:00
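The practical difference between CL=ALL and CL=EACH_QUORUM for a batch retry can be sketched in Python, assuming a simplified model in which each datacenter reports an acknowledgement count and a replication factor. The helper names are hypothetical; the quorum rule (a majority, `rf // 2 + 1`, per datacenter) is the standard EACH_QUORUM definition.

```python
def quorum(rf):
    """Majority of rf replicas."""
    return rf // 2 + 1


def each_quorum_met(acks_per_dc, rf_per_dc):
    """EACH_QUORUM: every datacenter must reach its own local quorum."""
    return all(acks_per_dc.get(dc, 0) >= quorum(rf)
               for dc, rf in rf_per_dc.items())


def all_met(acks_per_dc, rf_per_dc):
    """ALL: every replica in every datacenter must acknowledge."""
    return all(acks_per_dc.get(dc, 0) >= rf
               for dc, rf in rf_per_dc.items())
```

With RF=3 in two datacenters and one replica down in each, EACH_QUORUM still succeeds while ALL keeps failing, so a retried batch no longer waits for every replica to come back.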
Ran Regev	bd526cb341	nodetool refresh primary-replica-only Fixes: #26440 1. Added description to primary-replica-only option 2. Fixed the code text to better reflect the constraints checked in the code itself, namely that both primary-replica-only and scope can be applied only if load-and-stream is applied too, and that they are mutually exclusive. Note: when https://github.com/scylladb/scylladb/issues/26584 is implemented (with #26609) there will be a need to align the docs as well - namely, primary-replica-only and scope will no longer be mutually exclusive Signed-off-by: Ran Regev <ran.regev@scylladb.com> Closes scylladb/scylladb#26480 (cherry picked from commit `aaf53e9c42`) Closes scylladb/scylladb#26905	2025-11-11 10:37:58 +03:00
Piotr Dulikowski	aaeb937359	Merge '[Backport 2025.3] transport: call update_scheduling_group for non-auth connections' from Andrzej Jackowski This is a backport of the fix for https://github.com/scylladb/scylladb/issues/26040 and the related test (https://github.com/scylladb/scylladb/pull/26589) to 2025.3. Before this change, unauthenticated connections stayed in the main scheduling group. This is not ideal; in such a case sl:default should be used instead, for consistency with the scenario where a user is authenticated but has no service level assigned. This commit adds a call to update_scheduling_group at the end of connection creation for an unauthenticated user, to make sure the service level is switched to sl:default. Fixes: https://github.com/scylladb/scylladb/issues/26040 Fixes: https://github.com/scylladb/scylladb/issues/26581 (cherry picked from commit `278019c328`) (cherry picked from commit `8642629e8e`) No backport, as it's already a backport (but similar PRs will be created for 2025.4) Closes scylladb/scylladb#26814 * github.com:scylladb/scylladb: test: add test_anonymous_user to test_raft_service_levels transport: call update_scheduling_group for non-auth connections	2025-11-09 00:03:57 +01:00
Jenkins Promoter	508d06e264	Update ScyllaDB version to: 2025.3.4	2025-11-04 12:06:50 +02:00
Jenkins Promoter	a29329d418	Update pgo profiles - aarch64	2025-11-01 05:15:33 +02:00
Jenkins Promoter	2cb0354170	Update pgo profiles - x86_64	2025-11-01 04:55:45 +02:00
Andrzej Jackowski	8b15a6ee50	test: add test_anonymous_user to test_raft_service_levels The primary goal of this test is to reproduce scylladb/scylladb#26040 so the fix (`278019c328`) can be backported to older branches. Scenario: connect via CQL as an anonymous user and verify that the `sl:default` scheduling group is used. Before the fix for #26040 `main` scheduling group was incorrectly used instead of `sl:default`. Control connections may legitimately use `sl:driver`, so the test accepts those occurrences while still asserting that regular anonymous queries use `sl:default`. This adds explicit coverage on master. After scylladb#24411 was implemented, some other tests started to fail when scylladb#26040 was unfixed. However, none of the tests asserted this exact behavior. Refs: scylladb/scylladb#26040 Refs: scylladb/scylladb#26581 Closes scylladb/scylladb#26589 (cherry picked from commit `8642629e8e`)	2025-10-30 18:39:44 +01:00
Andrzej Jackowski	17f724f221	transport: call update_scheduling_group for non-auth connections Before this change, unauthenticated connections stayed in the `main` scheduling group. This is not ideal; in such a case `sl:default` should be used instead, for consistency with the scenario where a user is authenticated but has no service level assigned. This commit adds a call to `update_scheduling_group` at the end of connection creation for an unauthenticated user, to make sure the service level is switched to `sl:default`. Fixes: scylladb/scylladb#26040 (cherry picked from commit `278019c328`)	2025-10-30 18:38:43 +01:00
Pavel Emelyanov	5eb6da551f	Merge '[Backport 2025.3] db/config: Add SSTable compression options for user tables' from Scylladb[bot] ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size. The same default applies to system tables as well. This series introduces a new configuration option to allow customizing the default for user tables. It also adds some tests for the new functionality. Fixes #25195. - (cherry picked from commit `1106157756`) - (cherry picked from commit `ea41f652c4`) - (cherry picked from commit `a7e46974d4`) - (cherry picked from commit `e1d9c83406`) - (cherry picked from commit `8d5bd212ca`) - (cherry picked from commit `6ba0fa20ee`) - (cherry picked from commit `8410532fa0`) Parent PR: #26003 Closes scylladb/scylladb#26301 * github.com:scylladb/scylladb: test/cluster: Add tests for invalid SSTable compression options test/boost: Add tests for SSTable compression config options main: Validate SSTable compression options from config db/config: Add SSTable compression options for user tables db/config: Prepare compression_parameters for config system compressor: Validate presence of sstable_compression in parameters compressor: Add missing space in exception message	2025-10-30 10:31:16 +03:00
Pavel Emelyanov	0e6381f14d	lister: Fix race between readdir and stat Sometimes file::list_directory() returns entries without the type set. In this case the lister calls file_type() on the entry name to get it. If the call returns a disengaged type, the code assumes that some error occurred and resolves into an exception. That's not correct. The file_type() method returns a disengaged type only if the file being inspected is missing (i.e. on ENOENT errno). But this can validly happen if a file is removed between readdir and stat. In that case it's not "some error happened"; the entry should just be skipped. If some error did happen, file_type() would resolve into an exceptional future on its own. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26595 (cherry picked from commit `d9bfbeda9a`) Closes scylladb/scylladb#26764 scylla-2025.3.3-candidate-20251030111537 scylla-2025.3.3	2025-10-29 11:34:47 +02:00
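The fixed behaviour follows a familiar pattern: list the names, stat each one, treat "not found" as "removed in the meantime", and let every other error propagate. A Python sketch of that pattern, using `os.listdir` and `os.lstat` rather than Seastar's `file::list_directory()`/`file_type()`; the function name and the `"dir"`/`"file"`/`"other"` labels are hypothetical.

```python
import os
import stat


def list_entries_with_type(path):
    """List (name, kind) pairs, skipping entries deleted between readdir and stat."""
    result = []
    for name in os.listdir(path):  # readdir: names only, no types
        try:
            st = os.lstat(os.path.join(path, name))  # may race with unlink
        except FileNotFoundError:
            continue  # removed after readdir: not an error, just skip it
        # Any other OSError (e.g. PermissionError) propagates to the caller.
        if stat.S_ISDIR(st.st_mode):
            kind = "dir"
        elif stat.S_ISREG(st.st_mode):
            kind = "file"
        else:
            kind = "other"
        result.append((name, kind))
    return sorted(result)
```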


48641 Commits