scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-29 04:37:00 +00:00

Author	SHA1	Message	Date
Patryk Jędrzejczak	397a214cbd	test: test_raft_recovery_stuck: reconnect driver after rolling restarts It turns out that #21477 wasn't sufficient to fix the issue. The driver may still decide to reconnect the connection after `rolling_restart` returns. One possible explanation is that the driver sometimes handles the DOWN notification after all nodes consider each other UP. Reconnecting the driver after restarting nodes seems to be a reliable workaround that many tests use. We also use it here. Fixes #19959 Closes scylladb/scylladb#26638 (cherry picked from commit `5321720853`) Closes scylladb/scylladb#26755	2025-10-29 11:37:15 +02:00
Anna Stuchlik	b5a5828fff	doc: add support for Debian 12 Fixes https://github.com/scylladb/scylladb/issues/26640 Closes scylladb/scylladb#26668 (cherry picked from commit `9c0ff7c46b`) Closes scylladb/scylladb#26677	2025-10-29 11:36:03 +02:00
Patryk Jędrzejczak	ffd70630e5	Merge '[Backport 2025.2] raft topology: fix group0 tombstone GC in the Raft-based recovery procedure' from Scylladb[bot] Group0 tombstone GC considers only the current group 0 members while computing the group 0 tombstone GC time. It's not enough because in the Raft-based recovery procedure, there can be nodes that haven't joined the current group 0 yet, but they have belonged to a different group 0 and thus have a non-empty group 0 state ID. The current code can cause a data resurrection in group 0 tables. We fix this issue in this PR and add a regression test. This issue was uncovered by `test_raft_recovery_entry_loss`, which became flaky recently. We skipped this test for now. We will unskip it in a following PR because it's skipped only on master, while we want to backport this PR. Fixes #26534 This PR contains an important bugfix, so we should backport it to all branches with the Raft-based recovery procedure (2025.2 and newer). - (cherry picked from commit `1d09b9c8d0`) - (cherry picked from commit `6b2e003994`) - (cherry picked from commit `c57f097630`) Parent PR: #26612 Closes scylladb/scylladb#26678 * https://github.com/scylladb/scylladb: test: test group0 tombstone GC in the Raft-based recovery procedure group0_state_id_handler: remove unused group0_server_accessor group0_state_id_handler: consider state IDs of all non-ignored topology members	2025-10-27 10:22:42 +01:00
Patryk Jędrzejczak	f2535f2c5e	test: test group0 tombstone GC in the Raft-based recovery procedure We add a regression test for the bug fixed in the previous commits. (cherry picked from commit `c57f097630`)	2025-10-24 13:02:27 +02:00
Patryk Jędrzejczak	42271b6c41	group0_state_id_handler: remove unused group0_server_accessor It became unused in the previous commit. (cherry picked from commit `6b2e003994`)	2025-10-22 17:12:20 +00:00
Patryk Jędrzejczak	3a662cf68d	group0_state_id_handler: consider state IDs of all non-ignored topology members It's not enough to consider only the current group 0 members. In the Raft-based recovery procedure, there can be nodes that haven't joined the current group 0 yet, but they have belonged to a different group 0 and thus have a non-empty group 0 state ID. We fix this issue in this commit by considering topology members instead. We don't consider ignored nodes as an optimization. When some nodes are dead, the group 0 state ID handler won't have to wait until all these nodes leave the cluster. It will only have to wait until all these nodes are ignored, which happens at the beginning of the first removenode/replace. As a result, tombstones of group 0 tables will be purged much sooner. We don't rename the `group0_members` variable to keep the change minimal. There seems to be no precise and succinct name for the used set of nodes anyway. We use `std::ranges::join_view` in one place because: - `std::ranges::concat` will become available in C++26, - `boost::range::join` is not a good option, as there is an ongoing effort to minimize external dependencies in Scylla. (cherry picked from commit `1d09b9c8d0`)	2025-10-22 17:12:20 +00:00
Asias He	57bea9bc47	repair: Fix uuid and nodes_down order in the log Fixes #26536 Closes scylladb/scylladb#26547 (cherry picked from commit `33bc1669c4`) Closes scylladb/scylladb#26628	2025-10-22 11:29:24 +03:00
Botond Dénes	0299d7bc46	Merge '[Backport 2025.2] db/config: Add SSTable compression options for user tables' from Scylladb[bot] ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size. The same default applies to system tables as well. This series introduces a new configuration option to allow customizing the default for user tables. It also adds some tests for the new functionality. Fixes #25195. - (cherry picked from commit `1106157756`) - (cherry picked from commit `ea41f652c4`) - (cherry picked from commit `a7e46974d4`) - (cherry picked from commit `e1d9c83406`) - (cherry picked from commit `8d5bd212ca`) - (cherry picked from commit `6ba0fa20ee`) - (cherry picked from commit `8410532fa0`) Parent PR: #26003 Closes scylladb/scylladb#26300 * github.com:scylladb/scylladb: test/cluster: Add tests for invalid SSTable compression options test/boost: Add tests for SSTable compression config options main: Validate SSTable compression options from config db/config: Add SSTable compression options for user tables db/config: Prepare compression_parameters for config system compressor: Validate presence of sstable_compression in parameters compressor: Add missing space in exception message	2025-10-20 10:43:08 +03:00
Botond Dénes	5880893227	Merge '[Backport 2025.2] raft topology: disable schema pulls in the Raft-based recovery procedure' from Scylladb[bot] Schema pulls should always be disabled when group 0 is used. However, `migration_manager::disable_schema_pulls()` is never called during a restart with `recovery_leader` set in the Raft-based recovery procedure, which causes schema pulls to be re-enabled on all live nodes (excluding the nodes replacing the dead nodes). Moreover, schema pulls remain enabled on each node until the node is restarted, which could be a very long time. We fix this issue and add a regression test in this PR. Fixes #26569 This is an important bug fix, so it should be backported to all branches with the Raft-based recovery procedure (2025.2 and newer branches). - (cherry picked from commit `ec3a35303d`) - (cherry picked from commit `da8748e2b1`) - (cherry picked from commit `71de01cd41`) Parent PR: #26572 Closes scylladb/scylladb#26596 * github.com:scylladb/scylladb: test: test_raft_recovery_entry_loss: fix the typo in the test case name test: verify that schema pulls are disabled in the Raft-based recovery procedure raft topology: disable schema pulls in the Raft-based recovery procedure	2025-10-20 10:42:36 +03:00
Nikos Dragazis	d995abfe0b	test/cluster: Add tests for invalid SSTable compression options Complementary to the previous patch. It triggers semantic validation checks in `compression_parameters::validate()` and expects the server to exit. The tests examine both command line and YAML options. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `8410532fa0`)	2025-10-20 00:00:03 +03:00
Nikos Dragazis	979925e822	test/boost: Add tests for SSTable compression config options Since patch `03461d6a54`, all boost unit tests depending on `cql_test_env` are compiled into a single executable (`combined_tests`). Add the new test in there. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `6ba0fa20ee`)	2025-10-20 00:00:02 +03:00
Nikos Dragazis	2acbc62d9d	main: Validate SSTable compression options from config `compression_parameters` provides two levels of validation: * syntactic checks - implemented in the constructor * semantic checks - implemented by `compression_parameters::validate()` The former are applied implicitly when parsing the options from the command line or from scylla.yaml. The latter are currently not applied, but they should be. For lack of a better place, apply them in main, right after joining the cluster, to make sure that the cluster features have been negotiated. The feature needed here is `SSTABLE_COMPRESSION_DICTS`. Validation will fail if the feature is disabled and a dictionary compression algorithm has been selected. Also, mark `validate()` as const so that it can be called from a config object. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `8d5bd212ca`)	2025-10-20 00:00:02 +03:00
Nikos Dragazis	5321da2c0b	db/config: Add SSTable compression options for user tables ScyllaDB offers the `compression` DDL property for configuring compression per user table (compression algorithm and chunk size). If not specified, the default compression algorithm is the LZ4Compressor with a 4KiB chunk size (refer to the default constructor for `compression_parameters`). The same default applies to system tables as well. Add a new configuration option to allow customizing the default for user tables. Use the previously hardcoded default as the new option's default value. Note that the option has no effect on ALTER TABLE statements. An altered table either inherits explicit compression options from the CQL statement, or maintains its existing options. Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `e1d9c83406`)	2025-10-20 00:00:02 +03:00
Nikos Dragazis	f23d4b1f49	db/config: Prepare compression_parameters for config system SSTable compression is currently configurable only per table, via the `compression` property in CREATE/ALTER TABLE statements. This is represented internally via the `compression_parameters` class. We plan to offer the same options via the configuration as well, to make the default compression method for user tables configurable. This patch prepares the ground by making the `compression_parameters` usable as a `config_file::named_value`, namely: * Define an extraction operator (required by `boost::program_options` for parsing the options from command line). * Define a formatter (required by `named_value::operator()`). * Define a template specialization for `config_type_for` (required by `named_value` constructor). * Define a yaml converter (required for parsing the options from scylla.yaml). Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `a7e46974d4`)	2025-10-19 23:59:26 +03:00
Patryk Jędrzejczak	b6a0a1290d	test: test_raft_recovery_entry_loss: fix the typo in the test case name (cherry picked from commit `71de01cd41`)	2025-10-17 10:26:22 +00:00
Patryk Jędrzejczak	26ed158cb0	test: verify that schema pulls are disabled in the Raft-based recovery procedure We do this at the end of `test_raft_recovery_entry_loss`. It's not worth adding a separate regression test, as tests of the recovery procedure are complicated and have a long running time. Also, we choose `test_raft_recovery_entry_loss` out of all tests of the recovery procedure because it does some schema changes. (cherry picked from commit `da8748e2b1`)	2025-10-17 10:26:22 +00:00
Patryk Jędrzejczak	4460b9e6fb	raft topology: disable schema pulls in the Raft-based recovery procedure Schema pulls should always be disabled when group 0 is used. However, `migration_manager::disable_schema_pulls()` is never called during a restart with `recovery_leader` set in the Raft-based recovery procedure, which causes schema pulls to be re-enabled on all live nodes (excluding the nodes replacing the dead nodes). Moreover, schema pulls remain enabled on each node until the node is restarted, which could be a very long time. The old gossip-based recovery procedure doesn't have this problem because we disable schema pulls after completing the upgrade-to-group0 procedure, which is a part of the old recovery procedure. Fixes #26569 (cherry picked from commit `ec3a35303d`)	2025-10-17 10:26:22 +00:00
Michał Chojnowski	fc8cc87fc2	test/boost/sstable_compressor_factory_test: fix thread-unsafe usage of Boost.Test It turns out that Boost assertions are thread-unsafe, (and can't be used from multiple threads concurrently). This causes the test to fail with cryptic log corruptions sometimes. Fix that by switching to thread-safe checks. Fixes scylladb/scylladb#24982 Closes scylladb/scylladb#26472 (cherry picked from commit `7c6e84e2ec`) Closes scylladb/scylladb#26550	2025-10-15 12:15:48 +03:00
Jenkins Promoter	de3c316e7d	Update pgo profiles - aarch64	2025-10-15 04:50:20 +03:00
Jenkins Promoter	71e2d7ae24	Update pgo profiles - x86_64	2025-10-15 04:28:19 +03:00
Michał Chojnowski	e585d6cb3b	test_sstable_compression_dictionaries_basic: reconnect robustly after node reboots Using `driver_connect()` after a cluster restart isn't enough to ensure full CQL availability, but the test assumes that it is. Fix that by making the test wait for CQL availability via `get_ready_cql()`. Also, replace some manual usages of wait_for_cql_and_get_hosts with `get_ready_cql()` too. Fixes scylladb/scylladb#25362 Closes scylladb/scylladb#25366 (cherry picked from commit `85fd4d23fa`) Closes scylladb/scylladb#26513	2025-10-12 21:09:18 +03:00
Michał Chojnowski	19874119e5	docs: fix a parameter name in API calls in sstable-dictionary-compression.rst The correct argument name is `cf`, not `table`. Fixes scylladb/scylladb#25275 Closes scylladb/scylladb#26447 (cherry picked from commit `87e3027c81`) Closes scylladb/scylladb#26493	2025-10-10 10:12:57 +03:00
Pavel Emelyanov	ce740b9e45	Merge '[Backport 2025.2] service/qos: set long timeout for auth queries on SL cache update' from Scylladb[bot] pass an appropriate query state for auth queries called from service level cache reload. we use the function qos_query_state to select a query_state based on caller context - for internal queries, we set a very long timeout. the service level cache reload is called from group0 reload. we want it to have a long timeout instead of the default 5 seconds for auth queries, because we don't have strict latency requirement on the one hand, and on the other hand a timeout exception is undesired in the group0 reload logic and can break group0 on the node. Fixes https://github.com/scylladb/scylladb/issues/25290 backport possible to improve stability - (cherry picked from commit `a1161c156f`) - (cherry picked from commit `3c3dd4cf9d`) - (cherry picked from commit `ad1a5b7e42`) Parent PR: #26180 Closes scylladb/scylladb#26477 * github.com:scylladb/scylladb: service/qos: set long timeout for auth queries on SL cache update auth: add query_state parameter to query functions auth: refactor query_all_directly_granted	2025-10-10 10:12:35 +03:00
Patryk Jędrzejczak	5e2ef4f3d8	raft topology: make the voter handler consider only group 0 members In the Raft-based recovery procedure, we create a new group 0 and add live nodes to it one by one. This means that for some time there are nodes which belong to the topology, but not to the new group 0. The voter handler running on the recovery leader incorrectly considers these nodes while choosing voters. The consequences: - misleading logs, for example, "making servers {<ID of a non-member>} voters", where the non-member won't become a voter anyway, - increased chance of majority loss during the recovery procedure, for example, all 3 nodes that first joined the new group 0 are in the same dc and rack, but only one of them becomes a voter because the voter handler tries to make non-members in other dcs/racks voters. Fixes #26321 Closes scylladb/scylladb#26327 (cherry picked from commit `67d48a459f`) Closes scylladb/scylladb#26426	2025-10-09 18:22:13 +02:00
Michael Litvak	9d9c94bf47	service/qos: set long timeout for auth queries on SL cache update pass an appropriate query state for auth queries called from service level cache reload. we use the function qos_query_state to select a query_state based on caller context - for internal queries, we set a very long timeout. the service level cache reload is called from group0 reload. we want it to have a long timeout instead of the default 5 seconds for auth queries, because we don't have strict latency requirement on the one hand, and on the other hand a timeout exception is undesired in the group0 reload logic and can break group0 on the node. Fixes scylladb/scylladb#25290 (cherry picked from commit `ad1a5b7e42`)	2025-10-09 12:47:31 +00:00
Michael Litvak	4d54e98304	auth: add query_state parameter to query functions add a query_state parameter to several auth functions that execute internal queries. currently the queries use the internal_distributed_query_state() query state, and we maintain this as default, but we want also to be able to pass a query state from the caller. in particular, the auth queries currently use a timeout of 5 seconds, and we will want to set a different timeout when executed in some different context. (cherry picked from commit `3c3dd4cf9d`)	2025-10-09 12:47:31 +00:00
Michael Litvak	28349a442f	auth: refactor query_all_directly_granted rewrite query_all_directly_granted to use execute_internal instead of query_internal in a style that is more consistent with the rest of the module. This will also be useful for a later change because execute_internal accepts an additional parameter of query_state. (cherry picked from commit `a1161c156f`)	2025-10-09 12:47:30 +00:00
Raphael S. Carvalho	40bad7524f	replica: Fix race between drop table and merge completion handling Consider this: 1) merge finishes, wakes up fiber to merge compaction groups 2) drop table happens, which in turn invokes truncate underneath 3) merge fiber stops old groups 4) truncate disables compaction on all groups, but the ones stopped 5) truncate performs a check that compaction has been disabled on all groups, including the ones stopped 6) the check fails because groups being stopped didn't have compaction explicitly disabled on them To fix it, the check on step 6 will ignore groups that have been stopped, since those are not eligible for having compaction explicitly disabled on them. The compaction check is there, so ongoing compaction will not propagate data being truncated, but here it happens in the context of drop table which doesn't leave anything behind. Also, a group stopped is somewhat equivalent to compaction disabled on it, since the procedure to stop a group stops all ongoing compaction and eventually removes its state from compaction manager. Fixes #25551. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> Closes scylladb/scylladb#25563 (cherry picked from commit `149f9d8448`) Closes scylladb/scylladb#25631	2025-10-08 09:40:44 +03:00
Botond Dénes	e18023f814	Merge '[Backport 2025.2] tools: fix documentation links after change to source-available' from Scylladb[bot] Some tools commands have links to online documentation in their help output. These links were left behind in the source-available change, they still point to the old opensource docs. Furthermore, the links in the scylla-sstable help output always point to the latest stable release's documentation, instead of the appropriate one for the branch the tool was built from. Fix both of these. Fixes: scylladb/scylladb#26320 Broken documentation link fix for the tool help output, needs backport to all live source-available versions. - (cherry picked from commit `5a69838d06`) - (cherry picked from commit `15a4a9936b`) - (cherry picked from commit `fe73c90df9`) Parent PR: #26322 Closes scylladb/scylladb#26387 * github.com:scylladb/scylladb: tools/scylla-sstable: fix doc links release: adjust doc_link() for the post source-available world tools/scylla-nodetool: remove trailing " from doc urls	2025-10-08 06:38:01 +03:00
Botond Dénes	e90ae84b00	tools/scylla-sstable: fix doc links The doc links in scylla-sstable help output are static, so they always point to the documentation of the latest stable release, not to the documentation of the release the tool binary is from. On top of that, the links point to old open-source documentation, which is now EOL. Fix both problems: point link at the new source-available documentation pages and make them version aware. (cherry picked from commit `fe73c90df9`)	2025-10-07 10:16:46 +03:00
Botond Dénes	fa3f9c08ea	release: adjust doc_link() for the post source-available world There is no more separate enterprise product and the doc urls are slightly different. (cherry picked from commit `15a4a9936b`)	2025-10-07 10:16:27 +03:00
Botond Dénes	926042f29f	tools/scylla-nodetool: remove trailing " from doc urls They are accidental leftover from a previous way of storing command descriptions. (cherry picked from commit `5a69838d06`)	2025-10-07 10:16:27 +03:00
Jenkins Promoter	00b673ac24	Update ScyllaDB version to: 2025.2.4	2025-10-05 16:34:26 +03:00
Benny Halevy	2c6cfad7e4	test_tablets_merge: test_tablet_split_merge_with_many_tables: reduce number of tables in debug mode As the test hits timeouts in debug mode on aarch64. Fixes #26252 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#26303 (cherry picked from commit `b81c6a339b`) Closes scylladb/scylladb#26324	2025-10-01 14:12:58 +03:00
Asias He	c575fb85fe	repair: Always reset node ops progress to 100% upon completion Always set the node ops progress to 100% when the operation finishes, regardless of success or failure. This ensures the progress never remains below 100%, which would otherwise indicate a pending node operation in case of an error. Fixes #26193 Closes scylladb/scylladb#26194 (cherry picked from commit `b31e651657`) Closes scylladb/scylladb#26265	2025-10-01 14:09:47 +03:00
Botond Dénes	d6e9844241	Merge '[Backport 2025.2] scylla-gdb: Fix fair-queue entry printing' from Scylladb[bot] Catching a live entry in the IO queue is a very rare event, so we haven't seen it so far, but the `_ticket` member was removed ~2 years ago and replaced with `_capacity`, which is a plain 64-bit integer. Fixes #26184 The issue is present in 2025.x as well and looks cheap to backport - (cherry picked from commit `8438c59ad3`) Parent PR: #26185 Also includes a backport of #24835, which also applies to 2025.2 and is now crucial. The scylla_io_queues.ticket() method is renamed by this backport, but without #24835 it would be problematic to fix all of its callers Closes scylladb/scylladb#26263 * github.com:scylladb/scylladb: scylla-gdb: Fix fair-queue entry printing scylla-gdb: Don't show io_queue executing and queued resources	2025-10-01 14:09:21 +03:00
Botond Dénes	bbf9ac6252	Merge '[Backport 2025.2] compaction: ensure that all compaction executors are stopped' from Scylladb[bot] Currently, while stopping the compaction_manager, we stop task_manager compaction module and concurrently run compaction_manager::really_do_stop. really_do_stop stops and waits for all task_executors that are kept in compaction_manager::_tasks, but nothing ensures that no more tasks will be added there. Due to leftover tasks, we trigger on_fatal_internal_error. Modify the order of compaction_manager::stop. After the change, we stop compaction tasks in the following order: - abort module abort source; - close module gate in the background; - stop_ongoing_compactions (kept in compaction_manager::_tasks); - wait until module gate is closed. Check module abort source before creating compaction executor and adding it to _tasks. Thanks to the above, we can be sure that: - after module::stop there will be no tasks in _tasks; - compaction_manager::stop aborts all tasks; we don't wait for any whole compaction to finish. Fixes: https://github.com/scylladb/scylladb/issues/25806. Fixes shutdown bug; Needs backports to all version - (cherry picked from commit `17707d0e6b`) - (cherry picked from commit `97c77d7cd5`) Parent PR: #25885 Closes scylladb/scylladb#26223 * github.com:scylladb/scylladb: compaction: move _tasks check compaction: stop compaction module in really_do_stop	2025-10-01 14:08:45 +03:00
Jenkins Promoter	7d41ff11c9	Update pgo profiles - aarch64	2025-10-01 04:46:21 +03:00
Jenkins Promoter	7daded2360	Update pgo profiles - x86_64	2025-10-01 04:27:01 +03:00
Tomasz Grabiec	d3d07331b3	Merge '[Backport 2025.2] replica: Fix split compaction when tablet boundaries change' from Scylladb[bot] Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) balancer emits merge decision 5) merge completes before the compaction from step 2 finishes After the last step, the split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map as it was when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where the split initiated by the sstable writer failed, the split completed, and the unsplit sstable was left in the table dir, causing problems on restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. All 2025.* versions are vulnerable, so the fix must be backported to them. - (cherry picked from commit `0c1587473c`) - (cherry picked from commit `68f23d54d8`) Parent PR: #25690 Closes scylladb/scylladb#25934 * github.com:scylladb/scylladb: replica: Fix split compaction when tablet boundaries change replica: Futurize split_compaction_options()	2025-09-30 19:50:50 +02:00
Pavel Emelyanov	a597a93d5e	scylla-gdb: Fix fair-queue entry printing Catching a live entry in the IO queue is a very rare event, so we haven't seen it so far, but the `_ticket` member was removed ~2 years ago and replaced with `_capacity`, which is a plain 64-bit integer. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#26185 (cherry picked from commit `8438c59ad3`)	2025-09-30 11:27:07 +03:00
Pavel Emelyanov	d2d22170b4	scylla-gdb: Don't show io_queue executing and queued resources These counters are no longer accounted by io-queue code and are always zero. Even more -- accounting removal happened years ago and we don't have Scylla versions built with seastar older than that. Signed-off-by: Pavel Emelyanov <xemul@scylladb.com> Closes scylladb/scylladb#24835	2025-09-30 11:27:07 +03:00
Raphael S. Carvalho	65e78d6336	replica: Fix split compaction when tablet boundaries change Consider the following: 1) balancer emits split decision 2) split compaction starts 3) split decision is revoked 4) balancer emits merge decision 5) merge completes before the compaction from step 2 finishes After the last step, the split compaction initiated in step 2 can fail because it works with the global tablet map, rather than the map as it was when the compaction started. With the global state changing under its feet, on merge, the mutation splitting writer will think it's going backwards since sibling tablets are merged. This problem was also seen when running load-and-stream, where the split initiated by the sstable writer failed, the split completed, and the unsplit sstable was left in the table dir, causing problems on restart. To fix this, let's make split compaction always work with the state when it started, not a global state. Fixes #24153. Signed-off-by: Raphael S. Carvalho <raphaelsc@scylladb.com> (cherry picked from commit `68f23d54d8`)	2025-09-29 20:26:36 -03:00
Nikos Dragazis	dc37af8205	compressor: Validate presence of sstable_compression in parameters SSTable compression parameters should always define an algorithm via the `sstable_compression` sub-option. Add a check in the constructor to ensure this is always provided (unless no options are given, which is interpreted as "no compression"). This change has no user-visible effect, since the same check is already performed at a higher-level, while validating the CQL properties of CREATE TABLE and ALTER TABLE statements (see `cf_prop_defs::validate()`). However, it will become useful in later patches, when compression config options will be introduced. Although now redundant, keep the sanity check in `cf_prop_defs::validate()` to maintain consistency of error messages with Cassandra. Note also that Cassandra uses 'class' instead of 'sstable_compression' since version 3.11.10, but Scylla still doesn't support this, see: https://github.com/scylladb/scylladb/issues/4200 Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `ea41f652c4`)	2025-09-28 20:00:56 +00:00
Nikos Dragazis	b957b4aac4	compressor: Add missing space in exception message Signed-off-by: Nikos Dragazis <nikolaos.dragazis@scylladb.com> (cherry picked from commit `1106157756`)	2025-09-28 20:00:56 +00:00
Ferenc Szili	31ac99493f	docs: add description of number of tablets computed by tablet allocator This change adds the documentation section which explains the algorithm to compute the absolute number of tablets a table has. Fixes: #25740 Closes scylladb/scylladb#25741 (cherry picked from commit `d462dc8839`) Closes scylladb/scylladb#26261	2025-09-28 20:29:39 +03:00
Aleksandra Martyniuk	bb548aae54	test: fix test_two_tablets_concurrent_repair_and_migration_repair_writer_level test_two_tablets_concurrent_repair_and_migration_repair_writer_level waits for the first node that logs info about repair_writer using asyncio.wait. The done group is never awaited, so we never learn about the error. The test itself is incorrect and the log about repair_writer is never printed. We never learn about that, and the test finishes successfully after a 10-minute timeout. Fix the test: - disable hinted handoff; - repair tablets of the whole table: - new table is added so that concurrent migration is possible; - use wait_for_first_completed that awaits done group; - do some cleanups. Remove nightly mark. Fixes: #26148. Closes scylladb/scylladb#26209 (cherry picked from commit `48bbe09c8b`) Closes scylladb/scylladb#26219	2025-09-27 17:26:16 +03:00
Gleb Natapov	fab7024fba	storage_service: change node_ops_info::ignore_nodes to host id It drops a useless translation from ID to IP during removenode through the topology coordinator. Closes scylladb/scylladb#25958 (cherry picked from commit `d3badf7406`) Closes scylladb/scylladb#26250	2025-09-26 10:55:34 +02:00
Aleksandra Martyniuk	163e5be3f7	compaction: move _tasks check In compaction_manager::really_do_stop we check whether _tasks list is empty after the compactions are stopped. However, a new task may still sneak in, causing the assertion failure. Such a task won't be there for long - module::make_task will fail as the module is already stopped. Move the assertion, that checks if _tasks is empty, after the compaction_states' gates are closed. Fixes: #25806. (cherry picked from commit `97c77d7cd5`)	2025-09-25 16:01:43 +02:00
Aleksandra Martyniuk	5cfd052e1b	compaction: stop compaction module in really_do_stop Currently, compaction::task_manager_module is stopped in compaction_manager::stop, concurrently with really_do_stop. We can't predict the order of the two. Do not set _task_manager_module to nullptr at stop, because compaction_manager::really_do_stop() may be called before the actual shutdown, while other components still try to use it. compaction::task_manager_module does not keep a pointer to compaction_manager, so we won't end up with a memory leak. Stop the compaction module in really_do_stop, after ongoing compactions are stopped. It's a preparation for further patches. (cherry picked from commit `17707d0e6b`)	2025-09-25 16:01:39 +02:00
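
The pitfall described in commit bb548aae54 (tasks in the `done` set returned by `asyncio.wait` are never awaited, so their exceptions go unnoticed) can be illustrated with a minimal sketch. All function names below are hypothetical and chosen for illustration; in particular, `first_completed_checked` is not the project's `wait_for_first_completed` helper, only an assumption about the general shape such a helper might take.

```python
import asyncio


async def failing_waiter():
    # Hypothetical stand-in for a log watcher that ends with an error.
    raise RuntimeError("expected log line never appeared")


async def slow_waiter():
    # Hypothetical stand-in for a watcher on another node that is still waiting.
    await asyncio.sleep(10)


async def _wait_first(coros):
    # Run all coroutines as tasks, return once the first one finishes,
    # and cancel the rest so nothing is left running.
    tasks = [asyncio.create_task(c) for c in coros]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()
    await asyncio.gather(*pending, return_exceptions=True)
    return done


async def first_completed_unchecked(coros):
    # Pitfall: the finished tasks are returned but never awaited,
    # so an exception stored in them is silently ignored.
    return await _wait_first(coros)


async def first_completed_checked(coros):
    # Awaiting each finished task re-raises any exception it stored.
    done = await _wait_first(coros)
    for task in done:
        await task
    return done


if __name__ == "__main__":
    asyncio.run(first_completed_unchecked([failing_waiter(), slow_waiter()]))
    print("unchecked variant returned without reporting the failure")
    try:
        asyncio.run(first_completed_checked([failing_waiter(), slow_waiter()]))
    except RuntimeError as exc:
        print(f"checked variant surfaced the failure: {exc}")
```

The only difference between the two variants is the `await task` loop over the `done` set; without it the caller has no way to tell a successful watcher from a failed one.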

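The race fixed in commit 40bad7524f (truncate checks that compaction is disabled on all groups, including groups the merge fiber has already stopped) can be modelled in a few lines. This is a simplified Python model under stated assumptions, not the C++ code from the replica layer; the `Group` class and all function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Group:
    # Hypothetical model of a compaction group.
    name: str
    stopped: bool = False
    compaction_disabled: bool = False


def disable_compaction(groups):
    # Step 4 of the commit message: compaction is disabled on all groups
    # except the ones that were already stopped.
    for group in groups:
        if not group.stopped:
            group.compaction_disabled = True


def check_before_fix(groups):
    # Steps 5-6: the check also covers stopped groups, so it fails.
    return all(group.compaction_disabled for group in groups)


def check_after_fix(groups):
    # The fix: stopped groups are ignored, since they are not eligible
    # for having compaction explicitly disabled on them.
    return all(group.compaction_disabled for group in groups if not group.stopped)


if __name__ == "__main__":
    groups = [Group("old-left", stopped=True), Group("old-right", stopped=True), Group("merged")]
    disable_compaction(groups)
    print("check before fix passes:", check_before_fix(groups))
    print("check after fix passes:", check_after_fix(groups))
```

As the commit message notes, skipping stopped groups is safe in this context because stopping a group already stops its ongoing compaction.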

48140 Commits