scylladb

mirror of https://github.com/scylladb/scylladb.git synced 2026-04-25 02:50:33 +00:00

Author	SHA1	Message	Date
Botond Dénes	6091e81e18	reader_concurrency_semaphore: register_inactive_read(): handle aborted permit It is possible that the permit handed in to register_inactive_read() is already aborted (currently only possible if permit timed out). If the permit also happens to have wait for memory, the current code will attempt to call promise<>::set_exception() on the permit's promise to abort its waiters. But if the permit was already aborted via timeout, this promise will already have an exception and this will trigger an assert. Add a separate case for checking if the permit is aborted already. If so, treat it as immediate eviction: close the reader and clean up. Fixes: scylladb/scylladb#22919 (cherry picked from commit `7ba29ec46c`)	2025-04-11 04:04:42 -04:00
Botond Dénes	742ff76d4d	test/boost/reader_concurrency_semaphore_test: move away from db::timeout_clock::now() Unless the test in question actually wants to test timeouts. Timeouts will have more pronounced consequences soon and thus using db::timeout_clock::now() becomes a sure way to make tests flaky. To avoid this, use db::no_timeout in the tests that don't care about timeouts. (cherry picked from commit `4d8eb02b8d`)	2025-04-11 04:04:42 -04:00
Avi Kivity	11993efc8c	Merge '[Backport 6.2] row_cache: don't garbage-collect tombstones which cover data in memtables' from Scylladb[bot] The row cache can garbage-collect tombstones in two places: 1) When populating the cache - the underlying reader pipeline has a `compacting_reader` in it; 2) During reads - reads now compact data including garbage collection; In both cases, garbage collection has to do overlap checks against memtables, to avoid collecting tombstones which cover data in the memtables. This PR includes fixes for (2), which was not handled at all until now. (1) was already supposed to be fixed, see https://github.com/scylladb/scylladb/issues/20916. But the test added in this PR showed that the test is incomplete: https://github.com/scylladb/scylladb/issues/23291. A fix for this issue is also included. Fixes: https://github.com/scylladb/scylladb/issues/23291 Fixes: https://github.com/scylladb/scylladb/issues/23252 The fix will need backport to all live releases. - (cherry picked from commit `c2518cdf1a`) - (cherry picked from commit `6b5b563ef7`) - (cherry picked from commit `7e600a0747`) - (cherry picked from commit `d126ea09ba`) - (cherry picked from commit `cb76cafb60`) - (cherry picked from commit `df09b3f970`) - (cherry picked from commit `e5afd9b5fb`) - (cherry picked from commit `34b18d7ef4`) - (cherry picked from commit `f7938e3f8b`) - (cherry picked from commit `6c1f6427b3`) - (cherry picked from commit `0d39091df2`) Parent PR: #23255 Closes scylladb/scylladb#23671 * github.com:scylladb/scylladb: test/boost/row_cache_test: add memtable overlap check tests replica/table: add error injection to memtable post-flush phase utils/error_injection: add a way to set parameters from error injection points test/cluster: add test_data_resurrection_in_memtable.py test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts replica/mutation_dump: don't assume cells are live replica/database: do_apply() add error injection point replica: improve memtable overlap checks for the cache replica/memtable: add is_merging_to_cache() db/row_cache: add overlap-check for cache tombstone garbage collection mutation/mutation_compactor: copy key passed-in to consume_new_partition()	2025-04-10 21:41:52 +03:00
Botond Dénes	5fb8a6dae2	mutation/frozen_mutation: frozen_mutation_consumer_adaptor: fix end-of-partition handling This adaptor adapts a mutation reader pausable consumer to the frozen mutation visitor interface. The pausable consumer protocol allows the consumer to skip the remaining parts of the partition and resume the consumption with the next one. To do this, the consumer just has to return stop_iteration::yes from one of the consume() overloads for clustering elements, then return stop_iteration::no from consume_end_of_partition(). Due to a bug in the adaptor, this sequence leads to terminating the consumption completely -- so any remaining partitions are also skipped. This protocol implementation bug has user-visible effects when the only user of the adaptor -- read repair -- happens during a query which has limitations on the amount of content in each partition. There are two such queries: select distinct ... and select ... with partition limit. When converting the repaired mutation to a query result, these queries will trigger the skip sequence in the consumer and, due to the above described bug, will skip the remaining partitions in the results, omitting these from the final query result. This patch fixes the protocol bug, the return value of the underlying consumer's consume_end_of_partition() is now respected. A unit test is also added which reproduces the problem both with select distinct ... and select ... per partition limit. Follow-up work: * frozen_mutation_consumer_adaptor::on_end_of_partition() calls the underlying consumer's on_end_of_stream(), so when consuming multiple frozen mutations, the underlying's on_end_of_stream() is called for each partition. This is incorrect but benign. * Improve documentation of mutation_reader::consume_pausable(). Fixes: #20084 Closes scylladb/scylladb#23657 (cherry picked from commit `d67202972a`) Closes scylladb/scylladb#23693	2025-04-10 21:38:13 +03:00
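The skip-and-resume protocol described in commit 5fb8a6dae2 can be illustrated with a simplified model. This is only a sketch under stated assumptions: `per_partition_limit_consumer` and `consume_all` are hypothetical names, rows are plain integers, and none of this is the actual ScyllaDB adaptor code. It shows a driver that treats `stop_iteration::yes` from `consume()` as "skip the rest of this partition" and lets only `consume_end_of_partition()` decide whether the whole stream ends.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class stop_iteration { no, yes };

// Hypothetical consumer that keeps at most `limit` rows from each partition.
struct per_partition_limit_consumer {
    std::size_t limit;
    std::size_t rows_in_partition = 0;
    std::vector<int> kept;

    void consume_new_partition() { rows_in_partition = 0; }

    // Returning yes asks the driver to skip the rest of this partition.
    stop_iteration consume(int row) {
        kept.push_back(row);
        return ++rows_in_partition >= limit ? stop_iteration::yes : stop_iteration::no;
    }

    // Returning no asks the driver to continue with the next partition.
    stop_iteration consume_end_of_partition() { return stop_iteration::no; }
};

// Simplified driver: the decision to stop the whole stream comes only from
// consume_end_of_partition(), never from the per-row return value.
std::vector<int> consume_all(const std::vector<std::vector<int>>& partitions, std::size_t limit) {
    assert(limit >= 1); // the model assumes a positive per-partition limit
    per_partition_limit_consumer consumer{limit};
    for (const auto& partition : partitions) {
        consumer.consume_new_partition();
        for (int row : partition) {
            if (consumer.consume(row) == stop_iteration::yes) {
                break; // skip only the remainder of this partition
            }
        }
        if (consumer.consume_end_of_partition() == stop_iteration::yes) {
            break; // the consumer asked to end the stream
        }
    }
    return consumer.kept;
}
```

In this model, a driver that ended the stream as soon as `consume()` returned `stop_iteration::yes` would drop every partition after the first one that hit its limit, which matches the symptom the commit message describes.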
Botond Dénes	a157b3e62f	test/boost/row_cache_test: add memtable overlap check tests Similar to test/cluster/test_data_resurrection_in_memtable.py but works on a single node and uses more low-level mechanism. These tests can also reproduce more advanced scenarios, like concurrent reads, with some reading from flushed memtables. (cherry picked from commit `0d39091df2`)	2025-04-10 07:33:09 -04:00
Botond Dénes	ce1d990dd6	replica/table: add error injection to memtable post-flush phase After the memtable was flushed to disk, but before it is merged to cache. The injection point will only be active for the table specified in the "table_name" injection parameter. (cherry picked from commit `6c1f6427b3`)	2025-04-10 07:33:09 -04:00
Botond Dénes	37b51871ec	utils/error_injection: add a way to set parameters from error injection points With this, now it is possible to have two-way communication between the error injection point and its enabler. The test can enable the error injection point, then wait until it is hit, before proceeding. (cherry picked from commit `f7938e3f8b`)	2025-04-10 07:33:09 -04:00
Botond Dénes	ac18570069	test/cluster: add test_data_resurrection_in_memtable.py Reproducers for #23252 and #23291 -- cache garbage collecting tombstones resurrecting data in the memtable. (cherry picked from commit `34b18d7ef4`)	2025-04-10 07:33:09 -04:00
Botond Dénes	990e92d7cf	test/pylib/utils: wait_for_cql_and_get_hosts(): sort hosts Such that a given index in the return hosts refers to the same underlying Scylla instance, as the same index in the passed-in nodes list. This is what users of this method intuitively expect, but currently the returned hosts list is unordered (has random order). (cherry picked from commit `e5afd9b5fb`)	2025-04-10 07:33:09 -04:00
Botond Dénes	67a56ae192	replica/mutation_dump: don't assume cells are live Currently the dumper unconditionally extracts the value of atomic cells, assuming they are live. This doesn't always hold of course and attempting to get the value of a dead cell will lead to marshalling errors. Fix by checking is_live() before attempting to get the cell value. Fix for both regular and collection cells. (cherry picked from commit `df09b3f970`)	2025-04-10 07:33:09 -04:00
Botond Dénes	85a7a9cb05	replica/database: do_apply() add error injection point So writes (to user tables) can be failed on a replica, via error injection. Should simplify tests which want to create differences in what writes different replicas receive. (cherry picked from commit `cb76cafb60`)	2025-04-10 07:33:09 -04:00
Botond Dénes	95205a1b29	replica: improve memtable overlap checks for the cache The current memtable overlap check that is used by the cache -- table::get_max_purgeable_fn_for_cache_underlying_reader() -- only checks the active memtable, so memtables which are either being flushed or are already flushed and also have active reads against them do not participate in the overlap check. This can result in temporary data resurrection, where a cache read can garbage-collect a tombstone which still covers data in a flushing or flushed memtable, which still has active reads against it. To prevent this, extend the overlap check to also consider all of the memtable list. Furthermore, memtable_list::erase() now places the removed (flushed) memtable in an intrusive list. These entries are alive only as long as there are readers still keeping an `lw_shared_ptr<memtable>` alive. This list is now also consulted on overlap checks. (cherry picked from commit `d126ea09ba`)	2025-04-10 07:33:09 -04:00
Botond Dénes	ef423eb4c7	replica/memtable: add is_merging_to_cache() And set it when the memtable is merged to cache. (cherry picked from commit `7e600a0747`)	2025-04-10 07:33:08 -04:00
Botond Dénes	d10a2688b1	db/row_cache: add overlap-check for cache tombstone garbage collection The cache should not garbage-collect tombstones which cover data in the memtable. Add overlap checks (get_max_purgeable) to garbage collection to detect tombstones which cover data in the memtable and to prevent their garbage collection. (cherry picked from commit `6b5b563ef7`)	2025-04-10 07:33:08 -04:00
Botond Dénes	4647aa0366	mutation/mutation_compactor: copy key passed-in to consume_new_partition() This doesn't introduce additional work for single-partition queries: the key is copied anyway on consume_end_of_stream(). Multi-partition reads and compaction are not that sensitive to additional copy added. This change fixes a bug in the compacting_reader: currently the reader passes _last_uncompacted_partition_start.key() to the compactor's consume_new_partition(). When the compactor emits enough content for this partition, _last_uncompacted_partition_start is moved from to emit the partition start, this makes the key reference passed to the compaction corrupt (refer to moved-from value). This in turn means that subsequent GC checks done by the compactor will be done with a corrupt key and therefore can result in tombstone being garbage-collected while they still cover data elsewhere (data resurrection). The compacting reader is violating the API contract and normally the bug should be fixed there. We make an exception here because doing the fix in the mutation compactor better aligns with our future plans: * The fix simplifies the compactor (gets rid of _last_dk). * Prepares the way to get rid of the consume API used by the compactor. (cherry picked from commit `c2518cdf1a`)	2025-04-09 14:02:30 +00:00
Michał Chojnowski	664d36c737	table: fix a race in table::take_storage_snapshot() `safe_foreach_sstable` doesn't do its job correctly. It iterates over an sstable set under the sstable deletion lock in an attempt to ensure that SSTables aren't deleted during the iteration. The thing is, it takes the deletion lock after the SSTable set is already obtained, so SSTables might get unlinked before we take the lock. Remove this function and fix its usages to obtain the set and iterate over it under the lock. Closes scylladb/scylladb#23397 (cherry picked from commit `e23fdc0799`) Closes scylladb/scylladb#23627	2025-04-08 19:06:42 +03:00
Lakshmi Narayanan Sreethar	90add328ad	replica/table::do_apply : do not check for async gate's closure The `table::do_apply()` method verifies if the compaction group's async gate is open to determine if the compaction group is active. Closing this async gate prevents any new operations but waits for existing holders to exit, allowing their operations to complete. When holding a gate, holders will observe the gate as closed when it is being closed, but this is irrelevant as they are already inside the gate and are allowed to complete. All the callers of `table::do_apply()` already enter the gate before calling the method. So, the async gate check inside `table::do_apply()` will erroneously throw an exception when the compaction group is closing despite holding the gate. This commit removes the check to prevent this from happening. Fixes #23348 Signed-off-by: Lakshmi Narayanan Sreethar <lakshmi.sreethar@scylladb.com> Closes scylladb/scylladb#23579 (cherry picked from commit `750f4baf44`) Closes scylladb/scylladb#23644	2025-04-08 18:58:35 +03:00
Yaron Kaikov	f4909aafc7	.github: Make "make-pr-ready-for-review" workflow run in base repo in `57683c1a50` we fixed the `token` error, but removed the checkout part, which now causes the following error ``` failed to run git: fatal: not a git repository (or any of the parent directories): .git ``` Adding the repo checkout stage to avoid this error Fixes: https://github.com/scylladb/scylladb/issues/22765 Closes scylladb/scylladb#23641 (cherry picked from commit `2dc7ea366b`) Closes scylladb/scylladb#23653	2025-04-08 13:47:50 +03:00
Kefu Chai	9e3eb4329c	.github: Make "make-pr-ready-for-review" workflow run in base repo The "make-pr-ready-for-review" workflow was failing with an "Input required and not supplied: token" error. This was due to GitHub Actions security restrictions preventing access to the token when the workflow is triggered in a fork: ``` Error: Input required and not supplied: token ``` This commit addresses the issue by: - Running the workflow in the base repository instead of the fork. This grants the workflow access to the required token with write permissions. - Simplifying the workflow by using a job-level `if` condition to control execution, as recommended in the GitHub Actions documentation (https://docs.github.com/en/actions/writing-workflows/choosing-when-your-workflow-runs/using-conditions-to-control-job-execution). This is cleaner than conditional steps. - Removing the repository checkout step, as the source code is not required for this workflow. This change resolves the token error and ensures the "make-pr-ready-for-review" workflow functions correctly. Fixes scylladb/scylladb#22765 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#22766 (cherry picked from commit `ca832dc4fb`) Closes scylladb/scylladb#23617	2025-04-07 08:11:09 +03:00
Kefu Chai	4ac3f82df9	dist: systemd: use default KillMode before this change, we specify the KillMode of the scylla-service service unit explicitly to "process". according to https://www.freedesktop.org/software/systemd/man/latest/systemd.kill.html, > If set to process, only the main process itself is killed (not recommended!). and the document suggests using "control-group" over "process". but scylla server is not a multi-process server, it is a multi-threaded server. so it should not make any difference even if we switch to the recommended "control-group". since we've been seeing "defunct" scylla processes after stopping the scylla service using systemd, we are wondering if we should try to change the `KillMode` to "control-group", which is the default value of this setting. in this change, we just drop the setting so that systemd stops the service by stopping all processes in the control group of this unit. Fixes scylladb/scylladb#21507 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> (cherry picked from commit `961a53f716`) Closes scylladb/scylladb#23176	2025-04-04 17:55:04 +03:00
Yaron Kaikov	b67329b34e	.github: add action to make PR ready for review when conflicts label was removed Moving a PR out of draft is only allowed to users with write access, adding a github action to switch PR to `ready for review` once the `conflicts` label was removed Closes scylladb/scylladb#22446 (cherry picked from commit `ed4bfad5c3`) Closes scylladb/scylladb#23006	2025-03-30 11:59:40 +03:00
Kefu Chai	48ff7cf61c	gms: Fix fmt formatter for gossip_digest_sync In commit `4812a57f`, the fmt-based formatter for gossip_digest_syn had formatting code for cluster_id, partitioner, and group0_id accidentally commented out, preventing these fields from being included in the output. This commit restores the formatting by uncommenting the code, ensuring full visibility of all fields in the gossip_digest_syn message when logging permits. This fixes a regression introduced in `4812a57f`, which obscured these fields and reduced debugging insight. Backporting is recommended for improved observability. Fixes #23142 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23155 (cherry picked from commit `2a9966a20e`) Closes scylladb/scylladb#23198	2025-03-30 11:57:15 +03:00
Kefu Chai	18d5af1cd3	storage_proxy: Prevent integer overflow in abstract_read_executor::execute Fix UBSan abort caused by integer overflow when calculating time difference between read and write operations. The issue occurs when: 1. The queried partition on replicas is not purgeable (has no recorded modified time) 2. Digests don't match across replicas 3. The system attempts to calculate timespan using missing/negative last_modified timestamps This change skips cross-DC repair optimization when write timestamp is negative or missing, as this optimization is only relevant for reads occurring within write_timeout of a write. Error details: ``` service/storage_proxy.cc:5532:80: runtime error: signed integer overflow: -9223372036854775808 - 1741940132787203 cannot be represented in type 'int64_t' (aka 'long') SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior service/storage_proxy.cc:5532:80 Aborting on shard 1, in scheduling group sl:default ``` Related to previous fix `39325cf` which handled negative read_timestamp cases. Fixes #23314 Signed-off-by: Kefu Chai <kefu.chai@scylladb.com> Closes scylladb/scylladb#23359 (cherry picked from commit `ebf9125728`) Closes scylladb/scylladb#23386	2025-03-30 11:54:59 +03:00
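The overflow guard described in commit 18d5af1cd3 can be sketched as follows. This is a minimal illustration, not the code in `service/storage_proxy.cc`: the helper name `elapsed_since_write` and the sentinel `missing_timestamp` are assumptions made for the example. The idea is to refuse to subtract when either timestamp is negative or missing, so the caller can skip the optimization instead of computing an out-of-range difference.

```cpp
#include <cassert>
#include <cstdint>
#include <limits>
#include <optional>

// Hypothetical sentinel for "no recorded timestamp" (an assumption of this sketch).
constexpr std::int64_t missing_timestamp = std::numeric_limits<std::int64_t>::min();

// Hypothetical helper: microseconds between a write and a later read, or
// nullopt when either timestamp is negative or missing, so that the caller
// can skip the optimization.
std::optional<std::int64_t> elapsed_since_write(std::int64_t read_ts, std::int64_t write_ts) {
    if (read_ts < 0 || write_ts < 0) {
        return std::nullopt;
    }
    // Both operands lie in [0, INT64_MAX], so the difference cannot overflow.
    return read_ts - write_ts;
}
```

Checking the operands before subtracting keeps the arithmetic inside the representable range of `int64_t`, which is what UBSan flagged in the report quoted in the commit message.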
Tomasz Grabiec	6cdd1cccdc	test: tablets: Fix flakiness due to ungraceful shutdown The test fails sporadically with: cassandra.ReadFailure: Error from server: code=1300 [Replica(s) failed to execute read] message="Operation failed for test3.test2 - received 1 responses and 1 failures from 2 CL=QUORUM." info={'consistency': 'QUORUM', 'required_responses': 2, 'received_responses': 1, 'failures': 1} That's because a server is stopped in the middle of the workload. The server is stopped ungracefully which will cause some requests to time out. We should stop it gracefully to allow in-flight requests to finish. Fixes #20492 Closes scylladb/scylladb#23451 (cherry picked from commit `8e506c5a8f`) Closes scylladb/scylladb#23468	2025-03-28 14:56:39 +01:00
Anna Stuchlik	3f0e52a5ee	doc: zero-token nodes and Arbiter DC This commit adds documentation for zero-token nodes and an explanation of how to use them to set up an arbiter DC to prevent a quorum loss in multi-DC deployments. The commit adds two documents: - The one in Architecture describes zero-token nodes. - The other in Cluster Management explains how to use them. We need separate documents because zero-token nodes may be used for other purposes in the future. In addition, the documents are cross-linked, and the link is added to the Create a ScyllaDB Cluster - Multi Data Centers (DC) document. Refs https://github.com/scylladb/scylladb/pull/19684 Fixes https://github.com/scylladb/scylladb/issues/20294 Closes scylladb/scylladb#21348 (cherry picked from commit `9ac0aa7bba`) Closes scylladb/scylladb#23200	2025-03-10 10:52:13 +01:00
Piotr Dulikowski	9fc27b734f	test: test_mv_topology_change: increase timeout for removenode The test `test_mv_topology_change` is a regression test for scylladb/scylladb#19529. The problem was that CL=ANY writes issued when all replicas were down would be kept in memory until the timeout. In particular, MV updates are CL=ANY writes and have a 5 minute timeout. When doing topology operations for vnodes or when migrating tablet replicas, the cluster goes through stages where the replica sets for writes undergo changes, and the writes started with the old replica set need to be drained first. Because of the aforementioned MV updates, the removenode operation could be delayed by 5 minutes or more. Therefore, the `test_mv_topology_change` test uses a short timeout for the removenode operation, i.e. 30s. Apparently, this is too low for the debug mode and the test has been observed to time out even though the removenode operation is progressing fine. Increase the timeout to 60s. This is the lowest timeout for the removenode operation that we currently use among the in-repo tests, and is lower than 5 minutes so the test will still serve its purpose. Fixes: scylladb/scylladb#22953 Closes scylladb/scylladb#22958 (cherry picked from commit `43ae3ab703`) Closes scylladb/scylladb#23052	2025-03-04 16:04:00 +01:00
Benny Halevy	9fd5909a5e	token_group_based_splitting_mutation_writer: maybe_switch_to_new_writer: prevent double close Currently, maybe_switch_to_new_writer resets _current_writer only in a continuation after closing the current writer. This leaves a window of vulnerability if close() yields, and token_group_based_splitting_mutation_writer::close() is called. Seeing the engaged _current_writer, close() will call _current_writer->close() - which must be called exactly once. Solve this when switching to a new writer by resetting _current_writer before closing it and potentially yielding. Fixes #22715 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> Closes scylladb/scylladb#22922 (cherry picked from commit `29b795709b`) Closes scylladb/scylladb#22964	2025-02-23 14:27:11 +02:00
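The "reset before close" ordering from commit 9fd5909a5e can be modeled without coroutines. This is a sketch under assumptions: `writer`, `splitting_writer` and the `on_yield` hook are hypothetical stand-ins, and the hook simulates another fiber calling `close()` while the first `close()` is suspended. Detaching the current writer before closing it means a concurrent `close()` finds nothing to close.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <utility>

// Hypothetical writer whose close() must be called exactly once.
struct writer {
    int* close_calls = nullptr;
    std::function<void()> on_yield; // stands in for the point where close() yields

    void close() {
        assert(close_calls != nullptr);
        ++*close_calls;
        if (on_yield) {
            on_yield();
        }
    }
};

struct splitting_writer {
    std::unique_ptr<writer> current;

    // Closes the current writer, if any is still attached.
    void close() {
        if (auto w = std::exchange(current, nullptr)) {
            w->close();
        }
    }

    // Detach the old writer first, then close it; a close() that runs while
    // the old writer is closing no longer sees an engaged `current`.
    void switch_to_new_writer(std::unique_ptr<writer> next) {
        auto old = std::exchange(current, nullptr);
        if (old) {
            old->close();
        }
        current = std::move(next);
    }
};
```

If `switch_to_new_writer()` instead cleared `current` only after `old->close()` returned, the simulated concurrent `close()` would reach the same writer a second time.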
Botond Dénes	103a986eca	Merge '[Backport 6.2] reader_concurrency_semaphore: set_notify_handler(): disable timeout ' from Scylladb[bot] `set_notify_handler()` is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which is typical). Disable the timeout before setting the TTL to prevent premature eviction. Fixes: https://github.com/scylladb/scylladb/issues/22629 Backport required to all active releases, they are all affected. - (cherry picked from commit `a3ae0c7cee`) - (cherry picked from commit `9174f27cc8`) Parent PR: #22701 Closes scylladb/scylladb#22751 * github.com:scylladb/scylladb: reader_concurrency_semaphore: set_notify_handler(): disable timeout reader_permit: mark check_abort() as const	2025-02-19 09:59:47 +02:00
Botond Dénes	1d4ea169e3	reader_concurrency_semaphore: set_notify_handler(): disable timeout set_notify_handler() is called after a querier was inserted into the querier cache. It has two purposes: set a callback for eviction and set a TTL for the cache entry. This latter was not disabling the pre-existing timeout of the permit (if any) and this would lead to premature eviction of the cache entry if the timeout was shorter than TTL (which is typical). Disable the timeout before setting the TTL to prevent premature eviction. Fixes: #scylladb/scylladb#22629 (cherry picked from commit `9174f27cc8`)	2025-02-18 04:48:22 -05:00
Botond Dénes	86c9bc778a	tools/scylla-nodetool: netstats: don't assume both senders and receivers The code currently assumes that a session has both sender and receiver streams, but it is possible to have just one or the other. Change the test to include this scenario and remove this assumption from the code. Fixes: #22770 Closes scylladb/scylladb#22771 (cherry picked from commit `87e8e00de6`) Closes scylladb/scylladb#22873	2025-02-18 10:35:10 +02:00
Botond Dénes	4acb366a28	service/storage_proxy: schedule_repair(): materialize the range into a vector Said method passes down its `diff` input to `mutate_internal()`, after some std::ranges massaging. Said massaging is destructive -- it moves items from the diff. If the output range is iterated-over multiple times, only the first time will see the actual output, further iterations will get an empty range. When trace-level logging is enabled, this is exactly what happens: `mutate_internal()` iterates over the range multiple times, first to log its content, then to pass it down the stack. This ends up resulting in a range with moved-from elements being passed down and consequently write handlers being created with nullopt mutations. Make the range re-entrant by materializing it into a vector before passing it to `mutate_internal()`. Fixes: scylladb/scylladb#21907 Fixes: scylladb/scylladb#21714 Closes scylladb/scylladb#21910 (cherry picked from commit `7150442f6a`) Closes scylladb/scylladb#22853	2025-02-18 10:34:42 +02:00
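The hazard described in commit 4acb366a28 can be reproduced in miniature. The sketch below is an illustration only: it uses `std::move_iterator` instead of the std::ranges pipeline in the real code, and `mutation_ptr`, `destructive_pass`, `materialize` and `read_only_pass` are hypothetical names. A destructive pass sees the real elements once and moved-from elements afterwards; materializing into a vector performs the moves once and makes later passes repeatable.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <memory>
#include <vector>

using mutation_ptr = std::unique_ptr<int>; // stand-in for a real mutation

// One destructive pass: takes ownership of each element it visits and
// returns how many engaged (non-null) elements it saw.
std::size_t destructive_pass(std::vector<mutation_ptr>& diff) {
    std::size_t engaged = 0;
    auto last = std::make_move_iterator(diff.end());
    for (auto it = std::make_move_iterator(diff.begin()); it != last; ++it) {
        mutation_ptr taken = *it; // moves out of diff, leaving nullptr behind
        engaged += (taken != nullptr);
    }
    return engaged;
}

// Materialize: the moves happen exactly once, while building the vector.
std::vector<mutation_ptr> materialize(std::vector<mutation_ptr>& diff) {
    return std::vector<mutation_ptr>(std::make_move_iterator(diff.begin()),
                                     std::make_move_iterator(diff.end()));
}

// A read-only pass can be repeated any number of times.
std::size_t read_only_pass(const std::vector<mutation_ptr>& items) {
    std::size_t engaged = 0;
    for (const auto& item : items) {
        engaged += (item != nullptr);
    }
    return engaged;
}
```

Running `destructive_pass()` twice on the same input mirrors the logging pass followed by the real pass: the second one finds only moved-from elements. Passing the result of `materialize()` to both consumers avoids that.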
Botond Dénes	24eb8f49ba	service: query_pager: fix last-position for filtering queries On short-pages, cut short because of a tombstone prefix. When page-results are filtered and the filter drops some rows, the last-position is taken from the page visitor, which does the filtering. This means that last partition and row position will be that of the last row the filter saw. This will not match the last position of the replica, when the replica cut the page due to tombstones. When fetching the next page, this means that all the tombstone suffix of the last page, will be re-fetched. Worse still: the last position of the next page will not match that of the saved reader left on the replica, so the saved reader will be dropped and a new one created from scratch. This wasted work will show up as elevated tail latencies. Fix by always taking the last position from raw query results. Fixes: #22620 Closes scylladb/scylladb#22622 (cherry picked from commit `7ce932ce01`) Closes scylladb/scylladb#22718	2025-02-13 15:15:24 +02:00
Botond Dénes	8e6648870b	reader_concurrency_semaphore: with_permit(): proper clean-up after queue overload with_permit() creates a permit, with a self-reference, to avoid attaching a continuation to the permit's run function. This self-reference is used to keep the permit alive, until the execution loop processes it. This self reference has to be carefully cleared on error-paths, otherwise the permit will become a zombie, effectively leaking memory. Instead of trying to handle all loose ends, get rid of this self-reference altogether: ask caller to provide a place to save the permit, where it will survive until the end of the call. This makes the call-site a little bit less nice, but it gets rid of a whole class of possible bugs. Fixes: #22588 Closes scylladb/scylladb#22624 (cherry picked from commit `f2d5819645`) Closes scylladb/scylladb#22703	2025-02-13 15:03:57 +02:00
Botond Dénes	77696b1e43	reader_concurrency_semaphore: foreach_permit(): include _inactive_reads So inactive reads show up in semaphore diagnostics dumps (currently the only non-test user of this method). Fixes: #22574 Closes scylladb/scylladb#22575 (cherry picked from commit `e1b1a2068a`) Closes scylladb/scylladb#22610	2025-02-13 15:03:23 +02:00
Aleksandra Martyniuk	3497ba7f60	replica: mark registry entry as synch after the table is added When a replica gets a write request it performs get_schema_for_write, which waits until the schema is synced. However, database::add_column_family marks a schema as synced before the table is added. Hence, the write may see the schema as synced, but hit no_such_column_family as the table hasn't been added yet. Mark schema as synced after the table is added to database::_tables_metadata. Fixes: #22347. Closes scylladb/scylladb#22348 (cherry picked from commit `328818a50f`) Closes scylladb/scylladb#22603	2025-02-13 15:02:59 +02:00
Aleksandra Martyniuk	ade0fe2d7a	nodetool: tasks: print empty string for start_time/end_time if unspecified If start_time/end_time is unspecified for a task, task_manager API returns epoch. Nodetool prints the value in task status. Fix nodetool tasks commands to print empty string for start_time/end_time if it isn't specified. Modify nodetool tasks status docs to show empty end_time. Fixes: #22373. Closes scylladb/scylladb#22370 (cherry picked from commit `477ad98b72`) Closes scylladb/scylladb#22600	2025-02-13 13:26:54 +02:00
Jenkins Promoter	72cf5ef576	Update ScyllaDB version to: 6.2.4	2025-02-09 16:52:35 +02:00
Botond Dénes	2978ed58a2	reader_permit: mark check_abort() as const All it does is read one field, making it const makes using it easier. (cherry picked from commit `a3ae0c7cee`)	2025-02-09 00:32:13 +00:00
Tomasz Grabiec	6922acb69f	Merge '[Backport 6.2] split: run set_split_mode() on all storage groups during all_storage_groups_split()' from Scylladb[bot] `tablet_storage_group_manager::all_storage_groups_split()` calls `set_split_mode()` for each of its storage groups to create split ready compaction groups. It does this by iterating through storage groups using `std::ranges::all_of()` which is not guaranteed to iterate through the entire range, and will stop iterating on the first occurrence of the predicate (`set_split_mode()`) returning false. `set_split_mode()` creates the split compaction groups and returns false if the storage group's main compaction group or merging groups are not empty. This means that in cases where the tablet storage group manager has non-empty storage groups, we could have a situation where split compaction groups are not created for all storage groups. The missing split compaction groups are later created in `tablet_storage_group_manager::split_all_storage_groups()` which also calls `set_split_mode()`, and that is the reason why split completes successfully. The problem is that `tablet_storage_group_manager::all_storage_groups_split()` runs under a group0 guard, but `tablet_storage_group_manager::split_all_storage_groups()` does not. This can cause problems with operations which should exclude with compaction group creation. i.e. DROP TABLE/DROP KEYSPACE Fixes #22431 This is a bugfix and should be back ported to versions with tablets: 6.1 6.2 and 2025.1 - (cherry picked from commit `24e8d2a55c`) - (cherry picked from commit `8bff7786a8`) Parent PR: #22330 Closes scylladb/scylladb#22559 * github.com:scylladb/scylladb: test: add reproducer and test for fix to split ready CG creation table: run set_split_mode() on all storage groups during all_storage_groups_split()	2025-02-07 14:22:57 +01:00
Tomasz Grabiec	61e303a3e3	locator: network_topology_strategy: Fix SIGSEGV when creating a table when there is a rack with no normal nodes In that case, new_racks will be used, but when we discover no candidates, we try to pop from existing_racks. Fixes #22625 Closes scylladb/scylladb#22652 (cherry picked from commit `e22e3b21b1`) Closes scylladb/scylladb#22721	2025-02-06 16:47:14 +01:00
Avi Kivity	8ede62d288	Update seastar submodule (hwloc failures on some AWS instances) * seastar ec5da7a606...e40388c4c7 (1): > resource: fallback to sysconf when failed to detect memory size from hwloc Fixes #22382	2025-02-04 16:29:45 +02:00
Avi Kivity	4ac9c710fc	Merge '[Backport 6.2] api: task_manager: do not unregister finish task when its status is queried' from Scylladb[bot] Currently, when the status of a task is queried and the task is already finished, it gets unregistered. Getting the status shouldn't be a one-time operation. Stop removing the task after its status is queried. Adjust tests not to rely on this behavior. Add task_manager/drain API and nodetool tasks drain command to remove finished tasks in the module. Fixes: https://github.com/scylladb/scylladb/issues/21388. It's a fix to task_manager API, should be backported to all branches - (cherry picked from commit `e37d1bcb98`) - (cherry picked from commit `18cc79176a`) Parent PR: #22310 Closes scylladb/scylladb#22597 * github.com:scylladb/scylladb: api: task_manager: do not unregister tasks on get_status api: task_manager: add /task_manager/drain	2025-02-03 23:04:31 +02:00
Avi Kivity	34fa9bd586	Merge '[Backport 6.2] Simplify loading_cache_test and use manual_clock' from Scylladb[bot] This series exposes a Clock template parameter for loading_cache so that the test can use manual_clock rather than lowres_clock, since relying on the latter is flaky. In addition, the test load function is simplified to sleep for a small random time and co_return the expected string, rather than reading it from a real file, since the latter's timing might also be flaky, and it is out of scope for this test. Fixes #20322 * The test has always been flaky, so backport is required for all live versions. - (cherry picked from commit `b509644972`) - (cherry picked from commit `934a9d3fd6`) - (cherry picked from commit `d68829243f`) - (cherry picked from commit `b258f8cc69`) - (cherry picked from commit `0841483d68`) - (cherry picked from commit `32b7cab917`) Parent PR: #22064 Closes scylladb/scylladb#22640 * github.com:scylladb/scylladb: tests: loading_cache_test: use manual_clock utils: loading_cache: make clock_type a template parameter test: loading_cache_test: use function-scope loader test: loading_cache_test: simlute loader using sleep test: lib: eventually: add sleep function param test: lib: eventually: make *EVENTUALLY_EQUAL inline functions	2025-02-03 22:56:31 +02:00
Benny Halevy	79bff0885c	tests: loading_cache_test: use manual_clock Relying on a real-time clock like lowres_clock can be flaky (in particular in debug mode). Use manual_clock instead to harden the test against timing issues. Fixes #20322 Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `32b7cab917`) Signed-off-by: Benny Halevy <bhalevy@scylladb.com>	2025-02-03 16:15:46 +02:00
Benny Halevy	abf8f44e03	utils: loading_cache: make clock_type a template parameter So the unit test can use manual_clock rather than lowres_clock which can be flaky (in particular in debug mode). Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `0841483d68`)	2025-02-03 16:02:37 +02:00
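The idea of making the clock a template parameter can be sketched in plain standard C++. Everything below is an illustrative assumption: `fake_clock` and `expiring_cache` are hypothetical names, and this is neither Seastar's `manual_clock` nor ScyllaDB's `loading_cache`. It only shows why a test that controls time explicitly does not depend on wall-clock timing.

```cpp
#include <chrono>
#include <optional>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical manually advanced clock: time moves only when advance() is called.
struct fake_clock {
    using duration = std::chrono::milliseconds;
    using rep = duration::rep;
    using period = duration::period;
    using time_point = std::chrono::time_point<fake_clock, duration>;
    static constexpr bool is_steady = true;

    static inline time_point current{};
    static time_point now() noexcept { return current; }
    static void advance(duration d) { current += d; }
};

// Hypothetical cache whose notion of time comes from the Clock parameter.
// Production code would use the default; tests substitute fake_clock.
template <typename Clock = std::chrono::steady_clock>
class expiring_cache {
    struct entry {
        std::string value;
        typename Clock::time_point expiry;
    };
    std::unordered_map<std::string, entry> _entries;
    typename Clock::duration _ttl;

public:
    explicit expiring_cache(typename Clock::duration ttl) : _ttl(ttl) {}

    // Stores a value that expires _ttl after the current Clock time.
    void put(const std::string& key, std::string value) {
        _entries.insert_or_assign(key, entry{std::move(value), Clock::now() + _ttl});
    }

    // Returns the value if present and not yet expired according to Clock.
    std::optional<std::string> get(const std::string& key) const {
        auto it = _entries.find(key);
        if (it == _entries.end() || Clock::now() >= it->second.expiry) {
            return std::nullopt;
        }
        return it->second.value;
    }
};
```

A test can insert an entry, call `fake_clock::advance()` past the TTL, and assert that the entry has expired, with no real sleeping and no sensitivity to a slow debug build.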
Benny Halevy	00f1dcfd09	test: loading_cache_test: use function-scope loader Rather than a global function, accessing a thread-local `load_count`. The thread-local load_count cannot be used when multiple test cases run in parallel. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `b258f8cc69`)	2025-02-03 16:01:53 +02:00
Benny Halevy	b0166a3a9c	test: loading_cache_test: simlute loader using sleep This test isn't about reading values from file, but rather it's about the loading_cache. Reading from the file can sometimes take longer than the expected refresh times, causing flakiness (see #20322). Rather than reading a string from a real file, just sleep a random, short time, and co_return the string. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `d68829243f`)	2025-02-03 16:00:51 +02:00
Benny Halevy	7addc3454d	test: lib: eventually: add sleep function param To allow support for manual_clock instead of seastar::sleep. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `934a9d3fd6`)	2025-02-03 16:00:47 +02:00
Benny Halevy	9d5e3f050e	test: lib: eventually: make *EVENTUALLY_EQUAL inline functions rather than macros. This is a first cleanup step before adding a sleep function parameter to also support manual_clock. Also, add a call to BOOST_REQUIRE_EQUAL/BOOST_CHECK_EQUAL, respectively, to make an error more visible in the test log since those entry points print the offending values when not equal. Signed-off-by: Benny Halevy <bhalevy@scylladb.com> (cherry picked from commit `b509644972`)	2025-02-03 15:50:22 +02:00
Michael Litvak	c7421a8804	view_builder: fix loop in view builder when tokens are moved The view builder builds a view by going over the entire token ring, consuming the base table partitions, and generating view updates for each partition. A view is considered as built when we complete a full cycle of the token ring. Suppose we start to build a view at a token F. We will consume all partitions with tokens starting at F until the maximum token, then go back to the minimum token and consume all partitions until F, and then we detect that we pass F and complete building the view. This happens in the view builder consumer in `check_for_built_views`. The problem is that we check if we pass the first token F with the condition `_step.current_token() >= it->first_token` whenever we consume a new partition or the current_token goes back to the minimum token. But suppose that we don't have any partitions with a token greater than or equal to the first token (this could happen if the partition with token F was moved to another node for example), then this condition will never be satisfied, and we don't detect correctly when we pass F. Instead, we go back to the minimum token, building the same token ranges again, in a possibly infinite loop. To fix this we add another step when reaching the end of the reader's stream. When this happens it means we don't have any more fragments to consume until the end of the range, so we advance the current_token to the end of the range, simulating a partition, and check for built views in that range. Fixes scylladb/scylladb#21829 Closes scylladb/scylladb#22493 (cherry picked from commit `6d34125eb7`) Closes scylladb/scylladb#22606	2025-02-03 13:27:28 +01:00
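The termination problem described in the commit above can be reproduced with a toy model. This is a simplified sketch under stated assumptions, not the actual `view_builder` code: tokens are plain integers, `build_view` and `build_result` are hypothetical names, the first leg from F to the maximum token is omitted because it never triggers the check, and `max_passes` exists only so the buggy variant terminates in a demonstration.

```cpp
#include <cstdint>
#include <limits>
#include <vector>

// Outcome of the toy build loop.
struct build_result {
    bool built;  // whether completion was detected
    int passes;  // number of scans started from the minimum token
};

// tokens: sorted tokens of the partitions present on this node.
// first_token: the token F at which the build started.
// advance_at_end_of_stream: models the fix, where reaching the end of the
// reader's stream advances the current token to the end of the range.
build_result build_view(const std::vector<std::int64_t>& tokens,
                        std::int64_t first_token,
                        bool advance_at_end_of_stream,
                        int max_passes = 3) {
    constexpr auto max_token = std::numeric_limits<std::int64_t>::max();
    int passes = 0;
    while (passes < max_passes) {
        ++passes; // wrapped around to the minimum token
        for (std::int64_t current_token : tokens) {
            // Completion check, evaluated only when a partition is consumed.
            if (current_token >= first_token) {
                return {true, passes};
            }
        }
        // End of stream: with the fix, pretend a partition exists at the end
        // of the range so the same check runs one more time.
        if (advance_at_end_of_stream && max_token >= first_token) {
            return {true, passes};
        }
    }
    return {false, passes}; // would loop forever without the artificial cap
}
```

With partitions at tokens 10 and 20 and a start token of 50, the variant without the end-of-stream step exhausts every pass without detecting completion, whereas the variant with it finishes on the first pass. When a partition at or beyond the start token exists, both variants behave the same.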

44770 Commits